Is Buffer Cache Still Effective for High Speed PCM (Phase Change Memory) Storage?


2011 IEEE 17th International Conference on Parallel and Distributed Systems

Is Buffer Cache Still Effective for High Speed PCM (Phase Change Memory) Storage?

Eunji Lee, Daeha Jin, Kern Koh
Dept. of Computer Engineering, Seoul National University, Seoul, Korea
{ejlee, dhjin,

Hyokyung Bahn
Dept. of Computer Science and Engineering, Ewha University, Seoul, Korea

Abstract— Recently, PCM (phase change memory) has emerged as a new storage medium, and there is a bright prospect that PCM will be used as a storage device in the near future. Since the optimistic access time of PCM is expected to be almost identical to that of DRAM, a natural question is whether the traditional buffer cache is still effective for high-speed secondary storage such as PCM. This paper answers this question by showing that the buffer cache is still effective in such environments due to software overhead and the bimodal block reference characteristics. Based on this observation, we present a new buffer cache management scheme appropriate for systems where the speed gap between the cache and storage is small. To this end, we analyze the conditions under which caching gains and identify characteristics of I/O traces that can be exploited in managing the buffer cache for PCM storage.

Keywords- Phase Change Memory; Buffer Cache; File Systems;

I. INTRODUCTION

For decades, the wide speed gap between main memory and hard disks has been a serious performance bottleneck in computer systems. To relieve this problem, operating systems store requested disk blocks in a part of main memory called the buffer cache, thereby servicing subsequent requests directly without accessing slow disk storage. The primary goal of buffer cache management is to minimize the number of disk accesses, as the hard disk is five or six orders of magnitude slower than the DRAM buffer cache.
Recently, high-speed nonvolatile storage technologies such as PCM (Phase Change Memory) are emerging, and there is a bright prospect that PCM will be used as storage, replacing the hard disk or coexisting with it, by 2020 [1]. This is made possible by the rapid advancement of micro-fabrication processes and multi-level cell (MLC) technologies [2, 3, 4]. It is anticipated that the cost of PCM will be no more than 3-5x that of the hard disk (HDD), and its power consumption will be 10x lower than that of HDD. Furthermore, PCM is byte-addressable, and its optimistic access time is expected to be almost identical to that of DRAM. Thus, part of the research community considers PCM a disk-like secondary storage [5, 6, 7, 8], while others use it as part of a DRAM-like memory hierarchy [9-13, 19]. Between these two branches of research, this paper considers PCM a storage device, in line with the studies by Venkataraman et al., Condit et al., etc. [5, 6]. In this setting, a natural question is whether the traditional buffer cache is still effective for high-speed secondary storage such as PCM. The first contribution of this paper is that we answer this question by analyzing the conditions under which caching a block gains when the storage access time is close or identical to that of the buffer cache. Though the device access times of DRAM and PCM may be identical, our empirical analysis shows that accessing a block from PCM storage is 1.1 to 1.3 times slower than accessing it from the DRAM buffer cache due to the software overhead of I/O processing. Furthermore, current hardware technologies indicate that the PCM access time may be slightly longer than that of DRAM [19]. Considering all of these factors, we derive the condition under which caching a block is effective for varying access times of secondary storage.
Our results show that caching a block from PCM benefits I/O performance only when the number of hits during each cache residence is more than 2, and in some cases 3 to 5, depending on the cache miss penalty. The reason is that caching itself requires an additional cost to store the block in the buffer cache. Specifically, if a block is stored in the buffer cache and then delivered to user program space, one more memory copy is needed. To offset this overhead and gain from caching, more than a certain number of subsequent cache hits are needed. In other words, if a block is likely to be referenced fewer times than a certain lower bound during its residence in the cache, it is more effective to exclude the block from caching. We establish this condition by measuring each time component of the file I/O process and then analyzing them. We analyze various file I/O traces and make two prominent observations that can be exploited in managing the buffer cache for PCM storage, which is our second contribution. The first observation is that a large portion of blocks are referenced only once during their residence in the cache, which would degrade I/O performance if they were cached in our environment. Second, we show that blocks referenced more than twice are highly likely to be re-referenced many times in the future. These hot blocks should be the target of buffer caching in our PCM storage environment. Based on these observations, our third contribution is to present a new buffer cache management

scheme appropriate for PCM-based storage systems. Before caching a referenced block, our proposed scheme estimates whether it will be referenced more than twice in the cache. Specifically, we do not cache a block when it is first referenced, and insert it into the cache only after its second reference occurs within a certain time window. To do this, we use a small history buffer that does not store the contents of actual blocks, but only records that the blocks were referenced recently. This is reasonable since we have shown that blocks referenced twice are highly likely to be re-referenced many times in the future. One problem with this scheme is that it incurs a cache miss on the first reference of blocks that will eventually be referenced three or more times, as it does not cache them at the first reference. However, unlike HDD environments, where a storage access costs five or six orders of magnitude more than a cache access, the miss penalty of PCM is only slightly larger than a cache access, so retrieving a block again from PCM incurs a reasonably small cost. Furthermore, as we do not know a priori which blocks will be referenced three or more times, the benefit of filtering out a large number of single-referenced blocks outweighs the cost of accessing PCM one more time. Our final contribution is to extend the algorithm to consider the asymmetric read/write operation cost of PCM. Specifically, the write access time of PCM is expected to be about 8-10 times slower than that of DRAM [14]. Thus, it seems intuitive that caching a write reference always gains, because bypassing a write reference directly incurs an expensive PCM write operation. However, our analysis shows that this is not the case, and filtering the first write reference is also effective in some cases.
The reason is shown through an analysis of various file I/O traces, in which a large portion of write references are made only once during their residence in the cache. Simulation experiments with various I/O traces show that our scheme improves the performance of file systems by 23% on average and by up to 75%. The remainder of this paper is organized as follows. Section II shows the motivation of this research by analyzing various file I/O traces and discussing the new buffer caching condition when the miss penalty is very small. Section III presents a new buffer cache management algorithm for PCM storage. Section IV presents our experimental results obtained through trace-driven simulations to assess the effectiveness of the proposed algorithm. Section V discusses related work. Finally, we conclude this paper in Section VI.

Figure 1. Access time to file system and buffer cache on ramdisk
Figure 2. The minimum number of hits to make caching profitable

II. MOTIVATION

A. Modeling the cache performance in PCM storage

To investigate whether the conventional buffer cache is still effective for fast storage devices, we measured several time components of file system operations in Linux. We modified the ext2 file system and measured the time of directly accessing data in the storage file system, bypassing the buffer cache, and the time of accessing data in the buffer cache. As PCM is not commercially available and the read performance of PCM is similar to that of DRAM, we used a ram-disk consisting of DRAM as the PCM file storage in our measurements. As Figure 1 shows, accessing data from secondary storage takes 1.1 to 1.3 times longer than accessing it from the buffer cache. Though we assume the device access times of DRAM and PCM to be identical, this speed gap arises from the overhead of the file system software layers.
Based on this result, one might conclude that the buffer cache is still effective for file accesses on fast secondary storage whose physical access time is identical to that of main memory. However, we show that this is not always true; certain conditions must be satisfied for caching to be efficient. In the experiments shown in Figure 1, the storage access time T does not include the buffer cache layer overhead. That is, the retrieved block from PCM bypasses the buffer cache and is transferred directly to user memory. If the buffer cache is used, the missed block must be stored in the buffer cache first and then copied to user space. This additional memory copy overhead matters in the PCM storage environment because the time required for one additional memory copy is similar to the PCM access time. This implies that the benefit of caching becomes smaller, or that caching does not gain at all in the worst case, due to the trade-off between the caching overhead and the PCM access time. Thus, for caching to be profitable, the gain from cache hits should be larger than the caching overhead. In other words, a certain number of hits may be needed for a cached block to be beneficial. This differs from slow storage devices such as hard disks, where a single hit on a cached block is always beneficial due to the large difference between the caching overhead and the storage access time.

Figure 3. The ratio of single-reference (zero-hit) blocks during their residence in the cache

For a quantitative comparison, we measure the cache hit time t, the storage access time bypassing the cache T, and the cache miss penalty Tm, which includes the overhead of uploading the requested block from storage into the cache. We then find the condition under which caching is effective in terms of the number of hits. The following expressions represent the time required to service a requested block with the cache (Tcache) and without the cache (Tno_cache), respectively:

Tcache = Tm + (n - 1) * t
Tno_cache = n * T

where n is the number of references to a block during its residence in the cache. Caching is profitable when Tcache is smaller than Tno_cache, which can be expressed as

n > (Tm - t) / (T - t)

With this equation, we can calculate the minimum number of references n necessary for caching a block to be profitable as the storage access time varies. Figure 2 plots this equation. As the graph shows, the number of cache hits required for caching to gain increases dramatically as the performance gap between memory and storage shrinks. Applying this model to a PCM storage system where a read access is only 1.3 times slower than memory, at least two cache hits are required during a block's residence for caching to be beneficial. Thus, blocks accessed fewer than three times are better off bypassing the cache.

B. Analyzing the file I/O traces

Now let us analyze file I/O traces to investigate the hit count distribution of cached blocks. Figure 3 shows the ratio of blocks that are not referenced again before being evicted from the cache, as the cache size varies from 0.1 to 1.0 relative to the total I/O footprint.
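The profitability condition n > (Tm - t) / (T - t) above can be sketched as a small helper that computes the minimum number of references needed for caching to pay off. The function name and the example timings are illustrative assumptions (all times normalized to the cache hit time t = 1; a miss penalty Tm of 1.8 is one plausible value consistent with the 1.3x PCM access ratio plus a caching copy), not measured values:

```python
import math

def min_references_to_profit(T, Tm, t=1.0):
    """Smallest integer n such that caching wins, i.e.
    Tm + (n - 1) * t < n * T, equivalently n > (Tm - t) / (T - t).

    T  : storage access time bypassing the cache
    Tm : cache miss penalty (storage access plus cache upload)
    t  : cache hit time (all times in the same unit)
    """
    bound = (Tm - t) / (T - t)
    return math.floor(bound) + 1  # strictly greater than the bound

# Hypothetical timings: for HDD-like storage (T ~ 10^5 * t) a single
# re-reference already pays off (n = 2), while for near-DRAM storage
# the required number of references grows quickly.
print(min_references_to_profit(T=1e5, Tm=1e5 + 0.8))  # HDD-like
print(min_references_to_profit(T=1.3, Tm=1.8))        # PCM-like
```

With the PCM-like timings above the helper yields n = 3, i.e. two hits per residence, matching the minimum-hit analysis in this section.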
In reality, a cache size of 1.0 corresponds to infinite cache capacity, where all block references in the trace can be cached at the same time. This is an unrealistic condition, but we present it to show the complete trend of the hit count distribution as a function of the cache size. In practice, cache sizes smaller than 0.5 represent most real system situations. As shown in Figure 3, blocks accessed only once (i.e., zero hits) account for a large portion of cached blocks. These single-referenced blocks contribute nothing to cache performance, as they are not re-referenced before being evicted from the cache. Access patterns such as sequential references or large loops whose length exceeds the cache size can cause this situation. We make another important observation for efficient caching in PCM storage environments: blocks that incur hits during their residence in the cache are highly likely to incur multiple hits before they are evicted. Figure 4 shows the ratio of multiple hits as a function of the cache size for each trace. As shown in the figure, multiple hits on cached blocks account for 88-99% of all cache hits. This indicates that blocks referenced more than twice tend to be referenced again in the near future, and these should be the target of our caching. In summary, the hit count of blocks in the buffer cache exhibits a bimodal distribution: most cached blocks incur either no hits or many hits.

Figure 4. The ratio of multiple hits in the cache among total hit references

III. BUFFER CACHE FOR FAST STORAGE

In this section, we present a new buffer cache management scheme for fast storage devices, specifically targeting PCM storage, whose access time is almost identical to the DRAM memory access time.

A. Selective Cache Bypassing Scheme

The principle of caching is to retrieve a block from slow storage and keep it in the cache even after servicing the current request, assuming the block will be requested again in the near future. Usually, a performance gain is achieved even when the cached block is subsequently requested only once during its residence in the cache. However, this is not the case for fast storage devices such as PCM, as shown in Section II. Specifically, if the number of re-references after a block is cached is not large enough to offset the caching cost, caching can degrade performance even though some hits occur in the cache. As the speed gap between storage and memory narrows, the number of cache hits required to cover the caching cost increases. As the relative storage access time varies from 10^5 to 1.1 times the cache access time, the minimum number of hits needed for a net profit changes from 1 to 8. This leads to many more cases in which a cached block eventually becomes non-profitable. As a result, we need to predict such non-profitable blocks, which are unlikely to be re-referenced enough times, and keep their requests out of the cache. To this end, we propose the selective cache bypassing scheme, which does not cache a block when it is first referenced. The block is allowed into the cache only after it is referenced again within a certain time window. To maintain this time window, we use a small history buffer that does not store the contents of actual blocks, but only records that the blocks were referenced recently. The optimal size of the history buffer depends not only on the workload characteristics but also on the actual cache size, so it can serve as a tunable control parameter. Finding the optimal size of the history buffer is beyond the scope of this paper.
As a basic configuration, we set the size of the history buffer equal to that of the actual cache. This is reasonable because a bypassed block itself is not cached, but its history information is maintained as if it were cached, until it is evicted from the history buffer, whose size is identical to the actual cache. Note that maintaining a history buffer of this size has very low overhead because it contains only a small amount of metadata (less than 20 bytes per block), whereas each actual block contains 4KB of data. Now, let us return to the description of the selective cache bypassing scheme. Its motivation was explained in Section II: the hit count distribution of cached blocks is bimodal (i.e., zero hits or many hits). Thus, a second reference within a short time window is a good indicator of whether a block will be accessed many times in the near future. Therefore, bypassing the cache on the first access is effective in discriminating non-profitable blocks and filtering them out of the cache. The benefit of our selective cache bypassing scheme has two aspects. First, the time cost of storing non-profitable blocks in the cache is saved. In addition, our scheme prevents the expensive cache space from being polluted by non-profitable blocks; this space can instead be used for more profitable blocks. However, our scheme has a weakness: it incurs an additional miss for those blocks that are eventually placed in the cache. This could cause significant performance degradation when the miss penalty is large, as with hard disks, but that is not the case in our environment.
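The selective cache bypassing policy described above can be sketched as follows. The class and method names are ours, and the structure is deliberately simplified (block contents are omitted; both the cache and the history buffer are plain LRU lists of block identifiers), so this is a sketch of the policy rather than the actual implementation:

```python
from collections import OrderedDict

class SelectiveBypassCache:
    """Sketch of selective cache bypassing: a block is admitted into the
    cache only on its second reference within the history window."""

    def __init__(self, capacity, history_capacity=None):
        self.capacity = capacity
        # Basic configuration: history buffer sized like the cache.
        self.history_capacity = history_capacity or capacity
        self.cache = OrderedDict()    # block id -> block (4KB data in a real system)
        self.history = OrderedDict()  # block id -> None (metadata only)

    def reference(self, block):
        if block in self.cache:                  # cache hit: refresh LRU position
            self.cache.move_to_end(block)
            return "hit"
        if block in self.history:                # second reference: admit to cache
            del self.history[block]
            self.cache[block] = None
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)   # evict LRU block
            return "miss-admit"
        # First reference: bypass the cache, only remember it in the history.
        self.history[block] = None
        if len(self.history) > self.history_capacity:
            self.history.popitem(last=False)     # forget oldest history entry
        return "miss-bypass"
```

A sequential scan thus never pollutes the cache (every access is a "miss-bypass"), while a re-referenced block pays one extra storage access ("miss-admit") before all further accesses become hits.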
To quantify the effectiveness of our selective cache bypassing scheme, we analyze its benefit and cost in terms of time as follows:

Benefit = Tcaching * R
Cost = Tmiss_penalty * (1 - R)

where R represents the ratio of single-accessed blocks, Tcaching the additional time for storing a block in the cache, and Tmiss_penalty the time to retrieve a block from secondary storage. The final gain of our scheme is then Gain = Benefit - Cost. This equation shows that our scheme gains when the benefit from saving the caching cost of single-accessed blocks is larger than the cost of the additional miss penalty caused by bypassing the first access. If the storage access overhead Tmiss_penalty is significantly larger than the caching time cost Tcaching, as with hard disks, our scheme does not gain unless the ratio of single-accessed blocks R is very high. However, if the storage is fast enough, meaning that Tcaching and Tmiss_penalty are similar, our scheme gains even when the portion of single-accessed blocks is relatively small. For example, when the secondary storage access is 1.3 times slower than the buffer cache, as with PCM, our scheme always gains when the ratio of single-accessed blocks exceeds just 23%. Since our hit count analysis (Figure 3) indicates that the ratio of single-accessed blocks is larger than 50% under all practical conditions, we can conclude that our selective bypassing will be beneficial in PCM storage systems. Furthermore, the actual gain will be larger than this analysis suggests, as it does not account for the additional cache space freed by bypassing.

B. Considering write performance of PCM

Another important issue in applying our scheme to PCM is that PCM has asymmetric read and write operation times. A write operation is known to be about 8-10 times slower than a read operation, as shown in Table 1 [14].
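The Gain analysis reduces to a simple break-even ratio: bypassing wins when R exceeds Tmiss_penalty / (Tcaching + Tmiss_penalty). The following is a minimal sketch of that algebra; the example timing values are our own illustrative assumptions chosen to land near the 23% read threshold above, not measured quantities:

```python
def bypass_breakeven_ratio(t_caching, t_miss_penalty):
    """Minimum ratio R of single-accessed blocks for bypassing to gain:

        Gain = t_caching * R - t_miss_penalty * (1 - R) > 0
          <=> R > t_miss_penalty / (t_caching + t_miss_penalty)
    """
    return t_miss_penalty / (t_caching + t_miss_penalty)

# Hypothetical timings: if the extra miss penalty of a bypassed block is
# about 0.3x the caching copy cost (fast, PCM-like storage), the
# threshold is roughly 23%; if it is 3x (an expensive miss, as with a
# slow PCM write), the threshold rises to 75%.
print(round(bypass_breakeven_ratio(1.0, 0.3), 2))
print(round(bypass_breakeven_ratio(1.0, 3.0), 2))
```

Note how the threshold grows toward 1 as the miss penalty dominates the caching cost, which is why bypassing never pays off on hard disks unless nearly every block is single-accessed.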
Due to this feature of PCM, our scheme may not be efficient for write operations, because bypassing a write request incurs an expensive write I/O to PCM. When we apply the same Gain expression to write operations, bypassing becomes beneficial only when the ratio of single-accessed blocks is larger than 75% of the total write operations. This is a very tight condition for bypassing to be effective; in other words, the expected cost is likely to be larger than the expected benefit. Considering this, applying the bypassing scheme to write operations seems impractical. Thus, we basically apply the bypassing scheme to read operations only, to avoid possible performance degradation due to the write overhead of PCM. However, as we will discuss in the next section, empirical results show that bypassing write operations also gains in many cases. The reason is that most write references are made only once. In our traces, more than 75% of writes are single-accessed writes for practical cache sizes in the proxy and varmail workloads. On the other hand, we

cannot observe that bypassing write operations incurs serious performance degradation in the other traces. The reason is that write requests to a block usually follow a read request to the same block. This implies that write operations are rarely the first reference to a block. Thus, even if we apply the bypassing scheme to write operations, bypassing does not happen frequently. In addition, the effect of bypassing write operations is not significant, as most workloads are read-intensive. Among the workloads we used, only one workload, proxy, is write-intensive, and the ratio of read operations is significantly large in the other three workloads, 7x to 37x of write operations.

TABLE 1. DRAM AND PCM PERFORMANCE CHARACTERISTICS
                DRAM    PCM
Read Latency    50ns    50ns
Write Latency   50ns    400~500ns

IV. EXPERIMENTAL RESULTS

In this section, we present performance evaluation results to assess the effectiveness of the selective cache bypassing scheme. Trace-driven simulation is performed to model the buffer caching system with accurate I/O timing of PCM, including software overheads. We collected system call traces with the strace utility while running Filebench applications [18]. Our traces consist of four workloads: proxy server, varmail, web server, and video server. The characteristics of these traces are summarized in Table 2. The performance of the buffer caching schemes is measured by the total I/O time for the given workloads. We compare the performance of our selective cache bypassing scheme with the conventional no-bypassing scheme. The cache replacement policy is LRU in all of our experiments. Figure 5 shows the total I/O time for the two schemes, with the cache size ranging from 0.1 to 1.0 of the maximum cache usage of the program. A cache size of 1.0 means that all block references in the trace can be cached at the same time, and thus no cache replacement is needed.
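The core of such a trace-driven simulation can be sketched in a few lines. The function and the timing values below are illustrative assumptions (all times normalized to the cache hit time, with a 1.3x bypassed storage access and a 1.8x miss penalty consistent with the model in Section II); the actual simulator additionally models the asymmetric write cost and software overheads:

```python
from collections import OrderedDict

def total_io_time(trace, capacity, t=1.0, T=1.3, Tm=1.8, bypass=False):
    """Total I/O time of an LRU buffer cache over a block trace.

    t  : cache hit time
    T  : storage access bypassing the cache
    Tm : miss penalty including the cache-upload copy
    With bypass=True, the first reference to a block is served from
    storage and only recorded in a history buffer of equal capacity.
    """
    cache, history = OrderedDict(), OrderedDict()
    total = 0.0
    for block in trace:
        if block in cache:                       # cache hit
            cache.move_to_end(block)
            total += t
        elif bypass and block not in history:    # first reference: bypass
            history[block] = None
            if len(history) > capacity:
                history.popitem(last=False)
            total += T
        else:                                    # miss: admit into the cache
            history.pop(block, None)
            cache[block] = None
            if len(cache) > capacity:
                cache.popitem(last=False)
            total += Tm
    return total
```

For a purely sequential trace the bypassing variant avoids every useless cache fill, while for a heavily re-referenced trace it pays one extra storage access per admitted block, which is exactly the trade-off the Gain analysis captures.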
In practical terms, cache sizes smaller than 0.5 represent most real system situations.

TABLE 2. SUMMARY OF WORKLOAD CHARACTERISTICS
Workload      | Total # of distinct block requests | Ratio of ops. (read:write) | Operation counts
proxy         | 11,085                             | 1:2.24                     | 1,461,219
varmail       | 9,…                                | …:1                        | 213,227
web server    | 28,…                               | …:1                        | 1,321,168
video server  | 13,…                               | …:1                        | 4,940,520

As shown in Figure 5, our selective bypassing scheme performs better than the conventional no-bypassing scheme in most cases; the performance gain is 11% on average and up to 36%. In particular, our selective bypassing scheme exhibits excellent performance in the cache size range from 0.1 to 0.5, which represents realistic system environments. With a small cache, since the time that blocks reside in the cache is short, blocks are more likely to be evicted without a hit. In this situation, our bypassing scheme performs well by filtering out the large number of zero-hit blocks, saving a substantial amount of the memory access cost of storing them in the cache. In addition, the bypassing scheme effectively increases the cache size because it reduces the I/O requests entering the cache and prevents the cache from being polluted by sequential accesses. This gain is larger for smaller caches, which cannot fully accommodate the working set of the program. Note that the marginal performance gain per unit of additional cache is very large when the workload suffers from short cache capacity. Specifically, our selective bypassing scheme provides the same performance as the no-bypassing scheme with a several times smaller cache. On the other hand, when the cache size grows beyond 0.5, the performance gap between the two schemes narrows, and their performance finally converges for all traces. The reason is that most references can be accommodated in a cache of size 0.5 irrespective of the cache management scheme.
Figure 5. Total I/O time when storage access is 1.3 times longer than the cache

Now, let us investigate the performance of read/write-aware bypassing schemes for PCM, which has asymmetric read and write costs. We measure the total I/O time for three schemes: R-bypassing, which bypasses read operations only; RW-bypassing, which bypasses both read and write operations; and no-bypassing. The write time of PCM is set to ten times the read time. Figure 6 shows that the R-bypass and RW-bypass schemes perform better than the conventional no-bypassing scheme by 23% and 20% on average, and by up to 75% and 74%, respectively. Now, let us compare the performance of the R-bypass and RW-bypass schemes. We expected R-bypass to perform better than RW-bypass in most cases, but the result is against our expectation. Specifically, RW-bypassing outperforms R-bypassing by up to 13% and 16% in the proxy and varmail workloads, respectively. The reason for this improvement is that those traces include a considerable number of single-accessed writes. To investigate this, we extract the write operations from the workloads and analyze the ratio of single-accessed (zero-hit) blocks in the cache for the write workloads. As seen in Figure 7, the ratio of single-accessed blocks is mostly over 75% in the cases where RW-bypassing performs better than R-bypassing, which is the lower bound that makes write bypassing beneficial. In summary, though write bypassing has a relatively small benefit compared to the cost of the miss penalty, it can enhance performance when a large number of write references are single-referenced. In the web server trace, the R-bypass and RW-bypass schemes exhibit almost identical performance, because the web server trace is read-intensive: read operations are 37 times more frequent than write operations. Thus, write bypassing does not significantly affect the overall I/O performance. The video server trace includes many small loops and has relatively fewer single-accessed blocks than the other traces. For this reason, its performance improvement from bypassing is the smallest among the traces; since the ratio of zero-hit write blocks in the cache is not high, RW-bypass performs worse than R-bypass. As the performance effect of our bypassing scheme varies dynamically with the workload pattern, the cache capacity, and the storage access time, it is necessary to monitor such changes and apply the bypassing scheme adaptively.
We leave this adaptive scheme as future work. Before concluding this section, we briefly discuss the write endurance problem of PCM, which is a challenging issue in using PCM as a practical storage system. Since the maximum number of writes allowed for each PCM cell is limited to 10^7~10^8, the research community has studied reducing the amount of data written to PCM and balancing the write count across PCM cells to extend the overall lifetime of PCM. Though this paper does not focus on this issue, we investigate the effect of our scheme on the write traffic to PCM. Figure 8 shows the total amount written to PCM with our bypassing schemes and the conventional no-bypassing scheme. As seen in the figure, our bypassing schemes reduce writes to PCM significantly compared to the conventional no-bypassing scheme in most cases. Comparing the R-bypass and RW-bypass schemes, R-bypass is superior to RW-bypass in terms of PCM writes in all cases, because write bypassing incurs additional writes to PCM. When the cache size is very large, RW-bypass increases the number of writes compared to no-bypassing, but such cache configurations are not realistic. Unlike RW-bypass, R-bypass always reduces the number of PCM writes irrespective of the cache size and workload.

Figure 6. Performance of bypass cache algorithms considering asymmetric operation cost
Figure 7. The ratio of zero-hit blocks in cache for write workloads

Figure 8. Write amount to PCM storage for three buffer cache algorithms

V. RELATED WORK

A. Phase Change Memory Technology

In this section, we describe the feasibility of PCM-based storage systems. Although several challenges remain to be resolved, such as stability, it is expected that PCM will be used as main memory and/or secondary storage in the near future. This expectation is based on the fast progress in density and the significant benefits in power consumption and performance of replacing DRAM and/or hard disks with PCM. In terms of density, PCM is anticipated to outperform DRAM, and even NAND flash memory. Specifically, DRAM is considered hard to fabricate beyond 40nm, and NAND flash memory has almost reached its scalability limit because it relies on a memory structure that is increasingly difficult to shrink at smaller lithography nodes. In contrast, although current PCM technology is only at 90nm, PCM has already been shown to have stable characteristics down to a 5nm node [2]. Therefore, PCM is considered to promise scalability beyond that of other memory technologies such as NAND or NOR flash memory. In addition to progress in the micro-fabrication process, multi-level cell (MLC) technology is accelerating PCM's density progress. Although most PCM prototypes are produced as single-level cells (SLC), which distinguish only the two states of crystalline and amorphous, recent studies have demonstrated additional intermediate states that enable MLC [3, 4]. MLC stores multiple bits per cell by distinguishing between multiple resistance levels. Due to MLC technology, PCM can provide an order of magnitude higher scalability than other nonvolatile RAMs such as FeRAM and MRAM, which are structurally hard to extend to MLC. For this reason, major semiconductor manufacturers, including Samsung and Intel, have an optimistic outlook on the potential of PCM technology.
Using PCM instead of DRAM or hard disks can bring a large saving in energy consumption as well. The fact that PCM requires only 1/100 of the energy consumption of hard disks is another momentum toward PCM-based storage. Several recent studies consider a hybrid main memory with a large PCM and a small amount of DRAM, trading off performance and energy efficiency [11, 13]. In conclusion, many previous studies have demonstrated that PCM has the potential to be used as large-scale storage. These expectations indicate that designing efficient software for emerging PCM-based storage systems is also required.

B. Software techniques for PCM

Recently, considerable research has attempted to introduce PCM into the conventional memory hierarchy. Most approaches exploit PCM as main memory, replacing or supplementing DRAM to improve performance and energy efficiency. The key issue in these works is how to handle writes efficiently to overcome the long write latency and limited endurance of PCM. Mogul et al. suggested an efficient memory management policy for a hybrid memory system consisting of DRAM and PCM. They proposed a page-attribute-aware memory allocation policy that places read-only pages, such as code segments, in PCM, while loading read/write pages in DRAM so as to keep writes from occurring in PCM [11]. Qureshi et al. also proposed a PCM- and DRAM-based hybrid main memory. They use a small amount of DRAM as a write buffer for PCM to mitigate wear-out and hide the long write latency of PCM [13]. Lee et al. attempt to improve write performance between the last-level on-chip cache and main memory. They proposed two policies, buffer reorganization and partial writes, which track data modifications and write only modified cache lines or words to the PCM array [14, 10]. There have also been studies that aim to overcome the endurance limitation when using PCM as a write-intensive main memory.
Zhou et al. have suggested row shifting and segment swapping techniques to mitigate wear and prolong the lifetime of PCM [12]. Ipek et al. have proposed a dynamically replicated memory scheme that maps two faulty physical pages onto a single logical page, thereby enabling the reuse of pages that contain hard faults [9]. Meanwhile, some file systems that use non-volatile RAM like PCM as final data storage have been suggested. These file systems mostly discuss efficient ways of using a small amount of non-volatile RAM. PRAMFS is designed to store frequently accessed or critical data in a non-volatile RAM block, so as to enable fast reboot and system survival from crashes [8]. It is mounted on a block of non-volatile RAM separate from normal system memory. MRAMFS [16] and the NEB file system [7] have been suggested to improve the space efficiency of expensive storage based on non-volatile RAM. MRAMFS saves space by applying compression to metadata, while the NEB file system improves space efficiency by using extent-based file management. In contrast, as the density of non-volatile RAM like PCM progresses rapidly, recent studies consider a scalable PCM-based storage system as a replacement for conventional storage devices like hard disks or flash memory. Some studies expect fast and scalable non-volatile RAM to bring about a unified memory architecture where main memory and storage are served by a single memory device. Baek et al. have designed and implemented a software layer that supports both file objects and memory objects for a unified memory system [17]. Moreover, studies on file systems that consider reliability as well as performance have been conducted. Condit et al. have suggested BPFS, a new copy-on-write file system for byte-addressable storage [6]. By exploiting byte-addressability, BPFS performs in-place writes when the update size is smaller than the atomic operation unit, thereby breaking the recursive path-node updates of copy-on-write behavior. Venkataraman et al. have suggested a novel data structure to store data quickly and efficiently in an NVM-based single-level store [5].

VI. CONCLUSION

As high-performance storage such as PCM emerges, the effectiveness of the traditional buffer cache should be reinvestigated. This paper showed that the buffer cache is still effective even when the storage is nearly as fast as main memory, due to software overhead. However, since the gain of caching becomes small, caching is beneficial only for blocks that will be frequently hit in the cache. We observed that a large portion of cached blocks are never hit, and that the hit counts of blocks in the buffer cache exhibit a bimodal distribution: no hits or many hits.
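A bimodal hit distribution suggests deferring cache insertion until a block demonstrates reuse. The following is a minimal sketch, under our own assumptions and with hypothetical names, of a cache that bypasses a block on its first access and inserts it only when the block is re-referenced within a recent-access window:

```python
import collections

# Sketch (hypothetical names) of a bypass-on-first-access buffer cache:
# a block is admitted only if it is re-referenced within a window of
# recent first accesses, using reuse as a predictor of future hits.

class BypassCache:
    def __init__(self, capacity: int, window: int):
        self.capacity = capacity
        self.window = window                       # how long a first access is remembered
        self.cache = collections.OrderedDict()     # cached blocks, LRU order
        self.seen = collections.OrderedDict()      # bypassed block -> access time
        self.clock = 0

    def access(self, block) -> bool:
        """Process one block reference; return True on a cache hit."""
        self.clock += 1
        if block in self.cache:
            self.cache.move_to_end(block)          # refresh LRU position
            return True
        # Expire first-access records older than the window.
        while self.seen and self.clock - next(iter(self.seen.values())) > self.window:
            self.seen.popitem(last=False)
        if block in self.seen:                     # re-reference: admit to cache
            del self.seen[block]
            self.cache[block] = None
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)     # evict LRU block
        else:                                      # first access: bypass the cache
            self.seen[block] = self.clock
        return False

c = BypassCache(capacity=2, window=10)
assert c.access("x") is False   # first access: bypassed, not cached
assert c.access("x") is False   # re-reference: admitted, but still a miss
assert c.access("x") is True    # subsequent access hits the cache
```

Single-use blocks under this policy never enter the cache at all, so they cannot evict blocks from the many-hits mode of the distribution.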
Based on this observation, we presented a new buffer cache management scheme called selective cache bypassing, which does not cache a block on its first access but caches it when it is re-referenced within a time window, regarding re-access as an indicator of many future cache hits. Experimental results showed that our scheme outperforms the conventional no-bypassing scheme in a PCM storage system by 23% on average and up to 75%.

ACKNOWLEDGMENT

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (No and No ).

REFERENCES

[1] R. F. Freitas and W. W. Wilcke, "Storage-class memory: the next storage system technology," IBM Journal of Research and Development, Vol. 52, No. 4.
[2] C. D. Wright, M. M. Aziz, M. Armand, S. Senkader, and W. Yu, "Can We Reach Tbit/sq.in. Storage Densities With Phase-Change Media?" European Phase Change and Ovonics Symposium (EPCOS).
[3] F. Bedeschi et al., "A multi-level-cell bipolar-selected phase-change memory," International Solid-State Circuits Conf.
[4] T. Nirschl et al., "Write strategies for 2 and 4-bit multi-level phase-change memory," International Electron Devices Meeting.
[5] S. Venkataraman et al., "Consistent and Durable Data Structures for Non-Volatile Byte-Addressable Memory," 9th USENIX Conference on File and Storage Technologies (FAST).
[6] J. Condit, E. B. Nightingale, C. Frost, E. Ipek, B. Lee, D. Burger, and D. Coetzee, "Better I/O through byte-addressable, persistent memory," ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP).
[7] S. Baek, C. Hyun, J. Choi, D. Lee, and S. H. Noh, "Design and Analysis of a Space Conscious Nonvolatile-RAM File System," IEEE Region 10 Conference (TENCON).
[8] PRAMFS:
[9] E. Ipek, J. Condit, E. B. Nightingale, D. Burger, and T. Moscibroda, "Dynamically Replicated Memory: Building Reliable Systems from Nanoscale Resistive Memories," Architectural Support for Programming Languages and Operating Systems (ASPLOS).
[10] B. C. Lee, E. Ipek, O. Mutlu, and D. Burger, "Phase Change Memory Architecture and the Quest for Scalability," Communications of the ACM, Vol. 53, Issue 7.
[11] J. C. Mogul, E. Argollo, M. Shah, and P. Faraboschi, "Operating system support for NVM+DRAM hybrid main memory," 12th Workshop on Hot Topics in Operating Systems (HotOS XII).
[12] P. Zhou, B. Zhao, J. Yang, and Y. Zhang, "A durable and energy efficient main memory using phase change memory technology," 36th International Symposium on Computer Architecture (ISCA).
[13] M. K. Qureshi, V. Srinivasan, and J. A. Rivers, "Scalable high performance main memory system using phase-change memory technology," 36th International Symposium on Computer Architecture (ISCA).
[14] B. C. Lee, E. Ipek, O. Mutlu, and D. Burger, "Architecting Phase Change Memory as a Scalable DRAM Alternative," 36th International Symposium on Computer Architecture (ISCA).
[15] Numonyx, "Phase Change Memory: A new memory to enable new memory usage models," White paper.
[16] N. K. Edel, D. Tuteja, E. L. Miller, and S. A. Brandt, "MRAMFS: A Compressing File System for Non-Volatile RAM," 12th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems.
[17] S. Baek, K. Sun, J. Choi, E. Kim, D. Lee, and S. H. Noh, "Taking advantage of storage class memory technology through system software support," Workshop on the Interaction between Operating Systems and Computer Architecture (WIOSCA).
[18]
[19] S. Lee, H. Bahn, and S. H. Noh, "Characterizing Memory Write References for Efficient Management of Hybrid PCM and DRAM Memory," 19th IEEE Int'l Symp. on Modeling, Analysis, and Simulation of Computer and Telecomm. Systems (MASCOTS).


More information

SF-LRU Cache Replacement Algorithm

SF-LRU Cache Replacement Algorithm SF-LRU Cache Replacement Algorithm Jaafar Alghazo, Adil Akaaboune, Nazeih Botros Southern Illinois University at Carbondale Department of Electrical and Computer Engineering Carbondale, IL 6291 alghazo@siu.edu,

More information

Optimizing Translation Information Management in NAND Flash Memory Storage Systems

Optimizing Translation Information Management in NAND Flash Memory Storage Systems Optimizing Translation Information Management in NAND Flash Memory Storage Systems Qi Zhang 1, Xuandong Li 1, Linzhang Wang 1, Tian Zhang 1 Yi Wang 2 and Zili Shao 2 1 State Key Laboratory for Novel Software

More information

Middleware and Flash Translation Layer Co-Design for the Performance Boost of Solid-State Drives

Middleware and Flash Translation Layer Co-Design for the Performance Boost of Solid-State Drives Middleware and Flash Translation Layer Co-Design for the Performance Boost of Solid-State Drives Chao Sun 1, Asuka Arakawa 1, Ayumi Soga 1, Chihiro Matsui 1 and Ken Takeuchi 1 1 Chuo University Santa Clara,

More information

!! What is virtual memory and when is it useful? !! What is demand paging? !! When should pages in memory be replaced?

!! What is virtual memory and when is it useful? !! What is demand paging? !! When should pages in memory be replaced? Chapter 10: Virtual Memory Questions? CSCI [4 6] 730 Operating Systems Virtual Memory!! What is virtual memory and when is it useful?!! What is demand paging?!! When should pages in memory be replaced?!!

More information

Exploring the Potential of Phase Change Memories as an Alternative to DRAM Technology

Exploring the Potential of Phase Change Memories as an Alternative to DRAM Technology Exploring the Potential of Phase Change Memories as an Alternative to DRAM Technology Venkataraman Krishnaswami, Venkatasubramanian Viswanathan Abstract Scalability poses a severe threat to the existing

More information

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang CHAPTER 6 Memory 6.1 Memory 341 6.2 Types of Memory 341 6.3 The Memory Hierarchy 343 6.3.1 Locality of Reference 346 6.4 Cache Memory 347 6.4.1 Cache Mapping Schemes 349 6.4.2 Replacement Policies 365

More information

A Caching-Oriented FTL Design for Multi-Chipped Solid-State Disks. Yuan-Hao Chang, Wei-Lun Lu, Po-Chun Huang, Lue-Jane Lee, and Tei-Wei Kuo

A Caching-Oriented FTL Design for Multi-Chipped Solid-State Disks. Yuan-Hao Chang, Wei-Lun Lu, Po-Chun Huang, Lue-Jane Lee, and Tei-Wei Kuo A Caching-Oriented FTL Design for Multi-Chipped Solid-State Disks Yuan-Hao Chang, Wei-Lun Lu, Po-Chun Huang, Lue-Jane Lee, and Tei-Wei Kuo 1 June 4, 2011 2 Outline Introduction System Architecture A Multi-Chipped

More information

Chapter 6 Memory 11/3/2015. Chapter 6 Objectives. 6.2 Types of Memory. 6.1 Introduction

Chapter 6 Memory 11/3/2015. Chapter 6 Objectives. 6.2 Types of Memory. 6.1 Introduction Chapter 6 Objectives Chapter 6 Memory Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured.

More information

The Need for Consistent IO Speed in the Financial Services Industry. Silverton Consulting, Inc. StorInt Briefing

The Need for Consistent IO Speed in the Financial Services Industry. Silverton Consulting, Inc. StorInt Briefing The Need for Consistent IO Speed in the Financial Services Industry Silverton Consulting, Inc. StorInt Briefing THE NEED FOR CONSISTENT IO SPEED IN THE FINANCIAL SERVICES INDUSTRY PAGE 2 OF 5 Introduction

More information

Evaluating Phase Change Memory for Enterprise Storage Systems

Evaluating Phase Change Memory for Enterprise Storage Systems Hyojun Kim Evaluating Phase Change Memory for Enterprise Storage Systems IBM Almaden Research Micron provided a prototype SSD built with 45 nm 1 Gbit Phase Change Memory Measurement study Performance Characteris?cs

More information

NAND Flash-based Storage. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

NAND Flash-based Storage. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University NAND Flash-based Storage Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Today s Topics NAND flash memory Flash Translation Layer (FTL) OS implications

More information

Solid State Drive (SSD) Cache:

Solid State Drive (SSD) Cache: Solid State Drive (SSD) Cache: Enhancing Storage System Performance Application Notes Version: 1.2 Abstract: This application note introduces Storageflex HA3969 s Solid State Drive (SSD) Cache technology

More information

Design Considerations for Using Flash Memory for Caching

Design Considerations for Using Flash Memory for Caching Design Considerations for Using Flash Memory for Caching Edi Shmueli, IBM XIV Storage Systems edi@il.ibm.com Santa Clara, CA August 2010 1 Solid-State Storage In a few decades solid-state storage will

More information

Page 1. Multilevel Memories (Improving performance using a little cash )

Page 1. Multilevel Memories (Improving performance using a little cash ) Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency

More information

Phase Change Memory: Replacement or Transformational

Phase Change Memory: Replacement or Transformational Phase Change Memory: Replacement or Transformational Hsiang-Lan Lung Macronix International Co., Ltd IBM/Macronix PCM Joint Project LETI 4th Workshop on Inovative Memory Technologies 06/21/2012 PCM is

More information

Parallelizing Inline Data Reduction Operations for Primary Storage Systems

Parallelizing Inline Data Reduction Operations for Primary Storage Systems Parallelizing Inline Data Reduction Operations for Primary Storage Systems Jeonghyeon Ma ( ) and Chanik Park Department of Computer Science and Engineering, POSTECH, Pohang, South Korea {doitnow0415,cipark}@postech.ac.kr

More information

Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency

Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency Yijie Huangfu and Wei Zhang Department of Electrical and Computer Engineering Virginia Commonwealth University {huangfuy2,wzhang4}@vcu.edu

More information

LevelDB-Raw: Eliminating File System Overhead for Optimizing Performance of LevelDB Engine

LevelDB-Raw: Eliminating File System Overhead for Optimizing Performance of LevelDB Engine 777 LevelDB-Raw: Eliminating File System Overhead for Optimizing Performance of LevelDB Engine Hak-Su Lim and Jin-Soo Kim *College of Info. & Comm. Engineering, Sungkyunkwan University, Korea {haksu.lim,

More information

Strata: A Cross Media File System. Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, Thomas Anderson

Strata: A Cross Media File System. Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, Thomas Anderson A Cross Media File System Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, Thomas Anderson 1 Let s build a fast server NoSQL store, Database, File server, Mail server Requirements

More information

CS356: Discussion #9 Memory Hierarchy and Caches. Marco Paolieri Illustrations from CS:APP3e textbook

CS356: Discussion #9 Memory Hierarchy and Caches. Marco Paolieri Illustrations from CS:APP3e textbook CS356: Discussion #9 Memory Hierarchy and Caches Marco Paolieri (paolieri@usc.edu) Illustrations from CS:APP3e textbook The Memory Hierarchy So far... We modeled the memory system as an abstract array

More information

1110 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 33, NO. 7, JULY 2014

1110 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 33, NO. 7, JULY 2014 1110 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 33, NO. 7, JULY 2014 Adaptive Paired Page Prebackup Scheme for MLC NAND Flash Memory Jaeil Lee and Dongkun Shin,

More information

Scalable Many-Core Memory Systems Lecture 3, Topic 2: Emerging Technologies and Hybrid Memories

Scalable Many-Core Memory Systems Lecture 3, Topic 2: Emerging Technologies and Hybrid Memories Scalable Many-Core Memory Systems Lecture 3, Topic 2: Emerging Technologies and Hybrid Memories Prof. Onur Mutlu http://www.ece.cmu.edu/~omutlu onur@cmu.edu HiPEAC ACACES Summer School 2013 July 17, 2013

More information

The Unwritten Contract of Solid State Drives

The Unwritten Contract of Solid State Drives The Unwritten Contract of Solid State Drives Jun He, Sudarsun Kannan, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau Department of Computer Sciences, University of Wisconsin - Madison Enterprise SSD

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

Operating Systems Design Exam 2 Review: Spring 2011

Operating Systems Design Exam 2 Review: Spring 2011 Operating Systems Design Exam 2 Review: Spring 2011 Paul Krzyzanowski pxk@cs.rutgers.edu 1 Question 1 CPU utilization tends to be lower when: a. There are more processes in memory. b. There are fewer processes

More information

William Stallings Computer Organization and Architecture 10 th Edition Pearson Education, Inc., Hoboken, NJ. All rights reserved.

William Stallings Computer Organization and Architecture 10 th Edition Pearson Education, Inc., Hoboken, NJ. All rights reserved. + William Stallings Computer Organization and Architecture 10 th Edition 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. 2 + Chapter 4 Cache Memory 3 Location Internal (e.g. processor registers,

More information