LRU-WSR: Integration of LRU and Writes Sequence Reordering for Flash Memory


Hoyoung Jung, Hyoki Shim, Sungmin Park, Sooyong Kang, Jaehyuk Cha

Abstract

Most mobile devices are equipped with NAND flash memory, even though it does not support in-place updates and has asymmetric I/O latencies among read, write, and erase operations: write and erase operations are much slower than read operations. For good overall performance of a flash memory system, the buffer replacement policy should take these severely asymmetric I/O latencies into account. However, the existing LRU buffer replacement algorithm does not. This paper proposes the LRU-WSR buffer replacement algorithm, which enhances LRU by reordering the writes of not-cold dirty pages from the buffer cache to flash storage. LRU-WSR focuses on reducing the number of write/erase operations while preventing serious degradation of the buffer hit ratio. The experimental results show that LRU-WSR outperforms other algorithms, including LRU, CF-LRU, and FAB.¹

Index Terms: flash memory, buffer replacement, storage system

I. INTRODUCTION

Flash memory has many attractive features, including low power consumption, shock resistance, low weight, high density, low noise, and high I/O performance. As its price decreases and its capacity increases, flash memory is widely used for storage in digital cameras, mobile phones, PDAs, and notebooks [1][2]. Emerging devices such as IP phones and home gateways are also seriously considering the adoption of flash memory. Various programs, including database and server applications, run on these devices, so the performance of flash memory storage is becoming more important. However, flash memory has several hardware limitations.
First, the unit of an erase operation is a block, which is a set of a fixed number of pages, whereas the unit of read/write operations is a page. Second, a page in flash memory cannot be rewritten in place. To update the data of a page, a system must therefore do one of the following: 1) write the data to a newly allocated page and invalidate the original page, or 2) write the data to the original page only after erasing the block containing that page. In the latter case, it is difficult to maintain data consistency. In the former case, reclaiming invalid pages for reading/writing requires erasing the blocks containing these pages. Third, the lifetime of flash memory is shorter than that of a hard disk or DRAM: only a limited number of erase operations can be performed safely on each memory cell, typically between 100,000 and 1,000,000 cycles. Finally, I/O latencies differ by operation type, i.e., read, write, and erase. The write operation is about 10 times slower than the read operation, and the erase operation is about 20 times slower than the write operation [3][4][5].

¹ This work was supported by grant No. R from the Basic Research Program of the Korea Science & Engineering Foundation.
Hoyoung Jung, Hyoki Shim, and Sungmin Park are with the Department of Electronics and Computer Engineering, Hanyang University, Seoul, Korea (horong@hanyang.ac.kr, dahlia@hanyang.ac.kr, syrilo@hanyang.ac.kr).
Sooyong Kang is with the Department of Computer Science Education, Hanyang University, Seoul, Korea (sykang@hanyang.ac.kr).
Jaehyuk Cha (corresponding author) is with the Department of Information and Communications, Hanyang University, Seoul, Korea (chajh@hanyang.ac.kr).
Contributed Paper. Manuscript received June 10.

Disk caching has been used for reducing disk I/O latency.
A buffer replacement algorithm for a disk tries to derive an optimal I/O sequence from the original I/O sequence by reducing the number of disk accesses. Least Recently Used (LRU) is one of the best-known buffer replacement algorithms for hard disks. As flash memory becomes an alternative to disks, flash caching is needed to reduce flash I/O latency. A buffer replacement algorithm for flash memory is similar to one for a disk, but it must additionally deal with the different latencies of read, write, and erase operations. It tries to derive the optimal I/O sequence by reducing the number of accesses selectively, according to the kind of I/O operation. Since LRU ignores the severely asymmetric I/O latencies, it performs worse on flash memory than on a hard disk. In addition, the I/O sequence generated by a buffer replacement algorithm for flash consists of read/write operations only, since erase operations are controlled directly by the layer beneath the buffer layer. Fortunately, the number of write requests from the buffer management layer is usually proportional to the number of physical writes and erases on the flash. Therefore, we focus on finding an algorithm that minimizes both the number of write requests and the loss of hit ratio when generating an I/O sequence from a given I/O sequence.

For flash memory, this paper proposes an efficient buffer replacement algorithm, LRU-WSR, which enhances the existing LRU buffer replacement algorithm with an add-on buffer replacement strategy called Write Sequence Reordering (WSR). WSR reorders the writing of not-cold dirty pages from the buffer cache to flash storage to reduce the number of write operations while preventing excessive degradation of the hit ratio.
For seamless integration of LRU and WSR, we have modified all the steps of the LRU algorithm while maintaining the advantages of that algorithm. The algorithm is also designed to minimize both the temporal and spatial overheads required to achieve this goal. Our simulation results and the integration into the PostgreSQL RDBMS [6] show that LRU-WSR effectively reduces the number of physical page writes and page erases and consequently outperforms other algorithms. Section 2 introduces related work. Section 3 describes in detail LRU-WSR, an efficient buffer replacement algorithm that enhances LRU with WSR. In Section 4, trace-driven simulation results show that our algorithm is superior to existing algorithms such as LRU, ARC, and even CF-LRU on flash memory. Section 5 concludes.

IEEE Transactions on Consumer Electronics, Vol. 54, No. 3, AUGUST 2008

II. RELATED WORKS

A. Flash Memory

Flash memory is a type of EEPROM. It is nonvolatile, that is, it retains data without power. There are two types of flash memory, NAND and NOR. Table I compares their characteristics.

TABLE I
CHARACTERISTICS OF FLASH MEMORY [7]
Device | Current (mA): Idle | Current (mA): Active | Access time (4 kB): Read | Write | Erase
NOR | | | us | 28 ms | 1.2 sec
NAND | | | 25 us | 250 us | 2 ms

The read latency of NOR is slightly lower than that of NAND, but its write and erase latencies are much higher. The NAND architecture offers extremely high cell densities and a high capacity. NOR flash is typically used for code storage and execution, NAND for data storage [8]. NAND flash memory supports page I/O, and its write latency is about 10 times higher than its read latency, as Table I shows. Read and write operations are performed in units of pages, which are usually 512 bytes in size. Erase operations are performed on blocks, which consist of 32 pages (16 KB) each. Because of these features, a flash memory storage architecture needs a block mapping structure to use flash memory as a block device (like a magnetic disk). Various mapping techniques support flash block devices.
FTL (Flash Translation Layer) is one of these techniques; it stores part of the map on the flash device itself, reducing the cost of map updates. FTL keeps the mapping table in SRAM for fast address translation and also performs garbage collection and bad block management. Fig. 1 shows the architecture of a NAND flash storage system using FTL [9]. As Fig. 1 shows, the file system regards flash memory storage as a block device. Page rewrites and in-place updates can be done logically at the file system layer. However, rewritten pages with the same logical address are physically written to different pages, or even different blocks. Thus, reducing the number of page rewrites at the file system layer reduces the number of physical write and erase operations. This both improves file system performance and lengthens the lifetime of the flash memory.

Fig. 1. Architecture of NAND Flash Storage System

B. Buffer Replacement Algorithm for Flash Memory

The buffer cache used in operating systems keeps a subset of disk blocks in memory to reduce the number of physical I/O requests. Because the buffer cache is much smaller than the disk, various buffer replacement algorithms have been developed to increase I/O performance [10][11][12][13][14][15]. However, existing buffer replacement algorithms are designed only to maximize the page hit ratio, and they treat the costs of page reads and writes as equal. Because the write cost of evicting a dirty (modified) page is much higher than the read cost in flash memory, these algorithms may not maximize flash I/O performance.

In [3], a new buffer replacement algorithm called CF-LRU (Clean First LRU) was proposed. CF-LRU is a flash memory-aware page replacement algorithm that considers the different execution times of reading and writing.

Fig. 2. CF-LRU page replacement example [3]

Suppose pages were recently accessed in the order E, D, C, B, A, as illustrated in Fig. 2 (so that A is the most recently used clean page and E is the least recently used dirty page). Under the LRU page replacement algorithm, the sequence of victim pages is E, D, C, B, always evicting the least recently used page first. When NAND flash memory stores the victim page data, however, it may be advantageous to first evict the clean page D to reduce the number of flash write operations, even though that page was accessed more recently than the dirty page E.
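The clean-first choice in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from [3]: the function name `cflru_victim`, the `window` parameter (how many of the least recently used pages are searched for a clean victim), and the dirty flags given to pages C and B are hypothetical.

```python
from collections import OrderedDict


def cflru_victim(pages, window):
    """Return the id of the page to evict.

    pages:  OrderedDict mapping page id -> dirty flag, least recently used first.
    window: number of least recently used pages searched for a clean victim
            (hypothetical parameter name).
    """
    lru_first = list(pages.items())
    # Prefer the least recently used clean page inside the window.
    for page_id, dirty in lru_first[:window]:
        if not dirty:
            return page_id
    # No clean page in the window: fall back to plain LRU.
    return lru_first[0][0]


# Access order E, D, C, B, A, so E is least recently used.
# E is dirty and D, A are clean as in the text; the flags of C and B are assumed.
pages = OrderedDict([("E", True), ("D", False), ("C", True), ("B", False), ("A", False)])
print(cflru_victim(pages, window=2))  # D: the clean page is evicted before dirty E
print(cflru_victim(pages, window=1))  # E: no clean page in the window, plain LRU
```

With a window of two pages the clean page D is evicted and no flash write is needed; with a window of one page the policy degenerates to LRU and the dirty page E must be flushed.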

As the page fault ratio may increase if a recently used clean page is evicted, only the clean pages within a predetermined window size (w) become candidate victims in CF-LRU. If the algorithm does not find a clean page within the window, it falls back to the normal LRU algorithm, in which the least recently used page becomes the victim whether it is dirty or not [3]. Although the hit ratio of CF-LRU may be lower than that of normal LRU, in many cases it reduces the numbers of write and erase operations more effectively. However, CF-LRU needs to determine w and is therefore difficult to adapt to tasks with varying workloads. CF-LRU also has a search overhead, as it must determine whether each page in the window is dirty. Above all, because it keeps both cold- and hot-write data, it sometimes performs more read operations than normal LRU, reducing performance. In particular, it needs an adaptive online algorithm to determine the window size and should apply hot-cold identification to avoid keeping cold-write pages in the buffer.

In [16], the authors suggested another buffer replacement algorithm called Flash-Aware Buffer (FAB) for portable media players equipped with flash memory. It reduces the number of erase operations by selecting a victim based on page utilization rather than on the LRU policy.

Fig. 3. The main data structure of FAB.

Fig. 3 shows the main data structure of FAB. FAB selects a victim block (not a victim page) by applying the following rules in order, and flushes all the pages in that block.
1. The block that has the largest number of pages is chosen as the victim block.
2. If multiple blocks are selected by the above rule, the block that has not been accessed for the longest time is chosen as the victim block (the one at the tail of the list in Fig. 3).
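The two FAB rules above can be sketched as follows. This is a minimal illustration under assumptions, not the data structure of [16]: the function name `fab_victim_block` and the representation of the block list as `(block_id, page_count)` tuples ordered from most to least recently accessed are hypothetical, and the list is assumed to be non-empty.

```python
def fab_victim_block(blocks):
    """Return the id of the victim block.

    blocks: non-empty list of (block_id, pages_in_buffer) tuples, ordered from
            most recently accessed (head) to least recently accessed (tail).
    """
    # Rule 1: the victim holds the largest number of buffered pages.
    most_pages = max(count for _, count in blocks)
    # Rule 2: among ties, scan from the tail so the block that has gone
    # unaccessed for the longest time wins.
    for block_id, count in reversed(blocks):
        if count == most_pages:
            return block_id


# Hypothetical buffer state: b1 and b7 both hold 4 pages, b7 is closer to the tail.
blocks = [("b3", 2), ("b1", 4), ("b7", 4), ("b5", 1)]
print(fab_victim_block(blocks))  # b7
```

All pages of the returned block would then be flushed together, which is what lets FAB write a whole flash block at once.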
Because FAB writes all of the pages of a victim block to a new block of flash memory, it does not need to copy valid pages to another block under FTL. As a result, FAB performs fewer erase operations than normal LRU. FAB performs well when most I/O requests are sequential (e.g., the I/O requests of a portable media player), because the erase operation is the slowest of all flash memory operations. However, when the I/O request pattern is random, its hit ratio may drop, lowering performance.

III. LRU-WSR

The Write Sequence Reordering (WSR) policy and the LRU-WSR algorithm are designed for the buffer cache of a flash memory-based storage system. The objective of LRU-WSR is to reduce the number of dirty pages flushed from the buffer into flash memory when page replacement occurs. To achieve this objective, it uses the following strategy: delay the eviction of dirty pages with high access frequency for as long as possible. With this strategy, the hit ratio of LRU-WSR may be lower than that of LRU, resulting in more physical page reads. However, the algorithm effectively reduces the number of page writes and erases. As a result, it increases the overall performance of the flash memory-based storage system.

A. WSR Policy

The runtime of flash memory under the buffer layer is simply

$$\mathrm{Runtime} = a \cdot RC + b \cdot WC + c \cdot EC + C \qquad (1)$$

a: missed read count, b: flushed write count, c: erase count
RC: read cost, WC: write cost, EC: erase cost, C: CPU cost

Note that a flash read operation occurs only on a buffer miss, and a flash write operation occurs only when a dirty page is chosen as the victim page by the buffer replacement policy. As mentioned in Section II.B, the erase count cannot be measured in the buffer layer, but it is generally proportional to the write count [18].
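As a worked instance of (1), the sketch below evaluates the runtime for two sets of operation counts. The default costs are the NAND latencies from Table I (read 25 us, write 250 us, erase 2 ms); the function name `estimated_runtime_us`, the parameter names, and all operation counts are hypothetical values chosen only for illustration.

```python
def estimated_runtime_us(missed_reads, flushed_writes, erases,
                         read_cost_us=25.0, write_cost_us=250.0,
                         erase_cost_us=2000.0, cpu_cost_us=0.0):
    """Evaluate (1): Runtime = a*RC + b*WC + c*EC + C, in microseconds."""
    return (missed_reads * read_cost_us
            + flushed_writes * write_cost_us
            + erases * erase_cost_us
            + cpu_cost_us)


# Hypothetical counts: a baseline policy versus one that accepts 100 extra
# buffer misses in exchange for 80 fewer flushes and 4 fewer erases.
baseline = estimated_runtime_us(missed_reads=1000, flushed_writes=200, erases=10)
reordered = estimated_runtime_us(missed_reads=1100, flushed_writes=120, erases=6)
print(baseline)   # 95000.0
print(reordered)  # 69500.0
```

Under these assumed counts the extra reads cost 2,500 us while the avoided writes and erases save 28,000 us, which is the kind of trade the WSR policy aims for.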
Thus (1) becomes:

$$\mathrm{Runtime} = a \cdot RC + b \cdot WC + C \qquad (2)$$

a: read count, b: write count
RC: read cost, WC: write cost including the erase cost, C: CPU cost

From (2) and Table I, it can be deduced that the write cost is much larger than the read cost. The exact ratio of read cost to write cost depends on the FTL algorithm used by the flash storage, but FTL algorithms are hidden, so the ratio can only be obtained experimentally. For example, the write cost of a mini SD card is 200 times the read cost in [18]. From this result, it is easy to see that the performance of flash storage will increase if the write count is reduced. Delaying the flushing of dirty pages in the buffer layer reduces the write count of flash memory. However, this policy may increase the read count because it lowers the hit ratio. Thus, to increase overall performance, the following must hold:

$$b \cdot WC - a \cdot RC > 0 \qquad (3)$$

where, in (3), a is the increase in read count and b is the reduction in write count.

That is, the benefit of reducing the write count should be greater than the cost of increasing the read count. In [3], the CF-LRU algorithm keeps dirty pages in the buffer without considering the access frequencies of these pages. As mentioned in the previous section, keeping dirty pages in the buffer may degrade overall performance because it lowers the hit ratio. To overcome this limitation of CF-LRU, we propose the Write Sequence Reordering (WSR) policy. The basic scheme of WSR is as follows:
1. Use a cold-detection algorithm to judge whether a page is cold or not.
2. Delay flushing dirty pages that are not regarded as cold.

For this purpose, a cold-detection algorithm is introduced. Its idea is similar to that of [17], but it is implemented more simply, using the data structure of the buffer replacement algorithm. Only a one-bit flag called the cold-flag is added to the page data for the cold-detection algorithm. When the buffer manager chooses a victim candidate page by its replacement algorithm, it examines whether the page is dirty. If the page is dirty and its cold-flag is not set, the page is regarded as a not-cold dirty page. The cold-flag of the page is then set, and the buffer manager tries to find another page as a victim. If the candidate is a clean page or a cold dirty page (a dirty page whose cold-flag is set), it is evicted from the buffer. In addition, the cold-flag of a dirty page is cleared when the page is referenced again.

WSR is a heuristic algorithm based on the second-chance algorithm [20], because it is very hard to determine theoretically whether evicting a given dirty page benefits performance; in other words, it is intractable to find the buffer sequence that maximizes the benefit (the left side of (3)). However, experiments show that WSR effectively reduces the page writes and erases of flash memory without much degradation of the hit ratio.

B. LRU-WSR

In this section, we propose an enhanced LRU algorithm called LRU-WSR. Using the cold-detection mechanism of WSR, LRU-WSR tries not to keep in the buffer cold dirty pages, which are less likely to be referenced soon. As Fig. 4 shows, LRU-WSR uses a page list and an additional flag, the cold flag. When a dirty page is chosen as a victim candidate, its cold flag is checked. If the flag is not set, the page is moved to the MRU position of the buffer list, its flag is set, and another candidate page is selected from the LRU position of the buffer list. If the candidate page is dirty and its cold flag is set, the page is regarded as a cold dirty page and is flushed into flash memory to avoid an excessive decrease in the hit ratio. If a candidate page is clean, it is selected as the victim regardless of the status of the cold flag. When a dirty page in the buffer list is referenced, the page is moved to the MRU position and its cold flag is cleared. Fig. 5 shows the victim selection algorithm of LRU-WSR. Since it uses only one additional binary variable for the cold flag, the overhead of the additional data structure is minimal. The other parts of the algorithm do not differ from the original LRU.

IV. PERFORMANCE EVALUATION

In this section, we compare the hit ratios, numbers of write operations, and runtimes of the buffer replacement algorithms on a NAND flash memory storage system. For the comparison, we conducted a trace-driven simulation using four kinds of traces that contain random, sequential, and looping patterns. The write locality of each trace also differs, for a more precise evaluation. The real performance of LRU-WSR in PostgreSQL under Linux is also evaluated in this section.

A. Simulation Workloads

We collected a trace of the Wisconsin Benchmark [21] on PostgreSQL running on the Linux operating system on a Samsung SMDK 2410 embedded board [22]. A K9S1208VOM SMC (smart media card) NAND flash memory [22] was used as the storage device. The access pattern of the trace data is shown in Fig. 6, and its characteristics are shown in Table II. As shown in Fig. 6, the trace contains most of the important access patterns, including random, sequential, and looping patterns. The locality expression p% / g% in Table II means that g% of the total number of accesses go to p% of the total number of pages. The table shows that the write locality is higher than the read locality under this workload.

TABLE II
CHARACTERISTICS OF POSTGRESQL TRACE DATA
File System | YAFFS
Physical Page Size | 512 Bytes
Logical Page Size | 4 KBytes
Total # of I/O Requests |
Total # of Page Writes | 5751 (11.08 %)
Read Locality | 30% / 70%
Write Locality | 15% / 85%

Fig. 6. Access pattern of the PostgreSQL trace

The GCC, Viewperf, and Cscope traces in Tables III-V were not generated from a DBMS, but they are used to compare the performance of the buffer replacement algorithms because each shows a different access pattern. These traces were obtained with the Linux strace utility [24], which intercepts the system calls of the traced process to record the I/O information. Tables III-V show their characteristics, and Figs. 7-9 show their access patterns. The GCC trace was obtained by building the Linux kernel. It performs long sequential writes and short iterative reads.

Fig. 4. LRU List of the LRU-WSR Algorithm

    L = buffer list of LRU
    victim = the page at LRU position in L
    while (victim is dirty):
        if (cold-flag of victim is set)
            exit while
        else
            move victim to MRU position in L
            set cold-flag of victim
            victim = the page at LRU position in L
    remove victim from L
    return victim

Fig. 5. Algorithm of the victim selection on LRU-WSR
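The victim selection procedure of Fig. 5 can be turned into a small runnable sketch. It is a minimal Python illustration under assumptions, not the authors' implementation: the names `Page`, `LruWsrBuffer`, `select_victim`, `access`, and `flushed` are hypothetical, the buffer capacity is assumed to be at least one page, and flushing a dirty victim to flash is modeled simply by recording its id.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Page:
    dirty: bool = False
    cold: bool = False  # the one-bit cold flag of WSR


class LruWsrBuffer:
    def __init__(self, capacity):
        self.capacity = capacity      # assumed >= 1
        self.pages = OrderedDict()    # LRU position first, MRU position last
        self.flushed = []             # ids of dirty pages written to flash

    def select_victim(self):
        """Evict one page following Fig. 5 and return its id."""
        while True:
            page_id, page = next(iter(self.pages.items()))  # LRU position
            if page.dirty and not page.cold:
                # Not-cold dirty page: set the flag and give it a second chance.
                page.cold = True
                self.pages.move_to_end(page_id)
                continue
            del self.pages[page_id]
            if page.dirty:
                self.flushed.append(page_id)  # cold dirty page goes to flash
            return page_id

    def access(self, page_id, write=False):
        page = self.pages.get(page_id)
        if page is None:                      # buffer miss
            if len(self.pages) >= self.capacity:
                self.select_victim()
            page = self.pages[page_id] = Page()
        else:                                 # buffer hit
            self.pages.move_to_end(page_id)   # move to MRU position
            page.cold = False                 # referenced again: not cold
        page.dirty = page.dirty or write
        return page


buf = LruWsrBuffer(capacity=3)
buf.access("A", write=True)
buf.access("B")
buf.access("C")
buf.access("D")          # A is dirty and not cold, so clean page B is evicted
print(list(buf.pages))   # ['C', 'A', 'D']
print(buf.flushed)       # []
```

In this run plain LRU would have flushed the dirty page A; here A only gets its cold flag set and moves to the MRU position. If A is not referenced again, it is flushed the next time it reaches the LRU position.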

TABLE III
CHARACTERISTICS OF GCC TRACE DATA
Application | GCC builds on Linux
Total # of I/O Requests |
Total # of Page Writes | (12.03 %)
Read Locality | 12% / 88%
Write Locality | 32% / 68%

Fig. 7. Access pattern of the GCC trace

TABLE IV
CHARACTERISTICS OF VIEWPERF TRACE DATA
Application | Viewperf benchmark on Linux OS
Logical Page Size | 4 KBytes
Total # of I/O Requests |
Total # of Write Requests | (2.42%)
Read Locality | 33% / 67%
Write Locality | 38% / 62%

Fig. 8. Access pattern of the viewperf trace

TABLE V
CHARACTERISTICS OF CSCOPE TRACE DATA
Application | Cscope Tool on Linux
Logical Page Size | 4 KBytes
Total # of I/O Requests |
Total # of Write Requests | (5.46%)
Read Locality | 41% / 59%
Write Locality | 25% / 75%

Fig. 9. Access pattern of the cscope trace

The Viewperf trace comes from a SPEC benchmark that measures the performance of a graphics workstation. It performs small random writes and sequential reads. The Cscope trace was obtained by examining Linux kernel source code. The first half of Cscope performs random reads and writes with very high locality.

The write locality is a particularly important factor for the proposed scheme, because dirty pages are kept in the buffer to reduce the number of write operations. If the write locality is low, as in Viewperf or Cscope, the WSR policy may not be effective and can even decrease overall performance, because the benefit of reducing the number of write operations may be smaller than the additional cost of the increased number of read operations caused by the lower hit ratio. Based on the write locality of each trace, we can expect the WSR policy to be most effective for PostgreSQL, which shows the highest write locality.

B. Buffer Hit Ratio

Fig. 10 shows the hit ratios of each buffer replacement algorithm. As the figure shows, the hit ratio of LRU-WSR is usually lower than that of LRU because of the not-cold dirty pages kept in the buffer.
As mentioned earlier, the hit ratio of CF-LRU is affected by the value of w (0 < w < 1). Let B denote the size of the buffer cache. Then the window size is w·B. When w is close to 0, CF-LRU behaves similarly to the LRU algorithm. When w is close to 1, it can use the entire buffer space to store dirty pages. The experiments used w = 0.1. The figures show that the hit ratio of LRU-WSR closely approximates that of LRU. Hence, the cold-detection policy is effective for flushing cold-write pages. In contrast, since CF-LRU and FAB have no cold-detection algorithm, they keep the largest number of dirty pages in the buffer among these algorithms. CF-LRU and FAB thus exhibit lower hit ratios than LRU and LRU-WSR in many cases.

Fig. 10. Hit Ratio under various buffer cache sizes: (a) PostgreSQL, (b) GCC, (c) Viewperf, (d) Cscope

C. Write Count

Fig. 11 shows the number of pages written into flash memory. We obtained these results by counting the number of physical page writes whenever page replacement occurs and, at the end of the simulation, adding the number of dirty pages remaining in the buffer. Although CF-LRU keeps dirty pages for the longest time on average among all the algorithms, it sometimes cannot reduce the number of write operations effectively because of its low hit ratio, as in Fig. 11(a). As expected, the figures show that LRU-WSR effectively reduces the write count. However, when the ratio of writes to reads is small, as in Viewperf (2.42%), CF-LRU is more effective than LRU-WSR, for the reason mentioned above. FAB is also effective in Fig. 11(c), because this trace has many sequential patterns, like a multimedia application, and FAB is designed for this purpose.

Fig. 11. Write Count under various buffer cache sizes: (a) PostgreSQL, (b) GCC, (c) Viewperf, (d) Cscope

D. Runtime

The overall runtime of each algorithm is given in Fig. 12. Runtime is estimated as the sum of all operation times, where each operation time is calculated by multiplying the physical time of the operation (shown in Table I) by the number of times it is performed. Runtime therefore reflects overall performance. Runtime is highly influenced by the hit ratio and the number of writes to the flash memory, because a low hit ratio increases the number of page faults and, as a result, the number of page reads. In particular, as the number of writes increases, so do both the page write and erase overheads.

Fig. 12. Overall runtime under various buffer sizes: (a) PostgreSQL, (b) GCC, (c) Viewperf, (d) Cscope

CF-LRU performs better than LRU when the buffer size is small, but its performance degrades as the buffer size becomes larger because its hit ratio is lower than that of the other algorithms. LRU-WSR always outperforms LRU. Moreover, it outperforms the other algorithms in most cases. In Fig. 12(a), LRU-WSR is about 1.4 times faster than the LRU algorithm.

E. Integration into PostgreSQL

The LRU-WSR policy is integrated into PostgreSQL under the Linux operating system. In a real system, factors other than the buffer replacement policy also affect the performance of a flash storage system, e.g., prefetching, the sync system call [24], and the I/O scheduling of the device driver, such as SCAN [20]. Fig. 13 shows the architecture diagram of our experiment.

Fig. 13. The overall architecture of experiment

This experiment was done using USB flash disks from three different manufacturers: Sky-Drv [25], Sandisk [26], and Samsung Electronics [27]. For benchmarking, we used the Wisconsin benchmark and fixed the buffer size of PostgreSQL at 2 MB, because the working set of this benchmark is small. Fig. 14 shows the performance gap between LRU and LRU-WSR. The performance gap may depend on the FTL algorithm that each flash disk uses, but we cannot examine this because manufacturers do not open the FTL layer of their flash disks to the public. In Fig. 14, LRU-WSR is up to 2 times faster than LRU and 1.4 times faster than LRU on average. On all the flash disks, LRU-WSR always outperforms the LRU algorithm, which shows that LRU-WSR is effective for flash memory storage.

Fig. 14. Performance under PostgreSQL on Linux OS using various manufacturers' USB type flash memory

V. CONCLUSION

In flash memory, a write operation is much slower than a read operation, and an erase operation is much slower than a write operation. Reducing only the number of write requests may deteriorate overall I/O performance by decreasing the buffer hit ratio. For good overall performance of a flash memory system, buffer replacement algorithms should focus on reducing the number of write requests as well as the number of read requests while considering the asymmetric read/write latencies. In this paper, we proposed a new add-on policy for buffer replacement in flash memory, WSR (Write Sequence Reordering), which reorders the writes of not-cold dirty pages only. To avoid keeping cold pages in the buffer, we used cold-page detection. To show the effectiveness of the WSR policy, we developed the LRU-WSR algorithm by adding the WSR policy to the LRU buffer replacement algorithm. We performed a trace-driven simulation using four kinds of traces representing various access patterns. The simulation results show that LRU-WSR improves overall performance significantly, running up to 1.4 times faster than LRU, by effectively reducing the number of physical write and erase operations. The integration into PostgreSQL also shows that LRU-WSR is 1.4 times faster than LRU on average.

ACKNOWLEDGMENTS

We are grateful to Dr. Song Jiang at Los Alamos National Laboratory for the simulator and the trace data in [10], and also to Ali R.
Butt at Virginia Polytechnic Institute and State University for the Accusim simulator, the trace data, and the modified Linux strace toolkit in [24]. Finally, we thank Kyung-hoon Yoon at Hanyang University for the IPSINS simulator of the buffer manager and FTL.

REFERENCES

[1] H. Kim, S. Lee, A new flash memory management for flash storage system, in 32nd Annual Intl. Computer Science and Applications Conference, Oct.
[2] Sunhwa Park, Seong-Young Ohm, New techniques for real-time FAT file system in mobile multimedia devices, IEEE Trans. Consumer Electron., Vol. 52, No. 1, pp. 1-9, Feb.
[3] Chanik Park, Jeong-Uk Kang, Seon-Yeong Park, Jin-Soo Kim, Energy-aware demand paging on NAND flash-based embedded storages, Proc. of the 2004 Intl. Symposium on Low Power Electronics and Design, pp.
[4] A. Kawaguchi, S. Nishioka, H. Motoda, A Flash Memory Based File System, Proc. of the USENIX Technical Conference, 1995.
[5] M. L. Chiang, C. H. Paul, R. C. Chang, Manage flash memory in personal communicate devices, Proc. of IEEE Intl. Symposium on Consumer Electronics (1997).
[6]
[7] Samsung Electronics, NAND flash memory & SmartMedia data book, 2004.
[8] Arie Tal, Two Technologies Compared: NOR vs. NAND, White Paper. C1DE21986D28/77/NOR_vs_NAND6.pdf
[9] Eran Gal, Sivan Toledo, Mapping Structures for Flash Memories: Techniques and Open Problems, Proc. of the IEEE Intl. Conference on Software-Science, Technology and Engineering (2005).
[10] Song Jiang, Xiaodong Zhang, LIRS: an efficient low inter-reference recency set replacement policy to improve buffer cache performance, ACM SIGMETRICS Performance Evaluation Review (2002), Vol. 30, Issue 1, pp. 31-42.
[11] Nimrod Megiddo, Dharmendra Modha, ARC: A Self-Tuning, Low Overhead Replacement Cache, Proc. 2nd USENIX Conference on File and Storage Technologies (FAST 03) (2003).
[12] Donghee Lee, Jongmoo Choi et al., LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies, IEEE Transactions on Computers, Vol. 50, No. 12 (Dec. 2001).
[13] Theodore Johnson, Dennis Shasha, 2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm, Proceedings of the Twentieth International Conference on Very Large Databases.
[14] S. Jiang, F. Chen, and X. Zhang, CLOCK-Pro: An Effective Improvement of the CLOCK Replacement, Proc. of USENIX 05, April.
[15] E. J. O'Neil, P. E. O'Neil, and G. Weikum, The LRU-K Page Replacement Algorithm for Database Disk Buffering, Proc. of SIGMOD 93.
[16] Heesung Jo, Jin-Soo Kim et al., FAB: Flash-Aware Buffer Management Policy for Portable Media Players, IEEE Trans. Consumer Electron., Vol. 52, No. 2, pp. , May.
[17] Jen-Wei Hsieh, Li-Pin Chang, Tei-Wei Kuo, Efficient On-line Identification of Hot Data for Flash-memory Management, Proc. of the 2005 ACM Symposium on Applied Computing (2005).
[18] Suman Nath, Aman Kansal, FlashDB: Dynamic Self-tuning Database for NAND Flash, Microsoft white paper, MSR-TR, 2006.
[19] Li-Pin Chang, Tei-Wei Kuo, An Adaptive Striping Architecture for Flash Memory Storage Systems of Embedded Systems, Proceedings of the 8th IEEE Real-Time and Embedded Technology and Applications Symposium (2002).
[20] A. Silberschatz et al., Operating System Concepts, 6th ed., John Wiley & Sons, Inc. (2004).
[21] Dina Bitton et al., A retrospective on the Wisconsin benchmark, Readings in Database Systems, Morgan Kaufmann Publishers Inc. (1988).
[22]
[23]
[24] Ali R. Butt, Chris Gniady, Y. Charlie Hu, The Performance Impact of Kernel Prefetching on Buffer Cache Replacement Algorithms, Proc. of the 2005 ACM SIGMETRICS Intl. Conference on Measurement and Modeling of Computer Systems (2005).
[25]
[26]
[27] SanDisk_Cruzer_Micro_USB_Flash_Drive.aspx

Hoyoung Jung received his BS degree in Material Science and Engineering and his MS degree in Information and Communications from Hanyang University, Seoul, Korea, in 2004 and 2006, respectively. He is currently a Ph.D. candidate at the School of Electronics and Computer Engineering, Hanyang University. His research interests include DBMS, flash memory, storage systems, and embedded systems.

Sungmin Park received his BS degree in Computer Education and his MS degree in Electronics and Computer Engineering from Hanyang University in 2005 and 2007, respectively. He is currently a Ph.D. student at the School of Electronics and Computer Engineering, Hanyang University. His research interests include file systems and flash memory-based storage systems.

Hyoki Shim received his BS degree in Urban Engineering from Hanyang University in 2005. He is currently a graduate student at the School of Electronics and Computer Engineering, Hanyang University. His research interests include DBMS, file systems, and flash memory-based storage systems.

Sooyong Kang received his B.S. degree in mathematics and his M.S. and Ph.D. degrees in computer science, all from Seoul National University (SNU), Seoul, Korea, in 1996, 1998, and 2002, respectively. He was then a Postdoctoral Researcher in the School of Computer Science and Engineering, SNU. He is now with the Department of Computer Science Education, Hanyang University, Seoul. His research interests include multimedia systems, especially multimedia data transmission systems and multimedia storage systems. He is also interested in security, including secure communication on the Internet, intrusion detection systems, and attack detection systems.

Jaehyuk Cha received his B.S., M.S., and Ph.D. degrees in computer science, all from Seoul National University (SNU), Seoul, Korea, in 1987, 1991, and 1997, respectively. He worked for the Korea Research Information Center (KRIC). From 1998, he taught in the Dept. of Computer Education, Hanyang University. He is now an associate professor in the Dept. of Information and Communications, Hanyang University. His research interests include XML, DBMS, flash memory-based storage systems, multimedia contents adaptation, and e-learning.
