CFLRU: A Replacement Algorithm for Flash Memory

Outline
Introduction
CFLRU Algorithm
Simulation
Implementation
Conclusion

Introduction
The characteristics of flash memory are significantly different from those of magnetic disks
Flash memory has no seek latency
Flash memory has asymmetric read and write characteristics in terms of performance and energy consumption
Flash memory does not support in-place updates

Introduction
Motivation
Most operating systems are customized for disk-based storage systems
Conventional replacement policies consider only the number of cache hits
Proposal: the CFLRU replacement algorithm

Introduction
Two kinds of replacement costs
One is generated when a requested page is fetched from secondary storage into the page cache in RAM
The other is generated when a page is evicted from the page cache to secondary storage
Clean pages and dirty pages: a clean page can simply be dropped, while a dirty page must be written back to flash
A replacement policy has to weigh replacement cost against cache hit rate

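CFLRU Algorithm
Clean-First LRU (CFLRU)
Divide the LRU list into two regions: the working region (recently used pages) and the clean-first region (eviction candidates)
Within the clean-first region, clean pages are evicted before dirty pages
Eviction order under LRU: P8, P7, P6, P5
Eviction order under CFLRU: P7, P5, P8, P6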
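The eviction example on this slide can be reproduced with a short sketch. This is only an illustration, not the authors' code: the names `select_victim` and `eviction_order` are hypothetical, and the dirty flags (P5 and P7 clean, P6 and P8 dirty) are inferred from the two eviction orders shown on the slide.

```python
from typing import List, Tuple

Page = Tuple[str, bool]  # (page name, is_dirty)


def select_victim(pages: List[Page], window: int) -> int:
    """Return the index of the page to evict.

    `pages` is ordered from most recently used (index 0) to least
    recently used (last index). The last `window` entries form the
    clean-first region; window=0 behaves like plain LRU.
    """
    n = len(pages)
    start = max(0, n - window)
    # Scan the clean-first region from the LRU end for a clean page.
    for idx in range(n - 1, start - 1, -1):
        if not pages[idx][1]:
            return idx
    # No clean page in the region: fall back to the LRU page.
    return n - 1


def eviction_order(pages: List[Page], window: int) -> List[str]:
    """Evict repeatedly (no new references) and record the order."""
    remaining = list(pages)
    order = []
    while remaining:
        order.append(remaining.pop(select_victim(remaining, window))[0])
    return order


# Assumed state of the clean-first region, MRU first.
region = [("P5", False), ("P6", True), ("P7", False), ("P8", True)]
lru_order = eviction_order(region, window=0)
cflru_order = eviction_order(region, window=4)
print(lru_order)
print(cflru_order)
```

Setting `window=0` gives the plain LRU order, so the same function can be used to compare both policies.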

CFLRU Algorithm
The size of the clean-first region is called the window size
A large window increases the cache miss rate
A small window increases the number of evicted dirty pages
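The window-size trade-off can be seen on a toy trace. The following is a minimal sketch under stated assumptions: `simulate` is a hypothetical helper, every miss (including a write miss) is counted as one flash read, every dirty eviction as one flash write, and the five-reference trace is made up for illustration.

```python
from collections import OrderedDict


def simulate(trace, cache_size, window):
    """Replay (page, is_write) references; return (flash_reads, flash_writes).

    The cache is an OrderedDict from page to dirty flag, ordered from
    LRU (first) to MRU (last). window=0 degenerates to plain LRU.
    """
    cache = OrderedDict()
    reads = writes = 0
    for page, is_write in trace:
        if page in cache:
            # Hit: move to the MRU end, keeping the dirty flag sticky.
            cache[page] = cache.pop(page) or is_write
            continue
        reads += 1  # miss: the page is fetched from flash
        if len(cache) >= cache_size:
            victim = None
            # Look for a clean page among the `window` LRU-most pages.
            for i, (cand, dirty) in enumerate(cache.items()):
                if i >= window:
                    break
                if not dirty:
                    victim = cand
                    break
            if victim is None:
                victim = next(iter(cache))  # plain LRU victim
            if cache.pop(victim):
                writes += 1  # dirty victim is written back to flash
        cache[page] = is_write
    return reads, writes


# Hypothetical trace: A is written once, then B is re-read after D.
trace = [("A", True), ("B", False), ("C", False), ("D", False), ("B", False)]
lru_result = simulate(trace, cache_size=3, window=0)
cflru_result = simulate(trace, cache_size=3, window=2)
print(lru_result, cflru_result)
```

On this trace the clean-first window trades one dirty write-back for one extra read; whether that pays off depends on the relative cost of flash writes and reads.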

CFLRU Algorithm
Benefit of the CFLRU algorithm: flash writes avoided by keeping dirty pages in the cache
Depends on the cost of a flash write operation and the number of dirty pages that should have been evicted in LRU order but are kept in the cache
Cost of the CFLRU algorithm: flash reads incurred when evicted clean pages are referenced again
Depends on the cost of a flash read operation, the number of clean pages that are evicted instead of dirty pages within the clean-first region, and the probability of future reference of a clean page, i, which is evicted at the k-th position
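The formulas themselves did not survive on this slide. Given the quantities it defines, a plausible reconstruction is sketched below. The symbol names are assumptions: C_W and C_R for the flash write and read costs, N_D for the dirty pages kept in the cache, N_C for the clean pages evicted in their place, and p_i(k_i) for the probability that clean page i, evicted at position k_i, is referenced again.

```latex
\begin{align}
  \text{Benefit}_{\text{CFLRU}} &= C_W \cdot N_D \\
  \text{Cost}_{\text{CFLRU}}    &= \sum_{i=1}^{N_C} p_i(k_i) \cdot C_R
\end{align}
```

Under this reading, a window size pays off as long as the write cost saved exceeds the expected read cost of fetching the evicted clean pages again.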

CFLRU Algorithm
In practice, it is not easy to determine the probability of future reference
Static: the window size is fixed, chosen through repeated experiments
Dynamic: the window size is adjusted using periodically collected information about flash read and write operations

Simulation
Simulation is performed with two different types of real workload traces
Trace for the swap system: virtual memory reference traces gathered with a profiling tool called Valgrind
Trace for the file system: block reference traces gathered directly from the buffer cache under the ext2 file system

Simulation
Five different applications were chosen and executed on a Linux/x86 machine
32 MB SDRAM and 128 MB flash memory

Simulation
CFLRU-static: reduced by 28.4% compared with LRU
CFLRU-dynamic: reduced by 23.1% compared with LRU

Simulation
CFLRU-static: reduced by 26.2% compared with LRU
CFLRU-dynamic: reduced by 23.5% compared with LRU

Implementation
Original Linux page reclamation
The page cache consists of two pseudo-LRU lists: the active list and the inactive list
When the kernel decides to free space, it starts the reclaiming phase
First, it scans the pages in the inactive list
Second, if there are too many process-mapped pages, it starts the swap-out phase

Implementation
CFLRU implementation
Insert an additional reclamation function
Clean pages in the inactive list are evicted first until enough pages are freed
The kernel's concept of priority corresponds to the window size of the clean-first region in CFLRU
A kernel daemon periodically checks the replacement cost and compares it with the previous one
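The slide says only that a daemon compares the current replacement cost with the previous one; how the window is then adjusted is not stated. One possible policy is a simple hill-climbing step, sketched below. Everything here is an assumption for illustration: the function names, the fixed step, the clamping bounds, and the cost model (reads and writes weighted by per-operation costs).

```python
def period_cost(reads, writes, read_cost, write_cost):
    """Replacement cost of one sampling period (assumed cost model)."""
    return reads * read_cost + writes * write_cost


def adjust_window(window, step, last_cost, current_cost, min_window, max_window):
    """One hill-climbing step on the window size (hypothetical policy).

    If the cost went up since the last period, the previous change is
    assumed to have hurt, so the direction of adjustment is reversed.
    Returns the new window size and the step to use next time.
    """
    if current_cost > last_cost:
        step = -step
    new_window = max(min_window, min(max_window, window + step))
    return new_window, step


# Example with made-up numbers: cost fell, so keep growing the window.
grown = adjust_window(8, 2, last_cost=100.0, current_cost=90.0,
                      min_window=0, max_window=16)
# Cost rose afterwards, so back off.
shrunk = adjust_window(10, 2, last_cost=90.0, current_cost=95.0,
                       min_window=0, max_window=16)
print(grown, shrunk)
```

A real kernel daemon would also need to decide the sampling period and how to smooth noisy measurements, which this sketch leaves out.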

Implementation
Optimization
Read-ahead has a negative effect on the cache hit rate
Remove the sequential read-ahead function

Implementation
The CFLRU replacement algorithm is evaluated on a system with:
Pentium IV processor
32 MB SDRAM, running Linux kernel 2.4.28
64 MB flash memory for swap space
256 MB flash memory for the file system (ext2)
Four Linux kernel implementations are compared:
Plain Linux kernel
Linux kernel without swap read-ahead
CFLRU-static
CFLRU-dynamic
The window size of the CFLRU-static algorithm is ¼ of the inactive list
Five applications: gcc, tar, diff, encoding, and a file system benchmark

Implementation
No read-ahead: reduced by 2.4%
CFLRU-static: reduced by 6.2%
CFLRU-dynamic: reduced by 5.7%
No read-ahead: saving of 4.4%
CFLRU-static: saving of 11.4%
CFLRU-dynamic: saving of 12.1%

Conclusion
This paper presents the CFLRU replacement algorithm for flash memory
CFLRU tries to reduce the number of costly write operations
Both the static and the dynamic variants of CFLRU reduce the replacement cost in simulation and in the implementation