HAT: An Efficient Buffer Management Method for Flash-based Hybrid Storage Systems


Front. Comput. Sci.  RESEARCH ARTICLE

HAT: An Efficient Buffer Management Method for Flash-based Hybrid Storage Systems

Yanfei LV 1,2, Bin CUI 1, Xuexuan CHEN 1, Jing LI 3

1 Department of Computer Science & Key Lab of High Confidence Software Technologies (Ministry of Education), Peking University, Beijing, China
2 National Computer Network Emergency Response Technical Team / Coordination Center of China
3 University of California, San Diego, USA

© Higher Education Press and Springer-Verlag Berlin Heidelberg 2012

Abstract  Flash solid-state drives (SSDs) provide much faster access to data than traditional hard disk drives (HDDs). The current price and performance of SSDs suggest they can be adopted as a data buffer between main memory and HDD, and buffer management policy in such hybrid systems has recently attracted growing interest from the research community. In this paper, we propose a novel approach to manage the buffer in flash-based hybrid storage systems, named Hotness Aware Hit (HAT). HAT exploits a page reference queue to record the access history as well as the status of accessed pages, i.e., hot, warm, and cold. Additionally, the page reference queue is split into hot and warm regions, which generally correspond to the memory and the flash. The HAT approach updates the page status and handles page migration in the memory hierarchy according to the current page status and the hit position in the page reference queue. Compared with existing hybrid storage approaches, the proposed HAT can manage the memory and flash cache layers more effectively. Our empirical evaluation on benchmark traces demonstrates the superiority of the proposed strategy against state-of-the-art competitors.

Keywords  Flash memory, SSD, hybrid storage, buffer management, hotness aware

Received month dd, yyyy; accepted month dd, yyyy

bin.cui@pku.edu.cn

1 Introduction

With the development of flash memory technology, the NAND flash-based solid state drive (SSD) has been widely used as the storage device for various systems, ranging from personal computers to enterprise-scale data centers. Although the SSD shows better read/write performance than the traditional hard disk drive (HDD), its adoption is still limited by its price and capacity. Table 1 compares the price, capacity, and access latency of mainstream commercial SSDs, HDDs, and DRAM-based main memory. It is easy to see that the price per bit of SSD is still much higher than that of HDD, so it may take a long time for the SSD to completely replace the HDD [1]. Therefore, flash-HDD hybrid storage becomes more and more attractive, because it can leverage the advantages of both technologies. Recently, various flash-HDD hybrid storage devices have been presented. Seagate provides a hybrid hard disk with a 4GB flash chip to improve overall performance [2]. The Windows operating system has supported ReadyBoost since Vista to accelerate booting [3]. Moreover, hybrid storage has been deployed in some data centers [4]. As shown in Table 1, the SSD displays I/O performance and price per GB between those of DRAM and hard disk. Consequently, it is natural to adopt flash memory as a level of memory between the HDD and main memory because of its performance advantages [5, 6].

The key problem is how to design an efficient buffer management policy to improve the I/O performance of such a memory hierarchy.

Several existing approaches attempt to better utilize the memory hierarchy in flash-based hybrid storage systems [7-9]. TAC (Temperature-Aware Caching) [7] uses the concept of temperature to perform hotness detection: it divides the pages on disk into regions, maintains the whole history of page accesses, and identifies the hot pages by monitoring the number of accesses to each region. The pages with higher temperature are held in the flash memory. The main drawback of TAC is that it does not adapt to access pattern changes. Although the temperature-based design records the number of page accesses to reflect page hotness, it reacts slowly to pattern evolvement: it needs a rather long time to replace the old pages cached on the flash, e.g., pages that have high temperature but will seldom be used in the future workload. Even though TAC exploits an aging policy to capture access pattern changes, ascertaining a suitable aging parameter is not a trivial task. The paper [8] analyzes the design of hybrid storage systems and presents three alternative designs implemented in SQL Server: CW (clean-write), DW (dual-write), and LC (lazy-cleaning). As illustrated in that paper, LC is the best design, which keeps the randomly accessed pages on the flash. In the LC policy, a dirty page is written to flash first and flushed to disk afterward. The LC method shows better performance than TAC on write-intensive traces. FaCE (Flash as Cache Extension) [9] adopts the SSD in a FIFO manner, so it can exploit the high sequential write performance of the SSD. Furthermore, FaCE proposes GSC (Group Second Chance) to increase the hit ratio on flash memory: GSC gives a page a second chance before eviction if the page is referenced while staying in the flash cache. FaCE also modifies the recovery component to extend the persistence scope to the flash memory in hybrid storage. The main drawback of these designs is that they do not make full use of the storage hierarchy: all the pages replaced out of main memory are kept on the flash, no matter whether they will be reused again. However, a page that is visited only once and never referenced again wastes flash capacity and brings unnecessary writes to the flash.

To overcome the problems of the existing approaches, in this paper we propose a novel strategy named Hotness Aware Hit (HAT) for efficient buffer management in flash-based hybrid storage systems. The pages in HAT are divided into three hotness categories: hot, warm, and cold. In general, the hot, warm, and cold pages are kept in main memory, flash, and hard disk respectively. Furthermore, we construct a page reference queue, an LRU list that records the access history and the status of accessed pages, and the queue itself is split into a hot and a warm region. Based on these data structures, we design a novel lightweight page replacement mechanism for hybrid storage systems. The current status of the accessed page and the hit position in the page reference queue are taken into consideration both for page replacement in the storage hierarchy and for the page status update. We enumerate six types of page access scenarios and design the relevant operations on the HAT structure accordingly.
The page access scenarios include cold page access on hard disk, cold page hit in warm region, cold page hit in hot region, warm page hit in warm region, warm page hit in hot region, and hot page hit in hot region, which cover all the access cases of a data access workload. The details about page status update and buffer replacement in the storage hierarchy will be given in Section 3. Instead of recording the exact access frequency of a page, our proposed HAT mechanism shows that the integration of page status and page hit position is effective and incurs a lower computational cost. Moreover, our approach is more adaptive to access pattern changes and shows considerable improvements on different workloads. Compared with the existing approaches, HAT has the following advantages.

1. Integral buffer management: We utilize a single page reference queue to record the page access history, so the main memory and the flash are managed as a whole buffer in HAT. Hotness detection starts from main memory, and the hot pages are kept in main memory whereas the warm ones generally stay in flash memory. In addition, only the hot pages evicted from main memory are held on the flash; the cold ones are evicted to disk directly. Consequently, HAT can enlarge the effective buffer size and increase buffer efficiency.

2. Workload evolvement adaption: Since HAT takes both the page status and the hit position in the page reference queue into consideration for buffer management, a new page is detected as hot only after it is hit again in memory when the access pattern changes. Therefore HAT can better capture frequently accessed pages and automatically adapt itself to workload evolvement.

3. Low computational cost: The HAT approach utilizes three page statuses and an LRU-based page reference queue for buffer replacement. The time consumption is O(1) on average for each page access operation.

Compared with the existing TAC method, which takes logarithmic time to update the temperature of a page, our approach is more computationally efficient.

Table 1  Comparison of different storage media

Medium       Price($)  Capacity  Price($)/GB  Read(µs)  Write(µs)
DRAM(DDR3)   —         — GB      —            —         —
SSD          —         — GB      —            —         —
Disk(7.2K)   —         — TB      —            —         —

* The price is obtained from

We evaluate the performance of the proposed HAT approach by comparing it with the state-of-the-art buffer strategies for flash-based hybrid systems. The experiments are conducted on both synthetic and real traces from public benchmarks including TPC-B, TATP, TPC-H, and MLK (Make Linux Kernel), and our experimental study shows that the HAT approach is superior to the existing buffer replacement methods.

This paper extends a preliminary work [10] with an in-depth investigation and performance analysis of the proposed HAT mechanism. Specifically, this paper makes the following additional contributions. First, we provide a comprehensive analysis of related work. Second, we present an in-depth discussion of the problems, issues, and solutions on buffer management for flash-based hybrid storage systems, and deliver the detailed HAT algorithms. Third, we redesign the experimental study and conduct more extensive experiments and performance analysis.

The remainder of the paper is organized as follows. Related work is introduced in Section 2. Section 3 describes our framework and detailed algorithms. Experimental results are shown in Section 4, and we conclude in Section 5.

2 Related work

In this section, we briefly review the related work on flash memory and flash-based hybrid system management. Flash-based systems have been a hot topic for several years, and many efforts have been made to design more effective systems on flash. Detailed studies [11, 12] have been conducted to reveal the internal I/O characteristics of flash disks. [13] discusses the design trade-offs of a flash-based system to improve overall performance. FTL designs [14, 15] have also been investigated in recent years. Nowadays, flash-based hybrid storage has gradually been recognized by more and more researchers as an economical way to build a practical system. Some hard disks leverage a small flash memory to improve I/O performance [16]. With the increase in capacity, SSDs are more and more widely deployed in storage systems. Early SSDs were skilled at reading but uncompetitive at writing; thus migration methods [6, 17] were proposed to dynamically transfer read-intensive pages to flash and write-intensive ones to disk. The authors further proposed to exploit concurrency to improve latency and throughput in a hybrid storage system [18]. Recently, SSDs have thoroughly surpassed disks on both read and write speed, and hence the popular method is to adopt flash as a middle-level cache between disk and main memory. Existing works can be separated into two categories, i.e., static deployment and dynamic loading. An object placement method [19] is developed to give a proper deployment for database objects: by comparing object performance on SSD and disk beforehand, the objects with a higher benefit per size are placed on the SSD. Other methods suggest putting a certain part of the system on flash; FlashLogging [20] illustrates that storing the DBMS log on flash can largely improve overall performance.
Debnath et al. [21, 22] proposed FlashStore and SkimpyStash to discuss the proper way to put key-value pairs on SSD. The static methods need to know specific information about the application and cannot self-adapt to various environments. Dynamic page transferring is more attractive compared with static strategies. Ou et al. [5] tested the performance of different hybrid structures, showing that the global structure outperforms the local one when the flash/main-memory ratio is low, and vice versa. TAC (Temperature-Aware Caching) [7] is the dynamic version of the object placement strategy: it assigns temperatures to extents according to the access pattern and I/O cost, and keeps the data with higher temperature at the higher level of the storage structure. To deal with access pattern changes, the authors of TAC proposed to use the aging policy [23], that is, the temperatures of pages are halved periodically to give higher priority to the recently accessed pages.
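To make the aging mechanism concrete, the following sketch (ours, not TAC's code; all names are illustrative) halves a temperature table once every aging_interval accesses, so that recent accesses dominate older history:

```python
def maybe_age(temperatures, accesses_seen, aging_interval):
    # Sketch of the aging policy described above: periodically halve all
    # temperatures. Picking aging_interval well is the hard part the text
    # points out -- too small forgets hot pages, too large never forgets.
    if accesses_seen % aging_interval == 0:
        for key in temperatures:
            temperatures[key] /= 2
```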

The aging policy, however, forgets the access history of all pages, no matter whether a page is still hot or has just been accessed. In addition, the aging frequency is difficult to determine: our testing on a variety of traces shows that the best aging interval ranges from thousands of accesses to the length of the whole trace (which corresponds to the case without aging). Researchers from Microsoft [8] discussed several possible designs for hybrid storage methods. According to their tests, the LC (Lazy-Cleaning) method is the best design; it shows better performance than TAC on write-intensive traces and similar performance on read-intensive ones. FaCE proposes to use the flash in a FIFO manner to improve throughput and provide faster recovery, which yields better performance than LC [9]. hStorage-DB [24] adopts semantic information to exploit the capability of a hybrid storage system, tackling the hybrid storage problem from another angle. On the other hand, hotness-aware buffering has been studied recently [25, 26]. Though these methods also apply a hotness-aware strategy, they target different problems: AD-LRU [25] and CCF-LRU [26] are designed for flash-only storage systems, where the key issue of buffer design is to reduce the number of flash write operations. Our proposed HAT is specifically designed for flash-disk hybrid storage systems, and its main purpose is to reduce the number of disk accesses.

3 The HAT Approach

3.1 Hybrid storage structure

The typical structure of a hybrid storage system is illustrated in Figure 1. Compared with the traditional storage hierarchy, this architecture contains an additional flash-level storage device that accelerates data access by buffering a certain part of the pages on the flash.

Fig. 1  Illustration of the hybrid storage system (main memory, flash, and disk; pages are read from flash or disk and written back to flash or disk)

All the data is stored on the hard disk and organized as data pages. A page needs to be loaded into main memory before being accessed. Since a flash-based device has better I/O performance than a hard disk, it works as the level between main memory and disk: when a page miss happens in main memory, the flash is checked first, and the disk is only accessed when the page is not found in the flash. The big challenge in hybrid storage system design is that we face many choices on data allocation and migration. These choices include: 1) whether a page newly referenced from the disk should replace a page in the main memory, and which page should be replaced; 2) whether a page evicted from the main memory should be written to the flash, which may benefit future references, or just evicted to the hard disk directly; and 3) how to elevate a hot page in the flash and move it to the main memory. Different answers to these questions lead to different management strategies. The optimal decisions, however, depend on the workload pattern as well as the detailed I/O costs, which makes the problem more complex.

3.2 Overview of HAT

In this section, we introduce the basic idea of our approach for effective buffer management in a flash-based hybrid system, named Hotness Aware Hit (HAT).

Fig. 2  Basic structure of HAT

Page reference management
In HAT, we exploit a page reference queue to record the page reference history. Based on the reference information, the pages are marked with different hotness levels, i.e., hot, warm, and cold.
Pages are allocated to different levels in the storage hierarchy according to the page access sequence and page hotness. For ease of presentation, we define some key notions and describe them in detail as follows.

Page reference queue
In order to perform hotness detection, we record a recent part of the page access history with a page reference queue, as illustrated in Figure 2.

Only the IDs of accessed pages are kept in the page reference queue. The length of the page reference queue corresponds to the sizes of memory and flash, as well as the page access pattern, and less recently visited pages are discarded eventually. The page reference queue is organized in an LRU manner, that is, a newly referenced page is added or moved to the MRU end of the queue. In the rest of the paper, we call the MRU end of the queue the head and the LRU end the tail for simplicity.

Hot Region and Warm Region
In such an LRU-based page reference queue, pages near the head of the queue have higher hotness, while pages at the tail are colder. The page reference queue is divided into two regions, named Hot Region and Warm Region respectively. The sizes of the Hot Region and the Warm Region are determined by the reference history and, in general, the buffer sizes of memory and flash, which will be introduced in detail later. In our design, instead of recording the exact access frequency of a page, ascertaining the hotness level is sufficient, as there are two cache levels in the hybrid storage system. Note that the Hot Region and Warm Region are defined on the reference history and do not correspond to the buffer space holding the real data pages; HAT uses these regions to facilitate page hotness detection and thus makes the page replacement computation more efficient.

Hit
We call a page reference a hit on a region in the page reference queue if the page is in that region when it is referenced. For example, in Figure 2, if page 3 is referenced again, we call the reference a hit of page 3 on the hot region, and page 3 turns hot afterward. Similarly, a reference of page 6 hits on the warm region. We use the hit information as the measurement for hotness detection. A hit can reflect the reference status of a page effectively: a page hit happens only when the page ID is already contained in the corresponding region, which encodes the historical access information of this page. Thereby the hit provides an accurate hotness judgement by considering both the historical information and the current status. At the same time, the hit reacts quickly to page hotness changes: if a page turns hot, this change is detected after only one hit in the hot region, and HAT adjusts to it efficiently. Additionally, the only information needed to detect a hit is the last reference of a page, so hit detection can be performed easily and quickly. Hence hit-based page hotness detection is efficient in both space and time.
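To make the queue mechanics concrete, the following minimal sketch (our illustration, not the authors' implementation; all names are ours) models the page reference queue as an LRU list of page IDs with a fixed boundary separating the hot and warm regions, and classifies each reference as a hot-region hit, a warm-region hit, or a miss:

```python
class PageReferenceQueue:
    """Sketch of the page reference queue: an LRU list of page IDs whose
    first hot_len entries form the hot region, the rest the warm region."""

    def __init__(self, hot_len):
        self.queue = []         # index 0 = head (MRU end), last = tail (LRU end)
        self.hot_len = hot_len  # boundary between hot and warm regions

    def reference(self, page_id):
        """Return 'hot' or 'warm' for a region hit, or 'miss' for a cold
        page newly read from disk, then move the page ID to the head."""
        if page_id in self.queue[:self.hot_len]:
            hit = 'hot'
        elif page_id in self.queue[self.hot_len:]:
            hit = 'warm'
        else:
            hit = 'miss'
        if hit != 'miss':
            self.queue.remove(page_id)
        self.queue.insert(0, page_id)
        return hit

q = PageReferenceQueue(hot_len=2)
print([q.reference(p) for p in (3, 6, 3)])  # ['miss', 'miss', 'hot']
```

The linear membership scans here are a simplification: as Section 3.3 describes, the actual implementation uses two linked lists plus a hash table of frames, so each access costs O(1) on average, and the region boundary is adjusted dynamically rather than fixed.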
Page status and deployment
HAT categorizes the pages into three priority levels based on their hotness, namely hot, warm, and cold, which are referred to as page statuses in this paper. The status of a page is determined according to the hit region on the page reference queue: generally, a hit on the hot region marks the corresponding page hot, and a hit on the warm region marks the page warm. Each status implies different behaviors on page accesses and is used to conduct page deployment. We introduce the three types of pages in detail as follows.

Hot page  Hot pages have the highest priority in the HAT approach, and thus all hot pages are kept in main memory and occupy a fixed percentage of the main memory space. A page is marked as hot if a reference on this page hits on the hot region. For example, in Figure 2, a hit on page 1 turns page 1 into a hot page. If the number of hot pages exceeds the threshold, the hot page with the largest recency (i.e., the least recently referenced one) is degraded to a warm page, and the hot region shrinks.

Warm page  A page may turn warm in two ways: 1) a reference hit on the warm region turns a cold page warm; 2) a hot page is degraded to warm and leaves the hot region, as discussed previously. The number of warm pages on the flash is limited by the capacity of the flash device; thus if the flash is full, the warm page with the largest recency is degraded to cold and evicted from the flash to the disk. All the pages on the flash are warm pages. However, some recently accessed warm pages may be buffered in the main memory, although their number is not large; for example, a newly referenced warm page is kept in main memory for a while. A special case is that an in-memory cold or hot page may turn warm, in which case the page becomes an in-memory warm page. After being evicted from the main memory, a warm page is moved to the flash.

Cold page  A newly referenced page from the hard disk is considered cold, though its page ID is queued in the hot region of the page reference queue. A certain percentage of the main memory space is allocated to store the newly referenced cold and warm pages regardless of their status. After being evicted from the main memory, a cold page is flushed back to the disk directly.

Tagging pages with different statuses based on their hotness is one of the key operations in our buffer management strategy. Here we provide details about how to use the tag information to effectively manage the data in the storage hierarchy. The page deployment in the hierarchy roughly corresponds to the status of the page.

We basically try to put the hot pages in the main memory and the warm pages in the flash, and leave the cold ones on the disk. However, some newly referenced cold/warm pages are likely to be referenced again, so HAT allocates a certain percentage of the main memory to buffer these cold/warm pages, puts their IDs in the hot region of the page reference queue, but leaves their status unchanged. These newcomers are temporarily kept in main memory for further hotness examination; if they are accessed again, they can be obtained directly from main memory and upgraded to hot pages. The memory space used to cache the hot pages is named the hot buffer zone, and the memory space used to cache the non-hot pages is named the non-hot buffer zone; their sum is the size of the main memory buffer. Thus, the page deployment of HAT is as follows.

1. Put all the hot pages in the hot buffer zone in the main memory.
2. Allocate the non-hot buffer zone to the newly arrived cold/warm pages.
3. Keep the warm pages at least on the flash memory. Note that some recently visited warm pages may reside in the non-hot buffer zone.
4. Leave all the other cold pages on the disk.

The principle of data placement also determines the data transfer between the different levels of the storage hierarchy, which will be presented in detail in the following section. For example, if a page turns hot, it is moved to the main memory. Another example is that a cold page evicted from the main memory is flushed back to the disk, while a warm one is flushed to the flash, which is a key difference from FaCE [9] and LC [8].

Page replacement strategy
In the previous sections, we have introduced the key notions and the basic features of the HAT structure. In the following, we present the page replacement strategy of HAT in the hybrid storage hierarchy. Since the capacities of main memory and flash are limited, we set quantity constraints on the number of pages with a certain status.

Constraint 1: The number of hot pages should not exceed the hot buffer zone size.
Constraint 2: The number of flash-resident warm pages should not exceed the capacity of the flash memory.
Constraint 3: The number of in-memory cold and warm pages should not exceed the non-hot buffer zone size.

When a constraint is violated, data page replacement in the storage hierarchy must be performed. The key innovation of HAT is how to conduct the placement of the relevant pages according to the status and hit position of the accessed page, and how to modify the statuses of the relevant pages accordingly. In our approach, we can enumerate six data page access scenarios for buffer replacement in the hybrid storage hierarchy according to the current page status and its access history. These scenarios cover all the cases of a data access workload. We present the operations and structure updates for each case as follows (a code sketch of this dispatch follows the worked example below).

Cold page access on hard disk: The page resides on the hard disk, and no historical access information is maintained in the page reference queue. In this case, we put the page ID at the head of the hot region of the page reference queue, buffer the data page in the non-hot buffer zone, and check Constraint 3. If Constraint 3 is violated, we flush the most out-of-date data page in the non-hot buffer zone.

Cold page hit in hot region: This indicates the accessed cold page is in the hot region and is referenced again.
In this case, we upgrade the page status to hot, move the page ID to the head of the hot region, move the data page to the hot buffer zone, and check Constraint 1. If Constraint 1 is not satisfied, we move the tail of the hot region to the warm region and downgrade the status of the tail page to warm. After that, we check Constraint 3 and replace the most out-of-date data page in the non-hot buffer zone if needed. Then we check Constraint 2 and move the tail out of the warm region if Constraint 2 is violated.

Cold page hit in warm region: We change the page status to warm and move the page ID to the hot region of the page reference queue. If the data page is in the non-hot zone of memory, no further actions are needed; otherwise, we load the page into the non-hot buffer zone and check Constraint 3. If Constraint 3 is violated, we replace the most out-of-date data page.

Warm page hit in warm region: We move the page ID to the hot region of the page reference queue. If the data page is in the non-hot buffer zone, no further actions are needed; otherwise, we load the page into the non-hot buffer zone and check Constraint 3. If Constraint 3 is violated, we replace the most out-of-date data page.

Warm page hit in hot region: We update the page status to hot, move the page ID to the head of the hot region, move the data page to the hot buffer zone, and check Constraint 1. If Constraint 1 is not satisfied, we move the tail of the hot region to the warm region and downgrade the status of the tail page to warm. After that, we check Constraint 2 and update the warm region accordingly.

Hot page hit in hot region: We simply move the page ID to the head of the queue without any other adjustment.

We add several notes on the replacement process. First, in some cases above the constraint violations may cascade; HAT then conducts a sequence of adjustments to ensure the system satisfies all the constraints. Second, we call a page whose status is to be downgraded the victim of the status downgrade. The victim hot and warm pages are determined in LRU manner, i.e., the hot/warm page nearest to the tail of the hot/warm region is selected; we can find the victim by scanning from the tail of the corresponding region. In this victim selection process, the downgraded pages are the most out-of-date page references of the corresponding region and are removed from the region accordingly. The references removed from the warm region are removed from the page reference queue entirely, and thus HAT forgets historical access information naturally. Third, the non-hot buffer zone in main memory is also managed in LRU manner: the cold/warm page with the largest recency in the reference queue is evicted, and the evicted warm or cold page is flushed to the flash or the disk respectively.

Example
For a better understanding of our buffer replacement mechanism, we proceed to give a detailed example showing how the HAT approach works on an access sequence. The first subfigure in Figure 3 shows the initial state of HAT, and the following access sequence is on pages 5, 14, 14, 12, 10. The size of the main memory is 4 pages, the flash size is 5 pages, and the maximum number of hot pages, i.e., the hot buffer zone size, is set to 2 in this example. The first access is on page 5, which is new to the buffer manager and corresponds to the aforementioned "cold page access on hard disk" case. We read the page from the disk and put ID 5 at the head of the hot region. After checking Constraint 3, we find that the main memory contains too many pages, so we flush out page 12, the most out-of-date page in the non-hot buffer zone; this page is cold, so we flush it to the disk. Since Constraint 2 is currently satisfied, no additional actions are taken. The next access is on page 14, which is a warm page hit in the warm region. This page has its data on the flash memory, so we first load it into main memory and move its page ID to the head of the hot region. Constraint 3 is violated again, so we flush the most out-of-date page, which is now page 4, to the flash. Constraint 2 remains satisfied, so we are done. The system continues to access page 14, and this is a warm page hit in the hot region. Since its page ID is already at the head of the hot region, we only change its page status to hot. By checking Constraint 1, we find that we have too many hot pages, and hence we move the tail of the hot region, i.e., page 8, to the warm region and change its status from hot to warm. These actions do not violate Constraint 2 either. The fourth access is on page 12, which is a cold page hit in the hot region. This is the most complicated case in this example.
The page ID is moved to the head of the page reference queue and its status is upgraded to hot. Because the data of this page does not reside in the main memory, we need to load it from the disk. We proceed to check the constraints. Constraint 1 is violated again, so page 9, the tail of the hot region, is moved to the warm region and its status is changed to warm. So are pages 4 and 5, which are not hot, because the hot region should always end with a hot page, which is page 14 in this case. Constraint 3 is also violated because we have too many pages in the non-hot buffer zone, so we flush page 8 to the flash. We then find that Constraint 2 is also violated, and we evict page 6, which is at the very tail of the warm region, from the flash memory. The warm region has to be shrunk until it finds page 1 as its new tail, which must be a warm page in flash. Note that we guarantee that the tail of the hot region is a hot page and the tail of the warm region is a flash-resident warm page; this is an implementation issue that facilitates the victim search process, and more details can be found in Section 3.3. The last access is on page 10, which corresponds to the cold page hit in warm region case. We move the ID to the head of the hot region, upgrade the status to warm, and load its data from disk, just as described above. The main memory usage exceeds Constraint 3, so we flush page 9 to the flash. Afterward, the flash has too many pages by Constraint 2, and we drop page 1 from it; the warm region is shrunk as well. To better illustrate the process, we further present the page status changes of this example in Table 2, which includes the page access sequence and the changes of each region, as well as the hotness of each page.
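Seen end to end, the six scenarios reduce to a small dispatch on the hit region and the page's current status. The sketch below is our illustration under the structures introduced so far; `queue` is a page-reference-queue object like the earlier sketch, and the `zones` helper with the named methods is hypothetical:

```python
def on_access(page, queue, zones):
    # 'hot'/'warm' = hit region in the page reference queue;
    # 'miss' = cold page access on hard disk (no history kept).
    region = queue.reference(page.id)
    if region == 'hot':
        if page.status != 'hot':            # cold/warm page hit in hot region
            page.status = 'hot'
            zones.move_to_hot_buffer(page)  # may violate Constraint 1
        # hot page hit in hot region: only the queue order changes
    elif region == 'warm':                  # cold/warm page hit in warm region
        page.status = 'warm'
        zones.ensure_in_nonhot_buffer(page) # load only if not already buffered
    else:                                   # cold page access on hard disk
        page.status = 'cold'
        zones.ensure_in_nonhot_buffer(page)
    zones.check_constraints()               # Constraints 1-3; may cascade
```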

Fig. 3  An example of an access sequence processed by HAT: (a) the initial state of HAT; (b) after page 5 is accessed; (c) after page 14 is accessed; (d) after page 14 is accessed again; (e) after page 12 is accessed; (f) after page 10 is accessed

Table 2  Page hotness changes in the example (H: hot, W: warm, C: cold)

Page      Initial   Page 5    Page 14   Page 14   Page 12   Page 10
          state     accessed  accessed  accessed  accessed  accessed
Page 1    W         W         W         W         W         C
Page 2    W         W         W         W         W         W
Page 3    W         W         W         W         W         W
Page 4    W         W         W         W         W         W
Page 5    C         C         C         C         C         C
Page 6    W         W         W         W         C         C
Page 7    C         C         C         C         C         C
Page 8    H         H         H         W         W         W
Page 9    H         H         H         H         W         W
Page 10   C         C         C         C         C         W
Page 11   C         C         C         C         C         C
Page 12   C         C         C         C         H         H
Page 13   C         C         C         C         C         C
Page 14   W         W         W         H         H         H

3.3 Implementation of HAT

In this section, we present the implementation details of the HAT approach, including the data structures and algorithms. Figure 4 shows the structure of HAT in our implementation. We utilize two linked LRU lists, i.e., a hot list and a warm list, to represent the hot region and the warm region and record the access history. The two lists work together as one LRU list representing the whole page reference queue: no matter whether a page hits in the hot region or the warm region, it is moved to the head of the hot region, and the page evicted from the tail of the hot list is moved to the head of the warm list.
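A minimal sketch of this two-list movement follows (ours, not the authors' C# code; the fixed hot_bound stands in for the dynamically maintained hot-region boundary):

```python
from collections import deque

def touch(page_id, hot_list, warm_list, hot_bound):
    # On any reference, the page ID moves to the head of the hot list; the
    # entry squeezed out of the hot list's tail moves to the head of the
    # warm list, so the two deques act as one LRU queue.
    if page_id in hot_list:
        hot_list.remove(page_id)
    elif page_id in warm_list:
        warm_list.remove(page_id)
    hot_list.appendleft(page_id)
    if len(hot_list) > hot_bound:
        warm_list.appendleft(hot_list.pop())

hot_list, warm_list = deque([14, 9]), deque([1, 2, 3, 4, 6])  # arbitrary state
touch(6, hot_list, warm_list, hot_bound=2)  # a hit in the warm region
```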

Note that only the page IDs are recorded in the nodes of the hot and warm lists to track the access history, rather than the real data pages; consequently, some pages whose IDs are recorded in the lists may reside in neither main memory nor flash memory. Additionally, we design another auxiliary LRU list to maintain the pages in the non-hot buffer zone, which facilitates fast access to the pages in this area. Three flags are used to identify the page status, named the hot, warm, and cold flags respectively.

Fig. 4  Illustration of HAT

Brush of region tails
The size of the hot buffer zone is set to a fixed percentage of the main memory buffer capacity, and if the number of hot pages exceeds this threshold, a hot page is downgraded. Our implementation of HAT ensures that the tail of the hot region is a hot page and the tail of the warm region is a flash-resident warm page. This design facilitates victim searching and simplifies the implementation: when a hot victim is needed, the tail page of the hot list is selected directly, and the same holds for the warm victim. There are two cases that may leave the tail of the hot region not ending with a hot page. First, when the number of hot pages exceeds the threshold, the tail of the hot region is degraded to warm and moved to the head of the warm region, so the tail of the hot region may no longer be a hot page. Second, this may also happen when the tail page of the hot list is referenced and moved to the head. In these cases, we move pages from the tail of the hot region to the head of the warm region until encountering a hot one; we name this process Brush. The same process can take place for the warm region, when 1) the number of on-flash warm pages exceeds the capacity of the flash memory and the tail of the warm region is evicted, or 2) the tail of the warm list is referenced and moved to the head of the hot region. If either happens, a warm brush is conducted to remove pages from the tail until the tail of the warm region is again a flash-resident warm page. In the process of the warm brush, once a memory-resident warm page is encountered, it is degraded to cold, as it must be less recently visited than the other warm pages. Through brushing, HAT forgets some out-of-date page references and adjusts the sizes of the Hot Region and the Warm Region automatically.
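The two brush operations can be sketched as follows (our illustration; `status` maps page IDs to their hotness flags and `on_flash` is the set of flash-resident page IDs, both hypothetical names):

```python
def hot_brush(hot_list, warm_list, status):
    # Hot Region brush: move entries from the hot region's tail to the warm
    # region's head until the hot region ends with a hot page again.
    while hot_list and status[hot_list[-1]] != 'hot':
        warm_list.insert(0, hot_list.pop())

def warm_brush(warm_list, status, on_flash):
    # Warm Region brush: drop tail entries (forgetting their history) until
    # the tail is a flash-resident warm page; memory-resident warm pages
    # encountered on the way are degraded to cold.
    while warm_list and not (status[warm_list[-1]] == 'warm'
                             and warm_list[-1] in on_flash):
        pid = warm_list.pop()
        if status[pid] == 'warm':
            status[pid] = 'cold'
```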

Auxiliary list for non-hot buffer zone
When the main memory buffer is full and an empty slot is required for a new page access, an in-memory page has to be evicted out of the main memory. In this case, we evict the non-hot page with the largest recency, that is, the in-memory non-hot page nearest to the tail of the page reference queue. This page could be obtained by scanning backward from the tail of the reference queue, but that process could be time consuming, as the queue maintains at least all the pages residing in the memory and the flash. We therefore introduce an auxiliary LRU list to accelerate this search. The pages in the auxiliary list are all non-hot pages buffered in the main memory; as illustrated in Figure 4, the auxiliary list maintains two pages. The auxiliary list may also hold pages that have been evicted from the page reference queue but are still in main memory, which may be caused by the warm region brush operation. The pages in the auxiliary list are organized in LRU manner, like the page reference queue; thus if a page in the auxiliary list is referenced, we update its order in the auxiliary list besides performing the operations on the page reference queue. When a page needs to be replaced out of the main memory, the tail of the auxiliary list (page 8 in the figure) is simply selected.

As discussed above, the ratio of the non-hot buffer zone (non-hot ratio) is a parameter of our approach. A smaller hot buffer zone leaves more main memory space for new non-hot pages; this gives the newly arrived pages a longer hotness examination time before they are identified as hot, as they can stay in memory longer. On the contrary, if we give more memory space to the hot buffer zone, a new page is replaced from the main memory more quickly. We will evaluate the effect of this parameter in the experiments.

We adopt a structure named frame to store all the information of a page, including the hotness flag and the page location in the storage hierarchy. The frames are organized in a hash table to facilitate fast searching.
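As a concrete picture of the frame bookkeeping described above, here is a minimal sketch; the field names are ours, since the paper only states that a frame records the hotness flag and the page's location and that frames are hashed by page ID:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    page_id: int
    status: str = 'cold'     # 'hot' | 'warm' | 'cold' flag
    in_memory: bool = False  # buffered in main memory?
    in_flash: bool = False   # resident on flash?
    dirty: bool = False      # needs write-back on eviction

frames = {}                  # hash table: page_id -> Frame, O(1) lookup

def frame_of(page_id):
    return frames.setdefault(page_id, Frame(page_id))
```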
The algorithms of HAT
The detailed algorithm of HAT is listed as Algorithm 1. The process can be divided into four steps, namely page load (lines 1-7), flag update (lines 8-17), list maintenance (lines 18-22), and constraint check (line 23). To begin with, the data page is loaded from the flash or the hard disk, and then the flag of the page is updated. If the page hit is in the Hot Region, the page is marked as hot and added to the hot buffer zone (lines 8-10). If the page hit is in the Warm Region, the page is marked as warm (lines 11-13). Otherwise the page is a cold page (lines 15-16). Either a cold or a warm page is held in the non-hot buffer zone. In the list maintenance step, we first record the page reference at the head of the Hot Region (line 18), and the old reference to this page is removed from the page reference queue. After that, the Auxiliary List is updated if the accessed page is in the non-hot buffer zone (lines 19-22). At last, the constraints are checked, which is detailed separately in Algorithm 2. First, the number of hot pages is checked (lines 1-6): once Constraint 1 is violated, the tail of the Hot Region is downgraded to warm and inserted into the Auxiliary List, and then a Hot Region brush is conducted to ensure the Hot Region ends with a hot page. Second, the non-hot buffer constraint is checked (lines 7-14): if it is violated, i.e., the non-hot buffer zone is full, the tail of the Auxiliary List is evicted. At last, the flash memory constraint is checked (lines 15-22): if the flash is full, the tail of the Warm Region is degraded to cold and evicted (lines 16-17), and the Warm Region brush is performed (lines 18-22). Note that the order of the constraint checks is important, because the processing of an earlier constraint may cause a violation of a later one.

Algorithm 1: The HAT algorithm
Input: an access on page P
 1  if P is not in main memory then
 2      if P is in flash memory then
 3          load page P from flash memory;
 4      else
 5          load page P from disk;
 6      end
 7  end
    // update the flags
 8  if P hit in Hot Region then
 9      mark P as hot;
10      hold P in the hot buffer zone in main memory;
11  else if P hit in Warm Region then
12      mark P as warm;
13      hold P in the non-hot buffer zone;
14  else
15      mark P as cold;
16      hold P in the non-hot buffer zone;
17  end
    // maintain the lists
18  move the node of P to the head of Hot Region;
19  remove P from the Auxiliary List if present;
20  if P is warm or cold and in main memory then
21      insert P into the Auxiliary List;
22  end
23  invoke CheckConstraints();

Algorithm 2: Subroutine CheckConstraints()
    // check Constraint 1
 1  if hot page count > hot buffer zone size then
 2      find the tail of Hot Region;
 3      mark it as warm;
 4      insert it into the Auxiliary List;
        // Hot Region brush
 5      shrink the hot bound to the next hot page;
 6  end
    // check Constraint 3
 7  if the non-hot buffer zone is full then
 8      remove the tail page P_ta of the Auxiliary List;
 9      if P_ta is warm then
10          flush it to flash;
11      else
12          write it back to disk if P_ta is dirty;
13      end
14  end
    // check Constraint 2
15  if the flash is full then
16      mark the tail page of Warm Region as cold;
17      remove the tail of Warm Region and flush it to disk;
        // Warm Region brush
18      get the tail page P_tw;
19      while not (P_tw is warm and in flash) do
20          change P_tw to cold;
21          remove P_tw from Warm Region;
22      end
23  end

4 Performance evaluation

In this section, a trace-driven simulation is conducted to evaluate the effectiveness of our HAT approach, and the experimental results are compared with state-of-the-art flash-based hybrid buffer replacement algorithms, namely FaCE [9] and TAC [7]. We implement the FaCE approach with GSC (Group Second Chance), since the experiments indicate that FaCE+GSC performs the best among the FaCE variants. The aging frequency of TAC is an important parameter: we tested TAC with aging intervals ranging from 0.1M to 10M accesses and chose the parameter with the best performance for the comparison. The simulation is developed in Visual Studio 2010 using C#.

All experiments are run on a Windows 2008 server with two 2.4 GHz Intel E5530 CPUs and 32 GB of physical memory, equipped with a Samsung SSD (64GB, 470 series) and a Seagate disk (ST380011A).

4.1 Experimental setup

We use both real and synthetic traces for performance evaluation. We exploit four real traces, TPC-B, TPC-H, TATP, and making the Linux kernel (MLK for short), to evaluate the performance on various workloads. The three benchmarks are run on PostgreSQL with default settings, e.g., the page size is 8KB. The MLK trace is a record of the page accesses of making the Linux kernel. For the synthetic traces, we make a series of traces varying from a very stable access pattern to an unstable one: the stable trace is generated conforming to the 80/20 distribution [27], whereas the unstable traces are produced by combining multiple stable traces. We utilize a tool named strace to monitor these processes and obtain the disk access history. The specification of these traces is shown in Table 3.

Table 3  Specification of the traces

Trace      Number of Pages (10^3)  Number of References (10^6)  Write Ratio
TPC-B      —                       —                            — %
TATP       —                       —                            — %
TPC-H      —                       —                            — %
MLK        —                       —                            — %
Synthetic  —                       —                            — %

The total I/O time, including both flash and disk accesses, is used as the primary metric to evaluate performance, while we also report the buffer hit ratio and the number of accesses in our experiments. The parameters used in our experiments are listed in Table 4. The first parameter S_M is the memory buffer size; we also consider various memory and flash sizes to test the performance under different environments, where the parameter Ratio_F/M represents the ratio between the flash and the memory. The costs of flash I/O and disk I/O are obtained by testing the Samsung SSD (64GB, 470 series) and the Seagate disk (ST380011A), where C_Fr, C_Fw, C_Dr, and C_Dw represent the read and write costs of the flash and the disk respectively. We conducted the experiments on different SSDs, including Samsung and Intel; our approach yields similar performance on the different devices, and hence we only present the results on the Samsung SSD due to space constraints. R_nonhot represents the percentage of the non-hot buffer zone out of the overall memory buffer space. Unless stated explicitly, the default parameter values, given in bold, are used.

Table 4  Experimental parameters

Parameter         Value
S_M (10^3 pages)  0.2 (for TPC-B), 2 (for TATP), 1 (for MLK), 2 (for Synthetic), 50, 100, 200, 400 (for TPC-H)
Ratio_F/M         1:1, 2:1, ..., 5:1, ..., 20:1
C_Fr, C_Fw (ms)   0.245, —
C_Dr, C_Dw (ms)   12.7, 13.7
R_nonhot          0.02, 0.04, ..., 0.08, 0.1, 0.2, ...
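For reference, the primary metric can be recomputed from the four operation counts and the latencies of Table 4. The sketch below reflects our reading of the setup (the authors' simulator is written in C#); note that the flash write latency C_Fw did not survive in Table 4, so the caller must supply it:

```python
def total_io_time_ms(n_fr, n_fw, n_dr, n_dw,
                     c_fr=0.245, c_fw=None, c_dr=12.7, c_dw=13.7):
    # Total I/O time = sum over operation types of count * per-op latency.
    # c_fw has no default: Table 4's flash write latency is unknown here,
    # so any value used is the caller's assumption.
    if c_fw is None:
        raise ValueError("supply the flash write latency C_Fw")
    return n_fr * c_fr + n_fw * c_fw + n_dr * c_dr + n_dw * c_dw
```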
4.2 Parameter tuning

The ratio of the non-hot buffer zone (R_nonhot) in the main memory is the only parameter of HAT. As discussed in the previous section, a larger non-hot buffer provides more space for newly arrived pages and thus adapts better to workload changes. On the contrary, a smaller non-hot buffer zone means the system can allocate more space to hot pages, and hence improves the buffer efficiency on frequently accessed pages.

Therefore, the optimal performance is a tradeoff between pattern-change adaptation and hot page buffering. In this experiment, we show how the parameter R_nonhot affects the performance of HAT on both the synthetic trace and the benchmark traces; the results are illustrated in Figure 5. The real benchmark traces show similar results, and thus only the result on TPC-H is given as a representative.

Fig. 5  Parameter tuning

In Figure 5, the performance of our approach on the synthetic workload is rather stable when R_nonhot is low. This is because the access pattern of the synthetic trace conforms to a fixed distribution: although with a small R_nonhot HAT may evict a new page quickly, it can still recognize a hot page after the page is accessed again. The total I/O time slightly increases when the non-hot pages occupy over 70% of the main memory, in which case the main memory is polluted with cold pages. The access pattern of TPC-H is not stable, thus the total I/O time first decreases and then increases, with the optimal performance appearing when the parameter is around 0.1. The experimental results suggest that we should reserve the majority of the memory to buffer the frequently used pages, and leave a small percentage of the buffer space to hold new pages for further hotness examination. Only if a page is really hot will it be moved to the hot zone, and thus the utility of the hot buffer zone is improved. Hence our design not only better utilizes the overall buffer space, but also adapts to access pattern changes. We use 0.1 as the default value of R_nonhot in the following experiments.

4.3 Comparison with other techniques

We proceed to compare HAT with the existing approaches, i.e., TAC and FaCE. We fix the main memory size and vary the flash memory size to evaluate the total I/O time on each trace. The results are illustrated in Figure 6, where the horizontal axis stands for the ratio between the flash and the memory. With the increase of this ratio, the total buffer size of the system also increases, as we consider the flash a second-level buffer; consequently, the total I/O time decreases for all the approaches. However, our approach is better than the other approaches in most of the cases and achieves up to 50% speedup against the competitors. On the TPC-B trace, when the flash/memory ratio is low, TAC yields better performance than the other approaches. TPC-B is a trace with a stable pattern; TAC can precisely detect the hottest pages and store them in the flash on this kind of workload, which shows its superiority especially when the buffer is small. However, the performance of TAC degrades very quickly when the size of the flash increases. The reason for this result is that TPC-B is a write-intensive trace, with a write ratio of around 20% as shown in Table 3; TAC adopts the flash as a write-through cache, and when the flash size is large, the drawback of this policy becomes more obvious, which also verifies the results in [8]. FaCE and HAT outperform TAC as the flash/memory ratio increases, and our HAT strategy steadily outperforms FaCE, as HAT has a better hot page detection mechanism. The results on TATP in Figure 6 (b) show a similar trend to TPC-B. As an OLAP workload, TPC-H is read-intensive and includes 22 complex queries. Figure 6 (c) illustrates the performance comparison on the TPC-H trace.
As the access pattern varies among these 22 queries, the lines in this figure are not as smooth as on the TPC-B trace for all the approaches. Our approach HAT steadily outperforms TAC and FaCE, achieving up to 30% performance improvement. The temperature-based statistics become invalid when the access pattern changes; FaCE has performance similar to TAC, as the change of access pattern also affects the page management in FaCE. The workload size of TPC-H is larger than those of TPC-B and TATP, which validates the efficiency of our approach on various data sizes. The making-Linux-kernel process compiles the source code to object code, which is then linked into executable files; thus this process contains a large number of read operations and few write operations. As shown in Table 3, the write ratio is very low in the MLK trace, and most of the write operations are sequential. As the access pattern of the MLK trace is also not stable, HAT is always better than TAC. Although FaCE uses the second chance mechanism, its hot page detection is still inefficient and cannot learn frequently accessed data well.

Fig. 6  I/O performance comparison on traces with various flash/main memory ratios: (a) TPC-B; (b) TATP; (c) TPC-H; (d) MLK

Hence, a large number of cold pages are flushed from main memory to flash, which wastes the flash capacity. Our approach has an effective hot page detection mechanism, so HAT avoids this drawback and shows better performance.

The total hit ratio of memory and flash is illustrated in Figure 7. Our approach HAT shows a comparable or better buffer hit ratio than the others in all the cases. The locality of TPC-B is very strong, as the buffer hit ratio is larger than 90% for all the approaches even though the total buffer size is very small. When the buffer size is low on this high-locality trace, TAC shows the highest hit ratio, since TAC records the temperatures of pages and leverages the long history to detect the exact hotness of a page, which better manages the precious buffer resource. HAT records less historical access information than TAC and thus loses some accuracy in detecting the very hot pages; but when the buffer size is high, HAT shows its superiority. FaCE is not skilled at hotness detection and thus performs the worst. Compared with the results in Figure 6 (a), TAC has a higher hit ratio than FaCE but also a higher I/O cost; this is caused by the write-through cache design of TAC, as the write ratio of TPC-B is high. The buffer hit ratios on the other workloads are consistent with the total I/O cost results shown in Figure 6 and can be explained accordingly.

To further investigate the behavior of these approaches, we list the numbers of read and write operations on TPC-H in detail in Table 5, enlarging the sizes of memory and flash. Our algorithm yields better performance than the others in almost all the cases, which is mainly owing to the smallest number of disk reads in our approach. When the total buffer size is small, HAT has more flash reads but fewer disk operations, which means our algorithm manages the flash more efficiently than the others. TAC has the fewest flash write operations: TAC adopts temperature to determine the page placement, and the temperature of a page is very stable, hence the page replacement on flash is low. This low replacement on flash makes TAC react slowly to workload changes. FaCE keeps all the pages replaced out of main memory on the flash, and hence has the most flash write operations, especially when the main memory size is small and more memory cache misses happen. When the buffer size is very large, i.e., 3.2G memory and 16G flash in the last case, all the competitors yield similar performance, as almost all the visited pages can be buffered. Note that the overall data size of TPC-H is around 17.5G, as shown in Table 3.

Fig. 7  Total buffer hit ratio on traces with various total buffer sizes: (a) TPC-B; (b) TATP; (c) TPC-H; (d) MLK

We also replay the traces on real SSD and hard disk devices with the different hybrid storage management approaches, as illustrated in Figures 8 and 9. As tests on real devices fluctuate, we conduct each evaluation three times and present the average time. HAT is better than the competitors in all the tests. Although FaCE incurs more I/O operations on TPC-H according to Table 5, it performs better on real devices; this is because FaCE turns the flash I/O accesses into sequential accesses, which better fit the flash device. The performance difference between TAC and HAT is similar to the simulation results, and HAT outperforms TAC in all the cases.

Fig. 8  Performance on SSD devices

We further vary the flash/memory ratio and examine the total run time on the hybrid storage system in Figure 9. HAT has the lowest run time. The run time of TAC does not show an obvious decrease with the enlargement of the buffer size, which is due to its heavy writes to the disk. FaCE performs worse than our HAT approach although it is enhanced with sequential I/O accesses; the reason is that the hot detection capability of FaCE is weak, so it needs more buffer space to hold the hot pages.

We finally evaluate the computational cost of the different approaches; the results are illustrated in Figure 10. Note that the average cost for each access of the workload is proportional to the total computational cost. In this experiment, we use the same parameter setting as in Figure 8. In general, the computational cost grows in proportion to the number of page references given in Table 3.

Table 5  Read and write counts on TPC-H for different algorithms

Main/Flash Size  Algorithm  Flash Read  Flash Write  Disk Read  Disk Write  Total I/O Cost
400 MB/2GB       FaCE       7,562,176   17,533,638   10,372,…   …           …,190,…
                 TAC        1,346,622   1,787,085    9,510,…    …           …,660,434
                 HAT        1,720,603   2,256,064    9,189,…    …           …,152,433
800 MB/4GB       FaCE       5,174,275   11,580,971   8,175,…    …           …,921,…
                 TAC        3,652,386   1,846,571    7,005,…    …,251       98,281,977
                 HAT        4,622,445   2,311,450    6,037,…    …,380       86,735,198
1.6 GB/8GB       FaCE       6,359,417   5,902,331    4,853,…    …,887       74,452,…
                 TAC        5,452,735   2,830,796    5,062,…    …,450       74,574,629
                 HAT        5,650,503   3,559,431    4,421,…    …,533       67,209,372
3.2 GB/16GB      FaCE       8,688,859   3,302,628    1,704,…    …,887       32,949,…
                 TAC        8,146,849   1,791,877    1,791,…    …,158       32,713,115
                 HAT        6,961,194   3,547,915    1,704,…    …,887       32,723,556

Fig. 9  Performance on TPC-B traces

Fig. 10  Computational time

TAC takes the most computational cost on the TPC-B, TATP, and MLK benchmarks; the reason is that TAC has to maintain the temperatures of pages, and this maintenance is expensive. However, HAT performs worst on the TPC-H benchmark: since TPC-H has the largest number of pages, the page reference queue in HAT becomes larger and leads to more maintenance cost. Compared with Figure 8, the computational time is one to two orders of magnitude less than the I/O time. Thus the I/O time is the key cost in hybrid storage management, and the computational time difference has little effect on the overall performance.

5 Conclusion

In this paper, we have presented a novel buffer management strategy, HAT, for flash-based hybrid storage systems. HAT utilizes a page reference queue to maintain the historical access information, and the queue itself is divided into a hot region and a warm region. Furthermore, we proposed to categorize the status of accessed pages into three levels, i.e., hot, warm, and cold. We exploited the "hotness aware hit" to process the buffer replacement, which migrates the relevant pages in the memory hierarchy according to the current page status and the hit position in the page reference queue. Compared with the existing methods, HAT can effectively buffer frequently accessed pages with a low computational cost and better adapt to workload changes. Experiments on different traces show that HAT achieves better performance than existing approaches.

Acknowledgements  This research was supported by NSFC under Grant No. and MIIT grant 2010ZX.

References

1. EE Times. SSDs: still not a solid state business.
2. Seagate. Momentus XT solid state hybrid drives.
3. Microsoft. Windows ReadyBoost.

Yanfei Lv is a staff member of the National Computer Network Emergency Response Technical Team/Coordination Center of China. He obtained his B.Sc. from Northeastern University in 2006 and his Ph.D. from Peking University in 2013. His research interests include flash-based databases, Hadoop and big data.

Dr. Bin Cui is a professor in the School of EECS, Peking University. His research interests include database performance issues, query and index techniques, Web data management and data mining.
He has served on the technical program committees of various international conferences, including SIGMOD, VLDB and ICDE. He currently serves on the editorial boards of the VLDB Journal, TKDE, DAPD, and Information Systems.

Jing Li is currently a Ph.D. student in the Department of Computer Science and Engineering, University of California, San Diego. Prior to that, he obtained his bachelor's degree from Peking University. His research interests include databases, architecture and mobile computing.

Xuexuan Chen is a software engineer at Google Switzerland working on search ads quality. He obtained his B.Sc. and M.Sc. from the Department of Computer Science, Peking University, in 2010 and 2013, respectively. From 2008 to 2013, his research focused on flash-based database systems, especially performance evaluation, buffer management algorithms, and index structures for relational database systems on top of flash-based SSDs.
