Block Recycling Schemes and Their Cost-based Optimization in NAND Flash Memory Based Storage System

Size: px

Start display at page:

Download "Block Recycling Schemes and Their Cost-based Optimization in NAND Flash Memory Based Storage System"

Louise Thompson
6 years ago
Views:

1 Block Recycling Schemes and Their Cost-based Otimization in NAND Flash Memory Based Storage System Jongmin Lee School of Comuter Science University of Seoul Sunghoon Kim Center for the Info. Security Tech. Korea University Hunki Kwon School of Comuter Science University of Seoul Choulseung Hyun School of Comuter Science University of Seoul Seongjun Ahn Software Laboratories Samsung lectronics Co. Jongmoo Choi Division of Information and CS Dankook University Donghee Lee * School of Comuter Science University of Seoul dhl_exress@uos.ac.kr Sam H. Noh School of Comuter and Information ngineering Hongik University Samhnoh@hongik.ac.kr ABSTRACT Flash memory has many merits such as light weight, shock resistance, and low ower consumtion, but also has limitations like the erase-before-write roerty. To overcome such limitations and to use it efficiently as storage media in mobile systems, Flash memory based storage systems require secial address maing software called the FTL (Flash-memory Translation Layer). Like cleaning in Log-structured file system (LFS), the FTL often erforms a oeration for block recycling and its efficiency affects the erformance of the storage system. To reduce the block recycling costs in NAND Flash memory based storage, we introduce another block recycling scheme that we call migration. Our cost-models and exerimental results show that cost-based selection of or migration for each block recycling can decrease block recycling costs and, therefore, imrove erformance of Flash memory based storage systems. Also, we derive the macroscoic otimal migration/ sequence minimizing block recycling costs for each migration/ combination eriod. * Corresonding Author ermission to make digital or hard coies of all or art of this work for ersonal or classroom use is granted without fee rovided that coies are not made or distributed for rofit or commercial advantage and that coies bear this notice and the full citation on the first age. To coy otherwise, or reublish, to ost on servers or to redistribute to lists, requires rior secific ermission and/or a fee. MSOFT 7, Setember 3-October 3, 27, Salzburg, Austria. Coyright 27 ACM /7/9...$5.. xerimental results show that the erformance of Flash memory based storage can be further imroved by the macroscoic otimization than the simle cost-based selection. Categories and Subject Descritors C.5.3 [Microcomuters]: ortable Devices; D.4.2 [Oerating System]: Storage Management Secondary storage General Terms Design, erformance, xerimentation Keywords Flash memory based storage system, oeration, migration oeration, FTL (Flash-memory Translation Layer) 1. INTRODUCTION Flash memory has been used as storage media in mobile devices such as cellular hones and DAs and is now to be used in ersonal comuters in the name of solid state disks (SSD). Flash memory has advantages over conventional magnetic disks in terms of weight, shock resistance, and ower consumtion. However, it also has some inherent limitations such as the erase-before-write requirement and different unit sizes for read/write and erase oerations. Therefore, secial address maing software called the Flashmemory Translation Layer (FTL) was introduced into Flash memory based storage systems to overcome these limitations [6]. The FTL is a software layer that emulates an array of logical sectors 174

2 over the Flash memory media by translating a logical sector address to a hysical data location on Flash memory. When a read request is given with a sector address, the FTL searches a ma to find the location of the sector data on Flash memory and, if the search is successful, it reads and delivers the sector data. In case of a write request, the FTL allocates a new emty sace for that sector and writes the sector data on it. It then modifies the maing information for the sector. Overall, the oerations of the FTL is similar to Log-structured file system (LFS) [16] in that it remas the sector location to a new area for each write request and sace need to be reclaimed via a sace recycling scheme [13]. Secifically, in NAND Flash memory, erase units called blocks need to be recycled and, henceforth, we call the sace recycling scheme in NAND Flash memory FTL as a block recycling scheme. This block recycling scheme in NAND Flash memory FTL is also often called a oeration because it collects sectors scattered on two or more blocks into a block and the other blocks are reclaimed. In this aer, we assume that the FTL uses the oeration for its recycling scheme. Like the LFS where its erformance deends on the efficiency of the sace recycling scheme, the block recycling efficiency of the FTL, secifically the efficiency, affects the erformance of Flash memory based storage systems. Moreover, its efficiency varies according to the data write attern as can be similarly observed with the LFS. If data were written sequentially, the oeration can reclaim a whole block with low cost and, conversely, for randomly or reeatedly written data the block reclaiming cost is quite exensive. Unfortunately, we can observe both sequential and reeated write atterns in file system requests and, thus, we need to find an efficient block recycling scheme for reeated write as well as for sequential write. To imrove block recycling efficiency for the reeated write attern, we introduce a new block recycling scheme, which we call the migration oeration. Secifically, the migration oeration can reduce the block recycling costs for reeated write atterns that forces the costly oeration to reclaim a block. Also, we resent cost models of the and the migration oerations that enable the FTL to select a low cost oeration between them when it comes to recycle a block. xeriments with the ostmark benchmark and embedded system workloads show that the costbased selection of migration or oeration reduces block recycling costs of FTL and, as a result, imroves the erformance of Flash memory based storage. During the exeriments, we found the existence of a macroscoic otimal migration/ sequence that minimizes block recycling costs for each migration/ combination eriod. xerimental results of the benchmark and the workloads show that the erformance of Flash memory based storage can be further imroved with the macroscoic otimization than the simle cost-based selection. This aer is organized as follows. In the next section, we describe characteristics of Flash memory and related works. In Section 3, we rovide basic knowledge regarding the block recycling scheme, secifically the oeration in FTL. In Section 4, we introduce the migration oeration and then derive the cost models of the migration and oerations. In Section 5, we resent a macroscoic otimization that minimizes block recycling costs for each migration/ combination eriod. Section 6 resents the exerimental results of the cost-based selection and the macroscoic otimization, and finally we conclude this aer in Section FLASH MMORY CHARACTRISTICS AND RLATD WORKS NAND Flash memory has unique characteristics as follows. NAND Flash memory consists of the same size blocks, each of which in turn consists of the same size ages. The block size and the age size are tyically 16 to 256 KB and.5 to 4 KB, resectively. Read/write data from/to NAND Flash memory is erformed in age units. SLC (Single Level Cell) Flash memory tyically takes 25 μs for age read and 3 μs for age write, resectively, while for MLC (Multi Level Cell) Flash memory it takes considerably longer for both age read and age write oerations. Data in Flash memory, once written to a age, cannot be udated in lace. This means that to udate a age without an FTL the block containing the age must be erased and that all ages in the block must be written again along with the new data. Such an erase oeration is erformed in block units and tyically takes 2 ms in SLC Flash memory and 1.5 ms in MLC Flash memory, resectively [1, 2]. According to their maing unit, tyical FTLs can be divided into two categories, a age maing FTL and a block maing FTL. A age maing FTL calculates a logical age number with a requested sector number and then translates the logical age number to a hysical location (a age in a block) through a ma. In case of block maing FTL, a hysical block has a fixed number of sectors in ordered manner on its ages. When a sector read/write request is given, a block maing FTL calculates a logical block number by dividing the requested sector number with the number of sectors in a block and then translates the logical block number to a hysical block number through a ma [4, 1, 14]. Then, the FTL can easily find a sector location within the hysical block because sectors are stored at that block tyically in ascending order. age maing FTLs have been used in low-caacity NOR Flash memory cards while block maing FTLs have been used in most high-caacity NAND Flash memory cards [3, 5] because the ma size for block maing is much smaller than for age maing. As mentioned earlier, erformance of a age maing FTL deends on sace recycling efficiency just as the erformance of LFS deends on cleaning efficiency [13]. Also, as erformances of LFS and LFSlike Flash memory file systems such as JFFS [17] and YAFFS [7] have been imroved by searating hot and cold data, sace recycling efficiency of a age maing FTL could be imroved by searating hot and cold data [8, 9]. Block maing FTLs have some advantages in that the ma size is small and overall oeration is simle. Also, HW/SW co-design of frequent oerations can boost the erformance of an FTL as we can see in solid state disks. However, the erformance of a block maing FTL is degraded in case of random and reeated write atterns. To overcome this roblem, Kang et al. roosed a hybrid scheme in which a maing unit is a suer block (a grou of logically adjacent blocks) while age maing is used inside a suer block [12]. Also, Lee et al. roosed another hybrid aroach called FAST (Fully Associative Sector Translation) [15], where the sector maing (similar to age maing) was introduced restrictively to the write buffer area (Log blocks described in Section 3) while the maing unit for the data area is still in blocks. xerimental results of the FAST scheme show that the comlementary sector maing is helful in imroving the erformance of Flash memory storage, secifically, in case of random and reeated write atterns. However, these hybrid schemes require additional imlementations of age maing features and garbage collection mechanisms for 175

3 age maing areas aside from the original block maing FTL. In contrast, the migration oeration roosed in this aer can be imlemented with minor modifications of the original oeration, but the imrovement in erformance is limited to reeated write atterns only. ventually, the FTL designers will have to choose between the hybrid schemes or our aroach to imrove erformance of non-sequential write atterns based on the given resources and erformance requirements, 3. FLASH-MMORY TRANSLATION LAYR AND MRG ORATION As mentioned earlier, an FTL makes Flash memory aear to the uer file system layer like a virtual block device. For this urose, the FTL creates virtual sectors on Flash memory and manages them so that they seem to be udated in lace. Also, the FTL reclaims blocks with a block recycling scheme when it needs free sace for virtual sector udates. Before we introduce the new block recycling scheme that we call migration, we will exlain an existing bock recycling scheme called the oeration. In exlaining these FTL oerations, we assume that the FTL uses a block maing scheme in translating the logical address to a hysical address. Though the commercial FTL used in our exeriments has sohisticated features such as multi-level maing, wear-leveling, and other erformance and reliability enhancement schemes, those features do not conflict with the fundamentals of the block recycling schemes and the cost functions described in this aer. As existing data in a Flash memory block cannot be udated in lace, FTL redirects a sector write request to an unused age in a temorary block which we call henceforth a log block. In Fig. 1(b), reeated write requests to sectors 8 and 9 are redirected to ages in a log block. Subsequent write requests will eventually consume all ages in the log block. Then, a oeration will need to be erformed to reclaim a log block to service subsequent write requests. A oeration is comosed of erasing a new emty log block and coying all valid sectors from either the old log block or the old data block to the new emty log block (Fig. 1(c)). In the examle in Fig. 1(c), after coying all sectors, the new log block becomes the data block for sectors 8~11 and both the old data block and the old log block become free log blocks that will be used for subsequent write requests or later oerations. Last of all, the relevant ma is udated to reflect the new status of the data and log blocks. If all sectors 8~11 are written sequentially in the log block, an otimized form of the oeration, sometimes called the switch, is ossible. For switch, the FTL, without erasing a new log block and the coying of ages, simly udates the ma to designate the log block as the new data block for sectors 8~11, and the old data block for those sectors is designated a free log block. With this otimization, block recycling costs for sequential write atterns become much lower than those for random or reeated write atterns that require the full rocedure of erasing a block and coying all ages as deicted in Fig. 1. By the way, we can easily observe not only sequential write atterns, but also reeated such atterns from the uer layer file system. In Fig. 2, we see a mixture of two tyes of write requests, sequential write of file data and reeated udate of metadata such as FAT and directories. We think that these mixed write atterns are common in many file systems and that an otimization is also needed for this reeated write besides the already efficient sequential write. Sector number Figure 1. Merge oeration in FTL NIKON Trace Access attern (Write) Write command Figure 2. Write attern of a FAT file system running on Flash memory based storage of a Nikon camera 4. MIGRATION ORATION AND COST MODLS This section describes the new block recycling scheme for reeated write atterns that we refer to as the migration oeration. Also, we derive the cost models for the and migration oerations that are used by the FTL to select the more cost effective method when reclaiming a block. If writes to some sectors are reeatedly requested, then those sectors will be redirected to a log block until all ages in the log block are consumed. When all the ages have been used, as we know, a oeration must be erformed to generate a free block and, after the, all ages in the new free block can be used again 176

4 for subsequent write requests. Observe in Fig. 3(b), that only two ages are written reeatedly in the log block. When a relatively small number of ages are written reeatedly in a log bock, we have an oortunity to aly a migration oeration and eventually, to reduce block recycling costs. Unlike the oeration that erases a new log block and coies all ages from the old data block and/or the old log block, the migration oeration erases a new log block and coies only the valid ages, two ages in Fig. 3(b), from the old log block to the new log block. The old log block, then, becomes a free block, while the new log block becomes a write buffering area for sectors 8~11 to which all incoming write requests for those sectors will be redirected. for MLC Flash memory chis, this would be done via two oerations, that is, a age read and a subsequent age write. To calculate the cost er available age, let us begin from when the oeration has finished in Fig. 1(c) and a write request for sector 8 follows it. To service the write request, the FTL allocates and erases an emty log block with cost C. Then, all subsequent write requests are redirected to one of the N ages in the newly allocated log block. Later when all the ages are consumed, a oeration is erformed by erasing a new log block with cost C and coying all N ages from either the old data block or from the old log block with cost N C. Then, the relevant ma is udated and a eriod of block life cycle (interval between successive oerations) ends. If a write request for sector 8~11 follows, then a new eriod of block life cycle begins again and the FTL will allocate an emty log block to serve that request. In total, the FTL ays cost, C, to acquire N available ages for each eriod. Summing u, we can define cost as 2C + N C and its benefit as N (available ages in the new log block) for each eriod. Hence, the cost er available age, W, is as follows; C 2C + N C W N N To calculate migration cost er available age, let us again begin from when the migration oeration has comleted in Fig. 3(c) and a write request for sector 8 follows it. To service the write request, the FTL redirects the write request to one of N - unused ages in the log block. After N - write requests are served with that log block, either a migration or a oeration should be erformed. Let us assume that a migration oeration is referred. Then it will erase a new log block with cost C and coy valid ages from the old log block to the new log block with cost C. In total, FTL ays migration cost, C mig, to acquire N - available ages for each eriod, where the migration cost, C mig, is C +C. Hence, the migration cost er available age, W mig, is as follows. W mig C + C N Figure 3. Migration oeration Assume that the number of ages in a block is N and the number of valid unique ages in the old log block is (2 in Fig. 3(b)). Then, the migration oeration coies ages from the old log block to a new log block and, as a result, N - ages are available in the new log block after migration. Generally, the oeration ays a high rice, but roduces all N ages available in the new log block, while the migration oeration ays a lower rice, but roduces a relatively small number of ages that are available in the new log block. Therefore, a cost-benefit analysis is required about which oeration is more cost-efficient for each block recycling. To identify the more cost-efficient oeration between the migration and oerations, we derive cost-er-benefit models for both oerations. Let us assume that C is the block erase cost (or time) and C is the age coy cost (or time) from a block to another. Many contemorary SLC Flash memory chis rovide a coyback oeration, which coies a age from a block to another through an internal buffer in the Flash memory chi. For those chis, a age coy could be done with a single coyback oeration. Otherwise, as These cost models for migration and oerations do not consider the ma udate cost because it is almost identical for both oerations and makes u only a small ortion in the total block recycling cost. For examle, our FTL is highly otimized to gather in a single ma age all information that is to be modified during block recycling and, thus, it writes a single ma age after block recycling. With the cost models, an FTL can select a cost-effective oeration for each block recycling. Note that the migration cost for an available age, W mig, varies according to, the number of coied ages during migration. At a certain value of, W mig becomes equal to W and we call it the equilibrium. The existence of an equilibrium suggests that the migration oeration be referred when is smaller than equilibrium and vice versa. 2C + N N C N equilibrium 2 C + C N 177

5 5. MACROSCOIC OTIMAL MIGRATION/MRG SQUNC We imlemented the cost-based selection between and migration oerations and measured the erformance with the ostmark benchmark. However, the erformance imrovements were much lower than we exected as will be shown later in Section 6. Also, better erformance was observed by alying a oeration even when was much smaller than the equilibrium. These observations lead us to recognize the existence of a macroscoic otimal migration/ sequence that minimizes block recycling costs for each migration/ combination eriod. requests with those ages. Again if all ages in the log block are consumed, the FTL erforms the second migration oeration and roduces N - 2 ages available in the new log block. Similarly, migration oerations are erformed a total of n times and finally, a oeration follows to end the macroscoic eriod. After the oeration, all N ages are available in the new log block and another overall macroscoic migration/ sequence begins. Now we define the macroscoic otimization as finding an otimal migration/ sequence minimizing total block recycling costs for the macroscoic eriod. Secifically, when function n is given, the macroscoic otimization is identifying how many times the migration oerations should be alied before the concluding oeration by which the total block recycling cost becomes minimal. 7 Number of coied ages Number of successive migration oerations Figure 4. Number of coied ages during migration oeration Before we introduce the macroscoic otimization, we need to define n, a function for value. Assume that a new eriod begins after a migration that coied ages from an old log block to a new log block. In the next eriod, if only ages that already reside in the log block are written again, then the next migration will still coy ages. On the other hand, if a write request for a new sector that does not exist in the log block arrives and is written to the log block, then the next migration will coy +1 ages. In every case, the number of coied ages never decreases until a oeration is erformed; in fact, also it often increase as new sectors enter the log block. To confirm this analysis, we observed with the ostmark benchmark the number of coied ages of a block containing FAT and root directory sectors. The results, deicted in Fig. 4, show that the number of coied ages during migration increases almost linearly for successive migrations. This suggests a non-decreasing function n with an increasing rate α, where n is the number of successive migrations since an originating oeration. Let us now assume that a macroscoic eriod begins right after a oeration and it ends by n erforming α n another oeration. Also, we assume that migration oerations are alied n times until the ending oeration. Then, the objective becomes finding the n value such that the sequence minimizes the total block recycling costs for the macroscoic eriod, as described below. Let us now elaborate using Fig. 5. In Fig. 5, N ages are available in a log block at the beginning of the macroscoic eriod. If all N ages are consumed, the FTL erforms a migration oeration that coies 1 ages. This first migration roduces N - 1 ages available in the new log block, and now the FTL can serve N - 1 write Figure 5. Migration/ sequence To figure out the otimal number of migration oerations, we need to change some cost functions. With given n, C mig n, the n th migration cost, can be defined as follows (see Fig. 3). n mig C C + C α n C + C n After the (n-1) th migration, N - n-1 ages are available in the new log block and d n refers to the number of available ages after the (n- 1) th migration. d n Now we can define W(n), the migration/ cost er an available age, for T n, a macroscoic eriod of migration/ sequence, as follows; W ( n) n k 1 C k mig n+ 1 k 1 As the macroscoic otimization of migration/ sequence is finding n that minimizes W(n), we differentiate W(n) to roduce W (n) and solve for n that makes W (n ) C + C 2 2 W'( n) 1 In order to differentiate, we assume that n and n are real. N n 1 N α( n 1), n > 1 + C d k 2 N α n + α C 2 α C n α n + N 2 ( C N + C ) α C α n α n + N α n + N C n + C 1 α n + N 2 N + α C N C N C 178

6 n 2 2 ( C N + C ) α ± ( C N + C ) α ( C + C N ) α ( C N α + α C + 2 N C 2 N C ) ( C + C N ) α This solution means that, when α is given, we can minimize the block recycling cost of a macroscoic eriod by erforming migration oerations n times and then, subsequently erform a oeration. 6. XRIMNTAL RSULTS OF MIGRATION/MRG SQUNC We imlemented the migration oeration by simly modifying the existing oeration in our FTL that has been used commercially for M3 layers and camcorders. As a result, the FTL could select a cheaer oeration between migration and for each block recycling. Also, the macroscoic otimization was imlemented by estimating the increasing rate α of each log block with accumulated history of. With the increasing rate α, the FTL calculates on-line the otimal number of migrations before a for each log block. In our exeriments, a benchmark or a workload-generating rogram runs on an MS-DOS FAT comatible file system and again the file system runs on our FTL. In turn, the FTL runs on a Flash memory emulator. The Flash memory emulator mimics the oerations of Flash memory chis and also has an additional feature to calculate the elased time of Flash memory oerations using time information rovided in Flash memory chi datasheets. As we are interested in suorting high-caacity MLC Flash memory chis, tyical Flash memory oeration times secified in an MLC Flash memory datasheet were used in the exeriments (Table 1). However, because the emulator ignores many chores encountered in real systems, the elased time of the emulator may differ from that of real systems. To validate the elased time of the Flash memory emulator, we comared it with a measured elased time of a real embedded board that has a 2 Mhz S3C241 (ARM92T core) rocessor and a 128 Mbyte NAND Flash memory chi. 2 As the elased time of the real board includes both Flash memory oeration times and code execution times of test alications, the file system, and the FTL, we exected that the elased time of the real board would be larger than that of the emulator. Indeed, file read time of the board was greater than the emulator time as shown in Fig. 6. Surrisingly, however, file write time of the board exactly matched the emulator time. The reason was that actual age read time of the Flash memory chi was closely matched with that secified in the datasheet, but actual age write and block erase times were much smaller than those secified in the datasheet. As a result, in case of the file write test, the real board consumed less time for Flash memory oerations than the emulator, but consumed additional time for code execution. These effects comensated for each other to generate almost the same file write time as shown in Fig. 6. From these observations, we can conclude that the emulator roduces very realistic results for write oerations, but underestimates for read oerations. 2 The Flash memory emulator used Flash memory oeration times secified in the datasheet excet for the bus transfer time, which was measured in a real board and used in the emulator. lased Time (sec) Table 1. Tyical Flash memory oeration times [1] Flash memory oeration xecution time (ms) age read (2112 bytes).113 age write (2112 bytes) 1.13 age coy (2112 bytes) Block erase 1.5.5M 1M 1.5M 2M 2.5M 3M 3.5M 4M File size Write (Board) Write (mulator) Read (Board) Read (mulator) Figure 6. lased time comarisons of emulator and real board In the exeriments, we used two test rograms, the ostmark benchmark and embedded system workloads that had been used by Gal and Toledo [11]. With these test rograms, we exerimented with the following four schemes, that is, 1) only for block recycling, 2) cost-based selection between the migration or the oerations, 3) cost-based selection with eriodic, and 4) macroscoic otimization. In resenting the exerimental results, these four schemes will be denoted as -only, +migration, +migration (eriodic ), and +migration (otimization), resectively. For the third scheme, the FTL counts the number of consecutive migration oerations and if the count exceeds a redefined limit value, N /2 in our exeriments, then it alies a oeration even though the cost model recommends erforming a migration oeration. In other words, the +migration (eriodic ) is a blind version of +migration (otimization) because the former does not estimate the otimal number of migrations before, but simly guesses it. Like the LFS, the erformance of a age maing FTL deends on the remaining free sace. Also, the erformance of the FAST scheme, which adots sector maing in log blocks, varies according to the number of log blocks. However, the erformance of a block maing FTL is not directly related to the number of log blocks excet for when the number of log blocks is extremely small. In our exeriments, only eight blocks were used as log blocks comared to the thousands of data blocks, and the erformance was insensitive to the number of log blocks in this range. 179

7 6.1 xeriments with ostmark benchmark The ostmark benchmark allows us to generate various workloads by setting arameters for behavior of transactions, number of transactions, numbers of files/directories to be created/deleted, file size, etc. We set those arameters to generate two different workloads that we will denote as directory/small file workload and large file workload in our exeriments. In the former workload, the ostmark benchmark executes transactions that create/delete 5 directories and files of 512 bytes. In the latter workload, it executes transactions that create/delete 1 Mbytes files. Flash Memory Oerations lased Time (sec) only +migration +migration(eriodic ) +migration(otimization) Number of Transactions Figure 7. lased times of Flash memory oerations under directory/small file workload of ostmark benchmark Fig. 7 shows that +migration imroves erformance about 5% over original -only scheme. Also, +migration (eriodic ) and +migration (otimization) imrove erformance u to 26~27% over only scheme. In +migration scheme, the FTL executes the migration oeration when was smaller than 64 (N /2), the equilibrium ; otherwise it executes the oeration for block recycling. However, +migration (otimization) referred to migration even when was much smaller than 64. For examle, in the ostmark benchmark, the increasing rate α of a log block for FAT and directory sectors was estimated to.1 and n to 49, resectively, and the total block recycling costs became minimal by alying successively migration oerations 49 times and then alying a oeration and, at that time,, the number of coied ages during migration, was about 5. To investigate the benefits of the migration/ sequence in detail, we analyzed the costs er available age of the only scheme, W, and the migration/ sequence, W(n) where n is the number of successive migration oerations before ( α.1). In Fig. 8, W(n) lunges down as migration oerations are alied for several times since the originating oeration. It then becomes minimal when n is 49 and then increases slightly as n increases beyond the minimal oint. At the minimal oint, the migration/ sequence cost is only 5% of the -only scheme for an available age. Mig/ cost er available age W W( n) Number of successive migrations (n) Figure 8. Cost er available age when α.1 W reresents cost-er-age of -only scheme and W(n) reresents cost-er-age of migration/ sequence when migrations are erformed successively n times before a oeration is alied. In our exeriments, the +migration (otimization) calculates the otimal number of consecutive migrations on-line by estimating the increasing rate α while the +migration (eriodic ) fixes it with a redefined eriod value, N /2. In Fig. 7, surrisingly, the erformance of +migration (eriodic ) is only a little better than +migration (otimization). We had chosen the eriod value, 64 (N /2), arbitrarily, but further investigation showed that the chosen value was almost the off-line otimal for that workload. Fig. 9 shows the erformance of +migration (eriodic ) for various eriod values from 2 to 12. In the figure, we observe that the off-line otimal erformance of migration/ sequence is around the eriod value 64 and it is better than that of the on-line +migration (otimization). However, the otimal eriod value may differ from workload to workload and the +migration (eriodic ) scheme would generally be blind to the workload characteristics. In Section 6.2, indeed, you can find that the blind +migration (eriodic ) erforms worse than the workload adative +migration (otimization) for the embedded system workloads in which the redefined eriod value is not otimal any more. However, we should note that the +migration (eriodic ) scheme can be an alternative to +migration (otimization) if we already know the workload characteristics or if we want a tradeoff between imlementation overhead and erformance. 18

8 Flash Memory Oerations lased Time (sec) eriod Figure 9. lased times of +migration (eriodic ) scheme for various eriod values Fig. 1 shows the erformance of the four schemes under large file workload of the ostmark benchmark. Under the large file workload, three schemes including migration erformed better than the -only scheme, but the erformance imrovement rates were somewhat smaller than those under the directory/small file workload. This is because most write activity of large file workload is sequential and that is not the target of the migration oeration. However, some metadata such as FAT and directories are reeatedly modified while file data are sequentially written and the migration oeration imroves erformance for those reeated udates of metadata as can be observed in Fig. 1. Flash Memory Oerations lased Time (sec) only +migration +migration(eriodic ) +migration(otimization) Number of Transactions Figure 1. lased times of Flash memory oerations under large file workload of ostmark benchmark 6.2 xeriments with embedded system workloads mbedded system workloads are a collection of Flash memory references of three different embedded devices such as a fax, a cellular hone, and an event recorder. Fig. 11 shows the erformance of the four schemes under these workloads. In the figure, all workloads show a similar attern in that erformance increases more in the order of -only, +migration, +migration (eriodic ), and +migration (otimization) excet for the one case with the event recorder where -only erforms better than +migration. Secifically, +migration (otimization) increases erformance u to 5% over the -only scheme under the fax workload, 22% under the cellular hone workload, and 14% under the event recorder workload, resectively. Flash Memory Oerations lased Time (sec) only +migration +migration(eriodic ) +migration(otimization) Fax Cellular hone vent Recorder Workloads Figure 11. lased times of Flash memory oerations for the embedded system workloads Under the event recorder workload, +migration erformed slightly worse than -only, and we investigated the reason behind this. Before the exeriments, we formatted the file system and all the FAT sectors were written during the format. As a result, the log block for FAT sectors had already almost N /2 unique ages before the exeriments. As the event recorder workload modifies only a small set of data reeatedly, only a few ages in the log block were reeatedly udated, but each migration coied almost N /2 ages all the time and roduced N /2 available ages in the new log block. As only half of the ages are available in the new log block, block recycling incurred so frequently as to increase the burden of ma udates and, as a matter of course, this is the worst case encountered by the +migration scheme. If a oeration was executed just once after the format, however, reetition of this worst case migration could be avoided. With the oeration, the FTL would flush the already halffull log block and roduce a whole-emty new log block. Thereafter then, each migration would coy only a small number of ages modified since the. In this case, the coy overhead and block recycling frequency would be much smaller than the revious case. This examle exlains why the +migration (eriodic ) and the +migration (otimization), which occasionally flush cold ages from log blocks with the oeration, show better erformance than the +migration scheme that does not. Also, under the embedded system workloads, the +migration (otimization) erforms better than the +migration (eriodic ) because the former alies the oeration at the best time determined via best-effort, while the latter does it 181

9 in a blind manner. These exerimental results strongly suggest that a oeration be alied among successive migrations to flush cold ages from log blocks, secifically when the workloads change. 7. CONCLUSION As Flash memory has some inherent limitations, Flash memory based storage requires a software layer called the FTL that redirects modified data to a new location on Flash memory and recycles blocks with the so-called oeration. In this aer, aside from the existing oeration, we introduced another block recycling scheme that we call migration. xeriments with the ostmark benchmark and embedded system workloads show that a cost-based selection of migration or oeration for each block recycling can reduce block recycling costs and therefore imrove the erformance of Flash memory based storage systems. Also, reliminary exeriments suggested the existence of an otimal migration/ sequence minimizing block recycling costs and, indeed, exeriments with various workloads show that macroscoic otimization delivers much better erformance than the simle cost-based selection. Finally, it should be noted that, as the macroscoic otimization suggests analytically, a eriodic flushing of log blocks, that is, executing a oeration among successive migration oerations, is needed when combination of migration and is used for block recycling. In the future, we will investigate whether our scheme has a beneficial effect on other file systems such as the Linux xt3 file system. 8. ACKNOWLDGMNTS This work was artly suorted by the IT R&D rogram of MIC/IITA [26-S-4-1, Develoment of Flash Memory-based mbedded Multimedia Software]] and suorted in art by grant No. R from the Basic Research rogram of the Korea Science & ngineering Foundation. We would also like to thank rof. Sang Lyul Min of the Seoul National University for roviding us with the Nikon Camera trace and other valuable advise. 9. RFRNCS [1] 1G x 8Bit / 2G x 8Bit NAND Flash memory (K9L8G8UM) Data Sheets, Samsung lectronics, Co., 25. [2] 512M x 8Bit / 256M x 16Bit NAND Flash Memory (K9K4GxxxM) Data Sheets, Samsung lectronics, Co., 23. [3] CF+ and ComactFlash Secification Revision 3., ComactFlash Association, 24. [4] Flash-Memory translation layer for NAND flash (NFTL), M-Systems. [5] The MultiMediaCard System Summary, MMCA Technical Committee, 25. [6] Understanding the Flash Translation Layer (FTL) Secification, Intel Cororation, [7] YAFFS (Yet Another Flash File System) Secification Version.3, htt:// 22. [8] Chiang, M.-L., Lee,.C.H. and Chang, R.-C. Using Data Clustering to Imrove Cleaning erformance for Flash Memory. Software: ractice and xerience, 29 (3) [9] Chiang, M.-L., Lee,.C.H. and Chang, R.C., Managing Flash Memory in ersonal Communication Devices. in roceedings of the 1997 International Symosium on Consumer lectronics (ISC'97), (1997), [1] Gal,. and Toledo, S. Algorithms and Data Structures for Flash Memories. ACM Comuting Surveys, 37 (2) [11] Gal,. and Toledo, S., A Transactional Flash File System for Microcontrollers. in USNIX Annual Technical Conference, (25), [12] Kang, J.-U., Jo, H., Kim, J.-S. and Lee, J., A Suerblockbased Flash Translation Layer for NAND Flash Memory. in roceedings of the 6th ACM & I International conference on mbedded software, (Seoul, 26), [13] Kawaguchi, A., Nishioka, S. and Motoda, H., A Flash- Memory Based File System. in roceedings of the Winter 1995 USNIX Technical Conference, (1995), [14] Kim, J., Kim, J.M., Noh, S.H., Min, S.L. and Cho, Y. A Sace-efficient Flash Translation Layer for ComactFlash Systems. I Transactions on Consumer lectronics, 28 (2) [15] Lee, S.-W., ark, D.-J., Chung, T.-S., Lee, D.-H., ark, S. and Song, H.-J. A Log Buffer based Flash Translation Layer using Fully Associative Sector Translation. ACM Transactions on mbedded Comuting Systems, 6 (1). [16] Rosenblum, M. and Ousterhout, J.K. The Design and Imlementation of a Log-Structured File System. ACM Transactions on Comuter Systems, 1 (1) [17] Woodhouse, D. JFFS: The Journaling Flash File System Ottawa Linux Symosium,

An Efficient Coding Method for Coding Region-of-Interest Locations in AVS2

An Efficient Coding Method for Coding Region-of-Interest Locations in AVS2 Mingliang Chen 1, Weiyao Lin 1*, Xiaozhen Zheng 2 1 Deartment of Electronic Engineering, Shanghai Jiao Tong University, China