Performance Trade-Offs in Using NVRAM Write Buffer for Flash Memory-Based Storage Devices
IEEE Transactions on Computers, Vol. 58, No. 6, June 2009
Abstract—While NAND flash memory is used in a variety of end-user devices, it has a few disadvantages, such as the asymmetric speed of read and write operations and the inability to perform in-place updates. To overcome these problems, various flash-aware strategies have been suggested in terms of the buffer cache, file system, FTL, and others. Also, the recent development of next-generation nonvolatile memory types such as MRAM, FeRAM, and PRAM provides higher commercial value to Non-Volatile RAM (NVRAM). At today’s prices, however, they are not yet cost-effective. In this paper, we suggest the utilization of small-sized, next-generation
NVRAM as a write buffer to improve the overall performance of NAND flash memory-based storage systems. We propose various
block-based NVRAM write buffer management policies and evaluate the performance improvement of NAND flash memory-based
storage systems under each policy. Also, we propose a novel write buffer-aware flash translation layer algorithm, optimistic FTL, which
is designed to harmonize well with NVRAM write buffers. Simulation results show that the proposed buffer management policies
outperform the traditional page-based LRU algorithm and the proposed optimistic FTL outperforms previous log block-based FTL
algorithms, such as BAST and FAST.
Index Terms—Nonvolatile RAM, flash memory, write buffer, flash translation layer, solid-state disk, storage device.
algorithm. Generally, large sequential write operations induce switch merge operations, while random write operations induce full merge operations. Therefore, if random write operations occur frequently, the performance of the flash memory system decreases.

3 NVRAM WRITE BUFFER MANAGEMENT POLICIES FOR FLASH MEMORY

3.1 Design Considerations
Write buffer management schemes for hard disks have been developed over the past decade. According to those schemes, the performance of a write buffer management scheme for a hard disk depends on the following two factors:

- Total number of destages to the hard disk: The total number of destages to the hard disk is the number of write operations actually issued to the hard disk. As more write requests are accommodated by the write buffer, fewer requests are issued to the hard disk. The LRW (Least Recently Written page) scheme [3], [4], [5] exploits the temporal locality of the data access pattern to increase the write buffer hit ratio and decrease the number of destages to the hard disk.

- Average access cost of each destage: Since read and write operations in a hard disk show a symmetric operation speed, the access cost in a hard disk can be modeled as the sum of the seek time, rotational delay, and transfer time. By exploiting the spatial locality of the data access pattern, SSTF, LST [4], and CSCAN [5] attempted to decrease the seek time and rotational delay.

The stack model in [4] used the LRW and LST schemes simultaneously to exploit both temporal and spatial locality, and WOW [5] used LRW and CSCAN simultaneously to exploit both localities. These two factors also make sense when using NAND flash memory instead of a hard disk. First, the number of destages to flash memory should be decreased to increase the overall performance. To decrease the number of destages to flash memory, the write buffer hit ratio should be increased. Therefore, we can use traditional buffer management schemes that exploit the temporal locality of the data access pattern. However, the access cost factor makes it necessary to devise a novel write buffer management scheme. While the access costs for data blocks that are stored in physically different locations in a hard disk vary, the physical location of a data block in flash memory does not affect the access time to the block. Spatial locality is no longer an important factor for flash memory. Therefore, instead of the seek time and rotational delay, another factor should be considered to estimate the access cost for flash memory: the extra operations issued by the FTL.

As described in Section 2.3, the FTL issues extra read, write, or erase operations internally to efficiently manage the storage space in flash memory, and the number of those extra operations depends both on the data access pattern from the upper layer and the algorithm used for address mapping. Hence, consideration of the FTL algorithm is inevitable for designing a write buffer management scheme for flash memory. For example, since most FTL algorithms use block-level mapping, the write buffer management scheme should be designed to decrease the number of merge operations (each of which consists of multiple copies of valid data pages and a maximum of two block erase operations) that can be invoked internally while processing write operations.

To decrease the number of extra operations, the write buffer management scheme is required to 1) decrease the number of merge operations by clustering pages of the same block and destaging them at the same time, 2) destage pages such that the FTL invokes switch merge or partial merge operations, which have relatively low cost, rather than the full merge operation, which is very expensive, and 3) detect sequential page writes and destage those sequential pages preferentially and simultaneously.

Fig. 2 shows the necessity of block-level management of the write buffer. There are 10 pages in the NVRAM, and the flash memory consists of more than four data blocks, including blocks A, B, C, and D, and only two log blocks (L1 and L2). Each block in the flash memory consists of four physical pages. Data blocks A, B, C, and D contain the corresponding data pages: block A contains pages A1, A2, A3, and A4; block B contains B1, B2, B3, and B4; block C contains C1, C2, C3, and C4; and block D contains D1, D2, D3, and D4. The NVRAM is currently filled with pages A3’, A4’, B1’, B2’, C1’, D1’, A1’, D2’, A2’, and C2’ (each page is the newer version of the corresponding page in the flash memory), and among these, page A3’ is the oldest page (LRU page) and page C2’ is the newest page (MRU page). While the four data blocks in the flash memory are fully used, no page in the log blocks is used yet. When subsequent write requests are issued to the NVRAM, the write buffer manager selects victim pages according to the replacement algorithm and evicts them to the flash memory, one after another. The figure shows the sequence of operations to the flash memory when victim pages are evicted, assuming that BAST is used for the FTL algorithm.

Fig. 2. Comparison between page-level buffer management and block-level buffer management.

Assume that buffer pages are managed in a page-level scheme and the victim selection sequence is A3’, A4’, B1’, B2’, C1’, D1’, A1’, D2’, A2’, and C2’. In this case, the FTL writes pages A3’ and A4’ into log block L1 and B1’ and B2’ into log block L2. When page C1’ is evicted to the flash memory, since there is no remaining log block, the FTL merges (full merge) data block A and log block L1 into a new data block and erases A and L1 to acquire an empty log block, and then writes C1’ into the erased L1. At this time, the merge operation consists of two erase operations, four read operations, and four write operations.

When buffer pages are managed in a block-level scheme, pages in the NVRAM are clustered by their corresponding block number in the flash memory. Assume that the victim cluster selection sequence is Cluster A, Cluster B, Cluster C, and Cluster D. Since a page cluster is selected as a victim, all pages in the victim page cluster are evicted to the flash memory simultaneously. Hence, the sequence of evicted pages becomes A1’, A2’, A3’, A4’, B1’, B2’, C1’, C2’, D1’, and D2’. In this case, the FTL writes pages A1’, A2’, A3’, and A4’
into log block L1 and B1’ and B2’ into log block L2. When pages C1’ and C2’ are evicted to the flash memory, since there is no remaining log block, the FTL merges (switch merge) data block A and log block L1, erasing the old data block, and then writes C1’ and C2’ into the erased block, which now serves as the log block. At this time, the merge operation consists of only one erase operation.

In this manner, we can easily obtain the total number of extra operations when all pages in the NVRAM are evicted to the flash memory. Using the page-level management policy, 11 read, 11 write, and 5 erase operations are invoked, while 2 read, 2 write, and 2 erase operations are invoked using the block-level management policy. As we can see from this example, the block-level buffer management policy not only invokes relatively fewer merge operations than the page-level buffer management policy but also invokes switch merge or partial merge rather than full merge for its merge operations.
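As a minimal illustration of the clustering step in this example (the block geometry and page names follow Fig. 2; the grouping code itself is ours, not the paper's):

from collections import OrderedDict

def block_of(page_id):
    """Pages A1'..D4' belong to the block named by their first letter (Fig. 2 example)."""
    return page_id[0]

def block_level_eviction_order(buffered_pages):
    """Group buffered pages by flash block and destage whole clusters at once.

    buffered_pages is ordered from oldest (LRU) to newest (MRU). Clusters are
    emitted in order of first appearance, which for this example reproduces the
    Cluster A, B, C, D victim sequence used in the text.
    """
    clusters = OrderedDict()
    for p in buffered_pages:
        clusters.setdefault(block_of(p), []).append(p)
    order = []
    for pages in clusters.values():
        order.extend(sorted(pages))       # all pages of one block leave together
    return order

# NVRAM contents of the Fig. 2 example, oldest to newest.
nvram = ["A3'", "A4'", "B1'", "B2'", "C1'", "D1'", "A1'", "D2'", "A2'", "C2'"]
print(block_level_eviction_order(nvram))
# ["A1'", "A2'", "A3'", "A4'", "B1'", "B2'", "C1'", "C2'", "D1'", "D2'"]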
The block-level buffer management policy has one drawback: since it evicts multiple pages at a time even though only one page replacement is needed, the utilization of the NVRAM pages can be decreased, which results in a lower buffer hit ratio than that of the page-level buffer management policy. However, since the benefit from the reduced cost for extra operations under the block-level buffer management policy is much greater than this drawback, the block-level buffer management policy shows better overall performance than the page-level buffer management policy.

3.2 Write Buffer Management Policies
In this section, we propose four write buffer management policies, among which three are block-level buffer management policies and one is a page-level buffer management policy. The page-level buffer management policy is introduced only for comparison purposes.

We assumed that block-level address mapping is used in the FTL since page-level address mapping is not widely used in practical situations. Hence, data movement between the NVRAM and flash memory is done according to the block mapping algorithm in the FTL. Since the page size in large-block NAND flash memory is 2 KB, we assumed the page size in the NVRAM is also 2 KB. For block-level buffer replacement policies, we clustered pages in the NVRAM by their block number in the flash memory, and those page clusters are maintained in a linked list. In each page cluster, pages with the same block number in flash memory are linked together. The size of a cluster is defined as the number of pages in the cluster, which varies from 1 to 64. Fig. 3 shows the data structure of the write buffer for block-level buffer management policies.

Fig. 3. Data structure for block-level write buffer management.
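A minimal sketch of the Fig. 3 structure, assuming a recency-ordered Python dictionary stands in for the linked lists described above (class and method names are ours):

from collections import OrderedDict

PAGES_PER_BLOCK = 64     # large-block NAND with 2 KB pages, as assumed in the text

class PageCluster:
    """All buffered pages that map to one flash block (cluster size 1..64)."""
    def __init__(self, block_no):
        self.block_no = block_no
        self.pages = {}                   # page index within the block -> data

    @property
    def size(self):
        return len(self.pages)

class BlockLevelWriteBuffer:
    """Clusters kept in recency order; an OrderedDict stands in for the linked
    list of clusters in Fig. 3, and each cluster links its own pages."""
    def __init__(self):
        self.clusters = OrderedDict()     # block number -> PageCluster, LRU -> MRU

    def write(self, block_no, page_idx, data):
        cluster = self.clusters.pop(block_no, None) or PageCluster(block_no)
        cluster.pages[page_idx] = data
        self.clusters[block_no] = cluster  # re-insert at the MRU position

    def evict_lru_cluster(self):
        """LRU-C style victim: the least recently accessed cluster, as a whole."""
        _, cluster = self.clusters.popitem(last=False)
        return cluster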
3.2.1 Least Recently Used Page (LRU-P) Policy
In the LRU-P policy, the replacement unit is a page, and the least recently used (written) page in the buffer is selected as a victim. Since references to the NVRAM write buffer are made in page units, the LRU-P policy most precisely reflects the "hotness" of pages, which results in a relatively high page hit ratio in the buffer. To handle sequential page writes, LRU-P regards 64 or more consecutive page writes as a sequential page write and maintains those sequential pages in a sequential page list. Pages in the sequential page list are selected preferentially as victim pages, and if the list is empty, the least recently used page is selected as a victim.
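A rough sketch of the LRU-P bookkeeping, under the assumption that "consecutive page writes" means consecutively numbered logical pages (the 64-page threshold is from the text; the detection logic and names are our simplification):

SEQ_THRESHOLD = 64       # this many consecutive page writes count as sequential

class LRUPBuffer:
    """Illustrative only: a run of consecutively numbered writes of length
    >= SEQ_THRESHOLD is moved to the sequential page list, which is drained
    before the normal LRU list when victims are selected."""
    def __init__(self):
        self.lru = []                     # page ids, LRU -> MRU
        self.sequential = []              # pages recognized as sequential
        self._run = []                    # current run of consecutive writes

    def write(self, page_id):
        if self._run and page_id == self._run[-1] + 1:
            self._run.append(page_id)
        else:
            self._run = [page_id]
        if len(self._run) >= SEQ_THRESHOLD:
            for p in self._run:           # move the whole run to the sequential list
                if p in self.lru:
                    self.lru.remove(p)
                if p not in self.sequential:
                    self.sequential.append(p)
        elif page_id not in self.sequential:
            if page_id in self.lru:       # re-reference: move to the MRU position
                self.lru.remove(page_id)
            self.lru.append(page_id)

    def select_victim(self):
        if self.sequential:               # sequential pages are destaged first
            return self.sequential.pop(0)
        return self.lru.pop(0)            # otherwise the least recently used page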
3.2.2 Least Recently Used Cluster (LRU-C) Policy
In the LRU-C policy, the replacement unit is a page cluster, and the least recently accessed cluster is selected as a victim. Access to a cluster means either modifying a page that is already in the cluster or inserting a new page into the cluster.
Fig. 7. Characteristics of write buffer management policies (FAT trace). (a) Page hit ratio, (b) number of destaged clusters, and (c) average size of victim clusters.
Preserving hot clusters while evicting cold clusters gives page clusters in the buffer a chance to form a larger spectrum of cluster sizes, which can enlarge the average size of the evicted clusters and thereby decrease the total number of destaged clusters. As we can see from Fig. 7b, the number of destaged clusters in the CLC policy is much smaller than that in the LC policy.

Fig. 7c shows the average size of the victim clusters in each policy. The result shows, fairly well, the relationship between the total number of destaged clusters and the average size of the victim clusters. As the average victim cluster size increases, the number of destaged clusters decreases. Hence, as we increase the victim cluster size, we can decrease the total number of extra operations in the FTL, which, as a result, increases the overall I/O performance of the flash memory-based storage system. As we can see from the figure, the CLC and LC policies show the largest and smallest average size of the victim clusters, respectively. While the average cluster size in the LRU-C policy is much larger than that in the CLC policy (Fig. 8), the average size of the victim clusters in the LRU-C policy is smaller than that in the CLC policy. This is because the CLC policy selects the largest cluster, from the size-dependent LRU cluster lists, as a victim, while the LRU-C policy does not consider the cluster size.

Fig. 8. Average cluster size in the buffer (FAT trace). (a) LRU-C and (b) LC and CLC.

4.2 Performance Comparison
Fig. 9 shows the overall performance of each write buffer management policy. The extra overhead is used as the performance metric. Extra overhead is the time overhead induced by the extra operations. Extra operations occur while a merge operation is performed, and a merge operation consists of valid page copies and erases. Hence, we itemized the extra overhead into valid page copy overhead and erase overhead, each of which is the time overhead for the corresponding operation. The y-axis of the figure represents the normalized extra overhead and the x-axis represents the write buffer size.
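Written out in our notation (the paper itemizes these overheads but does not print them as an equation in this excerpt), and counting a valid page copy as one page read plus one page write, the metric plotted in Fig. 9 can be expressed as

\[
T_{\text{extra}} \;=\; \underbrace{N_{\text{copy}}\,\bigl(t_{\text{read}} + t_{\text{write}}\bigr)}_{\text{valid page copy overhead}} \;+\; \underbrace{N_{\text{erase}}\, t_{\text{erase}}}_{\text{erase overhead}},
\qquad
\hat{T}_{\text{extra}} \;=\; \frac{T_{\text{extra}}}{T_{\text{extra}}\big|_{\text{no NVRAM buffer}}} .
\]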
We can figure out the effect of page clustering by comparing the performance of the LRU-P and LRU-C policies. The overall performance of the LRU-C policy is about 11-43 percent (both in FAT and NTFS) higher in the BAST case and 12-40 percent (in FAT) and 18-47 percent (in NTFS) higher in the FAST case. This shows that clustering pages in the same erasure unit (i.e., block) can decrease the number of valid page copy and erase operations. Also, the effect of page clustering increases as the write buffer size increases, since a cluster can stay in the buffer for a longer time, during which more pages can be gathered into the cluster. Hence, the performance gap between LRU-P and LRU-C increases as the buffer size increases.

LRU-C shows a far better page hit ratio than the LC policy (Fig. 7a), and the overall I/O performance of the LRU-C policy is better than that of the LC policy. Figs. 7b and 7c show that the average size of victim clusters in the LC policy is smaller than that of the LRU-C policy and the number of destaged clusters in the LC policy is larger than that in the LRU-C policy. Writing a small-sized cluster may invoke, with higher probability than writing a large-sized cluster, the full merge operation, which requires a greater number of valid page copies and erasures than the other types of merge operations (switch merge or partial merge). Hence, frequent writing of small-sized clusters makes the overall performance of the LC policy worse even than that of the LRU-P policy. Therefore, considering only the size of the page cluster can be the worst choice for victim selection.

While the CLC policy shows a slightly lower page hit ratio than that of the LRU-C policy (Fig. 7a), not only is the number of destaged clusters in the CLC policy smaller than that in the LRU-C policy (Fig. 7b), but also the average size of the victim clusters is larger than that in the LRU-C policy (Fig. 7c). The CLC policy could harvest those affirmative effects only by reserving part of the buffer space for a pure LRU cluster list (size-independent LRU cluster list). We can see the effect of the pure LRU cluster list from Fig. 9, where the CLC policy outperforms the others in all cases.

Fig. 9. Performance comparison: Extra overhead is normalized such that the overhead with no NVRAM write buffer is 1. (a) BAST: number of log blocks = 16 (FAT trace), (b) FAST: number of log blocks = 16 (FAT trace), (c) BAST: number of log blocks = 16 (NTFS trace), and (d) FAST: number of log blocks = 16 (NTFS trace).
Fig. 10 shows the overall performance of the CLC policy with various proportions of the size-independent LRU cluster list in the total NVRAM buffer space. The time is normalized such that the execution time is 1 when the proportion is 0.1 and the buffer size is 1 MB. The CLC policy is the same as the LC policy when the proportion is 0 and the same as the LRU-C policy when the proportion is 1. It shows the best performance when the proportion is 0.1, which means that maintaining 10 percent of the total number of page clusters as hot clusters (in the size-independent LRU cluster list) can sufficiently exploit the temporal locality of the storage access.

Fig. 10. The effect of the proportion of the size-independent LRU cluster list in the buffer (FAT trace).
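The full definition of CLC is not part of this excerpt, so the following is only a rough sketch of how the two regions described above might be organized: a size-independent LRU list holding a fixed fraction of the clusters (10 percent above) and size-dependent lists from which the largest cluster is taken as the victim. The promotion and demotion rules, as well as all names, are our assumptions:

from collections import OrderedDict, defaultdict

class CLCLists:
    """Rough sketch of CLC victim selection. A cluster is represented as the
    list of buffered pages belonging to one flash block. Recently accessed
    ("hot") clusters live in a size-independent LRU list holding a fixed
    fraction of all clusters; clusters aged out of it move to size-dependent
    LRU lists, and the victim is the largest cluster found there."""

    def __init__(self, hot_fraction=0.1):
        self.hot_fraction = hot_fraction
        self.hot = OrderedDict()                      # block -> cluster, LRU -> MRU
        self.cold_by_size = defaultdict(OrderedDict)  # size -> block -> cluster

    def _hot_limit(self):
        total = len(self.hot) + sum(len(d) for d in self.cold_by_size.values())
        return max(1, int(total * self.hot_fraction))

    def access(self, block, cluster):
        # Any write to a cluster promotes it to the MRU end of the hot list.
        for lst in self.cold_by_size.values():
            lst.pop(block, None)
        self.hot.pop(block, None)
        self.hot[block] = cluster
        while len(self.hot) > self._hot_limit():      # demote the LRU hot cluster
            blk, cl = self.hot.popitem(last=False)
            self.cold_by_size[len(cl)][blk] = cl

    def select_victim(self):
        nonempty = [size for size, lst in self.cold_by_size.items() if lst]
        if nonempty:                                  # CLC: largest cold cluster first
            return self.cold_by_size[max(nonempty)].popitem(last=False)
        return self.hot.popitem(last=False)           # otherwise fall back to LRU-C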
5 OPTIMISTIC FTL

5.1 Motivation
Since previous FTL algorithms did not consider the existence of an NVRAM write buffer, their design policy to "efficiently accommodate different write patterns (e.g., random small writes or sequential large writes) to the flash memory" added considerable complexity to the FTL. For example, the BAST and FAST algorithms use log blocks to cope with random small writes. A page is not updated in place but is invalidated, and the new version of the page is written in the log block. To keep track of the up-to-date pages, log block-based FTL algorithms maintain a sector mapping table for log blocks, and when a read/write request to a page arrives, they must first search the page mapping information in the mapping table. If it exists in the table, the corresponding page in the log block is accessed; otherwise, the original page in the data block is accessed. In this way, the erase operation is delayed until a fairly large number of updated pages have been written to the log block. However, until the data block and log block are merged, all page accesses require a mapping table lookup. Moreover, small random
writes can invoke the full merge operation when merging the
data block and log block, which is relatively expensive.
Using an NVRAM write buffer for page clustering,
small random writes can be transformed into sequential
writes to the flash memory. The sequential writes to the
flash memory decrease the necessity of sector mapping for
log blocks. If we can keep the ordered sequence of pages in
a log block, without large overhead, we can remove the
sector mapping table, which requires not only much
memory space but also page search overhead. Also, it
can simplify the complicated merge process by removing
the full merge operation.
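To make the point concrete, here is a minimal sketch (our names and data layout, not the paper's code) of what address translation can look like once every log block is kept complete and sequential. LPI below denotes the index of the last page stored in a log block, the notation used in Algorithm 1 later in this section; with it, a single comparison replaces the per-page sector-map lookup:

def translate_read(lbn, page_idx, bmt, bat):
    """Return (physical block, page index) for a read using only block-level state.

    bmt maps a logical block number to its data block (the block-level mapping);
    bat records, per logical block, the associated log block and LPI, the index
    of the last page written to that log block. Because log blocks are kept
    complete and sequential, pages 1..LPI live in the log block and the rest are
    still in the data block, so one comparison replaces the sector-map lookup.
    """
    entry = bat.get(lbn)
    if entry is not None and page_idx <= entry["lpi"]:
        return entry["log_block"], page_idx
    return bmt[lbn], page_idx

bmt = {7: 1203}                              # logical block 7 -> data block 1203
bat = {7: {"log_block": 8001, "lpi": 3}}     # pages 1..3 already in the log block
print(translate_read(7, 2, bmt, bat))        # (8001, 2): served from the log block
print(translate_read(7, 5, bmt, bat))        # (1203, 5): still in the data block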
the cost for making a new complete and sequential log block. To make a complete and sequential log block, pages that are not in the victim cluster but are in the old log block should be copied to the new log block (Fig. 11d). The Log block switch operation can be used for this case, and a total of LPI − Nv valid page copies are necessary, which can be very expensive when LPI is large. When the old log block contains a large number of pages (large LPI), instead of making a new log block through a large number of valid page copies, switching the old log block to the data block can be more efficient (Fig. 11b). To make a data block from the old log block, missing pages in the log block should be copied from the original data block. Since the LPI of the log block is large, the number of missing pages is small. Also, the new log block needs to store only those pages whose indices are smaller than or equal to Imax. We call this a Data block switch operation.

The Data block switch operation consists of four steps:

1. valid page copies for pages whose indices are larger than LPI, from the original data block to the old log block (a total of NB − LPI copies),
2. valid page copies for pages whose indices are 1, ..., Imin − 1 from the old log block to the new log block,
3. writing Nv pages from the victim cluster to the new log block (in this step, valid page copies for missing pages in the victim cluster, whose indices lie between Imin and Imax, also occur), and
4. erasing the original data block and updating BMT and BAT.

The Log block switch operation, in the above case, consists of four steps, where the first two steps are the same as steps 2 and 3 in the Data block switch operation. The remaining steps are: 3) valid page copies for pages whose indices are Imax + 1, ..., LPI, from the old log block to the new log block (a total of LPI − Imax copies), and 4) erasing the old log block and updating BAT. Assuming that the cost for updating BMT is small, the cost difference between the two operations arises from step 1 in the Data block switch operation and step 3 in the Log block switch operation. Hence, when (NB − LPI) > (LPI − Imax), the Log block switch operation is more efficient than the Data block switch operation, and vice versa. In the case of Fig. 11c, LPI − Imax < 0. Hence, we can combine the two cases, Figs. 11c and 11d, into one case in which (NB − LPI) > (LPI − Imax) is satisfied.

Algorithm 1. Log Block Management Algorithm
Notations: VC: victim cluster; LB: log block; DB: data block; LPInew: new LPI

 1: if the size of VC == block size then
 2:   DBnew = newly assigned data block;
 3:   write pages in the VC to DBnew;
 4:   erase old DB and update BMT;
 5:   return;
 6: end if
 7: if corresponding LB does not exist then
 8:   allocate a LB;  // selecting a victim log block can be needed
 9:   update BAT with LPI = 0;
10: end if
11: if Imin > LPI then  // do append
12:   LPInew = Imax;
13:   for i = LPI + 1, ..., LPInew do
14:     if page[i] ∈ VC then
15:       write page[i] to LB[i];
16:     else
17:       copy DB[i] to LB[i];
18:     end if
19:   end for
20: else
21:   LBnew = newly assigned log block;  // selecting a victim log block can be needed
22:   if (NB − LPI) > (LPI − Imax) then  // do Log block switch
23:     LPInew = max{LPI, Imax};
24:     for i = 1, ..., LPInew do
25:       if page[i] ∈ VC then
26:         write page[i] to LBnew[i];
27:       else
28:         if page[i] ∈ LB then
29:           copy LB[i] to LBnew[i];
30:         else
31:           copy DB[i] to LBnew[i];
32:         end if
33:       end if
34:     end for
35:     erase LB; update BAT;
36:   else  // do Data block switch
37:     LPInew = Imax;
38:     for i = 1, ..., LPInew do
39:       if page[i] ∈ VC then
40:         write page[i] to LBnew[i];
41:       else  // the page is in the LB
42:         copy LB[i] to LBnew[i];
43:       end if
44:     end for
45:     for i = LPI + 1, ..., NB do  // fill up the old log block
46:       copy DB[i] to LB[i];
47:     end for
48:     erase DB; update BMT and BAT;
49:   end if
50: end if
51: update BAT such that LPI = LPInew;

The log block management scheme is formalized in Algorithm 1. If the victim cluster contains all pages for the block, it is not necessary to maintain the log block. Hence, Optimistic FTL replaces the corresponding data block with the new data block and updates BMT (steps 1-6 in Algorithm 1).

When the victim cluster overlaps with the current log block (step 20 in Algorithm 1), a new log block is assigned to replace the old one. At this time, if there is no free log block, a victim log block is selected to be merged with the corresponding data block. Since all log blocks in Optimistic FTL are complete and sequential, a partial merge (Fig. 1b) operation is performed.
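A compact way to read Algorithm 1 is as a three-way decision per victim cluster. The sketch below (our code, reusing the paper's symbols as parameter names) reproduces only the branch conditions and the copy-count comparison, not the page movements themselves:

def handle_victim_cluster(n_b, nv, lpi, i_min, i_max, has_log_block):
    """Decide how Algorithm 1 treats one victim cluster.

    n_b   : pages per block (NB)
    nv    : pages in the victim cluster (Nv)
    lpi   : index of the last page already in the log block (LPI)
    i_min : smallest page index in the victim cluster (Imin)
    i_max : largest page index in the victim cluster (Imax)
    """
    if nv == n_b:
        return "replace data block (steps 1-6)"
    if not has_log_block:
        lpi = 0                              # a fresh log block starts empty (step 9)
    if i_min > lpi:
        # The cluster extends the log block without overlapping it: append.
        return "append (steps 11-19)"
    # Overlap with the current log block: pick the cheaper of the two switches.
    if (n_b - lpi) > (lpi - i_max):
        # Data block switch would need NB - LPI copies (its step 1), while
        # Log block switch needs only LPI - Imax copies (its step 3).
        return "log block switch (steps 22-35)"
    return "data block switch (steps 36-48)"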
Fig. 12. Merge latencies in each FTL algorithm: in Optimistic FTL, latencies for log block switch, data block switch, and partial merge are plotted (FAT trace). (a) BAST. (b) FAST. (c) Optimistic FTL.
Optimistic FTL is much simpler than previous log block-based FTL algorithms, such as BAST and FAST. It maintains only BMT and BAT in memory. The BAST and FAST algorithms maintain not only BMT and BAT but also a sector mapping table for the log blocks. Optimistic FTL does not maintain a sector mapping table since it does not use sector mapping for the log blocks. Also, Optimistic FTL uses partial merge, append, Data block switch, and Log block switch operations, which are far cheaper than full merge operations. When a full merge operation occurs in BAST, while merging a data block and a log block, 64 read, 64 write, and 2 erase operations are always performed, assuming a large-block flash memory. In FAST, each page in the log block should be merged with its original data block. Hence, assuming the worst case, where all pages in the log block are from different data blocks, 4,096 (64 × 64) read, 4,096 write, and 65 erase operations are performed. However, in Optimistic FTL, 63 read, 64 write, and 1 erase operations are performed even in the worst case of Log block switch or Data block switch operations.
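The worst-case counts quoted above follow directly from the block geometry; a quick sanity check in code, assuming 64 pages per large block as in the text:

NP = 64  # pages per large block

# BAST full merge: one data block merged with one log block.
bast = {"read": NP, "write": NP, "erase": 2}

# FAST worst case: every page in the log block belongs to a different data
# block, so each of the 64 pages forces a merge with a full data block.
fast = {"read": NP * NP, "write": NP * NP, "erase": NP + 1}

# Optimistic FTL worst case (log or data block switch): at most NP - 1 pages
# are copied, NP pages are written into the new block, and one block is erased.
optimistic = {"read": NP - 1, "write": NP, "erase": 1}

print(bast)        # {'read': 64, 'write': 64, 'erase': 2}
print(fast)        # {'read': 4096, 'write': 4096, 'erase': 65}
print(optimistic)  # {'read': 63, 'write': 64, 'erase': 1}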
Fig. 12 shows the latencies of merge operations in each FTL algorithm. The x-axis represents the sequence of 3,000 merge operations and the y-axis shows the latency of each merge operation. Since the merge operation greatly affects the response time to write requests from the upper layer (file system), the merge latency is one of the important factors in designing an FTL algorithm. Since the latency of a merge operation in FAST largely depends on the number of corresponding data blocks for each page in the log block, latencies of merge operations in FAST show a large deviation (Fig. 12b). As we can see from Figs. 12a and 12c, BAST and Optimistic FTL show very stable merge latency, and the average merge latency in Optimistic FTL is much lower than that in BAST.

Fig. 13 shows two important considerations for refining Optimistic FTL. In this experiment, we used CLC for the write buffer management policy, and Optimistic FTL with 128 log blocks is used for the underlying FTL algorithm. Fig. 13a shows the cumulative distribution of the interreference interval of each victim cluster. The x-axis represents the time that a victim cluster evicted from the write buffer has to stay in the log block until a new victim cluster with the same corresponding data block is evicted from the write buffer. If this time is small, the probability that an append operation will occur becomes large, which enables efficient use of log blocks by Optimistic FTL. An interval of 0 means that no future victim cluster with the same corresponding data block as the current victim cluster will be evicted from the write buffer. For those victim clusters, neither an append nor a data block switch or log block switch operation will occur. Hence, assuming an extreme case where the intervals of all victim clusters are 0, only partial merge operations will be performed to secure free log blocks. We can see from Fig. 13a that the interreference interval becomes large as the write buffer
size increases. As the write buffer size increases, the size of the page clusters in the buffer also increases because a larger portion of re-references to each cluster is absorbed in the write buffer. Hence, the probability that a victim cluster evicted from the write buffer will be referenced again decreases as the write buffer size increases.

When all log blocks are associated with data blocks, it is necessary to select and merge a victim log block to make a free log block. At this time, the victim selection scheme can affect the overall performance of Optimistic FTL. Fig. 13b shows the performance of Optimistic FTL with three different victim selection schemes. In MAX, the log block that has the largest LPI value is selected as the victim. Since MAX selects the log block that has the largest number of pages in it, the average number of valid page copies in each merge operation is minimized. However, it has the same drawback as the LC write buffer management policy, which results in more erase operations than the others. LRU and FIFO showed similar performance in all cases, which means that the destaged victim clusters from the write buffer show very weak temporal locality. Also, we can see from the figure that the performance gap among the three schemes decreases as the write buffer size increases. As the write buffer size increases, the probability that a victim cluster will be appended to an existing log block decreases. Hence, the effect of the victim selection scheme on the overall performance also decreases.

Fig. 14. Extra overhead in each FTL algorithm. (a) FAT trace. (b) NTFS trace.

Fig. 14 shows the performance of three log block-based FTL algorithms with various numbers of log blocks for each NVRAM buffer size. The CLC policy is used for write buffer management. We measured the time for extra operations (valid page copies and erase operations), which are invoked by merge operations (BAST and FAST) or by append, log block switch, and data block switch operations (Optimistic FTL). The extra overhead is normalized such that the overhead for Optimistic FTL with 16 log blocks and 512 KBytes of NVRAM is 1. As we can see from the figure, Optimistic FTL outperforms BAST in all cases and outperforms FAST when the NVRAM size is larger than or equal to 2 Mbytes (except when the number of log blocks is 128). While FAST looks competitive when the number of log blocks is large, the high complexity of the merge operation in FAST makes its merge latencies very high and unstable, which can be a critical problem for flash memory-based storage devices. Also, when the NVRAM size is large, the number of log blocks in FAST does not largely affect the overall performance. We can conclude, based on the results, that using NVRAM as a write buffer for a flash memory-based storage system not only necessitates a write buffer-aware FTL algorithm for performance improvement but also can simplify the FTL algorithm. Optimistic FTL can be an efficient write buffer-aware FTL algorithm.

Fig. 15 compares the performance of the proposed scheme (CLC + Optimistic FTL) with BPLRU + BAST. When a victim cluster is selected, BPLRU reads the pages that are not in the victim cluster from the data block and combines them with the pages in the victim cluster to make a full block. Then, it flushes the full block to the FTL, which, in turn, performs a switch merge operation. The overhead for all these operations is exactly the same as that of the partial merge operation in traditional log block-based FTL algorithms. The underlying FTL has nothing to do except perform a switch merge operation. Hence, the performance of BPLRU is not affected by the underlying FTL algorithm. Actually, the performance of BPLRU + FAST was identical to that of BPLRU + BAST.
Fig. 15. Performance comparison between the proposed scheme and BPLRU (FAT trace).

Fig. 16. Overall performance comparison between the traditional and the proposed schemes (FAT trace).