IEEE TRANSACTIONS ON COMPUTERS, VOL. 70, NO. 7, JULY 2021
Abstract—Existing persistent memory file systems exploit fast, byte-addressable persistent memory (PM) to boost storage performance but ignore the limited endurance of PM. In particular, the PM storing the inode section is extremely vulnerable because the inodes are updated most frequently, fixed at one location throughout the device lifetime, and require immediate persistency. The huge endurance variation of persistent memory domains caused by process variation makes things even worse. In this article, we propose a process-variation-aware wear-leveling mechanism called Contour for the inode section of persistent memory file systems. Contour first enables the movement of inodes by virtualizing the inodes with a deflection table. Then, Contour adopts a cross-domain migration algorithm and an intra-domain migration algorithm to balance the writes across and within the memory domains. We implement the proposed Contour mechanism in Linux kernel 4.4.30 based on a real persistent memory file system, SIMFS. We use standard benchmarks, including Filebench, MySQL, and FIO, to evaluate Contour. Extensive experimental results show that Contour can improve the wear ratios of pages by 417.8× and 4.5× over the original SIMFS and PCV, the state-of-the-art inode wear-leveling algorithm, respectively. Meanwhile, the average performance overhead and wear overhead of Contour are 0.87 and 0.034 percent in application-level workloads, respectively.
Index Terms—Persistent memory file system, wear leveling, process variation, persistent memory, metadata management
Authorized licensed use limited to: R V College of Engineering. Downloaded on December 28,2021 at 16:04:01 UTC from IEEE Xplore. Restrictions apply.
CHEN ET AL.: CONTOUR: A PROCESS VARIATION AWARE WEAR-LEVELING MECHANISM FOR INODES OF PERSISTENT MEMORY FILE... 1035
inodes [1] is unaware of the variant endurance caused by process variation.

In this paper, we focus on solving the wear-leveling problem for the inode section of file systems on PM devices with process variation. We present a process-variation-aware mechanism called Contour that balances the wear of the physical pages storing inodes by fitting the unevenly updated inodes to the memory pages with varied endurance. Contour consists of two techniques. First, we enable the migration of inodes by inode virtualization. Inode virtualization uncouples inodes from "fixed" physical locations through a "virtual inode" layer between the logical inodes and the inode slots on physical PM. The mapping relation between a logical inode and the corresponding virtual inode is recorded as an offset in a deflection table. Given the inode number (i-number) of a logical inode, the file system can access the actual inode slot using the offset and the virtual address of the corresponding virtual inode. Therefore, inode virtualization enables a logical inode¹ to be moved to different physical locations by changing its offset.

Second, we tackle the unbalanced inode updates with an inode migration mechanism. A PM device is divided into multiple memory domains that show various endurance [14]. The memory cells in the same domain, however, have the same endurance. Taking the features of persistent memory domains into account, we propose cross-domain migration and intra-domain migration for migrating inodes. The cross-domain migration algorithm controls the balance of writes on the memory domains by "write budgets". We migrate an inode to a proper memory domain according to the write frequency of the inode and the relative write budgets of the memory domains. In intra-domain migration, we evenly distribute the writes of inodes to the inode slots in the same domain by migrating inodes to less-worn slots.

We implement the proposed Contour mechanism in SIMFS [10], a typical file system for managing PM devices, as a case study in Linux kernel 4.4.30. We evaluate the wear-leveling effect and overheads of Contour using typical benchmarks, including Filebench [20], MySQL [21], and Flexible I/O [22]. The wear-leveling effect is expressed by the wear ratio, i.e., write counts/endurance.

The experimental results show that the proposed Contour mechanism brings significant wear-leveling improvement for inodes. Compared with the original SIMFS and PCV [1], the state-of-the-art algorithm for the wear leveling of inodes, Contour shows 417.8× and 4.5× lower standard deviation of wear ratios of pages, respectively. The maximum wear ratio of pages in Contour can be 2740.3× and 2.5× lower than those of the original SIMFS and PCV, respectively. Meanwhile, the average performance overhead and wear overhead of Contour are 0.87 and 0.034 percent in application-level workloads, respectively.

The main contributions of this paper include:

- We propose a process-variation-aware wear-leveling mechanism, Contour, to protect PM devices from being damaged by frequently updated inodes.
- We design a process-variation-aware inode migration algorithm to balance the wear of inodes across the memory domains.
- We implement Contour in the Linux kernel based on a real persistent memory file system, SIMFS.
- Extensive experiments are conducted with standard benchmarks. The experimental results show that Contour significantly improves the wear leveling of inodes with negligible overhead.

The remainder of this paper is organized as follows. Section 2 introduces the background and shows the motivational example for managing the inodes of a file system on persistent memory. In Section 3, we present the design and implementation of the proposed Contour mechanism. Section 4 evaluates the proposed Contour by extensive experiments. In Section 5, we summarize the studies related to this work. Finally, Section 6 concludes the paper.

2 BACKGROUND AND MOTIVATION

2.1 Endurance Variation of Persistent Memory
A common problem of persistent memory technologies is limited endurance. To make things worse, the endurance of persistent memory cells shows huge variation due to process variation [14], [15], [17], [23], [24]. Previous works [14], [18] divide the memory space into multiple domains such that the memory cells in a domain have the same endurance. According to [17], [18], the endurance of domains in a persistent memory device approximately follows a linear distribution. For a 2 GB persistent memory device, the endurance of domains ranges from 3.0×10⁶ to 1.7×10⁸. The endurance of the weakest domain can be 56× lower than that of the strongest domain. Thus, the persistent memory device can fail quickly without protection, because the weak persistent memory cells will be worn out early.

The size of a domain is determined by the architecture of the persistent memory device. The persistent memory device usually organizes the memory cells in a hierarchical manner, including sub-arrays, sub-banks, and banks. The memory cells residing in the same sub-array show the same endurance [14]. Hence, we regard a memory sub-array as a domain, which is generally 4 MB. In this case, the inode zone in a file system consists of multiple memory domains with huge endurance variation. For example, the size of an inode zone is about 2.5 GB for a 256 GB persistent memory device, since the file system generally reserves 1 percent of the space as the inode zone. Such an inode zone covers 640 domains with more than 56× variation from the weakest domain to the strongest domain. In summary, the inode management of persistent memory file systems should fully consider the endurance variation of the memory cells.

2.2 Existing Persistent Memory File Systems
With the development of new persistent memories, a set of persistent memory file systems have been proposed to break the traditional I/O bottleneck. The design of existing persistent memory file systems mainly focuses on improving performance or data consistency [4], [9], [25]. For example, BPFS [4] proposes "short-circuit shadow paging", an improved shadow paging mechanism exploiting the byte-addressability of persistent memory, to provide highly efficient data

1. In the following, "inode" stands for logical inode unless otherwise specified.
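The inode virtualization described above reduces an inode lookup to one table read and one addition. The following is a minimal sketch of the idea; the class name, slot size, and base address are illustrative assumptions of ours, not SIMFS's actual kernel layout.

```python
INODE_SLOT_SIZE = 128  # assumed on-PM inode slot size in bytes (illustrative)

class VirtualInodeSection:
    """Sketch of a virtualized inode section with a deflection table."""

    def __init__(self, vspace_base):
        self.base = vspace_base  # start of the inode virtual address space
        self.deflection = {}     # i-number -> offset of its virtual inode

    def inode_address(self, ino):
        # Step 1: fetch the offset of the logical inode by its i-number.
        offset = self.deflection[ino]
        # Step 2: add the offset to the beginning address of the inode
        # virtual address space. Step 3 (mapping this virtual address to
        # the physical inode slot) is done by the hardware MMU.
        return self.base + offset * INODE_SLOT_SIZE

    def migrate(self, ino, new_offset):
        # Moving an inode to another slot only updates its table entry;
        # the i-number seen by the rest of the file system is unchanged.
        self.deflection[ino] = new_offset
```

Note that migrating an inode is a single offset update: the file system keeps using the same i-number while the underlying slot moves.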
Fig. 2. Comparison of conventional inode section and the proposed virtualized inode section.

system can get the virtual address of a VInode using its offset and the beginning address of the inode virtual address space. In Contour, we recognize the endurance variation of memory domains in persistent memory, whereas existing persistent memory file systems are unaware of it.

On the other side, the inode virtualization mechanism maps logical inodes to VInodes via a deflection table. In the deflection table, each logical inode has an entry indexed by its i-number. The entry of a logical inode maintains the offset to the corresponding VInode. With the help of the deflection table, a logical inode can be mapped to different VInodes dynamically by changing its offset. Similar to array-structured inode sections, tree-structured inode sections can also be virtualized by constructing a deflection table for the inodes in the leaf nodes. Using the proposed Contour mechanism, the file system accesses an inode in three steps: (1) the file system gets the offset of the logical inode via its i-number; (2) the file system calculates the virtual address of the inode by adding its offset to the beginning address of the inode virtual address space; (3) the file system accesses the inode slot using the virtual address of the inode and the hardware memory management unit in the processor.

In the implementation, the deflection table is an array that corresponds to the logical inodes of the file system. The inode virtual address space is a segment of the kernel virtual address space. The overhead for accessing an inode in Contour is only an additional memory access and an addition, which is negligible for a file operation.

3.2 Inode Migration
Given a PM page (domain), we define the wear ratio as the portion of the write counts performed on the page (domain) with respect to the corresponding endurance. For an inode section across memory domains with endurance variation, wear leveling basically means that the wear ratios of the physical pages storing inodes are the same. In this case, we propose a two-layer strategy for the wear leveling of the inode zone: first, we design a cross-domain migration algorithm for balancing the wear ratios of different domains; second, we present an intra-domain migration algorithm to evenly distribute the writes on the PM pages in the same domain, which have identical endurance.

3.2.1 Cross-Domain Migration
Given a wear ratio, the affordable numbers of writes on the memory domains are various. Assuming the writes upon a memory domain are evenly distributed over the pages in the memory domain, a stronger memory domain can bear more writes than a weaker memory domain with the same wear ratio. Thus, it is necessary to move the inodes with different write counts between the domains such that the wear ratios of the memory domains are balanced.

In this section, we propose a cross-domain migration algorithm to balance the wear ratios of different memory domains. To control the wear ratios of the memory domains, we define a Write Budget Line (WBL) for all the domains. The WBL is an adjustable wear ratio constraint. Within the write budget line, each domain d_j has its own "write budget" WB_j, which indicates the maximum number of writes that can be performed on all the memory pages in the domain. On one hand, the write budgets of memory domains are various due to endurance variation. On the other hand, the update counts of inodes are diverse because files have different access features. The main idea of the proposed cross-domain migration is to hold the write budget line for all the domains by migrating inodes between the domains. We now discuss two key concerns of the idea.

Matching Inodes and Domains. A basic observation is that migration operations actually bring additional writes to the memory domains. We need to avoid migration operations as far as possible. From this perspective, to achieve balanced wear ratios and reduce the potential inode migrations, the ideal case is to place the inodes with the most writes on the strongest memory domains, and so on. To achieve the ideal case, we should match the inodes and memory domains according to the potential update counts of inodes and the available write budgets of memory domains.

The potential update counts of inodes are related to the I/O patterns of applications [31], such as open-write-close, write-seek, and aggregate-write. For example, an application adopting the open-write-close pattern, such as CESM [32], generally opens a file, writes it, and then closes it. Thus, it will not cause many writes to the same inode. On the contrary, an application employing the aggregate-write pattern, such as MADCAP [33] and the file log in Webserver [20], opens a file and continuously appends to it, which will bring lots of writes to the inode of the corresponding file. Hence, the potential update counts of an inode can be reflected by the write frequency of the inode. We define the damaging flow DF^T(I_i) as the write count of inode I_i in a timing period T. The length of each timing period is set to t.

With the damaging flow, we can judge the criticality of each inode to the memory domain, i.e., how harmful an
inode can be to the current memory domain. To determine the criticality of an inode, we use the average damaging flow of previous periods as the reference. The average damaging flow DF_avg^T of period T is calculated by Equation (1):

    DF_avg^T = (WI_total^T / t + DF_avg^{T-1}) / 2,    (1)

where WI_total^T is the total write count of all the inodes in period T. Initially, DF_avg^0 equals zero. Then, we calculate the criticality factor ℓ_i^T of inode I_i in period T as follows:

    ℓ_i^T = (DF^T(I_i) / DF_avg^T) · N_P · N_S,    (2)

where N_P is the number of pages in each memory domain, which is 1024 since a memory domain is 4 MB and a page is 4 KB, and N_S is the number of slots in each page, which is a fixed value for any file system. In Equation (2), we amplify the relative damaging flow of an inode by N_P · N_S times because the criticality factor reveals the importance of an inode to the whole memory domain.

In each period, we match an inode with a memory domain according to the criticality factor of the inode and the endurance of the memory domains. During a period, if the criticality factor of an inode reaches a higher level, we move it to a stronger memory domain. In the cross-domain migration algorithm, we set up D criticality levels for matching the inodes with memory domains, where D is the number of memory domains. The criticality levels are one-to-one matched with the memory domains. The criticality levels are defined by a step function f: R⁺ → N:

    f(ℓ_i^T) = { 1, if ℓ_i^T ∈ A_1;  2, if ℓ_i^T ∈ A_2;  …;  D, if ℓ_i^T ∈ A_D },    (3)

where A_j (j = 1, 2, …, D) are the ranges of the criticality factors of inodes. The ranges of the criticality factor are determined by the relative write budgets of the memory domains. First, we take the average write budget of the memory domains as the baseline, because the criticality factor is normalized to the average damaging flow of the system. Then, we calculate the normalized write budget value (denoted by NWB_j for domain d_j) of each domain over the baseline. Finally, we set the range of memory domain d_j to A_j = (NWB_{j-1}, NWB_j], except that A_1 = [0, NWB_1] and A_D = (NWB_{D-1}, +∞).

Migrating the Inodes. Basically, we do not need to migrate an inode if its damaging flow matches the located memory domain. Thus, there is an expected damaging flow for each inode. In a period T, we migrate an inode I_i if its write count exceeds its expected damaging flow DF^T(I_i)_exp in the current period. DF^T(I_i)_exp is based on the expected criticality factor of I_i and the average damaging flow DF_avg^{T-1} in the prior period. The expected criticality factor of I_i in memory domain d_j satisfies NWB_{j-1} < ℓ_i^T ≤ NWB_j according to Equation (3). Given a period T, DF_avg^{T-1} is a known variable. Thus, the predicted damaging flow can be calculated by Equation (4):

    NWB_{j-1} · DF_avg^{T-1} < DF^T(I_i)_exp ≤ NWB_j · DF_avg^{T-1}.    (4)

In period T, the expected maximum write count of domain d_j is t · NWB_j · DF_avg^{T-1}. Thus, the expected maximum write count WI^T(I_i)_max of inode I_i in memory domain d_j is:

    WI^T(I_i)_max = t · NWB_j · DF_avg^{T-1} / (N_P · N_S).    (5)

This is because each domain has N_P · N_S inode slots and we assume the inode slots sustain the same write counts. When the write count of an inode I_i reaches the expected maximum write count, the file system calculates the current criticality factor ℓ_i^curr of I_i. According to Equations (2) and (5), ℓ_i^curr is calculated by:

    ℓ_i^curr = WI^T(I_i)_max · N_P · N_S / (DF_avg^{T-1} · Δt_i) = t · NWB_j / Δt_i,    (6)

where Δt_i is the time passed from the beginning of period T. Then, we migrate I_i to the corresponding memory domain according to ℓ_i^curr and Equation (3). To avoid migrations as much as possible, we do not proactively migrate the inodes that do not reach the minimum write counts in the period. We simply move such an inode to the best-fit memory domain when the current memory domain is running out of space.

Adjusting the Write Budgets. The balance of the whole write distribution on memory domains is controlled by the write budget line. Each time we set a new write budget line, we call it a new round R. At the beginning of round R, the file system calculates the total write budget WB_total^R:

    WB_total^R = WBL · Σ_{j=1}^{D} E_j + (WB_total^{R-1} − WI_total^{R-1}),    (7)

where E_j (j = 1, 2, …, D) is the endurance of domain d_j. During the round, the file system records the total write count WI_total^R of inodes and estimates the percentage of the remaining total budget. Once the remaining total budget reaches a threshold, the file system will start a new round. A busy system needs a larger threshold to ensure that the file system has enough time to reset the budgets. In our implementation, we always set the write budget line and the new-round threshold to 1 and 5 percent, respectively. With this configuration and the 2 GB persistent memory device from [18], each round can support about 4.42×10⁸ inode updates. The file system can support another 2.21×10⁷ inode updates before starting a new round, which is enough for most workloads [9].

At the beginning of each round, we also need to adjust the write budget of each memory domain. A memory domain may overuse the prior write budget or have a surplus. Thus, we need to rectify the ranges in Equation (3) according to the actual write budgets of the memory domains: first, the file system finds the average write budget WB_avg^R of the inodes in round R; then, the file system calculates the normalized write budget NWB_j^R for each memory domain by dividing by WB_avg^R; finally, we reset the boundaries of the ranges in Equation (3) according to the normalized write budgets of the memory domains.

In a certain round, a memory domain d_j may have a write budget equal to or larger than that of the adjacent stronger
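The bookkeeping of Equations (1), (2), (3), and (5) can be sketched as follows. This is an illustrative model with our own function names, not the kernel implementation; the ranges A_j are represented by the sorted normalized write budgets NWB_1 ≤ … ≤ NWB_D.

```python
def avg_damaging_flow(prev_avg, total_writes, t):
    """Eq. (1): running average of the per-period damaging flow."""
    return (total_writes / t + prev_avg) / 2.0

def criticality_factor(inode_writes, df_avg, n_pages, n_slots):
    """Eq. (2): relative damaging flow amplified by N_P * N_S."""
    return inode_writes / df_avg * n_pages * n_slots

def match_domain(crit, nwb):
    """Eq. (3): step function f mapping a criticality factor to a
    domain index (0-based); nwb holds NWB_1 <= ... <= NWB_D and the
    last range A_D is open-ended."""
    for j, bound in enumerate(nwb[:-1]):
        if crit <= bound:
            return j
    return len(nwb) - 1

def expected_max_writes(t, nwb_j, df_avg_prev, n_pages, n_slots):
    """Eq. (5): per-inode write cap in a domain with N_P * N_S slots."""
    return t * nwb_j * df_avg_prev / (n_pages * n_slots)
```

In each period, an inode whose write count exceeds its cap from `expected_max_writes` would be re-matched via `match_domain` and migrated, mirroring the trigger described around Equations (5) and (6).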
TABLE 1
Characteristics of Filebench Workloads and MySQL

Workload    # of files (Data/Log)  Average file size  Threads  I/O size (r/w)
fileserver  200K/0                 4KB                50       1MB/16KB
webserver   200K/100               4KB                100      1MB/256B
OLTP        200K/1                 10MB               12       2KB/2KB
MySQL       30M/20                 8KB                50       32B

The selected workloads have different read/write ratios and access patterns.

TABLE 2
The Number of Writes After Running 10 Hours on NoWL

NoWL            fileserver  webserver  OLTP      MySQL
Total writes    6.80E+08    5.00E+08   4.10E+08  4.00E+08
Max on domains  21345585    20198403   35243334  16559976
Max on pages    25788       20197700   24028600  16559700
damage of these counters is alleviated to a lower level than the inodes. Even if a system failure occurs, we will only miss the write counts in a short timing period, which is negligible in the lifetime of PM.

There is a wear-leveling mechanism called Kevlar [34] that also deflects virtual and physical pages and migrates hot data. However, since Kevlar is a universal technology for applications and Contour is dedicated to the inodes of persistent memory file systems, the granularity, migration mechanism, and wear-leveling strategy of Contour and Kevlar are different. For the wear leveling of the inode section in persistent memory file systems, Contour promises to be more efficient and economical than Kevlar.

4 EVALUATION

4.1 Experimental Setups
In this section, we evaluate the wear-leveling effect and efficiency of Contour. We implement the proposed Contour mechanism in Linux kernel 4.4.30 based on SIMFS [10], a typical persistent memory file system. We compare Contour with two file systems: 1) the original SIMFS, which has no wear-leveling mechanism for inodes (denoted by NoWL); 2) PCV, the state-of-the-art wear-leveling algorithm for inodes [1], which is also implemented in SIMFS. PCV balances the write counts of pages by migrating the inodes that meet the thresholds. We set the period T of Contour to 10 seconds because it achieves the best trade-off between domain wear ratio and migration overhead.

The experiments are conducted on a server equipped with an Intel Xeon E5-2640 v4 2.40 GHz processor and 256 GB DRAM. We use PMEM [35] to emulate 64 GB DRAM as the persistent memory of the file systems. To compensate for the difference in write latency between PM and DRAM, we adopt the RDTSC instruction to read the timestamp and add software latency after the clflush instruction. Similar to [7], the write latency of PM is set to 200 ns and the read latency is the same as DRAM. In each file system, the inode section has 32 memory domains that can support one million files. The system runs on Ubuntu 16.04 with Linux kernel 4.4.30.

In the experiments, we first use Filebench [20], a widely used benchmark for file systems, and MySQL to evaluate the effect of wear leveling. We select three typical application-level workloads from Filebench: fileserver, webserver, and OLTP. fileserver emulates a file server that hosts 50 threads. Each thread performs a sequence of file operations, including create, write, read, append, delete, and stat. webserver simulates a Web server that uses 100 threads to open, read, and close multiple files in a directory tree, along with a set of file appends to simulate the web log. OLTP emulates a TPC-C workload to test the performance of transactions; it poses small random reads and writes on a file system using 10 database writers, 1 database reader, and one log writer. Furthermore, we install MySQL on the file systems and evaluate wear leveling with the mysqlslap [21] benchmark. The detailed characteristics of the workloads are shown in Table 1. Then, we measure the performance overhead with Filebench and Flexible I/O (FIO) [22], respectively. Finally, we evaluate the wear overhead caused by inode migration.

4.2 Effect of Wear Leveling
In this subsection, we show the effect of wear leveling at the domain level and the page level. First, the domain-level wear ratios reveal the trend of wear of the whole memory domains for inodes. Second, the page-level write distributions show the wear-leveling effect of the approaches at one point. Table 2 shows the absolute numbers of writes after running the workloads for 10 hours without the support of wear leveling. For webserver, OLTP, and MySQL, the writes on the most heavily written domain mainly come from the most heavily written page, indicating that the write distributions in some domains are highly unbalanced. The maximum numbers of writes on pages for webserver, OLTP, and MySQL already reach 10⁸, showing that some pages are close to being damaged. Hence, it is necessary to evaluate the wear-leveling effect of the different approaches.

4.2.1 Domain-Level Wear Ratio
In the experiments, we capture the wear ratios of domains after running the workloads for one hour, five hours, and ten hours, respectively. We define the "wear gap" as the difference obtained by subtracting the minimum wear ratio of the memory domains from the maximum wear ratio. Fig. 3 shows the experimental results. For all the workloads, Contour stably maintains balanced domain wear ratios at each moment.

After running the fileserver workload for 10 hours, the maximum wear ratios of NoWL, PCV, and Contour are 22.2, 27.8, and 9.0 percent respectively, as shown in Fig. 3a. Meanwhile, the standard deviations (SD) of the wear ratios of NoWL, PCV, and Contour are 4.6, 5.7, and 0.64 respectively. Thus, the wear-leveling effect of Contour is 7.4× and 9.5× better than that of NoWL and PCV respectively. The wear gaps in Contour are 0.95, 0.74, and 1.64 percent for one hour, five hours, and ten hours respectively. The wear gaps in NoWL are 3.4, 12.4, and 17.2 percent respectively, which means that the wear of domains becomes more unbalanced with the use of the system. Similarly, the wear gap in PCV also grows from 3.2 percent (at 1 h) and 12.3 percent (at 5 h) to 21.5 percent (at 10 h). The results show that only Contour maintains balanced writes on the domains as the system runs, because Contour is aware of process variation.
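The metrics above follow directly from their definitions (wear ratio = write counts/endurance, wear gap = max − min). A small sketch, with hypothetical helper names of our own:

```python
def wear_ratios(writes, endurance):
    """Per-domain wear ratio: write count divided by endurance."""
    return [w / e for w, e in zip(writes, endurance)]

def wear_gap(ratios):
    """Maximum minus minimum wear ratio across the domains."""
    return max(ratios) - min(ratios)

def std_dev(ratios):
    """Population standard deviation of the wear ratios."""
    mean = sum(ratios) / len(ratios)
    return (sum((r - mean) ** 2 for r in ratios) / len(ratios)) ** 0.5
```

A process-variation-aware policy aims to keep both the gap and the SD small even though domain endurance can differ by more than 56×.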
Fig. 4. Comparison of page-level write distribution in NoWL, PCV, and Contour at 10 hours.
show 6.8 and 8.1 percent performance degradation on average compared with NoWL, which can easily cause data loss because it is unaware of wear leveling. For random write, the throughputs of PCV and Contour drop by 8.9 and 8.6 percent on average respectively compared to NoWL. The write throughput degradation of Contour decreases with the increase of I/O size. When the I/O size is larger than 32 KB, the sequential and random write throughputs of Contour are between 95.6 and 100.0 percent of those of NoWL. This is because the weight of the inode management cost in a write operation decreases as the size of the data transfer increases.

In the cases of application-level workloads, we measure the Ops/s of fileserver, webserver, and OLTP on the file systems. Fig. 5c shows the experimental results. The performance of Contour achieves 98.2, 99.3, and 99.9 percent that of NoWL for fileserver, webserver, and OLTP respectively. This shows that the performance overhead of Contour is negligible for applications with complex file access patterns.

4.4 Wear Overhead
PCV and Contour both have wear overheads caused by writing the migrated inodes to new locations and updating the offsets in the deflection table. Table 3 shows the evaluation results of wear overheads. "AWR (%)" is calculated by dividing the additional writes caused by migration by the total write counts in the experiments. "Domain wear (%)" is the average wear ratio of the domains after running the workload for 10 h. "AWER (%)" is the wear ratio caused by additional writes. The AWR of PCV and Contour are both less than 1 percent on all the benchmarks. The average
AWR of Contour is 0.06 percent higher than that of PCV. This is because Contour protects the weak memory domains by controlling the write budgets. The AWER of PCV and Contour are 0.041 and 0.034 percent respectively. The AWER of Contour is lower than that of PCV because Contour has a lower average wear ratio of domains. Hence, the wear overhead of Contour is negligible to the system.

4.5 Sensitivity Analysis of Tuning the Write Budget Line
Contour manages the writes of domains using a Write Budget Line (WBL). Here we analyze the sensitivity of Contour to the WBL. We set the WBL of each round to 0.5, 1, 1.5, and 2 percent. Then, we run the webserver workload for 5 hours to evaluate the wear ratios of domains, the wear overhead, and the performance overhead.

Fig. 6a shows the wear ratios of memory domains in Contour with different WBLs. The variation of wear ratios grows larger with the increase of the WBL. On one hand, the SDs of the wear ratios of the memory domains are 0.0023, 0.0126, 0.0074, and 0.018 when the WBL is 0.5, 1, 1.5, and 2 percent respectively. On the other hand, the difference between the maximum and minimum wear ratios grows from 0.67 percent with WBL=0.5 to 4.96 percent with WBL=2. Therefore, the smaller the WBL is, the better wear leveling is guaranteed.

Nevertheless, a smaller WBL will bring larger overhead. As Fig. 6b shows, the additional write ratio AWR of Contour decreases from 1.69 percent with WBL=0.5 to 0.37 percent with WBL=2. Accordingly, the additional wear ratio AWER of Contour is also reduced from 0.056 percent with WBL=0.5 to 0.01 percent with WBL=2. With a larger WBL, each domain can tolerate more writes and thereby reduce the number of migrations. The reduction of migrations results in lower wear overhead and overall performance overhead. We normalize the Ops/s with different WBL configurations to that with WBL=0.5, as shown in Fig. 6b. When the WBL is 1, 1.5, and 2 percent, the performance shows 1, 1.1, and 1.4 percent improvement over that with WBL=0.5, respectively.

In summary, when the WBL grows larger than 1 percent, the wear ratios of domains vary drastically while the overhead reduces only slightly. Thus, we set the WBL to 1 percent in the experiments to balance the trade-off between the wear-leveling effect and the migration overhead.

TABLE 3
The Wear Overhead Caused by Migrations in PCV [1] and Contour

5 RELATED WORK

5.1 Wear Leveling in Persistent Memory File Systems
Several existing persistent memory file systems also consider wear leveling. BPFS [4] mentions that page-level wear leveling can be achieved by periodically swapping virtual-to-physical page mappings. However, swapping page mappings for different data in the file system is a complex mechanism that requires careful analysis and design. PMFS [5] leaves wear leveling to hardware because its authors believe that software-based wear leveling may be overly complicated. In this paper, we show that software-based wear leveling imposes only negligible overhead on the file system. HMVFS [36] simply improves the wear leveling of metadata and file data by write ordering. UnistorFS [37] achieves wear leveling by interchanging pages between PM-based main memory and storage. The problem is that existing persistent memory file systems still maintain the inodes at fixed locations. The inode section of these persistent memory file systems can be easily damaged by frequently updated inodes.

5.2 Process Variation Aware Wear Leveling
Process variation is not only a problem for PM but also for flash memory [38]. Most works for flash memory with pro-
Authorized licensed use limited to: R V College of Engineering. Downloaded on December 28,2021 at 16:04:01 UTC from IEEE Xplore. Restrictions apply.
1044 IEEE TRANSACTIONS ON COMPUTERS, VOL. 70, NO. 7, JULY 2021
cess variation focus on optimizing the lifetime and perfor- [3] J. Izraelevitz et al., “Basic performance measurements of the intel
optane DC persistent memory module,” 2019, arXiv: 1903.05714.
mance in the SSD controller [39], [40], [41], rather than in [4] J. Condit et al., “Better I/O through byte-addressable, persistent
the system software. Besides, the granularity difference memory,” in Proc. ACM Symp. Operating Syst. Princ., 2009,
between flash memory and PM brings fundamental discrep- pp. 133–146.
ancy between their wear leveling techniques. There are also [5] S. R. Dulloor et al., “System software for persistent memory,” in
Proc. Euro. Conf. Comput. Syst., 2014, pp. 15:1–15:15.
works that consider process variation in the wear leveling [6] K. Zeng, Y. Lu, H. Wan, and J. Shu, “Efficient storage manage-
of PM. Dong et al. [15] propose a hardware migration algo- ment for aged file systems on persistent memory,” in Proc. Des.
rithm for wear rate leveling considering the trade-off Autom. Test Eur. Conf. Exhib., 2017, pp. 1773–1778.
[7] J. Ou, J. Shu, and Y. Lu, “A high performance file system for non-
between endurance benefits and migration cost. Yun volatile main memory,” in Proc. Eur. Conf. Comput. Syst., 2016,
et al. [42] present a bloom-filter-based algorithm to achieve pp. 12:1–12:16.
highly efficient fine-grained wear leveling of PCM. Zhao [8] E. Lee, S. H. Yoo, and H. Bahn, “Design and implementation of a
et al. [24], [43] take advantages of cell morphing to improve journaling file system for phase-change memory,” IEEE Trans.
Comput., vol. 64, no. 5, pp. 1349–1360, May 2015.
the endurance of MLC PCM cells. Han et al. [44] propose an [9] J. Xu and S. Swanson, “NOVA: A log-structured file system for
architecture-level approach to balance the wear rate of PCM hybrid volatile/non-volatile main memories,” in Proc. USENIX
chips. Zhang et al. [45] evenly put writes on weak or strong Conf. File Storage Technol., 2016, pp. 323–338.
domains according to an endurance-dependent probability. [10] E. H.-M. Sha, X. Chen, Q. Zhuge, L. Shi, and W. Jiang, “A new
design of in-memory file system based on file virtual address
Most of the process variation aware wear leveling works framework,” IEEE Trans. Comput., vol. 65, no. 10, pp. 2959–2972,
focus on architecture level, which bring large hardware cost Oct. 2016.
and show inflexibility on the granularity of migration. In [11] M. K. Qureshi, V. Srinivasan, and J. A. Rivers, “Scalable high perfor-
mance main memory system using phase-change memory tech-
this paper, the proposed OS-level approach, Contour, is nology,” in Proc. Int. Symp. Comput. Archit., 2009, pp. 24–33.
easy to deploy with negligible hardware and software costs. [12] F. Huang, D. Feng, Y. Hua, and W. Zhou, “A wear-leveling-aware
counter mode for data encryption in non-volatile memories,” in
Proc. Des. Autom. Test Eur. Conf. Exhib., 2017, pp. 910–913.
6 CONCLUSION [13] H. S. P. Wong et al., “Phase change memory,” Proc. IEEE, vol. 98,
no. 12, pp. 2201–2227, Dec. 2010.
In this paper, we studied the wear leveling problem of ino- [14] W. Zhang and T. Li, “Characterizing and mitigating the impact of
des, the most frequently updated data in a file system, on process variations on phase change based memory systems,” in
persistent memory considering process variation. We Proc. IEEE/ACM Int. Symp. Microarchit., 2009, pp. 2–13.
showed the potential wear risk of inodes by a motivational [15] J. Dong, L. Zhang, Y. Han, Y. Wang, and X. Li, “Wear rate level-
ing: Lifetime enhancement of pram with endurance variation,” in
example obtained on existing persistent memory file sys- Proc. Des. Autom. Conf., 2011, pp. 972–977.
tems. We proposed a process variation aware wear leveling [16] Z. Sun, X. Bi, and H. Li, “Process variation aware data manage-
mechanism, Contour, to balance the wear of memory pages ment for SRR-RAM cache design,” in Proc. Int. Symp. Low Power
for inodes. Contour enables dynamic migration of inodes by Electron. Des., 2012, pp. 179–184.
[17] W. Zhou, D. Feng, Y. Hua, J. Liu, F. Huang, and P. Zuo,
a deflection table. In tackle of varied endurance of memory “Increasing lifetime and security of phase-change memory with
domains, we designed cross-domain migration algorithm to endurance variation,” in Proc. IEEE Int. Conf. Parallel Distrib. Syst.,
match inodes to proper memory domains. For the memory 2016, pp. 861–868.
[18] J. Xu et al., “An efficient spare-line replacement scheme to enhance
pages in the same domain show same endurance, we pre- NVM security,” in Proc. Des. Autom. Conf. Conf. Exhib., 2019,
sented an intra-domain migration algorithm to balance the pp. 91:1–6.
writes on the pages in the same memory domain. We imple- [19] X. Wu, S. Qiu, and A. L. Narasimha Reddy,“SCMFS: A file system
mented a case of Contour in Linux kernel based on a real per- for storage class memory and its extensions,” ACM Trans. Storage,
vol. 9, no. 3, pp. 1–11, 2013.
sistent memory file system. Experimental results show that [20] V. Tarasov, E. Zadok, and S. Shepler,“Filebench: A flexible frame-
Contour achieves significant wear-leveling improvement work for file system benchmarking,” Login, vol. 41, no. 1, pp. 6–12,
over existing solutions with negligible overhead. 2016.
[21] “mysqlslap,” 2019. [Online]. Available: https://dev.mysql.com/
doc/refman/8.0/en/mysqllap.html
ACKNOWLEDGMENTS [22] “Fio: flexible I/O tester,” 2014. [Online]. Available: http://freecode.
com/projects/fio
The authors would like to thank the anonymous reviewers for [23] A. P. Ferreira, S. Bock, B. Childers, R. Melhem, and D. Moss,
their valuable feedbacks and improvements on this article. “Impact of process variation on endurance algorithms for wear-
This work was supported in part by the National Natural Sci- prone memories,” in Proc. Des. Autom. Test Eur. Conf. Exhib., 2011,
ence Foundation of China under Grant 61802038, in part by pp. 1–6.
[24] M. Zhao, L. Jiang, Y. Zhang, and C. J. Xue, “SLC-enabled wear
the Chongqing Postdoctoral Special Science Foundation leveling for MLC PCM considering process variation,” in Proc.
under Grant XmT2018003, in part by the China Postdoctoral Des. Autom. Conf., 2014, pp. 36:1–36:6.
Science Foundation under Grant 2017M620412, and in part by [25] S.-H. Chen, Y.-H. Chang, Y.-M. Chang, and W.-K. Shih, “mwJFS:
the Joint Sino(Chongqing)-Singapore Post-Doctoral Fellow- A multiwrite-mode journaling file system for MLC NVRAM
storages,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27,
ship Program. no. 9, pp. 2060–2073, Sep. 2019.
[26] M. Dong and H. Chen, “Soft updates made simple and fast on
REFERENCES non-volatile memory,” in Proc. USENIX Annu. Tech. Conf., 2017,
pp. 719–731.
[1] X. Chen, E. H. -M. Sha, Y. Zeng, C. Yang, W. Jiang, and Q. Zhuge, [27] R. Kadekodi, S. K. Lee, S. Kashyap, T. Kim, A. Kolli, and
“Efficient wear leveling for inodes of file systems on persistent V. Chidambaram, “SplitFS: Reducing software overhead in file
memories,” in Proc. Des. Autom. Test Eur. Conf. Exhib., 2018, systems for persistent memory,” in Proc. ACM Symp. Operating
pp. 1524–1527. Syst. Princ., 2019, pp. 494–508.
[2] P. Zhou, B. Zhao, J. Yang, and Y. Zhang, “A durable and energy [28] M. Dong, H. Bu, J. Yi, B. Dong, and H. Chen, “Performance and
efficient main memory using phase change memory technology,” protection in the ZoFS user-space NVM file system,” in Proc.
in Proc. Int. Symp. Comput. Archit., 2009, pp. 14–23. ACM Symp. Operating Syst. Princ., 2019, pp. 478–493.
[29] Z. Ross, "Add support for new persistent memory instructions," 2014. [Online]. Available: https://lwn.net/Articles/619851/
[30] "Intel architecture instruction set extensions programming reference," 2020. [Online]. Available: https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
[31] C. Kuo, A. Shah, A. Nomura, S. Matsuoka, and F. Wolf, "How file access patterns influence interference among cluster applications," in Proc. IEEE Int. Conf. Cluster Comput., 2014, pp. 185–193.
[32] J. W. Hurrell et al., "The community earth system model: A framework for collaborative research," Bull. Amer. Meteorological Soc., vol. 94, no. 9, pp. 1339–1360, 2013.
[33] J. Borrill, L. Oliker, J. Shalf, and H. Shan, "Investigation of leading HPC I/O performance using a scientific-application derived benchmark," in Proc. ACM/IEEE Conf. Supercomputing, 2007, Art. no. 10.
[34] V. Gogte et al., "Software wear management for persistent memories," in Proc. USENIX Conf. File Storage Technol., 2019, pp. 45–63.
[35] Intel, "Persistent memory emulation platform (PMEP)," 2016. [Online]. Available: https://pmem.io/2016/02/22/pm-emulation.html
[36] S. Zheng, L. Huang, H. Liu, L. Wu, and J. Zha, "HMVFS: A hybrid memory versioning file system," in Proc. IEEE Symp. Mass Storage Syst. Technol., 2016, pp. 1–14.
[37] S.-H. Chen, T.-Y. Chen, Y.-H. Chang, H.-W. Wei, and W.-K. Shih, "UnistorFS: A union storage file system design for resource sharing between memory and storage on persistent RAM-based systems," ACM Trans. Storage, vol. 14, no. 1, 2018, Art. no. 3.
[38] M.-F. Chang and S.-J. Shen, "A process variation tolerant embedded split-gate flash memory using pre-stable current sensing scheme," IEEE J. Solid-State Circuits, vol. 44, no. 3, pp. 987–994, Mar. 2009.
[39] L. Shi, Y. Di, M. Zhao, C. J. Xue, K. Wu, and E. H.-M. Sha, "Exploiting process variation for write performance improvement on NAND flash memory storage systems," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 24, no. 1, pp. 334–337, Jan. 2016.
[40] Y. Di, L. Shi, K. Wu, and C. J. Xue, "Exploiting process variation for retention induced refresh minimization on flash memory," in Proc. Des. Autom. Test Eur. Conf. Exhib., 2016, pp. 391–396.
[41] Y. Luo, S. Ghose, Y. Cai, E. F. Haratsch, and O. Mutlu, "Improving 3D NAND flash memory lifetime by tolerating early retention loss and process variation," Proc. ACM Meas. Anal. Comput. Syst., vol. 2, no. 3, 2018, Art. no. 37.
[42] J. Yun, S. Lee, and S. Yoo, "Dynamic wear leveling for phase-change memories with endurance variations," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 9, pp. 1604–1615, Sep. 2015.
[43] M. Zhao, L. Jiang, L. Shi, Y. Zhang, and C. J. Xue, "Wear relief for high-density phase change memory through cell morphing considering process variation," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 34, no. 2, pp. 227–237, Feb. 2015.
[44] Y. Han, J. Dong, K. Weng, Y. Wang, and X. Li, "Enhanced wear-rate leveling for PRAM lifetime improvement considering process variation," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 24, no. 1, pp. 92–102, Jan. 2016.
[45] X. Zhang and G. Sun, "Toss-up wear leveling: Protecting phase-change memories from inconsistent write patterns," in Proc. Des. Autom. Conf., 2017, pp. 1–6.

Xianzhang Chen (Member, IEEE) received the BS and MS degrees in computer science and engineering from Southeast University, Nanjing, China, and the PhD degree from the College of Computer Science, Chongqing University, China, in 2017. He was a research fellow with the National University of Singapore from 2019 to 2020. He is currently an associate professor with Chongqing University. He was a recipient of best paper awards in NVMSA'2015 and ICCD'17, "the Editor's Pick of 2016" of IEEE TC, and the Chongqing Best PhD Dissertation Award in 2018.

Edwin H.-M. Sha (Senior Member, IEEE) received the PhD degree from the Department of Computer Science, Princeton University, Princeton, NJ, in 1992. From 1992 to 2000, he was with the Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN. Since 2000, he has been a tenured full professor with the University of Texas at Dallas, Richardson, TX. From 2012 to 2017, he served as the dean of the College of Computer Science, Chongqing University, China. He is currently a tenured distinguished professor with East China Normal University. He was a recipient of the Teaching Award, the Microsoft Trustworthy Computing Curriculum Award, the NSF CAREER Award, the NSFC Overseas Distinguished Young Scholar Award, and the Chang-Jiang Honorary chair professorship.

Xinxin Wang is currently working toward the master's degree majoring in computer science with Chongqing University, Chongqing, China. His research interests include emerging non-volatile memory technology and in-memory file systems.

Chaoshu Yang received the BS degree from the College of Computer Science, South-Central University for Nationalities, in 2008. He is currently working toward the PhD degree with the College of Computer Science, Chongqing University. His current research interests include non-volatile memories, distributed systems, and file systems.

Weiwen Jiang received the PhD degree from the College of Computer Science, Chongqing University. He is currently a post-doctoral scholar with the University of Notre Dame. He was a recipient of best paper awards in ICCD'17 and best paper nominations in DAC'19 and CODES+ISSS'19. His current research interests include neural architecture search, FPGAs, nonvolatile memories, and HW/SW co-optimization.

Qingfeng Zhuge (Member, IEEE) received the BS and MS degrees in electronics engineering from Fudan University, Shanghai, China, and the PhD degree from the Department of Computer Science, University of Texas at Dallas, Richardson, TX, in 2003. She is currently a professor with East China Normal University, China. Her current research interests include parallel architectures, embedded systems, real-time systems, optimization algorithms, and scheduling. She was the recipient of the Best PhD Dissertation Award in 2003.

For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/csdl.