
IEEE TRANSACTIONS ON COMPUTERS, VOL. 70, NO. 7, JULY 2021

Contour: A Process Variation Aware Wear-Leveling Mechanism for Inodes of Persistent Memory File Systems

Xianzhang Chen, Member, IEEE, Edwin H.-M. Sha, Senior Member, IEEE, Xinxin Wang, Chaoshu Yang, Weiwen Jiang, and Qingfeng Zhuge, Member, IEEE

Abstract—Existing persistent memory file systems exploit the fast, byte-addressable persistent memory (PM) to boost storage performance but ignore the limited endurance of PM. In particular, the PM storing the inode section is extremely vulnerable because inodes are the most frequently updated structures, stay at a fixed location throughout their lifetime, and require immediate persistency. The huge endurance variation of persistent memory domains caused by process variation makes things even worse. In this article, we propose a process variation aware wear-leveling mechanism called Contour for the inode section of persistent memory file systems. Contour first enables the movement of inodes by virtualizing the inodes with a deflection table. Then, Contour adopts a cross-domain migration algorithm and an intra-domain migration algorithm to balance the writes across and within the memory domains. We implement the proposed Contour mechanism in Linux kernel 4.4.30 based on a real persistent memory file system, SIMFS. We use standard benchmarks, including Filebench, MySQL, and FIO, to evaluate Contour. Extensive experimental results show that Contour can improve the wear leveling of pages by 417.8x and 4.5x over the original SIMFS and PCV, the state-of-the-art inode wear-leveling algorithm, respectively. Meanwhile, the average performance overhead and wear overhead of Contour are 0.87 and 0.034 percent in application-level workloads, respectively.

Index Terms—Persistent memory file system, wear leveling, process variation, persistent memory, metadata management

• Xianzhang Chen is with the College of Computer Science, Chongqing University, Chongqing 400044, China, and also with the School of Computing, National University of Singapore, Singapore 117417, Singapore. E-mail: xzchen109@gmail.com.
• Xinxin Wang and Chaoshu Yang are with the College of Computer Science, Chongqing University, Chongqing 400044, China. E-mail: {wangxinxcq, yangchaoshu}@gmail.com.
• Edwin H.-M. Sha and Qingfeng Zhuge are with the School of Computer Science and Software Engineering, East China Normal University, Shanghai 200062, China. E-mail: {edwinsha, qfzhuge}@gmail.com.
• Weiwen Jiang is with the College of Engineering, University of Notre Dame, Notre Dame, IN 46556 USA. E-mail: jiang.wwen@gmail.com.

Manuscript received 22 Dec. 2019; revised 22 May 2020; accepted 10 June 2020. Date of publication 15 June 2020; date of current version 9 June 2021. (Corresponding author: Xianzhang Chen.) Recommended for acceptance by F. Douglis. Digital Object Identifier no. 10.1109/TC.2020.3002537

1 INTRODUCTION

WITH the development of persistent memory (PM) technologies, such as Phase Change Memory (PCM) [2] and Intel's Optane DC persistent memory [3], many persistent memory file systems [4], [5], [6], [7], [8], [9] have been proposed to exploit the advanced characteristics of PM, especially non-volatility, low latency, and byte-addressability. Existing persistent memory file systems, such as PMFS [5] and SIMFS [10], focus on designing highly efficient file I/O stacks and data consistency guarantees but ignore the critical disadvantage of PM, i.e., limited write endurance [11], [12]. For example, the endurance of PCM is about 10^8 [13]. What is worse, the write endurance of memory cells in a PM shows huge variance due to process variation [14], [15], [16], so that the memory cells can only sustain 10^6-10^8 writes [17], [18]. Hence, the lack of a wear-leveling mechanism in existing persistent memory file systems may cause fatal damage to the underlying PM device.

In particular, the inode section of a persistent memory file system is extremely fragile to wear. There are two roots of the severe wear-leveling problem of the inodes in existing persistent memory file systems. On one hand, the inode is one of the most frequently updated data structures in a file system, and inode updates require immediate persistency. The inode of a file is responsible for maintaining the attributes of the file, typically including file size, last access time, last modification time, and access-control information. In the POSIX file system interfaces, most file system operations either directly update the inodes, such as link() and unlink() that modify the link count in the inode of the target file, or write both the file data and the corresponding inodes. For example, write() and mkdir() update the file data and the corresponding inode of a data file or a directory file, respectively.

On the other hand, the inodes in existing persistent memory file systems are stored at fixed locations throughout their lifetime. Existing persistent memory file systems, regardless of whether they organize the inode section as an array structure [10], [19] or a tree structure [5], [9], never move the inode of a file to different memory cells. Hence, the update operations of each inode perform all the writes to the same physical memory address, which can easily wear out the corresponding memory cells and cause data loss. Unfortunately, most existing file systems on PM do not consider wear-leveling designs for inodes. Previous work on the wear leveling of inodes [1] is unaware of the variant endurance caused by process variation.
In this paper, we focus on solving the wear-leveling problem for the inode section of file systems on PM devices with process variation. We present a process-variation-aware mechanism called Contour that balances the wear of the physical pages storing inodes by fitting the unevenly updated inodes to the memory pages with varied endurance. Contour consists of two techniques. First, we enable the migration of inodes by inode virtualization. Inode virtualization uncouples inodes from "fixed" physical locations by a "virtual inode" layer between the logical inodes and the inode slots on physical PM. The mapping relation between a logical inode and the corresponding virtual inode is recorded as an offset in a deflection table. Given the inode number (i-number) of a logical inode, the file system can access the actual inode slot using the offset and the virtual address of the corresponding virtual inode. Therefore, inode virtualization enables a logical inode (in the following, "inode" stands for a logical inode unless otherwise specified) to be moved to different physical locations by changing its offset.

Second, we tackle the unbalanced inode updates by an inode migration mechanism. A PM device is divided into multiple memory domains that show various endurance [14]. The memory cells in the same domain, however, have the same endurance. Taking the features of persistent memory domains into account, we propose cross-domain migration and intra-domain migration for migrating inodes. The cross-domain migration algorithm controls the balance of writes on the memory domains by "write budgets". We migrate an inode to a proper memory domain according to the write frequency of the inode and the relative write budgets of the memory domains. In intra-domain migration, we evenly distribute the writes of inodes to the inode slots in the same domain by migrating inodes to less-worn slots.

We implement the proposed Contour mechanism in SIMFS [10], a typical file system for managing PM devices, as a case study in Linux kernel 4.4.30. We evaluate the wear-leveling effect and overheads of Contour using typical benchmarks, including Filebench [20], MySQL [21], and Flexible I/O [22]. The wear-leveling effect is expressed by the wear ratio, i.e., write counts/endurance.

The experimental results show that the proposed Contour mechanism brings significant wear-leveling improvement for inodes. Compared with the original SIMFS and PCV [1], the state-of-the-art algorithm for the wear leveling of inodes, Contour shows 417.8x and 4.5x lower standard deviation of wear ratios of pages, respectively. The maximum wear ratio of pages in Contour can be 2740.3x and 2.5x lower than those of the original SIMFS and PCV, respectively. Meanwhile, the average performance overhead and wear overhead of Contour are 0.87 and 0.034 percent in application-level workloads, respectively.

The main contributions of this paper include:

• We propose a process variation aware wear-leveling mechanism, Contour, to protect the PM device from being damaged by frequently updated inodes.
• We design a process variation aware inode migration algorithm to balance the wear of inodes across the memory domains.
• We implement Contour in the Linux kernel based on a real persistent memory file system, SIMFS.
• Extensive experiments are conducted with standard benchmarks. The experimental results show that Contour significantly improves the wear leveling of inodes with negligible overhead.

The remainder of this paper is organized as follows. Section 2 introduces the background and shows the motivational example for managing the inodes of a file system on persistent memory. In Section 3, we present the design and implementation of the proposed Contour mechanism. Section 4 evaluates the proposed Contour by extensive experiments. In Section 5, we summarize the studies related to this work. Finally, Section 6 concludes the paper.

2 BACKGROUND AND MOTIVATION

2.1 Endurance Variation of Persistent Memory

A common problem of persistent memory technologies is limited endurance. To make things worse, the endurance of persistent memory cells shows huge variation due to process variation [14], [15], [17], [23], [24]. Previous works [14], [18] divide the memory space into multiple domains such that the memory cells in a domain have the same endurance. According to [17], [18], the endurance of domains in a persistent memory device approximately follows a linear distribution. For a 2 GB persistent memory device, the endurance of domains ranges from 3.0 x 10^6 to 1.7 x 10^8. The endurance of the weakest domain can be 56 times lower than that of the strongest domain. Thus, the persistent memory device can fail quickly without protection, for the weak persistent memory cells will be worn out early.

The size of a domain is determined by the architecture of the persistent memory device. The persistent memory device usually organizes the memory cells in a hierarchical manner, including sub-arrays, sub-banks, and banks. The memory cells residing in the same sub-array show the same endurance [14]. Hence, we regard a memory sub-array as a domain, which is generally 4 MB. In this case, the inode zone in a file system consists of multiple memory domains with huge endurance variation. For example, the size of an inode zone is about 2.5 GB for a 256 GB persistent memory device, since the file system generally reserves 1 percent of the space as the inode zone. Such an inode zone covers 640 domains with more than 56x variation from the weakest domain to the strongest domain. In summary, the inode management of persistent memory file systems should fully consider the endurance variation of the memory cells.
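To make the linear endurance model above concrete, the following sketch derives per-domain endurance for the 2 GB example device. The domain count comes from dividing the device size by the 4 MB sub-array size described above; the interpolation itself and all identifiers are illustrative assumptions, not part of any file system's code.

```c
/* Sketch: per-domain endurance under the linear distribution reported
 * in [17], [18] for a 2 GB device with 4 MB sub-array domains.
 * Purely illustrative; real devices expose this through profiling. */
#include <stdio.h>

#define DOMAIN_SIZE (4UL << 20)                 /* 4 MB sub-array = one domain */
#define DEVICE_SIZE (2UL << 30)                 /* 2 GB example device         */
#define NUM_DOMAINS (DEVICE_SIZE / DOMAIN_SIZE) /* 512 domains                 */
#define E_MIN 3.0e6                             /* weakest-domain endurance    */
#define E_MAX 1.7e8                             /* strongest-domain endurance  */

/* Endurance of domain j (0 <= j < NUM_DOMAINS), linearly interpolated. */
static double domain_endurance(unsigned long j)
{
    return E_MIN + (E_MAX - E_MIN) * (double)j / (double)(NUM_DOMAINS - 1);
}

int main(void)
{
    printf("%lu domains, weakest %.2e writes, strongest %.2e writes (%.0fx gap)\n",
           NUM_DOMAINS, domain_endurance(0), domain_endurance(NUM_DOMAINS - 1),
           domain_endurance(NUM_DOMAINS - 1) / domain_endurance(0));
    return 0;
}
```

Running this prints a gap of roughly 56x between the weakest and strongest domains, matching the figure quoted above.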
2.2 Existing Persistent Memory File Systems

With the development of new persistent memories, a set of persistent memory file systems have been proposed to break the traditional I/O bottleneck. The design of existing persistent memory file systems mainly focuses on improving performance or data consistency [4], [9], [25]. For example, BPFS [4] proposes "short-circuit shadow paging", an improved shadow paging mechanism exploiting the byte-addressability of persistent memory, to provide highly efficient data consistency guarantees.


SIMFS [10] presents the "file virtual address space" to boost file access performance by utilizing the virtual address space and the hardware MMU. NOVA [9] is a log-structured file system that develops efficient atomic operations for data and metadata. SoupFS [26] optimizes soft updates on persistent memory by taking advantage of the advanced features of PM. SplitFS [27] and ZoFS [28] seek user-space file access solutions for improving performance with consistency and security guarantees.

The limited endurance problem of persistent memory, however, is not well considered in the design of existing persistent memory file systems. The underlying persistent memory can easily be worn out by common file system operations, especially the pages for storing inodes. This is because inodes are the most frequently updated data in a file system. The inodes are frequently updated by many commonly-used file operations, including write, read, mkdir, rmdir, rename, link, symlink, unlink, chmod, chown, open (create), and delete. Unfortunately, the wear-out problem of pages for storing inodes is ignored by existing persistent memory file systems. Therefore, we focus on the wear leveling problem of inodes in this paper.

2.3 Wear Leveling Problem of Inodes

There are two types of designs for the inode section in existing persistent memory file systems: tree structures and arrays. For example, PMFS [5], SanGuo [6], and HiNFS [7] use a B-tree to organize all the inodes. The inodes are stored in the leaf nodes of the B-tree. SCMFS [19] and SIMFS [10] organize the inodes by an array. Beyond these data structures, we have three observations on the inode management of persistent memory file systems.

Observation 1. The Inodes of Most Existing Persistent Memory File Systems Never Move Throughout Their Lifetime. In most existing persistent memory file systems, the inode section is fixed once initialized. Therefore, most of the writes to a given inode are performed on a single inode slot of a physical page, which makes the PM fragile to heavily updated inodes.

Observation 2. The Write Counts of Inodes Generally Vary a Lot. For example, most system files are read-only and their inodes are seldom updated, whereas some directory files and user data files are frequently updated. This common feature of systems results in an unbalanced write distribution over the pages for storing inodes. Since each page can store multiple inodes, an inappropriate inode combination on a page can expedite the damage of the page.

Observation 3. The Updates Upon Inodes are Immediately Performed on Persistent Memory. Existing persistent memory file systems use hardware primitives, such as pm_barrier [5] and PCOMMIT [9], [29], [30], to ensure the write ordering of inodes. With these instructions, a file system cannot cache dirty inodes in the CPU cache, DRAM, or even the buffer in the memory controller, but writes the new data of inodes to PM immediately. As a result, the updates of inodes will bring inevitable damage to the PM.
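To illustrate Observation 3, the sketch below shows the kind of flush-and-fence sequence such an inode update goes through. It uses generic clflush/sfence intrinsics rather than the pm_barrier or PCOMMIT wrappers of [5], [9], and the inode layout shown is a placeholder, not the actual on-PM format.

```c
/* Sketch of an immediately persisted inode update (Observation 3).
 * Generic x86 intrinsics stand in for the file systems' own ordering
 * primitives; struct pm_inode is a placeholder layout. */
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>

#define CACHELINE 64

struct pm_inode {
    uint32_t i_links_count;
    /* ... remaining on-PM inode fields ... */
};

static void persist_range(const void *addr, size_t len)
{
    const char *p   = (const char *)((uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1));
    const char *end = (const char *)addr + len;
    for (; p < end; p += CACHELINE)
        _mm_clflush(p);      /* push the dirty lines out of the cache toward PM */
    _mm_sfence();            /* order the flushes before any later stores       */
}

/* Example: link() bumps the link count and persists the inode right away,
 * so every such update lands on the same PM cells. */
static void inode_add_link(struct pm_inode *inode)
{
    inode->i_links_count++;
    persist_range(inode, sizeof(*inode));
}
```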
We show a motivational example by running the application-level workload OLTP of Filebench [20] on PMFS [5] and SIMFS [10]. The experimental results are shown in Fig. 1. In the two file systems, all the writes of the inodes are performed on merely tens of pages. OLTP writes each of these few pages more than 30,000 times in just one minute while leaving the remaining thousands of pages unwritten. Therefore, it is necessary to develop a wear-leveling mechanism for protecting the memory cells from the frequently updated inodes. We believe the way to tackle the wear leveling problem of inodes is to dynamically migrate the inodes.

Fig. 1. Write distribution of OLTP on PMFS [5] and SIMFS [10].

3 CONTOUR DESIGN AND IMPLEMENTATION

In this section, we propose a new design called Contour to achieve wear leveling of pages by distributing the writes of inodes evenly over multiple physical persistent memory pages. Contour is composed of two parts: inode virtualization enables a logical inode to be remapped to any physical inode slot; inode migration selects and migrates the heavily-written inodes to less written pages.

3.1 Inode Virtualization

In a file system, each inode has a unique ID called the "inode number" (i-number). In existing persistent memory file systems, the i-number is an index for seeking the corresponding inode in the inode section. We use the inode section of SIMFS [10] as an example.

The inode section of SIMFS is shown in Fig. 2a. SIMFS organizes the logical inodes as an array-structured inode section. An i-number is the index of an element in the array. The sizes of inodes are the same. The array of the inode section is stored on multiple physical memory pages. Each physical page of persistent memory can be viewed as an array of inode slots. Once the inode section is created, the mapping from the logical inodes to the inode slots is fixed, i.e., each inode slot is one-to-one bound with a logical inode throughout the lifetime. As a result, the file system is unable to handle the wear leveling problem of inodes with the permanent binding between inode slots and logical inodes.

In this section, we propose to change the permanent binding to a convertible relation by a new design of inode virtualization. As shown in Fig. 2b, Contour adds a new virtual inode layer between the logical inode (represented by its i-number) and the inode slot. The Virtual Inodes (VInodes) form an array-structured virtual space, the "inode virtual address space", in which each VInode has a unique offset. The inode virtual address space uses a page table to map the virtual inodes to the physical pages of inode slots, where each VInode is bound with an inode slot. The inode virtual address space has a beginning virtual address. The file system can get the virtual address of a VInode using its offset and the beginning address of the inode virtual address space.


Fig. 2. Comparison of conventional inode section and the proposed virtualized inode section.

In Contour, we recognize the endurance variation of memory domains in the persistent memory, whereas existing persistent memory file systems are unaware of it.

On the other side, the inode virtualization mechanism maps logical inodes to VInodes via a deflection table. In the deflection table, each logical inode has an entry indexed by its i-number. The entry of a logical inode maintains the offset to the corresponding VInode. With the help of the deflection table, a logical inode can be mapped to a different VInode dynamically by changing its offset. Similar to array-structured inode sections, tree-structured inode sections can also be virtualized by constructing a deflection table for the inodes in the leaf nodes. Using the proposed Contour mechanism, the file system accesses an inode in three steps: (1) the file system gets the offset of the logical inode via its i-number; (2) the file system calculates the virtual address of the inode by adding its offset to the beginning address of the inode virtual address space; (3) the file system accesses the inode slot using the virtual address of the inode and the hardware memory management unit in the processor.

In implementation, the deflection table is an array that corresponds to the logical inodes of the file system. The inode virtual address space is a segment of the kernel virtual address space. The overhead for accessing an inode in Contour is only an additional memory access and an addition, which is negligible for the file operation.
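The following is a minimal sketch of this three-step lookup. The identifiers (deflection_table, vinode_space_base, contour_get_inode) are illustrative and are not the names used in the SIMFS/Contour implementation; the stored offset is treated here as a byte offset into the inode virtual address space.

```c
/* Minimal sketch of the three-step inode access described above.
 * Names are illustrative, not taken from the actual implementation. */
#include <stdint.h>

struct contour_inode;                 /* on-PM inode slot layout (not shown)       */

extern uint32_t *deflection_table;    /* entry i: offset of inode i's VInode       */
extern char     *vinode_space_base;   /* start of the inode virtual address space  */

static inline struct contour_inode *contour_get_inode(uint32_t ino)
{
    uint32_t off = deflection_table[ino];   /* step 1: one extra memory access      */
    /* step 2: add the offset to the beginning address of the virtual space;
     * step 3: the MMU resolves this virtual address to the physical inode slot. */
    return (struct contour_inode *)(vinode_space_base + off);
}

/* Migrating inode `ino` to another slot only rewrites its offset entry. */
static inline void contour_remap_inode(uint32_t ino, uint32_t new_off)
{
    deflection_table[ino] = new_off;
}
```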
3.2 Inode Migration

Given a PM page (domain), we define the wear ratio as the portion of the write counts performed on the page (domain) with respect to the corresponding endurance. For an inode section across memory domains with endurance variation, wear leveling basically means that the wear ratios of the physical pages for storing inodes are the same. In this case, we propose a two-layer strategy for the wear leveling of the inode zone: first, we design a cross-domain migration algorithm for balancing the wear ratios of different domains; second, we present an intra-domain migration algorithm to evenly distribute the writes on the PM pages in the same domain, which have identical endurance.

3.2.1 Cross-Domain Migration

Given a wear ratio, the affordable numbers of writes on the memory domains are various. Assuming the writes upon a memory domain are evenly distributed over the pages in the memory domain, a stronger memory domain can bear more writes than a weaker memory domain with the same wear ratio. Thus, it is necessary to move the inodes with different write counts between the domains such that the wear ratios of the memory domains are balanced.

In this section, we propose a cross-domain migration algorithm to balance the wear ratios of different memory domains. To control the wear ratios of the memory domains, we define a Write Budget Line (WBL) for all the domains. WBL is an adjustable wear ratio constraint. Within the write budget line, each domain d_j has its own "Write Budget" WB_j, which indicates the maximum number of writes that can be performed on all the memory pages in the domain. On one hand, the write budgets of memory domains are various due to endurance variation. On the other hand, the update counts of inodes are diverse because the files have different access features. The main idea of the proposed cross-domain migration is to hold the write budget line for all the domains by migrating inodes between the domains. We now discuss two key concerns of the idea.

Matching Inodes and Domains. A basic observation is that migration operations actually bring additional writes to the memory domains. We need to avoid migration operations as far as possible. From this perspective, to achieve balanced wear ratios and reduce the potential inode migrations, the ideal case is to place the inodes with the most writes on the strongest memory domains, and so on. To achieve the ideal case, we should match the inodes and memory domains according to the potential update counts of inodes and the available write budgets of memory domains.

The potential update counts of inodes are related to the I/O patterns of applications [31], such as open-write-close, write-seek, and aggregate-write. For example, an application adopting the open-write-close pattern, such as CESM [32], generally opens a file, writes it, and then closes it. Thus, it will not cause many writes to the same inode. On the contrary, an application employing the aggregate-write pattern, such as MADCAP [33] and the file log in Webserver [20], opens a file and contiguously appends to it, which will bring lots of writes to the inode of the corresponding file. Hence, the potential update counts of an inode can be reflected by the write frequency of the inode. We define the damaging flow DF^T(I_i) as the write counts of inode I_i in a timing period T. The length of each timing period is set to t.


With the damaging flow, we can judge the criticality of each inode to the memory domain, i.e., how harmful an inode can be to the current memory domain. To determine the criticality of an inode, we use the average damaging flow of previous periods as the reference. The average damaging flow DF_avg^T of period T is calculated by Equation (1):

$$DF_{avg}^{T} = \frac{WI_{total}^{T}/t + DF_{avg}^{T-1}}{2}, \qquad (1)$$

where WI_total^T is the total write count of all the inodes in period T. Initially, DF_avg^0 equals zero. Then, we calculate the criticality factor l_i^T of inode I_i in period T as follows:

$$\ell_{i}^{T} = \frac{DF^{T}(I_i)}{DF_{avg}^{T}} \times N_P \times N_S, \qquad (2)$$

where N_P is the number of pages in each memory domain, which is 1024 since a memory domain is 4 MB and a page is 4 KB, and N_S is the number of slots in each page, which is a fixed value for any file system. In Equation (2), we amplify the relative damaging flow of an inode by N_P x N_S times because the criticality factor reveals the importance of an inode to the whole memory domain.

In each period, we match an inode with a memory domain according to the criticality factor of the inode and the endurance of the memory domains. During a period, if the criticality factor of an inode reaches a higher level, we move it to a stronger memory domain. In the cross-domain migration algorithm, we set up D criticality levels for matching the inodes with memory domains, where D is the number of memory domains. The criticality levels are one-to-one matched with the memory domains. The criticality levels are defined by a step function f: R+ -> N:

$$f(\ell_{i}^{T}) = \begin{cases} 1, & \text{if } \ell_{i}^{T} \in A_1 \\ 2, & \text{if } \ell_{i}^{T} \in A_2 \\ \dots \\ D, & \text{if } \ell_{i}^{T} \in A_D \end{cases} \qquad (3)$$

where A_j (j = 1, 2, ..., D) are the ranges of the criticality factors of inodes. The ranges of the criticality factor are determined by the relative write budgets of the memory domains. First, we take the average write budget of the memory domains as the baseline, since the criticality factor is normalized to the average damaging flow of the system. Then, we calculate the normalized write budget value (denoted by NWB_j for domain d_j) of each domain over the baseline. Finally, we set the range of memory domain d_j to A_j = (NWB_{j-1}, NWB_j], except that A_1 = [0, NWB_1] and A_D = (NWB_{D-1}, +inf).
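The sketch below walks through Equations (2) and (3) in code. It assumes nwb[] holds the normalized write budgets NWB_1..NWB_D in ascending order and picks N_S = 32 as a plausible slots-per-page value; both are illustrative assumptions rather than values fixed by the paper.

```c
/* Sketch of Equations (2) and (3): criticality factor and criticality level.
 * nwb[0..D-1] holds NWB_1..NWB_D in ascending order; names and the value
 * of NS are illustrative assumptions, not taken from the implementation. */
#define NP 1024U              /* pages per 4 MB domain with 4 KB pages     */
#define NS 32U                /* inode slots per page (assumed inode size) */

/* Equation (2): l_i^T = DF^T(I_i) / DF_avg^T * NP * NS */
static double criticality_factor(unsigned long df_inode, double df_avg)
{
    return (double)df_inode / df_avg * NP * NS;
}

/* Equation (3): the smallest level j with l <= NWB_j; level D beyond NWB_{D-1}. */
static unsigned int criticality_level(double l, const double *nwb, unsigned int D)
{
    for (unsigned int j = 0; j + 1 < D; j++)
        if (l <= nwb[j])
            return j + 1;     /* levels are 1-based, one per memory domain */
    return D;
}
```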
Migrating the Inodes. Basically, we do not need to migrate an inode if its damaging flow matches its current memory domain. Thus, there is an expected damaging flow for each inode. In a period T, we migrate an inode I_i if its write count exceeds its expected damaging flow DF^T(I_i)_exp in the current period. DF^T(I_i)_exp is based on the expected criticality factor of I_i and the average damaging flow DF_avg^{T-1} in the prior period. The expected criticality factor of I_i in memory domain d_j satisfies NWB_{j-1} < l_i^T <= NWB_j according to Equation (3). Given a period T, DF_avg^{T-1} is a known variable. Thus, the predicted damaging flow can be calculated by Equation (4):

$$NWB_{j-1} \times DF_{avg}^{T-1} < DF^{T}(I_i)_{exp} \le NWB_j \times DF_{avg}^{T-1}. \qquad (4)$$

In period T, the expected maximum write count of domain d_j is t x NWB_j x DF_avg^{T-1}. Thus, the expected maximum write count WI^T(I_i)_max of inode I_i in memory domain d_j is:

$$WI^{T}(I_i)_{max} = t \times \frac{NWB_j \times DF_{avg}^{T-1}}{N_P \times N_S}. \qquad (5)$$

This is because each domain has N_P x N_S inode slots and we assume the inode slots sustain the same write counts. When the write count of an inode I_i reaches the expected maximum write count, the file system calculates the current criticality factor l_i^{curr} of I_i. According to Equations (2) and (5), l_i^{curr} is calculated by:

$$\ell_{i}^{curr} = \frac{WI^{T}(I_i)_{max} \times N_P \times N_S}{\Delta t_i \times DF_{avg}^{T-1}} = \frac{t \times NWB_j}{\Delta t_i}, \qquad (6)$$

where Delta t_i is the time passed from the beginning of period T. Then, we migrate I_i to the corresponding memory domain according to l_i^{curr} and Equation (3). To avoid migrations as much as possible, we do not proactively migrate the inodes that do not reach the minimum write count in the period. We simply move such an inode to the best-fit memory domain when the current memory domain is running out of space.

Adjusting the Write Budgets. The balance of the whole write distribution on memory domains is controlled by the write budget line. Each time we set a new write budget line, we call it a new round R. At the beginning of round R, the file system calculates the total write budget WB_total^R:

$$WB_{total}^{R} = WBL \times \sum_{j=1}^{D} E_j + (WB_{total}^{R-1} - WI_{total}^{R-1}), \qquad (7)$$

where E_j, j = 1, 2, ..., D, is the endurance of domain d_j. During the round, the file system records the total write count WI_total^R of inodes and estimates the percentage of the remaining total budget. Once the remaining total budget reaches a threshold, the file system will start a new round. A busy system needs a larger threshold to ensure that the file system has enough time to reset the budgets. In our implementation, we always set the write budget line and the new round threshold to 1 and 5 percent, respectively. With this configuration and the 2 GB persistent memory device from [18], each round can support about 4.42 x 10^8 inode updates. The file system can support another 2.21 x 10^7 inode updates before starting a new round, which is enough for most workloads [9].
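As a sanity check of these figures, a rough back-of-the-envelope calculation is possible under the linear endurance distribution of Section 2.1 (512 domains of 4 MB for the 2 GB device); the per-domain average used below is an assumption of that model rather than a measured value:

$$WB_{total}^{R} \approx WBL \times \sum_{j=1}^{D} E_j \approx 0.01 \times 512 \times \frac{3.0\times 10^{6} + 1.7\times 10^{8}}{2} \approx 4.4 \times 10^{8},$$

and the 5 percent new-round threshold leaves roughly $0.05 \times 4.4\times 10^{8} \approx 2.2\times 10^{7}$ updates of headroom, consistent with the numbers quoted above.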
budget equal to or larger than that of the adjacent stronger


At the beginning of each round, we also need to adjust the write budget of each memory domain. A memory domain may overuse its prior write budget or have surplus. Thus, we need to rectify the ranges in Equation (3) according to the actual write budgets of the memory domains: first, the file system finds the average write budget WB_avg^R of the memory domains in round R; then, the file system calculates the normalized write budget NWB_j^R for each memory domain by dividing by WB_avg^R; finally, we reset the boundaries of the ranges in Equation (3) according to the normalized write budgets of the memory domains.

In a certain round, a memory domain d_j may have a write budget equal to or larger than that of the adjacent stronger memory domain d_{j+1}, i.e., NWB_j^R >= NWB_{j+1}^R. In this case, the file system resets the boundaries of ranges A_j and A_{j+1} by:

$$NWB_{j-1}^{R} < A_j \le \frac{NWB_{j}^{R} + NWB_{j+1}^{R}}{2}, \qquad \frac{NWB_{j}^{R} + NWB_{j+1}^{R}}{2} < A_{j+1} \le NWB_{j}^{R}. \qquad (8)$$

The new boundaries increase the possibility of accepting inodes for the weaker domain and place the inodes with higher damaging flow in the stronger domain, through which the usage of write budgets for the domains may be balanced.

3.2.2 Intra-Domain Migration

As aforementioned, the memory cells in the same domain have the same endurance. Hereby, in-domain wear leveling means that the write counts of the physical pages for storing inodes are the same. In this case, the maximum write count of the pages is minimized and the standard deviation of the write counts of pages is zero. Therefore, the in-domain wear leveling problem of inodes can be defined as follows: given n inodes and N_P physical pages where each page has N_S slots for storing inodes, how to place the n inodes on the N_P pages such that the standard deviation of the write counts of pages is minimized? The key is to find a proper combination of inodes for each page such that the pages can achieve the same write count. The write counts of inodes, however, are dynamically changing in a running system. Therefore, we propose to solve the problem by dynamically migrating frequently updated inodes at run time.

There are two observations for designing the dynamic migration method. First, from the perspective of wear leveling, it is not necessary to migrate an inode if the write count of the underlying inode slot has not reached its write budget. Second, only the future number of writes of an inode impacts the page on which the inode will be placed after migration. Thus, a migration needs to consider the write counts of slots and the features of inodes simultaneously.

Based on these observations, we propose the intra-domain migration algorithm for migrating the inodes. We use the average write count of all pages in a memory domain (denoted by WP_avg) as the "reference line". The main idea of the intra-domain migration algorithm is to achieve wear leveling by aligning the write count of each page to the reference line. The intra-domain migration algorithm determines whether an inode needs to be migrated or not by comparing the write count of the underlying inode slot and the reference line. The detailed algorithm is shown in Algorithm 1.

The proposed intra-domain migration algorithm uses a threshold alpha to judge if an inode slot is overly worn. alpha is the write budget of each inode slot in domain d_j. We hope that the expected writes in each round (i.e., the write budget) are evenly distributed over the inode slots in the domain. Thus, in each round R, we set alpha by Equation (9):

$$\alpha = \frac{\sum_{r=1}^{R} E_j \times WBL_r}{N_P \times N_S}. \qquad (9)$$
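For a feel of the magnitude, take the strongest 4 MB domain of the Section 2.1 example (E_j of about 1.7 x 10^8), WBL = 1 percent in the first round, N_P = 1024, and an assumed N_S = 32 slots per 4 KB page (the actual slot count depends on the inode size):

$$\alpha = \frac{1.7\times 10^{8} \times 0.01}{1024 \times 32} \approx 52 \text{ writes per slot in the first round.}$$

A weaker domain gets a proportionally smaller per-slot budget, which is what pushes its heavily written inodes toward stronger domains.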
In Algorithm 1, WS(I_i) is the write count of the inode slot storing inode I_i throughout all the existing rounds (Line 1). If WS(I_i) reaches alpha, I_i needs to be migrated to protect the corresponding inode slot. In the following, we find a less written inode slot for inode I_i. Here we also consider the balance of writes on physical pages. First, we check whether there is a free slot on a page such that the write count of the page is less than the reference line and the slot has not reached its write budget (Lines 4-7). If this fails, we try to find such a free slot on the pages whose write counts have not reached their write budgets (Lines 8-12). If there is still no available slot and the total write count already exceeds the budget of the domain, we find a free slot from a domain that has available budget, following the best-fit principle (Lines 13-16). Otherwise, we swap inode I_i with the least recently written inode I_cand in the domain (Lines 17-20). If we find a satisfiable empty slot, we migrate I_i to it.

Algorithm 1. The Intra-Domain Migration Algorithm
Input: WS(I_i): the write count of the inode slot storing inode I_i; WP_avg: the average write count of the physical pages;
Output: The relative storing position Offset_i for I_i;
1:  if WS(I_i) <= alpha then
2:    continue;
3:  else
4:    for each page P_p that has a free slot and WP_p <= WP_avg do
5:      if a slot Slot_s on P_p satisfies WS_s < alpha then
6:        Slot_cand <- Slot_s;
7:        break;
8:    if Slot_cand = NULL then
9:      for each page P_p that has a free slot and WP_p < alpha x N_S do
10:       if a slot Slot_s on P_p satisfies WS_s < alpha then
11:         Slot_cand <- Slot_s;
12:         break;
13:   if Slot_cand = NULL then
14:     if WP_avg x N_P >= WB_j then
15:       Slot_cand <- a free slot in the best-fit domain that has available budget;
16:     else
17:       I_cand <- the least recently written inode;
18:       Slot_cand <- Slot(I_cand);
19:       Swap I_i with I_cand;
20:       Exchange Offset_i and Offset(Slot_cand);
21:       return;
22:   Migrate I_i to Slot_cand;
23:   Offset_i <- Offset(Slot_cand);

In each round, the threshold alpha is a constant. In the worst case, the intra-domain migration algorithm needs to traverse all the inodes in the memory domain. Hence, the time complexity of the proposed intra-domain migration algorithm is O(N_P x N_S).

3.3 Discussion

The deflection table in Contour is implemented as an array. Each entry in the deflection table stores the offset as a 4-byte integer, which can support 2^32 inodes. For a file system with one million inodes, the space overhead of the deflection table is merely 3.82 MB. The wear of the deflection table is much lighter than the wear of the inode section since the offset is modified only when an inode is migrated. We maintain counters for the write counts of each logical inode, each page, each domain, and all the domains in DRAM-based main memory and write the dirty counters back to the PM at the end of each period. Thereby, the damage of these counters is alleviated to a lower level than the inodes. Even if a system failure occurs, we will only miss the write counts of a short timing period, which is negligible in the lifetime of PM.
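As a quick check of the footprint quoted above (assuming one 4-byte entry per inode, as stated):

$$10^{6} \text{ inodes} \times 4\,\text{B} = 4\times 10^{6}\,\text{B} \approx 3.8\,\text{MB},$$

which matches the reported 3.82 MB; the table grows linearly with the number of inodes.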


There is a wear-leveling mechanism called Kevlar [34] that also deflects virtual and physical pages and migrates hot data. However, since Kevlar is a universal technology for applications and Contour is dedicated to the inodes of persistent memory file systems, the granularity, migration mechanism, and wear-leveling strategy of Contour and Kevlar are different. For the wear leveling of the inode section in persistent memory file systems, Contour promises to be more efficient and economical than Kevlar.

4 EVALUATION

4.1 Experimental Setups

In this section, we evaluate the wear-leveling effect and efficiency of Contour. We implement the proposed Contour mechanism in Linux kernel 4.4.30 based on SIMFS [10], a typical persistent memory file system. We compare Contour with two file systems: 1) the original SIMFS, which has no wear-leveling mechanism for inodes (denoted by NoWL); and 2) PCV, the state-of-the-art wear-leveling algorithm for inodes [1], which is also implemented in SIMFS. PCV balances the write counts of pages by migrating the inodes that meet its thresholds. We set the period T of Contour to 10 seconds, since this achieves the best trade-off between domain wear ratio and migration overhead.

The experiments are conducted on a server equipped with an Intel Xeon E5-2640 v4 2.40 GHz processor and 256 GB DRAM. We use PMEM [35] to emulate 64 GB of DRAM as the persistent memory of the file systems. To compensate for the difference in write latency between PM and DRAM, we adopt the RDTSC instruction to read the timestamp and add software latency after the clflush instruction. Similar to [7], the write latency of PM is set to 200 ns and the read latency is the same as DRAM. In each file system, the inode section has 32 memory domains and can support one million files. The system runs on Ubuntu 16.04 with Linux kernel 4.4.30.
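A minimal sketch of this kind of software latency injection is shown below; the 200 ns figure follows the setup above, while the spin loop, the TSC-to-nanosecond conversion factor, and all names are assumptions of the sketch rather than details reported by the paper.

```c
/* Sketch of emulating the extra PM write latency after a cache-line flush,
 * in the spirit of the RDTSC + clflush approach described above.
 * The fixed TSC frequency is an assumption; illustrative only. */
#include <stdint.h>
#include <x86intrin.h>

#define PM_EXTRA_WRITE_NS 200.0
#define TSC_PER_NS        2.4          /* assumed 2.4 GHz invariant TSC */

static inline void pm_flush_with_latency(const void *addr)
{
    uint64_t start = __rdtsc();
    _mm_clflush(addr);                             /* flush the dirty line  */
    uint64_t target = start + (uint64_t)(PM_EXTRA_WRITE_NS * TSC_PER_NS);
    while (__rdtsc() < target)                     /* busy-wait to model PM */
        _mm_pause();
}
```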
In the experiments, we first use Filebench [20], a widely used benchmark for file systems, and MySQL to evaluate the effect of wear leveling. We select three typical application-level workloads from Filebench: fileserver, webserver, and OLTP. fileserver emulates a file server that hosts 50 threads. Each thread performs a sequence of file operations, including create, write, read, append, delete, and stat. webserver simulates a Web server that uses 100 threads to open, read, and close multiple files in a directory tree, along with a set of file appends to simulate the web log. OLTP emulates a TPC-C workload to test the performance of transactions; it poses small random reads and writes on a file system using 10 database writers, 1 database reader, and one log writer. Furthermore, we install MySQL on the file systems and evaluate wear leveling with the mysqlslap [21] benchmark. The detailed characteristics of the workloads are shown in Table 1. Then, we measure the performance overhead with Filebench and Flexible I/O (FIO) [22], respectively. Finally, we evaluate the wear overhead caused by inode migration.

TABLE 1
Characteristics of Filebench Workloads and MySQL

Workload     # of files (Data/Log)   Average file size   Threads   I/O size (r/w)
fileserver   200K/0                  4KB                 50        1MB/16KB
webserver    200K/100                4KB                 100       1MB/256B
OLTP         200K/1                  10MB                12        2KB/2KB
MySQL        30M/20                  8KB                 50        32B

The selected workloads have different read/write ratios and access patterns.

4.2 Effect of Wear Leveling

In this subsection, we show the effect of wear leveling at the domain level and the page level. First, the domain-level wear ratios reveal the trend of wear of the whole memory domains for inodes. Second, the page-level write distributions show the wear-leveling effect of the approaches at one point in time. Table 2 shows the absolute numbers of writes after running the workloads for 10 hours without support of wear leveling. For webserver, OLTP, and MySQL, the writes on the most heavily written domain mainly come from the most heavily written page, indicating that the write distributions in some domains are highly unbalanced. The maximum numbers of writes on pages for webserver, OLTP, and MySQL already reach 10^8, showing that some pages are close to being damaged. Hence, it is necessary to evaluate the wear-leveling effect of different approaches.

TABLE 2
The Number of Writes After Running 10 Hours on NoWL

                  fileserver   webserver    OLTP         MySQL
Total writes      6.80E+08     5.00E+08     4.10E+08     4.00E+08
Max on domains    21,345,585   20,198,403   35,243,334   16,559,976
Max on pages      25,788       20,197,700   24,028,600   16,559,700

4.2.1 Domain-Level Wear Ratio

In the experiments, we capture the wear ratios of domains after running the workloads for one hour, five hours, and ten hours, respectively. We define the "wear gap" as the difference obtained by subtracting the minimum wear ratio of the memory domains from the maximum wear ratio. Fig. 3 shows the experimental results. For all the workloads, Contour stably maintains balanced domain wear ratios at each moment.

After running the fileserver workload for 10 hours, the maximum wear ratios of NoWL, PCV, and Contour are 22.2, 27.8, and 9.0 percent respectively, as shown in Fig. 3a. Meanwhile, the standard deviations (SD) of the wear ratios of NoWL, PCV, and Contour are 4.6, 5.7, and 0.64 respectively. Thus, the wear-leveling effect of Contour is 7.4x and 9.5x better than that of NoWL and PCV respectively. The wear gaps in Contour are 0.95, 0.74, and 1.64 percent at one hour, five hours, and ten hours respectively. The wear gaps in NoWL are 3.4, 12.4, and 17.2 percent respectively, which means that the wear of domains becomes more unbalanced along with the use of the system. Similarly, the wear gap in PCV also grows from 3.2 percent (at 1h) and 12.3 percent (at 5h) to 21.5 percent (at 10h). The results show that only Contour maintains balanced writes on the domains as the system runs, for Contour is aware of process variation.


Fig. 3. Comparison of wear ratio of domains in NoWL, PCV, and Contour.

Figs. 3b, 3c, and 3d show that Contour still achieves better wear leveling than the other approaches on webserver, OLTP, and MySQL. For example, the wear gaps in Contour are less than 1.4 percent when running MySQL. The wear gaps in NoWL are 6.5, 8.62, and 10.8 percent at 1h, 5h, and 10h, respectively. The wear gap in PCV also grows from 0.79 percent (at 1h) and 3.8 percent (at 5h) to 10.1 percent (at 10h). The SDs of the wear ratios of NoWL, PCV, and Contour at 10h are 3.1, 2.7, and 0.39 respectively. It means that, for MySQL, the wear-leveling effect of Contour is 6.9x and 5.8x better than that of NoWL and PCV respectively.

In fileserver and webserver, the lines of wear ratios of NoWL are quite smooth, whereas the lines in OLTP and MySQL fluctuate a lot. Both fileserver and webserver generally commit just a few updates to the inode of a single file, which makes the variance of the write counts of the inodes naturally small. On the contrary, many files in OLTP and MySQL receive many more updates than other files, which causes huge variance in the write counts of inodes.

In webserver, OLTP, and MySQL, the lines of wear ratios of PCV are smoother than those of NoWL. This is because PCV balances the write counts of pages by migrating heavily-written inodes. However, due to its unawareness of process variation, PCV still causes heavy damage to the weak domains although the writes are evenly distributed.

4.2.2 Page-Level Write Distribution

Fig. 4 shows the write counts (denoted by dark dots) and wear ratios (denoted by blue lines) of the pages in the inode sections after running the workloads for ten hours. In the sub-figures of Fig. 4, each dot represents the write count of a page and the line shows the wear ratio of the corresponding page. The write counts of pages in Contour generally follow the endurance variation of the memory domains. The wear ratios of pages in Contour are highly balanced, e.g., the SDs of wear ratios are less than 0.95 on all workloads. On the contrary, the wear ratios of pages in NoWL and PCV show large variation.

Figs. 4a, 4b, and 4c show the experimental results of fileserver. The threads in the fileserver workload manage the files by a fixed pattern: create, write, close, open, read, close, and delete. Moreover, as Table 1 shows, fileserver has zero log files. Hence, the write counts of inodes under fileserver are nearly uniform. Since we alternately allocate inode slots from each page for create-file requests, NoWL shows almost evenly distributed write counts (20392 to 25788) on the pages in Fig. 4a. PCV further balances the writes of pages, as shown in Fig. 4b. However, due to its ignorance of the endurance variation of domains, the SD of wear ratios of PCV reaches 5.71. In this case, Contour shows balanced wear ratios of pages and the SD of wear ratios is only 0.56, which is 10.2x lower than that of PCV.

On webserver, OLTP, and MySQL, Contour still shows balanced wear of pages. The SDs of wear ratios of Contour are 0.95, 0.81, and 0.49 for webserver, OLTP, and MySQL respectively. On the contrary, many pages in NoWL are already worn out, for their write counts exceed the endurance of the corresponding pages, by even more than 100x for several pages. The maximum wear ratios in NoWL are 2740.3x higher than those of Contour on average. Moreover, the SDs of wear ratios in NoWL are 286.1x, 221.4x, and 417.8x higher than those of Contour for webserver, OLTP, and MySQL respectively. The highly imbalanced wear of pages in NoWL is due to the lack of a wear-leveling mechanism for the heavily-written files.

Figs. 4e, 4h, and 4k show that PCV also evenly distributes the writes to the pages on the latter three workloads. The wear ratios of pages, however, are highly imbalanced: the SDs of wear ratios on webserver, OLTP, and MySQL are 3.1, 3.3, and 2.7 respectively. The wear leveling of Contour outperforms PCV by 2.2x, 3.0x, and 4.5x on webserver, OLTP, and MySQL respectively. Meanwhile, the maximum wear ratios in Contour are 2.4x lower than those of PCV on average. Hence, Contour can achieve better wear leveling of pages than PCV.

Different from fileserver, Figs. 4f, 4i, and 4l show that some pages in Contour have relatively low write counts and wear ratios under webserver, OLTP, and MySQL. For example, there are 101 pages that have a wear ratio of less than 1 percent under MySQL. In webserver, OLTP, and MySQL, many files are heavily written, as Figs. 4d, 4g, and 4j illustrate. In this case, Contour migrates the related inodes considering the write budgets of the pages to balance the wear ratio. Some pages have not tolerated many writes yet, for the other pages still have write budgets. Therefore, the wear ratios of pages in Contour will become more balanced with the increased number of writes.

4.3 Performance Overhead

To understand the impact of the proposed Contour mechanism on the performance of file systems, we measure the throughput of write(), one of the most frequently used operations that update inodes, with FIO, and the Ops/s of application-level workloads with Filebench. "I/O Size" in Fig. 5 means the data size of each I/O request.

Fig. 5 shows that Contour and PCV have less than 1 percent performance variation on both write throughputs and application-level throughputs, even though Contour achieves better wear leveling.
Authorized licensed use limited to: R V College of Engineering. Downloaded on December 28,2021 at 16:04:01 UTC from IEEE Xplore. Restrictions apply.

Fig. 4. Comparison of page-level write distribution in NoWL, PCV, and Contour at 10 hours.

For sequential writes, PCV and Contour show 6.8 and 8.1 percent performance degradation on average compared with NoWL, which can easily cause data loss for being unaware of wear leveling. For random writes, the throughputs of PCV and Contour drop by 8.9 and 8.6 percent on average respectively compared to NoWL. The write throughput degradation of Contour decreases with the increase of I/O size. When the I/O size is larger than 32 KB, the sequential and random write throughputs of Contour are between 95.6 and 100.0 percent of those of NoWL. This is because the weight of the inode management cost in a write operation decreases with the increase of the size of the data transfer.

For the application-level workloads, we measure the Ops/s of fileserver, webserver, and OLTP on the file systems. Fig. 5c shows the experimental results. The performance of Contour achieves 98.2, 99.3, and 99.9 percent of that of NoWL for fileserver, webserver, and OLTP respectively. It shows that the performance overhead of Contour is negligible for applications with complex file access patterns.

4.4 Wear Overhead

PCV and Contour both have wear overheads caused by writing the migrated inodes to new locations and updating the offsets in the deflection table. Table 3 shows the evaluation results of the wear overheads. "AWR (%)" is calculated by dividing the additional writes caused by migration by the total write counts in the experiments. "Domain wear (%)" is the average wear ratio of the domains after running the workload for 10h. "AWER (%)" is the wear ratio caused by the additional writes. The AWR of PCV and Contour are both less than 1 percent on all the benchmarks.


Fig. 5. Performance comparison of NoWL, PCV, and Contour.

The average AWR of Contour is 0.06 percent higher than that of PCV. This is because Contour protects the weak memory domains by controlling the write budgets. The AWER of PCV and Contour are 0.041 and 0.034 percent respectively. The AWER of Contour is lower than that of PCV because Contour has a lower average wear ratio of domains. Hence, the wear overhead of Contour is negligible to the system.

TABLE 3
The Wear Overhead Caused by Migrations in PCV [1] and Contour

4.5 Sensitivity Analysis of Tuning the Write Budget Line

Contour manages the writes of domains using a Write Budget Line (WBL). Here we analyze the sensitivity of Contour to the WBL. We set the WBL of each round to 0.5, 1, 1.5, and 2 percent. Then, we run the webserver workload for 5 hours to evaluate the wear ratios of domains, the wear overhead, and the performance overhead.

Fig. 6a shows the wear ratios of memory domains in Contour with different WBL. The variation of wear ratios grows larger with the increase of WBL. On one hand, the SDs of wear ratios of the memory domains are 0.0023, 0.0126, 0.0074, and 0.018 when WBL is 0.5, 1, 1.5, and 2 percent respectively. On the other hand, the difference between the maximum and minimum wear ratio grows from 0.67 percent with "WBL=0.5" to 4.96 percent with "WBL=2". Therefore, the smaller the WBL is, the better wear leveling is guaranteed.

Nevertheless, a smaller WBL brings larger overhead. As Fig. 6b shows, the additional write ratio AWR of Contour decreases from 1.69 percent with "WBL=0.5" to 0.37 percent with "WBL=2". Accordingly, the additional wear ratio AWER of Contour is also reduced from 0.056 percent with "WBL=0.5" to 0.01 percent with "WBL=2". With a larger WBL, each domain can tolerate more writes and thereby reduce the number of migrations. The reduction of migrations results in lower wear overhead and overall performance overhead. We normalize the Ops/s with different WBL configurations to that with WBL=0.5, as shown in Fig. 6b. When WBL is 1, 1.5, and 2 percent, the performance shows 1, 1.1, and 1.4 percent improvement over that with WBL=0.5, respectively.

In summary, when WBL grows larger than 1 percent, the wear ratios of domains vary drastically while the overhead reduces only slightly. Thus, we set WBL to 1 percent in the experiments to balance the trade-off between the wear-leveling effect and the migration overhead.

Fig. 6. Sensitivity of tuning write budget line in Contour.

5 RELATED WORK

5.1 Wear Leveling in Persistent Memory File Systems

Several existing persistent memory file systems also consider wear leveling. BPFS [4] mentions that page-level wear leveling can be achieved by periodically swapping virtual-to-physical page mappings. However, swapping page mappings for different data in the file system is a complex mechanism that requires careful analysis and design. PMFS [5] leaves wear leveling to hardware, for its authors believe that software-based wear leveling may be overly complicated. In this paper, we show that software-based wear leveling has only negligible overhead for the file system. HMVFS [36] simply improves the wear leveling of metadata and file data by write ordering. UnistorFS [37] achieves wear leveling by interchanging pages between PM-based main memory and storage. The problem is that existing persistent memory file systems still maintain the inodes at fixed locations. The inode sections of these persistent memory file systems can be easily damaged by frequently updated inodes.

5.2 Process Variation Aware Wear Leveling

Process variation is not only a problem for PM but also for flash memory [38].


cess variation focus on optimizing the lifetime and perfor- [3] J. Izraelevitz et al., “Basic performance measurements of the intel
optane DC persistent memory module,” 2019, arXiv: 1903.05714.
mance in the SSD controller [39], [40], [41], rather than in [4] J. Condit et al., “Better I/O through byte-addressable, persistent
the system software. Besides, the granularity difference memory,” in Proc. ACM Symp. Operating Syst. Princ., 2009,
between flash memory and PM brings fundamental discrep- pp. 133–146.
ancy between their wear leveling techniques. There are also [5] S. R. Dulloor et al., “System software for persistent memory,” in
Proc. Euro. Conf. Comput. Syst., 2014, pp. 15:1–15:15.
works that consider process variation in the wear leveling [6] K. Zeng, Y. Lu, H. Wan, and J. Shu, “Efficient storage manage-
of PM. Dong et al. [15] propose a hardware migration algo- ment for aged file systems on persistent memory,” in Proc. Des.
rithm for wear rate leveling considering the trade-off Autom. Test Eur. Conf. Exhib., 2017, pp. 1773–1778.
[7] J. Ou, J. Shu, and Y. Lu, “A high performance file system for non-
between endurance benefits and migration cost. Yun volatile main memory,” in Proc. Eur. Conf. Comput. Syst., 2016,
et al. [42] present a bloom-filter-based algorithm to achieve pp. 12:1–12:16.
highly efficient fine-grained wear leveling of PCM. Zhao [8] E. Lee, S. H. Yoo, and H. Bahn, “Design and implementation of a
et al. [24], [43] take advantages of cell morphing to improve journaling file system for phase-change memory,” IEEE Trans.
Comput., vol. 64, no. 5, pp. 1349–1360, May 2015.
the endurance of MLC PCM cells. Han et al. [44] propose an [9] J. Xu and S. Swanson, “NOVA: A log-structured file system for
architecture-level approach to balance the wear rate of PCM hybrid volatile/non-volatile main memories,” in Proc. USENIX
chips. Zhang et al. [45] evenly put writes on weak or strong Conf. File Storage Technol., 2016, pp. 323–338.
domains according to an endurance-dependent probability. [10] E. H.-M. Sha, X. Chen, Q. Zhuge, L. Shi, and W. Jiang, “A new
design of in-memory file system based on file virtual address
Most of the process variation aware wear leveling works framework,” IEEE Trans. Comput., vol. 65, no. 10, pp. 2959–2972,
focus on architecture level, which bring large hardware cost Oct. 2016.
and show inflexibility on the granularity of migration. In [11] M. K. Qureshi, V. Srinivasan, and J. A. Rivers, “Scalable high perfor-
mance main memory system using phase-change memory tech-
this paper, the proposed OS-level approach, Contour, is nology,” in Proc. Int. Symp. Comput. Archit., 2009, pp. 24–33.
easy to deploy with negligible hardware and software costs. [12] F. Huang, D. Feng, Y. Hua, and W. Zhou, “A wear-leveling-aware
6 CONCLUSION

In this paper, we studied the wear-leveling problem of inodes, the most frequently updated data in a file system, on persistent memory under process variation. We showed the potential wear risk of inodes with a motivational example obtained on existing persistent memory file systems. We proposed a process variation aware wear-leveling mechanism, Contour, to balance the wear of the memory pages that store inodes. Contour enables dynamic migration of inodes through a deflection table. To tackle the varied endurance of memory domains, we designed a cross-domain migration algorithm that matches inodes to proper memory domains. Since the memory pages in the same domain show the same endurance, we presented an intra-domain migration algorithm that balances the writes on the pages within a memory domain. We implemented a prototype of Contour in the Linux kernel based on a real persistent memory file system. Experimental results show that Contour achieves significant wear-leveling improvement over existing solutions with negligible overhead.
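As a concrete illustration of the inode virtualization summarized above, the following minimal sketch shows how a deflection table might redirect a fixed inode number to a movable physical inode slot, and how a per-slot write counter could trigger an intra-domain remap. The structure layout, names, and threshold are illustrative assumptions, not the actual Contour or SIMFS implementation.

#include <stddef.h>
#include <stdint.h>

/*
 * Minimal sketch of inode virtualization through a deflection table
 * (illustrative names and layout; not the actual Contour/SIMFS code).
 * The inode number seen by the VFS stays fixed, while the physical slot
 * that stores the inode can be migrated for wear leveling.
 */
struct deflection_table {
    uint32_t *slot_of;      /* slot_of[ino] = current physical inode slot */
    uint64_t *slot_writes;  /* per-slot write counter used for migration  */
    uint32_t  nr_inodes;
};

/* Resolve a fixed inode number to the PM address of its inode. */
static inline void *inode_addr(struct deflection_table *dt,
                               void *inode_section, size_t inode_size,
                               uint32_t ino)
{
    uint32_t slot = dt->slot_of[ino];

    dt->slot_writes[slot]++;               /* account for the coming update */
    return (char *)inode_section + (size_t)slot * inode_size;
}

/* Intra-domain migration trigger: remap a hot inode to a colder slot. */
static void maybe_migrate(struct deflection_table *dt, uint32_t ino,
                          uint32_t cold_slot, uint64_t threshold)
{
    uint32_t hot_slot = dt->slot_of[ino];

    if (dt->slot_writes[hot_slot] >= threshold) {
        /* A real system would copy the inode to cold_slot and persist it
         * (memcpy + cache-line flush + fence) before switching the map. */
        dt->slot_of[ino] = cold_slot;
    }
}

In a real kernel implementation, the table itself would reside in persistent memory, and every remap would be made durable with cache-line flushes and memory fences before the old slot is reused.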
ACKNOWLEDGMENTS

The authors would like to thank the anonymous reviewers for their valuable feedback and suggestions that improved this article. This work was supported in part by the National Natural Science Foundation of China under Grant 61802038, in part by the Chongqing Postdoctoral Special Science Foundation under Grant XmT2018003, in part by the China Postdoctoral Science Foundation under Grant 2017M620412, and in part by the Joint Sino(Chongqing)-Singapore Post-Doctoral Fellowship Program.

REFERENCES

[1] X. Chen, E. H.-M. Sha, Y. Zeng, C. Yang, W. Jiang, and Q. Zhuge, "Efficient wear leveling for inodes of file systems on persistent memories," in Proc. Des. Autom. Test Eur. Conf. Exhib., 2018, pp. 1524–1527.
[2] P. Zhou, B. Zhao, J. Yang, and Y. Zhang, "A durable and energy efficient main memory using phase change memory technology," in Proc. Int. Symp. Comput. Archit., 2009, pp. 14–23.
[3] J. Izraelevitz et al., "Basic performance measurements of the Intel Optane DC persistent memory module," 2019, arXiv:1903.05714.
[4] J. Condit et al., "Better I/O through byte-addressable, persistent memory," in Proc. ACM Symp. Operating Syst. Princ., 2009, pp. 133–146.
[5] S. R. Dulloor et al., "System software for persistent memory," in Proc. Eur. Conf. Comput. Syst., 2014, pp. 15:1–15:15.
[6] K. Zeng, Y. Lu, H. Wan, and J. Shu, "Efficient storage management for aged file systems on persistent memory," in Proc. Des. Autom. Test Eur. Conf. Exhib., 2017, pp. 1773–1778.
[7] J. Ou, J. Shu, and Y. Lu, "A high performance file system for non-volatile main memory," in Proc. Eur. Conf. Comput. Syst., 2016, pp. 12:1–12:16.
[8] E. Lee, S. H. Yoo, and H. Bahn, "Design and implementation of a journaling file system for phase-change memory," IEEE Trans. Comput., vol. 64, no. 5, pp. 1349–1360, May 2015.
[9] J. Xu and S. Swanson, "NOVA: A log-structured file system for hybrid volatile/non-volatile main memories," in Proc. USENIX Conf. File Storage Technol., 2016, pp. 323–338.
[10] E. H.-M. Sha, X. Chen, Q. Zhuge, L. Shi, and W. Jiang, "A new design of in-memory file system based on file virtual address framework," IEEE Trans. Comput., vol. 65, no. 10, pp. 2959–2972, Oct. 2016.
[11] M. K. Qureshi, V. Srinivasan, and J. A. Rivers, "Scalable high performance main memory system using phase-change memory technology," in Proc. Int. Symp. Comput. Archit., 2009, pp. 24–33.
[12] F. Huang, D. Feng, Y. Hua, and W. Zhou, "A wear-leveling-aware counter mode for data encryption in non-volatile memories," in Proc. Des. Autom. Test Eur. Conf. Exhib., 2017, pp. 910–913.
[13] H. S. P. Wong et al., "Phase change memory," Proc. IEEE, vol. 98, no. 12, pp. 2201–2227, Dec. 2010.
[14] W. Zhang and T. Li, "Characterizing and mitigating the impact of process variations on phase change based memory systems," in Proc. IEEE/ACM Int. Symp. Microarchit., 2009, pp. 2–13.
[15] J. Dong, L. Zhang, Y. Han, Y. Wang, and X. Li, "Wear rate leveling: Lifetime enhancement of PRAM with endurance variation," in Proc. Des. Autom. Conf., 2011, pp. 972–977.
[16] Z. Sun, X. Bi, and H. Li, "Process variation aware data management for STT-RAM cache design," in Proc. Int. Symp. Low Power Electron. Des., 2012, pp. 179–184.
[17] W. Zhou, D. Feng, Y. Hua, J. Liu, F. Huang, and P. Zuo, "Increasing lifetime and security of phase-change memory with endurance variation," in Proc. IEEE Int. Conf. Parallel Distrib. Syst., 2016, pp. 861–868.
[18] J. Xu et al., "An efficient spare-line replacement scheme to enhance NVM security," in Proc. Des. Autom. Conf., 2019, pp. 91:1–6.
[19] X. Wu, S. Qiu, and A. L. Narasimha Reddy, "SCMFS: A file system for storage class memory and its extensions," ACM Trans. Storage, vol. 9, no. 3, pp. 1–11, 2013.
[20] V. Tarasov, E. Zadok, and S. Shepler, "Filebench: A flexible framework for file system benchmarking," Login, vol. 41, no. 1, pp. 6–12, 2016.
[21] "mysqlslap," 2019. [Online]. Available: https://dev.mysql.com/doc/refman/8.0/en/mysqllap.html
[22] "Fio: Flexible I/O tester," 2014. [Online]. Available: http://freecode.com/projects/fio
[23] A. P. Ferreira, S. Bock, B. Childers, R. Melhem, and D. Mossé, "Impact of process variation on endurance algorithms for wear-prone memories," in Proc. Des. Autom. Test Eur. Conf. Exhib., 2011, pp. 1–6.
[24] M. Zhao, L. Jiang, Y. Zhang, and C. J. Xue, "SLC-enabled wear leveling for MLC PCM considering process variation," in Proc. Des. Autom. Conf., 2014, pp. 36:1–36:6.
[25] S.-H. Chen, Y.-H. Chang, Y.-M. Chang, and W.-K. Shih, "mwJFS: A multiwrite-mode journaling file system for MLC NVRAM storages," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 9, pp. 2060–2073, Sep. 2019.
[26] M. Dong and H. Chen, "Soft updates made simple and fast on non-volatile memory," in Proc. USENIX Annu. Tech. Conf., 2017, pp. 719–731.
[27] R. Kadekodi, S. K. Lee, S. Kashyap, T. Kim, A. Kolli, and V. Chidambaram, "SplitFS: Reducing software overhead in file systems for persistent memory," in Proc. ACM Symp. Operating Syst. Princ., 2019, pp. 494–508.
[28] M. Dong, H. Bu, J. Yi, B. Dong, and H. Chen, "Performance and protection in the ZoFS user-space NVM file system," in Proc. ACM Symp. Operating Syst. Princ., 2019, pp. 478–493.
[29] Z. Ross, "Add support for new persistent memory instructions," 2014. [Online]. Available: https://lwn.net/Articles/619851/
[30] "Intel architecture instruction set extensions programming reference," 2020. [Online]. Available: https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
[31] C. Kuo, A. Shah, A. Nomura, S. Matsuoka, and F. Wolf, "How file access patterns influence interference among cluster applications," in Proc. IEEE Int. Conf. Cluster Comput., 2014, pp. 185–193.
[32] J. W. Hurrell et al., "The community earth system model: A framework for collaborative research," Bull. Amer. Meteorological Soc., vol. 94, no. 9, pp. 1339–1360, 2013.
[33] J. Borrill, L. Oliker, J. Shalf, and H. Shan, "Investigation of leading HPC I/O performance using a scientific-application derived benchmark," in Proc. ACM/IEEE Conf. Supercomputing, 2007, Art. no. 10.
[34] V. Gogte et al., "Software wear management for persistent memories," in Proc. USENIX Conf. File Storage Technol., 2019, pp. 45–63.
[35] Intel, "Persistent memory emulation platform (PMEP)," 2016. [Online]. Available: https://pmem.io/2016/02/22/pm-emulation.html
[36] S. Zheng, L. Huang, H. Liu, L. Wu, and J. Zha, "HMVFS: A hybrid memory versioning file system," in Proc. IEEE Symp. Mass Storage Syst. Technol., 2016, pp. 1–14.
[37] S.-H. Chen, T.-Y. Chen, Y.-H. Chang, H.-W. Wei, and W.-K. Shih, "UnistorFS: A union storage file system design for resource sharing between memory and storage on persistent RAM-based systems," ACM Trans. Storage, vol. 14, no. 1, 2018, Art. no. 3.
[38] M.-F. Chang and S.-J. Shen, "A process variation tolerant embedded split-gate flash memory using pre-stable current sensing scheme," IEEE J. Solid-State Circuits, vol. 44, no. 3, pp. 987–994, Mar. 2009.
[39] L. Shi, Y. Di, M. Zhao, C. J. Xue, K. Wu, and E. H.-M. Sha, "Exploiting process variation for write performance improvement on NAND flash memory storage systems," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 24, no. 1, pp. 334–337, Jan. 2016.
[40] Y. Di, L. Shi, K. Wu, and C. J. Xue, "Exploiting process variation for retention induced refresh minimization on flash memory," in Proc. Des. Autom. Test Eur. Conf. Exhib., 2016, pp. 391–396.
[41] Y. Luo, S. Ghose, Y. Cai, E. F. Haratsch, and O. Mutlu, "Improving 3D NAND flash memory lifetime by tolerating early retention loss and process variation," ACM Meas. Anal. Comput. Syst., vol. 2, no. 3, 2018, Art. no. 37.
[42] J. Yun, S. Lee, and S. Yoo, "Dynamic wear leveling for phase-change memories with endurance variations," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 9, pp. 1604–1615, Sep. 2015.
[43] M. Zhao, L. Jiang, L. Shi, Y. Zhang, and C. J. Xue, "Wear relief for high-density phase change memory through cell morphing considering process variation," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 34, no. 2, pp. 227–237, Feb. 2015.
[44] Y. Han, J. Dong, K. Weng, Y. Wang, and X. Li, "Enhanced wear-rate leveling for PRAM lifetime improvement considering process variation," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 24, no. 1, pp. 92–102, Jan. 2016.
[45] X. Zhang and G. Sun, "Toss-up wear leveling: Protecting phase-change memories from inconsistent write patterns," in Proc. Des. Autom. Conf., 2017, pp. 1–6.

Xianzhang Chen (Member, IEEE) received the BS and MS degrees in computer science and engineering from Southeast University, Nanjing, China, and the PhD degree from the College of Computer Science, Chongqing University, China, in 2017. He was a research fellow with the National University of Singapore from 2019 to 2020. He is currently an associate professor with Chongqing University. He was a recipient of best paper awards in NVMSA'2015 and ICCD'17, "the Editor's Pick of 2016" of IEEE TC, and the Chongqing Best PhD Dissertation Award in 2018.

Edwin H.-M. Sha (Senior Member, IEEE) received the PhD degree from the Department of Computer Science, Princeton University, Princeton, NJ, in 1992. From 1992 to 2000, he was with the Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN. Since 2000, he has been a tenured full professor with the University of Texas at Dallas, Richardson, TX. From 2012 to 2017, he served as the dean of the College of Computer Science, Chongqing University, China. He is currently a tenured distinguished professor with East China Normal University. He was a recipient of the Teaching Award, the Microsoft Trustworthy Computing Curriculum Award, the NSF CAREER Award, the NSFC Overseas Distinguished Young Scholar Award, and the Chang-Jiang Honorary Chair Professorship.

Xinxin Wang is currently working toward the master's degree in computer science with Chongqing University, Chongqing, China. His research interests include emerging non-volatile memory technologies and in-memory file systems.

Chaoshu Yang received the BS degree from the College of Computer Science, South-Central University for Nationalities, in 2008. He is currently working toward the PhD degree with the College of Computer Science, Chongqing University. His current research interests include non-volatile memories, distributed systems, and file systems.

Weiwen Jiang received the PhD degree from the College of Computer Science, Chongqing University. He is currently a post-doctoral scholar with the University of Notre Dame. He was a recipient of a best paper award in ICCD'17 and best paper nominations in DAC'19 and CODES+ISSS'19. His current research interests include neural architecture search, FPGAs, non-volatile memories, and HW/SW co-optimization.

Qingfeng Zhuge (Member, IEEE) received the BS and MS degrees in electronics engineering from Fudan University, Shanghai, China, and the PhD degree from the Department of Computer Science, University of Texas at Dallas, Richardson, TX, in 2003. She is currently a professor with East China Normal University, China. Her current research interests include parallel architectures, embedded systems, real-time systems, optimization algorithms, and scheduling. She was the recipient of the Best PhD Dissertation Award in 2003.
" For more information on this or any other computing topic,
please visit our Digital Library at www.computer.org/csdl.
