
IEEE TRANSACTIONS ON COMPUTERS, VOL. 70, NO. 9, SEPTEMBER 2021

Zweilous: A Decoupled and Flexible Memory Management Framework

Guoxi Li, Student Member, IEEE, Wenzhi Chen, Member, IEEE, and Yang Xiang, Fellow, IEEE

Abstract—With the booming growth of cloud computing, workloads with a broad range of functions and demands are crammed into a single physical machine. They place considerable stress on the evolution of the operating system underneath, especially the memory subsystem. Even enhancing large pages with main memory compression is not straightforward, owing to rigid rules imposed by the state-of-the-art manager, the Buddy System, from the very beginning of its design. To relieve these problems and provide broader design space for system designers, we propose Zweilous, a clean-slate physical memory management framework. It is self-contained and highly decoupled, and can therefore co-exist with the vanilla memory manager. Separate, self-contained metadata/functions guarantee flexible extension with little modification to the current framework. To show how easily enhanced functions can be added to accelerate the evolution of the memory management subsystem, we implement Hzmem, a redesigned large page memory manager enhanced with main memory compression. Our method achieves competitive performance compared with native and virtualized large page support, increases the effective memory size, and has little impact on other parts of the operating system.

Index Terms—Operating systems, memory management, memory architecture

1 INTRODUCTION

THE memory management subsystem is an indispensable part of any modern commodity operating system. Historically, memory management has evolved from a naive implementation into a very sophisticated paged memory management framework, hand in hand with the evolution of memory hardware. The best-known manager of paged memory, the Buddy System, is pervasive in modern commodity operating systems such as Linux and FreeBSD. However, the algorithm behind it traces back to the 1960s [1].

Along with enhancements to the basic paged memory management, many subsystems and features have been added in succession. They have become heavily coupled with the Buddy System, complicating the codebase.

Nowadays, the memory management subsystems of operating systems are notoriously difficult to develop and maintain. Kernel developers are in a continuous battle against faults, suboptimal and unpredictable performance, and increasing development complexity [2].

The root cause is the simplified assumptions, or premises, about the underlying physical memory around which the design has been centered from the beginning, such as a single physical address space or a constant page size [3]. These assumptions have become entrenched over time and make the memory manager rigid and bug-prone.

Page descriptors, the key metadata of the memory subsystem, are the very embodiment of these assumptions: for example, one page descriptor contains the data for exactly one base page of the constant page size. When it comes to supporting large pages, developers have no alternative but to combine the page descriptors of contiguous base pages into a compound page for one large page. That inevitably introduces large overheads in time and space. Worse, large pages become second-class objects with limited functionality, relying on the first-class objects, the base pages. These obsolete assumptions impose constraints on development and hinder the further evolution of the memory management subsystem.

However, it is difficult to eliminate these constraints. One solution for system developers or researchers is to make radical changes to metadata/functions at the low levels of the system. Because these metadata/functions also serve many other subsystems, the modifications become entangled with them, and the labor costs are prohibitive even for experienced developers, as shown in many previous works [4], [5], [6], [7], [8], [9].

Another solution is to use separate, customizable and self-contained metadata/functions. Because the modifications are isolated, developers are given as much freedom as possible without having to consider constraints from other subsystems, and they can use memory resources to maximum advantage with the knowledge they possess. However, excessive overhauls of the memory subsystem, which is at the heart of the operating system and functionally interconnected with many other subsystems, bring instability to the system.

Therefore, we propose a general-purpose memory management framework abstraction built from scratch, called Zweilous.

Guoxi Li and Wenzhi Chen are with the School of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang 310027, China. E-mail: {guoxili, chenwz}@zju.edu.cn.
Yang Xiang is with the School of Software and Electrical Engineering, Swinburne University of Technology, Hawthorn, VIC 3122, Australia. E-mail: yxiang@swin.edu.au.
Manuscript received 13 Oct. 2019; revised 14 Apr. 2020; accepted 10 June 2020. Date of publication 14 July 2020; date of current version 8 Sept. 2021. (Corresponding authors: Wenzhi Chen and Yang Xiang.) Recommended for acceptance by A. Sivasubramaniam. Digital Object Identifier no. 10.1109/TC.2020.3009124

0018-9340 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: National University Fast. Downloaded on November 23,2022 at 10:54:26 UTC from IEEE Xplore. Restrictions apply.

The key techniques of Zweilous are as follows: a) well-defined placeholders for the self-contained, customizable metadata/functions that are necessary for a full-featured memory manager, and b) well-partitioned underlying physical memory regions that are managed by specific managers.

Through these, Zweilous can run simultaneously with the vanilla memory manager, leaving critical data such as kernel data managed by the vanilla memory manager and reducing instability when developers deploy experimental implementations of new ideas. It gets rid of rigid assumptions through customizable metadata/functions and hides many non-trivial, general details of physical memory manager development. It is the first such memory management framework in a commodity OS, with three benefits:

- Decoupled and flexible extension, co-existing with the vanilla manager and free from the "burden of history". Zweilous provides decoupled and flexible methodological support for clean-slate memory manager implementation by separating customizable, self-contained metadata/functions and management from the vanilla memory manager. It therefore enables multiple memory managers to co-exist cooperatively in one operating system, which greatly reduces the instability introduced by a newly designed memory manager.

- Non-intrusive implementation of new ideas with little modification. The framework remodels the current operating system memory infrastructure in a less intrusive way. Zweilous enables efficient development of ambitious or aggressive physical memory managers that manage their resources on their own for higher performance and/or richer features, with little effort.

- More programmability for ambitious developers. The framework is based on a general understanding of memory management development rather than precise definitions. To help developers introduce and modify memory management services easily, and to focus on new ways of managing memory resources without taking care of details, the framework hides many non-trivial general details and provides programmability at two levels. First, the framework can provide a compatible API for userspace applications at the application level. Second, it can manage well-defined partitions of physical memory at the page level. Ambitious developers therefore need only design the metadata/functions, such as page descriptors and page fault handlers, and fill in the placeholders for the targeted memory allocator.

On the other hand, both industry and academia have witnessed a growing movement towards cloud computing in recent years. In response, circumstances and scenarios have evolved to the point where intricate applications are very likely to deplete the entire memory of a single physical machine. Traditional databases, NoSQL stores, and large router machines fall into such categories of memory-hungry workloads, where a large memory footprint fails to bring about good temporal locality [4], [10], [11], [12], [13].

Two existing techniques are already used for such workloads: a) large pages, also called hugepages, to increase TLB reach and thus reduce the page faults and TLB misses of big-memory workloads; and b) main memory compression, to increase effective memory size so that a single machine can accommodate more workloads, thus reducing costs. Each is frequently used in these scenarios separately. However, the labor cost of applying them together is prohibitive because of some obsolete assumptions. For example, a single base page size is assumed in the current page reclaiming subsystem on which the vanilla main memory compression framework is based. All of this hinders the enhancement of large pages with main memory compression.

Motivated by the above, we have implemented a new physical memory manager for memory-compressible hugepages called Hzmem. It takes advantage of Zweilous to show how easily enhanced functions can be added to accelerate the evolution of the memory subsystem. Hzmem is a new data and control path of physical memory management running in parallel with the regular path. It includes a self-contained hugepage allocator that uses no utilities provided by the Buddy System. Hzmem achieves competitive performance against the vanilla large page support in benchmarks of frequent and heavy hugepage allocation. We also believe that Hzmem is the first physical memory manager that inherently allows main memory compression over large pages. Hzmem can increase effective memory size while having little impact on the rest of the operating system.

This research makes the following contributions:

- We analyze the implementation and execution characteristics of current physical memory management and identify the root cause of why its design is rigid and closed to extension. We thus pioneer a clean-slate approach of managing memory independently alongside the current manager. It opens up another direction for memory subsystem evolution, letting system designers and researchers experiment with new ideas or algorithms with less effort than was previously possible.

- We propose a new physical memory management framework abstraction named Zweilous, which is self-contained and decoupled from the current memory framework. To the best of our knowledge, this is the first effort to design a physical memory framework that is independent of, and co-exists with, the unmodified one.

- We design and implement new large page support named Hzmem, which enhances large pages with main memory compression with moderate effort.

- We demonstrate a thorough evaluation of Zweilous and Hzmem, showing that the framework and its specific application achieve competitive performance, increase effective memory, and have little impact on other subsystems.

The remainder of the paper is structured as follows: Section 3 outlines the high-level design and architecture of Zweilous, Section 4 describes the implementation of Hzmem, Section 5 presents the evaluation of Zweilous and Hzmem, and Section 6 concludes.


2 RELATED WORK

Although virtual memory is an active research area, our work on a new memory management framework for commodity operating systems is, to the best of our knowledge, the first such effort.

2.1 Memory Management Architecture

Recently, several ambitious studies [2], [4], [5], [6] have made efforts to review or improve the memory subsystem.

Some studies give an overview of the evolution of the Linux memory subsystem. Huang et al. [2] conduct a quantitative survey of virtual memory's development process over five years (2009-2015). The insights of this work help system designers and developers track hot functions and build more reliable and efficient memory management systems. This painstaking study gives us lessons and starting points for diving into this research area.

Clements et al. [5] propose a new design called RadixVM that makes mmap, munmap and page faults scale perfectly on non-overlapping memory regions. The drawback of RadixVM is the intensive labor of disentangling it from other subsystems if applied to a commodity operating system; hence only an implementation on a Unix-like research kernel is presented. This study again shows that modifying the memory subsystem of a commodity operating system needs to be thought through twice, and that avoiding this risk by using Zweilous is valuable.

Park et al. [7] propose a new physical memory manager called the lazy iBuddy system. They eliminate the overheads of splitting and coalescing by managing pages individually, and reduce lock contention by exploiting a fine-grained locking mechanism. However, this work has to replace the vanilla memory manager, while ours can co-exist with the Buddy System. Considering the complexity of a memory manager, our approach is more reliable.

2.2 Huge Page Support

Large page support in commodity operating systems was pioneered at the beginning of this century. Navarro et al. [8] pioneer official large page support in FreeBSD. They introduce techniques of reservation-based allocation and fragmentation control on top of the vanilla memory manager, whereas Hzmem is built upon the new memory framework Zweilous, which is independent of the vanilla base page allocator. Kwon et al. [9] propose Ingens, a framework for transparent huge pages that tracks the utilization and access frequency of memory pages. Panwar et al. [14] present a study of the impact of fragmentation on huge pages in the Linux kernel, together with an efficient memory manager called Illuminator that makes cost-effective allocations. The problems these works try to resolve, such as fragmentation of base pages, are also the focus of our work.

2.3 Main Memory Compression

Most research on main memory compression focuses on hardware modification. Ekman et al. [15] propose a main memory compression scheme that reduces access costs by exploiting a highly efficient structure for locating data and a hierarchical memory layout that varies compressibility with fewer overheads. Pekhimenko et al. [16] propose Linearly Compressed Pages (LCP), which compresses all the cache lines within the same page to the same size using special hardware. Unlike these works, Hzmem needs no hardware modifications.

Tuduce et al. [17] propose a main memory compression solution that resizes the compressed area automatically. This work gives us much insight into how to manage compressed data in software, and we drew much inspiration from it for our compression management.

2.4 Limitations of Previous Works

Previous works improve or enhance the memory management subsystem by introducing heavy software or hardware modifications. Although the main ideas are simple, such works require much domain-specific knowledge to implement. This paper does not focus on the traditional method of memory subsystem improvement; rather, it focuses on a general-purpose memory management framework abstraction to facilitate the implementation of new ideas.

3 DESIGN

In this section, we describe the design and architecture of Zweilous. We focus on the motivation for the redesign and the choices made during the design process. To clarify the problems we give examples from Linux, but the idea is not specific to Linux.

3.1 Motivation

The rapid development of cloud computing, and thus the advent of emerging demands, has a tremendous impact on the evolution of software and hardware infrastructure. For example, non-volatile memory (NVM) technologies, which are receiving increasing attention in academia and industry, blur the line between main memory and persistent storage [18], [19], [20], [21], while big-memory workloads of hundreds of GBs call for much more elaborate and adaptive overcommitment functions in the underlying memory management. These new situations pose big challenges for current memory management, whose now obsolete design dates back decades.

Our efforts to build Zweilous were first motivated by improving memory-hungry workloads without good temporal locality. In this section, we discuss two existing techniques, large pages and main memory compression, along with the current challenges in combining them, and briefly mention how we address these challenges. Finally, we give further examples of why a framework is needed so that the memory subsystems of an operating system can evolve independently and quickly.

3.1.1 Large Pages

Most modern commodity operating systems support large pages. For example, Linux has supported large pages since around 2003 [22]; there they are called hugepages. Unlike other operating systems, Linux supports large pages explicitly via the hugetlbfs filesystem. Developers need to map sections of memory from files in hugetlbfs.


The process first reads or writes those sections and thus triggers a page fault. While handling the page fault, the virtual memory mapping of hugetlbfs makes the operating system associate the virtual memory with one free large page instead of one free base page. However, the free large page is not allocated directly from the physical memory manager (i.e., the Buddy System) but from large page memory pools reserved in advance by administrators.

The large pages such a pool maintains are in fact contiguous base pages reserved from the Buddy System. Thus, the memory pool acts as a "broker" or "middle person" between the process demanding large pages and the Buddy System of the physical memory manager.

Why can processes not use large pages directly from the physical memory manager? It is a burden of history. The legacy physical memory manager had been running without trouble for a long time before operating systems started to support large pages. The design of the physical memory manager in Linux is based, from the very beginning, on the simplifying assumption that the page size is constant (i.e., the 4 KB base page in Linux). Meanwhile, many enhanced features, such as swapping and the slab allocator, were added to it not long afterwards. They all exclusively handle base pages using the page descriptors for base pages, which entrenches the assumption and makes the memory management system rigid and complicated.

Therefore, Linux kernel developers had to take a lengthy detour and implement large page support on top of the vanilla memory manager, at the expense of overheads and maintainability problems. In Linux, one large page is treated as 512 contiguous base pages in an aligned 2 MB memory region, yet it consumes the same number of page descriptors. The page descriptor is the key metadata called struct page, a per-base-page data structure holding the meta-information used by the physical memory manager.

3.1.2 Main Memory Compression

A compressed-memory system reserves some sections of memory to hold pages in compressed form, making the effective memory larger instead of resorting to disk reads/writes for swapping. Main memory compression schemes such as zRAM [23], zswap [24] and zcache [25] are supported in Linux, but their implementations leverage the page reclaiming subsystem. Page reclaiming centers around the Buddy System and expects a page size of 4 KB. Therefore, the hugepages of the legacy memory management are detached from page reclaiming for practical reasons.

The constraints that make it difficult to reuse the current infrastructure for our goal are: 1) the current page reclaiming subsystem expects a single base page size exclusively, not multiple page sizes; it is non-trivial to change this situation, and even some simple key assumptions that Linux has held for a long time have to be challenged; and 2) current large page support does not participate in the page reclaiming subsystem. Furthermore, the fact that large pages in Linux have no backing storage devices makes swapping and writing back to persistent storage impossible.

One solution is to modify the base code of page reclaiming, or even the page descriptors, the metadata, to change this situation. As one of the most used pieces of metadata in the kernel, the page descriptor has swollen in size, and so many other subsystems use it that a single-byte modification has a large impact on other parts of the operating system. The task is time-consuming and laborious because of the complex invariants of memory systems.

We choose another solution: self-contained, customizable metadata/functions are used to eliminate these constraints. The implementation can have as much freedom as possible, without the impact on other subsystems that the first solution would incur. Developers need only pay attention to the specific parts of the system, not the other parts.

3.1.3 Slow Evolution of Memory Management

The memory subsystem is a mature, core kernel subsystem. Over decades it has gained many more functionalities and become interconnected ever more inextricably with other subsystems. Its codebase has grown large and complex. Therefore, recent development focuses mainly on bug fixes, code maintenance, and optimizations rather than new features [2]. To get a general understanding of the memory subsystem relative to other subsystems in terms of recently added features, we surveyed the important features of the Linux kernel from version 4.0 to 5.5, a time range from Apr 2015 to Jan 2020. The sources are the "LinuxChanges" pages on kernelnewbies.org, which describe kernel changes whenever a major revision of the kernel is released. We looked at the titles in the "prominent features" or "coolest features" sections after checking their descriptions. We observe that 18 features are associated with memory subsystems, accounting for only 7.9 percent of the 228 important features in total. Memory management changes are very few. For example, 4 of them are related to system calls such as mmap and madvise; they are deemed important because some in-kernel services are exposed to user space, which has a greater impact on others. Therefore, if we raise the standards, the percentage of new and important features recently added to memory subsystems is even lower.

We find that large, audacious memory management changes often take many years to find their way into the mainline, like userspace page fault handling [26], and some incomplete functions take a couple of years to complete, like using persistent memory as RAM [27]. Although our early motivation is contingent, these examples show that a framework should be built to hide non-trivial, general details and make the subsystem more extensible for developers. The programmability it achieves can help memory subsystems evolve more quickly. Developers can use the framework for these benefits:

- Zweilous aims to generalize. The components of our framework are: a) well-defined placeholders for self-contained, customizable metadata/functions that are necessary for a full-featured memory manager, and b) well-partitioned underlying physical memory regions that are managed by to-be-implemented memory managers. It is easy for developers to start designing the very core of a memory manager without learning or referring to the details that memory managers have in common. Our experience shows that a developer without any kernel development experience can start to work in two weeks.


- Zweilous can run with the vanilla memory manager in one operating system. Zweilous achieves programmability at two levels: the page level and the API level. Changing and debugging memory management is difficult because many in-kernel debugging/tracing tools rely on memory to run. The case is different in Zweilous. The underlying physical memory is well partitioned: important data in the operating system, such as kernel data, is managed by the vanilla manager, which has proven robust and stable over a long time, while the memory of the applications under test is handled by the newly implemented memory manager based on Zweilous. Developers can easily introduce and modify new memory management services with far fewer kernel crashes, which greatly reduces development time and labor.

Therefore, we conclude that enhancing large pages with main memory compression using the existing infrastructure and tools is very difficult and convoluted, and that it is advisable to redesign a general-purpose physical memory management framework based on much more flexible and extensible assumptions, without bringing instability to the Linux base code.

3.2 Goals and Challenges of Design

We want the code modifications in both kernel space and user space to be as small as possible. Thus: 1) in user space, the framework interface should be compatible with the legacy hugepage API; and 2) in kernel space, a self-contained framework that uses no existing in-kernel utilities (i.e., functions or APIs provided by the kernel) does not overlap with the old, stable kernel code.

The main goals are therefore to make our framework decoupled and flexible, while systems based on Zweilous do not compromise functionality or performance. The new systems can run simultaneously with other subsystems and even with the vanilla memory management. We take hardware virtualization support into consideration to make our framework more versatile for cloud computing.

When a large, old and complicated system is retrofitted with a new feature that affects or crosscuts all functions, the retrofit does not make a better new system. There are many such cases in the system design domain. For example, the Buddy System, an allocator originally designed to manage a single page size, is retrofitted with a new feature for large pages. Though the new feature would break the old principles and assumptions, it has to remain subordinate to them by reusing the existing page descriptors. That leaves the design phase with less freedom and results in an awkward implementation.

The solution is to put the new features on the same level as the old ones; this is our first-class principle. The key points of our framework are: a) well-defined placeholders for self-contained, customizable metadata/functions that are necessary for a full-featured memory manager, and b) well-partitioned underlying physical memory regions that are managed by specific managers. We decide to implement the huge page physical memory management from a clean slate, based on the redesigned memory management framework Zweilous.

From practical development with this high-level design, we derive the following metadata/functions that a new memory management framework needs to consider:

- invisibility from the vanilla memory management subsystem;
- a new page descriptor representing physical page states;
- a page fault handler branch;
- a page reclaiming mechanism.

While considering the details and implementations of these aspects, system designers and developers should not ignore optimization.

3.2.1 Invisibility From the Vanilla Memory Management Subsystem

To make our framework as decoupled as possible from the vanilla memory management subsystem in Linux, we decide to make the framework completely invisible to the vanilla memory subsystem without making any significant modification to other parts of the operating system. During the boot process, some memory must be removed from the management of the vanilla memory management subsystem. The reserved memory region is thus invisible to the operating system but visible to our new memory framework, and applications access this memory through the framework. Our new memory framework can then manipulate the reserved memory region exclusively, without interference from any other Linux subsystem.

3.2.2 New Page Descriptor Representing One Physical Page's States

A normal page descriptor represents one physical page's states and is the key metadata for the memory manager. This general metadata is modified by many subsystems in a frequent and complicated way throughout the whole life of the operating system. It therefore contains various states concerning many different subsystems in just one data structure.

For example, the complete definition of struct page has a size of over four double words on the x86-64 architecture, even though developers have optimized it, leaving almost no means untried. The C union is seen throughout its definition; which parts can be put together into one union is quite domain-specific knowledge, and there is no guarantee that the length of the structure will stay unchanged. Many states concerning different subsystems are tangled together, which causes readability and maintainability problems in this overused data structure. Moreover, the assumption of a constant page size originates in the general page descriptors.

The new memory management focuses on the memory regions that are detached and invisible from the vanilla memory management subsystem. Given the main design goals of decoupling and flexibility, reusing the old page descriptor, with its readability and maintainability problems and its basis in the obsolete assumption of a 4 KB page size, has no merit and is not even necessary. But a page descriptor holding the information of one physical page is indispensable, therefore we decide to use a completely new page descriptor.


descriptor and make its size as small as possible. It only


holds the information needed by our new memory manage-
ment and free of readability and maintainability problems
with custom page size it can present.

3.2.3 Page Fault Handler Branch

Our new framework aims to run on top of commodity hardware. When it comes to memory management, it is difficult to avoid covering page faults, which deeply involve the hardware mechanism. For example, in the x86-64 architecture, during translation from a virtual memory address to the physical memory address (we call this a page walk), if the state of the physical page in the page table entry indicates "not present", a page fault is triggered by hardware without software interference. A page fault handler registered by the operating system during booting is invoked to deal with the exception; it is responsible for bringing back the contents of the page and setting up the correct physical memory address in the page table entry.

The new framework is designed to be as decoupled and flexible as possible. However, the page fault handler is global and there is only one in the operating system, a fact determined by hardware that cannot be changed through software. Therefore, still in a decoupled and flexible manner, we make a branch in the old page fault handler that deals with the page faults exclusive to the memory region we reserved in advance. This requires only minor modifications to the old page fault handler: a branch check and a new branch handling the new page fault code path for the reserved memory region. None of this touches the old code path, keeping it stable and decoupled.
3.2.4 Page Reclaiming Mechanism

As a full-featured memory management framework, the page reclaiming mechanism should not be omitted. Various subsystems in Linux are built on the old page reclaiming mechanism, such as swapping, page migration, page writeback, and main memory compression. They all run simultaneously according to the page states provided by the aforementioned page descriptor. Because they depend heavily on the page states crammed into one page descriptor, these page descriptors are subject to race conditions. The locks and state checks this introduces make the code concerning page reclaiming lengthy and complicated, and thus bug-prone.
However, our new memory management framework gives up the old page descriptor, which frees us from the aforementioned problems.

To stay decoupled and simple, we design a new page reclaiming mechanism that is completely detached from the old page reclaiming mechanism and based on the page states provided by the new page descriptor.

3.3 Architecture of Zweilous

To sum up, the decoupled and flexible memory management framework is shown in Fig. 1. Zweilous contains a placeholder for a physical memory allocator that can be implemented from academic or industrial requirements or ideas, and the whole framework is self-contained and co-exists with the vanilla one at the same time in a decoupled and flexible way.

Fig. 1. The architecture of the decoupled and flexible memory management framework.

There are three layers in the whole system: user space, kernel space, and hardware. Zweilous is decoupled from the vanilla manager in all three layers:

- In the hardware layer, physical page frames are divided into two regions: one managed by the vanilla manager and the other by Zweilous. To keep them decoupled, no region is managed by Zweilous and the vanilla manager together at any point in time.
- In the kernel layer, Zweilous uses its own page descriptors, physical memory allocator, and page reclaiming mechanism in as self-contained a way as possible, i.e., there are no data structures or code paths tangled with the vanilla memory management subsystem except a branch in the global page fault handler (because most hardware platform specifications stipulate one global page fault handler for all physical memories).
- In user space, Zweilous provides specially-designed APIs or compatible ones. Through these, administrators decide which parts of the virtual memory of a certain process are backed by the physical memory managed by Zweilous. Other processes, or other parts of virtual memory, are still backed by the vanilla manager, which lets most processes run in a stable and decoupled way.

Between the user and kernel layers, specific system calls can be used to specify parts of virtual memories. Between the hardware and kernel layers, there is a constant one-to-one match between physical page frames and page descriptors. Therefore, Zweilous is vertically decoupled and isolated.

4 IMPLEMENTATION OF HZMEM

Leveraging Zweilous, we implement Hzmem, a new hugepage manager with main memory compression and hugetlbfs API compatibility. Zweilous takes Hzmem and puts it in the placeholder of the New Memory Allocator shown in Fig. 1. Physical memories allocated from the different memory managers in Zweilous can be used by applications through the normal API or the hugetlbfs API.

Hzmem consists of four parts: the physical large memory manager, the page fault handler for large pages, page reclaiming for compressing large pages, and large page compression data management. Fig. 2 shows the workflows of the four parts.

A complete workflow involves two paths — the compression path and the decompression path — and two
entities — a fault handler and a compression daemon — and one end — a compression data manager.

Fig. 2. Workflows of Hzmem: two paths, two entities, and one end.

Fig. 3. Management of hugepages in Hzmem. In the figure, there are 3 sections in one node. Each node has two LRU lists for holding allocated pages.

Compression Path. A periodically-running compression daemon selects pages with little prospect of being used in the future (we call them cold pages) and passes them to the compression data management component. The compression data manager compresses them in a lossless way.

Decompression Path. This path starts when a user space application accesses a page that was compressed on the path above. When that happens, a special page fault is triggered and the corresponding handler quickly restores the compressed pages through the compression data manager.

We implement 3196 lines of C code (LOC). It runs well on Linux kernel versions with the vanilla Linux memory manager, the Buddy System.
4.1 Hugepage Physical Memory Allocator

We take a clean-slate approach and implement the hugepage physical memory allocator from the ground up. Overall, it uses the design principles described in Section 3.

4.1.1 New Page Descriptors

Currently, one important data structure regarding memory management in Linux is struct page, since many subsystems and functionalities of memory management, like page allocation/free in the Buddy System, page reclaiming, etc., use it for getting and setting page states. Each struct page represents one base page, a decision made in the initial design phase. As memory capacity increases, this means a large number of page descriptors, forcing developers to rack their brains cramming more into a single page descriptor just to keep its size small, at the expense of readability and maintainability [28].

Hugepages (Linux's large page support) use this data structure to reuse page allocation/free from the Buddy System by merging 512 base pages into one compound page. The states are stored in the page descriptor at the head, leaving the other page descriptors unused: 511 of the 512 page descriptors are just wasted space.

We introduce new page descriptors exclusively for Hzmem. This has two benefits:

- In Hzmem, the vanilla large page support that treats one large page as 512 contiguous base pages is dropped. A newly-customized page descriptor is introduced. One page descriptor strictly represents one large page and co-exists with the vanilla page descriptors that represent base pages for other parts of the operating system. The one-to-one mapping gets rid of any "compound" and space-wasting problems.
- The new page descriptors are completely self-contained and specialized, without being used by other parts of the operating system. They contain fewer structure fields, with much better readability and maintainability.

The reduced number and size of the new page descriptors shrink the memory footprint enormously, an effect amplified on machines with a large capacity of physical memory under severe pressure.

4.1.2 Deallocated and Allocated Page Management

Each node of a NUMA system usually has tens of GBs of memory on servers. To manage such large memory efficiently, one NUMA node should not be managed in the same way as in Linux. Zweilous gets rid of the Buddy System, and thus many of its key assumptions cannot hamper Hzmem's flexible management of memory.

As shown in Fig. 3, on top of Zweilous, the hugepage memory of one NUMA node contains several sections. Currently, we heuristically set the size of one section to 4 GB. Thus, free pages are scattered across different sections, reducing lock contention among CPUs. Memory management at a smaller granularity achieves better scalability and parallelism.

The state-of-the-art Buddy System manages contiguous blocks of base pages in counts of 2^n, where n is called the order. An (n+1)-order physical memory block is exactly twice the size of an n-order block. To best fit the size of the requested memory, a large-order block is split into halves of one order lower when necessary. The two halves are called buddies of each other, which is where the name Buddy System comes from. To keep contiguous physical memory as large as possible, both buddies, when free, are coalesced into one block of one order higher.

Blocks of excessively large orders are seldom requested and bring lots of overhead when allocated/freed. The upper limit of the order in Linux is empirically 10, which makes the largest contiguous block 4 MB, the size of only 2 contiguous hugepages. Thus, it is not advisable to maintain free huge pages in contiguous blocks.

To make the implementation of Hzmem simple, we arrange free hugepages in linked lists. The non-trivial splitting and coalescing of buddies are averted. In Hzmem, allocation and free of hugepages are done in O(1) time, at faster speed.
4.2 Page Fault Handler

Page faults are categorized into hard and soft. They are distinguished by whether contents must be read from disk to fill the free page returned from the memory manager, which means that hard page faults make the performance of the system suffer.

Since huge pages do not map persistent storage, we pay attention only to the soft page faults that trigger page table operations in our implementation of Hzmem, which greatly simplifies the code path of page faults.

There are two cases when processing page faults in Hzmem:

1) The most common one: a virtual page is accessed for the first time and is not resident in memory. The huge page fault is a soft page fault, so the new physical memory manager allocates a zeroed huge page.
2) The special one: a protection violation is triggered. Either a shared page has to be returned from the page caches or a compressed page has to be decompressed by the compression data manager.
Hardware-based virtualization helps to improve virtualization performance and simplify guest OS implementation. We also extend Hzmem with virtualization support, in terms of the x86-64 CPU and Intel VT-x, for practical purposes.

Intel VT-x introduces many new features. In terms of CPU virtualization, root and non-root modes guarantee different privileges of CPU execution for isolation and trap-and-emulate; in terms of memory virtualization, nested paging, called the Extended Page Table (EPT), is used for mapping from guest physical memory addresses to host physical memory addresses.

However, it is not possible to apply Hzmem directly. The challenge comes from the different page sizes of host and guest. Both host and guest can have 2 sizes of pages: 2 MB and 4 KB. That forms 4 combinations mathematically, but only 3 exist in reality — the combination of 2 MB in the guest and 4 KB in the host does not exist in Linux.

Ideally, everything just works when page sizes match in host and guest. An EPT fault is a VM exit, and thus execution falls back to the VMM. In handling the EPT fault to fill the EPT table, the VMM triggers a host page fault which brings in the "real page". Both page fault handlers simply use the two corresponding EPT page entries and host page entries of the same size, which can be perfectly dealt with by Hzmem or the vanilla manager.

The situation is not straightforward in the third combination — 4 KB in the guest and 2 MB in the host.
1) Managing Two Sizes in One Memory Manager. In this case, one host page entry of 2 MB can be involved in many EPT faults of 4 KB whose fault addresses fall in the same 2 MB memory range. The Hzmem memory manager has to deal with these two kinds of pages at the same time, while Hzmem is inherently designed for managing pages with a size of 2 MB.
2) More Complicated Page Fault Handling. When EPT faults in the same 2 MB range occur successively, only the first one triggers a host page fault and loads the 2 MB memory pointed to by the page table entry. The rest do not have to trigger a host page fault, since the 2 MB memory is already resident. However, they cannot simply point to the same 2 MB page, since the fault addresses fall at different offsets within the 2 MB range.

Therefore, some tweaks to page faults exist in Hzmem to support nested paging in a hardware-based virtualization environment.

Our solution to the mismatch of page sizes shows the flexibility of our design. We add one field to the new page descriptor to contain the offset within the 2 MB block. Because of our self-contained design, it is convenient to add one field without impact on other subsystems.

We make the page descriptor available to the EPT page fault handler. When the first EPT fault occurs within one 2 MB memory block, the VMM sets up the EPT page table and the host page table correctly and loads a free 2 MB page. Other EPT faults within the same 2 MB block can be handled through the offset carried by the page descriptor, without manipulating the host page table. The offset in the page descriptor is transient, and it is reusable in every EPT fault within the block.

4.3 Page Reclaiming and Compression Data Management

The allocated pages are identified as cold or hot. The cold ones are reclaimed rather than the hot ones, which helps reduce throttling of the operating system when memory is not enough. Based on Zweilous, we have our own page descriptors for storing the states of hugepages, so a self-contained page reclaiming mechanism can be established. In every node, a daemon periodically monitors the usage of huge pages, with a watermark indicating whether memory is enough; otherwise, it triggers page reclaiming. We use the second chance algorithm [29] taken from vanilla Linux; further investigation of hot/cold page identification for specialized optimization is left for future work.

The hugepage compression data manager takes the role of compressing and decompressing hugepages. We choose LZ4, a lossless data compression algorithm with better decompression speed [30]. Since the compressed form does not have a fixed size, we store the compressed data in base pages for less fragmentation.

5 EVALUATION

We measure Zweilous and Hzmem with many user applications and benchmarks, comparing against the state-of-the-art hugetlbfs in Linux. Experiments are performed on a 64-GB-memory server with 16 Intel Xeon E7520 1.87 GHz CPUs. Linux 3.10 and CentOS 7 are used for both host and guest environments. Base pages are 4 KB and large pages are 2 MB.

After describing some details of our development experience to show the programmability, we first test Zweilous on native, non-virtualized and virtualized platforms, using SPECjvm2008 [31] and SPECCPU2017 [32] for overheads and STREAM [33] for throughputs. Then, datasets of the Yelp Dataset Challenge [34] are used to evaluate the effective memory increase in Hzmem. Finally, a series of user applications are conducted to test the page fault overhead exclusively, for performance isolation. Parameters for hot and cold pages are consistent throughout the experiments: a watermark of 80 percent and a detection period of 10 seconds.

5.1 Programmability of Zweilous

The major benefit of our framework is that developers can easily introduce and modify memory management services. We evaluate how long it takes to implement a new memory manager in Zweilous by reporting the time it took one student of our lab to add compression/decompression to Hzmem. The student had no prior experience of kernel development, and therefore needed to survey compression/decompression techniques in the Linux kernel and some basics of kernel coding style. Our framework removes the need to learn the kernel API, as all the metadata/functions are self-contained. The student had only to learn how to put the invocation of the customized compression/decompression services in the branch of the page fault handler. It took 2 weeks to start actually coding, and less than one week to finish the prototype. We believe more experienced developers can start more quickly, while the finishing time depends on the complexity of the design, which is outside the scope of our work.

5.2 Overheads of Zweilous

We want to measure the overheads introduced by Zweilous in the SPECCPU benchmarks. We utilize a tool called hugectl in libhugetlbfs [35]. It is a wrapper for hosting applications that remaps the text and data sections of applications in large pages. The heap sections used by libc can also be hooked by redirecting them to mappings of large pages. All this can be achieved without any modification of source code.

Fig. 4. Overhead of Hzmem in SPECCPU and SPECjvm. All values are normalized to the baseline of unmodified Linux.

From the results shown in Fig. 4a, we find that Zweilous slows down most benchmarks, but by no more than 2.32 percent, and 0.58 percent on average. Since SPECCPU is a CPU-intensive benchmark suite, the benefits of the new design are not obvious. The performance reduction can be explained by the extra page fault handling with decompression/compression and by the lack of polish in the code compared to the vanilla one. Though Zweilous does not show many advantages here, it works competitively, and not too badly, in non-memory-intensive environments.

Fig. 4b shows the overheads of Zweilous in the SPECjvm benchmarks. SPECjvm focuses on the performance of a Java Runtime Environment and reflects the performance of the hardware processor and memory subsystem. We configure SPECjvm to make the Java heap use large pages.

The results show that Zweilous works better in most cases, by 3.56 percent at best. The worst case is -4.88 percent. Zweilous runs much better in SPECjvm than in SPECCPU. Since SPECjvm is a data-intensive benchmark suite, the benefits of the new design show their advantage. Using Zweilous, the management of large pages eliminates much redundant memory footprint by reducing the number of page descriptors, and simplifies the execution of page fault handling of large pages thanks to the new code path.

5.3 Throughputs of Hzmem

We want to measure the throughput of memory managed by Zweilous, compare it with the unmodified vanilla manager, and see how it scales with large memory usage.

Table 1 compares the throughput of the STREAM benchmark in Hzmem to unmodified Linux. STREAM is a synthetic benchmark of memory bandwidth with simple vector operations: Copy, Scale, Add, and Triad. The STREAM benchmark is suitable for datasets that are beyond the CPU cache. We run STREAM with different vector sizes, indicated by the x-axis in units of millions.

The results show that with vector sizes ranging from 20 million to 40 million, the throughputs of Hzmem and unmodified Linux differ by no more than 0.2 percent. On the other hand, with vector sizes from 80 million on, the gap between them gets wider. Hzmem achieves at most 0.6 percent better throughput, and 0.5 percent on average. This shows that Zweilous scales better as working sets increase greatly.

When memory usage is much larger than the cache capacity and fragmented after long running, with no free large pages of the fitting size, the vanilla memory allocator has to coalesce memory to obtain the required sizes, bringing overheads and throughput degradation. However, Zweilous uses new metadata for each large page and eliminates the overheads of splitting or coalescing neighboring memory in the Buddy System of the vanilla memory manager, which shows great advantages at large scale.

TABLE 1
Throughput (MB/s) of the STREAM Benchmark for Different Sizes (Millions) of Vectors

          COPY            SCALE           ADD             TRIAD
Size(m)   Hzmem  Unmod    Hzmem  Unmod    Hzmem  Unmod    Hzmem  Unmod
20        3031   3035     2595   2592     2825   2823     2352   2352
40        3024   3018     2588   2582     2814   2807     2321   2319
60        3030   3013     2591   2578     2816   2801     2324   2315
80        3028   3009     2591   2575     2810   2795     2323   2310
100       3030   3010     2590   2576     2824   2809     2342   2332

5.4 Virtualized Platform

We test Zweilous on a virtualized platform using libvirt 3.9.0 and QEMU 1.5.3. Both guest and host use Linux 3.10. QEMU [36] is a generic and open-source machine emulator and virtualizer that can leverage hardware virtualization on the x86-64 platform. It is a user-mode virtual machine launcher and monitor. To use Hzmem to manage the memory used by a VM, we launch QEMU and map its memory via the aforementioned hugetlbfs API.

5.4.1 Start Time

We want to measure how long it takes to create and boot a virtual machine using management from Zweilous, how it scales as the memory capacity of running VMs increases, and how these compare to an unmodified virtual machine.

In our test, we increase the memory capacity from 2 GB to 32 GB in steps of 2 GB. The results are shown in Fig. 5. An unmodified virtual machine with 2 GB memory capacity starts in about 1.5 seconds, scaling almost linearly to a maximum of 16 seconds with 32 GB memory capacity. On the other hand, the Zweilous curve in the figure starts at 3.7 seconds and rises to 14 seconds. Below 22 GB memory capacity, Zweilous is inferior to the unmodified VM in start time. However, from 22 GB up to 32 GB, Zweilous shows a great advantage over the unmodified one.

Fig. 5. Start time of VMs using Zweilous and unmodified Linux.

Zweilous contains only a prototype code base without the polish that many superior engineers have given the unmodified one. However, it has a more advanced design that reduces the redundancy of the vanilla large page management. Therefore, it scales well compared to the unmodified one in the start time of virtual machines.

5.4.2 Overheads of Different Combinations

We want to measure the overheads of the combinations of different page sizes in host and guest, which are detailed in Section 4.2. In both cases, QEMU in the host creates and maps the main memory of the virtual machine using the 2 MB page size. In the guest, we configure the SPECjvm benchmark using two page sizes to stress the memory subsystems through different page fault code paths.

The first case is that the benchmark uses the 4 KB page size inside the VM. In this case, EPT faults return execution to the VMM, and the VMM handles the EPT faults at 4 KB size, which further triggers normal page faults at 2 MB in the host. If successive EPT faults occur within the same 2 MB memory block, no host page faults are triggered and the correct offsets within the 2 MB memory block are set without trouble. Fig. 6a shows the results of this case. The best case is 3.37 percent and the worst is -0.91 percent. Most results are within the range of plus/minus 1.0 percent. Zweilous achieves comparable performance with the unmodified one.

Fig. 6. Overheads of the benchmark using different page size combinations. All values are normalized to the baseline of unmodified Linux.

Fig. 7. Effective memory of Hzmem and unmodified Linux.

The second case is that the benchmark uses the 2 MB page size inside the VM. Here, page fault handling is not as complicated as in the first case. Every EPT fault goes hand in hand with one host page fault, which simply manipulates two entries in two page tables. Fig. 6b shows the results of this second case. Zweilous performs much better than in the first case. Only three benchmarks show slower performance than the unmodified one, none more than 1.0 percent worse. Four benchmarks run better, with over 2.0 percent improvement, and the best is 5.0 percent. This case goes through the simple code path in page fault handling, which brings out the best of the design in reducing the redundancy of the vanilla memory subsystems.

5.5 Evaluation of Hzmem

In this section, we test our experimental huge page support enhanced with main memory compression, based on our memory framework Zweilous.

5.5.1 Effective Memory Increase

Effective memory size is the actual amount of memory applications can access. It depends on the datasets and the average ratio of the compression algorithm. More importantly, the management area of compressed data in main memory also contributes to it and reflects the optimization of the implementation.

Applying the LZ4 algorithm to the datasets of the Yelp Dataset Challenge, an average compression ratio of 0.23 is obtained. We want to measure the effective memory increase achieved by Hzmem. Both Hzmem and unmodified Linux have the same size of large page memory (shown on the x-axis of Fig. 7) before evaluation. The benchmark keeps requesting memory and accessing the allocated memory. When the benchmark fails due to memory exhaustion, the amount of memory it could access is the effective memory size.

A comparison of the effective memory increase of Hzmem with unmodified Linux is shown in Fig. 7. It is clear that unmodified Linux fails to increase the effective memory size once the amount of memory set up before running is depleted completely. Hzmem is capable of increasing the effective memory size by 4465 percent, while the gap narrows as the memory set up beforehand gets close to the capacity of the hardware memory. This may seem perplexing, but the explanation is simple and clear. Data in compressed form are stored in the rest of the main memory. The smaller the amount of memory set up in advance, the larger the remaining space to store the compressed data. However, as more data get compressed, the overheads increase when they are decompressed. The balance between effective memory size and performance is left for future work to study.

5.5.2 Overhead and Performance Isolation of Page Fault

We want to measure the throughput of page faults, the correctness of decompression, and the CPU utilization introduced by Hzmem.

Overheads of memory access are mostly introduced by page walks from TLB misses and by page fault handling from missing page table entries. The performance penalty is larger for the latter, as page fault handling is resolved in a more complicated way, in software. Thus, the efficiency of page fault handling in commercial operating systems is critical to accelerating memory accesses.

We want to evaluate the page fault handling of Hzmem in an isolated way. We write a benchmark that incurs heavy page faults. It first requests certain amounts of large pages and then accesses the pages with random numbers, triggering page faults. There are two cases, depending on whether the working set of the benchmark is larger than the amount of memory set up before running. The main memory compression introduced by Hzmem can reclaim huge pages into the compressed areas, making room for more free huge pages to be requested by applications. Therefore, applications with working sets larger than the huge page capacity can still run without stopping.

Fig. 8. Throughput of page faults using a stress test. Overcommitted is the case where working sets are beyond the huge page capacity; the non-overcommitted case is not. When beyond capacity, unmodified Linux, which cannot increase effective memory, has no throughput at all and the benchmark stalls.

In Fig. 8, we call the case overcommitted when working sets are larger than the huge page capacity, and non-overcommitted otherwise. For the overcommitted cases, the huge page capacity is set to 4 GB. Overheads of data compression from

page reclaiming devalue the results. Therefore, we let the ACKNOWLEDGMENTS


benchmark sleep for 30 seconds to cool down the pages and
The authors would like to thank the members of ARC Lab of
subtract this period of time for fairness. And in case of
Zhejiang University for their constructive comments and
cooled down pages, the results can reflect the exclusive
help during the work. They would also want to thank the
throughput of page fault handling.
reviewers for their feedback to improve this article.
First, the correctness of decompressing is proved in our
experiments.
Fig. 8a shows throughputs of Hzmem and unmodified REFERENCES
Linux when non-overcommitted. Hzmem has larger through- [1] K. C. Knowlton,“A fast storage allocator,” Commun. ACM, vol. 8,
put but the gap is within 9 MB/s, 4 percent in the best case. no. 10, pp. 623–624, 1965.
Fig. 8b shows throughputs of Hzmem and unmodified [2] J. Huang, M. K. Qureshi, and K. Schwan, “An evolutionary study
of linux memory management for fun and profit,” in Proc. USE-
Linux when overcommitted. When working sets are less NIX Conf. USENIX Annu. Tech. Conf., 2016, pp. 465–478.
than 4 GB there is almost no difference. When requesting [3] N. Ganapathy and C. Schimmel, “General purpose operating sys-
more than 4 GB there is no throughput of unmodified Linux tem support for multiple page sizes,” in Proc. USENIX Annu. Tech.
as the effective memory size hits the ceiling while that of Conf., 1998, no. 98, pp. 91–104.
[4] A. Basu, J. Gandhi, J. Chang, M. D. Hill, and M. M. Swift,
Hzmem is not zero and decreasing no more than 27 percent. “Efficient virtual memory for big memory servers,” Proc. 40th
The performance is quite higher than disk accesses while Annu. Int. Symp. Comput. Architecture, 2013, pp. 237–248.
swapping. [5] A. T. Clements, M. F. Kaashoek, and N. Zeldovich, “Radixvm:
Scalable address spaces for multithreaded applications,” in Proc.
The CPU utilization of compressing daemons is mea- 8th ACM Eur. Conf. Comput. Syst., 2013, pp. 211–224.
sured in the meantime: 16.6 percent at most and 11.0 percent [6] S. Gerber, G. Zellweger, R. Achermann, K. Kourtis, T. Roscoe, and
in average, which can be optimized in future works. D. Milojicic, “Not your parents’ physical address space,” in Proc.
15th Workshop Hot Topics Operating Syst., 2015, Art. no. 16.
[7] H. Park, J. Choi, D. Lee, and S. H. Noh, “IBuddy: Inverse buddy for
enhancing memory allocation/deallocation performanceon multi-
5.5.3 Summary
Zweilous is a general-purpose memory management framework abstraction built on decoupled, flexible, and self-contained metadata/functions, and it co-exists with the vanilla memory manager. As a result, Hzmem can be implemented in a more isolated and non-intrusive way on commodity operating systems. Both native and virtualized Hzmem achieve competitive performance and throughput against their vanilla equivalents.
6 CONCLUSION
With the development of cloud computing and AI and the improvement of hardware architectures, workloads that require big memory are becoming pervasive. Nobody can predict the memory patterns of tomorrow's dominant workloads; what is certain is that workloads are becoming more diverse. These workloads stress memory heavily and force memory management to improve. However, the obsolete assumptions baked into memory subsystems have become burdens that severely hinder the evolution of general-purpose operating systems.

We propose a new memory management framework abstraction called Zweilous to address this problem. It is decoupled and flexible, with customizable self-contained metadata/functions, and can co-exist with the vanilla memory manager in a separate and independent way. Leveraging Zweilous, researchers can experiment with and implement creative memory manager designs safely and robustly, which helps the memory subsystems of operating systems evolve more rapidly.
As one way of realizing our idea of combining the techniques of large pages and main memory compression, we implement Hzmem with little modification to the Linux kernel codebase. Using the new memory framework, we overcome many challenges that would otherwise seem hopeless. Hzmem achieves performance competitive with native and virtualized large page support, increases the effective memory size, and has fewer impacts on other parts of the operating system.
Guoxi Li (Student Member, IEEE) received the BS degree in computer science from Zhejiang University. He is currently working toward the PhD degree at Zhejiang University. His research interests include operating systems and system virtualization. He is a student member of the ACM.

Wenzhi Chen (Member, IEEE) received the PhD degree from the College of Computer Science and Engineering at Zhejiang University. He is a professor with the College of Computer Science and Technology, Zhejiang University, and the director of the Information Technology Center of Zhejiang University; he previously served as vice dean of the College of Computer Science and Technology. His current research interests include embedded systems and their applications, computer architecture, computer system software, and information security. He is a member of the ACM and the ACM Education Council.

Yang Xiang (Fellow, IEEE) received the PhD degree in computer science from Deakin University, Australia. He is currently a full professor and the dean of the Digital Research & Innovation Capability Platform, Swinburne University of Technology, Australia. His research interests include cyber security, covering network and system security, data analytics, distributed systems, and networking. He also leads the blockchain initiatives at Swinburne. In the past 20 years, he has worked in the broad area of cyber security, covering network and system security, AI, data analytics, and networking. He has published more than 300 research papers in international journals and conferences. He is the editor-in-chief of the SpringerBriefs on Cyber Security Systems and Networks. He serves as an associate editor of IEEE Transactions on Dependable and Secure Computing and IEEE Internet of Things Journal, and an editor of the Journal of Network and Computer Applications. He served as an associate editor of IEEE Transactions on Computers and IEEE Transactions on Parallel and Distributed Systems. He is the coordinator, Asia, for the IEEE Computer Society Technical Committee on Distributed Processing (TCDP).