
Why Memory reclamation:

ESXi supports memory overcommitment to provide higher memory utilization
and a higher consolidation ratio. To support memory overcommitment
effectively, the hypervisor provides efficient host memory reclamation techniques.
ESXi uses the following techniques to reclaim virtual machine memory:

Transparent page sharing (TPS)

Ballooning

Memory compression

Hypervisor swapping

Do check the links for a detailed discussion of each of these techniques.
Now the question is: when do these techniques run? Always, or only at a specific
threshold? Let's explore that too.
Which memory reclamation technique is active depends on which memory state is
currently active.
The possible memory states in vSphere are as follows.

High

Clear (New in vSphere 6 onward)

Soft

Hard

Low

I have explained these states in another article on the sliding scale method.


The chart below shows which memory reclamation technique is active in each
memory state.

NOTE: From vSphere 6 onward, inter-VM TPS is turned OFF by default.
However, if you enable it, TPS runs all the time and tries to share memory pages
as it did in older versions of ESXi, but this applies only to small
memory pages, i.e. 4KB pages.

When available free memory is below the High threshold but above the Clear
threshold, as in the chart above, ESXi starts preemptively breaking up large pages
so that TPS (if enabled in vSphere 6) can collapse them at its next run cycle.

If the amount of available free memory drops slightly below the Mem.MinFreePct
threshold, as in the chart above, the VMkernel applies ballooning to reclaim memory.

The ballooning memory reclamation technique introduces the least amount of
performance impact on the virtual machine because it works together with the guest
operating system inside the virtual machine; however, there is some latency
involved with ballooning.

Compression helps to avoid hitting the Low state without impacting virtual
machine performance, but if memory demand is higher than the VMkernel's
ability to reclaim, the drastic measure of hypervisor swapping is taken to avoid
memory exhaustion.

However, hypervisor swapping degrades VM performance because of issues such as
high latency and paging/double paging. For this reason, this
reclamation technique is used only when the situation requires drastic measures.
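
The relationship between free memory and the active techniques can be sketched
in a few lines of Python. This is only an illustration of the description above:
the band boundaries (4.00, 1.00 and 0.32 times the minimum free memory target)
and the function name are assumptions made for the sketch, not values taken from
the chart, so check them against the chart for your ESXi version.

```python
# Band boundaries are ASSUMED illustrative values, expressed as a fraction of
# the minimum free memory target (minFree). They are not taken from the chart.
ASSUMED_BANDS = [
    # (lower bound as a fraction of minFree, techniques active in that band)
    (4.00, []),                                   # plenty of free memory
    (1.00, ["break large pages"]),                # below High, above Clear
    (0.32, ["break large pages", "ballooning"]),  # just below minFree
    (0.00, ["compression", "hypervisor swapping"]),
]


def active_techniques(free_mb, min_free_mb, tps_enabled=False):
    """Return the techniques for the band that free_mb falls into."""
    ratio = free_mb / min_free_mb
    result = []
    for lower_bound, techniques in ASSUMED_BANDS:
        if ratio >= lower_bound:
            result = list(techniques)
            break
    if tps_enabled:
        result.insert(0, "TPS")  # once enabled, TPS runs in every band
    return result


print(active_techniques(free_mb=5000, min_free_mb=1000))
print(active_techniques(free_mb=2000, min_free_mb=1000, tps_enabled=True))
print(active_techniques(free_mb=900, min_free_mb=1000))
print(active_techniques(free_mb=100, min_free_mb=1000))
```

The ordering of the bands is the point here: the cheaper techniques come first
and hypervisor swapping only appears in the last band.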

Transparent Page Sharing (TPS) in vSphere 6.0


Also do check the Ballooning and Compression articles in this series on VMware Memory
Reclamation.
On an ESXi host, you may have several virtual machines that run the same guest
operating system, have the same applications, or contain the same user data. Because
of this, memory pages created by different virtual machines may be identical in
content. So instead of keeping multiple identical pages in host memory for each
virtual machine, TPS is used to share memory pages.
In vSphere 6, intra-VM TPS is enabled by default and inter-VM TPS is disabled by
default, due to some security concerns as described in VMware KB 2080735.
With page sharing, the hypervisor reclaims the redundant copies and keeps only one
copy, which is shared by multiple virtual machines in the host physical memory. As a
result, the total virtual machine host memory consumption is reduced and a higher
level of memory overcommitment is possible.
How does TPS work?
ESXi scans the content of guest physical memory for sharing opportunities. Instead of
comparing each byte of a candidate guest physical page to other pages, ESXi uses
hashing to identify potentially identical pages.

Image: VMware

A hash value is generated based on the content of the virtual machine's physical
page (GA) and stored in a global hash table. Each entry in the global hash table
includes a hash value and the physical page number of a shared page.

The hash value is used to look up the global hash table. If the hash value of a
virtual machine's physical page matches an existing entry in the hash table, a
bit-by-bit comparison of the page contents is performed to exclude any false match.

Once the virtual machine physical page's content matches the content of an
existing shared host physical page, the guest physical (GA) to host physical (HA)
mapping of the virtual machine physical page is changed to the shared host
physical page, and the redundant host memory copy (the page pointed to by the
dashed arrow in the above image) is reclaimed.

This remapping is invisible to the virtual machine and inaccessible to the guest
operating system. Because of this invisibility, sensitive information cannot be
leaked from one virtual machine to another.

Image: VMware

Any attempt to write to the shared pages will generate a minor page fault. In the
page fault handler, the hypervisor will transparently create a private copy of the
page for the virtual machine and remap the affected guest physical page to this
private copy. A standard copy-on-write (CoW) technique is used to handle writes
to the shared host physical pages.
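
The lookup, comparison and copy-on-write steps described above can be sketched
as a toy model. The class and method names are hypothetical, and SHA-1 is only a
stand-in for whatever hash function ESXi uses internally. This is a simplified
illustration, not the VMkernel implementation.

```python
import hashlib


class Host:
    """Toy host memory with a global hash table for page sharing."""

    def __init__(self):
        self.pages = {}       # host page number -> page content
        self.refcount = {}    # host page number -> number of guest mappings
        self.hash_table = {}  # content hash -> shared host page number
        self.next_hpn = 0

    def _alloc(self, content):
        hpn = self.next_hpn
        self.next_hpn += 1
        self.pages[hpn] = content
        self.refcount[hpn] = 1
        return hpn

    def map_page(self, content):
        """Back a guest page, sharing an existing host page when possible."""
        digest = hashlib.sha1(content).hexdigest()
        candidate = self.hash_table.get(digest)
        # A hash match is only a hint; a full comparison rules out collisions.
        if candidate is not None and self.pages[candidate] == content:
            self.refcount[candidate] += 1
            return candidate
        hpn = self._alloc(content)
        self.hash_table.setdefault(digest, hpn)
        return hpn

    def write(self, hpn, new_content):
        """Copy-on-write: writing to a shared page yields a private copy."""
        if self.refcount[hpn] > 1:
            self.refcount[hpn] -= 1
            return self._alloc(new_content)
        # Sole owner: write in place and drop the stale hash entry.
        old_digest = hashlib.sha1(self.pages[hpn]).hexdigest()
        if self.hash_table.get(old_digest) == hpn:
            del self.hash_table[old_digest]
        self.pages[hpn] = new_content
        return hpn


host = Host()
zero = bytes(4096)
a = host.map_page(zero)           # page from VM 1
b = host.map_page(zero)           # identical page from VM 2, shared with a
c = host.write(b, b"x" * 4096)    # VM 2 writes, so it gets a private copy
print(a == b, c != a, len(host.pages))
```

Note that the private copy created on a write is not added to the hash table in
this sketch; in a real system it would be picked up again on a later scan.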

In hardware-assisted memory virtualization (Intel EPT and AMD RVI) systems, ESXi will
not share large pages because:

The probability of finding two large pages having identical contents is low

The overhead of doing a bit-by-bit comparison for a 2MB page is much larger
than for a 4KB page

However, ESXi still generates hashes for the 4KB pages within each large page.
Since ESXi will not swap out large pages, a large page (2MB) is broken into small
pages (4KB) during host swapping so that these pre-generated hashes can be used to
share the small pages before they are swapped out.
What is Salting in TPS?
Salting is used to allow more granular management of the virtual machines participating
in TPS. Salting is enabled after the ESXi updates listed below are deployed:
ESXi 5.0 Patch ESXi500-201502001
ESXi 5.1 Update 3
ESXi 5.5 Patch ESXi550-201501001
ESXi 6.0
By default, salting is set to Mem.ShareForceSalting=2 and each virtual machine has a
different salt. This means page sharing does not occur across virtual machines
(inter-VM TPS) and only happens inside a virtual machine (intra-VM).
When salting is enabled (Mem.ShareForceSalting=1 or 2), both the salt and the content
of the page must be the same in order to share a page between two virtual machines. A
salt value is a configurable vmx option for each virtual machine. You can manually
specify the salt value in the virtual machine's vmx file with the new vmx option
sched.mem.pshare.salt. If this option is not present in the virtual machine's vmx file,
the value of the vc.uuid vmx option is taken as the default. Since vc.uuid is
unique to each virtual machine, by default TPS happens only among the pages
belonging to a particular virtual machine (intra-VM).
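
The sharing rule described above can be written as a small predicate. This is a
sketch under assumptions: the function names are hypothetical, a VM's vmx options
are modelled as a plain dict, and the sketch does not distinguish between the
values 1 and 2 of Mem.ShareForceSalting (see VMware KB 2097593 for how those two
values differ).

```python
def effective_salt(vmx, force_salting=2):
    """Salt used when looking for sharing candidates.

    vmx is a dict of the VM's .vmx options. With salting disabled (0),
    every VM gets the same salt, so identical pages can be shared across VMs.
    """
    if force_salting == 0:
        return None
    # Fall back to vc.uuid when no explicit salt is configured.
    return vmx.get("sched.mem.pshare.salt", vmx["vc.uuid"])


def can_share(vmx_a, page_a, vmx_b, page_b, force_salting=2):
    """Two pages are shareable only if both salt and content match."""
    salt_a = effective_salt(vmx_a, force_salting)
    salt_b = effective_salt(vmx_b, force_salting)
    return salt_a == salt_b and page_a == page_b


page = bytes(4096)
vm1 = {"vc.uuid": "uuid-1"}
vm2 = {"vc.uuid": "uuid-2"}
vm3 = {"vc.uuid": "uuid-3", "sched.mem.pshare.salt": "web-tier"}
vm4 = {"vc.uuid": "uuid-4", "sched.mem.pshare.salt": "web-tier"}

print(can_share(vm1, page, vm2, page))                   # different default salts
print(can_share(vm3, page, vm4, page))                   # same explicit salt
print(can_share(vm1, page, vm2, page, force_salting=0))  # salting disabled
```

The salt strings and UUIDs in the example are made up; any string works as long
as the virtual machines that should share pages use the same one.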
How can I enable or disable salting?

1. Log in to ESX(i)/vCenter with the VI Client.
2. Select the relevant ESX(i) host.
3. In the Configuration tab, click Advanced Settings (link) under the Software
section.
4. In the Advanced Settings window, click Mem.
5. Search for Mem.ShareForceSalting and set the value to 1 or 2 (enable salting)
or 0 (disable salting).
6. Click OK.
7. For the changes to take effect, do either of the following:
o Migrate all the virtual machines to another host in the cluster and then back to
the original host, or
o Shut down and power on the virtual machines.

Steps to specify the salt value for a virtual machine:


1. Power off the virtual machine on which you want to set the salt value.
2. Right-click the virtual machine and click Edit Settings.
3. Select the Options menu, then click General under the Advanced section.
4. Click Configuration Parameters.
5. Click Add Row; a new row is added.
6. On the left-hand side, enter sched.mem.pshare.salt, and on the right-hand side, specify the unique string.
7. Power on the virtual machine for the salting to take effect.
8. Repeat steps 1 to 7 to set the salt value for each individual virtual machine.
Note: The same salt value can be specified on multiple virtual machines to achieve page sharing across them.
You can change the TPS behavior by applying the salting mechanism as described in
VMware KB 2097593.

Memory Reclamation-Ballooning

Do check my previous articles on TPS and Compression in this series on VMware
Memory Reclamation, as compression starts after TPS and ballooning.
In simple terms, ballooning is a process in which the hypervisor reclaims memory
from the virtual machine. Ballooning is initiated when the ESXi host is running out of
physical memory, that is, when the memory demand of the virtual machines is too high
for the host to handle.
But before I describe ballooning, it is a good idea to understand why we need to reclaim
memory from a virtual machine.
To understand why reclamation is needed, let's look at how an operating system
manages memory allocation on a physical system. The diagram below gives an idea of
how memory pages are handled by the operating system.

For example, when I open MS Outlook for the first time on my computer, it takes some
time to load all the pages of that program. Now let's say I close Outlook and
reopen it a couple of minutes later. This time I do not need to wait as long;
in fact, it opens more quickly. So what happened in the back end?
Well, when I started the application the first time, it loaded all the required pages of
that program into memory; we call these Active Pages or MRU (most recently used)
pages. When I closed the application, its memory pages that were loaded as MRU pages
were not deleted from memory. Instead, the operating system kept those pages as LRU
(least recently used) or idle pages, because the application may need them again if a
request comes in, as in my example where I started the application again.
Keeping pages in LRU is a really good approach to managing memory pages and ensuring
performance, but it suits physical systems.
The challenges this approach creates for virtual machines are as follows.

The hypervisor has no visibility of the free list, LRU and MRU memory pages that are
managed by the operating system of a virtual machine.

So if multiple VMs demand memory resources and later keep memory
pages in LRU even after the workload is no longer present, this results in
unnecessary consumption of the ESXi host's memory, which can cause
memory contention when multiple VMs put high demand on memory resources.

On the other hand, the operating system of a virtual machine is not aware that
the ESXi server is under memory contention, because the virtual machine's operating
system has no visibility of ESXi memory consumption and cannot detect the
host's memory shortage.

So, to overcome host memory contention caused by the issues mentioned above, we use
the ballooning reclamation technique. The balloon driver (VMMEMCTL) is loaded into
the guest operating system when we install VMware Tools.

In Figure (A), four guest physical pages are mapped in the host physical memory. Two
of the pages are used by the guest application and the other two pages (marked by
stars) are in the guest operating system free list. Note that since the hypervisor cannot
identify the two pages in the guest free list, it cannot reclaim the host physical pages
that are backing them. Assuming the hypervisor needs to reclaim two pages from the
virtual machine, it will set the target balloon size to two pages.
After obtaining the target balloon size, the balloon driver allocates two guest physical
pages inside the virtual machine and pins them, as shown in Figure (B). Here, pinning
is achieved through the guest operating system interface, which ensures that the pinned
pages cannot be paged out to disk under any circumstances.
Once the memory is allocated, the balloon driver notifies the hypervisor of the page
numbers of the pinned guest physical memory so that the hypervisor can reclaim the
host physical pages that are backing them. In Figure (B), these pages are shown in
RED and GREEN.
The hypervisor can safely reclaim this host physical memory because neither the
balloon driver nor the guest operating system relies on the contents of these pages.
If any of these pages are re-accessed by the virtual machine for some reason, the
hypervisor will treat it as normal virtual machine memory allocation and allocate a new
host physical page for the virtual machine.
OK. The above description is as per the VMware documentation. To
understand this in simple terms, let's walk through the same process again.

When the ESXi host is under memory contention, it sets a target for the
balloon driver.

To meet the target, the balloon driver inside the virtual machine poses as just
another application and demands memory from the virtual machine's operating system.

In response to the request from this (fake) application, the VM's operating system
starts allocating memory pages to the balloon driver from the free list, then from
LRU pages and, if that is still not enough to meet the target, from MRU pages as well.

As soon as the balloon driver receives memory pages from the VM's operating system,
it starts inflating from its initial size, just like an actual balloon
when we pump air into it.

Memory pages that are consumed by the balloon driver are pinned (the Red and Green
pages in the above figure) so that they are not swapped out.

Balloon driver communicates with the hypervisor through a private channel and
informs hypervisor about pinned pages.

The hypervisor then reclaims the host physical pages backing these pinned pages.
When host memory pressure eases, the hypervisor sets a lower target, which causes the
balloon driver to deflate back towards its initial state, just like an actual balloon
when we let the air out of it.
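
The inflate and deflate cycle described above can be modelled with a few
counters. This is a toy sketch, not the real balloon driver: the class, its
fields and the page counts are assumptions, and the private channel to the
hypervisor is reduced to a return value.

```python
class Guest:
    """Toy guest OS that tracks page counts per pool."""

    def __init__(self, free, idle, active):
        self.free = free      # pages on the guest free list
        self.idle = idle      # LRU / idle pages
        self.active = active  # MRU / active pages
        self.balloon = 0      # pages currently pinned by the balloon driver

    def set_balloon_target(self, target):
        """Move the balloon towards target.

        Returns the number of pages newly pinned (positive, the host can
        reclaim their backing memory) or released (negative, the host must
        back them again when the guest touches them).
        """
        delta = target - self.balloon
        if delta <= 0:
            # Deflate: released pages return to the guest free list.
            self.free += -delta
            self.balloon = target
            return delta
        pinned = 0
        # Inflate: take from the cheapest pool first.
        for pool in ("free", "idle", "active"):
            take = min(delta - pinned, getattr(self, pool))
            setattr(self, pool, getattr(self, pool) - take)
            pinned += take
        self.balloon += pinned
        return pinned


guest = Guest(free=2, idle=3, active=5)
print(guest.set_balloon_target(4), guest.free, guest.idle, guest.active)
print(guest.set_balloon_target(1), guest.free, guest.balloon)
```

Taking pages from the active pool is the expensive case: in a real guest it
forces the guest operating system to page out to its own swap space.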

Below image describes this process graphically.

Image: VMware

The ESXi host tries to reclaim memory from virtual machines as per the target. How
much memory is reclaimed from each VM is calculated with the help of the idle memory
tax (Mem.IdleTax).
Just as you pay more tax if you earn more bucks, a VM holding more
idle memory is charged (taxed) more. :P
If a virtual machine is not actively using all of its currently allocated memory, ESXi
charges more for idle memory than for memory that is in use.
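
To make the tax idea concrete, here is a small sketch of an adjusted
shares-per-page ratio. The formula follows the one published in Waldspurger's
paper on memory resource management in ESX Server, and the 75% default tax rate
is the usual default of Mem.IdleTax; treat both as assumptions for this
illustration rather than a statement of what a given ESXi build does internally.

```python
def shares_per_page(shares, pages, active_fraction, tax_rate=0.75):
    """Adjusted shares-per-page ratio; lower means reclaimed from first.

    An idle page costs k = 1 / (1 - tax_rate) times as much as an
    active page, so idle memory drags the ratio down.
    """
    k = 1.0 / (1.0 - tax_rate)
    cost = pages * (active_fraction + k * (1.0 - active_fraction))
    return shares / cost


# Two VMs with the same shares and the same allocation.
busy_vm = shares_per_page(shares=1000, pages=1000, active_fraction=1.0)
idle_vm = shares_per_page(shares=1000, pages=1000, active_fraction=0.25)
print(busy_vm, idle_vm)
```

With equal shares and equal allocations, the VM that is only 25% active ends up
with the lower ratio, so memory is taken from it before the busy VM.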

VMware Memory Reclamation: Memory Compression Explained

Do check my previous articles on TPS and Ballooning in this series on VMware Memory
reclamation, as compression will start after TPS and Ballooning.
ESXi provides a memory compression cache to improve virtual machine performance
when you use memory overcommitment.
If a virtual machine's memory usage reaches the level at which host-level swapping
would be required, ESXi uses memory compression to reduce the number of memory
pages it needs to swap out. Because the decompression latency is much smaller
than the swap-in latency from disk, compressing memory pages has significantly less
impact on performance than swapping out those pages.
Let's see how compression helps improve the performance of virtual machines
running on an overcommitted ESXi host. The video below from VMware is for an old
version of ESXi; however, it gives an idea of the impact of memory compression on VM
performance.

How Memory Compression Works:


Memory compression is enabled by default. You can disable it from the
advanced configuration settings (Mem.MemZipEnable), and you can also set the maximum
size of the compression cache using the Advanced Settings.
Two types of pages in memory, listed below, are compressed.

Large Pages (2MB)

Small Pages (4KB)

Note 1: ESXi does not compress 2MB large pages directly; large pages are first
broken down into 4KB pages, which are then compressed to 2KB.
Note 2: If a page's compression ratio is larger than 75%, ESXi stores the
compressed page in a 1KB quarter-page space.

A page is considered for compression only if it meets both of the following
conditions:
1. The page is already marked for swapping out to disk, AND
2. The page can be compressed by at least 50%.
Any page that does not meet these criteria is swapped out to disk.
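
The decision for a single swap-candidate page can be sketched as follows. zlib
is used here only as a stand-in for the compressor ESXi uses internally, and the
function name and slot labels are made up for the illustration.

```python
import random
import zlib

PAGE_SIZE = 4096  # a small (4KB) page


def placement(page):
    """Decide where a swap-candidate 4KB page ends up."""
    size = len(zlib.compress(page))
    if size <= PAGE_SIZE // 4:
        return "1KB slot"      # compresses by 75% or more
    if size <= PAGE_SIZE // 2:
        return "2KB slot"      # compresses by at least 50%
    return "swap to disk"      # not compressible enough


rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(PAGE_SIZE))

print(placement(bytes(PAGE_SIZE)))                          # all zeros
print(placement(noise[:1500] + bytes(PAGE_SIZE - 1500)))    # partly random
print(placement(noise))                                     # fully random
```

A zero-filled page compresses extremely well, a fully random page does not
compress at all, and the partly random page falls in between.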
Let's understand how compression works with an example.

Image: VMware
Let's assume that ESXi needs to reclaim 8KB of physical memory (two 4KB pages) from
virtual machines. With host swapping, two swap candidate pages, page A and
page B, are swapped directly to disk (Image A).
With compression, a swap candidate page is compressed and stored using 2KB of
space in a per-VM compression cache. Hence, each compressed page yields 2KB of
memory space for ESXi to reclaim. In order to reclaim 8KB of physical memory, four
swap candidate pages need to be compressed (Image B).
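
The arithmetic of this example generalizes easily. The helper below is a
hypothetical name used only for this sketch; it counts how many 4KB candidate
pages must be processed to free a given amount of memory.

```python
PAGE_KB = 4


def pages_to_process(reclaim_kb, compressed_kb=None):
    """Number of 4KB candidate pages needed to free reclaim_kb of memory.

    compressed_kb is the cache space a compressed page occupies;
    None means the page is swapped out and frees the full 4KB.
    """
    freed_per_page = PAGE_KB if compressed_kb is None else PAGE_KB - compressed_kb
    return -(-reclaim_kb // freed_per_page)  # ceiling division


print(pages_to_process(8))                   # swapping: 2 pages
print(pages_to_process(8, compressed_kb=2))  # half-page slots: 4 pages
print(pages_to_process(8, compressed_kb=1))  # quarter-page slots: 3 pages
```

Compression touches more pages than swapping for the same amount of reclaimed
memory, but every one of those pages stays in RAM.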
If a memory request comes in to access a compressed page, the page is decompressed
and pushed back to the guest memory. The page is then removed from the compression
cache.
What is the Per-VM Compression Cache?
The memory for the compression cache is not allocated separately as extra
overhead memory. The compression cache size starts at zero when host memory is
undercommitted and grows when the virtual machine memory starts to be swapped out.
If the compression cache is full, one compressed page must be replaced in order to
make room for a new compressed page. The page which has not been accessed for the
longest time will be decompressed and swapped out. ESXi will not swap out
compressed pages.
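
The replacement policy described above behaves like a small least-recently-used
cache. The sketch below is a simplified model: the class name is hypothetical,
the cache limit is counted in pages rather than bytes, zlib stands in for the
real compressor, and a dict stands in for the swap file.

```python
from collections import OrderedDict
import zlib


class CompressionCache:
    """Toy per-VM compression cache with oldest-first replacement."""

    def __init__(self, max_pages):
        self.max_pages = max_pages
        self.cache = OrderedDict()  # page number -> compressed bytes, oldest first
        self.swap = {}              # page number -> uncompressed bytes on disk

    def compress_out(self, ppn, content):
        """Compress a swap-candidate page into the cache."""
        if len(self.cache) >= self.max_pages:
            victim, blob = self.cache.popitem(last=False)
            # Compressed pages are never swapped out as-is:
            # the victim is decompressed first.
            self.swap[victim] = zlib.decompress(blob)
        self.cache[ppn] = zlib.compress(content)

    def access(self, ppn):
        """Fault a page back in; it leaves the cache (or swap) once restored."""
        if ppn in self.cache:
            return zlib.decompress(self.cache.pop(ppn))
        return self.swap.pop(ppn)


vm_cache = CompressionCache(max_pages=2)
vm_cache.compress_out(1, b"a" * 4096)
vm_cache.compress_out(2, b"b" * 4096)
vm_cache.compress_out(3, b"c" * 4096)  # cache is full, so page 1 goes to swap
print(list(vm_cache.cache), list(vm_cache.swap))
```

Because a page leaves the cache as soon as it is accessed, the oldest entry in
the cache is also the one that has gone unaccessed for the longest time.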

If the pages belonging to compression cache need to be swapped out under severe
memory pressure, the compression cache size is reduced and the affected compressed
pages are decompressed and swapped out.
The maximum compression cache size is important for maintaining good VM
performance. Since the compression cache is accounted for by the VM's guest memory
usage, a very large compression cache may waste VM memory and unnecessarily
create host memory pressure.
In vSphere 5.0, the default maximum compression cache size is conservatively set to
10% of the configured VM memory size. This value can be changed through Advanced
Settings by changing the value of Mem.MemZipMaxPct.
VMware Memory Reclamation: Hypervisor Swapping

ESXi employs hypervisor swapping to reclaim memory if other memory reclamation
techniques such as ballooning, transparent page sharing, and memory compression are
not sufficient to reclaim memory.
The speed of Transparent Page Sharing (TPS) depends on the opportunity to share memory
pages, and ballooning depends on the guest operating
system's response to memory allocation requests. Because of this, these techniques may
take time to reclaim memory.
Unlike the other techniques, hypervisor swapping is a guaranteed technique to reclaim a
specific amount of memory within a specific amount of time.
At virtual machine startup, the hypervisor creates a separate swap file (.vswp) for the
virtual machine, by default inside the virtual machine folder unless the swap file
location has been changed. The hypervisor uses this file to swap out virtual machine
physical memory directly. This frees host physical memory, which can then be used by
other virtual machines.
However, hypervisor swapping is used as a last resort to reclaim memory from the
virtual machine, because it impacts virtual machine performance due to the
known issues listed below.

High swap-in latency

Page selection problems due to no visibility of guest OS pages.

Double paging problems

ESXi employs the methods below to address the limitations mentioned above and improve
hypervisor swapping performance:

Memory compression: Reduces the number of pages that need to be swapped
out while reclaiming the same amount of host memory. For more details on how
compression works, do check my other article on the same.

SSD swapping: If an SSD device is installed in the host, we can choose to
configure a host SSD cache. Using swap to host cache does not mean placing
regular swap files on SSD-backed datastores. Even if you enable swap to host
cache, the host still needs to create regular swap files. ESXi uses the host
cache (SSD) to store the swapped-out pages first instead of putting them directly
in the regular hypervisor swap file (.vswp). Upon the next access to a page in the
host cache, the page is pushed back to the guest memory and then
removed from the host cache. Since SSD read latency, which is normally around
a few hundred microseconds, is much lower than typical disk access latency, this
optimization significantly reduces the swap-in latency and hence greatly
improves application performance in high memory overcommitment
scenarios.
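
The order in which the two swap locations are used can be sketched with two
dictionaries. This is a toy model with hypothetical names; in particular,
falling back to the regular .vswp file once the host cache is full is an
assumption of the sketch, and capacity is counted in pages for simplicity.

```python
class HostSwap:
    """Toy model of swap to host cache with a regular .vswp fallback."""

    def __init__(self, cache_capacity):
        self.cache_capacity = cache_capacity  # pages that fit in the SSD cache
        self.host_cache = {}  # pages held in the SSD host cache
        self.vswp = {}        # pages held in the VM's regular .vswp file

    def swap_out(self, ppn, content):
        """Place a swapped-out page in the host cache first, if it fits."""
        if len(self.host_cache) < self.cache_capacity:
            self.host_cache[ppn] = content
        else:
            self.vswp[ppn] = content  # assumed fallback when the cache is full

    def swap_in(self, ppn):
        """Return the page and where it came from; the page leaves that store."""
        if ppn in self.host_cache:
            return self.host_cache.pop(ppn), "host cache"
        return self.vswp.pop(ppn), "vswp"


swap = HostSwap(cache_capacity=1)
swap.swap_out(10, b"x")
swap.swap_out(11, b"y")
print(sorted(swap.host_cache), sorted(swap.vswp))
```

Pages served from the host cache avoid the slower disk-backed swap file, which
is where the swap-in latency benefit comes from.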

How does SSD swap work?


Multiple 1GB .vswp file chunks are created inside the SSD swap. As shown in the
figure below, a 10GB SSD has ten .vswp files created inside it. These files can be seen
by browsing the datastore. These .vswp files are not specific to VMs, unlike the ones
we have on shared storage. Each VM has its own regular .vswp file on shared storage
inside its own VM folder. However, the .vswp files inside the SSD swap are shared by
virtual machines whenever there is a need for swapping.