
Impact of host local VM swap on HA and DRS | frankdenneman.



Impact of host local VM swap on HA and DRS

February 15, 2010 | DRS, Memory | 41 Comments
On a regular basis I come across NFS-based environments where the decision has been made to store the virtual machine swap files
on local VMFS datastores. Using host-local swap can affect DRS load balancing and HA failover in certain situations, so
when designing an environment that uses host-local swap, some areas need attention to guarantee HA and DRS functionality.
VM swap file
Let's start with some basics. By default, a VM swap file is created when a virtual machine starts. The formula to calculate the
swap file size is: configured memory - memory reservation = swap file size. For example, a virtual machine configured with 2GB
of memory and a 1GB memory reservation will have a 1GB swap file.
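The formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name and the use of megabytes as the unit are assumptions made for the example, not anything ESX exposes.

```python
def swap_file_size_mb(configured_mb: int, reservation_mb: int = 0) -> int:
    """Swap file size = configured memory - memory reservation (hypothetical helper)."""
    if not 0 <= reservation_mb <= configured_mb:
        raise ValueError("reservation must be between 0 and the configured memory")
    return configured_mb - reservation_mb


# 2GB configured with a 1GB reservation gives a 1GB swap file.
print(swap_file_size_mb(2048, 1024))
# Without a reservation the swap file equals the configured memory.
print(swap_file_size_mb(2048))
```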

Reservations will guarantee that the specified amount of virtual machine memory is (always) backed by ESX machine
memory. Swap space must be reserved on the ESX host for the virtual machine memory that is not guaranteed to be backed by
ESX machine memory. For more information on memory management of the ESX host, please see the article on the impact of
memory reservation.
During startup of the virtual machine, the VMkernel pre-allocates the swap file blocks to ensure that all pages can be
swapped out safely. A VM swap file is a static file and will not grow or shrink no matter how much memory is paged. If there
is not enough disk space to create the swap file, host admission control will not allow the VM to be powered on.
Note: If the local VMFS does not have enough space, the VMkernel tries to store the VM swap file in the working directory of
the virtual machine. You need to ensure that enough free space is available in the working directory, otherwise the VM still
cannot be powered on. On top of that, you did not want the VM swap file stored on shared storage
in the first place.
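The placement order described above (local VMFS first, then the working directory, otherwise no power-on) can be sketched as a small decision function. This is a simplified model under stated assumptions: the function name, the return labels and the free-space parameters are hypothetical, and the real VMkernel logic is not visible from this article.

```python
from typing import Optional


def place_swap_file(swap_mb: int, local_free_mb: int, workdir_free_mb: int) -> Optional[str]:
    """Return where the swap file would land, or None if the VM cannot power on.

    Hypothetical model of the behavior described in the text, not VMkernel code.
    """
    if swap_mb <= local_free_mb:
        return "local_vmfs"
    # Fallback: the VM working directory, typically on shared storage.
    if swap_mb <= workdir_free_mb:
        return "working_directory"
    # Not enough space anywhere: admission control refuses the power-on.
    return None


print(place_swap_file(1024, local_free_mb=4096, workdir_free_mb=0))
print(place_swap_file(1024, local_free_mb=512, workdir_free_mb=2048))
print(place_swap_file(1024, local_free_mb=512, workdir_free_mb=512))
```

The second case is the one to watch in a design: the VM powers on, but its swap file ends up on the storage you were trying to keep it off.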
This rule also applies when migrating a VM configured with a host-local VM swap file, as the swap file needs to be created on
the local VMFS volume of the destination host. Besides creating a new swap file, the swapped-out pages must be copied to
the destination host. It's not uncommon for a VM to have pages swapped out even if there is no memory pressure at that
moment. ESX does not proactively return swapped pages to machine memory. Swapped pages stay swapped:
the VM needs to actively access a page in the swap file for it to be transferred back to machine memory, and this only occurs if the
ESX host is not under memory pressure (more than 6% free physical memory).
Copying host-local swap pages between source and destination host is a disk-to-disk copy process, which is one of the reasons
why VMotion takes longer when host-local swap is used.
Real-life scenario
A customer of mine was not aware of this behavior and had disregarded multiple warnings about full local VMFS datastores on
some of their ESX hosts. All the virtual machines were up and running and all seemed well. Certain ESX servers seemed to be
low on resource utilization and ran only a few active VMs, while other hosts were highly utilized. DRS was active on all the
clusters, fully automated, with the default (3 stars) migration threshold. It looked like we had a major DRS problem.



If DRS decides to rebalance the cluster, it will migrate virtual machines to lightly utilized hosts. The VMkernel tries to create a new
swap file on the destination host during the VMotion process. In my scenario the host did not have any free space on its local
VMFS datastore, so DRS could not VMotion any virtual machine to that host. But the host
CPU active and host memory active metrics were still monitored by DRS to calculate the load standard deviation used for its
recommendations to balance the cluster. (More info about the DRS algorithm can be found on the DRS deepdive page.) The
lack of disk space on the local VMFS datastores reduced the effectiveness of DRS and limited its options to
balance the cluster.
High availability failover
The same applies when an HA isolation response occurs: when not enough space is available to create the virtual machine swap
files, no virtual machines are started on the host. If a host fails, the virtual machines will only power up on hosts containing
enough free space on their local VMFS datastores. Virtual machines might not power up at all if not
enough free disk space is available.
Failover capacity planning
When using the host-local swap setting to store the VM swap files, the following factors must be considered:
Number of ESX hosts inside the cluster.
HA configured host failover capacity.
Number of active virtual machines inside the cluster.
Consolidation ratio (VMs per host).
Average swap file size.
Free disk space on local VMFS datastores.
Number of hosts inside cluster: 6
HA configured host failover capacity: 1
Active virtual machines: 162
Average consolidation ratio: 27:1
Average memory reservation:
Average swap file size: 4GB

For the sake of simplicity, let's assume that DRS balanced the cluster load and that all (identical) virtual machines are spread
evenly across every host.
In case of a host failure, 27 VMs will be restarted on the remaining 5 hosts inside the cluster. HA will start an average of 5.4 virtual
machines per host; as it is impossible to start 0.4 VM, some ESX hosts will start 6 virtual machines, while other hosts will start
5 VMs.
The average swap file size is 4GB, so a host that restarts 6 VMs requires at least 24GB of free space on its local VMFS datastore to
start the VMs. Besides the 24GB, enough free space needs to be available for DRS to move multiple VMs around to
rebalance the load across the cluster.
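The failover arithmetic above can be written down as a short Python sketch. It assumes a six-host cluster (the five remaining hosts plus the failed one), identical VMs spread evenly, and restarts distributed as evenly as possible; the function and parameter names are made up for this example.

```python
import math


def failover_free_space(hosts: int, host_failures: int, vms_per_host: int, avg_swap_gb: int):
    """Return (VMs to restart, worst-case restarts per host, free GB needed per host).

    Hypothetical planning helper; it does not include the extra headroom
    DRS needs to move VMs around after the restart.
    """
    surviving = hosts - host_failures
    if surviving <= 0:
        raise ValueError("at least one host must survive")
    vms_to_restart = host_failures * vms_per_host
    # 27 VMs over 5 hosts is 5.4 on average, so plan for the host that gets 6.
    worst_case_restarts = math.ceil(vms_to_restart / surviving)
    return vms_to_restart, worst_case_restarts, worst_case_restarts * avg_swap_gb


# 6 hosts, 1 host failure, 27 VMs per host, 4GB average swap file.
print(failover_free_space(hosts=6, host_failures=1, vms_per_host=27, avg_swap_gb=4))
```

Rounding up is the important design choice: sizing on the 5.4 average would leave the hosts that restart a sixth VM short on space.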
If the design of the virtual infrastructure incorporates site failover as well, enough free disk space on all the ESX hosts must be
reserved to power-up all the affected virtual machines from the failed site.
Closing remarks
Using host local swap can be a valid option for some environments, but additional calculation of the factors mentioned above
is necessary to ensure sustained HA and DRS functionality.

41 responses to Impact of host local VM swap on HA and DRS

Duncan says:
February 15th, 2010 at 12:09



Great article, it's all about understanding the impact of your decisions.

Eric Sloof says:
February 15th, 2010 at 12:44

One little side note: the formula to calculate the swap file size is memory limit - memory reservation. Usually the
configured memory is the same as the limit, but the limit can be lower.

Eric Sloof says:
February 15th, 2010 at 12:55

Sorry, I'm wrong.

Tom Howarth says:
February 15th, 2010 at 13:12

An excellent article; it certainly is food for thought.

frankdenneman says:
February 15th, 2010 at 13:19

Thanks for the reply, but the correct calculation is configured memory - reservation.
See page 31 of the vSphere resource management guide:
"You must reserve swap space for any unreserved virtual machine memory (the difference between the
reservation and the configured memory size) on per-virtual machine swap files."
I understand your point, as a limit will reduce the amount of memory allowed to be backed by machine memory, but you
cannot configure a virtual machine with a limit less than its reservation setting. A limit will not have any effect on the
size of the swap file. It only restricts the VM's ability to use machine memory.
For example, a machine configured with 2GB of memory and a 1GB memory reservation will end up with a 1GB swap file.
The minimum limit of that VM is 1GB. In this situation, all pages above 1GB are not
allowed to be backed by physical memory and are paged to the swap file by default.
When no reservation is set, the swap file size will be equal to the configured memory of the virtual machine.

frankdenneman says:
February 15th, 2010 at 13:20



Ah just spotted your correction.

But your question was a good exercise!

Richard says:
February 15th, 2010 at 13:23

Nice article Frank! Can you think of any reasons why people want to store swap on local VMFS (besides the reason that there
is no shared storage)?

frankdenneman says:
February 15th, 2010 at 13:30

Most of the time it's not that shared storage is unavailable; the goal is a reduction of IO load towards the shared
storage environment.
VMware used to recommend placement of swap files on local VMFS datastores when NFS was used as shared storage.

Eric Sloof says:
February 15th, 2010 at 14:02

vConsult also made a good point on Twitter: when your shared storage is replicated you don't want to replicate the swap
files to the disaster recovery site. That's one reason to place the swap files on a non-replicated LUN or local VMFS.

Eric Sloof says:
February 15th, 2010 at 14:10

Frank, you're right, it's not the limit. I actually tested it in my lab. There's still an error in the Fast-Track Guide, which
states (p. 435):
"The size of the VMkernel swap file is determined by the difference between how much memory the virtual machine
can use (the virtual machine's maximum configured memory or its memory limit) and how much RAM is reserved for it
(its reservation)."
That's when I decided to test it.

Duncan says:
February 15th, 2010 at 14:23

If you don't want to replicate your swap file, why not store the swap file on a separate shared VMFS datastore?
And why even care about replication? If you're not overcommitting, the .vswp file is more or less static and should not
increase replication traffic after the initial replica.



Arnim van Lieshout says:
February 15th, 2010 at 14:52

Great article Frank. I don't have any experience with local swap files, but this is something that could be easily
I fully agree with Duncan. When you do a proper sizing of your environment you hardly get any ESX swapping, so
replication overhead is minimized.
When you care about storage replication, consider placing the guest's page/swap file on a separate datastore that is not
replicated. The guest's page/swap file is utilized more than the ESX swap.
Oh, and before your host hits swapping it will try ballooning first, which utilizes the guest's swap/page file even more.

Eric Sloof says:
February 15th, 2010 at 14:53

But every time you power off or power on a virtual machine the replication will be triggered again. Best practice: Locate
the swap file on shared storage but not on replicated storage.

Alan Renouf says:
February 15th, 2010 at 15:55

Great Article !
I wrote a PowerCLI one-liner to display the VMs with their average swapped memory and the amount of memory used
by memory control. I was surprised to see some VMs with swapped memory even when the hosts look fine; this article
helps to explain this. Thanks Frank!
How much data in a swap file would you consider a non-worrying amount?
One-liner for those interested:
Get-VM | Where {$_.PowerState -eq "PoweredOn"} |
  Select Name, Host,
    @{N="SwapKB";E={(Get-Stat -Entity $_ -Stat mem.swapped.average -Realtime -MaxSamples 1 -ErrorAction SilentlyContinue).Value}},
    @{N="MemBalloonKB";E={(Get-Stat -Entity $_ -Stat mem.vmmemctl.average -Realtime -MaxSamples 1 -ErrorAction SilentlyContinue).Value}} |
  Out-GridView

Unexpected says:
February 15th, 2010 at 15:57

Good stuff that I'll keep in mind when I'm thinking about putting swaps locally.




Duncan says:
February 15th, 2010 at 16:00

That's true Eric, but will that really saturate your link? If that's the case you will need to do the math.
The question is though, will you also replicate your Windows swap?

Arnim van Lieshout says:
February 15th, 2010 at 16:06

How often is a VM powered down in a production environment?
Guest OS reboots don't count.
Although powering a VM off and on will trigger replication, it will not necessarily replicate the complete swap file.
Replication is active on the block level of the backend storage. Deleting the swap file only causes a small change in
the allocation table. No disk scrubbing is done.
The same holds true for creating the swap file. The swap file isn't eagerzeroed, but zeroed out on first write AFAIK.
Therefore the impact on replication is minimal.
I'm no storage specialist though.
Interesting topic.

frankdenneman says:
February 15th, 2010 at 16:08

Not entirely true, the VMkernel does not zero-out the swap file, it will reserve only the blocks.

Duco says:
February 15th, 2010 at 16:08

For the case Frank was talking about, we actually looked into not replicating the Windows swap, but decided that would
make the setup too complex. Even with sufficient bandwidth, you do not want to replicate data that is useless on the
other side, especially when you get a performance penalty by replicating.

Arnim van Lieshout says:
February 15th, 2010 at 16:14

I mentioned that the swap file is NOT eagerzeroed. But aren't the reserved blocks zeroed out before they are written to?



Duncan says:
February 15th, 2010 at 16:15

That's also true Arnim. The vswp file isn't zeroed out; only when a write occurs is it zeroed.

frankdenneman says:
February 15th, 2010 at 16:17

Oh apologies, I misunderstood your comment.

I think a block is zeroed out before writing, will come back on this.

Bouke Groenescheij says:
February 15th, 2010 at 16:24

I can agree with Arnim on this one. In our production environment, the VMs are almost never shut down and booted
again. Instead, the OSes inside them are rebooted. However, now that I'm thinking of it, there is an option in Windows
to clear the pagefile during boot. That could certainly impact performance when rebooting 3000 VMs. I would
disable that setting. Great article Frank, it's going to be a classic!

Jason Boche says:
February 15th, 2010 at 16:46

I know a large NFS shop that places all VMkernel swap on a no-snap volume, while the VMs are typically placed on
snap volumes for backup purposes. The logic? No need to snap VMkernel swap wasting Tier 1 disk space. Furthermore,
VMkernel swap doesn't need to roll off to snap vault for the same reasons.

Doug says:
February 15th, 2010 at 18:25

This discussion got me thinking, and I provisioned a new thin VMFS from my LeftHand array and parked a new VM out
there with 4 GB RAM. I put it into a resource pool that is capped at 1 GB and fired up MemTest x86. This was so I
could ensure that the swap file was being used. Simply starting the VM did allocate a file on the VMFS, but the array
didn't grab any more storage for the volume.
Once I started swapping, the array allocated a couple of GB, and I took a snapshot and powered off the VM. When I
brought it back up, no additional space was allocated for the VSWP file until I again started using it; then the array
grabbed another couple of GB.
I guess the moral of the story is that the act of allocating the swap file does not seem to allocate much disk space, but
that may depend on how your array handles provisioning. If it looks at changed blocks, you should be fine.



Richard says:
February 17th, 2010 at 12:57

Still not sure if hosting swap files on separate non-replicated storage is such a wise idea.
If you have a setup in which you have two sites, then you must take into account the possibility of a site failure. In that
case, VMs should be started up on the working site, and therefore, you also need space for those VM swapfiles. Because
of this situation, I think it's better to store swap on replicated storage.
Also because of this, it's better not to store swap files on local storage, because you need more local storage on each ESX host to
facilitate a site failure.
Please, shoot me on this! xD
ps: Found some more interesting information on local swap (little outdated, but still applies I think):

February 18th, 2010 at 05:48

This blog was awesome. This site was very informative.Thank you for this information.

Jeremy Barth says:
February 18th, 2010 at 19:14

Windows pagefile: Bouke, Windows has an option to clear the pagefile at shutdown, but it's disabled by default (I'm not
aware of any setting that clears the pagefile at bootup). So this probably isn't a concern in practice.
Replication traffic: there seems to be a soft consensus here that a VM startup reserves disk blocks but no block-level
replication is triggered until there's an actual write, which may well never occur since memory ballooning always
precedes use of the VSWP file. It's essential to verify this on your particular array, though.
Array storage: yeah, it does seem awfully expensive to waste Tier 1 storage on mere swap files. The idea Jason
mentioned of using non-replicated NAS/NFS storage has a certain appeal, but if the storage is too much cheaper, it may
not be as well engineered (for example, against single points of failure) as the Tier 1 stuff. Though presumably the NFS
shops Frank is referring to have already done their homework.
Host-local may have slight performance benefits vs. SAN, but one could argue that if the host is so loaded that you're in
swapping territory, the location of your VSWP files is probably the least of your worries.
There's no perfect solution, just tradeoffs, but personally I like the idea of using non-replicated, but well-engineered,
NFS/NAS for swap files: you get the benefits of central storage (no performance hit when doing VMotion) at the cost
of having to install a NAS in each site (production, D/R). That is, use Tier 1 (snapshots, replication) storage for VMs
and Tier 2 (fast and no single points of failure, but no snapshots or replication) for swap.

Andrew Fidel says:
February 18th, 2010 at 23:37



Richard, why waste often valuable WAN bandwidth replicating state that won't be used at the other site? Sure, you need
to have a target for the swap at the other side, but it doesn't need to be replicated from the first site.

Marchelle Robicheaux says:
February 19th, 2010 at 21:31

That was an awesome post! I agree with your post. Great job again, and I hope you have a great day!

Ian Forbes says:
February 21st, 2010 at 02:49

What I do with swap is create thin-provisioned NFS volumes that are the size of my aggregate (NetApp). This way I
get the benefit of fast shared storage for swap, and the storage array only stores what swap actually needs. By
oversubscribing to the size of the aggregate I get the protection that I'll always have enough capacity available for swap.
Of course I need to monitor the available space in my aggregate, but that's not a problem.
So, thin-provisioned swap on non-replicated, non-snapshotted volumes is what I like to do.

MarkVerhagen says:
February 21st, 2010 at 17:47

Excellent info and banter, thanks Frank!

Shannon says:
February 22nd, 2010 at 20:10

We've been running VMware since 2.5 and I didn't like it when VMware forced us to use SAN storage in 3.0 (since
changed, I realize). In my environment SAN storage is expensive, and if you look at the amount that would be taken
if all my swaps were moved to it, it would be in the TBs. All my ESX servers have a pair of mirrored 146 GB drives (with
a few with 300s). In our current servers (soon to change) the most RAM I have is 128 GB. If you subtract the ESX
layer, that leaves enough space on that local drive for a VMFS partition to hold a lot of swap space. I load our ESX
servers so as to be able to take an entire site down and still run all of production with very little to no overcommit on
RAM. I haven't, as of yet, seen a reason to move all that useless space to the SAN, mirrored or not.
Now, as we start to evaluate going to ESXi, and thus the removal of the need for local storage, things change, but at least in our
environment to date I can't justify spending the extra $$ on SAN storage.
As has been said, it goes back to knowing your environment, your needs and the ramifications of your design decisions.
Like most everything in this world, one size does not fit all.
The views expressed anywhere on this site are strictly mine and not the opinions and views of VMware.
Copyright (c) 2010