Fault Tolerance

VMware vSphere Pro Series


Instructor: Eric Siebert - vExpert

In This Lesson:
What is Fault Tolerance (FT)?
Fault Tolerance Requirements
Fault Tolerance Limitations
How to Use Fault Tolerance
Fault Tolerance Tips and Best Practices

What is Fault Tolerance?


• Fault Tolerance (FT) takes the High Availability (HA) feature to
the next level by providing continuous availability for a VM in
case of a host failure
• FT works by keeping a secondary copy of a VM running on
another host server
• In case of a host failure, the secondary VM becomes the primary
VM and a new secondary is created on another functional host
• The primary VM and secondary VM stay in sync with each other
by using a technology called Record/Replay
• Record/Replay works by recording the execution of a VM and
saving it into a log, which is replayed on another VM
• VMware calls its Record/Replay implementation vLockstep; it
relies on capabilities built into certain models of Intel and AMD
processors

What is Fault Tolerance?


• Primary and secondary VMs receive the same inputs, but only the
primary VM produces output such as disk writes and network
transmits
• The secondary VM's output is suppressed by the hypervisor, and
the secondary is not on the network until it becomes a primary VM
• Not everything that happens on the primary VM is copied to the
secondary: certain actions and instructions are not relevant to
the secondary
• Only non-deterministic events are recorded: inputs to the VM
(disk reads, received network traffic, keystrokes, mouse clicks,
etc.) and certain CPU events
• Inputs are fed to the secondary VM at the same execution point,
so it is in exactly the same state as the primary VM
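
The idea of logging only non-deterministic inputs and injecting them at the same execution point can be illustrated with a toy model. This is a minimal sketch and not how vLockstep is implemented: `CounterVM`, `run_primary`, and `run_secondary` are hypothetical names, and the "execution point" is simply a step counter.

```python
import random


class CounterVM:
    """Toy deterministic machine: state changes only through step() and inputs."""

    def __init__(self):
        self.steps = 0
        self.state = 0

    def step(self):
        # Deterministic work: identical on primary and secondary.
        self.steps += 1
        self.state = (self.state * 31 + 7) % 1_000_003

    def apply_input(self, value):
        # Non-deterministic event (e.g. a received packet or keystroke).
        self.state = (self.state + value) % 1_000_003


def run_primary(total_steps, rng):
    """Run the primary and log each input with the step at which it arrived."""
    vm, log = CounterVM(), []
    for _ in range(total_steps):
        vm.step()
        if rng.random() < 0.2:  # an input arrives at an unpredictable time
            value = rng.randrange(256)
            log.append((vm.steps, value))
            vm.apply_input(value)
    return vm, log


def run_secondary(total_steps, log):
    """Replay: inject each logged input at the same execution point."""
    vm = CounterVM()
    pending = iter(log)
    entry = next(pending, None)
    for _ in range(total_steps):
        vm.step()
        if entry is not None and entry[0] == vm.steps:
            vm.apply_input(entry[1])
            entry = next(pending, None)
    return vm


primary, log = run_primary(1000, random.Random())
secondary = run_secondary(1000, log)
print(primary.state == secondary.state)  # True: both end in the same state
```

The deterministic steps never appear in the log; only the inputs and their timing do, which is why the log is much smaller than a full copy of the primary's execution.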

What is Fault Tolerance?


• Information from the primary VM is copied to the secondary VM
using a special logging network that is configured on each host
server
• The traffic sent over the FT Logging network between the hosts
can be very heavy, depending on the workload of the VM
• In the event of a hardware failure there is no interruption in
service or data loss, as the secondary VM can always reproduce
execution of the primary VM up to its last output
• FT and HA can be used together to provide maximum
protection: if both the primary and secondary hosts fail at the
same time, HA restarts the VM on another operable host
and re-spawns a new secondary VM
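
The failover behaviour described above can be sketched as a small placement function. This is an illustrative model only: `FTPair`, `pick_host`, and `recover` are hypothetical names, host names are made up, and real placement decisions are made by vCenter Server and HA, not by logic like this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FTPair:
    primary_host: Optional[str]
    secondary_host: Optional[str]


def pick_host(candidates, exclude):
    """Pick the first healthy host (by name) that is not excluded."""
    for host in sorted(candidates):
        if host not in exclude:
            return host
    return None


def recover(pair, failed_hosts, all_hosts):
    """Return the new placement of an FT pair after some hosts fail."""
    healthy = set(all_hosts) - set(failed_hosts)
    if pair.primary_host in healthy:
        primary = pair.primary_host
        secondary = pair.secondary_host if pair.secondary_host in healthy else None
    elif pair.secondary_host in healthy:
        # FT failover: the secondary takes over without interruption.
        primary, secondary = pair.secondary_host, None
    else:
        # Both hosts lost: HA restarts the VM on another operable host.
        primary, secondary = pick_host(healthy, exclude=set()), None
    if primary is not None and secondary is None:
        # A new secondary is spawned on a different healthy host.
        secondary = pick_host(healthy, exclude={primary})
    return FTPair(primary, secondary)


hosts = ["esx1", "esx2", "esx3", "esx4"]
pair = FTPair("esx1", "esx2")
print(recover(pair, {"esx1"}, hosts))          # primary esx2, secondary esx3
print(recover(pair, {"esx1", "esx2"}, hosts))  # primary esx3, secondary esx4
```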

Fault Tolerance Requirements


• Host Requirements
– Hosts must have an FT-capable processor, and both hosts
running an FT VM pair must be in the same processor family
– CPU clock speeds between the two hosts must be within 400
MHz of each other to ensure that the hosts can stay in sync
– All hosts must be running the same build of ESX or ESXi and
be licensed for Fault Tolerance, which is only included in the
Advanced, Enterprise and Enterprise Plus editions of vSphere
– Hosts used together as an FT cluster must share storage for
the protected virtual machines (FC, iSCSI, or NAS)
– Each host requires a dedicated gigabit network interface card
(NIC) for the FT Logging traffic; the FT Logging NICs on all
hosts must be on the same network
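
The host-pair requirements above lend themselves to a simple checklist. The sketch below is a hypothetical model, not a VMware tool: the `Host` fields, the function name `ft_pair_problems`, and all sample values are assumptions made for illustration, and only the rules listed on this slide are checked.

```python
from dataclasses import dataclass

# vSphere editions that include FT, as listed on the slide.
FT_EDITIONS = {"Advanced", "Enterprise", "Enterprise Plus"}
MAX_CLOCK_DELTA_MHZ = 400


@dataclass
class Host:
    name: str
    ft_capable_cpu: bool
    cpu_family: str
    cpu_mhz: int
    build: str
    edition: str


def ft_pair_problems(a, b):
    """Return a list of reasons why two hosts cannot run an FT VM pair."""
    problems = []
    for host in (a, b):
        if not host.ft_capable_cpu:
            problems.append(f"{host.name}: processor is not FT-capable")
        if host.edition not in FT_EDITIONS:
            problems.append(f"{host.name}: edition '{host.edition}' does not include FT")
    if a.cpu_family != b.cpu_family:
        problems.append("hosts are in different processor families")
    if abs(a.cpu_mhz - b.cpu_mhz) > MAX_CLOCK_DELTA_MHZ:
        problems.append("CPU clock speeds differ by more than 400 MHz")
    if a.build != b.build:
        problems.append("hosts run different ESX/ESXi builds")
    return problems


# Hypothetical inventory data.
h1 = Host("esx1", True, "family-A", 2660, "build-1", "Enterprise")
h2 = Host("esx2", True, "family-A", 2930, "build-1", "Enterprise Plus")
h3 = Host("esx3", True, "family-A", 3330, "build-2", "Standard")

print(ft_pair_problems(h1, h2))  # [] (compatible)
print(ft_pair_problems(h1, h3))  # edition, clock speed, and build problems
```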

Fault Tolerance Requirements


• Host Requirements
– Hosts must be in an HA-enabled cluster
– Host certificate checking must be enabled in vCenter Server
(configured in vCenter Server Settings, SSL Settings)
• VM Requirements
– VMs must be single-processor (no vSMP)
– All VM disks must be “thick” (fully allocated), not thin;
thin disks will be converted to thick when FT is enabled
– No non-replayable devices (USB, serial/parallel ports, sound,
physical CD-ROM, physical floppy, physical RDMs)
– Most guest operating systems are supported
– Snapshots must be removed before FT can be enabled on a
virtual machine
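
The VM requirements can be expressed the same way. In this sketch the `VM` fields, the device labels, and the function name `ft_vm_check` are hypothetical; it separates hard blockers from conditions that FT handles automatically (thin disks are converted rather than rejected, per the slide).

```python
from dataclasses import dataclass, field

# Device labels are made up for this example.
NON_REPLAYABLE = {
    "usb", "serial", "parallel", "sound",
    "physical-cdrom", "physical-floppy", "physical-rdm",
}


@dataclass
class VM:
    name: str
    num_vcpus: int
    thin_disks: int = 0
    snapshots: int = 0
    devices: set = field(default_factory=set)


def ft_vm_check(vm):
    """Return (blockers, notes) for enabling FT on a VM."""
    blockers, notes = [], []
    if vm.num_vcpus != 1:
        blockers.append("VM must have a single vCPU (no vSMP)")
    if vm.snapshots > 0:
        blockers.append("snapshots must be removed first")
    bad = sorted(vm.devices & NON_REPLAYABLE)
    if bad:
        blockers.append("non-replayable devices present: " + ", ".join(bad))
    if vm.thin_disks > 0:
        notes.append(f"{vm.thin_disks} thin disk(s) will be converted to thick")
    return blockers, notes


web = VM("web01", num_vcpus=1, thin_disks=1)
db = VM("db01", num_vcpus=2, snapshots=3, devices={"usb", "nic"})

print(ft_vm_check(web))  # no blockers, one note about disk conversion
print(ft_vm_check(db))   # three blockers
```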

Fault Tolerance Requirements


• Use the VMware SiteSurvey utility to generate a report that shows
whether your hosts support FT and whether hosts and VMs meet the
prerequisites to use it
• You can also use the vCenter Server Profile Compliance tool
to see if your hosts meet FT requirements

Fault Tolerance Limitations


• FT only protects against host failures; it doesn’t protect against
application or other infrastructure failures (e.g. storage)
• If you need protection against application failure, a clustering
solution like MSCS would be a better fit
• In case of an OS failure on the primary, like a Windows Blue
Screen Of Death (BSOD), the secondary will also experience the
failure as it is an identical copy of the primary
• FT does not protect against a storage failure: since the VMs on
both hosts use the same storage and virtual disk file, it is a
single point of failure
• FT is only meant to keep a VM running if there is a problem with
the underlying host hardware

Fault Tolerance Limitations


• It is not possible to take snapshots of VMs when FT is enabled
• N_Port ID Virtualization (NPIV) is not supported with FT
• Paravirtual SCSI adapters are not supported with FT
• Physical Raw Device Mapping (RDM) is not supported with FT
• The hot plug feature is automatically disabled for fault tolerant
virtual machines
• EPT/RVI is automatically disabled for virtual machines with FT
turned on
• IPv6 is not supported; you must use IPv4 addresses with FT
• You can only use FT on a vCenter Server running as a VM if it is
running with a single vCPU

Fault Tolerance Limitations


• VMotion is supported on FT-enabled VMs, but you cannot
VMotion both the primary and secondary VMs at the same time
• Storage VMotion is not supported on FT-enabled VMs
• In vSphere 4.0, FT can be used in a DRS cluster, but the DRS
automation level is set to disabled for FT-enabled VMs
• VMware currently recommends that you do not use FT in a
cluster that consists of a mix of ESX and ESXi hosts

How to Use Fault Tolerance


• Configure the networking needed for FT on the host servers
(two separate vSwitches on each host, one for VMotion and one
for FT Logging)
• Confirm that the networking is configured by selecting the
Summary tab for the host; the VMotion Enabled and Fault
Tolerance Enabled fields should both say Yes
• Enable FT on a VM by right-clicking it and choosing the Fault
Tolerance item and then Turn On Fault Tolerance
• Once FT is enabled, a secondary VM is created on another
host
• Once the secondary is created, a new Fault Tolerance
section appears on the Summary tab of the virtual machine and
displays FT information

Fault Tolerance Tips and Best Practices


• Because of the high overhead and limitations of FT, use it sparingly
• If you have a VM that does a lot of disk reading, you can reduce
the amount of disk read traffic across the FT logging network by
using a special VM parameter (replay.logReadData = checksum)
• Configure alarms to check for specific conditions such as FT
state, latency, secondary VM status and more
• In a split-brain scenario the secondary VM may try to become
the primary, which would result in two primary VMs running
at the same time. This is prevented by using a lock on a special
FT file
• The limit of four FT-enabled VMs per host (not per cluster)
is not a hard limit, but it is recommended for optimal
performance
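
The split-brain protection mentioned above relies on only one VM being able to win a lock. As a loose analogy (the slide does not describe the FT file format or locking mechanism, so everything here is an assumption), the sketch below uses exclusive file creation in a temporary directory standing in for shared storage. The function name `try_become_primary` and the file name `ft.lock` are hypothetical.

```python
import os
import tempfile


def try_become_primary(lock_path, vm_id):
    """Attempt to claim the primary role by creating the lock file atomically."""
    try:
        # O_EXCL makes creation fail if the file already exists,
        # so at most one caller can succeed.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(vm_id)
    return True


with tempfile.TemporaryDirectory() as shared_storage:
    lock = os.path.join(shared_storage, "ft.lock")
    results = [
        try_become_primary(lock, "vm-a"),
        try_become_primary(lock, "vm-b"),
    ]

print(results)  # [True, False]: only one VM wins the lock
```

The VM that loses the race knows another primary exists and must not go live, which is the property that prevents two primaries from running at once.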

Fault Tolerance Tips and Best Practices


• The secondary VM can slow down the primary VM if it is not getting
enough CPU resources to keep up. Set a CPU reservation on the
primary VM; it is also applied to the secondary VM and
ensures both run at the same CPU speed
• Patching hosts can be tricky because of the requirement that
the hosts must have the same build level. The simplest method is to
temporarily disable FT on VMs, update all hosts in the cluster to
the same build level and then re-enable FT on the VMs
• When FT is enabled, any memory limits on the primary VM will
be removed and a memory reservation will be set equal to the
amount of RAM assigned to the VM
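
The effect of enabling FT on a VM's memory settings can be written down as a small transformation. This is a sketch of the rule stated above, not of any vSphere API: the `Resources` type, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Resources:
    memory_mb: int                   # RAM assigned to the VM
    memory_limit_mb: Optional[int]   # None means "no limit"
    memory_reservation_mb: int
    cpu_reservation_mhz: int


def after_enabling_ft(settings):
    """Apply the memory changes described for a VM when FT is turned on."""
    return replace(
        settings,
        memory_limit_mb=None,                      # memory limit is removed
        memory_reservation_mb=settings.memory_mb,  # full RAM is reserved
    )


before = Resources(memory_mb=4096, memory_limit_mb=2048,
                   memory_reservation_mb=0, cpu_reservation_mhz=1000)
after = after_enabling_ft(before)
print(after.memory_limit_mb, after.memory_reservation_mb)  # None 4096
```

Because the full RAM amount becomes reserved, it is worth checking that hosts have enough unreserved memory before enabling FT on large VMs.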

Fault Tolerance Tips and Best Practices


• FT can be enabled or disabled easily at any time. Do this when you
need to do something that is not supported when using FT, such
as a Storage VMotion, snapshot or hot-add of hardware to the VM
• Difference between disabling and turning off FT
– Disabling preserves the secondary VM, configuration and history so
FT can be easily re-enabled
– Turning off deletes the secondary VM, configuration and history
• Backing up FT-enabled VMs can be tricky because many backup
applications rely on VM snapshots. Look at alternative methods
such as OS backup agents, cloning, temporarily disabling FT, or
storage snapshots

My Favorite Supporting Resources


1. vSphere Availability Guide -
http://vmware.com/pdf/vsphere4/r40_u1/vsp_40_u1_availability.pdf
2. VMware SiteSurvey download page -
http://www.vmware.com/download/shared_utilities.html
3. VMware KB articles on FT:
• http://kb.vmware.com/kb/1013637
• http://kb.vmware.com/kb/1008027
• http://kb.vmware.com/kb/1011965
• http://kb.vmware.com/kb/1020058
• http://kb.vmware.com/kb/1016619

What We Covered
• What is Fault Tolerance (FT)?
• Fault Tolerance Requirements
• Fault Tolerance Limitations
• How to Use Fault Tolerance
• Fault Tolerance Tips and Best Practices