
vSphere High Availability

Protection at Every Level


• vSphere makes it possible to reduce planned
downtime, prevent unplanned downtime, and
recover rapidly from outages.

[Figure: protection at every level in vSphere 6, from component through server, storage, and data to site. Labels recoverable from the diagram: NIC teaming and storage multipathing; vSphere HA and vSphere Fault Tolerance; vSphere vMotion and vSphere DRS; vSphere Storage vMotion and VMFS; vSphere Replication, third-party backup solutions, and vSphere Data Protection; Site Recovery Manager.]
vCenter Server Availability: Recommendations
Make VMware vCenter Server™ and the components that it relies on
highly available.
vCenter Server relies on these major components:
• vCenter Server database:
– Create a cluster for the database.

• Authentication identity source:


– For example, VMware vCenter™ Single Sign-On™ and Active Directory.
– Set up with multiple redundant servers.

Methods for making vCenter Server available:


• Use vSphere HA to protect the vCenter Server virtual machine.
vSphere HA
• vSphere HA uses multiple ESXi hosts configured as a cluster to provide rapid
recovery from outages and cost-effective high availability for applications running in
virtual machines.

• Protects against server failures
• Protects against datastore accessibility failures
• Protects virtual machines against network isolation
• Protects against application failures
vSphere HA Scenarios: ESXi Host Failure

• When a host fails, vSphere HA restarts the affected virtual machines on other hosts.

[Figure: three ESXi hosts running virtual machines A through F in a vSphere HA cluster managed by vCenter Server.]



vSphere HA Scenarios: Guest Operating System Failure

• When a virtual machine stops sending heartbeats or the virtual machine process (vmx) crashes, vSphere HA resets the virtual machine.

[Figure: three ESXi hosts in a vSphere HA cluster managed by vCenter Server, each running virtual machines with VMware Tools.]


vSphere HA Scenarios: Application Failure

• When an application fails, vSphere HA restarts the affected virtual machine on the same host.
• Requires installation of VMware Tools™.

[Figure: three ESXi hosts in a vSphere HA cluster managed by vCenter Server, each running virtual machines with an application.]


Importance of Redundant Heartbeat Networks
In a vSphere HA cluster, heartbeats have these characteristics:
• Heartbeats are sent between the master host and the slave hosts.
• They are used to determine whether a master host or slave host has failed.
• They are sent over a heartbeat network.

Redundant heartbeat networks ensure reliable failure detection.


Heartbeat network implementation:
• Implemented by using a VMkernel port marked for management.
Redundancy Using NIC Teaming
You can use NIC teaming to create a redundant heartbeat network on
ESXi hosts.
Ports or port groups used must be VMkernel ports.

NIC Teaming on an ESXi Host


Redundancy Using Additional Networks
You can also create redundancy by configuring more heartbeat
networks: On each ESXi host, create a second VMkernel port on a
separate virtual switch with its own physical adapter.
vSphere HA Architecture
vSphere HA Architecture: Agent Communication
• To configure high availability, ESXi hosts are grouped into an object called a cluster.

[Figure: two slave ESXi hosts and one master ESXi host, each running the FDM, vpxa, and hostd agents and attached to a datastore. The hosts reach vpxd on vCenter Server over the management network.]
vSphere HA Architecture: Network Heartbeats
• The master host sends periodic heartbeats to the slave hosts so that the slave hosts know that the master host is alive.

[Figure: two slave hosts and one master host with VMFS and NAS/NFS datastores, connected to vCenter Server over Management Network 1 and Management Network 2.]
vSphere HA Architecture: Datastore Heartbeats
• Datastores are used as a backup communication channel to detect virtual machine and host heartbeats.

[Figure: one master host and two slave hosts with VMFS and NAS/NFS datastores, connected to vCenter Server over Management Network 1 and Management Network 2. Inset: Cluster Edit Settings window.]
Additional vSphere HA Failure Scenarios
o Slave host failure
o Master host failure
o Host isolation
o Virtual machine storage failure:
• Virtual Machine Component Protection
o All Paths Down
o Permanent Device Loss
o Network failures and isolation
Failed Slave Host
• When a slave host does not respond to the network heartbeat
issued by the master host, the master vSphere HA agent tries
to identify the cause.
[Figure: a failed slave host, the master host, and a second slave host. File locks on a NAS/NFS lock file and a VMFS heartbeat region are shown alongside the primary and alternate heartbeat networks to vCenter Server.]
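How the master narrows down the cause of a missing network heartbeat can be sketched as a small decision function. This is a simplified illustration and not the FDM agent's actual logic: the function name, the two boolean inputs, and the state labels are assumptions made for the example, and the real agent performs additional checks.

```python
from enum import Enum


class HostState(Enum):
    LIVE = "live"
    ISOLATED_OR_PARTITIONED = "isolated or partitioned"
    FAILED = "failed"


def classify_slave(network_heartbeat: bool, datastore_heartbeat: bool) -> HostState:
    """Classify a slave host from the master's point of view (hypothetical helper)."""
    if network_heartbeat:
        return HostState.LIVE
    if datastore_heartbeat:
        # Silent on the network but still updating its datastore heartbeat:
        # the host is running but cannot be reached over the heartbeat network.
        return HostState.ISOLATED_OR_PARTITIONED
    # No sign of life on either channel: treat the host as failed.
    return HostState.FAILED
```

Only the last outcome would justify restarting the host's virtual machines elsewhere, which is why the datastore heartbeat matters as a second channel.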
Failed Master Host
• When the master host is placed in maintenance mode or crashes, the slave hosts
detect that the master host is no longer issuing heartbeats.
[Figure: a failed master host and the surviving slave hosts, labeled with managed object IDs (MOID = Managed Object ID) such as 98, 99, and 100; one slave host becomes the new master host. Also shown: file locks on a NAS/NFS lock file and a VMFS heartbeat region, the default gateway (isolation address), and the primary and alternate heartbeat networks to vCenter Server.]
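The election that follows a master failure can be sketched as below. The sketch assumes the commonly described rule that the host connected to the most datastores wins and that ties go to the host with the highest MOID; the slide itself only shows the MOIDs. The host names and datastore counts are made up, and comparing MOIDs as plain integers is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str        # hypothetical host name
    moid: int        # managed object ID, as labeled in the figure
    datastores: int  # number of datastores the host can reach (assumed input)


def elect_master(candidates: list[Host]) -> Host:
    """Pick a new master among the surviving hosts (illustrative rule only)."""
    if not candidates:
        raise ValueError("no surviving hosts to elect from")
    # Prefer more datastores; break ties with the higher MOID.
    return max(candidates, key=lambda h: (h.datastores, h.moid))


survivors = [Host("esxi-a", 98, 3), Host("esxi-c", 100, 3)]
new_master = elect_master(survivors)
```

With equal datastore counts, the tie-break on MOID decides the outcome in this example.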
Isolated Host
• If the host does not observe election traffic on the management network and cannot ping its default gateway, the host is isolated.

[Figure: three ESXi hosts connected by primary and alternate heartbeat networks, with the default gateway acting as the isolation address.]
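The isolation test is a conjunction of two conditions, which a minimal sketch makes explicit. The function name and parameters are hypothetical; `ping_results` stands for one ping outcome per isolation address, so with only the default gateway it has a single element.

```python
def is_isolated(sees_election_traffic: bool, ping_results: list[bool]) -> bool:
    """Return True only when the host sees no election traffic
    and none of its isolation addresses answers a ping."""
    return not sees_election_traffic and not any(ping_results)
```

A host that can still reach any isolation address, or still observes election traffic, does not declare itself isolated in this model.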
Design Considerations

• Host isolation events can be minimized through good design:


o Implement redundant heartbeat networks.
o Implement redundant isolation addresses.

• If host isolation events do occur, good design enables vSphere HA to determine whether the isolated host is still alive.
• Implement datastores so that they are separated from the management network by
using one or both of the following approaches:
o Fibre Channel over fiber optic
o Physically separating your IP storage network from the management network
Virtual Machine Storage Failures
• With an increasing number of virtual machines and datastores on each host, storage connectivity issues have high costs but are infrequent.
• Connectivity problems due to:
o Network or switch failure
o Array misconfiguration
o Power outage
• Virtual machine availability is affected:
o Virtual machines on affected hosts are difficult to manage.
o Applications with attached disks crash.
Virtual Machine Component Protection
• Virtual Machine Component Protection (VMCP) protects against storage failures in a
virtual machine.
• Only vSphere HA clusters that contain ESXi 6 hosts can be used to enable VMCP.

[Figure: VMCP runs on a cluster enabled for vSphere HA, detects and responds to failures, and provides application availability and remediation.]
Configuring vSphere HA
About Clusters

• A cluster is a collection of ESXi hosts and their associated virtual machines, configured to share their resources.
• vCenter Server manages cluster resources like a single pool of resources.
• Components such as vSphere HA and VMware vSphere® Distributed Resource Scheduler™ are configured on a cluster.
vSphere HA Prerequisites
o All hosts must be licensed for vSphere HA.
o A cluster must contain at least two hosts.
o All hosts must be configured with static IP addresses. If you are using DHCP, you must ensure that the address for each
host persists across reboots.
o All hosts must have at least one management network in common.
o All hosts must have access to the same virtual machine networks and datastores.
o For Virtual Machine Monitoring to work, VMware Tools™ must be installed.
o Only vSphere HA clusters that contain ESXi 6 hosts can be used to enable VMCP.
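Several of these prerequisites can be checked mechanically before enabling vSphere HA. The sketch below is a plain-Python illustration that does not call any VMware API; the dictionary keys (`name`, `licensed`, `static_ip`, `mgmt_networks`) are hypothetical and would have to be filled in from your own inventory data.

```python
def check_ha_prerequisites(hosts: list[dict]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if len(hosts) < 2:
        problems.append("cluster needs at least two hosts")
    for host in hosts:
        if not host["licensed"]:
            problems.append(f"{host['name']}: not licensed for vSphere HA")
        if not host["static_ip"]:
            problems.append(f"{host['name']}: address may not persist across reboots")
    if hosts:
        # At least one management network must be shared by every host.
        common = set.intersection(*(set(h["mgmt_networks"]) for h in hosts))
        if not common:
            problems.append("no management network common to all hosts")
    return problems
```

The check covers only the host-level items in the list; datastore access, VMware Tools, and VMCP requirements would need their own inputs.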
Configuring vSphere HA Settings
• When you create a vSphere HA cluster or configure a cluster, you must configure
settings that determine how the feature works.
Permanent Device Loss and All Paths Down Overview

• vSphere HA uses VMCP to move virtual machines in Permanent Device Loss and
All Paths Down situations to other fully connected hosts.
• Permanent Device Loss:
o The datastore appears as unavailable in the Storage view.
o A storage adapter indicates the operational state as loss of communication.
o All paths to the device are marked as dead.

• All Paths Down:


o The datastore appears as unavailable in the Storage view.
o A storage adapter indicates the operational state as dead or error.
o All paths to the device are marked as dead.
o The vSphere Client is unable to connect directly to the ESXi host.
o The ESXi host appears as disconnected in vCenter Server.
vSphere HA Settings: Virtual Machine Monitoring (1)
• You use Virtual Machine Monitoring settings to control the monitoring of virtual
machines.
vSphere HA Settings: Virtual Machine Monitoring (2)
vSphere HA Settings: Datastore Heartbeating
• A heartbeat file is created on the selected datastores and is used in the event of a
management network failure.
vSphere HA Settings: Admission Control
• vCenter Server uses admission control to ensure that:
o Sufficient resources are available in a cluster to provide failover protection
o Virtual machine resource reservations are respected
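The idea behind admission control can be illustrated with a worst-case capacity check. This is a deliberately simplified sketch and not one of the admission control policies that vSphere implements: it considers a single resource (CPU in MHz), assumes the largest hosts are the ones that fail, and uses hypothetical names throughout.

```python
def can_tolerate_host_failures(host_capacity_mhz: list[int],
                               vm_reservations_mhz: list[int],
                               failures_to_tolerate: int = 1) -> bool:
    """Check whether reservations still fit after the largest hosts fail."""
    survivors = max(len(host_capacity_mhz) - failures_to_tolerate, 0)
    # Sorting ascending and keeping the first entries drops the largest hosts.
    remaining = sorted(host_capacity_mhz)[:survivors]
    return sum(remaining) >= sum(vm_reservations_mhz)
```

If the check fails, powering on another virtual machine with a reservation would leave the cluster unable to honor all reservations after a failover, which is the situation admission control is meant to prevent.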
vSphere HA Settings: Advanced Options
• To customize vSphere HA behavior, you set advanced vSphere HA options.
• To force the cluster not to use the default isolation address (default gateway):
o das.usedefaultisolationaddress = false
• To force the cluster to ping alternate isolation addresses:
o das.isolationaddressX = pingable address
• To force the cluster to wait beyond the default 30-second isolation action window:
o fdm.isolationpolicydelaysec = value greater than 30 (seconds)
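The three options fit together as one set of key/value pairs. The helper below is a hypothetical sketch that only assembles those pairs in Python; it does not apply them to a cluster. The option names are taken from the slide as written, numbering the isolation addresses from 0 is an assumption (the slide only says "X"), and the addresses in the usage line are documentation-range placeholders.

```python
from typing import Optional


def build_ha_advanced_options(isolation_addresses: list[str],
                              isolation_delay_sec: Optional[int] = None) -> dict[str, str]:
    """Assemble vSphere HA advanced options as string key/value pairs."""
    options: dict[str, str] = {}
    if isolation_addresses:
        # Stop using the default gateway and ping the alternates instead.
        options["das.usedefaultisolationaddress"] = "false"
        for index, address in enumerate(isolation_addresses):
            options[f"das.isolationaddress{index}"] = address
    if isolation_delay_sec is not None:
        if isolation_delay_sec <= 30:
            raise ValueError("delay must exceed the default 30 seconds")
        options["fdm.isolationpolicydelaysec"] = str(isolation_delay_sec)
    return options


options = build_ha_advanced_options(["192.0.2.1", "192.0.2.2"], isolation_delay_sec=60)
```

Verify the exact option names against the documentation for your vSphere version before entering them in the cluster's advanced options.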
Configuring Virtual Machine Overrides
• You can override the vSphere HA settings that are set on a cluster for individual
virtual machines in that cluster.
Network Configuration and Maintenance

• Before changing the networking settings on an ESXi host (adding port groups,
removing virtual switches, and so on), you must suspend the Host Monitoring feature
and place the host in maintenance mode.
• This practice prevents unwanted attempts to fail over virtual machines.
Cluster Resource Reservation
• The Resource Reservation tab reports total cluster CPU, memory, memory
overhead, storage capacity, the capacity reserved by virtual machines, and how
much capacity is still available.
Monitoring Cluster Status
• You can monitor the status of a vSphere HA cluster on the Monitor tab.
Introduction to vSphere Fault Tolerance
vSphere Fault Tolerance
• vSphere Fault Tolerance provides instantaneous failover and continuous availability:
o Zero downtime
o Zero data loss
o No loss of TCP connections
[Figure: a primary virtual machine and a secondary virtual machine on ESXi hosts, kept in sync by fast checkpointing, with instantaneous failover.]
vSphere Fault Tolerance Features (1)
• vSphere Fault Tolerance protects mission-critical, high-performance applications
regardless of the operating system used.
• vSphere Fault Tolerance:
o Supports up to four virtual CPUs
o Supports up to 64 GB of memory
o Supports VMware vSphere® vMotion® for primary and secondary virtual machines
o Creates a secondary copy of all virtual machine files, including disks
o Provides fast checkpoint copying to keep primary and secondary CPUs synchronized
o Supports thin-provisioned disks
o Supports memory virtualization hardware assist
o Supports Enhanced vMotion Compatibility clusters
How vSphere Fault Tolerance Works with vSphere HA and vSphere DRS
• vSphere Fault Tolerance works with vSphere HA and vSphere DRS.
• vSphere HA:
o Is required for vSphere Fault Tolerance
o Restarts failed virtual machines
o Is vSphere Fault Tolerance aware

• vSphere DRS:
o Selects the virtual machine’s location at power-on
o Does not balance fault-tolerant virtual machines in a balanced cluster

[Figure: three ESXi hosts running the primary machine, the secondary machine, and a new secondary machine.]
Redundant VMDKs
• vSphere Fault Tolerance creates two complete virtual machines.
• Each virtual machine has its own .vmx configuration file and .vmdk files. Each of
these virtual machines can be on a different datastore.

[Figure: the primary virtual machine with its .vmx file and .vmdk files on Datastore 1, and the secondary virtual machine with its own .vmx file and .vmdk files on Datastore 2.]
vSphere vMotion: Precopy
• During a vSphere vMotion migration, a second virtual machine is created on the
destination host. Then the memory of the source virtual machine is copied to the
destination.

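[Figure: vSphere vMotion memory precopy. The memory of VM A, tracked by a memory bitmap, is copied to the destination over the vSphere vMotion network while end users stay connected through the virtual machine port group on the virtual machine network.]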
vSphere vMotion: Memory Checkpoint

• In vSphere vMotion migration, checkpoint data is the last bit of memory that keeps
changing.

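[Figure: vSphere vMotion checkpoint data. The remaining memory of VM A, tracked by a memory bitmap, is sent over the vSphere vMotion network while end users stay connected through the virtual machine port group on the virtual machine network.]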
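Precopy and the final checkpoint can be pictured as a loop over a dirty-page bitmap. The simulation below is a toy model under stated assumptions: the set of pages dirtied during each round is supplied by the caller, and `checkpoint_limit` is a made-up threshold for when the remaining pages are few enough to send as checkpoint data. The real vSphere vMotion convergence criteria are not described in the source.

```python
def precopy(num_pages: int,
            dirtied_per_round: list[set[int]],
            checkpoint_limit: int = 2) -> tuple[list[int], list[int]]:
    """Simulate iterative precopy.

    Returns (pages sent in each precopy round, pages left for the checkpoint).
    """
    dirty = set(range(num_pages))  # bitmap: initially every page must be copied
    sent_per_round = []
    for dirtied in dirtied_per_round:
        if len(dirty) <= checkpoint_limit:
            break  # small enough to send as the final checkpoint
        sent_per_round.append(len(dirty))  # copy all currently dirty pages
        dirty = set(dirtied)               # pages the guest changed meanwhile
    return sent_per_round, sorted(dirty)


rounds, checkpoint = precopy(8, [{1, 2, 3, 5}, {2, 3}, {3}])
```

Each round copies fewer pages than the one before, and the pages that keep changing are what remains for the checkpoint.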
Shared Files
• vSphere Fault Tolerance has shared files:
o shared.vmft prevents UUID change.
o .ftgeneration guards against the split-brain condition.

[Figure: the primary host and the secondary host both access the shared.vmft and .ftgeneration files.]
shared.vmft File
• The shared.vmft file, which is found on a shared datastore, is the vSphere Fault
Tolerance metadata file and contains the primary and secondary instance UUIDs and
the primary and secondary vmx paths.

[Figure: shared.vmft records UUID-1 and UUID-2 for the two instances; the virtual machine guest OS references UUID-1.]
Enabling vSphere Fault Tolerance on a Virtual Machine
• You can turn on vSphere Fault Tolerance for a virtual machine through the VMware vSphere® Web Client.
vSphere Distributed Resource Scheduler
vSphere DRS Cluster Prerequisites
• vSphere DRS works best when the virtual machines meet VMware vSphere®
vMotion® migration requirements.
• To use vSphere DRS for load balancing, the hosts in the cluster must be part of a
vSphere vMotion migration network.
o If not, vSphere DRS can still make initial placement recommendations.

• To use shared storage, configure all hosts in the cluster:


o Volumes must be accessible by all hosts.
o Volumes must be large enough to store all virtual disks for your virtual machine.
vSphere DRS Cluster Settings: Automation Level
• Configure the automation level for the initial placement of virtual machines and
dynamic balancing while virtual machines are running.

• The migration threshold guides the selection of virtual machines for migration.

[Figure: Automation Level settings]
Cluster Settings: Swap File Location for vSphere DRS
• Store the virtual machine’s swap file with the virtual machine or in a specified
datastore.
• VMware recommends that you store the swap file in the same directory as the virtual
machine.
vSphere DRS Cluster Settings: Virtual Machine Affinity

• vSphere DRS affinity rules specify that selected virtual machines be placed either on the same host (affinity) or on separate hosts (anti-affinity).
• Affinity rules:
o Use for multi-virtual machine systems where virtual machines communicate heavily with one another.
• Anti-affinity rules:
o Use for multi-virtual machine systems where load balance or high availability is desired.
• Options:
o Keep Virtual Machines Together
o Separate Virtual Machines
o Virtual Machines to Hosts
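What an affinity or anti-affinity rule demands of a placement can be expressed as a short check. This is an illustrative sketch with hypothetical names: `placement` maps each virtual machine to its current host, and the rule types `keep_together` and `separate` mirror the Keep Virtual Machines Together and Separate Virtual Machines options.

```python
def violates_rule(rule_type: str, rule_vms: set[str], placement: dict[str, str]) -> bool:
    """Return True if the current placement breaks the given rule."""
    placed = [vm for vm in rule_vms if vm in placement]
    hosts = {placement[vm] for vm in placed}
    if rule_type == "keep_together":
        # Affinity: all placed virtual machines must share one host.
        return len(hosts) > 1
    if rule_type == "separate":
        # Anti-affinity: no two placed virtual machines may share a host.
        return len(hosts) < len(placed)
    raise ValueError(f"unknown rule type: {rule_type}")


placement = {"web": "esxi-1", "app": "esxi-1", "db": "esxi-2"}
```

For example, with this placement a rule that keeps `web` and `app` together is satisfied, while a rule that separates them is violated.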
vSphere DRS Cluster Settings: DRS Groups
• DRS groups are used in defining VM-Host affinity rules.
• Types of DRS groups:
o A group of virtual machines
o A group of hosts
• A virtual machine can belong to multiple virtual machine DRS groups.
• A host can belong to multiple host DRS groups.
vSphere DRS Cluster Settings: VM-Host Affinity Rules

• A VM-Host affinity rule:
o Specifies an affinity relationship between a virtual machine DRS group and a host DRS group
• Other options:
o Must run on hosts in group
o Must Not run on hosts in group
o Should Not run on hosts in group
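The difference between the "must" and "should" variants is whether a conflicting host is ruled out or merely discouraged. The sketch below models that distinction with hypothetical names and result labels; it assumes "Should run on hosts in group" is the fourth option alongside the three listed, and it does not reproduce how vSphere DRS actually weighs preferences.

```python
from enum import Enum


class RuleKind(Enum):
    MUST_RUN_ON = "must run on hosts in group"
    SHOULD_RUN_ON = "should run on hosts in group"
    MUST_NOT_RUN_ON = "must not run on hosts in group"
    SHOULD_NOT_RUN_ON = "should not run on hosts in group"


def evaluate_host(kind: RuleKind, host_group: set[str], candidate_host: str) -> str:
    """Return 'allowed', 'discouraged', or 'forbidden' for placing a VM on a host."""
    in_group = candidate_host in host_group
    if kind is RuleKind.MUST_RUN_ON:
        return "allowed" if in_group else "forbidden"
    if kind is RuleKind.SHOULD_RUN_ON:
        return "allowed" if in_group else "discouraged"
    if kind is RuleKind.MUST_NOT_RUN_ON:
        return "forbidden" if in_group else "allowed"
    return "discouraged" if in_group else "allowed"
```

In this model a "should" rule never blocks a placement outright, whereas a "must" rule does.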
vSphere DRS Cluster Settings: Automation at the Virtual Machine Level
• You can customize the automation level for individual virtual machines in a cluster
to override the automation level set on the entire cluster.
Viewing vSphere DRS Cluster Information
• The cluster Summary tab provides information specific to vSphere DRS.
• Clicking the vSphere DRS link on the Monitor tab displays CPU and memory
utilization per host.
Viewing vSphere DRS Recommendations
• The DRS tab displays information about the vSphere DRS recommendations made
for the cluster, the faults that occurred in applying such recommendations, and the
history of vSphere DRS actions.

• From this tab, you can refresh recommendations, apply a subset of recommendations, or apply all recommendations.
Monitoring Cluster Status
• View the inventory hierarchy for the cluster state.
• You can view the cluster’s Tasks and Events tabs for more information.
Maintenance Mode and Standby Mode
• To service a host in a cluster, for example, to install more memory, or remove a host
from a cluster, you must place the host in maintenance mode:
o Virtual machines on the host should be migrated to another host or shut down.
o You cannot power on virtual machines or migrate virtual machines to a host entering maintenance mode.
o While in maintenance mode, the host does not allow you to deploy or power on a virtual machine.

• When a host is placed in standby mode, it is powered off:


o This mode is used by VMware vSphere® Distributed Power Management™ to optimize power usage.
Removing a Host from the vSphere DRS Cluster
• Before removing a host from a vSphere DRS cluster, consider the following issues:
o The resource pool hierarchy remains with the cluster.
o Because a host must be in maintenance mode, all virtual machines running on that host are powered off.
o The resources available for the cluster decrease.
Improving Virtual Machine Performance Methods

• Methods, ordered from fine to broad:
o Use network traffic shaping.
o Modify the virtual machine’s CPU and memory reservations.
o Modify the resource pool’s CPU and memory limits and reservations.
o Use NIC teaming.
o Use storage multipathing.
o Use a vSphere DRS cluster.
