20740C 08-GMG

Module 8
Implementing failover clustering

Module Overview
• Planning a failover cluster
• Creating and configuring a new failover cluster
• Maintaining a failover cluster
• Troubleshooting a failover cluster
• Implementing site high availability with stretch clustering
Lesson 1: Planning a failover cluster
• Preparing to implement failover clustering
• Failover-cluster storage
• Hardware requirements for a failover-cluster implementation
• Network requirements for a failover-cluster implementation
• Demonstration: Verifying a network adapter's RSS and RDMA compatibility on an SMB server
• Infrastructure and software requirements for a failover cluster
• Security considerations
• Quorum in Windows Server 2016
• Planning for migrating and upgrading failover clusters
Preparing to implement failover clustering
Before you implement failover clustering:
• Identify the services and applications that you want to make highly available
• Remember that you cannot apply failover clustering equally to all applications
• Use failover clustering only for IP-based applications
• Plan resource utilization adequately
• Use hardware that has similar capacity for all nodes in a cluster
• Identify single points of failure and mitigate them, for example with NIC Teaming or MPIO
Failover Cluster: Components
• Node - A Windows Server 2016 computer that is part of a failover cluster and has the failover clustering feature installed. Nodes can be virtual or physical servers.
• Service or application - A service that you can move between cluster nodes (for example, the Hyper-V role or a clustered file server can run on either node).
• Shared storage - Storage (SAN, iSCSI, Storage Spaces, or Storage Spaces Direct) that is accessible to all cluster nodes. (Mandatory)
• Quorum - The number of elements that must be online for a cluster to continue to run. The quorum is determined when cluster nodes vote.
• Witness - A disk, file share, or cloud resource that participates in cluster voting, which matters most when the number of nodes is even.
Failover Cluster: Components
• Failover - The process of moving cluster resources from the first node to the second node, as a result of node failure or an administrator's action.
• Switchover - A failover that the administrator plans.
• Failback - The process of moving cluster resources back from the second node to the first node, as a result of the first node coming online again or an administrator's action.
• Clients - Computers that connect to the failover cluster and are not aware which node the service is running on.
• Failover and failback example - If the service or application fails over from Node1 to Node2, it fails back to Node1 when Node1 is available again.
Hardware requirements for a failover-cluster
implementation
The hardware requirements for a failover-cluster implementation include:
• You must use server hardware that is certified for
Windows Server
• Server nodes should all have the same
configuration and contain the same or similar
components
• All servers must pass the tests in the Validate a
Configuration Wizard
• Ensure that each node runs the same processor
architecture.
Network requirements for a failover-cluster
implementation
The network requirements for a failover-cluster implementation include:
• Your servers should connect to multiple networks to ensure communication redundancy, or they should connect to a single network with redundant hardware, to remove single points of failure
• You should ensure that network adapters are identical and that they have the same IP protocol versions, speed, duplex, and flow-control capabilities
• Your network adapters should be compatible with RSS and RDMA; RSS distributes network-receive processing across multiple processors
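The RSS and RDMA check from the demonstration can be sketched in Windows PowerShell. This is a minimal sketch that assumes you run it in an elevated session on the SMB server; it only reads adapter settings and changes nothing.

```powershell
# List each adapter and whether Receive Side Scaling (RSS) is enabled on it.
Get-NetAdapterRss | Format-Table Name, Enabled

# List each adapter and whether RDMA is enabled on it.
Get-NetAdapterRdma | Format-Table Name, Enabled

# Show what the SMB server itself reports for each interface.
Get-SmbServerNetworkInterface |
    Format-Table InterfaceIndex, RssCapable, RdmaCapable, Speed, IpAddress
```

An adapter that shows RssCapable or RdmaCapable as False in the last table is not used by SMB for that capability, even if the hardware supports it.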
Infrastructure and software requirements for a failover cluster
• The infrastructure requirements for a failover cluster include:
• Active Directory domain controllers should run Windows Server 2008 or newer
• The domain functional level and forest functional level should be Windows Server 2008 or newer
• The application must support Windows Server 2016 high availability
• The software best practices for a failover cluster implementation require that:
• All nodes have the same edition of Windows Server 2016 and the same service pack and updates
Security considerations
• Security considerations for failover clustering include that you must:
• Provide a method for authentication and authorization
• Ensure that unauthorized users do not have physical access to failover cluster nodes
• Ensure that you use antimalware software
• Ensure that your intra-cluster communication authenticates with Kerberos version 5
• If you use an Active Directory-detached cluster:
• AD DS objects for network names are not created
• You register the cluster network name in DNS, so it is not necessary to create new objects in AD DS
• We do not recommend this for any scenario that requires Kerberos authentication
• You must run Windows Server 2012 R2 or newer on all cluster nodes
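An Active Directory-detached cluster is created by choosing a DNS-only administrative access point. The following is a sketch only: the cluster name, node names, and IP address are hypothetical placeholders, and the nodes are assumed to already have the failover clustering feature installed.

```powershell
# Hypothetical names and address; replace with values from your environment.
$nodes = 'LON-SVR2', 'LON-SVR3'

# -AdministrativeAccessPoint Dns registers the cluster name in DNS only,
# so no computer object is created for it in AD DS.
New-Cluster -Name 'Cluster1' -Node $nodes -StaticAddress '172.16.0.125' `
    -NoStorage -AdministrativeAccessPoint Dns

# Confirm which access point type the cluster uses.
(Get-Cluster -Name 'Cluster1').AdministrativeAccessPoint
```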
Windows Server 2016 introduces several cluster types, and which one you use depends on your domain-membership scenario:
• Single-domain clusters
• Cluster nodes are members of the same domain
• Workgroup clusters
• Cluster nodes are not joined to a domain
• You can configure these only with Windows PowerShell
• Multi-domain clusters
• Cluster nodes are members of different domains
• Workgroup and domain clusters
• Cluster nodes are a mix of domain members and servers that are not joined to a domain (workgroup servers)
Quorum mode | What has the vote? | When is quorum maintained?
Node majority | Only nodes in the cluster have a vote | When more than half of the nodes are online
Node and disk majority | The nodes in the cluster and a disk witness have a vote | When more than half of the votes are online
Node and file share majority | The nodes in the cluster and a file share witness have a vote | When more than half of the votes are online
No majority: disk only | Only the quorum-shared disk has a vote | When the shared disk is online
Dynamic quorum | Votes are dynamically assigned to always be odd | When more than half of the votes are online
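The "more than half" rule can be worked through with a few lines of arithmetic. The function below is a hypothetical helper written for illustration; it is not part of the FailoverClusters module and it does not query a cluster.

```powershell
# Hypothetical helper: returns $true when the online votes form a strict majority.
function Test-QuorumMajority {
    param(
        [int]$TotalVotes,   # all votes configured in the cluster (nodes plus witness)
        [int]$VotesOnline   # votes that are currently reachable
    )
    # Strict majority: twice the online votes must exceed the total.
    return (($VotesOnline * 2) -gt $TotalVotes)
}

# Four nodes, no witness, two nodes lost: 2 of 4 votes is not a majority.
Test-QuorumMajority -TotalVotes 4 -VotesOnline 2   # False

# Four nodes plus a witness, two nodes lost: 3 of 5 votes is a majority.
Test-QuorumMajority -TotalVotes 5 -VotesOnline 3   # True
```

The two calls show why a witness matters when the number of nodes is even: the extra vote breaks the tie.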
• Dynamic quorum (Windows Server 2012)
• Dynamic witness (Windows Server 2012 R2)
• Disk witness
• File share witness
• Azure Cloud Witness (Windows Server 2016)
• You should configure a witness for all clusters
• We recommend that you use dynamic quorum, which is the default configuration
• You should use all other forms of quorum in specific use cases only
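To see how these settings look on a running cluster, you can read them with Windows PowerShell. This is a read-only sketch that assumes you run it on a cluster node with the FailoverClusters module available.

```powershell
# Show the current quorum type and the witness resource, if any.
Get-ClusterQuorum

# 1 means dynamic quorum is enabled (the default).
(Get-Cluster).DynamicQuorum

# 1 means the witness currently holds a vote (dynamic witness).
(Get-Cluster).WitnessDynamicWeight

# NodeWeight is the configured vote; DynamicWeight is the vote that
# dynamic quorum has currently assigned to each node.
Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight
```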
Dynamic quorum / Dynamic witness
Planning for migrating and upgrading failover clusters
The upgrade steps for each node in the cluster include:
• Pause the cluster node and drain all cluster resources
• Migrate cluster resources to another node in the cluster
• Replace the cluster node operating system with Windows Server 2016 and add the node back to the cluster
• Upgrade all nodes to Windows Server 2016
• Run the Update-ClusterFunctionalLevel cmdlet
• New failover clustering features in Windows Server 2016 include Site-aware Failover Clusters, Workgroup and Multi-domain Clusters, Virtual Machine Node Fairness, Virtual Machine Start Order, Simplified SMB Multichannel, and Multi-NIC Cluster Networks
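The per-node steps above can be sketched as follows. The node name is a hypothetical placeholder, and the operating system reinstallation itself happens outside the script. Treat this as an outline under those assumptions, not a complete upgrade procedure.

```powershell
# Hypothetical node name; repeat the block for each node in turn.
$node = 'LON-SVR3'

# Pause the node and move its roles to the other nodes.
Suspend-ClusterNode -Name $node -Drain

# Evict the node before you replace its operating system.
Remove-ClusterNode -Name $node -Force

# ... install Windows Server 2016 and the failover clustering feature on the node ...

# Add the upgraded node back to the cluster.
Add-ClusterNode -Name $node

# After ALL nodes run Windows Server 2016, check and then raise the level.
Get-Cluster | Select-Object Name, ClusterFunctionalLevel
Update-ClusterFunctionalLevel
```

Raising the functional level cannot be reversed, so run the last command only when every node is upgraded and stable.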
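Virtual Machine Node Fairness
Balancing aggressiveness levels:
• Low: move virtual machines when the workload consumes more than 80% of the node's capacity
• Medium: move virtual machines when the workload consumes more than 70% of the node's capacity
• High: move virtual machines when the workload consumes more than 60% of the node's capacity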
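Node fairness is controlled by two cluster common properties. The sketch below assumes a Windows Server 2016 cluster and simply reads the mode and selects the Medium level.

```powershell
# 0 = disabled, 1 = balance when a node joins,
# 2 = balance when a node joins and every 30 minutes.
(Get-Cluster).AutoBalancerMode

# 1 = Low, 2 = Medium, 3 = High aggressiveness.
(Get-Cluster).AutoBalancerLevel = 2
```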
Lesson 2: Creating and configuring a new failover cluster
• The validation wizard and the cluster support-policy requirements
• The process for creating a failover cluster
• Demonstration: Creating a failover cluster
• Demonstration: Reviewing the validation wizard
• Configuring roles
• Demonstration: Creating a general file-server failover cluster
• Managing failover clusters
• Configuring cluster properties
• Configuring failover and failback
• Configuring storage
• Configuring networking
The validation wizard and the cluster support-policy requirements
• Validation wizard performs multiple types of tests, such as:
• Cluster
• Inventory
• Network
• Storage
• System
• You can perform validation from the Validate a Configuration Wizard or with the Test-Cluster Windows PowerShell cmdlet
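Validation with Test-Cluster might look like the following sketch. The node names are hypothetical placeholders, and the test category names passed to -Include are the ones the cmdlet uses for the inventory, network, and system tests.

```powershell
# Hypothetical node names.
$nodes = 'LON-SVR2', 'LON-SVR3'

# Run every validation test against the prospective nodes.
Test-Cluster -Node $nodes

# Run only selected test categories, for example to skip the storage
# tests on a cluster that is already in production.
Test-Cluster -Node $nodes -Include 'Inventory', 'Network', 'System Configuration'
```

Each run writes a validation report that you can review in the same way as the wizard's report.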
The process for creating a failover cluster
1. Install the failover clustering feature
2. Verify the configuration, and create a cluster
3. Install the role on all cluster nodes by using Server Manager
4. Create a clustered application by using the Failover Clustering Management snap-in
5. Configure the application
6. Test failover
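The same six steps can be sketched in Windows PowerShell instead of Server Manager and the snap-in. All names, the disk name, and the IP addresses below are hypothetical, and a clustered file server stands in for "the application".

```powershell
# Hypothetical values; adjust for your environment.
$nodes = 'LON-SVR2', 'LON-SVR3'

# 1. Install the failover clustering feature on every node.
Invoke-Command -ComputerName $nodes -ScriptBlock {
    Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools
}

# 2. Verify the configuration, and create the cluster.
Test-Cluster -Node $nodes
New-Cluster -Name 'Cluster1' -Node $nodes -StaticAddress '172.16.0.125'

# 3. Install the role (here: File Server) on all cluster nodes.
Invoke-Command -ComputerName $nodes -ScriptBlock {
    Install-WindowsFeature -Name FS-FileServer
}

# 4-5. Create and configure the clustered role on a clustered disk.
Add-ClusterFileServerRole -Name 'AdatumFS' -Storage 'Cluster Disk 2' `
    -StaticAddress '172.16.0.130'

# 6. Test failover by moving the role to the other node.
Move-ClusterGroup -Name 'AdatumFS' -Node 'LON-SVR3'
```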
Configuring roles
• Configuring a cluster role includes:
• Choosing a clustering role, for example File Server or Virtual Machine
• Installing the role
• Verifying the status (Running) on all cluster nodes
• You can configure a cluster role by using:
• The Failover Cluster Manager console
• Windows PowerShell role cmdlets, such as Add-ClusterFileServerRole
Demonstration: Creating a general file-server
failover cluster
In this demonstration, you will learn how to cluster a file server role
Managing failover clusters
The most common management tasks include:
• Managing nodes
• Managing networks
• Managing permissions
• Configuring cluster-quorum settings
• Migrating services and applications to a cluster
• Configuring new services and applications
• Removing the cluster
Configuring cluster properties
The three aspects of managing cluster nodes include:
• Adding nodes after you create a cluster
• Pausing nodes, which prevents resources from
running on that node
• Evicting nodes from a cluster, which removes the
node from the cluster configuration
Configuration tasks are available in:
• The Actions pane of the Failover Cluster
Management console
• Windows PowerShell
Configuring failover and failback
• During failover, the clustered instance and all associated resources move from one node to another
• Failover occurs when:
• The node that hosts the instance becomes inactive for some reason
• One of the resources within the instance fails
• An administrator performs a failover
• The Cluster service can fail back after the offline node becomes active again
• Failover can be planned or unplanned
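Failover and failback behavior is set per clustered role. The sketch below uses a hypothetical role named AdatumFS and hypothetical node names; the property names are the group common properties of the FailoverClusters module.

```powershell
# Hypothetical role and node names.
$group = Get-ClusterGroup -Name 'AdatumFS'

# Preferred owners, in order; failback returns the role to the first one.
Set-ClusterOwnerNode -Group 'AdatumFS' -Owners 'LON-SVR2', 'LON-SVR3'

# Allow failback (0 = prevent, 1 = allow), but only between 22:00 and 04:00.
$group.AutoFailbackType    = 1
$group.FailbackWindowStart = 22
$group.FailbackWindowEnd   = 4

# Allow at most 2 failovers within a 6-hour period before the role stays failed.
$group.FailoverThreshold = 2
$group.FailoverPeriod    = 6

# Planned failover: move the role to another node.
Move-ClusterGroup -Name 'AdatumFS' -Node 'LON-SVR3'
```

Restricting failback to a window outside business hours avoids a second interruption for clients while the original node is still being checked.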
Configuring storage
Storage configuration tasks in Failover Clustering include:
• Adding storage spaces
• Adding a disk to available storage and to the CSV
• Taking a disk offline
• Bringing the disk back online
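These storage tasks map to a handful of cmdlets. The disk resource names below are hypothetical; clusters usually name disks "Cluster Disk N", but check the names in your own cluster first.

```powershell
# Add every disk that the cluster can see but does not yet use
# to Available Storage.
Get-ClusterAvailableDisk | Add-ClusterDisk

# Convert a disk in Available Storage into a Cluster Shared Volume.
Add-ClusterSharedVolume -Name 'Cluster Disk 2'

# Take a disk offline, and then bring it back online.
Stop-ClusterResource  -Name 'Cluster Disk 3'
Start-ClusterResource -Name 'Cluster Disk 3'
```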
Configuring networking
Network | Description
Public network | Clients use this network to connect to the clustered service
Private network | Nodes use this network to communicate with each other (heartbeats use UDP port 3343)
Public-and-private network | Required to communicate with external storage systems
• One network can support both client and node communications
• Multiple network adapters are recommended for enhanced performance and redundancy
• iSCSI storage should have a dedicated network
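How the cluster uses each network is stored in the network's Role property. In this sketch the network names are hypothetical placeholders for whatever names your cluster assigned.

```powershell
# List the cluster networks and their current roles.
Get-ClusterNetwork | Format-Table Name, Role, Address, AddressMask

# Role values: 0 = not used by the cluster, 1 = cluster communication only,
# 3 = cluster and client communication.

# Dedicate one network to node-to-node traffic.
(Get-ClusterNetwork -Name 'Cluster Network 2').Role = 1

# Exclude the dedicated iSCSI network from cluster use.
(Get-ClusterNetwork -Name 'Cluster Network 3').Role = 0
```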
Configuring quorum options
Quorum configuration options available in the Configure Cluster Quorum Wizard and Windows PowerShell include:
• Use typical settings
• Add or change the quorum witness
• Advanced quorum configuration and witness selection
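In Windows PowerShell, the witness is changed with Set-ClusterQuorum. The commands below are alternatives, so you would run only one of them. The disk name, share path, and storage account values are hypothetical placeholders.

```powershell
# Option 1: use a clustered disk as the witness.
Set-ClusterQuorum -DiskWitness 'Cluster Disk 1'

# Option 2: use a file share as the witness.
Set-ClusterQuorum -FileShareWitness '\\LON-DC1\FSW'

# Option 3: use an Azure storage account as the cloud witness.
Set-ClusterQuorum -CloudWitness -AccountName '<storage-account-name>' `
    -AccessKey '<storage-account-access-key>'

# Option 4: remove the witness (not recommended for most clusters).
Set-ClusterQuorum -NoWitness

# Verify the result.
Get-ClusterQuorum
```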
Dynamic quorum and quorum-configuration
considerations
• Dynamic quorum management:
• Failover cluster dynamically manages the vote assignment to nodes
• Allows for a cluster to run on the last surviving cluster node
• Cannot survive a simultaneous failure of a majority of voting nodes
• If you explicitly remove a vote from a node, the cluster cannot
dynamically add or remove that vote
• Quorum configuration considerations include:
• Validating the quorum configuration by using the Validate a
Configuration Wizard or the Test-Cluster Windows PowerShell
cmdlet
• Changing the quorum configuration only in specific scenarios:
• Adding or evicting nodes
• A node or witness has failed and cannot be recovered quickly
• Recovering a cluster in a multisite disaster recovery scenario
Demonstration: Configuring the quorum
In this demonstration, you will learn how to configure a quorum
Lab A: Implementing failover clustering
• Exercise 1: Creating a failover cluster
• Exercise 2: Verifying quorum settings and adding a node
Logon Information
Virtual machines: 20740C-LON-DC1
20740C-LON-SVR1
20740C-LON-SVR2
20740C-LON-SVR3
20740C-LON-SVR4
20740C-LON-CL1
User name: Adatum\Administrator
Password: Pa55w.rd
Estimated Time: 45 minutes

Lab Scenario
Adatum Corporation is looking to ensure that its critical services, such as file services, have better uptime and availability. You decide to implement a failover cluster with file services to provide better uptime and availability.
Lab Review
• What information do you need for planning a failover-cluster implementation?
• After running the Validate a Configuration Wizard, how can you resolve the network communication's single point of failure?
• In which situations might it be important to enable failback of a clustered application during a specific time?
Lesson 3: Maintaining a failover cluster
• Monitoring failover clusters
• Backing up and restoring failover-cluster configuration
• Maintaining failover clusters
• Managing cluster-network heartbeat traffic
• What is Cluster-Aware Updating?
• Demonstration: Configuring CAU
Monitoring failover clusters
Tools you can use to monitor clusters include:

• Event Viewer
• Tracerpt.exe
• MHTML-formatted cluster configuration reports
• Performance and Reliability Monitor snap-in
Backing up and restoring failover-cluster configuration
• When backing up failover clusters, remember that:

• Windows Server Backup is a Windows Server 2016 feature
• Non-Microsoft tools are available to perform backups and restores
• You must perform system-state backups
• A nonauthoritative restore completely restores a single
node in the cluster
• An authoritative restore restores the entire cluster
configuration to a point in time
Maintaining failover clusters
Failover cluster troubleshooting techniques include:
• Using the Validate a Configuration Wizard
• Reviewing events in logs (cluster, hardware, storage)
• Defining a process for troubleshooting failover clusters
• Reviewing storage configuration
• Checking for group and resource failures
Managing cluster-network heartbeat traffic
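• Types of network monitoring:
• Aggressive
• Relaxed
• Network-monitoring parameter settings:
• Delay
• Threshold
• Windows PowerShell cmdlet examples:
# Show the same-subnet and cross-subnet heartbeat settings
Get-Cluster | Format-List *subnet*
# Tolerate 10 missed heartbeats between nodes on the same subnet
(Get-Cluster).SameSubnetThreshold = 10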
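Delay and threshold combine into the time a node can stay silent before the cluster considers it down: the delay is the interval between heartbeats in milliseconds, and the threshold is the number of heartbeats that can be missed. A minimal sketch that reads both values from a running cluster and does the multiplication:

```powershell
$cluster = Get-Cluster

# Tolerated silence in seconds = delay (ms) x threshold / 1000.
$sameSubnetSeconds  = ($cluster.SameSubnetDelay  * $cluster.SameSubnetThreshold)  / 1000
$crossSubnetSeconds = ($cluster.CrossSubnetDelay * $cluster.CrossSubnetThreshold) / 1000

"Same subnet:  node is considered down after about $sameSubnetSeconds seconds"
"Cross subnet: node is considered down after about $crossSubnetSeconds seconds"
```

For example, a delay of 1000 ms with a threshold of 10 tolerates about 10 seconds of missed heartbeats. Lower values give aggressive monitoring and higher values give relaxed monitoring.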
What is Cluster-Aware Updating?
• Automated feature in Windows Server 2016

• Updates nodes in a cluster with minimal or no
downtime
• Benefits:
• Updating is automatic
• Can be scheduled
• No downtime
How CAU works
CAU works in two modes:
• Remote updating mode:
• Configure a separate computer (Windows 8.1) as an orchestrator
• Install the failover-clustering administrative tools
• Ensure that the orchestrator computer is not a cluster member
• Self-updating mode:
• Configure the CAU clustered role as a workload
• Ensure that there is no dedicated orchestrator computer
• Remember that the cluster updates itself
• Use the Get-CauRun cmdlet to show an Updating Run that is in progress
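Both modes can be driven from Windows PowerShell. In this sketch the cluster name and the schedule are hypothetical; the first command sets up self-updating mode, and the Invoke commands are what a remote orchestrator would run.

```powershell
# Hypothetical cluster name.
$clusterName = 'Cluster1'

# Self-updating mode: add the CAU clustered role and schedule it for
# the second Tuesday of each month.
Add-CauClusterRole -ClusterName $clusterName -DaysOfWeek Tuesday -WeeksOfMonth 2 `
    -MaxFailedNodes 1 -EnableFirewallRules -Force

# Remote updating mode: preview the applicable updates, and then start a run.
Invoke-CauScan -ClusterName $clusterName
Invoke-CauRun  -ClusterName $clusterName -MaxFailedNodes 1 -RequireAllNodesOnline -Force

# Show the status of an Updating Run that is in progress.
Get-CauRun -ClusterName $clusterName
```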
Demonstration: Configuring CAU
In this demonstration, you will learn how to configure CAU
Lesson 4: Troubleshooting a failover cluster
• Communication issues
• Repairing the cluster name object in AD DS
• Starting a cluster with no quorum
• Demonstration: Reviewing the Cluster.log file
• Monitoring performance with failover clustering
• Using Event Viewer with failover clustering
• Windows PowerShell troubleshooting cmdlets
Communication issues
• The following might cause communication issues in failover clustering:
• Network latency
• Network failures
• Network-adapter driver issues
• Firewall rules (permit heartbeats on UDP port 3343)
• Security software
• You can use the Get-ClusterLog cmdlet to generate the Cluster.log file for troubleshooting. You can find this file in C:\Windows\Cluster\Reports
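A sketch of collecting the log and checking the firewall follows. The destination folder is a hypothetical choice, and the firewall group name 'Failover Clusters' is an assumption that holds for English-language installations; verify it on your own servers.

```powershell
# Hypothetical folder for the collected logs.
$destination = 'C:\Temp\ClusterLogs'
New-Item -Path $destination -ItemType Directory -Force | Out-Null

# Generate Cluster.log for every node, limited to the last 30 minutes,
# with timestamps in local time instead of UTC.
Get-ClusterLog -Destination $destination -TimeSpan 30 -UseLocalTime

# Check that the failover clustering firewall rules are enabled
# (group name assumed; see the note above).
Get-NetFirewallRule -DisplayGroup 'Failover Clusters' |
    Format-Table DisplayName, Enabled, Direction
```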
Repairing the cluster name object in AD DS
• The CNO repair process:

• Use the Repair Active Directory Object option in the
Failover Cluster Manager
• You must have Reset Password permissions on the CNO
computer object
• The VCO repair process:
• Use the AD Recycle Bin feature to recover deleted
computer objects, and use the Repair function as the
last recovery action
• The CNO will reset the password and heal itself
automatically
• The CNO must have Create Computer Objects
permissions on the VCO’s OU
Starting a cluster with no quorum
• Cluster nodes must retain the quorum for the cluster to work
• If the quorum is lost, try to reestablish the quorum
• If you cannot reestablish the quorum during an extended period, start the cluster in ForceQuorum mode by using the -FQ switch of Start-ClusterNode
• After you start the cluster in ForceQuorum mode, other nodes can rejoin the cluster
• Once the quorum is reestablished, cluster mode changes from ForceQuorum to normal automatically
• When joining nodes to the cluster in ForceQuorum mode, you should start the other nodes with the setting that prevents quorum, by using the -PQ switch of Start-ClusterNode
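The two switches might be used as in the sketch below. The node names are hypothetical, and the long parameter names shown are the ones that the -FQ and -PQ aliases stand for.

```powershell
# On the node that holds the most recent cluster configuration:
# force the cluster to start without quorum (alias: -FQ).
Start-ClusterNode -Name 'LON-SVR2' -FixQuorum

# On each remaining node: start the Cluster service but prevent the node
# from forming its own quorum, so that it joins the forced cluster (alias: -PQ).
Start-ClusterNode -Name 'LON-SVR3' -PreventQuorum

# Confirm that the nodes are up.
Get-ClusterNode | Format-Table Name, State
```

Forcing quorum on a node with an outdated copy of the configuration can discard newer changes, which is why the first command should target the node with the latest configuration.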
Demonstration: Reviewing the Cluster.log file
In this demonstration, you will learn how to review the Cluster.log file
Monitoring performance with failover clustering
Some of the failover clustering performance counters include:
• Cluster Network Messages
• Cluster Network Reconnections
• Global Update Manager
• Database
• Resource Control
• API
• Cluster Shared Volumes
• Cluster Shared Volumes is a storage architecture that is optimized for Hyper-V virtual machines. Examples of its counters include IO Read Bytes, IO Reads, IO Write Bytes, and IO Writes.
Using Event Viewer with failover clustering
Events that are displayed in Event Viewer and require you to troubleshoot clusters include:
• Cluster resource in clustered service or application failed
• Cluster network interface for cluster node on network
failed
• File share witness resource failed to arbitrate for the file
share
• Cluster node was removed from the active failover cluster
membership
• The Cluster service failed to bring clustered service or
application completely online or offline
• Cluster network name resource failed registration of one
or more associated DNS names
• Cluster network name resource cannot be brought online
Windows PowerShell troubleshooting cmdlets
Common cmdlets for troubleshooting failover clustering include:
• Get-Cluster
• Get-ClusterAccess
• Get-ClusterDiagnostics
• Get-ClusterGroup
• Get-ClusterLog
• Get-ClusterNetwork
• Get-ClusterResourceDependencyReport
• Get-ClusterVMMonitoredItem
• Test-Cluster
• Test-ClusterResourceFailure
Lesson 5: Implementing site high availability with stretch clustering
• What is a stretch cluster?
• Prerequisites for implementing a stretch cluster
• Synchronous and asynchronous replication
• Overview of the Storage Replica feature
• Demonstration: Implementing server-to-server storage replica
• Selecting a quorum mode for a stretch cluster
• Configuring a stretch cluster
• Challenges for deploying a stretch cluster
• Multisite failover and failback considerations
What is a stretch cluster?
A stretch cluster is a cluster that has been extended so that different nodes in the same cluster reside in separate physical locations
Provides highly available services in more than one location
A stretch cluster is a configuration that has one cluster with nodes in two locations and storage in both locations
[Figure: Site A and Site B, each with its own SAN. The sites cannot share a disk.]
Prerequisites for implementing a stretch cluster
To implement a stretch-failover cluster:

• Plan for additional hardware to support enough nodes on
each site
• Ensure that the same operating systems and service
packs are installed on each node
• Include at least one low-latency and reliable network
connection between sites
• Configure a storage replication mechanism
• Failover clustering does not provide a storage-replication mechanism
• Configure storage infrastructure services on each site
• Ensure that AD DS and DNS are available on a second site
Synchronous and asynchronous replication
• In synchronous replication, the host receives a write complete
response from the primary storage after the data is written
successfully to both storage locations
• In asynchronous replication, the host receives a write complete
response from the primary storage after the data is written
successfully on the primary storage
[Figure: a write request reaches the primary storage on Site A, the data is replicated to the secondary storage on Site B, and the primary storage returns the write-complete response]
The Storage Replica feature uses synchronous or asynchronous block-level replication that is independent of whatever vendor storage is at each location
Overview of the Storage Replica feature
• Use for disaster recovery or preparedness

• Configure via Failover Cluster Manager or
Windows PowerShell
• The three replication scenarios are:
• Stretch cluster
• Server-to-server
• Cluster-to-cluster
• Replicates synchronously or asynchronously

• Requires Windows Server 2016 or Windows Server 2019 Datacenter edition for unlimited replicated volume size; Windows Server 2019 Standard edition can replicate a volume of up to 2 TB
• Requires GPT-initialized disks
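A server-to-server partnership is typically validated first and then created. In this sketch every server name, drive letter, replication group name, and path is a hypothetical placeholder. The Storage Replica feature is assumed to be installed on both servers, each with a data volume and a separate log volume.

```powershell
# Hypothetical source and destination details, shared by both commands.
$topology = @{
    SourceComputerName       = 'LON-SVR1'
    SourceVolumeName         = 'M:'
    SourceLogVolumeName      = 'N:'
    DestinationComputerName  = 'LON-SVR4'
    DestinationVolumeName    = 'M:'
    DestinationLogVolumeName = 'N:'
}

# Validate the proposed topology and write a report to an existing folder.
Test-SRTopology @topology -DurationInMinutes 10 -ResultPath 'C:\Temp'

# Create the partnership; use Asynchronous instead for long-distance links.
New-SRPartnership @topology -SourceRGName 'RG01' -DestinationRGName 'RG02' `
    -ReplicationMode Synchronous

# Review the partnership and the state of the replicated volumes.
Get-SRPartnership
(Get-SRGroup).Replicas
```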
Storage Replica
• Synchronous replication
• Asynchronous replication
Storage Replica
Hyper-V stretch cluster supports synchronous replication only
Storage Replica
Server-to-server supports both synchronous and asynchronous replication
Storage Replica
Cluster-to-cluster supports synchronous replication only
Demonstration: Implementing server-to-server storage replica
In this demonstration, you will learn how to configure Storage Replica
Selecting a quorum mode for a stretch cluster
• File-share witness:
• Requires three or more datacenter locations
• Is available in Windows Server 2012 R2 and
Windows Server 2016
• Azure Cloud Witness:
• Requires two datacenter locations
• Requires Internet connection for all nodes
• Is available in Windows Server 2016 only
• No witness:
• Is not recommended
• Manual failover (disaster-recovery site)
Configuring a stretch cluster
Site-aware failover-cluster services provide:

• Failover affinity
• Fail over to a node on the same site
• Cross-site heartbeating
• Determine the heartbeat setting for the same site nodes
• Preferred site configuration
• (Get-Cluster).PreferredSite = 1
• This allows you to identify the nodes on which the roles should attempt to come online first
• Setting a node's site for site-aware clustering:
(Get-ClusterNode Node1).Site = 1
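Put together, a two-site configuration might look like the sketch below. It follows the property syntax shown on the slide, which is an assumption about your build; some Windows Server 2016 releases configure sites through fault domains instead. The node names and site numbers are hypothetical.

```powershell
# Assign two nodes to each site (syntax as shown on the slide).
(Get-ClusterNode -Name 'Node1').Site = 1
(Get-ClusterNode -Name 'Node2').Site = 1
(Get-ClusterNode -Name 'Node3').Site = 2
(Get-ClusterNode -Name 'Node4').Site = 2

# Prefer site 1 when roles are brought online.
(Get-Cluster).PreferredSite = 1

# Review the heartbeat settings that apply between sites.
Get-Cluster | Format-List CrossSiteDelay, CrossSiteThreshold
```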
Challenges for deploying a stretch cluster
When deploying stretch clusters:

• Ensure that the business requirements are met
• Use storage replication between sites:
• Hardware vendor (Windows Server 2012 R2 or earlier)
• Storage Replica (Windows Server 2016)
• Choose the correct quorum witness to properly
maintain functionality in the event of failures
• Choose the storage-replication solution that meets your needs, such as Storage Replica
Multisite failover and failback considerations
When implementing stretch clusters in disaster-recovery scenarios, consider:
• Failover time
• You must decide how long to wait before you declare a disaster
• Services for failover
• You should clearly define which critical services should fail over to another site
• Quorum maintenance
• Design the quorum model so that each site has enough votes to maintain cluster functionality
• Storage connection
• Storage must be available at each site
• Published services and name resolution
• Define a procedure for changing internal and external DNS records
• Client connectivity to another site
• Failback procedure
Lab B: Managing a failover cluster
• Exercise 1: Evicting a node and verifying quorum settings
• Exercise 2: Changing the quorum from disk witness to file-share witness and defining node voting
• Exercise 3: Verifying high availability
Logon Information
Virtual machines: 20740C-LON-DC1
20740C-LON-SVR1
20740C-LON-SVR2
20740C-LON-SVR3
20740C-LON-SVR4
20740C-LON-CL1
User name: Adatum\Administrator
Password: Pa55w.rd
Estimated Time: 45 min
Lab Scenario
Adatum Corporation recently implemented failover clustering for better uptime and availability. The implementation is new and your boss has asked you to go through some failover-cluster management tasks so that you are prepared to manage it moving forward.
Lab Review
• Why would you evict a cluster node from a failover cluster?
• Do you perform failure-scenario testing for your highly available applications based on Windows Server failover clustering?
Module Review and Takeaways
• Review Questions
• Real-world Issues and Scenarios
• Tools
• Best Practices
• Common Issues and Troubleshooting Tips
