
Module 8

Implementing failover clustering

Module Overview

• Planning a failover cluster
• Creating and configuring a new failover cluster
• Maintaining a failover cluster
• Troubleshooting a failover cluster
• Implementing site high availability with stretch clustering

Lesson 1: Planning a failover cluster

• Preparing to implement failover clustering
• Failover-cluster storage
• Hardware requirements for a failover-cluster implementation
• Network requirements for a failover-cluster implementation
• Demonstration: Verify a network adapter's RSS and RDMA compatibility on an SMB Server
• Infrastructure and software requirements for a failover cluster
• Security considerations
• Quorum in Windows Server 2016
• Planning for migrating and upgrading failover clusters

Preparing to implement failover clustering

Features of failover clustering include:

• High availability
• Stateful application
• IP-based protocols


Preparing to implement failover clustering

Consider the following guidelines when planning node capacity in a failover cluster:

• Distribute the highly available applications from a failed node across the remaining nodes
• Ensure that each node has sufficient capacity to take on that additional load
• Use hardware with similar capacity for all nodes in a cluster
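
The capacity guidelines above amount to a simple check: if any one node fails, can the surviving nodes absorb its load? The sketch below is only an illustration of that arithmetic. The function name, the load units, and the assumption that a failed node's workloads can be split freely across the survivors are all hypothetical simplifications, not part of the course material.

```python
def survives_node_failure(loads, capacity):
    """Return True if the load of any single failed node fits on the survivors.

    loads:    dict mapping node name -> current load (any unit, e.g. GB of RAM)
    capacity: capacity of each node; nodes are assumed to be similar,
              as the guidelines recommend
    """
    for failed, failed_load in loads.items():
        survivors = {name: load for name, load in loads.items() if name != failed}
        if not survivors:
            return False  # a one-node "cluster" has nowhere to fail over to
        # Room left on the survivors after they keep running their own load.
        spare = sum(capacity - load for load in survivors.values())
        if failed_load > spare:
            return False
    return True


# Three nodes at 60% load can absorb one failure; two nodes at 60% cannot.
print(survives_node_failure({"n1": 60, "n2": 60, "n3": 60}, 100))
print(survives_node_failure({"n1": 60, "n2": 60}, 100))
```

A real plan would also account for workloads that cannot be split, such as a single large virtual machine, so treat this as a lower bound on the planning effort.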


Failover-cluster storage

Failover clusters require shared storage to provide consistent data to a virtual server after failover

Shared storage options include:

• SAS
• iSCSI
• Fibre Channel
• Shared .vhdx
• Scale-Out File Server

You can also implement clustered storage spaces to achieve high availability at the storage level


Hardware requirements for a failover-cluster implementation

The hardware requirements for a failover-cluster implementation include:

• You must use server hardware that is certified for Windows Server
• Server nodes should all have the same configuration and contain the same or similar components
• All servers must pass the tests in the Validate a Configuration Wizard

Network requirements for a failover-cluster implementation

The network requirements for a failover-cluster implementation include:

• Each server should connect to multiple networks to ensure communication redundancy, or connect to a single network with redundant hardware, to remove single points of failure
• Network adapters should be identical and have the same IP protocol versions, speed, duplex, and flow-control capabilities
• Network adapters should be compatible with RSS and RDMA
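
The "identical adapters" requirement can be checked mechanically once the adapter settings have been collected from each node. The following is a minimal sketch under stated assumptions: the dictionary keys (`ip_versions`, `speed_mbps`, `duplex`, `flow_control`) are hypothetical field names chosen for illustration, and how the values are gathered from the servers is outside its scope.

```python
def mismatched_settings(adapters,
                        keys=("ip_versions", "speed_mbps", "duplex", "flow_control")):
    """Return the names of settings whose values differ between adapters.

    adapters: list of dicts, one per network adapter, using the
              hypothetical keys listed in `keys`.
    """
    mismatches = []
    for key in keys:
        # repr() lets values of any type be compared as a set.
        distinct = {repr(adapter[key]) for adapter in adapters}
        if len(distinct) > 1:
            mismatches.append(key)
    return mismatches


node1_nic = {"ip_versions": ("IPv4", "IPv6"), "speed_mbps": 10000,
             "duplex": "full", "flow_control": True}
node2_nic = {"ip_versions": ("IPv4", "IPv6"), "speed_mbps": 1000,
             "duplex": "full", "flow_control": True}

print(mismatched_settings([node1_nic, node2_nic]))
```

An empty result means the compared settings match; any names returned point to the setting to correct before validation.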

Demonstration: Verify a network adapter's RSS and RDMA compatibility on an SMB Server

In this demonstration, you will learn how to verify a network adapter's RSS and RDMA compatibility on an SMB Server

Infrastructure and software requirements for a failover cluster

The infrastructure requirements for a failover-cluster implementation include:

• Active Directory domain controllers should run Windows Server 2008 or newer
• The domain functional level and forest functional level should be Windows Server 2008 or newer
• The application must support Windows Server 2016 high availability

The software best practices for a failover-cluster implementation require that:

• All nodes run the same edition of Windows Server 2016, with the same service pack and updates

Security considerations

To secure a failover cluster, you must:

• Provide a method for authentication and authorization
• Ensure that unauthorized users do not have physical access to failover cluster nodes
• Ensure that you use antimalware software
• Ensure that your intra-cluster communication authenticates with Kerberos version 5

If you use an Active Directory-detached cluster:

• AD DS objects for network names are not created
• The cluster network name is registered in DNS, so you do not need to create new objects in AD DS
• We do not recommend this for any scenario that requires Kerberos authentication
• You must run Windows Server 2012 R2 or newer on all cluster nodes


Security considerations

Windows Server 2016 introduces several cluster types, and which one you use depends on your domain-membership scenario:

• Single-domain clusters
• Workgroup clusters
• Multi-domain clusters
• Workgroup and domain clusters


Quorum in Windows Server 2016

Quorum mode | What has the vote? | When is quorum maintained?
Node majority | Only nodes in the cluster have a vote | When more than half of the nodes are online
Node and disk majority | The nodes in the cluster and a disk witness have a vote | When more than half of the votes are online
Node and file share majority | The nodes in the cluster and a file share witness have a vote | When more than half of the votes are online
No majority: disk only | Only the quorum-shared disk has a vote | When the shared disk is online
Dynamic quorum | Votes are dynamically assigned to always be odd | When more than half of the votes are online
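
The majority rule behind these quorum modes is plain vote arithmetic, which the sketch below works through. It is a simplified model for illustration only: the function names are made up here, and real clusters also weigh factors such as dynamic vote adjustment that this code ignores.

```python
def has_quorum(online_votes, total_votes):
    """Majority rule: strictly more than half of all votes must be online."""
    return online_votes * 2 > total_votes


def node_majority(nodes_online, nodes_total):
    # Only nodes vote.
    return has_quorum(nodes_online, nodes_total)


def node_and_witness_majority(nodes_online, nodes_total, witness_online):
    # A disk witness or file share witness contributes one extra vote.
    witness_vote = 1 if witness_online else 0
    return has_quorum(nodes_online + witness_vote, nodes_total + 1)


def disk_only(shared_disk_online):
    # Only the quorum-shared disk votes, so it alone decides.
    return shared_disk_online


print(node_majority(2, 3))                     # 2 of 3 votes
print(node_majority(2, 4))                     # 2 of 4 votes: a tie is not a majority
print(node_and_witness_majority(2, 4, True))   # 3 of 5 votes
```

The second and third calls show why a witness matters for an even number of nodes: it turns a possible tie into a majority.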


Quorum in Windows Server 2016

Dynamic quorum:

• Disk witness
• File share witness
• Azure Cloud Witness

We recommend that you use dynamic quorum, which is the default configuration

You should use all other forms of quorum in specific use cases only


Planning for migrating and upgrading failover clusters

The upgrade steps for each node in the cluster include:

• Pause the cluster node and drain all cluster resources
• Migrate cluster resources to another node in the cluster
• Replace the cluster node operating system with Windows Server 2016 and add the node back to the cluster

After you have upgraded all nodes to Windows Server 2016, run the Update-ClusterFunctionalLevel cmdlet
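
The ordering in these steps matters: the cluster runs in a mixed state while nodes are upgraded one at a time, and the functional level is raised only once every node is on Windows Server 2016. The sketch below models just that ordering. The version labels, step names, and functions are hypothetical illustrations; it does not call any cluster API.

```python
TARGET = "2016"


def can_update_functional_level(versions):
    """The functional level may be raised only when every node runs the target OS."""
    return all(version == TARGET for version in versions.values())


def rolling_upgrade(versions):
    """Upgrade nodes one at a time and record each step.

    versions: dict mapping node name -> OS label ("older" or "2016").
    Returns a list of (step, detail) tuples; the dict is updated in place.
    """
    steps = []
    for node in list(versions):
        steps.append(("pause-and-drain", node))
        steps.append(("replace-os-and-rejoin", node))
        versions[node] = TARGET
        # After each node, check whether the final step is allowed yet.
        steps.append(("functional-level-ready", can_update_functional_level(versions)))
    return steps


cluster = {"LON-SVR1": "older", "LON-SVR2": "older", "LON-SVR3": "older"}
for step in rolling_upgrade(cluster):
    print(step)
```

In this model the "functional-level-ready" flag stays false until the last node has been upgraded, which mirrors the point at which Update-ClusterFunctionalLevel would be run.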

Lesson 2: Creating and configuring a new failover cluster

• The Validation Wizard and the cluster support-policy requirements
• The process for creating a failover cluster
• Demonstration: Creating a failover cluster
• Demonstration: Reviewing the Validation Wizard
• Configuring roles
• Demonstration: Creating a general file-server failover cluster
• Managing failover clusters
• Configuring cluster properties
• Configuring failover and failback
• Configuring storage
• Configuring networking
• Configuring quorum options
• Demonstration: Configuring the quorum

The Validation Wizard and the cluster support-policy requirements

The Validation Wizard performs multiple types of tests, such as:

• Cluster
• Inventory
• Network
• Storage
• System

You can perform validation from the Validate a Configuration Wizard or with the Test-Cluster Windows PowerShell cmdlet

The process for creating a failover cluster

1. Install the Failover Clustering feature
2. Verify the configuration, and create a cluster
3. Install the role on all cluster nodes by using Server Manager
4. Create a clustered application by using the Failover Clustering Management snap-in
5. Configure the application
6. Test failover

Demonstration: Creating a failover cluster

In this demonstration, you will learn how to install the Failover Clustering feature

Demonstration: Reviewing the Validation Wizard

In this demonstration, you will learn how to validate and configure a failover cluster

Configuring roles

Configuring a cluster role includes:

• Choosing a clustering role
• Installing the role
• Verifying the status (Running) on all cluster nodes

You can configure a cluster role by using:

• The Failover Cluster Manager console
• The role-specific Windows PowerShell cmdlets, such as Add-ClusterFileServerRole (the New-Cluster cmdlet creates the cluster itself, not a role)

Demonstration: Creating a general file-server failover cluster

In this demonstration, you will learn how to cluster a file server role

Managing failover clusters

The most common management tasks include:

• Managing nodes
• Managing networks
• Managing permissions
• Configuring cluster-quorum settings
• Migrating services and applications to a cluster
• Configuring new services and applications
• Removing the cluster

Configuring cluster properties

The three aspects of managing cluster nodes are:

• Adding nodes after you create a cluster
• Pausing nodes, which prevents resources from running on that node
• Evicting nodes from a cluster, which removes the node from the cluster configuration

Configuration tasks are available in:

• The Actions pane of the Failover Cluster Manager console
• Windows PowerShell

Configuring failover and failback

During failover, the clustered instance and all associated resources move from one node to another

Failover occurs when:

The node that hosts the instance becomes inactive for some reason

One of the resources within the instance fails

An administrator performs a failover

The Cluster service can fail back after the offline node becomes active again

Failover can be planned or unplanned
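
Failback is often restricted to a quiet period so that a recovered node does not pull workloads back during business hours. The sketch below shows one way to express such a window as a check on the hour of day. It is a hypothetical model: the function name, the 0 to 23 hour convention, and the treatment of the window as start-inclusive and end-exclusive are assumptions made for illustration, not a description of how the Cluster service evaluates its settings.

```python
def failback_allowed(hour, window_start, window_end):
    """Return True if `hour` (0-23) falls inside the failback window.

    The window is start-inclusive and end-exclusive, and may wrap
    past midnight (for example, 22 to 3).
    """
    if window_start <= window_end:
        return window_start <= hour < window_end
    # Window wraps midnight: allowed late in the evening or early in the morning.
    return hour >= window_start or hour < window_end


print(failback_allowed(2, 1, 4))     # inside a 01:00-04:00 window
print(failback_allowed(12, 22, 3))   # midday, outside a 22:00-03:00 window
print(failback_allowed(23, 22, 3))   # late evening, inside the same window
```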

Configuring storage

Storage configuration tasks in Failover Clustering include:

• Adding storage spaces
• Adding a disk to available storage and to the CSV
• Taking a disk offline
• Bringing the disk back online

Configuring networking

Network | Description
Public network | Clients use this network to connect to the clustered service
Private network | Nodes use this network to communicate with each other
Public-and-private network | Required to communicate with external storage systems

• One network can support both client and node communications
• Multiple network adapters are recommended for enhanced performance and redundancy
• iSCSI storage should have a dedicated network

Configuring quorum options

Quorum configuration options (available in the Configure Cluster Quorum Wizard and Windows PowerShell) include:

• Use typical settings
• Add or change the quorum witness
• Advanced quorum configuration and witness selection


Dynamic quorum and quorum-configuration considerations

Dynamic quorum management:

• The failover cluster dynamically manages the vote assignment to nodes
• Allows for a cluster to run on the last surviving cluster node
• Cannot survive a simultaneous failure of a majority of voting nodes
• If you explicitly remove a vote from a node, the cluster cannot dynamically add or remove that vote

Quorum configuration considerations include:

• Validating the quorum configuration by using the Validate a Configuration Wizard or the Test-Cluster Windows PowerShell cmdlet
• Changing the quorum configuration only in specific scenarios:
  • Adding or evicting nodes
  • A node or witness has failed and cannot be recovered quickly
  • Recovering a cluster in a multisite disaster recovery scenario
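
The difference between sequential and simultaneous failures under dynamic quorum is easier to see with a small simulation. The sketch below is a deliberately simplified model: it has no witness, it keeps the vote count odd by removing the vote of the last node in name order (an arbitrary choice made here for illustration), and it recalculates votes only after each batch of failures.

```python
def assign_votes(online_nodes):
    """Give every online node a vote, dropping one vote if the count is even."""
    nodes = sorted(online_nodes)
    if nodes and len(nodes) % 2 == 0:
        nodes = nodes[:-1]  # assumption: the last node by name gives up its vote
    return set(nodes)


def cluster_survives(nodes, failure_batches):
    """Apply batches of simultaneous failures; return False if quorum is lost."""
    online = set(nodes)
    voters = assign_votes(online)
    for batch in failure_batches:
        online -= set(batch)
        surviving_votes = len(voters & online)
        if surviving_votes * 2 <= len(voters):
            return False  # no majority of the current votes is left
        voters = assign_votes(online)  # dynamic quorum recalculates the votes
    return True


nodes = ["a", "b", "c", "d", "e"]

# One failure at a time: votes are recalculated after each loss.
print(cluster_survives(nodes, [["e"], ["d"], ["c"], ["b"]]))

# Three of five voting nodes fail at once: the majority is gone.
print(cluster_survives(nodes, [["c", "d", "e"]]))
```

In this model the sequential case ends with a single running node, while the simultaneous case loses quorum immediately. Note that with two nodes left, survival depends on which node holds the remaining vote.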


Demonstration: Configuring the quorum

In this demonstration, you will learn how to configure a quorum

Lab A: Implementing failover clustering

Exercise 1: Creating a failover cluster

Exercise 2: Verifying quorum settings and adding a node

Logon Information

Virtual machines: 20740A-LON-DC1, 20740A-LON-SVR1, 20740A-LON-SVR2, 20740A-LON-SVR3, 20740A-LON-SVR5, 20740A-LON-CL1

User name: Adatum\Administrator

Password: Pa$$w0rd

Estimated Time: 45 minutes

Lab Scenario

A. Datum Corporation wants better uptime and availability for its critical services, such as file services. You decide to implement a failover cluster with file services to meet this goal.

Lab Review

What information do you need for planning a failover-cluster implementation?

After running the Validate a Configuration Wizard, how can you resolve a single point of failure in network communication?

In which situations might it be important to enable failback of a clustered application only during a specific time?

Lesson 3: Maintaining a failover cluster

• Monitoring failover clusters
• Backing up and restoring failover-cluster configuration
• Maintaining failover clusters
• Managing cluster-network heartbeat traffic
• What is cluster-aware updating?
• Demonstration: Configuring CAU

Monitoring failover clusters

Tools you can use to monitor clusters include:

• Event Viewer
• Tracerpt.exe
• MHTML-formatted cluster configuration reports
• Performance and Reliability Monitor snap-in

Backing up and restoring failover-cluster configuration

When backing up failover clusters, remember that:

• Windows Server Backup is a Windows Server 2016 feature
• Non-Microsoft tools are available to perform backups and restores
• You must perform system-state backups

A nonauthoritative restore completely restores a single node in the cluster

An authoritative restore restores the entire cluster configuration to a point in time

Maintaining failover clusters

Failover cluster troubleshooting techniques include:

• Using the Validate a Configuration Wizard
• Reviewing events in logs (cluster, hardware, storage)
• Defining a process for troubleshooting failover clusters
• Reviewing storage configuration
• Checking for group and resource failures

Managing cluster-network heartbeat traffic

Types of network monitoring:

• Aggressive
• Relaxed

Network-monitoring parameter settings:

• Delay
• Threshold

Windows PowerShell cmdlet examples:

Get-Cluster | fl *subnet*

(Get-Cluster).SameSubnetThreshold=10
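
The two parameters combine in a simple way: the delay is the interval between heartbeats, and the threshold is how many consecutive heartbeats can be missed before the cluster acts. Their product is roughly how long a node can stay silent. The sketch below works through that arithmetic; the function name and the sample values are illustrative assumptions, not recommended settings.

```python
def heartbeat_tolerance_ms(delay_ms, threshold):
    """Approximate silence tolerated before a node is treated as down.

    delay_ms:  interval between heartbeats, in milliseconds
    threshold: number of consecutive missed heartbeats tolerated
    """
    return delay_ms * threshold


# Illustrative values only: a lower threshold reacts faster (aggressive),
# a higher threshold tolerates brief network interruptions (relaxed).
aggressive = heartbeat_tolerance_ms(delay_ms=1000, threshold=5)
relaxed = heartbeat_tolerance_ms(delay_ms=1000, threshold=10)

print(aggressive, "ms before action with the aggressive example")
print(relaxed, "ms before action with the relaxed example")
```

Aggressive monitoring detects real failures sooner but is more likely to trigger a failover on a transient network problem; relaxed monitoring trades detection speed for stability.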

What is cluster-aware updating?

Automated feature in Windows Server 2016

Updates nodes in a cluster with minimal or no downtime

Benefits:

• Updating is automatic
• Can be scheduled
• No downtime


How CAU works

CAU works in two modes:

Remote updating mode:

• Configure a separate computer as an orchestrator
• Install the failover-clustering administrative tools on the orchestrator
• Ensure that the orchestrator computer is not a cluster member

Self-updating mode:

• Configure the CAU clustered role as a workload
• No dedicated orchestrator computer is required
• The cluster updates itself


Demonstration: Configuring CAU

In this demonstration, you will learn how to configure CAU

Lesson 4: Troubleshooting a failover cluster

• Communication issues
• Repairing the cluster name object in AD DS
• Starting a cluster with no quorum
• Demonstration: Reviewing the Cluster.Log file
• Monitoring performance with failover clustering
• Using Event Viewer with failover clustering
• Windows PowerShell troubleshooting cmdlets

Communication issues

The following might cause communications issues in failover clustering:

• Network latency
• Network failures
• Network-adapter driver issues
• Firewall rules
• Security software

You can use the Get-ClusterLog cmdlet to generate the Cluster.log file for troubleshooting. By default, the file is written to C:\Windows\Cluster\Reports

Repairing the cluster name object in AD DS

The CNO repair process:

• Use the Repair Active Directory Object option in Failover Cluster Manager
• You must have Reset Password permissions on the CNO computer object

The VCO repair process:

• Use the AD Recycle Bin feature to recover deleted computer objects, and use the Repair function as the last recovery action
• The CNO will reset the password and self-heal automatically
• The CNO must have Create Computer Objects permissions on the VCO's OU

Starting a cluster with no quorum

• Cluster nodes must retain quorum for the cluster to work
• If quorum is lost, try to reestablish the quorum
• If you cannot reestablish quorum for an extended period, start the cluster in ForceQuorum mode
• After you start the cluster in ForceQuorum mode, other nodes can rejoin the cluster
• Once quorum is reestablished, the cluster mode changes from ForceQuorum to normal automatically
• When joining nodes to a cluster that is running in ForceQuorum mode, start the other nodes with the setting that prevents quorum (PreventQuorum)

Demonstration: Reviewing the Cluster.Log file

In this demonstration, you will learn how to review the Cluster.log file

Monitoring performance with failover clustering

Some of the failover clustering performance counters include:

• Cluster Network Messages
• Cluster Network Reconnections
• Global Update Manager
• Database
• Resource Control
• API
• Cluster Shared Volumes

Using Event Viewer with failover clustering

Events that are displayed in Event Viewer and require you to troubleshoot clusters include:

Cluster resource in clustered service or application failed

Cluster network interface for cluster node on network failed

File share witness resource failed to arbitrate for the file share

Cluster node was removed from the active failover cluster membership

The Cluster service failed to bring clustered service or application completely online or offline

Cluster network name resource failed registration of one or more associated DNS name(s)

Cluster network name resource cannot be brought online

Windows PowerShell troubleshooting cmdlets

Common cmdlets for troubleshooting failover clustering include:

• Get-Cluster
• Get-ClusterAccess
• Get-ClusterDiagnostics
• Get-ClusterGroup
• Get-ClusterLog
• Get-ClusterNetwork
• Get-ClusterResourceDependencyReport
• Get-ClusterVMMonitoredItem
• Test-Cluster
• Test-ClusterResourceFailure

Lesson 5: Implementing site high availability with stretch clustering

• What is a stretch cluster?
• Prerequisites for implementing a stretch cluster
• Synchronous and asynchronous replication
• Overview of the Storage Replica feature
• Demonstration: Implementing server-to-server storage replica
• Selecting a quorum mode for a stretch cluster
• Configuring a stretch cluster
• Challenges for deploying a stretch cluster
• Multisite failover and failback considerations

What is a stretch cluster?

A stretch cluster is a cluster that has been extended so that different nodes in the same cluster reside in separate physical locations

[Figure: Site A and Site B, each with its own SAN]

Prerequisites for implementing a stretch cluster

To implement a stretch failover cluster, you must ensure the following:

• Plan for additional hardware to support enough nodes on each site
• Ensure that the same operating systems and service packs are installed on each node
• Include at least one low-latency and reliable network connection between sites
• Configure a storage replication mechanism
• Configure storage infrastructure services on each site

Synchronous and asynchronous replication

• In synchronous replication, the host receives a write complete response from the primary storage after the data is written successfully to both storage locations
• In asynchronous replication, the host receives a write complete response from the primary storage after the data is written successfully on the primary storage

[Figure: a write request reaches the primary storage at Site A, the data replicates to the secondary storage at Site B, and the primary storage returns the write complete response]
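
The only difference between the two replication modes is the moment at which the host gets its write complete response. The sketch below models that difference with in-memory lists standing in for the two storage locations. The class and method names are hypothetical, and the model ignores networks, logs, and failures; it is meant only to show where unreplicated data can exist.

```python
class ReplicatedVolume:
    """Toy model of a primary volume replicating to a secondary volume."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = []
        self.secondary = []
        self.pending = []  # acknowledged writes not yet on the secondary

    def write(self, data):
        self.primary.append(data)
        if self.synchronous:
            # Synchronous: the secondary has the data before the host is told.
            self.secondary.append(data)
        else:
            # Asynchronous: acknowledge now, replicate later.
            self.pending.append(data)
        return "write complete"

    def replicate_pending(self):
        """Ship queued writes to the secondary (asynchronous mode only)."""
        self.secondary.extend(self.pending)
        self.pending.clear()


sync_volume = ReplicatedVolume(synchronous=True)
sync_volume.write("block-1")
print("sync secondary:", sync_volume.secondary)

async_volume = ReplicatedVolume(synchronous=False)
async_volume.write("block-1")
print("async secondary before replication:", async_volume.secondary)
async_volume.replicate_pending()
print("async secondary after replication:", async_volume.secondary)
```

Anything still in `pending` when the primary site fails would be lost, which is the trade-off for the lower write latency of asynchronous replication.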

Overview of the Storage Replica feature

• Use for disaster recovery or preparedness
• Configure via Failover Cluster Manager or Windows PowerShell
• The three replication scenarios are:
  • Stretch cluster
  • Server-to-server
  • Cluster-to-cluster
• Replicates synchronously or asynchronously
• Requires Windows Server 2016 Datacenter Edition
• Requires GPT-initialized disks


Storage Replica

• Synchronous replication
• Asynchronous replication
• Hyper-V stretch cluster supports synchronous replication only
• Server-to-server supports both synchronous and asynchronous replication
• Cluster-to-cluster supports synchronous replication only

Demonstration: Implementing server-to-server storage replica

In this demonstration, you will learn how to configure Storage Replica

Selecting a quorum mode for a stretch cluster

File-share witness:

• Requires three or more datacenter locations
• Is available in Windows Server 2012 R2 and Windows Server 2016

Azure Cloud Witness:

• Requires two datacenter locations
• Requires Internet connection for all nodes
• Is available in Windows Server 2016 only

No witness:

• Is not recommended
• Manual failover (disaster-recovery site)
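
Why a witness outside both sites matters can be shown with the same majority arithmetic used for any quorum. The sketch below checks whether a two-site cluster keeps quorum after losing one whole site. It is a static model under stated assumptions: every node has one vote, votes are not adjusted dynamically, and the witness (a file share at a third location or a cloud witness) stays reachable from the surviving site. The function name and parameters are illustrative.

```python
def survives_site_loss(site_a_nodes, site_b_nodes, has_witness, lost_site):
    """Return True if the surviving site still holds a majority of votes.

    lost_site: "A" or "B", the site that fails completely.
    """
    witness_vote = 1 if has_witness else 0
    total_votes = site_a_nodes + site_b_nodes + witness_vote
    surviving_nodes = site_b_nodes if lost_site == "A" else site_a_nodes
    online_votes = surviving_nodes + witness_vote
    return online_votes * 2 > total_votes


# Two nodes per site with a witness: 3 of 5 votes remain.
print(survives_site_loss(2, 2, has_witness=True, lost_site="A"))

# Two nodes per site without a witness: 2 of 4 votes is only a tie.
print(survives_site_loss(2, 2, has_witness=False, lost_site="A"))
```

Without a witness, the surviving site in this model cannot form a majority on its own, which is consistent with the manual failover noted for the no-witness option.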

Configuring a stretch cluster

Site-aware failover-cluster services provide:

• Failover affinity
• Cross-site heartbeating
• Preferred site configuration

Challenges for deploying a stretch cluster

When deploying stretch clusters:

• Ensure that the business requirements are met
• Use storage replication between sites:
  • Hardware vendor (Windows Server 2012 R2 or earlier)
  • Storage Replica (Windows Server 2016)
• Choose the correct quorum witness to properly maintain functionality in the event of failures
• Choose the correct storage-replication solution to meet the needs for Storage Replica

Multisite failover and failback considerations

When implementing stretch clusters in disaster recovery scenarios, consider the following:

• Failover time
• Services for failover
• Quorum maintenance
• Storage connection
• Published services and name resolution
• Client connectivity
• Failback procedure

Lab B: Managing a failover cluster

Exercise 1: Evicting a node and verifying quorum settings

Exercise 2: Changing the quorum from disk witness to file-share witness, and defining node voting

Exercise 3: Verifying high availability

Logon Information

Virtual machines: 20740A-LON-DC1, 20740A-LON-SVR1, 20740A-LON-SVR2, 20740A-LON-SVR3, 20740A-LON-SVR5, 20740A-LON-CL1

User name: Adatum\Administrator

Password: Pa$$w0rd

Estimated Time: 45 minutes

Lab Scenario

A. Datum Corporation recently implemented failover clustering for better uptime and availability. The implementation is new, and your boss has asked you to go through some failover-cluster management tasks so that you are prepared to manage it moving forward.

Lab Review

Why would you evict a cluster node from a failover cluster?

Do you perform failure-scenario testing for your highly available applications based on Windows Server failover clustering?

Module Review and Takeaways

• Review Questions
• Real-world Issues and Scenarios
• Tools
• Best Practices
• Common Issues and Troubleshooting Tips