Overview of the High Availability Add-On for Red Hat Enterprise Linux
Edition 6
Legal Notice
Copyright 2010 Red Hat, Inc. and others.
This document is licensed by Red Hat under the Creative Commons Attribution-ShareAlike 3.0 Unported
License. If you distribute this document, or a modified version of it, you must provide attribution to Red
Hat, Inc. and provide a link to the original. If the document is modified, all Red Hat trademarks must be
removed.
Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section
4d of CC-BY-SA to the fullest extent permitted by applicable law.
Red Hat, Red Hat Enterprise Linux, the Shadowman logo, JBoss, MetaMatrix, Fedora, the Infinity Logo,
and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux is the registered trademark of Linus Torvalds in the United States and other countries.
Java is a registered trademark of Oracle and/or its affiliates.
XFS is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States
and/or other countries.
MySQL is a registered trademark of MySQL AB in the United States, the European Union and other
countries.
Node.js is an official trademark of Joyent. Red Hat Software Collections is not formally related to or
endorsed by the official Joyent Node.js open source or commercial project.
The OpenStack Word Mark and OpenStack Logo are either registered trademarks/service marks or
trademarks/service marks of the OpenStack Foundation, in the United States and other countries and
are used with the OpenStack Foundation's permission. We are not affiliated with, endorsed or
sponsored by the OpenStack Foundation, or the OpenStack community.
All other trademarks are the property of their respective owners.
Abstract
High Availability Add-On Overview provides an overview of the High Availability Add-On for Red Hat
Enterprise Linux 6.
Table of Contents
Introduction
    1. Document Conventions
        1.1. Typographic Conventions
        1.2. Pull-quote Conventions
        1.3. Notes and Warnings
    2. We Need Feedback!
Chapter 1. High Availability Add-On Overview
    1.1. Cluster Basics
    1.2. High Availability Add-On Introduction
    1.3. Cluster Infrastructure
Chapter 2. Cluster Management with CMAN
    2.1. Cluster Quorum
        2.1.1. Quorum Disks
        2.1.2. Tie-breakers
Chapter 3. RGManager
    3.1. Failover Domains
        3.1.1. Behavior Examples
    3.2. Service Policies
        3.2.1. Start Policy
        3.2.2. Recovery Policy
        3.2.3. Restart Policy Extensions
    3.3. Resource Trees - Basics / Definitions
        3.3.1. Parent / Child Relationships, Dependencies, and Start Ordering
    3.4. Service Operations and States
        3.4.1. Service Operations
            3.4.1.1. The freeze Operation
                3.4.1.1.1. Service Behaviors when Frozen
        3.4.2. Service States
    3.5. Virtual Machine Behaviors
        3.5.1. Normal Operations
        3.5.2. Migration
        3.5.3. RGManager Virtual Machine Features
            3.5.3.1. Virtual Machine Tracking
            3.5.3.2. Transient Domain Support
                3.5.3.2.1. Management Features
        3.5.4. Unhandled Behaviors
    3.6. Resource Actions
        3.6.1. Return Values
Chapter 4. Fencing
Chapter 5. Lock Management
    5.1. DLM Locking Model
    5.2. Lock States
Chapter 6. Configuration and Administration Tools
    6.1. Cluster Administration Tools
Chapter 7. Virtualization and High Availability
    7.1. VMs as Highly Available Resources/Services
Revision History
Introduction
This document provides a high-level overview of the High Availability Add-On for Red Hat Enterprise
Linux 6.
Although the information in this document is an overview, you should have advanced working knowledge
of Red Hat Enterprise Linux and an understanding of server computing concepts to gain a good
comprehension of the information.
For more information about using Red Hat Enterprise Linux, refer to the following resources:
Red Hat Enterprise Linux Installation Guide: Provides information regarding installation of Red Hat
Enterprise Linux 6.
Red Hat Enterprise Linux Deployment Guide: Provides information regarding the deployment,
configuration, and administration of Red Hat Enterprise Linux 6.
For more information about this and related products for Red Hat Enterprise Linux 6, refer to the
following resources:
Configuring and Managing the High Availability Add-On: Provides information about configuring and
managing the High Availability Add-On (also known as Red Hat Cluster) for Red Hat Enterprise Linux
6.
Logical Volume Manager Administration: Provides a description of the Logical Volume Manager
(LVM), including information on running LVM in a clustered environment.
Global File System 2: Configuration and Administration: Provides information about installing,
configuring, and maintaining Red Hat GFS2 (Red Hat Global File System 2), which is included in the
Resilient Storage Add-On.
DM Multipath: Provides information about using the Device-Mapper Multipath feature of Red Hat
Enterprise Linux 6.
Load Balancer Administration: Provides information on configuring high-performance systems and
services with the Red Hat Load Balancer Add-On (formerly known as Linux Virtual Server [LVS]).
Release Notes: Provides information about the current release of Red Hat products.
Note
For information on best practices for deploying and upgrading Red Hat Enterprise Linux clusters
using the High Availability Add-On and Red Hat Global File System 2 (GFS2), refer to the article
"Red Hat Enterprise Linux Cluster, High Availability, and GFS Deployment Best Practices" on the
Red Hat Customer Portal at https://access.redhat.com/kb/docs/DOC-40821.
This document and other Red Hat documents are available in HTML, PDF, and RPM versions on the Red
Hat Enterprise Linux Documentation CD and online at http://docs.redhat.com/.
1. Document Conventions
This manual uses several conventions to highlight certain words and phrases and draw attention to
specific pieces of information.
In PDF and paper editions, this manual uses typefaces drawn from the Liberation Fonts set. The
Liberation Fonts set is also used in HTML editions if the set is installed on your system. If not, alternative
but equivalent typefaces are displayed. Note: Red Hat Enterprise Linux 5 and later include the Liberation
Fonts set by default.
Whether mono-spaced bold or proportional bold, the addition of italics indicates replaceable or variable
text. Italics denotes text you do not input literally or displayed text that changes depending on
circumstance. For example:
To connect to a remote machine using ssh, type ssh username@domain.name at a shell
prompt. If the remote machine is example.com and your username on that machine is
john, type ssh john@example.com.
The mount -o remount file-system command remounts the named file system. For
example, to remount the /home file system, the command is mount -o remount /home.
To see the version of a currently installed package, use the rpm -q package command. It
will return a result as follows: package-version-release.
Note the words in bold italics above: username, domain.name, file-system, package, version, and
release. Each word is a placeholder, either for text you enter when issuing a command or for text
displayed by the system.
Aside from standard usage for presenting the title of a work, italics denotes the first use of a new and
important term. For example:
Publican is a DocBook publishing system.
Output sent to a terminal is set in mono-spaced roman and presented thus:
Desktop   Desktop1   documentation   downloads
drafts    images     mss             notes
photos    scripts    stuff           svgs
svn
Source-code listings are also set in mono-spaced roman but add syntax highlighting.
Note
Notes are tips, shortcuts or alternative approaches to the task at hand. Ignoring a note should
have no negative consequences, but you might miss out on a trick that makes your life easier.
Important
Important boxes detail things that are easily missed: configuration changes that only apply to the
current session, or services that need restarting before an update will apply. Ignoring a box
labeled 'Important' will not cause data loss but may cause irritation and frustration.
Warning
Warnings should not be ignored. Ignoring warnings will most likely cause data loss.
2. We Need Feedback!
If you find a typographical error in this manual, or if you have thought of a way to make this manual
better, we would love to hear from you! Please submit a report in Bugzilla: http://bugzilla.redhat.com/
against the product Red Hat Enterprise Linux 6, the component doc-High_Availability_Add-
Note
The cluster types summarized in the preceding text reflect basic configurations; your needs might
require a combination of the clusters described.
Additionally, the Red Hat Enterprise Linux High Availability Add-On contains support for
configuring and managing high availability servers only. It does not support high-performance
clusters.
Note
Only single site clusters are fully supported at this time. Clusters spread across multiple physical
locations are not formally supported. For more details and to discuss multi-site clusters, please
speak to your Red Hat sales or support representative.
You can supplement the High Availability Add-On with the following components:
Red Hat GFS2 (Global File System 2): Part of the Resilient Storage Add-On, this provides a cluster
file system for use with the High Availability Add-On. GFS2 allows multiple nodes to share storage at
a block level as if the storage were connected locally to each cluster node. The GFS2 cluster file
system requires a cluster infrastructure.
Cluster Logical Volume Manager (CLVM): Part of the Resilient Storage Add-On, this provides
volume management of cluster storage. CLVM support also requires the cluster infrastructure.
Load Balancer Add-On: Routing software that provides IP load balancing. The Load Balancer Add-On
runs in a pair of redundant virtual servers that distribute client requests evenly to real servers
that are behind the virtual servers.
The cluster infrastructure performs the following functions:
Cluster management
Lock management
Fencing
Cluster configuration management
Note
By default, each node has one quorum vote. Optionally, you can configure each node to have
more than one vote.
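As an illustrative sketch of assigning extra quorum votes per node in cluster.conf (the node names and vote counts here are hypothetical, not taken from this document):

```xml
<clusternodes>
  <!-- votes defaults to 1; this node carries two quorum votes -->
  <clusternode name="node1.example.com" nodeid="1" votes="2"/>
  <clusternode name="node2.example.com" nodeid="2" votes="1"/>
</clusternodes>
```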
2.1.2. Tie-breakers
Tie-breakers are additional heuristics that allow a cluster partition to decide whether or not it is quorate
in the event of an even split, prior to fencing. A typical tie-breaker construct is an IP tie-breaker,
sometimes called a ping node.
With such a tie-breaker, nodes not only monitor each other, but also an upstream router that is on the
same path as cluster communications. If the two nodes lose contact with each other, the one that wins is
the one that can still ping the upstream router. Of course, there are cases, such as a switch-loop, where
it is possible for two nodes to see the upstream router but not each other, causing what is called a
split brain. That is why, even when using tie-breakers, it is important to ensure that fencing is configured
correctly.
Another type of tie-breaker is a shared partition, often called a quorum disk, which provides
additional details. clumanager 1.2.x (Red Hat Cluster Suite 3) had a disk tie-breaker that allowed
operation if the network went down as long as both nodes were still communicating over the shared
partition.
More complex tie-breaker schemes exist, such as QDisk (part of linux-cluster). QDisk allows arbitrary
heuristics to be specified. These allow each node to determine its own fitness for participation in the
cluster. It is often used as a simple IP tie-breaker, however. See the qdisk(5) manual page for more
information.
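A minimal sketch of a QDisk-based IP tie-breaker in cluster.conf (the device label, router address, and score values are hypothetical; see qdisk(5) for the authoritative attribute list):

```xml
<quorumd label="myqdisk" interval="1" tko="10" votes="1">
  <!-- The partition remains fit only while it can ping the upstream router -->
  <heuristic program="ping -c1 -t1 192.168.1.1" score="1" interval="2" tko="3"/>
</quorumd>
```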
CMAN has no internal tie-breakers for various reasons. However, tie-breakers can be implemented
using the API. This API allows quorum device registration and updating. For an example, look at the
QDisk source code.
You might need a tie-breaker if you:
Have a two-node configuration with the fence devices on a different network path than the path used
for cluster communication
Have a two-node configuration where fencing is at the fabric level, especially for SCSI reservations
However, if you have a correct network and fencing configuration in your cluster, a tie-breaker only adds
complexity, except in pathological cases.
Chapter 3. RGManager
RGManager manages and provides failover capabilities for collections of cluster resources called
services, resource groups, or resource trees. These resource groups are tree-structured, and have
parent-child dependency and inheritance relationships within each subtree.
RGManager allows administrators to define, configure, and monitor cluster services. In the event of
a node failure, RGManager relocates the clustered service to another node with minimal service
disruption. You can also restrict services to certain nodes, such as restricting httpd to one group
of nodes while mysql is restricted to a separate set of nodes.
Various processes and agents combine to make RGManager work. The following list summarizes those
areas.
Failover Domains - How the RGManager failover domain system works
Service Policies - RGManager's service startup and recovery policies
Resource Trees - How RGManager's resource trees work, including start/stop orders and inheritance
Service Operational Behaviors - How RGManager's operations work and what states mean
Virtual Machine Behaviors - Special things to remember when running VMs in an RGManager cluster
Resource Actions - The agent actions RGManager uses and how to customize their behavior from
the cluster.conf file
Event Scripting - If RGManager's failover and recovery policies do not fit your environment, you can
customize your own using this scripting subsystem
Note
These policies also apply to virtual machine resources.
The above service tolerance is 3 restarts in 5 minutes. On the fourth service failure in 300 seconds,
rgmanager will not restart the service and will instead relocate the service to another available host in
the cluster.
Note
You must specify both parameters together; the use of either parameter by itself is undefined.
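The tolerance described above corresponds to the max_restarts and restart_expire_time attributes on the service. A minimal sketch (the service name and child resource are hypothetical):

```xml
<!-- After 3 restarts within 300 seconds, the next failure triggers relocation -->
<service name="myservice" max_restarts="3" restart_expire_time="300">
  <ip address="10.1.1.2"/>
</service>
```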
Resource trees are XML representations of resources, their attributes, and parent/child and sibling
relationships. The root of a resource tree is almost always a special type of resource called a
service. The terms resource tree, resource group, and service are usually used interchangeably in
this document. From rgmanager's perspective, a resource tree is an atomic unit. All components of a
resource tree are started on the same cluster node.
fs:myfs and ip:10.1.1.2 are siblings
fs:myfs is the parent of script:script_child
script:script_child is the child of fs:myfs
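The relationships above can be sketched in cluster.conf as follows (the service name, device, and file paths are hypothetical):

```xml
<service name="example">
  <!-- fs:myfs and ip:10.1.1.2 are siblings (same depth in the tree) -->
  <fs name="myfs" mountpoint="/mnt/myfs" device="/dev/sdb1">
    <!-- script:script_child is nested inside fs:myfs, so it starts after its parent -->
    <script name="script_child" file="/etc/init.d/myscript"/>
  </fs>
  <ip address="10.1.1.2"/>
</service>
```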
Important
Failure to follow these guidelines may result in resources being allocated on multiple hosts.
You must not stop all instances of rgmanager when a service is frozen unless you plan to
reboot the hosts prior to restarting rgmanager.
You must not unfreeze a service until the reported owner of the service rejoins the cluster and
restarts rgmanager.
Note
Other states, such as starting and stopping, are special transitional states of the started state.
3.5.2. Migration
In addition to normal service operations, virtual machines support one behavior not supported by other
services: migration. Migration minimizes downtime of virtual machines by removing the requirement for a
start/stop in order to change the location of a virtual machine within a cluster.
There are two types of migration supported by rgmanager, selected on a per-VM basis by the
migrate attribute:
live (default) - the virtual machine continues to run while most of its memory contents are copied to
the destination host. This minimizes the inaccessibility of the VM (typically well under 1 second) at
the expense of performance of the VM during the migration and the total amount of time it takes for
the migration to complete.
pause - the virtual machine is frozen in memory while its memory contents are copied to the
destination host. This minimizes the amount of time it takes for a virtual machine migration to
complete.
Which migration style you use depends on availability and performance requirements. For example,
a live migration may mean 29 seconds of degraded performance and 1 second of complete unavailability,
while a pause migration may mean 8 seconds of complete unavailability and no otherwise degraded
performance.
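A minimal sketch of selecting the migration type per VM (the VM name, path, and recovery policy shown are hypothetical examples, not values from this document):

```xml
<!-- migrate defaults to "live"; "pause" trades a longer outage for a shorter total migration time -->
<vm name="guest1" migrate="live" path="/etc/libvirt/qemu" recovery="restart"/>
```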
Important
A virtual machine may be a component of a service, but doing this disables all forms of migration
and most of the convenience features below.
Additionally, the use of migration with KVM requires careful configuration of ssh.
Note
If the VM is running in multiple locations, RGManager does not warn you.
Chapter 4. Fencing
Fencing is the disconnection of a node from the cluster's shared storage. Fencing cuts off I/O from
shared storage, thus ensuring data integrity. The cluster infrastructure performs fencing through the
fence daemon, fenced.
When CMAN determines that a node has failed, it communicates to other cluster-infrastructure
components that the node has failed. fenced, when notified of the failure, fences the failed node. Other
cluster-infrastructure components determine what actions to take; that is, they perform any recovery
that needs to be done. For example, DLM and GFS2, when notified of a node failure, suspend activity until
they detect that fenced has completed fencing the failed node. Upon confirmation that the failed node is
fenced, DLM and GFS2 perform recovery. DLM releases locks of the failed node; GFS2 recovers the
journal of the failed node.
The fencing program determines from the cluster configuration file which fencing method to use. Two
key elements in the cluster configuration file define a fencing method: the fencing agent and the fencing
device. The fencing program makes a call to a fencing agent specified in the cluster configuration file.
The fencing agent, in turn, fences the node via a fencing device. When fencing is complete, the fencing
program notifies the cluster manager.
The High Availability Add-On provides a variety of fencing methods:
Power fencing - A fencing method that uses a power controller to power off an inoperable node.
Storage fencing - A fencing method that disables the Fibre Channel port that connects storage to an
inoperable node.
Other fencing - Several other fencing methods that disable I/O or power of an inoperable node,
including IBM BladeCenters, PAP, DRAC/MC, HP iLO, IPMI, IBM RSA II, and others.
Figure 4.1, "Power Fencing Example" shows an example of power fencing. In the example, the fencing
program in node A causes the power controller to power off node D. Figure 4.2, "Storage Fencing
Example" shows an example of storage fencing. In the example, the fencing program in node A causes
the Fibre Channel switch to disable the port for node D, disconnecting node D from storage.
Specifying a fencing method consists of editing a cluster configuration file to assign a fencing-method
name, the fencing agent, and the fencing device for each node in the cluster.
The way in which a fencing method is specified depends on whether a node has dual power supplies or
multiple paths to storage. If a node has dual power supplies, then the fencing method for the node must
specify at least two fencing devices: one fencing device for each power supply (refer to Figure 4.3,
"Fencing a Node with Dual Power Supplies"). Similarly, if a node has multiple paths to Fibre Channel
storage, then the fencing method for the node must specify one fencing device for each path to Fibre
Channel storage. For example, if a node has two paths to Fibre Channel storage, the fencing method
should specify two fencing devices: one for each path to Fibre Channel storage (refer to Figure 4.4,
"Fencing a Node with Dual Fibre Channel Connections").
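As a hedged sketch (node, method, device names, and ports are hypothetical), a node with dual power supplies lists one fencing device per power supply inside a single method:

```xml
<clusternode name="node-d" nodeid="4">
  <fence>
    <method name="power">
      <!-- Both power supplies must be switched off for the fence to succeed -->
      <device name="apc-a" port="4"/>
      <device name="apc-b" port="4"/>
    </method>
  </fence>
</clusternode>
```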
You can configure a node with one fencing method or multiple fencing methods. When you configure a
node for one fencing method, that is the only fencing method available for fencing that node. When you
configure a node for multiple fencing methods, the fencing methods are cascaded from one fencing
method to another according to the order of the fencing methods specified in the cluster configuration
file. If a node fails, it is fenced using the first fencing method specified in the cluster configuration file for
that node. If the first fencing method is not successful, the next fencing method specified for that node is
used. If none of the fencing methods is successful, then fencing starts again with the first fencing
method specified, and continues looping through the fencing methods in the order specified in the
cluster configuration file until the node has been fenced.
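Cascading can be sketched as follows (method and device names are hypothetical); the methods are tried in the order listed in cluster.conf:

```xml
<fence>
  <method name="primary">
    <!-- tried first: power fencing -->
    <device name="apc-a" port="4"/>
  </method>
  <method name="backup">
    <!-- tried only if the first method fails: storage fencing -->
    <device name="sanswitch1" port="11"/>
  </method>
</fence>
```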
For detailed information on configuring fence devices, refer to the corresponding chapter in the Cluster
Administration manual.
Note
system-config-cluster is not available in RHEL 6.
RHEL 5 AP Cluster supports both KVM and Xen for use in running virtual machines that are managed by
the host cluster infrastructure.
RHEL 6 HA supports KVM for use in running virtual machines that are managed by the host cluster
infrastructure.
The following lists the deployment scenarios currently supported by Red Hat:
RHEL 5.0+ supports Xen in conjunction with RHEL AP Cluster.
RHEL 5.4 introduced support for KVM virtual machines as managed resources in RHEL AP Cluster
as a Technology Preview.
RHEL 5.5+ elevates KVM virtual machines to fully supported.
RHEL 6.0+ supports KVM virtual machines as highly available resources in the RHEL 6 High
Availability Add-On.
RHEL 6.0+ does not support Xen virtual machines with the RHEL 6 High Availability Add-On, since
RHEL 6 no longer supports Xen.
Note
For updated information and special notes regarding supported deployment scenarios, refer to
the following Red Hat Knowledgebase entry:
https://access.redhat.com/kb/docs/DOC-46375
The type of virtual machine run as a managed resource does not matter. Any guest that is
supported by either Xen or KVM in RHEL can be used as a highly available guest. This includes variants
of RHEL (RHEL 3, RHEL 4, RHEL 5) and several variants of Microsoft Windows. Check the RHEL
documentation to find the latest list of supported guest operating systems under each hypervisor.
Linux does not presently support SCSI 3 Persistent Reservations, so it is not suitable for use
with fence_scsi.
VMware vSphere 4.1, VMware vCenter 4.1, and VMware ESX and ESXi 4.1 support running guest
clusters where the guest operating systems are RHEL 5.7+ or RHEL 6.2+. Version 5.0 of VMware
vSphere, vCenter, ESX, and ESXi is also supported; however, due to an incomplete WSDL schema
provided in the initial release of VMware vSphere 5.0, the fence_vmware_soap utility does not work
on the default install. Refer to the Red Hat Knowledgebase at https://access.redhat.com/knowledge/ for
updated procedures to fix this issue.
Guest clusters must be homogeneous (either all RHEL 5.7+ guests or all RHEL 6.1+ guests).
Mixing bare-metal cluster nodes with cluster nodes that are virtualized is permitted.
The fence_vmware_soap agent requires the third-party VMware Perl APIs. This software package
must be downloaded from VMware's web site and installed onto the RHEL clustered guests.
Alternatively, fence_scsi can be used to provide fencing as described below.
Shared storage can be provided by either iSCSI or VMware raw shared block devices.
Usage of VMware ESX guest clusters is supported using either fence_vmware_soap or
fence_scsi.
Usage of Hyper-V guest clusters is unsupported at this time.
Revision History
Revision 1-9.400, 2013-10-31, Rüdiger Landmann
    Rebuild with publican 4.0.0
Revision 1-9, Mon Feb 18 2013, John Ha
    Fix Author_Group.xml file to fix GA build error
Revision 1-6, Mon Feb 18 2013, John Ha
    Fix Author_Group.xml file to fix GA build error
Revision 1-5, Mon Feb 18 2013, John Ha
    Release for GA of Red Hat Enterprise Linux 6.4
Revision 1-4, Mon Nov 28 2012, John Ha
    Release for Beta of Red Hat Enterprise Linux 6.4
Revision 1-3, Mon Jun 18 2012, John Ha
    Release for GA of Red Hat Enterprise Linux 6.3
Revision 1-2, John Ha
    Update for 6.2 release
Revision 1-1, Paul Kennedy
    Initial Release