
AskF5 | Manual Chapter: Fail-Safe and Fast Failover http://support.f5.com/kb/en-us/products/big-ip_ltm/manuals/product/tmo...


Manual Chapter: Fail-Safe and Fast Failover


8
Fail-Safe and Fast Failover
Introduction

Understanding fail-safe

Understanding fast failover

Introduction
Two features of high availability are fail-safe and HA groups. The remainder of this chapter describes these features.

Important: For information on implementing configuration synchronization, failover, and connection mirroring, see the BIG-IP® Redundant Systems
Configuration Guide and BIG-IP® TMOS®: Implementations.

Understanding fail-safe
Fail-safe is the ability of a BIG-IP® system to monitor certain aspects of the system or network, detect interruptions, and consequently take some
action. More specifically:
In the case of a redundant system, a device can detect a problem with a service or VLAN, and initiate failover to a peer device.
In the case of a single device (non-redundant system), the system can detect a problem with a service or VLAN, and take some action, such as
restarting the service.

The BIG-IP system offers these types of fail-safe:


System fail-safe
Monitors the switch board component and a set of key system services.
VLAN fail-safe
Monitors traffic on a VLAN.

Note: Only users with either the Administrator or Resource Administrator user role can configure fail-safe.

To configure and manage fail-safe, log in to the BIG-IP Configuration utility, and on the Main tab, expand System, and click High Availability.

Configuring system fail-safe



Using the Configuration utility, you can specify the action that you want the BIG-IP system to take when the switch board component fails. Possible actions that the
BIG-IP system can take are:
Reboot the BIG-IP system.
Restart all system services.
Go offline.
Go offline and abort the TMM service.
Take no action.

Configuring system-service monitoring


You can specify the particular action that you want the BIG-IP system to take when the heartbeat of a system service fails. The following table lists
each system service, and shows the possible actions that the BIG-IP system can take in the event of a heartbeat failure.

Table 8.1 Possible actions in response to heartbeat failure

System Service          Possible actions

BIGD, MCPD, GTMD,       Restart the system service.
TMROUTED                Restart all system services.
(Single device only)    Reboot the BIG-IP system.
                        Go offline.
                        Go offline and abort the TMM service.
                        Take no action.

BCM56XXD                Restart
                        No action

TMM                     Go offline and down links and restart.
                        Restart the system service.
                        Restart all system services.
                        Reboot the BIG-IP system.
                        Go offline.
                        Go offline and restart.

SOD                     Reboot the system.
                        Restart all system services.
                        Take no action.

CBRD                    Restart the system service.
                        Take no action.

SCRIPTD                 Restart the system service.
                        Restart all system services.
                        Reboot the BIG-IP system.
                        Go offline.
                        Go offline and restart.
                        Take no action.

DEDUP_ADMIN             Restart the system service.
                        Take no action.
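
The service-to-action mapping in Table 8.1 can be pictured as a simple lookup. The following Python sketch is only an illustration of that idea, not a BIG-IP interface: the dictionary ALLOWED_ACTIONS and the helper is_allowed are hypothetical names, and only a subset of the table rows is transcribed.

```python
# Illustrative lookup of heartbeat-failure actions, transcribed from a subset
# of Table 8.1. ALLOWED_ACTIONS and is_allowed are hypothetical names and do
# not correspond to any BIG-IP API.
ALLOWED_ACTIONS = {
    "SOD": {
        "Reboot the system.",
        "Restart all system services.",
        "Take no action.",
    },
    "CBRD": {"Restart the system service.", "Take no action."},
    "DEDUP_ADMIN": {"Restart the system service.", "Take no action."},
    "SCRIPTD": {
        "Restart the system service.",
        "Restart all system services.",
        "Reboot the BIG-IP system.",
        "Go offline.",
        "Go offline and restart.",
        "Take no action.",
    },
}


def is_allowed(service: str, action: str) -> bool:
    """Return True if the transcribed rows list the action for the service."""
    return action in ALLOWED_ACTIONS.get(service.upper(), set())


print(is_allowed("SOD", "Reboot the system."))  # True
print(is_allowed("CBRD", "Go offline."))        # False
```

A check of this kind could be run before applying a planned configuration, to catch an action that the table does not list for a given service.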

Configuring VLAN fail-safe


For maximum reliability, the BIG-IP system supports failure detection on all VLANs. When you configure the fail-safe option on a VLAN, the BIG-IP



For a single device, the system can either reboot or restart all system services. The default action is Reboot.

Warning: You should configure the fail-safe option on a VLAN only after the BIG-IP system is in a stable production environment. Otherwise, routine
network changes might cause failover unnecessarily.

Each interface card installed on the BIG-IP system is typically mapped to a different VLAN. Thus, when you set the fail-safe option on a particular
VLAN, you need to know the interface to which the VLAN is mapped. You can use the Configuration utility to view VLAN names and their associated
interfaces.
There are two ways to configure VLAN fail-safe: from the Redundancy Properties screen, or from the VLANs screen.

Understanding fast failover


The BIG-IP system includes a feature known as fast failover, which is based on the concept of an HA group. An HA group is
a set of trunks, pools, or clusters (or any combination of these) that you want the BIG-IP system to use to calculate an overall health score for a
device in a redundant system configuration. A health score is based on the number of members that are currently available for any trunks, pools, and
clusters in the HA group, combined with a weight that you assign to each trunk, pool, and cluster. The device that has the best overall score at any
given time becomes or remains the active device.
To configure and manage fast failover, log in to the BIG-IP Configuration utility, and on the Main tab, expand System, and click High Availability.

Note: To use the fast failover feature, you must first create a redundant system configuration.

The fast failover feature is designed for a redundant configuration that contains a maximum of two devices in a device group.

Note: Only VIPRION® systems can have a cluster as an object in an HA group. For all other platforms, HA group members consist of pools and
trunks only.

An HA group is typically configured to fail over based on trunk health in particular. Trunk configurations are not synchronized between units, which
means that the number of trunk members on the two units often differs whenever a trunk loses or gains members. The HA group feature allows
failover to occur based on changes to trunk health instead of on system or VLAN failure.
Only one HA group can exist on the BIG-IP system. By default, the HA group feature is disabled.
To summarize, when you configure the HA group, the process of one BIG-IP device failing over to the other based on HA scores is noticeably faster
than if failover occurs due to a hardware or daemon failure.

Understanding HA score calculation


The BIG-IP system calculates an HA score based on these criteria:
The number of available members for each object (such as a trunk)
The weight that you assign to each object in the HA group
The threshold you specify for each object (optional)
The active bonus value that you specify for the HA group

Specifying a weight
A weight is a health value that you assign to each object in the HA group (that is, pool, trunk, and cluster). The weight that you assign to each object
must be in the range of 10 through 100. The maximum overall score that the BIG-IP system can potentially calculate for a device is the sum of the
individual weights for the HA group objects, plus the active bonus value. (For information on the Active Bonus setting, see Specifying an active
bonus.)
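
As a rough illustration of the two rules above (weights from 10 through 100, and a maximum score equal to the sum of the weights plus the active bonus), here is a minimal Python sketch. The function name max_device_score and the sample values are assumptions chosen for the example.

```python
# Minimal sketch of the weight rules described above. The function name and
# the sample HA group are hypothetical.
MIN_WEIGHT = 10
MAX_WEIGHT = 100


def max_device_score(weights: dict, active_bonus: int) -> int:
    """Sum of the object weights plus the active bonus."""
    for name, weight in weights.items():
        if not MIN_WEIGHT <= weight <= MAX_WEIGHT:
            raise ValueError(f"weight for {name} must be 10 through 100")
    return sum(weights.values()) + active_bonus


# One trunk with a weight of 30 and an active bonus of 10.
print(max_device_score({"my_trunk1": 30}, 10))  # 40
```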
Table 8.2 shows an example of how the system calculates a score for the device, based solely on the weight of objects in the HA group. In this
example, the HA group contains two pools (my_http_pool and my_ftp_pool) and one trunk (my_trunk1). A user has assigned a weight to each
object.

Table 8.2 Example of an HA score calculation for a device

Object Name   Members   Available Members   User-assigned Weight   HA Score

x 20)

my_trunk1     4         3 (75%)             30                     23 (75% x 30)

Total device score = 74

On each device, the system uses each weight, along with a percentage that the system derives for each object (the percentage of the object's
members that are available), to calculate a score for each object.
The system then adds the scores to determine a total score for the device. The device with the highest score becomes or remains the active device
in the redundant system configuration.
Note that if you have configured VLAN fail-safe, and the VLAN fails on an active device, the device goes offline regardless of its score, and its peer
becomes active.
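
The per-object calculation can be sketched in Python as follows. This is an approximation under stated assumptions: the class and function names are hypothetical, and the rounding rule (halves round up, so 75% of 30 becomes 23) is inferred from the my_trunk1 row of Table 8.2 rather than documented here. The second object, hypothetical_pool, is invented for the example.

```python
from dataclasses import dataclass


@dataclass
class HaObject:
    """One pool, trunk, or cluster in an HA group (hypothetical model)."""
    name: str
    members: int
    available: int
    weight: int


def object_score(obj: HaObject) -> int:
    """Weight scaled by the share of available members.

    Assumption: halves round up, inferred from 75% of 30 being shown as 23.
    Integer arithmetic avoids floating-point surprises.
    """
    if obj.members <= 0:
        raise ValueError("an object needs at least one member")
    return (2 * obj.weight * obj.available + obj.members) // (2 * obj.members)


def device_score(objects: list) -> int:
    """Total device score: the sum of the object scores."""
    return sum(object_score(o) for o in objects)


trunk = HaObject("my_trunk1", members=4, available=3, weight=30)
pool = HaObject("hypothetical_pool", members=2, available=1, weight=20)
print(object_score(trunk))          # 23
print(device_score([trunk, pool]))  # 33
```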

Specifying a threshold
For each object in an HA group, you can specify an optional setting known as a threshold. A threshold is a value that specifies the number of object
members that must be available to prevent failover. If the number of available members dips below the threshold, the BIG-IP system assigns a score
of 0 to the object, so that the score of that object no longer contributes to the overall score of the device.
For example, if a trunk in the HA group has four members and you specify a threshold value of 3, and the number of available trunk members falls to
2, then the trunk contributes a score of 0 to the total device score.
If the number of available object members equals or exceeds the threshold value, or you do not specify a threshold, the BIG-IP system calculates the
score as described previously, by multiplying the percentage of available object members by the weight for each object and then adding the scores to
determine the overall device score.
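
The threshold rule can be sketched as a small extension of the score calculation. The function below is a hypothetical illustration; as before, the rounding of 22.5 up to 23 is an assumption inferred from the tables in this chapter.

```python
from typing import Optional


def object_score(members: int, available: int, weight: int,
                 threshold: Optional[int] = None) -> int:
    """Score for one HA group object, honoring an optional threshold.

    Hypothetical helper: if fewer members are available than the threshold,
    the object contributes 0. Otherwise the weight is scaled by the share of
    available members (assumed to round halves up).
    """
    if threshold is not None and available < threshold:
        return 0
    return (2 * weight * available + members) // (2 * members)


# Trunk with four members, weight 30, threshold 3.
print(object_score(4, 2, 30, threshold=3))  # 0  (below the threshold)
print(object_score(4, 3, 30, threshold=3))  # 23 (meets the threshold)
print(object_score(4, 2, 30))               # 15 (no threshold specified)
```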

Tip: Do not configure the tmsh attribute min-up-members on any pool that you intend to include in the HA group.

Specifying an active bonus


An active bonus is an amount that the BIG-IP system automatically adds to the overall score of the active device. An active bonus ensures that the
active device remains active when the device's score would otherwise temporarily fall below the score of the standby device.
A common reason to specify an active bonus is to prevent failover due to flapping, the condition where failover occurs frequently as a trunk member
toggles between availability and unavailability. In this case, you might want to prevent the HA scoring feature from triggering failover each time a
trunk member is lost. You might also want to prevent the HA scoring feature from triggering failover when you make minor changes to the BIG-IP
system configuration, such as adding or removing a trunk member.
Suppose that the HA group on each device contains a trunk with four members, and you assign a weight of 30 to each trunk. Without an active bonus
defined, if the trunk on device 1 loses some number of members, failover occurs because the overall calculated score for device 1 becomes lower
than that of device 2. Table 8.3 shows the scores that could result if the trunk on device 1 loses one trunk member and no active bonus is specified.

                                    Members Available   Trunk score                        Device Score
Event                               Device 1  Device 2  Device 1         Device 2          Device 1  Device 2  Failover?

Device 1 is active (initial state)  4 (100%)  4 (100%)  30               30                30        30        No

Device 1 loses a trunk member       3 (75%)   4 (100%)  23 (75% of 30)   30 (100% of 30)   23        30        Yes

Table 8.3 Failover when no active bonus is specified

You can prevent this failover from occurring by specifying an active bonus value. In our example, if we specify an active bonus of 10 (the default
value), the score of the active device changes from 23 to 33, thereby ensuring that the score of the active device remains equal to or higher than that
of the standby device (30).
Although you specify an active bonus value on each device, the BIG-IP system uses the active bonus specified on the active device only, to
contribute to the score of the active device. The BIG-IP system never uses the active bonus on the standby device to contribute to the score of the



For example, if you assigned a weight of 30 to the trunk, and one of the four trunk members fails, the trunk score becomes 23 (75% of 30), putting
the device at risk for failover. However, if you specified an active bonus of 7 or higher, failover would not actually occur, because a score of 7 or
higher, when added to the score of 23, is greater than or equal to 30.
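
The arithmetic in this example can be written out as a short Python sketch. The function names are hypothetical; the comparison follows the text above, where the active device remains active as long as its score plus the active bonus is greater than or equal to the standby score.

```python
def min_active_bonus(active_score: int, standby_score: int) -> int:
    """Smallest bonus that keeps the active device from failing over."""
    return max(0, standby_score - active_score)


def stays_active(active_score: int, standby_score: int, bonus: int) -> bool:
    """True if the active device's score plus bonus is at least the standby score."""
    return active_score + bonus >= standby_score


# Active trunk score 23 (three of four members, weight 30), standby score 30.
print(min_active_bonus(23, 30))   # 7
print(stays_active(23, 30, 7))    # True
print(stays_active(23, 30, 6))    # False
```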
For the procedure on specifying an active bonus, see 6 Device 1 regains two trunk member Device 2 remains the active device even when one trunk
member is unavailable, due to the active bonus..

Example
You can prevent failover from occurring by specifying an active bonus value. Table 8.4, and the list that follows, show how configuring an active
bonus for the active device can affect failover.

                                         Members Available   Trunk Score                        Active Bonus        Device Score (Trunk Score + Active Bonus)
#   Event                                Device 1  Device 2  Device 1         Device 2          Device 1  Device 2  Device 1  Device 2  Failover?

1   Device 1 is active (initial state)   4 (100%)  4 (100%)  30 (100% of 30)  30 (100% of 30)   10        0         40        30        No

2   Device 1 loses a trunk member        3 (75%)   4 (100%)  23 (75% of 30)   30 (100% of 30)   10        0         33        30        No

3   Device 1 loses another trunk member  2 (50%)   4 (100%)  15 (50% of 30)   30 (100% of 30)   10        0         25        30        Yes

4   Device 1 goes to standby.            2 (50%)   4 (100%)  15 (50% of 30)   30 (100% of 30)   0         10        15        40        N/A
    Device 2 goes active.

5   Device 2 loses a trunk member        2 (50%)   3 (75%)   15 (50% of 30)   23 (75% of 30)    0         10        15        33        No

6   Device 1 regains two trunk members   4 (100%)  3 (75%)   30 (100% of 30)  23 (75% of 30)    0         10        30        33        No

Table 8.4 Example of configuring an active bonus to prevent failover

To help understand Table 8.4, the row numbers in the left column of the table correspond to the explanations below:
1 Device 1 is active (initial state)
With all trunk members available on both units, and the active bonus configured, the active device (device 1) retains the higher device score and
therefore remains active.
2 Device 1 loses a trunk member
The device score for device 1 is still higher than the score for device 2, due to an active bonus value of 10.



If the active device (device 2) loses a trunk member, the score on device 2 is still higher than device 1 (with two unavailable members), due to the
active bonus.
6 Device 1 regains two trunk members
Device 2 remains the active device even when one trunk member is unavailable, due to the active bonus.
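
The sequence in Table 8.4 can be replayed with a short Python sketch. This is a simplified model under stated assumptions, not the BIG-IP implementation: the names are hypothetical, each device has one trunk of four members with a weight of 30, the active bonus is 10, halves round up, and failover happens only when the standby score is strictly higher than the active score.

```python
WEIGHT = 30         # weight assigned to the trunk on each device
TRUNK_MEMBERS = 4   # configured members per trunk
ACTIVE_BONUS = 10   # added to the active device's score only


def trunk_score(available: int) -> int:
    """Weight scaled by available members (assumed to round halves up)."""
    return (2 * WEIGHT * available + TRUNK_MEMBERS) // (2 * TRUNK_MEMBERS)


def replay(events, active=1):
    """Replay (device 1 available, device 2 available) pairs.

    Returns one tuple per event:
    (active device, device 1 score, device 2 score, failover triggered).
    """
    rows = []
    for available in events:
        scores = {}
        for device, count in zip((1, 2), available):
            bonus = ACTIVE_BONUS if device == active else 0
            scores[device] = trunk_score(count) + bonus
        standby = 2 if active == 1 else 1
        failover = scores[standby] > scores[active]
        rows.append((active, scores[1], scores[2], failover))
        if failover:
            active = standby
    return rows


# Member counts for rows 1 through 6 of Table 8.4.
events = [(4, 4), (3, 4), (2, 4), (2, 4), (2, 3), (4, 3)]
for row in replay(events):
    print(row)
```

The device scores produced for each event match the Device Score columns of Table 8.4. For row 4 the sketch reports that no failover is triggered, where the table shows N/A because that row records the state immediately after the failover in row 3.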

© 2014 F5 Networks, Inc. All rights reserved.