High-availability (HA) clustering is a solution that uses clustering software and special-purpose hardware to minimize system downtime. HA clusters are groups of computing resources implemented to provide highly available software and hardware computing services.

Basic Work Done

Putting together a group of computers that trust each other to provide a service even when system components fail

When one machine goes down, others take over its work. This involves IP address takeover, service takeover, etc.

If one node shuts down or fails, another node takes over its application load; this also facilitates planned maintenance

The cluster as a whole performs its function continuously for a significantly longer period of time

HA clusters usually use a private heartbeat network connection, which is used to monitor the health and status of each node in the cluster.
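The heartbeat idea above can be sketched in a few lines. This is a minimal, hypothetical model (not any particular product's API): each node's heartbeat updates a timestamp, and a node whose last heartbeat is older than a dead interval is considered down.

```python
import time

# Illustrative sketch of heartbeat-based health monitoring. The names
# HeartbeatMonitor and DEAD_INTERVAL are assumptions for this example.
DEAD_INTERVAL = 2.0  # seconds without a heartbeat before a node is declared dead

class HeartbeatMonitor:
    def __init__(self):
        self.last_seen = {}  # node name -> time of last heartbeat

    def heartbeat(self, node, now=None):
        """Record a heartbeat received from `node` over the private link."""
        self.last_seen[node] = time.monotonic() if now is None else now

    def alive_nodes(self, now=None):
        """Return the nodes whose last heartbeat is fresher than DEAD_INTERVAL."""
        now = time.monotonic() if now is None else now
        return {n for n, t in self.last_seen.items()
                if now - t < DEAD_INTERVAL}

mon = HeartbeatMonitor()
mon.heartbeat("node1", now=10.0)
mon.heartbeat("node2", now=11.0)
print(sorted(mon.alive_nodes(now=11.0)))  # ['node1', 'node2']
print(sorted(mon.alive_nodes(now=12.8)))  # ['node2'] -- node1's heartbeat expired
```

In a real cluster the heartbeats travel over a dedicated private network (often more than one, precisely to reduce the chance of the split-brain scenario described below).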
An HA cluster provides R.A.S.:

Reliability: A high degree of protection for corporate data, since information is a crucial business asset
Availability: Continuous data access
Serviceability: Procedures to correct problems with minimal business impact

HA Cluster Categories

There are two main categories of HA cluster:

Shared Disk: There is a single shared disk, and all nodes have access to the same storage. A locking mechanism protects against race conditions.

Shared Nothing: At any given time, only one node owns a given disk. When that node fails, another node takes over ownership.
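The shared-nothing ownership rule can be illustrated with a small sketch. This is a toy model under assumed names (`fail_over_disks` is not a real API): each disk maps to exactly one owner, and on failure every disk the failed node owned is reassigned to a survivor.

```python
# Hypothetical sketch of shared-nothing disk ownership transfer.
def fail_over_disks(ownership, failed_node, survivors):
    """Reassign every disk owned by `failed_node` to the first surviving node."""
    if not survivors:
        raise RuntimeError("no surviving node can take ownership")
    new_owner = survivors[0]
    return {disk: (new_owner if owner == failed_node else owner)
            for disk, owner in ownership.items()}

ownership = {"disk0": "node1", "disk1": "node2"}
print(fail_over_disks(ownership, "node1", ["node2"]))
# {'disk0': 'node2', 'disk1': 'node2'}
```

The invariant that matters is that at no point do two nodes own the same disk; real implementations enforce this with fencing before the reassignment, as discussed below.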


HA clusters introduce concepts and complications around Split-Brain, Quorum, and Fencing. One subtle but serious condition that all clustering software must be able to handle is split-brain.

Split Brain

Split-brain occurs when all of the private links go down simultaneously while the cluster nodes are still running. If that happens, each node in the cluster may mistakenly decide that every other node has gone down and attempt to start services that the other nodes are still running. Having duplicate instances of a service may cause data corruption on the shared storage.

Quorum is an attempt to avoid split-brain for most kinds of failures. Typically one tries to make sure that only one partition can be active at a time; quorum is the term for the methods that ensure this. One disadvantage is that quorum does not work very well for two-node clusters.
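The usual quorum rule is a strict majority of votes, which can be sketched in one line. This also makes the two-node weakness concrete: with one vote per node, neither side of a 1/1 split can reach a majority.

```python
# Sketch of majority quorum: a partition may run services only if it
# holds strictly more than half of the cluster's total votes.
def has_quorum(partition_votes, total_votes):
    return partition_votes > total_votes // 2

# 3-node cluster, one vote each: a 2-node partition keeps quorum.
print(has_quorum(2, 3))  # True
print(has_quorum(1, 3))  # False
# 2-node cluster split 1/1: neither side has quorum.
print(has_quorum(1, 2))  # False
```

Real cluster managers work around the two-node case with extras such as a tiebreaker disk or a third lightweight voting node; the majority rule itself stays the same.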

Fencing tries to put a fence around an errant node (or nodes) to keep it from accessing cluster resources. This way, one does not have to rely on the correct behaviour or timing of the errant node. We use STONITH to do this. STONITH: Shoot The Other Node In The Head, i.e. the surviving nodes forcibly power off the errant node.
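The key ordering rule of STONITH-style fencing is "fence first, then take over". A minimal sketch of that decision, under assumed names (`take_over` and the `power_off` callback stand in for a real fence agent such as an IPMI or power-switch driver):

```python
# Hypothetical sketch of the fencing decision: a survivor must not start
# a failed node's services until it has confirmed the node is powered off,
# so the errant node can no longer write to shared storage.
def take_over(node, services, power_off):
    """power_off(node) -> bool is a stand-in for a real fence agent."""
    if not power_off(node):  # fence first, always
        raise RuntimeError(f"fencing {node} failed; refusing takeover")
    return [f"started {s}" for s in services]

print(take_over("node1", ["db"], lambda n: True))  # ['started db']
```

The important property is the refusal path: if fencing cannot be confirmed, the cluster must not start the services, because a still-running errant node would reintroduce the split-brain risk.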

The most common size for an HA cluster is a two-node cluster, and such configurations can be categorized into:

Active/Active: Traffic intended for the failed node is either passed on to an existing node or load-balanced across the remaining nodes

Active/Passive: Provides a fully redundant instance of each node, which is only brought online when its associated primary node fails

N-to-1: Allows the failover standby node to become the active one temporarily, until the original node can be restored or brought back online


The usual goal of virtualization is to centralize administrative tasks while improving scalability and workloads.
Virtualization allows multiple virtual servers to run on a single physical machine. By combining virtualization and HA clustering, it is possible to benefit from the increased manageability and the savings of server consolidation without decreasing the uptime of critical services.

Systems that handle failures use different strategies to recover from a failure. These are three ways to configure failover:

FAIL_FAST: The attempt fails if the first node cannot be reached
ON_FAIL_TRY_ONE_NEXT_AVAILABLE: Tries one more host before giving up
ON_FAIL_TRY_ALL_AVAILABLE: Tries all existing nodes before giving up
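The three strategies above differ only in how far down the node list a client will go. A sketch, assuming a hypothetical `connect(host)` that returns a connection or raises `ConnectionError`:

```python
# Illustrative implementations of the three failover strategies.
def fail_fast(hosts, connect):
    return connect(hosts[0])                 # no retry at all

def try_one_next_available(hosts, connect):
    for host in hosts[:2]:                   # primary plus one more host
        try:
            return connect(host)
        except ConnectionError:
            continue
    raise ConnectionError("both attempts failed")

def try_all_available(hosts, connect):
    for host in hosts:                       # every node before giving up
        try:
            return connect(host)
        except ConnectionError:
            continue
    raise ConnectionError("all nodes failed")

# Toy connector where only node "c" is reachable.
def connect_only_c(host):
    if host != "c":
        raise ConnectionError(host)
    return f"connected to {host}"

print(try_all_available(["a", "b", "c"], connect_only_c))
# connected to c
```

With the same host list, `fail_fast` and `try_one_next_available` would both give up before reaching "c", which is the trade-off between failover speed and resilience that these settings express.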

HA clustering supports many operating systems, such as Windows, Linux, and Sun Solaris, and is simple to install, configure, and maintain.

It is often used for critical databases, file sharing on a network, business applications, etc.
It handles and resolves the split-brain condition, and provides facilities such as a private heartbeat network to monitor the health of cluster nodes.