Clustering is a complex technology with lots of messy details. To make it easier to understand, let's
take a look at the big picture of how clustering works. In this article we cover:

• Active vs. Passive Nodes
• Shared Disk Array
• The Quorum
• Public and Private Networks
• The Virtual Server
• How a Failover Works


Active vs. Passive Nodes

Although a SQL Server 2005 cluster can support up to eight nodes, clustering actually only occurs
between two nodes at a time. This is because a single SQL Server 2005 instance can only run on a
single node at a time, and should a failover occur, the failed instance can only fail over to another
individual node. This adds up to two nodes. Clusters of three or more nodes are only used where you
need to cluster multiple instances of SQL Server 2005.

In a two-node SQL Server 2005 cluster, one of the physical server nodes is referred to as the active
node, and the other one is referred to as the passive node. It doesn't matter which physical server in
a cluster is designated as the active or the passive, but it is easier, from an administrative point of
view, to go ahead and assign one node as the active and the other as the passive. This way, you won't
get confused about which physical server is performing which role at the current time.

When we refer to an active node, we mean that this particular node is currently running an active
instance of SQL Server 2005 and that it is accessing the instance's databases, which are located on a
shared disk array.

When we refer to a passive node, we mean that this particular node is not currently in production and
is not accessing the instance's databases. While the passive node is not in production, it is in a state
of readiness, so that if the active node fails and a failover occurs, it can automatically go into
production and begin accessing the instance's databases located on the shared disk array. In this
case, the passive node then becomes the active node, and the formerly active node now becomes the
passive node (or the failed node, should a failure occur that prevents it from operating).
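
To make the role swap concrete, here is a minimal Python sketch. It is only an illustration of the
idea described above, not how the cluster service is implemented. The Node class, the fail_over
function, and the node names NODE-A and NODE-B are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str  # "active", "passive", or "failed"


def fail_over(active: Node, passive: Node, active_is_down: bool) -> None:
    """Swap roles: the passive node becomes active, and the old active node
    becomes passive, or failed if it can no longer operate."""
    if active.role != "active" or passive.role != "passive":
        raise ValueError("fail_over expects one active and one passive node")
    passive.role = "active"
    active.role = "failed" if active_is_down else "passive"


node_a = Node("NODE-A", "active")    # hypothetical node names
node_b = Node("NODE-B", "passive")

fail_over(node_a, node_b, active_is_down=True)
print(node_a.role, node_b.role)
```

Note that only one node holds the active role before and after the call, which is the point of the
two-node pairing.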


Shared Disk Array

So what is a shared disk array? Unlike non-clustered SQL Server 2005 instances, which usually store
their databases on locally attached disk storage, clustered SQL Server 2005 instances store data on a
shared disk array. By shared, we mean that both nodes of the cluster are physically connected to the
disk array, but that only the active node can access the instance's databases. There is never a case
where both nodes of a cluster are accessing an instance's databases at the same time. This is to
ensure the integrity of the databases.

Generally speaking, a shared disk array is a SCSI- or fiber-connected RAID 5 or RAID 10 disk array
housed in a stand-alone unit, or it might be a SAN. This shared array must have at least two logical
partitions. One partition is used for storing the clustered instance's SQL Server databases, and the
other is used for the quorum.
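
The "connected to both, used by one" rule can be sketched in a few lines of Python. This is a toy
model under stated assumptions, not a description of real storage arbitration: the SharedDiskArray
class, the partition names "data" and "quorum", and the node names are all made up for illustration.

```python
class SharedDiskArray:
    """Toy model: both nodes are cabled to the array, but only the owner may use it."""

    def __init__(self, connected_nodes, owner):
        if owner not in connected_nodes:
            raise ValueError("owner must be one of the connected nodes")
        self.connected_nodes = set(connected_nodes)
        self.owner = owner
        # Two logical partitions: one for the databases, one for the quorum.
        self.partitions = {"data": [], "quorum": []}

    def write(self, node, partition, item):
        if node != self.owner:
            raise PermissionError(f"{node} is connected but is not the active node")
        self.partitions[partition].append(item)


array = SharedDiskArray(connected_nodes=["NODE-A", "NODE-B"], owner="NODE-A")
array.write("NODE-A", "data", "row 1")        # the active node succeeds

try:
    array.write("NODE-B", "data", "row 2")    # the passive node is refused
    passive_write_allowed = True
except PermissionError:
    passive_write_allowed = False

print(passive_write_allowed)
```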


The Quorum

When both nodes of a cluster are up and running, participating in their relevant roles (active and
passive), they communicate with each other over the network. For example, if you change a
configuration setting on the active node, this configuration change is automatically sent to the passive
node and the same change is made there. This generally occurs very quickly, and ensures that both
nodes are synchronized.


But, as you might imagine, it is possible to make a change on the active node and have the active
node fail before the change is sent over the network and applied to the passive node (which will
become the active node after the failover), so the change never gets to the passive node. Depending
on the nature of the change, this could cause problems, even causing both nodes of the cluster to fail.

To prevent this from happening, a SQL Server 2005 cluster uses what is called a quorum, which is
stored on the quorum drive of the shared array. A quorum is essentially a log file, similar in concept to
database logs. Its purpose is to record any change made on the active node. Should a recorded change
not get to the passive node because the active node has failed and cannot send it over the network,
the passive node can read the quorum file, find out what the change was, and make the change before
it becomes the new active node.

In order for this to work, the quorum file must reside on what is called the quorum drive. A quorum
drive is a logical drive on the shared array devoted to the function of storing the quorum.
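
The quorum's role as a change log can be sketched as follows. This is a simplified illustration, not
the actual quorum file format or protocol: the functions apply_change and replay_quorum and the
setting names are hypothetical, and the replay simply re-applies every recorded change.

```python
def apply_change(change, active_config, passive_config, quorum_log, network_up=True):
    """Apply a (setting, value) change on the active node and try to sync it."""
    setting, value = change
    active_config[setting] = value
    quorum_log.append(change)             # recorded on the quorum drive
    if network_up:
        passive_config[setting] = value   # normal case: the passive node gets the change


def replay_quorum(quorum_log, config):
    """Re-apply every recorded change; settings already in place are just rewritten."""
    for setting, value in quorum_log:
        config[setting] = value


active, passive, quorum = {}, {}, []
apply_change(("setting_one", 4096), active, passive, quorum)
# The active node fails before this second change can cross the network.
apply_change(("setting_two", "on"), active, passive, quorum, network_up=False)

missing_before = "setting_two" not in passive
replay_quorum(quorum, passive)            # the passive node reads the quorum during failover
print(missing_before, passive == active)
```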

Public and Private Networks

Each node of a cluster must have at least two network cards. One network card will be connected to
the public network, and the other to a private network.

The public network is the network that the SQL Server 2005 clients are attached to, and this is how
they communicate with a clustered SQL Server 2005 instance.

The private network is used solely for communications between the nodes of the cluster. It is used
mainly for what is called the heartbeat signal. In a cluster, the active node puts out a heartbeat signal,
which tells the other nodes in the cluster that it is working. Should the heartbeat signal stop, then a
passive node in the cluster becomes aware that the active node has failed, and that it should at this
time initiate a failover so that it can become the active node and take control over the SQL Server
2005 instance.
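
Heartbeat monitoring boils down to a timeout check, which the sketch below illustrates. The
HeartbeatMonitor class is hypothetical, and the 5-second timeout is an arbitrary value chosen for the
example, not a setting taken from the cluster service. Timestamps are passed in explicitly so the
example behaves the same every time it runs.

```python
class HeartbeatMonitor:
    """Runs on the passive node and watches for heartbeats from the active node."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_heartbeat = None

    def record_heartbeat(self, now):
        self.last_heartbeat = now

    def should_fail_over(self, now):
        if self.last_heartbeat is None:
            return False  # nothing heard yet, so nothing to compare against
        return (now - self.last_heartbeat) > self.timeout_seconds


monitor = HeartbeatMonitor(timeout_seconds=5.0)   # assumed value for illustration
for second in (0.0, 1.0, 2.0):
    monitor.record_heartbeat(now=second)          # heartbeats arrive, then stop

print(monitor.should_fail_over(now=4.0))   # False: only 2 seconds of silence
print(monitor.should_fail_over(now=8.0))   # True: 6 seconds of silence exceeds the timeout
```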

The Virtual Server

One of the biggest mysteries of clustering is how clients know when and how to switch from
communicating with a failed cluster node to the new active node. The answer may be a surprise:
they don't. That's right; SQL Server 2005 clients don't need to know anything about specific
nodes of a cluster (such as the NetBIOS name or IP address of individual cluster nodes). This is
because each clustered SQL Server 2005 instance is given a virtual name and IP address, which
clients use to connect to the cluster. In other words, clients don't connect to a node's specific name or
IP address, but instead connect to a virtual name and IP address that stay the same no matter which
node in a cluster is active.

When you create a cluster, one of the steps is to create a virtual cluster name and IP address. This
name and IP address are used by the active node to communicate with clients. Should a failover occur,
the new active node uses this same virtual name and IP address to communicate with clients.
This way, clients only need to know the virtual name or IP address of the clustered instance of SQL
Server, and a failover between nodes doesn't change this. At worst, when a failover occurs, there may
be an interruption of service from the client to the clustered SQL Server 2005 instance, but once the
failover has occurred, the client can once again reconnect to the instance using the same virtual name
or IP address.
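
Here is a small sketch of the indirection the virtual server provides. It is an illustration only: the
VirtualServer class, the virtual name SQLCLUSTER1, the address 10.0.0.50, and the node names are all
hypothetical. The client only ever talks to the virtual server object, while the node behind it changes.

```python
class VirtualServer:
    """The stable name and IP address that clients connect to."""

    def __init__(self, name, ip, active_node):
        self.name = name
        self.ip = ip
        self.active_node = active_node

    def handle(self, query):
        if self.active_node is None:
            raise ConnectionError("no active node; failover in progress")
        return f"{self.active_node} answered {query!r}"


virtual = VirtualServer("SQLCLUSTER1", "10.0.0.50", active_node="NODE-A")
before = virtual.handle("SELECT 1")

virtual.active_node = None                 # the active node has failed
try:
    virtual.handle("SELECT 1")
    interrupted = False
except ConnectionError:
    interrupted = True                     # a brief interruption of service

virtual.active_node = "NODE-B"             # failover complete
after = virtual.handle("SELECT 1")         # same name and IP, different node

print(before)
print(after)
```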

How a Failover Works

While there can be many different causes of a failover, let's look at the case where the power stops for
the active node of a cluster and the passive node has to take over. This will provide a general
overview of how a failover occurs.

Let's assume that a single SQL Server 2005 instance is running on the active node of a cluster, and
that a passive node is ready to take over when needed. At this time, the active node is communicating
with both the database and the quorum on the shared array. Because only a single node at a time can
be communicating with the shared array, the passive node is not communicating with the database or
the quorum. In addition, the active node is sending out heartbeat signals over the private network,
and the passive node is monitoring them to see if they stop. Clients are also interacting with the active
node via the virtual name and IP address, running production transactions.

Now suppose the active node stops working because it is no longer receiving any electricity. The
passive node, which is monitoring the heartbeats from the active node, notices that it is not
receiving the heartbeat signal. After a predetermined delay, the passive node assumes
that the active node has failed and initiates a failover. As part of the failover process, the passive
node (now the active node) takes over control of the shared array and reads the quorum, looking for
any unsynchronized configuration changes. It also takes over control of the virtual server name and IP
address. In addition, as the node takes over the databases, it has to do a SQL Server startup, using
the databases, just as if it were starting from a shutdown, going through a database recovery. The time
this takes depends on many factors, including the speed of the system and the number of transactions
that might have to be rolled forward or back during the database recovery process. Once the recovery
process is complete, the new active node announces itself on the network with the virtual name and
IP address, which allows the clients to reconnect and begin using the SQL Server 2005 instance with
minimal interruption.
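
The order of events in the walkthrough above can be summarized in a short sketch. It models the
cluster as a plain dictionary and performs each step in the sequence just described. The run_failover
function, the dictionary keys, and the node and setting names are hypothetical, and real failover
involves far more than these assignments.

```python
def run_failover(cluster):
    """Perform the failover steps in order and return a list of what was done."""
    new_active = cluster["passive"]
    steps = []

    cluster["array_owner"] = new_active
    steps.append("take control of the shared array")

    for setting, value in cluster["quorum_log"]:
        cluster["config"][new_active][setting] = value
    steps.append("read the quorum and apply unsynchronized changes")

    cluster["virtual_server_owner"] = new_active
    steps.append("take over the virtual name and IP address")

    cluster["sql_server_running_on"] = new_active
    steps.append("start SQL Server and run database recovery")

    # The right-hand side is evaluated first, so the old active node is recorded as failed.
    cluster["failed"], cluster["active"], cluster["passive"] = cluster["active"], new_active, None
    steps.append("announce the virtual name and IP address to clients")
    return steps


cluster = {
    "active": "NODE-A",
    "passive": "NODE-B",
    "failed": None,
    "array_owner": "NODE-A",
    "virtual_server_owner": "NODE-A",
    "sql_server_running_on": "NODE-A",
    "quorum_log": [("setting_two", "on")],   # a change NODE-B never received
    "config": {"NODE-A": {"setting_two": "on"}, "NODE-B": {}},
}

for number, step in enumerate(run_failover(cluster), start=1):
    print(number, step)
```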


That's the big picture of how SQL Server 2005 clustering works. If you are new to SQL Server
clustering, it is important that you understand these basic concepts before you begin to drill down into
the detail. In later articles, I will discuss, in great detail, how to plan, build, and administer a SQL
Server 2005 cluster.
