
Linux Clustering for a Failover Scenario

Introduction and Advantages/Disadvantages of Clustering in Linux

What is Clustering?

Clustering is establishing connectivity among two or more servers so that they work as one. Clustering is a very popular technique among systems engineers, who can cluster servers as a failover system, a load-balancing system, or a parallel processing unit.
In this series of guides, I hope to show you how to create a Linux cluster with two nodes
on RedHat/CentOS for a failover scenario.
Now that you have a basic idea of what clustering is, let's find out what it means when it
comes to failover clustering. A failover cluster is a set of servers that work together to maintain
high availability of applications and services.

For example, if a server fails at some point, another node (server) takes over its load and the end user experiences no downtime. For this kind of scenario, we need at least two or three servers to make the proper configuration.
I prefer we use three servers: one server as the Red Hat cluster-enabled server and the others as nodes (back-end servers). Let's look at the diagram below for a better understanding.
Cluster Server: 172.16.1.250
Hostname: clserver.test.net

node01: 172.16.1.222
Hostname: nd01server.test.net

node02: 172.16.1.223
Hostname: nd02server.test.net
Clustering Diagram
In the above scenario, cluster management is done by a separate server, which handles the two nodes
as shown in the diagram. The cluster management server constantly sends heartbeat signals to
both nodes to check whether either one is failing. If one has failed, the other node takes over
its load.
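The heartbeat idea above can be sketched as a toy script. This is an illustration only, not how cman actually works: each "node" refreshes a timestamp file, and the manager treats a stale file as a failure (Linux-specific, since it uses GNU stat).

```shell
#!/bin/sh
# Toy sketch of the heartbeat idea only: real heartbeating is handled by the
# cluster stack (cman), not by scripts like this. Each "node" refreshes a
# timestamp file; the manager declares a node failed when the file has not
# been touched within TIMEOUT seconds.
TIMEOUT=5
HB_DIR=$(mktemp -d)

beat() {   # run on a node: refresh its heartbeat file
    touch "$HB_DIR/$1"
}

check() {  # run on the manager: report the node's state
    state="FAILED"
    if [ -f "$HB_DIR/$1" ]; then
        age=$(( $(date +%s) - $(stat -c %Y "$HB_DIR/$1") ))
        if [ "$age" -le "$TIMEOUT" ]; then
            state="up"
        fi
    fi
    echo "$1: $state"
}

beat nd01server
check nd01server   # nd01server just sent a beat, so it is reported up
check nd02server   # nd02server never beat, so it is reported FAILED
```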

Advantages of Clustering Servers


 Clustering servers is a completely scalable solution. You can add resources to the cluster
later on.
 If a server in the cluster needs maintenance, you can do it by stopping that server while handing
its load over to the other servers.
 Among high-availability options, clustering takes a special place since it is reliable and easy
to configure. If a server has a problem providing its services any longer, the
other servers in the cluster can take over the load.
Disadvantages of Clustering Servers
 Cost is high. Since a cluster needs good hardware and a good design, it will be costly
compared to a non-clustered server management design. Not being cost-effective is a
main disadvantage of this particular design.
 Since clustering needs more servers and hardware to establish, monitoring and
maintenance are harder, and the infrastructure grows accordingly.
Now let's see what kind of packages/installations we need to configure this setup successfully.
The following packages/RPMs can be downloaded from rpmfind.net.
 Ricci (ricci-0.16.2-75.el6.x86_64.rpm)
 Luci (luci-0.26.0-63.el6.centos.x86_64.rpm)
 Mod_cluster (modcluster-0.16.2-29.el6.x86_64.rpm)
 CCS (ccs-0.16.2-75.el6_6.2.x86_64.rpm)
 CMAN(cman-3.0.12.1-68.el6.x86_64.rpm)
 Clusterlib (clusterlib-3.0.12.1-68.el6.x86_64.rpm)
Let's see what each installation does for us.
 Ricci is a daemon used for cluster management and configuration. It
distributes/dispatches incoming messages to the configured nodes.
 Luci is a server that runs on the cluster management server and communicates with the
other nodes. It provides a web interface to make things easier.
 Mod_cluster is a load-balancer utility based on httpd services, and here it is used to
communicate incoming requests to the underlying nodes.
 CCS is used to create and modify the cluster configuration on remote nodes through ricci. It
is also used to start and stop the cluster services.
 CMAN is one of the primary utilities, other than ricci and luci, for this particular setup, since
it acts as the cluster manager. CMAN stands for Cluster Manager. It is a
high-availability add-on for Red Hat that is distributed among the nodes in the cluster.

Install and Configure Cluster with Two Nodes in Linux

As I said in my last article, we prefer three servers for this setup; one server acts as the cluster
server and the others as nodes.

In today's Part 2, we will see how to install and configure clustering on Linux. For this we need to
install the packages below on all three servers.
1. Ricci (ricci-0.16.2-75.el6.x86_64.rpm)
2. Luci (luci-0.26.0-63.el6.centos.x86_64.rpm)
3. Mod_cluster (modcluster-0.16.2-29.el6.x86_64.rpm)
4. CCS (ccs-0.16.2-75.el6_6.2.x86_64.rpm)
5. CMAN(cman-3.0.12.1-68.el6.x86_64.rpm)
6. Clusterlib (clusterlib-3.0.12.1-68.el6.x86_64.rpm)

Step 1: Installing Clustering in Linux


So let's start installing these packages on all three servers. You can easily install all of them
using the yum package manager.
I will start by installing the ricci package on all three servers.
# yum install ricci

Install Ricci Package


After the ricci installation is done, we can see it has installed mod_cluster and clusterlib as its
dependencies.
Ricci Installed Summary
Next, I'm installing luci using the yum install luci command.
# yum install luci

Install Luci Package


After the installation of luci, you can see it has installed the dependencies it needed.

Luci Package Installed Summary


Now, let's install the ccs package on the servers. For that I entered yum install ccs.x86_64, which is
shown in the list when I issued yum list | grep ccs, or else you can simply issue yum install ccs.

# yum install ccs


Install CCS Package
Let's install cman as the last requirement for this particular setup. The command is yum install
cman or yum install cman.x86_64, as shown in the yum list I mentioned earlier.
# yum install cman

Install CMAN Package


We need to confirm the installations are in place. Issue the command below to see whether the
packages we need are properly installed on all three servers.

# rpm -qa | egrep "ricci|luci|modc|cluster|ccs|cman"


All Packages Installed
Perfect, all the packages are installed, and all we need to do now is configure the setup.
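The same check can be wrapped in a small helper to run on each host. The PKG_CHECK variable is my own addition so the loop can be exercised on systems without rpm; on the real servers, leave it at the default rpm -q.

```shell
#!/bin/sh
# Sanity-check that the required cluster packages are present on this host.
# PKG_CHECK is a hypothetical override for trying the loop without rpm;
# the default "rpm -q" is what you want on the real servers.
check_cluster_pkgs() {
    for pkg in ricci luci modcluster ccs cman clusterlib; do
        if ${PKG_CHECK:-rpm -q} "$pkg" >/dev/null 2>&1; then
            echo "$pkg: installed"
        else
            echo "$pkg: MISSING"
        fi
    done
}

check_cluster_pkgs
```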

Step 2: Configure Cluster in Linux


1. As the first step in setting up the cluster, you need to start the ricci service on all three
servers.
# service ricci start
OR
# /etc/init.d/ricci start

Start Ricci Service on Cluster Server


Start Ricci On Node 01

Start Ricci On Node 02


2. Since ricci is started on all servers, now it's time to create the cluster. This is
where the ccs package comes to our help when configuring the cluster.
If you don't want to use the ccs commands, then you will have to edit the cluster.conf file
manually to add the nodes and do the other configuration. I guess the easiest way is to use the
following commands. Let's have a look.
Since I haven't created the cluster yet, there's no cluster.conf file in the /etc/cluster location
yet, as shown below.
# cd /etc/cluster
# pwd
# ls
Check Cluster Configuration File
In my case, I do this on 172.16.1.250, which is dedicated to cluster management. From now on,
every time we try to use the ricci server, it will ask for ricci's password. So you will have to set the
password for the ricci user on all servers.
Enter a password for the ricci user.

# passwd ricci

Set Ricci Password


Now enter the command shown below.

# ccs -h 172.16.1.250 --createcluster tecmint_cluster

You can see that after entering the above command, the cluster.conf file is created in the /etc/cluster directory.

Create Cluster Configuration


This is how my default cluster.conf looks before I do any configuration.

Cluster Configuration
3. Now let's add the two nodes to the system. Here too we use the ccs commands to make the
configuration. I'm not going to manually edit the cluster.conf file, but will use the following syntax.

# ccs -h 172.16.1.250 --addnode 172.16.1.222

Add Nodes to Cluster


Add the other node too.

# ccs -h 172.16.1.250 --addnode 172.16.1.223

Add Second Node to Cluster


This is how the cluster.conf file looks after adding the node servers.
Cluster Configuration with Nodes
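At this stage, the nodes section of the file should look roughly like the following (the fence settings inside each node are only added later, in Part 3):

```xml
<clusternodes>
  <clusternode name="172.16.1.222" nodeid="1"/>
  <clusternode name="172.16.1.223" nodeid="2"/>
</clusternodes>
```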
You can also enter the command below to verify the node details.

# ccs -h 172.16.1.250 --lsnodes

Confirm Cluster Node Details


Perfect. You have successfully created the cluster yourself and added two nodes. For further
details about the ccs command options, enter the ccs --help command and study the details.
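If ccs is unavailable for some reason, the same node listing can be pulled straight out of cluster.conf with standard tools. The sample file below is a hypothetical minimal conf mirroring what this guide has built so far:

```shell
#!/bin/sh
# Hypothetical sample conf mirroring the cluster built in this guide.
cat > /tmp/cluster.conf.sample <<'EOF'
<?xml version="1.0"?>
<cluster config_version="2" name="tecmint_cluster">
<clusternodes>
<clusternode name="172.16.1.222" nodeid="1"/>
<clusternode name="172.16.1.223" nodeid="2"/>
</clusternodes>
</cluster>
EOF

# Extract each node's name and id, similar to what --lsnodes reports.
sed -n 's/.*<clusternode name="\([^"]*\)" nodeid="\([^"]*\)".*/\1: nodeid=\2/p' \
    /tmp/cluster.conf.sample
```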

Fencing and Adding a Failover to Clustering

First of all, let's see what is meant by fencing and failover.


What is Fencing?
If we think of a setup with more than one node, it is possible that one or more nodes will
fail at some point. In this case, fencing is isolating the malfunctioning server from
the cluster in order to protect and secure the synced resources. Therefore we can add a fence
to protect the resources shared within the cluster.

What is Failover?
Imagine a scenario where a server holds important data for an organization, and the
stakeholders need the organization to keep that server up and running without any downtime.
In this case we can duplicate the data to another server (now there are two
servers with identical data and specs), which we can use as the failover.

If one of the servers goes down, the other server, which we have configured as
the failover, will take over the load and provide the services that were given by the first
server. With this method, users will not experience the downtime caused by the failure of
the primary server.

As we've already discussed our test environment setup in the last two articles, we're
using three servers for this setup; the first server acts as the cluster server and the other two as
nodes.

Step 1: How to Add Fencing to Cluster Server


1. First we have to enable fencing on the cluster server. For this I will use the two commands below.
# ccs -h 172.16.1.250 --setfencedaemon post_fail_delay=0
# ccs -h 172.16.1.250 --setfencedaemon post_join_delay=10

Enable Fencing on Cluster


As you can see, we use the ccs command to add the configuration to the cluster. The following are
definitions of the options I have used in the commands.
1. -h: Cluster host IP address.
2. --setfencedaemon: Applies the changes to the fencing daemon.
3. post_fail_delay: Time in seconds the daemon waits before fencing a victim server
when a node has failed.
4. post_join_delay: Time in seconds the daemon waits before fencing a victim server
when a node has joined the cluster.
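After these two commands, the change lands in cluster.conf as a single element, roughly as below. Note that ccs may drop attributes left at their defaults (post_fail_delay defaults to 0), which is why the full file at the end of this guide shows only post_join_delay:

```xml
<fence_daemon post_fail_delay="0" post_join_delay="10"/>
```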
2. Now let's add a fence device for our cluster; execute the command below to add it.
# ccs -h 172.16.1.250 --addfencedev tecmintfence agent=fence_virt

This is how I executed the command, and how the cluster.conf file looks after adding a
fence device.

Add Fencing Device in Cluster


You can execute the command below to see what kinds of fence options you can use to create a
fence device. I used fence_virt, since I use VMs for my setup.
# ccs -h 172.16.1.250 --lsfenceopts
Fence Options

Step 2: Add Two Nodes to Fence Device


3. Now I'm going to add a method to the created fence device and add the hosts to it.
# ccs -h 172.16.1.250 --addmethod Method01 172.16.1.222
# ccs -h 172.16.1.250 --addmethod Method01 172.16.1.223

You have to add the method you created a while ago for both of the nodes in your
setup. The following shows how I added the methods, and my cluster.conf.
Add Nodes to Fence Device
4. As the next step, you will have to add the fence methods you created for both nodes to
the fence device we created, namely "tecmintfence".
# ccs -h 172.16.1.250 --addfenceinst tecmintfence 172.16.1.222 Method01
# ccs -h 172.16.1.250 --addfenceinst tecmintfence 172.16.1.223 Method01

I have successfully associated my methods with the fence device, and this is how
my cluster.conf looks now.
Add Fence to Nodes
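For reference, each clusternode entry in cluster.conf now carries its fence method, along these lines (this matches the full file shown at the end of this guide):

```xml
<clusternode name="172.16.1.222" nodeid="1">
  <fence>
    <method name="Method01">
      <device name="tecmintfence"/>
    </method>
  </fence>
</clusternode>
```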
Now you have successfully configured the fence device and methods and added your nodes to them. As
the last step of Part 3, I will now show you how to add a failover to the setup.

Step 3: Add Failover to Cluster Server


5. I use the command below to create my failover domain for the cluster setup.
# ccs -h 172.16.1.250 --addfailoverdomain tecmintfod ordered
Add Failover to Cluster
6. Now that you have created the failover domain, you can add the two nodes to it.
# ccs -h 172.16.1.250 --addfailoverdomainnode tecmintfod 172.16.1.222 1
# ccs -h 172.16.1.250 --addfailoverdomainnode tecmintfod 172.16.1.223 2
Add Nodes to Cluster Failover
As shown above, you can see that cluster.conf bears all the configuration I have added for the
failover domain.

How to Sync Cluster Configuration and Verify Failover Setup in Nodes

We will start by adding resources to the cluster. In this case we can add a file system or a web
service as you need. I have the /dev/sda3 partition mounted at /x01, which I wish to add as a file
system resource.
1. I use the command below to add a file system as a resource:
# ccs -h 172.16.1.250 --addresource fs name=my_fs device=/dev/mapper/tecminttest_lv_vol01 mountpoint=/x01 fstype=ext3

Add Filesystem to Cluster
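Before relying on the resource, it can be worth a quick local sanity check that the mountpoint is really mounted on the node. The sketch below is my own addition, not part of the ccs toolset, and is Linux-specific since it reads /proc/mounts:

```shell
#!/bin/sh
# Sanity check before adding the filesystem resource: confirm the mountpoint
# is actually mounted on this node (Linux-specific, reads /proc/mounts).
is_mounted() {
    # field 2 of /proc/mounts is the mountpoint
    awk -v m="$1" '$2 == m { found = 1 } END { exit !found }' /proc/mounts
}

for mp in /x01; do
    if is_mounted "$mp"; then
        echo "$mp: mounted"
    else
        echo "$mp: NOT mounted, fix this before adding the fs resource"
    fi
done
```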

Additionally, if you want to add a service as well, you can do so using the method below. Issue the
following command.

# ccs -h 172.16.1.250 --addservice my_web domain=testdomain recovery=relocate autostart=1

You can verify it by viewing the cluster.conf file as we did in previous lessons.
2. Now add the following entry to the cluster.conf file to add a reference tag for the resource to the service.
<fs ref="my_fs"/>
Add Service to Cluster

3. All set. Now we will see how we can sync the configuration we made across the two
nodes in the cluster. The following command will do the needful.
# ccs -h 172.16.1.250 --sync --activate

Sync Cluster Configuration

Note: Enter the passwords we set for ricci in the early stages, when we were installing the packages.
You can verify your configuration using the command below.
# ccs -h 172.16.1.250 --checkconf

Verify Cluster Configuration
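Note that --checkconf compares the configuration across the nodes; it is not an XML validator. As an extra, purely local check, the sketch below (my own helper, assuming python3 is available) confirms the file is at least well-formed XML before you sync it:

```shell
#!/bin/sh
# Local well-formedness check for cluster.conf (my own helper, not a ccs command).
validate_conf() {
    if python3 -c 'import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1])' "$1" 2>/dev/null; then
        echo "$1: well-formed"
    else
        echo "$1: INVALID or missing"
    fi
}

# Demo on a hypothetical minimal conf.
printf '<cluster config_version="1" name="demo"/>\n' > /tmp/demo_cluster.conf
validate_conf /tmp/demo_cluster.conf
```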

4. Now it's time to start things up. You can use one of the commands below, as you prefer.
To start only one node, use the command with the relevant IP.

# ccs -h 172.16.1.222 start

Or if you want to start all nodes, use the --startall option as follows.

# ccs -h 172.16.1.250 --startall

You can use stop or --stopall if you need to stop the cluster.
In some scenarios you may want to start the cluster without enabling the resources (resources
are enabled automatically when the cluster starts), for example when you have intentionally
disabled the resources on a particular node in order to prevent fencing loops.

For that purpose, you can use the command below, which starts the cluster but does not enable the
resources.

# ccs -h 172.16.1.250 --startall --noenable

5. After the cluster has been started up, you can view the status by issuing the clustat command.
# clustat
Check Cluster Status

The above output says there are two nodes in the cluster, and both are up and running at the
moment.

6. You may remember that we added a failover mechanism in our previous lessons. Want to
check that it works? This is how you do it: force a shutdown of one node and look at the cluster
status using the clustat command for the results of the failover.
I shut down my node02 server (172.16.1.223) using the shutdown -h now command, then
executed the clustat command from my cluster server (172.16.1.250).

Check Cluster FailOver

The above output shows that node 1 is online while node 2 has gone offline, since we shut it down.
Yet the service and the file system we shared are still online, as you can see if you check them
on node01, which is online.
# df -h /x01

Verify Cluster Node

Refer to the cluster.conf file below for the whole configuration set relevant to our tecmint setup.

<?xml version="1.0"?>
<cluster config_version="15" name="tecmint_cluster">
  <fence_daemon post_join_delay="10"/>
  <clusternodes>
    <clusternode name="172.16.1.222" nodeid="1">
      <fence>
        <method name="Method01">
          <device name="tecmintfence"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="172.16.1.223" nodeid="2">
      <fence>
        <method name="Method01">
          <device name="tecmintfence"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman/>
  <fencedevices>
    <fencedevice agent="fence_virt" name="tecmintfence"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="tecmintfod" nofailback="0" ordered="1" restricted="0">
        <failoverdomainnode name="172.16.1.222" priority="1"/>
        <failoverdomainnode name="172.16.1.223" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <fs device="/dev/mapper/tecminttest_lv_vol01" fstype="ext3" mountpoint="/x01" name="my_fs"/>
    </resources>
    <service autostart="1" domain="testdomain" name="my_web" recovery="relocate"/>
    <fs ref="my_fs"/>
  </rm>
</cluster>

I hope you've enjoyed this whole series of clustering lessons. Keep in touch with Tecmint for more
handy guides every day, and feel free to comment with your ideas and queries.
