Clustering means establishing connectivity among two or more servers so that they work as one. Clustering is a very popular technique among systems engineers: servers can be clustered as a failover system, a load-balancing system, or a parallel processing unit.
In this series of guides, I hope to walk you through creating a Linux cluster with two nodes
on RedHat/CentOS for a failover scenario.
Now that you have a basic idea of what clustering is, let's find out what it means when it
comes to failover clustering. A failover cluster is a set of servers that work together to maintain
the high availability of applications and services.
For example, if a server fails at some point, another node (server) will take over the load, and
end users will experience no downtime. For this kind of scenario, we need at
least 2 or 3 servers to make the proper configuration.
I prefer that we use 3 servers: one server as the Red Hat cluster enabled server and the others as nodes
(back-end servers). Look at the diagram below for a better understanding.
Cluster Server: 172.16.1.250
Hostname: clserver.test.net
node01: 172.16.1.222
Hostname: nd01server.test.net
node02: 172.16.1.223
Hostname: nd02server.test.net
Clustering Diagram
In the above scenario, cluster management is done by a separate server, and it handles two nodes
as shown in the diagram. The cluster management server constantly sends heartbeat signals to
both nodes to check whether either one is failing. If one has failed, the other node takes over
the load.
As I said in my last article, we prefer 3 servers for this setup; one server acts as the cluster
server and the others as nodes.
In today's Part 2, we will see how to install and configure clustering on Linux. For this, we need to
install the packages below on all three servers.
1. Ricci (ricci-0.16.2-75.el6.x86_64.rpm)
2. Luci (luci-0.26.0-63.el6.centos.x86_64.rpm)
3. Mod_cluster (modcluster-0.16.2-29.el6.x86_64.rpm)
4. CCS (ccs-0.16.2-75.el6_6.2.x86_64.rpm)
5. CMAN(cman-3.0.12.1-68.el6.x86_64.rpm)
6. Clusterlib (clusterlib-3.0.12.1-68.el6.x86_64.rpm)
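On RHEL/CentOS 6 all of these are available from the base repositories, so rather than downloading the individual RPM files listed above, you can install them in one go (package names as listed; adjust if your repository layout differs):

```shell
# Install the cluster packages on all three servers
yum install -y ricci luci modcluster ccs cman clusterlib
```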
After the packages are installed, set a password for the ricci user on all three servers; ccs authenticates through ricci when making configuration changes.
# passwd ricci
Cluster Configuration
3. Now let's add the two nodes to the system. Here too we use ccs commands to make the
configuration. I'm not going to manually edit the cluster.conf file, but will use the following syntax.
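Assuming the cluster configuration lives on the management server (172.16.1.250), adding our two nodes with ccs would look something like this:

```shell
# Add both nodes to the cluster configuration via the cluster server
ccs -h 172.16.1.250 --addnode 172.16.1.222
ccs -h 172.16.1.250 --addnode 172.16.1.223
```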
What is Failover?
Imagine a scenario where a server holds important data for an organization, and the
stakeholders need the organization to keep that server up and running without any downtime.
In this case, we can duplicate the data to another server (so there are two
servers with identical data and specs) which we can use as the failover.
If by any chance one of the servers goes down, the other server, which we have configured as
the failover, will take over the load and provide the services that were given by the first
server. With this method, users will not experience the downtime caused by the failure of
the primary server.
As we've already discussed our testing environment setup in the last two articles, we're
using three servers for this setup; the first server acts as the cluster server and the other two as
nodes.
This is how I executed the command, and how the cluster.conf file looks after adding a
fence device.
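For reference, a fence device matching our final cluster.conf (named tecmintfence, using the fence_virt agent) can be added with a command along these lines; adjust the agent to match the fencing hardware in your environment:

```shell
# Add a fence device named "tecmintfence" using the fence_virt agent
ccs -h 172.16.1.250 --addfencedev tecmintfence agent=fence_virt
```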
You have to add the methods you created a while ago to both nodes in your
setup. The following is how I added the methods, and my cluster.conf.
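Assuming the method is named Method01, as it is in our final cluster.conf, creating it for each node would look something like this:

```shell
# Create fence method "Method01" for each node
ccs -h 172.16.1.250 --addmethod Method01 172.16.1.222
ccs -h 172.16.1.250 --addmethod Method01 172.16.1.223
```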
Add Nodes to Fence Device
4. As the next step, you will have to add the fence methods you created for both nodes to
the fence device we created, namely "tecmintfence".
# ccs -h 172.16.1.250 --addfenceinst tecmintfence 172.16.1.222 Method01
# ccs -h 172.16.1.250 --addfenceinst tecmintfence 172.16.1.223 Method01
I have successfully associated my methods with the fence device, and this is how
my cluster.conf looks now.
Add Fence to Nodes
Now you have successfully configured the fence device and methods and added your nodes to them. As
the last step of Part 03, I will now show you how to add a failover to the setup.
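A failover domain like the tecmintfod domain in our final cluster.conf (ordered, unrestricted, no failback, with node01 at the higher priority) could be created along these lines:

```shell
# Create an ordered, unrestricted failover domain named "tecmintfod"
ccs -h 172.16.1.250 --addfailoverdomain tecmintfod ordered
# Add both nodes with priorities (a lower number means higher priority)
ccs -h 172.16.1.250 --addfailoverdomainnode tecmintfod 172.16.1.222 1
ccs -h 172.16.1.250 --addfailoverdomainnode tecmintfod 172.16.1.223 2
```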
Additionally, if you want to add a service as well, you can do so using the methodology below. Issue the
following command.
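Based on the resource and service entries that appear in our final cluster.conf (an ext3 filesystem mounted at /x01 and a service named my_web), the commands would look something like the following sketch; the device path and names come from that file, so adjust them to your own setup:

```shell
# Define a filesystem resource matching the final cluster.conf
ccs -h 172.16.1.250 --addresource fs name=my_fs \
    device=/dev/mapper/tecminttest_lv_vol01 mountpoint=/x01 fstype=ext3
# Define the service that will be relocated on failover
ccs -h 172.16.1.250 --addservice my_web domain=tecmintfod recovery=relocate autostart=1
```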
You can verify it by viewing the cluster.conf file as we did in previous lessons.
2. Now enter the following entry in the cluster.conf file to add a reference tag to the service.
<fs ref="my_fs"/>
Add Service to Cluster
3. All set. Now we will see how to sync the configuration we made to the cluster between the 2
nodes we have. The following command will do the job.
# ccs -h 172.16.1.250 --sync --activate
Note: Enter the passwords we set for ricci in the early stages, when we were installing the packages.
You can verify your configuration by using the command below.
# ccs -h 172.16.1.250 --checkconf
4. Now it's time to start things up. You can use one of the commands below, as you prefer.
To start only one node, use the command with the relevant IP.
You can use --stop or --stopall if you need to stop the cluster.
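The start and stop commands referred to above would look something like this:

```shell
# Start the cluster services on all nodes
ccs -h 172.16.1.250 --startall
# Start only one node (use the relevant node IP)
ccs -h 172.16.1.222 --start
# Stop one node, or the whole cluster
ccs -h 172.16.1.222 --stop
ccs -h 172.16.1.250 --stopall
```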
There may be a scenario where you want to start the cluster without enabling the resources (resources
are enabled automatically when the cluster is started), such as when you have
intentionally disabled the resources on a particular node in order to break a fencing loop and
don't want those resources enabled when the cluster starts.
For that purpose you can use the command below, which starts the cluster but does not enable the
resources.
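With ccs, this would be something like:

```shell
# Start the whole cluster without enabling resources
ccs -h 172.16.1.250 --startall --noenable
```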
5. After the cluster has been started up, you can view the status by issuing the clustat command.
# clustat
Check Cluster Status
The above output shows that there are two nodes in the cluster and both are up and running at the
moment.
6. You may remember that we added a failover mechanism in our previous lessons. Want to
check that it works? This is how you do it: force-shutdown one node and check the cluster status
using the clustat command to see the results of the failover.
I have shut down my node02server (172.16.1.223) using the shutdown -h now command, then
executed the clustat command from my cluster_server (172.16.1.250).
The above output shows that node 1 is online while node 2 has gone offline, as we shut it down.
Yet the service and the file system we shared are still online, as you can see if you check them
on node01, which is online.
# df -h /x01
Refer to the cluster.conf file below for the whole configuration set relevant to the setup we used for tecmint.
<?xml version="1.0"?>
<cluster config_version="15" name="tecmint_cluster">
  <fence_daemon post_join_delay="10"/>
  <clusternodes>
    <clusternode name="172.16.1.222" nodeid="1">
      <fence>
        <method name="Method01">
          <device name="tecmintfence"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="172.16.1.223" nodeid="2">
      <fence>
        <method name="Method01">
          <device name="tecmintfence"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman/>
  <fencedevices>
    <fencedevice agent="fence_virt" name="tecmintfence"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="tecmintfod" nofailback="0" ordered="1" restricted="0">
        <failoverdomainnode name="172.16.1.222" priority="1"/>
        <failoverdomainnode name="172.16.1.223" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <fs device="/dev/mapper/tecminttest_lv_vol01" fstype="ext3" mountpoint="/x01" name="my_fs"/>
    </resources>
    <service autostart="1" domain="testdomain" name="my_web" recovery="relocate">
      <fs ref="my_fs"/>
    </service>
  </rm>
</cluster>
I hope you enjoyed the whole series of clustering lessons. Keep in touch with tecmint for more
handy guides every day, and feel free to comment with your ideas and queries.