On RHEL/CentOS/RockyLinux
Last updated on March 17th, 2023 - by LinuxTeck
This article will help you learn how to set up and configure a High-Availability (HA) cluster on
Linux/Unix-based systems. A cluster is simply a group of computers (called nodes or
members) that work together to execute a task. There are four main types of
clusters: Storage clusters, High-Availability clusters, Load-Balancing
clusters, and High-Performance Computing clusters. In production, HA (High-
Availability) and LB (Load-Balancing) clusters are the most commonly deployed types.
They offer uninterrupted availability of services and data (for example, web services)
to the end-user community. HA cluster configurations are
commonly grouped into two subsets: Active-Active and Active-Passive.
Active-Active: Typically you need a minimum of two nodes, and both nodes actively
run the same service/application. This is mainly used to achieve a Load-
Balancing (LB) cluster that distributes workloads across the nodes.
Active-Passive: This also needs a minimum of two nodes, but only one node actively
runs the service while the other stands by for failover; when the active node fails,
the passive node takes over. This is the model used for the failover cluster in this guide.
This step-by-step guide will show you how to configure a High-Availability (HA)/fail-
over cluster with common iSCSI shared storage on RHEL/CentOS 7.6. You can use the
same guide for all versions of RHEL/CentOS/Fedora with a few minimal changes.
Prerequisites:
Operating System : CentOS Linux 7
Shared Storage : iSCSI SAN
Floating IP Address : for cluster nodes
Packages : pcs, fence-agents-all and targetcli
My Lab Setup :
For the lab setup, I am using three CentOS machines: two for the cluster nodes and one for
the iSCSI Target Server.
On the iSCSI Target Server:
Use the following command to check the block devices available for use on the storage
server.
# lsblk
Output:
The above command lists all the block devices (/dev/sda and /dev/sdb) in a
tree format. In our demo, I will be using the 1GB disk "/dev/sdb" as shared storage for the
cluster nodes.
Note:
Shared storage is an important resource for all high-availability clusters, as it needs to
provide the same application data to all the nodes in the cluster, and it must be
accessible either sequentially or simultaneously while an application runs in the cluster. SAN
storage is widely used in production. For our lab, we will use iSCSI shared storage for our
HA cluster.
Add the following entries to the /etc/hosts file in the format "IP-address Domain-
name [Domain-aliases]". This helps resolve hostnames locally, mapping each IP
address to a hostname without requiring a DNS lookup.
# vi /etc/hosts
Note:
The fields are separated by at least one space or tab. The first field is the numeric IP
address, the second field is the locally-known hostname associated with that IP address,
and the third field holds optional aliases (alternate names) for the given hostname.
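As an illustrative sketch, using the node hostnames that appear later in this guide (the IP addresses and the iSCSI server hostname are placeholders; substitute your own):

```
192.168.1.10   node1.lteck.local   node1
192.168.1.20   node2.lteck.local   node2
192.168.1.30   iscsi.lteck.local   iscsi
```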
To learn more about DNS, see: How to set up Domain Name Services (DNS) on
Linux.
First, let's update to the latest current version and then install the target utility package.
# yum update -y
# yum install -y targetcli
Now follow the below command to get into the interactive shell of the iSCSI Server.
# targetcli
/> cd /iscsi/iqn.2020-01.local.server-iscsi:server/tpg1
/iscsi/iqn.20...i:server/tpg1> cd acls/iqn.2020-01.local.client-iscsi:client1
/iscsi/iqn.20...iscsi:client1> cd /
/> ls
/> saveconfig
/> exit
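The session above only navigates into an existing target and saves the configuration; the creation steps are not shown. A minimal sketch of the full sequence, assuming the IQNs used in this guide and a hypothetical backstore name of "shared_disk" (all commands entered at the targetcli prompt):

```
# targetcli
/backstores/block create name=shared_disk dev=/dev/sdb
/iscsi create iqn.2020-01.local.server-iscsi:server
cd /iscsi/iqn.2020-01.local.server-iscsi:server/tpg1
luns/ create /backstores/block/shared_disk
acls/ create iqn.2020-01.local.client-iscsi:client1
cd acls/iqn.2020-01.local.client-iscsi:client1
set auth userid=linuxteck password=password@123
cd /
saveconfig
exit
```

The CHAP credentials shown here match the ones configured later in /etc/iscsi/iscsid.conf on the initiator nodes.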
(f) Add a firewall rule to permit iSCSI port 3260, or disable the firewall
# firewall-cmd --list-all
OR
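Neither the rule-adding command nor the disable alternative appears above; a sketch of both options:

```shell
# Option 1: open the iSCSI port and reload the rules
firewall-cmd --permanent --add-port=3260/tcp
firewall-cmd --reload
# Option 2: disable the firewall entirely (not recommended in production)
systemctl disable --now firewalld
```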
Note:
That's it for the iSCSI target configuration. A detailed walkthrough of the iSCSI Server-Client
setup on CentOS/RHEL 7.6 is available in a separate article.
Add the following host entries to all the nodes and the shared storage server in the cluster. This
helps the systems communicate with each other using hostnames.
Node 1:
# vi /etc/hosts
Node 2:
# vi /etc/hosts
# yum update -y
(ii) Install the iscsi-initiator package on both nodes (Node1 and Node2)
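The yum command for this step is not shown; on CentOS/RHEL 7 the initiator is provided by the iscsi-initiator-utils package:

```shell
yum install -y iscsi-initiator-utils
```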
# vi /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.2020-01.local.client-iscsi:client1
(iv) Save and restart the iscsid service on both nodes
# vi /etc/iscsi/iscsid.conf
node.session.auth.authmethod = CHAP
node.session.auth.username = linuxteck
node.session.auth.password = password@123
Save the file:
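Step (iv) mentions restarting the iscsid service, but the command itself is not shown; it would typically be:

```shell
systemctl restart iscsid
systemctl enable iscsid
```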
(vi) Now is the time to Discover the iSCSI Shared Storage (LUNs) on both nodes
(Node1 and Node2)
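The discovery command itself is not shown; with iscsiadm it would look like this (192.168.1.30 is a placeholder for your target server's IP):

```shell
iscsiadm -m discovery -t sendtargets -p 192.168.1.30
```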
Note:
The LUN was successfully discovered on both nodes (iQNs).
(vii) Use the following command to log in to the Target Server:
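The login command itself does not appear below; with iscsiadm it would typically be (the target IP is a placeholder):

```shell
iscsiadm -m node -T iqn.2020-01.local.server-iscsi:server -p 192.168.1.30 --login
```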
# lsblk
Note:
The new disk drive "sdb" with 1GB volume size is visible now on both of the nodes (Node1 and
Node2).
(ix) Use the following command to create a filesystem on the newly added block device
(/dev/sdb) from any one of your nodes, either Node1 or Node2. In our demo, I will run it on
Node1.
# mkfs.xfs /dev/sdb
Note:
Before moving on to install the cluster packages, we need to ensure that our shared storage is
accessible on all the nodes with the same data.
For testing purposes, use the following steps to mount the newly added disk
temporarily on the /mnt directory and create three files named "1", "2" and "3", then use the
'ls' command to verify the files exist in the /mnt directory, and finally unmount /mnt
on Node1.
# mount /dev/sdb /mnt
# cd /mnt
[root@node1 mnt]# touch 1 2 3
[root@node1 mnt]# ls
1  2  3
[root@node1 mnt]# cd
[root@node1 ~]# umount /mnt
Note:
We have confirmed that our shared storage works on all the available nodes in the cluster; in our
case, it works perfectly on both Node1 and Node2. We have successfully presented
the LUN "/dev/sdb" on both nodes. That's it. Now let's move on to the cluster setup.
(b) Install and configure Cluster Setup
(i) Use the following command to Install cluster Packages (pacemaker) on both nodes
(Node1 and Node2)
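The installation command itself is not shown; on CentOS/RHEL 7 it would typically be (run on both nodes):

```shell
yum install -y pcs pacemaker fence-agents-all
```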
Note:
Once you have successfully installed the packages on both nodes, then configure the firewall
service to permit the High-Availability application to have a direct connection between the nodes
(Node1 and Node2). If you wish not to apply any firewall rules, then simply disable it.
# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --reload
# firewall-cmd --list-all
(ii) Now, start the cluster service and enable it for every reboot on both nodes (Node1
and Node2).
# systemctl start pcsd
# systemctl enable pcsd
Note:
The real purpose of the "hacluster" user is to allow communication between the nodes. This
user (hacluster) is created during the installation of the cluster software itself. For the nodes
to communicate properly, we need to set a password for this account. It is recommended to use
the same password on all nodes.
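The password-setting step itself is not shown; on both nodes it would be:

```shell
passwd hacluster
```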
(iv) Use the following command to authorize the nodes. Execute it to only one of your
nodes in the Cluster. In our case, I would prefer to run it on Node1.
Username: hacluster
Password:
node2.lteck.local: Authorized
node1.lteck.local: Authorized
Note:
The above command authenticates pcs to the pcsd daemon across the nodes in the
cluster. Authentication only needs to be done once. The token (authorization) key file is saved
in one of these paths: ~/.pcs/tokens or /var/lib/pcsd/tokens.
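The authentication command referenced in step (iv) is not shown above; with the pcs 0.9 series shipped on CentOS/RHEL 7, it would be:

```shell
pcs cluster auth node1.lteck.local node2.lteck.local
```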
(v) Start and configure Cluster Nodes. Execute the following command to only one of
your nodes. In our case, Node1
Note:
Using the above command we have enabled Clusters on both nodes. Next, before adding the
resources to the top of the cluster, we need to check the status of the Clusters.
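The setup and start commands for step (v) are not shown; a sketch using the cluster name that appears later in the properties output would be:

```shell
pcs cluster setup --name linuxteck_cluster node1.lteck.local node2.lteck.local
pcs cluster start --all
pcs cluster enable --all
```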
(vii) Use the following command to get the simple cluster status:
# pcs cluster status
Cluster Status:
Stack: corosync
Current DC: node1.lteck.local (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with
quorum
Last updated: Wed Mar 11 19:46:41 2020
Last change: Wed Mar 11 18:58:35 2020 by hacluster via crmd on node1.lteck.local
2 nodes configured
0 resources configured
PCSD Status:
node1.lteck.local: Online
node2.lteck.local: Online
Note:
The command above lists only the basic status of your cluster. The following command gives
detailed information about the cluster, including the nodes, the status of the pcs daemons and
the resources.
# pcs status
Output:
WARNINGS:
No stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: node1.lteck.local (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with
quorum
Last updated: Wed Mar 11 19:47:06 2020
Last change: Wed Mar 11 18:58:35 2020 by hacluster via crmd on node1.lteck.local
2 nodes configured
0 resources configured
No resources
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Note:
Based on the output above, we can see that the cluster setup is working perfectly on both
nodes, but no resources are configured yet. Next, let's add a few resources to
complete the cluster setup. Before moving forward, let's verify the cluster configuration.
# crm_verify -L -V
WARNING:
The above output will report errors like "unpack_resources". This means the tool has found
errors regarding the fencing setup, since STONITH is enabled by default. For our demo setup,
we will disable this feature. Note that the option "stonith-enabled=false" is not
recommended for a production cluster setup.
(viii) Setup Fencing
Fencing, also known as STONITH ("Shoot The Other Node In The Head"), is one of the
important tools in a cluster; it safeguards against data corruption on the
shared storage. Fencing plays a vital role when the nodes are unable to talk to each
other, detaching shared storage access from the faulty node. There are two
types of fencing: resource-level fencing and node-level fencing.
For this demo, I am not going to run fencing (STONITH), as our machines are running
in a VMware environment, which doesn't support it. For those implementing
in a production environment, see a dedicated fencing setup guide.
Use the following command to disable the STONITH and ignore the quorum policy and
check the status of Cluster Properties to ensure both are disabled:
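The property commands themselves are not shown above; they would be:

```shell
pcs property set stonith-enabled=false
pcs property set no-quorum-policy=ignore
pcs property list
```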
Output:
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: linuxteck_cluster
dc-version: 1.1.20-5.el7_7.2-3c4c782f70
have-watchdog: false
no-quorum-policy: ignore
stonith-enabled: false
Note:
The output of Cluster Properties shows that both STONITH and the quorum policy are disabled.
(ix) Resources / Cluster Services
For clustered services, a resource can be either a physical hardware unit, such as a
disk drive, or a logical unit, like an IP address, a filesystem or an application. In a cluster, a
resource runs on only a single node at a time. In our demo we will use the
following resources:
Httpd Service
IP Address
Filesystem
First, let us install and configure the Apache server on both nodes (Node1 and Node2).
Follow these steps:
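The Apache installation command itself is not shown; on both nodes it would be:

```shell
yum install -y httpd
```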
# vi /etc/httpd/conf/httpd.conf
<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from 127.0.0.1
</Location>
Save the file.
Note:
In order to store the Apache files (HTML/CSS), we need to use our centralized storage unit (i.e., the iSCSI
server). This setup only has to be done on one node; in our case, Node1.
# mount /dev/sdb /var/www/
# mkdir /var/www/html
# umount /var/www
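The sample web page that is tested at the end of this guide is never explicitly created above; a sketch with hypothetical page content would be:

```shell
mount /dev/sdb /var/www
echo "<h1>LinuxTeck HA Cluster Test Page</h1>" > /var/www/html/index.html
umount /var/www
```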
Note:
That's it for the Apache configuration. Use the following command to add a firewall rule for the Apache
service on both nodes (Node1 and Node2), or simply disable the firewall. A detailed
Apache LAMP setup on CentOS/RHEL 7.6 is covered in a separate article.
# firewall-cmd --permanent --add-port=80/tcp
# firewall-cmd --reload
# firewall-cmd --list-all
OR
# systemctl disable --now firewalld
(x) Create Resources. In this section, we will add three cluster resources: a filesystem
resource named APACHE_FS, a floating IP address resource named
APACHE_VIP, and a webserver resource named APACHE_SERV. Use the following
commands to add the three resources to the same group.
(i) Add the first resource: a Filesystem resource backed by the shared storage (iSCSI
server)
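The pcs commands for the three resources are not shown above; a sketch using the resource names from this section (the group name "apache_group" and the virtual IP 192.168.1.100 are placeholders):

```shell
# Filesystem resource on the shared iSCSI disk
pcs resource create APACHE_FS Filesystem device="/dev/sdb" directory="/var/www" fstype="xfs" --group apache_group
# Floating (virtual) IP address resource
pcs resource create APACHE_VIP IPaddr2 ip=192.168.1.100 cidr_netmask=24 --group apache_group
# Apache web server resource
pcs resource create APACHE_SERV apache configfile="/etc/httpd/conf/httpd.conf" statusurl="http://127.0.0.1/server-status" --group apache_group
```

Placing all three resources in one group keeps them together on the same node and starts them in order: filesystem, then IP, then web server.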
Note:
After the resources and the resource group creation, start the cluster.
# pcs cluster start --all
Output:
node1.lteck.local: Starting Cluster (corosync)...
node2.lteck.local: Starting Cluster (corosync)...
node2.lteck.local: Starting Cluster (pacemaker)...
node1.lteck.local: Starting Cluster (pacemaker)...
Note:
The above output clearly shows that both the corosync and pacemaker services have started
on both nodes (Node1 and Node2) in the cluster. You can check the status using the following
command:
# pcs status
Output:
2 nodes configured
3 resources configured
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Note:
The above output clearly shows that the cluster is up and all three resources are running on the
same node (Node1). Now we can use the Apache virtual IP address to reach the sample web
page created earlier.
(xi) Test High-Availability (HA)/Failover Cluster
The final step in our High-Availability cluster is the failover test: we manually
stop the cluster on the active node (Node1), check the status from Node2, and try to access our
webpage using the virtual IP.
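The command to stop the cluster on Node1 is not shown; it would be:

```shell
pcs cluster stop node1.lteck.local
```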
Note:
As you can see, Node1 has completely stopped the cluster service. Now move on to
Node2 and verify the cluster status.
[root@node2 ~]# pcs status
Output:
2 nodes configured
3 resources configured
Online: [ node2.lteck.local ]
OFFLINE: [ node1.lteck.local ]
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Note:
As you can see, all three resources have migrated to Node2. If you now go to the
browser and access the webpage using the same virtual IP, the page still loads. That's all.
Based on this article, you can even create and test an HA cluster with more than two nodes in Linux.
Additionally, here are some more cluster commands that may help you
manage your cluster.
Start or stop the cluster (using the '--all' option will start/stop all the nodes
across your cluster):
# pcs cluster start --all
# pcs cluster stop --all
Check the quorum status:
# corosync-quorumtool
Cluster configuration file:
# /etc/corosync/corosync.conf
Find the status of the resources:
# pcs status resources
I hope this article helped you understand a few things about the 'HA/Failover
Cluster'. Drop me your feedback/comments. If you like this article, kindly share it; it
may help others as well.
Thank you!