
Creating a Red Hat Cluster: Part 3

Here is the third in a series of articles describing how to create a Linux Red Hat/CentOS cluster. At the end of this article we will have a working cluster; all that will be left to do is the creation of the GFS filesystem and of the scripts that will start, stop and report the status of our FTP and web services. You can refer to “Part 1”, “Part 2” and our network diagram before reading this article, but for now let’s move on and continue our journey into building our cluster.

Defining the fencing devices
The fencing device is used by the cluster software to power off a node when it is considered to be in trouble. We need to define a fencing device for each of the nodes that we defined in the previous article. In our cluster, the fencing device used is the HP ILO interface. Select “HP ILO Device” from the pull-down menu and enter the device information needed to connect to it. If you would like to see which fencing devices are supported by Red Hat Cluster, you can consult the fencing FAQ on this page. You could use manual fencing for testing purposes, but it is not supported in a production environment, since manual intervention is required to fence a server.


We chose to prefix each fencing device name with a lowercase “f_”, followed by the name assigned to its IP in /etc/hosts. For each device, enter the login name used to access it, the password used to authenticate to it, and the host name, as defined in our hosts file, used to connect to it. Repeat the operation for each server in the cluster.
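For reference, once saved by the GUI these definitions end up in /etc/cluster/cluster.conf as “fencedevice” entries. Below is a minimal sketch of what they typically look like for the HP ILO fence agent; the ILO host names, login and password are placeholders (not our real values), and only “f_bilbo” is explicitly used later in this article, the other two names simply follow the same “f_” convention:

<fencedevices>
  <fencedevice agent="fence_ilo" name="f_bilbo" hostname="ilo-bilbo" login="Administrator" passwd="secret"/>
  <fencedevice agent="fence_ilo" name="f_gollum" hostname="ilo-gollum" login="Administrator" passwd="secret"/>
  <fencedevice agent="fence_ilo" name="f_gandalf" hostname="ilo-gandalf" login="Administrator" passwd="secret"/>
</fencedevices>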

Defining the fence level

Click on the “OK” button to close the screen above and then press the “Close” button. Next, we need to associate a fencing device with each server in our cluster. Let’s begin by clicking on the node name “hbbilbo.maison.ca”, then press the button named “Manage Fencing For This Node” at the bottom right of the screen. You will be presented with a screen similar to the one below. Now click on the “Add a New Fence Level” button; this will create a new fence level named “Fence-Level-1”. Next, select “Fence-Level-1” on the left of the screen and then click on the button “Add a New Fence to this Level”. A little pop-up will appear, allowing you to assign a fencing device to the host name. In our case we select the fencing device name “f_bilbo”. Back on this screen, press the “Close” button again to end the definition of our fence level. We have just created an association between a fencing device and a node. We need to repeat the operation for each node in our cluster. Once a fence level is created for all nodes, you can proceed with the next step.
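For reference, this association is recorded in /etc/cluster/cluster.conf as a fence method inside the node’s “clusternode” entry. For “hbbilbo.maison.ca” it looks like the sketch below (the nodeid and votes values are whatever was assigned when the node was defined in the previous article); we will come back to this exact block later in this article when we adjust the fencing action:

<clusternode name="hbbilbo.maison.ca" nodeid="1" votes="1">
  <fence>
    <method name="1">
      <device name="f_bilbo"/>
    </method>
  </fence>
</clusternode>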

Define Failover Domain

A failover domain is a named subset of cluster nodes that are eligible to run a cluster service in the event of a node failure. So let’s begin defining our failover domain by clicking on “Failover Domains” on the right side of the GUI and pressing the “Create a Failover Domain” button. Now enter the name of the failover domain. The name of our failover domains will always begin with “fd_” (lowercase); this is the standard that I chose, so in this case, for the “bilbo” server, the name will be “fd_bilbo”. Press the “OK” button to proceed. The failover domain configuration lists all the servers that will be part of the specified failover domain. We chose to have every member of the cluster listed in our failover domain, which means all of them for this cluster, but there can be situations where you have six nodes in a cluster and want one service to run on only three of the servers because they have more CPU and memory than the other three. So select each server, one at a time, from the selection list “Available Cluster Nodes”, until they have all been selected; the list will then display “No Cluster Nodes Available”. We chose to restrict the failover domain to the list of servers we have included. We also want the failover to be prioritized, so check the “Prioritized List” checkbox.
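For reference, here is a sketch of the resulting entry in /etc/cluster/cluster.conf for “fd_bilbo”, a restricted, prioritized domain containing all three nodes; the priorities shown match the order we assign in the next step, and the exact attributes written by the GUI may vary slightly:

<failoverdomains>
  <failoverdomain name="fd_bilbo" ordered="1" restricted="1">
    <failoverdomainnode name="hbbilbo.maison.ca" priority="1"/>
    <failoverdomainnode name="hbgandalf.maison.ca" priority="2"/>
    <failoverdomainnode name="hbgollum.maison.ca" priority="3"/>
  </failoverdomain>
</failoverdomains>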

You may then highlight the nodes listed in the “Member Node” box and click the “Adjust Priority” buttons to select the order in which the failover should occur. The node with the highest priority (priority 1) will be the initial node for running the service. If that node fails, the service will be relocated to the node with the next highest priority (priority 2), and so forth. The preferred server for the failover domain “fd_gollum” will be “hbgollum” (1); if it is not available, then our passive server “hbgandalf” (2) will take over, and in the eventuality that those two servers are not available, “hbbilbo” (3) will take over. For the failover domain “fd_bilbo”, “hbbilbo” will be prioritized (1); if that server is powered off, the service will be moved to “hbgandalf” (2), our passive node, and if it too is unavailable, the service will move to the server “hbgollum” (3). For the failover domain “fd_gandalf” we have chosen to give “hbgandalf” priority one, “hbgollum” priority two and “hbbilbo” priority three.

Defining cluster resources

Cluster resources can be IP addresses, scripts, databases needed to run a service, and NFS and/or GFS filesystem mount points. If you remember our cluster network diagram, we will be running two services in our cluster: an FTP server named “ftp.maison.ca” running at IP 192.168.1.204 and a web site “www.maison.ca” running at IP 192.168.1.211. Click on “Resources” on the left part of the screen and then click on the button “Create a Resource” at the lower right of the screen. From the drop-down list, select the “IP Address” resource type. So let’s define an “IP Resource” for our ftp server: enter the IP of our ftp server, “192.168.1.204”, and make sure that the “Monitor Link” check box is selected. That way, if for one reason or another this IP is no longer responding, the move of our ftp service to the next server defined in our failover domain for that node will be triggered. Repeat the process for our web site at IP 192.168.1.211.
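For reference, the two IP resources typically end up in /etc/cluster/cluster.conf looking like the sketch below, where monitor_link="1" corresponds to the “Monitor Link” check box:

<resources>
  <ip address="192.168.1.204" monitor_link="1"/>
  <ip address="192.168.1.211" monitor_link="1"/>
</resources>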

Defining our cluster services

We have decided that we would be running two services in our cluster, one for the ftp and one for the web site. Now is the time to define these services; we will begin by creating our ftp service. Click on “Services” on the bottom left of the screen and then press the button called “Create a Service” to begin the process. First, we need to assign a name to our service. We chose to prefix our service names with “srv_”; this will be our standard. We chose short names because they are nicer when displayed by the “clustat” command that we will see later on. So enter the service name “srv_ftp” and click on the “OK” button. Next you will see a screen similar to this one. First we need to assign a failover domain to our ftp service. As planned on our cluster network diagram, we are going to run the ftp service on the “gollum” server, so from the failover domain drop-down list we select the failover domain “fd_gollum” for our ftp service. Next we need to make our FTP server IP (192.168.1.204) part of the service. To do so, click on the button “Add a Shared Resource to this service” and select the IP used for our ftp service. After pressing the button, you should see a screen similar to this one. Click on the IP that our ftp server will use (192.168.1.204) and press the “OK” button.

One last thing before finishing the definition of our ftp service: make sure you select “Relocate” from the “Recovery Policy”. This allows our service to be moved automatically to another server if the FTP service becomes inaccessible; the IP will automatically be moved when the service is relocated to another server within the cluster. We can now press the “Close” button and repeat the operation to create our web service. Follow the same procedure as for the ftp service. The name that we decided to give the service for the web site is “srv_www”. The web site service “srv_www” will be part of the “fd_bilbo” failover domain, so it will start on the “bilbo” server. If the web service becomes unavailable, it will be relocated to another server within the cluster; the server to which the web service will be moved is based on the priority order we have put in the failover domain.
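For reference, here is a sketch of how the two services we just defined typically appear in /etc/cluster/cluster.conf; the autostart attribute is an assumption (it is the usual default), the rest matches what we selected in the GUI:

<service autostart="1" domain="fd_gollum" name="srv_ftp" recovery="relocate">
  <ip ref="192.168.1.204"/>
</service>
<service autostart="1" domain="fd_bilbo" name="srv_www" recovery="relocate">
  <ip ref="192.168.1.211"/>
</service>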

Propagate cluster configuration

We should now have a working cluster. It does not have the final configuration, but we will be able to bring it up and move services from one node to another; we will demonstrate that later in this article. The missing parts needed to finalize our cluster are the creation of the GFS filesystem and of the scripts that will bring our ftp and web site up and down. We will look at that in the next article. But for now, let’s push this configuration to our cluster by pressing the “Send to Cluster” button. This will copy our new configuration file “/etc/cluster/cluster.conf” to every member of the cluster and activate it. We will need to confirm our intention by pressing the “Yes” button on the pop-up that appears.

On my first attempt, I had some problems pushing the initial configuration to the other members of the cluster. If this happens, you may have to copy it manually. The cluster configuration file is stored in /etc/cluster and is named cluster.conf. If you ran the “system-config-cluster” GUI on the “bilbo” server, then issue the following commands on “bilbo” to copy the configuration file to the other servers:

scp /etc/cluster/cluster.conf gandalf:/etc/cluster
scp /etc/cluster/cluster.conf gollum:/etc/cluster

If the copy is done manually, you will have to restart the cluster services on each node or reboot all the nodes. You can also use the “ccs_tool” command to propagate the cluster configuration change (described below). Once you have a working cluster, pressing the “Send to Cluster” button will be enough; you will not have to manually copy the cluster configuration file.

Manual adjustment to cluster configuration file

I made the following changes to the cluster configuration file to prevent the reboot of a server after it is fenced (for the HP ILO). When a server is fenced, the service it was hosting is transferred to another node and the server is powered off (fenced). If we do not make the following change, the server that was powered off will automatically reboot, and perhaps the same error condition will occur again (network, switch, FC problem), so it would power off / power on / power off / power on… We want to eliminate that.

root@gandalf:~# vi /etc/cluster/cluster.conf

Increment the version number

Since we are changing the configuration file, we need to increment its version number. Every time you update the cluster configuration file manually, you need to increment the configuration version number, so that the change triggers an update on the other servers.

Before the change
<cluster alias="our_cluster" config_version="74" name="our_cluster">

After the change
<cluster alias="our_cluster" config_version="75" name="our_cluster">

Changes to prevent rebooting of the node after a power-off

We need to make this modification for each “clusternode name” section; adding action="off" to the fence device will prevent a restart of the node after it has been fenced.

Before the change
<clusternode name="hbbilbo.maison.ca" nodeid="1" votes="1">
  <fence>
    <method name="1">
      <device name="f_bilbo"/>
    </method>
  </fence>
</clusternode>

After the change
<clusternode name="hbbilbo.maison.ca" nodeid="1" votes="1">
  <fence>
    <method name="1">
      <device action="off" name="f_bilbo"/>
    </method>
  </fence>
</clusternode>

Increase resource manager verbosity

I include here something optional: if you are having problems with your cluster, increasing the verbosity of the resource manager daemon “rgmanager” may help you. Changing the <rm> line will add more debugging information to the /var/log/rgmanager file (syslogd needs to be restarted).

Before the change
</fencedevices>
<rm>
<failoverdomains>

After the change
</fencedevices>
<rm log_facility="local4" log_level="7">
<failoverdomains>

Distribute the new cluster configuration file

Whenever you update the configuration file manually, you can use the “ccs_tool” command to propagate the new cluster configuration file to all cluster members. Don’t forget to update the version number first.
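A minimal sketch of that propagation step, run on the node where cluster.conf was edited, once the config_version number has been incremented:

root@gandalf:~# ccs_tool update /etc/cluster/cluster.conf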

Checking cluster functionality

We can check the status of the cluster in two different ways. The simplest one is to use the “clustat” command, shown below. If you want a continuous display, you can add “-i 2” to the command to have the output refreshed every two seconds (press CTRL-C to stop the display). The name of each member of the cluster, their “Node ID” and their status are displayed, along with the state of each service. In the normal output, all our members are “Online” with the resource manager running on them, and we can see that our service “srv_ftp” is running (started) on “hbgollum” and that the service “srv_www” is running on “hbbilbo.maison.ca”.
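For example, on any node of the cluster:

root@bilbo:~# clustat          # one-shot status of members and services
root@bilbo:~# clustat -i 2     # refresh the display every 2 seconds (CTRL-C to stop)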

The second way to get the cluster status is to run the cluster GUI (system-config-cluster) and then click on the “Cluster Management” tab. In the upper part of the screen, we can see the current members of our cluster and their “Node ID”; below, we can see the status of each service defined within the cluster. Our two services “srv_ftp” and “srv_www” are running (started) on their selected servers. With the “Cluster Management” tab, we can also disable (stop), enable (start) or restart each of these services. More on this later.

Let’s check if our service IPs are alive. The IP address of the ftp service that we defined is “192.168.1.204”, and from the “clustat” command above we know that it is running on the server “gollum”. So let’s log on to that server and check if that IP is active. Be aware that the output of “ifconfig -a” does not include our FTP IP; we need to use the “ip” command to see it. From the output below, we can see that the server IP “192.168.1.104” is active on “eth0” and that our ftp server IP “192.168.1.204”, the “ftp.maison.ca” IP, is also active on the same interface. You will also notice that on “eth1” we have our heartbeat IP “10.10.10.104”. So our cluster software is doing its job.

root@gollum:~# ip addr list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:01:02:75:80:58 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.104/24 brd 192.168.1.255 scope global eth0
    inet 192.168.1.204/24 scope global secondary eth0
    inet6 fe80::201:2ff:fe75:8058/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:02:a5:b1:a0:d4 brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.104/24 brd 10.10.10.255 scope global eth1
    inet6 fe80::202:a5ff:feb1:a0d4/64 scope link
       valid_lft forever preferred_lft forever
4: sit0: <NOARP> mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0
root@gollum:~#

On the server “bilbo” we have our web IP (192.168.1.211) active on the interface “eth0”, along with the server IP (192.168.1.111); our heartbeat IP (10.10.10.111) is defined on the interface “eth1”.

root@bilbo:~# ip addr list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:50:da:68:df:9b brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.111/24 brd 192.168.1.255 scope global eth0
    inet 192.168.1.211/24 scope global secondary eth0
    inet6 fe80::250:daff:fe68:df9b/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:50:8b:f4:c5:59 brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.111/24 brd 10.10.10.255 scope global eth1
    inet6 fe80::250:8bff:fef4:c559/64 scope link
       valid_lft forever preferred_lft forever
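As a side note, the same disable/enable/relocate operations offered by the “Cluster Management” tab can also be done from the command line with the “clusvcadm” utility. A few hedged examples using our service and node names, shown here only as a sketch (we will rely on the GUI in this article):

root@bilbo:~# clusvcadm -d srv_ftp                         # disable (stop) the ftp service
root@bilbo:~# clusvcadm -e srv_ftp                         # enable (start) it again
root@bilbo:~# clusvcadm -r srv_www -m hbgandalf.maison.ca  # relocate the web service to another node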