Building a redundant environment for high availability with AIX
There are some types of computing environments in which you can't afford downtime—the
applications and data are so important that if one machine dies, you want another to be able to
pick up and immediately take over. Fortunately, in IBM® AIX®, a special piece of software
called PowerHA can provide redundancy and high availability to meet these needs. This article
provides an introduction to PowerHA and shows how to set up and configure a simple two-node
cluster.
PowerHA at work
PowerHA is designed to keep resources highly available with minimum downtime by gathering
resources in ways that allow multiple IBM System p servers to access them. PowerHA manages
disk, network, and application resources logically, passing control to individual machines based
on availability and preference. From a systems administration point of view, the main concept
behind PowerHA is to keep everything as redundant as possible to ensure that there is high
availability at all levels.
In a typical configuration, two System p servers share a common set of SAN storage and communicate on two networks. They share between them a set of IP addresses, some Logical Volume Manager (LVM) resources, and application controls, all managed by PowerHA.
One of these servers is considered to be "active" and is in control of these resources, while the
other is idle and sits ready in case it is needed, as shown in Figure 2.
Figure 2. Active and idle servers
When a problem occurs with the availability of some of the physical resources, such as some
wires being accidentally unplugged, PowerHA senses the errors and makes the other server take
over. There is a momentary pause in the availability of the resources, but then everything comes
up as though it were on the original machine, and no one can tell the difference, as shown in
Figure 3.
Once the hardware becomes available again, the resources can remain where they are or go back
to the original server. It is completely at the discretion of the administrator.
However, hardware failures aren't the only reason for making resources move from one server to
another. You can also use this technology for things like operating system upgrades, firmware
maintenance, or other activities that may require downtime, all of which add to the versatility
and usefulness of PowerHA.
Terms such as node, resource group, heartbeat, and failover are used throughout this article and are helpful to know when discussing PowerHA.
Prep work
A number of steps must take place before you can configure a PowerHA cluster and make it
available. The first step is to make sure that the hardware you will be using for the two servers is
as similar as possible. The number of processors, the quantity of memory, and the types of Fibre
Channel and Ethernet adapters should all be the same. If you are using logical partition (LPAR)
or virtual I/O (VIO) technology, be consistent: Don't mix hardware strategies like logical Host
Ethernet Adapters (LHEA) on one node with standard four-port Ethernet adapters on the other.
No development servers
I have seen many environments in a number of different companies over the years in which the
decision is made to declare one of the nodes in a cluster a "production" server and the other a
"development" server. This decision is typically made because companies decide that having a
server sit idle more than 90 percent of the time in case of a disaster is a waste of money. I cannot
stress this enough: DO NOT DO THIS. When this strategy is used, invariably differences in the
two servers arise, as development causes differences in software, applications, and operating
system functions. And when the time comes that the production resource group has to be failed
over to the development server (because it's always a matter of when, not if), those differences
will prevent things from running correctly.
The second step, which should coincide with the first, is to size the environment in such a way
that each node can manage all the resource groups simultaneously. If you decide that you will
have multiple resource groups running in the cluster, assume a worst-case scenario where one
node will have to run everything at once. Ensure that the servers have adequate processing power
to cover everything.
Third, you need to assign and/or share the same set of resources to each server. If you use SAN
disks for storage, the disks for the shared volume groups need to be zoned to all nodes. The
network VLANs, subnets, and addresses should be hooked up in the same fashion. Work with
your SAN and network administrators to get addresses and disks for the boot, persistent, and
service IP addresses.
Fourth and finally, the entire operating system configuration must match between the nodes. The
user IDs, third-party software, technology levels, and service packs need to be consistent. One of
the best ways to make this happen is to build out the intended configuration on one node, make a
mksysb backup, and use that to build out all subsequent nodes. Once the servers are built,
consider them joined at the hip: make changes on both servers consistently all the time.
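As a rough sketch, taking that golden-image backup might look like this (the /backup path and node name are illustrative):

# Regenerate the /image.data file (-i), then write a bootable
# system backup of rootvg to a file
mksysb -i /backup/node1.mksysb

That image can then be restored onto the second node through NIM or bootable media, giving you two nodes that start from an identical operating system build.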
With all of the virtualization technology available today, it's far more worthwhile to use VIO to
create a pair of production and development LPARs on the same set of System p servers and
hardware resources than to try to save a few dollars at the expense of sacrificing the true purpose
for which PowerHA was designed. Use things like shared processor weights, maximum
transmission unit (MTU) sizes, and RAM allocation to give the production LPARs more clout
than the development LPARs. Doing so creates an environment that can handle a failover and
assures managers and accountants that finances are being used wisely.
Configuring the cluster
Now for the actual work. In this example, you set up a simple two-node cluster across two Ethernet networks, with one shared volume group on a SAN disk, a second SAN disk for a heartbeat, and an application managed by PowerHA in one resource group.
Note: This process assumes that all IP addresses have been predetermined and that the SAN
zoning of the disks is complete. Unless otherwise stated, you must run the tasks here on each and
every node of the cluster.
Step 1. Install the PowerHA software
You can purchase this software from IBM directly (see Resources for a link); the file sets all start with the word cluster. Use the installp command to install the software, much like any other licensed program package (LPP).
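For instance, if the installation images were copied to a directory such as /tmp/powerha (an assumed location), the installation might look like this:

# Preview (-p) the apply first to check prerequisites
installp -apgXd /tmp/powerha all
# Apply (-a) and commit (-c), pulling in requisites (-g) and
# expanding file systems as needed (-X)
installp -acgXd /tmp/powerha all
# Verify that the cluster file sets are installed
lslpp -l "cluster.*"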
Step 2. Edit the /etc/hosts and rhosts files
Put all of the IP addresses associated with the cluster (boot, persistent, and service) into the /etc/hosts file on each node of the cluster. Do the same with the /usr/es/sbin/cluster/etc/rhosts file. Verify that the server hostnames match the appropriate IP addresses; each server's hostname should also match its persistent IP address.
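The result might look something like this on both nodes (all names and addresses here are made up for illustration):

# /etc/hosts - identical on both nodes
10.10.10.1   node1_boot1     # boot address, network 1
10.10.10.2   node2_boot1
10.10.20.1   node1_boot2     # boot address, network 2
10.10.20.2   node2_boot2
10.10.30.1   node1           # persistent; matches the hostname
10.10.30.2   node2           # persistent; matches the hostname
10.10.30.10  app_svc         # service address for the resource group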
Step 3. Set up the boot IP addresses
Run the smitty chinet command, and set the boot IP addresses for each network adapter.
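If you prefer the command line over SMIT, the equivalent call looks roughly like this (the adapter name and address are illustrative):

# Set the boot address on the first Ethernet interface and bring it up
chdev -l en0 -a netaddr=10.10.10.1 -a netmask=255.255.255.0 -a state=up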
Make sure that you are able to ping and connect freely from node to node on all respective
networks. Also, double-check to make sure that the default route is properly configured. If it
isn't, run smitty tcpip, go into the Minimum Configuration menu, enter the default route for
the primary adapter, and press Enter.
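A quick sanity check from each node might look like this (the peer addresses are illustrative):

# Confirm reachability to the other node on every cluster network
ping -c 3 10.10.10.2
ping -c 3 10.10.20.2
# Confirm that a default route is in place
netstat -rn | grep default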
Step 4. Create the application start and stop scripts
Create two simple Korn shell scripts: one that starts an application and one that stops an application. Keep these scripts in identical directories on both nodes.
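As a minimal sketch, assuming a hypothetical application controlled by /opt/app/bin/app_server, the pair might look like this:

#!/usr/bin/ksh
# start_app.ksh - called by PowerHA when the resource group comes online
# A nonzero exit status tells PowerHA the start failed.
/opt/app/bin/app_server start
exit $?

#!/usr/bin/ksh
# stop_app.ksh - called by PowerHA when the resource group goes offline
/opt/app/bin/app_server stop
exit $?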
Step 5. Define the cluster
Run the command:
smitty cm_config_an_hacmp_cluster_menu_dmn
Then, define the cluster, including naming it appropriately.
Step 6. Define the nodes
Run the command:
smitty cm_config_hacmp_nodes_menu_dmn
This defines the nodes that make up the cluster.
Step 7. Define the networks
Run the command:
smitty cm_config_hacmp_networks_menu_dmn
This defines one network per Ethernet adapter. I prefer to use the Pre-defined option as opposed to the Discovered path, but that is up to your discretion. Check the subnet masks for consistency.
Step 8. Define the communication interfaces
Run the command:
smitty cm_config_hacmp_communication_interfaces_devices_menu_dmn
This defines the boot IP addresses on the respective network adapters. These should be the same IP addresses you used in step 3. Make sure you define these addresses within the proper respective PowerHA-defined network.
Step 9. Define the persistent IP addresses
Run the command:
smitty cm_config_hacmp_persistent_node_ip_label_addresses_menu_dmn
This defines the persistent IP addresses, again paying attention to pick the proper respective PowerHA-defined network.
Step 10. Define the service IP addresses
Run the command:
smitty cm_config_hacmp_service_ip_labels_addresses_menu_dmn
This defines the service IP addresses that will move between nodes along with the resource group.
Step 11. Discover PowerHA information
By this point, the nodes should have the ability to communicate with each other and keep the information stored in the nodes' Object Data Managers (ODMs) in sync. Make the nodes within the cluster communicate with each other by running the command:
smitty cm_extended_config_menu_dmn
Select the Discover PowerHA-related Information from Configured Nodes option, and check
for errors to fix. Generally, rebooting each node can clear up any minor problems, and this is a
good point to test restarting each server anyway.
Step 12. Define the resource group
Run the command:
smitty cm_hacmp_extended_resource_group_config_menu_dmn
Add a resource group, and set its fallback policy to Never Fallback. This setting prevents the resources from going back to the original server when it is brought up, which is a wise thing to do.
Step 13. Create a shared volume group
Run the smitty cl_vg command, and create a shared volume group. When you create a shared volume group, you only need to select one of the nodes, because the disk is shared.
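Before creating it, you can confirm that both nodes see the shared disks by comparing physical volume IDs. The hdisk numbering can differ between nodes, but the PVID must match:

# Run on each node; shared disks show the same PVID in column two
lspv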
Step 14. Define the heartbeat
Run the command:
smitty cm_config_hacmp_communication_interfaces_devices_menu_dmn
Repeat step 7, except this time, select the Discovered option and the target disk. This sets up the heartbeat over the second SAN disk.
Step 15. Define the application server
Run the command:
smitty cm_cfg_app_extended
This defines an application server for an application that PowerHA will manage. Use the scripts you created in step 4.
Step 16. Add resources to the resource group
Run the command:
smitty cm_hacmp_extended_resource_group_config_menu_dmn
Select the Change/Show Resources and Attributes for a Resource Group option. Then, add the service IP label, the shared volume group, and the application server to the resource group.
Step 17. Verify and synchronize the cluster
Run the command:
smitty cm_ver_and_sync
Set Automatically correct errors found during verification? to Interactive. Correct any problems along the way.
Step 18. Start the cluster
At this point, the cluster is ready to start. On one of the nodes, run the smitty clstart
command, and pick that particular node. My preference is not to have the cluster start on reboot,
because if there is a PowerHA-related problem on startup, it can be difficult to troubleshoot it.
After the node comes up with the resources available, start the cluster on the other node.
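To confirm that things have settled, the cluster manager state and the resource group location can be checked with standard PowerHA utilities (paths assume a default installation):

# The cluster manager should report a stable state (ST_STABLE)
lssrc -ls clstrmgrES | grep -i state
# Show which node currently holds the resource group
/usr/es/sbin/cluster/utilities/clRGinfo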
The best way I have found to test PowerHA's adaptability is to reboot the active node and let things fail over naturally while running the tail -f /tmp/hacmp.out command on the other node to watch as things go over. Or, run the command:
smitty cl_resgrp_move.node_site
If you really want to make sure your cluster is solid, perform testing by literally removing cables
and seeing how the resources move back and forth. The more you test, the more reliable your
cluster will be.
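During these tests, you can also watch the cluster's view of itself update in real time (clstat requires the clinfoES daemon to be running):

# Text-mode cluster status monitor; refreshes as nodes and
# resource groups change state
/usr/es/sbin/cluster/clstat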
Conclusion
PowerHA is a robust and effective tool for keeping resources available on AIX servers. Although
this article presented a simple introduction and how-to for setting up a two-node cluster,
PowerHA is capable of doing much more, including application monitoring, integrating NAS
resources, and putting logic into starting up resource groups. But if you are looking to hit the
ground running, the best advice I have is to make a test cluster and try everything you can.