January 2008

• Alex Abderrazag • Version 03.00 •

Table of Contents

I. Overview
II. Designing High Availability
   • Risk Analysis
III. Cluster Components
   • Nodes
   • Networks
   • Adapters
   • Applications
IV. Testing
V. Maintenance
   • Upgrading the Cluster Environment
VI. Monitoring
VII. HACMP in a Virtualized World
   • Maintenance of the VIOS partition – Applying Updates
   • Workload Partitions (WPAR)
VIII. Summary
IX. Appendix A
   • Sample WPAR start, stop and monitor scripts for HACMP
X. References
XI. About the Author

HACMP Best Practices


Overview

IBM's High Availability Cluster Multiprocessing (HACMP™) product was first shipped in 1991 and is now in its 15th release, with over 60,000 HACMP clusters in production worldwide. It is generally recognized as a robust, mature high availability product.

HACMP supports a wide variety of configurations and provides the cluster administrator with a great deal of flexibility. With this flexibility comes the responsibility to make wise choices: there are many cluster configurations that are workable in the sense that the cluster will pass verification and come on line, but which are not optimum in terms of providing availability. This document discusses the choices that the cluster designer can make, and suggests the alternatives that make for the highest level of availability*.

* This document applies to HACMP running under AIX®, although the general best practice concepts are also applicable to HACMP running under Linux®.

Designing High Availability

"…A fundamental design goal of (successful) cluster design is the elimination of single points of failure (SPOFs)…"

A high availability solution helps ensure that the failure of any component of the solution, be it hardware, software, or system management, does not cause the application and its data to become inaccessible to the user community. This is achieved through the elimination or masking of both planned and unplanned downtime. High availability solutions should eliminate single points of failure (SPOFs) through appropriate design, planning, selection of hardware, configuration of software, and carefully controlled change management discipline.

A cluster should be carefully planned so that every cluster element has a backup (some would say two of everything!). Best practice is that either the paper or on-line planning worksheets be used to do this planning, and that they be saved as part of the on-going documentation of the system.

"…cluster design decisions should be based on whether they contribute to availability (that is, eliminate a SPOF) or detract from availability (gratuitously complex)…"

While the principle of "no single point of failure" is generally accepted, it is sometimes deliberately or inadvertently violated. It is inadvertently violated when the cluster designer does not appreciate the consequences of the failure of a specific component. It is deliberately violated when the cluster designer chooses not to put redundant hardware in the cluster. The most common instance is choosing cluster nodes that do not have enough I/O slots to support redundant adapters. This choice is often made to reduce the price of the cluster, and it is generally a false economy: the resulting cluster is still more expensive than a single node, but has no better availability. Fig 1.0 provides a list of typical SPOFs within a cluster.

Fig 1.0 Eliminating SPOFs

Risk Analysis

Sometimes, however, it is just not feasible to truly eliminate all SPOFs within a cluster. Examples may include the network¹ and the site². Risk analysis techniques should be used to determine which SPOFs simply must be dealt with and which can be tolerated. One should:

• Study the current environment. An example would be that the server room is on a properly sized UPS, but there is no disk mirroring today.
• Perform requirements analysis. How much availability is required, and what is the acceptable likelihood of a long outage?
• Hypothesize all possible vulnerabilities. What could go wrong?
• Identify and quantify risks. Estimate the cost of a failure versus the probability that it occurs.
• Evaluate counter measures. What does it take to reduce the risk or consequence to an acceptable level?
• Finally, make decisions, create a budget and design the cluster.

¹ If the network as a SPOF must be eliminated, then the cluster requires at least two networks. Unfortunately, this only eliminates the network directly connected to the cluster as a SPOF. It is not unusual for the users to be located some number of hops away from the cluster. Each of these hops involves routers, switches and cabling, each of which typically represents yet another SPOF. Truly eliminating the network as a SPOF can become a massive undertaking.

² Eliminating the site as a SPOF depends on distance and the corporate disaster recovery strategy. Generally, this involves using HACMP eXtended Distance (XD, previously known as HAGEO). If the sites are within the range of PPRC (roughly 100 km) and compatible ESS/DS/SVC storage systems are used, then one of the HACMP/XD: PPRC technologies is appropriate. However, if the sites can be covered by a common storage area network (say, buildings within a 2 km radius), then the cross-site LVM mirroring function, as described in the HACMP Administration Guide, is most appropriate, providing the best performance at no additional expense. Otherwise, consider HACMP/XD: GLVM. These topics are beyond the scope of this white paper.

Cluster Components

Here are the recommended practices for the important cluster components.

Nodes

HACMP supports clusters of up to 32 nodes, with any combination of active and standby nodes. While it is possible to have all nodes in the cluster running applications (a configuration referred to as "mutual takeover"), the most reliable and available clusters have at least one standby node: one node that is normally not running any applications, but is available to take them over in the event of a failure on an active node.

Additionally, it is important to pay attention to environmental considerations. Nodes should not have a common power supply, which may happen if they are placed in a single rack. Similarly, building a cluster of nodes that are actually logical partitions (LPARs) within a single footprint is useful as a test cluster, but should not be considered for availability of production applications. Blades are an outstanding example of this. And, just as every cluster resource should have a backup, the root volume group in each node should be mirrored, or be on a RAID device (a mirroring sketch is shown at the end of this section).

Nodes should be chosen that have sufficient I/O slots to install redundant network and disk adapters: that is, twice as many slots as would be required for single node operation. This naturally suggests that processors with small numbers of slots should be avoided. Use of nodes without redundant adapters should not be considered best practice.

Nodes should also be chosen so that when the production applications are run at peak load, there are still sufficient CPU cycles and I/O bandwidth to allow HACMP to operate. The production application should be carefully benchmarked (preferable) or modeled (if benchmarking is not feasible) and nodes chosen so that they will not exceed 85% busy, even under the heaviest expected load. Note that the takeover node should be sized to accommodate all possible workloads: if there is a single standby backing up multiple primaries, it must be capable of servicing multiple workloads. The worst case situation, for example all the applications running on a single node, must be understood and planned for. On hardware that supports dynamic LPAR operations, HACMP can be configured to allocate processors and memory to a takeover node before applications are started. However, these resources must actually be available, or acquirable through Capacity Upgrade on Demand.
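Returning to the rootvg mirroring recommendation above, the following is a minimal sketch of mirroring a root volume group onto a second disk under standard AIX practice. The disk name hdisk1 is an assumption; substitute the real target disk and verify the output of bootlist for your AIX level.

# Assumption: hdisk1 is a free internal disk of adequate size
extendvg rootvg hdisk1             # add the second disk to rootvg
mirrorvg -m rootvg hdisk1          # create an exact mirror of every logical volume
bosboot -ad /dev/hdisk1            # write a boot image to the new copy
bootlist -m normal hdisk0 hdisk1   # allow booting from either copy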

Networks

HACMP is a network centric application. HACMP networks not only provide client access to the applications, but are also used to detect and diagnose node, network and adapter failures. To do this, HACMP uses RSCT, which sends heartbeats (UDP packets) over ALL defined networks. By gathering heartbeat information on multiple nodes, HACMP can determine what type of failure has occurred and initiate the appropriate recovery action. Being able to distinguish between certain failures, for example the failure of a network and the failure of a node, requires a second network. Although this additional network can be "IP based", it is possible that the entire IP subsystem could fail within a given node. Therefore, in addition there should be at least one, ideally two, non-IP networks.

Failure to implement a non-IP network can potentially lead to a partitioned cluster, sometimes referred to as "split brain" syndrome. This situation can occur if the IP network(s) between nodes becomes severed or, in some cases, congested. Since each node is in fact still very much alive, HACMP would conclude the other nodes are down and initiate a takeover. After takeover has occurred, the application(s) could potentially be running simultaneously on both nodes. If the shared disks are also online to both nodes, the result could be data divergence (massive data corruption). This is a situation which must be avoided at all costs.

The most convenient way of configuring non-IP networks is to use disk heartbeating, as it removes the distance problems of rs232 serial networks. Disk heartbeat networks only require a small disk or LUN. Be careful not to put application data on these disks: while it is possible to do so, you don't want any conflict with the disk heartbeat mechanism!

Important network best practices for high availability:

• Failure detection is only possible if at least two physical adapters per node are in the same physical network/VLAN. Take extreme care when making subsequent changes to the networks with regard to IP addresses, subnet masks, intelligent switch port settings and VLANs.
• Currently there is NO support in HACMP for Virtual IP Addressing (VIPA), IPv6 or the IEEE 802.3 standard "et" interfaces.
• Ensure you have in place the correct network configuration rules for the cluster with regard to IPAT via Replacement/Aliasing, Etherchannel, H/W Address Take-over (HWAT), Virtual Adapter support, and service and persistent addressing. For more information check the HACMP Planning Guide documentation.
• Each physical adapter in each node needs an IP address in a different subnet using the same subnet mask, unless Heartbeating over IP Aliasing is used.
• If you have several virtual clusters split across frames, ensure boot subnet addresses are unique per cluster. This will avoid problems with netmon reporting the network as up when the physical network outside the cluster may in fact be down.
• Where possible, use an Etherchannel configuration in conjunction with HACMP to aid availability; note, however, that HACMP sees Etherchannel configurations as single adapter networks. To aid problem determination, configure the netmon.cf file to allow ICMP echo requests to be sent to other interfaces outside of the cluster. See the Administration Guide for further details.
• Ensure there is at least one non-IP network configured.
• Where possible, ensure the configuration contains a backup adapter which plugs into an alternate switch.
• Name resolution is essential for HACMP. External resolvers are deactivated under certain event processing conditions (an example: when using the C-SPOC option "change a user's password"). Avoid problems by configuring /etc/netsvc.conf and the NSORDER variable in /etc/environment to ensure the host command checks the local /etc/hosts file first.
• Configure persistent IP labels on each node. These IP addresses are available at AIX® boot time and HACMP will strive to keep them highly available. They are useful for remote administration, monitoring and secure node-to-node communications. Consider implementing a host-to-host IPsec tunnel between the persistent labels on the nodes. This will ensure sensitive data, such as passwords, is not sent unencrypted across the network.
• Read the release notes stored in /usr/es/sbin/cluster/release_notes. Look out for new and enhanced features, such as collocation rules, persistent addressing and fast failure detection.
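To illustrate the name resolution point above, here is a minimal sketch of forcing local /etc/hosts lookups before DNS. The exact resolver ordering you want may differ; treat the values shown as an example rather than a mandated setting.

# /etc/netsvc.conf - consult the local hosts file before DNS
hosts = local, bind

# /etc/environment - make the same ordering apply to commands such as host
NSORDER=local,bind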

Adapters

As stated above, each network defined to HACMP should have at least two adapters per node. While it is possible to build a cluster with fewer, the reaction to adapter failures is more severe: the resource group must be moved to another node. AIX provides support for Etherchannel, a facility that can be used to aggregate adapters (increase bandwidth) and provide network resilience. Etherchannel is particularly useful for fast responses to adapter or switch failures. This must be set up with some care in an HACMP cluster, but when done properly it provides the highest level of availability against adapter failure. Refer to the IBM techdocs website http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD101785 for further details.

Many System p™ servers contain built-in Ethernet adapters. If the nodes are physically close together, it is possible to use the built-in Ethernet adapters on two nodes and a "cross-over" Ethernet cable (sometimes referred to as a "data transfer" cable) to build an inexpensive Ethernet network between two nodes for heartbeating. Note that this is not a substitute for a non-IP network.

Some adapters provide multiple ports. One port on such an adapter should not be used to back up another port on that adapter, since the adapter card itself is a common point of failure. The same is true of the built-in Ethernet adapters in most System p servers and currently available blades: the ports have a common adapter. When the built-in Ethernet adapter can be used, best practice is to provide an additional adapter in the node, with the two backing up each other.

Be aware of the network detection settings for the cluster and consider tuning these values. In HACMP terms, these are referred to as NIM values. There are four settings per network type which can be used: slow, normal, fast and custom. These settings can be viewed using a variety of techniques, including the lssrc -ls topsvcs command (from a node where cluster services are active), odmget HACMPnim | grep -p ether, or smitty hacmp. With the default setting of normal for a standard Ethernet network, the network failure detection time would be approximately 20 seconds. With today's switched network technology this is a large amount of time. By switching to a fast setting, the detection time would be reduced by 50% (10 seconds), which in most cases would be more acceptable. Be careful, however, when using custom settings, as setting these values too low can cause false takeovers to occur.

Applications

The most important part of making an application run well in an HACMP cluster is understanding the application's requirements. This is particularly important when designing the resource group policy behavior and dependencies. For high availability to be achieved, the application must have the ability to stop and start cleanly and must not explicitly prompt for interactive input. Some applications tend to bond to a particular OS characteristic such as a uname, serial number or IP address; in most situations these problems can be overcome. The vast majority of commercial software products which run under AIX are well suited to being clustered with HACMP.

Application Data Location

Where should application binaries and configuration data reside? There are many arguments to this discussion, and the correct answer is not fixed. Just when it seems clear cut how to implement an application, someone thinks of a new set of circumstances. Many application vendors have suggestions on how to set up their applications in a cluster, but these are only recommendations. Here are some rules of thumb:

If the application is packaged in LPP format, it is usually installed on the local file systems in rootvg. This behavior can be overcome by bffcreate'ing the packages to disk and restoring them with the preview option. This action will show the install paths; symbolic links can then be created prior to install which point to the shared storage area. This is particularly useful for applications which are installed locally. Generally, keep all the application binaries and data, where possible, on the shared disk, as it is easy to forget to update a locally installed copy on all cluster nodes when it changes. If the application is to be used on multiple nodes with different data or configuration, then the application and configuration data would probably be on local disks and the data sets on shared disk, with application scripts altering the configuration files during fallover. Also, remember that the HACMP File Collections facility can be used to keep the relevant configuration files in sync across the cluster.

Start/Stop Scripts

Application start scripts should not assume the status of the environment; intelligent programming should correct any irregular conditions that may occur. Remember that the cluster manager spawns these scripts off in a separate job in the background and carries on processing. Some things a start script should do are:

• First, check that the application is not currently running! This is especially crucial for HACMP v5.4 users, as resource groups can be placed into an unmanaged state (the forced down action in previous versions). Using the default startup options, HACMP will rerun the application start script, which may cause problems if the application is actually running. A simple and effective solution is to check the state of the application on startup: if the application is found to be running, simply end the start script with exit 0.
• Check the state of the data. Does it require recovery? Always assume the data is in an unknown state, since the conditions that caused the takeover cannot be assumed.
• Verify the environment. Are all the disks, file systems, and IP labels available? Are there prerequisite services that must be running? Is it feasible to start all prerequisite services from within the start script? Is there an inter-resource group dependency or resource group sequencing that can guarantee the previous resource group has started correctly? HACMP v5.2 and later has facilities to implement checks on resource group dependencies, including collocation rules in HACMP v5.3.
• If different commands are to be run on different nodes, store the executing hostname in a variable.
• Finally, when the environment looks right, start the application. If the environment is not correct and error recovery procedures cannot fix the problem, ensure there are adequate alerts (email, SMS, SMTP traps etc.) sent out via the network to the appropriate support administrators.
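To make the checklist above concrete, here is a minimal, hypothetical start script skeleton. The application name, paths and mail recipient are placeholders rather than values from this paper; adapt the checks to the real application.

#!/bin/ksh
# Hypothetical skeleton start script following the checks described above
APP=myapp                          # placeholder application name
APPDIR=/shared/myapp               # placeholder shared filesystem
ADMIN=ha-admin@example.com         # placeholder alert recipient

# 1. Is the application already running? If so, exit cleanly.
if ps -ef | grep "[m]yapp_server" > /dev/null; then
    exit 0
fi

# 2. Verify the environment: the shared filesystem must be mounted.
if ! mount | grep -w "$APPDIR" > /dev/null; then
    echo "$APPDIR not mounted, cannot start $APP" | mail -s "$APP start failed" $ADMIN
    exit 1
fi

# 3. Node-specific behaviour, if required.
NODE=$(hostname)

# 4. Application-specific data recovery checks would go here.

# 5. Start the application.
$APPDIR/bin/start_myapp            # placeholder start command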
Stop scripts are different from start scripts in that most applications have a documented start-up routine but not necessarily a stop routine. The assumption is: once the application is started, why stop it? Relying on the failure of a node to stop an application will be effective, but to use some of the more advanced features of HACMP the requirement exists to stop an application cleanly. Some of the issues to avoid are listed below.

• Be sure to terminate any child or spawned processes that may be using the disk resources.
• Verify that the application is stopped to the point that the file system is free to be unmounted. The fuser command may be used to verify that the file system is free.
• In some cases it may be necessary to double check that the application vendor's stop script did actually stop all the processes, and occasionally it may be necessary to forcibly terminate some processes. Clearly the goal is to return the machine to the state it was in before the application start script was run.
• Failure to exit the stop script with a zero return code will stop cluster processing. (Note: this is not the case with start scripts!)

Remember, most vendor stop/start scripts are not designed to be cluster proof! A useful tip is to have the stop and start scripts verbosely output, in the same format, to the /tmp/hacmp.out file. This can be achieved by including the following line in the header of the script:

set -x && PS4="${0##*/}"'[$LINENO] '

Test the start, restart and stop methods carefully! Poor start, stop and monitor scripts can cause cluster problems, not just in maintaining application availability but also in avoiding data corruption³.

Application Monitoring

HACMP provides the ability to monitor the state of an application. Although optional, implementation is highly recommended, as this mechanism provides for self-healing clusters. In order to ensure that event processing does not hang due to failures in the (user supplied) script, and to prevent hold-up during event processing, HACMP has always started the application in the background. This approach has disadvantages:

• There is no wait or error checking.
• In a multi-tiered environment there is no easy way to ensure that applications of higher tiers have been started; consider implementing child resource groups.

Application monitoring can either check for process death, or run a user-supplied custom monitor method during the start-up or continued running of the application. The latter is particularly useful when the application provides some form of transaction processing: a monitor can run a null transaction to ensure that the application is functional. Best practice for applications is to have both process death and user-supplied application monitors in place. Don't forget to test the monitoring!

In addition, HACMP supplies a number of tools and utilities to help in customization efforts, such as pre- and post-event scripts. Care should be taken to use only those for which HACMP also supplies a man page (lslpp -f cluster.man.en_US.es.data); those are the only ones for which upwards compatibility is guaranteed. A good best practice example of this kind of customization is application provisioning, discussed in the next section.

³ Having monitoring scripts exit with non-zero return codes when the application has not failed, in conjunction with poor start / stop scripts, can result in undesirable behavior (e.g. data corruption). Not only is the application down, but it is in need of emergency repair, which may involve a data restore from backup.
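Building on the stop-script issues above, the following is a minimal, hypothetical stop script sketch. The filesystem and stop command are placeholders, and the forcible termination of remaining processes is one possible approach, not a prescription; check the fuser flags against your AIX level.

#!/bin/ksh
# Hypothetical stop script: stop the application, then ensure the filesystem can be unmounted
APPDIR=/shared/myapp               # placeholder shared filesystem
$APPDIR/bin/stop_myapp             # placeholder vendor stop command

# Give the application a short grace period to terminate its own processes
sleep 30

# List any processes still using files in the application filesystem (-c = treat as mount point)
PIDS=$(fuser -c $APPDIR 2>/dev/null)
if [ -n "$PIDS" ]; then
    fuser -kc $APPDIR              # forcibly terminate the remaining users of the filesystem
fi

# Always exit zero so HACMP event processing is not stopped
exit 0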

Application Provisioning

HACMP has the capability of driving dynamic LPAR and some Capacity on Demand (CoD) operations⁴ to ensure there is adequate processing power and memory available for the application(s) upon start-up. This is shown in Fig 1.1, and the process can be driven using the HACMP smit panels. However, this approach does have several limitations:

• Support for the POWER4™ architecture only (whole CPUs and 256 MB memory chunks).
• No provision or flexibility for shutting down or "stealing from" other LPARs.
• The CoD activation key must have been entered manually prior to any HACMP Dynamic Logical Partitioning (DLPAR) event.
• LPAR name = AIX OS hostname = HACMP node name is required.
• Large memory moves will be actioned in one operation; this will invariably take some time and hold up event processing.
• The LPAR hostname must be resolvable at the HMC.
• The HACMP driver script hmc_cmd does not log the DLPAR / CoD commands it sends to the HMC. Debugging is limited and often it is necessary to hack the script, which is far from ideal!
• If the acquisition / release fails, the operation is not repeated on another HMC, if one is defined.

Given these drawbacks, I would recommend this behavior be implemented using user-supplied custom scripts. Practical examples can be explored in the AU61G education class; see the Reference section.

Fig 1.1 Application Provisioning example

⁴ CoD support includes: On/Off CoD (inc. Trial), CUoD and CBU for high-end only. See http://www-03.ibm.com/servers/eserver/about/cod for further details.

Testing

Simplistic as it may seem, the most important thing about testing is to actually do it. A cluster should be thoroughly tested prior to initial production (and once clverify runs without errors or warnings). This means that every cluster node and every interface that HACMP uses should be brought down and up again, to validate that HACMP responds as expected. Best practice would be to perform the same level of testing after each change to the cluster. HACMP provides a cluster test tool that can be run on a cluster before it is put into production. This will verify that the applications are brought back on line after node, network and adapter failures. The test tool should be run as part of any comprehensive cluster test effort.

Additionally, regular testing should be planned. It's a common safety recommendation that home smoke detectors be tested twice a year, the switch to and from daylight savings time being well-known points. Similarly, if the enterprise can afford to schedule it, node fallover and fallback tests should be scheduled biannually. These tests will at least indicate whether any problems have crept in, and allow for correction before the cluster fails in production. The HACMP cluster test tool can be used to at least partially automate this effort.

Maintenance

Even the most carefully planned and configured cluster will have problems if it is not well maintained. A large part of best practice for an HACMP cluster is associated with maintaining the initial working state of the cluster through hardware and software changes. All mission critical HA cluster enterprises should, as best practice, maintain a test cluster identical to the production ones. All changes to applications, databases, cluster configuration, or software should first be thoroughly tested on the test cluster prior to being put on the production clusters.

Prior to any change to a cluster node, take an HACMP snapshot. If the change involves installing an HACMP, AIX or other software fix, also take a mksysb backup. On successful completion of the change, run clverify, use SMIT to display the cluster configuration, and print out and save the smit.log file. The Online Planning Worksheets facility can also be used to generate an HTML report of the cluster configuration. Starting with HACMP v5.2, clverify is run automatically daily at 00:00 hrs. Administrators should make a practice of checking the logs daily and reacting to any warnings or errors; not only errors but also warning messages should be taken quite seriously and fixed at the first opportunity.

Change control is vitally important in an HACMP cluster. In some organizations, databases, networks and clusters are administered by separate individuals or groups. When any group plans maintenance on a cluster node, it should be planned and coordinated amongst all the parties. All should be aware of the changes being made, to avoid introducing problems. Organizational policy must preclude "unilateral" changes to a cluster node. Additionally, change control in an HACMP cluster needs to include the goal of having all cluster nodes at the same level.

To this end, develop a process which encompasses the following set of questions:

• Is the change necessary?
• How urgent is the change?
• How important is the change? (not the same as urgent)
• What impact does the change have on other aspects of the cluster?
• What is the impact if the change is not allowed to occur?
• Are all of the steps required to implement the change clearly understood and documented?
• How is the change to be tested?
• What is the plan for backing out the change if necessary?
• Is the appropriate expertise available should problems develop?
• When is the change scheduled?
• Have the users been notified?
• Does the maintenance period include sufficient time for a full set of backups prior to the change, and sufficient time for a full restore afterwards should the change fail testing?

This process should include an electronic form which requires appropriate sign-offs before the change can go ahead. Every change, even the minor ones, must follow the process. The notion that a change, even a small change, might be permitted (or sneaked through) without following the process must not be entertained. In all cases, it is extremely useful to complete the process in a test environment before actually doing it for real.

To keep changes consistent across nodes, the best practice is to use the HACMP C-SPOC facility where possible for any change, especially with regard to shared volume groups. If the installation uses AIX password control on the cluster nodes (as opposed to NIS or LDAP), C-SPOC should also be used for any changes to users and groups. HACMP will then ensure that the change is properly reflected on all cluster nodes.

Upgrading the Cluster Environment

OK, so you want to upgrade? Start by reading the upgrade chapter in the HACMP installation documentation and make a detailed plan. Taking the time to review and plan thoroughly will save many 'I forgot to do that!' problems during and after the migration/upgrade process. Remember, it is insufficient (and unwise!) to upgrade just the node running the application. Don't forget to check all the version compatibilities between the different levels of software/firmware and, most importantly, the application software certification against the level of AIX and HACMP. If you are not sure, check with IBM support and/or use the Fix Level Recommendation Tool (FLRT), which is available at http://www14.software.ibm.com/webapp/set2/flrt/home.

Don't even think about upgrading AIX or HACMP without first taking a backup and checking that it is restorable. AIX facilities such as alt_disk_copy and multibos, which create an alternative rootvg that can be activated via a reboot, are very useful tools worth exploring and using.
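As an illustration of the alternate-rootvg approach mentioned above, here is a minimal sketch. The disk name is an assumption; check the flags and prerequisites against your AIX level before relying on them.

# Clone the running rootvg onto a spare disk; the clone can be booted if the upgrade goes wrong
alt_disk_copy -d hdisk1        # hdisk1 is a placeholder spare disk

# Alternatively, multibos can create a standby Base Operating System instance within rootvg
multibos -s

# Confirm which disk/instance the system will boot from
bootlist -m normal -o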

Before attempting the upgrade, ensure you carry out the following steps:

• Check that the cluster and application are stable and that the cluster can synchronize cleanly.
• Take a cluster snapshot and save it to a temporary non-cluster directory (export SNAPSHOTPATH=<some other directory>).
• Save event script customization files and user-supplied scripts to a temporary non-cluster directory. If you are unsure whether any custom scripts are included, check with odmget HACMPcustom.
• Check that the same level of cluster software (including PTFs) is on all nodes before beginning a migration.
• Ensure that the cluster software is committed (and not just applied).

Where possible the rolling migration method should be used, as this ensures maximum availability. Effectively, cluster services are stopped one node at a time using the takeover option ('Move resource groups' in HACMP v5.4). The node/system is updated accordingly and cluster services are restarted. This operation is completed one node at a time until all nodes are at the same level and operational. Note: while HACMP will work with mixed levels of AIX or HACMP in the cluster, the goal should be to have all nodes at exactly the same levels of AIX, HACMP and application software. Additionally, HACMP prevents changes to the cluster configuration when mixed levels of HACMP are present.

Starting with HACMP v5.4, PTFs can now be applied using a 'non-disruptive upgrade' method. The process is actually identical to the rolling migration; however, resource groups are placed into an 'Unmanaged' state to ensure they remain available. Note: during this state the application(s) are not under the control of HACMP (i.e. not highly available!). Using the default start-up options, HACMP relies on an application monitor to determine the application state and hence the appropriate actions to undertake.

Alternatively, the entire cluster and its applications can be gracefully shut down to update the cluster, using either the 'snapshot' or 'offline' conversion methods. Historically, upgrading the cluster this way has resulted in fewer errors, but it requires a period of downtime!
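The snapshot and custom-script checks in the list above can be partly scripted. A minimal sketch, assuming /stage/preupgrade as a scratch directory outside the cluster paths:

#!/bin/ksh
# Hypothetical pre-upgrade capture, using the commands referenced above
STAGE=/stage/preupgrade            # placeholder non-cluster directory
mkdir -p $STAGE

# Direct the cluster snapshot utilities to the scratch directory
export SNAPSHOTPATH=$STAGE

# Record any custom event methods so the associated scripts can be copied to $STAGE as well
odmget HACMPcustom | tee $STAGE/hacmp_custom.odm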

Monitoring

HACMP provides a rich set of facilities for monitoring a cluster, such as the Tivoli integration filesets and commands such as cldisp, cldump and clstat. Best practice is to have notification of some form in place for all cluster events associated with hardware and software failures and significant actions such as adapter, network and node failures. HACMP can invoke notification methods such as SMS, pager and e-mail messages on cluster event execution, and can execute scripts on entry of error log reports. The actual facilities used may well be set by enterprise policy (e.g. Tivoli is used to monitor all enterprise systems). Other parties also have HACMP-aware add-ons for SNMP monitors; these include HP OpenView, Tivoli Universal Agent and BMC PATROL (HACMP Observe Knowledge Module by to/max/x).

The clinfo daemon status facility does have several restrictions, and many users and administrators of HACMP clusters implement custom monitoring scripts. This may seem complex, but actually it's remarkably straightforward. The SNMP protocol is the crux of obtaining the status of the cluster. HACMP implements a private Management Information Base (MIB) branch, maintained via a SMUX peer subagent to SNMP contained in the clstrmgrES daemon, as shown in Fig 2.0. The cluster SNMP MIB data can be pulled simply over a secure session by typing:

ssh $NODE snmpinfo -v -m dump -o /usr/es/sbin/cluster/hacmp.defs risc6000clsmuxpd > $OUTFILE

The output can be parsed through perl or shell scripts to produce a cluster status report, as shown in Fig 2.1. A little further scripting can render the output in HTML format so the cluster status can be obtained through a CGI web-driven program. Further details are covered in the AU61 WorldWide HACMP Education class.

Fig 2.0 SNMP and HACMP
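A minimal sketch of the kind of wrapper described above is shown here. It assumes password-less ssh to each node and simply extracts lines that mention a state from the MIB dump; the exact variable names in the dump depend on the HACMP MIB version, so the grep pattern is only an assumption.

#!/bin/ksh
# Hypothetical cluster status report built from the HACMP SNMP MIB dump
NODES="nodea nodeb"                       # placeholder node list
OUTFILE=/tmp/hacmp_mib.$$

for NODE in $NODES
do
    echo "==== $NODE ===="
    ssh $NODE snmpinfo -v -m dump -o /usr/es/sbin/cluster/hacmp.defs \
        risc6000clsmuxpd > $OUTFILE 2>/dev/null
    # Pull out anything that looks like a cluster/node/network state variable
    grep -i "state" $OUTFILE
done
rm -f $OUTFILE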

Fig 2.1 Custom HACMP Monitor

HACMP in a Virtualized World

HACMP will work with virtual devices; however, some restrictions apply when using virtual Ethernet or virtual disk access. Creating a cluster in a virtualized environment will add new SPOFs which need to be taken into account. HACMP nodes inside the same physical footprint (frame) must be avoided if high availability is to be achieved; such a configuration should be considered only for test environments. To eliminate the additional SPOFs in a virtual cluster, a second VIOS should be implemented in each frame, with the Virtual I/O Client (VIOC) LPARs located in different frames, ideally some distance apart.

Redundancy for disk access can be achieved through LVM mirroring or Multi-Path I/O (MPIO). LVM mirroring is most suited to eliminating the VIOC rootvg as a SPOF, as shown in Fig 3.0. The root volume group can be mirrored using standard AIX practices, and the procedure can also utilize logical volumes as backing storage to maximize flexibility. For test environments, where each VIOC is located in the same frame, LVM mirroring could also be used for data volume groups as well. In the event of a VIOS failure, the LPAR will see stale partitions, and the volume group would need to be resynchronized using syncvg.
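As a reference for the resynchronization step above, a minimal sketch follows; rootvg is used as the example, so substitute the mirrored volume group in question.

lsvg rootvg             # the STALE PPs field shows how many partitions need resyncing
lsvg -l rootvg          # stale logical volumes show an LV STATE of open/stale
syncvg -v rootvg        # resynchronize all stale partitions in the volume group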

Fig 3.0 Redundancy using LVM Mirroring

For shared data volume groups, the MPIO method should be deployed. A LUN is mapped to both VIOSs in the SAN; from both VIOSs, the LUN is then mapped again to the same VIOC. The VIOC LPAR will correctly identify the disk as an MPIO-capable device and create one hdisk device with two paths. The configuration is then duplicated on the backup frame/node. See Fig 4.0. Currently, the virtual storage devices will work only in failover mode; other modes are not yet supported.

All devices accessed through a VIO server must support a "no_reserve" attribute. If the device driver is not able to "ignore" the reservation, the device cannot be mapped to a second VIOS: the reservation held by a VIO server cannot be broken by HACMP. Hence only devices that will not be reserved on open are supported. Therefore, HACMP requires the use of enhanced concurrent mode volume groups (ECVGs). The use of ECVGs is generally considered best practice anyway!
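To illustrate the reservation point, the sketch below shows how the reserve policy is typically checked and set for an MPIO-capable disk. The disk name hdisk2 is an assumption, older device drivers may expose a differently named attribute, and on the VIOS itself the padmin form of chdev (chdev -dev ... -attr ...) would be used instead.

# Check the current reservation policy of the disk backing the virtual SCSI device
lsattr -El hdisk2 -a reserve_policy

# Set it so no SCSI reservation is taken on open (required before mapping to a second VIOS)
chdev -l hdisk2 -a reserve_policy=no_reserve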

Fig 4.0 Redundancy using MPIO

In a virtualized networking environment, a VIOS is needed for access to the outside world via a layer-2 based Ethernet bridge, referred to as a Shared Ethernet Adapter (SEA). Now the physical network devices, along with the SEA, are the new SPOFs. How are these SPOFs eliminated? Again, through the use of a second VIOS. Etherchannel technology from within the VIOS can be used to eliminate both the network adapters and the switch as a SPOF. To eliminate the VIOS itself as a SPOF there are two choices:

1. Etherchannel in the VIOC, configured in backup mode ONLY (no aggregation). See Fig 5.0.
2. SEA failover via the Hypervisor. See Fig 6.0.

There are advantages and disadvantages with both methods. From the client perspective only a single virtual adapter is required, and hence IPAT via Aliasing must be used; IPAT via Replacement and H/W Address Takeover (HWAT) are not supported. Having a second virtual adapter will not eliminate a SPOF, as the adapter is not real: the SPOF is the Hypervisor! Generally, SEA failover is considered best practice, as it allows the use of Virtual LAN ID (VID) tags and keeps the client configuration cleaner.

In addition, at least two physical adapters per SEA should be used in the VIOS, in an Etherchannel configuration. Adapters in this channel can also form an aggregate, but remember that most vendors require adapters which form an aggregate to share the same backplane (a SPOF! So don't forget to define a backup adapter). An exception to this rule is Nortel's Split Multi-Link Trunking; depending on your environment, this technology may be worth investigating.

Note that single-interface networks are not best practice, as this limits the error detection capabilities of HACMP. In this case (a single virtual adapter in the client) it can't be avoided, so to aid additional analysis add external IP addresses to the netmon.cf file.

Fig 5.0 Etherchannel in Backup Mode
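A small sketch of the netmon.cf approach mentioned above: the file lives under /usr/es/sbin/cluster in a standard HACMP install, and the addresses below are placeholders for stable IP addresses outside the cluster, such as the default gateway or a DNS server.

# Create /usr/es/sbin/cluster/netmon.cf with one stable external address per line;
# netmon will send ICMP echo requests to these targets to decide whether the
# (virtual) network is really up. Addresses shown are placeholders.
cat > /usr/es/sbin/cluster/netmon.cf <<EOF
10.10.10.1
10.10.20.5
EOF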

Fig 6.0 SEA Failover

And finally, a view of the big picture. As you can see from Fig 7.0, even a simple cluster design can soon become rather complex! Be methodical in your planning.

Fig 7.0 An HACMP cluster in a virtualized world

Maintenance of the VIOS partition – Applying Updates

The VIOS must be updated in isolation, i.e. with no client access. A simple way of achieving this is to start by creating a new profile for the VIO server by copying the existing one, then deleting all the virtual devices from the new profile and reactivating the VIOS using it. This ensures that no client partition can access any devices and the VIOS is ready for maintenance. It is important that each VIOS used has the same code level.

Prior to restarting the VIOS, a manual failover from the client must be performed so that all disk access and networking goes through the alternate VIOS. The steps to accomplish this are as follows:

• For MPIO storage, disable the active path by typing: chpath -l hdiskX -p vscsiX -s disable
• For LVM mirrored disks, set the virtual SCSI target devices to the 'defined' state in the VIO server partition.
• For SEA failover, the failover can be initiated from the active VIOS by typing: chdev -attr ha_mode=standby
• For Etherchannel in the VIOC, initiate a force failover using smitty etherchannel.

After the update has been applied, the VIOS must be rebooted. The client should then be redirected to the newly updated VIOS, and the same procedure followed on the alternative VIOS.
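A consolidated sketch of the client-side and VIOS-side commands above, for one hypothetical client disk and SEA. The device names (hdisk0, vscsi1, ent4) are placeholders, and the SEA device argument to chdev is an assumption, since the paper quotes that command without it; verify against your VIOS documentation before use.

# --- On the virtual I/O client: route disk I/O away from the VIOS being updated ---
lspath -l hdisk0                          # confirm there are two vscsi paths
chpath -l hdisk0 -p vscsi1 -s disable     # disable the path served by the VIOS under maintenance

# --- On the active VIOS (padmin): push the shared Ethernet over to the partner ---
chdev -dev ent4 -attr ha_mode=standby     # ent4 = SEA device; assumption, check with lsdev

# --- After the update and reboot, re-enable the client path ---
chpath -l hdisk0 -p vscsi1 -s enable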

Workload Partitions (WPAR)

The Nov '07 release of AIX 6 introduced workload partitions (WPARs), an exciting new feature with the potential to improve administration efficiency and assist with server consolidation. System WPARs carve a single instance of the AIX OS into multiple virtual instances, allowing for separate "virtual partitions". This capability allows administrators to deploy multiple AIX environments without the overhead of managing individual AIX images.

HACMP 5.4.1 introduces, as part of the base product, basic WPAR support which allows a resource group to be started within a WPAR. However, HACMP does not manage or monitor the WPAR itself; it manages the applications that run within a WPAR. This is a key point, and for this reason best practice recommends a rather different method of supporting WPARs with HACMP than the default method. In addition, as discussed previously with application provisioning, several limitations currently exist in the default support.

When deploying WPAR environments, careful consideration will be needed to ensure maximum availability, as new SPOFs are potentially introduced. These may include:

• the network between the WPAR host and the NFS server
• the NFS server
• the WPAR(s)
• the OS hosting the WPAR(s)
• the WPAR applications

I am going to recommend that all WPARs to be clustered for high availability are designed and created so they are not fixed to any one individual node or LPAR, using what's known as WPAR mobility. WPAR mobility is an extension to WPAR which allows a WPAR to be moved from node to node (independently of products such as HACMP). This involves the use of at least a third node, which acts as an NFS server to host, at a minimum, the /, /var, /home and /tmp filesystems for each of the WPARs. WPAR mobility is created and tested outside of HACMP control before being put under HACMP control.

This approach has several major advantages, as by default we can move the WPARs by checkpointing and restoring the system. The checkpoint facility saves the current status of the WPAR and application(s) and then restarts them in their previously saved state. Checkpoints can be saved continually at any interval, say every hour; during recovery the appropriate checkpoint can be restored. This way the system can be restored to a point of last known good state. Furthermore, this method of integration makes WPAR support with HACMP release independent, i.e. the same implementation steps can be carried out with any supported version of HACMP, not just 5.4.1.

For WPAR environments, I recommend that:

• the service type IP addresses for the WPARs are defined in the WPARs themselves and not in HACMP;
• HACMP monitors the health of the WPARs using custom application monitoring;
• HACMP monitors the health of the WPAR applications using process and/or custom application monitoring;
• HACMP controls the start-up, shutdown/checkpoint and movement of the WPAR to the backup nodes.

Note: the movement of the WPAR resource group (wparRG) will checkpoint all running applications, which will automatically resume from the checkpoint state on the backup node (no application start-up is required, but a small period of downtime is experienced!). For multiple WPAR configurations, use a mutual takeover implementation to balance the workload across the AIX servers hosting the WPARs and the NFS servers.

Figure 8.0 shows an example of a highly available WPAR environment, with resilience both for the NFS server and for the WPAR hosting partitions. The WPAR zion is under the control of HACMP and shares filesystems both from the local host and from the NFS server, as shown in Figure 9.0.

Fig 8.0 Highly available WPAR environment

Fig 9.0 Example layout for WPAR: zion

Implementation Overview

On the NFS server, create the local filesystems to export for each WPAR. Optionally, decide whether to cluster the NFS server for improved resilience. Remember that HACMP uses /usr/es/sbin/cluster/etc/exports as the default NFS export file. Here is an example line from the export file, showing the home directory export for the WPAR zion:

/wpars/home -sec=sys,rw,access=hacmp_wparA:hacmp_wparB:zion,root=hacmp_wparA:hacmp_wparB:zion

Also, before attempting any destructive testing, please ensure you configure high availability for the NFS server using cross-mounting. Cross-mounting causes each HACMP cluster node to NFS-mount the local filesystem to itself; this results in a much faster failover time.

On the primary cluster node, create the WPAR, e.g.:

mkwpar -c -r -o /tmp/w.log -R active=yes \
   -M directory=/ vfs=nfs host=count dev=/wpars/slash \
   -M directory=/home vfs=nfs host=count dev=/wpars/home \
   -M directory=/tmp vfs=nfs host=count dev=/wpars/tmp \
   -M directory=/var vfs=nfs host=count dev=/wpars/var \
   -M directory=/cpr vfs=nfs host=count dev=/wpars/cpr \
   -h zion -N interface='en0' address='10.47.1.3' \
   netmask='255.255.0.0' -n WPARname

Go to smitty clonewpar_sys and create a WPAR clone file, then copy the clone file to the secondary node(s). On the secondary node(s) create the WPAR definition:

mkwpar -p -f <path to the WPAR definition file>

Manually perform a WPAR relocation from the primary to the secondary node. This will enable WPAR mobility:

Primary#   /opt/mcr/bin/chkptwpar -d <path to statefile> -o <path to logfile> -k <WPARname>
Secondary# /opt/mcr/bin/restartwpar -d <path to statefile> -o <path to logfile> <WPARname>

Note: the statefile must be visible to both the global AIX and the WPAR environment! On each of the cluster (WPAR) nodes, install the metacluster checkpoint fileset: mcr.rte.

Create an HACMP application server and resource group for the WPAR, for example:

Change/Show All Resources and Attributes for a Resource Group
[MORE...16]                                          [Entry Fields]
  Filesystems Recovery Method                        sequential
  Filesystems mounted before IP configured           true
  Filesystems/Directories to Export (NFSv2/3)        [/wpar/home]
  Filesystems/Directories to NFS Mount               [/zionhome;/wpar/home]
  Network For NFS Mount                              []

This causes HACMP to export /wpar/home and mount the filesystem on all nodes as:

mount wpar_nfs:/wpar/home /zionhome

Implement HACMP custom monitoring to monitor the state of both the WPAR(s) and the application(s). See Appendix A, sample scripts. Test the WPAR operation under HACMP's control (Tip! Start by moving the RG between nodes using C-SPOC) and test the integration thoroughly. See Appendix A, sample scripts.

Summary

Some final words of advice: to conclude this white paper, here is a general summary list of HACMP do's and don'ts.

Spend considerable time in the planning stage. This is where the bulk of the documentation will be produced, and it will lay the foundation for a successful production environment. Start by building a detailed requirements document⁵. Next, build a technically detailed design document⁶. Details should include a thorough description of the storage / network / application / cluster environment (H/W and S/W configuration) and the cluster behavior (RG policies, location dependencies etc.). Focus on ensuring the cluster does what the users want and need it to do, and that the cluster behaves how you intend it to. Finally, make certain the cluster undergoes comprehensive and thorough testing⁷ before going live, and again at regular intervals. Once the cluster is in production, all changes must be made in accordance with a documented change management procedure, and the specific changes must follow the operational procedures using (where possible) cluster-aware tools⁸. Following these steps from the initial start phase will greatly reduce the likelihood of problems and of change once the cluster is put into production.

Do:

• Where feasible, use IPAT via Aliasing style networking and enhanced concurrent VGs.
• Ensure the H/W and S/W environment has a reasonable degree of currency. Take regular cluster snapshots and system backups.
• Implement a test environment to ensure changes are adequately tested.
• Implement a reliable heartbeat mechanism and include at least one non-IP network.
• Implement verification and validation scripts that capture common problems (or problems that are discovered in the environment), e.g. volume group settings, NFS mount/export settings, application changes. In addition, ensure that these mechanisms are kept up-to-date.
• Make use of available HACMP features, such as: application monitoring, extended cluster verification methods, file collections, 'automated' cluster testing (in TEST only), fast disk takeover and fast failure detection.
• Configure application monitors to enhance availability and aid self-healing.
• Ensure there are mechanisms in place which will send out alerts via SNMP, SMS or email when failures are encountered within the cluster.
• Move the applications from node to node manually before putting them in resource groups under the control of HACMP.
• Where possible use C-SPOC. If some nodes are up and others down, ensure the change is made and synchronized from an active node. Always ensure changes are synchronized immediately.

Do not:

• Introduce changes to one side of the cluster whilst not keeping the other nodes in sync.
• Attempt changes outside of HACMP's control using custom mechanisms.
• Make the architecture too complex or implement a configuration which is hard to test.
• Configure applications to bind in any way to node-specific attributes, such as IP addresses, hostnames, CPU IDs etc.
• Deploy basic application start and stop scripts which do not include prerequisite checking and error recovery routines. Always ensure these scripts verbosely log to stdout and stderr.
• Rely on standard AIX volume groups (VGs) if databases use raw logical volumes. Consider instead implementing Big or Scalable VGs. This way, user, group and permission information can be stored in the VGDA header, which will reduce the likelihood of problems during failover.
• Implement nested file systems that create dependencies or waits and other steps that elongate failovers.
• Provide root access to untrained and cluster-unaware administrators.
• Rely on any form of manual effort or intervention which may be involved in keeping the applications highly available.
• Change failure detection rates on networks without very careful thought and consideration.
• Action operations such as kill `ps -ef | grep appname | awk '{print $2}'` when stopping an application. This may also result in killing the HACMP application monitor as well.

⁵ A written cluster requirements document allows you to carry out a coherent and focused discussion with the users about what they want done. It also allows you to refer to these requirements while you design the cluster and while you develop the cluster test plan.
⁶ A written cluster design document describes, from a technical perspective, exactly how you intend to configure the cluster environment.
⁷ A written test plan allows you to test the cluster against the requirements (which describe what you were supposed to build) and against the cluster design document (which describes what you intended to build). The test plan should be formatted in a way which allows you to record the pass or failure of each test: this allows you to easily know what's broken, and it allows you to eventually demonstrate that the cluster actually does what the users wanted it to do and what you intended it to do. The behavior of the environment should meet all requirements specified in ⁵.
⁸ Do not make the mistake of assuming that you have time to write the operational documentation once the cluster is in production.

Appendix A

Sample WPAR start, stop and monitor scripts for HACMP

wparstart.sh

#!/bin/ksh
WPARname=WPARalex
LOG=/tmp/REC
STATEFILE=/wpars/$WPARname/cpr/alex.state
LOGFILE=/wpars/$WPARname/cpr/alex.log
CHECKPOINT=/wpars/$WPARname/CHECKPOINT

echo "Starting WPAR on $(date)" >> $LOG

# check for the CHECKPOINT marker file on the WPAR nfs server
mount count:/wpars/slash /wpars/$WPARname
if [ -f $CHECKPOINT ]; then
    umount /wpars/$WPARname
    echo "checkpoint restoring ..."
    /opt/mcr/bin/restartwpar -d $STATEFILE \
        -o $LOGFILE $WPARname | tee -a $LOG
    rm $CHECKPOINT
else
    startwpar $WPARname
fi
exit 0

wparstop.sh

#!/bin/ksh
WPARname=WPARalex
LOG=/tmp/REC
STATEFILE=/wpars/$WPARname/cpr/alex.state
LOGFILE=/wpars/$WPARname/cpr/alex.log
CHECKPOINT=/wpars/$WPARname/CHECKPOINT

echo "Stopping WPAR on $(date)" >> $LOG

# remove any previous checkpoint state before taking a new one
if [ -d $STATEFILE ]; then
    rm -r $STATEFILE
    rm $LOGFILE
fi

VAR1=$1; key=${VAR1:=null}
if [ $key = "CLEAN" ]; then
    # may want to cleanly stop all applications running in the WPAR first!
    stopwpar -F $WPARname
else
    echo "checkpoint stopping ..."
    /opt/mcr/bin/chkptwpar -d $STATEFILE \
        -o $LOGFILE -k $WPARname | tee -a $LOG
    # create a CHECKPOINT marker file on the WPAR nfs server
    mount count:/wpars/slash /wpars/$WPARname
    touch $CHECKPOINT
    umount /wpars/$WPARname
fi
exit 0

wparmonitor.sh

#!/bin/ksh
LOG=/tmp/REC
WPARNODE=zion

# get the state of the WPAR
STATE=`lswpar -c -q | awk -F: '{print $2}'`
if [ $STATE = "A" ]; then
    ping -w 1 -c1 $WPARNODE > /dev/null 2>&1
    if [ $? -eq 0 ]; then
        # Probably want to perform an application health check here!
        echo "WPAR is alive and well" | tee -a $LOG
        exit 0
    else
        echo "WPAR not responding" | tee -a $LOG
        exit 31
    fi
else
    echo "WPAR is not active ($STATE)" | tee -a $LOG
    exit 33
fi

References

HACMP web pages:
http://www-03.ibm.com/systems/p/software/hacmp.html
http://lpar.co.uk

IBM Training Education:
HACMP System Administration I: Planning and Implementation, course code AU54G
HACMP System Administration II: Administration and Problem Determination, course code AU61G
HACMP System Administration III: Virtualization and Disaster Recovery, Security and High Availability, course code AU62G
System p LPAR and Virtualization II: Implementing Advanced Configurations, course code AU78G

IBM Redbooks:
IBM System p Advanced POWER Virtualization Best Practices, REDP-4194
Implementing High Availability Cluster Multi-Processing Cookbook, SG24-6769
Introduction to Workload Partition Management in IBM AIX Version 6.1, SG24-7431-00

For more information about HACMP, or to comment on this document, please email: hafeedbk@us.ibm.com

Version    Date        Author            Description
V03.00     Jan. 2008   Alex Abderrazag   Minor update to include section on WPAR.
V02.00     July 2007   Alex Abderrazag   Major re-write.
V01.00     Feb. 2005   Tom Weaver        Document created by HACMP development.

Special thanks to Tony 'Red' Steel, Bill Miller, Grant McLaughlin and Susan Schreitmueller for reviewing this document.

About the Author

Alex Abderrazag

Alex has worked for IBM since 1994 and has been part of IBM Training for the past six years, specializing in POWER technology. Alex has over 17 years experience working with UNIX® systems and has been actively responsible for managing, teaching and developing the AIX/Linux® education curriculum.

© IBM Corporation 2007
IBM Corporation, Systems and Technology Group, Route 100, Somers, New York 10589
Produced in the United States of America, July 2007. All Rights Reserved.

This document was developed for products and/or services offered in the United States. IBM may not offer the products, features, or services discussed in this document in other countries. The information may be subject to change without notice. Consult your local IBM business contact for information on the products, features and services available in your area. All statements regarding IBM future directions and intent are subject to change or withdrawal without notice and represent goals and objectives only.

IBM, the IBM logo, AIX 5L, HACMP, Micro-Partitioning, POWER, POWER4, POWER5, Power Architecture, System p and Tivoli are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. A full list of U.S. trademarks owned by IBM may be found at http://www.ibm.com/legal/copytrade.shtml. Other company, product, and service names may be trademarks or service marks of others.

IBM hardware products are manufactured from new parts, or new and used parts. In some cases, the hardware product may not be new and may have been previously installed. Regardless, our warranty terms apply. Photographs show engineering and design models; changes may be incorporated in production models. This equipment is subject to FCC rules; it will comply with the appropriate FCC rules before final delivery to the buyer. Copying or downloading the images contained in this document is expressly prohibited without the written consent of IBM.

Information concerning non-IBM products was obtained from the suppliers of these products or other public sources. Questions on the capabilities of the non-IBM products should be addressed to those suppliers. All performance information was determined in a controlled environment; actual results may vary. Performance information, including system benchmarks, is provided "AS IS" and no warranties or guarantees are expressed or implied by IBM. Buyers should consult other sources of information to evaluate the performance of a system they are considering buying. When referring to storage capacity, 1 TB equals total GB divided by 1000; accessible capacity may be less.

The IBM home page on the Internet can be found at http://www.ibm.com. The IBM System p home page on the Internet can be found at http://www.ibm.com/systems/p.

PSW03025-GBEN-01
