
HACMP: High Availability Cluster Multi-Processing

High Availability: Elimination of both planned and unplanned system and application downtime. This is achieved through elimination of H/W and S/W single points of failure.
Cluster Topology: The nodes, networks, storage, clients, and persistent node IP labels/devices.
Cluster Resources: Components that HACMP can move from one node to another, e.g. service IP labels, file systems and applications.
RSCT Version: 2.4.2    SDD Version: 1.3.1.3

HA Configuration:

Define the cluster and nodes
Define the networks and disks
Define the topology
Verify and synchronize
Define the resources and resource groups
Verify and synchronize

Files changed after installation: /etc/inittab, /etc/rc.net, /etc/services, /etc/snmpd.conf, /etc/snmpd.peers, /etc/syslog.conf, /etc/trcfmt, /var/spool/cron/crontabs/root, /etc/hosts; the hacmp group is also added.
Software Components: Application server, HACMP layer, RSCT layer, AIX layer, LVM layer, TCP/IP layer.
HACMP Services: Cluster communication daemon (clcomdES), Cluster Manager (clstrmgrES), Cluster information daemon (clinfoES), Cluster lock manager (cllockd), Cluster SMUX peer daemon (clsmuxpd).
HACMP Daemons: clstrmgr, clinfo, clsmuxpd, cllockd.
HA supports up to 32 nodes, up to 48 networks, up to 64 resource groups per cluster, and up to 128 cluster resources.
IP Label: The label associated with a particular IP address, as defined by DNS or /etc/hosts.
Base IP label: The default IP address that is set on the interface by AIX at startup.
Service IP label: A label under which a service is provided; it may be bound to a single node or multiple nodes. These are the addresses that HACMP keeps highly available.
IP alias: An IP address that is added to an interface rather than replacing its base IP address.
RSCT monitors the state of the network interfaces and devices.
IPAT via replacement: The service IP label replaces the boot IP address on the interface.
IPAT via aliasing: The service IP label is added as an alias on the interface.
Persistent IP address: An address that can be assigned to a network for a particular node.
NFS exports file in HACMP: /usr/es/sbin/cluster/etc/exports

Shared LVM:

A shared volume group is a volume group that resides entirely on the external disks shared by the cluster nodes. Shared LVM can be made available in non-concurrent access mode, concurrent access mode, or enhanced concurrent access mode.

Non-concurrent access mode: This environment typically uses journaled file systems to manage data.
Create a non-concurrent shared volume group: smitty mkvg → give the VG name, No for automatically available after system restart, Yes for activate VG after it is created, and give the VG major number.

Create a non-concurrent shared file system: smitty crjfs → rename the FS, No to mount automatically at system restart, then test the newly created FS by mounting and unmounting it.
Importing a volume group to a fallover node:

Varyoff the volume group
Run the discovery process
Import the volume group
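A minimal sketch of that sequence, assuming a hypothetical VG named sharedvg on hdisk3 with a file system /sharedfs (in practice C-SPOC and the HACMP discovery/import panels do this for you):

# on the node that currently owns the VG
umount /sharedfs
varyoffvg sharedvg
# on the fallover node
importvg -y sharedvg hdisk3
chvg -a n sharedvg     # do not activate automatically at boot; HACMP controls activation
varyoffvg sharedvg     # leave it offline so the cluster can acquire it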

Concurrent Access Mode: It is not supported for file systems; instead you must use raw LVs and physical disks.
Creating a concurrent access volume group:

Verify the disk status using lsdev -Cc disk
smitty cl_convg → Create a concurrent volume group → Enter
Import the volume group using importvg -C -y vg_name physical_volume_name
varyonvg vgname
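As a quick sanity check after the varyon, lsvg on the VG (name hypothetical) shows whether the group is concurrent capable and active in concurrent mode; the exact field names vary by AIX level:

lsvg concvg     # look at the "Concurrent" and "VG Mode" fields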

Create LVs on the concurrent VG: smitty cl_conlv
Enhanced concurrent mode VGs: These can be used for both concurrent and non-concurrent access. The VG is varied on on all nodes in the cluster, but access for modifying the data is granted only to the node that has the resource group active.
Active or passive mode: Active varyon: all high-level operations permitted. Passive varyon: read-only access to the VG.
Create an enhanced concurrent mode VG: mkvg -n -s 32 -C -y myvg hdisk11 hdisk12
Resource group behaviour:
Cascading: Fallover using dynamic node priority. Online on first available node.
Rotating: Fallover to next priority node in the list. Never fallback. Online using distribution policy.
Concurrent: Online on all available nodes. Never fallback.
RG dependencies: clrgdependency -t
/etc/hosts: used for name resolution; all cluster node IP interfaces must be added to this file.
/etc/inittab: hacmp:2:once:/usr/es/sbin/cluster/etc/rc.init >/dev/console 2>&1 will start clcomdES and clstrmgrES.

/etc/rc.net is called by cfgmgr to configure and start TCP/IP during the boot process.
C-SPOC uses clcomdES to execute commands on remote nodes. C-SPOC commands are located in /usr/es/sbin/cluster/cspoc. You should not stop a node with the forced option on more than one node at a time, nor when the RG is in concurrent mode. Cluster commands are in /usr/es/sbin/cluster.
User administration: smitty cl_usergroup
Create a concurrent VG: smitty cl_convg
To find resource group information: clRGinfo -p
HACMP Planning: The maximum number of nodes in a cluster is 32. In an HACMP cluster, heartbeat messages are exchanged via IP networks and point-to-point networks.
IP label: the name associated with a specific IP address.
Service IP label/address: an IP address used for client access. There are 2 types of service IP addresses: Shared service IP address: can be active on only one node at a time. Node-bound service IP address: can be configured on only one node.
Methods of providing highly available service IP addresses: IP address takeover (IPAT) via IP aliases, and IPAT via IP replacement.
An IP alias is an IP address configured on a communication interface in addition to the base IP address. IP aliasing is an AIX function that is supported by HACMP. AIX supports multiple IP aliases on each communication interface, and each IP alias can be on a different subnet.
Network interfaces:
Service interface: This interface is used to provide access to the application running on the node. The service IP address is monitored by HACMP via RSCT heartbeat.

Boot interface: This is a communication interface. With IPAT via aliasing, during fallover the service IP label is aliased onto the boot interface.
Persistent node IP label: useful for administrative purposes.
When an application is started or moved to another node together with its associated resource group, the service IP address can be configured in two ways:

Replacing the base IP address of a communication interface (IPAT via replacement); the service IP label and boot IP label must be on the same subnet.
Configuring one communication interface with an additional IP address on top of the existing one (IPAT via aliasing); all IP addresses/labels must be on different subnets.
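For illustration of the aliasing mechanism, this is what an alias looks like at the AIX level (hypothetical interface and addresses; HACMP adds and removes service aliases itself during IPAT):

ifconfig en0 alias 192.168.10.10 netmask 255.255.255.0    # add a second address without disturbing the base address
ifconfig en0 delete 192.168.10.10                         # remove the alias again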

The default method is IP aliasing.
HACMP Security: Implemented directly by clcomdES; it uses the HACMP ODM classes and the /usr/es/sbin/cluster/rhosts file to determine partners.
Resource group takeover relationships:
Resource group: a logical entity containing the resources to be made highly available by HACMP.
Resources: file systems, NFS, raw logical volumes, raw physical disks, service IP addresses/labels, application servers, start/stop scripts. To be made highly available by HACMP, each resource should be included in a resource group.
Resource group takeover relationships: 1. Cascading 2. Rotating 3. Concurrent 4. Custom

Cascading:
o A cascading resource group is activated on its home node by default.
o The resource group can be activated on a lower-priority node if the highest-priority node is not available at cluster startup.
o On node failure, the resource group falls over to the available node with the next priority.
o Upon node reintegration into the cluster, a cascading resource group falls back to its home node by default.

Attributes:

1. Inactive takeover (IT): Initial acquisition of a resource group in case the home node is not available.

2. Fallover priority can be configured in the default node priority list.
3. Cascading without fallback (CWOF) is an attribute that modifies the fallback behavior. If the CWOF flag is set to true, the resource group will not fall back to any joining node; when the flag is false, the resource group falls back to the higher-priority node.

Rotating:
o At cluster startup, the first available node in the node priority list activates the resource group.
o If the resource group is on a takeover node, it will never fall back to a higher-priority node if one becomes available.
o Rotating resource groups require the use of IP address takeover. The nodes in the resource chain must all share the same network connection to the resource group.

Concurrent:
o A concurrent RG can be active on multiple nodes at the same time.

Custom:
o Users have to explicitly specify the desired startup, fallover and fallback procedures.
o Custom resource groups support only IPAT via aliasing service IP addresses.

Startup Options:

Online on home node only
Online on first available node
Online on all available nodes
Online using distribution policy: the resource group will only be brought online if the node has no other resource group online. You can check this with lssrc -ls clstrmgrES.
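A hedged example of the checks mentioned above (clRGinfo is covered later in these notes; output formats vary by HACMP level):

lssrc -ls clstrmgrES    # detailed cluster manager state on this node
clRGinfo                # which resource groups are online, and on which nodes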

Fallover Options:

Fallover to next priority node in list
Fallover using dynamic node priority: the fallover node can be selected on the basis of its available CPU, its available memory, or the lowest disk usage. HACMP uses RSCT to gather this information, and the resource group falls over to the node that best meets the criteria.
Bring offline: the resource group will be brought offline if an error occurs. This option is designed for resource groups that are online on all available nodes.

Fallback Options:

Fallback to higher priority node in the list
Never fallback

Basic Steps to implement an HACMP cluster:


Planning
Install and connect the hardware
Configure shared storage
Install and configure the application software
Install the HACMP software and reboot each node
Define the cluster topology
Synchronize the cluster topology
Configure cluster resources
Configure cluster resource groups and shared storage
Synchronize the cluster
Test the cluster

HACMP installation and configuration:
HACMP release notes: /usr/es/lpp/cluster/doc
smitty install_all is the fast path for installation. The cluster.es and cluster.cspoc images must be installed on all servers.
Start the cluster communication daemon: startsrc -s clcomdES
Options for upgrading the cluster: node-by-node migration and snapshot conversion.
Steps for migration:

Stop cluster services on all nodes
Upgrade the HACMP software on each node
Start cluster services on one node at a time

Converting from a supported version of HAS to HACMP:


The current software should be committed
Save a snapshot
Remove the old version
Install HA 5.1 and verify

Check the previous version of the cluster: lslpp -h cluster
To save your HACMP configuration, create a snapshot in HACMP.
Remove the old version of HACMP: smitty install_remove (select software name cluster*)
lppchk -v and lppchk -c cluster* both run clean if the installation is ok.
After you have installed HA on the cluster nodes you need to convert and apply the snapshot; converting the snapshot must be performed before rebooting the cluster nodes.

clconvert_snapshot -C -v <version> -s <snapshot> converts an HA snapshot from the old version to the new version.
After installation, restarting the cluster services is required in order to activate the new cluster manager.
Verification and synchronization: smitty hacmp → Extended Configuration → Extended Verification and Synchronization → verify changes only
Perform Node-by-Node Migration:

Save the current configuration in a snapshot.
Stop cluster services on one node using graceful with takeover.
Verify the cluster services.
Install the latest HACMP version and check the installed software using lppchk.
Reboot the node.
Restart the HACMP software: smitty hacmp → System Management → Manage Cluster Services → Start Cluster Services
Repeat the above steps on all nodes.
Logs are written to /tmp/hacmp.out, /tmp/cm.log and /tmp/clstrmgr.debug.
The config_too_long message appears when the cluster manager detects that an event has been processing for more than the specified time. To change the time interval: smitty hacmp → Extended Configuration → Extended Event Configuration → Change/Show Time Until Warning.
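A simple way to watch those logs while the migration runs (paths taken from the list above):

tail -f /tmp/hacmp.out                    # event processing, node_up/node_down events
grep -i config_too_long /tmp/hacmp.out    # check whether any event exceeded the warning time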

Cluster snapshots are saved in /usr/es/sbin/cluster/snapshots.
The synchronization process will fail while a migration is incomplete. To back out of the change you must restore the active ODM: smitty hacmp → Problem Determination Tools → Restore HACMP Configuration Database from Active Configuration.
Upgrading to a new HACMP version involves converting the ODM from the previous release to the current release. That is done by /usr/es/sbin/cluster/conversion/cl_convert -F -v 5.1; the log file for the conversion is /tmp/clconvert.log.
Clean-up after an interrupted installation: smitty install → Software Maintenance and Installation → Clean Up After an Interrupted Installation.
Network configuration:
Physical networks: TCP/IP based, such as Ethernet and token ring; device based, such as RS232 and target mode SSA (tmssa).
Configuring cluster topology: standard and extended configuration paths. smitty hacmp → Initialization and Standard Configuration

IP aliasing is used as the default mechanism for service IP label/address assignment to a network interface.

Configure nodes: smitty hacmp → Initialization and Standard Configuration → Configure Nodes to an HACMP Cluster (give the cluster name and node names)
Configure resources: use Configure Resources to Make Highly Available (configure IP addresses/labels, application servers, volume groups, logical volumes, file systems)
Configure resource groups: use Configure HACMP Resource Groups; you can choose cascading, rotating, custom or concurrent.
Assign resources to each resource group: Configure HACMP Resource Groups → Change/Show Resources for a Resource Group
Verify and synchronize the cluster configuration
Display the cluster configuration
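To display the resulting configuration from the command line, the topology utilities referenced later in these notes can be used, for example:

/usr/es/sbin/cluster/utilities/cltopinfo    # cluster, node, network and interface summary
/usr/es/sbin/cluster/utilities/cllsif       # interface / IP label listing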

Steps for cluster configuration using extended path:


Run discovery: running discovery retrieves current AIX configuration information from all cluster nodes.
Configure an HA cluster: smitty hacmp → Extended Configuration → Extended Topology Configuration → Configure an HACMP Cluster → Add/Change/Show an HACMP Cluster
Define a node: smitty hacmp → Extended Configuration → Extended Topology Configuration → Configure HACMP Nodes → Add a Node to the HACMP Cluster
Define sites: this is optional.
Define networks: run discovery before network configuration. IP-based networks: smitty hacmp → Extended Configuration → Extended Topology Configuration → Configure HACMP Networks → Add a Network to the HACMP Cluster → select the type of network (enter network name, type, netmask, enable IP takeover via IP aliases (default is true), IP address offset for heartbeating over IP aliases).
Define communication interfaces: smitty hacmp → Extended Configuration → Extended Topology Configuration → Configure HACMP Communication Interfaces/Devices → select communication interfaces → add node name, network name, network interface, IP label/address, network type
Define communication devices: smitty hacmp → Extended Configuration → Extended Topology Configuration → Configure HACMP Communication Interfaces/Devices → select communication devices
To see the boot IP labels on a node, use netstat -in.
Define persistent IP labels: a persistent label always stays on the same node, does not require installing an additional physical interface, and is not part of any resource group. smitty hacmp → Extended Topology Configuration → Configure Persistent Node IP Labels/Addresses → Add a Persistent Node IP Label (enter node name, network name, node IP label/address)

Resource Group Configuration

smitty hacmp → Initialization and Standard Configuration → Configure HACMP Resource Groups → Add a Standard Resource Group → select Cascading/Rotating/Concurrent/Custom (enter resource group name and participating node names)

Assigning resources to the RG: smitty hacmp → Initialization and Standard Configuration → Configure HACMP Resource Groups → Change/Show Resources for a Standard Resource Group (add service IP labels/addresses, VGs, FSs, application servers).

Resource group and application management:


Bring a resource group offline: smitty cl_admin → select HACMP Resource Group and Application Management → Bring a Resource Group Offline.
Bring a resource group online: smitty hacmp → select HACMP Resource Group and Application Management → Bring a Resource Group Online.
Move a resource group: smitty hacmp → select HACMP Resource Group and Application Management → Move a Resource Group to Another Node.

C-SPOC: Under smitty cl_admin


Manage HACMP Services
HACMP Communication Interface Management
HACMP Resource Group and Application Management
HACMP Log Viewing and Management
HACMP File Collection Management
HACMP Security and Users Management
HACMP Logical Volume Management
HACMP Concurrent Logical Volume Management
HACMP Physical Volume Management

Post-implementation and administration:
C-SPOC commands are located in the /usr/es/sbin/cluster/cspoc directory. HACMP for AIX ODM object classes are stored in /etc/es/objrepos. User and group administration in HACMP: smitty cl_usergroup
Problem determination:
To verify the cluster configuration use smitty clverify.dialog; the output is logged in /var/hacmp/clverify/clverify.log.
HACMP log files:
/usr/es/adm/cluster.log: generated by HACMP scripts and daemons.
/tmp/hacmp.out: contains a line-by-line record of every command executed by scripts.
/usr/es/sbin/cluster/history/cluster.mmddyyyy: the system creates a cluster history file every day.
/tmp/clstrmgr.debug: messages generated by clstrmgrES activity.

/tmp/cspoc.log: generated by HACMP C-SPOC commands.
/tmp/dms_loads.out: stores log messages every time HACMP triggers the deadman switch.
/var/hacmp/clverify/clverify.log: cluster verification log.
/var/ha/log/grpsvcs, /var/ha/log/topsvcs, /var/ha/log/grpglsm: daemon logs.
Snapshots: The primary information saved in a cluster snapshot is the data stored in the HACMP ODM classes (HACMPcluster, HACMPnode, HACMPnetwork, HACMPdaemons). The cluster snapshot utility stores the data it saves in two separate files: the ODM data file (.odm) and the cluster state information file (.info). To create a cluster snapshot: smitty hacmp → Extended Configuration → HACMP Snapshot Configuration → Add a Cluster Snapshot
Cluster verification and testing:
High and low water mark values are 33 and 24. The default value for syncd is 60.
Before the cluster is started, the clcomd daemon is added to /etc/inittab and started by init.
Verify the status of the cluster services: lssrc -g cluster (the cluster manager daemon (clstrmgrES), cluster SMUX peer daemon (clsmuxpd) and cluster topology services daemon (topsvcsd) should be running). Status of the other cluster subsystems: lssrc -g topsvcs and lssrc -g emsvcs.
In /tmp/hacmp.out look for the node_up and node_up_complete events.
To check the HACMP cluster status: /usr/sbin/cluster/clstat. To use this command the clinfo daemon must be running. To change the SNMP version: /usr/sbin/snmpv3_ssw -1.
Stop the cluster services using smitty clstop: graceful, takeover, or forced. In /tmp/hacmp.out search for the node_down and node_down_complete events.
Graceful: the node's resources will be released, but will not be acquired by other nodes.
Graceful with takeover: the node's resources will be released and acquired by other nodes.
Forced: cluster services will be stopped but the resource groups will not be released.
Resource group states: online, offline, acquiring, releasing, error, temporary error, or unknown.
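A consolidated sketch of the startup checks described above:

lssrc -g cluster                        # clstrmgrES (and clsmuxpd/clinfoES if started) should be active
lssrc -g topsvcs                        # RSCT topology services
lssrc -g emsvcs                         # RSCT event management
grep node_up_complete /tmp/hacmp.out    # confirm the node finished joining the cluster
/usr/sbin/cluster/clstat                # overall cluster view (requires the clinfo daemon)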

Find the resource group status: /usr/es/sbin/cluster/utilities/clfindres or clRGinfo. Options: -t displays the settling time, -p displays priority override locations.
To review the cluster topology: /usr/es/sbin/cluster/utilities/cltopinfo
NFS mounts can be hard or soft; hard mount is the default choice. NFS export file: /usr/es/sbin/cluster/etc/exports.
If an adapter is configured with a service IP address: verify in /tmp/hacmp.out that the swap_adapter event has occurred and that the service IP address has been moved, using netstat -in.
You can implement an RS232 heartbeat network between any 2 nodes. To test a serial connection use lsdev -Cc tty; the baud rate is set to 38400, parity to none, bits per character to 8.
RSCT verification (to test whether RSCT is functioning): lssrc -ls topsvcs. To check RSCT group services: lssrc -ls grpsvcs.
Monitor heartbeats over all the defined networks: cllsif.log from /var/ha/run/topsvcs.clustername.
Prerequisites: PowerHA Version 5.5, AIX 5300-09, RSCT level 2.4.10. BOS components: bos.rte.*, bos.adt.*, bos.net.tcp.*, and bos.clvm.enh (when using the enhanced concurrent resource manager access). The cluster.es.nfs fileset that comes with the PowerHA installation medium installs NFSv4; from the AIX BOS, bos.net.nfs.server 5.3.7.0 and bos.net.nfs.client 5.3.7.0 are required. All nodes must have the same version of RSCT; check with lslpp -l rsct*.
Installing PowerHA: release notes are in /usr/es/sbin/cluster/release_notes. Enter smitty install_all → select input device → press F4 for a software listing → Enter.
Steps to increase the size of a shared LUN:

Stop the cluster on all nodes
Run cfgmgr
varyonvg vgname
lsattr -El hdisk#
chvg -g vgname
lsvg vgname
varyoffvg vgname
On the subsequent cluster nodes that share the VG: run cfgmgr, lsattr -El hdisk#, importvg -L vgname hdisk#
Synchronize
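A hedged sketch of the same procedure, with a hypothetical VG sharedvg on hdisk4 (replace with your own names; finish with a cluster verification and synchronization):

# on the first node
cfgmgr                         # rediscover the devices
varyonvg sharedvg
lsattr -El hdisk4              # confirm the new LUN size is visible
chvg -g sharedvg               # grow the VG to use the enlarged disk
lsvg sharedvg
varyoffvg sharedvg
# on every other node sharing the VG
cfgmgr
lsattr -El hdisk4
importvg -L sharedvg hdisk4    # re-learn the changed VG definition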

PowerHA creates a backup copy of the modified files during synchronization on all nodes. These backups are stored in the /var/hacmp/filebackup directory. The file collection logs are stored in /var/hacmp/log/clutils.log.
User and group administration:
Adding a user: smitty cl_usergroup → select Users in an HACMP Cluster → Add a User to the Cluster (also: List Users, Change/Show Characteristics of a User in the Cluster, Remove a User from the Cluster)
Adding a group: smitty cl_usergroup → select Groups in an HACMP Cluster → Add a Group to the Cluster (also: List Groups, Change/Show Characteristics of a Group in the Cluster, Remove a Group from the Cluster)
Command used to change a password on all cluster nodes: /usr/es/sbin/cluster/utilities/clpasswd
smitty cl_usergroup → Users in an HACMP cluster

Add a user to the cluster
List users in the cluster
Change/show characteristics of a user in the cluster
Remove a user from the cluster

smitty cl_usergroup → Groups in an HACMP cluster


Add a group to the cluster
List groups in the cluster
Change a group in the cluster
Remove a group

smitty cl_usergroup → Passwords in an HACMP cluster
Importing a VG automatically: smitty hacmp → Extended Configuration → HACMP Extended Resource Configuration → Change/Show Resources and Attributes for a Resource Group → set Automatically Import Volume Groups to true
C-SPOC LVM: smitty cl_admin → HACMP Logical Volume Management

Shared volume groups
Shared logical volumes
Shared file systems
Synchronize shared LVM mirrors (Synchronize by VG / Synchronize by LV)
Synchronize a shared VG definition

C-SPOC concurrent LVM: smitty cl_admin → HACMP Concurrent LVM


Concurrent volume groups
Concurrent logical volumes
Synchronize concurrent LVM mirrors

C-SPOC physical volume management: smitty cl_admin → HACMP Physical Volume Management


Add a disk to the cluster
Remove a disk from the cluster
Cluster disk replacement
Cluster datapath device management

Cluster verification: smitty hacmp → Extended Verification → Extended Verification and Synchronization. Verification log files are stored in /var/hacmp/clverify:
/var/hacmp/clverify/clverify.log - verification log
/var/hacmp/clverify/pass/nodename - if verification succeeds
/var/hacmp/clverify/fail/nodename - if verification fails
Automatic cluster verification: runs each time you start cluster services and every 24 hours. To configure it: smitty hacmp → Problem Determination Tools → HACMP Verification → Automatic Cluster Configuration Monitoring.
Cluster status monitoring: /usr/es/sbin/cluster/clstat -a and -o.
/usr/es/sbin/cluster/utilities/cldump provides a snapshot of the key cluster status components.
clshowsrv displays the status of the HACMP subsystems.
Disk heartbeat:

It is a non-IP heartbeat.
It uses a dedicated disk/LUN.
It is a point-to-point network.
If more than 2 nodes exist in your cluster, you will need a minimum of n non-IP heartbeat networks.
Disk heartbeating typically requires 4 seeks/second: each of the two nodes writes to the disk and reads from the disk once per second. The filemon tool can monitor the seeks.
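A hedged filemon sketch for observing that seek activity (device name hypothetical; trcstop ends the trace):

filemon -o /tmp/fmon.out -O pv    # trace physical volume activity
sleep 60
trcstop
grep -p vpath5 /tmp/fmon.out      # review the read/write/seek counts for the heartbeat disk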

Configuring disk heartbeat:

Vpaths are configured as member disks of an enhanced concurrent volume group.
smitty lvm → Volume Groups → Add a Volume Group → give the VG name, PV names and VG major number, and set "Create VG concurrent capable" to enhanced concurrent.
Import the new VG on all nodes using smitty importvg or importvg -V 53 -y c23vg vpath5
Create the diskhb network: smitty hacmp → Extended Configuration → Extended Topology Configuration → Configure HACMP Networks → Add a Network to the HACMP Cluster → choose diskhb
Add 2 communication devices: smitty hacmp → Extended Configuration → Extended Topology Configuration → Configure HACMP Communication Interfaces/Devices → Add Communication Interfaces/Devices → Add Pre-defined Communication Interfaces and Devices → Communication Devices → choose the diskhb network
Create one communication device for the other node as well.

Testing disk heartbeat connectivity: /usr/sbin/rsct/dhb_read is used to test the validity of a diskhb connection.
dhb_read -p vpath0 -r receives data over the diskhb network
dhb_read -p vpath3 -t transmits data over the diskhb network
Monitoring disk heartbeat: monitor the activity of the disk heartbeats via lssrc -ls topsvcs; watch the "Missed HBs" field.
Configure HACMP application monitoring: smitty cm_cfg_appmon → Add a Process Application Monitor → give the process names and application start/stop scripts.
Application availability analysis tool: smitty hacmp → System Management → Resource Group and Application Management → Application Availability Analysis
Commands:
List the cluster topology: /usr/es/sbin/cluster/utilities/cllsif, /usr/es/sbin/cluster/clstat
Start the cluster: smitty clstart; monitor with /tmp/hacmp.out and check for node_up_complete.
Stop the cluster: smitty cl_stop; monitor with /tmp/hacmp.out and check for node_down_complete.
Determine the state of the cluster: /usr/es/sbin/cluster/utilities/clcheck_server
Display the status of the HACMP subsystems: clshowsrv -v / -a
Display the topology information: cltopinfo -c / -n / -w / -i
Monitor the heartbeat activity: lssrc -ls topsvcs (check for dropped packets and errors)
Display resource group attributes: clRGinfo -v, -p, -t, -c, -a, or clfindres
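A quick health-check sketch built only from the commands listed above:

/usr/es/sbin/cluster/utilities/cltopinfo         # topology summary
clRGinfo -p                                      # resource group states and locations
clshowsrv -v                                     # status of the HACMP subsystems
lssrc -ls topsvcs                                # check the Missed HBs field for drops/errors
grep node_up_complete /tmp/hacmp.out | tail -1   # last successful node join on this node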