What’s New in PowerHA SystemMirror for AIX v7.2.x

Session# a100261



Shawn Bodily

Clear Technologies
sbodily@cleartechnologies.net

2018 IBM Systems Technical University
April 30th - May 4th
Orlando, Florida

Notice and disclaimers

Copyright © 2017 by International Business Machines Corporation (IBM). No part of this document may be
reproduced or transmitted in any form without written permission from IBM.

U.S. Government Users Restricted Rights – use, duplication or disclosure restricted by GSA ADP
Schedule Contract with IBM.

Information in these presentations (including information relating to products that have not yet been announced by
IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional
technical or typographical errors. IBM shall have no responsibility to update this information. This document is
distributed “as is” without any warranty, either express or implied. In no event shall IBM be liable for
any damage arising from the use of this information, including but not limited to, loss of data,
business interruption, loss of profit or loss of opportunity. IBM products and services are warranted
according to the terms and conditions of the agreements under which they are provided.

IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be
new and may have been previously installed. Regardless, our warranty terms apply.

Any statements regarding IBM's future direction, intent or product plans are subject to change or
withdrawal without notice.

Performance data contained herein was generally obtained in controlled,
isolated environments. Customer examples are presented as illustrations of how those customers have used
IBM products and the results they may have achieved. Actual performance, cost, savings or other results in
other operating environments may vary.

References in this document to IBM products, programs, or services do not imply that IBM intends to
make such products, programs or services available in all countries in which IBM operates or does business.

Workshops, sessions and associated materials may have been prepared by independent session speakers, and
do not necessarily reflect the views of IBM. All materials and discussions are provided for informational
purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any
individual participant or their specific situation.

It is the customer’s responsibility to ensure its own compliance with legal requirements and to obtain advice of
competent legal counsel as to the identification and interpretation of any relevant laws and regulatory
requirements that may affect the customer’s business and any actions the customer may need to take to
comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products
will ensure that the customer is in compliance with any law.

Please note

IBM’s statements regarding its plans, directions, and intent are subject to change
or withdrawal without notice and at IBM’s sole discretion.

Information regarding potential future products is intended to outline our general
product direction and it should not be relied on in making a purchasing decision.

The information mentioned regarding potential future products is not a commitment, promise,
or legal obligation to deliver any material, code or functionality. Information about potential
future products may not be incorporated into any contract.

The development, release, and timing of any future features or functionality described for our
products remains at our sole discretion.

Performance is based on measurements and projections using standard IBM benchmarks in
a controlled environment. The actual throughput or performance that any user will
experience will vary depending upon many factors, including considerations such as the
amount of multiprogramming in the user’s job stream, the I/O configuration, the storage
configuration, and the workload processed. Therefore, no assurance can be given that an
individual user will achieve results similar to those stated here.

IBM Systems Technical Events - ibm.com/training/events 3


Please complete the session survey!

Agenda
New PowerHA v7.2.0 features
• Automatic Repository Disk Replacement
• Rootvg failure detection via the LVM critical VG setting
• LPM Automation Integration
• Live Kernel Update Integration
• Non-Disruptive Upgrades
• ROHA fallovers utilizing Enterprise Pools
• Split-brain handling options
– NFS tie-breaker option for split/merge
• GLVM wizard

New PowerHA v7.2.1 features
• SMUI – SystemMirror User Interface
• Additional ROHA support
• Split-brain handling options expanded to all cluster types

Agenda

PowerHA v7.2.1 SP1
• Introduced cl_ezupdate for applying fixes cluster wide

PowerHA v7.2.2 features
• SMUI Enhancements
– Deployment wizard
– Start/stop cluster and move Resource Group (RG)
– Support for role-based user administration
– Cluster Zones
• Cluster wide log analysis - clanalyze
• Easy update rollback feature and OS cloning added to cl_ezupdate

Migration Planning

AIX Requirements

              AIX 6.1     AIX 7.1              AIX 7.2
HA - 7.1.2    TL8 SP1     TL2 SP1              SP1
HA - 7.1.3    TL9 SP1     TL3 SP1              SP1
HA - 7.2.0    TL9 SP5     TL3 SP5, TL4 SP1     SP1
HA - 7.2.1    NA          TL3 SP5, TL4 SP2     SP1, TL1
HA - 7.2.2    NA          TL4 SP2, TL5         TL1, TL2

PowerHA for AIX Version Compatibility Matrix: http://tinyurl.com/hacompat

Always check the following URL for known PowerHA fixes prior to upgrading
https://aix.software.ibm.com/aix/ifixes/PHA_Migration/ha_install_mig_fixes.htm#_Toc483527879

PowerHA SystemMirror for AIX v7.2.0

Automatic Repository Disk Replacement

• PowerHA v7.1.1 added backup repository capability, but replacing a failed repository disk was
a manual procedure.

• Automatic Repository Update (ARU) provides autoswap capability of a
failed repository disk with a pre-defined backup repository disk.
– Requires AIX 7.1.4 or 7.2.0
– A maximum of 6 backups can be defined
– Backups are polled once a minute by clconfd to verify the disks are still
viable for an ARU procedure

• Defining Backup Repository Disk(s)
– Preferably make the disk the same size as the existing repository disk
– Make sure the disk has a PVID on each node of the cluster
– Via SMIT
  • smitty sysmirror → Cluster Nodes and Networks → Manage Repository Disks → Add a Repository Disk
  • Choose disk from picklist, press Enter
– Via clmgr
  • clmgr add repository <pvid>
– Synchronize cluster
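
The clmgr path above can be sketched as a small wrapper. This is only an illustration, not PowerHA tooling: the function name add_backup_repo and the RUN dry-run variable are hypothetical, the check assumes the PVID is passed as the 16 hexadecimal digits that lspv displays, and it assumes clmgr sync cluster is used for the synchronization step.

```shell
#!/bin/sh
# Sketch: define a backup repository disk by PVID, then synchronize.
# RUN defaults to "echo", so the commands are only printed (dry run).
# Set RUN="" on a cluster node to actually execute them.

add_backup_repo() {
    pvid=$1
    # Assumption: the PVID is given as the 16 hex digits shown by lspv.
    case $pvid in
        ""|*[!0-9a-fA-F]*)
            echo "invalid PVID: $pvid" >&2
            return 1 ;;
    esac
    if [ "${#pvid}" -ne 16 ]; then
        echo "invalid PVID length: $pvid" >&2
        return 1
    fi
    ${RUN-echo} clmgr add repository "$pvid" &&
    ${RUN-echo} clmgr sync cluster
}
```

Calling add_backup_repo 00f6f5d0a1b2c3d4 (a made-up PVID) prints the two clmgr commands in order, which makes it easy to review the plan before running it for real.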

Automatic Repository Disk Replacement

• Demo available here


Automatic Repository Disk Replacement
• Logging during failure detection and replacement is in syslog.caa

Rootvg failure detection

• PowerHA has had a “critical” volume group feature for data volume groups.

• PowerHA also had a previous method of rootvg loss detection.

• AIX LVM enhancements added a “critical VG” setting.

• When rootvg has the critical VG option set, and the system cannot access
a quorum of rootvg disks (or all rootvg disks if quorum is disabled), then the
node is failed with a message sent to the console and a KERNEL_PANIC
error logged in errpt.

• Support also exists in PowerHA 7.1.3 at AIX levels that support the
critical VG option.

• PowerHA 7.2 automatically enables the setting.

Rootvg failure detection enabling

• Simple vg setting change

• Can change across all cluster nodes with clcmd command.

(Jess)# clcmd lsvg rootvg |grep CRIT
DISK BLOCK SIZE: 512 CRITICAL VG: no
DISK BLOCK SIZE: 512 CRITICAL VG: no

(Jess)# clcmd chvg -r y rootvg

(Jess)# clcmd lsvg rootvg |grep CRIT
DISK BLOCK SIZE: 512 CRITICAL VG: yes
DISK BLOCK SIZE: 512 CRITICAL VG: yes

• Test by unmapping the rootvg volume from the host, either at the storage or at the VIOS

• Demo available here
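
The check-then-set sequence above can be wrapped in a small helper. This is a minimal sketch: the function names critical_vg_values and all_critical are hypothetical, and the parsing assumes the lsvg output lines look like the transcript on this slide.

```shell
#!/bin/sh
# Sketch: decide from lsvg-style output whether every node already has
# CRITICAL VG set to "yes".

# Prints the value that follows "CRITICAL VG:" on each matching stdin line.
critical_vg_values() {
    sed -n 's/.*CRITICAL VG:[[:space:]]*\([a-z]*\).*/\1/p'
}

# Succeeds only when at least one CRITICAL VG line is seen and all read "yes".
all_critical() {
    values=$(critical_vg_values)
    [ -n "$values" ] || return 1
    for v in $values; do
        [ "$v" = "yes" ] || return 1
    done
    return 0
}

# Usage on a cluster node (not run here):
#   clcmd lsvg rootvg | all_critical || clcmd chvg -r y rootvg
```

The usage line only runs chvg when at least one node still reports "no", so it is safe to repeat.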

LPM Automation Integration

• LPM with HACMP has been supported since March 2008

– HACMP version 5.3 and 5.4.1

• Recommended only with default heartbeat settings

• Recommended only with application monitoring that leaves minutes between checks

• If either of the two conditions above did NOT apply, undesired results could be
encountered during LPM.

• To prevent such results it was STRONGLY recommended to “Unmanage” the cluster
node before performing LPM

• PowerHA 7.2 now does this, and much more, automatically.

LPM Automation Integration – Pre LPM Steps
1. Check if HyperSwap is used. If YES, go to 2; otherwise, go to 1.1.

1.1 Check if LPM_POLICY=unmanage is set. If YES, go to 2; otherwise, go to 4:
clodmget -n -f lpm_policy HACMPcluster

2. Change the node to ‘unmanage resource group’ status:
clmgr stop node <node_name> WHEN=now MANAGE=unmanage

3. Add an entry to the /etc/inittab file, which is useful in case of a node crash
before the managed state is restored:
mkitab "hacmp_lpm:2:once:/usr/es/sbin/cluster/utilities/cl_dr undopremigrate >/dev/null 2>&1"

4. Check if RSCT DMS critical resource monitoring is enabled:
/usr/sbin/rsct/bin/dms/listdms -s cthags | grep -qw Enabled

5. Disable RSCT DMS critical resource monitoring:
/usr/sbin/rsct/bin/hags_disable_client_kill -s cthags
/usr/sbin/rsct/bin/dms/stopdms -s cthags

6. Check if the current node_timeout is equal to the value you set:
clodmget -n -f lpm_node_timeout HACMPcluster
clctrl -tune -x node_timeout

7. Change the CAA node_timeout value:
clmgr -f modify cluster HEARTBEAT_FREQUENCY="600"

8. If SAN-based heartbeating is enabled, disable it:
echo 'sfwcom' >> /etc/cluster/ifrestrict
clusterconf
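
The branch in steps 1 and 1.1 reduces to a single condition, sketched below. This only models the decision described on the slide; the function name pre_lpm_next_step and its yes/no argument are hypothetical, and it is not the code PowerHA itself runs.

```shell
#!/bin/sh
# Sketch of the decision in pre-LPM steps 1 and 1.1: the node is moved to
# the unmanaged state when HyperSwap is used or when lpm_policy is
# "unmanage"; otherwise the flow skips ahead to the RSCT DMS check.
# Arguments: $1 = "yes" or "no" (HyperSwap in use), $2 = lpm_policy value.

pre_lpm_next_step() {
    hyperswap=$1
    lpm_policy=$2
    if [ "$hyperswap" = "yes" ] || [ "$lpm_policy" = "unmanage" ]; then
        echo 2    # step 2: clmgr stop node <node_name> WHEN=now MANAGE=unmanage
    else
        echo 4    # step 4: check RSCT DMS critical resource monitoring
    fi
}

# On a cluster node the policy could be read as in step 1.1 (not run here):
#   pre_lpm_next_step no "$(clodmget -n -f lpm_policy HACMPcluster)"
```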

LPM Automation – Post LPM Steps
1. Check if the current resource group status is unmanaged. If YES, go to 2; otherwise, go to 4.

2. Change the node back to ‘manage resource group’ status:
clmgr start node <node_name> WHEN=now MANAGE=auto

3. Remove the /etc/inittab entry that was added in the pre-migration process:
rmitab hacmp_lpm

4. Check if the RSCT DMS critical resource monitoring function was enabled before the LPM
operation.

5. Enable RSCT DMS critical resource monitoring:
/usr/sbin/rsct/bin/dms/startdms -s cthags
/usr/sbin/rsct/bin/hags_enable_client_kill -s cthags

6. Check if the current node_timeout is equal to the value you set before:
clctrl -tune -x node_timeout
clodmget -n -f lpm_node_timeout HACMPcluster

7. Restore the CAA node_timeout value:
clmgr -f modify cluster HEARTBEAT_FREQUENCY="30"

8. If SAN-based heartbeating was enabled, re-enable it:
rm -f /etc/cluster/ifrestrict
clusterconf
rmdev -l sfwcomm*
mkdev -l sfwcomm*

LPM Automation Further Enablement

smit sysmirror → Custom Cluster Configuration → Cluster Nodes and Networks → Manage the Cluster → Cluster heartbeat settings

The SMIT fast path is smit cm_chng_tunables

Live Kernel Update Support (LKU)

• LKU can only be performed on one cluster node at a time.

• Support includes all PowerHA SystemMirror Enterprise Edition Storage
replication features including HyperSwap and Geographic Logical Volume
Manager (GLVM).

– However, for asynchronous GLVM, you must swap to sync mode
before LKU is performed, and then swap back to async mode upon
LKU completion.

• During LKU operation, enhanced concurrent volume groups cannot be
changed.

• Workloads continue to run without interruption.

Live Kernel Update Support Phases

PowerHA provides scripts that are called during different phases of the AIX live
kernel update. An overview of the PowerHA operations at each phase follows:

Check phase
• Verifies that no other concurrent AIX Live Update is in progress in the cluster
• Verifies that the cluster is in stable state
• Verifies that there are no GLVM active asynchronous mirror pools

Pre-phase
• Switches the active Enhanced Concurrent volume groups (VGs) in silent mode
• Stops the cluster services and SRC daemons
• Stops GLVM traffic if required

Post phase
• Restarts GLVM traffic
• Restarts System Resource Controller (SRC) daemons and cluster services
• Restores the state of the Enhanced Concurrent volume groups

Live Kernel Update Support Enabling
Enabling via SMIT

1. smitty sysmirror → Cluster Nodes and Networks → Manage Nodes → Change/Show a Node.
2. Select desired node.
3. Set the Enable AIX Live Update operation field as desired.
4. Press Enter.

The cluster must be synchronized to enable the change.


Live Kernel Update Support Enabling
Enabling via clmgr

An example of how to check the current value of this setting using the clmgr command
follows:

[root@Jess] /# clmgr view node Jess |grep LIVE
ENABLE_LIVE_UPDATE="true"

To disable this setting using clmgr, run the following:

[root@Jess] /# clmgr modify node Jess ENABLE_LIVE_UPDATE=false
[root@Jess] /# clmgr view node Jess |grep LIVE
ENABLE_LIVE_UPDATE="false"

The cluster must be synchronized to enable the change.
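
For scripted checks, the grep above can be turned into a yes/no test. This is a minimal sketch: the function name live_update_enabled is hypothetical, and it assumes the clmgr view node output contains a line of the form ENABLE_LIVE_UPDATE="true" as in the transcript.

```shell
#!/bin/sh
# Sketch: succeed when clmgr-style output on stdin reports
# ENABLE_LIVE_UPDATE as true. Surrounding double quotes are optional.

live_update_enabled() {
    value=$(sed -n 's/.*ENABLE_LIVE_UPDATE=[[:space:]]*"\{0,1\}\([a-z]*\)"\{0,1\}.*/\1/p')
    [ "$value" = "true" ]
}

# Usage on a cluster node (not run here):
#   clmgr view node Jess | live_update_enabled && echo "Live Update enabled on Jess"
```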


Demo available here

Non-Disruptive Upgrades

• The AIX level must already be at a supported level for the PowerHA version

• Previously, in v7.1.x, non-disruptive updates (SPs) were supported but not upgrades.

• The process is exactly the same as non-disruptive updating

• Perform on ONE node at a time from start to finish
– Stop cluster node with the “Unmanage Resource Group” option
– Perform upgrade (update_all)
– Restart cluster with default of “Auto” manage resource group
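
The per-node sequence above can be written out as a printed plan. This is a sketch only: the function names ndu_plan and ndu_plan_all are hypothetical, the stop and start commands are the clmgr forms shown in the LPM steps earlier in this deck, and the upgrade step is left as a comment because the install method depends on the environment.

```shell
#!/bin/sh
# Sketch: print the non-disruptive upgrade steps for each node, one node
# at a time. Nothing is executed; the output is a plan to review.

ndu_plan() {
    node=$1
    echo "clmgr stop node $node WHEN=now MANAGE=unmanage"
    echo "# apply the PowerHA upgrade on $node (update_all)"
    echo "clmgr start node $node WHEN=now MANAGE=auto"
}

ndu_plan_all() {
    for n in "$@"; do
        ndu_plan "$n"
    done
}
```

For example, ndu_plan_all Jess Ellie (Ellie is a made-up second node name) prints the three steps for Jess followed by the three steps for Ellie.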

• Demo available here

PowerHA: Resource Optimized High Availability (ROHA)

[Diagram: a PowerHA cluster in which the application fails over from the active LPAR to the standby LPAR; the HMC and the hypervisor of each system draw on Enterprise Pool and CoD resources]

Customer Benefits
• Economical high availability through Enterprise Pool and CoD exploitation
• Customer savings
– Hardware costs
– Software license costs

Split/Merge Cluster Options – NFS TB

Cluster type         Split-Merge combination
Standard cluster     • None-Majority
Stretched cluster    • None-Majority
                     • Tiebreaker Disk-Tiebreaker Disk
                     • Tiebreaker NFS-Tiebreaker NFS
Linked cluster       • None-Majority
                     • Tiebreaker Disk-Tiebreaker Disk
                     • Tiebreaker NFS-Tiebreaker NFS
                     • Manual-Manual

GLVM Wizard
The GLVM Wizard was originally introduced in PowerHA v6.1.
Support was for synchronous configurations only.

PowerHA v7.2 GLVM enhancements:
• Also known as the GLVM Configuration Assistant
• The number of steps required has been reduced
• Support for asynchronous configurations has been added

Prerequisites:
• An active cluster is configured with sites and has no existing verification errors.
• XD_data networks with persistent IP labels are defined on the cluster.
• The network communication between the local site and remote site is working.
• The /etc/hosts file on both sites contains all of the host IP, service IP, and persistent IP labels that you want to use in the
GLVM configuration.
• The remote site has enough free disks and enough free space on those disks to support all of the local site volume
groups that are created for geographical mirroring.
• The following filesets are installed on your system:
– cluster.xd.glvm
– glvm.rpv.client
– glvm.rpv.server
• For all logical volumes that are planned to be geographically mirrored, the inter-disk allocation policy is set to
Super Strict.
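
The fileset prerequisite is easy to check from a script. The sketch below is an illustration under stated assumptions: the function name missing_glvm_filesets is hypothetical, and it assumes the installed fileset name is the first whitespace-separated field on each input line, as in lslpp -L output.

```shell
#!/bin/sh
# Sketch: report which of the required GLVM filesets are absent from a
# listing of installed filesets supplied on stdin.

REQUIRED_GLVM_FILESETS="cluster.xd.glvm glvm.rpv.client glvm.rpv.server"

missing_glvm_filesets() {
    # Keep only the first column (the fileset name) of each line.
    installed=$(awk '{ print $1 }')
    for fs in $REQUIRED_GLVM_FILESETS; do
        # -x: whole-line match, -F: treat the dots literally.
        echo "$installed" | grep -qxF "$fs" || echo "$fs"
    done
}

# Usage on each node (not run here):
#   lslpp -L | missing_glvm_filesets
```

Empty output means all three filesets were found; any printed name still needs to be installed before running the wizard.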

GLVM Wizard

smit sysmirror → Applications and Resources → Make Applications Highly Available (Use
Smart Assist) → GLVM Configuration Assistant → Configure Asynchronous GMVG

GLVM Wizard

root@Houston(/)# clshowres

Resource Group Name                                  datamvg_RG
Participating Node Name(s)                           Houston Boston
Startup Policy                                       Online On Home Node Only
Fallover Policy                                      Fallover To Next Priority Node In The List
Fallback Policy                                      Never Fallback
Site Relationship                                    Prefer Primary Site
Node Priority
Service IP Label
Filesystems                                          ALL
Filesystems Consistency Check                        fsck
Filesystems Recovery Method                          sequential
Volume Groups                                        datamvg
Concurrent Volume Groups
Use forced varyon for volume groups, if necessary    true
Disks
Raw Disks
Disk Error Management?                               no
GMVG Replicated Resources                            datamvg

PowerHA SystemMirror for AIX v7.2.1

SMUI – SystemMirror User Interface

Initial implementation focuses on these key areas:
1. Single pane of glass view of all clusters
2. Easy detailed status views of each cluster
3. Better troubleshooting and log support

Future updates will focus on (see 7.2.2):
1. Cluster configuration
2. Cluster administration
3. Cluster management
4. Users with specified privileges

PowerHA 7.2.1 SP1 added support for the following previous PowerHA versions:
7.2.0 (SP3)
7.1.3 (SP7)

SMUI – SystemMirror User Interface

• Install the following on the PowerHA cluster nodes
– cluster.es.smui.agent, cluster.es.smui.common

• Install the following on either a PowerHA node or external AIX system
– cluster.es.smui.server, cluster.es.smui.common

• SSH must be installed and running on both the server and clients.

• Execute the smuiinst.ksh script on the server (internet access required)
– If no internet access is available from the GUI server, copy the script over to another AIX system that does have internet access.
– Execute the smuiinst.ksh -d /directory command, where /directory is the location where you want to download the files.
– Then either NFS mount the directory or copy everything over to the server, and execute the smuiinst.ksh -i /directory command, where /directory is the
location where you copied the downloaded files on the node.

• Open the URL given in the output of the script in a supported browser
https://shawnssmui.cleartechnologies.net:8080/#/login

• Supported browsers at the time of release are:
– Google Chrome Version 50, or later
– Firefox Version 45, or later

SMUI – SystemMirror User Interface

Install demo available here

Tip: Before adding a cluster, make sure the CAA node name matches the system hostname.
Also place the FQDN first in /etc/hosts.

1. Click the keypad icon in the top center of the window

2. Choose Add Clusters

3. Enter either the hostname or IP address, along with user and password as required.

4. Click Discover clusters

SMUI – SystemMirror User Interface

Usage overview demo is available here

1. Navigation pane

2. Scoreboard

3. Event Filter

4. Event Timeline

5. Event list

SMUI – SystemMirror User Interface

1. Search terms – choose previously used or type in new search text
2. Log files – (hacmp.out, cluster.log, clutils.log, clstrmgr.debug, syslog.caa, clverify.log, autoverify.log)
3. Log file viewer – can open up to 4 windows; click the arrow to expand to a new window

Additional ROHA Support

ROHA performs acquisition of EPCoD, activation of On/Off CoD, and allocation of DLPAR resources at resource
group start time, and releases these resources at resource group stop time; nothing is done at
cluster synchronization time.

Loss of the HMC or of HMC access does not prevent takeover from occurring.

If LPM moves the LPAR to another HMC managed system, PowerHA adapts accordingly.

ALWAYS_START_RG dramatically modifies general behaviour: instead of failing when there are not enough
resources or for some other reason, the RG will succeed in starting. ALWAYS_START_RG is set to 1 by default
in 7.2.1; it was set to 0 in 7.2.0.

RESOURCE_ALLOCATION_ORDER changes the order in which acquisition is performed. By
default DLPAR is acquired first, then EPCoD: EPCoD is acquired only if there are not enough available
resources on DLPAR. It is now possible to acquire EPCoD first and then DLPAR.
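
The interplay of the two tunables can be illustrated with a toy model. This is not PowerHA code and makes no claim about how ROHA is implemented: the function name roha_allocate, the order labels dlpar_first and epcod_first, and the unit counts are all hypothetical, chosen only to mirror the behaviour described above.

```shell
#!/bin/sh
# Toy model of the acquisition order and ALWAYS_START_RG described above.
# Arguments: needed units, free DLPAR units, free EPCoD units,
#            order ("dlpar_first" or "epcod_first"), ALWAYS_START_RG (0 or 1).
# Prints: "<units from DLPAR> <units from EPCoD> <started|failed>"

roha_allocate() {
    needed=$1 dlpar_free=$2 epcod_free=$3 order=$4 always_start=$5

    if [ "$order" = "epcod_first" ]; then
        first=$epcod_free second=$dlpar_free
    else
        first=$dlpar_free second=$epcod_free
    fi

    # Take as much as possible from the first source, the rest from the second.
    from_first=$(( needed < first ? needed : first ))
    rest=$(( needed - from_first ))
    from_second=$(( rest < second ? rest : second ))
    shortfall=$(( rest - from_second ))

    # With ALWAYS_START_RG=1 the RG starts even when resources fall short.
    if [ "$shortfall" -eq 0 ] || [ "$always_start" -eq 1 ]; then
        result=started
    else
        result=failed
    fi

    if [ "$order" = "epcod_first" ]; then
        echo "$from_second $from_first $result"
    else
        echo "$from_first $from_second $result"
    fi
}
```

In this model, a request for 10 units with only 2 DLPAR and 3 EPCoD units free fails when ALWAYS_START_RG is 0 and starts with the 5 available units when it is 1.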

Additional ROHA Support

The PowerHA SystemMirror node name is the name of the node in the PowerHA configuration.

The hostname is the communication path to the node.

The LPAR name is the name of the LPAR hosting the node as seen by the HMC.

Having different values for each, for example:
– nodename : jess_node
– hostname : jessica
– lparname : jess_lpar
is supported by ROHA in 7.2.1.

Matches between node names and LPAR names are stored in the HACMPdynresop ODM
entries of each node and used when there is no other way to resolve them.

– clodmget HACMPdynresop

Split/Merge Options Apply To All Clusters

Cluster types: Standard cluster, Stretched cluster, Linked cluster

Split-Merge combinations (available to every cluster type):
• None-Majority
• Tiebreaker Disk-Tiebreaker Disk
• Tiebreaker NFS-Tiebreaker NFS
• Manual-Manual

CL_EZUPDATE – Easy Update Tool (SP1)

Single step, non-disruptive cluster update support
• Command line tool to address various types of non-disruptive updates
• Automates all steps of the migration documentation
• Cluster wide (one node at a time) update supported
• Detailed checks, messages/guidance and error messages

Support for NIM Server or local file system
• Integrated with NIM server

https://www.ibm.com/support/knowledgecenter/en/SSPHQG_7.2.1/com.ibm.powerha.cmds/cl_ezupdate.htm

CL_EZUPDATE – Easy Update Tool (SP1)

So what does it do?

• It performs numerous checks on each node to help ensure success
– This includes, but may not be limited to, the following:
  • PowerHA images are supported on the AIX levels installed in the cluster
  • clcomd communication is functional
  • Cluster, node, and resource group state
  • NIM server communication is functional and the NIM resource exists and is usable
  • Tests NFS mounting from the NIM server
  • Compares and validates that installed PowerHA filesets are the same on all nodes
  • Ensures no current PowerHA filesets need to be Committed or Rejected
  • Performs a preview installation of the update package

CL_EZUPDATE – Easy Update Tool (SP1)

So what does it do?

• Ultimately it performs a non-disruptive update across the cluster
– If the node is hosting a resource group, it stops the cluster in the “Unmanaged” state.
– If the node is NOT hosting a resource group, it gracefully stops the cluster on that node.
– Performs the update (update_all)
– If the node was gracefully stopped, it’s restarted in Manual mode
– If the node was stopped Unmanaged (forced), it’s restarted in Automatic mode
  • This means that the application start script will be executed again and may have undesirable
results. Ideally you should have a smart start script to check if the app is running and exit
accordingly if so. However, you can edit the script and simply insert an “exit 0” at the top of the
script.
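
A “smart” start script of the kind suggested above might look like the following. This is a minimal sketch, not a PowerHA-supplied script: the function names proc_listed and smart_start are hypothetical, and it assumes the application can be recognised by its command name in the last column of ps -e output.

```shell
#!/bin/sh
# Sketch of a "smart" application start wrapper: do nothing if the
# application is already running, otherwise run the start command.

# Succeeds when a process whose command name (last column of a
# "ps -e" style listing on stdin) equals $1 is present.
proc_listed() {
    awk -v name="$1" '$NF == name { found = 1 } END { exit !found }'
}

# $1 = process name to look for, remaining arguments = start command.
smart_start() {
    app_name=$1
    shift
    if ps -e | proc_listed "$app_name"; then
        echo "$app_name already running; nothing to do"
        return 0
    fi
    "$@"
}

# Usage inside the start script (placeholder names):
#   smart_start myappd /opt/myapp/bin/start_myapp
```

Because the wrapper returns 0 when the process is already present, a restart in Automatic mode after an Unmanaged stop does not launch a second copy of the application.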

PowerHA SystemMirror for AIX v7.2.2

SMUI – Deployment Wizard – Creating Cluster

SMUI – Deployment Wizard – Create Resource Group

SMUI – Deployment Wizard – Create Resource - VG

SMUI – Deployment Wizard – Assign Apps

SMUI – Admin - Sync & Start Cluster

SMUI - Admin - RG move and Stop Cluster

SMUI – Role Based Administration

SMUI – User Management

SMUI – Cluster Zones

Cluster Wide Log Analysis - clanalyze

Analyzes PowerHA SystemMirror log files for errors and provides the analysis report.

It performs the following tasks:

• Analyzes the log files and provides an error report based on error strings or time stamps.
• Analyzes the core dump file from the AIX® error log.
• Analyzes the log files that are collected through the snap and clsnap utilities.
• Analyzes user-specified snap file based on error strings that are provided and generates a
report.

https://www.ibm.com/support/knowledgecenter/en/SSPHQG_7.2.2/com.ibm.powerha.cmds/clanalyze.htm

cl_ezupdate – Easy update rollback feature

cl_ezupdate has been enhanced to include OS cloning for rollback recovery

https://www.ibm.com/support/knowledgecenter/en/SSPHQG_7.2.2/com.ibm.powerha.cmds/cl_ezupdate.htm

Additional Resources

Subscribe to my YouTube channel:
http://www.youtube.com/powerhaguy

Follow me on Twitter:
http://twitter.com/#!/POWERHAguy

Check out the PowerHA SystemMirror Wiki:
http://tinyurl.com/PHAwiki
