
GTS Delivery Center Argentina

IBM Global Account

Power HA Workshop
Part 1
Ing. Luciano Báez

1. Understand the High Availability technology.

2. Understand the HACMP & PowerHA technology.

Power HA - Workshop

Instructor
Ing. Luciano Martín BÁEZ MOYANO
lucianobaez@ar.ibm.com
luciano.baez
http://www.luchonet.com.ar
http://www.linkedin.com/in/lucianobaez
https://www.facebook.com/lucianobaez

Resources
http://ibmurl.hursley.ibm.com/NUMX
http://ibmurl.hursley.ibm.com/NUOH
Ing. Luciano Báez lucianobaez@ar.ibm.com

May 2016


What is a Cluster?
How many kinds of clusters are there?

A cluster is a group of servers and other resources with a common objective. Common kinds of clusters:
- Server farms
- Load balancing
- High availability
- High-performance computing


What is High Availability?

High Availability is a system design approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period.
Availability refers to the ability of the user community to access the system, whether to submit new work, update or alter existing work, or collect the results of previous work. If a user cannot access the system, it is said to be unavailable.
Generally, the term downtime is used to refer to periods when a system is unavailable.


Causes of Downtime

Downtime refers to a period of time, or a percentage of a time span, during which a machine or system (usually a computer server) is offline or not functioning, usually as a result of either a system failure (such as a crash) or routine maintenance.

[Slide diagram: causes of downtime mapped to the solution required: Disaster Recovery, High Availability (Continuous Operations).]

Reliability is not the same as Availability!



Availability percentage calculation

Availability is usually expressed as a percentage of uptime in a given year.

[Slide cartoon: user reactions to outages of increasing severity, from "Huh? Did something happen?" through "Checkpoint restart, not too bad" to "Start over. Where's all my work?"]

Uptime and Availability are not synonymous. A system can be up, but not available, as in the case of a network outage.
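The percentage-to-downtime arithmetic behind this slide can be sketched in a few lines (a minimal illustration; the function name and the sample percentages are mine, not from the slides):

```python
# Convert an availability percentage into the maximum downtime
# allowed over one (non-leap) year of continuous operation.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def max_downtime_minutes(availability_pct):
    """Minutes of downtime permitted per year at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct:7.3f}% -> {max_downtime_minutes(pct):8.2f} min/year")
```

So "three nines" allows roughly 525 minutes of downtime per year, while "five nines" leaves only about five minutes.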

What is a Fault Tolerant system?

Fault tolerance (or graceful degradation) is the property that enables a system to continue operating properly in the event of the failure of (or one or more faults within) some of its components. If its operating quality decreases at all, the decrease is proportional to the severity of the failure, as compared to a naively designed system in which even a small failure can cause total breakdown.
Fault tolerance is particularly sought after in high-availability or life-critical systems. Its basic characteristics are:
- No single point of failure (redundancy).
- Fault isolation to the failing component.
- Fault containment to prevent propagation of the failure.
- Availability of reversion modes.
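The "no single point of failure" requirement has a simple numeric intuition: independent redundant components multiply their failure probabilities. A toy sketch of the standard parallel-availability identity (not from the slides):

```python
# Availability of n redundant components in parallel, each with
# availability a: the assembly is down only if all n fail at once.
def parallel_availability(a, n):
    return 1 - (1 - a) ** n

# Two 99%-available components backing each other up give ~"four nines":
print(parallel_availability(0.99, 2))
```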


High Availability cluster design

High-availability clusters (also known as HA clusters or failover clusters) are computer clusters that are implemented primarily to provide high availability of the services the cluster hosts. They operate by having redundant computers or nodes which are then used to provide service when system components fail.

[Slide diagram: cluster nodes connected by a network and a heartbeat link, sharing SAN storage.]


Some concepts
Not every application can run in a high-availability cluster environment, and the necessary design
decisions need to be made early in the software design phase. In order to run in a high-availability
cluster environment, an application must satisfy at least the following technical requirements:

There must be a relatively easy way to start, stop, force-stop, and check the status of the
application. In practical terms, this means the application must have a command line interface or
scripts to control the application, including support for multiple instances of the application.
The application must be able to use shared storage (NAS/SAN).
Most importantly, the application must store as much of its state on non-volatile shared storage
as possible. Equally important is the ability to restart on another node at the last state before
failure using the saved state from the shared storage.
The application must not corrupt data if it crashes, or restarts from the saved state.

Failover: If a node hosting a clustered resource crashes, HA clustering remedies this situation by immediately restarting the application on another node, without requiring administrative intervention.
Failback: The process of moving the resource back to the original node after a failover.
Heartbeat: A connection between nodes that is used to monitor the health and status of each node in the cluster.
Split-brain: Occurs when all private links go down simultaneously but the cluster nodes are still running. If that happens, each node in the cluster may mistakenly decide that every other node has gone down and attempt to start services that other nodes are still running. Having duplicate instances of services may cause data corruption on the shared storage.
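The heartbeat idea boils down to declaring a peer dead once it stays silent too long. A toy failure detector (purely illustrative; PowerHA's real heartbeating runs inside RSCT/CAA, not in application code, and the names below are mine):

```python
# Toy heartbeat failure detector: a peer is declared DOWN once its
# last heartbeat timestamp is older than `grace` seconds.
def detect_failures(last_heartbeat, now, grace=10.0):
    """Return the peers whose last heartbeat is older than `grace`."""
    return {peer for peer, ts in last_heartbeat.items() if now - ts > grace}

beats = {"nodeA": 100.0, "nodeB": 93.0, "nodeC": 99.5}
print(detect_failures(beats, now=104.0))  # only nodeB has been silent > 10 s
```

Split-brain is exactly this detector firing on every node at once: each surviving node concludes the others are down, which is why an independent non-IP heartbeat path or a tie breaker is needed.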

High Availability Node configuration.


Active/Passive: Provides a fully redundant instance of each node, which is
only brought online when its associated primary node fails. This configuration
typically requires the most extra hardware.
Active/Active: Traffic intended for the failed node is either passed onto an
existing node or load balanced across the remaining nodes. This is usually only
possible when the nodes utilize a homogeneous software configuration.
N+1: Provides a single extra node that is brought online to take over the role of
the node that has failed. In the case of heterogeneous software configuration
on each primary node, the extra node must be universally capable of assuming
any of the roles of the primary nodes it is responsible for. This normally refers to
clusters which have multiple services running simultaneously; in the single
service case, this degenerates to Active/Passive.
N+M: In cases where a single cluster is managing many services, having only
one dedicated failover node may not offer sufficient redundancy. In such cases,
more than one (M) standby server is included and available. The number of
standby servers is a tradeoff between cost and reliability requirements.
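That cost/reliability tradeoff can be made concrete with a toy model: if each of the N+M nodes fails independently with probability p, service is lost only when more than M nodes are down at once (a hypothetical back-of-the-envelope model, not a PowerHA formula):

```python
from math import comb

# Probability that more than M of the N+M nodes are down simultaneously,
# assuming an independent per-node failure probability p (binomial tail).
def p_service_loss(n, m, p):
    total = n + m
    return sum(comb(total, k) * p**k * (1 - p)**(total - k)
               for k in range(m + 1, total + 1))

print(f"N+1: {p_service_loss(4, 1, 0.01):.2e}")  # 4 active nodes, 1 standby
print(f"N+2: {p_service_loss(4, 2, 0.01):.2e}")  # a second standby helps a lot
```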

High Availability Node configuration.


N-to-1: Allows the failover standby node to become the active one temporarily,
until the original node can be restored or brought back online, at which point
the services or instances must be failed-back to it in order to restore High
Availability.
N-to-N: A combination of Active/Active and N+M clusters, N to N clusters
redistribute the services, instances or connections from the failed node among
the remaining active nodes, thus eliminating (as with Active/Active) the need for
a 'standby' node, but introducing a need for extra capacity on all active nodes.


Active/Passive Node Configuration

[Slide diagram: clients request Application A over the network; the active node runs Application A and exchanges heartbeats with the passive node. When the active node fails, the failover process moves Application A and its service access to the passive node.]

Note: The failover process can be triggered by an unexpected failure (unplanned) or by a planned system administrator action.

Active/Active Node Configuration

[Slide diagram: clients request Application A and Application B over the network; each active node runs one application and the nodes exchange heartbeats. When the node running Application A fails, the failover process moves Application A onto the node already running Application B.]

Poor Man's Cluster Node Configuration

[Slide diagram: one active node runs Application A (production) while the other runs a low-importance Application B (a development database). When the production node fails, the failover process unloads the development application and brings Application A up on the surviving node; Application B is no longer available.]

2. HACMP
High Availability Cluster Multiprocessing
(Now called IBM PowerHA SystemMirror)


HACMP History

HACMP 4.2.2
- Introduced HAES, based on RSCT monitoring topology & event management services from PSSP

HACMP 4.3.1
- 32 node support
- Node-by-node migration
- Fast Connect support
- HACMP Task Guides

HACMP 4.4.1
- Integration with Tivoli
- Application Monitoring
- Cascading without fallback option
- Integration of HANFS functionality
- Selective fallover

HACMP 4.5
- Introduction of IP aliasing
- Persistent IP address
- 64-bit capable APIs
- Monitoring and recovery from loss of VG quorum

HACMP 5.1
- HAS (Classic) dropped
- Fast Disk Takeover
- Custom resource groups
- Heartbeating over IP aliases
- Disk heartbeating

HACMP 5.2
- 2-node configuration assistant
- File Collections
- Cluster Test Tool
- RG dependencies
- Self-healing clusters
- User password management
- WebSMIT

HACMP 5.3
- OEM volume & FS support
- Location dependencies
- Startup verification
- Geographic LV mirroring
- IP distribution policies

HACMP 5.4.0
- Non-disruptive upgrades
- Fast failure detection
- IPAT on XD networks
- Linux on Power support
- Oracle Smart Assistant
- GPFS 2.3 integration
- DSCLI support
- Intermix of DS enclosures

HACMP 5.4.1
- First Failure Data Capture
- WPAR integration
- Consistency Group support for DS Metro Mirror
- GLVM monitoring enhancements
- NFSv4 support
- Multi-node disk heartbeat
- WebSMIT enhancements

PowerHA 5.5
- WebSMIT gateway server
- WebSMIT enterprise view
- Partial IPv6 support
- Asynchronous GLVM
- DR manual recovery option
- SVC PPRC VIO support

PowerHA SystemMirror 6.1
- DSCLI Metro Mirror VIOS
- Packaging & pricing changes
- p6/p7 CoD DLPAR support
- EMC SRDF integration
- GLVM configuration wizard
- Full IPv6 support

PowerHA SystemMirror 7.1
- Cluster Aware AIX
- IBM Director integration
- Hitachi TrueCopy & Global Replicator integration
- DS8700 Global Mirror integration
- Drop of RSCT for multicast protocol
- Storage monitoring
- HADR storage framework

HACMP History

[Slide timeline: the product evolved from a single server, to clusters, to split-site clusters, to multi-site disaster recovery (Split Site Mirror, third-party storage DR), to SAP liveCache HotStandby with HyperSwap, active-active sites, and 3-site deployments.]

1989-1992: HACMP Cluster
- Active-Passive failover
- Resource group management
- Planned and unplanned outage handling
- Redundant communication support (disk, network, SCSI target, Token Ring, serial, etc.)
- Integration with Tivoli Monitoring
- VG failure handling
- NFS HA management

2000: Split-site clusters and multi-site disaster recovery
- Disaster Recovery with storage: DS8K, SVC
- Framework for OEM disk and FS support
- Location dependencies
- Low cost host mirroring
- File Collections
- Fast Disk Takeover

2004
- Fast failure detection
- Framework for OEM disk and FS support
- Capacity-optimized failovers
- Low cost host mirroring
- GPFS integration
- Two-node rapid deployment assistant
- WPAR HA management
- Browser-based UI
- RG dependencies
- Health monitoring and verification framework
- Flexible and uniform failover policies for 1 or 2 sites
- E2E integration, single point of control
- Application-level granularity
- DR with EMC, Hitachi, XIV
- Distributed server h/w management
- NDU upgrades
- RPO 0-3 sec / RTO < 1 hour
- Self healing

2010: PowerHA v7
- Kernel-based clustering
- PowerHA federated security administration
- Enhanced split/merge handling
- Enhanced middleware HA management (Smart Assists)
- SAP HA management
- SAP liveCache HotStandby solution
- IBM Director: graphical management
- Full IPv6 support
- Stretched and Linked Clusters

2012-2013: Active-active sites and 3-site deployments
- HyperSwap with DS8K
- Active-Active Sites support
- Manual failover DR
- Tie breaker support
- 3-site support through LVM Mirror + HyperSwap
- 3-site support through LVM Mirror + GLVM mirroring
- Unicast clustering
- Dynamic host name change support

PowerHA / HACMP support matrix


Packaging Changes Introduced in version 6.1:
Standard Edition - Local Availability
Enterprise Edition - Local & Disaster Recovery



Clustering for HA and DR

[Slide diagram: an application protected locally by a PowerHA Standard Edition cluster, and across sites by a PowerHA Enterprise Edition cluster.]

PowerHA enables 24x365 operational availability:
- Automation for planned and unplanned outages
- Solutions covering simple data center to multiple-site configurations

Power HA Standard and Enterprise

High-level features by edition:

Standard Edition:
- Centralized management (CSPOC)
- Cluster resource management
- Shared storage management
- Cluster verification framework
- Integrated disk heartbeat
- SMIT management interfaces
- AIX event/error management
- Integrated heartbeat
- PowerHA DLPAR HA management
- Smart Assists

Enterprise Edition (adds):
- Multi-site HA management
- PowerHA GLVM async mode
- GLVM deployment wizard
- IBM Metro Mirror support
- IBM Global Mirror support
- OEM Copy Services

Highlights:
- Editions to optimize software value capture
- Standard Edition targeted at datacenter HA
- Enterprise Edition targeted at multi-site HA/DR (Stretched Clusters, Linked Clusters)
- Per-processor-core pricing with a tiered structure (Small/Med/Large)

PowerHA SystemMirror 7.1 Enterprise Edition

- Simpler to deploy and easier to manage multi-site configurations with IBM Systems Director, intuitive interfaces, and a multi-site install wizard.
- Stretched Clusters: cluster-wide AIX commands, kernel-based event management, a single repository, multicast communications.
- Linked Clusters: cluster-wide AIX commands, kernel-based event management, linked clusters with unicast communications and dual repositories.
- HyperSwap for continuously available storage in two-site topologies.
- Cluster Split/Merge technology for managing split-site policy scenarios.

Announce Date: Oct 3, 2012
GA Date: Nov 9, 2012

Stretched clusters and linked cluster differences


You can use PowerHA SystemMirror management interfaces to create the following multiple-site
solutions:
Stretched cluster: Contains nodes from sites that are located at the same geographical location. Stretched clusters do not support HADR (High Availability Disaster Recovery) with storage replication management.
Linked cluster: Contains nodes from sites that are located at different geographical locations. Linked clusters support cross-site LVM mirroring and HyperSwap.


HyperSwap Support by AIX-PowerHA

- HyperSwap device configuration is transparent to the application.
- Applications continue to use the devices as usual; storage switching is fast (seconds).

[Slide diagram: both a traditional Metro Mirror cluster and a HyperSwap cluster replicate from a primary DS8K to a secondary DS8K via Metro Mirror; in the HyperSwap cluster, the application's /dev/hdisk devices can be switched transparently to the secondary storage.]

HyperSwap for PowerHA SystemMirror


The HyperSwap function in PowerHA enhances application availability against storage errors by using IBM DS8000 Metro Mirroring. If you use the HyperSwap function in your environment, your applications stay online even if errors occur on the primary storage, because PowerHA SystemMirror transparently routes the application I/O to an auxiliary storage system.
The HyperSwap function uses a model of communication called in-band, which sends the control commands to a storage system through the same communication channel as the I/O for the disk.
The HyperSwap function supports the following types of configurations:
- Traditional Metro Mirror Peer-to-Peer Remote Copy (PPRC): The primary volume group is only visible in the primary site, and the auxiliary volume group is only visible in the auxiliary site.
- HyperSwap: The primary and auxiliary volume groups are visible from the same node in the cluster.
You typically configure the HyperSwap function for the following environments:
- Single node environment: A single compute node is connected to two storage systems that are in two sites. This HyperSwap configuration is ideal for protecting against simple storage failures.
- Multiple site environment: A cluster has multiple nodes that are spread across two sites. This HyperSwap configuration provides high availability and disaster recovery.

HyperSwap for PowerHA SystemMirror


Mirror groups in HyperSwap for PowerHA SystemMirror represent a container of disks and have the following characteristics:

- Mirror groups contain information about the disk pairs across the sites. This information is used to configure mirroring between the sites.
- Mirror groups can contain a set of logical volume manager (LVM) volume groups and a set of raw disks that are not managed by the AIX operating system.
- All the disk devices that are associated with the LVM volume groups and raw disks that are part of a mirror group are configured for consistency. For example, the IBM DS8800 views a mirror group as one entity regarding consistency management during replication.

The following types of mirror groups are supported:
- User mirror group: Represents the middleware-related disk devices. The HyperSwap function is prioritized internally by PowerHA SystemMirror and is considered low priority.
- System mirror group: Represents a critical set of disks for system operation, such as rootvg disks and paging space disks. These mirror groups are used for mirroring a copy of data that is not used by any node or site other than the node that hosts these disks.
- Repository mirror group: Represents the cluster repository disks that are used by Cluster Aware AIX (CAA).


PowerHA SystemMirror: DLPAR Value

Pros:
- Automated action on acquisition of resources (bound to the PowerHA application server)
- HMC verification: checks for connectivity to the HMC
- Ability to grow the LPAR on failover
- Saves money on PowerHA SM licensing (thin standby node)

Cons:
- Requires connectivity to the HMC
- Potentially slower failover to the full system specs (takes a lot of time)
- Lacks the ability to grow the LPAR on the fly

[Slide diagram: LPAR A and the backup LPAR B each communicate with the HMC over SSH.]


PowerHA/HACMP topology
Networking components

Nodes: In the PowerHA context, the term node means any IBM pSeries system (physical or virtual) that is a member of a high availability cluster running PowerHA.
Networks: Networks consist of IP and non-IP networks. The non-IP networks ensure that cluster monitoring can still be done if there is a total loss of IP communications. Configuring non-IP networks is strongly recommended to provide high availability (Ethernet, EtherChannel, non-IP disk, non-IP serial, etc.).
Communication interfaces: Adapters for IP networks.
Communication devices: Used for non-IP networks (SAN).

PowerHA/HACMP topology
Resource components

Resource: A logical component that can be put into a resource group. Because they are logical components, resources can be moved without human intervention. [Slide diagram: examples of resources include a Service IP address, an application server, a volume group, NFS exports, and file systems.]

Resource Group: A collection of resources treated as a unit, along with the nodes they can potentially be activated on and the policies the cluster manager should use to decide which node to choose during startup, fallover, and fallback.

Custom Resource Groups


Startup preferences:
* Online On Home Node Only (cascading) - (OHNO)
* Online on First Available Node (rotating or cascading
w/inactive takeover) - (OFAN)
* Online On All Available Nodes (concurrent) - (OAAN)
* Startup Distribution
Fallover Preferences:
* Fallover To Next Priority Node In The List - (FOHP)
* Fallover Using Dynamic Node Priority - (FDNP)
* Bring Offline (On Error Node Only) - (BOEN)
Fallback Preferences:
* Fallback To Higher Priority Node - (FBHP)
* Never Fallback - (NFB)
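The fallover preferences above reduce to picking a node from an ordered list. A toy sketch of "Fallover To Next Priority Node In The List" (FOHP); illustrative only, with made-up node names, not PowerHA's cluster manager:

```python
# Pick the fallover target for a resource group.
# nodelist: the RG's participating nodes in priority order.
# up_nodes: nodes currently active in the cluster.
def fallover_target(nodelist, up_nodes, failed_node):
    for node in nodelist:  # walk in priority order
        if node != failed_node and node in up_nodes:
            return node    # highest-priority surviving node
    return None            # no candidate: the RG goes offline

nodes = ["nodeA", "nodeB", "nodeC"]
print(fallover_target(nodes, up_nodes={"nodeB", "nodeC"}, failed_node="nodeA"))
```

Dynamic node priority (FDNP) would replace the static `nodelist` order with one computed at failure time, e.g. from node load.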

Resource Groups Dependencies

* The maximum depth of the dependency tree is three levels, but any resource group can be in a dependency relationship with any number of other resource groups.
* Circular dependencies are not supported, and are prevented at configuration time.

[Slide diagram: two example dependency trees, e.g. RG A depending on RG B depending on RG C, and RG A depending on RG B and RG C, which both depend on RG D.]

Resource Groups Location Dependencies

Online on same node: All resource groups must be online on the same node.
Online on different nodes: All resource groups must be online on different nodes.
Online on same site: All resource groups must be online on the same site.


Online on Different Nodes priorities

You can assign High, Intermediate, or Low priority to each resource group.
- Higher priority resource groups take precedence over lower priority groups at startup, fallover, and fallback.
- High priority groups can force Intermediate and Low priority groups to move or go offline.
- Intermediate priority groups can force Low priority groups to move or go offline.
- Low priority groups cannot force any other groups to move or go offline.
- Groups of the same priority cannot force each other to move or go offline.
- RGs with the same priority cannot come ONLINE (start up) on the same node.
- RGs with the same priority do not cause one another to be moved from the node after a fallover or fallback.


Availability components
Not just PowerHA: The final high availability solution goes beyond PowerHA itself. A high availability solution comprises a reliable OS (AIX), applications that are tested to work in an HA cluster, storage devices, an appropriate selection of hardware, trained administrators, and thorough design and planning.


So what is PowerHA/HACMP?
It is an application which:

- Controls where resource groups run.
- Monitors and reacts to events (performs fallover and reintegration).
- Provides tools for cluster-wide configuration and synchronization.

[Slide diagram: the Cluster Manager subsystem comprises the topology manager, resource manager, event manager, and SNMP manager; it communicates through clcomdES, sits on top of RSCT (Topology Services / RMC), and reports status via snmpd to clinfoES and clstat.]


PowerHA V6 cluster manager flow


The cluster manager is a subsystem which:
- Controls where resource groups run, and reconfigures service addresses
- Reacts to events, and has tools for making cluster-wide changes and displaying status
- Relies on other AIX subsystems (ODM, LVM, CAA, RSCT, AHAFS, TCP/IP, etc.)


PowerHA V7 cluster manager flow


The cluster manager is a subsystem which:
- Controls where resource groups run, and reconfigures service addresses
- Reacts to events, and has tools for making cluster-wide changes and displaying status
- Relies on other AIX subsystems (ODM, LVM, CAA, RSCT, AHAFS, TCP/IP, etc.)


PowerHA topology
The cluster topology represents the physical view of the cluster and how hardware cluster
components are connected using networks (IP and non-IP). To understand the operation of
PowerHA, you need to understand the underlying topology of the cluster, the role each component
plays and how PowerHA interacts. In this section we describe:

PowerHA cluster
Nodes
Sites
Policies (Split and Merge)
Networks (physical, logical, labels, aliases, multicasting, unicasting, etc.)
Communication interfaces / devices
Persistent and Service node IP labels / addresses
Network modules (NIMs)
Topology and group services
Clients
etc.


PowerHA topology
Networks
In PowerHA, the term network is used to define a logical entity that groups the communication interfaces and
devices used for communication between the nodes in the cluster, and for client access. The networks in
PowerHA can be defined as IP networks and non-IP networks. The following terms are used to describe
PowerHA networking:

IP address: The dotted decimal IP address.


IP label: The label that is associated with a particular IP address as defined by the name
resolution method (DNS or static using /etc/hosts).
Base IP label / address: The default IP label / address that is set on the interface by AIX on
startup. The base address of the interface.
Service IP label / address: An IP label / address over which a service is provided. It can be
bound to a single node or shared by multiple nodes. Although not part of the topology, these are
the addresses that PowerHA keeps highly available.
Boot interface: Earlier versions of PowerHA have used the terms boot adapter and standby
adapter depending on the function. These have been collapsed into one term to describe any IP
network interface that can be used by PowerHA to host a service IP label / address.
IP aliases: An IP alias is an IP address that is added to an interface, rather than replacing its
base IP address. This is an AIX function that is supported by PowerHA. However, PowerHA
assigns to the IP alias the same subnet mask of the base IP address over which it is configured.
Logical network interface: The name to which AIX resolves a port (for example, en0) of a
physical network adapter.


PowerHA topology
IP Address takeover mechanism
One of the key roles of PowerHA is to maintain the service IP labels / addresses highly available.
PowerHA does this by starting and stopping each service IP address as required on the
appropriate interface. When a resource group is active on a node, PowerHA supports two
methods of activating the service IP addresses:

By replacing the base (boot-time) IP address of an interface with the
service IP address. This method is known as IP address takeover (IPAT)
via IP replacement. This method also allows the takeover of a locally
administered hardware address (LAA): hardware address takeover.
By adding the service IP address as an alias on the interface, in
addition to the base IP address. This method is known as IP address
takeover via IP aliasing. This is the default for PowerHA.


PowerHA topology
Persistent IP Label or Address
A persistent node IP label is an IP alias that can be assigned to a network for a specified node. A
persistent node IP label is a label that:
Always stays on the same node (is node-bound)
Co-exists with other IP labels present on the same interface
Does not require installation of an additional physical interface on that node
Is not part of any resource group
Assigning a persistent node IP label for a network on a node allows you to have a highly available
node-bound address on a cluster network. This address can be used for administrative purposes
because it always points to a specific node regardless of whether PowerHA is running.
Note: It is only possible to configure one persistent node IP label per network per node. For
example, if you have a node connected to two networks defined in PowerHA, that node can be
identified via two persistent IP labels (addresses), one for each network.


PowerHA topology
Device based or serial networks
Serial networks are designed to provide an alternative method for exchanging information using
heartbeat packets between cluster nodes. In case of IP subsystem or physical network failure,
PowerHA can still differentiate between a network failure and a node failure when an independent
path is available and functional.
Serial networks are point-to-point networks, and therefore, if there are more than two nodes in
the cluster, the serial links should be configured as a ring, connecting each node in the cluster.
Even though each node is only aware of the state of its immediate neighbors, the RSCT daemons
ensure that the group leader is aware of any changes in state of any of the nodes.
Even though it is possible to configure a PowerHA cluster without non-IP networks, we strongly
recommend that you use at least one non-IP connection between each node in the cluster.
The following devices are supported for non-IP (device-based) networks in PowerHA:
Serial RS232 (rs232)
Target mode SCSI (tmscsi)
Target mode SSA (tmssa)
Disk heartbeat (diskhb)
Multi-node disk heartbeat (mndhb)


PowerHA topology
Split policy
A cluster split event can occur between sites when a group of nodes cannot communicate with the
remaining nodes in a cluster. For example, in a linked cluster, a split occurs if all communication
links between the two sites fail. A cluster split event splits the cluster into two or more partitions.
The following options are available for configuring a split policy:
None: A choice of None indicates that no action will be taken when a cluster split event is detected. Each partition that is created by the cluster split event becomes an independent cluster, and each partition can start a workload independently of the other partition. If shared volume groups are in use, this can potentially lead to data corruption. This option is the default setting, since manual configuration is required to establish an alternative policy. Do not use this option if your environment is configured to use HyperSwap for PowerHA SystemMirror.
Tie breaker: A choice of Tie Breaker indicates that a disk will be used to determine which partitioned site is allowed to continue to operate when a cluster split event occurs. Each partition attempts to acquire the tie breaker by placing a lock on the tie breaker disk. The tie breaker is a SCSI disk that is accessible to all nodes in the cluster. The partition that cannot lock the disk is rebooted. If you use this option, the merge policy configuration must also use the tie breaker option.


PowerHA topology
Merge policy
Depending on the cluster split policy, the cluster might have two partitions that run independently of
each other. You can use PowerHA SystemMirror Version 7.1.2, or later, to configure a merge
policy that allows the partitions to operate together again after communications are restored
between the partitions.
The following options are available for configuring a merge policy:
Majority: The partition with the highest number of nodes remains online. If each partition has
the same number of nodes, then the partition that has the lowest node ID is chosen. The
partition that does not remain online is rebooted, as specified by the chosen action plan. This
option is available for linked clusters. For stretched clusters to use the majority option, your
environment must be running one of the following versions of the AIX operating system:
*IBM AIX 7 with Technology Level 4, or later
*AIX Version 7.2, or later.
Tie breaker: Each partition attempts to acquire the tie breaker by placing a lock on the tie
breaker disk. The tie breaker is a SCSI disk that is accessible to all nodes in the cluster. The
partition that cannot lock the disk is rebooted, or has cluster services restarted, as specified by
the chosen action plan. If you use this option, your split policy configuration must also use the
tie breaker option.
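The majority rule above can be written down as a small sketch, assuming each partition is represented as a list of its node IDs (the function and partition names are illustrative, not part of PowerHA):

```python
def majority_winner(partitions):
    """Pick the partition that stays online under the 'majority' policy.

    partitions maps a partition name to the list of node IDs it holds.
    The partition with the most nodes wins; on a tie in size, the
    partition containing the lowest node ID wins.  The losing side is
    rebooted, as specified by the chosen action plan.
    """
    return max(partitions,
               key=lambda name: (len(partitions[name]), -min(partitions[name])))
```

For example, a 2-node partition loses to a 3-node partition; when both sides have two nodes, the side holding node ID 1 survives.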


PowerHA topology
Highly available NFS server
The highly available NFS server functionality is included in the PowerHA SystemMirror product
subsystem.
A highly available NFS server allows a backup processor to recover current NFS activity should the
primary NFS server fail. The NFS server special functionality includes highly available modifications
and locks on network file systems (NFS).
You can do the following:

Use the reliable NFS server capability that preserves locks and dupcache (2-node
clusters only, if using NFS version 2 and version 3)
Specify a network for NFS cross-mounting
Define NFS exports and cross-mounts at the directory level
Specify export options for NFS-exported directories and file systems
Configure two nodes to use NFS.

PowerHA SystemMirror clusters can contain up to 16 nodes. Clusters that use NFS version 2 and
version 3 can have a maximum of two nodes, and clusters that use NFS version 4 can have a
maximum of 16 nodes.
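The node-count limits quoted above can be captured in a small validation helper. This is a sketch: the constants restate the limits in the text (NFS v2/v3 at most 2 nodes, NFS v4 at most 16), and the function name is an assumption:

```python
# Maximum cluster nodes per NFS protocol version, per the limits above.
NFS_NODE_LIMITS = {2: 2, 3: 2, 4: 16}

def nfs_cluster_supported(nfs_version, node_count):
    """Return True if a cluster of node_count nodes can serve this NFS version."""
    limit = NFS_NODE_LIMITS.get(nfs_version)
    if limit is None:
        raise ValueError(f"unknown NFS version: {nfs_version}")
    return 1 <= node_count <= limit
```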


PowerHA topology
PowerHA SystemMirror common cluster configurations
Standby configurations: Standby configurations are the traditional redundant hardware configurations where
one or more standby nodes stand idle, waiting for a server node to leave the cluster. (Standby configurations with
online on home node only startup policy, Standby configurations with online using distribution policy startup)

Takeover configurations: In the takeover configurations, all cluster nodes do useful work, processing part of the
cluster's workload. There are no standby nodes. Takeover configurations use hardware resources more efficiently
than standby configurations since there is no idle processor. Performance can degrade after node detachment,
however, since the load on remaining nodes increases. (One-sided takeover, Mutual takeover, Two-node mutual
takeover configuration, Eight-node mutual takeover configuration)

Cluster configurations with multitiered applications: A typical cluster configuration that could utilize parent
and child dependent resource groups is the environment in which an application such as WebSphere depends on
another application such as DB2.
Cluster configurations with resource group location dependencies: You can configure the cluster so that
certain applications stay on the same node, or on different nodes not only at startup, but during fallover and
fallback events. To do this, you configure the selected resource groups as part of a location dependency set.
Cross-site LVM mirror configurations for disaster recovery: You can set up disks that are located at two
different sites for remote LVM mirroring, using a storage area network (SAN).
Cluster configurations with dynamic LPARs: The advanced partitioning features of AIX provide the ability to
dynamically allocate system CPU, memory, and I/O slot resources (dynamic LPAR).

What will PowerHA do for me?

- Heartbeats & Failure Detection
  - Communication across all available interfaces (NICs, HBAs, Repository LUN)
- IP Address Takeover
- Break Disk Reservations
- Start / Stop Applications
- Ability to define Dependencies in multitiered Environments
- Application Monitoring
  - Failure Notification
  - Application Recovery
  - Initiate Takeover
- Custom Events (Alter processing)
- Automatic Corrective Actions
  - Nightly Verification of configuration
  - Built-in Lazy Update
- Documenting the environment
  - Cluster Snapshots
  - IBM Director Integration
- File Collections
  - Propagate Files amongst the cluster
- Ease of Deployment
  - Smart Assistants (SAP, DB2, Oracle, ...)
- Federated Security Integration
  - System Security Administration
  - Encrypted Filesystems
  - RBAC

PowerHA topology
Tasks to Configure Cluster Infrastructure, and ownership

Network Admin.:
- Plan out IP addresses
- Hard-set interface IPs
- Document DNS names
- Update /etc/hosts

Storage Admin.:
- Share storage (drivers / filesets, assign LUNs)
- Alter SAN infrastructure (zoning)

Application Admin.:
- Install applications (start/stop scripts, space requirements)
- Optimize configuration for performance

HA Admin.:
- HA cluster installation / deployment
- Topology & resource setup
- Fallover testing
- Monitoring the environment


Questions
