PowerHA - 3 Basic Configuration

PowerHA SystemMirror
Basic Configuration
IBM Power Systems
© Copyright IBM Corporation 2010

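Implementation Expert Course: PowerHA
HACMP Configuration Process
• Preparation before configuring HACMP
  - Configure the IP addresses
  - Edit the /etc/hosts file
  - Write the application start/stop scripts
  - Create the volume groups (VGs) and file systems
  - Prepare the serial port devices and disk heartbeat devices
• HACMP Standard configuration path
  - Add the cluster and nodes
  - Configure the cluster resources
  - Create the cluster resource groups
  - Synchronize the HACMP configuration
• HACMP Extended configuration path
  - Add heartbeat networks
  - Customize the cluster resources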

PowerHA Configure Menu

smitty hacmp

Extended Configuration (SMIT screenshots; the numbers are the slide callouts)
  1    Extended Topology Configuration (callouts 1.1 to 1.5)
  2    Extended Resource Configuration
       2.1  Extended Resources Configuration (callouts 2.1.1 and 2.1.2)
       2.2  Extended Resource Group Configuration (callouts 2.2.1 and 2.2.2)

Startup and Stop Services

Startup and Stop Services (V5.4)
Stop modes: Graceful down, Take over, Force down

Hands-on Description
• 1. Active - Standby (service IP, 2 nodes)
  - Simulate application takeover
• 2. Active - Standby (service IP, VG, 2 nodes)
  - Simulate database takeover
• 3. Active - Active (service IP, VG, 2 nodes)
  - Simulate application and database server active-active takeover
• 4. Persistent IP (2 nodes)
• 5. Active - Active (service IP, VG, disk heartbeat, 2 nodes)
  - Simulate disk heartbeat
• 6. Active - Active (concurrent VG, 2 nodes)
  - Simulate concurrent running mode
• 7. Dependency among RGs (4 nodes)
  - Parent/child dependency

Case 1. Active - Standby (service IP, 2 nodes)
• Simulate application takeover

Case 2. Active - Standby (service IP, VG, 2 nodes)
• Simulate database server takeover

Diagram labels:
  Role              Active           Standby
  Node              hb_vlpar1        hb_vlpar2
  Service label     vlpar1_svc
  Boot label        vlpar1_boot      vlpar2_boot
  Standby label     vlpar1_stdby     vlpar2_stdby
  Networks:         172.16.16 and 172.16.18 subnets
  Volume group:     test1vg
  Resource group:   rg1: vlpar1_svc, test1vg (newly added part)
  Shared disk:      hdisk1

Case 3. Active - Active (service IP, VG, 2 nodes)
• Simulate application and DB server takeover

Diagram labels:
  Role              Active           Active
  Node              hb_vlpar1        hb_vlpar2
  Service label     vlpar1_svc       vlpar2_svc
  Boot label        vlpar1_boot      vlpar2_boot
  Standby label     vlpar1_stdby     vlpar2_stdby
  Networks:         172.16.16 and 172.16.18 subnets
  Volume groups:    test1vg, test2vg (newly added part)
  Resource groups:  rg1: vlpar1_svc, test1vg
                    rg2: vlpar2_svc, test2vg
  Shared disks:     hdisk1, hdisk2

Case 4. Persistent IP (2 nodes)
[Diagram; annotation: newly added part]

Case 5. Active - Active (service IP, VG, disk heartbeat, 2 nodes)
• Simulate disk heartbeat
[Diagram; annotation: newly added part]

Case 6. Active - Active (concurrent VG, 4 nodes)
• Simulate concurrent running mode
[Diagram; annotation: modified part]

Case 7. Dependencies between resource groups

Thank You!

Backups

DARE (Dynamic Reconfiguration)
• When you configure an HACMP cluster, configuration data is stored in HACMP-specific
  object classes in the Configuration Database (ODM). The AIX 5L ODM object classes are
  stored in the default system configuration directory (DCD), /etc/es/objrepos.
• At cluster startup, HACMP copies the HACMP-specific ODM classes into a separate
  directory called the Active Configuration Directory (ACD),
  /usr/es/sbin/cluster/etc/objrepos/active. While a cluster is running, the HACMP
  daemons, scripts, and utilities reference the Configuration Database data stored
  in the ACD.
• If you synchronize the cluster topology or cluster resources definition while the
  Cluster Manager is running on the local node, this action triggers a dynamic
  reconfiguration event. In a dynamic reconfiguration event, the HACMP
  Configuration Database data in the DCDs on all cluster nodes is updated, and the
  HACMP Configuration Database data in the ACD is overwritten with the new
  configuration data. The HACMP daemons are refreshed so that the new configuration
  becomes the currently active configuration.
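
The relationship between the DCD and the ACD can be pictured with a small toy model. The Python sketch below is illustrative only: the Node class, the synchronize function, and the dictionary contents are hypothetical stand-ins for the ODM object classes, not HACMP interfaces, and the model covers only the behavior described on this slide.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a cluster node (hypothetical, not an HACMP API)."""
    name: str
    dcd: dict = field(default_factory=dict)  # default configuration directory
    acd: dict = field(default_factory=dict)  # active configuration directory
    running: bool = False                    # is the Cluster Manager active?

    def start_cluster_services(self):
        # At cluster startup the HACMP classes are copied from the DCD to the ACD.
        self.acd = deepcopy(self.dcd)
        self.running = True


def synchronize(local, nodes):
    """Push the local DCD to every node; return True if this was a DARE."""
    dare = local.running  # synchronizing while the local Cluster Manager runs
    new_config = deepcopy(local.dcd)
    for node in nodes:
        node.dcd = deepcopy(new_config)
        if dare and node.running:
            # The running daemons read the ACD, so it is overwritten as well.
            node.acd = deepcopy(new_config)
    return dare


if __name__ == "__main__":
    a = Node("nodeA", dcd={"rg1": ["svc_label"]})
    b = Node("nodeB", dcd={"rg1": ["svc_label"]})
    for n in (a, b):
        n.start_cluster_services()
    a.dcd["rg2"] = ["other_label"]  # change made in the local DCD only
    print(synchronize(a, [a, b]), b.acd)
```

In this model a change sits in the local DCD until synchronization; only then do the other nodes' DCDs and the ACDs pick it up, which mirrors why an unsynchronized change has no effect on a running cluster.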

DARE supports the following topology changes:
• Adding or removing nodes
• Adding or removing network interfaces
• Swapping a network interface card
• Changing network module tuning parameters
• Adding a new Heartbeating over Aliasing network
• Changing an active network to (but not from) Heartbeating over Aliasing
• Changing the address offset for a Heartbeating over Aliasing network
• Adding, changing, or removing a network interface or a node configured in a
  Heartbeating over Aliasing network
• All topology configuration changes allowed in an HACMP configuration are
  now supported in HACMP/XD configurations. Supported changes include
  changes for XD-type networks, interfaces, sites, nodes, and NIM values.

DARE supports the following resource changes:
• Add, remove, or change an application server.
• Add, remove, or change application monitoring.
• Add or remove the contents of one or more resource groups.
• Add, remove, or change a tape resource.
• Add or remove one or more resource groups.
• Add, remove, or change the order of participating nodes in a resource group.
• Change the node relationship of the resource group.
• Change the resource group processing order.
• Add, remove, or change the fallback timer policy associated with a resource
  group.
• Add, remove, or change the settling time for resource groups.
• Add or remove the node distribution policy for resource groups.
• Add, change, or remove parent/child or location dependencies for resource
  groups.
• Add, change, or remove the inter-site management policy for resource groups.
• Add or remove a replicated resource. (You cannot change a replicated resource
  to non-replicated or vice versa.)
• Add, remove, or change pre- or post-events.

DARE supported changes (reminders):
• DARE changes to the settling time.
  - The current settling time continues to be active until the resource group moves to
    another node or goes offline. A DARE operation may result in the release and
    reacquisition of a resource group, in which case the new settling time values take
    effect immediately.
• Changing the name of an application server, node, or resource group.
  - You must stop cluster services before such changes become active. You can include
    such a change in a dynamic reconfiguration; however, HACMP interprets these changes,
    particularly name changes, as defining a new cluster component rather than as
    changing an existing component. Such a change causes HACMP to stop the active
    component before starting the new component, causing an interruption in service.
• Dynamic reconfiguration is not supported during a cluster migration to a new
  version of HACMP, or when any node in the cluster has resource groups in the
  UNMANAGED state.

Tasks that Require Stopping the Cluster
• HACMP allows you to do many tasks without stopping the cluster; you can do
  many tasks dynamically using the DARE and C-SPOC utilities. However, to do
  the following tasks, you must stop the cluster:
  - Change the name of a cluster component: network module, cluster node, or
    network interface. Once you configure the cluster, you should not need to
    change these names.
  - Maintain RSCT.
  - Change automatic error notification.
  - Convert a service IP label from IPAT via IP Replacement to IPAT via IP
    Aliases.
• Note: No automatic corrective actions take place during a DARE.

NFS & HACMP

  Service IP Labels/Addresses                          [nfssvr]
  Application Servers                                  []
  Volume Groups                                        [nfsvg]
  Use forced varyon of volume groups, if necessary     false
  Automatically Import Volume Groups                   false
  Filesystems (empty is ALL for VGs specified)         [ ]
  Filesystems Consistency Check                        fsck
  Filesystems Recovery Method                          sequential
  Filesystems mounted before IP configured             true
  Filesystems/Directories to Export                    [/home/testfs]
  Filesystems/Directories to NFS Mount                 [/mnt;/home/testfs]
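
Routine Administration
- clshowsrv -v
  Shows the status of the HACMP subsystems.
- clRGinfo
  Shows the current state of the resource groups.
- cllscf / cltopinfo
  Show the cluster topology.
- clshowres
  Shows the resource group configuration.
- cllsnw, cllsif
  Show the cluster network information.
- clstat (requires the clinfoES service to be running)
  Shows the status of all nodes in the cluster.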
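
The commands above can be collected into a single status report. The Python sketch below is a minimal illustration, not an IBM-provided tool: the collect_status and run_command names and the report labels are hypothetical, and it assumes the HACMP utilities are reachable through PATH (on many systems they are installed under /usr/es/sbin/cluster/utilities). clstat is left out because it is interactive.

```python
import subprocess

# Command names are taken from the list above. That they are on PATH is an
# assumption; adjust the entries to full paths if your system needs that.
STATUS_COMMANDS = [
    ("subsystem status", ["clshowsrv", "-v"]),
    ("resource group state", ["clRGinfo"]),
    ("topology", ["cltopinfo"]),
    ("resource group configuration", ["clshowres"]),
    ("networks", ["cllsnw"]),
    ("interfaces", ["cllsif"]),
]


def run_command(argv):
    """Run one command and return (exit_code, stdout)."""
    done = subprocess.run(argv, capture_output=True, text=True)
    return done.returncode, done.stdout


def collect_status(runner=run_command):
    """Run every status command; a missing command yields (None, reason)."""
    report = {}
    for label, argv in STATUS_COMMANDS:
        try:
            report[label] = runner(argv)
        except OSError as exc:  # for example, the command is not installed
            report[label] = (None, str(exc))
    return report


if __name__ == "__main__":
    for label, (code, output) in collect_status().items():
        print(f"== {label} (exit code {code})")
        print(output)
```

The runner parameter exists so the report logic can be exercised on a machine without HACMP by passing a stand-in function.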

Routine Administration
- /usr/sbin/snap -e
  Collects the HACMP data.
- /usr/sbin/rsct/bin/dhb_read -p devicename -r/-t
  Tests the link status of the disk heartbeat path (-r receives on one node, -t
  transmits from the other).
- clpasswd
  Changes a user's password on each node in the cluster.
- cllsdisk
  Lists PVIDs of accessible disks in a specified resource chain.
- cllsvg
  Lists volume groups accessible in a specified resource chain.
- cllsparam
  Lists runtime parameters.

Routine Administration
- cl_clstop
  Stops cluster services on nodes running C-SPOC.
- cl_lsfs
  Displays shared filesystem attributes for all cluster nodes.
- cl_lsgroup
  Displays group attributes for all cluster nodes.
- cl_lslv
  Displays shared logical volume attributes for cluster nodes.
- cl_lsuser
  Displays user account attributes for all nodes.
- cl_lsvg
  Displays shared volume group attributes for cluster nodes.
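
Routine Administration: Parameter Tuning
- I/O pacing
  When other applications on the system perform heavy I/O, users may see problems such as
  severely degraded interactive performance. Tuning the system's I/O pacing balances
  resource allocation during periods of heavy disk reads and writes. Use smitty chgsys to
  set the I/O pacing high-water and low-water marks; the default is 0 (I/O pacing disabled).
- Changing the failure detection rate
  If enabling I/O pacing or increasing the syncd frequency in the cluster does not resolve a
  deadman problem, change the failure detection rate to "slow". This lengthens the time
  before the deadman switch is invoked on the hung node and before the takeover node
  detects the node failure and acquires the hung node's resources.
- syncd frequency
  Edit /sbin/rc.boot to increase the syncd frequency, from the default of every 60 seconds
  to every 30, 20, or 10 seconds. A higher frequency forces more frequent I/O flushes
  during heavy I/O and reduces the likelihood of triggering the deadman switch.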
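
The I/O pacing and syncd settings above can also be prepared from a script. The Python sketch below is a hedged illustration: the function names are hypothetical, it assumes the high-water and low-water marks correspond to the maxpout and minpout attributes of sys0 (the fields that smitty chgsys edits), and it assumes /sbin/rc.boot starts syncd with a line of the form "nohup /usr/sbin/syncd 60 ...". It only builds the command and the edited text; nothing is applied to the system.

```python
import re


def io_pacing_command(high_water, low_water):
    """Build the chdev call that sets the I/O pacing marks on sys0."""
    disabled = (high_water, low_water) == (0, 0)  # 0/0 disables I/O pacing
    if not disabled and not (high_water > low_water >= 0):
        raise ValueError("high-water mark must exceed the low-water mark")
    return ["chdev", "-l", "sys0",
            "-a", f"maxpout={high_water}",
            "-a", f"minpout={low_water}"]


# Matches the syncd start line and captures its interval in seconds.
SYNCD_RE = re.compile(r"(/usr/sbin/syncd\s+)(\d+)")


def set_syncd_interval(rc_boot_text, seconds):
    """Return (new_text, replacements) with the syncd interval replaced."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return SYNCD_RE.subn(lambda m: f"{m.group(1)}{seconds}", rc_boot_text)


if __name__ == "__main__":
    # 33 and 24 are example values only; use the figures recommended for
    # your HACMP and AIX levels.
    print(" ".join(io_pacing_command(33, 24)))
    with open("/sbin/rc.boot") as handle:
        new_text, hits = set_syncd_interval(handle.read(), 10)
    print(f"{hits} syncd line(s) would change")
```

Keeping the edit as a pure text transformation makes it easy to review the result before writing it back, and a hit count of zero signals that the rc.boot layout differs from the assumed one.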

HACMP-Related Log Files
- /tmp/clstrmgr.debug
  Contains time-stamped, formatted messages generated by the clstrmgrES daemon. The default
  messages are verbose and are typically adequate for troubleshooting most problems;
  however, IBM support may direct you to enable additional debugging.
  Recommended Use: Information in this file is for IBM Support personnel.
- /tmp/cspoc.log
  Contains time-stamped, formatted messages generated by HACMP C-SPOC commands. The
  /tmp/cspoc.log file resides on the node that invokes the C-SPOC command.
  Recommended Use: Use the C-SPOC log file when tracing a C-SPOC command's execution on
  cluster nodes.
- /tmp/emuhacmp.out
  Contains time-stamped, formatted messages generated by the HACMP Event Emulator. The
  messages are collected from output files on each node of the cluster and cataloged
  together into the /tmp/emuhacmp.out log file. In verbose mode (recommended), this log
  file contains a line-by-line record of every event emulated. Customized scripts within
  the event are displayed, but commands within those scripts are not executed.

- /var/hacmp/log (before V5.4: /tmp/hacmp.out)
  Contains time-stamped, formatted messages generated by HACMP scripts on the current
  day. In verbose mode (recommended), this log file contains a line-by-line record of every
  command executed by scripts, including the values of all arguments to each command. An
  event summary of each high-level event is included at the end of each event's details.
  Recommended Use: Because the information in this log file supplements and expands
  upon the information in the /usr/es/adm/cluster.log file, it is the primary source of
  information when investigating a problem.
  Note: With recent changes in the way resource groups are handled and prioritized in
  fallover circumstances, the hacmp.out file and its event summaries have become even
  more important in tracking the activity and resulting location of your resource groups.
  In HACMP releases prior to 5.2, non-recoverable event script failures result in the
  event_error event being run on the cluster node where the failure occurred. The
  remaining cluster nodes do not indicate the failure. With HACMP 5.2 and up, all cluster
  nodes run the event_error event if any node has a fatal error. All nodes log the error
  and call out the failing node name in the hacmp.out log file.
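
Because hacmp.out in verbose mode is long, a first pass often looks only at the event boundaries. The Python sketch below is a minimal illustration with hypothetical function names; the marker strings are an assumption about how event start, completion, and failure lines are worded in hacmp.out, so check them against your own log before relying on them.

```python
# Assumed wording of the event boundary lines in hacmp.out.
EVENT_MARKERS = ("EVENT START", "EVENT COMPLETED", "EVENT FAILED")


def event_lines(lines, markers=EVENT_MARKERS):
    """Yield (line_number, text) for every line containing a marker."""
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in markers):
            yield number, line.rstrip("\n")


def summarize(path):
    """Print the event boundary lines of one hacmp.out file."""
    with open(path, errors="replace") as handle:
        for number, text in event_lines(handle):
            print(f"{number}: {text}")
```

Calling summarize with the hacmp.out location for your release (see the paths above) prints each matching line with its line number, which gives a starting point for reading the surrounding detail.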

- /usr/es/adm/cluster.log
  Contains time-stamped, formatted messages generated by HACMP scripts and daemons.
  Recommended Use: Because this log file provides a high-level view of current cluster
  status, check this file first when diagnosing a cluster problem.
- /usr/es/sbin/cluster/history/cluster.mmddyyyy
  Contains time-stamped, formatted messages generated by HACMP scripts. The system creates
  a cluster history file every day, identifying each file by its file name extension, where
  mm indicates the month, dd the day, and yyyy the year. For information about viewing this
  log file and interpreting its messages, see the section
  Recommended Use: Use the cluster history log files to get an extended view of cluster
  behavior over time. Note that this log is not a good tool for tracking resource groups
  processed in parallel. In parallel processing, certain steps formerly run as separate
  events are now processed differently, and these steps will not be evident in the cluster
  history log. Use the hacmp.out file to track parallel processing activity.
- /usr/es/sbin/cluster/snapshots/clsnapshot.log
  Contains logging information from the snapshot utility of HACMP, and information about
  errors found and/or actions taken by HACMP for resetting cluster tunable values.
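
The daily history file name described above (cluster.mmddyyyy) can be derived from a date. The Python sketch below is a small illustration; the history_log_path function is hypothetical, and the default directory is the one given on this slide.

```python
from datetime import date


def history_log_path(day, directory="/usr/es/sbin/cluster/history"):
    """Return the cluster history file path for the given date."""
    # %m%d%Y yields the mmddyyyy extension: month, day, four-digit year.
    return f"{directory}/cluster.{day:%m%d%Y}"


if __name__ == "__main__":
    print(history_log_path(date.today()))
```

Note the month-first order of the extension: the file for 5 March 2010 ends in 03052010, so the names do not sort chronologically across months and years.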

- /var/adm/clavan.log
  Contains the state transitions of applications managed by HACMP, for example, when each
  application managed by HACMP is started or stopped and when the node on which an
  application is running stops. Each node has its own instance of the file. Each record in
  the clavan.log file consists of a single line. Each line contains a fixed portion and a
  variable portion.
  Recommended Use: By collecting the records in the clavan.log file from every node in the
  cluster, a utility program can determine how long each application has been up, as well
  as compute other statistics describing application availability time.
- /var/ha/log/grpglsm
  Contains time-stamped messages in ASCII format. These track the execution of internal
  activities of the RSCT Group Services Globalized Switch Membership daemon. IBM support
  personnel use this information for troubleshooting. The file gets trimmed regularly, so
  save it promptly if there is a chance you may need it.
- /var/ha/log/grpsvcs
  Contains time-stamped messages in ASCII format. These track the execution of internal
  activities of the RSCT Group Services daemon. IBM support personnel use this information
  for troubleshooting. The file gets trimmed regularly, so save it promptly if there is a
  chance you may need it.

- /var/ha/log/topsvcs
  Contains time-stamped messages in ASCII format. These track the execution of internal
  activities of the RSCT Topology Services daemon. IBM support personnel use this
  information for troubleshooting. The file gets trimmed regularly, so save it promptly if
  there is a chance you may need it.
- /var/hacmp/clcomd/clcomddiag.log
  Contains time-stamped, formatted, diagnostic messages generated by clcomd.
  Recommended Use: Information in this file is for IBM Support personnel.
- /var/hacmp/clcomd/clcomd.log
  Contains time-stamped, formatted messages generated by Cluster Communications daemon
  (clcomd) activity. The log shows information about incoming and outgoing connections,
  both successful and unsuccessful. It also displays a warning if the file permissions for
  /usr/es/sbin/cluster/etc/rhosts are not set correctly; users on the system should not be
  able to write to the file.
  Recommended Use: Use information in this file to troubleshoot inter-node communications,
  and to obtain information about attempted connections to the daemon (and therefore to
  HACMP).
- /var/hacmp/clverify/clverify.log
  Contains the verbose messages output by the cluster verification utility. The messages
  indicate the node(s), devices, command, and so on in which any verification error
  occurred.

- /var/hacmp/log/clutils.log
  Contains information about the date, time, results, and which node performed an automatic
  cluster configuration verification. It also contains information for the file collection
  utility, the two-node cluster configuration assistant, the cluster test tool, and the
  OLPW conversion tool.
- /var/hacmp/log/cl_configassist.log
  Contains debugging information for the Two-Node Cluster Configuration Assistant. The
  Assistant stores up to ten copies of the numbered log files to assist with
  troubleshooting activities.
- /var/hacmp/log/cl_testtool.log
  Includes excerpts from the hacmp.out file. The Cluster Test Tool saves up to three log
  files and numbers them so that you can compare the results of different cluster tests.
  The tool also rotates the files, with the oldest file being overwritten.

- Changing the default log directory
  1. Enter smit hacmp.
  2. In SMIT, select System Management (C-SPOC) > HACMP Log Viewing and Management
     > Change/Show a Cluster Log Directory.
  3. Select the log that you want to redirect.
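
Q & A: Planning - Network Communication
- Persistent IP
  After HACMP starts successfully, on node A the persistent IP is bound to ent0 and the
  service IP to ent1 (node B behaves normally and is left out here).
  Scenario 1: Unplug the cable from ent0. Normally the persistent IP should move to ent1,
  but it did not, and pinging the service IP showed heavy packet loss.
  Scenario 2: Unplug the cable from ent1. The service IP moves to ent0 as expected, so ent0
  now carries three IPs (boot1, persistent IP, service IP). Reconnecting the ent1 cable
  triggers no action (presumably this is correct?). Unplugging the ent0 cable then moves
  both the persistent IP and the service IP to ent1 successfully.
  Question: Why does this happen? Under normal conditions the persistent IP should be able
  to move between ent0 and ent1 on the same node. (Test environment: AIX 5.3 TL06 SP01,
  HACMP 5.4.0.1)
  Answer: This is normal behavior.

- Persistent IP
  Question: Can HACMP be told to place the persistent IP on a specific network adapter?
  Answer: This cannot be specified in the configuration; it can be changed with the
  ifconfig command.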
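
- Disk Heartbeat
  Question: I recently read a document in which the disk used for disk heartbeat was put
  into a resource group. When is this necessary?
  Answer: The VG used for disk heartbeat can currently also be used for other purposes,
  for example, creating file systems or being configured in a concurrent resource group.
  However, make sure the disk is not too busy; a heavily loaded disk can trigger the
  deadman switch.

- Serial Network
  Question: Regarding the common ways to implement HACMP heartbeat networks, could you give
  an example of configuring HACMP on several (three or more) servers using 8-port
  asynchronous adapters?
  Answer: Define the serial networks according to the resource group types. The main
  topologies are mesh, star, and ring. We recommend configuring a non-IP network between
  every pair of nodes that can take over from each other, to avoid the cluster becoming
  isolated.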
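
- Gateway
  Question: How should the gateway be added? In a script, by configuring /etc/rc.net, or is
  there another recommendation?
  Answer: For the gateway of the service network, you can add it in the RG start script, or
  solve the problem by adding a persistent IP.

- EtherChannel
  Question: If two network adapters are bonded, does HACMP need any special settings?
  Answer: No special configuration is needed.

- Site
  Question: What preparation does HACMP require for Oracle RAC, and must an HACMP site be
  configured?
  Answer: Sites are used in remote disaster recovery environments; an Oracle RAC
  environment generally does not need a site.

- rlogin
  Question: With HACMP 5.3 and 5.4, can the rlogin environment be left unconfigured?
  Answer: Yes. rlogin has not been required since HACMP 5.1; HACMP uses the clcomdES daemon
  for access between nodes.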
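
Q & A: Planning - Oracle RAC
- LVM
  Question: With a concurrent VG, how should new LVs be added in bulk? C-SPOC can only add
  them one at a time, which is error-prone. If they are added in AIX, the VG must be
  exported and then imported again to synchronize, after which the owner attributes of the
  existing LVs in the VG change.
  Answer: After creating the enhanced concurrent VG through C-SPOC and letting HACMP vary
  it on on all nodes, create the LVs with mklv from the command line on one of the nodes.
  They are then synchronized to the other nodes automatically.

- LVM
  Question: In a RAC configuration, AIX already supports concurrent VGs, so why is HACMP
  still needed?
  Answer: An enhanced concurrent VG can be created without HACMP, but varying it on
  (varyonvg) on all nodes requires HACMP's group services. For details, see the notes in
  man varyonvg.

- Network
  Question: In a RAC configuration, how do the public and private attributes of the HACMP
  Ethernet networks relate to, and differ from, the public and private networks in RAC?
  Answer: We recommend keeping the networks in HACMP consistent with the networks in RAC.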
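
Q & A: Routine Administration
- Takeover
  Question: In an environment with EMC CX series storage, PowerPath, and IBM HACMP 5.2, the
  SCSI reservation reset that HACMP performs on each LUN during a takeover takes a long
  time per LUN. Also, when a shared volume group contains a large number of LVs (several
  hundred, for example), takeover is very slow. Is this normal, or is it a configuration
  problem?
  Answer: We recommend using fast disk takeover.

- IP change
  Question: During HACMP testing, after repeatedly unplugging and replugging network
  cables, the adapter addresses often differ from the configured ones. For example, the
  primary adapter is configured as 172.168.1.1 and the standby adapter as 192.168.1.1;
  after repeated unplugging, netstat -in shows 192.168.1.1 on the primary adapter and
  172.168.1.1 on the standby adapter, the reverse of the configuration in smitty tcpip.
  Answer: With a network topology based on IPAT via IP Replacement, this is normal.

- Takeover
  Question: The shared storage is from Fujitsu. Takeover tests all passed, but one day the
  primary host went down unexpectedly. When HACMP on the standby node tried to acquire the
  shared storage, it could not clear the reservation that the primary host had set on the
  disks, so the shared disks were inaccessible, the shared volume group failed to vary on,
  and the standby node could not take over the application. HACMP in other LPARs on the
  same host uses IBM ESS storage and has not shown this problem; takeover, or manually
  moving the RG to the backup node, works easily. Is there a compatibility problem between
  Fujitsu storage and IBM HACMP?
  Answer: Some third-party storage uses a different reservation-release mechanism from IBM
  storage. Ask the storage vendor for the specific release method and add it to the
  cl_disk_available script.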