
HMC

VIRTUAL SERIAL ADAPTER


Virtual serial adapters provide a point-to-point connection from one logical
partition to another or from the Hardware Management Console (HMC) to each
logical partition on the managed system.
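From the HMC command line, the virtual serial adapters of each partition can be listed roughly like this (a sketch; <sys> is a placeholder and the rsubtype support may differ by HMC level):
lshwres -m <sys> -r virtualio --rsubtype serial --level lpar (list virtual serial adapters per LPAR)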
HMC BACKUP + RECOVERY:
Back up Critical Console Data (HMC Data on v7):
It backs up everything (installed data, efixes, customization); it can take about an hour and the result is not bootable media.
Save Upgrade Data:
It saves only customization data and is used only when you boot from recovery media or a network image (before upgrades); it takes less than a minute.
For recovery we need recovery media (or a base installation image) and the Critical Console Data backup.
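The same backup can be started from the HMC command line (a sketch; server name and credentials are placeholders):
bkconsdata -r ftp -h <ftp_server> -u <user> --passwd <password> (back up Critical Console Data to an FTP server)
saveupgdata -r disk (save upgrade data to disk before an upgrade)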
How to move DVD-ROM or CD-ROM drive from one LPAR to another:
If you don't know which LPAR owns the CD-ROM drive, use the HMC or the WebSM tool:
Select the managed system and open "Properties".
Select the "I/O" tab. Look for the I/O device with the description "Other Mass
Storage Controller" and read the "Owner" field. This will show the LPAR currently
owning that device.
Find the parent adapter of the DVD or CD device, find the slot containing the IDE bus, and remove the slot from the current owner (a command sketch follows this procedure).
Select the LPAR currently owning the CD-ROM, and in the Actions menu select:
Dynamic Logical Partitioning -> Physical Adapters -> Move or Remove
Select the adapter for "Other Mass Storage Controller" and move to the desired
target LPAR.
This will perform a DLPAR operation on both the source and target LPAR.
Log in as root on the target node and configure the newly moved device.
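On the AIX side, a sketch like the following can be used to locate and release the device before the move (cd0 and pci5 are example device names only):
lsdev -Cl cd0 -F parent (find the parent adapter of the CD/DVD device)
lsslot -c slot (find the slot containing the IDE bus/controller)
rmdev -l pci5 -R -d (remove the slot's devices from the source LPAR)
cfgmgr (run on the target LPAR afterwards to discover the moved device)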
=============================================
The Dynamic Platform Optimizer (DPO) is a PowerVM virtualization feature
that improves partition memory and processor placement (affinity) on Power
servers.
DPO determines an optimal RAM and CPU placement for the server based on the partition configuration and hardware topology, then performs memory and processor relocations to reach that layout. This happens dynamically while the partitions are running.
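DPO is scored and started from the HMC command line, roughly as follows (a sketch; <sys> is a placeholder and options may vary by HMC level):
lsmemopt -m <sys> -o currscore (current affinity score of the server, 0-100)
lsmemopt -m <sys> -o calcscore (calculate the score DPO could achieve)
optmem -m <sys> -o start -t affinity (start the Dynamic Platform Optimizer)
lsmemopt -m <sys> (check the progress of a running optimization)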
HMC UPDATE - HMC UPGRADE
(Remote HMC update/upgrade using network images.)
HMC UPDATE: download the iso image / save config, back up and reboot
Update command: updhmc -t s -h <servername> -f <location of iso> -u <user> -i
HMC UPGRADE:
1. Download the network images
2. Save profile data of each managed system
3. Save config, create a backup of the HMC
4. Save Upgrade Data: # saveupgdata -r disk
5. FTP the network images to the HMC: getupgfiles (ls -l /hmcdump shows progress)
6. Set the HMC to boot from the alternate disk: chhmc -c altdiskboot -s enable --mode upgrade
7. Reboot: # hmcshutdown -t now -r
HMC command line:


lshmc -V / -v (version / model), lshmc -n (network), lshmc -r (check the boot mode)
hmcshutdown -t now -r (reboot)
lslogon (find the logged-on users)
monhmc -r mem/proc/disk -n 0 (check mem / cpu / disk)
vtmenu / mkvterm / rmvterm
lssyscfg and lshwres
Syntax: lssyscfg -m <sys> -r <res> --filter "<name>=<value>,<value>"
-F <attr>,<attr> --header
(e.g) lssyscfg -r sys -F name / lssyscfg -m <sys> -r lpar
Syntax: lshwres -m <sys> -r <res> --rsubtype <subtype> --level
<lpar/sys> --filter "<name>=<value>,<value>" -F <attr>,<attr>
--header
(e.g) :
Physical I/O (io):
lshwres -m <sys> -r io --rsubtype slot/slotchildren
MEM/PROC (mem/proc):
lshwres -m <sys> -r mem/proc --level lpar/sys
VIRTUAL I/O (virtualio):
lshwres -m <sys> -r virtualio --rsubtype vswitch
lshwres -m <sys> -r virtualio --rsubtype eth --level lpar
lshwres -m <sys> -r virtualio --rsubtype fc --level lpar
SR-IOV (sriov):
lshwres -m <sys> -r sriov --rsubtype physport --level ethc
Power on/off managed system : chsysstate -m <sys> -o standby -r sys / -o off --immed
Power on/off LPAR : chsysstate -m <sys> -r lpar -n <lpar> -o on -f default / -o shutdown --immed / --restart
DLPAR capable : lspartition -dlpar
What IPs are assigned to the managed systems (physical machines) : lssysconn -r all
Discover all server IPs on the HMC : mksysconn -o auto

LPM
lslparmigr -r sys -m <system>
Mover service partition (MSP):
MSP is an attribute of the Virtual I/O Server partition. (This has to be set on HMC
for the VIOS LPAR). Two mover service partitions are involved in an active
partition migration: one on the source system, the other on the destination
system. Mover service partitions are not used for inactive migrations.
Virtual asynchronous services interface (VASI):
The source and destination mover service partitions use this virtual device to
communicate with the POWER Hypervisor to gain access to partition state. The
VASI device is included on the Virtual I/O Server, but is only used when the server
is declared as a mover service partition.
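A rough way to check and set the MSP designation from the HMC CLI (a sketch, assuming the msp attribute is exposed at this HMC level; names are placeholders):
lssyscfg -r lpar -m <sys> -F name,msp (1 = partition is a mover service partition)
chsyscfg -r lpar -m <sys> -i "name=<vios>,msp=1" (designate a VIOS as MSP)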
Hardware:
- Power6 or later systems
- System should be managed by at least one HMC or IVM (if dual HMCs, both on the
same level and able to communicate with each other) / - The destination system
must have enough processor and memory resources to host the mobile partition
VIOS:
- PowerVM Enterprise Edition with Virtual I/O Server (or dual VIOSes) (version
1.5.1.1 or higher)
- Working RMC connection between HMC and VIOS
- VIOS must be designated as a mover service partition on source and
destination
- VIOS must have enough virtual slots on the destination server
- VIOS on both systems must have a SEA configured to bridge to the same Ethernet
network used by the LPARs
- VIOS on both systems must be capable of providing access to all disk resources
to the mobile partition
- If VSCSI is used it must be accessible by both VIO Servers (on source and
destination systems)
- If NPIV is used, the physical adapter max_xfer_size should be the same or greater on the destination side (lsattr -El fcs0 | grep max_xfer_size)
LPAR:
- AIX version must be AIX 6.1 or AIX 7.1
- Working RMC connection between HMC and LPAR
- LPAR has a unique name (cannot be migrated if LPAR name is already used on
destination server)
- Migration readiness (an LPAR in crashed or failed state cannot be migrated; maybe a reboot is needed, validation will check this)
- No physical adapters may be used by the mobile partition during the migration
- No logical host Ethernet adapters
- LPAR should have a virtual Ethernet adapter
- The mobile partition's network and disk access must be virtualized by using one or more Virtual I/O Servers.
The disks used by the mobile partition must be accessed through virtual SCSI,
virtual Fibre Channel-based mapping, or both.
- If VSCSI is used no lv or files as backing devices (only LUNs can be mapped)
- If NPIV is used, each VFC client adapter must have a mapping to a VFC server
adapter on VIOS
- If NPIV is used, at least one LUN should be mapped to the LPAR's VFC adapter.
- LPAR is not designated as a redundant error path reporting partition
- LPAR is not in workload group

Validating / Migrating the LPAR:

System management -> Servers -> Trim screen, select the LPAR name:
Operations -> Mobility -> Validate/ Migrate
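The same validation and migration can be done from the HMC command line (a sketch; system and LPAR names are placeholders):
migrlpar -o v -m <source_sys> -t <dest_sys> -p <lpar> (validate the migration)
migrlpar -o m -m <source_sys> -t <dest_sys> -p <lpar> (run the migration)
lslparmigr -r lpar -m <source_sys> (show the migration state of the partitions)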

GPFS
GPFS provides high performance by allowing data to be accessed over multiple computers at once. Most existing file systems are designed for a single-server environment, and adding more file servers does not improve performance. GPFS provides higher input/output performance by striping blocks of data from individual files over multiple disks, and reading and writing these blocks in parallel. Other features provided by GPFS include high availability, support for heterogeneous clusters, disaster recovery, security, DMAPI, HSM and ILM.
GPFS is highly scalable (2000+ nodes)
GPFS is a high-performance file system
GPFS is highly available and fault tolerant

1. Verify the system environment
2. Create a GPFS cluster
3. Define NSDs
4. Create a GPFS file system

Install the GPFS software:
gpfs.base       3.4.0.11   A  F  GPFS File Manager
gpfs.docs.data  3.4.0.4    A  F  GPFS Server Manpages and Documentation
gpfs.gnr        3.4.0.2    A  F  GPFS Native RAID
Create the GPFS cluster
SSH key generation (passwordless ssh between the nodes) should be done first
mmcrcluster -N node1:manager-quorum -p node1 -r /usr/bin/ssh -R /usr/bin/scp
mmlscluster
mmchlicense server --accept -N node1
mmstartup -a
mmgetstate -a
Add the second node to the cluster
mmaddnode -N node2
mmchlicense server --accept -N node2
mmstartup -N node2
Create NSDs

Create a disk descriptor file /yourdir/data/diskdesc.txt using the format:


#DiskName:ServerList::DiskUsage:FailureGroup:DesiredName:StoragePool
hdiskw:::dataAndMetadata::nsd1:
hdiskx:::dataAndMetadata::nsd2:
hdisky:::dataAndMetadata::nsd3:
hdiskz:::dataAndMetadata::nsd4:
mmcrnsd -F /yourdir/data/diskdesc.txt
Create a file system
mmcrfs /GPFS fs1 -F diskdesc.txt -B 64k
mmlsfs fs1

mmmount all -a
mmdf fs1
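A few commands to verify the result (a sketch, assuming the cluster and fs1 created above):
mmgetstate -a (GPFS daemon state on all nodes)
mmlsnsd (NSD to disk mapping)
mmlsmount all -L (where the file systems are mounted)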
POWERNIM
machines: shows the machines in NIM (master, clients)
networks: shows what type of network (topology: ent, Token-Ring... ) can be
used
resources: shows resource types: mksysb, spot, lppsource, bosinstdata
/etc/niminfo:
This file always exists on the NIM master and mostly will exist on a NIM client. It is a text file that contains hostname information for the client, tells the client who its master is, and holds communication port and protocol information. This file should not be edited manually. If there is incorrect information in the file, it should be removed and recreated.
rebuild /etc/niminfo on the NIM master:
on the NIM master: nimconfig -r
rebuild /etc/niminfo on a NIM client:
on NIM client: niminit -a master=<MASTER_HOSTNAME> -a
name=<CLIENT_NIM_NAME>
Preparing a system for maintenance (network) boot:
# lsnim / lsnim o check
# nim -Fo reset <client>
# nim -o deallocate -a subclass=all <client>
# nim -o maint_boot -a spot=spot_5300-11-04 <client>
Log -- /var/adm/ras/nimsh.log
Config files - /etc/bootptab & TFTPD (Trivial File Transfer Protocol)
BOOTPD:
This is the initial communication made between the NIM master and client during
network boot. When a NIM client is configured to be booted from the NIM Master,
the bootpd daemon will use the /etc/bootptab configuration file to pass
information to the client (server, gateway IP..).
(to remove the entry from /etc/bootptab a NIM reset operation on the client
machine is needed)
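A NIM-generated /etc/bootptab entry looks roughly like this (a sketch; host name, IPs and boot file are placeholders and the exact tag list may vary):
aix21.domain.com:bf=/tftpboot/aix21.domain.com:ip=10.1.1.21:ht=ethernet:sa=10.1.1.1:gw=10.1.1.254:sm=255.255.255.0: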
lssrc -ls inetd
 bootps   /usr/sbin/bootpd   bootpd /etc/bootptab   active
TFTPD:
When the NIM client has been rebooted for network boot, once bootp connection
has successfully been achieved, the NIM master uses tftp for transfer. When the
inetd daemon receives TFTP requests, it will start the tftpd daemon to service it,
and start the transfer of the boot image file from the /tftpboot directory. When a
SPOT is created, network boot images are constructed in the /tftpboot directory
using code from the newly created SPOT. When a client performs a network boot,
it uses tftp to obtain a boot image from the server.

# ls -l /tftpboot
lrwxrwxrwx 1 root system       34 Dec 19 18:36 aix21.domain.
-rw-r--r-- 1 root system     1276 Dec 19 18:36 aix21.domain.com.info
-rw-r--r-- 1 root system  9379964 Dec  8 15:31 spot_5200-08.chrp.64.ent
# lssrc -ls inetd
 tftp   /usr/sbin/tftpd   tftpd -n   active
install NIM filesets
bos.sysmgt.nim.master
bos.sysmgt.nim.spot
nimadm
nimadm -j nimadmvg -c aix_client1 -s spot_6100-06-06 -l lpp_source_6100-0606 -d hdisk1 -Y
or
# nimadm -T bb_lpar61_mksysb -O /nim/mksysb/bb_lpar71_mksysb -s spot_7100-02-02 -l lpp_7100-02-02 -j nimvg -Y -N bb_lpar71_mksysb

Power VM :
# echo "vfcs" | kdb ----- npiv
# echo "cvai" | kdb | grep vscsi ----vscsi
NPIV : vfcmap -vadapter vfchost0 -fcp fcs0 / lsnports
SEA : entstat -all ent8 | grep -e " Priority" -e "Virtual Adapter" -e "
State:" -e "High Availability Mode"
mkvdev -sea <PHYS> -vadapter <VIRT> -default <VIRT> -defaultid <VLAN>
chdev -dev entX -attr ha_mode=auto / standby
VSCSI: mkvdev -vdev hdisk34 -vadapter vhost0 -dev vclient_disk
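The mappings can be verified on the VIOS afterwards (a sketch; adapter names are examples only):
lsmap -vadapter vhost0 (show the VSCSI mapping just created)
lsmap -all -npiv (show all NPIV vfchost-to-fcs mappings)
lsdev -virtual (list the virtual devices defined on the VIOS)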

SR-IOV (Single Root I/O Virtualization) is a network virtualization technology that lets a single physical adapter be shared by multiple partitions as logical ports.

PowerHA

Cluster: A logical grouping of servers running PowerHA.

Node: An individual server within a cluster.

Network: Although normally this term would refer to a larger area of computer-to-computer communication (such as a WAN), in PowerHA a network refers to a logical definition of an area for communication between two servers. Within PowerHA, even SAN resources can be defined as a network.

Boot IP: This is the default IP address a node uses when it is first activated and becomes available. Typically, and as used in this article, the boot IP is a non-routable IP address set up on an isolated VLAN accessible to all nodes in the cluster.

Persistent IP: This is an IP address a node uses as its regular means of communication. Typically, this is the IP through which systems administrators access a node.

Service IP: This is an IP address that can "float" between the nodes. Typically, this is
the IP address through which users access resources in the cluster.

Application server: This is a logical configuration to tell PowerHA how to manage applications, including starting and stopping applications, application monitoring, and application tunables. This article focuses only on starting and stopping an application.

Shared volume group: This is a PowerHA-managed volume group. Instead of configuring LVM structures like volume groups, logical volumes, and file systems through the operating system, you must use PowerHA for disk resources that will be shared between the servers.

Resource group: This is a logical grouping of service IP addresses, application servers, and shared volume groups that the nodes in the cluster can manage.

Failover: This is a condition in which resource groups are moved from one node to
another. Failover can occur when a systems administrator instructs the nodes in the
cluster to do so or when circumstances like a catastrophic application or server failure
forces the resource groups to move.

Failback/fallback: This is the action of moving resource groups back to the nodes on which they were originally running after a failover has occurred.

Heartbeat: This is a signal transmitted over PowerHA networks to check and confirm
resource availability. If the heartbeat is interrupted, the cluster may initiate a failover
depending on the configuration.
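Resource group state can be checked and a manual move started from the command line (a sketch; RG and node names are placeholders, clmgr is available on PowerHA 7.1 and later):
clRGinfo (show the state of each resource group per node)
clmgr move resource_group <RG> node=<node> (move a resource group to another node)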

Deadman switch causes a node failure


This topic discusses what happens when a deadman switch causes a node failure.

Problem
The node experienced an extreme performance problem, such as a large I/O transfer,
excessive error logging, or running out of memory, and the Topology Services daemon
(hatsd) is starved for CPU time. It could not reset the deadman switch within the time
allotted. Misbehaved applications running at a priority higher than the Cluster Manager can
also cause this problem.

Dynamic automatic reconfiguration (DARE)


This process, called dynamic automatic reconfiguration or dynamic reconfiguration (DARE),
is triggered when you synchronize the cluster configuration after making changes on an
active cluster. Applying a cluster snapshot using SMIT also triggers a dynamic
reconfiguration event.
For example, to add a node to a running cluster, you simply connect the node to the cluster,
add the node to the cluster topology on any of the existing cluster nodes, and synchronize the
cluster. The new node is added to the cluster topology definition on all cluster nodes and the
changed configuration becomes the currently active configuration. After the dynamic
reconfiguration event completes, you can start cluster services on the new node.
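The verification and synchronization that triggers DARE can also be run from the command line (a sketch; clmgr syntax applies to PowerHA 7.1 and later):
clmgr verify cluster (verify the cluster configuration)
clmgr synchronize cluster (synchronize the configuration, triggering dynamic reconfiguration on an active cluster)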
lssrc -ls clstrmgrES shows if cluster is STABLE or not, cluster version, Dynamic
Node Priority (pgspace free, disk busy, cpu idle)
ST_STABLE: cluster services running with resources online
NOT_CONFIGURED: cluster is not configured or node is not synced
ST_INIT: cluster is configured but not active on this node
ST_JOINING: cluster node is joining the cluster
ST_VOTING: cluster nodes are voting to decide event execution
ST_RP_RUNNING: cluster is running a recovery program
RP_FAILED: a recovery program event script has failed
ST_BARRIER: clstrmgr is in between events waiting at the barrier
ST_CBARRIER: clstrmgr is exiting a recovery program
ST_UNSTABLE: cluster is unstable usually due to an event error
HACMP logs:
/var/hacmp/adm/cluster.log
/var/hacmp/adm/history/cluster.mmddyyyy
/var/hacmp/log/hacmp.out
/var/hacmp/log/cspoc.log
/var/hacmp/log/clstrmgr.debug
/var/hacmp/clverify/clverify.log
i) Startup policy
This policy determines on which node the RG is activated at cluster startup.
a) Online on the home node only
b) Online on first available node.
c) Online on all available nodes.

d) Online using distribution policy: the RG will only be brought online if the node has no other RG online; check with # lssrc -ls clstrmgrES
ii) Fallover policy
When a failure happens, this determines the fallover target node.
a) To next priority node.
b) Using Dynamic Node Priority
c) Bring Offline (on error node only): the RG will be brought offline if an error occurs.
iii) Fallback policy
This tells whether or not the RG will fallback.
a) Fallback to the next priority node (home node)
b) Never fallback
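The configured policies of a resource group can be displayed with clmgr (a sketch; <RG> is a placeholder, PowerHA 7.1 and later):
clmgr query resource_group <RG> (shows the startup, fallover and fallback policy among other attributes)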

