LPM
lslparmigr -r sys -m <system>
Mover service partition (MSP):
MSP is an attribute of the Virtual I/O Server partition. (This has to be set on HMC
for the VIOS LPAR). Two mover service partitions are involved in an active
partition migration: one on the source system, the other on the destination
system. Mover service partitions are not used for inactive migrations.
Virtual asynchronous services interface (VASI):
The source and destination mover service partitions use this virtual device to
communicate with the POWER Hypervisor to gain access to partition state. The
VASI device is included on the Virtual I/O Server, but is only used when the server
is declared as a mover service partition.
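The VASI device can be checked on the VIOS itself; a quick sketch (vasi0 is the usual device name, but may differ on your system, and exact lslparmigr syntax depends on HMC level):

```shell
# As padmin on the VIOS: list virtual devices and look for the VASI adapter
lsdev -virtual | grep -i vasi

# From the HMC: list the mover service partitions of a managed system
# (<system> is a placeholder for the managed system name)
lslparmigr -r msp -m <system>
```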
Hardware:
- POWER6 or later systems
- Systems must be managed by at least one HMC or IVM (if dual HMCs, both must be
at the same level and able to communicate with each other)
- The destination system must have enough processor and memory resources to host
the mobile partition
VIOS:
- PowerVM Enterprise Edition with Virtual I/O Server (or dual VIOSes) (version
1.5.1.1 or higher)
- Working RMC connection between HMC and VIOS
- VIOS must be designated as a mover service partition on source and
destination
- VIOS must have enough virtual slots on the destination server
- VIOS on both systems must have a SEA configured to bridge to the same Ethernet
network used by the LPARs
- VIOS on both systems must be capable of providing access to all disk resources
of the mobile partition
- If VSCSI is used, the disks must be accessible by both VIO Servers (on source
and destination systems)
- If NPIV is used, the physical adapter max_xfer_size should be the same or
greater on the destination side (lsattr -El fcs0 | grep max_xfer_size)
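A small loop can help compare max_xfer_size across all FC adapters; a sketch to run as root on both the source and destination VIOS (adapter naming fcsN assumed):

```shell
# Print max_xfer_size for every fcs adapter on this system,
# then compare the output between source and destination VIOS
for a in $(lsdev -Cc adapter | awk '/^fcs/ {print $1}'); do
  printf '%s: ' "$a"
  lsattr -El "$a" -a max_xfer_size -F value
done
```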
LPAR:
- AIX version must be AIX 6.1 or AIX 7.1
- Working RMC connection between HMC and LPAR
- LPAR has a unique name (cannot be migrated if LPAR name is already used on
destination server)
- Migration readiness (an LPAR in crashed or failed state cannot be migrated;
a reboot may be needed, validation will check this)
- No physical adapters may be used by the mobile partition during the migration
- No logical host Ethernet adapters
- LPAR should have a virtual Ethernet adapter
- The mobile partition's network and disk access must be virtualized by using one
or more Virtual I/O Servers. The disks used by the mobile partition must be
accessed through virtual SCSI, virtual Fibre Channel-based mapping, or both.
- If VSCSI is used, no LVs or files as backing devices (only LUNs can be mapped)
- If NPIV is used, each VFC client adapter must have a mapping to a VFC server
adapter on VIOS
- If NPIV is used, at least one LUN should be mapped to the LPAR's VFC adapter.
- LPAR is not designated as a redundant error path reporting partition
- LPAR is not in workload group
System Management -> Servers -> select the server, then select the LPAR name:
Operations -> Mobility -> Validate / Migrate
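The same validate/migrate steps are also available from the HMC command line; a sketch with placeholder system and LPAR names:

```shell
# Validate the migration of <lpar> from <src_sys> to <dest_sys>
migrlpar -o v -m <src_sys> -t <dest_sys> -p <lpar>

# Perform the actual migration
migrlpar -o m -m <src_sys> -t <dest_sys> -p <lpar>
```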
GPFS
GPFS provides high performance by allowing data to be accessed over multiple
computers at once. Most existing file systems are designed for a single server
environment, and adding more file servers does not improve performance. GPFS
provides higher input/output performance by striping blocks of data from
individual files over multiple disks, and reading and writing these blocks in
parallel. Other features provided by GPFS include high availability, support
for heterogeneous clusters, disaster recovery, security, DMAPI, HSM and ILM.
GPFS is highly scalable (2000+ nodes)
GPFS is a high-performance file system
GPFS is highly available and fault tolerant
mmmount all -a    (mount all GPFS file systems on all nodes)
mmdf fs1
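A few more mm* commands are handy for checking cluster and file system state (fs1 is the example file system name used above):

```shell
mmlscluster      # cluster configuration and member nodes
mmgetstate -a    # GPFS daemon state on all nodes
mmlsfs fs1       # attributes of file system fs1
mmlsdisk fs1     # disk status for fs1
```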
POWERNIM
machines: shows the machines in NIM (master, clients)
networks: shows what type of network (topology: ent, Token-Ring... ) can be
used
resources: shows resource types: mksysb, spot, lppsource, bosinstdata
/etc/niminfo:
This file always exist on the NIM master and mostly will exist on a NIM client. This
file is a text file and contains hostname information for the client, tells the client
who its master is, communication port and protocol information. This file should
not be edited manually. If there is incorrect information in the file, it should be
removed and recreated.
rebuild /etc/niminfo of NIM master:
on NIM master: nimconfig -r
rebuild /etc/niminfo of NIM client:
on NIM client: niminit -a master=<MASTER_HOSTNAME> -a
name=<CLIENT_NIM_NAME>
Preparing a system for maintenance (network) boot:
# lsnim -l <client>    (check the client's state)
# nim -Fo reset <client>
# nim -o deallocate -a subclass=all <client>
# nim -o maint_boot -a spot=spot_5300-11-04 <client>
Log: /var/adm/ras/nimsh.log
Config file: /etc/bootptab
Two daemons serve the network boot: BOOTPD and TFTPD (Trivial File Transfer
Protocol).
BOOTPD:
This is the initial communication made between the NIM master and client during
network boot. When a NIM client is configured to be booted from the NIM Master,
the bootpd daemon will use the /etc/bootptab configuration file to pass
information to the client (server, gateway IP..).
(to remove the entry from /etc/bootptab a NIM reset operation on the client
machine is needed)
lssrc -ls inetd
bootps   /usr/sbin/bootpd   bootpd /etc/bootptab   active
TFTPD:
When the NIM client has been rebooted for network boot, once bootp connection
has successfully been achieved, the NIM master uses tftp for transfer. When the
inetd daemon receives TFTP requests, it will start the tftpd daemon to service it,
and start the transfer of the boot image file from the /tftpboot directory. When a
SPOT is created, network boot images are constructed in the /tftpboot directory
using code from the newly created SPOT. When a client performs a network boot,
it uses tftp to obtain a boot image from the server.
# ls -l /tftpboot
lrwxrwxrwx 1 root system       34 Dec 19 18:36 aix21.domain.
-rw-r--r-- 1 root system     1276 Dec 19 18:36 aix21.domain.com.info
-rw-r--r-- 1 root system  9379964 Dec  8 15:31 spot_5200-08.chrp.64.ent
# lssrc -ls inetd
tftp   /usr/sbin/tftpd   tftpd -n   active
install NIM filesets
bos.sysmgt.nim.master
bos.sysmgt.nim.spot
nimadm
nimadm -j nimadmvg -c aix_client1 -s spot_6100-06-06 -l lpp_source_6100-06-06 -d hdisk1 -Y
or
# nimadm -T bb_lpar61_mksysb -O /nim/mksysb/bb_lpar71_mksysb -s spot_7100-02-02 -l lpp_7100-02-02 -j nimvg -Y -N bb_lpar71_mksysb
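For reference, my reading of the flags used in the two nimadm examples above (verify against the nimadm documentation for your AIX level):

```shell
# -j <vg>      volume group on the NIM master used for nimadm's cache file systems
# -c <client>  NIM client to migrate (first form)
# -s <spot>    SPOT resource at the target AIX level
# -l <lpp>     lpp_source resource at the target AIX level
# -d <disk>    spare disk on the client for the alternate disk copy
# -Y           agree to required software license agreements
# -T / -O      existing mksysb resource to migrate / output mksysb file (second form)
# -N <name>    name of the new mksysb NIM resource
```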
PowerVM:
# echo "vfcs" | kdb                 (NPIV)
# echo "cvai" | kdb | grep vscsi    (VSCSI)
NPIV : vfcmap -vadapter vfchost0 -fcp fcs0 / lsnports
SEA : entstat -all ent8 | grep -e " Priority" -e "Virtual Adapter" -e "State:" -e "High Availability Mode"
mkvdev -sea <PHYS> -vadapter <VIRT> -default <VIRT> -defaultid <VLAN>
chdev -dev entX -attr ha_mode=auto    (or ha_mode=standby)
VSCSI: mkvdev -vdev hdisk34 -vadapter vhost0 -dev vclient_disk
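After creating the SEA and the VSCSI/NPIV mappings above, they can be verified from the VIOS restricted shell (ent8 is the example SEA device from above):

```shell
lsmap -all             # VSCSI: vhost adapters and their backing devices
lsmap -all -npiv       # NPIV: vfchost adapters, client WWPNs, login state
lsdev -dev ent8 -attr  # SEA attributes (ha_mode, pvid_adapter, virt_adapters)
```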
PowerHA
Network: Although normally this term would refer to a larger area of
computer-to-computer communication (such as a WAN), in PowerHA network refers
to a logical definition of an area for communication between two servers.
Within PowerHA, even SAN resources can be defined as a network.
Boot IP: This is the default IP address a node uses when it is first activated
and becomes available. Typically, and as used in this article, the boot IP is a
non-routable IP address set up on an isolated VLAN accessible to all nodes in
the cluster.
Service IP: This is an IP address that can "float" between the nodes. Typically, this is
the IP address through which users access resources in the cluster.
Failover: This is a condition in which resource groups are moved from one node
to another. Failover can occur when a systems administrator instructs the nodes
in the cluster to do so, or when circumstances like a catastrophic application
or server failure force the resource groups to move.
Failback/fallback: This is the action of moving back resource groups to the nodes on
which they were originally running after a failover has occurred.
Heartbeat: This is a signal transmitted over PowerHA networks to check and confirm
resource availability. If the heartbeat is interrupted, the cluster may initiate a failover
depending on the configuration.
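A planned move of a resource group (the administrator-initiated failover mentioned above) can be done from the command line; a sketch with example names RG1 and nodeB, assuming a standard PowerHA install path:

```shell
# Move resource group RG1 online on node nodeB
/usr/es/sbin/cluster/utilities/clRGmove -g RG1 -n nodeB -m
```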
Problem
The node experienced an extreme performance problem, such as a large I/O transfer,
excessive error logging, or running out of memory, and the Topology Services daemon
(hatsd) is starved for CPU time. It could not reset the deadman switch within the time
allotted. Misbehaved applications running at a priority higher than the Cluster Manager can
also cause this problem.
d) The RG will only be brought online if the node has no other RG online
(check with: lssrc -ls clstrmgrES)
ii) Fallover policy
When a failure happens, this determines the fallover target node.
a) To next priority node.
b) Using Dynamic Node Priority
c) Bring Offline
The RG will be brought offline if an error occurs.
iii) Fallback policy
This determines whether or not the RG will fall back.
a) Fallback to the next priority node (home node)
b) Never fallback
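To see where each resource group is currently online, and the cluster manager state referenced above, two quick checks (utility path per a standard PowerHA install):

```shell
/usr/es/sbin/cluster/utilities/clRGinfo   # RG state and current node
lssrc -ls clstrmgrES | grep -i state      # cluster manager state
```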