
Materials may not be reproduced in whole or in part without the prior written permission of IBM.

Path Management and SAN Boot with MPIO on AIX
John Hock jrhock@us.ibm.com
Dan Braden dbraden@us.ibm.com
Power Systems Advanced Technical Skills
IBM Power Systems Technical Symposium 2011
Agenda
Correctly Configuring Your Disks
Filesets for disks and multipath code
Multi-path basics
Multi Path I/O (MPIO)
Useful MPIO Commands
Path priorities
Failed Path Recovery and path health checking
MPIO path management
SDD and SDDPCM
Multi-path code choices for DS4000, DS5000 and DS3950
XIV & Nseries
SAN Boot
Disk configuration
The disk vendor
Dictates what multi-path code can be used
Supplies the filesets for the disks and multipath code
Supports the components that they supply
A fileset is loaded to update the ODM to support the storage
AIX then recognizes and appropriately configures the disk
Without this, disks are configured using a generic ODM definition
Performance and error handling may suffer as a result
# lsdev -Pc disk displays supported storage
The multi-path code will be a different fileset
Unless using the MPIO that's included with AIX

Beware of the generic "Other" disk definition
No command queuing.
Poor Performance & Error Handling
https://tuf.hds.com/gsc/bin/view/Main/AIXODMUpdates
ftp://ftp.emc.com/pub/elab/aix/ODM_DEFINITIONS/
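To verify the vendor ODM support took effect, compare the installed definitions with how the disks actually configured; a minimal check (hdisk names are illustrative):
# lsdev -Pc disk    lists the disk types for which ODM support is installed
# lsdev -Cc disk    shows how each hdisk configured; a description such as "Other FC SCSI Disk Drive" means the generic definition is in use and the vendor fileset is missing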
How many paths for a LUN?
[Diagram: server connected through FC switches to storage, with multiple links at each hop]
Paths = (# of paths from server to switch) x (# paths from storage to switch)
Here there are potentially 6 paths per LUN
But reduced via:
LUN masking at the storage
Assign LUNs to specific FC adapters at the host,
and thru specific ports on the storage
Zoning
WWPN or SAN switch port zoning
Dual SAN fabrics
divide potential paths by two
4 paths per LUN are sufficient for availability
and reduce CPU overhead for choosing the path
Path selection overhead is relatively low, usually negligible
MPIO has no practical limits to number of paths
Other products have path limits
SDDPCM limited to 32 paths per LUN
How many paths for a LUN? (cont'd)
Dual SAN fabrics reduce potential paths
[Diagram: server and storage each with 4 ports, connected via FC switches]
Single fabric: 4 x 4 = 16 potential paths per LUN
Dual fabrics: (2 x 2) + (2 x 2) = 8 potential paths per LUN
Path selection benefits and costs
Path selection algorithms choose a path to hopefully minimize the latency added to an IO to send it over the SAN to the storage
Latency to send a 4 KB IO over an 8 Gbps SAN link is:
4 KB / (8 Gb/s x 0.1 B/b x 1,048,576 KB/GB) = 0.0048 ms   (0.1 B/b reflects the 10 bits per byte of the link encoding)
Multiple links may be involved, and IOs are round trip
Compare this to the fastest IO service times of around 1 ms
Generally utilization of links is low
If the links aren't busy, there likely won't be much, if any, savings from use of sophisticated path selection algorithms vs. round robin
Costs of path selection algorithms:
CPU cycles to choose the best path
Memory to keep track of in-flight IOs down each path, or
Memory to keep track of IO service times down each path
Latency added to the IO to choose the best path
Multi-path IO with VIO and VSCSI LUNs
[Diagram: VIO client running MPIO, connected through two VIO Servers (each running multi-path code) to the disk subsystem]
Two layers of multi-path code: VIOC and VIOS

VSCSI disks always use AIX default MPIO and
all IO for a LUN normally goes to one VIOS
algorithm = fail_over only

VIOS uses the multi-path code specified for the disk
subsystem

Set the path priorities for the VSCSI hdisks so half
use one VIOS, and half use the other
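For example, to check which VIOS a VSCSI hdisk currently prefers and make vscsi1 its primary, something like the following can be used (hdisk/vscsi names are illustrative; the default priority is 1, and with fail_over the lower value wins):
# lspath -l hdisk0 -F"parent status path_status"
# chpath -l hdisk0 -p vscsi0 -a priority=2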

Multi-path IO with VIO and NPIV
[Diagram: VIO client running multi-path code, with virtual FC (vFC) adapters mapped through two VIO Servers to the disk subsystem]

VIOC has virtual FC adapters (vFC)
Potentially one vFC adapter for every real FC adapter
in each VIOC
Maximum of 64 vFC adapters per real FC adapter
recommended

VIOC uses multi-path code that the disk subsystem
supports

IOs for a LUN can go thru both VIOSs

One layer of multi-path code
IBM Power Systems Technical Symposium 2011
9
2011 IBM Corporation
What is MPIO?
MPIO is an architecture designed by AIX development (released in AIX V5.2)
MPIO is also a commonly used acronym for Multi-Path IO
In this presentation MPIO refers explicitly to the architecture, not the acronym
Why was the MPIO architecture developed?
With the advent of SANs, each disk subsystem vendor wrote their own multi-path code
These multi-path code sets were usually incompatible
Mixing disk subsystems was usually not supported on the same system, and if they
were, they usually required their own FC adapters
Integration with AIX IO error handling and recovery
Several levels of IO timeouts: basic IO timeout, FC path timeout, etc
MPIO architecture details available to disk subsystem vendors
Compliant code requires a Path Control Module (PCM) for each disk subsystem
Default PCMs for SCSI and FC exist on AIX and are often used by the vendors
Capabilities exist for different path selection algorithms
Disk vendors have been moving towards MPIO compliant code
MPIO Common Interface
Overview of MPIO Architecture
LUNs show up as an hdisk
Architected for 32 K paths
No more than 16 paths are necessary
PCM: Path Control Module
Default PCMs exist for FC, SCSI
Vendors may write optional PCMs
May provide commands to manage paths
Allows various algorithms to balance use
of paths
Full support for multiple paths to rootvg
Hdisks can be Available, Defined or non-existent
Paths can also be Available, Defined, Missing or non-existent
Path status can be enabled, disabled or failed if the path is Available
(use chpath command to change status)
Add path: e.g. after installing a new adapter and cable to the disk, run cfgmgr (or cfgmgr -l <adapter>)
One must get the device layer correct, before working with the path status layer
Tip: to keep paths <= 16, group sets of 4 host ports and 4 storage ports and balance LUNs across them
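For example, after cabling a new adapter, a minimal sequence to add and verify the new paths might look like this (device names are illustrative):
# cfgmgr -l fcs1    configure just the new adapter's child devices and paths
# lspath -l hdisk2    confirm the additional paths show up as Enabled/Available
# lsdev -l hdisk2    confirm the hdisk itself is still Available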
MPIO support
Storage Subsystem Family | MPIO code | Multi-path algorithm
IBM ESS, DS6000, DS8000, DS3950, DS4000, DS5000, SVC, V7000 | IBM Subsystem Device Driver Path Control Module (SDDPCM) | fail_over, round_robin, load balance, load balance port
DS3/4/5000 in VIOS | Default FC PCM recommended | fail_over, round_robin
IBM XIV Storage System | Default FC PCM | fail_over, round_robin
IBM System Storage N series | Default FC PCM | fail_over, round_robin
EMC Symmetrix | Default FC PCM | fail_over, round_robin
HP & HDS (varies by model) | Hitachi Dynamic Link Manager (HDLM) | fail_over, round_robin, extended round robin
HP & HDS (varies by model) | Default FC PCM | fail_over, round_robin
SCSI | Default SCSI PCM | fail_over, round_robin
VIO VSCSI | Default SCSI PCM | fail_over
Non-MPIO multi-path code
Storage subsystem family | Multi-path code
IBM DS4000 | Redundant Disk Array Controller (RDAC)
EMC | PowerPath
HP | AutoPath
HDS | HDLM (older versions)
Veritas-supported storage | Dynamic MultiPathing (DMP)
Mixing multi-path code sets
The disk subsystem vendor specifies what multi-path code is supported for their storage
The disk subsystem vendor supports their storage; the server vendor generally doesn't
You can mix multi-path code compliant with MPIO and even share adapters
There may be exceptions. Contact vendor for latest updates.
HP example: Connection to a common server with different HBAs requires separate HBA
zones for XP, VA, and EVA
Generally one non-MPIO compliant code set can exist with other MPIO compliant code sets
Except that SDD and RDAC can be mixed on the same LPAR
The non-MPIO compliant code must be using its own adapters
Devices of a given type use only one multi-path code set
e.g., you can't use SDDPCM for one DS8000 and SDD for another DS8000 on the same AIX instance
Sharing Fibre Channel Adapter ports
Disks using MPIO-compliant code sets can share adapter ports
It's recommended that disk and tape use separate ports
Disk IO (typically small block, random) and tape IO (typically large block, sequential) are different, and stability issues have been seen at high IO rates
MPIO Command Set
lspath - list paths, path status and path attributes for a disk

chpath - change path status or path attributes
Enable or disable paths

rmpath - delete a path or change its state
Putting a path into the Defined state means it won't be used (from Available to Defined)
One cannot define/delete the last path of a device

mkpath - add another path to a device or make a Defined path Available
Generally cfgmgr is used to add new paths

chdev - change a device's attributes (not specific to MPIO)

cfgmgr - add new paths to an hdisk or make Defined paths Available (not specific to MPIO)
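As a sketch, taking one path out of service ahead of maintenance and bringing it back later might look like this (device names are illustrative):
# rmpath -l hdisk2 -p fscsi0    move hdisk2's paths on fscsi0 from Available to Defined (no -d, so they are kept)
# lspath -l hdisk2    verify the path states
# mkpath -l hdisk2 -p fscsi0    make the Defined paths Available again (or run cfgmgr)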


Useful MPIO Commands
List status of the paths and the parent device (or adapter)
# lspath -Hl <hdisk#>
List connection information for a path
# lspath -l hdisk2 -F"status parent connection path_status path_id"
Enabled fscsi0 203900a0b8478dda,f000000000000 Available 0
Enabled fscsi0 201800a0b8478dda,f000000000000 Available 1
Enabled fscsi1 201900a0b8478dda,f000000000000 Available 2
Enabled fscsi1 203800a0b8478dda,f000000000000 Available 3
The connection field contains the storage port WWPN
In the case above, paths go to two storage ports and WWPNs:
203900a0b8478dda
201800a0b8478dda
List a specific path's attributes
# lspath -AEl hdisk2 -p fscsi0 -w "203900a0b8478dda,f000000000000"
scsi_id 0x30400 SCSI ID False
node_name 0x200800a0b8478dda FC Node Name False
priority 1 Priority True
Path priorities
A Priority attribute for paths can be used to specify a preference for path
IOs. How it works depends on whether the hdisk's algorithm attribute is set to
fail_over or round_robin.
Value specified is inverse to priority, i.e. 1 is high priority

algorithm=fail_over
the path with the highest priority (lowest priority value) handles all the IOs unless there's a path failure.
the other path(s) will only be used when there is a path failure.
Set the primary path to be used by setting its priority value to 1, and the next path's priority (in case of path failure) to 2, and so on.
if the path priorities are the same and algorithm=fail_over, the primary path will be the
first listed for the hdisk in the CuPath ODM as shown by # odmget CuPath

algorithm=round_robin
If the priority attributes are the same, then IOs go down each path equally.
In the case of two paths, if you set path A's priority to 1 and path B's to 255, then for every IO going down path A, there will be 255 IOs sent down path B.

To change the path priority of an MPIO device on a VIO client:
# chpath -l hdisk0 -p vscsi1 -a priority=2
Set path priorities for VSCSI disks to balance use of VIOSs
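A simple way to spread VSCSI LUNs across two VIOSs is to leave half the hdisks at the default (preferring vscsi0) and raise the vscsi0 priority value on the other half so they prefer vscsi1; the loop below is only a sketch with illustrative names:
# for D in hdisk1 hdisk3 hdisk5; do chpath -l $D -p vscsi0 -a priority=2; done
# lspath -AEl hdisk1 -p vscsi0    then confirms the new priority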
Path priorities
# lsattr -El hdisk9
PCM PCM/friend/otherapdisk Path Control Module False
algorithm fail_over Algorithm True
hcheck_interval 60 Health Check Interval True
hcheck_mode nonactive Health Check Mode True
lun_id 0x5000000000000 Logical Unit Number ID False
node_name 0x20060080e517b6ba FC Node Name False
queue_depth 10 Queue DEPTH True
reserve_policy single_path Reserve Policy True
ww_name 0x20160080e517b6ba FC World Wide Name False


# lspath -l hdisk9 -F"parent connection status path_status"
fscsi1 20160080e517b6ba,5000000000000 Enabled Available
fscsi1 20170080e517b6ba,5000000000000 Enabled Available

# lspath -AEl hdisk9 -p fscsi1 -w"20160080e517b6ba,5000000000000"
scsi_id 0x10a00 SCSI ID False
node_name 0x20060080e517b6ba FC Node Name False
priority 1 Priority True


Note: whether or not path priorities apply depends on the PCM.
With SDDPCM, path priorities only apply when the algorithm used is fail over (fo).
Otherwise, they aren't used.
Path priorities: why change them?
With VIOCs, send the IOs for half the LUNs to one VIOS and half to the other

Set priorities for half the LUNs to use VIOSa/vscsi0 and half to use
VIOSb/vscsi1
Uses both VIOSs' CPU and virtual adapters
algorithm=fail_over is the only option at the VIOC for VSCSI disks

With N series, have the IOs go to the primary controller for the LUN
Set via the dotpaths utility that comes with Nseries filesets



Path Health Checking and Recovery

Validate a path is working
Automate recovery of path

For SDDPCM and MPIO compliant disks, two hdisk attributes apply:

# lsattr -El hdisk26
hcheck_interval 0 Health Check Interval True
hcheck_mode nonactive Health Check Mode True

hcheck_interval
Defines how often the health check is performed on the paths for a device. The attribute supports a range
from 0 to 3600 seconds. When a value of 0 is selected (the default), health checking is disabled
Preferably set to at least 2X IO timeout value


hcheck_mode
Determines which paths should be checked when the health check capability is used:

enabled: Sends the healthcheck command down paths with a state of enabled
failed: Sends the healthcheck command down paths with a state of failed
nonactive: (Default) Sends the healthcheck command down paths that have no active I/O, including
paths with a state of failed. If the algorithm selected is failover, then the healthcheck command is
also sent on each of the paths that have a state of enabled but have no active IO. If the algorithm
selected is round_robin, then the healthcheck command is only sent on paths with a state of failed,
because the round_robin algorithm keeps all enabled paths active with IO.

Consider setting up error notification for path failures (later slide)
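For the default PCM the health-check settings are ordinary hdisk attributes, so a hedged example of enabling health checking on one disk would be the following (values and disk name are illustrative; use -P and reboot, or change it while the disk is closed, if the attribute cannot be changed on an open device):
# chdev -l hdisk26 -a hcheck_interval=60 -a hcheck_mode=nonactive -P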

Path Recovery
MPIO will recover failed paths if path health checking is enabled with hcheck_mode=nonactive or failed
and the device has been opened

Trade-offs exist:
Lots of path health checking can create a lot of SAN traffic

Automatic recovery requires turning on path health checking for each LUN

Lots of time between health checks means paths will take longer to recover after repair

Health checking for a single LUN is often sufficient to monitor all the physical paths, but not to recover
them

SDD and SDDPCM also recover failed paths automatically

In addition, SDDPCM provides a health check daemon to provide an automated method of reclaiming failed
paths to a closed device.

To manually enable a failed path after repair or re-enable a disabled path:
# chpath -l hdisk1 -p <parent> -w <connection> -s enable

To disable all paths using a specific FC port on the host:

# chpath -l hdisk1 -p <parent> -s disable
Path Health Checking and Recovery: Notification!

One should also set up error notification for path failure, so that someone knows
about it and can correct it before something else fails.

This is accomplished by determining the error that shows up in the error log when a
path fails (via testing), and then

Adding an entry to the errnotify ODM class for that error which calls a script (that you
write) that notifies someone that a path has failed.

Hint: You can use # odmget errnotify to see what the entries (or stanzas) look like,
then you create a stanza and use the odmadd command to add it to the errnotify
class.
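As a sketch only, a stanza might look like the following; the en_label value is a placeholder to replace with the error label you actually observe in your testing, and the method is a notification script you write yourself:
errnotify:
        en_name = "mpio_path_notify"
        en_persistenceflg = 1
        en_class = "H"
        en_type = "PERM"
        en_label = "PATH_HAS_FAILED"
        en_method = "/usr/local/bin/notify_path_failure $1"
Save the stanza to a file (e.g. /tmp/pathnotify.add) and load it with # odmadd /tmp/pathnotify.add; the error log sequence number is passed to the method as $1.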
Path management with MPIO
Includes examining, adding, removing, enabling and disabling paths
Adapter failure/replacement or addition
VIOS upgrades (VIOS or multi-path code)
Cable failure and replacement
Storage controller/port failure and repair
Adapter replacement
Paths will not be in use if the adapter has failed; the paths will be in the Failed state
1. Remove paths with # rmpath -l <hdisk> -p <parent> -w <connection> [-d]
-d will remove the path; without it, the path will be changed to Defined
2. Remove the adapter with # rmdev -Rdl <fcs#>
3. Replace the adapter
4. cfgmgr
5. Check the paths with lspath
It's better to stop using a path before you know the path will disappear
Avoid timeouts, application delays or performance impacts and potential error
recovery bugs
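Concretely, for an adapter fcs0 (parent fscsi0) the sequence might look like the sketch below; disk names are illustrative, and the loop simply finds every hdisk with a path through fscsi0:
# for D in $(lspath -p fscsi0 -F"name" | sort -u); do rmpath -l $D -p fscsi0 -d; done
# rmdev -Rdl fcs0
(replace the adapter using the hot-plug procedure)
# cfgmgr
# lspath    confirm all paths are back to Enabled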
Active/Active vs. Active/Passive Disk Subsystem Controllers
IOs for a LUN can be sent to any storage port with Active/Active controllers

LUNs are balanced across controllers for Active/Passive disk subsystems

So a controller is active for some LUNs, but passive for the others

IOs for a LUN are only sent to the Active controller's ports for disk subsystems with Active/Passive controllers

ESS, DS6000, DS8000, and XIV have active/active controllers
DS4000, DS5000, DS3950, Nseries, V7000 have active/passive controllers
The NSeries passive controller can accept IOs but IO latency is affected
The passive controller takes over in the event the active controller or all paths to it fail

MPIO recognizes Active/Passive disk subsystems and sends IOs only to the primary controller

Except under failure conditions, then the active/passive role switches for the affected LUNs

Terminology regarding active/active and active/passive varies considerably
Example: Active/Passive Paths
SDD: An Overview
SDD = Subsystem Device Driver (pre-MPIO architecture)

Used with IBM ESS, DS6000, DS8000 and the SAN Volume Controller, but
is not MPIO compliant

A host attachment fileset (provides subsystem-specific support code &
populates the ODM) and SDD fileset are both installed
Host attachment: ibm2105.rte
SDD: devices.sdd.<sdd_version>.rte

LUNs show up as vpaths, with an hdisk device for each path
32 paths maximum per LUN, but fewer are recommended with more than 600 LUNs

One installs SDDPCM or SDD, not both.

No support for rootvg, dump or paging devices
One can exclude disks from SDD control using the excludesddcfg command
Mirror rootvg across two separate LUNs on different adapters for availability

SDD
Load balancing algorithms
fo: failover
rr: round robin
lb: load balancing (aka df, the default); chooses the adapter with the fewest in-flight IOs
lbs: load balancing sequential, optimized for sequential IO
rrs: round robin sequential, optimized for sequential IO

The datapath command is used to examine vpaths, adapters, paths, vpath statistics,
path statistics, adapter statistics, dynamically change the load balancing algorithm,
and other administrative tasks such as adapter replacement, disabling paths, etc.

mkvg4vp is used instead of mkvg, and extendvg4vp is used instead of extendvg

SDD automatically recovers failed paths that have been repaired via the sddsrv
daemon
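For reference, a hedged example of day-to-day datapath use (device number and policy are illustrative; check the SDD manual for the exact syntax at your level):
# datapath query adapter    list the adapters SDD is using and their state
# datapath query device    list each vpath, its hdisk paths and path state
# datapath set device 0 policy rr    dynamically switch vpath 0 to round robin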
Does Load Balancing Improve Performance?
Load balancing tries to reduce latency by picking a less active path
but adds latency to choose the best path
These latencies are typically < 1% of typical IO service times
Load balancing is more likely to be of benefit in SANs with heavy utilizations or
with intermittent errors that slow IOs on some path
A round_robin algorithm is usually equivalent




Conclusion:
Load balancing is unlikely to improve performance--especially when
compared to other strategies like algorithm=round_robin or approaches
that balance IO with algorithm=fail_over
Balancing IOs with algorithm=fail_over
A fail_over algorithm can be efficiently used to balance IOs!
Any load_balancing algorithm must consume CPU and memory resources to determine
the best path to use.
It's possible to set up fail_over LUNs so that the loads are balanced across the available
FC adapters.
Let's use an example with 2 FC adapters. Assume we correctly lay out our data so that
the IOs are balanced across the LUNs (this is usually a best practice). Then if we
assign half the LUNs to FC adapterA and half to FC adapterB, then the IOs are evenly
balanced across the adapters!
A question to ask is: if one adapter is handling more IO than another, will this have a
significant impact on IO latency?
Since the FC adapters are capable of handling more than 35,000 IOPS then we're
unlikely to bottleneck at the adapter and add significant latency to the IO.
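As a sketch, with the default PCM and algorithm=fail_over this balancing can be done purely with path priorities; hdisk and fscsi names below are illustrative, and if there is more than one path per adapter, add -w <connection> to address each path:
# for D in hdisk2 hdisk4 hdisk6; do chpath -l $D -p fscsi1 -a priority=2; done    these LUNs prefer fscsi0
# for D in hdisk3 hdisk5 hdisk7; do chpath -l $D -p fscsi0 -a priority=2; done    these LUNs prefer fscsi1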
SDDPCM: An Overview
SDDPCM = Subsystem Device Driver Path Control Module

SDDPCM is MPIO compliant and can be used with IBM ESS, DS6000, DS8000,
DS4000 (most models), DS5000, DS3950, V7000 and the SVC

A host attachment fileset (populates the ODM) and SDDPCM fileset are both installed
Host attachment: devices.fcp.disk.ibm.mpio.rte
SDDPCM: devices.sddpcm.<version>.rte

LUNs show up as hdisks, paths shown with pcmpath or lspath commands
16 paths per LUN supported

Provides a PCM per the MPIO architecture

One installs SDDPCM or SDD, not both.

SDDPCM is recommended and strategic

Comparing AIX Default MPIO PCMs & SDDPCM
Comparison by feature/function:

How obtained
MPIO PCMs: Provided as an integrated part of the base AIX operating system and VIOS PowerVM firmware product distribution
SDDPCM: Provided by most IBM storage products for subsequent installation on the various server OSs that the device supports

Supported devices
MPIO PCMs: Supports most disk devices that the AIX operating system and VIOS PowerVM firmware support, including selected third-party devices
SDDPCM: Supports specific IBM devices, as referenced by the particular device support statement; the supported devices differ between AIX and PowerVM VIOS

OS integration considerations
MPIO PCMs: Update levels are provided, updated and migrated as a mainline part of the normal AIX and VIOS service strategy and upgrade/migration paths
SDDPCM: Add-on software entity with its own update strategy and process for obtaining fixes; the customer must manage coexistence levels between the mix of devices, operating system levels and VIOS levels; NOT a licensed program product

Path selection algorithms
MPIO PCMs: Fail over (default); Round Robin (excluding VSCSI disks)
SDDPCM: Fail over; Round Robin; Load Balancing (default); Load Balancing Port

Algorithm selection
MPIO PCMs: Disk access must be stopped in order to change the algorithm
SDDPCM: Dynamic

SAN boot, dump, paging support
MPIO PCMs: Yes
SDDPCM: Yes; restart required if SDDPCM is installed after the MPIO PCM and SDDPCM boot is desired

PowerHA & GPFS support
MPIO PCMs: Yes
SDDPCM: Yes

Utilities
MPIO PCMs: Standard AIX performance monitoring tools such as iostat and fcstat
SDDPCM: Enhanced utilities (pcmpath commands) to show mappings from adapters, paths and devices, as well as performance and error statistics
SDDPCM
Load balancing algorithms
rr - round robin
lb - load balancing based on in-flight IOs per adapter
fo - failover policy
lbp - load balancing port (for ESS, DS6000, DS8000, V7000 and SVC only) based
on in-flight IOs per adapter and per storage port

The pcmpath command is used to examine hdisks, adapters, paths, hdisk statistics, path
statistics, adapter statistics, dynamically change the load balancing algorithm, and other
administrative tasks such as adapter replacement, disabling paths

SDDPCM automatically recovers failed paths that have been repaired via the pcmsrv daemon
MPIO health checking can also be used, and can be dynamically set via the pcmpath
command. This is recommended. Set the hc_interval to a non-zero value for an
appropriate number of LUNs to check the physical paths.
Path management with SDDPCM and the pcmpath command
# pcmpath query adapter - List adapters and status
# pcmpath query device - List hdisks and paths
# pcmpath query port - List DS8000/DS6000/SVC ports
# pcmpath query devstats - List hdisk/path IO statistics
# pcmpath query adaptstats - List adapter IO statistics
# pcmpath query portstats - List DS8000/DS6000/SVC port statistics
# pcmpath query essmap - List rank, LUN ID and more for each hdisk
# pcmpath set adapter - Disable/enable paths to an adapter
# pcmpath set device path - Disable/enable paths to an hdisk
# pcmpath set device algorithm - Dynamically change the path algorithm
# pcmpath set device hc_interval - Dynamically change the health check interval
# pcmpath disable/enable ports - Disable/enable paths to a disk port
# pcmpath query wwpn - Display all FC adapter WWPNs
And more...

SDD offers the similar datapath command
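A hedged example of typical use (device numbers and interval are illustrative; check the SDDPCM manual for the exact syntax at your level):
# pcmpath query device 2    show the paths and path state for hdisk2
# pcmpath set device 2 algorithm lb    dynamically switch hdisk2 to load balancing
# pcmpath set device 2 10 hc_interval 60    set a 60 second health check interval on hdisk2 through hdisk10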
Path management with SDDPCM and the pcmpath command
# pcmpath query device

DEV#: 2 DEVICE NAME: hdisk2 TYPE: 2145 ALGORITHM: Load Balance
SERIAL: 600507680190013250000000000000F4
==========================================================================
Path# Adapter/Path Name State Mode Select Errors
0 fscsi0/path0 OPEN NORMAL 40928736 0
1* fscsi0/path1 OPEN NORMAL 16 0
2 fscsi2/path4 OPEN NORMAL 43927751 0
3* fscsi2/path5 OPEN NORMAL 15 0
4 fscsi1/path2 OPEN NORMAL 44357912 0
5* fscsi1/path3 OPEN NORMAL 14 0
6 fscsi3/path6 OPEN NORMAL 43050237 0
7* fscsi3/path7 OPEN NORMAL 14 0

* Indicates path to passive controller
2145 is an SVC, which has active/passive nodes for a LUN
DS4000, DS5000, V7000 and DS3950 also have active/passive controllers
IOs will be balanced across paths to the active controller
Path management with SDDPCM and the pcmpath command
# pcmpath query devstats

Total Dual Active and Active/Asymmetrc Devices : 67

DEV#: 2 DEVICE NAME: hdisk2
===============================
Total Read Total Write Active Read Active Write Maximum
I/O: 169415657 2849038 0 0 20
SECTOR: 2446703617 318507176 0 0 5888

Transfer Size: <= 512 <= 4k <= 16K <= 64K > 64K
183162 67388759 35609487 46379563 22703724

Maximum value useful for tuning hdisk queue depths
20 is the maximum number of in-flight requests for the IOs shown
Increase queue depth until the queue is not filling up, or
until IO service times suffer (the bottleneck is pushed to the subsystem):
writes > 3 ms
reads > 15-20 ms
See References for queue depth tuning whitepaper
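For example, if the Maximum value regularly reaches the current queue_depth while service times are still good, the queue depth can be raised; the value and disk name below are illustrative, and the attribute normally requires the disk to be closed, or -P plus a reboot:
# chdev -l hdisk2 -a queue_depth=32 -P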
SDD & SDDPCM: Getting Disks configured correctly
Install the appropriate filesets
SDD or SDDPCM for the required disks (and host attachment fileset)
If you are using SDDPCM, install the MPIO fileset as well which comes with AIX
devices.common.IBM.mpio.rte
Host attachment scripts
http://www.ibm.com/support/dlsearch.wss?rs=540&q=host+scripts&tc=ST52G7&dc=D410
Reboot or start the sddsrv/pcmsrv daemon

smitty disk -> List All Supported Disk
Displays disk types for which software support has been installed
Or # lsdev -Pc disk | grep MPIO
disk mpioosdisk fcp MPIO Other FC SCSI Disk Drive
disk 1750 fcp IBM MPIO FC 1750 DS6000
disk 2105 fcp IBM MPIO FC 2105 ESS
disk 2107 fcp IBM MPIO FC 2107 DS8000
disk 2145 fcp MPIO FC 2145 SVC
disk DS3950 fcp IBM MPIO DS3950 Array Disk
disk DS4100 fcp IBM MPIO DS4100 Array Disk
disk DS4200 fcp IBM MPIO DS4200 Array Disk
disk DS4300 fcp IBM MPIO DS4300 Array Disk
disk DS4500 fcp IBM MPIO DS4500 Array Disk
disk DS4700 fcp IBM MPIO DS4700 Array Disk
disk DS4800 fcp IBM MPIO DS4800 Array Disk
disk DS5000 fcp IBM MPIO DS5000 Array Disk
disk DS5020 fcp IBM MPIO DS5020 Array Disk

www-01.ibm.com/support/docview.wss?rs=540&uid=ssg1S7001350#AIXSDDPCM
Migration from SDD to SDDPCM
Migration from SDD to SDDPCM is fairly straightforward and doesn't require
a lot of time. The procedure is documented in the manual:
1. Varyoff your SDD VGs
2. Stop the sddsrv daemon via stopsrc -s sddsrv
3. Remove the SDD devices (both vpaths and hdisks) via instructions below
4. Remove the dpo device
5. Uninstall SDD and the host attachment fileset for SDD
6. Install the host attachment fileset for SDDPCM and SDDPCM
7. Configure the new disks (if you rebooted it's done, else run cfgmgr and startsrc -s pcmsrv)
8. Varyon your VGs - you're back in business

To remove the vpaths and hdisks, use:

# rmdev -Rdl dpo

No exportvg/importvg is needed because LVM keeps track of PVs via PVID

Effective queue depths change (and changes to queue_depth will be lost):
SDD effective queue depth = # paths for a LUN x queue_depth
SDDPCM effective queue depth = queue_depth
Multi-path code choices for DS4000/DS5000/DS3950
These disk subsystems might use RDAC, MPIO or SDDPCM
Choices depend on model and AIX level
MPIO is strategic
SDDPCM uses MPIO and is recommended
SDDPCM not supported on VIOS yet for these disk subsystems so use MPIO
SAN cabling/zoning is more flexible with MPIO/SDDPCM than with RDAC
RDAC requires fcsA be connected to controllerA and fcsB connected to controllerB with
no cross connections
These disk subsystems have active/passive controllers
All IO for a LUN goes to its primary controller
Unless the paths to it fail, or the controller fails, then the other controller takes over
the LUN
Storage administrator assigns half the LUNs to each controller
The manage_disk_drivers command is used to choose the multi-path code
Choices vary among models and AIX levels
DS3950, DS5020, DS5100, DS5300 use MPIO or SDDPCM
Multi-path code choices for DS3950, DS4000 and DS5000
# manage_disk_drivers -l
Device Present Driver Driver Options
2810XIV AIX_AAPCM AIX_AAPCM,AIX_non_MPIO
DS4100 AIX_SDDAPPCM AIX_APPCM,AIX_fcparray
DS4200 AIX_SDDAPPCM AIX_APPCM,AIX_fcparray
DS4300 AIX_SDDAPPCM AIX_APPCM,AIX_fcparray
DS4500 AIX_SDDAPPCM AIX_APPCM,AIX_fcparray
DS4700 AIX_SDDAPPCM AIX_APPCM,AIX_fcparray
DS4800 AIX_SDDAPPCM AIX_APPCM,AIX_fcparray
DS3950 AIX_SDDAPPCM AIX_APPCM
DS5020 AIX_SDDAPPCM AIX_APPCM
DS5100/DS5300 AIX_SDDAPPCM AIX_APPCM
DS3500 AIX_AAPCM AIX_APPCM

To set the driver for use:
# manage_disk_drivers -d <device> -o <driver_option>

AIX_AAPCM - MPIO with active/active controllers
AIX_APPCM - MPIO with active/passive controllers
AIX_SDDAPPCM - SDDPCM
AIX_fcparray - RDAC
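For example, to switch a DS4700 from the RDAC driver (AIX_fcparray) to the MPIO active/passive PCM, based on the syntax above (a reboot is typically needed for the change to take effect):
# manage_disk_drivers -d DS4700 -o AIX_APPCM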
Other MPIO commands for DS3/4/5000
# mpio_get_config -Av
Frame id 0:
Storage Subsystem worldwide name:
608e50017be8800004bbc4c7e
Controller count: 2
Partition count: 1
Partition 0:
Storage Subsystem Name = 'DS-5020'
hdisk LUN # Ownership User Label
hdisk4 0 A (preferred) Array1_LUN1
hdisk5 1 B (preferred) Array2_LUN1
hdisk6 2 A (preferred) Array3_LUN1
hdisk7 3 B (preferred) Array4_LUN1
hdisk8 4 A (preferred) Array5_LUN1
hdisk9 5 B (preferred) Array6_LUN1

# sddpcm_get_config -Av
output is the same as above
XIV
Host Attachment Kit for AIX
http://www-01.ibm.com/support/docview.wss?uid=ssg1S4000802
# lsdev -Pc disk | grep xiv
disk 2810xiv fcp N/A
XIV support has moved from fileset support to support within AIX
Installing the Host Attachment Kit is still recommended
Provides diagnostic and other commands
Disks configured as 2810xiv devices
ODM entries for XIV are included with AIX 5.3 TL10, AIX 6.1 TL3, VIOS 2.1.2.x and AIX 7
Nseries/NetApp
Nseries/NetApp has a preferred storage controller for each LUN
Not exactly an active/passive disk subsystem, as the non-preferred
controller can accept IO requests
I/O requests have to be passed to the preferred controller which
impacts latency
Install the SAN Toolkit
Ontap.mpio_attach_kit.*
Provides the dotpaths utility
and sanlun commands
dotpaths sets hdisk path priorities
to favor the primary controller

for every IO going down the secondary path, there will be 255 IOs sent down the primary path
Storage Area Network (SAN) Boot
Boot Directly from SAN
Storage is zoned directly to the client
HBAs used for boot and/or data access
Multi-path code for the storage runs in the client

SAN Sourced Boot Disks
Affected LUNs are zoned to VIOS(s) and assigned to clients via VIOS definitions
Multi-path code in the client will be the MPIO default PCM for disks seen through the VIOS

Boot from an SVC
Storage is zoned directly to the client
HBAs used for boot and/or data access
SDDPCM runs in the client (to support boot)

Boot from SVC via VIO Server
Affected LUNs are zoned to VIOS(s) and assigned to clients via VIOS definitions
Multi-path code in the client will be the MPIO default PCM for disks seen through the VIOS
Storage Area Network (SAN) Boot
Requirements for SAN Booting
System with FC boot capability
Appropriate microcode (system, FC adapter, disk subsystem and FC switch)
Disk subsystem supporting AIX FC boot
Some older systems don't support FC boot; if in doubt, check the sales manual

SAN disk configuration
Create the SAN LUNs and assign them to the system's FC adapters' WWPNs prior to installing the system
For non-MPIO configurations, assign one LUN to one WWPN to keep it simple

AIX installation
Boot from installation CD or NIM, this runs the install program
When you do the installation you'll get a list of disks that will be on the SAN for the
system
Choose the disks for installing rootvg
Be aware of disk SCSI reservation policies
Avoid policies that limit access to a single path or adapter
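For example, to avoid a single-path SCSI reserve on a SAN boot or data LUN, the reservation policy can be set per hdisk; the disk name is illustrative, valid values depend on the PCM, and reserve_policy is the same attribute shown in the earlier lsattr output:
# chdev -l hdisk2 -a reserve_policy=no_reserve -P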
How to assure you install to the right SAN disk

Only assign the rootvg LUN to the host prior to install, assign data LUNs later, or

Create a LUN for rootvg with a size different than other LUNs, or

Write down LUN ID and storage WWN, or

Use disk with an existing PVID





These criteria can be used to select the LUN from the AIX install program (shown
in following screen shots) or via a bosinst_data file for NIM
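From a running system that can already see the LUNs (for example the VIOS, or the LPAR after a trial configuration), a hedged way to match a candidate hdisk against the LUN you created is to check the same identifiers shown in the earlier lsattr output (disk name illustrative):
# lsattr -El hdisk2 -a ww_name -a lun_id    compare against the storage port WWPN and LUN ID you recorded
# lspv    shows any existing PVIDs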

Choose via Location Code
1 hdisk2 U8234.EMA.06EF634-V5-C22-T1-W50050768012017C2-L1000000000000
2 hdisk3 U8234.EMA.06EF634-V5-C22-T1-W500507680120165C-L2000000000000
3 hdisk5 U8234.EMA.06EF634-V5-C22-T1-W500507680120165C-L3000000000000
The W and L portions of the location code are the storage WWN and the LUN ID, respectively
Choose via Size
Choose via PVID
Storage Area Network Booting: Pros & Cons
The main benefits of SAN rootvg
Performance: < 2 ms writes and 5-10 ms reads due to cache, plus higher IOPS
Availability with built-in RAID protection
Ability to easily redeploy disk
Ability to FlashCopy/MetroMirror the rootvg for backup/DR
Fewer hardware resources
SAN rootvg disadvantages
SAN problems can cause loss of access to rootvg (usually not an issue, as application data is on the SAN anyway)
Potential loss of system dump and diagnosis if loss of access to SAN is caused by a
kernel bug
Difficult to change multi-path IO code
Not an issue with dual VIOSs: you can take down one VIOS at a time and change the multi-path code
SAN boot thru VIO with NPIV is like direct SAN boot
Changing multi-path IO code for rootvg: not so easy
How do you change/update rootvg multi-path code when it's in use?
Changing from SDD to SDDPCM (or vice versa) requires contacting
support if booting from SAN, or:
Move rootvg to internal SAS disks, e.g., using extendvg, migratepv, reducevg,
bosboot and bootlist, or use alt_disk_install
Change the multi-path code
Move rootvg back to SAN
Newer versions of AIX require a newer version of SDD or SDDPCM
Follow procedures in the SDD and SDDPCM manual for upgrades of AIX
and/or the multi-path code
Not an issue when using VIO with dual VIOSs
If one has many LPARs booting from SAN,
one SAS adapter with a SAS disk or two can be used
to migrate SDD to SDDPCM, one LPAR at a time
Documentation & References
Infocenter: Multiple Path IO
http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.baseadmn/doc/baseadmndita/dm_mpio.htm

SDD and SDDPCM Support matrix:
www.ibm.com/support/docview.wss?rs=540&uid=ssg1S7001350

Downloads and documentation for SDD
www.ibm.com/support/docview.wss?rs=540&context=ST52G7&dc=D430&uid=ssg1S4000065&loc=en_US&cs=utf-8&lang=en

Downloads and documentation for SDDPCM:
www.ibm.com/support/docview.wss?rs=540&context=ST52G7&dc=D430&uid=ssg1S4000201&loc=en_US&cs=utf-8&lang=en

IBM System Storage Interoperation Center (SSIC)
http://www-03.ibm.com/systems/support/storage/ssic/interoperability.wss

Guide to selecting a multipathing path control module for AIX or VIOS
http://www.ibm.com/developerworks/aix/library/au-multipathing/index.html

AIX disk queue depth tuning techdoc:
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105745
Documentation & References
Hitachi MPIO Support Site
https://tuf.hds.com/gsc/bin/view/Main/AIXODMUpdates
EMC MPIO Support Site
ftp://ftp.emc.com/pub/elab/aix/ODM_DEFINITIONS/
HP Support Site
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c02619876&jumpid=reg_R1002_USEN
HP StorageWorks for IBM AIX
http://h18006.www1.hp.com/storage/aix.html

Session Evaluations
Session Number: SE39
Session Name: Working with SAN Boot
Date - Thursday, April 28, 14:30, Lake Down B
Friday, April 29, 13:00, Lake Hart B
