
SAN Persistent Binding and Multipathing in the 2.6 Kernel

Michelle Butler, Technical Program Manager


Andy Loftus, System Engineer
Storage Enabling Technologies
NCSA
mbutler@ncsa.uiuc.edu or aloftus@ncsa.edu

Slides available at http://dims.ncsa.uiuc.edu/set/san/

Who?
• NCSA
  – a unit of the University of Illinois at Urbana-Champaign
  – a federal, state, university, and industry funded center
• Academic Users
  – NSF peer review
• Large amount of applications/user needs
  – 3rd party codes, user written…
  – All running on same environment
• Many research areas

NCSA’s 1st Dell Cluster
The first large-scale Dell cluster!!!

• Tungsten: 1750 server cluster
  – 3.2 GHz Xeon
  – 2,560 processors (compute only)
  – 16.4 TF; 3.8 TB RAM; 122 TB disk
  – Dell OpenManage
  – Myrinet
    • Full bi-section
  – Lustre over Gig-E
    • 13 DataDirect 8500
    • 104 OSTs, 2 MDS w/ separate disk
    • 11.1 GB/sec sustained
  – Power/Cooling
    • 593 KW / 193 tons
  – Production date: April 2004
  – User Environment
    • Platform Computing LSF
    • Softenv
    • Intel Compilers
    • ChaMPIon Pro, MPICH, VMI-2

NCSA’s 3rd Dell Cluster
• T2 – retired into:
• Tungsten-3: 1955 blade cluster
  – 2.6 GHz Woodcrest Dual Core
  – 1,040 processors / 2,080 cores
  – 22 TF; 4.1 TB RAM; 20 TB disk
  – Warewulf
  – Cisco InfiniBand
    • 3 to 1 over-subscribed
    • OFED-1.1 w/ HPSM subnet manager
  – Lustre over IB
    • 4 FasT controllers direct FC
    • 1.2 GB/s sustained
    • 8 OSTs and 2 MDS w/ complete auto failover
  – Power/Cooling
    • 148 KW / 42 tons
  – Production date: March 2007
  – User Environment
    • Torque/Moab
    • Softenv
    • Intel Compilers
    • VMI-2

NCSA’s 4th Dell Cluster
The largest Dell cluster!!!

• Abe: 1955 blade cluster
  – 2.33 GHz Cloverton Quad-Core
  – 1,200 blades / 9,600 cores
  – 89.5 TF; 9.6 TB RAM; 120 TB disk
  – Perceus management; diskless boot
  – Cisco InfiniBand
    • 2 to 1 oversubscribed
    • OFED-1.1 w/ HPSM subnet manager
  – Lustre over IB
    • 22 OSTs and 2 MDS w/ complete auto failover (anticipated)
    • 2 DDN 9500 controllers direct FC
    • 10 FasT controllers on SAN fabric
    • 8.4 GB/s sustained
  – Power/Cooling
    • 500 KW / 140 tons
  – Production date: May 2007
  – User Environment
    • Torque/Moab
    • Softenv
    • Intel Compilers
    • MPI: evaluating Intel MPI, MPICH, MVAPICH, VMI-2, etc.

NCSA Facility - ACB
• Advanced Computation Building
– Three rooms, totals:
• 16,400 sqft raised floor
• 4.5 MW power capacity
• 250 kW UPS
• 1,500 tons cooling capacity

– Room 200:
• 7,000 sqft – no columns
• 70” raised floor
• 2.3 MW power capacity
• 750 tons cooling capacity

NCSA’s Other Systems
• Distributed Memory Clusters
– Mercury (IBM, 1.3/1.5 GHz Itanium2):
  • 1,846 processors
  • 10 TF; 4.6 TB RAM; 90 TB disk

• Shared Memory Clusters
  – Copper (IBM p690, 1.3 GHz Power4): 12 x 32 processors
    • 2 TF; 64 or 256 GB RAM each; 35 TB disk
  – Cobalt (SGI Altix, 1.5 GHz Itanium2): 2 x 512 processors
    • 6.6 TF; 1 TB or 3 TB RAM; 250 TB disk

NCSA Storage Systems
• Archival: SGI/Unitree (5 PB total capacity)
– 72TB disk cache; 50 tape drives
– currently 2.8PB of data in MSS
• >1PB ingested in last 6 months
• projected ~3.2PB by end of CY2006
• licensed to support 5PB resident data
– ~30 data collections hosted

• Infrastructure: 394TB Fibre Channel SAN connected
  – FC and SATA environments
  – Lustre, IBRIX, NFS filesystems

• Databases:
– 8-processor, 12GB-memory SGI Altix
  • 30TB of SAN storage
  • Oracle 10g, MySQL, Postgres
– Oracle RAC cluster
– Single-system Oracle deployments for focused projects
Visualization Resources
• 30M-pixel Tiled Display Wall
  – 8192 x 3840 pixels composite display
  – 40 NEC VT540 projectors, arranged in a 5H x 8W matrix
  – driven by a 40-node Linux cluster
    • dual-processor 2.4 GHz Intel Xeons with NVIDIA FX 5800 Ultra graphics accelerator cards
    • Myrinet interconnect
    • to be upgraded by early CY2007
  – funded by the State of Illinois

• SGI Prisms
  – 8 x 8 processors (1.6 GHz Itanium2)
  – 4 graphics pipes each; 1 GB RAM each
  – InfiniBand connection to Altix machines

SAN at NCSA
• 1.3PB spinning disk
– 895TB SAN attached
• 1392 Brocade switch ports
• 7 SAN fabrics
• 2 data centers

Persistent Binding
• Device naming problems
• Udev solution
• Examples
• Interactive Demo

Device Naming Problem
(Figure: device node mapping, before vs. after a change)

• Add hardware
• SAN zoning
• New SAN luns
• Modify config
Device node mapping can change with changes to:
- hardware
- software
- SAN

Devices are assigned random names (based on the next available major/minor pair for the device type).

CLUSTER
- Multiple hosts that see the same disk will assign the disk to different device nodes
  - may be /dev/sda on system1 but /dev/sdc on system2
- Can change with hardware changes; what used to be /dev/sda is now /dev/sdc

Devfs helps only a little:
- Fixes device naming; on a single host, a disk will always have the same device node
- But different hosts may have different device names for the same physical disk

What needs to happen
• Storage target always maps to the same local device (i.e. /dev/…)
• Local device name should be meaningful
  – /dev/sda conveys no information about the storage device

udev - Persistent Device Naming
• “Udev is … a userspace solution for a dynamic /dev directory, with persistent device naming” *
  – Userspace: not required to remain in memory
  – Dynamic: /dev not filled with unused files
  – Persistent: devices always accessible using the same device node
• Provides for custom device names
* Daniel Drake (http://www.reactivated.net/writing_udev_rules.html)

Devfs provides dynamic and persistent naming, but:
- kernel based - entire device db stored in kernel memory, never swapped
- not possible to customize device names

UDEV CUSTOM
- custom names for devices
- custom scripts can be run when specific devices are attached/removed

Setting up udev device mapper
Overview

1. Uniquely identify each lun
2. Assign a meaningful name to each lun

1. Uniquely identify each lun
/sbin/scsi_id
(Figure: device name → scsi_id → SCSI INQUIRY → unique id)

Sample usage:
root# scsi_id -g -u -s /block/sda
SSEAGATE_ST318406LC_____3FE27FZP000073302G5W

root# scsi_id -g -u -s /block/sdb
3600a0b8000122c6d00000000453174fc

/sbin/scsi_id
- INPUT: existing local device name
- OUTPUT: string that uniquely identifies the specific device (guaranteed unique among all scsi devices)

SAMPLE:
- sda: locally installed drive
- sdb: SAN attached disk

2. Associate a meaningful name
New udev rules file: /etc/udev/rules.d/20-local.rules
BUS="scsi", SYSFS{vendor}="DDN", SYSFS{model}="S2A 8000",
PROGRAM="/sbin/scsi_id -g -u -s /block/%k ",
RESULT="360001ff020021101092fadc32a450100", NAME="disk/fc/sdd4c1l0"

• BUS=scsi
– /sys/bus/scsi
• SYSFS
– <BUS>/devices/H:B:T:L/<filename>
• PROGRAM & RESULT
– Program to invoke and result to look for
• NAME
– Device name to create (relative to /dev)

Custom naming controlled by rulesets stored in /etc/udev/rules.d


A rule is a list of keys to match against.
When all keys match, the specified action is taken (create a device name or symlink).
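
For reference, a variant of the rule above that creates a symlink instead of a custom name (the SYMLINK key is standard udev; the symlink path here is a hypothetical example, not from the deck):

BUS="scsi", SYSFS{vendor}="DDN", SYSFS{model}="S2A 8000",
PROGRAM="/sbin/scsi_id -g -u -s /block/%k",
RESULT="360001ff020021101092fadc32a450100", SYMLINK="san/sdd4c1l0"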

Example: Customizing for multiple paths

Problem
Multiple paths to a single lun result in multiple device nodes.
Need to know which path each device uses.

Example: Customizing for multiple paths

Custom script: mpio_scsi_id
(Figure: udev passes the device name to mpio_scsi_id, which combines the disk controller WWPN with the scsi_id and returns WWPN + scsi_id)

Sample udev rule:

BUS="scsi", SYSFS{vendor}="DDN", SYSFS{model}="S2A 8000",
PROGRAM="/root/bin/mpio_scsi_id %k",
RESULT="23000001ff03092f360001ff020021101092fadc32a450100",
NAME="disk/fc/sdd4c1l0"

Get disk controller WWPN:
- (Emulex) /sys/class/fc_transport/target<H>:<B>:<T>/port_name
- (QLA) grep + awk to pull the value from /proc/scsi/ql2xxx/<host_id>
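
A minimal sketch of what mpio_scsi_id could look like for the Emulex case (the sysfs paths follow the notes above; the script body is an illustrative reconstruction, not the original):

#!/bin/bash
# mpio_scsi_id (sketch) - print "<target WWPN><scsi_id>" for one device
# Called from a udev rule as: PROGRAM="/root/bin/mpio_scsi_id %k"
dev=$1

# H:B:T:L address of the device (the /sys/block/<dev>/device symlink
# points at the scsi device directory, which is named H:B:T:L)
hbtl=$(basename $(readlink /sys/block/$dev/device))
target=${hbtl%:*}                   # drop the lun -> H:B:T

# Target port WWPN (Emulex case; a QLA host would grep + awk
# /proc/scsi/ql2xxx/<host_id> instead, per the notes above)
wwpn=$(cat /sys/class/fc_transport/target$target/port_name)

# Unique id of the lun
id=$(/sbin/scsi_id -g -u -s /block/$dev)

echo "${wwpn#0x}${id}"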

Demo: udev persistent device naming
• Single HBA
• Single disk unit
  – 4 luns
  – Each lun presented through both controllers
• Host sees 8 logical luns
• Use mpio_scsi_id to identify the ctlr-lun

Demo: udev persistent device naming
Original configuration:
• udev config file
  – /etc/udev/udev.conf
• scsi_id config file
  – /etc/scsi_id.config
• Scan fc luns
  – {sysfs}/hostX/scan

Custom device names:
• Custom rules file
  – 20-local.rules
• Restart udev
  – udevstart
• Custom device names created
  – /dev/disk/by-id
  – /dev/disk/fc

BEGIN
- tail -f /var/log/messages
1. Enable udev logging
2. Enable scsi_id for all devices (-g option)
3. /proc/partitions
4. Scan fc luns (echo “- - -” > /sys/class/scsi_host/hostX/scan)
5. See udev log lines in the messages file; see fc disks in /dev/disk/by-id
6. Enable the 20-local rules file
7. udevstart
8. See udev log lines in the messages file; see fc disks in /dev/disk/fc

DEFAULT CONFIGURATION
Local rules file already exists. Disable it.
Default behavior for scsi_id is to blacklist everything unknown (-b option). Enable whitelisting of everything (-g option) so scsi_ids will be returned.
Even before custom rules are in place, see default udev rule selection activity in /var/log/messages.

After running delete_fc_luns, udev removes the /dev/sdX device files (/var/log/messages).

CUSTOM CONFIGURATION
udev custom rules are selected (see /var/log/messages).

Major/minor numbers line up for /dev/disk/fc/* and /proc/partitions.

Demo: udev persistent device naming
Debugging
• Not all sysfs files are available immediately
– HBA target WWPN
– Add udevstart to boot scripts
• Udev tools can help
– udevinfo
– udevtest

Examples
• udevinfo -a -p $(udevinfo -q path -n /dev/sdb)
• udevtest /block/sdb

Example: multiple paths on Nadir
- If luns are removed (delete_fc_luns)
- Then added (scan_fc_luns)
- No matches are found in 20-local.rules
- Add syslog output to mpio_scsi_id
  + Shows the params the script is called with
  + Shows what the script returns
  + target_wwpn is not getting set
- Run udevstart (luns already attached now); matches are found in 20-local.rules and device files are created

Probably either a driver or udev issue.
Easiest solution is to run scan_luns and udevstart at system boot time (/etc/rc.d/rc.local).

Custom script: ls_fc_luns

Get HBA list     sysfs           /sys/class/fc_host
Get HBA type     lspci
Get target list  sysfs (Emulex)  /sys/class/scsi_host/hostX/targetX:Y:Z
                 /proc (QLA)     /proc/scsi/qla2xxx/X
Get lun list     sysfs           /sys/class/scsi_host/hostX/targetX:Y:Z/X:Y:Z:L
Get lun info     sysfs           /sys/class/scsi_host/hostX/targetX:Y:Z/X:Y:Z:L/*

Sample output:
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:0:0 sdb 3600a0b8000122c6d00000000453174fc
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:0:1 sdc 3600a0b80000fd6320000000045317563
0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:1:0 sdi 3600a0b8000122c6d00000000453174fc
0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:1:1 sdj 3600a0b80000fd6320000000045317563
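
A minimal sketch of the sysfs walk behind that output (Emulex targets only; the lspci type check and QLA /proc parsing are omitted, and the block:* link name is kernel-version dependent):

#!/bin/bash
# ls_fc_luns (sketch) - print one line per FC lun:
#   <HBA WWPN> <target WWPN> <H:B:T:L> <device> <scsi_id>
for host in /sys/class/fc_host/host*; do
    h=${host##*/host}
    hba_wwpn=$(cat $host/port_name)
    for target in /sys/class/scsi_host/host$h/device/target$h:*; do
        t=${target##*/target}                            # H:B:T
        tgt_wwpn=$(cat /sys/class/fc_transport/target$t/port_name)
        for lun in $target/$t:*; do
            hbtl=${lun##*/}                              # H:B:T:L
            dev=$(basename $lun/block:* | cut -d: -f2)   # e.g. sdb
            echo "$hba_wwpn $tgt_wwpn $hbtl $dev $(/sbin/scsi_id -g -u -s /block/$dev)"
        done
    done
done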

Custom script: lip_fc_hosts

Get host list ls_fc_luns

echo “1” > /sys/class/fc_host/hostX/lip
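
A minimal sketch, assuming the host list is simply every entry in /sys/class/fc_host:

#!/bin/bash
# lip_fc_hosts (sketch) - issue a LIP (loop initialization primitive)
# on every FC host adapter
for host in /sys/class/fc_host/host*; do
    echo "1" > $host/lip
done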

Custom script: scan_fc_luns

Get host list ls_fc_luns

echo “- - -” > /sys/class/scsi_host/hostX/scan
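
A minimal sketch along the same lines:

#!/bin/bash
# scan_fc_luns (sketch) - rescan all channels/targets/luns on every FC host
for host in /sys/class/fc_host/host*; do
    h=${host##*/host}
    echo "- - -" > /sys/class/scsi_host/host$h/scan
done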

Custom script: delete_fc_luns

Get lun list ls_fc_luns

echo “1” > /sys/class/scsi_host/hostX/targetX:Y:Z/X:Y:Z:L/delete
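
A minimal sketch, assuming ls_fc_luns prints the H:B:T:L address in column 3 as in the sample output above:

#!/bin/bash
# delete_fc_luns (sketch) - remove every FC lun from the scsi layer
ls_fc_luns | while read hba_wwpn tgt_wwpn hbtl dev id; do
    h=${hbtl%%:*}                   # host number
    t=${hbtl%:*}                    # H:B:T
    echo "1" > /sys/class/scsi_host/host$h/target$t/$hbtl/delete
done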

udev - Additional Resources
• man udev
• http://www.emulex.com/white/hba/wp_linux26udev.pdf
– Excellent white paper
• http://www.reactivated.net/udevrules.php
– How to write udev rules
• http://www.us.kernel.org/pub/linux/utils/kernel/hotplug/udev.html
  – Information and links
• http://dims.ncsa.uiuc.edu/set/san
– FC tools : custom tools used in demo

Linux Multipath I/O
• Overview
• History
• Setup
• Demos
– Active / Passive Controller Pair
– Active / Active Controller Pair

Linux Multipath - History
Providers
• Storage Vendor
• HBA Vendor
• Filesystem
• OS

STORAGE VENDOR
- End-to-end solution (they provide disk, HBA, driver, add’l software, sometimes even the FC switch)
- HBAs (and other parts) come at a markup
- One location for support tickets, but no alternate recourse if they can’t fix the problem
- Proprietary requirements (typically require 2 HBAs, only works with their systems)

HBA VENDOR
- QLA
  > Linux support spotty
    + 2.4 kernel ok, but strict requirements (2 HBAs, exactly 2 paths per lun, active/active controllers)
    + 2.6 kernel inconsistent behavior
  > Solaris support spotty (2 months to get 1 machine working; the next month it stopped working, though the machine was untouched)
  > Dropped Windows support prematurely (Windows MPIO layer not complete yet, only an API for vendors)
  > Proprietary solution, only works with their HBAs and configuration software
- Emulex (unix philosophy: do one thing and do it well; MPIO doesn’t belong in the driver)

FILESYSTEM
- 3rd party - Veritas, others??
- Parallel filesystems - IBRIX, Lustre, GPFS, CXFS (enable MPIO via failover hosts)

OS
- *NEW* Solaris 10 (XPATH, but requires Solaris-branded QLA cards)
- *NEW* Linux (device mapper multipath) (RedHat4, Suse, others…)

Device Mapper Multipath
• Identify luns by scsi_id
• Create “path groups”
  – Round-robin I/O on all paths in a group
• Monitor paths for failure
  – When no paths are left in the current group, use the next group
• Monitor failed paths for recovery
  – Upon path recovery, re-check group priorities
  – Assign a new active group if necessary

Linux Device Mapper Multipath

Overview

1. Identify unique luns
2. Monitor active paths for failure
3. Monitor failed paths for recovery

Multipath handles 3 areas.


All settings are saved in /etc/multipath.conf

1. Identify unique luns
Storage Device
• vendor
• product
• getuid_callout

device {
vendor "DDN"
product "S2A 8000"
getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
}

1. Identify unique luns
Multipath Device
• wwid
• alias

multipath {
wwid 360001ff020021101092fb1152a450900
alias sdd4l0
}

2. Monitor Healthy Paths for Failure
• Priority group
  – Collection of paths to the same physical lun
  – I/O is split across all paths in round-robin fashion
• path_grouping_policy
  – multibus
  – failover
  – group_by_prio
  – group_by_serial
  – group_by_node

Multipath control creates priority groups.
Paths are grouped based on path_grouping_policy:

MULTIBUS - all paths in one priority group (DDN) (no penalty to access luns via alternate controllers)
FAILOVER - one path per priority group (use only 1 path at a time) (typically only 1 usable path, such as IBM FAStT with AVT disabled)
GROUP_BY_PRIO - paths with the same priority in the same priority group, 1 group for each unique priority (priorities assigned by an external program)
GROUP_BY_SERIAL - paths grouped by scsi target serial (controller node WWN)
GROUP_BY_NODE - (I have not tested or researched this, never had a need to)

2. Monitor Healthy Paths for Failure
Path Grouping Policy = group_by_prio

• Path Priority
  – Integer value assigned to a path
  – Higher value == higher priority
  – Directly controls priority group selection
• prio_callout
  – 3rd party pgm to assign priority values to each path

(Figure: multipath passes a device name to prio_callout, which returns an integer priority value)

Only matters if using the “group_by_prio” grouping policy.

DIRECTLY CONTROLS PRIORITY GROUP SELECTION
- The priority group with the highest value is the active group
- PREVIOUS SLIDE - when all paths in a group are failed, the next group becomes active. That would be the priority group with the next highest priority value that has an active path.

PRIO_CALLOUT
- Provided by the vendor or (more typically) a custom script written by the admin for a specific setup
- If not using group_by_prio, then set this to /bin/true

2. Monitor Healthy Paths for Failure
• path_checker
  – tur
  – readsector0
  – directio
  – (Custom)
    • emc_clarion
    • hp_sw
• no_path_retry
  – queue
  – (N > 0)
  – fail

TUR
- SCSI Test Unit Ready
- Preferred if the lun supports it (OK on DDN, IBM FAStT)
- Does not cause AVT on IBM FAStT
- Does not fill up /var/log/messages on failures

READSECTOR0
- physical lun access via /dev/sdX (IS THIS CORRECT???)

DIRECTIO
- physical lun access via /dev/sgY (IS THIS CORRECT???)

Both readsector0 and directio cause AVT on IBM FAStT, resulting in lun thrashing.
Both readsector0 and directio log “fail” messages in /var/log/messages (could be useful if you want to monitor logs for these events).

NO_PATH_RETRY
- # of retries before failing the path
- queue: queue I/O forever
- (N > 0): queue I/O for N retries, then fail
- fail: fail immediately

3. Monitor failed paths for recovery
• Failback
– Immediate (same as n=0)
– (n > 0)
– manual

FAILBACK
- When a path recovers, wait # seconds before enabling the path
- Recovered path is added back into multipath enabled path list
- multipath re-evaluates priority groups, changes active priority group if needed
MANUAL RECOVERY
- User runs ‘/sbin/multipath’ to update enabled paths and priority groups

Putting it all together
multipaths {
    multipath {
        wwid  3600a0b8000122c6d00000000453174fc
        alias fastt21l0
    }
    multipath {
        wwid  3600a0b80000fd6320000000045317563
        alias fastt21l1
    }
}
devices {
    device {
        vendor               "IBM"
        product              "1742-900"
        getuid_callout       "/sbin/scsi_id -g -u -s /block/%n"

        path_grouping_policy group_by_prio
        prio_callout         "/usr/local/sbin/path_prio.sh %n"

        path_checker         tur
        no_path_retry        fail
        failback             immediate
    }
}

Putting it all together
path_prio.sh
(Figure: multipath calls path_prio.sh with a device name, e.g. sdb; the script finds the matching line in primary-paths and returns its priority, e.g. 50)
/usr/local/etc/primary-paths
0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:0:0 sdb 3600a0b8000122c6d00000000453174fc 50
0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:0:1 sdc 3600a0b80000fd6320000000045317563 2
0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:0:2 sdd 3600a0b8000122c6d0000000345317524 50
0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:0:3 sde 3600a0b80000fd6320000000245317593 2
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:1:0 sdi 3600a0b8000122c6d00000000453174fc 5
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:1:1 sdj 3600a0b80000fd6320000000045317563 51
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:1:2 sdk 3600a0b8000122c6d0000000345317524 5
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:1:3 sdl 3600a0b80000fd6320000000245317593 51

PATH_PRIO.SH
- grep device from primary-paths file
- return value from last column
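
A minimal sketch matching that description (device name in column 4, priority in the last column of the primary-paths file shown above):

#!/bin/bash
# path_prio.sh (sketch) - print the priority for one path
# Called from multipath.conf as: prio_callout "/usr/local/sbin/path_prio.sh %n"
awk -v dev="$1" '$4 == dev { print $NF }' /usr/local/etc/primary-paths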

Demo: Active/Passive Disk
• Host
  – One Emulex LP11000
• Disk
  – IBM DS4500
  – Luns presented through both controllers
  – Luns accessible via 1 controller only at a time
  – AVT enabled

AVT
- Lun will migrate to the alternate controller if requested there
- Tolerance of cable/switch failure
- AVT penalty - lun inaccessible for 5-10 secs while controller ownership changes

SCREENS: /var/log/messages, multi-port-mon, command, script host

1. No luns (ls_fc_luns)
2. /etc/multipath.conf
   1. Multipaths (fastt)
   2. Devices (fastt)
3. /usr/local/sbin/path_prio.sh
   1. Identify controller A, controller B
4. /usr/local/etc/primary-paths
5. Add luns (scan_fc_luns)
   1. See multipath bindings & path_prio.sh output in /var/log/messages
6. View current multipath configuration
   1. multipath -v2 -l
7. Failover test
   1. Script-host: disable disk port A
   2. See multipathd reconfig in /var/log/messages
   3. See I/O path change in multi-port-mon
8. Recover test
   1. Script-host: enable disk port A

Demo: Active/Active Disk
• Host
  – One Emulex LP11000
• Disk
  – DDN 8500
  – Luns accessible via both controllers (no penalty)

SCREENS: multi-port-mon, /var/log/messages, command, script-host

1. /etc/multipath.conf
   1. Devices (DDN) (path_prio = /bin/true; path_grouping_policy = multibus)
   2. Multipath (DDN)
2. Luns present? (ls_fc_luns) Add luns if needed (scan_fc_luns)
   1. See multipath bindings in /var/log/messages
3. View multipath configuration
   1. multipath -v2 -l
4. Failover test
   1. Expected changes in multi-port-mon
   2. Disable switch port for disk ctlr 1
   3. See failover in /var/log/messages and multi-port-mon
5. Restore ctlr access
   1. Expected changes in multi-port-mon
   2. Enable switch port for disk ctlr 1
   3. See failback in /var/log/messages and multi-port-mon

Path Grouping Policy Matrix

                         1 HBA               2 HBAs
Active/Active            multibus (demo1)    multibus
Active/Passive w/ AVT    path_prio (demo2)   path_prio
Active/Passive w/o AVT   failover *          failover

* multiple points of failure

ACTIVE/ACTIVE 2 HBAs
- trivial, same as demo1
- Each HBA sees 1 ctlr
- Can let both HBAs see both ctlrs (4 paths to each lun)
  + Use path_prio if you need to control path usage

ACTIVE/PASSIVE (AVT) 2 HBAs
- trivial, similar to demo2

ACTIVE/PASSIVE (no AVT) 1 HBA
- Tolerant of ctlr failure only.
- If anything else fails, luns will not AVT to the alternate ctlr, and the host will lose access

ACTIVE/PASSIVE (no AVT) 2 HBAs
- Non-preferred paths will be failed
- Each HBA must have full access to both controllers

Linux Multipath Errata
• Making changes to multipath.conf (worked sequence below)
  – Stop the multipathd service
  – Clear multipath bindings
    • /sbin/multipath -F
  – Create new multipath bindings
    • /sbin/multipath -v2 -l
  – Start the multipathd service
• Cannot multipath root or boot device
• user_friendly_names
  – Not really, just random names dm-1, dm-2 …
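
A worked sequence for the multipath.conf change procedure above (a sketch; it assumes a RHEL-style multipathd init script, and splits the rebuild and the listing into separate commands):

root# service multipathd stop     # stop the daemon so it doesn't rebuild maps mid-edit
root# /sbin/multipath -F          # clear existing multipath bindings
root# vi /etc/multipath.conf      # make the changes
root# /sbin/multipath -v2         # create new multipath bindings (verbose)
root# /sbin/multipath -l          # verify the resulting maps
root# service multipathd start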

CANNOT MULTIPATH ROOT OR BOOT DEVICE


- per ap-rhcs-dm-multipath-usagetxt.html (see references section)

Linux Multipath Resources
• multipath.conf.annotated
• man multipath
• http://christophe.varoqui.free.fr/wiki/wakka.php?wiki=Home
– Multipath tools official home
• http://www.redhat.com/docs/manuals/csgfs/browse/rh-cs-en/ap-rhcs-dm-multipath-usagetxt.html
– Description of output (multipath -v2 -l)
• http://kbase.redhat.com/faq/FAQ_85_7170.shtm
– Setup device-mapper multipathing in Red Hat Enterprise Linux 4?
• http://dims.ncsa.uiuc.edu/set/san
– Multi-port-mon
– Set switchport state : (en/dis)able switch port via SNMP

MULTIPATH.CONF.ANNOTATED (RedHat)
- /usr/share/doc/device-mapper-multipath-0.4.5/multipath.conf.annotated

