Linux+Recommended+Settings-Multipathing-IO Balance

Linux Recommended Settings
Applies to: FlashArray
To ensure the best performance with Pure Storage FlashArrays, please use this guide for the configuration and
implementation of Linux hosts in your environment. These recommendations apply to the versions of Linux that we
have certified as per our Compatibility Matrix.
Boot from SAN Considerations

If you are using a LUN to boot from SAN, you need to ensure the changes in your configuration files are applied upon
rebooting. This is done by rebuilding the initial ramdisk (initrd or initramfs) to include the proper kernel modules, files and
configuration directives after the configuration changes have been made. As the procedure slightly varies depending on
the host, we recommend that you refer to your vendor's documentation for the proper procedure.
When rebuilding the initial ramdisk, you will want to confirm that the necessary dependencies are in
place before rebooting the host to avoid any errors during boot. Refer to your vendor's
documentation for specific commands to confirm this information.
An example command from Oracle Linux to check the initramfs:
lsinitrd /boot/initramfs-$(uname -r).img | grep dm
An example file that may be missing that could result in failure to boot:
...(kernel build)/kernel/drivers/md/dm-round-robin.ko
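As a minimal sketch of that check, the hypothetical helper below reads an initramfs listing on standard input and prints every module from its argument list that is absent. The function name is illustrative only. dm-round-robin comes from this guide; any other module names you pass are your own assumption about what your multipath configuration needs.

```shell
# missing_modules: read an initramfs file listing on stdin and print each
# module named in the arguments that does not appear in it.
# (Hypothetical helper; it is not part of lsinitrd or dracut.)
missing_modules() {
    listing=$(cat)
    for mod in "$@"; do
        # Match "/<name>.ko" so compressed modules (.ko.xz, .ko.gz) also count.
        if ! printf '%s\n' "$listing" | grep -q "/${mod}\.ko"; then
            printf '%s\n' "$mod"
        fi
    done
}

# Example usage on a host that provides lsinitrd:
#   lsinitrd "/boot/initramfs-$(uname -r).img" | missing_modules dm-round-robin
```

An empty result means every named module was found in the listing.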
HBA I/O Timeout Settings

Though the Pure Storage FlashArray is designed to service IO with consistent low latency, there are error conditions
that can cause much longer latencies. It is therefore important to ensure dependent servers and applications are
tuned appropriately to ride out these error conditions without issue. By design, given the worst-case recoverable error
condition, the FlashArray will take up to 60 seconds to service an individual IO, so the SCSI device timeout on the
host should be set to 60 seconds.
You can check current timeout settings using the following command as root:
find /sys/class/scsi_generic/*/device/timeout -exec grep -H . '{}' \;
For versions below RHEL 6, you can add the following command(s) into rc.local:
echo 60 > /sys/block/<Dev_name>/device/timeout
©2018 Copyright Pure Storage. All rights reserved.
Note that the default timeout for normal file system commands is 60 seconds when udev is being used. If udev is not in
use, the default timeout is 30 seconds. If you are running RHEL 6+, and want to ensure the rules persist, then use the
udev method documented below.
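The check above can be narrowed to the devices that still need attention. The sketch below is a hypothetical helper, not a Pure Storage tool: it prints only the timeout files whose value is below 60 seconds. It inspects every SCSI device, not only Pure LUNs, and the sysfs root is a parameter solely so the function can be exercised against a test directory.

```shell
# short_timeouts: print "<file> <value>" for every SCSI device whose timeout
# is below the wanted value.
# $1 = sysfs root (default /sys; a parameter only so the sketch can be tested)
# $2 = wanted timeout in seconds (default 60)
short_timeouts() {
    sysfs_root="${1:-/sys}"
    want="${2:-60}"
    for f in "$sysfs_root"/class/scsi_generic/*/device/timeout; do
        [ -r "$f" ] || continue      # glob matched nothing, or file unreadable
        cur=$(cat "$f")
        if [ "$cur" -lt "$want" ]; then
            printf '%s %s\n' "$f" "$cur"
        fi
    done
}

# Example usage (as root): short_timeouts
```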
Queue Settings
We recommend two changes to the queue settings. The first selects the 'noop' I/O scheduler, which has been shown to
get better performance with lower CPU overhead than the default schedulers (usually 'deadline' or 'cfq'). The second
change eliminates the collection of entropy for the kernel random number generator, which has high CPU overhead when
enabled for devices supporting high IOPS.
Manually Changing Queue Settings

(not required unless LUNs are already in use with wrong settings)
These settings can be safely changed on a running system, by locating the Pure LUNs:
grep PURE /sys/block/sd*/device/vendor
And writing the desired values into sysfs files:
echo noop > /sys/block/sdx/queue/scheduler
An example for loop is shown here to quickly set all Pure LUNs to the desired 'noop' elevator:
for disk in $(lsscsi | grep PURE | awk '{print $6}'); do
    echo noop > /sys/block/${disk##/dev/}/queue/scheduler
done
All changes in this section take effect immediately, without rebooting for RHEL5 and 6. RHEL 4 releases will require a
reboot.
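The manual commands above cover only the scheduler. A minimal sketch that applies both recommended queue changes (the 'noop' scheduler and disabled entropy collection via queue/add_random) might look like the following. The function names are hypothetical, and the sysfs root is a parameter only so the logic can be tried against a test directory before touching a live host.

```shell
# pure_disks: print the name (sdX) of every block device whose SCSI vendor
# string starts with PURE.
# $1 = sysfs root (default /sys; parameterised only for testing)
pure_disks() {
    sysfs_root="${1:-/sys}"
    for v in "$sysfs_root"/block/sd*/device/vendor; do
        [ -r "$v" ] || continue
        case "$(cat "$v")" in
            PURE*) d=${v%/device/vendor}; printf '%s\n' "${d##*/}" ;;
        esac
    done
}

# apply_queue_settings: write both recommended values for each Pure disk.
apply_queue_settings() {
    sysfs_root="${1:-/sys}"
    for disk in $(pure_disks "$sysfs_root"); do
        echo noop > "$sysfs_root/block/$disk/queue/scheduler"
        echo 0 > "$sysfs_root/block/$disk/queue/add_random"
    done
}

# Example usage (as root): apply_queue_settings
```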
Applying Queue Settings with udev

Once the IO scheduler elevator has been set to 'noop', it is often desirable to make the setting persist across reboots.
Step 1: Create the Rules File
Create a new file in the following location (for each respective OS). The Linux OS will use the udev rules to set the
elevators after each reboot.
RHEL:
/etc/udev/rules.d/99-pure-storage.rules

Ubuntu:
/lib/udev/rules.d/99-pure-storage.rules
Step 2: Add the Following Entries to the Rules File (Version Dependent)
The following entries automatically set the elevator to 'noop' each time the system is rebooted. Create a file that has
the following entries, ensuring each entry exists on one line with no carriage returns:
For RHEL 6.x, 7.x
# Recommended settings for Pure Storage FlashArray.
# Use noop scheduler for high-performance solid-state storage
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/scheduler}="noop"
# Reduce CPU overhead due to entropy collection
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/add_random}="0"
# Spread CPU load by redirecting completions to originating CPU
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/rq_affinity}="2"
# Set the HBA timeout to 60 seconds
ACTION=="add", SUBSYSTEMS=="scsi", ATTRS{model}=="FlashArray ", RUN+="/bin/sh -c 'echo 60 > /sys/$DEVPATH/device/timeout'"
For RHEL 5.x
# Recommended settings for Pure Storage FlashArray.
# Use noop scheduler for high-performance solid-state storage
ACTION=="add|change", KERNEL=="sd*[!0-9]|", SYSFS{vendor}=="PURE*", RUN+="/bin/sh -c 'echo noop > /sys/$devpath/queue/scheduler'"
Warning!
It is expected behavior that you only see the settings take effect for the sd* devices. The dm-* devices will
not reflect the change directly but will inherit it from the sd* devices that make up their paths.
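To confirm that a rule took effect on an sd* device, it helps to read back which scheduler is active. The kernel marks the active scheduler with square brackets in queue/scheduler, and the small sketch below extracts it. The helper name is hypothetical, and the udevadm lines in the comments are the usual way to re-apply rules without rebooting; please confirm the options against your distribution's man page.

```shell
# active_scheduler: read the contents of a queue/scheduler file on stdin and
# print the scheduler shown in square brackets (the one currently in use).
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Example usage after editing the rules file (run as root):
#   udevadm control --reload-rules
#   udevadm trigger --subsystem-match=block --action=change
#   active_scheduler < /sys/block/sdx/queue/scheduler
```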

Maximum IO Size Settings
The maximum allowed size of an I/O request in kilobytes is determined by the max_sectors_kb setting in sysfs. This
restricts the largest IO size that the OS will issue to a block device. The Pure Storage FlashArray can handle a
maximum of 4MB writes. Therefore, we need to make sure that the maximum allowed IO size matches our
expectations. You can check your current settings to determine what the IO size is, and as long as it does not exceed
4096, you should be fine.
Verify the Current Setting

1. Check which block device you are using with the Pure Storage array.
If you know which device you're looking at already (dm-#)
[root@host ~]# multipath -ll | grep -A 7 -B 0 "dm-6"

3624a9370ffa9a01386b3410600011036 dm-6 PURE,FlashArray
size=35G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
|- 1:0:0:9 sdf 8:80 active ready running
|- 0:0:1:9 sdx 65:112 active ready running
|- 1:0:1:9 sdl 8:176 active ready running
`- 0:0:0:9 sdr 65:16 active ready running
OR
If we want to know all PURE volumes presented to the host
multipath -ll | grep -i PURE
2. Check "max_sectors_kb" on your Linux host (regardless of the kernel version or Linux distribution). You
will need to know which device to check.
$ cat /sys/block/sda/queue/max_sectors_kb
512
If the value is ≤ 4096, then no action is necessary. However, if this value is > 4096, we recommend that you change
the max to 4096.
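Because a multipath device is backed by several sd* paths, the check has to be repeated for each of them. The sketch below assumes the paths of a dm-N device are listed under /sys/block/dm-N/slaves, which is where the kernel exposes them, and prints only the paths above the limit. The function name is hypothetical and the sysfs root is a parameter only for testing.

```shell
# oversized_paths: for a multipath device (dm-N), print "<sdX> <value>" for
# each underlying path whose max_sectors_kb exceeds the limit.
# $1 = dm device name, $2 = limit in KB (default 4096),
# $3 = sysfs root (default /sys; parameterised only for testing)
oversized_paths() {
    dm="$1"
    limit="${2:-4096}"
    sysfs_root="${3:-/sys}"
    for s in "$sysfs_root/block/$dm"/slaves/*; do
        name=${s##*/}
        f="$sysfs_root/block/$name/queue/max_sectors_kb"
        [ -r "$f" ] || continue
        cur=$(cat "$f")
        [ "$cur" -gt "$limit" ] && printf '%s %s\n' "$name" "$cur"
    done
    return 0
}

# Example usage: oversized_paths dm-6
```

No output means every path of the device is already at or below the limit.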
Changing the Maximum Value
Reboot Persistent
We recommend that you add the value to your UDEV rules file (99-pure-storage.rules) created above. This will ensure
that the setting persists through a reboot. To change the value, please do the following:
1. Change the "max_sectors_kb" value by adding it to the UDEV rules (Reboot Persistent):
echo 'ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/max_sectors_kb}="4096"' >> /etc/udev/rules.d/99-pure-storage.rules

NOTE: The location of your rules file may be different depending on your OS version, so please double check the
command before running it.
2. Reboot the host.
3. Check the value again.
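Appending with >> adds a duplicate line each time the command is re-run. A minimal sketch of an idempotent alternative is shown below: it appends the rule only when that exact line is not already in the file. The helper name is hypothetical, and the rules file path must still be adjusted for your OS version.

```shell
# add_rule_once: append a rule line to a rules file only if that exact line
# is not already present, so repeating the step does not duplicate it.
# $1 = rules file, $2 = rule text
add_rule_once() {
    rules_file="$1"
    rule="$2"
    touch "$rules_file"
    # -x whole-line match, -F fixed string (no regex), -q quiet
    grep -qxF -e "$rule" "$rules_file" || printf '%s\n' "$rule" >> "$rules_file"
}

# Example usage (as root; adjust the path for your distribution):
#   add_rule_once /etc/udev/rules.d/99-pure-storage.rules \
#     'ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/max_sectors_kb}="4096"'
```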
Immediate Change but Won't Persist Through Reboot
Warning!
This command should only be run if you are sure there are no running services depending on that volume,
otherwise you can risk an application crash.
If you need to make the change immediately, but cannot wait for a maintenance window to reboot, you can also change
the setting with the following command:
echo %VALUE% > /sys/block/sdz/queue/max_sectors_kb
%VALUE% should be ≤ 4096
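One way to pick a safe %VALUE% is sketched below. In addition to the 4096 ceiling from this guide, it caps the value at the device's max_hw_sectors_kb. That attribute is not mentioned in this guide: it is the read-only hardware limit the kernel reports next to max_sectors_kb, and the kernel rejects writes above it. The function name is hypothetical.

```shell
# safe_max_sectors: print the value to write to max_sectors_kb, given
#   $1 = requested value in KB
#   $2 = the device's max_hw_sectors_kb (hardware ceiling reported by sysfs)
# The result never exceeds 4096 or the hardware ceiling.
safe_max_sectors() {
    val="$1"
    [ "$val" -gt 4096 ] && val=4096
    [ "$val" -gt "$2" ] && val="$2"
    printf '%s\n' "$val"
}

# Example usage (as root):
#   dev=sdz
#   safe_max_sectors 4096 "$(cat /sys/block/$dev/queue/max_hw_sectors_kb)" \
#     > /sys/block/$dev/queue/max_sectors_kb
```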
Recommended DM-Multipathd Settings

The Multipath Policy defines how the host distributes IOs across the available paths to the storage.
The Round Robin (RR) policy distributes IOs evenly across all Active/Optimized paths. A newer
MPIO policy, queue-length, is similar to round robin in that IOs are distributed across all available
Active/Optimized paths, however it provides some additional benefits. The queue-length path
selector will bias IOs towards paths that are servicing IO quicker (paths with shorter queues). In the
event that one path becomes intermittently disruptive or is experiencing higher latency, queue-length
will steer IOs away from that path, reducing the effect of the problem path.
The following are recommended entries for existing multipath.conf files (/etc/multipath.conf) for Linux OSes. Add the
following to the existing section for controlling Pure devices.
RHEL 7.3+:
No customer action is necessary for these values in Red Hat Enterprise Linux 7.3+.
More information:
• https://bugzilla.redhat.com/show_bug.cgi?id=1300415
• https://access.redhat.com/solutions/2772111

RHEL 6.2+ and supporting kernels:
defaults {
    polling_interval 10
}
devices {
    device {
        vendor "PURE"
        path_selector "queue-length 0"
        path_grouping_policy multibus
        path_checker tur
        fast_io_fail_tmo 10
        dev_loss_tmo 60
        no_path_retry 0
    }
}
RHEL 5.7+ - 6.1 and supporting kernels:

defaults {
    polling_interval 10
}
devices {
    device {
        vendor "PURE"
        path_selector "round-robin 0"
        rr_min_io 1
        path_checker tur
        fast_io_fail_tmo 10
        dev_loss_tmo 60
        no_path_retry 0
    }
}
RHEL 5.6 and below, and supporting kernels:

defaults {
    polling_interval 10
}
devices {
    device {
        vendor "PURE"
        path_selector "round-robin 0"
        rr_min_io 1
        path_checker tur
        no_path_retry 0
    }
}
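The version breakpoints above can be summarised as a small lookup. The sketch below is illustrative only: the function name is hypothetical, the input must be given as "major.minor", and treating releases newer than 7.x the same way as 7.3+ is an assumption that this guide does not state.

```shell
# recommended_selector: print the path_selector this guide recommends for a
# RHEL release given as "major.minor" (for example 6.5).
recommended_selector() {
    major=${1%%.*}
    minor=${1#*.}
    if [ "$major" -gt 7 ] || { [ "$major" -eq 7 ] && [ "$minor" -ge 3 ]; }; then
        # 7.3+: no customer action needed (newer majors: assumption)
        echo "built-in defaults"
    elif [ "$major" -eq 7 ] || { [ "$major" -eq 6 ] && [ "$minor" -ge 2 ]; }; then
        echo "queue-length 0"
    else
        echo "round-robin 0"
    fi
}

# Example usage: recommended_selector 6.5
```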
An explanation of these settings:

Attribute: polling_interval 10
RHEL Version: Any
Description: Specifies the interval between two path checks in seconds. For properly functioning paths, the
interval between checks will gradually increase to (4 * polling_interval).

Attribute: vendor "PURE"
RHEL Version: Any
Description: This specifies that it will only apply these settings to Pure LUNs.

Attribute: path_selector "queue-length 0"
RHEL Version: 6.2+
Description: Specifies the default algorithm to use in determining what path to use for the next I/O operation.
queue-length 0: Send the next bunch of I/O down the path with the least number of outstanding I/O requests.

Attribute: path_selector "round-robin 0"
RHEL Version: Any below 6.2
Description: Specifies the default algorithm to use in determining what path to use for the next I/O operation.
round-robin 0: Loop through every path in the path group, sending the same amount of I/O to each.
This setting does not consider path queue length or service time.

Attribute: path_grouping_policy multibus
RHEL Version: Any
Description: Specifies the default path grouping policy to apply to unspecified multipaths.
multibus: all valid paths in 1 priority group.
This setting ensures that all paths are in use at all times, preventing a latent path problem from going
unnoticed.

Attribute: rr_min_io
RHEL Version: RHEL < 6.2
Description: Specifies the number of I/O requests to route to a path before switching to the next path in the
current path group. This setting is only for systems running kernels older than 2.6.31. Newer systems should use
rr_min_io_rq.

Attribute: rr_min_io_rq 1
RHEL Version: RHEL 6.2+
Description: Specifies the number of I/O requests to route to a path before switching to the next path in the
current path group, using request-based device-mapper-multipath. This setting should be used on systems running
current kernels. On systems running kernels older than 2.6.31, use rr_min_io.
Since the default is already 1, there is no need to manually add this setting.

Attribute: path_checker tur
RHEL Version: RHEL 5.7+
Description: Specifies the default method used to determine the state of the paths.
tur: Issue a TEST UNIT READY to the device.
tur uses the SCSI command TEST UNIT READY to determine if the path is working. This is different from the
default RHEL setting of directio. When our array is failing over, read operations will not be serviced, but we
should continue to respond to TEST UNIT READY. This should keep multipath from propagating an I/O error up to
the application, even beyond the SCSI device timeout.

Attribute: fast_io_fail_tmo 10
RHEL Version: RHEL 5.7+
Description: The number of seconds the SCSI layer will wait after a problem has been detected on an FC remote
port before failing I/O to devices on that remote port. This value should be smaller than the value of
dev_loss_tmo.

Attribute: dev_loss_tmo 60
RHEL Version: RHEL 5.7+
Description: The number of seconds the SCSI layer will wait after a problem has been detected on an FC remote
port before removing it from the system.

Attribute: no_path_retry 0
RHEL Version: Any
Description: A numeric value for this attribute specifies the number of times the system should attempt to use a
failed path before disabling queuing.
A value of fail indicates immediate failure, without queueing. A value of queue indicates that queueing should
not stop until the path is fixed.
Pure recommends setting this to a value of zero to allow failovers to happen faster. However, in the event you
only have one path remaining to the storage, a rescan of devices will cause I/O errors because the rescan will
trigger a momentary path down event on this last path.
Pure assumes multiple connections per controller to each host. For customers that only have single connections
per controller, set this value to 6; combined with a polling interval of 10, this will allow the system to queue
for 60 seconds before triggering an I/O error on your host.
More information can be found here: RHEL Documentation
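The arithmetic behind two of these settings can be checked with a short sketch. The first function multiplies no_path_retry by polling_interval to give the approximate queueing time described for single-connection hosts; the second tests the requirement that fast_io_fail_tmo be smaller than dev_loss_tmo. Both function names are hypothetical.

```shell
# queue_seconds: approximate time I/O is queued after all paths fail,
# computed as no_path_retry * polling_interval.
queue_seconds() {
    echo $(( $1 * $2 ))
}

# tmo_order_ok: succeed only if fast_io_fail_tmo ($1) is smaller than
# dev_loss_tmo ($2).
tmo_order_ok() {
    [ "$1" -lt "$2" ]
}

queue_seconds 6 10      # prints 60
tmo_order_ok 10 60 && echo "fast_io_fail_tmo < dev_loss_tmo"
```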
Verifying the Settings

You can check the setup by looking at "multipath -ll".

6.2+ (queue-length)
# multipath -ll
Correct Configuration:
mpathe (3624a93709d5c252c73214d5c00011014) dm-2 PURE,FlashArray
`-+- policy='queue-length 0' prio=1 status=active
|- 1:0:0:4 sdd 8:48 active ready running
|- 1:0:1:4 sdp 8:240 active ready running
|- 1:0:2:4 sdab 65:176 active ready running
|- 1:0:3:4 sdan 66:112 active ready running
|- 2:0:0:4 sdaz 67:48 active ready running
|- 2:0:1:4 sdbl 67:240 active ready running
|- 2:0:2:4 sdbx 68:176 active ready running
`- 2:0:3:4 sdcj 69:112 active ready running
...
Incorrect Configuration (check for unnecessary spaces in multipath.conf):
3624a9370f35b420ae1982ae200012080 dm-0 PURE,FlashArray
|-+- policy='round-robin 0' prio=0 status=active
| `- 2:0:0:3 sdc 8:32 active undef running
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 3:0:0:3 sdg 8:96 active undef running
| `- 1:0:0:3 sdaa 65:160 active undef running
`-+- policy='round-robin 0' prio=0 status=enabled
`- 0:0:0:3 sdak 66:64 active undef running
...
Below 6.2 (Round Robin)

# multipath -ll
...
Correct Configuration:
`-+- policy='round-robin 0' prio=0 status=active
|- 2:0:0:3 sdc 8:32 active undef running
|- 3:0:0:3 sdg 8:96 active undef running
|- 1:0:0:3 sdaa 65:160 active undef running
...
Incorrect Configuration (check for unnecessary spaces in multipath.conf):
|-+- policy='round-robin 0' prio=0 status=active
| `- 2:0:0:3 sdc 8:32 active undef running
| `- 3:0:0:3 sdg 8:96 active undef running
| `- 1:0:0:3 sdaa 65:160 active undef running
`-+- policy='round-robin 0' prio=0 status=enabled
...
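The difference between the correct and incorrect outputs is the number of path groups per device: a correct configuration shows a single policy line per map, while an incorrect one shows several. A rough sketch of an automated check is shown below. It assumes each map starts with a line containing PURE,FlashArray, as in the samples above, and the function name is hypothetical.

```shell
# split_path_groups: read "multipath -ll" output on stdin and print
# "<map> <count>" for every Pure map that has more than one path group.
split_path_groups() {
    awk '
        function report() { if (map != "" && groups > 1) print map, groups }
        /PURE,FlashArray/ { report(); map = $1; groups = 0 }
        /policy=/         { groups++ }
        END               { report() }
    '
}

# Example usage (as root): multipath -ll | split_path_groups
```

No output means every Pure map has all of its paths in one path group.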
Excluding Third-Party vendor LUNs from DM-Multipathd

When systems have co-existing multipathing software, it is often desirable to exclude devices from one multipathing
software in order to allow control by another multipathing software.
The following is an example of using DM-Multipathd to blacklist LUNs from a third-party vendor. The syntax blocks
DM-Multipathd from controlling those LUNs that are "blacklisted".
The following can be added to the 'blacklist' section of the multipath.conf file.
blacklist {
    device {
        vendor "XYZ.*"
        product ".*"
    }
    device {
        vendor "ABC.*"
        product ".*"
    }
}
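When several vendors need to be excluded, the section can be generated rather than typed by hand. The sketch below is a hypothetical helper that prints a blacklist section with one device entry per vendor pattern. The XYZ and ABC patterns are the placeholders from the example above, not real vendor strings.

```shell
# print_blacklist: emit a multipath.conf blacklist section with one device
# entry per vendor pattern given as an argument.
print_blacklist() {
    echo "blacklist {"
    for vendor in "$@"; do
        printf '    device {\n        vendor "%s"\n        product ".*"\n    }\n' "$vendor"
    done
    echo "}"
}

print_blacklist "XYZ.*" "ABC.*"
```

Review the generated text before merging it into /etc/multipath.conf.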
device-mapper-multipath and EMC PowerPath

Please note that having both device-mapper-multipath and EMC PowerPath on the same system may result in kernel
panics. Refer to Red Hat's article: https://access.redhat.com/site/solutions/110553
Space Reclamation
You will want to make sure that space reclamation is configured on your Linux Host so that you do not run out of space.
For more information please see this KB: Reclaiming Space on Linux
