
Troubleshooting Storage Performance

Sajjad Siddicky, GSS Escalation

1
Theme: Storage “Where bad things happen”

This is not easy stuff


• This is a complex and confusing topic

The impact on your virtual infrastructure is serious


• Disk latency can have a serious impact on applications running in a
virtual environment

2
Scope, scope, scope!

 What is affected?

• A particular application
• Only one guest
• All guests accessing the same LUN/volume
• All guests running on the same ESXi host
• Are there guests on the same LUN NOT reporting issues?

3
So, you think it IS storage! Where do I start?

Multiple ESXi hosts affected? Start with the array logs


 SAN Array logs
• Error logs
• Latency stats (IOPS / MBps)
• Cache status
• Scheduled tasks (backup, replication, etc.)

1 ESXi host affected? Start with host logs


 ESXi logs
• vmkernel.log (SCSI sense code failures, reservation errors)
• esxtop

4
Storage – “Where bad things happen”

I/O flows through the ESXi host (Virtual SCSI, VMFS / NFS client, Paths) and
the array (Front-end, Processor, Array cache, Back-end and device
configuration, Spindles – “just not enough disks”).

Typical problem areas: IOps/MBps maximums, I/O – “not enough speed”,
processor saturation, cache issues.

5
ESXi host - vmkernel logs

Location: /var/log/vmkernel.log (ESXi 5.x)

Example #1:
vmkernel: 1:08:42:28.062 cpu3:8374)NMP:
nmp_CompleteCommandForPath:2190: Command 0x16 (0x41047faed080) to
NMP device "naa.600508b40006c1700001200000080000" failed on physical
path "vmhba39:C0:T1:L16" H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0
0x0.

VMK_SCSI_DEVICE_QUEUE_FULL (TASK SET FULL) = 0x28

This status is returned when the LUN cannot accept SCSI commands from
initiators due to a lack of resources, namely the queue depth on the array.

* KB 1030381 for a complete listing of device-side NMP errors

6
ESXi host - vmkernel logs

Example #2:

vmkernel: 116:03:44:19.039 cpu4:4100)NMP:


nmp_CompleteCommandForPath: Command 0x2a (0x4100020e0b00) to
NMP device "naa.600508b40006c1700001200000080000" failed on physical
path "vmhba2:C0:T0:L152" H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0
0x0.

VMK_SCSI_HOST_BUS_BUSY = 0x02 or 0x2

This status is returned when the HBA driver is unable to issue a command to
the device. This status can occur due to dropped FCP frames in the
environment.

* KB 1029039 for a complete listing of host-side NMP errors
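
When a host reports these, a quick way to see how widespread the NMP failures are is to scan the log for the pattern shown above. A minimal sketch (not a VMware tool; it assumes the single-line log format used by ESXi, whereas the slides wrap the lines):

```python
import re
from collections import Counter

# Minimal sketch: tally NMP command failures per device and status code from a
# copy of /var/log/vmkernel.log (format as in the examples above).
PATTERN = re.compile(
    r'NMP device "(?P<device>naa\.[0-9a-f]+)" failed on physical path '
    r'"(?P<path>[^"]+)" H:(?P<h>0x[0-9a-f]+) D:(?P<d>0x[0-9a-f]+) '
    r'P:(?P<p>0x[0-9a-f]+)', re.IGNORECASE)

counts = Counter()
with open("vmkernel.log") as log:
    for line in log:
        m = PATTERN.search(line)
        if m:
            key = (m.group("device"),
                   f"H:{m.group('h')} D:{m.group('d')} P:{m.group('p')}")
            counts[key] += 1   # e.g. ('naa.600508b4...', 'H:0x0 D:0x28 P:0x0')

for (device, status), n in counts.most_common(10):
    print(f"{n:6d}  {device}  {status}")
```
Map the D: value against KB 1030381 and the H: value against KB 1029039.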

7
ESXi host - vmkernel logs (ESXi 5.x)

Device naa.5000c5000b36354b performance has deteriorated. I/O latency
increased from average value of 1832 microseconds to 19403 microseconds.

Note: The message shows microseconds, which can be converted to
milliseconds: 19403 microseconds = 19.403 milliseconds.

Only relevant if the error repeats for long periods on the same device
and/or the latency value is large (>20 milliseconds)
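
These messages can be summarized quickly across a log bundle. A minimal sketch (not a VMware tool; the exact message wording can vary between ESXi builds):

```python
import re
from collections import defaultdict

# Minimal sketch: summarize "performance has deteriorated" messages from a
# copy of vmkernel.log, converting the reported microseconds to milliseconds
# and counting how often each device exceeds the ~20 ms threshold.
PATTERN = re.compile(r"Device (?P<device>\S+) performance has deteriorated"
                     r".*?to (?P<usec>\d+) microsecond")

samples = defaultdict(list)
with open("vmkernel.log") as log:
    for line in log:
        m = PATTERN.search(line)
        if m:
            samples[m.group("device")].append(int(m.group("usec")) / 1000.0)  # µs -> ms

for device, latencies_ms in samples.items():
    over = sum(1 for v in latencies_ms if v > 20.0)
    print(f"{device}: {len(latencies_ms)} messages, "
          f"worst {max(latencies_ms):.1f} ms, {over} over 20 ms")
```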

8
ESXTOP

- Local console
- SSH
- vMA (vSphere Management Assistant)

ESXTOP screens:
• c: CPU (default)
• i: interrupts
• p: power management
• m: memory
• n: network
• d: disk adapter
• u: disk device
• v: disk VM

CPU scheduler: c, i, p | Memory scheduler: m | Virtual switch: n | vSCSI: d, u, v

9
ESXtop Disk Adapter Screen (d)

Host bus adapters (HBAs) - includes SCSI, iSCSI, RAID, and FC-HBA adapters

Latency stats from the Device, Kernel and the Guest

DAVG/cmd - Average latency (ms) from the Device (LUN)
KAVG/cmd - Average latency (ms) in the VMkernel
GAVG/cmd - Average latency (ms) in the Guest

 Kernel Latency Average (KAVG)


• The amount of time an I/O spends in the VMkernel (mostly made up of kernel queue time)
• Investigation threshold: > 2 ms; should typically be 0 ms

 Device Latency Average (DAVG)


• This is the latency seen at the device driver level
• Investigation threshold: > 20 ms; lower is better, some spikes are okay

10
Disk I/O – 3 Main Latencies

I/O path: Application -> Guest OS -> VMM -> vSCSI -> ESX storage stack ->
Driver -> HBA -> Fabric -> Array SP

GAVG = DAVG + KAVG
KAVG = QAVG + kernel processing time
QAVG = time the I/O spends in the storage adapter queue
DAVG = device latency, from the driver/HBA through the fabric to the array SP

11
Interpreting Latency Values

 DAVG is HIGH (>20 ms)


 Is it always high (over 20 ms constantly)?
 Check array logs and the fabric / network
 Check for scheduled tasks (backup / replication, etc.)

 KAVG is HIGH (>2 ms)


 Host resource contention
 QAVG high (>2ms) – QUEUEING
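
These thresholds are easy to script against values read off the esxtop disk screens. A minimal sketch (the sample numbers are hypothetical):

```python
# Minimal sketch: apply the investigation thresholds above to latency values
# (in ms) read from esxtop's disk screens. Sample numbers are hypothetical.
DAVG_MS, KAVG_MS, QAVG_MS = 20.0, 2.0, 2.0   # investigation thresholds

def interpret(davg, kavg, qavg):
    findings = []
    if davg > DAVG_MS:
        findings.append("DAVG high: check array logs, fabric/network, scheduled tasks")
    if kavg > KAVG_MS:
        if qavg > QAVG_MS:
            findings.append("KAVG high with QAVG high: queuing (check queue depths / SIOC)")
        else:
            findings.append("KAVG high: host resource contention")
    return findings or ["latencies within thresholds"]

print(interpret(davg=35.2, kavg=0.1, qavg=0.0))
print(interpret(davg=5.0, kavg=4.8, qavg=4.5))
```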

12
Queue Length (max # of active commands)

Queues down the I/O path:
• Guest OS – GQLEN (Guest OS queue length)
• VMM / VMkernel / vSCSI / ESX storage stack
• Driver / HBA – AQLEN (Adapter queue length)
• Device / LUN – DQLEN (Device/LUN queue length)
• Fabric / Array SP – SQLEN (Array (SP) queue length)

DQLEN can change dynamically with SIOC enabled

SIOC will throttle depending on shares / priorities

13
Queuing example

1 HBA can support only 2,000 active commands and is addressing 40 LUNs

Device/LUN queue depth (DQLEN) = 64

Each LUN gets its own queue, so 64 x 40 = 2,560 potential commands

Result:
I/O will queue up in the kernel (> 2,000 adapter maximum)

VMware sets 32 as the default device queue length
(for QLogic in 5.x, it is 64)

Do not change it unless the array vendor recommends doing so

* KB 1267 for more information
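
The arithmetic behind this example, as a minimal sketch (numbers taken from the example above):

```python
# Minimal sketch: does the sum of per-LUN queue depths oversubscribe what the
# adapter can keep in flight? Numbers are from the example above.
adapter_max_active = 2000   # active commands the HBA can support
luns_on_adapter = 40
dqlen_per_lun = 64          # device/LUN queue depth

potential_in_flight = luns_on_adapter * dqlen_per_lun   # 2,560
print(f"Potential in-flight commands: {potential_in_flight}")
if potential_in_flight > adapter_max_active:
    excess = potential_in_flight - adapter_max_active
    print(f"Oversubscribed by {excess}: excess I/O will queue in the VMkernel")
```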

14
Queuing in the VMkernel

When the active requests exceed the device queue depth, all additional I/O
will be queued in the VMkernel and will show up in QAVG.

Example: LUN queue depth is 32 – 32 I/Os in flight (100% active), 32 queued.

Esxtop “u” screen: queuing is occurring when KAVG is non-zero
GAVG = DAVG + KAVG (QAVG + kernel time)

When the device queue is full, I/O will back up to the VMkernel queue

15
Storage Latencies Will Affect CPU State Times (“c” esxtop)

• WAIT – Waiting on idle (idle VMX – not too much activity) or waiting on a
memory page to be swapped on disk
• IDLE – % of time the VCPU is in the idle loop
• SWPWT – % of time the world is waiting for ESX swapping for the VM
• VMWAIT – Blocked, waiting on storage I/O completion
• RDY – % of time the world was not scheduled due to PCPU contention or limits
• CSTP – Co-de-scheduled state for SMP VMs (the CPU scheduler pausing access)
• MLMTD – % of time not scheduled due to CPU limit violations
• RUN – % of time the VM is running on a PCPU

If %WAIT is high, is it due to %VMWAIT (Blocked)?

16
No errors on the ESX host or the storage – now what?

Check path to Storage

Fibre Channel

• CRC errors
– Bad SFP, cable
• C3 discards, BB_credit exhaustion
– Fabric overloaded, oversubscription
• Fabric routing issues, etc.

iSCSI / NAS

• Wrong VMkernel port used, wrong uplink


• Physical switch
• Spanning-tree Flooding
• Port errors
• Network latency
• Switch CPU usage, etc.

17
Error Check (FIBRE-CHANNEL)
cat /proc/scsi/qla2xxx/1

QLogic PCI to Fibre Channel Host Adapter for QMI2572:


FC Firmware version 5.06.02 (90d5), Driver version 911.k1.1-19vmw
………….
Dpc flags = 0x0
Link down Timeout = 045
Port down retry = 005
Login retry count = 008
Execution throttle = 2048
ZIO mode = 0x6, ZIO timer = 1
Commands retried with dropped frame(s) = 417238

Things to check:
- HBA Driver known issues
- Fabric errors

* KB 1005576 – Enabling verbose logging on QLogic and Emulex Host Bus Adapters

18
Error Check (iSCSI / NFS)

ESXTOP “n”

PORT-ID USED-BY TEAM-PNIC DNAME PKTTX/s MbTX/s PKTRX/s MbRX/s %DRPTX %DRPRX
16777217 iSCSI n/a vSwitch1 0.00 0.00 0.00 0.00 0.00 0.00
16777218 vmnic2 - vSwitch1 1.61 0.00 10.00 0.01 0.00 0.00
16777219 vmk1 vmnic2 vSwitch1 1.20 0.00 6.83 0.01 0.00 0.00

VMKPING:
# vmkping -d -s 1472 10.16.224.237 -c 30

PING 10.16.224.237 (10.16.224.237): 1472 data bytes
1480 bytes from 10.16.224.237: icmp_seq=0 ttl=64 time=0.053 ms
1480 bytes from 10.16.224.237: icmp_seq=1 ttl=64 time=0.091 ms
1480 bytes from 10.16.224.237: icmp_seq=2 ttl=64 time=0.091 ms
1480 bytes from 10.16.224.237: icmp_seq=3 ttl=64 time=0.095 ms
1480 bytes from 10.16.224.237: icmp_seq=4 ttl=64 time=0.094 ms
………….
1480 bytes from 10.16.224.237: icmp_seq=28 ttl=64 time=0.084 ms
1480 bytes from 10.16.224.237: icmp_seq=29 ttl=64 time=0.112 ms

--- 10.16.224.237 ping statistics ---
30 packets transmitted, 30 packets received, 0% packet loss
round-trip min/avg/max = 0.031/0.089/0.129 ms

TCPDUMP:
# tcpdump-uw -i vmk1 -w /vmfs/volumes/localdatastore/nfs.pcap

19
Quick Tip

Disable Bad path


# esxcli storage core path set --state=off -p path

Where:
path is the particular path to be enabled/disabled
device is the NAA ID of the device
state is active or off

# esxcli storage core path set --state=off -p fc.2000001b32865b73:2100001b32865b73-
fc.50060160c6e018eb:5006016646e018eb-naa.6006016095101200d2ca9f57c8c2de11

- This can also be done easily from the GUI: go to “Modify path”, right-click
the path, and select Disable

20
Helpful Tools

21
Guest-level issues
 Iometer
 Perfmon (Windows) / top (Linux)

Host-level issues


 esxtop
 vCenter performance graphs
 vCenter Operations Manager

22
Perfmon (Windows)

Avg. Disk sec/Transfer = average time for each data transfer
~ GAVG
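
Perfmon reports this counter in seconds, while esxtop reports GAVG in milliseconds, so convert before comparing. A minimal sketch (sample values are hypothetical):

```python
# Minimal sketch: compare Perfmon's "Avg. Disk sec/Transfer" (seconds) with
# esxtop's GAVG (milliseconds). Sample values are hypothetical.
perfmon_avg_disk_sec_per_transfer = 0.021   # seconds, from Perfmon
gavg_ms = 22.5                              # milliseconds, from esxtop

perfmon_ms = perfmon_avg_disk_sec_per_transfer * 1000.0
print(f"Perfmon: {perfmon_ms:.1f} ms, GAVG: {gavg_ms:.1f} ms, "
      f"delta: {abs(perfmon_ms - gavg_ms):.1f} ms")
# A large gap points at latency added inside the guest (drivers, guest
# queuing) rather than in the ESXi storage stack or the array.
```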

23
GAVG should be close to R
Latency measuring points down the stack:
• A = Application latency (application / file system in the guest)
• R = Perfmon “Avg. Disk sec/Transfer” (guest I/O drivers / device queue)
• S = Windows physical disk service time
• G = Guest latency (virtual SCSI)
• K = ESX kernel (VMkernel VMFS / NFS client)
• D = Device latency

24
Iometer (I/O workload generator tool)

Simulate I/O

 Windows and Linux


 Sequential / Random
 Metrics Collected
• Total I/Os per Sec.
• Throughput (MBps)
• CPU Utilization
• Latency (avg. & max)

25
vCenter “Disk” Performance Chart

Latency statistics available for Disks in the vCenter performance charts

KAVG
• Kernel Read latency
• Kernel write latency
• Kernel command latency

QAVG
• Queue command latency
• Queue write latency
• Queue read latency

GAVG
• Read latency
• Write latency
• Command latency

DAVG
• Physical device command latency
• Physical device read latency
• Physical device write latency

26
Capture ESXTOP results while issue exists

Batch mode:
esxtop -b -d 2 -n 100 > esxtopcapture.csv

Where “-b” stands for batch mode, “-d 2” is a delay of 2 seconds, and “-n 100” means 100 iterations. In this specific case
esxtop will log all metrics for 200 seconds.
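
The resulting CSV can be opened in Windows Perfmon or analyzed with a script. A minimal sketch is below; it assumes the perfmon-style column headers produced by esxtop batch mode contain “MilliSec/Command”, which may differ between builds, so adjust the filter to your output:

```python
import csv

# Minimal sketch: pull per-adapter/device latency columns out of an esxtop
# batch-mode CSV (esxtopcapture.csv from the command above) and print the
# worst sample seen for each. Adjust the header filter to match your capture.
with open("esxtopcapture.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    latency_cols = [i for i, name in enumerate(header) if "MilliSec/Command" in name]
    worst = {i: 0.0 for i in latency_cols}
    for row in reader:
        for i in latency_cols:
            try:
                worst[i] = max(worst[i], float(row[i]))
            except (ValueError, IndexError):
                pass  # skip blank or malformed samples

for i, value in sorted(worst.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(f"{value:8.2f} ms  {header[i]}")
```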

vm-support snapshot (preferred):


- Pre-ESXi 5.x:
vm-support -s -d <duration in seconds> -i <interval in seconds>

- ESXi 5.x:
vm-support -p -d <duration in seconds> -i <interval in seconds>

KB articles:
- Gathering esxtop performance data at specific times using crontab (http://kb.vmware.com/kb/1033346)
- Collecting performance snapshots using vm-support (http://kb.vmware.com/kb/1967)

27
Considerations and Recommendations

28
VAAI (VMware vStorage APIs for Array Integration)

• Offloads tasks to the SAN storage (reduces CPU load on the host)

• Especially helpful for environments with:


• VDI (boot storms, snapshot creation)
• Mass virtual machine provisioning (VM creation)
• Mass cloning
• Mass Storage vMotions

Make sure SAN storage firmware


is upgraded to support VAAI

29
Partition Alignment

Mis-aligned

Aligned

VMFS partitions are automatically aligned at a 64 KB boundary when created
from vSphere

30
VMFS vs RDM

 VMFS is a distributed file system
 VMFS has negligible performance cost and superior functionality

Use VMFS unless RDM is required

[Chart: VMFS scalability – IOPS for VMFS, RDM (virtual) and RDM (physical)
at 4K, 16K and 64K I/O sizes]

31
Virtual Disk modes

 Independent Persistent
• Changes persistently written to disk

 Independent Non-persistent
• Changes written to a redo log; the ESXi host reads the redo log first on reads (performance hit)
• Changes are lost when the VM is powered off

 Snapshot
• Changes written to a redo log; the ESXi host reads the redo log first on reads (performance hit)

Independent Persistent has the best performance but no snapshot capabilities

32
Thick vs Thin (VMDK)

 Thick VMDK:
 Eager-zeroed
 Blocks zeroed out during VMDK creation
 Performance hit during creation but faster later
 Lazy-zeroed
 Space allocated first, blocks zeroed out on first write
 Faster creation, but slower first write

 Thin VMDK:
 Same first-write performance hit as thick lazy-zeroed
 Once fully inflated and zeroed, same as thick eager-zeroed

http://www.vmware.com/pdf/vsp_4_thinprov_perf.pdf

No real performance difference; VAAI will offload zeroing anyway

33
Multipathing policy

 MRU (default for Active-Passive array)


 No option to improve performance

 FIXED policy: (default for Active-Active array)


 Balancing LUN ownership between the two storage processors (SPs) can improve
performance

 Round-Robin policy:
 Utilizes ALL available paths by load-balancing
 Best performance in most cases

ALWAYS USE THE POLICY RECOMMENDED BY THE ARRAY VENDOR to avoid issues (LUN
thrashing) impacting performance

Use Round-Robin for best performance

34
Extents vs no extents?

 Theoretically, using extents can provide performance benefits in a shared


environment

 But, considering the management overhead, VMware Engineering recommends NOT


using extents for VMFS volumes. VMFS-5 provides better management
capabilities by allowing larger LUN sizes, which makes a significant
amount of the storage administration overhead go away.

Use extents only if you have to:


- You are still on VMFS-3 and need a datastore larger than 2 TB.
- You have storage devices which cannot be grown at the back-end, but you need a
datastore larger than 2 TB.

DO NOT use Extents for VMFS volumes, no performance difference

35
Virtual Storage Adapters

 BusLogic Parallel
 LSI Logic Parallel
 LSI Logic SAS
 PVSCSI (Paravirtual)
 Reduces CPU utilization
 Increased throughput
 Not supported as a boot device for most guest operating systems

Use PVSCSI if possible

Make sure VMware Tools for the guest is updated

36
Throttling I/O per VM

• Use shares and limits individually on hosts

Example: VM A with 1500 shares and VM B with 500 shares on the same ESX
server receive 75% and 25% of the device queue depth respectively.

Provide higher share values for I/O-intensive disks

Shares are proportional to the other VMs on the same ESXi host
(when SIOC is disabled)
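
The proportional split works out as a VM’s shares over the total shares on the host. A minimal sketch of that math (the 32-slot queue depth is just the common default used for illustration):

```python
# Minimal sketch: split a device queue depth between VMs on one host according
# to their disk shares (SIOC disabled). 32 is used as an illustrative default.
def queue_split(shares_by_vm, device_queue_depth=32):
    total = sum(shares_by_vm.values())
    return {vm: device_queue_depth * s / total for vm, s in shares_by_vm.items()}

print(queue_split({"VM A": 1500, "VM B": 500}))
# {'VM A': 24.0, 'VM B': 8.0}  -> a 75% / 25% split, as in the example above
```
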
37
SIOC: Storage Contention Solution

 SIOC calculates a datastore-wide normalized latency to identify storage
contention

 SIOC enforces fairness when datastore latency crosses a threshold of 30 ms
(configurable)

 Fairness is enforced by limiting each VM’s access to device queue slots

 With Storage I/O Control, the actual disk resources utilized by each VM are
in the correct ratio even across ESX hosts. Example: VM A with 1500 shares
and VM B and VM C with 500 shares each receive 60% / 20% / 20% of the
array (storage) queue.

Provide higher share values for I/O-intensive disks.

SIOC only kicks in when the normalized latency threshold is exceeded

With SIOC – latency is controlled

38
“Common Containers” – Why?

• Mix disk-intensive with non-disk-intensive virtual machines on a datastore.

• Mix virtual machines with different peak access times.

• But… also ideally think about “SLA”

39
VMDK Workload Consolidation

 Too many sequential threads on a LUN will appear as a random workload to
the storage – negative impact on sequential performance.

 Mixing sequential with random workloads can hurt sequential throughput –
negative impact on sequential performance.

 Group similar workloads together (random with random and sequential with
sequential).

40
Sizing Storage

RAID level comparison:

RAID level | *IOPS | Write MB/s | Read MB/s
RAID 0     | 175   | 44         | 110
RAID 5     | 40    | 31         | 110
RAID 6     | 30    | 30         | 110
RAID 10    | 85    | 39         | 110
* 100% sequential write for 15k disks

Rules of thumb:
• 50 - 150 IOPS / VM
• < 15 ms latencies
• Typical workload: ~8K I/O size, 45% write, 80% random

Drive type   | MB/sec                   | IOPS                        | Latency | Use case
FC 4Gb (15k) | 100                      | 200                         | 5.5 ms  | High perf. transactional
FC 4Gb (10k) | 75                       | 165                         | 6.8 ms  | High perf. transactional
SAS (10k)    | 150                      | 185                         | 12.7 ms | Streaming
SATA (7200)  | 140                      | 38                          | 12.7 ms | Streaming / nearline
SSD          | 230 (read) / 180 (write) | 25000 (read) / 6000 (write) | < 1 ms  | High perf. transactional, tiered storage / cache
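
For rough capacity planning, these rules of thumb can be combined with the classic RAID write-penalty rule. A minimal sketch, with assumed write penalties (RAID 10 = 2, RAID 5 = 4, RAID 6 = 6) and the 10k SAS figure from the table; adjust both for your array:

```python
# Minimal sketch: back-of-the-envelope spindle count for a target workload.
# Write penalties and per-disk IOPS are assumptions, not array-specific data.
WRITE_PENALTY = {"RAID 0": 1, "RAID 10": 2, "RAID 5": 4, "RAID 6": 6}

def disks_needed(frontend_iops, write_ratio, raid_level, iops_per_disk=185):
    penalty = WRITE_PENALTY[raid_level]
    # Back-end IOPS = reads + writes amplified by the RAID write penalty
    backend_iops = (frontend_iops * (1 - write_ratio)
                    + frontend_iops * write_ratio * penalty)
    return int(-(-backend_iops // iops_per_disk))   # ceiling division

# 100 VMs at ~100 IOPS each, 45% write (the "typical workload" above)
target_iops = 100 * 100
for raid in ("RAID 10", "RAID 5", "RAID 6"):
    print(f"{raid}: ~{disks_needed(target_iops, 0.45, raid)} x 10k SAS disks")
```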

41
vSphere 5.x
New Storage Features

42
vSphere 5.x - Storage Performance Features / Studies

 SIOC for NFS: Cluster-wide virtual machine storage I/O prioritization

 SDRS: Intelligent placement and on-going space & load balancing of


Virtual Machines across Datastores in the same Datastore cluster.

 VAAI: vSphere Storage APIs for Array Integration primitives for Thin
Provisioning

 1 Million IOPS: vSphere 5.x can support astonishingly high levels of I/O


operations per second, enough to support today’s largest and most
consolidated cloud environments

 FCoE Performance: New vSphere 5.x ability to utilize built-in software-


based FCoE virtual adapters to connect to your FCoE-based storage
infrastructure

43
vFlash in vSphere 5.5

vFlash Read Cache sits in the storage stack between the virtual SCSI layer
and the VMkernel VMFS / NFS client – in front of the paths, the array
front-end, processor, array cache, back-end and spindles.

44
Common causes of Storage Performance issues

• Under-sized storage arrays/devices unable to provide the needed performance

• Infrastructure issue (Fabric, Network)

• I/O Stack Queue congestion

• ESX Host CPU Saturation

• Incorrectly Tuned Applications

• Guest Level Driver and Queuing Interactions

45
Storage Optimization in VMware (Best practices)

Array side:
• Consult SAN configuration best practice guides
• Ensure disks are correctly distributed
• Ensure the appropriate controller cache is enabled
• Spread I/O requests across available paths
• Appropriate RAID level used
• Array firmware / HBA

ESXi:
• Use the correct multipathing for the array type (Round Robin preferred)
• Change queue depth values only when suggested by the array vendor
• Isolate iSCSI / NFS traffic from management and vMotion traffic; use jumbo
frames if possible
• Utilize VAAI, SIOC, NIOC features
• HBA driver / firmware

46
Troubleshooting Process revisited:

 SCOPE the issue – SAVE TIME


 1 ESXi host affected –> check host logs first
 Multiple ESXi hosts affected –> check array logs first
 Application latency? –> check application tuning best practices

 ESXtop
 DAVG high? -> check SAN IOPS / latency.
-> If SAN IOPS / latency is low, check the fabric / network
 KAVG high? -> check QAVG (queue stats)
 Look out for I/O throttling imposed by SIOC / shares

47
Thank you!
Email: ssiddicky@vmware.com
Presented by,
Sajjad Siddicky, GSS Escalation

48
