
INTRODUCTION TO STORAGE PERFORMANCE TROUBLESHOOTING
Approximately 90% of performance problems in a vSphere deployment are related to storage in some way. There have been significant advances in storage technologies over the past few years to help improve storage performance. There are a few things that you should be aware of:

In a well-architected environment, there is no difference in performance between storage fabric technologies. A well-designed NFS, iSCSI or FC implementation will work just about the same as the others.

Despite advances in the interconnects, the performance limit is still hit at the media itself. In fact, 90% of storage performance cases seen by GSS (Global Support Services, VMware support) that are not configuration related are media related. Some things to remember:

 Payload (throughput) is fundamentally different from IOPS (cmd/s)
 IOPS performance is always lower than throughput

A good rule of thumb for the total number of IOPS any given disk will provide:

 7.2k rpm – 80 IOPS
 10k rpm – 120 IOPS
 15k rpm – 150 IOPS
 EFD/SSD – 20k-100k IOPS (rated maximum ≠ real world)

So, if you want to know how many IOPS you can achieve with a given number of disks:

 Total Raw IOPS = Disk IOPS * Number of disks
 Functional IOPS = (Raw IOPS * Write%) / (RAID Penalty) + (Raw IOPS * Read%)
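
As a quick illustration of these two formulas, here is a minimal Python sketch; the per-disk IOPS figure, disk count, write percentage and RAID penalty are made-up example values, not sizing recommendations.

# Back-of-the-envelope estimate using the formulas above (illustrative only).
def functional_iops(disk_iops, num_disks, write_pct, raid_penalty):
    """Writes pay the RAID penalty, reads do not."""
    raw_iops = disk_iops * num_disks                      # Total Raw IOPS
    read_pct = 1.0 - write_pct
    return (raw_iops * write_pct) / raid_penalty + raw_iops * read_pct

# Example: sixteen 10k rpm disks (~120 IOPS each), 30% writes, RAID-5 penalty of 4.
print(functional_iops(disk_iops=120, num_disks=16, write_pct=0.30, raid_penalty=4))  # ~1488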

This test demonstrates some methods to identify poor storage performance and how to resolve it using VMware Storage DRS for workload balancing. The first step is to prepare the environment for the demonstration.

DISK I/O LATENCY


GAVG (Guest Average Latency): total latency as seen from vSphere.

KAVG (Kernel Average Latency): time an I/O request spent waiting inside the vSphere storage stack.

QAVG (Queue Average Latency): time spent waiting in a queue inside the vSphere storage stack.

DAVG (Device Average Latency): latency coming from the physical hardware, the HBA and the storage device.

When we think about storage performance problems, the top issue is generally latency, so we need to look at the storage stack, understand what layers it contains, and see where latency can build up.

At the topmost layer is the application running in the guest operating system. That is ultimately the place where we care most about latency. This is the total amount of latency the application sees, and it includes the latencies of the entire storage stack: the guest OS, the VMkernel virtualization layers, and the physical hardware.

ESXi can’t see application latency because that is a layer above the
ESXi virtualization layer.

From ESXi we see 3 main latencies that are reported in esxtop and
vCenter.  

The topmost is GAVG, or Guest Average Latency, which is the total amount of latency that ESXi can detect.

That is not necessarily the total amount of latency the application will see. In fact, if you compare GAVG (the total amount of latency ESXi sees) with the actual latency the application is seeing, you can tell how much latency the guest OS is adding to the storage stack, and that can tell you whether the guest OS is configured incorrectly or is causing a performance problem. For example, if ESXi is reporting a GAVG of 10ms, but the application or Perfmon in the guest OS is reporting a storage latency of 30ms, that means 20ms of latency is somehow building up in the guest OS layer, and you should focus your debugging on the guest OS's storage configuration.

Ok, now GAVG is made up of two major components: KAVG and DAVG. DAVG is basically how much time is spent in the device, from the driver and HBA down to the storage array, and KAVG is how much time is spent in the ESXi kernel (in other words, how much overhead the kernel is adding).

KAVG is actually a derived metric: ESXi does not measure it directly, but derives it with the following formula:

Total Latency (GAVG) – DAVG = KAVG

The VMkernel is very efficient at processing I/O, so there really should not be any significant time that an I/O waits in the kernel. KAVG should therefore be equal to 0 in a well-configured, well-running environment. When KAVG is not equal to 0, it most likely means the I/O is stuck in a kernel queue inside the VMkernel. So the vast majority of the time, KAVG will equal QAVG, or Queue Average latency (the amount of time an I/O is stuck in a queue waiting for a slot in a lower queue to free up so it can move down the stack).
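
To make the relationship concrete, here is a minimal Python sketch (with invented sample numbers) of how KAVG is derived from GAVG and DAVG and how the rule-of-thumb thresholds listed below might be applied; it is illustrative, not a monitoring tool.

# Minimal sketch: derive KAVG from GAVG and DAVG (all values in milliseconds)
# and flag the likely problem layer. Sample numbers are invented.
def analyze_latency(gavg_ms, davg_ms):
    kavg_ms = gavg_ms - davg_ms  # KAVG = total latency (GAVG) - DAVG
    findings = []
    if davg_ms > 25:
        findings.append("DAVG high: look at the device side (HBA, fabric, array, media).")
    if kavg_ms > 2:
        findings.append("KAVG high: I/O is queuing inside the VMkernel (KAVG is usually ~= QAVG).")
    if not findings:
        findings.append("Latency looks healthy.")
    return kavg_ms, findings

kavg, notes = analyze_latency(gavg_ms=28.0, davg_ms=24.0)
print(f"KAVG ~= {kavg:.1f} ms")
for note in notes:
    print("-", note)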

Troubleshoot storage contention issues

When they mention "storage contention", I am taking this to mean I/O throughput or I/O latency issues. I find the quickest and easiest way of measuring and checking this is via esxtop/resxtop. VMware KB 1008205 and Duncan Epping's esxtop blog post cover this in more detail.
Metrics to be aware of:
Metric   Threshold   Description
DAVG     25 ms       The average response time in milliseconds per command being sent to the device.
GAVG     25 ms       The response time as it is perceived by the guest operating system. This number is calculated with the formula: DAVG + KAVG = GAVG.
KAVG     2 ms        The amount of time the command spends in the VMkernel.
Also see pages 47 through 50 of the vSphere Troubleshooting documentation for further information.
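
As a rough sketch of applying those thresholds, the Python snippet below checks per-device latency samples against the table. The device names and values are invented; in practice the numbers would come from esxtop/resxtop (for example, batch mode: esxtop -b -d 2 -n 30 > perf.csv).

# Hedged sketch: compare per-device latency samples against the thresholds above.
# The sample data is invented; real values would come from esxtop/resxtop.
THRESHOLDS_MS = {"DAVG": 25, "GAVG": 25, "KAVG": 2}

samples = {
    "naa.600508b1001c0001": {"DAVG": 31.2, "KAVG": 0.4, "GAVG": 31.6},
    "naa.600508b1001c0002": {"DAVG": 4.1, "KAVG": 3.8, "GAVG": 7.9},
}

for device, metrics in samples.items():
    breaches = [f"{m}={v} ms (threshold {THRESHOLDS_MS[m]} ms)"
                for m, v in metrics.items() if v > THRESHOLDS_MS[m]]
    print(f"{device}: " + ("; ".join(breaches) if breaches else "OK"))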

INTRODUCTION TO NUMA AND VNUMA


Since 5.0, vSphere has had the vNUMA feature that presents the
physical NUMA topology to the guest operating system. Traditionally
virtual machines have been presented with a single NUMA node,
regardless of the size of the virtual machine, and regardless of the
underlying hardware. Larger and larger workloads are being virtualized,
and it has become increasingly important that the guest OS and
applications can make decisions on where to execute application
processes and where to place specific application memory. ESXi is
NUMA aware, and will always try to fit a VM within a single NUMA node
when possible. With the emergence of the "Monster VM" this is not
always possible.

Note that because we are working in a fully virtualized environment, we have to enforce the NUMA architecture presented to a VM. In a physical environment it would be possible to see the physical architecture directly. The purpose of this module is to gain an understanding of how vNUMA works, both by itself and in combination with the cores per socket feature.
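
To see what NUMA topology the guest actually received, you can ask the guest OS itself. Below is a minimal sketch for a Linux guest that reads the standard sysfs layout (the paths are the usual Linux ones; on Windows you would use a tool such as Coreinfo instead).

# Minimal sketch for a Linux guest: list the NUMA nodes and the CPUs in each,
# as presented to the VM by vNUMA. Assumes the standard sysfs layout.
import glob
import os

for node in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    with open(os.path.join(node, "cpulist")) as f:
        cpus = f.read().strip()
    print(f"{os.path.basename(node)}: CPUs {cpus}")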
