Table of Contents
Lab Overview - HOL-2004-01-SDC - Mastering vSphere Performance
  Lab Introduction
  Lab Guidance
Module 1 - vSphere 6.7 Performance: What's New? (30 minutes)
  Introduction
  Faster Lifecycle Management
  vCenter Server 6.7
  Core Platform Improvements
  Conclusion
Module 2 - Right-Sizing vSphere VMs for Optimal Performance (45 minutes)
  Introduction
  NUMA and vNUMA
  vCPU and vNUMA Right-Sizing
  Guest OS Tools to View vCPUs/vNUMA
  Conclusion
Module 3 - Introduction to esxtop (30 minutes)
  Introduction to esxtop
  Show esxtop CPU features
  Show esxtop memory features
  Show esxtop storage features
  Show esxtop network features
  Conclusion and Clean-Up
Module 4 - esxtop in Real-World Use Cases (30 minutes)
  esxtop in Real-World Use Cases
  Creating an esxtop resource file
  Saving esxtop statistics with batch mode
  Graphing esxtop statistics
  Conclusion and Clean-Up
Module 5 - vCenter Performance Analysis (30 minutes)
  Introduction
  vCenter Server Appliance Management Interface (VAMI)
  Tools for Detailed Analysis: vimtop
  Tools for Detailed Analysis: vpxd profiler logs
  Tools for Detailed Analysis: PostgreSQL logs and pg_top
  Clients (UI and API) Performance Tips
  Conclusion and Clean-Up
Module 6 - Database Performance Testing with DVD Store (30 minutes)
  Introduction
  What is DVD Store 3?
  Downloading/Installing DVD Store 3
  Building a DVD Store 3 Database/Starting the Lab
  Configuring/Running DVD Store 3
HOL-2004-01-SDC Page 1
Lab Overview - HOL-2004-01-SDC - Mastering vSphere Performance
Lab Introduction
This lab, HOL-2004-01-SDC, Mastering vSphere Performance, covers a lot of content,
broken down into modules. First, you'll learn what is new and improved in the current
vSphere 6.7 release. You will also work with a broad array of
benchmarks such as DVD Store, Weathervane, and X-Mem and performance monitoring
tools such as esxtop and advanced performance charts to both measure performance
and diagnose bottlenecks in a vSphere environment. We also explore performance-
related vSphere features such as right-sizing virtual machines, virtual NUMA, Latency
Sensitivity and Host Power Management.
While the time available in this lab constrains the number of performance problems we
can review as examples, we have selected relevant problems that are commonly seen in
vSphere environments. Walking through these examples can help you understand and
troubleshoot typical performance problems.
For the complete Performance Troubleshooting Methodology and a list of VMware Best
Practices, please visit the www.vmware.com website:
• https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/
techpaper/performance/whats-new-vsphere67-perf.pdf
• https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/
techpaper/performance/whats-new-vsphere65-perf.pdf
• https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/
techpaper/performance/drs-enhancements-vsphere67-perf.pdf
• https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/
techpaper/drs-vsphere65-perf.pdf
Furthermore, if you have interest in performance related articles, make sure that you
monitor the VMware VROOM! Blog:
https://blogs.vmware.com/performance/
Lab Guidance
Note: It takes more than 90 minutes to complete this lab. You should
expect to only finish two or three of the modules during your time. The
modules are independent of each other, so you can start at the
beginning of any module and proceed from there. You can use the Table
of Contents to access any module of your choosing at any point in the
lab.
You can find the Table of Contents in the upper right-hand corner of the
Lab Manual.
Lab Captains:
This lab manual can be downloaded from the Hands-on Labs Document site found
here:
http://docs.hol.vmware.com/announcements/nee-default-language.pdf
1. The area in the RED box contains the Main Console. The Lab Manual is on the tab
to the Right of the Main Console.
2. A particular lab may have additional consoles found on separate tabs in the upper
left. You are directed to open another specific console if needed.
3. Your lab starts with 90 minutes on the timer. The lab cannot be saved. All your
work must be done during the lab session, but you can click EXTEND to increase
your time. If you are at a VMware event, you can extend your lab time twice for
up to 30 minutes; each click gives you an additional 15 minutes. Outside of
VMware events, you can extend your lab time up to 9 hours and 30 minutes; each
click gives you an additional hour.
During this module, you input text into the Main Console. Besides typing it in directly,
there are two helpful methods that make it easier to enter complex data.
You can also click and drag text and Command Line Interface (CLI) commands directly
from the Lab Manual into the active window in the Main Console.
You can also use the Online International Keyboard found in the Main Console.
1. Click on the Keyboard Icon found on the Windows Quick Launch Task Bar.
In this example, you will use the Online Keyboard to enter the "@" sign used in email
addresses. The "@" sign is Shift-2 on US keyboard layouts.
When you first start your lab, you may notice a watermark on the desktop indicating
that Windows is not activated.
One of the major benefits of virtualization is that virtual machines can be moved and
run on any platform. The Hands-on Labs utilizes this benefit, and we are able to run the
labs out of multiple datacenters. However, these datacenters may not have identical
processors, which triggers a Microsoft activation check through the Internet.
Rest assured, VMware and the Hands-on Labs are in full compliance with Microsoft
licensing requirements. The lab that you are using is a self-contained pod and does not
have full access to the Internet, which is required for Windows to verify the activation.
Without full access to the Internet, this automated process fails and you see this
watermark.
1. Please check that your lab has finished all the startup routines and is ready
for you to start.
If you see anything other than "Ready", please wait a few minutes. If after five minutes
your lab has not changed to "Ready", please ask for assistance.
Module 1 - vSphere 6.7 Performance: What's New? (30 minutes)
Introduction
Underlying each release of VMware vSphere® are many performance and
scalability improvements. The vSphere 6.7 platform continues to provide industry-
leading performance and features to ensure the successful virtualization and
management of your entire software-defined datacenter.
Please check that your lab has finished all the startup routines and is ready for you
to start. If you see anything other than "Ready", please wait a few minutes. If after five
minutes your lab has not changed to "Ready", please ask for assistance.
Login to vCenter
• Click on the Hosts and Clusters icon (if it isn't already underlined)
Update Manager
This release of vSphere includes a brand-new Update Manager interface that is part
of the HTML5 Web Client.
Update Manager in vSphere 6.7 keeps VMware ESXi 6.x hosts reliable and secure by
making it easy for administrators to deploy the latest patches and security fixes. When
the time comes to upgrade older releases to the latest version of ESXi 6.7, Update
Manager makes that task easy, too.
The new HTML5 Update Manager interface is more than a simple port from the old Flex
client – the new UI provides a much more streamlined remediation process. For
example, the previous multi-step remediation wizard is replaced with a much more
efficient workflow, requiring just a few clicks to begin the procedure.
As of vSphere 6.7 Update 1, the HTML5 Client is now 'Fully Featured'. This means that
you can manage all aspects of your vSphere environment using the HTML5-based
vSphere Client; there is no need to switch back and forth between it and the
vSphere Web Client. We've ported all features, including VMware Update Manager
(VUM). Read about all the features released in this version of the vSphere Client by
visiting the Functionality Updates for the vSphere Client site.
Hosts that are currently on ESXi 6.5 upgrade to 6.7 significantly faster than ever before.
This is because several optimizations have been made for that upgrade path, including
eliminating one of two reboots traditionally required for a host upgrade. In the past,
hosts that were upgraded with Update Manager were rebooted a first time in order to
initiate the upgrade process, and then rebooted once again after the upgrade was
complete.
Modern server hardware, equipped with hundreds of gigabytes of RAM, typically takes
several minutes to initialize and perform self-tests. Doing this hardware initialization
twice during an upgrade really adds up, so this new optimization will significantly
shorten the maintenance windows required to upgrade clusters of vSphere
infrastructure.
These new improvements reduce the overall time required to upgrade clusters,
shortening maintenance windows so that valuable efforts can be focused
elsewhere.
Recall that, because of DRS and vMotion, applications are never subject to
downtime during hypervisor upgrades – VMs are moved seamlessly from host to host
as needed.
Since this lab runs in the cloud, it is not practical to upgrade an ESXi host to 6.7.
Instead, check out this video to see how the process works:
vSphere 6.7 introduces vSphere Quick Boot – a new capability designed to reduce the
time required for a VMware ESXi host to reboot during update operations.
Host reboots occur infrequently but are typically necessary after activities such as
applying a patch to the hypervisor or installing a third-party component or driver.
Modern server hardware that is equipped with large amounts of RAM may take many
minutes to perform device initialization and self-tests.
Since this lab runs in the cloud, we can't show a reboot of a physical host. Instead,
check out this video to see how it works!
Conclusion
The new streamlined Update Manager interface, single reboot upgrades, and vSphere
Quick Boot shorten the time required for host lifecycle management operations and
make VMware vSphere 6.7 the Efficient and Secure Platform for your Hybrid Cloud.
With their benchmark vcbench, VMware performance engineers measured the number
of operations per second (throughput) that vCenter produced.
This benchmark stresses the vCenter server by performing typical vCenter operations,
like powering a VM on and off, among several others. vCenter 6.7 performs 16.7 operations
per second, a twofold increase over the 8.3 operations per second that vCenter
6.5 produced.
Before vCenter can power on a VM, it first consults several sub-systems, including DRS,
to support the initial placement of the VM on a vSphere host. Latency, in this context, is
the measure of the duration of this process. VMware made many optimizations in the
coordination of these sub-systems to reduce power-on latency from 9.5 seconds to
2.8 seconds.
VMware also optimized the core vCenter process (vpxd) to use much less memory (a 3x
reduction!) to complete the same workloads.
For more information about vCenter Performance, check out vCenter Performance
Analysis, a module later in this lab.
Host Scalability
There are some minor improvements to vSphere 6.7 ESXi host maximums worth noting.
Applications with large memory footprints, like SAP HANA, can often stress the hardware
memory subsystem (that is, Translation Lookaside Buffer, or TLB) with their access
patterns. Modern processors can mitigate this performance impact by creating larger
mappings to memory and increasing the memory reach of the application. In prior
releases, ESXi allowed guest operating system memory mappings based on 2 MB page
sizes. This release introduces memory mappings for 1 GB page sizes.
To enable this advanced attribute, see Backing Guest vRAM with 1GB Pages at
https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.resmgmt.doc/
GUID-F0E284A5-A6DD-477E-B80B-8EFDF814EE01.html
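As a sketch of what that advanced attribute looks like in practice: per the linked documentation, the setting is sched.mem.lpage.enable1GPage, which can be added to a powered-off VM's configuration. The file path and existing contents below are placeholders for this example.

```shell
# Hypothetical sketch: add the 1GB-page backing attribute to a VM's .vmx file.
# The attribute name comes from the linked VMware doc; the path is a placeholder.
VMX=/tmp/perf-worker-01a.vmx

# Simulate an existing VM configuration file
printf 'memsize = "4096"\n' > "$VMX"

# Append the advanced attribute (on a real host, the VM must be powered off,
# or the setting applied via the vSphere Client's Advanced Configuration page)
printf 'sched.mem.lpage.enable1GPage = "TRUE"\n' >> "$VMX"

grep enable1GPage "$VMX"
```

In the vSphere Client, the same result is achieved by adding the attribute under VM Options > Advanced > Configuration Parameters.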
Scalability of the vSphere ESXi CPU scheduler is always being improved release-to-
release to support current and future requirements. New in vSphere 6.7 is the
elimination of the last global lock, which allows the scheduler to support tens of
thousands of worlds (various processes running in the VMkernel; for example, each
virtual CPU has a world associated with it). This feature ensures vSphere maintains its
lead as a platform for containers and microservices.
In vSphere 6.7 U2, there is a new scheduler option called the side-channel aware
scheduler to address a security vulnerability known as L1TF. For more information,
including performance test results, see this blog: https://blogs.vmware.com/
performance/2019/05/new-scheduler-option-for-vsphere-6-7-u2.html
5. You'll note that "EVC is Disabled". Click the EDIT... button to see what the
choices are.
Virtual Hardware 14
1. Ensure you're in the Hosts and Clusters view (it should be underlined)
2. Select the perf-worker-01a VM
3. Click the Summary tab
4. Note that it has been configured with VM version 14, which is only compatible
with ESXi 6.7 and later.
Upgrade a VM to HW v14
The warning states that you should make a backup of your VM, since it is an irreversible
operation that makes your VM incompatible with earlier versions of vSphere.
Confirm that the VM now states "Compatibility: ESXi 6.7 and later (VM version
14)"
Congratulations. You have upgraded this VM to use the latest vSphere 6.7
enhancements!
Virtual Hardware 15, which is only supported for ESXi 6.7 U2 (and later) hosts, increases
the maximum number of logical processors from 128 to 256.
Persistent memory (PMEM) is a type of non-volatile DIMM (NVDIMM) that approaches
the speed of DRAM but retains its contents through power cycles. It is a new layer that
sits between NAND flash and DRAM: faster than flash and, unlike DRAM, non-volatile.
• vPMEM - exposes NVDIMM capacity to the virtual machine through a new virtual
NVDIMM device. Guest operating systems use it directly as a block device or in DAX
mode.
This chart shows the result of a performance test run using the MySQL benchmark of
Sysbench. The benchmark measures the throughput and latency of a MySQL workload.
Here, we ran the tests with three tables, nine threads, and an 80-20 read-write ratio
with a MySQL server in a VM hosted on vSphere 6.7.
The blue bars show throughput measured in transactions per second. The green line
shows latency measured as the 95th percentile in milliseconds.
Check out this video to learn more about how vSphere Persistent Memory can significantly
enhance performance for both existing and new applications.
Microsoft VBS, a feature of Windows 10 and Windows Server 2016 operating systems,
uses hardware and software virtualization to enhance system security by creating an
isolated, hypervisor-restricted, specialized subsystem. Starting with vSphere 6.7 and
Virtual Hardware 14, you can enable Microsoft virtualization-based security (VBS) on
supported Windows guest operating systems.
To measure the performance of a vSphere 6.7 virtual machine running Windows with
VBS enabled, we used the HammerDB benchmark. The test simulated 22 virtual users
generating an OLTP TPC-C-like workload against a Microsoft SQL Server 2016
database.
Creating a VBS-enabled VM
The Create a new virtual machine option will be highlighted. Click the NEXT button.
HOL-2004-01-SDC Page 31
HOL-2004-01-SDC
Type a name for the VM (e.g., VBS) and click NEXT.
Select the virtual machine version. By default, ESXi 6.7 and later is selected, which is
required for VBS, so click NEXT.
1. Click VM Options
2. Expand the Boot Options section.
Note that enabling VBS requires options such as EFI firmware and
Secure Boot; these are set automatically.
3. Click NEXT.
Instant Clone
The time to fully deploy and boot 64 clones using vSphere 6.7 Instant Clone showed
approximately 2.8x improvement over the older Linked Clone architecture.
You can use Instant Clone technology to create powered-on virtual machines from the
running state of another powered-on virtual machine. The result of an Instant Clone
operation is a new virtual machine that is identical to the source virtual machine. With
Instant Clone, you can create new virtual machines from a controlled point in time.
Instant cloning is very convenient for large-scale application deployments because it
ensures memory efficiency and allows for creating numerous virtual machines on a
single host.
This Instant Clone video demonstration shows how 20 CentOS VMs can be provisioned in
two minutes (credit: LearnVMware.online). The magic happens around 3:11 if you want
to skip ahead!
Conclusion
Based on these performance, scalability, and feature improvements in vSphere
6.7, VMware continues to demonstrate industry-leading performance.
If you are looking for additional information on vSphere 6.7 performance, check out
these links:
Now that you’ve completed this lab, try testing your skills with VMware Odyssey, our
newest Hands-on Labs gamification program. We have taken Hands-on Labs to the next
level by adding gamification elements to the labs you know and love. Experience the
fully automated VMware Odyssey as you race against the clock to complete tasks and
reach the highest ranking on the leaderboard. Try the vSphere Performance Odyssey lab.
Module 2 - Right-Sizing vSphere VMs for Optimal Performance (45 minutes)
Introduction
Meet Melvin the Monster VM! vSphere 6.5 and later can handle Melvin and any
other large, business-critical workloads (known affectionately as "wide" or
"monster" VMs) without breaking a sweat! :-)
In all seriousness, this module discusses rules of thumb for right-sizing VMs --
particularly those so large that they span multiple physical processor or
memory node boundaries. We cover terms such as vCPUs, pCPUs, Cores
Per Socket, and NUMA (pNUMA and vNUMA), and learn how to right-size these VMs to
perform optimally.
UMA
This is a bit of a history lesson, as UMA, or Uniform Memory Access, is no longer how
modern servers are designed. The reason why?
NUMA
NUMA moves away from a centralized pool of memory and introduces the concept of a
topology. By classifying memory locations based on signal path length from the
processor to the memory, latency and bandwidth bottlenecks can be avoided. This is
done by redesigning the whole system of processor and chipset. NUMA architectures
gained popularity at the end of the 90's, when they were used on SGI supercomputers
such as the Cray Origin 2000. NUMA helped identify the location of memory: on these
systems, the question was which memory region, in which chassis, was holding a given
set of memory bits.
In the first half of the 2000s, AMD brought NUMA to the enterprise
landscape, where UMA systems reigned supreme. In 2003, the AMD Opteron family was
introduced, featuring integrated memory controllers, with each CPU owning designated
memory banks. Each CPU now has its own memory address space. A NUMA-optimized
operating system such as ESXi allows workloads to consume memory from both memory
address spaces while optimizing for local memory access. Let's use an example of
a two-CPU system to clarify the distinction between local and remote memory access
within a single system:
(Credit: frankdenneman.nl)
The memory connected to the memory controller of CPU1 is considered to be local
memory. Memory connected to another CPU socket (CPU2) is considered to be foreign or
remote for CPU1. Remote memory access has additional latency overhead
to local memory access, since it has to traverse an interconnect (point-to-point link) and
connect to the remote memory controller. As a result of the different memory locations,
this system experiences “non-uniform” memory access time.
Without vNUMA
In this example, a VM with 12 vCPUs is running on a host with four NUMA nodes with six
cores each. This VM is not presented with the physical NUMA configuration, and
hence the guest OS and application see only a single NUMA node. This means that the
guest has no way to place processes and memory within a physical NUMA node.
With vNUMA
Since vSphere 5, ESXi has had the vNUMA (virtual NUMA) feature that can
present multiple NUMA nodes to the guest operating system. Traditionally, virtual
machines have only been presented with a single NUMA node regardless of the
size of the VM and its underlying hardware. Larger and larger workloads are being
virtualized, so it has become increasingly important that the guest OS and
applications can make decisions on where to execute applications and where to
place memory.
VMware ESXi is NUMA aware and always tries to fit a VM within a single physical
NUMA node when possible. However, with very large "monster VMs", this isn't
always possible.
The purpose of this section is to gain understanding of how vNUMA works by itself
and in combination with the cores per socket feature.
In this example, a VM with 12 vCPUs is running on a host that has four NUMA nodes with
six cores each. This VM is presented with the physical NUMA configuration, and
hence the guest OS and application see two NUMA nodes. This means that the guest
can place processes and accompanying memory within a physical NUMA node when
possible.
The most important values are shown in this screenshot, taken directly from the
vSphere Web Client.
NOTE: You must expand the CPU dropdown to view/change some of these fields!
1. CPU: This is the total number of vCPUs presented to the guest OS ( 20 in this
example)
2. Cores per Socket: If this value is 1 (the default), all CPUs are presented to the
guest as single-core processors. For most VMs, the default value is OK, but
there are definitely instances when you should consider increasing this
value, which we'll discuss in a bit.
In this example, we've increased it to 10 , which means the guest will see multi-
core (10-core) processors.
3. Sockets: This is not a configurable value; it is simply the number of CPUs divided
by Cores per Socket : in this example, 20 / 10 = 2 .
Also called "virtual sockets" or "vSockets".
4. CPU Hot Plug: Also known as CPU Hot Add, this is a checkbox to allow adding
more CPUs "on the fly" (while the guest is powered on).
If you have right-sized your VM from the beginning, you should not enable this
feature, because it has the major downside of disabling vNUMA. For more
information, see vNUMA is disabled if VCPU hotplug is enabled (KB 2040375)
Let's refer to this 20 vCPU VM, as configured, as 2 Sockets x 10 Cores per Socket.
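For reference, the entries behind this configuration can also be seen in the VM's .vmx file. The following is an illustrative sketch (in practice you set these values through the vSphere Client, not by editing the file by hand):

```shell
# Sketch of the .vmx entries behind a 2 Sockets x 10 Cores per Socket VM.
# numvcpus, cpuid.coresPerSocket, and vcpu.hotadd are the relevant keys.
cat > /tmp/example-20vcpu.vmx <<'EOF'
numvcpus = "20"
cpuid.coresPerSocket = "10"
vcpu.hotadd = "FALSE"
EOF

# The guest sees 20 / 10 = 2 sockets; hot add is off so vNUMA stays enabled
grep coresPerSocket /tmp/example-20vcpu.vmx
```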
Let's talk about the Cores per Socket value. As mentioned earlier, this defaults to 1,
which means that every virtual CPU is presented as a socket to the guest VM. In most
cases, there's no issue there.
However, this may not be ideal from a Microsoft licensing perspective, where the
operating system and/or application is sometimes licensed per socket. Here are a few
examples:
• Since both Windows Server 2012 and 2016 only support up to 64 sockets,
creating a “monster” Windows VM with more than 64 vCPUs requires an increase
in Cores per Socket so the guest can consume all the assigned processors.
• A virtual machine with 8 Sockets x 1 Core per Socket, hosting a single Microsoft
SQL Server 2016 Standard Edition license, would only be able to consume 4 of
the 8 vCPUs since that edition’s license limits to “lesser of 4 sockets or 24
cores”. If the virtual machine is configured with 1 Socket x 8 Cores per Socket,
all 8 vCPUs could be leveraged: https://msdn.microsoft.com/en-us/library/
ms143760.aspx
• A VM created with 16 vCPUs and 2 Cores per Socket hosting Microsoft SQL
Server 2016 Enterprise Edition, may behave differently than a VM configured
with 16 vCPUs and 8 Cores per Socket. This is due to the soft-NUMA feature
within SQL Server which gets automatically configured based on the number of
cores the OS can use: https://msdn.microsoft.com/en-us/library/ms345357.aspx
http://frankdenneman.nl/2016/12/12/decoupling-cores-per-socket-virtual-numa-
topology-vsphere-6-5/
However, you should still choose the CPU and Cores per Socket values wisely. Read on
for some best practices.
In general, the following best practices should be followed regarding vNUMA and Cores
per Socket:
• Configure the VM CPU value equal to Cores per Socket , until you exceed the
physical core count of a single physical NUMA node.
Example: for a host with 8-core processors, any VM with 8 (or fewer) CPUs should
have the same Cores Per Socket value.
• When you need to configure more vCPUs than there are physical cores in the
NUMA node, evenly divide the vCPU count across the minimum number of
NUMA nodes.
Example: for a 4-socket, 8-core host, and the VM needs more than 8 vCPUs, a
reasonable choice may include a 16 vCPU VM with 8 Cores per Socket (to
match the 8-core processor architecture)
• Don’t assign an odd number of vCPUs when the size of your virtual machine
exceeds a physical NUMA node.
Example: for a 2-socket, 4-core host, do not create a VM with more than 8 vCPUs.
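The rules above can be sketched as a small helper script. This is purely illustrative; the function name and the host values are made up for this example:

```shell
# Illustrative helper: suggest a vNUMA-friendly Sockets x Cores per Socket
# layout, given the vCPU count and the physical cores per NUMA node.
suggest_topology() {
  vcpus=$1; cores_per_pnode=$2
  if [ "$vcpus" -le "$cores_per_pnode" ]; then
    # Fits in one NUMA node: one socket holding all the vCPUs
    echo "1 socket x $vcpus cores per socket"
  else
    # Spread the vCPUs evenly across the minimum number of NUMA nodes
    nodes=$(( (vcpus + cores_per_pnode - 1) / cores_per_pnode ))
    if [ $(( vcpus % nodes )) -ne 0 ]; then
      echo "warning: $vcpus vCPUs cannot be split evenly across $nodes nodes" >&2
    fi
    echo "$nodes sockets x $(( vcpus / nodes )) cores per socket"
  fi
}

suggest_topology 8 8    # fits one 8-core node: 1 socket x 8 cores per socket
suggest_topology 16 8   # spans two nodes: 2 sockets x 8 cores per socket
```

A 10 vCPU VM on the same 8-core-per-node host would come out as 2 sockets x 5 cores per socket, matching the "evenly divide across the minimum number of NUMA nodes" rule.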
There are many Advanced Virtual NUMA Attributes (click for a full list); here are a few
guidelines, but in general, the defaults are best:
• If the VM is larger than total physical core count (e.g. a 64 vCPU VM on a 40 core /
80 thread host), try numa.consolidate = false
• If Hyper-Threading is enabled (usually the default), numa.vcpu.preferHT=true
may help (KB 2003582)
• If Cores per Socket is too restrictive, you can manually set vNUMA size with
numa.vcpu.maxPerMachineNode
• To enable vNUMA on a VM with 8 or fewer vCPUs, use numa.vcpu.min
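These advanced attributes are set as VM configuration parameters. The sketch below shows them as .vmx-style lines; the values are examples only, not recommendations (remember, the defaults are best for most VMs):

```shell
# Illustrative .vmx-style lines for the advanced vNUMA attributes above.
# Example values only; apply via VM Options > Advanced on a real VM.
cat > /tmp/example-vnuma.vmx <<'EOF'
numa.vcpu.preferHT = "TRUE"
numa.consolidate = "FALSE"
numa.vcpu.min = "4"
EOF

# Count the vNUMA-related entries we just wrote
grep -c '^numa\.' /tmp/example-vnuma.vmx
```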
Of course, a picture (or in this case, a table) is worth a thousand words. This table
outlines how a VM could (should) be configured on a dual-socket, 10-core physical host
to ensure an optimal vNUMA topology and performance, regardless of vSphere version.
What do these topologies look like from the guest OS perspective? Let's look at
some examples of tools for Windows and Linux that let us verify that the guest is
showing the expected processor and NUMA configurations.
Windows: Coreinfo
Coreinfo is a command-line utility that shows you the mapping between logical
processors and the physical processor, NUMA node, and socket on which they reside, as
well as the caches assigned to each logical processor. It uses the Windows
GetLogicalProcessorInformation function to obtain this information and prints it to the
screen, representing each mapping to a logical processor with an asterisk (*).
Coreinfo is useful for gaining insight into the processor and cache topology of your
system.
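To make the asterisk notation concrete, here is a small sketch (with made-up sample output, not captured from the lab) that extracts which logical processors belong to each NUMA node in a Coreinfo-style map, where position i in the string corresponds to logical processor i:

```python
def parse_numa_map(lines):
    """Parse Coreinfo-style NUMA node maps, e.g. 'NUMA Node 0: **********----------'.
    An asterisk at position i means logical processor i belongs to that node."""
    nodes = {}
    for line in lines:
        label, _, mask = line.partition(":")
        cpus = [i for i, ch in enumerate(mask.strip()) if ch == "*"]
        nodes[label.strip()] = cpus
    return nodes

# Hypothetical output for the 2 Sockets x 10 Cores per Socket VM:
sample = [
    "NUMA Node 0: **********----------",
    "NUMA Node 1: ----------**********",
]
numa = parse_numa_map(sample)
print(numa["NUMA Node 0"])  # logical processors 0-9
print(numa["NUMA Node 1"])  # logical processors 10-19
```

For the 20-vCPU VM above, the expected result is ten logical processors per vNUMA node.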
Parameter Description
-c Dump information on cores.
-f Dump core feature information.
-g Dump information on groups.
-l Dump information on caches.
-n Dump information on NUMA nodes.
-s Dump information on sockets.
-m Dump NUMA access cost.
-v Dump only virtualization-related features.
Here we see the output of coreinfo (with no command line options) on the
aforementioned 20 vCPU VM. Here is a breakdown of the highlights:
Linux: numactl
For Linux, the most useful tool for viewing virtual NUMA information is numactl.
Note that you may need to install the package that provides the numactl tool for your
OS (for RHEL/CentOS 7, an appropriate command is yum install numactl).
Parameter Description
-H, --hardware Show an inventory of available NUMA nodes and memory on the system.
-s, --show Show the NUMA policy of the current process.
--cpunodebind=nodes Run a command only on the CPUs of the specified nodes.
--membind=nodes Allocate memory only from the specified nodes.
--interleave=nodes Interleave memory allocations across the specified nodes.
Here we see the output of numactl -H (the -H is an abbreviation for hardware; use the
man numactl command to see all of the available parameters). Here is a quick
explanation:
2. available: 2 nodes (0-1): This section confirms Linux sees 2 NUMA nodes,
also known as vNUMA nodes.
3. node 0 cpus, node 1 cpus: This section confirms Linux sees 10 logical
processors on each NUMA node (20 vCPUs total).
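As a sketch of how you might check this programmatically (the sample text below mimics typical `numactl -H` output; exact field layout can vary by version), the node/CPU lines are easy to parse:

```python
def cpus_per_node(numactl_output):
    """Extract {node: [cpu, ...]} from `numactl -H` style output.
    Sketch based on a typical output layout; field positions may vary."""
    nodes = {}
    for line in numactl_output.splitlines():
        parts = line.split()
        # Lines of interest look like: "node 0 cpus: 0 1 2 ..."
        if len(parts) > 3 and parts[0] == "node" and parts[2] == "cpus:":
            nodes[int(parts[1])] = [int(c) for c in parts[3:]]
    return nodes

# Hypothetical output for the 2 Sockets x 10 Cores per Socket VM:
sample = """available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9
node 0 size: 16383 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19
node 1 size: 16384 MB"""
print(cpus_per_node(sample))  # node 0 -> CPUs 0-9, node 1 -> CPUs 10-19
```

Two nodes with ten CPUs each confirms the expected 2 x 10 vNUMA topology.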
Conclusion
Congratulations! You now know how to right-size VMs for optimal performance on
vSphere 6.7!
Resources/Helpful Links
For more information about right-sizing VMs, NUMA/vNUMA, and vSphere performance in
general, here are some helpful links:
• https://blogs.vmware.com/performance/2017/03/virtual-machine-vcpu-and-
vnuma-rightsizing-rules-of-thumb.html
• http://frankdenneman.nl/2016/07/06/introduction-2016-numa-deep-dive-series/
Module 3 - Introduction to esxtop (30 minutes)
Introduction to esxtop
There are several tools to monitor and diagnose performance in vSphere environments.
esxtop helps you diagnose and further investigate performance issues that you've
already identified through the vSphere Client or another tool or method. esxtop is not
designed for monitoring performance over the long term, but it is great for deep
investigation or for monitoring a specific issue or VM on a specific host over a defined
period of time.
In this lab, which should take about 30 minutes, we use esxtop to dive into performance
troubleshooting the utilizations of CPU, Memory, Storage, Network, and Power.
The goal of this module is to expose you to the different views in esxtop and to present
you with different loads in each view. This is not meant to be a deep dive into esxtop
but to get you comfortable with this tool so that you can use it in your own environment.
To learn more about the metrics in esxtop and what they mean, we recommend that you
look at the links at the end of this module.
Type
.\StartCPUTest2.ps1
and press Enter. Wait until you see the RDP sessions before continuing.
Open PuTTY
SSH to esx-01a
Start esxtop
esxtop
2. Click the Maximize icon so we can see the maximum amount of information.
If you just started esxtop, you are in the CPU view by default.
If you happen to be on a different screen, pressing "c" gets you back to this view.
By default, the screen refreshes every five seconds. To change this, for example
to set the refresh rate to two seconds, press "s 2" then press Enter:
s 2
Let's filter this view (remove some fields) by pressing the letter "f":
Since we don't have much screen space, let's remove (filter out) the ID and GID fields.
Do this by typing the following letters (NOTE: make sure these are capitalized, as they
are case-sensitive!):
AB
You should see the * next to A: and B: disappear. Press Enter to resume the esxtop
screen.
By default, this screen shows performance counters for both virtual machines and ESXi
host processes.
Let's filter out everything except for virtual machines. To do this, type a capital "V":
Monitor VM load
Monitor the load on the two worker VMs, perf-worker-01a and perf-worker-01b:
1. Both VMs should be running at or near 100% utilization (%USED). If not,
wait a moment for the CPU workload to start up.
2. Another important metric to monitor is %RDY (CPU Ready). This metric is the
percentage of time a “world” is ready to run but waiting on the CPU scheduler for
approval. This metric can go up to 100% per vCPU, which means that with two
vCPUs, it has a maximum value of 200%. A good guideline is to ensure this value
is below 5% per vCPU, but it always depends on the application.
Look at the worker VMs to see if they go above the 5% per vCPU threshold. To
force esxtop to refresh immediately, press the Space bar.
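The per-vCPU arithmetic described above can be sketched as a quick check (a hypothetical helper applying the 5% guideline, not an esxtop feature):

```python
def rdy_per_vcpu(rdy_percent, vcpus):
    """%RDY in esxtop is summed across all vCPUs of the world,
    so divide by the vCPU count before comparing to the guideline."""
    return rdy_percent / vcpus

def cpu_ready_warning(rdy_percent, vcpus, threshold=5.0):
    """Return True if average CPU Ready exceeds the per-vCPU guideline."""
    return rdy_per_vcpu(rdy_percent, vcpus) > threshold

# A 2-vCPU VM showing %RDY of 14 averages 7% per vCPU: above the guideline.
print(cpu_ready_warning(14.0, 2))  # True
# The same reading on a 4-vCPU VM averages 3.5%: within the guideline.
print(cpu_ready_warning(14.0, 4))  # False
```

As the text notes, the threshold is only a guideline; latency-sensitive applications may suffer well below 5%, while batch workloads may tolerate more.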
Since we previously enabled CPU Hot Add, we can add another vCPU while the VM is
running:
1. Change CPU to 2
2. Click OK to save
Now that you've added an additional vCPU to each VM, you should see results like the
screenshot above:
After a few minutes, the CPU benchmark starts to use the additional vCPUs and %RDY
increases even more. This is due to CPU contention and SMP scheduling
(increased %CSTP) on the system. The ESXi host has two active virtual machines each
with two vCPUs, and these four vCPUs attempting to run at 100% each results in
fighting for resources. Remember that the ESXi host also requires some physical CPU
resources to run, and this causes CPU contention.
vSphere 6.5 introduced a new esxtop screen that lets you monitor host CPU power
statistics. To view the host power screen in esxtop, type a lowercase "p":
Press the letter "f" to see available fields to add to the screen:
In the field list, press the letter shown next to %Aperf/Mperf to add it, then press Enter:
%A/MPERF:
This ratio column identifies the frequency at which the processor is currently running.
APERF and MPERF are two hardware registers that track the actual and nominal
frequencies of the processor. You can't see actual values here because of the nature
of the Hands On Lab.
However, look at the following image captured from a physical host. It shows a host
running VMware vSphere 6.7 U2 with 36 logical CPUs (18 physical CPUs with
Hyperthreading enabled) each at 2.8 GHz. The host serves two VMs, and we started a
CPU-intensive quad-threaded process on each VM to generate load.
Click on the Windows PowerShell icon in the taskbar to open a command prompt.
NOTE: If you already have one open, just switch back to that window.
Reset Lab
Type
.\StopLabVMs.ps1
.\StartMemoryTest.ps1
You can continue to the next step while the script is running, but please don't close any
windows since that stops the memory load.
Since we don't have much screen space, let's remove the two counters ID and GID. Type
BH
This screen shows memory performance counters for both virtual machines and ESXi
host processes.
You can press (capital) V again to toggle between all processes and only VM processes.
When the load on the worker VMs begins, you can see them at the top of the esxtop
window.
MCTL:
Is the balloon driver installed? If not, then it's a good idea to fix that first.
MCTLSZ:
Shows how inflated the balloon is, i.e. how much memory has been reclaimed from the
guest operating system. This should be 0.
SWCUR:
Shows how much memory the VM has swapped. This should be 0, but could be OK if SWR/S and
SWW/S are low.
SWR/S:
Shows the rate at which the host is swapping this VM's memory in from disk. This should be 0.
SWW/S:
Shows the rate at which the host is swapping this VM's memory out to disk. This should be 0.
Depending on the lab, all counters should be good. However, due to the nature of the
nested lab, it's unclear what you might see, so look around.
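The rules of thumb above can be sketched as a small triage helper (a hypothetical function and thresholds for illustration, not an esxtop feature):

```python
def memory_triage(mctl_installed, mctlsz_mb, swcur_mb, swr_s, sww_s):
    """Apply the esxtop memory rules of thumb and return a list of findings."""
    findings = []
    if not mctl_installed:
        findings.append("Balloon driver not installed: fix that first.")
    if mctlsz_mb > 0:
        findings.append("Balloon inflated: host is reclaiming guest memory.")
    if swcur_mb > 0 and (swr_s > 0 or sww_s > 0):
        findings.append("VM is actively swapping: memory is overcommitted.")
    elif swcur_mb > 0:
        findings.append("Swapped in the past, but no active swap traffic now.")
    return findings

# A healthy VM produces no findings.
print(memory_triage(True, 0, 0, 0, 0))  # []
# A VM with an inflated balloon and active swapping produces two findings.
print(memory_triage(True, 512, 256, 4, 9))
```

The distinction in the last two branches mirrors the SWCUR note above: past swapping with low SWR/S and SWW/S may be acceptable, while active swap traffic is not.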
Power on perf-worker-04a
1. Click to focus on the vSphere Web Client browser window. Right click on perf-
worker-04a
2. Select Power
3. Click Power On
Now that we have created memory contention on the ESXi host, we can see:
2. perf-worker-02a, 03a and 04a are swapping to disk, indicating too much
memory strain in this environment
1. To stop the load on the workers that appeared after you started the load script,
close the two VM Stats Collector windows.
Click on the Windows PowerShell icon in the taskbar to open a command prompt
NOTE: If you already have one open, just switch back to that window.
Reset Lab
Type
.\StopLabVMs.ps1
.\StartStorageTest.ps1
The lab takes about five minutes to prepare. Feel free to continue on to the other steps
while the script finishes.
After you start the script, be sure that you don't close any windows that appear.
Different views
When looking at storage in esxtop, you have multiple views to choose from:
• Disk Adapter (d)
• Disk Device (u)
• Disk VM (v)
and
• vSAN (x)
The StartStorageTest.ps1 script that we executed at the beginning of this lab should be
finished now, and you should have two Iometer windows on your desktop that look like
the image above.
If not, run
.\StartStorageTest.ps1
Monitor VM load
Two of them are running Iometer workloads, and the other two are iSCSI storage targets
backed by a RAM disk. Because the targets use a RAM disk for storage, they do not
generate any physical disk I/O.
CMDS/S:
This is the total number of commands per second and includes IOPS (Input/Output
Operations Per Second). It also includes other SCSI commands, such as SCSI
reservations, locks, vendor string requests, and unit attention commands, being
sent to or coming from the device or virtual machine.
In most cases, CMDS/s = IOPS unless there are many metadata operations (such as
SCSI reservations).
LAT/rd and LAT/wr:
These indicate the average response time of read and write I/O as seen by the VM.
In this case, you should see high values in CMDS/s on the worker VMs currently running
the Iometer load (perf-worker-02a and 03a). This indicates that the VMs are generating a
lot of I/O.
You can also observe a high value in LAT/wr, since the VMs are only doing writes.
The numbers may be different on your screen due to the nature of the Hands On Labs.
Press "d" to switch to the disk adapter view.
Here you can see that the storage workload is on device vmhba65, which is the software
iSCSI adapter. Look for DAVG (device latency) and KAVG (kernel latency).
1. When finished, stop the Iometer workloads by clicking the red STOP button in each
Iometer window
2. Click the red X in the top right corner to close each window
After both Iometer windows are closed, switch back to the PowerShell window and wait
for the script to clean up the environment before proceeding. Once you see this screen,
you can proceed.
Click on the Windows PowerShell icon in the taskbar to open a command prompt
NOTE: If you already have one open, just switch back to that window.
.\StartNetTest.ps1
Press Enter.
Continue with the next steps while the script runs since it takes a few minutes to load.
Since there is not a lot of screen space, let's remove the two counters PORT-ID and
DNAME. Type
AF
Monitor load
Note that the result might be different on your screen due to the load of the
environment where the Hands On Lab is running.
To force esxtop to refresh immediately, press the Space bar.
Note that the StartNetTest.ps1 script that you ran in the first step starts the VMs and
then waits for two minutes before running a network load for five minutes.
Depending on how fast you were at getting to this step, you might not see any load if it
took you more than seven minutes. You can restart the network load in the next step if
you need to.
If you want to start the network load for another five minutes, return to the PowerShell
window. Type
.\StartupNetLoad.bat
Press Enter.
The network load runs for another five minutes. While you wait, you can continue to
explore esxtop.
As described previously, the load stops by itself. When the PowerShell window says
"Network load complete", the load generation has stopped and the test is finished.
During this lab we learned how to use esxtop to monitor load in CPU, memory, storage,
network, and power views.
We have only scratched the surface of what esxtop can do. In the next module, we take
a closer look at using esxtop in your own datacenter.
Clean up procedure
To free up resources for the remaining parts of this lab, we need to shut down all used
virtual machines and reset the configuration.
Click on the Windows PowerShell icon in the taskbar to open a command prompt
NOTE: If you already have one open, just switch back to that window.
Reset Lab
.\StopLabVMs.ps1
and press Enter. This resets the lab into a base configuration. You now can move on to
another module.
Conclusion
This concludes the Introduction to esxtop module. We hope you have enjoyed taking
it. To learn more about esxtop's advanced features, such as running in batch mode and
viewing collected statistics, continue to the next module.
Module 4 - esxtop in Real-World Use Cases
Once you become familiar with esxtop and begin using it interactively or in batch mode,
you will see that it generates screens full of detailed VM and host information. When
you have many VMs on a large host, the screen display can be difficult to manage, so
esxtop lets you create one or more resource files that initialize the display to capture
a subset of the performance statistics. This file's default name is ~/.esxtop60rc. Let's
learn how to use it and trim down the number of fields to report.
If you took Module 3 - Introduction to esxtop (30 minutes) then you're already familiar
with adding and removing fields from esxtop. In this module, we filter esxtop to capture
commonly monitored performance statistics in the CPU, memory, I/O, network, and
power components.
Open PuTTY
SSH to esx-01a
Start esxtop
esxtop
If you just started esxtop, you are in the CPU view by default.
If you happen to be on a different screen, pressing "c" gets you back to this view.
Some of the columns exceed the width of the window, so hover the cursor on the right
edge of the window, click once, and stretch it horizontally to the right to expand
the window.
Now the PuTTY window is wide enough to display most or all of the available columns
(as highlighted above).
Let's filter this view (add and remove some fields) by pressing the letter "f":
Let's remove (filter out) ID and GID and add CPU POWER STATS.
Type the letters "A", "B", and "J" (NOTE: Make sure these are capitalized as these are
case sensitive!):
ABJ
You should see the * next to A: and B: disappear and the * next to J: appear. Press
Enter to resume the esxtop screen.
Note that the Power column reports 0. This is due to the nature of the Hands On Lab and
also because the host is idle.
By default, this screen shows performance counters for both virtual machines and ESXi
host processes. To view the host power screen in esxtop, type a lowercase "p":
Press the letter "f" to see available fields to add to the screen:
In the field list, press the letter shown next to %Aperf/Mperf to add the aperf-to-mperf
ratio column, then press Enter:
To filter the displayed fields, press the letter "f" and press Enter:
Let's remove GID and add Swap Statistics (SWAP STATS). Press the letters "BK" and
press Enter:
BK
To filter the displayed fields, press the letter "f" and press Enter.
To remove Path Name (PATH) and Number of Paths (NPATHS) press the letters "B" and
"C" and press Enter:
BC
To filter the displayed fields, press the letter "f" and press Enter.
BJK
To filter the displayed fields, press the letter "f" and press Enter. To add identification of
uplinks (UPLINK) press "B" and press Enter:
As you can see, you have added the previously hidden UPLINK field to see which
networks have uplinks.
To write these custom settings to a resource file we'll name .esxtopHOL, type "W
.esxtopHOL" then press Enter:
W .esxtopHOL
Note: Don't edit these resource files manually! To make changes, run esxtop and follow
the preceding steps to update the resource file.
When you invoke esxtop either interactively or through batch, esxtop looks for the
default resource file and automatically applies any filters it finds.
You can create different resource files for specific components. For example, you may
want to create a CPU-only resource file, memory-only, and so forth.
To see the differences between the default resource file .esxtop60rc and your custom
.esxtopHOL resource file:
Let's say you want to customize your view for capturing performance statistics at
different times of the day for several minutes at a time. This is especially useful when
using esxtop in batch mode. You can create several resource files and use them to filter
your initial view when invoking esxtop whether interactively or through batch.
esxtop -c .esxtopHOL
For more information on using esxtop in batch mode to capture statistics and analyze
them later, see the next section.
In the previous module we filtered the fields to display in esxtop and saved our
preferences in the esxtop resource file. Now that we're collecting only the statistics
we find interesting, we can invoke esxtop in batch mode and capture the statistics in a
Comma-Separated Values (.csv) file to share with colleagues, and graph them to look at
trends during the collection period.
As we saw in the previous module, you can invoke esxtop interactively and apply your
custom resource file settings with the -c switch.
For example, if you created a resource file for all fields under the CPU display and
named it .esxtopallcpustats, you can invoke esxtop and use the resource file to apply
your preferred filters:
esxtop -c .esxtopallcpustats
For this lab, we already created a sample resource file named .esxtopHOL. This
resource file captures only the statistics we selected in the previous module.
Let's start a workload and use esxtop in batch mode to capture only the statistics we
requested.
If you don't already have a "Windows PowerShell" window open, click on the
"Windows PowerShell" icon in the taskbar.
Type
.\StartCPUTest2.ps1
and press Enter. Depending on the load of the lab systems, this may take several
minutes.
Open PuTTY
SSH to esx-01a
Invoke esxtop in batch mode and apply our custom settings with the -b, -d, and -
n switches:
esxtop -c .esxtopHOL -b -d 2 -n 100 > /tmp/esxtop_HOLstats.csv
where:
• -b runs esxtop in batch mode
• -d 2 sets the delay between samples to two seconds
• -n 100 stops the collection after 100 samples
The above command collects 100 total samples, one every two seconds over the course of
200 seconds, and writes the statistics to a file named /tmp/esxtop_HOLstats.csv.
To look inside the esxtop output .csv file, type "more /tmp/esxtop_HOLstats.csv" and
press Enter:
more /tmp/esxtop_HOLstats.csv
As you can see, the output .csv file contains all the statistics we selected. You now can
use NMON Visualizer to graph the statistics as described in the next module. You also
can copy the .csv to a Windows system and use PERFMON to analyze the statistics you
collected.
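As a sketch of what inspecting the batch output programmatically might look like (the excerpt below is made-up sample data in the PERFMON-style layout that esxtop batch mode writes: one timestamp column, then one column per counter), you can load the .csv with Python's standard csv module:

```python
import csv
import io

# Hypothetical two-sample excerpt of an esxtop batch output file.
sample_csv = (
    '"(PDH-CSV 4.0) (UTC)(0)","\\\\esx-01a\\Physical Cpu(_Total)\\% Util Time"\n'
    '"05/20/2019 14:40:48","53.71"\n'
    '"05/20/2019 14:40:50","54.29"\n'
)

def counter_values(csv_text, counter_substring):
    """Return the float samples for the first column whose header
    contains the given substring."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    col = next(i for i, name in enumerate(header) if counter_substring in name)
    return [float(row[col]) for row in data]

samples = counter_values(sample_csv, "% Util Time")
print(samples)                      # [53.71, 54.29]
print(sum(samples) / len(samples))  # average utilization across the samples
```

The same approach scales to the real file: replace sample_csv with the contents of /tmp/esxtop_HOLstats.csv and pick any counter substring you filtered for.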
The next sections discuss examples of how to apply additional esxtop switches.
If you want to override any resource files and record all metrics, add -a:
esxtop -b -a -d 2 -n 100 > /tmp/esxtop_HOLstats.csv
The esxtop output .csv file grows quickly, so you can pipe the output into a
compressed file:
esxtop -b -d 2 -n 100 | gzip -9c > /tmp/esxtop_HOLstats.csv.gz
Conclusion
To learn more about esxtop, see these articles:
• https://communities.vmware.com/docs/DOC-11812
• http://www.yellow-bricks.com/2010/06/02/esxtop-l/
• http://www.yellow-bricks.com/2010/06/01/esxtop-running-out-of-control/
You can graph the contents of the esxtop output file to visualize vSphere performance
over the collection interval. This module discusses using NMON Visualizer, a free Java
program that graphs the contents of .csv files. You also can use Windows PERFMON to
view the results.
NMON Visualizer is a Java program, so it can run on any operating system where Java is
installed, and its user interface is the same no matter which platform you use. Let's get
familiar with it on Windows.
We'll use the esxtop batch output file we created in the previous module. First, you need
to copy the .csv output file from the ESXi host to the desktop.
You need to load the .csv file into NMON Visualizer. In the NMON Visualizer window:
1. Click on File
2. Click on Load...
Click on the gray triangle next to the host "esx-01a.corp.local" to expand the list of
collected statistics.
1. Click on the gray triangle next to "Physical Cpu" to open its folder
2. Click on the word "Total"
The graph displays total Physical CPU utilization broken down into Processor Time
and Util Time. During the test, physical CPU utilization averaged about 54%.
Let's clear the CPU statistics and look at physical disk activity.
Click on the gray triangle next to "Physical Cpu" to close its folder.
1. Click on the gray triangle next to "Physical Disk" to open its folder
2. Click on the last entry in the folder for vmhba65:vmhba65:C0:T0:L2:
We can see that disk utilization increased towards the end of the collection period.
Let's narrow down the collection and see the load for a particular time period.
1. You can see the system time interval over which esxtop captured performance
statistics.
2. You can add a custom interval to narrow down the time period.
You can see that we've narrowed the statistics to display only the records from
14:40:48 to 14:41:31. We can narrow down further to see only the physical disk
statistics we're interested in.
With NMON Visualizer you can add or remove statistics dynamically. In the box under
the graph, click on the first three check boxes next to Physical Disk Path and
deselect:
1. Command/sec
2. Reads/sec
3. Writes/sec
During this lab we learned how to customize the performance statistics we collect using
resource files, how to save the statistics into an output .csv file, and how to graph the
statistics and produce performance charts.
We have only scratched the surface of what esxtop can do. If you want to know more
about esxtop, see these articles:
Clean up procedure
To free up resources for the remaining parts of this lab, we need to shut down all used
virtual machines and reset the configuration.
Click on the Windows PowerShell icon in the taskbar to open a command prompt
NOTE: If you already have one open, just switch back to that window.
Reset Lab
.\StopLabVMs.ps1
and press Enter. This resets the lab into a base configuration. You now can move on to
another module.
Conclusion
This concludes the esxtop in Real-World Use Cases module. We hope you have
enjoyed taking it. Please remember to fill out the survey when you finish.
Module 5 - vCenter Performance Analysis (30 minutes)
Introduction
vSphere 6.7 delivers an exceptional experience with an enhanced VMware
vCenter® Server Appliance™ (vCSA). As mentioned earlier, when measuring
the performance of vCenter 6.7 versus 6.5, performance engineers saw much
higher performance (throughput) and lower latency with operations such as
powering on/off VMs.
This module will show you how to monitor the health/performance of your vCenter
Server using the vCenter Server Appliance Management Interface (VAMI),
as well as tools for detailed analysis, including vimtop, profiler, pg_top and
postgres (database) log files.
Please check that your lab has finished all the startup routines and is ready for you
to start. If you see anything other than "Ready", please wait a few minutes. If after five
minutes your lab has not changed to "Ready", please ask for assistance.
For most customers, vCenter looks like a service (vpxd) that UI and API clients make
requests to, and vCenter stores inventory information (hosts, clusters, VMs) in a
database.
Many years ago, vpxd used to be a monolithic service, and while it's still conceptually
the same, there is a lot more going on under the hood to provide improved
performance, additional features, etc.
Here is what a vCenter Server/vCSA looks like under the hood. Don't worry, we'll touch
on the most important of these as we look at debugging tools later in this module.
The VAMI was included in early versions of the vCSA, removed in vSphere 6.0, and
then reintroduced in vSphere 6.0 U1. The revamped VAMI in vCenter 6.7 is HTML-based
and has a new look and feel. In this section, we'll access the VAMI within the HOL
environment and showcase some of its performance monitoring features, along with
some guidance on what to look for if performance is not what you expect.
Open Chrome
1. In the upper-left of the Chrome window, click the HOL Admin folder.
2. Click the vcsa-01a Mgmt bookmark. This is the VAMI interface.
1. Username: root
2. Password: VMware1!
This is the Summary screen of the VAMI, which is the default when you login. Note a
couple of things:
1. This is a useful Health Status table, which shows various states of the vCenter
Server (vCSA). In this example, everything is in the "Good" (healthy) state.
2. Click Monitor to explore the various subsystems that are monitored.
1. Upon clicking Monitor, the first screen shown is CPU & Memory
2. This shows the percentages of CPU & Memory consumption.
3. By default, the time range is over the last hour, but you can change the time
range at the top right of the screen.
4. A good rule of thumb is to keep both CPU & Memory less than 70%. What if
they're higher? Here are some options:
◦ Split the inventory of the vCenter (hosts, clusters, VMs, etc.) across one or
more vCenter Servers. Using vCenter Enhanced Linked Mode allows you to
log in to any single instance of a vCSA and view/manage the inventories of
all the vCenter Server systems in the group. You can join up to 15 vCSA
deployments with vCenter Enhanced Linked Mode.
◦ For CPU > 70%, Add Virtual CPUs to the vCSA VM.
◦ Keep in mind that the CPU scale goes from 0-100% utilization and doesn't
separate out the activity by the individual vCPUs of the vCSA VM.
▪ For example, if you're showing 25% utilization and your vCSA has 4
vCPUs, this could mean that the workload is being divided evenly
between each vCPU, but it could also indicate that one vCPU is
being utilized 100% of the time.
Many services that run on the vCSA are single-threaded, so you do
need to keep this in mind. If you suspect that a single vCPU is being
heavily utilized, you can monitor the CPU activity of the vCSA on a
per-CPU basis from the vSphere client or by using vimtop (which
we'll learn about later).
◦ For Memory > 70%, Change the Memory Configuration of the vCSA VM.
◦ Consider setting a memory reservation for the vCSA VM. For more
information, see Allocate Memory Resources.
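The averaging caveat described above can be sketched in a few lines (hypothetical per-vCPU samples for illustration):

```python
def average_utilization(per_vcpu):
    """The VAMI CPU chart reports one number averaged across all vCPUs."""
    return sum(per_vcpu) / len(per_vcpu)

# Both of these 4-vCPU vCSAs report 25% in the averaged chart...
balanced = [25, 25, 25, 25]
hot_core = [100, 0, 0, 0]   # ...but here one single-threaded service is pegged.
print(average_utilization(balanced))  # 25.0
print(average_utilization(hot_core))  # 25.0
print(max(hot_core))                  # the per-CPU view reveals the 100% vCPU
```

This is why the per-CPU view in the vSphere client or vimtop matters: the averaged chart alone cannot distinguish the two cases.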
1. You are now on the Disks section of the Monitor screens. The Disks screen
shows all of the virtual hard disks the vCSA is using, the purpose of the partition,
and how much disk space is being consumed.
2. The DB, DBLog, and SEAT (Stats/Events/Alarms/Tasks) partitions are write-
intensive, so placing this data on SSDs (solid state drives) is preferred to achieve
optimal performance.
3. Let's move on to the next screen. Click Network.
1. The Network screen shows a variety of network statistics, including transmit (tx)
and receive (rx) throughput (KB/sec), for both loopback and eth0. Unlike CPU &
Memory, you'll need to click through the list of these counters to get an accurate
portrayal of the network activity of the vCSA. Although these counters should be
monitored, networking is usually not an issue with the vCSA.
2. The important thing to check is that you don't see any errors (as shown here, the
value is 0) for eth0 tx/rx errors detected as well as packets dropped. If
greater than zero, you should look into whether there are networking
infrastructure problems in your environment.
3. Let's move on to the next and final screen. Click Database.
1. The Database monitoring tab is arguably the most important, as the information
that it provides is not easily obtained by any other means. The vCSA uses a
PostgreSQL database to store persistent information for the vCSA.
2. The Database page is divided into two charts: Seat space and Overall space
utilization trends. Use Alarms to avoid running out of disk space.
3. The Seat section displays stats, events, alarms, and tasks. These
different categories can be displayed as graph lines by clicking on their names
below the Seat graph.
The total Seat utilization is shown in the bottom graph, as well as the DB log and
core utilization; these graph lines can also be removed from the graph by clicking
on the associated name below the graph. If any of these sections start to fill up,
the reason for this anomaly should be investigated and appropriate actions taken
to ensure that the vCSA database performs as expected.
1. We just covered all of the performance monitoring features of the Monitor tab.
2. While unrelated to performance, you should back up your vCSA on a regular
basis, especially before you perform a major operation on your vCSA such as
updating it. The VAMI tool includes a powerful backup tool (the Backup tab
highlighted) that lets you back up the data on your vCSA either on demand or on
a set schedule. This tool is unique in that, in order to be as space efficient as
possible, it only backs up the data on the vCSA and not the entire vCSA. To
restore the vCSA, you reinstall the vCSA and then restore the backed up data on
it. The restore process can be initiated from the vCSA installation ISO.
3. One of the most critical tasks you can perform to make sure that your vCSA is
safe, secure, reliable, and performant is keeping it updated, and the VAMI has a
feature included that makes the upgrade process as painless as possible: the
Update tab. Let's take a look at what this screen looks like.
1. When you click the Update tab, a screen appears with current version details.
2. In the upper right-hand corner of the screen is the Check Updates button, which
downloads a list of the latest patches and updates for your vCSA from VMware.
3. Once the list is downloaded, you can click on a patch to review important
information about it, including its criticality, the size of the download, and
whether it will require a reboot of your vCSA.
4. To install the patch or upgrade, you can select Stage Only or Stage and Install.
If you select Stage Only, it only downloads the patch and then later you'll have
the option to install it when you see fit.
Since this is a lab environment, it is not feasible to upgrade the vCSA, as this is a
resource- and time-intensive process.
For your environment, however, VAMI Monitoring, Backups and Updates will ensure your
vCSA is running as optimally as possible.
Conclusion
The vCSA has become the de-facto standard in most datacenters for managing a
vSphere environment. For your vSphere environment to run most efficiently, you need
to ensure that the processes running on your vCSA have the resources that they need;
by using the VAMI, you can monitor the performance of the vCSA and detect
abnormalities. You can also use the VAMI to back up and update your vCSA to ensure
that it's patched to the most recent version so that, in the case of a catastrophic event,
you can recover easily and efficiently.
Credits to Tom Fenton and Ravi Soundararajan for much of this VAMI content. For more
information on how to use the VAMI, see Tom's great blog article:
https://virtualizationreview.com/articles/2018/09/10/how-to-use-vami.aspx
Open PuTTY
Open vimtop
vimtop
Here is an example screenshot of vimtop running within the lab environment. If you're
familiar with top (the Linux performance monitoring tool) or esxtop (the equivalent for
ESXi), you'll notice vimtop has a similar look and feel. The default vimtop screen
provides you with an overview and task pane. The overview pane quantifies the CPU
and memory resources that your vCSA is currently consuming (the top half of the
screen); the task pane (bottom half) shows you the processes that are consuming the
most CPU resources. The CPU activity should never total more than 70% for your vCSA.
By default, vimtop refreshes its data every second. To pause this automatic refresh,
press "p"; alternatively, to set a lower refresh rate, press "s" and then enter the number
of seconds between screen refreshes.
To see the help menu, press "h." The help menu will explain how to add, remove and
reorder columns from vimtop. To quit vimtop, press "q".
This is what vimtop looks like during a "churn" benchmark, which basically consists of
creating a VM, powering it on, running for a while, powering it off, and then deleting it.
• vCenter Server (vpxd) is consuming 51.43% CPU, which is over 1/2 of 1 core
• vCenter Server is consuming the highest %CPU/%MEM
• vPostgres is the next biggest consumer (since the vCSA must persist its data to the
database). However, unlike vCenter Server, it is multi-processed, and its worker
processes are consuming high %CPU/%MEM as well
With their benchmark vcbench, VMware performance engineers measured the number
of operations per second (throughput) that vCenter produced.
This benchmark stresses the vCenter server by performing typical vCenter operations,
like powering a VM on and off, among several others. vCenter 6.7 performs 16.7 operations
per second, which is a twofold increase over the 8.3 operations per second vCenter
6.5 produced.
This is what vimtop looks like during a tagging benchmark (which performs/simulates
advanced API calls, such as PowerCLI Get-Tag). Behind the scenes, tagging goes
through a proxy (the endpoint), through the data service, to vpxd-svcs (aka
vCenter Services, which hosts the tagging service).
This screen shows a couple of processes; here are some additional ones that may pop
up:
The vCenter UI runs as a Java process within the vCSA, and as such, if the CPU
utilization is consistently high, i.e. 100% (as shown here; note that this is not 100%
across all vCPUs, just of one core), for a prolonged period of time, it may be invoking
garbage collection too often. This is an indicator that it may not have enough memory.
Let's look at a command that will show you how to increase the memory size.
Assuming you still have the PuTTY session open to vcsa-01a, type this command:
cloudvm-ram-size -l vsphere-client
This will show you the memory allocated to the vsphere-client process in your particular
environment (853MB in this example; this will be different in your environment).
You can increase this by using this command:
cloudvm-ram-size -C 1000 vsphere-client
where 1000 is the value in MB that you want to increase the service's memory to.
Note that the preferred method would be to shut down your vCSA and assign that
VM more virtual memory, which should auto-scale all the processes, including
vsphere-client, but that does involve some downtime.
Conclusion
vimtop is a powerful tool that shows you, in real time, resource issues that may be
adversely affecting the performance of your vCSA.
For more information on vimtop, please visit these excellent resources online:
• https://virtualizationreview.com/articles/2018/04/03/ow-to-monitor-a-vcsa-using-
vimtop.aspx
• https://virtualizationreview.com/articles/2018/09/19/vami-and-vimtop-vcsa.aspx
• https://virtualizationreview.com/Forms/Search-
Results.aspx?query=vimtop%20fenton&collection=VTR_Web
Open PuTTY
Let's find where the vpxd profiler logs are in the lab environment. If you don't already
have a PuTTY session open to vcsa-01a, click on the PuTTY icon on the taskbar.
SSH to vcsa-01a
1. To find the vpxd profiler log files, execute these commands in the PuTTY window:
cd /var/log/vmware/vpxd
ls -l vpxd-profiler*
2. Note that vpxd-profiler.log is a symbolic link to the most recent log file, while the
older profiler logs are compressed (gzipped).
3. Let's look at the file format of this log file. Run this command:
less vpxd-profiler.log
vpxd-profiler.log example
1. Timestamp
2. Key-Value pairs (i.e. a vCSA setting, and the value the setting was set to)
This is a large file, with a lot of counters, so what are some useful ones? We'll look at
some next.
Here are a few counters that may be useful while troubleshooting vCSA performance:
Press "q" when you are done reviewing the vpxd profiler log file.
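As an aside (not a lab step), the compressed older profiler logs can be searched without unpacking them, since zgrep reads .gz files directly. The sketch below uses a throwaway file under /tmp rather than the lab's real logs, and the counter name in it is only illustrative:

```shell
# Create a throwaway "rotated profiler log" and compress it, so we have
# something safe to search (do not touch the real logs in /var/log/vmware/vpxd).
mkdir -p /tmp/vpxd-demo
echo '--> /SessionStats/SessionPool/Id/total 5' > /tmp/vpxd-demo/vpxd-profiler-1.log
gzip -f /tmp/vpxd-demo/vpxd-profiler-1.log

# zgrep searches the compressed file in place; no gunzip needed.
zgrep 'SessionStats' /tmp/vpxd-demo/vpxd-profiler-1.log.gz
```

On a real vCSA, the same zgrep pattern applied to the rotated vpxd-profiler-*.log.gz files lets you check how a counter's value has changed over time.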
Open PuTTY
Let's look at the Postgres logs and the pg_top command in the lab environment.
SSH to vcsa-01a
To list the Postgres log files, run these commands in the PuTTY window:
cd /var/log/vmware/vpostgres/
ls -l postgresql-*
Note that each numbered log file is for a different day of the month; for example,
postgresql-01.log above would contain the database log entries from June 1.
Let's search for log entries with the string 'duration' to see which SQL queries took
longer than one second (1,000 ms):
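The exact command from the lab screenshot is not reproduced here, but a plain grep is one way to do this kind of search. The sketch below runs against a synthetic log file in /tmp (both log lines are invented for illustration), so it is safe to study without touching the lab's real logs:

```shell
# Build a tiny synthetic Postgres log: one slow-query entry and one
# unrelated entry (both lines are made up for this example).
mkdir -p /tmp/pg-demo
printf '%s\n' \
  'LOG:  duration: 1528.221 ms  statement: SELECT count(*) FROM vpx_event' \
  'LOG:  connection received' > /tmp/pg-demo/postgresql-01.log

# Pull out only the entries that report a query duration.
grep 'duration' /tmp/pg-demo/postgresql-01.log
```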
For stats and events tables, these durations are OK. For other tables (core tables: host
tables, VM tables, network tables), if you notice SQL queries consistently taking an
abnormally long time (multiple seconds), that could indicate a performance issue with
your database.
How do we look at database performance once we suspect there's an issue? We'll look
at pg_top next, a tool to do just that.
Running pg_top
cd /opt/vmware/vpostgres/current/bin/
./pg_top -U postgres -d VCDB
If you're familiar with top (the Linux performance monitoring tool) or esxtop (the
equivalent for ESXi), you'll notice pg_top has a similar look and feel. The default pg_top
screen provides you with an overview and task pane. The overview pane quantifies
the CPU and memory resources that your PostgreSQL database (VCDB) is currently
consuming (the top half of the screen); the task pane (bottom half) shows you the
processes that are consuming the most CPU resources. The CPU activity should never
total more than 70%.
By default, pg_top refreshes its data every second. To pause this automatic refresh,
press "p"; alternatively, to set a lower refresh rate, press "s" and then enter the number
of seconds between screen refreshes.
To see the help menu, press "h." The help menu explains how to add, remove, and
reorder columns from pg_top. To quit, press "q".
Here is what pg_top looks like; as you can see, much like top, esxtop, or vimtop, it
shows you real-time CPU and memory process usage, but only for the PostgreSQL
database (VCDB).
There are many single-character commands available from this screen. Press "?" to
see a list of them.
Here is a list of pg_top commands. Note that since this is a database-specific top, we can
use the "Q" command to show the query of a currently running process, which can be
useful to see what table a SQL query is accessing.
Press the Space Bar a couple of times to return to the main pg_top screen.
Here is another screenshot of pg_top, but while the PostgreSQL database was running a
CPU-intensive query. Here are some things to note:
Since we are not running a benchmark in the lab environment, the next screen will show
you what the output would be upon typing "Q" and then the PID (3063).
Here is the query behind the CPU-intensive process (PID 3063). The "SELECT
sc.stat_id" confirms that the SELECT SQL command was on the stats table.
Your environment (queries, tables) may be different; just be mindful that long-running
queries may be scanning all partitions.
• If your vCenter Server has a large inventory and/or has been running for a long
time, it may have a lot of old data (tasks, events, statistics). Managing this old
data (for example, with vCenter's task/event retention and statistics-level
settings) helps keep the database lean.
Clients: UI
Here are some ways to ensure the vCenter user interface (UI) performance is optimal.
This is an example of some PowerCLI code that was taken from the VMware Community
Forums: https://communities.vmware.com/thread/499845
While it gets the job done, internal performance testing with 20 hosts and 300 VMs
showed that this code ran for 80 seconds. Let's see how this code could be optimized,
and how much faster it could run.
Note that this PowerCLI code does the same thing, but it makes far fewer API calls to
vCenter; namely, the highlighted Get-VM and Get-VMHost calls are executed only once,
outside of the ForEach loop. Minimizing unnecessary/repeated PowerCLI calls is
key to obtaining better client API performance.
By doing this, the runtime for the script was reduced from 80 seconds to 7.5 seconds (a
10x speedup).
To free up resources for the remaining parts of this lab, we need to shut down all used
virtual machines and reset the configuration.
Click on the Windows PowerShell icon in the taskbar to open a command prompt
NOTE: If you already have one open, just switch back to that window.
Reset Lab
.\StopLabVMs.ps1
and press Enter. This resets the lab into a base configuration. You now can move on to
another module.
Conclusion
This concludes the vCenter Performance Analysis module. We hope you have
enjoyed taking it. Please remember to fill out the survey when you finish.
Module 6 - Database
Performance Testing with
DVD Store (30 minutes)
Introduction
This Module introduces DVD Store 3, also known as DS3 for short. It simulates
an online store that allows customers to logon, search for DVDs, read customer
reviews, rate helpfulness of reviews, and purchase DVDs.
• Simulates an online store that allows customers to login, search, read customer
reviews, rate helpfulness of reviews, and purchase DVDs
• Open-source: https://github.com/dvdstore/ds3
• Incorporated as the e-commerce simulation workload of the VMmark 3.0
benchmark: https://www.vmware.com/products/vmmark.html
• OLTP workload (similar to TPC); performance is measured in Orders Per Minute
(OPM)
• Supports Oracle, SQL Server, and MySQL
• Utilizes many database features, including stored procedures, transactions,
triggers, foreign keys, and full-text indexes
• Latest version includes customer reviews with intelligent review rankings
• Workload can be run at varying load levels to determine the highest performing
test configuration
• Note: while DS3 builds on its predecessor (DS2), its new and more complex
queries mean that results are not comparable to previous releases
DVD Store 3 supports three standard sizes: small, medium, and large. In addition to
these standard sizes, any custom size can be specified during the DVD Store setup. The
number of rows in the various tables that make up the DVD Store 3 database is what is
varied to achieve the specified size.
The table below shows the number of rows for the standard sizes for the Customers,
Orders, and Products tables as examples:
NOTE #1: The LAMP stack is only one of the supported environments for DVD
Store 3. The benchmark supports a variety of databases: Microsoft SQL Server,
Oracle, MySQL, and PostgreSQL.
NOTE #2: This VM and database have already been created; this is informational,
in case you'd like to set it up for testing in your own environment.
Creating the database is resource intensive, in terms of both time and storage, so
it is not available for the hands-on lab environment.
Create a Linux VM
This screenshot shows that, in our lab environment, DVD Store 3 is installed in a CentOS
Linux VM with 1 vCPU, 1 GB of memory, and a 10 GB hard disk.
You may notice that these are lower minimum system requirements than the
Weathervane module. There are a couple of reasons for this:
1. We are only exercising a couple of applications in this VM (namely MySQL for the
database tier and Apache HTTP Server for the web server tier).
2. This VM has been built with a small database size. From the previous lesson, we
learned that DVD Store 3 comes in 3 sizes: small (10 MB), medium (1 GB), and
large (100 GB). For building a medium or large database, you should scale up the
CPU, memory, and disk size appropriately.
OS Installation/Post-Install Tasks
DVD Store should work on any modern Linux distribution. This VM was installed with
CentOS 6.8.
After the OS installation, some tasks should be run as the root user, prior to installing
DS3:
NOTE: these have already been done in our lab environment; do not run these
commands now!
DVD Store 3 is an open-source project that is actively developed and maintained. The
latest version can be downloaded from GitHub, as shown here, from https://github.com/
dvdstore/ds3/
To extract DS3, log in as root to your CentOS host, and unzip it with the command unzip
ds3-master.zip
NOTE: This has already been done in our hands-on lab environment, so do not run this
command in the lab VM.
Finally, we need to copy the PHP Web pages to the correct place on the host (again, this
has already been done in our lab, no need to run):
mkdir /var/www/html/ds3
cp /root/home/ds3/mysqlds3/web/php5/* /var/www/html/ds3
service httpd restart
We run the configuration script to generate the necessary SQL commands, but due to
time and resource constraints, we do not run the actual build. Our lab environment
already has a pre-built database ready to run.
Double click the Performance Lab MS shortcut on the Main Console desktop.
Start Module 6
Click on the Module 6 Start button (highlighted) to run a PowerShell script that starts
the DVD Store 3 VM and opens a PuTTY session to it.
Once the module starts, a PuTTY window and a popup box appear indicating that it has
started (as shown here). Click OK.
Remember earlier when we learned DS3 has three "canned" database sizes (small,
medium, and large)? Well, we can also specify a custom database size to build. Here's
how:
(Press Enter after each command/value)
1. Change to the DS3 directory. In this VM, it's been installed to /root/ds3 and you're
already in the /root folder so type:
cd ds3
2. Run the Install_DVDStore Perl script:
perl Install_DVDStore.pl
3. We are now asked how big we want our DS3 database to be. Let's build a 100 MB
MySQL database:
100
4. When asked if the database size is in MB or GB, specify MB:
MB
5. Since DS3 supports multiple databases, we need to specify MYSQL:
MYSQL
6. Finally, DS3 needs to know if the database server will be on a Windows or Linux
machine; this determines whether the input files will have CR/LF (DOS format).
Choose LINUX:
LINUX
• Calculate the number of rows for the customers, orders, and products tables
according to the database size
• Generate the .CSV (Comma-Separated Values) files for each table in the
appropriate folder
• Create the SQL database scripts to build and clean up the database
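As an aside, interactive prompt sequences like the one above can also be answered non-interactively by piping the responses in, in order. Do not run this against the real Install_DVDStore.pl in the lab; the sketch below uses a throwaway stub script that simply echoes back what it reads:

```shell
# Throwaway stub standing in for an interactive installer's prompt sequence
# (hypothetical; it just reads four answers and reports them).
cat > /tmp/stub_install.sh <<'EOF'
read size; read unit; read db; read os
echo "building ${size}${unit} ${db} database for ${os}"
EOF

# Pipe the four answers (size, MB/GB, database type, OS) in order.
printf '100\nMB\nMYSQL\nLINUX\n' | sh /tmp/stub_install.sh
# prints: building 100MB MYSQL database for LINUX
```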
This is how the script looks upon completion. Look for the message highlighted:
Completed creating and writing build scripts for MySQL database...
Now that all the MySQL scripts have been generated, the database would normally be
built at this point. The reason that the scripts are generated instead of just doing the
database creation directly is that it allows for the database to be easily recreated later,
or even modified if needed, to address specific testing requirements of individual
environments.
The database build is accomplished by the following commands. NOTE: Do not run
these commands in the lab environment, for a couple of reasons: the database
build takes a long time, and we have already saved you the trouble (a database has
been built and is ready to run).
Now that we've seen how to build a DS3 database, let's start an actual run!
To view the performance of the DVD Store VM, type the command top and press Enter.
This shows us how much CPU and memory are consumed along with which processes
are taxing the VM the most.
Next, we kick off the DS3 driver from our Windows machine.
On the Main Console (Windows desktop), double-click the DVD Store 3 Driver icon
shown here (note: you may need to minimize some windows in order to see it).
Monitor the driver and the PuTTY windows during the run
While the run is progressing, you should watch both the PuTTY console running top
(shown here on the top) and the DS3 driver window (shown here on the bottom).
Let's make some observations about this screenshot (note: due to the variability of the
cloud, your performance may vary):
1. The CPU utilization line in top shows us that 34.2% is consumed in user space
(application), 9.3% in system (kernel), for a total of 43.5% CPU. There is zero
idle time, however; the rest of the CPU (55.5%) is waiting for I/O -- meaning we
likely have a disk or network bottleneck in our environment.
2. The process that is consuming the 43.5% CPU utilization we saw is mysqld (the
MySQL database) -- which makes sense, since we're hammering it with a
database benchmark!
3. These are normal DS3 driver startup messages, indicating the various threads
that are connecting to the database server before the actual run begins
4. Approximately every ten seconds, you will see a performance summary output to
the screen (notice et , elapsed time, goes up by ten each line).
5. There are many statistics on each line (many of them dealing with rt, which is
short for response time), but we're most interested in the primary DVD Store
throughput performance metric, known as opm, or orders per minute. Here we
can see we're only achieving about 40 opm on average, which is very low. You
would achieve much higher opm numbers in an optimized testbed.
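If you capture the driver's output to a file, the per-interval opm values can be averaged with a one-liner. The sketch below assumes the driver's key=value line style (et=, opm=, rt_tot_avg=) and uses two invented sample lines:

```shell
# Two invented driver-output lines in the et=/opm= key=value style.
cat > /tmp/ds3-run.log <<'EOF'
et=  10.0 n_overall= 7 opm=42 rt_tot_avg=120
et=  20.0 n_overall=14 opm=38 rt_tot_avg=131
EOF

# Extract each opm= value and print the average across all lines.
awk 'match($0, /opm=[0-9]+/) { sum += substr($0, RSTART+4, RLENGTH-4); n++ }
     END { printf "avg_opm=%d\n", sum/n }' /tmp/ds3-run.log
# prints: avg_opm=40
```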
Here's the command we used on the Windows machine to start the driver, in case
you're curious:
You can see a list showing each Parameter Name, Description, and Default Value.
You can also create a configuration file and pass that on the command line instead of
manually setting each parameter.
Performance metric (abbreviation)   Definition                        Value
opm                                 Orders Per Minute (throughput)    Higher = better
rt                                  Response Time (latency)           Lower = better
We will look at a couple of results that we'll call "bad" and "good".
• A "bad" or low-performing configuration has low opm (orders per minute) and
high response time (rt).
• Conversely, a high-performing configuration has high opm (orders per minute)
and low response time (rt).
Let's compare this to a high-performing run that was done in an isolated dedicated lab
environment.
1. The summary line that starts with Final shows the overall performance
statistics.
et= 609.4 tells us that this was a 10-minute run (~600 seconds).
2. opm=74932 indicates this database server was able to process 74,932 orders
per minute. This is much higher than the previous example, as it is a highly
tuned performance configuration.
3. rt_tot_avg=87 tells us that the average response time was only 87
milliseconds. Again, this low value is in stark contrast to the previous example.
So what factors determine whether a database server can sustain high load, and thus
achieve the maximum opm?
• Follow the Performance Best Practices for VMware vSphere 6.5. This guide
covers hardware (processors, storage, network), the ESXi/vSphere hypervisor,
and virtual machine (guest operating system) performance tuning.
• Follow the best practices for your particular database server. Here are some
good examples:
◦ SQL Server: Architecting Microsoft SQL Server on VMware vSphere Best
Practices Guide
◦ Oracle: Oracle Databases on VMware Best Practices Guide
◦ MySQL: MySQL Performance Tuning and Optimization Resources
• Check out some recent whitepapers from the VMware performance team that
used DVD Store 3 on vSphere 6.5:
◦ SQL Server VM Performance with VMware vSphere 6.5
◦ Oracle Database Performance on vSphere 6.5 Monster Virtual Machines
By following these guides and testing the performance of your particular environment
prior to production deployment, you can ensure your virtualized databases achieve
maximum throughput.
You've also learned how to tune your database server to achieve the
maximum orders per minute (opm), so your database throughput will be
as high as possible with the lowest response times.
Stop Module 6
1. Click on the Module Switcher in the taskbar (or the desktop icon if you closed
it)
2. Click the Stop button for Module 6.
Resources/Helpful Links
For more information about DVD Store 3, and database performance in general, here are
some helpful links:
Best Practices:
• https://blogs.vmware.com/performance/2017/06/introducing-vmmark3.html
• https://www.vmware.com/products/vmmark.html
Module 7 - Application
Performance Testing with
Weathervane (45
minutes)
Introduction
• What is Weathervane?
• Installing Weathervane
• Configuring Weathervane
• Running/Tuning Weathervane
• Resources/Conclusion
What is Weathervane?
This lesson will describe what the Weathervane benchmark is, and how it is
different from traditional benchmark workloads.
Weathervane Description
Weathervane Components
Weathervane consists of three main components (if the picture above seems daunting,
do not fear: this lab has all three components running inside one Linux VM!). It is
possible to run every Weathervane service in one VM or container, but it is also possible
to run only specific service tiers, or even only specific service instances.
1. The Workload driver that can drive a realistic and repeatable load against the
application
2. The Run Harness that automates the process of executing runs and collecting
results and relevant performance data
3. The Auction Application itself is a web application for hosting real-time
auctions.
We will take a look at each of these components in more detail then run Weathervane in
our lab environment.
Workload Driver
Run Harness
The Weathervane run harness is controlled by a configuration file that describes the
deployment, including:
Later in this module, we will start an actual run in the lab environment using the
harness to see how easy it is -- it is literally just one command!
Auction Application
The Auction Application, as we can tell from the picture above, is the most complex
portion of Weathervane.
It is a web app that simulates hosting real-time auctions. It uses an architecture that
allows deployments to be easily scaled to, and sized for, a large range of user loads. A
deployment of the application involves a wide variety of support services, such as
caching, messaging, data store, and relational database tiers. Many of the
services are optional, and some support multiple provider implementations.
A default Weathervane deployment like the VM in this lab uses the following
applications (click the links for more information about the applications). All are set up
"out of the box" (ready to run) via the automatic setup script that comes with the
benchmark:
Downloading/Installing Weathervane
This lesson describes how to install the Weathervane benchmark. It is very easy
to set up, as most of it is automated.
NOTE: Weathervane has already been installed in our hands-on lab environment,
so this lesson is purely informational (for example, if you want to learn how easy it
is to install Weathervane in your own environment). In the next lesson, we will
configure and run Weathervane in the lab environment.
Create a Weathervane VM
As shown in the screenshot, the virtual hardware must have at least 2 CPUs, 8 GB of
memory, and at least 20 GB of disk space (we used 30 GB in this example). For
larger deployments, the hardware can be scaled up appropriately (see the Weathervane
documentation for more details).
Install CentOS 7
The CentOS 7 installation may be a Minimal Install (the default, as shown) or a full
desktop install.
In fact, you may want to create one Weathervane host with a full desktop install for
running the harness, and a second with a Minimal Install for cloning to VMs for running
the various Weathervane services.
After completing the OS installation, some tasks should be done prior to installing
Weathervane:
1. Update all software packages by running the command yum update as the root
user.
2. Install VMware Tools (for CentOS 7, open-vm-tools) by running the command yum
install -y open-vm-tools as the root user.
3. Install Java by running the command yum install -y java-1.8.0-openjdk* as the root
user.
4. Install Perl by running the command yum install -y perl as the root user.
NOTE: These commands will not work in the lab environment, but these tasks have
already been performed in our VM.
A release tarball is a snapshot of the repository at a known good point in time. Releases
are typically more heavily tested than the latest check-in on the master branch.
To install Weathervane, log in as root to your CentOS host and unpack the tarball with
the command tar zxf weathervane-1.0.14.tar.gz
NOTE: This has already been done in our hands-on lab environment, so do not run this
command in the lab VM.
To build the Weathervane executables, after unpacking the tarball in the previous step,
go into the /root/weathervane directory and issue the command:
./gradlew clean release
NOTE: This has already been done in our hands-on lab environment, so do not run this
command in the lab VM.
The first time you build Weathervane, this downloads a large number of dependencies.
Wait until the build completes before proceeding to the next step.
The auto-setup script configures the VM to run all of the Weathervane components.
NOTE: the VM must be connected to the Internet in order for this process to succeed.
From the Weathervane directory, run the script using the command:
./autoSetup.pl
NOTE: This has already been done in our hands-on lab environment, so do not run this
command in the lab VM.
The auto-setup script may take an hour or longer to run depending on the speed of your
internet connection and the capabilities of the host hardware.
Once it has completed, the VM must be rebooted. Weathervane is now ready to run!
Configuring Weathervane
This lesson describes how to start the lab and configure the Weathervane benchmark on
our lab environment deployment.
Double click the Performance Lab MS shortcut on the Main Console desktop, or switch
to that window on the taskbar.
Start Module 7
Click on the Module 7 Start button (highlighted) to run a PowerShell script that starts
the Weathervane VM and opens two PuTTY sessions to it.
Once the module starts, you see two PuTTY windows side-by-side and a popup window
(as shown here). Click OK.
Configuring Weathervane
We should look at the Weathervane configuration file to see how configurable this
benchmark is.
In the PuTTY window on the left, type this command and press Enter:
less weathervane.config
We can now use the standard navigation keys (Up/Down arrows, Page Up/Down) to
see the various parameters to customize.
We are now looking at the beginning of the Weathervane configuration file. As standard
with most configuration files, lines that start with "#" are commented out and thus
ignored by Weathervane.
Highlighted here is one of the most useful parameters (which is why it is at the top!):
users. As the comments state, this determines how many simulated users are active
during a Weathervane benchmark run. This has already been reduced to the minimum
value of 60 due to the constraints of our lab environment, but the default is 300 as we
will see next.
In the right-hand PuTTY window, type the following command and press Enter:
./weathervane.pl --help | less
(Note: the character before less is the pipe symbol, typically typed by holding down
Shift and pressing the backslash \ key. You can also select this text and drag-and-drop it
directly into the PuTTY window -- try it!)
The --help command we just ran lists all the Weathervane command-line parameters. If
any of these parameters are set on the command line, it will override both the
Parameter Default and even the value set in the weathervane.config file we just looked
at.
As shown in this screenshot, the users parameter defaults to a value of 300, but we
have set it to the minimum value of 60 in the weathervane.config. If we wanted to try a
Weathervane run of 100 users, we could override it on the command line, i.e.
./weathervane.pl --users=100.
In both PuTTY windows, press the Page Down key to scroll down to the next page, and
you should see a screen similar to this. As the help text explains, Weathervane has
three run length parameters: rampUp, steadyState, and rampDown. To make it easier,
you can set all three parameters by changing runLength to short, medium, or long.
In the interest of time (and to not tax our lab environment for any longer than it needs
to be!), we have set the values to 30, 60, and 0 in our configuration file. In an actual
benchmark environment, we would want to set runLength to medium or long to gauge
performance over a longer period of time.
At this point, feel free to use the arrow keys and the Page Up/Page Down keys to
look at all of the parameters Weathervane supports. As you can see, it is very
configurable!
Now that we have looked at the Weathervane configuration file and the help text, left-
click in each PuTTY window and press q to "quit" less and return to the bash shell.
You should see a screen similar to this.
Running/Tuning Weathervane
This lesson describes how to run and tune the Weathervane benchmark using the VM
deployed in the lab environment.
Running Weathervane
Now that we have learned how to configure Weathervane, we can start a test run! This
is actually the easiest part, since the run harness automates starting the necessary
services, gathering performance statistics, and stopping the benchmark once the run
lengths we specified have elapsed.
Click in the left-hand PuTTY window, and start the Weathervane benchmark harness by
running one simple command (and press Enter):
./weathervane.pl
In the right-hand PuTTY window, the processes consuming CPU, memory, etc. in real-
time can be monitored while Weathervane is running by running the Linux top
command (press Enter afterwards):
top
• You can view the progress of the run on the left. Be patient, as it takes a few
minutes to start all of the services.
• You can view the processes for the various Weathervane services on the right.
1. This shows the CPU utilization of the two virtual CPUs (vCPUs); these values will
fluctuate throughout the run. In this screenshot, they are both heavily utilized
(95-96%), which is expected for this benchmark.
2. This shows the memory utilization of the VM.
The top line ( KiB Mem ) shows us that most of the 8 GB we have allocated to the
VM is used , with very little free ; again, this is expected, as there are many
services/processes running and consuming RAM.
Conversely, the next line ( KiB Swap ) shows that while we have ~3 GB of swap
space, most of it is free , and very little used ; this is a Good Thing, as Linux is not
having to swap memory to disk (which is likely what would happen if we gave the
VM too little memory, e.g., only 4 GB)
3. The bottom part of the top output shows the running processes, sorted by
highest CPU utilization ( %CPU ) first. At a quick glance, we can see that java
(Tomcat), mongod (MongoDB), and postgres (PostgreSQL) are the heavy hitters.
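As an aside (not a lab step), the totals behind top's KiB Mem and KiB Swap lines come from the kernel's /proc/meminfo, so they can also be read non-interactively, e.g. for logging during a run:

```shell
# Print the total physical memory and swap, in KiB, straight from
# /proc/meminfo (the values reflect whatever host runs this, not the lab VM).
awk '/^MemTotal:|^SwapTotal:/ { print $1, $2, "KiB" }' /proc/meminfo
```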
This benchmark run takes some time to complete (~15 minutes from start to finish).
While we wait, we can browse through the Weathervane documentation to see how we
can improve performance.
The Weathervane User's Guide comes as a PDF with the benchmark that shows how to
install, configure, and tune Weathervane. It also has a handy section on Tuning
Parameters.
We will not make you read this 99-page document from beginning to end :-) In any
case, we have already touched on a lot of what this guide covers in terms of installation
and configuration.
Therefore, scroll down to page 56 (shown here), which has a section on Component
Tuning. Skim through the next few pages to get a feel for the parameters you can
experiment with to tune the various tiers inside Weathervane:
• Web Server (Apache Httpd, Nginx Tuning Parameters): The run harness provides
a number of parameters related to tuning the web server, and it can manage the
tuning of these parameters automatically.
• Database Server (MySQL, PostgreSQL Tuning Parameters): The run harness
allows for automatic tuning or manually specifying values such as buffer sizes.
• MongoDB: The run harness allows for disabling/enabling transparent huge
pages.
• File Server (optional): If you choose to use a NFS file server instead of MongoDB
for the image store, you can adjust the processes and read/write buffer sizes.
Periodically switch back to the PuTTY windows to check on the progress of the run.
When the Weathervane benchmark run has finished, you will see screens similar to this
one. Specifically:
1. On the left, you will see messages about Cleaning and compacting storage , and
whether the run Passed or Failed .
NOTE: It is OK if it says failed and/or a message such as Failed Response-Time
metric . In our shared lab environment, the response times likely won't meet the
benchmark requirements. This would not be an issue in a dedicated test/dev
environment.
2. Take specific note of the run number at the end (in this example, it is Run
8). We use that number in the next step when we look at the output files.
3. On the right, note the top screen will indicate the Linux VM is now essentially
idle ( %Cpu less than 1%, and most of the memory is free ).
4. Once you have confirmed the run is over, close the PuTTY window on the right
by clicking the "x" in the upper-right (click OK when PuTTY asks you to
confirm).
5. Maximize the remaining PuTTY window on the left by clicking the maximize
button in the upper-right, as shown.
After running the benchmark, you can look at the various log files the Weathervane run
harness collects:
6. cat console.log (not shown; this is just a record of what you already saw output
to the PuTTY console, i.e. the starting/stopping of services, whether the run
passed or failed, and cleanup)
Once you are done looking at these files, you can close this PuTTY console.
If a run passes, this means that the application deployment and the underlying
infrastructure can support the load driven by the given number of users with
acceptable response-times for the users' operations.
To end this module, open the Module Switcher window and click the Stop button for
Module 7.
Resources/Helpful Links
For more information about Weathervane, here are some helpful links:
Module 8 - Processor
Performance Monitoring,
Host Power Management
(30 minutes)
Performance problems may occur when there are insufficient CPU resources to
satisfy demand. Excessive demand for CPU resources on a vSphere host may
occur for many reasons. In some cases, the cause is straightforward. Populating a
vSphere host with too many virtual machines running compute-intensive
applications can make it impossible to supply sufficient CPU resources to all the
individual virtual machines. However, sometimes the cause may be more subtle,
related to the inefficient use of available resources or non-optimal virtual machine
configurations.
Please check that your lab has finished all the startup routines and is ready for you
to start. If you see anything other than "Ready", please wait a few minutes. If after five
minutes your lab has not changed to "Ready", please ask for assistance.
High Ready Time: A CPU is in the Ready state when the virtual machine is ready
to run but unable to run, because the vSphere scheduler cannot find physical
host CPU resources to run the virtual machine on. Ready time above 10% could
indicate CPU contention and might impact the performance of CPU-intensive
applications. However, some less CPU-sensitive applications and virtual machines
can have much higher ready time values and still perform satisfactorily.
High Costop time: Costop time indicates that the VM has more vCPUs than
necessary, and that the excess vCPUs create overhead that drags down the
performance of the VM. The VM would likely run better with fewer vCPUs. A vCPU
with high costop is being kept from running while the other, more-idle vCPUs
catch up to the busy one.
CPU Limits: CPU Limits directly prevent a virtual machine from using more than a
set amount of CPU resources. Any CPU limit might cause a CPU performance
problem if the virtual machine needs resources beyond the limit.
Host CPU Saturation: When the Physical CPUs of a vSphere host are being
consistently utilized at 85% or more then the vSphere host may be saturated.
When a vSphere host is saturated, it is more difficult for the scheduler to find free
physical CPU resources in order to run virtual machines.
Guest CPU Saturation: Guest CPU (vCPU) Saturation is when the application
inside the virtual machine is using 90% or more of the CPU resources assigned to
the virtual machine. This may be an indicator that the application is being
bottlenecked on vCPU resource. In these situations, adding additional vCPU
resources to the virtual machine might improve performance.
Low Guest Usage: Low in-guest CPU utilization might indicate that the
application is not configured correctly, or that the application is starved of some
other resource, such as I/O or memory, and therefore cannot fully utilize the
assigned vCPU resources.
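The rules of thumb above can be combined into one quick check. The thresholds (10% for Ready/Costop, 85% for the host, 90% for the guest) come from this section; the function and parameter names are illustrative assumptions, not vSphere API fields:

```python
# Sketch combining this module's CPU rules of thumb into a single check.
# All inputs are percentages over the sample period.

def diagnose_cpu(ready_pct: float, costop_pct: float,
                 host_util_pct: float, guest_util_pct: float) -> list:
    findings = []
    if ready_pct > 10:
        findings.append("high Ready time: possible CPU contention")
    if costop_pct > 10:
        findings.append("high CoStop: VM may have too many vCPUs")
    if host_util_pct >= 85:
        findings.append("host CPU saturation")
    if guest_util_pct >= 90:
        findings.append("guest vCPU saturation: consider more vCPUs")
    return findings or ["no obvious CPU problem"]

# Example: a contended VM on a busy host trips three of the four rules.
print(diagnose_cpu(ready_pct=12, costop_pct=2,
                   host_util_pct=90, guest_util_pct=95))
```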
Double click the Performance Lab MS shortcut on the Main Console desktop.
Start Module 8
Wait until you see "Press Enter to continue" to proceed. Press enter.
When the script completes, you see two Remote Desktop windows open (note: you may
have to move one of the windows to display them side by side, as shown above).
The script has started a CPU intensive benchmark (SPECjbb2005) on both perf-
worker-01a and perf-worker-01b virtual machines, and a GUI is displaying the real-time
performance value as this workload runs.
If you do not see the SPECjbb2005 window open, launch the shortcut in the upper left
hand corner.
Above, we see an example screenshot where the performance of each benchmark is
around 15,000.
IMPORTANT NOTE: Due to changing loads in the lab environment, the performance
values may vary. Please make note of the approximate performance scores, as they
will change later.
1. Select the perf-worker-01a virtual machine from the list of VMs on the left
2. Click the Monitor tab
3. Click Performance
4. Click Advanced
5. Click on the Popup Chart icon so we can get a dedicated chart popup window.
Let's maximize the window and select specific counters via Chart Options:
1. Click the Maximize window icon (be careful not to click Close!)
2. Click Chart Options at the top
When investigating a potential CPU issue, there are several counters that are important
to analyze:
1. Select CPU on the left-hand side (if it's not already selected by default)
2. Scroll through the list, and check these counters: Demand, Ready, and Usage
in MHz
3. Only select perf-worker-01a for the Target Object (deselect 0 if it's checked)
4. Click OK
Notice the amount of CPU this virtual machine is demanding and compare that to the
amount of CPU usage the virtual machine is actually allocated (Usage in MHz). The
virtual machine is demanding more than it is currently being allowed to use.
Notice that the virtual machine is also seeing a large amount of ready time. Guidance:
Ready time > 10% could be a performance concern.
You can close this popup window, but please leave the vSphere Client window open.
• Wait: This can occur when the virtual machine's guest OS is idle (waiting for
work), or the virtual machine could be waiting on vSphere tasks. Some examples
of vSphere tasks that a vCPU may be waiting on include waiting for I/O to
complete or waiting for ESXi level swapping to complete. These non-idle vSphere
system waits are called VMWAIT.
• Ready (RDY): A CPU is in the Ready state when the virtual machine is ready to
run but unable to run because the vSphere scheduler is unable to find physical
host CPU resources to run the virtual machine on. One potential reason for
elevated Ready time is that the VM is constrained by a user-set CPU limit or
resource pool limit, reported as max limited (MLMTD).
• CoStop (CSTP): Time the vCPUs of a multi-vCPU virtual machine spent waiting
to be co-started. This gives an indication of the co-scheduling overhead incurred
by the virtual machine.
• Run: Time the virtual machine was running on a physical processor.
NOTE: vCenter reports some metrics, such as Ready time, in milliseconds (ms). To
convert the milliseconds value to a percentage, divide it by the length of the sample
period in milliseconds (20,000 ms for real-time charts) and multiply by 100.
For multi-vCPU virtual machines, multiply the Sample Period by the number of vCPUs of
the VM to determine the total time of the sample period. It is also beneficial to monitor
Co-Stop time on multi-vCPU virtual machines. Like Ready time, Co-Stop time greater
than 10% could indicate a performance problem. You can examine Ready time and Co-
Stop metrics per vCPU as well as per VM. Per vCPU is the most accurate way to
examine statistics like these.
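As a sketch of the conversion described above (assuming the 20-second sample period that vCenter real-time charts use; the function name is illustrative):

```python
# Convert a vCenter "Ready" summation value (milliseconds) into a
# percentage of the sample period. For multi-vCPU VMs, the sample period
# is multiplied by the vCPU count, as described above.

def ready_percent(ready_ms: float, num_vcpus: int = 1,
                  sample_period_ms: float = 20_000) -> float:
    """Return Ready time as a percentage of the total sample period."""
    return ready_ms / (sample_period_ms * num_vcpus) * 100

# Example: 2000 ms of Ready in a 20 s sample on a 1-vCPU VM is 10%,
# right at the rule-of-thumb threshold from this module.
print(ready_percent(2000))        # 10.0
print(ready_percent(2000, 4))     # 2.5 (same Ready spread over 4 vCPUs)
```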
Click the Maximize window icon to get the maximum real estate.
Notice in the Chart, that only one of the CPUs (pictured here in green) on the host
seems to have any significant workload running on it. We'll see why this is the case
next.
The PowerShell/PowerCLI script that ran when we started this lab set the CPU Affinity of
both VMs (perf-worker-01a and perf-worker-01b) to CPU 1, as shown here.
Affinitizing VMs to specific CPUs (also known as "pinning") is generally not a
best practice. It is only used here as a demonstration.
Switch back to the Chrome window (vSphere Client) to shut down one of the VMs:
1. Click on perf-worker-01b
2. Click on the Shut Down Guest OS (stop icon) and click YES
Let's see if the ESXi Host CPU level has dropped from 100% by shutting this VM down.
Notice that even after shutting down 1 of the VMs, CPU1 is still at 100%. Why?
Since the remaining resources went to perf-worker-01a, let's see if its performance
increased.
If you recall the scores from both VMs at the beginning of the tests, you'll notice that the
Performance of the remaining VM has increased to approximately double of its
original value, now that we have shut down the other one (and thus reduced the CPU
contention on CPU1).
Switch back to the Chrome window (vSphere Client) to shut down the remaining VM:
1. Click on perf-worker-01a
2. Click on the Shut Down Guest OS (stop icon) and click YES
Let's see if the ESXi Host CPU level has dropped from 100% by shutting this VM down.
Now that there are no VMs running on the host, CPU1 is no longer at 100%:
Summary
In summary:
• It is not recommended to set CPU affinity for VMs, as it could produce the
behavior we observed (multiple VMs vying for the same CPU resources)
• vCenter Performance Charts (accessed through the vSphere Client) are a useful
way to monitor CPU counters in real-time
• If CPU Demand > Used (Usage), then a VM is demanding more than the host is
allowing it to use. In this case, the VM should be allocated more resources or
migrated to a more powerful host, or the application itself may need to be
optimized to use fewer resources.
Next, let's talk about Power Management, and how to configure different power policies
at the host/BIOS level and within ESXi.
Bottom line: some form of power management is recommended; it should only
be disabled if testing shows it is impacting your application performance.
For more details on how and what to configure, see this white paper:
http://www.vmware.com/files/pdf/techpaper/hpm-perf-vsphere55.pdf
The screenshot above illustrates how an 11th Generation Dell PowerEdge server BIOS
can be configured to allow the OS (ESXi) to control the CPU power-saving features
directly:
• Under the Power Management section, set the Power Management policy to
OS Control.
For a Dell PowerEdge 12th Generation or newer server with UEFI (Unified Extensible
Firmware Interface), review the System Profile modes in the System Setup > System
BIOS settings. You see these options:
Next, you should verify the Power Management policy used by ESXi (see the next
section).
The screenshot above illustrates how a HP ProLiant server BIOS can be configured
through the ROM-Based Setup Utility (RBSU). The settings highlighted in red allow the
OS (ESXi) to control some of the CPU power-saving features directly:
Next, you should verify the Power Management policy used by ESXi (see the next
section).
The screenshot above illustrates how an 11th Generation Dell PowerEdge server BIOS
can be configured to disable power management:
• Under the Power Management section, set the Power Management policy to
Maximum Performance.
For a Dell PowerEdge 12th Generation or newer server with UEFI, review the System
Profile modes in the System Setup > System BIOS settings. You see these options:
NOTE: Disabling power management usually results in more power being consumed by
the system, especially when it is lightly loaded. The majority of applications benefit from
the power savings offered by power management, with little or no performance impact.
Therefore, if disabling power management does not realize any increased performance,
VMware recommends that power management be re-enabled to reduce power
consumption.
The screenshot above illustrates how to set the HP Power Profile mode in the server's
RBSU to the Maximum Performance setting to disable power management:
NOTE: Disabling power management usually results in more power being consumed by
the system, especially when it is lightly loaded. The majority of applications benefit from
the power savings offered by power management with little or no performance impact.
Therefore, if disabling power management does not realize any increased performance,
VMware recommends that power management be re-enabled to reduce power
consumption.
1. Select "esx-01a.corp.local"
2. Select "Configure"
3. Select "Hardware" (you will need to scroll all the way to the bottom)
4. Select "Power Management"
On a physical host, the Power Management options could look like this (it may vary
depending on the processors of the physical host).
Here you can see which ACPI states are presented to the host and which Power
Management policy is currently active. There are four Power Management policies
available in ESXi:
NOTE: Due to the nature of this lab environment, we are not interacting directly with
physical servers, so changing the Power Management policy will not have any
noticeable effect. Therefore, while the sections that follow will describe each Power
Management policy, we won't actually change this setting.
High Performance
The High Performance power policy maximizes performance, and uses no power
management features. It keeps CPUs in the highest P-state at all times. It uses only
the top two C-states (running and halted), not any of the deep states (for example, C3
and C6 on the latest Intel processors). High performance was the default power policy
for ESX/ESXi releases prior to 5.0.
Balanced (default)
The Balanced power policy is designed to reduce host power consumption while
having little or no impact on performance. The balanced policy uses an algorithm that
exploits the processor’s P-states. This is the default power policy since ESXi 5.0.
Beginning in ESXi 5.5, deep C-states (greater than C1) are also used in the
Balanced power policy. Formerly, when a CPU was idle, it would always enter C1. Now
ESXi chooses a suitable deep C-state depending on its estimate of when the CPU will
next need to wake up.
Low Power
The Low Power policy is designed to save substantially more power than the
Balanced policy by making the P-state and C-state selection algorithms more
aggressive, at the risk of reduced performance.
Custom
The Custom power policy starts out the same as Balanced, but allows individual
parameters to be modified.
The next step describes settings that control the Custom power policy.
1. Click inside the Filter text box (next to the Filter icon) and type the word Power.
(make sure to add a period after the word Power)
2. Click the first parameter, Power.ChargeMemoryPct
3. Note that a description and valid minimum and maximum values appear in the
lower-left corner.
4. Click CANCEL after you've reviewed this list.
• Power.MinFreqPct : Do not use any P-states slower than the given percentage
of full CPU speed.
• Power.PerfBias : Performance Energy Bias Hint (Intel only). Sets an MSR on Intel
processors to an Intel-recommended value. Intel recommends 0 for high
performance, 6 for balanced, and 15 for low power. Other values are undefined.
• Power.TimerHz : Controls how many times per second ESXi reevaluates which P-
state each CPU should be in.
• Power.UseCStates : Use deep ACPI C-states (C2 or below) when the processor is
idle.
• Power.UsePStates : Use ACPI P-states to save power when the processor is
busy.
Stop Module 8
On your desktop, find the Module Switcher window and click the Stop button for
Module 8.
Key takeaways
CPU contention problems are generally easy to detect. In fact, vCenter has several
alarms that trigger if host CPU utilization or virtual machine CPU utilization goes too
high for extended periods of times.
vSphere allows you to create very large virtual machines (up to 256 vCPUs with 6.7 U2;
see https://configmax.vmware.com/home for more information). It is highly
recommended to size your virtual machines for the application workloads that run in
them. Sizing a virtual machine with resources unnecessarily larger than the
workload can actually use may result in hypervisor overhead and can also lead to
performance issues.
• Rule of thumb: 1-4 vCPU on dual socket hosts, 8+ vCPU on quad socket hosts.
This rule of thumb changes as core counts increase. Try to keep vCPU count
below the core count of any single pCPU for the best performance profile. This is
due to memory locality, see module 4 about vNUMA for more details on this.
• Sizing a VM too large is wasteful. The OS will spend more cycles trying to keep
workloads in sync. Don't expect consolidation ratios as high with busy
workloads as you achieved with the low-hanging fruit.
• Configure your physical host (server BIOS) to OS Control mode as the power
policy. If applicable, enable Turbo mode, C-States (including deep C-states),
which are usually the default.
• Within ESXi, the default Balanced power management policy will achieve the
best performance per watt for most workloads.
• For applications that require maximum performance, switch the BIOS power
policy and/or the ESXi power management policy to Maximum Performance
and High Performance respectively. This includes latency-sensitive applications
that must execute within strict constraints on response time. Be aware, however,
that this typically only results in minimal performance gain, but disables all
potential power savings.
Depending on your applications and the level of utilization of your ESXi hosts, the
correct power policy setting can have a great impact on both performance and energy
consumption. On modern hardware, it is possible to have ESXi control the power
management features of the hardware platform used. You can select to use predefined
policies or you can create your own custom policy.
Recent studies have shown that it is best to let ESXi control the power policy.
Module 9 - Memory
Performance with X-Mem
(30 minutes)
Introduction
Host memory is a limited resource, but it is critical that you assign sufficient
resources (especially memory, but also CPU) to each VM so they perform
optimally.
Of course, X-Mem is not the only memory benchmark available; others include
STREAM, lmbench, and Intel's MLC. Here is a quick summary of some key advantages
that set X-Mem apart:
• (A) Access pattern diversity. Cloud applications span many domains. They
express a broad spectrum of computational behaviors and access memory in a
mix of structured and random patterns. These patterns exhibit a variety of
read/write ratios, spatio-temporal localities, and working-set sizes. Replication of
these memory access patterns using controlled micro-benchmarks facilitates the
study of their performance. This can be used by cloud providers to create cost-
effective hardware configurations for different classes of applications and by
subscribers to optimize their applications.
• (B) Platform variability. Cloud servers are built from a mix of instruction set
architectures (ISAs, e.g., x86-64 and ARM), machine organizations (e.g., memory
model and cache configuration), and technology standards (e.g., DDR, PCIe,
NVMe, etc.). Platforms also span a variety of software stacks and operating
systems (OSes, e.g., Linux and Windows). The interfaces and semantics of OS-
level memory management features such as large pages and non-uniform
memory access (NUMA) also vary. In order to objectively cross-evaluate
competing platforms and help optimize an application for a particular platform, a
memory characterization tool should support as many permutations of these
features as possible.
• (C) Metric flexibility. Both the subscriber’s application-defined performance
and the provider’s costs depend on memory performance and power. Unlike X-
Mem, most tools do not integrate memory power measurement.
• (D) Tool extensibility. Cloud platforms have changed considerably over the last
decade and continue to evolve in the future. Emerging non-volatile memories
(NVMs) introduce new capabilities and challenges that require special
consideration. Unfortunately, most existing characterization tools are not easily
extensible. X-Mem is being actively maintained and extended for ongoing
research needs.
There is a research tool paper describing the motivation, design, and implementation of
X-Mem, as well as three experimental case studies using the tool to deliver insights useful
to both cloud providers and subscribers. For more information, see the following links:
Citation:
Mark Gottscho, Sriram Govindan, Bikash Sharma, Mohammed Shoaib, and Puneet
Gupta. X-Mem: A Cross-Platform and Extensible Memory Characterization Tool for the
Cloud. In Proceedings of the IEEE International Symposium on Performance Analysis of
Systems and Software (ISPASS), pp. 263-273. Uppsala, Sweden. April 17-19, 2016.
DOI: http://dx.doi.org/10.1109/ISPASS.2016.7482101
Downloading/Installing X-Mem
This lesson describes how to download the X-Mem benchmark. There are prebuilt
binaries for Windows and Linux; this lab demonstrates X-Mem inside of Windows
VMs.
There are multiple ways to obtain X-Mem, but the easiest is to go to http://nanocad-
lab.github.io/X-Mem/ and click the Binaries (zip) button, which has precompiled
binaries for Windows. If you're using Linux, or wish to make modifications to the source
code, click the appropriate link.
Runtime Prerequisites
There are a few runtime prerequisites in order for the software to run correctly. Note
that these requirements are for the pre-compiled binaries that are available on the
project homepage at https://nanocad-lab.github.io/X-Mem. Also note that these
requirements are already met in our lab environment:
HARDWARE:
• Intel x86, x86-64, x86-64+AVX, or MIC (Xeon Phi/Knights Corner) CPU. AMD CPUs
that are compatible with Intel Architecture ISAs should also work fine.
• ARM Cortex-A series processors with VFP and NEON extensions. Specifically
tested on ARM Cortex A9 (32-bit) which is ARMv7. 64-bit builds for ARMv8-A
should also work but have not been tested. GNU/Linux builds only. ARM on
Windows can compile using VC++, but cannot link due to a lack of library support
for desktop/command-line ARM apps. This may be resolved in the future. If you
can get this working, let us know!
WINDOWS:
GNU/LINUX:
• GNU utilities with support for C++11. Tested with gcc 4.8.2 on Ubuntu 14.04 LTS
for x86 (32-bit), x86-64, x86-64+AVX, and MIC on Intel Sandy Bridge, Ivy Bridge,
Haswell, and Knights Corner families.
• libhugetlbfs. You can obtain it at https://github.com/libhugetlbfs/libhugetlbfs. On
Ubuntu systems, you can install using sudo apt-get install libhugetlbfs0 . If you
don't have this or cannot install it, this should be fine but you will not be able to
use large pages. Note that this package requires Linux kernel 2.6.16 or later. This
should not be an issue on most modern Linux systems.
• Potentially, administrator privileges, if you plan to use the --large_pages option.
◦ During runtime, if the --large_pages option is selected, you may need to
first manually ensure that large pages are available from the OS. This can
be done by running hugeadm --pool-list . It is recommended to set minimum
pool to 1GB (in order to measure DRAM effectively). If needed, this can be
done by running hugeadm --pool-pages-min 2MB:512 . Alternatively, run the
linux_setup_runtime_hugetlbfs.sh script that is provided with X-Mem.
Installation
Fortunately, the only file needed to run X-Mem is the respective executable:
xmem-win-.exe on Windows, and xmem-linux- on GNU/Linux. It has no other
dependencies aside from the pre-installed system prerequisites which were just
outlined.
Running X-Mem
Launch Performance Lab Module Switcher
Double click on the Performance Lab MS shortcut on the Main Console desktop
Launch Module 9
NOTE: Please wait a couple of minutes, and do not proceed with the lab until you see
Remote Desktop windows appear.
The script opens Remote Desktop Connections to two Windows VMs. However, we need
to make both of them visible. Drag the title bars of the Remote Desktop windows:
Given #5, you might think the memory performance of these two VMs should be
identical. As we'll see, X-Mem can run multiple worker threads to exercise multiple
CPUs simultaneously, allowing better scalability with more vCPUs.
X-Mem has a ton of options to customize how it is run. Here is a summary of the
command-line options we'll be using in this lab:
• -j : Number of worker threads to use in benchmarks. NOTE: cannot be larger
than the number of vCPUs.
• -n : Number of iterations to run; helps ensure consistency (the results shouldn't
fluctuate much).
• -t : Throughput benchmark mode (as opposed to -l for latency benchmark
mode).
• -R : Use memory read-based patterns.
• -W : Use memory write-based patterns.
You should already have a Command Prompt window open on perf-worker-01b from
the previous step; if not, click the Command Prompt icon on the taskbar.
Let's try to run X-Mem with a couple of the command-line parameters we just saw: -t to
test memory throughput, and -j2 to run two worker threads:
xmem -t -j2
ERROR: Number of worker threads may not exceed the number of logical CPUs (1)
This is expected, because if you recall, this VM only has one virtual CPU.
Now run the exact same X-Mem command that failed on perf-worker-01b on perf-
worker-01a:
This command ran successfully on this VM, because it has 4 virtual CPUs (so -j3 or -j4
would also work). Next, let's take a closer look at the results.
Once you're back at the command prompt, use the scrollbar to scroll back up and look
at the results:
1. The first benchmark throughput test, Test #1T, will show Read/Write Mode:
read. Since we specified -j2, the output shows that it ran 2 worker threads.
The result in this example was 90664.66 MB/s (or 90.664 GB/s). Note that your
performance may vary, given the shared resources of the hands-on lab
environment (where many other workloads are running).
2. The second benchmark throughput test, Test #2T, will show Read/Write Mode:
write. Since we specified -j2, the output shows that it ran 2 worker threads.
The result in this example was 44113.39 MB/s (or 44.11 GB/s). Note that your
performance may vary, given the shared resources of the hands-on lab
environment (where many other workloads are running).
Why did the second test have lower (in this case, about half) the throughput of the first?
Well, writes are almost always more expensive than reads; this is true for
memory/RAM, and other subsystems, such as disk storage I/O.
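A quick way to quantify the read/write gap from the two results above (the helper function is just for illustration):

```python
# Sketch: compare the two X-Mem throughput results above. Writes landing
# at roughly half the read throughput is the pattern the text explains.

def read_write_ratio(read_mb_s: float, write_mb_s: float) -> float:
    """How many times faster reads were than writes."""
    return read_mb_s / write_mb_s

# The example numbers from this run (Test #1T vs Test #2T):
ratio = read_write_ratio(90664.66, 44113.39)
print(f"reads were {ratio:.2f}x faster than writes")
```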
Let's further customize the X-Mem command line options, again on perf-worker-01a:
1. Make sure the focus is on the Command Prompt of the perf-worker-01a Remote
Desktop window (if it isn't already)
2. Type this command and press Enter: xmem -t -R -j4 -n5
3. The results will be listed under the *** RESULTS*** heading, as shown here.
Notice that the benchmark ran differently due to the different command line we used.
Here is an explanation of each option:
Once you're back at the command prompt, use the scrollbar to scroll back up and look
at the results. In this example, the results are consistently around 170,000 MB/sec (170
GB/sec). Since we specified -j4 , it ran four worker threads, so the memory
performance is significantly higher than when we ran with two worker threads.
NOTE: Given the nature of our hands-on lab environment, your results may (and
probably will) vary from this example.
On the main console, find the Module Switcher window and click Stop.
Key takeaways
During this lab, we learned that X-Mem is a flexible memory benchmark tool. It can
measure memory throughput and latency, scale across multiple worker threads, and
exercise both read- and write-based access patterns.
You can download this tool to run in your environment to ensure you are getting optimal
memory performance out of your hosts and virtual machines.
Conclusion
This concludes the Memory Performance with X-Mem module. We hope you have
enjoyed taking it. Please don't forget to fill out the survey when you finish.
Module 10 - Storage
Performance and
Troubleshooting (30
minutes)
Despite advances in the interconnects, the performance limit is still hit at the media
itself. In fact, 90% of storage performance cases seen by GSS (Global Support
Services - VMware support) that are not configuration-related are media-related.
Some things to remember:
A good rule of thumb gives the total number of IOPS any given disk provides; from
it, you can estimate how many IOPS you can achieve with a given number of disks.
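As a hedged sketch of that rule of thumb: the per-disk IOPS figures below are common industry approximations for spinning disks, not values from this lab, and the function name is an assumption:

```python
# Sketch of the disk IOPS rule of thumb. Per-disk numbers are rough
# approximations (assumptions, not vendor specs): ~80 IOPS for a 7.2K RPM
# spindle, ~130 for 10K RPM, ~180 for 15K RPM.

PER_DISK_IOPS = {"7.2K": 80, "10K": 130, "15K": 180}  # approximate

def total_iops(disk_type: str, num_disks: int) -> int:
    """Roughly how many IOPS a set of identical disks can deliver."""
    return PER_DISK_IOPS[disk_type] * num_disks

# Example: eight 15K RPM disks give roughly 1440 IOPS (before any RAID
# write penalty, which would reduce effective write IOPS).
print(total_iops("15K", 8))  # 1440
```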
This test demonstrates some methods to identify poor storage performance and
how to resolve it using VMware Storage DRS for workload balancing. The first step
is to prepare the environment for the demonstration.
Double click on the Performance Lab MS shortcut on the Main Console desktop.
Launch Module 10
Click on the Start button under Module 10. The script configures and starts up the
virtual machines and launches a storage workload using Iometer.
The script may take up to five minutes to complete. While the script runs,
spend a few minutes reading through the next step to gain an understanding of
storage latencies.
When we think about storage performance problems, the top issue is generally latency,
so we need to look at the storage stack and understand what layers there are in the
storage stack and where latency can build up.
At the top most layer is the Application running in the guest operating system. That is
ultimately the place where we most care about latency. This is the total amount of
latency that application sees and it include the latencies off the total storage stack
including the guest OS, the VMKernel virtualization layers, and the physical hardware.
ESXi can’t see application latency because that is a layer above the ESXi virtualization
layer.
From ESXi we see three main latencies that are reported in esxtop and vCenter.
The topmost is GAVG, or Guest Average latency, which is the total amount of latency
that ESXi can detect.
That does not mean it is the total amount of latency the application sees. In fact, if you
compare GAVG (the total amount of latency ESXi is seeing) with the actual latency
the application is seeing, you can tell how much latency the guest OS is adding to the
storage stack. This can tell you whether the guest OS is configured incorrectly or is causing a
performance problem. For example, if ESXi is reporting a GAVG of 10ms, but the
application or perfmon in the guest OS is reporting a storage latency of 30ms, that
means 20ms of latency is building up somewhere in the guest OS layer, and you
should focus your debugging on the guest OS's storage configuration.
DAVG (Device Average latency) is essentially how much time an IO spends in the device: the
driver, HBA, and storage array.
KAVG (Kernel Average latency) is how much time an IO spends in the ESXi kernel (that is, how
much overhead the kernel is adding).
KAVG is actually a derived metric - ESXi does not measure KAVG directly. ESXi
calculates KAVG with the following formula:
KAVG = GAVG - DAVG
The VMkernel is very efficient at processing IO, so an IO really should not spend any
significant time waiting in the kernel, and KAVG should be 0
in well-configured, well-running environments. When KAVG is not 0, it most
likely means the IO is stuck in a kernel queue inside the VMkernel. So the vast
majority of the time, KAVG equals QAVG, or Queue Average latency (the amount of
time an IO is stuck in a queue waiting for a slot in a lower queue to free up so it can
move down the stack).
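The latency decomposition described above can be sketched in a few lines. KAVG is derived as GAVG minus DAVG, and the gap between what the application reports and what ESXi reports is latency added inside the guest OS; the numbers below are the worked example from the text:

```python
# Latency decomposition sketch (all values in milliseconds).
def kavg(gavg_ms: float, davg_ms: float) -> float:
    """KAVG is derived, not measured: kernel latency = total - device."""
    return gavg_ms - davg_ms

def guest_added_latency(app_latency_ms: float, gavg_ms: float) -> float:
    """Latency added inside the guest OS, which ESXi cannot see."""
    return app_latency_ms - gavg_ms

# Example from the text: guest perfmon reports 30 ms, ESXi reports a
# GAVG of 10 ms -> 20 ms is building up inside the guest OS.
print(guest_added_latency(30.0, 10.0))  # -> 20.0
# A healthy host: GAVG 10 ms, DAVG 9.8 ms -> KAVG ~0.2 ms, close to 0.
print(kavg(10.0, 9.8))
```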
When the storage script has completed, you should see two Iometer windows, and two
storage workloads should be running.
Select perf-worker-03a
1. Select "perf-worker-03a"
1. Select "Monitor"
2. Select "Performance"
3. Select "Advanced"
4. Click "Chart Options"
The disk that Iometer uses for generating workload is scsi0:1, or sdb inside the guest.
Guidance: Device latencies that are greater than 20ms may see a performance impact
in your applications.
Due to the way we create a private datastore for this test, we actually see quite
low latency numbers. scsi0:1 is located on an iSCSI datastore backed by a RAM disk on
perf-worker-04a (DatastoreA), running on the same ESXi host as perf-worker-03a. Hence,
latencies are low for a fully virtualized environment.
vSphere provides several storage features to help manage and control storage
performance:
Space utilization load balancing: You can set a threshold for space use. When
space use on a datastore exceeds the threshold, Storage DRS generates
recommendations or performs Storage vMotion migrations to balance space use
across the datastore cluster.
I/O latency load balancing: You can set an I/O latency threshold for bottleneck
avoidance. When I/O latency on a datastore exceeds the threshold, Storage DRS
generates recommendations or performs Storage vMotion migrations to help
alleviate high I/O load. Remember to consult your storage vendor to get their
recommendation on using I/O latency load balancing.
Anti-affinity rules: You can create anti-affinity rules for virtual machine disks.
For example, the virtual disks of a certain virtual machine must be kept on
different datastores. By default, all virtual disks for a virtual machine are placed
on the same datastore.
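The two load-balancing triggers above amount to threshold checks. The sketch below is an illustrative simplification with made-up default thresholds, not the actual SDRS algorithm, which weighs many more factors (growth trends, correlation between datastores, and so on):

```python
# Simplified sketch of the Storage DRS triggers described above.
# Threshold defaults are illustrative assumptions only.
def needs_rebalance(used_pct, io_latency_ms,
                    space_threshold_pct=80.0, latency_threshold_ms=15.0):
    reasons = []
    if used_pct > space_threshold_pct:
        reasons.append("space")       # space utilization load balancing
    if io_latency_ms > latency_threshold_ms:
        reasons.append("io-latency")  # I/O latency load balancing
    return reasons

print(needs_rebalance(used_pct=85.0, io_latency_ms=5.0))  # -> ['space']
```

When either trigger fires, SDRS generates a Storage vMotion recommendation (or applies it automatically, depending on the automation level).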
1. Click on ACTIONS
2. Go to Storage
3. Click on New Datastore Cluster...
1. We can specify a name for the Datastore cluster, but leave it at the default of
DatastoreCluster.
2. Click NEXT
1. Move the slider all the way to the left to specify a 50% Utilized space
threshold.
2. Click NEXT
Since this lab is a nested virtual environment, it is difficult to demonstrate high latency
in a reliable manner. Therefore we do not use I/O latency to demonstrate load balancing.
The default is to check for storage cluster imbalances every eight hours, but the
interval can be lowered to a minimum of 60 minutes.
Select Datastores
Ready to Complete
Take a note of the name of the virtual machine that Storage DRS (SDRS)
wants to migrate.
1. Select DatastoreCluster
2. Select the Monitor tab
3. Select Storage DRS / Recommendations
4. Click RUN STORAGE DRS NOW
5. Click APPLY RECOMMENDATIONS
Notice that SDRS recommends moving one of the workloads from DatastoreA to
DatastoreB. It is making the recommendation based on capacity. SDRS makes storage
moves based on performance only after it has collected performance data for more than
eight hours. Since the workloads just recently started, SDRS would not make a
recommendation to balance the workloads based on performance until it has collected
more data.
1. Select Configure
2. Select Storage DRS
3. Select the dropdown arrows to observe the different SDRS settings you can
configure
A number of enhancements have been made to Storage DRS to remove some of the
previous limitations:
Common for all these improvements is that they all require VASA 2.0, which
requires that the storage vendor has an updated storage provider.
Now you should see the performance chart you created earlier in this module.
Notice how the throughput has increased and how the latency is lower (green arrows),
than it was when both VMs shared the same datastore.
Return the Iometer workers, and see how they also report increased performance and
lower latencies.
It takes a while for Iometer to show these higher numbers, maybe ten minutes. This is due
to the way storage performance is throttled in this lab. If you want to try a shortcut:
The workload should spike but then settle at the higher performance level in a couple of
minutes.
Stop Module 10
On the main console, find the Module Switcher window and click Stop for Module 10.
Key takeaways
During this lab we saw the importance of sizing your storage correctly with respect to
space and performance. It also showed that when two storage-intensive sequential
workloads share the same spindles, performance can be
greatly impacted. If possible, try to keep workloads separated: keep sequential
workloads separate (backed by different spindles/LUNs) from random workloads.
In general, we aim to keep storage latencies under 20ms, lower if possible, and monitor
for frequent latency spikes of 60ms or more which would be a performance concern and
something to investigate further.
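The latency guidance above can be expressed as a simple check over a series of latency samples. The thresholds come from the text; the sample data and function name are made up for illustration:

```python
# Flag samples over the 20 ms guidance and count spikes of 60 ms or
# more, which the text calls out as a performance concern.
WARN_MS, SPIKE_MS = 20.0, 60.0

def assess(samples_ms):
    over = [s for s in samples_ms if s > WARN_MS]
    spikes = [s for s in samples_ms if s >= SPIKE_MS]
    return len(over), len(spikes)

print(assess([5, 12, 25, 8, 61, 19]))  # -> (2, 1)
```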
Guidance: From a vSphere perspective, for most applications, the use of one large
datastore vs. several small datastores tends not to have a performance impact.
However, the use of one large LUN vs. several LUNs is storage array dependent and
most storage arrays perform better in a multi-LUN configuration than a single large LUN
configuration.
Guidance: Follow your storage vendor’s best practices and sizing guidelines to properly
size and tune your storage for your virtualized environment.
Module 11 - Network
Performance, Basic
Concepts and
Troubleshooting (15
minutes)
In the following module, we will show you how to monitor and troubleshoot some
network-related issues so that you can troubleshoot similar issues that may exist
in your own environment.
To start this module, double-click on the Performance Lab MS shortcut on the Main
Console desktop.
Start Module 11
In our lab environment, it's not feasible to attempt to saturate the network (we'd
like others to be able to take labs without delays!). Therefore, this module
focuses on creating network load and showing you where to look when you
suspect network problems in your own environment.
NOTE: You might see different results on your screen, which is to be expected
given the variability of the lab environments.
Depending on the time it took to get here, the network load test might be done. You
should still be able to see the network load that ran and finished.
1. Here you can see the graphical representation of the network load of perf-
worker-02a
2. Here you can see the counters we selected in the previous step (Packets
received, transmitted, and overall Usage in KBps) and a real-time view of their
values
• Usage: If this number is higher than expected, you may want to consider
segregating this VM onto a separate virtual switch or VLAN from other VMs
• Packets received and Packets transmitted: If these values get too high, it
could lead to dropped packets which need to be retransmitted.
In this example, there are no dropped packets at the host level, which indicates the
hosts' NICs are not the bottleneck.
NOTE: You might see different results depending upon the lab environment conditions.
Stop Module 11
Key takeaways
During this lab we saw how to diagnose networking problems, both at a VM and at an
ESXi host level, using the vSphere Client's built-in performance charts.
• If you want real time performance, esxtop is a great tool for just that, and it's
covered in a different module.
• If you want long term performance statistics at a datacenter level, vRealize
Operations is the right tool.
If you want to know more about troubleshooting network performance, see this VMware
KB article:
"Troubleshooting network performance issues in a vSphere
environment": http://kb.vmware.com/kb/1004087
Module 12 - Advanced
Performance Feature:
Latency Sensitivity
Setting (45 minutes)
Since this feature is set on a per-VM basis, a mixture of both normal VMs and
latency sensitive workload VMs can be run on a single vSphere host.
The latency sensitivity feature is intended only for specialized use cases, namely,
workloads that require extremely low latency. It is extremely important to determine if
your workload could benefit from this feature before enabling it. Latency sensitivity
provides extremely low network latency performance with a tradeoff of increased CPU
and memory cost because of reduced resource sharing and increased power
consumption.
Before making the decision to leverage VMware's latency sensitivity feature, perform
the necessary cost-benefit analysis to determine whether this feature is necessary. Choosing to enable this
feature just because it exists can lead to higher host CPU utilization and higher power
consumption, and it can needlessly impact the performance of the other VMs running on the
host.
Choosing whether to enable latency sensitivity is one of those "Just because
you can doesn't mean you should" choices. The latency sensitivity feature reduces
network latency. It does not necessarily decrease application latency, especially
if that latency is dominated by storage or other sources besides the network.
The latency sensitivity feature should be enabled only in environments in which the CPU is
undercommitted. VMs which have latency sensitivity set to High are given exclusive
access to the physical CPUs on the host. This means the latency sensitive VM can no
longer share those CPUs with neighboring VMs.
Generally, VMs that use the latency sensitivity feature should have fewer vCPUs than
the number of cores per socket in your host to ensure that the latency sensitive VM
occupies only one NUMA node.
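The sizing rule above is a strict comparison. A minimal sketch (the host values below are illustrative, not from this lab):

```python
# Rule of thumb from the text: a latency sensitive VM should have
# fewer vCPUs than the host has cores per socket, so that it fits
# inside a single NUMA node.
def fits_one_numa_node(vm_vcpus: int, host_cores_per_socket: int) -> bool:
    return vm_vcpus < host_cores_per_socket

print(fits_one_numa_node(vm_vcpus=8, host_cores_per_socket=10))   # -> True
print(fits_one_numa_node(vm_vcpus=12, host_cores_per_socket=10))  # -> False
```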
If the latency sensitivity feature is not relevant to your environment, consider choosing
a different module.
When a VM has 'High' latency sensitivity set in vCenter, the VM is given exclusive
access to the physical cores it needs to run. This is termed exclusive affinity. These
cores will be reserved for the latency sensitive VM only, which results in greater CPU
accessibility to the VM and less L1 and L2 cache pollution from multiplexing other VMs
onto the same cores. When the VM is powered on, each vCPU is assigned to a particular
physical CPU and remains on that CPU.
When the latency sensitive VM's vCPU is idle, ESXi also alters its halting behavior so that
the physical CPU remains active. This reduces wakeup latency when the VM becomes
active again.
A virtual NIC (vNIC) is a virtual device that exchanges network packets between the
VMkernel and the guest operating system. Exchanges are typically triggered by
interrupts to the guest OS or by the guest OS calling into the VMkernel, both of which are
expensive operations. Virtual NIC interrupt coalescing, which is enabled by default in
vSphere, attempts to reduce CPU overhead by holding back packets for some time
(combining or "coalescing" these packets) before triggering interrupts, so that the
hypervisor wakes up VMs less frequently.
Enabling 'High' latency sensitivity disables virtual NIC coalescing, so that there is
less latency between when a packet is sent or received and when the CPU is interrupted
to process the packet. Typically, coalescing is desirable for higher throughput (so the
CPU isn't interrupted as often), but it can introduce network latency and jitter.
While disabling coalescing can reduce latency, it can also increase CPU utilization and
thus power usage. Therefore this option should only be used in environments with small
packet rates and plenty of CPU headroom.
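The tradeoff described above can be put in back-of-the-envelope numbers. If the vNIC holds packets until a batch of N arrives, the last packet in the batch waits almost nothing while the first can wait up to (N-1)/rate. The batch size and packet rate below are illustrative assumptions, not vSphere defaults:

```python
# Coalescing tradeoff: fewer interrupts vs. added latency per packet.
def worst_case_added_latency_us(batch_size: int, pkts_per_sec: float) -> float:
    """Longest a packet can sit waiting for its batch to fill (microseconds)."""
    return (batch_size - 1) / pkts_per_sec * 1e6

def interrupts_per_sec(batch_size: int, pkts_per_sec: float) -> float:
    """One interrupt per batch instead of one per packet."""
    return pkts_per_sec / batch_size

# At 50,000 packets/s, coalescing batches of 8 cuts interrupts 8x
# but can add up to ~140 microseconds to the earliest packet.
print(worst_case_added_latency_us(8, 50_000))
print(interrupts_per_sec(8, 50_000))  # -> 6250.0
```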
Are you ready to get your hands dirty? Let's start the hands-on portion of this lab.
Please check that your lab has finished all the startup routines and is ready for you
to start. If you see anything other than "Ready", please wait a few minutes. If after five
minutes your lab has not changed to "Ready", please ask for assistance.
Login to vCenter
1. Ensure Hosts and Clusters is the view in the vSphere Client by clicking the
highlighted icon
2. Select the challenge-04a VM highlighted
3. Note this VM has 2 CPUs and 2 GB of Memory configured.
4. Click the Edit Settings icon so we can enable the setting.
Go to Advanced VM Settings
1. Select VM Options
2. Expand the Advanced pulldown
3. Scroll down
After you scroll down, you should see the Latency Sensitivity setting.
Now, let's try to power on this VM. Hint: We may have to do a couple more things
before it powers on successfully, but we'll learn how to do those as well.
Let's try to power on the VM, and note the error that comes up.
Let's set the CPU reservation for the challenge-04a VM to resolve the power-on failure.
Try to power on the VM again now that the CPU reservation has been set.
Let's try to power on the VM, and note the error that comes up.
Let's set the Memory Reservation for the challenge-04a VM to resolve the power-on
failure.
Let's try to power on the VM again, now that both the CPU and memory reservations
have been set.
Power on challenge-04a VM
Let's try to power on the VM, and note that there shouldn't be any more errors.
Open PuTTY
Click on the PuTTY icon so we can SSH to the host that is running the challenge-04a
VM.
PuTTY to esx-01a
Launch esxtop
Type the f key (short for fields) to see a display like the above.
We want to remove the "F" field (CPU State Times), and add the "I" field (CPU Summary
Stats).
Type the uppercase F and I keys and you should see the CPU Summary Stats
selected now.
There are still many fields in esxtop, so expand the window by clicking and dragging
the edge of the right border of the window to the right.
1. Note the GID of your VM (279396 in this example but is different in your lab
environment)
2. Note that EXC_AF is Y - this is new with ESXi 6.7; it confirms that the VM has
exclusive affinity
Note that we now see much more information about the challenge-04a VM, including
processes, which CPUs those processes are on, and so on.
Switch back to the vSphere Client (Chrome window) and let's look at the CPU and
Memory usage, now that we have created the necessary Reservations for Latency
Sensitivity to be High.
NOTE: While these reservations are necessary for latency sensitivity, keep in
mind that they prevent ESXi from sharing or freeing up idle resources for other
VMs.
Go to Advanced VM Settings
1. Select VM Options
2. Expand the Advanced pulldown
3. Scroll down
After you scroll down, you can see the Latency Sensitivity setting.
Summary
Conclusion
This concludes the Latency Sensitivity module. We hope you have enjoyed
taking it. Please do not forget to fill out the survey when you are finished.
Key takeaways
The Latency Sensitivity setting is easy to configure, but you should determine
whether your application fits the definition of "High" latency sensitivity.
To review:
If you want to learn more about running latency sensitive applications on vSphere,
consult these white papers:
• http://www.vmware.com/files/pdf/techpaper/VMW-Tuning-Latency-Sensitive-
Workloads.pdf
• http://www.vmware.com/files/pdf/techpaper/latency-sensitive-perf-vsphere55.pdf
Now that you’ve completed this lab, try testing your skills with VMware Odyssey, our
newest Hands-on Labs gamification program. We have taken Hands-on Labs to the next
level by adding gamification elements to the labs you know and love. Experience the
fully automated VMware Odyssey as you race against the clock to complete tasks and
reach the highest ranking on the leaderboard. Try the vSphere Performance Odyssey lab.
Conclusion
Thank you for participating in the VMware Hands-on Labs. Be sure to visit
http://hol.vmware.com/ to continue your lab experience online.
Version: 20201130-191634