
HOL-2004-01-SDC

Table of Contents
Lab Overview - HOL-2004-01-SDC - Mastering vSphere Performance
   Lab Introduction
   Lab Guidance
Module 1 - vSphere 6.7 Performance: What's New? (30 minutes)
   Introduction
   Faster Lifecycle Management
   vCenter Server 6.7
   Core Platform Improvements
   Conclusion
Module 2 - Right-Sizing vSphere VMs for Optimal Performance (45 minutes)
   Introduction
   NUMA and vNUMA
   vCPU and vNUMA Right-Sizing
   Guest OS Tools to View vCPUs/vNUMA
   Conclusion
Module 3 - Introduction to esxtop (30 minutes)
   Introduction to esxtop
   Show esxtop CPU features
   Show esxtop memory features
   Show esxtop storage features
   Show esxtop network features
   Conclusion and Clean-Up
Module 4 - esxtop in Real-World Use Cases (30 minutes)
   esxtop in Real-World Use Cases
   Creating an esxtop resource file
   Saving esxtop statistics with batch mode
   Graphing esxtop statistics
   Conclusion and Clean-Up
Module 5 - vCenter Performance Analysis (30 minutes)
   Introduction
   vCenter Server Appliance Management Interface (VAMI)
   Tools for Detailed Analysis: vimtop
   Tools for Detailed Analysis: vpxd profiler logs
   Tools for Detailed Analysis: PostgreSQL logs and pg_top
   Clients (UI and API) Performance Tips
   Conclusion and Clean-Up
Module 6 - Database Performance Testing with DVD Store (30 minutes)
   Introduction
   What is DVD Store 3?
   Downloading/Installing DVD Store 3
   Building a DVD Store 3 Database/Starting the Lab
   Configuring/Running DVD Store 3


   Analyzing Results/Improving DVD Store 3 Performance
   Conclusion and Clean-Up
Module 7 - Application Performance Testing with Weathervane (45 minutes)
   Introduction
   What is Weathervane?
   Downloading/Installing Weathervane
   Configuring Weathervane
   Running/Tuning Weathervane
   Conclusion and Clean-Up
Module 8 - Processor Performance Monitoring, Host Power Management (30 minutes)
   Intro to CPU Performance Monitoring and Host Power Management
   CPU Contention, vCenter Performance Charts
   Configuring Server BIOS Power Management
   Configuring ESXi Host Power Management
   Conclusion and Clean-Up
Module 9 - Memory Performance with X-Mem (30 minutes)
   Introduction
   What is X-Mem / Why X-Mem?
   Downloading/Installing X-Mem
   Running X-Mem
   Conclusion and Clean-Up
Module 10 - Storage Performance and Troubleshooting (30 minutes)
   Introduction to Storage Performance Troubleshooting
   Storage I/O Contention
   Storage Cluster and Storage DRS
   Conclusion and Clean-Up
Module 11 - Network Performance, Basic Concepts and Troubleshooting (15 minutes)
   Introduction to Network Performance
   Monitor network activity with performance charts
   Conclusion and Clean-Up
Module 12 - Advanced Performance Feature: Latency Sensitivity Setting (45 minutes)
   Introduction to Latency Sensitivity
   Enabling and Confirming the Latency Sensitivity setting
   Conclusion


Lab Overview - HOL-2004-01-SDC - Mastering vSphere Performance


Lab Introduction
This lab, HOL-2004-01-SDC, Mastering vSphere Performance, has a lot of content, broken down into modules. First, you'll learn what specifically is new and improved in the current vSphere 6.7 release. You will also work with a broad array of benchmarks, such as DVD Store, Weathervane, and X-Mem, and performance monitoring tools, such as esxtop and advanced performance charts, to both measure performance and diagnose bottlenecks in a vSphere environment. We also explore performance-related vSphere features such as right-sizing virtual machines, virtual NUMA, Latency Sensitivity, and Host Power Management.

While the time available in this lab constrains the number of performance problems we
can review as examples, we have selected relevant problems that are commonly seen in
vSphere environments. Walking through these examples can help you understand and
troubleshoot typical performance problems.

For the complete Performance Troubleshooting Methodology and a list of VMware Best
Practices, please visit the www.vmware.com website:

• https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/
techpaper/performance/whats-new-vsphere67-perf.pdf
• https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/
techpaper/performance/whats-new-vsphere65-perf.pdf
• https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/
techpaper/performance/drs-enhancements-vsphere67-perf.pdf
• https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/
techpaper/drs-vsphere65-perf.pdf

Furthermore, if you are interested in performance-related articles, make sure that you monitor the VMware VROOM! Blog:

https://blogs.vmware.com/performance/


Lab Guidance
Note: It takes more than 90 minutes to complete this lab. You should
expect to only finish two or three of the modules during your time. The
modules are independent of each other, so you can start at the
beginning of any module and proceed from there. You can use the Table
of Contents to access any module of your choosing at any point in the
lab.

You can find the Table of Contents in the upper right-hand corner of the
Lab Manual.

Lab Module List:

• Module 1 - vSphere 6.7 Performance: What's New? (30 minutes) (Basic)


• Module 2 - Right-Sizing vSphere VMs for Optimal Performance (45 minutes)
(Basic)
• Module 3 - Introduction to esxtop (30 minutes) (Basic)
• Module 4 - esxtop in Real-World Use Cases (30 minutes) (Intermediate)
• Module 5 - vCenter Performance Analysis (30 minutes) (Intermediate)
• Module 6 - Database Performance Testing with DVD Store (30 minutes)
(Basic)
• Module 7 - Application Performance Testing with Weathervane (45 minutes)
(Basic)
• Module 8 - Processor Performance Monitoring, Host Power Management (30
minutes) (Basic)
• Module 9 - Memory Performance with X-Mem (30 minutes) (Intermediate)
• Module 10 - Storage Performance and Troubleshooting (30 minutes) (Basic)
• Module 11 - Network Performance, Basic Concepts and Troubleshooting (15
minutes) (Basic)
• Module 12 - Advanced Performance Feature: Latency Sensitivity Setting (45
minutes) (Advanced)

Lab Captains:

• David Morse - Performance Engineer, US


• Lisa Roderick - Performance Engineer, US

This lab manual can be downloaded from the Hands-on Labs Document site found
here:

http://docs.hol.vmware.com

This lab may be available in other languages. To set your language preference and have a localized manual deployed with your lab, you may utilize this document to help guide you through the process:

http://docs.hol.vmware.com/announcements/nee-default-language.pdf


Location of the Main Console

1. The area in the RED box contains the Main Console. The Lab Manual is on the tab to the right of the Main Console.
2. A particular lab may have additional consoles found on separate tabs in the upper
left. You are directed to open another specific console if needed.
3. Your lab starts with 90 minutes on the timer. The lab cannot be saved. All your
work must be done during the lab session, but you can click EXTEND to increase
your time. If you are at a VMware event, you can extend your lab time twice for
up to 30 minutes; each click gives you an additional 15 minutes. Outside of
VMware events, you can extend your lab time up to 9 hours and 30 minutes; each
click gives you an additional hour.

Alternate Methods of Keyboard Data Entry

During this module, you will input text into the Main Console. Besides directly typing it in, there are two helpful methods of entering data that make it easier to enter complex data.


Click and Drag Lab Manual Content Into Console Active Window

You can also click and drag text and Command Line Interface (CLI) commands directly
from the Lab Manual into the active window in the Main Console.

Accessing the Online International Keyboard

You can also use the Online International Keyboard found in the Main Console.

1. Click on the Keyboard Icon found on the Windows Quick Launch Task Bar.


Click once in active console window

In this example, you will use the Online Keyboard to enter the "@" sign used in email
addresses. The "@" sign is Shift-2 on US keyboard layouts.

1. Click once in the active console window.


2. Click on the Shift key.

Click on the @ key

1. Click on the "@ key".

Notice the @ sign entered in the active console window.


Activation Prompt or Watermark

When you first start your lab, you may notice a watermark on the desktop indicating
that Windows is not activated.

One of the major benefits of virtualization is that virtual machines can be moved and
run on any platform. The Hands-on Labs utilizes this benefit, and we are able to run the
labs out of multiple datacenters. However, these datacenters may not have identical
processors, which triggers a Microsoft activation check through the Internet.

Rest assured, VMware and the Hands-on Labs are in full compliance with Microsoft
licensing requirements. The lab that you are using is a self-contained pod and does not
have full access to the Internet, which is required for Windows to verify the activation.
Without full access to the Internet, this automated process fails and you see this
watermark.

This cosmetic issue has no effect on your lab.


Look at the lower right portion of the screen

1. Please check to see that your lab has finished all the startup routines and is ready for you to start.

If you see anything other than "Ready", please wait a few minutes. If after five minutes
your lab has not changed to "Ready", please ask for assistance.


Module 1 - vSphere 6.7 Performance: What's New? (30 minutes)


Introduction
Underlying each release of VMware vSphere® are many performance and
scalability improvements. The vSphere 6.7 platform continues to provide industry-
leading performance and features to ensure the successful virtualization and
management of your entire software-defined datacenter.

Check the Lab Status in the lower-right of the desktop

Please check to see that your lab has finished all the startup routines and is ready for you to start. If you see anything other than "Ready", please wait a few minutes. If after 5 minutes your lab has not changed to "Ready", please ask for assistance.

Open Google Chrome

First, let's open Google Chrome.


Login to vCenter

This is the vCenter login screen. To login to vCenter:

1. Check the Use Windows session authentication checkbox


2. Click the LOGIN button

Select Hosts and Clusters

• Click on the Hosts and Clusters icon (if it isn't already underlined)


Faster Lifecycle Management


VMware vSphere 6.7 includes several improvements that accelerate the host
lifecycle management experience to save administrators valuable time.

New vSphere Update Manager Interface

To see the Update Manager in our lab environment:

1. Click on the Menu dropdown


2. Select Update Manager


Update Manager

This release of vSphere includes a brand-new Update Manager interface that is part
of the HTML5 Web Client.

1. Click the Updates tab


2. Click the ID column twice to sort by the most recent Update ID
3. Click a radio button to select an update (note: your environment may be
different, as new updates are continually released)
4. Use the vertical scrollbar to scroll down to see more information about the selected update

Update Manager in vSphere 6.7 keeps VMware ESXi 6.x hosts reliable and secure by
making it easy for administrators to deploy the latest patches and security fixes. When
the time comes to upgrade older releases to the latest version of ESXi 6.7, Update
Manager makes that task easy, too.

The new HTML5 Update Manager interface is more than a simple port from the old Flex
client – the new UI provides a much more streamlined remediation process. For
example, the previous multi-step remediation wizard is replaced with a much more
efficient workflow, requiring just a few clicks to begin the procedure. In addition to that, the pre-check is now a separate operation, allowing administrators to verify that a cluster is ready for upgrade before initiating the workflow.

As of vSphere 6.7 Update 1, the HTML5 Client is now ‘Fully Featured’. This means that you can manage all aspects of your vSphere environment using the HTML5-based vSphere Client; there is no need to switch back and forth between the vSphere Client and the vSphere Web Client. We’ve ported all features, including VMware Update Manager (VUM). Read about all the features released in this version of the vSphere Client by visiting the Functionality Updates for the vSphere Client site.

Faster Upgrades from ESXi 6.5 to 6.7

Hosts that are currently on ESXi 6.5 upgrade to 6.7 significantly faster than ever before.
This is because several optimizations have been made for that upgrade path, including
eliminating one of two reboots traditionally required for a host upgrade. In the past,
hosts that were upgraded with Update Manager were rebooted a first time in order to
initiate the upgrade process, and then rebooted once again after the upgrade was
complete.

Modern server hardware, equipped with hundreds of gigabytes of RAM, typically takes
several minutes to initialize and perform self-tests. Doing this hardware initialization
twice during an upgrade really adds up, so this new optimization will significantly
shorten the maintenance windows required to upgrade clusters of vSphere
infrastructure.

These new improvements reduce the overall time required to upgrade clusters,
shortening maintenance windows so that valuable efforts can be focused
elsewhere.

Recall that, because of DRS and vMotion, applications are never subject to
downtime during hypervisor upgrades – VMs are moved seamlessly from host to host
as needed.

ESXi 6.7 Update Manager Video (3:48)

Since this lab runs in the cloud, it is not practical to upgrade an ESXi host to 6.7.
Instead, check out this video to see how the process works:


vSphere Quick Boot

vSphere 6.7 introduces vSphere Quick Boot – a new capability designed to reduce the
time required for a VMware ESXi host to reboot during update operations.

Host reboots occur infrequently but are typically necessary after activities such as
applying a patch to the hypervisor or installing a third-party component or driver.
Modern server hardware that is equipped with large amounts of RAM may take many
minutes to perform device initialization and self-tests.

Quick Boot eliminates the time-consuming hardware initialization phase by shutting down ESXi in an orderly manner and then immediately re-starting it. If it takes several
minutes or more for the physical hardware to initialize devices and perform necessary
self-tests, then that is the approximate time savings to expect when using Quick Boot!
In large clusters that are typically remediated one host at a time, it’s easy to see how
this new technology can substantially shorten time requirements for data center
maintenance windows.

Quick Boot video (1:53)

Since this lab runs in the cloud, we can't show a reboot of a physical host. Instead,
check out this video to see how it works!


Conclusion

The new streamlined Update Manager interface, single reboot upgrades, and vSphere
Quick Boot shorten the time required for host lifecycle management operations and
make VMware vSphere 6.7 the Efficient and Secure Platform for your Hybrid Cloud.


vCenter Server 6.7


vSphere 6.7 delivers an exceptional experience with an enhanced VMware
vCenter® Server Appliance™ (vCSA). vSphere 6.7 adds functionality to support
not only the typical workflows customers need but also other key functionality
such as managing VMware NSX®, VMware vSAN™, VMware vSphere® Update
Manager™ (VUM), as well as third-party components.

2X faster performance in vCenter operations per second

With their benchmark vcbench, VMware performance engineers measured the number
of operations per second (throughput) that vCenter produced.

This benchmark stresses the vCenter server by performing typical vCenter operations, such as powering a VM on and off, among several others. vCenter 6.7 performs 16.7 operations per second, which is a twofold increase over the 8.3 operations per second that vCenter 6.5 produced.

3X faster operations, 3X reduction in memory usage

Before vCenter can power on a VM, it first consults several sub-systems, including DRS,
to support the initial placement of the VM on a vSphere host. Latency, in this context, is
the measure of the duration of this process. VMware made many optimizations in the
coordination of these sub-systems to reduce power-on latency from 9.5 seconds to
2.8 seconds.

VMware also optimized the core vCenter process (vpxd) to use much less memory (a 3x
reduction!) to complete the same workloads.

vCenter Performance Analysis

For more information about vCenter Performance, check out vCenter Performance
Analysis, a module later in this lab.


Core Platform Improvements


Let's look at the variety of improvements that vSphere 6.7 brings: host
maximums, new scheduler options, large memory pages, per-VM EVC, virtual
hardware versions 14 and 15, persistent memory (PMEM/NVDIMM), virtualization-
based security (VBS), and Instant Clone.

Host Scalability

There are some minor improvements to vSphere 6.7 ESXi host maximums worth noting.

• Host processor maximums increased from 576 to 768 logical CPUs.


• Host memory maximum increased from 12 TB to 16 TB physical RAM.

1 GB Large Memory Pages

Applications with large memory footprints, like SAP HANA, can often stress the hardware
memory subsystem (that is, Translation Lookaside Buffer, or TLB) with their access
patterns. Modern processors can mitigate this performance impact by creating larger
mappings to memory and increasing the memory reach of the application. In prior
releases, ESXi allowed guest operating system memory mappings based on 2 MB page
sizes. This release introduces memory mappings for 1 GB page sizes.

As shown in this figure, there is up to a 26% improvement in 1 GB memory access performance, compared to the 2 MB page size, through more efficient use of the TLB and processor L1-L3 cache.


To enable this advanced attribute, see Backing Guest vRAM with 1GB Pages at
https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.resmgmt.doc/
GUID-F0E284A5-A6DD-477E-B80B-8EFDF814EE01.html
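If you prefer to apply this from the console's PowerShell window, here is a minimal PowerCLI sketch (not part of the lab scripts). It assumes the per-VM advanced option named in the document linked above, sched.mem.lpage.enable1GPage; verify the option name and prerequisites there before using it.

# Assumes Connect-VIServer has already been run against vCenter.
$vm = Get-VM -Name 'perf-worker-01a'

# Add the 1 GB page backing option to the VM's configuration (illustrative only;
# the setting takes effect after the VM is power-cycled and only helps workloads
# that actually benefit from 1 GB pages).
New-AdvancedSetting -Entity $vm -Name 'sched.mem.lpage.enable1GPage' -Value 'TRUE' -Confirm:$false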

CPU Scheduler Enhancements

Scalability of the vSphere ESXi CPU scheduler is always being improved release-to-
release to support current and future requirements. New in vSphere 6.7 is the
elimination of the last global lock, which allows the scheduler to support tens of
thousands of worlds (various processes running in the VMkernel; for example, each
virtual CPU has a world associated with it). This feature ensures vSphere maintains its
lead as a platform for containers and microservices.

In vSphere 6.7 U2, there is a new scheduler option called the side-channel aware
scheduler to address a security vulnerability known as L1TF. For more information,
including performance test results, see this blog: https://blogs.vmware.com/
performance/2019/05/new-scheduler-option-for-vsphere-6-7-u2.html
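If you want to check whether a host is already using one of these scheduler options, a hedged PowerCLI sketch is shown below; it assumes the VMkernel boot options described in the blog above are exposed as host advanced settings (verify the exact option names against current VMware guidance before changing anything).

# List any hyperthreading-mitigation (side-channel aware scheduler) boot options on a host.
$esx = Get-VMHost -Name 'esx-01a.corp.local'
Get-AdvancedSetting -Entity $esx -Name 'VMkernel.Boot.hyperthreadingMitigation*' |
    Select-Object Name, Value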

Per-VM EVC

vSphere previously implemented Enhanced vMotion Compatibility (EVC) as a cluster-wide attribute because, at the cluster-wide level, you can make certain
assumptions about migrating a VM (for example, even if the processor is not the same
across all ESXi hosts, EVC still works). However, this policy can cause problems when
you try to migrate across vCenter hosts or vSphere clusters. By implementing per-VM
EVC, the EVC mode becomes an attribute of the VM rather than the specific processor
generation it happens to be booted on in the cluster.

Let's configure EVC for a specific VM:

1. Click on Menu then Hosts and Clusters (it should be underlined)


2. Select the perf-worker-01a VM
3. Click the Configure tab
4. Select VMware EVC from the list


5. You'll note that "EVC is Disabled". Click the EDIT... button to see what the
choices are.

1. Click the Enable EVC for Intel hosts radio button


2. Click the VMware EVC Mode dropdown and choose Intel "Haswell" Generation. Read the Description of this mode, which indicates this would restrict the VM to Haswell or future-generation Intel processors only.
3. Click Cancel (since this is just an example and we don't actually want to apply
EVC in the lab)


Virtual Hardware 14

Virtual Hardware 14 adds support for:

• Persistent memory, with a maximum of:


- 1 NVDIMM controller per VM
- 64 NVDIMMs per VM
- 1 TB non-volatile memory per VM
• Virtual Trusted Platform Module (vTPM) - VMware created a new, vTPM 2.0
device to enable Microsoft Virtualization-based Security (VBS).
• Per-VM EVC (see previous section)

Verify that perf-worker-01a VM is running Virtual Hardware version 14:

1. Ensure you're in the Hosts and Clusters view (it should be underlined)
2. Select the perf-worker-01a VM
3. Click the Summary tab
4. Note that it has been configured with VM version 14, which is only compatible
with ESXi 6.7 and later.

Next, we'll show how to upgrade a legacy VM to this version.


Upgrade a VM to HW v14

To upgrade a VM with an older Virtual Hardware version to version 14:

1. Select the perf-worker-02a VM


2. Click the Summary tab. Note that this VM states "Compatibility: ESXi 6.0 and
later (VM version 11)".
3. Click the ACTIONS dropdown
4. Scroll down to Compatibility
5. Click Upgrade VM Compatibility...


The warning states that you should make a backup of your VM, since it is an irreversible
operation that makes your VM incompatible with earlier versions of vSphere.

Click YES to acknowledge this warning.

We can choose what hardware version to upgrade perf-worker-02a to. By default, it is set to "ESXi 6.7 and later" (HW version 14), which is what we want.

Click OK to confirm the upgrade.

Confirm that the VM now states "Compatibility: ESXi 6.7 and later (VM version
14)"

Congratulations. You have upgraded this VM to use the latest vSphere 6.7
enhancements!
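As an aside, the same compatibility upgrade can be scripted. The sketch below is a hedged PowerCLI alternative to the UI steps above (not part of the lab); it calls the vSphere API method behind the Upgrade VM Compatibility action and, like the UI warning states, the change is irreversible and the VM must be powered off.

$vm = Get-VM -Name 'perf-worker-02a'

# Current hardware version, e.g. 'vmx-11' before the upgrade
$vm.ExtensionData.Config.Version

# Upgrade the virtual hardware to version 14 ('ESXi 6.7 and later')
$vm.ExtensionData.UpgradeVM_Task('vmx-14') | Out-Null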


Virtual Hardware 15 (ESXi 6.7 Update 2 and later)

Virtual Hardware 15, which is only supported for ESXi 6.7 U2 (and later) hosts, increases
the maximum number of logical processors from 128 to 256.


Persistent Memory (PMEM)

Persistent memory (PMEM) is a type of non-volatile DRAM (NVDIMM) that has the speed of DRAM but retains its contents through power cycles. It is a new layer that sits between NAND flash and DRAM and provides faster performance than flash. It is also non-volatile, unlike DRAM.

vSphere 6.7 supports two modes of accessing persistent memory:

• vPMEMDisk - presents NVDIMM capacity as a local host datastore, which requires no guest operating system changes to leverage this technology.

• vPMEM - exposes NVDIMM capacity to the virtual machine through a new virtual
NVDIMM device. Guest operating systems use it directly as a block device or in DAX
mode.

This chart shows the result of a performance test run using the MySQL benchmark of
Sysbench. The benchmark measures the throughput and latency of a MySQL workload.
Here, we ran the tests with three tables, nine threads, and an 80-20 read-write ratio
with a MySQL server in a VM hosted on vSphere 6.7.

The blue bars show throughput measured in transactions per second. The green line
shows latency measured as the 95th percentile in milliseconds.

We observe that virtual PMEM can improve performance by up to 1.8x better throughput and 2.3x better latency over standard SSD technology.


vSphere 6.7 Persistent Memory Video (2:00)

Check out this video to learn more about how vSphere Persistent Memory can significantly enhance performance for both existing and new applications.

Virtualization-based Security (VBS) Overview


Microsoft VBS, a feature of Windows 10 and Windows Server 2016 operating systems,
uses hardware and software virtualization to enhance system security by creating an
isolated, hypervisor-restricted, specialized subsystem. Starting with vSphere 6.7 and
Virtual Hardware 14, you can enable Microsoft virtualization-based security (VBS) on
supported Windows guest operating systems.

VMware engineering made a number of vSphere feature changes and enhancements to improve performance in VBS-enabled virtual machines.

To measure the performance of a vSphere 6.7 virtual machine running Windows with VBS enabled, we used the HammerDB benchmark. The test simulated 22 virtual users generating an OLTP, TPC-C-like workload that wrote to a Microsoft SQL Server 2016 database.

As shown, these engineering efforts resulted in a 33% improvement in transactions per minute.

Creating a VBS-enabled VM


Let's create a VBS-enabled VM:

1. Ensure you're in the Hosts and Clusters view


2. Select esx-02a.corp.local as the host
3. Select the ACTIONS dropdown
4. Select New Virtual Machine...


Create a new virtual machine will be highlighted. Click the NEXT button.


Type a name for the VM (e.g., VBS) and click NEXT.


esx-02a.corp.local should already be selected as the host. Click NEXT.


Select the RegionA01-ISCSI02 datastore and click NEXT.


Select the virtual machine version. By default, ESXi 6.7 and later is selected, which is
required for VBS, so click NEXT.


1. Ensure Windows is selected as the Guest OS Family


2. Change Guest OS Version to Microsoft Windows Server 2016 (64-bit)
3. Note there is a new checkbox for VBS, Enable Windows Virtualization Based
Security. Check this box.
4. Click NEXT.


1. Click VM Options
2. Expand the Boot Options section.
Note that by enabling VBS, the necessary options such as EFI firmware and
Secure Boot are required, and automatically set.
3. Click NEXT.

Note that Virtualization Based Security is Enabled, allowing easy provisioning of a VBS-enabled Windows Server 2016 VM, with the VBS prerequisites automatically set!

Click CANCEL, as we are not going to continue installing the guest.


Instant Clone

The time to fully deploy and boot 64 clones using vSphere 6.7 Instant Clone showed
approximately 2.8x improvement over the older Linked Clone architecture.

You can use Instant Clone technology to create powered-on virtual machines from the
running state of another powered-on virtual machine. The result of an Instant Clone
operation is a new virtual machine that is identical to the source virtual machine. With
Instant Clone, you can create new virtual machines from a controlled point in time.
Instant cloning is very convenient for large-scale application deployments because it
ensures memory efficiency and allows for creating numerous virtual machines on a
single host.


This Instant Clone video demonstration shows how 20 CentOS VMs can be provisioned in
two minutes (credit: LearnVMware.online). The magic happens around 3:11 if you want
to skip ahead!


Conclusion
Based on these performance, scalability, and feature improvements in vSphere
6.7, VMware continues to demonstrate industry-leading performance.

You've finished Module 1

Congratulations on completing Module 1.

If you are looking for additional information on vSphere 6.7 performance, check out
these links:

• VROOM! blog at https://blogs.vmware.com/performance/


• vSphere blog at https://blogs.vmware.com/vsphere
• What's New in vSphere 6.7 whitepaper at https://www.vmware.com/content/
dam/digitalmarketing/vmware/en/pdf/products/vsphere/vmware-whats-new-in-
vsphere-whitepaper.pdf
• What's New in Performance? VMware vSphere 6.7 at
https://www.vmware.com/techpapers/2018/whats-new-vsphere67-perf.html

Test Your Skills!

Now that you’ve completed this lab, try testing your skills with VMware Odyssey, our
newest Hands-on Labs gamification program. We have taken Hands-on Labs to the next
level by adding gamification elements to the labs you know and love. Experience the
fully automated VMware Odyssey as you race against the clock to complete tasks and
reach the highest ranking on the leaderboard. Try the vSphere Performance Odyssey lab:

• HOL-2004-04-ODY - VMware Odyssey - vSphere Performance - Advanced Game


Module 2 - Right-Sizing vSphere VMs for Optimal Performance (45 minutes)


Introduction

Meet Melvin the Monster VM! vSphere 6.5 and later can handle Melvin and any
other large, business-critical workloads (known affectionately as "wide" or
"monster" VMs) without breaking a sweat! :-)

In all seriousness, this module discusses rules of thumb for right-sizing VMs --
particularly those that are so large that they span multiple physical processor or
memory node boundaries. We throw around terms such as vCPUs, pCPUs, Cores per Socket, and NUMA (pNUMA and vNUMA), and show how to right-size these VMs to perform optimally.


NUMA and vNUMA


UMA, NUMA, and vNUMA, oh my! Let's look at these acronyms and see what they
look like from an architectural perspective.

UMA

This is a bit of a history lesson, as UMA, or Uniform Memory Access, is no longer how
modern servers are designed. The reason why?

The Memory Controller (highlighted) quickly became a bottleneck; it is easy to see why, as every CPU requesting memory or I/O had to pass through this layer. (Credit: frankdenneman.nl)

NUMA

NUMA moves away from a centralized pool of memory and introduces the concept of a topology. By classifying memory locations based on signal path length from the processor to the memory, latency and bandwidth bottlenecks can be avoided. This is done by redesigning the whole system of processor and chipset. NUMA architectures gained popularity at the end of the 90's, when they were used in SGI supercomputers such as the Cray Origin 2000. NUMA helped to identify the location of memory; in the case of these systems, the question was which memory region in which chassis was holding the memory bits.


In the first half of the 2000s, AMD brought NUMA to the enterprise landscape, where UMA systems reigned supreme. In 2003, the AMD Opteron family was introduced, featuring integrated memory controllers, with each CPU owning designated memory banks. Each CPU now has its own memory address space. A NUMA-optimized operating system such as ESXi allows workloads to consume memory from both memory address spaces while optimizing for local memory access. Let's use an example of a two-CPU system to clarify the distinction between local and remote memory access within a single system:

(Credit: frankdenneman.nl)

The memory connected to the memory controller of CPU1 is considered to be local memory. Memory connected to another CPU socket (CPU2) is considered to be foreign or remote for CPU1. Remote memory access has additional latency overhead compared to local memory access, since it has to traverse an interconnect (point-to-point link) and connect to the remote memory controller. As a result of the different memory locations, this system experiences “non-uniform” memory access time.


Without vNUMA

In this example, a VM with 12 vCPUs is running on a host with four NUMA nodes with six
cores each. This VM is not being presented with the physical NUMA configuration and
hence the guest OS and application see only a single NUMA node. This means that the
guest has no chance of placing processes and memory within a physical NUMA node.

We have poor memory locality.


With vNUMA

Since vSphere 5, ESXi has had the vNUMA (virtual NUMA) feature that can
present multiple NUMA nodes to the guest operating system. Traditionally, virtual
machines have only been presented with a single NUMA node regardless of the
size of the VM and its underlying hardware. Larger and larger workloads are being
virtualized, so it has become increasingly important that the guest OS and
applications can make decisions on where to execute applications and where to
place memory.

VMware ESXi is NUMA aware and always tries to fit a VM within a single physical
NUMA node when possible. However, with very large "monster VMs", this isn't
always possible.

The purpose of this section is to gain understanding of how vNUMA works by itself
and in combination with the cores per socket feature.

In this example, a VM with 12 vCPUs is running on a host that has four NUMA nodes with
six cores each. This VM is being presented with the physical NUMA configuration, and
hence the guest OS and application see two NUMA nodes. This means that the guest
can place processes and accompanying memory within a physical NUMA node when
possible.

We have good memory locality.


vCPU and vNUMA Right-Sizing


Using virtualization, we enjoy the flexibility to quickly create virtual machines with
various virtual CPU (vCPU) configurations for a diverse set of workloads.

However, as we virtualize larger and more demanding workloads such as databases on top of the latest generations of processors with up to 28 cores, special care must be taken in vCPU and vNUMA configuration to ensure optimal performance.

vCPUs, Cores per Socket, vSockets, CPU Hot Plug/Hot Add

The most important values are shown in this screenshot, taken directly from the
vSphere Web Client.

NOTE: You must expand the CPU dropdown to view/change some of these fields!

1. CPU: This is the total number of vCPUs presented to the guest OS (20 in this example)
2. Cores per Socket: If this value is 1 (the default), all CPUs are presented to the
guest as single-core processors. For most VMs, the default value is OK, but
there are definitely instances when you should consider increasing this
value, which we'll discuss in a bit.

In this example, we've increased it to 10, which means the guest will see multi-core (10-core) processors.
3. Sockets: This is not a configurable value; it is simply the number of CPUs divided by Cores per Socket: in this example, 20 / 10 = 2.
Also called "virtual sockets" or "vSockets".
4. CPU Hot Plug: Also known as CPU Hot Add, this is a checkbox to allow adding
more CPUs "on the fly" (while the guest is powered on).

If you have right-sized your VM from the beginning, you should not enable this
feature, because it has the major downside of disabling vNUMA. For more
information, see vNUMA is disabled if VCPU hotplug is enabled (KB 2040375)

Let's refer to this 20 vCPU VM, as configured, as 2 Sockets x 10 Cores per Socket.
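If you prefer the command line, here is a hedged PowerCLI sketch (assuming the console's PowerShell window is connected to vCenter) that reads the same values for a lab VM straight from the vSphere API:

$vm = Get-VM -Name 'perf-worker-01a'
$hw = $vm.ExtensionData.Config.Hardware

# vSockets is derived the same way the UI derives it: CPUs / Cores per Socket
[pscustomobject]@{
    vCPUs          = $hw.NumCPU
    CoresPerSocket = $hw.NumCoresPerSocket
    vSockets       = $hw.NumCPU / $hw.NumCoresPerSocket
    CpuHotAdd      = $vm.ExtensionData.Config.CpuHotAddEnabled
}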

Cores per Socket: Licensing Considerations

Let's talk about the Cores per Socket value. As mentioned earlier, this defaults to 1, which means that every virtual CPU is presented as a socket to the guest VM. In most cases, there's no issue there.

However, this may not be ideal from a Microsoft licensing perspective, where the operating system and/or application is sometimes licensed per processor. Here are a few examples:

• Since both Windows Server 2012 and 2016 only support up to 64 sockets,
creating a “monster” Windows VM with more than 64 vCPUs requires an increase
in Cores per Socket so the guest can consume all the assigned processors.
• A virtual machine with 8 Sockets x 1 Core per Socket, hosting a single Microsoft
SQL Server 2016 Standard Edition license, would only be able to consume 4 of
the 8 vCPUs since that edition’s license limits to “lesser of 4 sockets or 24
cores”. If the virtual machine is configured with 1 Socket x 8 Cores per Socket,
all 8 vCPUs could be leveraged: https://msdn.microsoft.com/en-us/library/
ms143760.aspx
• A VM created with 16 vCPUs and 2 Cores per Socket hosting Microsoft SQL
Server 2016 Enterprise Edition, may behave differently than a VM configured
with 16 vCPUs and 8 Cores per Socket. This is due to the soft-NUMA feature
within SQL Server which gets automatically configured based on the number of
cores the OS can use: https://msdn.microsoft.com/en-us/library/ms345357.aspx

vNUMA Behavior Changes in vSphere 6.5 and above

In an effort to automate and simplify configurations for optimal performance, vSphere 6.5 introduced a few changes in vNUMA behavior. Thanks to Frank Denneman for thoroughly documenting them here:


http://frankdenneman.nl/2016/12/12/decoupling-cores-per-socket-virtual-numa-
topology-vsphere-6-5/

Essentially, the vNUMA presentation under vSphere 6.5/6.7 is no longer affected by Cores per Socket. vSphere will now always present the optimal vNUMA topology (unless you use advanced settings).

However, you should still choose the CPU and Cores per Socket values wisely. Read on
for some best practices.

Best Practices for Cores per Socket and vNUMA

In general, the following best practices should be followed regarding vNUMA and Cores
per Socket:

• Configure the VM CPU value equal to Cores per Socket, until you exceed the
physical core count of a single physical NUMA node.

Example: for a host with 8-core processors, any VM with 8 (or fewer) CPUs should
have the same Cores Per Socket value.

• When you need to configure more vCPUs than there are physical cores in the
NUMA node, evenly divide the vCPU count across the minimum number of
NUMA nodes.

Example: for a 4-socket, 8-core host, and the VM needs more than 8 vCPUs, a
reasonable choice may include a 16 vCPU VM with 8 Cores per Socket (to
match the 8-core processor architecture)

• Don’t assign an odd number of vCPUs when the size of your virtual machine
exceeds a physical NUMA node.

Example: for a 2-socket, 4-core host, do not create a VM with 5 or 7 vCPUs.

• Don’t exceed the total number of physical cores of your host.

Example: for a 2-socket, 4-core host, do not create a VM with more than 8 vCPUs.

• Don’t enable vCPU Hot Add. (this disables virtual NUMA)

• "Right-size" monster VMs to be multiples of physical NUMA size.

Example: on an 8 cores/node system, 8/16/24/32 vCPU; for a 10 cores/node system, 10/20/40 vCPU

There are many Advanced Virtual NUMA Attributes (click for a full list); here are a few guidelines (a PowerCLI example of applying one follows the list), but in general, the defaults are best:


• If the VM is larger than total physical core count (e.g. a 64 vCPU VM on a 40 core /
80 thread host), try numa.consolidate = false
• If Hyper-Threading is enabled (usually the default), numa.vcpu.preferHT=true
may help (KB 2003582)
• If Cores per Socket is too restrictive, you can manually set vNUMA size with
numa.vcpu.maxPerMachineNode
• To enable vNUMA on a VM with 8 or fewer vCPUs, use numa.vcpu.min
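Here is a hedged PowerCLI sketch of how one of these per-VM advanced options could be applied (an illustration only; as noted above, the defaults are usually best, and any change should be validated for your workload):

$vm = Get-VM -Name 'perf-worker-01a'

# Example: prefer Hyper-Threaded logical processors on the same NUMA node (see KB 2003582)
New-AdvancedSetting -Entity $vm -Name 'numa.vcpu.preferHT' -Value 'TRUE' -Confirm:$false

# Review any NUMA-related advanced options already set on the VM
Get-AdvancedSetting -Entity $vm -Name 'numa.*' | Select-Object Name, Value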

Of course, a picture (or in this case, a table) is worth a thousand words. This table
outlines how a VM could (should) be configured on a dual-socket, 10-core physical host
to ensure an optimal vNUMA topology and performance, regardless of vSphere version.


Guest OS Tools to View vCPUs/vNUMA


We saw how to use the vSphere Client to right-size a virtual machine's vCPUs and
Cores per Socket.

What do these topologies look like from the guest OS perspective? Let's look at
some examples of tools for Windows and Linux that let us verify that the guest is
showing the expected processor and NUMA configurations.

vSphere Client CPU/Cores per Socket Example

Although shown before, it is worth repeating:

1. CPU: This is the total number of vCPUs presented to the guest OS (20 in this example)
2. Cores per Socket: If this value is 1 (the default), all CPUs are presented to the
guest as single-core processors.

For most VMs, the default value is OK, but there are definitely instances when
you should consider increasing this value, which we'll discuss in a bit.

In this example, we've increased it to 10, which means the guest will see multi-core (10-core) processors.

3. Sockets: This is not a configurable value; it is simply the number of CPUs divided by Cores per Socket: in this example, 20 / 10 = 2.

Also called "virtual sockets" or "vSockets".

4. CPU Hot Plug: Also known as CPU Hot Add, this is a checkbox to allow adding
more CPUs "on the fly" (while the guest is powered on).


If you have right-sized your VM from the beginning, you should not enable this
feature, because it has the major downside of disabling vNUMA.

Let's refer to this 20 vCPU VM, as configured, as 2 Sockets x 10 Cores per Socket.

Windows: Coreinfo

From the Microsoft Sysinternals web site:

Coreinfo is a command-line utility that shows you the mapping between logical processors and the physical processor, NUMA node, and socket on which they reside, as well as the caches assigned to each logical processor. It uses the Windows GetLogicalProcessorInformation function to obtain this information and prints it to the screen, representing a mapping to a logical processor with an asterisk, e.g. ‘*’.

Coreinfo is useful for gaining insight into the processor and cache topology of your
system.


Parameter Description
-c Dump information on cores.
-f Dump core feature information.
-g Dump information on groups.
-l Dump information on caches.
-n Dump information on NUMA nodes.
-s Dump information on sockets.
-m Dump NUMA access cost.
-v Dump only virtualization-related features
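For example, to limit the output to just the core, socket, and NUMA node maps, you could run Coreinfo from a command prompt or PowerShell window inside the guest (this assumes Coreinfo.exe has been downloaded to the current directory):

.\Coreinfo.exe -c -s -n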

Here we see the output of coreinfo (with no command line options) on the
aforementioned 20 vCPU VM. Here is a breakdown of the highlights:


1. Logical to Physical Processor Map: This section confirms Windows sees 20 vCPUs (note that it presents them as Logical and Physical Processors, with a 1:1 mapping)
2. Logical Processor to Socket Map: This section confirms Windows sees 2 Sockets, with 10 Logical Processors on each Socket. We can also refer to these as vSockets.
3. Logical Processor to NUMA Node Map: This section confirms that Windows sees 2 NUMA Nodes, with 10 Logical Processors on each Node. Since this is a VM, we call these vNUMA nodes.

Linux: numactl

For Linux, the most useful tool to gain information about virtual NUMA is numactl. Note that you may need to install the package that provides the numactl tool for your OS (for RHEL/CentOS 7, an appropriate command is yum install numactl).


Here we see the output of numactl -H (the -H is an abbreviation for hardware; use the
man numactl command to see all of the available parameters). Here is a quick
explanation:

1. numactl -H: This is the command we typed to get the output.


2. available: 2 nodes (0-1): This section confirms Linux sees 2 NUMA nodes,
also known as vNUMA nodes.
3. node 0 cpus, node 1 cpus: This section confirms Linux sees 10 logical
processors on each NUMA node (20 vCPUs total).


Conclusion
Congratulations! You now know how to right-size VMs for optimal performance on vSphere 6.7!

Resources/Helpful Links

For more information about right-sizing VMs, NUMA/vNUMA, and vSphere performance in
general, here are some helpful links:

• https://blogs.vmware.com/performance/2017/03/virtual-machine-vcpu-and-
vnuma-rightsizing-rules-of-thumb.html
• http://frankdenneman.nl/2016/07/06/introduction-2016-numa-deep-dive-series/


Module 3 - Introduction to esxtop (30 minutes)


Introduction to esxtop
There are several tools to monitor and diagnose performance in vSphere environments.
esxtop helps you diagnose and further investigate performance issues that you've already identified through the vSphere Client or another tool or method. esxtop is not a
tool designed for monitoring performance over the long term but is great for deep
investigation or monitoring a specific issue or VM on a specific host over a defined
period of time.

In this lab, which should take about 30 minutes, we use esxtop to dive into performance troubleshooting of CPU, memory, storage, network, and power utilization.
The goal of this module is to expose you to the different views in esxtop and to present
you with different loads in each view. This is not meant to be a deep dive into esxtop
but to get you comfortable with this tool so that you can use it in your own environment.

To learn more about the metrics in esxtop and what they mean, we recommend that you
look at the links at the end of this module.

For day-to-day performance monitoring of an entire vSphere environment, the VMware vRealize® suite offers a hybrid cloud management platform that provides comprehensive management of IT services on VMware vSphere and other hypervisors.
vRealize Operations™ (vROPs) is a powerful application you can use to monitor your
entire virtual infrastructure. It incorporates high-level dashboard views, custom
dashboards, and built-in intelligence to analyze the data and identify possible problems.
We also recommend that you look at the other VMware vRealize Hands On Labs when
you have finished with this one for better understanding of day-to-day monitoring.


Show esxtop CPU features


You can use esxtop to diagnose performance issues involving almost any aspect of performance from both the host and virtual machine perspectives. This section shows you how to view both VM and host CPU performance using esxtop in interactive mode.

Monitor VM vCPU load

Open a PowerShell window

Click on the "Windows PowerShell" icon in the taskbar.

Start CPU load on VMs

Type

.\StartCPUTest2.ps1

and press Enter. Wait until you see the RDP sessions before continuing.

Open PuTTY

Click the PuTTY icon on the taskbar.


SSH to esx-01a

1. Select host esx-01a.corp.local


2. Click Open

Start esxtop


1. From the ESXi shell, type

esxtop

and press Enter.

2. Click the Maximize icon so we can see the maximum amount of information.

Select the CPU view

If you just started esxtop, you are in the CPU view by default.

If you happen to be on a different screen, pressing "c" gets you back to this view.

By default the screen will be refreshed every five seconds. To change this, for example
to set the refresh rate to two seconds, press "s 2" then press Enter:

s 2

Let's filter this view (remove some fields) by pressing the letter "f":


Filter the fields displayed

Since we don't have much screen space, let's remove (filter out) the ID and GID fields.

Do this by typing the following letters (NOTE: Make sure these are capitalized as these
are case sensitive!)

AB

You should see the * next to A: and B: disappear. Press Enter to resume the esxtop
screen.

Filter only VMs

By default, this screen shows performance counters for both virtual machines and ESXi
host processes.


Let's filter out everything except for virtual machines. To do this, type a capital "V":

Monitor VM load

Monitor the load on the two Worker VM's: perf-worker-01a and perf-worker-01b:

1. Both VMs should be running at or near 100% utilization (%USED). If not, wait a moment and let the CPU workload start up.
2. Another important metric to monitor is %RDY (CPU Ready). This metric is the
percentage of time a “world” is ready to run but waiting on the CPU scheduler for
approval. This metric can go up to 100% per vCPU, which means that with two
vCPUs, it has a maximum value of 200%. A good guideline is to ensure this value
is below 5% per vCPU, but it always depends on the application.

Look at the worker VMs to see if they go above the 5% per vCPU threshold. To force esxtop to refresh immediately, press the Space bar.
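If you want to cross-check CPU Ready outside of esxtop, a hedged PowerCLI sketch is shown below (not part of the lab scripts). It pulls the latest real-time cpu.ready.summation samples from vCenter; real-time samples cover a 20-second (20,000 ms) interval, so ready % = (ready ms / 20000) * 100 for each instance.

$vm = Get-VM -Name 'perf-worker-01a'
Get-Stat -Entity $vm -Stat 'cpu.ready.summation' -Realtime -MaxSamples 1 |
    Select-Object Timestamp, Instance,
        @{Name = 'ReadyPct'; Expression = { [math]::Round(($_.Value / 20000) * 100, 2) }}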

Open Google Chrome

Click the Google Chrome icon to open up a Web browser.


Login to the vSphere Client

1. Make sure the Use Windows session authentication box is checked.


2. Click the Login button to login to the vSphere Client.


Edit Settings of perf-worker-01a

Let's see how perf-worker-01a is configured:

1. Click on perf-worker-01a, which is hosted on esx-01a.corp.local


2. Click the Actions dropdown
3. Click Edit Settings…


Add a vCPU to perf-worker-01a

Since CPU Hot Add is already enabled on this VM, we can add another vCPU while it is running:

1. Expand the CPU dropdown


2. Change CPU to 2
3. Click OK to save


Edit Settings of perf-worker-01b

Let's add a virtual CPU to perf-worker-01b as well to improve performance.

1. Right click on the perf-worker-01b virtual machine


2. Click Edit Settings…


Add a vCPU to perf-worker-01b

1. Change CPU to 2
2. Click OK to save
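For reference, the same change can be made with a single hedged PowerCLI command (CPU Hot Add must already be enabled on the VMs for this to succeed while they are powered on):

# Set both worker VMs to 2 vCPUs
Get-VM -Name 'perf-worker-01a', 'perf-worker-01b' | Set-VM -NumCpu 2 -Confirm:$false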

Switch back to esxtop/PuTTY

Return to the PuTTY (esxtop) window by clicking esx-01a.corp.local on the taskbar to see what has changed.

Monitor %USED and %RDY

Now that you've added an additional vCPU to each VM, you should see results like the
screenshot above:


• As expected, the vCPU count has increased from 2 to 4.


• The %USED is still only around 100, which means that the CPU benchmark is still
only using one vCPU per virtual machine.
• %IDLE is now around 100, which means that one vCPU is idle.
• %RDY has increased, which means that even if the additional vCPU is not being
used yet, it causes some additional CPU ready time. This is due to the additional
overhead of scheduling SMP virtual machines. This is also why right-sizing your
virtual machines is important if you want to optimize resource consumption.

Monitor %USED and %RDY (continued)

After a few minutes, the CPU benchmark starts to use the additional vCPUs and %RDY
increases even more. This is due to CPU contention and SMP scheduling
(increased %CSTP) on the system. The ESXi host has two active virtual machines, each
with two vCPUs, and these four vCPUs all attempting to run at 100% compete for the
host's physical CPU resources. Remember that the ESXi host itself also needs some
physical CPU resources to run, which adds to the contention.


Monitor host CPU power


A power screen added in vSphere 6.5 lets you monitor the host CPU power statistics in esxtop. To
view the host power screen in esxtop, type a lowercase "p":

Press the letter "f" to see available fields to add to the screen:


Press the letter "f" again to add %Aperf/Mperf then press Enter:

This screen shows:

1. Current power usage in watts


2. The number of processors
3. CPU %USED and %UTIL
4. Turbo boost as a ratio of clock speeds with and without Turbo (%A/MPERF)

The metric to watch is:

%A/MPERF:

This ratio column identifies the frequency at which the processor is currently running. APERF
and MPERF are two hardware registers that keep track of the actual frequency and
nominal frequency of the processor. You can't see real values here because of the
nested nature of the Hands On Lab.

However, look at the following image captured from a physical host. It shows a host
running VMware vSphere 6.7 U2 with 36 logical CPUs (18 physical CPUs with
Hyperthreading enabled) each at 2.8 GHz. The host serves two VMs, and we started a
CPU-intensive quad-threaded process on each VM to generate load.


Actual and Nominal Frequency

1. The host is using 358 watts


2. The host has 36 logical processors (18 physical cores with Hyperthreading enabled)
3. %USED and %UTIL vary across the processors. The eight CPUs serve the eight
CPU-intensive processes on the VMs
4. The Aperf/Mperf ratio (%A/MPERF) at about 122% means that the processor is
running at about 3.4 GHz:

2.8 GHz × 122% = approximately 3.4 GHz

For more details on host power policies, see the VMware vSphere documentation on host power management.


Show esxtop memory features


You can use esxtop to diagnose performance issues involving almost any aspect of the
environment, from both the host and virtual machine perspectives. This section
shows you how to view memory performance using esxtop in interactive mode.

Open a PowerShell Window (if necessary)

Click on the Windows PowerShell icon in the taskbar to open a command prompt.

NOTE: If you already have one open, just switch back to that window.

Reset Lab

Type

.\StopLabVMs.ps1

and press Enter. This resets the lab to a base configuration.


Start Memory Test

In the PowerShell window type

.\StartMemoryTest.ps1

Then press Enter to start the memory load.

You can continue to the next step while the script is running, but please don't close any
windows since that stops the memory load.

Select the esxtop Memory view


In the PuTTY window, type a lowercase "m" to see the memory view.

Select correct fields

Type a lowercase "f" to see the list of available counters.

Since we don't have much screen space, let's remove the two counters ID and GID.

To do this, press (capital letters)

BH

Press Enter to return to the esxtop screen.


See only VMs

This screen shows memory performance counters for both virtual machines and ESXi
host processes.

To see only values for virtual machines, press a capital "V".

You can press "V" again to toggle between showing all processes and only VM processes.

Monitor memory load with no contention

When the load on the worker VMs begins, you can see them at the top of the esxtop
window.


Some good metrics to look at are:

MCTL:

Indicates whether the balloon driver is installed. If not, it's a good idea to fix that first.

MCTLSZ:

Shows how inflated the balloon is, that is, how much memory has been reclaimed from the
guest operating system. This should be 0.

SWCUR:

Shows how much memory the VM currently has swapped. This should be 0, but can be
acceptable if SWR/S and SWW/S are low.

SWR/S:

Shows reads per second from the swap file.

SWW/S:

Shows writes per second to the swap file.

In this lab, all of these counters should look healthy. However, because the lab runs in a
nested environment, exactly what you see can vary, so take a look around.


Power on perf-worker-04a

1. Click to focus on the vSphere Web Client browser window. Right click on
perf-worker-04a
2. Select Power
3. Click Power On

Monitor memory load under contention

Now that we have created memory contention on the ESXi host, we can see:

1. perf-worker-02a and 03a are ballooning around 400MB each


2. perf-worker-02a, 03a and 04a are swapping to disk, indicating too much
memory strain in this environment
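
You can also confirm ballooning and swapping from vCenter's statistics rather than esxtop.
A minimal PowerCLI sketch (assuming an active Connect-VIServer session; both counters are
reported in KB):

# Sketch: pull the ballooned and swapped memory counters for the affected worker VMs.
Get-VM -Name 'perf-worker-02a', 'perf-worker-03a', 'perf-worker-04a' |
    Get-Stat -Stat 'mem.vmmemctl.average', 'mem.swapped.average' -Realtime -MaxSamples 3 |
    Sort-Object Entity, MetricId |
    Select-Object Entity, MetricId, Timestamp, Value, Unit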

Stop load on workers

1. To stop the load on the workers, close the two VM Stats Collector windows that
appeared after you started the load script.


Show esxtop storage features


You can use esxtop to diagnose performance issues involving almost any aspect of the
environment, from both the host and virtual machine perspectives. This section
shows you how to view storage performance using esxtop in interactive mode.

Open a PowerShell Window (if necessary)

Click on the Windows PowerShell icon in the taskbar to open a command prompt

NOTE: If you already have one open, just switch back to that window.

Reset Lab

Type

.\StopLabVMs.ps1

and press Enter. This resets the lab to a base configuration.

Start Storage Test

In the PowerShell window type


.\StartStorageTest.ps1

and press Enter to start the storage test.

The lab takes about five minutes to prepare. Feel free to continue on to the other steps
while the script finishes.

After you start the script, be sure that you don't close any windows that appear.

Different views

When looking at storage in esxtop, you have multiple views to choose from.

esxtop shows the storage statistics in four different screens:

• adapter screen (d)
• device screen (u)
• VM screen (v)
• vSAN screen (x)

Let's focus on the VM screen in this module.

In the PuTTY window, type a lowercase "v" to see the storage VM view.

Select correct fields

To see the list of available counters, type a lowercase "f".

Let's add the ID field by pressing (capital letter) A:

Press Enter when finished.


Display Iometer load on VMs

The StartStorageTest.ps1 script that we executed in the beginning of this lab should be
finished now, and you should have two Iometer windows on your desktop that look like
the above image.

If not, run

.\StartStorageTest.ps1

again, and wait for it to finish.

Monitor VM load

You have four running VMs in the Lab.


Two of them are running Iometer workloads, and the other two are iSCSI storage targets
using RAM disk. Because they are using a RAM disk as storage target, they do not
generate any disk I/O.

The metrics to watch are:

CMDS/S:

This is the total number of commands per second, which includes IOPS (Input/Output
Operations Per Second) as well as other SCSI commands such as SCSI reservations,
locks, vendor string requests, and unit attention commands being sent to or coming
from the device or virtual machine.

In most cases, CMDS/s = IOPS unless there are many metadata operations (such as
SCSI reservations).

LAT/rd and LAT/wr:

These indicate the average response time of read and write I/O as seen by the VM.

In this case, you should see high values in CMDS/s on the worker VMs that are currently
running the Iometer load (perf-worker-02a and 03a). This indicates that the VMs are
generating a lot of I/O.

You also can observe a high value in LAT/wr since the VMs are only doing writes.

The numbers may be different on your screen due to the nature of the Hands On Labs.
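
The same read and write latencies can also be pulled from vCenter statistics if you want to
capture them outside of esxtop. A minimal PowerCLI sketch (assuming an active
Connect-VIServer session; the virtualDisk realtime counters below are roughly the vCenter
counterparts of LAT/rd and LAT/wr, reported in milliseconds per virtual disk instance such
as scsi0:0):

# Sketch: per-virtual-disk read/write latency (ms) for the two Iometer worker VMs.
Get-VM -Name 'perf-worker-02a', 'perf-worker-03a' |
    Get-Stat -Stat 'virtualdisk.totalreadlatency.average', 'virtualdisk.totalwritelatency.average' `
        -Realtime -MaxSamples 3 |
    Select-Object Entity, MetricId, Instance, Timestamp, Value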

Device or Kernel latency


Press a lowercase "u" to go to the device view.

Here you can see that the storage workload is on device vmhba65, which is the software
iSCSI adapter. Look for DAVG (device latency) and KAVG (kernel latency).

1. DAVG should be below 25ms


2. KAVG should be very low and always below 2ms

In this example the latencies are within acceptable values.

Stop load on workers

Close BOTH Iometer windows:

1. When finished, stop the Iometer workloads by clicking the red STOP button in each
Iometer window
2. Click the red X in the top right corner of each window to close it

Wait for PowerShell script to complete


After both Iometer windows are closed, switch back to the PowerShell window and wait
for the script to clean up the environment before proceeding. Once you see this screen,
you can proceed.


Show esxtop network features


You can use esxtop to diagnose performance issues involving almost any aspect of the
environment, from both the host and virtual machine perspectives. This section
shows you how to view network performance using esxtop in interactive mode.

Open a PowerShell Window (if necessary)

Click on the Windows PowerShell icon in the taskbar to open a command prompt

NOTE: If you already have one open, just switch back to that window.

Start Network Test

In the PowerShell window type

.\StartNetTest.ps1

Press Enter.

Continue with the next steps while the script runs since it takes a few minutes to load.


Select the network view

In the PuTTY window, type a lowercase "n" to see the networking view.

Select correct fields


To see the list of available counters, type a lowercase "f".

Since there is not a lot of screen space, let's remove the two counters PORT-ID and
DNAME.

To do this, press (capital letters)

AF

Press Enter when finished.

Monitor load

Note that the result might be different on your screen due to the load of the
environment where the Hands On Lab is running.

The screen updates automatically. To force a refresh, press

space

Take particular note of these metrics:

1. PKTTX/s (Packets Transmitted per second) and MbTX/s (Megabits Transmitted
per Second): transmit throughput of the NIC/VM
2. PKTRX/s (Packets Received per second) and MbRX/s (Megabits Received per
Second): receive throughput of the NIC/VM
3. %DRPTX/RX: if these are non-zero and/or increase over time, your network
utilization may be too high


Note that the StartNetTest.ps1 script that you ran in the first step starts the VMs and
then waits for two minutes before running a network load for five minutes.

Depending on how fast you were at getting to this step, you might not see any load if it
took you more than seven minutes. You can restart the network load in the next step if
you need to.

Restart network load

If you want to run the network load for another five minutes, return to the PowerShell
window.

In PowerShell type

.\StartupNetLoad.bat

Press Enter.

The network load runs for another five minutes. While you wait, you can continue to
explore esxtop.

Network workload complete

As described previously, the load stops by itself. When the PowerShell window says,
"Network load complete", it no longer generates load and the test is finished.


Conclusion and Clean-Up


Key takeaways

During this lab we learned how to use esxtop to monitor load in CPU, memory, storage,
network, and power views.

We have only scratched the surface of what esxtop can do. In the next module, we take
a closer look at using esxtop in your own datacenter.

If you want to know more about esxtop, see these articles:

• Yellow-Bricks esxtop page: http://www.yellow-bricks.com/esxtop/


• Interpreting esxtop statistics: https://communities.vmware.com/docs/DOC-11812
• esxtop "Bible": https://communities.vmware.com/docs/DOC-9279

Clean up procedure

To free up resources for the remaining parts of this lab, we need to shut down all used
virtual machines and reset the configuration.

Open a PowerShell Window (if necessary)

Click on the Windows PowerShell icon in the taskbar to open a command prompt

NOTE: If you already have one open, just switch back to that window.

Reset Lab

To reset the lab, type

.\StopLabVMs.ps1


and press Enter. This resets the lab into a base configuration. You now can move on to
another module.

Conclusion

This concludes the Introduction to esxtop module. We hope you have enjoyed taking
it. To learn more about esxtop's advanced features, such as running in batch mode and
viewing collected statistics, continue to the next module.

Please remember to fill out the survey when you finish.


Module 4 - esxtop in Real-World Use Cases (30 minutes)


esxtop in Real-World Use Cases


This module takes what you learned in the previous Introduction to esxtop module
and applies it to real-world scenarios. It expands on some of the advanced esxtop
metrics available to monitor. We also discuss how to run esxtop in batch mode, save
its output into Comma-Separated Value (.csv) format, and graph that output with a
graphical interface.


Creating an esxtop resource file


Because the VM and host performance statistics on the esxtop screens can be
overwhelming, esxtop lets you create a resource file (rc for short) that
automatically filters the displays and saves only the information that interests
you.

Once you become familiar with esxtop and begin using it interactively or in batch mode,
you will see that it generates screens full of detailed VM and host information. When
you have many VMs on a large host, the screen display can be difficult to manage, so
esxtop lets you create one or more resource files that initialize the display to capture
a subset of the performance statistics. This file's default name is ~/.esxtop60rc. Let's
learn how to use it and trim down the number of fields to report.

If you took Module 3 - Introduction to esxtop (30 minutes) then you're already familiar
with adding and removing fields from esxtop. In this module, we filter esxtop to capture
commonly monitored performance statistics in the CPU, memory, I/O, network, and
power components.

First, let's log into a host and start esxtop.

Open PuTTY

Click the PuTTY icon on the taskbar


SSH to esx-01a

1. Select host esx-01a.corp.local


2. Click Open

Start esxtop


From the ESXi shell, type esxtop on the command line:

esxtop

and press Enter.

If you just started esxtop, you are in the CPU view by default.

If you happen to be on a different screen, pressing "c" gets you back to this view.

Stretch the esxtop window

Some of the columns exceed the width of the window, so hover the cursor over the right
edge of the window, then click and drag it horizontally to the right to widen the window.


Now the PuTTY window is wide enough to display most or all of the available columns
(as highlighted above).

Customize the CPU view

Let's filter this view (add and remove some fields) by pressing the letter "f":

Filter the CPU fields displayed

Let's remove (filter out) ID and GID and add CPU POWER STATS.

Type the letters "A", "B", and "J" (NOTE: Make sure these are capitalized as these are
case sensitive!):

ABJ

You should see the * next to A: and B: disappear and the * next to J: appear. Press
Enter to resume the esxtop screen.


Note that the Power column reports 0. This is due to the nature of the Hands On Lab and
also because the host is idle.

Customize host CPU power fields

By default, this screen shows performance counters for both virtual machines and ESXi
host processes. To view the host power screen in esxtop, type a lowercase "p":


Filter the host power fields displayed

Press the letter "f" to see available fields to add to the screen:


To add the Percentage of aperf to mperf ratio (%Aperf/Mperf), press the letter "f"
again, and then press Enter.

Customize the memory view


Filter the memory fields displayed

To display the memory statistics, press the letter "m":

To filter the displayed fields, press the letter "f":

Let's remove GID and add Swap Statistics (SWAP STATS). Press the letters "BK" and
press Enter:


BK

Customize I/O views

To display the disk adapter statistics, press the letter "d":

Filter the disk adapter fields displayed

To filter the displayed fields, press the letter "f".


To remove Path Name (PATH) and Number of Paths (NPATHS) press the letters "B" and
"C" and press Enter:

BC

Filter the disk device fields displayed

To display the disk device statistics, press the letter "u":

To filter the displayed fields, press the letter "f".


To remove ID (Path/World/Partition), add Read Latency Stats (ms) (LATSTATS/rd), and
add Write Latency Stats (ms) (LATSTATS/wr), press the letters "B", "J", and "K", then
press Enter:

BJK

Customize network view

To display the network statistics, press the letter "n":


To filter the displayed fields, press the letter "f". To add identification of uplinks
(UPLINK), press "B" and press Enter:

As you can see, you have added the previously hidden UPLINK field to see which
networks have uplinks.

Write to resource file

To write these custom settings to a resource file we'll name .esxtopHOL, type "W
.esxtopHOL" then press Enter:

W .esxtopHOL

Note: Don't edit these resource files manually! For changes, run esxtop and follow the
preceding steps to change the resource file.

When you invoke esxtop either interactively or through batch, esxtop looks for the
default resource file and automatically applies any filters it finds.

View the resource file's contents

You can create different resource files for specific components. For example, you may
want to create a CPU-only resource file, memory-only, and so forth.

Type "q" to exit esxtop.


To see the differences between the default resource file .esxtop60rc and your custom
.esxtopHOL resource file:

diff .esxtop60rc .esxtopHOL


--- .esxtop60rc
+++ .esxtopHOL
@@ -1,10 +1,10 @@
-ABcDEFghij
-aBcDefgHijKLmnOpq
-ABCdEfGhijkl
-ABcdeFGhIjklmnop
+abcDEFghiJ
+abcDefgHijkLmnOpq
+AbcdEfGhijkl
+AbcdeFGhIJKlmnop
aBCDEfghIJKl
-AbcDEFGHIJKLMNopq
+ABcDEFGHIJKLMNopq
ABCDeF
-ABCDef
+ABCDeF
ABCd
-5c
+5n

Start esxtop with your customized views

Let's say you want to customize your view for capturing performance statistics at
different times of the day for several minutes at a time. This is especially useful when
using esxtop in batch mode. You can create several resource files and use them to filter
your initial view when invoking esxtop whether interactively or through batch.

To start esxtop with your custom resource file, type:

esxtop -c .esxtopHOL

For more information on using esxtop in batch mode to capture statistics and analyze
them later, see the next section.


Saving esxtop statistics with batch mode


This module discusses creating a Comma-Separated Values (.csv) file with the output
from esxtop in batch mode to share with colleagues and to analyze the statistics
you collect.

In the previous module we filtered the fields to display in esxtop and saved our
preferences in the esxtop resource file. Now that we're collecting only the statistics we
find interesting, we can invoke esxtop in batch mode and capture the statistics in a
Comma-Separated Values (.csv) file to share with colleagues and graph it to look at
trends during the collection period.

Invoking esxtop interactively with a resource file

As we saw in the previous module, you can invoke esxtop interactively and apply your
custom resource file settings with the -c switch.

For example, if you created a resource file for all fields under the CPU display and
named it .esxtopallcpustats, you can invoke esxtop and use the resource file to apply
your preferred filters:

esxtop -c .esxtopallcpustats

For this lab, we already created a sample resource file named .esxtopHOL. This
resource file captures only the statistics we selected in the previous module.

Start the workload

Let's start a workload and use esxtop in batch mode to capture only the statistics we
requested.

Open a PowerShell window

If you don't already have a "Windows PowerShell" window open, click on the
"Windows PowerShell" icon in the taskbar.


Start load on VMs

Type

.\StartCPUTest2.ps1

and press Enter. Depending on the load of the lab systems, this may take several
minutes.

Wait until you see the RDP sessions to continue.

Open PuTTY

Click the PuTTY icon on the taskbar.

SSH to esx-01a


1. Select host esx-01a.corp.local


2. Click Open

Start esxtop with a resource file in batch mode


Invoke esxtop in batch mode and apply our custom settings with the -b, -d, -n, and -c
switches:

esxtop -b -d 2 -n 100 -c .esxtopHOL > /tmp/esxtop_HOLstats.csv

where:

• -c applies .esxtopHOL, our esxtop resource file with our filtered fields


• -b sets esxtop to run in batch mode
• -d sets the interval to two seconds
• -n sets the number of samples to 100

The above command collects one sample every two seconds, 100 samples in total over the
course of 200 seconds, and writes the statistics to a file named /tmp/esxtop_HOLstats.csv.

After 200 seconds, esxtop finishes.


Review contents of the .csv file

To look inside the esxtop output .csv file, type "more /tmp/esxtop_HOLstats.csv" and
press Enter:

more /tmp/esxtop_HOLstats.csv

As you can see, the output .csv file contains all the statistics we selected. You now can
use NMON Visualizer to graph the statistics as described in the next module. You also
can copy the .csv to a Windows system and use PERFMON to analyze the statistics you
collected.
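
Here is a minimal sketch of that copy-and-inspect workflow, assuming pscp.exe from the
PuTTY suite is installed at the path shown, a C:\Temp folder exists, and the lab's host name
is used; adjust all three for your environment:

# Sketch: copy the batch output to the Windows console and take a quick look with PowerShell.
& 'C:\Program Files\PuTTY\pscp.exe' root@esx-01a.corp.local:/tmp/esxtop_HOLstats.csv C:\Temp\
$rows = Import-Csv -Path 'C:\Temp\esxtop_HOLstats.csv'
$rows.Count                                   # one row per sample (100 in this capture)
($rows[0].PSObject.Properties.Name).Count     # number of collected counters (columns)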

The next sections discuss examples of how to apply additional esxtop switches.

Example: View all statistics

If you want to override any resource files and record all metrics, add -a:

esxtop -b -d 30 -n 360 -a > /tmp/esxtop_HOLstats.csv

Example: Output to a compressed file

The esxtop output .csv file grows quickly, so you can pipe the output into a
compressed file:

esxtop -b -d 30 -n 360 -a | gzip -9c > /tmp/esxtop_HOLstats.csv.gz


Conclusion

For more details on esxtop and running in batch mode, see:

• https://communities.vmware.com/docs/DOC-11812
• http://www.yellow-bricks.com/2010/06/02/esxtop-l/
• http://www.yellow-bricks.com/2010/06/01/esxtop-running-out-of-control/


Graphing esxtop statistics


This module discusses taking the Comma-Separated Values (.csv) output from
esxtop in batch mode and graphing it to see trends over the measurement
interval.

You can graph the contents of the esxtop output file to visualize vSphere performance
over the collection interval. This module discusses using NMON Visualizer, a free Java
program that graphs the contents of .csv files. You also can use Windows PERFMON to
view the results.

NMON Visualizer is a Java program and can run on any operating system where Java is
installed, and the user interface is the same no matter which platform you use. Let's get
familiar with it on Windows.


Copy the output file to desktop

We'll use the esxtop batch output file we created in the previous module. First, you need
to copy the .csv output file from the ESXi host to the desktop.

1. For this lab, we already copied esxtop_HOLstats.csv onto the desktop


2. To start NMON Visualizer, double click on its icon


Start NMON Visualizer

Load esxtop output file


You need to load the .csv file into NMON Visualizer. In the NMON Visualizer window:

1. Click on File
2. Click on Load...


Double click on esxtop_HOLstats.csv.

NMON Visualizer loads the output file.

1. The host name


2. The collection period


Graphing CPU statistics with NMON Visualizer

Click on the gray triangle next to the host "esx-01a.corp.local" to expand the list of
collected statistics.

1. Click on the gray triangle next to "Physical Cpu" to open its folder
2. Click on the word "Total"


The graph displays total Physical CPU utilization broken down into Processor Time
and Util Time. During the test, physical CPU averaged about 54% utilization.

Let's clear the CPU statistics and look at physical disk activity.

Click on the gray triangle next to "Physical Cpu" to close its folder.

Graphing disk statistics with NMON Visualizer


1. Click on the gray triangle next to "Physical Disk" to open its folder
2. Click on the last entry in the folder for vmhba65:vmhba65:C0:T0:L2:

We can see that disk utilization increased towards the end of the collection period.

Let's narrow down the collection and see the load for a particular time period.

Click on the Manage button.


Displaying activity during a custom interval


1. You can see the system time interval during which esxtop captured performance
statistics.
2. You can add a custom interval to narrow down the time period.


1. Next to Start: change the time from 37 to 40


2. Press Add
3. Press the red Close Window button to close the dialog box and display the
narrowed collection period


You can see that we've narrowed the statistics to display only the records from
14:40:48 to 14:41:31. We can narrow down further to see only the physical disk
statistics we're interested in.

With NMON Visualizer you can add or remove statistics dynamically. In the box under
the graph, click on the first three check boxes next to Physical Disk Path and
deselect:

1. Command/sec
2. Reads/sec
3. Writes/sec


Conclusion and Clean-Up


Key takeaways

During this lab we learned how to customize the performance statistics we collect using
resource files, how to save the statistics into an output .csv file, and how to graph the
statistics and produce performance charts.

We have only scratched the surface of what esxtop can do. If you want to know more
about esxtop, see these articles:

• Yellow-Bricks esxtop page: http://www.yellow-bricks.com/esxtop/


• Interpreting esxtop statistics: https://communities.vmware.com/docs/DOC-11812
• esxtop "Bible": https://communities.vmware.com/docs/DOC-9279

Clean up procedure

To free up resources for the remaining parts of this lab, we need to shut down all used
virtual machines and reset the configuration.

Open a PowerShell Window (if necessary)

Click on the Windows PowerShell icon in the taskbar to open a command prompt

NOTE: If you already have one open, just switch back to that window.

Reset Lab

To reset the lab, type

.\StopLabVMs.ps1


and press Enter. This resets the lab into a base configuration. You now can move on to
another module.

Conclusion

This concludes the esxtop in Real-World Use Cases module. We hope you have
enjoyed taking it. Please remember to fill out the survey when you finish.


Module 5 - vCenter Performance Analysis (30 minutes)


Introduction
vSphere 6.7 delivers an exceptional experience with an enhanced VMware
vCenter® Server Appliance™ (vCSA). As mentioned earlier, when measuring
the performance of vCenter 6.7 versus 6.5, performance engineers saw much
higher performance (throughput) and lower latency with operations such as
powering on/off VMs.

This module will show you how to monitor the health/performance of your vCenter
Server using the vCenter Server Appliance Management Interface (VAMI),
as well as tools for detailed analysis, including vimtop, profiler, pg_top and
postgres (database) log files.

Check the Lab Status in the lower-right of the desktop

Please check that your lab has finished all the startup routines and is ready for you
to start. If you see anything other than "Ready", please wait a few minutes. If after five
minutes your lab has not changed to "Ready", please ask for assistance.

10,000 Foot View of vCenter

For most customers, vCenter looks like a service (vpxd) that UI and API clients make
requests to, and vCenter stores inventory information (hosts, clusters, VMs) in a
database.


Many years ago, vpxd used to be a monolithic service, and while it's still conceptually
the same, there is a lot more going on under the hood to provide improved
performance, additional features, etc.

vCenter: Under the Hood

Here is what a vCenter Server/vCSA looks like under the hood. Don't worry, we'll touch
on the most important of these as we look at debugging tools later in this module.


vCenter Server Appliance Management Interface (VAMI)


The vCenter Server Appliance Management Interface (VAMI) is the
administration Web interface for the vCenter Server Appliance (vCSA), and is used
to perform basic administrative tasks such as monitoring the vCSA, changing the
host name and the network configuration, NTP configuration, and applying
patches and updates.

The VAMI was included in the early versions of the vCSA, but was removed by VMware
in vSphere 6.0 and then reintroduced in vSphere 6.0 U1. The revamped VAMI in
vCenter 6.7 is HTML5-based and has a new look and feel. In this section, we will access
the VAMI within the HOL environment and showcase some of its performance
monitoring features, along with some guidance on what to look for in case performance
is not what you would expect.

Open Chrome

Click the Chrome icon from the shortcut in the Taskbar.

Open the VAMI Login page

1. In the upper-left of the Chrome window, click the HOL Admin folder.
2. Click the vcsa-01a Mgmt bookmark. This is the VAMI interface.

Note the VAMI URL


We are now at the VAMI login screen. Note a couple of things:

• The vCSA in this lab environment is named vcsa-01a.corp.local


• The VAMI is accessed through a Web server on the vCSA running on port 5480
(note the :5480 in the URL above).
To access the VAMI in your environment, just add the same :5480 suffix to the IP
address or URL of your vCSA in a Web browser.

Login to the VAMI

To login to the VAMI, use these credentials:

1. Username: root
2. Password: VMware1!


VAMI Summary Screen

This is the Summary screen of the VAMI, which is the default when you login. Note a
couple of things:

1. This is a useful Health Status table, which shows various states of the vCenter
Server (vCSA). In this example, everything is in the "Good" (healthy) state.
2. Click Monitor to explore the various subsystems that are monitored.


VAMI Monitoring: CPU & Memory

1. Upon clicking Monitor, the first screen shown is CPU & Memory
2. This shows the percentages of CPU & Memory consumption.
3. By default, the time range is over the last hour, but you can change the time
range at the top right of the screen.
4. A good rule of thumb is to keep both CPU & Memory less than 70%. What if
they're higher? Here are some options:
◦ Split the inventory of the vCenter (hosts, clusters, VMs, etc.) across one or
more vCenter Servers. Using vCenter Enhanced Linked Mode allows you to
log in to any single instance of a vCSA and view/manage the inventories of
all the vCenter Server systems in the group. You can join up to 15 vCSA
deployments with vCenter Enhanced Linked Mode.
◦ For CPU > 70%, Add Virtual CPUs to the vCSA VM.
◦ Keep in mind that the CPU scale goes from 0-100% utilization and doesn't
separate out the activity by the individual vCPUs of the vCSA VM.
▪ For example, if you're showing 25% utilization and your vCSA has 4
vCPUs, this could mean that the workload is being divided evenly
between each vCPU, but it could also indicate that one vCPU is
being utilized 100% of the time.
Many services that run on the vCSA are single-threaded, so you do
need to keep this in mind. If you suspect that a single vCPU is being
heavily utilized, you can monitor the CPU activity of the vCSA on a
per-CPU basis from the vSphere client or by using vimtop (which
we'll learn about later).


◦ For Memory > 70%, Change the Memory Configuration of the vCSA VM.
◦ Consider setting a memory reservation for the vCSA VM. For more
information, see Allocate Memory Resources.

5. Let's move on to the next screen. Click Disks.

VAMI Monitoring: Disks

1. You are now on the Disks section of the Monitor screens. The Disks screen
shows all of the virtual hard disks the vCSA is using, the purpose of the partition,
and how much disk space is being consumed.
2. The DB, DBLog, and SEAT (Stats/Events/Alarms/Tasks) partitions are write-
intensive, so placing this data on SSDs (solid state drives) is preferred to achieve
optimal performance.
3. Let's move on to the next screen. Click Network.


VAMI Monitoring: Network

1. The Network screen shows a variety of network statistics, including transmit (tx)
and receive (rx) throughput (KB/sec), for both loopback and eth0. Unlike CPU &
Memory, you'll need to click through the list of these counters to get an accurate
portrayal of the network activity of the vCSA. Although these counters should be
monitored, networking is usually not an issue with the vCSA.
2. The important thing to check is that you don't see any errors (as shown here, the
value is 0) for eth0 tx/rx errors detected as well as packets dropped. If
greater than zero, you should look into whether there are networking
infrastructure problems in your environment.
3. Let's move on to the next and final screen. Click Database.


VAMI Monitoring: Database

1. The Database monitoring tab is arguably the most important, as the information
that it provides is not easily obtained by any other means. The vCSA uses a
PostgreSQL database to store persistent information for the vCSA.
2. The Database page is divided into two charts: Seat space and Overall space
utilization trends. Use Alarms to avoid running out of disk space.
3. The Seat section displays the stats, events, alarms, and tasks. These
different categories can be displayed as graph lines by clicking on their names
below the Seat graph.
The total Seat utilization is shown in the bottom graph, as well as the DB log and
core utilization; these graph lines can also be removed from the graph by clicking
on the associated name below the graph. If any of these sections start to fill up,
the reason for this anomaly should be investigated and appropriate actions taken
to ensure that the vCSA database performs as expected.


VAMI Backup & Update

1. We just covered all of the performance monitoring features of the Monitor tab.
2. While unrelated to performance, you should back up your vCSA on a regular
basis, especially before you perform a major operation on your vCSA such as
updating it. The VAMI tool includes a powerful backup tool (the Backup tab
highlighted) that lets you back up the data on your vCSA either on demand or on
a set schedule. This tool is unique in that, in order to be as space efficient as
possible, it only backs up the data on the vCSA and not the entire vCSA. To
restore the vCSA, you reinstall the vCSA and then restore the backed up data on
it. The restore process can be initiated from the vCSA installation ISO.
3. One of the most critical tasks you can perform to make sure that your vCSA is
safe, secure, reliable, and performant is keeping it updated, and the VAMI has a
feature included that makes the upgrade process as painless as possible: the
Update tab. Let's take a look at what this screen looks like.


VAMI Update tab

1. When you click the Update tab, a screen appears with current version details.
2. In the upper right-hand corner of the screen is the Check Updates button, which
downloads a list of the latest patches and updates for your vCSA from VMware.
3. Once the list is downloaded you can click on the patch to review important
information about it, including its criticality, the size of the download, and
whether it will require a reboot of your vCSA.
4. To install the patch or upgrade, you can select Stage Only or Stage and Install.
If you select Stage Only, it only downloads the patch and then later you'll have
the option to install it when you see fit.

Since this is a lab environment, it is not feasible to upgrade the vCSA, as this is a
resource- and time-intensive process.
For your environment, however, VAMI Monitoring, Backups and Updates will ensure your
vCSA is running as optimally as possible.

Conclusion

The vCSA has become the de-facto standard in most datacenters for managing a
vSphere environment. For your vSphere environment to run most efficiently, you need
to ensure that the processes running on your vCSA have the resources that they need;
by using the VAMI, you can monitor the performance of the vCSA and detect
abnormalities. You can also use the VAMI to back up and update your vCSA to ensure
that it's patched to the most recent version so that, in the case of a catastrophic event,
you can recover easily and efficiently.

Credits to Tom Fenton and Ravi Soundararajan for much of this VAMI content. For more
information on how to use the VAMI, see Tom's great blog article:
https://virtualizationreview.com/articles/2018/09/10/how-to-use-vami.aspx


Tools for Detailed Analysis: vimtop


This section will introduce vimtop, a tool for real-time CPU/memory debugging of
the vCSA. Let's see how it looks in the lab environment, and how it might look
under benchmark loads.

Open PuTTY

First, click on the PuTTY icon on the taskbar.

Load vcsa-01a session

1. Scroll down and click on vcsa-01a.corp.local


2. Click Load


Open vimtop

Simply type in vimtop:

vimtop

and press Enter to start this tool.


Example vimtop screenshot

Here is an example screenshot of vimtop running within the lab environment. If you're
familiar with top (the Linux performance monitoring tool) or esxtop (the equivalent for
ESXi), you'll notice vimtop has a similar look and feel. The default vimtop screen
provides you with an overview and task pane. The overview pane quantifies the CPU
and memory resources that your vCSA is currently consuming (the top half of the
screen); the task pane (bottom half) shows you the processes that are consuming the
most CPU resources. The CPU activity should never total more than 70% for your vCSA.

By default, vimtop refreshes its data every second. To pause this automatic refresh,
press "p"; alternatively, to set a lower refresh rate, press "s" and then enter the number
of seconds between screen refreshes.

To see the help menu, press "h." The help menu will explain how to add, remove and
reorder columns from vimtop. To quit vimtop, press "q".

Let's see how vimtop looks while under load.


vimtop During a 'Churn' Benchmark

This is what vimtop looks like during a "churn" benchmark, which basically consists of
creating a VM, powering it on, running for a while, powering it off, and then deleting it.

This screen shows us several interesting things:

• vCenter Server (vpxd) is consuming 51.43% CPU, which is over 1/2 of 1 core
• vCenter Server is consuming the highest %CPU/%MEM
• vPostgres is the next big consumer (since vCSA must persist its data to the
database). However, it is multi-processed, unlike vCenter Server, and its threads
are consuming high %CPU/%MEM as well

With their benchmark vcbench, VMware performance engineers measured the number
of operations per second (throughput) that vCenter produced.

This benchmark stresses the vCenter Server by performing typical vCenter operations,
like powering a VM on and off, among several others. vCenter 6.7 performs 16.7 operations
per second, which is a twofold increase over the 8.3 operations per second vCenter
6.5 produced.


vimtop During a Tagging Benchmark

This is what vimtop looks like during a tagging benchmark (which performs/simulates
advanced API calls, such as PowerCLI Get-Tag). Behind the scenes, tagging goes
through a proxy, the endpoint, through the data service, to the vpxd services (aka
vCenter Services, aka the tagging service).

This screen shows a couple of processes; here are some additional ones that may pop
up:

• vCenter Services (vpxd-svcs) is the tagging service, which is what we would
expect to be first
• vSphere Client process is the data service which handles the requests for tags
(regardless of whether you're using the user interface or a PowerCLI cmdlet)
◦ This might also be vSphere UI if the HTML5 client is being used instead.
• vapi-endpoint for PowerCLI requests may also show up here
• vmdird is the Directory Service, which handles Single Sign-On (SSO), e.g. LDAP


vimtop Showing Heap Issues; Consider Increasing Heap Size

The vCenter UI runs as a Java process within the vCSA, and as such, if the CPU
utilization is consistently high, i.e. 100% (as shown here; note that this is not 100%
across all vCPUs, just of one core), for a prolonged period of time, it may be invoking
garbage collection too often. This is an indicator that it may not have enough memory.

Let's look at a command that will show you how to increase the memory size.

vimtop Showing Heap Issues; Consider Increasing Heap Size (continued)

Assuming you still have the PuTTY session open to vcsa-01a, type this command:

cloudvm-ram-size -l vsphere-client

This will show you the memory allocated to the vsphere-client process in your particular
environment (853MB in this example; this will be different in your environment).
You can increase this by using this command:


cloudvm-ram-size -C 1000 vsphere-client

where 1000 is the value in MB that you want to increase the service's memory to.

Note that the preferred method would be shutting down your vCSA and assigning the
VM more virtual memory, which should auto-scale all the processes such as
vsphere-client, but that does involve some downtime.

Conclusion

vimtop is a powerful tool for showing you real-time resource issues that may be
adversely affecting the performance of your vCSA.

For more information on vimtop, please visit these excellent resources online:

• https://virtualizationreview.com/articles/2018/04/03/ow-to-monitor-a-vcsa-using-
vimtop.aspx
• https://virtualizationreview.com/articles/2018/09/19/vami-and-vimtop-vcsa.aspx
• https://virtualizationreview.com/Forms/Search-
Results.aspx?query=vimtop%20fenton&collection=VTR_Web


Tools for Detailed Analysis: vpxd profiler logs


This section will discuss the vCenter (vpxd) profiler log files - how to find them on
your vCSA, what they look like, and some important counters to look at.

Open PuTTY

Let's find where the vpxd profiler logs are in the lab environment. If you don't already
have a PuTTY session open to vcsa-01a, click on the PuTTY icon on the taskbar.

SSH to vcsa-01a

Scroll down and double-click on vcsa-01a.corp.local.


Find vpxd-profiler logs

1. To find the vpxd profiler log files, execute these commands in the PuTTY window:

cd /var/log/vmware/vpxd
ls -l vpxd-profiler*

2. Note that vpxd-profiler.log is a symbolic link to the most recent log file, while the
older profiler logs are compressed (gzipped).

3. Let's look at the file format of this log file. Run this command:

less vpxd-profiler.log


vpxd-profiler.log example

Here is an example of what the vpxd-profiler.log file consists of:

1. Timestamp
2. Key-Value pairs (i.e. a vCSA setting, and the value the setting was set to)

This is a large file, with a lot of counters, so what are some useful ones? We'll look at
some next.


Useful vpxd-profiler.log counters


Here are a few counters that may be useful while troubleshooting vCSA performance:

• /SessionStats/SessionPool/NumSessions: total # of sessions


• /SessionStats/SessionPool/Id/… : individual session info
• /ProcessStats/PhyMemUsage/mean (shown above): Average memory usage
of vpxd
• /ActivationStats/… : Method calls to vCenter
• /SystemStats/ThreadPool/VpxLro/LongRunning/Queued/total : Thread
issues
• /SystemStats/ThreadPool/VpxLro/ShortRunning/Queued/total : Thread
issues

Press "q" when you are done reviewing the vpxd profiler log file.


Tools for Detailed Analysis: PostgreSQL logs and pg_top


This section discusses how to analyze the Postgres logs and how to use the pg_top
command to debug the vCSA database.

Open PuTTY

Let's look at the Postgres logs and the pg_top command in the lab environment.

First, click on the PuTTY icon on the taskbar.

SSH to vcsa-01a

Scroll down and double-click on vcsa-01a.corp.local


List Postgresql Logs

To list the Postgres log files, run these commands in the PuTTY window:

cd /var/log/vmware/vpostgres/
ls -l postgresql-*

Note that each numbered log file is for a different day of the month; for example,
postgresql-01.log above would contain the database log entries from June 1.


Search the PostgreSQL Logs

Let's search for log entries with the string 'duration' to see which SQL queries took
longer than one second (1,000 ms):

grep duration postgres*

For stats and events tables, these durations are OK. For other tables (core tables: host
tables, VM tables, network tables), if you notice SQL queries consistently taking an
abnormally long time (multiple seconds), that could indicate a performance issue with
your database.

How do we look at database performance once we suspect there's an issue? We'll look
at pg_top next, a tool to do just that.

Running pg_top

Here are the commands to run pg_top on your vCSA:

cd /opt/vmware/vpostgres/current/bin/
./pg_top -U postgres -d VCDB

If you're familiar with top (the Linux performance monitoring tool) or esxtop (the
equivalent for ESXi), you'll notice pg_top has a similar look and feel. The default pg_top
screen provides you with an overview and task pane. The overview pane quantifies
the CPU and memory resources that your PostgreSQL database (VCDB) is currently
consuming (the top half of the screen); the task pane (bottom half) shows you the


processes that are consuming the most CPU resources. The CPU activity should never
total more than 70%.

By default, pg_top refreshes its data every second. To pause this automatic refresh,
press "p"; alternatively, to set a lower refresh rate, press "s" and then enter the number
of seconds between screen refreshes.

To see the help menu, press "h." The help menu explains how to add, remove, and
reorder columns from pg_top. To quit, press "q".

Let's see how pg_top looks while under load.

Example pg_top screenshot

Here is what pg_top looks like; as you can see, much like top, esxtop, or vimtop, it
shows you real-time CPU and memory process usage, but only for the PostgreSQL
database (VCDB).

There are many single-character commands available from this screen. Press "?" to
see a list of them.


pg_top help screen

Here is a list of pg_top commands. Note that since this is a database-specific top, we can
use the "Q" command to show the query of a currently running process, which can be
useful for seeing which table a SQL query is accessing.

Press the Space Bar a couple of times to return to the main pg_top screen.


pg_top with a CPU-intensive process/query

Here is another screenshot of pg_top, but while the PostgreSQL database was running a
CPU-intensive query. Here are some things to note:

1. The CPU for this process was 97.79% (very high)


2. The PID for this process was 3063 (which would be entered when using the "Q"
command to show the query details)
3. The STATE for this process is "run", which means the process is still running (note
the other processes are in a "sleep" state)
4. The COMMAND includes a "DECLARE CURSOR"; CURSOR usually means a query
on the stats table. Recall from the earlier VAMI database section that the VCDB
consists of Alarms, Events, Tasks, and Stats (Performance Statistics). We'll
confirm this query is on the "stats" table when we look at the query details on the
next screen.

Since we are not running a benchmark in the lab environment, the next screen will show
you what the output would be upon typing "Q" and then the PID (3063).

pg_top query details

Here is the result of inspecting the CPU-intensive query (PID 3063). The "SELECT
sc.stat_id" confirms that the SELECT SQL command was on the stats table.

Your environment (queries, tables) may be different; just be mindful that long-running
queries may be scanning all partitions.

• If your vCenter Server has a large inventory and/or has been running for a long
time, it may have a lot of old data (tasks, events, statistics). Managing the Tasks,
Events and Statistics stored in the vCenter Server database is a common
responsibility of the VMware Administrator.
• To keep the vCenter Server database healthy, refer to this KB article, which has
scripts to delete old data, which can improve the performance and stability of
your vCSA: Delete old tasks, events and statistics data in vCenter Server 5.x and
6.x (2110031)
• If you suspect your database might have issues (corruption), you may need to
contact GSS.


Clients (UI and API) Performance Tips


This section discusses a few tips to achieve better vCenter client performance
(either the user interface/UI or the APIs, e.g. PowerCLI).

Clients: UI

Here are some ways to ensure the vCenter user interface (UI) performance is optimal.

• Monitor heap size of vsphere-client and vsphere-ui services


◦ Impacts single-node queries
◦ Multi-node: Federated queries may take more memory

• Please load certificates


◦ Allows browser caching
◦ https://kb.vmware.com/s/article/2111219
◦ https://kb.vmware.com/s/article/2108294

• Use faster CPUs if possible


◦ Speeds up single-threaded operations

Clients: API (PowerCLI) - Default PowerCLI

This is an example of some PowerCLI code that was taken from the VMware Community
Forums: https://communities.vmware.com/thread/499845

While it gets the job done, internal performance testing with 20 hosts and 300 VMs
showed that this code ran for 80 seconds. Let's see how this code could be optimized,
and how much faster it could run.


Clients: API (PowerCLI) - Optimized PowerCLI

Note that this PowerCLI code does the same thing, but it makes far fewer API calls to
vCenter: the highlighted Get-VM and Get-VMHost calls are executed only once, outside
of the ForEach loop. Minimizing unnecessary/repeated PowerCLI calls is key to
obtaining better client API performance.

By doing this, the runtime for the script was reduced from 80 seconds to 7.5 seconds (a
10x speedup).
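
The forum script itself appears here only as a screenshot, so the snippet below is an
illustrative sketch of the same idea rather than the exact code: fetch the inventory once,
then look items up in memory inside the loop. It assumes an active Connect-VIServer session.

# Slow pattern: a vCenter call (Get-VMHost) is repeated on every pass through the loop.
$slowReport = foreach ($vm in Get-VM) {
    $esxHost = Get-VMHost -Name $vm.VMHost.Name            # extra round trip per VM
    [pscustomobject]@{ VM = $vm.Name; Host = $esxHost.Name; Version = $esxHost.Version }
}

# Faster pattern: call Get-VM and Get-VMHost once, then join the results in memory.
$allVMs = Get-VM
$hostsByName = @{}
foreach ($h in Get-VMHost) { $hostsByName[$h.Name] = $h }
$fastReport = foreach ($vm in $allVMs) {
    $esxHost = $hostsByName[$vm.VMHost.Name]               # in-memory lookup, no API call
    [pscustomobject]@{ VM = $vm.Name; Host = $esxHost.Name; Version = $esxHost.Version }
}

The cmdlets in the actual forum script differ, but the principle is the one described above:
hoist the Get-VM and Get-VMHost calls out of the ForEach loop.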


Conclusion and Clean-Up


Clean up procedure

To free up resources for the remaining parts of this lab, we need to shut down all used
virtual machines and reset the configuration.

Open a PowerShell Window (if necessary)

Click on the Windows PowerShell icon in the taskbar to open a command prompt

NOTE: If you already have one open, just switch back to that window.

Reset Lab

To reset the lab, type

.\StopLabVMs.ps1

and press Enter. This resets the lab into a base configuration. You now can move on to
another module.

Conclusion

This concludes the vCenter Performance Analysis module. We hope you have
enjoyed taking it. Please remember to fill out the survey when you finish.


Module 6 - Database Performance Testing with DVD Store (30 minutes)


Introduction

This Module introduces DVD Store 3, also known as DS3 for short. It simulates
an online store that allows customers to log on, search for DVDs, read customer
reviews, rate the helpfulness of reviews, and purchase DVDs.

Here is a brief overview of the content in this Module:

• What is DVD Store 3?


• Installing DVD Store 3
• Configuring DVD Store 3
• Running/Tuning DVD Store 3
• Resources/Conclusion


What is DVD Store 3?


This lesson will describe what DVD Store is, including all of its various features.

DVD Store 3 Description

Here is an overview of the DVD Store 3 (DS3) benchmark:

• Simulates an online store that allows customers to log in, search, read customer
reviews, rate helpfulness of reviews, and purchase DVDs
• Open-source: https://github.com/dvdstore/ds3
• Incorporated as the e-commerce simulation workload of the VMmark 3.0
benchmark: https://www.vmware.com/products/vmmark.html
• OLTP workload (similar to TPC); performance is measured in Orders Per Minute
(OPM)
• Supports Oracle, SQL Server, and MySQL
• Utilizes many database features, including stored procedures, transactions,
triggers, foreign keys, and full-text indexes
• Latest version includes customer reviews with intelligent review rankings
• Workload can be run at varying load levels to determine the highest performing
test configuration
• Note: while DS3 builds on its predecessor (DS2), its new and more complex
queries mean that results are not comparable to previous releases

DVD Store 3 Database Sizes

DVD Store 3 supports three standard sizes: small, medium, and large. In addition to
these standard sizes, any custom size can be specified during the DVD Store setup. The
number of rows in the various tables that make up the DVD Store 3 database is what is
varied to produce the size specified.

The table below shows the number of rows for the standard sizes for the Customers,
Orders, and Products tables as examples:


Database   Size     Customers     Orders             Products

Small      10 MB    20,000        1,000/month        10,000
Medium     1 GB     2,000,000     100,000/month      100,000
Large      100 GB   200,000,000   10,000,000/month   1,000,000


Downloading/Installing DVD Store 3


This lesson will describe how to install the DVD Store 3 benchmark. Specifically,
we will look at how we set it up for this lab environment using the LAMP
(Linux, Apache, MySQL, and PHP) stack.

NOTE #1: The LAMP stack is only one of the supported environments for DVD
Store 3. The benchmark supports a variety of databases: Microsoft SQL Server,
Oracle, MySQL, and PostgreSQL.

NOTE #2: This VM and database have already been created; this is informational,
if you'd like to set it up for testing in your own environment.

Creating the database is resource intensive, in terms of both time and storage, so
it is not available for the hands-on lab environment.

Create a Linux VM


This screenshot shows that, in our lab environment, DVD Store 3 is installed in a CentOS
Linux VM with 1 vCPU, 1 GB of memory, and a 10 GB hard disk.

You may notice that these are lower minimum system requirements than the
Weathervane module. There are a couple of reasons for this:

1. We are only exercising a couple of applications in this VM (namely MySQL for the
database tier and Apache HTTP Server for the web server tier).
2. This VM has been built with a small database size. From the previous lesson, we
learned that DVD Store 3 comes in 3 sizes: small (10 MB), medium (1 GB), and
large (100 GB). For building a medium or large database, you should scale up the
CPU, memory, and disk size appropriately.

OS Installation/Post-Install Tasks

DVD Store should work on any modern Linux distribution. This VM was installed with
CentOS 6.8.

After the OS installation, a few tasks should be run as the root user prior to installing
DS3 (a consolidated sketch of these commands follows the list):
NOTE: these have already been done in our lab environment; do not run these
commands now!

1. Update all software packages by running the command yum update


2. Install VMware Tools (or open-vm-tools )
3. Stop the firewall by running service iptables stop and disable it on boot by
running chkconfig iptables off
(NOTE: this is for ease of use in a test/dev environment; never do this in
production!)
4. Install MySQL by running yum install mysql-server and start it by running service
mysqld start
5. Install Apache HTTPD Server by running yum install httpd httpd-devel
6. Install PHP with MySQL support by running yum install php php-mysql
7. Create a user named web and set its password to web:
useradd web
passwd web
chmod 755 /home/web
8. Set permissions for this new user within MySQL:
mysql
>create user 'web'@'localhost' identified by 'web';
>grant ALL PRIVILEGES on *.* to 'web'@'localhost' IDENTIFIED BY 'web';
>exit;
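
For reference, here is a consolidated sketch of the steps above as they might look on a
fresh CentOS 6.x VM. It is a sketch only: the package and service names are the ones
listed above and may differ on other distributions, it must be run as root, and it should
not be run in the lab environment.

# Sketch only - do not run in the lab; install VMware Tools/open-vm-tools separately (step 2)
yum -y update
yum -y install mysql-server httpd httpd-devel php php-mysql
service iptables stop && chkconfig iptables off   # test/dev convenience only - never in production
service mysqld start
useradd web && echo web | passwd --stdin web && chmod 755 /home/web   # passwd --stdin is RHEL/CentOS-specific
mysql -e "CREATE USER 'web'@'localhost' IDENTIFIED BY 'web'; GRANT ALL PRIVILEGES ON *.* TO 'web'@'localhost';"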


Download and Extract DS3

DVD Store 3 is an open source project that is actively developed and maintained. The
latest version can be downloaded from GitHub, as shown here, from
https://github.com/dvdstore/ds3/

To extract DS3, log in as root to your CentOS host, and unzip it with the command unzip
ds3-master.zip

NOTE: This has already been done in our hands-on lab environment, so do not run this
command in the lab VM.

Finally, we need to copy the PHP Web pages to the correct place on the host (again, this
has already been done in our lab, no need to run):

mkdir /var/www/html/ds3
cp /root/home/ds3/mysqlds3/web/php5/* /var/www/html/ds3
service httpd restart


Building a DVD Store 3 Database / Starting the Lab

This lesson shows how to build a DVD Store 3 database - hands-on!

We run the configuration script to generate the necessary SQL commands, but due to
time and resource constraints, we do not run the actual build. Our lab environment
already has a pre-built database ready to run.

Launch Performance Lab Module Switcher

Double click the Performance Lab MS shortcut on the Main Console desktop.

Start Module 6


Click on the Module 6 Start button (highlighted) to run a PowerShell script that starts
the DVD Store 3 VM and opens a PuTTY session to it.

Once the module starts, a PuTTY window and a popup box appear indicating that it has
started (as shown here). Click OK.

We are now ready to learn how to build a DS3 database!

Run the Install_DVDStore.pl script


Remember earlier when we learned DS3 has three "canned" database sizes (small,
medium, and large)? Well, we can also specify a custom database size to build. Here's
how (press Enter after each command/value; a non-interactive one-liner sketch follows
the list):

1. Change to the DS3 directory. In this VM, it's been installed to /root/ds3 and you're
already in the /root folder so type:
cd ds3
2. Run the Install_DVDStore Perl script:
perl Install_DVDStore.pl
3. We are now asked how big we want our DS3 database to be. Let's build a 100 MB
MySQL database:
100
4. When asked if the database size is in MB or GB, specify MB:
MB
5. Since DS3 supports multiple databases, we need to specify MYSQL:
MYSQL
6. Finally, DS3 needs to know if the database server will be on a Windows or Linux
machine; this determines whether the input files will have CR/LF (DOS format).
Choose LINUX:
LINUX
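
If you ever want to script this build outside the lab, and assuming your version of
Install_DVDStore.pl reads its answers from standard input in the order shown above
(worth verifying first), the same answers could be supplied non-interactively along
these lines:

printf '100\nMB\nMYSQL\nLINUX\n' | perl Install_DVDStore.pl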

The Install_DVDStore.pl script now does the following:

• Calculate number of rows for the customers , orders and products tables according
to the database size
• Generate the .CSV (Comma-Separated Values) files for each table in the
appropriate folder
• Create the SQL database scripts to build and clean up the database

Please wait for the Perl script to finish.


This is how the script looks upon completion. Look for the message highlighted:
Completed creating and writing build scripts for MySQL database...

Now that all the MySQL scripts have been generated, the database would normally be
built at this point. The reason that the scripts are generated instead of just doing the
database creation directly is that it allows for the database to be easily recreated later,
or even modified if needed, to address specific testing requirements of individual
environments.

The database build is accomplished by the following commands. NOTE: Do not run
these commands in the lab environment, for a couple of reasons: the database
build takes a long time, and we have already saved you the trouble (a database has
been built and is ready to run).

# NOTE: Do not run these commands in the lab environment


cd mysqlds3
sh mysqlds3_create_all.sh
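
After a build completes in your own environment, a quick sanity check is to count the
rows in one of the tables and compare the result with the sizing table from the earlier
lesson. The database and table names below are assumptions; check the generated build
scripts for the names your build actually uses:

# names below (DS3, CUSTOMERS) are assumptions - confirm against the generated scripts
mysql -e "SELECT COUNT(*) FROM DS3.CUSTOMERS;"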

Now that we've seen how to build a DS3 database, let's start an actual run!


Configuring/Running DVD Store 3


This lesson will describe how to configure the DVD Store 3 load driver and run it against
the MySQL database VM deployed in the lab environment.

Start top on the DVD Store VM

To view the performance of the DVD Store VM, type the command top and press Enter.
This shows us how much CPU and memory are consumed along with which processes
are taxing the VM the most.
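
While top is running, a couple of its standard interactive keys can be handy (these are
stock procps top key bindings and may vary slightly between versions): press 1 to toggle
a per-CPU breakdown, and press Shift+M to sort the process list by memory usage instead
of CPU.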

Next, we kick off the DS3 driver from our Windows machine.

Start the DVD Store driver on the Main Console

On the Main Console (Windows desktop), double-click the DVD Store 3 Driver icon
shown here (note: you may need to minimize some windows in order to see it).


Monitor the driver and the PuTTY windows during the run

While the run is progressing, you should watch both the PuTTY console running top
(shown here on the top) and the DS3 driver window (shown here on the bottom).

Let's make some observations about this screenshot (note: due to the variability of the
cloud, your performance may vary):

1. The CPU utilization line in top shows us that 34.2% is consumed in user space
(application) and 9.3% in system (kernel), for a total of 43.5% CPU. There is zero
idle time, however; the rest of the CPU time (55.5%) is spent waiting for I/O --
meaning we likely have a disk or network bottleneck in our environment.


2. The process that is consuming the 43.5% CPU utilization we saw is mysqld (the
MySQL database) -- which makes sense, since we're hammering it with a
database benchmark!

Let's look at the output of the driver:

3. These are normal DS3 driver startup messages, indicating the various threads
that are connecting to the database server before the actual run begins
4. Approximately every ten seconds, you will see a performance summary output to
the screen (notice et , elapsed time, goes up by ten each line).
5. There are many statistics on each line (many of them dealing with rt which is
short for response time), but we're most interested in the primary DVD Store
throughput performance metric, known as opm or orders per minute. Here we
can see we're only achieving about 40 opm on average, which is very low. You
would achieve much higher opm numbers in an optimized testbed.

Congratulations! You're now running DVD Store!

Here's the command we used on the Windows machine to start the driver, in case
you're curious:

c:\hol\ds3mysqldriver --target=dvdstore-01a.corp.local --n_threads=5 --warmup_time=0 --detailed_view=Y

Let's see what each of these driver parameters means.

Show the driver parameters


Click the Command Prompt icon to open a Command Prompt window.

Type this command as shown and press Enter:


ds3mysqldriver

You can see a list showing each Parameter Name, Description, and Default Value.
You can also create a configuration file and pass that on the command line instead of
manually setting each parameter.
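
For example, to drive a heavier load from the same machine, you could rerun the driver
with a larger thread count, reusing only the parameters we have already seen (the values
here are purely illustrative, and in the shared lab a higher thread count may simply push
response times up rather than increase opm):

c:\hol\ds3mysqldriver --target=dvdstore-01a.corp.local --n_threads=10 --warmup_time=0 --detailed_view=Y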


Analyzing Results/Improving DVD Store 3 Performance

This lesson describes how to analyze DVD Store 3 results (specifically, comparing and
contrasting a low-performing run versus a higher-performing run) and then looks at
ways to improve performance.

DVD Store 3 Performance Metrics: opm (throughput) and rt (response time)

Performance metric (abbreviation)   Definition                       Value

opm                                 Orders Per Minute (throughput)   Higher = better
rt                                  Response Time (latency)          Lower = better

We will look at a couple of results that we'll call "bad" and "good".

• A "bad" or low-performing configuration has low opm (orders per minute) and
high response time (rt).
• Conversely, a high-performing configuration has high opm (orders per minute)
and low response time (rt).


Example output: "Bad" performance (low opm, high rt)

Here is example output of a poorly-performing configuration. We will look at several key
areas:

1. Real-time output: every 10 seconds ( et is short for elapsed time, in seconds),
the DS3 driver will output a line showing how long it's been running, how many
orders per minute (opm) were achieved, and response times (rt).
2. After the driver finishes, it prints a line that starts with Final that shows the
overall performance statistics.
et= 60.0 tells us that this was a short run (only one minute).
3. opm=41 tells us that the database server was only able to process 41 orders
per minute. This is low, but expected, as it was run in our nested hands-on lab
environment, which shares resources with many other labs.
4. rt_tot_avg=13218 tells us that the average response time was 13218
milliseconds (13.218 seconds). This is high, but again, expected.

Let's compare this to a high-performing run that was done in an isolated dedicated lab
environment.


Example output: "Good" performance (high opm, low rt)

Here is example output of a high-performing configuration:

1. The summary line, which starts with Final , shows the overall performance
statistics.
et= 609.4 tells us that this was a 10-minute run (~600 seconds).
2. opm=74932 indicates this database server was able to process 74,932 orders
per minute. This is much higher than the previous example, as it is a highly-
tuned performance configuration.
3. rt_tot_avg=87 tells us that the average response time was only 87
milliseconds. Again, this low value is in stark contrast to the previous example.

So what factors determine whether a database server can sustain high load, and thus
achieve the maximum opm?

Database Performance Factors

Obviously, we want to achieve the maximum opm (database performance) possible in
our environment.
There are many factors that affect performance, and there isn't enough time in this lab
to cover each of them in detail, but here is a short list (NOTE: some references say 6.5,
but still apply to 6.7):

• Follow the Performance Best Practices for VMware vSphere 6.5. This guide
covers hardware (processors, storage, network), the ESXi/vSphere hypervisor,
and virtual machine (guest operating system) performance tuning.
• Follow the best practices for your particular database server. Here are some
good examples:
◦ SQL Server: Architecting Microsoft SQL Server on VMware vSphere Best
Practices Guide
◦ Oracle: Oracle Databases on VMware Best Practices Guide
◦ MySQL: MySQL Performance Tuning and Optimization Resources

• Check out some recent whitepapers from the VMware performance team that
used DVD Store 3 on vSphere 6.5:
◦ SQL Server VM Performance with VMware vSphere 6.5
◦ Oracle Database Performance on vSphere 6.5 Monster Virtual Machines


By following these guides and testing the performance of your particular environment
prior to production deployment, you can ensure your virtualized databases achieve
maximum throughput.


Conclusion and Clean-Up


Congratulations! You now know how to install, configure, and run the DVD Store
3 benchmark!

You've also learned how to tune your database server to achieve the
maximum orders per minute (opm), so your database throughput will be
as high as possible with the lowest response times.

Stop Module 6

To end this module:

1. Click on the Module Switcher in the taskbar (or the desktop icon if you closed
it)
2. Click the Stop button for Module 6.

Resources/Helpful Links

Congratulations on completing this module!

For more information about DVD Store 3, and database performance in general, here are
some helpful links:


• GitHub Repository: https://github.com/dvdstore/ds3

Best Practices:

• Architecting Microsoft SQL Server on VMware vSphere Best Practices Guide:
https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/sql-server-on-vmware-best-practices-guide.pdf
• Oracle Databases on VMware Best Practices Guide: https://www.vmware.com/
content/dam/digitalmarketing/vmware/en/pdf/partners/oracle/oracle-databases-
on-vmware-best-practices-guide.pdf
• MySQL Performance Tuning and Optimization Resources: https://www.mysql.com/
why-mysql/performance/

DVD Store blogs/whitepapers:

• Benchmark Your SQL Server Instance with DVD Store:
http://www.davidklee.net/2013/07/29/benchmark-your-sql-server-instance-with-dvdstore/
• SQL Server VM Performance with VMware vSphere 6.5: https://blogs.vmware.com/
performance/2017/03/sql-server-performance-vmware-vsphere-6-5.html
• Oracle Database Performance on vSphere 6.5 Monster VMs:
https://blogs.vmware.com/performance/2017/05/oracle-database-performance-
vsphere-6-5-monster-virtual-machines.html

DVD Store 3 is also one of the key workloads in VMmark 3.0:

• https://blogs.vmware.com/performance/2017/06/introducing-vmmark3.html
• https://www.vmware.com/products/vmmark.html


Module 7 - Application
Performance Testing with
Weathervane (45
minutes)


Introduction

This Module introduces Weathervane, a new application-level performance benchmark
designed to allow the investigation of performance trade-offs in modern virtualized and
cloud infrastructures.

Here is a brief overview for the content in this Module:

• What is Weathervane?
• Installing Weathervane
• Configuring Weathervane
• Running/Tuning Weathervane
• Resources/Conclusion


What is Weathervane?
This lesson will describe what the Weathervane benchmark is, and how it is
different from traditional benchmark workloads.

Weathervane Description

• Weathervane is an application-level performance benchmark designed to allow the
investigation of performance trade-offs in modern virtualized and cloud infrastructures.
• Weathervane is flexible. A deployment can be customized for the needs of the
evaluation.
◦ A small deployment can be hosted within 1 VM or many VMs with
multiple load balancers, messaging servers, web servers, application
servers, and database servers
◦ Supports multiple types of web server and database servers (read on for
specifics)
◦ Unlike industry-standard benchmarks, there are no run rules or
requirements to have results audited/reviewed

• Weathervane supports measurements of key cloud performance metrics and technologies:
◦ Basic metrics: throughput and response-time with Quality of Service
(QoS) limits
◦ Isolation and Fairness: Can run many simultaneous application instances
◦ Elasticity: Number of instances in service tiers can be scaled up and down
at run-time in response to changing load
◦ Deployment technologies: Application can be deployed in Virtual
Machines and Docker containers

• Workflow is highly automated to simplify running a large multi-tier application benchmark.


Weathervane Components

Weathervane consists of three main components (if the picture above seems daunting,
do not fear: this lab has all three components running inside one Linux VM!). It is
possible to run every Weathervane service in one VM or container, but it is also possible
to run only specific service tiers, or even only specific service instances.

1. The Workload driver that can drive a realistic and repeatable load against the
application
2. The Run Harness that automates the process of executing runs and collecting
results and relevant performance data
3. The Auction Application itself is a web-application for hosting real-time
auctions.

We will take a look at each of these components in more detail then run Weathervane in
our lab environment.


Workload Driver

The Weathervane workload driver has several key features:

• Supports the data-driven, asynchronous behaviors typical of modern Web applications
• A simulated user can have multiple behaviors representing asynchronous
operations
• Supports complex control-flow decisions based on retrieved state and operation
history
• Asynchronous design uses a small number of threads but can simulate a large
number of users
• Flexible design means it could be used to drive load for any web application


Run Harness

The Weathervane run harness is controlled by a configuration file that describes the
deployment, including:

• Number of instances in each application tier


• Number of tiers
• Service implementations
• Service Tuning

The harness also does several other extremely useful tasks:

• Manages configuration/start/stop of all services


• Pre-loads and prepares data for run
• Automates multiple run experiments: findMax (finds the maximum number of
users that can be serviced) and targetUtilization (which only drives utilization
up to a certain percentage, such as 70%)
• Collects host, service, and application level logs and performance stats


Later in this module, we will start an actual run in the lab environment using the
harness to see how easy it is -- it is literally just one command!

Auction Application

The Auction Application, as we can tell from the picture above, is the most complex
portion of Weathervane.

It is a web app that simulates hosting real-time auctions. It uses an architecture that
allows deployments to be easily scaled to, and sized for, a large range of user loads. A
deployment of the application involves a wide variety of support services, such as
caching, messaging, data store, and relational database tiers. Many of the
services are optional, and some support multiple provider implementations.

A default Weathervane deployment like the VM in this lab uses the following
applications (click the links for more information about the applications). All are set up
"out of the box" (ready to run) via the automatic setup script that comes with the
benchmark:

• Apache Tomcat is the application server


• Nginx is the web server (Apache HTTP Server is also supported)


• PostgreSQL is the database server (MySQL is also supported)
• MongoDB is the NoSQL data store
• RabbitMQ is the message server

In addition, the number of instances of some of these services can be scaled
elastically at run time in response to a preset schedule or to monitored performance
metrics. The flexibility of the application deployment allows us to investigate a wide
variety of complex infrastructure-related performance questions.


Downloading/Installing Weathervane
This lesson describes how to install the Weathervane benchmark. It is very easy to
set up, as most of the process is automated.

NOTE: Weathervane has already been installed in our hands-on lab environment,
so this lesson is purely informational (for example, if you want to learn how easy it
is to install Weathervane in your own environment). In the next lesson, we will
configure and run Weathervane in the lab environment.

Create a Weathervane VM

Creating a Weathervane host is relatively straightforward. The process of setting up
Weathervane starts with creating a Weathervane host, which is a CentOS 7 VM that
we configure to run the workload driver, run harness, and application components.
When creating the VM, select Linux as the Guest OS Family, and Red Hat Enterprise
Linux 7 (64-bit) as the Guest OS Version. This is necessary for the proper operation
of the customization scripts when cloning the VM.


As shown in the screenshot, the virtual hardware must have at least 2 CPUs, 8 GB of
memory, and at least 20 GB of disk space (we used 30 GB in this example). For
larger deployments, the hardware can be scaled up appropriately (see the Weathervane
documentation for more details).


Install CentOS 7


The CentOS 7 installation may be a Minimal Install (the default, as shown) or a full
desktop install.

In fact, you may want to create one Weathervane host with a full desktop install for
running the harness, and a second with a Minimal Install for cloning to VMs for running
the various Weathervane services.

Post-OS Installation Tasks

After completing the OS installation, a few tasks should be done prior to installing
Weathervane (consolidated in the sketch after the list):

1. Update all software packages by running the command yum update as the root
user.
2. Install VMware Tools (for CentOS 7, open-vm-tools) by running the command yum
install -y open-vm-tools as the root user.
3. Install Java by running the command yum install -y java-1.8.0-openjdk* as the root
user.
4. Install Perl by running the command yum install -y perl as the root user.

NOTE: These commands will not work in the lab environment, but these tasks have
already been performed in our VM.
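
For reference, the post-install tasks above boil down to a few commands on a CentOS 7
host. This is a sketch only (not to be run in the lab), using the package names given in
the list:

yum -y update
yum -y install open-vm-tools
yum -y install 'java-1.8.0-openjdk*'
yum -y install perl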


Download and Extract Weathervane

Weathervane is an open source project developed by VMware. As such, the latest
release tarball (.tar.gz) can be downloaded from GitHub, as shown here, from
https://github.com/vmware/weathervane/releases

A release tarball is a snapshot of the repository at a known good point in time. Releases
are typically more heavily tested than the latest check-in on the master branch.

To install Weathervane, log in as root to your CentOS host and unpack the tarball with
the command tar zxf weathervane-1.0.14.tar.gz
NOTE: This has already been done in our hands-on lab environment, so do not run this
command in the lab VM.

Once the tarball is extracted, build the Weathervane executables.

Building the Weathervane Executables

To build the Weathervane executables, go into the /root/weathervane directory (created
when you unpacked the tarball in the previous step) and issue the command:
./gradlew clean release
NOTE: This has already been done in our hands-on lab environment, so do not run this
command in the lab VM.


The first time you build Weathervane, this downloads a large number of dependencies.
Wait until the build completes before proceeding to the next step.

Running the Weathervane auto-setup script

The auto-setup script configures the VM to run all of the Weathervane components.
NOTE: the VM must be connected to the Internet in order for this process to succeed.

From the Weathervane directory, Run the script using the command:
./autoSetup.pl
NOTE: This has already been done in our hands-on lab environment, so do not run this
command in the lab VM.

The auto-setup script may take an hour or longer to run depending on the speed of your
internet connection and the capabilities of the host hardware.
Once it has completed, the VM must be rebooted. Weathervane is now ready to run!


Configuring Weathervane
This lesson describes how to start the lab and configure the Weathervane benchmark on
our lab environment deployment.

Launch Performance Lab Module Switcher

Double click the Performance Lab MS shortcut on the Main Console desktop, or switch
to that window on the taskbar.

Start Module 7


Click on the Module 7 Start button (highlighted) to run a PowerShell script that starts
the Weathervane VM and opens two PuTTY sessions to it.

Once the module starts, you see two PuTTY windows side-by-side and a popup window
(as shown here). Click OK.

We are now ready to configure and run Weathervane!

Configuring Weathervane


We should look at the Weathervane configuration file to see how configurable this
benchmark is.

In the PuTTY window on the left, type this command and press Enter:

less weathervane.config

We can now use the standard navigation keys (Up/Down arrows, Page Up/Down) to
see the various parameters to customize.


We are now looking at the beginning of the Weathervane configuration file. As standard
with most configuration files, lines that start with "#" are commented out and thus
ignored by Weathervane.

Highlighted here is one of the most useful parameters (which is why it is at the top!):
users. As the comments state, this determines how many simulated users are active
during a Weathervane benchmark run. This has already been reduced to the minimum
value of 60 due to the constraints of our lab environment, but the default is 300 as we
will see next.


In the right-hand PuTTY window, type the following command and press Enter:
(Note: the character before less is the pipe symbol, typically typed by holding down
Shift and pressing the backslash \ key. You can also select this text and drag-and-drop it
directly into the PuTTY window -- try it!)

./weathervane.pl --help |less


The --help command we just ran lists all the Weathervane command-line parameters. If
any of these parameters are set on the command line, it will override both the
Parameter Default and even the value set in the weathervane.config file we just looked
at.

As shown in this screenshot, the users parameter defaults to a value of 300, but we
have set it to the minimum value of 60 in the weathervane.config. If we wanted to try a
Weathervane run of 100 users, we could override it on the command line, i.e.
./weathervane.pl --users=100.


In both PuTTY windows, press the Page Down key to scroll down to the next page, and
you should see a screen similar to this. As the help text explains, Weathervane has
three run length parameters: rampUp, steadyState, and rampDown. To make it easier,
you can set all three parameters by changing runLength to short, medium, or long.

In the interest of time (and to not tax our lab environment for any longer than it needs
to be!), we have set the values to 30, 60, and 0 in our configuration file. In an actual
benchmark environment, we would want to set runLength to medium or long to gauge
performance over a longer period of time.
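
For example, once you move to a dedicated environment, a longer run at a higher load
could be launched by overriding both settings on the command line (both parameters
appear in the configuration file and help output we are viewing; the values are
illustrative):

./weathervane.pl --users=100 --runLength=medium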

At this point, feel free to use the arrow keys and the Page Up/Page Down keys to
look at all of the parameters Weathervane supports. As you can see, it is very
configurable!

Now that we have looked at the Weathervane configuration file and the help text, left-
click in each PuTTY window and press q to "quit" less and return to the bash shell.
You should see a screen similar to this.

In the next lesson, we'll start an actual Weathervane benchmark run!


Running/Tuning Weathervane
This lesson describes how to run and tune the Weathervane benchmark using the VM
deployed in the lab environment.

Running Weathervane


Now that we have learned how to configure Weathervane, we can start a test run! This
is actually the easiest part, since the run harness automates starting the necessary
services, gathering performance statistics, and stopping the benchmark once the run
lengths we specified have elapsed.

Click in the left-hand PuTTY window, and start the Weathervane benchmark harness by
running one simple command (and press Enter):

./weathervane.pl

Note that since we are already in the /root/weathervane directory, we invoke
weathervane.pl from the current directory.

In the right-hand PuTTY window, the processes consuming CPU, memory, etc. in real-
time can be monitored while Weathervane is running by running the Linux top
command (press Enter afterwards):

top


Weathervane should now be running!

• You can view the progress of the run on the left. Be patient, as it takes a few
minutes to start all of the services.
• You can view the processes for the various Weathervane services on the right.

The top output can be broken down into three sections:

1. This shows the CPU utilization of the two virtual CPUs (vCPUs); these values will
fluctuate throughout the run. In this screenshot, they are both heavily utilized
(95-96%), which is expected for this benchmark.
2. This shows the memory utilization of the VM.
The top line ( KiB Mem ) shows us that most of the 8 GB we have allocated to the
VM is used , with very little free ; again, this is expected, as there are many
services/processes running and consuming RAM.
Conversely, the next line ( KiB Swap ) shows that while we have ~3 GB of swap
space, most of it is free , and very little used ; this is a Good Thing, as Linux is not
having to swap memory to disk (which is likely what would happen if we did not
give the VM enough memory, i.e. 4 GB)
3. The bottom part of the top output shows the running processes, sorted by
highest CPU utilization ( %CPU ) first. At a quick glance, we can see that java
(Tomcat), mongod (MongoDB), and postgres (PostgreSQL) are the heavy hitters.

This benchmark run takes some time to complete (~15 minutes from start to finish).
While we wait, we can browse through the Weathervane documentation to see how we
can improve performance.


Tuning Parameters (User's Guide)

The Weathervane User's Guide comes as a PDF with the benchmark and shows how to
install, configure, and tune Weathervane. It also has a handy section on Tuning
Parameters.

The document is available here on GitHub:
https://github.com/vmware/weathervane/raw/master/weathervane_users_guide.pdf

We will not make you read this 99-page document from beginning to end :-) In any
case, we have already touched on a lot of what this guide covers in terms of installation
and configuration.

Therefore, scroll down to page 56 (shown here), which has a section on Component
Tuning. Skim through the next few pages to get a feel for the parameters you can
experiment with to tune the various tiers inside Weathervane:

• Workload Driver Tuning Parameters: These options tune the workload driver,
such as heap size and # of threads.


• Web Server (Apache httpd, Nginx) Tuning Parameters: The run harness provides
a number of parameters related to tuning Nginx, and the harness can also manage
the tuning of these parameters for you.
• Database Server (MySQL, PostgreSQL) Tuning Parameters: The run harness
allows for automatic tuning or manually specifying values such as buffer sizes.
• MongoDB: The run harness allows for disabling/enabling transparent huge
pages.
• File Server (optional): If you choose to use an NFS file server instead of MongoDB
for the image store, you can adjust the number of processes and the read/write buffer sizes.

Another way to improve the performance of a Weathervane environment is to clone the
Weathervane host VM and assign different services to each clone. For example, you can
have separate (and multiple) VMs that act as application servers, web servers, NoSQL
data stores, etc. For more information, see section 7.5 of the User's Guide, "Cloning
the Weathervane VM".

Check on the Weathervane run


Periodically switch back to the PuTTY windows to check on the progress of the run.
When the Weathervane benchmark run has finished, you will see screens similar to this
one. Specifically:

1. On the left, you will see messages about Cleaning and compacting storage , and
whether the run Passed or Failed .
NOTE: It is OK if it says failed and/or a message such as Failed Response-Time
metric . In our shared lab environment, the response times likely won't meet the
benchmark requirements. This would not be an issue in a dedicated test/dev
environment.
2. Take specific note of the run number at the end (in this example, it is Run
8 ). We use that number in the next step when we look at the output files.
3. On the right, note the top screen will indicate the Linux VM is now essentially
idle ( %Cpu less than 1%, and most of the memory is free ).
4. Once you have confirmed the run is over, close the PuTTY window on the right
by clicking the "x" in the upper-right (click OK when PuTTY asks you to
confirm).
5. Maximize the remaining PuTTY window on the left by clicking the maximize
button in the upper-right, as shown.

Analyzing Weathervane benchmark output

After running the benchmark, you can look at the various log files the Weathervane run
harness collects (a quick search sketch follows the list):

1. cd output (all Weathervane output is stored in /root/weathervane/output )


2. ls (to show all the runs on this VM; determine the most recent one)
3. cd 8/ (replace with the most recent run number)
4. cat version.txt (records the version of Weathervane used to run this result)
5. cat run.log (shows any errors and details of response-times from each of the
application instances)


6. cat console.log (not shown; this is just a record of what you already saw output to
the PuTTY console, i.e. the starting/stopping of services, whether the run passed or
failed, and cleanup)
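
As a quick way to pull the headline results out of a run directory, you could also search
these logs for the pass/fail and response-time lines (assuming they appear in run.log and
console.log the same way they did on the console):

grep -iE 'passed|failed|response-time' run.log console.log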

Once you are done looking at these files, you can close this PuTTY console.

If a run passes, this means that the application deployment and the underlying
infrastructure can support the load driven by the given number of users with
acceptable response-times for the users' operations.

A typical way of using Weathervane is to compare the maximum number of supported
users when some component of the infrastructure is varied. For example, if the same
application configuration is run on two different servers, you can compare the maximum
user load supported by the servers to determine which has better performance for this
type of web application.

Congratulations! You now know how to run Weathervane!


Conclusion and Clean-Up


Congratulations! You now know how to install, configure, and run the
Weathervane benchmark!

How to End Module 7

To end this module, open the Module Switcher window and click the Stop button for
Module 7.

Resources/Helpful Links

For more information about Weathervane, here are some helpful links:

• GitHub Repository: https://github.com/vmware/weathervane


◦ User’s Guide: https://github.com/vmware/weathervane/blob/master/
weathervane_users_guide.pdf

• VMware VROOM! Blogs:


◦ https://blogs.vmware.com/performance/2015/03/introducing-weathervane-
benchmark.html
◦ https://blogs.vmware.com/performance/2017/04/weathervane-
performance-benchmarking-now-open-source.html

• Weathervane is also one of the key workloads in VMmark 3.0:


◦ https://blogs.vmware.com/performance/2017/06/introducing-
vmmark3.html
◦ https://www.vmware.com/products/vmmark.html


Module 8 - Processor
Performance Monitoring,
Host Power Management
(30 minutes)


Intro to CPU Performance Monitoring and Host Power Management

The goal of this module is twofold:

1. Expose you to a CPU contention issue in a virtualized environment, and
quickly identify performance problems by checking various performance
metrics and settings
2. Learn about how to review/change power management policies at the
server BIOS and within ESXi via the vSphere Client

Performance problems may occur when there are insufficient CPU resources to
satisfy demand. Excessive demand for CPU resources on a vSphere host may
occur for many reasons. In some cases, the cause is straightforward. Populating a
vSphere host with too many virtual machines running compute-intensive
applications can make it impossible to supply sufficient CPU resources to all the
individual virtual machines. However, sometimes the cause may be more subtle,
related to the inefficient use of available resources or non-optimal virtual machine
configurations.

Let's get started!

Check the Lab Status in the lower-right of the desktop

Please check that your lab has finished all the startup routines and is ready for you
to start. If you see anything other than "Ready", please wait a few minutes. If after five
minutes your lab has not changed to "Ready", please ask for assistance.


CPU Contention, vCenter Performance Charts

Below is a list of the most common CPU performance issues:

High Ready Time: A CPU is in the Ready state when the virtual machine is ready
to run but unable to run because the vSphere scheduler is unable to find physical
host CPU resources to run the virtual machine on. Ready Time above 10% could
indicate CPU contention and might impact the performance of CPU-intensive
applications. However, some less CPU-sensitive applications and virtual machines
can have much higher values of ready time and still perform satisfactorily.

High Costop time: Costop time indicates that there are more vCPUs than
necessary, and that the excess vCPUs create overhead that drags down the
performance of the VM. The VM likely runs better with fewer vCPUs. The vCPU(s)
with high costop are being kept from running while the other, more-idle vCPUs
catch up to the busy one.

CPU Limits: CPU Limits directly prevent a virtual machine from using more than a
set amount of CPU resources. Any CPU limit might cause a CPU performance
problem if the virtual machine needs resources beyond the limit.

Host CPU Saturation: When the Physical CPUs of a vSphere host are being
consistently utilized at 85% or more then the vSphere host may be saturated.
When a vSphere host is saturated, it is more difficult for the scheduler to find free
physical CPU resources in order to run virtual machines.

Guest CPU Saturation: Guest CPU (vCPU) Saturation is when the application
inside the virtual machine is using 90% or more of the CPU resources assigned to
the virtual machine. This may be an indicator that the application is being
bottlenecked on vCPU resource. In these situations, adding additional vCPU
resources to the virtual machine might improve performance.

Oversizing VM vCPUs: Using large SMP (Symmetric Multi-Processing) virtual
machines can cause unnecessary overhead. Virtual machines should be correctly
sized for the application that is intended to run in the virtual machine. Some
applications may only support multithreading up to a certain number of threads,
and assigning additional vCPUs to the virtual machine may cause additional
overhead. If vCPU usage shows that a machine configured with multiple vCPUs is
only using one of them, it might be an indicator that the application inside the
virtual machine is unable to take advantage of the additional vCPU capacity, or
that the guest OS is incorrectly configured.

Low Guest Usage: Low in-guest CPU utilization might be an indicator that the
application is not configured correctly, or that the application is starved of some
other resource such as I/O or memory and therefore cannot fully utilize the
assigned vCPU resources.

Launch Performance Lab Module Switcher

Double click the Performance Lab MS shortcut on the Main Console desktop.

Start Module 8

Click on the Start button for Module 8, and a script launches.

The script takes a few minutes to run.


Wait until you see "Press Enter to continue" to proceed. Press enter.

CPU Benchmarks Started

When the script completes, you see two Remote Desktop windows open (note: you may
have to move one of the windows to display them side by side, as shown above).

The script has started a CPU intensive benchmark (SPECjbb2005) on both perf-
worker-01a and perf-worker-01b virtual machines, and a GUI is displaying the real-time
performance value as this workload runs.

If you do not see the SPECjbb2005 window open, launch the shortcut in the upper left
hand corner.


Above, we see an example screenshot where the performance of the benchmarks is
around 15,000.

IMPORTANT NOTE: Due to changing loads in the lab environment, the performance
values may vary. Please make note of the approximate Performance scores, as they will
change later.

Navigate to VM-level Performance Chart

Click on the Chrome icon to open a browser window.


This is the vCenter login screen. To login to vCenter:

1. Check the Use Windows session authentication checkbox


2. Click the LOGIN button

1. Select the perf-worker-01a virtual machine from the list of VMs on the left
2. Click the Monitor tab
3. Click Performance
4. Click Advanced
5. Click on the Popup Chart icon so we can get a dedicated chart popup window.

Select Chart Options

Let's maximize the window and select specific counters via Chart Options:

1. Click the Maximize window icon (be careful not to click Close!)
2. Click Chart Options at the top


Select CPU Counters for Performance Monitoring

When investigating a potential CPU issue, there are several counters that are important
to analyze:

• Demand: Amount of CPU the virtual machine is demanding / trying to use.


• Ready: Amount of time the virtual machine is ready to run but unable to because
vSphere could not find physical resources to run the virtual machine on
• Usage: Amount of CPU the virtual machine is actually currently being allowed to
use.

1. Select CPU on the left-hand side (if it's not already selected by default)
2. Scroll through the list, and check these counters: Demand, Ready, and Usage
in MHz
3. Only select perf-worker-01a for the Target Object (deselect 0 if it's checked)
4. Click OK


Monitor Demand vs. Usage lines

Notice the amount of CPU this virtual machine is demanding and compare that to the
amount of CPU usage the virtual machine is actually allocated (Usage in MHz). The
virtual machine is demanding more than it is currently being allowed to use.

Notice that the virtual machine is also seeing a large amount of ready time. Guidance:
Ready time > 10% could be a performance concern.

You can close this popup window, but please leave the vSphere Client window open.


CPU State Times Explanation

Virtual machines can be in any one of four high-level CPU States:

• Wait: This can occur when the virtual machine's guest OS is idle (waiting for
work), or the virtual machine could be waiting on vSphere tasks. Some examples
of vSphere tasks that a vCPU may be waiting on include waiting for I/O to
complete or waiting for ESXi level swapping to complete. These non-idle vSphere
system waits are called VMWAIT.
• Ready (RDY): A CPU is in the Ready state when the virtual machine is ready to
run but unable to run because the vSphere scheduler is unable to find physical
host CPU resources to run the virtual machine on. One potential reason for
elevated Ready time is that the VM is constrained by a user-set CPU limit or
resource pool limit, reported as max limited (MLMTD).
• CoStop (CSTP): Time the vCPUs of a multi-vCPU virtual machine spent waiting
to be co-started. This gives an indication of the co-scheduling overhead incurred
by the virtual machine.
• Run: Time the virtual machine was running on a physical processor.


Explanation of value conversion

NOTE: vCenter reports some metrics such as "Ready Time" in milliseconds (ms). Use
the formula above to convert the milliseconds (ms) value to a percentage.

For multi-vCPU virtual machines, multiply the Sample Period by the number of vCPUs of
the VM to determine the total time of the sample period. It is also beneficial to monitor
Co-Stop time on multi-vCPU virtual machines. Like Ready time, Co-Stop time greater
than 10% could indicate a performance problem. You can examine Ready time and Co-
Stop metrics per vCPU as well as per VM. Per vCPU is the most accurate way to
examine statistics like these.
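
As a worked example, using the real-time chart's 20-second (20,000 ms) sample period:
a 1-vCPU VM reporting 2,000 ms of Ready time is at 2,000 / 20,000 = 10% Ready, right at
the guidance threshold above, while the same 2,000 ms on a 4-vCPU VM works out to
2,000 / (4 x 20,000) = 2.5% when averaged across the whole VM.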

Navigate to Host-level CPU chart view

1. Select esx-01a.corp.local in the vSphere Client.


2. Select the Monitor tab
3. Select the Advanced Performance view
4. Select the Popup Chart icon


Examine ESXi Host Level CPU Metrics

Click the Maximize window icon to get the maximum real estate.

Notice in the Chart, that only one of the CPUs (pictured here in green) on the host
seems to have any significant workload running on it. We'll see why this is the case
next.


VM CPU Affinity set via PowerCLI

The PowerShell/PowerCLI script that ran when we started this lab set the CPU Affinity of
both VMs (perf-worker-01a and perf-worker-01b) to CPU 1, as shown here.

Affinitizing VMs to specific CPUs (also known as "pinning") is generally not a
best practice. It is only used here as a demonstration.

Stop 1 VM, Monitor ESXi Host CPU

Switch back to the Chrome window (vSphere Client) to shut down one of the VMs:

1. Click on perf-worker-01b


2. Click on the Shut Down Guest OS (stop icon) and click YES

Let's see if the ESXi Host CPU level has dropped from 100% by shutting this VM down.

Examine ESXi Host Level CPU with 1 VM

Notice that even after shutting down 1 of the VMs, CPU1 is still at 100%. Why?

• Both VMs run the same CPU-intensive benchmark


• Both VMs were affinitized (pinned) to CPU1, so even after one was shut down, the
rest of the resources went to the remaining VM.

Since the remaining resources went to perf-worker-01a, let's see if its performance
increased.


Notice VM Performance Increase on perf-worker-01a

If you recall the scores from both VMs at the beginning of the tests, you'll notice that the
Performance of the remaining VM has increased to approximately double of its
original value, now that we have shut down the other one (and thus reduced the CPU
contention on CPU1).


Stop 2nd VM, Monitor ESXi Host CPU

Switch back to the Chrome window (vSphere Client) to shut down the remaining VM:

1. Click on perf-worker-01a
2. Click on the Shut Down Guest OS (stop icon) and click YES

Let's see if the ESXi Host CPU level has dropped from 100% by shutting this VM down.


Examine ESXi Host Level CPU

Now that there are no VMs running on the host, CPU1 is no longer at 100%:

1. Notice the sharp dropoff in the line chart


2. Monitor the Latest value for CPU1, and you'll notice it's essentially idle now.

Summary

In summary:

• It is not recommended to set CPU affinity for VMs, as it could produce the
behavior we observed (multiple VMs vying for the same CPU resources)
• vCenter Performance Charts (accessed through the vSphere Client) are a useful
way to monitor CPU counters in real-time
• If CPU Demand > Used (Usage), then a VM is demanding more than the host is
allowing it to use. In this case, the VM should be allocated more resources,
migrated to a more powerful host, or the application itself may need to be
optimized to use less resources.

Next, let's talk about Power Management, and how to configure different power policies
at the host/BIOS level and within ESXi.


Configuring Server BIOS Power Management

VMware ESXi includes a full range of host power management capabilities. These
can save power when an ESXi host is not fully utilized. As a best practice, you
should configure your server BIOS settings to allow ESXi the most flexibility in
using the power management features offered by your hardware, and make your
power management choices within ESXi (next section).

On most systems, the default setting is BIOS-controlled power management.


With that setting, ESXi won’t be able to manage power; instead the BIOS firmware
manages it. The sections that follow describe how to change this setting to OS
Control (recommended for most environments).

In certain cases, poor performance may be related to processor power
management, implemented either by ESXi or by the server hardware. Certain
applications that are very sensitive to processing speed latencies may show less
than expected performance when processor power management features are
enabled. It may be necessary to turn off ESXi and server hardware power
management features to achieve the best performance for such applications. This
setting is typically called Maximum Performance mode in the BIOS.

NOTE: Disabling power management usually results in more power being
consumed by the system, especially when it is lightly loaded. The majority of
applications benefit from the power savings offered by power management with
little or no performance impact.

Bottom line: some form of power management is recommended and should only
be disabled if testing shows this is impacting your application performance.

For more details on how and what to configure, see this white paper:
http://www.vmware.com/files/pdf/techpaper/hpm-perf-vsphere55.pdf


Configuring BIOS to OS Control mode (Dell example)

The screenshot above illustrates how an 11th Generation Dell PowerEdge server BIOS
can be configured to allow the OS (ESXi) to control the CPU power-saving features
directly:

• Under the Power Management section, set the Power Management policy to
OS Control.

For a Dell PowerEdge 12th Generation or newer server with UEFI (Unified Extensible
Firmware Interface), review the System Profile modes in the System Setup > System
BIOS settings. You see these options:

• Performance Per Watt (DAPC-System)


• Performance Per Watt (OS)
• Performance
• Dense Configuration (DAPC-System)
• Custom

Choose Performance Per Watt (OS).

Next, you should verify the Power Management policy used by ESXi (see the next
section).


Configuring BIOS to OS Control mode (HP example)

The screenshot above illustrates how a HP ProLiant server BIOS can be configured
through the ROM-Based Setup Utility (RBSU). The settings highlighted in red allow the
OS (ESXi) to control some of the CPU power-saving features directly:

• Go to the Power Management Options section, HP Power Profile, and select Custom
• Go to the Power Management Options section, HP Power Regulator, and
select OS Control Mode

Next, you should verify the Power Management policy used by ESXi (see the next
section).


Configuring BIOS to Maximum Performance mode (Dell example)

The screenshot above illustrates how an 11th Generation Dell PowerEdge server BIOS
can be configured to disable power management:

• Under the Power Management section, set the Power Management policy to
Maximum Performance.

For a Dell PowerEdge 12th Generation or newer server with UEFI, review the System
Profile modes in the System Setup > System BIOS settings. You see these options:

• Performance Per Watt (DAPC-System)


• Performance Per Watt (OS)
• Performance
• Dense Configuration (DAPC-System)
• Custom

Choose Performance to disable power management.

NOTE: Disabling power management usually results in more power being consumed by
the system, especially when it is lightly loaded. The majority of applications benefit from
the power savings offered by power management, with little or no performance impact.
Therefore, if disabling power management does not realize any increased performance,
VMware recommends that power management be re-enabled to reduce power
consumption.


Configuring BIOS to Maximum Performance mode (HP example)

The screenshot above illustrates how to set the HP Power Profile mode in the server's
RBSU to the Maximum Performance setting to disable power management:

• Enter RBSU by pressing F9 during the server boot-up process


• Select Power Management Options
• Change the HP Power Profile to Maximum Performance mode.

NOTE: Disabling power management usually results in more power being consumed by
the system, especially when it is lightly loaded. The majority of applications benefit from
the power savings offered by power management with little or no performance impact.
Therefore, if disabling power management does not realize any increased performance,
VMware recommends that power management be re-enabled to reduce power
consumption.


Configuring ESXi Host Power Management

VMware ESXi includes a full range of host power management capabilities. These
can save power when an ESXi host is not fully utilized. As a best practice, you
should configure your server BIOS settings to allow ESXi the most flexibility in
using the power management features offered by your hardware, and make your
power management choices within ESXi. These choices are described below.

Select Host Power Management Settings for esx-01a

1. Select "esx-01a.corp.local"
2. Select "Configure"
3. Select "Hardware" (you will need to scroll all the way to the bottom)
4. Select "Power Management"


Power Management Policies

On a physical host, the Power Management options could look like this (it may vary
depending on the processors of the physical host).

Here you can see what ACPI states that get presented to the host and what Power
Management policy is currently active. There are four Power Management policies
available in ESXi:

• High Performance: Do not use any power management features.


• Balanced (Default): Reduce energy consumption with minimal performance
compromise
• Low Power: Reduce energy consumption at the risk of lower performance
• Custom: User-defined power management policy. Advanced configuration
becomes available.

1. Click "EDIT..." to see these different options.

NOTE: Due to the nature of this lab environment, we are not interacting directly with
physical servers, so changing the Power Management policy will not have any
noticeable effect. Therefore, while the sections that follow will describe each Power
Management policy, we won't actually change this setting.


High Performance

The High Performance power policy maximizes performance, and uses no power
management features. It keeps CPUs in the highest P-state at all times. It uses only
the top two C-states (running and halted), not any of the deep states (for example, C3
and C6 on the latest Intel processors). High performance was the default power policy
for ESX/ESXi releases prior to 5.0.

Balanced (default)

The Balanced power policy is designed to reduce host power consumption while
having little or no impact on performance. The balanced policy uses an algorithm that
exploits the processor’s P-states. This is the default power policy since ESXi 5.0.
Beginning in ESXi 5.5, we now also use deep C-states (greater than C1) in the
Balanced power policy. Formerly, when a CPU was idle, it would always enter C1. Now
ESXi chooses a suitable deep C-state depending on its estimate of when the CPU will
next need to wake up.

Low Power

The Low Power policy is designed to save substantially more power than the
Balanced policy by making the P-state and C-state selection algorithms more
aggressive, at the risk of reduced performance.

Custom

The Custom power policy starts out the same as Balanced, but allows individual
parameters to be modified.

Click "Cancel" to exit.


The next step describes settings that control the Custom power policy.

Edit Advanced System Settings

To configure the custom power policy settings:

1. Select Advanced System Settings (under the System section)


2. Click the EDIT... button.

Filter Advanced System Settings

To filter the System Settings for only Power settings:

1. Click inside the Filter text box (next to the Filter icon) and type the word Power.
(make sure to add a period after the word Power)
2. Click the first parameter, Power.ChargeMemoryPct
3. Note that a description and valid minimum and maximum values appear in the
lower-left corner.
4. Click CANCEL after you've reviewed this list.

Some Advanced Settings you can customize include:

• Power.CStateMaxLatency : Do not use C-states whose latency is greater than
this value.
• Power.CStatePredictionCoef : A parameter in the ESXi algorithm for predicting
how long a CPU that becomes idle will remain idle. Changing this value is not
recommended.
• Power.CStateResidencyCoef : When a CPU becomes idle, choose the deepest
C-state whose latency multiplied by this value is less than the host’s prediction of
how long the CPU will remain idle. Larger values make ESXi more conservative
about using deep C-states; smaller values are more aggressive (a small sketch of
this rule follows the list below).
• Power.MaxCpuLoad : Use P-states to save power on a CPU only when the CPU is
busy for less than the given percentage of real time.
• Power.MaxFreqPct : Do not use any P-states faster than the given percentage
of full CPU speed, rounded up to the next available P-state.

• Power.MinFreqPct : Do not use any P-states slower than the given percentage
of full CPU speed.
• Power.PerfBias : Performance Energy Bias Hint (Intel only). Sets an MSR on Intel
processors to an Intel-recommended value. Intel recommends 0 for high
performance, 6 for balanced, and 15 for low power. Other values are undefined.
• Power.TimerHz : Controls how many times per second ESXi reevaluates which P-
state each CPU should be in.
• Power.UseCStates : Use deep ACPI C-states (C2 or below) when the processor is
idle.
• Power.UsePStates : Use ACPI P-states to save power when the processor is
busy.
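
To make the Power.CStateResidencyCoef behavior more concrete, here is an illustrative
Python sketch of the selection rule described above. This is not ESXi code; the wake-up
latencies and predicted idle times are made-up example values.

# Illustrative sketch only (not ESXi source code) of the C-state selection rule
# described for Power.CStateResidencyCoef. The wake-up latencies and the
# predicted idle times below are made-up example values.

def pick_cstate(predicted_idle_us, residency_coef, cstates):
    """Pick the deepest C-state whose wake-up latency, multiplied by the
    coefficient, is still less than the predicted idle time."""
    best = "C1"  # halting in C1 is always allowed
    for name, latency_us in sorted(cstates.items(), key=lambda kv: kv[1]):
        if latency_us * residency_coef < predicted_idle_us:
            best = name
    return best

# Example wake-up latencies (microseconds) for a hypothetical CPU
cstates = {"C1": 2, "C3": 50, "C6": 200}

# Larger coefficients make the rule more conservative about deep C-states
print(pick_cstate(predicted_idle_us=1500, residency_coef=5, cstates=cstates))  # C6
print(pick_cstate(predicted_idle_us=200,  residency_coef=5, cstates=cstates))  # C1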

Conclusion and Clean-Up


To free up resources for the remaining parts of this lab, we need to shut down
the virtual machine used in this module and reset the configuration.

Stop Module 8

On your desktop, find the Module Switcher window and click the Stop button for
Module 8.

Key takeaways

CPU contention problems are generally easy to detect. In fact, vCenter has several
alarms that trigger if host CPU utilization or virtual machine CPU utilization goes too
high for extended periods of time.

vSphere allows you to create very large virtual machines (up to 256 vCPUs with 6.7 U2;
see https://configmax.vmware.com/home for more information). It is highly
recommended to size your virtual machines for the application workloads that run in
them. Sizing a virtual machine with resources unnecessarily larger than what the
workload can actually use may result in hypervisor overhead and can lead to
performance issues.

In general, here are some common CPU performance tips.

Avoid a large VM on too small a platform

• Rule of thumb: 1-4 vCPUs on dual-socket hosts, 8+ vCPUs on quad-socket hosts.
This rule of thumb changes as core counts increase. Try to keep the vCPU count
below the core count of any single pCPU for the best performance profile. This is
due to memory locality; see the NUMA and vNUMA module for more details. A
quick sizing check is sketched after this list.
• Sizing a VM too large is wasteful. The guest OS will waste cycles trying to keep
work in sync across vCPUs it does not need.
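
As a quick illustration of the sizing rule above, here is a minimal Python sketch. The
host core count used here is a hypothetical example.

# Quick check of the rule of thumb above: keep a VM's vCPU count at or below
# the core count of a single physical socket so it fits in one NUMA node.
# The host core count here is a hypothetical example.

def fits_one_numa_node(vm_vcpus, cores_per_socket):
    return vm_vcpus <= cores_per_socket

host_cores_per_socket = 16  # e.g., a dual-socket host with 16 cores per socket

for vcpus in (4, 16, 24):
    if fits_one_numa_node(vcpus, host_cores_per_socket):
        print(f"{vcpus} vCPUs: fits in a single NUMA node")
    else:
        print(f"{vcpus} vCPUs: spans NUMA nodes - revisit vNUMA sizing")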

Don't expect consolidation ratios as high with busy workloads as you achieved
with the low-hanging fruit

• Virtualizing larger workloads requires revisiting consolidation ratios.
• Tier 1 applications and more performant workloads demand more resources.

Here are some best practices around power management policies:

• Configure your physical host (server BIOS) to use OS Control mode as the power
policy. If applicable, enable Turbo mode and C-states (including deep C-states),
which are usually enabled by default.
• Within ESXi, the default Balanced power management policy will achieve the
best performance per watt for most workloads.
• For applications that require maximum performance, switch the BIOS power
policy and/or the ESXi power management policy to Maximum Performance
and High Performance respectively. This includes latency-sensitive applications
that must execute within strict constraints on response time. Be aware, however,
that this typically only results in minimal performance gain, but disables all
potential power savings.

Depending on your applications and the level of utilization of your ESXi hosts, the
correct power policy setting can have a great impact on both performance and energy
consumption. On modern hardware, it is possible to have ESXi control the power
management features of the hardware platform used. You can select to use predefined
policies or you can create your own custom policy.

Recent studies have shown that it is best to let ESXi control the power policy.

Module 9 - Memory Performance with X-Mem (30 minutes)

Introduction

The goal of this module is to learn how to characterize memory performance in a
virtualized environment. VMware vSphere incorporates sophisticated
mechanisms that maximize the use of available memory through page sharing,
resource-allocation controls, and other memory management techniques.

Host memory is a limited resource, but it is critical that you assign sufficient
resources (especially memory, but also CPU) to each VM so they perform
optimally.

This module discusses an open-source memory benchmark named X-Mem, which can
be used to characterize both memory bandwidth (throughput) and
memory latency (access time).

What is X-Mem / Why X-Mem?


This lesson will describe what X-Mem is (no, it's not a superhero movie), and why
we've decided to use it to characterize memory performance in this lab.

What X-Mem is: A Cross-Platform, Extensible Memory Characterization Tool for the Cloud

From the X-Mem page on github (https://github.com/Microsoft/X-Mem):

X-Mem is a flexible open-source research tool for characterizing memory hierarchy
throughput, latency, power, and more. The tool was developed jointly by
Microsoft and the UCLA NanoCAD Lab. This project was started by Mark Gottscho
(Email: mgottscho@ucla.edu) as a Summer 2014 PhD intern at Microsoft Research. X-
Mem is released freely and open-source under the MIT License. The project is under
active development.

Why X-Mem / Alternatives

Of course, X-Mem is not the only memory benchmark available. Here is a feature
comparison of X-Mem versus some other popular memory benchmarks like STREAM,
lmbench and Intel's mlc (source). Here is a quick summary of some key advantages that
set it apart:

• (A) Access pattern diversity. Cloud applications span many domains. They
express a broad spectrum of computational behaviors and access memory in a
mix of structured and random patterns. These patterns exhibit a variety of
read/write ratios, spatio-temporal localities, and working-set sizes. Replication of
these memory access patterns using controlled micro-benchmarks facilitates the
study of their performance. This can be used by cloud providers to create cost-
effective hardware configurations for different classes of applications and by
subscribers to optimize their applications.
• (B) Platform variability. Cloud servers are built from a mix of instruction set
architectures (ISAs, e.g., x86-64 and ARM), machine organizations (e.g., memory
model and cache configuration), and technology standards (e.g., DDR, PCIe,
NVMe, etc.). Platforms also span a variety of software stacks and operating
systems (OSes, e.g., Linux and Windows). The interfaces and semantics of OS-
level memory management features such as large pages and non-uniform
memory access (NUMA) also vary. In order to objectively cross-evaluate
competing platforms and help optimize an application for a particular platform, a
memory characterization tool should support as many permutations of these
features as possible.
• (C) Metric flexibility. Both the subscriber’s application-defined performance
and the provider’s costs depend on memory performance and power. Unlike X-
Mem, most tools do not integrate memory power measurement.
• (D) Tool extensibility. Cloud platforms have changed considerably over the last
decade and continue to evolve in the future. Emerging non-volatile memories
(NVMs) introduce new capabilities and challenges that require special
consideration. Unfortunately, most existing characterization tools are not easily
extensible. X-Mem is being actively maintained and extended for ongoing
research needs.

Research Paper and Attribution

There is a research tool paper describing the motivation, design, and implementation of
X-Mem, as well as three experimental case studies using the tool to deliver insights useful
to both cloud providers and subscribers. For more information, see the following links:

• Tool homepage: https://nanocad-lab.github.io/X-Mem


• Microsoft GitHub repository: https://github.com/Microsoft/X-Mem
• Paper from IEEE ISPASS 2016: http://nanocad.ee.ucla.edu/pub/Main/Publications/
C91_paper.pdf
• Published paper on IEEE Xplore: http://ieeexplore.ieee.org/xpl/
articleDetails.jsp?arnumber=7482101

Citation:

Mark Gottscho, Sriram Govindan, Bikash Sharma, Mohammed Shoaib, and Puneet
Gupta. X-Mem: A Cross-Platform and Extensible Memory Characterization Tool for the
Cloud. In Proceedings of the IEEE International Symposium on Performance Analysis of
Systems and Software (ISPASS), pp. 263-273. Uppsala, Sweden. April 17-19, 2016.
DOI: http://dx.doi.org/10.1109/ISPASS.2016.7482101

Downloading/Installing X-Mem
This lesson describes how to download the X-Mem benchmark. There are prebuilt
binaries for Windows and Linux; this lab demonstrates X-Mem inside of Windows
VMs.

Download and Extract X-Mem

There are multiple ways to obtain X-Mem, but the easiest is to go to http://nanocad-
lab.github.io/X-Mem/ and click the Binaries (zip) button, which has precompiled
binaries for Windows. If you're using Linux, or wish to make modifications to the source
code, click the appropriate link.

Runtime Prerequisites

There are a few runtime prerequisites in order for the software to run correctly. Note
that these requirements are for the pre-compiled binaries that are available on the
project homepage at https://nanocad-lab.github.io/X-Mem. Also note that these
requirements are already met using our lab environment:

HARDWARE:

• Intel x86, x86-64, x86-64+AVX, or MIC (Xeon Phi/Knights Corner) CPU. AMD CPUs
that are compatible with Intel Architecture ISAs should also work fine.

• ARM Cortex-A series processors with VFP and NEON extensions. Specifically
tested on ARM Cortex A9 (32-bit) which is ARMv7. 64-bit builds for ARMv8-A
should also work but have not been tested. GNU/Linux builds only. ARM on
Windows can compile using VC++, but cannot link due to a lack of library support
for desktop/command-line ARM apps. This may be resolved in the future. If you
can get this working, let us know!

WINDOWS:

• Microsoft Windows 8.1 64-bit or later, Server 2012 R2 or later.


• Microsoft Visual C++ 2015 Redistributables (32-bit) -- for x86 (32-bit) builds.
• Microsoft Visual C++ 2015 Redistributables (64-bit) -- for x86-64 and x86-64 with
AVX builds
• You MAY need Administrator privileges, in order to:
◦ Use large pages, if the --large_pages option is selected (see USAGE, below)
◦ The first time you use --large_pages on a given Windows machine, you
may need to ensure that your Windows user account has the necessary
rights to allow lockable memory pages. To do this on Windows 8, run
gpedit.msc --> Local Computer Policy --> Computer Configuration --> Windows
Settings --> Security Settings --> Local Policies --> User Rights Assignment -->
Add your username to "Lock pages in memory" . Then log out and then log back in.
◦ Use the PowerReader interface, depending on end-user implementation
◦ Elevate thread priority and pin threads to logical CPUs for improved
performance and benchmarking consistency

GNU/LINUX:

• GNU utilities with support for C++11. Tested with gcc 4.8.2 on Ubuntu 14.04 LTS
for x86 (32-bit), x86-64, x86-64+AVX, and MIC on Intel Sandy Bridge, Ivy Bridge,
Haswell, and Knights Corner families.
• libhugetlbfs. You can obtain it at https://github.com/libhugetlbfs/libhugetlbfs. On
Ubuntu systems, you can install using sudo apt-get install libhugetlbfs0 . If you
don't have this or cannot install it, this should be fine but you will not be able to
use large pages. Note that this package requires Linux kernel 2.6.16 or later. This
should not be an issue on most modern Linux systems.
• Potentially, administrator privileges, if you plan to use the --large_pages option.
◦ During runtime, if the --large_pages option is selected, you may need to
first manually ensure that large pages are available from the OS. This can
be done by running hugeadm --pool-list . It is recommended to set minimum
pool to 1GB (in order to measure DRAM effectively). If needed, this can be
done by running hugeadm --pool-pages-min 2MB:512 . Alternatively, run the
linux_setup_runtime_hugetlbfs.sh script that is provided with X-Mem.

Installation

Fortunately, the only file needed to run X-Mem is the respective executable:
xmem-win-.exe on Windows, or xmem-linux- on GNU/Linux. It has no other
dependencies aside from the pre-installed system prerequisites which were just
outlined.

Running X-Mem
Launch Performance Lab Module Switcher

Double click on the Performance Lab MS shortcut on the Main Console desktop

Launch Module 9

Click on the Start button for Module 9.

NOTE: Please wait a couple of minutes, and do not proceed with the lab until you see
Remote Desktop windows appear.

Reposition Remote Desktops

The script opens Remote Desktop Connections to two Windows VMs. However, we need
to make both of them visible. Drag the title bars of the Remote Desktop windows:

1. Position perf-worker-01a on the left (as shown)


2. Position perf-worker-01b on the right (as shown)
3. Note that perf-worker-01a has 4 vCPUs
4. Note that perf-worker-01b has only 1 vCPU
5. Note that both VMs have 2GB (2047 MB) RAM

Given #5, you might think the memory performance of these two VMs should be
identical. As we'll see, X-Mem can run multiple worker threads to exercise multiple
CPUs simultaneously, allowing better scalability with more vCPUs.

X-Mem Command Line Options

Here is a summary of some of the command-line options we'll be using in this lab
(X-Mem has many more options to customize how it is run):

• -j : Number of worker threads to use in benchmarks. NOTE: Cannot be larger than
the number of vCPUs.
• -n : Number of iterations to run; helps ensure consistency (the results shouldn't
fluctuate much).
• -t : Throughput benchmark mode (as opposed to -l for latency benchmark mode).
• -R : Use memory read-based patterns.
• -W : Use memory write-based patterns.
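
If you prefer to drive repeated runs from a script, the following Python sketch assembles
an X-Mem command line from the options listed above and launches it. This is only
illustrative: the executable path is a placeholder you must adjust, and no output parsing
is attempted.

# Minimal sketch: build and launch an X-Mem throughput run from Python using
# the options listed above. The executable path below is a placeholder --
# point it at the X-Mem executable you extracted.
import subprocess

def run_xmem(exe_path, threads=2, iterations=5, pattern=None):
    cmd = [exe_path, "-t", f"-j{threads}", f"-n{iterations}"]
    if pattern in ("-R", "-W"):      # optionally restrict to reads or writes
        cmd.append(pattern)
    print("Running:", " ".join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout             # X-Mem prints its results as plain text

if __name__ == "__main__":
    report = run_xmem(r"C:\xmem\xmem.exe", threads=4, iterations=5, pattern="-R")
    print(report)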

X-Mem Command-Line Help

Within the perf-worker-01b Remote Desktop window:

1. Click the Command Prompt taskbar icon

2. Type this command: xmem -h (h for help) and press Enter


3. The window is not very big, and there's a lot of help text, so use the Up arrow or
the scrollbar to scroll back up and see all the different options X-Mem has.

As you can see, X-Mem has a ton of options! Let's look at some we'll be using for this
lab.

Run X-Mem throughput (two jobs) on perf-worker-01b (FAIL)

You should already have a Command Prompt window open on perf-worker-01b from
the previous step; if not, click the Command Prompt icon on the taskbar.

Let's try to run X-Mem with a couple of command-line parameters we just saw: -t to
test memory throughput, and -j2 to run two worker threads:

1. Type xmem -t -j2 and press Enter


2. You should see output like what is shown here, namely:

ERROR: Number of worker threads may not exceed the number of logical CPUs (1)

This is expected, because if you recall, this VM only has one virtual CPU.

Run X-Mem throughput (two jobs) on perf-worker-01a (PASS)

Now run the exact same X-Mem command that failed on perf-worker-01b on perf-
worker-01a:

1. Select the perf-worker-01a Remote Desktop window.


2. Click the Command Prompt icon on the taskbar.
3. Type this command and press Enter: xmem -t -j2
4. Notice how this command successfully runs the benchmark on this VM.

This command ran successfully on this VM, because it has 4 virtual CPUs (so -j3 or -j4
would also work). Next, let's take a closer look at the results.

Review X-Mem throughput results (2 jobs)

Once you're back at the command prompt, use the scrollbar to scroll back up and look
at the results:

1. The first benchmark throughput test, Test #1T, will show Read/Write Mode:
read. Since we specified -j2, the output shows that it ran 2 worker threads.
The result in this example was 90664.66 MB/s (or 90.664 GB/s). Note that your
performance may vary, given the shared resources of the hands-on lab
environment (where many other workloads are running).
2. The second benchmark throughput test, Test #2T, will show Read/Write Mode:
write. Since we specified -j2, the output shows that it ran 2 worker threads.
The result in this example was 44113.39 MB/s (or 44.11 GB/s). Note that your
performance may vary, given the shared resources of the hands-on lab
environment (where many other workloads are running).

Why did the second test have lower (in this case, about half) the throughput of the first?
Well, writes are almost always more expensive than reads; this is true for
memory/RAM, and other subsystems, such as disk storage I/O.

Run X-Mem read throughput (four jobs) on perf-worker-01a

Let's further customize the X-Mem command line options, again on perf-worker-01a:

1. Make sure the focus is on the Command Prompt of the perf-worker-01a Remote
Desktop window (if it isn't already)
2. Type this command and press Enter: xmem -t -R -j4 -n5
3. The results will be listed under the *** RESULTS*** heading, as shown here.

Notice that the benchmark ran differently due to the different command line we used.
Here is an explanation of each option:

• -t: Throughput mode (same as the previous commands)


• -R : Only do memory reads (not writes)
• -j4 : Use four simultaneous CPU threads (since we have four vCPUs available on
this VM)
• -n5 : Run the benchmark five times in a row; this ensures that the benchmark
produces consistent results

Review X-Mem throughput results (four jobs)

Once you're back at the command prompt, use the scrollbar to scroll back up and look
at the results. In this example, the results are consistently around 170,000 MB/sec (170
GB/sec). Since we specified -j4 , it ran four worker threads, so the memory
performance is significantly higher than when we ran with two worker threads.

NOTE: Given the nature of our hands-on lab environment, your results may (and
probably will) vary from this example.

As shown here, if your application is multi-threaded, additional vCPUs can potentially
increase the VM's memory performance.

Close the Remote Desktop windows

1. Close the perf-worker-01a Remote Desktop window


2. Close the perf-worker-01b Remote Desktop window

Conclusion and Clean-Up


Stop Module 9

On the main console, find the Module Switcher window and click Stop.

Key takeaways

During this lab, we learned that X-Mem is a flexible memory benchmark tool. It can:

• Run on Windows and Linux


• Run multiple worker threads to utilize multiple vCPUs
• Run memory throughput to measure bandwidth (Note: The tool can also measure
latency, which was not covered here.)
• Run reads and writes (and also sequential or random, which was not covered
here)
• Run multiple iterations to ensure the memory performance is consistent

You can download this tool to run in your environment to ensure you are getting optimal
memory performance out of your hosts and virtual machines.

Conclusion

This concludes the Memory Performance with X-Mem module. We hope you have
enjoyed taking it. Please don't forget to fill out the survey when you finish.

Module 10 - Storage Performance and Troubleshooting (30 minutes)

Introduction to Storage Performance Troubleshooting

Approximately 90% of performance problems in a vSphere deployment are related
to storage in some way. There have been significant advances in
storage technologies over the past few years to help improve storage
performance. There are a few things that you should be aware of:

In a well-architected environment, there is no difference in performance between
storage fabric technologies. A well-designed NFS, iSCSI or FC implementation
works just about the same as the others.

Despite advances in the interconnects, the performance limit is still hit at the media
itself. In fact, 90% of storage performance cases seen by GSS (Global Support
Services - VMware support) that are not configuration related are media related.
Some things to remember:

• Payload (throughput) is fundamentally different from IOP (cmd/s)


• IOP performance is always lower than throughput

A good rule of thumb on the total number of IOPs any given disk provides:

• 7.2k rpm – 80 IOPs


• 10k rpm – 120 IOPs
• 15k rpm – 150 IOPs
• SSD – 20k-100k IOPs (max ≠ real world)
• NVMe - Up to 450k IOPs

So, if you want to know how many IOPs you can achieve with a given number of
disks:

• Total Raw IOPs = Disk IOPs * Number of disks


• Functional IOPs = (Raw IOPs * Write%)/(RAID Penalty) + (Raw IOPs * Read%)
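
For example, applying these formulas to eight 10k rpm disks (120 IOPs each from the
rule-of-thumb list) with a 70/30 read/write mix and an assumed RAID 5 write penalty
of 4:

# Worked example of the formulas above. The per-disk IOPs figure comes from the
# rule-of-thumb list; the 70/30 read/write mix and the RAID 5 write penalty of 4
# are example assumptions.

def functional_iops(disk_iops, num_disks, read_pct, raid_penalty):
    raw = disk_iops * num_disks
    write_pct = 1.0 - read_pct
    return raw, (raw * write_pct) / raid_penalty + (raw * read_pct)

raw, usable = functional_iops(disk_iops=120, num_disks=8, read_pct=0.7, raid_penalty=4)
print(f"Total Raw IOPs:  {raw}")          # 120 * 8 = 960
print(f"Functional IOPs: {usable:.0f}")   # (960 * 0.3) / 4 + (960 * 0.7) = 744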

This test demonstrates some methods to identify poor storage performance and
how to resolve it using VMware Storage DRS for workload balancing. The first step
is to prepare the environment for the demonstration.

Launch Performance Lab Module Switcher

Double click on the Performance Lab MS shortcut on the Main Console desktop.

Launch Module 10

Click on the Start button under Module 10. The script configures and starts up the
virtual machines and launches a storage workload using Iometer.

The script may take up to five minutes to complete. While the script runs,
spend a few minutes reading through the next step to gain an understanding of
storage latencies.

Storage I/O Contention


Disk I/O Latency

When we think about storage performance problems, the top issue is generally latency,
so we need to look at the storage stack and understand what layers there are in the
storage stack and where latency can build up.

At the topmost layer is the application running in the guest operating system. That is
ultimately the place where we care most about latency. This is the total amount of
latency that the application sees, and it includes the latencies of the total storage stack,
including the guest OS, the VMkernel virtualization layers, and the physical hardware.

ESXi can’t see application latency because that is a layer above the ESXi virtualization
layer.

From ESXi we see three main latencies that are reported in esxtop and vCenter.

The topmost is GAVG, or Guest Average latency, which is the total amount of latency
that ESXi can detect.

That is not necessarily the total amount of latency the application sees. In fact, if you
compare GAVG (the total amount of latency ESXi is seeing) with the actual latency
the application is seeing, you can tell how much latency the guest OS is adding to the
storage stack. This could tell you if the guest OS is configured incorrectly or is causing a
performance problem. For example, if ESXi is reporting GAVG of 10ms, but the
application or perfmon in the guest OS is reporting Storage Latency of 30ms, that
means that 20ms of latency is somehow building up in the Guest OS Layer, and you
should focus your debugging on the Guest OS's storage configuration.

GAVG is made up of two major components, KAVG and DAVG:

DAVG is basically how much time is spent in the device, from the driver and HBA down
to the storage array.

KAVG is how much time is spent in the ESXi kernel (that is, how much overhead the
kernel is adding).

KAVG is actually a derived metric; ESXi does not measure it directly, but derives it with
the following formula (a small worked example follows below):

Total Latency (GAVG) - DAVG = KAVG

The VMkernel is very efficient at processing I/O, so an I/O really should not spend any
significant time waiting in the kernel, and KAVG should be equal to 0 in a well-configured,
well-running environment. When KAVG is not equal to 0, that most
likely means that the IO is stuck in a Kernel Queue inside the VMKernel. So the vast
majority of the time KAVG equals QAVG or Queue Average latency (the amount of
time an IO is stuck in a queue waiting for a slot in a lower queue to free up so it can
move down the stack).
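
As a small worked example of these relationships, here is a Python sketch. The sample
values are made up, matching the 10 ms / 30 ms scenario described earlier.

# Sketch of the latency breakdown just described. The sample values are made up.
def latency_breakdown(gavg_ms, davg_ms, in_guest_ms=None):
    kavg = gavg_ms - davg_ms                 # KAVG = Total latency (GAVG) - DAVG
    print(f"DAVG (device): {davg_ms:.1f} ms")
    if kavg > 0.5:
        print(f"KAVG (kernel): {kavg:.1f} ms  <- non-zero: I/O is likely queuing (KAVG ~ QAVG)")
    else:
        print(f"KAVG (kernel): {kavg:.1f} ms")
    if in_guest_ms is not None:
        guest_added = in_guest_ms - gavg_ms  # latency added above what ESXi can see
        print(f"Guest OS adds: {guest_added:.1f} ms -- check the in-guest storage configuration")

latency_breakdown(gavg_ms=10.0, davg_ms=9.8, in_guest_ms=30.0)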

View the Storage Performance as reported by IOmeter

When the storage script has completed, you should see two IOmeter windows, and two
storage workloads should be running.

The storage workload is started on both perf-worker-02a and perf-worker-03a. It takes a
few minutes for the workloads to settle and for the performance numbers to become
almost identical for the two VMs. The disks these virtual machines are testing share the
same datastore, and that datastore is saturated.

The performance can be seen in the IOmeter GUI as:

Latencies (Average I/O Response Time) - latencies around 6ms

Low IOPs (Total I/O per Second) - around 160 IOPs

Low Throughput (Total MBs per Second) - around 2.7 MBps

Disclaimer: Because we run this lab in a fully virtualized environment where
the ESXi host servers also run in virtual machines, we cannot assign physical
disk spindles to individual datastores. Therefore the performance numbers on
these screenshots vary depending on the actual load in the cloud environment
the lab is running in.

Log into vSphere web client

This is the vCenter login screen. To login to vCenter:

1. Check the Use Windows session authentication checkbox


2. Click the LOGIN button

Select perf-worker-03a

1. Select "perf-worker-03a"

View Storage Performance Metrics in vCenter

1. Select "Monitor"
2. Select "Performance"
3. Select "Advanced"
4. Click "Chart Options"

Select Performance Metrics

1. Select "Virtual disk"


2. Select only "scsi0:1"
3. Click "None" under "Select counters for this chart"
4. Select "Write latency" and "Write rate"
5. Click "OK"

The disk that IOmeter uses for generating workload is scsi0:1 or sdb inside the guest.

View Storage Performance Metrics in vCenter

Repeat the configuration of the performance chart for perf-worker-02a and
verify that performance is almost identical to perf-worker-03a.

Guidance: Device latencies greater than 20ms may cause a performance impact
in your applications.

Due to the way we create a private datastore for this test, we actually have pretty good
low latency numbers. scsi0:1 is located on an iSCSI datastore based on a RAMdisk on
perf-worker-04a (DatastoreA) running on the same ESXi host as perf-worker-03a. Hence,
latencies are low for a fully virtualized environment.

vSphere provides several storage features to help manage and control storage
performance:

• Storage I/O control


• Storage IOP Limits
• Storage DRS
• Disk Shares

Let’s configure Storage DRS to solve this contention problem.

Storage Cluster and Storage DRS


A datastore cluster is a collection of datastores with shared resources and a
shared management interface. Datastore clusters are to datastores what clusters
are to hosts. When you create a datastore cluster, you can use vSphere Storage
DRS to manage storage resources.

When you add a datastore to a datastore cluster, the datastore's resources
become part of the datastore cluster's resources. As with clusters of hosts, you
use datastore clusters to aggregate storage resources, which enables you to
support resource allocation policies at the datastore cluster level. The following
resource management capabilities are also available per datastore cluster.

Space utilization load balancing: You can set a threshold for space use. When
space use on a datastore exceeds the threshold, Storage DRS generates
recommendations or performs Storage vMotion migrations to balance space use
across the datastore cluster.

I/O latency load balancing: You can set an I/O latency threshold for bottleneck
avoidance. When I/O latency on a datastore exceeds the threshold, Storage DRS
generates recommendations or performs Storage vMotion migrations to help
alleviate high I/O load. Remember to consult your storage vendor to get their
recommendation on using I/O latency load balancing.

Anti-affinity rules: You can create anti-affinity rules for virtual machine disks.
For example, the virtual disks of a certain virtual machine must be kept on
different datastores. By default, all virtual disks for a virtual machine are placed
on the same datastore.

Change to the Datastore view

1. Change to the Storage view by clicking on the icon


2. Click on RegionA01 which is under vcsa-01a.corp.local

Create a Datastore Cluster

1. Click on ACTIONS
2. Go to Storage
3. Click on New Datastore Cluster...

Specify Datastore Name

For this lab, we will accept most of the default settings.

1. We can specify a name for the Datastore cluster, but leave it at the default of
DatastoreCluster.
2. Click NEXT

Specify Storage DRS Automation

1. Select No Automation (Manual Mode)


2. Click NEXT

Specify Storage DRS Runtime Settings

1. Move the slider all the way to the left to specify a 50% Utilized space
threshold.
2. Click NEXT

Since this lab is a nested virtual environment, it is difficult to demonstrate high latency
in a reliable manner. Therefore we do not use I/O latency to demonstrate load balancing.
The default is to check for storage cluster imbalances every eight hours, but it can be
changed to 60 minutes as a minimum.

Select Clusters and Hosts

1. Check RegionA01-COMP01 to select our lab cluster


2. Click NEXT

Select Datastores

1. Select DatastoreA and DatastoreB


2. Click NEXT

Ready to Complete

Click FINISH to create the Datastore cluster.

Run Storage DRS

Take a note of the name of the virtual machine that Storage DRS (SDRS)
wants to migrate.

1. Select DatastoreCluster
2. Select the Monitor tab
3. Select Storage DRS / Recommendations
4. Click RUN STORAGE DRS NOW
5. Click APPLY RECOMMENDATIONS

Notice that SDRS recommends moving one of the workloads from DatastoreA to
DatastoreB. It is making the recommendation based on capacity. SDRS makes storage
moves based on performance only after it has collected performance data for more than
eight hours. Since the workloads just recently started, SDRS would not make a
recommendation to balance the workloads based on performance until it has collected
more data.

Configure Storage DRS

1. Select Configure
2. Select Storage DRS
3. Select the dropdown arrows to observe the different SDRS settings you can
configure

A number of enhancements have been made to Storage DRS to remove some of the
previous limitations:

• Storage DRS has improved interoperability with deduplicated datastores, so
that Storage DRS is able to identify if datastores are backed by the same
deduplication pool or not, and hence avoid moving a VM to a datastore using a
different deduplication pool.
• Storage DRS has improved interoperability with thin provisioned datastores, so
that Storage DRS is able to identify if thin provisioned datastores are backed by
the same storage pool or not, and hence avoid moving a VM between datastores
using the same storage pool.

• Storage DRS has improved interoperability with array-based auto-tiering, so
that Storage DRS can identify datastores with auto-tiering and treat them
differently according to the type and frequency of auto-tiering.

Common to all these improvements is that they require VASA 2.0, which means
the storage vendor must provide an updated storage provider.

Select the VM that was migrated

1. Return to the Hosts and Clusters view by clicking the icon.


2. Select the VM that was migrated using Storage DRS. In this example, it is
perf-worker-03a

Increased throughput and lower latency

1. Select the "Monitor" tab


2. Select "Performance"
3. Select "Advanced"

Now you should see the performance chart you created earlier in this module.

Notice how the throughput has increased and how the latency is lower (green arrows)
than it was when both VMs shared the same datastore.

Return to the Iometer GUIs to review the performance

Return to the Iometer workers and see how they also report increased performance and
lower latencies.

It takes a while for Iometer to show these higher numbers, maybe ten minutes. This is due
to the way the storage performance is throttled in this lab. If you want to try a shortcut:

1. Click the "Stop sign", and wait for about 30 seconds


2. Click the "Green flag" (start tests) to restart the two workers (see arrows on the
picture)

The workload should spike but then settle at the higher performance level in a couple of
minutes.

Stop the Iometer workloads

Stop the Workloads

1. Press the "Stop Sign" button on the Iometer GUI


2. Close the GUI by pressing the “X”
3. Press the "Stop Sign" button on the Iometer GUI
4. Close the GUI by pressing the “X”

Conclusion and Clean-Up


This concludes the Storage Performance and Troubleshooting module. We
hope you have enjoyed taking it. Please do not forget to fill out the survey when
you are finished.

Stop Module 10

On the main console, find the Module Switcher window and click Stop for Module 10.

Key takeaways

During this lab we saw the importance of sizing your storage correctly with respect to
space and performance. It also showed that when two storage-intensive sequential
workloads share the same spindles, performance can be greatly impacted. If possible,
try to keep workloads separated; keep sequential workloads separate (backed by
different spindles/LUNs) from random workloads.

In general, we aim to keep storage latencies under 20ms, lower if possible, and monitor
for frequent latency spikes of 60ms or more which would be a performance concern and
something to investigate further.
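
A quick way to apply this guidance to a set of latency samples is sketched below in
Python. The sample data is made up for illustration.

# Quick check of latency samples against the guidance above: keep averages
# under 20 ms and watch for frequent spikes of 60 ms or more.
samples_ms = [4, 6, 5, 8, 61, 7, 5, 65, 9, 6]

avg = sum(samples_ms) / len(samples_ms)
spikes = [s for s in samples_ms if s >= 60]

print(f"Average latency: {avg:.1f} ms {'(OK)' if avg < 20 else '(investigate)'}")
print(f"Spikes >= 60 ms: {len(spikes)} of {len(samples_ms)} samples"
      + (" -- worth investigating" if len(spikes) / len(samples_ms) > 0.1 else ""))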

Guidance: From a vSphere perspective, for most applications, the use of one large
datastore vs. several small datastores tends not to have a performance impact.
However, the use of one large LUN vs. several LUNs is storage array dependent and
most storage arrays perform better in a multi-LUN configuration than a single large LUN
configuration.

Guidance: Follow your storage vendor’s best practices and sizing guidelines to properly
size and tune your storage for your virtualized environment.

Module 11 - Network Performance, Basic Concepts and Troubleshooting (15 minutes)

Introduction to Network Performance


As defined by Wikipedia, network performance refers to measures of service
quality of a telecommunications product as seen by the customer.

These metrics are considered important:

• Bandwidth: commonly measured in bits/second, this is the maximum rate
that information can be transferred
• Throughput: the actual rate that information is transferred
• Latency: the delay between the sender transmitting the information and the
receiver decoding it; this is mainly a function of the signal's travel time and the
processing time at any nodes the information traverses
• Jitter: variation in the time of arrival at the receiver of the information
• Error rate: the number of corrupted bits expressed as a percentage or
fraction of the total sent
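
As a toy illustration of these metrics, here is a small Python calculation. The sample
numbers are made up.

# Toy calculation of the metrics defined above (throughput, latency, jitter,
# error rate) from made-up sample data.
latencies_ms = [1.2, 1.4, 1.1, 2.8, 1.3]      # per-packet one-way delays (ms)
bits_sent, seconds = 8_000_000, 2.0
bits_corrupted = 4_000

throughput_mbps = bits_sent / seconds / 1e6
mean_latency = sum(latencies_ms) / len(latencies_ms)
jitter = max(latencies_ms) - min(latencies_ms)   # simple spread-based view of jitter
error_rate = bits_corrupted / bits_sent

print(f"Throughput: {throughput_mbps:.1f} Mbit/s")
print(f"Latency:    {mean_latency:.2f} ms (jitter {jitter:.2f} ms)")
print(f"Error rate: {error_rate:.3%}")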

In the following module, we will show you how to monitor and troubleshoot some
network-related issues so that you can troubleshoot similar issues that may exist
in your own environment.

Launch Performance Lab Module Switcher

To start this module, double-click on the Performance Lab MS shortcut on the Main
Console desktop.

Start Module 11

Click on the Start button under Module 11.

Open Google Chrome

Click the Google Chrome icon on the taskbar.

Login to the vSphere Client

This is the vCenter login screen. To login to vCenter:

1. Check the Use Windows session authentication checkbox


2. Click the LOGIN button

Monitor network activity with performance charts

Network contention can occur when multiple VMs are accessing the same "pipe"
(virtual and/or physical network) and there isn't enough bandwidth available.

In our lab environment, it's not feasible to attempt to saturate the network (we'd
like others to be able to take labs without delays!). Therefore, this module
focuses on creating network load and showing you where to look when you
suspect network problems in your own environment.

NOTE: You might see different results on your screen, which is to be expected
given the variability of the lab environments.

Chart perf-worker-02a network performance

1. Select the perf-worker-02a VM which is on ESXi host esx-01a.corp.local


2. Select the Monitor tab
3. Select Performance/Advanced option from the list
4. Click the Popup Chart icon

Select Chart Options

Click Chart Options to select the network metrics we want to chart.

1. Select the Network subsystem


2. Select these counters: Packets received, Packets transmitted, and Usage
(not pictured)
3. Make sure only perf-worker-02a is checked (uncheck the other objects)
4. Click OK

Monitor chart output

Depending on the time it took to get here, the network load test might be done. You
should still be able to see the network load that ran and finished.

1. Here you can see the graphical representation of the network load of perf-
worker-02a
2. Here you can see the counters we selected in the previous step (Packets
received, transmitted, and overall Usage in KBps) and a real-time view of their
values

Some good advice on what to look for is:

• Usage: If this number is higher than expected, you may want to consider
segregating this VM onto a separate virtual switch or VLAN from other VMs
• Packets received and Packets transmitted: If these values get too high, it
could lead to dropped packets which need to be retransmitted.

Let's go to the host, and see if this is a VM or a host-level problem.

Select esx-01a.corp.local host

1. Select host esx-01a.corp.local


2. Select the Monitor tab
3. Select Advanced Performance from the list
4. Select the Popup Chart icon

Select Chart Options

Click Chart Options to select the network metrics we want to chart.

Monitor chart output

1. See if there are any dropped packets on the host

In this example, there are no dropped packets at the host level, which indicates the
host's NICs are not the bottleneck.

NOTE: You might see different results depending upon the lab environment conditions.

Conclusion and Clean-Up


This concludes the Network Performance, Basic Concepts and
Troubleshooting module. We hope you have enjoyed taking it. Please don't
forget to fill out the survey when you are finished.

Stop Module 11

On the main console, click Stop under Module 11.

Key takeaways

During this lab we saw how to diagnose networking problems, both at a VM and at an
ESXi host level, using the vSphere Client's built-in performance charts.

Note that there are other ways to troubleshoot networking performance:

• If you want real time performance, esxtop is a great tool for just that, and it's
covered in a different module.
• If you want long term performance statistics at a datacenter level, vRealize
Operations is the right tool.

If you want to know more about troubleshooting network performance, see this VMware
KB article:
"Troubleshooting network performance issues in a vSphere
environment": http://kb.vmware.com/kb/1004087

Module 12 - Advanced Performance Feature: Latency Sensitivity Setting (45 minutes)

Introduction to Latency Sensitivity


The 'Latency Sensitivity' feature was developed to address major sources of
latency that can be introduced by virtualization. This feature was designed to
programmatically reduce response time and jitter on a per-VM basis, allowing
sensitive workloads exclusive access to physical resources and avoiding resource
contention on a granular basis. This is achieved by bypassing virtualization layers,
reducing overhead. Even greater performance can be realized when latency
sensitivity is used in conjunction with a pass-through mechanism such as single-
root I/O virtualization (SR-IOV).

Since this feature is set on a per-VM basis, a mixture of both normal VMs and
latency sensitive workload VMs can be run on a single vSphere host.

Who should use this feature?

The latency sensitivity feature is intended only for specialized use cases, namely,
workloads that require extremely low latency. It is extremely important to determine if
your workload could benefit from this feature before enabling it. Latency sensitivity
provides extremely low network latency performance with a tradeoff of increased CPU
and memory cost because of reduced resource sharing and increased power
consumption.

The definition of a "highly latency-sensitive application" is one that requires network
latencies in the tens of microseconds and very small jitter. An example would be
stock market trading applications which are highly sensitive to latency; any introduced
latency could mean the difference of making millions or losing millions.

Before making the decision to leverage VMware's latency sensitivity feature, perform
the necessary cost-benefit analysis to determine whether this feature is needed.
feature just because it exists can lead to higher host CPU utilization, higher power
consumption, and it can needlessly impact performance of the other VMs running on the
host.

Who should not use this feature?

Choosing whether to enable the latency sensitivity or not is one of those “Just because
you can doesn’t mean you should” choices. The Latency sensitivity feature reduces
network latency. Latency sensitivity does not decrease application latency, especially
if latency is influenced by storage or other sources of latency besides the network.

The latency sensitivity feature should be enabled in environments in which the CPU is
undercommitted. VMs which have latency sensitivity set to High are given exclusive
access to the physical CPU on the host. This means the latency sensitive VM can no
longer share the CPU with neighboring VMs.

Generally, VMs that use the latency sensitivity feature should have fewer vCPUs than
the number of cores per socket in your host to ensure that the latency sensitive VM
occupies only one NUMA node.

If the latency sensitivity feature is not relevant to your environment, consider choosing
a different module.

Changes to CPU access

When a VM has 'High' latency sensitivity set in vCenter, the VM is given exclusive
access to the physical cores it needs to run. This is termed exclusive affinity. These
cores will be reserved for the latency sensitive VM only, which results in greater CPU
accessibility to the VM and less L1 and L2 cache pollution from multiplexing other VMs
onto the same cores. When the VM is powered on, each vCPU is assigned to a particular
physical CPU and remains on that CPU.

When the latency sensitive VM's vCPU is idle, ESXi also alters its halting behavior so that
the physical CPU remains active. This reduces wakeup latency when the VM becomes
active again.

Changes to virtual NIC interrupt coalescing

A virtual NIC (vNIC) is a virtual device that exchanges network packets between the
VMkernel and the guest operating system. Exchanges are typically triggered by
interrupts to the guest OS or by the guest OS calling into VMkernel, both of which are
expensive operations. Virtual NIC interrupt coalescing, which is enabled by default in
vSphere, attempts to reduce CPU overhead by holding back packets for some time
(combining or "coalescing" these packets) before triggering an interrupt, so that the
hypervisor does not have to wake up the VM as frequently.

Enabling 'High' latency sensitivity disables virtual NIC coalescing, so that there is
less latency between when a packet is sent or received and when the CPU is interrupted
to process the packet. Typically, coalescing is desirable for higher throughput (so the
CPU isn't interrupted as often), but it can introduce network latency and jitter.

While disabling coalescing can reduce latency, it can also increase CPU utilization and
thus power usage. Therefore this option should only be used in environments with small
packet rates and plenty of CPU headroom.
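
To make the trade-off concrete, here is a toy Python model of the effect just described.
It is not vSphere's actual coalescing algorithm; all numbers are illustrative only.

# Toy model of the trade-off described above: holding packets back ("coalescing")
# cuts the interrupt rate but adds waiting time to each packet.
def coalescing_tradeoff(packets_per_sec, coalesce_window_us):
    if coalesce_window_us == 0:
        return packets_per_sec, 0.0            # one interrupt per packet, no added wait
    window_s = coalesce_window_us / 1e6
    interrupts_per_sec = min(packets_per_sec, 1.0 / window_s)   # at most one per window
    added_latency_us = coalesce_window_us / 2.0                 # average extra wait
    return interrupts_per_sec, added_latency_us

for window in (0, 50, 200):   # coalescing window in microseconds
    ints, lat = coalescing_tradeoff(packets_per_sec=100_000, coalesce_window_us=window)
    print(f"window {window:>3} us: ~{ints:>9,.0f} interrupts/s, ~{lat:5.1f} us added latency")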

Are you ready to get your hands dirty? Let's start the hands-on portion of this lab.

Check the Lab Status in the lower-right of the desktop

Please check to see that your lab has finished all the startup routines and is ready for you
to start. If you see anything other than "Ready", please wait a few minutes. If after five
minutes your lab has not changed to "Ready", please ask for assistance.

Enabling and Confirming the Latency Sensitivity setting

In this section, we learn how to enable and confirm the Latency Sensitivity setting for a
VM in the lab environment.

Open Google Chrome

First, let's open Google Chrome.

Login to vCenter

This is the vCenter login screen. To login to vCenter:

1. Check the Use Windows session authentication checkbox


2. Click the LOGIN button

Select the challenge-04a VM

Let's select the VM we'll be enabling Latency Sensitivity on:

1. Ensure Hosts and Clusters is the view in the vSphere Client by clicking the
highlighted icon
2. Select the challenge-04a VM highlighted
3. Note this VM has 2 CPUs and 2 GB of Memory configured.
4. Click the Edit Settings icon so we can enable the setting.

Go to Advanced VM Settings

1. Select VM Options
2. Expand the Advanced pulldown
3. Scroll down

Set Latency Sensitivity to High

After you scroll down, you should see the Latency Sensitivity setting.

1. Select the dropdown and change it from Normal to High


2. Click OK to save this setting

Now, let's try to power on this VM. Hint: We may have to do a couple more things
before it powers on successfully, but we'll learn how to do these as well.

Power on challenge-04a VM, Note CPU Reservation Requirement

Let's try to power on the VM, and note the error that comes up.

1. Click the Power On icon to attempt to start the VM


2. Note the failure in the lower right corner: Operation failed!
3. The reason for the failure is listed next to Status: Invalid CPU reservation for
the latency-sensitive VM, (sched.cpu.min) should be at least 5598 MHz.
NOTE: Your specific lab environment will likely show a different MHz value here;
please make note of it for the next step.
4. Click the Edit Settings icon again so we can set the CPU reservation.

Set CPU Reservation

Let's set the CPU reservation for the challenge-04a VM to resolve the power-on failure.

1. Expand the CPU dropdown.


2. The Reservation field is 0 by default (no CPU reservation). Change this to the
value that the error message stated in the previous step. In this example, it
is 5598 MHz but might differ in your lab environment.
3. Click OK.

Try to power on the VM again now that the CPU reservation has been set.

Power on challenge-04a VM, Note Memory Reservation Requirement

Let's try to power on the VM, and note the error that comes up.

1. Click the Power On icon to attempt to start the VM


2. Note the failure in the lower right corner: Operation failed!
3. The reason for the failure is listed next to Status: Invalid memory setting:
memory reservation (sched.mem.min) should be equal to
memsize(2048).
4. Click the Edit Settings icon again so we can set the memory reservation.

Set Memory Reservation

Let's set the Memory Reservation for the challenge-04a VM to resolve the power-on
failure.

1. Expand the Memory dropdown.


2. The Reservation field is 0 by default (no memory reservation). Check the
"Reserve all guest memory (All locked)" checkbox.
3. Click OK.

Let's try to power on the VM again, now that both the CPU and memory reservations
have been set.

Power on challenge-04a VM

Let's try to power on the VM, and note that there shouldn't be any more errors.

1. Click the Power On icon to attempt to start the VM.


2. Click the Monitor tab so we can confirm that the CPU and Memory Reservations
are indeed working as intended.

Open PuTTY

Click on the PuTTY icon so we can SSH to the host that is running the challenge-04a
VM.

PuTTY to esx-01a

Double-click on the esx-01a.corp.local session.

Launch esxtop

Type in esxtop and press Enter.

Filter only running VMs (V)

Filtering only running VMs makes the display easier to read.


Type an uppercase V (Shift+V) to see the display change to show only challenge-04a.

Change the displayed esxtop fields (f)

Type the f key (short for fields) to see a display like the above.

We want to remove the "F" field (CPU State Times), and add the "I" field (CPU Summary
Stats).
Type the uppercase F and I keys and you should see the CPU Summary Stats
selected now.

Press Enter to return to the main esxtop display.

Expand the esxtop window to the right

There are still many fields in esxtop, so expand the window by clicking and dragging
the edge of the right border of the window to the right.

There are two things we want to note here:

1. Note the GID of your VM (279396 in this example, but it will be different in your
lab environment)
2. Note that EXC_AF is Y - this is new with ESXi 6.7; it confirms that the VM has
exclusive affinity

Expand the VM GID in esxtop

Expand the GID of the VM that is displayed in your lab environment:

• Press the "e" key to expand the VM


• You'll see a prompt as shown here, which states "Group to expand/rollup
(gid):"
Type in the GID of your challenge-04a VM and press Enter.

Expanded esxtop display shows EXC_AF for all vCPUs

Note that we now see much more information about the challenge-04a VM, including
processes, which CPUs those processes are on, and so on.

The highlighted rows show:

1. There are two vCPUs (vmx-vcpu-0 and vmx-vcpu-1)


2. Both vCPUs have 99% DMD (demand), confirming the CPU reservation set earlier
3. Exclusive affinity (EXC_AF = Y), confirming the Latency Sensitivity setting is
active.
4. Go ahead and close PuTTY.

Monitor Performance Overview Charts (CPU and Memory)

Switch back to the vSphere Client (Chrome window) and let's look at the CPU and
Memory usage, now that we have created the necessary Reservations for Latency
Sensitivity to be High.

1. Under Performance, click Overview.


2. For the time range, make sure this drop-down is set to Real-time.
3. Hover over the red lines on the left and right, which indicate CPU and
Memory Usage for challenge-04a.
These usages are 100%, which means the reservations we set are working.

NOTE: While these reservations are necessary for latency sensitivity, keep in
mind that they prevent ESXi from sharing or freeing up idle resources for other
VMs.

Shut down challenge-04a VM

Still in the Chrome/vSphere Client window:

1. Click the Refresh icon


2. Ensure the Monitor tab is still highlighted
3. Click All Issues
4. Note that vCenter generated CPU and memory usage alarms for this VM.
This is expected, given the reservations we had to set for Latency Sensitivity.
5. Click the Shut Down icon to power off the VM.
6. Click YES to confirm.

Remove CPU Reservation for challenge-04a VM

To remove the CPU reservation:

1. Expand CPU by clicking the caret as shown


2. Change the Reservation from the value you set earlier to 0 MHz. This removes
the CPU reservation.
3. Collapse CPU by clicking the caret from Step 1.

Next, we'll remove the Memory reservation.

Remove Memory Reservation for challenge-04a VM

To remove the Memory reservation:

1. Expand Memory by clicking the caret as shown


2. Uncheck the "Reserve all guest memory (All locked)" checkbox
3. Change the Reservation from 2048 MB to 0 MB as shown. This removes the
memory reservation.
4. Click the VM Options tab, where we will set Latency Sensitivity back to Normal.


Go to Advanced VM Settings

1. Select VM Options
2. Expand the Advanced pulldown
3. Scroll down

Set Latency Sensitivity to Normal

After you scroll down, you can see the Latency Sensitivity setting.

1. Select the dropdown and change it from High to Normal.


2. Click OK to save these settings.

Summary

Congratulations! In summary, you have successfully:

• Configured a VM with Latency Sensitivity set to High


• Set the requisite CPU and memory reservations
• Powered-on the VM
• Confirmed CPU/memory reservations (100% CPU/memory usage) through the
vSphere Client

• Confirmed exclusive affinity via esxtop


• Rolled back these changes (reservations and Latency Sensitivity) to restore
default VM settings

Conclusion
This concludes the Latency Sensitivity module. We hope you have enjoyed
taking it. Please do not forget to fill out the survey when you are finished.

Key takeaways

The Latency Sensitivity setting is easy to configure, but you should determine
whether your application fits the definition of "High" latency sensitivity.

To review:

1. In the VM's Advanced Settings, set Latency Sensitivity to High.


2. Set the necessary minimum CPU reservation for the latency-sensitive VM such
that the MHz reserved covers the full frequency of every vCPU (see the sketch below).
3. Set the 100% memory reservation to reserve/lock all of the memory of the
guest VM.
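
A quick Python sketch of the reservations this module required before the VM would
power on. The per-core MHz value is an example; the exact figure differs per host, as the
power-on error message in this module showed (5598 MHz for this 2-vCPU VM).

# Sketch of the reservations a High latency-sensitivity VM needed in this module:
# CPU reserved for every vCPU at the full core frequency, and all guest memory
# reserved. The per-core MHz value is an example assumption.
def required_reservations(num_vcpus, core_mhz, mem_mb):
    cpu_reservation_mhz = num_vcpus * core_mhz   # sched.cpu.min
    mem_reservation_mb = mem_mb                  # sched.mem.min == memsize
    return cpu_reservation_mhz, mem_reservation_mb

cpu_mhz, mem_mb = required_reservations(num_vcpus=2, core_mhz=2799, mem_mb=2048)
print(f"CPU reservation:    {cpu_mhz} MHz")   # 5598 MHz, matching the error seen earlier
print(f"Memory reservation: {mem_mb} MB")     # reserve all guest memory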

If you want to learn more about running latency sensitive applications on vSphere,
consult these white papers:

• http://www.vmware.com/files/pdf/techpaper/VMW-Tuning-Latency-Sensitive-
Workloads.pdf
• http://www.vmware.com/files/pdf/techpaper/latency-sensitive-perf-vsphere55.pdf

Test Your Skills!

Now that you’ve completed this lab, try testing your skills with VMware Odyssey, our
newest Hands-on Labs gamification program. We have taken Hands-on Labs to the next
level by adding gamification elements to the labs you know and love. Experience the
fully automated VMware Odyssey as you race against the clock to complete tasks and
reach the highest ranking on the leaderboard. Try the vSphere Performance Odyssey lab

• HOL-2004-04-ODY- VMware Odyssey - vSphere Performance - Advanced Game

Conclusion
Thank you for participating in the VMware Hands-on Labs. Be sure to visit
http://hol.vmware.com/ to continue your lab experience online.

Lab SKU: HOL-2004-01-SDC

Version: 20201130-191634
