Copyright/Trademark
Copyright © 2012 VMware, Inc. All rights reserved. This manual and its accompanying
materials are protected by U.S. and international copyright and intellectual property laws.
VMware products are covered by one or more patents listed at http://www.vmware.com/go/
patents. VMware is a registered trademark or trademark of VMware, Inc. in the United States
and/or other jurisdictions. All other marks and names mentioned herein may be trademarks
of their respective companies.
The training material is provided “as is,” and all express or implied conditions,
representations, and warranties, including any implied warranty of merchantability, fitness for
a particular purpose or noninfringement, are disclaimed, even if VMware, Inc., has been
advised of the possibility of such claims. This training material is designed to support an
instructor-led training course and is intended to be used for reference purposes in
conjunction with the instructor-led training course. The training material is not a standalone
training tool. Use of the training material for self-study without class attendance is not
recommended.
These materials and the computer programs to which they relate are the property of, and
embody trade secrets and confidential information proprietary to, VMware, Inc., and may not
be reproduced, copied, disclosed, transferred, adapted or modified without the express
written approval of VMware, Inc.
Course development: Mike Sutton
Technical review: Johnathan Loux, Carla Gavalakis
Technical editing: Jeffrey Gardiner
Production and publishing: Ron Morton, Regina Aboud
www.vmware.com/education
TABLE OF CONTENTS
MODULE 7 Storage Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .351
You Are Here . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .352
Importance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .353
Module Lessons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .354
Lesson 1: Storage Virtualization Concepts . . . . . . . . . . . . . . . . . . . . . .355
Learner Objectives. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .356
Storage Performance Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .357
Storage Protocol Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .358
SAN Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .359
Storage Queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .361
Performance Impact of Queuing on the Storage Array . . . . . . . . . . . . .362
LUN Queue Depth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .363
Network Storage: iSCSI and NFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . .364
What Affects VMFS Performance?. . . . . . . . . . . . . . . . . . . . . . . . . . . .365
SCSI Reservations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .367
VMFS Versus RDMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .368
Virtual Disk Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .369
Review of Learner Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .370
Lesson 2: Monitoring Storage Activity . . . . . . . . . . . . . . . . . . . . . . . . .371
Learner Objectives. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .372
Applying Space Utilization Data to Manage Storage Resources . . . . .373
Disk Capacity Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .374
Monitoring Disk Throughput with vSphere Client . . . . . . . . . . . . . . . .375
Monitoring Disk Throughput with resxtop . . . . . . . . . . . . . . . . . . . . . .376
Disk Throughput Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .377
Monitoring Disk Latency with vSphere Client . . . . . . . . . . . . . . . . . . .379
Monitoring Disk Latency with resxtop . . . . . . . . . . . . . . . . . . . . . . . . .381
Monitoring Commands and Command Queuing . . . . . . . . . . . . . . . . .382
Disk Latency and Queuing Example . . . . . . . . . . . . . . . . . . . . . . . . . . .383
Monitoring Severely Overloaded Storage . . . . . . . . . . . . . . . . . . . . . . .384
Configuring Datastore Alarms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .385
Analyzing Datastore Alarms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .387
Lab 10 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .388
Lab 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .389
Lab 10 Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .390
Review of Learner Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .391
Lesson 3: Command-Line Storage Management . . . . . . . . . . . . . . . . .392
Learner Objectives. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .393
Managing Storage with vMA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .394
Examining LUNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .395
Managing Storage Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .396
Managing NAS Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .397
Managing iSCSI Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .398
Contents iii
Example: Spotting CPU Overcommitment . . . . . . . . . . . . . . . . . . . . . .492
Guest CPU Saturation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .493
Using One vCPU in SMP Virtual Machine . . . . . . . . . . . . . . . . . . . . . .494
Low Guest CPU Utilization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .496
CPU Performance Best Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . .497
Lab 13 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .498
Review of Learner Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .499
Key Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .500
Power-On Requirements: CPU and Memory Reservations . . . . . . . . .588
Power-On Requirements: Swap File Space . . . . . . . . . . . . . . . . . . . . . .589
Power-On Requirements: Virtual SCSI HBA Selection . . . . . . . . . . . .590
Virtual Machine Performance Best Practices . . . . . . . . . . . . . . . . . . . .591
Review of Learner Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .592
Lesson 2: vSphere Cluster Optimization . . . . . . . . . . . . . . . . . . . . . . . .593
Learner Objectives. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .594
Guidelines for Resource Allocation Settings . . . . . . . . . . . . . . . . . . . .595
Resource Pool and vApp Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . .597
DRS Cluster: Configuration Guidelines . . . . . . . . . . . . . . . . . . . . . . . .598
DRS Cluster: vMotion Guidelines. . . . . . . . . . . . . . . . . . . . . . . . . . . . .600
DRS Cluster: Cluster Setting Guidelines . . . . . . . . . . . . . . . . . . . . . . .602
vSphere HA Cluster: Admission Control Guidelines . . . . . . . . . . . . . .603
Example of Calculating Slot Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . .605
Applying Slot Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .606
Reserving a Percentage of Cluster Resources . . . . . . . . . . . . . . . . . . . .607
Calculating Current Failover Capacity . . . . . . . . . . . . . . . . . . . . . . . . .608
Virtual Machine Unable to Power On . . . . . . . . . . . . . . . . . . . . . . . . . .609
Advanced Options to Control Slot Size. . . . . . . . . . . . . . . . . . . . . . . . . 611
Setting vSphere HA Advanced Parameters . . . . . . . . . . . . . . . . . . . . . .612
Fewer Available Slots Shown Than Expected . . . . . . . . . . . . . . . . . . .613
Lab 16 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .614
Review of Learner Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .615
Key Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .616
Building an ESXi Image: Step 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .686
Using Image Builder to Build an Image: Step 4 . . . . . . . . . . . . . . . . . .687
Lab 19 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .688
Review of Learner Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .689
Lesson 6: Auto Deploy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .690
Learner Objectives. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .691
What Is Auto Deploy? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .692
Where Are the Configuration and State Information Stored? . . . . . . . .693
Auto Deploy Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .694
Rules Engine Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .695
Software Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .696
PXE Boot Infrastructure Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .697
Initial Boot of an Autodeployed ESXi Host: Step 1 . . . . . . . . . . . . . . .698
Initial Boot of an Autodeployed ESXi Host: Step 2 . . . . . . . . . . . . . . .699
Initial Boot of an Autodeployed ESXi Host: Step 3 . . . . . . . . . . . . . . .700
Initial Boot of an Autodeployed ESXi Host: Step 4 . . . . . . . . . . . . . . .701
Initial Boot of an Autodeployed ESXi Host: Step 5 . . . . . . . . . . . . . . .702
Subsequent Boot of an Autodeployed ESXi Host: Step 1 . . . . . . . . . . .703
Subsequent Boot of an Autodeployed ESXi Host: Step 2 . . . . . . . . . . .704
Subsequent Boot of an Autodeployed ESXi Host: Step 3 . . . . . . . . . . .705
Subsequent Boot of an Autodeployed ESXi Host: Step 4 . . . . . . . . . . .706
Auto Deploy Stateless Caching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .707
Stateless Caching Host Profile Configuration . . . . . . . . . . . . . . . . . . . .708
Auto Deploy Stateless Caching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .709
Auto Deploy Stateful Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . .710
Stateful Installation Host Profile Configuration . . . . . . . . . . . . . . . . . . 711
Auto Deploy Stateful Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . .712
Managing the Auto Deploy Environment . . . . . . . . . . . . . . . . . . . . . . .713
Using Auto Deploy with Update Manager to Upgrade Hosts . . . . . . . .714
Lab 20 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .715
Review of Learner Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .716
Key Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .717
Module 7
Storage Optimization
Slide 7-1
Slide 7-3
Storage can limit the performance of enterprise workloads. You
should know how to monitor a host's storage throughput and
troubleshoot problems that result in overloaded storage and slow
storage performance.
Slide 7-5
Lesson 1:
Storage Virtualization Concepts
Slide 7-7
What affects storage performance?
• Storage protocols: Fibre Channel, Fibre Channel over Ethernet, hardware iSCSI, software iSCSI, NFS
• Proper storage configuration
• Load balancing
• Queuing and LUN queue depth
• VMware vSphere® VMFS (VMFS) configuration:
  • Choosing between VMFS and RDMs
  • SCSI reservations
  • Virtual disk types
VMware vSphere® ESXi™ enables multiple hosts to share the same physical storage reliably
through its optimized storage stack and VMware vSphere® VMFS. Centralized storage of virtual
machines can be accomplished by using VMFS as well as NFS. Centralized storage enables
virtualization capabilities such as VMware vSphere® vMotion®, VMware vSphere® Distributed
Resource Scheduler™ (DRS), and VMware vSphere® High Availability (vSphere HA). To gain the
greatest advantage from shared storage, you must understand the storage performance limits of a
given physical environment to ensure that you do not overcommit resources.
Several factors have an effect on storage performance:
• Storage protocols
• Proper configuration of your storage devices
• Load balancing across available storage
• Storage queues
• Proper use and configuration of your VMFS volumes
Each of these factors is discussed in this lesson.
For Fibre Channel and hardware iSCSI, a major part of the protocol processing is off-loaded to the
HBA. Consequently, the cost of each I/O is very low.
For software iSCSI and NFS, host CPUs are used for protocol processing, which increases cost.
Furthermore, the cost of NFS and software iSCSI is higher with larger block sizes, such as 64KB.
This cost is due to the additional CPU cycles needed for each block for tasks like check summing
and blocking. Software iSCSI and NFS are more efficient at smaller blocks and both are capable of
delivering high throughput performance when CPU resource is not a bottleneck.
ESXi hosts provide support for high-performance hardware features. For greater throughput, ESXi
hosts support 16 Gb Fibre Channel adapters and 10Gb Ethernet adapters for hardware iSCSI and
NFS storage. ESXi hosts also support the use of jumbo frames for software iSCSI and NFS. This
support is available on both Gigabit and 10Gb Ethernet NICs.
Slide 7-9
Proper SAN configuration can help to eliminate performance issues.
• Each LUN should have the right RAID level and storage characteristics for the applications in virtual machines that will use it.
• Use the right path selection policy:
  • Most Recently Used (MRU)
  • Fixed (Fixed)
  • Round Robin (RR)
Storage performance is a vast topic that depends on workload, hardware, vendor, RAID level, cache
size, stripe size, and so on. Consult the appropriate VMware® documentation as well as the storage
vendor documentation on how to configure your storage devices appropriately.
Because each application running in your VMware vSphere® environment has different
requirements, you can achieve high throughput and minimal latency by choosing the appropriate
RAID level for applications running in the virtual machines.
By default, active-passive storage arrays use the Most Recently Used path policy. To avoid LUN
thrashing, do not use the Fixed path policy for active-passive storage arrays.
By default, active-active storage arrays use the Fixed path policy. When using this policy, you can
maximize the use of your bandwidth to the storage array by designating preferred paths to each
LUN through different storage controllers.
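The policy defaults described above can be summarized in a small lookup. This is an illustrative sketch, not a VMware API; the function name and dictionary are hypothetical.

```python
# Default path selection policy per array type, per the text above.
# Note: Fixed on an active-passive array risks LUN thrashing.
DEFAULT_PSP = {
    "active-passive": "Most Recently Used (MRU)",
    "active-active": "Fixed",
}

def recommend_psp(array_type: str) -> str:
    """Return the default path selection policy for an array type."""
    try:
        return DEFAULT_PSP[array_type]
    except KeyError:
        raise ValueError(f"unknown array type: {array_type!r}")
```

With Fixed on an active-active array, bandwidth can then be maximized by designating preferred paths to each LUN through different storage controllers.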
Slide 7-10
Queuing at the host:
• Device queue controls number of active commands on LUN at any time.
• Depth of queue is 32 (default).
• VMkernel queue is an overflow queue for device driver queue.
Queuing at the storage array:
• Queuing occurs when the number of active commands to a LUN is too high for the storage array to handle.
Latency increases with excessive queuing at host or storage array.
There are several storage queues that you should be aware of: device driver queue, kernel queue,
and storage array queue.
The device driver queue is used for low-level interaction with the storage device. This queue
controls how many active commands can be on a LUN at the same time. This number is effectively
the concurrency of the storage stack. Set the device queue to 1 and each storage command becomes
sequential: each command must complete before the next starts.
The kernel queue can be thought of as an overflow queue for the device driver queues. A kernel
queue includes features that optimize storage. These features include multipathing for failover and
load balancing, prioritization of storage activities based on virtual machine and cluster shares, and
optimizations to improve efficiency for long sequential operations.
SCSI device drivers have a configurable parameter called the LUN queue depth that determines how
many commands can be active at one time to a given LUN. The default value is 32. If the total
number of outstanding commands from all virtual machines exceeds the LUN queue depth, the
excess commands are queued in the ESXi kernel, which increases latency.
In addition to queuing at the ESXi host, command queuing can also occur at the storage array.
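The host-side overflow behavior can be sketched as a simple calculation. This is an illustrative model only, not VMware code; the function name and workload numbers are hypothetical.

```python
# Commands beyond the LUN queue depth overflow into the VMkernel queue,
# which increases latency (per the text above).
LUN_QUEUE_DEPTH = 32  # default device queue depth

def queue_split(outstanding_per_vm, lun_queue_depth=LUN_QUEUE_DEPTH):
    """Split total outstanding commands into device-queued and kernel-queued."""
    total = sum(outstanding_per_vm)
    device_queued = min(total, lun_queue_depth)
    kernel_queued = total - device_queued  # excess waits in the VMkernel
    return device_queued, kernel_queued

# Example: three VMs issuing 16, 12, and 10 concurrent commands to one LUN
# gives 38 total: 32 active on the device, 6 queued in the VMkernel.
```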
[Figure: Aggregate throughput (MBps) versus number of hosts (1, 2, 4, 8, 16, 32, 64) for sequential read (seq_rd), sequential write (seq_wr), random read (rnd_rd), and random write (rnd_wr) workloads]
To understand the effect of queuing on the storage array, consider a case in which the virtual
machines on an ESXi host can generate a constant number of SCSI commands equal to the LUN
queue depth. If multiple ESXi hosts share the same LUN, SCSI commands to that LUN from all
hosts are processed by the storage processor on the storage array. Strictly speaking, an array storage
processor running in target mode might not have a per-LUN queue depth, so it might issue the
commands directly to the disks. But if the number of active commands to the shared LUN is too
high, multiple commands begin queuing up at the disks, resulting in high latencies.
VMware tested the effects of running an I/O intensive workload on 64 ESXi hosts sharing a VMFS
volume on a single LUN. As shown in the figure above, except for sequential reads, there is no drop
in aggregate throughput as the number of hosts increases. The reason sequential read throughput
drops is that the sequential streams coming in from the different ESXi hosts are intermixed at the
storage array, thus losing their sequential nature. Writes generally show better performance than
reads because they are absorbed by the write cache and flushed to disks in the background.
For more details on this test, see “Scalable Storage Performance” at http://www.vmware.com/
resources/techresources/1059.
Slide 7-12
LUN queue depth determines how many commands to a given LUN can be active at one time.
• Set LUN queue depth properly to decrease disk latency.
• Depth of queue is 32 (default).
• Maximum recommended queue depth is 64; set LUN queue depth to its maximum of 64.
• Set Disk.SchedNumReqOutstanding to the same value as the queue depth.
SCSI device drivers have a configurable parameter called the LUN queue depth that determines how
many commands can be active at one time to a given LUN. QLogic Fibre Channel HBAs support up
to 255 outstanding commands per LUN, and Emulex HBAs support up to 128. However, the default
value for both drivers is set to 32. If an ESXi host generates more commands to a LUN than the
LUN queue depth, the excess commands are queued in the VMkernel, which increases latency.
When virtual machines share a LUN, the total number of outstanding commands permitted from all
virtual machines to that LUN is governed by the Disk.SchedNumReqOutstanding configuration
parameter, which can be set in VMware® vCenter Server™. If the total number of outstanding
commands from all virtual machines exceeds this parameter, the excess commands are queued in the
VMkernel.
To reduce latency, ensure that the sum of active commands from all virtual machines does not
consistently exceed the LUN queue depth. Either increase the queue depth (the maximum
recommended queue depth is 64) or move the virtual disks of some virtual machines to a different
VMFS volume.
For details on how to increase the queue depth of your storage adapter, see vSphere Storage Guide at
http://www.vmware.com/support/pubs/vsphere-esxi-vcenter-server-pubs.html.
Applications or systems that write large amounts of data to storage, such as data acquisition or
transaction logging systems, should not share Ethernet links to a storage device. These types of
applications perform best with multiple connections to storage devices.
For iSCSI and NFS, make sure that your network topology does not contain Ethernet bottlenecks.
Bottlenecks where multiple links are routed through fewer links can result in oversubscription and
dropped network packets. Any time a number of links transmitting near capacity are switched to a
smaller number of links, such oversubscription becomes possible. Recovering from dropped
network packets results in large performance degradation.
Using VLANs or VPNs does not provide a suitable solution to the problem of link oversubscription
in shared configurations. However, creating separate VLANs for NFS and iSCSI is beneficial. This
separation minimizes network interference from other packet sources.
Finally, with software-initiated iSCSI and NFS, the network protocol processing takes place on the
host system and thus can require more CPU resources than other storage options.
Slide 7-14
VMFS partition alignment:
• The VMware vSphere® Client properly aligns a VMFS partition along the 1MB boundary.
• Performance improvement is dependent on workloads and array types.
Spanning VMFS volumes:
• This feature is effective for increasing VMFS size dynamically.
• Predicting performance is not straightforward.
NOTE
If a VMFS3 partition was created using an earlier version of ESXi that aligned along the 64KB
boundary, and that file system is then upgraded to VMFS5, it will retain its 64KB alignment. 1MB
alignment can be obtained by deleting the partition and recreating it using the vSphere Client and an
ESXi host.
Slide 7-15
A SCSI reservation:
• Causes a LUN to be used exclusively by a single host for a brief period
• Is used by a VMFS instance to lock the file system while the VMFS metadata is updated
Operations that result in metadata updates:
• Creating or deleting a virtual disk
• Increasing the size of a VMFS volume
• Creating or deleting snapshots
• Increasing the size of a VMDK file
To minimize the impact on virtual machine performance:
• Postpone major maintenance/configuration until off-peak hours.
VMFS is a clustered file system and uses SCSI reservations as part of its distributed locking
algorithms. Administrative operations—such as creating or deleting a virtual disk, extending a
VMFS volume, or creating or deleting snapshots—result in metadata updates to the file system
using locks and thus result in SCSI reservations. A reservation causes the LUN to be available
exclusively to a single ESXi host for a brief period of time. Although an acceptable practice is to
perform a limited number of administrative tasks during peak hours, postponing major maintenance
to off-peak hours in order to minimize the effect on virtual machine performance is better.
The impact of SCSI reservations depends on the number and nature of storage or VMFS
administrative tasks being performed:
• The longer an administrative task runs (for example, creating a virtual machine with a larger
disk or cloning from a template that resides on a slow NFS share), the longer the virtual
machines are affected. Also, the time to reserve and release a LUN is highly hardware-
dependent and vendor-dependent.
• Running administrative tasks from a particular ESXi host does not have much effect on the I/O-
intensive virtual machines running on the same ESXi host.
You usually do not have to worry about SCSI reservations if no VMFS administrative tasks are
being performed or if VMFS volumes are not being shared by multiple hosts.
There are three choices on ESXi hosts for managing disk access in a virtual machine: virtual disk in
a VMFS, virtual raw device mapping (RDM), and physical raw device mapping. Choosing the right
disk-access method can be a key factor in achieving high performance for enterprise applications.
VMFS is the preferred option for most virtual disk storage and most enterprise applications,
including databases, ERP, CRM, VMware Consolidated Backup, Web servers, and file servers.
Common uses of RDM are in cluster data and quorum disks for virtual-to-virtual clustering, or
physical-to-virtual clustering; or for running SAN snapshot applications in a virtual machine.
For random workloads, VMFS and RDM produce similar I/O throughput. For sequential workloads
with small I/O block sizes, RDM provides a small increase in throughput compared to VMFS.
However, the performance gap decreases as the I/O block size increases. For all workloads, RDM
has slightly better CPU cost.
For a detailed study comparing VMFS and RDM performance, see “Performance Characterization
of VMFS and RDM Using a SAN” at http://www.vmware.com/resources/techresources/1040.
Slide 7-17
Disk type           Description              How to create        Performance impact         Use case
Eager-zeroed thick  Space allocated and      vSphere Client or    Extended creation time,    Quorum drive in an
                    zeroed out at time of    vmkfstools           but best performance       MSCS cluster
                    creation                                      from first write on
Lazy-zeroed thick   Space allocated at time  vSphere Client       Shorter creation time,     Good for most cases
                    of creation, but zeroed  (default disk type)  but reduced performance
                    on first write           or vmkfstools        on first write to block
Thin                Space allocated and      vSphere Client or    Shorter creation time,     Disk space utilization
                    zeroed upon demand       vmkfstools           but reduced performance    is the main concern
                                                                  on first write to block
The type of virtual disk used by your virtual machine can have an effect on I/O performance. ESXi
supports the following virtual disk types:
• Eager-zeroed thick – Disk space is allocated and zeroed out at the time of creation. Although
this extends the time it takes to create the disk, using this disk type results in the best
performance, even on the first write to each block. The primary use of this disk type is for
quorum drives in an MSCS cluster. You can create eager-zeroed thick disks at the command
prompt with vmkfstools.
• Lazy-zeroed thick – Disk space is allocated at the time of creation, but each block is zeroed
only on the first write. Using this disk type results in a shorter creation time but reduced
performance the first time a block is written to. Subsequent writes, however, have the same
performance as an eager-zeroed thick disk. This disk type is the default type used to create
virtual disks using the vSphere Client and is good for most cases.
• Thin – Disk space is allocated and zeroed upon demand, instead of upon creation. Using this
disk type results in a shorter creation time but reduced performance the first time a block is
written to. Subsequent writes have the same performance as an eager-zeroed thick disk. Use this
disk type when space use is the main concern for all types of applications. You can create thin
disks (also known as thin-provisioned disks) through the vSphere Client.
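The trade-offs among the three disk types can be condensed into a small decision helper. This is an illustrative paraphrase of the guidance above, not a VMware tool; the function and parameter names are hypothetical.

```python
# Selection rules paraphrasing the virtual disk type guidance above.
def choose_disk_type(first_write_performance_critical: bool,
                     space_utilization_is_main_concern: bool) -> str:
    """Pick a virtual disk type from the trade-offs in the text."""
    if first_write_performance_critical:
        # e.g. an MSCS quorum drive; longest creation time, best first-write I/O
        return "eager-zeroed thick"
    if space_utilization_is_main_concern:
        # space allocated and zeroed on demand
        return "thin"
    # vSphere Client default; good for most cases
    return "lazy-zeroed thick"
```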
Slide 7-19
Lesson 2:
Monitoring Storage Activity
Slide 7-21
The overview charts on the Performance tab display usage details of the datastore.
By default, the displayed charts include:
• Space Utilization By Virtual Machines (Top 5)
• 1 Day Summary
The overview charts on the Performance tab for a datastore display information on how the space
on a datastore is utilized. To view the charts:
1. From the Datastores and Datastore Clusters view, select the datastore.
To identify disk-related performance problems, start with determining the available bandwidth on
your host and compare it with your expectations. Are you getting the expected IOPS? Are you
getting the expected bandwidth (read/write)? The same thing applies to disk latency. Check disk
latency and compare it with your expectations. Are the latencies higher than you expected?
Compare performance with the hardware specifications of the particular storage subsystem.
Disk bandwidth and latency help determine whether storage is overloaded or slow. In your vSphere
environment, the most significant metrics to monitor for disk performance are the following:
• Disk throughput
• Disk latency
• Number of commands queued
• Number of active disk commands
• Number of aborted disk commands
Slide 7-23
Disk Read Rate, Disk Write Rate, Disk Usage
The vSphere Client allows you to monitor disk throughput using the advanced disk performance
charts. Metrics of interest:
• Disk Read Rate – Number of disk reads (in KBps) over the sample period
• Disk Write Rate – Number of disk writes (in KBps) over the sample period
• Disk Usage – Number of combined disk reads and disk writes (in KBps) over the sample period
These metrics give you a good indication of how well your storage is being utilized. You can also
use these metrics to monitor whether or not your storage workload is balanced across LUNs.
Disk Read Rate and Disk Write Rate can be monitored per LUN. Disk Read Rate, Disk Write Rate,
and Disk Usage can be monitored per host.
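Disk Usage is simply the combined read and write rate, and comparing per-LUN usage indicates whether the workload is balanced. The sketch below illustrates this; it is not a vSphere API, and the function names and tolerance value are hypothetical.

```python
# Disk Usage = Disk Read Rate + Disk Write Rate (KBps), per the text above.
def disk_usage(read_kbps: float, write_kbps: float) -> float:
    return read_kbps + write_kbps

def is_balanced(per_lun_usage, tolerance=0.25):
    """True if no LUN deviates from mean usage by more than the tolerance.

    The 25% tolerance is an arbitrary illustrative choice.
    """
    mean = sum(per_lun_usage) / len(per_lun_usage)
    return all(abs(u - mean) <= tolerance * mean for u in per_lun_usage)
```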
Slide 7-25
Device view: Type u.
One of the challenges when monitoring these values is to determine whether or not the values
indicate a potential disk bottleneck. The screenshots above were taken while a database was being
restored. A database restore can generate a lot of sequential I/O and is very disk-write intensive.
The first resxtop screen shows disk activity per disk adapter (type d in the window). All I/O is
going through the same adapter, vmhba2. To display the set of fields, type f. Ensure that the
following fields are selected: A (adapter name), B (LUN ID), G (I/O statistics), and H (overall
latency statistics). If necessary, select fields to display by typing a, b, g, or h.
The second resxtop screen shows disk activity per device (type u in the window). Most of the I/O
is going to the same LUN. To display the set of fields, type f. Ensure that the following fields are
selected: A (device name), G (I/O stats), and H (overall latency stats). If necessary, select fields to
display by typing a, g, or h.
Slide 7-26
The vSphere Client counters:
Physical device latency counters and kernel latency counters
Disk latency is the time taken to complete an I/O request and is most commonly measured in
milliseconds. Because multiple layers are involved in a storage system, each layer through which the
I/O request passes might add its own delay. In the vSphere environment, the I/O request passes from
the VMkernel to the physical storage device. Latency can be caused when excessive commands are
being queued either at the VMkernel or the storage subsystem. When the latency numbers are high,
the storage subsystem might be overworked by too many hosts.
In the vSphere Client, there are a number of latency counters that you can monitor. However, a
performance problem can usually be identified and corrected by monitoring the following latency
metrics:
• Physical device command latency – This is the average amount of time for the physical device
to complete a SCSI command. Depending on your hardware, a number greater than
15 milliseconds indicates that the storage array might be slow or overworked.
• Kernel command latency – This is the average amount of time that the VMkernel spends
processing each SCSI command. For best performance, the value should be 0–1 milliseconds. If
the value is greater than 4 milliseconds, the virtual machines on the ESXi host are trying to send
more throughput to the storage system than the configuration supports.
Slide 7-27
Host bus adapters (HBAs) include SCSI, iSCSI, RAID, and FC-HBA adapters.
Latency stats from the device, the kernel, and the guest
In addition to disk throughput, the disk adapter screen (type d in the window) lets you monitor disk
latency as well:
• ADAPTR – The name of the host bus adapter (vmhba#), which includes SCSI, iSCSI, RAID
and Fibre Channel adapters.
• DAVG/cmd – The average amount of time it takes a device (which includes the HBA, the
storage array, and everything in between) to service a single I/O request (read or write).
• If the value is less than 10, the system is healthy. If the value is 11–20 (inclusive), be aware of
the situation by monitoring the value more frequently. If the value is greater than 20, this most
likely indicates a problem.
• KAVG/cmd – The average amount of time it takes the VMkernel to service a disk operation.
This number represents time spent by the CPU to manage I/O. Because processors are much
faster than disks, this value should be close to zero. A value of 1 or 2 is considered high for this
metric.
• GAVG/cmd – The total latency seen from the virtual machine when performing an I/O request.
GAVG is the sum of DAVG plus KAVG.
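These thresholds and the GAVG relationship can be sketched with a hypothetical helper (not a VMware tool; values are in milliseconds, and a reading of exactly 10 is treated here as worth monitoring):

```shell
# Hypothetical helper illustrating the DAVG/cmd thresholds above and the
# GAVG = DAVG + KAVG relationship. Not part of any VMware tool.
davg_health() {
  # $1 = DAVG/cmd reading (whole milliseconds)
  if [ "$1" -lt 10 ]; then
    echo healthy
  elif [ "$1" -le 20 ]; then
    echo monitor          # 10-20 ms: watch the value more frequently
  else
    echo problem          # > 20 ms: most likely indicates a problem
  fi
}

gavg() {
  # GAVG/cmd is the sum of DAVG/cmd and KAVG/cmd
  awk -v d="$1" -v k="$2" 'BEGIN { printf "%.2f\n", d + k }'
}

davg_health 8    # -> healthy
davg_health 25   # -> problem
gavg 12.5 0.5    # -> 13.00
```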
There are metrics for monitoring the number of active disk commands and the number of disk
commands that are queued. These metrics provide information about your disk performance. They
are often used to further interpret the latency values that you might be observing.
• Number of active commands – This metric represents the number of I/O operations that are
currently active. This includes operations for which the host is processing. This metric can
serve as a quick view of storage activity. If the value of this metric is close to or at zero, the
storage subsystem is not being used. If the value is a nonzero number, sustained over time, then
constant interaction with the storage subsystem is occurring.
• Number of commands queued – This metric represents the number of I/O operations that require
processing but have not yet been addressed. Commands are queued and awaiting management by
the kernel when the driver’s active command buffer is full. Occasionally, a queue will form and
result in a small, nonzero value for QUED. However, any significant (double-digit) average of
queued commands means that the storage hardware is unable to keep up with the host’s needs.
7
Slide 7-29
Storage Optimization
normal VMkernel latency
queuing at the device
Here is an example of monitoring the kernel latency value, KAVG/cmd. This value is being
monitored for the device vmhba0. In the first resxtop screen (type d in the window), the kernel
latency value is 0.01 milliseconds. This is a good value because it is nearly zero.
In the second resxtop screen (type u in the window), there are 32 active I/Os (ACTV) and 2 I/Os
being queued (QUED). This means that there is some queuing happening at the VMkernel level.
Queuing happens if there is excessive I/O to the device and the LUN queue depth setting is not
sufficient. The default LUN queue depth is 32. However, if there are too many I/Os (more than 32)
to handle simultaneously, the device will get bottlenecked to only 32 outstanding I/Os at a time. To
resolve this, you would change the queue depth of the device driver.
For details on changing the queue depth of the device driver, see Fibre Channel SAN Configuration
Guide at http://www.vmware.com/support/pubs.
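The queuing arithmetic in the example above can be sketched as follows (a hypothetical illustration, not a VMware tool: with a queue depth of 32, outstanding I/Os beyond the first 32 wait in the queue, matching the ACTV=32, QUED=2 case):

```shell
# Hypothetical illustration of LUN queue-depth behavior: commands beyond
# the device queue depth (default 32) wait in the VMkernel queue.
queued() {
  # $1 = total outstanding I/Os, $2 = LUN queue depth
  q=$(( $1 - $2 ))
  if [ "$q" -lt 0 ]; then q=0; fi
  echo "$q"
}

queued 34 32   # -> 2 (32 active, 2 queued)
queued 20 32   # -> 0 (no queuing)
```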
The vSphere Client counter: Disk Command Aborts
resxtop counter: ABRTS/s
When storage is severely overloaded, commands are aborted because the storage subsystem is
taking far too long to respond to the commands. The storage subsystem has not responded within an
acceptable amount of time, as defined by the guest operating system or application. Aborted
commands are a sign that the storage hardware is overloaded and unable to handle the requests in
line with the host’s expectations.
The number of aborted commands can be monitored by using either the vSphere Client or resxtop:
• From the vSphere Client, monitor Disk Command Aborts.
• From resxtop, monitor ABRTS/s.
7
Slide 7-31
Storage Optimization
To configure datastore alarms:
Right-click on the datastore and select Add Alarm.
Enter the condition or event you want to monitor and what action to
occur as a result.
Alarms can be configured to detect when specific conditions or events occur against a selected
datastore.
Monitoring for conditions is limited to only three options:
• Datastore Disk Provisioned (%)
• Datastore Disk Usage (%)
• Datastore State to All Hosts
Monitoring for events offers more options. The events that can be monitored include the following:
• Datastore capacity increased
• Datastore discovered
• File or directory copied to datastore
• File or directory deleted
• File or directory moved to datastore
• Local datastore created
7
Slide 7-32
Storage Optimization
By selecting the Triggered Alarms button, the Alarms tab displays all
triggered alarms for the highlighted datastores.
By default, alarms are not configured to perform any actions. An alarm appears on the Alarms tab
when the Triggered Alarms button is selected and a red or yellow indicator appears on the
datastore, but the only way to know that an alarm has triggered is to actively look for it.
Actions that can be configured include the following:
• Send a notification email
• Send a notification trap
• Run a command
Configuring actions enables the vCenter Server to notify you if an alarm is triggered.
Once a triggered alarm is discovered, you must gather as much information as possible concerning
the reason for the alarm. The Tasks & Events tab can provide more detail as to why the alarm
occurred. Searching the system logs might also provide some clues.
[Slide diagram: the fileserver2.sh, datawrite.sh, and logwrite.sh scripts drive I/O to the remote
data disk (TestVM01.vmdk) on your assigned LUN, a VMFS datastore on shared storage.]
In this lab, you generate various types of disk I/O and compare performance. Your virtual machine is
currently configured with one virtual disk (the system disk). For this lab, you add two additional
virtual disks (a local data disk and a remote data disk) to your test virtual machine: one on your local
VMFS file system and the other on your assigned VMFS file system on shared storage. Your local
VMFS file system is located on a single physical drive and your assigned VMFS file system on
shared storage is located on a LUN consisting of two physical drives.
After adding the two virtual disks to your virtual machine, you run the following scripts:
• fileserver1.sh – This script generates random reads to the local data disk.
• fileserver2.sh – This script generates random reads to the remote data disk.
• datawrite.sh – This script generates random writes to the remote data disk.
• logwrite.sh – This script generates sequential writes to the remote data disk.
Each script starts a program called aio-stress. aio-stress is a simple command-line program
that measures the performance of a disk subsystem. For more information on the aio-stress
command, see the file readme.txt on the test virtual machine, in the same directory as the scripts.
7
Slide 7-34
Storage Optimization
In this lab, you will use performance charts to monitor disk performance:
1. Identify the vmhbas used for local storage and shared storage.
2. Add disks to your test virtual machine.
3. Display the real-time disk performance charts of the vSphere Client.
4. Perform test case 1: Measure sequential write activity to your remote virtual disk.
5. Perform test case 2: Measure random write activity to your remote virtual disk.
6. Perform test case 3: Measure random read activity to your remote virtual disk.
7. Perform test case 4: Measure random read activity to your local virtual disk.
8. Summarize your findings.
9. Clean up for the next lab.
7
Slide 7-36
Storage Optimization
You should be able to do the following:
Determine which disk metrics to monitor.
Identify metrics in vCenter Server and resxtop.
Demonstrate how to monitor disk throughput.
Lesson 3:
Command-Line Storage Management
7
Slide 7-38
Storage Optimization
After this lesson, you should be able to do the following:
Use VMware vSphere® Management Assistant (vMA) to manage
vSphere virtual storage.
Use vmkfstools for VMFS operations.
Use the vscsiStats command.
The VMware vSphere® Command-Line Interface (vCLI) storage commands enable you to manage
ESXi storage. ESXi storage is storage space on various physical storage systems (local or
networked) that a host uses to store virtual machine disks.
7
Slide 7-40
Storage Optimization
Use the esxcli command with the storage namespace:
esxcli <conn_options> storage core|filesystem <cmd_options>
Examples:
To list all logical devices known to a host:
esxcli --server esxi02 storage core device list
To list a specific device:
esxcli --server esxi02 storage core device
list -d mpx.vmhba32:C0:T0:L0
To display host bus adapter (HBA) information:
esxcli --server esxi02 storage core adapter list
To print mappings of datastores to their mount points and UUIDs:
esxcli --server esxi02 storage filesystem list
The esxcli storage core command is used to display information about available logical unit
numbers (LUNs) on the ESXi hosts.
The esxcli storage filesystem command is used for listing, mounting, and unmounting
VMFS volumes.
Examples:
To display mappings between HBAs and devices:
esxcli --server esxi02 storage core path list
To list the statistics for a specific device path:
esxcli --server esxi02 storage core path stats get
--path vmhba33:C0:T2:L0
To rescan all adapters:
esxcli --server esxi02 storage core adapter rescan
You can display information about paths by running the esxcli storage core path command.
To list information about a specific device path, use the --path <path_name> option:
• esxcli --server esxi02 storage core path list --path vmhba33:C0:T2:L0
where <path_name> is the runtime name of the device.
To rescan a specific storage adapter, use the --adapter <adapter_name> option:
esxcli --server esxi02 storage core adapter rescan --adapter vmhba33
7
Slide 7-42
Storage Optimization
Use the esxcli command with the storage nfs namespace:
esxcli <conn_options> storage nfs <cmd_options>
Examples:
To list NFS file systems:
esxcli --server esxi02 storage nfs list
To add an NFS file system to an ESXi host:
esxcli --server esxi02 storage nfs
add --host=nfs.vclass.local --share=/lun2
--volume-name=MyNFS
To delete an NFS file system:
esxcli --server esxi02 storage nfs remove --volume-name=MyNFS
At the command prompt, you can add, delete, and list attributes concerning NAS datastores attached
to the ESXi host.
When adding an NFS file system to an ESXi host, use the following options:
• --host – Specifies the NAS/NFS server
• --share – Specifies the folder being shared
• --volume-name – Specifies the NFS datastore name
Another useful command is the showmount command:
• showmount -e displays the file systems that are exported on the server where the command is
being run.
• showmount -e <host_name> displays the file systems that are exported on the specified host.
The esxcli iscsi command can be used to configure both hardware and software iSCSI storage
for the ESXi hosts.
When you add an iSCSI software adapter to an ESXi host, you are essentially binding a VMkernel
interface to an iSCSI software adapter. The -n option enables you to specify the VMkernel interface
that the iSCSI software adapter binds to. The -A option enables you to specify the vmhba name for
the iSCSI software adapter.
To find the VMkernel interface name (vmk#), use the command:
• esxcli --server <server_name> network ip interface list
To find the vmhba, use the command:
• esxcli --server <server_name> storage core adapter list
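Putting the two lookups together, a binding command might look like the following sketch. The names esxi02, vmk1, and vmhba33 are placeholders, and the iscsi networkportal add namespace is assumed for the port-binding step:

```shell
# Hypothetical sketch: bind VMkernel interface vmk1 to software iSCSI
# adapter vmhba33 on host esxi02. All names are placeholders for your
# environment; run the assembled command on a system with the vCLI.
BIND_CMD="esxcli --server esxi02 iscsi networkportal add -n vmk1 -A vmhba33"
echo "$BIND_CMD"
```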
7
Slide 7-44
Storage Optimization
You can prevent an ESXi host from accessing the following:
A storage array
A LUN
An individual path to a LUN
Use the esxcli command to mask access to storage.
esxcli --server esxi02 storage core claimrule add -r
<claimrule_ID> -t <type> <required_option> -P
MASK_PATH
Storage path masking has a variety of purposes. In troubleshooting, you can force a system to stop
using a path that you either know or suspect is a bad path. If the path was bad, then errors should be
reduced or eliminated in the log files as soon as the path is masked.
You can also use path masking to protect a particular storage array, LUN, or path. Path masking is
useful for test and engineering environments.
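As a sketch of the masking procedure (the rule ID 500 and the path location vmhba1:C0:T0:L20 are placeholder assumptions; after a rule is added, it must be loaded and run before it takes effect):

```shell
# Hypothetical sketch: mask the path vmhba1:C0:T0:L20 with claim rule 500.
# All IDs and names are placeholders for your environment.
MASK_CMD="esxcli --server esxi02 storage core claimrule add -r 500 -t location -A vmhba1 -C 0 -T 0 -L 20 -P MASK_PATH"
echo "$MASK_CMD"
# Load the updated rule set and apply it:
echo "esxcli --server esxi02 storage core claimrule load"
echo "esxcli --server esxi02 storage core claimrule run"
```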
The NMP is an extensible multipathing module that ESXi supports by default. You can use esxcli
storage nmp to manage devices associated with NMP and to set path policies.
The NMP supports all storage arrays listed on the VMware storage Hardware Compatibility List
(HCL) and provides a path selection algorithm based on the array type. The NMP associates a set of
physical paths with a storage device. A SATP determines how path failover is handled for a specific
storage array. A PSP determines which physical path is used to issue an I/O request to a storage
device. SATPs and PSPs are plugins within the NMP plugin.
The list command lists the devices controlled by VMware NMP and shows the SATP and PSP
information associated with each device. To show the paths claimed by NMP, run esxcli
storage nmp path list. This lists information for all devices. You can also:
• Only show paths to a single device (esxcli storage nmp path list --device <device>).
• Only show information for a single path (esxcli storage nmp path list --path=<path>).
The set command sets the PSP for a device to one of the policies loaded on the system.
The esxcli storage nmp psp generic deviceconfig get and esxcli storage nmp
psp generic pathconfig get commands retrieve PSP configuration parameters. The type of
PSP determines which command to use:
• Use nmp psp generic deviceconfig get for PSPs that are set to VMW_PSP_RR,
VMW_PSP_FIXED or VMW_PSP_MRU.
• Use nmp psp generic pathconfig get for PSPs that are set to VMW_PSP_FIXED or
VMW_PSP_MRU. No path configuration information is available for VMW_PSP_RR.
The ability to perform VMware vSphere® Storage vMotion® migrations at the command prompt
can be useful when the user wants to run a script to clear datastores. Sometimes, a script is the
preferred alternative over migrating with the vSphere Client. Using a script is especially desirable in
cases where a lot of virtual machines must be migrated.
The svmotion command can be run in interactive or noninteractive mode.
In noninteractive mode, all the options must be listed as part of the command. If an option is
missing, the command fails.
The command is run in interactive mode by including the --interactive switch. In interactive
mode, the user is prompted for the information required to complete the migration. The user is
prompted for the following:
• The datacenter
• The virtual machine to migrate
• The new datastore on which to place the virtual machine
• Whether the virtual disks will reside on the same datastore or on different datastores
When the command is run, the --server option must refer to a vCenter Server system.
7
Slide 7-47
Storage Optimization
The vmkfstools command is a VMware vSphere® ESXi Shell
command for managing VMFS volumes and virtual disks.
vmkfstools performs many storage operations from the command
line.
Examples of vmkfstools operations:
Create a VMFS datastore
Manage a VMFS datastore
Manipulate virtual disk files
Common options, including connection options, can be displayed with
the following command:
vmkfstools --help
vmkfstools is one of the VMware vSphere® ESXi™ Shell commands for managing VMFS
volumes and virtual disks. The vmkfstools command is part of the vCLI and vMA.
You can perform many storage operations using the vmkfstools command. For example, you can
create and manage VMFS datastores on a physical partition, or you can manipulate virtual disk files
that are stored on VMFS or NFS datastores.
After you make a change using vmkfstools, the vSphere Client might not be updated immediately.
You must use a refresh or rescan operation from the vSphere Client.
You use the vmkfstools command to create and manipulate virtual disks, file systems, logical
volumes, and physical storage devices on an ESXi host. You can use vmkfstools to create and
manage a virtual machine file system on a physical partition of a disk and to manipulate files, such
as virtual disks, stored on VMFS-3 and NFS. You can also use vmkfstools to set up and manage
raw device mappings (RDMs).
7
Slide 7-49
Storage Optimization
General options for vmkfstools:
connection_options specifies the target server and authentication
information.
--help prints a help message for each command specific option.
--vihost specifies the ESXi host to run the command against.
-v specifies the verbosity level of the command output.
You can use one or more command-line options and associated arguments to specify the activity for
vmkfstools to perform, for example, choosing the disk format when creating a new virtual disk.
After entering the option, specify a target on which to perform the operation. The target can indicate
a partition, device, or path.
With vmkfstools you can use the following general options:
• connection_options: Specifies the target server and authentication information if required.
The connection options for vmkfstools are the same options that vicfg- commands use.
Run vmkfstools --help for a list of all connection options.
• --help: Prints a help message for each command-specific and each connection option. Calling
the script with no arguments or with --help has the same effect.
• --vihost | -h <esx_host>: When you execute vmkfstools with the --server option
pointing to a vCenter Server system, use --vihost to specify the ESXi host to run the
command against.
• -v: The -v suboption indicates the verbosity level of the command output.
Generally, you do not need to log in as the root user to run the vmkfstools command. However,
some commands, such as the file system commands, might require the root user login.
The vmkfstools command supports the following command syntax:
vmkfstools conn_options options target
Target specifies a partition, device, or path to which to apply the command option.
7
Slide 7-51
Storage Optimization
vmkfstools is used to create and manage VMFS file systems:
--createfs creates a VMFS5 file system on a specified partition.
--blocksize specifies the block size of the VMFS file system.
--setfsname sets the name of the VMFS file system to create.
--spanfs extends the VMFS file system with the specified partition.
--rescanvmfs rescans the host for new VMFS volumes.
--queryfs lists attributes of a file or directory on a VMFS volume.
You use the vmkfstools command to create and manipulate file systems, logical volumes, and
physical storage devices on an ESXi host.
--createfs | -C vmfs5 -b | --blocksize -S | --setfsname <fsName>
<partition> – Creates a VMFS5 file system on a specified partition, such as naa.<naa_ID>:1.
The specified partition becomes the file system's head partition.
--blocksize | -b – Specifies the block size of the VMFS file system to create. When omitted,
defaults to using 1MB.
--setfsname | -S – Name of the VMFS file system to create.
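Combining these options, a creation command might look like the following sketch. The file system name my_vmfs is an assumption, and naa.<naa_ID>:1 remains a placeholder for your own partition:

```shell
# Hypothetical sketch: create a VMFS5 file system named my_vmfs with a
# 1MB block size on partition 1 of a device. Substitute your own naa ID
# before running this on an ESXi host.
CREATE_CMD="vmkfstools -C vmfs5 -b 1m -S my_vmfs /vmfs/devices/disks/naa.<naa_ID>:1"
echo "$CREATE_CMD"
```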
You can manage virtual disks with vmkfstools. The following options can be used to manage
virtual disk files:
• --createvirtualdisk | -c <size> -a | --adaptertype <srcfile> -d | --
diskformat <location> Creates a virtual disk at the specified location on a VMFS volume.
With <size> you can specify k|K, m|M, or g|G. Default size is 1MB, default adapter type
is buslogic, and default disk format is zeroedthick.
• --adapterType | -a [buslogic|lsilogic|ide] Adapter type of a disk to be created.
Accepts buslogic, lsilogic, or ide.
• --diskformat | -d Specifies the target disk format when used with -c, -i, or -X. For -c,
accepts zeroedthick, eagerzeroedthick, or thin. For -i, accepts zeroedthick, eagerzeroedthick,
thin, rdm:dev, rdmp:dev, or 2gbsparse. For -X, accepts eagerzeroedthick.
• --clonevirtualdisk | -i <src_file> <dest_file> --diskformat | -d
<format> --adaptertype | -a <type> Creates a copy of a virtual disk or raw disk. The
copy is in the specified disk format. Takes source disk and destination disk as arguments.
• --deletevirtualdisk | -U <disk> Deletes files associated with the specified
virtual disk.
• --extendvirtualdisk | -X [-d eagerzeroedthick] Extends the specified VMFS
virtual disk to the specified length. This command is useful for extending the size of a virtual
disk allocated to a virtual machine after the virtual machine has been created. However, this
command requires that the guest operating system be able to recognize the new size of the
virtual disk, for example by updating the file system on the virtual disk to take advantage of
the extra space.
• --createrdm | -r <rdm_file> Creates a raw disk mapping, that is, maps a raw disk to a
file on a VMFS file system. Once the mapping is established, the mapping file can be used to
access the raw disk like a normal VMFS virtual disk. The file length of the mapping is the same
as the size of the raw disk that it points to.
• --createrdmpassthru | -z <device> <map_file> Creates a passthrough raw disk
mapping. Once the mapping is established, it can be used to access the raw disk like a normal
VMFS virtual disk. The file length of the mapping is the same as the size of the raw disk that it
points to.
• --querydm | -q This option is not currently supported.
• --geometry | -g Returns the geometry information (cylinders, heads, sectors) of a
virtual disk.
• --writezeros | -w Initializes the virtual disk with zeros. Any existing data on the virtual
disk is lost.
• --inflatedisk | -j Converts a thin virtual disk to eagerzeroedthick. Any data on the thin
disk is preserved. Any blocks that were not allocated are allocated and zeroed out.
This example illustrates creating a two-gigabyte virtual disk file named rh6-2.vmdk on the
VMFS file system named Storage1. This file represents an empty virtual disk that virtual machines
can access.
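The slide's command line is not reproduced in the notes; a command consistent with the description above would be something like the following sketch:

```shell
# Hypothetical sketch: create an empty 2GB virtual disk named rh6-2.vmdk
# on the VMFS datastore Storage1 (the /vmfs/volumes path is assumed).
DISK_CMD="vmkfstools -c 2g /vmfs/volumes/Storage1/rh6-2.vmdk"
echo "$DISK_CMD"
```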
7
Slide 7-54
Storage Optimization
vscsiStats collects and reports counters on storage activity:
Data is collected at the virtual SCSI device level in the kernel.
Results are reported per VMDK, regardless of the underlying storage
protocol.
Reports data in histogram form, which includes:
I/O size
Seek distance
Outstanding I/Os
Latency in microseconds
Because vscsiStats collects its results where the guest interacts with the hypervisor, it is unaware
of the storage implementation. Latency statistics can be collected for all storage configurations with
this tool.
Instead of computing averages over large intervals, histograms are kept so that a full distribution is
available for later analysis. For example, knowing that an application's average I/O size is 23KB is
not as useful as knowing that the application issues 8KB and 64KB I/Os in certain proportions.
Sequentiality and I/O size are important characteristics of storage profiles. A variety of best
practices have been provided by storage vendors to enable customers to tune their storage to a
particular I/O size and access type. As an example, it might make sense to optimize an array’s stripe
size to its average I/O size. vscsiStats can provide a histogram of I/O sizes and block distribution
to help this process.
7
Slide 7-56
Storage Optimization
vscsiStats <options>
-l List running virtual machines and their world IDs
(worldGroupID).
-s Start vscsiStats data collection.
-x Stop vscsiStats data collection.
-p Print histograms, specifying histogram type.
-c Produce results in a comma-delimited list.
-h Display help menu for more information about
command-line parameters.
Starting the vscsiStats data collection requires the world ID of the virtual machine being
monitored. You can determine this identifier by running the -l option to generate a list of all
running virtual machines.
To start the data collection, use this command:
vscsiStats -s -w <worldGroup_ID>
Once vscsiStats data collection has started, it automatically stops after about 30 minutes. If
the analysis is needed for a longer period, repeat the start command (-s option) in the window
where the command was originally run. Repeating the command defers the timeout and
termination by another 30 minutes. The -p option works only while vscsiStats is running.
When you are viewing data counters, the histogram type is used to specify either all of the statistics
or one group of them. Options include all, ioLength, seekDistance, outstandingIOs,
latency, and interarrival. These options are case-sensitive.
To manually stop the data collection and reset the counters, use the -x option:
vscsiStats -x -w <worldGroup_ID>
Because results are accrued and reported out in summary, the histograms will include data since
collection was started. To reset all counters to zero without restarting vscsiStats, run:
/usr/lib/vmware/bin/vscsiStats -r
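A complete monitoring session using the commands above might look like this sketch; the world group ID 12345 is a placeholder obtained from the -l output:

```shell
# Hypothetical vscsiStats session. The world group ID 12345 is a
# placeholder; obtain the real ID with "vscsiStats -l" on the host.
WID=12345
START_CMD="vscsiStats -s -w $WID"      # start collection (about 30 min)
PRINT_CMD="vscsiStats -p latency"      # print the latency histogram
STOP_CMD="vscsiStats -x -w $WID"       # stop collection and reset counters
printf '%s\n' "$START_CMD" "$PRINT_CMD" "$STOP_CMD"
```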
7
Slide 7-57
Storage Optimization
In this lab, you will use various commands to view storage
information and monitor storage activity.
1. Use esxcli to view storage configuration information.
2. Use vmkfstools to manage volumes and virtual disks.
3. Generate disk activity in your test virtual machine.
4. Start vscsiStats.
5. Use vscsiStats to generate a histogram for disk latency.
6. Use vscsiStats to determine the type of disk I/O activity.
7. Generate disk activity in your test virtual machine.
8. Use vscsiStats to determine the type of disk I/O activity.
9. Clean up for the next lab.
7
Slide 7-59
Storage Optimization
Lesson 4:
Troubleshooting Storage Performance
Problems
7
Slide 7-61
Storage Optimization
[Slide flowchart: troubleshooting checklist]
2. Check for resource pool CPU saturation.
3. Check for host CPU saturation.
4. Check for guest CPU saturation.
5. Check for active VM memory swapping.
6. Check for VM swap wait.
7. Check for active VM memory compression.
11. Check for using only one vCPU in an SMP VM.
12. Check for high CPU ready time on VMs running in under-utilized hosts.
13. Check for slow storage device.
14. Check for random increase in I/O latency on a shared storage device.
16. Check for low guest CPU utilization.
17. Check for past VM memory swapping.
18. Check for high memory demand in a resource pool.
19. Check for high memory demand in a host.
The performance of the storage infrastructure can have a significant impact on the performance of
applications, whether running in a virtualized or nonvirtualized environment. In fact, problems with
storage performance are the most common causes of application-level performance problems.
Slow storage is the most common cause of performance problems in the vSphere environment.
Slow storage response times might be an indication that your storage is being overloaded. Slow
storage or overloaded storage are related problems, so the solutions presented in this lesson address
both problems.
Severely overloaded storage can be the result of a large number of different issues in the underlying
storage layout or infrastructure. In turn, overloaded storage can manifest itself in many different
ways, depending on the applications running in the virtual machines. When storage is severely
overloaded, operation time-outs can cause commands already issued to disk to be terminated (or
aborted). If a storage device is experiencing command aborts, the cause of these aborts must be
identified and corrected.
To check for command aborts:
1. In the vSphere Client, select the host and display an advanced disk performance chart.
2. For all LUN objects, look at the counter named Commands Terminated. If the value is greater
than 0 on any LUN, then storage is overloaded on that LUN. Otherwise, storage is not severely
overloaded.
7
Slide 7-63
Storage Optimization
Excessive demand is placed on the storage device.
Storage is misconfigured. Check the following:
Number of disks per LUN
RAID level of a LUN
Assignment of array cache to a LUN
Slow storage is the most common cause of performance problems in a vSphere environment. The
disk latency metrics provided by the vSphere Client are Physical Device Read Latency and Physical
Device Write Latency. These metrics show the average time for an I/O operation to complete. The
I/O operation is measured from the time the operation is submitted to the hardware adapter to the
time the completion notification is provided back to ESXi. As a result, these values represent
performance achieved by the physical storage infrastructure. Storage is slow if the disk latencies
are high.
The advanced disk performance chart in the vSphere Client provides you with latency metrics. For
all LUN objects, look at the counters named Physical Device Read Latency and Physical Device
Write Latency. For these measurements, if the average is above 10 milliseconds, or if the peaks are
above 20 milliseconds, for any vmhba, then storage might be slow or overloaded on that vmhba.
Otherwise, storage does not appear to be slow or overloaded.
As an alternative to using the vSphere Client, you can monitor latency metrics such as DAVG/cmd with
resxtop/esxtop. To display this information, type d in the resxtop/esxtop window.
These latency values are provided only as points at which further investigation is justified. Expected
disk latencies depend on the nature of the storage workload (for example, the read/write mix,
randomness, and I/O size) and the capabilities of the storage subsystems.
7
Slide 7-65
Storage Optimization
Three main workload factors that affect storage response time:
I/O arrival rate
I/O size
I/O locality
Use the storage devices monitoring tools to collect data to
characterize the workload.
To determine whether high disk latency values represent an actual problem, you must understand the
storage workload. Three main workload factors affect the response time of a storage subsystem:
• I/O arrival rate – A given configuration of a storage device has a maximum rate at which it can
handle specific mixes of I/O requests. When bursts of requests exceed this rate, they might need
to be queued in buffers along the I/O path. This queuing can add to the overall response time.
• I/O size – The transmission rate of storage interconnects, and the speed at which an I/O can be
read from or written to disk, are fixed quantities. As a result, large I/O operations naturally take
longer to complete. A response time that is slow for small transfers might be expected for larger
operations.
• I/O locality – Successive I/O requests to data that is stored sequentially on disk can be
completed more quickly than to data that is spread randomly. In addition, read requests to
sequential data are more likely to be completed out of high-speed caches in the disks or arrays.
Storage devices typically provide monitoring tools for collecting data that can help characterize
storage workloads. In addition, storage vendors typically provide data on recommended
configurations and expected performance of their devices based on these workload characteristics.
The storage response times should be compared to expectations for that workload. If investigation
determines that storage response times are unexpectedly high, corrective action should be taken.
Applications, when consolidated, can share expensive physical resources such as storage. Sharing
storage resources offers many advantages: ease of management, power savings and higher resource
use. However, situations might occur when high I/O activity on the shared storage could impact the
performance of certain latency-sensitive applications:
• A higher than expected number of I/O-intensive applications in virtual machines sharing a
storage device become active at the same time. As a result, these virtual machines issue a very
high number of I/O requests concurrently. The increased load on the storage device increases the
response time of I/O requests. The performance of virtual
machines running critical workloads might be impacted because these workloads now must
compete for shared storage resources.
• Operations such as Storage vMotion migrations or a backup operation running in a virtual
machine monopolize the I/O bandwidth of a shared storage device. These operations cause other
virtual machines sharing the device to suffer from resource starvation.
Storage I/O Control detects I/O congestion based on a preset congestion threshold. When the
condition is detected, Storage I/O Control uses the resource controls set on the virtual machines to
control each virtual machine's access to the shared datastore.
In addition, VMware vSphere® Storage DRS™ provides automatic or manual balancing of datastore
cluster I/O capacity and performance. Storage DRS can be configured to perform automatic Storage
vMotion migrations of virtual machines when the space-use or I/O response-time thresholds of a
datastore are exceeded.
Slide 7-68
User complaint: Powering on a virtual machine takes longer than
usual.
Sometimes powering on a virtual machine takes 5 seconds.
Other times powering on a virtual machine takes 5 minutes!
What do you check?
Because powering on a virtual machine requires disk activity on the
host, check the disk metrics for the host.
In this example, users are complaining that, at times, powering on their virtual machines is very
slow. Sometimes it takes 5 seconds to power on a virtual machine, and that is OK. However, at other
times, a virtual machine takes 5 minutes to power on, which is unacceptable.
Because powering on a virtual machine requires disk activity on the virtual machine’s host, you
might want to check the host’s disk metrics, such as disk latency.
Maximum disk latencies range from 100ms to 1100ms.
This latency is very high.
If you use the vSphere Client to monitor performance, the overview chart for your host’s disk gives
you a quick look at the highest disk latency experienced by your host.
Storage vendors provide latency statistics for their hardware that can be checked against the latency
statistics in either the vSphere Client or resxtop/esxtop. When the latency numbers are high, the
hardware might be overworked by too many servers. Some examples of latency numbers include the
following:
• When reading data from the array cache, latencies of 2–5 milliseconds usually reflect a healthy
storage system.
• When data is randomly being read across the disk, latencies of 5–12 milliseconds reflect a
healthy storage architecture.
• Latencies of 15 milliseconds or greater might reflect an overutilized storage array or a storage
array that is not operating properly.
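The latency guidelines above can be folded into a simple triage helper. This sketch is illustrative only — the function name is hypothetical and the cutoffs are the rule-of-thumb numbers from this section, not hard limits:

```python
def classify_storage_latency(latency_ms, cached_read=False):
    """Rough triage of an average storage latency reading, using the
    guideline numbers from this section (illustrative, not a spec)."""
    if cached_read:
        # Reads served from the array cache: 2-5 ms is healthy.
        return "healthy" if latency_ms <= 5 else "investigate"
    if latency_ms <= 12:
        # Random reads from disk: 5-12 ms reflects a healthy architecture.
        return "healthy"
    if latency_ms < 15:
        return "borderline"
    # 15 ms or greater might reflect an overutilized or faulty array.
    return "overutilized or misbehaving"

print(classify_storage_latency(8))    # healthy
print(classify_storage_latency(100))  # overutilized or misbehaving
```

A maximum latency of 100–1,100 ms, as in this example, falls far beyond the last threshold.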
In the example above, maximum disk latencies of 100–1,100 milliseconds are being experienced.
This is very high.
Slide 7-70
To further investigate the problem, you can view latency information using resxtop/esxtop. As a
general rule, if the value of GAVG/cmd is greater than 20 milliseconds, you have high latency.
This example shows relatively large values for DAVG/cmd and GAVG/cmd. Therefore, this host is
experiencing high latency, specifically on vmhba1. The value for KAVG/cmd is zero, which means
that the commands are not queuing in the VMkernel.
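The relationship among these counters can be checked programmatically: in esxtop terms, GAVG/cmd is approximately DAVG/cmd plus KAVG/cmd. A minimal sketch, assuming a hypothetical per-adapter sample layout (the 20 ms rule of thumb comes from this section):

```python
# Hypothetical per-adapter latency sample, in milliseconds, shaped like
# the DAVG/KAVG/GAVG columns that resxtop/esxtop display.
sample = {"adapter": "vmhba1", "DAVG": 24.5, "KAVG": 0.0, "GAVG": 24.5}

def diagnose(s, gavg_threshold_ms=20.0):
    """Flag the latency conditions described in this section."""
    findings = []
    if s["GAVG"] > gavg_threshold_ms:
        findings.append("high guest-visible latency (GAVG)")
    if s["KAVG"] > 2.0:  # nonzero KAVG suggests queuing in the VMkernel
        findings.append("commands queuing in the VMkernel (KAVG)")
    if s["DAVG"] > gavg_threshold_ms:
        findings.append("device/array latency dominates (DAVG)")
    return findings or ["latency within the rule-of-thumb threshold"]

print(diagnose(sample))
```

For the vmhba1 example in the text, the zero KAVG with a high DAVG points at the device rather than at VMkernel queuing.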
This leads to high latencies.
Solving the problem might involve not only looking at performance statistics but also looking to
see if there are any errors encountered on the host. In this example, the host’s events list shows
that a disk has connectivity issues. A storage device that is not working correctly can lead to high
latency values.
Slide 7-72
Scenario:
A group of virtual machines runs on a host.
Each virtual machine's application reads from and writes to the same
NAS device.
The NAS device is also a virtual machine.
Problem:
Users suddenly cannot log in to any of the virtual machines.
The login process is very slow.
Initial speculation:
Virtual machines are saturating a resource.
In this example, a group of virtual machines runs on a host. An application runs in each of these
virtual machines, and each application reads from and writes to a NAS device shared by all the
applications. The NAS device is implemented as a virtual machine.
The problem is that, after some time, the users are unable to log in to any of the virtual machines
because the login process is very slow to complete. This behavior might indicate that a resource
used by the virtual machine is being saturated.
The next few slides take you through a possible sequence of steps to take to identify the problem.
Because the initial speculation is that the virtual machines are saturating a resource, you might want
to check the host’s CPU usage. In the example above, a stacked chart of CPU usage is displayed for
the virtual machines on the host. During the first interval of 4:00 a.m. to 4:00 p.m., CPU usage is
predictable and the host is not saturated. However, during the second interval, starting at 10:00 p.m.,
CPU usage has increased dramatically and indicates that the host’s CPU resources are saturated.
But what about the other resources, such as disk usage?
Slide 7-74
Predictable, balanced disk usage.
Uneven, reduced disk usage.
On checking the host’s disk usage, a different pattern exists. The stacked chart of virtual machine
disk usage shows that between 4:00 a.m. and 4:00 p.m., disk usage is balanced and predictable.
However, during the interval starting at 10:00 p.m., disk usage is reduced and somewhat uneven.
Because overall disk throughput is reduced, the next task is to break down the disk throughput into
the read rate and write rate.
Steady read and write traffic.
Increased write traffic. Zero read traffic.
Disk throughput can be monitored in terms of the disk read rate and disk write rate. The line chart
above shows the disk read rate and the disk write rate for the HBA being used on the host. During
the interval of 4:00 a.m. and 4:00 p.m., read and write activity is steady. However, during the
interval starting at 10:00 p.m., the write traffic increased and the read traffic dropped to zero.
Figuring out why the read traffic dropped to zero can help pinpoint the problem.
Slide 7-76
Problem diagnosis:
CPU usage increased per virtual machine.
Write traffic increased per virtual machine.
Write traffic to the NAS virtual machine significantly increased.
One possible problem:
An application bug in the virtual machine caused the error condition:
The error condition caused excessive writes to the NAS virtual machine.
Each virtual machine is so busy writing that it never reads.
To summarize, something caused excessive writes to occur from each virtual machine. At
10:00 p.m., an increase in CPU usage occurred for each virtual machine. Also at the same time, a
big increase in write traffic to the NAS virtual machine happened.
A possible problem, which has nothing to do with the storage hardware or virtual machine
configuration, could be that the application running in the virtual machine has a bug. An error
condition in the application caused excessive writes by each virtual machine running the application
to the same NAS device. Each virtual machine is so busy writing data that it never reads.
At this point, you might want to monitor the application itself to see if the application is indeed
the problem.
Due to the complexity and variety of storage infrastructures, providing specific solutions for slow or
overloaded storage is impossible. Beyond rebalancing the load, typically little can be done from
within the vSphere environment to solve problems related to slow or overloaded storage. Use the
tools provided by your storage vendor to monitor the workload placed on your storage device, and
follow your storage vendor's configuration recommendations to configure the storage device to meet
your demands.
To help guide the investigation of performance issues in the storage infrastructure, follow these
general considerations:
• Check that your hardware is operating properly and is using an optimal configuration.
• Reduce the storage needs of your hosts and virtual machines.
• Balance the I/O workload across all available LUNs.
• Understand the workload being placed on your storage devices.
Slide 7-78
To resolve the problems of slow or overloaded storage, solutions can
include the following:
Ensure that hardware is working properly.
Configure the HBAs and RAID controllers for optimal use.
Upgrade your hardware, if possible.
Excessive disk throughput can lead to overloaded storage. Throughput in a vSphere environment
depends on many factors related to both hardware and software. Among the important factors are
Fibre Channel link speed, the number of outstanding I/O requests, the number of disk spindles,
RAID type, SCSI reservations, and caching or prefetching algorithms.
High latency is an indicator of slow storage. Latency depends on many factors, including the
following:
• Queue depth or capacity at various levels
• I/O request size
• Disk properties such as rotational, seek, and access delays
• SCSI reservations
• Caching or prefetching algorithms
Make sure that your storage hardware is working properly and is configured according to your
storage vendor’s recommendations. Periodically check the host’s event logs to make sure that
nothing unusual is occurring.
Consider the trade-off between memory capacity and storage demand. Some applications can take
advantage of additional memory capacity to cache frequently used data, thus reducing storage loads.
For example, databases can use system memory to cache data and avoid disk access. Check the
applications running in your virtual machines to see if they might benefit from increased caches.
Provide more memory to those virtual machines if resources permit, which might reduce the burden
on the storage subsystem and also have the added benefit of improving application performance.
Refer to the application-specific documentation to determine whether an application can take
advantage of additional memory. Some applications require changes to configuration parameters in
order to make use of additional memory.
Also, eliminate all possible swapping to reduce the burden on the storage subsystem. First, verify
that the virtual machines have the memory they need by checking the swap statistics in the guest.
Provide memory if resources permit, then try to eliminate host-level swapping.
Slide 7-80
Spread I/O loads over the
available paths to the storage.
Configure VMware vSphere®
Storage DRS to balance
capacity and IOPS.
For disk-intensive workloads:
Use enough HBAs to handle
the load.
If necessary, assign separate storage processors to separate systems.
To optimize storage array performance, spread I/O loads across multiple HBAs and storage
processors. Spreading the I/O load provides higher throughput and less latency for each application.
Make sure that each server has a sufficient number of HBAs to allow maximum throughput for all
the applications hosted on the server for the peak period.
vSphere Storage DRS helps balance I/O loads by migrating a virtual machine’s files across multiple
datastores when latency reaches a threshold.
If you have heavy disk I/O loads, you might need to assign separate storage processors to separate
systems to handle the amount of traffic bound for storage.
Understand the load being placed on the storage device. To do this, many storage arrays have tools
that allow workload statistics (such as IOPS, I/O sizes, and queuing data) to be captured. Refer to
the vendor’s documentation for details. High-level data provided by the vSphere Client and
resxtop can be useful. The vscsiStats tool can also provide detailed information on the I/O
workload generated by a virtual machine’s virtual SCSI device.
Strive for complementary workloads. The idea is for the virtual machines not to overtax any of the
four resources (CPU, memory, networking, and storage). For example, the workload of a Web server
virtual machine is mainly memory-intensive and occasionally disk-intensive. The workload of a
Citrix server virtual machine is mainly CPU- and memory-intensive. Therefore, place the Web
server and Citrix server virtual machines on the same datastore. Or place a firewall virtual machine
(which is mainly CPU-, memory-, and network-intensive) on the same datastore as a database server
(which is CPU-, memory-, and disk-intensive).
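The pairing idea above can be expressed as a simple overlap check. The workload profiles below follow the examples in this paragraph, but the names and dictionary layout are hypothetical:

```python
# Hypothetical dominant-resource profiles per workload type, following
# the examples in this section.
profiles = {
    "web":      {"memory", "disk"},
    "citrix":   {"cpu", "memory"},
    "firewall": {"cpu", "memory", "network"},
    "database": {"cpu", "memory", "disk"},
}

def complementary_on_datastore(a, b):
    # For datastore placement, the resource that matters is disk:
    # avoid placing two disk-intensive workloads on the same datastore.
    return not ("disk" in profiles[a] and "disk" in profiles[b])

print(complementary_on_datastore("web", "citrix"))        # True
print(complementary_on_datastore("firewall", "database"))  # True
print(complementary_on_datastore("web", "database"))       # False
```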
Also consider peak usage time to determine virtual machine placement. For example, place
Microsoft Exchange Server virtual machines on the same datastore if the peak usage of each virtual
machine occurs at a different time of day. In the example above, peak usage time for the users in
Asia will most likely differ from peak usage time for the users in Europe.
As you capture long-term performance data, consider using cold migrations or Storage vMotion to
balance and optimize your datastore workloads.
Slide 7-82
Configure each LUN with the right RAID level and storage
characteristics for the applications in virtual machines that will use it.
Use VMFS file systems for your virtual machines.
Avoid oversubscribing paths (SAN) and links (iSCSI and NFS).
Isolate iSCSI and NFS traffic.
Applications that write a lot of data to storage should not share Ethernet
links to a storage device.
Postpone major storage maintenance until off-peak hours.
Eliminate all possible swapping to reduce the burden on the storage
subsystem.
In SAN configurations, spread I/O loads over the available paths to the
storage devices.
Strive for complementary workloads.
Best practices are recommendations for the configuration, operation, and management of the
architecture. Best practices are developed in the industry over time, and as technology evolves, so
do best practices. Always balance best practices with an organization's objectives.
Slide 7-84
Factors that affect storage performance include storage protocols,
storage configuration, queuing, and VMFS configuration.
Disk throughput and latency are two key metrics to monitor storage
performance.
Overloaded storage and slow storage are common storage
performance problems.
Questions?
Slide 8-1
Module 8
CPU Optimization
To determine whether hosts and virtual machines are allocated insufficient
CPU resources, you should know how to monitor both host and virtual
machine CPU usage.
Likewise, to resolve a situation where the CPU of a host or virtual
machine becomes saturated, you should know how to troubleshoot
such a problem.
Lesson 1:
CPU Virtualization Concepts
The CPU scheduler has the following features:
Schedules virtual CPUs (vCPUs) on physical CPUs
Enforces the proportional-share algorithm for CPU usage
Supports SMP virtual machines
Uses relaxed co-scheduling for SMP virtual machines
Is NUMA/processor/cache topology-aware
The role of the CPU scheduler is to assign execution contexts to processors in a way that meets
system objectives such as responsiveness, throughput, and utilization. On conventional operating
systems, the execution context corresponds to a process or a thread. On VMware vSphere® ESXi™
hosts, the execution context corresponds to a world.
A virtual machine is a collection of worlds, with some worlds being virtual CPUs (vCPUs) and
other threads doing additional work. For example, a virtual machine consists of a world that controls
the mouse, keyboard, and screen (MKS). The virtual machine also has a world for its virtual
machine monitor (VMM).
There are nonvirtual machine worlds as well. These nonvirtual machine worlds are VMkernel
worlds and are used to perform various system tasks. Examples of these nonvirtual machine worlds
include the idle, driver, and vmotion worlds.
Schedules vCPUs on physical CPUs:
Checks physical CPU utilization every 2 to 40 milliseconds and migrates vCPUs as necessary
Enforces the proportional-share algorithm for CPU usage:
When CPU resources are overcommitted, hosts time-slice physical CPUs across all virtual machines.
Each vCPU is prioritized by resource allocation settings (shares, reservations, limits).
One of the main tasks of the CPU scheduler is to choose which world is to be scheduled to a
processor. If the target processor is already occupied, the scheduler must decide whether to preempt
the currently running world on behalf of the chosen one.
The CPU scheduler checks physical CPU utilization every 2 to 40 milliseconds, depending on the
CPU type, and migrates vCPUs as necessary. A world migrates from a busy processor to an idle
processor. A world migration can be initiated either by a physical CPU that becomes idle or by a
world that becomes ready to be scheduled.
An ESXi host implements the proportional-share-based algorithm. When CPU resources are
overcommitted, the ESXi host time-slices the physical CPUs across all virtual machines so that each
virtual machine runs as if it had its specified number of virtual processors. It associates each world
with a share of CPU resource.
This is called entitlement. Entitlement is calculated from user-provided resource specifications like
shares, reservation, and limits. When making scheduling decisions, the ratio of the consumed CPU
resource to the entitlement is used as the priority of the world. If there is a world that has consumed
less than its entitlement, the world is considered high priority and will likely be chosen to run next.
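The priority rule described above can be sketched as follows. The world records and field names are hypothetical, but the decision — the world with the lowest consumed-to-entitlement ratio runs next — follows the proportional-share description in this section:

```python
# Hypothetical runnable worlds: CPU consumed and entitlement in ms.
worlds = [
    {"name": "vm-A", "consumed_ms": 400, "entitlement_ms": 500},
    {"name": "vm-B", "consumed_ms": 900, "entitlement_ms": 500},
    {"name": "vm-C", "consumed_ms": 100, "entitlement_ms": 500},
]

def next_world(runnable):
    # The world that has consumed the least relative to its entitlement
    # is the highest priority and will likely be chosen to run next.
    return min(runnable, key=lambda w: w["consumed_ms"] / w["entitlement_ms"])

print(next_world(worlds)["name"])  # vm-C
```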
A symmetric multiprocessing (SMP) virtual machine presents the guest operating system and
applications with the illusion that they are running on a dedicated physical multiprocessor. An
ESXi host implements this illusion by supporting co-scheduling of the vCPUs within an SMP
virtual machine.
Co-scheduling refers to a technique used for scheduling related processes to run on different
processors at the same time. An ESXi host uses a form of co-scheduling that is optimized for
running SMP virtual machines efficiently.
A vCPU is considered to be making progress if it consumes CPU at the guest level or halts. Time
spent in the hypervisor is excluded from the progress, which means that hypervisor execution might
not always be co-scheduled. This is acceptable because not all operations in the hypervisor benefit
from being co-scheduled. When co-scheduling is beneficial, the hypervisor makes explicit
co-scheduling requests to achieve good performance.
The progress of each vCPU in an SMP virtual machine is tracked individually. The skew is
measured as the difference in progress between the slowest vCPU and each of the other vCPUs. A
vCPU is considered to be skewed if its cumulative skew value exceeds a configurable threshold,
typically a few milliseconds.
With relaxed co-scheduling, only those vCPUs that are skewed must be co-started. This ensures that
when any vCPU is scheduled, all other vCPUs that are “behind” will also be scheduled, reducing
skew. This approach is called relaxed co-scheduling because only a subset of a virtual machine’s
vCPUs must be scheduled simultaneously after skew has been detected.
The vCPUs that advanced too much are individually stopped. After the lagging vCPUs catch up, the
stopped vCPUs can start individually. Co-scheduling all vCPUs is still attempted to maximize the
performance benefit of co-scheduling.
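The skew bookkeeping described above can be sketched roughly. The threshold value and data layout are hypothetical ("a few milliseconds" per this section); the helper identifies vCPUs that have advanced too far ahead of the slowest vCPU and so must be individually stopped:

```python
SKEW_THRESHOLD_MS = 3.0  # configurable; "a few milliseconds" in the text

# Hypothetical cumulative progress (ms) of each vCPU in one SMP VM.
progress_ms = {"vcpu0": 120.0, "vcpu1": 118.5, "vcpu2": 112.0}

def skewed_vcpus(progress):
    # Skew is measured against the slowest vCPU; a vCPU whose lead over
    # the slowest exceeds the threshold is stopped until the lagging
    # vCPUs catch up.
    slowest = min(progress.values())
    return [v for v, p in progress.items() if p - slowest > SKEW_THRESHOLD_MS]

print(skewed_vcpus(progress_ms))  # ['vcpu0', 'vcpu1']
```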
There is no co-scheduling overhead for an idle vCPU, because skew does not grow when a vCPU
halts: for co-scheduling purposes, the idle vCPU is treated as if it were running. This optimization
ensures that idle guest vCPUs do not waste physical processor resources, which can instead be
allocated to other virtual machines.
For example, an ESXi host with two physical cores might be running one vCPU each from two
different virtual machines, if their sibling vCPUs are idling, without incurring any co-scheduling
overhead.
The CPU scheduler spreads load across all sockets to maximize the aggregate amount of cache available.
Cores within a single socket typically use a shared last-level cache.
Use of a shared last-level cache can improve vCPU performance when running memory-intensive workloads.
The CPU scheduler can interpret processor topology, including the relationship between sockets,
cores, and logical processors. The scheduler uses topology information to optimize the placement of
vCPUs onto different sockets to maximize overall cache utilization and to improve cache affinity by
minimizing vCPU migrations.
A dual-core processor can provide almost double the performance of a single-core processor by
allowing two vCPUs to execute at the same time. Cores within the same processor are typically
configured with a shared last-level cache used by all cores, potentially reducing the need to access
slower, main memory. Last-level cache is a memory cache that has a dedicated channel to a CPU
socket (bypassing the main memory bus), enabling it to run at the full speed of the CPU.
By default, in undercommitted systems, the CPU scheduler spreads the load across all sockets. This
improves performance by maximizing the aggregate amount of cache available to the running
vCPUs. As a result, the vCPUs of a single SMP virtual machine are spread across multiple sockets.
In some cases, such as when an SMP virtual machine exhibits significant data sharing between its
vCPUs, this default behavior might be suboptimal. For such workloads, it can be beneficial to
schedule all of the vCPUs on the same socket, with a shared last-level cache, even when the ESXi
host is undercommitted. In such scenarios, you can override the default behavior of spreading
vCPUs across packages by including the following configuration option in the virtual machine’s
VMX configuration file: sched.cpu.vsmpConsolidate="TRUE".
In a nonuniform memory access (NUMA) server, the delay incurred when accessing memory varies
for different memory locations. Each processor chip on a NUMA server has local memory directly
connected by one or more local memory controllers. For processes running on that chip, this
memory can be accessed more rapidly than memory connected to other, remote processor chips.
When a high percentage of a virtual machine’s memory is located on a remote processor, this means
that the virtual machine has poor NUMA locality. When this happens, the virtual machine’s
performance might be less than if its memory were all local. All AMD Opteron processors and some
recent Intel Xeon processors use NUMA architecture.
When running on a NUMA server, the CPU scheduler assigns each virtual machine to a home node.
In selecting a home node for a virtual machine, the scheduler attempts to keep both the virtual
machine and its memory located on the same node, thus maintaining good NUMA locality.
However, there are some instances where a virtual machine will have poor NUMA locality despite
the efforts of the scheduler:
• Virtual machine memory size is greater than memory per NUMA node – Each physical
processor in a NUMA server is usually configured with an equal amount of local memory. For
example, a four-socket NUMA server with a total of 64GB of memory typically has 16GB
locally on each processor. If a virtual machine has a memory size greater than the amount of
memory local to each processor, the CPU scheduler does not attempt to use NUMA
optimizations for that virtual machine. This means that the virtual machine's memory is not
migrated to be local to the processors on which its vCPUs are running, leading to poor NUMA
locality.
• The ESXi host is experiencing moderate to high CPU loads – When multiple virtual machines
are running on an ESXi host, some of the virtual machines end up sharing the same home node.
When a virtual machine is ready to run, the scheduler attempts to schedule the virtual machine
on its home node. However, if the load on the home node is high, and other home nodes are less
heavily loaded, the scheduler might move the virtual machine to a different home node.
Assuming that most of the virtual machine’s memory was located on the original home node,
this causes the virtual machine that is moved to have poor NUMA locality.
However, in return for the poor locality, the virtual machines on the home node experience a
temporary increase in available CPU resources. The scheduler eventually migrates the virtual
machine’s memory to the new home node, thus restoring NUMA locality. The poor NUMA locality
experienced by the virtual machine that was moved is a temporary situation and is seldom the source
of performance problems. However, frequent rebalancing of virtual machines between local and
remote nodes might be a symptom of other problems, such as host CPU saturation.
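The memory-size condition above reduces to simple arithmetic. The numbers below match the 64GB four-socket example, and the function name is illustrative:

```python
def numa_optimizable(vm_memory_gb, host_memory_gb, numa_nodes):
    # Per this section, local memory is usually split evenly per node;
    # a VM larger than one node's local memory is not NUMA-optimized.
    per_node_gb = host_memory_gb / numa_nodes
    return vm_memory_gb <= per_node_gb

# Four-socket server with 64GB total: 16GB local to each node.
print(numa_optimizable(12, 64, 4))  # True
print(numa_optimizable(24, 64, 4))  # False
```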
A Wide-VM has more vCPUs than the available cores on a NUMA node.
For example, a 4-vCPU SMP virtual machine is considered wide on a two-socket
(dual core) system.
Only the cores count (hyperthreading threads do not).
An 8-vCPU SMP virtual machine is considered wide on a two-socket (quad core) system
with hyperthreading enabled because the processor has only four cores per NUMA node.
Wide virtual machines have more vCPUs than the number of cores in each physical NUMA node.
These virtual machines are assigned two or more NUMA nodes and are preferentially allocated
memory local to those NUMA nodes.
Because vCPUs in these wide virtual machines might sometimes need to access memory outside
their own NUMA node, the virtual machine might experience higher average memory access
latencies than virtual machines that fit entirely within a NUMA node.
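The wide-VM test described above counts only physical cores per NUMA node, never hyperthreading threads. A minimal sketch with an illustrative function name:

```python
def is_wide_vm(vcpus, cores_per_numa_node):
    # Only physical cores count; hyperthreading threads do not.
    return vcpus > cores_per_numa_node

print(is_wide_vm(4, 2))  # True: 4 vCPUs, two-socket dual-core system
print(is_wide_vm(8, 4))  # True: 8 vCPUs, four cores per NUMA node
print(is_wide_vm(4, 4))  # False: fits within one NUMA node
```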
Memory latency can be mitigated by appropriately configuring Virtual NUMA on the virtual
machines. Virtual NUMA allows the guest operating system to take on part of the memory-locality
management task.
Assuming a uniform memory access pattern, about 50 percent of memory accesses are local,
because there are two NUMA nodes without wide-VM NUMA support.
In this case, wide-VM NUMA support won't make much performance difference.
On a four-socket system, only about 25 percent of memory accesses are local.
Wide-VM NUMA support likely improves this to 50 percent local accesses due to memory
interleaving allocations.
The performance benefit is even greater on bigger systems.
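The percentages quoted above follow from uniform-access arithmetic: if memory is spread evenly across the nodes a virtual machine spans, the local fraction is one over the node count. A sketch under that assumption:

```python
def local_access_fraction(nodes_spanned):
    # Uniform access pattern, memory interleaved evenly across the
    # spanned nodes: 1/nodes of accesses hit the local node.
    return 1.0 / nodes_spanned

print(local_access_fraction(2))  # 0.5  -> ~50% local across two nodes
print(local_access_fraction(4))  # 0.25 -> ~25% local on a four-socket box
# Wide-VM NUMA support confines a wide VM to two of the four nodes,
# improving locality back to ~50%.
```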
A slight performance advantage can occur in some environments that have virtual machines
configured with no more vCPUs than the number of cores in each physical NUMA node.
Some memory bandwidth bottlenecked workloads can benefit from the increased aggregate memory
bandwidth available when a virtual machine that would fit within one NUMA node is nevertheless
split across multiple NUMA nodes. This split can be accomplished by using the
numa.vcpu.maxPerMachineNode option.
On hyperthreaded systems, virtual machines with a number of vCPUs greater than the number of
cores in a NUMA node, but lower than the number of logical processors in each physical NUMA
node, might benefit from using logical processors with local memory instead of full cores with
remote memory. This behavior can be configured for a specific virtual machine with the
numa.vcpu.preferHT flag.
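For reference, the three per-VM options mentioned in this lesson are set as key–value pairs in the virtual machine's VMX configuration file. The values below are illustrative placeholders, shown together only for compactness; in practice you would set only the option that matches your workload, since consolidating and splitting are opposite goals:

```
sched.cpu.vsmpConsolidate = "TRUE"
numa.vcpu.maxPerMachineNode = "2"
numa.vcpu.preferHT = "TRUE"
```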
You might think that idle virtual machines do not cost anything in terms of performance. However,
timer interrupts still need to be delivered to these virtual machines. Lower timer interrupt rates can
help a guest operating system’s performance.
Using CPU affinity can have a positive effect for the virtual machine whose vCPUs are pinned to
specific physical CPUs. However, for the system as a whole, CPU affinity constrains the scheduler
and can cause an improperly balanced load. VMware® strongly recommends against using CPU affinity.
Try to use as few vCPUs in your virtual machine as possible to reduce the amount of timer
interrupts necessary as well as reduce any co-scheduling overhead that might be incurred. Also, use
SMP kernels (and not uniprocessor kernels) in SMP virtual machines.
CPU capacity is a finite resource. Even on a server that allows additional processors to be
configured, there is always a maximum number of processors that can be installed. As a result,
performance problems might occur when there are insufficient CPU resources to satisfy demand.
If a vCPU tries to execute a CPU instruction while no cycles are available on the physical CPU, the request is queued.
A physical CPU with no available cycles could be due to high load on the physical CPU or to a higher-priority vCPU receiving preference.
The amount of time that the vCPU waits for the physical CPU to become available is called ready time.
This latency can affect the performance of the guest operating system and its applications within a virtual machine.
To achieve best performance in a consolidated environment, you must consider ready time. Ready
time is the time a virtual machine must wait in the queue in a ready-to-run state before it can be
scheduled on a CPU.
When multiple processes are trying to use the same physical CPU, that CPU might not be
immediately available, and a process must wait before the ESXi host can allocate a CPU to it. The
CPU scheduler manages access to the physical CPUs on the host system.
Lesson 2:
Monitoring CPU Activity
Key metrics:
Host CPU used
Virtual machine CPU used
Virtual machine CPU ready time
The VMware vSphere® Client displays information in units of time (milliseconds).
resxtop displays information in percentages (of sample time).
To accurately determine how well the CPU is performing, you need to look at specific information
available from the monitoring tools.
The values obtained from the VMware vSphere® Client™ and resxtop are gathered based on
different sampling intervals. By default, the vSphere Client uses a sampling interval of 20 seconds
and resxtop uses a sampling interval of 5 seconds. Be sure to normalize the values before
comparing output from both monitoring tools:
• Used (CPU – host). Amount of time that the host’s CPU (physical CPU) was used during the
sampling period.
• Usage (CPU – virtual machine). Amount of time that the virtual machine’s CPU (vCPU) was
actively using the physical CPU. For virtual SMP virtual machines, this can be displayed as an
aggregate of all vCPUs in the virtual machine or per vCPU.
• Ready (CPU – virtual machine). Amount of time the virtual machine’s CPU (vCPU) was ready
but could not get scheduled to run on the physical CPU. CPU ready time is dependent on the
number of virtual machines on the host and their CPU loads. For virtual SMP virtual machines,
ready can be displayed as an aggregate of all vCPUs in the virtual machine or per vCPU.
Use the vSphere Client CPU performance charts to monitor CPU usage for hosts, clusters, resource
pools, virtual machines, and VMware vSphere® vApps™.
There are two important caveats about the vSphere Client performance charts:
• Only two counters at a time can be displayed in the chart.
• The virtual machine object shows aggregated data for all the CPUs in a virtual SMP virtual
machine. The numbered objects (in the example above, 0 and 1) track the performance of the
individual vCPUs.
A short spike in CPU used or CPU ready indicates that you are making the best use of the host
resources. However, if both values are constantly high, the hosts are probably overcommitted.
Generally, if the CPU used value for a virtual machine is above 90 percent and the CPU Ready
value is above 20 percent, performance is negatively affected.
VMware documentation provides performance thresholds in time values and percentages. To
convert time values to percentages, divide the time value by the sample interval. For the vSphere
Client, the default sampling interval is 20 seconds (or 20,000 milliseconds). In the example above,
the CPU ready value for vCPU0 is 14,996 milliseconds or 75 percent (14,996/20,000).
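The conversion above generalizes to a one-line formula. A small helper, with an illustrative name (the default 20-second interval matches the vSphere Client's real-time charts):

```python
def ready_time_to_percent(ready_ms, sample_interval_s=20):
    """Convert a CPU ready value in milliseconds (as shown by the
    vSphere Client) to a percentage of the sampling interval."""
    return 100.0 * ready_ms / (sample_interval_s * 1000)

# The example in the text: 14,996 ms of ready time in a 20 s sample.
print(ready_time_to_percent(14996))  # 74.98 -> about 75 percent
```

For resxtop output, pass `sample_interval_s=5` (its default refresh) when normalizing values for comparison.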
Per-group statistics:
%USED – Utilization (includes %SYS)
%SYS – VMkernel system activity time
%RDY – Ready time
%WAIT – Wait and idling time
%CSTP – Pending co-schedule time
%MLMTD – Time not running because of a CPU limit
NWLD – Number of worlds associated with a given group
The resxtop command gives much of the same information as the vSphere Client but supplements the displays with more detailed information.
In resxtop, a group refers to a resource pool, a running virtual machine, or a nonvirtual machine
world. For worlds belonging to a virtual machine, statistics for the running virtual machine are
displayed. The CPU statistics displayed for a group are the following:
• PCPU USED(%) – CPU utilization per physical CPU (includes logical CPU).
• %USED – CPU utilization. This is the percentage of physical CPU core cycles used by a group
of worlds (resource pools, running virtual machines, or other worlds). This includes %SYS.
• %SYS – Percentage of time spent in the ESXi VMkernel on behalf of the world/resource pool
to process interrupts and to perform other system activities.
• %RDY – Percentage of time the group was ready to run but was not provided CPU resources on
which to execute.
• %WAIT – Percentage of time the group spent in the blocked or busy wait state. This includes
the percentage of time the group was idle.
Type V to filter the output to show just virtual machines.
resxtop often shows more information than you need for the specific performance problem that
you are troubleshooting. There are commands that can be issued in interactive mode to customize
the display. These commands are the most useful when monitoring CPU performance:
• Press spacebar – Immediately updates the current screen.
• Type s – Prompts you for the delay between updates, in seconds. The default value is 5
seconds. The minimum value is 2 seconds.
• Type V – Displays virtual machine instances only. Do not confuse this with lowercase v.
Lowercase v shifts the view to the storage virtual machine resource utilization screen. If you
accidentally use this option, type c to switch back to the CPU resource utilization panel.
• Type e – Toggles whether CPU statistics are displayed expanded or unexpanded. The expanded display includes CPU resource utilization statistics broken down by individual worlds belonging to a virtual machine. All percentages for the individual worlds are percentages of a single physical CPU.
Type e to show all the worlds associated with a single virtual machine.
The default resxtop screen shows aggregate information for all vCPUs in a virtual machine. Type
e and enter the group ID (GID) to expand the group and display all the worlds associated with that
group. In the example above, the virtual machine group with GID 24 is expanded to show all the
worlds associated with that group.
To view the performance of individual CPUs, you must expand the output of resxtop for an
individual virtual machine. After it has been expanded, the utilization and ready time for each vCPU
can be tracked to further isolate performance problems.
High usage: often an indicator of high utilization, a goal of many organizations.
Ready time: the best indicator of possible CPU performance problems. Ready time occurs when more CPU resources are being requested by virtual machines than can be scheduled on the given physical CPUs.
High usage is not necessarily an indication of poor CPU performance, though it is often mistakenly
understood as such. Administrators in the physical space view high utilization as a warning sign,
due to the limitations of physical architectures. However, one of the goals of virtualization is to
efficiently utilize the available resources. When high CPU utilization is detected along with queuing,
this might indicate a performance problem.
As an analogy, consider a freeway. If every lane in the freeway is packed with cars but traffic is
moving at a reasonable speed, we have high utilization but good performance. The freeway is doing
what it was designed to do: moving a large number of vehicles at its maximum capacity.
Now imagine the same freeway at rush hour. More cars than the maximum capacity of the freeway
are trying to share the limited space. Traffic speed reduces to the point where cars line up (queue) at
the on-ramps while they wait for shared space to open up so that they can merge onto the freeway.
This is high utilization with queuing, and a performance problem exists.
Ready time reflects the idea of queuing within the CPU scheduler. Ready time is the amount of time
that a vCPU was ready to run but was asked to wait due to an overcommitted physical CPU.
In this lab, you will compare the performance of a MySQL database running on a single-vCPU
virtual machine, a dual-vCPU virtual machine, and a four-vCPU virtual machine. You will run four
tests, measuring CPU usage and ready time and recording the data after each test. There are four test
cases that you will perform. Two of them are described here:
• Case 1 – On a single-vCPU virtual machine, you will run the program starttest1.
starttest1 simulates a single-threaded application accessing a MySQL database. First, you
gather baseline data. Then you generate CPU contention by using a vApp named CPU-HOG.
Then you re-run starttest1 and monitor CPU performance.
• Case 2 – On a dual-vCPU virtual machine, you will run the program starttest1.
Cases 3 and 4 are described on the next page.
The remaining two test cases that you will perform are the following:
• Case 3 – On a dual-vCPU virtual machine, you will run the program starttest2.
starttest2 simulates a dual-threaded application accessing a MySQL database.
• Case 4 – On a quad-vCPU virtual machine, you will run the program starttest4.
starttest4 simulates a quad-threaded application accessing a MySQL database.
In both test cases, CPU-HOG continues to run to generate CPU contention.
In this lab, you will use performance charts and resxtop to monitor CPU performance.
1. Run a single-threaded program in a single-vCPU virtual machine.
2. Use esxtop to observe baseline data.
3. Use the vSphere Client to observe baseline data.
4. Import the CPU-HOG vApp.
5. Generate CPU contention.
6. Observe the performance of the test virtual machine.
7. Run a single-threaded program in a dual-vCPU virtual machine.
8. Observe the performance of the test virtual machine.
9. Run a dual-threaded program in a dual-vCPU virtual machine.
10. Observe the performance of the test virtual machine.
11. Run a multithreaded program in a quad-vCPU virtual machine.
12. Observe the performance of the test virtual machine.
13. Summarize your findings.
14. Clean up for the next lab.
Counters   Case 1         Case 2         Case 3          Case 4
           (1 thread,     (1 thread,     (2 threads,     (4 threads,
           1 vCPU)        2 vCPUs)       2 vCPUs)        4 vCPUs)
opm
%USED
%RDY
Compare the results of case 1 and case 2. How does the performance differ between the two cases?
Compare the results of case 1 and case 3. How does the performance differ between the two cases?
Compare the results of case 3 and case 4. How does the performance differ between the two cases?
Lesson 3: Troubleshooting CPU Performance Problems
3. Check for host CPU saturation.
4. Check for guest CPU saturation.
5. Check for active VM memory swapping.
6. Check for VM swap wait.
7. Check for active VM memory compression.
11. Check for using only one vCPU in an SMP VM.
12. Check for high CPU ready time on VMs running in under-utilized hosts.
13. Check for slow storage device.
14. Check for random increase in I/O latency on a shared storage device.
16. Check for low guest CPU utilization.
17. Check for past VM memory swapping.
18. Check for high memory demand in a resource pool.
19. Check for high memory demand in a host.
When troubleshooting CPU issues in a VMware vSphere™ environment, you can perform the
problem checks in the basic troubleshooting flow for an ESXi host. For CPU-related problems, this
means checking host CPU saturation and guest CPU saturation. The problem checks assume that
you are using the vSphere Client and that you are using real-time statistics while the problem is
occurring.
Performing these problem checks requires looking at the values of specific performance metrics
using the performance monitoring capabilities of the vSphere Client. The threshold values specified
in these checks are only guidelines based on past experience. The best values for these thresholds
might vary, depending on the sensitivity of a particular configuration or set of applications to
variations in system load.
If resource pools are part of the virtual datacenter, the pools need to be checked for CPU saturation
to ensure the pools are not a performance bottleneck. An incorrectly configured resource pool can
inject CPU contention for the virtual machines that reside in the pool.
If virtual machine performance is degraded among pool members, two metrics should be checked:
• High CPU usage on the resource pool by itself does not indicate a performance bottleneck.
Compare CPU usage with the CPU limit setting on the resource pool if it is set. If the CPU
usage metric and the CPU limit value on the resource pool are close, the resource pool is
saturated. If this is not the case, the issue might be specific to the virtual machine.
• For each virtual machine in the pool, high ready time should be checked. If the average or latest value for ready time is greater than 2,000 milliseconds for any vCPU object, a contention issue exists.
However, the issue might reside on the ESXi host or the guest operating system. The problem
does not involve the resource pool.
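The resource-pool check above can be sketched as follows (a minimal illustration; the function name and data shape are assumptions, not a VMware API):

```python
def vcpus_with_contention(ready_ms_per_vcpu, threshold_ms=2000):
    """Return the vCPU objects whose average or latest ready time exceeds
    the guideline threshold (2,000 ms of a 20,000 ms sample, i.e. 10%)."""
    return [vcpu for vcpu, ready in sorted(ready_ms_per_vcpu.items())
            if ready > threshold_ms]

# Hypothetical readings for one pool member's vCPU objects:
print(vcpus_with_contention({"vcpu0": 2500.0, "vcpu1": 300.0}))  # -> ['vcpu0']
```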
For more details on performance troubleshooting, see vSphere Performance Troubleshooting Guide
at http://www.vmware.com/resources/techresources/.
The root cause of host CPU saturation is simple: The virtual machines running on the host are
demanding more CPU resource than the host has available.
Remember that ready time is reported in two different values between resxtop and VMware®
vCenter Server™. In resxtop, it is reported in an easily consumed percentage format. A figure of
5 percent means that the virtual machine spent 5 percent of its last sample period waiting for
available CPU resources. In vCenter Server, ready time is reported as a time measurement. For
example, in vCenter Server’s real-time data, which produces sample values every 20,000
milliseconds, a figure of 1,000 milliseconds is reported for a 5 percent ready time. A figure of 2,000
milliseconds is reported for a 10 percent ready time.
Using resxtop values for ready time, here is how to interpret the value:
• If ready time <= 5 percent, this is normal. Very small single-digit numbers result in minimal
impact to users.
• If ready time is between 5 and 10 percent, ready time is starting to be worth watching.
• If ready time is > 10 percent, though some systems continue to meet expectations, double-digit ready time percentages often mean action is required to address performance issues.
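Those guideline ranges can be expressed as a small helper (a sketch; the thresholds are the guideline values from the text, not hard limits):

```python
def interpret_ready_percent(rdy):
    """Classify a resxtop %RDY value using the guideline thresholds."""
    if rdy <= 5:
        return "normal"              # minimal impact to users
    if rdy <= 10:
        return "worth watching"
    return "action likely required"  # double-digit ready time

print(interpret_ready_percent(4))   # -> normal
print(interpret_ready_percent(12))  # -> action likely required
```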
Populating an ESXi host with too many virtual machines running compute-intensive applications
can make it impossible to supply sufficient resources to any individual virtual machine.
Three main scenarios in which this problem can occur are when the host has one of the following:
• A small number of virtual machines with high CPU demand
• A large number of virtual machines with moderate CPU demands
• A mix of virtual machines with high and low CPU demand
• Reduce the number of virtual machines running on the host.
• Increase the available CPU resources by adding the host to a VMware vSphere® Distributed Resource Scheduler (DRS) cluster.
• Increase the efficiency with which virtual machines use CPU resources.
• Use resource controls to direct available resources to critical virtual machines.
Four possible approaches exist to resolving the issue of a CPU-saturated host. These approaches are
discussed in the next few slides. You can apply these solutions to the three scenarios mentioned on
the previous page:
• When the host has a small number of virtual machines with high CPU demand, you might solve
the problem by first increasing the efficiency of CPU usage in the virtual machines. Virtual
machines with high CPU demands are those most likely to benefit from simple performance
tuning. In addition, unless care is taken with virtual machine placement, moving virtual
machines with high CPU demands to a new host might simply move the problem to that host.
• When the host has a large number of virtual machines with moderate CPU demands, reducing
the number of virtual machines on the host by rebalancing the load is the simplest solution.
Performance tuning on virtual machines with low CPU demands yields smaller benefits and is
time-consuming if the virtual machines are running different applications.
• When the host has a mix of virtual machines with high and low CPU demand, the appropriate solution depends on available resources and skill sets. If additional ESXi hosts are available, rebalancing the load might be the best approach. If additional hosts are not available, or if expertise exists for tuning the high-demand virtual machines, then increasing efficiency might be the best approach.
The most straightforward solution to the problem of host CPU saturation is to reduce the demand for
CPU resources by migrating virtual machines to ESXi hosts with available CPU resources. In order
to do this successfully, use the performance charts available in the vSphere Client to determine the
CPU usage of each virtual machine on the host and the available CPU resources on the target hosts.
Ensure that the migrated virtual machines will not cause CPU saturation on the target hosts. When
manually rebalancing load in this manner, consider peak usage periods as well as average CPU
usage. This data is available in the historical performance charts when using vCenter Server to
manage ESXi hosts. The load can be rebalanced with no downtime for the affected virtual machines
if VMware vSphere® vMotion® is configured properly in your vSphere environment.
If additional ESXi hosts are not available, you can eliminate host CPU saturation by powering off
noncritical virtual machines. This action makes additional CPU resources available for critical
applications. Even idle virtual machines consume CPU resources. If powering off virtual machines
is not an option, you have to increase efficiency or explore the use of resource controls, such as
shares, limits, and reservations.
• DRS can use vMotion to automatically balance the CPU load across ESXi hosts.
• Manual calculations and administration during peak versus off-peak periods are eliminated.
Using VMware vSphere® Distributed Resource Scheduler™ (DRS) clusters to increase available
CPU resources is similar to the previous solution: additional CPU resources are made available to
the set of virtual machines that have been running on the affected host. However, in a DRS cluster,
load rebalancing is performed automatically. Thus, you do not have to manually compute the load
compatibility of specific virtual machines and hosts or to account for peak usage periods.
For more about DRS clusters, see vSphere Resource Management Guide at
http://www.vmware.com/support/pubs/vsphere-esxi-vcenter-server-pubs.html.
To increase the amount of work that can be performed on a saturated host, you must increase the
efficiency with which applications and virtual machines use those resources. CPU resources might
be wasted due to suboptimal tuning of applications and operating systems in virtual machines or
inefficient assignment of host resources to virtual machines.
The efficiency with which an application–operating system combination uses CPU resources
depends on many factors specific to the application, operating system, and hardware. As a result, a
discussion of tuning applications and operating systems for efficient use of CPU resources is beyond
the scope of this course.
Most application vendors provide performance-tuning guides that document best practices and
procedures for their application. These guides often include operating system–level tuning advice
and best practices. Operating system vendors also provide guides with performance-tuning
recommendations. In addition, many books and white papers discuss the general principles of
performance tuning as well as the application of those principles to specific applications and
operating systems. These procedures, best practices, and recommendations apply equally well to
virtualized and nonvirtualized environments. However, two application-level and operating system–
level tunings are particularly effective in a virtualized environment:
• Allocating more memory to a virtual machine – This approach might enable the application running in the virtual machine to operate more efficiently. Additional memory might enable the application to reduce I/O overhead or allocate more space for critical resources. Check the performance-tuning information for the specific application to see if additional memory will improve efficiency. Some applications must be explicitly configured to use additional memory.
• Reducing the number of vCPUs allocated to virtual machines – Virtual machines that are not using all of their CPU resources will make more resources available to other virtual machines. Even if the extra vCPUs are idle, they still incur a cost in CPU resources, both in the CPU scheduler and in the guest operating system overhead involved in managing the extra vCPU.
If you cannot rebalance CPU load or increase efficiency, or if all possible steps have been taken and
host CPU saturation still exists, then you might be able to reduce performance problems by
modifying resource controls. Many applications, such as batch jobs or long-running numerical
calculations, respond to a lack of CPU resources by taking longer to complete but still produce
correct and useful results. Other applications might experience failures or might be unable to meet
critical business requirements when denied sufficient CPU resources. The resource controls
available in vSphere can be used to ensure that resource-sensitive applications always get sufficient
CPU resources, even when host CPU saturation exists. For more about resource controls, see
vSphere Resource Management Guide at http://www.vmware.com/support/pubs/vsphere-esxi-
vcenter-server-pubs.html.
In this example, a lower-priority virtual machine was configured with a CPU limit, which is an
upper bound on the amount of CPU this virtual machine can consume. This artificial ceiling has
resulted in ready time for this specific virtual machine, most likely resulting in decreased
performance.
Example: ready time ~ used time. This might signal contention. However, the host might not be overcommitted, due to workload availability. There might be periods of activity and periods that are idle: the CPU is not overcommitted all of the time, and ready time < used time during those periods.
Although high ready time typically signifies CPU contention, the condition does not always warrant
corrective action. If the value for ready time is close in value to the amount of time used on the
CPU, and if the increased ready time occurs with occasional spikes in CPU activity but does not
persist for extended periods of time, this value might not indicate a performance problem. The brief
performance hit is often within the accepted performance variance and does not require any action
on the part of the administrator.
This example uses resxtop to detect CPU overcommitment. Looking at the PCPU line near the top
of the screen, you can determine that the host has two CPUs, both 100 percent utilized. Three active
virtual machines are shown: memhog-linux-sm, kernelcompile-u, and memhog-linux. These virtual
machines are active because they have relatively high values in the %USED column. The values in
the %USED column alone do not necessarily indicate that the CPUs are overcommitted. In the
%RDY column, you see that the three active virtual machines have relatively high values. High
%RDY values, plus high %USED values, are a sure indicator that your CPU resources are
overcommitted.
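As a rough sketch of that heuristic (the 50 percent and 10 percent thresholds here are illustrative assumptions, not VMware-defined values):

```python
def cpu_overcommitted(vms, used_threshold=50.0, rdy_threshold=10.0):
    """Heuristic from the text: active VMs (high %USED) that also show
    high %RDY indicate overcommitted CPU resources."""
    active = [vm for vm in vms if vm["used"] > used_threshold]
    return bool(active) and all(vm["rdy"] > rdy_threshold for vm in active)

# Hypothetical resxtop readings for the active virtual machines:
vms = [{"name": "memhog-linux-sm", "used": 80.0, "rdy": 15.0},
       {"name": "kernelcompile-u", "used": 70.0, "rdy": 12.0}]
print(cpu_overcommitted(vms))  # -> True
```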
If the CPU average usage is > 75 percent, or if there are peaks > 90 percent, then guest CPU saturation exists.

Cause: The guest operating system and applications use all the CPU resources.
Solution: Increase the CPU resources provided to the application, or increase the efficiency with which the virtual machine uses CPU resources.
Guest CPU saturation occurs when the application and operating system running in a virtual
machine use all of the CPU resources that the ESXi host is providing to that virtual machine. The
occurrence of guest CPU saturation does not necessarily indicate that a performance problem exists.
Compute-intensive applications commonly use all available CPU resources, and even less-intensive
applications might experience periods of high CPU demand without experiencing performance
problems. However, if a performance problem exists when guest CPU saturation is occurring, steps
should be taken to eliminate the condition.
You have two possible approaches to solving performance problems related to guest CPU saturation:
• Increase the CPU resources provided to the application.
• Increase the efficiency with which the virtual machine uses CPU resources.
Adding CPU resources is often the easiest choice, particularly in a virtualized environment.
However, this approach misses inefficient behavior in the guest application and operating system
that might be wasting CPU resources or, in the worst case, that might be an indication of some
erroneous condition in the guest. If a virtual machine continues to experience CPU saturation even
after adding CPU resources, the tuning and behavior of the application and operating system should
be investigated.
Cause: The guest operating system is configured with a uniprocessor kernel.
Solution: Upgrade to an SMP kernel or HAL.
When a virtual machine configured with more than one vCPU actively uses only one of those
vCPUs, resources that could be used to perform useful work are being wasted. Possible causes and
solutions for this problem include the following:
• The guest operating system is configured with a uniprocessor kernel (Linux) or hardware
abstraction layer (HAL) (Windows).
In order for a virtual machine to take advantage of multiple vCPUs, the guest operating system
running in the virtual machine must be able to recognize and use multiple processor cores. If the
virtual machine is doing all of its work on vCPU0, the guest operating system might be
configured with a kernel or a HAL that can recognize only a single processor core. Follow the
documentation provided by your operating system vendor to check for the type of kernel or
HAL that is being used by the guest operating system. If the guest operating system is using a
uniprocessor kernel or HAL, follow the operating system vendor’s documentation for upgrading
to an SMP kernel or HAL.
• The application is pinned to a single core in the guest operating system.
Many modern operating systems provide controls for restricting applications to run on only a
subset of available processor cores.
If this is the case, remove the operating system–level controls or reduce the number of vCPUs to match the number actually used by the virtual machine.
• The application is single-threaded.
Many applications, particularly older applications, are written with only a single thread of control. These applications cannot take advantage of more than one processor core. However, many guest operating systems spread the runtime of even a single-threaded application over multiple processors. Thus, running a single-threaded application on a virtual SMP virtual machine might lead to low guest CPU utilization, rather than to utilization on only vCPU0. Without knowledge of the application design, it might be difficult to determine whether an application is single-threaded other than by observing the behavior of the application in a virtual SMP virtual machine. If, even when driven at peak load, the application is unable to achieve CPU utilization of more than 100 percent of a single CPU, then the application is likely to be single-threaded. Poor application and guest operating system tuning, or high I/O response times, can also lead to artificial limits on the achievable utilization. The solution is to reduce the number of vCPUs allocated to this virtual machine to one.
Cause                                          Solution
High storage response times                    Troubleshoot slow/overloaded storage.
High response times from external systems      Solution is application- and environment-dependent.
Poor application or operating system tuning    Solution is application- and environment-dependent.
Application pinned to cores in guest OS        Remove OS-level controls or reduce number of vCPUs.
Too many configured vCPUs                      Reduce number of vCPUs.
Restrictive resource allocations               Modify VM resource settings.
To determine if the problem is low guest CPU utilization, use the vSphere Client to check the CPU
use of the virtual machines on the host.
To determine low guest CPU use
• Prioritize virtual machine CPU usage with the proportional-share algorithm.
• Use vMotion and DRS to redistribute virtual machines and reduce contention.
• Increase the efficiency of virtual machine usage by:
  • Leveraging application tuning guides
  • Tuning the guest operating system
  • Optimizing the virtual hardware
• Describe various CPU performance problems.
• Discuss the causes of CPU performance problems.
• Propose solutions to correct CPU performance problems.

• Most often, CPU issues are caused when there are insufficient CPU resources to satisfy demand.
• High CPU usage values along with high ready time often indicate CPU performance problems.
• For CPU-related problems, you should check host CPU saturation and guest CPU saturation.
Questions?
Module 9
Memory Optimization
common memory performance problems, such as those involving an excessive demand for memory.

Lesson 1: Memory Virtualization Concepts
Applications start with no memory. Then, during their execution, they use the interfaces provided by
the operating system to explicitly allocate or deallocate virtual memory.
In a nonvirtual environment, the operating system assumes it owns all host physical memory in the
system. The hardware does not provide interfaces for the operating system to explicitly allocate or
free host physical memory. The operating system establishes the definitions of allocated or free
physical memory. Different operating systems have different implementations to realize this
abstraction. So, whether or not a physical page is free depends on which list the page resides in.
• The guest operating system will not explicitly allocate memory.
• Memory is allocated on the virtual machine's first access to memory (read or write).
Because a virtual machine runs an operating system and one or more applications, the virtual
machine memory management properties combine both application and operating system memory
management properties. Like an application, when a virtual machine first starts, it has no
preallocated host physical memory. Like an operating system, the virtual machine cannot explicitly
allocate host physical memory through any standard interfaces.
The hypervisor intercepts the virtual machine’s memory accesses and allocates host physical
memory for the virtual machine on its first access to the memory. The hypervisor always writes
zeroes to the host physical memory before assigning it to a virtual machine.
Virtual machine memory deallocation acts just like an operating system, such that the guest
operating system frees a piece of guest physical memory by adding these memory page numbers to
the guest free list. But the data of the “freed” memory might not be modified at all. As a result,
when a particular piece of guest physical memory is freed, the mapped host physical memory does
not usually change its state and only the guest free list is changed.
It is difficult for the hypervisor to know when to free host physical memory when guest physical
memory is deallocated, or freed, because the guest operating system free list is not accessible to the
hypervisor. The hypervisor is completely unaware of which pages are free or allocated in the guest
operating system. As a result, the hypervisor cannot reclaim host physical memory when the guest
operating system frees guest physical memory.
Memory-reclamation techniques:
• Transparent page sharing
• Ballooning
• Host-level, or hypervisor, swapping
The hypervisor must rely on memory-reclamation techniques to reclaim the host physical memory
freed up by the guest operating system. The memory-reclamation techniques are transparent page
sharing, ballooning, and host-level (or hypervisor) swapping.
When multiple virtual machines are running, some of them might have identical sets of memory
content. This presents opportunities for sharing memory across virtual machines (as well as sharing
within a single virtual machine). With transparent page sharing, the hypervisor can reclaim the
redundant copies and keep only one copy, which is shared by multiple virtual machines, in host
physical memory.
Ballooning and host-level swapping are the other two memory-reclamation techniques. These
techniques are discussed after a review of guest operating system memory terminology.
NOTE
In hardware-assisted memory virtualization systems, page sharing might not occur until memory
overcommitment is high enough to require the large pages to be broken into small pages. For further
information, see VMware® KB articles 1021095 and 1021896.
The memory used by the guest operating system in the virtual machine can be described as follows:
First, there is the memory size. This is the total amount of memory that is presented to the guest. By
default, the memory size corresponds to the maximum memory configured when the virtual machine
was created.
The total amount of memory (memory size) can be split into two parts: free memory and allocated
memory. Free memory is memory that has not been assigned to the guest operating system or to
applications. Allocated memory is memory that has been assigned to the guest operating system or
to applications.
Allocated memory can be further subdivided into two types: active memory and idle memory.
Active memory is memory that has “recently” been used by applications. Idle memory is memory
that has not been “recently” used by applications. To maximize virtual machine performance, it is
vital to keep a virtual machine’s active memory in host physical memory.
These terms are used specifically to describe the guest operating system’s memory. The VMware
vSphere® ESXi™ host does not know or care about a guest’s free, active, or idle memory.
Due to the virtual machine’s isolation, the guest operating system is not aware that it is running
inside a virtual machine and is not aware of the states of other virtual machines on the same host.
When the hypervisor runs multiple virtual machines and the total amount of free host physical
memory becomes low, none of the virtual machines will free guest physical memory, because the
guest operating system cannot detect the host physical memory shortage.
The goal of ballooning is to make the guest operating system aware of the low memory status of the
host so that the guest operating system can free up some of its memory. The guest operating system
determines whether it needs to page out guest physical memory to satisfy the balloon driver’s
allocation requests. If the virtual machine has plenty of free or idle guest physical memory, inflating
the balloon will induce no guest-level paging and will not affect guest performance. However, if the
guest is already under memory pressure, the guest operating system decides which guest physical
pages are to be paged out to the virtual swap device in order to satisfy the balloon driver’s allocation
requests.
With memory compression, ESXi stores pages that would otherwise be swapped out to disk through host swapping in a compression cache located in main memory.
Memory compression outperforms host swapping because the next access to the compressed page
only causes a page decompression, which can be an order of magnitude faster than the disk access.
ESXi determines if a page can be compressed by checking the compression ratio for the page.
Memory compression occurs when the page’s compression ratio is greater than 50%. Otherwise, the
page is swapped out. Only pages that would otherwise be swapped out to disk are chosen as
candidates for memory compression. This specification means that ESXi does not actively compress
guest pages when host swapping is not necessary. In other words, memory compression does not
affect workload performance when host memory is not overcommitted.
The slide illustrates how memory compression reclaims host memory compared to host swapping.
Assume ESXi needs to reclaim 8KB physical memory (that is, two 4KB pages) from a virtual machine.
With host swapping, two swap candidate pages, pages A and B, are directly swapped to disk. With
memory compression, a swap candidate page is compressed and stored using 2KB of space in a per-
virtual machine compression cache. Hence, each compressed page yields 2KB memory space for ESXi
to reclaim. In order to reclaim 8KB physical memory, four swap candidate pages must be compressed.
The page compression is much faster than the normal page swap-out operation, which involves a
disk I/O.
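The reclaim accounting above can be sketched as follows. Page and compressed sizes follow the example (4KB pages, 2KB compressed); the helper name is hypothetical.

```python
# Sketch of the reclaim accounting described above (hypothetical helper).
PAGE_KB = 4
COMPRESSED_KB = 2  # a page is compressed only if its ratio exceeds 50%

def candidate_pages_needed(reclaim_kb: int, use_compression: bool) -> int:
    """Number of swap candidate pages needed to free reclaim_kb."""
    if use_compression:
        # A compressed page still occupies 2KB of cache, so it yields 2KB.
        yield_kb = PAGE_KB - COMPRESSED_KB
    else:
        # A swapped-out page frees the full 4KB.
        yield_kb = PAGE_KB
    return reclaim_kb // yield_kb

print(candidate_pages_needed(8, use_compression=False))  # 2 (pages A and B)
print(candidate_pages_needed(8, use_compression=True))   # 4
```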
If memory compression doesn’t keep the virtual machine’s memory usage low enough, the ESXi
host will next forcibly reclaim memory using host-level swapping to a host cache if one has been
configured.
Swap to host cache allows users to configure a special swap cache on Solid State Disk (SSD)
storage. In most cases, swapping to the swap cache on SSD is much faster than swapping to regular
swap files on hard disk storage. Swapping to the SSD swap cache significantly reduces latency.
Thus, although some of the pages that the ESXi host swaps out might be active, swap to host cache
has a far lower performance impact than regular host-level swapping.
Placing swap files on SSD storage could potentially reduce vMotion performance. This is because if
a virtual machine has memory pages in a local swap file, they must be swapped in to memory before
a vMotion operation on that virtual machine can proceed.
In cases where transparent page sharing and ballooning are not sufficient to reclaim memory, ESXi
employs host-level swapping. To support this, when starting a virtual machine, the hypervisor
creates a separate swap file for the virtual machine. Then, if necessary, the hypervisor can directly
swap out guest physical memory to the swap file, which frees host physical memory for other virtual
machines.
Host-level, or hypervisor, swapping is a guaranteed technique to reclaim a specific amount of
memory within a specific amount of time. However, host-level swapping might severely penalize
guest performance, because the hypervisor has no knowledge of which guest physical pages
should be swapped out, and the swapping might cause unintended interactions with the native
memory management policies in the guest operating system. For example, the guest operating
system will never page out its kernel pages, because those pages are critical to ensure guest kernel
performance. The hypervisor, however, cannot identify those guest kernel pages, so it might swap
out those physical pages.
NOTE
ESXi swapping is distinct from swapping performed by a guest OS due to memory pressure within a
virtual machine. Guest-OS level swapping may occur even when the ESXi host has ample resources.
ESXi maintains four host free memory states: high, soft, hard, and low. When to use ballooning or
host-level swapping to reclaim host physical memory is largely determined by the current host free
memory state. (ESXi enables page sharing by default and reclaims host physical memory with little
overhead.)
In the high state, the aggregate virtual machine guest memory usage is smaller than the host physical
memory size. Whether or not host physical memory is overcommitted, the hypervisor will not
reclaim memory through ballooning or host-level swapping. (This is true only when the virtual
machine memory limit is not set.)
In the low state, the hypervisor continues to reclaim memory through swapping and blocks the execution of all virtual machines that consume more memory than their target memory allocations.
• 0-4GB: 6%
• 4-12GB: 4%
• 12-28GB: 2%
• Remaining memory: 1%
MinFreePct determines the amount of memory the VMkernel should keep free. vSphere 4.1 allowed
the user to change the default MinFreePct value of 6% to a different value. vSphere 4.1 introduced a
dynamic threshold of the Soft, Hard, and Low state to set appropriate thresholds and prevent virtual
machine performance issues while protecting VMkernel.
Using a default MinFreePct value of 6% can be inefficient as 256GB and 512GB systems become more and more mainstream. A 6% threshold on a 512GB system leaves more than 30GB idling most of the time. However, not all customers use large systems; some prefer to scale out rather than scale up, and in that scenario a 6% MinFreePct might be suitable. To have the best of both worlds, ESXi 5.0 uses a sliding scale to determine its MinFreePct threshold.
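As a sketch, the sliding scale might be computed as follows. The brackets come from the table above; exact rounding in ESXi may differ.

```python
# Sketch of the sliding-scale MinFree computation (values in MB;
# brackets from the table above; exact ESXi rounding may differ).
BRACKETS_MB = [
    (4 * 1024, 0.06),   # first 0-4GB at 6%
    (12 * 1024, 0.04),  # next 4-12GB at 4%
    (28 * 1024, 0.02),  # next 12-28GB at 2%
]
REMAINING_PCT = 0.01    # remaining memory at 1%

def min_free_mb(host_mb: float) -> float:
    total, prev = 0.0, 0.0
    for upper, pct in BRACKETS_MB:
        portion = min(host_mb, upper) - prev
        if portion <= 0:
            break
        total += portion * pct
        prev = upper
    if host_mb > prev:
        total += (host_mb - prev) * REMAINING_PCT
    return total

# A 512GB host: sliding scale vs. a flat 6% threshold.
print(round(min_free_mb(512 * 1024)))  # ~5857MB kept free
print(round(0.06 * 512 * 1024))        # ~31457MB under a flat 6%
```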
When free memory reaches 16% of MinFreePct, the memory free state changes to low. In this example, once the system drops below 255.62MB, it experiences ballooning, memory compression, and swapping.
To review, the hypervisor uses a variety of memory-reclamation techniques to reclaim host physical
(machine) memory.
Transparent page sharing and ballooning are the preferred techniques because they have a smaller
performance penalty for the virtual machine than host-level swapping. However, both techniques
take time to work. Transparent page sharing might or might not find memory to share. Ballooning
relies on the guest operating system to take actions to reclaim memory.
Transparent page sharing is always performed because it is enabled by default. Ballooning and host-level swapping are performed when a host's physical memory is overcommitted. If a host's physical memory is not overcommitted, there is no reason to reclaim memory through ballooning or host-level swapping.
Virtualization of memory resources has some associated overhead. The memory space overhead has
the following components:
• The VMkernel uses a small amount of memory. The amount depends on the number and size
of the device drivers that are being used.
• Each virtual machine that is powered on has overhead. Overhead memory includes space
reserved for the virtual machine frame buffer and various virtualization data structures, such as
shadow page tables. Overhead memory depends on the number of virtual CPUs and the
configured memory for the guest operating system.
For a table of overhead memory on virtual machines, see vSphere Resource Management Guide at
http://www.vmware.com/support/pubs/vsphere-esxi-vcenter-server-pubs.html.
Lesson 2:
Monitoring Memory Activity
In order to quickly monitor virtual machine memory usage, the VMware vSphere® Client™
exposes two memory statistics in the Summary tab of a virtual machine: consumed host memory
and active guest memory.
Consumed host memory is the amount of host physical memory that is allocated to the virtual
machine. This value includes virtualization overhead.
Active guest memory is defined as the amount of guest physical memory that is currently being used
by the guest operating system and its applications.
These two statistics are quite useful for analyzing the memory status of the virtual machine and
providing hints to address potential performance issues.
Why is guest/host memory usage different than what I see inside the guest operating system?
Guest physical memory:
• The guest has better visibility when estimating active memory.
• The ESXi active memory estimation technique can take time to converge.
Host physical memory:
• Host memory usage does not correspond to any memory metric within the guest.
• Host memory usage is based on a virtual machine's relative priority on the physical host and on memory usage by the guest.
As you monitor your system, you might notice that the host and guest memory usage values
reported in the vSphere Client are different than the memory usage metrics reported by the guest
operating system tools.
For guest memory usage, there are a couple of reasons why the values reported in the vSphere Client
might be different than the active memory usage reported by the guest operating system. The first
reason is that the guest operating system generally has a better idea (than the hypervisor) of what
memory is “active.” The guest operating system knows what applications are running and how it has
allocated memory. The second reason is that the method employed by the hypervisor to estimate
active memory usage takes time to converge. Therefore, the guest operating system’s estimation
might be more accurate than the ESXi host's if the memory workload is fluctuating.
For host memory usage, the host memory usage metric has absolutely no meaning inside the guest
operating system. This is because the guest operating system does not know that it is running within
a virtual machine or that other virtual machines exist on the same physical host.
If you monitor memory usage, you might wonder why consumed host memory is greater than active
guest memory. The reason is that for physical hosts that are not overcommitted on memory,
consumed host memory represents the highest amount of memory usage by a virtual machine. It is
possible that in the past this virtual machine was actively using a very large amount of memory.
Because the host physical memory is not overcommitted, there is no reason for the hypervisor to
invoke ballooning or host-level swapping to reclaim memory. Therefore, you can find the virtual
machine in a situation where its active guest memory use is low but the amount of host physical
memory assigned to it is high. This is a perfectly normal situation, so there is nothing to be
concerned about.
If consumed host memory is less than or equal to active guest memory, the active guest memory
of the virtual machine might not completely reside in host physical memory.
This might occur if a guest’s active memory has been reclaimed by the balloon driver or if the
virtual machine has been swapped out by the hypervisor. In both cases, this is probably due to high
memory overcommitment.
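The reasoning in the two paragraphs above can be summarized as a rough heuristic. This is illustrative only, not a VMware diagnostic tool.

```python
# Rough heuristic version of the reasoning above (illustrative only;
# not a VMware diagnostic).
def interpret_memory(consumed_mb: float, active_mb: float,
                     host_overcommitted: bool) -> str:
    if consumed_mb > active_mb and not host_overcommitted:
        # Memory granted during an earlier busy period is simply retained:
        # with no memory pressure, the hypervisor has no reason to reclaim.
        return "normal: past allocation retained"
    if consumed_mb <= active_mb:
        # Some active guest memory is not backed by host physical memory:
        # likely ballooned or swapped, usually under high overcommitment.
        return "investigate: ballooning or host-level swapping likely"
    return "normal"

print(interpret_memory(consumed_mb=3800, active_mb=400,
                       host_overcommitted=False))
```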
There are a lot of memory statistics to look at in resxtop. To display the memory screen, type m.
Type V to list only the virtual machines. Here are a few metrics of interest:
• PMEM/MB – The total amount of physical memory on your host
• VMKMEM/MB – The amount of physical memory currently in use by the VMkernel
• PSHARE – Savings from transparent page sharing. This field shows how much memory you
have saved through transparent page sharing. In the example above, two virtual machines are
running, with a memory savings of 578MB.
Host-level swapping severely affects the performance of the virtual machine being
swapped, as well as other virtual machines. (The VMware vSphere® Client provides
metrics for monitoring swapping activity.)
Excessive memory demand can cause severe performance problems for one or more virtual
machines on an ESXi host. When ESXi is actively swapping the memory of a virtual machine in and
out from disk, the performance of that virtual machine will degrade. The overhead of swapping a
virtual machine's memory in and out from disk can also degrade the performance of other virtual
machines.
The metrics in the vSphere Client for monitoring swapping activity are the following:
• Memory Swap In Rate – The rate at which memory is being swapped in from disk
• Memory Swap Out Rate – The rate at which memory is being swapped out to disk
High values for either metric indicate a lack of memory and that performance is suffering as a result.
To monitor a host’s swapping activity in resxtop, view the memory screen. A useful metric is
SWAP/MB. This metric represents total swapping for all the virtual machines on the host. In the
example above, the value of curr is 198MB, which means that currently 198MB of swap space is
used. The value of target, which is 0 in the example above, indicates where ESXi expects the
swap usage to be.
To monitor host-level swapping activity per virtual machine, type V in the resxtop/esxtop
window to display just the virtual machines. Here are some useful metrics:
• SWR/s and SWW/s – Measured in megabytes per second, these counters represent the rate at which the
ESXi host is swapping memory in from disk (SWR/s) and swapping memory out to disk
(SWW/s).
• SWCUR – This is the amount of swap space currently used by the virtual machine.
• SWTGT – This is the amount of swap space that the host expects the virtual machine to use.
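A monitoring script might flag these per-VM counters as follows. The field names follow the resxtop columns above; the stats dictionaries are a hypothetical stand-in for parsed resxtop output.

```python
# Hypothetical sketch: flagging the per-VM swap counters listed above.
# The dictionaries stand in for parsed resxtop output.
def swap_alerts(vm_stats):
    alerts = []
    for vm in vm_stats:
        if vm["SWR/s"] > 0 or vm["SWW/s"] > 0:
            alerts.append(f"{vm['name']}: actively swapping")
        elif vm["SWTGT"] > vm["SWCUR"]:
            # The host expects this VM's swap usage to grow.
            alerts.append(f"{vm['name']}: swap usage expected to grow")
    return alerts

stats = [
    {"name": "VMTest01", "SWR/s": 0.0, "SWW/s": 1.5, "SWCUR": 198, "SWTGT": 256},
    {"name": "VMTest02", "SWR/s": 0.0, "SWW/s": 0.0, "SWCUR": 0, "SWTGT": 0},
]
print(swap_alerts(stats))  # ['VMTest01: actively swapping']
```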
In addition, the CPU screen (type c) contains a metric named %SWPWT. This is the best indicator
of a performance problem due to wait time experienced by the virtual machine. This metric
represents the percentage of time that the virtual machine is waiting for memory to be swapped in.
Ballooning is part of normal operations when memory is overcommitted. The fact that ballooning is
occurring is not necessarily an indication of a performance problem. The use of the balloon driver
enables the guest to give up physical memory pages that are not being used. However, if ballooning
causes the guest to give up memory that it actually needs, performance problems can occur due to
guest operating system paging.
In the vSphere Client, use the Memory Balloon metric to monitor a host’s ballooning activity. This
metric represents the total amount of memory claimed by the balloon drivers of the virtual machines
on the host. The memory claimed by the balloon drivers is for use by other virtual machines.
Again, this is not a performance problem per se, but it indicates that the host is starting to take
memory from virtual machines that need it less and give it to those with large amounts of active memory. If the host is
ballooning, check swap rate counters (Memory Swap In Rate and Memory Swap Out Rate), which
might indicate performance problems.
Balloon activity can also be monitored in resxtop. Here are some useful metrics:
• MEMCTL/MB – This line displays the memory balloon statistics for the entire host. All
numbers are in megabytes. “curr” is the total amount of physical memory reclaimed using the
balloon driver (also known as the vmmemctl module). “target” is the total amount of physical
memory ESXi wants to reclaim with the balloon driver. “max” is the maximum amount of
physical memory that the ESXi host can reclaim with the balloon driver.
• MCTL? – This value, which is either Y (for yes) or N (for no), indicates whether the balloon
driver is installed in the virtual machine.
• MCTLSZ – This value is reported for each virtual machine and represents the amount of
physical memory that the balloon driver is holding for use by other virtual machines. In the
example above, the balloon driver for VMTest01 is holding approximately 634MB of memory
for use by other virtual machines.
• MCTLTGT – This value is the amount of physical memory that the host wants to reclaim from
the virtual machine through ballooning. In the example above, this value is approximately
634MB.
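Comparing MCTLSZ with MCTLTGT shows whether ballooning is growing or shrinking for each virtual machine. The helper below is a hypothetical sketch using the counter semantics described above.

```python
# Sketch interpreting the per-VM balloon counters above
# (hypothetical helper; counter semantics as described in the text).
def balloon_state(mctl_installed: bool, mctlsz_mb: float,
                  mctltgt_mb: float) -> str:
    if not mctl_installed:
        return "balloon driver not installed (check VMware Tools)"
    if mctltgt_mb > mctlsz_mb:
        return "inflating: host wants to reclaim more memory"
    if mctltgt_mb < mctlsz_mb:
        return "deflating: memory being returned to the guest"
    return "stable"

# VMTest01 from the example: size and target both about 634MB.
print(balloon_state(True, 634.0, 634.0))  # stable
```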
In this lab, you will use performance charts and resxtop to monitor memory performance:
1. Generate database activity.
2. Display the memory screen in resxtop.
3. Display a memory performance chart for your test virtual machine.
4. Observe your test virtual machine's memory usage.
5. Import the RAM-HOG vApp.
6. Overcommit memory resources.
7. Observe memory statistics.
8. Clean up for the next lab.
Lesson 3:
Troubleshooting Memory Performance
Problems
The slide shows the basic troubleshooting flow discussed in an earlier module. This lesson focuses
on memory-related problems. The problems you will look at are active (and past) host-level
swapping, guest operating system paging, and high guest memory demand.
Excessive memory demand can cause severe performance problems for one or more virtual
machines on an ESXi host. When ESXi is actively swapping the memory of a virtual machine in and
out from disk, the performance of that virtual machine will degrade. The overhead of swapping a
virtual machine's memory in and out from disk can also degrade the performance of other virtual
machines.
Check for active host-level swapping. To do this, use the vSphere Client to display an advanced
performance chart for memory. Look at the Memory Swap In Rate and Memory Swap Out Rate
counters for the host.
If either of these measurements is greater than zero at any time during the displayed period, then the
ESXi host is actively swapping virtual machine memory. This is a performance problem. If either of
these measurements is not greater than zero, return to the basic troubleshooting flow and look at the
next problem in the list.
In addition, identify the virtual machines affected by host-level swapping. To do this, look at the
Memory Swap In Rate and Memory Swap Out Rate counters for all the virtual machines on the host.
A stacked graph using the vSphere Client works well for this information. If the values for any of
the virtual machines are greater than zero, then host-level swapping is directly affecting those virtual
machines. Otherwise, host-level swapping is not directly affecting the other virtual machines.
If you determine that host-level swapping is affecting the performance of your virtual machines, the
next step is to use resxtop/esxtop to measure the %SWPWT counter for the affected virtual
machines. %SWPWT tracks the percentage of time that the virtual machine is waiting for its pages
to be swapped in. This counter is the best indicator of a performance problem because the longer a
virtual machine waits for its pages, the slower will be the performance of the guest operating system
and applications running in it. Any value greater than 0 warrants attention. If the value is greater
than 5, the problem should be immediately addressed.
The %SWPWT counter is found on the CPU screen. (Type c in the resxtop window.)
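The guidance above (any value above 0 warrants attention; above 5, act immediately) can be encoded as a simple classifier. The function name is hypothetical; the thresholds come from the text.

```python
# Simple classifier encoding the %SWPWT guidance above
# (hypothetical helper; thresholds from the text).
def swpwt_severity(swpwt_pct: float) -> str:
    if swpwt_pct > 5:
        return "critical: address immediately"
    if swpwt_pct > 0:
        return "warning: VM waiting on swapped pages"
    return "ok"

print(swpwt_severity(0.0))   # ok
print(swpwt_severity(2.5))   # warning: VM waiting on swapped pages
print(swpwt_severity(12.0))  # critical: address immediately
```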
• Excessive memory overcommitment
• Memory overcommitment with memory reservations
• Balloon drivers in virtual machines not running or disabled
The basic cause of host-level swapping is memory overcommitment from using memory-intensive
virtual machines whose combined configured memory is greater than the amount of host physical
memory available. The following conditions can cause host-level swapping:
• Excessive memory overcommitment – In this condition, the level of memory overcommitment
is too high for the combination of memory capacity, virtual machine allocations, and virtual
machine workload. In addition, the amount of overhead memory must be taken into
consideration when using memory overcommitment.
• Memory overcommitment with memory reservations – In this condition, the total amount of
balloonable memory might be reduced by memory reservations. If a large portion of a host’s
memory is reserved for some virtual machines, the host might be forced to swap the memory of
other virtual machines. This might happen even though virtual machines are not actively using
all of their reserved memory.
• Balloon drivers in virtual machines not running or disabled – In this condition, if the balloon
driver is not running, or has been deliberately disabled in some virtual machines, the amount of
balloonable memory on the host will be decreased. As a result, the host might be forced to swap
even though there is idle memory that could have been reclaimed through ballooning.
There are four possible solutions to performance problems caused by active host-level swapping
(listed on the slide).
In most situations, reducing memory overcommitment levels is the proper approach for eliminating
swapping on an ESXi host. However, be sure to consider all of the factors discussed in this lesson to
ensure that the reduction is adequate to eliminate swapping. If other approaches are used, be sure to
monitor the host to ensure that swapping has been eliminated.
Each solution is discussed in the following pages.
To reduce the level of memory overcommitment, do one or more of the following:
• Add physical memory to the ESXi host. Adding physical memory to the host will reduce the
level of memory overcommitment and it might eliminate the memory pressure that caused
swapping to occur.
• Reduce the number of virtual machines running on the ESXi host. One way to do this is to use
VMware vSphere® vMotion® to migrate virtual machines to hosts with available memory
resources. In order to do this successfully, the vSphere Client performance charts should be
used to determine the memory usage of each virtual machine on the host and the available
memory resources on the target hosts. You must ensure that the migrated virtual machines will
not cause swapping to occur on the target hosts.
If additional ESXi hosts are not available, you can reduce the number of virtual machines by
powering off noncritical virtual machines. This will make additional memory resources
available for critical applications. Even idle virtual machines consume some memory resources.
• Increase available memory resources by adding the host to a VMware vSphere® Distributed
Resource Scheduler™ (DRS) cluster. This is similar to the previous solution. However, in a DRS
cluster, load rebalancing can be performed automatically. It is not necessary to manually compute
the compatibility of specific virtual machines and hosts, or to account for peak usage periods.
In order to maximize the ability of ESXi to recover idle memory from virtual machines, the balloon
driver should be enabled in all virtual machines. If a virtual machine has critical memory needs,
reservations and other resource controls should be used to ensure that those needs are satisfied. The
balloon driver is enabled when VMware® Tools™ is installed into the virtual machine.
If there is excessive memory overcommitment, however, ballooning will only delay the onset of
swapping. Therefore, it is not a sufficient solution. The level of memory overcommitment should be
reduced as well.
The balloon driver should never be deliberately disabled in a virtual machine. Disabling the balloon
driver might cause unintended performance problems such as host-level swapping. It also makes
tracking down memory-related problems more difficult. VMware vSphere® provides other
mechanisms, such as memory reservations, for controlling the amount of memory available to a
virtual machine.
Memory hot add lets you add memory resources for a virtual machine while the machine is
powered on.
Hot-add memory allows ranges of physical memory to be added to a running operating system
without requiring the system to be shut down. Hot-add allows an administrator to add memory to a
running system to address existing or future needs for memory without incurring downtime for the
system or applications.
2. Click the Options tab and under Advanced, select Memory/CPU Hotplug.
If the reservation cannot be reduced, memory overcommitment must be reduced.
If virtual machines configured with large memory reservations are causing the ESXi host to swap
memory from virtual machines without reservations, the need for those reservations should be
reevaluated.
If a virtual machine with reservations is not using its full reservation, even at peak periods, then the
reservation should be reduced.
If the reservations cannot be reduced, either because the memory is needed or for business reasons,
then the level of memory overcommitment must be reduced.
As a last resort, when it is not possible to reduce the level of memory overcommitment, configure
your performance-critical virtual machines with a sufficient memory reservation to prevent them
from swapping. However, doing this only moves the swapping problem to other virtual machines,
whose performance will be severely degraded. In addition, swapping of other virtual machines
might still affect the performance of virtual machines with reservations, due to added disk traffic
and memory management overhead.
Cause: If overall memory demand is high, the ballooning mechanism might reclaim active memory from the application or guest OS.
Solution: Use resource controls to direct resources to critical VMs. Reduce memory overcommitment on the host.
High demand for host memory will not necessarily cause performance problems. Host physical
memory can be used without an effect on virtual machine performance as long as memory is not
overcommitted and sufficient memory capacity is available for virtual machine overhead and for the
internal operations of the ESXi host. Even if memory is overcommitted, ESXi can use ballooning to
reclaim unused memory from nonbusy virtual machines to meet the needs of more demanding
virtual machines. Ballooning can maintain good performance in most cases.
However, if overall demand for host physical memory is high, the ballooning mechanism might
reclaim memory that is needed by an application or guest operating system. In addition, some
applications are highly sensitive to having any memory reclaimed by the balloon driver.
To determine whether ballooning is affecting performance of a virtual machine, you must
investigate the use of memory within the virtual machine. If a virtual machine with a balloon value
of greater than zero is experiencing performance problems, the next step is to examine paging
statistics from within the guest operating system. In Windows guests, these statistics are available
through Perfmon. In Linux guests, they are provided by such tools as vmstat and sar. The values
provided by these monitoring tools within the guest operating system are not strictly accurate, but
they provide enough information to determine whether the virtual machine is experiencing
memory pressure.
• The swap target is higher for the memory-hog virtual machines without the balloon driver.
• MCTL: N – balloon driver not active; VMware Tools probably not installed.
• The virtual machine with the balloon driver swaps less.
This example shows why ballooning is preferable to swapping. Multiple virtual machines are
running, and all of them are running a memory-intensive application. The MCTL? field
shows that the virtual machines named Workload## do not have the balloon driver installed, whereas
the virtual machine named VMTest02 does. If the balloon driver is
not installed, this typically means that VMware Tools has not been installed in the virtual
machine.
The Workload virtual machines have a higher swap target (SWTGT) than the other virtual machines
because these virtual machines do not have the balloon driver installed. The virtual machine that has
the balloon driver installed (VMTest02) can reclaim memory by using the balloon driver. If ESXi
cannot reclaim memory through ballooning, then it will have to resort to other methods, such as
host-level swapping.
Performance problems can occur when the application and the guest operating system running in a
virtual machine are using a high percentage of the memory allocated to them. High guest memory
can lead to guest operating system paging as well as other, application-specific performance
problems. The causes of high memory demand will vary depending on the application and guest
operating system being used. It is necessary to use application-specific and operating system–
specific monitoring and tuning documentation in order to determine why the guest is using most of
its allocated memory and whether this is causing performance problems.
If high demand for guest memory is causing performance problems, the following actions can be
taken to address the performance problems:
• Configure additional memory for the virtual machine. In a vSphere environment, it is much
easier to add memory to a guest than would be possible in a physical environment. Ensure that
the guest operating system and application used can take advantage of the added memory. For
example, some 32-bit operating systems and most 32-bit applications can access only 4GB of
memory, regardless of how much is configured. Also, check that the new allocation will not
cause excessive overcommitment on the host, because this will lead to host-level swapping.
• Tune the application to reduce its memory demand. The methods for doing this are application-
specific.
Host-level swapping slows the boot process but is not a problem after the guest is booted up.
A virtual machine's memory that is currently swapped out to disk will not cause a performance
problem if the memory is never accessed.
In some cases, virtual machine memory can remain swapped to disk even though the ESXi host is
not actively swapping. This will occur when high memory activity caused some virtual machine
memory to be swapped to disk and the virtual machine whose memory was swapped has not yet
attempted to access the swapped memory. As soon as the virtual machine does access this memory,
the host will swap it back in from disk.
Having virtual machine memory that is currently swapped out to disk will not cause performance
problems. However, it might indicate the cause of performance problems that were observed in the
past, and it might indicate problems that might recur at some point in the future.
There is one common situation that can lead to virtual machine memory being left swapped out even
though there was no observed performance problem. When a virtual machine’s guest operating
system first starts, there will be a period of time before the memory balloon driver begins running.
In that time, the virtual machine might access a large portion of its allocated memory. If many
virtual machines are powered on at the same time, the spike in memory demand, together with the
lack of running balloon drivers, might force the ESXi host to resort to host-level swapping. Some of
the memory that is swapped might never again be accessed by a virtual machine, causing it to
remain swapped to disk even though there is adequate free memory. This type of swapping might
slow the boot process but will not cause problems when the operating systems have finished
booting.
Allocate enough memory to hold the working set of applications you will
run in the virtual machine, thus minimizing swapping.
Never disable the balloon driver. Always keep it enabled.
Keep transparent page sharing enabled
Avoid overcommitting memory to the point that it results in heavy
memory reclamation.
3. Check for active host-level swapping.
4. Check for past host-level swapping.
5. Check for guest operating system paging.
6. Check for high guest memory demand.
7. Summarize your findings.
8. Clean up for the next lab.
The basic cause of memory swapping is memory overcommitment of
memory-intensive virtual machines.
Questions?
Module 10
Virtual Machine and Cluster Optimization
Lesson 1:
Virtual Machine Optimization
When you create a virtual machine, select the guest operating system type that you intend to install
in that virtual machine. The guest operating system type determines the default optimal devices for
the guest operating system, such as the SCSI controller and the network adapter to use.
Virtual machines support both 32-bit and 64-bit guest operating systems. For information about
guest operating system support, see VMware Compatibility Guide at http://www.vmware.com/
resources/compatibility.
As of VMware vSphere® ESXi™ 5.0, VMware Tools is a suitable choice for guest time
synchronization. For Linux guests, use NTP instead of VMware Tools.
Because virtual machines work by time-sharing host physical hardware, virtual machines cannot
exactly duplicate the timing activity of physical machines. Virtual machines use several techniques
to minimize and conceal differences in timing performance. However, the differences can still
sometimes cause timekeeping inaccuracies and other problems in software running in a virtual
machine.
Several things can be done to reduce the problem of timing inaccuracies:
• If you have a choice, use guest operating systems that require fewer timer interrupts. The timer
interrupt rates vary between different operating systems and versions. For example:
• Windows systems typically use a base timer interrupt rate of 64 Hz or 100 Hz.
• 2.6 Linux kernels have used a variety of timer interrupt rates (100 Hz, 250 Hz, and 1000
Hz).
• The most recent 2.6 Linux kernels introduce the NO_HZ kernel configuration option
(sometimes called “tickless timer”) that uses a variable timer interrupt rate.
VMware Tools should always be running.
Install the latest version of VMware Tools in the guest operating system. Installing VMware Tools
in Windows guests updates the BusLogic SCSI driver included with the guest operating system to
the driver supplied by VMware. The VMware driver has optimizations that guest-supplied Windows
drivers do not.
VMware Tools also includes the balloon driver used for memory reclamation in VMware vSphere®
ESXi™. Ballooning does not work if VMware Tools is not installed. VMware Tools allows you to
synchronize the virtual machine time with that of the ESXi host.
VMware Tools is running when the guest operating system has booted. However, a couple of
situations exist that can prevent VMware Tools from starting properly after it has been installed:
• The operating system kernel in a Linux virtual machine has changed, for example, you updated
the Linux kernel to a newer version. Changing the Linux kernel can lead to conditions that
prevent the drivers that comprise VMware Tools from loading properly. To reenable VMware
Tools, rerun the script vmware-config-tools.pl in the Linux virtual machine.
For virtual machines running on an ESXi 5.x host, make sure that
their virtual hardware is at least version 9.
Virtual hardware version 9 includes support for the following:
Up to 64 vCPUs
Up to 1TB of RAM
Virtual GPU
Support for virtual Non-Uniform Memory Access (vNUMA)
Support for 3D graphics
Guest operating system storage reclamation
Enhanced CPU virtualization
Virtual machine hardware compatibility assigns a compatibility level that directly correlates to the
vSphere version feature set for the virtual machine. Hardware compatibility enables administrators
to identify the hardware features and capabilities of a virtual machine without having to remember
the feature set of a version of virtual hardware.
A virtual machine’s hardware compatibility is defined when it is created. The hardware can be
upgraded to a higher level, but not down to a lower level. Changing a virtual machine’s
compatibility requires it to be powered off. Virtual machines with older compatibility levels will be
fully supported on newer ESXi host versions.
Compatibility can be defined on a host or a cluster as a default setting. Making the setting a default
version ensures all newly created virtual machines are set to the correct version.
Hardware compatibility can be configured using the vSphere Web Client only.
In SMP guests, the guest operating system can migrate processes from one vCPU to another. This
migration can incur a small CPU overhead. If the migration is very frequent, pinning guest threads
or processes to specific vCPUs might prove helpful.
If you have a choice, use guest operating systems that require fewer timer interrupts. For example:
• If you have a uniprocessor virtual machine, use a uniprocessor HAL.
• In some Linux versions, such as RHEL 5.1 and later, the “divider=10” kernel boot parameter
reduces the timer interrupt rate to one tenth its default rate. For timekeeping best practices for
Linux guests, see VMware knowledge base article 1006427 at http://kb.vmware.com/kb/
1006427.
• Kernels with tickless-timer support (NO_HZ kernels) do not schedule periodic timers to
maintain system time. As a result, these kernels reduce the overall average rate of virtual timer
interrupts. This reduction improves system performance and scalability on hosts running large
numbers of virtual machines.
vNUMA exposes NUMA topology to the guest operating system. vNUMA can provide significant
performance benefits, though the benefits depend heavily on the level of NUMA optimization in the
guest operating system and applications.
You can obtain the maximum performance benefits from vNUMA if your clusters are composed
entirely of hosts with matching NUMA architecture. When a vNUMA-enabled virtual machine is
powered on, its vNUMA topology is set based in part on the NUMA topology of the underlying
physical host. Once a virtual machine’s vNUMA topology is initialized, it does not change unless
the number of vCPUs in that virtual machine is changed. Therefore, if a vNUMA virtual machine is
moved to a host with a different NUMA topology, the virtual machine’s vNUMA topology might no
longer be optimal. This move could potentially result in reduced performance.
Size your virtual machines so they align with physical NUMA boundaries. For example, you have a
host system with six cores per NUMA node. In this case, size your virtual machines with a multiple
of six vCPUs (that is, 6 vCPUs, 12 vCPUs, 18 vCPUs, 24 vCPUs, and so on).
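The alignment rule above can be sketched in a few lines of Python. The function names are illustrative, not part of any VMware tool:

```python
def aligns_with_numa(vcpus: int, cores_per_node: int) -> bool:
    """True if the vCPU count is a whole multiple of the physical
    NUMA node size, so the virtual machine fits node boundaries."""
    return vcpus > 0 and vcpus % cores_per_node == 0

def suggested_vcpu_counts(cores_per_node: int, max_vcpus: int) -> list:
    """Enumerate vCPU counts that align with the NUMA node size."""
    return list(range(cores_per_node, max_vcpus + 1, cores_per_node))

# Host with six cores per NUMA node, as in the example above.
print(suggested_vcpu_counts(6, 24))  # [6, 12, 18, 24]
print(aligns_with_numa(8, 6))        # False: 8 vCPUs straddle a node
```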
When creating a virtual machine, you have the option to specify the number of virtual sockets and
the number of cores per virtual socket. If the number of cores per virtual socket on a vNUMA-
enabled virtual machine is set to any value other than the default of 1 and that value doesn't align
with the underlying physical host topology, performance might be slightly reduced.
To set the maximum number of vCPUs per virtual NUMA node:
1. Select the virtual machine that you wish to change, then click Edit virtual machine settings.
2. Under the Options tab, select General and then click Configuration Parameters.
3. Look for numa.vcpu.maxPerVirtualNode. If it is not present, click Add Row and enter the
new variable.
4. Click on the value to be changed and configure it as you wish.
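The same parameter can also be set directly in the virtual machine's configuration (.vmx) file. The value 6 below is an illustrative choice matching a host with six cores per NUMA node, not a recommended default:

```
numa.vcpu.maxPerVirtualNode = "6"
```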
For information on NUMA and vNUMA, see Performance Best Practices for VMware vSphere 5.0
at http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf.
In addition to the usual 4KB memory pages, ESXi also provides 2MB memory pages, commonly
referred to as “large pages”. By default, ESXi assigns these 2MB machine memory pages to guest
operating systems that request them, giving the guest operating system the full advantage of using
large pages.
If an operating system or application benefits from large pages on a native system, it can
potentially achieve a similar performance improvement in a virtual machine backed with 2MB
machine memory pages. Consult the documentation for your operating system and application to
determine how to configure them to use large memory pages.
Use of large pages can also change page sharing behavior. While ESXi ordinarily uses page sharing
regardless of memory demands, ESXi does not share large pages. Therefore with large pages, page
sharing might not occur until memory overcommitment is high enough to require the large pages to
be broken into small pages.
More information about large page support can be found in the performance study entitled Large
Page Performance (available at http://www.vmware.com/resources/techresources/1039).
The creation of a virtual machine’s swap file is automatic. By default, this file is created in the
virtual machine’s working directory, but a different location can be set.
You can define an alternate swap file location at the cluster level and at the virtual machine level.
To define an alternate swap file location at the cluster and host levels
1. Right-click the cluster in the inventory and select Edit Settings.
2. In the cluster settings dialog box, select Swapfile Location in the left pane.
3. Choose the option to store the swap file in the datastore specified by the host and click OK.
4. Select the ESXi host in the inventory and click the Configuration tab.
5. In the Software panel, click the Virtual Machine Swapfile Location link.
6. Click the Edit link and select the datastore to use as that swap file location. Click OK.
To define an alternate swap file location at the virtual machine level
1. Right-click the virtual machine in the inventory and select Edit Settings.
2. In the virtual machine properties dialog box, click the Options tab.
The default virtual storage adapter is either BusLogic Parallel, LSI Logic Parallel, or LSI Logic
SAS. The adapter used depends on the guest operating system and the virtual hardware version. If
you choose the BusLogic Parallel virtual SCSI adapter and you are using a Windows guest
operating system, you should use the custom BusLogic driver included in the VMware Tools
package.
ESXi also includes a paravirtualized SCSI storage adapter, PVSCSI (also called VMware
Paravirtual). The PVSCSI adapter offers a significant reduction in CPU utilization as well as
potentially increased throughput compared to the default virtual storage adapters.
The PVSCSI adapter is the best choice for environments with very I/O-intensive guest
applications. In order to use PVSCSI, your virtual machine must be using virtual hardware
version 7 or later. For information on PVSCSI, see VMware knowledge base article 1010398 at
http://kb.vmware.com/kb/1010398.
For the best performance, use the vmxnet3 network adapter for operating systems in which it is
supported. The virtual machine must use virtual hardware version 7 or later, and VMware Tools
must be installed in the guest operating system.
• If vmxnet3 is not supported in your guest operating system, use Enhanced vmxnet (which
requires VMware Tools). Both Enhanced vmxnet (vmxnet2) and vmxnet3 support jumbo
frames for better performance.
• If vmxnet2 is not supported in your guest operating system, use the flexible device type. This
device type automatically converts each vlance adapter to a vmxnet adapter if VMware Tools is
installed.
For the best networking performance, use network adapters that support hardware features such as
TCP checksum offload, TCP segmentation offload, and jumbo frames.
Ensure that network adapters have the proper speed and duplex settings. For 10/100 NICs, set the
speed and duplex manually, and make sure that the duplex is set to full duplex. For Gigabit
Ethernet or faster NICs, set the speed and duplex to autonegotiate.
A virtual machine is highly configurable. Different configuration options affect the virtual
machine’s ability to power on and boot correctly.
A virtual machine does not power on unless certain requirements are met: the host must be able to
meet the virtual machine’s CPU and memory reservations, and enough storage space must exist to
create the virtual machine’s swap file.
A virtual machine powers on but does not boot if the guest operating system does not include a
driver for the type of SCSI HBA configured in the virtual hardware.
A virtual machine does not power on unless the host has sufficient unreserved CPU and memory
resources to meet the virtual machine’s reservations. The vSphere Client reports an insufficient
resources error.
In order to power on the virtual machine, you must either make more CPU and memory resources
available or reduce the virtual machine’s reservation settings.
To make more resources available, perform one or more of the following actions:
• Reduce the reservations of one or more existing virtual machines.
• Migrate one or more existing virtual machines to another host.
• Power off one or more existing virtual machines.
• Add additional CPU or memory resources.
Virtual machine swap files are created at power-on and deleted at power-off. A virtual machine does
not power on unless enough storage space exists to create the swap file. Swap file size is determined
by subtracting the virtual machine’s memory reservation from its configured memory.
To make more storage space available, perform one or more of the following actions:
• Increase the reservations of one or more virtual machines.
• Decrease the configured memory for one or more virtual machines.
• Use VMware vSphere® Storage vMotion® to migrate files to another datastore.
• Increase the size of the datastore.
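The swap file sizing rule described above can be sketched as follows; the function name is illustrative:

```python
def swap_file_size_gb(configured_gb: float, reservation_gb: float) -> float:
    """Swap file size equals configured memory minus memory reservation."""
    return configured_gb - reservation_gb

# A 2GB virtual machine with no reservation needs a 2GB swap file;
# fully reserving its memory eliminates the swap file entirely.
print(swap_file_size_gb(2, 0))  # 2
print(swap_file_size_gb(2, 2))  # 0
```

This is why increasing a reservation frees datastore space: a larger reservation shrinks the swap file.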
Once a virtual machine has been powered on, the virtual machine must still boot its operating
system. A virtual machine does not boot if the guest operating system does not include a driver for
the configured virtual SCSI HBA.
The virtual SCSI HBA is automatically selected during virtual machine creation, based on the
selected guest operating system. The default SCSI HBA selection can be overridden at creation time
by using the Create Virtual Machine wizard.
The SCSI HBA selection can also be changed by using the vSphere Client Edit Settings menu item
(available by right-clicking the virtual machine in the inventory panel).
Select the right guest operating system type during virtual machine
creation.
Disable unused devices, for example, USB, CD-ROM, and floppy devices.
Do not deploy single-threaded applications in SMP virtual machines.
Configure proper guest time synchronization.
Install VMware Tools in the guest operating system and keep it up to date.
Use the correct virtual hardware.
Size the guest operating system queue depth appropriately.
Be aware that large I/O requests split up by the guest might affect I/O
performance.
Lesson 2:
vSphere Cluster Optimization
Do not set your reservations too high because high-reservation values
might limit DRS balancing.
Ensure that shares and limits do not limit the virtual machines’ storage resources.
ESXi provides several mechanisms to configure and adjust the allocation of CPU and memory
resources for virtual machines running within it. Resource management configurations can have a
significant impact on virtual machine performance.
If you expect frequent changes to the total available resources, use shares, not reservations, to
allocate resources fairly across virtual machines. If you use shares and you subsequently upgrade the
hardware, each virtual machine stays at the same relative priority (keeps the same number of
shares). The relative priority remains the same even though each share represents a larger amount of
memory or CPU.
Keep in mind that shares take effect only when there is resource contention. For example, if you
give your virtual machine a high number of CPU shares and expect to see an immediate difference
in performance, you might not see a difference because there might not be contention for CPU
resources.
A situation might occur in which the host’s memory is overcommitted and the virtual machine’s
allocated host memory is too small to achieve reasonable performance. In this situation, you can
increase the virtual machine’s shares to escalate its relative priority so that the hypervisor
allocates more host memory to that virtual machine.
Use resource pools to hierarchically partition available CPU and memory resources. With separate
resource pools, you have more options when you need to choose between high availability and load
balancing. Resource pools can also be easier to manage and troubleshoot.
Use resource pools to isolate performance. Isolation can prevent one virtual machine’s performance
from affecting the performance of other virtual machines. For example, you might want to limit a
department’s or project’s virtual machines to a certain amount of resources. Defined limits provide
more predictable service levels and less resource contention. Setting defined limits also protects the
resource pool from a virtual machine going into a runaway condition. For example, due to an
application problem, the virtual machine takes up 100 percent of its allocated processor time. Setting
a firm limit on a resource pool can reduce the negative effect that the virtual machine might have on
the resource pool. You can fully isolate a resource pool by giving it the type Fixed and using
reservations and limits to control resources.
Group virtual machines for a multitier service into a resource pool. Using the same resource pool
allows resources to be assigned for the service as a whole.
VMware recommends that resource pools and virtual machines not be made siblings in a hierarchy.
Instead, each hierarchy level should contain only resource pools or only virtual machines. By
default, resource pools are assigned share values that might not compare appropriately with those
assigned to virtual machines, potentially resulting in unexpected performance.
Exceeding the maximum number of hosts, virtual machines, or resource pools for each DRS cluster
is not supported. Even if the cluster seems to be operating properly, exceeding the maximums could
adversely affect VMware® vCenter Server™ or DRS performance. For cluster maximums, see
Configuration Maximums for VMware vSphere 5.0 at http://www.vmware.com/pdf/vsphere5/r50/
vsphere-50-configuration-maximums.pdf.
Differences can exist in the memory cache architecture of hosts in your DRS cluster. For example,
some of the hosts in your cluster might have NUMA architecture and others might not. This
difference can result in differences in performance numbers across your hosts. DRS does not
consider memory cache architecture because accounting for these differences results in minimal (if
any) benefits at the DRS granularity of scheduling.
Virtual machine memory overhead is not factored in as part of the virtual machine’s memory
usage. You can very easily overlook this overhead when sizing your resource pools and when
estimating how many virtual machines to place in them.
To view a virtual machine’s memory overhead:
1. Select the virtual machine in the inventory and click the Summary tab.
DRS can make migration recommendations only for virtual machines that can be migrated with
vSphere vMotion.
Ensure that the hosts in the DRS cluster have compatible CPUs. Within the same hardware platform
(Intel or AMD), differences in CPU family might exist, which means different CPU feature sets. As
a result, virtual machines might not be able to migrate across hosts. However, Enhanced vMotion
Compatibility (EVC) automatically configures server CPUs with Intel FlexMigration or AMD-V
Extended Migration technologies to be compatible with older servers. EVC prevents migrations
with vMotion from failing due to incompatible CPUs.
Virtual machines running on hardware version 9 cannot run on prior versions of ESXi. Therefore,
such virtual machines can be moved using vMotion only to other ESXi 5.1 hosts.
A minimum total reservation of 30% of a processor core is reserved for each migration with
vMotion.
Placement of a virtual machine’s swap file can affect the performance of vMotion. If the swap file
is stored in a location that the destination host cannot access, the swap file must be copied to the
destination, which slows the migration.
In most cases, keep the migration threshold at its default, which is moderate. Set the migration
threshold to more aggressive levels when the following conditions are satisfied:
• The hosts in the cluster are relatively homogeneous.
• The virtual machines in the cluster are running workloads that are fairly constant.
• Few to no constraints (affinity and anti-affinity rules) are defined for the virtual machines.
Use affinity and anti-affinity rules only when necessary. The more rules that are used, the less
flexibility DRS has when determining on what hosts to place virtual machines. In most cases,
leaving the affinity settings unchanged provides the best results. In rare cases, however, specifying
affinity rules can help improve performance.
Finally, make sure that DRS is in fully-automated mode cluster-wide. For more control over your
critical virtual machines, override the cluster-wide setting by setting manual or partially-automated
mode on selected virtual machines.
Calculate the number of failover hosts.
Specify Failover Hosts: designate the hosts needed to hold your virtual machines after a failure.
vCenter Server uses admission control to ensure that sufficient resources are available in a cluster to
provide failover protection. Admission control is also used to ensure that virtual machine resource
reservations are respected.
Choose a VMware vSphere® High Availability admission control policy based on your availability
needs and the characteristics of your cluster. When choosing an admission control policy, you
should consider a number of factors:
• Avoiding resource fragmentation – Resource fragmentation occurs when enough resources in
aggregate exist for a virtual machine to be failed over. However, those resources are located on
multiple hosts and are unusable because a virtual machine can run on only one ESXi host at a time.
The Host Failures the Cluster Tolerates policy avoids resource fragmentation by defining a slot
as the maximum virtual machine reservation. The Percentage of Cluster Resources policy does
not address the problem of resource fragmentation. With the Specify Failover Hosts policy,
resources are not fragmented because hosts are reserved for failover.
Example virtual machine reservations: VM1 CPU 2GHz, RAM 1GB; VM2 CPU 2GHz, RAM 1GB;
VM3 CPU 1GHz, RAM 2GB; VM4 CPU 1GHz, RAM 1GB; VM5 CPU 1GHz, RAM 1GB.
* plus memory overhead
A slot is a logical representation of memory and CPU resources. By default, a slot is sized to satisfy
the requirements for any powered-on virtual machine in the cluster. Slot size is calculated by
comparing both the CPU and memory requirements of the virtual machines and selecting the largest.
In the example above, the largest CPU requirement, 2GHz, is shared by the first and second virtual
machines. The largest memory requirement, 2GB, is held by the third virtual machine. Based on this
information, the slot size is 2GHz CPU and 2GB memory.
Of the three ESXi hosts in the example, ESXi host 1 can support four slots. ESXi hosts 2 and 3 can
support three slots each.
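The slot-size arithmetic above can be sketched in Python. The virtual machine reservations come from the example; the host capacities below are assumed values chosen only so that host 1 holds four slots and hosts 2 and 3 hold three each, as the example states:

```python
def slot_size(vms):
    """Slot size is the largest CPU reservation and the largest memory
    reservation across all powered-on virtual machines (memory
    overhead is ignored here for simplicity, as in the example)."""
    return max(cpu for cpu, _ in vms), max(mem for _, mem in vms)

def slots_on_host(host_cpu, host_mem, slot):
    """A host holds as many slots as both its CPU and memory allow."""
    slot_cpu, slot_mem = slot
    return min(host_cpu // slot_cpu, host_mem // slot_mem)

# Reservations of the five example virtual machines (MHz, MB).
vms = [(2000, 1024), (2000, 1024), (1000, 2048), (1000, 1024), (1000, 1024)]
slot = slot_size(vms)  # (2000, 2048): a 2GHz CPU, 2GB memory slot

# Assumed host capacities (MHz, MB), not given in the example.
hosts = [(9000, 9216), (7000, 7168), (7000, 7168)]
slots_per_host = [slots_on_host(c, m, slot) for c, m in hosts]  # [4, 3, 3]
```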
A virtual machine requires a certain amount of available overhead memory to power on. The
amount of overhead required varies and depends on the amount of configured memory. This
overhead is added to the memory slot size value. For simplicity, the calculations here do not
include the memory overhead, but be aware that such overhead exists.
vSphere HA admission control does not consider the number of vCPUs that a virtual machine has.
Admission control looks only at the virtual machine’s reservation values. Therefore, a 4GHz CPU
reservation means the same thing for both a 4-way SMP virtual machine and a single-vCPU virtual
machine.
After determining the maximum number of slots that each host can
support, the current failover capacity can be computed.
In this example, current failover capacity is one.
Example virtual machine reservations: VM1 CPU 2GHz, RAM 1GB; VM2 CPU 2GHz, RAM 1GB;
VM3 CPU 1GHz, RAM 2GB; VM4 CPU 1GHz, RAM 1GB; VM5 CPU 1GHz, RAM 1GB.
Continuing with this example, the largest host in the cluster is host 1. If host 1 fails, then six slots
remain in the cluster, which is sufficient for all five of the powered-on virtual machines. The cluster
has one available slot left.
If both hosts 1 and 2 fail, then only three slots remain, which is insufficient for the number of
powered-on virtual machines to fail over. Therefore, the current failover capacity is one.
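The failover-capacity reasoning above can be sketched as a small Python function. The slot counts [4, 3, 3] match the example hosts, and the function name is illustrative:

```python
def failover_capacity(slots_per_host, powered_on_vms):
    """Number of hosts (largest first) that can fail while the
    remaining hosts still provide one slot per powered-on VM."""
    remaining = sorted(slots_per_host, reverse=True)
    failures = 0
    while remaining and sum(remaining[1:]) >= powered_on_vms:
        remaining.pop(0)   # assume the largest remaining host fails
        failures += 1
    return failures

# Hosts holding 4, 3, and 3 slots; five powered-on virtual machines.
print(failover_capacity([4, 3, 3], 5))  # 1
```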
Total host resources: 24GHz, 21GB
With the Percentage of Cluster Resources Reserved admission control policy, vSphere HA ensures
that a specified percentage of aggregate CPU and memory resources are reserved for failover.
With this policy, vSphere HA enforces admission control as follows:
1. Calculates the total resource requirements for all powered-on virtual machines in the cluster.
2. Calculates the total host resources available for virtual machines.
3. Calculates the current CPU failover capacity and current memory failover capacity for the
cluster.
4. Determines whether either the current CPU failover capacity or current memory failover
capacity is less than the corresponding configured failover capacity (provided by the user).
If so, admission control disallows the operation.
vSphere HA uses the actual reservations of the virtual machines. If a virtual machine does not have
reservations (reservation is 0), a default of 0MB memory and 256MHz CPU is applied.
Example virtual machine reservations: VM1 CPU 2GHz, RAM 1GB; VM2 CPU 2GHz, RAM 1GB;
VM3 CPU 1GHz, RAM 2GB; VM4 CPU 1GHz, RAM 1GB; VM5 CPU 1GHz, RAM 1GB.
The current CPU failover capacity is computed by subtracting the total CPU resource requirements
from the total host CPU resources and dividing the result by the total host CPU resources. The
current memory failover capacity is calculated similarly.
In the example above, the total resource requirements for the powered-on virtual machines is 7GHz
and 6GB. The total host resources available for virtual machines is 24GHz and 21GB. Based on
these values, the current CPU failover capacity is 70%. Similarly, the current memory failover
capacity is 71%.
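The percentage calculation above can be sketched as follows; truncating to a whole percentage reproduces the 70% and 71% figures from the example:

```python
def current_failover_capacity(total, required):
    """Percentage of total host resources left after subtracting the
    powered-on virtual machines' reservations (truncated)."""
    return int((total - required) / total * 100)

# 24GHz total host CPU, 7GHz reserved; 21GB total memory, 6GB reserved.
print(current_failover_capacity(24, 7))  # 70 (CPU failover capacity %)
print(current_failover_capacity(21, 6))  # 71 (memory failover capacity %)
```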
Problem:
You cannot power on a virtual machine in a vSphere HA cluster due to
insufficient resources.
Cause:
This problem can have several causes:
Hosts in the cluster are disconnected, in maintenance mode, not
responding, or have a vSphere HA error.
Cluster contains virtual machines that have much larger memory or CPU
reservations than the others.
No free slots exist in the cluster.
Solution:
If you select the Host Failures the Cluster Tolerates admission control policy and certain problems
arise, you might be prevented from powering on a virtual machine due to insufficient resources.
This problem can have several causes:
• Hosts in the cluster are disconnected, in maintenance mode, not responding, or have a vSphere
HA error.
Disconnected and maintenance mode hosts are typically the result of user action. Unresponsive
hosts or hosts with errors usually point to a more serious problem, for example, failed hosts or
agents or a networking problem.
• Cluster contains virtual machines that have much larger memory or CPU reservations than the
others.
The Host Failures the Cluster Tolerates admission control policy is based on the calculation of a
slot size composed of two components: the CPU and memory reservations of a virtual machine. If
the calculation of this slot size is skewed by outlier virtual machines, the admission control policy
can become too restrictive and result in the inability to power on virtual machines.
1. In the vSphere HA panel of the cluster’s Summary tab, click on the Advanced Runtime Info
link. This dialog box shows the slot size and how many available slots exist in the cluster.
2. If the slot size appears too high, click on the Resource Allocation tab of the cluster. Sort the
virtual machines by reservation to determine which have the largest CPU and memory
reservations.
3. If there are outlier virtual machines with much higher reservations than the others, consider
using a different vSphere HA admission control policy (such as the Percentage of Cluster
Resources Reserved admission control policy). Or use the vSphere HA advanced options to
place an absolute cap on the slot size. Both of these options, however, increase the risk of
resource fragmentation.
das.vmCpuMinMHz
das.vmMemoryMinMB
The Host Failures the Cluster Tolerates admission control policy uses slot size to determine whether
enough capacity exists in the cluster to power on a virtual machine.
The advanced options das.vmCpuMinMHz and das.vmMemoryMinMB set the default slot size if all
virtual machines are configured with reservations set at 0. These options are used:
• Specifically for vSphere HA admission control
• Only if CPU or memory reservations are not specified for the virtual machine
• In calculating the current failover level
das.vmMemoryMinMB specifies the minimum amount of memory (in megabytes) sufficient for any
virtual machine in the cluster to be usable. If no value is specified, the default is 256MB. This value
effectively defines a minimum memory bound on the slot size.
das.vmCpuMinMHz specifies the minimum amount of CPU (in megahertz) sufficient for any
machine in the cluster to be usable. If no value is specified, the default is 256MHz. This value
effectively defines a minimum processor bound on the slot size.
Set advanced parameters by editing vSphere HA cluster settings.
After you have established a cluster, you have the option of setting attributes that affect how
vSphere HA behaves. These attributes are called advanced options.
To set an advanced option:
Problem:
The number of slots available in the cluster is smaller than expected.
Cause:
Slot size is calculated using the largest reservations plus the memory
overhead of any powered on virtual machines in the cluster.
vSphere HA admission control considers only the resources on a host
that are available for virtual machines.
This amount is less than the total amount of physical resources on the host because some overhead
exists.
Solution:
Another problem that you might encounter is that the Advanced Runtime Info dialog box might
display a smaller number of available slots in the cluster than you expect.
When you select the Host Failures the Cluster Tolerates admission control policy, the Advanced
Runtime Info link appears in the vSphere HA panel of the cluster's Summary tab. Clicking this
link displays information about the cluster, including the number of slots available to power on
additional virtual machines in the cluster.
Remember that slot size is calculated using the largest reservations plus the memory overhead of
any powered on virtual machines in the cluster. However, vSphere HA admission control considers
only the resources on a host that are available for virtual machines.
Therefore, the number of slots available might be smaller than expected because you did not account
for the memory overhead.
A few solutions exist, which are noted in the slide.
In this lab, you will detect possible virtual machine and vSphere HA
cluster problems.
1. Move your ESXi host into the first problem cluster.
2. Troubleshoot the first cluster problem.
3. Troubleshoot the second cluster problem.
4. Prepare for the next lab.
vMotion guidelines
Cluster setting guidelines
Always install the latest version of VMware Tools into all virtual
machines.
A well-tuned virtual machine provides an optimal environment for
applications to run in.
Use shares, limits, and reservations carefully because they can
significantly affect performance.
Resource pools are useful for limiting virtual machines to a specific
amount of resources.
DRS can make migration recommendations only for virtual machines
that can be migrated with vMotion.
Questions?
Module 11
Host and Management Scalability
Lesson 1:
vCenter Linked Mode
vCenter Linked Mode enables VMware vSphere® administrators to search across multiple
VMware® vCenter Server™ system inventories from a single VMware vSphere® Client™ session.
For example, you might want to simplify management of inventories associated with remote offices
or multiple datacenters. Likewise, you might use vCenter Linked Mode to configure a recovery site
for disaster recovery purposes.
vCenter Linked Mode enables you to do the following:
• Log in simultaneously to all vCenter Server systems for which you have valid credentials
• Search the inventories of all vCenter Server systems in the group
• Search for user roles on all the vCenter Server systems in the group
• View the inventories of all vCenter Server systems in the group in a single inventory view
With vCenter Linked Mode, you can have up to ten linked vCenter Server systems and up to 3,000 hosts across the linked vCenter Server systems. For example, you can have ten linked vCenter Server systems, each with 300 hosts, or five vCenter Server systems with 600 hosts each. But you cannot have two vCenter Server systems with 1,500 hosts each, because that exceeds the limit of 1,000 hosts per vCenter Server system. vCenter Linked Mode supports 30,000 powered-on virtual machines (and 50,000 registered virtual machines) across linked vCenter Server systems.
Connection information
Certificates and thumbprints
Licensing information
User roles
vCenter Linked Mode uses Microsoft Active Directory Lightweight Directory Services (AD LDS) to
store and synchronize data across multiple vCenter Server instances. AD LDS is an implementation
of Lightweight Directory Access Protocol (LDAP). AD LDS is installed as part of the vCenter
Server installation. Each AD LDS instance stores data from all the vCenter Server instances in the
group, including information about roles and licenses. This information is regularly replicated across
all the AD LDS instances in the connected group to keep them in sync. After you have configured
multiple vCenter Server instances in a group, the group is called a Linked Mode group.
Using peer-to-peer networking, the vCenter Server instances in a Linked Mode group replicate
shared global data to the LDAP directory. The global data for each vCenter Server instance includes:
• Connection information (IP addresses and ports)
• Certificates and thumbprints
• Licensing information
• User roles
The vSphere Client can connect to other vCenter Server instances by using the connection information retrieved from AD LDS. The Apache Tomcat Web service running on vCenter Server enables the search capability across multiple vCenter Server instances. All vCenter Server instances in a Linked Mode group can access a common view of the global data.
For inventory searches, vCenter Linked Mode relies on a Java-based Web application called the
query service, which runs in Tomcat Web services.
When you search for an object, the following operations take place:
1. The vSphere Client logs in to a vCenter Server system.
2. The vSphere Client obtains a ticket to connect to the local query service.
3. The local query service connects to the query services on other vCenter Server instances to do a
distributed search.
4. Before returning the results, vCenter Server filters the search results according to permissions.
The search service queries Active Directory (AD) for information about user permissions. Therefore, you must be logged in with a domain account to search all vCenter Server systems in vCenter Linked Mode. If you log in with a local account, searches return results only for the local vCenter Server system, even if it is joined to other systems in a Linked Mode group.
To start a search, click the icon in the Search Inventory field at the top right of the vSphere Client window.
DNS name must match machine name.
Clocks synchronized across instances
During the installation of vCenter Server, you have the option of joining a Linked Mode group. If
you do not join during installation, you can join a Linked Mode group after vCenter Server has
been installed.
To join a vCenter Server system to a Linked Mode group:
1. Select Start > Programs > VMware > vCenter Server Linked Mode Configuration. Click
Next.
2. Select Modify linked mode configuration. Click Next.
3. Click Join this vCenter Server instance to an existing linked mode group or another
instance. Click Next.
4. When prompted, enter the remaining networking information. Click Finish. The vCenter Server
instance is now part of a Linked Mode group.
After you form a Linked Mode group, you can log in to any single instance of vCenter Server. From that single instance, you can view and manage the inventories of all the vCenter Server instances in the group. Changes to global data made on one instance are visible on the other instances after a short delay, usually less than 15 seconds. A new vCenter Server instance might take a few minutes to be recognized and published by the existing instances, because group members do not read the global data often.
When logged in to a vCenter Server system that is part of a Linked Mode group, you can monitor
the health of services running on each server in the group.
To display the status of vCenter services:
• On the vSphere Client Home page, click vCenter Service Status. The vCenter Service Status
page enables you to view information that includes:
• A list of all vCenter Server systems and their services
• A list of all vCenter Server plug-ins
• The status of all listed items
• The date and time of the last change in status
• Messages associated with the change in status
When you join a vCenter Server system to a Linked Mode group, the roles defined on each vCenter
Server system in the group are replicated to the other systems in the group.
If the roles defined on each vCenter Server system are different, the role lists of the systems are
combined into a single common list. For example, vCenter Server 1 has a role named Role A and
vCenter Server 2 has a role named Role B. Both systems will have both Role A and Role B after the
systems are joined in a Linked Mode group.
If two vCenter Server systems have roles with the same name, the roles are combined into a single
role if they contain the same privileges on each vCenter Server system. If two vCenter Server
systems have roles with the same name that contain different privileges, this conflict must be
resolved by renaming at least one of the roles. You can choose to resolve the conflicting roles either
automatically or manually.
If you choose to reconcile the roles automatically, the role on the joining system is renamed to
<vCenter_Server_system_name> <role_name>. <vCenter_Server_system_name> is the name of the
vCenter Server system that is joining the Linked Mode group, and <role_name> is the name of the
original role. To reconcile the roles manually, connect to one of the vCenter Server systems with the
vSphere Client. Then rename one instance of the role before joining the vCenter Server system to
the Linked Mode group. If you remove a vCenter Server system from a Linked Mode group, the
vCenter Server system retains all the roles that it had as part of the group.
You can isolate a vCenter Server instance from a Linked Mode group in two ways:
• Use the vCenter Server Linked Mode Configuration wizard.
• Use Windows Add/Remove Programs.
To isolate a vCenter Server instance by using the wizard:
1. Select Start > All Programs > VMware > vCenter Server Linked Mode Configuration.
2. Select Modify linked mode configuration. Click Next.
3. Click Isolate this vCenter Server instance from linked mode group. Click Next.
4. Click Continue.
5. Click Finish. The vCenter Server instance is no longer part of the Linked Mode group.
Lesson 2:
vSphere DPM
VMware vSphere® Distributed Power Management™ (vSphere DPM) continuously monitors
resource requirements and power consumption across a VMware vSphere® Distributed Resource
Scheduler™ (DRS) cluster. When the cluster needs fewer resources, it consolidates workloads and
powers off unused VMware vSphere® ESXi™ hosts to reduce power consumption. Virtual
machines are not affected by disruption or downtime. Evacuated hosts are kept powered off during
periods of low resource use. When resource requirements of workloads increase, vSphere DPM
powers hosts on for virtual machines to use.
vSphere DPM can use one of three power management protocols to bring a host out of standby mode:
• Intelligent Platform Management Interface (IPMI)
• Hewlett-Packard Integrated Lights-Out (iLO)
• Wake on LAN (WOL)
A host that supports both WOL and IPMI defaults to using IPMI if IPMI is configured. A host that
supports both WOL and iLO defaults to using iLO if iLO is configured.
For each of these wake protocols, you must perform specific configuration and testing steps on each
host before you can enable vSphere DPM for the cluster.
For information about configuration and testing guidelines, see vSphere Resource Management
Guide at http://www.vmware.com/support/pubs/vsphere-esxi-vcenter-server-pubs.html.
vSphere DPM:
• Operates on VMware vSphere® ESXi hosts that can be awakened from a standby state through WOL packets or IPMI-based remote power-on:
  – WOL packets are sent over the VMware vSphere® vMotion® network by another ESXi host in the cluster.
  – vSphere DPM keeps at least one such host powered on.
• Can be enabled or disabled at the cluster level:
  – vSphere DPM is disabled by default.
Hosts powered off by vSphere DPM are marked by vCenter Server as being in standby mode.
Standby mode indicates that the hosts are available to be powered on whenever they are needed.
vSphere DPM operates by awakening ESXi hosts from a powered-off state through WOL packets.
These packets are sent over the vMotion networking interface by another host in the cluster, so
vSphere DPM keeps at least one host powered on at all times. Before you enable vSphere DPM on a
host, manually test the “exit standby” procedure for that host to ensure that it can be powered on
successfully through WOL. You can do this test with the vSphere Client.
For WOL compatibility requirements, see VMware® knowledge base article 1003373 at
http://kb.vmware.com/kb/1003373.
vSphere DPM powers off a host when the cluster load is low:
• vSphere DPM considers a 40-minute load history.
• All virtual machines on the selected host are migrated to other hosts.
vSphere DPM powers on a host when the cluster load is high:
• It considers a 5-minute load history.
• The WOL packet is sent to the selected host, which boots up.
• DRS load balancing initiates, and some virtual machines are migrated to this host.
The vSphere DPM algorithm does not frequently power servers on and off in response to transient
load variations. Rather, it powers off a server only when it is very likely that it will stay powered off
for some time. It does this by taking into account the history of the cluster workload. When a server
is powered back on, vSphere DPM ensures that it stays powered on for a nontrivial amount of time
before being considered again for power-off. To minimize the power-cycling frequency across all of the hosts in the cluster, vSphere DPM cycles through all comparable hosts that are enabled for vSphere DPM when trying to find a power-off candidate.
When vSphere HA admission control is disabled, failover resource constraints are not passed on to
DRS and vSphere DPM. The constraints are not enforced. DRS does evacuate virtual machines from
hosts. It places the hosts in maintenance mode or standby mode, regardless of the effect that this
action might have on failover requirements.
vSphere DPM is a cluster power management feature. Enhanced Intel SpeedStep and AMD
PowerNow! are CPU power management technologies.
ESXi supports Enhanced Intel SpeedStep and AMD PowerNow! The dynamic voltage and
frequency scaling (DVFS) enabled by these hardware features enables power savings on ESXi hosts
that are not operating at maximum capacity.
DVFS is useful, but the power consumption of an idle host, even with DVFS, is still much higher than that of a powered-off host. So DVFS is complementary to vSphere DPM, and VMware recommends that the two be used together to maximize power savings.
In other words, vSphere DPM achieves a great power savings by completely powering off some
hosts. DVFS provides more savings on the remaining hosts if they are at least somewhat
underutilized. Using DVFS means that power can be saved even when the load is high enough to
keep vSphere DPM from powering off any hosts.
Implementing vSphere DPM at the cluster level, while using per-host DVFS power management
technologies, enables vSphere to provide effective power management on the levels of the virtual
machine, the host, and the datacenter.
Monitoring Cluster Status
Automation level   Function
Off                Disables features
Manual             Recommends only
Automatic          Executes recommendations
When you enable vSphere DPM, hosts in the DRS cluster inherit the power management automation
level of the cluster by default. You can override this default for an individual host so that its
automation level differs from that of the cluster.
After enabling and running vSphere DPM, you can verify that it is functioning properly by viewing
each host’s information in the Last Time Exited Standby column. This information is on the Host
Options page in the cluster Settings dialog box and on the Hosts tab for each cluster.
A time stamp is displayed as well as whether vCenter Server succeeded or failed the last time that it
attempted to bring the host out of standby mode. vSphere DPM derives times for the Last Time
Exited Standby column from the vCenter Server event log. If no such attempt has been made, the
column displays Never (on the slide). If the event log is cleared, the Last Time Exited Standby
column is reset to Never.
Use this page to disable vSphere DPM for any host that fails to exit standby mode. When a host is disabled, vSphere DPM does not consider that host a candidate for being powered off.
VMware strongly recommends that you disable any host with a Last Time Exited Standby status
of Never until you ensure that WOL is configured properly. You do not want vSphere DPM to
power off a host for which wake has not been successfully tested.
Lesson 3:
Host Profiles
Over time, your vSphere environment expands and changes. New hosts are added to the datacenter
and must be configured to coexist with other hosts in the cluster. Applications and guest operating
systems must be updated with the latest software versions and security patches. To review, many
components are configured on an ESXi host:
• Storage – Datastores (VMware vSphere® VMFS and NFS), iSCSI initiators and targets,
multipathing
• Networking – Virtual switches, port groups, physical network interface card (NIC) speed,
security, NIC teaming policies
• Licensing – Edition selection, license key input
• DNS and routing – DNS server, IP default gateway
• Processor – Use of hyperthreads as logical CPUs
• Memory – Amount of memory on an ESXi host
Processes already exist to modify the configuration of a single host:
• Manually, using the vSphere Client or the command prompt
• Automatically, using scripts
Benefits:
• Simplified setup and change management for ESXi hosts
• Easy detection of noncompliance with standard configurations
• Automated remediation
The Host Profiles feature enables you to export configuration settings from a master reference host
and save them as a portable set of policies, called a host profile. You can use this profile to quickly
configure other hosts in the datacenter. Configuring hosts with this method drastically reduces the
setup time of new hosts: multiple steps are reduced to a single click. Host profiles also eliminate the
need for specialized scripts to configure hosts.
vCenter Server uses host profiles as host configuration baselines so that you can monitor changes to
the host configuration, detect discrepancies, and fix them.
As a licensed feature of vSphere, host profiles are available only with the appropriate licensing. If
you see errors, make sure that you have the appropriate vSphere licensing for your hosts.
[Slide diagram: a host profile captured from a reference host includes settings for memory reservation, storage, networking, date and time, firewall, security, services, and users and user groups. The profile is then applied to new hosts in the cluster.]
After the host profile is created and associated with a set of hosts or clusters, you can check the
compliance status from various places in the vSphere Client:
• Host Profiles main view – Displays the compliance status of hosts and clusters, listed by profile
• Host Summary tab – Displays the compliance status of the selected host
• Cluster Profile Compliance tab – Displays the compliance status of the selected cluster and all
the hosts in the selected cluster (shown on the slide)
To view the reasons why a host is not compliant:
1. Select the host in the Host Compliance Against Cluster Requirements And Host Profile(s) panel.
2. Review the reasons that are listed in the Host Compliance Failures pane at the bottom of the window.
TIP
You can schedule tasks in vSphere to help automate compliance checking.
After attaching the profile to an entity (a host or a cluster of hosts), configure the entity by applying
the configuration as contained in the profile. Before you apply the changes, notice that vCenter
Server displays a detailed list of actions that will be performed on the host to configure it and bring
it into compliance. This list of actions enables the administrator to cancel applying the host profile if
the parameters shown are not wanted.
For remediation of the host to occur from the application of a host profile, the host must be placed in
maintenance mode.
The Host Profile Compliance column displays the current status. On the slide, the host is not
compliant and so the host profile will be applied.
To apply a host profile:
After a host profile is first applied to a host, an answer file is created for that host.
For hosts provisioned with VMware vSphere® Auto Deploy™, vCenter Server owns the entire host
configuration, which is captured in a host profile. In most cases, the host profile information is
sufficient to store all configuration information. Sometimes the user is prompted for input when the
host provisioned with Auto Deploy boots. The answer file mechanism manages those cases. After
the user has specified the information, the system generates a host-specific answer file and stores it
with the Auto Deploy cache and the vCenter Server host object.
You can check the status of an answer file. If the answer file has all of the user input answers
needed, then the status is Complete. If the answer file is missing some of the required user input
answers, then the status is Incomplete. You can also update or change the user input parameters for
the host profile’s policies in the answer file.
.vpf File
Some organizations might have large environments that span many vCenter Server instances. If you
want to standardize on a single host configuration across multiple vCenter Server environments, you
can export the host profile out of vCenter Server and have other vCenter Servers import the profile
for use in their environments. Host profiles are not replicated, shared, or kept in sync across
VMware vCenter Server instances joined in Linked Mode.
You can export a profile to a file that is in the VMware profile format (.vpf).
NOTE
When a host profile is exported, administrator and user profile passwords are not exported. This
restriction is a security measure and stops passwords from being exported in plain text when the
profile is exported. You will be prompted to re-enter the values for the password after the profile is
imported and the password is applied to a host.
Lesson 4:
vSphere PowerCLI
VMware vSphere® PowerCLI™ provides a Windows PowerShell interface to the VMware vSphere® API. The vSphere PowerCLI command-line syntax is the same as the generic Windows PowerShell syntax.
vSphere PowerCLI is a powerful automation and administration tool that adds about 300 commands,
called cmdlets, on top of Windows PowerShell. These cmdlets ease the management and automation
of vSphere infrastructure. System administrators can perform useful actions with vSphere PowerCLI
without having to learn a new language.
vSphere PowerCLI is used by both basic administrators and advanced administrators. Basic
administrators use vSphere PowerCLI to manage their VMware infrastructure from the command
line. Advanced administrators use vSphere PowerCLI to develop scripts that can be reused by other
administrators or integrated into other applications.
Some of the advantages of using vSphere PowerCLI are:
• Reduces operating expenses, allowing you to focus on strategic projects: Automating common virtualization tasks helps you reduce your operating expenses and allows your team to focus on strategic projects. Automating the management of your vSphere environment can help increase your virtual machine–to–administrator ratio.
• Maintains consistency and ensures compliance: Script automation allows your team to deploy
servers and virtual machines in a structured and automated model. Using scripts removes the
risk of human error involved when performing repetitive tasks.
• Reduces time spent on performing manual, repetitive tasks: Automating certain manual, repetitive tasks, such as finding unused virtual machines, rebalancing storage paths, and clearing space from datastores, saves you time.
• ESXi host setup and management can be time-intensive without automation. Automating tasks that you must perform several times a week saves you time.
• Reporting is critical to management of any size environment, because it allows your team to
measure trends and provide internal chargeback in your organization. You can automate
reporting to avoid human error.
• vSphere PowerCLI can help increase your organization’s virtual machine-to-administrator ratio
by automating common tasks and letting your team focus on strategic projects. The following
tasks are some of the tasks that you can automate:
• Set virtual machine power-on options, such as configuring tools to automatically upgrade
• Set resources, such as limits and reservations, on a virtual machine
• Determine the version of VMware® Tools™ that the virtual machines are using
vSphere PowerCLI uses objects to represent the vSphere infrastructure. More than 60 objects are defined in vSphere PowerCLI. These objects include virtual machines, ESXi hosts, datastores, virtual
networks, and clusters. Objects in vSphere PowerCLI have properties and methods. Properties
describe the object. Methods perform an action on the object.
The Get-Member cmdlet returns information about objects. The cmdlet exposes the properties
and methods of an object. Use this cmdlet to inspect an object to see more information than what
vSphere PowerCLI returns by default.
The example uses the Get-Member cmdlet to view the properties and methods of the virtual
machine object. The cmdlet returns a list of all of the properties and methods that are associated
with the virtual machine object.
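The inspection described above can be sketched as follows. This is a minimal example, assuming an active session opened with Connect-VIServer; the server name and the virtual machine name VM01 are hypothetical:

```powershell
# Connect to a vCenter Server system (server name and credentials are examples).
Connect-VIServer -Server vc01.example.com -User administrator

# List all properties and methods of a virtual machine object.
Get-VM -Name "VM01" | Get-Member

# Narrow the output to properties only, such as Name, PowerState, and MemoryMB.
Get-VM -Name "VM01" | Get-Member -MemberType Property
```

The same pattern works for any vSphere PowerCLI object, for example the output of Get-VMHost or Get-Datastore.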
vSphere PowerCLI supports working with multiple default servers. If you select this option, every
time you connect to a different server with Connect-VIServer, the new server connection is
stored in an array variable. The new server connection is stored together with the previously
connected servers, unless the -NotDefault parameter is set. This variable is named
$DefaultVIServers and its initial value is an empty array. When you run a cmdlet and the target
servers cannot be determined from the specified parameters, the cmdlet runs against all servers
stored in the array variable. To remove a server from the $DefaultVIServers variable, you can either use Disconnect-VIServer to close all active connections to the server, or modify the value of $DefaultVIServers manually.
If you choose to work with a single default server, when you run a cmdlet and the target servers
cannot be determined from the specified parameters, the cmdlet runs against the last connected
server. This server is stored in the $defaultVIServer variable, which is updated every time you
establish a new connection. To switch between single and multiple default servers working mode,
use the Set-PowerCLIConfiguration cmdlet.
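The multiple-default-servers workflow can be sketched as follows; the server names are hypothetical, and an authenticated session is assumed:

```powershell
# Switch vSphere PowerCLI to multiple default servers mode.
Set-PowerCLIConfiguration -DefaultVIServerMode Multiple

# Each connection is appended to the $DefaultVIServers array variable.
Connect-VIServer -Server vc01.example.com
Connect-VIServer -Server vc02.example.com

# With no target specified, the cmdlet runs against all servers in $DefaultVIServers.
Get-VMHost

# Remove one server from the default list by closing its connections.
Disconnect-VIServer -Server vc02.example.com -Confirm:$false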
When Connect-VIServer is run, the title bar of the vSphere PowerCLI console indicates that a
connection has been made. A warning message might be generated if the ESXi host or vCenter
Server system is using the self-signed certificate. This warning can be ignored unless the server has
been issued a signed certificate.
Five kinds of cmdlets are commonly used in vSphere PowerCLI:
• “Get” cmdlets retrieve information about objects like virtual machines, virtual machine hosts,
and datastores.
• “New” cmdlets create objects like virtual machines, virtual switches, and datacenters.
• “Set” cmdlets change existing objects, for example, a hardware change to a virtual machine.
• “Move” cmdlets relocate objects, for example, moving a virtual machine from one ESXi host to
another.
• “Remove” cmdlets remove the specified object, for example, removing a virtual machine from the ESXi host.
After connecting to the ESXi host or vCenter Server, you can manage and monitor the vSphere infrastructure from the command prompt.
Examples:
• The version of vSphere PowerCLI: Get-PowerCLIVersion
• All the cmdlets that vSphere PowerCLI provides: Get-VICommand
• The version of the ESXi host: Get-VMHost | Select Name, @{Name="Product";Expression={$_.ExtensionData.Config.Product.Name}}, Version
• The number of processors on the ESXi host: Get-VMHost | Select CpuTotalMhz
• The memory available on the ESXi host: Get-VMHost | Select MemoryTotalMB
• Disconnect a CD-ROM drive: Get-VM | Get-CDDrive | Set-CDDrive -Connected $false
The vSphere PowerCLI package includes three additional snap-ins for managing specific vSphere features:
• VMware.VimAutomation.License – The Licensing snap-in contains the Get-LicenseDataManager cmdlet for managing VMware License components.
• VMware.ImageBuilder – The Image Builder snap-in contains cmdlets for managing depots, image profiles, and VIBs.
• VMware.DeployAutomation – The Auto Deploy snap-in contains cmdlets that provide an interface to Auto Deploy for provisioning physical hosts with ESXi software.
To use these additional snap-ins, you must first add them by using the Add-PSSnapin cmdlet. For example, to add the Auto Deploy snap-in, type Add-PSSnapin VMware.DeployAutomation in the vSphere PowerCLI console.
Listing the Cmdlets in a vSphere PowerCLI Snap-in: To view the installed PowerShell snap-ins, use
the Get-PSSnapin command. You can filter the vSphere PowerCLI snap-ins by using the -Name
parameter and a wildcard:
Get-PSSnapin -Name *vmware*
In this lab, you will use vSphere PowerCLI to run basic Windows PowerShell cmdlets.
1. Start vSphere PowerCLI and find the vSphere PowerCLI version.
2. Retrieve the Windows PowerShell security policy.
3. Create a simple script.
4. Display the vSphere PowerCLI help files.
5. Connect and disconnect to your ESXi host.
6. Define a variable to store the ESXi host and vCenter Server system names.
7. Connect to the vCenter Server system.
8. Run cmdlets to find ESXi host-related information.
9. List the datastores on an ESXi host.
10. List the properties of a virtual switch.
11. Use Get-VM to retrieve virtual machine properties.
12. Use Get-Stat to retrieve performance information.
13. Migrate a virtual machine's disk to shared storage.
14. Use vMotion to migrate a virtual machine.
Lesson 5:
Image Builder
[Slide diagram: an ESXi image is built from components packaged as VIBs: the core hypervisor, CIM providers, plug-ins, and drivers.]
The ESXi image includes one or more VMware installation bundles (VIBs).
A VIB is an ESXi software package. In VIBs, VMware and its partners package solutions, drivers,
Common Information Model (CIM) providers, and applications that extend the ESXi platform. An
ESXi image should always contain one base VIB. Other VIBs can be added to include additional
drivers, CIM providers, updates, patches, and applications.
The challenge of using a standard ESXi image is that the image might be missing desired functionality.
[Slide diagram: a standard ESXi ISO image contains the base providers and base drivers but might be missing a CIM provider, a driver, or a vendor plug-in.]
Image Builder is a utility for customizing ESXi images. Image Builder consists of a server and
vSphere PowerCLI cmdlets. These cmdlets are used to create and manage VIBs, image profiles,
software depots, and software channels. Image Builder cmdlets are implemented as Microsoft
PowerShell cmdlets and are included in vSphere PowerCLI. Users of Image Builder cmdlets can use
all vSphere PowerCLI features.
Software channels are used to group different types of VIBs at a software depot.
To use Image Builder, the first step is to install vSphere PowerCLI and all prerequisite software. The
Image Builder snap-in is included with the vSphere PowerCLI installation.
To install Image Builder, you must install:
• Microsoft .NET 2.0
• Microsoft PowerShell 1.0 or 2.0
• vSphere PowerCLI, which includes the Image Builder cmdlets
After you start the vSphere PowerCLI session, the first task is to verify that the execution policy is set
to Unrestricted. For security reasons, Windows PowerShell supports an execution policy feature. It
determines whether scripts are allowed to run and whether they must be digitally signed. By default,
the execution policy is set to Restricted, which is the most secure policy. If you want to run scripts or
load configuration files, you can change the execution policy by using the Set-ExecutionPolicy
cmdlet. To view the current execution policy, use the Get-ExecutionPolicy cmdlet.
The next task is to connect to your vCenter Server system. The Connect-VIServer cmdlet enables
you to start a new session or reestablish a previous session with a vSphere server.
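The two preparatory tasks can be sketched as follows; the vCenter Server name is hypothetical, and Set-ExecutionPolicy is assumed to be run from an elevated prompt:

```powershell
# View the current execution policy (Restricted by default).
Get-ExecutionPolicy

# Allow scripts to run, as described above for Image Builder sessions.
Set-ExecutionPolicy Unrestricted

# Start a session with the vCenter Server system.
Connect-VIServer -Server vc01.example.com
```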
For more about installing Image Builder and its prerequisite software, see vSphere Installation and
Setup Guide. For more about vSphere PowerCLI, see vSphere PowerCLI Installation Guide. Both
books are at http://www.vmware.com/support/pubs/vsphere-esxi-vcenter-server-pubs.html.
Cloning a published profile is the easiest way to create a custom image profile. Cloning a profile is
useful when you want to remove a few VIBs from a profile. Cloning is also useful when you want to
use hosts from different vendors and want to use the same basic profile, with the addition of vendor-
specific VIBs. VMware partners or large installations might consider creating a profile from the
beginning.
The New-EsxImageProfile cmdlet enables you to create an image profile or clone an
image profile.
To add one or more software packages (VIBs) to an image profile, use the Add-EsxSoftwarePackage cmdlet. Likewise, the Remove-EsxSoftwarePackage cmdlet enables you to remove software packages from an image profile. The Get-EsxSoftwarePackage cmdlet retrieves a list of VIBs in an image profile.
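A typical clone-and-customize session might look like the following sketch; the depot URL, profile names, vendor name, and package name are hypothetical:

```powershell
# Add a software depot containing image profiles and VIBs (URL is an example).
Add-EsxSoftwareDepot http://webserver.example.com/depot/index.xml

# Clone a published image profile as the starting point for customization.
New-EsxImageProfile -CloneProfile "ESXi-5.0.0-standard" -Name "Custom-Profile" -Vendor "Example"

# Add a vendor-specific VIB to the cloned profile.
Add-EsxSoftwarePackage -ImageProfile "Custom-Profile" -SoftwarePackage "example-driver"

# List the VIBs now contained in the customized profile.
Get-EsxImageProfile -Name "Custom-Profile"
```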
[Slide diagram: Image Builder combines driver VIBs and OEM VIBs into an image profile and exports it as an ISO image or a PXE-bootable image.]
In this lab, you will use Image Builder to export an image profile.
1. Export an image profile to an ISO image.
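The export step can be sketched with the Export-EsxImageProfile cmdlet; the profile name and file paths are hypothetical:

```powershell
# Export an image profile to a bootable ISO image.
Export-EsxImageProfile -ImageProfile "Custom-Profile" -ExportToIso -FilePath C:\temp\custom-esxi.iso

# Alternatively, export to an offline bundle (ZIP) for use in patching workflows.
Export-EsxImageProfile -ImageProfile "Custom-Profile" -ExportToBundle -FilePath C:\temp\custom-esxi.zip
```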
Lesson 6:
Auto Deploy
Auto Deploy is a new method for provisioning ESXi hosts in vSphere 5.0. With Auto Deploy,
vCenter Server loads the ESXi image directly into the host memory. When the host boots, the
software image and optional configuration are streamed into memory. All changes made to the state
of the host are stored in RAM. When the host is shut down, the state of the host is lost but can be
streamed into memory again when the host is powered back on.
Unlike the other installation options, Auto Deploy does not store the ESXi state on the host disk.
vCenter Server stores and manages ESXi updates and patching through an image profile and,
optionally, the host configuration through a host profile.
Auto Deploy enables the rapid deployment of many hosts and simplifies ESXi host
management by eliminating the need to maintain a separate boot image for each host. A
standard ESXi image can be shared across many hosts. When a host is provisioned using Auto
Deploy, the host image is decoupled from the physical server. The host can be recovered without
having to recover the hardware or restore from a backup. Finally, Auto Deploy eliminates the need
for a dedicated boot device, thus freeing the boot device for other uses.
(Slide graphic: Auto Deploy architecture. A vSphere PowerCLI session drives the Auto Deploy server and its rules engine. Predefined image profiles and VIBs are fetched from a public depot, host profiles are mapped to hosts, and add-in components provide event recording, such as log files and core dumps.)
Auto Deploy has a rules engine that determines which ESXi image profile and host profile
are used on selected hosts.
The rules engine maps software images and host profiles to hosts,
based on the attributes of the host:
• For example, rules can be based on an IP or MAC address.
• The -AllHosts option applies a rule to every host.
For new hosts, the Auto Deploy server checks with the rules engine
before serving an image profile and host profile to a host.
vSphere PowerCLI cmdlets are used to set, evaluate, and update
image profile and host profile rules.
The rules engine includes rules and rule sets.
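As a sketch, rules might be created and activated like this in vSphere PowerCLI (the rule names, profile names, and IP range are hypothetical, and exact parameter sets can vary across PowerCLI versions):

```powershell
# Map an image profile and a host profile to hosts in a given IP range.
New-DeployRule -Name "rule-rack1" `
    -Item "MyProfile", "MyHostProfile" `
    -Pattern "ipv4=192.168.1.10-192.168.1.100"

# A rule created with -AllHosts applies to every host.
New-DeployRule -Name "rule-default" -Item "MyProfile" -AllHosts

# Add the rules to the active rule set.
Add-DeployRule -DeployRule "rule-rack1"
Add-DeployRule -DeployRule "rule-default"
```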
The rules engine includes the following:
• Rules – Rules can assign image profiles and host profiles to a set of hosts or specify the
inventory location (folder or cluster) of a host on the target vCenter Server system. A rule can
identify target hosts by boot MAC address, SMBIOS asset tag, BIOS UUID, or fixed DHCP IP
address. In most cases, rules apply to multiple hosts. You use vSphere PowerCLI to create rules.
After you create a rule, you must add it to a rule set. After you add a rule to a rule set, you
cannot edit it.
• Active rule set – When a newly started host contacts the Auto Deploy server with a request for
an image, the Auto Deploy server checks the active rule set for matching rules. The image
profile, host profile, and vCenter Server inventory location that are mapped by matching rules
are then used to boot the host. If more than one item is mapped by the rules, the Auto Deploy
server uses the item that is first in the rule set.
• Working rule set – The working rule set enables you to test changes to rules before making
them active. For example, you can use vSphere PowerCLI commands for testing compliance
with the working rule set.
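Compliance against the active rule set can be checked and remediated with cmdlets along these lines (the host name is a placeholder, and exact output varies by environment):

```powershell
# Check whether a host is compliant with the active rule set.
$result = Test-DeployRuleSetCompliance -VMHost (Get-VMHost esxi01.example.com)

# Remediate the host so that it matches the active rule set.
Repair-DeployRuleSetCompliance -TestResult $result
```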
The Auto Deploy software can be installed on its own server, or it can be installed on the
same host as the vCenter Server instance. Setting up Auto Deploy includes installing the software
and registering Auto Deploy with a vCenter Server system.
The vCenter Server Appliance has the Auto Deploy software installed by default.
vSphere PowerCLI must be installed on a system that can be reached by the vCenter Server system
and the Auto Deploy server.
Installing vSphere PowerCLI requires Microsoft Windows PowerShell. On Windows 7 and
Windows Server 2008, Windows PowerShell is installed by default. On Windows XP and Windows
Server 2003, Windows PowerShell must be installed before you install vSphere PowerCLI.
The image profile can come from a public depot or it can be a zipped file stored locally. The local
image profile can be created and customized using vSphere PowerCLI. However, the base ESXi
image must be part of the image profile.
If you are using a host profile, save a copy of the host profile to a location that can be reached by the
Auto Deploy server.
For more about preparing your system and installing the Auto Deploy server, see vSphere Installation
and Setup Guide at http://www.vmware.com/support/pubs/vsphere-esxi-vcenter-server-pubs.html.
Autodeployed hosts perform a preboot execution environment (PXE) boot. PXE uses DHCP and
Trivial File Transfer Protocol (TFTP) to boot an operating system over a network.
A DHCP server and a TFTP server must be configured. The DHCP server assigns IP addresses to
each autodeployed host on startup and points the host to a TFTP server to download the gPXE
configuration files. The ESXi hosts can use the infrastructure’s existing DHCP and TFTP servers, or
new DHCP and TFTP servers can be created for use with Auto Deploy. Any DHCP server that
supports the next-server and filename options can be used.
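As an illustration, an ISC DHCP server entry for Auto Deploy might look like the following sketch (the subnet, addresses, and TFTP server address are placeholders; the gPXE binary shipped in the vSphere 5.0 Auto Deploy TFTP bundle is typically named undionly.kpxe.vmw-hardwired):

```
subnet 192.168.1.0 netmask 255.255.255.0 {
    range 192.168.1.10 192.168.1.100;
    # Point PXE clients at the TFTP server (the next-server option).
    next-server 192.168.1.5;
    # Name of the gPXE binary to download from the TFTP server.
    filename "undionly.kpxe.vmw-hardwired";
}
```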
The vCenter Server Appliance can be used as the Auto Deploy server, DHCP server, and TFTP
server. The Auto Deploy service, DHCP service, and TFTP service are included in the appliance.
When an autodeployed host boots up for the first time, a certain sequence of events occurs:
1. When the ESXi host is powered on, the ESXi host starts a PXE boot sequence.
2. The DHCP server assigns an IP address to the ESXi host and instructs the host to contact the
TFTP server.
3. The ESXi host downloads the gPXE image file and the gPXE configuration file from the
TFTP server.
The host's image profile, host profile, and cluster are determined.
The Auto Deploy server queries the rules engine for the following information about the host:
• The image profile to use
• The host profile to use
• Which cluster the host belongs to (if any)
The rules engine maps software and configuration settings to hosts, based on the attributes of the
host. For example, you can deploy image profiles or host profiles to two clusters of hosts by writing
two rules, each matching on the network address of one cluster.
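The two-rule example described above might be sketched as follows (the cluster names, profile names, and subnets are placeholders):

```powershell
# Hosts on cluster A's subnet get cluster A's image profile and placement.
New-DeployRule -Name "rule-clusterA" `
    -Item "ProfileA", (Get-Cluster "clusterA") `
    -Pattern "ipv4=192.168.10.1-192.168.10.254"

# Hosts on cluster B's subnet get cluster B's image profile and placement.
New-DeployRule -Name "rule-clusterB" `
    -Item "ProfileB", (Get-Cluster "clusterB") `
    -Pattern "ipv4=192.168.20.1-192.168.20.254"
```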
For hosts that have not yet been added to vCenter Server, Auto Deploy checks with the rules engine
before serving image profiles and host profiles to hosts.
The image is pushed to the host, and the host profile is applied.
The host is placed in the appropriate cluster, if specified by a rule. The ESXi
image and information on the image profile, host profile, and cluster to use
are held on the Auto Deploy server.
If a rule specifies a target folder or a cluster on the vCenter Server instance, the host is added to that
location. If no rule exists, Auto Deploy adds the host to the first datacenter.
If a host profile was used without an answer file and the profile requires specific information from
the user, the host is placed in maintenance mode. The host remains in maintenance mode until the
user reapplies the host profile and answers any outstanding questions.
If the host is part of a fully automated DRS cluster, the cluster rebalances itself based on the new
host, moving virtual machines onto the new host.
To make subsequent boots quicker, the Auto Deploy server stores the ESXi image as well as the
following information:
• The image profile to use
• The host profile to use
• The location of the host in the vCenter Server inventory
On each subsequent boot, the DHCP server again assigns an IP address to the ESXi host and
directs the host to the TFTP server, from which the host downloads the gPXE image file and the
gPXE configuration file.
As in the initial boot sequence, the gPXE configuration file instructs the host to make an HTTP boot
request to the Auto Deploy server.
The ESXi image is downloaded from the Auto Deploy server to the
host. The host profile is downloaded from vCenter Server to the host.
Finally, the ESXi host is placed in its assigned cluster on the vCenter Server instance.
Auto Deploy stateless caching PXE boots the ESXi host and loads the image into memory, like
stateless ESXi hosts. However, when the host profile is applied to the ESXi host, the image running
in memory is copied to a boot device. The saved image acts as a backup in the event that the PXE
infrastructure or the Auto Deploy server is unavailable. If the host needs to reboot and cannot
contact the DHCP, TFTP, or Auto Deploy server, the network boot times out and the host
reboots by using the cached disk image.
Although stateless caching can help ensure the availability of an autodeployed ESXi host by
enabling the host to boot during an outage affecting the DHCP, TFTP, or Auto Deploy
servers, stateless caching does not guarantee that the image is current or that the vCenter Server
system is available after the host boots. The primary benefit of stateless caching is that it enables
you to boot the host in order to troubleshoot and resolve problems that prevent a successful PXE boot.
Unlike stateless ESXi hosts, stateless caching requires that a dedicated
boot device be assigned to the host.
Stateless caching is configured by using host profiles. When configuring stateless caching, you can
choose to save the image to a local boot device or a USB disk. You also have the option of leaving
the local VMFS datastore intact or overwriting it. The default first-disk argument is esx,local: the
host first tries a disk that contains an existing ESXi installation and then falls back to a local disk.
To configure stateless caching:
1. In the vSphere Client, open the Host Profiles view.
2. Select the host profile that you want to configure, and click Edit Profile on the toolbar.
3. Expand the System Image Cache Configuration tree and select System Image Cache
Profile Settings.
4. From the drop-down menu on the Configuration Details tab, select Enable stateless caching on
the host.
5. Enter the first-disk arguments if needed, specify whether to overwrite the local VMFS
datastore, and click OK.
The host copies the state locally. Reboots are stateless only if the
PXE and Auto Deploy servers are reached.
The ESXi host initially boots by using Auto Deploy. All subsequent
reboots use local disks.
The benefits of Auto Deploy stateful installations include the
following:
• Hosts are provisioned quickly and efficiently.
• After a host is provisioned, it has no further dependency on the PXE and
Auto Deploy servers.
Some of the disadvantages of using stateful installations include the
following:
• Over time, the configuration might become out of sync with the Auto
Deploy image.
• ESXi hosts must be patched and updated by using traditional methods.
Setting up a stateful installation is similar to configuring stateless caching. The difference is that
instead of using Auto Deploy on every boot, the host performs a one-time PXE boot to install
ESXi. After the image is cached to disk, the host boots from the disk image on all subsequent
reboots.
Stateful installation is configured by using host profiles. When configuring stateful installation, you
can choose to save the image to a local boot device or a USB disk. You also have the option of
leaving the local VMFS datastore intact or overwriting it.
To configure stateful installation:
1. In the vSphere Client, open the Host Profiles view.
2. Select the host profile that you want to configure, and click Edit Profile on the toolbar.
3. Expand the System Image Cache Configuration tree and select System Image Cache
Profile Settings.
4. From the drop-down menu on the Configuration Details tab, select Enable stateful installs on
the host.
5. Enter the first-disk arguments if needed, specify whether to overwrite the local VMFS
datastore, and click OK.
Initial boot up uses Auto Deploy to install the image on the server.
Subsequent reboots are performed from local storage.
With stateful installation, the Auto Deploy server is used to provision new ESXi hosts. The first
time the host boots, it uses the PXE infrastructure, DHCP server, and Auto Deploy server, just like
a stateless host. All subsequent reboots use the image saved to the local disk.
After the image is written to the host’s local storage, that image is used for all future reboots. The
initial boot of a system is the equivalent of installing ESXi to a local disk. No attempt is
made to update the image, which means that over time the image can become stale. Manual processes
must be implemented to manage configuration updates and patches, which sacrifices most of the key
benefits of Auto Deploy.
The vCenter Server and Auto Deploy servers do not need to be available
for hosts configured with stateless caching or stateful installation to reboot.
Update Manager now supports PXE installations. Update Manager updates the hosts but does not
update the image on the PXE server.
Only live-install patches can be remediated with Update Manager. Any patch that requires a reboot
cannot be installed on a PXE host. The live-install patches can be from VMware or a third party.
In this lab, you will configure Auto Deploy to boot ESXi hosts.
1. Install Auto Deploy.
2. Configure the DHCP server and TFTP server for Auto Deploy.
3. Use vSphere PowerCLI to configure Auto Deploy.
4. (For vClass users only) Configure the ESXi host to boot from the
network.
5. (For non-vClass users) Configure the ESXi host to boot from the
network.
6. View the autodeployed host in the vCenter Server inventory.
7. (Optional) Apply a host profile to the autodeployed host.
8. (Optional) Define a rule to apply the host profile to an autodeployed
host when it boots.
Questions?