
VMware Solution

Knowledge Transfer Workshop

©2019 VMware, Inc.


Agenda
Introductions and Workshop Objectives
Conceptual Design
Solution Scope
Logical Design
vSphere Design Session
Adoption for the Digital Workspace
<Process Type Name>
<Process Type Name>
<Process Type Name>
Next Steps

Consultant: Before giving this presentation to a customer, use the Delivery Reference Guide to obtain the correct Knowledge Transfer content for this engagement based upon the specific products included in the Solution Builder engagement file.

2
Introductions and Workshop Objectives
Participant introductions
• What experience do you have?
• Why are you attending this workshop?
• What do you expect to achieve?

Expectations of the workshop
• Outcomes expected of workshop

PS Consultant: Please fill out these bullet points to provide scope and expectations of this workshop.

3
VMware Solution Conceptual Design

[Conceptual design diagram: consumption interfaces (GUI, API, CLI, SaaS, 3rd platform); management and automation services; end-user access, applications, and active workloads (virtual machines and containers) running on premises and in the cloud; abstraction, pooling, and tenancy of compute, network, and storage over the physical resources; and security services spanning client, server, data, and threat containment.]
4
IT Value Model

5

Digital Workspace Journey Model

6
VMware Solution
Logical Diagram

PSO Consultant: Insert appropriate logical diagrams based on the current solution.

7
Solution Scope
The table below lists the scope of the engagement in the context of the VMware journey model. These are
the IT Capabilities that have been determined as the focus for this engagement

IT Capabilities in Scope
Automatically recover from hardware failures
Abstract and pool compute and storage resources

8
Solution Scope
The table below lists the scope of the engagement in the context of the VMware journey model. These are
the IT Problems that have been determined as the focus for this engagement

IT Problems in Scope
High CAPEX for dedicated infrastructure
Single point of failure
Long wait time for hardware purchases
Unexpected infrastructure outages
Performance bottlenecks
Not enough data center resources or space

9
VMware Solution
Logical Diagram

PSO Consultant: Insert appropriate logical diagrams based on the current solution. Review the diagram, covering each component and its function.

10
Engagement Scope - Technical
This specific engagement by VMware Professional Services included the following components of the VMware
Solution. This Solution Design content will only refer to these components.

Technology Component          Version

vSphere                       7.0.x

11
VMware vSphere 7.0.x

13
What’s New
vSphere 7.0

14
Product Overview
vSphere 7.0.x

16
Virtualization Overview
​Virtualization
• Abstracts traditional physical machine resources and runs workloads as virtual machines
• Each virtual machine runs a guest operating system and applications
• The operating system and applications don’t know that they are virtualized

17
Virtualization Overview (cont.)
​Hypervisors
• Partition computing resources of a server
for multiple virtual machines
• Hypervisors alone lack coordination for
higher availability and efficiency
• The VMware vSphere Hypervisor is ESXi

​VMware vSphere
• A suite of software that extends beyond basic host
partitioning by aggregating infrastructure
resources and providing additional services such as
dynamic resource scheduling
• Serves as the foundation of the software-defined
data center (SDDC)

18
Cloud Computing and the SDDC
​IT as a Service (ITaaS)
• Abstracts complexity in the enterprise data center
• Achieves economies of scale
• Renews focus on application services
– Availability
– Security
– Scalability

Management

Cloud OS

Enterprise Cloud

19
vSphere – Use Case Examples

Foundation for Business Solutions – “Adopt a More Agile Infrastructure”


• Security and Compliance: deploy cost-effective and adaptive security services built on vSphere
• Business Continuity: slash downtime with VMware vSphere High Availability, VMware vSphere Fault
Tolerance, VMware vSphere Distributed Resource Scheduler™, VMware vSphere vMotion, and VMware
vCenter™ Site Recovery Manager™
• Server Consolidation: cut capital and operating costs while increasing IT service delivery
• Business Critical Applications: increased agility and outstanding reliability

Foundation for Virtual Desktop – “Enable Data to Follow the User”


• vSphere is the Supporting Infrastructure for any VMware Horizon™ View™ Deployment
• Access a Virtual Desktop from Anywhere and with Any Device

Cloud Computing – “The Foundation for the VMware vRealize® Suite”


• vSphere Enables the Cloud and Choice (Private, Public, or Hybrid)
• Thousands of vCloud Providers Available Today
• Support for Existing and Future Applications

20
Foundation for the Software-Defined Enterprise

End-User Computing: Desktop, Mobile, Virtual Workspace
Applications: Traditional, Modern, SaaS
Software-Defined Data Center
• Policy-Based Management and Automation: Cloud Automation, Cloud Operations, Cloud Business
• Virtualized Infrastructure (Abstract and Pool): compute abstraction = server virtualization, network abstraction = virtual networking, storage abstraction = software-defined storage
Hybrid Cloud: VMware and vCloud Data Center Partners, Private Clouds, Public Clouds
Physical Hardware: Compute, Network, Storage
21
vSphere 7 Adds Kubernetes to the VMkernel
vSphere 7 and VCF 4

[Diagram: Namespaces host DB & Analytics, AI/ML, Business Critical, and Time-critical workloads. vCenter and the Tanzu Kubernetes Grid Service sit alongside the Compute, Network, and Storage Services as VMware Cloud Foundation Services, consumed by Developers and IT Operators and deployable in the Data Center | Edge | Service Provider | Public Cloud.]

22
In April 2020 vSphere 7 Changed the SDDC
Bringing new features & vSphere with Kubernetes

23
vSphere 7 Update 1
Major Focus Areas

Developer-Ready Infrastructure
Scale Without Compromise
Simplify Operations

24
Architecture
vSphere 7.0.x

25
Agenda
Architecture Overview
VMware ESXi™
Virtual machines
VMware vCenter Server™
VMware vSphere® vMotion®
Availability
Content Library
VMware Certificate Authority (CA)
Storage
Networking
vSphere with Tanzu
vSphere Clustering Services (vCLS)
AMD SEV-ES
VMware vSphere Trust Authority
Virtual Disk Development Kit (VDDK)

26
Architecture Overview

27
High-Level VMware vSphere Architectural Overview
VMware vSphere

VMware vCenter Server manages the application services and the cluster of ESXi hosts that delivers the infrastructure services:

Application Services
• Availability: VMware vSphere vMotion, VMware vSphere Storage vMotion, VMware vSphere High Availability, VMware vSphere FT, VMware Data Recovery
• Scalability: DRS and DPM, Hot Add, Over Commitment, Content Library

Infrastructure Services (cluster of ESXi hosts on physical resources)
• Storage: vSphere VMFS, VMware Virtual Volumes, VMware vSAN, Thin Provisioning, vSphere Storage I/O Control
• Network: Standard vSwitch, Distributed vSwitch, VMware NSX, VMware vSphere Network I/O Control

28
Introducing vSphere with Kubernetes
Transform your infrastructure to build, run and manage modern applications everywhere

[Diagram: vSphere exposes developer services: a TKG service, container service, network service, registry service, and volume service. Developers gain self-service development; the VI Admin gains application-focused management and agile operations.]

29
Enable vSphere with Kubernetes Supervisor Clusters

[Diagram: the VI Admin enables workload management on a vSphere cluster through vCenter Server. Each ESXi host runs a Spherelet alongside hostd, the Kubernetes control plane runs as VMs on the cluster, and vSphere Pods run in the CRX runtime next to conventional VMs. DevOps users consume the Supervisor Cluster through its Kubernetes control plane.]

30
VMware Tanzu Kubernetes Grid Service for vSphere
Developer Self Service

[Diagram: a developer requests a cluster ("Give me a cluster: 3 nodes, Kubernetes 1.16, Machine Class: Guaranteed-Small, Networking: Calico"). The Supervisor Cluster turns the request into custom resources: a Cluster resource handled by the Tanzu Cluster Controller and the Cluster API controllers, Machine resources handled by the Cluster API provider, and VirtualMachine resources handled by the VM Operator, each with UI integration. The result is a Tanzu Kubernetes Cluster of control-plane VMs and node VMs with CSI, CNI, and auth components, running pods on the Supervisor Cluster's ESXi hosts under vCenter Server.]

31
Simplified Deployment and Consumption
vSphere with Tanzu

[Diagram: VCF with Kubernetes evolves into VCF with Tanzu and vSphere with Tanzu. Namespaces host DB and Analytics, AI/ML, Business Critical, and Time-critical workloads. The vSphere with Tanzu Services / VMware Cloud Foundation Services layer provides the Tanzu Kubernetes Grid Service, vSphere Pod Service, Registry Service, Network Service, and Storage Services, consumed by Developers and IT Operators and built on vRealize, vSphere, vSAN, vDS, and NSX.]

32


vSphere With Tanzu: Drop-In To Existing Infrastructure
In vSphere 7 Update 1

[Diagram: cluster services (Tanzu Kubernetes Grid | vSphere Pods | Networks | Volumes | Registry) run on a vSphere Distributed Switch with Management, Frontend, and Workload portgroups. Three K8S Control Plane VMs and an HA Proxy front the TKG cluster nodes, which are spread across the cluster's ESXi hosts.]

33
VMware ESXi

34
ESXi 7.0
ESXi is the bare-metal VMware vSphere Hypervisor

ESXi installs directly onto the physical server, enabling direct access to all server resources
• ESXi is in control of all CPU, memory, network and storage resources
• Allows virtual machines to run at near-native performance, unlike hosted hypervisors

ESXi 7.0 allows
• Utilization of up to 768 logical CPUs per host
• Utilization of up to 16 TB of RAM per host
• Deployment of up to 1024 virtual machines per host

[Diagram: a VM running on VMware ESXi installed on the physical server]

35
ESXi Architecture

[Diagram: the ESXi host exposes CLI commands for configuration and support, agentless systems management and agentless hardware monitoring through the VMware management framework and the Common Information Model (CIM), and the Local Support Console (ESXi Shell), all running on top of the VMkernel and its network and storage stacks.]


36
Virtual Machines

37
Virtual Machines
The software computer and consumer of resources that ESXi is in charge of

VMs are containers that can run almost any operating system and application

A segregated environment that does not cross boundaries unless permitted via the network or through SDK access

Each VM has access to its own resources: CPU, RAM, disk, network and video cards, keyboard, mouse, SCSI controller, CD/DVD

VMs generally do not realize that they are virtualized

[Diagram: a virtual machine (apps and operating system on virtual hardware) running on an ESXi host]

38
Virtual Machine Architecture
​Virtual machines consist of files stored on a vSphere VMFS or NFS datastore
• Configuration file (.vmx)
• Swap files (.vswp)
• BIOS files (.nvram)
• Log files (.log)
• Template file (.vmtx)
• Raw device map file (<VM_name>-rdm.vmdk)
• Disk descriptor file (.vmdk)
• Disk data file (<VM_name>-flat.vmdk)
• Suspend state file (.vmss)
• Snapshot data file (.vmsd)
• Snapshot state file (.vmsn)
• Snapshot disk file (<VM_name>-delta.vmdk)
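
As a quick way to see these files on disk, the sketch below groups a VM directory's contents by suffix. It is only an illustration: the datastore path and VM name (/vmfs/volumes/datastore1/web01) are hypothetical placeholders, and the suffix-to-role mapping simply restates the list above.

    import os
    from collections import defaultdict

    # Hypothetical VM directory on a VMFS datastore; adjust to your environment.
    VM_DIR = "/vmfs/volumes/datastore1/web01"

    # Map the well-known suffixes to the file roles listed above.
    ROLES = {
        ".vmx": "configuration", ".vswp": "swap", ".nvram": "BIOS/EFI settings",
        ".log": "log", ".vmtx": "template", ".vmdk": "disk descriptor or data",
        ".vmss": "suspend state", ".vmsd": "snapshot data", ".vmsn": "snapshot state",
    }

    files_by_role = defaultdict(list)
    for name in sorted(os.listdir(VM_DIR)):
        suffix = os.path.splitext(name)[1]
        files_by_role[ROLES.get(suffix, "other")].append(name)

    for role, names in sorted(files_by_role.items()):
        print(f"{role}: {', '.join(names)}")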

39
VMware vCenter Server

40
VMware vCenter™ 7.0
​vCenter is the management platform for vSphere
environments
​Provides much of the feature set that comes with
vSphere, such as vSphere High Availability
​Also provides SDK access into the environment
for solutions such as VMware vRealize™
Automation™
vCenter Server is available as an appliance only
in vSphere 7.0 and beyond
​A single vCenter Server running version 7.0 can
manage:
• 2000 hosts
• 25,000 virtual machines
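
As an illustration of the SDK access mentioned above, the sketch below uses the open-source pyVmomi bindings to connect and count powered-on VMs. It is a minimal, read-only example: the vCenter address and credentials are placeholders, and certificate verification is disabled only for lab use.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder connection details; replace with your vCenter and a read-only account.
    ctx = ssl._create_unverified_context()  # lab only; use trusted certificates in production
    si = SmartConnect(host="vcsa.example.com", user="administrator@vsphere.local",
                      pwd="changeme", sslContext=ctx)
    try:
        content = si.RetrieveContent()
        # Walk the inventory the same way a management solution would through this SDK.
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        powered_on = [vm.name for vm in view.view
                      if vm.runtime.powerState == "poweredOn"]
        print(f"{len(powered_on)} powered-on VMs")
    finally:
        Disconnect(si)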

41
vCenter 7.0 Architecture
In vCenter 7.0, the architecture is simplified dramatically. There is only one architecture permitted: the Embedded architecture.

All services are provided from a single vCenter Server instance, including:
• VMware vCenter Single Sign-On™
• License service
• Lookup service
• VMware Directory Services
• VMware Certificate Authority
• vCenter Server
• VMware vSphere Client (HTML5)
• VMware vSphere Auto Deploy™
• VMware vSphere ESXi Dump Collector
• vSphere Syslog Service
• vSphere Update Manager

42
vCenter 7.0 Architecture (cont.)
One architecture is supported as a result of this change

In previous releases, the Platform Services Controller could be either embedded in or external to vCenter Server, and choosing a mode depended on the size and feature requirements of the environment

[Diagram: the legacy External Platform Services Controller topology (Platform Services Controller and vCenter Server on separate virtual machines or servers) compared with the Embedded Platform Services Controller topology (both on the same virtual machine or server)]
43
vCenter 7.0 Architecture (cont.)
One architecture is supported in vSphere 7.0: the Embedded architecture from previous releases. Platform Services Controllers can no longer be external to vCenter Server
vCenter Server is also available only in the appliance form factor; vCenter Server for Windows is no longer available

[Diagrams: standalone vCenter Server architecture, and multiple vCenter Server architecture with Enhanced Linked Mode]

44
vCenter 7.0 Architecture (cont.)
​Enhanced Linked Mode has the following maximums

Description Scalability Maximum

Maximum objects in a vSphere domain (users, groups, solution users) 1,000,000

Maximum number of Linked vCenter Servers 15

*Note: These are a point in time snapshot of the maximums taken April 2020. For the most up to date data, see http://configmax.vmware.com/

45
vCenter Architecture – ESXi and vCenter Server Communication
How vCenter Server components and ESXi hosts communicate

[Diagram: the vpxd service on vCenter Server (with the Platform Services Controller) communicates with the vpxa agent on each ESXi host over TCP 443 and 9443 and TCP/UDP 902; vpxa relays to the local hostd service on the ESXi host.]

46
VMware vSphere vMotion

47
vSphere vMotion
​vSphere vMotion allows for live migration of
virtual machines between compatible ESXi
hosts
• Compatibility determined by CPU, network,
and storage access

​Encrypted vMotion is a feature of vSphere 7.0

​With vSphere 7.0, migrations can occur


• Between clusters
• Between datastores
• Between networks
• Between vCenter Servers
• Over long distances with an RTT of 150ms
• Cross-Cloud

48
vSphere vMotion Architecture
Long-Distance vSphere vMotion

Cross-continental
• Targeting cross-continental USA
• Up to 150 ms RTT

Performance
• Maintains standard vSphere vMotion guarantees

49
vSphere vMotion Architecture
​vSphere vMotion involves transferring the entire
execution state of the virtual machine from the source
host to the destination
​Primarily happens over a high-speed network

​The execution state primarily consists of the following


components
• The virtual device state, including the state of the CPU,
network and disk adaptors, SVGA, and so on
• External connections with devices, including networking
and SCSI devices
• The virtual machine’s physical memory

​Generally a single ping is lost, and users do not even


know a VM has changed hosts

50
vMotion Improvements
Basic vMotion Workflow

1. Create VM on destination
2. Copy memory
3. Suspend VM on source
4. Transfer device state
5. Resume VM on destination (switch-over time of about 1 second)
6. Power off VM on source

[Diagram: source and destination ESXi hosts connected by the vMotion network and sharing a datastore]
vMotion Improvements
Memory Copy

• During a vMotion, we keep


track of all changed memory
pages by using a page tracer.

• Changed (or dirtied) memory


pages are copied to the
destination ESXi again.

• Prior to vSphere 7, we
installed the page tracer on
all the vCPUs in a VM

• This caused significant


impact on workload
performance!
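
Conceptually, the page tracer feeds an iterative pre-copy loop: copy memory once, then keep re-copying whatever was dirtied in the meantime until the remainder can be sent within the switch-over time goal. The sketch below is a simplified model of that loop; the dirty rate, bandwidth, and memory size are invented numbers, not measurements of the real algorithm.

    # Simplified model of iterative pre-copy; all figures are illustrative only.
    PAGE_KB = 4
    bandwidth_pages_per_s = 44 * 1024 // PAGE_KB     # ~44 MB/s vMotion link, in pages/s
    dirty_pages_per_s = 2000                         # rate at which the guest dirties memory
    switchover_goal_s = 1.0

    remaining = (1 * 1024 * 1024) // PAGE_KB         # 1 GB of memory, in 4 KB pages
    passes = 0
    while remaining / bandwidth_pages_per_s > switchover_goal_s:
        copy_time = remaining / bandwidth_pages_per_s
        # While this pass is being copied, the guest dirties more pages,
        # which become the working set for the next pass.
        remaining = int(dirty_pages_per_s * copy_time)
        passes += 1
    print(f"after {passes} passes, {remaining} pages remain: "
          f"small enough to send during the switch-over")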

52
vMotion Improvements
Memory Copy Optimizations

• In vSphere 7, we claim
one vCPU to do all the
page tracing work during
a vMotion operation.

• Much more efficient by


improved page tracing.

• Greatly reduced
performance impact on
workload!

53
vMotion Improvements
Execution Switch-over Process

1. Quiesce VM on source (switch-over time < 1 sec)
2. Transfer checkpoint
3. Transfer changed bitmap; the bitmap size is a function of the VM's memory size (1 GB memory => 32 KB bitmap, 24 TB memory => 768 MB bitmap; see the worked example below)
4. Transfer swap bitmap
5. Transfer remaining pages
6. Resume VM on destination
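
The bitmap sizes in step 3 follow from tracking one bit per guest memory page (assuming the standard 4 KB page size); the quick calculation below reproduces the 32 KB and 768 MB figures.

    PAGE = 4 * 1024                        # 4 KB memory page

    def bitmap_bytes(memory_bytes):
        pages = memory_bytes // PAGE       # one tracked bit per page
        return pages // 8                  # eight page-bits per bitmap byte

    GB, TB = 1024 ** 3, 1024 ** 4
    print(bitmap_bytes(1 * GB) // 1024, "KB")          # -> 32 KB
    print(bitmap_bytes(24 * TB) // (1024 ** 2), "MB")  # -> 768 MB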

54
vMotion Improvements
Memory Copy Optimizations

Example for a VM with 24 TB of memory:
• During the switch-over phase, the last memory bitmap is transferred
• In previous versions, the entire bitmap was transferred; 24 TB of memory requires a 768 MB bitmap, which takes about 2 seconds to transmit
• In vSphere 7, only the compacted bitmap is transmitted (the diagram shows a sparse set of dirty pages: 1, 41, 125, 8193)
• This cuts the transfer time for a 24 TB VM from about 2 seconds to roughly 175 milliseconds

55
VMware vSphere Storage vMotion Architecture
vSphere Storage vMotion works in very much the same way as vSphere vMotion, only the disks are migrated instead

It works as follows
1. Initiate storage migration
2. Use the VMkernel data mover or VMware vSphere Storage APIs - Array Integration (VAAI) to copy data
3. Start a new virtual machine process
4. Use the mirror driver to mirror I/O calls to file blocks that have already been copied to the virtual disk on the destination datastore
5. Cut over to the destination VM process to begin accessing the virtual disk copy

[Diagram: read/write I/O to the virtual disk passes through the mirror driver while the VMkernel data mover, or VAAI on the storage array, copies blocks from the source datastore to the destination datastore]

56
vSphere Storage vMotion Architecture
Simultaneously Change Host and Storage

vSphere vMotion also allows both storage and host to be changed at the same time

Since vSphere 6.x, the VM can also be migrated between networks and between vCenter Servers

[Diagram: a VM changes ESXi host, datastore, network (Network A to Network B), and vCenter Server in a single migration over the vSphere vMotion network]

57
Availability
VMware vSphere High Availability
VMware vSphere Fault Tolerance
VMware vSphere Distributed Resource Scheduler

58
Availability
VMware vSphere High Availability

59
vSphere High Availability
​vSphere High Availability is an availability solution
that monitors hosts and restarts virtual machines in the
case of a host failure
• OS and application-independent, requiring no complex
configuration changes
• Agents on the ESXi hosts monitor for the following
types of failures

Infrastructure failures: host failures, VM crashes
Connectivity failures: host network isolated, datastore incurs PDL or APD event
Application failures: guest OS hangs/crashes, application hangs/crashes

Other Features available:


• VM component protection
• Proactive HA

60
vSphere High Availability Architecture – Overview
Clusters of up to 64 ESXi hosts can be created
• One of the hosts is elected as master when HA is enabled

​Availability heartbeats occur through network and storage

​HA’s agent communicates on the following networks by default


• Management network (or)
• VMware vSAN™ network (if vSAN is enabled)

Network heartbeats

Storage heartbeats

Master

61
vSphere High Availability Architecture – Host Failures

Master

62
vSphere High Availability Architecture – Host Failures

Master

Master declares slave host dead

63
vSphere High Availability Architecture – Host Failures

New master elected and resumes master duties

Master

64
vSphere High Availability Architecture – Network Partition

A B

Master

65
vSphere High Availability Architecture – Host Isolation

Master

66
vSphere High Availability Architecture – VM Monitoring

Master

67
vSphere High Availability Architecture – VM Component Protection

Master
68
Availability
VMware vSphere Fault Tolerance

69
vSphere FT
​vSphere FT is an availability solution that provides
continuous availability for virtual machines
• Zero downtime
• Zero data loss

​No loss of TCP connections

​Completely transparent to guest software

​No dependency on guest OS, applications

​No application specific management and learning

​Supports up to 8 vCPUs and 128GB of RAM in


VMs with vSphere 7.0

70
vSphere FT Architecture
​vSphere FT creates two complete virtual machines when enabled with vSphere 7.x

​This includes a complete copy of


• VMX configuration files
• VMDK files including the ability to use separate datastores

Primary VM Secondary VM

.vmx file .vmx file

VMDK VMDK VMDK VMDK


Datastore 1 VM Network Datastore 2 VM Network

71
vSphere FT Architecture – Memory Checkpoint
​vSphere FT in vSphere 6.7 uses fast checkpoint technology
• This is similar to how vSphere vMotion works, but it is done continuously (rather than once)
• The fast checkpoint is a snapshot of all data not just memory (memory, disks, devices, and so on)
• vSphere FT logging network has a minimum requirement of 10 Gbps NIC

ESXi Host 1 ESXi Host 2

VM A VM A

Memory
bitmap

Fast Checkpoint Data


vSphere FT
Logging network

Production
network

VM End User
72
Availability
VMware vSphere Distributed Resource Scheduler
(DRS)

73
DRS
​DRS is a technology that monitors load and resource
usage and will use vSphere vMotion to balance virtual
machines across hosts in a cluster
• DRS also Includes VMware Distributed Power
Management (DPM) which allows for hosts to be evacuated
and powered off during periods of low utilization
DRS uses vSphere vMotion functionality to migrate VMs
VMware DPM
​Can be used in three ways
• Fully automated – where DRS acts on recommendations
automatically
• Partially automated – where DRS only acts for initial VM
power-on placement and an administrator has to approve
recommendations
• Manual – where administrator approval is required for all
movements

74
DRS Architecture
ESXi Host 1 ESXi Host 1
​DRS generates migration recommendations based
on how aggressive it has been configured
​For example
• The three hosts on the left side of the following
figure are unbalanced
ESXi Host 2 ESXi Host 2
• Host 1 has six virtual machines, its resources might
be overused while ample resources are available on
Host 2 and Host 3
• DRS migrates (or recommends the migration of)
virtual machines from Host 1 to Host 2 and Host 3
• On the right side of the diagram, the properly load
ESXi Host 3 ESXi Host 3
balanced configuration of the hosts that results
appears

75
Distributed Power Management Architecture
ESXi Host 1 ESXi Host 1
​DPM generates migration recommendations similar to
DRS, but in terms of achieving power savings
• It can be configured for how aggressively you want to
save power

​For example
ESXi Host 2 ESXi Host 2
• The three hosts on the left side of the following figure
have virtual machines running, but they are mostly idle
• DPM determines that given the load of the environment
shutting down Host 3 will not impact the level of
performance for the VMs
• DPM migrates (or recommends the migration of) virtual
ESXi Host 3 ESXi Host 3
machines from Host 3 to Host 2 and Host 1 and puts
Host 3 into standby mode
• On the right side of the diagram, the power-managed configuration of the hosts appears, with Host 3 in standby mode

76
Improved DRS in vSphere 7.0
Why?

• DRS was first released in 2006.

• Data centers and workloads have changed significantly since.

• Modern applications ask for a more workload centric approach.

• The DRS code is completely re-written to be more efficient.

• New DRS logic already in VMC on AWS M5 (UI exposed in M9).

77
Improved DRS
Compared with Previous Releases

Original DRS
• Cluster centric
• Runs every 5 min
• Uses cluster-wide standard
deviation model

Improved DRS
• Workload centric
• Runs every 1 min
• Uses the VM DRS Score
• Based on granted memory

78
Improved DRS
VM DRS Score

• The VM DRS Score is reported in buckets (0-20%, 20-40%, and so on)
• A lower bucket does not necessarily mean a VM is not running properly; the score reflects the execution efficiency of the VM
• DRS calculates the VM DRS Score for a VM on each ESXi host in the cluster
• If another ESXi host can provide a better score for the VM, DRS considers migration
• The VM DRS Score is calculated from metrics such as:
  • CPU %RDY (Ready) time
  • Memory swap
  • CPU cache behavior
  • Headroom for the workload to burst
  • Migration cost
A simplified sketch of this scoring and bucketing follows.
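
The scoring and bucketing can be pictured with the toy helper below. It is a conceptual sketch only: the inputs and weights are invented for illustration and are not the actual DRS cost model.

    # Conceptual only: invented weights, not the real DRS scoring model.
    def vm_drs_score(cpu_ready_pct, swap_mb, headroom_pct, migration_cost_pct):
        """Return an execution-efficiency score from 0 to 100 (higher is better)."""
        score = 100.0
        score -= cpu_ready_pct * 2            # penalize CPU %RDY time
        score -= min(swap_mb / 10.0, 20)      # penalize memory swap
        score -= (100 - headroom_pct) * 0.2   # penalize lack of burst headroom
        score -= migration_cost_pct           # penalize the cost of moving
        return max(0.0, min(100.0, score))

    def bucket(score):
        low = int(min(score, 99.9) // 20) * 20
        return f"{low}-{low + 20}%"

    current = vm_drs_score(cpu_ready_pct=12, swap_mb=300, headroom_pct=20, migration_cost_pct=0)
    candidate = vm_drs_score(cpu_ready_pct=2, swap_mb=0, headroom_pct=70, migration_cost_pct=5)
    verdict = "consider migration" if candidate > current else "stay put"
    print(f"{bucket(current)} -> {bucket(candidate)}: {verdict}")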

79
Improved DRS
Scalable Shares

Challenges when not using DRS Scalable Shares:
• No dynamic relative resource entitlement
• VMs in a resource pool set to normal could get the same resource entitlement as VMs in the high-share resource pool
• A higher share level does not guarantee a higher resource entitlement

80
Improved DRS
Scalable Shares

With DRS Scalable Shares enabled:
• Relative resource entitlement to other resource pools depends on the number of VMs in a resource pool
• Setting the share level to 'high' now ensures prioritization over lower-share VM entitlements
• The share allocation is changed dynamically depending on the number of VMs (see the simplified sketch below)
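
A simplified illustration of the idea, assuming a pool's effective shares scale linearly with the number of VMs it contains; the per-VM share values and cluster capacity are placeholders, not product constants.

    # Simplified scalable-shares illustration; share values and capacity are placeholders.
    SHARE_VALUE = {"high": 2000, "normal": 1000, "low": 500}

    def entitlements(pools, total_capacity_ghz):
        """pools: {name: (share_level, vm_count)} -> GHz entitlement per pool."""
        effective = {name: SHARE_VALUE[level] * vms for name, (level, vms) in pools.items()}
        total = sum(effective.values())
        return {name: total_capacity_ghz * shares / total for name, shares in effective.items()}

    pools = {"prod (high, 10 VMs)": ("high", 10), "test (normal, 30 VMs)": ("normal", 30)}
    for name, ghz in entitlements(pools, total_capacity_ghz=100).items():
        print(f"{name}: {ghz:.1f} GHz")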

81
Improved DRS
Scalable Shares

Cluster:
• Scalable Shares are configured at the cluster level and/or the resource pool level
• Not enabled by default
• Scalable Shares are used by default for vSphere with Kubernetes, where a Namespace = Resource Pool

82
Content Library

83
Content Library
The Content Library is a distributed template, media, and script library for vCenter Server

[Diagram: a Content Library (publisher) on one vCenter publishes its items; a Content Library (subscriber) on another vCenter subscribes and syncs the items]

84
Content Library Architecture – Publication and Subscription
Publication and subscription allow libraries to be shared between vCenter Servers

Provides a single source for information that can be configured to download and sync according to schedules or timeframes

[Diagram: the subscriber's Content Library Service subscribes using a subscription URL (pointing to lib.json) and an optional password; templates and other items are fetched over HTTP GET between the Transfer Services of the two vCenter Servers]

85
Content Library Architecture – Content Synchronization
Content synchronization occurs when content changes

Simple versioning is used to denote the modification, and the item is transferred

[Diagram: the subscriber's Content Library Service pulls lib.json, items.json, and item.json metadata over the VMware Content Subscription Protocol (vCSP) and fetches changed items over HTTP GET via the Transfer Services; library metadata is stored in each vCenter Server database (VCDB)]

86
Content Library with vSphere 7.0 Improvements
VM Template Check-In/Check-Out & Versioning

Quickly find VM template versions

​Check-out templates
for edits

​Check-in templates to
save changes made

​Revert to previous
versions

​Classic & New view


of Summary

87
Content Library
VM Template Check-In/Check-Out & Versioning

Quickly find VM template versions

​Check-out templates
for edits

​Check-in templates to
save changes made

​Revert to previous
versions

​Classic & New view


of Summary

88
Content Library
VM Template Versioning Tab

​New Versioning tab


allows quick historical
view of edits

​Versioning info only


available when VM
Template is stored in
Content Library

89
Content Library
VM Template Versioning Tab

​New Versioning tab


allows quick historical
view of edits

​Versioning info only


available when VM
Template is stored in
Content Library

90
Content Library
Advanced Configuration and Optimization

​Advanced
Configurations in
Content Library

​Edit Auto-Sync
Frequency and
Performance
Optimization

91
Content Library
Developer Center >> API Explorer

​Easily find APIs for


Content Library
interaction &
configuration

​Execute (GET or POST)


commands directly in
vSphere Client
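
The same calls that the API Explorer exposes can be scripted. The sketch below assumes the vSphere 6.x/7.0 /rest endpoint paths and placeholder credentials, so verify the exact paths against the API Explorer in your environment; it authenticates and then lists Content Library identifiers.

    import requests

    VC = "https://vcsa.example.com"                    # placeholder vCenter address
    AUTH = ("administrator@vsphere.local", "changeme") # placeholder credentials

    s = requests.Session()
    s.verify = False   # lab only; trust valid certificates in production
    # POST creates an API session; the session cookie is reused for later calls.
    s.post(f"{VC}/rest/com/vmware/cis/session", auth=AUTH).raise_for_status()
    # GET returns the list of Content Library identifiers.
    libraries = s.get(f"{VC}/rest/com/vmware/content/library").json()["value"]
    print("content libraries:", libraries)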

92
VMware Certificate Authority

93
VMware Certificate Authority
​In vSphere 7.x, vCenter ships with an internal Certificate Authority (CA) called the VMware Certificate
Authority
Issues certificates for VMware components under its own authority in the vSphere ecosystem

​Runs as part of the Infrastructure Identity Core Service Group


• Directory service
• Certificate service
• Authentication framework

​VMware CA issues certificates only to clients that present credentials from VMDirectory in its own
identity domain
• It also posts its root certificate to its own server node in VMware Directory Services

94
How is the VMware Certificate Authority Used?
​Machine’s SSL certificate
• Used by reverse proxy on every vSphere node
• Used by the VMware Directory Service on Platform Services Controller and Embedded nodes
• Used by VPXD on Management and Embedded nodes

​Solution users’ certificates

​Single Sign-On signing certificates

95
Simplified Certificate Management
vSphere 6.x: Lots of Certificates

96
Simplified Certificate Management
vSphere 7: Much Simpler

97
Simplified Certificate Management
New Wizard for Certificate Import

98
Storage
iSCSI Storage Architecture
NFS Storage Architecture
Fibre Channel Architecture
Other Storage Architectural Concepts

99
Storage
Both local and shared storage are a core requirement for full utilization of ESXi features

Many kinds of storage can be used with vSphere ESXi hosts
• Local disks
• Fibre Channel (FC) SANs
• iSCSI SANs
• NAS
• vSAN
• Virtual Volumes (VVOLs)

They are generally formatted with either:
• The VMware vSphere VMFS file system
• The file system of the NFS server

[Diagram: datastore types (VMware vSphere VMFS, NFS) layered over storage technologies: local disks, FC, FCoE, iSCSI, NAS, vSAN, and VVOL]

100
Storage – Protocol Features
Each protocol has its own set of supported features
Most major features are supported by all protocols

Storage Protocol          Boot from SAN   vSphere vMotion   vSphere HA   DRS   Raw Device Mapping
Fibre Channel                   ●               ●               ●         ●          ●
FCoE                            ●               ●               ●         ●          ●
iSCSI                           ●               ●               ●         ●          ●
NFS                                             ●               ●         ●
Direct Attached Storage         ●               ●
vSAN                                            ●               ●         ●
VMware Virtual Volumes                          ●               ●         ●

101
Storage
iSCSI Storage Architecture

102
Storage Architecture – iSCSI
​iSCSI storage utilizes regular IP traffic over a standard network to transport iSCSI commands

​The ESXi host connects through one of several types of iSCSI initiator

103
Storage Architecture – iSCSI Components
​All iSCSI systems share a common set of components that are used to provide the storage access

104
Storage Architecture – iSCSI Addressing
​Other than the standard IP addresses, iSCSI targets are identified by names as well

iSCSI target name:


iqn.1992-08.com.mycompany:stor1-47cf3c25
or
eui.fedcba9876543210
iSCSI alias: stor1
IP address: 192.168.36.101

iSCSI initiator name:


iqn.1998-01.com.vmware:train1-64ad4c29
or
eui.1234567890abcdef
iSCSI alias: train1
IP address: 192.168.36.88
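
An IQN follows the pattern iqn.<year-month>.<reversed-domain>:<unique-identifier>. The small helper below splits the example names above into those parts; it is purely illustrative string handling.

    def parse_iqn(name):
        """Split an iSCSI qualified name into date, naming authority, and identifier."""
        body, _, identifier = name.partition(":")
        prefix, date, authority = body.split(".", 2)
        assert prefix == "iqn", "EUI-format names (eui.*) use a different scheme"
        return {"date": date, "authority": authority, "identifier": identifier}

    print(parse_iqn("iqn.1992-08.com.mycompany:stor1-47cf3c25"))
    print(parse_iqn("iqn.1998-01.com.vmware:train1-64ad4c29"))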

105
Storage
NFS Storage Architecture

106
Storage Architecture – NFS Components
Much like iSCSI, NFS accesses storage over the network

[Diagram: a NAS device or server with storage exposes a directory to share with the ESXi host over the network; the ESXi host has a NIC mapped to a virtual switch and a VMkernel port defined on that virtual switch]

107
Storage Architecture – Addressing and Access Control with NFS
ESXi accesses NFS through the NFS server address or name via a VMkernel port
NFS version 4.1 and NFS version 3 are available with vSphere 7.x
Different features are supported with different versions of the protocol
• NFS 4.1 supports multipathing, unlike NFS 3
• NFS 3 supports all features; NFS 4.1 does not support Storage DRS, VMware vSphere Storage I/O Control, VMware vCenter Site Recovery Manager™, or Virtual Volumes
Dedicated switches are not required for NFS configurations

[Diagram: an NFS server at 192.168.81.33 is mounted by an ESXi host through a VMkernel port configured with IP address 192.168.81.72]

108
Storage
Fibre Channel Architecture

109
Storage Architecture – Fibre Channel
Unlike network storage such as NFS or iSCSI, Fibre Channel does not generally use an IP network for storage access
• The exception is when using Fibre Channel over Ethernet (FCoE)

110
Storage Architecture – Fibre Channel Addressing and Access Control
​Zoning and LUN masking are used for access control to storage LUNs

111
Storage Architecture – FCoE Adapters
FCoE adapters allow access to Fibre Channel storage over Ethernet connections
Enables expansion to Fibre Channel SANs in many cases where no Fibre Channel infrastructure exists
Both hardware and software adapters are allowed
• Hardware adapters are often called converged network adapters (CNAs)
• Many times both a NIC and an HBA are presented to the client from the single card

[Diagram: hardware FCoE uses a converged network adapter in the ESXi host, while software FCoE uses a 10 Gigabit Ethernet adapter with FCoE support plus a software FC driver; an FCoE switch forwards Ethernet IP frames to the LAN and FC frames to the FC storage arrays on the SAN]
112
Storage
Other Storage Architectural Concepts

113
Multipathing

​Multipathing enables continued access to


SAN LUNs if hardware fails
​It also can provide load balancing based
on the path policy selected

114
vSphere Storage I/O Control
vSphere Storage I/O Control allows traffic to be prioritized during periods of contention
• Brings the compute-style shares/limits to the storage infrastructure

Monitors device latency and acts when it exceeds a threshold

Allows important virtual machines to have priority access to resources; a simplified share-based allocation is sketched below

[Diagram: during high I/O from a non-critical application (data mining), workloads such as the print server, online store, and Microsoft Exchange keep their prioritized share of storage I/O with Storage I/O Control, but contend equally without it]
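
The share mechanics behind this can be illustrated with simple proportional math: during contention, each VM's slice of a datastore's throughput is its shares divided by the total shares of the VMs competing on that datastore. The capacity and share values below are placeholders.

    # Proportional allocation during contention; all numbers are placeholders.
    DATASTORE_IOPS = 10000
    vm_shares = {"exchange": 2000, "online-store": 2000, "print-server": 1000, "data-mining": 500}

    total_shares = sum(vm_shares.values())
    for vm, shares in vm_shares.items():
        # Each VM receives throughput in proportion to its share of the total.
        print(f"{vm}: {DATASTORE_IOPS * shares // total_shares} IOPS")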

115
Datastore Clusters
A collection of datastores with shared resources, similar to ESXi host clusters

Allows the datastores to be managed through a shared management interface

Storage DRS can be used to manage the resources and ensure they are balanced

Can be managed by using the following constructs
• Space utilization
• I/O latency load balancing
• Affinity rules for virtual disks

116
Software-Defined Storage
​Software-defined storage is a software construct which is
used by
• Virtual Volumes
• vSAN

​Uses storage policy-based management to assign policies


to virtual machines for storage access
Policies are assigned on a per-disk basis, rather than a per-datastore basis
Key tenet of the software-defined data center

​vSAN is discussed in much greater detail in the VMware


vSAN Knowledge Transfer Kit.

117
Networking

118
Networking
​Networking is also a core resource for vSphere

​Two core types of switches are provided


• Standard virtual switches
– Virtual switch configuration for a single host
• Distributed virtual switches
– Data center level virtual switches that provide a consistent network configuration for virtual machines as they migrate across
multiple hosts

​There are two basic types of connectivity as well


• Virtual machine port groups
• VMkernel port groups
– For IP storage, vSphere vMotion migration, vSphere FT, vSAN, provisioning, and so on
– For the ESXi management network

119
Networking Architecture

[Diagram: VM1, VM2, and VM3 plus the VMkernel management network connect through port groups on a virtual switch, carried on Test VLAN 101, Production VLAN 102, IP Storage VLAN 103, and Management VLAN 104]
120
Network Architecture –
Standard Compared to Distributed
Distributed vSwitch
Standard vSwitch

121
Network Architecture – NIC Teaming and Load Balancing
NIC teaming enables multiple NICs to be connected to a single virtual switch for continued access to networks if hardware fails
• This also enables load balancing (if appropriate)

Load balancing policies (a simplified hash-based example follows)
• Route based on Originating Virtual Port
• Route based on Source MAC Hash
• Route based on IP Hash
• Route based on Physical NIC Load
• Use Explicit Failover Order

Most of the available policies can be configured on any type of switch
• Route based on Physical NIC Load is only available on VMware vSphere Distributed Switch™
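
As a rough illustration of the hash-based policies, the sketch below pins each source MAC address to one uplink of a two-uplink team. The MAC addresses are made up and the hash is deliberately simplistic; it is not the actual vSwitch hashing algorithm.

    # Conceptual source-MAC-hash placement onto a two-uplink team; inputs are made up.
    uplinks = ["vmnic0", "vmnic1"]

    def uplink_for(mac):
        # Fold the MAC's octets into a number, then pick an uplink modulo the team size.
        value = sum(int(octet, 16) for octet in mac.split(":"))
        return uplinks[value % len(uplinks)]

    for mac in ["00:50:56:aa:bb:01", "00:50:56:aa:bb:02", "00:50:56:aa:bb:03"]:
        print(mac, "->", uplink_for(mac))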

122
VMware vSphere Network I/O Control
vSphere Network I/O Control allows traffic to be prioritized during periods of contention
• Brings the compute-style shares/limits to the network infrastructure

Monitors contention on the physical adapters and acts when it exceeds a threshold

Allows important virtual machines or services to have priority access to resources

[Diagram: traffic types sharing a 10 GigE uplink through a virtual switch]

123
Software-Defined Networking
​Software-Defined Networking is a software
construct that allows your physical network to be
treated as a pool of transport capacity, with
network and security services attached to VMs
with a policy-driven approach
​Decouples the network configuration from the
physical infrastructure
​Allows for security and micro-segmentation of
traffic
Key tenet of the software-defined data center (SDDC)

124
vSphere with Tanzu
vSphere 7 Update 1

125
Simplified Deployment and Consumption
vSphere with Tanzu

[Diagram: VCF with Kubernetes evolves into VCF with Tanzu and vSphere with Tanzu. Namespaces host DB and Analytics, AI/ML, Business Critical, and Time-critical workloads. The vSphere with Tanzu Services / VMware Cloud Foundation Services layer provides the Tanzu Kubernetes Grid Service, vSphere Pod Service, Registry Service, Network Service, and Storage Services, consumed by Developers and IT Operators and built on vRealize, vSphere, vSAN, vDS, and NSX.]

126


Drop-In Enterprise-Grade Kubernetes Into Existing Infrastructure
Deliver Developer-Ready Infrastructure

Benefit
Configure an enterprise-grade Kubernetes infrastructure on your choice of networking, storage, and load balancing solutions

vSphere with Tanzu
The FASTEST way to get started with Kubernetes workloads

127
Building on the Best
vSphere with Tanzu Architecture

[Diagram: Tanzu Kubernetes Clusters (control-plane and worker VMs hosting namespaces and pods) run inside namespaces on the Supervisor Cluster, which provides the VM Operator, Cluster API, and Tanzu Kubernetes Grid components on top of the SDDC.]

128


vSphere With Tanzu: Drop-In To Existing Infrastructure
In vSphere 7 Update 1

[Diagram: cluster services (Tanzu Kubernetes Grid | vSphere Pods | Networks | Volumes | Registry) run on a vSphere Distributed Switch with Management, Frontend, and Workload portgroups. Three K8S Control Plane VMs and an HA Proxy front the TKG cluster nodes, which are spread across the cluster's ESXi hosts.]

129
Announcing vSphere With Tanzu
The simplest implementation of Kubernetes, brought to the fingertips of millions of IT admins

VCF with Tanzu (available today): The BEST way to run Kubernetes workloads at scale
vSphere with Tanzu (NEW): The FASTEST way to get started with Kubernetes workloads

130


vSphere Clustering Services
(vCLS)

131
vSphere Clustering Services (vCLS)
Distributing the vSphere Cluster Services

Challenge:
Many vSphere services, such as HA and DRS, depend on vCenter Server being available in order to operate

Solution:
Separate and distribute the control plane components for vSphere Clustering Services as small-footprint 'agent VMs' (the vCLS control plane) running on the cluster's ESXi hosts
• First step of many, in a good direction
• Begins with DRS in vSphere 7 U1

132
AMD SEV-ES
vSphere 7 Update 1

133
Wider Interest in Infrastructure Security
High-profile hardware issues mean people are asking the right questions

vSphere Admins: "We put a lot of trust in the infrastructure." "I want them to know I cannot see inside their workloads!"

Workload Admins, CISO, Risk and Compliance, Auditors: "How do we know the vSphere Admin isn't watching us?" "How do we limit exposure and risk? We'd like to add defense-in-depth." "How can we assure our customers of privacy?"

134
vSphere Isolation Protects Workloads From Each Other
ESXi Defense-in-Depth & Least Privilege, enhanced with SEV-ES

[Diagram: each VM (guest OS, VM runtime, sandbox) on ESXi is encrypted with its own key (Encryption Key A, B, C) managed by the AMD Secure Processor alongside the CPU and memory.]

135
Security is Always a Tradeoff
…but it's very nice to have great options

AMD SEV-ES

Considerations
• Requires AMD EPYC 7xx2 CPUs
• Requires guest OS support
• vMotion, memory snapshots, hot-add, suspend/resume, Fault Tolerance, clones, and guest integrity are not supported
• Supports SEV-ES (memory encryption + encrypted register state), not just SEV

Benefits
• Workloads gain deep data-in-use protections without modification
• Coexists with other workloads
• Containers and modern applications (Tanzu) make most operational considerations invisible
• Easy to enable and operate (PowerCLI command for the VM)

136


VMware vSphere Trust Authority
vSphere 7 Update 1

137
Establishing Trust in Hardware Can Be Troublesome
vSphere Trust Authority

Securing your infrastructure used to require a complicated series of actions to be performed on each host.

138
Establishing Trust in vSphere 6.7
A great start

• vCenter Server handles attestation
• vCenter Server handles secrets
• vCenter Server running in a VM inside the cluster
• No repercussions for failing attestation

[Diagram: vCenter Server Key Provider and the ESXi hosts]

139
vSphere Trust Authority Automates & Enforces The Rules
Secure Infrastructure at Scale

• Attestation is a prerequisite for access to secrets
• KMS credentials are sealed to host state
• Trusted hosts manage keys and KMS connections
• Can encrypt vCenter Server
• Gain isolation

[Diagram: vCenter Server Key Provider, the ESXi hosts, and the vTA hosts running the attestation service]

140
Improvements to vSphere Trust Authority
Based on customer feedback

• UI and UX Improvements
• Better "Day 2" Cluster Operations
• Improved Reporting and Alerting

141
Virtual Disk Development Kit
(VDDK)
vSphere 7 Update 1

142
VDDK Improvements
NIOC resource pool for backup network traffic
Network I/O Control helps prioritize and resolve conflicts when network traffic competes for
resources. Now backup traffic can be prioritized as well.

Improved backup job resiliency


Automatic switching of backup jobs to alternate hosts when the original host
enters maintenance mode, reducing the chance of job failure.

NBD (Network Block Device) updates


vCenter Server 7 Update 1 uses hostd instead of vpxa to manage NBD connections. The
recommended number of NBD connections to one host for
parallel backups remains at 50 or less.

143


Technical Walk Through
vSphere 7.0.x

144
Agenda
VMware ESXi™
Virtual Machines
VMware vCenter Server™
VMware vSphere vMotion®
Availability
VMware Certificate Authority (CA)
Storage
Networking
Lifecycle Manager (vLCM)
EVC for Graphics
Paravirtual RDMA

145


Technical Walk-Through
​The technical walk-through expands on the architectural presentation to provide more detailed technical
best practice and troubleshooting information for each topic
This is not comprehensive coverage of each topic

If you require more detailed information, consult the VMware vSphere documentation (https://docs.vmware.com/en/VMware-vSphere/index.html); VMware Global Support Services might also be of assistance

146
ESXi

147
Components of ESXi
​The ESXi architecture comprises the underlying operating system, called the VMkernel, and processes that
run on top of it
​VMkernel provides a means for running all processes on the system, including management applications
and agents as well as virtual machines
​It has control of all hardware devices on the server and manages resources for the applications

​The main processes that run on top of VMkernel are


• Direct Console User Interface (DCUI)
• Virtual Machine Monitor (VMM)
• VMware Agents (hostd, vpxa)
• Common Information Model (CIM) System

148
Components of ESXi (cont.)
​Direct Console User Interface
• Low-level configuration and management interface, accessible through the console of the server, used primarily
for initial basic configuration

​Virtual Machine Monitor


• Process that provides the execution environment for a virtual machine, as well as a helper process known as
VMX. Each running virtual machine has its own VMM and VMX process

​VMware Agents (hostd and vpxa)


• Used to enable high-level VMware Infrastructure™ management from remote applications

​Common Information Model System


• Interface that enables hardware-level management from remote applications through a set of standard APIs

149
ESXi Deep Dive
​VMkernel
• A POSIX-like operating system developed by VMware, which provides certain functionality similar to that found
in other operating systems, such as process creation and control, signals, file system, and process threads
• Designed specifically to support running multiple virtual machines and provides core functionality such as
– Resource scheduling
– I/O stacks
– Device drivers
• Some of the more pertinent aspects of the VMkernel are presented in the following sections

150
ESXi Deep Dive (cont.)
​File System
• VMkernel uses a simple in-memory file system to hold the ESXi Server configuration files, log files, and staged
patches
• The file system structure is designed to be the same as that used in the service console of traditional ESX Server.
For example
– ESX Server configuration files are found in /etc/vmware
– Log files are found in /var/log/vmware
– Staged patches are uploaded to /tmp
• This file system is independent of the VMware vSphere VMFS file system used to store virtual machines
• The in-memory file system does not persist when the power is shut down. Therefore, log files do not survive a
reboot if no scratch partition is configured
• ESXi has the ability to configure a remote syslog server and remote dump server, enabling you to save all log
information on an external system

151
ESXi Deep Dive (cont.)
​User Worlds
• The term user world refers to a process running in the VMkernel operating system. The environment in which a user world runs is limited compared to what is found in a general-purpose POSIX-compliant operating system such as Linux
– The set of available signals is limited
– The system API is a subset of POSIX
– The /proc file system is very limited
• A single swap file is available for all user world processes. If a local disk exists, the swap file is created
automatically in a small VFAT partition. Otherwise, the user is free to set up a swap file on one of the attached
VMFS datastores
• Several important processes run in user worlds. Think of these as native VMkernel applications. They are
described in the following sections

152
ESXi Deep Dive (cont.)
​Direct Console User Interface (DCUI)
• DCUI is the local user interface that is displayed only on the console of an ESXi system
• It provides a BIOS-like, menu-driven interface for interacting with the system. Its main purpose is initial
configuration and troubleshooting
• The DCUI configuration tasks include
– Set administrative password
– Set Lockdown mode (if attached to VMware vCenter™)
– Configure and revert networking tasks
• Troubleshooting tasks include
– Perform simple network tests
– View logs
– Restart agents
– Restore defaults

153
ESXi Deep Dive (cont.)
​Other User World Processes
• Agents used by VMware to implement certain management capabilities have been ported from running in the
service console to running in user worlds
– The hostd process provides a programmatic interface to VMkernel, and it is used by direct VMware vSphere Client™
connections as well as APIs. It is the process that authenticates users and keeps track of which users and groups have which
privileges
– The vpxa process is the agent used to connect to vCenter. It runs as a special system user called vpxuser. It acts as the
intermediary between the hostd agent and vCenter Server
– The FDM agent used to provide vSphere High Availability capabilities has also been ported from running in the service
console to running in its own user world
– A syslog daemon runs as a user world. If you enable remote logging, that daemon forwards all log files to the remote target in
addition to putting them in local files
– A process that handles initial discovery of an iSCSI target, after which point all iSCSI traffic is handled by the VMkernel, just
as it handles any other device driver

154
ESXi Deep Dive (cont.)
​Open Network Ports – A limited number of network ports are open on ESXi. The most important ports and
services are
• 80 – This port serves a reverse proxy that is open only to display a static Web page that you see when browsing to
the server. Otherwise, this port redirects all traffic to port 443 to provide SSL-encrypted communications to the
ESXi Server
• 443 (reverse proxy) – This port also acts as a reverse proxy to a number of services to provide SSL-encrypted
communication to these services. The services include API access to the host, which provides access to the
RCLIs, the vSphere Client, vCenter Server, and the SDK
• 5989 – This port is open for the CIM server, which is an interface for third-party management tools
• 902 – This port is open to support the older VIM API, specifically the older versions of the vSphere Client and vCenter
• Many other ports, depending on what is configured (vSphere High Availability, vSphere vMotion, and so on), have their own port requirements, but these are only opened if the corresponding services are configured
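
A quick way to confirm these listeners from an administration workstation is a plain TCP connect test. The sketch below tries each of the ports listed above against a placeholder host name (esxi01.example.com); a successful connect only shows that something is listening, not that the service behind it is healthy.

    import socket

    HOST = "esxi01.example.com"   # placeholder ESXi host name
    PORTS = {80: "HTTP redirect", 443: "reverse proxy / API", 902: "VIM API", 5989: "CIM server"}

    for port, role in PORTS.items():
        try:
            with socket.create_connection((HOST, port), timeout=3):
                state = "open"
        except OSError:
            state = "closed or filtered"
        print(f"{port:5} ({role}): {state}")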

155
ESXi Troubleshooting
​Troubleshooting ESXi is very much the same as any operating system

​Start by narrowing down the component which is causing the problem

​Next review the logs as required to narrow down the issue


• Common log files are as follows
– /var/log/auth.log: ESXi Shell authentication success and failure
– /var/log/esxupdate.log: ESXi patch and update installation logs
– /var/log/hostd.log: Host management service logs, including virtual machine and host Task and Events,
communication with the vSphere Client and vCenter Server vpxa agent, and SDK connections
– /var/log/syslog.log: Management service initialization, watchdogs, scheduled tasks and DCUI use
– /var/log/vmkernel.log: Core VMkernel logs, including device discovery, storage and networking device and driver
events, and virtual machine startup
– /var/log/vmkwarning.log: A summary of Warning and Alert log messages excerpted from the VMkernel logs
– /var/log/vmksummary.log: A summary of ESXi host startup and shutdown, and an hourly heartbeat with uptime,
number of virtual machines running, and service resource consumption
– /var/log/vpxa.log: vCenter Server vpxa agent logs, including communication with vCenter Server and the Host
Management hostd agent
– /var/log/fdm.log: vSphere High Availability logs, produced by the FDM service

156
ESXi Best Practices
For in-depth ESXi and other component practices, read the Performance Best Practices Guide (https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/performance/vsphere-esxi-vcenter-server-67-performance-best-practices.pdf)
​Always set up the VMware vSphere Syslog Collector (Windows) / VMware Syslog Service (Appliance) to remotely
collect and store the ESXi log files
​Always set up the VMware vSphere ESXi Dump Collector Service to allow dumps to be remotely collected in the case
of a VMkernel failure
​Ensure that only the firewall ports required by running services are enabled in the Security profile

​Ensure the management network is isolated from the general network (VLAN) to decrease the attack surface of the
hosts
​Ensure the management network has redundancy through NIC Teaming or by having multiple management interfaces

​Ensure that the ESXi Shell and SSH connectivity are not permanently enabled

157
VM Troubleshooting and Best
Practices

158
Virtual Machine Troubleshooting
​Virtual machines run as processes on the ESXi host

​Troubleshooting is split into two categories


• Inside the Guest OS – Standard OS troubleshooting should be used, including the OS-specific log files
• ESXi host level troubleshooting – Concerning the virtual machine process, where the log file for the virtual
machine is reviewed for errors

ESXi host virtual machine log files are located, by default, in the directory in which the virtual machine runs, and are named vmware.log
​Generally issues occur as a result of a problem in the guest OS
• Host level crashes of the VM processes are relatively rare and are normally a result of hardware errors or
compatibility of hardware between hosts

159
Virtual Machine Best Practices
​Virtual machines should always run VMware Tools™ to ensure that the correct drivers are installed for virtual hardware

​Right-size VMs to ensure that they use only required hardware. If VMs are provisioned with an over-allocation of
resources that are not used, ESXi host performance and capacity is reduced
​Any devices not being used should be disconnected from VMs (CD-ROM/DVD, floppy, and so on)

​If NUMA is used on ESXi, VMs should be right-sized to the size of the NUMA nodes on the host to avoid performance
loss
​VMs should be stored on shared storage to allow for the maximum vSphere vMotion compatibility and vSphere High
Availability configurations in a cluster
​Memory/CPU reservations should not be used regularly because they reserve the resource and can prevent the VMware
vSphere Hypervisor from being able to take advantage of over commitment technologies
VM partitions should be aligned to the storage array partition alignment

​Storage and Network I/O Control can dramatically help VM performance in times of contention

160
vCenter Server

161
vCenter Server 7.0

• Sufficient for most environments
• Easiest to maintain and deploy
• Multiple embedded PSCs can be linked together with Enhanced Linked Mode
• Available as an appliance only

[Diagram: each virtual machine or server runs vCenter Server with an embedded Platform Services Controller; multiple such instances can be linked together]
162
vCenter Appliance Deployment
•Changed Significantly!
•Installer support for Windows, Mac, and Linux
•Updated menu: Install, Upgrade, Migrate, Restore
•No longer supports external databases!
•VMware vSphere Update Manager included
•vCenter Appliance (incl. PSC Install) is a two stage process
• Stage 1 – Deploy OVF
• Stage 2 - Configuration

Benefits to 2-Stage Deployment


• Improved validations and checks
• Manual snapshot between stages for rollback
• Create a template for additional deployments

163
vCenter Appliance Migration – 7.0

​Adds support for migrating Windows vCenter 6.x to the vCenter 7.0 Appliance
• Windows is no longer a supported architecture; therefore, migration is required.

​Migrations for both embedded and external topologies


• External topologies not supported in vSphere 7.0.

​vSphere Update Manager included with the appliance installation

​Assumes the identity of the source vCenter (UUID, IP, OS Name)

​Migration Assistant pre-checks

​Option to select historical and performance data

164
vCenter Best Practices
Verify that vCenter Server (including its Platform Services Controller services) and its database have adequate CPU, memory, and disk resources available
Verify that the proper inventory size is configured during the installation
Minimize latency by minimizing network hops between vCenter Server and the components it communicates with
For legacy vSphere 6.x deployments, external databases were recommended for large environments and external Platform Services Controllers were recommended with Enhanced Linked Mode; in vSphere 7.0 only the embedded appliance, with its embedded database and services, is supported
Verify that DNS is configured and functional for all components
Verify that time is correct on vCenter Server and all other components in the environment
For legacy Windows-based deployments, VMware vSphere Update Manager™ should be installed on a separate system if the inventory is large; in vSphere 7.0 it is part of the appliance

165
vSphere vMotion

166
vSphere vMotion
Troubleshooting and Best Practices

167
vSphere vMotion and vSphere Storage vMotion Troubleshooting
​vSphere vMotion and vSphere Storage vMotion are some of the best logged features in vSphere

​Each migration that occurs has a unique Migration ID (MID) that can be used to search logs for the vSphere
vMotion and vSphere Storage vMotion
• MIDs look as follows: 1295599672867508

​Each time a vSphere vMotion and vSphere Storage vMotion is attempted, all logs can be reviewed to find the
error using grep and searching for the term Migrate
​Both the source and the destination logs should be reviewed

​The following is a list of common log files and errors


• VMKernel.log – VMkernel logs usually contain storage or network errors (and possibly vSphere vMotion and
vSphere Storage vMotion timeouts)
• hostd.log – contains interactions between vCenter and ESXi
• vmware.log – virtual machine log file which will show issues with starting the virtual machine processes
• vpxd.log – vSphere vMotion as seen from vCenter normally shows a timeout or other irrelevant data because the
errors are occurring on the host itself
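
In practice this is a grep for the Migration ID across the source and destination log bundles. The short helper below does the same filtering in Python; the bundle paths are placeholders, and the Migration ID is the one used in the examples that follow.

    from pathlib import Path

    MID = "1295599672867508"   # Migration ID to trace (value from the example below)
    BUNDLES = {"source": Path("source-host/var/log"),
               "destination": Path("dest-host/var/log")}   # placeholder extracted bundles

    for side, logdir in BUNDLES.items():
        for logfile in ("vmkernel.log", "hostd.log", "vpxd.log", "vmware.log"):
            path = logdir / logfile
            if not path.exists():
                continue
            with path.open(errors="replace") as f:
                for line in f:
                    if MID in line:   # same effect as: grep 1295599672867508 <file>
                        print(f"[{side}] {logfile}: {line.rstrip()}")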

168
vSphere vMotion Troubleshooting – Example vmkernel.log – Source

Note: the Migration ID (1295599672867508) appears in every entry; the "S:" prefix indicates the source host.

2016-10-21T16:47:04.555Z cpu0:305224)Migrate: vm 305226: InitMigration:3215: Setting VMOTION info: Source ts = 1295599672867508, src ip = <10.0.0.3> dest ip = <10.0.0.1> Dest wid = 407254 using SHARED swap

2016-10-21T16:47:04.571Z cpu0:305224)Migrate: StateSet:158: 1295599672867508 S: Changing state from 0 (None) to 1 (Starting migration off)

2016-10-21T16:47:04.572Z cpu0:305224)Migrate: StateSet:158: 1295599672867508 S: Changing state from 1 (Starting migration off) to 3 (Precopying memory)

2016-10-21T16:47:04.587Z cpu1:3589)Migrate: VMotionServerReadFromPendingCnx:192: Remote machine is ESX 4.0 or newer.

2016-10-21T16:47:05.155Z cpu1:588763)VMotionSend: PreCopyStart:1294: 1295599672867508 S: Starting Precopy, remote version 327683

2016-10-21T16:47:07.985Z cpu1:305226)VMotion: MemPreCopyIterDone:3927: 1295599672867508 S: Stopping pre-copy: only 156 pages left to send, which can be sent within the switchover time goal of 1.000 seconds (network bandwidth ~44.454 MB/s, 51865% t2d)

2016-10-21T16:47:07.991Z cpu1:305226)VMotion: PreCopyDone:3259:

169
vSphere vMotion Troubleshooting – Example vmkernel.log – Destination

Note: the Migration ID is the same on the destination (1295599672867508), and "D:" marks messages logged on the destination host.

2016-10-21T16:45:35.156Z cpu1:409301)Migrate: vm 407254: InitMigration:3215: Setting VMOTION info: Dest ts = 1295599672867508, src ip = <10.0.0.3> dest ip = <10.0.0.1> Dest wid = 0 using SHARED swap
2016-10-21T16:45:35.190Z cpu1:409301)Migrate: StateSet:158: 1295599672867508 D: Changing state from 0 (None) to 2 (migration on)
2016-10-21T16:45:35.432Z cpu0:3556)Migrate: VMotionServerReadFromPendingCnx:192: Remote machine is ESX 4.0 or newer.
2016-10-21T16:45:36.101Z cpu1:409308)VMotionRecv: PreCopyStart:416: 1295599672867508 D: got MIGRATE_MSG_PRECOPY_START
2016-10-21T16:45:36.101Z cpu1:409308)Migrate: StateSet:158: 1295599672867508 D: Changing state from 2 (Starting migration on) to 3 (Precopying memory)
2016-10-21T16:45:38.831Z cpu0:409308)VMotionRecv: PreCopyEnd:466: 1295599672867508 D: got MIGRATE_MSG_PRECOPY_END
2016-10-21T16:45:38.831Z cpu0:409308)VMotionRecv: PreCopyEnd:478: 1295599672867508 D: Estimated network bandwidth 44.611 MB/s during pre-copy
2016-10-21T16:45:38.917Z cpu0:409308)Migrate: StateSet:158: 1295599672867508 D: Changing state from 3 (Precopying memory) to 5 (Transferring cpt data)
2016-10-21T16:45:39.070Z cpu0:409308)Migrate: StateSet:158: 1295599672867508 D: Changing state from 5 (Transferring cpt data) to 6 (Loading cpt data)

170
vSphere vMotion Best Practices
​ESXi host hardware should be as similar as possible to avoid failures

​VMware Virtual Machine Hardware compatibility is important to avoid failures as newer hardware
revisions cannot be run on older ESXi hosts
​10 Gb networking will improve vSphere vMotion performance

​vSphere vMotion networking should be segregated from other traffic to prevent saturation of network links (a connectivity check is shown after this list)

​Multiple network cards can be configured for vSphere vMotion VMkernel networking to improve
performance of migrations
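A simple connectivity check for the vMotion VMkernel network, run from the source ESXi host; the vmknic name and peer address below are placeholders for this environment:

# Identify the vMotion-enabled VMkernel interface
esxcli network ip interface list
# Ping the destination host's vMotion address through that interface; -d sets "do not fragment",
# and -s 8972 can be used instead of 1472 to validate jumbo frames end to end
vmkping -I vmk1 -d -s 1472 10.0.0.1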

171
vSphere Storage vMotion Best Practices
​If vSphere Storage vMotion traffic takes place on storage that might also have other I/O loads (from other VMs on the
same ESXi host or from other hosts), it can further reduce the available bandwidth, so it should be done during times
when there will be less impact
​vSphere Storage vMotion will have the highest performance during times of low storage activity (when available
storage bandwidth is highest) and when the workload in the VM being moved is least active
​vSphere Storage vMotion can perform up to four simultaneous disk copies per vSphere Storage vMotion operation.
However, vSphere Storage vMotion will involve each datastore in no more than one disk copy at any one time. This
means, for example, that moving four VMDK files from datastore A to datastore B will occur serially, but moving four
VMDK files from datastores A, B, C, and D to datastores E, F, G, and H will occur in parallel
​For performance-critical vSphere Storage vMotion operations involving VMs with multiple VMDK files, you can use
anti-affinity rules to spread the VMDK files across multiple datastores, thus ensuring simultaneous disk copies
​vSphere Storage vMotion will often have significantly better performance on vStorage APIs for Array Integration
(VAAI)-Capable storage arrays

172
Availability
vSphere High Availability

173
vSphere High Availability Deep Dive
​In the vSphere High Availability architecture, each host in the cluster runs an FDM agent

​The FDM agents do not use vpxa and are completely decoupled from it

​The agent (or FDM) on one host is the master, and the agents on all other hosts are its slaves

​When vSphere High Availability is enabled, all FDM agents participate in an election to choose the master

​The agent that wins the election becomes the master

​If the host that is serving as the master subsequently fails, is shut down, or needs to abdicate its role, a new master election is held

174
vSphere High Availability Deep Dive – Role of the Master
​A master monitors ESXi hosts and VM availability

​A master will monitor slave hosts and it will restart VMs in the event of a slave host failure

​It manages the list of hosts that are members of the cluster and manages adding and removing hosts from
the cluster
​It monitors the power state of all the protected VMs, and if one should fail, it will restart the VM

​It manages the list of protected VMs and updates this list after each user-initiated power on or power off

​It sends heartbeats to the slaves so the slaves know the master is alive

​It caches the cluster configuration and informs the slaves of changes in configuration

​A master reports state information to vCenter through property updates

175
vSphere High Availability Deep Dive – Role of the Slave
​A slave monitors the runtime state of the VMs running locally and forwards significant state changes to the
master
​It implements vSphere High Availability features that do not require central coordination, most notably VM
health monitoring
​It monitors the health of the master, and if the master should fail, it participates in a new master election

176
vSphere High Availability Deep Dive – Master and Slave Summary Views

(Screenshots: the cluster summary as reported by the master host and by a slave host)

177
vSphere High Availability Deep Dive – Master Election
​A master is elected when the following conditions occur
• vSphere High Availability is enabled
• A master host fails
• A management network partition occurs

​The following algorithm is used for selecting the master


• If a host has the greatest number of datastores, it is the best host
• If there is a tie, then the host with the lexically highest moid is chosen. For example moid "host-99" would be
higher than moid "host-100" since 9 is greater than 1

​After a master is elected and contacts vCenter, vCenter sends a compatibility list to the master, which saves it to its local disk and then pushes it out to the slave hosts in the cluster
​vCenter normally talks only to the master. It will sometimes talk to FDM agents on other hosts, especially if the master reports that it cannot reach a slave agent; vCenter then tries to contact that host directly to determine why

178
vSphere High Availability Deep Dive – Partitioning
​Under normal operating conditions, there is only one master

​However, if a management network failure occurs, a subset of the hosts might become isolated from the rest, meaning they cannot communicate with the other hosts in the cluster over the management network
​In such a situation, when those hosts can still ping the isolation response IP address but not the other hosts, FDM considers the cluster network partitioned
​Each partition without an existing master will elect a new one

​Thus, a partitioned cluster state will have multiple masters, one per partition

​However, vCenter can report on only one master at a time, so you might see details for only one partition – the one whose master vCenter finds first
​When a network partition is corrected, one of the masters will take over from the others, thus reverting
back to a single master

179
vSphere High Availability Deep Dive – Isolation
​In some ways this is similar to a network partition state, except that a host can no longer ping the default
gateway/isolation IP address
​In this case, a host is called network isolated

​The host has the ability to inform the master that it is in this isolation state, through files on the heartbeat
datastores, which will be discussed shortly
​The Host Isolation Response setting is then checked to determine whether the VMs on this host should be shut down or left powered on
​If they are powered off, they can be restarted on other hosts in the cluster

180
vSphere High Availability Deep Dive – Virtual Machine Protection
​The master is responsible for restarting any protected VMs that fail

​The trigger to protect a VM is the master observing that the power state of the VM changes from powered
off to powered on
​The trigger to unprotect a VM is the master observing the VM’s power state changing from powered on to powered off
​After the master protects the VM, the master will inform vCenter that the VM has been protected, and
vCenter will report this fact through the vSphere High Availability Protection runtime property of the VM

181
HA Troubleshooting and Best
Practices

182
vSphere High Availability Troubleshooting
​Troubleshooting vSphere High Availability since vSphere 5.x is greatly simplified
• Agents were upgraded from using a third party component to using a component built by VMware called Fault
Domain Manager (FDM)

​A single log file, fdm.log, now exists for communication of all events related to vSphere High
Availability
​When troubleshooting a vSphere High Availability failure, be sure to collect logs from all hosts in the
cluster
• This is because when a vSphere High Availability event occurs, VMs might be moved to any host in the cluster.
To track all events, the FDM log for each host (including the master host) is required

​The fdm.log file should be the first place to look for the following issues (see the example after this list)


• Partitioning issues
• Isolation issues
• VM protection issues
• Election issues
• Failure to failover issues
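A minimal example of pulling vSphere HA events from a host's FDM log; repeat it on every host in the cluster, including the master:

# fdm.log lives under /var/log on each ESXi host
grep -iE "election|isolat|partition|failover" /var/log/fdm.log | less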
183
vSphere High Availability Best Practices
​Networking
• When performing network maintenance, suspend vSphere High Availability host monitoring so that the maintenance is not treated as a failure
• When changing the networking configuration, always reconfigure vSphere High Availability afterwards
• Specify which networks are used for vSphere High Availability communication. By default, this is the management network
• Specify isolation addresses as appropriate for the cluster if the default gateway does not allow ICMP pings (see the example after this list)
• Network paths should be redundant to avoid host isolation events
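A sketch of the related cluster advanced options (vSphere HA > Advanced Options); the addresses shown are placeholders for redundant gateways in this environment:

das.usedefaultisolationaddress = false    # do not use the default gateway as the isolation address
das.isolationaddress0 = 192.168.10.1      # first additional isolation address
das.isolationaddress1 = 192.168.20.1      # second isolation address on another network path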

184
vSphere High Availability Best Practices (cont.)
​Interoperability
• Do not mix versions of ESXi in the same cluster
• When vSAN is enabled, vSphere High Availability uses the vSAN network for its heartbeats rather than the management network
• When enabling vSAN, disable vSphere High Availability first and re-enable it afterwards

​Admission Control
• Select the policy that best matches the need in the environment
• Do not disable admission control or VMs might not all be able to fail over if an event occurs
• Size hosts equally to prevent imbalances

185
Availability
vSphere FT

186
vSphere FT Troubleshooting
​vSphere FT has been completely rewritten in vSphere 6.x and beyond

​Now, CPU compatibility is the same as vSphere vMotion compatibility because the same technology is
used to ship memory, CPU, storage, and network states across to the secondary virtual machine
​When troubleshooting
• Get logs for both primary and secondary VMs and hosts
• Grab logs before log rotation
• Ensure time is synchronized on all hosts

​When reviewing the configuration, you should find both primary and secondary VMX logs in the primary
VMs directory
• They will be named vmware.log and vmware-snd.log

​Also, be sure to review vmkernel.log and hostd.log from both the primary and secondary hosts for
errors

187
vSphere FT Troubleshooting – General Things To Look For (vmkernel, vmx)
​2016-10-17T18:12:25.892Z cpu3:35660)FTCpt: 2401: (1389982345707340120 pri) Primary init: nonce 2791343341
​2016-10-17T18:12:25.892Z cpu3:35660)FTCpt: 2440: (1389982345707340120 pri) Setting allowedDiffCount = 64
​2016-10-17T18:12:25.892Z cpu3:35660)FTCpt: 1217: Queued accept request for ftPairID 1389982345707340120
​2016-10-17T18:12:25.892Z cpu3:35660)FTCpt: 2531: (1389982345707340120 pri) vmx 35660 vmm 35662
​2016-10-17T18:12:25.892Z cpu1:32805)FTCpt: 1262: (1389982345707340120 pri) Waiting for connection

vSphere FT messages are prefixed with “FTCpt:”

Like vSphere vMotion, each vSphere FT session has a unique identifier (the vSphere FT ID), taken from the migration ID that started it

The role of the VM is either “pri” (primary) or “snd” (secondary)

188
vSphere FT Troubleshooting – Legacy vSphere FT or vSphere FT?

vmware.log file

• Search for: “ftcpt.enabled” (a grep example follows this list)
• If present and set to “TRUE”: vSphere FT
• Otherwise, legacy vSphere FT
• Important for triaging failures
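A minimal check, assuming SSH access to the host running the primary VM; the datastore and VM directory names are placeholders:

# Which FT flavor is the VM configured for?
grep -i "ftcpt.enabled" /vmfs/volumes/<datastore>/<vm>/vmware.log
# FT state messages logged by the host
grep -i "FTCpt" /var/log/vmkernel.log | less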

189
vSphere FT Troubleshooting – Has vSphere FT Started?

vmkernel.log
• 2016-10-17T14:32:13.607Z cpu5:89619)FTCpt: 3831:
(1389969072618873992 pri) Start stamp: 2016-10-
17T14:32:13.607Z nonce 409806199
•…
• 2016-10-17T14:46:23.860Z cpu2:89657)FTCpt: 9821:
(1389969072618873992 pri) Last ack stamp: 2016-10-
17T14:46:15.639Z nonce 409806199
vmware.log
• 2016-10-21T22:56:01.635Z| vcpu-0| I120: FTCpt:
Activated ftcpt in VMM.
If you do not see these, vSphere FT may not have started

Check for XvMotion migration errors

190
vSphere FT Best Practices
​Hosts running primary and secondary VMs should run at approximately the same processor frequency to
avoid errors
• Homogeneous clusters work best for vSphere FT
​All hosts should have
• Common access to datastores used by VMs
• The same virtual network configuration
• The same BIOS settings (power management, hyper-threading, and so on)
​FT logging networks should be configured with 10 Gb networking connections
​Jumbo frames can also help vSphere FT performance
​Network configuration recommendations
• Distribute each NIC team over two physical switches
• Use deterministic teaming policies to ensure network traffic affinity
​ISOs should be stored on shared storage

191
Availability
vSphere Distributed Resource Scheduler

192
DRS
Troubleshooting and Best Practices

193
DRS Troubleshooting
​DRS uses a proprietary algorithm to assess and determine resource usage and to determine which hosts to
balance VMs to
​DRS primarily uses vSphere vMotion to facilitate movements
• Troubleshooting failures generally consists of figuring out why vSphere vMotion failed, not DRS itself, because the algorithm simply follows resource utilization

​Ensure the following


• vSphere vMotion is enabled and configured
• The migration aggressiveness is set appropriately
• DRS is set to Fully Automated if approvals are not needed for migrations

​To test DRS, select the Run DRS option in the vSphere Client, which will generate recommendations
​Failures can be assessed and corrected at that time

194
DRS Best Practices
​Hosts should be as homogeneous as possible to ensure predictability of DRS placements

​vSphere vMotion should be compatible for all hosts or DRS will not function

​The more hosts available, the better DRS functions because there are more options for available placement
of VMs
​VMs that have a smaller CPU/RAM footprint provide more opportunities for placement across hosts

​DRS Automatic mode should be used to take full benefit of DRS

​Idle VMs can affect DRS placement decisions

​DRS anti-affinity rules should be used to keep VMs apart, such as in the case of a load-balanced configuration providing high availability

195
VMware Certificate Authority

196
VMware CA – Management Tools
​A set of CLIs is available for managing VMware CA, the VMware Endpoint Certificate Store, and the VMware Directory Service (example invocations appear at the end of this slide)
​certool
• Use to generate private keys and public keys
• Use to request a certificate
• Use to promote a plain certificate server to a Root CA

​dir-cli
• Use to create/delete/list/manage solution users in VMDirectory

​vecs-cli
• Use to create/delete/list/manage key stores in VMware Endpoint Certificate Store
• Use to create/delete/list/manage private keys and certificates in the key stores
• Use to manage the permissions on the key stores
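A few example invocations, run from the vCenter Server appliance shell; the SSO account shown is only an example:

# List the certificate stores and inspect the machine SSL certificate
/usr/lib/vmware-vmafd/bin/vecs-cli store list
/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store MACHINE_SSL_CERT --text | less
# List the solution users/services registered in the VMware Directory Service
/usr/lib/vmware-vmafd/bin/dir-cli service list --login administrator@vsphere.local
# Generate a key pair and a CSR with certool (output paths are examples)
/usr/lib/vmware-vmca/bin/certool --genkey --privkey=/tmp/example.key --pubkey=/tmp/example.pub
/usr/lib/vmware-vmca/bin/certool --gencsr --privkey=/tmp/example.key --pubkey=/tmp/example.pub --csrfile=/tmp/example.csr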

197
VMware CA – Management Tools (cont.)
By default, the tools are in the following locations

Platform: Linux
Location:
/usr/lib/vmware-vmafd/bin/vecs-cli
/usr/lib/vmware-vmafd/bin/dir-cli
/usr/lib/vmware-vmca/bin/certool

198
certool Configuration File
​certool uses a configuration file called certool.cfg
• Override the file with --config=<file name>, or override individual values on the command line, for example --Locality=“Cork”

OS Location
VCSA /usr/lib/vmware-vmca/share/config

certool.cfg
Country = US
Name= cert
Organization = VMware
OrgUnit = Support
State = California
Locality = Palo Alto
IPAddress = 127.0.0.1
Email = ca@vmware.com
Hostname = machine.vmware.com

199
Machine SSL Certificates
​The SSL certificates for each node, also called machine certificates, are used to establish a socket that
allows secure communication. For example, using HTTPS or LDAPS
​During installation, VMware CA provisions each machine (vCenter / ESXi) with an SSL certificate
• Used for secure connections to other services and for other HTTPS traffic

​The machine SSL certificate is used as follows


• By the reverse proxy service on each Platform Services Controller node
– SSL connections to individual vCenter services always go to the reverse proxy; traffic does not go to the services themselves
• vCenter service on Management and Embedded nodes
• By the VMware Directory Service on PSC and Embedded nodes
• By the ESXi host for all secure connections

200
Solution User Certificates
​Solution user certificates are used for authentication to vCenter Single Sign-On
• vCenter Single Sign-On issues the SAML tokens that allow services and other users to authenticate

​Each solution user must be authenticated to vCenter Single Sign-On


• A solution user encapsulates several services and uses the certificates to authenticate with vCenter Single Sign-On
through SAML token exchange

​The Security Assertion Markup Language (SAML) token contains group membership information so that the token can be used for authorization operations
​Solution user certificates enable the solution user to use any other vCenter service that vCenter Single
Sign-On supports without authenticating

201
Certificate Deployment Options
​VMware CA Certificates
• You can use the certificates that VMware CA assigned to vSphere components as is
– These certificates are stored in the VMware Endpoint Certificate Store on each machine
– VMware CA is a Certificate Authority, but because all certificates are signed by VMware CA itself, the certificates do not
include a certificate chain

​Third-Party Certificates with VMware CA


• You can use third-party certificates with VMware CA
– VMware CA becomes an intermediary in the certificate chain that the third-party certificate is using
– VMware CA provisions vSphere components that you add to the environment with certificates that are signed by the full
chain
– Administrators are responsible for replacing all certificates that are already in your environment with new certificates

202
VMware CA Best Practices
​Replacement of the certificates is not required to have trusted connections
• VMware CA is a CA, and therefore, all certificates used by vSphere components are fully valid and trusted
certificates
• Addition of the VMware CA as a trusted root certificate will allow the SSL warnings to be eliminated

​Integration of VMware CA to an existing CA infrastructure should be done in secure environments


• This allows the root certificate to be replaced, such that it acts as a subordinate CA to the existing infrastructure

203
Storage

204
Storage Troubleshooting
​Troubleshooting storage is a broad topic that very much depends on the type of storage in use

​Consult the vendor to determine what is normal and expected for storage

​In general, the following are problems that are frequently seen
• Overloaded storage
• Slow storage

205
Problem 1 – Overloaded Storage
​Monitor the number of disk commands aborted on the host (a quick check is shown after the list below)
• If Disk Command Aborts > 0 for any LUN, then storage is overloaded on that LUN

​What are the causes of overloaded storage?


• Excessive demand is placed on the storage device
• Storage is misconfigured
• Check
– Number of disks per LUN
– RAID level of a LUN
– Assignment of array cache to a LUN
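A quick way to check for aborted commands on an ESXi host; the device identifier is a placeholder:

# Interactive: run esxtop, press 'u' for the disk device view, and watch the ABRTS/s column
esxtop
# Per-device I/O counters, including failed commands
esxcli storage core device stats get -d naa.xxxxxxxxxxxxxxxx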

206
Problem 2 – Slow Storage
​For a host’s LUNs, monitor Physical Device Read Latency and Physical Device Write Latency counters
• If average > 10ms or peak > 20ms for any LUN, then storage might be slow on that LUN

​Or monitor the device latency (DAVG/cmd) in resxtop/esxtop (a capture example follows the list below).


• If value > 10, this might be a problem
• If value > 20, this is a problem

​Three main workload factors that affect storage response time


• I/O arrival rate
• I/O size
• I/O locality

​Use the storage device’s monitoring tools to collect data to characterize the workload
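A minimal capture of the latency counters discussed above; the esxtop batch output can then be reviewed in perfmon or a spreadsheet:

# 30 samples at 10-second intervals; review DAVG/cmd, KAVG/cmd, and GAVG/cmd for the affected LUNs
esxtop -b -d 10 -n 30 > /tmp/esxtop-storage.csv
# For a quick interactive look, run esxtop and press 'd' (adapters) or 'u' (devices)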

207
Example 1 – Bad Disk Throughput

(Chart: good throughput corresponds to low device latency; bad throughput shows high device latency, in this case due to a disabled cache)

208
Example 2 – Virtual Machine Power On Is Slow
​User complaint – Powering on a virtual machine takes longer than usual
• Sometimes, powering on a virtual machine takes 5 seconds
• Other times, powering on a virtual machine takes 5 minutes!

​What do you check?


• Check the disk metrics for the host. This is because powering on a virtual machine requires disk activity

209
Monitoring Disk Latency Using the vSphere Web Client

(Performance chart: maximum disk latencies range from 100 ms to 1100 ms, which is very high)

210
Using esxtop to Examine Slow VM Power On
​Rule of thumb
• GAVG/cmd > 20ms = high latency!

​What does this mean?


• Latency when command reaches device is high.
• Latency as seen by the guest is high.
• Low KAVG/cmd means command is not queuing in VMkernel

(esxtop screenshot: very large values for DAVG/cmd and GAVG/cmd)

211
Storage Troubleshooting – Resolving Performance Problems
​Consider the following when resolving storage performance problems
• Check your hardware for proper operation and optimal configuration
• Reduce the need for storage by your hosts and virtual machines
• Balance the load across available storage
• Understand the load being placed on storage devices

​To resolve the problems of slow or overloaded storage, solutions can include the following
• Verify that hardware is working properly
• Configure the HBAs and RAID controllers for optimal use
• Upgrade your hardware, if possible

​Consider the trade-off between memory capacity and storage demand


• Some applications, such as databases, cache frequently used data in memory, thus reducing storage loads

​Eliminate all possible swapping to reduce the burden on the storage subsystem

212
Storage Troubleshooting – Balancing the Load
​Spread I/O loads over the available paths to the
storage
​For disk-intensive workloads
• Use enough HBAs to handle the load
• If necessary, separate storage processors to
separate systems

213
Storage Troubleshooting – Understanding Load
​Understand the workload
• Use storage array tools
to capture workload statistics

​Strive for complementary workloads


• Mix disk-intensive with non-disk-intensive
virtual machines on a datastore
• Mix virtual machines with different peak access
times

214
Storage Best Practices – Fibre Channel
​Best practices for Fibre Channel arrays
• Place only one VMFS datastore on each LUN
• Do not change the path policy the system sets for you unless you understand the implications of making such a
change
• Document everything. Include information about zoning, access control, storage, switch, server and FC HBA
configuration, software and firmware versions, and storage cable plan
• Plan for failure
– Make several copies of your topology maps. For each element, consider what happens to your SAN if the element fails
– Cross off different links, switches, HBAs and other elements to ensure you did not miss a critical failure point in your design
• Ensure that the Fibre Channel HBAs are installed in the correct slots in the host, based on slot and bus speed.
Balance PCI bus load among the available busses in the server
• Become familiar with the various monitor points in your storage network, at all visibility points, including host's
performance charts, FC switch statistics, and storage performance statistics
• Be cautious when changing IDs of the LUNs that have VMFS datastores being used by your ESXi host. If you
change the ID, the datastore becomes inactive and its virtual machines fail

215
Storage Best Practices – iSCSI
​Best practices for iSCSI arrays
• Place only one VMFS datastore on each LUN. Multiple VMFS datastores on one LUN is not recommended
• Do not change the path policy the system sets for you unless you understand the implications of making such a
change
• Document everything. Include information about configuration, access control, storage, switch, server and iSCSI
HBA configuration, software and firmware versions, and storage cable plan
• Plan for failure
– Make several copies of your topology maps. For each element, consider what happens to your SAN if the element fails
– Cross off different links, switches, HBAs, and other elements to ensure you did not miss a critical failure point in your design
• Ensure that the iSCSI HBAs are installed in the correct slots in the ESXi host, based on slot and bus speed.
Balance PCI bus load among the available busses in the server
• If you need to change the default iSCSI name of your iSCSI adapter, make sure the name you enter is worldwide
unique and properly formatted. To avoid storage access problems, never assign the same iSCSI name to different
adapters, even on different hosts

216
Storage Best Practices – NFS
​Best practices for NFS arrays
• Make sure that NFS servers you use are listed in the VMware Hardware Compatibility List. Use the correct
version for the server firmware
• When configuring NFS storage, follow the recommendations from your storage vendor (example mount commands follow this list)
• Verify that the NFS volume is exported using NFS over TCP
• Verify that the NFS server exports a particular share as either NFS 3 or NFS 4.1, but does not provide both
protocol versions for the same share. This policy needs to be enforced by the server because ESXi does not
prevent mounting the same share through different NFS versions
• NFS 3 and non-Kerberos NFS 4.1 do not support the delegate user functionality that enables access to NFS
volumes using nonroot credentials. Typically, this is done on the NAS servers by using the no_root_squash option
• If the underlying NFS volume, on which files are stored, is read-only, make sure that the volume is exported as a
read-only share by the NFS server, or configure it as a read-only datastore on the ESXi host. Otherwise, the host
considers the datastore to be read-write and might not be able to open the files
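Example commands for reviewing and mounting NFS datastores from an ESXi host; the server, export, and datastore names are placeholders:

# List currently mounted NFS 3 and NFS 4.1 datastores
esxcli storage nfs list
esxcli storage nfs41 list
# Mount an NFS 3 export as a datastore
esxcli storage nfs add --host=nfs01.example.com --share=/export/ds01 --volume-name=ds01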

217
Networking

218
Networking Troubleshooting
​Troubleshooting networking is very similar to physical network troubleshooting

​Start by validating connectivity


• Look at network statistics from esxtop as well as the physical switch

​Is it a network performance problem?


• Validate throughput
• Is CPU load too high?

​Are packets being dropped?

​Is the issue limited to the virtual environment, or is it seen in the physical environment too?

​One of the biggest issues that VMware has observed is dropped network packets (discussed next)

219
Network Troubleshooting – Dropped Network Packets
​Network packets are queued in buffers if the
• Destination is not ready to receive them (Rx)
• Network is too busy to send them (Tx)

​Buffers are finite in size


• Virtual NIC devices buffer packets when they cannot be handled immediately
• If the queue in the virtual NIC fills, packets are buffered by the virtual switch port
• Packets are dropped if the virtual switch port fills (the drop counters can be checked as shown below)
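A quick way to check the drop counters on an ESXi host; vmnic0 is a placeholder for the uplink in question:

# Per-uplink statistics, including receive/transmit packets dropped
esxcli network nic stats get -n vmnic0
# In esxtop, press 'n' for the network view and watch the %DRPRX and %DRPTX columns per port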

220
Example Problem 1 – Dropped Receive Packets
​If a host’s droppedRx value > 0, there is a network throughput issue

Cause: High CPU utilization
Solutions
• Increase CPU resources provided to the virtual machine
• Increase the efficiency with which the virtual machine uses CPU resources

Cause: Improper guest operating system driver configuration
Solutions
• Tune the network stack in the guest operating system
• Add virtual NICs to the virtual machine and spread network load across them

221
Example Problem 2 – Dropped Transmit Packets
​If a host’s droppedTx value > 0, there is a network throughput issue

Cause: Traffic from the set of virtual machines sharing a virtual switch exceeds the physical capabilities of the uplink NICs or the networking infrastructure
Solutions
• Add uplink capacity to the virtual switch
• Move some virtual machines with high network demand to a different virtual switch
• Enhance the networking infrastructure
• Reduce network traffic

222
Networking Best Practices
​CPU plays a large role in performance of virtual networking. More CPUs, therefore, will generally result in
better network performance
​Sharing physical NICs is good for redundancy, but it can impact other consumers if the link is overutilized.
Carefully choose the policies and how items are shared
​Traffic between virtual machines on the same system does not need to go external to the host if they are on
the same virtual switch. Consider this when designing the network
​Distributed vSwitches should be used whenever possible because they offer greater granularity on traffic
flow than standard vSwitches
​vSphere Network and Storage I/O Control can dramatically help with contention on systems. This should
be used whenever possible
​VMware Tools, and subsequently VMXNET3 drivers, should be used in all virtual machines to allow for
enhanced network capabilities

223
Lifecycle Manager (vLCM)
vSphere 7 Update 1

224
vLCM and NSX-T Integration
Manage NSX-T Lifecycle with vLCM

Manage your NSX-T lifecycle via vLCM
1. Supported in the upcoming NSX-T Data Center release
2. NSX Manager leverages the vLCM image manager to enable:
• Installation of NSX-T
• Upgrade of NSX-T
• Uninstall of NSX-T
• Add/remove/move a host in and out of vLCM-enabled clusters

(Diagram: vCenter Server 7 Update 1 managing a cluster of ESXi 7 Update 1 hosts)

225
Install NSX-T on a vLCM Enabled Cluster
vLCM Image Manager will add NSX-T components to vLCM cluster image

​Attach a TNP (transport node profile) and use only vDS on all the hosts in the cluster

​Upon configuring NSX-T, the image is updated with NSX-T components and vLCM starts the remediation of all the hosts in serial

226
Configure NSX-T on a vLCM Enabled Cluster
Configuration Management Aspects of NSX-T using vLCM

Add/remove host
• Add host: NSX Manager will update the TNP (transport node profile); vLCM automatically starts the remediation and installs the NSX-T components on the newly added ESXi host
• Remove host: NSX Manager will remove the TNP; vLCM will uninstall the NSX-T components

Drift management
• NSX Manager now gets feedback from vLCM if there is any drift with respect to the NSX-T components
• NSX Manager also has the capability to resolve the drift: Resolve will trigger vLCM remediation tasks and ensure that the NSX-T components are in the desired state

227
Upgrade NSX-T on a vLCM Enabled Cluster
The Upgrade NSX-T workflow updates the vLCM image and triggers vLCM remediation

​Sequential Process

1. Stage – stage the upgrade components
2. Set Solution – set the vLCM image and the desired state
3. Remediate – import the upgrade components and start the remediation

228
EVC for Graphics
vSphere 7 Update 1

229
EVC For Graphics

• Graphics-enabled VMs can now consume a consistent set of features seamlessly across varied hardware
• Supports vMotion! Comprehensive compatibility checks ensure feature requirements are met
• Initial support for a single graphics mode called “baseline,” corresponding to the Direct3D 10.1 and OpenGL 3.3 specifications

230


Accessing EVC For Graphics

​Graphics Mode (vSGA) can be accessed from the “Change EVC Mode” settings option
​Available at the cluster and per-VM level
​vSGA = Virtual Shared Graphics Acceleration

231
Paravirtual RDMA
vSphere 7 Update 1

232
Paravirtual RDMA Support For Native Endpoints

✓ Virtual machines can now use vRDMA devices to communicate with other endpoints that are RDMA-enabled, but not virtualized
✓ Enhanced performance for applications and clusters that use RDMA to communicate with storage devices and arrays
✓ Considerations for vMotion, VM hardware versions, namespace support in ESXi, and the guest OS

233


vRDMA to vRDMA Communication

(Diagram: two VMs with vRDMA devices on ESXi hosts, each connected through the host HCA; virtual queue pairs are mapped to physical queue pairs)

• Apps in the guest VMs exchange connection information, such as the resource numbers used for the connection
• vRDMA peers create a control channel connection between each other and send mappings of the virtual to physical resource numbers
• The underlying hardware communicates using the physical resource numbers

234


vRDMA to Native Endpoint Communication

(Diagram: a VM with a vRDMA device communicating through the host HCA with a native, non-virtualized RDMA endpoint)

• vRDMA assumes that a peer is a native endpoint if the control channel connection cannot be made

235


vRDMA Support For Native Endpoint

Prerequisites

ESXi hosts must have namespace support enabled.

The Guest OS must have kernel and user-level support for


namespaces.
Namespaces support is upstream in Linux kernel 5.5+
User-level support is provided by using the rdma-core library

This feature requires VM Hardware version 18.

236
Consumption Activities
vSphere 7.0.x

237
Agenda ​Introduction

​Updating Hosts with vSphere Update Manager

​Performing a vSphere HA failover

​Perform vMotion Operations

​Create a virtual machine and explain VMware Tools

​Deploy Tanzu (vSphere 7 Update 1)

238
Introduction

239
Value - Adopt
Delivery Overview

See One
• Create assets
• Demonstrate the use case in action
• Provide technical guidance and assistance on applying and customizing VMware solutions for their specific use cases

Do One
• Mentor customer individuals as they run the use case
• Implement the prescribed process
• Mentor appropriately trained customer individuals to be responsible for the use case

Teach One
• Enabled customer individuals pass on the knowledge
• Provide mentoring to the identified individuals, utilizing the defined workflows (manual or automated) and runbooks for the specific use case, to apply and customize the VMware solution

240
Updating Hosts with vSphere
Update Manager
Explanation and Demo

241
vSphere Update Manager
• Updating a vSphere environment is imperative to having a secure and stable environment
• Can update a single host, or a cluster of hosts based on availability or other business requirements
• Demo shows an example of updating a host.

(Video automatically plays on the next slide, you can load directly from here: https://youtu.be/6eYfh18dqBc)

242
243
Intelligent vLCM Updates for vSAN Deployments
vSAN Fault Domain and Stretched Cluster awareness

​vSAN Cluster with Fault Domains
• vSAN Fault Domain awareness for intelligent updates with vSphere Lifecycle Manager (vLCM)
• Update sequence is serialized across fault domains
• A fault domain will only be processed once the prior one has completed

(Diagram: a vSAN cluster with four fault domains, one per rack: Rack 1/Fault Domain 1 through Rack 4/Fault Domain 4)

​vSAN Stretched Cluster
• Honors availability zones throughout the lifecycle process

(Diagram: a vSAN stretched cluster with a Preferred Site (Fault Domain 1) and a Secondary Site (Fault Domain 2), connected by an ISL carrying vSAN traffic)

244
Performing a vSphere HA failover
Explanation and Demo

245
Performing a “Controlled” vSphere HA failover
• vSphere HA allows for downtime to be minimized by restarting failed Virtual Machines on other hosts
• Seeing a failure occur, and understanding the expected behavior during a failure, is imperative to operating the environment successfully
• The demo shows an example of a host failover due to a failure.

(Video automatically plays on the next slide, you can load directly from here: https://youtu.be/oawYgjLgIUk)

246
247
Perform vMotion Operations
Explanation and Demo

248
Performing vMotion Operations
• vSphere vMotion allows for live running VMs to be moved to a different host without any downtime
• Being able to perform a vMotion will allow for maintenance to be performed
• The demo shows an example of how to perform a vMotion

(Video automatically plays on the next slide, you can load directly from here: https://youtu.be/ctWOxZwm8C0)

249
250
Create a virtual machine and
explain VMware Tools
Explanation and Demo

251
Create a virtual machine and explain VMware Tools
• Virtual machines are a core tenet of vSphere.
• Creating and operating VMs can be slightly different from doing so in a physical environment
• The demo shows an example of creating a VM and installing VMware Tools

(Video automatically plays on the next slide, you can load directly from here: https://youtu.be/7AGZjcZ7p8I)
252
253
Deploy Tanzu
vSphere 7 Update 1

254
Less Than 1 Hour to Deploy Tanzu

255
Thank You
