
Dell Integrated System for Microsoft Azure Stack HCI – Jigsaw Release
January 2023
EKT Agenda

This session provides an overview of:


• What's new with HCI OS 22H2 in Azure Stack HCI
• New support for Intel E810-C 100Gb NICs
• New support for Mellanox CX-6 25Gb NICs
• New support for Nvidia A16 GPUs



AX Support for HCI OS 22H2
Mike Mankovsky



HCI OS 22H2 – New Features for AX
• Network ATC
• Intent-based network deployment
• Simplified virtual network configuration
• Incorporates Microsoft best practices
• Supports Scalable and Switchless deployments
• Some caveats
• GPU-P
• GPU partitioning is now supported on the Nvidia A2 and A16 GPUs



HCI OS 22H2 – What doesn't change?
• Current HCI OS deployment options remain the same.
• PowerShell-based cluster deployment is still recommended.
• The majority of the existing deployment steps remain the same.
  • Host/cluster networking is the exception.
• WAC-based cluster deployment has improved, but results vary.



HCI OS 22H2 – Windows Features
• The Windows features used in 22H2 include:
• BitLocker
• Data Center Bridging
• Failover Clustering
• File Server
• FS-Data-Deduplication module
• Hyper-V
• Hyper-V PowerShell
• RSAT-Clustering-PowerShell module
• RSAT-AD-PowerShell module
• NetworkATC
• NetworkHUD*
• SMB Bandwidth Limit
• Storage Replica (for stretched clusters)

*The Network HUD feature should be installed. However, Network HUD is still under evaluation by Dell
Engineering for support in a future release. Guidance will be included in the documentation at that time.
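As a minimal sketch, the features above can be installed in one pass on each node. The PowerShell feature names below follow standard Windows naming (for example, FS-SMBBW is "SMB Bandwidth Limit"); verify them against Get-WindowsFeature for your build:

# Install the 22H2 feature set on the local node
Install-WindowsFeature -Name BitLocker, Data-Center-Bridging, Failover-Clustering, `
    FS-FileServer, FS-Data-Deduplication, Hyper-V, Hyper-V-PowerShell, `
    RSAT-Clustering-PowerShell, RSAT-AD-PowerShell, NetworkATC, NetworkHUD, `
    FS-SMBBW, Storage-Replica -IncludeAllSubFeature -IncludeManagementTools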



HCI OS 22H2 – Security
• “Secured-by-default”
• More than 200 security settings enabled by default within the OS
• Enables customers to closely meet Center for Internet Security (CIS) benchmark and Defense Information Systems Agency (DISA) Security Technical Implementation Guide (STIG) requirements for the OS
• Improves security posture by disabling legacy protocols and ciphers



Network ATC
Mike Mankovsky



Network ATC – Overview
• With 22H2, the default documented network deployment uses Network ATC.
• Network ATC advantages over manual deployment:
  • Reduces network configuration deployment time, complexity, and errors due to erroneous input
  • Uses the latest Microsoft-validated network configuration best practices
  • Ensures configuration consistency across the nodes in the cluster
  • Eliminates configuration drift with periodic consistency checks every 15 minutes

• Supported for both Windows Admin Center and PowerShell deployment methods
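As a quick sketch of how deployed intents and the periodic consistency checks surface, the NetworkATC module exposes status cmdlets (property names per Microsoft's Network ATC documentation):

# List the deployed intents
Get-NetIntent

# Check provisioning/configuration status per node for each intent
Get-NetIntentStatus | Format-Table Host, IntentName, ConfigurationStatus, ProvisioningStatus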
Network ATC – Windows Admin Center



Network ATC – Requirements
• Prior to working with Network ATC, the following prerequisites must be met:
  • All physical nodes to be used in the cluster must be Azure Stack HCI certified
  • Azure Stack HCI OS 22H2 must be deployed on the nodes you are clustering
  • The latest network adapter drivers and firmware must be installed
  • The Network ATC Windows feature (and other dependent features) must be installed on the nodes
  • Each network adapter to be used with Network ATC must use the same name across all nodes in the cluster (a quick cross-node check is sketched below)
  • Nodes must be cabled according to the desired network topology
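A sketch for verifying the adapter-name prerequisite across nodes, assuming hypothetical node names Node1 and Node2 and PowerShell remoting enabled:

# Adapter names must match on every node; compare them side by side
Invoke-Command -ComputerName Node1, Node2 -ScriptBlock {
    Get-NetAdapter -Physical | Select-Object Name, InterfaceDescription, LinkSpeed
} | Sort-Object Name | Format-Table PSComputerName, Name, InterfaceDescription, LinkSpeed

# Rename any mismatched adapter so the names line up, for example:
# Rename-NetAdapter -Name "Ethernet 3" -NewName "SLOT 3 Port 1"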



Network ATC – Intent Types
• Management: Used for node management access. Can be defined in a maximum of one intent.
• Compute: Used for VM traffic. Can be defined in more than one intent.
• Storage: Used for SMB traffic. Can be defined in a maximum of one intent.
• Stretch: Set up in a similar manner to the storage intent (but without RDMA)
  • Issues have been found with Stretch intents during Engineering validation. Support for Stretch intents will therefore depend on resolution of these issues.
Notes (an example of intent creation follows these notes):
• Network intents are supported in clustered mode only.
• Only identical physical adapters can be used in an intent; 10GbE and 25GbE adapters cannot be combined in the same intent.
• OCP and PCIe NICs cannot be combined in an intent, even if they are from the same manufacturer and have the same network speed.
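A minimal sketch of intent creation for a non-converged deployment; the Dell-style slot names are illustrative and must match the adapter names on your nodes:

# Management + compute traffic on one pair of ports
Add-NetIntent -Name "MgmtCompute" -Management -Compute -AdapterName "SLOT 3 Port 1","SLOT 3 Port 2"

# Storage (SMB/RDMA) traffic on a second pair of ports, with per-port storage VLANs
Add-NetIntent -Name "Storage" -Storage -AdapterName "SLOT 4 Port 1","SLOT 4 Port 2" -StorageVlans 204,205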
Network ATC – Windows Admin Center (Intents)



Network ATC – Overrides
• If the Microsoft-validated and supported network configuration best-practice settings need to be changed, "overrides" for Network ATC must be configured and applied
• Overrides allow the user to customize various settings for storage adapters, global cluster settings, and QoS to tailor them to their specific environment
• For more information, see: Manage Network ATC - Azure Stack HCI | Microsoft Learn



Network ATC – Overrides
Storage Overrides
• New-NetIntentStorageOverrides
  – Can be used to disable automatic IP generation for the storage NICs
  – Applied via the "Add-NetIntent" cmdlet with the -StorageOverrides parameter

Example:
$storageOverrides = New-NetIntentStorageOverrides
$storageOverrides.EnableAutomaticIPGeneration = $false
Add-NetIntent -Name "Storage" -Storage -AdapterName "SLOT 3 Port 1","SLOT 3 Port 2" -StorageVlans 204,205 -StorageOverrides $storageOverrides

Global Cluster Overrides
• New-NetIntentGlobalClusterOverrides
  – Can be used to manually set the Virtual Machine Migration Performance Option (i.e., Compressed or SMB) and the Maximum SMB Migration Bandwidth in Gbps
  – Applied via the "Set-NetIntent" cmdlet with the -GlobalClusterOverrides parameter

Example:
$globalClusterOverrides = New-NetIntentGlobalClusterOverrides
$globalClusterOverrides.VirtualMachinePerformanceOption = "SMB"
Set-NetIntent -GlobalClusterOverrides $globalClusterOverrides

QoS Policy Overrides
• New-NetIntentQosPolicyOverrides
  – Can be used to change the QoS priority values and the bandwidth percentages allocated for SMB and Cluster use
  – Applied via the "Add-NetIntent" cmdlet with the -QoSPolicyOverrides parameter

Example:
$qosOverrides = New-NetIntentQosPolicyOverrides
$qosOverrides.PriorityValue8021Action_SMB = 3
$qosOverrides.PriorityValue8021Action_Cluster = 5
$qosOverrides.BandwidthPercentage_SMB = 50
$qosOverrides.BandwidthPercentage_Cluster = 2
Add-NetIntent -Name "Storage" -Storage -AdapterName "SLOT 3 Port 1","SLOT 3 Port 2" -StorageVlans 204,205 -QoSPolicyOverrides $qosOverrides



Network ATC – Topologies Validated
• Fully Converged
• Non-Converged
• Switchless (Storage)
  • Support for two-node, three-node, and four-node switchless topologies
• For more information, see: Network ATC: Common Preview Questions - Microsoft Community Hub



GPU-P
Jose Cruz



GPU-P – Overview

• What is GPU-P?
  • GPU partitioning allows a physical GPU to be shared by multiple VMs.
  • This offering has been available on Azure since 2019 and is now available on Azure Stack HCI.
  • Requires HCI OS 22H2 on the AX nodes and valid NVIDIA licenses.
  • Initial AX support includes the Nvidia A2 and A16 GPUs.
  • Additional GPU models will be supported in future releases.



GPU-P – Caveats
• Requires a separate GPU-P-specific driver for the host and guest OS
  • If installed, the DDA driver must be uninstalled before installing the GPU-P driver
• Currently, live migration with vGPU is not supported
  • VMs can be automatically restarted and placed where GPU resources are available
• Using Windows Admin Center is recommended over PowerShell
• Additional info can be found here: Partition and share GPUs with Azure Stack HCI virtual machines - Azure Stack HCI | Microsoft Learn



GPU-P – High-Level Deployment
• Prerequisites complete:
  • HCI OS 22H2 cluster has been deployed with physical GPUs installed
  • Same GPU model in each server in the cluster
  • GPU host drivers installed
  • Supported guest OS deployed with GPU driver installed
  • WAC instance with GPU extension installed (2.8.0 or higher)
• Configure the partition count of the GPU. It is recommended to keep the partition count the same across all GPUs (see the sketch below).
• Assign the GPU partition to a VM, install the guest driver, and activate the license.
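A minimal PowerShell sketch of the partition-count step, using the Hyper-V GPU partitioning cmdlets; the count of 4 is illustrative, and the valid counts depend on the GPU model:

# Show each partitionable GPU and the partition counts it supports
Get-VMHostPartitionableGpu | Format-List Name, ValidPartitionCounts

# Apply the same partition count to every GPU in the host (example: 4)
foreach ($gpu in Get-VMHostPartitionableGpu) {
    Set-VMHostPartitionableGpu -Name $gpu.Name -PartitionCount 4
}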



GPU-P – High-Level Deployment: VM Assignment
• Assign the GPU partition to a VM, install the guest driver, and activate the license (a PowerShell sketch follows).
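For reference, a sketch of the same assignment from PowerShell (WAC remains the recommended path); the VM name "GPU-VM" is hypothetical:

# The VM must be off before a partition can be assigned
Stop-VM -Name "GPU-VM"

# Assign a GPU partition, then start the VM and install/activate the guest driver inside it
Add-VMGpuPartitionAdapter -VMName "GPU-VM"
Start-VM -Name "GPU-VM"

# Confirm the partition is attached
Get-VMGpuPartitionAdapter -VMName "GPU-VM"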



GPU-P – High-Level Deployment, Continued
• On the guest VM, once the correct GPU-P driver is installed and the license is active, a confirmation pop-up appears.
• Failure to supply a license results in the GPU throttling after 20 minutes; after 24 hours, CUDA stops working and output is limited to 3 FPS.
• Only one GPU-P partition is allowed per VM.



GPU-P License Options

• Onsite License Server:
  • Complex setup that requires onsite provisioning; only recommended for offline cases such as sites with limited internet access, or very high-volume scenarios such as render farms.
  • The guest driver is pointed at the license server IP for authentication.
• Cloud License Server:
  • Low effort; Nvidia hosts the cloud instance for you, and configuration is minimal and done entirely on the web portal.
  • Generates a file token, which you place on a guest VM to activate. Highly recommended for most customers.
• For details, see: https://docs.nvidia.com/grid/14.0/grid-licensing-user-guide/index.html#how-grid-licensing-works
Nvidia A16 GPUs
Jose Cruz



Nvidia A16 GPU - Overview

• 250 W, 64 GB, passive, double-wide, full-height GPU
• Supported on the AX-750 and AX-7525 with FL riser configurations (same as the A30 GPU)
• Supports DDA or GPU-P
[Images: AX-750 and AX-7525]



Intel E810-C NICs
Mohaimin Sadiq



Intel E810-C - Overview
• 2 x 100 Gbps adapter (100 Gb max)
• iWARP alternative to the Mellanox CX6 Dx
  • The card supports both iWARP and RoCEv2
  • iWARP is supported at release.
  • RoCEv2 support is planned for a future release.
• The Intel E810-CQDA2 100Gb adapter is available in full-height (FH) and low-profile (LP) form factors.

Server Models   FH (85F8F)                                  LP (DWNRF)
AX-750
AX-650          X
AX-7525         (FH only supported on Riser Config 3-2 HL)



Intel E810-C - Deployment
• Supported in Scalable and Switchless (Storage) network topologies.
• Supported for both PowerShell and WAC-based cluster deployments (an iWARP override sketch follows).
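For PowerShell-based deployments, the RDMA mode can be steered to iWARP through a Network ATC adapter-property override; a sketch, assuming illustrative adapter names (NetworkDirectTechnology values per Microsoft's Network ATC documentation):

# Override the RoCEv2 default so the E810 storage intent uses iWARP
$adapterOverride = New-NetIntentAdapterPropertyOverrides
$adapterOverride.NetworkDirectTechnology = 1   # 1 = iWARP, 4 = RoCEv2

Add-NetIntent -Name "Storage" -Storage -AdapterName "SLOT 3 Port 1","SLOT 3 Port 2" -StorageVlans 204,205 -AdapterPropertyOverrides $adapterOverride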



Mellanox CX6 Lx NICs
Mohaimin Sadiq



Mellanox ConnectX-6 Lx - Overview
• 2 x 25 Gbps NIC (updated version of the CX5)
• RoCEv2 only
• The CX6 Lx adapter is available in full-height (FH) and low-profile (LP) form factors.

Server Models   FH (85F8F)                                  LP (DWNRF)
AX-750
AX-650          X
AX-7525         (FH only supported on Riser Config 3-2 HL)



Mellanox ConnectX-6 Lx - Deployment
• Supported in Scalable and Switchless network topologies.
• Supported for both PowerShell and WAC-based cluster deployments.
• The DCBX Mode NIC advanced property must be changed to "host in charge":

  Set-NetAdapterAdvancedProperty -Name $nic1 -DisplayName 'DCBXMode' -DisplayValue 'host in charge'
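A sketch applying the setting to both storage ports and verifying it; the slot names are illustrative and must match your adapter names:

# Set DCBX mode to "host in charge" on each storage adapter
foreach ($nic in 'SLOT 3 Port 1','SLOT 3 Port 2') {
    Set-NetAdapterAdvancedProperty -Name $nic -DisplayName 'DCBXMode' -DisplayValue 'host in charge'
}

# Verify the property took effect
Get-NetAdapterAdvancedProperty -DisplayName 'DCBXMode' | Format-Table Name, DisplayValue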



Windows Admin Center Cluster Deployment
Brandon Jones



WAC Cluster Deployment - Overview
• The Preview designation is being removed from the WAC Deployment Guide with HCI OS 22H2.
• Network ATC is the only supported network deployment option.
• 2 to 16 nodes per cluster
• Microsoft does not support single-node WAC deployments



WAC Cluster Deployment - Considerations
• Modify the Cluster traffic priority setting in the "Customize network settings" menu of the Storage intent. (Default is "7".)



WAC Cluster Deployment – Considerations Cont'd
• Change the RDMA protocol to iWARP for Intel E810 deployments (default is RoCE).



WAC Cluster Deployment – Considerations Cont'd
• For deployments with Mellanox adapters, the "Dcbx mode" setting will still need to be changed manually to "Host in charge".
  • Network ATC does not provide an override for this setting.



Q&A

