
System Tuning Guide for AMD Instinct™ GPU Servers with EPYC 7002 CPUs

Application Note

Part Number: 57286_1.00



© 2021 Advanced Micro Devices Inc. All rights reserved.


Disclaimer

The information contained herein is for informational purposes only and is subject to change without notice.
This document may contain technical inaccuracies, omissions, and typographical errors. AMD is under no
obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no
representations or warranties with respect to the accuracy or completeness of the contents of this document
and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or
fitness for particular purposes, with respect to the operation or use of AMD hardware, software, or other
products described herein. No license, including implied or arising by estoppel, to any intellectual property
rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD’s products
or technology are as set forth in a signed agreement between the parties or in AMD’s Standard Terms and
Conditions of Sale.

This information is subject to the terms and conditions of the Export Control MOU, as well as other applicable
agreements, between AMD and the recipient of this document.

You shall adhere to all applicable U.S., European, and other export laws, including but not limited to the U.S.
Export Administration Regulations (“EAR”) (15 CFR Sections 730-774), and E.U. Council Regulation (EC) No
428/2009 of 5 May 2009. Further, pursuant to Section 740.6 of the EAR, You hereby certify that, except pursuant
to a license granted by the United States Department of Commerce Bureau of Industry and Security or as
otherwise permitted pursuant to a License Exception under the EAR, You will not (1) export, re-export, or
release to a national of a country in Country Groups D:1, E:1, or E:2 any restricted technology, software, or
source code it receives from AMD, or (2) export to Country Groups D:1, E:1, or E:2 the direct product of such
technology or software, if such foreign produced direct product is subject to national security controls as
identified on the Commerce Control List (currently found in Supplement 1 to Part 774 of EAR). For the most
current Country Group listings, or for additional information about the EAR or Your obligations under those
regulations, please refer to the U.S. Bureau of Industry and Security’s website at http://www.bis.doc.gov.

Trademarks

AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc.
PCIe is a registered trademark of PCI-SIG.
Other product names used in this publication are for identification purposes only and may be trademarks of
their respective companies.
Dolby Laboratories, Inc.
Manufactured under license from Dolby Laboratories.
Rovi Corporation
This device is protected by U.S. patents and other intellectual property rights. The use of Rovi Corporation's
copy protection technology in the device must be authorized by Rovi Corporation and is intended for home
and other limited pay-per-view uses only, unless otherwise authorized in writing by Rovi Corporation.
Reverse engineering or disassembly is prohibited.
USB Implementers Forum, Inc.
USB Type-C and USB-C are trademarks of USB Implementers Forum, Inc.

USE OF THIS PRODUCT IN ANY MANNER THAT COMPLIES WITH THE MPEG ACTUAL OR DE FACTO VIDEO
AND/OR AUDIO STANDARDS IS EXPRESSLY PROHIBITED WITHOUT ALL NECESSARY LICENSES UNDER
APPLICABLE PATENTS. SUCH LICENSES MAY BE ACQUIRED FROM VARIOUS THIRD PARTIES INCLUDING,
BUT NOT LIMITED TO, THOSE LICENSES IN THE MPEG PATENT PORTFOLIO, WHICH ARE AVAILABLE FROM
MPEG LA, L.L.C., 6312 S. FIDDLERS GREEN CIRCLE, SUITE 400E, GREENWOOD VILLAGE, COLORADO 80111.

System Tuning Guide for AMD Instinct™ GPU Servers with EPYC 7002 CPUs © 2021 Advanced Micro Devices, Inc.
57286_1.00 AMD Confidential - Do not duplicate.
Contents

1 Introduction

2 SBIOS Settings

3 Optimized PCIe Performance Targets

Figures
3 Optimized PCIe Performance Targets
Figure 3–1 PCIe Transfer Types - Without Instinct Infinity Fabric Installed
Figure 3–2 Target Ranges with 18 Gbps EPYC Infinity Fabric
Figure 3–3 Target Ranges with 16 Gbps EPYC Infinity Fabric

Tables
2 SBIOS Settings
Table 2–1 SBIOS Setting Descriptions


1 Introduction
This application note lists the SBIOS settings and other tuning steps that maximize
performance on AMD Instinct™ GPU servers with AMD EPYC™ 7002 Series CPUs. It
describes how to tune system parameters for optimal PCIe® bandwidth and latency,
lower CPU control latency, and higher GPU performance.
Several of these settings raise idle power in exchange for additional performance in
GPU-centric applications. If the higher idle power is a problem for your deployment,
work with the systems manager to choose the settings best suited to the specific scenario.

2 SBIOS Settings
This section describes each SBIOS setting, its priority, and its relevance. Which
settings you enable depends on your deployment criteria; read the notes in the table
below and configure each setting accordingly.

Note: The names and parameters of some SBIOS settings may vary across platform
vendors.

The table below draws on the Workload Tuning Guide for AMD EPYC™ 7002 Series
Processor Based Servers.
Table 2–1 SBIOS Setting Descriptions
SBIOS Setting: Advanced ▷ PCIe ▷ Above 4G Decoding: Enable
Priority: CRITICAL
Relevance: Necessary for GPU large-BAR support (all GPU memory mapped into the
PCIe® address space) and for high-performance GPU DMA.

SBIOS Setting: Enable Enhanced Preferred IO on all PCIe ports
Priority: CRITICAL
Relevance: For peak PCIe performance, the AMD I/O Power Management Utility must be
run after every boot. Doing so improves PCIe® bandwidth by up to 60% (a 9.8 GB/s
improvement) for transfers of 256 KB to 256 MB.

SBIOS Setting: AMD CBS ▷ NBIO Common Options ▷ IOMMU: Disable*
Priority: CRITICAL
Relevance: *Refer to IOMMU Note 1 below the table.

SBIOS Setting: Advanced ▷ PCIe Subsystem Setting ▷ SR-IOV: Disable
Priority: CRITICAL
Relevance: Disables Single Root I/O Virtualization (SR-IOV).

SBIOS Setting: AMD CBS ▷ NBIO Common Options ▷ PCIe Ten Bit Tag Support: Enable
Priority: CRITICAL
Relevance: Improves PCIe® performance by allowing a larger number of active
(outstanding) transactions. To achieve full PCIe® Gen 4 bandwidth, an adapter should
support 10-bit tags.

SBIOS Setting: AMD CBS ▷ CPU Common Options ▷ Global C-state Control: Auto
Priority: CRITICAL
Relevance: Do not disable this setting; disabling it also disables Clock Gating (CC1,
or S/W C1) and Power Gating (CC6, or S/W C2).


SBIOS Setting: AMD CBS ▷ NBIO Common Options ▷ SMU Common Options ▷ Fixed SOC
Pstate: P0
Priority: Important
Relevance: Algorithm Performance Boost (APB) controls the P-states of the Data
Fabric. In certain scenarios involving low-bandwidth but latency-sensitive traffic (and
memory latency checkers), the transition from low power to full power can adversely
impact latency. Setting APBDIS to 1 (to disable APB) and specifying a fixed Infinity
Fabric P-state of 0 forces the Infinity Fabric and memory controllers into full-power
mode, eliminating this latency jitter.

SBIOS Setting:
• AMD CBS ▷ NBIO Common Options ▷ SMU Common Options ▷ xGMI Link Width Control: Manual
• AMD CBS ▷ NBIO Common Options ▷ SMU Common Options ▷ xGMI Force Link Width: 2
• AMD CBS ▷ NBIO Common Options ▷ SMU Common Options ▷ xGMI Force Link Width Control: Force
• AMD CBS ▷ NBIO Common Options ▷ SMU Common Options ▷ xGMI Force Link Width: 1
• AMD CBS ▷ NBIO Common Options ▷ SMU Common Options ▷ xGMI Max Link Width Control: Manual
Priority: Important
Relevance: Forces the Infinity Fabric links between the EPYC CPUs to maximum width
(x16).

SBIOS Setting: AMD CBS ▷ SMU Debug Options ▷ SMU Feature Enable/Disable ▷ LCLK
Deep Sleep: Disabled
Priority: Recommended
Relevance: Disables CPU LCLK deep sleep.

SBIOS Setting:
• AMD CBS ▷ NBIO Common Options ▷ SMU Common Options ▷ Determinism Control: Manual
• AMD CBS ▷ NBIO Common Options ▷ SMU Common Options ▷ Determinism Slider: Power
Priority: Recommended
Relevance: Ensures maximum performance for each CPU in a large population of
identically configured CPUs by throttling CPUs only when they reach the same cTDP.

SBIOS Setting: AMD CBS ▷ UMC Common Options ▷ DDR4 Common Options ▷ DRAM Controller
Configuration ▷ DRAM Power Options ▷ Power Down Enable: Disabled
Priority: Recommended
Relevance: Prevents the DRAM controllers from powering down, for lower-latency DRAM
access.

SBIOS Setting: Run cpupower idle-set -d 2
Priority: Recommended
Relevance: Disables power gating (C6) on all cores.


SBIOS Setting: AMD CBS ▷ DF Common Options ▷ Memory Addressing ▷ NUMA nodes per socket
Priority: Application Dependent
Relevance: With NPS1, all eight memory channels are interleaved. With NPS2, each
group of four channels is interleaved. With NPS4, each pair of channels is interleaved.
NPS1 is the normal setting; however, RCCL, AMD's machine learning data transfer
library, prefers NPS2.

SBIOS Setting: AMD CBS ▷ UMC Common Options ▷ DDR4 Common Options ▷ Security ▷
TSME: Disabled
Priority: Some deployments require encryption
Relevance: Disables Transparent Secure Memory Encryption (TSME). This encryption
adds 5–7 ns of memory latency.

SBIOS Setting:
• AMD CBS ▷ DF Common Options ▷ Link ▷ 4-Link xGMI Max Speed: 18Gbps
• AMD CBS ▷ DF Common Options ▷ Link ▷ 3-Link xGMI Max Speed: 18Gbps
Priority: Important if supported
Relevance: Forces the xGMI links between the EPYC CPUs to their maximum speed
(18 Gbps) if the server supports it. Yields up to 12.5% faster GPU-to-remote-CPU-DRAM,
GPU-to-GPU, and GPU-to-NIC transfers.

SBIOS Setting:
• AMD CBS ▷ CPU Common Options ▷ Performance ▷ CCD/Core/Thread Enablement: Accept
• AMD CBS ▷ CPU Common Options ▷ Performance ▷ CCD/Core/Thread Enablement ▷ SMT
  Control: Disable
Priority: Application Dependent
Relevance: Disables SMT for higher per-core performance.
1. Some systems require the IOMMU to be enabled in the SBIOS. In that case, the
operating system must be configured to put the IOMMU in pass-through mode. On
Ubuntu:
a. Set IOMMU in the SBIOS to Enabled.
b. Edit /etc/default/grub and set the Linux default command line to:
GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on iommu=pt"
c. Run 'update-grub', then reboot so that the new kernel command line takes effect.
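After step c and a reboot, the running kernel's command line should contain both options from step b. The following is a minimal sketch that checks /proc/cmdline for them; it assumes a Linux host and simple space-separated name=value options (quoted values containing spaces are not handled). The function names are illustrative only.

```python
"""Check /proc/cmdline for the IOMMU pass-through options from Note 1.

Sketch: assumes a Linux host and simple space-separated kernel options
(quoted values containing spaces are not handled).
"""
from pathlib import Path

REQUIRED = {"amd_iommu": "on", "iommu": "pt"}  # values from step b


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to ''."""
    options = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        options[name] = value  # if an option repeats, keep the last one
    return options


def missing_iommu_options(cmdline: str) -> list:
    """Return required 'name=value' options that are absent or set differently."""
    options = parse_cmdline(cmdline)
    return [f"{name}={value}" for name, value in REQUIRED.items()
            if options.get(name) != value]


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    if not proc.exists():
        print("/proc/cmdline not found; is this a Linux host?")
    else:
        missing = missing_iommu_options(proc.read_text())
        print("IOMMU pass-through options present" if not missing
              else "missing: " + " ".join(missing))
```

An empty result means both options were passed to the kernel; it does not by itself confirm that the SBIOS IOMMU setting from step a is enabled.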
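Table 2–1 recommends running cpupower idle-set -d 2 to disable C6 power gating. On Linux this takes effect through the cpuidle sysfs interface (a disable file under /sys/devices/system/cpu/cpuN/cpuidle/stateM/), and it does not persist across reboots, so it is worth verifying after each boot. The sketch below reads that interface. It assumes, as the table implies, that idle-state index 2 is the C6-class state; its name file may read "C2" with the acpi_idle driver, so check the reported name on your system. The helper names are hypothetical, not part of any AMD or cpupower tool.

```python
"""Verify that `cpupower idle-set -d 2` disabled idle state 2 on every CPU.

Sketch: reads the Linux cpuidle sysfs files
cpuN/cpuidle/stateM/{name,disable}. The root directory is a parameter so
the check can also run against a copied tree. Function names are
illustrative, not part of any AMD tool.
"""
import re
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")


def idle_state_status(root: Path = SYSFS_CPU, state: int = 2) -> dict:
    """Map 'cpuN' -> (idle-state name, True if the state is disabled)."""
    status = {}
    for cpu_dir in sorted(root.glob("cpu[0-9]*")):
        if not re.fullmatch(r"cpu\d+", cpu_dir.name):
            continue  # skip anything that is not a per-CPU directory
        state_dir = cpu_dir / "cpuidle" / f"state{state}"
        if not state_dir.is_dir():
            continue  # offline CPU, or no idle state with this index
        name = (state_dir / "name").read_text().strip()
        disabled = (state_dir / "disable").read_text().strip() == "1"
        status[cpu_dir.name] = (name, disabled)
    return status


def cpus_still_enabled(status: dict) -> list:
    """CPUs on which the idle state is still enabled."""
    return [cpu for cpu, (_, disabled) in status.items() if not disabled]


if __name__ == "__main__":
    status = idle_state_status()
    if not status:
        print("No cpuidle state2 entries found (non-Linux host or no cpuidle driver).")
    else:
        pending = cpus_still_enabled(status)
        names = sorted({name for name, _ in status.values()})
        print(f"state2 name(s): {', '.join(names)}")
        print("disabled on all CPUs" if not pending
              else f"still enabled on {len(pending)} CPU(s), e.g. {pending[0]}")
```

Because the setting is lost at reboot, one option is to run the cpupower command and this check from the same boot-time job that runs the AMD I/O Power Management Utility.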
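The NPS row of Table 2–1 also determines how many NUMA nodes the operating system sees: with no other options adding nodes, a dual-socket server shows 2, 4, or 8 nodes under NPS1, NPS2, or NPS4, and each node interleaves 8, 4, or 2 channels. The sketch below infers the active NPS mode from Linux sysfs. It assumes eight memory channels per EPYC 7002 socket, the same number of NUMA nodes on every socket, and no additional nodes from other SBIOS options; the function names are illustrative.

```python
"""Infer the active NPS (NUMA nodes per socket) mode on a Linux host.

Sketch under stated assumptions: 8 memory channels per EPYC 7002 socket,
the same number of NUMA nodes on every socket, and no additional NUMA
nodes created by other SBIOS options.
"""
from pathlib import Path

CHANNELS_PER_SOCKET = 8


def count_ids(id_list: str) -> int:
    """Count entries in a sysfs list such as '0-3' or '0,2-3'."""
    total = 0
    for part in id_list.strip().split(","):
        if part:
            first, _, last = part.partition("-")
            total += int(last or first) - int(first) + 1
    return total


def infer_nps(num_nodes: int, sockets: int) -> int:
    """NUMA nodes per socket; raises if nodes do not divide evenly."""
    if sockets <= 0 or num_nodes % sockets:
        raise ValueError(f"{num_nodes} nodes across {sockets} socket(s)")
    return num_nodes // sockets


def channels_per_node(nps: int) -> int:
    """Memory channels interleaved within one node: NPS1->8, NPS2->4, NPS4->2."""
    if nps not in (1, 2, 4):
        raise ValueError(f"unexpected NPS value: {nps}")
    return CHANNELS_PER_SOCKET // nps


def count_sockets(cpu_root: Path = Path("/sys/devices/system/cpu")) -> int:
    """Distinct physical package IDs among online CPUs."""
    ids = {p.read_text().strip()
           for p in cpu_root.glob("cpu[0-9]*/topology/physical_package_id")}
    return len(ids)


if __name__ == "__main__":
    online = Path("/sys/devices/system/node/online")
    if online.exists():
        try:
            nps = infer_nps(count_ids(online.read_text()), count_sockets())
            print(f"NPS{nps}: {channels_per_node(nps)} channels interleaved per node")
        except ValueError as err:
            print(f"Could not infer NPS: {err}")
```

For example, a dual-socket server reporting nodes "0-3" would be inferred as NPS2, the mode the table notes RCCL prefers.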

3 Optimized PCIe Performance Targets


This section presents PCIe® performance measured on an A+A GPU Server with eight
AMD Instinct™ MI50 GPUs and dual AMD EPYC™ 7742 CPUs. This server connects all
eight MI50s directly to the 7742s through PCIe® Gen4 x16 ports, without PCIe®
switches. Inserting PCIe® switches between the GPUs and CPUs reduces bandwidth and
increases latency.

Note: These results were obtained with the SBIOS settings above enabled. Systems
that deviate from those settings may achieve different results.

The figures below follow the output format of rocm-bandwidth-test run with no
parameters on a dual-socket AMD EPYC 7742 PCIe® Gen4 server with no PCIe®
switches and eight AMD Instinct™ MI50 PCIe® Gen4 GPUs.

Note: These results were measured with rocm-bandwidth-test v2.4.0 or later. Systems
with different boards or topologies will exhibit different performance.
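Table 2–1 gives the Enhanced Preferred IO gain both as a relative figure (up to 60%) and as an absolute one (9.8 GB/s). Assuming both describe the same best-case transfer (the table does not say so explicitly), they imply the bandwidth before and after tuning, which can be set against the raw per-direction rate of the PCIe Gen4 x16 links used in this test setup. A short sketch of that arithmetic; the function names are illustrative.

```python
"""Back out the bandwidths implied by "up to 60% (9.8 GB/s improvement)"
and compare them with the raw PCIe Gen4 x16 rate.

Assumption: the 60% and 9.8 GB/s figures describe the same transfer.
"""


def implied_bandwidths(gain_gbs: float, gain_fraction: float) -> tuple:
    """Return (baseline, tuned) in GB/s from an absolute and a relative gain."""
    baseline = gain_gbs / gain_fraction
    return baseline, baseline + gain_gbs


def pcie_raw_gbs(gt_per_s: float, lanes: int, encoding: float) -> float:
    """Raw one-direction data rate in GB/s, accounting for line encoding only."""
    return gt_per_s * lanes * encoding / 8


if __name__ == "__main__":
    baseline, tuned = implied_bandwidths(9.8, 0.60)
    gen4_x16 = pcie_raw_gbs(16.0, 16, 128 / 130)  # 16 GT/s, 128b/130b encoding
    print(f"baseline ~{baseline:.1f} GB/s, tuned ~{tuned:.1f} GB/s")
    print(f"tuned is {tuned / gen4_x16:.0%} of the {gen4_x16:.1f} GB/s raw Gen4 x16 rate")
```

Under that assumption the baseline is about 16.3 GB/s and the tuned figure about 26.1 GB/s, against a raw Gen4 x16 rate of about 31.5 GB/s per direction. The raw figure ignores packet and protocol overhead, so real transfers cannot reach it.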
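The PCIe Ten Bit Tag Support row in Table 2–1 ties full Gen4 bandwidth to the number of outstanding transactions. A bandwidth-delay (Little's law) estimate shows why: to keep a link busy, a requester needs roughly bandwidth × latency ÷ request size non-posted requests in flight, and each one occupies a tag. In the sketch below, the round-trip latencies and the 256-byte request size are hypothetical values chosen for illustration, not measurements from this server; the 768-request limit for 10-bit tags reflects the PCIe 4.0 rule that reserves Tag[9:8] = 00b and should be checked against the specification.

```python
"""Estimate how many outstanding PCIe read requests (tags) a link needs.

Sketch based on Little's law: bytes in flight = bandwidth x latency.
Latency and request size values are hypothetical.
"""
import math

# Outstanding non-posted request limits by tag width (assumed from the PCIe spec).
TAG_LIMITS = {"5-bit": 32, "8-bit extended": 256, "10-bit": 768}


def tags_needed(bandwidth_bytes_s: float, latency_s: float, request_bytes: int) -> int:
    """Requests that must be in flight to sustain the given bandwidth."""
    return math.ceil(bandwidth_bytes_s * latency_s / request_bytes)


def sufficient_tag_widths(needed: int) -> list:
    """Tag widths whose outstanding-request limit covers the requirement."""
    return [width for width, limit in TAG_LIMITS.items() if limit >= needed]


if __name__ == "__main__":
    link = 31.5e9     # ~raw PCIe Gen4 x16 rate per direction, bytes/s
    request = 256     # hypothetical read request size, bytes
    for latency in (1e-6, 2e-6, 3e-6):  # hypothetical round-trip latencies
        need = tags_needed(link, latency, request)
        ok = sufficient_tag_widths(need) or ["none"]
        print(f"{latency * 1e6:.0f} us -> {need} tags; enough: {', '.join(ok)}")
```

With these illustrative numbers, 8-bit tags still suffice at 2 µs (247 requests), but at 3 µs the link needs 370 outstanding requests, which only 10-bit tags allow.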


Figure 3–1 PCIe Transfer Types - Without Instinct Infinity Fabric Installed

Figure 3–2 Target Ranges with 18 Gbps EPYC Infinity Fabric


Figure 3–3 Target Ranges with 16 Gbps EPYC Infinity Fabric
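Figures 3–2 and 3–3 show target ranges for 18 Gbps and 16 Gbps EPYC Infinity Fabric links. The "up to 12.5%" improvement that Table 2–1 quotes for 18 Gbps equals the ratio of the two link rates, which is the ceiling one would expect for transfers bounded by the socket-to-socket link. The sketch below reproduces that arithmetic; it assumes the speeds are per-lane, per-direction signaling rates on an x16 link and ignores encoding and protocol overhead.

```python
"""Compare raw xGMI link rates at 18 Gbps and 16 Gbps per lane.

Assumes an x16 link and ignores encoding and protocol overhead.
"""
XGMI_LANES = 16  # maximum link width (x16) per Table 2-1


def raw_link_gbytes(gbps_per_lane: float, lanes: int = XGMI_LANES) -> float:
    """Raw rate in GB/s for one direction of the link."""
    return gbps_per_lane * lanes / 8


def speedup_percent(new_rate: float, old_rate: float) -> float:
    """Relative improvement of new_rate over old_rate, in percent."""
    return (new_rate / old_rate - 1.0) * 100.0


if __name__ == "__main__":
    fast, slow = raw_link_gbytes(18), raw_link_gbytes(16)
    print(f"{fast:.0f} GB/s vs {slow:.0f} GB/s raw: +{speedup_percent(fast, slow):.1f}%")
```

Transfers that are limited elsewhere (for example by a GPU's own PCIe link) would see less than this ceiling, which is why the table says "up to".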
