T4 For Virtualization

NVIDIA T4 FOR VIRTUALIZATION
Nicola Sessions, February 2019

AGENDA
• Why Choose NVIDIA T4 for Virtualization?
• NVIDIA T4 Performance for Virtualization Workloads
• Selecting the Right GPU for Your Virtualization Workload
ANNOUNCING NVIDIA T4 FOR VIRTUALIZATION
The New Generation of Computer Graphics on a Quadro Virtual Data Center Workstation
• Virtual Quadro Workstation for the Professional Designer & Data Scientist:
  • Up to 2X graphics performance versus M60
  • 5 Giga Rays per second for real-time, interactive rendering
  • NGC support; run deep learning inferencing workloads 25x faster than CPU on a virtual machine
• Virtual PCs for the Knowledge Worker:
  • Up to 33% improved performance versus CPU-only VMs
  • Support for VP9 decode and H.265 encode and decode for improved CPU offload
DRIVING NEW WORKFLOWS
Empowering the Modern Digital Workplace
• Digital Workplace: Windows 10 & Productivity Apps
• Photorealistic Rendering: Increasingly Complex Designs
• Data Science: Increase in AI/DL & Inference
RTX PERFORMANCE IN A QUADRO VIRTUAL WORKSTATION
Support for up to 5 Giga Rays/Sec
• Media & Entertainment: Real-time Rendering
• Manufacturing: Simulation, modeling, design
• Architecture: Rendering, design
NVIDIA T4 KEY SPECIFICATIONS
GPU Architecture: NVIDIA Turing
NVIDIA CUDA® Cores: 2,560
NVIDIA Turing™ Tensor Cores: 320
RT Cores: 40
Giga Rays/second: 5
Memory Size: 16 GB GDDR6
Memory BW: Up to 320 GB/s
vGPU Profiles: 1 GB, 2 GB, 4 GB, 8 GB, 16 GB
Form Factor: PCIe 3.0 single slot (half height & length)
Power: 70W
Thermal: Passive
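The vGPU profile sizes listed for the T4 bound how many VMs can share one GPU. A minimal sketch of that arithmetic, assuming the 16 GB frame buffer is divided evenly and every VM on the GPU uses the same profile size (both the helper name `vms_per_gpu` and that equal-profile assumption are illustrative and not stated on the slide):

```python
# Frame buffer and profile sizes are taken from the T4 specification slide.
T4_FRAME_BUFFER_GB = 16
T4_PROFILES_GB = [1, 2, 4, 8, 16]


def vms_per_gpu(frame_buffer_gb, profile_gb):
    """Hypothetical helper: VMs that fit when every VM gets the same profile."""
    if profile_gb <= 0:
        raise ValueError("profile size must be positive")
    return int(frame_buffer_gb // profile_gb)


# Map each profile size to the resulting upper bound on VMs per T4.
density = {p: vms_per_gpu(T4_FRAME_BUFFER_GB, p) for p in T4_PROFILES_GB}

for profile_gb, vms in density.items():
    print(f"{profile_gb:>2} GB profile: up to {vms} VMs per T4")
```

This is only an upper bound from frame buffer size; actual density also depends on licensing, host CPU and memory, and the workload itself.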
LATEST GENERATION QUADRO VIRTUAL WORKSTATION
Work Faster with Larger Models
• Continued performance increases with latest generation GPUs
• Added AI support and ray tracing support with Tensor and RT cores
[Chart: Quadro Virtual Workstations, SPECviewperf13 relative performance for M60, P4, and T4. M60 = 1 (baseline), T4 = 1.5; the P4 data label appears as "[VALUE]" in the source.]
3D Graphics: 1.5x performance
SPECviewperf 13 results tested on a server with Intel Xeon Gold 6154 (18C, 3.0 GHz), Quadro vDWS with T4-16Q, VMware ESXi 6.7, host/guest driver 410.87/412.10, VM config: Windows 10, 8 vCPU, 16GB memory.
HIGHEST GRAPHICS PERFORMANCE ON A VIRTUAL WORKSTATION
Work Faster with Larger Models
• Up to 2X performance compared to M60
• 2X framebuffer compared to P4 to support larger models
• Professional Performance: Healthcare, Oil & Gas, Media & Entertainment, Manufacturing
[Chart: SPECviewperf13 relative performance for M60, P4, and T4. Categories: Geomean; Medical (Healthcare); Energy (Oil & Gas); 3ds Max and Maya (Media & Ent); CATIA, Creo, Siemens NX, and SOLIDWORKS (Manufacturing/Product Design). The largest data label shown is 2.2.]
SPECviewperf 13 results tested on a server with Intel Xeon Gold 6154 (18C, 3.0 GHz), Quadro vDWS with T4-16Q, VMware ESXi 6.7, host/guest driver 410.87/412.10, VM config: Windows 10, 8 vCPU, 16GB memory.
RUN RTX APPLICATIONS ON A VIRTUAL WORKSTATION
Quadro vDWS with RTX-Capable NVIDIA T4
• Run applications built on the RTX platform, the most powerful rendering platform, on any device, anywhere
• Real-time ray tracing performance of up to 5 Giga Rays per second
• Accelerate batch rendering for faster time-to-market
• AI-enhanced denoising speeds creative workflows
• Photorealistic design with accurate shadows, reflections & refractions
NVIDIA T4 WITH QUADRO vDWS
Real-Time Inference Performance
• Quadro Virtual Workstation for deep learning inferencing workloads
• Support for NVIDIA GPU Cloud (NGC)
• Ideal for deep learning labs and classrooms
[Chart: Video Inference, speedup vs. CPU server, ResNet-50 (7ms latency limit). Bars: CPU VM; T4 & Quadro vDWS, 25X.]
Speedup: 25x faster
Tested on a server with Intel Xeon Gold 6154 (18C, 3.0 GHz), Quadro vDWS with T4-16Q, VMware ESXi 6.7, host/guest driver 410.87/412.10, VM config: Ubuntu 16.04, 8 vCPU, 32GB memory. 25X performance improvement over CPU VM.
NVIDIA T4 FOR VIRTUAL PCs
Optimize Data Center Utilization with Mixed Workloads
• T4 vs. CPU only: Adding NVIDIA GPUs results in 33% better user experience versus CPU-only VMs**
• T4 vs. M10: provides same user density with lower power consumption*
• Same user experience & performance**
• Support for VP9 decode
• Support for H.265 (HEVC) 4:4:4 encode and decode
• Support for >1TB system memory
[Chart: Virtual PCs, UX based on Remoted Frames. CPU only VM = 1 (baseline); the T4 data label appears as "[VALUE]" in the source.]
UX: 1.3x better
* Two NVIDIA T4 GPUs support the same user density as a single M10 and fit in the same 2 slot PCIe form factor.
** NVIDIA internal benchmark running Microsoft PowerPoint, Word, Excel, Chrome, PDF viewing and video playback.
NVIDIA DATA CENTER GPUs
Recommended for Virtualization
                 V100                P40            T4             M10                          P6
GPUs / Board     1                   1              1              4                            1
Architecture     Volta               Pascal         Turing         Maxwell                      Pascal
CUDA Cores       5,120               3,840          2,560          2,560 (640 per GPU)          2,048
Tensor Cores     640                 ---            320            ---                          ---
RT Cores         ---                 ---            40             ---                          ---
Memory Size      32 GB/16 GB HBM2    24 GB GDDR5    16 GB GDDR6    32 GB GDDR5 (8 GB per GPU)   16 GB GDDR5
Power            250W/300W           250W           70W            225W                         90W
Thermal          passive             passive        passive        passive                      bare board
vGPU Profiles
• V100: 1 GB, 2 GB, 4 GB, 8 GB, 16 GB, 32 GB
• P40: 1 GB, 2 GB, 3 GB, 4 GB, 6 GB, 8 GB, 12 GB, 24 GB
• T4: 1 GB, 2 GB, 4 GB, 8 GB, 16 GB
• M10: 0.5 GB, 1 GB, 2 GB, 4 GB, 8 GB
• P6: 1 GB, 2 GB, 4 GB, 8 GB, 16 GB
Form Factor
• V100: PCIe 3.0 Dual Slot & SXM2 (rack servers)
• P40: PCIe 3.0 Dual Slot (rack servers)
• T4: PCIe 3.0 Single Slot (rack servers)
• M10: PCIe 3.0 Dual Slot (rack servers)
• P6: MXM (blade servers)
Optimization categories: Performance Optimized, Density Optimized, Blade Optimized
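The comparison of data center GPUs can be treated as a small catalog to filter against deployment constraints. The sketch below is illustrative only: the `GpuBoard` class and the `shortlist` function are hypothetical names, the V100 is entered as its 32 GB variant at the higher 300W figure, and "---" entries are stored as 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuBoard:
    name: str
    architecture: str
    cuda_cores: int
    tensor_cores: int  # 0 where the table shows "---"
    rt_cores: int      # 0 where the table shows "---"
    memory_gb: int     # per board; V100 entered as the 32 GB variant (assumption)
    power_w: int       # V100 entered at the higher 300W figure (assumption)


# Values copied from the "NVIDIA DATA CENTER GPUs" table.
BOARDS = [
    GpuBoard("V100", "Volta", 5120, 640, 0, 32, 300),
    GpuBoard("P40", "Pascal", 3840, 0, 0, 24, 250),
    GpuBoard("T4", "Turing", 2560, 320, 40, 16, 70),
    GpuBoard("M10", "Maxwell", 2560, 0, 0, 32, 225),
    GpuBoard("P6", "Pascal", 2048, 0, 0, 16, 90),
]


def shortlist(boards, max_power_w=None, need_tensor=False, need_rt=False):
    """Hypothetical filter: names of boards that meet every stated requirement."""
    picked = []
    for board in boards:
        if max_power_w is not None and board.power_w > max_power_w:
            continue
        if need_tensor and board.tensor_cores == 0:
            continue
        if need_rt and board.rt_cores == 0:
            continue
        picked.append(board.name)
    return picked


print(shortlist(BOARDS, max_power_w=100))   # boards within a 100W budget
print(shortlist(BOARDS, need_tensor=True))  # boards with Tensor Cores
print(shortlist(BOARDS, need_rt=True))      # boards with RT Cores
```

A filter like this only narrows the list by hardware attributes; form factor and server compatibility still need to be checked separately.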
SELECTING THE RIGHT GPU
NVIDIA Quadro Virtual Data Center Workstation
NVIDIA T4 (Smaller Profiles, More Users)
• Use Case: Entry to Midrange Quadro Workstations
• Workloads: CAD, CAE, Digital Content Creation, Rendering, Inferencing, Training
Move to NVIDIA P40 when: "My end users work with larger models or applications"
NVIDIA P40
• Use Case: High-end Quadro Workstations
• Workloads: Large, Complex CAD models, Seismic Exploration, Complex Digital Content Creation, Effects, 3D Medical Imaging
Move to NVIDIA V100 when: "My end users use CAE applications, or are experimenting with DL/AI"
NVIDIA V100 (Larger Profiles, Fewer Users)
• Use Case: Ultra High-end Quadro Workstations
• Workloads: Largest CAD models, CAE, Seismic Exploration, GPGPU compute, Deep Learning, Immersive Visualization
From T4 to V100: decreasing user density per server, increasing workflow/model complexity.
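The selection flow for Quadro vDWS can be read as two yes/no questions. A minimal sketch of that reading follows; the function name `recommend_vdws_gpu` and its parameters are hypothetical, and the ordering of the checks is an interpretation of the slide's diagram rather than official sizing guidance.

```python
def recommend_vdws_gpu(larger_models=False, cae_or_dl=False):
    """Hypothetical helper mirroring the slide's two decision prompts.

    larger_models: "My end users work with larger models or applications"
    cae_or_dl:     "My end users use CAE applications, or are experimenting
                    with DL/AI"
    """
    if cae_or_dl:
        # Ultra high-end: larger profiles, fewer users per server.
        return "NVIDIA V100"
    if larger_models:
        # High-end workstations with large, complex models.
        return "NVIDIA P40"
    # Entry to midrange: smaller profiles, more users per server.
    return "NVIDIA T4"


print(recommend_vdws_gpu())
print(recommend_vdws_gpu(larger_models=True))
print(recommend_vdws_gpu(larger_models=True, cae_or_dl=True))
```

Checking the CAE/DL question first reflects the assumption that it marks the most demanding tier; a real sizing exercise would also weigh user density and profile size.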
SELECTING THE RIGHT GPU
NVIDIA GRID vPC/vApps
                        2 x NVIDIA T4              1 x NVIDIA M10
Density                 32 users                   32 users
Form Factor             PCIe 3.0 single slot       PCIe 3.0 dual slot
Power                   140W (70W per GPU)         225W
Cores Available         CUDA, Tensor, RT           CUDA
CODECs                  VP9, H.265                 H.264
System Memory Support   > 1TB                      < 1TB
Use Case
• 2 x NVIDIA T4: Universal GPU for virtual workstations, knowledge workers, rendering, inferencing, training
• 1 x NVIDIA M10: Lowest TCO for knowledge workers
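The density and power rows of the vPC comparison imply a per-user power figure for each option. A short sketch of that arithmetic, assuming the quoted wattages are board power ratings rather than measured draw (the variable names are illustrative):

```python
# Figures from the "NVIDIA GRID vPC/vApps" comparison slide.
configs = {
    "2 x NVIDIA T4": {"users": 32, "power_w": 140},
    "1 x NVIDIA M10": {"users": 32, "power_w": 225},
}

# Rated GPU power divided by supported users, per configuration.
watts_per_user = {
    name: cfg["power_w"] / cfg["users"] for name, cfg in configs.items()
}

# Fractional reduction in rated GPU power when using two T4s instead of one M10.
power_saving = 1 - configs["2 x NVIDIA T4"]["power_w"] / configs["1 x NVIDIA M10"]["power_w"]

for name, watts in watts_per_user.items():
    print(f"{name}: {watts:.2f} W per user")
print(f"Rated power reduction with 2 x T4: {power_saving:.1%}")
```

These numbers cover GPU board power only; total server power per user depends on the rest of the host.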
NVIDIA T4 FOR VIRTUALIZATION
Powerful, Versatile Platform for VDI
• Powerful virtual workstation for the engineer, professional designer, and data scientist
• Deep learning inferencing for virtual labs and classrooms
• High density virtual desktops for the best user experience for Windows 10
