Application Demands at Scale, with Purpose-Built AI Infrastructure
Simisola Olabisi & Andrew K Thomas
Optimize AI performance
Deliver world-class performance for AI
▪ 2X faster throughput per GPU (MLPerf 3.1)
▪ 3X estimated ROI for machine learning projects
▪ Scale record in LLM training (2023)
AI innovators run on Azure AI Infrastructure
“NVIDIA and Microsoft Azure have collaborated through multiple generations of products to bring leading AI innovations to enterprises around the world. The NDv5 H100 virtual machines will help power a new era of generative AI applications and services.”
Ian Buck, Vice President of hyperscale and high-performance computing at NVIDIA

“Co-designing supercomputers with Azure has been crucial for scaling our demanding AI training needs, making our research and alignment work on systems like ChatGPT possible.”
Greg Brockman, President and Co-Founder of OpenAI

“Our focus on conversational AI requires us to develop and train some of the most complex large language models. Azure’s AI infrastructure provides us with the necessary performance to efficiently train these models reliably at a huge scale. We are thrilled about the new VMs on Azure and the increased performance they will bring to our AI development efforts.”
Mustafa Suleyman, CEO, Inflection
Microsoft is powered by Azure AI infrastructure
Bing Chat · Security Copilot · Microsoft 365 Copilot · Windows Copilot · Dynamics 365 Copilot · Azure OpenAI API · Teams · Edge

Real-time inference & low-cost compute | Mid-range training & dense inference | Distributed training & generative inference
Microsoft runs on Azure AI infrastructure
▪ 7.5 trillion characters translated per month
▪ 54 million meeting hours transcribed per month
▪ 100 million monthly active users of AI text predictions
Azure beats on-prem and bare-metal for inference
Azure delivered 0.99x–1.05x relative performance compared to the bare-metal and on-prem competitors.
[Chart: samples/s for Azure (VM), NVIDIA (on-prem), and Oracle (bare metal) in the Server and Offline scenarios; scale results at 2,000 (MLPerf Training v3.0, June 2023) and 3,584 (MLPerf Training v3.1, November 2023).]
aka.ms/AzureBlog/MLPerf3.1
Azure journey on the Top500 List
[Chart: petaFLOPS across Top500 lists, with rankings #13, #14, and #11; 19X year-to-year performance increase; 10X list-to-list performance increase.]
GPU architecture
▪ Heavily parallelized
▪ Optimization-focused
▪ Purpose-built design

Purpose-built infrastructure

Proven speed & scale
▪ Accelerated model training & inferencing
▪ Real-time responsiveness from cloud to edge
▪ Improved performance
▪ Automated and repeatable workflow

Model development
▪ Data-based models
▪ Insight driven
▪ Platform service model

Transformative AI services: Azure AI Services · Azure Machine Learning · Azure Data Lake
Optimized compute: ND-series VMs · NC-series VMs
High-performing storage: Azure Blob · Azure Managed Lustre · Azure HPC Cache · Azure NetApp Files
Meeting AI needs with Azure AI infrastructure
▪ High-end AI model training, requiring massive exascale performance; typically required for models with hundreds of billions of parameters.
▪ Mid-range training, or AI workloads optimized with parallel processing across GPUs; ideal for less complex AI workloads, where better performance delivers accelerated results.
▪ Workloads typically achieved with HPC CPU or GPU performance; supports a variety of AI needs, with a focus on accelerated responses from existing model queries.
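The three tiers above can be read as a rough triage by model size. A minimal illustrative sketch (the function name and the lower threshold are my own assumptions; only the "hundreds of billions of parameters" cutoff comes from the text):

```python
def workload_tier(param_count):
    """Map a model's parameter count to one of the three workload tiers
    described above. Thresholds are illustrative, not official guidance."""
    if param_count >= 100e9:   # hundreds of billions of parameters (per the text)
        return "high-end exascale training"
    if param_count >= 1e9:     # assumption: ~1B+ parameters counts as mid-range
        return "mid-range GPU training"
    return "CPU/GPU inference and lighter workloads"
```

For example, a 175B-parameter LLM lands in the first tier, while serving queries against an already-trained model lands in the third.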
Optimize AI compute with GPUs
Cluster: HPC clusters can be simultaneously exploited for algorithmic gains.
Facilitate: CPUs facilitate computation via communication.
Real-time inferencing | Batch inference | Basic training | Midrange training | Data-parallel training | Model-parallel training
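The right-hand end of this spectrum, data-parallel training, can be sketched in a few lines: each worker computes a gradient on its shard of the batch, and the gradients are averaged (the all-reduce step) before a single weight update. This is a toy NumPy illustration of the idea, not Azure's or any framework's implementation:

```python
import numpy as np

def grad(w, X, y):
    """Gradient of the mean squared error 0.5 * ||Xw - y||^2 / n w.r.t. w."""
    return X.T @ (X @ w - y) / len(y)

def data_parallel_step(w, X, y, n_workers, lr=0.1):
    """One data-parallel SGD step: each simulated worker gets an equal
    shard of the batch, computes a local gradient, and the gradients are
    averaged before updating the shared weights."""
    X_shards = np.array_split(X, n_workers)
    y_shards = np.array_split(y, n_workers)
    local = [grad(w, Xs, ys) for Xs, ys in zip(X_shards, y_shards)]
    avg = np.mean(local, axis=0)  # stands in for the all-reduce across GPUs
    return w - lr * avg
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, which is why data parallelism scales training without changing the math; model-parallel training instead splits the *model* across GPUs when it is too large for one device.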
Azure provides best choices for optimal GPU utilization
NVIDIA Triton Inference Server · Intelligent edge devices · Lightweight GPU

NC series:
▪ NC (Tesla K80)
▪ NCsv2 (Tesla P100)
▪ NCsv3 (NVIDIA V100 Tensor Core GPU)
▪ NCas_T4_v3 (NVIDIA T4 Tensor Core GPU)

ND series:
▪ ND (Tesla P40)
▪ NDv2 (NVIDIA V100 Tensor Core GPU)
▪ ND A100 v4 (NVIDIA A100 Tensor Core GPU 40GB)
▪ NDm A100 v4 (NVIDIA A100 Tensor Core GPU 80GB)
▪ ND H100 v5 (NVIDIA H100 Tensor Core GPU 80GB)
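The series-to-GPU pairings above are easy to keep as a small lookup. This sketch is compiled solely from the list in this document (the helper name is my own; consult the Azure VM sizes documentation for current SKUs):

```python
# Azure GPU VM series -> GPU, as listed in this document.
AZURE_GPU_SERIES = {
    "NC": "Tesla K80",
    "NCsv2": "Tesla P100",
    "NCsv3": "NVIDIA V100 Tensor Core GPU",
    "NCas_T4_v3": "NVIDIA T4 Tensor Core GPU",
    "ND": "Tesla P40",
    "NDv2": "NVIDIA V100 Tensor Core GPU",
    "ND A100 v4": "NVIDIA A100 Tensor Core GPU 40GB",
    "NDm A100 v4": "NVIDIA A100 Tensor Core GPU 80GB",
    "ND H100 v5": "NVIDIA H100 Tensor Core GPU 80GB",
}

def series_for_gpu(gpu_substring):
    """Return every VM series whose GPU name contains the given substring."""
    return [s for s, gpu in AZURE_GPU_SERIES.items() if gpu_substring in gpu]
```

For example, `series_for_gpu("A100")` returns both the 40GB and 80GB A100 series.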
NVIDIA partners with Microsoft for AI supercomputing
▪ NVIDIA to use Azure supercomputers for AI research and development
▪ Azure is the first public cloud to incorporate NVIDIA’s advanced AI stack, adding tens of thousands of NVIDIA A100 and H100 GPUs, NVIDIA Quantum-2 400Gb/s InfiniBand networking, and the NVIDIA AI Enterprise software suite to its platform
Application platform
AI Builder · Power BI · Power Apps · Power Automate · Power Virtual Agents
Bot Service · Cognitive Search · Document Intelligence · Immersive Reader · Video Indexer · Metrics Advisor
▪ Scaling model platform: “I need access to large amounts of compute and storage” → GPU reserved instances (pre-bought)
▪ Platform management: “How do we secure our models and data in the different phases of the lifecycle?” → Pane of glass (AzureML)