
Meet Generative AI Application Demands at Scale, with Purpose-Built AI Infrastructure

Simisola Olabisi & Andrew K Thomas

Optimize AI performance
Deliver world-class performance for AI

Proven performance with an AI leader
▪ Recognized as a leader in AI by top industry analysts.
▪ Top in AI supercomputing with proven performance.

Speed and reliability at any scale
▪ Only Azure provides a truly virtualized cloud environment.
▪ Get exascale supercomputing and scale up and out on demand.

Proven performance and scale
▪ #3 supercomputer and #1 cloud provider, Top500 List 2023¹
▪ 30% faster training for LLMs²
▪ 2X faster throughput per GPU³
▪ 3X estimated ROI for machine learning projects⁴
▪ Scale record in LLM training, MLPerf 3.1 2023⁵
AI innovators run on Azure AI Infrastructure

"NVIDIA and Microsoft Azure have collaborated through multiple generations of products to bring leading AI innovations to enterprises around the world. The NDv5 H100 virtual machines will help power a new era of generative AI applications and services."
Ian Buck, Vice President of hyperscale and high-performance computing, NVIDIA

"Co-designing supercomputers with Azure has been crucial for scaling our demanding AI training needs, making our research and alignment work on systems like ChatGPT possible."
Greg Brockman, President and Co-Founder, OpenAI

"Our focus on conversational AI requires us to develop and train some of the most complex large language models. Azure's AI infrastructure provides us with the necessary performance to efficiently train these models reliably at a huge scale. We are thrilled about the new VMs on Azure and the increased performance they will bring to our AI development efforts."
Mustafa Suleyman, CEO, Inflection
Microsoft is powered by Azure AI infrastructure

Bing Chat · Security Copilot · Microsoft 365 Copilot · Windows Copilot · Dynamics 365 Copilot · Azure OpenAI API · Edge · Teams

Microsoft runs on Azure AI, and Azure AI runs on Azure HPC infrastructure:
▪ Real-time inference & low-cost compute
▪ Mid-range training & dense inference
▪ Distributed training & generative inference

Microsoft runs on Azure AI infrastructure
▪ 7.5 trillion characters translated per month
▪ 54 million meeting hours transcribed per month
▪ 100 million monthly active users of AI text predictions
Azure beats on-prem and bare-metal for inference

The ND H100 v5-series delivered 0.99x–1.05x relative performance compared to the bare-metal and on-prem competitors.

[Chart: Performance on LLM GPT-J (6B parameters, 99% accuracy), MLPerf Inference v3.1, as of September 2023. Samples/s on the Server and Offline benchmarks for Azure (VM), NVIDIA (on-prem), and Oracle (bare metal).]
Azure is the most scalable and performant cloud for training

"Azure's submission, the largest in the history of MLPerf Training, demonstrates the extraordinary progress we have made in optimizing the scale of training."
David Kanter, Executive Director of MLCommons

[Chart: MLPerf-LLM 175B training scale record by Azure ND H100 v5-series, as of November 2023. Previous record: 3,584 NVIDIA H100 Tensor Core GPUs, 10.9 minutes (MLPerf Training v3.0, June 2023). Azure: 10,752 GPUs, 4.0 minutes (MLPerf Training v3.1, November 2023).]

aka.ms/AzureBlog/MLPerf3.1
Azure journey on the Top500 List

#3 supercomputer all-up New cloud record of 561 petaFLOPS


600
#3
#1 cloud provider 500

400

19X year-to-year
performance increase

petaFLOPS
300

200

10X list-to-list
performance increase
100
#13 #14
#11
0

June 2022 November 2022 June 2023 November 2023


Accelerate AI innovation
Complex AI models require purpose-built infrastructure

GPU architecture
▪ Heavily parallelized
▪ Optimization-focused
▪ Purpose-built design
▪ Proven speed & scale

Purpose-built infrastructure
▪ Accelerated model training & inferencing
▪ Real-time responsiveness from cloud to edge
▪ Improved performance
▪ Automated and repeatable workflow
▪ Ability to focus on AI innovation

Model development
▪ Data-based models
▪ Insight driven
▪ Platform service model
▪ Emerging use cases
Accelerate innovation with a full-stack solution

Transformative AI services: Azure AI Services · Azure Machine Learning · Azure Data Lake
Optimized compute: ND-series VMs · NC-series VMs
End-to-end workload orchestration: VM Scale Sets · Azure Batch · Azure CycleCloud · Azure HPC Cache
Fast, secure networking: InfiniBand · ExpressRoute
High-performing storage: Azure Blob · Azure Managed Lustre · Azure HPC Cache · Azure NetApp Files
Meeting AI needs with Azure AI infrastructure

AI Supercomputing: High-end AI model training requiring massive exascale performance, typically for models with hundreds of billions of parameters.

Mid-Range Training: Mid-range training or AI workloads optimized with parallel processing across GPUs. Ideal for less complex AI workloads, where better performance delivers accelerated results.

Inferencing: Typically achieved with HPC CPU or GPU performance. Supports a variety of AI needs with a focus on accelerated responses from existing model queries.
Optimize AI compute with GPUs

Connect: GPUs can be interconnected.
Exploit: GPUs can be programmatically exploited.
Exchange: MPI allows blocks of data to be exchanged.
Cluster: HPC clusters can be simultaneously exploited for algorithmic gains.
Facilitate: CPUs facilitate computation via communication.
Real-time inferencing | Batch inference | Basic training | Midrange training | Data-parallel training | Model-parallel training
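In data-parallel training, the Exchange step above is typically an MPI-style allreduce: each worker computes gradients on its own data shard, the cluster exchanges and averages them, and every worker then applies the same update. A minimal stdlib-only sketch of that averaging pattern (the gradient values are made up; real jobs use MPI or NCCL collectives over InfiniBand):

```python
# Sketch of the "Exchange" step in data-parallel training: each worker
# computes gradients on its own shard, then an allreduce-style average
# gives every worker an identical update. (Illustrative simulation;
# production jobs use MPI/NCCL collectives, not this loop.)

def allreduce_mean(worker_grads):
    """Average per-worker gradient vectors element-wise."""
    n_workers = len(worker_grads)
    return [sum(g) / n_workers for g in zip(*worker_grads)]

# Hypothetical gradients from 4 workers for a 3-parameter model.
grads = [
    [0.4, -0.2, 0.1],
    [0.2, -0.4, 0.3],
    [0.6,  0.0, 0.1],
    [0.0, -0.2, 0.3],
]

avg = allreduce_mean(grads)
print(avg)  # every worker now holds the same averaged gradient
```

The same pattern scales from a single multi-GPU VM to thousands of nodes; only the transport underneath changes.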
Azure provides the best choices for optimal GPU utilization

NVIDIA Triton Inference Server · Intelligent edge devices · Lightweight GPU

NV series (visualization):
▪ NV & NVv3 (Tesla M60)
▪ NVadsA10 v5 (NVIDIA A10 GPU)

NC series:
▪ NC (Tesla K80)
▪ NCsv2 (Tesla P100)
▪ NCsv3 (NVIDIA V100 Tensor Core GPU)
▪ NCas_T4_v3 (NVIDIA T4 Tensor Core GPU)

ND series:
▪ ND (Tesla P40)
▪ NDv2 (NVIDIA V100 Tensor Core GPU)
▪ ND A100 v4 (NVIDIA A100 Tensor Core GPU 40GB)
▪ NDm A100 v4 (NVIDIA A100 Tensor Core GPU 80GB)
▪ ND H100 v5 (NVIDIA H100 Tensor Core GPU 80GB)

Real-Time Inferencing | Batch Inferencing | Basic Training | Midrange Training | Data-Parallel Training | Model-Parallel Training
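For reference, provisioning one of these SKUs is a single Azure CLI call. A hedged sketch for an ND H100 v5 node: the resource group, VM name, and image URN are placeholders, and you should confirm the size name and regional availability with `az vm list-sizes` before use.

```shell
# Illustrative only: provision an ND H100 v5 VM for distributed training.
# Resource group, VM name, and image URN are placeholder values.
az vm create \
  --resource-group my-ai-rg \
  --name nd-h100-node \
  --size Standard_ND96isr_H100_v5 \
  --image microsoft-dsvm:ubuntu-hpc:2204:latest \
  --generate-ssh-keys
```

The Ubuntu-HPC image ships with GPU drivers and InfiniBand tooling preinstalled, which is why it is commonly paired with the ND series.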
NVIDIA teams with Microsoft for AI supercomputing

▪ Microsoft to work with NVIDIA to build large AI supercomputers
▪ NVIDIA to use Azure supercomputers for AI research and development
▪ Azure is the first public cloud to incorporate NVIDIA's advanced AI stack, adding tens of thousands of NVIDIA A100 and H100 GPUs, NVIDIA Quantum-2 400Gb/s InfiniBand networking, and the NVIDIA AI Enterprise software suite to its platform
▪ As part of the collaboration, NVIDIA will utilize Azure's scalable virtual machine instances to research and further accelerate advances in generative AI
Azure offers a full suite of AI services

Applications: Partner Solutions

Application platform: AI Builder · Power BI · Power Apps · Power Automate · Power Virtual Agents

Customizable models and scenario-based services:
▪ Azure OpenAI Service
▪ Vision · Speech · Language · Decision
▪ Azure AI services: Bot Service · Cognitive Search · Document Intelligence · Video Indexer · Metrics Advisor · Immersive Reader

ML platform: Azure Machine Learning

Azure HPC platform: Virtual Machines · Networking · Storage · Workload services
Azure supports familiar toolchains and frameworks

ONNX: a community project created by Meta and Microsoft
▪ Use the best tool for the job.
▪ Train in one framework and transfer to another for inference.

Supported frameworks: TensorFlow · PyTorch · Scikit-Learn · MXNet · Chainer · Keras


Azure has the most comprehensive stack for AI

Audiences: Business · Data Scientist · IT Department · Process + Product

Category | Challenge | What Azure offers
Model development | "How do I train state-of-the-art models applicable to my organization?" | Prebuilt environments
Scaling model platform | "I need access to large amounts of compute and storage." | Pre-bought GPU reserved instances
Model deployment | "How do I deploy large models in production?" | Model deployment
Model optimization | "I need a model small enough to be deployed on an edge device." | ONNX Runtime
Platform management | "How do we secure our models and data in the different phases of the lifecycle?" | Single pane of glass (Azure ML)
Responsible AI | "I need to integrate an audit trail for a regulator." | Responsible AI
Knowledge & experience | "Our employees should be able to focus on their core skill without having to learn additional skills." | Built-in functionality