
Meet Generative AI Application Demands at Scale, with Purpose-Built AI Infrastructure

Simisola Olabisi & Andrew K Thomas

Optimize AI performance
Deliver world-class performance for AI

Proven performance with an AI leader
▪ Recognized as a leader in AI by top industry analysts.
▪ Top in AI supercomputing with proven performance.

Speed and reliability at any scale
▪ Only Azure provides a truly virtualized cloud environment.
▪ Get exascale supercomputing and scale up and out on demand.

Proven performance and scale
▪ #3 supercomputer and #1 cloud provider, Top500 List 2023¹
▪ 30% faster training for LLMs²
▪ 2X faster throughput per GPU³
▪ 3X estimated ROI for machine learning projects⁴
▪ Scale record in LLM training, MLPerf 3.1 2023⁵
AI innovators run on Azure AI Infrastructure

"NVIDIA and Microsoft Azure have collaborated through multiple generations of products to bring leading AI innovations to enterprises around the world. The NDv5 H100 virtual machines will help power a new era of generative AI applications and services."
Ian Buck, Vice President of hyperscale and high-performance computing, NVIDIA

"Co-designing supercomputers with Azure has been crucial for scaling our demanding AI training needs, making our research and alignment work on systems like ChatGPT possible."
Greg Brockman, President and Co-Founder, OpenAI

"Our focus on conversational AI requires us to develop and train some of the most complex large language models. Azure's AI infrastructure provides us with the necessary performance to efficiently train these models reliably at a huge scale. We are thrilled about the new VMs on Azure and the increased performance they will bring to our AI development efforts."
Mustafa Suleyman, CEO, Inflection
Microsoft is powered by Azure AI infrastructure

Bing Chat · Security Copilot · Microsoft 365 Copilot · Windows Copilot · Dynamics 365 Copilot · Azure OpenAI API · Edge · Teams

Microsoft runs on Azure AI, and Azure AI runs on Azure HPC infrastructure:
▪ Real-time inference & low-cost compute
▪ Mid-range training & dense inference
▪ Distributed training & generative inference

Microsoft runs on Azure AI infrastructure
▪ 7.5 trillion characters translated per month
▪ 54 million meeting hours transcribed per month
▪ 100 million monthly active users of AI text predictions
Azure beats on-prem and bare-metal for inference

The ND H100 v5-series delivered 0.99x–1.05x relative performance compared to the bare-metal and on-prem competitors.

[Chart: Performance on LLM GPT-J (6B parameters, 99% accuracy), MLPerf Inference v3.1, as of September 2023. Samples/s on the Server and Offline benchmarks for Azure (VM), NVIDIA (on-prem), and Oracle (bare metal).]
Azure is the most scalable and performant cloud for training

"Azure's submission, the largest in the history of MLPerf Training, demonstrates the extraordinary progress we have made in optimizing the scale of training."
David Kanter, Executive Director of MLCommons

[Chart: MLPerf-LLM 175B training scale record by Azure ND H100 v5-series, as of November 2023. Previous record: 3,584 NVIDIA H100 Tensor Core GPUs, 10.9 minutes (MLPerf Training v3.0, June 2023). Azure: 10,752 GPUs, 4.0 minutes (MLPerf Training v3.1, November 2023).]

aka.ms/AzureBlog/MLPerf3.1
Azure journey on the Top500 List

#3 supercomputer all-up New cloud record of 561 petaFLOPS


600
#3
#1 cloud provider 500

400

19X year-to-year
performance increase

petaFLOPS
300

200

10X list-to-list
performance increase
100
#13 #14
#11
0

June 2022 November 2022 June 2023 November 2023


Accelerate AI innovation
Complex AI models require purpose-built infrastructure

GPU architecture
▪ Heavily parallelized
▪ Optimization-focused
▪ Purpose-built design
▪ Proven speed & scale

Purpose-built infrastructure
▪ Accelerated model training & inferencing
▪ Real-time responsiveness from cloud to edge
▪ Improved performance
▪ Automated and repeatable workflow
▪ Ability to focus on AI innovation

Model development
▪ Data-based models
▪ Insight driven
▪ Platform service model
▪ Emerging use cases
Accelerate innovation with a full-stack solution

Transformative AI services: Azure AI Services · Azure Machine Learning · Azure Data Lake
Optimized compute: ND-series VMs · NC-series VMs
End-to-end workload orchestration: VM Scale Sets · Azure Batch · Azure CycleCloud · Azure HPC Cache
Fast, secure networking: InfiniBand · ExpressRoute
High-performing storage: Azure Blob · Azure Managed Lustre · Azure HPC Cache · Azure NetApp Files
Meeting AI needs with Azure AI infrastructure

AI Supercomputing: High-end AI model training requiring massive exascale performance, typically for models with hundreds of billions of parameters.

Mid-Range Training: Mid-range training or AI workloads optimized with parallel processing across GPUs. Ideal for less complex AI workloads, where better performance delivers accelerated results.

Inferencing: Typically achieved with HPC CPU or GPU performance. Supports a variety of AI needs with a focus on accelerated responses from existing model queries.
Optimize AI compute with GPUs

Connect: GPUs can be interconnected.
Exploit: GPUs can be programmatically exploited.
Exchange: MPI allows blocks of data to be exchanged.
Cluster: HPC clusters can be simultaneously exploited for algorithmic gains.
Facilitate: CPUs facilitate computation via communication.
Real-time inferencing | Batch inference | Basic training | Midrange training | Data-parallel training | Model-parallel training
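In data-parallel training, the Exchange step above is typically an MPI-style allreduce: each worker computes gradients on its own data shard, the cluster exchanges and averages them, and every worker then applies the same update. A minimal stdlib-only sketch of that averaging pattern (the gradient values are made up; real jobs use MPI or NCCL collectives over InfiniBand):

```python
# Sketch of the "Exchange" step in data-parallel training: each worker
# computes gradients on its own shard, then an allreduce-style average
# gives every worker an identical update. (Illustrative simulation;
# production jobs use MPI/NCCL collectives, not this loop.)

def allreduce_mean(worker_grads):
    """Average per-worker gradient vectors element-wise."""
    n_workers = len(worker_grads)
    return [sum(g) / n_workers for g in zip(*worker_grads)]

# Hypothetical gradients from 4 workers for a 3-parameter model.
grads = [
    [0.4, -0.2, 0.1],
    [0.2, -0.4, 0.3],
    [0.6,  0.0, 0.1],
    [0.0, -0.2, 0.3],
]

avg = allreduce_mean(grads)
print(avg)  # every worker now holds the same averaged gradient
```

The same pattern scales from a single multi-GPU VM to thousands of nodes; only the transport underneath changes.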
Azure provides the best choices for optimal GPU utilization

NVIDIA Triton Inference Server · Intelligent edge devices · Lightweight GPU

NV series (visualization):
▪ NV & NVv3 (Tesla M60)
▪ NVadsA10 v5 (NVIDIA A10 GPU)

NC series:
▪ NC (Tesla K80)
▪ NCsv2 (Tesla P100)
▪ NCsv3 (NVIDIA V100 Tensor Core GPU)
▪ NCas_T4_v3 (NVIDIA T4 Tensor Core GPU)

ND series:
▪ ND (Tesla P40)
▪ NDv2 (NVIDIA V100 Tensor Core GPU)
▪ ND A100 v4 (NVIDIA A100 Tensor Core GPU 40GB)
▪ NDm A100 v4 (NVIDIA A100 Tensor Core GPU 80GB)
▪ ND H100 v5 (NVIDIA H100 Tensor Core GPU 80GB)

Real-Time Inferencing | Batch Inferencing | Basic Training | Midrange Training | Data-Parallel Training | Model-Parallel Training
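For reference, provisioning one of these SKUs is a single Azure CLI call. A hedged sketch for an ND H100 v5 node: the resource group, VM name, and image URN are placeholders, and you should confirm the size name and regional availability with `az vm list-sizes` before use.

```shell
# Illustrative only: provision an ND H100 v5 VM for distributed training.
# Resource group, VM name, and image URN are placeholder values.
az vm create \
  --resource-group my-ai-rg \
  --name nd-h100-node \
  --size Standard_ND96isr_H100_v5 \
  --image microsoft-dsvm:ubuntu-hpc:2204:latest \
  --generate-ssh-keys
```

The Ubuntu-HPC image ships with GPU drivers and InfiniBand tooling preinstalled, which is why it is commonly paired with the ND series.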
NVIDIA teams with Microsoft for AI supercomputing

▪ Microsoft to work with NVIDIA to build large AI supercomputers
▪ NVIDIA to use Azure supercomputers for AI research and development
▪ Azure is the first public cloud to incorporate NVIDIA's advanced AI stack, adding tens of thousands of NVIDIA A100 and H100 GPUs, NVIDIA Quantum-2 400Gb/s InfiniBand networking, and the NVIDIA AI Enterprise software suite to its platform
▪ As part of the collaboration, NVIDIA will utilize Azure's scalable virtual machine instances to research and further accelerate advances in generative AI
Azure offers a full suite of AI services

Applications: Partner Solutions

Application platform: AI Builder · Power BI · Power Apps · Power Automate · Power Virtual Agents

Customizable models and scenario-based services:
▪ Azure OpenAI Service
▪ Vision · Speech · Language · Decision
▪ Azure AI services: Bot Service · Cognitive Search · Document Intelligence · Video Indexer · Metrics Advisor · Immersive Reader

ML platform: Azure Machine Learning

Azure HPC platform: Virtual Machines · Networking · Storage · Workload services
Azure supports familiar toolchains and frameworks

ONNX: a community project created by Meta and Microsoft
▪ Use the best tool for the job.
▪ Train in one framework and transfer to another for inference.

Supported frameworks: TensorFlow · PyTorch · Scikit-Learn · MXNet · Chainer · Keras


Azure has the most comprehensive stack for AI

Audiences: Business · Data Scientist · IT Department · Process + Product

Category | Challenge | What Azure offers
Model development | "How do I train state-of-the-art models applicable to my organization?" | Prebuilt environments
Scaling model platform | "I need access to large amounts of compute and storage." | Pre-bought GPU reserved instances
Model deployment | "How do I deploy large models in production?" | Model deployment
Model optimization | "I need a model small enough to be deployed on an edge device." | ONNX Runtime
Platform management | "How do we secure our models and data in the different phases of the lifecycle?" | Single pane of glass (Azure ML)
Responsible AI | "I need to integrate an audit trail for a regulator." | Responsible AI
Knowledge & experience | "Our employees should be able to focus on their core skill without having to learn additional skills." | Built-in functionality