
Accelerate your journey to AI with IBM

Nguyen Si Duy
Server Technical Specialist
IBM System Hardware
28 Apr 2021
AI is Transforming Every Industry

300% Increase in AI Spend Year over Year
“AI is the fastest growing workload on the planet”

200% Increase in Jobs Requiring AI Skills
“Demand for AI talent on the rise”

9/10 CIOs Planning to Use Machine Learning
“Businesses are preparing for the widespread adoption of machine learning”
Make Your Company an AI Company

Healthcare and
Financial Services Manufacturing Marketing and Retail
Life Science
Wholesale / Commercial • Early cancer detection • Stock level predictions • Funnel predictions
Banking • Product recommendations • Demand forecasting • Personalized ads
• Know Your Customers (KYC) • Personalized prescription • Returns forecasting • Credit scoring
• Anti-Money Laundering matching • Machine failure • Fraud detection
(AML) • • Preemptive maintenance
Medical claim fraud
• Fault detection • Next best offer
Card / Payments Business detection
• Security Cyberlake • Next best action
• Transaction frauds • Flu season prediction
• Master Data Management • Customer segmentation
• Collusion fraud • Drug discovery • Dispatch failures • Customer churn
• Real-time targeting • ER and hospital • Arrival delays
management • Damaged packaged • Customer recommendations
• Credit risk scoring • Ad predictions and fraud
• In-context promotion • Remote patient monitoring • Returns
• Medical test predictions
Retail Banking
• Deposit fraud
• Customer churn prediction
• Auto-loan
Save Time. Save Money. Gain a Competitive Edge.
IBM Cognitive Systems AI Portfolio

AI for Data Scientists and non-Data Scientists
• IBM Visual Insights: Auto-DL for Images & Video (Label, Train, Deploy)
• H2O Driverless AI: Auto-ML for Text & Numeric Data, NLP (Import, Experiment, Deploy)

Watson Machine Learning Community Edition
• WML CE: Open Source ML Frameworks
• Large Model Support (LMS)
• Distributed Deep Learning (up to 4 nodes)

Watson Machine Learning Accelerator
• Deep Learning Impact (DLI) Module
• Distributed Deep Learning (DDL – 1000s of nodes)
• Auto Hyper-parameter Tuning
• Data & Model Management, ETL, Visualize, Advise
• IBM Spectrum Conductor with Spark: Cluster Virtualization, Dynamic Resource Orchestration, Multiple Frameworks, Distributed Execution Engine

Accelerated Infrastructure
• Accelerated Servers
• Storage
POWER Processor Roadmap

POWER10
POWER9
POWER7+ POWER8
POWER7 32 nm 22 nm
POWER8NV
45 nm 3-4+ GHz
POWER6+ 22 nm
POWER6 65 nm 3-4+GHz
POWER5™ POWER5+ 65 nm
90 nm
130 nm 3-5GHz ▪ Enhanced Thread
1.5 – 2.3GHz performance
▪ Multi Core (up to 12)
▪ Dual Core
▪ Multi Core (up to 8)
▪ Double L1 data cache ▪ Analytics
▪ On-Chip eDRAM (L3) Optimization
▪ Dual Core ▪ High Frequencies ▪ Power Optimized Cores ▪ On Chip eDRAM (L3)
▪ Enhanced Scaling ▪ Virtualization + ▪ L4 on Memory Cards ▪ Extreme Big Data
▪ Mem Subsystem ++
▪ SMT 2 ▪ Memory Subsystem + ▪ SMT 4 ▪ Bandwidth +++ Optimization
▪ Distributed Switch + ▪ Altivec ▪ Reliability + ▪ SMT 8 ▪ Next Gen Memory
▪ Core Parallelism + ▪ Instruction Retry ▪ VSM & VSX (AltiVec) ▪ Reliability ++ ▪ On-chip accelerators
▪ FP Performance + ▪ Dynamic Energy Mgmt ▪
▪ On Chip Power Mgmt
Protection Keys+ ▪ Open CAPI
▪ Memory bandwidth + ▪ SMT 2+ ▪ PCIe Gen 3
▪ BW enhancements
▪ Virtualization ▪ Protection Keys ▪ Transistors 1.2B
▪ POWER7+ 2.1B
▪ Transistors 4B+
▪ Transistors 790M ▪ Transistors 8B
▪ Transistors 276M

2004 2007 2010 2012 2014 2016 2017 2020


IBM POWER9 Family
When data-intensive workloads are the bottom line

Mission Critical Data Intensive Workloads for Private Clouds Big Data Workloads Enterprise AI Workloads
Entry Midsize Enterprise
AC922
S922/S914/S924 E950 E980 LC922/LC921
IC922
H922/H924/L922/IC922
Traditional infrastructure isn’t suited for AI workloads
The wrong infrastructure puts AI at risk.

Data pipeline too slow, causing bottleneck effect

Systems don't easily scale to meet demand

Processor not optimized for AI workloads

IBM Cognitive Systems / February 2020 / © 2020 IBM Corporation


Evolving from Compute Systems to Cognitive Systems
Not Just About Hardware Design

It’s about co-optimization: software + hardware which just works for ML, DL, and AI

Dev Ecosystem
Industry Alignment
Partnerships
Open Frameworks
IBM Software
P8 P9 P10
Open Accelerator Interfaces
Accelerator Roadmaps
IBM Power Systems for AI
IBM Visual Insights

IBM Watson Machine Learning Accelerator IBM Visual Insights

IBM Watson Machine Learning CE

Data Train Inference

IBM Power System IC922 IBM Power System AC922 IBM Power System IC922

IBM Spectrum Scale Storage

IBM Cloud Pak



Air Cool Mechanical Overview
Note: Front Bezel removed

BMC (Service Processor Card)
• IPMI
• 1 Gb Eth
• 1 VGA
• 1 USB 3.0

PCIe slot (4x)
• Gen4 PCIe
• 2, x16 HHHL Adapter
• 1, x8,x8 Shared HHHL Adapter
• 1 x4 HHHL Adapter

POWER9 Processor (2x)
• 190W & 250W

Memory DIMMs (16x)
• 8 DDR4 IS DIMMs per socket

Cooling Fans (4x)
• Counter-rotating
• Hot swap
• 80mm

Power Supplies (2x)
• 2200W
• Configuration limits for redundancy
• Hot swap
• 200VAC, 277VAC, 400VDC input

Storage
• Optional 2x SFF SATA Disk
• Optional 2x SFF SATA SSD
• Disks are tray based for hot swap

NVidia Volta GPU
• 2 per socket
• SXM2 form factor
• 300W
• NVLink 2.0
• Air Cooled

Operator Interface
• 1 USB 3.0
• Power Button
• Service LEDs
This speeds up CPU → GPU AND GPU → GPU communications

This ONLY speeds up GPU → GPU communications
Typical GPU connectivity on x86

System memory to CPU (system memory bus): 76.8 GB/s
CPU to each GPU (PCIe): 32 GB/s
NVLink on x86
NVLink CPU to GPU connectivity only on POWER9
System memory to CPU (system memory bus): 170 GB/s
CPU to each GPU (NVLink 2.0): 150 GB/s
GPU to GPU (NVLink 2.0): 150 GB/s
Each GPU has coherent access to system memory
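
To put the two CPU-to-GPU link speeds side by side, the following is a minimal sketch that estimates ideal transfer time at the peak bandwidths quoted on the slides (32 GB/s for PCIe, 150 GB/s for NVLink 2.0). The 16 GB payload and the helper name `transfer_seconds` are assumptions made for illustration; real transfers rarely reach peak bandwidth.

```python
# Peak link bandwidths quoted on the slides, in GB/s.
PCIE_GBPS = 32.0      # CPU <-> GPU over PCIe on x86
NVLINK2_GBPS = 150.0  # CPU <-> GPU over NVLink 2.0 on POWER9


def transfer_seconds(size_gb: float, bandwidth_gbps: float) -> float:
    """Ideal time to move size_gb gigabytes at the given peak bandwidth."""
    if bandwidth_gbps <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gbps


# Hypothetical payload size (an assumption, not a figure from the slides).
payload_gb = 16.0

pcie_time = transfer_seconds(payload_gb, PCIE_GBPS)
nvlink_time = transfer_seconds(payload_gb, NVLINK2_GBPS)
bandwidth_ratio = NVLINK2_GBPS / PCIE_GBPS

print(f"PCIe:       {pcie_time:.3f} s")
print(f"NVLink 2.0: {nvlink_time:.3f} s")
print(f"Bandwidth ratio: {bandwidth_ratio:.2f}x")
```

The ratio depends only on the two quoted bandwidths, so it holds for any payload size under this idealized model.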
Large AI Models Train ~4X Faster
POWER9 Servers with NVLink for CPUs to GPUs vs. x86 Servers with PCIe for CPUs to GPUs

Caffe with LMS (Large Model Support): Runtime of 1000 Iterations, Time (secs)
• Xeon x86 2640v4 w/ 4x V100 GPUs: 3.1 Hours
• Power AC922 w/ 4x V100 GPUs: 49 Mins (3.8x Faster)

GoogleNet model on Enlarged ImageNet Dataset (2240x2240)
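
The 3.8x figure can be checked from the two runtimes on the chart. This is a small sketch of that arithmetic only; the function name `speedup` is an illustrative choice, and the inputs are the rounded values shown on the slide (3.1 hours and 49 minutes).

```python
def speedup(baseline_seconds: float, improved_seconds: float) -> float:
    """How many times faster the improved run is than the baseline."""
    if improved_seconds <= 0:
        raise ValueError("improved_seconds must be positive")
    return baseline_seconds / improved_seconds


# Runtimes as quoted on the chart, converted to seconds.
X86_SECONDS = 3.1 * 3600   # 3.1 hours on the x86 server with PCIe
AC922_SECONDS = 49 * 60    # 49 minutes on the Power AC922 with NVLink

lms_speedup = speedup(X86_SECONDS, AC922_SECONDS)
print(f"{lms_speedup:.1f}x faster")
```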
Limited memory on a GPU was a problem for deep neural network training

Traditional Model Support (Competitors)
Limited memory on GPU forces trade-off in model size / data resolution which leads to less complex, shallower neural nets that don’t perform

Large Model Support (IBM Power)
Use system memory and GPU coherency with NVLink 2.0 to train deep neural nets with higher resolution data and develop more accurate models for better inference capability
POWER9 delivers 2.3x more images processed per second vs tested x86 systems
train more | build more | know more

Critical capabilities (regression, nearest neighbor, recommendation systems, +++) operate on more than just the GPU memory

Use Server and GPU memory to support higher resolution data by moving large amounts of data between the CPU and GPU

PowerAI automatically enables seamless use of Server and GPU memory

NVLINK 2.0 and POWER9 significantly cut training times and boost performance (accuracy) of the model with higher resolution data

GoogLeNet – 1000 epochs, images / second (HIGHER IS BETTER)
• 4x Tesla V100 GPUs, NVLINK 2.0: 4763 (2.3x faster)
• 4x Tesla V100 GPUs, PCIe3: 2042
Benchmark details in speaker notes.
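
As a quick check on the 2.3x claim, the sketch below divides the two throughput figures from the chart (4763 and 2042 images per second). The one-hour window used to express the gap in absolute terms is an assumption added for illustration.

```python
# Throughput figures from the chart, in images per second.
NVLINK_IMAGES_PER_SEC = 4763  # 4x Tesla V100 with NVLink 2.0 (POWER9)
PCIE_IMAGES_PER_SEC = 2042    # 4x Tesla V100 with PCIe3 (x86)

throughput_ratio = NVLINK_IMAGES_PER_SEC / PCIE_IMAGES_PER_SEC

# Hypothetical one-hour window (an assumption, not from the slides).
SECONDS_PER_HOUR = 3600
extra_images_per_hour = (
    NVLINK_IMAGES_PER_SEC - PCIE_IMAGES_PER_SEC
) * SECONDS_PER_HOUR

print(f"Throughput ratio: {throughput_ratio:.2f}x")
print(f"Extra images in one hour: {extra_images_per_hour:,}")
```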
Server to Supercomputer
on the same infrastructure

Grow with your success without having to re-platform.

The Power AC922 can scale with your business.

1 Power AC922 server

The World’s 2nd fastest Super Computer: 4,608 Power AC922 servers
384 hours (16 days) to train a model built on ImageNet-22K using ResNet-101 on a server with 8 GPUs.

Distributed Deep Learning trained this model in 7 hours, 58x faster, by scaling the workload across 64 servers and 256 GPUs.

DDL makes AI scale. Now iterate!

POWER9 scales with 95% efficiency.
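
The scaling figures above can be explored with a few lines of arithmetic. This is a sketch under stated assumptions: it uses the rounded times on the slide (384 hours and 7 hours), which give roughly 55x, so the slide's 58x presumably rests on unrounded timings that are not shown. It also assumes "95% efficiency" means the fraction of ideal linear scaling retained; `effective_gpus` is a hypothetical helper name.

```python
# Figures quoted on the slide.
SINGLE_SERVER_HOURS = 384.0  # 16 days on one server
DDL_HOURS = 7.0              # with Distributed Deep Learning
DDL_GPUS = 256               # across 64 servers
SCALING_EFFICIENCY = 0.95    # "POWER9 scales with 95% efficiency"

# Speedup computed from the rounded times on the slide.
time_speedup = SINGLE_SERVER_HOURS / DDL_HOURS


def effective_gpus(n_gpus: int, efficiency: float) -> float:
    """GPUs' worth of ideal throughput left after scaling losses.

    Assumes efficiency is the fraction of linear scaling retained.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return n_gpus * efficiency


ideal_equivalent = effective_gpus(DDL_GPUS, SCALING_EFFICIENCY)

print(f"Speedup from rounded times: {time_speedup:.1f}x")
print(f"256 GPUs at 95% efficiency ~ {ideal_equivalent:.1f} ideal GPUs")
```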


2019 Global Server Hardware, Server OS Reliability Report

IBM Power Systems ranks #1 in every major reliability category by ITIC including virtualization and security
Flexible deployment
Right size, right place. To support
business models, the Power AC922 offers
flexible and scalable deployments.

Deployment options

• On-premises

• IBM Cloud Private on Power AC922

• In the IBM Cloud

• In the cloud via CSPs (e.g. Nimbix, Cirrascale Cloud)
IBM Visual Insights
“Point-and-Click” AI for Images & Video
Label Image or Video Data
Auto-Train AI Model
Package & Deploy AI Model

Demo available
Classification of x-rays to detect COVID-19

NORMAL
PNEUMONIA caused by Bacteria
COVID-19 caused by Corona Virus
Streamlining the process – Label, Train and Deploy Demo

LOGIN LABEL XRAYS INTO CATEGORIES AUGMENT XRAY IMAGES

TRAIN A CLASSIFICATION MODEL


VALIDATE & DEPLOY TRAINED MODEL
Detect COVID-19 from
x-ray images *

Value
• Complement shortage of radiologists and doctors in hospitals with AI to detect and prescribe treatment quickly.

• Work with available x-rays. Apply AI to augment limited datasets for training.

• Churn out new models or retrain existing models with new x-ray images.

• Streamlined process to develop models in less than 60 minutes.

• Save more lives by processing the x-rays more quickly than a manual process.

* Not certified by clinical trials


H2O Driverless AI Delivers Automatic ML for the Enterprise

• Performs the function of an expert data scientist

• Create models quickly with GPUs and Machine Learning automation

• Delivers insights and interpretability

• Created and supported by world renowned AI experts from H2O.ai

• Award-winning software

• Subscription based

21-day free trial for Driverless AI


Let’s start your AI journey TODAY!
