02 - Accelerate Your Journey To AI With IBM Power System - v1 - Nguyen Si Duy

Accelerate your
journey to AI with IBM
Nguyen Si Duy
Server Technical Specialist
IBM System Hardware
28 Apr 2021
AI is Transforming Every Industry
• 300% Increase in AI Spend Year over Year: “AI is the fastest growing workload on the planet”
• 200% Increase in Jobs Requiring AI Skills: “Demand for AI talent on the rise”
• 9/10 CIOs Planning to Use Machine Learning: “Businesses are preparing for the widespread adoption of machine learning”
Make Your Company an AI Company
Healthcare and
Financial Services Manufacturing Marketing and Retail
Life Science
Wholesale / Commercial • Early cancer detection • Stock level predictions • Funnel predictions
Banking • Product recommendations • Demand forecasting • Personalized ads
• Know Your Customers (KYC) • Personalized prescription • Returns forecasting • Credit scoring
• Anti-Money Laundering matching • Machine failure • Fraud detection
(AML) • • Preemptive maintenance
Medical claim fraud
• Fault detection • Next best offer
Card / Payments Business detection
• Security Cyberlake • Next best action
• Transaction frauds • Flu season prediction
• Master Data Management • Customer segmentation
• Collusion fraud • Drug discovery • Dispatch failures • Customer churn
• Real-time targeting • ER and hospital • Arrival delays
management • Damaged packaged • Customer recommendations
• Credit risk scoring • Ad predictions and fraud
• In-context promotion • Remote patient monitoring • Returns
• Medical test predictions
Retail Banking
• Deposit fraud
• Customer churn prediction
• Auto-loan
Save Time. Save Money. Gain a Competitive Edge.
IBM Cognitive Systems AI Portfolio
AI for Data Scientists and non-Data Scientists
• IBM Visual Insights: Auto-DL for Images & Video (Label, Train, Deploy)
• H2O Driverless AI: Auto-ML for Text & Numeric Data, NLP (Import, Experiment, Deploy)
Watson Machine Learning Community Edition (WML CE: Open Source ML Frameworks)
• Large Model Support (LMS)
• Distributed Deep Learning (up to 4 nodes)
Watson Machine Learning Accelerator
• Deep Learning Impact (DLI) Module
• Distributed Deep Learning (DDL – 1000s of nodes)
• Auto Hyper-parameter Tuning
• Data & Model Management, ETL, Visualize, Advise
IBM Spectrum Conductor with Spark
• Cluster Virtualization, Dynamic Resource Orchestration, Multiple Frameworks, Distributed Execution Engine
Accelerated Infrastructure
• Accelerated Servers
• Storage
POWER Processor Roadmap
POWER5 (2004, 130 nm) and POWER5+ (90 nm), 1.5 – 2.3 GHz
▪ Dual Core
▪ Enhanced Scaling
▪ SMT 2
▪ Distributed Switch +
▪ Core Parallelism +
▪ FP Performance +
▪ Memory bandwidth +
▪ Virtualization
▪ Transistors 276M
POWER6 (2007, 65 nm) and POWER6+ (65 nm), 3-5 GHz
▪ Dual Core
▪ High Frequencies
▪ Virtualization +
▪ Memory Subsystem +
▪ Altivec
▪ Instruction Retry
▪ Dynamic Energy Mgmt
▪ SMT 2+
▪ Protection Keys
▪ Transistors 790M
POWER7 (2010, 45 nm) and POWER7+ (2012, 32 nm), 3-4+ GHz
▪ Multi Core (up to 8)
▪ On-Chip eDRAM (L3)
▪ Power Optimized Cores
▪ Mem Subsystem ++
▪ SMT 4
▪ Reliability +
▪ VSM & VSX (AltiVec)
▪ Protection Keys+
▪ Transistors 1.2B (POWER7+ 2.1B)
POWER8 (2014, 22 nm) and POWER8NV (2016, 22 nm), 3-4+ GHz
▪ Multi Core (up to 12)
▪ Double L1 data cache
▪ On Chip eDRAM (L3)
▪ L4 on Memory Cards
▪ Bandwidth +++
▪ SMT 8
▪ Reliability ++
▪ On Chip Power Mgmt
▪ PCIe Gen 3
▪ Transistors 4B+
POWER9 (2017)
▪ Enhanced Thread performance
▪ Analytics Optimization
▪ Extreme Big Data Optimization
▪ Next Gen Memory
▪ On-chip accelerators
▪ Open CAPI
▪ BW enhancements
▪ Transistors 8B
POWER10 (2020)

IBM POWER9 Family
When data-intensive workloads are the bottom line
Mission Critical Data Intensive Workloads for Private Clouds
• Entry: S922/S914/S924, H922/H924/L922/IC922
• Midsize: E950
• Enterprise: E980
Big Data Workloads
• LC922/LC921
Enterprise AI Workloads
• AC922
• IC922
Traditional infrastructure isn’t suited for AI workloads
The wrong infrastructure puts AI at risk.
• Data pipeline too slow, causing bottleneck effect
• Systems don't easily scale to meet demand
• Processor not optimized for AI workloads
IBM Cognitive Systems / February 2020 / © 2020 IBM Corporation

Evolving from Compute Systems to Cognitive Systems
Not Just About Hardware Design
It’s about co-optimization: software + hardware which just works for ML, DL, and AI
P8, P9, P10
• Dev Ecosystem
• Industry Alignment
• Partnerships
• Open Frameworks
• IBM Software
• Open Accelerator Interfaces
• Accelerator Roadmaps
IBM Power Systems for AI
• Data: IBM Power System IC922, IBM Spectrum Scale Storage, IBM Cloud Pak
• Train: IBM Power System AC922 with IBM Visual Insights, IBM Watson Machine Learning Accelerator, IBM Watson Machine Learning CE
• Inference: IBM Power System IC922 with IBM Visual Insights

Air Cool Mechanical Overview
BMC (Service Processor Card)
• IPMI
• 1 Gb Eth
• 1 VGA
• 1 USB 3.0
PCIe slot (4x)
• Gen4 PCIe
• 2, x16 HHHL Adapter
• 1, x8,x8 Shared HHHL Adapter
• 1 x4 HHHL Adapter
POWER9 Processor (2x)
• 190W & 250W
Memory DIMMs (16x)
• 8 DDR4 IS DIMMs per socket
Cooling Fans (4x)
• Counter-rotating
• Hot swap
• 80mm
Power Supplies (2x)
• 2200W
• Configuration limits for redundancy
• Hot swap
• 200VAC, 277VAC, 400VDC input
Storage
• Optional 2x SFF SATA Disk
• Optional 2x SFF SATA SSD
• Disks are tray based for hot swap
NVIDIA Volta GPU
• 2 per socket
• SXM2 form factor
• 300W
• NVLink 2.0
• Air Cooled
Operator Interface
• 1 USB 3.0
• Power Button
• Service LEDs
Note: Front Bezel removed
Typical GPU connectivity on x86
• System memory bus: 76.8 GB/s
• CPU to each GPU over PCIe: 32 GB/s
• NVLink on x86: this speeds up GPU → GPU communications ONLY
NVLink CPU to GPU connectivity only on POWER9
• System memory bus: 170 GB/s
• CPU to each GPU over NVLink 2.0: 150 GB/s
• GPU to GPU over NVLink 2.0: 150 GB/s
• Coherent GPU access to system memory
• This speeds up CPU → GPU AND GPU → GPU communications
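To make the bandwidth figures above concrete, here is a minimal sketch that estimates the ideal time to move a buffer between system memory and a GPU at the two quoted CPU-to-GPU link rates. The 16 GB buffer size and the name `transfer_seconds` are illustrative assumptions, not from the slides. The calculation ignores latency, protocol overhead, and contention, so real transfers will be slower.

```python
# Peak CPU <-> GPU link rates quoted on the slide, in GB/s.
PCIE_GBPS = 32.0      # PCIe on a typical x86 server
NVLINK2_GBPS = 150.0  # NVLink 2.0 on POWER9


def transfer_seconds(size_gb: float, bandwidth_gbps: float) -> float:
    """Ideal time to move size_gb gigabytes at a sustained rate (no overhead)."""
    if bandwidth_gbps <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gbps


# Hypothetical 16 GB buffer (an assumption chosen for illustration).
size_gb = 16.0
pcie_s = transfer_seconds(size_gb, PCIE_GBPS)
nvlink_s = transfer_seconds(size_gb, NVLINK2_GBPS)
ratio = pcie_s / nvlink_s

print(f"PCIe:       {pcie_s:.3f} s")
print(f"NVLink 2.0: {nvlink_s:.3f} s")
print(f"Ratio:      {ratio:.2f}x")
```

The ratio depends only on the two link rates (150 / 32), so it is the same for any buffer size under these idealized assumptions.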
Large AI Models Train ~4X Faster
Caffe with LMS (Large Model Support)
POWER9 Servers with NVLink for CPUs to GPUs vs. x86 Servers with PCIe for CPUs to GPUs
Runtime of 1000 Iterations, GoogleNet model on Enlarged ImageNet Dataset (2240x2240):
• Xeon x86 2640v4 w/ 4x V100 GPUs: 3.1 Hours
• Power AC922 w/ 4x V100 GPUs: 49 Mins (3.8x Faster)
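The 3.8x figure follows from the two runtimes on the chart. A short check using only the numbers quoted above (3.1 hours and 49 minutes); the variable names are illustrative:

```python
# Runtimes for 1000 iterations as quoted on the chart.
x86_minutes = 3.1 * 60   # Xeon x86 2640v4 with 4x V100 over PCIe: 3.1 hours
power_minutes = 49.0     # Power AC922 with 4x V100 over NVLink: 49 minutes

speedup = x86_minutes / power_minutes
print(f"Speedup: {speedup:.1f}x")  # prints "Speedup: 3.8x"
```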
Limited memory on a GPU was a problem for deep neural network training
Traditional Model Support (Competitors)
Limited memory on GPU forces trade-off in model size / data resolution which leads to less complex, shallower neural nets that don’t perform

Large Model Support (IBM Power)
Use system memory and GPU coherency with NVLink 2.0 to train deep neural nets with higher resolution data and develop more accurate models for better inference capability
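The idea of backing limited GPU memory with system memory can be illustrated with a toy simulation. This is a sketch under stated assumptions, not IBM's LMS implementation. The `SwapPool` class, the least-recently-used eviction policy, and the tensor sizes are all hypothetical and chosen only to show how swapping lets a working set larger than device memory still be processed, at the cost of traffic over the CPU-GPU link.

```python
from collections import OrderedDict


class SwapPool:
    """Toy model of a small device memory backed by larger host memory."""

    def __init__(self, device_capacity_gb: float):
        self.capacity = device_capacity_gb
        self.device = OrderedDict()  # name -> size, least recently used first
        self.host = {}               # tensors currently swapped out
        self.moved_gb = 0.0          # total traffic over the CPU-GPU link

    def touch(self, name: str, size_gb: float) -> None:
        """Make a tensor resident on the device, evicting LRU tensors if needed."""
        if size_gb > self.capacity:
            raise ValueError(f"{name} alone exceeds device memory")
        if name in self.device:
            self.device.move_to_end(name)  # mark as most recently used
            return
        if name in self.host:              # swap-in costs link traffic
            del self.host[name]
            self.moved_gb += size_gb
        while sum(self.device.values()) + size_gb > self.capacity:
            victim, victim_size = self.device.popitem(last=False)
            self.host[victim] = victim_size  # swap-out to host memory
            self.moved_gb += victim_size
        self.device[name] = size_gb


# Hypothetical sizes: three 6 GB activation tensors on a 16 GB device.
pool = SwapPool(device_capacity_gb=16)
for layer in ["act1", "act2", "act3"]:  # forward pass
    pool.touch(layer, 6)
for layer in ["act3", "act2", "act1"]:  # backward pass revisits activations
    pool.touch(layer, 6)

print(sorted(pool.device), sorted(pool.host), pool.moved_gb)
```

In this run the 18 GB working set never fits in the 16 GB device at once, yet every tensor is available when needed. The price is the swap traffic recorded in `moved_gb`, which is why the CPU-GPU link bandwidth matters for this approach.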
POWER9 delivers 2.3x more images processed per second vs tested x86 systems
train more | build more | know more
• Critical capabilities (regression, nearest neighbor, recommendation systems, +++) operate on more than just the GPU memory
• Use Server and GPU memory to support higher resolution data by moving large amounts of data between the CPU and GPU
• PowerAI automatically enables seamless use of Server and GPU memory
• NVLINK 2.0 and POWER9 significantly cut training times and boost performance (accuracy) of the model with higher resolution data
GoogLeNet – 1000 epochs (HIGHER IS BETTER)
• 4x Tesla V100 GPUs, NVLINK 2.0: 4763 images / second (2.3x faster)
• 4x Tesla V100 GPUs, PCIe3: 2042 images / second
Benchmark details in speaker notes.
Server to Supercomputer on the same infrastructure
Grow with your success without having to re-platform.
The Power AC922 can scale with your business.
• 1 Power AC922 server
• The World’s 2nd fastest Super Computer: 4,608 Power AC922 servers
384 hours (16 days) to train a model built on ImageNet-22K using ResNet-101 on a server with 8 GPUs.
Distributed Deep Learning trained this model in 7 hours, 58x faster, by scaling the workload across 64 servers and 256 GPUs.
DDL makes AI scale. Now iterate!
POWER9 scales with 95% efficiency.
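Scaling efficiency is commonly defined as the measured speedup divided by the ideal (linear) speedup. The sketch below uses that usual convention, which is an assumption here since the slide does not define the term, and the function names are illustrative. It shows what 95% efficiency would imply at the 256-GPU scale mentioned above.

```python
def scaling_efficiency(speedup: float, workers: int) -> float:
    """Measured speedup over one worker divided by the ideal linear speedup."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return speedup / workers


def implied_speedup(efficiency: float, workers: int) -> float:
    """Speedup over one worker implied by a given scaling efficiency."""
    return efficiency * workers


gpus = 256
speedup = implied_speedup(0.95, gpus)
print(f"{gpus} GPUs at 95% efficiency ~ {speedup:.1f}x one GPU")
print(f"check: {scaling_efficiency(speedup, gpus):.2f}")
```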

2019 Global Server Hardware, Server OS Reliability Report
IBM Power Systems ranks #1 in every major reliability category by ITIC, including virtualization and security.
Flexible deployment
Right size, right place. To support
business models, the Power AC922 offers
flexible and scalable deployments.
Deployment options
• On-premises
• IBM Cloud Private on Power AC922
• In the IBM Cloud
• In the cloud via CSPs (e.g. Nimbix, Cirrascale Cloud)

IBM Visual Insights
“Point-and-Click” AI for Images & Video
Label Image or Video Data → Auto-Train AI Model → Package & Deploy AI Model
Demo available
Classification of x-rays to detect COVID-19
• NORMAL
• PNEUMONIA caused by Bacteria
• COVID-19 caused by Corona Virus
Streamlining the process – Label, Train and Deploy (Demo)
• LOGIN
• LABEL XRAYS INTO CATEGORIES
• AUGMENT XRAY IMAGES
• TRAIN A CLASSIFICATION MODEL
• VALIDATE & DEPLOY TRAINED MODEL
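The augmentation step in this workflow can be sketched in plain Python. This is a minimal illustration of the general technique, not how IBM Visual Insights implements it. The function names, the chosen transforms, and the tiny stand-in image are assumptions, and which transforms are appropriate for x-rays is a domain decision.

```python
import random

Image = list[list[int]]  # grayscale image as rows of pixel values (0-255)


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate_90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def adjust_brightness(img: Image, delta: int) -> Image:
    """Shift every pixel by delta, clamped to the 0-255 range."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]


def augment(img: Image, label: str, rng: random.Random) -> list[tuple[Image, str]]:
    """Return the original plus simple variants, all sharing the same label."""
    variants = [
        img,
        flip_horizontal(img),
        rotate_90(img),
        adjust_brightness(img, rng.randint(-30, 30)),
    ]
    return [(v, label) for v in variants]


# Tiny 2x3 stand-in for an x-ray; the "NORMAL" label comes from the slide.
xray = [[0, 50, 100],
        [150, 200, 250]]
samples = augment(xray, "NORMAL", random.Random(0))
print(len(samples))           # 4
print(flip_horizontal(xray))  # [[100, 50, 0], [250, 200, 150]]
print(rotate_90(xray))        # [[150, 0], [200, 50], [250, 100]]
```

Each labeled image yields several training samples with the same label, which is how augmentation stretches a limited dataset.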
Detect COVID-19 from x-ray images *
Value
• Complement shortage of radiologists and doctors in hospitals with AI to detect and prescribe treatment quickly.
• Work with available x-rays. Apply AI to augment limited datasets for training.
• Churn new models or retrain existing models with new x-ray images.
• Streamlined process to develop models in less than 60 mins.
• Save more lives by processing the x-rays more quickly than a manual process.
* Not certified by clinical trials


H2O Driverless AI Delivers Automatic ML for the Enterprise
• Performs the function of an expert data scientist
• Create models quickly with GPUs and Machine Learning automation
• Delivers insights and interpretability
• Created and supported by world renowned AI experts from H2O.ai
• Award-winning software
• Subscription based
21-day free trial for Driverless AI

Let’s start your AI
journey TODAY!
