
A PRACTICAL INTRODUCTION TO DEEP LEARNING
WITH KERAS AND TENSORFLOW
LEARNING GOALS

In a couple of hours, we can only travel so far.

Main goal: become familiar with the main ideas and the process.

A starting point for solving your own problems.

2
AGENDA

• Intro to DL, Part 1 (DL 101)
• Lab 1: CNNs and Keras (1:50-2:30 ET)
• Lab 2: Tropical Cyclones

Lab 2 data:
• US Naval Research Laboratory (NRL)
• 2000 to 2016
• ~30 minute interval
• Pacific and Atlantic
• Multiple geostationary satellites (GOES, Himawari, MTSAT, etc.)
• ~45,000 images

Source: https://www.nrlmry.navy.mil/tcdat/tc05/ATL/12L.KATRINA/ir/geo/1km/
3
DEEP LEARNING ANALOGIES
What is this deep learning thing, anyway?

• A NEW TYPE OF SOFTWARE
• A GENERALIZATION OF CURVE FITTING
4
A NEW WAY TO BUILD SOFTWARE
Traditional Programming vs Machine Learning

SOFTWARE 1.0: Traditional Programming
Programmer + expert knowledge → human-readable task function

SOFTWARE 2.0: Machine Learning
Optimizer (e.g. Adam) + tons of examples → machine-learned function
5
A DIFFERENT WAY TO BUILD SOFTWARE
Traditional Programming vs Machine Learning

SOFTWARE 2.0: Machine Learning
Optimizer (e.g. Adam) + tons of examples → machine-learned function

Goals for today:
1. Learn to use this new approach
2. Revolutionize Science

6
A DIFFERENT WAY TO BUILD SOFTWARE
Hand-written vs learned functions

Inputs: TEMP, PRESSURE, MOISTURE → Output: PROBABILITY OF RAIN

HAND-WRITTEN FUNCTION (convert expert knowledge into a function):

Function1(T, P, Q):
    update_mass()
    update_momentum()
    update_energy()
    do_macrophysics()
    do_microphysics()
    y = get_precipitation()
    return y

LEARNED FUNCTION (reverse-engineer a function from inputs / outputs):

Function1(T, P, Q):
    A = relu(w1 * [T,P,Q] + b1)
    B = relu(w2 * A + b2)
    C = relu(w3 * B + b3)
    D = relu(w4 * C + b4)
    E = relu(w5 * D + b5)
    y = sigmoid(w6 * E + b6)
    return y
7
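A minimal sketch of the learned function above as a Keras model; the layer width of 64 is an illustrative assumption (the slide does not specify sizes):

import tensorflow as tf

# Six layers mirroring the slide: five relu layers, then a sigmoid output
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(3,)),  # inputs: T, P, Q
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # probability of rain
])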
A DIFFERENT WAY TO BUILD SOFTWARE
The two approaches are complementary

MANUAL PROGRAMMING        MACHINE LEARNING
"SOFTWARE 1.0"            "SOFTWARE 2.0"
ENGINEERED                REVERSE-ENGINEERED
LABOR INTENSIVE           AUTOMATIC
EXPLICIT                  IMPLICIT
EXPLAINABLE               SUBTLE
SIMPLE                    COMPLEX
FROM EXPERTISE            FROM EXAMPLES

(For best results, combine as needed)

8
A DIFFERENT WAY TO BUILD SOFTWARE
Complex phenomena are best described implicitly.

EXAMPLE: ATMOSPHERIC RIVER

9
A GENERALIZATION OF CURVE FITTING
Curve fitting provides the starting intuition

(Figure: data points fitted by a curve y = f(x))
10
A GENERALIZATION OF CURVE FITTING
Differences from traditional curve fitting

Find f, given inputs x and outputs y: pairs (x1, y1), (x2, y2), …, (x6, y6)

• Supervised
• Hierarchical ("deep")
• High-dimensional x, y
• Millions of parameters

11
EXAMPLES

12
RECOGNITION/CLASSIFICATION -> FILTER
De-noising gravitational waves

• Laser Interferometer Gravitational-wave Observatory (LIGO)
• DL enabling 5000x faster filtering for real-time multi-messenger astronomy


13
USING NUMERIC SIMULATIONS TO TRAIN AI
Data-driven Fluid Simulations using Regression Forests

14
CONVERGED HPC
REVOLUTIONIZING DRUG DISCOVERY
Background
It takes 14 years and $2.5 billion to develop one drug, with a higher than 99.5% failure rate after the drug discovery phase.

Challenge
QC simulation is computationally expensive: to screen 10M drug candidates takes 5 years to compute on CPUs. So researchers use approximations, compromising on accuracy.

Solution
Researchers at the University of Florida and the University of North Carolina leveraged GPU deep learning to develop a custom framework, ANAKIN-ME, to reproduce molecular energy surfaces with super speed (microseconds versus several minutes), extremely high (DFT) accuracy, and up to 6 orders of magnitude improvement in speed.

Impact
Speed and accuracy could start a revolution in computational chemistry and forever change the way we discover the medicines of the future.

15
EXAMPLE APPLICATIONS
2018: LUNAR CRATER IDENTIFICATION VIA DEEP LEARNING

(Panels: DIGITAL ELEVATION | GROUND TRUTH | PREDICTIONS)

https://arxiv.org/pdf/1803.02192.pdf
https://phys.org/news/2018-03-technique-ai-craters-moon.html
16
IMPLEMENTATION BASICS

17
AUTO-ML
Eventually, the optimizer might be able to do everything for you

18
WHAT YOU NEED TO MAKE DEEP LEARNING WORK
You need three main ingredients (and some skill)

• LARGE QUANTITIES OF DATA
• ML FRAMEWORK (KERAS + TENSORFLOW)
• GPU ACCELERATOR

19
DEEP LEARNING FRAMEWORKS
Many frameworks to choose from (but not for Fortran)

Python C++ Julia

20
DEVELOPMENT ENVIRONMENT
JUPYTER NOTEBOOKS

21
NVIDIA GPU CLOUD REGISTRY
CONTAINERIZED SOFTWARE

Singularity containers for DEEP LEARNING, HPC APPS, and HPC VISUALIZATION
22
LINEAR REGRESSION
With Scikit Learn

23
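For reference, a minimal sketch of a scikit-learn linear regression (the synthetic data is an assumption, not from the slides):

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y = 2x + 1 plus noise
x = np.linspace(0, 10, 100).reshape(-1, 1)
y = 2 * x.ravel() + 1 + np.random.normal(0, 0.5, size=100)

model = LinearRegression()
model.fit(x, y)
print(model.coef_, model.intercept_)  # should be close to 2 and 1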
LINEAR REGRESSION
With Tensorflow and Keras

24
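The same fit sketched with TensorFlow and Keras; a single Dense unit with no activation is exactly a linear model y = w*x + b (learning rate and epochs are illustrative):

import numpy as np
import tensorflow as tf

x = np.linspace(0, 10, 100).reshape(-1, 1)
y = 2 * x + 1 + np.random.normal(0, 0.5, size=(100, 1))

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.1), loss="mse")
model.fit(x, y, epochs=200, verbose=0)
print(model.layers[0].get_weights())  # learned w and b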
TRAINING

25
TRAINING VS INFERENCE
TRAINING PHASE: search for the right pieces
INFERENCE PHASE: apply the completed model
ONLINE LEARNING: the two phases combined

26
TRAINING: THE PLAYERS
DATA, MODEL, LOSS, AND OPTIMIZER

The four pieces fit together in a loop: DATA → MODEL → LOSS FCN → OPTIMIZER, which updates the MODEL

27
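In Keras the four players map directly onto API calls; a hedged sketch (model, x_train, and y_train are placeholders assumed to exist):

# DATA:      (x_train, y_train)
# MODEL:     any tf.keras model
# LOSS FCN + OPTIMIZER: passed to compile()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=10, batch_size=32)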
TRAINING: GRADIENT DESCENT
Finding a solution is as easy as falling down a hill

1. Start with random weights
2. Compute the gradient and follow it downhill
3. Stop when the error is small

28
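A toy illustration of this loop, minimizing a one-dimensional quadratic (not from the slides):

import random

# Minimize f(w) = (w - 3)^2; its gradient is 2*(w - 3)
w = random.uniform(-10, 10)      # start with random weights
lr = 0.1
for step in range(1000):
    grad = 2 * (w - 3)           # compute the gradient
    w -= lr * grad               # follow it downhill
    if grad ** 2 < 1e-10:        # stop when the error is small
        break
print(w)                         # close to the minimum at w = 3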
OPTIMIZERS
Many variations on stochastic gradient descent

29
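In Keras the variants are drop-in choices; a sketch (learning rates are illustrative defaults):

import tensorflow as tf

sgd = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
rmsprop = tf.keras.optimizers.RMSprop(learning_rate=0.001)
adam = tf.keras.optimizers.Adam(learning_rate=0.001)
# Any of these can be passed to model.compile(optimizer=...)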
TRAINING: BACKPROPAGATION
Compute the gradient by efficiently assigning blame, propagating error back from the prediction

AUTOGRAD
Let a framework keep track of your gradient, so you don’t have to
31
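A minimal sketch of autograd in TensorFlow (the toy loss is an assumption):

import tensorflow as tf

w = tf.Variable(1.0)
with tf.GradientTape() as tape:
    loss = (3.0 * w - 6.0) ** 2   # the tape records the operations
grad = tape.gradient(loss, w)     # d(loss)/dw, computed for you
print(grad.numpy())               # 2 * (3*1 - 6) * 3 = -18.0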
AI, MACHINE LEARNING, DEEP LEARNING

EXPERT SYSTEMS: EXECUTE HAND-WRITTEN ALGORITHMS AT HIGH SPEED
TRADITIONAL ML: LEARN FROM EXAMPLES USING HAND-CRAFTED FEATURES
DEEP LEARNING: LEARNS BOTH OUTPUT AND FEATURES FROM DATA

32
DEEP LEARNING VS. MACHINE LEARNING
When should I use deep learning vs traditional machine learning?

TRADITIONAL MACHINE LEARNING
• Random forests, SVM, K-means, logistic regression
• Features hand-crafted by experts
• Small set of features: 10s or 100s
• NVIDIA RAPIDS: orders of magnitude speedup

SUPERVISED DEEP LEARNING
• CNN, RNN, LSTM, GAN, variational auto-encoders
• Finds features automatically
• High-dimensional data: images, sounds, speech
• Large set of labelled data (10k+ examples)
• NVIDIA cuDNN: accelerates DL frameworks

33
ARTIFICIAL NEURONS
Simple equations with adjustable parameters

Biological neuron vs artificial neuron: inputs x1, x2, x3 with weights w1, w2, w3

y = f(w1*x1 + w2*x2 + w3*x3)

https://towardsdatascience.com/the-differences-between-artificial-and-biological-neural-networks-a8b46db828b7
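The artificial neuron above in a few lines of Python, using ReLU as the activation f (weights and inputs are illustrative):

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

w = np.array([0.5, -1.0, 2.0])   # w1, w2, w3
x = np.array([1.0, 0.5, 0.25])   # x1, x2, x3
y = relu(np.dot(w, x))           # y = f(w1*x1 + w2*x2 + w3*x3)
print(y)                         # 0.5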
CURVE FIT WITH SINGLE LAYER NEURAL NETWORK

35
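A sketch of what such a curve fit might look like in Keras (the target curve, layer width, and training settings are assumptions):

import numpy as np
import tensorflow as tf

x = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(x)                    # an arbitrary curve to fit

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="tanh", input_shape=(1,)),  # single hidden layer
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.Adam(0.01), loss="mse")
model.fit(x, y, epochs=500, verbose=0)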
DATA SPLITTING
KEEP TEST, TRAINING, AND VALIDATION DATA SEPARATE

Data is split into:
• Train: for model training
• Validation: for hyperparameter tuning
• Test: for final evaluation

36
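One common way to produce the three splits, sketched with scikit-learn (the 60/20/20 ratio and array names are assumptions):

from sklearn.model_selection import train_test_split

# Carve off the test set first, then split the rest into train/validation
x_tmp, x_test, y_tmp, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
x_train, x_val, y_train, y_val = train_test_split(x_tmp, y_tmp, test_size=0.25, random_state=42)
# 0.25 of the remaining 80% is 20% of the total: a 60/20/20 split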
MODEL CAPACITY
AND REGULARIZATION
37
MODEL CAPACITY
A good model is one that generalizes to new data

UNDERFIT | GOOD FIT | OVERFIT

38
GOOD FIT
Checking for generalization

OVERFITTING
Captures training data, but generalizes poorly
• Use more data points
• Reduce model capacity

UNDERFITTING
Model is too simple to fit the curve
• Increase model capacity
• Use a different model
REGULARIZATION
BatchNorm and Dropout

42
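Both are single-line additions in Keras; a sketch (widths, rate, and input shape are illustrative):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(20,)),
    tf.keras.layers.BatchNormalization(),  # normalizes activations batch by batch
    tf.keras.layers.Dropout(0.5),          # randomly zeroes 50% of units during training
    tf.keras.layers.Dense(1, activation="sigmoid"),
])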
CHALLENGES AND
POTENTIAL SOLUTIONS
43
LABELLING LARGE QUANTITIES OF DATA
How can we overcome the need for manual labelling?

• Data Fusion: using one data source as the label for another
• Self-Supervised Learning: predicting input B from input A
• Reinforcement Learning: obtaining labels directly from the environment or simulation
• Human-in-the-loop: using human-machine iteration to make labelling easier

44
TRANSFER LEARNING: DON’T START FROM SCRATCH

Train on simulated or related data → fine-tune on the real data

45
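A hedged sketch of the freeze-then-fine-tune recipe in Keras; here ImageNet weights stand in for "simulated or related data", and the data arrays are placeholders:

import tensorflow as tf

base = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                      pooling="avg", input_shape=(224, 224, 3))
base.trainable = False                     # freeze the pretrained features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),  # new head for the real task
])
model.compile(optimizer="adam", loss="binary_crossentropy")
# model.fit(real_x, real_y, ...)           # fine-tune on the real data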
ENFORCING PHYSICAL CONSTRAINTS

Constraints: conservation of mass, momentum, energy; incompressibility; turbulent energy spectra; translational invariance
Approaches: Lagrange multipliers (penalization), hard constraints, projective methods, differentiable programming

46
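The penalization route can be sketched as an extra term in a custom Keras loss; the "mass residual" below is a hypothetical stand-in for a real physical residual:

import tensorflow as tf

def physics_loss(y_true, y_pred):
    mse = tf.reduce_mean(tf.square(y_true - y_pred))
    # Hypothetical penalty: keep the predicted field's mean "mass" consistent.
    # A real constraint would compute a physical residual from y_pred.
    residual = tf.reduce_mean(y_pred) - tf.reduce_mean(y_true)
    lam = 0.1                              # Lagrange-multiplier-style weight
    return mse + lam * tf.square(residual)

# model.compile(optimizer="adam", loss=physics_loss)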
INTERPRETABILITY: EXPLAINABLE AI

Layer-wise Relevance Propagation

https://lrpserver.hhi.fraunhofer.de/image-classification
47
USING YOUR GPU

48
GPUS MAKE MACHINE LEARNING PRACTICAL
Train in a day, or a month?
PILLARS OF DATA SCIENCE PERFORMANCE

• CUDA Architecture: massively parallel processing
• NVLink/NVSwitch: high-speed connections between GPUs for distributed algorithms
• CUDA-X AI: NVIDIA GPU acceleration libraries for data science and AI
  (Python, DASK; RAPIDS: cuDF, cuML, cuGraph; cuDNN for DL frameworks; Apache Arrow on GPU memory)

50
LEARNED FUNCTIONS ARE GPU ACCELERATED
Next-level software. No porting required.

DATA → GPU-ACCELERATED FUNCTIONS

51
HOW CAN I GET ACCESS TO A POWERFUL GPU?
Many ways to take advantage of NVIDIA GPUs for deep learning:

• NVIDIA Quadro laptop or workstation
• Cloud computing services (free hours to start)
• National supercomputers (apply for compute)
• Google Colab (1 free NVIDIA GPU)

52
VERIFYING YOUR GPU
Keras / TensorFlow • PyTorch • Julia
53
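A sketch of the usual checks in Python (the PyTorch lines apply only if it is installed):

import tensorflow as tf
print(tf.config.list_physical_devices("GPU"))  # non-empty list if a GPU is visible

# PyTorch equivalent:
# import torch
# print(torch.cuda.is_available())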
TRAINING ON A SINGLE GPU

• Keras: automatically uses the GPU if available
• PyTorch
• Julia
54
NVIDIA-SMI
System Management Interface

Shows memory utilization, processor utilization, and per-process info
55
LEVELS OF AI ENGAGEMENT
LEVEL 1: AI in a supporting role, but decoupled from the main production system
LEVEL 2: AI and the main production system influence each other, but are largely stand-alone
LEVEL 3: AI takes over parts of the main production system
LEVEL 4: AI replaces significant parts of the main system; classical parts play a supporting role
LEVEL 5: The system is designed with AI in mind from the start; classical algorithms generate training data

Application areas: Data Analytics, Numerical Simulation, Signal Processing, Visualization

56
COMPUTATIONAL SCIENCES

Create mathematical model from first principles → some level of approximation → create efficient implementation (inputs → outputs)

• Similarities to the shift from feature engineering → network engineering?
• NNs as a porting strategy?
57
CAN THIS WORK ∀? ABSOLUTELY, YES!
Proof: Universal Approximation Theorem

• Take many non-linearities (one hidden layer is enough!)
• Combine them to form peaks
• Assemble your arbitrary function to within arbitrary ε

Problem: this is an essentially useless theorem for practical purposes
58
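Stated slightly more precisely, in its standard single-hidden-layer form (for a suitable non-polynomial activation $\sigma$):

For any continuous $f$ on a compact $K \subset \mathbb{R}^n$ and any $\varepsilon > 0$, there exist $N$ and parameters $v_i, b_i \in \mathbb{R}$, $w_i \in \mathbb{R}^n$ such that

$$F(x) = \sum_{i=1}^{N} v_i \, \sigma(w_i \cdot x + b_i), \qquad |F(x) - f(x)| < \varepsilon \ \text{ for all } x \in K.$$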
WILL THIS WORK ∀?

Considering pesky practical constraints, like memory and performance

• Anecdotal evidence: ∃ scientific cases where NNs seem to work extremely well
• Safe bet: it will not work ∀
• Therefore, by induction (sort of): there exists (∃) a subspace of all (∀) HPC applications for which AI works well
• Need to explore the size and shape of this subspace
• Currently, I think it is fair to say we don't understand this domain very well
• But: each individual case promising 10x, 100x, 1000x performance improvement is probably worth exploring; those can be groundbreaking!

59
WHAT MAKES AI * HPC SPECIAL?

Create mathematical model → some level of approximation → create efficient implementation (inputs → outputs)
60
WHAT MAKES AI * HPC SPECIAL?

The mathematical model provides the training labels; the efficient implementation is trained through a loss function and backpropagation, and the "prior" enters at the "?".

• Note: we have more information about the ground truth in AI*HPC (often mathematically precise)
• This should actually be an advantage!
• Why does it sometimes feel like a disadvantage?
61
HOW TO FILL IN THE ?

• Experience, Intuition, and Art
• Guided Design + Tools Support (e.g. declarative building blocks → NN translation; adversarial fuzzing)
• New Approaches (e.g. Physics-Informed Networks? [1]; ODE Networks? [2])

(Example timestep: tridiagonal solve, advection step, halo exchange, pressure projection)

[1] Hidden Fluid Mechanics: A Navier-Stokes Informed Deep Learning Framework, M. Raissi et al.
[2] Neural Ordinary Differential Equations, R.T.Q. Chen et al.

62
IS A ML MODEL USEFUL FOR SCIENCE?

63
6-STEP APPROACH

Data • Task • Model • Loss • Learning • Evaluation
64
Thanks!
