Feedback to: joshptobin [at] gmail.com
[Cartoon: XKCD, https://xkcd.com/1838/]
Figure credits: Andrej Karpathy, CS231n course notes; He, Kaiming, et al. "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification." Proceedings of the IEEE International Conference on Computer Vision, 2015.
0. Why is troubleshooting hard?
[Diagram: sources of poor model performance, including data/model fit and dataset construction]
[Chart: time spent on datasets vs. models, PhD vs. Tesla]
Slide from Andrej Karpathy’s talk “Building the Software 2.0 Stack” at TrainAI 2018, 5/10/2018
Common dataset construction issues:
• Class imbalances
• Noisy labels
Pessimism.
[Diagram: the iterative loop of improving the model/data]
1. Start simple
Steps:
a. Choose a simple architecture
c. Normalize inputs
[Diagram: example baseline architecture for the running example. Input 1 (72-dim); Input 2: the words "This is a cat" fed through an LSTM (48-dim); Input 3; intermediate representations of 184, 128, and 256 dimensions; output: T/F]
Starting-simple defaults:
• Regularization: none
• Labels for the running example: 0 (no pedestrian), 1 (yes pedestrian)
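To make this concrete, here is a minimal sketch of such a baseline in TensorFlow 1.x. It is an illustration only: the placeholder names, the 72-dim input (borrowed from the diagram above), and the normalization statistics are assumptions, not the talk's actual model.

import tensorflow as tf

# Hypothetical inputs for the pedestrian-detection running example.
images = tf.placeholder(tf.float32, [None, 72])   # flattened input features
labels = tf.placeholder(tf.int32, [None])         # 0 = no pedestrian, 1 = pedestrian

# Normalize inputs: subtract mean, divide by std (statistics here are placeholders;
# compute them on your training set).
x = (images - 0.5) / 0.25

# Simple architecture, no regularization for v1.
hidden = tf.layers.dense(x, 128, activation=tf.nn.relu)
logits = tf.layers.dense(hidden, 2)
loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
train_op = tf.train.AdamOptimizer(3e-4).minimize(loss)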
2. Implement & debug
Steps:
b. Overfit a single batch
c. Compare to a known result
Keep the code lean for v1:
• Rule of thumb: <200 lines (tested infrastructure components are fine)
• Use off-the-shelf components, e.g. tf.layers.dense(…) instead of tf.nn.relu(tf.matmul(W, x)), and tf.losses.softmax_cross_entropy(…) instead of writing out the exp yourself
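A sketch of that point, assuming TensorFlow 1.x (the tensors x, labels, W, and b are hypothetical):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 64])
labels = tf.placeholder(tf.int32, [None])

# Prefer the tested high-level op...
hidden = tf.layers.dense(x, 32, activation=tf.nn.relu)

# ...over hand-rolling the same layer, which adds surface area for shape and
# initialization bugs:
W = tf.get_variable("W", [64, 32])
b = tf.get_variable("b", [32])
hidden_manual = tf.nn.relu(tf.matmul(x, W) + b)

# Likewise, use the library loss instead of writing out the exp/log-sum-exp by hand.
logits = tf.layers.dense(hidden, 2)
loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)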
• tensorflow: trickier to debug. Options:
  • Option 1: step through graph creation
  • Option 2: step into the training loop
  • Option 3: use tfdbg, which stops execution at each sess.run(…) and lets you inspect tensors, e.g.:
    python -m tensorflow.python.debug.examples.debug_mnist --debug
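A sketch of hooking tfdbg into your own training code rather than the bundled example (TensorFlow 1.x; sess is whatever session your training loop already uses):

import tensorflow as tf
from tensorflow.python import debug as tf_debug

# Wrap an existing session so execution pauses at every sess.run(...) and drops
# into the tfdbg command-line UI for tensor inspection.
sess = tf.Session()
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
# Optionally flag NaNs/Infs automatically:
sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)
# ...then run your normal training loop with the wrapped session.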
Common failure modes when overfitting a single batch:
• Error explodes
• Error oscillates
• Error plateaus. Possible causes:
  • Too much regularization
  • Incorrect input to loss function (e.g., softmax instead of logits)
  • Data or labels corrupted
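For reference, a minimal sketch of the overfit-a-single-batch check itself (TensorFlow 1.x assumed; loss, train_op, images, and labels come from whatever baseline you built above, and x_batch/y_batch are one fixed batch of data):

import tensorflow as tf

# Train repeatedly on ONE small batch; the loss should drive toward ~0.
# If it explodes, oscillates, or plateaus instead, check the causes listed above.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(500):
        loss_value, _ = sess.run([loss, train_op],
                                 feed_dict={images: x_batch, labels: y_batch})
        if step % 50 == 0:
            print(step, loss_value)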
Compare to a known result. Points of comparison, from more useful to less useful:
• Official model implementation evaluated on a benchmark (e.g., MNIST)
• Results from the paper (with no code)
• Results from your model on a benchmark dataset (e.g., MNIST): lets you make sure your model is learning anything at all
• Results from a similar model on a similar dataset: same as before, but with lower confidence
• Overfit a single batch: look for corrupted data, over-regularization, broadcasting errors
3. Evaluation
Bias-variance decomposition
Test error = irreducible error + avoidable bias (i.e., under-fitting) + variance (i.e., over-fitting) + val overfitting
[Chart: breakdown of test error by source, a waterfall from irreducible error up to test error]
This assumes train, val, and test all come from the same distribution. What if not?
Use two val sets: one sampled from the training distribution and one from the test distribution.
Test error = irreducible error + avoidable bias + variance + distribution shift + val overfitting
[Chart: breakdown of test error by source with two val sets: irreducible error, avoidable bias (i.e., under-fitting), train error, variance (over-fitting), train-val error, distribution shift, test-val error, val overfitting, test error]
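A hedged sketch of computing these terms from measured error rates (the 1% goal and 20% train error match the example just below; the other numbers are purely illustrative):

# Decompose test error using two val sets: one drawn from the training
# distribution (train-val) and one from the test distribution (test-val).
goal_error      = 0.01   # desired performance, used as a proxy for irreducible error
train_error     = 0.20
train_val_error = 0.23   # illustrative
test_val_error  = 0.28   # illustrative
test_error      = 0.29   # illustrative

avoidable_bias     = train_error - goal_error          # under-fitting
variance           = train_val_error - train_error     # over-fitting
distribution_shift = test_val_error - train_val_error  # train vs. test distribution gap
val_overfitting    = test_error - test_val_error       # over-fitting to the val set

print(avoidable_bias, variance, distribution_shift, val_overfitting)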
Example:
• Goal performance: 1%
• Train error: 20%
• Train - goal = 19% (under-fitting)
4. Prioritize improvements
Steps:
a. Address under-fitting
b. Address over-fitting
c. Address distribution shift
d. Re-balance datasets (if applicable)
Addressing under-fitting. Options (in priority order) include:
B. Reduce regularization
C. Error analysis
Addressing over-fitting. Options (in priority order) include:
E. Error analysis
G. Tune hyperparameters
H. Early stopping (not recommended!)
I. Remove features (try later)
J. Reduce model size (try later)
Error analysis
[Images: examples of test-val set errors (no pedestrian detected) and train-val set errors (no pedestrian detected)]
Error type                 | Error % (train-val) | Error % (test-val) | Potential solutions                                            | Priority
1. Hard-to-see pedestrians | 0.1%                | 0.1%               | Better sensors                                                 | Low
2.                         |                     |                    | Try to remove with pre-processing; better sensors              | Medium
3. Nighttime scenes        | 0.1%                | 1%                 | Synthetically darken training images; simulate night-time data | High
Domain adaptation
• Supervised (you have limited data from the target domain): fine-tune a pre-trained model; add target data to the train set
• CycleGAN
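For the fine-tuning route, a minimal sketch with tf.keras (an ImageNet-pretrained ResNet50 backbone is assumed purely for illustration; the running example is binary pedestrian detection):

import tensorflow as tf

# Load a pre-trained backbone without its classification head.
base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False  # freeze the backbone at first; unfreeze later if needed

# Add a small head for the binary (pedestrian / no pedestrian) task.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(target_images, target_labels, epochs=5)  # the limited target-domain data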
Rebalancing datasets
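One common way to re-balance is to oversample the minority class. A sketch in numpy (assumes binary labels y in {0, 1}; the function name is hypothetical):

import numpy as np

def oversample_minority(x, y, seed=0):
    # Duplicate minority-class examples until both classes are equally frequent.
    rng = np.random.default_rng(seed)
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = rng.permutation(np.concatenate([majority, minority, extra]))
    return x[idx], y[idx]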
Hyperparameter optimization
Which model & optimizer choices to tune? Running example:
• Network: ResNet
  - Weight initialization?
  - Kernel size?
  - Etc.
• Optimizer: Adam
  - Batch size?
  - Learning rate?
Manual hyperparameter optimization
Advantages: can be combined with other methods (e.g., manually select parameter ranges to optimize over).
Disadvantages: requires detailed understanding of the algorithm; time-consuming.
Method 4: coarse-to-fine search
How it works: run a broad search over wide ranges (e.g., hyperparameter 1 = batch size), pick out the best performers, then narrow the ranges around them and repeat.
Advantages: can narrow in on very high performing hyperparameters.
[Scatter plots: hyperparameter 1 (e.g., batch size) against a second hyperparameter, with the best performers highlighted and the search range progressively narrowed]
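A minimal sketch of coarse-to-fine random search under these assumptions (train_and_evaluate(batch_size, learning_rate) is a hypothetical function returning a validation score; the learning rate is sampled log-uniformly):

import math
import random

def sample(bs_choices, lr_range):
    # Batch size from a discrete set, learning rate log-uniform within a range.
    bs = random.choice(bs_choices)
    lr = math.exp(random.uniform(math.log(lr_range[0]), math.log(lr_range[1])))
    return bs, lr

bs_choices, lr_range = [32, 64, 128, 256, 512], (1e-5, 1e-1)
for stage in range(3):                        # coarse -> fine
    trials = []
    for _ in range(20):
        bs, lr = sample(bs_choices, lr_range)
        score = train_and_evaluate(batch_size=bs, learning_rate=lr)  # hypothetical
        trials.append((score, bs, lr))
    best = sorted(trials, reverse=True)[:5]   # best performers
    # Narrow the search around the best performers and repeat.
    bs_choices = sorted({bs for _, bs, _ in best})
    lrs = [lr for _, _, lr in best]
    lr_range = (min(lrs), max(lrs))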
Bayesian hyperparameter optimization: alternate between training models with proposed hyperparameters and updating a probabilistic model of which hyperparameters are likely to perform well.
Reference: https://towardsdatascience.com/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f