
Troubleshooting Deep Neural Networks

A Field Guide to Fixing Your Model

Josh Tobin (with Sergey Karayev and Pieter Abbeel)

Josh Tobin. January 2019. josh-tobin.com/troubleshooting-deep-neural-networks


Help me make this guide better!
Help me find:

• Things that are unclear

• Missing debugging tips, tools, tricks, strategies

• Anything else that will make the guide better

Feedback to:
• joshptobin [at] gmail.com

• Twitter thread (https://twitter.com/josh_tobin_)



Why talk about DL troubleshooting?

XKCD, https://xkcd.com/1838/


Common sentiment among practitioners:

80-90% of time debugging and tuning

10-20% deriving math or implementing things



Why is DL troubleshooting so hard?



0. Why is troubleshooting hard?

Suppose you can’t reproduce a result


[Figure: learning curve from the paper (left) vs. your learning curve (right)]

He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

Why is your performance worse?


Implementation bugs → Poor model performance

Most DL bugs are invisible


Labels out of order!

(real bug I spent 1 day on early in my PhD)

Why is your performance worse?


Implementation bugs, hyperparameter choices → Poor model performance

Models are sensitive to hyperparameters

[Figure from Andrej Karpathy, CS231n course notes]

[Figure: performance of a 30-layer ResNet with different weight initializations]

He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification." Proceedings of the IEEE International Conference on Computer Vision. 2015.

Why is your performance worse?


Implementation bugs, hyperparameter choices, data/model fit → Poor model performance

Data / model fit


Data from the paper: ImageNet. Yours: self-driving car images.


Why is your performance worse?


Implementation bugs, hyperparameter choices, data/model fit, dataset construction → Poor model performance

Constructing good datasets is hard


[Figure: "Amount of lost sleep over...", comparing PhD and Tesla]

Slide from Andrej Karpathy’s talk “Building the Software 2.0 Stack” at TrainAI 2018, 5/10/2018

Common dataset construction issues


• Not enough data

• Class imbalances

• Noisy labels

• Train / test from different distributions

• (Not the main focus of this guide)

Takeaways: why is troubleshooting hard?

• Hard to tell if you have a bug

• Lots of possible sources for the same degradation in performance

• Results can be sensitive to small changes in hyperparameters and dataset makeup


Strategy for DL troubleshooting



Key mindset for DL troubleshooting

Pessimism.



Key idea of DL troubleshooting

Since it’s hard to disambiguate errors…

…start simple and gradually ramp up complexity


Strategy for DL troubleshooting

Start simple → Implement & debug → Evaluate → Meets requirements

Evaluate leads either to "Meets requirements" or to another round of: Tune hyperparameters, Improve model/data


Quick summary
Overview

• Start simple: Choose the simplest model & data possible (e.g., LeNet on a subset of your data)

• Implement & debug: Once model runs, overfit a single batch & reproduce a known result

• Evaluate: Apply the bias-variance decomposition to decide what to do next

• Tune hyperparameters: Use coarse-to-fine random searches

• Improve model/data: Make your model bigger if you underfit; add data or regularize if you overfit

We’ll assume you already have…

• Initial test set

• A single metric to improve

• Target performance based on human-level performance, published results, previous baselines, etc.

Running example: classify images as 0 (no pedestrian) or 1 (yes pedestrian)

Goal: 99% classification accuracy

1. Start simple

Starting simple
Steps

a. Choose a simple architecture

b. Use sensible defaults

c. Normalize inputs

d. Simplify the problem

Demystifying neural network architecture selection

Your input data is… / Start here / Consider using this later

• Images: start with a LeNet-like architecture; consider ResNet later

• Sequences: start with an LSTM with one hidden layer; consider an attention model or WaveNet-like model later

• Other: start with a fully connected neural net with one hidden layer; later choice is problem-dependent

Dealing with multiple input modalities

1. Map each input into a (lower-dimensional) feature space
   • Input 1: ConvNet → Flatten (64-dim)
   • Input 2: (72-dim)
   • Input 3 (“This”, “is”, “a”, “cat”): LSTM (48-dim)

2. Concatenate (184-dim)

3. Pass through fully connected layers to output: Concat (184-dim) → FC (256-dim) → FC (128-dim) → Output (T/F)
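The three steps above can be sketched with NumPy. This is only an illustration under assumptions: the per-modality encoders are replaced by random feature arrays, and the `dense` helper and batch size are hypothetical names and values that do not come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size = 4  # assumption for illustration

# Step 1: stand-ins for the encoder outputs (a real model would compute
# these with a ConvNet, an LSTM, etc.).
input1_features = rng.normal(size=(batch_size, 64))  # ConvNet -> Flatten
input2_features = rng.normal(size=(batch_size, 72))  # second input
input3_features = rng.normal(size=(batch_size, 48))  # LSTM output

# Step 2: concatenate along the feature axis -> (batch, 184)
fused = np.concatenate(
    [input1_features, input2_features, input3_features], axis=1
)

def dense(x, out_dim, rng, relu=True):
    """Hypothetical fully connected layer with random weights (no training)."""
    w = rng.normal(scale=np.sqrt(2.0 / x.shape[1]), size=(x.shape[1], out_dim))
    y = x @ w
    return np.maximum(y, 0.0) if relu else y

# Step 3: two FC layers, then one logit per example for the T/F output
hidden = dense(dense(fused, 256, rng), 128, rng)
logits = dense(hidden, 1, rng, relu=False)
```

Checking `fused.shape` right after the concatenation is a cheap way to confirm that the feature dimensions add up (64 + 72 + 48 = 184) before any layers are stacked on top.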
Recommended network / optimizer defaults

• Optimizer: Adam optimizer with learning rate 3e-4

• Activations: ReLU (FC and Conv models), tanh (LSTMs)

• Initialization: He et al. normal (ReLU), Glorot normal (tanh)

• Regularization: None

• Data normalization: None


Definitions of recommended initializers

(n is the number of inputs, m is the number of outputs)

• He et al. normal (used for ReLU): $\mathcal{N}\left(0, \sqrt{\frac{2}{n}}\right)$

• Glorot normal (used for tanh): $\mathcal{N}\left(0, \sqrt{\frac{2}{n+m}}\right)$
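A minimal NumPy sketch of the two initializers follows. It assumes the second argument of N(·, ·) is a standard deviation; the function names `he_normal` and `glorot_normal` and the layer sizes are illustrative choices, not taken from the slides.

```python
import numpy as np

def he_normal(n_in, n_out, rng):
    """Weights ~ N(0, sqrt(2 / n)), with n the number of inputs."""
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))

def glorot_normal(n_in, n_out, rng):
    """Weights ~ N(0, sqrt(2 / (n + m))), with n inputs and m outputs."""
    return rng.normal(0.0, np.sqrt(2.0 / (n_in + n_out)), size=(n_in, n_out))

rng = np.random.default_rng(0)
w_relu = he_normal(512, 256, rng)      # for a ReLU layer
w_tanh = glorot_normal(512, 256, rng)  # for a tanh layer
```

In practice the framework's built-in initializers are the safer choice; the sketch is only meant to make the formulas concrete.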
Important to normalize scale of input data

• Subtract mean and divide by standard deviation

• For images, fine to scale values to [0, 1] (e.g., by dividing by 255)

  [Careful: make sure your library doesn’t do it for you!]
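Both kinds of normalization can be sketched as below. The data is synthetic and the array names are hypothetical. Computing the statistics on the training split and reusing them for the other splits is a common convention assumed here rather than something the slide states.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tabular-style features: subtract the mean, divide by the standard deviation.
train = rng.normal(loc=50.0, scale=10.0, size=(1000, 3))
test = rng.normal(loc=50.0, scale=10.0, size=(200, 3))

mean = train.mean(axis=0)
std = train.std(axis=0) + 1e-8       # small epsilon avoids division by zero
train_norm = (train - mean) / std
test_norm = (test - mean) / std      # reuse the training statistics

# Images stored as uint8: cast to float32, then scale to [0, 1].
images = rng.integers(0, 256, size=(8, 28, 28), dtype=np.uint8)
images_scaled = images.astype(np.float32) / 255.0
```

Printing `images_scaled.min()` and `images_scaled.max()` once is a quick way to catch the case where a library has already rescaled the data.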


Consider simplifying the problem as well

• Start with a small training set (~10,000 examples)

• Use a fixed number of objects, classes, smaller image size, etc.

• Create a simpler synthetic training set


Simplest model for pedestrian detection

Running example: 0 (no pedestrian) vs. 1 (yes pedestrian). Goal: 99% classification accuracy

• Start with a subset of 10,000 images for training, 1,000 for val, and 500 for test

• Use a LeNet architecture with sigmoid cross-entropy loss

• Adam optimizer with LR 3e-4

• No regularization


Starting simple
Steps and summary

a. Choose a simple architecture: LeNet, LSTM, or fully connected

b. Use sensible defaults: Adam optimizer & no regularization

c. Normalize inputs: Subtract mean and divide by std, or just divide by 255 (images)

d. Simplify the problem: Start with a simpler version of your problem (e.g., smaller dataset)
2. Implement & debug

Implementing bug-free DL models

Steps

a. Get your model to run

b. Overfit a single batch

c. Compare to a known result

Preview: the five most common DL bugs

• Incorrect shapes for your tensors
  Can fail silently! E.g., accidental broadcasting: x.shape = (None,), y.shape = (None, 1), (x+y).shape = (None, None)

• Pre-processing inputs incorrectly
  E.g., forgetting to normalize, or too much pre-processing

• Incorrect input to your loss function
  E.g., softmaxed outputs to a loss that expects logits

• Forgot to set up train mode for the net correctly
  E.g., toggling train/eval, controlling batch norm dependencies

• Numerical instability: inf/NaN
  Often stems from using an exp, log, or div operation
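The silent-broadcasting bug in the first bullet is easy to reproduce in NumPy, which follows the same broadcasting rules. The arrays below are made-up values for illustration.

```python
import numpy as np

predictions = np.array([0.9, 0.2, 0.7])       # shape (3,)
labels = np.array([[1.0], [0.0], [1.0]])      # shape (3, 1)

# No error is raised: the two arrays broadcast to a (3, 3) matrix,
# so every prediction is compared against every label.
bad_diff = predictions - labels
bad_mse = np.mean(bad_diff ** 2)

# Making the shapes agree first gives the intended elementwise difference.
good_diff = predictions - labels.reshape(-1)  # shape (3,)
good_mse = np.mean(good_diff ** 2)
```

The buggy loss is still a finite number, which is why asserting on shapes before computing a loss tends to be worth the extra line.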


General advice for implementing your model

Lightweight implementation

• Minimum possible new lines of code for v1

• Rule of thumb: <200 lines

• (Tested infrastructure components are fine)

Use off-the-shelf components, e.g.,

• Keras

• tf.layers.dense(…) instead of tf.nn.relu(tf.matmul(W, x))

• tf.losses.cross_entropy(…) instead of writing out the exp

Build complicated data pipelines later

• Start with a dataset you can load into memory
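To see why "writing out the exp" is risky, here is a small NumPy comparison of a hand-written softmax cross-entropy with a shifted (log-sum-exp style) version. Both functions are illustrative sketches with hypothetical names. They are not how any particular library implements its loss.

```python
import numpy as np

def naive_softmax_cross_entropy(logits, label):
    """Writes out the exp directly; overflows for large logits."""
    probs = np.exp(logits) / np.sum(np.exp(logits))
    return -np.log(probs[label])

def stable_softmax_cross_entropy(logits, label):
    """Subtracts the max logit first, so exp never sees a large argument."""
    shifted = logits - np.max(logits)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]

small = np.array([2.0, 1.0, 0.0])
large = np.array([1000.0, 0.0, -1000.0])

naive_small = naive_softmax_cross_entropy(small, 0)
stable_small = stable_softmax_cross_entropy(small, 0)

# Silence the overflow warnings so the NaN result can be inspected.
with np.errstate(over="ignore", invalid="ignore", divide="ignore"):
    naive_large = naive_softmax_cross_entropy(large, 0)
stable_large = stable_softmax_cross_entropy(large, 0)
```

The two versions agree on small logits, but only the shifted one stays finite on large ones. Tested library losses handle this kind of case already.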


Implementing bug-free DL models

a. Get your model to run

Common issues and recommended resolution:

• Shape mismatch: Step through model creation and inference in a debugger

• Casting issue: Step through model creation and inference in a debugger

• OOM: Scale back memory-intensive operations one by one

• Other: Standard debugging toolkit (Stack Overflow + interactive debugger)
Debuggers for DL code

• PyTorch: easy, use ipdb

• TensorFlow: trickier

  Option 1: step through graph creation

  Option 2: step into training loop
  Evaluate tensors using sess.run(…)

  Option 3: use tfdbg
  python -m tensorflow.python.debug.examples.debug_mnist --debug
  Stops execution at each sess.run(…) and lets you inspect
Implementing bug-free DL models

Shape mismatch: most common causes

Undefined shapes

• Confusing tensor.shape, tf.shape(tensor), tensor.get_shape()

• Reshaping things to a shape of type Tensor (e.g., when loading data from a file)

Incorrect shapes

• Flipped dimensions when using tf.reshape(…)

• Took sum, average, or softmax over wrong dimension

• Forgot to flatten after conv layers

• Forgot to get rid of extra “1” dimensions (e.g., if shape is (None, 1, 1, 4))

• Data stored on disk in a different dtype than loaded (e.g., stored a float64 numpy array, and loaded it as a float32)


Casting issue: most common causes

Data not in float32

• Forgot to cast images from uint8 to float32

• Generated data using numpy in float64, forgot to cast to float32
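Both casting issues can be demonstrated with a few lines of NumPy. The pixel values and array names below are made up for illustration.

```python
import numpy as np

# uint8 images: arithmetic wraps around modulo 256 instead of raising.
image = np.array([[0, 128, 255]], dtype=np.uint8)
wrapped = image + np.full_like(image, 10)     # 255 + 10 wraps to 9

# Cast first, then do arithmetic in float32.
scaled = image.astype(np.float32) / 255.0

# NumPy generates float64 by default, so cast explicitly before feeding
# a model that expects float32.
rng = np.random.default_rng(0)
synthetic = rng.normal(size=(4, 2))           # dtype float64
synthetic32 = synthetic.astype(np.float32)
```

Checking `array.dtype` at the point where data enters the model is usually enough to catch both problems.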


OOM: most common causes

Too big a tensor

• Too large a batch size for your model (e.g., during evaluation)

• Too large fully connected layers

Too much data

• Loading too large a dataset into memory, rather than using an input queue

• Allocating too large a buffer for dataset creation

Duplicating operations

• Memory leak due to creating multiple models in the same session

• Repeatedly creating an operation (e.g., in a function that gets called over and over again)

Other processes

• Other processes running on your GPU
Other common errors: most common causes

Other bugs

• Forgot to initialize variables

• Forgot to turn off bias when using batch norm

• “Fetch argument has invalid type”: usually you overwrote one of your ops with an output during training


Implementing bug-free DL models


b. Overfit a single batch

Common issues and most common causes:

Error goes up

• Flipped the sign of the loss function / gradient

• Learning rate too high

• Softmax taken over wrong dimension

Error explodes

• Numerical issue. Check all exp, log, and div operations

• Learning rate too high

Error oscillates

• Data or labels corrupted (e.g., zeroed, incorrectly shuffled, or preprocessed incorrectly)

• Learning rate too high

Error plateaus

• Learning rate too low

• Gradients not flowing through the whole model

• Too much regularization

• Incorrect input to loss function (e.g., softmax instead of logits)

• Data or labels corrupted
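The "overfit a single batch" check can be sketched with a tiny logistic-regression model in NumPy. Everything here is an assumption for illustration (random data, plain gradient descent, the learning rate and step count); the point is only that the loss on one fixed batch should be driven close to zero.

```python
import numpy as np

def bce_with_logits(logits, targets):
    """Mean sigmoid cross-entropy computed from logits in a stable form."""
    return np.mean(
        np.maximum(logits, 0.0) - logits * targets
        + np.log1p(np.exp(-np.abs(logits)))
    )

def sigmoid(z):
    return 0.5 * (1.0 + np.tanh(0.5 * z))  # avoids overflow in exp

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))                       # one fixed batch
y = rng.integers(0, 2, size=8).astype(np.float64)  # arbitrary 0/1 labels

w = np.zeros(16)
lr = 0.5
losses = []
for step in range(5000):
    logits = x @ w
    losses.append(bce_with_logits(logits, y))
    grad = x.T @ (sigmoid(logits) - y) / len(y)    # gradient of the mean loss
    w -= lr * grad

final_loss = bce_with_logits(x @ w, y)
```

If the recorded losses go up, explode, oscillate, or plateau instead of shrinking steadily, the list of causes above is the place to start.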


Hierarchy of known results

Ordered from more useful to less useful, with what you can do with each:

• Official model implementation evaluated on similar dataset to yours
  You can: walk through code line-by-line and ensure you have the same output; ensure your performance is up to par with expectations

• Official model implementation evaluated on benchmark (e.g., MNIST)
  You can: walk through code line-by-line and ensure you have the same output

• Unofficial model implementation
  You can: same as before, but with lower confidence

• Results from the paper (with no code)
  You can: ensure your performance is up to par with expectations

• Results from your model on a benchmark dataset (e.g., MNIST)
  You can: make sure your model performs well in a simpler setting

• Results from a similar model on a similar dataset
  You can: get a general sense of what kind of performance can be expected

• Super simple baselines (e.g., average of outputs or linear regression)
  You can: make sure your model is learning anything at all
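The least useful rung of the hierarchy, a super simple baseline, takes only a few lines. The helpers below are hypothetical illustrations: one predicts the majority training class, the other predicts the average training output.

```python
import numpy as np

def majority_class_accuracy(train_labels, eval_labels):
    """Accuracy of always predicting the most frequent training label."""
    values, counts = np.unique(train_labels, return_counts=True)
    majority = values[np.argmax(counts)]
    return float(np.mean(eval_labels == majority))

def mean_predictor_mse(train_targets, eval_targets):
    """Mean squared error of always predicting the average training target."""
    return float(np.mean((eval_targets - np.mean(train_targets)) ** 2))

train_labels = np.array([0, 0, 0, 1, 0, 1, 0, 0])
val_labels = np.array([0, 1, 0, 0])
baseline_acc = majority_class_accuracy(train_labels, val_labels)

baseline_mse = mean_predictor_mse(np.array([1.0, 2.0, 3.0]),
                                  np.array([2.0, 4.0]))
```

A model that cannot beat numbers like these on its own validation set is probably not learning anything yet.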
Summary: how to implement & debug

a. Get your model to run: Step through in debugger & watch out for shape, casting, and OOM errors

b. Overfit a single batch: Look for corrupted data, over-regularization, broadcasting errors

c. Compare to a known result: Keep iterating until model performs up to expectations
3. Evaluation

Bias-variance decomposition

[Figure: Breakdown of test error by source. Irreducible error: 25; avoidable bias (i.e., under-fitting): 2; train error: 27; variance (i.e., over-fitting): 5; val error: 32; val overfitting: 2; test error: 34]

Test error = irreducible error + bias + variance + val overfitting

This assumes train, val, and test all come from the same distribution. What if not?

Handling distribution shift

[Figure: train data vs. test data]

Use two val sets: one sampled from training distribution and one from test distribution


[Figure: the bias-variance tradeoff]

Bias-variance with distribution shift

[Figure: Breakdown of test error by source. Irreducible error: 25; avoidable bias (i.e., under-fitting): 2; train error: 27; variance: 2; train val error: 29; distribution shift: 3; test val error: 32; val overfitting: 2; test error: 34]

Train, val, and test error for pedestrian detection

Running example: 0 (no pedestrian) vs. 1 (yes pedestrian)
Goal: 99% classification accuracy

Error source        Value   Gap
Goal performance    1%
Train error         20%     Train - goal = 19% (under-fitting)
Validation error    27%     Val - train = 7% (over-fitting)
Test error          28%     Test - val = 1% (looks good!)

Summary: evaluating model performance

Test error = irreducible error + bias + variance + distribution shift + val overfitting
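
The decomposition can be read off measured errors by taking successive differences. The sketch below assumes you have an estimate of the irreducible error (in practice often approximated by a goal or human-level baseline, which is an assumption here) plus errors on the train, train-val, test-val, and test sets. The function name and the example numbers are illustrative.

```python
def decompose_test_error(irreducible, train, train_val, test_val, test):
    """Split test error into the five terms of the summary formula.

    All arguments are error rates in the same units (e.g., percent).
    """
    return {
        "irreducible error": irreducible,
        # Gap between train error and the best achievable error.
        "bias": train - irreducible,
        # Gap between held-out data from the training distribution and train.
        "variance": train_val - train,
        # Gap between the two val sets, which differ only in distribution.
        "distribution shift": test_val - train_val,
        # Gap between test and the val set used for model selection.
        "val overfitting": test - test_val,
    }


# Illustrative numbers (percent error).
parts = decompose_test_error(irreducible=25, train=27, train_val=29,
                             test_val=32, test=34)
for source, value in parts.items():
    print(f"{source}: {value}")
```

By construction the five terms sum to the test error, so the largest term points at the most promising thing to fix next.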


Strategy for DL troubleshooting

[Diagram: Start simple -> Implement & debug -> Evaluate -> Meets requirements, with "Tune hyper-parameters" and "Improve model/data" as loops that feed back into the pipeline until the requirements are met]


4. Prioritize improvements
Prioritizing improvements (i.e., applying the bias-variance tradeoff)

Steps
a. Address under-fitting
b. Address over-fitting
c. Address distribution shift
d. Re-balance datasets (if applicable)

Addressing under-fitting (i.e., reducing bias)


In priority order, from "try first" (A) to "try later" (F):

A. Make your model bigger (i.e., add layers or use more units per layer)
B. Reduce regularization
C. Error analysis
D. Choose a different (closer to state-of-the-art) model architecture (e.g., move from LeNet to ResNet)
E. Tune hyper-parameters (e.g., learning rate)
F. Add features


Train, val, and test error for pedestrian detection

Running example: 0 (no pedestrian) vs. 1 (yes pedestrian)
Goal: 99% classification accuracy (i.e., 1% error)

Step                              Goal performance   Train error   Validation error   Test error
Starting point                    1%                 20%           27%                28%
Add more layers to the ConvNet    1%                 10%           19%                20%
Switch to ResNet-101              1%                 3%            10%                10%
Tune learning rate                1%                 0.8%          12%                12%



Addressing over-fitting (i.e., reducing variance)

In priority order, from "try first" (A) to "try later" (J):

A. Add more training data (if possible!)
B. Add normalization (e.g., batch norm, layer norm)
C. Add data augmentation
D. Increase regularization (e.g., dropout, L2, weight decay)
E. Error analysis
F. Choose a different (closer to state-of-the-art) model architecture
G. Tune hyperparameters
H. Early stopping (not recommended!)
I. Remove features (not recommended!)
J. Reduce model size (not recommended!)

Train, val, and test error for pedestrian detection

Running example: 0 (no pedestrian) vs. 1 (yes pedestrian)
Goal: 99% classification accuracy

Step                                        Goal performance   Train error   Validation error   Test error
Starting point                              1%                 0.8%          12%                12%
Increase dataset size to 250,000            1%                 1.5%          5%                 6%
Add weight decay                            1%                 1.7%          4%                 4%
Add data augmentation                       1%                 2%            2.5%               2.6%
Tune num layers, optimizer params, weight
initialization, kernel size, weight decay   1%                 0.6%          0.9%               1.0%

Addressing distribution shift

In priority order, from "try first" (A) to "try later" (C):

A. Analyze test-val set errors & collect more training data to compensate
B. Analyze test-val set errors & synthesize more training data to compensate
C. Apply domain adaptation techniques to training & test distributions


Error analysis

[Figure: example test-val set errors and train-val set errors (no pedestrian detected), grouped by error type; error type 3 (night scenes) appears in the test-val set only]

Error type                    Error % (train-val)   Error % (test-val)   Priority
1. Hard-to-see pedestrians    0.1%                  0.1%                 Low
2. Reflections                0.3%                  0.3%                 Medium
3. Nighttime scenes           0.1%                  1%                   High

Potential solutions

1. Hard-to-see pedestrians
   • Better sensors
2. Reflections
   • Collect more data with reflections
   • Add synthetic reflections to train set
   • Try to remove with pre-processing
   • Better sensors
3. Nighttime scenes
   • Collect more data at night
   • Synthetically darken training images
   • Simulate night-time data
   • Use domain adaptation
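
One plausible way to turn an error-analysis table into priorities is to rank error types by their contribution to test-val error and to flag the ones that are worse on test-val than on train-val. This is a sketch of that heuristic only; the slides assign priorities by judgment, and the function names below are hypothetical.

```python
def rank_error_types(error_rates):
    """Order error types by test-val error, largest first.

    error_rates maps an error type to (train_val_pct, test_val_pct).
    """
    return sorted(error_rates,
                  key=lambda name: error_rates[name][1],
                  reverse=True)


def shift_related(error_rates):
    """Error types that are worse on test-val than on train-val."""
    return [name for name, (train_val, test_val) in error_rates.items()
            if test_val > train_val]


# Error rates (percent) from the pedestrian-detection table.
error_rates = {
    "Hard-to-see pedestrians": (0.1, 0.1),
    "Reflections": (0.3, 0.3),
    "Nighttime scenes": (0.1, 1.0),
}

ranking = rank_error_types(error_rates)
shifted = shift_related(error_rates)
```

On these numbers the ranking agrees with the Low / Medium / High priorities in the table, and only the nighttime scenes are flagged as a distribution-shift problem.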


Domain adaptation

What is it?
Techniques to train on a “source” distribution and generalize to another “target” distribution using only unlabeled data or limited labeled data

When should you consider using it?
• Access to labeled data from the test distribution is limited
• Access to relatively similar data is plentiful


Types of domain adaptation

Type            Use case                                             Example techniques
Supervised      You have limited data from target domain             • Fine-tuning a pre-trained model
                                                                     • Adding target data to train set
Un-supervised   You have lots of unlabeled data from target domain   • Correlation Alignment (CORAL)
                                                                     • Domain confusion
                                                                     • CycleGAN



Rebalancing datasets

• If (test)-val looks significantly better than test, you overfit to the val set
• This happens with small val sets or lots of hyper-parameter tuning
• When it does, recollect val data




5. Hyperparameter optimization

Model & optimizer choices?

Running example: 0 (no pedestrian) vs. 1 (yes pedestrian)
Goal: 99% classification accuracy

Network: ResNet
- How many layers?
- Weight initialization?
- Kernel size?
- Etc

Optimizer: Adam
- Batch size?
- Learning rate?
- beta1, beta2, epsilon?

Regularization
- ….
Which hyper-parameters to tune?

Choosing hyper-parameters
• More sensitive to some than others
• Depends on choice of model
• Rules of thumb (only) in the table below
• Sensitivity is relative to default values! (e.g., if you are using all-zeros weight initialization or vanilla SGD, changing to the defaults will make a big difference)

Hyperparameter                               Approximate sensitivity
Learning rate                                High
Optimizer choice                             Low
Other optimizer params (e.g., Adam beta1)    Low
Batch size                                   Low
Weight initialization                        Medium
Loss function                                High
Model depth                                  Medium
Layer size                                   High
Layer params (e.g., kernel size)             Medium
Weight of regularization                     Medium
Nonlinearity                                 Low


Method 1: manual hyperparam optimization

How it works
• Understand the algorithm
  • E.g., higher learning rate means faster, less stable training
• Train & evaluate model
• Guess a better hyperparam value & re-evaluate
• Can be combined with other methods (e.g., manually select parameter ranges to optimize over)

Advantages
• For a skilled practitioner, may require the least computation to get a good result

Disadvantages
• Requires detailed understanding of the algorithm
• Time-consuming


Method 2: grid search

How it works
[Figure: regular grid of points over Hyperparameter 1 (e.g., batch size) vs. Hyperparameter 2 (e.g., learning rate)]

Advantages
• Super simple to implement
• Can produce good results

Disadvantages
• Not very efficient: need to train on all cross-combos of hyper-parameters
• May require prior knowledge about parameters to get good results
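
A minimal grid-search sketch. `toy_val_error` is a hypothetical stand-in for "train the model with this configuration and return validation error"; the grid values are illustrative as well.

```python
import itertools
import math


def grid_search(train_and_evaluate, grid):
    """Evaluate every combination in the grid; return the best one.

    grid maps a hyper-parameter name to the list of values to try.
    """
    names = list(grid)
    best_config, best_error = None, float("inf")
    # itertools.product yields all cross-combinations, which is why the
    # cost grows multiplicatively with each extra hyper-parameter.
    for values in itertools.product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        error = train_and_evaluate(config)
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error


def toy_val_error(config):
    """Hypothetical objective with its optimum at lr=1e-3, batch size 64."""
    lr_term = abs(math.log10(config["learning_rate"]) + 3)
    batch_term = abs(config["batch_size"] - 64) / 64
    return lr_term + batch_term


grid = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [32, 64, 128],
}
best_config, best_error = grid_search(toy_val_error, grid)
```

With 3 values per hyper-parameter this already costs 9 training runs, which illustrates the efficiency concern listed under disadvantages.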
Method 3: random search

How it works
[Figure: randomly sampled points over Hyperparameter 1 (e.g., batch size) vs. Hyperparameter 2 (e.g., learning rate)]
Method 4: coarse-to-fine

How it works
[Figure: random search over Hyperparameter 1 (e.g., batch size) vs. Hyperparameter 2 (e.g., learning rate); pick out the best performers, search again in the narrower region around them, etc.]

Advantages
• Can narrow in on very high performing hyperparameters
• Most used method in practice

Disadvantages
• Somewhat manual process
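
A sketch of coarse-to-fine random search over a single hyper-parameter (learning rate), kept to one dimension for brevity. The log-uniform sampling, the number of rounds, the number of trials, and how many best performers to keep are all assumptions. `toy_val_error` is a hypothetical stand-in for a real training run.

```python
import math
import random


def random_search(evaluate, lr_range, n_trials, rng):
    """Sample learning rates log-uniformly; return (error, lr) pairs, best first."""
    low, high = math.log10(lr_range[0]), math.log10(lr_range[1])
    trials = []
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(low, high)
        trials.append((evaluate(lr), lr))
    return sorted(trials)


def coarse_to_fine(evaluate, lr_range, rounds=3, n_trials=20, keep=5, seed=0):
    """Repeat random search, shrinking the range around the best performers."""
    rng = random.Random(seed)
    best = None
    for _ in range(rounds):
        trials = random_search(evaluate, lr_range, n_trials, rng)
        top = trials[:keep]
        if best is None or top[0] < best:
            best = top[0]
        # Next round searches only the span covered by the best performers.
        kept_lrs = [lr for _, lr in top]
        lr_range = (min(kept_lrs), max(kept_lrs))
    return best  # (error, learning_rate)


def toy_val_error(lr):
    """Hypothetical objective with its optimum at lr = 1e-3."""
    return abs(math.log10(lr) + 3)


best_error, best_lr = coarse_to_fine(toy_val_error, (1e-6, 1e-1))
```

In practice the narrowing step is often done by hand after looking at the results, which is the "somewhat manual process" noted above.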
Method 5: Bayesian hyperparam opt

How it works (at a high level)
• Start with a prior estimate of parameter distributions
• Maintain a probabilistic model of the relationship between hyper-parameter values and model performance
• Alternate between:
  • Training with the hyper-parameter values that maximize the expected improvement
  • Using training results to update our probabilistic model
• To learn more, see:
  https://towardsdatascience.com/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f

Advantages
• Generally the most efficient hands-off way to choose hyperparameters

Disadvantages
• Difficult to implement from scratch
• Can be hard to integrate with off-the-shelf tools


Summary of how to optimize hyperparams

• Coarse-to-fine random searches
• Consider Bayesian hyper-parameter optimization solutions as your codebase matures


Conclusion

• DL debugging is hard due to many competing sources of error
• To train bug-free DL models, we treat building our model as an iterative process
• The following steps can make the process easier and catch errors as early as possible


How to build bug-free DL models

Overview
• Start simple: Choose the simplest model & data possible (e.g., LeNet on a subset of your data)
• Implement & debug: Once the model runs, overfit a single batch & reproduce a known result
• Evaluate: Apply the bias-variance decomposition to decide what to do next
• Tune hyperparams: Use coarse-to-fine random searches
• Improve model/data: Make your model bigger if you underfit; add data or regularize if you overfit

Where to go to learn more
• Andrew Ng’s book Machine Learning Yearning (http://www.mlyearning.org/)
• The following Twitter thread: https://twitter.com/karpathy/status/1013244313327681536
• This blog post: https://pcc.cs.byu.edu/2017/10/02/practical-advice-for-building-deep-neural-networks/
