Understanding Weight Initialization For Neural Networks

by Adrian Rosebrock (https://pyimagesearch.com/author/adrian/) on May 6, 2021
In this tutorial, we will discuss the concept of weight initialization, or more simply, how we
initialize our weight matrices and bias vectors.
This tutorial is not meant to be a comprehensive survey of initialization techniques; however, it does
highlight popular methods, drawn both from neural network literature and from general rules of thumb. To
illustrate how these weight initialization methods work, I have included basic Python/NumPy-like
pseudocode when appropriate.
Constant Initialization
When applying constant initialization, all weights in the neural network are initialized with a
constant value, C. Typically C will equal zero or one.
To visualize this in pseudocode, let’s consider an arbitrary layer of a neural network that has 64
inputs and 32 outputs (excluding any biases for notational convenience). To initialize these weights
via NumPy and zero initialization (the default used by Caffe, a popular deep learning framework)
we would execute:

>>> import numpy as np
>>> W = np.zeros((64, 32))
Similarly, initialization with ones can be accomplished via:

>>> W = np.ones((64, 32))
We can apply constant initialization with an arbitrary value of C using:

>>> C = 0.5  # example value; any constant works
>>> W = np.ones((64, 32)) * C
Although constant initialization is easy to grasp and understand, the problem with using this
method is that it’s near impossible for us to break the symmetry of activations (Heinrich, 2015
(https://github.com/NVIDIA/DIGITS/blob/master/examples/weight-init/README.md)).
Therefore, it is rarely used as a neural network weight initializer.
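To see why symmetry is a problem, here is a minimal sketch. The tiny two-layer network, the tanh activation, the squared-error loss, and the random batch are all illustrative assumptions rather than anything prescribed above. With every weight set to the same constant, each hidden unit computes the same activation and receives the same gradient, so gradient descent can never make the units differ from one another.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 64))          # hypothetical batch: 8 samples, 64 inputs
y = rng.normal(size=(8, 1))           # hypothetical regression targets

C = 0.01
W1 = np.ones((64, 32)) * C            # constant-initialized hidden layer
W2 = np.ones((32, 1)) * C             # constant-initialized output layer

# forward pass
H = np.tanh(X @ W1)                   # hidden activations, shape (8, 32)
pred = H @ W2                         # predictions, shape (8, 1)

# backward pass for a mean squared error loss
d_pred = 2 * (pred - y) / len(X)
dH = d_pred @ W2.T
dW1 = X.T @ (dH * (1 - H ** 2))       # gradient for W1, shape (64, 32)

# every hidden unit (column) has identical activations and identical gradients
same_activations = np.allclose(H, H[:, :1])
same_gradients = np.allclose(dW1, dW1[:, :1])
print(same_activations, same_gradients)
```

Because the columns of `dW1` are identical, a gradient step leaves the columns of `W1` identical as well, which is the symmetry that random initialization is meant to break.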
Uniform and Normal Distributions

A uniform distribution draws a random value from the range [lower, upper] where every
value inside this range has equal probability of being drawn.
Again, let’s presume that for a given layer in a neural network we have 64 inputs and 32 outputs.
We then wish to initialize our weights in the range lower=-0.05 and upper=0.05. Applying
the following Python + NumPy code will allow us to achieve the desired initialization:

>>> W = np.random.uniform(low=-0.05, high=0.05, size=(64, 32))
Executing the code above, NumPy will randomly generate 64×32 = 2,048 values from the range
[−0.05, 0.05], where each value in this range has equal probability.
We then have a normal distribution where we define the probability density for the Gaussian
distribution as:
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) \tag{1}$$
The most important parameters here are µ (the mean) and σ (the standard deviation). The square
of the standard deviation, σ², is called the variance.
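The Gaussian density with mean µ and standard deviation σ can be evaluated directly in NumPy. The sketch below is only a sanity check: the function name `gaussian_pdf` and the integration grid are illustrative choices. It confirms numerically that the density integrates to roughly one and shows how tall the peak becomes for a small σ.

```python
import numpy as np

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return coeff * np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# integrate over +/- 10 standard deviations with a simple Riemann sum
mu, sigma = 0.0, 0.05
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 20001)
dx = x[1] - x[0]
area = np.sum(gaussian_pdf(x, mu, sigma)) * dx

# the density peaks at the mean with height 1 / (sigma * sqrt(2 * pi))
peak = gaussian_pdf(mu, mu, sigma)
print(area, peak)
```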
When using the Keras library, the RandomNormal class draws random values from a normal
distribution with µ = 0 and σ = 0.05. We can mimic this behavior using NumPy below:

>>> W = np.random.normal(0.0, 0.05, size=(64, 32))
Both uniform and normal distributions can be used to initialize the weights in neural networks;
however, we normally impose various heuristics to create “better” initialization schemes (as we’ll
discuss in the remaining sections).
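The spread of the two distributions can be compared empirically. In the sketch below the matrix size and the seed are arbitrary choices, made larger than 64×32 only so the estimates are stable. A uniform distribution over [-a, a] has standard deviation a/√3, so a uniform range of ±0.05 is noticeably tighter than a normal distribution with σ = 0.05, and unlike the normal it never produces a weight outside its range.

```python
import numpy as np

rng = np.random.default_rng(42)
shape = (640, 320)                         # larger than 64x32 for stable estimates

W_uniform = rng.uniform(low=-0.05, high=0.05, size=shape)
W_normal = rng.normal(loc=0.0, scale=0.05, size=shape)

expected_uniform_std = 0.05 / np.sqrt(3)   # std of U(-a, a) is a / sqrt(3)
print("uniform std:", W_uniform.std(), "expected:", expected_uniform_std)
print("normal std: ", W_normal.std(), "expected:", 0.05)

# the uniform draw is bounded by its limits; the normal draw is not
print("uniform max |w|:", np.abs(W_uniform).max())
print("normal max |w|: ", np.abs(W_normal).max())
```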
LeCun Uniform and Normal

If you have ever used the Torch7 or PyTorch frameworks you may notice that the default weight
initialization method is called “Efficient Backprop,” which is derived from the work of LeCun et al.
(1998) (http://dl.acm.org/citation.cfm?id=645754.668382).
Here, the authors define a parameter F_in (called “fan in,” or the number of inputs to the layer)
along with F_out (the “fan out,” or number of outputs from the layer). Using these values we can
apply uniform initialization by:

>>> F_in = 64
>>> F_out = 32
>>> limit = np.sqrt(3 / float(F_in))
>>> W = np.random.uniform(low=-limit, high=limit, size=(F_in, F_out))
We can also use a normal distribution. The Keras library uses a truncated normal
distribution when constructing the lower and upper limits, along with a zero mean:

>>> F_in = 64
>>> F_out = 32
>>> limit = np.sqrt(1 / float(F_in))  # LeCun normal standard deviation
>>> W = np.random.normal(0.0, limit, size=(F_in, F_out))
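The uniform limit √(3/F_in) and the standard deviation commonly used for the normal variant, √(1/F_in) (as documented for Keras's `lecun_normal`), target the same weight variance of 1/F_in, because a uniform distribution over [-a, a] has variance a²/3. The sketch below checks this empirically. The helper names are hypothetical, the layer size is enlarged for stable estimates, and a plain rather than truncated normal is used for simplicity.

```python
import numpy as np

def lecun_uniform(fan_in, fan_out, rng):
    # uniform over [-limit, limit] has variance limit**2 / 3 = 1 / fan_in
    limit = np.sqrt(3.0 / fan_in)
    return rng.uniform(low=-limit, high=limit, size=(fan_in, fan_out))

def lecun_normal(fan_in, fan_out, rng):
    # plain (untruncated) normal with variance 1 / fan_in
    std = np.sqrt(1.0 / fan_in)
    return rng.normal(loc=0.0, scale=std, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
F_in, F_out = 640, 320
W_u = lecun_uniform(F_in, F_out, rng)
W_n = lecun_normal(F_in, F_out, rng)

target_var = 1.0 / F_in
print("uniform variance:", W_u.var())
print("normal variance: ", W_n.var())
print("target variance: ", target_var)
```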
Glorot/Xavier Uniform and Normal

The default weight initialization method used in the Keras library is called “Glorot initialization” or
“Xavier initialization” named after Xavier Glorot, the first author of the paper, Understanding the
difficulty of training deep feedforward neural networks
(http://proceedings.mlr.press/v9/glorot10a.html).
For the normal distribution the limit value is constructed by averaging F_in and F_out
and then taking the square root of the reciprocal of that average (Jones, 2016
(https://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization)). A
zero-center (µ = 0) is then used:

>>> F_in = 64
>>> F_out = 32
>>> limit = np.sqrt(2 / float(F_in + F_out))
>>> W = np.random.normal(0.0, limit, size=(F_in, F_out))
Glorot/Xavier initialization can also be done with a uniform distribution where we place stronger
restrictions on limit:

>>> F_in = 64
>>> F_out = 32
>>> limit = np.sqrt(6 / float(F_in + F_out))
>>> W = np.random.uniform(low=-limit, high=limit, size=(F_in, F_out))
Learning tends to be quite efficient using this initialization method and I recommend it for most
neural networks.
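One way to build intuition for why this scaling helps is to push random data through a stack of layers and watch the size of the activations. The sketch below is an illustration under assumptions of my own choosing: a 10-layer, 256-unit tanh network, a fixed seed, and a "naive" baseline of σ = 0.01. With Glorot scaling the activations keep a usable spread, while the naive baseline shrinks them toward zero layer after layer.

```python
import numpy as np

def glorot_normal(fan_in, fan_out, rng):
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def naive_normal(fan_in, fan_out, rng):
    # fixed small standard deviation, ignoring the layer size (baseline only)
    return rng.normal(0.0, 0.01, size=(fan_in, fan_out))

def tanh_forward_std(init_fn, depth=10, width=256, seed=0):
    """Std of the final activations after `depth` tanh layers (no biases)."""
    rng = np.random.default_rng(seed)
    h = rng.normal(size=(512, width))      # hypothetical unit-variance inputs
    for _ in range(depth):
        h = np.tanh(h @ init_fn(width, width, rng))
    return h.std()

glorot_std = tanh_forward_std(glorot_normal)
naive_std = tanh_forward_std(naive_normal)
print("Glorot normal, final activation std:", glorot_std)
print("naive sigma=0.01, final activation std:", naive_std)
```

When activations collapse like this, the gradients flowing backward through those layers shrink as well, which is one reason size-aware initialization tends to train more efficiently.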
He et al./Kaiming/MSRA Uniform and Normal

Often referred to as “He et al. initialization,” “Kaiming initialization,” or simply “MSRA initialization,”
this technique is named after Kaiming He, the first author of the paper, Delving Deep into
Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
(http://arxiv.org/abs/1502.01852).
We typically use this method when we are training very deep neural networks that use a ReLU-
like activation function (in particular, a “PReLU,” or Parametric Rectified Linear Unit).
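To initialize the weights in a layer using He et al. initialization with a uniform distribution we set
limit to be √(6 / F_in), where F_in is the number of input units in the layer:

>>> F_in = 64
>>> F_out = 32
>>> limit = np.sqrt(6 / float(F_in))
>>> W = np.random.uniform(low=-limit, high=limit, size=(F_in, F_out))
We can also use a normal distribution by setting µ = 0 and σ = √(2 / F_in):

>>> F_in = 64
>>> F_out = 32
>>> limit = np.sqrt(2 / float(F_in))
>>> W = np.random.normal(0.0, limit, size=(F_in, F_out))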
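The extra factor of two in He et al. initialization compensates for ReLU zeroing out roughly half of each layer's pre-activations. The sketch below uses the commonly documented forms (uniform limit √(6/F_in) and normal σ = √(2/F_in), as in Keras's `he_uniform` and `he_normal`) and compares them with Glorot normal. The depth, width, and seed are arbitrary assumptions and the helper names are hypothetical. It measures the root-mean-square activation after 20 ReLU layers.

```python
import numpy as np

def he_normal(fan_in, fan_out, rng):
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def he_uniform(fan_in, fan_out, rng):
    limit = np.sqrt(6.0 / fan_in)          # variance limit**2 / 3 = 2 / fan_in
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def glorot_normal(fan_in, fan_out, rng):
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def relu_forward_rms(init_fn, depth=20, width=256, seed=0):
    """RMS of the final activations after `depth` ReLU layers (no biases)."""
    rng = np.random.default_rng(seed)
    h = rng.normal(size=(512, width))      # hypothetical unit-variance inputs
    for _ in range(depth):
        h = np.maximum(0.0, h @ init_fn(width, width, rng))
    return np.sqrt(np.mean(h ** 2))

he_rms = relu_forward_rms(he_normal)
he_uniform_rms = relu_forward_rms(he_uniform)
glorot_rms = relu_forward_rms(glorot_normal)
print("He normal RMS:    ", he_rms)
print("He uniform RMS:   ", he_uniform_rms)
print("Glorot normal RMS:", glorot_rms)
```

In this setup the He variants keep the activation scale roughly constant with depth, while Glorot scaling halves the mean squared activation at each ReLU layer, which becomes significant in very deep networks.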
Differences in Initialization Implementation

The actual limit values may vary for LeCun Uniform/Normal, Xavier Uniform/Normal, and He
et al. Uniform/Normal. For example, when using Xavier Uniform in Caffe,
limit = np.sqrt(3/n) (Heinrich, 2015
(https://github.com/NVIDIA/DIGITS/blob/master/examples/weight-init/README.md)), where n
is either F_in, F_out, or their average.
On the other hand, the default Xavier initialization for Keras uses
np.sqrt(6/(F_in + F_out)) (Keras contributors, 2016
(https://keras.io/initializers/#glorot_uniform)). No method is “more correct” than the other, but
you should read the documentation of your respective deep learning library.
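To make the difference concrete, the short sketch below evaluates both formulas for the 64-input, 32-output layer used throughout this tutorial. It only does the arithmetic and does not call either library. Note that when n is the average of the fan in and fan out, the √(3/n) form reduces to exactly √(6/(F_in + F_out)).

```python
import numpy as np

F_in, F_out = 64, 32

# sqrt(3 / n) for the three possible choices of n
caffe_limits = {
    "fan_in": np.sqrt(3.0 / F_in),
    "fan_out": np.sqrt(3.0 / F_out),
    "average": np.sqrt(3.0 / ((F_in + F_out) / 2.0)),
}

# sqrt(6 / (F_in + F_out))
keras_limit = np.sqrt(6.0 / (F_in + F_out))

for name, limit in caffe_limits.items():
    print(f"sqrt(3/n), n = {name:8s}: {limit:.4f}")
print(f"sqrt(6/(F_in + F_out))  : {keras_limit:.4f}")
```

The fan-in and fan-out choices give a narrower and a wider range, respectively, than the averaged form, so the same "Xavier uniform" name can correspond to different weight scales depending on the library and its settings.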
Summary
In this tutorial, we reviewed the fundamentals of neural networks. Specifically, we focused on the
history of neural networks and the relation to biology.
From there, we moved on to artificial neural networks, such as the Perceptron algorithm. While
important from a historical standpoint, the Perceptron algorithm has one major flaw: it cannot
accurately classify nonlinearly separable points. In order to work with more challenging datasets
we need both (1) nonlinear activation functions and (2) multi-layer networks.
To train multi-layer networks we must use the backpropagation algorithm. We then implemented
backpropagation by hand and demonstrated that when used to train multi-layer networks with
nonlinear activation functions, we can model nonlinearly separable datasets, such as XOR.
Of course, implementing backpropagation by hand is an arduous process prone to bugs, so we
often rely on existing libraries such as Keras, Theano, TensorFlow, etc. This enables us
to focus on the actual architecture rather than the underlying algorithm used to train the network.
Finally, we reviewed the four key ingredients when working with any neural network, including
the dataset, loss function, model/architecture, and optimization method.
Unfortunately, as some of our results demonstrated (e.g., CIFAR-10), standard neural networks fail
to obtain high classification accuracy when working with challenging image datasets that exhibit
variations in translation, rotation, viewpoint, etc. In order to obtain reasonable accuracy on these
datasets, we’ll need to work with a special type of feedforward neural network called
Convolutional Neural Networks (CNNs), which we will cover in a separate tutorial.