Understanding weight initialization for neural networks
by Adrian Rosebrock (https://pyimagesearch.com/author/adrian/) on May 6, 2021

In this tutorial, we will discuss the concept of weight initialization, or more simply, how we
initialize our weight matrices and bias vectors.
This tutorial is not meant to be a comprehensive survey of initialization techniques; however, it
does highlight popular methods, both from the neural network literature and from general rules of
thumb. To illustrate how these weight initialization methods work, I have included basic
Python/NumPy-like pseudocode where appropriate.

Constant Initialization
When applying constant initialization, all weights in the neural network are initialized with a
constant value, C. Typically C will equal zero or one.

To visualize this in pseudocode, let’s consider an arbitrary layer of a neural network that has 64
inputs and 32 outputs (excluding any biases for notational convenience). To initialize these weights
via NumPy and zero initialization (the default used by Caffe, a popular deep learning framework)
we would execute:

>>> import numpy as np
>>> W = np.zeros((64, 32))

Similarly, ones initialization can be accomplished via:

>>> W = np.ones((64, 32))

We can apply constant initialization with an arbitrary value of C using:

>>> W = np.ones((64, 32)) * C

Although constant initialization is easy to grasp and understand, the problem with using this
method is that it’s near impossible for us to break the symmetry of activations (Heinrich, 2015
(https://github.com/NVIDIA/DIGITS/blob/master/examples/weight-init/README.md)).
Therefore, it is rarely used as a neural network weight initializer.
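
A quick way to see the symmetry problem is to push a batch through a constant-initialized layer. The
sketch below is illustrative only: the random inputs, the tanh activation, the value of C, and the
toy loss (the sum of all activations) are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 64))     # 8 hypothetical samples with 64 inputs each
C = 0.01                         # arbitrary constant (assumed value)
W = np.ones((64, 32)) * C        # constant initialization

H = np.tanh(X @ W)               # activations of the 32 output units

# Every column of H is the same: all 32 units compute an identical function.
units_identical = np.allclose(H, H[:, :1])

# Gradient of the toy loss sum(H) with respect to W: X^T (1 - tanh^2)
grad_W = X.T @ (1.0 - H ** 2)

# Every column of the gradient is also the same, so a gradient step
# keeps the units identical to one another.
grads_identical = np.allclose(grad_W, grad_W[:, :1])

print(units_identical, grads_identical)
```

Because each unit starts with the same weights and receives the same update, the units never
differentiate, which is the symmetry that random initialization is meant to break.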

Uniform and Normal Distributions


A uniform distribution draws a random value from the range [lower, upper] where every
value inside this range has equal probability of being drawn.

Again, let’s presume that for a given layer in a neural network we have 64 inputs and 32 outputs.
We then wish to initialize our weights in the range lower=-0.05 to upper=0.05. Applying
the following Python + NumPy code will allow us to achieve the desired initialization:

>>> W = np.random.uniform(low=-0.05, high=0.05, size=(64, 32))

Executing the code above, NumPy will randomly generate 64×32 = 2,048 values from the range
[−0.05, 0.05], where each value in this range has equal probability.

We then have a normal distribution where we define the probability density for the Gaussian
distribution as:

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) \qquad (1)$$

The most important parameters here are µ (the mean) and σ (the standard deviation). The square
of the standard deviation, σ², is called the variance.
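
To make the roles of µ and σ concrete, the Gaussian density can be evaluated directly in NumPy. This
is a minimal sketch: the helper name gaussian_pdf and the grid used for the numerical check are
assumptions chosen for illustration.

```python
import numpy as np

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return coeff * np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Evaluate on a grid spanning +/- 8 standard deviations for sigma = 0.05
xs = np.linspace(-0.4, 0.4, 20001)
dx = xs[1] - xs[0]

# A Riemann sum of the density over the grid should be very close to 1
area = float(np.sum(gaussian_pdf(xs, mu=0.0, sigma=0.05)) * dx)

# The density peaks at x = mu with height 1 / (sigma * sqrt(2 * pi))
peak = float(gaussian_pdf(0.0, mu=0.0, sigma=0.05))

print(area, peak)
```

A smaller σ gives a taller, narrower curve, which is why σ directly controls how tightly the
initial weights cluster around the mean.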

When using the Keras library the RandomNormal class draws random values from a normal
distribution with µ = 0 and σ = 0.05. We can mimic this behavior using NumPy below:

>>> W = np.random.normal(0.0, 0.05, size=(64, 32))

Both uniform and normal distributions can be used to initialize the weights in neural networks;
however, we normally impose various heuristics to create “better” initialization schemes (as we’ll
discuss in the remaining sections).
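
The two distributions above use the same number, 0.05, in different roles: as a hard bound for the
uniform case and as a standard deviation for the normal case. The sketch below compares the
resulting spreads; the seed and the use of NumPy's Generator API are choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(42)
shape = (64, 32)

W_uniform = rng.uniform(low=-0.05, high=0.05, size=shape)
W_normal = rng.normal(loc=0.0, scale=0.05, size=shape)

# A uniform distribution on [-a, a] has standard deviation a / sqrt(3)
expected_uniform_std = 0.05 / np.sqrt(3)

print("uniform min/max:", W_uniform.min(), W_uniform.max())
print("uniform std:", W_uniform.std(), "theory:", expected_uniform_std)
print("normal min/max:", W_normal.min(), W_normal.max())
print("normal std:", W_normal.std(), "theory:", 0.05)
```

The uniform weights never leave the bound but have a smaller spread (about 0.029), while the normal
weights have spread 0.05 and occasionally fall well outside ±0.05. This factor of √3 between a
uniform limit and its standard deviation is the reason uniform and normal variants of the same
scheme use different constants.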

LeCun Uniform and Normal


If you have ever used the Torch7 or PyTorch frameworks you may notice that the default weight
initialization method is called “Efficient Backprop,” which is derived from the work of LeCun et al.
(1998) (http://dl.acm.org/citation.cfm?id=645754.668382).

Here, the authors define a parameter F_in (called “fan in,” or the number of inputs to the layer)
along with F_out (the “fan out,” or number of outputs from the layer). Using these values we can
apply uniform initialization by:

>>> F_in = 64
>>> F_out = 32
>>> limit = np.sqrt(3 / float(F_in))
>>> W = np.random.uniform(low=-limit, high=limit, size=(F_in, F_out))

We can also use a normal distribution. The Keras library uses a truncated normal distribution with
a zero mean, where limit serves as the standard deviation; the NumPy snippet below uses an ordinary
(untruncated) normal distribution for simplicity:

>>> F_in = 64
>>> F_out = 32
>>> limit = np.sqrt(1 / float(F_in))
>>> W = np.random.normal(0.0, limit, size=(F_in, F_out))
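
Both LeCun variants aim for a weight variance of 1 / F_in, so that unit-variance inputs produce
roughly unit-variance outputs. The following sketch checks this empirically; the random inputs X,
the seed, and the absence of an activation function are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
F_in, F_out = 64, 32

# LeCun uniform: U(-limit, limit) has variance limit^2 / 3 = 1 / F_in
limit = np.sqrt(3.0 / F_in)
W_uniform = rng.uniform(low=-limit, high=limit, size=(F_in, F_out))

# LeCun normal (untruncated here): standard deviation sqrt(1 / F_in)
std = np.sqrt(1.0 / F_in)
W_normal = rng.normal(0.0, std, size=(F_in, F_out))

# Feed unit-variance inputs through each layer and measure the output variance
X = rng.normal(size=(10000, F_in))
var_uniform = float((X @ W_uniform).var())
var_normal = float((X @ W_normal).var())

print(var_uniform, var_normal)  # both should land near 1.0
```

This is why the uniform limit uses a 3 in the numerator while the normal standard deviation uses a
1: the two constants yield the same variance.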

Glorot/Xavier Uniform and Normal


The default weight initialization method used in the Keras library is called “Glorot initialization” or
“Xavier initialization” named after Xavier Glorot, the first author of the paper, Understanding the
difficulty of training deep feedforward neural networks
(http://proceedings.mlr.press/v9/glorot10a.html).

For the normal distribution the limit value is constructed by averaging F_in and F_out
together and then taking the square root of the reciprocal, i.e., sqrt(2 / (F_in + F_out)) (Jones, 2016
(https://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization)). A
zero-center (µ = 0) is then used:

>>> F_in = 64
>>> F_out = 32
>>> limit = np.sqrt(2 / float(F_in + F_out))
>>> W = np.random.normal(0.0, limit, size=(F_in, F_out))

Glorot/Xavier initialization can also be done with a uniform distribution where we place stronger
restrictions on limit:

>>> F_in = 64
>>> F_out = 32
>>> limit = np.sqrt(6 / float(F_in + F_out))
>>> W = np.random.uniform(low=-limit, high=limit, size=(F_in, F_out))

Learning tends to be quite efficient using this initialization method and I recommend it for most
neural networks.
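
The two Glorot variants can be wrapped in small helpers to confirm that they share the same target
variance, 2 / (F_in + F_out). This is a sketch under stated assumptions: the function names
glorot_uniform and glorot_normal are hypothetical helpers written for this example (not the Keras
API), and the normal variant is untruncated.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Uniform Glorot/Xavier weights with limit sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def glorot_normal(fan_in, fan_out, rng):
    """Normal Glorot/Xavier weights with std sqrt(2 / (fan_in + fan_out))."""
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
F_in, F_out = 64, 32
target_var = 2.0 / (F_in + F_out)

W_u = glorot_uniform(F_in, F_out, rng)
W_n = glorot_normal(F_in, F_out, rng)
print(target_var, W_u.var(), W_n.var())

# Variance scaling in each direction for this layer shape
forward_gain = F_in * target_var    # applied to activations going forward
backward_gain = F_out * target_var  # applied to gradients going backward
print(forward_gain, backward_gain)
```

For a 64-to-32 layer the forward gain is 4/3 and the backward gain is 2/3. Averaging F_in and F_out
is a compromise: neither direction is scaled by exactly 1 unless the layer is square.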

He et al./Kaiming/MSRA Uniform and Normal


Often referred to as “He et al. initialization,” “Kaiming initialization,” or simply “MSRA initialization,”
this technique is named after Kaiming He, the first author of the paper, Delving Deep into
Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
(http://arxiv.org/abs/1502.01852).

We typically use this method when we are training very deep neural networks that use a ReLU-
like activation function (in particular, a “PReLU,” or Parametric Rectified Linear Unit).

To initialize the weights in a layer using He et al. initialization with a uniform distribution we set
limit to be sqrt(6 / F_in), where F_in is the number of input units in the layer:

>>> F_in = 64
>>> F_out = 32
>>> limit = np.sqrt(6 / float(F_in))
>>> W = np.random.uniform(low=-limit, high=limit, size=(F_in, F_out))

We can also use a normal distribution by setting µ = 0 and σ = sqrt(2 / F_in):

>>> F_in = 64
>>> F_out = 32
>>> limit = np.sqrt(2 / float(F_in))
>>> W = np.random.normal(0.0, limit, size=(F_in, F_out))
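
To see why the extra factor of 2 matters for ReLU networks, we can push random data through a deep
stack of ReLU layers and watch the signal scale. The sketch below is a toy experiment, not a
training run: the depth, width, batch size, seed, and the helper name final_preactivation_ms are all
assumptions chosen for illustration.

```python
import numpy as np

def final_preactivation_ms(std_fn, depth=20, width=256, batch=512, seed=0):
    """Mean squared pre-activation at the last layer of a random ReLU network."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(batch, width))   # unit-variance inputs
    for _ in range(depth):
        W = rng.normal(0.0, std_fn(width, width), size=(width, width))
        z = a @ W                         # pre-activations
        a = np.maximum(z, 0.0)            # ReLU
    return float(np.mean(z ** 2))

def he_std(fan_in, fan_out):
    return np.sqrt(2.0 / fan_in)

def glorot_std(fan_in, fan_out):
    return np.sqrt(2.0 / (fan_in + fan_out))

ms_he = final_preactivation_ms(he_std)
ms_glorot = final_preactivation_ms(glorot_std)

print("He:    ", ms_he)      # stays on the order of 1
print("Glorot:", ms_glorot)  # shrinks by roughly half per layer
```

ReLU zeroes out about half of each layer's pre-activations, which halves the mean squared signal.
The 2 in the He et al. variance compensates for this, whereas with Glorot scaling on square layers
the signal decays geometrically with depth.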

Differences in Initialization Implementation


The actual limit values may vary for LeCun Uniform/Normal, Xavier Uniform/Normal, and He
et al. Uniform/Normal. For example, when using Xavier Uniform in Caffe,
limit = np.sqrt(3/n) (Heinrich, 2015
(https://github.com/NVIDIA/DIGITS/blob/master/examples/weight-init/README.md)), where n
is either F_in, F_out, or their average.

On the other hand, the default Xavier initialization for Keras uses
np.sqrt(6/(F_in + F_out)) (Keras contributors, 2016
(https://keras.io/initializers/#glorot_uniform)). No method is “more correct” than the other, but
you should read the documentation of your respective deep learning library.
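
The two formulas quoted above can be compared side by side for our 64-input, 32-output layer. In
this sketch the helper xavier_uniform_limit and its convention labels ("keras", "fan_in",
"fan_out", "average") are hypothetical names invented for the example; they are not parameters of
either library.

```python
import numpy as np

def xavier_uniform_limit(fan_in, fan_out, convention="keras"):
    """Uniform limit under the conventions described in the text (illustrative labels)."""
    if convention == "keras":
        return np.sqrt(6.0 / (fan_in + fan_out))
    # Caffe-style formula sqrt(3 / n), where n depends on the chosen convention
    n = {
        "fan_in": fan_in,
        "fan_out": fan_out,
        "average": (fan_in + fan_out) / 2.0,
    }[convention]
    return np.sqrt(3.0 / n)

limits = {
    name: float(xavier_uniform_limit(64, 32, name))
    for name in ("keras", "fan_in", "fan_out", "average")
}
for name, value in limits.items():
    print(f"{name:8s} {value:.4f}")
```

When n is the average of F_in and F_out, sqrt(3/n) is algebraically the same as
sqrt(6/(F_in + F_out)), so the two libraries agree in that case. Choosing n = F_in or n = F_out
gives a smaller or larger limit for this layer shape, which is why checking the documentation
matters.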
Summary
In this tutorial, we reviewed the fundamentals of neural networks. Specifically, we focused on the
history of neural networks and the relation to biology.

From there, we moved on to artificial neural networks, such as the Perceptron algorithm. While
important from a historical standpoint, the Perceptron algorithm has one major flaw: it cannot
accurately classify nonlinearly separable points. In order to work with more challenging datasets
we need both (1) nonlinear activation functions and (2) multi-layer networks.

To train multi-layer networks we must use the backpropagation algorithm. We then implemented
backpropagation by hand and demonstrated that when used to train multi-layer networks with
nonlinear activation functions, we can model nonlinearly separable datasets, such as XOR.

Of course, implementing backpropagation by hand is an arduous process prone to bugs; we,
therefore, often rely on existing libraries such as Keras, Theano, TensorFlow, etc. This enables us
to focus on the actual architecture rather than the underlying algorithm used to train the network.

Finally, we reviewed the four key ingredients when working with any neural network, including
the dataset, loss function, model/architecture, and optimization method.

Unfortunately, as some of our results demonstrated (e.g., CIFAR-10), standard neural networks fail
to obtain high classification accuracy when working with challenging image datasets that exhibit
variations in translation, rotation, viewpoint, etc. In order to obtain reasonable accuracy on these
datasets, we’ll need to work with a special type of feedforward neural network called
Convolutional Neural Networks (CNNs), which we will cover in a separate tutorial.

About the Author
Hi there, I’m Adrian Rosebrock, PhD. All too often I see developers, students, and
researchers wasting their time, studying the wrong things, and generally struggling
to get started with Computer Vision, Deep Learning, and OpenCV. I created this
website to show you what I believe is the best possible way to get your start.
