VANISHING AND
EXPLODING GRADIENT
PROBLEM
Dr. Umarani Jayaraman
Vanishing and Exploding Gradient
Problem
The main problems with training very deep neural networks are vanishing and exploding gradients.
When training a very deep neural network, the gradients sometimes become very small (vanishing gradients) or very large (exploding gradients), and this makes training difficult.
Vanishing and Exploding Gradient
Problem
This problem occurs while training a neural network with back-propagation learning.
Vanishing Gradient Problem
In earlier days, the most commonly used activation function was the sigmoid.
Vanishing Gradient Problem
The sigmoid squashes its input to a value between 0 and 1, and its derivative lies between 0 and 0.25.
The formula for weight updation is:

w_new = w_old − η × (∂L/∂w_old)

Vanishing Gradient Problem

With a learning rate η = 1 and illustrative local derivatives of 0.2, 0.15 and 0.05, the chain rule gives:

∂L/∂w = 0.2 × 0.15 × 0.05 = 0.0015
w_new = 2.5 − (1)(0.0015) = 2.4985
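A minimal Python sketch of this arithmetic (the local derivatives 0.2, 0.15, 0.05, the old weight 2.5 and the learning rate of 1 are the illustrative numbers used above):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)               # never larger than 0.25 (peak at x = 0)

# Chain rule: the gradient reaching an early weight is a product of local derivatives
local_derivatives = [0.2, 0.15, 0.05]  # illustrative values from the example
gradient = np.prod(local_derivatives)  # 0.0015

w_old, lr = 2.5, 1.0                   # old weight and learning rate from the example
w_new = w_old - lr * gradient          # 2.5 - 1 * 0.0015 = 2.4985
print(gradient, w_new)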
Vanishing Gradient Problem
As the number of layers in a neural network
increases, the derivative keeps on
decreasing.
Hence, adding more layers leads to an almost-zero gradient.
The new weight then becomes approximately equal to the old weight, and the network effectively stops learning. This is called the vanishing gradient problem.
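A small sketch of this effect, assuming (hypothetically) that every layer contributes a local derivative of about 0.2 (the sigmoid derivative can never exceed 0.25):

per_layer_derivative = 0.2                       # sigmoid derivatives are at most 0.25
for depth in (2, 5, 10, 20):
    gradient = per_layer_derivative ** depth     # product of `depth` local derivatives
    print(depth, gradient)
# At 20 layers the gradient is about 1e-14, so w_new ≈ w_old and learning stalls.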
Vanishing Gradient Problem
This is the reason why the sigmoid is no longer used as the activation function in hidden layers.
The tanh activation function is also avoided because its derivative lies between 0 and 1 and is usually well below 1. Therefore, it also leads to the vanishing gradient problem.
Vanishing Gradient Problem
Solutions for the Vanishing Gradient Problem:
The simplest solution is to use other activation functions, such as ReLU, Leaky ReLU, or Parametric ReLU.
These activations saturate in only one direction and are therefore more resilient to vanishing gradients.
ReLU Activation function
For a positive input the ReLU derivative is 1:
∂L/∂w = 1 × 1 × 1 = 1
w_new = 2.5 − (1)(1) = 1.5
ReLU Activation function
For a negative input the ReLU derivative is 0:
∂L/∂w = 1 × 1 × 0 = 0 (dead neuron)
w_new = 2.5 − (1)(0) = 2.5 (no update)
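A small Python sketch of the ReLU case above (the inputs +3 and −3 are hypothetical; only their sign matters):

def relu(x):
    return max(0.0, x)

def relu_derivative(x):
    return 1.0 if x > 0 else 0.0       # 1 for positive inputs, 0 otherwise

w_old, lr = 2.5, 1.0
# Positive pre-activation: gradient = 1 * 1 * 1 = 1  ->  w_new = 2.5 - 1 = 1.5
print(w_old - lr * (1 * 1 * relu_derivative(+3.0)))
# Negative pre-activation: gradient = 1 * 1 * 0 = 0  ->  w_new = 2.5 (dead neuron, no update)
print(w_old - lr * (1 * 1 * relu_derivative(-3.0)))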
Leaky ReLU
To fix the problem of dead neurons, Leaky ReLU was introduced.
Leaky ReLU derivative
The derivative is no longer zero, and hence there are no dead neurons.
Leaky ReLU Activation function
For a negative input the Leaky ReLU derivative is 0.01:
∂L/∂w = 0.01 × 1 × 1 = 0.01 (no dead neuron)
w_new = 2.5 − (1)(0.01) = 2.49
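A matching sketch for Leaky ReLU (the negative-side slope of 0.01 follows the example above; other small slopes are also used in practice):

def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x

def leaky_relu_derivative(x, alpha=0.01):
    return 1.0 if x > 0 else alpha     # never exactly zero, so no dead neurons

w_old, lr = 2.5, 1.0
# Negative pre-activation: gradient = 0.01 * 1 * 1 = 0.01  ->  w_new = 2.5 - 0.01 = 2.49
gradient = leaky_relu_derivative(-3.0) * 1 * 1
print(w_old - lr * gradient)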
Vanishing Gradient Problem
The other solutions are:
Use Residual networks (ResNets)
Use Batch Normalization
Use Multi-level hierarchy
Use Long Short-Term Memory (LSTM) networks
Use Faster hardware
Genetic algorithms for weight search
Residual neural networks (ResNets)
One of the newest and most
effective ways to resolve the vanishing
gradient problem is with residual neural
networks, or ResNets (not to be confused
with recurrent neural networks).
Before ResNets, it was observed that a deeper network could have a higher training error than a shallower one.
Residual neural networks (ResNets)
As this gradient keeps flowing backwards
to the initial layers, this value keeps
getting multiplied by each local gradient.
Hence, the gradient becomes smaller and smaller, making the updates to the initial layers very small or stopping them altogether (the network stops learning).
We can solve this problem if the local gradient somehow becomes 1.
Linear or Identity Activation Function
Equation: f(x) = x
Derivative: f′(x) = 1
Residual neural networks (ResNets)
How can the local gradient be 1, i.e., the derivative of which function is always 1?
The identity function!
As the gradient is back-propagated, it does not decrease in value because the local gradient is 1.
Residual neural networks (ResNets)
The residual connection directly adds the value at the beginning of the block, x, to the end of the block (F(x) + x).
This residual connection does not go through activation functions that "squash" the derivatives, resulting in a higher overall derivative of the block.
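A minimal numeric sketch of this idea, using a hypothetical one-unit block with F(x) = sigmoid(w·x): the local derivative of the block is F′(x) + 1, so the skip path always contributes 1 and the gradient cannot be squashed to zero.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def residual_block(x, w):
    # F(x) + x : the skip connection adds the block input to the block output
    return sigmoid(w * x) + x

def residual_block_derivative(x, w):
    s = sigmoid(w * x)
    return w * s * (1.0 - s) + 1.0     # F'(x) + 1: the "+1" comes from the skip path

x, w = 4.0, 0.5
print(residual_block(x, w), residual_block_derivative(x, w))
# Even where the sigmoid saturates, the block's derivative stays close to 1.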
Residual neural networks (ResNets)
These skip connections act as gradient superhighways, allowing the gradient to flow unhindered (without restriction).
Batch Normalization
Batch normalization layers can also resolve the vanishing gradient problem.
As stated before, the problem arises
when a large input space is mapped to a
small one, causing the derivatives to
disappear.
Batch Normalization
In the sigmoid plot, this is most clearly seen when |x| is large.
Batch Normalization
Batch normalization reduces this
problem by simply normalizing the input
so |x| doesn’t reach the outer edges of
the sigmoid function.
It normalizes the input so that most of it falls in the region around zero, where the derivative is not too small.
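A small sketch (with hypothetical pre-activation values) of how normalizing a batch keeps |x| small, so the sigmoid derivative stays away from zero:

import numpy as np

def sigmoid_derivative(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

x = np.array([12.0, 15.0, 18.0, 21.0])       # large pre-activations: sigmoid saturates
x_norm = (x - x.mean()) / (x.std() + 1e-5)   # batch normalization: zero mean, unit variance

print(sigmoid_derivative(x))                 # nearly 0 everywhere: vanishing gradients
print(sigmoid_derivative(x_norm))            # roughly 0.16-0.24: gradients no longer vanish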
Exploding Gradient Problem
ŷ = f(O1), where O1 = z1·W2 + b

∂ŷ/∂z1 = (∂f(O1)/∂O1) × (∂O1/∂z1)
        = 0.25 × ∂(z1·W2 + b)/∂z1
        = 0.25 × W2 = 0.25 × 500 = 125   (when the weight W2 is large, e.g. 500)
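A minimal sketch of this computation, and of how the factor compounds across layers when the weight is large (W2 = 500 and the sigmoid's maximum derivative of 0.25 are the values from the example above):

sigmoid_local_derivative = 0.25        # maximum derivative of the sigmoid
W2 = 500.0                             # a large weight, as in the example

# Single layer: ∂ŷ/∂z1 = 0.25 * W2
print(sigmoid_local_derivative * W2)   # 125.0

# Across many layers the factor compounds and the gradient explodes
gradient = 1.0
for layer in range(10):
    gradient *= sigmoid_local_derivative * W2
print(gradient)                        # about 9.3e20 after 10 layers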
Exploding Gradient Problem
The exploding gradient problem is not caused by the sigmoid function.
It occurs because of large weight values.
If the weights are initialized with large values, training keeps oscillating instead of converging.
Hence, we should choose the initial weight vectors properly.
Exploding Gradient Problem
When gradients explode, they can become NaN (Not a Number) because of numerical overflow.
We might see irregular oscillations in
training cost when we plot the learning
curve.
Dealing with Exploding Gradients
A solution is to apply gradient clipping, which places a predefined threshold on the gradient norm to prevent it from getting too large. Clipping does not change the direction of the gradient; it only changes its length.
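A small sketch of gradient clipping by norm (the threshold of 5.0 is an arbitrary example; deep learning frameworks provide equivalent built-ins):

import numpy as np

def clip_gradient_by_norm(gradient, threshold):
    # Rescale the gradient if its L2 norm exceeds the threshold.
    # The direction is preserved; only the length changes.
    norm = np.linalg.norm(gradient)
    if norm > threshold:
        gradient = gradient * (threshold / norm)
    return gradient

g = np.array([300.0, -400.0])          # exploding gradient, norm = 500
print(clip_gradient_by_norm(g, 5.0))   # [ 3. -4.], norm = 5, same direction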
Note: Step function
Note: ReLU function
Linear function definition
Non-linear function definition
Source: Activation function
https://inblog.in/ACTIVATION-FUNCTION-BREAKTHROUGH-VOyvxhTELU
Source: Vanishing gradient problem
https://www.mygreatlearning.com/blog/the-vanishing-gradient-problem/
https://towardsdatascience.com/the-vanishing-gradient-problem-69bf08b15484
https://medium.com/analytics-vidhya/vanishing-and-exploding-gradient-problems-c94087c2e911
THANK YOU