
NPTEL Online Certification Courses

Indian Institute of Technology Kharagpur

Deep Learning
Assignment- Week 9
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10    Total marks: 10 × 1 = 10
______________________________________________________________________________

QUESTION 1:
For the following Figure A and Figure B of the loss landscape, choose the correct statement.
Figure A Figure B

a. Figure A has a small learning rate, Figure B has a high learning rate
b. Figure A has a high learning rate, Figure B has a small learning rate
c. Figure A and Figure B have different loss functions
d. None of the above

Correct Answer: a
Detailed Solution:

Figure A has a small learning rate, which is evident from the slow convergence towards the optimal valley point. Figure B shows highly fluctuating weight updates and therefore has a high learning rate. (Figures taken from the book Dive into Deep Learning.)
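As an illustration only (not part of the assignment), the two regimes can be reproduced with a toy one-dimensional quadratic loss; the learning rates 0.05 and 0.95 below are arbitrary values chosen purely for demonstration:

import numpy as np

def gd_path(lr, steps=15, w0=4.0):
    # Plain gradient descent on the toy loss L(w) = w**2, whose gradient is 2*w.
    w, path = w0, [w0]
    for _ in range(steps):
        w = w - lr * 2 * w
        path.append(w)
    return np.array(path)

print(np.round(gd_path(lr=0.05)[:5], 3))   # small learning rate: slow, monotone descent (Figure A)
print(np.round(gd_path(lr=0.95)[:5], 3))   # large learning rate: iterates overshoot and oscillate (Figure B)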
____________________________________________________________________________

QUESTION 2:
Which of the following problems is primarily solved by the residual connection in ResNet?

a. Vanishing Gradient problem


b. Overfitting
c. Underfitting
d. Exploding gradient

Correct Answer: a
Detailed Solution:

The residual connection, formulated as H(x) = F(x) + x, provides an unattenuated gradient signal from deep layers back to shallow layers, which mitigates the vanishing-gradient problem.
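As an illustration only, a minimal residual block can be sketched in PyTorch as follows (the channel count and the two-convolution structure are illustrative choices, not the exact ResNet configuration from the lecture):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    # Basic residual block: output = relu(F(x) + x), where F is two conv layers.
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        # The identity shortcut lets gradients flow back unattenuated,
        # which is what mitigates the vanishing-gradient problem.
        return F.relu(out + x)

x = torch.randn(1, 16, 32, 32)
print(ResidualBlock(16)(x).shape)   # torch.Size([1, 16, 32, 32])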

____________________________________________________________________________

QUESTION 3:
The following is the equation of the update vector for the momentum optimizer. Which of the
following is true for 𝛾?
𝑉𝑡 = 𝛾𝑉𝑡−1 + 𝜂∇𝜃 𝐽(𝜃)
a. 𝛾 is the momentum term which indicates how much acceleration you want
b. 𝛾 is the step size
c. 𝛾 is the first order moment
d. 𝛾 is the second order moment

Correct Answer: a

Detailed Solution:

A fraction of the update vector from the past time step is added to the current update vector.
𝜸 is that fraction; it indicates how much acceleration you want, and its value lies
between 0 and 1.
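A minimal numerical sketch of this update on a toy quadratic loss, assuming illustrative values γ = 0.9 and η = 0.1:

import numpy as np

def momentum_step(theta, v, grad, gamma=0.9, eta=0.1):
    # v_t = gamma * v_{t-1} + eta * grad(theta); theta is then moved by -v_t.
    v = gamma * v + eta * grad
    return theta - v, v

theta, v = np.array([1.0, 2.0]), np.zeros(2)
for _ in range(3):
    grad = 2 * theta                 # gradient of the toy loss ||theta||^2
    theta, v = momentum_step(theta, v, grad)
print(theta)                         # moves towards the minimum at the origin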

____________________________________________________________________________

QUESTION 4:
Choose the correct option

Statement 1: Stochastic gradient descent is less prone to getting stuck in local minima because
of inherent noise due to minibatch sampling.
Statement 2: Large learning rates with an annealing schedule can be used with a higher mini-batch
size.
a. Statement 1 is True, Statement 2 is True
b. Statement 1 is False, Statement 2 is True
c. Statement 1 is True, Statement 2 is False
d. Statement 1 is False, Statement 2 is False
Correct Answer: a
Detailed Solution:

Stochastic gradient descent does not consider the whole dataset for each update and thus has noisier
updates. Because of this noise, the gradient direction is likely to avoid getting pulled towards local
minima. With a higher mini-batch size, the noise in SGD goes down, so a larger learning rate with an
annealing schedule can be used.
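For illustration only, a simple step-decay annealing schedule of this kind might look like the sketch below (all constants are arbitrary demonstration values):

def annealed_lr(step, lr0=0.5, decay=0.1, every=10):
    # Step-decay annealing: start from a comparatively large lr0 and shrink it by a
    # factor `decay` every `every` steps; larger mini-batches keep gradient noise low
    # enough that the large initial learning rate remains stable.
    return lr0 * (decay ** (step // every))

print([annealed_lr(s) for s in (0, 9, 10, 25, 40)])   # roughly: 0.5, 0.5, 0.05, 0.005, 5e-05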

____________________________________________________________________________

QUESTION 5:
Which of the following is the simplest optimizer, in terms of computational requirements, for dealing
with oscillations and saddle points?

a. Stochastic Gradient Descent (SGD)
b. SGD with Momentum / Nesterov's Accelerated Gradient
c. RMSProp
d. AdaGrad / Adam

Correct Answer: b
Detailed Solution:

Mini-batch gradient descent makes a parameter update after seeing just a subset of
examples, so the direction of the update has some variance and the path taken by mini-batch
gradient descent will "oscillate" toward convergence. Adding momentum reduces
these oscillations and helps the optimizer move through saddle points, where the gradient
vanishes, at negligible extra computational cost.
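A toy sketch of this damping effect on an ill-conditioned quadratic loss (the loss f(x) = 0.1·x1² + 2·x2² and all hyperparameters are demonstration values only):

import numpy as np

grad = lambda x: np.array([0.2 * x[0], 4.0 * x[1]])   # gradient of f(x) = 0.1*x1**2 + 2*x2**2

def run(gamma, lr=0.6, steps=25):
    # gamma = 0 gives plain gradient descent; gamma > 0 adds momentum.
    x, v = np.array([-5.0, -2.0]), np.zeros(2)
    for _ in range(steps):
        v = gamma * v + lr * grad(x)
        x = x - v
    return x

print(run(gamma=0.0))   # lr is too large for the steep x2 direction: the updates blow up
print(run(gamma=0.5))   # momentum damps the oscillation and both coordinates converge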

____________________________________________________________________________

QUESTION 6:
Given the following three figures A, B and C, choose the correct option:

Figure A

Figure B

Figure C

a. Figure A is SGD momentum optimizer with high momentum, Figure B is RMSProp
or AdaGrad, and Figure C is SGD momentum optimizer with low momentum.
b. Figure A is RMSProp or AdaGrad, Figure B is SGD momentum with low
momentum, and Figure C is SGD momentum with high momentum.
c. Figure A is SGD momentum optimizer with low momentum, Figure B is RMSProp
or AdaGrad, and Figure C is SGD momentum optimizer with high momentum.
d. None of the above

Correct Answer: b

Detailed Solution:

RMSProp/AdaGrad show less oscillation along the steep slopes of the contour lines. A low value of
momentum makes the optimizer converge with a high degree of oscillation, while a high value of
momentum dampens the oscillation in high-gradient regions. (Figures taken from the book Dive
into Deep Learning.)
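A minimal sketch of the RMSProp update that produces this behaviour (all hyperparameters are illustrative; AdaGrad differs only in accumulating the squared gradients instead of keeping a decayed running average):

import numpy as np

def rmsprop_step(theta, s, grad, lr=0.01, beta=0.9, eps=1e-8):
    # Keep a running average of squared gradients and divide the step by its square root,
    # so directions with persistently steep gradients get smaller steps, which is why the
    # trajectory oscillates less along the steep contour directions.
    s = beta * s + (1 - beta) * grad ** 2
    theta = theta - lr * grad / (np.sqrt(s) + eps)
    return theta, s

theta, s = np.array([-5.0, -2.0]), np.zeros(2)
for _ in range(5):
    grad = np.array([0.2 * theta[0], 4.0 * theta[1]])   # toy ill-conditioned loss 0.1*x1**2 + 2*x2**2
    theta, s = rmsprop_step(theta, s, grad)
print(theta)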

______________________________________________________________________________

QUESTION 7:
For a function f(θ0,θ1), if θ0 and θ1 are initialized at a global minimum, then what should be the
values of θ0 and θ1 after a single iteration of gradient descent?

a. θ0 and θ1 will update as per gradient descent rule


b. θ0 and θ1 will remain same
c. Depends on the values of θ0 and θ1
d. Depends on the learning rate

Correct Answer: b
Detailed Solution:
At a minimum (global or local), the gradient is zero, so a gradient-descent update leaves the
parameters unchanged.
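A tiny numerical check on a toy loss f(θ0, θ1) = θ0² + θ1², whose global minimum is at (0, 0):

def gd_step(theta0, theta1, lr=0.1):
    # One gradient-descent step; the partial derivatives of the toy loss are 2*theta0 and 2*theta1.
    g0, g1 = 2 * theta0, 2 * theta1
    return theta0 - lr * g0, theta1 - lr * g1

print(gd_step(0.0, 0.0))   # (0.0, 0.0): the gradient is zero, so the parameters do not move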
______________________________________________________________________________

QUESTION 8:
What can be one of the practical problems of exploding gradient?
a. Too large update of weight values leading to unstable network
b. Too small update of weight values inhibiting the network to learn
c. Too large update of weight values leading to faster convergence

d. Too small update of weight values leading to slower convergence

Correct Answer: a
Detailed Solution:
Exploding gradients are a problem where large error gradients accumulate and result in very
large updates to the neural network's weights during training. This makes the model
unstable and unable to learn from the training data.
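A small numerical sketch of how this blow-up arises (layer width, depth and weight scale are arbitrary demonstration values): backpropagation through a deep chain multiplies the gradient by one weight matrix per layer, so weight matrices whose norms exceed one amplify the gradient exponentially.

import numpy as np

rng = np.random.default_rng(0)
grad = np.ones(10)
for _ in range(50):                                    # 50 "layers"
    W = rng.normal(scale=0.5, size=(10, 10))           # scale chosen so each layer amplifies the gradient
    grad = W.T @ grad                                   # backprop multiplies by the layer's weight matrix
print(np.linalg.norm(grad))                             # gradient norm has grown by many orders of magnitude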
____________________________________________________________________________

QUESTION 9:
Two versions of SGD are implemented as follows:

SGD1: samples data points in the same order in every epoch while constructing mini-batches

SGD2: samples data points in a random order in every epoch to construct mini-batches

Select the correct statement

a. SGD1 is faster than SGD2 and robust to local-minima entrapment
b. SGD2 is faster than SGD1 and robust to local-minima entrapment
c. SGD1 and SGD2 have the same convergence characteristics
d. None of the above

Correct Answer: b
Detailed Solution:

Stochasticity of gradient descent adds noise, which makes it less likely to get attracted
towards local minima. Deterministic gradient descent is more likely to get trapped, as it follows the
same sequence of gradient updates in each epoch.
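A minimal sketch of the difference between the two sampling strategies (the array contents and batch size are arbitrary):

import numpy as np

data = np.arange(12)          # stand-in for 12 training examples
batch_size = 4

def minibatches(epoch, shuffle):
    # SGD1: fixed index order every epoch; SGD2: a fresh permutation each epoch.
    order = np.random.default_rng(epoch).permutation(len(data)) if shuffle else np.arange(len(data))
    return [data[order[i:i + batch_size]] for i in range(0, len(data), batch_size)]

print(minibatches(epoch=0, shuffle=False))   # SGD1: identical batch sequence every epoch
print(minibatches(epoch=1, shuffle=False))
print(minibatches(epoch=0, shuffle=True))    # SGD2: reshuffled each epoch, so the sequence of
print(minibatches(epoch=1, shuffle=True))    # gradient updates differs from epoch to epoch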

______________________________________________________________________________

QUESTION 10:
Choose the correct statement regarding GoogLeNet.

a. Multiple auxiliary classifiers are used at different depth levels to avoid the vanishing-
gradient problem
b. The bottleneck layer reduces the number of learnable weights
c. The Inception module captures information of the image at varying resolutions
d. All of the above

Correct Answer: d

Detailed Solution:

Please refer to your class notes on the GoogLeNet lecture (Lecture 41).
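For option (b), a back-of-the-envelope parameter count (with illustrative channel sizes not taken from the lecture) shows why a 1×1 bottleneck before a larger convolution reduces the number of learnable weights:

def conv_params(c_in, c_out, k):
    # Number of weights in a k x k convolution (biases ignored for simplicity).
    return c_in * c_out * k * k

# A 5x5 branch mapping 192 input channels to 32 output channels, with and without
# a 1x1 bottleneck that first reduces the input to 16 channels.
direct = conv_params(192, 32, 5)
bottleneck = conv_params(192, 16, 1) + conv_params(16, 32, 5)
print(direct, bottleneck)   # 153600 vs 15872: the bottleneck needs roughly 10x fewer weights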

____________________________________________________________________________
