
NPTEL Online Certification Courses

Indian Institute of Technology Kharagpur

Deep Learning
Assignment- Week 9
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10    Total marks: 10 × 1 = 10
______________________________________________________________________________

QUESTION 1:
For the following Figure A and Figure B of the loss landscape, choose the correct statement.
Figure A Figure B

a. Figure A has a small learning rate, Figure B has a high learning rate
b. Figure A has a high learning rate, Figure B has a small learning rate
c. Figure A and Figure B have different loss functions
d. None of the above

Correct Answer: a
Detailed Solution:

Figure A has a small learning rate, which is evident from the slow convergence towards the optimal valley point. Figure B shows highly fluctuating weight updates and therefore has a high learning rate. (Figures taken from the book Dive into Deep Learning.)
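As an illustration only (not part of the assignment), the two regimes can be reproduced with a toy one-dimensional quadratic loss; the learning rates 0.05 and 0.95 below are arbitrary values chosen purely for demonstration:

import numpy as np

def gd_path(lr, steps=15, w0=4.0):
    # Plain gradient descent on the toy loss L(w) = w**2, whose gradient is 2*w.
    w, path = w0, [w0]
    for _ in range(steps):
        w = w - lr * 2 * w
        path.append(w)
    return np.array(path)

print(np.round(gd_path(lr=0.05)[:5], 3))   # small learning rate: slow, monotone descent (Figure A)
print(np.round(gd_path(lr=0.95)[:5], 3))   # large learning rate: iterates overshoot and oscillate (Figure B)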
____________________________________________________________________________

QUESTION 2:
Which of the following problems is primarily solved by the residual connection in ResNet?

a. Vanishing Gradient problem


b. Overfitting
c. Underfitting
d. Exploding gradient

Correct Answer: a
Detailed Solution:

The residual connection, formulated as H(x) = F(x) + x, provides an unattenuated gradient signal from deep layers back to shallow layers, which mitigates the vanishing-gradient problem.
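As an illustration only, a minimal residual block can be sketched in PyTorch as follows (the channel count and the two-convolution structure are illustrative choices, not the exact ResNet configuration from the lecture):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    # Basic residual block: output = relu(F(x) + x), where F is two conv layers.
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        # The identity shortcut lets gradients flow back unattenuated,
        # which is what mitigates the vanishing-gradient problem.
        return F.relu(out + x)

x = torch.randn(1, 16, 32, 32)
print(ResidualBlock(16)(x).shape)   # torch.Size([1, 16, 32, 32])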

____________________________________________________________________________

QUESTION 3:
The following is the equation of the update vector for the momentum optimizer. Which of the
following is true for 𝛾?
𝑉𝑡 = 𝛾𝑉𝑡−1 + 𝜂∇𝜃 𝐽(𝜃)
a. 𝛾 is the momentum term which indicates how much acceleration you want
b. 𝛾 is the step size
c. 𝛾 is the first order moment
d. 𝛾 is the second order moment

Correct Answer: a

Detailed Solution:

A fraction of the update vector from the past time step is added to the current update vector.
𝜸 is that fraction; it indicates how much acceleration you want, and its value lies
between 0 and 1.
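A minimal numerical sketch of this update on a toy quadratic loss, assuming illustrative values γ = 0.9 and η = 0.1:

import numpy as np

def momentum_step(theta, v, grad, gamma=0.9, eta=0.1):
    # v_t = gamma * v_{t-1} + eta * grad(theta); theta is then moved by -v_t.
    v = gamma * v + eta * grad
    return theta - v, v

theta, v = np.array([1.0, 2.0]), np.zeros(2)
for _ in range(3):
    grad = 2 * theta                 # gradient of the toy loss ||theta||^2
    theta, v = momentum_step(theta, v, grad)
print(theta)                         # moves towards the minimum at the origin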

____________________________________________________________________________

QUESTION 4:
Choose the correct option

Statement 1: Stochastic gradient descent is less prone to getting stuck in local minima because
of inherent noise due to minibatch sampling.
Statement 2: Large learning rates with an annealing schedule can be used with a higher mini-batch
size.
a. Statement 1 is True, Statement 2 is True
b. Statement 1 is False, Statement 2 is True
c. Statement 1 is True, Statement 2 is False
d. Statement 1 is False, Statement 2 is False
Correct Answer: a
Detailed Solution:

Stochastic gradient descent does not consider the whole dataset for each update and thus has noisier
updates. Because of this noise, the gradient direction is likely to avoid getting pulled towards local
minima. With a higher mini-batch size, the noise in SGD goes down, so a larger learning rate with an
annealing schedule can be used.
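For illustration only, a simple step-decay annealing schedule of this kind might look like the sketch below (all constants are arbitrary demonstration values):

def annealed_lr(step, lr0=0.5, decay=0.1, every=10):
    # Step-decay annealing: start from a comparatively large lr0 and shrink it by a
    # factor `decay` every `every` steps; larger mini-batches keep gradient noise low
    # enough that the large initial learning rate remains stable.
    return lr0 * (decay ** (step // every))

print([annealed_lr(s) for s in (0, 9, 10, 25, 40)])   # roughly: 0.5, 0.5, 0.05, 0.005, 5e-05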

____________________________________________________________________________

QUESTION 5:
Which of the following is the simplest optimizer, in terms of computational requirements, for dealing
with oscillations and saddle points?

a. Stochastic Gradient Descent (SGD)
b. SGD with Momentum / Nesterov's Accelerated Gradient
c. RMSProp
d. AdaGrad / Adam

Correct Answer: b
Detailed Solution:

Mini-batch gradient descent makes a parameter update after seeing just a subset of
examples, so the direction of the update has some variance and the path taken by mini-batch
gradient descent will "oscillate" toward convergence. Adding momentum reduces
these oscillations and helps the optimizer move through saddle points, where the gradient
vanishes, at negligible extra computational cost.
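A toy sketch of this damping effect on an ill-conditioned quadratic loss (the loss f(x) = 0.1·x1² + 2·x2² and all hyperparameters are demonstration values only):

import numpy as np

grad = lambda x: np.array([0.2 * x[0], 4.0 * x[1]])   # gradient of f(x) = 0.1*x1**2 + 2*x2**2

def run(gamma, lr=0.6, steps=25):
    # gamma = 0 gives plain gradient descent; gamma > 0 adds momentum.
    x, v = np.array([-5.0, -2.0]), np.zeros(2)
    for _ in range(steps):
        v = gamma * v + lr * grad(x)
        x = x - v
    return x

print(run(gamma=0.0))   # lr is too large for the steep x2 direction: the updates blow up
print(run(gamma=0.5))   # momentum damps the oscillation and both coordinates converge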

____________________________________________________________________________

QUESTION 6:
Given the following three figures A, B and C, choose the correct option:

Figure A

Figure B

Figure C

a. Figure A is SGD momentum optimizer with high momentum, Figure B is RMSProp
or AdaGrad, and Figure C is SGD momentum optimizer with low momentum.
b. Figure A is RMSProp or AdaGrad, Figure B is SGD momentum with low
momentum, and Figure C is SGD momentum with high momentum.
c. Figure A is SGD momentum optimizer with low momentum, Figure B is RMSProp
or AdaGrad, and Figure C is SGD momentum optimizer with high momentum.
d. None of the above

Correct Answer: b

Detailed Solution:

RMSProp/AdaGrad show less oscillation along the steep slopes of the contour lines. A low value of
momentum makes the optimizer converge with a high degree of oscillation, while a high value of
momentum dampens the oscillation in high-gradient regions. (Figures taken from the book Dive
into Deep Learning.)
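A minimal sketch of the RMSProp update that produces this behaviour (all hyperparameters are illustrative; AdaGrad differs only in accumulating the squared gradients instead of keeping a decayed running average):

import numpy as np

def rmsprop_step(theta, s, grad, lr=0.01, beta=0.9, eps=1e-8):
    # Keep a running average of squared gradients and divide the step by its square root,
    # so directions with persistently steep gradients get smaller steps, which is why the
    # trajectory oscillates less along the steep contour directions.
    s = beta * s + (1 - beta) * grad ** 2
    theta = theta - lr * grad / (np.sqrt(s) + eps)
    return theta, s

theta, s = np.array([-5.0, -2.0]), np.zeros(2)
for _ in range(5):
    grad = np.array([0.2 * theta[0], 4.0 * theta[1]])   # toy ill-conditioned loss 0.1*x1**2 + 2*x2**2
    theta, s = rmsprop_step(theta, s, grad)
print(theta)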

______________________________________________________________________________

QUESTION 7:
For a function f(θ0,θ1), if θ0 and θ1 are initialized at a global minimum, then what should be the
values of θ0 and θ1 after a single iteration of gradient descent?

a. θ0 and θ1 will update as per gradient descent rule


b. θ0 and θ1 will remain same
c. Depends on the values of θ0 and θ1
d. Depends on the learning rate

Correct Answer: b
Detailed Solution:
At a minimum (global or local), the gradient is zero, so a gradient-descent update leaves the
parameters unchanged.
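A tiny numerical check on a toy loss f(θ0, θ1) = θ0² + θ1², whose global minimum is at (0, 0):

def gd_step(theta0, theta1, lr=0.1):
    # One gradient-descent step; the partial derivatives of the toy loss are 2*theta0 and 2*theta1.
    g0, g1 = 2 * theta0, 2 * theta1
    return theta0 - lr * g0, theta1 - lr * g1

print(gd_step(0.0, 0.0))   # (0.0, 0.0): the gradient is zero, so the parameters do not move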
______________________________________________________________________________

QUESTION 8:
What can be one of the practical problems of exploding gradient?
a. Too large update of weight values leading to unstable network
b. Too small update of weight values inhibiting the network to learn
c. Too large update of weight values leading to faster convergence

d. Too small update of weight values leading to slower convergence

Correct Answer: a
Detailed Solution:
Exploding gradients are a problem where large error gradients accumulate and result in very
large updates to the neural network's weights during training. This makes the model
unstable and unable to learn from the training data.
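A small numerical sketch of how this blow-up arises (layer width, depth and weight scale are arbitrary demonstration values): backpropagation through a deep chain multiplies the gradient by one weight matrix per layer, so weight matrices whose norms exceed one amplify the gradient exponentially.

import numpy as np

rng = np.random.default_rng(0)
grad = np.ones(10)
for _ in range(50):                                    # 50 "layers"
    W = rng.normal(scale=0.5, size=(10, 10))           # scale chosen so each layer amplifies the gradient
    grad = W.T @ grad                                   # backprop multiplies by the layer's weight matrix
print(np.linalg.norm(grad))                             # gradient norm has grown by many orders of magnitude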
____________________________________________________________________________

QUESTION 9:
Two versions of SGD are implemented as follows:

SGD1: samples data points in the same order in every epoch while constructing mini-batches

SGD2: samples data points in a random order in every epoch to construct mini-batches

Select the correct statement

a. SGD1 is faster than SGD2 and robust to local-minima entrapment
b. SGD2 is faster than SGD1 and robust to local-minima entrapment
c. SGD1 and SGD2 have the same convergence characteristics
d. None of the above

Correct Answer: b
Detailed Solution:

Stochasticity of gradient descent adds noise, which makes it less likely to get attracted
towards local minima. Deterministic gradient descent is more likely to get trapped, as it follows the
same sequence of gradient updates in each epoch.
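A minimal sketch of the difference between the two sampling strategies (the array contents and batch size are arbitrary):

import numpy as np

data = np.arange(12)          # stand-in for 12 training examples
batch_size = 4

def minibatches(epoch, shuffle):
    # SGD1: fixed index order every epoch; SGD2: a fresh permutation each epoch.
    order = np.random.default_rng(epoch).permutation(len(data)) if shuffle else np.arange(len(data))
    return [data[order[i:i + batch_size]] for i in range(0, len(data), batch_size)]

print(minibatches(epoch=0, shuffle=False))   # SGD1: identical batch sequence every epoch
print(minibatches(epoch=1, shuffle=False))
print(minibatches(epoch=0, shuffle=True))    # SGD2: reshuffled each epoch, so the sequence of
print(minibatches(epoch=1, shuffle=True))    # gradient updates differs from epoch to epoch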

______________________________________________________________________________

QUESTION 10:
Choose the correct statement regarding GoogLeNet.

a. Multiple auxiliary classifiers are used at different depth levels to avoid the vanishing-
gradient problem
b. The bottleneck layer reduces the number of learnable weights
c. The Inception module captures information of the image at varying resolutions
d. All of the above

Correct Answer: d

Detailed Solution:

Please refer to your class notes on the GoogLeNet lecture (Lecture 41).
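For option (b), a back-of-the-envelope parameter count (with illustrative channel sizes not taken from the lecture) shows why a 1×1 bottleneck before a larger convolution reduces the number of learnable weights:

def conv_params(c_in, c_out, k):
    # Number of weights in a k x k convolution (biases ignored for simplicity).
    return c_in * c_out * k * k

# A 5x5 branch mapping 192 input channels to 32 output channels, with and without
# a 1x1 bottleneck that first reduces the input to 16 channels.
direct = conv_params(192, 32, 5)
bottleneck = conv_params(192, 16, 1) + conv_params(16, 32, 5)
print(direct, bottleneck)   # 153600 vs 15872: the bottleneck needs roughly 10x fewer weights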

____________________________________________________________________________
