Deep Learning (CS 590) MID SEMESTER EXAM, (July - November 2021) 25th SEPTEMBER (9-11AM)

Answer all questions. Time: 120 minutes Full Marks: 50

Points: 25/50
Correct 1/1 Points
In the backpropagation algorithm for learning a multilayer perceptron:
Error is calculated at the output layer. 
The error gradient is propagated through each layer from output to input. 
Error is calculated at every hidden layer.
The error gradient is propagated through each layer from input to output.
Correct 1/1 Points
You have an input volume that is 63x63x16, and convolve it with 32 filters
that are each 7x7, using a stride of 2 and no padding. What is the output
volume?
16x16x32
29x29x32 
29x29x16
56x56x32
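The arithmetic behind this question can be checked with a short sketch. It uses the standard output-size formula floor((n + 2p - f) / s) + 1; the helper name `conv_output_size` is made up for illustration and is not part of the exam.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Spatial output size of a convolution: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * padding - f) // stride + 1


# 63x63x16 input, 32 filters of size 7x7, stride 2, no padding.
side = conv_output_size(63, 7, stride=2, padding=0)
num_filters = 32  # the output depth equals the number of filters
output_volume = (side, side, num_filters)
print(output_volume)  # (29, 29, 32)
```

The input depth (16) only sets the depth of each filter; it does not appear in the output shape.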
Correct 1/1 Points
CNNs, unlike fully-connected neural networks, have neurons that are only
“sparsely connected”. What does it imply?
Each activation in layer (k+1) depends on a small number of activations from layer
k. 
Each layer is connected to only two other layers
Regularization makes gradient descent set many parameters to zero
Each filter works on only the depth slices from the previous layer
Correct 1/1 Points
Suppose your input is a 30 by 30 color (RGB) image, and you are not using a
convolutional network. If the first hidden layer has 10 neurons, each one fully
connected to the input, how many parameters does this hidden layer have
(including the bias parameters)?
900
2700
27000
27010 
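A minimal sketch of the parameter count, assuming the image is flattened into a single vector before the fully connected layer (variable names are illustrative only):

```python
# A 30x30 RGB image flattened into one input vector.
inputs = 30 * 30 * 3          # 2700 input values
neurons = 10

weights = inputs * neurons    # one weight per (input, neuron) pair
biases = neurons              # one bias per neuron
total_params = weights + biases
print(total_params)  # 27010
```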
Incorrect 0/1 Points
If you increase the number of hidden layers in a Multi-Layer Perceptron, the classification error of test data always decreases.
TRUE
FALSE 
Can't say
Correct 1/1 Points
What if we use a learning rate that’s too large?
Network will converge.
Network may not converge. 

Can’t Say.
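As a toy illustration (not part of the exam), plain gradient descent on f(x) = x**2 shows both behaviours. The function, the step counts, and the two learning rates below are assumptions chosen only to make the effect visible.

```python
def gradient_descent(lr, x0=1.0, steps=20):
    """Run plain gradient descent on f(x) = x**2 and return the final x."""
    x = x0
    for _ in range(steps):
        grad = 2.0 * x       # derivative of x**2
        x = x - lr * grad    # each step multiplies x by (1 - 2 * lr)
    return x


small = gradient_descent(lr=0.1)   # |1 - 2*lr| = 0.8 < 1, so x shrinks towards 0
large = gradient_descent(lr=1.1)   # |1 - 2*lr| = 1.2 > 1, so |x| grows every step
print(abs(small) < 1.0, abs(large) > 1.0)
```

With the larger rate each update overshoots the minimum by more than it corrects, so the iterates move away from the optimum instead of settling.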
Correct 1/1 Points
Which of the following gives non-linearity to a neural network?
Batch Normalization.
Hyperbolic Tangent. 
Convolution operator with stride > 1 only.
None of the above.
Correct 1/1 Points
Given an input feature f to the Batch Normalization (BN) layer, what happens if the gamma and beta parameters of the BN layer learn sqrt(variance(f)) and mean(f), respectively?
An identity mapping has been learnt. 

BN has reached a collapse mode.
Gamma, beta can never learn these values.
Can’t say anything.
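The identity-mapping case can be checked numerically. The sketch below assumes the usual BN form gamma * (f - mean) / sqrt(var + eps) + beta over a single feature; the sample values and the `eps` value are arbitrary assumptions.

```python
import numpy as np

f = np.array([4.0, 0.0, 2.0, 9.0, 1.0, 7.0])  # arbitrary sample values for one feature
eps = 1e-5                                     # small stabiliser (assumed value)

mean, var = f.mean(), f.var()
f_hat = (f - mean) / np.sqrt(var + eps)        # normalisation step

gamma, beta = np.sqrt(var), mean               # the values named in the question
out = gamma * f_hat + beta                     # scale-and-shift step

print(np.allclose(out, f, atol=1e-4))          # the layer reproduces its input up to eps
```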
Correct 1/1 Points
Choose the appropriate option(s): In what cases would zero padding help us while constructing a CNN?
Add an additional border of zeros around the output of a convolutional layer as a down-sampling operation.
Add an additional border of zeros around the output of a convolutional layer such that it
retains the input size.
Add an additional border of zeros around the input such that input size can be
retained. 
Filter out unnecessary information from the input boundaries.
Correct 1/1 Points
10
If all the weights are set to zero instead of random initializations in NN for a
classification task, what can be an expected behaviour?
No problem. The NN will train properly.
The NN will train. However, all the neurons will end up recognizing the same thing. 
The NN will not train.
None of these.
11
What is the use of the regularization parameter while performing a regularized linear regression?
Until some point, increasing it reduces the variance of the model significantly without significant addition of bias to the model. 
It reduces the bias in the model and hence reduces overfitting.
Controls the trade-off between the need for the model to fit the training set well and also
have a large number of model parameters.
Helps to find the exact decision boundary regardless of its complexity.
12
Consider a particular hidden layer of a neural network having three neurons. Suppose the activations of these neurons for three batches of input are [4,0,2], [3,9,1] and [4,4,7], respectively. Batch normalization is applied to these activations. The value of the mean would be:
[2, 4.33, 5]
[3.66, 4.33, 3.33] 

[0, 0, 0]
[3.77, 3.77, 3.77]
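A quick check of the per-neuron mean, assuming each listed vector holds the three neurons' activations for one input, so that the mean is taken down each column:

```python
import numpy as np

# Rows: the three inputs; columns: the three neurons of the hidden layer.
activations = np.array([[4, 0, 2],
                        [3, 9, 1],
                        [4, 4, 7]], dtype=float)

batch_mean = activations.mean(axis=0)   # one mean per neuron
print(np.round(batch_mean, 2))          # [3.67 4.33 3.33]
```

The first entry is 11/3 = 3.666..., which the listed option writes as 3.66 (truncated rather than rounded).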
13
Why is ReLU preferred over sigmoid as an activation function in deep learning?
Slower but guaranteed convergence
Computationally efficient 
No saturation of outputs in positive region 
Can’t say
Correct 1/1 Points
14
Batch Normalization is useful because
It normalizes (changes) all the input before sending it to the next layer. 
It returns back the normalized mean and standard deviation of weights.
It is a very efficient backpropagation technique.
None of these.
Correct 1/1 Points
15
Among the following steps in regression modelling, which one(s) impact the trade-off between overfitting and underfitting the most?
The polynomial degree.


Whether to learn the weights by matrix inversion or gradient descent.
The constant-term.
Correct 1/1 Points
16
The neural network given above takes two binary-valued inputs x1, x2, and the activation function is the binary threshold function (h(x)=1 if x>0, otherwise 0). Which of the following logical functions does it compute?
OR
AND
NOR
NAND 
17
Neural networks
Optimize only a convex cost function.
Can be used for regression as well as classification. 
Always output values between 0 and 1.
Can be used in an ensemble


18
Which of the following are true for univariate linear regression?
Changing the input variable by 1 unit always affects the output by 1 unit too.
Since it is univariate, we need to estimate one coefficient for modelling the data.
The decision boundary is a property of the data set given to us.
None of the above. 
Correct 1/1 Points
19
Any logical function over binary-valued (0 or 1) inputs x1 and x2 can be (approximately) represented using some neural network.
TRUE 
FALSE
can't say
20
Which of these hold true for SVMs ?
Moving only the support vectors around affects the separating hyperplane as well. 
Can be used for both classification and regression. 
The parameter C in the cost function for SVM controls the trade-off between misclassification and regularization.

Sensitive to noise. 
Correct 1/1 Points
21
Deep Neural Networks with Stochastic Depth bypass a subset of layers during training using
Linear function.
Batch Normalization layer.
Identity function. 
Self-Attention Module.
22
Consider a hypothetical CNN-based model for an arbitrary vision task. The weights of such a model can be initialized with
Ones 
Zeros 
Sparse matrix where non-zero elements are extracted from a normal distribution
with mean = 0, std-dev = 0.01. 
Using Dirac Delta Function 

Only a, b
Correct 1/1 Points
23
Is backpropagation learning based on gradient descent along the error surface?
NO
YES 
It depends on gradient descent but not error surface.
Can't Say
Correct 1/1 Points
24
Assertion (P): The path taken by Stochastic Gradient Descent (SGD) towards the minimum always has low variance and faster convergence compared to Batch gradient descent.
Reason (R): In each iteration of SGD, random samples are picked, based on which gradients are calculated.
Both P and R are true and R is the correct explanation of P.
Both P and R are true but R is not the correct explanation of P.
Both P and R are false.
P is false but R is true. 

P is true but R is false.
25
Let’s say, you are using activation function X in hidden layers of neural
network. At a particular neuron for any given input, you get the output as
“-0.0001”. Which of the following activation function(s) could X represent?
ReLU
TanH 
Leaky ReLU with α = -0.01
Sigmoid
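One way to reason about this question is to ask which of the listed functions can produce a negative output at all. The sketch below probes each function at a few sample inputs; the helper names and the probe points are assumptions for illustration, and the Leaky ReLU slope is the α = -0.01 given in the option.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def leaky_relu(z, alpha):
    return z if z >= 0 else alpha * z


activations = {
    "relu": relu,
    "tanh": math.tanh,
    "leaky_relu(alpha=-0.01)": lambda z: leaky_relu(z, alpha=-0.01),
    "sigmoid": sigmoid,
}

samples = [-5.0, -0.0001, 0.0, 5.0]   # a few probe inputs
can_be_negative = {
    name: any(fn(z) < 0 for z in samples) for name, fn in activations.items()
}
print(can_be_negative)

# tanh reaches the target value: invert it to find the pre-activation.
z_for_target = math.atanh(-0.0001)
print(math.tanh(z_for_target))
```

ReLU and sigmoid are non-negative by definition, and a negative slope α flips negative inputs to positive outputs, so among the four only tanh can return -0.0001.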
26
Which of the following is NOT a reason for using batch normalization?
Prevent vanishing/exploding gradients.
Prevent covariate shift.
Faster convergence.
Faster inference time. 

Prevent overfitting.
27
Which of the following are true with respect to parameter sharing in CNNs?
Reduces overfitting 
Allows gradient descent to set many parameters to zero, making sparse connections between neurons of the CNN layers
While training a CNN for image classification on a dataset of animals, instead of initializing
the network with random weights, use weights from a pre-trained CNN trained on
ImageNet. These shared weights (parameters) help in training convergence.
Allows one feature detector to be used in multiple local regions in the input image 
None of the above
28
Assertion (A): An implementation of VGG-100 or VGG-256 would be disastrous.
Reason (R): During backpropagation, first the gradient of the loss is calculated with respect to the weights and then it is propagated backwards along the network. Hence, for so many layers, it would run into the vanishing gradient problem.
Both A and R are true and R is the correct explanation of A.
Both A and R are true but R is not the correct explanation of A.
Both A and R are false.
A is false but R is true.
A is true but R is false. 
29
In the context of deep CNNs, generally, we aim to achieve the Bayes error
instead of absolute zero error, because of
Shortage of training data for some tasks. 
The available training input samples may not have full information about target
samples. 
Zero error will have negative infinite gradients.
Once zero error is achieved, the system may enter into an uncertainty state.
Correct 1/1 Points
30
Input: 57x57x3
Current layer (Max pool layer): 3x3 filters applied at stride 2
Output volume: 28x28x3
What is the number of parameters in this layer?
2352
9747
0 
None
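A short sketch of the arithmetic for this layer, using floor((n - f) / s) + 1 for the spatial size; the helper name `pool_output_size` is hypothetical.

```python
def pool_output_size(n, f, stride):
    """Spatial output size of a pooling window with no padding."""
    return (n - f) // stride + 1


side = pool_output_size(57, 3, stride=2)
channels = 3                  # pooling acts on each channel independently
output_volume = (side, side, channels)

# Max pooling only takes the maximum over each window; it has no weights or biases.
learnable_params = 0
print(output_volume, learnable_params)  # (28, 28, 3) 0
```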
Correct 1/1 Points
31
Let x,y,z be three variables taking binary values. Define f(x,y,z) = x’y’z + x’yz’ + xy’z’ + xyz, then the data set is linearly separable.
TRUE
FALSE 
Can't say
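The function in this question is 1 exactly when an odd number of inputs are 1 (three-input parity). The brute-force sketch below is an illustration rather than a proof: it searches a small grid of integer weights and biases for a single threshold unit that reproduces a target function. The grid range and helper names are assumptions.

```python
from itertools import product


def parity(x, y, z):
    # Same truth table as x'y'z + x'yz' + xy'z' + xyz.
    return (x + y + z) % 2


def logical_and(x, y, z):
    return int(x and y and z)


def separable_on_grid(target, values=range(-4, 5)):
    """True if some (w1, w2, w3, b) on the grid separates the 8 binary points."""
    points = list(product([0, 1], repeat=3))
    for w1, w2, w3, b in product(values, repeat=4):
        if all((w1 * x + w2 * y + w3 * z + b > 0) == bool(target(x, y, z))
               for x, y, z in points):
            return True
    return False


print(separable_on_grid(logical_and))  # True
print(separable_on_grid(parity))       # False
```

AND is found immediately (for example w = (1, 1, 1), b = -2), while no setting on the grid works for parity, which is consistent with parity not being linearly separable.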
Correct 1/1 Points
32
Which of the following statements is/are true?
The deeper layers of a neural network are typically computing more complex features of the input than the earlier layers. 
The earlier layers of a neural network are typically computing more complex features of
the input than the deeper layers.
Most number of parameters in the CNN are usually present in the fully connected
layers 
Every neuron in a convolution layer looks for patterns in different regions of the input
Correct 1/1 Points
33
Select all the options that help to alleviate the vanishing gradient problem:
Build the model in a layer-by-layer fashion using unsupervised learning 

Using ReLU instead of Sigmoid 
Using Sigmoid Activation function
None of the above
Correct 1/1 Points
34
Which of the following is(are) hyperparameter(s)?
Number of iterations 
Bias vectors
Number of hidden layers 

Weight matrices
Learning rate 
Number of neurons in each hidden layer 
35
Stack of five 3x3 conv (stride 1) layers has same effective receptive field as
.............. layer ?
One 7x7
One 11x11 
One 13x13
None
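The effective receptive field of stacked stride-1 convolutions grows by (k - 1) per layer. A minimal sketch, with a hypothetical helper name:

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Effective receptive field of `num_layers` stacked stride-1 convolutions."""
    rf = 1
    for _ in range(num_layers):
        rf += kernel - 1   # each stride-1 layer widens the field by (kernel - 1)
    return rf


print(stacked_receptive_field(5))  # 11
```

So five 3x3 layers see the same 11x11 input window as a single 11x11 filter.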
Correct 1/1 Points
36
Which statement(s) is/are true?
Overfitting implies too simple model architecture
Underfitting implies complex model architecture.
Overfitting implies too complex model architecture 
Overfitting and underfitting have no relation with model complexity.
37
What is the sequence of the following tasks in a perceptron?
1. For a sample input, compute an output
2. Go to the next batch of dataset
3. Initialize weights of perceptron randomly
4. If the prediction does not match the output, change the weights
1234
3412
3142 
3214
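The four tasks can be placed in a small training loop to see their order. This is a sketch under assumptions: the AND data set, the learning rate, the seed, and the function name `train_perceptron` are illustrative and not taken from the exam, and each "batch" here is a single sample.

```python
import numpy as np


def train_perceptron(X, y, lr=1.0, max_epochs=1000, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])   # task 3: initialise weights randomly
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, target in zip(X, y):              # task 2: move on to the next sample
            pred = 1 if xi @ w + b > 0 else 0     # task 1: compute an output
            if pred != target:                    # task 4: on a mismatch, change the weights
                w += lr * (target - pred) * xi
                b += lr * (target - pred)
                mistakes += 1
        if mistakes == 0:                         # stop once a full pass has no errors
            break
    return w, b


# Logical AND is linearly separable, so the perceptron rule converges on it.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
preds = [1 if xi @ w + b > 0 else 0 for xi in X]
print(preds)
```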
38
Why does VGG-Net use a stack of small 3x3 filters instead of a single, high-
dimensional kxk filter?
To force a regularization on the k x k filter 

To make the decision function more discriminative 
To facilitate parameter sharing
None of the above
39
Which of the following are true for a pooling layer when used in a
Convolutional Neural Network:
Upsampling the input.
If the object is spatially translated, it fails to help the CNN detect its class.
Reduces network parameters. 

Induces trainable parameters in the CNN to help it learn a better summarization of the
input.
40
Assertion: Consider an arbitrary high-level vision task for which a CNN-based model is proposed. The proposed model has 15 blocks, where each block consists of 4 layers, namely 2 Convolution layers with different receptive fields, Batch Normalization, and ReLU activation, except for the first two blocks. The first block consists of only one Convolution layer, whereas the second block consists of a Convolution layer followed by ReLU activation. A researcher claimed that, instead of the first two blocks, the authors may use only the second block.
Reason: A series of two convolution layers with no activation function and nothing else in between is equivalent to a single convolution.
Which of the following is TRUE ?
Assertion is TRUE, but Reason is False.
Both Assertion, Reason are TRUE, but Reason is NOT proper explanation of
Assertion. 
Both Assertion, Reason are TRUE, and Reason is proper explanation of Assertion.
Both Assertion, Reason are FALSE.
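The Reason can be checked numerically in a simplified setting. The sketch below assumes 1-D, single-channel signals without bias and uses `np.convolve` (true convolution, which flips the kernel, unlike the cross-correlation used by most deep learning libraries); the linearity argument is the same in both cases.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=32)    # arbitrary 1-D input signal
k1 = rng.normal(size=3)    # first kernel
k2 = rng.normal(size=5)    # second kernel (different receptive field)

# Two convolutions applied back to back, with nothing in between.
two_step = np.convolve(np.convolve(x, k1, mode="full"), k2, mode="full")

# One convolution with the merged kernel (size 3 + 5 - 1 = 7).
merged_kernel = np.convolve(k1, k2, mode="full")
one_step = np.convolve(x, merged_kernel, mode="full")

print(np.allclose(two_step, one_step))  # True
```

Because convolution is linear and associative, the stacked pair adds no expressive power over one larger kernel unless a nonlinearity sits between the two layers.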
41
Regarding bias and variance, which of the following statements are true?
Models which overfit have a high bias.
Models which overfit have a low bias. 

Models which underfit have a high variance.
Models which underfit have a low variance. 
42
What steps can we take to prevent overfitting in a Neural Network?
Data Augmentation. 
Weight Sharing. 
Remove Batch-Norm from first few layers of the network.
None of the above.
43
As a fan of the Star Wars movie franchise, you decide to build a system for tracking the movements of Master Yoda by natural language commands, as shown in the figure below:
Your system internally uses a VGG-16, incorporating tanh activation for all hidden units. You initialize the weights to relatively large random values, and you are using gradient descent as an optimizer. What will happen?
Gradients become large, and to prevent divergence you have to slow down the learning
rate – hence convergence is slow
Hidden units become highly activated and convergence is faster as weights are high from
the beginning itself
Gradients always become constant after a few convolutional layers
As long as weights are randomly initialized, gradient descent is not affected by small or
large values of initialized weights
Gradients become close to zero, and convergence is slow 
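The saturation effect behind this question can be illustrated on a single fully connected tanh layer. This is a simplified stand-in for the VGG-16 setting, and the layer sizes, weight scales, and seed are assumptions. The local gradient of tanh is 1 - tanh(z)**2, which approaches zero when |z| is large.

```python
import numpy as np


def mean_tanh_grad(weight_scale, n_in=256, n_units=256, seed=0):
    """Average local gradient of tanh units for a given weight scale."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_in)                                  # random input vector
    W = rng.normal(scale=weight_scale, size=(n_units, n_in))   # random weights
    z = W @ x                                                  # pre-activations
    return float(np.mean(1.0 - np.tanh(z) ** 2))               # d tanh(z) / dz


small_init = mean_tanh_grad(weight_scale=0.01)  # units stay in the near-linear region
large_init = mean_tanh_grad(weight_scale=1.0)   # most units saturate near -1 or +1
print(small_init, large_init)
```

With large initial weights most units sit in the flat tails of tanh, so the gradients passed backwards are close to zero and learning is slow.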
Correct 1/1 Points
44
In Stochastic gradient descent, how many training samples are used before
updating the weights?
Depends on the number of training samples.
Depends on the number of test samples.
One. 
None of the above.
45
Why is it required to choose different learning rates for different weights?
This would help to reach the optimum point faster. 
To avoid the problem of a diminishing learning rate.
To avoid overshooting the optimum point.
All of the above
46
In neural networks, nonlinear activation functions such as sigmoid, tanh, and ReLU
Speed up the gradient calculation in backpropagation, as compared to linear units.
Help to learn nonlinear decision boundaries. 

Are applied only to the output units.
Always output values between 0 and 1.

47
In the context of deep CNN, for a given arbitrary low-level vision task, a
CNN-based model is proposed. The model has a fixed kernel size
throughout. The authors claim that “increasing the kernel size will result in
significant performance gain”. What can you say about this claim?
TRUE, but only for low-level vision tasks.
FALSE, but only for low-level vision tasks.
TRUE.
FALSE. 
Correct 1/1 Points
48
What will this filter do when convolved with a grayscale image?
Detect vertical edges
Detect horizontal edges
Sharpen the image 
Blur the image
49
You are training multilayer neural networks using Batch Gradient Descent (BGD). You notice that the training error is going down and converges to a local minimum. Then when you test on the new data, the test error is abnormally high. What is probably going wrong and what would you do?
The training data size is not large enough. Collect a larger training set and retrain. 
Tune the learning rate and add a regularization term to the objective function. 
Use the same training data but add a few more hidden layers.
Do Stochastic Gradient Descent instead of Batch Gradient Descent.
Correct 1/1 Points
50
In general, which of the following method(s) is/are used for predicting a continuous dependent variable?
1. Linear Regression 2. Logistic Regression
1 and 2.
Only 1. 
Only 2.
None of these.