
09/09/2022, 21:22 Deep Learning (CS 590) MID SEMESTER EXAM, (July - November 2021), 25th SEPTEMBER (9-11AM)



Answer all questions. Time: 120 minutes. Full Marks: 50.



Points: 25/50

Correct 1/1 Points

1

In the backpropagation algorithm for learning a multilayer perceptron:

Error is calculated at the output layer.

The error gradient is propagated through each layer, from output to input.

Error is calculated at every hidden layer.

The error gradient is propagated through each layer, from input to output.

Correct 1/1 Points

https://forms.office.com/Pages/ResponsePage.aspx?id=jacKheGUxkuc84wRtTBwHCTd1h3zvShFq0ZkcCDdWudUOVNJUlUzWFZESUtSQlRX… 1/23

2

You have an input volume that is 63x63x16, and convolve it with 32 filters that are each 7x7, using a stride of 2 and no padding. What is the output volume?

16x16x32

29x29x32 

29x29x16

56x56x32
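The output volume can be checked with the usual per-axis formula floor((n + 2p - f) / s) + 1, where n is the input width, f the filter size, p the padding and s the stride; the output depth equals the number of filters. A minimal Python sketch (the helper name `conv_output_size` is illustrative, not from the exam):

```python
def conv_output_size(n: int, f: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size along one axis for a convolution or pooling window."""
    return (n + 2 * padding - f) // stride + 1

# 63x63x16 input, 32 filters of size 7x7 (each spanning all 16 channels), stride 2, no padding
side = conv_output_size(63, 7, stride=2)
num_filters = 32
print(f"{side}x{side}x{num_filters}")  # 29x29x32
```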

Correct 1/1 Points

3

CNNs, unlike fully-connected neural networks, have neurons that are only “sparsely connected”. What does this imply?

Each activation in layer (k+1) depends on a small number of activations from layer k.

Each layer is connected to only two other layers

Regularization makes gradient descent set many parameters to zero 

Each filter works on only the depth slices from the previous layer

Correct 1/1 Points

4

Suppose your input is a 30 by 30 color (RGB) image, and you are not using a convolutional network. If the first hidden layer has 10 neurons, each one fully connected to the input, how many parameters does this hidden layer have (including the bias parameters)?

900

2700


27000

27010 
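Counting the parameters of a fully connected layer: each of the 10 neurons has one weight per input value (30 x 30 x 3 = 2700 values) plus one bias. A small sketch with a hypothetical helper `dense_param_count`:

```python
def dense_param_count(num_inputs: int, num_units: int) -> int:
    """One weight per input per unit, plus one bias per unit."""
    return num_inputs * num_units + num_units

inputs = 30 * 30 * 3  # flattened 30x30 RGB image
print(dense_param_count(inputs, 10))  # 27010
```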

Incorrect 0/1 Points

5

If you increase the number of hidden layers in a Multi-Layer Perceptron, the classification error on test data always decreases.

TRUE

FALSE

Can't say

Correct 1/1 Points

6

What happens if we use a learning rate that is too large?

Network will converge.

Network may not converge. 


Can’t Say.
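One way to see why a too-large learning rate may not converge: on the toy objective f(x) = x², each gradient step multiplies x by (1 - 2·lr), so the iterates shrink only when |1 - 2·lr| < 1. A sketch assuming this quadratic objective and illustrative learning rates of 0.1 and 1.1:

```python
def gradient_descent(lr, steps=20, x0=1.0):
    """Plain gradient descent on f(x) = x**2, whose gradient is 2x and minimum is x = 0."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # equivalent to x *= (1 - 2 * lr)
    return x

print(abs(gradient_descent(lr=0.1)))  # factor 0.8 per step: shrinks toward 0
print(abs(gradient_descent(lr=1.1)))  # factor -1.2 per step: |x| grows, no convergence
```

The same step-size condition depends on the curvature of the loss, which is why a rate that works for one problem can diverge on another.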

Correct 1/1 Points

7

Which of the following gives non-linearity to a neural network?

Batch Normalization.

Hyperbolic Tangent. 


Convolution operator with stride > 1 only.

None of the above.

Correct 1/1 Points

8

Given an input feature f to the Batch Normalization (BN) layer, what happens if the gamma and beta parameters of the BN layer learn sqrt(variance(f)) and mean(f), respectively?

An identity mapping has been learnt. 


BN has reached a collapse mode.

Gamma, beta can never learn these values.

Can’t say anything.
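The identity-mapping case can be checked directly: BN computes gamma · (f - mean) / sqrt(var + eps) + beta, so setting gamma = sqrt(var) and beta = mean returns f, up to the small eps term. A sketch on a toy batch of one feature; the eps value of 1e-5 is an assumed default:

```python
import math

def batch_stats(f):
    mean = sum(f) / len(f)
    var = sum((v - mean) ** 2 for v in f) / len(f)  # biased variance, as BN uses during training
    return mean, var

def batch_norm(f, gamma, beta, eps=1e-5):
    """Normalise a batch of one feature, then scale by gamma and shift by beta."""
    mean, var = batch_stats(f)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in f]

f = [4.0, 3.0, 4.0, 1.0]  # toy activations of one feature over a batch
mean, var = batch_stats(f)
out = batch_norm(f, gamma=math.sqrt(var), beta=mean)
print([round(v, 4) for v in out])  # [4.0, 3.0, 4.0, 1.0]
```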

Correct 1/1 Points

9

Choose the appropriate option(s): In which cases would zero padding help us when constructing a CNN?

Add an additional border of zeros around the output of a convolutional layer as a downsampling operation.

Add an additional border of zeros around the output of a convolutional layer such that it
retains the input size.

Add an additional border of zeros around the input such that input size can be
retained. 

Filter out unnecessary information from the input boundaries.
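To illustrate the size-preserving case: a 3x3 convolution with stride 1 shrinks each side by 2, but padding the input with (f - 1) / 2 = 1 ring of zeros keeps the original size. A pure-Python sketch with hypothetical helpers `zero_pad` and `conv2d_valid` on a toy 4x4 input:

```python
def zero_pad(img, p):
    """Surround a 2-D list with p rows and p columns of zeros on every side."""
    w = len(img[0]) + 2 * p
    padded = [[0] * w for _ in range(p)]
    padded += [[0] * p + row + [0] * p for row in img]
    padded += [[0] * w for _ in range(p)]
    return padded

def conv2d_valid(img, kernel):
    """'Valid' 2-D cross-correlation with stride 1 (no padding applied here)."""
    k = len(kernel)
    out_h = len(img) - k + 1
    out_w = len(img[0]) - k + 1
    return [[sum(img[i + a][j + b] * kernel[a][b] for a in range(k) for b in range(k))
             for j in range(out_w)] for i in range(out_h)]

img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
box = [[1] * 3 for _ in range(3)]                      # 3x3 kernel, so f = 3
no_pad = conv2d_valid(img, box)                        # 4x4 input -> 2x2 output
same = conv2d_valid(zero_pad(img, (3 - 1) // 2), box)  # p = 1 keeps the output at 4x4
print(len(no_pad), len(same))  # 2 4
```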

Correct 1/1 Points


10

If all the weights of a neural network (NN) for a classification task are set to zero instead of being randomly initialized, what is the expected behaviour?

No problem. The NN will train properly.

The NN will train. However, all the neurons will end up recognizing the same thing.

The NN will not train.

None of these.
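A small experiment showing the symmetry problem, under stated assumptions: a hypothetical 2-input network with three sigmoid hidden units, a linear output and squared error, trained on the XOR truth table with every weight and bias starting at zero. Because every hidden unit starts with identical weights, each receives identical gradients, so the units remain copies of each other after training:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_zero_init(data, hidden=3, lr=0.5, epochs=50):
    """Tiny 2-input MLP (sigmoid hidden layer, linear output, squared error), all weights zero."""
    W = [[0.0, 0.0] for _ in range(hidden)]  # input -> hidden weights
    b = [0.0] * hidden                       # hidden biases
    v = [0.0] * hidden                       # hidden -> output weights
    c = 0.0                                  # output bias
    for _ in range(epochs):
        for x, t in data:
            h = [sigmoid(W[j][0] * x[0] + W[j][1] * x[1] + b[j]) for j in range(hidden)]
            y = sum(v[j] * h[j] for j in range(hidden)) + c
            err = y - t  # derivative of 0.5 * (y - t) ** 2 with respect to y
            for j in range(hidden):
                delta = err * v[j] * h[j] * (1.0 - h[j])  # gradient reaching hidden unit j
                v[j] -= lr * err * h[j]
                W[j][0] -= lr * delta * x[0]
                W[j][1] -= lr * delta * x[1]
                b[j] -= lr * delta
            c -= lr * err
    return W, b, v

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
W, b, v = train_zero_init(data)
print(W)  # the three rows are identical: the hidden units never differentiate
```

With ReLU hidden units the picture is worse: relu(0) = 0 and the hidden-to-output weights stay at 0, so only the output bias ever changes.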

Incorrect 0/1 Points

11

What is the use of the regularization parameter when performing regularized linear regression?

Up to a point, increasing it significantly reduces the variance of the model without adding significant bias to the model.

It reduces the bias in the model and hence reduces overfitting.

Controls the trade-off between the need for the model to fit the training set well and also
have a large number of model parameters.

Helps to find the exact decision boundary regardless of its complexity.
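A one-parameter illustration of what the regularization parameter does, assuming an L2 (ridge) penalty λw², which the question does not specify: minimising Σ(y - wx)² + λw² gives w = Σxy / (Σx² + λ), so a larger λ shrinks the fitted weight toward 0, trading lower variance for some added bias. The data below are made up:

```python
def ridge_slope(xs, ys, lam):
    """Minimiser of sum((y - w*x)**2) + lam * w**2 for a line through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0, 4.0]  # toy data, roughly y = x
ys = [1.1, 1.9, 3.2, 3.9]
lams = [0.0, 1.0, 10.0, 100.0]
slopes = [ridge_slope(xs, ys, lam) for lam in lams]
for lam, w in zip(lams, slopes):
    print(lam, round(w, 3))  # the fitted slope shrinks toward 0 as lam grows
```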

Incorrect 0/1 Points

12

Consider a particular hidden layer of a neural network having three neurons. Let the activations of these neurons for three batches of input be [4,0,2], [3,9,1] and [4,4,7], respectively. Batch normalization is applied to these activations. The value of the mean would be:

[2, 4.33, 5]

[3.66, 4.33, 3.33] 



[0, 0, 0]

[3.77, 3.77, 3.77]
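Batch normalization computes its statistics per neuron across the batch, so the mean is taken column-wise (one column per neuron), treating the three activation vectors as the samples in the batch:

```python
activations = [[4, 0, 2], [3, 9, 1], [4, 4, 7]]  # one row per input, one column per neuron
means = [sum(col) / len(col) for col in zip(*activations)]  # per-neuron mean over the batch
print([round(m, 2) for m in means])  # [3.67, 4.33, 3.33]
```

This matches the option [3.66, 4.33, 3.33] up to the rounding used in the options.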

Incorrect 0/1 Points

13

Why is ReLU preferred over sigmoid as an activation function in deep learning?

Slower but guaranteed convergence

Computationally efficient

No saturation of outputs in the positive region

Can’t say

Correct 1/1 Points

14

Batch Normalization is useful because

It normalizes (changes) all the input before sending it to the next layer. 

It returns back the normalized mean and standard deviation of weights.

It is a very efficient backpropagation technique.

None of these.

Correct 1/1 Points

15

Among the following steps in regression modelling, which one(s) impact(s) the trade-off between overfitting and underfitting the most?


The polynomial degree.


Whether to learn the weights by matrix inversion or gradient descent.

The constant-term.

Correct 1/1 Points

16

The neural network given above takes two binary-valued inputs x1, x2, and the activation function is the binary threshold function (h(x) = 1 if x > 0, otherwise 0). Which of the following logical functions does it compute?

OR

AND

NOR

NAND 

Incorrect 0/1 Points

17

Neural networks 

Optimize only a convex cost function.

Can be used for regression as well as classification. 

Always output values between 0 and 1.


Can be used in an ensemble


Incorrect 0/1 Points

18

 Which of the following are true for univariate linear regression? 

Changing the input variable by 1 unit always affects the output by 1 unit too.

Since it is univariate, we need to estimate one coefficient for modelling the data.

The decision boundary is a property of the data set given to us.

None of the above. 

Correct 1/1 Points

19

Any logical function over binary-valued (0 or 1) inputs x1 and x2 can be (approximately) represented using some neural network.

TRUE 

FALSE

can't say
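As a check on the claim, XOR is the usual hard case: no single threshold unit computes it, but two layers of binary threshold units h(z) = 1 if z > 0, otherwise 0, do. The weights below are hand-picked for illustration:

```python
def h(z):
    """Binary threshold unit: 1 if z > 0, otherwise 0."""
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    """Two-layer threshold network for XOR."""
    or_unit = h(x1 + x2 - 0.5)          # fires if at least one input is 1
    and_unit = h(x1 + x2 - 1.5)         # fires only if both inputs are 1
    return h(or_unit - and_unit - 0.5)  # OR and not AND

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))
```

Any other two-input Boolean function can be built similarly, for example as an OR of AND-style units, one per input pattern that should output 1.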

Incorrect 0/1 Points

20

Which of these hold true for SVMs?

Moving only the support vectors around affects the separating hyperplane as well.

Can be used for both classification and regression.


The parameter C in the cost function for SVM controls the trade-off between misclassification and regularization.

Sensitive to noise. 

Correct 1/1 Points

21

Deep Neural Networks with Stochastic Depth bypass a subset of layers during training using:

Linear function.

Batch Normalization layer.

Identity function. 

Self-Attention Module.
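A simplified sketch of a residual block with stochastic depth: during training the block is kept with survival probability p, and when it is dropped the block collapses to its identity shortcut; at test time every block is used, with its residual branch scaled by p. The toy transform is a hypothetical stand-in for the real conv layers, and the post-addition ReLU of the original formulation is omitted for brevity:

```python
import random

def residual_block(x, transform, survival_prob, training, rng):
    """Residual block with stochastic depth (simplified: no ReLU after the addition)."""
    if training:
        if rng.random() < survival_prob:
            return x + transform(x)  # block active for this mini-batch
        return x                     # block dropped: only the identity shortcut remains
    return x + survival_prob * transform(x)  # inference: all blocks active, branch scaled by p

def toy_transform(x):
    """Hypothetical stand-in for the block's conv-BN-ReLU layers."""
    return 0.5 * x

rng = random.Random(0)
train_outputs = [residual_block(1.0, toy_transform, 0.5, True, rng) for _ in range(8)]
test_output = residual_block(1.0, toy_transform, 0.5, False, rng)
print(train_outputs)  # each entry is 1.0 (dropped) or 1.5 (kept)
print(test_output)    # 1.25
```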

Incorrect 0/1 Points

22

Consider a hypothetical CNN-based model for an arbitrary vision task. The weights of such a model can be initialized with:

Ones

Zeros

Sparse matrix where non-zero elements are drawn from a normal distribution with mean = 0, std-dev = 0.01.

Using Dirac Delta Function

Only a, b

Correct 1/1 Points


23

Is backpropagation learning based on gradient descent along the error surface?

NO

YES

It depends on gradient descent but not on the error surface.

Can't Say

Correct 1/1 Points

24

Assertion (P): The path taken by Stochastic Gradient Descent (SGD) towards the minima always has low variance and faster convergence as compared to Batch gradient descent.

Reason (R): In each iteration of SGD, random samples are picked, based on which the gradients are calculated.

Both P and R are true and R is the correct explanation of P.

Both P and R are true but R is not the correct explanation of P.

Both P and R are false.

P is false but R is true. 


P is true but R is false.

Incorrect 0/1 Points


25

Let’s say you are using activation function X in the hidden layers of a neural network. At a particular neuron, for a given input, you get the output “-0.0001”. Which of the following activation function(s) could X represent?

ReLU

TanH

Leaky ReLU with α = -0.01

Sigmoid
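A numerical sanity check rather than a proof: ReLU and sigmoid never go below 0, and leaky ReLU with a negative slope α = -0.01 maps a negative input z to -0.01·z, which is positive, so it never outputs a negative value either. Only tanh produces negative outputs, and near 0 it behaves like the identity, so tanh(-0.0001) ≈ -0.0001. The input grid from -10 to 10 is an arbitrary choice:

```python
import math

def relu(z):
    return max(0.0, z)

def leaky_relu(z, alpha):
    return z if z > 0 else alpha * z

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

grid = [i / 100.0 for i in range(-1000, 1001)]  # sample inputs from -10 to 10
candidates = {
    "ReLU": relu,
    "TanH": math.tanh,
    "Leaky ReLU, alpha = -0.01": lambda z: leaky_relu(z, -0.01),
    "Sigmoid": sigmoid,
}
can_go_negative = {name: min(fn(z) for z in grid) < 0 for name, fn in candidates.items()}
print(can_go_negative)
print(math.tanh(-0.0001))  # approximately -0.0001
```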

Incorrect 0/1 Points

26

Which of the following is NOT a reason for using batch normalization? 

Prevent vanishing/exploding gradients.

Prevent covariate shift.

Faster convergence.

Faster inference time. 


Prevent overfitting.

Incorrect 0/1 Points

27

Which of the following are true with respect to parameter sharing in CNNs? 

Reduces overfitting

Allows gradient descent to set many parameters to zero, making sparse connections between neurons of the CNN layers

While training a CNN for image classification on a dataset of animals, instead of initializing
the network with random weights, use weights from a pre-trained CNN trained on
ImageNet. These shared weights (parameters) help in training convergence.

Allows one feature detector to be used in multiple local regions in the input image 

None of the above

Incorrect 0/1 Points

28

Assertion (A): An implementation of VGG-100 or VGG-256 would be disastrous.

Reason (R): During backpropagation, the gradient of the loss is first calculated with respect to the weights and then propagated backwards through the network. Hence, with so many layers, it would run into the vanishing gradient problem.

Both A and R are true and R is the correct explanation of A.

 Both A and R are true but R is not the correct explanation of A.

Both A and R are false.

A is false but R is true.

A is true but R is false. 

Incorrect 0/1 Points

29

In the context of deep CNNs, we generally aim to achieve the Bayes error rather than absolute zero error because of:

Shortage of training data for some tasks. 

The available training input samples may not have full information about target
samples. 


Zero error will have negative infinite gradients.

Once zero error is achieved, the system may enter into an uncertainty state.

Correct 1/1 Points

30

Input: 57x57x3 

Current layer (Max pool layer): 3x3 filters applied at stride 2 

Output volume:  28x28x3 

What is the number of parameters in this layer? 

2352

9747

0 

None
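Pooling only takes a maximum over each window, so there is nothing to learn, and the output size follows the same (n - f) / s + 1 rule as convolution: (57 - 3) / 2 + 1 = 28 per side, applied to each of the 3 channels independently. A pure-Python sketch of max pooling on one arbitrary 57x57 channel:

```python
def max_pool2d(channel, size=3, stride=2):
    """Max pooling over one 2-D channel; there are no weights or biases to learn."""
    out_h = (len(channel) - size) // stride + 1
    out_w = (len(channel[0]) - size) // stride + 1
    return [[max(channel[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)] for i in range(out_h)]

channel = [[(r * 57 + c) % 11 for c in range(57)] for r in range(57)]  # arbitrary values
pooled = max_pool2d(channel)
print(len(pooled), len(pooled[0]))  # 28 28
```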

Correct 1/1 Points

31

Let x, y, z be three variables taking binary values. Define f(x, y, z) = x’y’z + x’yz’ + xy’z’ + xyz. Then the data set is linearly separable.

TRUE

FALSE 

Can't say
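Call the function in the question f. It equals 1 exactly when an odd number of the inputs is 1 (3-bit parity). Assuming "linearly separable" means a single threshold unit with f = 1 exactly when w₁x + w₂y + w₃z + b > 0, summing the four constraints on each side gives a contradiction:

```latex
\begin{aligned}
&\text{Assume } w_1x + w_2y + w_3z + b > 0 \iff f(x,y,z) = 1.\\
&f = 1 \text{ at } (0,0,1),(0,1,0),(1,0,0),(1,1,1):\quad
  w_3+b>0,\; w_2+b>0,\; w_1+b>0,\; w_1+w_2+w_3+b>0\\
&\qquad\Rightarrow\; 2(w_1+w_2+w_3)+4b>0.\\
&f = 0 \text{ at } (0,0,0),(0,1,1),(1,0,1),(1,1,0):\quad
  b\le 0,\; w_2+w_3+b\le 0,\; w_1+w_3+b\le 0,\; w_1+w_2+b\le 0\\
&\qquad\Rightarrow\; 2(w_1+w_2+w_3)+4b\le 0.
\end{aligned}
```

The two sums are the same expression, which cannot be both positive and non-positive, so no such w and b exist.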

Correct 1/1 Points


32

Which of the following statements is/are true? 

The deeper layers of a neural network are typically computing more complex features of the input than the earlier layers.

The earlier layers of a neural network are typically computing more complex features of the input than the deeper layers.

Most of the parameters in a CNN are usually in the fully connected layers

Every neuron in a convolution layer looks for patterns in different regions of the input

Correct 1/1 Points

33

Select all the options that help alleviate the vanishing gradient problem:

Build the model in a layer-by-layer fashion using unsupervised learning 


Using ReLU instead of Sigmoid 

Using Sigmoid Activation function

None of the above
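Why swapping sigmoid for ReLU helps, in a deliberately simplified view: backpropagation multiplies one activation derivative per layer, and the sigmoid derivative never exceeds 0.25, whereas the ReLU derivative is exactly 1 for active units. The sketch ignores the weight matrices and assumes 10 layers, each at the sigmoid's best-case slope:

```python
import math

def sigmoid_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)  # at most 0.25, reached at z = 0

def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # exactly 1 for every active unit

depth = 10
sig_chain = sigmoid_grad(0.0) ** depth  # best case for sigmoid
relu_chain = relu_grad(1.0) ** depth    # any active unit
print(sig_chain, relu_chain)  # about 9.5e-07 versus 1.0
```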

Correct 1/1 Points

34

Which of the following is(are) hyperparameter(s)?

Number of iterations 

Bias vectors

Number of hidden layers 



Weight matrices

Learning rate

Number of neurons in each hidden layer

Incorrect 0/1 Points

35

A stack of five 3x3 conv (stride 1) layers has the same effective receptive field as which layer?

One 7x7

One 11x11

One 13x13

None
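Each stride-1 k x k layer grows the receptive field by k - 1, so n stacked 3x3 layers see 1 + 2n pixels per side; for n = 5 that is 11. A sketch with a hypothetical helper:

```python
def stacked_receptive_field(num_layers: int, kernel: int) -> int:
    """Receptive field of num_layers stacked kernel x kernel convolutions with stride 1."""
    rf = 1
    for _ in range(num_layers):
        rf += kernel - 1  # each stride-1 layer widens the field by (kernel - 1)
    return rf

print(stacked_receptive_field(5, 3))  # 11, the same as one 11x11 layer
```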

Correct 1/1 Points

36

Which statement(s) is/are true?

Overfitting implies too simple model architecture

Underfitting implies complex model architecture.

Overfitting implies too complex model architecture 

Overfitting and underfitting have no relation with model complexity.

Incorrect 0/1 Points


37

What is the correct sequence of the following tasks in a perceptron?

      1. For a sample input, compute an output

      2. Go to the next batch of the dataset

      3. Initialize the weights of the perceptron randomly

      4. If the prediction does not match the output, change the weights

1234

3412

3142

3214
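The classic perceptron learning rule follows the order 3, 1, 4, 2: initialize randomly, compute an output for one input, update the weights only if it is wrong, then move to the next input. A sketch on the AND function; the learning rate, epoch count and seed are assumptions, and since AND is linearly separable the loop stops making mistakes within the 100 epochs used here:

```python
import random

def predict(w, b, x):
    """Binary threshold output of the perceptron."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

def train_perceptron(data, lr=1.0, epochs=100, seed=0):
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]  # step 3: random initial weights
    b = rng.uniform(-1.0, 1.0)
    for _ in range(epochs):
        for x, target in data:          # step 2: move on to the next input
            out = predict(w, b, x)      # step 1: compute an output
            if out != target:           # step 4: update only on a mistake
                err = target - out
                w = [w[0] + lr * err * x[0], w[1] + lr * err * x[1]]
                b += lr * err
    return w, b

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
preds = [predict(w, b, x) for x, _ in AND]
print(preds)  # [0, 0, 0, 1]
```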

Incorrect 0/1 Points

38

Why does VGG-Net use a stack of small 3x3 filters instead of a single, high-dimensional kxk filter?

To force a regularization on the k x k filter

To make the decision function more discriminative

To facilitate parameter sharing

None of the above
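The VGG paper motivates the 3x3 stacks with the first two arguments listed above: a stack of three 3x3 layers covers a 7x7 receptive field with three nonlinearities instead of one, and with C input and C output channels it needs 3 · 3²C² = 27C² weights versus 49C² for a single 7x7 layer. A quick count, assuming a hypothetical C = 256 and ignoring biases:

```python
def conv_weights(kernel: int, channels: int) -> int:
    """Weights of one kernel x kernel conv layer mapping C channels to C channels (no biases)."""
    return kernel * kernel * channels * channels

C = 256  # hypothetical channel count
three_3x3 = 3 * conv_weights(3, C)  # 7x7 receptive field, three nonlinearities
one_7x7 = conv_weights(7, C)        # same receptive field, one nonlinearity
print(three_3x3, one_7x7)  # 1769472 3211264
```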

Incorrect 0/1 Points


39

Which of the following are true for a pooling layer when used in a Convolutional Neural Network?

Upsampling the input.

If an object is spatially translated, it fails to help the CNN detect its class.

Reduces network parameters. 


Induces trainable parameters in the CNN to help it learn a better summarization of the
input.

Incorrect 0/1 Points

40

Assertion: Consider an arbitrary high-level vision task for which a CNN-based model is proposed. The proposed model has 15 blocks, where each block consists of 4 layers, namely 2 Convolution layers with different receptive fields, Batch Normalization, and ReLU activation, except for the first two blocks. The first block consists of only one Convolution layer, whereas the second block consists of a Convolution layer followed by ReLU activation. A researcher claimed that, instead of the first two blocks, the authors may use only the second block.

Reason: A series of two convolution layers with no activation function and nothing else in between is equivalent to a single convolution.

Which of the following is TRUE?

Assertion is TRUE, but Reason is False.

Both Assertion, Reason are TRUE, but Reason is NOT proper explanation of
Assertion. 

Both Assertion, Reason are TRUE, and Reason is proper explanation of Assertion.

Both Assertion, Reason are FALSE.
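The Reason can be checked numerically: two stacked linear convolutions with no nonlinearity in between equal one convolution whose kernel is the full convolution of the two kernels (length 3 + 3 - 1 = 5 here). A 1-D integer sketch without biases (biases would likewise fold into a single bias); the helper names are hypothetical:

```python
def correlate_valid(x, k):
    """1-D 'valid' cross-correlation, which is what deep-learning conv layers compute."""
    n = len(x) - len(k) + 1
    return [sum(x[i + j] * k[j] for j in range(len(k))) for i in range(n)]

def compose_kernels(a, b):
    """Kernel of the single layer equivalent to applying a, then b."""
    c = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for m, bm in enumerate(b):
            c[j + m] += aj * bm
    return c

x = [3, 1, 4, 1, 5, 9, 2, 6]
a, b = [1, -2, 1], [2, 0, -1]
two_layers = correlate_valid(correlate_valid(x, a), b)
one_layer = correlate_valid(x, compose_kernels(a, b))
print(two_layers == one_layer)  # True
```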


Incorrect 0/1 Points

41

Regarding bias and variance, which of the following statements are true? 

Models which overfit have a high bias.

Models which overfit have a low bias. 


Models which underfit have a high variance.

Models which underfit have a low variance. 

Incorrect 0/1 Points

42

What steps can we take to prevent overfitting in a Neural Network? 

Data Augmentation.

Weight Sharing.

Remove Batch-Norm from first few layers of the network.

None of the above.

Incorrect 0/1 Points


43

As a fan of the Star Wars movie franchise, you decide to build a system for tracking the movements of Master Yoda by natural language commands, as shown in the figure below:

Your system internally uses a VGG-16 with tanh activation for all hidden units. You initialize the weights to relatively large random values and use gradient descent as the optimizer. What will happen?

Gradients become large, and to prevent divergence you have to slow down the learning
rate – hence convergence is slow

Hidden units become highly activated and convergence is faster as weights are high from
the beginning itself

Gradients always become constant after a few convolutional layers

As long as weights are randomly initialized, gradient descent is not affected by small or
large values of initialized weights

Gradients become close to zero, and convergence is slow 
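Large random weights push each unit's pre-activation z far from 0, where tanh is flat: its derivative 1 - tanh²(z) is 1 at z = 0 but almost 0 once |z| reaches about 5, so the gradients flowing back are tiny. A quick evaluation at a few assumed pre-activation values:

```python
import math

def tanh_grad(z: float) -> float:
    """Derivative of tanh: 1 - tanh(z)^2."""
    return 1.0 - math.tanh(z) ** 2

# Large initial weights give large |z| = |w . x|, i.e. the saturated tails of tanh
for z in (0.0, 1.0, 5.0, 10.0):
    print(z, tanh_grad(z))
```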

Correct 1/1 Points


44

In stochastic gradient descent, how many training samples are used before updating the weights?

Depends on the number of training samples.

Depends on the number of test samples.

One.

None of the above.
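A minimal SGD sketch, assuming a toy linear-regression problem (fitting y = w·x + b to noiseless points on y = 2x + 1): the weights change after every single training sample rather than after a full pass over the data. The learning rate, epoch count and seed are illustrative choices:

```python
import random

def sgd_linear_fit(data, lr=0.01, epochs=200, seed=0):
    """Fit y = w*x + b with SGD: the weights are updated after every single sample."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)         # visit samples in a random order each epoch
        for x, y in samples:         # one sample -> one gradient -> one update
            err = (w * x + b) - y    # derivative of 0.5 * err**2 with respect to the prediction
            w -= lr * err * x
            b -= lr * err
    return w, b

data = [(x, 2.0 * x + 1.0) for x in range(-5, 6)]  # noiseless line y = 2x + 1
w, b = sgd_linear_fit(data)
print(round(w, 3), round(b, 3))  # close to 2.0 and 1.0
```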

Incorrect 0/1 Points

45

Why is it at all required to choose different learning rates for different weights?

This would help to reach the optimum point faster.

To avoid the problem of a diminishing learning rate.

To avoid overshooting the optimum point.

All of the above

Incorrect 0/1 Points

46

In neural networks, nonlinear activation functions such as sigmoid, tanh, and ReLU

Speed up the gradient calculation in backpropagation, as compared to linear units.

Help to learn nonlinear decision boundaries. 


Are applied only to the output units.

Always output values between 0 and 1. 



Incorrect 0/1 Points

47

In the context of deep CNNs, a CNN-based model is proposed for a given arbitrary low-level vision task. The model has a fixed kernel size throughout. The authors claim that “increasing the kernel size will result in a significant performance gain”. What can you say about this claim?

TRUE, but only for low-level vision tasks.

FALSE, but only for low-level vision tasks.

TRUE.

FALSE. 

Correct 1/1 Points

48

What will this filter do when convolved with a grayscale image?

Detect vertical edges

Detect horizontal edges

Sharpen the image 

Blur the image


Incorrect 0/1 Points

49

You are training multilayer neural networks using Batch Gradient Descent (BGD). You notice that the training error goes down and converges to a local minimum. Then, when you test on new data, the test error is abnormally high. What is probably going wrong, and what would you do?

The training data size is not large enough. Collect a larger training dataset and retrain.

Tune the learning rate and add a regularization term to the objective function.

Use the same training data but add a few more hidden layers.

Do Stochastic Gradient Descent instead of Batch Gradient Descent.

Correct 1/1 Points

50

In general, which of the following method(s) is/are used for predicting a continuous dependent variable?

1. Linear Regression
2. Logistic Regression

1 and 2.

Only 1. 

Only 2.

None of these.
