
Assignment 5

Introduction to Machine Learning


Prof. B. Ravindran
1. If the step size in gradient descent is too large, what can happen?
(a) Overfitting
(b) The model will not converge
(c) We can reach maxima instead of minima
(d) None of the above
Sol. (b)
Ref. lecture
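A minimal sketch (not from the lecture; the quadratic objective and the learning rates are illustrative assumptions) of why a large step size prevents convergence:

# Gradient descent on f(x) = x^2, whose gradient is 2x.
# The update x <- x - lr * 2x = (1 - 2*lr) * x shrinks |x| only when
# |1 - 2*lr| < 1; for lr > 1 each step overshoots and |x| grows.
def gradient_descent(lr, x=1.0, steps=10):
    for _ in range(steps):
        x = x - lr * 2 * x
    return x

print(gradient_descent(lr=0.1))   # converges towards 0
print(gradient_descent(lr=1.1))   # diverges: |x| grows every step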

2. Recall the XOR (tabulated below) example from class, where we applied a transformation
of features to make the data linearly separable. Which of the following transformations can also work?

X1 X2 Y
-1 -1 -1
1 -1 1
-1 1 1
1 1 -1

(a) X1′ = X1², X2′ = X2²
(b) X1′ = 1 + X1, X2′ = 1 − X2
(c) X1′ = X1X2, X2′ = −X1X2
(d) X1′ = (X1 − X2)², X2′ = (X1 + X2)²
Sol. (c), (d)
(c)

X1′ X2′ Y
1 -1 -1
-1 1 1
-1 1 1
1 -1 -1

(d)

X1′ X2′ Y
0 4 -1
4 0 1
4 0 1
0 4 -1

Under both transformations, the data becomes linearly separable.
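As a quick check (a sketch, not part of the official solution), transformation (c) can be applied in code to confirm that a single linear threshold now separates the classes:

# XOR data from the table above: (X1, X2, Y).
data = [(-1, -1, -1), (1, -1, 1), (-1, 1, 1), (1, 1, -1)]

# Transformation (c): X1' = X1*X2, X2' = -X1*X2.
for x1, x2, y in data:
    x1p, x2p = x1 * x2, -x1 * x2
    print((x1p, x2p, y))
# In every row Y == X2', so the line X2' = 0 separates the two classes.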

3. What is the effect of using the activation function f(x) = x for hidden layers in an ANN?

(a) No effect. It is as good as any other activation function (sigmoid, tanh, etc.).
(b) The ANN is equivalent to doing multi-output linear regression.
(c) Backpropagation will not work.
(d) We can model highly complex non-linear functions.

Sol. (b)
Ref. lecture
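A small numpy sketch (illustrative; the weight matrices are random and biases are omitted) of why this holds: stacking layers with f(x) = x collapses into a single linear map, which is exactly multi-output linear regression:

import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)

h = W1 @ x   # hidden layer with identity activation: f(W1 x) = W1 x
y = W2 @ h   # output layer: W2 W1 x

# The two-layer network equals one linear layer with weights W2 @ W1.
assert np.allclose(y, (W2 @ W1) @ x)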

4. Which of the following functions can be used on the last layer of an ANN for classification?

(a) Softmax
(b) Sigmoid
(c) Tanh
(d) Linear
Sol. (a), (b), (c)
Ref. lecture

5. Statement: The threshold function cannot be used as an activation function for hidden layers.
Reason: Threshold functions do not introduce non-linearity.

(a) Statement is true and reason is false.
(b) Statement is false and reason is true.
(c) Both are true and the reason explains the statement.
(d) Both are true and the reason does not explain the statement.
Sol. (a)
The statement is true because the threshold function is non-differentiable, so gradients
cannot be computed for backpropagation. The reason is false: threshold functions do
introduce non-linearity.
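A small sketch (illustrative) of the differentiability problem: the step function's derivative is zero everywhere away from the discontinuity, so no gradient signal can flow through such a unit:

# Threshold (step) activation: 1 if z > 0 else 0.
def step(z):
    return 1.0 if z > 0 else 0.0

# The numerical derivative is 0 at every z != 0 (and undefined at z = 0),
# so backpropagation through this unit receives no gradient.
eps = 1e-6
for z in (-1.0, 0.5, 2.0):
    print(z, (step(z + eps) - step(z - eps)) / (2 * eps))   # all 0.0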

6. We use several techniques to keep the weights of a neural network small (such as
random initialization around 0 or regularisation). What conclusions can we draw if the
weights of our ANN are high?
(a) Model has overfitted.
(b) It was initialized incorrectly.
(c) At least one of (a) or (b).
(d) None of the above.
Sol. (d)
High weights can accompany overfitting, but the two are not always associated, so neither (a) nor (b) can be concluded.

7. On different initializations of your neural network, you get significantly different values of loss.
What could be the reason for this?
(a) Overfitting
(b) Some problem in the architecture
(c) Incorrect activation function
(d) Multiple local minima
Sol. (d)
Ref. lecture
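A toy sketch (the one-dimensional loss is an illustrative stand-in for a neural-network loss surface) of how different initializations settle in different local minima with different loss values:

# Non-convex loss with two local minima of different depths.
loss = lambda w: (w**2 - 1)**2 + 0.3 * w
grad = lambda w: 4 * w * (w**2 - 1) + 0.3

for w0 in (-2.0, 0.5, 2.0):          # different initializations
    w = w0
    for _ in range(200):
        w -= 0.01 * grad(w)          # plain gradient descent
    print(f"init {w0:+.1f} -> w = {w:+.3f}, loss = {loss(w):.3f}")
# The runs end in different minima, hence significantly different losses.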

8. The likelihood L(θ|X) is given by:

(a) P(θ|X)
(b) P(X|θ)
(c) P(X) · P(θ)
(d) P(θ)/P(X)

Sol. (b)
Ref. lecture

9. You are trying to estimate the probability of it raining today using maximum likelihood
estimation. Given that in n days, it rained nr times, what is the probability of it raining today?

(a) nr/n
(b) nr/(nr + n)
(c) n/(nr + n)
(d) None of the above.

Sol. (a)
This follows the same idea as the coin-toss example discussed in class: the likelihood of
the rain probability p given nr rainy days out of n independent days is
L(p) = p^nr (1 − p)^(n−nr), and setting the derivative of log L(p) to zero gives p = nr/n.
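A short numerical check (the counts n = 30, nr = 10 are assumed for illustration): maximizing the Bernoulli log-likelihood over a grid recovers nr/n:

import numpy as np

n, nr = 30, 10                      # assumed: 30 days, 10 rainy
p = np.linspace(0.001, 0.999, 9999)
log_lik = nr * np.log(p) + (n - nr) * np.log(1 - p)
print(p[np.argmax(log_lik)])        # ~0.333 = nr / n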
10. Choose the correct statement (multiple may be correct):
(a) MLE is a special case of MAP when prior is a uniform distribution.
(b) MLE acts as regularisation for MAP.
(c) MLE is a special case of MAP when prior is a beta distribution.
(d) MAP acts as regularisation for MLE.
Sol. (a), (d)
Ref. lecture
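A sketch (with synthetic data and an assumed prior) of how MAP regularises MLE in option (d): for linear regression with Gaussian noise and a Gaussian prior w ~ N(0, τ²I), the MAP estimate is exactly ridge regression, i.e. L2-regularised least squares:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

lam = 0.5                                     # assumed sigma^2 / tau^2
w_mle = np.linalg.solve(X.T @ X, X.T @ y)     # MLE: least squares
w_map = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)  # MAP: ridge

print(w_mle)   # unregularised fit
print(w_map)   # shrunk towards the prior mean 0 (the regularising effect)

Setting lam = 0 (a flat prior) makes w_map equal w_mle, which illustrates option (a) as well.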
