Deep Learning
Assignment- Week 5
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10; Total marks: 10 × 1 = 10
______________________________________________________________________________
QUESTION 1:
What is the main benefit of stacking multiple layers of neurons with non-linear activation
functions over a single-layer perceptron?
Correct Answer: c
Detailed Solution:
Stacking layers with non-linear activations lets the network represent non-linear decision
boundaries (e.g., XOR), which a single-layer perceptron cannot. Without the non-linearities,
a stack of linear layers collapses into a single linear map.
QUESTION 2:
For a 2-class classification problem, what is the minimum number of nodes required for the
output layer of a multi-layered neural network?
a. 2
b. 1
c. 3
d. None of the above
Correct Answer: b
Detailed Solution:
Only 1 node is enough. We can expect that node to be activated (have high activation value)
only when class = +1 else the node should NOT be activated (have activation close to zero).
We can use the binary (2-class) cross entropy loss to train such a model.
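The single-output-node scheme described above can be sketched as follows. This is a minimal illustration with synthetic, well-separated 2-D data (the data and learning rate are my choices, not from the assignment): one sigmoid node trained with binary cross-entropy separates the two classes.

```python
import numpy as np

# A single sigmoid output node for a 2-class problem, trained with
# binary cross-entropy via plain gradient descent. Data is illustrative.
def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)),   # class 0 cluster
               rng.normal(2, 1, (50, 2))])   # class 1 cluster
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b = np.zeros(2), 0.0
for _ in range(200):
    p_hat = sigmoid(X @ w + b)
    grad = p_hat - y                  # d(BCE)/d(pre-activation) for sigmoid
    w -= 0.1 * (X.T @ grad) / len(y)
    b -= 0.1 * grad.mean()

pred = (sigmoid(X @ w + b) >= 0.5).astype(float)
acc = (pred == y).mean()
```

The node's activation is high (near 1) for one class and near 0 for the other, exactly as the solution describes.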
NPTEL Online Certification Courses
Indian Institute of Technology Kharagpur
QUESTION 3:
What will be the output from node a₃ in the following neural network setup when the inputs are
(x₁, x₂) = (0, 1)?
The activation function used in each of the three nodes a₁, a₂ and a₃ is zero-thresholding, i.e.,
f(v) = 0, if v < 0
f(v) = 1, if v ≥ 0
a. -1
b. 0
c. 1
d. 0.5
Correct Answer: b
Detailed Solution:
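The network's weights are given only in the figure (not reproduced in this text), so the full forward pass cannot be reconstructed here; the zero-thresholding activation itself can be sketched as:

```python
import numpy as np

# The zero-thresholding activation from the problem:
# f(v) = 0 if v < 0, and f(v) = 1 if v >= 0 (note the boundary v = 0 maps to 1).
def f(v):
    return np.where(np.asarray(v) < 0, 0, 1)

outputs = f([-0.5, 0.0, 2.3])   # -> [0, 1, 1]
```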
______________________________________________________________________________
QUESTION 4:
Which basic logic gate is implemented by the following neural network setup? The activation
function used in each of the three nodes a₁, a₂ and a₃ is zero-thresholding, i.e.,
f(v) = 0, if v < 0
f(v) = 1, if v ≥ 0
a. AND
b. NOR
c. XNOR
d. XOR
Correct Answer: c
Detailed Solution:
For input (0, 0) the network outputs 1; it outputs 1 on equal inputs and 0 on unequal inputs, which matches the XNOR truth table.
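The figure's exact weights are not reproduced in this text; the thresholds below are one standard choice that realises XNOR with zero-thresholding units, shown as a sketch:

```python
# XNOR from three zero-thresholding units. The weights/thresholds here are
# illustrative (the original figure's values are not available in this text).
def f(v):
    return 1 if v >= 0 else 0

def xnor(x1, x2):
    a1 = f(x1 + x2 - 1.5)    # fires only for (1, 1): AND(x1, x2)
    a2 = f(-x1 - x2 + 0.5)   # fires only for (0, 0): NOR(x1, x2)
    return f(a1 + a2 - 0.5)  # OR(a1, a2) -> XNOR(x1, x2)

table = {(x1, x2): xnor(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}
```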
QUESTION 5:
Find the gradient component ∂J/∂w₁ for the network shown below if J(·) = (p̂ − p)² is the loss
function, p is the target, and the non-linearity f_NL(·) is the sigmoid activation function
represented as σ(·).
a. 2p̂ × (1 − σ(y)) × x₁
b. 2(p̂ − p) × σ(y) × (1 − σ(y)) × x₁
c. 2(p̂ − p) × (1 − σ(y)) × x₁
d. 2(1 − p) × (1 − σ(y)) × x₁
Correct Answer: b
Detailed Solution:
J(·) = (p̂ − p)²
p̂ = f_NL(y) = σ(y)
y = x₁w₁ + x₂w₂ + x₃w₃ + 1
By the chain rule,
∂J/∂w₁ = (∂J/∂p̂) · (∂p̂/∂y) · (∂y/∂w₁)
∂J/∂p̂ = 2(p̂ − p),  ∂p̂/∂y = σ(y) × (1 − σ(y)),  ∂y/∂w₁ = x₁
∂J/∂w₁ = 2(p̂ − p) × σ(y) × (1 − σ(y)) × x₁
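The closed-form gradient can be checked against a numerical (central-difference) derivative. The inputs, weights, bias and target below are illustrative values, not taken from the figure:

```python
import numpy as np

# Verify dJ/dw1 = 2*(p_hat - p) * sigma(y) * (1 - sigma(y)) * x1 numerically.
def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

x = np.array([0.5, -1.0, 2.0])   # illustrative inputs
w = np.array([0.3, 0.7, -0.2])   # illustrative weights
b, p = 1.0, 1.0                  # illustrative bias and target

def loss(w1):
    y = x[0] * w1 + x[1] * w[1] + x[2] * w[2] + b
    return (sigmoid(y) - p) ** 2

y = x @ w + b
analytic = 2 * (sigmoid(y) - p) * sigmoid(y) * (1 - sigmoid(y)) * x[0]
h = 1e-6
numeric = (loss(w[0] + h) - loss(w[0] - h)) / (2 * h)
```

The two values agree to within floating-point tolerance, confirming option b.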
QUESTION 6:
Find the output p̂ corresponding to input {x₁ = 1, x₂ = 1, x₃ = 0} for the network shown
below. The non-linearity f_NL(·) is the sigmoid activation function.
a. 0
b. 1
c. 0.5
d. 0.25
Correct Answer: c
Detailed Solution:
y = x₁w₁ + x₂w₂ + x₃w₃ + b = 1 × 2 − 1 × 1 + 0 × 1 − 1 = 0
p̂ = σ(y) = σ(0) = 0.5
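The forward pass can be reproduced with the values read off from the solution (w = (2, −1, 1), b = −1, input x = (1, 1, 0)):

```python
import numpy as np

# Forward pass with the weights stated in the solution above.
def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

x = np.array([1.0, 1.0, 0.0])
w = np.array([2.0, -1.0, 1.0])
b = -1.0

y = x @ w + b        # 1*2 - 1*1 + 0*1 - 1 = 0
p_hat = sigmoid(y)   # sigma(0) = 0.5
```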
QUESTION 7:
Find the gradient component ∂J/∂w₂ for the network shown below if J(·) = (p̂ − p)² is the loss
function, p = 1 is the target, and the non-linearity f_NL(·) is the sigmoid activation function
represented as σ(·).
a. −0.5
b. −1
c. 0
d. −0.25
Correct Answer: d
Detailed Solution:
J(·) = (p̂ − p)²
p̂ = f_NL(y)
y = x₁w₁ + x₂w₂ + x₃w₃ + b = 1 × 2 − 1 × 1 + 0 × 1 − 1 = 0
p̂ = σ(y) = σ(0) = 0.5
Using the chain rule,
∂J/∂w₂ = (∂J/∂p̂) · (∂p̂/∂y) · (∂y/∂w₂)
∂J/∂p̂ = 2(p̂ − p),  ∂p̂/∂y = σ(y) × (1 − σ(y)),  ∂y/∂w₂ = x₂
∂J/∂w₂ = 2(p̂ − p) × σ(y) × (1 − σ(y)) × x₂ = 2 × (0.5 − 1) × 0.5 × (1 − 0.5) × 1 = −0.25
QUESTION 8:
What will be the updated value of 𝑤2 after the first iteration from the current state of the
network shown below if 𝐽 (∙) = (𝑝̂ − 𝑝)2 is the loss function, 𝑝=1 is the target and the non-
linearity 𝑓𝑁𝐿 (∙) is the sigmoid activation function represented as 𝜎(∙)?
a. −0.5
b. −1
c. 0
d. −0.25
Correct Answer: a
Detailed Solution:
J(·) = (p̂ − p)²
p̂ = f_NL(y)
y = x₁w₁ + x₂w₂ + x₃w₃ + b = 1 × 2 − 1 × 1 + 0 × 1 − 1 = 0
p̂ = σ(y) = σ(0) = 0.5
Using the chain rule,
∂J/∂w₂ = (∂J/∂p̂) · (∂p̂/∂y) · (∂y/∂w₂)
∂J/∂p̂ = 2(p̂ − p),  ∂p̂/∂y = σ(y) × (1 − σ(y)),  ∂y/∂w₂ = x₂
∂J/∂w₂ = 2(p̂ − p) × σ(y) × (1 − σ(y)) × x₂ = 2 × (0.5 − 1) × 0.5 × (1 − 0.5) × 1 = −0.25
With learning rate η = 2,
Updated w₂ = w₂ − η ∂J/∂w₂ = −1 − 2 × (−0.25) = −1 + 0.5 = −0.5
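The full step (forward pass, gradient, update) can be reproduced with the values from the solution: x = (1, 1, 0), w = (2, −1, 1), b = −1, target p = 1, and η = 2 as implied by the update line:

```python
import numpy as np

# One gradient-descent step on w2 for the network in Questions 6-8.
def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

x = np.array([1.0, 1.0, 0.0])
w = np.array([2.0, -1.0, 1.0])
b, p, eta = -1.0, 1.0, 2.0

y = x @ w + b
p_hat = sigmoid(y)                                        # 0.5
grad_w2 = 2 * (p_hat - p) * p_hat * (1 - p_hat) * x[1]    # -0.25
w2_new = w[1] - eta * grad_w2                             # -1 + 0.5 = -0.5
```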
QUESTION 9:
Suppose a neural network has 3 input nodes, x, y, z. There are 2 neurons, Q and F. Q = x + y and
F = Q * z. What is the gradient of F with respect to x, y and z? Assume, (x, y, z) = (-2, 5, -4).
a. (-4, 3, -3)
b. (-4, -4, 3)
c. (4, 4, -3)
d. (3, 3, 4)
Correct Answer: b
Detailed Solution:
F = Q · z, so ∂F/∂z = Q = x + y = −2 + 5 = 3
F = (x + y) · z, so ∂F/∂x = z = −4 and ∂F/∂y = z = −4
Hence the gradient (∂F/∂x, ∂F/∂y, ∂F/∂z) = (−4, −4, 3), which is option b.
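The hand computation above amounts to one backward pass through the two-node graph:

```python
# Backprop through F = (x + y) * z, matching the solution above.
x, y, z = -2.0, 5.0, -4.0
Q = x + y            # forward: Q = 3
F = Q * z            # forward: F = -12

dF_dz = Q            # 3
dF_dQ = z            # -4
dF_dx = dF_dQ * 1.0  # -4, since dQ/dx = 1
dF_dy = dF_dQ * 1.0  # -4, since dQ/dy = 1
```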
QUESTION 10:
Suppose a fully-connected neural network has a single hidden layer with 15 nodes. The input is
represented by a 5-D feature vector and the number of classes is 3. Calculate the number of
parameters of the network, assuming there are NO bias nodes in the network.
a. 225
b. 75
c. 78
d. 120
Correct Answer: d
Detailed Solution:
Input-to-hidden weights: 5 × 15 = 75; hidden-to-output weights: 15 × 3 = 45.
Total parameters = 75 + 45 = 120.
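The count follows directly from the layer sizes:

```python
# Weight count for a 5 -> 15 -> 3 fully-connected network with no biases:
# each layer contributes (inputs x outputs) weights.
n_in, n_hidden, n_out = 5, 15, 3
params = n_in * n_hidden + n_hidden * n_out   # 75 + 45 = 120
```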
************END*******