
Old Finals / Midterms Questions

Q1. What is the depth of a multilayer perceptron (MLP)?


(a) The number of input features
(b) The length of the longest path from a source to a sink
(c) The number of neurons in hidden layer
(d) The number of neurons in the output layer
(e) None of the given statements is correct.

Q2. Which of the following is/are the example(s) of activation function(s) used in MLP?
(a) ReLU
(b) Sigmoid
(c) Hyperbolic tangent (tanh)
(d) All of the given statements are correct
(e) Threshold/step function.

Solution:
ReLU (Rectified Linear Unit), sigmoid, and hyperbolic tangent are all commonly used activation functions
in MLPs. ReLU is often used in hidden layers because of its simplicity and because it mitigates the
vanishing-gradient problem, while sigmoid and hyperbolic tangent are bounded, differentiable functions
commonly used in output layers, e.g., sigmoid for binary classification.
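
For reference, a minimal sketch of the three activation functions mentioned above, written as plain Python (the function names are just illustrative):

import math

def relu(z):
    return max(0.0, z)                   # passes positive values, zeros out negatives

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))    # squashes z into (0, 1)

def tanh(z):
    return math.tanh(z)                  # squashes z into (-1, 1)

print(relu(-2), sigmoid(0), tanh(0))     # 0.0 0.5 0.0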

Q3. Which of the following statements is true regarding the bias term in the Multi-Layer Perceptron (MLP)?
(a) The bias term is always set to 1.
(b) The bias term is always set to 0.
(c) The bias term can be a real number.
(d) The bias term is always greater than 1.
(e) None of the statements given is correct.
Solution:
The bias term in the MLP is used to shift the activation function of the neurons in the hidden layer and the
output layer. It can be any real number and is typically included as a trainable parameter in the MLP.

Q4. What would be the threshold for a perceptron that emulates a logic gate that outputs the following
Boolean function F(a, b, c, d, e, f, g, h, i, j) = a ∧ ¬b ∧ ¬c ∧ d ∧ e ∧ ¬f ∧ ¬g ∧ ¬h ∧ ¬i ∧ j
(a) 4
(b) 6
(c) 10
(d) -2
(e) None of the statements given is correct.
Solution:
The perceptron should fire only for the input pattern 1001100001, so the threshold should be 4. If any of the
direct inputs becomes 0, the total sum drops below 4; likewise, if any of the inverted inputs becomes 1, the
total sum drops below 4 and the output is 0.
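
A small Python check of this reasoning, assuming weight +1 for the four direct inputs (a, d, e, j) and weight -1 for the six inverted ones, with threshold 4; the exhaustive enumeration should fire only on the pattern 1001100001:

from itertools import product

direct = {0, 3, 4, 9}                               # 0-based positions of a, d, e, j
weights = [1 if i in direct else -1 for i in range(10)]

def fires(bits):
    return sum(w * b for w, b in zip(weights, bits)) >= 4

firing = [bits for bits in product((0, 1), repeat=10) if fires(bits)]
print(firing)                                       # only (1,0,0,1,1,0,0,0,0,1)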

Q5. Consider the following NN, how many neurons in hidden layer#1?
(a) 1
(b) 2
(c) 3
(d) 4
(e) 5

Solution:
A “layer” is the set of neurons that are all at the same depth with respect to the input, so hidden layer #1
has 4 neurons!

Q6. In the “real” valued perceptron, which of the following is FALSE


(a) Inputs are real values
(b) Weights are real values
(c) Output can be only real
(d) Any real-valued “activation” function can be used
(e) None of the statements given is correct.
Solution:
Output can be real or Boolean
Q7. If we have an n-input MLP, the decision boundary defined by the weights is a hyperplane of dimension:
(a) n − 1
(b) n
(c) n + 1
(d) Depends on other factors
(e) None of the statements given is correct.
Solution:
If we have n inputs, the weights define a decision boundary that is an (n − 1)-dimensional hyperplane.

Q8. The Boolean function given in the table below is to be implemented using the multilayer perceptron (MLP)
given in the diagram.

What are the appropriate values of the weights W1, W2, ... W6, and thresholds T1, T2, and T3 to implement
the function above?
(a) T1 = 0, T2 = -2, T3 = 2, W1 = 1, W2 = -1, W3 = -1, W4 = 1, W5 = 1, W6 = -1
(b) T1 = 0, T2 = 2, T3 = 2, W1 = 1, W2 = -1, W3 = -1, W4 = 1, W5 = 1, W6 = -1
(c) T1 = 2, T2 = 0, T3 = 1, W1 = -1, W2 = 1, W3 = -1, W4 = 1, W5 = 1, W6 = -1
(d) T1 = 0, T2 = 2, T3 = 1, W1 = -1, W2 = 1, W3 = -1, W4 = 1, W5 = 1, W6 = 1
(e) T1 = 0, T2 = 2, T3 = 2, W1 = -1, W2 = -1, W3 = -1, W4 = 1, W5 = 1, W6 = -2
Solution:
The Boolean equation to be implemented by the MLP is: F = (¬x ∧ ¬y) ∨ (x ∧ y) = F1 ∨ F2
F1 = (¬x ∧ ¬y) hence W1 = -1, W3 = -1, T1 = 0,
F2 = (x ∧ y) hence W2 = 1, W4 = 1, T2 = 2
Finally, an OR gate is required between F1 and F2, hence W5 = 1, W6 = 1 and T3 = 1

Q9. For some Neural Network using the Gradient Descent algorithm, the optimum step size (ηopt) was found to
be ηopt = 0.4. Which of the following statements is TRUE?
(a) If you choose η = 0.1, then the algorithm will converge quickly.
(b) If you choose η = 0.3, then the algorithm will oscillate and may converge after many iterations.
(c) If you choose η = 0.8, then the algorithm will converge in two steps.
(d) If you choose η = 0.9, then the algorithm will diverge.
(e) None of the given answers is correct.

Q10. Consider training a neural network that has a single output using 5 training samples. The table below
shows the computed ∂Div/∂w^(3)_{1,1} for each of the input samples.

Assuming that w^(3)_{1,1} = 1, a learning rate η = 0.3, and we are using gradient descent that updates the
weights once after processing all training samples, the updated value for w^(3)_{1,1} is:
(a) 0.94
(b) 0.70
(c) 1.06
(d) 1.30
(e) None of the above
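
The per-sample gradient table is not reproduced here, but the batch update itself is a single gradient-descent step on the accumulated gradients. A minimal sketch with made-up per-sample gradient values (the real values come from the table):

eta = 0.3
w = 1.0                                   # current value of w_{1,1}^(3)
grads = [0.1, -0.05, 0.2, 0.0, -0.05]     # hypothetical per-sample dDiv/dw values
w_new = w - eta * sum(grads)              # one update after processing all 5 samples
print(w_new)                              # 1 - 0.3 * 0.2 = 0.94 for these made-up values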
Q11. Which of the following functions can be used as an activation function in the output layer if we wish to
predict the probabilities of n classes (p1,p2,...,pn) such that the sum of p over all n outputs is equal to 1?
(a) Softmax
(b) ReLu
(c) Sigmoid
(d) Step function
(e) Tanh
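
For reference, a minimal softmax sketch showing that the outputs are positive and sum to 1:

import math

def softmax(zs):
    m = max(zs)                            # subtract max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))                   # three probabilities summing to 1.0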
Q12. Given the function f(x1, x2) = (x1 − 3)² + (x2 + 1)² + 2(x1 + 1) − 8(x2 − 2), applying the gradient
descent algorithm to minimize the function starting with the initial solution (x1, x2) = (1, 1) using a learning
rate η = 0.3 will result in the following next value of (x1, x2), rounded to one decimal place:
(a) x1 =0.4,x2 =−1.2
(b) x1 = 0.4,x2 = 2.2
(c) x1 =1.6,x2 =−1.2
(d) x1 = 1.6,x2 = 2.2
(e) None of the above
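
Assuming the function is f(x1, x2) = (x1 − 3)² + (x2 + 1)² + 2(x1 + 1) − 8(x2 − 2) as reconstructed above, one gradient-descent step from (1, 1) can be checked directly:

eta = 0.3
x1, x2 = 1.0, 1.0
df_dx1 = 2 * (x1 - 3) + 2       # partial derivative w.r.t. x1 -> -2 at (1, 1)
df_dx2 = 2 * (x2 + 1) - 8       # partial derivative w.r.t. x2 -> -4 at (1, 1)
x1_new = x1 - eta * df_dx1      # 1 - 0.3 * (-2) = 1.6
x2_new = x2 - eta * df_dx2      # 1 - 0.3 * (-4) = 2.2
print(round(x1_new, 1), round(x2_new, 1))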

Q13. Suppose you are training a neural network using the gradient descent algorithm to minimize a given loss
function f. Consider that based on the current iteration, the gradient of f with respect to a weight parameter
w1 (of a certain connection in the neural network) is positive and the gradient of f with respect to another
weight parameter w2 (of a different connection in the neural network) is negative. Which of the following
statements is correct regarding the updates of w1 and w2:

(a) Increasing w1 will decrease the loss, increasing w2 will decrease the loss
(b) Decreasing w1 will increase the loss, increasing w2 will decrease the loss
(c) Increasing w1 will increase the loss, increasing w2 will increase the loss
(d) Increasing w1 will increase the loss, increasing w2 will decrease the loss
(e) Increasing w1 will increase the loss, decreasing w2 will decrease the loss
Q14. Assuming a simple perceptron: output = 1 if Σi wi xi + b ≥ 0, the weights and biases for implementing the
equation Y = ¬(x1 ∧ x2 ∧ x3) using a single perceptron as shown below are: (Hint: use De Morgan's Law)

(a) w1 =−1,w2 =−1,w3 =−1,b=2


(b) w1 =−1,w2 =−1,w3 =−1,b=3
(c) w1 =1,w2 =1,w3 =1,b=−2
(d) w1 =1,w2 =1,w3 =1,b=−3
(e) None of the above
Q15. Assuming the use of a single simple perceptron: output = 1 if Σi wi xi + b ≥ 0, the used weights and
biases that detect the decision boundary below the given line (i.e., output = 1 for the area under the line) are:

(a) w1 =−1,w2 =−1,b=−2


(b) w1 =−1,w2 =−1,b=2
(c) w1 =1,w2 =1,b=2
(d) w1 =1,w2 =1,b=−2
(e) None of the above
Use the following text for questions 16 to 18:
The network below is a neural network with inputs x(1) and x(2), and outputs y1 and y2. The neural
network architecture is shown below. All variables are scalar values. Note that ReLU(x) = max(0, x).

The expressions for the internal nodes in the network are given here for convenience:

Q16. What are the values of z(2), s, and y1?

Q17. Compute the following gradients


(a) 4, 1, and 1
(b) 7, 1, and 0
(c) 2, 1, and 0
(d) -10, 0, and 1
(e) 2, 0, and 1

Q18. Compute the following gradient


(a) 1
(b) 0
(c) 5
(d) 4
(e) 3

Q19. Given a feed forward fully connected neural network with 5 inputs, one hidden layer with 10
neurons and two outputs, the number of parameters to be learned by training the neural network is:
(Consider both weights and biases in your calculation)
(a) 82
(b) 50
(c) 70
(d) 60
(e) 80
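
The count can be checked mechanically: each fully connected layer contributes (inputs × outputs) weights plus one bias per output neuron. A small sketch:

def count_params(layer_sizes):
    # layer_sizes = [inputs, hidden..., outputs] of a fully connected network
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out      # weights plus biases of this layer
    return total

print(count_params([5, 10, 2]))            # (5*10 + 10) + (10*2 + 2) = 82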
Use the following figure for questions 20 to 21:

Q20. Assuming the use of a single simple perceptron where the output is equal to 1 if Σi wi xi + b ≥ 0,
what will be the correct values of the weights for this perceptron that classifies any point to the left of
line AB as 1:
(a) w2 =−1,w1 =1,b=0
(b) w2 =0,w1 =−1,b=1
(c) w2 =1,w1 =1,b=0
(d) w2 =1,w1 =0,b=1
(e) None of the above

Q21. An AI designer came up with the following MLP to classify the area inside the triangle ABC where
the following perceptrons are used:

i. Perceptron P1 for line AB,


ii. Perceptron P2 for line BC, and
iii. Perceptron P3 for line AC.

What will be the value of b1, b2 and b3 for the MLP to work properly:
(a) b1 =1,b2 =1 and b3 =1
(b) b1 =0,b2 =1 and b3 =0
(c) b1 =1,b2 =1 and b3 =0
(d) b1 =0,b2 =0 and b3 =1
(e) None of the above

Q22. Suppose that we would like to design a Multi-layer Perceptron MLP network to detect all points in
the shaded area shown below, the number of perceptrons needed to achieve 100% accuracy is: (Hint:
neurons could be shared)

(a) 8
(b) 13
(c) 9
(d) 12
(e) 10
Q23. Given the MLP network below where X, Y, Z, W are Boolean inputs and F is the output which
represents a Boolean function F(X,Y,Z,W). Each perceptron uses a threshold activation function such that
the output = 1 if Σi wi xi ≥ T. For each perceptron the threshold value is indicated as an integer within the
perceptron. Which of the following Boolean functions, F, is represented by the given MLP?

(a) F = ¬((¬X∧¬Y)∧(X∧Z))∨((Y ∨ ¬Z ∨ W )∧¬(¬Z∧W))


(b) F = ((X∨Y)∧((X∨Z)))∧((Y ∧ Z ∧ W )∨(Z∨W))
(c) F = ((¬X∧¬Y)∧((X∧ Z))∨((Y ∨ ¬Z ∨ W )∧¬(¬Z∧ W))
(d) F =((¬X∨¬Y)∧(X∨Z))∧((Y ∧¬Z∧W)∨¬(¬Z∨W))
(e) F =¬((X∨Y)∧(X∧¬Z))∧((Y ∨¬Z ∨ W)∨(Z∧W))

Q24. What logic gate is implemented by the following MLP with the given weights and thresholds?

(a) y = X1 ∨ X2
(b) y = ¬ (X1 ⊕ X2)
(c) y = ¬ (X1 ∧ X2)
(d) y = X1 ∧ X2
(e) y = X1 ⊕ X2

Q25. An Artificial Neural Network (ANN) has the shown architecture and the given weights and biases:

Assume that all neurons use the ReLU activation function, which is given by ReLU(x) = max(0, x).
Which of the following are the correct values of the output y when the input x = 3 and x = -8, respectively?
(a) y=3,y=-8
(b) y=6,y=-16
(c) y=-3,y=8
(d) y=3,y=8
(e) y=6,y=16
Q26. Referring to question 25, which of the following functions represent the output y of the given ANN?
(a) y=-x
(b) y=2x
(c) y = x
(d) y = 2|x|
(e) y = min(0,x)

Q27. Consider the below single perceptron with weights W0 = 5, W1 = 1, W2 = 1, and W3 = -2.

What will be the output y, given the inputs X1 = 1, X2 = 2, and X3 = 4 and the activation
function is sigmoid, which is given by: f(z) = 1 / (1 + e^(−z))
(a) 11
(b) 0.5
(c) 0
(d) 1
(e) -0.5
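
A quick check of this computation, assuming W0 is the bias added directly to the weighted sum:

import math

w0, w1, w2, w3 = 5, 1, 1, -2
x1, x2, x3 = 1, 2, 4
z = w0 + w1 * x1 + w2 * x2 + w3 * x3      # 5 + 1 + 2 - 8 = 0
y = 1.0 / (1.0 + math.exp(-z))            # sigmoid(0) = 0.5
print(z, y)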

Q28. We aim to design an MLP network to classify points within the decision boundary shown below,
assuming a simple perceptron where the output = 1 if Σi wi xi ≥ T.

The MLP must classify points inside the given shape versus points that are outside. Note that the
coordinates of the points on the given shape are: p1=(-1,2), p2=(-2,1), p3=(1,1) and p4 =(2,2).
The designed MLP architecture is shown below. The MLP shows the assigned values for some weights and
some thresholds. It is also assumed that values of all the biases in the MLP are set to zero. However,
weights w2, w3, w4, w8 and thresholds T1, T4 and T5 are missing.

To correctly classify the decision boundary above, values of the missing weights w2, w3, w4, w8 and
thresholds T1, T4 and T5 (shown in the MLP) must be:

(a) w2 =1,w3 =1,w4 =1,w8 =1,T1 =2,T4 = -1 and T5 =4


(b) w2 =-1,w3 =0,w4 =1,w8 =-1,T1 =3,T4 = 2 and T5 =1
(c) w2 =-1,w3 =0,w4 =1,w8 =-1,T1 =-3,T4 = 2 and T5 =2
(d) w2 =-1,w3 =0,w4 =1,w8 =-1,T1 =-3,T4 = -2 and T5 =4
(e) w2 =-1,w3 =-1,w4 =0,w8 =0,T1 =-3,T4 = -2 and T5 =4
Q29. Which of the following statements is TRUE about the backpropagation algorithm in neural networks?
(a) It is used for neural networks learning in which the error is transmitted back through the network to
allow weights and biases to be adjusted.
(b) It is another name given to the activation functions used in perceptrons.
(c) It is used for neural networks learning in which the forward pass involves computing partial derivatives
of an error function with respect to the network parameters.
(d) It is used for dynamically changing number of hidden layers in the neural network depending on the
gradients computations.
(e) All of the above statements are true.

Q30. A Neural Network Architecture is defined by:


(a) the number of layers in the network
(b) the number of neurons in the hidden layers
(c) how neurons are connected
(d) the number of neurons in the input and output layers
(e) all of the above

Q31. In the Gradient Descent algorithm, a learning rate (λ) is selected to control how fast or slow the
algorithm reaches the minimum. Given an optimal step size (λopt = 0.2), which of the following statements
is TRUE about λ?
(a) If λ = 0.4, then the algorithm will converge in one step.
(b) If λ = 0.05, then the algorithm will converge quickly.
(c) If λ = 0.2, then the algorithm will oscillate and may converge after many steps.
(d) If λ = 0.6, then the algorithm will diverge.
(e) None of the above statements is true.

Q32. What Boolean function is implemented by the below single perceptron with the given weights W1 =
5, W2 = 6, Bias = -7 and Threshold (0)?

(a) 2 input XOR gate


(b) 2 input AND gate
(c) 2 input OR gate
(d) 2 input OR gate
(e) NOT gate
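
The truth table can be enumerated to see which gate these parameters realize (output = 1 when the weighted sum plus bias meets the threshold of 0):

w1, w2, bias, T = 5, 6, -7, 0
for x1 in (0, 1):
    for x2 in (0, 1):
        s = w1 * x1 + w2 * x2 + bias
        print(x1, x2, int(s >= T))        # fires only for (1, 1): a 2-input AND gate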

Q33. What Boolean function is implemented by the following MLP network with the given weights and
thresholds?

(a) 2 input XOR gate


(b) 2 input AND gate
(c) 2 input OR gate
(d) 2 input OR gate
(e) None of the above
Q34. Suppose we have a fully connected Neural Network with the following given architecture:
*Number of inputs in the input layer is 2
*Number of hidden layers is 3
*Number of perceptrons in each hidden layer is 4 Number of perceptrons in the output layer is 1
What will be the total number of weights and biases that we will need to train?
(a) 15
(b) 45
(c) 57
(d) 13
(e) 29

Q35. Given the MLP architecture below where X, Y, Z and W are the inputs. The output F represents a
Boolean function F(X,Y,Z,W). Each perceptron uses a Threshold activation function with a threshold value
indicated as an integer within each perceptron. Which of the following Boolean functions, F, is represented
by the given MLP architecture?

(a) F =¬(X∨Y)∨(X∧¬Z)∨((Y ∨¬Z∨W))∧((Z∧W))


(b) F =(X∨Y)∨(X∧Z)∨((Y ∨Z∨W))∧((Z∧W))
(c) F =((X∨Y)∧((X∧¬Z)))∨((Y ∨W))∨((X∧Z∧W))
(d) F =¬((X∨Y)∧((X∧¬Z)))∧((Y ∨¬Z∨W))∨((Z∧W))
(e) F =((X∨Y)∧((X∨¬Z)))∨((Y ∨W)∨¬(X∧Z∧W)).

Q36. In neural networks, nonlinear activation functions such as sigmoid and ReLU __________.
(a) always output binary values; either 0 or 1
(b) are the only activation functions used in neural networks
(c) help to learn nonlinear decision boundaries
(d) always output real values between 0 and 1
(e) speed up the gradient calculation in backpropagation, as compared to linear functions

Q37. The Gradient Descent algorithm helps neural networks to reduce the output error by adjusting its
parameters, i.e. weights and biases. Based on the gradient sign, which of the following statements is NOT
true?
(a) Regardless of the gradient sign, decreasing the weight will always decrease the error.
(b) If the gradient is negative, decreasing the weight will increase the error.
(c) If the gradient is negative, increasing the weight will decrease the error.
(d) If the gradient is positive, increasing the weight will increase the error.
(e) If the gradient is positive, decreasing the weight will decrease the error.

Q38. What Boolean function is implemented by the following perceptron with the given weights (w1 = w2 =
0.6) and threshold (1)?

(a) OR gate
(b) XOR gate
(c) NOT gate
(d) AND gate
(e) None of the above
Q39. A Boolean function is to be implemented using an MLP with appropriate weights and thresholds.
Below is the truth table of the required Boolean function and the corresponding suggested architecture
of the MLP.

Indicate the correct thresholds, T1, T2 and T3 to implement the function using the shown MLP. Note that T1
and T2 perceptrons are Universal OR gates, while T3 perceptron is a Universal AND gate.
(a) T1=1,T2=-1,T3=2
(b) T1=-1,T2=1,T3=-1
(c) T1=0,T2=0,T3=2
(d) T1=1,T2=-1,T3=1
(e) T1=0,T2=1,T3=-1
Q40. Suppose you run gradient descent on the function f(x) for a few iterations with a step size η = 0.2, and
df/dx is computed after each iteration. You find that the value of df/dx decreases until it reaches a local
minimum at X5. Furthermore, assume that ηopt = 0.3. Based on this, which of the following statements is TRUE:

(a) For step size η > 0.6, the gradient descent can escape the local minimum and attempt to find the global
minimum.
(b) For step size η < 0.3, the gradient descent can guarantee convergence to the global minimum.
(c) If we start with step size η = 1.0 rather than η = 0.2, gradient descent guarantees convergence to the
global minimum.
(d) For any step size η, the gradient descent cannot find the global minimum once it finds the local minimum
at X5.
(e) None of the above.

Q41. Given the MLP architecture below where X,Y,Z and W are the inputs. The output F represents a
Boolean function F(X,Y,Z,W). Each perceptron uses a Threshold activation function with a threshold value
indicated within each perceptron. Which of the following Boolean functions, F, is represented by the given
MLP architecture?

(a) F =((X∨Y)∧((X∨¬Z)))∨((Y ∨W)∨(X∧Z∧W))


(b) F =((X∨Y)∧((X∨¬Z)))∨((Y ∨W)∨¬(X∧Z∧W))
(c) F =(¬(X∨Y)∧(¬(X∨¬Z)))∨(¬(Y ∨W)∨(X∧Z∧W))
(d) F =((¬X∨¬Y)∧((¬X∨Z)))∨((¬Y ∨¬W)∨(X∧Z∧W))
(e) F =((X∧Y)∨((X∧¬Z)))∧((Y ∧W)∨¬(X∨Z∨W))
Q42. Suppose we have a Neural Network with the following given architecture:
1. Number of perceptrons in the input layer is 3
2. Number of hidden layers is 1
3. Number of perceptrons in the hidden layer is 5
4. Number of perceptrons in the output layer is 4
What will be the total number of weights and biases that we will need to train if the Neural network has a
fully connected architecture.
(a) 15
(b) 20
(c) 60
(d) 35
(e) 44

Q43. Which of the following is NOT true about the backpropagation algorithm in neural network?
(a) The forward pass involves computing partial derivatives of an error function with respect to the
network parameters.
(b) The algorithm is used for neural networks learning.
(c) The algorithm uses gradient descent to minimize a divergence function.
(d) Adjusting the neural network parameters depends on the gradients computations.
(e) The activation functions used must be differentiable.

Q44. Consider the below single perceptron with weights W0 = −5,W1 = 2,W2 = −1, and W3 = 3.

What will be the output y, given the inputs X1 = 3, X2 = 2, and X3 = 4 for each of the following activation
functions?

(a) 0 for the Threshold activation function, 1 for ReLU activation function
(b) 11 for the Threshold activation function, 11 for ReLU activation function
(c) 0 for the Threshold activation function, 0 for ReLU activation function
(d) 1 for the Threshold activation function, 11 for ReLU activation function
(e) 1 for the Threshold activation function, 16 for ReLU activation function
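
A quick check of the weighted sum and both activations, assuming W0 is the bias on a constant input of 1:

w0, w1, w2, w3 = -5, 2, -1, 3
x1, x2, x3 = 3, 2, 4
z = w0 + w1 * x1 + w2 * x2 + w3 * x3      # -5 + 6 - 2 + 12 = 11
y_threshold = 1 if z >= 0 else 0          # threshold (step) activation -> 1
y_relu = max(0, z)                        # ReLU activation -> 11
print(z, y_threshold, y_relu)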

Q45. For a convex error function (i.e., with a bowl shape), which of the following is true, given the optimal
learning rate?
(a) Gradient descent is guaranteed to converge to the global minimum.
(b) Gradient descent is always divergent.
(c) Gradient descent may converge or diverge if step size is less than the optimal step size.
(d) Gradient descent is not guaranteed to converge to the global minimum.
(e) None of the above.

Q46. An Artificial Neural Network (ANN) has the shown architecture and the given weights and biases:

Assume that all neurons use the ReLU activation function, which is given by ReLU(x) = max(0, x).
Which of the following are the correct values of the output y when the input x = 5 and x = -5, respectively?
(a) y=5,y=5
(b) y=-10,y=10
(c) y=10,y=-10
(d) y=0,y=0
(e) y=-5,y=5
Q47. Referring to question 46, which of the following functions represent the output y of the given ANN?
(a) y=-x
(b) y=min(0,x)
(c) y = x
(d) y = |x|
(e) y = max(0, x)

Q48. Which of the following is true about Back Propagation?


a. It is used to train a neural network.
b. It uses the gradient decent technique.
c. It uses the chain rule.
d. Error in output is propagated backwards to update the weights and get a better fit.
e. All the given answers are true.

Q49. When using the Gradient Descent algorithm to find the minimum of an error curve,
a. If the derivative is positive, then moving left decreases error.
b. If the derivative is positive, then moving left increases error.
c. If the derivative is negative, then moving left decreases error.
d. If the derivative is negative, then moving right increases error.
e. The derivative does not give any indication and we need to use something else.

Q50. In the Gradient Descent algorithm, a step size (η) is selected to control how fast or slow the algorithm
reaches the minimum. Given an optimal step size (ηopt), which of the following is true about η?
a. If η < ηopt, then the algorithm will converge quickly.
b. If η = ηopt, then the algorithm will oscillate and may converge after many steps.
c. If η > ηopt, then the algorithm will diverge.
d. If η > ηopt, then the algorithm will converge in one step.
e. None of the given answers is correct.

Q51. For a neural network, which one of these structural assumptions is the one that most affects the
trade-off between underfitting and overfitting:
a. The architecture of the neural network
b. The choice of weights with which the network starts
c. The chosen learning rate (η)
d. The size of the output layer
e. The size of the input layer

Q52. If a perceptron has two inputs x1 and x2 that are associated with weights w1 and w2, respectively,
and the bias input to the perceptron is w0 while the activation function is given by the function h(·),
then the output y of the perceptron is given by:
a. y = h(x1 + x2 + w0)
b. y = w1x1 + w2x2 + w0
c. y = h(w1x1 + w2x2 + w0)
d. y = h(w1x1 + w2x2 − w0)
e. y = h(w1x1) + h(w2x2) + h(w0)

Q53. Consider the following figure:

a. Draw a multi-layer perceptron (MLP) that can be customized to classify the shaded area within the
given shape.
b. Indicate on the figure the weights, bias, and threshold values of each perceptron in your figure.

Ans:
Q54. An Artificial Neural Network has the following topology:

Given the following weights and biases:

Suppose that 1 is the input to the last neuron from the upper path while 2 is the input to the
last neuron from the lower path. Fill the following table with the correct values:

Q55. Given the shown single perceptron with a ReLU activation function defined as ReLU(x) = max(0, x):

a. Given W1 = 2, X1 = 4, W2 = -1, X2 = 3, and W0 = -1, use the backpropagation algorithm to fill the following
table:

b. Fill in the blank based on the results in part a:


i. In order to decrease the output (Y), w1 should be ___________ (increased/decreased).
ii. In order to decrease the output (Y), x2 should be ___________ (increased/decreased).
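
The table itself is not reproduced here, but with the given numbers the forward pass and the local gradients can be computed directly; a sketch assuming Y = ReLU(W1·X1 + W2·X2 + W0):

w1, x1, w2, x2, w0 = 2, 4, -1, 3, -1
z = w1 * x1 + w2 * x2 + w0      # 8 - 3 - 1 = 4
y = max(0, z)                   # ReLU output -> 4
# z > 0, so dY/dz = 1 and by the chain rule:
dy_dw1 = x1                     # 4  (positive: decreasing w1 decreases Y)
dy_dx2 = w2                     # -1 (negative: increasing x2 decreases Y)
print(z, y, dy_dw1, dy_dx2)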
Q56. Design an MLP neural network to classify the given decision boundary shown below, assuming a
simple perceptron: output = 1 if Σi wi xi ≥ T. The MLP must classify points inside the given shape versus
points that are outside. Note that the coordinates of the points on the given shape are: p1 = (−1,2), p2 =
(−1,−2), p3 = (0,0), p4 = (1,1), p5 = (2,0), p6 = (1,−1).

a. Draw the architecture of the MLP that can classify data within the shapes shown graph above. Assume
that the value of all the biases in the MLP are set to zero.
b. On the Drawing show the threshold value of each perceptron by writing it inside the perceptron.
c. On the Drawing, clearly show the used weights for each connection in your architecture.
Solution:
The equation of the line for the triangle are:
Line P1-P2: x1 = -1
The decision boundary is x1 ≥-1➔x1 ≥-1 Thus w2=0,w1=1 and T=-1
Line P1-P3: x2 = -2 x1
The decision boundary is x2 ≤-2x1➔-x2 -2x1≥0, Thus w2=-1,w1=-2 and T=0
Line P2-P3: x2 = 2 x1
The decision boundary is x2 ≥2x1➔x2 -2x1≥0,Thus w2=1,w1=-2, and T=0
The equation of the line for the square are:
Line P3-P4: x2 = x1
The decision boundary is x2 ≤x1➔-x2 +x1≥0, Thus w2=-1,w1=1 and T=0
Line P3-P6: x2 = -x1
The decision boundary is x2 ≥-x1➔x2 +x1≥0, Thus w2=1,w1=1, and T=0
Line P4-P5: x2 = -x1+2
The decision boundary is x2 ≤-x1+2➔-x2 -x1 ≥-2, Thus w2=-1,w1=-1 and T=-2
Line P5-P6: x2 = x1-2
The decision boundary is x2 ≥x1-2➔x2 -x1≥-2, Thus w2=1,w1=-1 and T =-2
Thus, the MLP is:
Q57. Assuming a simple perceptron: output = 1 if Σi wi xi + b ≥ 0, the weights and biases for implementing the
equation Y = x1 ∧ ¬x2 ∧ x3 using a single perceptron as shown below should be equal to

(a) w1=1, w2=-1, w3=1, b=2


(b) w1=1, w2=-1, w3=1, b=-2
(c) w1=1, w2=1, w3=1, b=2
(d) w1=-1, w2=1, w3=-1, b=-2
(e) None of the above

Explanation:
Non-inverted inputs should have a weight of 1 while inverted inputs should have a weight of -1. The
threshold for implementing an AND operation is equal to the number of non-inverted inputs = 2. The bias is
equal to -ve of threshold = -2.
Q58. Assuming a simple perceptron: output = 1 if Σi wi xi + b ≥ 0, the weights and biases for implementing
the equation Y = x1 ∨ ¬x2 ∨ ¬x3 using a single perceptron as shown below should be equal to

(a) w1=1, w2=-1, w3=-1, b=1


(b) w1=1, w2=-1, w3=-1, b=-1
(c) w1=-1, w2=1, w3=1, b=1
(d) w1=-1, w2=1, w3=1, b=-1
(e) None of the above

Explanation:
Non-inverted inputs should have a weight of 1 while inverted inputs should have a weight of -1. The
threshold for implementing an OR operation is equal to 1 − (number of inverted inputs) = 1 − 2 = −1. The bias
is equal to the negative of the threshold = 1.
Q59. Assuming a simple perceptron: output = 1 if Σi wi xi + b ≥ 0, the used weights and biases that detect
the decision boundary below the given line (i.e., output = 1 for the area under the line) should be equal to

(a) w1 = -1, w2 = -1, b = -1
(b) w1 = 1, w2 = 1, b = -1
(c) w1 = 1, w2 = 1, b = 1
(d) w1 = -1, w2 = -1, b = 1
(e) None of the above
Explanation:
Equation of the line: x2 = m·x1 + c
m = (-2 - 1)/(1 - (-2)) = -3/3 = -1, so x2 = -x1 + c.
Substituting the point (1, -2): -2 = -1 + c → c = -1. Thus, the equation of the line is x2 = -x1 - 1.
Thus, the decision boundary is x2 ≤ -x1 - 1 → 0 ≤ -x1 - x2 - 1.
Thus, w1 = -1, w2 = -1, b = -1.
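
The weights can also be derived programmatically from two points on the boundary line, following the same steps as the explanation above:

p, q = (1, -2), (-2, 1)                 # two points on the line (from the explanation)
m = (q[1] - p[1]) / (q[0] - p[0])       # slope = 3 / -3 = -1
c = p[1] - m * p[0]                     # intercept = -2 - (-1)(1) = -1
# region under the line: x2 <= m*x1 + c  ->  m*x1 - x2 + c >= 0
w1, w2, b = m, -1, c
print(w1, w2, b)                        # -1.0, -1, -1.0  (option a)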
Q60. Suppose that we would like to design a Multi-layer Perceptron (MLP) network to detect all points inside
the concave pentagon shape given below, the minimum number of perceptrons needed to achieve 100%
accuracy is: (Hint: perceptrons can be shared)

(a) 5
(b) 6
(c) 9
(d) 10
(e) 11

Explanation:
The shape needs to be divided into 3 triangles: DAB, DBE and ECB. 3+1 perceptrons are needed for the
three boundaries and for implementing AND operation of triangle DAB, 3+1 perceptrons are needed for the
three boundaries and for implementing AND operation of triangle ECB, and 1+1 perceptrons are needed for
the boundary DE and for implementing AND operation of triangle DBE. Note that the boundaries DB and EB
are shared from those used for triangles DAB and ECB. A final perceptron is needed for the OR operation.
Thus, the total number of perceptrons needed is 4+4+2+1=11.

Q61. Consider the logic function Y defined by the given truth table. The inputs are x1, x2, and x3. It is
required to implement this function, directly from truth table, using an MLP that has a single hidden layer of
neurons that implement AND operations. Then the number of total neurons needed in the entire MLP is
equal to

(a) 6
(b) 4
(c) 5
(d) 7
(e) None of the above

Explanation:
The truth table has 5 ones and 3 zeros. For a single-hidden layer of AND operation implementation, we need
5 neurons to implement 5 AND gates plus 1 neuron to implement OR operation in the output layer.
Old Quizzes Questions

Q1. The correct notations for the shown quantities (in order): a, b, c, and d are:
Hint: these quantities may be weights of arcs or outputs of neurons, or even input

Q2. For the above NN, the numbers (α, β, γ), defined as: α is the number of hidden layers, β is the number of
W coefficients to be trained, and γ is the maximum number of activation functions to be used, is given by?
a) (4, 48,13).
b) (4, 36,12).
c) (3, 48,13).
d) (3, 36,12).
e) None of the above

Q3. It is required to minimize the function f(x) = x² − 2x + 3 using the Gradient Descent (GD) algorithm
specified in class. Assume initial X(0) = 2. If x* is the input at which f(x) attains its global minimum,
then the pair (x*, f(x*)) is given by: (Hint: you are required to minimize f(x). Solve df/dx = 0 to find the
exact/analytic solution.)
a) (1,0)
b) (0,2)
c) (2,1)
d) (1,2)
e) None of the above
Solution: f′(x) = 2x − 2 = 0 → x* = 1 and f(1) = 1² − 2(1) + 3 = 2
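
A short gradient-descent sketch for this function starting from X(0) = 2 (learning rate chosen here as 0.1 for illustration), converging toward the analytic minimum x* = 1:

def f(x):  return x**2 - 2*x + 3
def df(x): return 2*x - 2

x, eta = 2.0, 0.1                    # initial point and an assumed learning rate
for _ in range(50):
    x -= eta * df(x)                 # gradient-descent update
print(round(x, 4), round(f(x), 4))   # approaches (1, 2)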

Q4. The following activation function is the most computationally efficient


a) binary-step function
b) sigmoid function
c) ReLU function
d) tanh function
e) softmax function

Q5. The following activation function is the most widely used in the last layer of NN
a) binary-step function
b) sigmoid function
c) tanh function
d) ReLU function
e) softmax function

Q6. The following activation function is the most widely used in hidden layers of NN
a) binary-step function
b) sigmoid function
c) tanh function
d) ReLU function
e) softmax function
Q7. Comparing the two activation functions: sigmoid and tanh, select the most appropriate statement:
a) sigmoid is preferred over tanh since it has a probabilistic interpretation of output and does not suffer from
the vanishing gradient problem.
b) tanh is preferred over sigmoid since it has +ve and -ve outputs and does not suffer from the vanishing
gradient problem.
c) sigmoid and tanh are both differentiable and do not suffer from the vanishing gradient problem.
d) sigmoid and tanh are non-differentiable and both suffer from the vanishing gradient problem.
e) None of the above

Q8. Comparing the two activation functions: sigmoid and tanh, select the most appropriate statement:
a) sigmoid is preferred over tanh since it has a probabilistic interpretation of output and does not suffer from
the vanishing gradient problem.
b) tanh is preferred over sigmoid since it has +ve and -ve outputs and does not suffer from the vanishing
gradient problem.
c) sigmoid and tanh are both differentiable and do not suffer from the vanishing gradient problem.
d) sigmoid and tanh are both differentiable and both suffer from the vanishing gradient problem.
e) None of the above

Q9. The vanishing gradient problem, refers to the fact that:


a) gradient is not defined for a range of input.
b) gradient is extreme (+ve or -ve values) for most range of input.
c) gradient is roughly zero for most range of input.
d) gradient is not constant with respect to input.
e) None of the above

Q10. The vanishing gradient problem, refers to the fact that:


a) gradient is not monotonic (i.e. all increasing or all decreasing) for range of input.
b) gradient is not defined for a range of input.
c) gradient is extreme (+ve or -ve values) for most range of input.
d) gradient is not constant with respect to input.
e) None of the above

Q11. The following is an example of one-hot representation:


a) [1 0 0], [0 1 0], and [0 0 1]
b) [1 0 0], [0 0 1], and [1 1 1]
c) [1 1 0], [1 0 1], and [0 1 1]
d) [1 0 0], [1 1 0], and [1 1 1]
e) [0 0 0], [0 1 0], and [1 0 1]

Q12. The following is an example of one-hot representation:


a) [1 0 0], [0 0 1], and [1 1 1]
b) [1 0 1], [0 1 0], and [0 0 1]
c) [1 1 0], [1 0 1], and [0 1 1]
d) [1 0 0], [0 1 0], and [1 0 0]
e) [0 0 0], [0 1 0], and [1 0 1]

Q13. The following is an example of one-hot representation:


a) [1 0 0], [0 0 1], and [1 1 1]
b) [1 1 0], [0 1 0], and [0 0 1]
c) [0 1 0], [1 0 0], and [0 1 0]
d) [1 0 0], [1 1 0], and [1 1 1]
e) [0 0 0], [0 1 0], and [1 0 1]
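
For reference, a one-hot vector has exactly one element equal to 1 and all others equal to 0; a minimal encoding sketch:

def one_hot(index, length):
    v = [0] * length
    v[index] = 1
    return v

print([one_hot(i, 3) for i in range(3)])   # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]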

Q14. The correct notations for the shown quantities (in order): a, b, c, and d are: Hint: these quantities
may be weights of arcs or outputs of neurons, or even input
Q15. The correct notations for the shown quantities (in order): a, b, c, and d are: Hint: these quantities may
be weights of arcs or outputs of neurons, or even input

Q16. It is required to minimize the function f(x) = 2x² − 4x + 5 using the Gradient Descent (GD) algorithm
specified in class. Assume initial X(0) = 2. If x* is the input at which f(x) attains its global minimum,
then the pair (x*, f(x*)) is given by: (Hint: you are required to minimize f(x). Solve df/dx = 0 to find the
exact/analytic solution.)
a) (1,0)
b) (0,2)
c) (2,3)
d) (1,4)
e) None of the above
Solution: f′(x) = 4x − 4 = 0 → x* = 1 and f(1) = 2(1)² − 4(1) + 5 = 3

Q17. It is required to minimize the function f(x) = x² − 4x + 3 using the Gradient Descent (GD) algorithm
specified in class. Assume initial X(0) = 2. If x* is the input at which f(x) attains its global minimum,
then the pair (x*, f(x*)) is given by: (Hint: you are required to minimize f(x). Solve df/dx = 0 to find the
exact/analytic solution.)
a) (1,0)
b) (1,4)
c) (0,2)
d) (2,3)
e) None of the above
Solution: f′(x) = 2x − 4 = 0 → x* = 2 and f(2) = 2² − 4(2) + 3 = −1

Q18. Design a single hidden layer MLP to model the logical expression given below (without applying
De Morgan's law), assuming a simple perceptron: output = 1 if Σi wi xi ≥ T. Clearly show the used weights and
biases.
Y = (¬A ∧ B ∧ ¬D) ∨ ¬(C ∧ D)
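
One possible design (a sketch, not the only valid answer): one hidden perceptron per term, with weight +1 for non-inverted literals and -1 for inverted ones. The first hidden unit is an AND of ¬A, B, ¬D (threshold = number of non-inverted literals = 1), the second implements ¬(C ∧ D) (threshold -1, so it fires unless both C and D are 1), and the output perceptron is an OR (threshold 1):

from itertools import product

def perceptron(inputs, weights, T):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= T else 0

def mlp(A, B, C, D):
    h1 = perceptron((A, B, D), (-1, 1, -1), 1)   # (not A) and B and (not D)
    h2 = perceptron((C, D), (-1, -1), -1)        # not (C and D)
    return perceptron((h1, h2), (1, 1), 1)       # OR of the two hidden units

# exhaustive check against the Boolean expression
for A, B, C, D in product((0, 1), repeat=4):
    target = int(((not A) and B and (not D)) or not (C and D))
    assert mlp(A, B, C, D) == target
print("all 16 cases match")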
Q19. Assuming the use of a single simple perceptron: output = 1 if Σi wi xi + b ≥ 0. Clearly show the used
weights and biases that detect the decision boundary below the given blue line (i.e., output = 1 for the area under
the blue line).

Equation of the line x2 = mx1 + c


m = (-2-1)/(1-(-2)) = -3/3 = -1
x2 = -x1 + c
Substituting the point (1, -2) => -2 = -1 + c => c = -1 Thus, equation of the line x2 = -x1 - 1
Thus, the decision boundary is x2 <= -x1 - 1 => 0<=-x1-x2-1
Thus, w1=-1, w2=-1, b=-1

Q20. Suppose that we would like to design an MLP to detect all points inside the octagon shape given
below, determine the number of perceptrons needed to achieve 100% accuracy. Justify your answer.

We need one perceptron to detect each of the eight decision boundaries. Then we need one perceptron to
perform AND operation to combine the results of the 8 perceptrons. Thus, we need a total of 9 perceptrons.

The following figure is for questions 21 and 22:

Q21. Given the figure above, what should be the value of T so that the neuron represents a Boolean function
of AND:
A. 0
B. 5
C. 3
D. 6
E. None of the above

Q22. Given the figure above, what should be the value of T so that the neuron represents a Boolean function
of OR:
A. 0
B. 5
C. 3
D. 6
E. None of the above
Q23. If we want to use one hidden layer MLP to model the given Boolean function in the table, how many
rows from the given truth table will contribute towards the construction of the MLP:
A. 1
B. 2
C. 3
D. 4
E. None of the above

Q24. If we want to use one hidden layer MLP to model the given Boolean function in the table, choose the
best expression that can model the given behavior:
A. +
B. 1 2+ 1 2
C. +
D. ̅ +
E. +

Q25. Assuming the use of a single simple perceptron: output = 1 if Σi wi xi + b ≥ 0. What will be the used
weights and biases that detect the decision boundary below the given line (i.e., output = 1 for the area under the
below line).
A. w2 = −2, w1 = −1, b = 0
B. w2 = −1, w1 = −1, b = 0
C. w2 = −2, w1 = −2, b = 0
D. w2 = −1, w1 = −2, b = 0
E. None of the above

Q26. The Sigmoid function is defined as: f(z) = 1 / (1 + e^(−z))

If a very deep neural network utilizes the sigmoid activation function in all its hidden layers, then training of
this neural network becomes difficult because:
A. The Sigmoid is a non-linear function which makes it very difficult to take the derivative of the function to
find the slope of the gradient.
B. The value of the output gets very large with increasing value of the input and the network might be
overtrained.
C. The output of some neurons might be negative that makes training of the neural network harder.
D. The Sigmoid function is non-differentiable at zero that makes training of the network hard if the input to
any neuron is zero.
E. The output of all the neurons will be of the same sign and hence even if the contribution of this neuron is
not significant, we cannot ignore it.

Consider a single perceptron shown below with the Tanh activation function, answer questions 27 to 30

Q27. If we perform the forward propagation, then the correct values placed in the locations of a, b, c, and d are:
A. a = 2, b = -2, c = 0, d = 0
B. a = -2, b = -2, c = 0, d = 1
C. a = 0, b = 1, c = -2, d = 0
D. a = 2, b = 2, c = 1, d = -1
E. None of the above

Q28. The value of e is equal to:


A. Equal to d
B. 0
C. 1
D. -1
E. None of the above
Q29. The value of f is equal to:
A. 0
B. 1
C. 1.4
D. 2
E. None of the above

Q30. Which of the following statements is true if we want to update the weights to lower the overall error:
A. The value of w1 should be increased while w2 should be decreased
B. Both values should be increased
C. Both values should be decreased
D. Changing the values will have no effect on the error.
E. The value of w1 should be decreased while w2 should be increased

Q31. The shown Multi-Layer Perceptron (MLP) has two perceptrons, p1 and p2, in the hidden layer and one
perceptron, p3, in the output layer. The activation function for p1 and p2 is ReLU, which is given by
ReLU(x) = max(0, x), while p3 has a threshold activation such that the output = 1 if Σi wi xi + b ≥ T, where T = 3.
The MLP has six weights w1, w2, w3, w4, w5 and w6 and two Boolean inputs x1 and x2.

(a) For each possible value of the inputs x1 and x2, fill the table below. Assume (w1 = w4 = w5 = w6 = 1), (w2
= w3 = 2) and the bias value for all perceptrons is set to zero.

(b) Based on your answer in (a), which Boolean function is implemented by the above MLP?
i. XOR; Y = x1 ⊕ x2
ii. NOT; Y = ¬x1
iii. AND; Y = x1 ⋅ x2
iv. OR; Y = x1 + x2

Q32. Given a fully connected Neural Network with 5 inputs in the input layer, one hidden layer with 3
perceptrons, and one perceptron in the output layer. How many weights and biases will we need to train?
(a) 9
(b) 60
(c) 18
(d) 22
Solution: (5×3 + 3×1) = 18 weights + 4 biases = 22

Q33. Suppose we designed the shown MLP network to detect all points within the shaded area given below
with 100% accuracy, assuming a simple perceptron where the output = 1 if Σi wi xi + b ≥ T (in this MLP all
bias values are set to 0). Note that the coordinates of the points of the given shape are: p1 = (0,1), p2 = (2,1),
p3 = (1,0), p4 = (0,−1), p5 = (2,−1).

(a) To correctly detect all points within the shaded area, values of the weights, w12, w22, and threshold T2
(as shown in the MLP) must be:
i. w12=1,w22=1, and T2 =-1.
ii. w12=-1,w22=1, and T2 =-1.
iii. w12=1,w22=1, andT2 =1.
iv. w12=-1,w22=1, and T2 =1.
NOTE: w12, w22, and T2 are associated with the area of the area above p2-p3.
(b) To correctly detect all points within the shaded area, values of thresholds T7 , T8 and T9 (as shown in the
MLP) must be:
i. T7 =3,T8 =3 and T9 =1.
ii. T7 =-1,T8 =-1 and T9 =1.
iii. T7 =1,T8 =-1 and T9 =2.
iv. T7 =3,T8 =3 and T9 =2.

Q34. Given the shown single perceptron with the ReLU activation function given by ReLU(x) = max(0, x).
Suppose X1 = 2, w1 = -5, X2 = 3, w2 = 4 and w0 = 1. Compute the values of z, y, dy/dw0, and dy/dw1.
Fill the table below.
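
With the given numbers, the table entries can be computed directly; a sketch assuming y = ReLU(w1·X1 + w2·X2 + w0):

x1, w1, x2, w2, w0 = 2, -5, 3, 4, 1
z = w1 * x1 + w2 * x2 + w0      # -10 + 12 + 1 = 3
y = max(0, z)                   # ReLU output -> 3
dz = 1 if z > 0 else 0          # derivative of ReLU at this z
dy_dw0 = dz * 1                 # 1
dy_dw1 = dz * x1                # 2
print(z, y, dy_dw0, dy_dw1)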

Q35. The 4-input perceptron (figure) has weights w1, w2, w3, w4, and a bias b0. The
activation function is the Sigmoid function, given by f(z) = 1 / (1 + e^(−z))

A. Write an expression for the output y of the perceptron.

B. Given w1=1, w2=2, w3=3, w4=4, b0=1, x1= -4, x2 =4, x3=1and x4= -2, what would be the output y?

Q36. In gradient descent, if the cost (error) function is convex, then it converges to a ..................
a) global maximum
b) global minimum
c) local minimum
d) local maximum

Q37. Given a neural network with 3 perceptrons in the input layer, 1 hidden layer with 5 perceptrons and 4
perceptrons output layer. What will be the total number of weights and biases that we will need to train if the
Neural network has a fully connected architecture?
a) 15
b) 20
c) 44
d) 35
Q38. Given the following MLP with 2 inputs x and y. We are required to find the area it represents on the 2D
space. Find the equations of each of the lines represented by the 4 perceptrons P1, P2, P3, and P4.

B. On the two-dimensional graph, represent the area defined by the given multilayer perceptron.

Q39. Which Boolean function F(X, Y, Z) is implemented by the following perceptron with the given weight
and threshold values?

Q40. Which of the following statements is true about the perceptron algorithm?
a) Given a linearly separable dataset, the perceptron algorithm is guaranteed to find a max-margin
hyperplane.
b) If while running the perceptron algorithm we make one pass through the data and make a single
classification mistake, the algorithm has converged.
c) A perceptron is guaranteed to learn a separating decision boundary for a separable dataset within a finite
number of training steps.
d) A single perceptron can compute the XOR function.
d) A single perceptron can compute the XOR function.

Q41. Which of the following statements is true about the backpropagation algorithm?
a) It is another name given to the activation functions used in perceptrons.
b) It is used during the learning phase in which the error is transmitted back through the network to allow
weights and biases to be adjusted.
c) It allows dynamically changing the number of hidden layers in the network based on computations.
d) It is used for the learning phase in which the forward pass involves computing partial derivatives of an
error function with respect to the network parameters.

Q42. The Maximum Margin Classifier is the classifier that


a) finds the smallest distance among support vectors of different classes
b) finds some arbitrary line that can classify different sets of data.
c) uses a kernel to find a good line that separates different classes.
d) finds the widest distance among support vectors of different classes.
Q43. The shown 4-input perceptron has weights w1, w2, w3, w4 of 1, 2, 3, 4, respectively, and a bias b = 1.
The activation function is set as the Sigmoid function which is given by f(z) = 1 / (1 + e^(−z))

(a) In terms of inputs, weights and bias, write an expression for calculating z, i.e. the input of activation
function.

z = Σi wi xi + b = x1 + 2x2 + 3x3 + 4x4 + 1

(b) Given the inputs x1, x2, x3 and x4 as, -4, 4, 1 and -2, then the output, y, of this perceptron will be:
i. -1
ii. 0
iii. 0.5
iv. 1/e

Q44. Given a fully connected Neural Network with 11 inputs in the input layer, 5 perceptrons in one
hidden layer and 1 perceptron in the output layer. How many weights and biases does it contain?
(a) 17
(b) 55
(c) 66
(d) It is an arbitrary value

Q45. Which Boolean function is implemented by the following perceptron with the given weights (w1 =
w2 = 20), a bias of -30 and a threshold value of -10?
(a) XOR gate; Y = x1 ⊕ x2
(b) AND gate; Y = x1 ⋅ x2
(c) OR gate; Y = x1 + x2
(d) NOT gate; Y = ¬x1

Q46. Which Boolean function F(x1, x2) is implemented by the following perceptron with the following
weights: (w1 = w2 = 20), a bias of -30 and a threshold value of -10?
Solution:
OR Gate
Q47. Consider the following figure. Draw a multi-layer perceptron (MLP) that can be used to classify the
points within the shaded area. Indicate on the figure the weights, biases, and threshold values of each
perceptron.

Q48. Design an MLP neural network to classify the given decision boundary shown below, assuming a
simple perceptron: output = 1 if Σi wi xi ≥ T. The MLP must classify the points inside the given shape
versus points that are outside. Identify the value of the weights and the thresholds for each perceptron and
assume the bias is always set to zero.

Solution:
For the line segment connecting points (0,-1) and (1, 1), the equation of the line is:
x2 = 2x1 − 1 → x2 − 2x1 = −1. Let's find which side of this line the points of the grey region lie in.
• Consider a point from the grey region, e.g., (0,1).
• Substitute x1 = 0, x2 = 1 in the above equation and compare the value of the left-
and right-hand sides of the equation.
• We get x2 − 2x1 = 1, which is ≥ −1.
• Hence, we now know that the grey region is represented by x2 − 2x1 ≥ −1.
Thus: w2 = 1, w1 = -2 and T = -1.
Repeating the same steps for the line segment connecting (0,-1) and (-1, 1) we get: w2 = 1, w1 = 2 and T = -1.
Finally, for the line segment connecting (1, 1) and (-1, 1) the equation of the line is x2 = 1. The grey region would
be represented by x2 ≤ 1, which upon re-arranging becomes −x2 ≥ −1, and thus w2 = -1, w1 = 0, T = -1.
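
A small check, assuming the three hidden perceptrons found above feed an AND output perceptron (threshold 3), that an interior point is classified 1 and exterior points 0:

def P(x1, x2, w1, w2, T):
    return 1 if w1 * x1 + w2 * x2 >= T else 0

def inside(x1, x2):
    h1 = P(x1, x2, -2, 1, -1)    # boundary through (0,-1) and (1,1)
    h2 = P(x1, x2,  2, 1, -1)    # boundary through (0,-1) and (-1,1)
    h3 = P(x1, x2,  0, -1, -1)   # boundary x2 = 1
    return 1 if h1 + h2 + h3 >= 3 else 0   # AND of the three half-planes

print(inside(0, 0.5), inside(2, 0), inside(0, 2))   # 1 0 0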
Q49. Since any Boolean function can be represented by a Multi-Layer-Perceptron (MLP), draw the
equivalent MLP that produces for the Boolean function:
=( ∨ ∨¬ ) ∨¬( ∧ ∧ )
Solution:

Q50. Given the single perceptron shown below with the ReLu activation function, perform the following:
1. In the shaded squares above the line, perform a forward pass calculation and show the outputs of the
forward pass. Fill all shaded squares.
2. Using the backpropagation algorithm, indicate the values shown in the spikey circles.
3. To decrease the value of the output, indicate whether the values of the following variables should be
increased or decreased?
Hint: The ReLU function is defined as ReLU(x) = max(0, x).

Q51. Since any Boolean function can be represented by a Multi-Layer-Perceptron (MLP), draw the
equivalent MLP that produces for the Boolean function:
=( ∨ ∨¬ ) ∨¬( ∧ ∧ )
Solution:
Q52. Design an MLP neural network to classify the given decision boundary shown below, assuming a
simple perceptron: output = 1 if Σi wi xi ≥ T. The MLP must classify the points inside the given shape
versus points that are outside. Identify the value of the weights and the thresholds for each perceptron
and assume the bias is always set to zero.

Solution:
For the line segment connecting points (0,1) and (-1, -1), the equation of the line is:
x2 = 2x1 + 1
Since the points must be under the line we get: x2 ≤ 2x1 + 1.
Rearranging terms we get: −x2 + 2x1 ≥ −1.
Thus: w2 = -1, w1 = 2 and T = -1.
Repeating the same steps for the line segment connecting (0,1) and (1, -1) we get: w2 = -1, w1 = -2 and T = -1.
Finally, for the line segment connecting (-1, -1) and (1, -1) the equation of the line is x2 = −1, and thus w2 = 1,
w1 = 0, T = -1.

Q53. Given the single perceptron shown below with the ReLu activation function, perform the following:
1. In the shaded squares above the line, perform a forward pass calculation and show the outputs of the
forward pass. Fill all shaded squares.
2. Using the backpropagation algorithm, indicate the values shown in the spikey circles.
3. To decrease the value of the output, indicate whether the values of the following variables should be
increased or decreased?
Hint: The ReLU function is defined as ReLU(x) = max(0, x).
Solution:
Old HWs Questions
Question 1: Design an MLP neural network to classify the given decision boundary shown below,
assuming a simple perceptron: output = 1 if Σi wi xi ≥ T. The MLP must classify the points inside the given
shape versus points that are outside. Identify the value of the weights and the thresholds for each
perceptron and assume the bias is always set to zero.

For each segment joining two points we find the equation of the supporting line and use it to determine the
weights. For example, for the line segment connecting points (-1,0) and (0,1), the equation of the line is:
x2 = x1 + 1
Since the points we want are under the line segment, we can write:
x2 ≤ x1 + 1. Putting the above equation in the form Σi wi xi ≥ T we get:
−x2 + x1 ≥ −1
Thus w2 = −1, w1 = 1, and since b = 0 then T = −1. Repeating these steps for the line segment connecting
(0,1) and (1,0) we get: w2 = −1, w1 = −1 and T = −1. For the line segment connecting (-1,0) and (0,-1), the
equation of the line is:
x2 = −x1 − 1
Since the points we want are over the line segment, we can write:
x2 ≥ −x1 − 1. Putting the above equation in the form Σi wi xi ≥ T we get:
x2 + x1 ≥ −1
Thus w2 = 1, w1 = 1, and T = −1 (remember that b = 0). Repeating these steps for the line segment
connecting (1,0) and (0,-1) we get: w2 = 1, w1 = −1 and T = −1.

Then finally the MLP is


Question 2: Given the MLP where the two perceptrons in a hidden layer use a ReLU activation function and
the output layer has a single perceptron.

Suppose the loss (Error) function is given by E = ½ (Y − D)², where Y is the predicted output and D is
the desired output.
(a) In terms of all inputs, weights and biases, write expressions for each of the following:

(b) The table below shows the optimal parameters (weights and biases) which have been predicted for the given set of
inputs. However, b3 is set to 1.6, which resulted in a prediction error.

Use gradient descent to find the optimal value of b3 such that the error in the output is reduced to a value
below 0.01. Assuming a learning rate η = 0.2, fill in the table below. For simplicity, apply gradient descent
when the input values are X1 = 0, X2 = 1 only.
Question 3:
