1. Consider the two-level network illustrated below. It is composed of three perceptrons.
The two perceptrons of the first level implement the AND and OR functions, respectively.
Determine the weights θ1, θ2 and bias θ0 such that the network implements the XOR
function. The initial weights are set to zero, i.e., θ0 = θ1 = θ2 = 0, and the learning rate η
(eta) is set to 0.1.
Notes:
• The input function for the perceptron on level 2 is the weighted sum (Σ) of its inputs.
• The activation function f for the perceptron on level 2 is a step function: f(z) = 1 if z > 0, and f(z) = 0 otherwise.
• Assume that the weights for the perceptrons of the first level are given.
To calculate the outputs of the two level-1 perceptrons, we can use the following table (remember: since
the weights for these perceptrons are given, we don't need to do the iterative learning):
Train instance #   x1   x2   p1 = f_AND(x)   p2 = f_OR(x)   y = x1 XOR x2
1                  1    0    0               1              1
2                  0    1    0               1              1
3                  1    1    1               1              0
4                  0    0    0               0              0
Based on the results from the table above, our input signals (training instances) for the level-2 perceptron (XOR)
are P = <p0, p1, p2> = <−1, p1, p2>, and the parameters are θ = <θ0, θ1, θ2> = <0, 0, 0>.
So, for the first epoch we have:

ep   P = <p0, p1, p2>   z = θ·P = θ0×p0 + θ1×p1 + θ2×p2   ŷ = f(z)   y   Δθ = η(y − ŷ)P
1    <−1, 0, 1>         0×(−1) + 0×0 + 0×1 = 0            f(0) = 0   1   0.1 × (1 − 0) × <−1, 0, 1> = <−0.1, 0, 0.1>

Updated θ: <0, 0, 0> + <−0.1, 0, 0.1> = <−0.1, 0, 0.1>

The remaining training instances and epochs are processed in the same way, until the weights stop changing.
Since in the last epoch we didn't have any updates to our weights (θ), the algorithm converges there.
So, for our network to perform the XOR function, the final weights are θ = <0, −0.2, 0.1>; in other words,
θ0 = 0, θ1 = −0.2 and θ2 = 0.1.
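As a sanity check, the converged perceptron can be verified with a short script. This is a minimal sketch: the helper names (`step`, `perceptron`, `xor_net`) are ours, and since the level-1 weights are given in the figure but not restated here, the AND/OR units are emulated with Python's bitwise operators.

```python
def step(z):
    """Step activation: f(z) = 1 for z > 0, else 0 (so f(0) = 0)."""
    return 1 if z > 0 else 0

def perceptron(theta, p):
    """theta = <theta0, theta1, theta2>; p = <p0, p1, p2> with bias input p0 = -1."""
    return step(sum(t * x for t, x in zip(theta, p)))

def xor_net(x1, x2, theta=(0.0, -0.2, 0.1)):
    """Level 1 computes AND and OR; level 2 is the trained XOR perceptron."""
    p1, p2 = x1 & x2, x1 | x2
    return perceptron(theta, (-1, p1, p2))

# One learning step reproduces the first row of the epoch table:
eta, theta, P, y = 0.1, (0.0, 0.0, 0.0), (-1, 0, 1), 1
y_hat = perceptron(theta, P)                                   # f(0) = 0
theta = tuple(t + eta * (y - y_hat) * x for t, x in zip(theta, P))
print(theta)                                                   # (-0.1, 0.0, 0.1)

# The final weights theta = <0, -0.2, 0.1> implement XOR:
print([xor_net(a, b) for a, b in [(1, 0), (0, 1), (1, 1), (0, 0)]])  # [1, 1, 0, 0]
```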
2. Consider the following multilayer perceptron.
The network should implement the XOR function. Perform one epoch of backpropagation as
introduced in the lecture on multilayer perceptrons.
Notes:
• The activation function f for a perceptron is the sigmoid function:
  f(x) = 1 / (1 + e^(−x))
• The bias nodes are set to −1; they are not shown in the network.
• Use the following initial parameter values:
  θ01^(1) = 2    θ02^(1) = −1    θ01^(2) = −2
  θ11^(1) = 6    θ12^(1) = 8     θ11^(2) = 6
  θ21^(1) = −6   θ22^(1) = −8    θ21^(2) = −6
• The learning rate is set to η = 0.7
i. Compute the output of the network.
Since our activation function here is the sigmoid (σ) function, in each node (neuron) we can calculate the
output by applying the sigmoid function σ(x) = 1/(1 + e^(−x)) to the weighted sum (Σ) of its inputs, which
we usually denote by zb^(a), where a is the level of the neuron (e.g., level 1, 2, 3) and b is its index
within that level.
Now the output of our network is simply the result of applying the same rule to the inputs of our last neuron:

ŷ = a1^(2) = σ(z1^(2)) = σ(θ^(2) · A^(1)) = σ(θ01^(2) × a0^(1) + θ11^(2) × a1^(1) + θ21^(2) × a2^(1))
  = σ((−2) × (−1) + 6a1^(1) − 6a2^(1))
You can find the schematic representation of these calculations in the following network.
ii. Compute the error of the network.
E = (1/2)(y − ŷ)² = (1/2)(y − a1^(2))²
iii. Backpropagate the error to determine Δθij for all weights θij and update the weights θij.
In neural networks with backpropagation, we want to minimize the error of our network by finding the
optimum weights (θij) for our network. To do so we want to find the relation (dependency) between the
error and the weights in each layer. Therefore, we use the derivatives of our error function.
Now we use our first training instance (1, 0) to train the model:

a1^(1) = σ(z1^(1)) = σ(6x1 − 6x2 − 2) = σ(6×1 − 6×0 − 2) = σ(4) ≅ 0.982
a2^(1) = σ(z2^(1)) = σ(8x1 − 8x2 + 1) = σ(8×1 − 8×0 + 1) = σ(9) ≅ 0.999
a1^(2) = σ(z1^(2)) = σ(2 + 6a1^(1) − 6a2^(1)) = σ(2 + 6×0.982 − 6×0.999) = σ(1.898) ≅ 0.8691
E = (1/2)(1 − 0.8691)² ≅ 0.0086
x1   x2   a1^(1)         a2^(1)         a1^(2)   y (XOR)   E(θ) = ½(y − ŷ)²
1    0    σ(4) = 0.982   σ(9) = 0.999   0.8691   1         0.0086
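The forward pass and error above can be reproduced in a few lines of Python. The groupings `theta_h1`/`theta_h2`/`theta_out` are our own naming of the initial weights listed in the notes:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Initial weights, one tuple per neuron: (bias weight, weight from input 1, weight from input 2).
# The bias input is fixed at -1.
theta_h1  = (2.0, 6.0, -6.0)    # hidden neuron 1: z = 6*x1 - 6*x2 - 2
theta_h2  = (-1.0, 8.0, -8.0)   # hidden neuron 2: z = 8*x1 - 8*x2 + 1
theta_out = (-2.0, 6.0, -6.0)   # output neuron:  z = 6*a1 - 6*a2 + 2

def neuron(theta, i1, i2):
    b, w1, w2 = theta
    return sigmoid(b * -1 + w1 * i1 + w2 * i2)

def forward(x1, x2):
    a1 = neuron(theta_h1, x1, x2)
    a2 = neuron(theta_h2, x1, x2)
    return a1, a2, neuron(theta_out, a1, a2)

a1, a2, y_hat = forward(1, 0)      # first training instance (1, 0), target y = 1
E = 0.5 * (1 - y_hat) ** 2         # ~0.0086, as in the table
```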
Now we calculate the backpropagation error. Starting from the last layer, we have:

δ1^(2) = σ(z1^(2))(1 − σ(z1^(2)))(y − a1^(2)) = σ(1.898)(1 − σ(1.898))(1 − 0.8691)
       = 0.8691 × (1 − 0.8691) × 0.1309 ≅ 0.0149
δ1^(1) = σ(z1^(1))(1 − σ(z1^(1))) θ11^(2) δ1^(2) = σ(4)(1 − σ(4)) × 6 × 0.0149
       = 0.982 × (1 − 0.982) × 0.0894 ≅ 0.0016
δ2^(1) = σ(z2^(1))(1 − σ(z2^(1))) θ21^(2) δ1^(2) = σ(9)(1 − σ(9)) × (−6) × 0.0149 ≅ −0.000011
Using the learning rate of η = 0.7 we can now calculate the updates Δθij^(1):

Δθ01^(1) = ηδ1^(1)x0 = 0.7 × 0.0016 × (−1) = −0.0011
Δθ11^(1) = ηδ1^(1)x1 = 0.7 × 0.0016 × 1 = 0.0011
Δθ21^(1) = ηδ1^(1)x2 = 0.7 × 0.0016 × 0 = 0
Δθ02^(1) = ηδ2^(1)x0 = 0.7 × (−0.000011) × (−1) ≅ 0
Δθ12^(1) = ηδ2^(1)x1 = 0.7 × (−0.000011) × 1 ≅ 0
Δθ22^(1) = ηδ2^(1)x2 = 0.7 × (−0.000011) × 0 = 0
For the second training instance (0, 1), the deltas are again negligible (δ2^(1) ≅ −6.7454 × 10⁻⁵):

Δθ02^(1) = ηδ2^(1)x0 = 0.7 × (−6.7454 × 10⁻⁵) × (−1) ≅ 0
Δθ12^(1) = ηδ2^(1)x1 = 0.7 × (−6.7454 × 10⁻⁵) × 0 = 0
Δθ22^(1) = ηδ2^(1)x2 = 0.7 × (−6.7454 × 10⁻⁵) × 1 ≅ 0
Since the other values are very small, the rest of the weights do not change.¹

¹ In this assignment, we make the simplifying assumption that any update < 10⁻⁴ is ≅ 0, to keep the computations feasible.
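The delta computations for the first training instance can likewise be checked numerically. This sketch recomputes the activations from the weighted sums z = 4 and z = 9 derived above; variable names are ours:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

eta = 0.7
a1, a2 = sigmoid(4.0), sigmoid(9.0)        # hidden activations for instance (1, 0)
y_hat = sigmoid(2.0 + 6.0 * a1 - 6.0 * a2) # network output
y = 1                                      # target: XOR(1, 0) = 1

# Output delta: sigma'(z) * (y - a), with sigma'(z) = a * (1 - a)
d_out = y_hat * (1 - y_hat) * (y - y_hat)  # ~0.0149
# Hidden deltas: sigma'(z) * (weight to output) * d_out
d_h1 = a1 * (1 - a1) * 6.0 * d_out         # ~0.0016
d_h2 = a2 * (1 - a2) * -6.0 * d_out        # ~-0.000011

# Updates for the first hidden neuron (inputs x0 = -1, x1 = 1, x2 = 0):
dtheta_01 = eta * d_h1 * -1                # ~-0.0011
dtheta_11 = eta * d_h1 * 1
dtheta_21 = eta * d_h1 * 0
```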
For the third training instance (1, 1), the backpropagation error gives:

δ1^(2) = σ(z1^(2))(1 − σ(z1^(2)))(y − a1^(2)) = −0.0221
δ1^(1) = σ(z1^(1))(1 − σ(z1^(1))) θ11^(2) δ1^(2) = −0.0139
δ2^(1) = σ(z2^(1))(1 − σ(z2^(1))) θ21^(2) δ1^(2) = 0.0260
Δθ01^(1) = ηδ1^(1)x0 = 0.7 × (−0.0139) × (−1) ≅ 0.0098
Δθ11^(1) = ηδ1^(1)x1 = 0.7 × (−0.0139) × 1 ≅ −0.0098
Δθ21^(1) = ηδ1^(1)x2 = 0.7 × (−0.0139) × 1 ≅ −0.0098
Δθ02^(1) = ηδ2^(1)x0 = 0.7 × 0.0260 × (−1) ≅ −0.0182
Δθ12^(1) = ηδ2^(1)x1 = 0.7 × 0.0260 × 1 ≅ 0.0182
Δθ22^(1) = ηδ2^(1)x2 = 0.7 × 0.0260 × 1 ≅ 0.0182
For the fourth training instance (0, 0), only the bias weights receive non-negligible updates:

Δθ01^(1) = ηδ1^(1)x0 = −0.0098
Δθ11^(1) = ηδ1^(1)x1 = 0
Δθ21^(1) = ηδ1^(1)x2 = 0
Δθ02^(1) = ηδ2^(1)x0 = 0.0167
Δθ12^(1) = ηδ2^(1)x1 = 0
Δθ22^(1) = ηδ2^(1)x2 = 0
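Putting it all together, one full epoch of online backpropagation can be sketched as below. Note this is a sketch under the assumption that both layers are updated after every instance; the worked numbers above only track the first-layer updates, so the resulting weights will not match those figures digit for digit.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

eta = 0.7
# weights as [bias weight, w_from_input1, w_from_input2]; the bias input is fixed at -1
hidden = [[2.0, 6.0, -6.0], [-1.0, 8.0, -8.0]]
out = [-2.0, 6.0, -6.0]
data = [((1, 0), 1), ((0, 1), 1), ((1, 1), 0), ((0, 0), 0)]  # XOR truth table

for (x1, x2), y in data:                  # one epoch, updating after each instance
    xs = [-1, x1, x2]
    a = [sigmoid(sum(w * x for w, x in zip(n, xs))) for n in hidden]
    outs = [-1, a[0], a[1]]
    y_hat = sigmoid(sum(w * v for w, v in zip(out, outs)))

    d_out = y_hat * (1 - y_hat) * (y - y_hat)
    d_hid = [a[j] * (1 - a[j]) * out[j + 1] * d_out for j in range(2)]

    for j in range(2):                    # first-layer updates: eta * delta * input
        for i in range(3):
            hidden[j][i] += eta * d_hid[j] * xs[i]
    for i in range(3):                    # second-layer updates
        out[i] += eta * d_out * outs[i]

print(hidden, out)                        # weights after one epoch
```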