1. Consider the two-level network illustrated below. It is composed of three perceptrons.
The two perceptrons of the first level implement the AND and OR functions, respectively.
Determine the weights θ1, θ2 and bias θ0 such that the network implements the XOR
function. The initial weights are set to zero, i.e., θ0 = θ1 = θ2 = 0, and the learning rate η
(eta) is set to 0.1.
Notes:
• The input function for the perceptron on level 2 is the weighted sum (Σ) of its inputs.
• The activation function f for the perceptron on level 2 is a step function: f(z) = 1 if z > 0, and f(z) = 0 otherwise.
• Assume that the weights for the perceptrons of the first level are given.
To calculate the outputs of the two level-1 perceptrons, we can use the following table (remember: since
the weights for these perceptrons are given, we don't need to do the iterative learning):
Train instance #   x1   x2   p1 = f_AND(x)   p2 = f_OR(x)   y = x1 XOR x2
1                  1    0    0               1              1
2                  0    1    0               1              1
3                  1    1    1               1              0
4                  0    0    0               0              0
Based on the results from the table above, our input signals (training instances) for the level-2 perceptron (XOR)
are P = <p0, p1, p2> = <−1, p1, p2>, and the parameters are θ = <θ0, θ1, θ2> = <0, 0, 0>.
So, for the first epoch we have:

ep   P = <p0, p1, p2>   z = θ·P = θ0×p0 + θ1×p1 + θ2×p2   ŷ = f(z)   y   Δθ = η(y − ŷ)P
1    <−1, 0, 1>         0×(−1) + 0×0 + 0×1 = 0            f(0) = 0   1   0.1 × (1 − 0) × <−1, 0, 1> = <−0.1, 0, 0.1>

Updated θ: <0, 0, 0> + <−0.1, 0, 0.1> = <−0.1, 0, 0.1>

The remaining training instances and epochs are processed in the same way, until the weights stop changing.
Since in the last epoch we didn't have any updates to our weights (θ), the algorithm converges there.
So, for our network to perform the XOR function, the final weights are θ = <0, −0.2, 0.1>; in other words,
θ0 = 0, θ1 = −0.2 and θ2 = 0.1.
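As a sanity check, the converged perceptron can be verified with a short script. This is a minimal sketch: the helper names (`step`, `perceptron`, `xor_net`) are ours, and since the level-1 weights are given in the figure but not restated here, the AND/OR units are emulated with Python's bitwise operators.

```python
def step(z):
    """Step activation: f(z) = 1 for z > 0, else 0 (so f(0) = 0)."""
    return 1 if z > 0 else 0

def perceptron(theta, p):
    """theta = <theta0, theta1, theta2>; p = <p0, p1, p2> with bias input p0 = -1."""
    return step(sum(t * x for t, x in zip(theta, p)))

def xor_net(x1, x2, theta=(0.0, -0.2, 0.1)):
    """Level 1 computes AND and OR; level 2 is the trained XOR perceptron."""
    p1, p2 = x1 & x2, x1 | x2
    return perceptron(theta, (-1, p1, p2))

# One learning step reproduces the first row of the epoch table:
eta, theta, P, y = 0.1, (0.0, 0.0, 0.0), (-1, 0, 1), 1
y_hat = perceptron(theta, P)                                   # f(0) = 0
theta = tuple(t + eta * (y - y_hat) * x for t, x in zip(theta, P))
print(theta)                                                   # (-0.1, 0.0, 0.1)

# The final weights theta = <0, -0.2, 0.1> implement XOR:
print([xor_net(a, b) for a, b in [(1, 0), (0, 1), (1, 1), (0, 0)]])  # [1, 1, 0, 0]
```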
2. Consider the following multilayer perceptron.
The network should implement the XOR function. Perform one epoch of backpropagation as
introduced in the lecture on multilayer perceptrons.
Notes:
• The activation function f for a perceptron is the sigmoid function:
  f(x) = 1 / (1 + e^(−x))
• The bias nodes are set to −1; they are not shown in the network.
• Use the following initial parameter values:
  θ01^(1) = 2    θ02^(1) = −1    θ01^(2) = −2
  θ11^(1) = 6    θ12^(1) = 8     θ11^(2) = 6
  θ21^(1) = −6   θ22^(1) = −8    θ21^(2) = −6
• The learning rate is set to η = 0.7
i. Compute the output of the network.
Since our activation function here is the sigmoid (σ) function, in each node (neuron) we can calculate the
output by applying the sigmoid function σ(x) = 1/(1 + e^(−x)) to the weighted sum (Σ) of its inputs, which
we usually denote by zb^(a), where a is the level of the neuron (e.g., level 1, 2, 3) and b is its index
within that level.
Now the output of our network is simply the result of applying the same rule to the inputs of our last neuron:

ŷ = a1^(2) = σ(z1^(2)) = σ(θ^(2) · A^(1)) = σ(θ01^(2) × a0^(1) + θ11^(2) × a1^(1) + θ21^(2) × a2^(1))
  = σ((−2) × (−1) + 6a1^(1) − 6a2^(1))
You can find the schematic representation of these calculations in the following network.
ii. Compute the error of the network.
E = (1/2)(y − ŷ)² = (1/2)(y − a1^(2))²
iii. Backpropagate the error to determine Δθij for all weights θij and update the weights θij.
In neural networks with backpropagation, we want to minimize the error of our network by finding the
optimum weights (θij) for our network. To do so we want to find the relation (dependency) between the
error and the weights in each layer. Therefore, we use the derivatives of our error function.
Now we use our first training instance (1, 0) to train the model:

a1^(1) = σ(z1^(1)) = σ(6x1 − 6x2 − 2) = σ(6×1 − 6×0 − 2) = σ(4) ≅ 0.982
a2^(1) = σ(z2^(1)) = σ(8x1 − 8x2 + 1) = σ(8×1 − 8×0 + 1) = σ(9) ≅ 0.999
a1^(2) = σ(z1^(2)) = σ(2 + 6a1^(1) − 6a2^(1)) = σ(2 + 6×0.982 − 6×0.999) = σ(1.898) ≅ 0.8691
E = (1/2)(1 − 0.8691)² ≅ 0.0086
x1   x2   a1^(1)         a2^(1)         a1^(2)   y (XOR)   E(θ) = ½(y − ŷ)²
1    0    σ(4) = 0.982   σ(9) = 0.999   0.8691   1         0.0086
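The forward pass and error above can be reproduced in a few lines of Python. The groupings `theta_h1`/`theta_h2`/`theta_out` are our own naming of the initial weights listed in the notes:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Initial weights, one tuple per neuron: (bias weight, weight from input 1, weight from input 2).
# The bias input is fixed at -1.
theta_h1  = (2.0, 6.0, -6.0)    # hidden neuron 1: z = 6*x1 - 6*x2 - 2
theta_h2  = (-1.0, 8.0, -8.0)   # hidden neuron 2: z = 8*x1 - 8*x2 + 1
theta_out = (-2.0, 6.0, -6.0)   # output neuron:  z = 6*a1 - 6*a2 + 2

def neuron(theta, i1, i2):
    b, w1, w2 = theta
    return sigmoid(b * -1 + w1 * i1 + w2 * i2)

def forward(x1, x2):
    a1 = neuron(theta_h1, x1, x2)
    a2 = neuron(theta_h2, x1, x2)
    return a1, a2, neuron(theta_out, a1, a2)

a1, a2, y_hat = forward(1, 0)      # first training instance (1, 0), target y = 1
E = 0.5 * (1 - y_hat) ** 2         # ~0.0086, as in the table
```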
Now we calculate the backpropagation error. Starting from the last layer, we have:

δ1^(2) = σ(z1^(2))(1 − σ(z1^(2)))(y − a1^(2)) = σ(1.898)(1 − σ(1.898))(1 − 0.8691)
       = 0.8691 × (1 − 0.8691) × 0.1309 ≅ 0.0149
δ1^(1) = σ(z1^(1))(1 − σ(z1^(1))) θ11^(2) δ1^(2) = σ(4)(1 − σ(4)) × 6 × 0.0149
       = 0.982 × (1 − 0.982) × 0.0894 ≅ 0.0016
δ2^(1) = σ(z2^(1))(1 − σ(z2^(1))) θ21^(2) δ1^(2) = σ(9)(1 − σ(9)) × (−6) × 0.0149 ≅ −0.000011
Using the learning rate of η = 0.7 we can now calculate the updates Δθij^(1):

Δθ01^(1) = ηδ1^(1)x0 = 0.7 × 0.0016 × (−1) = −0.0011
Δθ11^(1) = ηδ1^(1)x1 = 0.7 × 0.0016 × 1 = 0.0011
Δθ21^(1) = ηδ1^(1)x2 = 0.7 × 0.0016 × 0 = 0
Δθ02^(1) = ηδ2^(1)x0 = 0.7 × (−0.000011) × (−1) ≅ 0
Δθ12^(1) = ηδ2^(1)x1 = 0.7 × (−0.000011) × 1 ≅ 0
Δθ22^(1) = ηδ2^(1)x2 = 0.7 × (−0.000011) × 0 = 0
For the second training instance (0, 1), the deltas are again negligible (δ2^(1) ≅ −6.7454 × 10⁻⁵):

Δθ02^(1) = ηδ2^(1)x0 = 0.7 × (−6.7454 × 10⁻⁵) × (−1) ≅ 0
Δθ12^(1) = ηδ2^(1)x1 = 0.7 × (−6.7454 × 10⁻⁵) × 0 = 0
Δθ22^(1) = ηδ2^(1)x2 = 0.7 × (−6.7454 × 10⁻⁵) × 1 ≅ 0
Since the other values are very small, the rest of the weights do not change.¹

¹ In this assignment, we make the simplifying assumption that any update < 10⁻⁴ is ≅ 0, to keep the computations feasible.
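The delta computations for the first training instance can likewise be checked numerically. This sketch recomputes the activations from the weighted sums z = 4 and z = 9 derived above; variable names are ours:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

eta = 0.7
a1, a2 = sigmoid(4.0), sigmoid(9.0)        # hidden activations for instance (1, 0)
y_hat = sigmoid(2.0 + 6.0 * a1 - 6.0 * a2) # network output
y = 1                                      # target: XOR(1, 0) = 1

# Output delta: sigma'(z) * (y - a), with sigma'(z) = a * (1 - a)
d_out = y_hat * (1 - y_hat) * (y - y_hat)  # ~0.0149
# Hidden deltas: sigma'(z) * (weight to output) * d_out
d_h1 = a1 * (1 - a1) * 6.0 * d_out         # ~0.0016
d_h2 = a2 * (1 - a2) * -6.0 * d_out        # ~-0.000011

# Updates for the first hidden neuron (inputs x0 = -1, x1 = 1, x2 = 0):
dtheta_01 = eta * d_h1 * -1                # ~-0.0011
dtheta_11 = eta * d_h1 * 1
dtheta_21 = eta * d_h1 * 0
```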
For the third training instance (1, 1), the backpropagation error gives:

δ1^(2) = σ(z1^(2))(1 − σ(z1^(2)))(y − a1^(2)) = −0.0221
δ1^(1) = σ(z1^(1))(1 − σ(z1^(1))) θ11^(2) δ1^(2) = −0.0139
δ2^(1) = σ(z2^(1))(1 − σ(z2^(1))) θ21^(2) δ1^(2) = 0.0260
Δθ01^(1) = ηδ1^(1)x0 = 0.7 × (−0.0139) × (−1) ≅ 0.0098
Δθ11^(1) = ηδ1^(1)x1 = 0.7 × (−0.0139) × 1 ≅ −0.0098
Δθ21^(1) = ηδ1^(1)x2 = 0.7 × (−0.0139) × 1 ≅ −0.0098
Δθ02^(1) = ηδ2^(1)x0 = 0.7 × 0.0260 × (−1) ≅ −0.0182
Δθ12^(1) = ηδ2^(1)x1 = 0.7 × 0.0260 × 1 ≅ 0.0182
Δθ22^(1) = ηδ2^(1)x2 = 0.7 × 0.0260 × 1 ≅ 0.0182
For the fourth training instance (0, 0), only the bias weights receive non-negligible updates:

Δθ01^(1) = ηδ1^(1)x0 = −0.0098
Δθ11^(1) = ηδ1^(1)x1 = 0
Δθ21^(1) = ηδ1^(1)x2 = 0
Δθ02^(1) = ηδ2^(1)x0 = 0.0167
Δθ12^(1) = ηδ2^(1)x1 = 0
Δθ22^(1) = ηδ2^(1)x2 = 0
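Putting it all together, one full epoch of online backpropagation can be sketched as below. Note this is a sketch under the assumption that both layers are updated after every instance; the worked numbers above only track the first-layer updates, so the resulting weights will not match those figures digit for digit.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

eta = 0.7
# weights as [bias weight, w_from_input1, w_from_input2]; the bias input is fixed at -1
hidden = [[2.0, 6.0, -6.0], [-1.0, 8.0, -8.0]]
out = [-2.0, 6.0, -6.0]
data = [((1, 0), 1), ((0, 1), 1), ((1, 1), 0), ((0, 0), 0)]  # XOR truth table

for (x1, x2), y in data:                  # one epoch, updating after each instance
    xs = [-1, x1, x2]
    a = [sigmoid(sum(w * x for w, x in zip(n, xs))) for n in hidden]
    outs = [-1, a[0], a[1]]
    y_hat = sigmoid(sum(w * v for w, v in zip(out, outs)))

    d_out = y_hat * (1 - y_hat) * (y - y_hat)
    d_hid = [a[j] * (1 - a[j]) * out[j + 1] * d_out for j in range(2)]

    for j in range(2):                    # first-layer updates: eta * delta * input
        for i in range(3):
            hidden[j][i] += eta * d_hid[j] * xs[i]
    for i in range(3):                    # second-layer updates
        out[i] += eta * d_out * outs[i]

print(hidden, out)                        # weights after one epoch
```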