Subha Fernando
Dr.Eng., M.Eng., B.Sc. (Special) Hons.
Truth table of the XOR function:

X1  X2  XOR
0   0   0
0   1   1
1   0   1
1   1   0
Weight matrix of the input layer neurons and hidden layer neurons

W = [w11 w12; w21 w22] = [1 1; 1 1], with biases b1 = −1.5 and b2 = −0.5 (hidden neuron 1 acts as an AND unit, hidden neuron 2 as an OR unit).

Writing the biases as weights from a constant input of 1 gives the augmented matrix

W1 = [b1 w11 w12; b2 w21 w22] = [−1.5 1 1; −0.5 1 1]

Let us consider the input X = (0, 0), i.e. the augmented input vector X = [1, 0, 0]^T:

Y1H = Φ(W1 × X) = Φ([−1.5 1 1; −0.5 1 1] × [1; 0; 0]) = Φ([−1.5; −0.5]) = [0; 0] = [y1H; y2H]^T

so y1H = 0 and y2H = 0, where Φ is the unit-step activation.
Weight matrix of the hidden layer neurons and the output layer neuron

The output neuron has bias b3 = −0.5 and weights w31 = −2, w32 = 1, i.e. the augmented row vector W2 = [b3 w31 w32] = [−0.5 −2 1]. Its input is the augmented hidden output [1, y1H, y2H]^T = [1, 0, 0]^T:

Y1O = Φ(W2 × [1; 0; 0]) = Φ([−0.5 −2 1] × [1; 0; 0]) = Φ[−0.5] = 0

which is the correct output, XOR(0, 0) = 0.
Similarly, by classifying the other inputs, we can show that the XOR problem can be solved using a multilayer perceptron, as the check below confirms.
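The following is a minimal sketch (not from the slides) that runs this network over all four inputs with a unit-step activation; the weight values are the ones used in the worked example above.

def step(v):
    # Unit-step activation: 1 if v >= 0, else 0
    return 1 if v >= 0 else 0

# (bias, weight from x1, weight from x2) for each hidden neuron,
# and (bias, weight from y1H, weight from y2H) for the output neuron.
W_hidden = [(-1.5, 1, 1),   # h1 fires only when x1 AND x2
            (-0.5, 1, 1)]   # h2 fires when x1 OR x2
W_out = (-0.5, -2, 1)       # output: OR but not AND, i.e. XOR

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h1, h2 = (step(b + w1 * x1 + w2 * x2) for b, w1, w2 in W_hidden)
    y = step(W_out[0] + W_out[1] * h1 + W_out[2] * h2)
    print(f"XOR({x1},{x2}) = {y}")   # prints 0, 1, 1, 0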
How to Adjust the Weights of the Network Using the Learning Rule
For Output Neurons:

Woi(n + 1) = Woi(n) + α ΔWoi(n − 1) + η δoi(n) × yi

where η is the learning rate, α the momentum coefficient, and ΔWoi(n − 1) the previous weight change. In particular:

w31(n + 1) = w31(n) + α Δw31(n − 1) + η δo1(n) × y1
w32(n + 1) = w32(n) + α Δw32(n − 1) + η δo1(n) × y2
w41(n + 1) = w41(n) + α Δw41(n − 1) + η δo2(n) × y1
w42(n + 1) = w42(n) + α Δw42(n − 1) + η δo2(n) × y2
For Hidden Neurons:

Wij(n + 1) = Wij(n) + α ΔWij(n − 1) + η δhi(n) × xj

w11(n + 1) = w11(n) + α Δw11(n − 1) + η δh1(n) × x1
w12(n + 1) = w12(n) + α Δw12(n − 1) + η δh1(n) × x2
w21(n + 1) = w21(n) + α Δw21(n − 1) + η δh2(n) × x1
w22(n + 1) = w22(n) + α Δw22(n − 1) + η δh2(n) × x2

For Bias Terms (the bias input is the constant 1):

bi(n + 1) = bi(n) + α Δbi(n − 1) + η δi(n) × 1

b1(n + 1) = b1(n) + α Δb1(n − 1) + η δh1(n) × 1
b2(n + 1) = b2(n) + α Δb2(n − 1) + η δh2(n) × 1
b3(n + 1) = b3(n) + α Δb3(n − 1) + η δo1(n) × 1
b4(n + 1) = b4(n) + α Δb4(n − 1) + η δo2(n) × 1
Back Propagation Algorithm
BP Example
Output Calculations:
v1 = 1 × b1 + x1 × w11 + x2 × w12
y1 = Φ(v1 )
v2 = 1 × b2 + x1 × w21 + x2 × w22
y2 = Φ(v2 )
v3 = 1 × b3 + y1 × w31 + y2 × w32
y3 = Φ(v3 )
Therefore e3 = d3 − y3. To reduce this error, it is propagated backwards and the weight matrices are updated.
Gradient Calculations:
δo1 = Φ′(v3) × e3 = Φ(v3)(1 − Φ(v3)) × e3, using the sigmoid derivative Φ′(v) = Φ(v)(1 − Φ(v))
δh1 = Φ′(v1) × (δo1 × w31)
δh2 = Φ′(v2) × (δo1 × w32)
Weight Calculations:
d3 = 0.9, η = 0.25 and α = 0.0001.

w31(2) = w31(1) + α Δw31(0) + η δo1(1) × y1

At the first step, take w31(0) = w31(1), so that the momentum term Δw31(0) = 0.
.....................
Draw the updated network after the first training step.
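The sketch below carries out this first training step end to end. The slides take the initial weights and the input from the network figure, so the starting values here are illustrative assumptions; only d3 = 0.9, η = 0.25 and α = 0.0001 come from the example.

import math

phi = lambda v: 1 / (1 + math.exp(-v))   # sigmoid activation
dphi = lambda v: phi(v) * (1 - phi(v))   # its derivative

x1, x2, d3 = 1.0, 0.0, 0.9               # input and target (input assumed)
eta, alpha = 0.25, 0.0001
b1, w11, w12 = 0.1, 0.3, -0.2            # assumed initial weights
b2, w21, w22 = -0.1, 0.25, 0.4
b3, w31, w32 = 0.2, -0.3, 0.35

# Output calculations (forward pass)
v1 = 1 * b1 + x1 * w11 + x2 * w12; y1 = phi(v1)
v2 = 1 * b2 + x1 * w21 + x2 * w22; y2 = phi(v2)
v3 = 1 * b3 + y1 * w31 + y2 * w32; y3 = phi(v3)
e3 = d3 - y3

# Gradient calculations
d_o1 = dphi(v3) * e3
d_h1 = dphi(v1) * d_o1 * w31
d_h2 = dphi(v2) * d_o1 * w32

# Weight calculations; the momentum terms vanish because dw(0) = 0
w31 += eta * d_o1 * y1; w32 += eta * d_o1 * y2; b3 += eta * d_o1
w11 += eta * d_h1 * x1; w12 += eta * d_h1 * x2; b1 += eta * d_h1
w21 += eta * d_h2 * x1; w22 += eta * d_h2 * x2; b2 += eta * d_h2
print(f"e3 = {e3:.4f}, updated w31 = {w31:.4f}")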
Consider a multilayer network with one hidden layer, where nodes in the input layer are indexed by i, nodes in the hidden layer by j, and nodes in the output layer by k. The weights between the input and hidden layers are then denoted by wji, and the weights between the hidden and output layers by wkj.
If a gradient-descent approach is used to update the weights of this network, the objective of learning is to modify the weight matrices so as to reduce the sum of squared errors E = ½ Σk (dk − yk)².
The back propagation algorithm applies a correction ∆wkj to the synaptic weight which is proportional to the partial derivative ∂E(n)/∂wkj(n).
from (2): E(n) = ½ Σk ek²(n), ∴ ∂E(n)/∂ek(n) = ek(n) − − − (5)

from (1): ek(n) = dk(n) − yk(n), ∴ ∂ek(n)/∂yk(n) = −1 − − − (6)

from (4): yk(n) = φ(vk(n)), ∴ ∂yk(n)/∂vk(n) = φ′(vk(n)) − − − (7)

from (3): vk(n) = Σ_{j=0} wkj(n) yj(n), ∴ ∂vk(n)/∂wkj(n) = yj(n) − − − (8)

Multiplying (5) to (8) by the chain rule:

∂E(n)/∂wkj(n) = ∂E/∂ek × ∂ek/∂yk × ∂yk/∂vk × ∂vk/∂wkj = −ek(n) φ′(vk(n)) yj(n)
The correction ∆wkj(n) applied to wkj(n) is defined by the delta rule:

∆wkj(n) = −η ∂E(n)/∂wkj(n)

where η is the learning-rate parameter of the back propagation algorithm. (The minus sign seeks a direction for the weight change that reduces the value of E(n).) Substituting the derivative above gives

∆wkj(n) = η δk(n) yj(n), where δk(n) = ek(n) φ′(vk(n))
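As a quick sanity check, the derived gradient ∂E(n)/∂wkj(n) = −ek φ′(vk) yj can be compared against a finite-difference estimate. This sketch is illustrative: the weights, outputs and target below are assumed values, not from the slides.

import math

phi = lambda v: 1 / (1 + math.exp(-v))   # sigmoid activation

y = [1.0, 0.6, 0.4]    # hidden outputs feeding neuron k; y[0] = 1 is the bias input
w = [0.2, -0.5, 0.3]   # w_k0 (bias), w_k1, w_k2
d_k = 0.9              # desired output

def E(weights):
    # E = 1/2 * (d_k - y_k)^2 for this single output neuron
    v = sum(wi * yi for wi, yi in zip(weights, y))
    return 0.5 * (d_k - phi(v)) ** 2

v_k = sum(wi * yi for wi, yi in zip(w, y))
e_k = d_k - phi(v_k)
analytic = -e_k * phi(v_k) * (1 - phi(v_k)) * y[1]    # dE/dw_k1 from the derivation

eps = 1e-6
numeric = (E([w[0], w[1] + eps, w[2]]) - E(w)) / eps  # finite difference

print(analytic, numeric)   # the two values should agree closely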
Properties of BP
Advantages and Limitations of BP

The back propagation algorithm applies a correction ∆wij(n) to the synaptic weight wij(n) which is proportional to ∂Error/∂wij.

Advantages:
- Relatively simple implementation
- A standard method that generally works well

Limitations:
- Slow and inefficient
- It can get stuck in local minima, resulting in sub-optimal solutions
Properties of BP - Momentum α
Minibatch Learning
Noise Structure