
Artificial Neural Network (ANN)

Back Propagation Algorithm

Subha Fernando
Dr.Eng, M.Eng, B.Sc(Special)Hons.

Department of Computational Mathematics


University of Moratuwa

December 11, 2020

Multilayer Perceptron Model

Multilayer Perceptron Algorithm


Multilayer Perceptron (MLP) is used to describe any general feed-forward network.
An MLP consists of an input layer, one or more hidden layers and an output layer.
The network is trained with the highly popular error back-propagation algorithm.
There are two passes through the different layers of the network:
Forward pass (inputs are passed from the input layer to the output layer)
Backward pass (errors are passed from the output layer back to the input layer).

Let's solve the XOR problem with 2 layers.


For that, consider a network with 2 inputs, 1 hidden layer (of 2 neurons) and 1 output layer.
Assume that the activation function is the threshold function.
Exercise 1: Initial parameters are:
W11 = W12 = W21 = W22 = W32 = +1, W31 = −2, b1 = −1.5, b2 = b3 = −0.5
Exercise 2: Initial parameters are:
W11 = W12 = W21 = W22 = W32 = +1, W31 = 1, b1 = 1.5, b2 = 0.5, b3 = −0.5

XOR Problem with 2 layers

Wij is the weight associated with the connection from the jth neuron to the ith neuron.
The activation function is
φ(v) = 1 if v ≥ 0, and 0 if v < 0.

X1  X2  XOR
0   0   0
0   1   1
1   0   1
1   1   0
Weight matrix of the input layer neurons and hidden layer neurons (with the Exercise 1 parameters; matrix rows are separated by semicolons):

W = [ b1 b2 ; w11 w12 ; w21 w22 ] = [ −1.5 −0.5 ; 1 1 ; 1 1 ]

Let's consider the input X = [x1; x2] = [0; 0].


Augmenting the input with the bias signal, X1 = [1; x1; x2] = [1; 0; 0]:

Y1H = Φ( W^T × X1 ) = Φ( [ −1.5 1 1 ; −0.5 1 1 ] × [ 1; 0; 0 ] ) = Φ( [ −1.5; −0.5 ] ) = [ 0; 0 ] = [ y1H; y2H ]

Weight matrix of the hidden layer neurons and the output layer neuron:

Wo = [ b3 ; w31 ; w32 ] = [ −0.5 ; −2 ; 1 ]

Y1O = Φ( Wo^T × [ 1; y1H; y2H ] ) = Φ( [ −0.5 −2 1 ] × [ 1; 0; 0 ] ) = Φ( −0.5 ) = 0

Similarly, by classifying the other inputs, we can show that the XOR problem can be solved by the multilayer perceptron.
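A minimal sketch of this forward pass in code, assuming the Exercise 1 parameters and the threshold activation above (the function and variable names are mine; NumPy is used only for the matrix products):

import numpy as np

# Threshold activation, applied element-wise: 1 if v >= 0, else 0
def step(v):
    return (np.asarray(v) >= 0).astype(int)

# Exercise 1 parameters; each row is [bias, weight from x1, weight from x2]
W_hidden = np.array([[-1.5, 1.0, 1.0],   # hidden neuron 1
                     [-0.5, 1.0, 1.0]])  # hidden neuron 2
W_output = np.array([-0.5, -2.0, 1.0])   # [b3, w31, w32]

def forward(x1, x2):
    x_aug = np.array([1.0, x1, x2])      # augmented input [1, x1, x2]
    y_h = step(W_hidden @ x_aug)         # hidden layer outputs
    return int(step(W_output @ np.concatenate(([1.0], y_h))))

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), forward(x1, x2))     # prints 0, 1, 1, 0 (XOR)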

Back Propagation Algorithm

BP Algorithm General Procedure

The activation function is the sigmoid function:
Φ(v) = 1 / (1 + exp(−v))
The first derivative of the function is:
Φ′(v) = Φ(v)(1 − Φ(v))

Do the forward pass, i.e. calculate the output of each neuron i using
yi = Φ(W^T × X)
Calculate the local gradients for the neurons, i.e. δi
Adjust the weights of the network using the learning rule, i.e. Wij.
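The sigmoid and its derivative translate directly into code; a small sketch (the helper names are mine):

import numpy as np

def sigmoid(v):
    # Phi(v) = 1 / (1 + exp(-v))
    return 1.0 / (1.0 + np.exp(-v))

def sigmoid_prime(v):
    # Phi'(v) = Phi(v) * (1 - Phi(v))
    s = sigmoid(v)
    return s * (1.0 - s)

def neuron_output(w, x):
    # y_i = Phi(w^T x), where w and x are the augmented weight and input vectors
    return sigmoid(np.dot(w, x))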


How to Calculate Local Gradients

δo1 = Φ′((1 × b3) + (y1 × w31) + (y2 × w32)) × (d3 − y3)
δo2 = Φ′((1 × b4) + (y1 × w41) + (y2 × w42)) × (d4 − y4)
δh1 = Φ′((1 × b1) + (x1 × w11) + (x2 × w12)) × ((δo1 × w31) + (δo2 × w41))
δh2 = Φ′((1 × b2) + (x1 × w21) + (x2 × w22)) × ((δo1 × w32) + (δo2 × w42))
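A sketch of these four gradient formulas for the 2-input, 2-hidden, 2-output network of this slide; the variable names mirror the slide, while the input, target and weight values are purely illustrative assumptions:

import numpy as np

def phi(v):       return 1.0 / (1.0 + np.exp(-v))   # sigmoid
def phi_prime(v): return phi(v) * (1.0 - phi(v))    # its derivative

# Assumed example values (illustrative only, not from the slides)
x1, x2 = 0.0, 1.0
b1 = b2 = b3 = b4 = 0.1
w11 = w12 = w21 = w22 = w31 = w32 = w41 = w42 = 0.5
d3, d4 = 1.0, 0.0

# Forward pass (induced local fields and outputs)
v1 = 1*b1 + x1*w11 + x2*w12;  y1 = phi(v1)
v2 = 1*b2 + x1*w21 + x2*w22;  y2 = phi(v2)
v3 = 1*b3 + y1*w31 + y2*w32;  y3 = phi(v3)
v4 = 1*b4 + y1*w41 + y2*w42;  y4 = phi(v4)

# Local gradients exactly as on the slide
delta_o1 = phi_prime(v3) * (d3 - y3)
delta_o2 = phi_prime(v4) * (d4 - y4)
delta_h1 = phi_prime(v1) * (delta_o1*w31 + delta_o2*w41)
delta_h2 = phi_prime(v2) * (delta_o1*w32 + delta_o2*w42)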


How to Adjust the Weights of the Network Using the Learning Rule

For Output Neurons:
Woi(n + 1) = Woi(n) + αWoi(n − 1) + ηδoi(n)y
W31(n + 1) = W31(n) + αW31(n − 1) + ηδo1(n) × y1
W32(n + 1) = W32(n) + αW32(n − 1) + ηδo1(n) × y2
W41(n + 1) = W41(n) + αW41(n − 1) + ηδo2(n) × y1
W42(n + 1) = W42(n) + αW42(n − 1) + ηδo2(n) × y2

For Hidden Neurons:
Wij(n + 1) = Wij(n) + αWij(n − 1) + ηδhi(n) × xj
W11(n + 1) = W11(n) + αW11(n − 1) + ηδh1(n) × x1
W12(n + 1) = W12(n) + αW12(n − 1) + ηδh1(n) × x2
W21(n + 1) = W21(n) + αW21(n − 1) + ηδh2(n) × x1
W22(n + 1) = W22(n) + αW22(n − 1) + ηδh2(n) × x2

For Bias Terms:
bi(n + 1) = bi(n) + αbi(n − 1) + ηδi(n) × 1
b3(n + 1) = b3(n) + αb3(n − 1) + ηδo1(n) × 1
b4(n + 1) = b4(n) + αb4(n − 1) + ηδo2(n) × 1
b1(n + 1) = b1(n) + αb1(n − 1) + ηδh1(n) × 1
b2(n + 1) = b2(n) + αb2(n − 1) + ηδh2(n) × 1
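All of these updates share one form, so a single helper covers them; a sketch, with η and α as assumed example values:

# Generic update used for every weight and bias on this slide:
# W(n+1) = W(n) + alpha * W(n-1) + eta * delta * input_signal
def update(w_now, w_prev, delta, signal, eta=0.25, alpha=0.0001):
    return w_now + alpha * w_prev + eta * delta * signal

# e.g. hidden-to-output weight w31 (input signal is y1),
#      hidden bias b1 (input signal is the constant 1):
# w31_next = update(w31, w31_prev, delta_o1, y1)
# b1_next  = update(b1,  b1_prev,  delta_h1, 1.0)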

BP Example
Output Calculations:
v1 = 1 × b1 + x1 × w11 + x2 × w12 ;  y1 = Φ(v1)
v2 = 1 × b2 + x1 × w21 + x2 × w22 ;  y2 = Φ(v2)
v3 = 1 × b3 + y1 × w31 + y2 × w32 ;  y3 = Φ(v3)
Therefore e3 = d3 − y3. In order to reduce the error, the error is back-propagated and the weight matrix is updated.
Gradient Calculations:
δo1 = Φ′(v3) × e3 = Φ(v3)(1 − Φ(v3)) × e3
δh1 = Φ′(v1) × (δo1 × w31)
δh2 = Φ′(v2) × (δo1 × w32)
Weight Calculations:
d3 = 0.9, η = 0.25 and α = 0.0001. At the first step take w31(0) = w31(1):
w31(2) = w31(1) + α × w31(0) + ηδo1(1) × y1
.....................
Draw the updated network after the first training step.
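A sketch of this first training step for the 2-2-1 network, with d3 = 0.9, η = 0.25 and α = 0.0001 as on the slide; the training input and the initial weights are not fixed by the slide, so the values below are assumptions:

import numpy as np

def phi(v):       return 1.0 / (1.0 + np.exp(-v))
def phi_prime(v): return phi(v) * (1.0 - phi(v))

eta, alpha, d3 = 0.25, 0.0001, 0.9
x1, x2 = 1.0, 0.0                        # assumed training input
# Assumed initial weights/biases (illustrative only)
b1, b2, b3 = -1.5, -0.5, -0.5
w11, w12, w21, w22 = 1.0, 1.0, 1.0, 1.0
w31, w32 = -2.0, 1.0

# ---- forward pass ----
v1 = b1 + x1*w11 + x2*w12;  y1 = phi(v1)
v2 = b2 + x1*w21 + x2*w22;  y2 = phi(v2)
v3 = b3 + y1*w31 + y2*w32;  y3 = phi(v3)
e3 = d3 - y3

# ---- local gradients ----
delta_o1 = phi_prime(v3) * e3
delta_h1 = phi_prime(v1) * delta_o1 * w31
delta_h2 = phi_prime(v2) * delta_o1 * w32

# ---- weight updates; at the first step W(0) = W(1), as on the slide ----
upd = lambda w, w_prev, delta, s: w + alpha*w_prev + eta*delta*s
w31, w32 = upd(w31, w31, delta_o1, y1), upd(w32, w32, delta_o1, y2)
b3       = upd(b3,  b3,  delta_o1, 1.0)
w11, w12 = upd(w11, w11, delta_h1, x1), upd(w12, w12, delta_h1, x2)
w21, w22 = upd(w21, w21, delta_h2, x1), upd(w22, w22, delta_h2, x2)
b1, b2   = upd(b1,  b1,  delta_h1, 1.0), upd(b2, b2, delta_h2, 1.0)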


Back Propagation Algorithm- Theory


In the Single Layer Perceptron algorithm, we used gradient descent on the error
function to find the correct weights: ∆w(n) = η[d − y]X(n), i.e.
∆wij(n) = η[di − yi]Xj(n) (see the sketch after this list).
We see that errors/updates are local to the node, i.e. the change in the weight
from node j to output i (wij) is controlled by the input that travels along the
connection and the error signal at output i.
But the problem is: how do we calculate the weight changes for the hidden layer, where
the desired output is known only for the output layer neurons?
Back propagation has two phases:
Forward pass phase: computes the 'functional signal', the feed-forward propagation of the input
pattern signals through the network.
Backward pass phase: computes the 'error signal', propagating the error backwards through the
network starting at the output units (where the error is the difference between the actual and
desired output values).
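For contrast, the single-layer update above is one line of code (a sketch; the function name is mine):

import numpy as np

def slp_update(W, x, d, y, eta=0.1):
    # W[i, j] is the weight from input j to output i; x, d, y are vectors.
    # delta_w_ij = eta * (d_i - y_i) * x_j for every output i and input j.
    return W + eta * np.outer(d - y, x)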


Back Propagation Algorithm- Theory

Consider a multi-layer network with one hidden layer, where nodes in the input layer are
indexed by i, nodes in the hidden layer are indexed by j, and nodes in the
output layer are indexed by k.
Then the weights between the input and hidden layers are written wji, and the
weights between the hidden and output layer neurons are written wkj.
If the gradient descent approach is used to update the weights, the
objective of learning is to modify the weight matrices so as to reduce the sum of
squared errors E = Σk (dk − yk)²


Back Propagation Algorithm- Theory


The error signal at the output of neuron k at iteration n is defined by
ek(n) = dk(n) − yk(n) −−−− (1)
Therefore the error energy for neuron k is
E = ½ ek²(n)
Thus the total error energy over all neurons in the output layer is
E = Σk ½ ek²(n) −−−− (2)
The induced local field vk(n) produced at the input of the activation function
associated with neuron k is therefore
vk(n) = Σj wkj(n) yj(n) (where the sum includes the bias term j = 0) −−−− (3)
yk(n) = φ(vk(n)) −−−− (4)

The back propagation algorithm applies a correction ∆wkj to the synaptic weight
which is proportional to the partial derivative ∂E(n)/∂wkj(n).

∂E(n)/∂wkj(n) = ∂E(n)/∂ek(n) · ∂ek(n)/∂wkj(n)
∂E(n)/∂wkj(n) = ∂E(n)/∂ek(n) · ∂ek(n)/∂yk(n) · ∂yk(n)/∂wkj(n)
∂E(n)/∂wkj(n) = ∂E(n)/∂ek(n) · ∂ek(n)/∂yk(n) · ∂yk(n)/∂vk(n) · ∂vk(n)/∂wkj(n)

Back Propagation Algorithm- Theory


∂E(n)/∂wkj(n) = ∂E(n)/∂ek(n) · ∂ek(n)/∂yk(n) · ∂yk(n)/∂vk(n) · ∂vk(n)/∂wkj(n)

From (2), E = Σk ½ ek²(n), therefore ∂E(n)/∂ek(n) = ek(n) −−− (5)
From (1), ek(n) = dk(n) − yk(n), therefore ∂ek(n)/∂yk(n) = −1 −−− (6)
From (4), yk(n) = φ(vk(n)), therefore ∂yk(n)/∂vk(n) = φ′(vk(n)) −−− (7)
From (3), vk(n) = Σj wkj(n)yj(n), therefore ∂vk(n)/∂wkj(n) = yj(n) −−− (8)

∂E(n)/∂wkj(n) = −ek(n) φ′(vk(n)) yj(n)

The correction ∆wkj(n) applied to wkj(n) is defined by the delta rule:
∆wkj(n) = −η ∂E(n)/∂wkj(n), where η is the learning rate parameter of the back propagation
algorithm (the minus sign is used to seek a direction for the weight change that reduces
the value of E(n)).
∆wkj(n) = η δk(n) yj(n)
where δk(n) = ek(n) φ′(vk(n))
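This result is easy to verify numerically; the sketch below compares the analytic gradient −ek φ′(vk) yj with a finite-difference estimate for one sigmoid output neuron, using assumed values for the hidden outputs, weights and target:

import numpy as np

def phi(v): return 1.0 / (1.0 + np.exp(-v))

y_hidden = np.array([1.0, 0.3, 0.7])    # [1, y1, y2]: bias signal plus hidden outputs (assumed)
w_k      = np.array([0.2, -0.4, 0.5])   # [b_k, w_k1, w_k2] (assumed)
d_k      = 0.9                          # desired output (assumed)

def E(w):
    # error energy 0.5 * e_k^2 for this single output neuron
    return 0.5 * (d_k - phi(w @ y_hidden))**2

# Analytic gradient: dE/dw_kj = -e_k * phi'(v_k) * y_j
v_k = w_k @ y_hidden
e_k = d_k - phi(v_k)
grad_analytic = -e_k * phi(v_k) * (1 - phi(v_k)) * y_hidden

# Central finite-difference estimate
eps = 1e-6
grad_numeric = np.array([(E(w_k + eps*np.eye(3)[j]) - E(w_k - eps*np.eye(3)[j])) / (2*eps)
                         for j in range(3)])
print(np.allclose(grad_analytic, grad_numeric, atol=1e-8))   # expect True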


Back Propagation Algorithm- Theory


For δk(n) = ek(n)φ′(vk(n)), we can show that δk(n) = −∂E(n)/∂vk(n).
For hidden neurons:
δj(n) = −∂E(n)/∂yj(n) · ∂yj(n)/∂vj(n)
Differentiating (2), E = Σk ½ ek²(n), with respect to the functional signal yj(n), we get
∂E/∂yj(n) = Σk ek(n) · ∂ek(n)/∂yj(n)
Again using the chain rule,
∂E/∂yj(n) = Σk ek(n) · ∂ek(n)/∂vk(n) · ∂vk(n)/∂yj(n)
From (1), ek(n) = dk(n) − yk(n) with yk(n) = φ(vk(n)), therefore ∂ek(n)/∂vk(n) = −φ′(vk(n))
From (3), vk(n) = Σj wkj(n)yj(n), therefore ∂vk(n)/∂yj(n) = wkj(n)
Now substitute these values into ∂E(n)/∂yj(n) = Σk ek(n) · ∂ek(n)/∂vk(n) · ∂vk(n)/∂yj(n):
∂E/∂yj(n) = −Σk ek(n) φ′(vk(n)) wkj(n), and yj(n) = φ(vj(n)) so ∂yj(n)/∂vj(n) = φ′(vj(n))
So, δj(n) = −∂E(n)/∂yj(n) · ∂yj(n)/∂vj(n) = φ′(vj(n)) Σk δk(n) wkj(n)


Back Propagation Algorithm- Summary


The local gradient for an output neuron is:
δk(n) = ek(n)φ′(vk(n))

The weight updating rule for an output neuron is:
wkj(n + 1) = wkj(n) + αwkj(n − 1) + ηδk(n)yj(n)

The local gradient for a hidden neuron is:
δj(n) = φ′(vj(n)) Σk δk(n)wkj(n)

The weight updating rule for a hidden neuron is:
wji(n + 1) = wji(n) + αwji(n − 1) + ηδj(n)yi(n)
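A sketch of a complete training loop built from these four rules, applied to XOR with sigmoid units; the initial weights, η and α are assumptions, and note that the momentum term here uses the conventional α∆w(n − 1) rather than the αw(n − 1) written on the slides:

import numpy as np

def phi(v):       return 1.0 / (1.0 + np.exp(-v))
def phi_prime(v): return phi(v) * (1.0 - phi(v))

rng = np.random.default_rng(1)
# 2-2-1 network; the first column/entry of each weight array is the bias
W_h = rng.uniform(-1, 1, (2, 3))       # hidden layer: rows are [b, w from x1, w from x2]
W_o = rng.uniform(-1, 1, 3)            # output neuron: [b, w from h1, w from h2]
dW_h_prev = np.zeros_like(W_h)         # previous weight changes, for the momentum term
dW_o_prev = np.zeros_like(W_o)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([0, 1, 1, 0], dtype=float)
eta, alpha = 0.5, 0.9                  # assumed learning rate and momentum

for epoch in range(10000):
    for x, d in zip(X, D):
        x_aug = np.concatenate(([1.0], x))
        v_h = W_h @ x_aug;  y_h = phi(v_h)                 # forward pass
        h_aug = np.concatenate(([1.0], y_h))
        v_o = W_o @ h_aug;  y_o = phi(v_o)

        delta_o = (d - y_o) * phi_prime(v_o)               # output local gradient
        delta_h = phi_prime(v_h) * delta_o * W_o[1:]       # hidden local gradients

        dW_o = alpha * dW_o_prev + eta * delta_o * h_aug   # weight changes with momentum
        dW_h = alpha * dW_h_prev + eta * np.outer(delta_h, x_aug)
        W_o, W_h = W_o + dW_o, W_h + dW_h
        dW_o_prev, dW_h_prev = dW_o, dW_h

for x in X:
    y_h = phi(W_h @ np.concatenate(([1.0], x)))
    print(x, round(float(phi(W_o @ np.concatenate(([1.0], y_h)))), 3))  # typically close to 0, 1, 1, 0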


Properties of BP
Advantages and Limitations of BP
The back propagation algorithm applies a correction ∆wij(n) to the synaptic
weight wij(n) which is proportional to ∂Error/∂wij.

Advantages:
Relatively simple implementation
A standard method that generally works well
Limitations:
Slow and inefficient
It can get stuck in local minima, resulting in sub-optimal solutions


Properties of BP - Momentum α

Adds a percentage of the last weight movement to the current movement.
Useful for getting over small bumps in the error function (i.e. it smooths out the
descent path by preventing extreme changes in the gradients due to local anomalies).
Often finds a minimum in fewer steps.

The Effect of the Learning Rate η

The learning rate coefficient determines the size of the weight adjustments made at
each iteration and hence influences the rate of convergence.
A poor choice of the coefficient can result in a failure to converge.
If the learning rate is too large, the search path will oscillate and converge more
slowly than a direct descent.
If the coefficient is too small, the descent will progress in small steps, significantly
increasing the time to converge.
BP with Minibatch Learning

Minibatch Learning
Noise Structure
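The slide gives only the headings, so the following is a generic sketch of how the per-pattern updates above become a minibatch update: local gradients are computed for each pattern in a small batch and averaged before the weights change. Smaller batches give noisier but cheaper gradient estimates, which is presumably what "Noise Structure" refers to here.

import numpy as np

def phi(v): return 1.0 / (1.0 + np.exp(-v))

def minibatch_step(W_h, W_o, X_batch, D_batch, eta=0.1):
    # One minibatch update for a 2-2-1 sigmoid network: the per-pattern gradients
    # are accumulated and averaged before the weights are changed.
    gW_h = np.zeros_like(W_h)
    gW_o = np.zeros_like(W_o)
    for x, d in zip(X_batch, D_batch):
        x_aug = np.concatenate(([1.0], x))
        y_h = phi(W_h @ x_aug)
        h_aug = np.concatenate(([1.0], y_h))
        y_o = phi(W_o @ h_aug)
        delta_o = (d - y_o) * y_o * (1.0 - y_o)          # output local gradient
        delta_h = y_h * (1.0 - y_h) * delta_o * W_o[1:]  # hidden local gradients
        gW_o += delta_o * h_aug
        gW_h += np.outer(delta_h, x_aug)
    n = len(X_batch)
    return W_h + eta * gW_h / n, W_o + eta * gW_o / n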
