
Multilayer neural networks

1
Objectives
• Multilayer neural networks

• Training a Neural Network with Backpropagation

• Backpropagation learning algorithm

• Solved example

2
Multilayer Neural Networks
• Multilayer neural network: a neural network containing more than one computational layer; the additional intermediate layers are called hidden layers.
• This specific architecture of multilayer neural networks is called a feed-forward network.
• The default architecture of a feed-forward network is that all nodes in one layer are connected to all nodes in the next layer (a small sketch of such a forward pass is given below).
• A bias neuron can be used in the hidden layers or in the output layer.
• A fully connected architecture performs well in many settings, but better performance is often achieved by pruning many of the connections or sharing them in an insightful way.
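As a minimal sketch of such a fully connected forward pass (plain Python, assuming sigmoid activations, no bias neurons, and reusing the weight values from the solved example at the end of these slides):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense_layer(inputs, weights):
    """Fully connected layer: every input feeds every neuron of this layer."""
    return [sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)))
            for neuron_weights in weights]

# 2 inputs -> 2 hidden neurons -> 1 output neuron (weights from the solved example).
x = [0.35, 0.9]
hidden = dense_layer(x, [[0.1, 0.8], [0.4, 0.6]])   # hidden outputs ~ [0.68, 0.66]
output = dense_layer(hidden, [[0.3, 0.9]])          # output ~ [0.69]
print(hidden, output)
```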

3
Notations of Multilayer NNs
We can describe a multilayer network using two notations:
- vector notation
- scalar notation
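As a sketch of the difference for a single layer (generic symbols, not taken from the slides): vector notation writes the whole layer in one equation, while scalar notation writes one equation per neuron.

```latex
% Vector notation: the whole layer at once (W is the weight matrix, \Phi the activation)
\bar{h} = \Phi\left(W\,\bar{x}\right)

% Scalar notation: one equation for each neuron j of the layer
h_j = \Phi\!\left(\sum_i w_{ji}\, x_i\right)
```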

4
Training NN with Backpropagation
• In a multilayer NN the loss (i.e. the error) is a complicated composition function of the weights in the earlier layers.
• The loss tells us how poorly the model is performing at the current instant.
• We now need to use this loss to train our network so that it performs better.
• By minimizing the loss, our model is going to perform better.
• How do we minimize the loss?
  By using an optimization algorithm (types of optimizers will be discussed later).
• A popular optimization algorithm is called “Gradient Descent”; its basic update rule is sketched below.
• The gradient of a composition function is computed by the “Backpropagation” algorithm.
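A sketch of the gradient-descent update that these optimizers build on (η is the learning rate; the exact rule varies with the optimizer):

```latex
w \;\leftarrow\; w \;-\; \eta\,\frac{\partial L}{\partial w}
```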

5
Gradient
• Gradient: the slope representing the relationship between a network’s weights and its error.
• We can also view the gradient as a measure of how much the output of a function changes if you change the inputs a little bit.
• Our aim is to get to the bottom of the cost-vs-weights graph, i.e. to reach a point (a local minimum, where the cost function takes its least nearby value) from which we can no longer move downhill; a small numerical sketch follows.
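A minimal sketch of “moving downhill” with gradient descent, assuming the toy cost L(w) = (w − 3)², whose minimum is at w = 3 (the cost curve and starting point are made up for illustration):

```python
def cost(w):
    return (w - 3.0) ** 2            # toy cost curve, lowest at w = 3

def gradient(w):
    return 2.0 * (w - 3.0)           # slope of the cost curve at w

w = 0.0                              # arbitrary starting weight
learning_rate = 0.1
for step in range(50):
    w -= learning_rate * gradient(w) # take a small step downhill
print(w, cost(w))                    # w approaches 3, where the slope is ~0
```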
6
Backpropagation
• Backpropagation computes the error gradients as the summation of local-gradient products over the various paths from a node to the output node.
• Backpropagation can be computed efficiently by dynamic programming.
• It contains two phases (sketched in code below):
  1. Forward phase: compute the output values and the local derivatives (of the loss function) at the various nodes.
  2. Backward phase: accumulate the products of these local derivatives over the paths from each node to the output node, to obtain the gradients of the loss function with respect to the weights.
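A minimal sketch of the two phases on a single chain x → h → o with loss L = ½(t − o)², assuming sigmoid units (weights, input and target are made up); with only one path, the backward phase is simply a product of the local derivatives stored during the forward phase:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

w1, w2, x, t = 0.5, -0.4, 1.0, 0.3          # made-up weights, input and target

# Forward phase: compute node outputs and keep each local derivative.
h = sigmoid(w1 * x); dh = h * (1.0 - h)     # dh / d(w1*x)
o = sigmoid(w2 * h); do = o * (1.0 - o)     # do / d(w2*h)
dL_do = -(t - o)                            # dL / do  for L = 0.5*(t - o)^2

# Backward phase: multiply the local derivatives along the path to each weight.
dL_dw2 = dL_do * do * h
dL_dw1 = dL_do * do * w2 * dh * x
print(dL_dw1, dL_dw2)
```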

7
Major steps of backpropagation learning algo.
• The network is first initialized by setting all of its weights to small random numbers, say between −1 and +1.
• The outputs at the various nodes are then calculated (this is the forward pass).
• The calculated output is completely different from what we want (the Target), since all the weights are random.
• We then calculate the Error of each output neuron, which is essentially: Error = Target − Actual output.
• This error is then used mathematically to change the weights in such a way that the error gets smaller.
• In other words, the output of each neuron gets closer to its Target (this part is the reverse pass).
• The process is repeated until the error is minimal; a compact sketch of this loop is given below.
• The next slides show the mathematical equations that control the process of updating the weights.
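The loop can be sketched in Python for the smallest possible case, a single sigmoid output neuron with no hidden layer (the full two-layer version appears after the solved example); the inputs and target reuse the values of the solved example, everything else is illustrative:

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
w = [random.uniform(-1, 1), random.uniform(-1, 1)]  # 1. small random initial weights
x, target, eta = [0.35, 0.9], 0.5, 0.1              # one training pattern, learning rate

for epoch in range(5000):
    out = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))  # 2. forward pass
    error = target - out                                 # 3. Error = Target - Actual
    if abs(error) < 1e-4:                                # 6. stop once the error is minimal
        break
    delta = error * out * (1.0 - out)                    # 4. reverse pass: error signal
    w = [wi + eta * delta * xi for wi, xi in zip(w, x)]  # 5. change weights to reduce error
print(epoch, out)
```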
8
Backpropagation (Cont.)

Eq. 3.1
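The equation itself did not survive extraction from the slide; a plausible reconstruction (an assumption on my part, based on the path-summation wording of the Backpropagation slide above) is the multivariable chain rule applied over all paths from a hidden node h_r to the output o:

```latex
\frac{\partial L}{\partial w_{(h_{r-1},\,h_r)}}
  \;=\;
  \frac{\partial L}{\partial o}\cdot
  \left[\;
    \sum_{[h_r,\,h_{r+1},\,\ldots,\,h_k,\,o]\,\in\,\mathcal{P}}
      \frac{\partial o}{\partial h_k}
      \prod_{i=r}^{k-1}\frac{\partial h_{i+1}}{\partial h_i}
  \;\right]
  \cdot\frac{\partial h_r}{\partial w_{(h_{r-1},\,h_r)}}
```

Here 𝒫 denotes the set of all paths that start at h_r and end at the output node o.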

9
Backpropagation (Cont.)
• Consider a sequence of hidden layers:

• The partial derivative of the loss function with respect to the weights is:

Based on the type of activation, we can define delta as follows:

Eq. 3.2
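As with Eq. 3.1, the formulas themselves are missing from the extracted text; a plausible reconstruction (my assumption, using the standard delta recursion for a chain of hidden layers h_1 → h_2 → … → h_k → o, where a denotes a node’s pre-activation and Φ its activation function) is:

```latex
% Gradient of the loss with respect to a weight entering node h_r
\frac{\partial L}{\partial w_{(h_{r-1},\,h_r)}} \;=\; \delta(h_r,\,o)\cdot h_{r-1}

% Delta at the output node
\delta(o,\,o) \;=\; \Phi'(a_o)\,\frac{\partial L}{\partial o}

% Delta at a hidden node, accumulated from the nodes of the next layer
\delta(h_r,\,o) \;=\; \Phi'(a_{h_r})\sum_{h_{r+1}} w_{(h_r,\,h_{r+1})}\,\delta(h_{r+1},\,o)
```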

10
Backpropagation (Cont.)
• For further clarification, consider the following network:
• Based on the sigmoid activation, we can rewrite the final equations of backpropagation as follows:

11
Eq. 3.3
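The body of Eq. 3.3 is missing from the extracted slide, but it can be reconstructed from the solved example that follows, which applies exactly these sigmoid-specific rules (η is the learning rate, taken as 1 in the example; o denotes a neuron’s output and x the input carried by the connection being updated):

```latex
% Error signal at the output neuron (t is the target)
\delta_{o} \;=\; (t - o)\,o\,(1 - o)

% Error signal at a hidden neuron h (the sum runs over the neurons k that h feeds)
\delta_{h} \;=\; o_h\,(1 - o_h)\sum_{k} w_{hk}\,\delta_{k}

% Weight update
w^{+} \;=\; w \;+\; \eta\,\delta\,x
```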

12
Solved example
• It is important for you to execute and practice the backpropagation algorithm step by step.
• You can follow Equation 3.1 or Equation 3.3.
• Consider the following network: two inputs x1 = 0.35 and x2 = 0.9, two hidden neurons and one output neuron, with no bias neurons.

Assume that the neurons have a Sigmoid activation function and that the initial weights are:
w3 = 0.1 (x1 → top hidden neuron), w4 = 0.8 (x2 → top hidden neuron),
w5 = 0.4 (x1 → bottom hidden neuron), w6 = 0.6 (x2 → bottom hidden neuron),
w1 = 0.3 (top hidden → output), w2 = 0.9 (bottom hidden → output).

(Optionally, ignore the learning rate or set it to 0.1.)
(i) Perform a forward pass on the network.
(ii) Perform a reverse pass (training) once (Target = 0.5).
(iii) Perform a further forward pass.
13
13
Solved example (Cont.)
Answer (i):
• Input to top neuron = (0.35*0.1)+(0.9*0.8) = 0.755.
  Output = f(net) = 1/(1 + e^(−0.755)) = 0.68 (by the sigmoid).
• Input to bottom neuron = (0.9*0.6)+(0.35*0.4) = 0.68.
  Output = f(net) = 1/(1 + e^(−0.68)) = 0.6637.
• Input to final neuron = (0.3*0.68)+(0.9*0.6637) = 0.80133.
  Output = 1/(1 + e^(−0.80133)) = 0.69.
• Error E = d − o = 0.5 − 0.69 = −0.19.

14
Solved example (Cont.)
Answer (ii):
1. Output error: δ = (t − o)(1 − o)o = (0.5 − 0.69)*(1 − 0.69)*0.69 = −0.0406.
2. New weights for the output layer:
• w1+ = w1 + (δ * input) = 0.3 + (−0.0406*0.68) = 0.272392.
• w2+ = w2 + (δ * input) = 0.9 + (−0.0406*0.6637) = 0.87305.
3. Errors for the hidden layer (each uses the updated output weight and that hidden neuron’s own output o):
• δ1 = (δ * w1+)*(1 − o)o = −0.0406 * 0.272392 * (1 − 0.68)*0.68 = −2.406 × 10^−3.
• δ2 = (δ * w2+)*(1 − o)o = −0.0406 * 0.87305 * (1 − 0.6637)*0.6637 = −7.916 × 10^−3.
4. New hidden layer weights:
• w3+ = 0.1 + (−2.406 × 10^−3 * 0.35) = 0.09916.
• w4+ = 0.8 + (−2.406 × 10^−3 * 0.9) = 0.7978.
• w5+ = 0.4 + (−7.916 × 10^−3 * 0.35) = 0.3972.
• w6+ = 0.6 + (−7.916 × 10^−3 * 0.9) = 0.5928.
15
Solved example (Cont.)
Answer (iii):
• The old error was −0.19.
• When you execute the forward pass again using the new weights, the new error = −0.18205.
• “Try a further forward pass by yourself.”
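As a check, the whole worked example can be reproduced with a short script (a sketch; the variable names are mine, but the inputs, weights and target are those used above, with the learning rate taken as 1):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x1, x2, target = 0.35, 0.9, 0.5
w1, w2 = 0.3, 0.9                      # hidden -> output weights
w3, w4 = 0.1, 0.8                      # inputs -> top hidden neuron
w5, w6 = 0.4, 0.6                      # inputs -> bottom hidden neuron

def forward():
    h1 = sigmoid(w3 * x1 + w4 * x2)    # top hidden neuron
    h2 = sigmoid(w5 * x1 + w6 * x2)    # bottom hidden neuron
    o = sigmoid(w1 * h1 + w2 * h2)     # output neuron
    return h1, h2, o

# (i) forward pass
h1, h2, o = forward()
print("error:", target - o)            # ~ -0.19

# (ii) reverse pass
delta_o = (target - o) * o * (1 - o)   # ~ -0.0406
w1 += delta_o * h1                     # ~ 0.2724
w2 += delta_o * h2                     # ~ 0.8731
delta_1 = delta_o * w1 * h1 * (1 - h1) # updated w1 used, as in the slides
delta_2 = delta_o * w2 * h2 * (1 - h2)
w3 += delta_1 * x1;  w4 += delta_1 * x2
w5 += delta_2 * x1;  w6 += delta_2 * x2

# (iii) second forward pass with the new weights
_, _, o = forward()
print("new error:", target - o)        # ~ -0.182
```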

16
