
Backpropagation Learning in

Neural Networks
Three-layer networks
[Figure: a feed-forward network with inputs x1, …, xn on the left, one or more hidden layers in the middle, and the output layer on the right.]
Properties of architecture

• No connections within a layer
• No direct connections between input and output layers
• Fully connected between adjacent layers
• Often more than 3 layers
• Number of output units need not equal number of input units
• Number of hidden units per layer can be more or less than the number of input or output units

Each unit is a perceptron.

Each layer receives activations from the previous layer and sends activations to the next layer. Different layers, and even different neurons within the same layer, may use different activation functions.
\[
g_k(\mathbf{x}) \equiv z_k = f\!\left(\sum_{j=1}^{n_H} w_{kj}\, f\!\left(\sum_{i=1}^{d} w_{ji} x_i + w_{j0}\right) + w_{k0}\right), \qquad k = 1, \dots, c \quad (1)
\]
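As a minimal sketch of equation (1), the forward pass through one hidden layer is two matrix-vector products, each followed by the nonlinearity. The names below (forward, W_hidden, W_out) are illustrative, and a logistic sigmoid is assumed for f:

```python
import numpy as np

def sigmoid(a):
    # Logistic nonlinearity, assumed here for f
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W_hidden, b_hidden, W_out, b_out):
    """Compute g_k(x) of equation (1) for k = 1, ..., c.

    x        : input vector of length d
    W_hidden : (n_H, d) input-to-hidden weights w_ji
    b_hidden : (n_H,)   hidden biases w_j0
    W_out    : (c, n_H) hidden-to-output weights w_kj
    b_out    : (c,)     output biases w_k0
    """
    y = sigmoid(W_hidden @ x + b_hidden)   # hidden activations f(sum_i w_ji x_i + w_j0)
    z = sigmoid(W_out @ y + b_out)         # outputs z_k = f(sum_j w_kj y_j + w_k0)
    return z
```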

Multiple-Layer Networks
(A. Kolmogorov)
“Any continuous function from input to output can be implemented in a three-layer net, given a sufficient number of hidden units, proper nonlinearities, and weights.”

The power of backpropagation is that it enables us to compute an effective error for each hidden unit, and thus derive a learning rule for the input-to-hidden weights.
What does each of the layers do?

The 1st layer draws linear boundaries, the 2nd layer combines those boundaries, and the 3rd layer can generate arbitrarily complex boundaries.
Backpropagation
[Figure: the two steps of backpropagation. Forward step: propagate activation from the input (xi) through the hidden layer (xk) to the output layer (yj). Backward step: propagate the errors (dj, dk) from the output layer back to the hidden layer through the weights wjk and wki.]
The idea behind backpropagation
• We don’t know what the hidden units ought to do, but we can compute how fast the error changes as we change a hidden activity.
  – Instead of using desired activities to train the hidden units, use error derivatives w.r.t. hidden activities.
  – Each hidden activity can affect many output units and can therefore have many separate effects on the error. These effects must be combined (see the chain-rule sketch below).
  – We can compute error derivatives for all the hidden units efficiently.
  – Once we have the error derivatives for the hidden activities, it’s easy to get the error derivatives for the weights going into a hidden unit.
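A sketch of how those many effects combine, assuming sigmoid hidden units and the δ notation of the algorithm on the next slide (one chain-rule term per output unit k, where a_k is the net input to output unit k):

\[
\frac{\partial E}{\partial y_h}
  = \sum_{k} \frac{\partial E}{\partial a_k}\,\frac{\partial a_k}{\partial y_h}
  = \sum_{k} \frac{\partial E}{\partial a_k}\, w_{hk},
\qquad
\delta_h = y_h (1 - y_h) \sum_{k} w_{hk}\, \delta_k .
\]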
Backpropagation Algorithm
• Initialize each weight wi,j to some small random value
• Until the termination condition is met, Do
  – For each training example <(x1,…,xn), t> Do
    • Input the instance (x1,…,xn) to the network and compute the network outputs yk
    • For each output unit k
      – dk = yk(1 − yk)(tk − yk)
    • For each hidden unit h
      – dh = yh(1 − yh) Σk wh,k dk
    • For each network weight wi,j Do
      – wi,j = wi,j + Δwi,j, where Δwi,j = η dj xi,j
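A minimal runnable sketch of this loop, assuming one hidden layer of sigmoid units, biases omitted for brevity, and a fixed number of epochs as the termination condition (the names train, W_h, W_o, eta are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train(examples, n_hidden, eta=0.1, epochs=1000, seed=0):
    """Stochastic backpropagation for one hidden layer of sigmoid units.

    examples : list of (x, t) pairs, x of length d, target t of length c
    """
    rng = np.random.default_rng(seed)
    d, c = len(examples[0][0]), len(examples[0][1])
    # initialize each weight to a small random value
    W_h = rng.uniform(-0.05, 0.05, (n_hidden, d))   # input-to-hidden
    W_o = rng.uniform(-0.05, 0.05, (c, n_hidden))   # hidden-to-output

    for _ in range(epochs):                         # termination: fixed epoch count
        for x, t in examples:
            x, t = np.asarray(x, float), np.asarray(t, float)
            # forward step: propagate activation from input to output
            y_h = sigmoid(W_h @ x)
            y_k = sigmoid(W_o @ y_h)
            # backward step: propagate errors from output to hidden layer
            d_k = y_k * (1 - y_k) * (t - y_k)       # dk = yk(1-yk)(tk-yk)
            d_h = y_h * (1 - y_h) * (W_o.T @ d_k)   # dh = yh(1-yh) Σk wh,k dk
            # weight updates: w <- w + eta * delta * input to that weight
            W_o += eta * np.outer(d_k, y_h)
            W_h += eta * np.outer(d_h, x)
    return W_h, W_o
```

Updating after every example, as in the inner loop above, is the stochastic (online) variant; a batch variant would sum the Δw over all examples before applying them.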
Illustrating Backpropagation
Note: Input nodes do not have activation functions.
Hidden nodes are not directly connected to the incoming data or to the eventual output. If there is one hidden layer, there are two processing layers (the hidden layer and the output layer), and so on. The non-linear processing happens in these layers.
Backpropagation
Symbol w(xm)n represents the weight of the connection between network input xm and neuron n in the input layer. Symbol yn represents the output signal of neuron n.
Backpropagation
Propagation of signals through the hidden layer. Symbol wmn represents the weight of the connection between the output of neuron m and the input of neuron n in the next layer.
Backpropagation
Propagation of signals through the output layer.
Backpropagation
In the next step of the algorithm, the output signal of the network, y, is compared with the desired output value (the target), found in the training data set. The difference is called the error signal d of the output-layer neuron.
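In symbols, for an output neuron with target t and output y (a sketch; note that the earlier algorithm slide instead folds the sigmoid derivative into this quantity, giving dk = yk(1 − yk)(tk − yk)):

\[
\delta = t - y
\]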
Learning Algorithm:
Backpropagation
The idea is to propagate the error signal d (computed in a single training step) back to all neurons whose output signals were inputs to the neuron in question.
Learning Algorithm:
Backpropagation
The weight coefficients wmn used to propagate the errors backwards are the same as those used when computing the output value; only the direction of data flow is reversed (signals are propagated from the output towards the inputs, layer by layer). This technique is used for all network layers. If the propagated errors come from several neurons, they are summed, as sketched below:
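A sketch of that summation for a single neuron m in the preceding layer, using the weight notation of these slides (δn is the error signal of neuron n in the next layer):

\[
\delta_m = \sum_{n} w_{mn}\, \delta_n
\]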
Learning Algorithm:
Backpropagation
When the error signal for each neuron has been computed, the weight coefficients of each neuron's input connections may be modified. In the formulas below, df(e)/de represents the derivative of the activation function of the neuron whose weights are being modified.
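A sketch of the update these slides describe, assuming η is the learning rate and y_m is the signal on the modified input connection (the output of neuron m, or the network input x_m for first-layer weights):

\[
w'_{mn} = w_{mn} + \eta\, \delta_n\, \frac{d f_n(e)}{d e}\, y_m
\]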
The decision boundary: initial random weights
The decision boundary after training
How NNs differ from SVMs
NNs use a nonlinear f(x), so they can draw complex boundaries but keep the data unchanged. SVMs only draw straight lines, but they transform the data first in a way that makes that OK.
