Machine Learning:
Algorithms and Applications
Floriano Zini
Neural Networks
(continued)
Perceptron
A perceptron is the simplest type of ANN. Its output is
o = sgn(w • x) = sgn(Σi=0…n wi xi)
where x0 = 1 is a fixed bias input
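A minimal Python sketch of this output computation (the function name and the convention sgn(0) = +1 are assumptions, not from the slides):

    import numpy as np

    def perceptron_output(w, x):
        # o = sgn(w . x); x is assumed to already include the bias input x0 = 1
        return 1 if np.dot(w, x) >= 0 else -1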
Perceptron learning
Given a training set X = {(x, outx)}
  x is the input vector
  outx is the desired output value (i.e., -1 or 1)
Determine a weight vector w that makes the perceptron produce the correct output (-1 or 1) for every training instance
The hypothesis space is H = {w | w ∈ ℝn+1}
How to learn: idea
If a training instance x is correctly classified, then no (weight) update is needed
If outx = 1 but the perceptron outputs -1, then the weights w should be updated so that Σi=0…n wi xi is increased
If outx = -1 but the perceptron outputs 1, then the weights w should be updated so that Σi=0…n wi xi is decreased
Examples (for the perceptron training rule wi ← wi + η(outx − ox)xi, with learning rate η > 0; see the sketch below)
• x is correctly classified: outx − ox = 0 → no update
• ox = -1 but outx = 1: outx − ox > 0 → wi is increased if xi > 0, decreased otherwise → w • x is increased
• ox = 1 but outx = -1: outx − ox < 0 → wi is decreased if xi > 0, increased otherwise → w • x is decreased
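A hedged sketch of this training loop (η, the epoch bound, and the data layout are assumptions):

    import numpy as np

    def perceptron_train(X, outs, eta=0.1, epochs=100):
        # X: list of input vectors (each including the bias input x0 = 1)
        # outs: desired outputs, each -1 or +1
        w = np.zeros(len(X[0]))
        for _ in range(epochs):
            errors = 0
            for x, out_x in zip(X, outs):
                o_x = 1 if np.dot(w, x) >= 0 else -1         # o = sgn(w . x)
                w = w + eta * (out_x - o_x) * np.asarray(x)  # no change if o_x == out_x
                errors += int(o_x != out_x)
            if errors == 0:   # every training instance correctly classified
                return w
        return w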
A perceptron cannot correctly classify this training set! [figure of a training set that is not linearly separable omitted]
For gradient descent learning, consider an unthresholded linear unit with output o = w • x
Gradient descent
The gradient ∇E of the error E is a vector that specifies the direction that produces the steepest increase in E; the weights are therefore moved in the opposite direction, −∇E, to decrease the error
For the error E(w) = ½ Σx∈X (outx − ox)² of the linear unit:
∂E/∂wi = ∂/∂wi [½ Σx∈X (outx − ox)²]
       = ½ Σx∈X 2 (outx − ox) ∂/∂wi (outx − ox)
       = Σx∈X (outx − ox) ∂/∂wi (outx − w • x)
       = Σx∈X (outx − ox) (−xix)
where xix denotes the i-th input of the linear unit for example x
The variation of the i-th weight is
Δwi = η Σx∈X (outx − ox) xix,  i = 0,…, n
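A small sketch that evaluates this gradient formula directly (the names and the row-per-example array layout are assumptions):

    import numpy as np

    def grad_E(w, X, outs):
        # dE/dw for E = 1/2 * sum_x (out_x - o_x)^2, with o_x = w . x
        X = np.asarray(X, dtype=float)       # one row per example, columns x0..xn
        outs = np.asarray(outs, dtype=float)
        o = X @ w                            # linear unit outputs for all examples
        return -(X.T @ (outs - o))           # dE/dw_i = sum_x (out_x - o_x)(-x_ix)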
Gradient_descent (X, η)
  Initialize w (wi ← an initial (small) random value)
  while stopping criterion not satisfied
    initialize each Δwi to 0
    for each training instance (x, outx) ∈ X
      compute the network output ox
      for each weight wi
        Δwi ← Δwi + η(outx − ox)xi
      end for
    end for
    for each weight wi
      wi ← wi + Δwi
    end for
  end while
  return w

• Stopping criterion: number of iterations (epochs), error below a threshold, etc.
• This algorithm can be applied whenever
  - the hypothesis space contains continuously parameterized hypotheses (i.e., the weights in a linear unit), and
  - the error can be differentiated with respect to these hypothesis parameters
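A direct Python transcription of this pseudocode for the linear unit (η, the epoch-count stopping criterion, and the seed are assumptions):

    import numpy as np

    def gradient_descent(X, outs, eta=0.01, epochs=1000):
        X = np.asarray(X, dtype=float)
        outs = np.asarray(outs, dtype=float)
        rng = np.random.default_rng(0)
        w = rng.uniform(-0.05, 0.05, size=X.shape[1])  # small random initial weights
        for _ in range(epochs):                        # stopping criterion: # of epochs
            delta_w = np.zeros_like(w)                 # initialize each Δwi to 0
            for x, out_x in zip(X, outs):
                o_x = w @ x                            # linear unit output
                delta_w += eta * (out_x - o_x) * x     # accumulate Δwi
            w += delta_w                               # apply after the whole epoch
        return w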
Stochastic_gradient_descent (X, η)
  Initialize w (wi ← an initial (small) random value)
  while stopping criterion not satisfied
    for each training instance (x, outx) ∈ X
      compute the network output ox
      for each weight wi
        wi ← wi + η(outx − ox)xi
      end for
    end for
  end while
  return w

The per-example update wi ← wi + η(outx − ox)xi is known as the delta rule.
Stochastic gradient descent can sometimes avoid falling into local minima because it uses the gradients of E for the individual examples rather than the overall gradient of E.
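The stochastic variant differs from the batch version only in where the update is applied: immediately inside the per-example loop. Same assumptions as the batch sketch:

    import numpy as np

    def stochastic_gradient_descent(X, outs, eta=0.01, epochs=1000):
        X = np.asarray(X, dtype=float)
        outs = np.asarray(outs, dtype=float)
        rng = np.random.default_rng(0)
        w = rng.uniform(-0.05, 0.05, size=X.shape[1])
        for _ in range(epochs):
            for x, out_x in zip(X, outs):
                o_x = w @ x
                w += eta * (out_x - o_x) * x   # delta rule: update per example
        return w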
Example: the input is two numerical parameters, F1 and F2, obtained from a spectral analysis of the sound [figure omitted]
σ(x) = 1 / (1 + e^(−x))   (sigmoid function)
dσ(x)/dx = σ(x)(1 − σ(x))   (nice property: the derivative is easily expressed in terms of σ itself)
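A sketch of the sigmoid and its derivative; the finite-difference check at the end is an addition for illustration:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_derivative(x):
        s = sigmoid(x)
        return s * (1.0 - s)   # dσ/dx = σ(x)(1 − σ(x))

    # quick numerical check against a finite difference at x = 0.3
    h = 1e-6
    assert abs(sigmoid_derivative(0.3) - (sigmoid(0.3 + h) - sigmoid(0.3)) / h) < 1e-5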
Backpropagation algorithm
The BP learning algorithm is used to learn the weights of a multi-layer ANN having a fixed structure (i.e., a fixed set of neurons and interconnections)
The error to be minimized is
E(w) = ½ Σx∈X Σi∈outputs (outix − oix)²
The error surface can have multiple local minima
Backpropagation algorithm
Backpropagation consists of two phases
Signal forward phase
  the input signals (i.e., the input vector) are propagated forward from the input layer to the output layer (through the hidden layers) and the output is calculated
Error backward phase
  starting at the output layer, the error is propagated backwards through the network, layer by layer, to the input layer
  the error backpropagation is performed by recursively computing the local gradient of each neuron; weights are updated
Backpropagation algorithm
Let's use this 3-layer ANN to illustrate the details of the BP learning algorithm [network diagram omitted]
  m input signals xj (j=1..m)
  l hidden neurons zq (q=1..l)
  n output neurons yi (i=1..n)
  wqj is the weight of the interconnection from input signal xj to hidden neuron zq
  wiq is the weight of the interconnection from hidden neuron zq to output neuron yi
  oq is the output value of hidden neuron zq
  oi is the output of the output neuron yi
Back-propagation algorithm
Backpropagation (X, η)
  Initialize all network weights to some small random value (e.g., between -0.05 and 0.05)
  Until the termination condition is met, do
    For each training example <(x1,…,xm), out>, do
      Propagate the input forward through the network
      1. Input the instance (x1,…,xm) to the network and compute the network outputs oi
      Propagate the errors backward through the network
      2. For each output unit yi, calculate its error term
         δi ← oi(1 − oi)(outi − oi)
      3. For each hidden unit zq, calculate its error term
         δq ← oq(1 − oq) Σi wiq δi
      4. For each network weight wjk, do
         wjk ← wjk + Δwjk, where Δwjk = η δj xjk
(o(1 − o) is the derivative of the sigmoid)
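A sketch of one training example's forward and backward pass for the 3-layer network above, transcribing steps 1-4 (the vectorized layout and names are assumptions; W_hid is the l×m matrix of weights wqj, W_out the n×l matrix of weights wiq):

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def backprop_step(W_hid, W_out, x, out, eta=0.1):
        # 1. forward pass: hidden outputs oq, then network outputs oi
        o_hid = sigmoid(W_hid @ x)       # oq = σ(Σj wqj xj), shape (l,)
        o_out = sigmoid(W_out @ o_hid)   # oi = σ(Σq wiq oq), shape (n,)
        # 2. output error terms: δi = oi(1 − oi)(outi − oi)
        delta_out = o_out * (1 - o_out) * (out - o_out)
        # 3. hidden error terms: δq = oq(1 − oq) Σi wiq δi
        delta_hid = o_hid * (1 - o_hid) * (W_out.T @ delta_out)
        # 4. weight updates Δwjk = η δj xjk, done for all weights via outer products
        W_out = W_out + eta * np.outer(delta_out, o_hid)
        W_hid = W_hid + eta * np.outer(delta_hid, x)
        return W_hid, W_out

Iterating backprop_step over every training example until the termination condition holds reproduces the two outer loops of the pseudocode.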
BP algorithm – generalization [slide content omitted]
Adding Momentum
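The body of this slide is not in the extracted text; what follows is the standard momentum modification of the weight update (the symbol α and the code names are assumptions). Each update keeps a fraction α of the previous iteration's update,

Δwjk(t) = η δj xjk + α Δwjk(t−1),  0 ≤ α < 1

which can speed up convergence across flat regions of the error surface and can help roll through small local minima. As a sketch:

    import numpy as np

    def momentum_update(W, delta, x, dW_prev, eta=0.1, alpha=0.9):
        # Δw(t) = η δ x + α Δw(t−1); dW_prev is the update from the previous iteration
        dW = eta * np.outer(delta, x) + alpha * dW_prev
        return W + dW, dW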
Advantages [list omitted]
Disadvantages [list omitted]