
Neural Nets Using Backpropagation

Chris Marriott, Ryan Shirley, CJ Baker, Thomas Tannahill

Agenda

- Review of neural nets and backpropagation
- Backpropagation: the math
- Advantages and disadvantages of gradient descent and other algorithms
- Enhancements to gradient descent
- Other ways of minimizing error

Review

- Approach that developed from an analysis of the human brain
- Nodes created as an analog to neurons
- Mainly used for classification problems (e.g. character recognition, voice recognition, medical applications)

Review

- Neurons have weighted inputs, threshold values, an activation function, and an output

[Figure: a neuron whose weighted inputs feed an activation function that produces the output]

Activation function: output = f(Σ(inputs * weights))
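The weighted-sum-and-activation model above can be sketched in a few lines of Python (names and values here are illustrative, using a simple step activation):

```python
def step(x, threshold):
    # Step activation: fire (1) when the weighted sum reaches the threshold
    return 1 if x >= threshold else 0

def neuron_output(inputs, weights, threshold):
    # output = f(sum(inputs * weights))
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    return step(weighted_sum, threshold)
```

With both weights set to 1 and a threshold of 1.5, this unit behaves like a 2-input AND gate.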

Review: 4-Input AND

[Figure: a 4-input AND gate built from three 2-input threshold units. Two units each take a pair of inputs, and their outputs feed a third unit. All weights = 1, each threshold = 1.5, and all outputs = 1 if active, 0 otherwise.]
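A sketch of the 4-input AND network above, built from 2-input threshold units (all weights 1, threshold 1.5):

```python
from itertools import product

def and_unit(a, b):
    # 2-input threshold unit: both weights 1, threshold 1.5
    return 1 if (a + b) >= 1.5 else 0

def four_input_and(i1, i2, i3, i4):
    # Two first-layer units each take a pair of inputs;
    # their outputs feed a third unit
    return and_unit(and_unit(i1, i2), and_unit(i3, i4))

# Check the full truth table: output is 1 only when all inputs are 1
for bits in product([0, 1], repeat=4):
    assert four_input_and(*bits) == (1 if all(bits) else 0)
```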

Review

Output space for AND gate

[Figure: the four input points (0,0), (0,1), (1,0), (1,1) plotted on axes Input 1 and Input 2; the decision boundary 1.5 = w1*I1 + w2*I2 separates (1,1) from the other three points.]

Review

Output space for XOR gate

- Demonstrates the need for a hidden layer

[Figure: the points (0,1) and (1,0) cannot be separated from (0,0) and (1,1) by any single line through the input space.]
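To illustrate why a hidden layer fixes XOR, here is a minimal sketch using threshold units; the specific weights and thresholds are one workable choice, not the only one:

```python
def unit(inputs, weights, threshold):
    # Threshold unit: fire when the weighted sum reaches the threshold
    return 1 if sum(i * w for i, w in zip(inputs, weights)) >= threshold else 0

def xor(a, b):
    h_or   = unit([a, b], [1, 1], 0.5)     # fires unless both inputs are 0
    h_nand = unit([a, b], [-1, -1], -1.5)  # fires unless both inputs are 1
    return unit([h_or, h_nand], [1, 1], 1.5)  # AND of the two hidden units
```

Each hidden unit draws one line through the input space, and the output unit combines the two half-planes, something no single line can do.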

Backpropagation: The Math

General multi-layered neural network

[Figure: a network with an input layer of nodes 0 through 9 (activations X0,0 ... X9,0), a hidden layer, and an output layer; weights W0,0, W1,0, ..., Wi,0 connect node i to the next layer.]

Backpropagation: The Math

Backpropagation

Calculation of hidden layer activation values

Backpropagation: The Math

Backpropagation

Calculation of output layer activation values
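The hidden- and output-layer activation calculations can both be sketched with a single layer-forward routine, assuming sigmoid activations (the weight values below are made up for illustration):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights):
    # Activation of node j: f(sum_i W[j][i] * x[i])
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

# Hypothetical net: 3 inputs -> 2 hidden nodes -> 1 output node
x = [0.5, 0.1, 0.9]
W_hidden = [[0.2, -0.4, 0.1], [0.7, 0.3, -0.2]]
W_output = [[0.5, -0.6]]

hidden = layer_forward(x, W_hidden)       # hidden layer activation values
output = layer_forward(hidden, W_output)  # output layer activation values
```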

Backpropagation: The Math

Backpropagation

Calculation of error at output node k:

δk = f(Dk) - f(Ok)

Backpropagation: The Math

Backpropagation

Gradient Descent objective function

Gradient Descent termination condition

Backpropagation: The Math

Backpropagation

Output layer weight recalculation

[Formula: the weight update combines a learning rate (e.g. 0.25) with the error at node k]

Backpropagation: The Math

Backpropagation

Hidden Layer weight recalculation
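The output- and hidden-layer weight recalculations can be sketched together as one gradient-descent step, assuming sigmoid units and the standard delta rule; the names and the learning rate of 0.25 are illustrative, not taken verbatim from the formulas:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, weights):
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def backprop_step(x, target, W_h, W_o, eta=0.25):
    """One weight update; W_h and W_o are modified in place."""
    h = forward(x, W_h)   # hidden layer activations
    o = forward(h, W_o)   # output layer activations
    # Output layer error terms: delta_k = (D_k - O_k) * f'(net_k), f' = o*(1-o)
    delta_o = [(t - ok) * ok * (1 - ok) for t, ok in zip(target, o)]
    # Hidden layer error terms: propagate output errors back through W_o
    delta_h = [hj * (1 - hj) * sum(dk * W_o[k][j] for k, dk in enumerate(delta_o))
               for j, hj in enumerate(h)]
    # Weight recalculation: w += eta * delta * activation
    for k, row in enumerate(W_o):
        for j in range(len(row)):
            row[j] += eta * delta_o[k] * h[j]
    for j, row in enumerate(W_h):
        for i in range(len(row)):
            row[i] += eta * delta_h[j] * x[i]
    return o  # output computed before this update
```

Repeated calls on the same example drive the output toward the target.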

Backpropagation Using Gradient Descent

Advantages
- Relatively simple implementation
- Standard method that generally works well

Disadvantages
- Slow and inefficient
- Can get stuck in local minima, resulting in suboptimal solutions

Local Minima

[Figure: an error curve with a shallow local minimum next to a deeper global minimum]

Alternatives To Gradient Descent

Simulated Annealing

Advantages
- Can guarantee an optimal solution (global minimum)

Disadvantages
- May be slower than gradient descent
- Much more complicated implementation

Alternatives To Gradient Descent

Genetic Algorithms/Evolutionary Strategies

Advantages
- Faster than simulated annealing
- Less likely to get stuck in local minima

Disadvantages
- Slower than gradient descent
- Memory intensive for large nets

Alternatives To Gradient Descent

Simplex Algorithm

Advantages
- Similar to gradient descent but faster
- Easy to implement

Disadvantages
- Does not guarantee a global minimum

Enhancements To Gradient Descent

Momentum
- Adds a percentage of the last movement to the current movement

Enhancements To Gradient Descent

Momentum
- Useful for getting over small bumps in the error function
- Often finds a minimum in fewer steps

Δw(t) = -n*d*y + a*Δw(t-1)

- Δw is the change in weight
- n is the learning rate
- d is the error
- y depends on which layer is being calculated
- a is the momentum parameter
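The momentum update translates directly into code; here the product d*y from the formula above is folded into a single gradient term, and the constant slope is a contrived example showing how the step size grows on a long gradual slope:

```python
def momentum_step(prev_dw, grad, eta=0.25, alpha=0.9):
    # dw(t) = -eta * gradient + alpha * dw(t-1)
    return -eta * grad + alpha * prev_dw

# On a constant slope the step grows toward -eta*grad / (1 - alpha) = -2.5
dw = 0.0
for _ in range(50):
    dw = momentum_step(dw, grad=1.0)
```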

Enhancements To Gradient Descent

Adaptive Backpropagation Algorithm
- Assigns each weight its own learning rate
- That learning rate is determined by the sign of the gradient of the error function from the last iteration
- If the signs are equal, the slope is more likely to be shallow, so the learning rate is increased
- The signs are more likely to differ on a steep slope, so the learning rate is decreased
- This speeds up the advancement on gradual slopes
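A sketch of the per-weight learning-rate rule described above; the increase and decrease factors are illustrative choices, not values from the slides:

```python
def adapt_learning_rates(etas, grads, prev_grads, up=1.1, down=0.5):
    # Same gradient sign as last iteration -> likely a shallow slope: speed up.
    # Sign flipped -> likely a steep slope (overshoot): slow down.
    return [eta * up if g * pg > 0 else eta * down
            for eta, g, pg in zip(etas, grads, prev_grads)]
```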

Enhancements To Gradient Descent

Adaptive Backpropagation

Possible problem:
- Since we minimize the error for each weight separately, the overall error may increase

Solution:
- Calculate the total output error after each adaptation; if it is greater than the previous error, reject that adaptation and calculate new learning rates

Enhancements To Gradient Descent

SuperSAB (Super Self-Adapting Backpropagation)
- Combines the momentum and adaptive methods
- Uses the adaptive method and momentum as long as the sign of the gradient does not change
- The effects of the two methods are additive, resulting in faster traversal of gradual slopes
- When the sign of the gradient does change, the momentum cancels the drastic drop in learning rate
- This allows the function to roll up the other side of the minimum, possibly escaping local minima
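A per-weight sketch of the SuperSAB idea, combining the adaptive rate with momentum; the adaptation factors and momentum parameter are illustrative:

```python
def supersab_step(w, grad, prev_grad, eta, prev_dw,
                  up=1.05, down=0.5, alpha=0.9):
    # Adapt the per-weight learning rate by gradient sign
    eta = eta * up if grad * prev_grad > 0 else eta * down
    # Momentum term keeps some velocity when the rate drops on a sign change
    dw = -eta * grad + alpha * prev_dw
    return w + dw, eta, dw
```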

Enhancements To Gradient Descent

SuperSAB
- Experiments show that SuperSAB converges faster than gradient descent
- Overall, the algorithm is less sensitive, and so less likely to get caught in local minima

Other Ways To Minimize Error

Varying training data
- Add noise to training data
- Cycle through input classes
- Randomly select from input classes
- Randomly change the value of an input node (with low probability)

Retrain with expected inputs after initial training
- E.g. speech recognition
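A sketch of the noise-injection idea above, flipping each binary input with a low (illustrative) probability:

```python
import random

def add_noise(pattern, p=0.05, rng=random):
    # Flip each binary input value with low probability p
    return [1 - v if rng.random() < p else v for v in pattern]
```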

Other Ways To Minimize Error

Adding and removing neurons from layers
- Adding neurons speeds up learning but may cause loss in generalization
- Removing neurons has the opposite effect

Resources

- Artificial Neural Networks, Backpropagation, J. Henseler
- Artificial Intelligence: A Modern Approach, S. Russell & P. Norvig
- 501 notes, J.R. Parker
- www.dontveter.com/bpr/bpr.html
- www.dse.doc.ic.ac.uk/~nd/surprise_96/journal/vl4/cs11/report.html