• Applications
– As powerful problem solvers
– As biological models
Biological Background
• A biological neuron consists of:
– Cell body
– Dendrites
– Axon
– Synapses
• Neural activation:
– Through dendrites/axon
– Synapses have different strengths
A neuron in maths
• Artificial neurons are much simpler than the biological neuron
• A neuron computes a weighted sum of its inputs. If the sum is above a threshold T it fires (outputs 1); otherwise its output is 0 or -1, depending on convention.
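A minimal sketch of this rule in Python (function name and numbers are illustrative, not from the slides):

    def threshold_neuron(inputs, weights, T):
        # Weighted sum of the inputs
        total = sum(x * w for x, w in zip(inputs, weights))
        # Fire (output 1) if the sum exceeds the threshold T, else 0
        return 1 if total > T else 0

    print(threshold_neuron([1, 0, 1], [0.5, 0.3, 0.4], 0.6))   # 1, since 0.5 + 0.4 = 0.9 > 0.6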
A simple Decision
• Suppose you want to decide whether you are going to attend a wedding this upcoming weekend.
• There are three variables that go into your decision:
– Is the weather good?
– Does your friend want to go with you?
– Is it near public transportation?
• We'll assume that answers to these questions are the only factors that go into your decision.
A simple Decision
• If we write the answers to these questions as binary variables xi, with zero being the answer 'no' and one being the answer 'yes':
– Is the weather good? x1
– Does your friend want to go with you? x2
– Is it near public transportation? x3
• Now, what is an easy way to describe the decision statement resulting from these inputs?
A simple Decision
• We could determine weights wi indicating how important each feature is to whether you would like to attend. We can then see if:
x1 · w1 + x2 · w2 + x3 · w3 ≥ threshold
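For instance, with hypothetical weights w1 = 6, w2 = 2, w3 = 2 and threshold 5 (numbers invented purely for illustration), the weather dominates the decision: good weather alone gives 6 ≥ 5, while the friend and transportation together give only 2 + 2 = 4 < 5.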
A simple Decision
• If we define a new binary variable y that represents whether we go to the wedding, we can write this variable as:
$y = 1$ if $x_1 w_1 + x_2 w_2 + x_3 w_3 \ge \text{threshold}$, and $y = 0$ otherwise
A simple Decision
• Now, if we rewrite this in terms of a dot product between the vector of all binary inputs (x), a vector of weights (w), and change the threshold to the negative bias (b = -threshold), we have:
$y = 1$ if $w \cdot x + b \ge 0$, and $y = 0$ otherwise
Perceptron
• Linear Threshold Unit (LTU)
• We can graphically represent this decision algorithm as an object that takes n binary inputs and produces a single binary output:
[Diagram: inputs x1..xn with weights w1..wn, plus a fixed input x0 = 1 with weight w0, feeding a summation unit that emits output o]
$o(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} w_i x_i > 0 \\ -1 & \text{otherwise} \end{cases}$
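A hedged Python sketch of this unit, folding the bias in as weight w[0] on the fixed input x[0] = 1 (names are illustrative):

    def perceptron_output(x, w):
        # x[0] must be the fixed bias input 1; w[0] is its weight w0
        s = sum(wi * xi for wi, xi in zip(w, x))
        return 1 if s > 0 else -1

    print(perceptron_output([1, 0.5, -0.2], [0.1, 0.8, 0.3]))   # 1, since the sum is 0.44 > 0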
Example - Perceptrons
For AND:
[Diagram: a unit with inputs -1 (bias), x and y, unknown weights W = ?, and threshold t = 0.0]
A  B  | Output
0  0  | 0
0  1  | 0
1  0  | 0
1  1  | 1
Example - Perceptrons
For AND, with weights W = 0.3 (on the bias input -1), W = 0.5 (on x) and W = -0.4 (on y), and threshold t = 0.0:
A  B  | Output
0  0  | 0
0  1  | 0
1  0  | 0
1  1  | 1
I1  I2  I3 | Summation                              | Output
-1  0   0  | (-1*0.3) + (0*0.5) + (0*-0.4) = -0.3   | 0
-1  0   1  | (-1*0.3) + (0*0.5) + (1*-0.4) = -0.7   | 0
-1  1   0  | (-1*0.3) + (1*0.5) + (0*-0.4) = 0.2    | 1
-1  1   1  | (-1*0.3) + (1*0.5) + (1*-0.4) = -0.2   | 0
• Note that with these weights the unit misclassifies (1,0) and (1,1), so it does not yet implement AND: the weights must be adjusted by training.
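The table can be reproduced with a few lines of Python (a sketch; the bias input is -1, as in the slide):

    w = [0.3, 0.5, -0.4]                 # weights for the bias, x and y inputs
    for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        s = -1 * w[0] + x * w[1] + y * w[2]
        print(x, y, round(s, 2), 1 if s > 0 else 0)
    # Sums are -0.3, -0.7, 0.2, -0.2: only (1,0) fires, so AND is not yet learned.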
Perceptron Training
• Training a network - the process of modifying the weights in the connections between network layers with the objective of achieving the expected output.
• This is achieved through:
– Supervised learning
– Unsupervised learning
– Reinforcement learning
Perceptron Training
• Learning Procedure
• Randomly assign weights (between 0 and 1)
• Present inputs from training data
• Get output O, nudge weights to move results toward our desired output T
• Repeat; stop when there are no errors, or enough epochs have completed
[Diagram: a single unit with inputs 2 and 1, weights 0.5 and 0.3, and output -1]
Learning algorithm
• Target Value (T) - the value that we require the network to produce.
• For example, if we present the network with [1,1] for the AND function, the target value will be 1.
• Output (O) - the output value from the neuron.
• Ij - the inputs being presented to the neuron.
• Wj - the weight from input neuron (Ij) to the output neuron.
• Learning rate - dictates how quickly the network converges.
• It is set by experimentation; it is typically 0.1.
Learning algorithm
• Initialize values of weights
• Apply training instances and get output
• Update weights according to the update rule:
$W_j \leftarrow W_j + \eta\,(t - o)\,I_j$
$\eta$ : learning rate
t : target output
o : observed output
Perceptron Training
[Diagram: a single unit with inputs 2 and 1, weights 0.5 and 0.3, and output -1]
$w_i(t+1) = w_i(t) + \Delta w_i(t)$
$\Delta w_i(t) = \eta\,(T - O)\,I_i$
• Example training instances (feature values, target):
– 67,1,4,120,229,…, 1
– 37,1,3,130,250,…, 0
– 41,0,2,130,204,…, 0

AND:
Input 1  Input 2  | Output
0        0        | 0
0        1        | 0
1        0        | 0
1        1        | 1

XOR:
Input 1  Input 2  | Output
0        0        | 0
0        1        | 1
1        0        | 1
1        1        | 0
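A hedged sketch of this procedure in Python, using the slide's -1 bias input and the delta rule above (learning rate and epoch cap are arbitrary choices):

    import random

    def train(data, eta=0.1, max_epochs=100):
        w = [random.random() for _ in range(3)]    # bias weight, w_x, w_y
        for _ in range(max_epochs):
            errors = 0
            for x1, x2, target in data:
                inputs = [-1, x1, x2]              # fixed bias input of -1
                o = 1 if sum(wi * ii for wi, ii in zip(w, inputs)) > 0 else 0
                if o != target:
                    errors += 1
                    for i in range(3):             # delta rule: eta * (T - O) * I_i
                        w[i] += eta * (target - o) * inputs[i]
            if errors == 0:                        # converged: all patterns correct
                break
        return w

    AND_DATA = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
    print(train(AND_DATA))

On the AND rows the loop reaches zero errors; on the XOR rows it never can and simply exhausts max_epochs, which is the point of the next slide.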
AND and OR Linear Separators
[Plot: AND and OR truth-table points in the unit square, each separated by a straight line]
XOR Problem: Not Linearly Separable!
Sigmoid Neuron
• As one option, we could simply have the neuron emit the value:
$\sigma(w \cdot x + b) = \dfrac{1}{1 + e^{-(w \cdot x + b)}}$
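A one-function Python sketch of this activation:

    import math

    def sigmoid(z):
        # Smooth step: the output slides between 0 and 1 instead of jumping
        return 1.0 / (1.0 + math.exp(-z))

    print(sigmoid(0.0), sigmoid(4.0))   # 0.5 and roughly 0.982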
Activation Functions
Example – Compute Hidden
• 2 input values
• Hidden unit computation
Compute Output
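The worked numbers from the original slides are lost in this copy; as a hypothetical stand-in, the hidden and output computations might look like this (all values invented):

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    x = [1.0, 0.0]                 # 2 input values
    w_hidden = [0.5, -0.3]         # input -> hidden weights (hypothetical)
    w_output = 0.8                 # hidden -> output weight (hypothetical)

    h = sigmoid(sum(w * xi for w, xi in zip(w_hidden, x)))   # hidden unit
    o = sigmoid(w_output * h)                                 # output unit
    print(h, o)    # roughly 0.62 for both with these numbers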
Back-propagation training
Key Concepts
• Gradient descent
– error is a function of the weights
– we want to reduce the error
– gradient descent: move towards the error minimum
– compute gradient -> get direction to the error minimum
– adjust weights towards direction of lower error (see the sketch after this list)
• Back-propagation
– first adjust last set of weights
– propagate error back to each previous layer
– adjust their weights
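A minimal sketch of one gradient-descent step for a single sigmoid output with squared error (names and values illustrative):

    import math

    def gd_step(w, x, t, eta=0.1):
        z = sum(wi * xi for wi, xi in zip(w, x))
        o = 1.0 / (1.0 + math.exp(-z))           # forward pass
        delta = (o - t) * o * (1 - o)            # dE/dz for E = (t - o)^2 / 2
        # Move each weight a small step against its gradient
        return [wi - eta * delta * xi for wi, xi in zip(w, x)]

    print(gd_step([0.2, -0.1], [1.0, 1.0], 1.0))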
Gradient Descent
• Each weight takes a small step against the gradient of the error:
$w \leftarrow w - \eta\,\dfrac{\partial E}{\partial w}$
Derivative of Sigmoid
$\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$
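This identity is easy to check numerically against a finite difference (a quick sketch):

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    z, h = 0.5, 1e-6
    numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
    analytic = sigmoid(z) * (1 - sigmoid(z))
    print(abs(numeric - analytic) < 1e-9)   # True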
Our Example
Hidden Layer Updates
$\delta_h = o_h\,(1 - o_h) \sum_k w_{kh}\,\delta_k$ (the output-layer errors $\delta_k$ are propagated back through the weights)
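Putting the pieces together, a compact sketch of one full back-propagation step for a hypothetical 2-2-1 sigmoid network (all weights, inputs and the learning rate are invented):

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    x = [0.5, 0.9]                          # inputs
    w_h = [[0.2, 0.4], [0.7, 0.1]]          # w_h[j][i]: input i -> hidden j
    w_o = [0.6, 0.9]                        # hidden j -> output
    t, eta = 1.0, 0.5                       # target and learning rate

    # Forward pass
    h = [sigmoid(sum(w * xi for w, xi in zip(w_h[j], x))) for j in range(2)]
    o = sigmoid(sum(w * hj for w, hj in zip(w_o, h)))

    # Output delta, then hidden deltas (error pushed back through w_o)
    d_o = (t - o) * o * (1 - o)
    d_h = [h[j] * (1 - h[j]) * w_o[j] * d_o for j in range(2)]

    # Updates: w += eta * delta * (input feeding that weight)
    w_o = [w_o[j] + eta * d_o * h[j] for j in range(2)]
    w_h = [[w_h[j][i] + eta * d_h[j] * x[i] for i in range(2)] for j in range(2)]
    print(o, w_o, w_h)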
Initialization of Weights
Applications
• The properties of neural networks define where they are useful.
– Can learn complex mappings from inputs to outputs, based solely on samples
– Difficult to analyse: firm predictions about neural network behaviour are difficult
• Unsuitable for safety-critical applications.
– Require limited understanding from the trainer, who can be guided by heuristics.
ALVINN
• Drives 70 mph on a public highway
[Diagram of the ALVINN network: 30x32 pixels as inputs, 4 hidden units, 30 outputs for steering; 30x32 weights feed into one out of four hidden units]
Signature recognition
• Each person's signature is different.
• There are structural similarities which are difficult to quantify.
• One company has manufactured a machine which recognizes signatures with a high level of accuracy.
– Considers speed in addition to gross shape.
– Makes forgery even more difficult.
Stock market prediction
Mortgage Assessment
Neural Network Problems
• Many Parameters to be set
• Overfitting
• Long training times
• ...
Parameter setting
• Number of layers
• Number of neurons
– too many neurons require more training time
• Learning rate
– from experience, the value should be small, ~0.1
• Momentum term
• ...
Over-fitting
• With sufficient nodes a network can classify any training set exactly
• But it may have poor generalisation ability
• Cross-validation with some patterns held out (see the sketch after this list)
– Typically 30% of training patterns
– Validation set error is checked after each epoch
– Stop training if validation error goes up
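A sketch of this early-stopping loop in Python; step and val_error are placeholders standing in for one training epoch and the validation-set error:

    def train_with_early_stopping(step, val_error, max_epochs=1000):
        best = float("inf")
        for _ in range(max_epochs):
            step()                      # one epoch of training
            err = val_error()           # error on the held-out ~30% of patterns
            if err > best:
                break                   # validation error rose: stop training
            best = err
        return best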
Training time