
Chapter 5

Artificial Neural Networks


What are Artificial Neural Networks?
• Models of the brain and nervous system
• Highly parallel
– Process information much more like the brain than a serial
computer
• Learning

• Very simple principles


• Very complex behaviours

• Applications
– As powerful problem solvers
– As biological models
Biological Background
• Biological neuron consists of:
– Cell body
– Dendrites
– Axon
– Synapses
• Neural activation:
– Through dendrites/axon
– Synapses have different strengths
A neuron in maths
• Artificial neurons are much simpler than the biological
neuron
• It computes a weighted sum of its inputs: if the sum is above a
threshold T it fires (outputs 1), else its output is 0 or -1.
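A minimal sketch of this rule in Python (the inputs, weights, and threshold in the example call are ours, purely illustrative):

```python
def neuron(inputs, weights, threshold):
    """Fire (output 1) if the weighted sum of the inputs exceeds the threshold T."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

print(neuron([1, 0, 1], [0.5, 0.2, 0.4], 0.6))  # 0.9 > 0.6, so prints 1
```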

A simple Decision
• Suppose you want to decide whether you are
going to attend a wedding this upcoming
weekend.
• There are three variables that go into your
decision:
– Is the weather good?
– Does your friend want to go with you?
– Is it near public transportation?
• We’ll assume that answers to these questions
are the only factors that go into your
decision.
A simple Decision
• If we write the answers to these questions as
binary variables xi, with zero being the
answer ‘no’ and one being the answer ‘yes’:
– Is the weather good? x1
– Does your friend want to go with you? x2
– Is it near public transportation? x3
• Now, what is an easy way to describe the
decision statement resulting from these
inputs?
A simple Decision
• We could determine weights wi
indicating how important each feature is
to whether you would like to attend. We
can then check whether:
x1 · w1 + x2 · w2 + x3 · w3 ≥ threshold
for some pre-determined threshold.

• If this statement is true, we would attend
the wedding, and otherwise we would not.
A simple Decision
• For example, if we really hate bad weather
but care less about going with our friend and
public transit, we could pick the weights 6, 2
and 2.
• With a threshold of 5, this causes us to go if
and only if the weather is good.
• What happens if the threshold is decreased to
3? What about if it is decreased to 1? (See the sketch below.)
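A hedged sketch in Python makes the answer easy to check (variable names are ours):

```python
from itertools import product

weights = [6, 2, 2]  # weather, friend, public transit

for threshold in (5, 3, 1):
    attend = [x for x in product([0, 1], repeat=3)
              if sum(xi * wi for xi, wi in zip(x, weights)) >= threshold]
    print(f"threshold {threshold}: attend for inputs {attend}")
```

With a threshold of 3 we also attend when both the friend and transit answers are ‘yes’ (2 + 2 = 4 ≥ 3); with a threshold of 1, any single ‘yes’ is enough.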

A simple Decision
• If we define a new binary variable y that
represents whether we go to the wedding, we
can write this variable as:
y = 1 if x1 · w1 + x2 · w2 + x3 · w3 ≥ threshold, and y = 0 otherwise

• Is this starting to look familiar yet?
A simple Decision
• Now, if I rewrite this in terms of a dot product
between the vector of all binary inputs
(x), a vector of weights (w), and change the
threshold to the negative bias (b = −threshold), we have:
y = 1 if x · w + b ≥ 0, and y = 0 otherwise
Perceptron
• Linear Threshold Unit (LTU)
• We can graphically represent this decision
algorithm as an object that takes n binary
inputs and produces a single binary output:
[Diagram: inputs x0 = 1, x1, …, xn with weights w0, w1, …, wn feed a summation unit Σ wi·xi, which drives the single output o]

o(x1, …, xn) = 1 if Σ(i=0..n) wi·xi > 0, and −1 otherwise
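A sketch of the LTU in Python, using the fixed bias input x0 = 1 from the diagram (the function name is ours):

```python
def ltu(x, w):
    """Linear threshold unit: w = [w0, w1, ..., wn]; x0 = 1 is the fixed bias input."""
    s = w[0] + sum(xi * wi for xi, wi in zip(x, w[1:]))
    return 1 if s > 0 else -1
```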
Example - Perceptrons
For AND:

[Diagram: bias input −1 and inputs x, y, each with weight W = ?, feeding a threshold unit with t = 0.0]

A B | Output
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1

• What are the weight values?
• Initialize with random weight values
Example - Perceptrons
For AND:

[Diagram: bias input −1 with weight W = 0.3, input x with W = 0.5, input y with W = −0.4, threshold t = 0.0]

A B | Output
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1

I1 I2 I3 | Summation                            | Output
-1 0  0  | (-1*0.3) + (0*0.5) + (0*-0.4) = -0.3 | 0
-1 0  1  | (-1*0.3) + (0*0.5) + (1*-0.4) = -0.7 | 0
-1 1  0  | (-1*0.3) + (1*0.5) + (0*-0.4) = 0.2  | 1
-1 1  1  | (-1*0.3) + (1*0.5) + (1*-0.4) = -0.2 | 0
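The table can be reproduced with a short sketch (weight order follows the diagram: bias, x, y; output 1 if the sum exceeds t = 0.0, else 0):

```python
w_bias, w_x, w_y = 0.3, 0.5, -0.4   # the randomly initialized weights above

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    s = -1 * w_bias + x * w_x + y * w_y   # bias input is fixed at -1
    print(x, y, round(s, 2), 1 if s > 0 else 0)
```

Note the network wrongly fires on input (1, 0); training must correct this.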

Perceptron Training
• Training a network - The process of modifying
the weights in the connections between
network layers with the objective of achieving
the expected output.
• This is achieved through
– Supervised learning
– Unsupervised learning
– Reinforcement learning

Perceptron Training
• Learning Procedure
• Randomly assign weights (between 0-1)
• Present inputs from training data
• Get output O, nudge weights to give results toward our
desired output T
• Repeat; stop when no errors, or enough epochs completed

Example:

[Diagram: inputs 2 and 1 with weights 0.5 and 0.3, bias θ = −1]

2(0.5) + 1(0.3) + (−1) = 0.3 > 0, so output O = 1


Learning algorithm
• Epoch - Presentation of the entire training set to the
neural network.
• In the case of the AND function an epoch consists of four
sets of inputs being presented to the network (i.e. [0,0],
[0,1], [1,0], [1,1])

• Error - The error value is the amount by which the
value output by the network differs from the target
value (Error = T − O).
• For example, if we required the network to output 0 and it
output a 1, then Error = -1

Learning algorithm
• Target Value (T) - a value that we require the
network to produce.
• For example, if we present the network with [1,1] for the
AND function the target value will be 1
• Output (O) - The output value from the neuron
• Ij - Inputs being presented to the neuron
• Wj - Weight from input neuron (Ij) to the output
neuron
• Learning rate - This dictates how quickly the
network converges.
• It is set by experimentation; it is typically 0.1
Learning algorithm
• Initialize values of weights
• Apply training instances and get output
• Update weights according to the update rule:
wi ← wi + η (t − o) Ii
where η : learning rate
t : target output
o : observed output

• Repeat till convergence

• Can represent linearly separable functions only

Perceptron Training

[Diagram: inputs 2 and 1 with weights 0.5 and 0.3, bias θ = −1]

wi(t+1) = wi(t) + Δwi(t)
Δwi(t) = η (T − O) Ii

Weights include the threshold. T = desired output, O = actual output.

Example (with η = 1): T=0, O=1, W1=0.5, W2=0.3, I1=2, I2=1, θ = −1

w1(t+1) = 0.5 + (0 − 1)(2) = −1.5
w2(t+1) = 0.3 + (0 − 1)(1) = −0.7
wθ(t+1) = −1 + (0 − 1)(1) = −2
If we present this input again, we’d output 0 instead
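Putting the rule into a full loop, here is a hedged sketch of training the AND perceptron (bias handled as a weight on a fixed input of −1, as in the earlier example; random starting weights and learning rate 0.1 follow the stated conventions):

```python
import random

def train_and(lr=0.1, max_epochs=100):
    """Perceptron learning for AND; input 0 is a fixed bias input of -1."""
    w = [random.random() for _ in range(3)]            # random weights in [0, 1)
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for (a, b), target in data:
            inputs = (-1, a, b)
            output = 1 if sum(x * wi for x, wi in zip(inputs, w)) > 0 else 0
            error = target - output                    # T - O
            if error:
                errors += 1
                # delta rule: w_i <- w_i + lr * (T - O) * I_i
                w = [wi + lr * error * x for wi, x in zip(w, inputs)]
        if errors == 0:                                # a full error-free epoch
            return w, epoch
    return w, max_epochs

print(train_and())
```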
Use of Perceptron Network
• Generally used to learn how to make classifications
• Say you have collected some data regarding the
diagnosis of patients with heart disease
– Age, Sex, Chest Pain Type, Resting BPS, Cholesterol, …,
Diagnosis (<50% diameter narrowing, >50% diameter
narrowing)

– 67,1,4,120,229,…, 1
– 37,1,3,130,250,… ,0
– 41,0,2,130,204,… ,0

• Train network to predict heart disease of new patient


Classifying using Perceptrons
• Can add learning rate to speed up the learning process;
– just multiply in with delta computation
• Essentially a linear discriminant
• Perceptron Theorem: If a linear discriminant exists that
can separate the classes without error, the training
procedure is guaranteed to find a separating line or plane.
[Figure: two classes of points separated by a line]
What can perceptrons represent?

AND:                  XOR:
Input 1 | 0 0 1 1     Input 1 | 0 0 1 1
Input 2 | 0 1 0 1     Input 2 | 0 1 0 1
Output  | 0 0 0 1     Output  | 0 1 1 0
AND and OR Linear Separators

• Functions which can be separated in this way are
called Linearly Separable
• Only Linearly Separable functions can be represented
by a perceptron
Exclusive Or (XOR) Problem
Input: 0,0 Output: 0
Input: 0,1 Output: 1
Input: 1,0 Output: 1
Input: 1,1 Output: 0

[Figure: the four XOR points on the unit square; the two 1-outputs and two 0-outputs sit on opposite diagonals, so no single straight line separates them]
XOR Problem: Not Linearly Separable!

• We could however construct multiple layers of
perceptrons to get around this problem.
• A typical multi-layered system minimizes LMS error.
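As one hedged sketch of the multi-layer idea, XOR can be hand-wired as (x OR y) AND NOT (x AND y) using three LTUs; the weights below are chosen by hand, not learned:

```python
def ltu(x, w, bias):
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) + bias > 0 else 0

def xor(a, b):
    h_or = ltu([a, b], [1, 1], -0.5)          # hidden unit: fires if a OR b
    h_nand = ltu([a, b], [-1, -1], 1.5)       # hidden unit: fires unless a AND b
    return ltu([h_or, h_nand], [1, 1], -1.5)  # output: fires if both hidden units fire

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor(a, b))                    # prints 0, 1, 1, 0
```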
Network of Perceptrons
• It’s easy to build a network where the outputs from some
perceptrons are used as inputs to other
perceptrons:

• Notice that some perceptrons have multiple output
arrows, even though we have defined them as having
only one output.
• This is only meant to indicate that a single output is
being sent to multiple new perceptrons.
Network of Perceptrons
• The inputs and outputs are typically represented as their
own neurons, with the other neurons grouped into hidden
layers

• Feedforward network - only sends signals in one
direction
Sigmoid Neuron
• Shortcoming of a perceptron - a small change
in the input values can cause a large change
in the output because each node (or neuron)
only has two possible states: 0 or 1.
• A better solution would be to output a
continuum of values, say any number
between 0 and 1.
• Basis for multilayer feedforward networks

Sigmoid Neuron
• As one option, we could simply have the neuron
emit the value:
σ(x·w + b) = 1 / (1 + e^−(x·w + b))

• For a particularly positive or negative value of x·w
+ b, the result will be nearly the same as with the
perceptron (i.e., near 0 or 1).
• For values close to the boundary of the separating
hyperplane, values near 0.5 will be emitted.
Activation Functions
• Activation function – is the choice of
what function to use to go from x·w + b
to an output
• Benefits of activation function - able to
easily take derivatives and then interpret
them using logistic regression.
• One can use different functions to obtain
different models.
Activation Functions
• Three most common choices:
– 1) Step function
– 2) Sign function
– 3) Sigmoid function
• An output of 1 represents firing of a neuron
down the axon.
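The three as simple Python functions (a sketch; the convention for inputs exactly at the boundary varies between textbooks):

```python
import math

def step(x):      # step function: outputs 0 or 1
    return 1 if x >= 0 else 0

def sign(x):      # sign function: outputs -1 or +1
    return 1 if x >= 0 else -1

def sigmoid(x):   # sigmoid function: smooth output in (0, 1)
    return 1 / (1 + math.exp(-x))
```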

Activation Functions

[Figure: graphs of the step, sign, and sigmoid activation functions]
Example – Compute Hidden

• 2 input values
• Hidden unit computation

Compute Output

• Output unit computation

Back-propagation training

• Computed output: y = .76
• Correct output: t = 1.0
• How do we adjust the weights?
Key Concepts
• Gradient descent
– error is a function of the weights
– we want to reduce the error
– gradient descent: move towards the error minimum
– compute gradient -> get direction to the error
minimum
– adjust weights towards direction of lower error
• Back-propagation
– first adjust last set of weights
– propagate error back to each previous layer
– adjust their weights
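A hedged sketch of both ideas on the smallest possible case, a single sigmoid output unit with squared error (the weights and inputs are ours, chosen so the starting output is roughly the .76 from the earlier example; a full network would propagate the same delta back through each hidden layer via the chain rule):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

w = [0.3, 0.9]   # illustrative weights
x = [1.0, 1.0]   # illustrative inputs
t = 1.0          # target output
eta = 0.5        # learning rate

for _ in range(20):
    y = sigmoid(sum(xi * wi for xi, wi in zip(x, w)))
    # For squared error E = (t - y)^2 / 2 with a sigmoid output,
    # dE/dw_i = -(t - y) * y * (1 - y) * x_i, so step against the gradient:
    delta = (t - y) * y * (1 - y)
    w = [wi + eta * delta * xi for wi, xi in zip(w, x)]

print(sigmoid(sum(xi * wi for xi, wi in zip(x, w))))  # output has moved toward t
```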
Gradient Descent

[Figure: the error as a function of the weights, with gradient descent steps moving down the surface toward the minimum]
Derivative of Sigmoid

σ′(x) = σ(x)(1 − σ(x))
Our Example

Hidden Layer Updates

Initialization of Weights
Applications
• The properties of neural networks
define where they are useful.
– Can learn complex mappings from
inputs to outputs, based solely on samples
– Difficult to analyse: firm predictions
about neural network behaviour are
difficult
• Unsuitable for safety-critical applications.
– Require limited understanding from
trainer, who can be guided by heuristics.

ALVINN
Drives 70 mph on a public highway

[Diagram: 30×32 pixels as inputs → 4 hidden units (30×32 weights into one out of four hidden units) → 30 outputs for steering]
Signature recognition
• Each person's signature is different.
• There are structural similarities
which are difficult to quantify.
• One company has manufactured a
machine which recognizes signatures
with a high level of accuracy.
– Considers speed in addition to gross shape.
– Makes forgery even more difficult.

Stock market prediction

• “Technical trading” refers to trading based
solely on known statistical parameters; e.g.
previous price
• Neural networks have been used to attempt
to predict changes in prices.
• Difficult to assess success since companies
using these techniques are reluctant to
disclose information.

Mortgage Assessment

• Assess risk of lending to an individual.
• Difficult to decide on marginal cases.
• Neural networks have been trained to make
decisions, based upon the opinions of
expert underwriters.
• Neural network produced a 12% reduction
in delinquencies compared with human
experts.

Neural Network Problems
• Many Parameters to be set
• Overfitting
• Long training times
• ...

Parameter setting

• Number of layers
• Number of neurons
– too many neurons require more training time
• Learning rate
– from experience, the value should be small, ~0.1
• Momentum term
• ...
Over-fitting
• With sufficient nodes, a network can classify any
training set exactly
• May have poor generalisation ability.
• Cross-validation with some patterns
– Typically 30% of training patterns
– Validation set error is checked for each epoch
– Stop training if validation error goes up (a sketch follows this list)
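A sketch of that stopping rule (the two callables are placeholders for whatever training and validation code is in use):

```python
def train_with_early_stopping(train_one_epoch, validation_error, max_epochs=1000):
    """Early stopping: train until error on the held-out validation set rises."""
    best = float("inf")
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()              # one pass over the training patterns
        err = validation_error()       # error on the held-out (~30%) patterns
        if err > best:                 # validation error went up: stop training
            return epoch
        best = err
    return max_epochs
```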

Training time

• How many epochs of training?
– Stop if the error fails to improve (has reached a
minimum)
– Stop if the rate of improvement drops below a
certain level
– Stop if the error reaches an acceptable level
– Stop when a certain number of epochs have
passed
