Neural Networks
Historic Portsmouth
• Home of HMS Victory, the Mary Rose and HMS Warrior
• Historic Old Portsmouth with cobbled streets and ancient buildings
• Birthplace of Charles Dickens and home of Sir Arthur Conan Doyle
Key Facts about the University
• Established in 1869
• 30 academic departments across five faculties
School of Computing
Outline
Introduction
The Perceptron
ADALINE example
Linear separability problem
Multi-Layer Perceptrons (MLP)
Backpropagation
Essential texts (not mandatory)
• Robert Schalkoff, Intelligent Systems, Jones & Bartlett, 2011, ISBN-10: 0-7637-8017-0.
• Ethem Alpaydin, Introduction to Machine Learning (2nd ed.), MIT Press, 2010, ISBN: 978-0-262-01243-0.
• Stephen Marsland, Machine Learning: An Algorithmic Perspective, CRC Press, 2009, ISBN: 978-1-4200-6718-7.
• Melanie Mitchell, An Introduction to Genetic Algorithms, MIT Press, 2001, ISBN: 0-262-13316-4.
• Kenneth De Jong, Evolutionary Computation, MIT Press, 2006, ISBN: 0-262-04194-4.
• H. Iba & N. Noman, New Frontier in Evolutionary Algorithms, ICP, 2012, ISBN: 978-1-84816-681-3.
• George Luger, Artificial Intelligence (6th ed.), Pearson/Addison-Wesley, 2009, ISBN: 0-13-209001-5.
[Figure 10. A Perceptron: inputs x0, x1, …, xn with weights w0, w1, …, wn feed a summing junction; an activation function then produces the output O(x).]
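As a minimal sketch of this forward computation in Python (the signum activation and the x/w naming follow the figure; the numeric values are purely illustrative):

```python
import numpy as np

def perceptron(x, w):
    """Single perceptron: summing junction, then signum activation.

    x: inputs, with the bias input x0 = 1 in front
    w: weights w0..wn (w0 plays the role of the offset)
    """
    net = np.dot(w, x)           # summing junction: net = sum_i w_i * x_i
    return 1 if net > 0 else -1  # signum activation: output O(x)

# Illustrative values only
x = np.array([1.0, 0.5, -0.3])   # x0 = 1, x1, x2
w = np.array([-0.2, 0.8, 0.4])   # w0, w1, w2
print(perceptron(x, w))          # -> +1 here, since net = 0.08 > 0
```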
[Fig. 11(a). A network of simple perceptrons recognizing the digits 0 to 9 (only some of the weight connections are shown). Each digit is presented as a 5×7 pixel grid on inputs x1 … x35, with −1 for a white pixel and +1 for a black pixel; the outputs are y0 … y9. When the digit 3 is presented, only the corresponding output neuron should fire.]
Each weighted input is:
(−1)·(−1) = 1 (white), or (+1)·(+1) = 1 (black),
so when the stored weights match the presented pattern, all 35 products equal 1. If the threshold (offset) is set to −34, then the weighted sum is net = 35·1 − 34 = 1 (which is > 0), so the output is +1 after passing through the signum function.
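A sketch of this matching computation, assuming (as the arithmetic above implies) that the weights of the '3' output neuron are a copy of the digit's ±1 pixel pattern; the random template below is only a stand-in for the real 5×7 pattern:

```python
import numpy as np

rng = np.random.default_rng(0)
template = rng.choice([-1, 1], size=35)  # stand-in for the 35-pixel pattern of '3'

w = template.copy()   # weights copy the stored pattern
theta = -34           # threshold (offset)

net_match = np.dot(w, template) + theta   # 35*1 - 34 = 1 > 0 -> output +1
noisy = template.copy()
noisy[0] *= -1                            # flip one pixel
net_miss = np.dot(w, noisy) + theta       # 33 - 34 = -1 <= 0 -> no firing
print(net_match, net_miss)                # 1 -1
```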
[Fig. 13. A linearly separable pattern classification problem: black points belong to class 1 and white points to class 2 (e.g., 'thin' and 'fat' classes for people: x2 = body height, x1 = body weight).]
If we solve the equation f(x) = 0, i.e.

w0·x0 + w1·x1 + w2·x2 = 0 (bearing in mind that x0 = 1),

we obtain the equation of a line:

x2 = −(w1/w2)·x1 − (w0/w2),  i.e. x2 = a·x1 + b, with a = −w1/w2 and b = −w0/w2.

[Fig. 13a. The decision line in the (x1, x2) plane, crossing the x1 axis at −b/a. The weight coefficients define the angle between the line and the x1 axis.]
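A small sketch of this weight-to-line conversion (the weight values are illustrative):

```python
def line_from_weights(w0, w1, w2):
    """Turn perceptron weights into slope/intercept form x2 = a*x1 + b."""
    a = -w1 / w2   # slope
    b = -w0 / w2   # intercept
    return a, b

a, b = line_from_weights(w0=-1.0, w1=0.5, w2=1.0)  # illustrative weights
print(a, b)  # -0.5, 1.0: decision line x2 = -0.5*x1 + 1
```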
The line equation evaluates to 0 for every point (x1, x2) belonging to the line (it is less than 0 for every point below and greater than 0 for every point above the line):

f(x) = x2 − a·x1 − b = 0.

The location of the line is determined by the three weights. If a point (x1, x2) (Fig. 13) lies on one side of the line (f(x) > 0), the perceptron will output 1; if it lies on the line or on the other side (f(x) < 0), it will output 0 (assuming the Heaviside step activation function is used).
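In code, the decision rule reads as follows (a sketch; the points and line are illustrative):

```python
def heaviside_classify(x1, x2, a, b):
    """1 if the point lies strictly above the line x2 = a*x1 + b, else 0."""
    f = x2 - a * x1 - b        # f(x) = x2 - a*x1 - b
    return 1 if f > 0 else 0   # Heaviside step on f(x)

print(heaviside_classify(1.0, 2.0, a=-0.5, b=1.0))  # f = 1.5 > 0 -> 1
print(heaviside_classify(1.0, 0.0, a=-0.5, b=1.0))  # f = -0.5 < 0 -> 0
```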
[Fig. 14. Decision surfaces (line 1 and line 2) with two misclassified patterns. The distances are shown in blue. Line 2 is better than line 1 (smaller sum of distances), but still not a solution. Line 3 (green) is one possible solution. (Assume, e.g., class 1 (○) = "fat people", class 2 (●) = "thin people"; x1 = body weight, x2 = body height.)]
If we define the perceptron objective function J(w) as the sum of the distances (Fig. 14, drawn in blue) of the misclassified input vectors to the decision surface (in other words, the sum of the errors):

J(w) = Σ_{x∈X} |Σ_{i=0..n} wi·xi| = Σ_{x∈X} |w·x|,

where X is the set of misclassified input vectors.
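A sketch of this criterion, assuming the absolute weighted sum |w·x| is used as the distance measure (it is proportional to the true geometric distance, up to a 1/‖w‖ factor):

```python
import numpy as np

def perceptron_objective(w, misclassified):
    """J(w): sum of |w . x| over the misclassified inputs x."""
    return sum(abs(np.dot(w, x)) for x in misclassified)

# Illustrative: two misclassified points, each with bias input x0 = 1
X = [np.array([1.0, 2.0, 1.5]), np.array([1.0, 0.5, 3.0])]
w = np.array([-1.0, 0.5, 0.3])
print(perceptron_objective(w, X))  # |0.45| + |0.15| = 0.6
```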
[XOR plot: the four input points on the input 1 / input 2 axes (○: class 1, ●: class 2); no single straight line separates the two classes.]

x1  x2  x1 XOR x2
0   0   0
0   1   1
1   0   1
1   1   0
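XOR is the classic example of a problem a single perceptron cannot solve, while a two-layer network can. A minimal sketch with hand-picked weights (a common textbook choice, not values from the slides):

```python
import numpy as np

def step(z):
    return 1 if z > 0 else 0   # Heaviside step

def xor_mlp(x1, x2):
    """Two hidden units (OR and NAND) feeding an AND output unit."""
    x = [1.0, x1, x2]                        # bias input x0 = 1 in front
    h1 = step(np.dot([-0.5, 1.0, 1.0], x))   # x1 OR x2
    h2 = step(np.dot([1.5, -1.0, -1.0], x))  # NOT (x1 AND x2)
    return step(np.dot([-1.5, 1.0, 1.0], [1.0, h1, h2]))  # h1 AND h2

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_mlp(a, b))   # prints 0, 1, 1, 0
```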
[Fig. 17. A node j of a backpropagation MLP, shown within a network of inputs x0 = 1, x1, …, xn, hidden units h0, …, hm, and outputs y1, …, yk.]

The weighted sum of the n incoming inputs plus a bias term gives the net input of node j (Fig. 17):

xj = Σ_{i=0..n} xi·wij  (the bias enters through the x0 = 1 term).
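A vectorized sketch of this net-input computation for a whole layer of nodes (the sigmoid activation is assumed here, a common choice in BP networks; sizes and weights are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(x, W):
    """Net input of every node j at once: net_j = sum_i x_i * W[i, j],
    with the bias folded in as x[0] = 1; then apply the activation."""
    net = x @ W          # one weighted sum per node j
    return sigmoid(net)

rng = np.random.default_rng(0)
x = np.array([1.0, 0.2, -0.4, 0.7])  # x0 = 1 is the bias input
W = rng.normal(size=(4, 3))          # 4 inputs -> 3 nodes (illustrative)
print(layer_forward(x, W))           # activations of the 3 nodes
```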
Backpropagation Learning Algorithm

The BP algorithm (also known as the generalized delta rule) is used for supervised learning (when the outputs are known for given inputs).

It has two passes for each input-output vector pair (a code sketch of both passes follows below):

forward pass - in which the input propagates through the net and an output is produced
(http://www.youtube.com/watch?v=G3-ppsCCNww&NR=1 );

backward pass - in which the error between the actual and the desired output is propagated back through the net and the weights are adjusted to reduce it.
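A compact sketch of both passes for a small sigmoid network trained on XOR with squared error (one standard formulation of the generalized delta rule; the sizes, learning rate, and data are illustrative, not from the module):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(scale=0.5, size=(3, 3))  # input (bias, x1, x2) -> 3 hidden
W2 = rng.normal(scale=0.5, size=(4, 1))  # hidden (bias, h1..h3) -> 1 output
eta = 0.5                                # learning rate

def forward(x):
    """Forward pass: the input propagates through the net."""
    x = np.concatenate(([1.0], x))                # bias input x0 = 1
    h = np.concatenate(([1.0], sigmoid(x @ W1)))  # bias unit h0 = 1
    return x, h, sigmoid(h @ W2)

def backward(x, h, y, t):
    """Backward pass: propagate the error back and adjust the weights."""
    global W1, W2
    d_out = (y - t) * y * (1 - y)                      # output-layer delta
    d_hid = (d_out @ W2[1:].T) * h[1:] * (1 - h[1:])   # hidden-layer deltas
    W2 -= eta * np.outer(h, d_out)
    W1 -= eta * np.outer(x, d_hid)

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # XOR pairs
for _ in range(5000):
    for x, t in data:
        xa, h, y = forward(np.asarray(x, float))
        backward(xa, h, y, np.asarray([t], float))

for x, t in data:
    print(x, t, forward(np.asarray(x, float))[2])  # should approach 0, 1, 1, 0
```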
[Fig. 19. Using the data set (split sample) for training and generalisation evaluation.]
Split sample (‘hold out’) method.
[Fig. 21. Training with validation (to avoid overfitting): the training and validation error curves are plotted against the number of epochs.]

[10-fold cross-validation: the data set is split into folds 1, …, 10; in each round a different fold is held out for testing while the remaining nine are used for training.]
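A sketch of both evaluation schemes in plain NumPy (the 10 folds and the split ratio follow the figures; the data array is a stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))   # stand-in data set: 100 samples

# Split sample ('hold out'): one fixed training/validation split
idx = rng.permutation(len(data))
cut = int(0.8 * len(data))                      # e.g., an 80/20 split
train, valid = data[idx[:cut]], data[idx[cut:]]

# 10-fold cross-validation: each fold serves as the test set exactly once
folds = np.array_split(idx, 10)
for k, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
    # train on data[train_idx], evaluate on data[test_idx]
    print(f"fold {k + 1}: {len(train_idx)} train / {len(test_idx)} test")
```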