Single Layer Perceptron
• This seminal paper pointed out that simple artificial “neurons” could
be made to perform basic logical operations such as AND, OR and
NOT.
[Figure: inputs feeding a unit that sums them and produces an output]
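A minimal Python sketch of such threshold "neurons", assuming a unit that outputs 1 when the weighted sum of its binary inputs reaches a threshold and 0 otherwise. The weights and thresholds below are illustrative choices that realize AND, OR and NOT; they are not taken from the paper.

```python
def threshold_neuron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


# Illustrative weights and thresholds (assumed, not from the paper).
def and_gate(a, b):
    return threshold_neuron([a, b], [1, 1], threshold=2)  # fires only if both are on


def or_gate(a, b):
    return threshold_neuron([a, b], [1, 1], threshold=1)  # fires if either is on


def not_gate(a):
    return threshold_neuron([a], [-1], threshold=0)  # inhibitory weight inverts the input


for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_gate(a, b), or_gate(a, b))
print(not_gate(0), not_gate(1))  # 1 0
```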
Nervous Systems as Logical Circuits
Groups of these “neuronal” logic gates could carry out any computation, even
though each neuron was very limited.
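To illustrate how a group of limited units can compute something no single unit can, here is a hedged Python sketch that wires three threshold units into a two-layer network for XOR (a function that comes up again later in this document). The wiring is one illustrative choice among many.

```python
def threshold_neuron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) >= threshold else 0


def xor_network(a, b):
    # Hidden layer: one unit detects "a OR b", the other detects "a AND b".
    h_or = threshold_neuron([a, b], [1, 1], threshold=1)
    h_and = threshold_neuron([a, b], [1, 1], threshold=2)
    # Output unit fires for "OR but not AND"; the AND unit inhibits it.
    return threshold_neuron([h_or, h_and], [1, -1], threshold=1)


print([xor_network(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```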
The Perceptron
Frank Rosenblatt (1962). Principles of Neurodynamics, Spartan,
New York, NY.
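[Figure: a linear neuron, whose output is the weighted sum of its inputs]
[Figure: the cashier's true prices of 150, 50 and 100 per portion of fish, chips and beer; for 2, 5 and 3 portions the price of the meal is 2 × 150 + 5 × 50 + 3 × 100 = 850]
A model of the cashier's brain with arbitrary initial weights (50 per portion of each item):
Price of meal = 500 • Residual error = 850 − 500 = 350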
• The learning rule is:
$\Delta w_i = \varepsilon \, x_i (y - \hat{y})$
• With a learning rate of $\varepsilon = 1/35$, the weight changes are +20, +50, +30 (initial weights 50, 50, 50; inputs of 2, 5 and 3 portions; residual 350).
• This gives new weights of 70, 100, 80.
• Notice that the weight for chips got worse! It moved from 50 to 100, away from the true price of 50.
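The single update in the cashier's example can be reproduced in a few lines of Python. The variable names are hypothetical; the true prices (150, 50, 100), portions (2, 5, 3), initial weights and learning rate are the ones given in the example.

```python
true_weights = [150, 50, 100]  # cashier's true per-portion prices: fish, chips, beer
portions = [2, 5, 3]           # 2 fish, 5 chips, 3 beer
weights = [50, 50, 50]         # arbitrary initial guesses in the model
learning_rate = 1 / 35

target = sum(w * x for w, x in zip(true_weights, portions))  # 850
estimate = sum(w * x for w, x in zip(weights, portions))     # 500
residual = target - estimate                                 # 350

# Delta rule: each weight moves by learning_rate * input * residual.
changes = [learning_rate * x * residual for x in portions]
new_weights = [w + dw for w, dw in zip(weights, changes)]

print(target, estimate, residual)       # 850 500 350
print([round(c) for c in changes])      # [20, 50, 30]
print([round(w) for w in new_weights])  # [70, 100, 80]
# The chips weight moves from 50 to 100, away from its true value of 50.
```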
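Behavior of the iterative learning procedure
• Define the error as the squared residuals summed over all training cases:
$E = \sum_n E_n = \frac{1}{2}\sum_n (y_n - \hat{y}_n)^2$
• Now differentiate to get error derivatives for weights:
$\frac{\partial E}{\partial w_i} = \sum_n \frac{\partial \hat{y}_n}{\partial w_i}\frac{dE_n}{d\hat{y}_n} = -\sum_n x_{i,n}(y_n - \hat{y}_n)$
• The batch learning rule changes each weight in proportion to its error derivative summed over all training cases:
$\Delta w_i = -\varepsilon \frac{\partial E}{\partial w_i} = \varepsilon \sum_n x_{i,n}(y_n - \hat{y}_n)$
[Figure: the error surface E plotted over the weights w1 and w2]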
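The derivative can be checked numerically. In this Python sketch, the extra meals in `orders` and the learning rate are assumptions made up for illustration; only the true prices and the starting weights come from the cashier's example. It compares the analytic derivatives $-\sum_n x_{i,n}(y_n - \hat{y}_n)$ with central finite differences and then takes one batch step.

```python
# Hypothetical training set: the example meal plus three single-portion purchases,
# each given as (fish, chips, beer) portions.
orders = [(2, 5, 3), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
true_weights = (150, 50, 100)  # the cashier's true per-portion prices
prices = [sum(w * x for w, x in zip(true_weights, order)) for order in orders]


def predict(weights, x):
    """Linear neuron: y_hat = sum_i w_i * x_i."""
    return sum(w * xi for w, xi in zip(weights, x))


def error(weights):
    """E = 1/2 * sum over training cases of the squared residuals."""
    return 0.5 * sum((y - predict(weights, x)) ** 2 for x, y in zip(orders, prices))


def gradient(weights):
    """Analytic derivatives: dE/dw_i = -sum_n x_{i,n} * (y_n - y_hat_n)."""
    grad = [0.0] * len(weights)
    for x, y in zip(orders, prices):
        residual = y - predict(weights, x)
        for i, xi in enumerate(x):
            grad[i] -= xi * residual
    return grad


def numerical_gradient(weights, h=1e-4):
    """Central finite differences, used only to check the analytic formula."""
    grad = []
    for i in range(len(weights)):
        up = list(weights)
        down = list(weights)
        up[i] += h
        down[i] -= h
        grad.append((error(up) - error(down)) / (2 * h))
    return grad


weights = [50.0, 50.0, 50.0]
print(gradient(weights))            # [-800.0, -1750.0, -1100.0]
print(numerical_gradient(weights))  # the same values, up to rounding error

# One batch step: delta_w_i = -learning_rate * dE/dw_i (learning rate assumed).
learning_rate = 0.02
weights = [w - learning_rate * g for w, g in zip(weights, gradient(weights))]
print(error(weights) < error([50.0, 50.0, 50.0]))  # True
```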
Online versus batch learning
[Figure: learning paths in weight space (w1, w2) for batch and for online learning, with the constraint from training case 2 drawn as a line]
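The difference between the two schemes is only in when the delta-rule changes are applied. A hedged Python sketch, reusing a made-up training set (the example meal plus three single-portion purchases) and an assumed learning rate of 0.02: batch learning sums the changes over all cases before updating, while online learning updates after every case.

```python
# Hypothetical training set of (fish, chips, beer) portions.
orders = [(2, 5, 3), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
true_weights = (150, 50, 100)
prices = [sum(w * x for w, x in zip(true_weights, order)) for order in orders]


def predict(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))


def batch_epoch(weights, learning_rate):
    """Batch: accumulate the delta-rule changes over all cases, then apply them once."""
    changes = [0.0] * len(weights)
    for x, y in zip(orders, prices):
        residual = y - predict(weights, x)
        for i, xi in enumerate(x):
            changes[i] += learning_rate * xi * residual
    return [w + dw for w, dw in zip(weights, changes)]


def online_epoch(weights, learning_rate):
    """Online: apply the delta rule immediately after each training case."""
    for x, y in zip(orders, prices):
        residual = y - predict(weights, x)
        weights = [w + learning_rate * xi * residual for w, xi in zip(weights, x)]
    return weights


batch_weights = [50.0, 50.0, 50.0]
online_weights = [50.0, 50.0, 50.0]
for epoch in range(1000):
    batch_weights = batch_epoch(batch_weights, 0.02)
    online_weights = online_epoch(online_weights, 0.02)

print([round(w, 3) for w in batch_weights])   # [150.0, 50.0, 100.0]
print([round(w, 3) for w in online_weights])  # [150.0, 50.0, 100.0]
```

With these assumed data both versions reach the true prices; they differ in the path they take through weight space, which is what the figure contrasts.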
Adding biases
• A linear neuron is a more flexible model if we include a bias:
$\hat{y} = b + \sum_i x_i w_i$
• We can avoid having to figure out a separate learning rule for the bias by using a trick:
– A bias is exactly equivalent to a weight on an extra input line that always has an activity of 1.
[Figure: a neuron with weights b, w1, w2 on inputs 1, x1, x2]
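The bias trick can be sketched in Python: prepend a constant input of 1 and store the bias as that input's weight. The function names and numbers are hypothetical. The same delta rule then updates the bias by learning_rate × 1 × residual, with no special case.

```python
def predict_with_bias(x, w, b):
    """y_hat = b + sum_i x_i * w_i."""
    return b + sum(xi * wi for xi, wi in zip(x, w))


def predict_augmented(x, w_aug):
    """Same model, with the bias stored as the weight on a constant input of 1."""
    x_aug = [1.0] + list(x)  # extra input line whose activity is always 1
    return sum(xi * wi for xi, wi in zip(x_aug, w_aug))


def delta_rule_step(x, y, w_aug, lr):
    """One delta-rule update; index 0 (the bias) gets lr * 1 * residual."""
    x_aug = [1.0] + list(x)
    residual = y - predict_augmented(x, w_aug)
    return [wi + lr * xi * residual for xi, wi in zip(x_aug, w_aug)]


x, w, b = [2.0, -1.0], [0.5, 3.0], 4.0
print(predict_with_bias(x, w, b))      # 2.0
print(predict_augmented(x, [b] + w))   # 2.0
print(delta_rule_step(x, 3.0, [b] + w, lr=0.1))  # approximately [4.1, 0.7, 2.9]
```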
Preprocessing the input vectors
hyperbolic tangent: $f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
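As a quick check of the formula, this Python sketch evaluates it directly and compares it with the standard library's `math.tanh`. How the original slides apply tanh to the input vectors is not recoverable, so using it as a squashing step into (−1, 1) is an assumption.

```python
import math


def tanh_from_exponentials(x):
    """f(x) = (e^x - e^-x) / (e^x + e^-x), written out as in the formula."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


for x in (-2.0, 0.0, 0.5, 3.0):
    # Both columns agree; every value lies strictly between -1 and 1.
    print(x, tanh_from_exponentials(x), math.tanh(x))
```

The direct formula overflows for large |x| (for example, `math.exp(1000)` raises `OverflowError`), so `math.tanh` is the safer choice in practice.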
Healthcare Applications of ANNs
• Predicting or confirming myocardial infarction (heart attack) from EKG output waves
– Physicians had a diagnostic sensitivity and specificity of 73.3% and 81.1%, while ANNs achieved 96.0% and 96.0%
• Identifying dementia from EEG patterns: ANNs performed better than both Z statistics and discriminant analysis, and better than LDA (91.1% vs. 71.9%) in classifying patients with Alzheimer disease
• Papnet: a Pap smear screening system by Neuromedical Systems, approved by the US FDA
• Predicting the mortality risk of preterm infants, screening tools in urology, etc.
Classification Applications of ANNs
X1 X2 Output
0 0 0
0 1 1
1 0 1
1 1 0
The Fall of the Perceptron
Marvin Minsky & Seymour Papert (1969). Perceptrons, MIT Press, Cambridge, MA.
[Figure: successful and unsuccessful footballers and academics, plotted by hours in the gym per week (few vs. many)]
In this example, a perceptron would not be able to discriminate between the footballers and the academics, despite the simplicity of their relationship:
Academics = Successful XOR Gym
This failure caused the majority of researchers to walk away.
The simple XOR example masks a deeper problem ...
There is no doubt that Minsky and Papert's book was a block to the funding of
research in neural networks for more than ten years.
The book was widely interpreted as showing that neural networks are basically
limited and fatally flawed.
What IS controversial is whether Minsky and Papert shared and/or promoted this belief.
Following the rebirth of interest in artificial neural networks, Minsky and Papert
claimed that they had not intended such a broad interpretation of the conclusions
they reached in the book Perceptrons.
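A binary threshold neuron computes a weighted sum of its inputs and compares it with a threshold θ:
$z = \sum_i x_i w_i$
$y = 1$ if $z \ge \theta$, and $y = 0$ otherwise
[Figure: the output y as a step function of z, jumping from 0 to 1 at the threshold]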
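The perceptron convergence procedure
A binary threshold unit cannot compute XOR. The four cases of the XOR table give four constraints on the weights:
$(1,1) \to 0: \; w_1 + w_2 < \theta, \qquad (0,0) \to 0: \; 0 < \theta$
$(1,0) \to 1: \; w_1 \ge \theta, \qquad (0,1) \to 1: \; w_2 \ge \theta$
Adding the last two gives $w_1 + w_2 \ge 2\theta > \theta$ (because $\theta > 0$), which contradicts the first. Geometrically, in the input space with the points (0,0), (0,1), (1,0) and (1,1), the positive and negative cases cannot be separated by a plane.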
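One common form of the perceptron convergence procedure, sketched in Python. It assumes the threshold is folded into a bias weight on a constant input of 1 (the same trick as for linear neurons): when the unit outputs 0 but should output 1, add the input vector to the weights; when it outputs 1 but should output 0, subtract it. It finds weights for AND, but on XOR, whose positive and negative cases cannot be separated by a plane, it never completes an epoch without mistakes.

```python
def predict(weights, x):
    """Binary threshold unit; weights[0] is a bias on a constant input of 1."""
    z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if z >= 0 else 0


def train_perceptron(cases, max_epochs=100):
    """Perceptron convergence procedure; returns (weights, converged)."""
    weights = [0] * (len(cases[0][0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in cases:
            if predict(weights, x) != target:
                mistakes += 1
                sign = 1 if target == 1 else -1  # add x if it should fire, else subtract
                weights[0] += sign
                for i, xi in enumerate(x):
                    weights[i + 1] += sign * xi
        if mistakes == 0:  # a full pass with no changes: every case is correct
            return weights, True
    return weights, False


AND_CASES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(train_perceptron(AND_CASES))     # ([-3, 2, 1], True)
print(train_perceptron(XOR_CASES)[1])  # False
```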
What can perceptrons do?
Fisher Linear Discrimination
$J(v) = \frac{v^t B v}{v^t W v}$ (between-class scatter B over within-class scatter W)
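Suppose we have two classes, where class $i$ has $n_i$ samples $x \in X_i$ with mean $m_i = \frac{1}{n_i}\sum_{x \in X_i} x$. Each sample is projected onto a direction $v$ as $y = v^t x$, and $Y_i$ denotes the projected samples of class $i$.
• Mean of the projected samples:
$\tilde{m}_i = \frac{1}{n_i}\sum_{y \in Y_i} y = \frac{1}{n_i}\sum_{x \in X_i} v^t x = v^t m_i$
• 'Scatterness' of the projected samples:
$\tilde{s}_i^2 = \sum_{y \in Y_i} (y - \tilde{m}_i)^2$
• Criterion function:
$J(v) = \frac{(\tilde{m}_1 - \tilde{m}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$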
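The within-class scatter matrices are
$W_i = \sum_{x \in X_i} (x - m_i)(x - m_i)^t, \qquad W = W_1 + W_2$
so that
$\tilde{s}_i^2 = \sum_{x \in X_i} (v^t x - v^t m_i)^2 = \sum_{x \in X_i} v^t (x - m_i)(x - m_i)^t v = v^t W_i v$
With the between-class scatter matrix $B = (m_1 - m_2)(m_1 - m_2)^t$, the numerator is $(\tilde{m}_1 - \tilde{m}_2)^2 = v^t B v$, so $J(v) = \frac{v^t B v}{v^t W v}$. Setting the gradient of $J$ to zero gives the generalized eigenvalue problem
$Bv = \lambda W v$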
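Because $Bv = (m_1 - m_2)(m_1 - m_2)^t v$ always points along $m_1 - m_2$, the problem $Bv = \lambda W v$ is solved, up to scale, by $v \propto W^{-1}(m_1 - m_2)$, assuming $W$ is invertible. A minimal pure-Python sketch for two-dimensional data; the sample points are made up for illustration.

```python
# Hypothetical 2-D samples for two classes (illustrative data only).
X1 = [(1.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
X2 = [(5.0, 4.0), (6.0, 6.0), (7.0, 5.0)]


def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def scatter(points, m):
    """W_i = sum over x in X_i of (x - m_i)(x - m_i)^t, as a 2x2 list."""
    S = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for r in range(2):
            for c in range(2):
                S[r][c] += d[r] * d[c]
    return S


def criterion(v, m1, m2, W):
    """J(v) = (v^t m1 - v^t m2)^2 / (v^t W v)."""
    diff = v[0] * (m1[0] - m2[0]) + v[1] * (m1[1] - m2[1])
    Wv = [W[0][0] * v[0] + W[0][1] * v[1], W[1][0] * v[0] + W[1][1] * v[1]]
    return diff ** 2 / (v[0] * Wv[0] + v[1] * Wv[1])


m1, m2 = mean(X1), mean(X2)
W1, W2 = scatter(X1, m1), scatter(X2, m2)
W = [[W1[r][c] + W2[r][c] for c in range(2)] for r in range(2)]

# Fisher direction v = W^{-1}(m1 - m2), using the closed-form 2x2 inverse.
det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
d = [m1[0] - m2[0], m1[1] - m2[1]]
v = [(W[1][1] * d[0] - W[0][1] * d[1]) / det,
     (-W[1][0] * d[0] + W[0][0] * d[1]) / det]

print(v)                                 # approximately [-0.8333, -0.3333]
print(criterion(v, m1, m2, W))           # approximately 4.3333 (= 13/3)
print(criterion([1.0, 0.0], m1, m2, W))  # 4.0, a worse direction
```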
Linear regression
[Figure: example temperature data, shown for one and for two input variables]
Given examples $(x_n, y_n)$, predict $y$ given a new point $x$.
[Figure: the prediction for a new point on the same temperature data, shown for one and for two input variables]
Ordinary Least Squares (OLS)
[Figure: the error, or "residual", between each observation and its prediction]
Linear equation
Linear system
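The equations on these slides did not survive, so this is a sketch under the usual formulation rather than the slides' own derivation: with the examples as rows of a design matrix $X$, minimizing $\sum_n (y_n - w^t x_n)^2$ leads to the linear system $(X^t X) w = X^t y$. The pure-Python solver and the data points are illustrative assumptions.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def least_squares(X, y):
    """Minimize sum_n (y_n - w . x_n)^2 by solving (X^t X) w = X^t y."""
    d = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    Xty = [sum(row[i] * yn for row, yn in zip(X, y)) for i in range(d)]
    return solve(XtX, Xty)


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.1, 4.9, 7.2]    # made-up observations
X = [[1.0, x] for x in xs]   # leading 1 acts as the bias input
w = least_squares(X, ys)
print(w)  # approximately [0.99, 2.04]
```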
Alternative derivation
Online algorithm
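A hedged sketch of an online least-squares update, assuming the slide describes the LMS (Widrow-Hoff) rule, which is the same delta rule used for the linear neuron: after each example, $w \leftarrow w + \eta (y_n - w^t x_n) x_n$. The data are noise-free samples of a made-up line $y = 1 + 2x$, and the learning rate and epoch count are assumptions.

```python
def lms(X, y, learning_rate=0.05, epochs=1000):
    """Online least squares: update the weights after every single example."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, target in zip(X, y):
            residual = target - sum(wi * xi for wi, xi in zip(w, x))
            w = [wi + learning_rate * residual * xi for wi, xi in zip(w, x)]
    return w


xs = [0.0, 1.0, 2.0, 3.0]
X = [[1.0, x] for x in xs]       # constant 1 input acts as the bias
y = [1.0 + 2.0 * x for x in xs]  # noise-free data from y = 1 + 2x
w = lms(X, y)
print([round(wi, 6) for wi in w])  # [1.0, 2.0]
```

With noisy data and a fixed learning rate, the weights would instead keep fluctuating around the least-squares solution.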
Beyond lines and planes
[Figure: data fitted by a model that is nonlinear in the input but still linear in its parameters]
[Matlab demo]
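The idea that a model can be nonlinear in the input yet still linear in its parameters can be sketched in Python by fitting a quadratic with ordinary least squares on the features $[1, x, x^2]$. The choice of a quadratic basis and the noise-free data are assumptions for illustration.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def fit_polynomial(xs, ys, degree):
    """OLS on features [1, x, ..., x^degree]: nonlinear in x, linear in w."""
    X = [[x ** k for k in range(degree + 1)] for x in xs]
    d = degree + 1
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    Xty = [sum(row[i] * yn for row, yn in zip(X, ys)) for i in range(d)]
    return solve(XtX, Xty)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 - x + 0.5 * x ** 2 for x in xs]  # samples of a hypothetical quadratic
w = fit_polynomial(xs, ys, degree=2)
print([round(wi, 6) for wi in w])          # [2.0, -1.0, 0.5]
```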
Ordinary Least Squares [summary]
Given examples
Let
For example
Let n
Minimize by solving
Predict
Probabilistic interpretation
Likelihood
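A short derivation of what this interpretation usually means, assuming (the surviving text does not state it) that each observation equals the linear prediction plus independent Gaussian noise with variance $\sigma^2$:

```latex
y_n = w^t x_n + \epsilon_n, \qquad \epsilon_n \sim \mathcal{N}(0, \sigma^2)

L(w) = \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}}
       \exp\!\left(-\frac{(y_n - w^t x_n)^2}{2\sigma^2}\right)

\log L(w) = -\frac{N}{2}\log(2\pi\sigma^2)
            - \frac{1}{2\sigma^2}\sum_{n=1}^{N} (y_n - w^t x_n)^2
```

Only the last sum depends on $w$, so under this assumption maximizing the likelihood over $w$ is the same as minimizing the sum of squared residuals, which gives the OLS solution.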