Neural Networks

• History traces back to the 1950s, but neural networks became popular in the 1980s with work by Rumelhart, Hinton, and McClelland
  – "A General Framework for Parallel Distributed Processing," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition
• Peaked in the 1990s. Today:
  – Hundreds of variants
  – Less a model of the actual brain than a useful tool, but still some debate
• Numerous applications
  – Handwriting, face, and speech recognition
  – Vehicles that drive themselves
  – Models of reading, sentence production, dreaming
• Debate for philosophers and cognitive scientists
  – Can human consciousness or cognitive abilities be explained by a connectionist model, or do they require the manipulation of symbols?
Neuron Firing Process

1) Synapse receives incoming signals, changing the electrical (ionic) potential of the cell body
2) When the potential of the cell body reaches some limit, the neuron "fires": an electrical signal (action potential) is sent down the axon
3) The axon propagates the signal to other neurons, downstream

What is represented by a biological neuron?

• Cell body sums electrical potentials from incoming signals
  – Serves as an accumulator function over time
  – But "as a rule many impulses must reach a neuron almost simultaneously to make it fire" (p. 33, Brodal, 1992; italics added)
• Synapses have varying effects on cell potential
  – Synaptic strength
Another Example: 8 units in each layer, fully connected network

[Figure: two layers of 8 units each, with every unit in the first layer connected to every unit in the second]

Units & Weights

• Units
  – Sometimes notated with unit numbers
• Weights
  – Sometimes given by symbols
  – Sometimes given by numbers
  – Always represent numbers
  – May be integer or real valued

[Figure: input units 1-4 feeding one unit, shown first with symbolic weights W1,1, W1,2, W1,3, W1,4 and then with numeric weights 0.3, -0.1, 2.1, -1.1]
[Figure: the same four-input unit with numeric weights 0.3, -0.1, 2.1, -1.1]

Input: (3, 1, 0, -2)

Processing:
3(0.3) + 1(-0.1) + 0(2.1) + (-1.1)(-2)
= 0.9 + (-0.1) + 0 + 2.2
= 3

Output: 3

Step function (θ is called the threshold):
f(x) = 1 if x ≥ θ
f(x) = 0 if x < θ
Step Function Example

• Let θ = 3
• Same unit and weights as above (0.3, -0.1, 2.1, -1.1); input (3, 1, 0, -2) gives summed input x = 3
• Output after passing through the step activation function: f(3) = 1

Step Function Example (2)

• Let θ = 3
• Input: (0, 10, 0, 0)
• Output after passing through the step activation function: f(x) = ?
• What is the network output? (see the sketch below)
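A minimal Python sketch of this single unit with a step activation, checking both examples (function and variable names are my own):

```python
def step(x, theta=3):
    """Step activation: output 1 once the summed input reaches the threshold."""
    return 1 if x >= theta else 0

def unit_output(inputs, weights, theta=3):
    """Summed weighted input, passed through the step activation."""
    summed = sum(i * w for i, w in zip(inputs, weights))
    return step(summed, theta)

weights = [0.3, -0.1, 2.1, -1.1]
print(unit_output([3, 1, 0, -2], weights))  # summed input 3.0, f(3) = 1
print(unit_output([0, 10, 0, 0], weights))  # summed input -1.0, f(-1) = 0
```

So for input (0, 10, 0, 0) the summed input is 10(-0.1) = -1, which is below the threshold, and the network output is 0.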
Sigmoidal

f(x) = 1/(1 + e^(-x))

[Figure: plot of the sigmoid 1/(1+exp(-x)) and the steeper 1/(1+exp(-10*x)) for x from -5 to 5; both curves rise from 0 to 1]

Another Example

• A two weight layer, feedforward network
• Two inputs, one output, one 'hidden' unit
• Input: (3, 1)

[Figure: the two inputs connect to the hidden unit with weights 0.5 and 0.75; the hidden unit connects to the output unit with weight -0.5]

What is the output? (see the sketch below)
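A minimal sketch of this two weight layer network, assuming (from the figure) that the input-to-hidden weights are 0.5 and 0.75, that the hidden-to-output weight is -0.5, and that the sigmoid is applied at both the hidden and the output unit:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Assumed weight placement, read off the figure above.
w_hidden = [0.5, 0.75]  # input units -> hidden unit
w_output = -0.5         # hidden unit -> output unit

x = [3, 1]
hidden = sigmoid(x[0] * w_hidden[0] + x[1] * w_hidden[1])  # sigmoid(2.25), about 0.905
output = sigmoid(hidden * w_output)                        # sigmoid(-0.452), about 0.389
print(hidden, output)
```

Under those assumptions the hidden unit's summed input is 3(0.5) + 1(0.75) = 2.25, and the network output comes out near 0.389.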
Notation for Weighted Sums

[Figure: example with activation a4,1 (unit 4, layer 1) and weights W1,3, W1,4 into the next layer]

Generalizing

• ak,l = activation of unit k in layer l; layers increase in number from left to right
• Can also use vector notation
Scalar Result: Summed Weighted Input

W1 a1 = [W1,1 W1,2 W1,3 W1,4] [a1,1 a2,1 a3,1 a4,1]^T
      = W1,1 a1,1 + W1,2 a2,1 + W1,3 a3,1 + W1,4 a4,1

A 1*4 row vector times a 4*1 column vector yields a 1*1 matrix (a scalar).

Computing New Activation Value

For the case we were considering:
  a = f(W1,1 a1,1 + W1,2 a2,1 + W1,3 a3,1 + W1,4 a4,1) = f(W1 a1)

In the general case:
  a = f(Wi ai)

where f(x) is the activation function, e.g., the sigmoid function, and we are talking about unit i in some layer. (A sketch follows.)
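In numpy the row-times-column form is a dot product; here is a small sketch reusing the earlier weights and input, with the sigmoid standing in as one possible choice of f:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

W1 = np.array([0.3, -0.1, 2.1, -1.1])  # 1*4 row of weights into unit 1
a1 = np.array([3.0, 1.0, 0.0, -2.0])   # 4*1 column of activations a1,1..a4,1

s = W1 @ a1     # scalar summed weighted input: W1,1*a1,1 + ... + W1,4*a4,1 = 3.0
a = sigmoid(s)  # new activation value a = f(W1 a1)
print(s, a)
```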
Example: ANN Approximating the Equality Function for 2 Bits

• Say we want to create a neural network that tests for the equality of two bits: f(x1, x2) = z1
  – When x1 and x2 are equal, z1 is 1; otherwise, z1 is 0
• The function we want to approximate (the goal) is as follows:

  Sample No.  x1  x2  z1
  1           0   0   1
  2           0   1   0
  3           1   0   0
  4           1   1   1

• What architecture might be suitable for a neural network?

Possible Network Architecture

[Figure: inputs x1 and x2 each connect to hidden units y1 and y2, which connect to the output z1]

• What weights would allow this architecture to approximate this function? (see the sketch below)
• Later: How do we define the weights through a process of learning or training?

http://www.d.umn.edu/~cprince/courses/cs5541fall03/lectures/neural-networks/equality-no-bias.xls
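A minimal sketch of this 2-2-1 architecture's forward pass (no bias units, matching the linked spreadsheet's title). The weights below are hypothetical placeholders; finding values that fit the goal table is exactly the question posed above:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x1, x2, w, v):
    """2-2-1 feedforward pass with no bias terms.
    w[i][j]: weight from input xj to hidden unit yi; v[i]: weight from yi to z1."""
    y1 = sigmoid(w[0][0] * x1 + w[0][1] * x2)
    y2 = sigmoid(w[1][0] * x1 + w[1][1] * x2)
    return sigmoid(v[0] * y1 + v[1] * y2)

# Hypothetical candidate weights: plug in your own and compare with the goal table.
w = [[5.0, -5.0], [-5.0, 5.0]]
v = [-4.0, -4.0]
for x1, x2, z1 in [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]:
    print(x1, x2, "desired:", z1, "actual:", round(forward(x1, x2, w, v), 3))
```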
How well did this approximate the goal function?

• Categorically
  – For inputs x1=0, x2=0 and x1=1, x2=1, the output of the network was always greater than for inputs x1=1, x2=0 and x1=0, x2=1

  Network outputs:

  x1  x2  z1
  0   0   .925
  0   1   .192
  1   0   .19
  1   1   .433

• Summed squared error:

  SSE = Σ (ActualOutput_s - DesiredOutput_s)^2, summing s from 1 to numTrainSamples

• Compute the summed squared error for our example (see the sketch below)
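A quick check of the summed squared error for the outputs reported above (variable names are my own):

```python
# Network outputs from the table above vs. the goal function's desired outputs.
actual = [0.925, 0.192, 0.19, 0.433]
desired = [1, 0, 0, 1]

sse = sum((a - d) ** 2 for a, d in zip(actual, desired))
print(round(sse, 3))  # 0.075^2 + 0.192^2 + 0.19^2 + 0.567^2, roughly 0.400
```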
Notation Example

• W ai : an n*1 column vector holding the summed weighted inputs for the "right" layer
• f(W ai) : an n*1 column vector holding the new activation values for the "right" layer

[Figure: worked example updating the hidden layer activations and then the output by applying f(W ai) at each layer; the numeric weight matrices did not survive extraction. A sketch of the same computation follows.]
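Since the slide's numeric matrices are not recoverable, here is a sketch of the same vector update with made-up numbers: one matrix-vector product and one elementwise f per layer.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical 2*3 weight matrix W: 2 "right"-layer units, 3 "left"-layer units.
W = np.array([[0.3, -0.1, 2.1],
              [1.0,  0.4, -1.1]])
ai = np.array([3.0, 1.0, -2.0])  # activations of the "left" layer

summed = W @ ai          # n*1 column of summed weighted inputs for the "right" layer
new_a = sigmoid(summed)  # n*1 column of new activation values: f(W ai)
print(summed, new_a)
```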