
Neural Network History

Neural Networks
• History traces back to the 1950s, but neural networks became popular in the 1980s with work by Rumelhart, Hinton, and McClelland
  – "A General Framework for Parallel Distributed Processing" in Parallel Distributed Processing: Explorations in the Microstructure of Cognition
• Peaked in the 1990s. Today:
  – Hundreds of variants
  – Less a model of the actual brain than a useful tool, but still some debate
• Numerous applications
  – Handwriting, face, and speech recognition
  – Vehicles that drive themselves
  – Models of reading, sentence production, dreaming
• Debate for philosophers and cognitive scientists
  – Can human consciousness or cognitive abilities be explained by a connectionist model, or does it require the manipulation of symbols?

Comparison of Brains and Traditional Computers

  Brain                                      | Traditional Computer
  200 billion neurons, 32 trillion synapses  | 1 billion bytes RAM, trillions of bytes on disk
  Element size: 10^-6 m                      | Element size: 10^-9 m
  Energy use: 25 W                           | Energy use: 30-90 W (CPU)
  Processing speed: 100 Hz                   | Processing speed: 10^9 Hz
  Parallel, Distributed                      | Serial, Centralized
  Fault Tolerant                             | Generally not Fault Tolerant
  Learns: Yes                                | Learns: Some
  Intelligent/Conscious: Usually             | Intelligent/Conscious: Generally No

'a' or 'the' brain?
• Are we using computer models of neurons to model 'the' brain, or to model 'a' brain?

Neuron Firing Process
1) Synapse receives incoming signals, changing the electrical (ionic) potential of the cell body
2) When the potential of the cell body reaches some limit, the neuron "fires"; an electrical signal (action potential) is sent down the axon
3) The axon propagates the signal to other neurons, downstream

What is represented by a biological neuron?
• Cell body sums electrical potentials from incoming signals
  – Serves as an accumulator function over time
  – But "as a rule many impulses must reach a neuron almost simultaneously to make it fire" (p. 33, Brodal, 1992; italics added)
• Synapses have varying effects on cell potential
  – Synaptic strength

ANN (Artificial Neural Nets)
• Approximation of biological neural nets by ANNs
  – No direct model of the accumulator function over time
  – Synaptic strength
    • Approximated with connection weights (real numbers)
  – Spiking of output
    • Approximated with non-linear activation functions
• Neural units
  – Represent activation values (numbers)
  – Represent inputs and outputs (numbers)

Graphical Notation & Terms
• Circles
  – Are neural units
  – Metaphor for nerve cell body
• Arrows
  – Represent synaptic connections from one unit to another
  – These are called weights and are represented with a scalar numeric value (e.g., a real number)
[Figure: one layer of neural units connected by arrows to another layer of neural units]

Another Example: 8 units in each layer, fully connected network
[Figure: two layers of 8 units each, with every unit in the left layer connected to every unit in the right layer]

Units & Weights
• Units
  – Sometimes notated with unit numbers (e.g., units 1-4)
• Weights
  – Sometimes given by symbols (e.g., W_{1,1}, W_{1,2}, W_{1,3}, W_{1,4})
  – Sometimes given by numbers (e.g., 0.3, -0.1, 2.1, -1.1)
  – Always represent numbers
  – May be integer or real valued
[Figure: four units (1-4) connected to a single unit, labeled once with the weight symbols W_{1,1}-W_{1,4} and once with the weight values 0.3, -0.1, 2.1, -1.1]

Computing with Neural Units
• Inputs are presented to input units
• How do we generate outputs?
• One idea
  – Summed weighted inputs
• Example (weights 0.3, -0.1, 2.1, -1.1 on the connections from units 1-4):
  Input: (3, 1, 0, -2)
  Processing: 3(0.3) + 1(-0.1) + 0(2.1) + (-2)(-1.1) = 0.9 + (-0.1) + 0 + 2.2 = 3
  Output: 3

Activation Functions
• Usually, don't just use the weighted sum directly
• Apply some function to the weighted sum before it is used (e.g., as output)
• Call this the activation function
• A step function could be a good simulation of a biological neuron spiking:
  f(x) = 1 if x >= θ
  f(x) = 0 if x < θ
  where θ is called the threshold

Step Function Example
• Let θ = 3
• Input: (3, 1, 0, -2); weights 0.3, -0.1, 2.1, -1.1
• Weighted sum: x = 3
• Output after passing through the step activation function: f(3) = 1

Step Function Example (2)
• Let θ = 3
• Input: (0, 10, 0, 0); weights 0.3, -0.1, 2.1, -1.1
• Weighted sum: x = ?
• Output after passing through the step activation function: f(x) = ?
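As a check on these hand computations, here is a minimal Python sketch of a single unit with a summed weighted input and a step activation; the weights, inputs, and threshold are the ones used in the examples above.

```python
# Minimal sketch of a single neural unit with a step activation function.

def weighted_sum(inputs, weights):
    """Summed weighted input to the unit."""
    return sum(i * w for i, w in zip(inputs, weights))

def step(x, theta):
    """Step activation: 1 if x >= theta, else 0."""
    return 1 if x >= theta else 0

weights = [0.3, -0.1, 2.1, -1.1]
theta = 3

for inputs in [(3, 1, 0, -2), (0, 10, 0, 0)]:
    x = weighted_sum(inputs, weights)
    print(inputs, "-> weighted sum =", x, "-> output =", step(x, theta))
# (3, 1, 0, -2) -> weighted sum = 3.0 -> output = 1
# (0, 10, 0, 0) -> weighted sum = -1.0 -> output = 0
```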

Another Activation Function: The Sigmoidal
• The math of some neural nets requires that the activation function be continuously differentiable
• A sigmoidal function is often used to approximate the step function:
  f(x) = 1 / (1 + e^(-σx))
  where σ is the steepness parameter

Sigmoidal Example
• Let σ = 2, so f(x) = 1 / (1 + e^(-2x))
• Input: (3, 1, 0, -2); weights 0.3, -0.1, 2.1, -1.1
• Weighted sum: 3
• f(3) = 1 / (1 + e^(-2*3)) = 0.998
• Input: (0, 10, 0, 0) -- network output?
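A small Python sketch of the sigmoidal activation with a steepness parameter, reproducing the worked value above (the weights, input, and σ = 2 come from the slide).

```python
import math

def sigmoid(x, steepness=1.0):
    """Sigmoidal activation: 1 / (1 + e^(-steepness * x))."""
    return 1.0 / (1.0 + math.exp(-steepness * x))

weights = [0.3, -0.1, 2.1, -1.1]
steepness = 2.0

inputs = (3, 1, 0, -2)
x = sum(i * w for i, w in zip(inputs, weights))   # weighted sum = 3.0
print(round(sigmoid(x, steepness), 3))            # 0.998

# The same code answers the question for input (0, 10, 0, 0).
```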

Sigmoidal
[Figure: plot of 1/(1+exp(-x)) and 1/(1+exp(-10*x)) for x from -5 to about 5; the y-axis runs from 0 to 1.2]

Another Example
• A two weight layer, feedforward network
• Two inputs, one output, one 'hidden' unit
• f(x) = 1 / (1 + e^(-x))
• Input: (3, 1)
• Weights into the hidden unit: 0.5 and -0.5; weight from the hidden unit to the output unit: 0.75
• What is the output?

Computing in Multilayer Networks
• Start at the leftmost layer
  – Compute activations based on inputs
• Then work from left to right, using computed activations as inputs to the next layer
• Example solution, with f(x) = 1 / (1 + e^(-x)):
  – Activation of hidden unit:
    f(0.5(3) + (-0.5)(1)) = f(1.5 - 0.5) = f(1) = 0.731
  – Output activation:
    f(0.731(0.75)) = f(0.548) = 0.634

Notation
• At times it is useful to represent weights and activations using vector and matrix notations
• W_{i,j}: weight (scalar) from unit j in the left layer to unit i in the right layer
• a_{k,l}: activation value of unit k in layer l; layers increase in number from left to right
[Figure: four units with activations a_{1,1}-a_{4,1}, connected by weights W_{1,1}-W_{1,4} to a unit with activation a_{1,2} in the right layer]
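Here is a short Python sketch of that left-to-right computation for the two-weight-layer example above (two inputs, one hidden unit, one output; the weights and input are the ones from the slide).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Two-weight-layer, feedforward network from the example.
inputs = (3, 1)
hidden_weights = (0.5, -0.5)   # weights from the two inputs to the hidden unit
output_weight = 0.75           # weight from the hidden unit to the output unit

# Work from left to right: hidden activation first, then the output.
hidden = sigmoid(sum(i * w for i, w in zip(inputs, hidden_weights)))  # f(1) = 0.731
output = sigmoid(hidden * output_weight)                              # f(0.548) = 0.634
print(round(hidden, 3), round(output, 3))
```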

Notation for Weighted Sums
  a_{1,2} = f(W_{1,1} a_{1,1} + W_{1,2} a_{2,1} + W_{1,3} a_{3,1} + W_{1,4} a_{4,1})
[Figure: units with activations a_{1,1}-a_{4,1} connected by weights W_{1,1}-W_{1,4} to the unit with activation a_{1,2}]

Generalizing
  a_{k,l+1} = f( Σ_{i=1..n} W_{k,i} a_{i,l} )
• W_{i,j}: weight (scalar) from unit j in the left layer to unit i in the right layer
• a_{k,l}: activation value of unit k in layer l; layers increase in number from left to right

Can Also Use Vector Notation
• W_i: row vector of incoming weights for unit i
  W_1 = [W_{1,1}  W_{1,2}  W_{1,3}  W_{1,4}]
• a_i: column vector of activation values of the units connected to unit i (assuming that the layer for unit i is specified in the context)
  a_1 = [a_1  a_2  a_3  a_4]^T

Example
  W_1 a_1 = [W_{1,1}  W_{1,2}  W_{1,3}  W_{1,4}] [a_{1,1}  a_{2,1}  a_{3,1}  a_{4,1}]^T
• Recall: multiplying an n*r matrix with an r*m matrix produces an n*m matrix C, where each element C_{i,j} is the scalar product of row i of the left matrix and column j of the right matrix

Scalar Result: Summed Weighted Input
  W_1 a_1 = [W_{1,1}  W_{1,2}  W_{1,3}  W_{1,4}] [a_{1,1}  a_{2,1}  a_{3,1}  a_{4,1}]^T
          = W_{1,1} a_{1,1} + W_{1,2} a_{2,1} + W_{1,3} a_{3,1} + W_{1,4} a_{4,1}
• A 1*4 row vector times a 4*1 column vector gives a 1*1 matrix (a scalar)

Computing New Activation Value
• For the case we were considering:
  a = f(W_{1,1} a_{1,1} + W_{1,2} a_{2,1} + W_{1,3} a_{3,1} + W_{1,4} a_{4,1}) = f(W_1 a_1)
• In the general case:
  a = f(W_i a_i)
  where f(x) is the activation function (e.g., the sigmoid function) and we are talking about unit i in some layer
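A minimal NumPy sketch of the vector notation: the summed weighted input is a row vector times a column vector, and the new activation is f applied to that scalar. The weights and inputs reuse the earlier worked example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

W1 = np.array([0.3, -0.1, 2.1, -1.1])   # row vector of incoming weights for unit 1
a1 = np.array([3, 1, 0, -2])            # activations of the units feeding unit 1

summed = W1 @ a1    # scalar: W1,1*a1,1 + W1,2*a2,1 + W1,3*a3,1 + W1,4*a4,1 = 3.0
a = sigmoid(summed) # new activation value for the unit
print(summed, round(float(a), 3))       # 3.0 0.953
```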

Example
• Draw the corresponding ANN
• Compute the output value, with f(x) = 1 / (1 + e^(-x)):
  f( [0.4  0.5  -1] [1  3  2]^T ) = ?
  (weights 0.4, 0.5, -1; activation values 1, 3, 2)

Function Approximation
• We can use ANNs to approximate functions: f(X) = Y
• Input units (X): function inputs (vector)
• Output units (Y): function outputs (vector)
• Hidden layers/weights: computation of the function

Example
• Say we want to create a neural network that tests for the equality of two bits: f(x1, x2) = z1
• When x1 and x2 are equal, z1 is 1; otherwise, z1 is 0
• The function we want to approximate (the goal) is as follows:

  Sample No. | x1 | x2 | z1
  1          | 0  | 0  | 1
  2          | 0  | 1  | 0
  3          | 1  | 0  | 0
  4          | 1  | 1  | 1

• What architecture might be suitable for a neural network?

Architecture for ANN Approximating the Equality Function for 2 Bits
• Possible network architecture: inputs x1 and x2, hidden units y1 and y2, output z1
• Goal: the same input/output table as above
• What weights would allow this architecture to approximate this function?
• Later: How do we define the weights through a process of learning or training?

Approximate Solution
• Network architecture: inputs x1, x2 -> hidden units y1, y2 -> output z1
• Actual network results:

  x1 | x2 | z1
  0  | 0  | .925
  0  | 1  | .192
  1  | 0  | .19
  1  | 1  | .433

• Weights:

  w_x1_y1 | w_x1_y2 | w_x2_y1 | w_x2_y2 | w_y1_z1  | w_y2_z1
  -1.8045 | -7.7299 | -1.8116 | -7.6649 | -10.3022 | 15.3298

  http://www.d.umn.edu/~cprince/courses/cs5541fall03/lectures/neural-networks/equality-no-bias.xls

Quality Measures
• A given ANN may only approximate the desired function (e.g., equality for two bits)
• We need to measure the quality of the approximation
• I.e., how closely did the ANN approximate the desired function?
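As a check, here is a Python sketch that runs the weights from the approximate solution through a 2-2-1 feedforward network. It assumes plain sigmoid units with no bias weights (consistent with the "equality-no-bias" spreadsheet); under that assumption it reproduces the results table.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Weights from the approximate solution above (no bias weights assumed).
w_x1_y1, w_x1_y2 = -1.8045, -7.7299
w_x2_y1, w_x2_y2 = -1.8116, -7.6649
w_y1_z1, w_y2_z1 = -10.3022, 15.3298

def network(x1, x2):
    y1 = sigmoid(x1 * w_x1_y1 + x2 * w_x2_y1)
    y2 = sigmoid(x1 * w_x1_y2 + x2 * w_x2_y2)
    return sigmoid(y1 * w_y1_z1 + y2 * w_y2_z1)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(network(x1, x2), 3))
# 0 0 0.925
# 0 1 0.192
# 1 0 0.19
# 1 1 0.433
```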

How well did this approximate the goal function?
• Categorically
  – For inputs x1=0, x2=0 and x1=1, x2=1, the output of the network was always greater than for inputs x1=1, x2=0 and x1=0, x2=1
• Summed squared error:
  SSE = Σ_{s=1..numTrainSamples} (ActualOutput_s - DesiredOutput_s)^2

Compute the summed squared error for our example
• Actual network results:

  x1 | x2 | z1
  0  | 0  | .925
  0  | 1  | .192
  1  | 0  | .19
  1  | 1  | .433

• Use the summed squared error formula above
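A small Python sketch of this computation for the table above; it reproduces the value given in the Solution slide that follows.

```python
# Desired and actual outputs for the four training samples.
desired = [1, 0, 0, 1]
actual  = [0.925, 0.192, 0.19, 0.433]

sse = sum((a - d) ** 2 for a, d in zip(actual, desired))
print(round(sse, 6))   # 0.400078
```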

Solution

  x1 | x2 | Expected z1 | Actual z1 | Squared error
  0  | 0  | 1           | 0.925     | 0.005625
  0  | 1  | 0           | 0.192     | 0.036864
  1  | 0  | 0           | 0.19      | 0.0361
  1  | 1  | 1           | 0.433     | 0.321489

  Sum squared error = 0.400078

• Generally, lower values for sum squared error indicate better approximation; 0 is "perfect"
• Need to also consider generalization (later)

Weight Matrix
• A row vector provides the weights for a single unit in the "right" layer
• A weight matrix can provide all weights connecting the "left" layer to the "right" layer
• Let W be an n*r weight matrix
  – Row vector i in the matrix connects unit i in the "right" layer to the units in the "left" layer
  – n units in the layer to the "right"
  – r units in the layer to the "left"

Notation
• a_i: the vector of activation values of the layer to the "left"; an r*1 column vector (same as before)
• W a_i: an n*1 column vector; the summed weighted inputs for the "right" layer
• f(W a_i): an n*1 vector of new activation values for the "right" layer
• The function f is now taken as applying to the elements of a vector

Example
• Updating the hidden layer activation values, then updating the output activation values, each as f(W a)
[Figure: worked example. A 5*2 weight matrix multiplies the 2*1 input vector to give the hidden layer activation values; a 3*5 weight matrix then multiplies the resulting 5*1 hidden activation vector to give the output activation values]
• Draw the architecture (units and arcs representing weights) of the connectionist model

Answer
• 2 input units
• 5 hidden layer units
• 3 output units
• Fully connected, feedforward network

Bias Weights
• Used to provide a trainable threshold
• The bias b is treated as another weight, but it is connected to a unit with a constant activation value (1 in the figure)
[Figure: a unit with constant activation 1 connected by weight b to the same unit that receives weights W_{1,1}-W_{1,4}]
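To tie the matrix notation and bias weights together, here is a NumPy sketch of one forward pass through a fully connected, feedforward 2-5-3 network like the one in the answer above. The weight values are made-up placeholders (the numbers from the original worked example are not reproduced here); the bias is handled by appending a constant-1 activation, as described on the slide.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # applied element-wise to a vector

rng = np.random.default_rng(0)

# Made-up example weights for a fully connected 2-5-3 feedforward network.
# Each weight matrix has one extra column for the bias weight b, which
# connects to a unit whose activation is always 1.
W_hidden = rng.normal(size=(5, 2 + 1))   # 5 hidden units, 2 inputs + bias
W_output = rng.normal(size=(3, 5 + 1))   # 3 output units, 5 hidden units + bias

def forward(x):
    a = np.append(x, 1.0)                # add the constant-1 bias unit
    hidden = sigmoid(W_hidden @ a)       # f(W a): new hidden activations (5*1)
    hidden = np.append(hidden, 1.0)      # bias unit for the next layer
    return sigmoid(W_output @ hidden)    # output activations (3*1)

print(forward(np.array([3.0, 1.0])))
```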
