
Connectionist approach to AI

Lecture # 21
 neural networks, neuron model
 perceptrons
 threshold logic, perceptron training, convergence theorem
 single layer vs. multi-layer
 backpropagation
 stepwise vs. continuous activation function


Connectionist models (neural nets)

 humans lack the speed & memory of computers
 yet humans are capable of complex reasoning/action
 maybe our brain architecture is well-suited for certain tasks

general brain architecture:
 many (relatively) slow neurons, interconnected
 dendrites serve as input devices (receive electrical impulses from other neurons)
 cell body "sums" inputs from the dendrites (possibly inhibiting or exciting)
 if the sum exceeds some threshold, the neuron fires an output impulse along the axon

Biology

 The brain doesn’t seem to have a CPU.
 Instead, it has lots of simple, parallel, asynchronous units, called neurons.
 Every neuron is a single cell that has a number of relatively short fibers, called dendrites, and one long fiber, called an axon.
 The end of the axon branches out into more short fibers.
 Each fiber “connects” to the dendrites and cell bodies of other neurons.
 The “connection” is actually a short gap, called a synapse.
 Axons are transmitters; dendrites are receivers.

Neuron

 [figure: a neuron — dendrites, cell body, axon, synapses]

How neurons work

 The fibers of surrounding neurons emit chemicals (neurotransmitters) that move across the synapse and change the electrical potential of the cell body.
 Sometimes the action across the synapse increases the potential, and sometimes it decreases it.
 If the potential reaches a certain threshold, an electrical pulse, or action potential, will travel down the axon, eventually reaching all the branches, causing them to release their neurotransmitters. And so on ...

Brain metaphor

connectionist models are based on the brain metaphor
 large number of simple, neuron-like processing elements
 large number of weighted connections between neurons
   note: the weights encode information, not symbols!
 parallel, distributed control
 emphasis on learning

brief history of neural nets
1940's  theoretical birth of neural networks: McCulloch & Pitts (1943), Hebb (1949)
1950's & 1960's  optimistic development using computer models: Minsky (50's), Rosenblatt (60's)
1970's  DEAD: Minsky & Papert showed serious limitations
1980's & 1990's  REBIRTH – new models, new techniques: backpropagation, Hopfield nets

From a neuron to a perceptron

 All connectionist models use a similar model of a neuron.
 There is a collection of units, each of which has:
   a number of weighted inputs from other units
    inputs represent the degree to which the other unit is firing
    weights represent how much the unit wants to listen to other units
   a threshold that the sum of the weighted inputs is compared against
    the threshold has to be crossed for the unit to do something (“fire”)
   a single output to another bunch of units
    what the unit decided to do, given all the inputs and its threshold

A unit (perceptron)

 [figure: inputs x1, x2, ..., xn feed a summation node along weights w1, w2, ..., wn; the node computes y = Σwixi and emits O = f(y)]

 xi are inputs
 wi are weights
 wn is usually set for the threshold, with xn = 1 (bias)
 y is the weighted sum of inputs including the threshold (the activation level)
 o is the output; it is computed using a function that determines how far the perceptron’s activation level is below or above 0

Interesting questions for NNs

 How do we wire up a network of perceptrons?
  - i.e., what “architecture” do we use?
 How does the network represent knowledge?
  - i.e., what do the nodes mean?
 How do we set the weights?
  - i.e., how does learning take place?

Artificial neurons

 McCulloch & Pitts (1943) described an artificial neuron
  inputs are either an electrical impulse (1) or not (0)
   (note: the original version used +1 for excitatory and –1 for inhibitory signals)
  each input has a weight associated with it
  the activation function multiplies each input value by its weight
  if the sum of the weighted inputs >= θ, then the neuron fires (returns 1), else it doesn't fire (returns 0)

if Σwixi >= θ, output = 1
if Σwixi < θ, output = 0

Computation via activation function

 can view an artificial neuron as a computational element
  accepts or classifies an input if the output fires

example: w1 = w2 = .75, θ = 1
INPUT: x1 = 1, x2 = 1   .75*1 + .75*1 = 1.5 >= 1  OUTPUT: 1
INPUT: x1 = 1, x2 = 0   .75*1 + .75*0 = .75 < 1  OUTPUT: 0
INPUT: x1 = 0, x2 = 1   .75*0 + .75*1 = .75 < 1  OUTPUT: 0
INPUT: x1 = 0, x2 = 0   .75*0 + .75*0 = 0 < 1  OUTPUT: 0

this neuron computes the AND function
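The threshold unit above fits in a few lines of Python (weights and threshold as on the slide; the function name is my own):

```python
def mp_neuron(inputs, weights, theta):
    """McCulloch-Pitts unit: fire (1) iff the weighted input sum reaches theta."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= theta else 0

# AND via weights .75/.75 and threshold 1, as on the slide
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mp_neuron((x1, x2), (0.75, 0.75), theta=1))
```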
In-class exercise

 specify weights and a threshold to compute OR; the weights must satisfy:

INPUT: x1 = 1, x2 = 1   w1*1 + w2*1 >= θ  OUTPUT: 1
INPUT: x1 = 1, x2 = 0   w1*1 + w2*0 >= θ  OUTPUT: 1
INPUT: x1 = 0, x2 = 1   w1*0 + w2*1 >= θ  OUTPUT: 1
INPUT: x1 = 0, x2 = 0   w1*0 + w2*0 < θ  OUTPUT: 0

TRY to find a neuron (e.g., keeping w1 = w2 = .75) that computes the OR function.
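One solution (my choice, not the only one): keep w1 = w2 = .75 and lower the threshold to .5, so a single active input is already enough to fire. Checking all four cases:

```python
def fires(inputs, weights, theta):
    # fire iff the weighted sum of the inputs reaches the threshold
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= theta else 0

# OR: any single active input contributes .75, which clears theta = .5
for x in ((1, 1), (1, 0), (0, 1), (0, 0)):
    print(x, fires(x, (0.75, 0.75), theta=0.5))
# outputs: 1, 1, 1, 0 — the OR truth table
```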


Another exercise?

 specify weights and a threshold to compute XOR; the weights would have to satisfy:

INPUT: x1 = 1, x2 = 1   w1*1 + w2*1 < θ  OUTPUT: 0
INPUT: x1 = 1, x2 = 0   w1*1 + w2*0 >= θ  OUTPUT: 1
INPUT: x1 = 0, x2 = 1   w1*0 + w2*1 >= θ  OUTPUT: 1
INPUT: x1 = 0, x2 = 0   w1*0 + w2*0 < θ  OUTPUT: 0

 these constraints are contradictory: w1 >= θ and w2 >= θ force w1 + w2 >= 2θ > θ (since 0 < θ), so no single perceptron computes XOR

Normalizing thresholds

 to make life more uniform, can normalize the threshold to 0
 simply add an additional input x0 = 1 with weight w0 = -θ

Σwixi >= θ  ≡  -θ*1 + Σwixi >= 0

advantage: threshold = 0 for all neurons
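The equivalence can be checked directly; this small sketch (function names my own) compares a thresholded unit against its bias-normalized form on the AND unit from earlier:

```python
def fires(xs, ws, theta):
    # original form: compare the weighted sum against theta
    return 1 if sum(w * x for w, x in zip(ws, xs)) >= theta else 0

def fires_normalized(xs, ws, theta):
    # normalized form: extra input x0 = 1 with weight w0 = -theta, threshold 0
    return 1 if -theta * 1 + sum(w * x for w, x in zip(ws, xs)) >= 0 else 0

# both forms agree on every input for the AND unit (.75/.75, theta = 1)
for xs in ((1, 1), (1, 0), (0, 1), (0, 0)):
    assert fires(xs, (0.75, 0.75), 1) == fires_normalized(xs, (0.75, 0.75), 1)
```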


Normalized examples

AND: w0 = -1, w1 = .75, w2 = .75 (with x0 = 1)
INPUT: x1 = 1, x2 = 1   1*-1 + .75*1 + .75*1 = .5 >= 0  OUTPUT: 1
INPUT: x1 = 1, x2 = 0   1*-1 + .75*1 + .75*0 = -.25 < 0  OUTPUT: 0
INPUT: x1 = 0, x2 = 1   1*-1 + .75*0 + .75*1 = -.25 < 0  OUTPUT: 0
INPUT: x1 = 0, x2 = 0   1*-1 + .75*0 + .75*0 = -1 < 0  OUTPUT: 0

OR: w0 = -.5, w1 = .75, w2 = .75 (with x0 = 1)
INPUT: x1 = 1, x2 = 1   1*-.5 + .75*1 + .75*1 = 1 >= 0  OUTPUT: 1
INPUT: x1 = 1, x2 = 0   1*-.5 + .75*1 + .75*0 = .25 >= 0  OUTPUT: 1
INPUT: x1 = 0, x2 = 1   1*-.5 + .75*0 + .75*1 = .25 >= 0  OUTPUT: 1
INPUT: x1 = 0, x2 = 0   1*-.5 + .75*0 + .75*0 = -.5 < 0  OUTPUT: 0

Perceptrons

 Rosenblatt (1958) devised a learning algorithm for artificial neurons
 start with a training set (example inputs & corresponding desired outputs)
 train the network to recognize the examples in the training set (by adjusting the weights on the connections)
 once trained, the network can be applied to new examples

Perceptron learning algorithm:
1. Set the weights on the connections to random values.
2. Iterate through the training set, comparing the output of the network with the desired output for each example.
3. If all the examples were handled correctly, then DONE.
4. Otherwise, update the weights for each incorrect example:
   • if it should have fired on x1, …, xn but didn't: wi += xi (0 <= i <= n)
   • if it shouldn't have fired on x1, …, xn but did: wi -= xi (0 <= i <= n)
5. GO TO 2.
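The perceptron learning algorithm can be sketched in Python. This is a batch variant (each pass classifies every example with the pass's starting weights, then applies all the updates), which is how the worked trace on the following slides proceeds; helper names are my own:

```python
def fires(w, xs):
    # xs includes the bias input x0 = 1; fire iff the weighted sum >= 0
    return 1 if sum(wi * xi for wi, xi in zip(w, xs)) >= 0 else 0

def train_perceptron(w, examples, max_passes=100):
    """Perceptron learning rule: wi += xi on a missed fire, wi -= xi on a false fire."""
    for _ in range(max_passes):
        # classify every example with this pass's starting weights
        mistakes = [(xs, t) for xs, t in examples if fires(w, xs) != t]
        if not mistakes:
            return w  # all examples handled correctly: DONE
        for xs, target in mistakes:
            sign = 1 if target == 1 else -1
            w = [wi + sign * xi for wi, xi in zip(w, xs)]
    return w

# AND, with the bias input x0 = 1 prepended to each example
examples = [((1, 1, 1), 1), ((1, 1, 0), 0), ((1, 0, 1), 0), ((1, 0, 0), 0)]
w = train_perceptron([-0.9, 0.6, 0.2], examples)
print(w)  # converges to weights that compute AND
```

Starting from the slide's initial weights (-0.9, 0.6, 0.2), this reproduces the trace that follows, ending at weights (-1.9, 1.6, 1.2).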
Example: perceptron learning

 Suppose we want to train a perceptron to compute AND

training set: x1 = 1, x2 = 1  1
              x1 = 1, x2 = 0  0
              x1 = 0, x2 = 1  0
              x1 = 0, x2 = 0  0

randomly, let: w0 = -0.9, w1 = 0.6, w2 = 0.2

using these weights:
x1 = 1, x2 = 1: -0.9*1 + 0.6*1 + 0.2*1 = -0.1  0 WRONG
x1 = 1, x2 = 0: -0.9*1 + 0.6*1 + 0.2*0 = -0.3  0 OK
x1 = 0, x2 = 1: -0.9*1 + 0.6*0 + 0.2*1 = -0.7  0 OK
x1 = 0, x2 = 0: -0.9*1 + 0.6*0 + 0.2*0 = -0.9  0 OK

new weights: w0 = -0.9 + 1 = 0.1
             w1 = 0.6 + 1 = 1.6
             w2 = 0.2 + 1 = 1.2

Example: perceptron learning (cont.)

using these updated weights:
x1 = 1, x2 = 1: 0.1*1 + 1.6*1 + 1.2*1 = 2.9  1 OK
x1 = 1, x2 = 0: 0.1*1 + 1.6*1 + 1.2*0 = 1.7  1 WRONG
x1 = 0, x2 = 1: 0.1*1 + 1.6*0 + 1.2*1 = 1.3  1 WRONG
x1 = 0, x2 = 0: 0.1*1 + 1.6*0 + 1.2*0 = 0.1  1 WRONG

new weights: w0 = 0.1 - 1 - 1 - 1 = -2.9
             w1 = 1.6 - 1 - 0 - 0 = 0.6
             w2 = 1.2 - 0 - 1 - 0 = 0.2

using these updated weights:
x1 = 1, x2 = 1: -2.9*1 + 0.6*1 + 0.2*1 = -2.1  0 WRONG
x1 = 1, x2 = 0: -2.9*1 + 0.6*1 + 0.2*0 = -2.3  0 OK
x1 = 0, x2 = 1: -2.9*1 + 0.6*0 + 0.2*1 = -2.7  0 OK
x1 = 0, x2 = 0: -2.9*1 + 0.6*0 + 0.2*0 = -2.9  0 OK

new weights: w0 = -2.9 + 1 = -1.9
             w1 = 0.6 + 1 = 1.6
             w2 = 0.2 + 1 = 1.2

Example: perceptron learning (cont.)

using these updated weights:
x1 = 1, x2 = 1: -1.9*1 + 1.6*1 + 1.2*1 = 0.9  1 OK
x1 = 1, x2 = 0: -1.9*1 + 1.6*1 + 1.2*0 = -0.3  0 OK
x1 = 0, x2 = 1: -1.9*1 + 1.6*0 + 1.2*1 = -0.7  0 OK
x1 = 0, x2 = 0: -1.9*1 + 1.6*0 + 1.2*0 = -1.9  0 OK

DONE!

EXERCISE: train a perceptron to compute OR

Hidden units

 the addition of hidden units allows the network to develop complex feature detectors (i.e., internal representations)

 e.g., Optical Character Recognition (OCR)
  perhaps one hidden unit "looks for" a horizontal bar
  another hidden unit "looks for" a diagonal
  another looks for the vertical base
  the combination of specific hidden units indicates a 7

Building multi-layer nets

 can combine perceptrons to perform more complex computations (or classifications)

smaller example: a 3-layer neural net
 2 input nodes
 1 hidden node
 2 output nodes

RESULT?
HINT: the left output node computes AND, the right output node computes XOR
— together they form a HALF ADDER (carry and sum of the two input bits)

Hidden units & learning

 every classification problem has a perceptron solution if enough hidden layers are used
  i.e., multi-layer networks can compute anything
  (recall: can simulate AND, OR, NOT gates)

 expressiveness is not the problem – learning is!
  it is not known how to systematically find solutions
  the Perceptron Learning Algorithm can't adjust weights between levels

Minsky & Papert's results about the "inadequacy" of perceptrons pretty much killed neural net research in the 1970's

rebirth in the 1980's due to several developments
 faster, more parallel computers
 new learning algorithms, e.g., backpropagation
 new architectures, e.g., Hopfield nets
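Combining perceptrons can be sketched directly. The weights below are one choice of my own (not the slide's figure): a hidden unit computes AND, and the output unit fires on OR of the inputs minus twice the AND — which is exactly XOR:

```python
def fires(w, xs):
    # normalized threshold unit: the bias input x0 = 1 is prepended to xs
    return 1 if sum(wi * xi for wi, xi in zip(w, (1,) + xs)) >= 0 else 0

def xor_net(x1, x2):
    h = fires([-1.5, 1, 1], (x1, x2))             # hidden unit: AND of the inputs
    return fires([-0.5, 1, 1, -2], (x1, x2, h))   # output: OR of inputs minus 2*AND

for xs in ((0, 0), (0, 1), (1, 0), (1, 1)):
    print(xs, xor_net(*xs))
# outputs: 0, 1, 1, 0 — XOR, which no single perceptron can compute
```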
Backpropagation nets

backpropagation nets are multi-layer networks
 normalize inputs between 0 (inhibit) and 1 (excite)
 utilize a continuous activation function

 perceptrons utilize a stepwise activation function:
   output = 1 if sum >= 0
            0 if sum < 0

 backpropagation nets utilize a continuous activation function:
   output = 1/(1 + e^(-sum))

Backpropagation learning

 there exists a systematic method for adjusting the weights, but no global convergence theorem (as was the case for perceptrons)

 backpropagation (backward propagation of error) – vaguely stated:
  select arbitrary weights
  pick the first test case
  make a forward pass, from inputs to output
  compute an error estimate and make a backward pass, adjusting the weights to reduce the error
  repeat for the next test case

 testing & propagating for all training cases is known as an epoch

 despite the lack of a convergence theorem, backpropagation works well in practice
  however, many epochs may be required for convergence
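The procedure above can be sketched as a toy implementation (my own, not production code): a 2-2-1 sigmoid network trained on XOR, making a forward pass per case and then a backward pass that adjusts the weights to reduce the error. Depending on the random seed it may settle in a local minimum rather than the exact XOR table, but the total error drops from its starting value:

```python
import math
import random

def sigmoid(s):
    # continuous activation: output between 0 and 1
    return 1 / (1 + math.exp(-s))

random.seed(0)
# 2-2-1 network; the bias is folded in as weight index 0 (input x0 = 1)
w_hid = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_out = [random.uniform(-1, 1) for _ in range(3)]

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # XOR
lr = 0.5

def forward(x1, x2):
    xs = (1, x1, x2)
    h = [sigmoid(sum(w * x for w, x in zip(wh, xs))) for wh in w_hid]
    hs = (1,) + tuple(h)
    o = sigmoid(sum(w * x for w, x in zip(w_out, hs)))
    return xs, hs, o

def sse():
    # sum of squared errors over all training cases
    return sum((t - forward(*x)[2]) ** 2 for x, t in data)

err_before = sse()
for epoch in range(10000):           # one epoch = one pass over all training cases
    for (x1, x2), target in data:
        xs, hs, o = forward(x1, x2)  # forward pass
        # backward pass: delta terms use the derivative of the sigmoid, o*(1-o)
        d_out = (target - o) * o * (1 - o)
        d_hid = [hs[j + 1] * (1 - hs[j + 1]) * d_out * w_out[j + 1] for j in range(2)]
        for j in range(3):
            w_out[j] += lr * d_out * hs[j]
        for j in range(2):
            for i in range(3):
                w_hid[j][i] += lr * d_hid[j] * xs[i]

print(err_before, "->", sse())  # total error decreases over the epochs
```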


Backpropagation applets on the Web

 http://www.cs.ubc.ca/labs/lci/CIspace/Version4/neural/
 http://members.aol.com/Trane64/java/JRec.html

Neural net applications

 pattern classification
  9 of the top 10 US credit card companies use Falcon
   uses neural nets to model customer behavior, identify fraud
   claims improvement in fraud detection of 30-70%
  Sharp, Mitsubishi, … -- Optical Character Recognition (OCR)

 prediction & financial analysis
  Merrill Lynch, Citibank, … -- financial forecasting, investing
  Spiegel – marketing analysis, targeted catalog sales

 control & optimization
  Texaco – process control of an oil refinery
  Intel – computer chip manufacturing quality control
  AT&T – echo & noise control in phone lines (filters and compensates)
  Ford engines utilize a neural net chip to diagnose misfirings, reduce emissions

 also from the AI video: the ALVINN project at CMU trained a neural net to drive
  backpropagation network: video input, 9 hidden units, 45 outputs

DEMONSTRATION

SOLVING AND, OR and XOR problems using the NN model: multi-layer perceptron
