
Neural Networks Part 1

Lida Aleksanyan
Biological Neurons
Biological neuron vs Artificial neuron
It is believed that neurons are
arranged in a hierarchical
fashion and each layer has its
own role and responsibility.
To detect a face, the brain
could be relying on the entire
network and not on a single
layer.
McCulloch-Pitts Neuron
The first computational
model of a neuron was
proposed by Warren
McCulloch (neuroscientist)
and Walter Pitts (logician)
in 1943.
Example: should I watch a random football game on TV or not?
Inputs are all boolean, i.e., {0, 1}.
The output variable is also boolean: {1: will watch it, 0: won’t watch it}.
● x_1 could be isPremierLeagueOn (I like Premier League more)
● x_2 could be isItAFriendlyGame (I tend to care less about the
friendlies)
● x_3 could be isNotHome (Can’t watch it when I’m outside)

and so on.

Which variable is the most important here?


Formally, this is what is going on:

g(x_1, x_2, ..., x_n) = g(x) = x_1 + x_2 + ... + x_n

y = f(g(x)) = 1 if g(x) >= theta, and 0 otherwise

g(x) just sums the inputs; theta here is called the thresholding parameter.
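As a quick illustration, here is a minimal Python sketch of an M-P neuron (the function name and the example calls are mine, not from the slides; inhibitory inputs are handled separately later):

def mp_neuron(inputs, theta):
    # g(x): aggregate the boolean inputs
    g = sum(inputs)
    # y = f(g(x)): fire iff the sum reaches the threshold
    return 1 if g >= theta else 0

# Football example: x_1 = isPremierLeagueOn, x_2 = isItAFriendlyGame, x_3 = isNotHome
print(mp_neuron([1, 0, 0], theta=1))  # 1: fires
print(mp_neuron([0, 0, 0], theta=1))  # 0: does not fire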
Boolean Functions Using M-P Neuron

This representation just denotes that, for the boolean inputs x_1, x_2 and x_3, if g(x), i.e. the sum, is >= theta, the neuron will fire; otherwise it won't.
Boolean Functions Using M-P Neuron

An AND function neuron would only fire when ALL the inputs are ON, i.e., g(x) >= 3 here. When will an OR function neuron fire?

When will a NOR function neuron fire? When will a NOT function neuron fire?

Why did we set the threshold to 1 here, for the x_1 AND !x_2 function?
● Whenever x_2 is 1, the output will be 0 (x_2 acts as an inhibitory input).
● x_1 AND !x_2 would output 1 only when x_1 is 1 and x_2 is 0.

So the threshold parameter should be 1.

Let's verify that: g(x), i.e., x_1 + x_2, would be >= 1 in only 3 cases:

Case 1: when x_1 is 1 and x_2 is 0
Case 2: when x_1 is 1 and x_2 is 1
Case 3: when x_1 is 0 and x_2 is 1

But in both Case 2 and Case 3 the output will be 0, because x_2 is 1 in both of them.

x_1 AND !x_2 outputs 1 for Case 1, so our thresholding parameter holds good for the given function.
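A small sketch verifying this, assuming (as in the M-P model) that x_2 is an inhibitory input that forces the output to 0:

# x_1 AND !x_2 with theta = 1; x_2 is inhibitory, so x_2 = 1 forces output 0
for x1 in (0, 1):
    for x2 in (0, 1):
        g = x1 + x2
        y = 0 if x2 == 1 else (1 if g >= 1 else 0)
        print(x1, x2, "->", y)  # fires only for x1 = 1, x2 = 0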
Geometric Interpretation Of M-P Neuron

Decision boundary

The inputs are boolean, so only 4 combinations are possible: (0,0), (0,1), (1,0) and (1,1).
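For example, an OR neuron with theta = 1 has the decision boundary x_1 + x_2 = 1; a quick check of the four combinations (an illustrative snippet, not from the slides):

# The line x1 + x2 = 1 separates (0,0) from the other three points,
# which is exactly the OR function.
for p in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    fires = sum(p) >= 1
    print(p, "-> fires" if fires else "-> does not fire")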
Geometric Interpretation Of M-P Neuron
What if we have more than 2 inputs?
All possible boolean functions for 2 inputs:
Can any boolean function be represented using a McCulloch-Pitts unit?
XOR function

There is no appropriate
decision boundary
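We can confirm this exhaustively: no single threshold theta makes an M-P unit reproduce XOR on all four inputs, which is what the missing decision boundary means here (a brute-force sketch I added for illustration):

# Brute-force check: no threshold theta makes an M-P unit compute XOR
xor = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
for theta in range(0, 4):
    ok = all((1 if x1 + x2 >= theta else 0) == y
             for (x1, x2), y in xor.items())
    print("theta =", theta, "->", "works" if ok else "fails")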
What about functions which are not linearly separable?
● Frank Rosenblatt, an American
psychologist, proposed the
classical perceptron model
(1958).
● A more general computational model than McCulloch-Pitts neurons.
● Main differences: introduction of numerical weights for inputs and a mechanism for learning these weights.
● Inputs are no longer limited to
boolean values.
Then what is the difference?

● The weights (including the threshold) can be learned.
● The inputs can be real-valued.
The perceptron fires when w0 + w1*x1 + w2*x2 >= 0, i.e., w0 = −theta plays the role of the threshold (with a constant input x0 = 1).

Let us fix the threshold (−w0 = 1) and try different values of w1, w2.

Say, w1 = −1, w2 = −1.

What is wrong with this line? We make an error on 1 out of the 4 inputs.
We are interested in finding an algorithm which
finds the values of w1, w2 which minimize the
error.
Perceptron Learning Algorithm
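The algorithm (shown as pseudocode on the slide) cycles over the data and nudges w by ±x on every mistake. A minimal NumPy sketch under the usual convention that each input is augmented with a constant x_0 = 1 so that w_0 acts as the learnable threshold (the function name and the OR toy data are illustrative):

import numpy as np

def perceptron_learn(P, N, max_iter=100):
    """P: positive inputs, N: negative inputs; each row starts with x_0 = 1."""
    w = np.zeros(P.shape[1])
    for _ in range(max_iter):
        converged = True
        for x in P:                # want w.T x >= 0
            if w @ x < 0:
                w = w + x          # rotate w towards x
                converged = False
        for x in N:                # want w.T x < 0
            if w @ x >= 0:
                w = w - x          # rotate w away from x
                converged = False
        if converged:
            break
    return w

# Toy example: learn OR (first column of each row is x_0 = 1)
P = np.array([[1, 0, 1], [1, 1, 0], [1, 1, 1]])  # should fire
N = np.array([[1, 0, 0]])                        # should not fire
print(perceptron_learn(P, N))                    # e.g. [-1.  1.  1.]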
Why would this work?

We are interested in finding the line wTx = 0 which divides the input space into two halves.

Every point (x) on this line satisfies the equation wTx = 0.

What can you tell about the angle (α) between w and any point (x) which lies on this line?
Why would this work?

The angle is 90°, as:

cos α = wTx / (||w|| * ||x||) = 0

Since the vector w is perpendicular to every point on the line, it is actually perpendicular to the line itself.
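A quick numeric check (the values of w and x are made up for the example):

import numpy as np

w = np.array([1.0, 1.0])
x = np.array([2.0, -2.0])       # lies on the line w.T x = 0
cos_alpha = (w @ x) / (np.linalg.norm(w) * np.linalg.norm(x))
print(cos_alpha)                # 0.0, i.e. alpha = 90 degrees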
Why would this work?

Consider some points (vectors) which lie in the positive half space of this line (i.e., wTx >= 0). What will be the angle between any such vector and w? Since cos α ∝ wTx >= 0, the angle is at most 90°.

What about points (vectors) which lie in the negative half space of this line (i.e., wTx < 0)? There cos α < 0, so the angle is greater than 90°.
Let's look at the perceptron algorithm again.

For x ∈ P, if wTx < 0, then the angle (α) between this x and the current w is greater than 90° (but we want α to be less than 90°). What happens to the new angle (α_new) when w_new = w + x?

cos(α_new) ∝ w_newT x = (w + x)T x = wTx + xTx > wTx ∝ cos α

Thus cos(α_new) > cos α, and α_new will be less than α.

For x ∈ N, if wTx >= 0, then the angle (α) between this x and the current w is less than 90° (but we want α to be greater than 90°). What happens to the new angle (α_new) when w_new = w − x?

cos(α_new) ∝ w_newT x = (w − x)T x = wTx − xTx < wTx ∝ cos α

Thus α_new will be greater than α.
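A small numeric illustration of the first case (the vectors are made up for the example):

import numpy as np

def angle_deg(w, x):
    cos_a = (w @ x) / (np.linalg.norm(w) * np.linalg.norm(x))
    return np.degrees(np.arccos(cos_a))

w = np.array([-1.0, 0.5])       # current weights
x = np.array([1.0, 1.0])        # x in P but w.T x < 0: misclassified
print(angle_deg(w, x))          # > 90 degrees
print(angle_deg(w + x, x))      # smaller: w has rotated towards x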


Proof of Convergence
Summary
What about non-boolean (say, real) inputs? Real-valued inputs are allowed in the perceptron.

Do we always need to hand-code the threshold? No, we can learn the threshold.

What about functions which are not linearly separable? Not possible with a single perceptron, but we will see how to handle this.
● Most real-world data is not linearly separable and will always contain some outliers.
● In fact, sometimes there may not be any outliers, but the data may still not be linearly separable.

While a single perceptron cannot deal with such data, we will show that a network of perceptrons can indeed deal with such data.
Example: XOR function

Solution

XOR(x1, x2) = AND(NOT(AND(x1, x2)), OR(x1, x2)) = AND(NAND(x1, x2), OR(x1, x2))

Note: b1 here is a vector of biases, not a single number, since the hidden layer has 2 perceptrons (a different w0 for each neuron).

X1  X2  |  OR  NAND  AND
0   0   |  0   1     0
1   0   |  1   1     0
0   1   |  1   1     0
1   1   |  1   0     1
NAND unit: W1 = -0.6, W2 = -0.6, W0 = 1

X1  X2  |  NAND  |  Check
0   0   |  1     |  W0 + 0*W1 + 0*W2 >= 0
1   0   |  1     |  W0 + 1*W1 + 0*W2 >= 0
0   1   |  1     |  W0 + 0*W1 + 1*W2 >= 0
1   1   |  0     |  W0 + 1*W1 + 1*W2 < 0

Output unit (computes XOR from OR and NAND): W1 = 0.6, W2 = 0.6, W0 = -1

OR  NAND  |  XOR  |  Check
0   1     |  0    |  W0 + 0*W1 + 1*W2 < 0
1   1     |  1    |  W0 + 1*W1 + 1*W2 >= 0
1   1     |  1    |  W0 + 1*W1 + 1*W2 >= 0
1   0     |  0    |  W0 + 1*W1 + 0*W2 < 0
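Putting the three units together in Python (the NAND and output-layer weights are the ones from the tables above; the OR weights are one valid choice of mine, since the slides do not list them):

def perceptron(x1, x2, w0, w1, w2):
    return 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0

def xor(x1, x2):
    nand = perceptron(x1, x2, w0=1.0, w1=-0.6, w2=-0.6)  # NAND table weights
    or_ = perceptron(x1, x2, w0=-0.5, w1=1.0, w2=1.0)    # one valid OR choice
    # Output unit computes AND(OR, NAND) with the last table's weights
    return perceptron(or_, nand, w0=-1.0, w1=0.6, w2=0.6)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor(x1, x2))  # 0 0->0, 0 1->1, 1 0->1, 1 1->0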
