Dendrites: Input
Cell body: Processor
Synapse: Link
Axon: Output
How do our brains work?
▪ A processing element
The axon endings almost touch the dendrites or cell body of the
next neuron.
How do our brains work?
▪ A processing element
Neurotransmitters are chemicals released from the first neuron that bind to the second.
How do our brains work?
▪ A processing element
Processing: ∑ = x1 + x2 + … + xm = y
Output: y
How do ANNs work?
Not all inputs are equal: each input x1, x2, …, xm carries its own weight w1, w2, …, wm.
Processing: ∑ = x1w1 + x2w2 + … + xmwm = y
Output: y
How do ANNs work?
The signal is not passed down to the next neuron verbatim: the weighted inputs are summed and the result is passed through a transfer function.
Processing: vk = x1w1 + x2w2 + … + xmwm
Transfer function (activation function): f(vk)
Output: y = f(vk)
The output is a function of the inputs, shaped by both the weights and the transfer function.
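As a minimal sketch of this model (the function names are illustrative, and a simple step function stands in for the transfer function):

# A minimal artificial neuron: a weighted sum of the inputs
# passed through a transfer (activation) function.
def neuron(inputs, weights, transfer):
    vk = sum(x * w for x, w in zip(inputs, weights))  # vk = x1*w1 + ... + xm*wm
    return transfer(vk)                               # y = f(vk)

def step(vk):                                         # threshold at 0
    return 1 if vk > 0 else 0

print(neuron([1, 0, 1], [0.5, -0.2, 0.8], step))      # vk = 1.3 > 0, so y = 1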
Artificial Neural Networks
An ANN can:
1. compute any computable function, by appropriate selection of the network topology and weight values.
2. learn from experience!
▪ Specifically, by trial‐and‐error
From Biological Neuron to
Artificial Neuron
output = 1 if ∑i wixi > t, and 0 otherwise
NN - Bias
• Bias is a constant input that helps the model fit the given data as well as possible.
• In other words, the bias gives the neuron the freedom to shift its threshold so that it can perform best.
AND with a biased input
output = 1 if ∑i wixi > t, and 0 otherwise
Network: a bias input fixed at -1 with weight W1 = 1.5, and inputs X and Y with weights W2 = 1 and W3 = 1; threshold t = 0.0.
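A quick check of this unit (a sketch; the function name is an illustrative choice):

# The biased AND unit: inputs (-1, X, Y), weights (1.5, 1, 1), threshold t = 0.
def and_unit(x, y, t=0.0):
    s = (-1) * 1.5 + x * 1 + y * 1   # the bias input is fixed at -1
    return 1 if s > t else 0

for x in (0, 1):
    for y in (0, 1):
        print(x, y, '->', and_unit(x, y))
# prints 0 0 -> 0, 0 1 -> 0, 1 0 -> 0, 1 1 -> 1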
Activation functions
• Transform a neuron’s input into an output.
• Features of activation functions:
• A squashing effect is required: it prevents the accelerating growth of activation levels through the network.
• Simple and easy to calculate
Standard activation functions
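Common standard choices include the threshold (step), logistic sigmoid, and tanh functions; a minimal sketch, assuming NumPy:

import numpy as np

def step(v):                     # hard threshold: squashes to {0, 1}
    return (v > 0).astype(float)

def sigmoid(v):                  # logistic: squashes smoothly into (0, 1)
    return 1.0 / (1.0 + np.exp(-v))

def tanh(v):                     # hyperbolic tangent: squashes into (-1, 1)
    return np.tanh(v)

v = np.array([-2.0, 0.0, 2.0])
print(step(v), sigmoid(v), tanh(v))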
• Number of layers
• Number of neurons
• too many neurons require more training time
• Learning rate
• from experience, the value should be small, ~0.1
• Momentum term
• …
Over-fitting
Training time
Learning algorithm
While epoch produces an error
    Present network with next inputs from epoch
    Error = T - O
    If Error <> 0 then
        Wj = Wj + LR * Ij * Error
    End If
End While
Learning algorithm
Epoch: one presentation of the entire training set to the neural network. In the case of the AND function, an epoch consists of four sets of inputs being presented to the network (i.e. [0,0], [0,1], [1,0], [1,1]).
Error: the amount by which the value output by the network differs from the target value. For example, if we required the network to output 0 and it output 1, then Error = 0 - 1 = -1.
Learning algorithm
Target Value, T: when training a network we present it not only with the input but also with the value we require it to produce. For example, if we present the network with [1,1] for the AND function, the target value will be 1.
Output, O: the output value from the neuron.
Ij: the inputs being presented to the neuron.
Wj: the weight from input neuron Ij to the output neuron.
LR: the learning rate, which dictates how quickly the network converges. It is set by experimentation, typically to 0.1.
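Putting the rule and these definitions together, a direct Python transcription might look as follows (a sketch: the data layout and stopping test are illustrative, and the initial weights match the AND example below):

# Perceptron learning rule for the AND function.
# Each input vector starts with the bias input, fixed at -1.
LR = 0.1                                     # learning rate, typically 0.1
W = [0.3, 0.5, -0.4]                         # initial weights
epoch = [([-1, 0, 0], 0), ([-1, 0, 1], 0),
         ([-1, 1, 0], 0), ([-1, 1, 1], 1)]   # pairs of (inputs I, target T)

def output(I, W, t=0.0):
    # Fire if the weighted sum of the inputs exceeds the threshold t.
    return 1 if sum(i * w for i, w in zip(I, W)) > t else 0

error_in_epoch = True
while error_in_epoch:                        # While epoch produces an error
    error_in_epoch = False
    for I, T in epoch:                       # Present network with next inputs
        error = T - output(I, W)             # Error = T - O
        if error != 0:
            error_in_epoch = True
            for j in range(len(W)):          # Wj = Wj + LR * Ij * Error
                W[j] += LR * I[j] * error

print(W)  # converges to weights that realise AND, e.g. [0.4, 0.4, 0.1]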
Training Perceptrons
For AND
Network: bias input -1 with weight W1 = ?, input x with weight W2 = ?, input y with weight W3 = ?; threshold t = 0.0.
A B | Output
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1
Training Perceptrons
For AND
Network: bias input -1 with weight W1 = 0.3, input x with weight W2 = 0.5, input y with weight W3 = -0.4; threshold t = 0.0.
A B | Output
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1
I1 I2 I3 | Summation                             | Output
-1  0  0 | (-1*0.3) + (0*0.5) + (0*-0.4) = -0.3  | 0
-1  0  1 | (-1*0.3) + (0*0.5) + (1*-0.4) = -0.7  | 0
-1  1  0 | (-1*0.3) + (1*0.5) + (0*-0.4) =  0.2  | 1
-1  1  1 | (-1*0.3) + (1*0.5) + (1*-0.4) = -0.2  | 0
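The summation column can be reproduced with a short sketch:

W = [0.3, 0.5, -0.4]
for I in [(-1, 0, 0), (-1, 0, 1), (-1, 1, 0), (-1, 1, 1)]:
    s = sum(i * w for i, w in zip(I, W))
    print(I, 'summation = %.1f' % s, 'output =', 1 if s > 0 else 0)

The last two rows disagree with the AND targets (0 and 1 respectively), so the weights still need adjusting.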
Learning in Neural Networks
Learn the values of the weights from I/O pairs
Start with random weights
Load a training example’s input
Observe the computed output
Modify the weights to reduce the difference
Iterate over all training examples
Terminate when the weights stop changing OR when the error is very small
Decision boundaries
Decision Surface of a Perceptron
(Figure: two scatter plots of + and - points in the (x1, x2) plane. Left, linearly separable: the two classes are divided by a single straight line. Right, non-linearly separable: the + and - points alternate so that no single straight line divides them.)
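The distinction can be made concrete with a brute-force sketch (the weight grid and helper function are illustrative assumptions): a single threshold unit can realise AND, but no choice of weights and threshold realises XOR:

import itertools

def separable(targets):
    # Search a coarse grid of weights and thresholds for a single
    # threshold unit that reproduces the target truth table.
    grid = [x / 4 for x in range(-8, 9)]
    for w1, w2, t in itertools.product(grid, repeat=3):
        if all((1 if w1 * a + w2 * b > t else 0) == target
               for (a, b), target in zip([(0, 0), (0, 1), (1, 0), (1, 1)], targets)):
            return True
    return False

print(separable([0, 0, 0, 1]))  # AND -> True (a separating line exists)
print(separable([0, 1, 1, 0]))  # XOR -> False (no single line works)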
Linear Separability
(Figure: class A and class B points in the (X1, X2) plane, separated by a linear decision boundary.)
Rugby players & Ballet dancers
(Figure: weight in kg (50 to 120) on the horizontal axis against height in m (1 to 2) on the vertical axis; rugby players and ballet dancers form two clusters that a straight decision boundary can separate.)
Hyperplane partitions
Different Non-Linearly
Separable Problems
Structure | Types of Decision Regions | Exclusive-OR Problem | Classes with Meshed Regions | Most General Region Shapes
Three-Layer | Arbitrary (complexity limited by the number of nodes) | (figure: A/B regions) | (figure: meshed A and B regions) | (figure: arbitrary region shapes)
Multilayer Perceptron (MLP)
(Figure: an MLP whose input layer is connected through adjustable weights to an output layer that produces the output values.)
Solving the XOR Problem
Network topology: 2 hidden nodes (o1, o2) and 1 output (y). Input x1 feeds the hidden nodes through w11 and w12, input x2 through w21 and w22, and the hidden outputs feed y through w13 and w23; each node also receives a bias input fixed at -1, weighted by w01, w02 and w03 respectively.
Desired behavior:
x1 x2 | o1 o2 | y
 0  0 |  0  0 | 0
 1  0 |  0  1 | 1
 0  1 |  0  1 | 1
 1  1 |  1  1 | 0
Weights:
w11 = w12 = 1
w21 = w22 = 1
w01 = 3/2; w02 = 1/2; w03 = 1/2
w13 = -1; w23 = 1
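A short sketch verifying that these weights produce the desired behavior (the function names are illustrative):

# o1 acts as an AND unit, o2 as an OR unit, and y fires when o2 is on but o1 is off.
def step(v):
    return 1 if v > 0 else 0

def xor_net(x1, x2):
    o1 = step(1 * x1 + 1 * x2 - 1.5)    # w11 = w21 = 1, bias weight w01 = 3/2
    o2 = step(1 * x1 + 1 * x2 - 0.5)    # w12 = w22 = 1, bias weight w02 = 1/2
    y = step(-1 * o1 + 1 * o2 - 0.5)    # w13 = -1, w23 = 1, bias weight w03 = 1/2
    return o1, o2, y

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))  # matches the desired-behavior table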
How does it work?
Set the initial values of the weights randomly.
Input: truth table of the XOR
Do
▪ Read an input pair (e.g. 0 and 0)
▪ Compute the output (e.g. 0.60543)
▪ Compare it to the expected output (difference = 0.60543)
▪ Modify the weights accordingly.
Loop until a condition is met
▪ Condition: certain number of iterations
▪ Condition: error threshold
Design Issues
Initial weights (small random values ∈[‐1,1])
Transfer function (how the inputs and the weights are combined to produce the output)
Error estimation
Weights adjusting
Number of neurons
Data representation
Size of training set
Transfer Functions
Linear: The output is proportional to the total
weighted input.
Threshold: The output is set at one of two values,
depending on whether the total weighted input is
greater than or less than some threshold value.
Non‐linear: The output varies continuously but not
linearly as the input changes.
Error Estimation
The root mean square error (RMSE) is a frequently used measure of the difference between the values predicted by a model or an estimator and the values actually observed from the thing being modeled or estimated.
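A minimal sketch of the computation (the sample values are illustrative):

import math

def rmse(predicted, observed):
    # Root of the mean squared difference between predictions and observations.
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed))
                     / len(predicted))

print(rmse([0.9, 0.2, 0.8], [1, 0, 1]))  # about 0.17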
Weights Adjusting
After each iteration, the weights should be adjusted to minimize the error, either by
– exhaustive search over all possible weights, or
– back-propagation
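A minimal back-propagation sketch for a one-hidden-layer sigmoid network trained on XOR, assuming NumPy (the layer sizes, learning rate, and iteration count are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # XOR inputs
T = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

def add_bias(A):                    # append the fixed -1 bias input
    return np.hstack([A, -np.ones((A.shape[0], 1))])

W1 = rng.uniform(-1, 1, (3, 4))     # (2 inputs + bias) -> 4 hidden units
W2 = rng.uniform(-1, 1, (5, 1))     # (4 hidden + bias) -> 1 output unit
lr = 0.5

def sig(v):
    return 1.0 / (1.0 + np.exp(-v))

for _ in range(20000):
    Xb = add_bias(X)
    H = sig(Xb @ W1)                        # forward pass: hidden layer
    Hb = add_bias(H)
    Y = sig(Hb @ W2)                        # forward pass: output layer
    dY = (Y - T) * Y * (1 - Y)              # output delta (squared-error gradient)
    dH = (dY @ W2[:-1].T) * H * (1 - H)     # delta propagated back to the hidden layer
    W2 -= lr * Hb.T @ dY                    # adjust weights to reduce the error
    W1 -= lr * Xb.T @ dH

print(Y.round(2).ravel())  # approaches [0, 1, 1, 0]; exact values depend on the random start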
Architecture
Feedforward Network
Feedforward networks often have one or more hidden layers of sigmoid neurons followed
by an output layer of linear neurons.
Multiple layers of neurons with nonlinear transfer functions allow the network to learn
nonlinear and linear relationships between input and output vectors.
The linear output layer lets the network produce values outside the range -1 to +1. On the
other hand, if you want to constrain the outputs of a network (such as between 0 and 1),
then the output layer should use a sigmoid transfer function (such as logsig).
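A minimal forward-pass sketch of this architecture (the weights are illustrative): the sigmoid hidden layer squashes its activations into (0, 1), while the linear output layer can produce values outside the range -1 to +1:

import numpy as np

def logsig(v):
    return 1.0 / (1.0 + np.exp(-v))

def feedforward(x, W_hidden, W_out):
    h = logsig(W_hidden @ x)   # sigmoid hidden layer: outputs in (0, 1)
    return W_out @ h           # linear output layer: unbounded range

x = np.array([0.5, -1.0])
W_hidden = np.array([[1.0, -2.0], [0.5, 0.5]])
W_out = np.array([[3.0, -4.0]])
print(feedforward(x, W_hidden, W_out))  # about 1.02, outside [-1, 1]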
Difference between Hebb rule and
perceptron learning rule?
• When the network responds correctly, no connection weights are modified in a perceptron, whereas in Hebbian learning the weights are modified for every input.
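A side-by-side sketch of the two update rules for a single weight (the names and values are illustrative):

# Perceptron rule: no change when the response is already correct.
def perceptron_update(wj, lr, ij, target, output):
    return wj + lr * ij * (target - output)

# Hebb rule: the weight changes whenever input and output fire together,
# even when the response is correct.
def hebb_update(wj, lr, ij, output):
    return wj + lr * ij * output

print(perceptron_update(0.5, 0.1, 1, 1, 1))  # 0.5 (correct response: unchanged)
print(hebb_update(0.5, 0.1, 1, 1))           # 0.6 (modified anyway)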
Ch2: Adaline and Madaline
Adaline: Adaptive Linear Neuron
Madaline: Multiple Adaline
2.1 Adaline (Bernard Widrow, Stanford Univ.)
Neuron model: y = f(wᵀx)