
SUPERVISED LEARNING NETWORK

Perceptron Networks
• The basic network in supervised learning
• The perceptron network consists of three units
– Sensory unit (input unit)
– Associator unit (hidden unit)
– Response unit (output unit)
Perceptron Networks
• The input unit is connected to the hidden units with fixed weights
(1, 0 or -1) assigned at random
• A binary activation function is used in the input and hidden units
• The output unit uses a (1, 0, -1) activation. The binary step function with a fixed
threshold θ is used as the activation.
• The output of the perceptron is y = f(y_in), where
f(y_in) = 1,   if y_in > θ
          0,   if -θ ≤ y_in ≤ θ
          -1,  if y_in < -θ
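As a small illustration, the threshold activation above can be written directly in Python. This is a minimal sketch; the name step_activation and the default θ = 0 are illustrative assumptions, not part of the slides.

```python
def step_activation(y_in, theta=0.0):
    """Perceptron output: 1 above theta, -1 below -theta, 0 in between."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0
```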
Perceptron Networks
• Weight updation takes place between the hidden and output unit
• The net checks for the error between the target and the calculated output
• Error = target - calculated output
• Weights are adjusted only in case of error:
w_i(new) = w_i(old) + α·t·x_i
b(new) = b(old) + α·t
• α is the learning rate and t is the target, which is -1 or 1.
• If there is no error, there is no weight change and training is stopped
Single classification perceptron
network
Perceptron Training Algorithm for
Single Output Classes
Step 0: Initialize the weights, bias and learning rate α (0 < α ≤ 1)
Step 1: Perform Steps 2-6 until the final stopping condition is false
Step 2: Perform Steps 3-5 for each bipolar or binary training pair indicated by s:t
Step 3: The input layer is applied with the identity activation function:
x_i = s_i
Step 4: Calculate the output response of each output unit, j = 1 to m.
First, the net input is calculated:
y_in = b + Σ_{i=1}^{n} x_i·w_i
Then the activation function is applied over the net input to calculate the output response:
y = f(y_in)
Perceptron Training Algorithm for
Single Output Classes
Step 5: Make adjustments in the weights and bias for j = 1 to m and i = 1 to n.
If y ≠ t:
w_i(new) = w_i(old) + α·t·x_i
b(new) = b(old) + α·t
else the weights and bias remain unchanged.
Step 6: Test for the stopping condition. If there is no change in the
weights, then stop the training process; else start again from Step 2.
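A minimal Python sketch of Steps 0-6 above, assuming bipolar inputs and targets, zero initial weights, and the step_activation function from the earlier sketch; the function name, the max_epochs cap and the defaults are illustrative assumptions.

```python
def train_perceptron(samples, alpha=1.0, theta=0.0, max_epochs=100):
    """samples: list of (input_vector, target) pairs with bipolar targets."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0                                    # Step 0
    for _ in range(max_epochs):                              # Step 1
        changed = False
        for x, t in samples:                                 # Steps 2-3
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))  # Step 4
            y = step_activation(y_in, theta)
            if y != t:                                       # Step 5
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b = b + alpha * t
                changed = True
        if not changed:                                      # Step 6
            break
    return w, b
```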
Flow chart for perceptron Network
with single output
Example
Implement AND function using perceptron
networks for bipolar inputs and target.

x1 X2 t
1 1 1
1 -1 -1
-1 1 -1
-1 -1 -1
• The perceptron network, which uses perceptron learning
rule, is used to train the AND function.
• The network architecture is as shown in Figure.
• The input patterns are presented to the network one by one.
When all the four input patterns are presented, then one
epoch is said to be completed.
• The initial weights and threshold are set to zero
w1 = w2 = b = 0 and θ = 0. The learning rate α is set equal to 1.
• First input, (1, 1, 1). Calculate the net input y_in:
y_in = b + x1·w1 + x2·w2 = 0 + 1·0 + 1·0 = 0
• The output y is computed by applying the activation function over the
net input calculated:
y = f(y_in)
Since θ = 0, when y_in = 0, y = 0.
Check whether t = y. Here t = 1 and y = 0,
so t ≠ y; hence weight updation takes place:
w1(new) = w1(old) + α·t·x1 = 0 + 1·1·1 = 1
w2(new) = w2(old) + α·t·x2 = 0 + 1·1·1 = 1
b(new) = b(old) + α·t = 0 + 1·1 = 1
The weights w1 = 1, w2 = 1, b = 1 are the final weights after the first
input pattern is presented.
The same process is repeated for all the input patterns.
The process can be stopped when all the targets become equal
to the calculated output or when a separating line is obtained
using the final weights for separating the positive responses
from negative responses.
Input          Target   Net input   Output   Weight changes    Weights
x1   x2   1    (t)      y_in        y        Δw1   Δw2   Δb    w1   w2   b
                                                                0    0    0
 1    1   1     1        0           0        1     1    1      1    1    1
 1   -1   1    -1
-1    1   1    -1
-1   -1   1    -1
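A hedged usage example: applying the train_perceptron sketch above to the bipolar AND data completes the remaining rows of the table; with α = 1 and θ = 0 it should converge during the second epoch to w1 = w2 = 1 and b = -1.

```python
and_samples = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
weights, bias = train_perceptron(and_samples, alpha=1.0, theta=0.0)
print(weights, bias)   # expected roughly [1.0, 1.0] and -1.0
```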
EXAMPLE
• Find the weights using perceptron network for the
given truth table. when all the inputs are presented
only one time. Use bipolar inputs and targets.

x1 x2 t
1 1 -1
1 -1 1
-1 1 -1
-1 -1 -1
NETWORK STRUCTURE
[Figure: single-layer perceptron — input units X1, X2 receive the inputs x1, x2 and are connected to the output unit Y through the weights w1, w2; a bias input 1 with weight b also feeds Y, which produces the output y.]
COMPUTATIONS
• Let us take w1 = w2 = 0, b = 0, α = 1 and θ = 0
First input: (1, 1, -1)
y_in = b + x1·w1 + x2·w2 = 0 + 1·0 + 1·0 = 0
• The output using the activation function is
y = f(y_in) = 1,   if y_in > 0
              0,   if y_in = 0
              -1,  if y_in < 0
COMPUTATIONS
• So, the output (y = 0) ≠ (t = -1)
• So weight updation is necessary
w1(new) = w1(old) + α·t·x1 = 0 + 1·(-1)·1 = -1
w2(new) = w2(old) + α·t·x2 = 0 + 1·(-1)·1 = -1
b(new) = b(old) + α·t = 0 + 1·(-1) = -1
• The new weights are (w1, w2, b) = (-1, -1, -1)
COMPUTATIONS
• Second input: (1,-1,1)
y_in = b + x1·w1 + x2·w2 = -1 + 1·(-1) + (-1)·(-1) = -1
(y = f(y_in) = -1) ≠ (t = 1)
• So, new weights are to be computed
w1(new) = w1(old) + α·t·x1 = -1 + 1·1·1 = 0
w2(new) = w2(old) + α·t·x2 = -1 + 1·1·(-1) = -2
b(new) = b(old) + α·t = -1 + 1·1 = 0
• The new weights are (w1, w2, b) = (0, -2, 0)


COMPUTATIONS
• Third input: (-1,1,-1)
y_in = b + x1·w1 + x2·w2 = 0 + (-1)·0 + 1·(-2) = -2
(y = f(y_in) = -1) = (t = -1)
• Weight updation is not necessary
• So, new weights are not to be computed
• Fourth input: (-1, -1, -1)
y_in = b + x1·w1 + x2·w2 = 0 + (-1)·0 + (-1)·(-2) = 2
(y = f(y_in) = 1) ≠ (t = -1)
• So, new weights are to be computed


COMPUTATIONS
w1(new) = w1(old) + α·t·x1 = 0 + 1·(-1)·(-1) = 1
w2(new) = w2(old) + α·t·x2 = -2 + 1·(-1)·(-1) = -1
b(new) = b(old) + α·t = 0 + 1·(-1) = -1
• The new weights are (w1, w2, b) = (1, -1, -1)


FINAL ANALYSIS
Input          Target   Net input   Output   Weights
x1   x2   b    (t)      y_in        y        w1   w2   b
 1    1   1    -1        0           0       -1   -1   -1
 1   -1   1     1       -1          -1        0   -2    0
-1    1   1    -1       -2          -1        0   -2    0
-1   -1   1    -1        2           1        1   -1   -1
EXAMPLE

Find the weights required to perform classification
using a perceptron network. The vectors (1, 1, 1, 1) and
(-1, 1, -1, -1) belong to the class (so they have target
value 1); the vectors (1, 1, 1, -1) and (1, -1, -1, 1) do not
belong to the class (so they have target value -1).
Assume a learning rate of 1, initial weights of 0 and θ = 0.2.
INITIAL TABLE
• The truth table is given by

x1 x2 x3 x4 b t

1 1 1 1 1 1

-1 1 -1 -1 1 1

1 1 1 -1 1 -1

1 -1 -1 1 1 -1
COMPUTATIONS
• Here we take w1 = w2 = w3 = w4 = 0, b = 0 and θ = 0.2. Also, α = 1
• The activation function is given by
y = 1,   if y_in > 0.2
    0,   if -0.2 ≤ y_in ≤ 0.2
    -1,  if y_in < -0.2
• The net input is given by
y_in = b + x1·w1 + x2·w2 + x3·w3 + x4·w4
• The next table reflects the training performed with weights
computed
COMPUTATIONS
EPOCH-1
Input                    Target   Net input   Output   Weights
x1   x2   x3   x4   b    (t)      y_in        y        w1   w2   w3   w4   b
 1    1    1    1   1     1        0           0        1    1    1    1   1
-1    1   -1   -1   1     1       -1          -1        0    2    0    0   2
 1    1    1   -1   1    -1        4           1       -1    1   -1    1   1
 1   -1   -1    1   1    -1        1           1       -2    2    0    0   0
COMPUTATIONS
EPOCH-2
Input                    Target   Net input   Output   Weights
x1   x2   x3   x4   b    (t)      y_in        y        w1   w2   w3   w4   b
 1    1    1    1   1     1        0           0       -1    3    1    1   1
-1    1   -1   -1   1     1        4           1       -1    3    1    1   1
 1    1    1   -1   1    -1        5           1       -2    2    0    2   0
 1   -1   -1    1   1    -1       -2          -1       -2    2    0    2   0
COMPUTATIONS
EPOCH-3
Input                    Target   Net input   Output   Weights
x1   x2   x3   x4   b    (t)      y_in        y        w1   w2   w3   w4   b
 1    1    1    1   1     1        2           1       -2    2    0    2   0
-1    1   -1   -1   1     1        6           1       -2    2    0    2   0
 1    1    1   -1   1    -1       -2          -1       -2    2    0    2   0
 1   -1   -1    1   1    -1       -2          -1       -2    2    0    2   0
Here the target outputs are equal to the actual outputs. So, we stop.
THE FINAL NET
[Figure: final perceptron net — input units X1, X2, X3, X4 and the bias input 1 connect to the output unit Y with weights w1 = -2, w2 = 2, w3 = 0, w4 = 2 and bias b = 0.]
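As a hedged check, the final weights above should classify all four training vectors correctly with θ = 0.2. This reuses the step_activation sketch from earlier; the variable names are illustrative.

```python
w, b, theta = [-2, 2, 0, 2], 0, 0.2
data = [([1, 1, 1, 1], 1), ([-1, 1, -1, -1], 1),
        ([1, 1, 1, -1], -1), ([1, -1, -1, 1], -1)]
for x, t in data:
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    print(x, t, step_activation(y_in, theta))   # predicted output matches t
```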
ADALINE Networks
Adaptive Linear Neuron (Adaline)
• A network with a single linear unit is called an ADALINE (ADAptive
LINear Neuron)
• Input-output relationship is linear
• It uses bipolar activation for its input signals and its target output
• The weights between the input and the output are adjustable, and the network has only
one output unit
• It is trained using the delta rule (least mean square rule, or Widrow-Hoff rule)
Delta Rule
• Delta rule for Single output unit
– Minimize the error over all training patterns.
– Done by reducing the error for each pattern one at a time
• The delta rule for adjusting the weight of the ith pattern (i = 1 to n) is
Δw_i = α·(t - y_in)·x_i
• The delta rule in case of several output units, for adjusting the
weight from the ith input unit to the jth output unit, is
Δw_ij = α·(t_j - y_inj)·x_i
Adaline Model
[Figure: Adaline model — the bias input x0 = 1 and the inputs x1 … xn (units X1 … Xn) feed a summing unit through the weights b and w1 … wn, giving the net input y_in = b + Σ x_i·w_i; the output is f(y_in); the error e = t - y_in between the target t and the net input drives the adaptive weight-adjustment algorithm.]
Adaline Training Algorithm
Step 0: Weights and bias are set to some random values other
than zero. Set the learning rate parameter α.

Step 1: Perform Steps 2-6 when the stopping condition is false.

Step 2: Perform Steps 3-5 for each bipolar training pair s:t

Step 3: Set activations for the input units, i = 1 to n: x_i = s_i

Step 4: Calculate the net input to the output unit
y_in = b + Σ_{i=1}^{n} x_i·w_i
Adaline Training Algorithm
Step 5: Update the weights and bias for i = 1 to n
w_i(new) = w_i(old) + α·(t - y_in)·x_i
b(new) = b(old) + α·(t - y_in)

• Step 6: If the highest weight change that occurred during training
is smaller than a specified tolerance, then stop the training;
else continue. (Stopping condition)
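A minimal Python sketch of the Adaline training procedure above (Steps 0-6), assuming bipolar inputs and targets; the initial weights are passed in rather than randomized so that the worked example later can be reproduced, and the function name, tolerance and epoch cap are illustrative assumptions.

```python
def train_adaline(samples, w, b, alpha=0.1, tolerance=1e-3, max_epochs=100):
    """samples: list of (input_vector, target) pairs; w, b: initial weights/bias."""
    n = len(w)
    for _ in range(max_epochs):                               # Step 1
        largest_change = 0.0
        for x, t in samples:                                  # Steps 2-3
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))   # Step 4
            err = t - y_in
            for i in range(n):                                # Step 5: delta rule
                change = alpha * err * x[i]
                w[i] += change
                largest_change = max(largest_change, abs(change))
            b += alpha * err
        if largest_change < tolerance:                        # Step 6
            break
    return w, b
```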
[Flowchart for Adaline training: Start → initialize the weights, bias and α → input the specified tolerance error Es → for each training pair s:t, activate the input units (x_i = s_i), calculate the net input y_in = b + Σ x_i·w_i and update
w_i(new) = w_i(old) + α·(t - y_in)·x_i, b(new) = b(old) + α·(t - y_in) →
calculate the error Ei = Σ(t - y_in)² → if Ei = Es, stop; otherwise repeat with the next epoch.]
Testing Algorithm
• Step 0: Initialize the weights (taken from the training algorithm)
• Step 1: Perform Steps 2-4 for each bipolar input vector x
• Step 2: Set the activations of the input units to x
• Step 3: Calculate the net input
y_in = b + Σ x_i·w_i
• Step 4: Apply the activation function over the net input calculated
Example-7
Implement OR function with bipolar inputs and
target using Adaline network.

x1 x2 1 t
1 1 1 1
1 -1 1 1
-1 1 1 1
-1 -1 1 -1
The initial weights are taken to be w1 = w2 = b = 0.1 and the
learning rate α = 0.1.

For the first input sample, x1 = 1, x2 = 1, t = 1,
calculate the net input as
y_in = b + x1·w1 + x2·w2 = 0.1 + 1·(0.1) + 1·(0.1) = 0.3
Input          Target   Net input   (t - y_in)   Weight changes        Weights               Error
x1   x2   1    (t)      y_in                     Δw1    Δw2    Δb      w1     w2     b       (t - y_in)²
                                                                        0.1    0.1    0.1
 1    1   1     1        0.3          0.7         0.07   0.07   0.07    0.17   0.17   0.17    0.49
 1   -1   1     1
-1    1   1     1
-1   -1   1    -1
Epoch-wise total error value
• Epoch 1 = (0.49 + 0.69 + 0.83 + 1.01) = 3.02
• Epoch 2 = (0.046 + 0.564 + 0.629 + 0.699) = 1.938
• Epoch 3 = (0.007 + 0.487 + 0.515 + 0.541) = 1.550
• Epoch 4 = (0.076 + 0.437 + 0.448 + 0.456) = 1.417
• Epoch 5 = (0.155 + 0.405 + 0.408 + 0.409) = 1.377
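A hedged usage example applying the delta rule to the OR data with w1 = w2 = b = 0.1 and α = 0.1; printing the total squared error per epoch should give values close to the ones listed above (about 3.02, 1.94, 1.55, ...). The loop below is a standalone sketch of the same update used in train_adaline.

```python
samples = [([1, 1], 1), ([1, -1], 1), ([-1, 1], 1), ([-1, -1], -1)]
w, b, alpha = [0.1, 0.1], 0.1, 0.1
for epoch in range(5):
    total = 0.0
    for x, t in samples:
        y_in = b + sum(xi * wi for xi, wi in zip(x, w))
        err = t - y_in
        total += err ** 2                                 # (t - y_in)^2
        w = [wi + alpha * err * xi for wi, xi in zip(w, x)]
        b += alpha * err
    print(epoch + 1, round(total, 3))                     # epoch-wise total error
```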
BACK-PROPAGATION NETWORK
(BPN)
BACK PROPAGATION NETWORK
• Back propagation learning algorithm is one of the most important
developments in neural networks
• This learning algorithm is applied to multilayer feed-forward
networks consisting of processing elements with continuous
differentiable activation functions
• The networks using the back propagation learning algorithm
are also called back propagation networks (BPN)
• Given a set of input-output pairs, this algorithm provides a
procedure for changing the weights in a BPN to classify the given
input patterns correctly.
• The basic concept used for weight updation is the gradient-
descent method as used in simple perceptron network
BACK PROPAGATION NETWORK
• In this method the error is propagated back to the hidden units
• The aim is to train the net to achieve a balance between
the net's ability to respond to the input patterns used for training (memorization)
and its ability to give reasonable responses to input that is similar,
but not identical, to that used in training (generalization)
• It differs from other networks in respect to the process by which
the weights are calculated during the learning period
• When the number of hidden layers is increased the
complexity increases
• The error is usually measured at the output layer
• At the hidden layers there is no information about the errors
• So, other techniques need to be used to calculate the
error at the hidden layers so that the overall error is minimized
ARCHITECTURE OF BACK PROPAGATION
NETWORK
• It is a multilayer, feed-forward neural network
• It has one input layer, one hidden layer and one output layer
• The neurons in the hidden and output layers have biases
• During the back-propagation phase of learning, signals are sent in
the reverse direction
• The output obtained from the net can be
binary {0, 1} or bipolar {-1, 1}
• The activation functions should be monotonically increasing and
differentiable
DIAGRAMMATIC REPRESENTATION OF THE ARCHITECTURE OF BPN
[Figure: input layer units X1 … Xn, hidden layer units Z1 … Zp and output layer units Y1 … Ym; the weights vij connect input unit Xi to hidden unit Zj (bias v0j on each hidden unit), the weights wjk connect hidden unit Zj to output unit Yk (bias w0k on each output unit), and t1 … tm are the target values at the output layer.]
BACK PROPAGATION NETWORK

The training of the BPN is done in three stages

 The feed-forward of the input training pattern

 Calculation and back-propagation of the error

 Updation of weights.
FLOWCHART DESCRIPTION FOR TRAINING PROCESS

• x = input training vector (x1, x2, ..., xn) ['n' in number]
• t = target output vector (t1, t2, ..., tm) ['m' in number]
• α = learning rate parameter
• x_i = ith input unit
• v_0j = bias on the jth hidden unit
• w_0k = bias on the kth output unit
• z_j = jth hidden unit
• Then the net input to the jth hidden unit is given by
(z_j)_in = v_0j + Σ_{i=1}^{n} x_i·v_ij,   j = 1, 2, ..., p
FLOWCHART DESCRIPTION FOR TRAINING PROCESS
CONTD…

• The output from the jth hidden unit is given by
z_j = f((z_j)_in)
• Let y_k be the kth output unit. The net input to it is
(y_k)_in = w_0k + Σ_{j=1}^{p} z_j·w_jk,   k = 1, 2, ..., m
• The output from the kth output unit is
y_k = f((y_k)_in)
ACTIVATION FUNCTIONS USED
• The commonly used activation functions are the binary sigmoidal
and the bipolar sigmoidal functions:
f(x) = 1 / (1 + e^(-x))              (binary sigmoidal)
f(x) = (1 - e^(-x)) / (1 + e^(-x))   (bipolar sigmoidal)
These functions are
• Differentiable
• Monotonically non-decreasing
• So, they are used as the activation functions. Here
• δ_k = error correction weight adjustment for w_jk
• This error is at the output unit y_k
• It is back-propagated to the hidden units which feed
into y_k
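A small sketch of the two sigmoidal activations named above, together with the derivative forms used later in the algorithm (written in terms of f itself). The helper names are assumptions for illustration, not part of the slides.

```python
import math

def binary_sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def binary_sigmoid_prime(x):
    fx = binary_sigmoid(x)
    return fx * (1.0 - fx)                  # f'(x) = f(x)[1 - f(x)]

def bipolar_sigmoid(x):
    return (1.0 - math.exp(-x)) / (1.0 + math.exp(-x))

def bipolar_sigmoid_prime(x):
    fx = bipolar_sigmoid(x)
    return 0.5 * (1.0 + fx) * (1.0 - fx)    # f'(x) = 0.5[1 + f(x)][1 - f(x)]
```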
ERROR CORRECTION
CONTD…
• δ_j is the error correction weight adjustment for v_ij
• This error correction has occurred due to the back
propagation of error to the hidden unit z j
BPN TRAINING ALGORITHM
• STEP 0: Initialize weights and learning rate (some random
small values are taken)
• STEP 1: Perform Steps 2 -9 when stopping condition is false
• STEP 2: Perform steps 3 – 8 for each training pair
Feed-forward phase( phase I)
• STEP 3: Each input unit receives input signal x i , i=1,…n
and sends it to the hidden unit
• STEP 4: Each hidden unit z j , j = 1,…p sums its weighted input
signals to calculate net input:
(z_j)_in = v_0j + Σ_{i=1}^{n} x_i·v_ij
TRAINING ALGORITHM
• Calculate the outputs from the hidden layer by applying the
activation function over (z_j)_in (binary or bipolar sigmoidal
activation function):
z_j = f((z_j)_in)
These signals are sent as input signals to the output units
Step 5:
• For each output unit y_k, k = 1, ..., m, calculate the net input
(y_k)_in = w_0k + Σ_{j=1}^{p} z_j·w_jk
Apply the activation function to compute the output signal
y_k = f((y_k)_in),   k = 1, 2, ..., m
TRAINING ALGORITHM (BACK PROPAGATION OF ERROR)
Back-propagation of error (Phase II)
• STEP 6: Each output unit y_k, k = 1, 2, ..., m receives a target
pattern corresponding to the input training pattern and computes the error
correction term:
δ_k = (t_k - y_k)·f'((y_k)_in)
Note: f'(y_in) = f(y_in)·[1 - f(y_in)]
On the basis of the calculated error correction term, update the
change in weights and bias:
Δw_jk = α·δ_k·z_j
Δw_0k = α·δ_k
• Also, send δ_k to the hidden layer backwards
TRAINING ALGORITHM (BACK PROPAGATION OF ERROR)

• STEP 7: Each hidden unit z_j, j = 1, 2, ..., p sums its delta inputs
from the output units:
(δ_in)_j = Σ_{k=1}^{m} δ_k·w_jk
• The term (δ_in)_j gets multiplied with the derivative of f((z_j)_in)
to calculate the error term:
δ_j = (δ_in)_j · f'((z_j)_in)
Update the changes in weights and bias:
Δv_ij = α·δ_j·x_i
Δv_0j = α·δ_j
BPN TRAINING ALGORITHM

Weights and Bias Updation (Phase III)


Step 8: Each output unit, y_k, k = 1, 2, ..., m updates the bias and
weights as:
w_jk(new) = w_jk(old) + Δw_jk
w_0k(new) = w_0k(old) + Δw_0k
Each hidden unit, z_j, j = 1, ..., p updates its bias and weights as:
v_ij(new) = v_ij(old) + Δv_ij
v_0j(new) = v_0j(old) + Δv_0j
• STEP 9: Check for the stopping condition. The stopping
condition may be a certain number of cycles reached or
the actual output being equal to the target output (that is,
the error is zero)
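Below is a minimal NumPy sketch of one pass of Steps 3-8 for a net with a single hidden layer. It assumes the activation f and its derivative fprime are passed in, with fprime written as a function of the activation's output (as in the note f'(y_in) = f(y_in)[1 - f(y_in)] above); the function and variable names are illustrative, not part of the slides.

```python
import numpy as np

def bpn_train_step(x, t, V, v0, W, w0, f, fprime, alpha=0.25):
    """One feed-forward / back-propagation / update pass for one pattern.
    V: (n, p) input-to-hidden weights, v0: (p,) hidden biases,
    W: (p, m) hidden-to-output weights, w0: (m,) output biases."""
    # Phase I: feed-forward
    z_in = v0 + x @ V                   # net input to hidden units (Step 4)
    z = f(z_in)
    y_in = w0 + z @ W                   # net input to output units (Step 5)
    y = f(y_in)
    # Phase II: back-propagation of error
    delta_k = (t - y) * fprime(y)       # error term at output units (Step 6)
    delta_in = W @ delta_k              # summed delta inputs at hidden units
    delta_j = delta_in * fprime(z)      # error term at hidden units (Step 7)
    # Phase III: weight and bias updation (Step 8)
    W = W + alpha * np.outer(z, delta_k)
    w0 = w0 + alpha * delta_k
    V = V + alpha * np.outer(x, delta_j)
    v0 = v0 + alpha * delta_j
    return V, v0, W, w0, y
```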
TESTING ALGORITHM FOR BPN

• STEP 0: Initialize the weights. (The weights are taken from the
training phase.)
• STEP 1: Perform Steps 2-4 for each input vector
• STEP 2: Set the activations of the input units x_i, i = 1, 2, ..., n
• STEP 3: Calculate the net input to each hidden unit z_j and its output:
(z_j)_in = v_0j + Σ_{i=1}^{n} x_i·v_ij   and   z_j = f((z_j)_in)
• STEP 4: Compute the net input and the output of the output layer:
(y_k)_in = w_0k + Σ_{j=1}^{p} z_j·w_jk,   y_k = f((y_k)_in)

• USE SIGMOIDAL FUNCTIONS AS ACTIVATION FUNCTIONS
EXAMPLE-1
• Using the Back propagation network, find the new weights
for the net shown below. It is presented with the input
pattern [0,1] and the target output 1. Use a learning rate of
0.25 and binary sigmoidal activation function.

[Figure: 2-2-1 network — input units X1 (input 0) and X2 (input 1), hidden units Z1, Z2 and output unit Y; the initial weights and biases are listed below.]
COMPUTATIONS
• The initial weights are:
v11 = 0.6, v21 = -0.1, v01 = 0.3
v12 = -0.3, v22 = 0.4, v02 = 0.5
w1 = 0.4, w2 = 0.1, w0 = -0.2
• The learning rate: α = 0.25
• The activation function is binary sigmoidal, i.e. f(x) = 1 / (1 + e^(-x))
Phase I: The feed-forward of the input training pattern
Calculate the net input for the hidden layer
• For the z1 neuron:
(z1)_in = v01 + v11·x1 + v21·x2 = 0.3 + 0·(0.6) + 1·(-0.1) = 0.2
• For the z2 neuron:
(z2)_in = v02 + v12·x1 + v22·x2 = 0.5 + 0·(-0.3) + 1·(0.4) = 0.9

• Applying the activation function to calculate the
outputs of the hidden layer:
z1 = f((z1)_in) = 1 / (1 + e^(-0.2)) = 0.5498
z2 = f((z2)_in) = 1 / (1 + e^(-0.9)) = 0.7109
Calculate the net input entering the output layer:
• Input:
y_in = w0 + w1·z1 + w2·z2 = -0.2 + 0.5498·(0.4) + 0.7109·(0.1) = 0.09101
• Output:
Applying the activation function to calculate the output,
y = f(y_in) = 1 / (1 + e^(-0.09101)) = 0.5227
Phase II: Calculation and back-propagation of the error
• We use the gradient descent formula: δ_k = (t_k - y_k)·f'((y_k)_in)
• We have
f'(y_in) = f(y_in)·[1 - f(y_in)] = 0.5227·[1 - 0.5227] = 0.2495
• Here k = 1. So,
δ_1 = (1 - 0.5227)·(0.2495) = 0.1191
We next compute the changes in weights between the hidden
and the output layer:
Δw1 = α·δ_1·z1 = 0.25 × 0.1191 × 0.5498 = 0.0164
Δw2 = α·δ_1·z2 = 0.25 × 0.1191 × 0.7109 = 0.02117
Δw0 = α·δ_1 = 0.25 × 0.1191 = 0.02978
COMPUTATIONS CONTD…
Compute the error portion between input and hidden layer

• The general formula is δ_j = (δ_in)_j · f'((z_j)_in)
• Each hidden unit sums its delta inputs from the output units:
(δ_in)_j = Σ_{k=1}^{m} δ_k·w_jk
COMPUTATIONS CONTD…

Here, m = 1 (one output neuron). So, (δ_in)_j = δ_1·w_j1
• So, (δ_in)_1 = δ_1·w_11 = 0.1191 × 0.4 = 0.04764
(δ_in)_2 = δ_1·w_21 = 0.1191 × 0.1 = 0.01191
• Now, the error correction:
f'((z1)_in) = f((z1)_in)·[1 - f((z1)_in)] = 0.5498·[1 - 0.5498] = 0.2475
Hence, δ_1 = (δ_in)_1 · f'((z1)_in) = 0.04764 × 0.2475 = 0.0118
• Again,
f'((z2)_in) = f((z2)_in)·[1 - f((z2)_in)] = 0.7109·[1 - 0.7109] = 0.2055
So, δ_2 = (δ_in)_2 · f'((z2)_in) = 0.01191 × 0.2055 = 0.00245



COMPUTATIONS CONTD…
Now find the changes in weights between input and hidden layer

Δv11 = α·δ_1·x1 = 0.25 × 0.0118 × 0 = 0
Δv21 = α·δ_1·x2 = 0.25 × 0.0118 × 1 = 0.00295
Δv01 = α·δ_1 = 0.25 × 0.0118 = 0.00295
Δv12 = α·δ_2·x1 = 0.25 × 0.00245 × 0 = 0
Δv22 = α·δ_2·x2 = 0.25 × 0.00245 × 1 = 0.0006125
Δv02 = α·δ_2 = 0.25 × 0.00245 = 0.0006125
Phase III: Compute the final weights of the network
v11(new) = v11(old) + Δv11 = 0.6 + 0 = 0.6
v12(new) = v12(old) + Δv12 = -0.3 + 0 = -0.3
v21(new) = v21(old) + Δv21 = -0.1 + 0.00295 = -0.09705
v22(new) = v22(old) + Δv22 = 0.4 + 0.0006125 = 0.4006125
v01(new) = v01(old) + Δv01 = 0.3 + 0.00295 = 0.30295
v02(new) = v02(old) + Δv02 = 0.5 + 0.0006125 = 0.5006125
w1(new) = w1(old) + Δw1 = 0.4 + 0.0164 = 0.4164
w2(new) = w2(old) + Δw2 = 0.1 + 0.02117 = 0.12117
w0(new) = w0(old) + Δw0 = -0.2 + 0.02978 = -0.17022
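As a hedged check of Example-1, the bpn_train_step sketch above, run with the binary sigmoid, should reproduce the hand-computed updates (for instance Δw1 ≈ 0.0164 and w0(new) ≈ -0.17022). The lambdas and array layout below are illustrative assumptions.

```python
import numpy as np

f = lambda s: 1.0 / (1.0 + np.exp(-s))        # binary sigmoid
fp = lambda out: out * (1.0 - out)            # f' expressed via the output

V  = np.array([[0.6, -0.3],                   # v11, v12
               [-0.1, 0.4]])                  # v21, v22
v0 = np.array([0.3, 0.5])                     # v01, v02
W  = np.array([[0.4], [0.1]])                 # w1, w2
w0 = np.array([-0.2])                         # w0

V, v0, W, w0, y = bpn_train_step(np.array([0.0, 1.0]), np.array([1.0]),
                                 V, v0, W, w0, f, fp, alpha=0.25)
print(W.ravel(), w0)   # roughly [0.4164, 0.12117] and [-0.17022]
print(V, v0)           # v21 -> about -0.09705, v02 -> about 0.50061
```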
LEARNING FACTORS OF BACK PROPAGATION ALGORITHM

Convergence of the BPN is based upon several important factors

• Initial Weights

• Learning rate

• Updation rule

• Size and nature of the training set

• Architecture (Number of layers and number of neurons per layer)


INITIAL WEIGHTS
• Ultimate solution may be affected by the initial weights
• These are initialized by small random values
• The choice of initial weights determines the speed at which the network
converges
• Higher initial weights may lead to saturation of the activation functions from
the beginning, and the net may get stuck at a local minimum
• One method to select the weights w_ij is to choose them in the range
[-3/√o_i , 3/√o_i]
• where o_i is the number of processing elements j that feed forward to the
ith processing element
LEARNING RATE

• The learning rate α also affects the convergence of the network
• A larger value of α may speed up the convergence but may
lead to overshooting
• The range of α is 10^-3 to 10

• Large learning rate leads to rapid learning but there will be


oscillation of weights

• Lower learning rate leads to slow learning


MOMENTUM FACTOR
• To overcome the problems stated above (in the previous slide)
we add a factor called momentum factor to the usual gradient
descent method

• The momentum factor is in the interval [0, 1]
• It is normally taken to be 0.9


GENERALIZATION
• The best network for generalization is BPN

• A network is said to generalize well when it sensibly interpolates for
input patterns that are new to the network.
• Overfitting or overtraining problems may occur
• A solution to this problem is to monitor the error on the test set and
terminate the training when the error increases.
• To improve the ability of the network to generalize from a training
data set to a test data set, it is desirable to make small changes to
the input space of the training patterns as part of the training set.
NUMBER OF TRAINING DATA
• The training data should be sufficient and proper

• Training data should cover the entire expected input space.

• Training vectors should be taken randomly from the training set

• Scaling or Normalization has to be done to help learning.


NUMBER OF HIDDEN LAYER NODES
• If the number of hidden layers is more than one in a BPN
then the calculations performed for a single layer are
repeated for all the layer and are summed up at the end
• The size of a layer is very important
• It is determined experimentally
• If the network does not converge, then the number of hidden
nodes has to be increased
• If the network converges, then the user may try a few
hidden nodes and settle for a size based on the performance
• In general, the size of the hidden layer should be a relatively
small fraction of the input layer
EXAMPLE-2
• Find the new weights, using back-propagation network for
the network shown below. The network is presented with the
input pattern [-1,1] and the target output +1. Use a learning
rate of α = 0.25 and the bipolar sigmoidal activation function.
[Figure: 2-2-1 network — input units X1 (input -1) and X2 (input 1), hidden units Z1, Z2 and output unit Y, with initial weights v11 = 0.6, v21 = -0.1, v01 = 0.3, v12 = -0.3, v22 = 0.4, v02 = 0.5, w1 = 0.4, w2 = 0.1, w0 = -0.2.]
COMPUTATIONS
• Here, the activation function is the bipolar sigmoidal
activation function, that is
f(x) = (1 - e^(-x)) / (1 + e^(-x))
• and
v11 = 0.6, v21 = -0.1, v01 = 0.3
v12 = -0.3, v22 = 0.4, v02 = 0.5
w1 = 0.4, w2 = 0.1, w0 = -0.2
• The input vector is [-1, 1] and the target is t = 1
• Learning rate α = 0.25
COMPUTATIONS
• The net input:
• For the z1 neuron:
(z1)_in = v01 + v11·x1 + v21·x2 = 0.3 + (-1)·(0.6) + 1·(-0.1) = -0.4
• For the z2 neuron:
(z2)_in = v02 + v12·x1 + v22·x2 = 0.5 + (-1)·(-0.3) + 1·(0.4) = 1.2
• Outputs:
z1 = f((z1)_in) = (1 - e^(0.4)) / (1 + e^(0.4)) = -0.1974
z2 = f((z2)_in) = (1 - e^(-1.2)) / (1 + e^(-1.2)) = 0.537
COMPUTATIONS
• For the output layer:
• Input:
y_in = w0 + w1·z1 + w2·z2 = -0.2 + (0.4)·(-0.1974) + (0.1)·(0.537) = -0.22526
• Output:
y = f(y_in) = (1 - e^(0.22526)) / (1 + e^(0.22526)) = -0.1122
COMPUTATIONS
• Error at the output neuron:
• We use the gradient descent formula: δ_k = (t_k - y_k)·f'((y_k)_in)
f'(y_in) = 0.5·[1 + f(y_in)]·[1 - f(y_in)] = 0.5·[1 - 0.1122]·[1 + 0.1122] = 0.4937
• Here k = 1. So, δ_1 = (1 - (-0.1122))·(0.4937) = 0.5491
• The changes in weights between the hidden and output layer:
Δw1 = α·δ_1·z1 = 0.25 × 0.5491 × (-0.1974) = -0.0271
Δw2 = α·δ_1·z2 = 0.25 × 0.5491 × 0.537 = 0.0737
Δw0 = α·δ_1 = 0.25 × 0.5491 = 0.1373
COMPUTATIONS

• Next we compute the error portion δ_j between the input and
the hidden layer
• The general formula is δ_j = (δ_in)_j · f'((z_j)_in)
• Each hidden unit sums its delta inputs from the output units.
So,
(δ_in)_j = Σ_{k=1}^{m} δ_k·w_jk
COMPUTATIONS
• Here, m = 1 (one output neuron). So, (δ_in)_j = δ_1·w_j1
• Hence,
(δ_in)_1 = δ_1·w_11 = 0.5491 × 0.4 = 0.21964
(δ_in)_2 = δ_1·w_21 = 0.5491 × 0.1 = 0.05491
• Now,
f'((z1)_in) = 0.5·[1 + f((z1)_in)]·[1 - f((z1)_in)] = 0.5·(1 - 0.1974)·(1 + 0.1974)
• So, δ_1 = (δ_in)_1 · f'((z1)_in)
= 0.21964 × 0.5 × (1 - 0.1974)·(1 + 0.1974)
= 0.1056
COMPUTATIONS
• and δ_2 = (δ_in)_2 · f'((z2)_in)
= 0.05491 × 0.5 × (1 - 0.537)·(1 + 0.537)
= 0.0195
• Now, the changes in the weights between the input and the
hidden layer are
Δv11 = α·δ_1·x1 = 0.25 × 0.1056 × (-1) = -0.0264
Δv21 = α·δ_1·x2 = 0.25 × 0.1056 × 1 = 0.0264
Δv01 = α·δ_1 = 0.25 × 0.1056 = 0.0264
Δv12 = α·δ_2·x1 = 0.25 × 0.0195 × (-1) = -0.0049
Δv22 = α·δ_2·x2 = 0.25 × 0.0195 × 1 = 0.0049
Δv02 = α·δ_2 = 0.25 × 0.0195 = 0.0049
FINAL WEIGHTS
w1(new) = w1(old) + Δw1 = 0.4 - 0.0271 = 0.3729
w2(new) = w2(old) + Δw2 = 0.1 + 0.0737 = 0.1737
w0(new) = w0(old) + Δw0 = -0.2 + 0.1373 = -0.0627
v11(new) = v11(old) + Δv11 = 0.6 - 0.0264 = 0.5736
v12(new) = v12(old) + Δv12 = -0.3 - 0.0049 = -0.3049
v21(new) = v21(old) + Δv21 = -0.1 + 0.0264 = -0.0736
v22(new) = v22(old) + Δv22 = 0.4 + 0.0049 = 0.4049
v01(new) = v01(old) + Δv01 = 0.3 + 0.0264 = 0.3264
v02(new) = v02(old) + Δv02 = 0.5 + 0.0049 = 0.5049
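Similarly, a hedged check of Example-2: reusing the bpn_train_step sketch with the bipolar sigmoid and input [-1, 1] should land close to the final weights above (w1 ≈ 0.3729, w2 ≈ 0.1737, w0 ≈ -0.0627). The lambdas and array layout are illustrative assumptions.

```python
import numpy as np

f = lambda s: (1.0 - np.exp(-s)) / (1.0 + np.exp(-s))   # bipolar sigmoid
fp = lambda out: 0.5 * (1.0 + out) * (1.0 - out)        # f' expressed via the output

V  = np.array([[0.6, -0.3], [-0.1, 0.4]])
v0 = np.array([0.3, 0.5])
W  = np.array([[0.4], [0.1]])
w0 = np.array([-0.2])

V, v0, W, w0, y = bpn_train_step(np.array([-1.0, 1.0]), np.array([1.0]),
                                 V, v0, W, w0, f, fp, alpha=0.25)
print(W.ravel(), w0)   # roughly [0.3729, 0.1737] and [-0.0627]
```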
Radial Basis Function(RBF)
Network
Radial Basis Function (RBF) Network
• The radial basis function (RBF) network is a
classification and functional
approximation neural network
developed by M.J.D. Powell.
• The network uses the most
common nonlinearities such as
sigmoidal and Gaussian kernel
functions.
• The Gaussian functions are also
used in regularization networks.
• The Gaussian function is generally
defined as
Radial Basis Function (RBF) Network
Training Algorithm
Step 0: Set the weights to small random values
Step 1: Perform Steps 2-8 when the stopping condition is false
Step 2: Perform Steps 3-7 for each input
Step 3: Each input unit receives the input signals and transmits them
to the next hidden layer unit.
Step 4: Calculate the radial basis function
Step 5: Select the centres for the radial basis function. The
centres are selected from the set of input vectors
Radial Basis Function (RBF) Network
Step 6: Calculate the output from the hidden unit,
where x̂_ji is the centre of the ith RBF unit for the jth input variable, σ_i the width of the ith
RBF unit, and x_ji the jth variable of the input pattern.
Step 7: Calculate the output of the neural network,
where k is the number of hidden layer nodes (RBF functions),
y_net the output value of the mth node in the output layer for the nth incoming pattern,
w_im the weight between the ith RBF unit and the mth output node, and w_o the biasing term
at the nth output node.
Step 8: Calculate the error and test for the stopping condition. The stopping
condition may be a number of epochs or a certain extent of weight change.
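A hedged sketch of the forward pass described in Steps 6-7, assuming Gaussian hidden units: each hidden unit i has a centre c_i (chosen from the input vectors) and a width σ_i, and the output node forms a weighted sum of the hidden responses plus a bias. The function and parameter names are illustrative assumptions.

```python
import numpy as np

def rbf_forward(x, centres, sigmas, w, w0):
    """x: input vector; centres: (k, n) array of RBF centres;
    sigmas: (k,) widths; w: (k,) hidden-to-output weights; w0: output bias."""
    # Gaussian response of each hidden (RBF) unit
    hidden = np.exp(-np.sum((x - centres) ** 2, axis=1) / sigmas ** 2)
    # Linear output node: weighted sum of the hidden responses plus bias
    return w0 + hidden @ w
```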
