[Figure: a single neuron realizing the AND function. Inputs x1 and x2 enter through weights w1 and w2, a summation unit Σ forms the net input, and the thresholded result is the output Y. Truth table: 00→0, 01→0, 10→0, 11→1.]
Here, y = Σi wi xi - θ, with w1 = 0.5, w2 = 0.5, and θ = 0.9.
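The neuron above can be sketched in a few lines of code. This is an illustrative sketch, not from the slides: the function name and the "fire when y ≥ 0" step rule are assumptions.

```python
# Sketch of the single AND neuron: y = w1*x1 + w2*x2 - theta,
# with a step activation (output 1 when y >= 0).
def neuron(x1, x2, w1=0.5, w2=0.5, theta=0.9):
    y = w1 * x1 + w2 * x2 - theta   # net input minus threshold
    return 1 if y >= 0 else 0       # step activation (assumed)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, neuron(x1, x2))
```

Only the input (1, 1) reaches the threshold (0.5 + 0.5 - 0.9 = 0.1 ≥ 0), so the unit computes AND.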
Debasis Samanta (IIT Kharagpur) Soft Computing Applications 29.01.2016 5 / 27
Single layer feed forward neural network
The concept of the AND problem and its solution with a single
neuron can be extended to multiple neurons.
[Figure: a single-layer feed-forward network. Inputs x1, x2, …, xm connect through weights w11, w12, …, w1n (and likewise for the other inputs) to output neurons f1, f2, …, fn with thresholds θ1, …, θn; neuron j produces Ij = oj.]
Note that although the input and output layers, which receive input signals and transmit output signals respectively, are called layers, they are really the boundaries of the architecture and not true layers.
The only layer in the architecture is formed by the synaptic links that carry the weights connecting every input to the output neurons.
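The single-layer computation just described can be sketched directly: output neuron j computes oj = f(Σi wji xi - θj). The function names, example weights, and step activation below are illustrative assumptions, not values from the slides.

```python
# Sketch of a single-layer feed-forward pass: each output neuron j
# computes o_j = step(sum_i w_ji * x_i - theta_j).
def single_layer(x, W, theta):
    outputs = []
    for w_row, th in zip(W, theta):            # one row of W per output neuron
        s = sum(wi * xi for wi, xi in zip(w_row, x)) - th
        outputs.append(1 if s >= 0 else 0)     # step activation (assumed)
    return outputs

# Example: 3 inputs, 2 output neurons (weights chosen arbitrarily)
x = [1, 0, 1]
W = [[0.5, 0.5, 0.0],   # weights into output neuron 1
     [0.2, 0.8, 0.4]]   # weights into output neuron 2
theta = [0.4, 0.9]
print(single_layer(x, W, theta))   # → [1, 0]
```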
[Figure: a multilayer feed-forward network. The input layer x1, …, xp feeds a hidden layer of neurons f11, …, f1l with thresholds θ11, …, θ1l; their outputs feed subsequent layers of neurons f2j and f3k with their own thresholds, and the output layer produces o1, …, on.]
Thus, in these networks there can exist a layer with feedback connections.
There can also be neurons with self-feedback links, that is, the output of a neuron is fed back into itself as input.
Training Perceptrons
For AND: a perceptron with inputs X1 and X2, a bias input X0 = 1, threshold th = 0.0, and initial weights W0 = -0.3, W1 = 0.5, W2 = -0.4.

X0 X1 X2 | O
 1  0  0 | 0
 1  0  1 | 0
 1  1  0 | 0
 1  1  1 | 1
X0 X1 X2 | Summation                              | Output
 1  0  0 | (1*-0.3) + (0*0.5) + (0*-0.4) = -0.3   | 0
 1  0  1 | (1*-0.3) + (0*0.5) + (1*-0.4) = -0.7   | 0
 1  1  0 | (1*-0.3) + (1*0.5) + (0*-0.4) =  0.2   | 1
 1  1  1 | (1*-0.3) + (1*0.5) + (1*-0.4) = -0.2   | 0
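The table above can be reproduced with a short sketch. The function names are illustrative; the weights and the "output 1 when the summation ≥ 0" rule follow the slide.

```python
# Perceptron with bias input X0 = 1 and the initial weights from the slide.
W = [-0.3, 0.5, -0.4]   # W0, W1, W2

def output(x):
    """Return (summation, output) for an input vector x = [x0, x1, x2]."""
    s = sum(w * xi for w, xi in zip(W, x))
    return s, 1 if s >= 0 else 0    # threshold th = 0.0

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    s, o = output([1, x1, x2])
    print(x1, x2, round(s, 2), o)
```

Two rows disagree with the AND targets (input (1,0) fires but should not; input (1,1) should fire but does not), which is what the weight updates below correct.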
Gradient Descent Learning Rule
Gradient Descent
[Figure: the error surface E plotted over the weight space (w1, w2); gradient descent steps downhill along the negative gradient.]
Gradient: ∇E[w] = [∂E/∂w0, …, ∂E/∂wn]
Training rule: Δwi = -η ∂E/∂wi

∂E/∂wi = ∂/∂wi ½ Σd (td - od)²
       = ∂/∂wi ½ Σd (td - Σi wi xid)²
       = Σd (td - od)(-xid)

Hence Δwi = η Σd (td - od) xid.
Gradient Descent
Gradient-Descent(training_examples, η)
Each training example is a pair of the form <(x1, …, xn), t>, where (x1, …, xn) is the vector of input values, t is the target output value, and η is the learning rate.
  Initialize each wi to some small random value
  Until the termination condition is met, Do
    Initialize each Δwi to zero
    For each <(x1, …, xn), t> in training_examples, Do
      Input the instance (x1, …, xn) to the linear unit and compute the output o
      For each linear-unit weight wi, Do
        Δwi = Δwi + η (t - o) xi
    For each linear-unit weight wi, Do
      wi = wi + Δwi
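The procedure above translates almost line for line into code. This is a sketch under assumptions: the function name, the toy data set, and the learning rate are illustrative, not from the slides.

```python
import random

def gradient_descent(training_examples, eta, epochs):
    """Batch gradient descent for a linear unit o = w . x."""
    n = len(training_examples[0][0])
    # Initialize each w_i to some small random value
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]
    for _ in range(epochs):                 # termination: fixed epoch budget
        delta = [0.0] * n                   # initialize each Δw_i to zero
        for x, t in training_examples:
            o = sum(wi * xi for wi, xi in zip(w, x))   # linear-unit output
            for i in range(n):
                delta[i] += eta * (t - o) * x[i]       # Δw_i += η(t - o)x_i
        w = [wi + d for wi, d in zip(w, delta)]        # w_i += Δw_i
    return w

# Toy example: learn t = 2*x (each input carries a bias component x0 = 1)
random.seed(0)
data = [([1, 0.0], 0.0), ([1, 1.0], 2.0), ([1, 2.0], 4.0)]
w = gradient_descent(data, eta=0.05, epochs=500)
print(w)   # ≈ [0.0, 2.0]
```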
Weight Updation
With η = 1 and the errors (td - od) from the previous table (0, 0, -1, 1), each weight is updated by Wi = Wi + Σd (td - od) Xid:
W0 = -0.3 + [(0-0)*1 + (0-0)*1 + (0-1)*1 + (1-0)*1] = -0.3
W1 =  0.5 + [(0-0)*0 + (0-0)*0 + (0-1)*1 + (1-0)*1] =  0.5
W2 = -0.4 + [(0-0)*0 + (0-0)*1 + (0-1)*0 + (1-0)*1] =  0.6
X0 X1 X2 | Summation                             | Output
 1  0  0 | (1*-0.3) + (0*0.5) + (0*0.6) = -0.3   | 0
 1  0  1 | (1*-0.3) + (0*0.5) + (1*0.6) =  0.3   | 1
 1  1  0 | (1*-0.3) + (1*0.5) + (0*0.6) =  0.2   | 1
 1  1  1 | (1*-0.3) + (1*0.5) + (1*0.6) =  0.8   | 1
Weight Updation
W0 = -0.3 + [(0-0)*1 + (0-1)*1 + (0-1)*1 + (1-1)*1] = -2.3
W1 =  0.5 + [(0-0)*0 + (0-1)*0 + (0-1)*1 + (1-1)*1] = -0.5
W2 =  0.6 + [(0-0)*0 + (0-1)*1 + (0-1)*0 + (1-1)*1] = -0.4

X0 X1 X2 | Summation                               | Output
 1  0  0 | (1*-2.3) + (0*-0.5) + (0*-0.4) = -2.3   | 0
 1  0  1 | (1*-2.3) + (0*-0.5) + (1*-0.4) = -2.7   | 0
 1  1  0 | (1*-2.3) + (1*-0.5) + (0*-0.4) = -2.8   | 0
 1  1  1 | (1*-2.3) + (1*-0.5) + (1*-0.4) = -3.2   | 0
Weight Updation
X0 X1 X2 | Summation                             | Output
 1  0  0 | (1*-1.3) + (0*0.5) + (0*0.6) = -1.3   | 0
 1  0  1 | (1*-1.3) + (0*0.5) + (1*0.6) = -0.7   | 0
 1  1  0 | (1*-1.3) + (1*0.5) + (0*0.6) = -0.8   | 0
 1  1  1 | (1*-1.3) + (1*0.5) + (1*0.6) = -0.2   | 0
Weight Updation
X0 X1 X2 | Summation                             | Output
 1  0  0 | (1*-0.3) + (0*1.5) + (0*1.6) = -0.3   | 0
 1  0  1 | (1*-0.3) + (0*1.5) + (1*1.6) =  1.3   | 1
 1  1  0 | (1*-0.3) + (1*1.5) + (0*1.6) =  1.2   | 1
 1  1  1 | (1*-0.3) + (1*1.5) + (1*1.6) =  2.8   | 1
Weight Updation
X0 X1 X2 | Summation                             | Output
 1  0  0 | (1*-1.3) + (0*0.5) + (0*1.6) = -1.3   | 0
 1  0  1 | (1*-1.3) + (0*0.5) + (1*1.6) =  0.3   | 1
 1  1  0 | (1*-1.3) + (1*0.5) + (0*1.6) = -0.8   | 0
 1  1  1 | (1*-1.3) + (1*0.5) + (1*1.6) =  0.8   | 1
Weight Updation
X0 X1 X2 | Summation                             | Output
 1  0  0 | (1*-2.3) + (0*0.5) + (0*0.6) = -2.3   | 0
 1  0  1 | (1*-2.3) + (0*0.5) + (1*0.6) = -1.7   | 0
 1  1  0 | (1*-2.3) + (1*0.5) + (0*0.6) = -1.8   | 0
 1  1  1 | (1*-2.3) + (1*0.5) + (1*0.6) = -1.2   | 0
Weight Updation
X0 X1 X2 | Summation                             | Output
 1  0  0 | (1*-1.3) + (0*1.5) + (0*1.6) = -1.3   | 0
 1  0  1 | (1*-1.3) + (0*1.5) + (1*1.6) =  0.3   | 1
 1  1  0 | (1*-1.3) + (1*1.5) + (0*1.6) =  0.2   | 1
 1  1  1 | (1*-1.3) + (1*1.5) + (1*1.6) =  1.8   | 1
Weight Updation
W0 = -1.3 + [(0-0)*1 + (0-1)*1 + (0-1)*1 + (1-1)*1] = -3.3
W1 =  1.5 + [(0-0)*0 + (0-1)*0 + (0-1)*1 + (1-1)*1] =  0.5
W2 =  1.6 + [(0-0)*0 + (0-1)*1 + (0-1)*0 + (1-1)*1] =  0.6

X0 X1 X2 | Summation                             | Output
 1  0  0 | (1*-3.3) + (0*0.5) + (0*0.6) = -3.3   | 0
 1  0  1 | (1*-3.3) + (0*0.5) + (1*0.6) = -2.7   | 0
 1  1  0 | (1*-3.3) + (1*0.5) + (0*0.6) = -2.8   | 0
 1  1  1 | (1*-3.3) + (1*0.5) + (1*0.6) = -2.2   | 0
Weight Updation
W0 = -3.3 + [(0-0)*1 + (0-0)*1 + (0-0)*1 + (1-0)*1] = -2.3
W1 =  0.5 + [(0-0)*0 + (0-0)*0 + (0-0)*1 + (1-0)*1] =  1.5
W2 =  0.6 + [(0-0)*0 + (0-0)*1 + (0-0)*0 + (1-0)*1] =  1.6

X0 X1 X2 | Summation                             | Output
 1  0  0 | (1*-2.3) + (0*1.5) + (0*1.6) = -2.3   | 0
 1  0  1 | (1*-2.3) + (0*1.5) + (1*1.6) = -0.7   | 0
 1  1  0 | (1*-2.3) + (1*1.5) + (0*1.6) = -0.8   | 0
 1  1  1 | (1*-2.3) + (1*1.5) + (1*1.6) =  0.8   | 1
The outputs now match the AND targets on all four examples, so training stops.
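The whole sequence of batch updates traced above can be re-run in code. This is a sketch, assuming learning rate 1, bias input X0 = 1, and a "fire when the summation ≥ 0" rule; it follows the corrected arithmetic, so it may skip an intermediate slide, but it reaches the same final weights.

```python
# Batch weight updation for AND, starting from the slides' initial weights.
examples = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0

w = [-0.3, 0.5, -0.4]                      # W0, W1, W2 from the first slide
for _ in range(20):                        # repeated passes over all examples
    outs = [predict(w, x) for x, _ in examples]
    if outs == [t for _, t in examples]:   # stop once every example is correct
        break
    # batch update with learning rate 1: W_i += sum_d (t_d - o_d) * x_di
    for i in range(3):
        w[i] += sum((t - o) * x[i] for (x, t), o in zip(examples, outs))
print(w, outs)   # converges to weights [-2.3, 1.5, 1.6]
```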
Decision Surface of a Perceptron
[Figure: two scatter plots of + and - examples in the (x1, x2) plane. Left: a linearly separable set, which a single straight line can divide. Right: a non-linearly separable set, which no single straight line can divide.]
Why different types of neural network architecture?
To answer this question, let us first consider the case of a single neuron with two inputs, as shown below.
[Figure: a single neuron with inputs x1 and x2, weights w1 and w2, and bias weight w0, producing the output y.]
x1 x2 | y
 0  0 | 0
 0  1 | 0
 1  0 | 0
 1  1 | 1
[Figure: the AND examples in the (x1, x2) plane. The line f = 0.5 x1 + 0.5 x2 - 0.9 separates the single y = 1 point (1,1) from the three y = 0 points (0,0), (0,1), and (1,0).]
x1 x2 | Output (y)
 0  0 | 0
 0  1 | 1
 1  0 | 1
 1  1 | 0
[Figure: the XOR examples in the (x1, x2) plane. The y = 1 points (0,1) and (1,0) and the y = 0 points (0,0) and (1,1) cannot be separated by any single straight line.]
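The non-separability of XOR can be illustrated with a brute-force sketch: search a grid of candidate weights (w0, w1, w2) and check whether any single threshold unit matches the table. The grid is an illustrative assumption and obviously does not cover all real weights, but it conveys the point.

```python
# Try to realize XOR with one threshold unit: step(w0 + w1*x1 + w2*x2).
xor = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def matches(w0, w1, w2):
    return all((1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0) == t
               for (x1, x2), t in xor.items())

grid = [i / 2 for i in range(-10, 11)]        # -5.0 … 5.0 in steps of 0.5
found = any(matches(w0, w1, w2)
            for w0 in grid for w1 in grid for w2 in grid)
print(found)   # → False: no single linear unit in the grid computes XOR
```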
[Figure: a two-layer network of threshold units that realizes XOR; a hidden unit combines x1 and x2 and feeds the output unit f together with the direct inputs.]
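The slide's figure sketches a two-layer network for XOR. The exact weights are hard to recover from the slide, so the values below are one classic construction (an assumption, not necessarily the slide's): a hidden AND unit whose output inhibits an OR-like output unit.

```python
# Two-layer threshold network computing XOR (illustrative weights).
def step(s):
    return 1 if s >= 0 else 0

def xor_net(x1, x2):
    h = step(x1 + x2 - 1.5)              # hidden unit: fires only for (1,1), i.e. AND
    return step(x1 + x2 - 2 * h - 0.5)   # output: OR of inputs, suppressed when h fires

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))
```

A single line cannot separate the XOR classes, but the hidden unit carves the (1,1) corner off first, which is exactly why an extra layer suffices.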
[Figure: the supervised learning loop. Inputs feed the neural network architecture to produce an output; the error between the output and the target is calculated and fed back to adjust the weights / architecture.]