
Artificial Neural Network: Architectures



Neural network architectures

There are three fundamental classes of ANN architectures:

Single layer feed forward architecture

Multilayer feed forward architecture

Recurrent networks architecture

Before discussing these architectures, we first examine the
mathematical details of a single neuron. To do this, let us first
consider the AND problem and its possible solution with a neural
network.



The AND problem and its neural network
The simple Boolean AND operation with two input variables x1
and x2 is shown in the truth table.

x1  x2 | y
0   0  | 0
0   1  | 0
1   0  | 0
1   1  | 1

Here, we have four input patterns: 00, 01, 10 and 11.
For the first three patterns the output is 0, and for the last pattern
the output is 1.



The AND problem and its neural network
Alternatively, the AND problem can be thought of as a perception
problem, where we receive four different patterns as input and
perceive the result as 0 or 1.

[Figure: The input patterns 00, 01 and 10 map to output 0, while 11 maps to output 1. A single neuron with inputs x1 and x2, weights w1 and w2, and output Y realizes this mapping.]



The AND problem and its neural network
A possible neuron specification to solve the AND problem is given
below. In this solution, when the input is 11, the weighted
sum exceeds the threshold (θ = 0.9), leading to the output 1; otherwise
the neuron gives the output 0.

Here, y = Σ (wi xi) − θ = w1 x1 + w2 x2 − θ, where w1 = 0.5, w2 = 0.5 and θ = 0.9.
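As a quick illustration, here is a minimal Python sketch of this single-neuron solution, assuming a hard-threshold activation (output 1 exactly when the weighted sum exceeds θ):

```python
# Single neuron for the AND problem: w1 = 0.5, w2 = 0.5, theta = 0.9.
def and_neuron(x1, x2, w1=0.5, w2=0.5, theta=0.9):
    y = w1 * x1 + w2 * x2 - theta     # net input minus threshold
    return 1 if y > 0 else 0          # hard-threshold activation

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), and_neuron(x1, x2))   # only (1, 1) gives 1
```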
Single layer feed forward neural network

The concept of the AND problem and its solution with a single
neuron can be extended to multiple neurons.

[Figure: A single layer feed forward neural network. The inputs x1, x2, ..., xm are connected through weights wij to a layer of n neurons f1, f2, ..., fn (with thresholds θk); each neuron k receives the net input Ik and produces the output ok.]





Single layer feed forward neural network

We see that a layer of n neurons constitutes a single layer feed
forward neural network.

It is so called because it contains only a single layer of artificial
neurons.

Note that the input and the output, although called layers because
they receive input signals and transmit output signals, are actually
the boundaries of the architecture and hence not truly layers.

The only layer in the architecture is the layer of synaptic links
carrying the weights, which connect every input to every output neuron.



Modeling SLFFNN

In a single layer neural network, the inputs x1, x2, · · · , xm are
connected to the layer of neurons through the weight matrix W. The
weight matrix Wm×n can be represented as follows.

        | w11  w12  w13  · · ·  w1n |
        | w21  w22  w23  · · ·  w2n |
W   =   |  ·    ·    ·           ·  |            (1)
        |  ·    ·    ·           ·  |
        | wm1  wm2  wm3  · · ·  wmn |

The output of any k-th neuron can be determined as follows:

    Ok = fk ( Σ_{i=1..m} (wik xi) + θk )

where k = 1, 2, 3, · · · , n and θk denotes the threshold value of the k-th
neuron. Such a network is feed forward in type (acyclic in nature) and
hence the name.
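A minimal sketch of this computation in Python (using NumPy; the weights, thresholds and step activation below are illustrative assumptions, not values from the slides):

```python
import numpy as np

# O_k = f_k( sum_i w_ik * x_i + theta_k ) for a single layer of n neurons.
def slffnn_forward(x, W, theta, f=lambda s: (s > 0).astype(int)):
    return f(x @ W + theta)           # W has shape (m, n), x has shape (m,)

x = np.array([1.0, 0.0, 1.0])         # m = 3 inputs
W = np.array([[0.5, -0.2],
              [0.3,  0.8],
              [-0.6, 0.1]])           # n = 2 output neurons
theta = np.array([0.1, -0.4])
print(slffnn_forward(x, W, theta))
```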



Multilayer feed forward neural networks

This network, as its name indicates, is made up of multiple layers.

Thus, architectures of this class, besides having an input and an
output layer, also have one or more intermediate layers called
hidden layers.

The hidden layer(s) aid in performing useful intermediary
computation before directing the input to the output layer.

A multilayer feed forward network with l input neurons (the number of
neurons at the first layer), m1, m2, · · · , mp neurons at the i-th
hidden layer (i = 1, 2, · · · , p), and n neurons at the last (output)
layer is written as an l − m1 − m2 − · · · − mp − n MLFFNN.



Multilayer feed forward neural networks

The figure shows a schematic diagram of a multilayer feed forward neural
network with an l − m − n configuration.
[Figure: An l − m − n multilayer feed forward neural network. The inputs x1, x2, ..., xp feed the first layer of neurons f11, ..., f1l; their outputs feed the hidden layer neurons f21, ..., f2m, which in turn feed the output layer neurons f31, ..., f3n producing the outputs o1, ..., on. Each neuron has its own threshold θij.]





Multilayer feed forward neural networks
In an l − m − n MLFFNN, the first (input) layer contains l
neurons, the only hidden layer contains m neurons, and
the last (output) layer contains n neurons.
The inputs x1, x2, ..., xp are fed to the first layer, and the weight
matrices between the input and the first layer, between the first layer
and the hidden layer, and between the hidden and the last (output) layer
are denoted as W1, W2, and W3, respectively.
Further, consider that f1, f2, and f3 are the transfer functions of the
neurons lying on the first, hidden and last layers, respectively.

Likewise, the threshold value of any i-th neuron in the j-th layer is
denoted by θij.
Moreover, the output of the i-th neuron in any l-th layer is
represented by O_i^l = f_i^l ( Σ (X^l W^l) + θ_i^l ), where X^l is the
input vector to the l-th layer.
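A minimal forward-pass sketch of such a network, assuming sigmoid transfer functions and the layer rule above (the 2-2-1 sizes and random weights are illustrative only):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Apply O^l = f( X^l W^l + theta^l ) layer by layer.
def mlffnn_forward(x, weights, thetas, f=sigmoid):
    out = x
    for W, theta in zip(weights, thetas):
        out = f(out @ W + theta)
    return out

rng = np.random.default_rng(0)
weights = [rng.normal(size=(2, 2)), rng.normal(size=(2, 1))]  # 2-2-1 network
thetas = [np.zeros(2), np.zeros(1)]
print(mlffnn_forward(np.array([1.0, 0.0]), weights, thetas))
```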


Recurrent neural network architecture

These networks differ from feed forward architectures in the sense
that there is at least one "feedback loop".

Thus, in these networks, there could exist one layer with feedback
connections.

There could also be neurons with self-feedback links, that is, the
output of a neuron is fed back into itself as input, as in the sketch below.
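A minimal sketch of a single neuron with a self-feedback link (the weights and sigmoid activation are assumptions for the example, not values from the slides):

```python
import math

# The neuron's previous output o is fed back as an extra input at the
# next time step, giving the unit a "memory" of earlier inputs.
def recurrent_neuron(xs, w_in=0.8, w_fb=0.5, theta=0.0):
    o, outputs = 0.0, []
    for x in xs:                                  # one input per time step
        s = w_in * x + w_fb * o - theta
        o = 1.0 / (1.0 + math.exp(-s))            # sigmoid activation
        outputs.append(round(o, 3))
    return outputs

print(recurrent_neuron([1.0, 0.0, 0.0, 1.0]))
```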



Recurrent neural network architecture

Depending on the type of feedback loops, several recurrent neural
networks are known, such as the Hopfield network, the Boltzmann machine
network, etc.
Training Perceptrons
For AND
[Figure: A perceptron with bias input X0 (weight W0 = ?), inputs X1, X2 (weights W1 = ?, W2 = ?) and threshold th = 0.0, shown alongside the target AND truth table: (X1, X2, O) = (0,0,0), (0,1,0), (1,0,0), (1,1,1).]

•What are the weight values?


•Initialize with random weight values
Summation = W0 + X1 W1 + X2 W2

Training Perceptrons
For AND
[Figure: The same perceptron with initial random weights W0 = -0.3, W1 = 0.5, W2 = -0.4 and threshold th = 0.0, shown alongside the AND truth table.]

X0 X1 X2 Summation Output
1 0 0 (-1*0.3) + (0*0.5) + (0*-0.4) = -0.3 0
1 0 1 (-1*0.3) + (0*0.5) + (1*-0.4) = -0.7 0
1 1 0 (-1*0.3) + (1*0.5) + (0*-0.4) = 0.2 1
1 1 1 (-1*0.3) + (1*0.5) + (1*-0.4) = -0.2 0
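The table above can be reproduced with a short sketch (X0 is the fixed bias input 1; the output is 1 when the summation exceeds th = 0.0):

```python
W0, W1, W2, th = -0.3, 0.5, -0.4, 0.0

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    s = W0 * 1 + W1 * x1 + W2 * x2        # summation with bias input X0 = 1
    print((1, x1, x2), round(s, 2), 1 if s > th else 0)
```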
Gradient Descent Learning Rule

Consider a linear unit without a threshold and with
continuous output o (not just -1, 1):

    o = w0 + w1 x1 + … + wn xn

Train the wi's such that they minimize the squared error

    E[w1, …, wn] = ½ Σ_{d∈D} (td - od)²

where D is the set of training examples.

Gradient Descent

Gradient: ∇E[w] = [∂E/∂w0, …, ∂E/∂wn]

[Figure: The error surface E over the weights (w1, w2); one gradient descent step moves from (w1, w2) to (w1 + Δw1, w2 + Δw2).]

Training rule: Δw = -η ∇E[w], that is, for each weight

    Δwi = -η ∂E/∂wi
        = -η ∂/∂wi ½ Σ_d (td - od)²
        = -η ∂/∂wi ½ Σ_d (td - Σ_i wi xi)²
        = η Σ_d (td - od) xi
Gradient Descent
Gradient-Descent(training_examples, η)
Each training example is a pair of the form <(x1, …, xn), t>,
where (x1, …, xn) is the vector of input values and t is the target output
value.
Initialize each wi to some small random value
Until the termination condition is met, Do
    Initialize each Δwi to zero
    For each <(x1, …, xn), t> in training_examples, Do
        Input the instance (x1, …, xn) to the linear unit and compute the
        output o
        For each linear unit weight wi, Do
            Δwi = Δwi + η (t - o) xi
    For each linear unit weight wi, Do
        wi = wi + Δwi

Weight Updation
The weights are updated in batch as Wi = Wi + Σd (td - od) Xid (learning
rate taken as 1), using the outputs obtained with the previous weights:
W0= -0.3 + [(0-0)1+(0-0)1+(0-1)1+(1-0)1]= -0.3
W1= 0.5 + [(0-0)0+(0-0)0+(0-1)1+(1-0)1]= 0.5
W2= -0.4 + [(0-0)0+(0-0)1+(0-1)0+(1-0)1]= 0.6

X0 X1 X2 Summation Output
1 0 0 (-1*0.3) + (0*0.5) + (0*0.6) = -0.3 0
1 0 1 (-1*0.3) + (0*0.5) + (1*0.6) = 0.3 1
1 1 0 (-1*0.3) + (1*0.5) + (0*0.6) = 0.2 1
1 1 1 (-1*0.3) + (1*0.5) + (1*0.6) = 0.8 1
Weight Updation
W0= -0.3 + [(0-0)1+(0-1)1+(0-1)1+(1-1)1]= -2.3
W1= 0.5 + [(0-0)0+(0-1)0+(0-1)1+(1-1)1]= -0.5
W2= 0.6 + [(0-0)0+(0-1)1+(0-1)0+(1-1)1]= -0.4

X0 X1 X2 Summation Output
1 0 0 (-1*2.3) + (-0*0.5) + (-0*0.4) = -2.3 0
1 0 1 (-1*2.3) + (-0*0.5) + (-1*0.4) = -2.7 0
1 1 0 (-1*2.3) + (-1*0.5) + (-0*0.4) = -2.8 0
1 1 1 (-1*2.3) + (-1*0.5) + (-1*0.4) = -3.2 0
Weight Updation

W0= -2.3 + [(0-0)1+(0-0)1+(0-0)1+(1-0)1]= -1.3


W1= -0.5 + [(0-0)0+(0-0)0+(0-0)1+(1-0)1]= 0.5
W2= -0.4 + [(0-0)0+(0-0)1+(0-0)0+(1-0)1]= 0.6

X0 X1 X2 Summation Output
1 0 0 (-1*1.3) + (0*0.5) + (0*0.6) = -1.3 0
1 0 1 (-1*1.3) + (0*0.5) + (1*0.6) = -0.7 0
1 1 0 (-1*1.3) + (1*0.5) + (0*0.6) = -0.8 0
1 1 1 (-1*1.3) + (1*0.5) + (1*0.6) = -0.2 0
Weight Updation

W0= -1.3 + [(0-0)1+(0-0)1+(0-0)1+(1-0)1]= -0.3


W1= 0.5 + [(0-0)0+(0-0)0+(0-0)1+(1-0)1]= 1.5
W2= 0.6 + [(0-0)0+(0-0)1+(0-0)0+(1-0)1]= 1.6

X0 X1 X2 Summation Output
1 0 0 (-1*0.3) + (0*1.5) + (0*1.6) = -0.3 0
1 0 1 (-1*0.3) + (0*1.5) + (1*1.6) = 1.3 1
1 1 0 (-1*0.3) + (1*1.5) + (0*1.6) = 1.2 1
1 1 1 (-1*0.3) + (1*1.5) + (1*1.6) = 2.8 1
Weight Updation
W0= -0.3 + [(0-0)1+(0-1)1+(0-1)1+(1-1)1]= -2.3
W1= 1.5 + [(0-0)0+(0-1)0+(0-1)1+(1-1)1]= 0.5
W2= 1.6 + [(0-0)0+(0-1)1+(0-1)0+(1-1)1]= 0.6

X0 X1 X2 Summation Output
1 0 0 (-1*2.3) + (0*0.5) + (0*0.6) = -2.3 0
1 0 1 (-1*2.3) + (0*0.5) + (1*0.6) = -1.7 0
1 1 0 (-1*2.3) + (1*0.5) + (0*0.6) = -1.8 0
1 1 1 (-1*2.3) + (1*0.5) + (1*0.6) = -1.2 0
Weight Updation

W0= -2.3 + [(0-0)1+(0-0)1+(0-0)1+(1-0)1]= -1.3


W1= 0.5 + [(0-0)0+(0-0)0+(0-0)1+(1-0)1]= 1.5
W2= 0.6 + [(0-0)0+(0-0)1+(0-0)0+(1-0)1]= 1.6

X0 X1 X2 Summation Output
1 0 0 (-1*1.3) + (0*1.5) + (0*1.6) = -1.3 0
1 0 1 (-1*1.3) + (0*1.5) + (1*1.6) = 0.3 1
1 1 0 (-1*1.3) + (1*1.5) + (0*1.6) = 0.2 1
1 1 1 (-1*1.3) + (1*1.5) + (1*1.6) = 1.8 1
Weight Updation
W0= -1.3 + [(0-0)1+(0-1)1+(0-1)1+(1-1)1]= -3.3
W1= 1.5 + [(0-0)0+(0-1)0+(0-1)1+(1-1)1]= 0.5
W2= 1.6 + [(0-0)0+(0-1)1+(0-1)0+(1-1)1]= 0.6

X0 X1 X2 Summation Output
1 0 0 (-1*3.3) + (0*0.5) + (0*0.6) = -3.3 0
1 0 1 (-1*3.3) + (0*0.5) + (1*0.6) = -2.7 0
1 1 0 (-1*3.3) + (1*0.5) + (0*0.6) = -2.8 0
1 1 1 (-1*3.3) + (1*0.5) + (1*0.6) = -2.2 0
Weight Updation
W0= -3.3 + [(0-0)1+(0-0)1+(0-0)1+(1-0)1]= -2.3
W1= 0.5 + [(0-0)0+(0-0)0+(0-0)1+(1-0)1]= 1.5
W2= 0.6 + [(0-0)0+(0-0)1+(0-0)0+(1-0)1]= 1.6

X0 X1 X2 Summation Output
1 0 0 (-1*2.3) + (0*1.5) + (0*1.6) = -2.3 0
1 0 1 (-1*2.3) + (0*1.5) + (1*1.6) = -0.7 0
1 1 0 (-1*2.3) + (1*1.5) + (0*1.6) = -0.8 0
1 1 1 (-1*2.3) + (1*1.5) + (1*1.6) = 0.8 1
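The whole walk-through can be reproduced by a short sketch that repeats the batch update Wi = Wi + Σd (td - od) Xid until every pattern is classified correctly (starting weights and threshold th = 0.0 as above):

```python
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]   # AND patterns
w = [-0.3, 0.5, -0.4]                                         # W0, W1, W2
th = 0.0

for _ in range(20):
    outputs = [1 if w[0] + w[1]*x1 + w[2]*x2 > th else 0 for (x1, x2), _ in data]
    if outputs == [t for _, t in data]:
        break                                  # all four patterns correct
    for ((x1, x2), t), o in zip(data, outputs):
        w[0] += (t - o) * 1                    # bias input X0 = 1
        w[1] += (t - o) * x1
        w[2] += (t - o) * x2

print(w, outputs)   # e.g. [-2.3, 1.5, 1.6], realizing AND
```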
Decision Surface of a Perceptron
[Figure: Two scatter plots in the x1-x2 plane. Left: a linearly separable arrangement of + and - points that a single straight line can split. Right: a non-linearly separable arrangement that no single straight line can split.]

•But some functions are not linearly separable (e.g. XOR).

•XOR can be solved as:
  XOR(x1, x2) = AND( OR(x1, x2), NAND(x1, x2) )
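A one-line check of this decomposition in Python (the helper gate functions are just for illustration):

```python
def OR(a, b):   return int(a or b)
def AND(a, b):  return int(a and b)
def NAND(a, b): return 1 - AND(a, b)
def XOR(a, b):  return AND(OR(a, b), NAND(a, b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, XOR(a, b))   # prints 0, 1, 1, 0
```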

Why different types of neural network architectures?
To answer this question, let us first consider the case of a neural
network with a single neuron and two inputs, as shown below.

[Figure: A single neuron with inputs x1, x2 (weights w1, w2) and bias weight w0, together with the corresponding straight line in the x1-x2 plane.]

f = w0 θ + w1 x1 + w2 x2
  = b0 + w1 x1 + w2 x2
Revisit of a single neural network
Note that f = b0 + w1x1 + w2x2 denotes a straight line in the
x1-x2 plane (as shown in the figure on the right in the last slide).
Now, depending on the values of w1 and w2, this line partitions the
points (x1, x2) obtained for different input values.
We say that these points are linearly separable if the straight
line f separates them into two classes.
Linearly separable and non-separable points are further illustrated
in the following figures.



AND and XOR problems

To illustrate the concept of linearly separable and non-separable tasks
to be accomplished by a neural network, let us consider the AND problem
and the XOR problem.

    AND Problem              XOR Problem
    x1  x2 | y               x1  x2 | y
    0   0  | 0               0   0  | 0
    0   1  | 0               0   1  | 1
    1   0  | 0               1   0  | 1
    1   1  | 1               1   1  | 0



AND problem is linearly separable

The AND logic:

    x1  x2 | y
    0   0  | 0
    0   1  | 0
    1   0  | 0
    1   1  | 1

[Figure: The four input points in the x1-x2 plane; (0,0), (0,1) and (1,0) have y = 0, while (1,1) has y = 1. The straight line f = 0.5 x1 + 0.5 x2 - 0.9 = 0 separates the two classes, so the AND-problem is linearly separable.]



XOR problem is linearly non-separable

The XOR problem:

    x1  x2 | y
    0   0  | 0
    0   1  | 1
    1   0  | 1
    1   1  | 0

[Figure: The four input points in the x1-x2 plane; (0,1) and (1,0) have y = 1, while (0,0) and (1,1) have y = 0. No single straight line can separate the two classes, so the XOR-problem is not linearly separable.]



Our observations

From the examples discussed, we understand that for the AND problem a
straight line exists that separates the inputs into the two output
classes 0 and 1.
However, in the case of the XOR problem, such a line is not possible.
Note: a horizontal or a vertical line is not admissible in the XOR case,
because such a line would completely ignore one of the inputs.



Example

So, for a two-class classification problem, if there is a straight line
that acts as a decision boundary, then we say that the problem is
linearly separable; otherwise, it is non-linearly separable.

The same concept can be extended to an n-dimensional classification
problem. Such a problem can be represented in an n-dimensional space,
and the decision boundary would then be of n − 1 dimensions, separating
the given sets.

In fact, any linearly separable problem can be solved with a single
layer feed forward neural network, for example, the AND problem.

On the other hand, if the problem is non-linearly separable, then a
single layer neural network cannot solve it.
To solve such a problem, a multilayer feed forward neural network is
required.
Example: Solving XOR problem

[Figure: A 2-2-1 neural network for the XOR problem. The inputs X1 and X2 connect with weights 1 to two hidden neurons, one with threshold 0.5 and one with threshold 1.5; the hidden outputs connect to the output neuron f (threshold 0.5) with weights 1 and -1.]

Neural network for the XOR-problem
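A minimal sketch of this two-layer network, assuming the usual reading of the figure: one hidden unit acts as OR (threshold 0.5), the other as AND (threshold 1.5), and the output unit combines them with weights 1 and -1 (threshold 0.5):

```python
def step(s):
    return 1 if s > 0 else 0

def xor_net(x1, x2):
    h1 = step(1*x1 + 1*x2 - 0.5)    # fires if at least one input is 1 (OR)
    h2 = step(1*x1 + 1*x2 - 1.5)    # fires only if both inputs are 1 (AND)
    return step(1*h1 - 1*h2 - 0.5)  # fires if h1 and not h2, i.e. XOR

for p in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(p, xor_net(*p))           # 0, 1, 1, 0
```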



Dynamic neural network

In some cases, the output needs to be compared with its target
value to determine the error, if any.

Based on this, a neural network can be categorized as either a
static neural network or a dynamic neural network.

In a static neural network, the error in prediction is neither
calculated nor fed back for updating the neural network.

On the other hand, in a dynamic neural network, the error is
determined and then fed back to the network to modify its
weights (or architecture, or both).



Dynamic neural network

[Figure: Framework of a dynamic neural network. Inputs feed the neural network architecture, which produces an output; the output is compared with the target in an error-calculation block, and the error is fed back to adjust the weights and/or the architecture.]



Dynamic neural network

From the above discussions, we conclude that:

Linearly separable problems can be solved with a single layer feed
forward neural network.

Non-linearly separable problems require a multilayer feed forward
neural network.

Problems that require error calculation and feedback are solved using
recurrent neural networks as well as dynamic neural networks.

