
Artificial Neural Network: Architectures



Neural network architectures

There are three fundamental classes of ANN architectures:

Single layer feed forward architecture

Multilayer feed forward architecture

Recurrent networks architecture

Before discussing these architectures, we first examine the
mathematical details of a single neuron. To do this, let us first
consider the AND problem and its possible solution with a neural
network.



The AND problem and its neural network
The simple Boolean AND operation with two input variables x1
and x2 is shown in the truth table.

x1  x2 | y
0   0  | 0
0   1  | 0
1   0  | 0
1   1  | 1

Here, we have four input patterns: 00, 01, 10 and 11.
For the first three patterns the output is 0, and for the last pattern
the output is 1.



The AND problem and its neural network
Alternatively, the AND problem can be thought of as a perception
problem, where we receive four different patterns as input and
perceive the result as 0 or 1.

[Figure: The input patterns 00, 01 and 10 map to output 0, while 11 maps to output 1. A single neuron with inputs x1 and x2, weights w1 and w2, and output Y realizes this mapping.]



The AND problem and its neural network
A possible neuron specification to solve the AND problem is given
below. In this solution, when the input is 11, the weighted
sum exceeds the threshold (θ = 0.9), leading to the output 1; otherwise
the neuron gives the output 0.

Here, y = Σ (wi xi) − θ = w1 x1 + w2 x2 − θ, where w1 = 0.5, w2 = 0.5 and θ = 0.9.
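As a quick illustration, here is a minimal Python sketch of this single-neuron solution, assuming a hard-threshold activation (output 1 exactly when the weighted sum exceeds θ):

```python
# Single neuron for the AND problem: w1 = 0.5, w2 = 0.5, theta = 0.9.
def and_neuron(x1, x2, w1=0.5, w2=0.5, theta=0.9):
    y = w1 * x1 + w2 * x2 - theta     # net input minus threshold
    return 1 if y > 0 else 0          # hard-threshold activation

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), and_neuron(x1, x2))   # only (1, 1) gives 1
```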
Single layer feed forward neural network

The concept of the AND problem and its solution with a single
neuron can be extended to multiple neurons.

[Figure: A single layer feed forward neural network. The inputs x1, x2, ..., xm are connected through weights wij to a layer of n neurons f1, f2, ..., fn (with thresholds θk); each neuron k receives the net input Ik and produces the output ok.]





Single layer feed forward neural network

We see that a layer of n neurons constitutes a single layer feed
forward neural network.

It is so called because it contains only a single layer of artificial
neurons.

Note that the input and the output, although called layers because
they receive input signals and transmit output signals, are actually
the boundaries of the architecture and hence not truly layers.

The only layer in the architecture is the layer of synaptic links
carrying the weights, which connect every input to every output neuron.



Modeling SLFFNN

In a single layer neural network, the inputs x1, x2, · · · , xm are
connected to the layer of neurons through the weight matrix W. The
weight matrix Wm×n can be represented as follows.

        | w11  w12  w13  · · ·  w1n |
        | w21  w22  w23  · · ·  w2n |
W   =   |  ·    ·    ·           ·  |            (1)
        |  ·    ·    ·           ·  |
        | wm1  wm2  wm3  · · ·  wmn |

The output of any k-th neuron can be determined as follows:

    Ok = fk ( Σ_{i=1..m} (wik xi) + θk )

where k = 1, 2, 3, · · · , n and θk denotes the threshold value of the k-th
neuron. Such a network is feed forward in type (acyclic in nature) and
hence the name.
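A minimal sketch of this computation in Python (using NumPy; the weights, thresholds and step activation below are illustrative assumptions, not values from the slides):

```python
import numpy as np

# O_k = f_k( sum_i w_ik * x_i + theta_k ) for a single layer of n neurons.
def slffnn_forward(x, W, theta, f=lambda s: (s > 0).astype(int)):
    return f(x @ W + theta)           # W has shape (m, n), x has shape (m,)

x = np.array([1.0, 0.0, 1.0])         # m = 3 inputs
W = np.array([[0.5, -0.2],
              [0.3,  0.8],
              [-0.6, 0.1]])           # n = 2 output neurons
theta = np.array([0.1, -0.4])
print(slffnn_forward(x, W, theta))
```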



Multilayer feed forward neural networks

This network, as its name indicates, is made up of multiple layers.

Thus, architectures of this class, besides having an input and an
output layer, also have one or more intermediate layers called
hidden layers.

The hidden layer(s) aid in performing useful intermediary
computation before directing the input to the output layer.

A multilayer feed forward network with l input neurons (the number of
neurons at the first layer), m1, m2, · · · , mp neurons at the i-th
hidden layer (i = 1, 2, · · · , p), and n neurons at the last (output)
layer is written as an l − m1 − m2 − · · · − mp − n MLFFNN.



Multilayer feed forward neural networks

The figure shows a schematic diagram of a multilayer feed forward neural
network with an l − m − n configuration.
[Figure: An l − m − n multilayer feed forward neural network. The inputs x1, x2, ..., xp feed the first layer of neurons f11, ..., f1l; their outputs feed the hidden layer neurons f21, ..., f2m, which in turn feed the output layer neurons f31, ..., f3n producing the outputs o1, ..., on. Each neuron has its own threshold θij.]





Multilayer feed forward neural networks
In an l − m − n MLFFNN, the first (input) layer contains l
neurons, the only hidden layer contains m neurons, and
the last (output) layer contains n neurons.
The inputs x1, x2, ..., xp are fed to the first layer, and the weight
matrices between the input and the first layer, between the first layer
and the hidden layer, and between the hidden and the last (output) layer
are denoted as W1, W2, and W3, respectively.
Further, consider that f1, f2, and f3 are the transfer functions of the
neurons lying on the first, hidden and last layers, respectively.

Likewise, the threshold value of any i-th neuron in the j-th layer is
denoted by θij.
Moreover, the output of the i-th neuron in any l-th layer is
represented by O_i^l = f_i^l ( Σ (X^l W^l) + θ_i^l ), where X^l is the
input vector to the l-th layer.
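A minimal forward-pass sketch of such a network, assuming sigmoid transfer functions and the layer rule above (the 2-2-1 sizes and random weights are illustrative only):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Apply O^l = f( X^l W^l + theta^l ) layer by layer.
def mlffnn_forward(x, weights, thetas, f=sigmoid):
    out = x
    for W, theta in zip(weights, thetas):
        out = f(out @ W + theta)
    return out

rng = np.random.default_rng(0)
weights = [rng.normal(size=(2, 2)), rng.normal(size=(2, 1))]  # 2-2-1 network
thetas = [np.zeros(2), np.zeros(1)]
print(mlffnn_forward(np.array([1.0, 0.0]), weights, thetas))
```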


Recurrent neural network architecture

These networks differ from feed forward architectures in the sense
that there is at least one "feedback loop".

Thus, in these networks, there could exist one layer with feedback
connections.

There could also be neurons with self-feedback links, that is, the
output of a neuron is fed back into itself as input, as in the sketch below.
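A minimal sketch of a single neuron with a self-feedback link (the weights and sigmoid activation are assumptions for the example, not values from the slides):

```python
import math

# The neuron's previous output o is fed back as an extra input at the
# next time step, giving the unit a "memory" of earlier inputs.
def recurrent_neuron(xs, w_in=0.8, w_fb=0.5, theta=0.0):
    o, outputs = 0.0, []
    for x in xs:                                  # one input per time step
        s = w_in * x + w_fb * o - theta
        o = 1.0 / (1.0 + math.exp(-s))            # sigmoid activation
        outputs.append(round(o, 3))
    return outputs

print(recurrent_neuron([1.0, 0.0, 0.0, 1.0]))
```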



Recurrent neural network architecture

Depending on the type of feedback loops, several recurrent neural
networks are known, such as the Hopfield network, the Boltzmann machine
network, etc.
Training Perceptrons
For AND
[Figure: A perceptron with bias input X0 (weight W0 = ?), inputs X1, X2 (weights W1 = ?, W2 = ?) and threshold th = 0.0, shown alongside the target AND truth table: (X1, X2, O) = (0,0,0), (0,1,0), (1,0,0), (1,1,1).]

•What are the weight values?


•Initialize with random weight values
Summation = W0 + X1 W1 + X2 W2

Training Perceptrons
For AND
[Figure: The same perceptron with initial random weights W0 = -0.3, W1 = 0.5, W2 = -0.4 and threshold th = 0.0, shown alongside the AND truth table.]

X0 X1 X2 Summation Output
1 0 0 (-1*0.3) + (0*0.5) + (0*-0.4) = -0.3 0
1 0 1 (-1*0.3) + (0*0.5) + (1*-0.4) = -0.7 0
1 1 0 (-1*0.3) + (1*0.5) + (0*-0.4) = 0.2 1
1 1 1 (-1*0.3) + (1*0.5) + (1*-0.4) = -0.2 0
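The table above can be reproduced with a short sketch (X0 is the fixed bias input 1; the output is 1 when the summation exceeds th = 0.0):

```python
W0, W1, W2, th = -0.3, 0.5, -0.4, 0.0

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    s = W0 * 1 + W1 * x1 + W2 * x2        # summation with bias input X0 = 1
    print((1, x1, x2), round(s, 2), 1 if s > th else 0)
```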
Gradient Descent Learning Rule

Consider a linear unit without a threshold and with
continuous output o (not just -1, 1):

    o = w0 + w1 x1 + … + wn xn

Train the wi's such that they minimize the squared error

    E[w1, …, wn] = ½ Σ_{d∈D} (td - od)²

where D is the set of training examples.

Gradient Descent

Gradient: ∇E[w] = [∂E/∂w0, …, ∂E/∂wn]

[Figure: The error surface E over the weights (w1, w2); one gradient descent step moves from (w1, w2) to (w1 + Δw1, w2 + Δw2).]

Training rule: Δw = -η ∇E[w], that is, for each weight

    Δwi = -η ∂E/∂wi
        = -η ∂/∂wi ½ Σ_d (td - od)²
        = -η ∂/∂wi ½ Σ_d (td - Σ_i wi xi)²
        = η Σ_d (td - od) xi
Gradient Descent
Gradient-Descent(training_examples, η)
Each training example is a pair of the form <(x1, …, xn), t>,
where (x1, …, xn) is the vector of input values and t is the target output
value.
Initialize each wi to some small random value
Until the termination condition is met, Do
    Initialize each Δwi to zero
    For each <(x1, …, xn), t> in training_examples, Do
        Input the instance (x1, …, xn) to the linear unit and compute the
        output o
        For each linear unit weight wi, Do
            Δwi = Δwi + η (t - o) xi
    For each linear unit weight wi, Do
        wi = wi + Δwi

Weight Updation
The weights are updated in batch as Wi = Wi + Σd (td - od) Xid (learning
rate taken as 1), using the outputs obtained with the previous weights:
W0= -0.3 + [(0-0)1+(0-0)1+(0-1)1+(1-0)1]= -0.3
W1= 0.5 + [(0-0)0+(0-0)0+(0-1)1+(1-0)1]= 0.5
W2= -0.4 + [(0-0)0+(0-0)1+(0-1)0+(1-0)1]= 0.6

X0 X1 X2 Summation Output
1 0 0 (-1*0.3) + (0*0.5) + (0*0.6) = -0.3 0
1 0 1 (-1*0.3) + (0*0.5) + (1*0.6) = 0.3 1
1 1 0 (-1*0.3) + (1*0.5) + (0*0.6) = 0.2 1
1 1 1 (-1*0.3) + (1*0.5) + (1*0.6) = 0.8 1
Weight Updation
W0= -0.3 + [(0-0)1+(0-1)1+(0-1)1+(1-1)1]= -2.3
W1= 0.5 + [(0-0)0+(0-1)0+(0-1)1+(1-1)1]= -0.5
W2= 0.6 + [(0-0)0+(0-1)1+(0-1)0+(1-1)1]= -0.4

X0 X1 X2 Summation Output
1 0 0 (-1*2.3) + (-0*0.5) + (-0*0.4) = -2.3 0
1 0 1 (-1*2.3) + (-0*0.5) + (-1*0.4) = -2.7 0
1 1 0 (-1*2.3) + (-1*0.5) + (-0*0.4) = -2.8 0
1 1 1 (-1*2.3) + (-1*0.5) + (-1*0.4) = -3.2 0
Weight Updation

W0= -2.3 + [(0-0)1+(0-0)1+(0-0)1+(1-0)1]= -1.3


W1= -0.5 + [(0-0)0+(0-0)0+(0-0)1+(1-0)1]= 0.5
W2= -0.4 + [(0-0)0+(0-0)1+(0-0)0+(1-0)1]= 0.6

X0 X1 X2 Summation Output
1 0 0 (-1*1.3) + (0*0.5) + (0*0.6) = -1.3 0
1 0 1 (-1*1.3) + (0*0.5) + (1*0.6) = -0.7 0
1 1 0 (-1*1.3) + (1*0.5) + (0*0.6) = -0.8 0
1 1 1 (-1*1.3) + (1*0.5) + (1*0.6) = -0.2 0
Weight Updation

W0= -1.3 + [(0-0)1+(0-0)1+(0-0)1+(1-0)1]= -0.3


W1= 0.5 + [(0-0)0+(0-0)0+(0-0)1+(1-0)1]= 1.5
W2= 0.6 + [(0-0)0+(0-0)1+(0-0)0+(1-0)1]= 1.6

X0 X1 X2 Summation Output
1 0 0 (-1*0.3) + (0*1.5) + (0*1.6) = -0.3 0
1 0 1 (-1*0.3) + (0*1.5) + (1*1.6) = 1.3 1
1 1 0 (-1*0.3) + (1*1.5) + (0*1.6) = 1.2 1
1 1 1 (-1*0.3) + (1*1.5) + (1*1.6) = 2.8 1
Weight Updation
W0= -0.3 + [(0-0)1+(0-1)1+(0-1)1+(1-1)1]= -2.3
W1= 1.5 + [(0-0)0+(0-1)0+(0-1)1+(1-1)1]= 0.5
W2= 1.6 + [(0-0)0+(0-1)1+(0-1)0+(1-1)1]= 0.6

X0 X1 X2 Summation Output
1 0 0 (-1*2.3) + (0*0.5) + (0*0.6) = -2.3 0
1 0 1 (-1*2.3) + (0*0.5) + (1*0.6) = -1.7 0
1 1 0 (-1*2.3) + (1*0.5) + (0*0.6) = -1.8 0
1 1 1 (-1*2.3) + (1*0.5) + (1*0.6) = -1.2 0
Weight Updation

W0= -2.3 + [(0-0)1+(0-0)1+(0-0)1+(1-0)1]= -1.3


W1= 0.5 + [(0-0)0+(0-0)0+(0-0)1+(1-0)1]= 1.5
W2= 0.6 + [(0-0)0+(0-0)1+(0-0)0+(1-0)1]= 1.6

X0 X1 X2 Summation Output
1 0 0 (-1*1.3) + (0*1.5) + (0*1.6) = -1.3 0
1 0 1 (-1*1.3) + (0*1.5) + (1*1.6) = 0.3 1
1 1 0 (-1*1.3) + (1*1.5) + (0*1.6) = 0.2 1
1 1 1 (-1*1.3) + (1*1.5) + (1*1.6) = 1.8 1
Weight Updation
W0= -1.3 + [(0-0)1+(0-1)1+(0-1)1+(1-1)1]= -3.3
W1= 1.5 + [(0-0)0+(0-1)0+(0-1)1+(1-1)1]= 0.5
W2= 1.6 + [(0-0)0+(0-1)1+(0-1)0+(1-1)1]= 0.6

X0 X1 X2 Summation Output
1 0 0 (-1*3.3) + (0*0.5) + (0*0.6) = -3.3 0
1 0 1 (-1*3.3) + (0*0.5) + (1*0.6) = -2.7 0
1 1 0 (-1*3.3) + (1*0.5) + (0*0.6) = -2.8 0
1 1 1 (-1*3.3) + (1*0.5) + (1*0.6) = -2.2 0
Weight Updation
W0= -3.3 + [(0-0)1+(0-0)1+(0-0)1+(1-0)1]= -2.3
W1= 0.5 + [(0-0)0+(0-0)0+(0-0)1+(1-0)1]= 1.5
W2= 0.6 + [(0-0)0+(0-0)1+(0-0)0+(1-0)1]= 1.6

X0 X1 X2 Summation Output
1 0 0 (-1*2.3) + (0*1.5) + (0*1.6) = -2.3 0
1 0 1 (-1*2.3) + (0*1.5) + (1*1.6) = -0.7 0
1 1 0 (-1*2.3) + (1*1.5) + (0*1.6) = -0.8 0
1 1 1 (-1*2.3) + (1*1.5) + (1*1.6) = 0.8 1
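The whole walk-through can be reproduced by a short sketch that repeats the batch update Wi = Wi + Σd (td - od) Xid until every pattern is classified correctly (starting weights and threshold th = 0.0 as above):

```python
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]   # AND patterns
w = [-0.3, 0.5, -0.4]                                         # W0, W1, W2
th = 0.0

for _ in range(20):
    outputs = [1 if w[0] + w[1]*x1 + w[2]*x2 > th else 0 for (x1, x2), _ in data]
    if outputs == [t for _, t in data]:
        break                                  # all four patterns correct
    for ((x1, x2), t), o in zip(data, outputs):
        w[0] += (t - o) * 1                    # bias input X0 = 1
        w[1] += (t - o) * x1
        w[2] += (t - o) * x2

print(w, outputs)   # e.g. [-2.3, 1.5, 1.6], realizing AND
```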
Decision Surface of a Perceptron
[Figure: Two scatter plots in the x1-x2 plane. Left: a linearly separable arrangement of + and - points that a single straight line can split. Right: a non-linearly separable arrangement that no single straight line can split.]

•But some functions are not linearly separable (e.g. XOR).

•XOR can be solved as:
  XOR(x1, x2) = AND( OR(x1, x2), NAND(x1, x2) )
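A one-line check of this decomposition in Python (the helper gate functions are just for illustration):

```python
def OR(a, b):   return int(a or b)
def AND(a, b):  return int(a and b)
def NAND(a, b): return 1 - AND(a, b)
def XOR(a, b):  return AND(OR(a, b), NAND(a, b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, XOR(a, b))   # prints 0, 1, 1, 0
```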

Why different types of neural network architectures?
To answer this question, let us first consider the case of a neural
network with a single neuron and two inputs, as shown below.

[Figure: A single neuron with inputs x1, x2 (weights w1, w2) and bias weight w0, together with the corresponding straight line in the x1-x2 plane.]

f = w0 θ + w1 x1 + w2 x2
  = b0 + w1 x1 + w2 x2
Revisit of a single neural network
Note that f = b0 + w1x1 + w2x2 denotes a straight line in the
x1-x2 plane (as shown in the figure on the right in the last slide).
Now, depending on the values of w1 and w2, this line partitions the
points (x1, x2) obtained for different input values.
We say that these points are linearly separable if the straight
line f separates them into two classes.
Linearly separable and non-separable points are further illustrated
in the following figures.



AND and XOR problems

To illustrate the concept of linearly separable and non-separable tasks
to be accomplished by a neural network, let us consider the AND problem
and the XOR problem.

    AND Problem              XOR Problem
    x1  x2 | y               x1  x2 | y
    0   0  | 0               0   0  | 0
    0   1  | 0               0   1  | 1
    1   0  | 0               1   0  | 1
    1   1  | 1               1   1  | 0



AND problem is linearly separable

The AND logic:

    x1  x2 | y
    0   0  | 0
    0   1  | 0
    1   0  | 0
    1   1  | 1

[Figure: The four input points in the x1-x2 plane; (0,0), (0,1) and (1,0) have y = 0, while (1,1) has y = 1. The straight line f = 0.5 x1 + 0.5 x2 - 0.9 = 0 separates the two classes, so the AND-problem is linearly separable.]



XOR problem is linearly non-separable

The XOR problem:

    x1  x2 | y
    0   0  | 0
    0   1  | 1
    1   0  | 1
    1   1  | 0

[Figure: The four input points in the x1-x2 plane; (0,1) and (1,0) have y = 1, while (0,0) and (1,1) have y = 0. No single straight line can separate the two classes, so the XOR-problem is not linearly separable.]



Our observations

From the examples discussed, we understand that for the AND problem a
straight line exists that separates the inputs into the two output
classes 0 and 1.
However, in the case of the XOR problem, such a line is not possible.
Note: a horizontal or a vertical line is not admissible in the XOR case,
because such a line would completely ignore one of the inputs.



Example

So, for a two-class classification problem, if there is a straight line
that acts as a decision boundary, then we say that the problem is
linearly separable; otherwise, it is non-linearly separable.

The same concept can be extended to an n-dimensional classification
problem. Such a problem can be represented in an n-dimensional space,
and the decision boundary would then be of n − 1 dimensions, separating
the given sets.

In fact, any linearly separable problem can be solved with a single
layer feed forward neural network, for example, the AND problem.

On the other hand, if the problem is non-linearly separable, then a
single layer neural network cannot solve it.
To solve such a problem, a multilayer feed forward neural network is
required.
Example: Solving XOR problem

[Figure: A 2-2-1 neural network for the XOR problem. The inputs X1 and X2 connect with weights 1 to two hidden neurons, one with threshold 0.5 and one with threshold 1.5; the hidden outputs connect to the output neuron f (threshold 0.5) with weights 1 and -1.]

Neural network for the XOR-problem
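A minimal sketch of this two-layer network, assuming the usual reading of the figure: one hidden unit acts as OR (threshold 0.5), the other as AND (threshold 1.5), and the output unit combines them with weights 1 and -1 (threshold 0.5):

```python
def step(s):
    return 1 if s > 0 else 0

def xor_net(x1, x2):
    h1 = step(1*x1 + 1*x2 - 0.5)    # fires if at least one input is 1 (OR)
    h2 = step(1*x1 + 1*x2 - 1.5)    # fires only if both inputs are 1 (AND)
    return step(1*h1 - 1*h2 - 0.5)  # fires if h1 and not h2, i.e. XOR

for p in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(p, xor_net(*p))           # 0, 1, 1, 0
```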



Dynamic neural network

In some cases, the output needs to be compared with its target
value to determine the error, if any.

Based on this, a neural network can be categorized as either a
static neural network or a dynamic neural network.

In a static neural network, the error in prediction is neither
calculated nor fed back for updating the neural network.

On the other hand, in a dynamic neural network, the error is
determined and then fed back to the network to modify its
weights (or architecture, or both).



Dynamic neural network

[Figure: Framework of a dynamic neural network. Inputs feed the neural network architecture, which produces an output; the output is compared with the target in an error-calculation block, and the error is fed back to adjust the weights and/or the architecture.]



Dynamic neural network

From the above discussions, we conclude that:

Linearly separable problems can be solved with a single layer feed
forward neural network.

Non-linearly separable problems require a multilayer feed forward
neural network.

Problems that require error calculation and feedback are solved using
recurrent neural networks as well as dynamic neural networks.

