
# A MATLAB-BASED APPROACH FOR TESTING LINEAR CLASSIFICATION OF A NEURAL NETWORK AND SOLVING THE NONLINEARITY PROBLEM (XOR PROBLEM) OF NEURAL NETWORKS
AL AMIN SHOHAG¹, Dr. UZZAL KUMAR ACHARJA²

CSE Department, Jagannath University, Dhaka

ABSTRACT
Neural networks are a foundational technique in machine learning and provide a basis for machines
to perform tasks in a human-like way. To do so, a machine must handle many categories of data
from the analog world. Most analog-world data, however, is non-linear, and this non-linearity
raises a problem for neural networks: a single-layer neural network classifies a dataset linearly,
so it can only handle problems that are linearly separable. A neural network therefore needs a way
to deal with non-linearity. In this work, we test this linearity characteristic of a neural network
using the OR and AND datasets, which are linearly separable. We then discuss the non-linearity
problem of neural networks using the XOR dataset. Finally, we solve this non-linearity problem and
demonstrate the solution using MATLAB.

KEYWORDS
Neural Network, Linearity, Perceptron, Back propagation algorithm, XOR, MATLAB

## 1. Introduction
An artificial neural network tries to mimic the neural network of the human brain. The brain's
neural network consists of many interconnected neurons; similarly, an artificial neural network
consists of many artificial neurons. This structure allows it to reproduce, in simplified form,
some of the information-processing behavior of the brain.

### 1.1 Artificial neural network

A neural network is a massively parallel distributed processor that has a natural propensity for
storing experiential knowledge and making it available for use. In its most general form, a neural
network is a machine that is designed to model the way in which the brain performs a particular
task or function of interest. The network is usually implemented using electronic components or
simulated in software on a digital computer. In most cases, the interest is confined largely to an
important class of neural networks that perform useful computations through a process of
learning. Such a network resembles the brain in two respects:

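1. Knowledge is acquired by the network through a learning process.
2. Interneuron connection strengths, known as synaptic weights, are used to store the knowledge.

A neural network derives its computing power from two properties:

1. Its massively parallel distributed structure.
2. Its ability to generalize, that is, to produce reasonable outputs for inputs not encountered during learning.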

An artificial neural network consists of many neurons. The basic neuron model consists of a set of
synaptic inputs, each associated with a synaptic weight; a summing junction that forms the sum of
the products of the synaptic weights and the inputs; and an activation function that limits the
amplitude of the neuron's output. A basic neuron model is shown below:

Fig: An Artificial Neuron
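As a minimal sketch of this neuron model in plain MATLAB, the lines below form the weighted sum of three inputs at the summing junction and pass it through a logistic sigmoid. The input values, the weights, and the choice of the logistic function as the activation are illustrative assumptions, not values from this paper.

```matlab
% Basic neuron model: summing junction followed by an activation function.
% All numeric values and the logistic activation are illustrative assumptions.
x = [0.5; -1.0; 2.0];            % synaptic inputs x1, x2, x3
w = [0.4, 0.3, -0.2];            % synaptic weights, one per input
v = w * x;                       % summing junction: sum of weight-input products
phi = @(v) 1 ./ (1 + exp(-v));   % logistic sigmoid activation
y = phi(v);                      % neuron output, limited to the interval (0, 1)
fprintf('v = %.2f, y = %.4f\n', v, y);
```

Any other squashing function (for example a hard limiter or `tanh`) could be substituted for `phi` without changing the summing junction.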

### 1.2 Linearity

A neural network takes a problem and tries to generalize it into classes. In the linear approach,
the network separates the classes with straight lines: it draws one or more linear boundaries that
divide the dataset into classes based on similar features of the data. For example, the AND and
OR operations have the following datasets:

| A | B | A AND B |
|---|---|---------|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

Fig: AND operation

| A | B | A OR B |
|---|---|--------|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |

Fig: OR operation

With these datasets, a neural network tries to classify the outputs into two classes by producing a
linear boundary. For AND, one side of the boundary contains the three input points whose output is
0, and the other side contains the single point (1, 1) whose output is 1. For OR, one side contains
the three input points whose output is 1, and the other side contains the single point (0, 0) whose
output is 0. To classify these datasets, a single-layer perceptron is therefore sufficient.
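To make the boundary concrete, a minimal sketch checks one possible separating line for each dataset: points with x1 + x2 > 1.5 are labeled 1 for AND, and points with x1 + x2 > 0.5 are labeled 1 for OR. These particular lines are our own illustrative choice; any line that isolates the single odd point on its own side works equally well.

```matlab
% Check that one straight line x1 + x2 = c separates each dataset.
% The constants c = 1.5 (AND) and c = 0.5 (OR) are illustrative choices.
X = [0 0; 0 1; 1 0; 1 1];           % input points (A, B), one per row
andTargets = [0 0 0 1];
orTargets  = [0 1 1 1];
s = (X(:, 1) + X(:, 2))';           % x1 + x2 for each point, as a row vector
andSide = double(s > 1.5);          % 1 = point lies above the line x1 + x2 = 1.5
orSide  = double(s > 0.5);          % 1 = point lies above the line x1 + x2 = 0.5
disp([andSide; andTargets]);        % the two rows match for AND
disp([orSide; orTargets]);          % the two rows match for OR
```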

In the case of XOR, however, the output is not linearly separable, so a single perceptron cannot
classify it with one linear boundary. The XOR dataset is shown below:

| A | B | A XOR B |
|---|---|---------|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

Fig: XOR dataset

In this case, a multilayer perceptron is needed. We will see how a multilayer perceptron can solve
this problem in later sections.
## 2. Perceptron
A perceptron is the simplest form of a neural network used for the classification of a special type
of dataset, called linearly separable, in which the classes lie on opposite sides of a hyperplane. A
perceptron is shown below:

Fig: perceptron

In the case of an elementary perceptron, there are two decision regions separated by a hyperplane
defined by the equation

$$\sum_{i=1}^{l} w_{ki}\, x_i - \theta = 0$$

where $w_{ki}$ are the synaptic weights, $x_i$ are the inputs, and $\theta$ is the threshold value. For
example, a single-layer perceptron can classify the OR and AND datasets linearly, because these
datasets are linearly separable.

Fig: OR in the (x1, x2) input plane

Fig: AND in the (x1, x2) input plane
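The experiments in Section 3.3 train this kind of perceptron with MATLAB's `perceptron` function. As an illustrative sketch of how such a boundary can be learned, the plain-MATLAB loop below applies the classic perceptron learning rule to the AND and OR datasets, using the hyperplane form above with output 1 when the weighted sum minus $\theta$ is nonnegative. The zero initial weights, the learning rate $\eta = 1$ and the epoch cap are our assumptions, and we do not claim this loop reproduces the toolbox's internal training exactly. Because both datasets are linearly separable, the perceptron convergence theorem guarantees that the loop stops after a finite number of corrections.

```matlab
% Classic perceptron learning rule in plain MATLAB (no toolbox).
% Zero initial weights, eta = 1 and the epoch cap are illustrative assumptions.
X = [0 0; 0 1; 1 0; 1 1];              % input patterns (x1, x2), one per row
D = [0 0 0 1;                          % row 1: AND targets
     0 1 1 1];                         % row 2: OR targets
eta = 1;                               % learning rate
maxEpochs = 100;                       % safety cap on the number of epochs
Y = zeros(size(D));                    % trained outputs, one row per dataset
for k = 1:size(D, 1)
    w = [0 0];                         % synaptic weights w_k1, w_k2
    theta = 0;                         % threshold
    for epoch = 1:maxEpochs
        nErrors = 0;
        for n = 1:size(X, 1)
            y = double(w * X(n, :)' - theta >= 0);  % hard-limit output
            err = D(k, n) - y;                      % error for this pattern
            w = w + eta * err * X(n, :);            % weight correction proportional to the error
            theta = theta - eta * err;              % threshold updated like a weight on input -1
            nErrors = nErrors + (err ~= 0);
        end
        if nErrors == 0
            break;                     % an error-free epoch: the data are separated
        end
    end
    Y(k, :) = double(X * w' - theta >= 0)';
    fprintf('dataset %d: w = [%g %g], theta = %g, epochs = %d\n', k, w, theta, epoch);
end
```

Replacing a row of `D` with the XOR targets `[0 1 1 0]` makes the loop run to the epoch cap without ever reaching an error-free epoch, since an error-free epoch would itself be a separating line.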

However, a single-layer perceptron cannot classify datasets that are not linearly separable, such as the XOR dataset.

Fig: XOR in the (x1, x2) input plane

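As the figure shows, the XOR dataset is not linearly separable: the two input points with output 1
lie on one diagonal of the unit square and the two points with output 0 lie on the other diagonal,
so no single straight line can put each pair on its own side. To solve this problem, we need a
multilayer perceptron. In the next section, we discuss the multilayer perceptron and how it solves
this problem using the back-propagation algorithm.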
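The claim can also be checked with a short argument. Assume, as in the perceptron's hyperplane equation, that a perceptron with weights $w_1$, $w_2$ and threshold $\theta$ outputs 1 when $w_1 x_1 + w_2 x_2 - \theta \ge 0$ and 0 otherwise (using $\ge$ at the boundary is our assumption; using $>$ leads to the same contradiction). Requiring the standard XOR outputs, 1 exactly when the inputs differ, gives four conditions:

$$
\begin{aligned}
(0,0) \mapsto 0 &: \quad -\theta < 0 \;\Rightarrow\; \theta > 0 \\
(0,1) \mapsto 1 &: \quad w_2 - \theta \ge 0 \;\Rightarrow\; w_2 \ge \theta \\
(1,0) \mapsto 1 &: \quad w_1 - \theta \ge 0 \;\Rightarrow\; w_1 \ge \theta \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 - \theta < 0 \;\Rightarrow\; w_1 + w_2 < \theta
\end{aligned}
$$

Adding the second and third conditions gives $w_1 + w_2 \ge 2\theta$, and since $\theta > 0$ this means $w_1 + w_2 > \theta$, which contradicts the fourth condition. Hence no choice of $w_1$, $w_2$ and $\theta$ classifies all four XOR patterns correctly.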
## 3. Multilayer perceptron
A multilayer perceptron has an input layer of source nodes, one or more hidden layers of computation
nodes, and an output layer of computation nodes. A multilayer perceptron is shown below:

Fig: XOR in the (x1, x2) input plane

### 3.1 Back-Propagation Algorithm

Multilayer perceptrons have been applied successfully to solve some difficult and diverse
problems by training them in a supervised manner with a highly popular algorithm known as the
error back-propagation algorithm. This algorithm is based on the error-correction learning rule.
Error back propagation learning consists of two passes through the different layers of the
network, a forward pass and a backward pass.

In the forward pass, an activity pattern is applied to the sensory nodes of the network and its
effect propagates through the network layer by layer. Finally, a set of outputs is produced as the
actual response of the network. During the forward pass, the synaptic weights of the network
are all fixed.

During the backward pass, on the other hand, the synaptic weights are all adjusted in accordance
with an error-correction rule. Specifically, the actual response of the network is subtracted from a
desired response to produce an error signal. This error signal is then propagated backward
through the network against the direction of the synaptic connections, hence the name "error
back-propagation". The synaptic weights are adjusted to make the actual response of the network
move closer to the desired response in a statistical sense.

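The error signal at the output of neuron $j$ at iteration $n$ (the presentation of the $n$th training
example) is defined by

$$e_j(n) = d_j(n) - y_j(n), \qquad \text{neuron } j \text{ is an output node}$$

where $d_j(n)$ is the desired response. We define the instantaneous value of the error energy for
neuron $j$ as $\frac{1}{2}e_j^2(n)$. Correspondingly, the instantaneous value $\mathcal{E}(n)$ of the total
error energy is obtained by summing $\frac{1}{2}e_j^2(n)$ over all neurons in the output layer; these are
the only visible neurons for which error signals can be calculated directly. We may thus write

$$\mathcal{E}(n) = \frac{1}{2}\sum_{j \in C} e_j^2(n)$$

where the set $C$ includes all the neurons in the output layer of the network.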

The instantaneous error energy $\mathcal{E}(n)$, and therefore the average error energy
$\mathcal{E}_{av} = \frac{1}{N}\sum_{n=1}^{N}\mathcal{E}(n)$ over a training set of $N$ patterns, is a
function of all the free parameters (synaptic weights and biases) of the network. For a given
training set, $\mathcal{E}_{av}$ represents the cost function as a measure of learning performance, and
the objective of the learning process is to adjust the free parameters of the network to minimize
$\mathcal{E}_{av}$. To do this, we consider a simple method of training in which the weights are
updated on a pattern-by-pattern basis until one epoch, that is, one complete presentation of the
entire training set, has been dealt with. The adjustments to the weights are made in accordance
with the respective errors computed for each pattern presented to the network.
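As a small worked example of these definitions, suppose a network with a single output neuron produces the hypothetical outputs below for the four XOR patterns. The lines compute each error signal $e_j(n)$, the instantaneous error energy $\mathcal{E}(n)$ (which for one output neuron is just $\frac{1}{2}e_j^2(n)$), and their average $\mathcal{E}_{av}$. The output values are invented for illustration only.

```matlab
% Error signals, instantaneous error energy E(n), and average error energy E_av
% for one output neuron. The outputs y are hypothetical values for illustration.
d = [0 1 1 0];                 % desired responses for the four XOR patterns
y = [0.2 0.7 0.6 0.3];         % hypothetical network outputs y_j(n)
e = d - y;                     % error signals e_j(n) = d_j(n) - y_j(n)
En = 0.5 * e.^2;               % E(n): with one output neuron, (1/2) e_j(n)^2
Eav = mean(En);                % E_av = (1/N) * sum over n of E(n), here N = 4
fprintf('E(n) = %s, E_av = %.4f\n', mat2str(En, 4), Eav);
```

Here the errors are -0.2, 0.3, 0.4 and -0.3, so $\mathcal{E}(n)$ takes the values 0.02, 0.045, 0.08 and 0.045, and $\mathcal{E}_{av} = 0.19/4 = 0.0475$.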

The induced local field $v_j(n)$ produced at the input of the activation function associated with
neuron $j$ is therefore

$$v_j(n) = \sum_{i=0}^{m} w_{ji}(n)\, y_i(n)$$

where $m$ is the total number of inputs (excluding the bias) applied to neuron $j$. The synaptic
weight $w_{j0}$, corresponding to the fixed input $y_0 = +1$, equals the bias $b_j$ applied to neuron $j$.
Hence, the function signal $y_j(n)$ appearing at the output of neuron $j$ at iteration $n$ is

$$y_j(n) = \varphi_j\big(v_j(n)\big)$$

where $\varphi_j(\cdot)$ is the activation function of neuron $j$.
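A minimal sketch of the bias convention: folding the bias $b_j$ into the weight vector as $w_{j0}$, with a fixed input $y_0 = +1$, gives the same induced local field as adding the bias separately. The previous-layer outputs, the weights, the bias, and the use of `tanh` as $\varphi_j$ are illustrative assumptions.

```matlab
% Induced local field with the bias folded in as weight w_j0 on a fixed input y_0 = +1.
% All numeric values and the tanh activation are illustrative assumptions.
yPrev = [0.9; 0.1];            % outputs y_1, y_2 of the previous layer
wj    = [0.6, -0.4];           % weights w_j1, w_j2
bj    = -0.25;                 % bias b_j

vExplicit = wj * yPrev + bj;   % v_j = sum_{i=1}^{m} w_ji * y_i + b_j
wAug = [bj, wj];               % w_j0 = b_j placed in front of the other weights
yAug = [1; yPrev];             % y_0 = +1 placed in front of the other inputs
vFolded = wAug * yAug;         % v_j = sum_{i=0}^{m} w_ji * y_i
yj = tanh(vFolded);            % y_j = phi_j(v_j), here with phi_j = tanh
fprintf('explicit v = %.4f, folded v = %.4f, y = %.4f\n', vExplicit, vFolded, yj);
```

Folding the bias in this way lets every layer be computed as a single matrix-vector product, which is how the sums in the algorithm summary below are written.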

The back-propagation algorithm applies a correction $\Delta w_{ji}(n)$ to the synaptic weight $w_{ji}(n)$
that is proportional to the partial derivative $\partial\mathcal{E}(n)/\partial w_{ji}(n)$. According to the
chain rule of calculus, we may express this gradient as

$$\frac{\partial \mathcal{E}(n)}{\partial w_{ji}(n)} = \frac{\partial \mathcal{E}(n)}{\partial e_j(n)}\,\frac{\partial e_j(n)}{\partial y_j(n)}\,\frac{\partial y_j(n)}{\partial v_j(n)}\,\frac{\partial v_j(n)}{\partial w_{ji}(n)}$$

The partial derivative $\partial\mathcal{E}(n)/\partial w_{ji}(n)$ represents a sensitivity factor, determining
the direction of search in weight space for the synaptic weight $w_{ji}(n)$.

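Now let us calculate each factor of $\partial\mathcal{E}(n)/\partial w_{ji}(n)$ in the equation above.
Differentiating $\mathcal{E}(n)$ with respect to $e_j(n)$ gives

$$\frac{\partial \mathcal{E}(n)}{\partial e_j(n)} = e_j(n)$$

Since $e_j(n) = d_j(n) - y_j(n)$,

$$\frac{\partial e_j(n)}{\partial y_j(n)} = -1$$

Since $y_j(n) = \varphi_j(v_j(n))$,

$$\frac{\partial y_j(n)}{\partial v_j(n)} = \varphi_j'\big(v_j(n)\big)$$

And for the last factor, since $v_j(n) = \sum_{i=0}^{m} w_{ji}(n)\, y_i(n)$, we have

$$\frac{\partial v_j(n)}{\partial w_{ji}(n)} = y_i(n)$$

Thus, the partial derivative $\partial\mathcal{E}(n)/\partial w_{ji}(n)$ becomes

$$\frac{\partial \mathcal{E}(n)}{\partial w_{ji}(n)} = -e_j(n)\,\varphi_j'\big(v_j(n)\big)\,y_i(n)$$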

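The correction $\Delta w_{ji}(n)$ applied to $w_{ji}(n)$ is defined by the delta rule

$$\Delta w_{ji}(n) = -\eta\,\frac{\partial \mathcal{E}(n)}{\partial w_{ji}(n)}$$

where $\eta$ is the learning-rate parameter of the back-propagation algorithm. The minus sign accounts
for gradient descent in weight space: each weight is changed in the direction that reduces $\mathcal{E}(n)$.
Accordingly,

$$\Delta w_{ji}(n) = \eta\,\delta_j(n)\,y_i(n)$$

where the local gradient $\delta_j(n)$ is defined by

$$\delta_j(n) = -\frac{\partial \mathcal{E}(n)}{\partial v_j(n)} = e_j(n)\,\varphi_j'\big(v_j(n)\big)$$

The local gradient points to the required changes in the synaptic weights.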
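The derivation can be checked numerically. The sketch below takes a single output neuron with illustrative weights and inputs, computes $\partial\mathcal{E}(n)/\partial w_{ji}(n) = -e_j(n)\,\varphi_j'(v_j(n))\,y_i(n)$ analytically, and compares it with a central finite difference of $\mathcal{E}(n)$; it then applies one delta-rule correction. The logistic activation, for which $\varphi'(v) = \varphi(v)[1 - \varphi(v)]$, and all numbers are assumptions made for the example.

```matlab
% Check dE/dw_ji = -e_j * phi'(v_j) * y_i against a central finite difference.
% Logistic activation and all numeric values are illustrative assumptions.
phi  = @(v) 1 ./ (1 + exp(-v));             % logistic activation
y_in = [1; 0.3; -0.7];                      % y_0 = +1 (bias input), y_1, y_2
w    = [0.2, -0.5, 0.8];                    % w_j0 (bias), w_j1, w_j2
dj   = 1;                                   % desired response d_j(n)
E    = @(w) 0.5 * (dj - phi(w * y_in))^2;   % E(n) for a single output neuron

v  = w * y_in;                              % induced local field v_j(n)
yj = phi(v);                                % output y_j(n)
ej = dj - yj;                               % error signal e_j(n)
delta = ej * yj * (1 - yj);                 % local gradient e_j * phi'(v_j)
gradAnalytic = -delta * y_in';              % dE/dw_ji = -delta_j * y_i

h = 1e-6;                                   % finite-difference step
gradNumeric = zeros(size(w));
for i = 1:numel(w)
    wp = w; wp(i) = wp(i) + h;
    wm = w; wm(i) = wm(i) - h;
    gradNumeric(i) = (E(wp) - E(wm)) / (2 * h);   % central difference
end

eta = 0.1;                                  % learning rate (illustrative)
dw = eta * delta * y_in';                   % delta rule: eta * delta_j * y_i
disp([gradAnalytic; gradNumeric]);          % the two rows should agree closely
fprintf('E before = %.6f, E after one step = %.6f\n', E(w), E(w + dw));
```

Agreement between the two rows confirms the sign and the factors of the chain rule, and the second line shows that the delta-rule step lowers $\mathcal{E}(n)$ for this pattern.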

The back-propagation algorithm can be summarized as follows:

1. Initialization: Assuming that no prior information is available, pick the synaptic weights
and thresholds from a uniform distribution whose mean is zero and whose variance is
chosen to make the standard deviation of the induced local fields of the neurons lie at the
transition between the linear and saturated parts of the sigmoid activation function.

2. Presentation of training examples: Present the network with an epoch of training examples. For
each example in the set, ordered in some fashion, perform the sequence of forward and backward
computations described under points 3 and 4, respectively.

3. Forward computation: Let a training example in the epoch be denoted by $(\mathbf{x}(n), \mathbf{d}(n))$,
with the input vector $\mathbf{x}(n)$ applied to the input layer of sensory nodes and the desired
response vector $\mathbf{d}(n)$ presented to the output layer of computation nodes. Compute the
induced local fields and function signals of the network by proceeding forward through the
network, layer by layer. The induced local field $v_j^{(l)}(n)$ for neuron $j$ in layer $l$ is

$$v_j^{(l)}(n) = \sum_{i=0}^{m_{l-1}} w_{ji}^{(l)}(n)\, y_i^{(l-1)}(n)$$

where $m_{l-1}$ is the number of neurons in layer $l-1$, $y_i^{(l-1)}(n)$ is the output signal of neuron $i$
in the previous layer $l-1$ at iteration $n$, and $w_{ji}^{(l)}(n)$ is the synaptic weight of neuron $j$ in
layer $l$ that is fed from neuron $i$ in layer $l-1$. For $i = 0$, we have $y_0^{(l-1)}(n) = +1$, and
$w_{j0}^{(l)}(n) = b_j^{(l)}(n)$ is the bias applied to neuron $j$ in layer $l$. Assuming the use of a sigmoid
function, the output signal of neuron $j$ in layer $l$ is

$$y_j^{(l)}(n) = \varphi_j\big(v_j^{(l)}(n)\big)$$

For the input layer ($l = 0$), set

$$y_j^{(0)}(n) = x_j(n)$$

where $x_j(n)$ is the $j$th element of the input vector $\mathbf{x}(n)$. If neuron $j$ is in the output layer
($l = L$, where $L$ is the depth of the network), set

$$y_j^{(L)}(n) = o_j(n)$$

Compute the error signal

$$e_j(n) = d_j(n) - o_j(n)$$

where $d_j(n)$ is the $j$th element of the desired response vector $\mathbf{d}(n)$.

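4. Backward computation: Compute the local gradients of the network, defined by

$$\delta_j^{(l)}(n) = \begin{cases} e_j(n)\,\varphi_j'\big(v_j^{(L)}(n)\big) & \text{for neuron } j \text{ in output layer } L \\ \varphi_j'\big(v_j^{(l)}(n)\big) \displaystyle\sum_k \delta_k^{(l+1)}(n)\, w_{kj}^{(l+1)}(n) & \text{for neuron } j \text{ in hidden layer } l \end{cases}$$

where the prime in $\varphi_j'(\cdot)$ denotes differentiation with respect to the argument. Adjust the
synaptic weights of the network in layer $l$ according to the generalized delta rule:

$$w_{ji}^{(l)}(n+1) = w_{ji}^{(l)}(n) + \alpha\left[w_{ji}^{(l)}(n) - w_{ji}^{(l)}(n-1)\right] + \eta\,\delta_j^{(l)}(n)\,y_i^{(l-1)}(n)$$

where $\eta$ is the learning-rate parameter and $\alpha$ is the momentum constant.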

5. Iteration: Iterate the forward and backward computations under points 3 and 4 by
presenting new epochs of training examples to the network until the stopping criterion is
met.
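The steps above can be sketched in plain MATLAB (no toolbox) for the XOR dataset, using the two-hidden-neuron layout of Section 3.2 but with logistic activations so that derivatives exist: pattern-by-pattern updates, local gradients computed from the output layer backward, and the momentum form of the generalized delta rule. The fixed initial weights (standing in for the random initialization of step 1 so the run is repeatable), $\eta = 0.5$, $\alpha = 0.9$ and the 5000-epoch stopping rule are illustrative assumptions rather than settings from this paper. As with any gradient method, training can stall on a plateau for other starting weights.

```matlab
% Sequential back-propagation with momentum for a 2-2-1 network on XOR.
% Initial weights, eta, alpha and the epoch count are illustrative assumptions.
X = [0 0; 0 1; 1 0; 1 1];            % input patterns, one per row
d = [0; 1; 1; 0];                    % XOR desired responses
N = size(X, 1);
eta = 0.5;                           % learning-rate parameter
alpha = 0.9;                         % momentum constant
phi  = @(v) 1 ./ (1 + exp(-v));      % logistic activation
dphi = @(y) y .* (1 - y);            % its derivative, written in terms of y = phi(v)

% Column 1 of each weight matrix holds the biases w_j0 (input y_0 = +1)
W1 = [ 0.1  0.5 -0.4;                % hidden layer: 2 neurons x (bias + 2 inputs)
      -0.2 -0.3  0.6];
W2 = [0.05 0.4 -0.3];                % output layer: 1 neuron x (bias + 2 hidden)
dW1 = zeros(size(W1));               % previous corrections, used by the momentum term
dW2 = zeros(size(W2));

netOut = @(W1, W2) phi([ones(N, 1), phi([ones(N, 1), X] * W1')] * W2');
Eav = @(W1, W2) mean(0.5 * (d - netOut(W1, W2)).^2);   % average error energy
E0 = Eav(W1, W2);

for epoch = 1:5000
    for n = 1:N
        % Forward computation (step 3)
        y0  = [1, X(n, :)];          % input-layer signals with y_0 = +1
        y1  = phi(W1 * y0');         % hidden outputs, 2x1
        y1a = [1; y1];               % hidden signals with the bias input
        o   = phi(W2 * y1a);         % network output o(n)
        e   = d(n) - o;              % error signal e(n)

        % Backward computation (step 4): local gradients
        delta2 = e * dphi(o);                        % output neuron
        delta1 = dphi(y1) .* (W2(2:end)' * delta2);  % hidden neurons (bias weight excluded)

        % Generalized delta rule: dW(n) = alpha*dW(n-1) + eta*delta*y
        dW2 = alpha * dW2 + eta * delta2 * y1a';
        dW1 = alpha * dW1 + eta * delta1 * y0;
        W2 = W2 + dW2;
        W1 = W1 + dW1;
    end
end

E1 = Eav(W1, W2);
outputs = netOut(W1, W2);
fprintf('E_av before = %.4f, after = %.4f\n', E0, E1);
disp([X, d, outputs]);               % inputs, targets, network outputs
```

Comparing the two printed values of $\mathcal{E}_{av}$ shows how much training reduced the cost; the last column of the printed table holds the network outputs for the four patterns.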

### 3.2 XOR Problem Solution

We may solve the XOR problem by using a single hidden layer with two neurons. The signal-flow graph
of the network is shown below. The following assumptions are made here:

1. Each neuron is represented by a McCulloch-Pitts model, which uses a threshold function as its activation function.
2. Bits 0 and 1 are represented by the levels 0 and 1, respectively.

Fig: Signal flow graph of the network for solving XOR problem

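The weights and biases of the network are

$$w_{11} = w_{12} = +1, \qquad b_1 = -\tfrac{3}{2}$$

$$w_{21} = w_{22} = +1, \qquad b_2 = -\tfrac{1}{2}$$

$$w_{31} = -2, \qquad w_{32} = +1, \qquad b_3 = -\tfrac{1}{2}$$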

Fig: Panels (a) to (c) showing the XOR input points (0, 0), (0, 1), (1, 0) and (1, 1) in the (x1, x2) plane
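These weights can be checked with a few lines of plain MATLAB. Hidden neuron 1 fires only when both inputs are 1 (it computes AND), hidden neuron 2 fires when at least one input is 1 (it computes OR), and the output neuron fires when neuron 2 fires but neuron 1 does not. The sketch assumes that a threshold unit outputs 1 when its induced local field is nonnegative; none of the four patterns lands exactly on a boundary, so this convention does not affect the result.

```matlab
% McCulloch-Pitts network for XOR using the weights and biases listed above.
step = @(v) double(v >= 0);          % threshold activation: 1 if v >= 0, else 0
X = [0 0; 0 1; 1 0; 1 1];            % input patterns (x1, x2), one per row
h1 = step(X * [1; 1] - 3/2);         % hidden neuron 1: w11 = w12 = +1, b1 = -3/2 (AND)
h2 = step(X * [1; 1] - 1/2);         % hidden neuron 2: w21 = w22 = +1, b2 = -1/2 (OR)
y  = step(-2 * h1 + 1 * h2 - 1/2);   % output neuron: w31 = -2, w32 = +1, b3 = -1/2
disp([X, y]);                        % each row: x1, x2, network output
```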

### 3.3 MATLAB Demonstration
In the MATLAB demonstration, we test linearity for the AND and OR datasets with a perceptron. We
also test non-linearity for the XOR dataset with a perceptron. Finally, we show how a multilayer
perceptron can solve this non-linearity problem for the XOR dataset. We use confusion plots for all
of these purposes.

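#### 3.3.1 OR dataset test for a single perceptron with no hidden layer

MATLAB code for the OR dataset is given below:

```matlab
clc;
close all;
% Each row of x is one input pattern (x1, x2); train expects one pattern per column
x = [0 0; 0 1; 1 1; 1 0];
inputs = x';
t = [0 1 1 1];               % OR targets for the rows of x
net = perceptron;            % single-layer perceptron, no hidden layer
view(net);
net = train(net, inputs, t);
y = net(inputs);
plotconfusion(t, y);
```

The resulting confusion plot is shown below:

Fig: Confusion for OR dataset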

#### 3.3.2 AND dataset test for a single perceptron with no hidden layer

MATLAB code for the AND dataset is given below:

```matlab
clc;
close all;
x = [0 0; 1 1; 0 1; 1 0];
inputs = x';
t = [0 1 0 0];               % AND targets for the rows of x: only (1, 1) gives 1
net = perceptron;
view(net);
net = train(net, inputs, t);
y = net(inputs);
plotconfusion(t, y);
```

The resulting confusion plot is shown below:

Fig: Confusion for AND dataset

#### 3.3.3 XOR dataset test for a single perceptron with no hidden layer

MATLAB code for the XOR dataset is given below:

```matlab
clc;
close all;
x = [0 0; 1 1; 0 1; 1 0];
inputs = x';
t = [0 0 1 1];               % XOR targets for the rows of x: 1 only when the inputs differ
net = perceptron;
view(net);
net = train(net, inputs, t);
y = net(inputs);
plotconfusion(t, y);
```

The resulting confusion plot is shown below:

Fig: Confusion for XOR dataset

As the confusion plot shows, the perceptron cannot classify all of the XOR targets correctly. So, a
single perceptron with no hidden layer cannot solve the XOR problem. Now let us see whether a
perceptron with one hidden layer, that is, a multilayer perceptron, can solve this problem.

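#### 3.3.4 XOR dataset test for a multilayer perceptron with one hidden layer and back-propagation training

MATLAB code for the XOR dataset is given below:

```matlab
clc;
close all;
x = [0 0; 1 1; 0 1; 1 0];
inputs = x';
t = [0 0 1 1];                       % XOR targets for the rows of x
% Two hidden neurons: with a single hidden neuron the output is a monotonic
% function of one linear combination of the inputs, which cannot represent XOR.
% 'trainrp' is resilient back-propagation, a variant of back-propagation.
net = feedforwardnet(2, 'trainrp');
net.divideFcn = '';                  % train on all four patterns (no validation/test split)
view(net);
net = train(net, inputs, t);
y = net(inputs);
plotconfusion(t, y);
```

The resulting confusion plot is shown below: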

As we can see, the network with a hidden layer now classifies the XOR dataset correctly, solving the
problem that a perceptron with no hidden layer cannot solve.

## 4. Conclusion
We have shown that a single perceptron with no hidden layer cannot classify the XOR dataset,
because the dataset is not linearly separable. We have also shown that this problem can be solved
by a perceptron with a single hidden layer, that is, a multilayer perceptron, trained with a
back-propagation algorithm.