IN ELECTRONIC ENGINEERING
PROJECT REPORT
Acknowledgements
I would like to thank Ms Jennifer Bruton for her time and invaluable guidance during this
project. I would also like to thank my friends and family for their ordinary laughter and
most especially my mom for “hanging in there” with me.
Declaration
I hereby declare that, except where otherwise indicated, this document is entirely my own
work and has not been submitted in whole or in part to any other university.
Abstract
This project outlines the development of a neural network model for system identification. It
traces the growth of neural networks from their humble beginnings as single-layer
perceptrons to more complex models. Both multi-layer and recurrent network models are
examined and their merits as system identifiers discussed. The system chosen as the basis for
empirical data collection is the anti-lock brake system (ABS), which exhibits highly non-linear
behaviour and lends itself to neural network modelling for system identification purposes.
The backpropagation algorithm is used in the development of the neural network. Until
recently, backpropagation neural networks made up 80% of all neural network applications
[1], but their use has declined because the iterative algorithm requires relatively long training
times. Genetic algorithms are discussed as a possible alternative.
Table of Contents
Acknowledgements
Declaration
Abstract
Table of Contents
Introduction
1.1 Artificial Neural Networks
1.1.1 Background
1.1.2 How the Human Brain Learns
1.2 Artificial Neuron and Activation Function
1.2.1 Linear Activation Function
1.2.2 Non-Linear Activation Functions
1.2.3 Neural Network Matlab Toolbox
1.3 Summary
The Perceptron
2.1 Implementing a Single Layer Perceptron in Matlab
2.1.1 Single Layer Perceptron Designed without Neural Network Toolbox
2.1.2 Designing and Training using the Neural Network Toolbox
2.2 Multi-Layer Perceptron
2.2.1 Implementing a Multi-Layer Perceptron in Matlab - XOR Classification
2.3 Summary
4.2.2 Loading Data
4.3 Summary
References
Appendix 1
Appendix 2
Appendix 3
Table of Figures
FIGURE 1 COMPONENTS OF BIOLOGICAL NEURON [2]
FIGURE 2 COMPONENTS OF THE SYNAPSE [2]
FIGURE 3 LINEAR ACTIVATION FUNCTION, EQUATION 1
FIGURE 4 LOG SIGMOID ACTIVATION FUNCTION, EQUATION 2
FIGURE 5 TAN-SIGMOID ACTIVATION FUNCTION, EQUATION 3
FIGURE 6 SINGLE LAYER PERCEPTRON ARCHITECTURE [6]
FIGURE 7 INPUT VECTORS OF THE SLP PLOTTED
FIGURE 8 CLASSIFICATION PLOT WITH NEW INPUT CORRECTLY PLOTTED IN RED
FIGURE 9 MULTI-LAYER PERCEPTRON ARCHITECTURE [6]
FIGURE 10 OUTPUT PLOT OF XOR INPUTS INTO SLP, WHICH WAS UNABLE TO PERFORM CLASSIFICATION
FIGURE 11 MODEL OF MLP THAT SOLVES EXOR CLASSIFICATION DIFFICULTIES
TABLE 1 THE XOR TRUTH TABLE
TABLE 2 TRUTH TABLE FOR THE NEURON WITH STRONG NEGATIVITY N1 AND THE NEURON WITH STRONG POSITIVITY N2 [5]
FIGURE 12 OVERALL CLASSIFICATION OF XOR PROBLEM [5]
FIGURE 13 TRAINING OF THE XOR NETWORK, MEAN SQUARE ERROR PLOT OVER 70 EPOCHS [5]
FIGURE 14 TRAINING OF XOR NETWORK WITH PERFORMANCE GOAL MET, MEAN SQUARE ERROR PLOT UNTIL CONVERGENCE [5]
FIGURE 15 ABS MODEL WITH PSEUDO RANDOM BINARY SEQUENCE INPUT
FIGURE 16 PSEUDO RANDOM BINARY SEQUENCE
FIGURE 17 A VISUAL REPRESENTATION OF INPUT AND OUTPUT DATA
FIGURE 18 PARALLEL IDENTIFICATION MODEL [3]
FIGURE 19 SERIES-PARALLEL IDENTIFICATION MODEL [3]
FIGURE 20 TRAINING PLOT OF TRAINGD
FIGURE 21 TRAINGDM PLOT WITH MOMENTUM CONSTANT OF 0.9
FIGURE 22 TRAINGDM WITH MU=0, TRAINING PLOT SIMILAR TO TRAINGD PLOT AS WEIGHT CHANGE BASED ON GRADIENT
FIGURE 23 TRAINING PLOT OF TRAINGDA
FIGURE 24 VARIABLE LEARNING RATE PLOTTED AGAINST EACH EPOCH ITERATION
FIGURE 25 TRAINLM PERFORMANCE TRAINING PLOT
FIGURE 26 TRAINING, TESTING AND VALIDATION DATA PLOT USING TRAINLM, HIGHLIGHTING OVER-FITTING
FIGURE 27 TRAINING, TESTING AND VALIDATION DATA PLOT USING TRAINGDM, HIGHLIGHTING OVER-FITTING
FIGURE 28 POST TRAINING ANALYSIS PLOT FOR TRAINLM ALGORITHM
FIGURE 29 POST TRAINING ANALYSIS PLOT FOR TRAINGDM ALGORITHM
FIGURE 31 POOR PERFORMANCE OF RECURRENT NEURAL NETWORK WITH TRAINLM ALGORITHM
FIGURE 32 TRAINGDX ALGORITHM PERFORMANCE PLOT
FIGURE 33 DETERIORATED PERFORMANCE OF RECURRENT NEURAL NETWORK
FIGURE 34 SUM SQUARE ERROR PLOT OF RECURRENT NETWORK WITH PRE AND POST PROCESSING IMPLEMENTED
FIGURE 35 THE BASIC CONCEPTS BEHIND GENETIC ALGORITHMS [7]
Chapter 1
Introduction
System identification, whether it uses conventional or neural network methods, is the
development of a mathematical model of a dynamic system from empirical data. The choice
of identifier structure is based on well-established results in linear systems theory, and these
results can be applied with great success to the development of non-linear neural network
identifiers. This is the basis of neural network system identification and the technique
applied in the anti-lock braking system identification model. Before neural network system
identification and its merits are examined, artificial neural networks and their concepts are
described: their background, and the development of simple structures into more complex
recurrent neural networks.
1.1.1 Background
Artificial Neural Networks (ANN) can be likened to collections of identical mathematical
models that emulate some of the observed properties of biological nervous systems and
draw on analogies with adaptive biological learning. The key element of an artificial neural
network is its structure: a number of interconnected processing elements, inspired by
biological neurons, tied together with weighted connections. As in a biological system,
learning takes place through training, that is, exposure to a set of input and output data,
during which the training algorithm adjusts the weights iteratively.
Artificial neural networks are good pattern recognition engines and robust classifiers, able
to make decisions about imprecise input data. This ability makes them very useful as a
medical analysis tool. When a neural network is used, there is no need to provide a specific
algorithm for identifying a disease: neural networks learn by example, so the details of how
to recognise the disease are not needed. What is needed is a set of examples that is
representative of all the variations of the disease; the quantity of examples matters more
than their individual quality. For the same reason, artificial neural networks are used
extensively for system modelling where the physical processes are not fully understood or
are highly complex.
The structure of the human brain neuron is the template for artificial learning. However,
knowledge of biological neurons is incomplete and computing power is limited, so the
general architecture of an artificial neural network relies on approximations and
assumptions, and models are often idealisations of real networks of neurons.
Figure 2 Components of the synapse [2]
$$f(x) = x \qquad \text{(equation 1)}$$
Figure 3 Linear activation function, equation 1
There are several types of non-linear activation function; the two most common are the
log-sigmoid and the tan-sigmoid transfer functions. Plots of these differentiable, non-linear
activation functions are shown in Figures 4 and 5. They are commonly used in networks
trained with backpropagation; the networks in this project are generally backpropagation
models and mainly use logsig and tansig activation functions. The logistic (log-sigmoid)
activation function is defined by
$$\mathrm{logsig}(x) = \frac{1}{1 + e^{-\beta x}} \qquad \text{(equation 2)}$$
Normally β = 1, but β can be changed, which changes the shape of the sigmoid: the slope at
the origin is β/4, so larger values of β give a steeper curve. As β tends toward infinity the
function behaves more and more like a hard limiter, whose slope is zero everywhere except
at the origin. For any finite β the slope is never zero and the output lies strictly between 0
and 1.
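The three activation functions can be sketched in a few lines. The report uses Matlab; this sketch is in Python only so it runs without the toolbox. `purelin`, `logsig` and `tansig` are plain re-implementations named after the Matlab functions, not the toolbox code. `tansig` assumes the standard tan-sigmoid form 2/(1 + e^(-2x)) - 1, which is mathematically equal to tanh(x). The `beta` argument of `logsig` is the β of equation 2.

```python
import math


def purelin(x):
    """Linear activation function, f(x) = x (equation 1)."""
    return x


def logsig(x, beta=1.0):
    """Log-sigmoid activation (equation 2); output lies strictly in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-beta * x))


def tansig(x):
    """Tan-sigmoid activation, assumed form 2/(1 + e^(-2x)) - 1 (equal to tanh)."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


# A larger beta makes the log-sigmoid look more and more like a hard limiter.
for beta in (1.0, 10.0, 100.0):
    print(beta, [round(logsig(x, beta), 3) for x in (-0.5, 0.0, 0.5)])
```

Every curve passes through 0.5 at x = 0. Only the steepness around the origin changes with β.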
Figure 4 Log sigmoid activation function, equation 2
1.2.3 Neural Network Matlab Toolbox
These activation functions were built using Matlab. MATLAB, which stands for MATrix
LABoratory, is an interactive system originally written as software for matrix
computation. It has evolved into a testing and analysis research tool used in engineering,
mathematics and science. Matlab toolboxes are collections of functions used to solve
particular classes of problems. In this project the Matlab Neural Network Toolbox is used
to build, train and test system identification neural network models.
1.3 Summary
Artificial neural networks take the central nervous system (CNS) of living creatures as the
basis for their architecture. Developing an artificial neural network requires an activation
function, either linear or non-linear, which converts the activation level of a unit into an
output signal. An activation function is applied in every neural network, including the
single layer perceptron.
Chapter 2
The Perceptron
A single layer perceptron (SLP) is the simplest form of artificial neural network that can be
built, and this chapter discusses it in detail. It consists of one or more artificial neurons in
parallel. Each neuron in the single layer provides one network output and is usually
connected to all of the external inputs (Figure 6). Figure 6 illustrates a very simple neural
network with a single neuron in the output layer and n neurons in the input layer; each circle
represents a neuron. The total input stimulus to the neuron in the output layer is
$$z_{in} = \sum_{i=0}^{n} x_i w_i = x_0 w_0 + x_1 w_1 + x_2 w_2 + \dots + x_n w_n \qquad \text{(equation 4)}$$
and the output of the neuron is y = f(z_in). The input x_0 is a special input, referred to as the
bias input; its value is normally fixed at +1, and its associated weight w_0 is referred to as the
bias weight.
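Equation 4 translates directly into code. In this Python sketch, `neuron_output` and `step` are hypothetical helpers written for illustration. The bias input x_0 = +1 is prepended to the external inputs, so `w[0]` plays the role of the bias weight w_0.

```python
def neuron_output(x, w, f):
    """Return y = f(z_in) with z_in = sum_{i=0}^{n} x_i * w_i (equation 4).

    x holds the n external inputs x_1..x_n; w holds n + 1 weights,
    where w[0] is the bias weight w_0 paired with the bias input x_0 = +1.
    """
    if len(w) != len(x) + 1:
        raise ValueError("expected one weight per input plus a bias weight")
    inputs = [1.0] + list(x)  # prepend the bias input x_0 = +1
    z_in = sum(x_i * w_i for x_i, w_i in zip(inputs, w))
    return f(z_in)


def step(z):
    """Simple threshold activation used only for this example."""
    return 1 if z >= 0 else 0


print(neuron_output([1, 1], [-1.5, 1.0, 1.0], step))  # z_in = 0.5
print(neuron_output([1, 0], [-1.5, 1.0, 1.0], step))  # z_in = -0.5
```

Any of the activation functions from Chapter 1 can be passed as `f`. A negative bias weight acts as a threshold that the weighted inputs must overcome.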
2.1 Implementing a Single Layer Perceptron in Matlab
e = error
µ = fixed value
Γ = target
A hard limit activation function is used to calculate y(k). This is a threshold activation
function, implemented in Matlab code using the sign function, which limits the output to
the range from minus one to one. It returns one if its argument is greater than zero, zero if it
equals zero, and minus one if it is less than zero. A zero output therefore occurs only when
the weighted sum is exactly zero, and since the targets are never set to zero the network is in
effect a binary output perceptron. It can only classify input patterns that are linearly
separable. Frank Rosenblatt first developed this perceptron architecture in 1958 [3].
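The update equations for y(k) and e are not shown in the text above. This Python sketch therefore assumes the standard Rosenblatt perceptron rule, w ← w + µ·e·x with e = Γ − y(k), combined with the sign hard limit just described. The function names, the bipolar (±1) AND data set and µ = 0.5 are illustrative choices, not taken from the report.

The function also counts epochs, because several passes over the data may be needed before every pattern is classified correctly. It gives up on patterns that are not linearly separable, such as XOR.

```python
def sign(v):
    """Hard limit like Matlab's sign(): 1 if v > 0, 0 if v == 0, -1 if v < 0."""
    if v > 0:
        return 1
    if v < 0:
        return -1
    return 0


def train_perceptron(patterns, targets, mu=0.5, max_epochs=100):
    """Train a single-neuron perceptron; return (weights, epochs used).

    weights[0] is the bias weight, paired with a fixed bias input of +1.
    """
    w = [0.0] * (len(patterns[0]) + 1)
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, target in zip(patterns, targets):
            xb = [1.0] + list(x)                       # bias input x_0 = +1
            y = sign(sum(w_i * x_i for w_i, x_i in zip(w, xb)))
            e = target - y                             # e = target - y(k)
            if e != 0:
                # Perceptron rule (assumed): w <- w + mu * e * x
                w = [w_i + mu * e * x_i for w_i, x_i in zip(w, xb)]
                mistakes += 1
        if mistakes == 0:                              # a full pass with no errors
            return w, epoch
    raise RuntimeError("no convergence: inputs may not be linearly separable")


def classify(w, x):
    """Apply a trained perceptron to one input pattern."""
    return sign(w[0] + sum(w_i * x_i for w_i, x_i in zip(w[1:], x)))


# Bipolar AND is linearly separable, so training converges.
AND_X = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
AND_T = [-1, -1, -1, 1]
weights, epochs = train_perceptron(AND_X, AND_T)
print(weights, epochs, [classify(weights, x) for x in AND_X])
```

Running the same function on bipolar XOR targets raises the error instead of returning. This is the limitation of the single layer perceptron discussed later in the chapter.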
2.1.2 Designing and Training using the Neural Network Toolbox
Although it has been shown that a neural network can be implemented without the neural
network toolbox, such a network is limited in its applications. A single layer neural network
is most successfully built with the toolbox function newp() [Appendix 2], which uses a hard
limit activation function by default. Using newp(), inputs can be classified according to
Boolean AND logic (Figure 7). Inputs are fed into the newly created neural network and
targets are applied; these targets are the outputs of an AND gate. Classification does not take
place until the network has been trained on these inputs. This model is based on a
demonstration model in the toolbox itself called demop1.
The network trains so that it behaves like an AND gate. The outputs are linearly separable,
so the network can classify them as a one or a zero, like binary logic. A classification line is
drawn across the input plane, shown in blue in Figure 8. When a new input is applied, the
trained network is simulated and the new point is classified. In this case the new input is
[0.7; 1.2]; it is correctly classified as a one and shown in red on the right side of the
classification line.
Figure 8 Classification plot with new input correctly plotted in red.
After one training cycle the network does not always achieve the correct classification; it
can take several training cycles, or epochs, to modify the weights until the problem is
classified correctly. This design structure is the basis for more complex artificial neural
networks: the SLP leads to the creation of multi-layer perceptrons, which are built from
multiple layers of such neurons.
A multi-layer perceptron builds on the architecture of the single layer perceptron. The
single layer perceptron is not very useful on its own because of its limited mapping ability:
it is only applicable to linearly separable inputs and fails if the inputs are not linearly
separable. The SLP can, however, be used as a building block for larger, much more
practical structures. By using multi-layer architectures, non-binary activation functions and
more complex training algorithms, the limitations of a simple perceptron may be
overcome. A typical multi-layer perceptron (MLP) network consists of a set of source
nodes forming the input layer, one or more hidden layers of computation nodes, and an
output layer of nodes, as illustrated in Figure 9. The input signal propagates through the
network layer by layer. The computations performed by this feed-forward network with a
single hidden layer, non-linear activation functions and a linear output layer can be written
mathematically as
$$x = f(s) = B\,\phi(As + a) + b \qquad \text{(equation 8)}$$
where
• s = inputs
• x = outputs
• A = weight matrix of the first layer
• a = bias vector of the first layer
• B = weight matrix of the second layer
• b = bias vector of the second layer
• ϕ = non-linearity function.
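Equation 8 can be evaluated directly as a forward pass. In this Python sketch the function name `mlp_forward` and the example weights are arbitrary assumptions, and `math.tanh` stands in for ϕ because it equals the tan-sigmoid. Each row of `A` produces one hidden neuron and each row of `B` one linear output.

```python
import math


def mlp_forward(s, A, a, B, b, phi=math.tanh):
    """Single-hidden-layer network of equation 8: x = B*phi(A*s + a) + b.

    A (hidden x inputs) and a are the first-layer weights and biases,
    B (outputs x hidden) and b the second-layer weights and biases.
    The output layer is linear; phi is the hidden-layer non-linearity.
    """
    hidden = [phi(sum(A_ij * s_j for A_ij, s_j in zip(row, s)) + a_i)
              for row, a_i in zip(A, a)]
    return [sum(B_ij * h_j for B_ij, h_j in zip(row, hidden)) + b_i
            for row, b_i in zip(B, b)]


# Two inputs, two hidden neurons, one linear output (weights are arbitrary).
A = [[1.0, 0.0], [0.0, 1.0]]
a = [0.0, 0.0]
B = [[1.0, 1.0]]
b = [0.5]
print(mlp_forward([1.0, -1.0], A, a, B, b))
```

Training consists of adjusting A, a, B and b so that these outputs approach the targets. The forward computation itself does not change.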
neurons will ensure optimum network convergence and if the weight matrix that
corresponds to that error goal can be found. These solutions are unique to each neural
network and the input and output data applied [4]. To begin with, the MLP architecture is
applied to the EXOR (XOR) problem. Historically, it was this problem that first exposed the
limitations of the SLP and led to the development of more complex multi-layer
perceptrons. Minsky and Papert (1969) stated that, in their "… intuitive judgement the
extension (to multi-layer systems would be) sterile". This opinion was based on the
inability of the SLP to classify the EXOR problem and other such linearly non-separable
problems [6].
Figure 10 Output plot of XOR inputs into SLP, which was unable to perform classification
A new MLP must be built using the newff() function [5]. This creates a new feed-forward
network with an input layer, a hidden layer and an output layer (Figure 11).
Figure 11 Model of MLP that solves EXOR classification difficulties
The essence of this problem is to build a perceptron network that takes two Boolean inputs
and outputs their XOR. The XOR truth table is shown below in Table 1.
X1   X2   Desired output
0    0    0
0    1    1
1    0    1
1    1    0
Table 1 The XOR truth table
The first neuron, N1, is designed with strong negativity: it is activated only when both inputs
are 1. The second neuron, N2, is designed with strong positivity, and the third neuron must
discriminate between the two of them.
X1   X2   N1   N2   Y
0    0    0    0    0
0    1    0    1    1
1    0    0    1    1
1    1    1    1    0
Table 2 Truth table for the neuron with strong negativity N1 and the neuron with strong positivity N2 [5].
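The extra layer's effect is easiest to see with the hidden layer written out by hand. This Python sketch is not the trained toolbox network, whose weights are not given. Its weights are hand-picked assumptions: N1 has a high threshold and fires only when both inputs are 1, and N2 has a low threshold and fires when either input is 1. In the (N1, N2) space a single line separates the two classes, and the output neuron implements that line.

```python
def hardlim(n):
    """Hard limit activation: 1 if n >= 0, else 0 (as in Matlab's hardlim)."""
    return 1 if n >= 0 else 0


def xor_net(x1, x2):
    """Two hidden threshold neurons and one output neuron with hand-set weights."""
    n1 = hardlim(x1 + x2 - 1.5)   # high threshold: on only for (1, 1), AND-like
    n2 = hardlim(x1 + x2 - 0.5)   # low threshold: on for any 1 input, OR-like
    y = hardlim(n2 - n1 - 0.5)    # on only when N2 is on and N1 is off
    return n1, n2, y


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, *xor_net(x1, x2))
```

A trained network will usually arrive at different numerical weights. The principle of re-mapping the inputs into a linearly separable space is the same.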
In terms of N1 and N2 the problem is now linearly separable, and classification can be
achieved. Matlab produces a plot of the overall classification (Figure 12). One classification
line passes through (0,0) and (1,1), indicating that the output is 0 for both of these inputs,
and another passes through (0,1) and (1,0), indicating that both of these inputs produce an
output of 1.
The neural network does not classify the inputs correctly on its own. When the input data
was first applied the output was incorrect, and the network had to be trained to recognise
the inputs and perform as an XOR gate. Training takes place over 70 epochs, and Figure 13
shows that training stopped when the minimum gradient was reached, without meeting the
performance goal.
Figure 13 Training of the XOR network, mean square error plot over 70 epochs [5]
The network is trained again, this time with a specific performance goal of 0.0037^2. It takes
only four epochs to reach this goal, after which correct classification takes place (Figure 14).
The hidden layer behaves like a black box: its behaviour is hidden from view and can only be
approximated. Because the weights are initialised randomly, the network may behave
slightly differently, and produce different results, each time the network and training
algorithm are run.
Figure 14 Training of XOR network with performance goal met, mean square error plot until convergence [5]
2.3 Summary
The single layer perceptron is the simplest form of artificial neural network. It is possible to
implement the SLP without the neural network toolbox, but such a perceptron is not as
powerful as one created with the toolbox. Using the toolbox, the single layer perceptron can
classify linearly separable inputs. The next step is to extend the single layer perceptron
architecture to solve more challenging problems.
This leads to the development of multi-layer perceptrons, whose structure can be applied to
more difficult problems that the SLP cannot solve. The XOR classification problem is one
such example; it clearly illustrates the benefits of multi-layer perceptrons in solving
classification problems. An MLP can successfully classify problems that are not linearly
separable.
The next step is to apply multi-layer perceptrons to system identification. The system
chosen to test MLP applications is the Anti-Lock Braking System (ABS).
Chapter 4
This ability to steer during braking is one of the main benefits of ABS: in a hard braking
situation without ABS the wheels may skid and lose traction with the road, which could
result in accidents. Neural networks have already been used with great success to develop a
genetic neural fuzzy controller, which finds the optimal wheel slips that maximise the road
adhesion coefficient [7]. The anti-lock brake system lends itself to neural network modelling
and fuzzy logic control because it must constantly alter its response to variations in its
inputs [8]. It exhibits highly non-linear behaviour, and artificial neural models of ABS have
real-world applications. For these reasons, and because an externally controlled input can
be applied in the form of a Pseudo-Random Binary Sequence (PRBS), the ABS is chosen as
the model. This model provides the input and output data used in an artificial neural
network built with the Matlab neural network toolbox.
4.1.2 The Simulink Model
This ABS model is a simplified version of a normal ABS design (Figure 15). It captures the
essential features of the process, and it is reasonable to assume that it behaves as a real ABS
would [7]. It is possible to develop a set of equations to model this system: recent studies of
ABS derive these equations in full from the tractive and normal forces acting on the tyres
and from other elements such as adhesion and angular velocity [7]. The model used in this
recent research is very similar to the (more simplistic) model used in this project, which is
shown in Figure 15.
4.1.3 Pseudo Random Binary Sequence Input
The controlled input into this model is a PRBS. Within Matlab/Simulink, a PRBS can be
generated with mlbs (maximum length binary sequence) in the Frequency Domain System
Identification Toolbox, or with idinput in version 4.0 of the System Identification Toolbox.
As its name suggests, the generator produces a pseudo-random binary sequence, shown in
Figure 16, which is used as the controlled input to the ABS model. The input is either 0 or 1,
which provides random excitation. The data must be persistently exciting, so that the
training set is representative of the entire class of inputs that may excite the system.
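The report generates its PRBS with mlbs or idinput. As a toolbox-independent illustration of what such a sequence is, this Python sketch produces a maximum length binary sequence from a 7-bit linear feedback shift register with feedback polynomial x^7 + x^6 + 1. The register length, tap positions and seed are assumptions chosen for the example, not the settings used for the ABS data. The output takes the values 0 and 1 and repeats every 2^7 - 1 = 127 samples.

```python
def prbs7(n_samples, seed=0b0000010):
    """Return n_samples of a 0/1 maximum length sequence from a 7-bit LFSR.

    Feedback polynomial x^7 + x^6 + 1, which is primitive, so the register
    cycles through all 127 non-zero states before repeating.
    """
    if not 0 < seed < 2 ** 7:
        raise ValueError("seed must be a non-zero 7-bit value")
    state = seed
    out = []
    for _ in range(n_samples):
        out.append(state & 1)                          # emit the lowest bit
        new_bit = ((state >> 6) ^ (state >> 5)) & 1    # taps at stages 7 and 6
        state = ((state << 1) | new_bit) & 0x7F        # shift, keep 7 bits
    return out


u = prbs7(254)
print(u[:20])
```

Over one period the sequence contains 64 ones and 63 zeros. Its near-balance and long period are what make it a good persistently exciting test input.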
4.2 Data
time a new training algorithm is implemented, to ensure that the training, testing and
validation conditions remain constant throughout the experiment. Any variation in the
results can then be attributed only to the algorithms or the network architecture, rather than
to a different input data sequence and its corresponding output data.
load input_data
load output_data
Figure 17 illustrates the response of the ABS to the random excitation signal input. This is
a visual representation of a small section of the data that is loaded before the network can be
run or trained.
4.3 Summary
The ABS model exhibits a high level of non-linearity, which is the main reason for its choice
as a model for system identification; it is also easy to modify the Simulink model so that a
PRBS input can be applied. Data is collected from the ABS Simulink model and applied to
the design of the neural network. This is the first step in the system identification process.
Chapter 5
Scaling
The function premnmx() is used to scale the inputs and targets so that they fall within the
range [-1, 1]. The network is therefore trained to produce outputs in the range [-1, 1], and
these outputs are converted back, with postmnmx(), into the units used for the original
targets.
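The scaling performed by premnmx() and postmnmx() amounts to a linear map of each variable's [min, max] range onto [-1, 1], and back. These Python functions are illustrative re-implementations under that assumption, not the toolbox code. For simplicity they scale one list of values rather than each row of a matrix.

```python
def minmax_scale(values):
    """Map values linearly onto [-1, 1]; return (scaled, min, max)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant signal")
    scaled = [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]
    return scaled, lo, hi


def minmax_unscale(scaled, lo, hi):
    """Inverse map from [-1, 1] back to the original units."""
    return [0.5 * (s + 1.0) * (hi - lo) + lo for s in scaled]


targets = [12.0, 30.0, 18.0, 48.0]
scaled, t_min, t_max = minmax_scale(targets)
print(scaled)
print(minmax_unscale(scaled, t_min, t_max))
```

The minimum and maximum must be stored with the trained network. The same values are needed to scale new inputs and to convert its outputs back.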
5.1.2 Neural Network Model Structure
There are two basic neural network model structures: the parallel identification structure
and the series-parallel structure. The parallel identification structure has direct feedback
from the network outputs to its inputs (Figure 18): it estimates the outputs and uses these
estimates to predict future outputs. Because of this feedback, stability is not guaranteed, and
the structure also requires dynamic backpropagation training. It is therefore only used if the
actual plant outputs are not available.
The series-parallel identification structure does not use feedback (Figure 19). Instead, it
uses the actual plant outputs to predict future outputs. Static backpropagation is used, and
stability and convergence are generally guaranteed with this method [3].
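In the series-parallel structure the regressor is built from measured plant data only, so training pairs can be prepared in advance. This Python sketch shows one way to do this. The function name and the numbers of past inputs and outputs (`n_u`, `n_y`) are assumptions for illustration, not values taken from the ABS model. In the parallel structure, the past y values in each regressor would instead be the network's own earlier predictions.

```python
def series_parallel_pairs(u, y, n_u=2, n_y=2):
    """Build (regressor, target) training pairs for a series-parallel model.

    For each step k the regressor holds the measured plant outputs
    y[k], ..., y[k-n_y+1] and inputs u[k], ..., u[k-n_u+1]; the target
    is the next measured output y[k+1]. No network predictions are fed back.
    """
    if len(u) != len(y):
        raise ValueError("u and y must have the same length")
    first = max(n_u, n_y) - 1              # earliest k with a full history
    pairs = []
    for k in range(first, len(y) - 1):
        past_y = [y[k - i] for i in range(n_y)]
        past_u = [u[k - i] for i in range(n_u)]
        pairs.append((past_y + past_u, y[k + 1]))
    return pairs


# Toy input/output records standing in for the PRBS input and ABS response.
u = [0, 1, 1, 0, 1]
y = [0.0, 0.2, 0.5, 0.4, 0.6]
for regressor, target in series_parallel_pairs(u, y):
    print(regressor, "->", target)
```

Each pair is an ordinary static input/target example. This is why static backpropagation is sufficient for the series-parallel structure.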
Figure 19 Series-parallel identification model [3]
As in conventional system identification, a neural network model structure is defined by its
inputs, but also by the neural network architecture, which includes the type of network and
the number of hidden layers and hidden nodes. In this case the series-parallel identification
model is used as the neural network model structure, because of its high level of stability
and convergence success and because it can be used off-line [10].
variation algorithms are the basis of test procedures evaluating the overall most effective
way to model the ABS.
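MLP Code
tic;                                     % start the timer
load input_data                          % assumed to define teach1 (network inputs)
load output_data                         % assumed to define teach2 (targets)
teach1 = teach1';                        % transpose so each column is one sample
teach2 = teach2';
% Feed-forward network: 5 tansig hidden neurons, 2 purelin output neurons,
% trained by steepest descent (traingd)
net = newff(minmax(teach1), [5, 2], {'tansig', 'purelin'}, 'traingd');
[net, tr] = train(net, teach1, teach2);  % train before simulating; tr is the training record
Y = sim(net, teach1);                    % network response to the inputs
%plot(tr.epoch, tr.perf, tr.epoch, tr.vperf, tr.epoch, tr.tperf)
%legend('Training', 'Validation', 'Test', -1);
%ylabel('Squared Error'); xlabel('Epoch')
toc                                      % report the elapsed time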
5.1.4 Training Algorithms – multi-layer network results
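The ABS is modelled using a number of training algorithms, the first of which is steepest
descent (traingd). Steepest descent is the simplest implementation of backpropagation
learning. It updates the network weights and biases in the direction in which the
performance function decreases most rapidly, the negative of the gradient. The update is
$$x_{k+1} = x_k - \alpha_k g_k \qquad \text{(equation 9)}$$
where x_k is the vector of current weights and biases, g_k is the current gradient and α_k is the
learning rate.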
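Equation 9 can be illustrated on the smallest possible network: a single linear neuron t ≈ w·x + b, whose mean squared error has an analytic gradient. In this Python sketch the parameter vector x_k of equation 9 is (w, b), and the learning rate α_k is held fixed. The data, learning rate and epoch count are arbitrary choices for the example, and this is not how traingd is implemented internally.

```python
def steepest_descent(xs, ts, lr=0.05, epochs=2000):
    """Fit t ~ w*x + b by steepest descent on the mean squared error.

    The parameter vector of equation 9 is (w, b); the learning rate
    alpha_k is kept constant at lr.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - t for x, t in zip(xs, ts)]
        g_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))  # dMSE/dw
        g_b = 2.0 / n * sum(errors)                              # dMSE/db
        w -= lr * g_w                                            # x_{k+1} = x_k - alpha*g_k
        b -= lr * g_b
    return w, b


# Targets generated by t = 2x + 1, so the fit should approach w = 2, b = 1.
w, b = steepest_descent([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(w, b)
```

With a fixed learning rate, many epochs are needed even for this tiny problem. This slowness motivates the momentum and variable learning rate variants that follow.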
Traingdm implements steepest descent with momentum. Momentum allows the network to
respond not only to the local gradient but also to recent trends in the error surface, which
helps it slide through shallow local minima instead of getting stuck in them. The momentum
constant µ is a number between 0 and 1. The training plot in Figure 21 shows the ABS data
modelled with the traingdm algorithm and a momentum constant of 0.9. When the
momentum constant is 1, the new weight change is set equal to the last weight change and
the gradient is simply ignored. When µ is 0, the weight change is based solely on the
gradient and traingdm behaves as the traingd algorithm would (Figure 22).
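The effect of the momentum constant can be shown on a one-parameter error surface, f(x) = (x - 3)^2. This Python sketch assumes a weight change of the form dX_k = µ·dX_{k-1} - (1 - µ)·α·g_k, in which the gradient step is scaled by (1 - µ). With µ = 0 it reduces to plain steepest descent, and with µ = 1 the gradient is ignored and the previous weight change (here zero) is repeated. The error surface, learning rate and epoch counts are illustrative assumptions.

```python
def grad(x):
    """Gradient of the illustrative error surface f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


def gd_momentum(x0, lr, mu, epochs):
    """Steepest descent with momentum; return the history of x.

    Weight change (assumed form): dx_k = mu*dx_{k-1} - (1 - mu)*lr*grad(x_k).
    """
    x, dx = x0, 0.0
    history = [x]
    for _ in range(epochs):
        dx = mu * dx - (1.0 - mu) * lr * grad(x)
        x += dx
        history.append(x)
    return history


for mu in (0.0, 0.9, 1.0):
    print(mu, round(gd_momentum(0.0, lr=0.1, mu=mu, epochs=300)[-1], 4))
```

With µ = 1 the parameter never moves from its starting point. This is the extreme case in which only the last weight change counts.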
Figure 22 Traingdm with mu=0, training plot similar to Traingd plot as weight change based on gradient
Traingda implements steepest descent with a variable learning rate. If the learning rate is set
too large the algorithm can oscillate and become unstable, but if it is set too small the
algorithm will take too long to converge. With traingda the learning rate is allowed to change
during training in response to the complexity of the local error surface. The learning rate is
increased, but only to the extent that the network can learn without large increases in error,
so near-optimal learning is achieved for the local terrain. When a larger learning rate could
still result in stable learning, the learning rate is increased; when it is too high to guarantee a
decrease in error, it is decreased until stable learning is achieved again. In Figure 23 the
minimum gradient is reached by epoch 66, so the learning rate variation and training stop at
this epoch.
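The adaptive rule can be sketched on a one-parameter error surface, f(x) = (x - 3)^2. This Python sketch assumes the rule as usually described for traingda:

- A step that raises the error by more than a ratio `max_perf_inc` is discarded, and the learning rate is multiplied by `lr_dec`.
- When the error falls, the step is kept and the learning rate is multiplied by `lr_inc`.

The values 1.04, 0.7 and 1.05 are assumed typical settings. The starting learning rate of 1.5 is deliberately too large so that both the decrease and the increase can be seen.

```python
def f(x):
    return (x - 3.0) ** 2        # illustrative error surface


def grad(x):
    return 2.0 * (x - 3.0)


def gd_adaptive_lr(x0, lr, epochs, lr_inc=1.05, lr_dec=0.7, max_perf_inc=1.04):
    """Steepest descent with a variable learning rate; return (x, lr history)."""
    x, perf = x0, f(x0)
    lr_history = []
    for _ in range(epochs):
        x_new = x - lr * grad(x)
        perf_new = f(x_new)
        if perf_new > perf * max_perf_inc:
            lr *= lr_dec                 # error grew too much: discard step, slow down
        else:
            if perf_new < perf:
                lr *= lr_inc             # error fell: speed up
            x, perf = x_new, perf_new    # keep the step
        lr_history.append(lr)
    return x, lr_history


x_final, lrs = gd_adaptive_lr(0.0, lr=1.5, epochs=15)
print(x_final, [round(v, 3) for v in lrs])
```

The printed history rises while steps succeed and drops after a rejected step. This is the same qualitative pattern as the learning rate plot in Figure 24.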
Figure 23 Training plot of Traingda
The increase in learning rate is plotted in Figure 24; it terminates at epoch 66, when training
stops. The steepest descent algorithms all reach a performance of around 0.477 MSE. The
trainlm (Levenberg-Marquardt) algorithm was also implemented, but it does not change the
MSE performance, which remains at approximately 0.477. Even with the most efficient
training algorithm in the toolbox, the performance is unchanged. Its efficiency is therefore
concluded to relate to time: it reaches the solution more rapidly than the other algorithms.
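The speed of trainlm comes from the Levenberg-Marquardt update, Δ = -(JᵀJ + µI)⁻¹Jᵀe. Here J is the Jacobian of the errors e with respect to the weights. This Python sketch applies the update to a single linear neuron t ≈ w·x + b, solving the 2 × 2 system directly. The toolbox adapts µ during training; this sketch keeps µ fixed at an illustrative value, so it is a simplified sketch rather than the trainlm algorithm itself.

```python
def lm_fit_line(xs, ts, mu=0.001, iterations=5):
    """Fit t ~ w*x + b with Levenberg-Marquardt steps (fixed damping mu)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        e = [w * x + b - t for x, t in zip(xs, ts)]     # errors
        # J has rows [x_i, 1]; form J^T J + mu*I and J^T e explicitly.
        h_ww = sum(x * x for x in xs) + mu
        h_wb = sum(xs)
        h_bb = n + mu
        g_w = sum(e_i * x for e_i, x in zip(e, xs))
        g_b = sum(e)
        det = h_ww * h_bb - h_wb * h_wb
        # Solve (J^T J + mu*I) * delta = -J^T e using the 2x2 inverse.
        dw = -(h_bb * g_w - h_wb * g_b) / det
        db = -(h_ww * g_b - h_wb * g_w) / det
        w, b = w + dw, b + db
    return w, b


print(lm_fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))
```

A handful of iterations is enough here, because each step uses curvature information from JᵀJ. The cost is solving a linear system whose size grows with the number of weights.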
5.2 Test Procedure for Multi-Layer Neural Network
5.2.1 Over-fitting
The testing data set has been implemented with previously described procedure, the whole
collection of data is implemented including the training, testing and validation data. These
new plots highlight how the neural model is performing. The data sets are implemented
with out pre-processing and the performance mse is 0.6667 (Figure 26). This performance
is not as good compared to 0.477 mse achieved with pre and post processing of the data set.
Over-fitting however cannot be held responsible for this neural networks poor performance.
Over-fitting occurs when the error on the training set is driven to a very small value but the error is large when new data is presented to the network. In that case the network has memorised the training examples but has not learned to generalise to new situations. These data sets do not show signs of over-fitting. Over-fitting is typically highlighted by the validation error rising and converging at a higher level than the training error [11]. This is not the case in the Traingdm plot, where it is the test set that rises above the validation and training sets (Figure 27). If over-fitting were to occur, early stopping could be implemented. Alternatively, more data is easily collected from the ABS model, so the training set could be enlarged, which reduces the risk of over-fitting.
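Early stopping can be sketched in Python as a check on the per-epoch validation error. The sketch makes three assumptions:
- Training halts after five consecutive epochs without a new lowest validation error. This "patience" is assumed to play the role of the toolbox's maximum-validation-failures setting.
- The weights from the best epoch are the ones kept.
- The validation curve is made-up data.

```python
def early_stopping(val_errors, patience=5):
    """Return (best_epoch, stop_epoch) for a sequence of validation errors.

    Training stops once `patience` consecutive epochs pass without a new
    lowest validation error; the weights from best_epoch would be kept.
    """
    best_err = float("inf")
    best_epoch = 0
    fails = 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_err, best_epoch, fails = err, epoch, 0
        else:
            fails += 1
            if fails >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_errors) - 1


# Hypothetical validation curve that starts to rise after epoch 3
val = [1.0, 0.8, 0.6, 0.55, 0.57, 0.6, 0.62, 0.65, 0.7, 0.72]
print(early_stopping(val))  # (3, 8)
```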
Figure 26 Training, testing and validation data plot using Trainlm, highlighting over-fitting
Figure 27 Training, testing and validation data plot using Traingdm, highlighting over-fitting
The training, testing and validation plots show that this neural network produces a poor response. To examine exactly how poor the response is, post-training analysis is carried out. Post-training (regression) analysis is performed between the network response and the corresponding targets. The following code produces a plot for post-training analysis.
a=postmnmx(Y,mint,maxt);               %convert network outputs back to original units
[m,b,r]=postreg(a(2,:),teach2(2,:));   %regress the second output against its targets
m and b correspond to the slope and the y-intercept of the best linear regression relating targets to the network outputs. If there were a perfect fit, i.e. if the outputs exactly equalled the targets, the slope would be 1 and the y-intercept would be 0. The third variable returned, the R-value, is the correlation coefficient between the outputs and targets. It is a measure of how well the variation in output is explained by the targets. If this number is equal to 1, there is perfect correlation between targets and outputs. These are the post-training analysis outputs for the Trainlm algorithm.
m=6.9188e-005
b=63.2698
r=0.0083
The R-value is extremely low and indicates a very poor linear fit, which is shown in figure 28. A slope of almost zero combined with a large y-intercept means that the network output is nearly constant whatever the target value. A similar plot is obtained for the Traingdm algorithm (Figure 29), indicating the overall inability of the neural network to perform system identification. The R-value in this case is negative, and the fit is again very poor.
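The quantities m, b and R can be reproduced with a short Python sketch that uses only the standard library. It regresses the network outputs on the targets (outputs = m * targets + b), which is how postreg is understood to work here. The zero-variance guard for a constant output is this sketch's own choice, not toolbox behaviour.

```python
import math


def post_regression(outputs, targets):
    """Least-squares fit of outputs = m * targets + b.

    Returns (m, b, r) where r is the correlation coefficient between
    outputs and targets.
    """
    n = len(targets)
    mean_t = sum(targets) / n
    mean_a = sum(outputs) / n
    s_tt = sum((t - mean_t) ** 2 for t in targets)
    s_aa = sum((a - mean_a) ** 2 for a in outputs)
    s_ta = sum((t - mean_t) * (a - mean_a) for t, a in zip(targets, outputs))
    m = s_ta / s_tt
    b = mean_a - m * mean_t
    # A constant output has no variance; report r = 0 instead of dividing by zero
    r = s_ta / math.sqrt(s_tt * s_aa) if s_aa > 0 else 0.0
    return m, b, r


targets = [1.0, 2.0, 3.0, 4.0]
print(post_regression(targets, targets))     # perfect fit: slope 1, intercept 0, r = 1
print(post_regression([5.0] * 4, targets))   # constant output: slope 0, intercept 5
```

The second call mirrors the Trainlm result above. When the output barely changes, the slope collapses towards zero and the intercept absorbs the output level.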
Figure 29 Post training analysis plot for Traingdm algorithm.
The analysis of this system indicates a very poorly functioning neural network identifier. The system could be improved by changing the architecture, for example by adding more hidden layers and increasing the number of input neurons. The optimum structure is found by trial and error. Some changes are made to the structure, but no significant improvement in performance results. For example, the hidden layer of the network trained with the Trainlm algorithm is increased from 5 neurons to 22 neurons, and the output training performance plot shows no significant change (Figure 30).
Chapter 6
Although multi-layer networks and recurrent neural networks have different structures, they may be viewed similarly. In effect, recurrent neural networks used for identification or model-based predictive control are multi-layer neural networks with a delay element in their feedback loop. Recurrent neural networks could also be built with multi-layer networks in their feedback loop, so that the two structures compute in tandem. Used in unison in this way, they have the potential to model systems with both dynamic elements and feedback [10]. Such a combined structure is beyond the scope of this project. With the ABS data, multi-layer perceptrons and recurrent neural networks were tested as separate entities and their results compared. There are two recurrent network structures available in the Matlab neural network toolbox: the Hopfield and the Elman structure. The Elman structure is chosen as the architecture of the recurrent network used to model ABS. This is because the Hopfield architecture is seldom used in practice, and even the best Hopfield designs may have spurious results that can lead to incorrect answers [11]. Elman networks are two-layer backpropagation networks with the addition of a feedback connection from the output of the hidden layer to its input.
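The feedback connection can be made concrete with a small Python sketch of an Elman forward pass. It is an illustration only, not the toolbox implementation:
- The weight names are hypothetical.
- Both layers use tanh, which is numerically the same function as tansig.
- The context (the delayed hidden-layer output) starts at zero.

```python
import math


def elman_forward(inputs, w_in, w_ctx, b_h, w_out, b_out):
    """Run a two-layer Elman network over a sequence of input vectors.

    w_in[j][i]  : input i -> hidden neuron j
    w_ctx[j][k] : previous output of hidden neuron k -> hidden neuron j
    w_out[o][j] : hidden neuron j -> output neuron o
    """
    n_hidden = len(b_h)
    context = [0.0] * n_hidden  # hidden-layer output from the previous step
    outputs = []
    for x in inputs:
        hidden = [
            math.tanh(b_h[j]
                      + sum(w_in[j][i] * x[i] for i in range(len(x)))
                      + sum(w_ctx[j][k] * context[k] for k in range(n_hidden)))
            for j in range(n_hidden)
        ]
        y = [math.tanh(b_out[o] + sum(w_out[o][j] * hidden[j] for j in range(n_hidden)))
             for o in range(len(b_out))]
        outputs.append(y)
        context = hidden  # feedback connection, delayed by one time step
    return outputs


# One input, one hidden neuron, one output: an input pulse followed by zero input
seq = [[1.0], [0.0], [0.0]]
print(elman_forward(seq, [[1.0]], [[0.5]], [0.0], [[1.0]], [0.0]))
```

After the pulse the input is zero, yet the output stays non-zero for the following steps because the hidden layer's previous output is fed back.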
neural network architecture. Elman Code is an example of the code used to test the Elman
structure.
Elman Code
tic;                                    %start timing the run
load input_data                         %expected to define teach1 (inputs)
load output_data                        %expected to define teach2 (targets)
teach1=teach1';                         %converting input sequence into columns
teach2=teach2';                         %converting the target to columns
[pn,minp,maxp,tn,mint,maxt]=premnmx(teach1,teach2);  %scale data to [-1,1]
net=newelm(minmax(pn),[5,2],{'tansig','tansig'},'traingdx');  %input ranges taken from the scaled data
net=init(net);
net.trainParam.epochs=300;
net.trainParam.show=5;
net.trainParam.goal=0.01;
net.performFcn='sse';                   %sum squared error
pnseq=con2seq(pn);                      %convert matrices to time sequences
tnseq=con2seq(tn);
[net,tr]=train(net,pnseq,tnseq);
toc                                     %report training time
semilogy(tr.epoch,tr.perf)              %no 'hold on', so the log scale is kept
title('Sum squared error of Elman Network')
xlabel('Epoch')
ylabel('Sum squared error')
Y=sim(net,pnseq);                       %simulate on the scaled sequence
a=postmnmx([Y{:}],mint,maxt);           %outputs back in original units
The recurrent connection in the Elman network allows the network to detect and generate time-varying patterns. The Elman structure differs from conventional two-layer networks in that the first layer has a recurrent connection. The delay in this connection stores values from the previous time step, which can be used in the current time step. As a result, the outputs do not depend on the current input alone. Even if two Elman networks with the same weights and biases are given identical inputs at a given time step, their outputs can differ because their feedback states differ. The network has proved effective at storing information for future reference, which is why it is tested for identification of the ABS model. Different training algorithms are tested and the results compared with those of the multi-layer structures.
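The effect of the feedback state can be shown with a deliberately tiny Python example. It uses a hypothetical one-neuron Elman network with made-up weights and no biases. Two copies share the same weights and receive the same current input. They differ only in the input each saw at the previous step.

```python
import math


def elman_step(x, context, w_in=1.0, w_ctx=0.8, w_out=1.5):
    """One time step of a hypothetical one-neuron Elman network.

    Returns (output, new_context); the new context is the hidden output,
    fed back at the next step.
    """
    hidden = math.tanh(w_in * x + w_ctx * context)
    return math.tanh(w_out * hidden), hidden


# Two copies with identical weights but different input histories
_, ctx_a = elman_step(1.0, 0.0)    # copy A previously saw +1
_, ctx_b = elman_step(-1.0, 0.0)   # copy B previously saw -1

# The same current input now gives different outputs
out_a, _ = elman_step(0.5, ctx_a)
out_b, _ = elman_step(0.5, ctx_b)
print(out_a, out_b)
```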
6.2.2 Results
Trainlm is the first algorithm used to train the network, and it is the quickest of all the algorithms. However, it tends to proceed so rapidly that it does not necessarily do well when implemented in Elman structures. Its speed is also relative. The algorithm takes 75.0630 minutes to run 100 epochs on the Elman network, compared with a run time of 28.6410 seconds for Trainlm on the multi-layer network. The performance results were also very poor, with a mean squared error of 3954.82. Figure 31 highlights the network's poor performance.
These results were inadequate, so pre- and post-processing is implemented to see whether improvements can be made. First, the input and target data are normalised so that they have zero mean and unity standard deviation. After training, the inputs and outputs are scaled back into the original units. This does not improve performance; in fact, figure 33 shows that performance has deteriorated.
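The mean and standard deviation normalisation can be sketched in Python with the standard statistics module. The function names are hypothetical stand-ins, not toolbox functions. Each row is treated as one variable, as in the toolbox data layout, and the sample standard deviation is used. A constant row would cause a division by zero.

```python
import statistics


def normalise_std(rows):
    """Scale each row (one variable) to zero mean and unit standard deviation.

    Returns the scaled rows plus the means and standard deviations needed
    to undo the scaling.
    """
    means = [statistics.mean(r) for r in rows]
    stds = [statistics.stdev(r) for r in rows]
    scaled = [[(v - m) / s for v in r] for r, m, s in zip(rows, means, stds)]
    return scaled, means, stds


def denormalise_std(rows, means, stds):
    """Map normalised values (e.g. network outputs) back to original units."""
    return [[v * s + m for v in r] for r, m, s in zip(rows, means, stds)]


# Hypothetical data: two variables, three samples each
data = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
scaled, means, stds = normalise_std(data)
print(scaled)                                # [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
print(denormalise_std(scaled, means, stds))  # the original data
```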
Figure 33 Deteriorated performance of recurrent neural network
A second type of pre- and post-processing, scaling, is implemented because of the lack of success with the mean and standard deviation method. The function premnmx() scales the data into the range [-1, 1] for training, and postmnmx() converts the data back to its original units after the algorithm has run. The resulting plot, shown in figure 34, shows no significant difference in performance compared with the mean and standard deviation processing.
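The min-max scaling can be sketched in Python as a linear map of each row onto [-1, 1] and back. The formula 2(p - min)/(max - min) - 1 is assumed to match what premnmx applies to each row. The function names are hypothetical.

```python
def scale_minmax(row):
    """Map a row of values linearly onto [-1, 1]."""
    lo, hi = min(row), max(row)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in row], lo, hi


def unscale_minmax(row, lo, hi):
    """Inverse mapping from [-1, 1] back to the original range."""
    return [(v + 1.0) * (hi - lo) / 2.0 + lo for v in row]


# Hypothetical row of input values
scaled, lo, hi = scale_minmax([0.0, 5.0, 10.0])
print(scaled)                          # [-1.0, 0.0, 1.0]
print(unscale_minmax(scaled, lo, hi))  # [0.0, 5.0, 10.0]
```

The stored lo and hi play the same part as the minimum and maximum values that postmnmx() needs to undo the scaling.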
Figure 34 Sum square error plot of recurrent network with pre and post processing implemented
6.3.1 Comparison
Neither of the systems tested, the MLP and the recurrent network, performs to its optimum potential. The multi-layer network outperforms the recurrent network in terms of both run time and squared error performance. This result is not wholly unexpected, because both structures tested had just one hidden layer with a maximum of 5 neurons in this layer. For an Elman network to have the best chance of learning a problem, it needs more neurons in its hidden layer than are actually required for a solution by any other method. With fewer neurons, the Elman network is less able to find the appropriate weights for hidden neurons, since the error gradient is approximated [11]. Extensive testing is needed to improve the performance of both networks. A very slight modification of the architecture can sometimes produce a large performance improvement. For recurrent networks this testing is restricted by the length of time the networks take to converge using the backpropagation algorithm, and sometimes the structures have to be left overnight to train. The Genetic Algorithm (GA) is a possible alternative to the backpropagation training algorithm. It is not based on the error gradient and does not require as much computational time when the number of neurons is high [12]. Development of genetic algorithms for identification and training purposes is a relatively new direction and could produce extremely interesting results.
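To show why a GA needs no error gradient, here is a minimal real-coded GA in Python. It is a generic textbook scheme, not the algorithm of [12]. It trains the two weights of a single tansig neuron on hypothetical data using only error values. The parameter choices are assumptions: tournament selection, arithmetic crossover, Gaussian mutation, two elite individuals, a population of 40 and 300 generations.

```python
import math
import random


def mse(weights, data):
    """Fitness: mean squared error of a single tansig neuron y = tanh(w*x + b)."""
    w, b = weights
    return sum((math.tanh(w * x + b) - y) ** 2 for x, y in data) / len(data)


def tournament(pop, scores, rng, k=3):
    """Pick the lowest-error individual from k randomly chosen ones."""
    picks = rng.sample(range(len(pop)), k)
    return pop[min(picks, key=lambda i: scores[i])]


def train_ga(data, pop_size=40, generations=300, p_mut=0.2, sigma=0.1, seed=0):
    """Real-coded GA: only error values are used, never an error gradient."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-2, 2), rng.uniform(-2, 2)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [mse(ind, data) for ind in pop]
        ranked = sorted(range(pop_size), key=lambda i: scores[i])
        new_pop = [pop[i][:] for i in ranked[:2]]  # elitism: keep the two best
        while len(new_pop) < pop_size:
            p1 = tournament(pop, scores, rng)
            p2 = tournament(pop, scores, rng)
            alpha = rng.random()
            # Arithmetic crossover: child lies on the line between the parents
            child = [alpha * g1 + (1 - alpha) * g2 for g1, g2 in zip(p1, p2)]
            # Gaussian mutation applied to each gene with probability p_mut
            child = [g + rng.gauss(0.0, sigma) if rng.random() < p_mut else g
                     for g in child]
            new_pop.append(child)
        pop = new_pop
    scores = [mse(ind, data) for ind in pop]
    best = min(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]


# Hypothetical training data generated by a neuron with w = 0.8, b = -0.3
data = [(x / 5.0, math.tanh(0.8 * x / 5.0 - 0.3)) for x in range(-5, 6)]
best, err = train_ga(data)
print(f"w = {best[0]:.3f}, b = {best[1]:.3f}, mse = {err:.2e}")
```

Each generation costs one error evaluation per individual. This is the property that makes GAs attractive when backpropagation through a recurrent structure is slow. How the cost scales with network size is not demonstrated by this toy example.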
Figure 35 The basic concepts behind genetic algorithms [7]
Future development of the project will not be limited to the use of genetic algorithms and the improvement of the structures that use backpropagation. The architecture may also be developed to include both multi-layer and recurrent networks, combining the strengths of each individual architecture in one unified unit. The multi-layer network's strength lies in its success at pattern recognition problems, and the recurrent network's strength lies in solving optimisation problems. The Matlab toolbox has proved a very powerful tool for building each of the architectures separately, and its capabilities may be investigated and perhaps extended to build a more complex model. In this study the research and testing have been progressive, tracing the development of the SLP through its growth into recurrent networks. Testing highlights flaws in all the architectures:
- the SLP's inability to perform non-linear classification
- the MLP's poor error performance
- the recurrent network's poor error performance and long training durations
Possible solutions are offered, and interesting future directions are discussed in the form of genetic algorithm development and architecture modification.
References
[1] Bruce D. Baker & Craig E. Richards (in press), Exploratory application of neural networks to school finance: forecasting educational spending.
[2] Arthur W. Ham (1974), Histology, Seventh Edition, J.B. Lippincott Company, Philadelphia and Toronto.
[3] J. Wesley Hines (1997), Fuzzy and Neural Approaches in Engineering, Wiley-Interscience, John Wiley & Sons, Inc.
[4] S. Haykin (1994), Neural Networks: A Comprehensive Foundation, Macmillan, New York.
[5] Jennifer Bruton, Course notes and reference code mlpeg1.
[6] Chris Stergiou, Historical Background of Neural Networks, http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol1/cs11/article1.html
[7] Yonggon Lee & Stanislaw H. Zak (2001), Designing a Genetic Neural Fuzzy Anti-Lock Brake System Controller, IEEE Transactions on Evolutionary Computation.
[8] W.K. Lennon & K.M. Passino (1995), "Intelligent control for brake systems", IEEE Transactions on Fuzzy Systems, Vol. 3, pp. 381-388.
[9] S. Rossignol, X. Rodet, J. Soumagne, J-L. Collette & P. Depalle, Feature extraction and temporal segmentation of acoustic signal, CNET/Rennes (Centre National d'Etudes des Telecommunications), France.
[10] Kumpati S. Narendra & Kannan Parthasarathy (1990), Identification and Control of Dynamical Systems Using Neural Networks, IEEE Transactions on Neural Networks, Vol. 1, No. 1.
[11] MathWorks online help desk, http://www.mathworks.com/access/helpdesk/help/helpdesk.shtml
[12] A. Blanco, M. Delgado & M.C. Pegalajar (2001), A real-coded genetic algorithm for training recurrent neural networks, Neural Networks, Vol. 14, pp. 93-95.
Appendix 1
%The AND gate problem again, this time with 12 weight updates (3 passes over the 4 patterns)
clear
w1=[0 1 -1]';                   %initial weights [bias weight; w_x1; w_x2]
b=1;                            %constant bias input
x1=[-1 -1]';                    %bipolar input patterns
x2=[-1 1]';
x3=[1 -1]';
x4=[1 1]';
tau=[-1 -1 -1 1];               %AND targets for x1..x4
p=[[b;x1] [b;x2] [b;x3] [b;x4]];  %augmented input vectors, one per column
mu=0.2;                         %learning rate
new_w(:,1)=w1;                  %weight history, one column per update
k=0;
while k<12
    for i=1:4
        y(i)=sign(new_w(:,k+i)'*p(:,i));                %perceptron output
        e(i)=tau(i)-y(i);                                %error for this pattern
        new_w(:,k+i+1)=new_w(:,k+i)+mu*e(i)*p(:,i);      %perceptron learning rule
    end
    k=k+4;
end
Appendix 2
P=[-0.5 -0.5 0.3 0.1;      %input vectors, one per column
   -0.5  0.5 -0.5 1.0];
T=[0 0 0 1];               %targets
plotpv(P,T);               %plot the input vectors with their targets
net=newp(minmax(P),1);     %single-layer perceptron (SLP) with one neuron
%attempt at classification
net.b{1}=1;                %initial bias
E=1;
while sse(E)               %adapt until every vector is classified correctly
    [net,Y,E]=adapt(net,P,T);
    clf;
    plotpv(P,T);
    plotpc(net.IW{1},net.b{1});   %redraw the decision boundary
    drawnow;
end
p=[0.7;1.2];
a=sim(net,p);
plotpv(p,a);
Point = findobj(gca,'type','line');
set(Point,'color','red');
hold on;
plotpv(P,T);
plotpc(net.IW{1},net.b{1});
Appendix 3
%Designing Neural Network
close all      % close all open figures
clear all      % clear all old variables, to reduce the risk of confusing errors
tic;
load input_data
load output_data
teach1=teach1';    %inputs as columns
teach2=teach2';    %targets as columns
net = newff(minmax(teach1),[5,2],{'tansig' 'purelin'},'traingd');
[net,tr] = train(net,teach1,teach2);   %train the network before simulating it
Y = sim(net,teach1);
%plot(tr.epoch,tr.perf,tr.epoch,tr.vperf,tr.epoch,tr.tperf)
%legend('Training','Validation','Test',-1);
%ylabel('Squared Error'); xlabel('Epoch')
toc