
B.ENG.

IN ELECTRONIC ENGINEERING

PROJECT REPORT

Development of Neural Networks for System Identification

Siobhan Murphy 98276506


Acknowledgements

I would like to thank Ms Jennifer Bruton for her time and invaluable guidance during this
project. I would also like to thank my friends and family for their ordinary laughter and
most especially my mom for “hanging in there” with me.

Declaration
I hereby declare that, except where otherwise indicated, this document is entirely my own
work and has not been submitted in whole or in part to any other university.

Signed: ...................................................................... Date: ...............................

Abstract
This project outlines the development of a neural network model for system identification. It
traces the growth of neural networks from their humble beginnings as single-layer
perceptrons to more complex neural network models. Both multi-layer and recurrent network models are
examined and their merits as system identifiers discussed. The system chosen as a basis for
the empirical data collection is the anti-lock brake system, which exhibits highly non-linear
behaviour and lends itself to neural network modelling for system identification purposes.
The backpropagation algorithm is used in the development of the neural network. Until
recently, backpropagation neural networks made up 80% of all neural network applications
[1]. The use of backpropagation has declined due to the relatively long required training
times for the iterative algorithm. Genetic algorithms are discussed as a possible alternative.

Table of Contents

Acknowledgements
Declaration
Abstract
Table of Contents
Table of Figures

Introduction
1.1 Artificial Neural Networks
1.1.1 Background
1.1.2 How the Human Brain Learns
1.2 Artificial Neuron and Activation Function
1.2.1 Linear Activation Function
1.2.2 Non-Linear Activation Functions
1.2.3 Neural Network Matlab Toolbox
1.3 Summary

The Perceptron
2.1 Implementing a Single Layer Perceptron in Matlab
2.1.1 Single Layer Perceptron Designed without Neural Network Toolbox
2.1.2 Designing and Training using the Neural Network Toolbox
2.2 Multi-Layer Perceptron
2.2.1 Implementing a Multi-Layer Perceptron in Matlab - XOR Classification
2.3 Summary

Anti-Lock Braking System
4.1 ABS Model
4.1.1 Basic Steps of System Identification
4.1.2 The Simulink Model
4.1.3 Pseudo Random Binary Sequence Input
4.2 Data
4.2.1 Data Collection
4.2.2 Loading Data
4.3 Summary

Building Neural Network – the design detail
5.1 Design Detail
5.1.1 Pre & Post processing
5.1.2 Neural Network Model Structure
5.1.3 Types of Neural Networks
5.1.4 Training Algorithms – multi-layer network results
5.2 Test Procedure for Multi-Layer Neural Network
5.2.1 Over-fitting
5.2.2 Post Training Analysis
5.3 Summary

Recurrent Neural Networks
6.1 Structure of Recurrent Neural Network – design detail
6.2 The Elman Structure
6.2.1 Building the structure
6.2.2 Results
6.3 Overall Analysis of Networks
6.3.1 Comparison
6.3.2 Conclusion and Future Directions

References

Appendix 1

Appendix 2

Appendix 3

Table of Figures
FIGURE 1 COMPONENTS OF BIOLOGICAL NEURON [2]
FIGURE 2 COMPONENTS OF THE SYNAPSE [2]
FIGURE 3 LINEAR ACTIVATION FUNCTION, EQUATION 1
FIGURE 4 LOG SIGMOID ACTIVATION FUNCTION, EQUATION 2
FIGURE 5 TAN-SIGMOID ACTIVATION FUNCTION, EQUATION 3
FIGURE 6 SINGLE LAYER PERCEPTRON ARCHITECTURE [6]
FIGURE 7 INPUT VECTORS OF THE SLP PLOTTED
FIGURE 8 CLASSIFICATION PLOT WITH NEW INPUT CORRECTLY PLOTTED IN RED
FIGURE 9 MULTI-LAYER PERCEPTRON ARCHITECTURE [6]
FIGURE 10 OUTPUT PLOT OF XOR INPUTS INTO SLP, WHICH WAS UNABLE TO PERFORM CLASSIFICATION
FIGURE 11 MODEL OF MLP THAT SOLVES EXOR CLASSIFICATION DIFFICULTIES
TABLE 1 THE XOR TRUTH TABLE
TABLE 2 TRUTH TABLE FOR THE NEURON WITH STRONG NEGATIVITY N1 AND THE NEURON WITH STRONG POSITIVITY N2 [5]
FIGURE 12 OVERALL CLASSIFICATION OF XOR PROBLEM [5]
FIGURE 13 TRAINING OF THE XOR NETWORK, MEAN SQUARE ERROR PLOT OVER 70 EPOCHS [5]
FIGURE 14 TRAINING OF XOR NETWORK WITH PERFORMANCE GOAL MET, MEAN SQUARE ERROR PLOT UNTIL CONVERGENCE [5]
FIGURE 15 ABS MODEL WITH PSEUDO RANDOM BINARY SEQUENCE INPUT
FIGURE 16 PSEUDO RANDOM BINARY SEQUENCE
FIGURE 17 A VISUAL REPRESENTATION OF INPUT AND OUTPUT DATA
FIGURE 18 PARALLEL IDENTIFICATION MODEL [3]
FIGURE 19 SERIES-PARALLEL IDENTIFICATION MODEL [3]
FIGURE 20 TRAINING PLOT OF TRAINGD
FIGURE 21 TRAINGDM PLOT WITH MOMENTUM CONSTANT OF 0.9
FIGURE 22 TRAINGDM WITH MU=0, TRAINING PLOT SIMILAR TO TRAINGD PLOT AS WEIGHT CHANGE BASED ON GRADIENT
FIGURE 23 TRAINING PLOT OF TRAINGDA
FIGURE 24 VARIABLE LEARNING RATE PLOTTED AGAINST EACH EPOCH ITERATION
FIGURE 25 TRAINLM PERFORMANCE TRAINING PLOT
FIGURE 26 TRAINING, TESTING AND VALIDATION DATA PLOT USING TRAINLM, HIGHLIGHTING OVER-FITTING
FIGURE 27 TRAINING, TESTING AND VALIDATION DATA PLOT USING TRAINGDM, HIGHLIGHTING OVER-FITTING
FIGURE 28 POST TRAINING ANALYSIS PLOT FOR TRAINLM ALGORITHM
FIGURE 29 POST TRAINING ANALYSIS PLOT FOR TRAINGDM ALGORITHM
FIGURE 31 POOR PERFORMANCE OF RECURRENT NEURAL NETWORK WITH TRAINLM ALGORITHM
FIGURE 32 TRAINGDX ALGORITHM PERFORMANCE PLOT
FIGURE 33 DETERIORATED PERFORMANCE OF RECURRENT NEURAL NETWORK
FIGURE 34 SUM SQUARE ERROR PLOT OF RECURRENT NETWORK WITH PRE AND POST PROCESSING IMPLEMENTED
FIGURE 35 THE BASIC CONCEPTS BEHIND GENETIC ALGORITHMS [7]

Chapter 1

Introduction
System identification, whether by conventional or neural network methods, is the
development of a mathematical model of a dynamic system based on empirical data.
The choice of identifier structure is based on well-established results in linear systems theory,
which can be applied with great success to the development of non-linear neural network identifiers.
This is the basis of neural network system identification and the
technique applied in the anti-lock braking system identification model. Before neural
network system identification and its merits are examined, artificial neural networks and their
concepts are described. The background and concepts behind artificial neural networks
are discussed, along with the development of these simple structures into more complex
recurrent neural networks.

1.1 Artificial Neural Networks

1.1.1 Background
Artificial Neural Networks (ANN) can be likened to collections of identical mathematical
models that emulate some of the observed properties of biological nervous systems and
draw on the analogies of adaptive biological learning. The key element of an Artificial
Neural Network is its structure. It is composed of a number of interconnected processing
elements tied together with weighted connections, which take inspiration from biological
neurons. Learning, as in a biological system, takes place through training, or exposure to a
set of input and output data, where the training algorithm adjusts the weights iteratively.
Artificial Neural Networks are good pattern recognition engines and robust classifiers, with
the ability to make decisions about imprecise input data. This ability makes them extremely
useful as a medical analysis tool. There is no need to provide a specific algorithm on how
to identify the disease when using a neural network. Neural networks learn by example so
the details of how to recognise the disease are not needed. What is needed is a set of
examples that is representative of all the variations of the disease. The quality of examples
is not as important as the 'quantity'. Artificial Neural Networks for this reason are used
extensively for system modelling where the physical processes are not understood fully or
are highly complex.

1.1.2 How the Human Brain Learns


The success of artificial neural networks at modelling highly complex physical
processes can be attributed to the architecture on which they are based, the human
brain. At present, brain function is not fully understood. A brain neuron collects signals
from other neurons of the Central Nervous System (CNS) through structures called
dendrites (Figure 1). The neuron sends out spikes of electrical activity through a long thin
strand called an axon. This axon splits into thousands of branches. At the end of each branch,
a structure called a synapse converts the activity from the axon into electrical effects (Figure
2). These effects may excite or inhibit activity in the connected neurons. When a neuron receives
an excitatory input that is sufficiently large compared with its inhibitory input, it sends a
spike of electrical activity down its axon. Learning occurs by changing the effectiveness of
the synapses so that the influence of one neuron on another changes [2].

Figure 1 Components of biological neuron [2]

The structure of the human brain neuron is the template for artificial learning. However,
lack of knowledge leads to approximations and assumptions of the general architecture of
an artificial neural network. The knowledge of neurons is incomplete and computing power
is limited so models are often idealisations of real networks of neurons.
Figure 2 Components of the synapse [2]

1.2 Artificial Neuron and Activation Function


The artificial neuron, like the biological neuron described in figures 1 & 2, is a processing
element. The output of an artificial neuron is calculated by multiplying each of its inputs by the
corresponding weight. The results are then added together and an activation function is applied to
the sum. The activation function transforms the activation level of a
unit (that is, a neuron) into an output signal. Typically, activation functions have a
“squashing” effect; they contain the output within a range.

1.2.1 Linear Activation Function


There are many activation functions that can be applied to neural networks; three main
activation functions are dealt with in this project [3].
The first is the linear transfer function, or purelin function. It is defined as follows:

$f(x) = x$    equation 1

Neurons of this type are used as linear approximators.

Figure 3 Linear activation function, equation 1

1.2.2 Non-Linear Activation Functions

There are several types of non-linear activation functions; the two most common are the
log-sigmoid transfer function and the tan-sigmoid transfer function. Plots of these
differentiable, non-linear activation functions are illustrated in figures 4 & 5. They are
commonly used in networks trained with backpropagation. The networks referred to in this
project are generally backpropagation models and they mainly use log-sig and tan-sig
activation functions. The log-sigmoid (logistic) activation function is defined by the equation

$\mathrm{logsig}(x) = \frac{1}{1 + e^{-\beta x}}$    equation 2

Normally β = 1, though it can be changed, which in turn changes the shape of the sigmoid. As β tends
toward infinity the sigmoid behaves more and more like a hard-limiter, whose slope
is zero everywhere except at the threshold. For finite β, where the slope is not zero, the output range is contained between 0 and
1.
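
As an illustration of equation 2, the short Python sketch below evaluates the log-sigmoid for several values of β. It is not part of the project's Matlab code; the function name logsig and the sample values of β and x are assumptions chosen only to show the curve steepening toward a hard limiter as β grows.

```python
import math

def logsig(x, beta=1.0):
    """Log-sigmoid 1 / (1 + exp(-beta * x)); the output lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-beta * x))

# Illustrative sample points (assumed values, not taken from the report).
sample_points = (-1.0, -0.1, 0.0, 0.1, 1.0)
for beta in (1.0, 5.0, 50.0):
    row = [round(logsig(x, beta), 4) for x in sample_points]
    print("beta =", beta, row)
```

For every β the output at x = 0 is 0.5; increasing β pushes the outputs on either side of zero toward 0 and 1.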

Figure 4 Log sigmoid activation function, equation 2

Figure 5 Tan-sigmoid activation function, equation 3

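Tansig(x) is mathematically equivalent to tanh(x), and is defined as

$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$    equation 3

The Matlab implementation of tansig(x) runs faster than that of tanh(x), with only very small numerical differences, so it is a good choice when speed is an important factor.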
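
The equivalence can be checked numerically. The sketch below (Python, not the project's Matlab code) writes the tan-sigmoid in the form 2/(1 + e^(-2x)) - 1, which is the form documented for the toolbox function tansig, and compares it with the library tanh. The range of sample points is an assumption made for illustration.

```python
import math

def tansig(x):
    """Tan-sigmoid written as 2 / (1 + exp(-2x)) - 1."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

# Compare with math.tanh on an assumed grid of points from -5 to 5.
points = [k / 10.0 for k in range(-50, 51)]
max_diff = max(abs(tansig(x) - math.tanh(x)) for x in points)
print("largest difference:", max_diff)
```

The two forms agree to within floating-point rounding, which is consistent with treating them as the same activation function.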

1.2.3 Neural Network Matlab Toolbox
These activation functions were built using Matlab. MATLAB, which stands for matrix
laboratory, is an interactive system that was originally written as software for matrix
computation. It has evolved into a testing and analysis research tool used in engineering,
mathematics and science. Matlab toolboxes are collections of functions used to solve
particular classes of problems. In this project the Matlab Neural Network Toolbox is used
to build, train and test system identification neural network models.

1.3 Summary

Artificial neural networks take the CNS of living creatures as the basis for their
architecture. Each artificial neuron requires an activation function, which is
either linear or non-linear. This function transforms the activation level of a unit into an
output signal. An activation function must be applied in all neural networks, including the
single layer perceptron.

Chapter 2

The Perceptron
A single layer perceptron (SLP) is the simplest form of artificial neural network that can be
built. This chapter discusses the single layer perceptron in detail. An SLP consists of one or more
artificial neurons in parallel. Each neuron in the single layer provides one network output,
and is usually connected to all of the external inputs. Figure 6
illustrates a very simple neural network; it consists of a single neuron in the output layer.

Figure 6 Single layer Perceptron Architecture [6]

There are n neurons in the input layer; each circle represents a neuron. The total input
stimulus to the neuron in the output layer is

$z_{in} = \sum_{i=0}^{n} x_i w_i = x_0 w_0 + x_1 w_1 + x_2 w_2 + \dots + x_n w_n$    equation 4

The output of the neuron is y = f(z_in). The input x0 is a special input, referred to as the bias
input. Its value is normally fixed at +1. Its associated weight w0 is referred to as the bias
weight.
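
A worked instance of equation 4 may help. The Python sketch below computes the net input with the bias treated as input x0 = +1; the weight and input values, and the choice of a 0/1 hard-limit for f, are assumptions made purely for illustration.

```python
def net_input(x, w):
    """z_in = sum over i of x[i] * w[i], where x[0] is the bias input (+1)."""
    return sum(xi * wi for xi, wi in zip(x, w))

def hardlim(z):
    """Hard-limit activation: 1 if z >= 0, otherwise 0 (assumed choice of f)."""
    return 1 if z >= 0 else 0

x = [1.0, 0.5, -1.0]   # x0 = +1 is the bias input; x1, x2 are assumed inputs
w = [0.3, 0.8, 0.2]    # w0 is the bias weight; w1, w2 are assumed weights

z_in = net_input(x, w)  # 1.0*0.3 + 0.5*0.8 + (-1.0)*0.2
y = hardlim(z_in)
print(z_in, y)
```

Folding the bias into the weight vector in this way means the same sum handles both the bias and the ordinary inputs.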

2.1 Implementing a Single Layer Perceptron in Matlab

2.1.1 Single Layer Perceptron Designed without Neural Network Toolbox


A single layer perceptron can be built in Matlab without the use of the neural network
toolbox [Appendix 1]. This approach to building a single layer perceptron encourages a
greater understanding of the concepts relating to neural networks. The single layer
perceptron implements a form of supervised learning. Supervised neural networks are
trained to produce desired outputs when specific inputs are applied to the system. Supervised
neural networks are particularly well suited for modelling and controlling dynamic systems,
classifying noisy data, and predicting future events. In this case, building without the
toolbox creates a less powerful but functioning SLP. When designing the SLP structure, the
weights are assigned small random values, and input and target output patterns are then applied.
The output of the perceptron is calculated from the equation
$y(k) = f(w^{T}(k)\,x(k))$    equation 5

y = output
w = weights
x = inputs

The weights are adapted using the error until ∆w = 0. The update of the weights is as follows:

$w(k+1) = w(k) + \mu\,e(k)\,x(k)$    equation 6
$e(k) = \Gamma(k) - y(k)$    equation 7

e = error
µ = fixed value
Γ = target
A hard limit activation function is used to calculate y(k). This activation function is a
threshold activation function; it is implemented in Matlab code using the sign function. The
activation function limits the output to between minus one and one. When an element is fed
through this function it returns one if the element is greater than zero, zero if it equals
zero, and minus one if it is less than zero. A zero output will never be produced if the target
is never set to zero. This network is in effect a binary output perceptron. It can only
classify input patterns that are linearly separable. Frank Rosenblatt first developed this
perceptron architecture in 1958 [3].
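
The update rule in equations 5 to 7 can be sketched in a few lines of Python. This is not the code in Appendix 1; it is a minimal illustration under stated assumptions: the weights start at zero rather than small random values (so the run is repeatable), µ is set to 0.5, targets are coded as -1 and +1, and the training patterns are the AND function with a leading bias input of +1.

```python
def sign(z):
    """Return 1 for z > 0, 0 for z == 0 and -1 for z < 0."""
    return (z > 0) - (z < 0)

def train_perceptron(patterns, targets, mu=0.5, max_epochs=100):
    """Apply w(k+1) = w(k) + mu * e(k) * x(k) until no weight changes in an epoch."""
    w = [0.0] * len(patterns[0])          # assumption: zero initial weights
    for epoch in range(1, max_epochs + 1):
        changed = False
        for x, t in zip(patterns, targets):
            y = sign(sum(wi * xi for wi, xi in zip(w, x)))   # equation 5
            e = t - y                                        # equation 7
            if e != 0:
                w = [wi + mu * e * xi for wi, xi in zip(w, x)]  # equation 6
                changed = True
        if not changed:                   # delta w = 0 over a whole epoch
            return w, epoch
    return w, max_epochs

# AND patterns; the first element of each pattern is the bias input +1.
patterns = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
targets = [-1, -1, -1, 1]

weights, epochs = train_perceptron(patterns, targets)
outputs = [sign(sum(wi * xi for wi, xi in zip(weights, x))) for x in patterns]
print(weights, epochs, outputs)
```

Because the AND patterns are linearly separable, the loop stops once every pattern is classified correctly; for a problem that is not linearly separable it would simply run until max_epochs.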
2.1.2 Designing and Training using the Neural Network Toolbox
Although it has been shown that a neural network can be implemented without the use of
the neural network toolbox, this network is limited in its applications. A single
layer neural network is most successfully built with the toolbox using the function newp()
[Appendix 2]. This function has a default hard limit activation function. Using newp(),
inputs can be classified according to Boolean AND logic (Figure 7). Inputs are fed into the
newly created neural network and targets are applied; these targets are based on the outputs
of an AND gate. The network is first presented with the inputs, and at this stage classification does not take
place. This model is based on a demonstration model in the toolbox itself called demop1.

Figure 7 Input vectors of the SLP plotted.

The network trains so that it behaves like an AND gate. The input patterns are linearly separable,
so the network can classify them as a one or a zero, as in binary logic. A classification line,
shown in blue in figure 8, is drawn across the input plane. If a new input is applied,
the newly trained network is simulated and classification of this new point occurs. In this
case the new input is [0.7; 1.2]; it is correctly classified as a one and shown in red on the
right side of the classification line.

Figure 8 Classification plot with new input correctly plotted in red.

After one training cycle of the network the correct classification is not always achieved. It
can take several training cycles or epochs to modify the weights until the correct
classification of the problem is achieved. This design structure is the basis for all artificial
neural networks. The SLP leads to the creation of multi-layer perceptrons, which are
structures of multiple single layer perceptrons.

2.2 Multi-Layer Perceptron

A multi-layer perceptron builds on the architecture of the single layer perceptron. The
single layer perceptron is not very useful because of its limited mapping ability; it is only
really applicable to linearly separable inputs. It will fail if the inputs are not linearly
separable. The SLP however, can be used as a building block for larger, much more
practical structures. Using multi-layer architectures, non-binary activation functions and
more complex training algorithms means the limitations of a simple perceptron may be
overcome. A typical multi-layer perceptron (MLP) network consists of a set of source
nodes forming the input layer, one or more hidden layers of computation nodes, and an
output layer of nodes, as illustrated in figure 9. The input signal propagates through the
network layer by layer. The computations performed by this feed forward network with a
single hidden layer, non-linear activation functions and a linear output layer can be written
mathematically as

$x = f(s) = B\varphi(As + a) + b$    equation 8

• s = inputs
• x = outputs
• A = weight matrix of the first layer
• a = bias vector of the first layer
• B = weight matrix of the second layer
• b = bias vector of the second layer
• ϕ = non-linearity function.
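
Equation 8 can be traced step by step with a small forward-pass sketch in Python. The matrices A and B, the bias vectors a and b, and the use of tanh for ϕ are assumed values picked for illustration; they are not weights from any network trained in this project.

```python
import math

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def mlp_forward(s, A, a, B, b, phi=math.tanh):
    """Compute x = B * phi(A * s + a) + b for one hidden layer and a linear output."""
    hidden = [phi(h + ai) for h, ai in zip(matvec(A, s), a)]
    return [o + bi for o, bi in zip(matvec(B, hidden), b)]

# Assumed example: 2 inputs, 2 hidden neurons, 1 linear output neuron.
A = [[1.0, -1.0], [0.5, 0.5]]
a = [0.0, -0.5]
B = [[2.0, 1.0]]
b = [0.1]

x = mlp_forward([1.0, 1.0], A, a, B, b)
print(x)
```

Here A s + a evaluates to [0, 0.5], so the hidden layer outputs are [tanh(0), tanh(0.5)] and the single output is tanh(0.5) + 0.1.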

Figure 9 Multi-layer perceptron architecture [6].


It has been proven that this architecture can approximate any continuous function to any
degree of accuracy on a compact set. For this reason the multi-layer perceptron has been termed a
universal approximator. However, it is never known in advance exactly how many hidden layers of
neurons will ensure optimum network convergence, or whether the weight matrix that
corresponds to a given error goal can be found. These solutions are unique to each neural
network and the input and output data applied [4]. To begin with, the MLP architecture is
applied to the XOR (exclusive-OR, also written EXOR) problem. Historically, it was this problem that first exhibited the
limitations of the SLP and also led to the development of more complex multi-layer
perceptrons. Minsky and Papert (1969) believed that in their “… intuitive judgement the
extension (to multi-layer systems would be) sterile”. This opinion was based on the
inability of the SLP to classify the XOR problem and other such linearly non-separable
problems [6].

2.2.1 Implementing a Multi-Layer Perceptron in Matlab - XOR Classification
The opinion of Minsky and Papert has since been discarded and the XOR problem solved
using multi-layer perceptrons. The XOR problem is linearly non-separable, so when it is
applied to the single layer perceptron no classification line can be plotted, because the input
plane cannot be divided by a single straight line, as shown in figure 10 below.

Figure 10 Output plot of XOR inputs into SLP, which was unable to perform classification
A new MLP must be built using the newff() function [5]. This creates a new feed-forward network,
which has an input layer, a hidden layer and an output layer (Figure 11).

Figure 11 Model of MLP that solves EXOR classification difficulties

The essence of this problem is to build a perceptron network that takes two Boolean inputs
and outputs the XOR of them. The XOR truth table is shown below in table 1.

X1 X2 Desired Output
0 0 0
0 1 1
1 0 1
1 1 0
Table 1 The XOR truth table

The first neuron is designed with strong negativity. The second neuron is designed with
strong positivity and the third neuron must discriminate between the two of them.

X1 X2 N1 N2 Y
0 0 0 0 0
0 1 0 1 1
1 0 0 1 1
1 1 1 1 0

Table 2 Truth table for the neuron with strong negativity N1 and the neuron with strong positivity N2 [5].
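
To show how two hidden neurons and one output neuron can realise XOR, the Python sketch below uses hand-picked weights and a 0/1 threshold activation. These weights are an assumption made for illustration, not the weights found by newff() or given in [5]: the first hidden neuron has a large negative bias and fires only when both inputs are 1, the second has a small negative bias and fires when at least one input is 1, and the output neuron fires when the second is on and the first is off.

```python
def step(z):
    """Threshold activation: 1 if z > 0, otherwise 0."""
    return 1 if z > 0 else 0

def xor_mlp(x1, x2):
    """Two-layer threshold network with assumed, hand-chosen weights."""
    n1 = step(x1 + x2 - 1.5)            # on only for (1, 1)
    n2 = step(x1 + x2 - 0.5)            # on when at least one input is 1
    return step(n2 - 2.0 * n1 - 0.5)    # on when n2 is on and n1 is off

# Evaluate all four input combinations in truth-table order.
table = [(x1, x2, xor_mlp(x1, x2)) for x1 in (0, 1) for x2 in (0, 1)]
for row in table:
    print(row)
```

Each hidden neuron on its own draws a single straight line in the input plane; it is the combination of the two lines in the output neuron that separates the XOR classes.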

This problem is now linearly separable and classification can be achieved. Matlab produces
a plot of the overall classification (Figure 12). There is a classification line through (0,0) to
(1,1) indicating the output is 0 for both of these inputs and another classification line
through (0,1) to (1,0) indicating that both of these inputs produce 1 as an output.

Figure 12 Overall classification of XOR problem [5]

The Neural Network does not automatically classify the inputs correctly. When the input
data was first applied the output was incorrect; the network had to be trained to recognise
the inputs and perform as an XOR gate. Training takes place over 70 epochs.
Figure 13 shows that the minimum gradient was reached but the performance goal
was not met.

Figure 13 Training of the XOR network, mean square error plot over 70 epochs [5]

The network is trained again. This time a specific goal is set for the network to achieve.
This goal is 0.0037^2. It takes only four epochs for this goal to be achieved and correct
classification then takes place (Figure 14). The hidden layer behaves like a little black box,
hence its name. Its behaviour is hidden from view and can only be
approximated, so it may behave slightly differently each time the network and training
algorithm are run, producing different results each time.

Figure 14 Training of XOR network with performance goal met, mean square error plot until convergence [5]

2.3 Summary
The single layer perceptron is the simplest form of artificial neural network. It is possible to
implement the SLP without the neural network toolbox but this perceptron is not as
powerful as one created with the toolbox. Using the toolbox the single layer perceptron can
perform classification on linearly separable inputs. The next step is to extend the single layer
perceptron architecture to solve more challenging problems.

This leads to the development of multi-layer perceptrons whose structure can be applied to
more difficult problems, which the SLP cannot solve. The XOR classification problem is
one such example. It clearly illustrates the benefits of multi-layer perceptrons in the
solution of classification problems. An MLP can classify linearly non-separable problems successfully.
The next step is to apply multi-layer perceptrons to system identification. The system
chosen to test MLP applications is the Anti-Lock Braking System (ABS).

Chapter 4

Anti-Lock Braking System


The ABS model is a demo model found in Matlab Simulink. It, like many other models,
can be modelled or identified using multi-layer perceptrons. A typical anti-lock braking
system senses when wheel lock-up is about to occur. It then releases the brakes for a very
short time and reapplies them when the wheel spins up again. ABS greatly reduces the
possibility of skidding during hard braking. ABS also lets the driver steer during braking.

This ability to steer during braking is one of the main benefits of ABS; in a hard braking
situation without ABS the wheels may skid and at times lose traction between the tyres
and road, which could result in accidents. Neural Networks have already been used with
great success to develop a genetic neural fuzzy controller. This controller finds the optimal
wheel slips that maximize the road adhesion coefficient [7]. The anti-lock braking system
lends itself to neural network modelling and Fuzzy Logic Control because of its need to
constantly alter its response to variations in its inputs [8]. It exhibits highly non-linear
behaviour, and artificial neural modelling of ABS leads to applications implemented in
the real world. For these reasons, and also because an externally controlled input can be
applied in the form of a Pseudo-Random Binary Sequence (PRBS), the ABS system is
chosen as a model. This model provides input and output data that is used in an artificial
neural network built with the Matlab neural network toolbox.

4.1 ABS Model

4.1.1 Basic Steps of System Identification


There are three phases of system identification:
• collect experimental input/output data,
• select and estimate the model structures used to build the neural network,
• validate the models and select the best model.
These are the steps followed in the development of multi-layer perceptrons for the ABS
model.

4.1.2 The Simulink Model

This ABS model is a simplified model of a normal ABS design (Figure 15). It captures the
essential features of the process, and it is reasonable to assume that it behaves as a real
ABS would [7]. It is possible to develop a set of equations to model this system. Recent
studies derive these equations in full from the tractive and normal forces acting on the
tyres and from other quantities such as adhesion and angular velocity [7]. The model used
in that research is very similar to (albeit more simplistic than) the model used in this
project, which is shown below.

Figure 15 ABS Model with pseudo random binary sequence input

4.1.3 Pseudo Random Binary Sequence Input
The controlled input into this model is a PRBS. Within Matlab/Simulink, a PRBS can be
generated either with mlbs (maximum length binary sequence) in the frequency domain
system identification toolbox or with idinput in version 4.0 of the system identification
toolbox. The resulting pseudo-random binary sequence is shown below (Figure 16). This
sequence is used as the controlled input into the ABS model. The input is either 0 or 1,
which provides random excitation. The input must be persistently exciting so that the
training set is representative of the entire class of inputs that may excite the system.

Figure 16 Pseudo random binary sequence.
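
As an illustration of how such a sequence can be produced without either toolbox, the
following minimal sketch generates a maximum-length binary sequence with a shift register
and modulo-2 feedback. It is not the code used in this project; the register length
(nBits), the feedback stages (taps) and the all-ones seed are assumptions chosen only for
the example.

```matlab
% Sketch only: a maximum-length PRBS from a 7-bit shift register.
% nBits, taps and the all-ones seed are illustrative assumptions.
nBits = 7;                        % register length
taps  = [7 6];                    % register stages that are XORed together
state = ones(1, nBits);           % any non-zero seed can be used
N     = 2^nBits - 1;              % period of a maximum-length sequence (127)
prbs  = zeros(1, N);

for k = 1:N
    prbs(k) = state(nBits);                             % output the last stage
    fb      = mod(state(taps(1)) + state(taps(2)), 2);  % XOR feedback bit
    state   = [fb, state(1:nBits-1)];                   % shift and insert feedback
end
```

The vector prbs holds one full period of 0/1 values. Each value would normally be held for
several simulation samples before it is applied to the model.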

4.2 Data

4.2.1 Data Collection


Data must be collected from the model during simulation. Input and output data are
collected using a workspace sink. The data is divided into three sets: training data,
testing data and validation data. Generally 60% of the input and output data is used for
training, 20% for testing and 20% for validation. However, previous research using neural
networks for feature extraction and temporal segmentation of acoustic signals used 80% of
the collected data for training and 20% for testing [9]. Testing and validation are very
important in developing an effective neural network, so when modelling the ABS 60% of the
data is used for training, and the testing and validation data files are each created from
20% of the collected data. This set of data is reused each time a new training algorithm is
implemented, to ensure that the training, testing and validation conditions remain constant
throughout the experiment. Any variation in results can then only be related to the
algorithm or the network architecture, as opposed to a different input data sequence and
its corresponding output data.
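
The 60/20/20 division described above can be sketched as follows. The signals u and y are
placeholders standing in for the logged ABS input and output, and the variable names are
hypothetical; only the index arithmetic is of interest.

```matlab
% Sketch only: split a recorded data set 60/20/20 by sample index.
% u and y are placeholder signals standing in for the logged ABS data.
N = 1000;                         % number of logged samples (assumed)
u = double(mod(1:N, 7) < 3);      % placeholder binary input
y = cumsum(u - 0.4) / 50;         % placeholder output

nTrain = round(0.6 * N);          % 60% for training
nTest  = round(0.2 * N);          % 20% for testing, the remainder for validation

idxTrain = 1:nTrain;
idxTest  = nTrain+1 : nTrain+nTest;
idxVal   = nTrain+nTest+1 : N;

trainIn = u(idxTrain);  trainOut = y(idxTrain);
testIn  = u(idxTest);   testOut  = y(idxTest);
valIn   = u(idxVal);    valOut   = y(idxVal);
```

Because the split depends only on the sample index, saving these six vectors once and
reloading them for every algorithm keeps the data identical across experiments.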

4.2.2 Loading Data


The data must be loaded into Matlab before the neural network can be run or trained. The
network must, in a sense, be able to see the input and target data, since it models itself
on this data. The data is loaded with the following code.

load input_data
load output_data

Figure 17 illustrates the response of the ABS to the random excitation signal input. This is
a visual representation of a small section of the data that is loaded before the network can be
run or trained.

Figure 17 A visual representation of input and output data

4.3 Summary
The ABS model exhibits a high level of non-linearity, which is the main reason for its choice
as a model for system identification. It is also easy to modify the Simulink model so that a
PRBS input can be applied. Data is collected from the ABS Simulink model and applied to
the design of the neural network. This is the first step in the system identification
process.

Chapter 5

Building Neural Network – the design detail


System identification is carried out in phases. The first is the data collection process,
which is outlined in the previous section (Section 4.2 Data). Next, this data is processed to
filter it and remove any outliers. Processing can improve the overall performance of a
model [11]. A model structure is then selected and the best parameters for this structure
are computed. Finally, the model's properties and convergence results are examined and
analysed. The Matlab neural network toolbox provides all the functions needed to follow
these procedures.

5.1 Design Detail

5.1.1 Pre & Post processing


Network training can be made more efficient if certain processing steps are performed on
the network inputs and targets. Two types of pre and post processing are implemented in
testing [11]:
• scaling to a minimum and maximum value,
• normalisation of the mean and standard deviation of the training set.

Scaling
The function premnmx() is used to scale the inputs and targets so that they fall within the
range [-1,1]. The network is therefore trained to produce outputs in this range. The
function postmnmx() converts these outputs back into the units that were used for the
original targets.
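
The arithmetic behind this kind of scaling can be sketched in base Matlab as follows. This
is a hand-written illustration of min-max scaling to [-1,1] and its inverse, not the toolbox
implementation; the data vector p is an assumed example.

```matlab
% Sketch only: min-max scaling of one signal to [-1,1] and back.
p     = [2 4 6 10];                          % illustrative data, one row
minp  = min(p);
maxp  = max(p);
pn    = 2*(p - minp)/(maxp - minp) - 1;      % scaled values in [-1,1]
pBack = 0.5*(pn + 1)*(maxp - minp) + minp;   % inverse mapping to original units
```

The minimum and maximum found on the training data must be stored, since the same values
are needed to scale new inputs and to convert network outputs back afterwards.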

Mean and Standard Deviation


The second approach is normalisation of the mean and standard deviation of the training set.
This is done using prestd(), which normalises the inputs and targets so that they have zero
mean and unity standard deviation. The outputs are converted back into the units used for
the original targets using poststd().
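
A minimal base-Matlab sketch of this normalisation and its inverse is given below. It is a
hand-written illustration rather than the toolbox code, and the data vector p is an
assumed example.

```matlab
% Sketch only: normalise one signal to zero mean and unit standard deviation.
p     = [2 4 6 10];             % illustrative data, one row
meanp = mean(p);
stdp  = std(p);
pn    = (p - meanp)/stdp;       % normalised values
pBack = pn*stdp + meanp;        % inverse mapping to original units
```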

5.1.2 Neural Network Model Structure

There are two basic neural network model structures: the parallel identification structure
and the series-parallel structure. The parallel identification structure has direct feedback
from the network outputs to its inputs (Figure 18). It estimates the outputs and uses these
estimates to predict future outputs. However, because of this feedback the structure does
not guarantee stability. It also requires dynamic backpropagation training. This structure
is only used if the actual plant outputs are not available.

Figure 18 Parallel identification model [3]

The series-parallel identification structure does not use feedback (Figure 19). Instead, it
uses the actual plant output to predict future outputs. Static backpropagation is used, and
stability and convergence are generally guaranteed with this method [3].

Figure 19 Series-parallel identification model [3]

Like a conventional system identification model, a neural network model structure is defined
by its inputs, but also by the neural network architecture. This architecture comprises the
type of network, the number of hidden layers and the number of hidden nodes. In this case
the series-parallel identification model is used as the neural network model structure
because of its stability and convergence properties and because it can be used off-line [10].
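
To make the series-parallel arrangement concrete, the sketch below builds training pairs
in which the measured plant output at the previous step, together with the previous input,
is used to predict the current plant output. The first-order toy plant, the single delay and
the names P and T are assumptions made for the example and are not the ABS model.

```matlab
% Sketch only: series-parallel training pairs for a first-order model.
% The toy plant below is a stand-in for logged ABS data.
N = 50;
u = double(mod(1:N, 5) < 2);         % placeholder binary input
y = zeros(1, N);
for k = 2:N
    y(k) = 0.8*y(k-1) + 0.2*u(k-1);  % toy plant (assumed)
end

% Each network input column holds the measured plant output and the input
% at step k-1; the target is the measured plant output at step k.
P = [y(1:N-1); u(1:N-1)];            % 2 x (N-1) network inputs
T = y(2:N);                          % 1 x (N-1) targets
```

In the parallel structure the first row of P would instead hold the network's own previous
estimates, which is why that structure needs dynamic backpropagation.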

5.1.3 Types of Neural Networks


Two types of neural network are used to construct this series-parallel identification model:
multi-layer networks and recurrent neural networks. First, the series-parallel
identification model is constructed in Matlab using a feed-forward backpropagation network,
which is a multi-layer network [Appendix 3]. This feed-forward model, which uses static
backpropagation, is built using newff(), as shown in the example MLP code. A neural
network is not programmed but 'trained'. The procedure used to adjust the weights of the
links so as to produce the desired output is known as training the network.
Backpropagation involves performing computations backwards through the neural network.
There are several variations on the basic backpropagation training algorithm. These
variations are the basis of the test procedures that evaluate the most effective way to
model the ABS.

MLP Code

%Designing Neural Network

close all    % close all open figures
clear all    % clear all old variables, to reduce the risk of confusing errors

tic;
load input_data
load output_data
teach1=teach1';    % inputs arranged as one column per sample
teach2=teach2';    % targets arranged as one column per sample

% scale inputs and targets to [-1,1] before the network is created
[pn,minp,maxp,tn,mint,maxt] = premnmx(teach1,teach2);

net = newff(minmax(pn),[5,2],{'tansig','purelin'},'traingd');

%Training the Neural Network

net=init(net);
%net.trainParam.show=5;
net.trainParam.epochs=200;
net.trainParam.lr=0.02;
[net,tr]=train(net,pn,tn);    % tr is the training record used by the plot below

Y = sim(net,pn);    % simulate with the scaled inputs the network was trained on

%plot(tr.epoch,tr.perf,tr.epoch,tr.vperf,tr.epoch,tr.tperf)
%legend('Training','Validation','Test',-1);
%ylabel('Squared Error'); xlabel('Epoch')

toc

5.1.4 Training Algorithms – multi-layer network results
The ABS is modelled using a number of training algorithms; the first is steepest descent.
Steepest descent is the simplest implementation of backpropagation learning. It updates the
network weights and biases in the direction in which the performance function decreases
most rapidly, i.e. the negative of the gradient. One iteration of this algorithm is written as
$$x_{k+1} = x_k - \alpha_k g_k \qquad \text{(equation 9)}$$

where
$x_k$ = vector of current weights and biases,
$g_k$ = current gradient,
$\alpha_k$ = learning rate.
This is known as the steepest (gradient) descent training function. The changes to the
weights and biases are obtained by multiplying the learning rate by the negative gradient.
The higher the learning rate, the larger the step taken. If the learning rate is set too large
the algorithm can become unstable; if it is set too small the algorithm will take too long to
converge.
Traingd implements the steepest descent algorithm. Figure 20 shows the training plot of
the artificial neural network using Traingd. The learning rate is set to 0.02. The
performance of the network is measured in this case by the mean square error (mse).
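
The update rule can be illustrated on a small problem whose gradient is known exactly. The
sketch below applies steepest descent to a two-variable quadratic error surface; the matrix
A, the vector c and the starting point are assumptions chosen for the example, and only the
learning rate of 0.02 is taken from the text.

```matlab
% Sketch only: steepest descent on f(x) = 0.5*x'*A*x - c'*x.
A     = [3 1; 1 2];          % illustrative positive definite matrix
c     = [1; 1];
alpha = 0.02;                % learning rate, as used for Traingd in the text
x     = [0; 0];              % initial "weights"
for k = 1:2000
    g = A*x - c;             % gradient of f at the current point
    x = x - alpha*g;         % update: x(k+1) = x(k) - alpha*g(k)
end
xExact = A\c;                % minimiser of f, for comparison
```

For this quadratic the iteration diverges once alpha exceeds 2 divided by the largest
eigenvalue of A, which is the instability referred to above; a very small alpha simply needs
many more iterations.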

Figure 20 Training plot of Traingd

Traingdm implements steepest descent with momentum. Momentum allows the network to
respond not only to the local gradient but also to recent trends in the error surface.
Momentum helps the network to slide through a shallow local minimum rather than becoming
stuck in it. The momentum constant µ is a number between 0 and 1. The training plot in
Figure 21 shows the ABS data modelled with the traingdm algorithm with a momentum
constant of 0.9. When the momentum constant is 1 the new weight change is set equal to the
last weight change and the gradient is simply ignored. When the momentum constant µ is 0
the weight change is based solely on the gradient and traingdm behaves as the traingd
algorithm would (Figure 22).
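
The two limiting cases can be checked with a short sketch. It assumes one common form of
the momentum update, in which the new weight change is a blend of the previous change and
the plain gradient step, and applies it to an illustrative quadratic error surface. The
matrix A, the vector c and the list of momentum constants are assumptions for the example.

```matlab
% Sketch only: steepest descent with momentum on an illustrative quadratic.
A      = [3 1; 1 2];
c      = [1; 1];
alpha  = 0.02;                          % learning rate
x0     = [0; 0];
muList = [0 0.9 1];                     % momentum constants to compare
xFinal = zeros(2, numel(muList));
for j = 1:numel(muList)
    mu = muList(j);
    x  = x0;
    dx = [0; 0];                        % previous weight change
    for k = 1:2000
        g  = A*x - c;                   % gradient at the current point
        dx = mu*dx - (1 - mu)*alpha*g;  % blend of last change and gradient step
        x  = x + dx;
    end
    xFinal(:, j) = x;                   % result for this momentum constant
end
```

With mu = 0 the loop reduces to plain steepest descent, and with mu = 1 the gradient term
vanishes, so the weights never leave the starting point when the initial change is zero.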

Figure 21 Traingdm plot with momentum constant of 0.9

Figure 22 Traingdm with mu=0, training plot similar to Traingd plot as weight change based on gradient

Traingda implements the steepest descent training function with a variable learning rate. If
the learning rate is set too large the algorithm can oscillate and become unstable, but if it
is set too small the algorithm will take too long to converge. With Traingda the learning
rate is allowed to change during the training process in response to the complexity of the
local error surface. The procedure increases the learning rate, but only to the extent that
the network can learn without large increases in error, so that near-optimal learning is
achieved for the local terrain. When a larger learning rate could still result in stable
learning the learning rate is increased; when the learning rate is too high to guarantee a
decrease in error it is decreased until stable learning resumes. In Figure 23 the minimum
gradient is reached by epoch 66, so the learning rate variation and the training stop at
this epoch.
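
The accept/reject logic of a variable learning rate can be sketched as follows. This is an
illustration on a toy quadratic error, not the toolbox routine; the growth factor lrInc, the
shrink factor lrDec and the tolerated error ratio maxInc are assumed values, as are A, xStar
and the starting point.

```matlab
% Sketch only: steepest descent with a variable learning rate.
% lrInc, lrDec and maxInc are assumed illustrative values.
A      = [3 1; 1 2];
xStar  = [0.2; 0.4];                          % location of the minimum (assumed)
f      = @(x) 0.5*(x - xStar)'*A*(x - xStar); % non-negative error measure
lr     = 0.02;                  % initial learning rate
lrInc  = 1.05;                  % growth factor after a successful step
lrDec  = 0.7;                   % shrink factor after a rejected step
maxInc = 1.04;                  % largest tolerated ratio of new to old error
x        = [2; -2];
errOld   = f(x);
errStart = errOld;
lrHist   = zeros(1, 100);
for k = 1:100
    g      = A*(x - xStar);     % gradient of f
    xNew   = x - lr*g;          % trial step with the current learning rate
    errNew = f(xNew);
    if errNew > maxInc*errOld   % error grew too much: reject step, slow down
        lr = lr*lrDec;
    else                        % accept the step
        if errNew < errOld
            lr = lr*lrInc;      % error fell: try a larger rate next time
        end
        x      = xNew;
        errOld = errNew;
    end
    lrHist(k) = lr;             % learning rate used for the next epoch
end
```

Plotting lrHist against the epoch number gives a curve of the same kind as the learning
rate plot discussed in this section.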

Figure 23 Training plot of Traingda

The increase in learning rate is plotted in Figure 24. The learning rate increase terminates
at epoch 66, when training stops. The training plots produced with these steepest descent
algorithms all reach a performance of around 0.477 mse. Another type of algorithm,
Trainlm, is also implemented; however, it does not change the mse performance.

Figure 24 Variable learning rate plotted against each epoch iteration


Trainlm implements the Levenberg-Marquardt algorithm. This algorithm was designed to
approach second-order training speed without having to compute the Hessian matrix (second
derivatives) of the performance index at the current values of the weights and biases. It
appears to be a faster method for training moderate-sized feed-forward neural networks.
In this case the elapsed training time is just 9.0470 seconds, compared with approximately
15 seconds for the Traingda algorithm. Trainlm is very efficient in Matlab because the
solution of the matrix equation is a built-in function, so its advantages become even more
pronounced in a Matlab setting [11].
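
The idea of replacing the Hessian by a product of Jacobians can be illustrated on a small
curve-fitting problem. The sketch below is not trainlm; it applies a basic
Levenberg-Marquardt step to a two-parameter exponential model, and the data, the initial
guess and the damping schedule (divide or multiply mu by 10) are assumptions for the
example.

```matlab
% Sketch only: Levenberg-Marquardt fit of y = a*exp(-b*t) to noise-free data.
t     = (0:0.5:5)';
yData = 2*exp(-0.7*t);            % illustrative data generated with a = 2, b = 0.7
w     = [1; 0.3];                 % initial guess for [a; b]
mu    = 0.01;                     % damping parameter (assumed start value)
model = @(w) w(1)*exp(-w(2)*t);
e     = yData - model(w);         % error vector
sseStart = e'*e;
for k = 1:50
    J  = [-exp(-w(2)*t), w(1)*t.*exp(-w(2)*t)];   % Jacobian of e w.r.t. [a; b]
    dw = -(J'*J + mu*eye(2)) \ (J'*e);            % J'*J approximates the Hessian
    eNew = yData - model(w + dw);
    if eNew'*eNew < e'*e
        w = w + dw;  e = eNew;  mu = mu/10;       % accept: behave more like Gauss-Newton
    else
        mu = mu*10;                               % reject: behave more like gradient descent
    end
end
sseEnd = e'*e;
```

The backslash operator solves the matrix equation directly, which is the built-in
operation referred to above.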

Figure 25 Trainlm performance training plot

The performance of the network remains constant at approximately 0.477 mse. Even with
the most efficient training algorithm in the toolbox the performance is unchanged. It is
concluded that the efficiency of this algorithm relates to run time, that is, to its ability
to complete training more rapidly than the other algorithms.

5.2 Test Procedure for Multi-Layer Neural Network

5.2.1 Over-fitting
The procedure described previously is now applied to the whole collection of data, that is,
to the training, testing and validation sets together. The resulting plots highlight how the
neural model is performing. When the data sets are used without pre-processing the
performance is 0.6667 mse (Figure 26). This is not as good as the 0.477 mse achieved with
pre and post processing of the data set. Over-fitting, however, cannot be held responsible
for this neural network's poor performance.

Over-fitting occurs when the error on the training set is driven to a very small value but
the error is large when new data is presented to the network. The network has memorised
the training examples but has not learned to generalise to new situations. These data sets
do not show signs of over-fitting. Over-fitting is typically indicated by the validation
error rising and converging at a higher level than the training error [11]. This is not the
case in the Traingdm plot, where the test set error rises above the validation and training
set errors (Figure 27). If over-fitting were to occur, early stopping could be implemented.
In this case more data is easily collected from the ABS model and the size of the training
set increased, which greatly reduces the risk of over-fitting.
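
Early stopping can be sketched as a simple rule applied to the validation error recorded
at each epoch. The error values and the limit maxFail below are invented for illustration;
they are not results from the ABS experiments.

```matlab
% Sketch only: early stopping on an illustrative validation error curve.
valErr  = [1.0 0.8 0.6 0.5 0.45 0.47 0.50 0.55 0.61 0.70 0.80];
maxFail = 5;                 % epochs without improvement tolerated (assumed)
bestErr   = inf;
bestEpoch = 0;
failCount = 0;
stopEpoch = numel(valErr);
for epoch = 1:numel(valErr)
    if valErr(epoch) < bestErr
        bestErr   = valErr(epoch);   % new best: remember it and reset the counter
        bestEpoch = epoch;
        failCount = 0;
    else
        failCount = failCount + 1;   % validation error did not improve
    end
    if failCount >= maxFail
        stopEpoch = epoch;           % stop and keep the weights of bestEpoch
        break
    end
end
```

In a real training loop the weights and biases saved at bestEpoch would be restored when
training stops.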

Figure 26 Training, testing and validation data plot using Trainlm, highlighting over-fitting

Figure 27 Training, testing and validation data plot using Traingdm, highlighting over-fitting

5.2.2 Post Training Analysis

This neural network appears to produce a poor response, which is evident from the
training, testing and validation plots. In order to examine exactly how poor the response
is, post-training analysis is carried out. Post-training (regression) analysis is performed
between the network response and the corresponding targets. The following code produces a
plot for post-training analysis.

[a]=postmnmx(Y,mint,maxt);
[m,b,r]=postreg(a(2,:),teach2(2,:));

m and b correspond to the slope and the y-intercept of the best linear regression relating
targets to the network outputs. If there were a perfect fit, i.e. if the outputs exactly
equalled the targets, the slope would be 1 and the y-intercept would be 0. The third
variable returned, the R-value, is the correlation coefficient between the outputs and
targets. It is a measure of how well the variation in output is explained by the targets. If
this number is equal to 1, there is perfect correlation between targets and outputs. These
are the post-training analysis outputs for the Trainlm algorithm.

m= 6.9188e-005

b=63.2698

r=0.0083

Figure 28 Post training analysis plot for Trainlm algorithm.

The R-value is extremely low and indicates a very poor linear fit, which is shown in Figure
28. A similar plot is obtained for the Traingdm algorithm (Figure 29), indicating the overall
weakness of the neural network at system identification. The R-value in this case is
negative, but the system still exhibits a poor linear fit.
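
The three quantities used in this analysis can also be computed with base Matlab functions,
which makes their meaning explicit. The sketch below is an illustration only: the target
vector t and output vector a are invented values and are not taken from the ABS network.

```matlab
% Sketch only: slope, intercept and correlation between outputs and targets.
t = [1 2 3 4 5 6];                 % illustrative targets
a = [1.2 1.9 3.2 3.8 5.1 6.1];     % illustrative network outputs
coef = polyfit(t, a, 1);           % least-squares line a = m*t + b
m = coef(1);                       % slope
b = coef(2);                       % y-intercept
R = corrcoef(t, a);                % 2 x 2 correlation matrix
r = R(1, 2);                       % correlation coefficient
```

For a well-trained network m is close to 1, b close to 0 and r close to 1; values of m and
r near zero mean that the outputs barely follow the targets.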

Figure 29 Post training analysis plot for Traingdm algorithm.

The analysis of this system indicates a very poorly functioning neural network identifier.
This system could be improved by changing the architecture. This could be done by adding
more hidden layers and increasing the number of input neurons. The optimum structure is
found through trial and error. Some changes are made to the structure, but no significant
improvement in performance results. The hidden layer of the network trained with the
Trainlm algorithm is increased from 5 neurons to 22 neurons. The training performance
plot shows no significant change (Figure 30).

Figure 30 Performance training plot with 22 hidden neurons


5.3 Summary
The building of a neural network follows a number of systematic procedures. Adherence to
these procedures does not necessarily guarantee a highly effective neural network model.
The model requires rigorous testing to obtain the optimal architecture, as the number of
hidden layers and the number of neurons in each layer determine the performance of the
network. The MLP architecture in this study does not reach its optimal potential. However,
the structure provides the basis for a recurrent neural network.

Chapter 6

Recurrent Neural Networks

6.1 Structure of Recurrent Neural Network – design detail

Although multi-layer networks and recurrent neural networks have different structures, they
may be viewed similarly. In effect, recurrent neural networks used for identification or
model-based predictive control are multi-layer neural networks with a delay element in their
feedback loop. Recurrent neural networks could also be built with multi-layer networks in
their feedback loop, creating a system in which the structures compute in tandem. The
networks therefore have the potential to be used in unison in systems with both dynamic
elements and feedback [10]. This is beyond the scope of the structures examined and tested
with the ABS data: multi-layer perceptrons and recurrent neural networks are tested as
separate entities and their results compared. There are two recurrent neural network
structures available in the Matlab neural network toolbox: the Hopfield and the Elman
structures. The Elman structure is chosen as the architecture of the recurrent network used
to model the ABS. This choice is made because the Hopfield architecture is seldom used in
practice; even the best Hopfield designs may produce spurious results that lead to incorrect
answers [11]. Elman networks are two-layer backpropagation networks with the addition of
a feedback connection from the output of the hidden layer to its input.

6.2 The Elman Structure

6.2.1 Building the structure


The structure of the Elman recurrent network takes its skeletal shape from the multi-layer
architecture. The Matlab function newelm() is used in a similar way to newff(). This
function includes a delay in the feedback loop, hence creating a recurrent neural network
architecture. The Elman code below is an example of the code used to test the Elman
structure.

Elman Code

%Designing Recurrent Neural Network

close all    % close all open figures
clear all    % clear all old variables, to reduce the risk of confusing errors

tic;
load input_data
load output_data
teach1=teach1';    %converting input sequence into columns
teach2=teach2';    %converting the target to columns

% scale inputs and targets to [-1,1], then convert them to sequences
[pn,minp,maxp,tn,mint,maxt]=premnmx(teach1,teach2);
pnseq=con2seq(pn);
tnseq=con2seq(tn);

% input range taken from the scaled data the network is trained on
net=newelm(minmax(pn),[5,2],{'tansig','tansig'},'traingdx');
net=init(net);

net.trainParam.epochs=300;
net.trainParam.show=5;
net.trainParam.goal=0.01;
net.performFcn='sse';

[net,tr]=train(net,pnseq,tnseq);

toc
semilogy(tr.epoch,tr.perf)    % training record on a logarithmic error axis
title('Sum squared error of Elman Network')
xlabel('Epoch')
ylabel('Sum squared error')
Y=sim(net,pnseq);

The recurrent connection present in the Elman network allows the network to detect and
generate time-varying patterns. The Elman structure differs from conventional two-layer
networks in that the first layer has a recurrent connection. The delay in this connection
stores values from the previous time step, which can be used in the current time step. This
property may give rise to apparently inconsistent results: even if two Elman networks with
the same weights and biases are given identical inputs at a given time step, their outputs
can differ because of different feedback states. The network has proved effective at storing
information for future reference, and that is why it is tested for identification of the ABS
model. Different training algorithms are tested and the results compared with those of the
multi-layer structures.
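
The dependence of the output on the stored feedback state can be shown with a single time
step of a small Elman-style network. The sketch below is written in base Matlab, uses tanh
in place of tansig (the two are mathematically equivalent), and all weights and state
values are invented for illustration; none are taken from the trained ABS network.

```matlab
% Sketch only: one time step of an Elman-style network.
% All weights are illustrative; they are not taken from the trained ABS model.
W1 = [0.5; -0.3];            % input-to-hidden weights (2 hidden neurons)
Wc = [0.2 0.1; -0.4 0.3];    % context (delayed hidden output) weights
b1 = [0.1; -0.1];            % hidden layer biases
W2 = [1.0 -1.0];             % hidden-to-output weights
b2 = 0.05;                   % output bias
step = @(u, h) tanh(W1*u + Wc*h + b1);   % new hidden state from input and old state

u  = 1;                      % identical input presented to both copies
hA = [0; 0];                 % copy A starts from a zero context
hB = [0.8; -0.6];            % copy B carries a different stored context
yA = W2*step(u, hA) + b2;    % output of copy A
yB = W2*step(u, hB) + b2;    % output of copy B
```

Both copies share the same weights and receive the same input, yet yA and yB differ
because the delayed hidden-layer values differ.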

6.2.2 Results
Trainlm, the quickest of all the algorithms, is the first algorithm used to train the
network. It tends to proceed so rapidly that it does not necessarily do well when
implemented in Elman structures. Speed, however, is relative here: the algorithm takes
75.0630 minutes to run 100 epochs, compared with the multi-layer network run time of
28.6410 seconds for the same algorithm. The performance results are also very poor. The
mean square error performance measurement is 3954.82. Figure 31 highlights the network's
poor performance.

Figure 31 Poor performance of recurrent neural network with Trainlm algorithm


Traingdx is now implemented to see if the trainlm performance can be bettered. It takes
2.7670e+003 minutes to run 100 epochs, which is significantly longer than the trainlm run
time of 75.0630 minutes. By the time the maximum epoch is reached, its error performance is
only slightly better than that of the trainlm algorithm.

Figure 32 Traingdx algorithm performance plot

These results are inadequate, so pre and post processing is implemented to see if
improvements can be made. First, the mean and standard deviation of the input and target
data are normalised, so that they have zero mean and unity standard deviation. After
training, the inputs and outputs are scaled back into the original units. This does not
improve performance; in fact Figure 33 shows that performance has deteriorated.

Figure 33 Deteriorated performance of recurrent neural network

A second type of pre and post processing, scaling, is implemented because of the lack of
success with the mean and standard deviation method. The function premnmx() scales the
data for training and postmnmx() converts the data back to its original units after the
algorithm has run. The resultant plot, shown in Figure 34, does not show any significant
difference in performance from that obtained when mean and standard deviation processing
was carried out on the data.

Figure 34 Sum square error plot of recurrent network with pre and post processing implemented

6.3 Overall Analysis of Networks

6.3.1 Comparison
Neither of the systems tested, the MLP and the recurrent network, performs to its optimum
potential. The multi-layer network outperforms the recurrent network in terms of both run
time and squared error performance. This result is not wholly unexpected, because both
structures tested had just one hidden layer with a maximum of 5 neurons in this layer. For
an Elman network to have the best chance of learning a problem it needs more hidden
neurons in its hidden layer than are actually required for a solution by any other method.
With fewer neurons, the Elman network is less able to find the appropriate weights for the
hidden neurons, since the error gradient is approximated [11]. Extensive testing is needed
to improve the performance of both networks, because sometimes only a very slight
modification of the architecture is needed to produce a huge improvement in performance.
For recurrent networks this testing is restricted by the length of time it takes for the
networks to converge using the backpropagation algorithm; sometimes the structures have to
be left overnight to train. The Genetic Algorithm (GA) is a possible alternative to the
backpropagation training algorithm because it is not based on the error gradient and does
not require as much computational time when the number of neurons is high [12].
Development of genetic algorithms for identification and training purposes is a relatively
new direction and could produce extremely interesting results.

6.3.2 Conclusion and Future Directions


Genetic algorithms implemented in recent research have shown that the training cost in
terms of run time remains manageable as the number of neurons increases [12]. Genetic
algorithms are based on a different concept from the backpropagation training algorithm,
and they offer an exciting future direction for the research. The GA starts with a
population of randomly generated chromosomes and advances towards better chromosomes by
applying genetic operators. During successive iterations, or generations, the chromosomes
are evaluated as possible solutions. Based on these evaluations, a new population is formed
using a selection mechanism and genetic operators such as crossover and mutation.
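
The loop of evaluation, selection, crossover and mutation can be sketched as follows. This
is a minimal real-coded genetic algorithm written for illustration only; it is not the
algorithm of [12] or [7]. The toy cost function, the population size, the operators and the
rates are all assumptions, and the cost stands in for a network training error.

```matlab
% Sketch only: a small real-coded genetic algorithm.
% Population size, operators and rates are illustrative assumptions.
cost  = @(w) sum((w - [0.2; 0.4]).^2);    % toy cost standing in for a training error
nPop  = 20;  nGen = 40;  nVar = 2;
pMut  = 0.2;  sigma = 0.1;                % mutation probability and step size
pop   = 4*rand(nVar, nPop) - 2;           % random initial chromosomes in [-2,2]
fitness = zeros(1, nPop);
for i = 1:nPop, fitness(i) = cost(pop(:, i)); end
bestStart = min(fitness);
for gen = 1:nGen
    [fitness, order] = sort(fitness);     % rank chromosomes, best first
    pop    = pop(:, order);
    newPop = pop;
    for i = 2:nPop                        % chromosome 1 (the best) is kept unchanged
        pa = pop(:, ceil(rand*nPop/2));   % selection: parents from the better half
        pb = pop(:, ceil(rand*nPop/2));
        lam   = rand;
        child = lam*pa + (1 - lam)*pb;    % arithmetic crossover
        if rand < pMut
            child = child + sigma*randn(nVar, 1);   % Gaussian mutation
        end
        newPop(:, i) = child;
    end
    pop = newPop;
    for i = 1:nPop, fitness(i) = cost(pop(:, i)); end
end
bestEnd = min(fitness);
```

Because the best chromosome of each generation is copied unchanged into the next, the best
cost can never get worse from one generation to the next. No gradient is used at any point.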
Figure 35 illustrates the basic concepts behind the genetic operators. These are the genetic
operators of a genetic algorithm used for optimising the fuzzy rule base of the fuzzy
component of an ABS controller [7].

Figure 35 The basic concepts behind genetic algorithms [7]
Future development of the project need not be limited to the use of genetic algorithms and
the improvement of the structures that use backpropagation. An architecture may also be
developed that includes both multi-layer and recurrent networks, hence maximising the
strengths of the individual architectures in one unified unit. The multi-layer network's
strength lies in its success at pattern recognition problems, and the recurrent network's
strength lies in its solution of optimisation problems. The Matlab toolbox has proved a very
powerful tool for building each of the architectures separately; its capabilities may be
investigated further and perhaps extended to build a more complex model. In this study the
development of research and testing has been progressive. It traces the development of the
SLP through its growth into recurrent networks. Testing highlights the flaws in all the
architectures: the SLP's inability to perform non-linear classification, the MLP's poor
error performance and the recurrent network's poor error performance and long training
durations. Possible solutions are offered and interesting future directions are discussed in
the form of genetic algorithm development and architecture modification.

References

[1] Bruce D. Baker & Craig E. Richards (In Press), Exploratory application of neural
networks to school finance: forecasting educational spending

[2] Arthur W.Ham, (1974), Histology Seventh Edition, J.B. Lippincott Company,
Philadelphia and Toronto.
[3] J.Wesley Hines (1997), Fuzzy and Neural Approaches in Engineering, A Wiley-
Interscience Publication, John Wiley & Sons, INC.
[4] S. Haykin (1994), Neural Networks: A Comprehensive Foundation, Macmillan, New
York.
[5] Jennifer Bruton, Course notes and reference code mlpeg1
[6] Chris Stergiou, Historical Background of Neural Networks
http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol1/cs11/article1.html
[7] Yonggon Lee & Stanislaw H. Zak (2001), Designing a Genetic Neural Fuzzy Anti-
Lock Brake System Controller, IEEE Transactions on Evolutionary Computation
[8] W.K. Lennon & K.M. Passino (1995), “Intelligent control for brake systems”,IEEE
Transactions on Fuzzy Systems, VOL.3, 381-388.
[9] S. Rossignol, X. Rodet, J. Soumagne, J-L Collette & P. Depalle, Feature extraction
and temporal segmentation of acoustic signal, CNET/RENNES (Centre National
d’Etudes des Télécommunications), France
[10] Kumpati S Narendra & Kannan Parthasarathy (1990), Identification and Control of
Dynamical Systems Using Neural Networks, IEEE Transactions on Neural
Networks, VOL 1, no. 1.
[11] http://www.mathworks.com/access/helpdesk/help/helpdesk.shtml
[12] A. Blanco, M. Delgado, M.C. Pegalajar (2001), A real-coded genetic algorithm for
training recurrent neural networks, Neural Networks VOL 14, 93-95.

Appendix 1
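%The AND gate problem again, this time with 12 cycles
clear
w1=[0 1 -1]';          % initial weights [bias weight; weight 1; weight 2]
b=1;                   % constant bias input
k=1;
x1=[-1 -1]';           % the four bipolar input patterns
x2=[-1 1]';
x3=[1 -1]';
x4=[1 1]';
tau1=-1;               % AND targets: only x4 belongs to class +1
tau2=-1;
tau3=-1;
tau4=1;
tau=[tau1 tau2 tau3 tau4];
p=[[b;x1],[b;x2],[b;x3],[b;x4]];   % augmented patterns, one per column
mu=0.2;                % learning rate
new_w(:,k)=w1;
y(k)=sign(w1'*p(:,k));
e(k)=tau(:,k)-y(k);
new_w(:,k+1)=w1+(mu*e(k)*p(:,k));
k=0;
while k<12             % three passes through the four patterns
for i=1:4
y(i)=sign(new_w(:,k+i)'*p(:,i));
e(i)=tau(:,i)-y(i);
new_w(:,k+i+1)=new_w(:,k+i)+(mu*e(i)*p(:,i));
end
k=k+4;
end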

Appendix 2
P=[-0.5 -0.5 0.3 0.1; %inputs
-0.5 0.5 -0.5 1.0];
T=[0 0 0 1]; %targets
plotpv(P,T); %vectors plotted
net=newp(minmax(P),1); %network created with one layer (slp)
plotpv(P,T); %vectors replotted with networks
%attempt at classification
net.b{1}=1; %bias

plotpc(net.IW{1},net.b{1}); %plotted with weights and bias values


%weights are set to zero so no
%classification line appears

%the network is now trained and a classification line is produced

E=1;
while (sse(E));
[net,Y,E]=adapt(net,P,T);
clf;
plotpv(P,T);
plotpc(net.IW{1},net.b{1});

drawnow;
end

% a new point is classified with this network

p=[0.7;1.2];
a=sim(net,p);
plotpv(p,a);
Point = findobj(gca,'type','line');
set(Point,'color','red');
hold on;
plotpv(P,T);
plotpc(net.IW{1},net.b{1});

Appendix 3
%Designing Neural Network
close all    % close all open figures
clear all    % clear all old variables, to reduce the risk of confusing errors

tic;
load input_data
load output_data
teach1=teach1';    % inputs arranged as one column per sample
teach2=teach2';    % targets arranged as one column per sample

% scale inputs and targets to [-1,1] before the network is created
[pn,minp,maxp,tn,mint,maxt] = premnmx(teach1,teach2);

net = newff(minmax(pn),[5,2],{'tansig','purelin'},'traingd');

%Training the Neural Network

net=init(net);
%net.trainParam.show=5;
net.trainParam.epochs=200;
net.trainParam.lr=0.02;
[net,tr]=train(net,pn,tn);    % tr is the training record used by the plot below

Y = sim(net,pn);    % simulate with the scaled inputs the network was trained on

%plot(tr.epoch,tr.perf,tr.epoch,tr.vperf,tr.epoch,tr.tperf)
%legend('Training','Validation','Test',-1);
%ylabel('Squared Error'); xlabel('Epoch')

toc
