Liu 2001

Advances in Industrial Control
Springer-Verlag London Ltd.

Other titles published in this Series:
Adaptive Internal Model Control
Aniruddha Datta
Price-Based Commitment Decisions in the Electricity Market
Eric Allen and Marija Hie
Compressor Surge and Rotating Stall: Modeling and Control
Jan Tommy Gravdahl and Olav Egeland
Radiotherapy Treatment Planning: New System Approaches
Olivier Haas
Feedback Control Theory for Dynamic Traffic Assignment
Pushkin Kaehroo and Kaan Ozbay
Autotuning ofPID Controllers
Cheng-Ching Yu
Robust Aeroservoelastic Stability Analysis
Rick Lind and Marty Brenner
Performance Assessment of Control Loops: Theory and Applications
Biao Huang and Sirish L. Shah
Data Mining and Knowledge Discovery for Process Monitoring and Control
Xue Z. Wang
Advances in PID Control
Tan Kok Kiong, Wang Quing-Guo and Hang Chang Chieh with Tore J. Hagglund
Advanced Control with Recurrent High-order Neural Networks: Theory and
Industrial Applications
George A. Rovithakis and Manolis A. Christodoulou
Structure and Synthesis ofPID Controllers
Aniruddha Datta, Ming-Tzu Ho and Shankar P. Bhattaeharyya
Data-driven Techniques for Fault Detection and Diagnosis in Chemical Processes
Evan L. Russell, Leo H. Chiang and Richard D. Braatz
Bounded Dynamic Stochastic Systems: Modelling and Control
Hong Wang
Non-linear Model-based Process Control
Rashid M. Ansari and Moses O. Tade
Identification and Control of Sheet and Film Processes
Andrew P. Featherstone, Jeremy G. VanAntwerp and Richard D. Braatz
Precision Motion Control: Design and Implementation
Tan Kok Kiong, Lee Tong Heng, Dou Huifang and Huang Sunan
G.P. Liu
Nonlinear Identification
and Control
A Neural Network Approach
With 88 Figures
. Springer
G.P. Liu, BEng, MEng, PhD
School of Mechanical Materials, Manufacturing Engineering and Management,
University of Nottingham, University Park, Nottingham, NG7 2RD, UK
ISBN 978-1-4471-1076-7 ISBN 978-1-4471-0345-5 (eBook)

DOI 10.1007/978-1-4471-0345-5
British Library Cataloguing in Publication Data

Liu, G. P. (Guo Ping), 1962-
Nonlinear identification and control. - (Advances in
industrial control)
1.Nonlinear control theory 2.Neural networks (Computer
science)
1. Title
629.8'36
ISBN 9781447110767
Library of Congress Cataloging-in-Publication Data
Liu, G.P. (Guo Ping), 1962-
Nonlinear identification and control/ G.P. Liu
p. cm. -- (Advances in industrial control)
Includes bibliographical references and index.
ISBN 978-1-4471-lO76-7 (alk. paper)
1. Automatic control. 2. Neural networks (Computer science) 3. Nonlinear theories. 4.
System identification. 1. Title. II. Series.
TJ213 .L522 2001
629.8--dc21 200lO42662
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as
permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced,
stored or transmitted, in any form or by any means, with the prior permis sion in writing of the
publishers, or in the case of reprographic reproduction in accordance with the terms oflicences issued
by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be
sent to the publishers.
http://www.springer.co.uk
© Springer-Verlag London 2001
Originally published by Springer-Verlag London Berlin Heidelberg 2001
Softcover reprint of the hardcover 1st edition 2001
The use of registered names, trademarks etc. in this publication does not imply, even in the absence of a
specific statement, that such names are exempt from the relevant laws and regulations and therefore
free for general use.
The publisher makes no representation, express or implied, with regard to the accuracy of the
information contained in this book and cannot accept any legal responsibility or liability for any errors
or omissions that may be made.
Typesetting: Electronic text files prepared by author
69/3830-5432lO Printed on acid-free paper SPIN lO770966

Advances in Industrial Control
Series Editors
Professor Michael J. Grimble, Professor ofIndustrial Systems and Director

Professor Michael A. Johnson, Professor of Control Systems and Deputy Director
Industrial Control Centre
Department of Electronic and Electrical Engineering
University of Strathclyde
Graham Hills Building
50 George Street
Glasgow Gil QE
United Kingdom
Series Advisory Board
Professor Dr-Ing J. Ackermann

DLR Institut fur Robotik und Systemdynamik
Postfach 1116
D82230 WeBling
Germany
Professor LD. Landau

Laboratoire d'Automatique de Grenoble
ENSIEG, BP 46
38402 Saint Martin d'Heres
France
Dr D.C. McFarlane
Department of Engineering
University of Cambridge
Cambridge CB2 1QJ
United Kingdom
Professor B. Wittenmark
Department of Automatic Control
Lund Institute of Technology
PO Box 118
S-221 00 Lund
Sweden
Professor D.W. Clarke

Department of Engineering Science
University of Oxford
Parks Road
Oxford OXI 3PJ
United Kingdom
Professor Dr-Ing M. Thoma
Institut fUr Regelungstechnik
UniversiHit Hannover
Appelstr. 11
30167 Hannover
Germany
Professor H. Kimura
Department of Mathematical Engineering and Information Physics
Faculty of Engineering
The University of Tokyo
7-3-1 Hongo
Bunkyo Ku
Tokyo 113
Japan
Professor A.J. Laub

College of Engineering - Dean's Office
University of California
One Shields Avenue
Davis
California 95616-5294
United States of America
Professor J.B. Moore

Department of Systems Engineering
The Australian National University
Research School of Physical Sciences
GPO Box4
Canberra
ACT 2601
Australia
Dr M.K. Masten
Texas Instruments
2309 Northcrest
Plano
TX 75075
United States of America
Professor Ton Backx

AspenTech Europe B.V.
De Waal32
NL-5684 PH Best
The Netherlands
Dedication
To Weihong and Louise

SERIES EDITORS' FOREWORD
The series Advances in Industrial Control aims to report and encourage technology
transfer in control engineering. The rapid development of control technology has an
impact on all areas of the control discipline. New theory, new controllers, actuators,
sensors, new industrial processes, computer methods, new applications, new
philosophies ... , new challenges. Much of this development work resides in
industrial reports, feasibility study papers and the reports of advanced collaborative
projects. The series otTers an opportunity for researchers to present an extended
exposition of such new work in all aspects of industrial control for wider and rapid
dissemination.
The time for nonlinear control to enter routine application seems to be
approaching. Nonlinear control has had a long gestation period but much ofthe past
has been concerned with methods that involve formal nonlinear functional model
representations. It seems more likely that the breakthough will come through the use
of other more flexible and amenable nonlinear system modelling tools. This
Advances in Industrial Control monograph by Guoping Liu gives an excellent
introduction to the type of new nonlinear system modelling methods currently being
developed and used. Neural networks appear prominent in these new modelling
directions. The monograph presents a systematic development of this exciting
subject. It opens with a useful tutorial introductory chapter on the various tools to
be used. In subsequent chapters Doctor Liu leads the reader through identification,
and then onto nonlinear control using nonlinear system neural network
representations. Each chapter culminates with some examples and the final chapter
is a worked-out case-study for combustion processes.
We feel the structured presentation of modern nonlinear identitication methods
and their use in control schemes will be of interest to postgraduate students,
industrial engineers and academics alike. We welcome this addition to the Advances
in Industrial Control monograph series.
M.1. Grimble and M.A. Johnson

Industrial Control Centre
Glasgow, Scotland, U.K.
PREFACE
It is well known that linear models have been widely used in system identi-
fication for two major reasons. First, the effects that different and combined
input signals have on the output are easily determined. Second, linear systems
are homogeneous. However, control systems encountered in practice possess
the property of linearity only over a certain range of operation; all physical
systems are nonlinear to some degree. In many cases, linear models are not
suitable to represent these systems and nonlinear models have to be considered.
Since there are nonlinear effects in practical systems, e.g., harmonic genera-
tion, intermodulation, desensitisation, gainj expansion and chaos, neither of
the above principles for linear models is valid for nonlinear systems. There-
fore, nonlinear system identification is much more difficult than linear system
identification.
Any attempt to restrict attention strictly to linear control can only lead to
severe complications in system design. To operate linearly over a wide range
of variation of signal amplitude and frequency would require components of an
extremely high quality; such a system would probably be impractical from the
viewpoints of cost, space, and weight. In addition, the restriction of linearity
severely limits the system characteristics that can be realised.
Recently, neural networks have become an attractive tool that can be used
to construct a model of complex nonlinear processes. This is because neu-
ral networks have an inherent ability to learn and approximate a nonlinear
function arbitrarily well. This therefore provides a possible way of modelling
complex nonlinear processes effectively. A large number of identification and
control structures have been proposed on the basis of neural networks in recent
years.
The purpose of this monograph is to give the broad aspects of nonlinear
identification and control using neural networks. Basically, the monograph
consists of three parts. The first part gives an introduction to fundamental
princi pIes of neural networks. Then several methods for nonlinear identification
using neural networks are presented. In the third part, various techniques for
nonlinear control using neural networks are studied. A number of simulated
and industrial examples are used throughout the monograph to demonstrate
the operation of the techniques of nonlinear identification and control using
neural networks. It should be emphasised here that methods for nonlinear
control systems have not progressed as rapidly as have techniques for linear
XII Preface
control systems. Comparatively speaking, at the present time they are still
in the development stage. We believe that the fundamental theory, various
design methods and techniques, and many application examples of nonlinear
identification and control using neural networks that are presented in this
monograph will enable one to analyse and synthesise nonlinear control systems
quantitatively. The monograph, which is mostly based on the author's recent
research work, is organised as follows.
Chapter 1 gives an overview of what neural networks are, followed by a
description of the model of a neuron (the basic element of a neural network)
and commonly used architectures of neural networks. Various types of neural
networks are presented, e.g., radial basis function networks, polynomial basis
function networks, fuzzy neural networks and wavelet networks. The function
approximation properties of neural networks are discussed. A few widely used
learning algorithms are introduced, such as the sequential learning algorithm,
the error back-propagation learning algorithm and the least-mean-squares al-
gorithm. Many applications of neural networks to classification, filtering, mod-
elling, prediction, control and hardware implementation are mentioned.
Chapter 2 presents a sequential identification scheme for nonlinear dynam-
ical systems. A novel neural network architecture, referred to as a variable neu-
ral network, is studied and shown to be useful in approximating the unknown
nonlinearities of dynamical systems. In the variable neural network, the num-
ber of basis functions can be either increased or decreased with time according
to specified design strategies so that the network will not overfit or underfit
the data set. The identification model varies gradually to span the appropri-
ate state-space and is of sufficient complexity to provide an approximation to
the dynamical system. The sequential identification scheme, different from the
conventional methods of optimising a cost function, attempts to ensure stabil-
ity of the overall system while the neural network learns the system dynamics.
The stability and convergence of the overall identification scheme are guaran-
teed by the developed parameter adjustment laws. An example illustrates the
modelling of an unknown nonlinear dynamical system using variable network
identification techniques.
Chapter 3 considers a recursive identification scheme using neural networks
for nonlinear control systems. This comprises a structure selection procedure
and a recursive weight learning algorithm. The orthogonal least squares algo-
rithm is introduced for off-line structure selection and the growing network
technique is used for on-line structure selection. An on-line recursive weight
learning algorithm is developed to adjust the weights so that the identified
model can adapt to variations of the characteristics and operating points in
nonlinear systems. The convergence of both the weights and estimation errors
is established. The recursive identification scheme using neural networks is
demonstrated by three examples. The first is identification of unknown sys-
tems represented by a nonlinear input output dynamical model. The second
is identification of unknown systems represented by a nonlinear state-space
dynamical model. The third is the identification of the Santa Fe time series.
Preface XIll
Chapter 4 is devoted to model selection and identification of nonlinear

systems via neural networks and genetic algorithms based on multiobjective
performance criteria. It considers three performance indices (or cost functions)
as the objectives, which are the Euclidean distance and maximum difference
measurements between the real nonlinear system and the nonlinear model,
and the complexity measurement of the nonlinear model, instead of a single
performance index. An algorithm based on the method of inequalities, least
squares and genetic algorithms is developed for optimising over the multiobjec-
tive criteria. Volterra polynomial basis function networks and Gaussian radial
basis function networks are applied to the identification of a practical sys-
tem a large-scale pilot liquid level nonlinear system and a simulated unknown
nonlinear system with mixed noise.
In Chapter 5, identification schemes using wavelet networks are discussed
for nonlinear dynamical systems. Based on fixed wavelet networks, parameter
adaptation laws are developed. This guarantees the stability of the overall
identification scheme and the convergence of both the parameters and the
state errors. Using the decomposition and reconstruction techniques of multi-
resolution decompositions, variable wavelet networks are introduced to achieve
desired estimation accuracy and a suitable sized network, and to adapt to
variations of the characteristics and operating points in nonlinear systems. B-
spline wavelets are used to form the wavelet networks. A simulated example
demonstrates the operation of the wavelet network identification to obtain a
model with different estimation accuracy.
Chapter 6 is concerned with the adaptive control of nonlinear dynamical
systems using neural networks. Based on Gaussian radial basis function neural
networks, an adaptive control scheme is presented. The location of the centres
and the determination of the widths of the Gaussian radial basis functions in
neural networks are analysed to make a compromise between orthogonality
and smoothness. The developed weight adaptive laws ensure the overall con-
trol scheme is stable, even in the presence of modelling error. The tracking
errors converge to the required accuracy through the adaptive control algo-
rithm derived by combining the variable neural network and Lyapunov synthe-
sis techniques. An example details the adaptive control design of an unknown
nonlinear time-variant dynamical system using variable network identification
techniques.
Chapter 7 studies neural network based predictive control for nonlinear con-
trol systems. An affine nonlinear predictor structure is presented. It is shown
that the use of nonlinear programming techniques can be avoided by using a set
of affine nonlinear predictors to predict the output of the nonlinear process.
The nonlinear predictive controller based on this design is both simple and
easy to implement in practice. Some simulation results of nonlinear predictive
neural control using growing neural networks are given.
Chapter 8 considers neural network based variable structure control for the
design of discrete nonlinear systems. Sliding mode control is used to provide
good stability and robustness performance for nonlinear systems. A nonlinear
XIV Preface
neural predictor is introduced to predict the outputs of the nonlinear process

and to make the variable structure control algorithm simple. When the predic-
tor model is inaccurate, variable structure control with sliding modes is used
to improve the stability of the system. A simulated example illustrates the
variable structure neural control of a nonlinear dynamical system.
Chapter 9 describes a neural control strategy for the active stabilisation of
combustion processes. The characteristics of these processes include not only
several interacting physical phenomena, but also a wide variety of dynamical
behaviour. In terms of their impact on the system performance, pressure osci-
llations are undesirable since they result in excessive vibration, causing high
levels of acoustic noise and, in extreme cases, mechanical failure. The active
acoustic control algorithm is comprised of three parts: an output model, an
output predictor and a feedback controller. The output model established us-
ing neural networks is used to predict the output in order to overcome the time
delay of the system, which is often very large, compared with the sampling pe-
riod. An output-feedback controller is introduced which employs the output of
the predictor to suppress instability in the combustion process. The approach
developed is first demonstrated by a simulated unstable combustor with six
modes. Results are also presented showing its application to an experimental
combustion test rig with a commercial combustor.
Much of the work described in this book is based on a series of publica-
tions by the author. The following publishers are gratefully acknowledged for
permission to publish aspects of the author's work which appeared in their
journals: The Institute of Electrical Engineers, Taylor and Francis Ltd., Else-
vier Science Ltd., and the Institution of Electrical and Electronic Engineers.
The author wishes to thank his wife Weihong and daughter Louise for their
constant encouragement, understanding and patience during the preparation
of the manuscript.
Guoping Liu
School of Mechanical, Materials, Manufacturing
Engineering and Management
University of Nottingham
Nottingham NG7 2RD
United Kingdom
May 2001
TABLE OF CONTENTS
Symbols and Abbreviations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. XIX
1. Neural Networks. .. .. .. . . . . .... . . . . . .. . . .. . . .. .. .. . . .. .. . . 1

1.1 Introduction............................................ 1
1.2 Model of a Neuron. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Architectures of Neural Networks. . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.1 Single Layer Networks ............................. 4
1.3.2 Multilayer Networks .............................. 4
1.3.3 Recurrent Networks ............................... 5
1.3.4 Lattice Networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Various Neural Networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4.1 Radial Basis Function Networks. . . . . . . . . . . . . . . . . . . . . 7
1.4.2 Gaussian RBF Networks. . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4.3 Polynomial Basis Function Networks. . . . . . . . . . . . . . . . . 9
1.4.4 Fuzzy Neural Networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4.5 Wavelet Neural Networks. . . . . . . . . . . . . . . . . . . . . . . . . .. 10
1.4.6 General Form of Neural Networks ................... 13
1.5 Learning and Approximation. . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 14
1.5.1 Background to Function Approximation. . . . . . . . . . . . .. 14
1.5.2 Universal Approximation. . . . . . . . . . . . . . . . . . . . . . . . . .. 15
1.5.3 Capacity of Neural Networks. . . . . . . . . . . . . . . . . . . . . . .. 16
1.5.4 Generalisation of Neural Networks. . . . . . . . . . . . . . . . . .. 17
1.5.5 Error Back Propagation Algorithm .................. 17
1.5.6 Recursive Learning Algorithms. . . . . . . . . . . . . . . . . . . . .. 19
1.5.7 Least Mean Square Algorithm ...................... 20
1.6 Applications of Keural ~etworks .......................... 20
1.6.1 Classification..................................... 20
1.6.2 Filtering......................................... 21
1.6.3 Modelling and Prediction. . . . . . . . . . . . . . . . . . . . . . . . . .. 21
1.6.4 Control.......................................... 22
1.6.5 Hardware Implementation. . . . . . . . . . . . . . . . . . . . . . . . .. 23
1. 7 Mathematical Preliminaries. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 23
1.8 Summary............................................... 25
XVI Table of Contents
2. Sequential Nonlinear Identification. . . . . . . . . . . . . . . . . . . . . . .. 27

2.1 Introduction............................................ 27
2.2 Variable Neural Networks ................................ 29
2.2.1 Variable Grids. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 30
2.2.2 Variable Networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 31
2.2.3 Selection of Basis Functions ........................ 33
2.3 Dynamical System Modelling by Neural Networks. . . . . . . . . .. 36
2.4 Stable Nonlinear Identification. . . . . . . . . . . . . . . . . . . . . . . . . . .. 38
2.5 Sequential Nonlinear Identification. . . . . . . . . . . . . . . . . . . . . . . .. 41
2.6 Sequential Identification of Multivariable Systems ........... 45
2.7 An Example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 49
2.8 Summary............................................... 51
3. Recursive Nonlinear Identification. . . . . . . . . . . . . . . . . . . . . . . .. 53

3.1 Introduction............................................ 53
3.2 Nonlinear Modelling by VPBF Networks. . . . . . . . . . . . . . . . . .. 54
3.3 Structure Selection of Neural Networks. . . . . . . . . . . . . . . . . . . .. 56
3.3.1 Off-line Structure Selection. . . . . . . . . . . . . . . . . . . . . . . .. 56
3.3.2 On-line Structure Selection. . . . . . . . . . . . . . . . . . . . . . . .. 59
3.4 Recursive Learning of Neural Networks. . . . . . . . . . . . . . . . . . . .. 60
3.5 Examples............................................... 66
3.6 Summary............................................... 76
4. Multiobjective Nonlinear Identification. . . . . . . . . . . . . . . . . . .. 77

4.1 Introduction............................................ 77
4.2 Multiobjective Modelling with Neural Networks. . . . . . . . . . . .. 78
4.3 Model Selection by Genetic Algorithms. . . . . . . . . . . . . . . . . . . .. 81
4.3.1 Genetic Algorithms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 81
4.3.2 Model Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 84
4.4 Multiobjective Identification Algorithm .................... 86
4.5 Examples............................................... 90
4.6 Summary ............................................... 100
5. Wavelet Based Nonlinear Identification ................... 101

5.1 Introduction ............................................ 101
5.2 Wavelet Networks ....................................... 102
5.2.1 One-dimensional Wavelets .......................... 102
5.2.2 Multi-dimensional Wavelets ......................... 104
5.2.3 Wavelet Networks ................................. 104
5.3 Identification Using Fixed Wavelet Networks ................ 105
5.4 Identification Using Variable Wavelet Networks .............. 108
5.4.1 Variable Wavelet Networks ......................... 109
5.4.2 Parameter Estimation .............................. 111
5.5 Identification Using B-spline Wavelets ...................... 113
5.5.1 One-dimensional B-spline Wavelets .................. 113
5.5.2 n-dimensional B-spline Wavelets ..................... 115
Table of Contents XVII
5.6 An Example ............................................ 116

5.7 Summary ............................................... 124
6. Nonlinear Adaptive Neural Control . ....................... 125

6.1 Introduction ............................................ 125
6.2 Adaptive Control ........................................ 126
6.3 Adaptive Neural Control ................................. 129
6.4 Adaptation Algorithm with Variable Networks .............. 135
6.5 Examples ............................................... 137
6.6 Summary ............................................... 141
7. Nonlinear Predictive Neural Control. ...................... 143

7.1 Introduction ............................................ 143
7.2 Predictive Control ....................................... 144
7.3 Nonlinear Neural Predictors .............................. 148
7.4 Predictive Neural Control ................................ 150
7.5 On-line Learning of Neural Predictors ...................... 152
7.6 Sequential Predictive Neural Control ....................... 155
7.7 An Example ............................................ 157
7.8 Summary ............................................... 160
8. Variable Structure Neural Control . ........................ 163

8.1 Introduction ............................................ 163
8.2 Variable Structure Control ................................ 164
8.3 Variable Structure Neural Control ......................... 168
8.4 Generalised Variable Structure Neural Control .............. 172
8.5 Recursive Learning for Variable Structure Control ........... 174
8.6 An Example ............................................ 176
8.7 Summary............................................... 178
9. Neural Control Application to Combustion Processes . ..... 179

9.1 Introduction ............................................ 179
9.2 Model of Combustion Dynamics ........................... 180
9.3 Neural Network Based Mode Observer ..................... 182
9.4 Output Predictor and Controller .......................... 183
9.5 Active Control of a Simulated Combustor .................. 184
9.6 Active Control of an Experimental Combustor .............. 190
9.7 Summary ............................................... 192
References . .................................................... 193
Index .......................................................... 209

SYMBOLS AND ABBREVIATIONS
The symbols and abbreviations listed here are used unless otherwise stated.
C field of complex numbers
diag{.} diagonal matrix
dim(.) dimension of a vector
exp(.) exponential function
GA genetic algorithm
GAs genetic algorithms
GRBF Gaussian radial basis function
g complex conjugate of 9
II f lin n-norm of the function f
< .,. > inner product
A(. ) eigenvalue of a matrix
Arnax (.) maximum eigenvalue of a matrix
Arnin (.) minimum eigenvalue of a matrix
MLVlO multi-input multi-output
MLVlS multi-input multi-state
Mol method of inequalities
MLP multilayer percept ron
max{-} maximum
min{·} mllllmum
1·1 modulus
NARMA nonlinear auto-regressive moving average
NARMAX NARMA model with exogenous inputs
NN neural network
NNs neural networks
N integer numbers
}/+ non-negative integer numbers
w angular frequency
a partial derivative with respect to x
ax
4J(. ) basis function
r reference input
RBF radial basis function
R field of real numbers (- 00, 00 )
R+ field of non-negative real numbers [0,(0)
sign(.) sign function
xx Symbols and Abbreviations
SISO single-input single-output

SISS single-input single-state
sup{-} supremum
t time
u system control input
VPBF Volterra polynomial basis function
x system state vector
y system output
CHAPTER!
NEURAL NETWORKS
1.1 Introduction
The field of neural networks has its roots in neurobiology. The structure and
functionality of neural networks has been motivated by the architecture of
the human brain. Following the complex neural architecture, a neural network
consists of layers of simple processing units coupled by weighted interconnec-
tions. With the development of computer technology, significant progress in
neural network research has been made. A number of neural networks have
been proposed in recent years.
The multilayer percept ron (MLP)(Rumelhart et al., 1986) is a network
that is built upon the McGulloch and Pitts' model of neurons (McCulloch
and Pitts, 1943) and the perceptron (Rosenblatt, 1958). The perceptron maps
the input, generally binary, onto a binary valued output. The MLP uses this
mapping to real valued outputs for binary or real valued inputs. The decision
regions that could be formed by this network extend beyond the linear sepa-
rable regions that are formed by the perceptron. The nonlinearity inherent in
the network enables it to perform better than the traditional linear methods
(Lapedes and Farber, 1987). It has been observed that this input output net-
work mapping can be viewed as a hypersurface constructed in the input space
(Lapedes and Farber, 1988). A surface interpolation method, called the radial
basis functions, has been cast into a network whose architecture is similar to
that of MLP (Broomhead and Lowe, 1988). Other surface interpolation meth-
ods, for example, the multivariate adaptive regression splines (Friedman, 1991)
and B-splines (Lane et al., 1989), have also found their way into new forms
of networks. Another view presented in Lippmann (1987), and Lapedes and
Farber (1988) is that the network provides an approximation to an underlying
function. This has resulted in applying polynomial approximation methods
to neural networks, such as the Sigma-Pi units (Rumelhart et al., 1986), the
Volterra polynomial network (Rayner and Lynch, 1989) and the orthogonal
network (Qian et al., 1990). The application of wavelet transforms to neural
networks (Pati and Krishnaprasad, 1990) has also derived its inspiration from
function approximation.
While these networks may have little relationship to biological neural net-
works, it has become common in the neural network area to refer to them as
neural networks. These networks share one important characteristic that they
2 1. Neural N etwor ks
are able to approximate any continuous mapping to a sufficient accuracy if

they have resources to do so (Friedman, 1991; Stinchcombe and White, 1989).
As its name implies, a neural network is a network of simple processing
elements called neurons connected to each other via links. The architecture of
the network and the functionality of the neurons determine the response of
the network to an input pattern. The network does no more than provide an
input output mapping. Thus, a simple mathematical model can represent these
networks. This chapter will investigate the neural network architectures and
their functional representation by considering the multilayer network, which
laid the foundation for the development of many other classes of feedforward
networks.
1.2 Model of a Neuron
A neuron is an information-processing unit that is fundamental to the oper-

ation of a neural network. The model of a neuron is illustrated in Figure 1.1
(Haykin, 1994). There are three basic elements in the neuron model: connecting
links, an adder and an activation function.
Fig. 1.1. Model of a neuron
Each connecting link is characterised by a weight or strength of its own. Speci-

fically, a signal Uj at the j-th input connected to the k-th neuron is multiplied
by the weight Wkj. For the subscripts of the weight Wkj, the first subscript
refers to the neuron and the second subscript refers to the input to which the
weight refers. The reverse of this notation is also used in the literature.
The adder sums the input signals weighted by the respective connecting
link of the neuron. The operations described here constitute a linear combiner.
The activation function limits the amplitude of the output of a neuron,
which is also referred to in the literature as a squashing function in that it
squashes the permissible amplitude range of the output signal to some finite
1.2 Model of a Neuron 3
value. Typically, the normalised amplitude range of the output of a neuron is

written as the closed unit interval [0,1] or alternatively [-1,1].
In mathematical terms, a neuron may be described by the following pair
of equations:
n
Vk = L WkjUj (1.1 )
j=l
Yk = 'P(Vk) (1.2)
where Uj is the input signal, Wkj the weight of the neuron, Vk the linear com-
biner link, 'P(.) the activation function and Yk the output signal of the neuron.
The activation function defines the output of a neuron in terms of the
activity level at its input. There are many types of activation functions. Here
three basic types of activation functions are introduced: threshold function,
piecewise-linear function and sigmoid function.
When the threshold function is used as an activation function, it is de-
scribed by
if v:2:0
'P(V) ={ ~ if v < 0
(1.3)
A neuron employing such a threshold function is referred to in the literature

as the McCulloch Pitts model, in recognition of the pioneering work done by
McCulloch and Pitts (1943). In this model, the output of a neuron takes the
value of 1 if the total internal activity level of that neuron is nonnegative and
o otherwise.
The activation function using a piecewise-linear function is given by
if V> 1.
- 2
l' f 2"
1
> v > -2"1 (1.4)
if V::; - ~
where the amplification factor inside the linear region of operation is assumed
to be unity. This activation function may be viewed as an approximation to a
nonlinear amplifier. There are two special forms of the piecewise-linear func-
tion: (a) it is a linear combiner if the linear region of operation is maintained
without running into saturation, and (b) it reduces to a threshold function if
the amplification factor of the linear region is made infinitely large.
The sigmoid function is a widely used form of activation function in neural
networks. It is defined as a strictly increasing function that exhibits smoothness
and asymptotic properties. An example of the sigmoid is the logistic function,
described by
1
'P ( v) = -l-+-e---a-v (1.5)
where a is the slope parameter of the sigmoid function. By varying the pa-
rameter a, sigmoid functions of different slopes can be obtained. In the limit,
as the slope parameter approaches infinity, the sigmoid function becomes sim-
ply a threshold function. Note also that the sigmoid function is differentiable,
whereas the threshold function is not.
1.3 Architectures of Neural Networks

In recent years a number of neural network architectures have been proposed.
Here, four different classes of network architectures (or structures) are in-
troduced: single layer networks, multilayer networks, recurrent networks and
lattice networks.
1.3.1 Single Layer Networks
A network of neurons organised in the form of layers is viewed as a layered

neural network. The simplest form of a layered network is one that has an
input layer of source nodes that projects onto an output layer of neurons
(computation nodes), but not vice versa. In other words, this network is strictly
of a feedforward type. It is illustrated in Figure 1.2 for the case of five nodes in
the input layer and four nodes in the output layer. Such a network is called a
single-layer network, with the designation "single layer" referring to the output
layer of computation nodes (neurons) but not to the input layer of source nodes
because no computation is performed there.
Input layer Output layer
Fig. 1.2. Architecture of a single layer network
1.3.2 Multilayer Networks
The multilayer network has a input layer, one or several hidden layers and
an output layer. Each layer consists of neurons with each neuron in a layer
1.3 Architectures of Neural Networks 5
connected to neurons in the layer below. This network has a feedforward ar-
chitecture which is shown in Figure 1.3. The number of input neurons defines
the dimensionality of the input space being mapped by the network and the
number of output neurons the dimensionality of the output space into which
the input is mapped.
In a feedforward neural network, the overall mapping is achieved via in-
termediate mappings from one layer to another. These intermediate mappings
depend on two factors. The first is the connection mapping that transforms
the output of the lower-layer neurons to an input to the neuron of interest and
the second is the activation function of the neuron itself.
Input layer Hidden layer Output layer
Fig. 1.3. Architecture of a multilayer network
1.3.3 Recurrent Networks
A recurrent neural network has at least one feedback loop that distinguishes
itself from a feedforward neural network. The recurrent network may consist
of a single-layer or multilayer of neurons and each neuron may feed its output
signal back to the inputs of all the other neurons. A class of recurrent networks
with hidden neurons is illustrated in the architectural graph of Figure 1.4. In
the structure, the feedback connections originate from the hidden neurons as
well as the output neurons. The presence of feedback loops in the recurrent
networks has a profound impact on the learning capability of the network, and
on its performance. Moreover, the feedback loops use particular branches com-
posed of unit-delay elements, which result in a nonlinear dynamical behaviour
by virtue of the nonlinear nature of the neurons.
Outputs
Fig. 1.4. Architecture of a recurrent network
1.3.4 Lattice Networks
A lattice network may consist of a one-dimensional, two-dimensional, or higher-

dimensional array of neurons. The dimension of the lattice refers to the number
of the space in which the graph lies. A set of source nodes in this network supply
the input signals to the array. The architectural graph of Figure 1.5 depicts a
two-dimensional lattice of two-by-two neurons fed from a layer of three source
nodes. Note that in this case each source node is connected to every neuron in
the lattice. A lattice network is really a feedforward network with the output
neurons arranged in rows and columns.
Inputs
Fig. 1.5. Architecture of a lattice network

1.4 Various Neural Networks 7
1.4 Various Neural Networks

Many different types of neural networks have been developed in recent years.
This section introduces several main neural networks that are widely used in
control systems.
1.4.1 Radial Basis Function Networks
Radial basis functions (RBF) have been introduced as a technique for multi-
variable interpolation (Powell, 1987). Broomhead and Lowe demonstrated that
these functions can be cast into an architecture similar to that of the multilayer
network, and hence named the RBF network (Broomhead and Lowe, 1988).
In the RBF network, which is a single hidden layer network, its input
to the hidden layer connection transforms the input into a distance from a
point in the input space, unlike in the MLP, where it is transformed into a
distance from a hyperplane in the input space. However, it has been seen from
multilayer networks that the hidden neurons can be viewed as constructing
basis functions which are then combined to form the overall mapping. For
the RBF network, the basis function constructed at the k-th hidden neuron is
given by
(1.6)
where 11.112 is a distance measure, u the input vector, d k the unit centre in
the input space and g(.) a nonlinear function. The basis functions are radially
symmetric with the centre on d k in the input space, hence they are named
radial basis functions. Some examples of nonlinear functions used as a radial
basis function g(.) are the following:
(a) the local RBFs
g(r) = exp (::) (Gaussian) (1.7)
g(r) = (r2 + (72)-~ (inverse multiquadric) (1.8)

(b) the global RBFs
g(r) = r (linear) (1.9)

g(r) = r3 (cubic) (1.10)
g(r) = vr2 + c2 (multi - quadratic) (1.11)
g(r) = r 2In(r) (thin plate splines) (1.12)
g(r) = In(r 2 + (72) (shifted logarithms) (1.13)
r2
g (r) = (1 - exp - ~2 ) In (r) (pseudo potential functions) (1.14)
where r = II u - d k 112' (7 is a real number commonly called receptive width or

simply the width of the locally-tuned function which describes the sharpness
of the hyperbolic cone used in the radial basis function.
As observed earlier, any functional description that is a linear combination

of a set of basis functions can be cast into a feedforward architecture. The
traditional methods used in surface interpolation and function approximation,
all have a functional form similar to that of the RBF network.
1.4.2 Gaussian RBF Networks
The radial basis function network with Gaussian hidden neurons is named the
Gaussian radial basis function (GRBF) network, also referred to as a network
of localised receptive fields by Moody and Darken, who were inspired by the
biological neurons in the visual cortex (Moody and Darken, 1989). The GRBF
network is related to a variety of different methods (Niranjan and Fallside,
1990), particularly, Parzen window density estimation which is the same as
kernel density estimation with a Gaussian kernel, potential functions method
for pattern classification, and maximum likelihood Gaussian classifiers, which
all can be described by a GRBF network formalism.
Following (1.6) and (1.7), the GRBF network can be described in a more
general form. Instead of using the simple Euclidean distance between an input
and a unit centre as in the usual formalism, a weighted distance scheme is used
as follows:
(1.15)
where C k is a weighting matrix of the k-th basis function whose centre is d k .

The effect of the weighting matrix is to transform the equidistant lines from
being hyperspherical to hyperellipsoidal. Thus, a Gaussian RBF is given by
(1.16)
where d and C represent the centres and the weighting matrices. Using the
same Ck for all the basis functions is equivalent to linearly transforming the
input by the matrix C;;1/2 and then using the Euclidean distance (u-dk)T (u-
d k ). In general, a different Ck is used.
The Gaussian RBF network mapping is given by
n
J(u;p) = L Wkipk(U; d, C) (1.17)
k=l
where p = {w, d, C}. Clearly, the Gaussian RBF network is determined by

the set of parameters {w k , d k , Ck}. To learn a mapping using this network,
one can estimate all of these parameters or alternatively, provide a scheme to
choose the widths C k and the centres d k of the Gaussian and adapt only the
weights Wk. Adapting only the weights is much easier and more popular, since
the problem of estimation is then linear.
1.4.3 Polynomial Basis Function Networks
Multivariate polynomial expansions have been suggested as a candidate for

discriminant functions in pattern classification (Duda and Hart, 1973; Koho-
nen, 1984) and are widely used in function approximation, particularly when
the input is one dimensional (Powell, 1981). Recently, the polynomial expan-
sion of a function with multiple variables has been cast into the framework of
neural networks. Its functional representation is described by
f(u) (1.18)
n n n
j(u;p) Wo +L WiUi +L L Wili2 U il Ui2 + ... +
i=l il=li2=il
n n n
+L L L Wili2 ... ikUilUi2·· .Uik
il=l i2=il ik=ik-l
N
L WjCPj(u) (1.19)
j=l
where p = {Wj} is the set of the concatenated weights and {cpj} the set of
basis functions formed from the polynomial input terms, N is the number of the
polynomial basis functions, k is the order ofthe polynomial expansion, O(Uk+l)
denotes the approximation error caused by the high order (:2: k+ 1) of the input
vector. The basis functions are essentially polynomials of zero, first and higher
orders ofthe input vector U E nn.
This method can be considered as expanding
the input to a higher dimensional space. An important difference between
polynomial networks and other networks like REF is that the polynomial basis
functions themselves are not parameterised and hence adaptation of the basis
functions during learning is not needed.
1.4.4 Fuzzy Neural Networks
Fuzzy neural networks have their origin from fuzzy sets and fuzzy inference
systems, which were developed by Zadeh (1973). A survey of fuzzy sets in ap-
proximate reasoning is given in Dubois and Prade (1991). The fuzzy reasoning
is usually an "if-then" rule (or fuzzy conditional statement), for example,
If pressure is HIGH, then volume is SMALL
where pressure and volume are linguistic variables, and HIGH and SMALL
linguistic values. The linguistic values are characterised by appropriate mem-
bership functions. The "if" part of the rules is referred as the antecedent and
the "then" part is known as the consequent.
Another type of fuzzy if-then rule has fuzzy sets involved only in the an-
tecedent part. For example, the dependency of the air resistance (force) on the
speed of a moving object may be described as
10 1. Neural Networks
If velocity is HIGH, then force = k * (velocityP

where HIGH is the only linguistic value here, and the consequent part is given
by a non-fuzzy equation of the input variable, velocity.
Suppose there is a rule base that consists of two fuzzy if-then rules, which
are
Rule 1: If UI is Al and U2 is B I , then el(u) = alUI + blU2 + CI

Rule 2: If UI is A2 and U2 is B 2" then e2(u) = a2uI + b2U2 + C2
To construct a fuzzy reasoning mechanism, the firing strength of the i-th rule
may be defined as the T-norm (usually multiplication or minimum operator)
of the membership values on the antecedent part
(1.20)
or
(1.21 )
where /-LAi (Ui) and /-LEi (Ui) are usually chosen to be bell-shaped functions with
maximum equal to 1 (Jang and Sun, 1993) and minimum equal to 0, such as
(1.22)
{CAJ, {bAJ and {CTd are the parameter sets.

A fuzzy reasoning mechanism may be stated thus: the overall output is
chosen to be a weighted sum of each rule's output (Takagi and Hayashi, 1991).
Thus, a fuzzy neural network can be given by
~ ei(u)
f(u) = ~ m CPi(Ui) (1.23)
i=I2:CPj(Uj)
j=l
where m is the number of fuzzy if-then rules.

The approximation capability of fuzzy neural networks or fuzzy inference
systems has been established by numerous researchers (see, for example, Wang,
1993; Brown and Harris, 1994). The functional equivalence of fuzzy neural
networks to RBF networks has also been studied (Jang and Sun, 1993). Both
fuzzy and RBF neural networks transform an input space into an output space
by clustering the input space, applying gains to each cluster, and interpolating
the regions between the clusters.
1.4.5 Wavelet Neural Networks

Wavelet neural networks were introduced in the 1990s (Zhang and Benveniste,
1992; Liu et al., 1998), based on wavelet transform theory initiated by Mor-
let et aZ. (1982) though the theory goes as far back as 1952 (Calderon and
Zygmund, 1952). Wavelet transform theory was developed to analyse signals

with varied frequency resolutions as a unifying idea of looking at nonstation-
ary signals at various time locations. For reviews and tutorials on wavelets,
see, for example, Rioul and Vetterli (1991), Strang (1989), Strichartz (1993)
and numerous complementary texts such as Chui (1992), Ruskai (1991) and
Newland (1993).
The wavelet transform provides a better alternative to classical Short-time
Fourier or Gabor transform (Gabor, 1946) and Windowed Fourier transform
(Daubechies, 1990) for time frequency analysis. For a continuous input sig-
nal, the time and scale parameters of the wavelet transform can be contin-
uous, which leads to a continuous wavelet transform, or be discrete, which
results in a wavelet series expansion. This is analogous to classical continu-
ous Fourier transform and discrete Fourier transform (Daubechies, 1990). The
term wavelets, which are wavelet transform and wavelet series, will be used
interchangeably, though strictly, the wavelet transform relates to continuous
signals while the wavelet series handle discrete transforms.
There exist some significant differences between wavelet series expansions
and classical Fourier series, which are:
(a) Wavelets are local in both the frequency domain (via dilations) and in
the time domain (via translations). On the other hand, Fourier basis functions
are localised only in the frequency domain but not in the time domain. Small
frequency changes in the Fourier transform will cause changes everywhere in
the time domain.
(b) Many classes of functions can be described in a more compact way by
wavelets than by the Fourier series. Also, the wavelet basis functions are more
effective than classical Fourier basis ones in achieving a comparable function
approximation. For example, a discontinuity within a function could be rep-
resented efficiently by a few wavelets whereas it may require many more basis
functions from the Fourier expansion.
The wavelets (Daubechies, 1988) refer to a family of functions that take
the following form in the continuous case:
(1.24)
where s is a scaling or dilation factor and t a translation factor of the original

function 7/J (u).
The continuous wavelet transform of a function g(u) E L 2 (R) (square
integrable space) is defined by
[W g(u)](s, t) = Is1 1 / 2 1CXl
-CXl
u- t
g(u)7/J(-s-)du (1.25)
This transfer can decompose g(u) into its components at different scales in
frequency and space (location) by varying the scaling/dilation factor sand
the translation factor t, respectively.
The function g(u) can be reconstructed by performing the inverse opera-
tion, that is
(1.26)
if the wavelet 1jJ( u) satisfies the admissibility conditions (Daubechies, 1988)

given by
(1.27)
Similar to the discrete Fourier transform (a discrete version of the continuous

Fourier transform), there also exists a discrete wavelet transform to calculate
the wavelet transform for discrete signals. For this case, the basic wavelet
function given in (1.24) needs to be discretised at various sand t values. For
example, a typical scaling and translation basis would be
(1.28)
(1.29)
where j E N+ and k E N+. The discrete basic wavelet function is given by
(1.30)
In practice, the orthonormal wavelet functions are widely used. For example,
the following Haar wavelet is one of such wavelets.
if O:S u < ~
if ~:Su<l (1.31 )
otherwise
Also, the orthonormal wavelet functions include the Gaussian derivative wavelet
1jJ(u) = (k - r)e- r / 2 (1.32)
and the Mexican hat wavelet
1jJ(u) = 2 (k _ r2)e- r2 / 2 (1.33)

y'31fl/4
where r = u T u and k = dim(u) (Mallat, 1989b).

There are many other types of wavelet function which depend on the re-
quired application, such as smooth wavelets, compactly supported wavelets,
wavelets with simple mathematical expressions and wavelets with associated
filters (Rioul, 1993). Wavelet theory and transforms can also be applied to
multiresolution analysis as used in signal processing (Akansu, 1992). Alterna-
tively, the Gaussian derivative wavelet and the Mexican hat wavelet can be
viewed as bandpass filters (Weiss, 1994).
The wavelet network is of the form (Zhang and Benveniste, 1992)
m
g(u) = go + L Wi'Ij;(Si(U - ti)) (1.34)
i=l
where Si = diag(sil' ... , Sid), d is the dimension of the input, and go is intro-
duced to deal with nonzero mean functions on finite domains. The original
formulation of the wavelet network was based on the tensor product of one-
dimensional wavelets but recently the radial wavelet function was applied.
To obtain the orientation selective nature of dilations and to improve flexi-
bility, a rotation transform can be incorporated by
m
g(U) = go + L w(lj)((u - ti)/Si) (1.35)
i=l
Wavelet theory and networks have been widely employed in applications in
diverse areas, such as geophysics (Kumar and Foufoula-Georgiou, 1993) and
system identification (Sjoberg et al., 1995; Liu et al., 1999, 2000).
1.4.6 General Form of Neural Networks
There are many other types of neural networks. Forms of neural networks
based on orthogonal polynomial expansions can be used, such as Hermite
polynomials, Legendre polynomials and Bernstein polynomials. Apart from
the polynomial expansion, orthogonal basis functions such as the Fourier se-
ries may also be employed. The surface interpolation method of splines has
been adopted in the development of spline networks (Friedman, 1991). Kernel
functions, which are commonly used in kernel density estimation procedures,
may also be introduced as forms of neural networks.
The mathematical formalism of the networks allows recent developments
in neural networks to deviate from the biological plausibility that served as an
impetus in the first place. This is not a cause for concern because the ultimate
aim of such developments is to build machines rather than to understand and
model biologically intelligent systems. What should be avoided is to refer to
them simply as neural networks. However, to avoid confusion in the terminol-
ogy we will continue to refer to these as neural networks with the emphasis
placed on the fact that they are no more than a special class of nonlinear
model.
The functional description of neural networks has a common form of ex-
pression. Essentially, neural networks are parametric and can be described as
a linear combination of basis functions. So, the neural network is generally
denoted by
m
f(u; w) = L wk'Pdu) (1.36)

k=l
where w is the parameter vector containing the coefficients Wk and the set
of parameters that define the basis function 'Pk(U), m is the number of basis
functions used in the overall mapping of the network. For each parameter
vector w E P, the network mapping f E F w , where P is the parameter set and
Fw the set of functions that can be described by the chosen neural network.
1.5 Learning and Approximation
Neural networks learn from the examples presented to them, which are in
the form of input output pairs. To simplify the presentation, a single variable
function is taken into account. Let the input to the network be denoted by u
and the output by y. The neural network maps an input pattern to an output
pattern, described by
f:u--+y (1.37)
An assumption made about these examples is that they are consistent with an
underlying mapping, say j*. Then the relationship between the input and the
output can be stated as
y = j*(u) +v (1.38)
where v is the measurement noise, which is an unknown random signal. Here,

let us assume that the measurements are noise free. Then, the data set of N
examples is described as
(1.39)
which contains the information that is available about the unknown mapping
j*.
Let the set Fw = {j(u;w) : for all w E P} describe all functions that
can be mapped by the neural network. The task of learning is to approximate
j*(u) by choosing a suitable f(u; w). This requires a measure of approximation
accuracy to be defined, whose simple example is the approximation error.
1.5.1 Background to Function Approximation
The basic approximation problem treated in this book can be stated as follows:
For a given f(u), find the function amongst the set Fw = {j(u;w) :
for all w E P} that has the least distance to f (u). This is equivalent to finding
the f(u; w) that has the least approximation error, i.e.,
min II f(u) - f(u; w) 112 (1.40)
It is not sufficient that the function f (u; w) to be found most closely ap-
proximates f (u) alone. To guarantee the approximation to be sufficiently good,
the least approximation error must be below a threshold. If the set F w , which
contains all the functions that can be mapped by the network, is sufficiently
large, then there is a reasonable chance of satisfying the above requirement.
1.5 Learning and Approximation 15
In practice, the underlying function f (u) to be approximated is unknown.

The information about the function f (u) is contained in the discrete data set
D (see equation (1.39)). Then, the accuracy of approximation measure must
be based on this discrete set and is given by
n
e(D; fw) = L IYk - f(Uk; W)12 (1.41)
k=l
which is known as the squared error measure. The approximation problem is

to find f(u; w), with its shorthand fw, that has the least e(D, fw), which is
referred to as the least squares approximation.
The network mapping f(u;w) is defined by a specific set of parameters
w E P. Finding the closest function f (u; w) to f (u) is equivalent to finding an
optimal parameter w, denoted by w*, the corresponding map being f (u; w*)
with shorthand description f:V. Thus, the learning problem of seeking the best
approximation to the underlying function becomes that of estimating the op-
timal set of parameter values. From the distance measure given in (1.41) that
is a function of w, the estimation problem can be stated as
w* = arg mine(D; fw) (1.42)

w
Generally, w appears nonlinearly in f (u; w). It is clear that the above problem
is a nonlinear optimisation problem, which can be solved by any of the standard
procedures or algorithms such as those in Luenberger (1984).
The function e(D; fw) can be viewed as an error surface defined over the
space of the parameter w, called the parameter space. This surface will either
have one or several minima which depend on how the parameter w appears
through f(u; w). If f(u; w) is linear in w, e(D; fw) is convex and has only one
minimum that is the global minimum of the error surface. On the other hand,
if f (u; w) is nonlinear in w the error surface may have several local minima
due to the non convexity of e(D; fw). One must bear in mind the effects caused
by the presence of local minima in choosing an optimisation procedure or
algorithm.
1.5.2 Universal Approximation
For the function approximation, it assumes that a choice of f (u; w) and F w

is made. Now, let us see why neural networks have been a popular choice for
representing f (U; w). The selection of F w determines the goodness of function
approximation that can be achieved. For example, if Fw contains a single
member which is a constant, then the best approximation in the Hibert space
1{ is the mean of the output value of f (u; w). It is clear that this is bound
to be a bad function approximation if the range of f (u) is large. However, if
F w spans the entire space 1{, then f (u) can be exactly represented by some
f(u; w). The approximation ability of the representation f(u; w) is therefore
crucial to the goodness of function approximation.
Neural networks with at least a single hidden layer have been shown to
have the capacity to approximate any arbitrary function in C(Rm) (continuous
function space) if there are a sufficiently number of basis functions (or hidden
nodes) (Cybenko, 1989). This property of neural networks is referred to as the
universal approximation property.
This approximation ability of neural networks can also be understood from
the geometric view in the function space. If the neural network consists of N
hidden neurons, then the function to be mapped is represented by a linear
combination of the N basis functions ¢k (u). For the case where these N basis
functions are linearly independent, the set of functions the network can map,
span a subspace of N-dimensions in the infinite dimensional Hilbert space H.
By increasing the number of linearly independent basis functions to infinity
the subspace spanned by the neural network mapping is extended to the entire
Hilbert space H. For the Gaussian RBF network, the linear independence of
the basis functions with different centres holds (Poggio and Girosi, 1990a,b),
which can also be extended for other types of neural networks to show that
these basis functions are linearly independent.
1.5.3 Capacity of Neural Networks
The universal approximation property of neural networks does not provide

any information about the capacity of a network with a finite number of basis
functions (or hidden units). But, it indicates that the capacity of the network
depends on the number of basis functions. This is also evident from the fact
that a larger sized Fw can give a good approximation to a wider class of
functions because increasing the number of hidden units increases the size of
this set.
The notion of capacity was introduced for pattern classifiers by Cover
(1965). The classifier typically provides an output value of either 0 or 1 and
then constructs only a class of piecewise constant functions. It was subse-
quently developed by introducing the concept of Vapnik Chervonenkis dimen-
sion or VC dimension as a measure of the capacity of a classifier network
(Vapnik and Chervonenkis, 1971). The VC dimension of a classifier network is
defined as the maximum number of dichotomies that a network can induce on
the input space. A dichotomy is a partition of the input space into two sub-
regions in this space. This is closely related to the number of hidden units in
the network, in analogy with the number of coefficients or degrees of freedom
(Baum and Haussler, 1989).
The notion of VC dimension is extended to networks that map arbitrary
real-valued functions and obtains a capacity measure for such networks. This
capacity is also found to be directly related to the number of hidden units
in the network. Generally, for commonly used neural networks the number of
parameters provide a measure of their capacity.
1.5.4 Generalisation of Neural Networks
If having learned to map the examples in the data set D, a neural network
predicts the input output observations consistent with the underlying function
f(u), which are not in D. Then, this neural network is said to generalise well.
The generalisation ability of a network depends critically on its functional form
f(u; w) and the data set D.
In order that a network has the capacity to generalise, its functional form
f(u; w) must be able to provide a sufficiently good approximation to the un-
known underlying function f(u). This implies that the capacity ofthe network
and hence the number of parameters should be large. The universal approxi-
mation property of neural networks seems to suggest that the functional rep-
resentation is not important as long as a sufficiently large network is chosen.
After the functional form is chosen, the network parameters must be esti-
mated from the data set D. If the number of examples contained in this data
set is less than the number of parameters, infinitely many solutions for the pa-
rameters that will fit the data exist. The network will generalise poorly if the
learning algorithm cannot give consistent estimates and cannot find an esti-
mate not necessarily closest to the unknown f(u). The generalisation problem
of neural networks can also be understood from a statistical point of view. If
there are an infinite number of functions that can fit the data set D exactly,
the probability that the estimate found will be closer to f(u) will be very low.
With an increasing number of examples this probability is increased and in
turn the generalisation of the network is improved. Thus, the network size
that gives good generalisation depends on the number of examples that are
used to estimate the parameters. It has been shown that an upper bound on
the number of parameters of the network can be derived on the basis of the
size of the data set D (Baum and Haussler, 1989).
It has been observed that choosing large size networks is bound to exhibit
poor generalisation (Chauvin, 1989), which is referred to as overfitting. Impos-
ing smoothness constraints is a powerful way of reducing the dimensionality of
the functional representation problem. Good generalisation can be achieved by
choosing large networks with added penalty terms to provide smoother basis
functions (Hinton, 1987; Hanson and Pratt, 1989).
1.5.5 Error Back Propagation Algorithm
A learning algorithm for neural networks is often defined by the optimisation

criterion and the optimisation procedure together. A widely used optimisa-
tion algorithm for neural network learning is based on the least squares error
criterion. The least squares method gives an estimate that is the maximum
likelihood estimate under the assumption that the observation noise statis-
tics is Gaussian (White, 1989). Alternative criteria based on cross entropy for
classifiers have also been proposed (Solla et at., 1988). These criteria view the
classifier as constructing a conditional probability for which the cross-entropy
distance measure is more suitable (Basseville, 1989). Since the interest here
is in estimating functions rather than constructing probability estimates, the

least squares criterion is more appropriate.
Neural network learning can be viewed as a block estimation problem where
all the information or data are assumed available together. Here, a brief inves-
tigation on error back propagation is given, which is the most commonly used
neural network learning algorithm.
The error back propagation learning algorithm, which is devised for the
MLP, was the first learning algorithm developed for multilayer feedforward
networks. Essentially, it is a stochastic gradient descent procedure minimising
the squared error criterion. Given the network mapping f(u;w) and the data
set D, the squared error function is given by
n
Je(D; fw) = 2)Yk - f(Uk; W))2 (1.43)
k=l
The error back propagation algorithm updates the parameters according to

the following:
(1.44)
at the j-th iteration, where a is a constant and
(1.45)
Substituting for Je(D; fw) and differentiating gives
n
-2 L ek V' wf(Uk; W(j-l)) (1.46)
k=l
where
Yk-f(Uk;W) (1.47)
of (Uk; w)
ow IW=W(j-l) (1.48)
The parameter is adapted in the direction of decreasing J e (D; f w), where the
direction is averaged over all the samples. The iteration is repeated until the
squared error falls below a required threshold.
This algorithm could be efficiently implemented in feedforward networks
by back propagating the errors (Chan and Fallside, 1987). Further, it could be
implemented within the highly parallel architecture of neural networks.
The error back propagation learning algorithm has a characteristic feature
of slow rate of convergence. Such behaviour is caused by the shape of the error
surface in the parameter space in which sharp valleys and long plateaux exist.
A scheme for adapting the step size or learning rate is proposed, based on the
angle of the previous gradient direction and the current gradient direction in
the parameter space (Chan and Fallside, 1987).
When the learning problem is viewed as one minimising a cost function,
the slow rate of the gradient descent procedure in the error back propagation
method becomes fairly obvious. The nonlinear optimisation procedure consid-
ers only the gradient of the current iteration. Methods that are faster but need
more computation have been developed, for example, the method of line search
along the gradient direction, the conjugate gradient descent method which
utilises information about previous descent directions, and the quasi-Newton
descent direction method which utilises the Hessian of the cost function along
with the gradient (Luenberger, 1984). These methods have also been applied
to neural network learning.
1.5.6 Recursive Learning Algorithms
The problem of recursive learning in neural networks can be viewed as a re-

cursive parameter estimation problem for which a variety of algorithms exist
(Ljung and Soderstrom, 1983; Young, 1984). The general sequential learning
algorithm can be operated as follows. Let the data set be defined as
(1.49)
which is received sequentially, so that at time n the observation {(Uk, Yk); f( Uk)
= yd is received. The neural network or nonlinear model mapping is given by
f(u; w). Let the set of parameter values be w(n-l) before the n-th observation
is received, which is known as the a priori estimate. On learning v(n), let
the parameter values be modified to w(n), known as the a posteriori estimate.
The operation of the recursive learning algorithm is to provide a functional
relationship between the posterior estimate w(n), the prior estimate w(n-l),
and the n-th observation. In general, it can be described mathematically by
(1.50)
where h(.,.) is a nonlinear function.

The recursive learning algorithms has an important advantage over block
algorithms that the model or network can continually improving the approx-
imation as it learns. If the network mapping is exactly the same as f(u), the
underlying model mapping, both Yn and f(u n ; w(n-l)) would be equal. But, it
is indicative of the approximation error between the network and the under-
lying model if a difference between the two exists. This difference is also the
prediction error, which is the error in the predicted value when compared to
the actual value. The prediction error, denoted by en, is defined by
(1.51)
In sequential learning, the prediction error can be calculated for each ob-
servation as it arrives and hence is a dynamic performance index that can be
used in evaluating different models and algorithms.
1.5.7 Least Mean Square Algorithm
A commonly used algorithm for neural networks is the least mean square
(LMS) algorithm (Widrow and Hoff, 1960). It is a special case ofthe stochastic
approximation algorithm (Robbins and Munro, 1951). For the n-th observation
v(n), the parameter vector is adapted by
(1.52)
where en is the prediction error and T} the learning rate or the adaptation step
size. The above LMS learning is a recursive version of the stochastic gradient
descent procedure, with the gradient being estimated on the basis of the cur-
rent sample rather than the ensemble of examples as in the block estimation
procedure. It is shown that such a procedure minimises the least squares cost
function defined in equation (1.43) (block estimation cost function) and fur-
ther that the LMS algorithm converge slowly to the underlying set of optimal
parameters.
1.6 Applications of Neural Networks

Neural networks have been widely applied to many areas. Applications of
neural networks for classification, filtering, modelling, prediction, control and
hardware implementation are introduced here.
1.6.1 Classification
With the growth of information technology and the availability of cheap com-
puter systems, the rapid expansion of medical knowledge makes the develop-
ment of Computer-Aided Diagnostic (CAD) systems increasingly attractive.
Such systems assist clinicians to improve clinical decision-making. The con-
tribution of neural networks to such systems is no exception. For example,
RBF networks have been applied to classify various categories of low back
disorders (Bounds et at., 1990), which takes in many elements of information
and classifies the different cases of low back disorders. Besides using RBF net-
works, classification studies have been made with MLP networks, fuzzy logic,
k-nearest neighbours, closest class mean and also have been compared with
clinicians' diagnoses.
Classification and feature extraction of speech signals is the single most
applied and reported application of neural networks (see, for example, Re-
nals, 1989; Bengio, 1992). Primarily, neural networks are used to classify spo-
ken vowels based on speech spectrograms. It is worth noting that consistent
1.6 Applications of Neural Networks 21
and superior performance was obtained with neural networks compared to

other known methods or networks for speech classification and feature ex-
traction. Neural networks have also found their way into process industries
(Leonard and Kramer, 1991; Lohninger, 1993). Possible applications include
process state identification, sensor validation, malfunction diagnosis and fault
detection. Neural networks have also been employed in the classification of
marine phytoplankton from multivariate flow cytometry data (Wilkins et at.,
1994).
1.6.2 Filtering
Neural networks have drawn considerable attention from the signal processing
community (Casdagli, 1989; Chen et al., 1990; LeCun et al., 1990). Remarkable
claims have been made concerning the superior performance of neural networks
over traditional methods in signal processing. One of the major areas of signal
processing application is filtering.
The filtering property of neural networks employing Gaussian radial basis
functions has been discussed and reported by researchers (Tattersall et at.,
1991) and applied in filtering chaotic data (Holzfuss and Kadtke, 1993). Gaus-
sian RBFs are a particularly good choice for this purpose because of the local
property of this network, which enables wild oscillations to be damped out.
Actually, the RBF method for multivariate approximation schemes is devel-
oped by imposing a smoothness constraint on the approximation function. This
smoothness constraint can be synthesised in the frequency domain by the use of
the generalised Fourier transform. Analysis and application of the generalised
inverse Fourier transform lead to a smooth approximating scheme. Moreover, it
has been shown that the neural network approach is a very promising method
for smoothing scattered data (Barnhill, 1983).
Neural networks as filters have been used in digital communications such
as channel equalisation and overcoming cochannel interference. Significant ro-
bustness and good filtering properties of neural networks for systems with high
signal to noise ratios have been reported (Holzfuss and Kadtke, 1993).
1.6.3 Modelling and Prediction
Since neural networks are used for nonlinear prediction of chaotic time series
(Casdagli, 1989), there has been a growing interest in using neural networks
for various prediction tasks (Leung and Haykin, 1991). Many prediction tasks
include various nonlinear time series, such as annual sunspots, Canadian lynx
data, ice ages, measles; chaotic data include Ikeda map, Lorenz equations,
Mackey-Glass delay differential equation, Henon map, logistic map, Duffing
oscillators, radar backscatter, fluid turbulence flow, electrochemical systems
(electrodissolution of copper in phosphoric acid) and many others. Neural
networks have become popular for prediction of a variety of different time
series, for example, chaotic time series (Platt, 1991), speech waveforms (Fall-
side, 1989) and economic data (Weigend et al., 1991). The interest in most of
these approaches demonstrate the effectiveness with which good predictions

can be made with neural networks. In some cases, investigation into the accu-
racy level has been achieved by different sizes of networks, and in others the
performances of different types of networks has been compared.
Unlike speech waveforms or astronomical data, chaotic time series are usu-
ally deterministic and the underlying models generating them are known. So,
the neural network learning performance can be evaluated directly by apply-
ing it to the prediction of chaotic time series. A review on the use of neural
networks for the prediction of chaotic systems can be found in Casdagli et al.
(1994).
Extensive studies of neural network prediction and modelling capabilities
have been reported (Carlin et al., 1994). Based on real-world data, these stud-
ies are used to identify the dynamic actuator characteristics of a hydraulic
industrial robot, to model carbon consumption in a metallurgic industrial
process and to estimate the water content in fish food products based on
NIR-spectroscopy. The use of neural networks for system identification was
popularised by a series of research papers (Chen et al., 1990; Narendra and
Parthasarathy, 1990; Chen and Billings, 1992).
1.6.4 Control
Neural networks have also received widespread attention and have been ap-
plied to the control of dynamical systems. They are employed to adaptively
compensate for plant nonlinearities (Sanner and Slotine, 1992; Feng, 1994; Liu,
2001). Under mild assumptions about the degree of smoothness exhibited by
the nonlinear functions, it has been shown that the nonlinear optimal neural
control is globally stable with tracking errors converging to a neighbourhood
of zero. A variant of neural networks (with Gaussian RBFs) is used to opti-
mise and control a repetitively pulsed, small-angle negative ion source which is
designed to produce a high-current, low-emittance beam of negative hydrogen
ions for injection into various accelerators used in nuclear physics (Mead et al.,
1992). Neural networks have shown amongst other things, the versatibility of
nonlinear adaptive basis functions, simple and rapid training algorithms, and
a variety of optional capabilities that could be incorporated such as Kalman
noise filtering. Neural networks have been used to design more powerful feed-
back feedforward controllers for robotic applications (Parisini and Zoppoli,
1993). Apart from showing the desirable properties the neural network could
achieve, it is highlighted how much computational load is involved, particu-
larly how the computation increases rapidly with respect to the dimension of
the problem.
There are many other interesting applications of neural networks used in
the control of dynamical and industrial systems. Space however, precludes
mention of these but details can be found as follows: biomedical control (Nie
and Linkens, 1993), chemical and industrial processes (Roscheisen et al., 1992;
Liu and Daley, 1999a,c,200l), servomechanism (Lee and Tan, 1993).
1.7 Mathematical Preliminaries 23
1.6.5 Hardware Implementation
The hardware implementation of neural networks is parallel to the development

of neural networks. Many reports have appeared describing this development.
An optical disk implementation of a neural network was reported to apply the
network to a handwritten classification task (Neifield et al., 1991). The optical
disk based system was designed to recognise handwritten numerals 0-9, and
computed the Euclidean distance between an unknown input and 650 stored
patterns (REF centres or reference vectors) at a demonstrated rate of 26 000
pattern comparisons/so This application chose the REF structure of neural
networks because they lie between supervised output error driven learning
algorithms like backpropagation, and memory intensive sample based systems
such as k-nearest neighbour classifiers.
In space-based applications, three different neurocomputers were designed
and their performances compared (Watkins and Chau, 1992), based on cus-
tom analogue VLSI circuits and digital systems made commercially available
on digital signal processors (Motorola 56000). These neurocomputer systems
were controlled by a PC host running in C and 56000 assembly language.
In computer vision, an experimental analogue VLSI chip to reconstruct a
smooth surface from sparse depth data was reported (Harris, 1987, 1988). Us-
ing state-of-the-art CMOS technology, it showed how a neural VLSI chip can
provide thin-plate spline smoothing of images. This work (Harris, 1994) was
extended and implemented via Delbruck's bump-resistors concept (Delbruck,
1991). Hardware development of neural networks is described in Anderson
et al. (1993) while a good review of various neurocomputer developments is
provided by Glesner and Pochmuller (1994).
1.7 Mathematical Preliminaries

This section outlines some fundamental mathematical concepts that are nec-
essary for the remaining chapters.
The framework of metric spaces provides a general way of measuring the
goodness of approximation, since there is a distance function defined for all
functions that belong to this space. A special case of metric spaces, normed
linear spaces, gives a more convenient method for approximation.
In most approximation problems, f(x) is in the space C(Rn) which is the
set of continuous functions defined in R n. The Lp- norm in C (R n) is defined
as
(1.53)
In a normed linear space, the distance between the functions f (x) and 1* (x)
is given the shorthand description
D(j, j*) := II f - 1* 112 (1.54)

and is the norm of the difference between the two functions, which is a suitable
distance function. Since the difference f - 1* is the error function, this measure
is the approximation error.
The commonly used norms are the 1-,2-, (Xl-norms. The L 1 -norm has
the property that the magnitude of error in the case of discrete data makes
no difference to the final approximation (Powell, 1981). The Loa-norm, also
known as the Chebyshev norm, is much used in approximation theory. The
norm can also be expressed as
II f Iloa:= sup f(x) (1.55)

xERn
which gives the maximum value of f(x). The (Xl-norm of the difference would
then give the maximum difference between the two functions for any point x,
which is also the maximum error of approximation.
The L 2 -norm or the Euclidean norm occurs naturally in theoretical studies
of Hilbert spaces (Powell, 1981). The practical reasons for considering the L 2 -
norm are even stronger. From a statistical point of view, if the errors in the
data have a normal distribution, the most appropriate choice of data fitting is
the L 2 -norm. Further, highly efficient algorithms can be developed to find the
best approximation. The L 2 - norm is given by
II f 112:= (
1
xERn
If(xWdx
)
1/2
(1.56)
The L 2 -norm defines the L2-space of functions, the square integrable real func-
tions. Since an inner product can be defined in this space, it is also the Hilbert
space of square integrable real functions, denoted by H (Linz, 1979). All con-
tinuous functions in C (R n), and therefore f and 1* , are a subset of this Hilbert
space.
Typically, for a function to be admitted into H, its L 2 -norm must be finite.
There exist continuous functions with infinite L 2 -norm in C(Rn). However,
for the input space D E Rn the norms of these functions can be made finite.
Since the input space is always finite, all continuous functions can be admitted
into H (see also, Linz, 1979).
Nand R denote the set of integers and real numbers, respectively. L2 (R)
denotes the vector space of measurable, square-integrable one-dimensional
functions f(x). For f,g E L 2 (R), the inner product and norm for the space
i:
L 2 (R) are written as
< f,g >:= f(x)g(x)dx (1.57)
II f 112 :=< f, f >1/2 (1.58)
where g(.) is the conjugate of the function g(.). L 2 (R n ) is the vector space
of measurable, square-integrable n-dimensional functions f(X1' X2, ... , x n ). For
1.8 Summary 25
j,g E L2(Rn), the inner product of j(Xl,X2, ... ,Xn ) with g(Xl,X2, ... ,Xn ) is
written as
The above mathematical notation introduced in this section will be used

throughout this book.
1.8 Summary
This chapter has presented an overview of neural networks. It started with

a description of the model of a neuron (the basic element of a neural net-
work) and commonly used architectures of neural networks. Then various neu-
ral networks were discussed, such as radial basis function networks, Gaussian
REF networks, polynomial basis function networks, Fuzzy neural networks
and wavelet networks. Function approximation by neural networks was then
considered. It takes the view that function approximation is essentially a lin-
ear combination of a set of basis functions defined at the hidden layer of a
single hidden layer network. Learning by neural networks and its relation to
function approximation are discussed with measures of approximation good-
ness. Three learning algorithms are introduced: the error back propagation
algorithm, the recursive learning algorithm and the least mean square algo-
rithm. Applications of neural networks to classification, filtering, modelling,
prediction, control and hardware implementation were briefly detailed. Some
fundamental mathematical concepts that are necessary in this book have also
been provided.
CHAPTER 2
SEQUENTIAL NONLINEAR IDENTIFICATION
2.1 Introduction
The identification of nonlinear systems using neural networks has become a
widely studied research area in recent years. System identification mainly con-
sists of two steps: the first is to choose an appropriate identification model and
the second is to adjust the parameters of the model according to some adap-
tive laws so that the response of the model to an input signal can approximate
the response of the real system to the same input. Since neural networks have
good approximation capabilities and inherent adaptivity features, they pro-
vide a powerful tool for identification of systems with unknown nonlinearities
(Antsaklis, 1990; Miller et al. 1990).
The application of neural network architectures to nonlinear system iden-
tification has been demonstrated by several studies in discrete time (see, for
example, Chen et al., 1990; Narendra and Parthasarathy, 1990; Billings and
Chen, 1992; Qin et al., 1992; Willis et al., 1992; Kuschewski et al., 1993; Liu
and Kadirkamanathan, 1995) and in continuous time (Polycarpou and Ioan-
nou, 1991; Sanner and Slotine, 1992; Sadegh 1993). For the most part, much
of the studies in discrete-time systems are based on first replacing unknown
functions in the difference equation by static neural networks and then de-
riving update laws using optimisation methods (e.g., gradient descent/ascent
methods) for a cost function (quadratic in general), which has led to var-
ious back-propagation-type algorithms (Williams and Zipser, 1989; Werbos,
1990; Narendra and Parthasarathy, 1991). Though such schemes perform well
in many cases, in general, some problems arise, such as the stability of the
overall identification scheme and convergence of the output error. Alternative
approaches based on the model reference adaptive control scheme (N arendra
and Annaswamy, 1989; Slotine and Li, 1991) have been developed (Polycar-
pou and Ioannou, 1991; Sanner and Slotine, 1992; Sadegh, 1993), where the
stability of the overall scheme is taken into consideration.
Most of the neural network based identification schemes view the problem
as deriving model parameter adaptive laws, having chosen a structure for the
neural network. However, choosing structure details such as the number of
basis functions (hidden units in a single hidden layer) in the model must be
done a priori. This can often lead to an over-determined or under-determined
network structure which in turn leads to an identification model that is not
optimal. In discrete-time formulation, some approaches have been developed
28 2. Sequential Nonlinear Identification
in determining the number of hidden units (or basis functions) using decision
theory (Baum and Haussler, 1989) and model comparison methods such as
minimum description length (Smyth, 1991) and Bayesian methods (MacKay,
1992). The problem with these methods is that they require all observations
to be available together and hence are not suitable for on-line or sequential
identification tasks.
Yet another line of approach, developed for discrete-time systems, is to be-
gin with a larger network prune, as in Mozer and Smolensky (1989) or begin
with a smaller network growth as in Fahlman and Lebiere (1990) and Platt
(1991) until the optimal network complexity is found. Amongst these dynamic
structure models, the resource allocating network (RAN) developed by Platt
(1991) is an on-line or sequential identification algorithm. The RAN is essen-
tially a growing Gaussian radial basis function (GRBF) network whose growth
criteria and parameter adaptation laws have been studied (Kadirkamanathan,
1991) and applied to time-series analysis (Kadirkamanathan and Niranjan,
1993) and pattern classification (Kadirkamanathan and Niranjan, 1992). The
RAN and its extensions addressed the identification of only autoregressive sys-
tems with no external inputs and hence stability was not an issue. Recently,
the growing GRBF neural network has been applied to sequential identifi-
cation and adaptive control of dynamical continuous nonlinear systems with
external inputs (Liu et al., 1995; Fabri and Kadirkamanathan, 1996). Though
the growing neural network is much better than the fixed neural network in
reducing the number of basis functions, it is still possible that this network
will induce an overfitting problem. There are two main reasons for this: first,
it is difficult to known how many basis functions are really needed for the
problem and second the nonlinearity of a nonlinear function to be modelled is
different when its variables change their value ranges. Normally, the number of
basis functions in the growing neural network may increase to the one that the
system needs to meet the requirement for dealing with the most complicated
nonlinearity (the worst case) of the nonlinear function. Thus, it may lead to a
network which has the same size as fixed neural networks.
To overcome the above limitations, a new network structure, referred to
as the variable neural network, was proposed by Liu et al. (1996b). The basic
principle of the variable neural network is that the number of basis functions
in the network can be either increased or decreased over time according to a
design strategy in an attempt to avoid overfitting or underfitting. In order to
model unknown nonlinearities, the variable neural network starts with a small
number of initial hidden units and then adds or removes units located in a
variable grid. This grid consists of a number of subgrids composed of different
sized hypercubes which depend on the novelty of the observation.
This chapter introduces variable neural networks and considers a sequential
identification scheme for continuous nonlinear dynamical systems using neu-
ral networks. The nonlinearities of the dynamical systems are assumed to be
unknown. The identification model is a Gaussian radial basis function neural
network that grows gradually to span the appropriate state-space and of suf-
2.2 Variable Neural Networks 29
ficient complexity to provide an approximation to the dynamical system. The

sequential identification algorithm for continuous dynamical nonlinear systems
is developed in a continuous-time framework instead of in discrete time. The
approach, different from the conventional methods of optimising a cost func-
tion, attempts to ensure stability of the overall system while the neural network
learns the system dynamics. The stability and convergence of the overall iden-
tification scheme is guaranteed by parameter adjustment laws developed using
the Lyapunov synthesis approach. The operation of the sequential identifica-
tion scheme is illustrated by simulated experimental results.
2.2 Variable Neural Networks
Two main neural network structures which are widely used in on-line iden-
tification and control are the fixed neural network and the growing neural
network. The fixed neural network usually needs a large number of basis func-
tions in most cases even for a simple problem. Though the growing network
is much better than the fixed network in reducing the number of basis func-
tions for many modelling problems, it is still possible that this network will
lead to an overfitting problem for some cases and this is explained in Section
2.1. To overcome the above limitations of fixed and growing neural networks,
a new network structure, called the variable neural network, is considered in
this section.
Due to some desirable features such as local adjustment of the weights and
mathematical tractability, radial basis functions were introduced to the neural
network literature by Broomhead and Lowe (1988) and have gained significance
in the field. Their importance has also greatly benefited from the work of
Moody and Darken (1989) and, Poggio and Girosi (1990a,b) who explore the
relationship between regularisation theory and radial basis function networks.
One of the commonly used radial basis function networks is the Gaussian radial
basis function (GRBF) neural network, also called the localised receptive field
network, which is described by
n
j(x;p) = L Wk'Pk(X; Ck, dk ) (2.1)
k=l
where Wk is the weight, p = {w-k, Ck, dk } is the parameter set and 'Pdx; Ck, d k )
is the Gaussian radial basis function
(2.2)
d k is the centre and Ck is the weighting matrix of the basis function. The good
approximation properties of the Gaussian radial basis functions in interpola-
tion have been well studied by Powell and his group (Powell, 1987). Thus, the
discussion on variable neural networks is based on the GRBF networks.
2.2.1 Variable Grids
In GRBF networks, one very important parameter is the location of the centres
of the Gaussian radial basis functions over the compact set X, which is the
approximation region. Usually, an n-dimension grid is used to locate all centres
in the gridnodes (Sanner and Slotine, 1992). Thus, the distance between the
gridnodes affects the size of the networks and also the approximation accu-
racy. In other words, a large distance leads to a small network and a coarser
approximation, while a small distance results in a large size of network and
a finer approximation. However, even if the required accuracy is given, it is
very difficult to know how small the distance should be since the underlying
function is unknown. Also, the nonlinearity of the system is not uniformly
complex over the set X. So, here a variable grid is introduced for location of
the centres of all GRBFs in the network.
The variable grid consists of a number of different subgrids. Each subgrid is
composed of equally sized n-dimensional hypercuboids. This implies that the
number of subgrids can increase or decrease with time in the grid according to
a design strategy. All the subgrids are named, the initial grid is named the 1st
order subgrid, then the 2nd order subgrid and so on. In each subgrid, there
are a different number of nodes, which are denoted by their positions. Let Mi
denote the set of nodes in the i-th order subgrid. Thus, the set of all nodes in
the grid with m subgrids is
m
M=UMi (2.3)
i=l
To increase the density of the gridnodes, the edge lengths of the hypercubes
of the i-th order subgrid will always be less than those of the (i - l)-th order
subgrid. Hence the higher order subgrids have more nodes than the lower order
ones. On the other hand, to reduce the density of the gridnodes, always remove
some subgrids from the grid until a required density is reached.
Let all elements of the set M represent the possible centres of the network.
So, the more the subgrids, the more the possible centres. Since the higher
order subgrids probably have some nodes which are the same as the lower
order subgrids, the set of the new possible centres provided by the i-th order
subgrid is defined as
for j = 1, 2, ... , i - I } (2.4)
where Po is an empty set. This shows that the possible centre set Pi corre-
sponding to the i-th subgrid does not include those which are given by the
lower order subgrids, i.e.
(2.5)
For example, in the two-dimensional case, let the edge length of rectangles on
the i-th subgrid be half of the (i - l)-th subgrid. The variable grid with three
subgrids is shown in Figure 2.1.
DEBm
lSI subgrid 2nd subgrid 3rd subgrid
-+-
Fig. 2.1. Variable grid with three subgrids
2.2.2 Variable Networks
The variable neural network has the property that the number of basis func-
tions in the network can be either increased or decreased over time according
to a design strategy. For the problem of nonlinear modelling with neural net-
works, the variable network is initialised with a small number of basis function
units. As observations are received, the network grows by adding new basis
functions or is pruned by removing old ones. The adding and removing oper-
ations of a variable neural network are illustrated by Figure 2.2.
To add new basis functions to the network the following two conditions
must be satisfied: (a) The modelling error must be greater than the required
accuracy. (b) The period between the two adding operations must be greater
than the minimum response time of the adding operation.
To remove some old basis functions from the network, the following two
conditions must be satisfied: (a) The modelling error must be less than the
required accuracy. (b) The period between the two removing operations must
be greater than the minimum response time of the removing operation.
It is known that if the grid consists of the same size n-dimension hypercubes
with edge length vector P = [PI, P2, ... , Pnl, then the accuracy of approximating
a function is in direct proportion to the norm of the edge length vector of the
REMOV/
~ADD
Fig. 2.2. Adding and removing operations of a variable network
grid (Powell, 1981), i.e.
CK ex Ilpll (2.6)
Therefore, based on the variable grid, the structure of a variable neural net-
work may be stated as the following. The network selects the centres from
the node set M of the variable grid. When the network needs some new basis
functions, a new higher order subgrid (say, (m + l)-th subgrid) is appended
to the grid. The network chooses the new centres from the possible centre set
Pm+! provided by the newly created subgrid. Similarly if the network needs to
be reduced, the highest order subgrid (say, m-th subgrid) is deleted from the
grid. Meanwhile, the network removes the centres associated with the deleted
subgrid. Since the novelty of the observation is tested, it is ideally suited to
on-line control problems. The objective behind the development is to gradually
approach the appropriate network complexity that is sufficient to provide an
approximation to the system nonlinearities and consistent with the observa-
tions being received. By allocating GRBF units on a variable grid, only the
relevant state-space traversed by the dynamical system is spanned, resulting
in considerable savings on the size of the network. How to locate the centres
and determine the widths of the GRBFs is discussed in the next section.
2.2.3 Selection of Basis Functions
It is known that the Gaussian radial basis function has a localisation property
such that the influence area of the kth basis function is governed by the centre
Ck and width d k . In other words, once the centre Ck and the width d k are fixed,
the influence area of the Gaussian radial basis function cp(x; Ck, dk ) is limited
in state-space to the neighbourhood of Ck.
On the basis of the possible centre set M produced by the variable grid,
there are large number of basis function candidates, denoted by the set B.
During system operation, the state vector x will gradually scan a subset of the
state-space set X. Since the basis functions in the GRBF network have a lo-
calised receptive field, if the neighbourhood of a basis function cp(x; Ck, dk ) E B
is located 'far away' from the current state x, its influence on the approxima-
tion is very small and could be ignored by the network. On the other hand, if
the neighbourhood of a basis function cp(x; Ck, dk ) E B is near to or covers the
current state x, it will playa very important role in the approximation. Thus
it should be kept if it is already in the network or added into the network if it
is not in.
Given any point x, the nearest node xi = [x~, xi;, ... , xt,]T to it in the
i-th subgrid can be calculated by,
(2.7)
for j = 1,2, ... , n, where round(-) is an operator for rounding the number (-) to
the nearest integer, for example, round(2.51) = 3, and Oij is the edge length
of the hypercube corresponding to the j-th element of the vector x in the i-th
subgrid. Without loss of generality, let Oi = Oil = Oi2 = ... = Oin.
Define m hyperspheres corresponding to the m subgrids, respectively,
(2.8)
for i = 1,2, ... , m, where O"i is the radius of the i-th hypersphere. In order to
get a suitable sized variable network, choose the centres of the basis functions
from the nodes contained in the different hyperspheres Hi(xi, O"i), which are
centred in the nearest nodes xi to x in the different subgrids with radius
O"i, for i = 1,2, ... , m. For the sake of simplicity, it is assumed that the basis
function candidates whose centres are in the set Pi have the same width di
and di < di - 1 . Thus, for the higher order subgrids, use the smaller radius, i.e.
O"m < O"m-l < ... < 0"1 (2.9)
Usually, choose
(2.10)
where /1 is a constant and less than 1. Thus, the chosen centres from the set
Pi are given by the set:
(2.11)
In order that the basis function candidates in the set Pi which are less than
an activation threshold to the nearest grid node xi in the i-th subgrid are
outside the set Hi(xi, IJ"i), it can be deduced from (2.2) and (2.8) that the IJ"i
must be chosen to be
(2.12)
for i = 1,2, ... , m, where 6m in E (0,1) represents the activation threshold.

Thus, the centre set of the network is given by the union of the centre sets
Ci , for i = 1,2, ... , m, that is,
m
(2.13)
For example, in the 2-dimension case, the radii are chosen to be the same as
the edge lengths of the squares in the subgrids, that is,
i = 1,2, ... ,m (2.14)
The chosen centres in the variable grid with four subgrids are shown in Figure
2.3.
Now, consider how to choose the width dk of the kth basis function. The
angle between the two GRBFs cp(x; Ci, di ) and cp(x; Cj, dj ) is defined as
where < ., . > is the inner product in the space of square-integrable functions,
I: . . I:
which is defined as
< cp(x;ci,di),cp(x;cj,dj ) >= cp(x;ci,di)CP(X;Cj,dj)dxl ... dxn
(2.16)
The angle can be given by (Kadirkamanathan, 1991)
B· = cos- 1
0
tJ ~
2~)
((- -
+1
"} In(c·
r J'
C
"
do) 2<+2
t
< ) (2.17)
where ~ = dUd]. The above shows that COS(Bij) depends on three factors: the
dimension n, the width ratio ~ and the output of a basis function at the centre
of the other basis function, cp( Cj; Ci, d i ).
=-...,-.._."". . :.; :. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..
Fig. 2.3. Location of centres in the variable grid with four subgrids(the number i,
for i = 1,2,3,4, denotes the centres chosen from the i-th subgrid)
If the centres of the two basis functions are chosen from the same subgrid, i. e.
~ = 1, it is clear from (2.17) that
(2.18)
On the other hand, if the centres of the two basis functions are from dif-
ferent subgrids, it is possible that their centres are very close. The worst case
will be when cp(Cj; Ci, di ) is near to 1. In this case, the angle between the two
basis functions can be written as
cos(B·)
~J
<
-
-
~ +1
( 2VE,)
- "l- (2.19)
Given the centre Ck, in order to assign a new basis function cp(x; Ck, d k ) that is
nearly orthogonal to all existing basis functions, the angle between the GRBFs
should be as large as possible. The width d k should therefore be reduced.
However, reducing d k increases the curvature of cp(x; Ck, d k ) which in turn gives
a less smooth function and can lead to overfitting problems. Thus, to make
a trade-off between the orthogonality and the smoothness, it can be deduced
from (2.18) and (2.19) that the width d k , which ensures the angles between
GRBF units are not less than the required minimum angle Bmin , should satisfy
(2.20)
or
(2.21 )
and
(2.22)
For example, assume that ~o satisfies (2.20). If the width of the basis functions
whose centres are located in the set Ci , which corresponds to the i-th subgrid
with 6i = ~06i-l' is chosen to be d i = ~Odi-l and the width d 1 of the basis
functions associated to the initial grid satisfies
(2.23)
then the smallest angle between all basis functions is not less than the required
minimum angle Bmin.
2.3 Dynamical System Modelling by Neural Networks
For the sake of simplicity, we first discuss the modelling of single-input single-
state (SISS) continuous nonlinear dynamical systems. The multi-input multi-
state (MIMS) case will be detailed in Section 2.6. Consider the class of
continuous-time dynamical systems with an input-state representation given
by
x = f(x, u), x(O) = xo (2.24)
where f(x, u) is an unknown nonlinear function that must be estimated, u E
Rl is the input, and x E Rl is the state. Assume that the nonlinear system is
stable.
By subtracting and adding ax, where a is some positive constant, the sys-
tem (2.24) becomes
x = -ax + g(x, u), x(O) = Xo (2.25)
where
g(x, u) = f(x, u) + ax (2.26)
is still a nonlinear function. Since neural networks provide an input output
mapping, we construct a model based on equation (2.25) by replacing the
nonlinear part g(x, u) by a neural network. Consider the model (Landau, 1979)
:i; = -ax + g(X,UiP), x(O) = xo (2.27)

2.3 Dynamical System Modelling by Neural Networks 37
where g is the output of the neural network, x denotes the state of the identi-
fication model, while P denotes the adjustable parameters of the network.
The nonlinear function g(x, u) is approximated by the GRBF network,
which is expressed by
K
g(x, UiP) = L Wk exp { - T12 [(x - mkd 2 + (u - mk2)2] } (2.28)

k=l k
where Tk is the width of the kth basis function whose centre is mk

[mkl' mk2V, P is the parameter vector containing Wk, Tk and mk (k
1,2, ... ,K).
It is well known that if the variables of a nonlinear function are in bounded
sets, the continuous function can be approximated well by the GRBF networks.
If x and u are not in bounded sets, we introduce the following one-to-one (1-1)
mapping:
(2.29)
(2.30)
where ax, bx , au, bu are positive constants, which can be chosen by the designer
(e.g., ax,bx,au,b u are 1). It is clear from (2.29) and (2.30) that x E [-bx,bxl
and U E [-bu,bul for x,u E (-00,+00). On the other hand, if x and u are
already bounded, we need only to set x = x and U = u. Thus
bxx
X~{ Ixl +ax
x
if x t/. [-b x , bxl
if x E [-b x , bxl
(2.31)
buu
u~ { lui +au
u
if u t/. [-bu , bul
if u E [-bu , bul
(2.32)
The above one-to-one mapping is illustrated in Figure 2.4, which shows that
in two-dimensional space the entire area can be transferred into a rectangular
one.
Replacing x and u by x and U in equation (2.28), the nonlinear function g
of the system model described by the GRBF network can be written as
K
g(X,UiP) = LWk'Pk(X,Uimk,Tk) (2.33)

k=l
where
Fig. 2.4. One-to-one (1-1) mapping
(2.34)
The problem then becomes that of estimating the function g(x, u,p) based on
the variables x and u, which are in the bounded sets. A schematic diagram of
the identification framework is shown in Figure 2.5.
+
u
Fig. 2.5. Configuration for the identification framework
2.4 Stable Nonlinear Identification
In system identification, the stability of the overall identification scheme is

an important issue. Even when the real system to be identified is bounded-
input bounded-state stable there is no a priori guarantee that the estimated
state or the adjustable parameters of the model will remain bounded. The
overall stability depends not only on the particular identification model that
is chosen but also on the parameter adjustment rules that are used. This
2.4 Stable Nonlinear Identification 39
problem is solved here by developing a stable parameter adjustment rule based

on Lyapunov stability techniques for the GRBF network model.
It has shown how to choose the basis functions of the network model in
Section 2.2. Here, it assumes that the basis functions CPk(x,u;mk,rk) for k =
1,2, ... ,K are given. Thus, the GRBF network based system model can be
rewritten in the form of
K
X = -ax + L WZCPk(X, u; mk, rk) + c(t) (2.35)

k=l
where wZ (k = 1,2, ... , K) is the optimal weight value and c(t) is the modelling
error defined as
K
c(t) = g(x,u) - LWZCPk(x,u;mk,rk) (2.36)

k=l
It is well known from approximation theory that the modelling error can be
reduced arbitrarily by increasing the number K, i.e., the number of linear
independent basis functions CPk in the network model. Thus, it is reasonable
to assume that the modelling error c(t) is bounded by a constant CK, which
represents the accuracy of the model and this is defined as
CK = sup Ic(t) I (2.37)

tER+
Since x E [-b x , bxl and u E [-b u , bul are bounded, the constant CK is finite.
From Equation 2.27 the identification model can also be described by
K
i: = -ax+ LWkCPk(x,u;mk,rk) (2.38)

k=l
where Wk (k = 1, ... , K) is the estimate of w k' while x is the estimated state
of the model.
Let us define the state error and the weight estimation error respectively
as
(2.39)
(2.40)
Hence, subtracting (2.35) from (2.38) gives the following dynamical expression
of the state error:
K
ex = -ae x + L ~kCPdx, u; mk, rk) + c(t) (2.41 )

k=l
Consider the following Lyapunov function candidate:
V (ex, z)
121
="2 ( 2
ex + ;; (; ~k K) (2.42)
where z = [6, ... '~Kf and a is a positive constant which will appear in the
sequential adaptation laws, also referred to as the learning or adaptation step
size. Using (2.42), the time derivative of the Lyapunov function V is given by
K 1 K .
-ae; + Lex~kipk(x,il;mk,Tk) + -a L~k~k + exc(t)
k=1 k=1
K
-ae; + ..!.a L(aex~kipk(X, il; mk, Tk) + ~ktk) + exc(t) (2.43)
k=1
Since the optimal weight wk is constant, it can easily be obtained that Wk =

-tk' So, it is clear from (2.43) that if the parameter estimates Wk are adapted
according to the following laws:
(2.44)
then equation (2.43) becomes
V(e x , z) + exc(t)
-ae;
< + lexlEK
-ae;
-alexl(lexl - EK fa) (2.45)
If there is no modelling error (i.e., EK = 0), then from (2.45), V is negative

semidefinite; hence the stability of the overall identification scheme is guaran-
teed. On the other hand, in the presence of a modelling error, if Iex I < E K I a
then it is possible that V > 0, which implies that the weights Wk may drift to
infinity with time. In order to avoid this drift, we can set Wk = 0 if lex I < EK I a
so that the state error will converge to the set lex I :::; EK I a. However, the upper
bound EK is unknown. Thus, we set an upper bound VKM on Ilwll (Euclidean
norm of the weight vector), where w( t) = [WI (t), ... ,W K (t)]T, and an upper
bound eo (required accuracy) on the state error ex. Then the modified estima-
tion law is
if lexl ~ eo and Ilwll:::; VKM (2.46)
otherwise
for k = 1, ... , K.
It can be seen from the modified weight adjustment laws above that if
lex I ~ eo ~ EK la, the first derivative of the Lyapunov function with respect to
time t is always negative semidefinite. Although in the case where eo :::; lex I :::;
EK I a, the weights may increase with time because it is possible that V > 0,
it is clear from the estimation law (2.46) that the weights are still limited
by the bound VKM. If lexl > e max (the maximum tolerable accuracy) and
2.5 Sequential Nonlinear Identification 41
Ilwll = VKM, this means that more GRBF units are needed to approximate
the nonlinear function g. Therefore, the overall identification scheme is still
stable in the presence of modelling error. The Lyapunov function V depends
also on the parameter error and the negative semi-definiteness then implies
convergence of the algorithm.
2.5 Sequential Nonlinear Identification
The control of real-time systems with unknown structure and parameter in-
formation can be based on carrying out on-line or sequential identification
using nonparametric techniques such as neural networks. The sequential iden-
tification problem for continuous dynamic systems may be stated as follows:
given the required modelling error, the prior identification model structure
and the on-line or sequential continuous observation, how are these combined
to obtain the model parameter adaptive laws or the required neural network
approximation?
Here, a sequential identification scheme is considered for continuous-time
nonlinear dynamical systems with unknown nonlinearities using growing Gaus-
sian radial basis function networks. The growing GRBF network, which is
actually a type of variable neural network, starts with no hidden units and
grows by allocating units on a regular grid, based on the novelty of obser-
vation. Since the novelty of the observation is tested, it is ideally suited to
on-line identification problems. The parameters of the growing neural network
based identification model are adjusted by adaptation laws developed using
the Lyapunov synthesis approach.
The identification problem for the dynamical system of Equation 2.24 can
be viewed as the estimation of the nonlinear function g(x,u;p) as shown in
Section 2.4. If the modelling error is greater than the required one, according
to approximation theory more basis functions should be added to the network
model to get a better approximation. In this case, denote the prior identi-
fication structure of the function at time t as g(t) (x, u; p) and the structure
immediately after the addition of a basis function as g( t+) (x, u; p). Based on
the structure of the function g(x, u;p) in Equation 2.28, the identification
structure now becomes
(2.4 7)
where WK+1 is the weight of the new (K + l)th Gaussian radial basis function
cP K + 1· The sequential identification scheme using a neural network for the
nonlinear function g(x, u; p) is shown in Figure 2.6.
It is also known that the kth Gaussian radial basis function has a localisa-
tion property that the influence area of this function is governed by the centre
mk and width rk. In other words, once the centre mk and the width rk are
fixed, the influence area of the kth Gaussian radial basis function CPk is limited
in state-space to the neighbourhood of mk.
<fJK+l
u X
Fig. 2.6. Sequential identification scheme using neural networks
Let us first consider how to limit the number of the centres and hence the
size of the network. As shown in Figure 2.4, the observation pairs (x, u) are
in a rectangular set. An hx x hu grid, where hx and hu are odd integers, can
be produced by scaling the x and u axes by 2b x /(h x - 1) and 2b u /(h u - 1),
respectively, as shown in Figure 2.7. If the centres of the basis functions of
the network model are located on some of the crosspoints of the grid it is
clear that those centres will be equally distributed. For any point (x, u) in the
rectangular set, the nearest crosspoint (xm, um) can be calculated by
Xm = round (~) clx (2.48)
um = round (~) cl u (2.49)
where round ( .) is an operator for rounding the number (.) to the nearest integer
and
clx=~ (2.50)
hx -1
cl -~ (2.51)
u - hu - 1
The main influence area D ofthe radial basis function with the centre (xm, um )
is also shown in Figure 2.7.
2.5 Sequential Nonlinear Identification 43
Fig. 2.7. Two-dimensional grid
Now, consider how the width rk of the kth basis function are chosen. The angle
between the two GRBFs 'Pi and 'Pj with the same width ri = rj = ro is given
by (Kadirkamanathan, 1991; Kadirkamanathan and Niranjan, 1993)
(2.52)
To add a new basis function 'Pk that is nearly orthogonal to all existing ba-
sis functions, the angle between the GRBFs should be as large as possible.
This means that the width rk should be reduced. But, the curvature of the
basis function 'Pk will be increased by reducing rk and in turn leads to a less
smooth function. Thus, to make a compromise between the orthogonality and
the smoothness, a good choice for the width rk, which ensures the angles be-
tween GRBF units are approximately equal to some required angle Bmin , is
(Kadirkamanathan, 1991)
(2.53)
where 1
"" = [210g(1/ :OS2 Bmin )] 2"

(2.54)
with Bmin being the required minimum angle between Gaussian radial basis
functions, and
mt = arg. min {llmk - mill}
z=l, ... ,K, mi#-mk
(2.55)
is the nearest (in the Euclidean space) centre to the kth centre. The above
assignments are the same as those for the resource allocating network (RAN)
(Platt, 1991) for which the equations are arrived at from the consideration of
observation novelty heuristics.
The growing network, which is the special one of variable networks without
removing operation, is initialised with no basis function units. As observations
are received the network grows by adding new units. The decision to add a new
unit depends on the observation novelty for which the following two conditions
must be satisfied:
6
(i) min
k=l •... ,K
Ix - mkll > -2x (2.56)
or
.
mm Iu- mk2 I >-
6u (2.57)
k=l, ... ,K 2
(ii) Ix - xl> e max (2.58)
where 6x and 6u represent the scale of resolution in the input-state grid, and
e max is chosen to represent the desired maximum tolerable accuracy of the
state estimation. Criterion (i) says that the current observation must be far
from existing centres. Criterion (ii) means that the state error in the network
must be significant.
When a new unit is added to the network at time t 1 , the parameters asso-
ciated with the GRBF units are adapted as follows:
x(td) 6x ,
[round ( T (2.59)
(2.60)
(2.61 )
if lexl:2: eo and Ilwll ::; JK + 1M

otherwise
(2.62)
for k = 1, ... , K + 1 and WK+l (tt) = O. If no new GRBF unit is added, only
the weights are adapted by the law (2.62), for k = 1, ... , K.
It is known from approximation theory (Powell, 1981) that the approxima-
tion accuracy of a function by a set of basis functions, such as in neural net-
works, is proportional to the parameters 6x and 6u of the grid. In other words,
the smaller the parameters 6x and 6u , the more accurate the neural model. If
the tolerable accuracy of the state error is not reached, i.e., lexl > e max , then
the thresholds 6x and 6u on the criterion (i) should gradually be reduced by
halving their values (i.e., 6x /2 and 6u /2) at each time step until the minimum
2.6 Sequential Identification of Multivariable Systems 45
allowed values are reached. In this way, the state error will be reduced and the
existing centres of the basis functions of the network model are all still on the
crosspoints of the new grid as shown in Figure 2.8.
u u
x x
Fig. 2.8. Modification of the two-dimension grid
With the increase of the number of the GRBFs and the cross-points of the
grid, the approximation of a function by a GRBF network will be increasingly
more accurate, i.e.,
(2.63)
According to the approximation theory there exists a number K* such that
(2.64)
It has also been shown in section 2.4 that the overall identification scheme
is stable and that the model parameters converge to within some bound of
the optimal values. Therefore, the algorithm developed in this section still
guarantees the stability and convergence of the overall identification.
2.6 Sequential Identification of Multivariable Systems
In this section the sequential identification developed for single-input single-

state systems is extended to multivariable systems. Consider the multi-input
multi-state (MIMS) continuous dynamical system described by
x = f(x, u), x(O) = xo (2.65)

where U E n rx1 is the input vector, x E nnxl is the state vector and fe) E
nn x 1 is a nonlinear function vector. Following the same line of analysis as
for the single-input single-state case, the identification model for the system
(2.65) can be expressed by
x = Ax + g(x, u), x(O) = Xo (2.66)
where
g(x, u) = f(x, u) - Ax (2.67)
and A E nnxn is a Hurwitz or stability matrix (i.e., all the eigenvalues are
in the open left-half complex plane). Modelling the nonlinear function vector
g(x,u) E nnxl using GRBF neural networks gives the following identification
model:
i: = Ax + g(x,u;p), x(O) = Xo (2.68)
where x denotes the state vector of the network model and g is the output
vector of the GRBF neural network. Define the following one-to-one mappings
for the inputs and states:
bxiXi
Xi = for i=1,2, ... ,n (2.69)
IXil + axi
buiUi
Ui= for i=1,2, ... ,T (2.70)
IUil + aui
where axi, bxi , aui, bUi are positive constants. These mappings ensure that the
elements of the vectors X and u are all in bounded sets. The estimate of the
function g then is written as
(2.71 )
where
(2.72)
(2.73)
WK = {Wid E nnxK is the weight matrix of the network with K GRBF

units, ipk(X,U) E n 1 is the kth Gaussian radial basis function, and p denotes
the adjustable parameter vector of the network consisting of the weight matrix
WK, the centres mk = [mkl,mk2, ... ,mk(n+r)f and the width Tk for k =
1,2, ... ,K.
Assuming that the basis function vector PK(X,U) is given the real system
can be modelled by the GRBF network model as
x = Ax + Wj'(PK(X, u) + e(t) (2.74)

2.6 Sequential Identification of Multivariable Systems 47
where Wi'( is the optimal weight matrix and e(t) = [el(t),e2(t), ... ,en (t)V is
the modelling error vector which is assumed to be bounded by
CK = . max sup {lei(t)l} (2.75)

t=l, ... ,n tER+
Define the state error vector and weight error matrix as
(2.76)
(2.77)
so that the dynamical expression of the state error is given by
(2.78)
Consider the Lyapunov function
(2.79)
where tr(·) denotes the trace of the matrix (.). The first derivative of the
Lyapunov function V with respect to time t is
- - 1
V(e x, r K) = ex Aex + ex rK<PK(X, u) + -tr(rK r K) + ex e(t)
. T T ·T T
(2.80)
a
Since
tr(e; rK<PK(X, il))

tr(rK<p K (x, il)e~) (2.81)
Equation 2.80 becomes

1
V(e x, r K ) = ex Aex + tr(rK<pK(X, u)e x ) + -tr(rK r K) + exT e(t)
. T - - T ·T
(2.82)
a
It is clear that TVK = - tK because the optimal weight matrix Wi'( is a constant
matrix. So, if the estimation law for the weight matrix is given by
(2.83)
then Equation 2.82 becomes
V(ex,rK) e;Aex+e;e(t)
< -IAmax(A)le; ex + e; e(t)
< -IAmax(A)1 t lexil {le il - IAm::(A) I }
X
(2.84)
where Amax(A) is the maximum eigenvalue of the matrix A which could be

negative because A is a Hurwitz or stability matrix and exi is the ith element
of the state error vector ex. In the presence of a modelling error, if
CK
. mm {Iexil} < .,--------,------,- (2.85)
~=l •... ,n IAmax(A)1
then it is possible that 11 > 0, which implies that the weights Wik may drift to
infinity over time. Following the analysis for the single-input single-state case,
this drift can be avoided by modifying the adaptation law as
lexil :2: eo and IIWKI::; VKM (2.86)

otherwise
for i = 1, ... , nand k = 1, ... , K, where eo is the required accuracy of the
state error ex and VKM are the upper bound on the Euclidean norm of the
weight matrix WK. It is easy to show that the overall identification scheme
with the modified identification laws is stable.
As observations are received the network grows by adding new units. The
decision to add a new unit depends on the observation novelty for which the
following must be satisfied:
Oxi
(i) min IXi - mkil > -2 (2.87)
k=l, ... ,K,i=l, .. ,n
or
OUj
mm
k=l, ... ,K,j=l, .. ,r
IUj - mk(j+n) I > -2 (2.88)
(ii) max (2.89)

i=l, ... ,n
where OXi and Ouj represent the scale of resolution in the input-state grid and
e max is chosen to represent the desired maximum tolerable accuracy of the
state estimation. When a new unit is added to the network at time tl the
parameters associated with the GRBF units are adapted as follows:
for i = 1, ... , n (2.90)
for j = 1, ... , r (2.91 )
mt = arg. min
t=l, ... ,K+l,mi#mk
{llmk - mill}, for k = 1, ... , K + 1 (2.92)
(2.93)
. { aexi'Pdx, u), if lexil:2: eo and IIWK+11 ::; VK + 1M

Wik = 0 otherwise
for i = 1, ... , n and k = 1, ... , K + 1
(2.94)
where Wi(K+1) (tn = 0, for i = 1,2, ... , n. If no new GRBF unit is added, only
the weight matrix WK is adapted using the law above in (2.94).
2.7 An Example 49
2.7 An Example
In this section a SISS dynamical system is used to demonstrate the operation

of the sequential identification algorithm for a continuous nonlinear system.
The following SISS continuous-time dynamical system was considered:
(2.95)
where the input u is assumed to be cos(t) and the initial state x(O) = O.
The parameter values used in this example are as follows: eo = 0.001, em ax =
0.005,6 x = 6u = 0.05, a = 0.5, M = 1.5,0: = 1, ~ = 3.0, Xo = O.
The simulation was begun with no GRBF units in the network model and
the number of units increased with time, according to the growth criteria.
The final results after an operation over a period of 10 seconds gave a GRBF
network with 16 hidden units, for approximating the dynamical system. The
performance of the sequential identification scheme using the GRBF network
are shown in Figures 2.9-2.12, for a typical run ofthe algorithm observing that
similar plots obtained under different operational conditions.
1 .2 ,---,---,-----,----,-----,----,----,--,-----.,--------,
0.8
0.6
04
Real state
--- Estimated state
2 3 4 5 6 7 8 9 10
time t (sec)
Fig. 2.9. Real state x and estimated state :i; over time
The actual and estimated states and the state error of the dynamical system
against time t are shown in Figures 2.9 and 2.10, respectively. It can be seen
from Figure 2.9 that for much of the operation, the state error is constrained
within the maximum tolerable bound e max = 0.005. The network parameters
also converged to a set of values although they were oscillating around these
values. A plot of the actual state x and the estimated state x against the input
u is shown in Figure 2.11 which indicates the presence of the strong nonlinear-
ity in the dynamical system. Figure 2.12 shows the relationship between the
estimated state and its first derivative as they gradually approach the true set
of values.
0.04,----~-~-~-~-~-~-~-~-~_____,
0.03
0.02
0.01
-0.01
-0.02
-0.030~-~-~-~-~-~5-~----=--~-~-~10
time t (sec)
Fig. 2.10. State error ex with time
1.2,----~-~-~-~-~-~-~-~-~-~
0.8
0.6
,,
0.4
\
0.2
O~-~-~-~-~-~-~--L-~-~-~
-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8
the input u
Fig. 2.11. Actual state x (-) and estimated statei: (- -) against the input u
2.8 Summary 51
\ / ~ \
\
0.8 I
I
\
I \
\
I \
0.6 /
/
/ \
I
\
\/
'/ - "
0.4 I \ /
1/ ~ ~
II /
/
0.2 II
1/
-0.2
I
-0.4
\ "-/ \
/
/
/
-0.6 " "/
-0.8
0 0.2 0.4 0.6 0.8 1.2
Fig. 2.12. Actual derivative of state x against state x (-) and estimated derivative
of state i against statei: (- -)
2.8 Summary
A variable neural network structure has been proposed, where the number of
basis functions in the network can be either increased or decreased over time
according to some design strategy to avoid either overfitting or underfitting.
In order to model unknown nonlinearities of nonlinear systems, the variable
neural network starts with a small number of initial hidden units, then adds
or removes units on a variable grid consisting of a variable number of subgrids
with different sized hypercubes, based on the novelty of observation.
A sequential identification scheme for continuous nonlinear dynamical sys-
tems with unknown nonlinearities using neural networks has been developed.
The main feature of this scheme is the combination of the growing Gaussian
radial basis function network with that of Lyapunov synthesis techniques in
developing the adaptive or estimation laws that guarantee the stability of the
system. The idea of growing the network, similar to the resource allocating net-
work (RAN), overcomes the problem of having to choose the neural network
structure a priori, a difficult task which often results in an overdetermined
network. The network begins with no radial basis function units and with
increasing time, the model grows gradually to approach the appropriate com-
plexity of the network that is sufficient to provide the required approximation
accuracy. The stability of the overall identification scheme and convergence of
the model parameters are guaranteed by parameter adjustment laws developed
using the Lyapunov synthesis approach. To ensure that the modelling error is
reduced arbitrarily, a transformation is proposed so that the states and inputs

of the system are mapped into bounded sets. The operation of the sequential
identification algorithm was demonstrated on a simulated experiment and the
results conformed to theoretical expectations.
CHAPTER 3
RECURSIVE NONLINEAR IDENTIFICATION
3.1 Introduction
The system identification procedure mainly consists of model structure selec-
tion and parameter estimation. The former is concerned with selecting which
class of mathematical operator is to be used as a model. The latter is con-
cerned with an estimation algorithm and usually requires input output data
from the process, a class of models to be identified and a suitable identifica-
tion criterion. A number of techniques have been developed in recent years
for model selection and parameter estimation of nonlinear systems. Forward
and backward regression algorithms were analysed in Leontaritis and Billings
(1987). Stepwise regression was used in Billings and Voon (1986) and a class of
orthogonal estimators were discussed in Korenberg et al. (1988). Algorithms
with the objective of saving memory and allowing fast computation have been
proposed in Chen and Wigger (1995). Methods to determine the a priori struc-
ttlral identifiability of a model have also been studied (Ljung and Glad, 1994).
A survey of existing techniques of nonlinear system identification prior to the
1980s is given in Billings (1980), a survey of the structure detection of input
output nonlinear systems is given in Haber and Unbehauen (1990) and a sur-
vey of nonlinear black-box modelling in system identification can be found in
Sjoberg et al. (1995).
An area of rapid growth in recent years has been neural networks. This
approach makes few restrictions on the type of input output mapping that can
be learnt. The majority of nonlinear identification techniques using neural net-
work are off-line which means the structure and parameters of the model are
fixed after off-line identification based on a set of input output data. However,
if there is a change in the system operation or the real system input space is
different from the one which was used for off-line identification, this will lead
to changes in the parameters of the neural network based model, causing a de-
terioration in the performance of the identification. Therefore, in order to have
good identification performance, both the structure and the parameters of the
model need to be modified in response to variations of the plant characteristics
and operating point. Recently, new algorithms have been developed which op-
erates on a window of data and which can be used on-line to adaptively track
the variations of both model structure (Fung et at., 1996; Luo and Billings,
1995) or topology (Luo et al., 1996; Luo and Billings, 1998) and update the
estimated parameters or weights on-line.
54 3. Recursive Nonlinear Identification
This chapter is mainly concerned with structure selection of nonlinear poly-

nomials in the VPBF network and recursive parameter estimation of the se-
lected model. In order to obtain a proper sized network the orthogonal least
squares algorithm is introduced for off-line structure selection and this is then
augmented by the growing network technique which is used for on-line struc-
ture selection. In the off-line selection stage, the orthogonal least squares tech-
nique is used to select a set of Volterra polynomial basis functions and to
arrange the order according to their ability to reduce the approximation error.
In the on-line selection, the growing network technique is used to approach
gradually the appropriate complexity of the network that is sufficient to pro-
vide an approximation to the system to be identified that is consistent with the
observations being received. For parameter estimation, a new on-line recursive
weight learning algorithm is developed using a Lyapunov synthesis approach.
It is not necessary to assume that the approximation error is white noise or
that its upper bound is known. The learning algorithm ensures that the weights
and approximation accuracy converge to their required regions.
3.2 Nonlinear Modelling by VPBF Networks
Consider the nonlinear discrete system described by
X t +! = G(Xt , Ut) (3.1)

Yt = h(Xt,Ut) (3.2)
where G(.) is a nonlinear function vector, h(.) a nonlinear function, X t the

state vector, Yt the output and Ut the input.
Based on the input and output relation of a system, the above nonlinear
discrete system can also be expressed by a nonlinear auto-regressive moving
average (NARMA) model (Leontaritis and Billings, 1985), that is,
Yt = f(Yt-l, Yt-2, ... , Yt-n, Ut-l, Ut-2, ... , Ut-m) (3.3)
where f (.) is some nonlinear function, nand m are the corresponding maxi-
mum delays.
It is well known that neural networks provide a good nonlinear function
approximation techniques. A nonlinear identification structure by neural net-
works is shown in Figure 3.1. Here it assumes that the nonlinear function f(.)
in the NARMA model is approximated by a single-layer neural network, which
consists of a linear combination of basis functions.
N
J(Xt) = L WkIPk(Xt) (3.4)

k=l
where Xt = [Yt-l,Yt-2, ... ,Yt-n,Ut-l,Ut-2, ... ,Ut-m], IPk(Xt) is the basis func-
tion and Wk the weight.
3.2 Nonlinear Modelling by VPBF Networks 55
System
u
Fig. 3.1. Neural network based identification
According to the universal approximation theorem (Haykin, 1994), there exist

a finite number of basis functions so that the neural network can approxi-
mate the nonlinear function precisely. But, in practice, the problem is how to
find these basis functions. Fortunately, it has been shown that the required
approximation accuracy can be reached using an adequate number of indepen-
dent nonlinear basis functions, for example, Volterra polynomial functions, ra-
dial functions, B-spline functions or wavelets. A neural network which uses the
Volterra polynomials as the basis functions will be studied. The representation
of the nonlinear function f(Xt) is then given by
WI + W2Yt-I + ... + Wn+IYt-n + W n +2 Ut-I + ... + Wn+m+I Ut-m

+Wn + m +2Yt-I + W n + m +3Yt-IYt-2 + ... + WNU t _ m
2 I
(3.5)
where the set of the Volterra polynomial basis functions is
['PI, 'P2, 'P3, ... , 'Pn+I, 'Pn+2, ... , 'Pn+m+I, 'Pn+m+2, 'Pn+m+3, ... , 'P N ](Xt)
= [1, Yt-I, Yt-2, ... , Yt-n, Ut-I, ... , Ut-m, yLll Yt-IYt-2, ... , u~-ml (3.6)
and the number of polynomial basis function is given by
N= (n+m+l)!
(3.7)
l!(n + m)!
Using the VPBF network, the nonlinear function f(.) can be obtained by
(3.8)
where o(xD is the approximation error.

Increasing the order I, the number N of basis functions becomes larger and
larger. Thus, the problem is how to estimate the function j(Xt) using a proper
sized neural network so that the approximation accuracy is within the required
bound. The structure selection and the weight learning of the neural network
are discussed in the following sections.
3.3 Structure Selection of Neural Networks
There are many ways to select the basis functions. Here, off-line structure
selection using the orthogonal least squares algorithm (Billings et al., 1988) and
on-line structure selection using growing network techniques are introduced for
the basis function selection of Volterra polynomial networks.
3.3.1 Off-line Structure Selection
It is assumed that a set of input output data (Yt, Ut, t = 1,2, ... , M) of the
system is given. Based on (3.5), the input output relation may compactly be
written in the following vector form:
(3.9)
where the input vector Y E n MXI , the weight vector W E n NXI , the
approximation error vector O(xl) E n MXI and the basis function matrix
n
p(x) E MxN are, respectively,
Y = [YI Y2 YMf (3.10)
W= [WI W2 ... WN]T (3.11)
O(xl) = [o(xi} o(x~) o(x~) f (3.12)
r",'PI (x,)
(X2)
'P2(xd
'P2 (X2) "N(X,)
'PNi X2 )
1
p(x) = . (3.13)
'PI (XM) 'P2(XM) 'PN(XM)

The weight vector W is usually found by minimising the Euclidean norm, i.e.,
W = argmin
W
II Y - p(x)W 112 (3.14)
which is a least squares solution.

The vectors Pi = ['Pi(xd, 'Pi(X2), ... , 'Pi(XM )f, for i = 1,2, ... , N, form a
set of basis vectors, and the orthogonal least squares solution W satisfies the
condition that p(x)W will be the projection of Y onto the space spanned by
the basis function vectors {pd. The orthogonal least squares method involves
3.3 Structure Selection of Neural Networks 57
transformation of the set of basis vectors {pd into a set of orthogonal basis
vectors, and thus makes it possible to calculate the individual contribution to
the desired output from each basis vector. An orthogonal decomposition of the
matrix p(x) gives
p(x) = PQ (3.15)
where P = [PI, P2 , ... , P N 1 is an M x N matrix with orthogonal columns and

Q is an N x N unit upper triangular matrix with 1 on the diagonal and 0
below the diagonal, that is,
1 q12 q13 qlN
0 1 q23 q2N
Q= 0 0 1 q3N (3.16)
0 0 0 1
Clearly, the matrix P satisfies

pTp=D (3.17)
where D is a diagonal with elements d i
di = pl Pi, 1<i <N (3.18)
Using the above, Equation 3.9 can be written as
Y= PV + O(xl) (3.19)
W = Q- 1 V (3.20)
where V = [VI, V2, ... , V N jT ERN XI. It can be seen that the optimal estimate
V= [Vl,V2+, ... ,VNjT of the vector V is
yT~ £ N
= plpi = 1,2, ... , (3.21)
A
Vi ' or i
so that IIY - PV 112 is minimal. The corresponding optimal weight vector is
(3.22)
The classical Gram Schmidt and modified Gram Schmidt methods can be used
to derive the above and thus to solve the least squares estimate of W.
The output variance can be expressed as
(3.23)
V;
Note that 2:;:1 pl Pi / M is the part of the desired output variance which can
be explained by the basis functions and OT 0 / M is the unexplained variance
v;
of y(t). So, pl Pi is the increment to the explained desired output variance
introduced by Pi and the error reduction ratio due to Pi may be defined by
0;ply
Ti = yTy (3.24)
This ratio offers a simple and effective means of seeking a subset of significant
basis functions. This implementation based on the classical Gram Schmidt is
given as follows (Billings et at., 1988, 1989):
(a) At the first step, for i = 1,2, ... , N, calculate
(3.25)
(3.26)
(Pii))T p;i)
(V/i))2(pi i ))Ty
yTy (3.27)
Find
Sl -- arg max {(i)

T1 , ~
. -- 1 , 2 , ... , N} (3.28)
and select
(3.29)
(b) At the k-th step, where k > 2 for i = 1,2, ... ,N, i -::f. Sl,···,~ -::f. Sk-1,
compute
T Pj
j = 1,2, ... ,k (3.30)
(Pj)TP/
(3.31 )
v:( i) (3.32)
k
(p~i))T p~i)
(Vk(i))2 (p~i)) Ty
yTy (3.33)
Find
Sk = arg maX{Tk( i) , i = 1,2, ... ,N, i -::f. Sl, ... ,i -::f. sk-d (3.34)
and select
(3.35)
3.3 Structure Selection of Neural Networks 59
(c) The procedure is terminated at the L- th step when

L
1- 2::rj < eo (3.36)

j=l
where 0 < eo < 1 is a chosen tolerance. This gives a subset model contain-
ing L significant terms.
It is clear from (3.21) and (3.24) that ri :::: O. Changing the order of the
VPBFs will lead to a change in the error reduction ratio rio For N VPBFs,
there are N! sorting possibilities. Let the r}j) denote the error reduction ratio ri
corresponding to the j-th sorting of the VPBFs. The classical Gram Schmidt
method introduced above can be used to find the a-th sorting of the basis
functions ipdXt), ip2(Xt), ... , ipN(Xt), which is the best sorting, such that
k k
' " ' r(o)
~ 1,
>
-~
'"' r(j)
1,
for j i- a,j = 1,2, ... , N!, k = 1,2, ... , N (3.37)
i=l i=l
In this way, the priority of all candidates is determined. Thus, the best sort-
ing of VPBFs is denoted by ip~(Xt), ip~(Xt), ... , ip'N(Xt) and the corresponding
weight vector is WO.
3.3.2 On-line Structure Selection
For nonlinear systems, the system operation can change with time or the real
system input space is different from the one which was used for off-line identifi-
cation. In order to produce good identification performance, both the structure
and the weights of the neural network model may need to be modified in re-
sponse to variations in the plant characteristics. Here, the modification of the
neural network structure will be taken into account. The adaptation of the
weights will be discussed in the next section.
According to approximation theory, adding more independent basis func-
tions to the network will improve approximation. In off-line structure selection,
the VPBFs are reordered in terms of their priority. Here it is assumed that at
time t - 1 the basis functions of the VPBF network consist of the first L best
candidates ip~ (Xt), ip~ (Xt), ... , ipL (Xt). To improve the approximation accuracy,
the growing network technique (Liu et al., 1995, 1996c) is applied. This means
that one more VPBF, which is chosen from and is of the highest priority in
the remaining basis function candidates ipL+l (Xt), ipL+2(Xt), ... , ip'N(Xt) , needs
to be added to the network. In this case, denote the structure of the VPBF
network at time t - 1 as Pt-1) (Xt) and the structure immediately after the
addition of a basis function at time t as pt) (Xt). Based on the growing network
technique and the structure of the function J(Xt) in (3.4), the structure of the
VPBF network now becomes
j'(t) (Xt ) = j'(t-1) (Xt ) + wL+1

a a
ipL+1 ( )
Xt (3.38)
where w L+1 is the weight corresponding to the new (L + l)th Volterra poly-
nomial basis function 'PL+1 (Xt).
The growing VPBF network is initialised with a small set of Volterra poly-
nomial basis function units. As observations are received, the network grows
by adding new units. This is called the addition operation. The decision to
add a new unit depends on two conditions. The first is that the following must
be satisfied:
(3.39)
where omax is chosen to represent the desired maximum tolerable accuracy.

The above condition implies that the approximation error between the real
output Yt and the output J(Xt) of the VPBF network must be significant. The
second is that the time period between two addition operations must not be
less than the response time of the network to the addition operation. This
is to limit the growing speed of the number of VPBFs in the network and
to avoid some unnecessary addition operations. In this way, an appropriately
sized network will be obtained and the off-line approximation error will be
within the required accuracy.
3.4 Recursive Learning of Neural Networks
In the previous section, the structure selection of the VPBF network model
was considered to reach a good approximation accuracy. This section takes
into account the parameter adaptation laws which ensure that the estima-
tion error converges to the desired range when the plant characteristics and
the system operating point change. Here, it is assumed that the basis func-
tions 'PI (Xt), 'P2 (Xt), ... , 'PL (Xt) are given. The estimated function Jt (Xt) in the
N ARMA model can also be expressed by
(3.40)
where the weight vector Wt - I and the basis function vector Pt-I are
Wt - I = [WI(t -1) W2(t -1) wL(t -1) f (3.41 )

Pt-I = ['P'{(Xt) 'P2(Xt) 'PL(Xt) f (3.42)
and the initial weight vector is Wo = [WI w 2 w LjT. The output Yt of

the system modelled using the VPBF network can be written in the form
Yt = Pi-I W* + Ct (3.43)
where W* is the optimal estimate of the weight vector W t in the network

with L independent VPBF units and ct is the modelling error. In terms of the
approximation ability of neural networks, the modelling error can be reduced
arbitrarily by increasing the number L. Thus, it is reasonable to assume that
3.4 Recursive Learning of Neural Networks 61
the minimal upper bound of the modelling error c(t) is given by a constant
h, which represents the accuracy of the model and this is defined as
(3.44)
The estimation problem is then to find a vector W belonging to the set defined
by
(3.45)
Although many algorithms have been proposed as a solution to the above

estimation problem (see, for example, Fogel and Huang, 1982; Ortega and
Lozano-Leal, 1987; Canudas and Carrillo, 1990; Whidborne and Liu, 1993;
Wang et at., 1995), these are based on the assumption that the minimal upper
bound tiL of the estimation error is known. However, this assumption is not
realistic in most practical applications. Following the recursive least squares
algorithm, a recursive weight learning algorithm for the VPBF network is
developed to remove this assumption. This algorithm is as follows (Liu et at.,
1997a, 1998a):
Wt = W; - at f3tTJt Pt <Pt-I et (3.46)
W; = W t- I + atf3tPt<pt-Iet (3.47)
Pt = Pt- I - f3tit Pt-1 <Pt-I <Pi-I Pt- I (3.48)
et = Yt - wEI <Pt-I (3.49)
(3.50)
f3 -
t -
{I, 0,
letl > 0
let I :S 0
(3.51)
It = (Ietl- 0) (Ietl + (2Ietl- 0) <Pi-IPt-l<Pt-I)-1 (3.52)

0, IIW:112 :S M
{ (3.53)
TJt E [s-, s+], IIW: 112 > M
where the lower and upper bounds of TJt are given by
(3.54)
+ atet<pLI Pt W t- I + Ct
s = 1+ 2 (3.55)
II atetPt<Pt-1 112
with Ct = [(atet<pLIPtWt-d2 + II atetPt<Pt-1 11;(M 2 - II W t- I 11;)P/2, M is

the upper bound of the 2-norm of the weight vector Wt, and 0 is the desired
approximation error. The optimal design of TJt will be given later.
Next, the properties of the above learning algorithm are analysed using
Lyapunov techniques. To ensure the convergence of the algorithm, consider
the Lyapunov function:
(3.56)
where Wt = W* - W t . Let
at = (J;tf3t Pt P t-1 et (3.57)

bt = -(J;tf3tTJt Pt P t-1 et (3.58)
Then, the weight vector Wt in (3.46) can be written as
W t- I + (J;tf3t Pt p t-I e t - (J;tf3tTJt Pt p t-I e t

W t - I + at + bt (3.59)
The above implies that bt is used to reduce the effect of at in the weight vector
w;
II 112 > M. The Lyapunov function Vi in (3.56) is now extended to
W t if
(3.60)
where
(3.61 )
To make the algorithm converge fast, dt should be as negative as possible. Two

cases will be considered below, 6 :2: 6£ and 6 < 6£.
Case 1: 0 2: o£ and 1Jt =0

In this case, it is clear that W t = W; since TJt = O. This also means IIW: 112 :S M
for all time. It will be shown later that this is true if M satisfies a certain
condition. Since TJt = 0 which results in d t = 0, equation (3.60) becomes
which uses f3t = 13;' Following the matrix inverse theorem (Goodwin and
Mayne, 1987), the inverse of the matrix Pt is obtained by
(3.63)
Since Yt = Pi-I W* + ct and et = Yt - WEI Pt-I,
(3.64)
From (3.63) and (3.64), the first term on the right side of (3.62) is expressed
as
~T I~ (~T)2
Wt - I Pt-=-I Wt- I + atf3t Wt - I Pt-I
Vi-I + atf3t (e; + e; - 2etet) (3.65)
By (3.64), the second term on the right side of (3.62) is given by
(3.66)
Substituting (3.65) and (3.66) into (3.62) yields
(3.67)
which uses the result
(3.68)
Since it is assumed that the approximation error et satisfies let I ::; h ::; 6,
then from the above
Vi < Vi-I + atf3t (61,- atIr)'te;)

< Vi-I + atf3t ( 62 - at I rtet2)
< Vi-I - f3tit (letl3 - 21etl62 + ( 3) letl- I (3.69)
For let I :2: 6, it is not difficult to show that let 13 - 2621et I+ 63 :2: let 12 (let I - 6).
Hence
Vi < Vi-I - f3titletl(letl - 6)

< V; f3t (Ietl - 6)2
(3.70)
t-I - 2 (1 + pLIPt-IPt-l)
which leads to
(3.71)
It is known from (3.48) that Amax(Pt ) ::; Amax(Pt-d ::; ... ::; Amax(PO). As a
result,
II Wt - Wt-I 11 22 = 2f3 2A;T p2A;

at tet "'t-I t "'t-I
(3.72)
It is clear from (3.70) that Vi ::; Va. Thus,
(3.73)
because Amin (Pt- I ) :2: Am in (pt-=-II) :2: ... :2: Am in (Pa- I ).

Clearly, Equation 3.71 shows that if 1 + <Pi-I Pt - I <Pt-I is finite for all time,
which is true if the closed-loop system is stable, the estimation error et con-
verges to 6, that is,
lim et
t-+oo
=6 (3.74)
Also, it can be seen from (3.72) that the weights converge as time t approaches
infinity. In addition, Equation 3.73 implies that the weights will never drift to
infinity over time. Thus, if M is chosen to satisfy
M> Am ax (Pa- I ) II W* - TV; 112 + II W* II~ (3.75)

- Am in (Pa- I ) a 2 2
then II W t II; is not greater than M for all time.

In practice, it is very difficult to determine the upper bound h of the mod-
elling error ct and the optimal weight vector W*. Equation 3.70 shows that if
6 < 6L, Wt will drift away because Ll Vi = Vi - Vi-I may be positive. Moreover,
if (3.75) is not satisfied, IIW; 112 ::; M, vtmay not hold. These problems will be
considered next.
Case 2: 8 < 8L or 1Jt # 0
The analysis of the algorithm for Case 1 shows that if 6 < 6L, W; may be
greater than the bound M. In addition, in the case where (3.75) is not satisfied,
it cannot be simply assumed that T)t = 0 since W; may also be greater than the
bound M. So, W t = W; - T)tf3tOOtPt<Pt-Iet will be used for weight adjustment.
This leads to
II W t- I + at + bt II;
II W t- I II; + (at + bt f(2Wt- I + at + bt )
II W t- I II; + (1- T)t)af(2Wt- I + (1 - T)t)at) (3.76)
To ensure that the weight vector Wt does not drift, we require
(3.77)
where the solution to the inequality is given by
(3.78)
and s- and s+ are given by (3.54) and (3.55). There are still an infinite number
of possibilities for T)t. Hence, the question of what is the optimal solution of T)t
arises. To answer this question, let us consider (3.61). The first term on the
right side of (3.61) can be calculated as
The second term on the right side of (3.61) can be computed by
b;Pt-lbt = TI;a;(3t e;pL 1Pt Pt-l (3.80)
Substituting (3.79) and (3.80) into (3.61) yields
(3.81)
where
(3.82)
Now, dt consists of two parts. The first is 2T1tat(3tetet, which is the uncertain
part because the modelling error is unknown. The second is g(Tlt) which is
computable. It is also known from the Lyapunov technique that the more
negative dt is, the faster the reduction of the function Vi is. Thus, the function
g(Tlt) is used as the performance index for choosing the optimal solution of Tit.
The function gt(Tlt) is a concave parabola and has only one minimum. The
optimal TI; which minimises gt and the minimum gt(TI;) are given by
(3.83)
(3.84)
Since letl :S bL , it is clear from (3.81) that
dt :S gt (TI;) + 21 TI; Iat (3t Iet IbL (3.85)
If TI; E [s-, s+], then from (3.60), (3.67) and (3.85)
Vi < Vi-l + atf3t (bE - ail/ten + gt(TI;) + 2at(3tITl;etlbL

Vi-l + atf3t (bE - ail/ten + at(3tITl;etl(2bL
-11- atpLlptpt-llletl) (3.86)
It is clear from (3.86) that if
letl > max{ Vatit-lbL, 211- atpL1PtPt-ll-lbd (3.87)

the second and third terms on the right-hand side of (3.86) will be negative.
Using (3.50) and (3.52) gives
atlt- I = (1 + (2 - bletl- l )pL I P t - I Pt-d(l + pLIPt-IPt-d- 1 :::; 2 (3.88)
As a result, if the following condition
(3.89)
is satisfied, then the weights converge to their optimal values since L1 Vi :::; O.
On the other hand, if the above condition is not satisfied, it is possible that
L1 Vi > O. This implies that the weight vector Wt may drift away over time. In
this case, the weight learning algorithm given by (3.46) avoids divergence of the
weight vector because IIWt l1 2 will not be greater than M for T)t E [8-, 8+].
Thus the error let I always converges.
If T); tf. [8-, 8+], let
g(8+) :::; g(8-)
(3.90)
g(8+) > g(8-)
Then
Vi < Vi-I + ati3t (bE - atIr)'tei) + gt(T)i) + 2atftITJtetlbL
Vi-I + atft ( bL2 - at I rtet2)
+2ftatlT)tetl(bL - (1 - (1 - 0.5T)i)atpLI ptpt-d sgn (T)i)letl)
(3.91 )
Similarly, if the following condition is satisfied
letl > max{ V2bL , (1- (1 - 0.5T)i)atpLI ptpt_d- I sgn (T)i)bL} (3.92)
then the weights converge to their optimal values since L1 Vi :::; O. However, it
is possible that L1 Vi > 0 if the above condition is not satisfied. This indicates
that the weight vector W t may drift away over time. But, the weight learning
algorithm given by (3.46) constrains IIWt l1 2 to be not greater than M. So, the
error let I always converges.
In the light of the above analysis, the design of T)t may be given by
(3.93)
The analysis of the algorithm for the weight adaptation laws clearly shows
that if the minimal upper bound h of the approximation error is not known
both the weights and the estimation error are still bounded.
3.5 Examples
Three examples will be used to illustrated recursive nonlinear identification.
The first is a system described by an input output model. The second is a
system described by a state-space model. The third is the data set of the
Santa Fe times series prediction and analysis competition.
3.5 Examples 67
Example 3.1
Consider the nonlinear dynamical system described by the input-output model

(Narendra and Parthasarathy, 1990)
Yt-lYt-2Yt-3Ut-2(Yt-3 - 1)
Yt = 1 (3.94)
+ Yt-2 + Yt-3
2 2
The input Ut was set to be a random sequence between -0.5 and 0.5. Based
on the input output data, the orthogonal least algorithm was used in off-line
structure selection of the VPBF network. Their order of selection and the
corresponding weights are given in Table 3.1.
Table 3.1. Selection order of VPBFs (Example 3.1)
Priority Order i VPBF 'Pi Weight wi

1 Ut-l 0.9907
2
2 Yt-2 U t-l -0.6967
3 yL3 U t-l -0.7903
4 Yt-2Yt-3 -0.1437
5 Yt-2 U L2 0.4680
6 yLl Ut-3 -0.4705
7 Yt-2 U t-2 U t-3 0.3687
8 Yt-3 U t-3 0.0819
9 Yt-lYt-2 U t-3 -0.3732
10 Yt-2Yt-3 U t-l 0.0244
11 yL2 U t-3 -0.0304
12 Yt-lYt-2 0.0052
13 Ut-lUt-2Ut-3 0.2794
14 Yt-3 U L2 -2.9104
15 Yt-3 U i-l -0.0191
16 Yt-2 0.0052
17 Yt-3 U L3 0.0177
18 Yt-3 U t-l -0.0046
19 yr-lYt-3 -4.2739
20 Yt-lYt-3 U t-2 7.0854
On-line structure selection was then applied and the recursive weight learning
algorithm was used. The input was defined as
sin(27ft/250) t ::; 500

{ (3.95)
Ut = 0.8 sin(27ft/250) + 0.2 sin(27ft/25) t > 500
The simulation parameters were M = 1.4, 6 = 0.02. The growing VPBF

network began with the first five best VPBFs given in Table 3.1. The network
grew until the number of VPBFs was 20. The simulation results are shown in
Figures 3.2-3.5.
__ Ire system OLIpLt --- Ire estirrated OLIrxt
-0.5
_1L-__ ~ ____- L____ __ ~__ ____- L____

~L- ~ ~ __ ~ ____- L__ ~
o 100 ::ro :?ill 400 500 00) 700 00) 9Xl 1000
Fig. 3.2. Output Yt and estimated output Yt by on-line identification (Example 3.1)
0.08,----,-----.--~-,----_,----_r~--_,----,_----,_----,_--_,
0.06
0.04
0.02
o
·0.02
·0.04
·0.06
·0.08
·0.1
·0.1 2 L -_ _---'_ _ _ _---'_ _ _ _---'._ _ _ _--'-_ _ _ _--'-_ _ _ _--'--_ _ _ _-'--_ _ _ _"""'---_ _ _ _-'---_ _~
o 100 200 300 400 500 600 700 800 900 1000
Fig. 3.3. On-line estimation error et (Example 3.1)

3.5 Examples 69
0.9
0.8
o
hl 0.7
>
.c
~06
;,:
Q)
£ 0.5
'0
E
004
c
I
N
1? 0.3
f-
0.2
0.1
O~ __- L____L -__- L____L -__- L____L -__- L____L -__- L__ ~
o 100 200 300 400 500 600 700 800 900 1000
timet
Fig. 3.4. 2-norm of the weight vector W t using on-line identification (Example 3.1)
0.5
c
o
~-15
.~
Q)
~ -2
f-
-2.5
-3
-3.5
_4L----L----L----L--~--~~--~----~--~----~--~
o 100 200 300 400 500 600 700 800 900 1000
timet
Fig. 3.5. Estimation error et using off-line identification with 20 VPBFs (Example
3.1)
Example 3.2
Consider the nonlinear system described by the state-space model
(3.96)
1 + xi(t) + x§(t)
1. 8X I(t)X2(t)
,(
X2 t + 1) = 14
. u 3(t ) _ 2( )
1 + Xl t
(3.97)
y(t) = 5XI(t)U(t) - X2(t) sin(xdt)) (3.98)
The input u was set to be a random sequence between -0.5 and 0.5, as in
Example 3.1. Using the input output data, the priority of the VPBFs was
obtained using the orthogonal least squares algorithm. The order of the VPBFs
and the corresponding weights were given in Table 3.2.
The on-line structure selection technique and the recursive weight learning
algorithm were applied with the input given by
u(t) = 0.19 sin (27ft/50) + 0.095 sin(27ft/20) (3.99)
The parameters in the simulation were M = 7, 0 = 0.01. The growing VPBF

network started with the first 15 best VPBFs, and the network stopped grow-
ing when the number of the VPBFs reached 30. The simulation results are
depicted in Figures 3.6-3.9.
0.6,-----~-----.------,-----_,------,_----_,----_,,_--__.
___ the system output ._- the estimated output

0.5
0.4
0.3
0.2
.0.1 '---____--'--____---'______...L..-_ _ _ _----'-_ _ _ _ _ _-'----_ _ _ _- - ' -_ _ _ _- - - ' ' - - -_ _- - - - - '

o 50 100 150 200 250 300 350 400
Fig. 3.6. System output Yt and estimated output fit using on-line identification
(Example 3.2)
3.5 Examples 71
Priority Order i VPBF cpi Weight wi

2
1 ut- 2 -4.4882
2 yr-l -0.0710
3 Yt-lYt-2 U t-2 0.2592
4 Ut-2 U t-3 -2.8018
5 Yt-2 U t - l 0.5594
6 Yt-2 U t - l U t-2 -0.5614
7 Ut-l U t-2 0.5816
2
8 Yt-lYt-2 0.2685
2
9 Yt-2 U t - l 1.1335
10 YF-3 U t-2 -1.1162
11 Yt-l U t - l U t-2 0.7620
12 Yt-lYt-2 1.6789
13 Yt-lyt-3 -0.3676
14 ut-2 -0.8687
15 Ut-2 1.3205
16 Yt-l -0.2783
2
17 Ut- 2Ut_3 0.1529
2
18 Yt- 2U t-2 -1.0620
19 Yt-3 U t-2 U t-3 -0.6005
2
20 Yt-l U t-2 -1.0152
21 Yt-lYF-2 0.4841
22 YF-l Ut-3 -0.7660
23 Ut-2Ut-3 0.8791
24 Ut-3 0.8206
25 Ut-l U t-3 -2.1835
2
26 Yt-lUt-l 0.2744
27 UL3 0.5547
2
28 ut- 3 0.5798
29 1 0.0642
2
30 Yt- 2U t-3 -0.9806
0.08nr------,-------,-------,-------,-------,-------,-------,-------,
0.06
0.04
0.02
o
-0.02
-004
-0.06
-0.08
-0.1 L -______L -______L -______L -______L -______L -______L -______L -____ ~
o 50 100 150 200 250 300 350 400
8,-------,--------,-------,--------,-------,--------,-------,-------,
O~ ____ ~______ ~_______ L_ _ _ _ _ _ ~ ______ ~_ _ _ _ _ _L __ _ _ _

~ ______~
o 50 100 150 200 250 300 350 400
3.5 Examples 73
1.4,-----,------,------,------,------,-----,------,------,
1.2
0.8
0.6
0.4
0.2
.0.2 '--____--'-____----L_ _ _ _ _ _- ' - -_ _ _ _---'-_ _ _ _ _ _' - -_ _ _ _- - ' -_ _ _ _----'-_ _ _ _-----'

o 50 100 150 200 250 300 350 400
Fig. 3.9. Estimation error et using off-line identification with 30 VPBFs (Example
3.2)
Example 3.3
The algorithm developed in this chapter is applied to the data set D of the
Santa Fe times series prediction and analysis competition. The data set was
obtained from ftp.cs.colorado.edu/pub /Time-Series/SantaFe. Using the first
500 data of the data set, the priority of the VPBFs was obtained using the
orthogonal least squares algorithm. The order of the VPBFs and the corre-
sponding weights are given in Table 3.3.
Priority Order i VPBF cpi Weight wi

1 Yt-l 1.0171
2 Yt-2 -0.8607
3 Yt-3 0.6090
4 Yt-3 -3.1776
5 Y;-l 4.3020
6 Y;-2 3.7112
7 Yt-l -21.3440
8 Yi-2 70.7690
9 1 0.0297
10 Yt-lyr-2 -149.0500
The on-line structure selection technique and the recursive weight learning
algorithm were applied to the first 1000 items of the data set. The growing
VPBF network started with the first three best VPBFs, and stopped when the
number of VPBFs reached six. The simulation results are shown in Figures
3.10-3.13. In Figure 3.10, the sub-figure (b) is a larger scale version of the
sub-figure (a).
To test the disturbance rejection of the algorithm, a uniformed random
noise (its magnitude is 0.05) was added to the data set of the Santa Fe Time
series. The estimation error is shown in Figure 3.13. It is clear that the algo-
rithm still gives good estimation.
1.4,-----,-----,-----,-----,-----,-----,----,,----,-----,-----,
1.2 .... the real output __ the estimated ouput
0.8
0.6 H~IIIIIIIIIII""nlllllll·"'·iV
0.4
0.2
o~----~----~----~----~----~----~--~~--~----~----~
o 100 200 300 400 500 600 700 800 900 1000
Fig. 3.10. System output Yt and estimated output Yt using on-line identification
(Example 3.3)
1.4,-----,-----,-----,-----,-----,-----,-----,----,,----,,----,
1.2 ... the real output __ the estimated ouput

3.5 Examples 75
In addition, the recursive identification algorithm based on Volterra poly-

nomial basis function neural networks has also been compared with the se-
quential identification using radial basis function (RBF) neural networks (Liu
et al., 1995, 1996c) for this example. To have the same estimation error bound,
the sequential identification using RBF networks needed at least 20 radial ba-
sis functions. Thus, one may conjecture that VPBF networks need many fewer
basis functions than RBF networks to have the same estimation accuracy.
0.5,-----,-----,-----,-----,-----,-----,-----,-----,-----,-----,
0.4
0.3
0.2
0.1
·0.1
·0.2
·0.3
·0.4
·0.5 '-------"--------"--------"--------"--------"--------"---------'--------'--------'---------'
o 100 200 300 400 500 600 700 800 900 1000
6.5
400 500 600 700 800 900 1000
Fig. 3.13. Estimation error et for the data with random noise (Example 3.3)
The results of the above three examples show that in terms of the estimation
error the performance of the proposed recursive identification scheme is much
better than an off-line approach. Although the minimal upper bound of the
approximation error is unknown, the 2-norm of the weight vectors is bounded

by M and the estimation errors converge to the required bounds. The proposed
scheme also has good disturbance rejection.
3.6 Summary
A recursive nonlinear identification scheme based on VPBF networks together

with orthogonal least squares and growing network algorithms has been pre-
sented. The structure selection of nonlinear polynomials in the VPBF network
and parameter estimation of the selected model were discussed. The orthog-
onal least squares algorithm was used for off-line structure selection to find
an initial set of VPBF candidate terms which were ranked according to the
reduction in the approximation error. A growing network technique was then
applied for on-line structure selection to obtain an appropriately sized network.
An on-line recursive weight learning algorithm was developed for parameter
estimation and its properties were also analysed using Lyapunov methods. The
learning algorithm ensures that the weights and approximation error converge
to the required bounds without assuming the approximation error is white
noise or that the upper bound of this is known.
CHAPTER 4
MULTI OBJECTIVE NONLINEAR IDENTIFICATION
4.1 Introduction
The identification of nonlinear systems can be posed as a nonlinear functional

approximation problem. From the vVeierstrass Theorem (Powell, 1981) and the
Kolmogorov theorem (Sprecher, 1965) in approximation theory, it is shown
that the polynomial and many other approximation schemes can approximate
a continuous function arbitrarily well. In recent years, a number of nonlinear
system identification approaches, particularly identification using neural net-
works, based on the universal approximation theorem (Cybenko, 1989), are
applications of a similar mathematical approach.
U sing the approximation approach, two key questions concerning nonlinear
system identification are important: how to judge the accuracy for the nonlin-
ear function being approximated and how to choose nonlinear function units
to guarantee the accuracy. Most nonlinear system identification approaches
fix the number of nonlinear function units and use only a single performance
function, e.g., L 2 -norm of the difference between the real nonlinear system and
the nonlinear model which results in the well-known least squares algorithm,
to measure and judge the accuracy of the identification model and to optimise
the approximation. The assumption behind choosing the L 2 -norm is that the
noise in the process and measurements has Gaussian (normal) distributions.
In nonlinear system identification there are often a number of objectives
to be considered. The objectives are often conflicting and no identification
which can be considered best with respect to all objectives exists. Hence, there
is an inevitable trade-off between objectives, for example, the distance mea-
surement and maximum difference measurement between the real nonlinear
system and the nonlinear model. Model comparison methods, such as infor-
mation criterion (Akaike, 1974), Bayesian model selection (MacKay, 1992) and
minimum description length (MDL) (Rissanen, 1989), consider two such ob-
jectives, namely, Euclidean distance (L 2 -norm) and model complexity. These
procedures allow the selection of the best amongst a small number of candidate
models (MacKay, 1992). In addition to the above two objectives, we consider
the L(X)-norm of the difference between the real nonlinear system and the non-
linear model because it represents the accuracy bound of the approximation
achieved by the estimated model. These considerations lead to the study of
multiobjective nonlinear system identification.
78 4. Multiobjective Nonlinear Identification
In this chapter, three multiobjective performance functions are introduced

to measure the approximation accuracy and the complexity of the nonlin-
ear model for noise with mixed distribution. Those functions are the L 2 - and
LCXl-norms of the difference measurements between the real nonlinear system
and the nonlinear model, and the number of nonlinear units in the nonlinear
model. Genetic algorithms are used to search for a suboptimal set of nonlinear
basis functions of the model to simplify model estimation. Two neural net-
works are applied for the model representation of the nonlinear systems. One
is the Volterra polynomial basis function (VPBF) network and the other is
the Gaussian radial basis function (GRBF) network. A numerical algorithm
for multiobjective nonlinear model selection and identification using neural
networks and genetic algorithms is also detailed. Two applications in identifi-
cation of a nonlinear system and approximation of a nonlinear function with
a mixed noise demonstrate the operation of the algorithm.
4.2 Multiobjective Modelling with Neural Networks
The modelling of nonlinear systems has been posed as the problem of selecting
an approximate nonlinear function between the inputs and the outputs of
the systems. For a single-input single-output system, it can be expressed by
the nonlinear auto-regression moving average model with exogenous inputs
(NARMAX) (Chen and Billings, 1989), that is,
y(t) = f(y(t - 1), y(t - 2), ... , y(t - ny), u(t - 1), u(t - 2), ... , u(t - nu) + e(t)
(4.1)
where f(.) is an unknown nonlinear function, y is the output, u is the control

input and e is the noise, respectively, ny, n u , ne are the corresponding maxi-
mum delays. It is assumed that the noise e(t) is a white noise. For the colour
noise case, the modelling of the system using neural networks below needs
some slight modifications, as suggested in Nerrand et al. (1994).
The nonlinear function f(.) in the above NARMAX model can be approx-
imated by a single-layer neural network, i.e., a linear combination of a set of
basis functions (Billings and Chen, 1992; Liu et al., 1998a).
N
j(x,p) = L WkCPk(X, dk ) (4.2)

k=l
where
x = [y(t -l),y(t - 2), ... ,y(t - ny),u(t -l),u(t - 2), ... ,u(t - nu)] (4.3)
CPk(X,dk ) (k = 1,2, ... , N) is the basis function and p is the parameter vector
containing the weights Wk and the basis function parameter vectors d k . If the
basis functions cpdx, d k ) do not have the parameters d k , then it is denoted by
4.2 Multiobjective Modelling with Neural Networks 79
CPk(X). Two sets of basis functions are used: a set of Volterra polynomial basis
functions (VPBF) and a set of Gaussian radial basis functions (GRBF).
Multivariate polynomial expansions have been suggested as a candidate
for nonlinear system identification using the N ARMAX model (Billings and
Chen, 1992). The Volterra polynomial expansion (Schetzen, 1980) has been
cast into the framework of nonlinear system approximations and neural net-
works (Rayner and Lynch, 1989). A network whose basis functions consist
of the Volterra polynomials is named the Volterra polynomial basis function
network. Its functional representation is given by
f(x) j(x;p) + o(x 3) (4.4)

j(x;p) a+xTb+xTCx
a + blXl + b2X2 + ... + cllxi + C12XIX2 + C22X~ + ... + Cnnx;,
2 2 2 T
[a, bl , b2, ... , Cll, C12, C22, ... , cnn ][l, Xl, X2, ... , Xl' XIX2, X2, ... , xnl
N
L WkCPk(X) (4.5)
k=l
where the parameter vector represents the weights of the networks and
[Wl,W2,W3, ... ,Wn+2,Wn+3,Wn+4, ... ,wNl =

[a, bl , b2, ... , C11, C12, C22···, cnnl (4.6)
[CPl, CP2, CP3, ... , CPn+2, CPn+3, CPn+4, ... , cP N](X) =
[1,Xl,X2, ... ,X 2l ,XlX2,X22, ... ,x n2 l (4.7)
p = {wd is the set of parameters or linear weights and {cpdx)} the set of
basis functions being linearly combined, o(x 3 ) denotes the approximation error
caused by the high order (2: 3) of the input vector. The basis functions are
essentially polynomials of zero, first and higher orders of the input vector
X E nn.
Radial basis functions were introduced as a technique for multivariable
interpolation (Powell, 1987), which can be cast into an architecture similar to
that of the multilayer perceptron (Broomhead and Lowe, 1988). Radial basis
function networks provide an alternative to the traditional neural network
architectures and have good approximation properties. One commonly used
radial basis function network is the Gaussian radial basis function (GRBF)
neural network. The nonlinear function approximated by the GRBF network
is expressed by
(4.8)
where C k is the weighting matrix of the k-th basis function, and p is the
parameter vector containing the weights Wk and the centres d k (k = 1,2, ... , N).
For the sake of simplicity, it assumes that C k = I.
Now, we discuss multiobjective performance criteria for nonlinear model

selection and identification. Let us define the following performance functions
(Liu and Kadirkamanathan, 1995, 1999):
¢l(p) = Ilf(x) - j(x;p)112 (4.9)

¢2(P) = Ilf(x) - j(x;p)llao (4.10)
¢3(p) = 1J(j(x;p)) (4.11)
where 11.112 and 11.llao are the L2- and Lao-norms of the function (.), 1J(j(x;p))
is the complexity measurement of the model.
For model selection and identification of nonlinear systems, there are good
reasons for giving attention to the performance functions ¢i(p) (i = 1,2,3).
The practical reasons for considering the performance function ¢1 (p) is even
stronger than the other performance functions ¢2 (p) and ¢3 (p). Statistical con-
siderations show that it is the most appropriate choice for data fitting when
errors in the data have a normal distribution. Often the performance function
¢1 (p) is preferred because it is known that the best approximation calculation
is straightforward to solve. The performance function ¢2 (p) provides the foun-
dation of much of approximation theory. It shows that when this is small, the
performance function ¢1 (p) is small also. But the converse statement may not
be true. A practical reason for using the performance function ¢2 (p) is based
on the following. In practice, an unknown complicated nonlinear function is
often estimated by one that is easy to calculate. Then it is usually necessary to
ensure that the greatest value of the error function is less than a fixed amount,
which is just the required accuracy of the approximation. The performance
function ¢3 (p) is used as a measure of the model complexity. A smaller per-
formance function ¢3 (p) indicates a simpler model in terms of the number
of unknown parameters used. Under similar performances in ¢1 (p) and ¢2 (p)
by two models, the simpler model is statistically likely to be a better model
(Geman et at., 1992).
In order to give a feel for the usefulness of the multiobjective approach
as opposed to single-objective design techniques, let us consider the minimi-
sation of the cost functions ¢i(P) (i = 1,2,3). Let the minimum value of ¢i
be given by ¢t, for i = 1,2,3, respectively. For these optimal values ¢t there
exist corresponding values given by ¢j[¢tl (j i- i,j = 1,2,3), for i = 1,2,3,
respectively, and the following relations hold:
min { ¢1 [¢~], ¢d ¢~]} :::: ¢r (4.12)

min { ¢2 [¢rJ, ¢2 [¢~]} :::: ¢~ (4.13)
min{ ¢3[¢n, ¢3[¢~]} :::: ¢~ (4.14)
If one of the performance functions ¢i (i = 1,2,3) is minimised individ-
ually (single-objective approach), then unacceptably large values may result
for other performance functions ¢j (j i- i, j = 1,2,3). Generally, there does
not exist a solution for all performance functions ¢i(p) for i = 1,2,3 to be
minimised by the same parameter vector p.
4.3 Model Selection by Genetic Algorithms 81
There are many methods available to solve the above multiobjective op-
timisation problem (Liu et al., 2001). Following the method of inequalities
(Zakian and AI-Naib, 1973; Whidborne and Liu, 1993), we reformulate the
optimisation into a multiobjective problem as
CPi(p) :S Ci, for i = 1,2,3 (4.15)
where the positive real number Ci represents the numerical bound on the per-
formance function CPi(P) and is determined by the designer. Generally speak-
ing, the number Ci is chosen to be a reasonable value corresponding to the
performance function CPi according to the requirements of the practical sys-
tem. For example, Cl should be chosen between the minimum of CPl and the
practical tolerable value on CPl. The minimum of CPl can be known by the least
squares algorithm. The practical tolerable value means if CPl is greater than it,
the modelling result cannot be accepted. In addition, if Ci is chosen to be an
unreachable value, Section 4.4 will show how to deal with this problem.
4.3 Model Selection by Genetic Algorithms
Many different techniques are available for optimising the design space as-
sociated with various systems. Recently, direct-search techniques, which are
problem-independent, have been proposed as a possible solution for the diffi-
culties associated with the traditional techniques. One direct-search method is
the genetic algorithm (GA) (Goldberg, 1989). Genetic algorithms are search
procedures which emulate the natural genetics. They are different from tradi-
tional search methods encountered in engineering optimisation (Davis, 1991).
In Goldberg (1989), it is stated that (a) the GA searches from a population
of points, not a single point and (b) the GA uses probabilistic and not deter-
ministic transition rules.
4.3.1 Genetic Algorithms
Genetic algorithms are invented by simulating some of the processes observed

in natural evolution. Biologists have been intrigued with the mechanism of evo-
lution since the evolutionary theory of biological change was accepted. Many
people are astonished that life at the existing level of complexity could have
evolved in the relatively short time suggested by the fossil record. The mech-
anisms that drive this evolution are not fully understood, but some of its
features are known. Evolution takes place on chromosomes, which are organic
devices for encoding the structure of living beings. A living being is partly
created through a process of decoding chromosomes.
Although the specificities of chromosomal encoding and decoding processes
are not fully known, the following general features of the evolution theory are
widely accepted.
(a) The evolution process operates on chromosomes rather than on the living
beings which they encode.
(b) The natural selection process causes the chromosomes that encode suc-
cessful structures to reproduce more often than ones that do not.
(c) The reproduction process is the point at which evolution takes place. The
recombination process may create quite different chromosomes in children
by combining material from the chromosomes of two parents. Mutations
may result in the chromosomes of biological children being different from
those of their biological parents.
(d) Biological evolution has no memory. Whatever it knows about producing
individuals that will function well in their environment is contained in
the gene pool, which is the set of chromosomes carried by the current
individuals, and in the structure of the chromosome decoders.
In the early 1970s, the above features of natural evolution intrigued the sci-
entist John Holland (1975). He believed that it might yield a technique for
solving difficult problems to appropriately incorporate these features in a com-
puter algorithm in the way that nature has done through evolution. So, he
began the research on algorithms that manipulated strings of binary digits (Is
and Os) that represent chromosomes. Holland's algorithms carried out simu-
lated evolution on populations of such chromosomes. Using simple encodings
and reproduction mechanisms, his algorithms displayed complicated behaviour
and solved some extremely difficult problems. Like nature, they knew nothing
about the type of problems they were solving. They were simple manipulators
of simple chromosomes. When the descendants of those algorithms are used
today, it is found that they can evolve better designs, find better schedules
and produce better solutions to a variety of other important problems that we
cannot solve using other techniques.
When Holland first began to study these algorithms, they did not have a
name. As these algorithms began to demonstrate their potential, however, it
was necessary to give them a name. In reference to their origins in the study
of genetics, Holland named them genetic algorithms. A great amount of re-
search work in this field has been carried out to develop genetic algorithms.
Now, the genetic algorithm is a stochastic global search method that mimics
the metaphor of natural biological evolution. Applying the principle of sur-
vival of the fittest to produce better and better approximations to a solution,
genetic algorithms operate on a population of potential solutions. A new set
of approximations at each generation is created by the process of selecting
individuals, which actually are chromosomes in GAs, according to their fitness
level in the problem domain and breeding them using operators borrowed from
natural genetics, for example, crossover and mutation. This process results in
the evolution of populations of individuals that are better suited to their envi-
ronment than the individuals that they were created from, just as in natural
adaptation.
It is well known that natural phenomena can be abstracted into an algo-
rithm in many ways. Similarly, there are a number of ways to embody the
preceding features of the theory of natural evolution in genetic algorithms. To

begin with, let us consider two mechanisms that link a genetic algorithm to
the problem it is solving. One is the way of encoding solutions to the prob-
lem on chromosomes and the other is the evaluation function that returns a
measurement of the worth of any chromosome in the context of the problem.
The way of encoding solutions plays an important role in genetic algo-
rithms. The technique for encoding solutions may vary from problem to prob-
lem and from genetic algorithm to genetic algorithm. In early genetic algo-
rithms, encoding was carried out using bit strings. Later, genetic algorithm
researchers developed many other types of encoding technique. Probably no
one technique works best for all problems, and a certain amount of art is
involved in selecting a good decoding technique when a problem is being at-
tacked. Thus, when selecting a representation technique in the context of a
real-world problem, several factors should be considered.
The evaluation function is the link between the genetic algorithm and the
problem to be solved. An evaluation function takes a chromosome as input and
returns a number or list of numbers that is a measure of the chromosome's
performance. Evaluation functions play the same role in genetic algorithms as
the environment plays in natural evolution. The interaction of an individual
with its environment gives a measure of its fitness, and the interaction of a
chromosome with an evaluation function provides a measure of fitness that the
genetic algorithm uses when carrying out reproduction.
It is assumed that the following initial components are given: a problem, a
way of encoding solutions to it, and a function that returns a measure of how
good any encoding is. We can use a genetic algorithm to carry out simulated
evolution on a population of solutions. Here is the basic structure of genetic
algorithms that uses these components to simulate evolution.
(a) Initialise a population of chromosomes.
(b) Evaluate each chromosome in the population.
(c) Create new chromosomes by mating current chromosomes.
(d) Remove some members of the population to make room for the new chro-
mosomes.
(e) Insert the new chromosomes into the population.
(f) Stop and return the best chromosome if time is up, otherwise, go to (c).
Following the above structure, a pseudo-code outline of genetic algorithms is
shown below. The population of chromosomes at time t is represented by the
time-dependent variable P(t), with the initial population of random estimates
P(O).
procedure GA
begin
t=O;
initialise P(t) = P(O);
evaluate P( t);
while not finished do
begin
t=t+l;
select P(t) from P(t - 1);
reproduce pairs in P( t) by
begin
crossover;
mutation;
reinsertion;
end
evaluate P (t) ;
end
end
If all goes well through this process of simulated evolution, an initial population
of unexceptional chromosomes will improve as the chromosomes are replaced
by better and better ones. The best individual in the final population produced
can be a highly evolved solution to the problem.
The genetic algorithm differs substantially from more traditional search
and optimisation methods, for example, gradient-based optimisation. The most
significant differences are the following.
(a) GAs search a population of points in parallel rather than a single point.
(b) GAs do not require derivative information on an objective function or
other auxiliary knowledge. Only the objective function and corresponding
fitness levels influence the directions of search.
(c) GAs use probabilistic transition rules, not deterministic ones.
(d) GAs can work on different encodings of the parameter set rather than the
parameter set itself.
It is important to note that the GA provides many potential solutions to a
given problem and the choice of the final solution is left to the designer. In
cases where a particular optimisation problem does not have one individual
solution, then the G A is potentially useful for identifying these alternative
solution simultaneously.
4.3.2 Model Selection
Recently, genetic algorithms have been applied to control system design (see,
e.g., Davis, 1991; Patton and Liu, 1994; Liu and Patton, 1998). GAs have
also been successfully used with neural networks to determine the network
parameters (Schaffer et al., 1990; Whitehead and Choate, 1994), with NAR-
MAX models (Fonseca et al., 1993) and for nonlinear basis function selection
for identification using Bayesian criteria (Kadirkamanathan, 1995). Here the
GA approach is applied to the model selection and identification of nonlinear
systems using multiobjective criteria as the basis for selection.
The model selection can be seen as a subset selection problem. For the
model represented by the VPBF network, the principle of model selection using
the genetic algorithms can be briefly explained as follows: For the vector x E
R n , the maximum number ofthe model terms is given by N = (n+1)(n+2)/2.
Thus, there are N basis functions which are the combination of 1 and the
elements of the vector x. Then there are 2N possible models for selection.
Each model is expressed by an N-bit binary model code c, i.e., a chromosome
representation in genetic algorithms. If some bits of the binary model code c
are zeros, it means that the basis functions corresponding to these zero bits
are not included in the model.
For example, if the vector x E R 3 , the maximum number of the model terms
is 10. Then there are 1024 possible models. Each model can be expressed by a
lO-bit binary model code. Thus the Volterra polynomial basis functions are
If the lO-bit binary model code is c = [1 0 0 1 0 0 1 0 1 0], the

model can be written as
j(x;p) pT diag(c)<p(X)
[WI, W4, W7, W9][<Pl, <P4, <P7, <pg]T
WI + W4X3 + W7XIX3 + WgX 22 (4.17)
For the model represented by GRBF networks, the maximum number of the
model terms is given by N, the number of the Gaussian functions, and there
are 2N possible models for selection and also N possible radial basis functions
with their centres d k • Thus a chromosome representation in genetic algorithms
consists of an N-bit binary model code c and N real number basis function
centres dk (k = 1,2, ... , N), i.e.,
[c, dr, dr, ... , d~] (4.18)
For example, if N = 5, x E R2 and the chromosome
(4.19)
then the model is given by
(4.20)
j=1 j=1
It is evident from the above that only the basis functions corresponding to the
nonzero bits of the binary model code c are included in the selected model.
Given a parent set of binary model codes and basis function parameter vectors,
a model satisfying a set of performance criteria is sought by the numerical
algorithm.
4.4 Multiobjective Identification Algorithm
With three objectives (or cost functions) for model selection and identification,
the numerical algorithm for this multiobjective identification problem is not
a straightforward optimisation algorithm, such as for the least squares algo-
rithm. This section develops a multiobjective identification algorithm which
uses genetic algorithm approaches and the method of inequalities to get a
numerical solution satisfying the performance criteria.
Now, let us normalise the multiobjective performance functions as the fol-
lowing.
(4.21 )
Let ri be the set of parameter vectors p for which the i-th performance criterion
is satisfied:
ri = {p: 1/Ji(P) :::; I} (4.22)
Then the admissible or feasible set of parameter vectors for which all the
performance criteria hold is the intersection
(4.23)
Clearly, p is an admissible parameter vector if and only if
(4.24)
which shows that the search for an admissible p can be pursued by optimisa-
tion, in particular by solving
min {max{ 1/Jl (p), 1/J2 (p), 1/J3 (p)} } (4.25)

p
subject to (4.24).
The optimisation needs to be carried out using iterative schemes. Now, let
pq be the value of the parameter vector at the q-th iteration step in optimisa-
tion, and define
r? = {p : 1/Ji(P) :::; Ll q}, for i = 1,2,3 (4.26)
where
(4.27)
and also define
r q = r1q n ri n rl (4.28)
Eq = 1/Jdpq) + 1/J2(pq) + 1/J3(pq) (4.29)
rq is the q-th set of parameter vectors for which all performance functions
satisfy
(4.30)
4.4 Multiobjective Identification Algorithm 87
It is clear that r q contains both pq and the admissible set r. Eq is a combined

measurement of all performance functions. If we find a new parameter vector
pq, such that
Liq < Llq (4.31)
or
(4.32)
where Liq and Eq are defined similarly to Llq and Eq, then we accept pq as the
next value of the parameter vector. Then, we set pHI = pq. We then have
for i = 1,2,3 (4.33)
and
(4.34)
so that the boundary of the set in which the parameters are located has been
moved towards the admissible set, as shown in Figure 4.1.
Fig. 4.1. Movement of the boundary of the set rq
The process of finding the optimisation solution is terminated when both Llq
and Eq cannot be reduced any further. But the process of finding an admissible
parameter vector p stops when
(4.35)
i.e., when the boundaries of r q have converged to the boundaries of r. Equa-

tion (4.35) is always achievable if Ci is properly set, for i = 1,2,3. On the other
hand, if the Llq persists in being larger than 1, this may be taken as an indica-
tion that the performance criteria may be inconsistent, whilst their magnitude
gives some measure of how closely it is possible to approach the objectives. In
this case, some of the parameters Ci need to be increased. Generally speaking,
the parameter Ei corresponding to the largest normalised performance func-

tion 'l/Ji (pq) should be considered to increase first and then that corresponding
to the second largest one and so on. This means that some of the performance
criteria should be relaxed until they are satisfied. From a practical viewpoint,
the approximate optimal solution is also useful if the optimal solution is not
achievable. Genetic algorithms have been used in multiobjective optimisation
and have provided better results over conventional search methods (Davis,
1991; Hajela and Lin, 1992; Schaffer, 1985).
Here, we combine genetic algorithms with the least squares algorithm in
deriving the estimation algorithm. The steps of the identification algorithm to
be executed for GA implementation are as follows (Liu and Kadirkamanathan,
1995, 1999):
Step 1: Chromosomal representation

Each chromosome in the population consists of an N-bit binary model
code C and a real number basis function parameter vector D, where N is
the number of the basis functions for the nonlinear model selection. For ex-
ample, for the VPBF network there is not the vector D and for the GRBF
network the vector D contains all basis function centres dk (k = 1,2, ... , N),
i.e., D = [df,dL ... ,d~]
Step 2: Generation of the initial population

The M chromosomes [c, D] for the initial population are randomly gener-
ated, where M is the population size and is often chosen to be an odd number.
Step 3: Evaluation of the performance functions

Given the j-th binary model code Cj and basis function parameter vector
D j , then the j-th nonlinear model is known. Using the least squares algorithm,
the j-th weight vector Wj can be computed easily, based on the data of the
vector x, the binary model code Cj and the basis function parameter vector
D j . Then evaluate the normalised performance functions 'l/Ji(Sj) (i = 1,2,3),
where Sj = [Wj, Cj, D j ], and
L1j = max 'l/Ji(Sj) ( 4.36)

'=1,2,3
3
Ej = L 'l/Ji(Sj) ( 4.37)
i=l
These computations are completed for all M sets of chromosomes, i.e. j

1,2, ... ,M.
Step 4: Selection
According to the fitness of the performance functions for each chromo-
some, delete the (M -1) /2 weaker members of the population and reorder the
chromosomes. The fitness of the performance functions is measured by
4.4 Multiobjective Identification Algorithm 89
for j = 1,2, ... ,M (4.38)
Step 5: Crossover
Offspring binary model codes are produced from two parent binary model
codes so that their first half elements are preserved. The second half elements in
each parent are exchanged. The average crossover operator is used to produce
offspring basis function parameter vectors. The average crossover function is
defined as
. 1
for J = 1, 2, ... , "2 (M - 1) (4.39)
Then the (M - 1)/2 offsprings are produced.
Step 6: Mutation
A mutation operator, called creep (Davis, 1991), is used. For the binary
model codes, it randomly replaces one bit in each offspring binary model code
with a random number 1 or O. For the offspring basis function parameter
vectors, the mutation operation is defined as
. 1
for J = 1, 2, ... , "2 (M - 1) (4.40)
where f3 is the maximum to be altered and ~j E [-1, 1] is a random variable

with zero mean.
Step 7: Elitism
The elitist strategy copies the best chromosome into the succeeding gen-
eration. It prevents the best chromosome being lost in the next generation. It
may increase the speed of domination of a population by a super individual,
but on balance it appears to improve genetic algorithm performance. The best
chromosome is defined as one satisfying
where
(4.42)
Em and E z correspond with .1 m and .1 z , which are defined in (4.36) and (4.37),
a > 1 and 6 < < a is a small positive number, which are given by the designer.
a and 6 are chosen such that a6 > 6, e.g., a = 1.1 and 6 = 0.05. This means
that sacrificing .1 m a little gives significant improvement in E b . Thus, the best
chromosome is one that has the smallest Eb in the neighbourhood of Em.
Step 8: New offspring

Add the (M - 1)/2 new offspring to the population which are generated
in a random fashion. Actually, the new offspring are formed by replacing ran-
domly some elements of the best binary model code and mutating the best
basis function parameter vector with a probability.
Step 9: Stop check

Continue the cycle initiated in Step 3 until local convergence of the algo-
rithm is achieved. This local convergence is defined as the population satisfying
1
for j = 1, 2, ... , "2 (M - 1) ( 4.43)
where Llb is corresponding to E b, and c is a positive number. This implies that

the difference between the chromosomes in the first half population and the
best chromosome is small in the sense of their performance measurement Ll j .
Take the best solution in the converged generation and place it in a second
"initial generation". Generate the other M - 1 chromosomes in this second
initial generation at random and begin the cycle again until a satisfactory
solution is obtained or Llb and Eb cannot be reduced any further. In addition,
for mixed noise distribution, the least squares algorithm in Step 3 should be
replaced by a more robust modified least squares algorithm as suggested in
Chen and Jain (1994).
4.5 Examples
This section introduces two applications. The first one considers identification
of a real system. The second one demonstrates approximation of a nonlinear
function by a mixed noise with different variance.
Table 4.1. Parameters for the identification algorithms
Parameter Name VPBF Network GRBF Network

model term number N 45 10
chromosome length 45 50
y(t - 1)
y(t - 2)
variable vector x
y(t - 3)
y(t - 4) [Y(t-l l
y(t - 2) ]
u(t - 1) u(t - 1)
u(t - 2) u(t - 2)
u(t - 3)
u(t - 4)
El 1.5 1.5
E2 0.3 0.3
E3 7 7
4.5 Examples 91
Example 4.1
We use the data generated by a large pilot-scale liquid level nonlinear system
with zero mean Gaussian input signal (Fonseca et al., 1993): 1000 pairs of
input output data were collected. The first 500 pairs were used in the model
selection and identification of the system, while the remaining 500 pairs were
used for validation tests. The Volterra polynomial basis function network and
the Gaussian radial basis function network were applied to select and iden-
tify the model of the system using the multiobjective identification algorithm
developed in Section 4.4.
The time lags ny and nu were obtained by a trial and error process based on
estimation of several models. During the simulation, it was found that for the
VPBF network, if ny and nu were greater than 4, the performance functions
improved very little. Similarly, for the GRBF network, if ny and nu were
greater than 2 the performance functions did not reduce significantly. It is clear
that the time lags ny and nu for the VPBF network are different from those for
the GRBF network. The main reason is that those two networks use different
kinds of basis functions which have different properties. The parameters for
the algorithm are given in Table 4.1.
VPBF Network
Since the maximum number of model terms is 45, there are 245 possible models
for selection. But, after 210 generations, an optimal model has been found by
the algorithm. The performance functions are
(PI (p) = 1.8000 (4.44)
(P2 (p) = 0.3965 (4.45)
¢3(p) = 3 (4.46)
The model represented by the VPBF network is
y(t) = 1.3234y(t - 1) - 0.3427y(t - 2) + 0.075y(t - 4)u(t - 2) (4.4 7)
The convergence of the performance functions with respect to generations
is given in Figure 4.2. It shows that the performance functions converge in
about 100 generations. In fact, in generation 94, the performance functions are
¢I(p) = 1.8119, ¢2(P) = 0.4071, and ¢3(P) = 3. After that, no improvement is
made until in generation 208 ¢1 (p) = 1.8, ¢2 (p) = 0.3965 and ¢3 (p) = 3. The
measured and estimated outputs, and the residual error of the system for the
training data are shown in Figure 4.3. The measured and estimated outputs,
and estimation error of the system for the validation test of the model identified
via the VPBF network are illustrated in Figure 4.4. Clearly, the performance
functions ¢1 (p) and ¢2 (p) are very close to the desired requirements. But they
do not satisfy them. This may result from the general drawback (premature
convergence) of genetic algorithms.
2.5,------------, 3,------------,
2 ~I 2.5
1.5 Il f psi2(p)
''=' ,-_1 f - - - - - , - - - -
2 phi1(p)
1~ psi1(p) ,------------1
1.5~
I
0.5 psi3(p) ~-
OL---~---~--~ 1L---~---~---~
o 100 200 300 o 100 200 300
0.7,------------, 20,------------,
0'6\~
0.5
0.4
0.3 L-_ _
o
~
~-P;-:hi~2(:-:P)~'--------1
___ __
100 200 300
~ ~
::1\
OL---~---~--~
o 100
phi3(p)
L -_ _ _ _ _ _--j
200 300
Fig. 4.2. Convergence of the performance functions using the VPBF network
-1
-2 _ _ Measured output
---- Estimated output
-3
0 50 100 150 200 250 300 350 400 450 500
The measured and estimated outputs of the system.
0.4
0.2
-0.2
_0.4L--~---L--L--~---L--~-~L---L--~-~
o 50 100 150 200 250 300 350 400 450 500

The estimation error of the system.
Fig. 4.3. Training results for the system using the VPBF network
4.5 Examples 93
-1
-2 _ Measured output
-3
500 550 600 650 700 750 800 850 900 950 1000
0.6
0.4
-0.4 '------'-------'---'-------'.---'-------'----'-------'---'------'
500 550 600 650 700 750 800 850 900 950 1000
Fig. 4.4. Validation results for the system using the VPBF network
GRBF Network
Although the maximum number of model terms is only 10 (i.e., 1024 possible
models for selection), the search dimension ofthe basis function centre param-
eters is 40 in real number space (i.e., infinite possibilities for selection). After
700 generations the performance criteria are almost satisfied. At this stage,
cPdp) = 1.5643 (4.48)
cP2(P) = 0.2511 (4.49)
cP3(p) = 5 (4.50)
In order to obtain a better performance, the basis function parameter vector
was searched for another 100 generations using the algorithm with a fixed num-
ber of model terms, i.e., let cP3(p) = 5 for this case. Finally, the performance
functions are
cPdp) = 1.2957 (4.51 )
cP2 (p) = 0.1724 (4.52)
cP3(p) = 5 (4.53)
The model represented by the GRBF network is
s
y(t) = ~ Wi exp
{2
- ~(y(t - j) - dij )2 -
2
~(u(t - j) - dij )2
} (4.54)
where
W1
W2 1 -1.2470
r-2.63631
W3 = -1.7695 (4.55)
r W4 0.9437
Ws -0.5341
- 2.1577 -1.8855 -0.8975 -0. 2841 1

-1.2717 -2.2730 0.3445 0.3315
{d ij } = r -0.6345 -1.1223 -1.1615 -0.3666 (4.56)
0.7344 1.0223 0.5469 0.1989
-1.2336 -0.5928 0.3212 0.5754
The performance of the GRBF network is shown in Figures 4.5-4.7. Figure 4.5
shows the convergence of the performance functions with respect to genera-
tions. The measured and estimated outputs, and residual error of the system
for the training data for the model identified via the GRBF network are shown
in Figure 4.6. The measured and estimated outputs, and estimation error of
the system for the validation test data for the model identified via the GRBF
network are shown in Figure 4.7.
In order to see the importance of the Loo norm measure of the accuracy
bound of the approximation, the performance function (P2 (p) is not used in
the next simulation. So, only two performance functions (h (p) and ¢3 (p) are
considered. Their required upper bounds Cl and C3 are still set to be 1.5 and
7. The simulation procedure is exactly the same as the above. The following
performance is obtained.
¢dp) = 1.2900 (4.57)
¢3(p) =4 (4.58)
The weight vector and centres of the network are
Wl] = [1.2394]
[W2
W3
-2.4092
-2.8293
(4.59)
W4 -2.5141
1.3219 0.4971 0.4451

0.0935]
[ -0.5826 -2.1796 0.2636 0.5724
{d ij } = -1.6041 (4.60)
-0.0912 -0.7477 -0.0275
-2.1362 -1.9554 -0.7189 -0.2974
4.5 Examples 95
4,-------------------, 6,-------------------,
5
3
.- - -I.-----!=.1,...-::.
psi2(p) -
0
0 200 400 600 800 200 400 600 800
0.8 8
1fU
0.6 7 -
0.4 6
phi2(p)
0.2 5 phi3(p) -
0 4
0 200 400 600 800 0 200 400 600 800
Fig. 4.5. Convergence of the performance functions using the GRBF network
-1
-2 __ Measured output
_3L----L----L----L----~--~----~--~----~----L---~
o 50 100 150 200 250 300 350 400 450 500

0.2,----,----__---,----,---__----,----,----__----,---~
_0.2~---L----L----L----~--~----~--~----~----~--~
o 50 100 150 200 250 300 350 400 450 500
Fig. 4.6. Training results for the system using the GRBF network
-1
_3L--~---L--L--~---L--L--~---L--L--~
500 550 600 650 700 750 800 850 900 950 1000
0.4 ,------,--,-----,--,-------,--,------,----,--,------,
-0.2
_0.4L--~---L--L--~---L--L--~---L--L--~
500 550 600 650 700 750 800 850 900 950 1000
Fig. 4.7. Validation results for the system using the GRBF network
3.5,----,----,----,------, 9
3 8
1L--~--~--~--~
o 200 400 600 800
5
4
0 200
In 400
phi3(p)
600 800
Fig. 4.8. Convergence of the performance functions using the GRBF network with-
out (/J2
4.5 Examples 97
-1
_3L----L----L----L----~--~----~--~----~----L---~
o 50 100 150 200 250 300 350 400 450 500

0.2
-0.2
o 50 100 150 200 250 300 350 400 450 500

Fig. 4.9. Training results for the system using the GRBF network without (P2
-1
-1.5
___ Measured output
-2 ---- Estimated output
_2.5L----L----L----L----L----L----~--~----~--~--~
500 550 600 650 700 750 800 850 900 950 1000
0.4,----,----,----,----,----,----,----,----,----,----,
-0.4
_0.6L----L----L----L----L----L----~--~----~--~--~
500 550 600 650 700 750 800 850 900 950 1000
Fig. 4.10. Validation results for the system using the GRBF network without (P2
The simulation results are shown in Figures 4.8-4.10. It is clear from the re-
sults that although the performance functions CPl (p) and CP3 (p) are reduced,
the maximum difference cP2 (p) of the approximation for the identification and
validity test is much greater than the previous case. So, it shows that if the per-
formance functions cPdp) and cP3 (p) are sacrificed somewhat, the performance
function cP2 (p) is improved significantly.
The selection, identification and validation results for the large pilot-scale
liquid level nonlinear system show that the VPBF network is simpler than the
GRBF network, but the performance of the latter is better than that of the
former. However, it is difficult to conclude that the GRBF model is better than
the VPBF model or vice versa. On the same set of experiments, the Bayesian
method selection and identification with Gaussian noise assumptions leads to
very similar performance as the above but needed 11 and 16 basis functions
(hidden units) for the VPBF and GRBF networks (Kadirkamanathan, 1995).
The identified model here is much simpler.
Example 4.2
Consider the following underlying nonlinear function to be approximated.
(4.61)
where x is a variable. A random sampling of the interval [-4,4] is used in
obtaining the 40 input output data items for approximation.
In order to see the effect of noise, the output of the function f to a given
input x is given by
f(x) = j*(x) + e (4.62)
where e is a mixed noise. The noise consists of uniformly and normally dis-
tributed noises, i.e.,
(4.63)
where eU[O,O"] is a zero mean uniform noise with finite variance (J" and eD[O,O"]
is a zero mean normal noise with finite variance (J". It is assumed that the
uniform noise eU[O,O"] and the normal noise eD[O,O"] are uncorrelated. Thus, the
mean and variance of the mixed noise e are zero and (J", respectively.
Here, the Gaussian radial basis function network was used to approximate
the nonlinear function by the multiobjective identification algorithm developed
in Section 4.4. Three cases were considered in this simulation. The first used
three performance functions during approximation. The second considered two
performance functions. The third used only one performance function. Actu-
ally, the following cases were taken into account: (a) Case 1: [cPdp) , cP2(P),
cP3(P)], (b) Case 2: [cPl(P), cP3(P)] and (c) Case 3: [cPl(P)]'
The effects of the mixed noise with different variance on the performance
functions cPl (p), cP2 (p) and cP3 (p) for the above three cases are illustrated in
Figures 4.11-4.13, respectively. It can be seen from the simulation results that
the performance of the approximation of the nonlinear function changes little
at low level variance of noise and the multiobjective case using three perfor-
mance criteria gives a good approximation even though the three performance
functions conflict with each other.
4.5 Examples 99
4 _X_X_ [phi1, phi2, phi3]

_0_0_ [phi1, phi3]
~ _' __'_ [phi1]
c:
o
'nc: 3
.2
Q)
Q
c:
'"
E
.g2
Q)
c.
Q)
.<::
f-
OL-----~------~--------~------~--------~~~--~
10-2 10- ' 100
The noise variance
Fig. 4.11. Performance function 1>1 (p) against noise variance (J'
2 -'_X_ [phi1, phi2, phi3]

_0_0_ [phi1, phi3]
_'_'_ [phi1]
c:
o
'nc: 1.5
.2
Q)
Q
c:
'E"
.@
Q)
c.
Q)
.<::
f-
10-1
The noise variance
Fig. 4.12. Performance function 1>2 (p) against noise variance (J'
18
~ 16
D.
c
o
t5
.2 14 -'_X_ [phi 1, phi2, phi3]
_0_0_ [phi1, phi3]
_'_'_ [phi1]
6L------------~~~~-----~~~--~~
10-2 10-' 10°
The noise variance
Fig. 4.13. Performance function CP3 (p) against noise variance (]'
4.6 Summary
This chapter has addressed the problems of model selection and identification
of nonlinear systems using neural networks, genetic algorithms and multiob-
jective optimisation techniques. Three performance functions that measure
approximation accuracy and model complexity are proposed as the multiob-
jective criteria in the identification task. They are the L 2 - and Loa-norms of
the difference measurements between the real nonlinear system and the non-
linear model, and the number of nonlinear units in the nonlinear model. The
optimisation is carried out using genetic algorithms which select the nonlinear
function units to arrive at the simplest model necessary for approximation,
along with optimising the multiobjective performance criteria. Volterra poly-
nomial basis function networks and Gaussian radial basis function networks
are subjected to the algorithm in the task of a liquid level nonlinear system
identification. The model selection procedure results in determining the rel-
evant linear and second order nonlinear terms for the VPBF model and in
selection of the basis function centres for the GRBF model. The experimental
results demonstrate the convergence of the developed algorithm and its ability
to arrive at a simple model which approximates the nonlinear system well.
The approach discussed in this chapter can also be extended in many ways,
for example, to adaptively modify the numerical bounds on the performance
functions. Furthermore, cross-validation techniques can be used to guide the
optimisation and also in the adaptation of bounds on the performance func-
tions.
CHAPTERS
WAVELET BASED NONLINEAR IDENTIFICATION
5.1 Introduction
The approximation of general continuous functions by nonlinear networks has
been widely applied to system modelling and identification. Such approxima-
tion methods are particularly useful in the black-box identification of nonlinear
systems where very little a priori knowledge is available. For example, neu-
ral networks have been established as a general approximation tool for fitting
nonlinear models from input output data on the basis of the universal approx-
imation property of such networks. There has also been considerable recent
interest in identification of general nonlinear systems based on radial basis
networks (Poggio and Girosi, 1990a,b), fu",,,,y sets and rules (Zadeh, 1994),
neural-fuzzy networks (Brown and Harris, 1994; Wang et al., 1995) and hing-
ing hyperplanes (Breiman, 1993).
The recently introduced wavelet decomposition (Grossmannand and Mor-
let, 1984; Daubechies, 1988; Mallat, 1989a; Chui, 1992; Meyer, 1993; IEEE,
1996) also emerges as a new powerful tool for approximation. In recent years,
wavelets have become a very active subject in many scientific and engineer-
ing research areas. vVavelet decompositions provide a useful basis for localised
approximation of functions with any degree of regularity at different scales
and with a desired accuracy. Recent advances have also shown the existence of
orthonormal wavelet bases, from which follows the variability of rates of con-
vergence for approximation by wavelet based networks. Wavelets can therefore
be viewed as a new basis for representing functions. Wavelet based networks
(or simply wavelet networks) are inspired by both feedforward neural networks
and wavelet decompositions. They have been introduced for the identification
of nonlinear static systems (Zhang and Benveniste, 1992) and nonlinear dy-
namical systems (Coca and Billings, 1997; Liu et al., 1998, 2000).
This chapter presents a wavelet network based identification scheme for
nonlinear dynamical systems. Two kinds of wavelet networks are studied: fixed
and variable wavelet networks. The former are used for the case where the esti-
mation accuracy is assumed to be achieved by a known resolution scale. But, in
practice, this assumption is not realistic because the nonlinear function to be
identified is unknown and the system operating point may change with time.
Thus, variable wavelet networks are introduced to deal with this problem. The
basic principle of the variable wavelet network is that the number of wavelets
in the network can either be increased or decreased over time according to a
102 5. Wavelet Based Nonlinear Identification
design strategy in an attempt to avoid overfitting or underfitting. In order to

model unknown nonlinearities, the variable wavelet network starts with a lower
resolution scale and then increases or reduces this according to the novelty of
the observation, which is ideally suited to on-line identification problems. The
objective of variable wavelet networks is to gradually approach the appropri-
ate network complexity that is sufficient to provide an approximation to the
system nonlinearities.
The parameters of the wavelet network are adjusted by adaptation laws de-
veloped using a Lyapunov synthesis approach. The identification algorithm is
performed over the network parameters by taking advantage of the decompo-
sition and reconstruction algorithms of a multiresolution decomposition when
the resolution scale changes in the variable wavelet network. Combining the
wavelet network and Lyapunov synthesis techniques, the identification algo-
rithm developed for continuous dynamical nonlinear systems guarantees the
stability of the whole identification scheme and the convergence of both the
parameters and estimation errors. The wavelet network based identification
scheme is realised using B-spline wavelets and it is shown how to calculate
decomposition and reconstruction sequences needed for identification using
variable wavelet networks.
5.2 Wavelet Networks
Wavelets are a class of functions that have some interesting and special prop-
erties. Some basic concepts about orthonormal wavelet bases will be intro-
duced initially. Then the wavelet series representation of one-dimensional and
multidimensional functions will be considered. Finally, wavelet networks are
introduced.
5.2.1 One-dimensional Wavelets
The original objective of the wavelet theory is to construct orthogonal bases in

L2(R). These bases are constituted by translations and dilations of the same
function 'l/J. It is preferable to take 'l/J as localised and regular. The principle of
wavelet construction is the following:
(a) the function ¢(x - k) are mutually orthogonal for k ranging over N;
(b) ¢ is a scaling function and the family ¢(2 j x - k) constitutes an orthogonal
basis of L 2 (R);
(c) the wavelet is defined as 'l/J and the family 'l/J(2 j x - k) constitutes an or-
thogonal basis of L2(R).
It can also be proved that the family {¢(2jo x - k), 'l/J (2j x - k), for j 2: jo} also
forms an orthogonal basis of L2(R).
The wavelet subspaces Wj are defined as
(5.1)
5.2 Wavelet Networks 103
which satisfies
Wj n Wi = {0}, j i- i (5.2)
Any wavelet generates a direct sum decomposition of L2(R). For each j EN,
let us consider the closed subspaces:
Vj = ... EB Wj - 2 EB Wj - 1 (5.3)
of L 2 (R), where EB denotes the direct sum. These subspaces have the following
properties:
(i) ... C V-l C Va C V1 c ...
(ii) closeL2 (U
JEN
Vj) = L 2 (R)
(iii) nVj = {0}

JEN
(iv) Vj+l = Vj + Wj , j EN
(v) f(x) E Vj {::::::} f(2x) E Vj+l,j EN
Hence, the sequence of subspaces Vj is nested, as described by property (i).

Property (ii) shows that every function f in L2(R) can be approximated as
closely as desirable by its projections, denoted by Pj f in Vj. But, by decreas-
ing j, the projections Pj f could have arbitrarily small energy, as guaranteed
by property (iii). The most important intrinsic property of these spaces is
that more and more variations of Pjf are removed as j -+ -00. In fact, these
variations are peeled off, level by level in decreasing order of the rate of varia-
tions and stored in the complementary subspaces Wj as in property (iv). This
process can be made very efficient by an application of property (v).
If ¢ and 'l/J are compactly supported, they give a local description, at dif-
ferent scales j, of the considered function. The wavelet series representation of
the one-dimensional function f(x) is given by
f(x) = L ajok¢jok(X) +L L bjk'l/Jjk(X) (5.4)

kEN j?jo kEN
where ¢jok(X) = 2jo/2¢(2jox - k), 'l/Jjdx) = 2j / 2'l/J(2 j x - k), and the wavelet
coefficients ajok and bjk are
(5.5)
(5.6)
5.2.2 Multi-dimensional Wavelets
The wavelet series representation can easily be generalised to any dimension

n. For the n-dimensional case x = [Xl, X2, ... , Xn], we introduce the scaling
function
(5.7)
and the 2n - 1 mother wavelets !J)i(X) , i = 1,2, ... , 2n - 1, are obtained by

substituting some ¢(Xj) by 1j;(Xj) in (5.7). Then the following family is an
orthonormal basis in L2 (R n ):
(5.8)
for j:::: jo, j EN, k = [k1,k2' ... ,kn] E Nn, and

<Pjk(X) = 2jn/2<p(2jx1 - k1' 2j X2 - k2' ... , 2j x n - k n ) (5.9)
,T,(i) (X) -- 2jn / 2,T,

"'jk "'i (2j X1- k 1, 2 X2- k 2,···, 2 Xn - k n )
j j
(5.10)
For f (X) E L2 (R n), the n-dimensional wavelet series representation of the

function f (x) is
(5.11)
where the wavelet coefficients are
ajok =< f(x), <Pjok(X) > (5.12)

(i) _
bjk -< f(x), !J)jk(i) (x) > (5.13)
5.2.3 Wavelet Networks
For system identification, f(x) is unknown. Then the wavelet coefficients ajok
and b;2 cannot be calculated simply by (5.12) and (5.13). As (5.8) shows,
constructing and storing orthonormal wavelet bases involves a prohibitive cost
for large dimensions n. In addition, it is not realistic to use an infinite number
of wavelets to represent the function f (x). So, we consider the following wavelet
representation of the function f(x):
(5.14)
j=jo kEBj i=l
where A jo ' Bj E Nn are the finite vector sets of integers and N E R 1 is a finite
integer. Since the convergence of the series in (5.11) is in L 2 (R n ),
5.3 Identification Using Fixed Wavelet Networks 105
Hence, given E > 0, there exists a number N* and vector sets A;o' B;o' B;0+1' ... ,
BN such that for N:2: N*, Ajo :;2 A;o and Bjo :;2 B;o' Bjo+1 :;2 B;o+1'···'
BN :;2 B N,
Ilf(x) - j(x)112 ::; E (5.16)
This shows that the required approximation accuracy of the function f by j

can be guaranteed by properly choosing the number N and the vector sets
A jo ' Bjo ' ... , BN . Following neural networks, the expression (5.14) is called a
wavelet network. In this network, the parameters ajok and b;2,
the number N
and the vector sets A jo , Bjo , ... , BN will jointly be determined from the data,
based on the scaling functions ¢ and the wavelets 'lj;.
5.3 Identification Using Fixed Wavelet Networks
Consider the multi-input multi-state (MIMS) continuous dynamical system

described by
i; = f(x, u), x(O) = xo (5.17)
where u E n Tx1 is the input vector, x E n dX1 is the state vector and f(·) =
[h (-), 12 (-), ... f d (-) V E L~ (n n) is an unknown nonlinear function vector. It is
also assumed that u, x are in compact sets.
Following the structure (5.14) of wavelet networks, at the resolution 2N +1,
the estimation j (x, u) of the function f (x, u) is the output vector of the wavelet
network which is expressed by
N 2n_1
j(x,u) = L AjokPjodx,u) +L L L Bj~lJr;~(x,u) (5.18)
j=jo kEBj i=1
where n = r + d, and Ajok' Bj~ E nn are the wavelet coefficient vectors, the
scaling function P jo k (x, u) and the wavelet functions lJr; ~ (x, u) are similarly
defined as p( x) and lJrj (x), respectively, by replacing x with (x, u).
Here, it is assumed that the number N and the vector sets A jo , Bj are given.
So, the wavelet network (5.18) for the estimation of the nonlinear function
f (x, u) is called a fixed wavelet network. Based on the estimation j (x, u) by
the fixed wavelet network, the nonlinear function f(x, u) can be expressed by
N 2n-1
f( x,u ) = '"' '" (X,U ) + '"'
~ A*jok'¥jok ~ '"'~ '"' jk '£'jk (X,U ) +EN
~ B(i)*,T,(i) (5.19)
j=jo kEBj i=1
where the optimal wavelet coefficient vectors A;ok and Bj~* are
< h (x, u), 1

r
< h(x, u), <Pjok (x, u) >
(5.20)
< fd(X, u), <Pjok(X, u) >
< h (x, u), tJrj~() (x, u) >

B jk
(i)* - < h(x, u), tJrj~() (x, u) >
- (5.21)
()
< fd(X, u), tJrj~ (x, u) >
EN = [E Nl , E N2, ... , ENd] T is the modelling error vector which is assumed to be
bounded by
(5.22)
Modelling the nonlinear function vector f(x, u) using wavelet networks gives
the following identification model for the nonlinear dynamical system (5.17):
~ = A(x - x) + j(x, u), x(O) = Xo (5.23)
where x denotes the state vector of the network model and A E n dxd is a
Hurwitz or stability matrix (i.e., all the eigenvalues are in the open left-half
complex plane).
Define the state error vector and wavelet coefficient error vectors as
ex x-x (5.24)
Ajok Ajok - Ajok (5.25)
iJ(i) B(i)* _ B(i)

jk jk jk (5.26)
so that the dynamical expression of the state error is given by

N 2n-l
ex = A ex + '"'
~ A- jok'¥jok
J. ( x, U ) + '
~ "' ~ '~
'"'"' B- jk '¥jk ( x, U )
(i),T,(i) + EN (5.27)
j=jo kEBj i=l
(5.28)
where P = {Pij} E n dxd is chosen to be a positive definite matrix so that the

matrix Q = - P A - AT P is also a positive definite matrix, and Qjo and {3j are
positive constants which will appear in the parameter adaptation laws, also
referred to as the adaptation rates.
The first derivative of the Lyapunov function V with respect to time t is
5.3 Identification Using Fixed Wavelet Networks 107
(5.29)
Substituting (5.27) into (5.29) gives
(5.30)
Since Ajok and Bj2* are constant vectors,

:. (i) _ . (i)
Bjk - -B jk (5.31)
If there is no modelling error, i.e., eN = 0, Ajok and B;2 can simply be

estimated by the following adaptation laws:
(5.32)
(5.33)
In the presence of a modelling error eN, several algorithms can be applied to

ensure the stability of the whole identification scheme, e.g., fixed or switching
CT-modification, e-modification and dead-zone methods.
Define the following sets:
F-(r, M) = {z : Ilzll < M or (11zll = M and e; pzr ~ O)} (5.34)
F+(r, M) = {z : Ilzll = M and e; pzr < O} (5.35)
where z E n d , r is a function and M is a positive constant. Here, in order to

avoid parameter drift in the presence of modelling error, the application of the
projection algorithm (Goodwin and Mayne, 1987) gives the following adaptive
laws for the parameter estimates Ajok and B;2:
(5.36)
B'(i)
jk -- {
(5.37)
where Mjok, Mj~ are the allowed largest values of IIAjok11 and IIB3~11, respec-
tively. It is clear that if the initial parameter vectors are chosen such that
Ajok(O) E F-(Pjok,Mjok)U F+(Pjok, Mjok) and B3~(0) E F-(l]/j~,Mj~)U
F+(l]/j~), Mj~), then the vectors Ajok and B3~ are confined to the sets
J '¥Jok, M.Jok ) UJr+(",.
r-(",. Jok an d Jr-('T,(i)
'¥Jok, M·) jk U Jr+('T,(i)
'¥jk' M(i)) 't'jk' M(i))
jk' respec-
tively. Using the adaptive laws (5.36) and (5.37), (5.30) becomes
d d
V < -e~Qex + 2 L L IpijllexillcNjl

i=l j=l
d d
< -e~Qex + 2 L L IpijllexilEN (5.38)

i=l j=l
where ex = [e x1,ex2, ... ,exdl T .

For the sake of simplicity, the positive definite matrix Q is assumed to be
diagonal, i. e., Q = diag[ q1 , Q2, ... , Qdl. Also define
(5.39)
where ( is a positive variable, i.e., ( 2: O.

If there is no modelling error (i.e., EN = 0), it is clear from (5.38) that if is
negative semidefinite. Hence the stability of the overall identification scheme
is guaranteed and ex -+ 0, Ajok -+ 0, B3~ -+ O. In the presence of modelling
errors, if ex ~ 8(EN), it is easy to show from (5.38) that if is still negative
and the state error ex will converge to the set 8(EN). But, if ex E 8 (EN ), it
is possible that if > 0, which implies that the weight vectors Ajok and B3~
may drift to infinity over time. The adaptive laws (5.36) and (5.37) avoid this
drift by limiting the upper bounds of the parameters. Thus the state error ex
always converges to the set 8(EN) and the whole identification scheme will
remain stable in the presence of modelling errors.
5.4 Identification Using Variable Wavelet Networks
For nonlinear systems, the system operation can change with time. This will
result in an estimation error for the fixed wavelet network that is beyond the
5.4 Identification Using Variable Wavelet Networks 109
required error. In order to improve the identification performance, both the

structure and the parameters of the wavelet network model need to be modified
in response to variations of the plant characteristics. This section takes into
account modification of the wavelet network structure and adaptation of the
parameters.
It is known that the modelling error EN can be reduced arbitrarily by
increasing the resolution of the wavelet network. But, generally when the reso-
lution is increased beyond a certain value the modelling error EN will improve
very little by further increasing the resolution. This will also result in a large
sized network even for a simple nonlinear system and in practice, this is not re-
alistic. In most cases, the required modelling error can be given by considering
the design requirements and specifications of the system. Thus, the problem
now is to find a suitable sized network to achieve the required modelling er-
ror. Following variable neural networks (Liu et al., 1996b), a variable wavelet
network is introduced below.
5.4.1 Variable Wavelet Networks
Generally speaking, a variable wavelet network has the property that the num-
ber of wavelons in the network can be either increased or decreased over time
according to a design strategy. For the problem of nonlinear modelling, the
variable wavelet network is initialised with a small number of wavelons. As
observations are received, the network grows by adding new wavelons or is
pruned by removing old ones.
According to the multiresolution approximation theory, increasing the res-
olution of the network will improve the approximation. To improve the ap-
proximation accuracy, the growing network technique (Kadirkamanathan and
Niranjan, 1993; Liu et al., 1996c) is applied. This means that the wavelets at a
higher resolution need to be added to the network. Here it is assumed that at
the resolution 2N the approximation of the function f by the wavelet network
is denoted as j(N). Based on the growing network technique and the structure
J
of the function in (5.18), the adding operation is defined as
2n_l
J(x, u) = j(N)(x, u)ffi L L Bj!)kPti(x, u) (5.40)

kERN i=l
where ffi denotes the adding operation. Equation 5.40 means that wavelets at
the resolution 2N+1 are added to the network. To add new wavelons to the
network the following two conditions must be satisfied: (a) The modelling error
must be greater than the required accuracy. (b) The period between the two
adding operations must be greater than the minimum response time of the
adding operation.
The removing operation is defined as
(5.41)
where 8 denotes the removing operation. Equation 5.41 implies that wavelets
at the resolution 2N are removed from the network. Similarly, to remove some
old wavelons from the network, the following two conditions must be satisfied:
(a) The modelling error must be less than the required accuracy. (b) The pe-
riod between the two removing operations must be greater than the minimum
response time of the removing operation.
In both the adding and the removing operations, condition (a) means that
the change of the modelling error in the network must be significant. Condition
(b) says the minimum response time of each operation must be considered.
From the set 8(EN) which gives a relationship between the state error ex
and the modelling errors EN, it can be shown that the state error depends on
the modelling error. If the upper bound EN of the modelling error is known,
then the set 8 (EN ) to which the state error will converge is also known. How-
ever, in most cases the upper bound EN is unknown.
In practice, systems are usually required to keep the state errors within
prescribed bounds, that is,
(5.42)
where ei is the required accuracy. At the beginning, it is very difficult to know

how many wavelons are needed to achieve the above identification require-
ments. In order to find a suitable sized network for this identification problem,
lower and upper bounds are set for the state errors which are functions of time
t. A variable network such that
(5.43)
is then tried, where 1]f(t),r//(t) are mono decreasing functions of time t, re-
spectively. For example,
1]; (t) = e-~ut1]; (0) (5.44)
1]f(t) = e-~d1]f(O) (5.45)
where f3u, f3L are positive constants, 1]; (0), 1]f (0) are the initial values. It is
clear that 1]; (t), 1]f (t) decrease with time t. As t -+ 0, 1]; (t), 1]f (t) approach O.
Thus, in this way the state errors reach the required accuracy given in (5.42).
From the relationship between the modelling error and the state error and
given the lower and upper bounds 1];(t),1]f(t) + ei of the state errors the
corresponding modelling error should be
(5.46)
From (5.39) the area that the set 8(() covers is a hyperellipsoid with centre
.. " (5.4 7)
5.4 Identification Using Variable Wavelet Networks 111
It can also be deduced from the set 8(EN(t)) that the upper bound eu(t) and
the lower bound edt) are given by
(5.49)
Thus, given the upper and lower bounds of the state error, the corresponding
values for the modelling error can be estimated by (5.48) and (5.49).
5.4.2 Parameter Estimation
To smooth the identification performance when the adding and removing op-
erations are used, the decomposition and reconstruction algorithms of a mul-
tiresolution decomposition are applied to the initial calculation of the wavelet
coefficients. Here two important relations are introduced. First, since the fam-
ily {Pk} spans Va, then {p(2x - k)} spans the next filter scale VI = Va ttl Wa.
Both the scaling function and the wavelet function can be expressed in terms
of the scaling function at the resolution 2j = 21, i.e.,
(5.50)
kEN
tPi(x) = J2 L d~i)p(2x - k) (5.51)

kEN
where Ck and d~i) are known as the two scale reconstruction sequences. Second,
any scaling function p(2x) in VI can alternatively be written using the scaling
function p(x) in Va and wavelet function tP(x) in Wa as
(5.52)
where ak and b~i) are known as the decomposition sequences, and I E Nn.
In addition, in terms of multiresolution decompositions, the approximation
of the function f(x, u) at the resolution 2j can be written as
j(j)(x,u) = L A(j-l)kP(j-l)k(X,U)
kEAj-l
(5.53)
where
L A(j-l)kP(j-l)k (x, U) L A(j-2)k P (j-2)k (x, U)

kEAj-l kEAj-2
Hence, if the state error ex rf. 8(EU(t)), the network needs more wavelets. Add
the wavelets at the resolution 2N+1 into the network. Following the adding
operation (5.40) and the expression (5.53) of j(j) (x, u) at j = N, the structure
of the approximated function j(x, u) is of the form
(5.55)
The parameter vectors ANk and B}Yk are adapted by the laws (5.36) and (5.37).
U sing the sequences Ck and d~i), the initial values after the adding operation
are then given by the reconstruction algorithm (Mallat, 1989b) below.
(5.56)
(5.57)
where A(N-l)k and B~2-1)k are the estimated values before the adding oper-
ation.
If the state error ex E 8(EL(t)), some wavelets need to be removed because
the network may be overfitted. In this case remove the wavelets associated
with the resolution 2N. In terms of the removing operation (5.41) and the
expression (5.53) of j(j)(x,u) at j = N, the structure of the approximated
function j (x, u) is of the following form:
j(x,u) = L A(N-2)k P (N-2)k(X,U)

kEAN-2
The adaptive laws for the parameters A(N-2)k and B~2-2)k are still given
by (5.36) and (5.37). But, using the sequences ak and b~i), the initial values
5.5 Identification Using B-spline Wavelets 113
after the removing operation are then changed by the decomposition algorithm
(Mallat, 1989b) as follows:
A(N-2)k(0) = L al-2k A Nl (5.59)

IEAN-2
(5.60)
where ANk and B~~ are the estimated values before the removing operation.
Clearly, in both the above cases, the adaptive laws of the parameters are
still given in the form of (5.36) and (5.37), based on the above changed param-
eters. It also follows that the convergence area of the state error vector begins
with 8(c:u(0)) - 8(c:L(0)) and ends with 8(s), where s = c:u(oo).
The determination of the vector sets Aj and B j, for j = jo, ... , N is also
important but simple. The basic rule for choosing these sets is to make sure
2 j x - k, for k E Aj or Bj , is not out of the valid range of the variables of the
scaling function <1>(.) or !J)i (.), respectively.
5.5 Identification Using B-spline Wavelets
One of the main ingredients in wavelet network based identification is the

structure of the scaling function and the formulation of the wavelet decompo-
sition and reconstruction. For many applications, it will not be essential for
the wavelets to be orthonormal. Relaxing the orthonormality condition results
in nonorthogonal multiresolution approximations and provides a more flexible
framework for function approximation. A typical choice of scaling functions
would be B-splines. B-splines are piece-wise polynomial functions and have
good local properties. They are very simple to implement on the computer
and can be made as smooth as desired. For these reasons, B-splines have been
used widely in interpolation problems. In the present section therefore wavelets
which use B-spline functions as scaling functions are discussed.
5.5.1 One-dimensional B-spline Wavelets
For the sake of simplicity, one-dimensional B-spline wavelets will be considered

initially. The one-dimensional B-spline function of m-th order is defined by the
following recursive algorithm (Chui, 1992; Wang et ai., 1995)
x m-x
Bm(x) = --Bm-1(x)
m-1
+ --Bm-1(x
m-1
-1) (5.61)
for m E N+ \ {O, I}, where
xE(O,l)
(5.62)
otherwise
Let the m-th B-spline function be the scaling function, that is,
(5.63)
Then both the scaling function and the wavelets can be expressed in terms of
the scaling function at the resolution 2j = 21
m
(5.64)
k=O
3m-2
1fJ(x) = L dk B m(2x - k) (5.65)
k=O
where the two scale reconstruction sequences Ck and d k are given by (Chui,
1992)
Ck = 21 - m (~) (5.66)
dk = (_I)k21-m f (7)
z=o
B 2m (k + l-l) (5.67)
Also, the relationship between the scaling functions B m (2x) and Bm(x) and
the wavelet 1fJ (x) can be expressed as
Bm(2x -l) = L(az-2kBm(X - k) +b Z- 2k 1fJ(X - k)), lEN (5.68)

k
where the decomposition sequences ak and bk are given by (Chui, 1992)
1
ak = 2g-k (5.69)
1
bk = 2h_k (5.70)
and gk and hk are determined from the rational functions
G( Z) -- ~ '~gkZ
" k -_ Z-1 (I+Z)m E 2m - 1(Z) (5.71 )
2 k 2 E 2m - 1 (Z2)
(
H Z
) =~"'h"
2 ~ kZ
k= __ l(I-Z)m (2m-I)!
Z 2 E 2m _ 1(Z2) (5.72)
k
and E 2m - 1 is the Euler Frobenius polynomial of order 2m - 1.

It is clear from the above that based on the B-splines the scaling function
¢(x) and the wavelet 1fJ(x) are easily constructed using (5.63) and (5.65), and
the reconstruction sequences Ck, d k and the decomposition sequences ak, bk
are readily calculated from (5.66), (5.67), (5.69) and (5.70), respectively.
5.5 Identification Using B-spline Wavelets 115
5.5.2 n-dimensional B-spline Wavelets
The one-dimensional B-spline wavelet case will now be extended to the n-

dimensional B-spline wavelet case. The m-th B-spline function will still be the
one-dimensional scaling function, i.e., ¢(x) = Bm(x). Then from (5.7), the
n-dimensional scaling function is given by
n
p(x) = II Bm(xd (5.73)
1=1
Using the relations (5.63) and (5.64) gives

m m m n
(5.74)
which leads to
p(x) = I>kP(2x - k) (5.75)

k
where Ck = Ck, Ck 2 •• ,Ckn .

The wavelets tPi(x) are a combination of n functions from the function
set {¢(xd, ¢(X2), ... , ¢(x n ), ?j!(xd, ?j!(X2), ... , ?j!(x n )}. The wavelets tPi(x) can
similarly be expressed as the n-dimensional scaling function p(x). For example,
if tP2(X) = ?j!(Xd¢(X2)?j!(X3) ... ?j!(xn), then using (5.64) and (5.65) results in
3m-2 m 3m-2 3m-2 n
(5.76)
In this case,
tP2(X) = L d~2)p(2x - k) (5.77)

k
where d~2) = dk, Ck2 dk3 .. .dkn . Thus, all sequences {d~i)} can be calculated in
the same way as d~2).
With (5.68), the relationship between the scaling functions p(2x) and p(x)
and wavelets tPi (x) can be expressed as
n
p(2x -l) = II L(ali-2kiBm(Xi - ki ) + bli-2ki?j!(Xi - ki)), l E Nn (5.78)
i=1 ki
which results in the following compact form
(5.79)
where ak and bk can simply be calculated using ak and bk .

It is clear from the above that the scaling function, the wavelet, the re-
construction sequences and the decomposition sequences for the n-dimensional
case can be computed directly from the those obtained for the one-dimensional
case. Therefore, the structure of the scaling function and the formulation of
the wavelet decomposition and reconstruction for the wavelet network based
identification are completed using B-splines.
5.6 An Example
Consider a nonlinear system described by
(5.80)
where the input u = 0.5( cos(1.2t) sin(l. 7t) + exp( - sin( t 4 ))). Since n = 2,
we will need 2-D B-spline wavelets for the wavelet network to identify this
nonlinear dynamical system. Fourth order B-splines were used as the scaling
function. Thus, the 2-D scaling function is given by
(5.81)
where B4(') is the fourth-order B-spline, which is a piecewise cubic function.

For n = 2, there are three 2-D mother wavelets expressed by
tJf1 (x,u) = B 4(x)'ljJ(u) (5.82)

tJf2 (x, u) = 'ljJ(X)B4(U) (5.83)
tJf3 (x, u) = 'ljJ(x)'ljJ(u) (5.84)
where the one-dimensional mother wavelet 'ljJ(x) is
10
'ljJ(x) = {; ~
4
-=s- (4)
( l)k
I Bs(k + 1 -1)B4(2x - k) (5.85)
The 2-D scaling function p( x, u) and the three 2-D wavelets tJf1 (x, u), tJf2 (x, u),
tJf3 (x, u) are shown in Figures 5.1-5.4. The state x and the nonlinear function
f(x, u) (or the state derivative x) are shown in Figures 5.5 and 5.6, respectively.
Wavelet networks at the resolutions 2j , for j = 0,1,2,3 were used for the
identification with 16, 81, 146 and 278 wavelons, respectively. The state errors
and the modelling errors for different resolutions are shown in Figures 5.7-5.14.
All figures denoted (b) are larger-scale versions of the figures denoted (a).
As expected, at the beginning of the identification larger state errors and
modelling errors exist. After a while, these errors become smaller and smaller,
and finally they converge to certain ranges. It is clear from the simulation
results that the whole identification scheme is stable from the beginning to
the end. It has also been shown that the state error and the modelling error
5.6 An Example 117
decrease with increase in the resolution of the wavelet networks. But, the state
error and the modelling error are improved only slightly when the resolution
becomes adequate. Thus, for nonlinear dynamical system identification using
wavelet networks, a proper resolution should be chosen so as to achieve the
desired practical identification requirements.
0.5
0.4
0.1
o
4
4
-1 -1
x
Fig. 5.1. Scaling function p(x, u)
0.15
0.1
0.05
S 0
:6-
'iii
D..
-0.05
-0.1
-0.15
-0.2
4
3 6
2 5
4
3
o 2
-1 0
x
Fig. 5.2. Wavelet function Pl(X, u)

0.15
0.1
0.05
S" 0
x
~ -0.05
0..
-0.1
-0.15
-0.2
6
4 4
3
2 2
o
o
-2 -1
x
Fig. 5.3. Wavelet function l/h(x, u)
0.1
0.05
S"
:6- 0
'"'iii
0..
-0.05
-0.1
6
6
4 5
4
2 3
2
o 0
Fig. 5.4. Wavelet function P3(X, u)

5.6 An Example 119
0.2,-------,--------,--------,-------,-----,------,
0.1
~ -0.1
(ii
1i)
~ -0.2
~Q)
F -0.3
-0.4
-0.5
_0.6L----L----~----L----L---~---~
o 5 10 15 20 25 30
timet
Fig. 5.5. System state x(t)
0.5
0.4 A
A
0.3
S
~
0.2 A A A
1\ A
c
0 A
nc 0.1
A
.2
~
Q)
.!; 0
c
0
c
i!! -0.1 V V
f-
V V v
-0.2
V
-0.3 V
V v
V
-0.4
0 5 10 15 20 25 30
timet
Fig. 5.6. Nonlinear function f(x,u) (or the state derivative x)

0.1
0.05
eQ;
fil
11)
0
Q)
.c
f- -0.05
-0.1
0 5 10 15 20 25 30
(a) timet
0.03
0.02
eQ;
Q)
0.01
(ii
11)
Q)
0
.c
f-
-0.01
-0.02
0 5 10 15 20 25 30
(b) timet
Fig. 5.7. State error x(t) - x(t) at the resolution 2°
eQ; 0.5
Ol
§
Qi 0
"0
0
E
~ -0.5
f-
-1
0 5 10 15 20 25 30
(a) timet
0.1
g 0.05
Q)
OJ
.!';
Qi 0
"0
0
E
~ -0.05
f-
-0.1
0 5 10 15 20 25 30
(b) timet
Fig. 5.8. Modelling error f(x, u) - j(x, u) at the resolution 2°

5.6 An Example 121
0.1
0.05
eQ;
fil
11)
0
Q)
.c
f- -0.05
-0.1
0 5 10 15 20 25 30
(a) timet
x 10- 3
10
eQ; 5
Q)
'"
11)
Q)
.c 0
f-
_5L-----~------~------~-------L-------L--~--~
o 5
(b) timet
Fig. 5.9. State error x(t) - x(t) at the resolution 21
eQ;
Ol
§
Qi
"0
0
E
~ -1
f-
-2
0 5 10 15 20 25 30
(a) timet
0.1
g
Q)
OJ 0.05
.!';
Qi
"0
0
E 0
Q)
.c
f-
-0.05
0 5 10 15 20 25 30
(b) timet
Fig. 5.10. Modelling error f(x, u) - j(x, u) at the resolution 21

0.1
eQ;
fil
11)
-0.1
0 5 10 15 20 25 30
(a) timet
x 10- 3
5
eQ;
Q)
'"
11)
Q)
.c
0
f-
-5
0 5 10 15 20 25 30
(b) timet
Fig. 5.1l. State error x(t) - x(t) at the resolution 22
eQ; 2
Ol
§
Qi
"0
0
E -1
-3
0 5 10 15 20 25 30
(a) timet
0.1
g 0.05
Q)
OJ
.!';
Qi 0
"0
0
E
j!! -0.05
f-
-0.1
0 5 10 15 20 25 30
(b) timet

5.6 An Example 123
0.1
eQ;
fil
11)
-0.1
0 5 10 15 20 25 30
(a) timet
4 x 10- 3
2
eQ;
Q)
'"
11)
Q)
.c
0
f- -2
-4
0 5 10 15 20 25 30
(b) timet
Fig. 5.13. State error x(t) - x(t) at the resolution 23
eQ;
Ol
§
Qi
"0
0
E
-4
0 5 10 15 20 25 30
(a) timet
0.1
g 0.05
Q)
OJ
.!';
Qi 0
"0
0
E
j!! -0.05
f-
-0.1
0 5 10 15 20 25 30
(b) timet

5.7 Summary
A wavelet network based identification scheme has been presented for nonlinear
dynamical systems. Two kinds of wavelet networks, fixed and variable wavelet
networks, were studied. Parameter adaptation laws were derived to achieve
the required estimation accuracy for a suitable sized network and to adapt to
variations of the characteristics and operating points in nonlinear systems. The
parameters of the wavelet network were adjusted using laws developed by the
Lyapunov synthesis approach. The identification algorithm was performed over
the network parameters by taking advantage of the decomposition and recon-
struction algorithms of a multiresolution decomposition when the resolution
scale changes in the variable wavelet network. By combining wavelet networks
with Lyapunov synthesis techniques, adaptive parameter laws were developed
which guarantee the stability of the whole identification scheme and the con-
vergence of both the network parameters and the state errors. The wavelet
network identification scheme was realised using B-spline wavelets and the
calculation of the decomposition and reconstruction sequences using variable
wavelet networks was given. A simulated example was used to demonstrate
the operation of the identification scheme.
CHAPTER 6
NONLINEAR ADAPTIVE NEURAL CONTROL
6.1 Introduction
Neural networks are capable of learning and reconstructing complex nonlinear
mappings and have been widely studied by control researchers in the design
of control systems. A large number of control structures have been proposed,
including supervised control (Werbos, 1990), direct inverse control (Miller et
ai., 1990), model reference control (Narendra and Parthasarathy, 1990), inter-
nal model control (Hunt and Sbararo, 1991), predictive control (Hunt et aL,
1992; Willis et ai., 1992), gain scheduling (Guez et aZ., 1988), optimal deci-
sion control (Fu, 1970), adaptive linear control (Chi et aZ., 1990), reinforce-
ment learning mntrol (Anderson, 1989; Barto, 1990), indirect adaptive mntrol
(Narendra and Parthasarathy, 1990; Liu et aZ., 1999a) and direct adaptive con-
trol (Polycarpou and Ioannou, 1991; Sanner and Slotine, 1992; Karakasoglu et
ai., 1993; Sadegh, 1993; Lee and Tan, 1993). The principal types of neural net-
works used for control problems are the multilayer percept ron neural networks
with sigmoidal units (Psaltis et ai., 1988; Miller et ai., 1990; Narendra and
Parthasarathy, 1990) and the radial basis function neural networks (Powell,
1987; Niranjan and Fallside, 1990; Poggio and Girosi, 1990a).
Most of the neural network based control schemes view the problem as
deriving adaptation laws using a fixed structure neural network. However,
choosing network structure details such as the number of basis functions (hid-
den units in a single hidden layer) in the neural network must be done a prior'l,
which often leads to either an overdetermined or an underdetermined network
structure. The problem with these control schemes is that they require all
observations to be available and hence are difficult for on-line control tasks,
especially adaptive control. In addition, fixed structure neural networks often
need a large number of basis functions even for simple problems.
This chapter is concerned with the adaptive control of continuous-time
nonlinear dynamical systems using variable neural networks. In variable neural
networks, the number of basis functions can be either increased or decreased
with time according to specified design strategies so that the network will not
overfit or underfit the data set. Based on Gaussian radial basis function variable
neural networks, an adaptive control scheme is presented. 'iVeight adaptive laws
developed using the Lyapunov synthesis approach ensure the overall control
scheme is stable, even in the presence of modelling error. The tracking errors
between the reference inputs and outputs converge to the required accuracy
126 6. Nonlinear Adaptive Neural Control
through the adaptive control algorithm derived by combining variable neural

network techniques and adaptive laws. The operation of an adaptive control
scheme using the variable neural network is demonstrated using two simulated
examples.
6.2 Adaptive Control
Adaptive control is an extension and generalisation of classical off-line feedback

control synthesis. In adaptive control systems, all or some of the controller pa-
rameters are adjusted automatically in response to on-line process observations
or output residuals. Adaptive control uses both a priori knowledge about the
controlled process and the automatic incorporation of acquired outline knowl-
edge based on observations of the process. Adaptation of the controller in re-
sponse to feedback measurements from the priori unknown controlled process
is central to adaptive control research.
In the early 1950s, there was extensive research on adaptive control in
connection with the design of autopilots for high performance aircraft, which
operate over a wide range of speeds and altitudes. In the 1960s the development
of adaptive control was affected by many contributions to control theory, such
as state space, stability theory and dynamic programming. There was a renais-
sance of adaptive control in the 1970s when different estimation schemes were
combined with various design methods. In the late 1970s and early 1980s, the
stability of adaptive systems was proved under very restrictive assumptions.
Investigation into these assumptions sparked interesting research into the ro-
bustness of adaptive control. In the 1990s, with the development of neural
networks, nonlinear adaptive control has attracted a number of researchers.
A number of adaptive control schemes have been developed in the last fifty
years, for example, model reference adaptive control, self-tuning regulators,
stochastic adaptive control and so on. Here, an interesting direct method,
called Lyapunov's second method, is introduced to investigate the stability of
a solution to a nonlinear differential equation. The key idea is to find a real
function in state space whose level curves enclose the equilibrium such that
the derivative of the state variable always points towards the interior of the
level curves. Let the nonlinear differential equation be
i; = f(x, t), f(O, t) = 0 (6.1)
where x E nn is the state vector. It assumes that the function f(x, t) is

such that a solution exists for all t 2: to and that the equilibrium point is at
the origin. This assumption involves no loss of generality because this can be
achieved through a simple coordinate transformation. The Lyapunov stability
theorem can be stated as follows:
Theorem 6.2.1. Let the function V(x, t) : nn+1 -+ n satisfy the following
conditions:
6.2 Adaptive Control 127
(a) V(O, t) = ° Vt E R.
(b) V(x, t) is differentiable in x ERn and t E R.
(c) V(x, t) is positive definite.
A sufficient condition for uniform asymptotic stability of the system in
(6.1) is then that the function V(x, t) is negative definite.
The proof of the theorem can be found in Vidyasagar (1978). When applying
Lyapunov stability theory to an adaptive control problem, we will get a time
derivative of the Lyapunov function V(x, t), which depends on the control
signal and other signals in the system. If these signals are bounded, system
stability can be ensured by the condition that V is negative semidefinite.
To illustrate that the Lypunove stability theorem can be used to design an
adaptive control law that guarantees the stability of the closed-loop system,
consider a linear system described by
y(n)(t) + an_ly(n-l)(t) + ... + aly(l)(t) + aoy(t)

= bmu(m)(t) + bm_1u(m-l)(t) + ... + b1u(1)(t) + bou(t) (6.2)
where y E Rand u E R are the output and the input of the system, respec-
tively, y(i) is the i-th derivative of the output with respect to time, ai and bi
are the unknown coefficients of the system.
Also, it is assumed that the reference model is
y[;:)(t) + O!n_ly[;:-l)(t) + ... + O!ly~)(t) + O!oYm(t)

= f3mu(m)(t) + f3m_lU(m-l)(t) + ... + f31U(1)(t) + f3ou(t) (6.3)
where Ym E R is the output of the model, O!i and f3i are the known coefficients
of the model.
Let the error be defined as
e(t) = Ym(t) - y(t) (6.4)
Subtracting (6.3) from (6.2) results in the following error differential equation:
n-l n-l m
(6.5)
i=O i=O j=o
where for i = 0,1, ... , n - 1 and j = 0,1, ... , m,

ai = ai - O!i (6.6)
bj = f3j - bj (6.7)
The state space form of the above system can be expressed as
x(t) = Ax(t) + L1a + L1b (6.8)

e(t) = Xl(t) (6.9)
where the state vector x E nn and the system matrices are
(6.10)
A = [ 0 (6.11)
-ao
(6.12)
Llb = [ 0 ... 0
m
~ b~u
~. (j)
(t)
1
T
(6.13)
and I n - 1 is an (n - 1) x (n - 1) identity matrix.

Assuming that all state variables of the system are measurable, a Lyapunov
function can be chosen as
v = ~(xTPx+a7 Aa+Frb) (6.14)
where A E nnxn and r(m+1) x (m+1) are the weighting matrices that are posi-
tive definite and are of a diagonal form:
A = diag[ Ao A2 (6.15)
r = diag[ /0 /1 (6.16)
P E nnxn is a symmetrical positive definite matrix and satisfies

(6.17)
Q E nnxn is a symmetrical positive definite matrix.

Taking derivative V with respect to time t gives
V = ~xT(PA+ATP)X+ ~ai (Ai~i+ (~XkPkn) y(i) (t))

+ ~bj (/jb j + (~XkPkn) uU)(t)) (6.18)
Since the coefficients ai and f3 j of the model are constant, ~i = ai and bi = - bi .

If the adaptive laws are chosen as
ai = -Ail (txkPkn) y(i)(t) (6.19)

k=l
bj = /;1 (tXkPkn) uU)(t) (6.20)

k=l
6.3 Adaptive Neural Control 129
for i = 0, 1, ... , n - 1 and j = 0,1, ... , m, then
(6.21)
is negative. Thus the function V will decrease as long as the error x is different
from zero. It can be calculated that the error will go to zero. This means that
the closed loop adaptive control system is stable.
6.3 Adaptive Neural Control
There remain a number of unsolved problems in nonlinear system control. In

particular, the design and implementation of adaptive control for nonlinear
systems is extremely difficult. In most cases the adaptive control strategies de-
veloped largely depend on the particular information on the nonlinear structure
of the plant to be controlled. The recent intensively studied neural networks
bring a new stage in the development of adaptive control for unknown non-
linear systems. If the relationship between the input and the output of an
unknown nonlinear plant is modelled by an appropriate neural network, the
model obtained can be used to construct a controller through any nonlinear
control design methods, e.g., inverse model synthesis, model reference and
label-feature generation. The whole procedure of the training and construc-
tion of the controller can be implemented on-line. The neural network model
is updated by measured plant input and output data and then the controller
parameters are directly adapted using the updated model. The general struc-
ture of a neural network based adaptive control system is shown in Figure
6.l.
Consider a class of continuous nonlinear dynamical systems which can be
expressed in the canonical form (Isidori, 1989; Nijmeijer and Schaft, 1990;
Slotine and Li, 1991):
y(n)(t) + F(y(n-l)(t), ... ,y(1)(t),y(t)) = G(y(n-l)(t), ... ,y(l)(t),y(t))u(t) (6.22)

where y(t) is the output, u(t) the control input, and F(.) and G(.) unknown
nonlinear functions. The above system represents a class of continuous-time
nonlinear systems, called affine systems. The above equation can also be trans-
formed to the state-space form
i; = Ax - bF(x) + bG(x)u (6.23)

(6.24)
where
(6.25)
Fig. 6.1. Neural network based adaptive control
b = [0,0, ... , IV and I n - 1 is an (n - 1) x (n - 1) identity matrix, and x =

[X1,X2,""X n V is the state vector.
With the use of Gaussian activation functions, each basis function in the
RBF network responds only to inputs in the neighbourhood determined by
the centre and width of the function. It is also known that if the variables of a
nonlinear function are bounded, the continuous function can be approximated
well by GRBF networks. Here, the GRBF networks are used to model the
nonlinearity of the system.
If Xi is not in a certain bounded area, we introduce the following one-to-one
(1-1) mapping (Liu et at., 1995):
_ bxiXi
Xi = I I for i = 1,2, ... ,n (6.26)
Xi + axi
where axi, b xi are positive constants, which can be chosen by the designer
(e.g., axi,b xi are 1). Thus, it is clear from (6.26) that Xi E [-bxi,bxiJ for Xi E
(-00, +(0). The above one-to-one mapping shows that in the n-dimensional
space the entire area can be transferred into an n-dimensional hypercube.
Clearly, if X is already in the bounded area, we only need to set X = x.
Based on a variable grid with m subgrids, which was detailed in Section 2.2,
the nonlinear function approximated by the GRBF network can be expressed
by
m mi
G(x)u - F(x) = L L(fi~j + g7+ j u )zp(x; ci+j, di+j) + c:(K) (6.27)

i=l j=l
where
m
K=Lmi (6.28)
i=l
i.p(X;Ck,dk)=exp{-:~llx-CkI12}, for k=1,2, ... ,K(6.29)
ci+j is the j-th element of the set Ci , mi the number of its elements, ft+j and
gi+ j the optimal weights, x = [Xl, X2, .•. , xnf the variable vector,
Ck the k-th
centre, d k the k-th width, c(K) the modelling error, and K the number of
basis functions. The nonlinear function G(x)u - F(x) approximated by neural
networks is shown in Figure 6.2. So, the next step is how to obtain estimates
of the weights.
G(x) u-F(x)
Fig. 6.2. Modelling of the nonlinear function G(x)u - F(x) using neural networks
Thus the nonlinear part G(x)u - F(x) of the system can be described by the
following compact form:
G(x)u - F(x) = (g*(K)u - j*(K)f <fJ(x, K) + c(K) (6.30)
where
(6.31)
j*(K) = [J{,f{, ... ,fkf (6.32)
(6.33)
It is known from approximation theory that the modelling error can be
reduced arbitrarily by increasing the number K, i.e., the number of linear
independent basis functions i.p(x; Ci, di ) in the network model. Thus, it is rea-
sonable to assume that the modelling error c(K) is bounded by a constant CK,
which represents the accuracy of the model and this is defined as
CK = sup Ic(K)1 (6.34)

tER+
Although CK can be reduced arbitrarily by increasing the number of indepen-

dent basis functions, generally when the number is greater than a small value
the modelling error CK is improved very little by increasing the number fur-
ther. It also results in a large sized network even for a simple problem, which
is not realistic in practice. In most cases, the required modelling error can
be obtained by considering the design requirements and specifications of the
system. Thus, the problem now is to find a suitable sized network to achieve
the required modelling error. In other words, it is how to determine the num-
ber, centres, widths and weights of the Gaussian radial basis functions in the
network.
The stability of the overall control scheme is an important issue in the
design of the system. The overall stability depends not only on the particular
control approach that is chosen but also on the control laws that are used. In
practice, one of the design objectives for a system is that the tracking error
between inputs and outputs should converge to the required accuracy. Those
problems are solved here by developing a stable adaptive control law based on
Lyapunov stability techniques and the variable GRBF network discussed in
Chapter 2.
We assume that the basis functions ip(x; Ck, d k ) for k = 1,2, ... , K are
given. The control objective is to force the plant state vector x to follow a
specified desired trajectory Yd = [Yd, y~l), ... , y~n-l)V. The tracking error vec-
tor and the weight error vectors, respectively, are defined as
e x - Yd (6.35)
j(K) j*(K) - f(K) (6.36)
g(K) g*(K) - g(K) (6.37)
where f(K) and g(K) are the estimated weight vectors. From (6.23), it can
be shown that
Ax + b(gT(K)u - fT(K))P(x; K)
+b(gT (K)u - jT (K))P(x; K) + bc(K) (6.38)
Hence, from (6.35)-(6.38), the dynamic expression of the tracking error is
Ae - by~n) + b(gT (K)u - F (K) )p(x; K)

+b(gT(K)u - jT(K))P(x; K) + bc(K) (6.39)
One approach to this problem is to take the control input satisfying
(6.40)
where the vector a = [al, a2, ... , anV makes the following matrix
1 0
o 1
(6.41 )
stable, i. e., all the eigenvalues are in the open left plane. The control in-
put consists of a linear combination of the tracking errors aTe, the adaptive
part fT(K)p(x,K) which will attempt to estimate and cancel the unknown
function F(.), and y~n) is a feedforward of the n-th derivative of the desired
trajectory.
Consider the following Lyapunov function
(6.42)
where P is chosen to be a positive definite matrix so that the matrix Q =

- P Aa - A~ P is also a positive definite matrix, and 0: and (3 are positive
constants which will appear in the adaptation laws, also referred to as the
learning or adaptation rates. Using (6.39), the derivative of the Lyapunov
function V with respect to time is given by
V(e,J, g) -eTQe + 2P;; e((gT(K)u - jT(K))p(x; K) + c(K))

+20:- 1jT(K)j(K) + 2(3-1gT(K)g(K) (6.43)
where the vector P n is the n-th row ofthe matrix l!, i. e., P n = [Pn1, Pn2, ... , Pnn].
Since 1* is a constant vector, we have that j = - j, similarly, g = -g. If
there is no modelling error, i.e., c(K) = 0, the weight vectors f and g can
simply be generated according to the following standard adaptation laws:
j(K) = -o:P;; ep(x; K) (6.44)

g(K) = (3P;; eup(x; K) (6.45)
In the presence of a modelling error c( K), to ensure the stability of the system,
many algorithms, e.g., the fixed or switching O"-modification (Ioannou and
Kokotovic, 1983; Ioannou and Tsakalis, 1986), c-modification (Narendra and
Annaswamy, 1987), the dead-zone methods (Narendra and Annaswamy, 1989;
Sastry and Bodson, 1989) and projection algorithm (Goodwin and Mayne,
1987; Ioannou and Datta, 1991; Polycarpou and Ioannou, 1991), can be applied
to modify the above standard adaptation laws.
Define the following sets:
(6.4 7)
92 = {g : Ilgll = M2 and P;; eug T p(x; K) > O} (6.49)

where M1 and M2 are positive constants.

Here, in order to avoid parameter drift in the presence of modelling error,
application of the projection algorithm gives the following adaptive laws for
the parameter estimates f and 9 (Liu et ai., 1999a):
-aPJ' e(x; K) if f(K) E F1

j(K) = {
-aPJ' e(x; K) + aM12 PJ' efT(K)(x; K)f(K) if f(K) E F2
(6.50)
/3PJ' eu(x; K) if g(K) E Q1

g(K) = { .
/3PJ' eu(x; K) - /3M.;2 PJ' eugT(K)(x; K)g(K) if g(K) E Q2
(6.51)
It is clear that if the initial weights are chosen such that f(K,O) E F1 U F2
and g(K,O) E ~h U Q2, then the weight vectors f and 9 are confined to the
sets F1 UF2 and Q1 UQ2, respectively. With use of the adaptive laws (6.50)
and (6.51), Equation 6.43 becomes
n
V(e,J,g) ::; -eTQe + 2 L IPnilleilcK (6.52)
i=l
For the sake of simplicity, the positive definite matrix Q is assumed to be
diagonal, i.e., Q = diag[q1,q2, ... ,qn], where qi > 0, for i = 1,2, ... ,n. Also
define
(6.53)
where ( is a positive variable, i.e., ( :2: O.

If there is no modelling error (i.e., CK = 0), Equation 6.52 can be written
as
n
V(e,J,g)::; - Lqie; (6.54)
i=l
The above clearly shows that V is negative semidefinite. Hence the stability
of the overall identification scheme is guaranteed and
e -+ 0, j -+ 0, g -+ 0 (6.55)
On the other hand, in the presence of modelling error, Equation 6.52 can be
expressed as
. ~ ~ n
V(e,f,g)::; - Lqi
( IPni I
leil- - . cK
)2 + Ln -2. -2
PnicK
(6.56)
i=l q~ i=l q~
6.4 Adaptation Algorithm with Variable Networks 135
It is easy to show from the above that if e rf- 8(EK), V is still negative and
the tracking errors will converge to the set 8(EK)' But, if e E 8(EK), it is
possible that V > 0, which implies that the weight vectors f(K) and g(K)
may drift to infinity over time. The adaptive laws (6.50) and (6.51) avoid this
drift by limiting the upper bounds of the weights. Thus the tracking error
always converges to the set 8(EK) and the overall control scheme will remain
stable in the case of modelling error.
6.4 Adaptation Algorithm with Variable Networks
The set 8(EK), which gives a relationship between the tracking and modelling
errors, indicates that the tracking error depends on the modelling error. If the
modelling error EK is known, then the set 8(EK) to which the tracking error
will converge can be worked out. However, in most cases the upper bound EK
is unknown.
In practice, control systems are usually required to keep the tracking errors
within prescribed bounds, that is,
leil :::; EiO, for i = 1,2'00" n (6.57)
where EiO is the required accuracy. At the beginning, it is very difficult to

know how many neural network units are needed to achieve the above control
requirements. In order to find a suitable sized network for this control problem,
first set lower and upper bounds for the tracking errors, which are functions
of time t, and then try to find a variable network such that
leiIE[Llf(t), Llf(t)+EiO], for i=1,2,oo.,n (6.58)
where Llf (t) and Llf (t) are mono decreasing functions of time t, respectively.
Those bounds are usually defined as
Llf(t) = f3&Llf(O) (6.59)

Llf(t) = f3iLlf(O) (6.60)
where f3u and f3L are constants and less than 1, Llf (0) and Llf (0) are the initial
values. It is clear that Llf (t) and Llf (t) decrease with time t. As t -+ 00, Llf (t)
and Llf(t) approach O. Thus, in this way the tracking errors reach the required
accuracy given in (6.57).
The relationship between the modelling error and the tracking error shows
that, given the lower bound Llf (t) and upper bound Llf (t) +EiO of the tracking
errors, the modelling error corresponding to the above should be
EK(t) E [Edt), EU(t)] (6.61)
Since the area that the set 8( () covers is a hyperellipsoid with the centre
... , (6.62)
it can be deduced from the set 8(cK(t)) given by (6.53) that the upper bound
cu(t) and the lower bound cdt) are given by
cdt) =. ~ax
z=1,2, ... ,n
(IPnil
qi
+ (t P;j) 0.5)_0.5
.
J=1
qiqJ·
L1f(t) (6.63)
cU(t) = . ~in
z=1,2, ... ,n
(IPnil
qi
+ (t P;j ) 0.5) -0.5
j=1 qiqj
(L1f (t) + ciO) (6.64)
It has been shown in Section 2.2 that a variable GRBF network is determined
by a set of parameters, which are as follows:
(a) the possible centre set Pi provided by the i-th order subgrid,
(b) the chosen centres from the centre set Pi,
(c) the total number K of the network units,
(d) the radius 0" i of the i-th hypersphere corresponding to the i-th su bgrid to
choose the centres of the basis functions,
(e) the edge length 6i of the hypercube corresponding to the i-th subgrid,
(f) the width d i of the basis functions associated with the i-th subgrid.
Hence, if the tracking error e ~ 8(cu(t)), the network needs more basis func-
tions. Add the (m+ l)-th order subgrid to the grid. The parameters associated
with the GRBF units are then changed as follows:
O"m+1 = 'Y1O"m (6.65)

6m +1 = 126m (6.66)
dm + 1 = 13 d m (6.67)
m+1
P= U Pi (6.68)
i=1
m+1
C= U Ci (6.69)
i=1
m+1
K= Lmi (6.70)
i=1
where Ii, for i = 1,2,3, is a constant and less than 1.
But, if the tracking error e E 8(cdt)), the network needs to remove some
basis functions. Just remove the units associated with the m-th subgrid. The
parameters associated with the GRBF units are then changed as follows:
6.5 Examples 137
m-l
P= U Pi (6.71)
i=l
m-l
C= UC i (6.72)
i=l
m-l
K= Lmi (6.73)
i=l
In both above cases, the adaptive laws of the weights are still given in the
form of (6.50) and (6.51), based on the above changed parameters. For the
two-dimensional case, the convergence area is shown in Figure 6.3. At the be-
ginning, the convergence area of the tracking area is Eo. Finally it approaches
the expected convergence area E, that is, leil :S CiD, for i = 1,2.
Fig. 6.3. Two-dimensional convergence area
6.5 Examples
This section considers two examples. The first is concerned with adaptive con-
trol of a time-invariant nonlinear system. The second considers adaptive con-
trol of a time-variant nonlinear system.
Example 6.1
The dynamical system used in the simulation example is given by (Sanner and
Slotine, 1992)
ii - 4 (sin~:7fY)) Cin;;y) r
= (2 + sin(37fY -
which is a second-order time-invariant nonlinear system.
1.57f))u (6.74)
The parameter values used in this example are as follows: the reference
input Yd = sin( t); the initial value of the output y(O) = 0.5; the initial value
of the output derivative y(O) = 0; the required accuracy of the tracking error
vector [ElO, E20] = [0.05,0.1]; the constants f3u = f3L = 0.96; the initial val-
ues Llf (0) = 0.005, Llf(O) = 0.05, for i = 1,2; the required minimum angle
between the GRBFs COS(Bmin) = 0.951; the edge length of the rectangles in
the first subgrid is (h = 0.5; the radius of centre selection in the first subgrid
0"1 = 0.99; the width of the GRBF units corresponding to the first subgrid
d1 = 1.11; activation threshold 6m in = 0.45; the initial number of variable
networks is 45; vector a = [1,1]; matrix P = [[0.75, 0.5jT, [0.5, IjT]; adaptation
rates 0: = 1.5 and f3 = 3.
The parameters associated with the variable network are
6i = 0.6186i-1 (6.75)
O"i = 0.6180"i-1 (6.76)
d i = 0.618di - 1 (6.77)
for i = 2,3, ... , m. The maximum of m (the number of subgrids) is limited to
be 11.
The weights are adaptively adjusted by the laws (6.50) and (6.51). The
adaptive control law is given by (6.40). The results of the simulation are shown
in Figures 6.4-6.6. Though the difference between the system output and the
desired output is very large at the beginning, the system is still stable and the
tracking error asymptotically converges to the expected range, which is also
shown in Figure 6.5. As can be seen from Figure 6.6, the number of GRBF
units in the neural network also converges in a period of time.
Example 6.2
Consider a time-variant nonlinear dynamical system given by
ii - (8e- O.05t _ 4) (sin~:7fY)) (sin;;y)) 2 = (2 + cos(O.lt) sin(37fY - 1.57f))u
This plant is different from that in Example 6.1. The functions F(.) and G(.) in
Example 6.1 are time-invariant nonlinear functions. While, here the functions
F and G are time-variant.
All parameter values, the structure of variable networks, the weight learning
laws, and the adaptive control laws used in this example are exactly the same as
Example 6.1. The tracking error between the reference input and the output of
the system is shown in Figure 6.7. Although the plant to be controlled is time-
variant, the convergence of the tracking error in this example is still similar
to that in Example 6.1. This shows that the adaptive control scheme using
variable neural networks also works well for time-variant nonlinear systems.
6.5 Examples 139
.~ 1.5
g
~
<J)
i5
-g
'"
"5 -0.5
Q.
~ -1
<J) _ the system output --- the reference input
F -15o"--------'------1-"-o----1"-5----2LO---~25----30
time t (sec)
.~ 3
-g
'=>"
%
o
'0
~ -1
~ -2 __ the system output derivative --- the reference input derivative

~-3"--------'-------"-----"------L---~----
o 10 15 20 25 30
time t (sec)
Fig. 6.4. Reference input Yd(t), output y(t), reference input derivative Yd(t) and
output derivative y(t) of the system
0.5
0.4
0.3
2
Q;
g'
~ 0.2
jg
~
f-
0.1
-0.1 L -_ _ _-'---_ _ _-"--_ _ _-'--_ _ _--'-_ _ _- - L_ _ _~
o 10 15 20 25 30
time t (sec)
Fig. 6.5. Tracking error y(t) - Yd(t) of the system

100~
~
90
80
II~
I
J'l 70
"§
LL
'"a:CD
60
I
a 50
:v
.0
~ 40
Jl'
>- 30
20
10
0
0 10 15 20 25 30
time t (sec)
Fig. 6.6. The number K of GRBF units in the variable neural network
0.5
0.4
0.3
~
0>
C
'"~ 0.2
1='"
0.1
-0.1 '-----------'------'------"----------"-------'----------'
o 10 15 20 25 30
time t (sec)
Fig. 6.7. Tracking error y(t) - Yd(t) of the system

6.6 Summary 141
6.6 Summary
Nonlinear adaptive neural control has been studied in this chapter. After the
introduction of adaptive control for linear continuous-time systems, adaptive
neural control was presented by combining the variable Gaussian radial ba-
sis function network and Lyapunov synthesis techniques. This guarantees the
stability of the control system and the convergence of the tracking errors.
The number of GRBF units in the variable neural network also converges by
introducing mono decreasing upper and lower bounds on the tracking errors.
Simulation examples illustrate the operation of the variable neural network for
adaptive nonlinear system control.
CHAPTER 7
NONLINEAR PREDICTIVE NEURAL CONTROL
7.1 Introduction
Predictive control is now widely used by industry and a large number of imple-
mentation algorithms, including generalised predictive control (Clarke et al.,
1987), dynamic matrix control (Cutler and Ramaker, 1980), extended predic-
tion self-adaptive control (Keyser and Cauwenberghe, 1985), predictive func-
tion control (Richalet et al., 1987), extended hori:wn adaptive control (Ydstie,
1984) and unified predictive control (Soeterboek et al., 1990), have appeared
in the literature. Most predictive control algorithms are based on a linear
model of the process. However, industrial processes usually contain complex
nonlinearities and a linear model may be acceptable only when the process is
operating around an equilibrium point. If the process is highly nonlinear, a
nonlinear model will be necessary to describe the behaviour of the process.
Recently, neural networks have been used in some predictive control al-
gorithms that utilise nonlinear process models (Hunt et al., 1992; Willis et
al., 1992; Liu and Daley, 2001). Alternative design of nonlinear predictive
control algorithms has also been studied (McIntosh et al., 1991; Morningred
et al., 1991; Proll and Karim, 1994; Liu et al., 1996a, 1998b). However, in
most algorithms for nonlinear predictive control their performance functions
are minimised using nonlinear programming techniques to compute the future
manipulated variables in on-line optimisation. This can make the realisation
of the algorithms very difficult for real-time control.
This chapter considers neural network based affine nonlinear predictors so
that the predictive control algorithm is simple and easy to implement. The use
of nonlinear programming techniques to solve the on-line optimisation problem
is avoided and a neural network based on-line weight learning algorithm is
given for the affine nonlinear predictors. It is shown that using this algorithm,
both the weights in the neural networks and the estimation error converge and
never drift to infinity over time.
The chapter is organised as follows. Section 7.2 gives a brief introduction
to linear predictive control. Section 7.3 presents the structure of the affine
nonlinear predictors using neural networks. The predictive neural controller
is described in Section 7.4. Section 7.5 develops the on-line weight learning
algorithm for the neural networks used for the predictors and includes analysis
of the properties of the algorithm. The design of nonlinear predictive control
using 'growing' neural networks is illustrated in Section 7.6. Finally, Section
144 7. Nonlinear Predictive Neural Control
7.7 gives a simulated example to show the operation of the neural network
based predictive control.
7.2 Predictive Control
Based on an assumed model of the process and on assumed scenario for the
future control signals, predictive control gives a sequence of control signals for
discrete systems. Only the first control signal is applied to the process and
a new sequence of control signals is calculated when new measurements are
obtained. For continuous systems, the predictive control concept is also sim-
ilar. Clearly, predictive control belongs to the class of model-based controller
design concepts, where a model of the process is explicitly used to design the
controller.
One of the important features of predictive control is that its controller is
relatively easy to tune. This makes predictive control very attractive to a wide
class of control engineers and even for people who are not control engineers.
Predictive control has other features as follows:
(a) The predictive control concept can be used to control a wide variety of
processes without taking special precautions, for example, 8180 or M1MO
processes, stable or unstable processes, minimum or non minimum phase
processes, and linear or nonlinear processes.
(b) Predictive control can handle process constraints in a systematic way dur-
ing the design of the controller, which is rather important for industrial
process control.
(c) Within the framework of predictive control there are many ways to design
predictive controllers, for example, generalised predictive control, dynamic
matrix control, and unified predictive control.
(d) Feedforward control action is introduced to predictive control in a nat-
ural way to compensate measurable disturbances and to track reference
trajectories.
(e) Predictive control can easily deal with pre-scheduled reference trajectories
or set points of processes by making use of prediction.
The way predictive controllers operate for single-input single-output systems is
illustrated by Figure 7.1. It shows that the control sequences 1 and 2 designed
using the past input output data produce different output sequences 1 and
2, respectively. This implies that if the future controller sequence is planned
correctly at time t the system output will be very close to or exactly the desired
reference trajectory. Predictive controllers are usually used in discrete time.
It is also possible to design predictive controllers for use in continuous time.
This section gives a brief introduction to predictive control for linear discrete
systems.
Let us consider the following single-input single-output discrete-time linear
system:
7.2 Predictive Control 145
Past Future
y 2
time t
Fig. 7.1. Past and future of output Yt and control input Ut
(7.1)
where Yt is the measured variable or output, Ut is the control input, d is the

time delay, A, Band L1 are polynomials in the backward shift operator q-l:
A = 1 + alq-l + ... + anq-n (7.2)

B = bo + b1q-l + ... + bmq-m (7.3)
L1 = 1 _ q-l (7.4)
nand m are the orders of the polynomials.

In order to define how well the predicted process output tracks the reference
trajectory, there are many cost functions available in predictive control. Here
we use a cost function which is of the following quadratic form:
(7.5)
where
RHLI [rt+d,rt+d+l, .... ,rt+L,]T (7.6)

A]T
Yt+Ll [
,
Yt+d, Yt+d+l, .... , Yt+L
A A
(7.7)
L1UH M, [L1U t, L1UH1, .... , L1UHMl f (7.8)
R t +L" rt+L " i1Ut +M, are vectors of the future reference input Tt, predicted
output Yt and control input Ut, respectively, L1 = d + L - 1, M1 = M - 1, L
the output horizon, M the control horizon and a the weight.
The future reference input is the desired process output, which is often
called the reference trajectory, and can be an arbitrary sequence of points.
Then the predictive controller calculates the future controller output sequence
so that the predictive output of the process is close to the desired process
output.
Now the optimal controller output sequence u* over the predictive horizon
is obtained by minimisation of the performance function Jp with respect to u,
that is
u* = argminJp (7.9)
u
Basically speaking, calculating the control output sequence is an optimisa-

tion problem or, more specifically, a minimisation problem. Usually, solving a
minimisation problem requires an iterative procedure.
In order to predict the process output over the prediction horizon, a k-step-
ahead predictor is required. A k-step-ahead prediction of the process output
must be a function of all data upto t (defined as the set D), the future con-
troller output sequence u and the model of the process 9. Such a k-step-ahead
predictor can thus be described by
YtH = f(u, D, 9) (7.10)
where f(.) is a function. It is clear that the k-step-ahead predictor depends

heavily on the model of the process.
The process output at t+k based on (7.1) can be obtained by substituting
t + k for k, that is,
B
YtH = i1A i1Ut-dH (7.11)
Thus the k-step-ahead predictor can be constructed as
, B A
Yt+k = i1A LJ.Ut-d+k (7.12)
The above equation can also be written as
YtH = Bi1Ut-dH - q(i1A - 1)YtH-1 (7.13)
Now the output y(t + k) for k :::: 1 can be computed recursively using (7.13),
starting with the following equation for k = 1:
Yt+1 = Bi1Ut-d+1 - q(i1A - l)Yt (7.14)
The k-step-ahead predictor (7.13) and (7.14) runs independently ofthe process.
This predictor is not suitable for practical purposes because there always exist
7.2 Predictive Control 147
differences between the prediction and the real process output. For example,
model mismatch or a disturbance at the output of the process may result
in a prediction error. One way to improve the predictions is to calculate the
predictions using (7.13) and (7.14) with fit in the right-hand side of (7.14)
replaced by the measured process output Yt. Thus equation (7.14) becomes
(7.15)
By extending the above predictor, a d + k-step-ahead predictor, which uses

sequences of both past inputs and outputs of the process upto the sampling
time t to construct the predictive model, is of the form below.
Yt+dH = EkYt + HBL1uHk, for k = 0,1,2, ... L - 1 (7.16)
where the polynomials Ek and Fk satisfy the Diophantine equation
(7.17)
Several methods can be used to solve the above equation, for example, a re-
cursive approach (Clarke et al., 1987).
The optimal controller output sequence over the prediction horizon is ob-
tained by minimising the performance index J p with respect to the control
input vector. This can be carried out by setting
(7.18)
In predictive control the assumption is made that all the future control incre-
ments L1Ut+i, for i < M is non-zero. Since, in practice, the control horizon in
predictive control need not be taken to be large, here set M = 2. Let
d+m+k-l
Pk = BFk = L Pk,iq
-i
(7.19)
i=O
Thus, the predictors can also be expressed by
(7.20)
where
d+m+k-l
Qk = EkYt + L Pk.iq-i L1Ut-l (7.21 )
i=k+l
gk = Pk,k-l (7.22)
hk = Pk,k (7.23)
with P k .- 1 = O.
Application of (7.18) results in the following predictive controller
[~ r[ f[
£-1 £-1 £-1
ex + 2:= g~ 2:= gkhk 2:= h+dH - Qk)gk
Ut = Ut-1 + k=O k=O k=O
£-1 £-1 £-1
k=O
2:= gk hk ex + 2:=
k=O
h~ 2:=
k=O
(rHdH - Qk)hk
1
(7.24)
It is clear from the above that the predictive controller only involves the in-
version of a 2 x 2 matrix. This makes the implementation of the predictive
control very easy.
7.3 Nonlinear Neural Predictors
It has been shown in the previous section that the fundamental idea in predic-
tive control is to predict the vector of future tracking errors and minimise its
norm over a given number of future control moves. It is therefore clear that
predictive controller design mainly consists of two parts: prediction and min-
imisation. This section discusses the prediction part. The minimisation part
will be considered in the next section.
Only discrete-time affine nonlinear control systems will be considered with
an input output relation described by
Yt = F(Yt) + G(Yt)Ut-d (7.25)
where F(.) and G(.) are nonlinear functions, Y is the output and U the control
input, respectively, the vector Yt = [Yt-1, Yt-2, ... , Yt-n], n is the order of Yt
and d is the time delay of the system. It is assumed that the order n and the
time delay d are known but the nonlinear functions F(.) and G(.) are unknown.
Clearly, the future output can generally be expressed by the N ARMA model
(Leontaritis and Billings, 1985; Narrendra and Mukhopadhyay, 1997)
(7.26)
for i = 0, 1, ... , L, where F(.) is a nonlinear function.

Though the model (7.26) is an exact prediction of the nonlinear plant
(7.25), this is not convenient for the computation of a control input to track
a desired reference signal even for i = 0. Basically, there are two kinds of
approximate predictors which can be used to predict the future output of
nonlinear systems. One is the recursive predictor and the other is the nonre-
cursive predictor. Here, the latter is used to compensate for the influence ofthe
time delay. Following the affine nonlinear system described by (7.25) and the
general future output expression (7.26), we present some (i + d)-step-ahead
nonrecursive affine nonlinear predictors, for i = 0,1, ... , L, to approximately
predict the future output. These predictors use available sequences of both
past inputs and outputs of the process upto the sampling time t to construct
the predictive models, which are of the following form:
7.3 Nonlinear Neural Predictors 149
Yt+d+i = Fi(Xt) + L Gij(Xt)Ut+j (7.27)

j=O
for i = 0, 1, ... , L, where Fi(Xt) and G ij (Xt) are nonlinear functions ofthe vector
Xt to be estimated, and the vector Xt = [Yt, Yt-l, ... , Yt-n+l, Ut-l, Ut-2, ... ,
Ut-dl. The key feature of these predictors is that the present and future control
inputs Ut, Ut+l, ... , uHi occur linearly in (7.27). It can be seen from (7.27)
that linearised predictors for nonlinear system which are widely used in the
literature (see, e.g., Wang et al., 1995; Xie and Evans, 1984) are a special case
of the above.
Due to the arbitrary approximation feature of neural networks, the nonlin-
ear functions Fi(xt) and Gij(Xt) can both be approximated by single hidden
layer networks. This is expressed by
Ni
Fi(xt) = L fi,k'Pi,k(Xt) (7.28)
k=l
N ij
Gij (Xt) = L 9iJ.k'Pij.d x t) (7.29)

k=l
for j :S i and i, j = 0,1, ... , L, where 'Pi,dxt) and 'Pij.k (Xt) are basis functions
of the networks, Ni and N ij denote the size of the networks. Define the weight
and basis function vectors of the neural networks as
Pi = [fi,l fi,2 (7.30)

G ij = [gij,l gij,2 gij,Nij f (7.31)
<Pi = ['Pi,l (Xt) 'Pi,2(Xt) 'Pi,Ni (Xt) f (7.32)
<Pij = ['Pij.dXt) 'Pij.2(Xt) 'Pij.Nij (Xt) f (7.33)
The neural network based predictors can then be rewritten as

i
Yt+d+i = Pl <Pi + L G~<PijUt+j (7.34)

j=O
for i = 0, 1, ... , L.
It is well known from the universal approximation theory for neural net-
works that the modelling error of the predictor can be reduced arbitrarily
by properly selecting the basis functions and adjusting the weights. There are
many types of basis functions which can be selected, including radial functions,
sigmoid functions, polynomial functions and so on. Section 7.6 will discuss the
selection of basis functions using a radial basis function network. An on-line
learning algorithm for the weight adjustment of the networks used in the pre-
dictors will be given in Section 7.5.
7.4 Predictive Neural Control
Based on the neural predictor described in the previous section, a nonlinear

predictive neural control strategy is proposed, as shown in Figure 7.2. The
fundamental idea of this strategy is to apply a set of neural predictors to predict
the future system output sequences and then to calculate the control input
through a formula that will be found by employing optimisation techniques.
Fig. 7.2. Neural network based predictive control
To define how well the predicted process output tracks the reference trajectory,
a number of cost functions are employed for predictive control. This section
uses a cost function which is of the following quadratic form.
(7.35)
where
Rt+d+L = [THd Tt+d+l Tt+d+L f (7.36)

rt+d+L = [Yt+d Yt+d+1 Yt+d+L ]T (7.37)
Ut+L = [Ut Ut+1 UHL ]T (7.38)
Rt+d+L, rt+d+L and UHd+L are the future reference input, predicted output
and control input vectors, respectively, L is the control horizon, L + d is the
prediction horizon, and a > 0 is the weight.
The optimal controller output sequence over the prediction horizon is ob-
tained by minimising the performance index J np with respect to UHL . This
can be carried out by setting
(7.39)
7.4 Predictive Neural Control 151
Taking the derivative of the performance function J np with respect to the

control input vector Ut +L results in
(7.40)
Using the neural network based predictors (7.34), the derivatives of Yt+d+L
with respect to the control input vector Ut +L are given by
r
o
(7.41 )
Let
(7.42)
Equation 7.40 can be compactly expressed by the following matrix equation:
(7.43)
where h = [1,0, ... , 0] is an identity vector and the matrix DL is of the form
o
n, ~ [ -; 1
(7.44)
-1
It is clear from (7.43) that the controller input vector Ut +L can be calculated
by
(7.45)
Thus, the control input Ut minimising the performance function J np is given

by
(7.46)
The predictive neural controller is therefore relatively simple and easy to im-
plement using the affine nonlinear predictors. There is no need to solve a
nonlinear programming problem to obtain the optimal control input Ut unless
additional constraints are imposed on the control signal and/or output of the
system.
7.5 On-line Learning of Neural Predictors
Here, we consider the on-line adjustment of the weights of the i-th predictor.
The weight estimations of the other predictors are similar. It will be assumed
that the basis functions of all the networks which are used in the predictors
are given and the required prediction accuracy can be achieved by adjusting
the corresponding weights to those functions.
Using the available output data Yt-d-i, ... , Yt-d-i-n+l and the input data
Ut-d-i-l, ... , Ut-2d-i, the output of the i-th predictor at time t can be written
as
Yt = (P;*f <Pi(Xt-d-i) + 2::«:;;jf <Pij(Xt-d-i)Ut-d-i+j + Et (7.4 7)

j=O
where Pt and Gij are the optimal estimates of the weight vectors Pi and Gij ,
for j = 0,1, ... , i, respectively, Et is the approximation error of the predictor
using the neural network and is assumed to be bounded by a positive number
t5 for all time, that is
(7.48)
where the number t5 represents the prediction accuracy, which is known by

assumption. In Section 7.6, growing neural networks will be shown to meet this
requirement. The i-th estimated predictor can also be compactly expressed by
fit = wt <Pt-l (7.49)
where the weight vector W t and the basis function vector <Pt are
GT.]T
~,
(7.50)
<Pi (Xt-d-i)
<PiO (Xt-d-i)Ut-d-i
<Pt-l = <Pil (Xt-d-i)Ut-d-i+l (7.51)
The estimation problem is then to find a vector W belonging to the set defined
by
(7.52)
Based on the recursive least squares algorithm, an on-line weight learning

algorithm for neural networks is developed for affine nonlinear predictors. The
algorithm and its properties are given by the following theorem.
7.5 On-line Learning of Neural Predictors 153
Theorem 7.5.1. Consider the i-th predictor and the learning algorithm:
Wt = W t- I + at!3tPt-IPt-Iet (7.53)
Pt = Pt- I - (3titPt-IPt-IPLIPt-I (7.54)
at = (1- 6Ietl-I)(1 + pLIPt-IPt-d- I (7.55)
it = (Ietl- 6) (Ietl + (2Ietl- 6) pLIPt-IPt-I)-l (7.56)
(7.57)
(7.58)
Then
(i) (7.59)
(ii) lim
t-+oo
IWt - Wt-Il = 0 (7.60)
(iii) (7.61 )
where
(7.62)
and Amin (.) denote the maximum and the minimum eigenvalues of the
Am ax (.)
matrix (.), respectively, and W* is the optimal estimate of the weight vector
Wt·
Proof: (i) Consider the Lyapunov function
(7.63)
Combining (7.53)-(7.58) yields
(7.64)
Since it is assumed that the approximation error Et of the predictor satisfies
IEtl :S 6, it is known from the above that
(7.65)
In addition, for letl :2: 6
(7.66)
It is straightforward to show that
Vi < Vi-I - ftitletl(letl - 6)

< V; fdletl - 6)2
(7.67)
t-I - 2 (1 + <Pi-I Pt - I <Pt-I)
It therefore follows that
(7.68)
(ii) From (7.53)
II W t - W t - I II;
< (7.69)
It is clear from (7.54) that Amax(Pt ) ::; Amax(Pt-d ::; ... ::; Amax(PO)' Then
(7.69) can be written as
II W t _ Wt-I 1122 < ftAmax(PO) (Ietl - 6)2

(7.70)
1 + 'l't-I Ft-I 'l't-I
- JoT n Jo
which, together with (7.68), proves (ii).

(iii) From the matrix inversion theorem (Goodwin and Sin, 1984), it follows
that
(7.71 )
Then
Amin (Pt- I ) :2: Am in (pt-=-II) :2: ... :2: Am in (PO-I) (7.72)
Equation 7.67, together with the above, gives
(7.73)
which results in
(7.74)
Thus
(7.75)
7.6 Sequential Predictive Neural Control 155
This establishes (iii).

Property (i) of the theorem above shows that if 1 + Pi- 1 Pt-1 Pt-1 is finite
for all time, which is true if the closed-loop system is stable, the estimation
error et converges to o. Also, it can be seen from Property (ii) that the weights
converge as time t approaches infinity. In addition, Property (iii) implies that
the weights will never drift to infinity over time.
7.6 Sequential Predictive Neural Control
This section discusses predictive neural controller design using growing neural
networks. Consider the i-th predictor to show how to design the predictive con-
trol. For the sake of simplicity, the basis function vectors of the i-th predictor
are assumed to be
(7.76)
This means all neural networks for the i-th predictor have the same basis
functions, which are of the form
(7.77)
where Ni is the number of basis functions and ipidXt) is the Gaussian radial
basis functions, i. e.
(7.78)
rik is the width of the (ik)-th basis function and Cik is its centre.
The i-th predictor is now given by
Ni i Ni
Yt+d+i = L fi,kipik (Xt) +L L gij,kipik (Xt)Ut+j (7.79)

k=l j=Ok=l
If the prediction error of the i-th predictor is greater than required, according
to approximation theory more basis functions should be added to the networks
to improve approximation. Based on the structure of the function Yt+d+i in
Equation 7.34, the structure of the i-th predictor using the growing neural
networks now becomes
i
Y~2d+i = Y~~~~i + !;,Ni+1 ipi(NiH) (Xt) + L %.Ni+1 ipi(Ni+1) (Xt)ut+j (7.80)

j=l
where Yi~~~i denotes the structure of the i-th predictor at time t - 1 and
Y~2d+i the structure after the addition of a basis function at time t, fi,N i +1
and gij.NiH are the weights corresponding to the new (Ni + l)th Gaussian
radial basis function ipi(Ni+ 1) (Xt).
The growing network is initialised with no basis function units. As obser-

vations are received the network grows by adding new units. The decision to
add a new unit depends on the observation novelty for which the following two
conditions must be satisfied:
(i) min
k=l, ... ,Ni
II Xt - Cik 112 > clc (7.81)
(ii) (7.82)
where ei(t) is the prediction error of the i-th predictor which may approxi-
mately be measured by et defined by (7.57), clc is the required distance between
the basis functions and cl max is chosen to represent the desired maximum tol-
erable accuracy of the predictor estimation. Criterion (i) says that the current
observation must be far from existing centres. Criterion (ii) means that the
approximation error in the network must be significant.
If the above conditions are satisfied, the new centre is set to be Ci(Ni+1) =
Xt. To assign a new basis function 'Pi(Ni+1) (Xt) that is nearly orthogonal to all
existing basis functions, the angle between the GRBFs should be as large as
possible by reducing the width ri(Ni+ 1 )' However, the smaller ri(Ni+1) increases
the curvature of 'Pi(Ni+1) (Xt) which in turn gives a less smooth function and can
lead to overfitting problems. Thus, to have good orthogonality and smoothness,
a choice for the width ri(Ni+l) , which ensures the angles between GRBF units
are approximately equal to the required angle B min , is (Liu et ai., 1998b)
(7.83)
where Bmin is the required minimum angle between Gaussian radial basis func-
tions.
When a new unit is added to the network at time t, the dimension of the
vectors W t and <Pt and the matrix Pt should increase by 1. The on-line learning
algorithm for the i-th predictor is still the same as that given in Section 7.5.
After the above consideration, the matrices QLand H L are of the following
form:
r
o
(7.84)
(7.85)
The predictive controller are still given by the form (7.46). In this way, the
design of the nonlinear predictive neural control is completed by growing net-
works.
7.7 An Example 157
7.7 An Example
In this section, consider the following affine nonlinear system (Chen and Khalil,
1995):
Yt = 2.5Yt-1Yt-2
2 2 + 0.3 cos (0.5 (Yt-l + Yt-2 )) + 1. 2U t-l (7.86)
1 + Yt-l + Yt-2
The reference input r(t) = sin(7ft/500). The initial condition of the plant is
(Y-l, Y-2) = (0,0).
The goal is to control the plant (7.86) to track the reference input r( t) using
a predictive control strategy so that the following quadratic cost function is
minimised.
(7.87)
To ensure Yt is in a fixed region, we used the following one-to-one (1-1) mapping

(Liu et al., 1996b):
_ Yt
(7.88)
Yt = IYtl + 1
It is clear from Equation 7.88 that Yt E (-1,1) for Yt E (-00, +(0). This can
also be applied to Ut.
We used growing networks to model the one-step and two-step ahead pre-
dictors. The growing networks were initialised with no basis function units.
As observations were received the network grew by adding new units. The
required distance and minimum angle between the basis functions were set to
6e = 0.011, Bmin = 20°, respectively. The required and maximum tolerable
estimation accuracy of the predictor were
6 = 0.008 (7.89)
6max = 0.012 (7.90)
In the simulation, the performance of the neural network based predictive

control is shown in Figures 7.3-7.7. Figure 7.3 shows the output Yt and the
reference input rt of the system. The tracking error rt - Yt is shown in Figure
7.4. The estimation errors of the one-step and two-step-ahead predictors us-
ing growing networks are illustrated in Figures 7.5 and 7.6, respectively. The
number of basis functions in the neural networks with respect to time t for the
two-step-ahead predictor is given in Figure 7.7.
It can be seen from the simulation results that the tracking error, the
one-step and two-step prediction errors converge with time t. It is also clear
that the growing neural network based prediction models grow gradually to
approach the appropriate complexity of the predictors that are sufficient to
provide the required approximation accuracy. Moreover, the nonlinear system
is successfully controlled using the predictive neural controller.
2,---,----,----,---,----,----,----,---,----,---,
1.5
"5
Q.
"5
o
C1>
i= -0.5
-1 __ the system output

--- the reference input
_1.5L----L----~--~-----L----L----L----~--~-----L--~
o 100 200 300 400 500 600 700 800 900 1000
timet
Fig. 7.3. Output Yt and the reference input Tt of the system
0.6,----,----,----,----,-----,----,----,----,----,----,
0.4
g
C1>
OJ
"
:s;:
~C1>
.c
f-
-0.4
_0.6L----L----~--~-----L----L----L----~--~-----L--~
o 100 200 300 400 500 600 700 800 900 1000
timet
Fig. 7.4. Tracking error Yt - Tt of the system

7.7 An Example 159
0.8,----,----,----,----,-----,----,----,----,----,----,
0.6
0.4
e 0.2
Cii
CJl
§
a;
"0
o
E
C1>
F -0.2
-0.4
-0.6
-0.8 L -__--'-____'---__--'--__----'____-'--__----'____-'---__----'____-'--__---.J
o 100 200 300 400 500 600 700 800 900 1000
timet
Fig. 7.5. Modelling error of the one-step-ahead predictor
0.8
0.6
e
Cii
CJl 0.4
.E
a;
"0
0
E 0.2
C1>
.<::
f-
-0.4~--~-----'--------'-----'------~--~-----'--------'-----'----~
o 100 200 300 400 500 600 700 800 900 1000
timet
Fig. 7.6. Modelling error of the two-step-ahead predictor

140
120
g? 100
0
U
"
.2
en 80
'iii
ttl
.0
15
Q;
.0 60
E
::>
"
'"
.<::
f- 40
100 200 300 400 500 600 700 800 900 1000
timet
Fig. 7.7. Number of basis functions in the growing neural network for the two-step-
ahead predictor
In addition, the simulation results using the design procedure discussed in this
chapter were compared with those provided by other neural network based
predictive control techniques, for example, neural network based adaptive pre-
dictive control (Tan and Keyser, 1994) and robust nonlinear self-tuning pre-
dictive control using neural networks (Zhu et ai., 1997). It was shown that the
design procedure given in this chapter has three significant advantages. The
first is that the computation for optimisation in the new design procedure is
simpler and faster than other techniques because it uses a simple analytical
solution to the minimisation of the performance function. The second is that
the new design procedure has better reference tracking performance as a result
of the use of a set of affine nonlinear predictors. The third is that the design
procedure provides an appropriate sized neural network by introducing the
growing network technique.
7.8 Summary
This chapter has discussed the neural network based predictive controller de-
sign of nonlinear systems. The fundamental principle of predictive control was
explained on the basis of linear discrete-time systems. A set of affine nonlinear
neural predictors was used to predict the output of the nonlinear process so
that the difficulty of minimising the performance function for nonlinear pre-
dictive control is avoided, which is usually carried out by the use of nonlinear
7.8 Summary 161
programming techniques. The resulting predictive neural control algorithm is

relatively simple and easy to implement in practice. Based on least squares
techniques, an on-line weight learning algorithm for the neural networks based
affine nonlinear predictors has been given. The properties of the algorithm
were studied and it was shown that both the weights and the estimation er-
ror converge as time approaches infinity. The design of neural network based
nonlinear predictive control was illustrated using growing networks. Analysis
of the stability of the closed-loop nonlinear predictive neural control system
remains an open question for future research.
CHAPTERS
VARIABLE STRUCTURE NEURAL CONTROL
8.1 Introduction
Variable structure control with sliding modes was first proposed in the early
1950s (Utkin, 1964; Ernelyanov, 1967; Itkis, 1976) and has subsequently been
used in the design of a wide spectrum of system types including linear and
nonlinear systems, large-scale and infinite-dimensional systems, and stochastic
systems. It has also been applied to a wide variety of engineering systems.
The most distinguished feature of variable structure control based on sliding
modes is the ability to improve the robustness of systems which are subject
to uncertainty. If, however, the uncertainty exceeds the values allowed for the
design, the sliding mode cannot be attained and this results in an undesirable
response (Utkin, 1964). In the continuous-time case this problem was solved by
combining variable structure and adaptive control (Slotine and Li, 1991), but
this requires that all the system variables are available and can be measured.
This case has also been discussed for linear discrete systems using input output
plant models (Furuta, 1990, 1993; Hung et al., 1993; Pan and Furuta, 1995)
and for nonlinear discrete systems where the input output model is unknown
(Liu et al., 1997b. 1999b).
This chapter presents a neural network based variable structure controller
design procedure for unknown nonlinear discrete systems. A neural network
based affine nonlinear predictor is introduced so that the control algorithm is
simple and easy. Two performance functions are considered for the design of
variable structure neural control. The first performance function is concerned
with minimisation of the prediction error. The second performance function
includes minimisation of the prediction error and the control input. A recursive
learning algorithm for neural networks for the neural network affine nonlinear
predictor is also discussed. This algorithm can be used for both on-line and off-
line weight training. It is shown that both the weights of the neural networks
and the estimation error converge.
A brief introduction to variable structure control for linear systems is given
in Section 8.2. Then, Section 8.3 considers the structure of the affine nonlinear
predidors, which is based on neural networks. Variable strudure neural control
is studied for nonlinear systems in Section 8.4. Generalised variable structure
neural control is discussed in Section 8.5. Section 8.6 develops the recursive
learning algorithm for the neural networks used for the d-step-ahead predictor
164 8. Variable Structure Neural Control
and the properties of the algorithm are analysed. Finally, simulation results
are given in Section 8.7.
8.2 Variable Structure Control
Variable structure control is a high-speed switched feedback control. For ex-

ample, the gains in each feedback path switch between two values according
to a rule that depends on the value of the state at each instant. The purpose
of the switching control law is to drive the nonlinear plant's state trajectory
onto a prespecified (user-chosen) surface in state-space and to maintain the
plant's state trajectory on this surface for all subsequent time. The surface
is called a switching surface. When the plant's state trajectory is above the
surface, a feedback path has one gain, and a different gain if the trajectory
drops below the surface. This surface defines the rule for proper switching. The
plant dynamics restricted to this surface represent the controlled system's be-
haviour. The first critical phase of variable structure control design is to define
properly a switching surface so that the plant, restricted to the surface, has
the desired dynamics, such as stability, tracking, regulation, etc. Therefore, a
variable structure control design breaks down into two phases. The first phase
is to design or choose a switching surface that represents the desired dynamics.
The second phase is to design a switched control that will drive the plant state
to the switching surface and maintain it on the surface upon interception.
This section briefly introduces variable structure control for linear discrete-
time systems. Let us consider a single-input single-output discrete-time linear
system which is represented by
(8.1)
where Yt is the output, Ut is the control input, d is the time delay, A and B
are polynomials in the backward shift operator q-l:
A = 1 + alq-l + ... + anq-n (8.2)

B = bo + b1q-l + ... + bmq-m (8.3)
nand m are the orders of the polynomials. It assumes that the above system
is minimum phase, that is, all zeros of the polynomial B are within the unit
disk.
The structure of a variable structure control system is governed by the
sign of the switching function. A switching function is generally assumed to
be linear. For discrete time systems, a simple switching function is defined as
(8.4)
where Tt is the reference input and C is a Schur polynomial defined as
(8.5)
8.2 Variable Structure Control 165
whose zeros are inside the unit disk.

The objective of control is to design a controller to ensure the closed-loop
system reaches the following switching surface:
SHd = 0 (8.6)
To transfer the system state onto the switching surface, let us construct a
Lyapunov function as
(8.7)
The difference in the Lyapunov functions is written as
(8.8)
where
(8.9)
It is clear from the above that if the controller to be designed satisfies
(8.10)
then, the following inequality will hold:
(8.11)
which implies that as the time t approaches infinity, the system will reach the
switching surface, i. e.,
lim
t--+oo
St =0 (8.12)
Also, the error is defined as
(8.13)
The minimum variance control (Astrom, 1970) to achieve (8.6) is given by
(8.14)
where the polynomials E and F satisfy
AE+q-dF = C (8.15)
Using (8.15), Equation 8.1 can be rewritten as
(8.16)
Thus
(8.17)
We consider use of the above conventional minimum variance control and the
variable structure control on the inside or outside of the sector defined later.
The following control input is considered:
(8.18)
where Vt = 0 is the equivalent control to achieve SHd = St. Substituting (8.18)

into (8.17), the following relation is derived:
St+d = Vt + St (8.19)
The auxiliary control input Vt is chosen as the output feedback with the vari-
ance coefficients
(8.20)
where H is a polynomial
(8.21 )
The coefficients {h k } of the sector are switched outside of a sector defined on

the set (et, et-l, ... , et-n+d. Now, let us define a set:
2(c) = {~: I~I ::; c} (8.22)
The following control law gives the stable closed-loop control system.
Theorem 8.2.1. For the plant (8.1) with the control law (8.18), the closed-
loop system is stable if the coefficients of the polynomial H are chosen as
Stet-k tf. 2(15k ) (8.23)

otherwise
for k = 0,1, ... , n - 1, where do > 0 and 15k is defined as
(8.24)
a 2: 1 and sign(.) is a sign function.

8.2 Variable Structure Control 167
Proof: By substituting (8.20) into (8.18), the control law can be rewritten as
(8.25)
It has been shown that using the above control law yields
(8.26)
which gives
(8.27)
For the choice of the coefficients of the polynomial H, if Stet-k tj. 5(6k) there
exists the following:
St L1S t+d StHet

n-l
St L hket-k
k=O
n-l
< - L 6kl h kl
k=O
< -~ (~Ie<-'II"'I)'
a 2
< --(L1s t+d) (8.28)
2
A Lyapunov function is chosen as (8.7). Using the above inequality leads to
< (8.29)
As a > 1,
(8.30)
This concludes that St decreases when Stet-k tj. 5(6k), for k = 0,1, ... , n - 1
and L1s t - d -+ 0 as t -+ 0 yields either that if Stet-k tj. 5(6k),
lim et =0 (8.31 )
t--+o
(8.32)
For the latter case, the closed-loop system is controlled so that St+d = O. Since
(8.33)
the Schur polynomial C will make the error et decrease to zero. Therefore, the
closed-loop system becomes stable.
8.3 Variable Structure Neural Control

Following the variable structure control of linear systems, a variable structure
neural control scheme for nonlinear systems is presented. The basic idea of
this scheme is illustrated in the block diagram shown in Figure 8.1. A neural
network approximates the unknown nonlinearities of the system. Controller
1 stabilises the system for good robustness using the sliding mode method if
the plant model is not accurate. Controller 2 minimises a specific performance
function to give the desired performance, e.g., reference tracking. The logic
switch is a condition that determines which controller should be in action.
Consider a discrete-time affine nonlinear control system which has been
described by
(8.34)
where F(.) and G(.) are nonlinear functions, Yt is the output and Ut the control
input, respectively, the vector Xt = [Yt-1 Yt-2 Yt-n], n is the order of
Yt and d is the time delay of the system. It is assumed that the order nand
the time delay d are known, the nonlinear functions F(.) and G(.) are smooth
but unknown, and G(.) is bounded away from zero.
Based on the affine nonlinear system described by (8.34), a d-step-ahead
affine nonlinear predictor is employed to compensate the influence of the time
delay. This predictor uses sequences of both past inputs and outputs of the
process upto the sampling time t to construct the predictive models, which are
of the following form:
(8.35)
where F(Xt) and G(Xt) are nonlinear functions of the vector Xt which are to
be estimated.
Due to the good approximation properties of neural networks, the nonlin-
ear functions F(Xt) and G(Xt) can be approximated by single hidden layer
networks, which are expressed by
No
F(Xt) = L Jkipk(Xt) (8.36)
k=l
N,
G(Xt) = L gk')'k (Xt) (8.37)
k=l
where ipk(Xt) and Ik(Xt) are the basis functions of the networks, No and N1
denote the size of the networks. Define the weight and basis function vectors
of the neural networks as
F=[h 12 JNof (8.38)
G = [gl g2 gNlf (8.39)
<Pt = [ip1(Xt) ip2(Xt) ipNo(Xt)]T (8.40)
Ft = [11(Xt) 12 (Xt) IN, (Xt) f (8.41 )
8.3 Variable Structure Neural Control 169
Fig. 8.1. Variable structure control using neural networks
Then the neural network based predictors can be rewritten as
(8.42)
It is well known from the universal approximation theory for neural networks
that the modelling error of the predictor can be reduced arbitrarily by properly
choosing the basis functions and adjusting the weights. There are many ba-
sis functions available, e.g., radial functions, sigmoidal functions, polynomial
functions and so on. This chapter does not intend to discuss how to choose
between these. But a recursive learning algorithm for the weight adjustment
of the networks used in the predictors will be presented in Section 8.5.
Based on the d-step-ahead affine nonlinear predictor modelled using the
neural networks described above, we now consider the variable structure neural
control using sliding modes. It will be assumed that all the basis functions
in the neural network predictor are given but the weights of the network are
unknown. The objective of the control is to minimise the following performance
function.
J S = "2l(A*
Yt+d - rt+d
)2 (8.43)
where r is the reference input and iJ;+d is the optimal d-step-ahead prediction
of the output Yt.
For the given neural network structure, the optimal d-step-ahead predictor
is given by
(8.44)
where F* and G* are the optimal estimates of the weights which yield a pre-
diction error within the required accuracy.
Based on the optimal d-step-ahead predictor given by (8.44), the control
input to minimise J s can be solved analytically and is expressed as
(8.45)
In practice, it is very difficult to know the optimal weight vectors F* and G* in

the affine nonlinear predictor if some uncertainties or disturbances exist in the
system. Here we consider use of the above neural predictor and the variable
structure controller. Let
(8.46)
(8.4 7)
(8.48)
where c is a positive number. Thus, the following control input is considered:
(8.49)
where St = fJ; - rt·

From now on, it is assumed that the weight vector G is such that GT f't
is bounded away from zero. This reasonable assumption is based on the fact
that the nonlinear function G (.) of the system is assumed to be bounded away
from zero. The auxiliary control input Vt is chosen to give St+d = St and Vt is
chosen as the output feedback
(8.50)
where
(8.51)
ak and bk are the coefficients which are to be designed.

The problem is to choose the coefficients ak and bk to guarantee stability
of the system. To solve this problem, the following theorem is used.
Theorem 8.3.1. If the coefficients ak and bk of Vt are chosen as
-dosign( 'PkSt) St ~ S(O"t)

ak ={ 0 otherwise
(8.52)
-dosignbk'!)!tst) St ~ S(O"t)
bk = { 0 otherwise
(8.53)
do > ""t + M - 1 > 0 and T > 1, where

8.3 Variable Structure Neural Control 171
(8.54)
N,
~t = L Irdxt)(cF f't)-lIM + 1 (8.55)
k=l
(8.56)
then St+d converges to S (O"t).

Proof: To ensure the stability of the system, the Lyapunov technique is used.
Choose the Lyapunov function as
(8.57)
The difference in the Lyapunov functions can be expressed in the following

form:
.1vt+d = vt+d - vt = (St + .1s t+d)2 - s; = 2s t .1s Hd + (.1s Hd )2 (8.58)

Using the control input (8.49), it can be shown from (8.46) that
and it then follows that
(8.60)
where
(8.61 )
Substituting (8.50) into (8.60) results in

No N,
l.1s Hd l ::; L IUk + akPt)<pdXt) I + L l(gk + bkPt)rk(Xt)'l/Jt I
k=l k=l
< (do~t + M) (~I<Pk(Xt)1 + ~ Irk (Xt)'l/Jtl) (8.62)
Using (8.52) and (8.53) yields
2s t .1s t+d = 2s t (pT<P t + (C]Tf't)-l)'l/Jt + PtVt)
2st (~Uk + akPt)<Pk(Xt) + ~(9k + bkPt)rdXt)'l/Jt)

< -2(do + 1 - K,t - JL) ( (; l4?d Xt)Stl + (; lid Xt)1jJt Stl

~ ~ )
2
< -T ((dOK,t + JL) (~I4?k(Xt)1 + ~ lik(Xt)1jJtl) )

< (8.63)
As a result, the following relation is derived
(8.64)
The above relation implies that .1s Hd converges to zero as t approaches infin-
ity. This shows that SHd is brought into the inside of the set 2'(Ut).
8.4 Generalised Variable Structure Neural Control
In the previous section, the performance function J s involves only the difference
between the reference and the optimal prediction. For many practical systems,
the control input of the system should be taken into account in the performance
function. Thus, the objective of the control in this section is to minimise the
performance function below, which includes the control input.
(8.65)
where .1Ut = Ut - Ut-l, a is a positive number.

Using the neural network based d-step-ahead affine nonlinear predictor, the
control input to minimise Jg is given by
(8.66)
To avoid the difficulty of finding the optimal weight vectors F* and G* in the
affine nonlinear predictor, the use of a predictive neural controller and variable
structure control are considered. Similar to the previous section, the following
control input is used:
(8.67)
The auxiliary control input Vt is chosen as
(8.68)
where
8.4 Generalised Variable Structure Neural Control 173
For this case, the following theorem gives the design of the auxiliary control
input Vt so that St+d converges from the outside to the inside of the set 2.
Theorem 8.4.1. If the coefficients ak and bk of Vt are given by
-dosign(ipkst) St tj. 2(Tt)

ak ={ o otherwise
(8.72)
-dosign(rk1,UtSt) St tj. 2( Tt)

bk = { (8.73)
o otherwise
St tj. 2(Tt)
Cl ={ otherwise
(8.74)
do > "'t + p, - 1 > 0 and T > 1, where
(8.75)
Nl
"'t = L IrdXt)(CF Ft )-IIp, + 1 (8.76)
k=1
p,= . . max. {lft-iil,lgj-gjl} (8.77)

1,=1,2, ... ,No ,)=1,2, ... ,N1
( and do are positive numbers, then SHd converges to the set 2(Tt).
Proof: Choose the Lyapunov function as
(8.78)
It is shown from the proof of Theorem 8.2 that the difference of the above
Lyapunov function is given by (8.58). With (8.46) and (8.67), St+d can be
expressed as
St+d F*(Xt) + G*(Xt)Ut - rt+d

(F + Pf <Pt + (6 + af FtT/t (aT Ft(rt+d - pT <P t + St + Vt) + aUt-I)
- - -T - -T - -T - 1
+G + St + a(G r t )-
~T ~T
F Pt rtG rtT/t(rHd - F Pt Ut-d

-T- -T- -T-
+ St G rtUt-l) + (G rtG r t
~T-
-aT/t(rHd - F Pt -
-T -
+(G r t ) )T/tVt
2
+ St (8.79)
Moving the term St from the right side to the left side in the equation above
gives
where
(8.81 )
The upper bound of the absolute value of Lls t+d is estimated by
No
ILlsHdl < L IUk + ak(t)'PdXt) I
k=l
Nl
+ L l(gk + bk(t)T/tik(Xt)'l/Jt I + I(Ck(t -p,)T/tWtl

k=l
< (p, + dO~t) (~I'Pk(Xt)1 + ~ lik(Xt)T/t'l/Jt I + ITJtWtl) (8.82)

Using (8.72)-(8.74) leads to
2stLlst+ d = 2st (pT <P t + (;T Ft'l/JtT/t - P,T/tWt + T/t(tVt)
2st (~Uk + ak(t)'PdXt)

+ ~(9k + bk(t)T/tlk(Xt)'l/Jt + (Ck( -P,)T/tWt)
< -2(do + 1 - ~t -p,) (~I'Pk(Xt)stl

+ ~ lik(Xt)T/t'l/JtSt I + IT/tWtStl)
2
< -T ((dO~t + p,) (~I'Pk(Xt)1 + ~ lik(Xt)T/t'l/Jt I + IT/tWtl) )

< -T(Lls Hd )2 (8.83)
Thus, the following relation is derived
(8.84)
The above relation shows that SHd converges to the inside of the set 5 (Tt) as
t approaches infinity. Thus, this proves the theorem.
8.5 Recursive Learning for Variable Structure Control

In practice uncertainties and/or disturbances will always exist and recursive
weight learning of the neural networks used to construct the d-step-ahead affine
8.5 Recursive Learning for Variable Structure Control 175
nonlinear predictor becomes necessary. Here, we consider the recursive adjust-

ment algorithm of the weights of the d-step-ahead predictor. The algorithm
can be used for both on-line and off-line weight training.
Using the available output data Yt-d-1, ... , Yt-d-n and the input data Ut-d,
the d-step-ahead predictor is given by
-*T-
Yt = (F ) Pt-d + (G-*T-
) Tt-dUt-d + et (8.85)
where F* and G* are the optimal estimates of the weights, et is the approxima-
tion error of the predictor which is assumed to be bounded, i. e., max let I :::; 6L,
but the upper bound 6L is not known exactly.
The estimated d-step-ahead predictor can be written compactly as
(8.86)
where the weight vector Wt - 1 and the basis function vector Pt-1 are
Wt- 1 = [h 12 JNa gl g2 ... gNll T (8.87)

Pt-1=[<pdXt-d) ... <PNo(Xt-d) i1(Xt-d)Ut-d iN, (Xt_d)Ut_d]T
(8.88)
Based on the recursive least squares algorithm for a bounded noise (Whidborne
and Liu, 1993; Wang et al., 1995), the recursive weight learning algorithm for
the neural network is proposed as
W t = W t- 1 + AtPtPt-1 et (8.89)
Pt = Pt- 1 - AntPt-1Pt-1PL1Pt-1 (8.90)
At = ,st(6epL1Pt-1Pt-d-1(letl- 6e ) (8.91 )
it = 6e letl- 1 (8.92)
et = Yt - Yt (8.93)
,st = { ~ (8.94)
where the positive number 6e is assumed not to be less than the upper bound
h of the approximation, P(O) is a positive finite matrix and Am ax (.) is the
maximum eigenvalue of its argument matrix.
(8.95)
where Wt = W t - Wt, Wt is the optimal estimate of the weight vector W t .

Combining (8.89)-(8.93), the Lyapunov function can be expressed as
(8.96)
(8.97)
Using (8.91)-(8.92) gives
Vi ::; Vi-I - ft(pT-IPt-lPt-d-1(letl- 6e)(letl- 6;1(1) (8.98)
Thus
(8.99)
where
(8.100)
Since the bound h of the estimation error Edt) is assumed not to be greater
than 6e, it is easy to show that f(6 e ) > 0 until the error letl = 6e. So, the
error letl converges to 6e. On the other hand, if letl < 6e, it is possible that
L1 Vi > O. This implies that the weight vector Wt may drift away over time.
In this case, set ft = 0 in the weight learning algorithm given by (8.89)-(8.94)
to avoid divergence of the weight vector. Thus the error letl converges to the
range [0,6 e ].
The analysis above shows that if the upper bound 6L is known, then the
error letl will converge to 6L by simply setting 6e = 6L. In the case where the
upper bound 6L of the estimation error Edt) is not known exactly, the error
letl still converges to 6e if 6e is set to be greater than 6L. Thus, the closer the
number 6e is chosen to the upper bound 6L, the more accurate the estimation
of the predictor is.
8.6 An Example
In this section, consider the following affine nonlinear system (Chen and Khalil,
1995):
Yt = 2.5Yt-lYt-2
2 2 + 0.3 cos (0.5 (Yt-l + Yt-2 )) + 1. 2U t-l (8.101 )
1 + Yt-l + Yt-2
The initial condition of the plant is (Y-l, Y-2) = (0,0) and the reference input
(t) = { 6 cos(1Ttj80) 0 < t ::; 160 (8.102)
r O t > 160
Since the structure and parameters of the functions F(Xt) and G(Xt) in the
affine nonlinear system are assumed to be unknown, a growing Gaussian radial
basis function (GRBF) neural network was used to approximate the functions.
The growing GRBF network was initialised with no basis function units. As
observations are received the network grows by adding new units. The decision
8.6 An Example 177
to add a new unit depends on the observation novelty for which two condi-
tions must be satisfied. The first condition states that the approximation error
between the real output and the estimated output must be significant. The
second condition states that the new centre of the GRBF must be far away
from existing centres. In this way, the approximation accuracy of the functions
F(Xt) and G(Xt) will converge to the required bound.
1o,-----,------,------,-----,------,----~
-4 _ the system output
--- the reference input

_6L-____ ~ __ ~~ ______L -_ _ _ _~ _ _ _ _- L_ _ _ _~
o 50 100 150 200 250 300

timet
Fig. 8.2. Output Yt and reference input Tt of the system
-2
-4
-60L-----~50,---,-1~00----,-15~0-----2~0~0----~2~50,---~300
timet
Fig. 8.3. Tracking error Yt - Tt of the system

In the simulation, the recursive weight algorithm was used for off-line train-
ing of the growing GRBF network. When the variable structure neural control
was applied, the recursive weight algorithm was then used for on-line train-
ing of the growing GRBF network. The generalised variable structure neural
control strategy was used. The parameters were 0: = 0.5, T = 1.1, JL = 0.1,
do = 0.5. The performance of the system is shown in Figures 8.2 and 8.3 for
neural network based variable structure control. Figure 8.2 shows the output
Yt and the reference input Tt of the system. The tracking error Tt - Yt is shown
in Figure 8.3. The results of the simulation indicate that the tracking error of
the system using variable structure control is small and converges rapidly.
8.7 Summary
This chapter has considered a neural network based variable structure con-
troller design for unknown nonlinear discrete systems. The basic idea of vari-
able structure control was discussed for linear discrete-time systems. A neural
network based affine nonlinear predictor was introduced to predict the outputs
of the nonlinear process, and a variable structure control algorithm was devel-
oped which is simple and easy to implement. In order to improve the stability
and robustness performance of the system, a discrete sliding mode control
technique was applied. Two cases were considered for the variable structure
neural control. The first was based on minimisation of the square prediction
error. The second was based on combined minimisation of both the squared
prediction error and the squared control input. A recursive weight learning
algorithm for the affine nonlinear predictors was also developed which can be
used for both on-line and off-line weight training. Analysis of the weight learn-
ing algorithm demonstrated that both the weights of the neural networks and
the estimation errors converge.
CHAPTER 9
NEURAL CONTROL APPLICATION TO COMBUSTION
PROCESSES
9.1 Introduction
Combustion processes exist in many applications related to power generation,
heating and propulsion; for example, steam and gas turbines, domestic and
industrial burners, and jet and ramjet engines. The characteristics of these
processes include not only several interacting physical phenomena but also a
wide variety of dynamic behaviour. In terms of their impact on system per-
formance, pressure oscillations are of most significance. In some applications,
pressure oscillations are undesirable since they result in excessive vibration,
causing high levels of acoustic noise and in extreme cases mechanical failure.
In the frequency domain, the pressure is characterised by dominant peaks at
discrete frequencies which correspond to the acoustic modes of the combustion
chamber.
Generally speaking, there are two types of strategies that can be employed
to solve this problem: passive control and active control. Passive control has
been used in most practical combustors and approaches include changing the
flame anchoring point, the burning mechanism and the acoustic boundary con-
ditions, and installing baffles and acoustic dampers (Schadow and Gutmark,
1989). As a result of changes in the dynamic properties with operating point
it is difficult to optimise performance using passive means alone. Active con-
trol is the term that is widely used to describe the situation where control is
effected by adding energy to the system via the use of actuators in a way that
counteracts the oscillation.
Many active control approaches have been applied to combustion systems;
for example, phase-lead control (McManus et al., 1990), robust control (Tierno
and Doyle, 1992), LQG control (Annaswamy and Ghoniem, 1995), and adap-
tive control (Padmanabhan et al. 1995). In order to develop an effective active
controller for unstable combustors, Fung and Yang (1991) proposed a strategy
using state feedback theory. The unstable combustor was modelled as a linear
system, and a classical observer was used to estimate the states of the system
from the pressure measurement. This observer requires the parameters of each
mode (e.g., the damping and frequency) to enable estimation of the system
states. These parameters are generally difficult to obtain in advance because
they vary with the operating point and the ambient conditions. Neumeier
and Zinn (1995) presented an active control approach for combustion systems,
which attenuates the largest mode each time and ignores the other less impor-
180 9. Neural Control Application to Combustion Processes
tant modes. This can lead to poor pressure prediction, and also to a control
action that accentuates the less naturally dominant modes.
In this chapter another active control approach is studied, which addresses
all the modes independently. It is based on an output model which is estimated
by a neural network. This output model is then used to predict the system
pressure, to overcome the combustion system time delay. Finally, the prediction
is used by a controller to optimally attenuate the oscillating modes. Results
from the control of a simulated unstable combustor and a combustion test rig
are given.
9.2 Model of Combustion Dynamics
The combustion process is an oxidation process generating heat, light and/or

sound, which includes chemical reactions and transformation of energy stored
in chemical bonds to heat that can be utilised in a variety of ways. A model of
the acoustic dynamics of combustion processes can be derived from an under-
lying thermo-acoustic representation of the combustion process. Under various
assumptions, mass, momentum and energy partial differential equations can
be used to develop the modal descriptions (Daley and Liu, 1998; Liu and Da-
ley, 1999b). The reason for presenting this physical description is to show that
the acoustic dynamics of combustion processes can be represented by a model
that is identical to a neural network structure.
To simplify the model of the acoustic dynamics of combustion processes,
it is assumed that the longitudinal acoustic mode is the dominant one, all
dissipative phenomena are neglected and perfect gas relations are used. Then,
the conservation equations of mass, momentum, and energy in one-dimensional
flow can be written as, respectively, the mass equation:
(9.1)
the momentum equation:
av av ap
p-
at
+ pv-
ax
+ -ax - f = 0 (9.2)
and the energy equation:
ap ap av
- + v - + ,p- + (1 - ,)q - e =0 (9.3)
at ax ax
where p, v and p refer to the density of the mixture, the velocity of the gas, and
the pressure, respectively, m, f and q are the rates of added mass, momentum,
and heat release per unit volume, e denotes other sources of energy, and, is the
specific heat ratio. Let the above variables be separated into two parts: their
mean and perturbed components; for example, the pressure can be expressed
in the form p(x, t) = p(x) + p(x, t). Also assume that the mean flow is steady,
9.2 Model of Combustion Dynamics 181
the perturbed components are small variations about the means, the mean
heat release is small, and the Mach number of the mean flow is also small.
Then the following dynamic relations can be obtained
EP p _ (;2 EP P _ (r _ 1) 8q _ 8e + 2, - 1 (;2 8m + (;2 8 f = 0 (9.4)

8t 2 8x 2 8t 8t 2 8t 8x
8p _8v __
8t + ,p 8x - (r - l)q - e = 0 (9.5)
where (; is a coefficient. The above neglects the mean flow effects. In the pres-
ence of non-negligible mean flow and mean heat release rate, the underlying
acoustic relations are more complex than the above equations, but can also be
analysed in a similar manner.
It is assumed that in the combustor there are no external sources except
the heat release rate. This means m = f = e = O. The unsteady pressure p
(variations about a mean) can be expressed by the following physically based
model (Annaswamy and Ghoniem, 1995; Neumeier and Zinn, 1995):
n
(9.6)
with the basis function as
'l/Ji(x) = sin(kix + 'PiO) (9.7)
where k i and 'PiO are determined by the boundary conditions, and correspond
to the spatial mode shapes, /1i represents the acoustic dynamics, n is the
number of modes and x is the displacement along the chamber length. Let
'l/J(x) = [ 'l/J1 (x) 'l/J2 (x) 'l/Jn(x) JT (9.8)
/1(t) = [ /11(t) /12 (t) ... /1n(t) f (9.9)
K = diag [ k1 k2 kn f (9.10)
E = 10£ 'l/J'l/JTdx (9.11)
The acoustic dynamics of the combustion system can then be written as (An-
naswamy and Ghoniem, 1995)
(9.12)
where ao = (r - 1)( ,m
-1, [l = (;K, , is the specific heat and the perturbed
localised heat release rate is of the form: q(x, t) = J(x - xo)qo(t).
Thus the fundamental dynamics are described by (9.12) with output (9.6).
9.3 Neural Network Based Mode Observer
Active stabilisation of (9.12) can be achieved directly, using an acoustic driver,

or indirectly using fuel modulation to influence the heat release. The basic con-
trol principle is to provide additional energy which is out of phase with the
oscillating modes. In order to enable this to be achieved for each mode inde-
pendently, a prediction model of the combustion dynamics must be developed.
One approach is to exploit the structural similarity of output (9.6) to that of
a neural network. On the assumption of a point pressure measurement, the
output model of the combustion process described by a neural network is
N
p(t) = L Wdi(~i' t) + rSt (9.13)

i=1
where Wi is the weight, fi(~i' t) the basis function with parameter vector, N
the number of basis functions and rSt the modelling error.
---~ Combustor
Learning
Mechanism
Fig. 9.1. Output modelling using neural networks

9.4 Output Predictor and Controller 183
For a different position x on the combustor, the weights will be different. The
model can be rewritten as
(9.14)
where the weight and basis function vectors are
WN ]T (9.15)
Ft=[h 12 (9.16)
Finding the weights in (9.13) represents a linear regression problem. Therefore,

if it is assumed that the modelling error is white noise, then the weights of the
output model can be derived using a standard least-squares algorithm, that is,
T
= W t - 1 + PtFt(pt - W t _ 1Ft )
A A A
Wt (9.17)
Pt = ).-1 (Pt - 1 -Pt-1FtFtPt-1().+FtPt-1Ft)-1) (9.18)
where). E (0,1] is a forgetting factor. The output modelling using neural

networks is shown schematically in Figure 9.1.
9.4 Output Predictor and Controller
In combustion systems with active elements, there is a large time delay between
the actuator's input and the actual measured pressure. To control a system
with time delay, a predictor is needed to estimate the future output. The
output predictor using the neural network is readily constructed as
N
fJ(t + T) = L wi(t)fi(~i' t + T) (9.19)

i=l
where T is the time delay of the combustion system. This implies that the
accuracy of the output prediction depends on the estimated weights at time t.
It is therefore assumed that this time dependence is low in frequency compared
to the mode frequencies
Based on the mode predictor, the output of a controller that is used to
cancel the effect of the inherent time delay is generally of the form
u(t) = f(fJ(t + T)) (9.20)
where f (.) is a function of the output prediction. The design of this function
depends on what control performance is under consideration, and it can be a
linear or nonlinear function.
9.5 Active Control of a Simulated Combustor
In this section, the active control of a simulated combustor with six modes is
considered. The output model based predictive control using neural networks
is shown in Figure 9.2.
d
r + + u p
Fig. 9.2. Output model based predictive control using neural networks
Following the model of acoustic dynamics in Section 9.2, a simulated combustor

represented by a linearly unstable system with time delay is considered. Its
behaviour is described by the collective response of six unstable oscillators.
The transfer function is given by
(9.21 )
where the frequencies Ii = wd27f are 100,220,350,470,600 and 800Hz, the

damping factors <;"i are -0.5/Wl, -1/w2, -1.5/W3, -0.2/W4, -0.25/ws, -0.3/W6,
the system time delay T = 1/300 and the mode gains gi are 1000,2000,3000,
4000,5000 and 6000. The active control scheme for this system is shown in
Figure 9.3. It is assumed that the reference input (the heat release produced by
the main fuel supply) has a disturbance which is white noise with an amplitude
approximately 10% of the reference input.
It is clear from the above that all the mode damping factors are negative,
and that the time delay is almost three times the period of the fastest mode.
Simulations have indicated that this system is a very difficult one to control
using alternative approaches.
According to the characteristics of the system, sine waves are used as basis
functions for the output model. Thus, the output model of the combustor
described by a sine wave basis function neural network is
12
p(t) = L Wi sin(7]it + cPi) (9.22)

i=l
9.5 Active Control of a Simulated Combustor 185
where Wi is the weight and sin(7]it + cPi) is the basis function. For this system,
the parameters are chosen to be
7]1 = 7]7 = 2007r 7]2 = 7]8 = 4407r (9.23)

7]3 = 7]9 = 7007r 7]4 = 7]10 = 9407r (9.24)
7]5 = 7]11 = 12007r 7]6 = 7]12 = 16007r (9.25)
and the parameters are
cPi = 0 and cPi+6 = 0.57r, i = 1, 2, ... , 6 (9.26)

The performance of the output model using the sine wave basis function neural
network is illustrated in Figure 9.4. It can be seen that there is little difference
between the estimated pressure of the model and pressure of the simulated
combustor. The frequencies 7]i in the model are an important factor that affects
the prediction accuracy. However, by simulation it has been determinated that
if the frequencies 7]i have less than ±5% variation, the prediction is similar to
the results shown in Figure 9.4.
The estimated weights of the output model using the sine wave basis func-
tion neural network vary slowly with time, as shown in Figure 9.5 and Figure
9.6. This results from the system noise. If there is no noise in the system, the
parameters will converge with very little variation. The large change in magni-
tude of the weights which happens at time 0.5 seconds is caused by application
of the active control system. This is merely a reflection that the active control
system is working well and dramatically reduces the modal amplitudes, which
are a function of the weights. Though the weights become smaller, they will
not be zero if the system output is not zero.
Fig. 9.3. Active control of a simulated combustor

10
5
j
V
N~
~
~ ~
0
~
:0
(f)
(f)
~
~
n.
-5
-10
I~ Combustor
Model
-15~----~--~----~----~----~----~----~----~
0.26 0.265 0.27 0.275 0.28 0.285 0.29 0.295 0.3
Time (sec)
Fig. 9.4. Comparison of estimated and combustor pressure
6.---,,---,----,----,----,----,----,---,,---,----,
-2 W1
W2
W3
-4 W4
W5
W6
_6~ __ ~ __ ~ ____ ~ __ ~____ ___ L_ _ _ _L __ _
~ ~ __~ ____ ~
o 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Time (sec)
Fig. 9.5. Weights Wl,W2, ... ,W6 of the neural network

W7
WS
W9
10 W10
W11
W12
5
:c
(f)
OJ
'iii
S ~~
o '"
(?""'_" _.' ."7".'.-:.'.--:-.'. -: : ~ : :-:::: ..
I·
./
-5·i
I·
I
I
-10L---~--~--~--~--~--~----~--~--~--~
o 0.1 0.2 0.3 0.4 0.5 0.6 0.7 O.S 0.9
Time (sec)
Fig. 9.6. Weights W7, Ws, ... , W12 of the neural network
U sing the output model, the output predictor is of the form

N
fj(t + T) = L Wi(t) sin(1]i(t + T) + ¢i) (9.27)
i=l
The performance of the output predictor is very good before the active control
is used, as shown in Figure 9.7. After the active control is applied to the system,
the prediction performance of the output predictor becomes worse, but is still
quite good. This is because the active control changes the characteristics of
the closed-loop system with time. The difference between the measured and
predicted pressures is small, as shown in Figure 9.8. This also shows that
though there is noise in the system, the output predictor has disturbance-
rejection properties.
To stabilise and attenuate the system oscillation, a simple controller is
introduced which is a linear function of the output prediction, that is,
n
u(t) = -Kc L Wi(t) sin(1]i(t + T) + ¢i) (9.28)
i=l
where Kc is the feedback gain. The active control response is given in Figure
9.9, where it can be seen that when the active control is switched on at 0.5
seconds, the pressure is rapidly reduced. The amplitudes of the combustor
acoustic modes increase with time before active control is switched on at 0.5
second. When the active control is applied, these amplitudes reduce gradually.
The behaviour of the six modes is illustrated in Figure 9.10.
10
5 J V
'.
V
~ ~ ~
~ ~
0
~
:::J
<fl
I
<fl
(J)
0::
-5 ~
-10
1== Combustor
Predictor
1
-15
0.26 0.265 0.27 0.275 0.28 0.285 0.29 0.295 0.3
Time (sec)
Fig. 9.7. Measured and predicted pressure before active control
2r----,-----.-----r----,-----.----,r----,----~
1.5
~ ,\
"i ~ i'
N'}! J
I
0.5 I , '~I,1,1I I' !I
,I I
I 'I .1) ~I\I 'I' \"
, I' /
"
~~ o~' I, I .
,I
£ -0.5 .!
'j
.~ \ /
-1 II
~,I
I'
I ,I
-1.5
Ii
-2 Combustor I
Predictor
-2.5 '--------'-------'-------'------'------'--------''-------'-------'
1.26 1.265 1.27 1.275 1.28 1.285 1.29 1.295 1.3
Time (sec)
Fig. 9.8. Measured and predicted pressure after active control

15,-----,------,------,-----~--.__,------,_----_,----_,------._----_,
10
Fig. 9.9. Active control response
3
mode 6
2.5
2
<J)
Q)
"0
~
Q.
E 1.5
ell
Q)
"0
0
::2:
Time (sec)
Fig. 9.10. Amplitudes of each mode

Another performance measure of the system is the power spectrum of the

pressure. The power spectrum of the system, with and without active control,
is shown in Figure 9.11. The power in each mode is reduced by 20 dB when
the active control is applied.
40,------------------,------------------,------------------,
30
·40
active control
·50
·60L------------------L------------------~----------------~
o 500 1000 1500
Fig. 9.11. Power spectrum of the system, with and without active control
9.6 Active Control of an Experimental Combustor
Output model based predictive control using neural networks has also been
evaluated using an atmospheric combustion test rig with a commercial com-
bustor. A schematic diagram of the active control system for the combustor
test rig is shown in Figure 9.12.
The active controller consists mainly of the weight-learning algorithm for
the neural network based output model, the output predictor and the feedback
controller. These were designed using SIMULINK C-code S-functions. They
were implemented using the Math Works Real-Time Workshop, connected to
a dSPACE board based around the TMS320. The actuator was a loudspeaker,
which was installed on the outer wall of the combustion chamber. After scaling
for commercial reasons, the main results of active control for the combustor
with two modes are shown in Figure 9.13 and Figure 9.14. When the active
control is switched on at 1.5 seconds, the pressure is reduced. It can be seen
that these test results are consistent with the simulation results.
9.6 Active Control of an Experimental Combustor 191
Speaker
drive signal
d!!sa dSPACE control
and analysis
system
t
m
Chamber
pressure
signals
Main Illlllllim
Secondary
(2 BAR SU PPL YI
Pilot
Fig. 9.12. Active control system of the combustor test rig
1.5,--------,---------,---------,---------,---------,--------,
0.5
-0.5
-1
Active control on
_1.5L-________ ________
~ ~ _________ L _ _ _ _ _ _ _ _ _ L _ _ _ _ _ _ _ _~_ _ _ _ _ _ _ _~
o 0.5 1.5 2 2.5 3
Fig. 9.13. Pressure response, before and after active control

5,----,-----,-----,-----,-----,-----,----,,----,-----,-----,
l-
i\
o i \ control off
I \
-5
-10
\ \
control on
\\r!N'
\ ~;\
/,
{r,: . '\ f'
. V
\' '!
I
-20
_25L____ L_ _ _ _ _ _ _ _L __ __ L_ _ _ _ _ _ _ _L____ L_ _ _ _ _ _ _ _L __ _
~ ~ ~ ~
o 50 100 150 200 250 300 350 400 450 500
Fig. 9.14. Power spectral density of the combustor, with and without control
9.7 Summary
An active control strategy for combustion systems has been presented. The
strategy is based on an output model, an output predictor and a feedback
controller. Neural networks were used to reconstruct the measured output ac-
curately, using minor knowledge about the combustion system. Unlike a classi-
cal observer, only a measured output signal is required. To overcome the time
delay of the system which is often very large compared with the sampling pe-
riod, an output predictor has been developed. An output-feedback controller
was introduced which uses the output of the predictor to suppress instability
in the combustion process. The active control of a simulated unstable combus-
tor system with six modes was used to demonstrate how each mode can be
extracted and dealt with separately. The performance of the strategy was also
illustrated in a combustor test rig with two dominant modes. Since the output
prediction is accurate despite the need for only limited a priori knowledge,
the approach will be useful for combustion systems where the behaviour is not
fully understood.
REFERENCES
Akaike, H. (1974). A new look at the statistical model identification. IEEE

Transactions on Automatic Control, vol. AC-19, pp. 716-723.
Akansu, H. (1992). Multiresolution Signal Decomposition, Transforms, Sub-
bands and Wavelets. Academic Press, London.
Anderson, C. W. (1989). Learning to control an inverted pendulum using neu-
ral networks. IEEE Control Systems Magazine, vol. 9, pp. 31-37.
Anderson, J., J. Platt and D. Kirk (1993). Analog VLSI chip for radial basis
functions. Advances in Neural Information Processing Systems, vol. 5, pp.
765-772.
Annaswamy, A. M. and A. F. Ghoniem (1995). Active control in combustion
systems. IEEE Control Systems Magazine, vol. 15, no. 6, pp. 49-63.
Antsaklis, P. J. (ed.) (1990). Special issue on neural networks in control sys-
tems, IEEE Control Systems Magazine, vol. 10, no. 3.
Astrom, K. J. (1970). Introduction to Stochastic Control Tlteory. Academic
Press, New York.
Barnhill, R. (1983). A survey of the representation and design of surfaces.
IEEE Transactions on Computer Graphics, vol. 3, pp. 9-16.
Barto, A. G. (1990). Neural Networks for Control. MIT Press, Cambridge,
MA.
Basseville, M. (1989). Distance measures for signal processing and pattern
recognition. Signal Processing, vol. 18, pp. 349-369.
Baum, E. and D. Haussler (1989). What size net gives valid generalization.
Neural Computation, vol. 1, no. 1, pp. 151-160.
Bengio, Y. (1992). Radial basis functions for speech recognition. In P. Laface
and R. DeMori (eds), Speech Recognition and Understanding, Recent Ad-
vances, Springer-Verlag, Berlin, pp. 293-298.
Billings, S. A. (1980). Identification of nonlinear systems - a survey. lEE Ero-
ceedings, Eart D, vol. 127, pp. 272-285.
Billings, S. A. and W. S. F. Voon (1986). A prediction-error and stepwise-
regression estimation algorithm for non-linear systems. InteTTtational Jom·-
nal of Control, vol. 44, no. 1, pp. 803-822.
Billings, S. A., S. Chen and M . .T. Korenberg (1988). Identification of MIvIO
nonlinear systems using a forward-regression orthogonal estimator. Inter·-
national Journal of Control, vol. 49, pp. 2157-2189.
194 References
Billings, S. A., M. J. Korenberg and S. Chen (1989). Identification of non-

linear output-affine systems using an orthogonal least squares algorithm.
International Journal of Systems Science, vol. 19, pp. 1559-1568.
Billings, S. A. and S. Chen (1992). Neural networks and system identification.
In K. Warwick et al. (eds), Neural Networks for Systems and Control, Peter
Peregrinus, London, pp. 181-205.
Bounds, D., P. Lloyd and B. Mathew (1990). A comparison of neural net-
works and other pattern recognition approaches to the diagnosis of low
back disorders. Neural Networks, vol. 3, pp. 583-591.
Breiman, L. (1993). Hinging hyperplanes for regression, classification and
function approximation. IEEE Transactions on Information Theory, vol.
39, pp. 999-1013.
Broomhead, D. S. and D. Lowe (1988). Multivariable functional interpolation
and adaptive networks. Complex Systems, vol. 2, pp. 321-355.
Brown, M. and C. J. Harris (1994). Neurofuzzy Adaptive Modelling and Con-
trol. Prentice-Hall, New York.
Calderon, A. and A. Zygmund (1952). On existence of certain singular inte-
gral. Acta Mathematics, vol. 88, pp. 85-139.
Canudas, C. de Wit and J. Carrillo (1990). A modified EW-RLS algorithm
for systems with bounded disturbances. Automatica, vol. 26, pp. 599-606.
Carlin, M., T. Kavli and B. Lillekjendlie (1994). A comparison of 4 methods
for nonlinear data modelling. Chemometrics and Intelligent Laboratory Sys-
tems, vol. 23, no. 1, pp. 163-177.
Casdagli, M. (1989). Nonlinear prediction of chaotic time series. Physica D,
vol. 35, pp. 335-356.
Casdagli, M., D. Jardins, S. Eubank, J. Farmer, J. Gibson and J. Theiler
(1994). Nonlinear modelling of chaotic time series: theory and applications.
In K. M. Jong and J. Stringer (eds), Applied Chaos, John Wiley and Sons.
Chan, L. W. and F. Fallside (1987). An adaptive training algorithm for back
propagation training networks. Computer Speech and Language, vol. 2, no.
3, pp. 205-218.
Chauvin, Y. (1989). A back-propagation algorithm with optimal use of hidden
units. In D. S. Touretzky (ed.), Neural Information Processing Systems 1.,
Morgan Kaufmann, San Mateo, CA.
Chen, D. S. and R. C. Jain (1994). A robust back propagation learning algo-
rithm for function approximation. IEEE Transations on Neural Networks,
vol. 5, no. 3, pp. 467-479.
Chen, S. and S.A. Billings (1989). Representations of non-linear systems: the
NARMAX model. International Journal of Control, vol. 49, no. 3, pp. 1013-
1032.
Chen, S. , S. A. Billings, C. Cowan and P. Grant (1990). Practical identi-
fication of NARMAX models using radial basis functions. International
Journal of Control, vol. 52, no. 6, pp. 1327-1350.
Chen, S. and S. A. Billings (1992). Neural networks for nonlinear dynamic
system modelling and identification. International Journal of Control, vol.
References 195
56, no. 2, pp. 319-346.

Chen, S., S. A. Billings and P. M. Grant (1990). Nonlinear system identifica-
tion using neural networks. International Journal of Control, vol. 51, no.
6, pp. 1191-1214.
Chen, S. and J. Wigger (1995). A fast orthogonal least squares algorithm for
efficient subset model selection. IEEE Transactions on Signal Processing,
vol. 43, no. 7, pp. 1713-1715.
Chen, F. C. and H. K. Khalil (1995). Adaptive control of a class of nonlin-
ear discrete-time systems using neural networks. IEEE Transactions on
Automatic Control, vol. 40, no. 5, pp. 791-80l.
Chi, S. R., R. Shoureshi and M. Tenorio (1990). Neural networks for system
identification. IEEE Control Systems Magazine, vol. 10, pp. 31-34.
Chui, C. K. (1992). An Introduction to Wavelets. Academic Press, San Diego.
Clarke, D. W., C. Mohtadi and P. S. Tuffs (1987). Generalized predictive
control- part 1. The basic algorithm. Automatica, vol. 23, no. 2, pp. 149-
160.
Coca, D. and S. A. Billings (1997). Continuous-time system identification for
linear and nonlinear systems using wavelet decompositions. International
Journal of Bifurcation and Chaos, vol. 7, no. 1, pp. 87-96.
Cover, T. (1965). Geometrical and statistical properties of systems of linear
inequalities with applications to pattern recognition. IEEE Transactions
on Electronic Components, vol. 14, pp. 326-334.
Cutler, C. R. and B. 1. Ramaker (1980). Dynamic matrix control - a com-
puter control algorithm. Proceedings of the American Control Conference,
San Francisco.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function.
Mathematics of Control, Signals and Systems, vol. 2, pp. 303-314.
Daley, S. and G. P. Liu (1998). Active control of combustion instabilities. Pro-
ceedings of the International Conference on Control '98, vol. 1, pp. 416-42l.
Davis, L. (ed.) (1991). Handbook of Genetic Algorithms. Van Nostrand Rein-
hold, New York.
Daubechies, 1. (1988). Orthonormal bases of compactly supported wavelets.
Communication on Pure and Applied Mathematics, vol. 16, pp. 909-996.
Daubechies, 1. (1990). The wavelet transform, time-frequency localization and
signal analysis. IEEE Transactions on Information Theory, vol. 36, no. 5,
pp. 961-1005.
Delbruck, T. (1991). Bump circuts for computing similarity and disimilarity
of analog voltages. International Joint Conference on Neural Networks.
Dubois, D. and H. Prade (1991). Fuzzy-sets in approximate reason - 2: in-
feence with possibility. Fuzzy Sets and Systems, vol. 40, no. 1, pp. 143-202.
Duda, R. O. and P. E. Hart (1973). Pattern Classification and Scene Analysis.
John Wiley, New York.
Emelyanov, S. V. (1967). Variable Structure Control Systems. Nauka, Moscow
(in Russian).
196 References
Fabri, S. and V. Kadirkamanathan (1996). Dynamic structure neural networks

for stable adaptive control of nonlinear systems. IEEE Transactions on
Neural Networks, vol. 7, pp. 1151-1167.
Fahlman, S. E. and C. Lebiere (1990). The cascade-correlation architecture. In
D. S. Touretzky (ed.), Advances in Neural Information Processing Systems
2, Morgan Kaufmann, San Mateo, CA.
Fallside, F. (1989). Analysis of linear predictive data such as speech and of
arma processes by a class of single-layer connectionist models. In NATO
Advanced Research Workshop on neuro Computing, algorithms, Architec-
tures and Applications, NATO/ ASI Series, Springer-Verlag.
Feng, G. (1994). Stable adaptive predictor for nonlinear systems using neural
networks. Computers and Electrical Engineering, vol. 20, no. 5, pp. 383-390.
Fogel, E. and Y. F. Huang (1982). On the value of information in system
identification - bounded noise case. Automatica, vol. 18, no. 2, pp. 229-
238.
Fonseca, C. M., E. M. Mendes, P. J. Fleming and S. A. Billings (1993). Nonlin-
ear model term selection with genetic algorithms. Proceedings of lEE/IEEE
Workshop on Natural Algorithms for Signal Processing, pp. 27/1-27/8.
Friedman, J. H. (1991). Multivariate adaptive regresion splines. The Annals
of Statistics, vol. 19, no. 1, pp. 1-67.
Fu, K. S. (1970). Learning control systems - review and outlook. IEEE Trans-
actions on Automatic Control, vol. 16, pp.210-221.
Fung, C., S. A. Billings and W. Luo (1996). On line supervised adaptive track-
ing using radial basis function neural networks. Neural Networks, vol. 9,
pp. 1597-1617.
Fung, Y. and V. Yang (1991). Active control of non-linear pressure oscilla-
tions in combustion chambers. The 27th AIAA/ASME/SAE/ASEE Joint
Propulsion Conference, Sacramento, CA.
Furuta, K. (1990). Sliding mode control of a discrete system. Systems and
Control Letters, vol. 14, pp. 145-152.
Furuta, K. (1993). VSS type self-tuning control. IEEE Transactions on Indus-
trial Electronics, vol. 40, no. 1, pp. 37-44.
Gabor, D. (1946). Theory of communication. Journal of lEE, vol. 93, pp. 429-
457.
Geman, S., E. Bienenstock and R. Boursat (1992). Neural networks and the
bias/variance dilemma. Neural Computation, vol. 4, pp. 1-58.
Glesner, M. and W. Pochmuller (1994). Neurocomputers: An Overview of
Neural Networks in VLSI. Neural Computing Series, Chapman and Hall.
Goldberg, D. E. (1989). Genetic Algorithm in Search, Optimization, and Ma-
chine Learning. Addison Wesley Publishing Company, Wokingham, UK.
Goodwin, G. C. and D. Q. Mayne (1987). A parameter estimation perspective
of continuous time model reference adaptive control. Automatica, vol. 23,
no. 1, pp. 57-70.
Goodwin, G. C. and K. S. Sin (1984). Adaptive Filtering Prediction and Con-
trol. Prentice-Hall, Englewood Cliffs, N J.
References 197
Grossmannand, A. and J. Morlet (1984). Decomposition of Hardy functions

into square integrable wavelets of constant shape. SIAM Journal of Math-
ematic Analysis, vol. 15, pp. 723-736.
Guez, A., J. 1. Elibert and M. Kam (1988). Neural network architecture for
control. IEEE Control Systems Magazine, vol. 8, pp. 22-25.
Haber, H. and H. Unbehauen (1990). Structure identification of nonlinear dy-
namic systems - a survey on input/output approaches. Automatica, vol.
26, no. 4, pp. 651-677.
Hajela, P. and C. Y. Lin (1992). Genetic search strategies in multicriterion
optimal design. Structural Optimization, vol. 4, pp. 99-107.
Hanson S. J. and L. Y. Pratt (1989). Comparing biases for minimal network
construction. In D. S. Touretzky (ed.), Neural Information Processing Sys-
tems 1., Morgan Kaufmann, San Mateo, CA.
Harris, J. (1987). A new approach to surface reconstruction: the coupled
depth/slope model. IEEE First International Conference on Computer Vi-
sion, London, pp. 277-283.
Harris, J. (1988). An analog VLSI chip for thin-plate surface interpolation.
NIPS '88 Conference.
Harris, J. (1994). Implementing radial basis function using bump-resistor net-
works. IEEE ICNN, Orlando, F1.
Haykin, S. (1994). Neural Networks: A Comprehensive Foundation. Macmillan
College Publishing Company, New York.
Hinton, G. E. (1987). Connectionist learniing procedures. Artificial Intelli-
gence, vol. 40, pp. 185-234.
Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. The Uni-
versity of Michigan Press, Ann Arbor.
Holzfuss, J. and J. Kadtke (1993). Global nonlinear noise reduction using ra-
dial basis functions. International Journal of Bifurcation and Chaos, vol.
3, no. 3, pp. 589-596.
Hung, J. Y., W. Gao and J. C. Hung (1993). Variable structure control: A
survey. IEEE Transactions on Industrial Electronics, vol. 40, no. 1, pp.
2-22.
Hunt, K. J. and D. Sbararo (1991). Neural networks for nonlinear internal
model control. lEE Proceedings, Part D, vol. 138, pp. 431-438.
Hunt, K. J., D. Sbararo, R. Zbikowski and P. J. Gawthrop (1992). Neural
networks for control systems - a survey. Automatica, vol. 28, no. 6, pp.
1083-1112.
IEEE (1996). Scanning the special issue on wavelets. Proceedings of the IEEE,
vol. 84, no. 4.
Ioannou, P. A. and P. V. Kokotovic (1983). Adaptive Systems with Reduced
Models. Springer-Verlag, New York.
Ioannou, P. A. and A. Datta (1991). Robust adaptive control: design, analysis
and robustness bounds. In Foundations of Adaptive Control, P. V. Koko-
tovic (ed.), Springer-Verlag, Berlin, pp. 71-152.
198 References
Ioannou, P. A. and Tsakalis (1986). A robust direct adaptive control. IEEE

Transactions on Automatic Control, vol. 31, pp. 1033-1043.
Isidori, A. (1989). Nonlinear Control Systems: An Introduction. Springer-
Verlag, Berlin.
Itkis, Y. (1976). Control Systems of Variable Structure. Wiley, New York.
Jang, J. and C. Sun (1993). Functional equivalence between radial basis func-
tion networks and fuzzy inference systems. IEEE Transactions on Neural
Networks, vol. 4, no. 1, pp. 156-159.
Kadirkamanathan, V. (1991). Sequential Learning in Artificial Neural Net-
works. Ph.D Thesis, University of Cambridge, UK.
Kadirkamanathan, V. (1995). Bayesian inference for basis function selection
in nonlinear system identification using genetic algorithms. In Maximum
entroby and bayesian methods, J. Skilling and S. Sibisi (eds), Kluwer, Dor-
drecht.
Kadirkamanathan, V. and G. P. Liu (1995). Robust identification with neural
networks using multiobjective criteria. Proceedings of 5th IFAC Symposium
on Adaptive Systems in Control and Signal Processing, Budapest, Hungary,
pp. 305-310.
Kadirkamanathan, V. and M. Niranjan (1992). Application of an architec-
turally dynamic network for speech pattern classification. Proceedings of
the Institute of Acoustics, vol. 14, part 6, pp. 343-350.
Kadirkamanathan, V. and M. Niranjan (1993). A function estimation ap-
proach to sequential learning with neural Networks. Neural Computation,
vol. 5, pp. 954-957.
Karakasoglu, A., S. L. Sudharsanan and M. K. Sundareshan (1993). Identifi-
cation and decentralized adaptive control using neural networks with ap-
plication to robotic manipulators. IEEE Transactions on Neural Networks,
vol. 4, no. 6, pp. 919-930.
Keyser, R. M. C. De and A. R. van Cauwenberghe (1985). Extended predic-
tion self-adaptive control. Proceedings of the 7th IFAC Symposium on Iden-
tification and System Parameter Estimation, York, UK, pp. 1255-1260.
Kohonen, T. (1984). Self-Organization and Associate Memory. Springer-
Verlag, New York, second edition, 1984.
Korenberg, M., S. A. Billings, Y. P. Liu and P. J. Mcllroy (1988). Orthogonal
parameter estimation algorithm for non-linear stochastic systems. Interna-
tional Journal of Control, vol. 48, no. 1, pp. 193-210.
Kumar, P. and E. Foufoula-Georgiou (1993). A multicomponent decomposi-
tion of spatial rainfall fields -1: segregation of large and small scale features
using wavelet transforms. Water Resources Research, vol. 29, no. 8, pp.
2515-2532.
Kuschewski, J. G., S. Hui and S. H. Zak (1993). Application of feedforward
neural networks to dynamical system identification and control. IEEE
Transactions on Control Systems Technology, vol. 1, no. 1, pp. 37-49.
Landau, 1. D. (1979). Adaptive Control - The Model Reference Approach.
Marcel Dekker, Inc., New York.
References 199
Lane, S. H., D. A. Handelman and J. J. Gelfand (1989). Development of adap-

tive B-splines using CMAC neural networks. Proceedings of the Interna-
tional Joint Conference on Neural Networks, Washington D.C.
Lapedes, A. S. and R. M. Farber (1987). Nonlinear signal processing using
neural networks: prediction and system modelling. Technical Report LA-
UR-87-2662, Los Alamos National Laboratory.
Lapedes, A. S. and R. M. Farber (1988). How neural networks work. In Neu-
ral Information Processing Systems, D. Z. Anderson (ed.), pp. 442-456,
American Institute of Physics, New York.
LeCun, Y., J. S. Denker and S. A. Solla (1990). Optimal brain damage. In
Advances in Neural Information Processing Systems 2 , D. S. Touretzky
(ed.), Morgan Kaufmann, San Mateo, CA, pp. 598-605.
Lee, T. and W. Tan (1993). Real-time parallel adaptive neural network con-
trol for nonlinear servomechanisms - an apprach using direct adaptive tech-
niques. Mechatronics, vol. 3, no. 6, pp. 705-725.
Leonard, J. and M. Kramer (1991). Radial basis function networks for classi-
fying process faults. IEEE Control Systems, vol. 11, no. 3, pp. 31-38.
Leontaritis,1. J. and S. A. Billings (1985). Input-output parametric models
for nonlinear systems, Parts I and II. International Journal of Control, vol.
41, no. 1, pp. 303-344.
Leontaritis,1. J. and S. A. Billings (1987). Model selection and validation
methods for non-linear systems. International Journal of Control, vol. 45,
no. 3, pp. 311-341.
Leung, H. and S. Haykin (1991). Neural network modelling of radar backscat-
ter from an ocean surface using chaos theory. Proceedings of the SPIE, vol.
1565, pp. 279-286.
Linz, P. (1979). Theoretical Numerical Analysis. John Wiley, New York.
Lippmann, R. P. (1987). An introduction to computing with neural nets.
IEEE ASSP Magazine, vol. 4, pp. 4-22.
Liu, G. P. (2000). Neural-learning control of nonlinear dynamical systems.
Proceedings of the lEE Workshop on Learning Systems for Control, Birm-
ingham.
Liu, G. P., S. A. Billings and V. Kadirkamanathan (1998). Nonlinear system
identification using wavelet networks. Proceedings of the UKA CC Interna-
tional Conference on Control '98, Swansea, pp. 1248-1253.
Liu, G. P., S. A. Billings and V. Kadirkamanathan (1999). Nonlinear system
identification via variable wavelet networks,. Proceedings of the 14th IFAC
World Congress, Beijing, vol. H, pp. 109-114.
Liu, G. P., S. A. Billings and V. Kadirkamanathan (2000). Identification of
nonlinear dynamical systems using wavelet networks, International Journal
of Systems Science, vol. 31, no. 12, pp. 1531-1541.
Liu, G. P. and S. Daley (1999a). Design and implementation of adaptive pre-
dictive control for a combustor NOx process. Journal of Control Process,
vol. 9, pp.485-491.
200 References
Liu, G. P. and S. Daley (1999b). Output model based predictive control for
unstable combustion systems using neural networks. Control Engineering
Practice, vol. 7, pp. 591-600.
Liu, G. P. and S. Daley (1999c). Neural network based predictive control
of unstable combustion systems. Proceedings of the 14th IFAC World
Congress, Beijing, vol. J, pp. 421-426.
Liu, G. P. and S. Daley (1999d). Adaptive predictive control of combustor
NOx emissions. Proceedings of the 14th IFAC World Congress, Beijing,
vol. 0, pp. 91-96.
Liu, G. P. and S. Daley (2001) Adaptive predictive control of combustor NOx
emissions. Control Engineering Practice, vol. 9., no. 6, pp. 631-638.
Liu, G. P. and V. Kadirkamanathan (1995). Learning with Multiobjective Cri-
teria. Proceedings of the Fourth International Conference on Artificial Neu-
ral Networks, Cambridge, UK, pp. 53-58.
Liu, G. P. and V. Kadirkamanathan (1999). Multiobjective criteria for non-
linear model selection and identification with neural networks. lEE Pro-
ceedings, Part D, vol. 146, no. 5, pp. 373-382.
Liu, G. P., V. Kadirkamanathan and S. A. Billings (1995). Sequential iden-
tification of nonlinear systems by neural networks. Proceedings of the 3rd
European Control Conference, Rome, pp. 2408-2413.
Liu, G. P., V. Kadirkamanathan and S. A. Billings (1996a). Nonlinear pre-
dictive control using neural networks. Proceedings of the UKACC Interna-
tional Conference on Control '96, UK, vol. 2, pp. 746-751.
Liu, G. P., V. Kadirkamanathan and S. A. Billings (1996b). Variable neu-
ral networks for nonlinear adaptive control. Preprints of the 13th IFAC
Congress, San Francisco, vol. F, pp.181-186.
Liu, G. P., V. Kadirkamanathan and S. A. Billings (1996c). Stable sequential
identification of continuous nonlinear dynamical systems by growing RBF
networks. International Journal of Control, vol. 65, no. 1, pp. 53-69.
Liu, G. P., V. Kadirkamanathan and S. A. Billings (1997a). On-line identifi-
cation of nonlinear systems using Volterra polynomial basis function neural
networks. Proceedings of the 4th European Control Conference, Brussels.
Liu, G. P., V. Kadirkamanathan and S. A. Billings (1997b). Variable structure
control for nonlinear discrete systems using neural networks. Proceedings
of the 4th European Control Conference, Brussels.
Liu, G. P., V. Kadirkamanathan and S. A. Billings (1998a). On-line identifi-
cation for nonlinear systems using Volterra polynomial neural networks.
Neural Networks, vol. 11, pp. 1645-1657.
Liu, G. P., V. Kadirkamanathan and S. A. Billings (1998b). Predictive con-
trol for nonlinear systems using neural networks. International Journal of
Control, vol. 71, no. 6, pp. 1119-1132.
Liu, G. P., V. Kadirkamanathan and S. A. Billings (1999a). Variable neural
networks for adaptive control of nonlinear systems. IEEE Transactions on
Systems, Man, and Cybernetics, vol. 29, no. 1, pp. 34-43.
References 201
Liu, G. P., V. Kadirkamanathan and S. A. Billings (1999b). Neural network

based variable structure system nonlinear control. International Journal of
Systems Science, vol. 30, no. 10, pp. 1153-1164.
Liu, G. P. and R J. Patton (1998). Eigenstructure Assignment for Control
System Design. John Wiley & Sons, Chichester.
Liu, G. P., J. B. Yang and J. F. Whidborne (2001). Multiobjective Optimisa-
tion and Control. Research Studies Press Ltd., Taunton, UK.
Ljung, L. and T. Soderstrom (1983). Theory and Practice of Recursive Iden-
tification. MIT Press, Cambridge, MA.
Ljung, L. and T. Glad (1994). Modelling of Dynamic Systems. Information
and System Sciences Series, Prentice Hall, Englewood Cliffs, NJ.
Lohninger, H. (1993). Evaluation of neural networks basied on radial basis
functions and their applications to the prediction of boiling points from
structureal parameters. Journal of Chemical Information and Computer
Sciences, vol. 33, no. 5, pp. 130-744.
Luenberger, D.G. (1984). Linear and Nonlinear Programming. Addson-
Wesley, New York, second edition.
Luo, W. and S. A. Billings (1995). Adaptive model selection and estimation
for nonlinear systems using a sliding data window. Signal Processing, vol.
46, pp. 179-202.
Luo, W. and S. A. Billings (1998). Structure selective updating for nonlinear
models and radial basis function neural networks. International Journal of
Control and Signal Processing, vol. 12, pp. 325-345.
Luo, W., S. A. Billings and K. M. Tsang (1996). On line structure detection
and parameter estimation with exponential windowing for nonlinear sys-
tems. European Journal of Control, vol. 2, pp. 291-304.
MacKay, D. J. C. (1992). Bayesian interpolation. Neural Computation, vol. 4,
no. 3, pp. 415-447.
Mallat, S. G. (1989a). A theory for multiresolution signal decomposition: the
wavelet representation. IEEE Transactions on Pattern Analysis and Ma-
chine Intelligence, vol. 11, no. 7, pp. 674-693.
Mallat, S. G. (1989b). Multifrequency channel decompositions of images and
wavelet models. IEEE Transactions on Acoustic, Speech, and Signal Pro-
cessing, vol. 37, pp. 2091-2110.
McCulloch, W. W. and W. Pitts (1943). A logical calculus of the ideas immi-
nent in nervous activity. Bulletin of Mathematical Biophysics, vol. 5, pp.
115-133.
McManus, K. R, V. Vandsburger and C. T. Bowman (1990). Combustor per-
formance enhancement through direct shear layer excitation. Combustion
and Flame, vol. 82, pp. 75-92.
McIntosh, A. R., S. L. Shah and D. G. Fisher (1991). Analysis and tuning of
adaptive generalized predictive control. The Canadian Journal of Chemical
Engineering, vol. 69, pp. 97-110.
Mead, W., P. Bowling, S. Brown, R. Jones, C. Barnes, H. Gibson, J. Goulding
and Y. Lee (1992). Optimisation and control of a small-angle negative
202 References
ion source using an on-line adaptive controller based on the connectionist

normalised local spline neural network. Nuclear Instruments and Methods
in Physics Research, vol. B72, pp. 271-289.
Meyer, Y. (1993). Wavelets and Operators. Cambridge Studies in Advanced
Mathematics, Cambridge University Press, Cambridge.
Miller, T. M., R. S. Sutton III and P. J. Werbos (eds) (1990). Neural Networks
for Control. MIT Press, Cambridge, MA.
Morningred, J. D., D. A. Mellichamp and D. E. Seborg (1991). A multivari-
able adaptive nonlinear predictive controller. Proceedings of the American
Control Conference, Boston, pp. 2776-2781.
Moody, J. and C. J. Darken (1989). Fast learning in networks of locally-tuned
processing units. Neural Computation, vol. 1, pp. 281-294.
Morlet, J. , G. Arens, 1. Fourgeau and D. Giard (1982). Wave propagation
and sampling theory. Geophysics, vol. 47, pp. 203-236.
Mozer, M. C. and P. Smolensky (1989). Skeletonization: A technique for trim-
ming the fat from a network via relevance assignment. In D. S. Touretzky
(ed.), Advances in Neural Information Processing Systems 1, Morgan Kauf-
mann, San Mateo, CA.
Narendra, K. S. and A. M. Annaswamy (1987). A new adaptive law for robust
adaptation with persistent excitation. IEEE Transactions on Automatic
Control, vol. 32, no. 2, pp. 134-145.
Narendra, K. S. and A. M. Annaswamy (1989). Stable Adaptive Systems.
Prentice-Hall, Engelwood Cliffs, NJ.
Narendra, K. S. and K. Parthasarathy (1990). Identification and control of
dynamical systems using neural networks. IEEE Transactions on Neural
Narendra, K. S. and K. Parthasarathy (1991). Gradient methods for the op-
timization of dynamical systems containing neural networks. IEEE Trans-
actions on Neural Networks, vol. 2, pp. 252-262.
Narendra, K. S. and S. Mukhopadhyay (1997). Adaptive control using neu-
ral networks and approximate models. IEEE Transactions on Neural Net-
works, vol. 8, no. 3, pp. 475-485.
Neifield, M., A. Yamamura, S. Rakshit, S. Kobayashi and D. Psaltis (1991).
Optical disks in optical computing. IEEE International Joint Conference
on Neural Networks, pp. 442-447.
Nerrand, 0., P. Rousselragot, L. Personnaz and G. Dreyfus (1994). Training
recurrent neural networks: why and how? An illustration in dynamical
process modeling. IEEE Transations on Neural Networks, vol. 5, no. 2, pp.
178-184.
Newland, D. E. (1993). An Introduction to Random Vibrations, Spectral and
Wavelet Analysis. Longman Scientific & Technical, Harlow, third edition.
Neumeier, Y. and B. T. Zinn (1995). Active control of combustion instabilities
using real time identification of unstable combustor modes. Proceedings of
the 4th IEEE Conference on Control Applications, pp. 691-698.
References 203
Nie, J. and D. Linkens (1993). Learning control using fuzzyfied self-organizing

radial basis function network. IEEE Transactions on Fuzzy Systems, vol.
1, no. 4, pp. 280-207.
Nijmeijer, Hand A. J. van der Schaft (1990). Nonlinear Dynamical Control
Systems. Springer-Verlag, New York.
Niranjan, M. and F. Fallside (1990). Neural networks and radial basis func-
tions for classifying static speech patterns. Computer Speech and Language,
vol. 4, pp. 275-289.
Ortega, R. and R. Lozano-Leal (1987). Reformulation of the parameter iden-
tification problem for systems with bounded disturbances, Automatica, vol.
23, no. 2, pp. 247-251.
Padmanabhan, K. T., C. T. Bowman and J. D. Powell (1995). An adaptive
optimal combustion control strategy. Combustion and Flame, vol. 100, pp.
101-110.
Pan, Y. and K. Furuta (1995). VSS approach to the design of robust digital
controller using transfer equation. European Journal of Control, vol. 1, no.
2, pp. 166-173.
Parisini, T. and R. Zoppoli (1993). Radial basis functions and multilayer feed-
forward neural networks for optimal control of nonlinear stochastic systems.
IEEE International Joint Conference on Neural Networks 1993, vol. 1, pp.
1853-1858.
Pati, Y. C. and P. S. Krishnaprasad (1990). Analysis and synthesis of feedfor-
ward neural networks using discrete affine wavelet transformations. Tech-
nical Report TR 90-44, Electrical Engineering Department, University of
Maryland at College park.
Patton, R. J. and G. P. Liu (1994). Robust control design via eigenstructure
assignment, genetic algorithms and gradient-based optimization. lEE Pro-
ceedings, Part D, vol. 141, no. 3, pp. 202-208.
Platt, J. (1991). A resource allocating network for function interpolation. Neu-
ral Computation, vol. 4, no. 2, pp. 213-225.
Poggio, T. and F. Girosi (1990a). Regularization algorithms for learning that
are equivalent to multilayer networks. Science, vol. 247, pp. 978-982.
Poggio, T. and F. Girosi (1990b). Networks for approximation and learning.
Proceedings of the IEEE, vol. 78, no. 9, pp. 1481-1497.
Polycarpou, M. M. and P. A. Ioannou (1991). Identification and control of
nonlinear systems using neural network models: design and stability anal-
ysis. Technical Report 91-09-01, Department of Electrical Engineering-
Systems, University of Southern California, USA.
Powell, M. J. D. (1981). Approximation Theory and Methods. Cambridge Uni-
versity Press, Cambridge.
Powell, M. J. D. (1987). Radial basis functions for multivariable interpolation:
a review. In J. C. Mason and M. G. Cox (eds), Algorithms for Approxima-
tion, pp. 143-167, Oxford University Press, Oxford.
Proll, T. and M. N. Karim (1994). Real-time design of an adaptive nonlinear
predictive controller. International Journal of Control, vol. 59, no. 3, pp.
204 References
863-889.
Psaltis, D., A. Sideris and A. A. Yamamura (1988). A multilayered neural
network controller. IEEE Control Systems Magazine, vol. 8, pp. 17-21
Qian, S., Y. C. Lee and R. D. Jones, C. W. Barnes and K. Lee (1990). The
function approximation with an orthogonal basis net. Technical Report,
Los Alamos National Laboratory.
Qin, S. Z., H. T. Su and T. J. McAvoy (1992). Comparison offour net learning
methods for dynamic system identification. IEEE Transactions on Neural
Rayner, P. J. and M. Lynch (1989). A new connectionist model based on a
nonlinear adaptive filter. Proceedings of the International Conference on
Acoustics, Speech and Signal Processing, pp. 1191-1194, Glasgow, Scotland.
Renals, S. (1989). Radial basis function network for speech pattern classifica-
tion. Electronics Letters, vol. 27, no. 7, pp. 437-439.
Richalet, J., S. Abu el Ata-Doss, Ch. Arber, H. B. Kuntze, A. Jacubasch and
W. Schill (1987). Predictive functional control application to fast and ac-
curate robots. Proceedings of the 10th IFAC Congress, Munich, Germany.
Rioul, O. (1993). Regular wavelets: a discrete-time approach. IEEE Transac-
tions on Signal Processing, vol. 41, no. 12, pp. 3572-3579.
Rioul, O. and M. Vetterli (1991). Wavelets and signal processing. IEEE Signal
Processing Magazine, vol. 8, pp. 14-38.
Rissanen, J. (1989). Stochastic Complexity in Statistical Inquiry, World Sci-
entific, Singapore.
Robbins, H. and S. Munro (1951). Stochastic approximation method. Annals
of Mathematical Statisitics, vol. 22, pp. 400-407.
Roscheisen, M., R. Hofmann and V. Tresp (1992). Neural control for rolling
mills: incorporating domain theories to overcome data deficiency. Advances
in Neural Information Processing Systems, vol. 4, pp. 659-666.
Rosenblatt, F. (1958). The perceptron: a probabilistic model for information
storage and organisation in the brain. Psychological Review, vol. 65, pp.
386-408.
Rumelhart, D. E., G. E. Hinton and R. J. Williams (1986). Learning internal
representations by error propagation. In Parallel Distributed Proceesing:
Explorations in the Microstructure of Cognition. D. E. Rumelhard and J.
L. McClelland (eds), vol. 1: Foundations, Bradford Books/MIT Press, Cam-
bridge, MA.
Ruskai, M. (1991). Wavelets and Their Applications. Jones and Bartlett Pub-
lishers.
Sadegh, N. (1993). A perceptron network for functional identification and con-
trol of nonlinear systems. IEEE Transactions on Neural Networks, vol. 4,
no. 6, pp. 982-988.
Sanner, R. M. and J. J. E. Slotine (1992). Gaussian networks for direct adap-
tive control. IEEE Transactions on Neural Networks, vol. 3, no. 6, pp.
837-863.
References 205
Sastry, S. and M. Bodson (1989). Adaptive Control: Stability, Convergence

and Robustness. Prentice-Hall, Englewood Cliffs, NJ.
Schadow, K. C. and E. Gutmark (1989). Review of passive shear flow control
research for improved subsonic and supersonic combustion. AIAA Paper
89-2786.
Schetzen, M. (1980). The Volterra and Wiener Theories of Nonlinear Systems.
Wiley, New York.
Schaffer, J. D. (1985). Multiple objective optimization with vector valued ge-
netic algorithms. In Proceedings of the First International Conference on
Genetic Algorithms, J. J. Grefenstette (ed.) Lawrence Elbaum, pp. 93-100.
Schaffer, J. D., R. A. Caruana and L. J. Eshelman (1990). Using genetic search
to exploit the emergent behavior of neural networks. Physica D, vol. 42,
no. 1-3, pp. 244-248.
Sjoberg, J., Q. Zhang, L. Ljung, A. Benveniste, B. Delyon, P. Glorennec, H.
Hjalmarsson and A. Juditsky (1995). Nonlinear black-box modelling in
system identification: a unified overview. Automatica, vol. 31, no. 12, pp.
1691-1724.
Slotine, J. J. E. and W. Li (1991). Applied Nonlinear Control. Prentice-Hall
International, Englewood Cliffs, N J.
Smyth, P. (1991). On stochastic complexity and admissible models for neural
network classifiers. In R. P. Lippmann, J. Moody and D. S. Touretzky (eds),
Advances in Neural Information Processing Systems 3, Morgan Kaufmann,
San Mateo, CA.
Soeterboek, A. R. M., H. B. Verbruggen, P. P. J. Van den Bosch and H. Bulter
(1990). On the unification of predictive control algorithms. Proceedings of
the 29th IEEE Conference on Decision and Control, Honolulu, USA.
Solla, S. A., E. Levin and M. Fleisher (1988). Accelerated learning in layered
neural networks. Complex Systems, vol. 2, pp. 625-640.
Sprecher, D.A. (1965). On the structure of continuous functions of several
variables. Transactions of the American Mathematical Society, vol. 115,
pp. 340-355.
Stinchcombe, M. and H. White (1989). Universal approximation using feed-
forward networks with non-sigmoid hidden layer activation functions. Pro-
ceedings of the International Joint Conference on Neural Networks, Wash-
ington DC.
Strang, G. (1989). Wavelets and dilation equations: a brief introduction.
SIAM Review, vol. 31, no. 4, pp. 614-627.
Strichartz, R. (1993). How to make wavelets. American Mathematical
Monthly, vol. 3100, no. 6, pp. 539-556.
Takagi, H. and 1. Hayashi (1991). Nn-driven fuzzy reasoning. International
Journal of Approximate Reasoning, vol. 5, no. 3, pp. 191-212.
Tan, Y. and R. De Keyser (1994). Neural network based adaptive control.
Proceedings of Advances in Model-Based Predictive Control, pp. 358-369.
Tattersall, G., S. Foster and R. Johnston (1991). Single-layer lookup percep-
trons. lEE Proceedings, Part F, vol. 138, no. 1, pp. 46-54.
206 References
Tierno, J. E. and J. C. Doyle (1992). Multi mode active stabilisation of a Rijke

tube. ASME, DSC-vo138, Active Control of Noise and Vibration, pp. 65-68.
Utkin, V. 1. (1964). Variable structure system with sliding mode: a survey.
IEEE Transactions on Automatic Control, vol. 22, no. 2, pp. 212-222.
Vapnik, V. N. and A. Y. Chervonenkis (1971). On the uniform convergence of
relative frequencies of events to their probabilities. Theory of Probability
and its Applications, vol. XVI, no.3, pp. 264-280.
Vidyasagar, M. (1978). Nonlinear Systems Analysis. Prentice-Hall, Engle-
wood Cliffs, N.J.
Wang, H., G. P. Liu, C. J. Harris and M. Brown (1995). Advanced Adaptive
Control. Pergamon Press Ltd, Oxford.
Wang, L. X. (1993). Stable adaptive fuzzy control of nonlinear systems. IEEE
Transactions on Neural Networks, vol. 1, pp. 146-155.
Watkins, S. S. and P. M. Chau (1992). Different approaches to implementing
a radial basis function neurocomputer. RNNS/IEEE Symposium on Neu-
roinformation and Neurocomputer, vol. 2. pp. 1149-1155.
Weigend, A. S., B. A. Huberman and D. E. Rumelhart (1991). Generalization
by weight-elimination with application to forecasting. In R. P. Lippmann,
D. S. Touretzy and J. Moody (eds), Neural Information Processing Systems
3, Morgan Kaufmann, San Mateo, CA, pp. 875-882.
Weiss, L. (1994). Wavelets and wideband correlation processing. IEEE Signal
Processing Magazine, vol. 11, pp. 13-32.
Werbos, R. J. (1990). Backpropagation through time: what it does and how
to do it. Proceedings of the IEEE, vol. 78, pp. 1550-1560.
Whidborne, J. F. and G. P. Liu (1993). Critical Control Systems: Theory, De-
sign and Applications. Research Studies Press, UK.
White, H. (1989). Learning in artificial neural networks: a statistical perspec-
tive. Neural Computation, vol. 1, no. 4, pp. 425-464.
Whitehead, B. A. and T.D. Choate (1994). Evolving space-filling curves to
distribute radial basis functions over an input space. IEEE Transactions
on Neural Networks, vol. 5, no. 1, pp. 15-23.
Widrow, B. and M. E. Hoff (1960). Adaptive switching circuits. IRE
WESCON Convention Record, vol. 4, pp. 96-104.
Wilkins, M., C. Morris and L. Boddy (1994). A comparison of radial basis
function and backpropagation neural networks for identification of marine
phytoplankton from multivariate flow cytometry data. CABIOS, vol. 10,
no. 3, pp. 285-294.
Williams, R. J. and D. Zipser (1989). A learning algorithm for continuous run-
ning fully recurrent neural networks. Neural Computation, vol. 1, pp. 270-
280.
Willis, M. J., G. A. Montague, C. Di. Massimo, M. T. Tham and A. J. Morris
(1992). Artificial neural networks in process estimation and control. A uto-
matica, vol. 28, no. 6, pp. 1181-1187.
Xie, X. and R. Evans (1984). Discrete-time adaptive control for deterministic
time-varing systems. Automatica, vol. 20, pp. 309-319.
References 207
Ydstie, B. E. (1984). Extended horizon adaptive control. Proceedings of the

9th IFAC World Congress, Budapest, Hungary.
Young, P. C. (1984). Recursive Estimation and Time Series Analysis.
Springer-Verlag, Berlin.
Zadeh, L. (1973). Outline of a new approach to the analysis of complex sys-
tems and decision processes. IEEE Transactions on Systems, Man, and
Cybernetics, vol. 3, pp. 28-44.
Zadeh, L. (1994). Fuzzy logic, neural networks, and soft computing. Commu-
nication ACM, vol. 37, pp. 77-86.
Zakian, V. and U. Al-Naib (1973). Design of dynamical and control systems
by the method of inequalities. lEE Proceedings, vol. 120, pp. 1421-1427.
Zhang, Q. and A. Benveniste (1992). Wavelet Networks. IEEE Transactions
on Neural Networks, vol. 3, no. 6, pp. 889-898.
Zhu, K. Y., X. F. Qin and T. Y. Chai (1997). A new robust non-linear self-
tuning predictive control using neural networks. Proceedings IME: Journal
of Control System Engineering, vol. 211, pp. 439-446.
INDEX
activation function 2 Gaussian derivative wavelet 12

active control 183 Gaussian radial basis function network
active ~tabili~ation 181 8
adaptation algorithm 135 genetic algorithms 81
adaptive control 126 global RBFs 7
adaptive neural control 129 growing network 44
adder 2 growing neural network 155
adding operation 31
affine system 129 hardware implementation 23
approximation 14
inner product 24
B-spline function 115 inverse multiquadric 7
B-spline wavelet 115
lattice network 6
capacity of neural networks 16 learning 14
chaotic time series 21 least mean square algorithm 20
Chebyshev norm 24 least squares approximation 15
chromosomal representation 88 linguistic values 9
classification 20 linguistic variables 9
combustion dynamics 180 local RBFs 7
combustor 183, 190 Lyapunov function 153
connecting links 2 Lyapunov's second method 126
control horizon 146 Lyapunov stability theorem 126
crossover 89 Lp-norm 23
L=-norm 24
dilation factor 11 L 2 -norm 24
Diophantine equation 147
method of inequalities 81
eigenvalue 153 Mexican hat wavelet 12
elitism 89 mode observer 181
encoding solution 83 model of a neuron 2
error back propagation 17 multi-input multi-state 45
Euclidean norm 24 multilayer network 4
evaluation function 83 multilayer perceptron 1
multiobjective identification 86
feedforward neural network 4 multiobjective modelling 78
filtering 21 multiobjective performance criteria 80
fixed wavelet network 105 multi-quadratic 7
function approximation 14 multivariate polynomial expansions 9,
fuzzy neural networks 9 79
fuzzy reasoning mechanism 10 mutation 89
209
210 Index
N ARMA model 54 removing operation 31

N ARMAX model 79
neural network architectures 4 Santa Fe times series 73
neural networks 1, 2 scaling factor 11
neural predictor 152 Schur polynomial 164
neuron 2 selection of basis functions 33
nonrecursive predictor 148 sequential nonlinear identification 41
shifted logarithms 7
off-line structure selection 56 sigmoid function 3
one-to-one mapping 37 single-input single-state 36
on-line learning 152 single layer networks 4
on-line structure selection 59 squashing function 2
orthogonal least squares algorithm 56 stable nonlinear identification 38
orthonormal wavelet functions 12 structure selection 56
output horizon 146 switching function 164
output predictor 183 switching surface 164
perceptron 1 thin plate splines 7

piecewise-linear function 3 threshold function 3
polynomial basis function networks 8 translation factor 11
predictive control 144
predictive model 148 universal approximation 15
predictive neural control 150
predictor 146 variable grids 30
pseudo potential functions 7 variable neural networks 29,31
variable structure control 164
radial basis function 7 variable structure neural control 168
radial basis function networks 7 variable wavelet network 108
recurrent neural network 5
recursive learning 19, 60, 174 wavelet 11, 23, 104
recursive nonlinear identification 53 wavelet network 13 109
recursive predictor 148 wavelet neural networks 10
recursive weight learning algorithm 61 wavelet subspace 102
reference model 127 wavelet transform 11

Liu 2001

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Liu 2001

Uploaded by

Copyright:

Available Formats

Advances in Industrial Control

Springer-Verlag London Ltd.

ISBN 978-1-4471-1076-7 ISBN 978-1-4471-0345-5 (eBook)

British Library Cataloguing in Publication Data

69/3830-5432lO Printed on acid-free paper SPIN lO770966

Professor Michael J. Grimble, Professor ofIndustrial Systems and Director

Series Advisory Board

Professor Dr-Ing J. Ackermann

Professor LD. Landau

Professor D.W. Clarke

Professor A.J. Laub

Professor J.B. Moore

Professor Ton Backx

To Weihong and Louise

M.1. Grimble and M.A. Johnson

Chapter 4 is devoted to model selection and identification of nonlinear

neural predictor is introduced to predict the outputs of the nonlinear process

Symbols and Abbreviations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. XIX

1. Neural Networks. .. .. .. . . . . .... . . . . . .. . . .. . . .. .. .. . . .. .. . . 1

2. Sequential Nonlinear Identification. . . . . . . . . . . . . . . . . . . . . . .. 27

3. Recursive Nonlinear Identification. . . . . . . . . . . . . . . . . . . . . . . .. 53

4. Multiobjective Nonlinear Identification. . . . . . . . . . . . . . . . . . .. 77

5. Wavelet Based Nonlinear Identification ................... 101

5.6 An Example ............................................ 116

6. Nonlinear Adaptive Neural Control . ....................... 125

7. Nonlinear Predictive Neural Control. ...................... 143

8. Variable Structure Neural Control . ........................ 163

9. Neural Control Application to Combustion Processes . ..... 179

References . .................................................... 193

Index .......................................................... 209

SISO single-input single-output

are able to approximate any continuous mapping to a sufficient accuracy if

1.2 Model of a Neuron

A neuron is an information-processing unit that is fundamental to the oper-

Fig. 1.1. Model of a neuron

Each connecting link is characterised by a weight or strength of its own. Speci-

value. Typically, the normalised amplitude range of the output of a neuron is

A neuron employing such a threshold function is referred to in the literature

1.3 Architectures of Neural Networks

1.3.1 Single Layer Networks

A network of neurons organised in the form of layers is viewed as a layered

Input layer Output layer

Fig. 1.2. Architecture of a single layer network

1.3.2 Multilayer Networks

Input layer Hidden layer Output layer

Fig. 1.3. Architecture of a multilayer network

1.3.3 Recurrent Networks

Fig. 1.4. Architecture of a recurrent network

1.3.4 Lattice Networks

A lattice network may consist of a one-dimensional, two-dimensional, or higher-

Fig. 1.5. Architecture of a lattice network

1.4 Various Neural Networks

1.4.1 Radial Basis Function Networks

g(r) = exp (::) (Gaussian) (1.7)

g(r) = (r2 + (72)-~ (inverse multiquadric) (1.8)

g(r) = r (linear) (1.9)

where r = II u - d k 112' (7 is a real number commonly called receptive width or

As observed earlier, any functional description that is a linear combination

1.4.2 Gaussian RBF Networks

where C k is a weighting matrix of the k-th basis function whose centre is d k .

where p = {w, d, C}. Clearly, the Gaussian RBF network is determined by

1.4.3 Polynomial Basis Function Networks

Multivariate polynomial expansions have been suggested as a candidate for

D(j, j) := II f - 1 112 (1.54)