
This article was downloaded by: [Case Western Reserve University]

On: 13 October 2014, At: 14:43


Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office:
Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

International Journal of Control


Publication details, including instructions for authors and subscription
information:
http://www.tandfonline.com/loi/tcon20

Identification of non-linear stochastic systems by state dependent parameter estimation
Peter C. Young, Paul McKenna & John Bruun
Published online: 08 Nov 2010.

To cite this article: Peter C. Young, Paul McKenna & John Bruun (2001) Identification of non-linear stochastic systems by state dependent parameter estimation, International Journal of Control, 74:18, 1837-1857, DOI: 10.1080/00207170110089824

To link to this article: http://dx.doi.org/10.1080/00207170110089824

INT. J. CONTROL, 2001, VOL. 74, NO. 18, 1837–1857

Identification of non-linear stochastic systems by state dependent parameter estimation

PETER C. YOUNG†*, PAUL MCKENNA† and JOHN BRUUN†‡

The paper outlines how improved estimates of time variable parameters in models of stochastic dynamic systems can be obtained using recursive filtering and fixed interval smoothing techniques, with the associated hyper-parameters optimized by maximum likelihood based on prediction error decomposition. It then shows how, by exploiting special data re-ordering and back-fitting procedures, similar recursive parameter estimation techniques can be utilized to estimate much more rapid State Dependent Parameter (SDP) variations. In this manner, it is possible to identify and estimate a widely applicable class of nonlinear stochastic systems, as illustrated by several examples that include simulated and real data from chaotic processes. Finally, the paper points out that such SDP models can form the basis for new methods of signal processing, automatic control and state estimation for nonlinear stochastic systems.

1. Introduction

Previous publications (e.g. Young 1988, 1989, 1998, 1999, 2000, 2001a) have discussed an approach to non-stationary and non-linear signal processing based on the identification and estimation of stochastic models with time variable (TVP) or state dependent (SDP) parameters. Here the term 'non-stationary' is assumed to mean that the statistical properties of the signal, as defined by the parameters in an associated stochastic model, are changing over time at a rate which is 'slow' in relation to the rates of change of the stochastic state variables in the system under study. Although such non-stationary systems exhibit fairly complex behaviour, this can often be approximated well by TVP (or piece-wise linear) models, the parameters of which can be estimated using recursive methods of estimation in which the parameters are assumed to evolve in a simple stochastic manner (e.g. Young 1984, 1999 and the prior references therein). On the other hand, if the changes in the parameters are functions of the state or input variables (i.e. the parameters and/or their changes actually constitute stochastic state variables), then the system is truly non-linear and likely to exhibit severe non-linear behaviour. Normally, this cannot be approximated in a simple TVP manner; in which case, recourse must be made to the alternative, and more powerful SDP modelling methods that are the main topic of this paper.

The extension of the TVP estimation methods to allow for state dependency can be traced back to a little known Conference paper by Hoberock and Kohr (1966), who used it in connection with simulated deterministic differential equation models. They employed a continuous-time steepest descent algorithm, implemented within an analog computer, to estimate the parameters and, as a result, their method is rather sensitive to noise on the data. As far as the first author is aware, the idea of SDP modelling, within a more robust stochastic setting, exploiting recursive estimation, was originated in Young (1969) and Mendel (1969). They enhanced recursive estimation performance by assuming that the model parameters could vary because of their dependence on the variations in other measured variables. Young (1978) then explored these ideas within a broader SDP setting and Priestley (1988) took them up in a series of papers and a book on the subject. These earlier publications do not, however, exploit the power of recursive fixed interval smoothing (FIS), which provides the main engine for the developments described in the present paper (Young, 1993).

SDP estimation, as proposed here, involves the non-parametric identification of the state dependency using recursive methods of time variable parameter estimation which allow for rapid (state dependent) parametric change. As we shall see, the standard methods of TVP estimation developed previously for non-stationary time series analysis need to be modified considerably in this SDP setting to allow for the much more rapid temporal changes that arise from the state dependency. This process exploits recursive Fixed Interval Smoothing (FIS) algorithms, combined with special data re-ordering and 'back-fitting' procedures, to obtain estimates of any state dependent parameter variations. These state dependencies are estimated in the form of non-parametric relationships (graphs) between the estimated rapid parameter variation and the associated state or input variable(s). Parameterization of these non-parametric relationships can be accomplished in various ways, from simple curve fitting based on weighted least squares methods to the use of neural/neuro-fuzzy networks, or radial basis functions.

Received 20 October 1999. Revised 16 October 2000.
* Author for correspondence. e-mail: p.young@lancaster.ac.uk. Also, Centre for Resource and Environmental Studies, Australian National University, Canberra, ACT 0200, Australia.
† Systems and Control Group, Centre for Research on Environmental Systems and Statistics, Lancaster University, Lancaster LA1 4YQ, UK.
‡ Now with Unilever Research PLC at Port Sunlight, UK.

International Journal of Control ISSN 0020–7179 print/ISSN 1366–5820 online © 2001 Taylor & Francis Ltd
http://www.tandf.co.uk/journals
DOI: 10.1080/00207170110089824
Having identified a structural form for the non-linear model of the system based on the parameterized non-linear relationships, this model is converted into a stochastic state space form. The final constant parameter estimation phase of the non-linear modelling then exploits some method of non-linear optimization, such as least squares or Maximum Likelihood (ML) methods of estimation based on Gaussian assumptions for the stochastic disturbances and the application of Prediction Error Decomposition (Schweppe, 1965). The resulting model should then provide a parametrically efficient representation of the stochastic, non-linear system that has considerable potential for use in subsequent signal processing, time series analysis, forecasting and automatic control system design. For example, the methodology described here exploits recursive estimation in an off-line manner but this sequential processing of the data facilitates the development of related on-line adaptive methods of signal processing, forecasting and control.

In order to illustrate the practical application and utility of the SDP approach, the paper also contains a number of simulation examples, as well as a practical study involving an analysis of electrical signals obtained from experiments carried out on the axon of a squid. Other simulation and practical examples cited in the references cover a variety of application areas from the environment through engineering to economics.

2. Identification of time variable parameter and nonlinear input-output systems

A previous paper (Young 1999) has discussed the estimation of time variable parameters in the various kinds of 'linear' regression model. One of these, the Dynamic† Auto-Regressive eXogenous variables (DARX) model is capable of modelling the input-output behaviour of non-stationary stochastic, dynamic systems. While the DARX model can produce fairly complex response characteristics, it is only when the parameters are functions of the system variables, and so vary at a rate commensurate with these variables, that the resultant model can behave in a heavily non-linear or even chaotic manner. We will refer to a State Dependent Parameter (SDP) model of this type as a State Dependent Parameter ARX (SDARX) model.

† The term 'dynamic' is used here for historical reasons (see reference) to mean a time variable parameter ARX model.

In its simplest single input, single output form, the SDARX model equation can be written most conveniently in the form

$$y_t = z_t^T p_t + e_t, \qquad e_t = N(0, \sigma^2) \tag{1}$$

where,

$$\begin{aligned}
z_t^T &= [\,-y_{t-1} \;\; -y_{t-2} \;\; \cdots \;\; -y_{t-n} \;\; u_{t-\delta} \;\; \cdots \;\; u_{t-\delta-m}\,] \\
p_t &= [\,a_1(v_t) \;\; a_2(v_t) \;\; \cdots \;\; a_n(v_t) \;\; b_0(v_t) \;\; \cdots \;\; b_m(v_t)\,]^T
\end{aligned} \tag{1a}$$

while $a_i(v_t)$, $i = 1, 2, \ldots, n$ and $b_j(v_t)$, $j = 0, 1, \ldots, m$ are the state dependent parameters, which are assumed to be functions of one of the variables in a non-minimal state vector $v_t^T = [z_t^T \;\; U_t^T]$. Here $U_t = [U_{1,t} \;\; U_{2,t} \;\; \cdots \;\; U_{r,t}]^T$ is a vector of other variables that may affect the relationship between these two primary variables (see Young 1969, 1993, Young and Beven 1994). The pure time delay $\delta$, measured in sampling intervals, is introduced to allow for any temporal delay that may occur between the incidence of a change in the input $u_t$ and its first effect on the output $y_t$; and $e_t$ is a zero mean, white noise input with Gaussian normal amplitude distribution and variance $\sigma^2$ (although this assumption is not essential to the practical application of the resulting estimation algorithms). Finally, for convenience of notation, let $p_t$ be defined as

$$p_t = [\,p_{1,t} \;\; p_{2,t} \;\; \cdots \;\; p_{n+m+1,t}\,]^T \tag{1b}$$

with $p_{i,t}$, $i = 1, 2, \ldots, n+m+1$, relating to the $a_i(v_t)$ and $b_j(v_t)$ through (1a).

A typical example of a SDARX model is the forced logistic equation

$$y_t = \alpha y_{t-1} - \alpha y_{t-1}^2 + u_t + e_t, \qquad e_t = N(0, \sigma^2) \tag{2}$$

or

$$y_t = a_1(y_{t-1})\, y_{t-1} + b_0(u_t)\, u_t + e_t \tag{2a}$$

where

$$a_1(y_{t-1}) = \alpha - \alpha y_{t-1}, \qquad b_0(u_t) = 1.0 \;\; \forall t \tag{2b}$$

As we shall see, although it is simple, this model can exhibit rich behavioural patterns: from simple logistic growth to chaotic response.

2.1. Time variable parameter estimation

Before considering the full SDP model (1), it is instructive to first deal with the simpler situation where the parameters in $p_t$ are slowly variable with time. In order to estimate these time variable parameters, it is necessary to make some assumptions about the nature of their temporal variability. Reflecting the statistical setting of the analysis and referring to previous research on this topic, it seems desirable if this is characterized in some stochastic manner. Normally, when little is known about the nature of the time variability, this model needs to be both simple and flexible. One of the simplest and most generally useful class of stochastic, state space models involves the assumption that the $i$th parameter, $p_{i,t}$, $i = 1, 2, \ldots, n+m+1$, is defined by a two-dimensional stochastic state vector $x_{i,t} = [l_{i,t} \;\; d_{i,t}]^T$, where $l_{i,t}$ and $d_{i,t}$

are, respectively, the changing level and slope of the associated TVP. This selection of a two-dimensional state representation of the TVPs is based on practical experience over a number of years. Initial research in the 1960s (e.g. Lee 1964, Young 1969, 1970, and many others) tended to use a simple scalar random walk (RW) model for the parameter variations. Later research in the 1970s and 1980s (see Norton 1976, Jakeman and Young 1979, 1984) showed the value of modelling not only the level changes in the TVPs but also their rates of change, as in the definition of $x_{i,t}$, above.

The stochastic evolution of each $x_{i,t}$ (and, therefore, each of the $n+m+1$ parameters in $p_t$) is assumed to be described by a generic Generalized Random Walk (GRW: Jakeman and Young 1979, 1984) process defined in the State Space (SS) terms

$$x_{i,t} = F_i x_{i,t-1} + G_i \boldsymbol{\eta}_{i,t}, \qquad i = 1, 2, \ldots, m+n+1 \tag{3}$$

where

$$F_i = \begin{bmatrix} \alpha & \beta \\ 0 & \gamma \end{bmatrix}, \qquad G_i = \begin{bmatrix} \delta & 0 \\ 0 & \varepsilon \end{bmatrix}$$

and $\boldsymbol{\eta}_{i,t} = [\eta_{1i,t} \;\; \eta_{2i,t}]^T$ is a $2 \times 1$, zero mean, white noise vector that allows for stochastic variability in the parameters and is assumed to be characterized by a (normally diagonal) covariance matrix $Q_{\eta,i}$. This is, of course, a generic model formulated in this manner only to unify various random walk-type models: it is never used in its entirety since it is clearly over-parameterized. As such, it comprises as special cases the Integrated Random Walk (IRW: $\alpha = \beta = \gamma = \varepsilon = 1$; $\delta = 0$); the scalar Random Walk (RW: scalar but equivalent to (3) with $\beta = \gamma = \varepsilon = 0$; $\alpha = \delta = 1$: i.e. just the first equation in (3), see below); the intermediate case of Smoothed Random Walk (SRW: $0 < \alpha < 1$; $\beta = \gamma = \varepsilon = 1$; and $\delta = 0$); the first order autoregressive process (AR(1): again scalar with $\beta = \gamma = \varepsilon = 0$; $0 < \alpha < 1$; $\delta = 1$); and, finally, both the Local Linear Trend (LLT: $\alpha = \beta = \gamma = \varepsilon = \delta = 1$)†; and Damped Trend (DT: $\alpha = \beta = \delta = \varepsilon = 1$; $0 < \gamma < 1$): see Harvey (1984, 1989). The various, normally constant, parameters in this GRW model ($\alpha$, $\beta$, $\gamma$, $\delta$, $\varepsilon$ and the elements of $Q_{\eta,i}$) are often referred to as hyper-parameters. This is to differentiate them from the TVPs that are the main object of the estimation analysis. However, the hyper-parameters are also assumed to be unknown a priori and need to be estimated from the data, as we shall see in the subsequent discussion.

† Interestingly, the GRW model with $\alpha = \gamma = \varepsilon = 1$ allows any linear combination of RW and IRW models to be realised; and the LLT model can be considered simply as one such combination.

Note that, in the case of the RW model, i.e.

$$l_{i,t} = l_{i,t-1} + \eta_{1i,t}; \qquad p_{i,t} = l_{i,t} \tag{3a}$$

each parameter can be assumed to be time-invariant if the variance of the white noise input $\eta_{1i,t}$ is set to zero. Then, the stochastic TVP setting reverts to the more normal, constant parameter TF model situation. In other words, if RW models with zero variance white noise inputs are specified for the model parameters, then the recursive estimation algorithm described below for the general stochastic TVP case will provide recursive estimates that are identical to those obtained with the normal recursive estimation algorithm for ARX models with constant parameters (e.g. Young 1984). Of course, there is some added value to the recursive solution even in this situation, since the user is provided with the recursive estimates over the whole interval $t = 1, 2, \ldots, N$. These can provide additional useful information on the model: for example, they show how the estimates are converging and can be used (e.g. Brown et al. 1975) to detect both the presence of potential parametric change and possible over-parameterization (i.e. the model contains too many parameters to provide unambiguous estimation results).

Clearly other, more general and higher order stochastic processes could be used to model the stochastic TVPs, provided such models can be identified satisfactorily from the data. Examples are the higher order IRWs (Double and Triple Integrated Random Walk (DIRW, TIRW), etc.), the Integrated or Double Integrated AutoRegressive (IAR, DIAR: see Young 1994) model, and even more general processes. However, the more complex models introduce additional hyper-parameters that would have to be well identified from the data and optimized, thus introducing potential practical difficulties.

The idea of assuming that the model parameters evolve over time as non-stationary stochastic variables may seem complex at first sight but it is, in fact, just a statistical device to allow for the estimation of parametric change. After all, the assumption of the RW model is simply a means of introducing into the estimation problem the freedom for the associated parameter to vary at each sample in time by a small random amount defined by the variance of the white noise input $\eta_{1i,t}$. And the more complex GRW models in (3) are just a way of refining and adding to this freedom. In fact, it can be shown (Young and Pedregal 1998) that the GRW assumptions on the parameter variations have an implicit but physically interpretable effect when used with fixed interval smoothing algorithms (see later): they make the recursive parameter estimates, at any sample time $t$, depend only on the local data in the vicinity of this sample, with the selected GRW model defining the local weighting effect. In the case of the RW model, for instance, this weighting effect or 'kernel' has a Gaussian-like shape that applies maximum weight to the current data with declining weight each side. And the 'bandwidth' of the kernel is defined by the ratio of the variance of the white noise input $\eta_{1i,t}$ to the residual variance $\sigma^2$ (the Noise Variance Ratio (NVR): see later). This can be related to the more explicit use of localized data weighting in methods such as locally weighted kernel regression (e.g. Holst et al. 1996), regularization (Jakeman and Young 1979, 1984) and 'wavelet' methods (e.g. Daubechies 1988).

Having introduced the GRW models for the parameter variations, an overall SS model can then be constructed straightforwardly by the aggregation of the subsystem matrices defined in (3), with the 'observation' equation defined by the model equation (1): i.e.

$$\begin{aligned}
\text{Observation equation:} \quad & y_t = H_t x_t + e_t & \text{(i)} \\
\text{State equation:} \quad & x_t = F x_{t-1} + G \boldsymbol{\eta}_t & \text{(ii)}
\end{aligned} \tag{4}$$

where

$$x_t = [\,x_{1,t}^T \;\; x_{2,t}^T \;\; \cdots \;\; x_{n+m+1,t}^T\,]^T \tag{4a}$$

If $p = 2(n+m+1)$, then $F$ is a $p \times p$ block diagonal matrix with blocks defined by the $F_i$ matrices in (3); $G$ is a $p \times p$ block diagonal matrix with blocks defined by the corresponding subsystem matrices $G_i$ in (3); and $\boldsymbol{\eta}_t$ is a $p$-dimensional vector containing, in appropriate locations, the white noise input vectors $\boldsymbol{\eta}_{i,t}$ ('system disturbances' in normal SS terminology) to each of the GRW models in (3). These white noise inputs, which provide the stochastic stimulus for parametric change in the model, are assumed to be independent of the observation noise $e_t$ and have a covariance matrix $Q$ formed from the combination of the individual covariance matrices $Q_{\eta,i}$. Finally, $H_t$ is a $1 \times p$ vector of the form

$$H_t = [\,-y_{t-1} \;\; 0 \;\; -y_{t-2} \;\; 0 \;\; \cdots \;\; -y_{t-n} \;\; 0 \;\; u_{t-\delta} \;\; 0 \;\; \cdots \;\; u_{t-\delta-m} \;\; 0\,] \tag{4b}$$

that relates the scalar observation $y_t$ to the state variables defined by (4a), so that it represents the model (1), with each parameter defined as a GRW process. In the case of the scalar RW and AR(1) models, the alternate zeros are simply omitted.

The SS formulation in equations (4) is particularly well suited for optimal recursive estimation in which the time variable parameters (acting as surrogate 'states' in this SS formulation) are estimated sequentially whilst working through the data in temporal order (usually termed 'forward-pass filtering'). In the off-line situation, where all the time series data are available for analysis, this filtering operation is accompanied by optimal recursive smoothing (see e.g. Bryson and Ho 1969 and the prior references therein; Norton 1986). Here, the estimates obtained from the forward-pass, filtering algorithm are updated sequentially whilst working through the data in reverse temporal order (usually termed 'backward-pass smoothing') using a backwards-recursive Fixed Interval Smoothing (FIS) algorithm, where the 'fixed interval' is the interval covered by the total sample size $N$.

The reason for this two-pass approach is easy to understand. The forward-pass filtering estimate of $x_t$, which defines the estimated TVPs, can be denoted by $\hat{x}_{t|t}$ (or simply $\hat{x}_t$, for convenience) since it represents the estimate at sample $t$ given only the data up to and including sampling instant $t$. However, under our assumption that each of the parameters evolves stochastically according to equation (3), a superior 'smoothed' estimate $\hat{x}_{t|N}$ exists and can be generated by the FIS algorithm, in which the estimate at $t$ is based on all the data over the sampling interval $t = 1, 2, \ldots, N$. As a result, the phase lag associated with the forward-pass filtering estimate (since it cannot anticipate any change until the evidence for change in the series has been processed) is eliminated on the backward smoothing pass. Thus, any variation in the parameters is estimated as it occurs, without any lag effect (indeed, it may even be anticipated if the smoothing effect is substantial, as it can be in high noise situations). This proves particularly useful in operations such as interpolation over gaps in the data, estimation and removal of individual components from the data (signal extraction), and seasonal adjustment (see Young and Pedregal 1998; Young et al. 1999).

In relation to the time series $y_t$, $t = 1, 2, \ldots, N$, the recursive filtering/smoothing algorithm has the following form.

1. Forward pass recursive LS equations

Prediction:

$$\begin{aligned}
\hat{x}_{t|t-1} &= F \hat{x}_{t-1} \\
P_{t|t-1} &= F P_{t-1} F^T + G Q_r G^T
\end{aligned}$$

Correction:

$$\begin{aligned}
\hat{x}_t &= \hat{x}_{t|t-1} + P_{t|t-1} H_t^T \,[\,1 + H_t P_{t|t-1} H_t^T\,]^{-1} \{\, y_t - H_t \hat{x}_{t|t-1} \,\} \\
P_t &= P_{t|t-1} - P_{t|t-1} H_t^T \,[\,1 + H_t P_{t|t-1} H_t^T\,]^{-1} H_t P_{t|t-1}
\end{aligned} \tag{5a}$$

2. Backward pass smoothing equations†

$$\begin{aligned}
\hat{x}_{t|N} &= F^{-1} [\, \hat{x}_{t+1|N} + G Q_r G^T L_t \,] \\
L_t &= [\, I - P_{t+1} H_{t+1}^T H_{t+1} \,]^T \, [\, F^T L_{t+1} - H_{t+1}^T \{\, y_{t+1} - H_{t+1} \hat{x}_{t+1|t} \,\} \,] \\
P_{t|N} &= P_t + P_t F^T P_{t+1|t}^{-1} \,[\, P_{t+1|N} - P_{t+1|t} \,]\, P_{t+1|t}^{-1} F P_t \\
L_N &= 0
\end{aligned} \tag{5b}$$

† Note that here the $F$ matrix needs to be non-singular, which is the case for the GRW type model. Also note that an alternative FIS algorithm is available in which, at each backwards recursion, the estimate $\hat{x}_{t|N}$ is based on an update of the filtering estimate $\hat{x}_t$ (see Norton 1976, Young 1984). This can be specified as an alternative to (5b).

In these algorithms, the $p \times p$ Noise Variance Ratio (NVR) matrix $Q_r$ and the $p \times p$ matrix $P_t$ are defined as

$$Q_r = \frac{Q}{\sigma^2}; \qquad P_t = \frac{P_t^*}{\sigma^2} \tag{5c}$$

where $P_t^*$ is the error covariance matrix associated with the state estimates which, in the present TVP context, defines the estimated uncertainty in the parameters. For simplicity, it is normally assumed that the NVR matrix $Q_r$ is diagonal, although this is not essential. The NVR parameters that characterize $Q_r$ (as well as any other unknown hyper-parameters in the SS model (4)) are unknown prior to the analysis and clearly need to be estimated on the basis of the time series data $y_t$ and $u_t$ before the filtering and smoothing algorithms can be utilized. The optimization of these hyper-parameters is discussed in the next sub-section.

2.2. Maximum likelihood (ML) optimization of hyper-parameters

The approach to ML optimization based on Prediction Error Decomposition (PED) derives originally from the work of Schweppe (1965), who showed how to generate likelihood functions for Gaussian signals using the Kalman filter (see also Bryson and Ho 1969, p. 389). Its importance in the present context was probably first recognized by Harvey (1981) and Kitagawa (1981) in their development of 'unobserved component' forecasting models. Since then, it has become one of the two standard approaches to the problem (the other being the Expectation-Maximization (EM) algorithm: Dempster et al. 1977).

With given initial values for the hyper-parameters, the Kalman filter algorithm will yield the one-step-ahead prediction errors (also termed the 'innovations' or 'recursive residuals') $\varepsilon_t$, where

$$\varepsilon_t = y_t - H_t \hat{x}_{t|t-1}, \qquad t = 1, 2, \ldots, N \tag{6}$$

If the first $p$ observations are regarded as fixed, the log-likelihood function of $y_{p+1}, \ldots, y_N$ can be defined in terms of the standard 'regression' form of prediction error decomposition, i.e.

$$\log L = -\frac{N-p}{2} \log(2\pi) - \frac{N-p}{2} \log(\sigma^2) - \frac{1}{2} \sum_{t=p+1}^{N} \log(1 + H_t P_{t|t-1} H_t^T) - \frac{1}{2\sigma^2} \sum_{t=p+1}^{N} \frac{\varepsilon_t^2}{1 + H_t P_{t|t-1} H_t^T} \tag{7}$$

where it can be shown that $\sigma^2 (1 + H_t P_{t|t-1} H_t^T)$ is the variance of $\varepsilon_t$, so that the last term in (7) is based on the sum of squares of the normalized one-step-ahead prediction errors. Now the ML estimate of $\sigma^2$, conditional on the hyper-parameters, is given by

$$\hat{\sigma}^2 = \frac{1}{N-p} \sum_{t=p+1}^{N} \frac{\varepsilon_t^2}{1 + H_t P_{t|t-1} H_t^T} \tag{8}$$

so that it can be estimated in this manner and 'concentrated out' of the expression (7) by substituting (8) into (7), to yield the following expression for the 'concentrated likelihood'

$$\log(L_c) = -\frac{N-p}{2} \left( \log(2\pi) + 1 \right) - \frac{1}{2} \sum_{t=p+1}^{N} \log(1 + H_t P_{t|t-1} H_t^T) - \frac{N-p}{2} \log(\hat{\sigma}^2) \tag{9}$$

which needs to be maximized with respect to the unknown hyper-parameters in order to obtain their ML estimates.

Since (9) is non-linear in the hyper-parameters, the likelihood maximization needs to be carried out numerically. Consequently, it is more convenient to remove the constant term (since it will play no part in the optimization) and consider

$$\log(L_c) = -\frac{1}{2} \sum_{t=p+1}^{N} \log(1 + H_t P_{t|t-1} H_t^T) - \frac{N-p}{2} \log(\hat{\sigma}^2) \tag{10}$$

which can then be minimized if it is multiplied by $-1$. This minimization is accomplished by initiating the optimization with the hyper-parameter estimates either selected by the user or set to some default values (in both cases, ensuring that the resulting optimization does not converge on a local minimum). The recursive TVP estimation algorithm is used repeatedly to generate the one-step-ahead prediction errors $\varepsilon_t$ and, thence, the log-likelihood value in (10) associated with the latest selection of hyper-parameter estimates made by the optimization algorithm. The optimization algorithm then adjusts its selection of hyper-parameter estimates in order to converge on those estimates that minimize the negative of (10). Typical methods that can be used for numerical optimization are the 'fmins' and 'fminu' functions available in the Matlab software system, or their equivalents, although more complex and efficient procedures are available. Further details of this and alternative ML optimization procedures are given, for example, in Harvey (1989) and Harvey and Peters (1990).

2.3. Full state dependent parameter estimation variable’ (MDV) series obtained by subtracting all the
Since the parameter vector pt is potentially state- other terms on the right-hand side of (1) from yt , using
dependent, it may vary at a rate commensurate with the values of the other parameter estimates from the
the temporal variations in yt , ut and Ui;t , and so it previous iteration. At each such back®tting iteration,
cannot be assumed that the simple GRW model (3) the sorting can then be based on the single variable
is appropriate to describe the parametric variation associated with the current SDP being estimated.
over time. At ®rst sight, it would appear that the stoch- Since the SDP estimates resulting from this back®t-
astic state space model should include prior informa- ting algorithm are themselves time series, it will be noted
tion on the nature of the parameter variation if the that the algorithm constitutes a special form of non-
TVP estimation methodology discussed in previous parametric estimation and, as such, can be compared
sections is to work satisfactorily. Fortunately, it is with other non-parametri c methods, such as the
Generalized Additive Modelling (GAM) approach of
possible to remove this requirement if we resort to
Hastie and Tibshirani (1996). However, in both concep-
the rather unusual procedure, at least within a time
tual and algorithmic terms, the SDP approach described
series context, of sorting the data in a non-temporal
here is signi®cantly di€ erent from this earlier approach
Downloaded by [Case Western Reserve University] at 14:43 13 October 2014

order. Then, if the ordering is chosen so that the SDP


and seems more appropriate to the estimation of non-
variations associated with the sorted series are
linear, stochastic, dynamic models. Moreover, the recur-
smoother and less rapid, it is more likely that a
sive methodology, on which SDP estimation is based, is
simple GRW process can be utilized to describe their
couched in optimal maximum likelihood terms that
evolution.
seem more elegant and ¯exible than the scatter-plot
For example, if the time series are sorted in some
smoothing procedures used by Hastie, Tibshirani and
common `ascending order of magnitude’ manner (i.e.
others.
the sort operation in Matlab), then the rapid natural
variations in $y_t$ and $u_t$ are effectively eliminated from the data and replaced, in the sorted data space, by much smoother and less rapid variations. And if the SDPs are, indeed, related to these variables, then they will be similarly affected by the sorting. Following FIS estimation, however, these SDP estimates can be 'unsorted' (a trivial unsort operation to reverse Matlab's sort) and their true, rapid variation will become apparent. Of course, the nature of the sorting will affect the estimation and it seems likely that there will be an optimum sorting which results in minimum variance estimates. However, such optimum sorting will naturally depend upon the nature of the state dependency and its definition would require some sort of iterative estimation procedure. In practical terms, therefore, the common ascending order sorting and un-sorting operations seem the most straightforward and will be utilized here.

One obvious requirement with this new approach to SDP estimation is that the sorting of data, prior to FIS estimation, must be common to all of the variables in the relationship (1). If a single, ascending order strategy is selected, therefore, it is necessary to decide upon which variable in the model the sorting should be based. The simplest strategy is to sort according to the ascending order of the 'dependent' variable $y_t$. Depending upon the nature of each SDP in the vector $\mathbf{p}_t$, however, a single variable sorting strategy, such as this, may not produce satisfactory results. If this is the case, then a more complicated, but still straightforward, 'backfitting' procedure can be exploited. Here, each parameter is estimated in turn, based on the 'modified dependent

The backfitting algorithm for the SDARX model (1) takes the following form.

Backfitting algorithm for SDP models

Step 1. Assume that FIS estimation has yielded prior TVP estimates $\hat{p}^{0}_{i,t|N}$, $i = 1, 2, \ldots, m+n+1$, of the SDPs.†

Step 2. Iterate: $i = 1, 2, \ldots, m+n+1$; $k = 1, 2, \ldots, k_c$
  (i) form the MDV $y^{i}_{t} = y_t - \sum_{j \neq i} z_{j,t}\,\hat{p}^{k}_{j,t|N}$;
  (ii) sort‡ both $y^{i}_{t}$ and $z_{i,t}$ according to the ascending order of $z_{i,t}$;
  (iii) obtain an FIS estimate $\hat{p}^{k}_{i,t|N}$ of $p_{i,t}$ in the MDV relationship $y^{i}_{t} = p_{i,t}\,z_{i,t}$.

Step 3. Continue Step 2 (each time forming the MDV and then sorting according to the current right-hand side variable $z_{i,t}$, prior to FIS estimation), until iteration $k_c$, when the individual SDPs (which are each time-series of length $N$) have not changed significantly according to some chosen criterion. The smoothing hyper-parameters required for FIS estimation at each stage are optimized by Maximum Likelihood (ML), as explained earlier in § 2.2 and discussed further below.

† As a default, these can be simply the constant least squares parameter estimates, since the convergence of the backfitting procedure is not too sensitive to the prior estimates, provided they are reasonable.

‡ Depending on the nature of the state dependency, sorting may need to be with respect to another variable in $\mathbf{v}_t$.
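The sort, smooth, unsort and backfit steps can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names `rw_smooth`, `sdp_pass`, `backfit` and `demo` are hypothetical, the NVR values are fixed by hand rather than ML-optimized, the prior SDP estimates are set to zero instead of the constant least squares default, and a fixed number of iterations replaces a convergence criterion. The demonstration data (input range, noise level) are likewise assumed.

```python
import random


def rw_smooth(y, z, nvr):
    """Scalar Kalman filter plus fixed-interval smoother for
    y[t] = p[t] * z[t] + e[t], where p[t] is a random walk with
    noise-variance ratio nvr (observation noise variance normalised to 1)."""
    p, P = 0.0, 1.0e6                      # diffuse prior (assumption)
    p_filt, P_filt, P_pred = [], [], []
    for yt, zt in zip(y, z):
        Pp = P + nvr                       # prediction step (random walk)
        g = Pp * zt / (1.0 + zt * Pp * zt)
        p = p + g * (yt - zt * p)          # correction step
        P = Pp - g * zt * Pp
        p_filt.append(p)
        P_filt.append(P)
        P_pred.append(Pp)
    p_smooth = p_filt[:]
    for t in range(len(y) - 2, -1, -1):    # backward smoothing pass
        c = P_filt[t] / P_pred[t + 1]
        p_smooth[t] = p_filt[t] + c * (p_smooth[t + 1] - p_filt[t])
    return p_smooth


def sdp_pass(mdv, z, sort_key, nvr):
    """Sort by sort_key, smooth in the sorted data space, then unsort."""
    order = sorted(range(len(mdv)), key=lambda t: sort_key[t])
    est_sorted = rw_smooth([mdv[t] for t in order], [z[t] for t in order], nvr)
    est = [0.0] * len(mdv)
    for rank, t in enumerate(order):
        est[t] = est_sorted[rank]
    return est


def backfit(y, regressors, sort_keys, nvrs, n_iter=3):
    """Estimate each SDP in turn from its modified dependent variable."""
    n, m = len(y), len(regressors)
    params = [[0.0] * n for _ in range(m)]  # zero prior (assumption)
    for _ in range(n_iter):
        for i in range(m):
            mdv = [y[t] - sum(regressors[j][t] * params[j][t]
                              for j in range(m) if j != i) for t in range(n)]
            params[i] = sdp_pass(mdv, regressors[i], sort_keys[i], nvrs[i])
    return params


def demo(n=600, seed=1):
    """Forced logistic data with assumed input range and noise level."""
    rng = random.Random(seed)
    u = [rng.uniform(-0.1, 0.1) for _ in range(n)]
    y = [0.5]
    for t in range(1, n):
        y.append(2.0 * y[t - 1] - 2.0 * y[t - 1] ** 2 + u[t] + rng.gauss(0.0, 0.01))
    y_lag, y_now, u_now = y[:-1], y[1:], u[1:]
    a1, b0 = backfit(y_now, [y_lag, u_now], [y_lag, u_now], [1.0e-3, 0.0])
    return y_lag, a1, b0
```

Setting the NVR of the input parameter to zero makes its smoothed estimate constant over the whole record, which mirrors the way a near-zero optimized NVR signals a time-invariant parameter.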

Figure 1. Simulated forced logistic growth model. Upper panel: low noise level (10%); lower panel: high noise level (100%). Noisy data (circles); noise free data (full line); noise (+1.0) above.

Note that the ML optimization can be carried out in various ways: after every complete iteration (each involving $m+n+1$ FIS operations) until convergence is achieved; only at the initial complete iteration, with the hyper-parameters maintained at these values for the rest of the backfitting; or just on the first two iterations. The latter choice seems most satisfactory in practice, since very little improvement in convergence occurs if optimization is continued after this stage. Normally, convergence is completed after only a few iterations, although it can be more lengthy in some circumstances (see Conclusions, § 5).

Simulation example 1: As a simulation example of SDARX modelling, consider the forced logistic growth equation (2), mentioned previously, with $\alpha = 2.0$ and a zero mean, random input signal $u_t$. The unforced equivalent of this model (i.e. $u_t = 0\ \forall t$: see Simulation example 2) is, in fact, the example used by the first author in what appears to be the first publication on stochastic SDP modelling (Young 1978) although, at that time, the more powerful SDP estimation algorithms described in this paper had not been developed and simple recursive parameter estimation (filtering) was utilized.

The estimation is based on 1000 samples of $u_t$ and $y_t$ under two different noise conditions, 200-sample segments of which are presented in figure 1. Here the percentage noise is defined in terms of the standard deviation of the noise $e_t$ in relation to the standard deviation of noise-free output. The upper panel shows the low noise (10%) situation, with the noisy data (circles) compared with the noise-free data (full line) and the noise $e_t$ plotted above (+1.0). The lower panel is similar but the noise level is now at the much higher 100% level. In fact, if the noise is defined in terms of difference between the measured, noisy output and the noise-free output, then the noise level is 110%.

         HP optimization                   WLS estimation
Noise    NVR($a_1$)   NVR($b_0$)            $\hat{\alpha}_1$     $\hat{\alpha}_2$      $\hat{\beta}_0$    $R_T^2$
Low      0.452        $2.5 \times 10^{-17}$   1.996 (0.000016)   −1.993 (0.000032)   1.003 (0.006)    0.999
High     0.0165       $2.5 \times 10^{-17}$   1.959 (0.00023)    −1.908 (0.00046)    1.016 (0.064)    0.998

Table 1 a. Hyper-parameter and final WLS estimation results.
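The WLS estimates in Table 1a come from fitting parameterized functions to the estimated state dependent relationships. For a straight-line SDP, such a fit can be sketched as below. This is a minimal sketch with assumptions: the function name `wls_line` is hypothetical, and the weights are taken as the inverse squared standard errors of the SDP estimates, which is a common choice but is not stated in this section.

```python
def wls_line(x, y, se):
    """Weighted least squares fit of y = intercept + slope * x,
    with weights 1 / se**2 (assumed weighting)."""
    w = [1.0 / s ** 2 for s in se]
    sw = sum(w)
    xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return yb - slope * xb, slope
```

Applied to an estimate $\hat{a}_1(y_{t-1} \mid N)$ plotted against $y_{t-1}$, the intercept and slope returned here would play the role of the two lag parameters of the logistic model.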



Figure 2. Simulated forced logistic growth model. Upper panels: low noise level (10%); lower panels: high noise level (100%). Left panels: FIS estimate of feedback state dependent parameter $a_1(y_{t-1})$. Right panels: FIS estimate of input (time-invariant) parameter $b_0$. True values shown as dashed lines and standard error bands shown as dotted lines.

The initial SDP estimation results are shown in the left side of Table 1a, using a RW model for each SDP. Here, the optimum NVR values are obtained by ML optimization at the first backfitting iteration and they are maintained at these values for the subsequent two iterations required for convergence. Note how, in both low and high noise situations, the optimized NVR($b_0$) $= 2.5 \times 10^{-17}$ is insignificantly different from zero, illustrating how the optimization has, quite objectively, identified that $b_0$ is constant. In contrast the optimized NVRs for the 'lag' parameter $a_1(y_{t-1})$ are 0.452 and 0.0165, respectively, so indicating the presence of state dependency and allowing for its estimation.

The FIS estimation results, revealing the estimated nature of the state dependency, are shown in figures 2 and 3. In all the plots, the true functions are plotted as dashed lines and estimation standard error bands are given by the dotted lines. Figure 2 provides plots of the estimated SDP variation against the associated non-minimal state variable, in each case. The left-hand panel shows the estimated relationship between $\hat{a}_{1,t|N} = \hat{a}_1(y_{t-1} \mid N)$ and $y_{t-1}$; while the right-hand panel shows the estimated $\hat{b}_{0,t|N} = \hat{b}_0(u_t \mid N)$ as a function of $u_t$. As expected, these correspond to those used in the simulated model: $\hat{a}_1(y_{t-1} \mid N)$ is a clear linear function of $y_{t-1}$ with slope and intercept close to −2 and 2, respectively, in correspondence with equation (2b); while $\hat{b}_0(u_t \mid N)$ is constant for all $t$, at an estimated value close to unity. Figure 3 presents the associated estimates of the non-linear functions: $\hat{f}_1(y_{t-1}) = \hat{a}_1(y_{t-1} \mid N)\,y_{t-1}$ and $\hat{f}_2(u_t) = \hat{b}_0(u_t \mid N)\,u_t$ plotted as functions of their associated states.

The effectiveness of the sorting procedure in facilitating state dependent parameter estimation is illustrated in figure 4. This compares the SDP estimate $\hat{a}_1(y_{t-1})$ plotted in the sorted data space, as obtained from the backfitting algorithm (above), with the same estimate plotted in the natural data space (below). It is clear that the direct estimation of this SDP, based on the time series processed in the natural (temporal) order, would produce poor results without access to a priori information on the nature of the parameter variation. For example, when this is attempted in the low noise case, the ML optimized NVR values are very small, the resulting parameter estimates hardly change and the explanation of the data is very poor. And even if we assume prior knowledge (e.g. set NVR($a_1$) $= 100$; NVR($b_0$) $= 0$), the model explains the data better but the associated SDP plots are very noisy indeed, with no clear indication of the nature of the non-linearity. In contrast, by sorting the data in ascending order of magnitude, the sequential variations in the SDP are now slow and smooth enough to be well estimated by the FIS algorithm, under the assumption that the parameter varies as a RW process.
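The effect of sorting on the smoothness of the parameter sequence can be checked numerically. The sketch below compares the mean absolute step of the true lag parameter $a_1(y_{t-1}) = 2 - 2y_{t-1}$ in temporal order and in sorted order, for noise-free data from the forced logistic model. The names `mean_abs_step` and `sorting_demo`, the input range and the sample size are assumptions made for illustration only.

```python
import random


def mean_abs_step(series):
    """Average absolute change between successive samples."""
    return sum(abs(b - a) for a, b in zip(series, series[1:])) / (len(series) - 1)


def sorting_demo(n=500, seed=0):
    """Return the mean absolute step of a1 in temporal and sorted order."""
    rng = random.Random(seed)
    y = [0.5]
    for _ in range(n):
        u = rng.uniform(-0.1, 0.1)               # assumed input range
        y.append(2.0 * y[-1] - 2.0 * y[-1] ** 2 + u)
    a1_temporal = [2.0 - 2.0 * v for v in y]
    a1_sorted = [2.0 - 2.0 * v for v in sorted(y)]
    return mean_abs_step(a1_temporal), mean_abs_step(a1_sorted)
```

In the sorted data space the parameter is monotonic, so its total variation is just its range spread over all samples, which is why a random walk model with a small NVR can follow it.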

Figure 3. Simulated forced logistic growth model. Upper panels: low noise level (10%); lower panels: high noise level (100%). Left panels: FIS estimate of feedback non-linearity $f_1(y_{t-1})$. Right panels: FIS estimate of input (linear) function $f_2(u_t) = b_0 u_t$. True values shown as dashed lines and standard error bands shown as dotted lines.

Figure 4. Simulated forced logistic growth model: comparison of the SDP estimate $\hat{a}_1(y_{t-1})$ plotted in the sorted data space (above) with the same estimate plotted in the natural, un-sorted, temporal order (below).

Noise          $\hat{\alpha}_1$    $\hat{\alpha}_2$     $\hat{\beta}_0$    $R_T^2$
Low (10%)      1.999 (0.006)    −2.011 (0.013)    1.002 (0.006)    0.999
High (100%)    1.988 (0.064)    −2.031 (0.132)    1.059 (0.060)    0.994

Table 1 b. Final least squares estimation results.

Given the non-parametric estimation results shown in figures 2 and 3, it is clear that, even in the high noise case, the forced logistic growth equation, with three unknown parameters ($\alpha_1$, $\alpha_2$ and $\beta_0$), i.e.

$$y_t = \alpha_1 y_{t-1} - \alpha_2 y_{t-1}^2 + \beta_0 u_t + e_t \qquad (11)$$

provides a parsimonious representation of the data. It is straightforward, therefore, to obtain estimates of these parameters by simply fitting appropriate parameterized functions to the estimated state dependent relationships using ordinary least squares or WLS estimation (see Young 1993; Young and Beven 1994). The WLS estimates obtained in this manner are listed at the right-hand side of Table 1a, where $R_T^2$ is the coefficient of determination based on the simulation model residuals. Although these estimates appear somewhat biased in the high noise case, the resulting deterministic model output, obtained from equation (11) with $e_t = 0$, is very similar to the noise free output of the actual system.

Having identified a parametric form for the model, based on the non-parametric SDP results, however, improved final estimates of the model parameters can be obtained using linear estimation based directly on the measured data (the non-linear model (11) is linear in the parameters). As might be expected, the resulting estimates, as shown in Table 1b, are now very good.

2.4. The problem of errors-in-variables estimation

As in the constant parameter situation, the TVP and SDP estimates are 'asymptotically' biased away from their true values if the assumed, and rather special, signal topology of the SDARX model (1) is violated. Here, we use 'asymptotic' in a loose sense, since concepts such as asymptotic bias are not strictly applicable in the TVP and SDP situations. However, the presence of measurement noise results in parameter estimates which are permanently biased away from the true parameter values in a systematic manner redolent of asymptotic bias in the constant parameter case. For example, if $y_t$ in (1) is measured in the presence of noise $\xi_t$, in addition to the 'system' noise $e_t$, then this kind of troublesome bias can appear on the parameter estimates.

In the slow TVP situation, it is possible to obviate this 'errors-in-variables' problem by either extending the model to include additional noise terms (e.g. Norton 1986); or utilizing a recently developed approach based on Instrumental Variable (IV) techniques (Young 2000). Although limited to input–output systems, the latter IV procedure is simpler and more effective because it does not require simultaneous estimation of the non-stationary noise model parameters. It is also effective in the SDP situation, but only if the non-linear system does not exhibit chaotic behaviour.

When considering the practical implications of any errors-in-variables bias on the estimates of the parameters in the SDARX model, it is important to stress that SDP estimation is being suggested here primarily as a method of identifying the non-linear model structure and not necessarily as an end in itself. Moreover, the bias can be quite small even in quite high noise situations. For instance, consider the following extended version of the model (2) with an additional tanh non-linearity on the input

$$x_t = 2.0\,x_{t-1} - 2.0\,x_{t-1}^2 + 0.01\,(\tanh(u_t) + 0.01\,u_t), \qquad u_t = U(0, 3.2)$$
$$y_t = x_t + \xi_t, \qquad \xi_t = N(0, 3 \times 10^{-5})$$

Here, the errors-in-variables noise on $x_t$ is quite high (70% by standard deviation) but, despite this, both the lag SDP and the input non-linearity, as shown in figure 5, are estimated well, so providing useful information on the location and nature of the non-linearities in this case.

Figure 5. SDP estimation results for a system with severe input and lag non-linearities, showing the relative insensitivity, in this case, to a high level of errors-in-variables noise (70% by standard deviation).

3. Identification of purely stochastic non-linear systems

A special example of the SDP model (1) is the following State Dependent parameter Auto-Regressive (SDAR) model

$$y_t = \mathbf{z}_t^T \mathbf{p}_t + e_t \qquad (12)$$

where

$$\mathbf{z}_t^T = [\,-y_{t-1}\ \ -y_{t-2}\ \ \cdots\ \ -y_{t-n}\,], \qquad \mathbf{p}_t = [\,a_1(\mathbf{v}_t)\ \ a_2(\mathbf{v}_t)\ \ \cdots\ \ a_n(\mathbf{v}_t)\,]^T \qquad (12\,a)$$

Clearly, the same SDARX estimation methods discussed previously can be applied to this model, a simple example of which is the chaotic version of the logistic equation. Typical simulation results for this model are discussed below.

Simulation example 2: In order to consider further the effects of errors-in-variables measurement noise, as discussed in § 2.4 for input–output systems†, the following Noisy SDAR (NSDAR) version of the chaotic logistic growth model (cf. equations (2)) will be used in this example:

$$x_t = 4.0\,x_{t-1} - 4.0\,x_{t-1}^2 + e_t, \qquad e_t = N(0, 0.0064)$$
$$y_t = x_t + \xi_t, \qquad \xi_t = N(0, 0.0012) \qquad (13\,a)$$

or

$$y_t = a_1(y_{t-1})\,y_{t-1} + \zeta_t, \qquad a_1(y_{t-1}) = 4.0 - 4.0\,y_{t-1} \qquad (13\,b)$$

This model generates chaotic behaviour (see later, figure 7) and the noise $\zeta_t$ is a complex non-linear function of $e_t$, $\xi_t$ and $y_t$. The noise level is about 10% (by standard deviation). As expected, without this measurement noise present, SDP estimation is straightforward with excellent, low variance SDP estimates that identify the nature of the non-linearity without any difficulty.

† Note that, in this purely stochastic situation, it is not possible to utilize the IV approach mentioned in § 2.4 to obviate bias since this requires the presence of an input signal to generate the instrumental variables.

Even with the measurement noise, the estimation results are reasonable. Since there is only one SDP, $a_1(y_{t-1})$, ML optimization of the associated scalar NVR quickly yields $Q_r = 0.000016$. The subsequent SDP estimation results are illustrated in figure 6, where the top left-hand panel shows the FIS estimated non-linear function with its characteristic quadratic shape; while the top right-hand panel shows the estimated state dependency of the SDP estimate $\hat{a}_1(y_{t-1} \mid N)$. In both plots, the true relationship is shown as a dashed line, while the estimated standard error bounds are shown dotted.

The most notable feature of the results in figure 6 is the larger errors in the SDP estimate at low values of $y_{t-1}$. Since the measurement noise variance is constant, this is the region where the measurement noise is having its most deleterious biasing effect. Even with these errors, however, the WLS estimates of the parameters (3.884 (0.111) and −3.857 (0.135), respectively) are quite close to the true values (4 and −4). The results are better still, however, if the noise is itself made state dependent, in the sense that it is set proportional to the signal level (i.e. $\xi_t^{sd} = [x_t / \max(x_t)]\,\xi_t$). This is a common situation with real data and it significantly reduces the noise effect at low signal levels where the bias is largest. As a result, the WLS estimates are improved to 3.960 (0.013) and −3.946 (0.016), respectively. Indeed, in this state dependent noise situation, the results are still good even if the variance of $\xi_t$ is doubled, so that the noise level on the data is visibly quite large, as shown in figure 7. The resulting SDP estimation results are presented in the lower panels of figure 6, which identify clearly that the data were generated by the logistic model with chaos-inducing parameter values.

Simulation example 3: The cosine map model (e.g. Zhan-Qian and Smith 1998) takes the form

Figure 6. SDP estimation of the simulated chaotic logistic model. Left panels: estimated non-linear functions with ordinary
(upper) and state-dependent (lower) noise. Right panels: estimated state dependent parameters with ordinary (upper) and
state-dependent (lower) noise.

Figure 7. Simulated chaotic logistic model. Noisy (full line) and noise-free (dashed line) output, with the state-dependent noise (+1.2) shown above.

Figure 8. Simulated cosine map model. Upper panel: measured output. Lower panel: phase plane (embedding) plot of $y_t$ vs $y_{t-1}$ (noise-free response shown as a full line).

$$y_t = \cos(2.8\,y_{t-1}) + 0.3\,y_{t-2} + e_t, \qquad e_t = N(0, 0.01) \qquad (14)$$

and a typical 2000 sample simulation of the model is presented in figure 8, which shows the time response in the upper panel and the phase plane ($y_t$, $y_{t-1}$) plot in the lower panel. In the latter graph, the noise free response is shown as a full thick line with the noisy response plotted as dots.

This is a typical stochastic model that exhibits underlying chaotic response characteristics. It provides a testing example for the SDP approach, however, because it is not in the assumed affine form (Co 1996): in particular, the SDP term $a_1(y_{t-1})\,y_{t-1}$ in the most appropriate SDAR model

$$y_t = a_1(y_{t-1})\,y_{t-1} + a_2(y_{t-2})\,y_{t-2} + e_t, \qquad e_t = N(0, \sigma^2) \qquad (14\,a)$$

is not able to represent the equivalent term $\cos(2.8\,y_{t-1})$ in (14) exactly, since $\cos(2.8\,y_{t-1})/y_{t-1}$ has a singularity at $y_{t-1} = 0$. Despite this difficulty, SDAR estimation yields excellent results, as illustrated in figures 9–11. These results were obtained with the ML optimized NVR matrix $\mathbf{Q}_r = \mathrm{diag}\,[0.0057\ \ 1.28 \times 10^{-7}]$ and it is clear, yet again, that the optimization has successfully identified that the potential state dependency resides in the first lag parameter $a_{1,t} = a_1(y_{t-1})$, while the second lag parameter $a_{2,t} = a_2(y_{t-2})$ is effectively time-invariant.

Figure 9 shows the FIS estimate of the cosine non-linearity subsequent to the convergence of the backfitting procedure, which took six iterations in this case. Except for the region around the singularity at $y_{t-1} = 0$ (see below), the estimation is very good. The associated $\hat{a}_2(y_{t-2} \mid N) = 0.291\,(0.0038)$, for all $t$, is estimated as being time-invariant despite the fact that the NVR, $Q_r(2,2) = 1.28 \times 10^{-7}$, is not too small in this case. Figure 10 compares the actual phase plane plot for the data used in the estimation (left-hand panel) with a similar plot based on data from a typical random realization of the SDAR model (right-hand panel). Similar agreement is found in both the time plots of the two series and the histograms.

Figure 11 compares the FIS estimate $\hat{a}_1(y_{t-1} \mid N)$, plotted as a function of $y_{t-1}$, with the theoretical function given by $a_1(y_{t-1}) = \cos(2.8\,y_{t-1})/y_{t-1}$ (which migrates to $\infty$ at the point $y_{t-1} = 0$). Over the most important region, $|y_{t-1}| > 0.1$, the estimate is very accurate. Not surprisingly, it becomes inaccurate close to the singularity, but the algorithm is robust enough to handle this well, without impairing the SDP estimates elsewhere. And the overall cosine shape of the non-linear function is clearly estimated accurately, as we see in figure 9.
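The behaviour of the theoretical SDP near the singularity can be illustrated with a short Python sketch of the cosine map (14). The function names are hypothetical, and two assumptions are made: the value 0.01 in $N(0, 0.01)$ is read as a variance, giving a standard deviation of 0.1, and the simulation starts from zero initial conditions.

```python
import math
import random


def simulate_cosine_map(n=2000, noise_sd=0.1, seed=0):
    """Simulate y[t] = cos(2.8 y[t-1]) + 0.3 y[t-2] + e[t]."""
    rng = random.Random(seed)
    y = [0.0, 0.0]                      # assumed initial conditions
    for _ in range(n):
        y.append(math.cos(2.8 * y[-1]) + 0.3 * y[-2] + rng.gauss(0.0, noise_sd))
    return y[2:]


def a1_theoretical(y_lag):
    """Theoretical SDP a1 = cos(2.8 y) / y, singular at y = 0."""
    return math.cos(2.8 * y_lag) / y_lag
```

Away from zero, the product `a1_theoretical(v) * v` recovers the cosine term exactly, while for very small `v` the parameter itself grows without bound, which is the difficulty described above.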

Figure 9. Simulated cosine map model. Comparison of FIS estimated and actual cosine non-linearity.

Figure 10. Simulated cosine map model. Comparison of the phase-plane plot for the data used in the estimation (left panel) with
the phase-plane plot obtained from a random realization of the estimated SDAR model.

Figure 11. Simulated cosine map model. Comparison of the FIS estimate $\hat{a}_1(y_{t-1} \mid N)$ as a function of $y_{t-1}$, with the theoretical function given by $a_1(y_{t-1}) = \cos(2.8\,y_{t-1})/y_{t-1}$.

Finally, on the basis of the above SDP results, the form of the non-linear equation is identified correctly as:

$$y_t = \alpha \cos(\beta\,y_{t-1}) + \gamma\,y_{t-2} + e_t \qquad (14\,b)$$

and the optimized ML estimates of the, now constant, parameters $\alpha$, $\beta$ and $\gamma$ in this model are $\hat{\alpha} = 0.998\,(0.004)$, $\hat{\beta} = 2.797\,(0.004)$ and $\hat{\gamma} = 0.303\,(0.003)$.

Real example: analysis of squid data: This example is based on the analysis of the signal shown in figure 12, which was obtained by Kazu Aihara and Gen Matsumoto from experiments on the giant axon of a squid (Mees et al. 1992). The signal comprises voltage measurements made from a micro-pipette inserted into a giant axon. Squid are used for such experiments because they have large diameter axons: this is because the nerves are not myelinated (insulated), so they need to have large diameter to reduce ion leakage and so maintain the transmission speed. The experiment is done in vitro (i.e. the nerves are chopped out of the squid), with the membrane voltage clamped. This is normally referred to as the forced response. However, the forcing is periodic and the response is not: in effect, figure 12 shows a putative chaotic response to a periodic signal. It is analysed here, however, as a purely stochastic, unforced dynamic process. Only the first order SDAR model results will be discussed here: additional results based on the estimation of a second order SDAR model are presented in Young (2001 a).

Figure 12. Squid example: signal obtained from experiments on the giant axon.

The simplest SDAR model that could produce the dynamics shown in figure 12 is the following first order SDAR

$$y_t = a_1(y_{t-1})\,y_{t-1} + e_t \qquad (15)$$

The FIS estimate of the SDP is shown in figure 13 and the associated non-linear function in figure 14. These results were obtained with the single NVR hyper-parameter optimized at NVR($a_1$) $= 5.01 \times 10^{6}$, under the assumption of an RW model for the SDP parameter variation (an IRW assumption provides a slightly more smoothed estimate but does not make any significant difference). The standard error band is shown as the dashed lines on both plots. This model explains 92.2% of the squid data (i.e. the coefficient of determination based on the model residuals $R^2 = 0.922$).

Figure 13. Squid example: FIS estimate of the state dependent parameter $a_1(y_{t-1})$.

Figure 15 provides a qualitative but discerning comparison of the model behaviour and the squid data (shown in the lower panel): here, the upper panel shows a random realization of the model (15) generated by a Simulink simulation model using a look-up table for the non-linearity based on the SDP estimation results. It is clear that the general nature of the response visually matches the actual squid signal rather well and this is confirmed by the similarity in the statistical properties of the two signals.

The estimated non-parametric non-linearity can be parameterized in various ways. Here, a nine-function RBF model, designed using the Matlab Neural Network Toolbox function solverb, is able to reproduce the non-linearity with an $R^2 = 0.997$, and the Simulink simulated behaviour of the resulting parameterized model closely resembles that of both the non-parametric model and the squid data.
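A random realization of a first order SDAR model driven by a look-up table, as used for the comparison in figure 15, can be sketched in Python as follows. This is not the Simulink model used in the paper: the function names `interp` and `simulate_sdar` are hypothetical, linear interpolation between table points is assumed, values outside the table are clamped to the end points, and any table entries supplied by a caller are placeholders rather than the estimated squid non-linearity.

```python
import bisect
import random


def interp(x, xs, fs):
    """Linear interpolation in a look-up table (xs ascending), clamped at the ends."""
    if x <= xs[0]:
        return fs[0]
    if x >= xs[-1]:
        return fs[-1]
    k = bisect.bisect_right(xs, x)
    w = (x - xs[k - 1]) / (xs[k] - xs[k - 1])
    return fs[k - 1] + w * (fs[k] - fs[k - 1])


def simulate_sdar(xs, fs, y0, n, noise_sd, seed=0):
    """Simulate y[t] = f(y[t-1]) + e[t], with f given by the table (xs, fs)."""
    rng = random.Random(seed)
    y = [y0]
    for _ in range(n - 1):
        y.append(interp(y[-1], xs, fs) + rng.gauss(0.0, noise_sd))
    return y
```

Here `fs` would hold values of the estimated function $\hat{f}_1(y_{t-1}) = \hat{a}_1(y_{t-1})\,y_{t-1}$ on the grid `xs`, so that the table carries the non-linearity directly instead of the SDP.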

Figure 14. Squid example: FIS estimate of non-linearity $f_1(y_{t-1}) = a_1(y_{t-1})\,y_{t-1}$.

Figure 15. Squid example: qualitative comparison of a random realization of the SDAR model behaviour (upper plot) and the
squid data (lower panel).

Note finally, that the dots shown in figure 14 represent the phase-plane or embedding graph of the squid data, with $y_t$ plotted versus $y_{t-1}$. This is interesting because it draws a comparison between the present SDP approach to modelling non-linear systems and existing methods used by non-linear systems theorists, where smooth curves are often fitted to the embedding graphs in order to model the non-linear system (see e.g. Mees 1991, 1993). The advantage of the SDP approach is that the smoothing is optimal in a ML sense and is carried out within a stochastic, dynamic systems setting. Also, the smoothing is applied to the estimation of the SDP parameter, rather than the non-linear function as a whole, and so it provides a more flexible and informative result.

For instance, the SDP in the model (15) can be considered, in an approximate sense, as the changing eigenvalue of a first order, discrete-time dynamic system (we might refer to this as a virtual-eigenvalue) and this provides some additional insight into the nature of the system. By reference to figure 13, for example, we see that, for $y_t$ less than about −120, this eigenvalue is approximately constant at a value of about 0.75, suggesting that the underlying behaviour over this part of the state space is quite stable, with a (virtual) time constant of approximately 3.5 sampling intervals (since $-1/\ln 0.75 \approx 3.5$). Between values of about −120 and −105, however, the eigenvalue increases steadily to unity, where the system is at a point of neutral stability (i.e. the system is acting transiently as an integrator). Thereafter, for values greater than −105, the eigenvalue rises sharply to around 1.5 and the underlying system is clearly exponentially unstable. However, if we now consider the temporal variation of the SDP, it is clear that, every time that the SDP exceeds unity, the instability of the system drives it immediately back to a location where the SDP is in the region where the eigenvalue is about 0.75 again and the system is stabilized. This is also well illustrated by a stacked plot of the changing impulse response associated with the AR model defined by the SDAR model at each time instant.

So, to conclude, it would appear that the first-order SDAR model provides a reasonable mechanism for characterizing the behaviour of the electrical activity in the axon of the squid and that, although the second-order SDAR model provides a slightly better explanation of the squid data (see Young 2001 a), this does not seem sufficient to justify the increased complexity of the model.

4. SDP state estimation and automatic control

It is clearly possible to develop state estimation and control system design methods based on the new class of SDP non-linear models discussed in this paper. These can be considered in parametric or non-parametric form (the latter providing a new way of considering control and estimation system design).

The combined KF and FIS algorithms, when used in a more conventional state estimation setting, can function with system matrices characterized by time, or even state dependent parameters. This extends their range of applicability to a considerable extent. Thus, the SDP models described here can provide the basis for rather novel non-parametric or state dependent parameter KF–FIS design, with implications for both modelling and non-linear optimization based on prediction error decomposition. For example, the blowfly population modelling example in Young (2000) has shown how a SDP implementation of the KF can be used as a vehicle in the final stage of non-linear model parameter optimization based on prediction error decomposition.

In the case of control system design, Young (1996) adumbrated a SDP approach to non-linear control system design by showing how a SDP version of the Proportional-Integral-Plus (PIP) controller (see Taylor et al. 2000 and the prior references therein) could be used to successfully control the chaotic logistic growth equation (13 a). In more recent research (McCabe et al. 2001), we have provided a theoretical basis for such SDP-PIP control system design and shown how it can be applied to a variety of non-linear systems, including the non-linear model of a hydraulic servo with asymmetric limiting and deadzone non-linearities discussed elsewhere in this Special Issue (see Hu et al. 2001; Young 2001 b).

5. Conclusions

Recursive estimation has a long and rich history: from its beginnings in Gauss's original derivation of recursive least squares (Gauss 1823; see Appendix 2 of Young 1984), through its re-discovery by Plackett (1950) and Kalman's seminal work on stochastic state estimation (Kalman 1960), to the burgeoning of research on recursive estimation in a whole range of different academic disciplines between 1960 and the present. In the last ten years, however, the advent of fast computers and the desire of theorists to extend the boundaries of time series analysis has led to an explosion of research on Monte Carlo-based numerical methods, from either classical (e.g. Durbin and Koopmans 1999) or Bayesian (e.g. Ruanaidh and Fitzgerald 1996, Gamerman 1997) perspectives.

The motivation of this more recent research is clearly to extend the 'Gaussian' methods of standard recursive estimation to non-Gaussian and non-linear time series, using models in which the stochastic inputs are non-Gaussian. But models of non-Gaussian and non-linear processes do not necessarily require the assumption of non-Gaussian inputs. As we have seen in this paper, a fairly wide class of non-Gaussian and non-linear time series can be represented by time variable (TVP) and state dependent (SDP) parameter, non-linear, stochastic models with Gaussian inputs. When it is possible (and the methods do seem quite widely applicable), this is clearly advantageous, since it allows for the use of well tried and robust algorithms that are computationally

much less demanding than even the `classical’ non- and back®tting procedure described in this paper.
Gaussian methods (see Durbin and Koopmans 1999). Unlike the GAM, for instance, the non-linear
The fact that the FIS algorithm can function well as functions in the SDP models are factorized into
a non-parametri c estimator means that it provides a the product of the SDP and the model variable;
powerful, recursive alternative to other, more conven- and the SDP is estimated by optimal FIS smooth-
tional, methods of smoothing, such as regularization, ing (rather than the more conventional scatter-
smoothing splines, kernel smoothers and locally plot smoothing used by Hastie and Tibshirani).
weighted kernel regression (see e.g. the practical ex- . The back®tting procedure does not provide com-
ample in Young and Pedregal 1998). It also provides a
plete covariance information on the SDP esti-
non-parametric method for transforming random vari-
mates. Could this be distorting the standard
ables (e.g. Gaussian to non-Gaussian) , or identifying the
errors on the estimates? In more general terms,
nature of a parametric transform between random vari-
what are the full theoretical statistical properties
ables.
of the SDP estimates obtained by back®tting?
Finally, it is clear that the simulated and real examples presented in the paper, combined with those discussed in other cited references, demonstrate the efficacy of the proposed SDP approach to modelling for a fairly wide and practically useful class of non-linear stochastic systems. However, the proposed technique is relatively new and it raises a variety of interesting theoretical questions and possibilities for extending the approach to an even richer class of non-linear stochastic systems. For example:

• How can the approach be extended to handle multivariable state dependencies, where the SDPs may be functions of several state variables? One approach currently being developed by the first author is to model the SDPs with neuro-fuzzy functions of several variables (e.g. Hu et al. 2001) but with the structure of the non-linear system identified first from the data using the methodology described in the present paper (see the discussion in Young 2001 b).

• What is the best method of handling the estimation bias that occurs when the proposed SDP modelling approach is applied in situations where errors-in-variables problems occur? An instrumental variable (IV) method, such as that mentioned in § 2.4, can obviate such problems in the case of well-behaved non-linear input–output models. But alternative approaches will be required in the case of purely stochastic non-linear systems without inputs and/or non-linear systems that exhibit sensitive, chaotic behaviour.

• Although no convergence problems have been encountered so far in the evaluation of the proposed SDP estimation procedure, what conditions are required for convergence of the backfitting procedure? Hastie and Tibshirani (1996) use a similar backfitting procedure for estimation of their Generalized Additive Model (GAM). It needs to be established whether their conclusions as regards convergence (which are not entirely persuasive, in any case) are applicable to the models

• Finally, what are the identifiability conditions on the SDP models? It is clear that problems analogous to collinearity in constant parameter model estimation can occur and that backfitting convergence may be affected by such problems. Also, in the case of input–output models, the nature of the input signals will affect the identifiability of the model parameters. It is necessary to explore these factors further and establish what other factors may affect the identifiability of the model.

Regardless of the answers to these questions, however, the SDP approach to the identification of non-linearities in stochastic systems appears to hold great promise. In contrast to other approaches, such as neural networks and NARMAX models, for example, it attempts to identify the type of non-linearity and, therefore, the form of the non-linear model, prior to the estimation of the parameters in the finally identified model. This helps to ensure that the final non-linear model is efficiently parameterized (parsimonious) and it should avoid the over-parameterization that normally accompanies neural network and, to a lesser extent, the 'black-box' NARMAX models. Indeed, the SDP approach has been developed as a primary tool in Data-Based Mechanistic (DBM) modelling (e.g. Young, 1998 and the prior references therein)†, where its ability to obtain parametrically efficient and physically meaningful models is essential. A practical example of this is given in the first author's comments on the paper by Hu et al. (2001), which appears elsewhere in this Special Issue (Young 2001 b).

† Here, the model structure is inferred directly in an inductive manner from data in relation to a generic class of black-box models (e.g. SDARX) and then the model is interpreted in a physically meaningful manner. This can be contrasted with the hypothetico-deductive 'Grey-Box' modelling, where the simplified model structure is based on prior, physically-based and possibly subjective assumptions, with the parameters that characterize this simplified structure then estimated from data on the basis of this assumed model form.

SDP estimation also provides a non-parametric model that can be useful in its own right. As we have seen, the SDP model can be simulated easily in programs such as Simulink, thus removing the need for the final parametric estimation in some applications, such as simulation, forecasting and automatic control.

Finally, it should be noted that all of the results presented in this paper were generated using tools from the CAPTAIN Matlab Toolbox developed at Lancaster. This Toolbox is currently in the final stages of β-testing and will be generally available towards the end of 2001. Further information on the Toolbox can be obtained from the first author and via http://www.es.lancs.ac.uk/cres/captain/.

Acknowledgements

Some of the material in this paper has appeared in two books (Young 2000, 2001 a) and the author is grateful to the publishers (Cambridge University Press, Cambridge and Birkhauser, Boston), for permission to use this material. Part of this research was supported by the EPSRC under grant GR/K77884; the BBSRC under grant EO6813; and the EPSRC, together with the Newton Institute for Mathematical Sciences in Cambridge, for support during the first author's six month stay at the Newton Institute, where most of this paper was written. The first author is also grateful for the many stimulating discussions he had with Professor Alistair Mees, University of Western Australia, who also provided the squid data (with the permission of Kazu Aihara and Gen Matsumoto). The third author received some support from the Nuffield Foundation. Finally, thanks are also due to the anonymous referees for their helpful and constructive comments.

References

Brown, R. L., Durbin, J., and Evans, J. M., 1975, Techniques for testing the constancy of regression relationships over time, Journal Royal Statistical Society, Series B, 37, 141–192.
Bryson, A. E., and Ho, Y. C., 1969, Applied Optimal Control (Blaisdel).
Co, T., 1996, Parameter estimation of non-linear systems using modulating functions methods. In M. I. Friswell and J. E. Mottershead (Eds), Identification in Engineering Systems, University of Wales, Swansea, UK, pp. 87–96.
Daubechies, I., 1988, Orthonormal bases of compactly supported wavelets, Communications on Pure Applied Mathematics, 41, 906–966.
Dempster, A. P., Laird, N. M., and Rubin, D. B., 1977, Maximum likelihood from incomplete data via the EM algorithm, Journal Royal Statistical Society, Series B, 39, 1–38.
Durbin, J., and Koopmans, S. J., 1999, Time series analysis of non-Gaussian observations based on state space models from both classical and Bayesian perspectives. Journal of the Royal Statistical Society, Series B, 62, (in press).
Gamerman, D., 1997, Markov Chain Monte Carlo (Chapman and Hall).
Gauss, K. F., 1821, 1823, 1826, Theoria combinationis observationum erroribus minimis obnoxiae, Parts 1, 2 and supplement, Werke, 4, 1–108.
Harvey, A. C., 1981, Time Series Models (Oxford: Phillip Allen, Oxford).
Harvey, A. C., 1984, A unified view of statistical forecasting procedures (with comments). Journal of Forecasting, 3, 245–283.
Harvey, A. C., 1989, Forecasting Structural Time Series Models and the Kalman Filter (Cambridge University Press).
Harvey, A. C., and Peters, S., 1990, Estimation procedures for structural time series models, Journal of Forecasting, 9, 173–204.
Hastie, T. J., and Tibshirani, R. J., 1996, Generalized Additive Models (Chapman and Hall).
Hoberock, L. L., and Kohr, R. H., 1966, An experimental determination of differential equations to describe simple non-linear systems. Preprints, Joint Automatic Control Conference, Seattle, Washington, USA.
Holst, U., Hössjer, O., Björklund, C., Ragnarsson, P., and Edner, H., 1996, Locally weighted least squares kernel regression and statistical evaluation of LIDAR measurements, Environmetrics, 7, 410–416.
Hu, J., Kumamaru, K., and Hirasawa, K., 2001, A quasi-ARMAX approach to modelling non-linear systems. International Journal of Control, this special issue.
Jakeman, A. J., and Young, P. C., 1979, Recursive filtering and the inversion of ill-posed causal problems, CRES Report No. AS/R28/1979, Centre for Resource and Environmental Studies, Australian National University.
Jakeman, A. J., and Young, P. C., 1984, Recursive filtering and the inversion of ill-posed causal problems. Utilitas Mathematica, 35, 351–376.
Kalman, R. E., 1960, A new approach to linear filtering and prediction problems. ASME Transactions Journal Basic Engineering, 82D, 35–45.
Kitagawa, G., 1981, A non-stationary time series model and its fitting by a recursive filter. Journal of Time Series Analysis, 2, 103–116.
Lee, R. C. K., 1964, Optimal Identification, Estimation and Control. (MIT Press).
McCabe, A., Chotai, A., and Young, P. C., 2001, State dependent parameter PIP control of non-linear systems. Report No. TR/175, Centre for Research on Environmental Systems and Statistics, Lancaster University.
Mees, A. I., 1991, Dynamical systems and tesselations: detecting determinism in data. International Journal of Bifurcation and Chaos, 1, 777–794.
Mees, A. I., 1993, Parsimonious dynamical reconstruction, International Journal of Bifurcation and Chaos, 3, 669–675.
Mees, A. I., Aihara, K., Adachi, M., Judd, K., Ikeguchi, T., and Massumoto, G., 1992, Deterministic prediction and chaos in squid axon response. Physics Letters A, 169, 41–45.
Mendel, J. M., 1969, A priori and a posteriori identification of time varying parameters. Second Hawaii International Conference on System Sciences.
Norton, J. P., 1976, Identification by optimal smoothing using integrated random walks. Proceedings of the Institute Electrical Engineers, 123, 451–452.
Norton, J. P., 1986, An Introduction to Identification (Academic Press).
Plackett, R. L., 1950, Some theorems in least squares. Biometrika, 37, 149–157.
Priestley, M. B., 1988, Nonlinear and Nonstationary Time Series Analysis (Academic Press).
Ruanaidh, J. J., and Fitzgerald, W. J., 1996, Numerical Bayesian Methods Applied to Signal Processing (Springer).
Schweppe, F., 1965, Evaluation of likelihood function for Gaussian signals. IEEE Transactions on Information Theory, 11, 61–70.
Taylor, C. J., Young, P. C., and Chotai, A., 2000, State space control system design based on non-minimal state variable feedback: further generalisations and unification results, International Journal of Control, 73, 1329–1345.
Young, P. C., 1969, Applying parameter estimation to dynamic systems: Part I, Theory. Control Engineering, 16, 119–125; Part II Applications, 16, 118–124.
Young, P. C., 1970, An instrumental variable method for real-time identification of a noisy process. Automatica, 6, 271–287.
Young, P. C., 1978, A general theory of modeling for badly defined dynamic systems. In G. C. Vansteenkiste (Ed.), Modeling, Identification and Control in Environmental Systems (North Holland), 103–135.
Young, P. C., 1984, Recursive Estimation and Time-Series Analysis (Springer).
Young, P. C., 1988, Recursive extrapolation, interpolation and smoothing of non-stationary time-series. In H. F. Chen (Ed.) Identification and System Parameter Estimation 1988 (Pergamon Press), pp. 33–44.
Young, P. C., 1989, Recursive estimation and modelling of non-stationary and non-linear time-series. In IFAC Symposium on Adaptive Systems in Control and Signal Processing, Vol. 1 (Pergamon Press), pp. 49–64.
Young, P. C., 1993, Time variable and state dependent modelling of non-stationary and non-linear time series. In T. Subba Rao (Ed.), Developments in Time Series Analysis (Chapman and Hall), pp. 374–413.
Young, P. C., 1994, Time-variable parameter and trend estimation in non-stationary economic time series. Journal of Forecasting, 13, 179–210.
Young, P. C., 1996, A general approach to identification, estimation and control for a class of non-linear dynamic systems. In M. I. Friswell and J. E. Mottershead (Eds), Identification in Engineering Systems (Swansea: University of Wales), pp. 436–445.
Young, P. C., 1998, Data-based mechanistic modelling of environmental, ecological, economic and engineering systems. Environmental Modelling and Software, 13, 105–122.
Young, P. C., 1999, Nonstationary time series analysis and forecasting. Progress in Environmental Science, 1, 3–48.
Young, P. C., 2000, Stochastic, dynamic modelling and signal processing: time variable and state dependent parameter estimation. In W. J. Fitzgerald, A. Walden, R. Smith and P. C. Young (Eds) Nonlinear and Nonstationary Signal Processing (Cambridge: Cambridge University Press), pp. 74–114.
Young, P. C., 2001 a, The identification and estimation of non-linear stochastic systems. In A. I. Mees (Ed.) Nonlinear Dynamics and Statistics (Boston: Birkhauser), pp. 127–166.
Young, P. C., 2001 b, Comment on 'A quasi-ARMAX approach to the modelling of non-linear systems' by J. Hu et al., International Journal of Control, this special issue.
Young, P. C., and Beven, K. J., 1994, Data-based mechanistic modelling and the rainfall-flow non-linearity. Environmetrics, 5, 335–363.
Young, P. C., and Pedregal, D. J., 1998, Recursive and en-bloc approaches to signal extraction. Journal of Applied Statistics, 26, 103–128.
Young, P. C., Pedregal, D. J., and Tych, W., 1999, Dynamic harmonic regression. Journal of Forecasting, 18, 369–394.
Zhan-Qian, J. L., and Smith, R. L., 1998, Estimating local Lyapunov exponents by local polynomial regression, University of North Carolina, Department of Statistics, http://www.stat.unc.edu./postcript/rs/chaos.
