
Automatica, Vol. 7, pp. 123-162. Pergamon Press, 1971. Printed in Great Britain.

System Identification--A Survey*

Une revue de l'identification des systèmes
Systemidentifikation, eine Übersicht
Обзор опознавания систем

K. J. ÅSTRÖM† and P. EYKHOFF‡

Summary--The field of identification and process-parameter estimation has developed rapidly during the past decade. In this survey paper the state of the art/science is presented in a systematic way. Attention is paid to general properties and to classification of identification problems. Model structures are discussed; their choice hinges on the purpose of the identification and on the available a priori knowledge. For the identification of models that are linear in the parameters, the survey explains the least squares method and several of its variants which may solve the problem of correlated residuals, viz. repeated and generalized least squares, the maximum likelihood method, the instrumental variable method, and the tally principle. Recently the non-linear situation and on-line and real-time identification have witnessed extensive developments. These are also reported. There are 230 references listed, mostly to recent contributions. In appendices a résumé is given of parameter estimation principles and a more detailed exposition of an example of least squares estimation.

Organization of the paper

The contents of the paper are arranged as follows:

(1) Introduction
    Status of the field

(2) General properties of identification problems
    Purpose of identification
    Formulation of identification problems
    Relations between identification and control; the separation hypothesis
    Accuracy of identification

(3) Classification of identification methods
    The class of models
    The class of input signals
    The criterion
    Computational aspects

(4) Choice of model structure
    The concept of linearity in the parameters
    Representation of linear systems
    Canonical forms for linear deterministic systems
    Canonical forms for linear stochastic systems

(5) Identification of linear systems
    Least squares identification of a parametric model
    A probabilistic interpretation
    Comparison with correlation methods
    Correlated residuals
    Repeated least squares
    Generalized least squares
    The maximum likelihood method
    Instrumental variables
    Levin's method
    Tally principle
    Multivariable systems

(6) Identification of non-linear systems
    Representation of non-linear systems
    Estimation of a parametric model

(7) On-line and real-time identification
    Model reference techniques
    On-line least squares
    Contraction mappings
    Stochastic approximations
    Real-time identification
    Non-linear filtering
    Approximations

(8) Conclusions

(9) References

Appendix A: A résumé of parameter estimation
Appendix B: An example of least squares identification of a parametric model

* Received 9 February 1970; revised 24 August 1970. The original version of this paper was an Invited Survey Paper which was presented at the 2nd IFAC Symposium on Identification and Process Parameter Estimation, held in Prague, Czechoslovakia, during June 1970. It was recommended for publication in revised form by the editorial staff.
† Division of Automatic Control, Lund Institute of Technology, Lund, Sweden.
‡ Department of Electrical Engineering, Technical University, Eindhoven, Netherlands.

1. INTRODUCTION

In recent years aspects of system identification have been discussed in a multitude of papers, at many conferences and in an appreciable number of university courses. Apparently the interest in this subject has different roots, e.g.

Definite needs by engineers in process industries to obtain a better knowledge about their plants for improved control. This holds not only for the chemical but also for the mechanical and other production industries.

The task of studying high performance aero and space vehicles, as well as the dynamics of more down-to-earth objects like railway carriages and hydrofoils.

Study of the human being in tracking action and in other types of control.

Research of biological functions, e.g. of neuromuscular systems like the eye pupil response, arm or leg control, heart rate control, etc.

Not only the needs for, but also the possibilities of, estimation have dramatically changed with the development of computer hardware and software.

More or less apart from the "engineering" and "biological" approach, the econometricians and statisticians have been working on dynamical economic models, leading to incidental cross-fertilization with engineering [1].

At many universities the field has been recognized as a legitimate subject for faculty and Ph.D. research. The net result of this development is a large number of publications, either accentuating a particular type of approach or describing a certain case study. In this survey paper the "motivation" of the identification is derived from control engineering applications.

Throughout the history of automatic control it has been known that the knowledge about a system and its environment, which is required to design a control system, is seldom available a priori. Even if the equations governing a system are known in principle, it often happens that knowledge of particular parameters is missing. It is not uncommon that the models which are available are much too complex, etc. Such situations naturally occur in many other fields. There are, however, two facts which are unique for the identification problems occurring in automatic control, i.e.

It is often possible to perform experiments on the system in order to obtain the lacking knowledge.

The purpose of the identification is to design a control strategy.

One of the factors which undoubtedly contributed very much to the great success of frequency response techniques in "classical" control theory was the fact that the design methods were accompanied by a very powerful technique for system identification, i.e. frequency analysis. This technique made it possible to determine the transfer functions accurately, which is precisely what is needed to apply the synthesis methods based on logarithmic diagrams. The models used in "modern" control theory are, with a few exceptions, parametric models in terms of state equations. The desire to determine such models from experimental data has naturally renewed the interest of control engineers in parameter estimation and related techniques.

Status of the field

Although it is very difficult to get an overview of a field in rapid development, we will try to point out a few facts which have struck us as being relevant when we prepared this survey.

The field of identification is at the moment rather bewildering, even for so-called experts. Many different methods and techniques are being analysed and treated. "New methods" are suggested en masse and, on the surface, the field appears to look more like a bag of tricks than a unified subject. On the other hand, many of the so-called different methods are in fact quite similar. It seems to be highly desirable to achieve some unification of the field. This means that an abstract framework to treat identification problems is needed. In this context it appears to us that the definition of an identification problem given by ZADEH [2] can be used as a starting point, i.e. an identification problem is characterized by three quantities: a class of models, a class of input signals, and a criterion. We have tried to emphasize this point of view throughout the paper.

For a survey paper like this it is out of the question to strive for completeness. Limitations are given by: the number of relevant publications; the balance between the "educational" and the "expert" slant of this representation; the (in)coherence of the field and the wide spectrum of related topics. Also it is desirable to keep in mind that until now a number of survey papers have been written, based on many references. For an indication where this new paper [0] stands with respect to the older ones, the reader is presented with an enumeration of topics dealt with in the IFAC survey papers: EYKHOFF, VAN DER GRINTEN, KWAKERNAAK and VELTMAN [208], CUENOD and SAGE [206], EYKHOFF [22], and BALAKRISHNAN and PETERKA [35].

IDENTIFICATION IN AUTOMATIC CONTROL SYSTEMS

General aspects

The purpose of identification/estimation procedures

Identification
--definition and formulation [0]
--and control [0], [35]

Model representation
--a priori knowledge [22]
--linear [0], [22], [35], [206], [208]
--linear in the parameters [0], [22]
--non-linear, general [206]
--non-linear, Wiener [206]
--non-linear, Volterra [35]
--linear/non-linear-in-parameters [0]
--multivariable [0]

Industrial models
--use [208]
--examples dynamic/static [208]

Formulation of the estimation problem

Classes of instrumentation [0], [22], [35], [208]
--models [0]
--input signal [0]
--criteria [0]
--explicit mathematical relations/model adjustment [208]
--"one-shot" technique/iterative technique [0]
achievable accuracy [0], [35]
input noise [0]
identifiability [0]

Parameter estimation

Relationship between estimation techniques [22]

Least squares/generalized least squares [0], [22], [35], [208]

"One-shot" techniques
auto and cross correlation [0], [22], [206], [208]
differential approximation [206]
deconvolution, numerical [206]
normal equations (sampled signals) [0], [22], [35], [208]
residuals [0]
test of model order [0]
combined model and noise parameter estimation [0]
instrumental variables [0], [35]
generalized model/minimization of equation-error [0], [22], [35], [208]

Iterative techniques
model adjustment [0], [22], [35], [208]
--on-line, real-time [0], [35]
--sensitivity [22], [208]
--hill climbing techniques [0]
stochastic approximation [0], [22], [35]
relation with Kalman filtering [0]

Maximum likelihood [0], [22], [35], [208]
achievable accuracy [35]
properties [0], [22]

Bayes' estimation [22]

Use of deterministic test signals
choice of input signals [0]
comparison of a number of test signals [208]
sinusoidal test signals [206]
pseudo-random binary noise [35], [208]

State estimation
state description, examples [208]
state estimation [0], [208]
--non-linear filtering [0]

Parameter and state estimation combined
gradient method [206]
quasilinearization [0], [22], [206]
invariant imbedding [0], [206]

Other surveys and monographs of general interest are ALEKSANDROVSKII and DEICH [3], BEKEY [4], LEE [209], RAIBMAN [221] and STROBEL [6], [42].

2. GENERAL PROPERTIES OF IDENTIFICATION PROBLEMS

Purpose of identification

When formulating and solving an identification problem it is important to have the purpose of the identification in mind. In control problems the final goal is often to design control strategies for a particular system. There are, however, also situations where the primary interest is to analyse the properties of a system. Determination of rate coefficients in chemical reactions, heat transfer coefficients of industrial processes and reactivity coefficients in nuclear reactors are typical examples of such a "diagnostic" situation. In such a case determination of specific parameter values might be the final goal of the identification. Many problems of this type are also found in biology, economy, and medicine.

Even if the purpose of the identification is to design a control system, the character of the problem might vary widely depending on the nature of the control problem. A few examples are given below:

Design a stable regulator.

Design a control program for optimal transition from one state to another.

Design a regulator which minimizes the variations in process variables due to disturbances.

In the first case it might be sufficient to have a fairly crude model of the system dynamics. The second control problem might require a fairly accurate model of the system dynamics. In the third problem it is also necessary to have a model of the environment of the system. Assuming that the ultimate aim of the identification is to design a control strategy for a system, what would constitute a satisfactory solution from a practical point of view?

In most practical problems there is seldom sufficient a priori information about a system and its environment to design a control system from a priori data only. It will often be necessary to make some kind of experiment, observe the process while using perturbations as input signals, and observe the corresponding changes in process variables. In practice there are, however, often severe limitations on the experiments that can be performed. In order to get realistic models it is often necessary to carry out the experiments during normal operation. This means that if the system is perturbed, the perturbations must be small so that the production is hardly disturbed. It might be necessary to have several regulators in operation during the experiment in order to keep the process fluctuations within acceptable limits. This may have an important influence on the estimation results.
When carrying out identification experiments of this type there are many questions which arise naturally:

How should the experiment be planned? Should a sequential design be used, i.e. plan an experiment using the available a priori information, perform that experiment, plan a new experiment based on the results obtained, etc.? When should the experimentation stop?

What kind of analysis should be applied to the results of the experiment in order to arrive at control strategies with desired properties? What confidence can be given to the results?

What type of perturbation signal should be used to get as good results as possible within the limits given by the experimental conditions?

If a digital computer is used, what is a suitable choice of the sampling interval?

In spite of the large amount of work that has been carried out in the area of system identification, we have at present practically no general answers to the problems raised above. In practice most of these general problems are therefore answered in an ad hoc manner, leaving the analysis to more specific problems. In a recent paper JACOB and ZADEH [7] discuss some of the questions in connection with the problem of identifying a finite state machine; c.f. also ANGEL and BEKEY [8]. Some aspects of the choice of sampling intervals are given in FANTAUZZI [9] and SANO and TERAO [11]. Since the general problems discussed above are very difficult to formalize, one may wonder if there will ever be rational answers to them. Nevertheless, it is worthwhile to recognize the fact that the final purpose of identification is often the design of a control system, since this simple observation may resolve many of the ambiguities of an identification problem. A typical example is the discussion whether the accuracy of an identification should be judged on the basis of deviations in the model parameters or in the time response. If the ultimate purpose is to design control systems, then it seems logical that the accuracy of an identification should be judged on the basis of the performance of the control system designed from the results of the identification.

Formulation of identification problems

The following formulation of the identification problem was given by ZADEH [2]:

"Identification is the determination, on the basis of input and output, of a system within a specified class of systems, to which the system under test is equivalent."

Using Zadeh's formulation it is necessary to specify a class of systems, 𝒮 = {S}, a class of input signals, 𝒰, and the meaning of "equivalent". In the following we will call "the system under test" simply the process, and the elements of 𝒮 will be called models. Equivalence is often defined in terms of a criterion or a loss function which is a functional of the process output y and the model output y_m, i.e.

V = V(y, y_m).   (1)

Two models m_1 and m_2 are then said to be equivalent if the value of the loss function is the same for both models, i.e.

V(y, y_m1) = V(y, y_m2).

There is a large freedom in the problem formulation which is reflected in the literature on identification problems. The selection of the class of models 𝒮, the class of input signals 𝒰, and the criterion is largely influenced by the a priori knowledge of the process as well as by the purpose of the identification.

When equivalence is defined by means of a loss function, the identification problem is simply an optimization problem: find a model S_0 ∈ 𝒮 such that the loss function is as small as possible. In such a case it is natural to ask several questions:

Is the minimum achieved?

Is there a unique solution?

Is the uniqueness of the solution influenced by the choice of input signals?

If the solution is not unique, what is the character of the models which give the same value of the loss function, and how should 𝒮 be restricted in order to ensure uniqueness?

Answers to some of these problems have been given for a simple class of linear systems arising in biomedical applications by BELLMAN and ÅSTRÖM [12]. The class of models 𝒮 has been called identifiable if the optimization problem has a unique solution. Examples of identifiable and non-identifiable classes are also given.

The formulation of an identification problem as an optimization problem also makes it clear that there are connections between identification theory and approximation theory. Many examples are found in the literature, e.g. LAMPARD [13], KITAMORI [14], BARKER and HAWLEY [15], ROBERTS [16], [17], and others, where covariance functions are identified as coefficients in orthogonal series expansions. Recent examples are SCHWARZE [18], GORECKI and TUROWICZ [19], and DONATI and MILANESE [207].
Another type of identification problem is obtained by imbedding in a probabilistic framework. If 𝒮 is defined as a parametric class, 𝒮 = {S_β}, where β is a parameter, the identification problem then reduces to a parameter estimation problem. Such a formulation makes it possible to exploit the tools of estimation and decision theory. In particular it is possible to use special estimation methods, e.g. the maximum likelihood method, Bayes' method, or the min-max method. It is possible to assign accuracies to the parameter estimates and to test various hypotheses.

Also in many probabilistic situations it turns out that the estimation problem can be reduced to an optimization problem. In such a case the loss function (1) is, however, given by the probabilistic assumptions. Conversely, to a given loss function it is often possible to find a probabilistic interpretation. The problem can usually be handled analytically under gaussian assumptions. An approximation technique in the non-gaussian case is discussed by RAIBMAN and collaborators [215], [216].

There are several good books on estimation theory available, e.g. DEUTSCH [20] and NAHI [21]. A summary of the important concepts and their application to process identification is given by EYKHOFF [22]. An exposé of the elements of estimation theory is also given in Appendix A.

Also in the probabilistic case it is possible to define a concept of identifiability using the framework of estimation theory. In ÅSTRÖM and BOHLIN [23] a system is called identifiable if the estimate is consistent. A necessary condition is that the information matrix, associated with the estimation problem, is positive definite. Another concept of identifiability has been given by BALAKRISHNAN [24]. The question of identifiability has also been pursued by STALEY and YUE [25]. The concept of determinable classes introduced by ROOT [228], [229], [230] is also closely related to identifiability.

Relations between identification and control; the separation hypothesis

Whenever the design of a control system for a partially known process is approached via identification, it is an a priori assumption that the design can be divided into two steps: identification and control. In analogy with the theory of stochastic control we refer to this assumption as the separation hypothesis. The approach is very natural, in particular if we consider the multitude of techniques which have been developed for the design of systems with known process dynamics and known environments. However, it is seldom true that optimum solutions are obtained if a process is identified and the results of the identification are used in a design procedure developed under the assumption that the process and its environment are known precisely. It can be necessary to modify the control strategy to take into account the fact that the identification is not precise. Conceptually, it is known how these problems should be handled. In the extreme case when identification and control are done simultaneously for a system with time-varying parameters, the dual control concept of FEL'DBAUM [26] can be applied. This approach will, however, lead to exorbitant computational problems even for simple cases. C.f. MENDES [27] and the discussion in section 7 of this paper.

It can also be argued that the problem of controlling a process with unknown parameters can be approached without making reference to identification at all. As a typical example we mention on-line tuning of PID regulators. In any case it seems to be a worthwhile problem to investigate rigorously under what conditions the separation hypothesis is valid. Initial attempts in this direction have been made by SCHWARTZ and STEIGLITZ [28], ÅSTRÖM and WITTENMARK [29]. Apart from the obvious fact that it is desirable to choose a class of models 𝒮 for which there is a control theory available, there are also many other interesting questions in the area of identification and control, e.g.

Is it possible to obtain rational choices of model structures and criteria for the identification if we know that the results of identification will be used to design control strategies?

What "accuracy" is required of the solution of an identification problem if the separation hypothesis should be valid at least with a specified error?

Partial answers to these questions are given by ÅSTRÖM and WITTENMARK [29] for a restricted class of problems.

Accuracy of identification

The problem of assigning accuracy to the result of an identification is an important problem and also a problem which always seems to give rise to discussions; e.g. QVARNSTRÖM [30], HERLES [31]. The reason is that it is possible to define accuracy in many different ways, and an identification which is accurate in one sense may be very inaccurate in another sense.

For example, in the special case of linear systems it is possible to define accuracy in terms of deviations in the transfer function, in the impulse response, or in the parameters of a parametric model. Since the Fourier transform is an unbounded operator, small errors in the impulse response can very well give rise to large errors in the transfer function, and vice versa. A discussion of this is given by UNBEHAUEN and SCHLEGEL [32] and by STROBEL [5].
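The unboundedness remark can be illustrated numerically. In the sketch below (our own construction; the amplitude, frequency and record length are arbitrary choices, not from the paper) a perturbation of the impulse response that never exceeds 0.01 in magnitude produces a transfer-function error close to 50 at the perturbing frequency:

```python
import cmath
import math

# Sketch: an impulse-response error that is uniformly small in time can
# still produce a large transfer-function error, because the Fourier
# transform is an unbounded operator.  Amplitude, frequency and record
# length below are arbitrary illustrative choices.

eps, w0, N = 0.01, 1.0, 10000
e = [eps * math.cos(w0 * k) for k in range(N)]   # impulse-response error

# Error induced in the transfer function, evaluated at the frequency w0:
E_w0 = sum(e[k] * cmath.exp(-1j * w0 * k) for k in range(N))

sup_error = max(abs(ek) for ek in e)   # never exceeds 0.01 ...
big_error = abs(E_w0)                  # ... yet this is close to eps*N/2 = 50
```

Lengthening the record makes the transfer-function error grow without bound while the time-domain error stays at 0.01, which is the unboundedness in question.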
It is also possible to construct examples where there are large variations in a parametric model in spite of the fact that the corresponding impulse response does not change much. See e.g. STROBEL [42].

Many controversies can be resolved if we take the ultimate goal of the identification into account. This approach has been taken by ŠTĚPÁN [33], who considers the variation of the amplitude margin with the system dynamics. Such an analysis is ideally suited to analyse the case when the purpose of the identification is to design a stable proportional regulator. The following example illustrates the difficulties in a universal concept of accuracy.

Example. Consider the process S_T given by

dx/dt = u(t - T).   (2)

The transfer function is

H_T(s) = (1/s) e^{-sT}   (3)

and the unit step response is

h_T(t) = t - T,   t > T.   (4)

Assume that the process S_T is identified as S_0. Is it possible to give a sensible meaning to the accuracy of the identification? It is immediately clear that the differences

max_t |h_T(t) - h_0(t)|   (5)

max_ω |H_T(jω) - H_0(jω)|   (6)

can be made arbitrarily small if T is chosen small enough. On this basis it thus seems reasonable to say that S_0 is an accurate representation of S_T if T is small. On the other hand the difference

|log H_T(jω) - log H_0(jω)| = ωT   (7)

i.e. the difference in phase shift, can be made arbitrarily large, no matter how small we choose T. Finally, assume that it is desired to control the system (2) with the initial condition

x(0) = 1   (8)

in such a way that the criterion

V = ∫_0^∞ {α² x²(t) + u²(t)} dt   (9)

is minimal. Suppose that an identification has resulted in the model S_0 while the process is actually S_T. How large a deviation of the loss function is obtained? For S_0 the control strategy which minimizes (9) is given by

u(t) = -α x(t).   (10)

The minimal value of the loss is

min V = α.

If α = 1 it can be shown that a very slight increase of the loss function is obtained if, say, T = 0.001. However, if α = 2000 (> π/2T) the criterion (9) will be infinite for the strategy (10) because the closed loop system is unstable. We thus find that the same model error is either negligible or disastrous depending on the properties of the loss function.

Limitations in the possibilities of identification are discussed by ŠTĚPÁN [34].

3. CLASSIFICATION OF IDENTIFICATION METHODS

The different identification schemes that are available can be classified according to the basic elements of the problem, i.e. the class of systems 𝒮, the input signals 𝒰, and the criterion. Apart from this it might also be of interest to classify them with respect to implementation and data processing requirements. For example: in many cases it might be sufficient to do all computations off line, while other problems might require that the results are obtained on line, i.e. at the same time as the measurements are done. Classifications have been done extensively in EYKHOFF [22], BALAKRISHNAN and PETERKA [35].

The class of models 𝒮

The models can be characterized in many different ways, by nonparametric representations such as impulse responses, transfer functions, covariance functions, spectral densities, Volterra series, and by parametric models such as state models

dx/dt = f(x, u, β)
y = g(x, u, β)   (11)

where x is the state vector, u the input vector, y the output vector and β a parameter (vector). It is known that the parametric models can give results with large errors if the order of the model does not agree with the order of the process. An illustration of this is given in an example of section 5. A more detailed discussion of parametric model structures is given in section 4. The nonparametric representations have the advantage that it is not necessary to specify the order of the process explicitly.
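The sensitivity to an incorrectly chosen model order can be illustrated with a small least squares experiment (our own construction; the second-order process, the random binary input and the record length are arbitrary illustrative choices, not from the paper): the correct structure reproduces the data exactly, while an under-parameterized first-order model leaves a large residual.

```python
import random

# Sketch: least squares fits of two model orders to noise-free data from a
# second-order process  y(k) = 1.5*y(k-1) - 0.7*y(k-2) + u(k-1).
# The correct structure reproduces the data; a first-order model does not.
# Process and numbers are illustrative choices, not taken from the paper.

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def fit_rms(y, regressors):
    """Least squares via the normal equations; returns the RMS residual."""
    n = len(regressors[0])
    A = [[sum(row[i] * row[j] for row in regressors) for j in range(n)]
         for i in range(n)]
    b = [sum(row[i] * yk for row, yk in zip(regressors, y)) for i in range(n)]
    theta = solve(A, b)
    res = [yk - sum(t * r for t, r in zip(theta, row))
           for yk, row in zip(y, regressors)]
    return (sum(e * e for e in res) / len(res)) ** 0.5

random.seed(2)
u = [random.choice((-1.0, 1.0)) for _ in range(400)]
y = [0.0, 0.0]
for k in range(2, len(u)):
    y.append(1.5 * y[k - 1] - 0.7 * y[k - 2] + u[k - 1])

ks = range(2, len(u))
rms2 = fit_rms([y[k] for k in ks], [[y[k-1], y[k-2], u[k-1]] for k in ks])
rms1 = fit_rms([y[k] for k in ks], [[y[k-1], u[k-1]] for k in ks])
# rms2 is numerically zero (correct order) while rms1 is large: an
# under-parameterized model cannot reproduce the process output.
```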
These representations are, however, intrinsically infinite dimensional, which means that it is frequently possible to obtain a model such that its output agrees exactly with the process output. A typical example is given below.

Example. Suppose that the class of models is taken as the class of linear time-invariant systems with a given transfer function. A reasonable estimate of the transfer function is then given by

H_1(s) = [∫_0^∞ y(t) e^{-st} dt] / [∫_0^∞ u(t) e^{-st} dt]

where u is the input to the process and y is the output. To "eliminate disturbances" we might instead first compute the input covariance function

R_u(τ) = (1/T) ∫_0^{T-τ} u(t) u(t+τ) dt

and the input-output covariance

R_uy(τ) = (1/T) ∫_0^{T-τ} u(t) y(t+τ) dt,   τ > 0
R_uy(τ) = (1/T) ∫_{-τ}^T u(t) y(t+τ) dt,   τ ≤ 0

and then estimate the transfer function by

H_2(s) = [∫_{-T}^T R_uy(τ) e^{-sτ} dτ] / [∫_{-T}^T R_u(τ) e^{-sτ} dτ].

It is easy to show that H_1 = H_2. The reason is simply that the chosen transfer function will make the model output exactly equal to the process output, at least if the process is initially at rest.

Interesting aspects of parametric versus nonparametric models are found in the literature on time series analysis. See for example MANN and WALD [36], WHITTLE [37], GRENANDER and ROSENBLATT [38], JENKINS and WATTS [39]. Needless to say the models must, of course, finally be judged with respect to the ultimate aim.

The class of input signals

It is well known that significant simplifications in the computations can be achieved by choosing input signals of a special type, e.g. impulse functions, step functions, "colored" or white noise, sinusoidal signals, pseudo-random binary noise (PRBS), etc. A bibliography on PRBS is given in NIKIFORUK and GUPTA [40]. See also GODFREY [41]. For the use of deterministic signals c.f. STROBEL [42], GITT [43], WILFERT [44], VAN DEN BOS [45], WELFONDER and HASENKOPF [46], CUMMING [47].

From the point of view of applications it seems highly desirable to use techniques which do not make strict limitations on the inputs. On the other hand, if the input signals can be chosen, how should this be done? It has been shown by ÅSTRÖM and BOHLIN [23], ÅSTRÖM [48], AOKI and STALEY [49] that the condition of persistent excitation (of order n), i.e. that the limits

ū = lim_{N→∞} (1/N) Σ_{k=1}^N u(k)

and

R_u(i) = lim_{N→∞} (1/N) Σ_{k=1}^N {u(k) - ū}{u(k+i) - ū}   (12)

exist and the matrix A_n defined by

A_n = {a_ij = R_u(i-j)},   i, j = 1, ..., n   (13)

is positive definite, is sufficient to get consistent estimates for least squares and maximum likelihood in the special case of white measurement errors. One might therefore perhaps dare to conjecture that a condition of this nature will be required in general.

Notice that if the "mean square Fourier transform" of the input, i.e.

f_u(ω) = lim_{N→∞} (1/√N) Σ_{k=1}^N [u(k) - ū] e^{iωk}   (14)

exists, then

R_u(τ) = ∫ |f_u(ω)|² e^{iωτ} dω   (15)

and the matrix A_n given by (13) is automatically nonnegative definite for arbitrary n. Moreover, if

f_u(ω) > 0   (16)

the matrix A_n is also positive definite for arbitrary n. The condition (16) is thus sufficient to guarantee that an input signal is persistently exciting of arbitrary order. Also notice that if the input signal is an ergodic stationary stochastic process, then its spectral density is given by

φ_u(ω) = |f_u(ω)|².   (17)

The condition (16) then implies that the spectral density of the input signal does not vanish for any ω, a condition which is well known for correlation analysis.

Apart from persistent excitation, many applications will require that the output is kept within specified limits during the experiment.
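For a finite record the persistent excitation conditions (12)-(13) can be checked approximately. In the sketch below (our own construction: sample averages stand in for the limits, and a Cholesky factorization stands in for the positive definiteness test) a random binary input passes the order-4 test while a constant input fails it:

```python
import random

# Finite-sample check of persistent excitation of order n, after (12)-(13):
# estimate the mean u_bar and the covariances R_u(i) from a record, form the
# Toeplitz matrix A_n = {R_u(i-j)}, and test positive definiteness via a
# Cholesky factorization, which succeeds iff the matrix is positive definite.

def covariances(u, n):
    N = len(u)
    ubar = sum(u) / N
    d = [uk - ubar for uk in u]
    return [sum(d[k] * d[k + i] for k in range(N - i)) / N for i in range(n)]

def is_persistently_exciting(u, n, tol=1e-8):
    r = covariances(u, n)
    A = [[r[abs(i - j)] for j in range(n)] for i in range(n)]
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:          # non-positive pivot: not pos. definite
                    return False
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True

random.seed(0)
prbs_like = [random.choice((-1.0, 1.0)) for _ in range(4000)]  # random binary
constant = [1.0] * 4000
# The random binary signal is persistently exciting of order 4, while the
# constant input is not: all its covariances R_u(i) vanish.
```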
The problem of designing input signals, energy and time constrained, which are optimal e.g. in the sense that they minimize the variances of the estimates, has been discussed by LEVADI [50], AOKI and STALEY [49], [198], and NAHI [213]. The same problem is also discussed in RAULT et al. [51]. It is closely related to the problem of optimal signal design in communication theory: see e.g. MIDDLETON [52].

Many identification procedures require that the input signal is independent of the disturbances acting on the process. If this condition is not fulfilled it might still be possible to identify the parameters. Each specific case must, however, be analyzed in detail. Typical cases where the input may depend on the disturbances are when normal operating records are used, and when the system is under closed loop control as in adaptive systems. The danger of identifying systems under closed loop control deserves to be emphasized. Consider the classical example of Fig. 1.

FIG. 1. Identification of a process in a closed loop.

An attempt to identify H_P from measurements of u and y will give

1/H_R

i.e. the inverse of the transfer function of the feedback. In industrial applications the feedback can enter in very subtle ways, e.g. through the action of an operator who makes occasional adjustments. FISHER [53] has shown the interesting result that the process may be identified if the feedback is made non-linear.

The criterion

It was mentioned in section 2 that the criterion is often a minimization of a scalar loss function. The criterion (18) can be interpreted as a least squares criterion for the error e. In the case

e = y - y_m = y - M(u)   (19)

where M(u) denotes the output of the model when the input is u, e is called the output error. It is the natural definition when the only disturbances are white noise errors in the measurement of the output. In the case

e = u - u_m = u - M^{-1}(y)   (20)

where u_m = M^{-1}(y) denotes the input of the model which produces the output y, e is called the input error. The notation M^{-1} implies the assumption that the model is invertible, roughly speaking that it is always possible to find a unique input which produces a given output. Rigorous definitions of the concept of invertibility are discussed by BROCKETT and MESAROVIC [54], SILVERMAN [55], SAIN and MASSEY [56]. From the point of view of estimation theory, the criterion (18) with the error defined as the input error (20) would be the natural criterion if the disturbances are white noise entering at the system input.

Since the output of a system depends not only on the input but also on the state, the error as defined by (19) and (20) will also depend on the initial state of the model M. This creates little difficulty in the case of finite dimensional state models since the initial state can always be included in the parameters. For nonparametric problems there might, however, be some difficulties. The problem has been discussed by SILVERMAN [57] in connection with correlation methods.

In a more general case the error can be defined as

e = M_2^{-1}(y) - M_1(u)   (21)

where M_2 represents an invertible model. This type of model and error (21) are referred to as generalized model and generalized error, EYKHOFF [113]. A special case of the generalized error is the "equation error" introduced by POTTS, ORNSTEIN
loss function is chosen a d h o c when the identifica- and CLVMER [58]. Figure 2 gives an interpretation
tion problem is formulated as an optimization of the different error concepts in terms of a block
problem and it is a consequence of other assump- diagram.
tions when the problem is formulated as an
Computational aspects
estimation problem.
Mostly the criterion is expressed as a functional All solutions to parametric identification prob-
of an error e.g. lems consist of finding the extremum of the loss
function V considered as a function of the para-
V(y, y,~)= e2(t)dt (18) meters ft. The minimization can be done in many
o different ways, e.g.
where y is the process output, y,, the model output, as a " o n e - s h o t " approach, i.e. solving the rela-
and e the error; y , Ym, and e are considered as tions that have to be satisfied for the extremum
functions defined on (0, 7"). Notice, that the of the function or functional V or:
FIG. 2. Some error concepts.

as an iterative approach, i.e. by some type of hillclimbing. In this case numerous techniques are available, e.g.

(a) cyclic adjustment of the parameters one-by-one, a.o. the Southwell relaxation method

(b) gradient method:
β(i+1) = β(i) − Γ V_β[β(i)],  Γ > 0*

(c) steepest descent method:
β(i+1) = β(i) − Γ(i) V_β[β(i)],  Γ(i) > 0
Γ(i) chosen such that V(β) is minimum in the direction of the gradient.

(d) Newton's method:
β(i+1) = β(i) − Γ(i) V_β[β(i)]
Γ(i) = [V_ββ(β(i))]⁻¹

(e) conjugate gradient method:
β(i+1) = β(i) − Γ(i) s(i)
s(i) = V_β(β(i)) + (‖V_β(β(i))‖² / ‖V_β(β(i−1))‖²) s(i−1)
Γ(i) > 0 chosen to minimize V(β(i) − Γ(i)s(i)).

This method, applied to a positive definite quadratic function of n variables, can reach the minimum in at most n steps.

In these methods it has not been taken into account that in the practice of estimation the determination of the gradient is degraded through the stochastic aspects of the problem. A method which considers this uncertainty in the gradient determination is the:

(f) stochastic approximation method:
β(i+1) = β(i) − Γ(i) V_β(β(i))
where Γ(i) has to fulfil the conditions:
Γ(i) ≥ 0
Σ_{i=1}^{n} Γ²(i) < ∞
and
Σ_{i=1}^{n} Γ(i) → ∞
as n → ∞. In the scalar case Γ(i) = 1/i satisfies the above conditions.

A good survey of optimization techniques is found in the book by WILDE [59]. See also BEKEY and MCGEE [60].

* V_β is used as a shorthand notation for the gradient of V with respect to β, i.e. the vector with components ∂V/∂βᵢ.

4. CHOICE OF MODEL STRUCTURE

The choice of model structure is one of the basic ingredients in the formulation of the identification problem. The choice will greatly influence the character of the identification problem, such as: the way in which the results of the identification can be used in subsequent operations, the computational effort, the possibility to get unique solutions, etc. There are very few general results available with regard to the choice of structures.

In this section we will first discuss the concept of linearity in the parameters and we will then discuss the structure of linear systems.

The concept of linearity in the parameters

In control theory the distinction between linear and non-linear is usually based on the dynamic behaviour, i.e. the relation between the dependent and the independent time variables. For parameter estimation another distinction between linearity and nonlinearity is of as much importance, viz. with respect to the relation between the dependent variables and the parameters. To be specific, a system is said to be linear in the parameters if the generalized error is linear in the parameters. Apparently, these two notions of linearity have no immediate relation, as can be seen from the following examples.
We assume a process with input signal u and output signal y. Then the "model" may be chosen to form an "error" e between process and model output in the following way.

Process: ẏ + ay = u
    Model linear in the parameters:  e = ẏ + αy − u
    Model non-linear in the parameters:  e = y − w, where w = [1/(p + α)]u

Process: ẏ + ay³ = u
    Model linear in the parameters:  e = ẏ + αy³ − u
    Model non-linear in the parameters:  e = y − w = y − W[u, α]

Henceforth we will use the term "linear" for the dynamic behaviour and use "linear in the parameters" for the other type.*

* Note that also the term "order" may cause confusion. In regression analysis this term refers to the highest degree of the independent variable:
y = β₀ + β₁u₁ + n  and  y = β₀ + β₁u₁ + β₂u₂ + … + β_m u_m + n  are models of the first order;
y = β₀ + β₁u₁ + β₂u₁²  is a model of the second order.

In connection with estimation schemes the great importance of linearity in the parameters will become clear. Therefore it pays to try to find transformations of the variables to obtain such a linearity if possible. Some simple examples may illustrate this.

Example 1. The transformation

1/x₁ → u₁,  1/x₂ → u₂,  α₂/α₁ → β₁,  1/α₁ → β₂

brings the non-linear relation

z = (α₂x₂ + x₁)/(α₁x₁x₂)

to the linear form

z = β₁u₁ + β₂u₂.

Example 2. The transformation

log z → y,  log x₁ → u₁,  log x₂ → u₂,  log c → β₀,  α₁ → β₁,  α₂ → β₂

brings

z = c x₁^α₁ x₂^α₂

to the linear form

y = β₀ + β₁u₁ + β₂u₂.

Such non-linear expressions, that can be made linear in the parameters through transformation, are called intrinsically linear. If such a linearization is not possible, the term intrinsically non-linear is used.

It may pay to make transformations even if the system is intrinsically non-linear; see e.g. DISKIND [61]. A typical example is the identification of a discrete-time linear system when the output is measured with white measurement noise. The representation of the system by the coefficients of the pulse transfer function leads to a non-linear regression problem, while the representation of the model by coefficients of a generalized model or by the ordinates of the weighting function leads to an estimation problem which is linear in the parameters.

Representation of linear systems

Linear time-invariant systems can be represented in many different ways by input-output descriptions, such as the impulse response or the transfer function H, or by the state model S(A, B, C, D) defined by

dx/dt = Ax + Bu
y = Cx + Du    (22)

where x is an n-vector, the input u is a p-vector, and the output y is an r-vector. It is well-known that the systems S(A, B, C, D) and S(TAT⁻¹, TB, CT⁻¹, D), where T is a nonsingular matrix, are equivalent in the sense that they have the same input-output relation. It is also easy to verify that the systems S(A, B, C, D) and S(Ā, B̄, C̄, D̄) are equivalent in the sense that they have the same input-output relation if

D = D̄
CAᵏB = C̄ĀᵏB̄,  k = 0, 1, …, n.    (23)

The relations between the different representations were clarified by Kalman's work; see e.g. KALMAN
[62]. The impulse response and the transfer function only represent the part of the system S which is completely controllable and completely observable. It is thus clear that only the completely controllable and completely observable part of a state model S(A, B, C, D) can be determined from input-output measurements. The impulse response and the transfer function are easily obtained from the state description. The problem of determining a state model from the impulse response is more subtle, even if we disregard the fact that only the controllable and observable subsystem can be determined from the impulse response. The problem of assigning a state model of the lowest possible order which has a given impulse response has been solved by HO and KALMAN [63]. See also KALMAN, FALB and ARBIB [64], and BRUNI, ISIDORI and RUBERTI [201]. Again the solution is not unique. The model S(A, B, C, D) contains

N₁ = n² + np + nr + pr    (24)

parameters. The fact that the input-output relation is invariant under a linear transformation of the state variables implies that all N₁ parameters cannot be determined from input-output measurements. To obtain unique solutions, as well as to be able to construct efficient algorithms, it is therefore of great interest to find representations of the system which contain the smallest number of parameters, i.e. canonical representations.

Canonical forms for linear deterministic systems

Canonical forms for linear systems are discussed e.g. by KALMAN [62]. When the matrix A has distinct eigenvalues canonical forms can be obtained as follows. By a suitable choice of coordinates the matrix A can be brought to diagonal form:

        | λ₁  0  …  0 |     | β₁₁ β₁₂ … β₁p |
dx/dt = | 0  λ₂  …  0 | x + | β₂₁ β₂₂ … β₂p | u
        | …           |     | …             |
        | 0   0  … λₙ |     | βₙ₁ βₙ₂ … βₙp |

    | γ₁₁ γ₁₂ … γ₁ₙ |     | d₁₁ d₁₂ … d₁p |
y = | γ₂₁ γ₂₂ … γ₂ₙ | x + | d₂₁ d₂₂ … d₂p | u.    (25)
    | …             |     | …             |
    | γᵣ₁ γᵣ₂ … γᵣₙ |     | dᵣ₁ dᵣ₂ … dᵣp |

This representation contains n + np + nr + pr parameters; n of these are redundant since all state variables can be scaled without affecting the input-output relations. The input-output relation can thus be characterized by

N₂ = n(p + r) + pr    (26)

parameters. Since the system is completely controllable and observable there is at least one non-zero element in each row of the B matrix and in each column of the C matrix. The redundancy in (25) can thus be reduced by imposing conditions like

max_j |βᵢⱼ| = 1,  i = 1, 2, …, n    (27)

Σ_j |βᵢⱼ| = 1,  i = 1, 2, …, n    (28)

or similar conditions on the C matrix.

When the matrix A has multiple eigenvalues the problem of finding a minimal parameter representation is much more complex. If A is cyclic, i.e. there exists a vector b such that the vectors b, Ab, A²b, …, Aⁿ⁻¹b span the n-dimensional space, the matrix can be transformed to companion form and a minimal parameter representation is then given by

        | −a₁    1  0  …  0 |
        | −a₂    0  1  …  0 |
dx/dt = | …                 | x
        | −aₙ₋₁  0  0  …  1 |
        | −aₙ    0  0  …  0 |
      | b₁₁     b₁₂     …  b₁p     |
      | b₂₁     b₂₂     …  b₂p     |
    + | …                          | u
      | bₙ₋₁,₁  bₙ₋₁,₂  …  bₙ₋₁,p  |
      | bₙ₁     bₙ₂     …  bₙp     |

    | c₁₁ c₁₂ … c₁ₙ |     | d₁₁ d₁₂ … d₁p |
y = | c₂₁ c₂₂ … c₂ₙ | x + | d₂₁ d₂₂ … d₂p | u    (29)
    | …             |     | …             |
    | cᵣ₁ cᵣ₂ … cᵣₙ |     | dᵣ₁ dᵣ₂ … dᵣp |

where n additional conditions, e.g. of the form (27) or (28), are imposed on the elements of the matrices B and C.

In the case of processes with one output the additional conditions are conveniently introduced by specifying all elements of the vector C, e.g. C′ = [1 0 … 0]. The canonical form (29) then becomes

Y(s) = [d₁₁ + (b₁₁sⁿ⁻¹ + b₂₁sⁿ⁻² + … + bₙ₁)/(sⁿ + a₁sⁿ⁻¹ + … + aₙ)]U₁(s) + …
     + [d₁p + (b₁psⁿ⁻¹ + b₂psⁿ⁻² + … + bₙp)/(sⁿ + a₁sⁿ⁻¹ + … + aₙ)]Uₚ(s)    (30)

where Y and Uᵢ denote the Laplace transforms of y and uᵢ. A canonical representation of a process of the nth order with p inputs and one output can thus be written as

dⁿy/dtⁿ + a₁dⁿ⁻¹y/dtⁿ⁻¹ + … + aₙy = [b′₀₁dⁿu₁/dtⁿ + … + b′ₙ₁u₁] + … + [b′₀ₚdⁿuₚ/dtⁿ + … + b′ₙₚuₚ].    (31)

An analogous form for systems with several outputs is

dⁿy/dtⁿ + A₁dⁿ⁻¹y/dtⁿ⁻¹ + … + Aₙy = [B₀₁dⁿu₁/dtⁿ + … + Bₙ₁u₁] + … + [B₀ₚdⁿuₚ/dtⁿ + … + Bₙₚuₚ]    (32)

where y is an r-vector and the Aᵢ are r × r matrices. This form was introduced by KOEPCKE [65]. It has been used among others by WONG et al. [66] and ROWE [67]. The determination of the order of the process (32), which in general is different from n, as well as the reduction of (32) to state form, has been done by TUEL [68]. Canonical forms for linear multivariable systems have also been studied by LUENBERGER [69] and BUCY [205]. Analogous results hold for discrete time systems.

When the matrix A has multiple eigenvalues and is not cyclic it is not clear what a "minimal parameter representation" means. The matrix A can, of course, always be transformed to Jordan canonical form. Since the eigenvalues of A are not distinct, the matrix A can, strictly speaking, be characterized by fewer than n parameters. The ones in the superdiagonal of the Jordan form can, however, be arranged in many different ways depending on the internal couplings. This leads to many different structures.

Canonical forms for linear stochastic systems

We will now discuss canonical forms for stochastic systems. To avoid the technical difficulties associated with continuous-time white noise we will present the results for discrete-time systems. The analogous results are, however, true also for continuous-time systems. Consider the system

x(k+1) = Φx(k) + Γu(k) + v(k)
y(k) = θx(k) + Du(k) + e(k)    (33)

where k takes integer values. The state vector x, the input u, and the output y have dimensions n, p, and r; {v(k)} and {e(k)} are sequences of independent equally-distributed random vectors with zero mean values and covariances R₁ and R₂. Since the covariance matrices are symmetric the model (33) contains

N₃ = n² + np + nr + pr + ½n(n+1) + ½r(r+1) = n(3n/2 + ½ + p + r) + r(p + r/2 + ½)    (34)

parameters. Two models of the type (33) are said to be equivalent if: (i) their input-output relations are the same when e = 0 and v = 0, and (ii) the stochastic properties of the outputs are the same when
u = 0. The parameters of Φ, Γ, and θ can be reduced by the techniques applied previously to deterministic systems.

It still remains to reduce the parameters representing the disturbances. This is accomplished e.g. by the Kalman filtering theorem. It follows from this that the output can be represented as

x̂(k+1) = Φx̂(k) + Γu(k) + Ke(k)
y(k) = θx̂(k) + Du(k) + e(k)    (35)

where x̂(k) denotes the conditional mean of x(k) given y(k−1), y(k−2), …, and {e(k)} is a sequence of independent equally distributed random variables with zero mean values and covariance R.

The single output version of the model (35) was used in ÅSTRÖM [199]. KAILATH [71] calls (35) an innovations representation of the process. A detailed discussion is given in ÅSTRÖM [72]. The model (35) is also used by MEHRA [73].

Notice, that if the model (35) is known, the steady state filtering and estimation problems are very easy to solve. Since K is the filter gain it is not necessary to solve any Riccati equation. Also notice that the state of the model (35) has a physical interpretation as the conditional mean of the state of (33). If Φ is chosen to be in diagonal form and if conditions such as (27) are introduced on Γ and θ, the model (35) is a canonical representation which contains

N₄ = n(p + 2r) + r(p + r/2 + ½)    (36)

parameters.

For systems with one output, where the additional conditions are chosen as θ′ = [1 0 … 0], the equation (35) reduces to

y(k) + a₁y(k−1) + … + aₙy(k−n) = [b′₀₁u₁(k) + … + b′ₙ₁u₁(k−n)] + … + [b′₀ₚuₚ(k) + … + b′ₙₚuₚ(k−n)] + e(k) + c₁e(k−1) + … + cₙe(k−n).    (37)

By introducing the shift operator q defined by

qy(k) = y(k+1)    (38)

the polynomials

A(q) = qⁿ + a₁qⁿ⁻¹ + … + aₙ
Bᵢ(q) = b′₀ᵢqⁿ + b′₁ᵢqⁿ⁻¹ + … + b′ₙᵢ,  i = 1, 2, …, p
C(q) = qⁿ + c₁qⁿ⁻¹ + … + cₙ    (39)

and the corresponding reciprocal polynomials

A*(q) = qⁿA(q⁻¹)
Bᵢ*(q) = qⁿBᵢ(q⁻¹)
C*(q) = qⁿC(q⁻¹)    (40)

the equation (37) can be written as

A*(q⁻¹)y(k) = Σ_{i=1}^{p} Bᵢ*(q⁻¹)uᵢ(k) + C*(q⁻¹)e(k)    (37′)

or

A(q)y(k) = Σ_{i=1}^{p} Bᵢ(q)uᵢ(k) + C(q)e(k).    (37″)

This canonical form of an nth order system was introduced by ÅSTRÖM, BOHLIN and WENSMARK [74] and has since then been used extensively. The corresponding form for multivariable systems is obtained by interpreting y and uᵢ as vectors and A, Bᵢ and C as polynomials whose coefficients are matrices. Such models have been discussed by EATON [75], KASHYAP [76], ROWE [67] and VALIŠ [219].

The following canonical form

y(k) = [B₁(q)/A₁(q)]u₁(k) + [B₂(q)/A₂(q)]u₂(k) + … + [Bₚ(q)/Aₚ(q)]uₚ(k) + [C(q)/A(q)]e(k)    (41)

has been used by BOHLIN [77], and STEIGLITZ and MCBRIDE [94] as an alternative to (37).

The choice of model structure can greatly influence the amount of work required to solve a particular problem. We illustrate this by an example.

A filtering example. Assume, that the final goal of the identification is to design a predictor using Kalman filtering. If the process is modeled by

x(k+1) = Φx(k) + v(k)
y(k) = θx(k) + e(k)    (42)

where {e(k)} and {v(k)} are discrete-time white noise with covariances R₁ and R₂, the likelihood function for the estimation problem can be written as

−log L = ½ Σ_{k=1}^{N} [v′(k)R₁⁻¹v(k) + e′(k)R₂⁻¹e(k)] + (N/2) log(det R₁ · det R₂) + const.    (43)
where the system equations are considered as constraints. The evaluation of gradients of the loss function leads to a two point boundary value problem. Also, when the identification is done, the solution of the Kalman filtering problem requires the solution of a Riccati equation.

Assume instead that the process is identified using the structure

z(k+1) = Φz(k) + Ke(k)
y(k) = θz(k) + e(k)    (44)

the likelihood function then becomes

−log L = ½ Σ_{k=1}^{N} e′(k)R⁻¹e(k) + (N/2) log det R.    (45)

The evaluation of gradients of the loss function in this case is done simply as an initial value problem. When the identification is done the steady state Kalman filter is simply given by

x̂(k+1) = Φx̂(k) + K[y(k) − θx̂(k)].    (46)

Hence if the model with the structure (44) is known there is no need to solve a Riccati equation in order to obtain the steady state Kalman filter.

5. IDENTIFICATION OF LINEAR SYSTEMS

Linear systems naturally represent the most extensively developed area in the field of systems identification. In this section we will consider linear systems as well as "linear environments", i.e. environments that can be characterized by linear stochastic models. In most control problems the properties of the environment will be just as important as the system dynamics, because it is the presence of disturbances that creates a control problem in the first place.

To formulate the identification problem using the framework of section 2 the class of models 𝒮, the inputs 𝒰, and the criterion must be defined. These problems were discussed in sections 3 and 4. If classical design techniques are to be used, the model can be characterized by a transfer function or by an impulse response. Many recently developed design methods will, however, require a state model, i.e. a parametric model.

Several problems naturally arise:

Suppose the impulse response is desired. Should this be identified directly, or is it "better" to identify a parametric model and then compute the impulse response?

Assume, that a parametric model is desired. Should this be fitted directly, or is it "better" to first determine the impulse response and then fit a parametric model to that?

Since a parametric model contains the order of the system explicitly, what happens if the wrong order is assumed in the problem formulation?

A linear stochastic system can be decomposed into a deterministic dynamic system with additive noise corrupting the ideal output. Is it "better" to determine the deterministic model separately and then (if required) the statistical characteristics of the additive noise, or should the overall model be determined first and then decomposed, if necessary?

There are not yet any general answers to these problems. Special cases have been investigated by GUSTAVSSON [78] in connection with identification of nuclear reactor and distillation tower dynamics as well as on simulated data. Since correlation techniques, their properties and applications by now are very well known we will not discuss these here. Let it suffice to mention the recent papers by RAKE [79], WELFONDER [80], BUCHTA [81], HAYASHI [82], REID [83], [84], STASSEN [85], GERDIN [86], BROWN [202], KOSZELNIK et al. [203], and RAHMAN et al. [204]. Instead we will concentrate on the more recent results on the identification of parametric models.

Least squares identification of a parametric model

Consider a linear, time invariant, discrete-time model with one input and one output. A canonical form for the model is

ym(k) + a₁ym(k−1) + … + aₙym(k−n) = b₁u(k−1) + … + bₙu(k−n)    (47)

where u is the input and ym the output of the model. Using the notation introduced in section 4 the model (47) can be written as

A(q)ym(k) = B(q)u(k)    (47′)

or

A*(q⁻¹)ym(k) = B*(q⁻¹)u(k).    (47″)

Let the criterion be chosen as to minimize the loss function (1), i.e.

V = V(y, ym) = Σ_{k=n}^{N+n} e²(k)    (48)

where e is the generalized error defined by

e(k) = A*(q⁻¹)[y(k) − ym(k)]

or

e(k) = A*(q⁻¹)y(k) − B*(q⁻¹)u(k)    (49)

where the last equality follows from (47″). The main reason for choosing this particular criterion is that
the error e is linear in the parameters aᵢ and bᵢ. The function V is consequently quadratic and it is easy to find its minimum analytically. Notice that (49) implies

y(k) + a₁y(k−1) + … + aₙy(k−n) = b₁u(k−1) + … + bₙu(k−n) + e(k).    (50)

The quantities e(k) are also called residuals or equation errors. The criterion (48) is called minimization of the "equation error". In Fig. 3 we give a block diagram which illustrates how the generalized error can be obtained from the process inputs and outputs and the model parameters aᵢ and bᵢ in the least squares method.

FIG. 3. Definition of the "equation error".

To find the minimum of the loss function V we introduce

y = [y(n+1), y(n+2), …, y(n+N)]′

    | −y(n)      −y(n−1)    …  −y(1) | u(n)      u(n−1)   …  u(1) |
Φ = | −y(n+1)    −y(n)      …  −y(2) | u(n+1)    u(n)     …  u(2) |
    | …                              | …                          |
    | −y(N+n−1)  −y(N+n−2)  …  −y(N) | u(N+n−1)  u(N+n−2) …  u(N) |

β′ = [a₁ a₂ … aₙ b₁ b₂ … bₙ].    (51)

The equation defining the error (49) then becomes

e = y − Φβ.    (52)

The minimum of the loss function is found from V_β = 0. If Φ′Φ is not singular this minimum is obtained for β = β̂:

β̂ = [Φ′Φ]⁻¹Φ′y.    (53)

It is thus a simple matter to determine the least squares estimate. The matrices Φ′y and Φ′Φ are given in (54) and (55). Notice that Φ′Φ is symmetric.

Φ′y = [ −Σ_{k=n+1}^{N+n} y(k)y(k−1), −Σ_{k=n+1}^{N+n} y(k)y(k−2), …, −Σ_{k=n+1}^{N+n} y(k)y(k−n),
        Σ_{k=n+1}^{N+n} y(k)u(k−1), Σ_{k=n+1}^{N+n} y(k)u(k−2), …, Σ_{k=n+1}^{N+n} y(k)u(k−n) ]′    (54)

¢-0
G~

-N+n- 1 N+n- 1 N+n-1 N+n--I N+n- I ~,+n- 1

~, y~(k) ~ y(k)y(k-l) . . . y, y(k)y(k-n+l)- y, y(k)u(k) - ~ y(k)u(k-I) . . , - ~ y(k)u(k-n+i)


k=n k=n k=n k=n k=n k=n

N+n-2 N+n--2 N+n- 2 N+n-2 N +n- 2


y2(k) . . . ~ y(k)y(k-n+2):- ~_, y(k)u(k+l) - ~. y(k)u(k) . . . - ~ y(k)u(k-n+2)
k=n-- I k=n- 1 k=n- 1 k=n- 1 I~ - n - I
7Z

N N N N ,-q

Z Y2(k) - ~ y(k)u(k+n-1) - ~_, y(k)u(k+,,-2) . . . - ~ y(k)u(k) ©:


k=l k=l k=l k=l
t55)
N+n-I N+n--1 N+n-I
y, uZ(k) ~ u(k)u(k-1) . . . 2 u(k)u(k-n+l)
k=n k=n k=n
,.<
N+n--2 N+n-2
©
Z u~(k) ... ~ .(k)u(k-n+2)
k=n--1 k=n--1

k=l
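The least squares machinery (51)–(53) is easily exercised numerically. The sketch below simulates a first order process of the form (50) with hypothetical parameter values (a = −0.5, b = 1.0) and small uncorrelated residuals, forms Φ and y, and solves the normal equations; the data and values are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a first order process y(k) + a*y(k-1) = b*u(k-1) + e(k),
# i.e. equation (50) with n = 1; a and b are hypothetical values.
a_true, b_true = -0.5, 1.0
N = 2000
u = rng.standard_normal(N)
e = 0.01 * rng.standard_normal(N)   # small uncorrelated residuals
y = np.zeros(N)
for k in range(1, N):
    y[k] = -a_true * y[k - 1] + b_true * u[k - 1] + e[k]

# Build Phi and the output vector as in (51): each row is
# [-y(k-1), u(k-1)], stacked for k = 1, ..., N-1.
Phi = np.column_stack([-y[:-1], u[:-1]])
Y = y[1:]

# Least squares estimate (53): beta = (Phi'Phi)^{-1} Phi'y,
# computed by solving the normal equations.
beta = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)
a_hat, b_hat = beta
print(a_hat, b_hat)   # close to -0.5 and 1.0
```

Forming Φ′Φ and Φ′y explicitly mirrors (54) and (55); in practice an orthogonalization method may be preferred when Φ′Φ is ill-conditioned.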
For literature on matrix inversion the reader is referred to WESTLAKE [87].

Notice, that the technique of this section can immediately be applied to the identification of non-linear processes which are linear in the parameters, e.g.

y(k) + ay(k−1) = b₁u(k−1) + b₂u²(k−1).    (56)

A probabilistic interpretation

Consider the least squares identification problem which has just been discussed. Assume, that it is of interest to assign accuracies to the parameter estimates as well as to find methods to determine the order of the system if it is not known. Such questions can be answered by imbedding the problem in a probabilistic framework by making suitable assumptions on the residuals. For example, if it is assumed that

the input-output data is generated by the model (50), where the residuals {e(k)} are independent, equally distributed, have zero mean values and finite fourth order moments;

the input u is independent of e and persistently exciting of order n;

all the roots of the equation

zⁿ + a₁zⁿ⁻¹ + … + aₙ = 0    (57)

have magnitudes less than one;

then it is possible to show that the least squares estimate β̂ given by (53) converges to the true parameter value as N → ∞.

The special case of this theorem when bᵢ = 0 for all i, which corresponds to the identification of the parameters in an autoregression, was proven by MANN and WALD [36]. The extension to the case with bᵢ ≠ 0 is given in ÅSTRÖM [48].

For large N the covariance of β̂ is given by

cov[β̂] = σ²[Φ′Φ]⁻¹

where σ² is the variance of e(t). Estimates of the variances of the parameter estimates are obtained from the diagonal elements of this matrix.

If it is also assumed that the residuals are gaussian we find that the least squares estimate can be interpreted as the maximum likelihood estimate, i.e. we obtain the loss function (48) in a natural way. It has been shown that the estimate β̂ is asymptotically normal with mean β and covariance σ²[Φ′Φ]⁻¹. Notice, that this does not follow from the general properties of the maximum likelihood estimate since they are derived under the assumption of independent experiments.

In practice the order of the system is seldom known. It can also be demonstrated that serious errors can be obtained if a model of the wrong order is used. It is therefore important to have some methods available to determine the order of the model, i.e. we consider 𝒮 as the class of linear models with arbitrary order. To determine the order of the system we can fit least squares models of different orders and analyse the reduction of the loss function.

The loss function will of course always decrease when the number of model parameters is increased. To test if the reduction of the loss function is significant when the number of parameters is increased from n₁ to n₂ we can use the following test quantity

t = [(V₁ − V₂)/V₂] · [(N − n₂)/(n₂ − n₁)]    (58)

where Vᵢ is the minimum value of the loss function for a model with nᵢ parameters (i = 1, 2) and N is the number of input-output pairs. It can be shown that the random variable t for large N is asymptotically F(n₂ − n₁, N − n₂) distributed. It follows from the properties of the F-distribution that (n₂ − n₁)t is also asymptotically χ² distributed.

When applying the least squares method to a single-input single-output system and the order of the model is increased from n to n + 1, the number of parameters is increased by two. We have

F(2, 100) = 3.09 ⟹ P{t > 3.09} = 0.05
F(2, ∞) = 3.00 ⟹ P{t > 3.00} = 0.05.

Hence at a risk level of 5 per cent and N > 100 the quantity t should be at least 3 for the corresponding reduction in loss function to be significant.

The idea to view the test of order as a decision problem has been discussed by ANDERSON [89]. It is also a standard tool in regression analysis. Other techniques for testing order are discussed in WOODSIDE [88].

Notice, that the least squares method also includes parametric time series analysis in the sense of fitting an autoregression. This has been discussed by WOLD [90] and WHITTLE [37]. Recent applications to EEG analysis have been given by GERSCH [91]. Using the probabilistic framework we can also give another interpretation of the least squares method in terms of the general definition of an identification problem given in section 3.
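The order test based on the quantity (58) can be sketched as follows; the simulated second order system and the noise level are hypothetical choices for illustration, and the loss function is the residual sum of squares (48).

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data from a second order system of the form (50);
# the parameter values and noise level are hypothetical.
N = 500
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for k in range(2, N):
    y[k] = (1.0 * y[k - 1] - 0.5 * y[k - 2]
            + 1.0 * u[k - 1] + 0.5 * u[k - 2] + e[k])

def loss(order):
    """Least squares fit of the given order; returns the minimal
    value of the loss function (48), the residual sum of squares."""
    rows = [np.concatenate([-y[k - order:k][::-1], u[k - order:k][::-1]])
            for k in range(order, N)]
    Phi, Y = np.array(rows), y[order:]
    beta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return float(np.sum((Y - Phi @ beta) ** 2))

def t_quantity(n1, n2):
    """Test quantity (58); n1 and n2 are numbers of parameters,
    i.e. twice the model order for this single-input model."""
    V1, V2 = loss(n1 // 2), loss(n2 // 2)
    return (V1 - V2) / V2 * (N - n2) / (n2 - n1)

t12 = t_quantity(2, 4)   # order 1 vs order 2
t23 = t_quantity(4, 6)   # order 2 vs order 3
print(t12, t23)
```

For data from a second order system, the step from order 1 to order 2 gives a value of t far above the 5 per cent level of about 3, while the step from order 2 to order 3 typically does not.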
First observe that in the generalized error defined by (49) another ym can be used:

e(k) = A*(q⁻¹)y(k) − B*(q⁻¹)u(k) = y(k) − ym(k).    (59)

Consequently:

ym(k) = ŷ(k|k−1) = [1 − A*(q⁻¹)]y(k) + B*(q⁻¹)u(k)
      = −a₁y(k−1) − … − aₙy(k−n) + b₁u(k−1) + … + bₙu(k−n).    (60)

Notice, that ym(k) = ŷ(k|k−1) has a physical interpretation as the best linear mean squares predictor of y(k) based on y(k−1), y(k−2), … for the system (50). The generalized error (49) can thus be interpreted as the difference between the actual output at time k and its prediction using the model (60).

The least squares procedure can thus be interpreted as the problem of finding the parameters of the (prediction) model (60) in such a way that the criterion

V(y, ym) = Σ_{k=1}^{N} [y(k) − ym(k)]²    (61)

is as small as possible. Compare with the block diagram of Fig. 4. This interpretation is useful, because it can be extended to much more general cases. The interpretation can also be used in situations where there are no inputs, e.g. in time series analysis.

FIG. 4. Interpretation as predictor model.

Comparison with correlation methods

Now we will compare the least squares method with the correlation technique for determining the impulse response. When determining process dynamics for a single-input single-output system using correlation methods the following quantities are computed:

Ru(i) = [1/(N−i)] Σ_{k=1}^{N−i} u(k)u(k+i)
Ry(i) = [1/(N−i)] Σ_{k=1}^{N−i} y(k)y(k+i)
Ryu(i) = [1/(N−i)] Σ_{k=1}^{N−i} y(k)u(k+i)    (62)
Ruy(i) = [1/(N−i)] Σ_{k=1}^{N−i} u(k)y(k+i)

Comparing with the least squares identification of process dynamics we find that the elements of the matrices Φ′Φ and Φ′y of the least squares procedure, given by (54) and (55), are essentially correlations or cross-correlations. Neglecting terms in the beginning and end of the series we find

        | Ry(0)  Ry(1)  …  Ry(n−1) | −Ryu(0)    −Ruy(1)    …  −Ruy(n−1) |
        |        Ry(0)  …  Ry(n−2) | −Ryu(1)    −Ryu(0)    …  −Ruy(n−2) |
        |               …          |  …                                 |
Φ′Φ = N |                  Ry(0)   | −Ryu(n−1)  −Ryu(n−2)  …  −Ryu(0)   |    (63)
        |                          |  Ru(0)     Ru(1)      …  Ru(n−1)   |
        |                          |            Ru(0)      …  Ru(n−2)   |
        |                          |                       …  Ru(0)     |

(only the elements on and above the diagonal are shown; the matrix is symmetric)

Φ′y = N [ −Ry(1), −Ry(2), …, −Ry(n), Ruy(1), Ruy(2), …, Ruy(n) ]′.    (64)
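The correspondence between the sample covariances (62) and the least squares equations, cf. (63) and (64), can be checked numerically. A sketch for a first order model (n = 1) follows; the simulated system and its parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated input-output records; parameter values are hypothetical.
N = 5000
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for k in range(1, N):
    y[k] = 0.5 * y[k - 1] + 1.0 * u[k - 1] + e[k]

def R(a, b, i):
    """Sample covariance function as in (62)."""
    return np.sum(a[: N - i] * b[i:]) / (N - i)

# Least squares estimate for a first order model, formed directly
# from the correlations as in (63) and (64):
#   Phi'Phi ~ N [[Ry(0), -Ryu(0)], [-Ryu(0), Ru(0)]]
#   Phi'y   ~ N [-Ry(1), Ruy(1)]
PhiTPhi = np.array([[R(y, y, 0), -R(y, u, 0)],
                    [-R(y, u, 0), R(u, u, 0)]])
PhiTy = np.array([-R(y, y, 1), R(u, y, 1)])
a_hat, b_hat = np.linalg.solve(PhiTPhi, PhiTy)
print(a_hat, b_hat)   # close to the true values -0.5 and 1.0
```

Apart from end effects the factor N in (63) and (64) cancels when the equations are solved, so the sample covariances can be used directly.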
Hence if a correlation analysis is performed, it is a simple matter to calculate the least squares estimate by forming the matrices Φ'Φ and Φ'y from the values of the sample covariance functions and solving the least squares equation. Since the order of the system is seldom known a priori it is often convenient to compute the least squares estimate recursively using the test of order we have described previously.

Correlated residuals
Many of the nice properties of the least squares method depend critically upon the assumption that the residuals {e(k)} are uncorrelated. It is easy to find real-life examples where this assumption does not hold.

Example. Consider a noise-free first order system

    x(k+1) + ax(k) = bu(k).

Assume that x is observed with independent measurement errors (additive noise), i.e.

    y(k) = x(k) + n(k)

then

    y(k+1) + ay(k) = bu(k) + n(k+1) + an(k).

We thus get a system similar to (50) but with correlated residuals.

When the residuals are correlated the least squares estimate will be biased. Asymptotically the bias is given by

    E(β̂ − b) = [E(Φ'Φ)]⁻¹ E(Φ'e)    (65)

where β̂ is the estimate and b is the true value of the parameter. The reason for this bias can be indicated as follows; c.f. Fig. 5.

FIG. 5. Parameter estimation using a generalized model.

The estimate results from a minimization of the loss function

    V = Σ_{k=1}^{N} e²(k).

Necessary conditions are:

    ∂V/∂β = Σ_{k=1}^{N} e(k)u(k−1) = 0    (66)

    ∂V/∂α = Σ_{k=1}^{N} e(k)y(k−1) = 0.    (67)

In (67) the additive noise is present both in e and y; this leads to a term

    Σ_{k=1}^{N} n²(k−1)

which causes a bias.

To see how this works out consider an example.

Example. Assume that the process is actually described by the model

    y(k+1) − 0.5y(k) = 1.0u(k) + n(k+1) + 0.1n(k)

where {n(k)} is a sequence of independent normal (0, 1) random variables, but that the system is identified using the least squares method under the assumption that the residuals are uncorrelated. Below we give a typical result obtained from 500 pairs of inputs and outputs:

    Process parameters        Estimates
    a = −0.5                  â = −0.643 ± 0.029
    b =  1.0                  b̂ =  1.018 ± 0.062

We are apparently in a very bad situation; not only is the estimate â wrong but we have also a great deal of confidence in the wrong result. Note that the true value −0.500 deviates from the estimate by about 5σ.

The correlation of the residuals can thus easily lead to wrong conclusions. Several techniques have been suggested to deal with correlated residuals, viz:

(a) repeated least squares
(b) generalized least squares
(c) the maximum likelihood method
(d) instrumental variables
(e) Levin's method
(f) tally principle.

(a) Repeated least squares. Suppose that we did not know the order of the system discussed in the previous example. It would then be natural to continue the least squares procedure and to test the order of the system. The results for the particular example are shown in Table 1. Assuming a risk level of 5 per cent and using the test quantity of (58) we find from Table 1 that the increase of the model order from 4 to 5 gives a significant reduction in the loss function (t = 3.51) but that the reduction in the loss function when the order is increased from 5 to 6 is not significant (t = 0.99). The order test will thus indicate that a fifth order model is appropriate.

TABLE 1

    n    âi ± σ(âi)        b̂i ± σ(b̂i)       V         t
    1    −0.643 ± 0.029    1.018 ± 0.062    592.65
    2    −1.015 ± 0.045    1.086 ± 0.056    469.64    50.94
          0.377 ± 0.039   −0.520 ± 0.072
    3    −1.118 ± 0.050    1.115 ± 0.055    447.25     9.67
          0.624 ± 0.068   −0.660 ± 0.078
         −0.178 ± 0.043    0.263 ± 0.076
    4    −1.157 ± 0.050    1.085 ± 0.055    426.40     9.43
          0.756 ± 0.074   −0.733 ± 0.078
         −0.412 ± 0.074    0.409 ± 0.083
          0.187 ± 0.044   −0.146 ± 0.076
    5    −1.185 ± 0.051    1.080 ± 0.054    418.72     3.51
          0.814 ± 0.077   −0.745 ± 0.078
         −0.518 ± 0.083    0.475 ± 0.086
          0.349 ± 0.076   −0.252 ± 0.086
         −0.117 ± 0.044    0.123 ± 0.076
    6    −1.195 ± 0.051    1.079 ± 0.055    416.56     0.99
          0.839 ± 0.079   −0.751 ± 0.078
         −0.555 ± 0.088    0.487 ± 0.087
          0.410 ± 0.088   −0.290 ± 0.090
         −0.208 ± 0.079    0.183 ± 0.087
          0.061 ± 0.045   −0.080 ± 0.076
    7                                       414.62     0.89

The fifth order model obtained is

    A*(q⁻¹)y(k) = B*(q⁻¹)u(k) + λe(k)

where

    A*(q⁻¹) = 1 − 1.19q⁻¹ + 0.81q⁻² − 0.52q⁻³ + 0.35q⁻⁴ − 0.12q⁻⁵
    B*(q⁻¹) = 1.08q⁻¹ − 0.75q⁻² + 0.48q⁻³ − 0.25q⁻⁴ + 0.12q⁻⁵.

Dividing A by B we find

    1.08 A(z)/B(z) = z − 0.495 + (0.03z³ − 0.07z² + 0.12z − 0.06)/(z⁴ − 0.69z³ + 0.44z² − 0.23z + 0.11).

Taking the uncertainties of the estimated coefficients into account it does not seem unreasonable to neglect the remainder in the above expression. We find that the system under test can be represented by

    y(k) = [1.08/(q − 0.5)]u(k) + [λ/A*(q⁻¹)]e(k).

We can thus conclude that if we choose the class of models not as the class of linear first order systems but as the class of linear systems of arbitrary order it is at least possible to overcome the difficulty of correlated residuals in the specific example. This idea was mentioned briefly in ÅSTRÖM [92]. It has, however, not been pursued in general.

(b) Generalized least squares. Another way to overcome the difficulty with correlated residuals is to use the method of generalized least squares. See e.g. CLARKE [93].

The basic idea is as follows. Let the process be governed by

    A*(q⁻¹)y(k) = B*(q⁻¹)u(k) + v(k)    (68)

where A* and B* are polynomials and {v(k)} a sequence of correlated random variables. Suppose that the correlations of the residuals are known. Say that {v(k)} can be represented as

    v(k) = G*(q⁻¹)e(k)    (69)

where {e(k)} is a sequence of uncorrelated random variables and G a pulse transfer function. The equation describing the process can then be written as

    A*(q⁻¹)y(k) = B*(q⁻¹)u(k) + G*(q⁻¹)e(k)    (70)

or

    A*(q⁻¹)ȳ(k) = B*(q⁻¹)ū(k) + e(k)    (71)

where

    ȳ(k) = [1/G*(q⁻¹)]y(k)    (72)

    ū(k) = [1/G*(q⁻¹)]u(k).    (73)

Hence if the signals ū and ȳ are considered as the inputs and outputs we have an ordinary least squares problem. Compare with (49). We thus find that the generalized least squares can be interpreted as a least squares identification problem where the criterion is chosen as (48) with the generalized error defined as

    e(k) = A*(q⁻¹)[1/G*(q⁻¹)]y(k) − B*(q⁻¹)[1/G*(q⁻¹)]u(k).    (74)
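The bias in the example above, and the whitening interpretation (68)-(74), can be reproduced numerically. The sketch below (Python/NumPy; seed and sample size are hypothetical choices) simulates the example process, fits it by ordinary least squares, and then repeats the fit on signals filtered through 1/G*; here the noise filter G*(q⁻¹) = 1 + 0.1q⁻¹ is assumed known, which it seldom is in practice:

```python
import numpy as np

# Example process: y(k+1) - 0.5*y(k) = u(k) + n(k+1) + 0.1*n(k),
# i.e. correlated residuals v(k+1) = G*(q^-1) n(k+1), G*(q^-1) = 1 + 0.1 q^-1
rng = np.random.default_rng(2)
N = 20000
u = rng.standard_normal(N + 1)
n = rng.standard_normal(N + 1)
y = np.zeros(N + 1)
for k in range(N):
    y[k + 1] = 0.5 * y[k] + u[k] + n[k + 1] + 0.1 * n[k]

def lsq(yy, uu):
    # Least squares fit of yy(k+1) + a*yy(k) = b*uu(k)
    Phi = np.column_stack([-yy[:-1], uu[:-1]])
    return np.linalg.lstsq(Phi, yy[1:], rcond=None)[0]

a_ls, b_ls = lsq(y, u)          # ordinary least squares: biased estimate of a

# Generalized least squares: filter y and u through 1/G*, cf. (72)-(73)
ybar = np.zeros(N + 1)
ubar = np.zeros(N + 1)
for k in range(N + 1):
    ybar[k] = y[k] - (0.1 * ybar[k - 1] if k > 0 else 0.0)
    ubar[k] = u[k] - (0.1 * ubar[k - 1] if k > 0 else 0.0)
a_gls, b_gls = lsq(ybar, ubar)  # close to the true value a = -0.5
print(a_ls, a_gls)
```

The ordinary least squares estimate of a settles clearly away from −0.5, while the fit on the whitened signals does not; this is the mechanism behind (65).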
System identification--a survey 143

Compare with the block diagram of Fig. 6. This shows how the generalized error can be obtained from the process inputs and outputs and the model parameters α and β in the generalized least squares method.

FIG. 6. Parameter estimation using "whitening filters" G*.

The correlation of the residuals and the pulse transfer function G are seldom known in practice. CLARKE [93] has proposed an iterative procedure to determine G which has been tested on simulated data as well as on practical measurements (distillation column identification). The procedure consists of the following steps:

(1) Make an ordinary least squares fit of the model

    A*(q⁻¹)y(k) = B*(q⁻¹)u(k) + v(k).    (75)

(2) Analyze the residuals v and fit an autoregression

    D*(q⁻¹)v(k) = e(k)    (76)

where {e(k)} is discrete-time white noise.

(3) Filter the process inputs and outputs through

    ȳ(k) = D*(q⁻¹)y(k);  ū(k) = D*(q⁻¹)u(k).    (77)

(4) Make a new least squares fit to the filtered inputs and outputs and repeat from 2.

Another approach along these lines is the algorithm of STEIGLITZ and MCBRIDE [94]. They use at the jth iteration

    D*(q⁻¹) = 1/A*_{j−1}(q⁻¹)

thus making a least squares fit of the model

    e(k) = A*_j(q⁻¹)[1/A*_{j−1}(q⁻¹)]y(k) − B*_j(q⁻¹)[1/A*_{j−1}(q⁻¹)]u(k).    (78)

The non-linearity in the parameters can also be handled by means of quasilinearization, c.f. SCHULZ [218]. The techniques proposed in [111] and [217] can also be interpreted as generalized least squares methods.

These procedures have the drawbacks that there are no systematic rules for the choice of order of the model (75) and of the autoregression (76). Neither are any convergence proofs yet available. The method has, however, been shown to work very well with reasonable ad hoc choices of order in specific examples.

The following observation might also be worthwhile. Assume that the generalized least squares procedure converges, say A_j → A, B_j → B, and D_j → D. We will then obtain

    A*(q⁻¹)D*(q⁻¹)y(k) = B*(q⁻¹)D*(q⁻¹)u(k) + e(k)    (79)

i.e. a description of the process with uncorrelated residuals. It thus appears that the differences between the repeated least squares and the generalized least squares are small.

(c) The maximum likelihood method. Another way to deal with the problem of correlated residuals is to postulate a system with correlated residuals, e.g. a canonical representation of an n:th order system with one input and one output

    A*(q⁻¹)y(k) = B*(q⁻¹)u(k) + λC*(q⁻¹)e(k)    (80)

where u is the input, y the output and {e(k)} a sequence of independent normal (0, 1) random variables. Compare with section 4. The parameters of (80) can be determined using the method of maximum likelihood. This has been discussed in [74], [23], [97], [78], [199].

The likelihood function L is given by

    −log L(θ, λ) = (1/2λ²) Σ_{k=1}^{N} e²(k) + N log λ + (N/2) log 2π    (81)

where

    C*(q⁻¹)e(k) = A*(q⁻¹)y(k) − B*(q⁻¹)u(k)    (82)

and {u(k), k = 1, 2, ..., N} is the applied input signal and {y(k), k = 1, 2, ..., N} is the observed output signal. The likelihood function is considered as a function of θ and λ, where θ is a vector whose components are the parameters a1, a2, ..., an, b1, b2, ..., bn, c1, c2, ..., cn and the n initial conditions of (82). Notice that the logarithm of the likelihood function is linear in the parameters ai and bi but strongly non-linear in ci.
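Clarke's iterative procedure (steps 1-4 above) can be sketched as follows (Python/NumPy; a first order autoregression for the residuals and a fixed number of iterations are ad hoc choices, as the text notes, and the simulated process is the example from the previous section):

```python
import numpy as np

# Example process with correlated residuals (true a = -0.5, b = 1.0)
rng = np.random.default_rng(3)
N = 20000
u = rng.standard_normal(N + 1)
n = rng.standard_normal(N + 1)
y = np.zeros(N + 1)
for k in range(N):
    y[k + 1] = 0.5 * y[k] + u[k] + n[k + 1] + 0.1 * n[k]

yf, uf = y.copy(), u.copy()          # start with unfiltered data, D* = 1
for _ in range(5):
    # (1)/(4): least squares fit on the (filtered) data
    Phi = np.column_stack([-yf[:-1], uf[:-1]])
    a_hat, b_hat = np.linalg.lstsq(Phi, yf[1:], rcond=None)[0]
    # (2): residuals of the model (75) and a first order autoregression
    v = y[1:] + a_hat * y[:-1] - b_hat * u[:-1]
    rho = np.dot(v[1:], v[:-1]) / np.dot(v[:-1], v[:-1])
    # (3): filter inputs and outputs with D*(q^-1) = 1 - rho*q^-1
    yf = y[1:] - rho * y[:-1]
    uf = u[1:] - rho * u[:-1]
print(a_hat, b_hat)
```

A few iterations remove most of the bias of the ordinary least squares fit, even though a first order D* can only approximately whiten the moving-average residuals of this process.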

Also notice that the optimization of L with respect to θ and λ can be performed separately in the following way: First determine θ such that the loss function

    V(θ) = Σ_{k=1}^{N} e²(k)    (83)

is minimal with respect to θ. The optimization with respect to λ can then be performed analytically. We get

    λ̂² = (1/N) min_θ V(θ).    (84)

The maximum likelihood estimate can be shown to have nice asymptotic properties. Estimates of the accuracy of the parameters can also be provided. See ÅSTRÖM, BOHLIN and WENSMARK [74], ÅSTRÖM and BOHLIN [23]. A test for possible violation of prerequisites of ML estimation is given by BOHLIN [99].

The maximum likelihood procedure can also be interpreted as finding the coefficients of the prediction model

    ym(k) = ŷ(k|k−1) = [B*(q⁻¹)/C*(q⁻¹)]u(k) + [(C*(q⁻¹) − A*(q⁻¹))/C*(q⁻¹)]y(k)    (85)

in such a way that the criterion

    V = V(y, ym) = Σ_{k=1}^{N} [y(k) − ym(k)]² = Σ_{k=1}^{N} e²(k)    (86)

is as small as possible.

Notice that (80) can also be written as

    A*(q⁻¹)[1/C*(q⁻¹)]y(k) = B*(q⁻¹)[1/C*(q⁻¹)]u(k) + λe(k).    (87)

This means that the maximum likelihood method can also be interpreted as a generalized least squares method where the filter function G = 1/C is determined automatically.

It has been shown that predictors and minimal variance control algorithms are easily determined from the model (80); c.f. ÅSTRÖM [72], [92]. The maximum likelihood method has been applied extensively to industrial measurements. See e.g. ÅSTRÖM [92], [199], and GUSTAVSSON [100]; in this paper comparisons with other techniques such as correlation methods and generalized least squares are also given. The maximum likelihood method has also been applied to time series analysis (put B = 0). The maximum likelihood estimate is a strongly non-linear function of the parameters. Since time series analysis is mostly concerned with quadratic functions, such as covariances and spectral densities, one might expect that the estimates can be expressed as non-linear functions of the sample covariances. Estimates of this nature which are asymptotically equivalent to the maximum likelihood estimates for parametric time series analysis have been given by ZETTERBERG [101]. The generalization of the maximum likelihood method to the multivariable case has been done by WOO [97] and CAINES [222].

(d) Instrumental variables. Repeated least squares, generalized least squares and the maximum likelihood method all give a model of the environment in terms of a model for the disturbances as a filter driven by white noise. If we are only interested in the system dynamics there are other methods to avoid the difficulties with correlated residuals, e.g. the instrumental variable method.

The equation for the least squares estimate can be obtained from the equation

    y = Φβ + e    (88)

by premultiplying with Φ', neglecting the term Φ'e and solving the equation

    Φ'y = Φ'Φβ̂.    (89)

The estimate β̂ will be unbiased if the term Φ'e has zero mean. When the residuals are correlated it is easy to show that EΦ'e ≠ 0.

In the instrumental variable method, [102], [103], the equation (88) is multiplied with W' where W, called the instrumental matrix, is a matrix whose elements are functions of the data with the properties

    EW'Φ nonsingular    (90)

    EW'e = 0.    (91)

The parameter estimate obtained from

    W'y = W'Φβ̂    (92)

will then be unbiased. It is also possible to find instrumental variables such that the estimate has optimal properties. The schemes proposed by PETERKA and ŠMUK [104], HSIA and LANDGREBE [105] are closely related to the instrumental variable

technique. VUŠKOVIĆ, BINGULAC and DJOROVIĆ [106] indicate the use of pseudo sensitivity functions as instrumental variables.

(e) Levin's method. A particular method for estimating the bias due to correlated residuals has been proposed by LEVIN [107], [210], [211] for the case of a deterministic system with independent observation errors. Levin's results are based on a technique due to Koopmans and it gives the estimates in terms of an eigenvalue problem for the matrix Φ'Φ. A careful analysis of Levin's method which includes convergence proofs and error estimates has been done by AOKI and YUE [108]. The method has also been used by SMITH [109].

(f) Tally principle. This estimation procedure as described by PETERKA and HALOUSKOVÁ [110] starts from the canonical form of an nth order process given by equation (37). Now the terms with e are combined to a random variable δ(k):

    δ(k) = e(k) + Σ_{i=1}^{n} ci e(k−i).

Based on the knowledge that {e(k)} is a sequence of uncorrelated random variables it is found that

    E[δ(k)y(k−i)] = 0 for i > n
    E[δ(k)u(k−i)] = 0 for i > 0.    (93)

For any estimate of the unknown parameters one finds a sequence δ*(k). Now

    (1/N) Σ_{k=1}^{N} δ*(k)u(k−i)    (94)

and

    (1/N) Σ_{k=1}^{N} δ*(k)y(k−i)    (94')

have to "tally" (duplicate) the corresponding expectations (93) as well as possible in a least square sense.

In the special case of time series analysis the "tally" principle reduces to the well-known technique of fitting the parameters of the Yule-Walker equation to the sample covariances. See e.g. Ref. [224].

Multivariable systems
The essential difficulty in the identification of multivariable systems is to find a suitable representation of the system. Once a particular representation is chosen it is a fairly straight-forward procedure to construct identification methods analogous to those given for the case of systems with one input and one output. We refer to section 4 for a discussion of structures for multivariable systems. Also c.f. GRAUPE, SWANICK and CASSIR [112]. For the structure (35) the likelihood function is given by

    −log L(θ, R) = (N/2) log det R + (1/2) Σ_{k=1}^{N} e'(k)R⁻¹e(k) + (Nn/2) log 2π.    (95)

Even for multivariable systems the maximization of L(θ, R) can be performed separately with respect to θ and R. It was shown by EATON [75] that the maximum of L(θ, R) is obtained by finding θ which minimizes

    V(θ) = det[Σ_{k=1}^{N} e(k)e'(k)].    (96)

The maximization with respect to R can then be done analytically to yield

    R̂ = (1/N) Σ_{k=1}^{N} e(k)e'(k).    (97)

This fact is also mentioned in ROWE [67]. The identification problem for linear multivariable systems is also closely related to the problem of simplifying a large linear system. See e.g. DAVISON [70].

6. IDENTIFICATION OF NON-LINEAR SYSTEMS

Representation of non-linear systems
Note again that non-linearity does not necessarily imply a non-linearity in the parameters too (c.f. section 4).

For linear systems the impulse response offers a non-parametric system description that does not depend on specific a priori assumptions. For a wide class of non-linear systems a Volterra series expansion offers analogous possibilities, using impulse responses of increasing dimensionality. Approximation of these functions by a finite number of points leads to a model that is linear in the parameters; c.f. EYKHOFF [113], ALPER [114], ROY and SHERMAN [115]. For many practical cases the number of parameters needed for this description is too large.

Another approach for non-linear model building, called group method of data handling, is given by IVAKHNENKO [116], [117].

When considering non-linear systems as well as linear systems with multiple inputs and multiple outputs it therefore is necessary to make specific assumptions concerning the model structure. It is usually assumed that the system equations are known except for a number of parameters b.
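The remark that non-linearity of the system need not mean non-linearity in the parameters can be illustrated with a small sketch (Python/NumPy; the Hammerstein-type system and all its coefficients are hypothetical): a static square-law term ahead of first order linear dynamics is estimated by ordinary least squares, simply by using u²(k) as an extra regressor.

```python
import numpy as np

# Hammerstein-type system: static non-linearity u + 0.8*u^2 followed by
# first order linear dynamics. Non-linear in u, linear in the parameters.
rng = np.random.default_rng(4)
N = 5000
u = rng.standard_normal(N + 1)
e = 0.1 * rng.standard_normal(N + 1)
y = np.zeros(N + 1)
for k in range(N):
    y[k + 1] = 0.6 * y[k] + 1.5 * u[k] + 0.8 * u[k] ** 2 + e[k + 1]

# Ordinary least squares with regressors [y(k), u(k), u(k)^2]
Phi = np.column_stack([y[:-1], u[:-1], u[:-1] ** 2])
theta = np.linalg.lstsq(Phi, y[1:], rcond=None)[0]
print(theta)  # close to (0.6, 1.5, 0.8)
```

The same device covers any model that is a linear combination of known (possibly non-linear) functions of the data; what fails for such models is not the estimation machinery but, as noted above, the growth of the number of parameters.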

In a typical case the identification problem can then be formulated as follows:
Let the class of models be all systems described by the state equation

    dx/dt = f(x, u, β, t)
    ym = g(x, u, β, t)    (98)

where the parameter vector β belongs to a given set. Let the criterion be given by the loss function

    V(y, ym) = V(β) = ∫₀ᵀ [y(t) − ym(t, β)]² dt    (99)

where y is the process output and ym the model output.

Estimation of a parametric model
The estimation problem for a non-linear parametric model thus reduces to a non-linear optimization problem. As was discussed in section 3 (Computational aspects), there are many techniques available for the solution of the non-linear estimation problem. Techniques which exploit the particular structure of the problem are discussed by MARQUARDT [120].

The particular case when the non-linear system can be separated into linear dynamics and non-linearities without memory is discussed by BUTLER and BOHN [118] and NARENDRA and GALLMAN [119]. BELLMAN and KALABA [121] have used quasilinearization to solve the non-linear optimization problem. Interesting applications of this technique are found in BUEL, KAGIWADA and KALABA [122], BUEL and KALABA [123]. A fairly general computer program to solve the problem has been written by BUEL. Also c.f. VANECEK and FESSL [124]. Another method to solve the non-linear optimization problem has been given by TAYLOR, ILIFF and POWERS [125] in connection with application to inflight determination of stability derivatives.

Again the criterion (99) can be given a probabilistic interpretation if it is assumed that the only disturbances are white noise measurement errors. A technique which admits the measurement errors to be a stationary process with unknown rational spectral density has been proposed by ÅSTRÖM, BOHLIN and WENSMARK [74].

Due to the specific assumptions that are made concerning the structure of (98) one might expect that serious mistakes can be made if these assumptions are not true. Results which prove or disprove this are not known.

Rather few publications have appeared on the use of Bayes' method (c.f. Appendix A) in identification techniques; MASLOV [126], GALTIERI [127]. This is probably due to the computational problems when evaluating the conditional expectations. McGEE and WALVORD [128] propose a solution by using a Monte Carlo approach.

7. ON-LINE AND REAL-TIME IDENTIFICATION

In many applications it is highly desirable to obtain the result of the identification recursively as the process develops. For example it might be of interest to proceed with the experiment until a specified parameter accuracy is achieved. The problem is then as follows. Assume that an estimate β_N is obtained based on N pairs of input-output samples. Is it necessary to repeat the whole identification procedure from the beginning, using the whole string of input-output data, in order to obtain β_{N+1} or is it possible to arrange the computations recursively? An identification scheme which is recursive and which does not require that the whole string of input-output data is brought in at each step is called an on-line method.

On-line identification can thus be looked upon as a convenient way of arranging the computations. Apart from being of practical interest this point of view on identification problems will also make it possible to establish connections with other fields, e.g. non-linear filtering, stochastic approximation, learning and adaption.

If the parameters of the process are truly time varying it is, of course, meaningless to do anything else but track the parameters in real-time. This is called real-time identification. One may recognize two computational procedures: an accumulative solution, open loop with respect to the parameter estimate, and a recursive solution, closed loop with respect to the parameter estimate; c.f. GENIN [129].

There are many different ways to obtain algorithms for real-time identification. Practically all methods, however, will yield algorithms with the structure

    β(N+1) = β(N) + Γ(N)e(N)    (100)

where e(N) is the generalized error discussed earlier and Γ(N) is a gain factor which can be of varying complexity.

Model reference techniques
The on-line identification problem is sometimes formulated as a model tracking problem. A simple case is illustrated in Fig. 7. Note that this is an example of a recursive, closed loop, approach. The input is simultaneously fed to the process and to a model with adjustable parameters. The adjustable parameters are changed by an adjustment mechanism which receives the process output y and the model output ym as inputs. This formulation of the on-line identification problem was first

considered by WHITAKER [130]. The essential problem is to determine the adjustment mechanism such that the model parameters in some sense will be close to the process parameters. There are many ways to obtain suitable adjustment mechanisms. One way is to introduce a criterion in terms of a loss function as was done in section 2 and to change parameters in such a way that the loss function decreases. Since there are so many different ways to define both the generalized error and the loss function we will not enumerate all cases but just illustrate the principle. For example, if the loss function is based on e(N) = y(N) − ym(N) and a least squares structure is used, it follows that the optimal choice of gain Γ(N) of the adjustment mechanism (100) is proportional to

    φ'(N+1)/[φ(N+1)φ'(N+1)].

This was e.g. shown by KACZMARZ [131] and has later been exploited by several authors, e.g. NAGUMO and NODA [132], BÉLANGER [133], RICHALET [134].

FIG. 7. Model adjustment technique.

The effect of working with a quantized signal sign u is discussed in CRUM and WU [135]. In simple cases the adjustment mechanism makes the rate of change of the parameter adjustment proportional to the sensitivity derivatives. See e.g. MEISSINGER and BEKEY [136]. A recent application of this idea is given by ROSE and LANCE [137].

The requirement that the closed loop is stable is a necessary design criterion. Since the system consisting of the adjustable model, the process and the adjustment mechanism is highly non-linear the stability problem is not trivial. Using Liapunov methods, LION [138], SHACKCLOTH and BUTCHART [139], PARKS [140], PAZDERA and POTTINGER [142] have designed stable systems. Lion's results have recently been generalized to stochastic systems by KUSHNER [143]. The powerful stability tests developed by POPOV [144] and ZAMES [145] have given new tools to design adjustment mechanisms which will result in stable systems. Initial efforts in this direction have been done by LANDAU [146] who has proposed stable model reference systems using the Popov criterion.

On-line least squares
The conversion of any identification method to an on-line technique consists of showing that the estimate satisfies a recursive equation. This is easily done for the least squares method. Consider the least squares model of section 5, i.e.

    y(k) + a1 y(k−1) + ... + an y(k−n) = b1 u(k−1) + ... + bn u(k−n).    (101)

Define

    β' = [a1, a2, ..., an, b1, b2, ..., bn]    (102)

and

    φ(N+1) = [−y(N), −y(N−1), ..., −y(N−n+1), u(N), u(N−1), ..., u(N−n+1)].    (103)

The least squares estimate is then given by (53). It can be shown by simple algebraical manipulations that the least squares estimate satisfies the recursive equation

    β(N+1) = β(N) + Γ(N)[y(N+1) − φ(N+1)β(N)]    (104)

where β(N) denotes the least squares estimate based on N pairs of input-output data and

    Γ(N) = P(N)φ'(N+1)[α + φ(N+1)P(N)φ'(N+1)]⁻¹    (105)

    P(N+1) = P(N) − P(N)φ'(N+1)[α + φ(N+1)P(N)φ'(N+1)]⁻¹ φ(N+1)P(N)
           = P(N) − Γ(N)φ(N+1)P(N)
           = [I − Γ(N)φ(N+1)]P(N)    (106)

    P(N₀) = α[Φ'(N₀)Φ(N₀)]⁻¹    (107)

where N₀ is a number such that Φ'(N₀)Φ(N₀) is positive definite.

The recursive equation (104) has a strong intuitive appeal. The next estimate β(N+1) is formed by adding a correction to the previous estimate. The correction is proportional to y(N+1) − φ(N+1)β(N). The term φβ would be the value of y at time N+1 if the model were perfect and there were no disturbances. The correction term is thus proportional to the difference between the measured value of y(N+1) and the prediction of y(N+1) based on the previous model parameters. The components of the vector Γ(N) are weighting factors which tell how the corrections and the previous estimate should be weighted.

Notice that in order to obtain the recursive equations it is necessary to introduce the auxiliary quantity P. The state of the system of equations (104)-(106) governing the on-line estimator is thus the vector β, which is the current estimate, and the symmetric matrix P. The pair (β, P) thus represents the smallest number of variables, characterizing the input-output data, which are necessary to carry along in the computations.

If the model of the system is actually given by (50) where {e(k)} are independent residuals with variance σ², the matrix P can be interpreted as the covariance matrix of the estimate if α is chosen as σ². A special discussion is needed about the start of the iterative procedure. That can be done by:

using possible a priori knowledge about β and P;
by using a "one-shot" least squares estimation procedure on the first series of observations;
by starting with P(0) = cI, c.f. KLINGER [147].

Notice that since β(N) given by (104) is the least squares estimate, the convergence of the equations (104)-(106) follows directly from the consistency proofs of the least squares estimate.

Recursive versions of the generalized least squares procedure have been derived by YOUNG [148]. A recursive version of an instrumental variable method has been derived by PETERKA and ŠMUK [104]. An approximative on-line version of the maximum likelihood method has been proposed by PANUSKA [96]. A discussion of on-line methods is also given in LEATHRUM [149], STANKOVIĆ, VELASEVIĆ and CARAPIĆ [150].

Contraction mappings
A technique of constructing recursive algorithms has been suggested by OZA and JURY [151], [152]. We will explain the technique in connection with the least squares problem. Instead of solving the equation

    Φ'(N)Φ(N)β(N) = Φ'(N)y    (108)

for each N and showing that β(N) satisfies a recursive equation, Oza and Jury introduce the mapping

    T_N(β) = β − γ[Φ'(N)Φ(N)β − Φ'(N)y]    (109)

where γ is a scalar. It is then shown that the sequence

    β(N+1) = T_N(β(N))    (110)

under suitable conditions converges to the true parameters as N → ∞. When applied to the ordinary least squares problem the algorithm (109) is not efficient in contrast with the recursive least squares method. To obtain an efficient algorithm it is necessary to make γ a matrix. With the choice

    γ = [Φ'(N)Φ(N)]⁻¹    (111)

the algorithm becomes equivalent to the least squares.

The method of Oza and Jury can be applied to more general cases than the least squares. It was actually proven for the case when there are errors in the measurements of both inputs and outputs, provided the covariance function of the measurement errors is known. The assumption of known covariances of the measurement errors severely limits the practical applicability of the method.

Stochastic approximations
The formula for the recursive least squares is

    β(k+1) = β(k) + Γ(k)[y(k+1) − φ(k+1)β(k)]    (112)

where Γ was chosen by the specific formula (105). It can be shown that there are many other choices of Γ for which the estimate β will converge to the true parameter value b. Using the theory of stochastic approximations it can be shown that the choice

    Γ(k) = (1/k)Aφ'(k+1)    (113)

will ensure convergence if A is positive definite. See e.g. ALBERT and GARDNER [153]. A particularly simple choice is e.g. A = I. The algorithms obtained by such choices of Γ will in general give estimates with variances that are larger than the variance of the least squares estimate. The algorithms are, however, of interest because they make it possible to reduce computations, at the price of a larger variance. Using stochastic approximations it is also possible to obtain recursive algorithms in cases where the exact on-line estimate is either very complicated or very difficult to derive. There are excellent surveys available on stochastic approximations. See e.g. ALBERT and GARDNER [153] and TSYPKIN [154]. Recent applications are given by SAKRISON [155], SARIDIS and STEIN [156], [157], HOLMES [158], ELLIOTT and SWORDER [159], NEAL and BEKEY [160].

Real-time identification
The recursive version of the least squares method is closely related to the Kalman filtering theory. Kalman considers a dynamical system

    x(k+1) = Φx(k) + e(k)
    y(k) = Cx(k) + v(k)    (114)

where {e(k), k = 1, 2 . . . . } and {v(k), k = 1, 2 , . . . } Since the coefficients are constant we have
are sequences of independent equally distributed
random vectors with zero mean values and covari- x(k+l)=x(k). (120)
ance matrices R~ and R 2 respectively. Kalman has
proven the following theorem. The equation (117) can now be written as
Theorem (Kalman). Let the initial condition of
y(k)=C(k)x(k)+e(k) (121)
(114) be a normal random variable (m, R0). The
best estimate of x(k), in the sense of least squares,
and the least squares identification problem can be
given the observed outputs y(1), y(2) . . . . . y(k) is
stated as a Kalman filtering problem with O = L
given by the recursive equations

x̂(k) = Φx̂(k−1) + Γ(k)[y(k) − Cx̂(k−1)]
x̂(0) = m (115)

where

Γ(k) = S(k)C'[CS(k)C' + R₂]⁻¹
S(k) = ΦP(k−1)Φ' + R₁
P(k) = S(k) − Γ(k)CS(k)
S(0) = R₀. (116)

The matrix S(k) has a physical interpretation as the covariance matrix of the a priori estimate of x(k) given y(1), …, y(k−1) and the matrix P(k) as the covariance of the posterior estimate of x(k) given y(1), …, y(k).

Now consider the least squares identification of the system

y(k) + a₁y(k−1) + … + aₙy(k−n) = b₁u(k−1) + … + bₙu(k−n) + e(k) (117)

where {e(k)} is a sequence of normal (0, λ) random variables. Introduce the coefficients of the model as state variables

x₁(k) = a₁
x₂(k) = a₂
…
xₙ(k) = aₙ
xₙ₊₁(k) = b₁
xₙ₊₂(k) = b₂
…
x₂ₙ(k) = bₙ (118)

and define the following vector

C(k) = [−y(k−1), …, −y(k−n), u(k−1), …, u(k−n)]. (119)

The model (117) then corresponds to the state model above with Φ = I, R₁ = 0 and R₂ = λ². The recursive equations of the least squares estimate can thus be obtained directly from Kalman's theorem. This has an interesting consequence because it turns out that if the parameters aᵢ are not constants but gauss-markov processes, i.e.

aᵢ(k+1) = φaᵢ(k) + vᵢ(k) (122)

the Kalman theorem can still be applied. (This requires a slight generalization of Kalman's proof since the parameters of the C vector are stochastic processes.)
BOHLIN [161] has extended the argument to processes of the structure (117) with coefficients which are processes with rational spectral densities. It is thus possible to obtain parameter estimators for linear models with time-varying parameters. It is in fact not necessary to assume a first order process; the parameters aᵢ can be chosen to be stationary processes with rational spectral densities. In this way it is possible to obtain identifiers for processes with rapidly varying parameters. This has been discussed by BOHLIN [77] and WIESLANDER [163]. The method proposed by SEGERSTÅHL [164] can be considered as the special case vᵢ = 0. Notice that with this approach it is necessary to know the covariances of the processes {vᵢ} characterizing the variations in the model parameters. Such assumptions will, of course, limit the practical applicability of the methods. One way to overcome this difficulty is to use the approximative techniques for estimating the covariances in a Kalman filter proposed by MEHRA [73] and SAGE and HUSA [165]. Techniques for validating the assumptions of the Kalman filtering problem have been proposed by BERKOVEC [166]. Recursive estimation of the transition matrix is discussed by PEARSON [167]. An analog implementation is given by HSIA and VIMOLVANICH [168].
150 K.J. ASTROM and P. EYKHOFF

Non-linear filtering
The relationship between the recursive least squares and the Kalman filtering theory was obtained by introducing the parameters of the identification problem as state variables. We thus find that there are in principle no differences between parameter estimation and state estimation. A parameter estimation problem can be extended to a state estimation problem by introducing the parameters as auxiliary state variables. A constant parameter b corresponds to the state equation

db/dt = 0 (123)

for continuous time systems and

b(k+1) = b(k) (124)

for discrete time systems. The (state) estimation problem obtained in this way will, however, in general be a non-linear problem since the parameters frequently occur in terms like bx(t) in the original problem. In the general continuous time case we are thus faced with a filtering problem for the model

dx = f(x, t)dt + σ(t, x)dv
dy = g(x, t)dt + μ(t, x)de (125)

where {v(t)} and {e(t)} are Wiener processes with incremental covariances I_v dt and I_e dt. Some of the components of x are state variables and others are parameters of the identification problem. The non-linear filtering problem is completely solved if the conditional probability of x(t) given {y(s), t₀ ≤ s ≤ t} can be computed. The maximum likelihood estimate is e.g. obtained by finding the value of x for which the conditional density has its maximum. The least squares estimate is given by the conditional mean etc. Compare the resumé of estimation theory in Appendix A.

It is in principle easy to obtain a recursive formula for the conditional probability distribution simply by applying Bayes' rule. For the model (125) the problem is, however, technically difficult and great care must be given to the appropriate interpretation of the equation (125). It has been shown by KUSHNER [172] and STRATONOVICH [223] that under suitable regularity conditions the conditional probability density of x(t) given {y(s), t₀ ≤ s ≤ t} satisfies the following functional equation

dₜp(t, x) = [−Σᵢ ∂/∂xᵢ(fᵢp) + ½ Σᵢ,ⱼ ∂²/(∂xᵢ∂xⱼ)(σᵢσⱼp)]dt
    + [dy − (∫g(x, t)p(x, t)dx)dt]'[μμ']⁻¹[g(x, t) − ∫g(x, t)p(x, t)dx]p(t, x) (126)

where the differential dₜp is interpreted in the Itô sense.

In the special case of linear systems with gauss-markov parameters, discussed before, the functional equation has a solution which is a gaussian distribution. Apart from this special case the solution of the functional equation is an extremely difficult numerical problem even in simple cases. For a system of second order with two parameters the vector x will have four components. If we approximate crudely, e.g. by quantizing each state variable in 100 levels, the storage of the function p(x, t) for a fixed t will require 100⁴ = 10⁸ cells.

Contributions to the theory of non-linear filtering are also given by BUCY [170], SHIRYAEV [171], FISHER [226], MORTENSEN [227], WONHAM [174], [220]. A survey is given in the book [225].

Approximations
From the functional equation for the conditional distribution it is possible to derive equations for the maximum likelihood estimate (the mode), the minimum variance estimate (the conditional mean) etc. It turns out that these equations are not well suited for numerical computations. If we want to compute the conditional mean we find that the differential equation for the mean will contain not only covariances but also higher moments. It is therefore of great interest to find approximative methods to solve the non-linear filtering problem. Approximative schemes have been suggested by BASS and SCHWARTZ [175], NISHIMURA et al. [176], SUNAHARA [177], KUSHNER [178], JAZWINSKI [179], [180], LARMINAT and TALLEC [181]. Kushner's article contains a good survey of many of the difficulties associated with the approximative techniques.

The following type of approximation has been suggested by several authors. Estimate x of (125) by x̂ given by

dx̂ = f(x̂, t)dt + Γ(t)[dy − g(x̂, t)dt]. (127)

The estimate x̂ is referred to as the extended Kalman filter. The gain matrix Γ of (127) can be chosen in many different ways, e.g. by evaluating the optimal gain for the linearized problem or simply by choosing

Γ(t) = (1/t)g'(x̂, t) (128)

in analogy with stochastic approximations. See e.g. ATHANS et al. [200], COX [182] and JAZWINSKI [225]. The essential difficulty with all the approximative techniques is to establish convergence. In practice it is often found that the algorithms converge very well if good initial conditions are available, i.e. if there are good initial estimates of the parameters, and if suitable computational "tricks" are used. A computational comparison of several non-linear filters is given by SCHWARTZ and STEAR
System identification--a survey 151

[183]. A technique allowing second order non-linearity in system measurements is discussed in NEAL [184].
The application of (127) to system identification was first proposed by KOPP and ORFORD [185]. It has recently been applied to several industrial identification problems, e.g. nuclear reactors in HABEGGER and BAILEY [186], stirred tank reactors in WELLS [187], and head box dynamics in SASTRY and VETTER [188]. The use of the extended Kalman filter in connection with reduced models is discussed by BAGGERUD and BALCHEN [189].
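A discrete-time sketch of the extended Kalman filter (127) applied to identification may make the idea concrete. It is our illustration, not from the paper: the scalar model x(k+1) = b·x(k) + u(k) with b carried as an extra state per (124), the gain taken from the linearized problem, and all names and noise levels are assumptions.

```python
import numpy as np

def ekf_identify(y, u, q=1e-8, r=1e-4, p0=1.0):
    """Joint state and parameter estimation for the hypothetical model
        x(k+1) = b*x(k) + u(k),   b(k+1) = b(k)   (cf. eq. 124)
        y(k)   = x(k) + e(k)
    with an extended Kalman filter in the spirit of (127):
    correct with a gain from the linearized measurement,
    then predict through the model, linearized about the
    current estimate.  Returns the estimate of b."""
    z = np.zeros(2)                    # augmented state [x, b]
    P = p0 * np.eye(2)
    C = np.array([1.0, 0.0])           # measurement picks out x
    for k in range(len(y)):
        # correction step
        g = P @ C / (C @ P @ C + r)
        z = z + g * (y[k] - z[0])
        P = P - np.outer(g, C @ P)
        # prediction step; F is the Jacobian of f w.r.t. (x, b)
        F = np.array([[z[1], z[0]],
                      [0.0,  1.0]])
        z = np.array([z[1] * z[0] + u[k], z[1]])
        P = F @ P @ F.T + q * np.eye(2)
    return z[1]
```

As the survey cautions, convergence of such schemes hinges on reasonable initial estimates and covariances; the values above were chosen to behave well on a well-excited simulated example, not as general-purpose settings.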
8. CONCLUSIONS
In the previous sections applicable identification techniques as well as (yet) unsolved problems have been mentioned. Particularly from the point of view of the practising engineers (being university professors we are really sticking our necks out now) there are many important questions that remain to be answered. For example, how should the sampling interval be chosen? What is a reasonable model structure? How should an identification experiment be planned? How can it be ensured that the a priori assumptions required to use a particular method are not violated? If we leave the area of general problems and study specific methods the situation is better.

Linear time-invariant systems. There are good techniques available for identifying linear stationary systems as well as linear environments (stationary stochastic processes). The relations between different techniques are reasonably well understood and the choice of methods can be made largely on the basis of the final purpose of the identification. There are, however, unresolved problems also in this area, for example convergence proofs for the generalized least squares method.

Multivariable systems. The essential difficulty with multivariable systems is to choose a suitable canonical form. A few such forms are available but the structural problems are not yet neatly resolved. For example, there are few good techniques to incorporate a priori knowledge of the nature that there is no coupling between two variables or that there is a strong coupling between two other variables. If a multivariable system is identified as a conglomerate of single-input single-output systems, how should the different single-input single-output systems be combined into one model? How do we decide if a particular mode is common to several loops, taking uncertainties into account? Once a particular structure is chosen the solution of the multivariable identification problem is straightforward.

Non-linear systems. The techniques currently used simply convert the identification problem to an approximation problem by postulating a structure. The few non-parametric techniques available are computationally extremely time consuming.

On-line and real-time identification. This is a fiddler's paradise. Much work remains to be done to prove convergence as well as to devise approximation techniques.

Empirical knowledge available. The extensive applications of identification methods which are now available provide a source of empirical information which might be worth a closer analysis. One of the most striking facts is that most methods yield very simple models even for complex systems. It seldom happens that models of a single-input single-output system are of an order higher than 5. This fact, which is extremely encouraging from the point of view of complexity of the regulator, is not at all well understood.

Most methods seem to work extremely well on simulated data, but not always that well on actual industrial data. This indicates that some methods might be very sensitive to the a priori assumptions. It therefore seems highly desirable to develop tests which ensure that the a priori assumptions are not contradicted by the experimental data. It also means that it is highly desirable to have techniques which are flexible with respect to a priori assumptions. A typical example is the assumption that the measurement errors or the residuals have a known covariance function. It is, of course, highly unnatural from a practical point of view to assume that we have an unknown model but that the residuals of this unknown model have known statistics.

Comparison of different techniques
In spite of the large literature on identification there are few papers which compare different techniques. The exceptions are VAN DEN BOOM and MELIS [190], CHERUY and MENENDEZ [191], GUSTAVSSON [78], [100], ŠMUK [192]. Of course, it is more fun to dream up new methods than to work with somebody else's scheme. Nevertheless, for a person engaged in applications it would be highly desirable to have comparisons available. It would also be nice to have a selection of data sets, to which several known techniques have been applied, that can be used to evaluate new methods.

Where is the field moving?
It is our hope and expectation that the field is moving towards more unification and that there will be more comparisons of different techniques. The textbooks which up to now have been lacking will definitely contribute to that; forthcoming are EYKHOFF [193], SAGE and MELSA [194]. The area of identification will certainly also in the future be influenced by vigorous development of other fields
of control systems theory. One may guess that the presently active work on multivariable systems will result in a deeper understanding of such systems and consequently also of the structural problems. The recent results in stability theory might influence the real-time identification algorithms. Pattern recognition, learning theory and related theories will also contribute to the field of identification; e.g. TSYPKIN [195-197]. Also after the IFAC Prague 1970 symposium much work remains to be done.

Acknowledgement—The authors gratefully acknowledge discussions with many colleagues and material provided by several institutes and individuals. When this paper was written Åström was visiting professor at the Division of Applied Mathematics, Brown University, Providence, Rhode Island, U.S.A. His work on this paper has been partially supported by the National Aeronautics and Space Administration under grant No. NGL 40-002-015 and by the National Science Foundation under grant No. GP 7347.

REFERENCES

[1] F. M. FISHER: The Identification Problem in Econometrics. McGraw-Hill, New York (1966).
[2] L. A. ZADEH: From circuit theory to system theory. Proc. IRE 50, 856-865 (1962).
[3] N. M. ALEKSANDROVSKII and A. M. DEICH: Determination of dynamic characteristics of non-linear objects (review, 134 references). Automn Remote Control 29, 142-160 (1968).
[4] G. A. BEKEY: Identification of dynamic systems by digital computers. Joint automatic control conf. (Computer control workshop), 7-18 (1969).
[5] H. STROBEL: On the limits imposed by random noise and measurement errors upon system identification in the time domain. IFAC symp. Identification in autom. control systems, Prague, paper 4.4 (1967).
[6] H. STROBEL: The approximation problem of the experimental system analysis (in German, 161 references). Zeitschr. Messen, Steuern, Regeln 10, 460-464 (1967); 11, 29-34, 73-77 (1968).
[7] J. P. JACOB and L. A. ZADEH: A computer-oriented identification technique. Preprint JACC, 979-981 (1969).
[8] E. S. ANGEL and G. A. BEKEY: Adaptive finite-state models of manual control systems. IEEE Trans. Man-Machine Syst. MMS-9, 15-20 (1968).
[9] G. FANTAUZZI: The relation between the sampling time and stochastic error for the impulsive response of linear time-independent systems (short paper). IEEE Trans. Aut. Control AC-13, 426-428 (1968).
[10] K. J. ÅSTRÖM: Computational aspects of a class of identification problems. Report, Dept. of EE, University of Southern California, Los Angeles (1969).
[11] A. SANO and M. TERAO: Measurement optimization in batch process optimal control. Proc. fourth IFAC congress, Warsaw, paper 12.4. Also Automatica 6 (1969).
[12] R. BELLMAN and K. J. ÅSTRÖM: On structural identifiability. Math. Biosci. 1, 329-339 (1969).
[13] D. G. LAMPARD: A new method of determining correlation functions of stationary time series. Proc. IEE 102C, 35-41 (1955).
[14] T. KITAMORI: Application of orthogonal functions to the determination of process dynamic characteristics and to the construction of self-optimizing control systems. Proc. first IFAC congress, Moscow, 613-618 (1960).
[15] H. A. BARKER and D. A. HAWLEY: Orthogonalising and optimization. Proc. third IFAC congress, London, paper 36A (1966).
[16] P. D. ROBERTS: The application of orthogonal functions to self-optimizing control systems containing a digital compensator. Proc. third IFAC congress, London, paper 36C (1966).
[17] P. D. ROBERTS: Orthogonal transformations applied to control system identification and optimization. IFAC symp. Identification in autom. control systems, Prague, paper 5.12 (1967).
[18] G. SCHWARZE: Determination of models for identification with quality characteristics in the time domain. Proc. fourth IFAC congress, Warsaw, paper 19.3 (1969).
[19] H. GORECKI and A. TUROWICZ: The approximation method of identification. Proc. fourth IFAC congress, Warsaw, paper 5.5 (1969).
[20] R. DEUTSCH: Estimation Theory. Prentice-Hall, Englewood Cliffs, N.J. (1965).
[21] N. E. NAHI: Estimation Theory and Applications. John Wiley, New York (1969).
[22] P. EYKHOFF: Process parameter and state estimation. IFAC symp. Identification in autom. control systems, Prague, survey paper; also in Automatica 4, 205-233 (1968).
[23] K. J. ÅSTRÖM and T. BOHLIN: Numerical identification of linear dynamic systems from normal operating records. Paper, IFAC symp. Theory of self-adaptive control systems, Teddington, England. In Theory of Self-Adaptive Control Systems (Ed. P. H. HAMMOND). Plenum Press, New York (1966).
[24] A. V. BALAKRISHNAN: Techniques of system identification. Lecture notes, Istituto Elettrotecnico, University of Rome (1969).
[25] R. M. STALEY and P. C. YUE: On system parameter identifiability. Inform. Sci. 2, 127-138 (1970).
[26] A. A. FEL'DBAUM: Dual-control theory. Automn Remote Control 21, 874-880, 1033-1039 (1960); 22, 1-12, 109-121 (1961).
[27] M. MENDES: An on line adaptive control method. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 6.5; see also Automatica 7, No. 3 (1971), and: Dual adaptive control of time-varying systems (in German). Regelungstechnik 18, 157-164 (1970).
[28] S. C. SCHWARTZ and K. STEIGLITZ: The identification and control of unknown linear discrete systems. Report 24, Information Sciences and Systems Laboratory, Dept. EE, Princeton University (1968).
[29] K. J. ÅSTRÖM and B. WITTENMARK: Problems of identification and control. Report, Electronic Sciences Laboratory, University of Southern California, Los Angeles, Calif. (1969). J. Math. Anal. Appl., to appear.
[30] B. QVARNSTRÖM: An uncertainty relation for linear mathematical models. Proc. second IFAC congress, Basle, 634 (1963).
[31] V. HERLES: Limits of precision in identification procedures. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 9.8 (1970).
[32] H. UNBEHAUEN and G. SCHLEGEL: Estimation of the accuracy in the identification of control systems using deterministic test signals. IFAC symp. Identification in autom. control systems, Prague, paper 1.12 (1967).
[33] J. ŠTĚPÁN: The identification of controlled systems and the limit of stability. IFAC symp. Identification in autom. control systems, Prague, paper 1.11 (1967).
[34] J. ŠTĚPÁN: The uncertainty problem in system identification. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 1.8 (1970).
[35] A. V. BALAKRISHNAN and V. PETERKA: Identification in automatic control systems. Proc. fourth IFAC congress, Warsaw, survey paper. Also Automatica 5, 817-829 (1969).
[36] H. B. MANN and A. WALD: On the statistical treatment of linear stochastic difference equations. Econometrica 11, No. 3-4 (1943).
[37] P. WHITTLE: Prediction and Regulation. Van Nostrand, Princeton, N.J. (1963).
[38] U. GRENANDER and M. ROSENBLATT: Statistical Analysis of Stationary Time Series. John Wiley, New York (1957).
[39] G. JENKINS and D. WATTS: Spectral Analysis and Its Applications. Holden Day, San Francisco (1968).
[40] P. N. NIKIFORUK and M. M. GUPTA: A bibliography of the properties, generation and control systems applications of shift-register sequences. Int. J. Control 9, 217-234 (1969).
[41] K. R. GODFREY: The application of pseudo random sequences to industrial processes and nuclear power plant. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 7.1 (1970).
[42] H. STROBEL: System Analysis Using Deterministic Test Signals (in German). V.E.B. Verlag Technik, Berlin (1968).
[43] W. GRIT: Systematic errors in the determination of frequency response by means of impulse-testing (in German). Industrie-Anzeiger, No. 52, Ausgabe "Antrieb-Getriebe-Steuerung" (1969).
[44] H. H. WILFERT: Signal- and Frequency Response Analysis on Strongly-Disturbed Systems (in German). V.E.B. Verlag Technik, Berlin (1969).
[45] A. VAN DEN BOS: Estimation of linear system coefficients from noisy responses to binary multi-frequency test signals. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 7.2 (1970).
[46] E. WELFONDER and O. HASENKOPF: Comparison of deterministic and statistical methods for system identification by test-measurements at a disturbed industrial control system. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 7.5 (1970).
[47] I. G. CUMMING: Frequency of input signal in identification. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 7.8 (1970).
[48] K. J. ÅSTRÖM: Lectures on the identification problem. The least squares method. Report, Division of Automatic Control, Lund Institute of Technology, Lund, Sweden (1968).
[49] M. AOKI and R. M. STALEY: On input signal synthesis in parameter identification. Proc. fourth IFAC congress, Warsaw, paper 26.2. Also Automatica 6, 431-440 (1969).
[50] V. S. LEVADI: Design of input signals for parameter estimation. IEEE Trans. Aut. Control AC-11, 205-211 (1966).
[51] A. RAULT, R. POULIQUEN and J. RICHALET: Sensibilizing input and identification. Proc. fourth IFAC congress, Warsaw, paper 5.2 (1969).
[52] D. MIDDLETON: An Introduction to Statistical Communication Theory. McGraw-Hill, New York (1960).
[53] E. E. FISHER: Identification of linear systems. Paper JACC, 473-475 (1965).
[54] R. W. BROCKETT and M. D. MESAROVIC: The reproducibility of multivariable control systems. JMAA 11, 548-563 (1965).
[55] L. M. SILVERMAN: Inversion of multivariable linear systems. IEEE Trans. Aut. Control AC-14, 270-276 (1969).
[56] M. K. SAIN and J. L. MASSEY: Invertibility of linear, time-invariant dynamical systems. IEEE Trans. Aut. Control AC-14, 141-149 (1969).
[57] H. SILVERMAN: Identification of linear systems using fast fourier transform technique. Ph.D. thesis, Brown University (1970).
[58] T. F. POTTS, G. N. ORNSTEIN and A. B. CLYMER: The automatic determination of human and other system parameters. Proc. West. Joint Computer Conf., Los Angeles, 19, 645-660 (1961).
[59] D. J. WILDE: Optimum Seeking Methods. Prentice-Hall, Englewood Cliffs, N.J. (1964).
[60] G. A. BEKEY and R. B. McGHEE: Gradient methods for the optimization of dynamic system parameters by hybrid computation. In Computing Methods in Optimization Problems (Eds. BALAKRISHNAN and NEUSTADT). Academic Press, New York (1964).
[61] T. DISKIND: Parameter evaluation by complex domain analysis. Preprint JACC, 770-781 (1969).
[62] R. E. KALMAN: Mathematical description of linear dynamical systems. SIAM J. Control 1, 152-192 (1963).
[63] B. L. HO and R. E. KALMAN: Effective construction of linear state-variable models from input/output functions. Regelungstechnik 14, 545-548 (1966).
[64] R. E. KALMAN, P. L. FALB and M. A. ARBIB: Topics in Mathematical System Theory. McGraw-Hill, New York (1969).
[65] R. W. KOEPCKE: Private communication (1963).
[66] K. Y. WONG, K. M. WIIG and E. J. ALLBRITTON: Computer control of the Clarksville cement plant by state space design method. IEEE Cement Industry Conf., St. Louis, Mo. (1968).
[67] I. H. ROWE: A statistical model for the identification of multivariable stochastic systems. Proc. IFAC symp. on multivariable systems, Düsseldorf (1968).
[68] W. G. TUEL, JR.: Canonical forms for linear systems—I and II. IBM Research report, R.J. 375 and 390 (1966).
[69] D. G. LUENBERGER: Canonical forms for linear multivariable systems (short paper). IEEE Trans. Aut. Control AC-12, 290-293 (1967).
[70] E. J. DAVISON: A new method for simplifying large linear dynamic systems (correspondence). IEEE Trans. Aut. Control AC-13, 214-215 (1968).
[71] T. KAILATH: An innovations approach to least-squares estimation. Part I: Linear filtering in additive white noise. IEEE Trans. Aut. Control AC-13, 646-655 (1968).
[72] K. J. ÅSTRÖM: Introduction to Stochastic Control Theory. Academic Press, New York (1970).
[73] R. K. MEHRA: Identification of stochastic linear dynamic systems. Proc. 1969 IEEE symposium on adaptive processes, Pennsylvania State University (1969).
[74] K. J. ÅSTRÖM, T. BOHLIN and S. WENSMARK: Automatic construction of linear stochastic dynamic models for industrial processes with random disturbances using operating records. Report, IBM Nordic Laboratory, Stockholm (1965).
[75] J. EATON: Identification for control purposes. Paper, IEEE Winter meeting, N.Y. (1967).
[76] R. L. KASHYAP: Maximum likelihood identification of stochastic linear systems. IEEE Trans. Aut. Control AC-15, 25-34 (1970).
[77] T. BOHLIN: Real-time estimation of time-variable processes characteristics. Report, IBM Nordic Lab., Lidingö, Sweden (1968).
[78] I. GUSTAVSSON: Maximum likelihood identification of dynamics of the Agesta reactor and comparison with results of spectral analysis. Report 6903, Division of Automatic Control, Lund Institute of Technology, Lund, Sweden (1969).
[79] H. RAKE: The determination of frequency responses from stochastic signals by means of an electronic analog computer (in German). Regelungstechnik 16, 258-260 (1968).
[80] E. WELFONDER: Stochastic signals; statistical averaging under real conditions (in German). Fortschr. Ber. VDI-Z 8, 165 (1969).
[81] H. BUCHTA: Method of estimation of random errors in the determination of correlation functions of random infralow frequency signals in control systems. Proc. fourth IFAC congress, Warsaw, paper 33.4 (1969).
[82] N. HAYASHI: On a method of correlation analysis for multivariate systems. Proc. fourth IFAC congress, Warsaw, paper 33.1 (1969).
[83] K. N. REID: Lateral dynamics of an idealized moving web. Preprint JACC, 955-963 (1969).
[84] K. N. REID: Lateral dynamics of a real moving web. Preprint JACC, 964-976 (1969).
[85] H. G. STASSEN: The polarity coincidence correlation technique—a useful tool in the analysis of human operator dynamics. IEEE Trans. Man-Machine Syst. MMS-10, 34-39 (1969).
[86] K. GERDIN: Markov modelling via correlation analysis applied to identification of a paper machine in the absence of test signal. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 2.8 (1970).
[87] J. R. WESTLAKE: A Handbook of Numerical Matrix Inversion and Solution of Linear Equations. John Wiley, New York (1968).
[88] C. M. WOODSIDE: Estimation of the order of linear systems. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 1.7 (1970).
[89] T. W. ANDERSON: The choice of the degree of a polynomial regression as a multiple decision problem. Ann. math. stat. 33, 255-265 (1962).
[90] H. WOLD: A study in the analysis of stationary time series. Uppsala, Almqvist & Wiksell (1938).
[91] W. GERSCH: Spectral analysis of e.e.g.'s by autoregressive decomposition of time series. Report, Center of Applied Stochastics, School of Aeronautics, Purdue University; Math. Biosci. 7, 205-222 (1970).
[92] K. J. ÅSTRÖM: Computer control of a paper machine—an application of linear stochastic control theory. IBM J. Res. Dev. 11, 389-405 (1967).
[93] D. W. CLARKE: Generalized-least-squares estimation of the parameters of a dynamic model. IFAC symp. Identification in autom. control systems, Prague, paper 3.17 (1967).
[94] K. STEIGLITZ and L. E. McBRIDE: A technique for the identification of linear systems (short paper). IEEE Trans. Aut. Control AC-10, 461-464 (1965).
[95] A. E. ROGERS and K. STEIGLITZ: Maximum likelihood estimation of rational transfer function parameters (short paper). IEEE Trans. Aut. Control AC-12, 594-597 (1967).
[96] V. PANUSKA: On the maximum likelihood estimation of rational pulse transfer function parameters (correspondence). IEEE Trans. Aut. Control AC-13, 304 (1968).
[97] K. T. WOO: Maximum likelihood identification of noisy systems. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 3.1 (1970).
[98] K. J. ÅSTRÖM: On the achievable accuracy in identification problems. Proc. IFAC symp. Identification in autom. control systems, Prague, paper 1.8 (1967).
[99] T. BOHLIN: On the problem of ambiguities in maximum likelihood identification. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 3.2 (1970). Also see this issue.
[100] I. GUSTAVSSON: Comparison of different methods for identification of linear models for industrial processes. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 11.4 (1970).
[101] L. H. ZETTERBERG: Estimation of parameters for a linear difference equation with application to EEG analysis. Math. Biosci. 5, 227-275 (1969).
[102] M. G. KENDALL and A. STUART: The Advanced Theory of Statistics, Vol. II. Griffin, London (1961).
[103] K. Y. WONG and E. POLAK: Identification of linear discrete time systems using the instrumental variable method. IEEE Trans. Aut. Control AC-12, 707-718 (1967).
[104] V. PETERKA and K. ŠMUK: On-line estimation of dynamic model parameters from input-output data. Proc. fourth IFAC congress, Warsaw, paper 26.1 (1969).
[105] T. C. HSIA and D. A. LANDGREBE: On a method for estimating power spectra. IEEE Trans. Instrum. Meas. IM-16, 255-257 (1967).
[106] M. I. VUŠKOVIĆ, S. P. BINGULAC and M. DJOROVIĆ: Applications of the pseudo sensitivity functions in linear dynamic systems identification. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 2.5 (1970).
[107] M. J. LEVIN: Estimation of the characteristics of linear systems in the presence of noise. Eng. Sc.D. dissertation, Dept. EE, Columbia University, New York (1959).
[108] M. AOKI and P. C. YUE: On certain convergence questions in system identification. SIAM J. Control 8, 239-256 (1970).
[109] F. W. SMITH: System Laplace-transform estimation from sampled data. IEEE Trans. Aut. Control AC-13, 37-44 (1968).
[110] V. PETERKA and A. HALOUSKOVÁ: Tally estimate of Åström model for stochastic systems. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 2.3 (1970).
[111] W. OCTROI, A. RAULT and H. WEINGARTEN: A new identification algorithm for linear discrete time systems. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 2.9 (1970).
[112] D. GRAUPE, B. H. SWANICK and G. R. CASSIR: Reduction and identification of multivariable processes using regression analysis (short paper). IEEE Trans. Aut. Control AC-13, 564-567 (1968).
[113] P. EYKHOFF: Some fundamental aspects of process-parameter estimation. IEEE Trans. Aut. Control AC-8, 347-357 (1963).
[114] P. ALPER: A consideration of the discrete Volterra series. IEEE Trans. Aut. Control AC-10, 322-327 (1965).
[115] R. J. ROY and J. SHERMAN: A learning technique for Volterra series representation (short paper). IEEE Trans. Aut. Control AC-12, 761-764 (1967).
[116] A. G. IVAKHNENKO: Heuristic self-organization in problems of engineering cybernetics. Automatica 6, 207-219 (1970).
[117] A. G. IVAKHNENKO and JU. V. KOPPA: The group method of data handling for the solution of the various interpolation problems of cybernetics. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 2.1 (1970).
[118] R. E. BUTLER and E. V. BOHN: An automatic identification technique for a class of non-linear systems (short paper). IEEE Trans. Aut. Control AC-11, 292-296 (1966).
[119] K. S. NARENDRA and P. G. GALLMAN: An iterative method for the identification of non-linear systems using a Hammerstein model (short paper). IEEE Trans. Aut. Control AC-11, 546-550 (1966).
[120] D. W. MARQUARDT: An algorithm for least-squares estimation of non-linear parameters. J. Soc. ind. appl. math. 11, 431-441 (1963).
[121] R. E. BELLMAN and R. E. KALABA: Quasilinearization and Non-linear Boundary-value Problems. Elsevier, New York (1965).
[122] J. D. BUELL, H. H. KAGIWADA and R. E. KALABA: A proposed computational method for estimation of orbital elements, drag coefficients and potential fields parameters from satellite measurements. Ann. Géophys. 23, 35-39 (1967).
[123] J. D. BUELL and R. E. KALABA: Quasilinearization and the fitting of non-linear models of drug metabolism to experimental kinetic data. Math. Biosci. 5, 121-132 (1969).
[124] A. VANĚČEK and J. FESSL: A contribution to parameters and state estimation and synthesis via quasilinearization. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 2.6 (1970).
[125] L. W. TAYLOR, K. W. ILIFF and B. G. POWERS: A comparison of Newton-Raphson and other methods for determining stability derivatives from flight data. Paper, AIAA third Flight Test Simulation and Support Conf., Houston, Texas (1969).
[126] E. P. MASLOV: Application of the theory of statistical decisions to the estimation of object parameters. Automn Remote Control 24, 1214-1226 (1963).
[127] C. A. GALTIERI: The problem of identification of discrete time processes. Report, Electronics Research Lab., University of California, Berkeley (1963).
[128] R. B. MCGHEE and R. B. WALFORD: A Monte Carlo approach to the evaluation of conditional expectation parameter estimates for non-linear dynamic systems. IEEE Trans. Aut. Control AC-13, 29-37 (1968).
[129] Y. GENIN: A note on linear minimum variance estimation problems (correspondence). IEEE Trans. Aut. Control AC-13, 103 (1968).
[130] H. P. WHITAKER et al.: Design of a model reference adaptive control system for aircraft. Report R-164, Instrumentation Lab., MIT, Boston (1958).
[131] S. KACZMARZ: Approximate solution of systems of linear equations (in German). Bull. Int. Acad. polon. Sci., Cl. Sci. Math. Nat. Ser. A, 355-357 (1937).
[132] J. NAGUMO and A. NODA: A learning method for system identification. IEEE Trans. Aut. Control AC-12, 282-287 (1967).
[133] P. R. BÉLANGER: Comments on "A learning method for system identification" (correspondence). IEEE Trans. Aut. Control AC-13, 207-208 (1968).
[134] J. RICHALET, F. LECAMUS and P. HUMMEL: New trends in identification: structural distance and weak topology. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 1.2 (1970).
[135] L. A. CRUM and S. H. WU: Convergence of the quantizing learning method for system identification (correspondence). IEEE Trans. Aut. Control AC-13, 297-298 (1968).
[136] H. F. MEISSINGER and G. A. BEKEY: An analysis of continuous parameter identification methods. Simulation 6, 95-102 (1966).
[137] R. E. ROSE and G. M. LANCE: An approximate steepest descent method for parameter identification. Proc. JACC, 483-487 (1969).
[138] P. M. LION: Rapid identification of linear and non-linear systems. Preprint JACC, Washington (1966).
[139] B. SHACKCLOTH and R. L. BUTCHART: Synthesis of model reference adaptive systems by Liapunov's second method. IFAC symp. theory of self-adaptive control systems, Teddington, 145-152 (1965).
[140] P. C. PARKS: Liapunov redesign of model reference adaptive control systems. IEEE Trans. Aut. Control AC-11, 362-367 (1966).
[141] R. P. WISHNER, J. A. TABACZYNSKI and M. ATHANS: A comparison of three non-linear filters. Automatica 5, 487-496 (1969).
[142] J. S. PAZDERA and H. J. POTTINGER: Linear system identification via Liapunov design techniques. Preprints JACC, 795-801 (1969).
[143] H. J. KUSHNER: On the convergence of Lion's identification method with random inputs. Report, Division of Applied Mathem., Brown University, R.I. (1969).
[144] V. M. POPOV: Absolute stability of non-linear systems of automatic control. Automn Remote Control 22, 857-875 (1962).
[145] G. ZAMES: On the input-output stability of time-varying non-linear feedback systems. IEEE Trans. Aut. Control AC-11, 228-238 (1966).
[146] I. D. LANDAU: An approach to synthesis of model reference adaptive control systems. Report, Alsthom-Recherches; to appear in IEEE Trans. Aut. Control (1970).
[147] A. KLINGER: Prior information and bias in sequential estimation (correspondence). IEEE Trans. Aut. Control AC-13, 102-103 (1968).
[148] P. C. YOUNG: An instrumental variable method for real-time identification of a noisy process. Proc. fourth IFAC congress, Warsaw, paper 26.6. Also Automatica 6, 271-287 (1970).
[149] J. F. LEATHRUM: On-line recursive estimation. Proc. JACC, 173-178 (1969).
[150] S. STANKOVIĆ, D. VELAŠEVIĆ and M. ĆARAPIĆ: An approach to on-line discrete-time system identification. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 4.2 (1970).
[151] K. G. OZA and E. I. JURY: System identification and the principle of contraction mapping. SIAM J. Control 6, 244-257 (1968).
[152] K. G. OZA and E. I. JURY: Adaptive algorithms for identification problem. Proc. fourth IFAC congress, Warsaw, paper 26.4. Also Automatica 6, 795-799 (1970).
[153] A. E. ALBERT and L. A. GARDNER JR.: Stochastic Approximation and Non-linear Regression. MIT Press, Cambridge (Mass.) (1967).
[154] YA. Z. TSYPKIN: Adaptation, learning and self-learning in control systems. Proc. third IFAC congress, London, survey paper; also: Adaptation, training and self-organization in automatic systems, Automn Remote Control 27, 16-51, 160 lit. ref. (1966).
[155] D. J. SAKRISON: The use of stochastic approximation to solve the system identification problem. IEEE Trans. Aut. Control AC-12, 563-567 (1967).
[156] G. N. SARIDIS and G. STEIN: Stochastic approximation algorithms for linear discrete-time system identification. IEEE Trans. Aut. Control AC-13, 515-523 (1968).
[157] G. N. SARIDIS and G. STEIN: A new algorithm for linear system identification (correspondence). IEEE Trans. Aut. Control AC-13, 592-594 (1968).
[158] J. K. HOLMES: System identification from noise-corrupted measurements. J. Optimizn Theory Appl. 2, 103-116 (1968).
[159] D. F. ELLIOTT and D. D. SWORDER: Applications of a simplified multidimensional stochastic approximation algorithm. Proc. JACC, 148-154 (1969).
[160] C. B. NEAL and G. A. BEKEY: Estimation of the parameters of sampled data systems by means of stochastic approximation. Preprints 1969 Joint autom. control conf. (sess. VI-E), 977-978 (1969). C.f. also: C. B. NEAL: Estimation of the parameters of sampled data systems by stochastic approximation, Univ. of South Calif., Electronic Sciences Lab., techn. report (1969).
[161] T. BOHLIN: Information pattern for linear discrete-time models with stochastic coefficients. IEEE Trans. Aut. Control AC-15, 104-106 (1970).
[162] T. BOHLIN: On the maximum likelihood method of identification. IBM J. Res. Develop. 14, 41-51 (1970).
[163] J. WIESLANDER: Real time identification, Parts I and II. Reports, Division of Automatic Control, Lund Institute of Technology, Lund, Sweden (1969).
[164] B. SEGERSTÅHL: A theory of parameter tracking applied to slowly varying non-linear systems. Proc. fourth IFAC congress, Warsaw, paper 5.4 (1969).
[165] A. P. SAGE and G. W. HUSA: Adaptive filtering with unknown prior statistics. Proc. JACC, 760 (1969).
[166] J. W. BERKOVEC: A method for the validation of Kalman filter models. Proc. JACC, 488-493 (1969).
[167] J. D. PEARSON: Deterministic aspects of the identification of linear discrete dynamic systems (short paper). IEEE Trans. Aut. Control AC-12, 764-766 (1967).
[168] T. C. HSIA and V. VIMOLVANICH: An on-line technique for system identification (short paper). IEEE Trans. Aut. Control AC-14, 92-96 (1969).
[169] J. B. FARISON: Parameter identification for a class of linear discrete systems (correspondence). IEEE Trans. Aut. Control AC-12, 109 (1967).
[170] R. S. BUCY: Non-linear filtering theory. IEEE Trans. Aut. Control AC-10, 198 (1965).
[171] A. N. SHIRYAEV: On stochastic equations in the theory of conditional Markov processes. Theory of Probability and its Applications, U.S.S.R. 11, 179 (1966).
[172] H. J. KUSHNER: Non-linear filtering: the exact dynamical equations satisfied by the conditional mode. IEEE Trans. Aut. Control AC-12, 262-267 (1967).
[173] R. L. STRATONOVICH: Conditional Markov processes. Theory Probab. Appl. 5, 156-178 (1967).
[174] W. M. WONHAM: Some applications of stochastic differential equations to non-linear filtering. SIAM J. Control 2, 347-369 (1964).
156 K . J . ASTROM and P. EYKHOFF

[175] R. BASS and L. SCHWARTZ: Extensions to multichannel non-linear filtering. Hughes report SSD 604220R (1966).
[176] M. NISHIMURA, K. FUJII and Y. SUZUKI: On-line estimation and its application to an adaptive control system. Proc. fourth IFAC congress, Warsaw, paper 26.3 (1969).
[177] Y. SUNAHARA: An approximate method of state estimation for non-linear dynamical systems. Proc. JACC, 161-172 (1969).
[178] H. J. KUSHNER: Approximations to optimal non-linear filters. Preprints JACC, Philadelphia, Pa., 613-623 (1967).
[179] A. JAZWINSKI: Filtering for non-linear dynamical systems. IEEE Trans. Aut. Control AC-11, 765 (1966).
[180] A. H. JAZWINSKI: Adaptive filtering. Automatica 5, 475-485 (1969).
[181] PH. DE LARMINAT and M. H. TALLEC: On-line identification of time varying systems. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 3.3 (1970).
[182] H. COX: On the estimation of state variables and parameters for noisy dynamic systems. IEEE Trans. Aut. Control AC-9, 5-12 (1964).
[183] L. SCHWARTZ and E. B. STEAR: A computational comparison of several non-linear filters (short paper). IEEE Trans. Aut. Control AC-13, 83-86 (1968).
[184] S. R. NEAL: Non-linear estimation techniques (short paper). IEEE Trans. Aut. Control AC-13, 705-708 (1968).
[185] R. E. KOPP and R. J. ORFORD: Linear regression applied to system identification for adaptive control systems. AIAA Journal, 2300-2306 (1963).
[186] L. J. HABEGGER and R. E. BAILEY: Minimum variance estimation of parameters and states in nuclear power systems. Proc. fourth IFAC congress, Warsaw, paper 12.2 (1969).
[187] C. H. WELLS: Application of the extended Kalman estimator to the non-linear well stirred reactor problem. Proc. JACC, 473-482 (1969).
[188] V. A. SASTRY and W. J. VETTER: The application of identification methods to an industrial process. Preprints JACC, 787-794 (1969).
[189] A. BAGGERUD and J. G. BALCHEN: An adaptive state estimator. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 10.1 (1970).
[190] A. J. W. VAN DEN BOOM and J. H. A. M. MELIS: A comparison of some process parameter estimating schemes. Proc. fourth IFAC congress, Warsaw, paper 26.5 (1969).
[191] J. CHERUY and A. MENENDEZ: A comparison of three dynamic identification methods. Preprints JACC, 982-983 (1969).
[192] K. ŠMUK: Comparison of some algorithms for least squares updating. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 2.10 (1970).
[193] P. EYKHOFF: System Parameter and State Estimation. John Wiley, London; to appear (1971).
[194] A. P. SAGE and J. L. MELSA: System Identification. Academic Press, New York; to appear.
[195] YA. Z. TSYPKIN: Optimization, adaptation, and learning in automatic systems. Computer Inform. Sci. 2, 15-32 (1967).
[196] YA. Z. TSYPKIN: Self-learning--What is it? IEEE Trans. Aut. Control AC-13, 608-612 (1968).
[197] YA. Z. TSYPKIN: Generalized algorithms of learning. Automn Remote Control 31, 97-104 (1970).
[198] M. AOKI and R. M. STALEY: Some computational considerations in input signal synthesis problems. In Computing Methods in Optimization Problems--Vol. 2 (Eds. ZADEH, NEUSTADT, BALAKRISHNAN). Academic Press, New York (1969).
[199] K. J. ÅSTRÖM: Control problems in papermaking. Proc. IBM Scientific computing symposium on control theory and applications, Yorktown Heights, New York (1964).
[200] M. ATHANS, R. P. WISHNER and A. BERTOLINI: Suboptimal state estimation for continuous-time non-linear systems from discrete noisy measurements. IEEE Trans. Aut. Control AC-13, 504-514 (1968).
[201] C. BRUNI, A. ISIDORI and A. RUBERTI: Order and factorization of the impulse response matrix. Proc. fourth IFAC congress, Warsaw, paper 19.4 (1969).
[202] R. F. BROWN: Review and comparison of drift correction schemes for periodic cross correlation. Electron. Letters 5, 179-181 (1969).
[203] M. KOSZELNIK, J. MALKIEWICZ and ST. TRYBULA: A method to determine the transfer functions of power systems. Proc. fourth IFAC congress, Warsaw, paper 33.2 (1969).
[204] N. S. RAIBMAN, S. A. ANISIMOV and V. M. CHADEYEV: Linear plant type identification technique. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 1.6 (1970).
[205] R. S. BUCY: Canonical forms for multivariable systems (short paper). IEEE Trans. Aut. Control AC-13, 567-569 (1968).
[206] M. CUENOD and A. SAGE: Comparison of some methods used for process identification. IFAC symp. Identification in autom. control systems, Prague, survey paper; also in Automatica 4, 235-269 (1968).
[207] F. DONATI and M. MILANESE: System identification with approximated models. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 1.9 (1970).
[208] P. EYKHOFF, P. M. VAN DER GRINTEN, H. KWAKERNAAK and B. P. VELTMAN: Systems modelling and identification. Proc. third IFAC congress, London, survey paper (1966).
[209] R. C. K. LEE: Optimal Estimation, Identification and Control. MIT Press, Cambridge (Mass.) (1964).
[210] M. J. LEVIN: Optimum estimation of impulse response in the presence of noise. IRE Trans. Circuit Theory CT-7, 50-56 (1960).
[211] M. J. LEVIN: Estimation of system pulse transfer function in the presence of noise. Preprints JACC, Minneapolis, 452-458 (1963).
[212] D. V. LINDLEY: Introduction to Probability and Statistics, from a Bayesian Viewpoint. Cambridge Univ. Press, London (1965).
[213] N. E. NAHI: Optimal inputs for parameter estimation in dynamic systems with white observation noise. Proc. JACC, 506-513 (1969).
[214] V. PANUSKA: An adaptive recursive least-squares identification algorithm. Proc. 1969 IEEE symposium on adaptive processes, Pennsylvania State University (1969).
[215] N. S. RAIBMAN and O. F. HANS: Variance methods of identifying multidimensional non-linear controlled systems. Automn Remote Control 28, 695-714 (1967).
[216] N. S. RAIBMAN, S. A. ANISIMOV and F. A. HOVSEVIAN: Some problems of control plants identification. Proc. fourth IFAC congress, Warsaw, paper 5.1 (1969).
[217] I. H. ROWE: A bootstrap method for the statistical estimation of model parameters. Preprints JACC, 989-987 (1969).
[218] E. R. SCHULZ: Estimation of pulse transfer function parameters by quasilinearization (short paper). IEEE Trans. Aut. Control AC-13, 424-426 (1968).
[219] J. VALIŠ: On-line identification of multivariable linear systems of unknown structure from input-output data. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 1.5 (1970).
[220] W. M. WONHAM: Random differential equations in control theory. In Probabilistic Methods in Applied Mathematics (Ed. A. T. BHARUCHA-REID). Academic Press, New York (1969).
[221] N. S. RAIBMAN: What is system identification? (in Russian). Moscow (1970).
[222] P. CAINES: Maximum likelihood identification of multivariable systems. Ph.D. thesis, Imperial College, London (1970).
[223] R. L. STRATONOVICH: Conditional Markov Processes and Their Application to the Theory of Optimal Control. American Elsevier, New York (1968).
[224] K. J. ÅSTRÖM and S. WENSMARK: Numerical identification of stationary time series (1964).
[225] A. H. JAZWINSKI: Stochastic Processes and Filtering Theory. Academic Press, New York (1970).
[226] J. FISHER: Conditional probability density functions and optimal non-linear estimation. Ph.D. thesis, UCLA, Los Angeles (1966).
[227] R. E. MORTENSEN: Optimal control of continuous-time stochastic systems. Ph.D. thesis (engineering), University of California, Berkeley (1966).
[228] W. L. ROOT: On system measurement and identification. Proc. Symposium on system theory, Polytechnic Institute of Brooklyn, April 1965.
[229] R. T. PROSSER and W. L. ROOT: Determinable classes of channels. J. Math. Mech. 16, 365-398 (1966).
[230] W. L. ROOT: On the structure of a class of system identification problems. 2nd IFAC symp. Identification and process parameter estimation, Prague, paper 8.1 (1970).

APPENDIX A

A resumé of parameter estimation

As a tutorial resumé of the statistical methods the following example of utmost simplicity may suffice. Consider the situation of Fig. 8, where an estimate β̂ has to be found for the parameter b. This estimate has to be derived from a number of signal samples u(k) and y(k), where

    y(k) = bu(k) + n(k)                                          (A.1)

and where the average (expected) value of n(k) is zero.

FIG. 8. The simplest example of parameter estimation.

Using the least squares method the estimate is chosen in such a way that the loss function, defined as

    V(β) = Σ_k [y(k) − βu(k)]² = (y − βu)'(y − βu),

is a minimum. In Fig. 9 the differences between the observations y and the "predictions" βu are indicated.

FIG. 9. Interpretation of least-squares estimation.

The minimization can be pursued analytically; a necessary condition for the minimum is

    d/dβ Σ_k [y(k) − βu(k)]² = 0   for β = β̂                     (A.2)

or

    Σ_k u(k)[y(k) − β̂u(k)] = 0                                   (A.3)

or

    β̂ = Σ_k u(k)y(k) / Σ_k u²(k).                                (A.4)

β̂ is the optimal estimate under the conditions given. Note from (A.3) that the terms y(k) − β̂u(k) are weighted with respect to u(k); quite naturally, the larger the input signal, the more importance is assigned to the deviation between observation y(k) and "prediction" β̂u(k)! Equation (A.4) refers to the correlation methods. For the extension to the more-parameter case and to the generalized least-squares method c.f. section 5.

Using the maximum-likelihood method for the same case as before, we have to know p_n, the probability density function of n(k). In that case the measurement of u(k) provides us with the knowledge sketched in Fig. 10, the a priori probability density of y(k) with b as parameter. Now the measurement, or a posteriori knowledge, of y(k) brings us to the situation indicated in Fig. 11. The function L(y(k); β) is called the likelihood function.

FIG. 10. Interpretation of maximum-likelihood estimation; a priori knowledge.
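As a numerical illustration (a sketch with invented signal values, not data from the paper), the least-squares estimate (A.4) can be computed directly; for Gaussian n(k) the value of β that maximizes the likelihood L(y(k); β) coincides with this least-squares value, so the same few lines cover both recipes.

```python
# Sketch of the one-parameter estimate of Appendix A; the input signal,
# noise level and sample count are invented for illustration.
import random

random.seed(1)
b_true = 2.0                                            # the "unknown" parameter b
u = [random.uniform(-1.0, 1.0) for _ in range(200)]
y = [b_true * uk + random.gauss(0.0, 0.1) for uk in u]  # y(k) = b u(k) + n(k)   (A.1)

# Least-squares estimate (A.4): beta_hat = sum u(k)y(k) / sum u(k)^2
beta_hat = sum(uk * yk for uk, yk in zip(u, y)) / sum(uk * uk for uk in u)

# Normal equation (A.3): the u-weighted residuals sum to zero at beta_hat
check = sum(uk * (yk - beta_hat * uk) for uk, yk in zip(u, y))
```

With Gaussian noise the log-likelihood is −Σ_k[y(k) − βu(k)]²/(2σ²) plus a constant, so maximizing it reproduces (A.4); a Bayesian estimate would differ only through the prior p_b(b).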

FIG. 11. Interpretation of maximum-likelihood estimation; a posteriori knowledge.

We have to assign an estimate β̂ from this function L. A reasonable and popular choice is to take that value β̂ for which L(y(k); β) has its maximal value. Again this can be generalized to more-parameter cases.

Using Bayes' method for the same case, one needs, as before, p_n, but also the a priori probability density function p_b of b. Note that previously b was an unknown constant, and that now b is a random variable. From Bayes' rule

    p(b|y) = p(y, b)/p(y) = p(y|b)p(b)/p(y);

this can be interpreted as the probability density function of the parameter b, given the result of the measurement on y. This can be rewritten as

    p(y, b) = p(y − ub, b) = p_n(y − ub)p_b(b)

(the Jacobian of the transformation being 1), since b and n are statistically independent. p_b and p_n give a probability density function as indicated in Fig. 12. Note that this is the a priori knowledge, available before the measurement of u(k) and y(k).

FIG. 12. Interpretation of Bayes' estimation.

These measurements provide us with a "cut" through the probability-density function, from which the a posteriori probability function for b follows. This new probability function now may be used as the a priori knowledge for the following measurement. In this way the development of p_b with an increasing number of observations can be followed; c.f. Fig. 13. Note that p_n, the additive noise being stationary, does not change.

FIG. 13. Interpretation of Bayes' estimation.

The reader is invited to consider special cases like u(k) = 0 and u(k) → ∞. Again the method can be generalized to more parameters; its visualization has severe limitations, however. Note that in this case the knowledge on b is given in terms of p_b, a function. In practice the reduction from a function to a single value (estimate) can be done by using a cost or loss function, providing a minimum-cost or minimum-loss estimate.

The problem of input noise. The simple example of Fig. 8 also serves very well to illustrate the so-called "problem of input noise". Consider the system illustrated by the block diagram in Fig. 14, where neither the input nor the output can be observed exactly.

FIG. 14. The problem of input noise.

It is well known in statistics (see e.g. LINDLEY [212]) that, unless specific assumptions are made concerning the variations in u, n and v, it is not possible to estimate the dynamics of the process.
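This difficulty can be made concrete with a small simulation (illustrative numbers, not taken from the paper): when only a noisy input measurement is available, the straightforward least-squares formula underestimates b, while the "reverse" regression, which blames all errors on the input side, overestimates it, and the data alone cannot arbitrate between the two.

```python
# Sketch of the "problem of input noise": two equally defensible least-squares
# estimates bracket the true parameter. All numbers are invented.
import random

random.seed(2)
b_true, N = 2.0, 5000
u = [random.gauss(0.0, 1.0) for _ in range(N)]                    # true input (unobservable)
y = [b_true * u[k] + random.gauss(0.0, 0.5) for k in range(N)]    # output with noise n(k)
u_meas = [u[k] + random.gauss(0.0, 0.5) for k in range(N)]        # measured input with noise v(k)

suy = sum(u_meas[k] * y[k] for k in range(N))
# Assuming v(k) = 0: regression of y on the measured input (biased towards zero)
beta_low = suy / sum(x * x for x in u_meas)
# Assuming n(k) = 0: reverse regression (biased away from zero)
beta_high = sum(x * x for x in y) / suy
```

Here beta_low tends to b·σ_u²/(σ_u² + σ_v²), the classical attenuation, and beta_high errs in the opposite direction; without knowledge of the noise variances there is no way to prefer one over the other.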

Consider e.g.

    y(k) = bu(k) + n(k)
    ū(k) = u(k) + v(k).

Assume that v(k) and n(k) are independent stochastic variables with zero mean values. If v(k) = 0 we find that the estimate of b is given by (A.3)

    β̂ = Σ_k ū(k)y(k) / Σ_k ū(k)ū(k).

However, if n(k) = 0 we find by the same argument that the estimate of b is

    β̂ = Σ_k y(k)y(k) / Σ_k ū(k)y(k).

This corresponds to choosing β such that the differences between the observations ū(k) and the predictions y(k)/β are as small as possible in the least squares sense. See Fig. 15.

FIG. 15. The problem of input noise.

Without additional information it is, of course, impossible to tell which estimate to choose.

APPENDIX B

An example of least squares identification of a parametric model

To illustrate the least squares method, its applications and some numerical problems which might arise, we provide an example. The computations are carried out on input-output data from a known process.

In Fig. 16 we show input-output pairs which are generated by the equation

    y(k) − 1.5y(k−1) + 0.7y(k−2) = 1.0u(k−1) + 0.5u(k−2) + λe(k)          (A.5)

where {e(k)} is a sequence of independent normal (0, 1) random numbers generated by a pseudo-random generator. The following values of λ have been used: 0, 0.1, 0.5, 1.0 and 5.0.

FIG. 16. Input and output signals of the example.

In Table B.1 we show the results obtained when a model having the structure (B.1) is fitted to the generated input-output data using the least squares procedure. The estimates are calculated recursively for models of increasing order to illustrate the very typical situation in practice when the order of the model is not known. In Table B.1 we have shown the least squares parameter estimates and their estimated accuracy, the loss function and a conditioning number of the matrix Φ'Φ. The conditioning number μ = 2n max_{i,j}{(A)_{ij}} max_{i,j}{(A⁻¹)_{ij}}, with A = Φ'Φ, is chosen rather arbitrarily.

We will now analyse the result of Table B.1. Let us first consider the case of no disturbances, λ = 0. In this case we find that it is not possible to compute the third order model; we find that the matrix Φ'Φ is singular, as would be expected. The conditioning number is 1.3 × 10⁶. We also find that the estimated standard deviations of the second order model are zero.

To handle the numerical problems for a model of third order in a case like this, we must use numerical methods which do not require the inversion of Φ'Φ, e.g. the reduction of Φ to triangular form using a QR algorithm. A reduction based on a sequence of plane rotations has been pursued by PETERKA
TABLE B.1. LEAST SQUARES ESTIMATES OF THE PARAMETERS

True parameters: −a1 = 1.50, a2 = 0.70, a3 = 0.0, b1 = 1.0, b2 = 0.5, b3 = 0.0.

λ = 0.0
  order 1:  −â1 = 0.88 ± 0.03;  b̂1 = 1.23 ± 0.18;  λ̂ = 1.71;  V = 265.863;  μ = 59
  order 2:  −â1 = 1.50 ± 0.00;  â2 = 0.70 ± 0.00;  b̂1 = 1.00 ± 0.00;  b̂2 = 0.50 ± 0.00;  λ̂ = 0.00;  V = 0.000;  μ = 203
  order 3:  not computable (Φ'Φ singular);  μ = 1.3 × 10⁶

λ = 0.1
  order 1:  −â1 = 0.88 ± 0.03;  b̂1 = 1.19 ± 0.18;  λ̂ = 1.66;  V = 248.447;  μ = 60
  order 2:  −â1 = 1.50 ± 0.01;  â2 = 0.69 ± 0.01;  b̂1 = 0.99 ± 0.01;  b̂2 = 0.49 ± 0.02;  λ̂ = 0.11;  V = 0.987;  μ = 205
  order 3:  −â1 = 1.52 ± 0.11;  â2 = 0.73 ± 0.16;  â3 = 0.02 ± 0.08;  b̂1 = 0.99 ± 0.01;  b̂2 = 0.47 ± 0.11;  b̂3 = 0.02 ± 0.08;  λ̂ = 0.11;  V = 0.983;  μ = 35974

λ = 0.5
  order 1:  −â1 = 0.88 ± 0.03;  b̂1 = 1.10 ± 0.17;  λ̂ = 1.59;  V = 227.848;  μ = 63
  order 2:  −â1 = 1.48 ± 0.04;  â2 = 0.67 ± 0.03;  b̂1 = 0.96 ± 0.06;  b̂2 = 0.48 ± 0.07;  λ̂ = 0.53;  V = 24.558;  μ = 206
  order 3:  −â1 = 1.54 ± 0.11;  â2 = 0.76 ± 0.16;  â3 = 0.04 ± 0.08;  b̂1 = 0.96 ± 0.06;  b̂2 = 0.42 ± 0.12;  b̂3 = −0.04 ± 0.09;  λ̂ = 0.53;  V = 24.451;  μ = 1518

λ = 1.0
  order 1:  −â1 = 0.88 ± 0.03;  b̂1 = 1.00 ± 0.20;  λ̂ = 1.84;  V = 308.131;  μ = 75
  order 2:  −â1 = 1.47 ± 0.06;  â2 = 0.66 ± 0.06;  b̂1 = 0.93 ± 0.12;  b̂2 = 0.46 ± 0.14;  λ̂ = 1.06;  V = 99.863;  μ = 212
  order 3:  −â1 = 1.55 ± 0.11;  â2 = 0.82 ± 0.17;  â3 = 0.09 ± 0.09;  b̂1 = 0.93 ± 0.13;  b̂2 = 0.37 ± 0.16;  b̂3 = −0.02 ± 0.15;  λ̂ = 1.07;  V = 98.698;  μ = 476

λ = 5.0
  order 1:  −â1 = 0.86 ± 0.05;  b̂1 = 0.20 ± 0.83;  λ̂ = 7.55;  V = 5131.905;  μ = 520
  order 2:  −â1 = 1.48 ± 0.07;  â2 = 0.74 ± 0.08;  b̂1 = 0.98 ± 0.61;  b̂2 = 0.41 ± 0.61;  λ̂ = 5.29;  V = 2462.220;  μ = 1174
  order 3:  −â1 = 1.55 ± 0.11;  â2 = 0.87 ± 0.18;  â3 = 0.09 ± 0.09;  b̂1 = 0.97 ± 0.64;  b̂2 = 0.24 ± 0.66;  b̂3 = 0.13 ± 0.64;  λ̂ = 5.32;  V = 2440.245;  μ = 2031

and ŠMUK [104]. They have shown that this calculation can also be done recursively in the order of the system.

Proceeding to the case of λ = 0.1, i.e. the standard deviation of the disturbances is one tenth of the magnitude of the input signals, we find that the matrix Φ'Φ is still badly conditioned when a third order model is computed. Analysing the details we find, however, that the Gauss-Jordan method gives a reasonably accurate inverse of Φ'Φ. Pre- and postmultiplying the matrix with its computed inverse, we find that the largest off-diagonal element is 0.011 and the largest deviation of the diagonal elements from 1.000 is 0.0045. We also find that the estimates â3 and b̂3 do not differ significantly from zero.

We will also discuss some other ways to find the order of the system. We can e.g. consider the variances of the parameters. We find e.g. from Table B.1 that the coefficients â3 and b̂3 do not differ significantly from zero in any case. In Table B.2 we also summarize the values of the loss function as well as the values of the test variable (58) when testing the reduction of the loss function for a model of order n1 compared to a model of order n2, as was discussed before.

TABLE B.2. THE VALUES OF THE LOSS FUNCTION V, THE CONDITIONING NUMBER μ OF Φ'Φ AND A TABLE OF t-VALUES WHEN IDENTIFYING MODELS OF DIFFERENT ORDER TO THE EXAMPLE OF FIG. 16

case 1, λ = 0
  n    V          μ       t-values:  n2 = 1     2
  0    2666.23       0    n1 = 0     442.4      ∞
  1    285.86       59    n1 = 1                ∞
  2    0.00        205

case 2, λ = 0.1
  n    V          μ       t-values:  n2 = 1     2       3
  0    2644.15       0    n1 = 0     472        57000   42100
  1    248.447      60    n1 = 1                12000   5900
  2    0.587       205    n1 = 2                        0.191
  3    0.983     35974

case 3, λ = 0.5
  n    V          μ       t-values:  n2 = 1     2       3       4
  0    2701.35       0    n1 = 0     532        2616    1715    1283
  1    227.848      63    n1 = 1                397     195     130
  2    24.558      206    n1 = 2                        0.2     0.1
  3    24.451     1518    n1 = 3                                0.8
  4    24.006     2982

case 4, λ = 1.0
  n    V          μ       t-values:  n2 = 1     2       3       4       5
  0    3166.78       0    n1 = 0     455        734     487     365     292
  1    308.131     212    n1 = 1                100     50      33      25
  2    99.863      476    n1 = 2                        0.55    0.72    0.80
  3    98.698      873    n1 = 3                                0.88    0.92
  4    96.813     1351    n1 = 4                                        0.96
  5    94.800     1647

case 5, λ = 5.0
  n    V          μ       t-values:  n2 = 1     2       3       4       5
  0    21467.64      0    n1 = 0     156        185     122     92      75
  1    5131.905    520    n1 = 1                52      26      18      14
  2    2462.220   1174    n1 = 2                        0.21    0.94    1.2
  3    2440.245   2031    n1 = 3                                1.2     1.5
  4    2375.624   2910    n1 = 4                                        1.6
  5    2290.730   3847

We have

    F(2, 100) = 3.09  ⇒  P{t > 3.09} = 0.05.

Hence at the 5 per cent risk level we find in all cases that the loss function is significantly reduced when the order of the system is increased from 1 to 2, but that the reductions in the loss function by a further increase of the order are not significant.

We thus find that by applying the F-test in this case we get as a result that the system is of second order for all samples. The actual parameter values of Table B.1 as well as the estimated accuracies give an indication of the accuracy that can be obtained in a case like this.

It should, however, be emphasized that when the same procedure is applied to practical data the results are very seldom as clearcut as in this simulated example. See e.g. GUSTAVSSON [100].

Résumé--The field of identification and process parameter estimation has developed rapidly during the past decade. In this review the state of the art of this science is presented in a systematic way. Attention is paid to the general properties and to the classification of identification problems. Model structures are discussed; their choice depends on the purpose of the identification and on the information available a priori. For the identification of models that are linear in the parameters, the review explains the least squares method and several of its variants which can solve the problem of correlated residuals, such as repeated and generalized least squares, the maximum likelihood method, the instrumental variable method and the tally principle. Recently the non-linear situation, on-line identification and real-time identification have given rise to considerable developments, which are described in a coherent manner. More than 230 references are given, mostly concerning recent publications. One appendix contains a resumé of the principles of parameter estimation and another gives a more detailed exposition of an example of least squares estimation.

Zusammenfassung--The field of identification and process parameter estimation has developed very rapidly in the last decade. In this survey the state of this branch of science is presented systematically. Attention is given to general properties and to the classification of identification problems. Model structures are discussed; their choice hinges on the purpose of the identification and on the available a priori knowledge. For the identification of models which are linear in the parameters, the survey characterizes the least squares method and several of its variants that address the problem of correlated equation errors, namely the repeated and generalized least squares methods, the maximum likelihood method and the "tally" principle. Recently the non-linear situation and on-line and real-time identification have prompted extensive developments, which are reported coherently. More than 230 references are given, mostly referring to recent works. In appendices a summary of the principles of parameter estimation and a more thoroughly treated example of an estimate by the method of least squares are presented.

Резюме--The field of identification and estimation of process parameters has developed rapidly during the past decade. In this survey the present state of this science is presented systematically. Attention is paid to the general properties and to the classification of identification problems. Model structures are discussed; their choice depends on the purpose of the identification and on the previously available information. For the identification of models linear in the parameters, the survey explains the method of least squares and some of its variants able to solve the problem of correlated residuals, such as repeated least squares, the maximum likelihood method, the instrumental variable method and the tally principle. Recently the non-linear situation, identification in the loop and identification in real time have led to significant developments which are described in a systematic way. More than 230 references are given, relating mainly to recent publications. The appendices contain a summary of the principles of parameter estimation and a more detailed description of an example of estimation by means of least squares.
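The simulation experiment of Appendix B can be re-created in outline. The sketch below generates data from (A.5) with λ = 0.1 and fits the second-order model by solving the normal equations (Φ'Φ)θ = Φ'y with plain Gauss-Jordan elimination; the input choice, sample size and random sequences are assumptions of this sketch, so the exact figures of Tables B.1 and B.2 are not reproduced, only their qualitative pattern.

```python
# Sketch of the Appendix B experiment: simulate (A.5) and fit a second-order
# model by least squares. Input signal and sample size are assumptions.
import random

random.seed(3)
lam, N = 0.1, 1000
u = [random.choice([-1.0, 1.0]) for _ in range(N)]   # binary test input (assumption)
y = [0.0, 0.0]
for k in range(2, N):   # y(k) = 1.5 y(k-1) - 0.7 y(k-2) + u(k-1) + 0.5 u(k-2) + lam e(k)
    y.append(1.5 * y[k - 1] - 0.7 * y[k - 2] + 1.0 * u[k - 1]
             + 0.5 * u[k - 2] + lam * random.gauss(0.0, 1.0))

# Regressors phi(k) = [y(k-1), y(k-2), u(k-1), u(k-2)], target y(k)
rows = [[y[k - 1], y[k - 2], u[k - 1], u[k - 2]] for k in range(2, N)]
rhs = y[2:]
p = 4
A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]  # Phi'Phi
g = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(p)]            # Phi'y

for i in range(p):                     # Gauss-Jordan elimination; no pivoting is
    piv = A[i][i]                      # needed since Phi'Phi is positive definite
    A[i] = [a / piv for a in A[i]]
    g[i] /= piv
    for r in range(p):
        if r != i:
            f = A[r][i]
            A[r] = [a - f * ai for a, ai in zip(A[r], A[i])]
            g[r] -= f * g[i]

theta = g   # estimates of the true coefficients (1.5, -0.7, 1.0, 0.5)
```

With λ = 0.1 the estimates land close to the true values, mirroring the second-order rows of Table B.1; as λ grows, the estimated standard deviations grow correspondingly.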