
05/10/17

Time-domain output-error identification


Marco Lovera
Dipartimento di Scienze e Tecnologie Aerospaziali, Politecnico di Milano
The output-error method

Model class:

ẋ(t) = f(x(t), u(t), θ),  x(0) = x₀
y(t) = h(x(t), u(t), θ)

The only uncertainty is in the measurement: there is no randomness in the dynamics, and f, h can be nonlinear functions.

Assumptions:

• x₀ is the initial state; if it is not known, it is also estimated.

• y is a scalar measurement (to keep things simple).

• u(t) is piece-wise constant with period Ts.

• The true system belongs to the model set, or equivalently there is a true parameter value θ° that reproduces it.


Measurement model:

• Measurements are discrete (not really necessary, but it makes things easier).

• Sampling is uniform and defined by t_k = k·Ts, k = 0, 1, ..., N−1.

• Measurement equation:

  y_m(k) = y(k) + v(k)

  where y(k) is the true value of the output and v(k) is the measurement noise: the deterministic (expected) value is only affected by the error.

• v(k) is a Gaussian random variable with zero mean and variance σ² (which, in general, can be a function of k), uncorrelated at different time instants.


• Under the previous assumptions the samples of the measured output are such that

  E[y_m(k)] = y(k) + E[v(k)] = y(k)

  by the zero-mean assumption on v, and as y(k) is a deterministic sequence.

• In terms of variance we have Var[y_m(k)] = Var[v(k)] = σ².

• Therefore y_m(k) ~ N(y(k), σ²), and the samples are independent — but this must be proven.


• We have to check independence of the measurements, which, under Gaussianity assumptions, reduces to checking uncorrelation:

  E[(y_m(i) − E[y_m(i)])(y_m(j) − E[y_m(j)])] = 0 for i ≠ j.

• Expanding the product:

  E[(y_m(i) − y(i))(y_m(j) − y(j))] = E[v(i)v(j)].

• And recalling that samples of y are deterministic, while v is zero-mean and uncorrelated at different time instants:

  E[v(i)v(j)] = 0 for i ≠ j.

• Therefore the measurements are uncorrelated and, being Gaussian, independent.


• Block diagram: the input u drives the model, whose deterministic output is compared with the noisy measured output; their difference is the output error.

• Hence the name output-error for this set-up:


• Only measurement noise is considered
• No disturbances acting on the plant are included in the
model.
• In other words, the map from u to y is deterministic.


The joint density of the data can then be written as

f(y_m(0), ..., y_m(N−1)) = ∏_{k=0}^{N−1} (1/√(2πσ²)) exp(−(y_m(k) − y(k))² / (2σ²)),

so the logarithm of the joint density is

log f = −(N/2) log(2πσ²) − (1/(2σ²)) Σ_{k=0}^{N−1} (y_m(k) − y(k))²,

where y(k) = y(k, θ) depends on the parameters.


• Therefore the log-likelihood can be obtained by plugging the measurements in place of the running variables, to get

  log L(θ) = −(N/2) log(2πσ²) − (1/(2σ²)) Σ_{k=0}^{N−1} (y_m(k) − ŷ(k, θ))².

• Note now that maximizing the log-likelihood is equivalent to the minimisation of

  J(θ) = Σ_{k=0}^{N−1} (y_m(k) − ŷ(k, θ))².

  If ŷ is the output of a nonlinear system it is not possible to compute the minimum in closed form; we need to do it numerically.
Marco Lovera
The output-error method

• Note that defining the output error

  ε(k, θ) = y_m(k) − ŷ(k, θ)

  we have that J(θ) = Σ_{k=0}^{N−1} ε(k, θ)².

• Therefore the cost function is equal to the sum of the squares of the deviations between the measured outputs and the model outputs.

• This is a particular case of ML estimation known as Least Squares (LS) estimation.
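As a minimal sketch (assuming a hypothetical simulate_model routine that returns the sampled model output for given parameters), the cost can be coded directly:

```matlab
% Minimal sketch of the OE / LS cost: sum of squared output errors.
% simulate_model is a hypothetical user-supplied function returning the
% N-vector of sampled model outputs for parameter vector theta.
function J = oe_cost(theta, u, ym, Ts)
    yhat = simulate_model(theta, u, Ts);  % model response to the sampled input
    e = ym - yhat;                        % output error eps(k, theta)
    J = sum(e.^2);                        % least-squares cost J(theta)
end
```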

• Note further that under the assumption that the true system belongs to the model set, i.e. y(k) = ŷ(k, θ°) for a true parameter value θ°, we have that

  ε(k, θ) = v(k) + [ŷ(k, θ°) − ŷ(k, θ)],

  i.e. the output error is the sum of the measurement error and the discrepancy between true and model system.

• Therefore if θ̂ → θ° then ε(k, θ̂) → v(k) and the cost converges to J(θ°) = Σ_k v(k)².

• The optimal cost is zero only in the noise-free case.


An optimization scheme for the OE method

Consider a starting value θ₀ for the parameter (the initial guess) and a perturbation Δθ.

Taking a second-order approximation of the cost,

J(θ₀ + Δθ) ≈ J(θ₀) + (∂J/∂θ|_{θ₀})ᵀ Δθ + (1/2) Δθᵀ (∂²J/∂θ²|_{θ₀}) Δθ,

and imposing stationarity with respect to Δθ, we get

∂J/∂θ|_{θ₀} + (∂²J/∂θ²|_{θ₀}) Δθ = 0

(the gradient is a vector, the hessian a matrix, the increment a vector).


Solving for the increment in the parameter we get

Δθ = −(∂²J/∂θ²|_{θ₀})⁻¹ ∂J/∂θ|_{θ₀}.

Therefore, if the cost is truly quadratic, then starting from the initial guess we find the minimum in one iteration.


If the cost is not quadratic we can use this result to set up an iterative optimisation scheme:

θ_{i+1} = θ_i − (∂²J/∂θ²|_{θ_i})⁻¹ ∂J/∂θ|_{θ_i}.

Iteration is repeated until convergence of the cost function, |J(θ_{i+1}) − J(θ_i)| < tolerance, and/or convergence of the parameter, ‖θ_{i+1} − θ_i‖ < tolerance, is reached.

This simple iterative scheme is known as the Newton-Raphson method.

Other possible convergence criteria include:

• Relative rather than absolute changes in cost and/or parameters.

• Gradient of the cost sufficiently close to zero.
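A sketch of the iteration with the stopping tests above, reusing the oe_cost function sketched earlier (cost_gradient and cost_hessian are hypothetical routines returning ∂J/∂θ and ∂²J/∂θ²):

```matlab
% Newton-Raphson iteration sketch with simple stopping criteria.
% cost_gradient / cost_hessian are hypothetical routines returning
% dJ/dtheta (column vector) and d2J/dtheta2 (symmetric matrix).
theta = theta0;                            % initial guess
J_old = oe_cost(theta, u, ym, Ts);
for iter = 1:max_iter
    g = cost_gradient(theta, u, ym, Ts);   % gradient of the cost
    H = cost_hessian(theta, u, ym, Ts);    % hessian of the cost
    dtheta = -H \ g;                       % Newton-Raphson increment
    theta = theta + dtheta;
    J_new = oe_cost(theta, u, ym, Ts);
    % stop on convergence of the cost and/or of the parameters
    if abs(J_new - J_old) < tol_J || norm(dtheta) < tol_theta
        break
    end
    J_old = J_new;
end
```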


How do we compute the gradient and the hessian of the cost?

Recall that

J(θ) = Σ_{k=0}^{N−1} (y_m(k) − ŷ(k, θ))²

and therefore

∂J/∂θ = −2 Σ_{k=0}^{N−1} (y_m(k) − ŷ(k, θ)) ∂ŷ(k, θ)/∂θ,

so the problem reduces to computing the sensitivities ∂ŷ(k, θ)/∂θ.

This is a vector with components given by

∂J/∂θ_j = −2 Σ_{k=0}^{N−1} (y_m(k) − ŷ(k, θ)) ∂ŷ(k, θ)/∂θ_j.

As for the second derivative, element by element we get

∂²J/∂θ_j∂θ_l = 2 Σ_k (∂ŷ/∂θ_j)(∂ŷ/∂θ_l) − 2 Σ_k (y_m(k) − ŷ(k, θ)) ∂²ŷ/∂θ_j∂θ_l.

The first term comes for free if we have already computed the gradient; the second term must be computed.


For the sake of simplicity in the expression of the second derivative, the second term is often neglected (this avoids the need to compute the second derivative of the model output), so that

∂²J/∂θ_j∂θ_l ≈ 2 Σ_k (∂ŷ(k, θ)/∂θ_j)(∂ŷ(k, θ)/∂θ_l).

Note that the approximate hessian is still symmetric.

The resulting approximate algorithm is called Gauss-Newton. Note that we have only shifted the problem: we now have to calculate the derivatives of the output instead of those of the cost function.
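Once the N×n_θ sensitivity matrix S, with S(k, j) = ∂ŷ(k, θ)/∂θ_j, is available, the Gauss-Newton ingredients follow directly; a minimal sketch:

```matlab
% Gauss-Newton gradient, approximate hessian and increment from the
% sensitivity matrix S (N x ntheta) and the residual e = ym - yhat.
e = ym - yhat;          % output error (N x 1)
g = -2 * (S' * e);      % gradient dJ/dtheta (ntheta x 1)
H = 2 * (S' * S);       % Gauss-Newton approximate hessian (symmetric)
dtheta = -H \ g;        % increment; equivalent to (S'*S) \ (S'*e)
```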


To complete the definition of the method we need a scheme to compute the sensitivities of the model output with respect to the parameters.

This can be done either numerically or analytically.

The numerical approach is unavoidable whenever nonlinear models are considered.

Sensitivities can be computed:

• Using forward differences

• Using central differences.

Using forward differences we get, for the jth component,

∂ŷ(k, θ)/∂θ_j ≈ [ŷ(k, θ + δθ_j e_j) − ŷ(k, θ)] / δθ_j.

The perturbation should be small; a general guideline is 1% of the current value of the parameter component.

Clearly the computation of the vector of sensitivities requires n_θ + 1 simulations of the response of the model to the sampled input. This is practical if a simulation of the model is not long; otherwise it takes too much time, so it is not viable for very complex models.


Using central differences instead we get

∂ŷ(k, θ)/∂θ_j ≈ [ŷ(k, θ + δθ_j e_j) − ŷ(k, θ − δθ_j e_j)] / (2 δθ_j).

In this case the computation of the vector of sensitivities requires 2 n_θ simulations of the response of the model to the sampled input, but the computed sensitivities are significantly more accurate.
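Both schemes are easy to implement; a sketch, again assuming the hypothetical simulate_model routine (forward differences shown, central variant in comments):

```matlab
% Finite-difference sensitivities of the model output w.r.t. theta.
% Forward differences need ntheta+1 simulations; central need 2*ntheta.
yhat = simulate_model(theta, u, Ts);     % nominal response (N x 1)
ntheta = numel(theta);
S = zeros(numel(yhat), ntheta);          % sensitivity matrix
for j = 1:ntheta
    dth = 0.01 * theta(j);               % guideline: 1% of the component
    if dth == 0, dth = 1e-6; end         % guard for zero-valued parameters
    thp = theta; thp(j) = thp(j) + dth;
    S(:, j) = (simulate_model(thp, u, Ts) - yhat) / dth;  % forward
    % Central variant (more accurate, 2 simulations per parameter):
    % thm = theta; thm(j) = thm(j) - dth;
    % S(:, j) = (simulate_model(thp, u, Ts) - simulate_model(thm, u, Ts)) / (2*dth);
end
```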


The analytical approach, on the other hand (most useful for linear models), starts from the model equations:

ẋ(t) = f(x(t), u(t), θ),  x(0) = x₀
y(t) = h(x(t), u(t), θ).

Differentiating with respect to a component θ_j of the parameter vector, and noting that x is a function of θ, we get for the state equation

∂/∂θ_j [ẋ(t)] = (∂f/∂x)(∂x/∂θ_j) + ∂f/∂θ_j,

with initial condition ∂x(0)/∂θ_j = 0, since the given initial state does not depend on the parameters (we can modify this if we want the initial conditions to be included among the estimated parameters).


Interchanging derivatives on the left-hand side we get:

d/dt [∂x(t)/∂θ_j] = (∂f/∂x)(∂x/∂θ_j) + ∂f/∂θ_j,

and similarly for the output equation

∂y(t)/∂θ_j = (∂h/∂x)(∂x/∂θ_j) + ∂h/∂θ_j.

Therefore, it is possible to compute the required sensitivities by simulating the state space models defined by the above state and output equations.
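For an LTI model ẋ = A(θ)x + B(θ)u, y = C(θ)x + D(θ)u this simulation can be carried out with an augmented linear system; a sketch (dA, dB, dC, dD denote the user-derived matrices ∂A/∂θ_j, etc., and the Control System Toolbox functions ss/lsim are assumed):

```matlab
% Sensitivity of an LTI model output w.r.t. one parameter component,
% simulated via the augmented linear system with state [x; dx/dtheta_j].
n = size(A, 1);
Aaug = [A, zeros(n); dA, A];   % dx/dt = A x + B u ; ds/dt = dA x + A s + dB u
Baug = [B; dB];
Caug = [dC, C];                % dy/dtheta_j = dC x + C s + dD u
Daug = dD;
sens_sys = ss(Aaug, Baug, Caug, Daug);
Sj = lsim(sens_sys, u, t);     % time history of dy/dtheta_j
```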

Local and global minima

• The developed optimisation scheme is local, in the sense that at each iteration only point-wise information on the derivatives of the cost is used.

• This means that in general the algorithm may converge to a different solution depending on the initial guess for the parameters.
[Sketch: a cost function with several stationary points; the global minimum is only one of the local minima, and all the minima are stationary points.]

• This is a key issue with the OE method:


• a reliable initial guess for the parameters is necessary
• careful inspection of the computed estimates is also
necessary, to ensure they are physically meaningful.

10/10/17

Asymptotic variance of the OE estimates

ML estimators are efficient, so we expect that, asymptotically, the covariance of the estimate attains the Cramér-Rao bound:

cov(θ̂) → M⁻¹,

where M is the Fisher information matrix.

How can we evaluate M? We reason as follows:

• The estimate has been chosen so as to maximise the likelihood of the data;

• Therefore maximal probability (corresponding to the expected value) should be attained at the optimal likelihood, so we can calculate M as a function of the log-likelihood evaluated at the estimate (which, for Gaussian noise, takes a simple form).


As a consequence we can make the following approximation:

M = E[−∂² log L/∂θ² |_{θ=θ°}] ≈ −∂² log L/∂θ² |_{θ=θ̂}

(we can compute the right-hand side because we have the data and the estimate θ̂).

Note that for the OE problem

−∂² log L/∂θ² |_{θ=θ̂} = (1/(2σ²)) ∂²J/∂θ² |_{θ=θ̂},

and this quantity is given for free by our iterative scheme, since the (Gauss-Newton) algorithm already computes the approximate hessian of the cost. Beware, however, of the numerical conditioning of this matrix when inverting it.

Therefore we can evaluate M as

M ≈ (1/σ²) Σ_{k=0}^{N−1} (∂ŷ(k, θ)/∂θ)(∂ŷ(k, θ)/∂θ)ᵀ |_{θ=θ̂}.

Finally, note that the noise variance is also needed. If it is not known, it can be estimated using the sample variance, as in the preliminary examples on ML estimation:

σ̂² = (1/N) Σ_{k=0}^{N−1} (y_m(k) − ŷ(k, θ̂))².
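With the sensitivity matrix S evaluated at the estimate, both quantities are immediate; a minimal sketch:

```matlab
% Fisher information and noise-variance estimate at the OE estimate.
% S is the N x ntheta sensitivity matrix at theta_hat; yhat the
% corresponding model output.
e = ym - yhat;                 % residuals at the estimate
N = numel(ym);
sigma2_hat = sum(e.^2) / N;    % sample variance of the residuals
M = (S' * S) / sigma2_hat;     % Fisher information matrix
P = inv(M);                    % asymptotic covariance of theta_hat
```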

Confidence intervals for the estimates

• The theory of ML estimation ensures that, asymptotically, estimates are unbiased and achieve the C-R variance bound.

• Therefore, asymptotically,

  θ̂ ~ N(θ°, M⁻¹).

• As a consequence, having obtained an estimate from a given dataset, we can define confidence intervals using properties of Gaussian densities.


• More precisely, letting

  σ_i = √([M⁻¹]_{ii}),

• We have, element-wise, that

  θ̂_i ~ N(θ_i°, σ_i²).

• And in terms of probabilities:

  P(|θ̂_i − θ_i°| ≤ σ_i) ≈ 0.68
  P(|θ̂_i − θ_i°| ≤ 2σ_i) ≈ 0.95
  P(|θ̂_i − θ_i°| ≤ 3σ_i) ≈ 0.997
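Given M from the previous step, a short sketch of the element-wise intervals:

```matlab
% Per-parameter standard deviations and 1/2/3-sigma confidence intervals.
P = inv(M);                       % asymptotic covariance
sigma = sqrt(diag(P));            % element-wise standard deviations
ci68  = [theta_hat - sigma,   theta_hat + sigma];    % ~68% interval
ci95  = [theta_hat - 2*sigma, theta_hat + 2*sigma];  % ~95% interval
ci997 = [theta_hat - 3*sigma, theta_hat + 3*sigma];  % ~99.7% interval
```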

Multiple output case

• So far only the case of a scalar measurement has been considered, for the sake of simplicity.

• In real problems, however, vectors of measurements must be used, so the output equation is

  y(t) = h(x(t), u(t), θ),  y ∈ R^p.

• And the measurement model becomes

  y_m(k) = y(k) + v(k),  v(k) ~ N(0, R),

  where R is a p×p matrix.


• Depending on the specific problem, the noise variance R can range from:

• A diagonal matrix, in the case of uncorrelated measurements (individual components y_1, ..., y_p of the output provided by p different sensors);

• A block-diagonal matrix, in the case of partially correlated measurements (subvectors of the output provided by different sensors; e.g., the three components measured by one accelerometer form a correlated subvector);

• A full matrix, in the case of fully correlated measurements.

• The first two cases are the most common in practice.


• Therefore, the density of the measurement noise is given by

  f_v(v) = (2π)^{−p/2} det(R)^{−1/2} exp(−(1/2) vᵀ R⁻¹ v),

  and following the same derivation as in the scalar output case, the density of the measurements results in

  f(y_m(k)) = (2π)^{−p/2} det(R)^{−1/2} exp(−(1/2) (y_m(k) − y(k))ᵀ R⁻¹ (y_m(k) − y(k))).


• The likelihood is constructed as before, to get

  log L(θ) = −(Np/2) log(2π) − (N/2) log det R − (1/2) Σ_{k=0}^{N−1} (y_m(k) − ŷ(k, θ))ᵀ R⁻¹ (y_m(k) − ŷ(k, θ)),

  and the cost function to be minimised becomes

  J(θ) = Σ_{k=0}^{N−1} (y_m(k) − ŷ(k, θ))ᵀ R⁻¹ (y_m(k) − ŷ(k, θ)).

• Note that if R is diagonal the cost reduces to the sum of p costs, one for each component of the output:

  J(θ) = Σ_{i=1}^{p} J_i(θ) / R_{ii},  where  J_i(θ) = Σ_{k=0}^{N−1} (y_{m,i}(k) − ŷ_i(k, θ))².

• This highlights the importance of proper scaling of the measurements in the formulation of the problem: if, for example, one output channel has errors of order 1 and another of order 10⁻⁶, then in the unweighted sum J = J_1 + J_2 + ... the minimisation effectively ignores the small-magnitude channel.

  N.B.: the contribution of a poorly scaled channel becomes negligible in the cost, and this may lead to a loss of information on the parameters that only affect that channel.
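A short sketch of the weighted cost for the diagonal-R case, with each channel scaled by its own noise variance:

```matlab
% Weighted OE cost for p outputs with diagonal noise covariance.
% Ym, Yhat are N x p matrices; Rdiag is the p-vector of variances R_ii.
E = Ym - Yhat;                        % output errors, channel by channel
J = sum(sum(E.^2, 1) ./ Rdiag(:)');   % sum over i of J_i / R_ii
```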
• Finally, the gradient and the hessian of the cost and the Fisher information matrix can be generalised as

  ∂J/∂θ = −2 Σ_k (∂ŷ(k, θ)/∂θ)ᵀ R⁻¹ (y_m(k) − ŷ(k, θ)),

  ∂²J/∂θ² ≈ 2 Σ_k (∂ŷ(k, θ)/∂θ)ᵀ R⁻¹ (∂ŷ(k, θ)/∂θ),

  M ≈ Σ_k (∂ŷ(k, θ)/∂θ)ᵀ R⁻¹ (∂ŷ(k, θ)/∂θ) |_{θ=θ̂}.

MATLAB FILE: oe_example.m

• System: ẋ(t) = a·x(t) + b·u(t), i.e. S ∈ M(θ) with θ = [a, b]ᵀ (in the example b = −2).

• Ts = sampling interval.

• v(k) ~ N(0, 1): measurement noise in discrete time. N.B.: σ² = 1, in other words σ = 1, so −3 < v(k) < 3 with probability 99.7%.

• u(t) is a sum of three sinusoids at different frequencies.

• We have to estimate a and b. Is it a well-posed problem? Structural identifiability is guaranteed by the model structure: the transfer function G(s) = b/(s − a) depends on both parameters, so they are identifiable. One must also look at the input (it has to be informative enough).

• Use 2000-3000-4000 samples for the identification, no more.
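A hedged reconstruction of what oe_example.m plausibly does (the true value of a, the input frequencies and the initial guess below are placeholders, not recovered from the notes):

```matlab
% Hypothetical sketch of oe_example.m: OE/LS estimation of a and b in
% dx/dt = a*x + b*u, y = x, from noisy sampled data (assumes the
% Control System Toolbox for ss/lsim, and fminsearch for the minimisation).
a_true = -1; b_true = -2;                 % b = -2 from the notes; a assumed
Ts = 0.1; N = 3000;                       % 2000-4000 samples suggested
t  = (0:N-1)' * Ts;
u  = sin(t) + sin(3*t) + sin(5*t);        % three sinusoids (assumed frequencies)
y  = lsim(ss(a_true, b_true, 1, 0), u, t);
ym = y + randn(N, 1);                     % v(k) ~ N(0, 1)
% direct numerical minimisation of the OE cost:
cost = @(th) sum((ym - lsim(ss(th(1), th(2), 1, 0), u, t)).^2);
theta_hat = fminsearch(cost, [-0.5; -1]); % needs a reasonable initial guess
```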

TIME-DOMAIN OUTPUT-ERROR IDENTIFICATION FOR LTI SYSTEMS

For LTI systems we can use both the time-domain and the frequency-domain approach.

The model is linear:

ẋ = A(θ) x + B(θ) u
y = C(θ) x + D(θ) u.

We obtain the same cost function as before, but now we can proceed analytically. Differentiating the state equation with respect to θ_j:

∂ẋ/∂θ_j = (∂A/∂θ_j) x + A (∂x/∂θ_j) + (∂B/∂θ_j) u,

because ∂u/∂θ_j = 0 (u is applied externally and does not depend on θ). Similarly, for the output equation:

∂y/∂θ_j = (∂C/∂θ_j) x + C (∂x/∂θ_j) + (∂D/∂θ_j) u.

The model we need to simulate to compute the sensitivities is itself linear: stacking x and ∂x/∂θ_j gives the augmented system

d/dt [x; ∂x/∂θ_j] = [A, 0; ∂A/∂θ_j, A] [x; ∂x/∂θ_j] + [B; ∂B/∂θ_j] u.

• Sampling → Shannon-Nyquist: the sampling time Ts should be 5-10 times shorter than the fastest dynamics of interest. Making Ts arbitrarily small is not a good idea: the number of data points grows without a corresponding gain in useful information on the system.

Introduction to PREFILTERING.

You might also like