
– Control Theory –

C4. Time domain system identification


Jan Swevers
September 2020
Objectives:
• The student has basic knowledge of system identification in general and
of least-squares time-domain identification (LS-TDI) in particular.
• The student is able to apply the LS-TDI procedure to linear time-
invariant systems with one input and one output. This includes selection
of the model structure, evaluation of model accuracy, and application of
simple measures to improve model accuracy.

Introduction to least-squares time domain identification 1/92

Outline of this chapter


• What is system identification
• The system identification procedure
• Models and prediction
• Time domain parameter estimation
• Practical issues of least-squares ARX model parameter estimation

Reference material: course notes.

[H00S3A-H00S4A-H04X3A-H04X3B]

What is system identification


• Definition of system identification: the selection of a model for a process
(i.e. the studied system or device under test (DUT)), using a limited number of
measurements of the inputs and outputs, which may be disturbed by noise,
and a priori system knowledge.
• Definition of parameter estimation: the experimental determination
of the values of the parameters that govern the dynamic behavior, assuming that
the structure of the process model is known.
• We limit ourselves to linear time-invariant (LTI) systems and models.


The system identification procedure


4 basic steps:
1. Collect input-output measurement data
2. Select a model structure to represent the system
3. Match the selected model structure to the measurements (parameter
estimation)
4. Validate the model


Collect input-output measurement data


• Measure the response (input/output) while the system is in normal
operation. No freedom to select excitation!
• Perform a dedicated experiment that actively excites the system:
– Satisfy the condition of persistency of excitation: the excitation should
be sufficiently rich such that all modes of the system are excited and
observable in the output sequence.
– More desirable: design/select the excitation that results in maximally
informative data, subject to constraints that may be at hand, in order
to minimize the model uncertainty. A priori information on the system
is very important to support this design/selection.
– Sampling frequency: in principle twice the highest frequency of interest;
in practical cases at least 10 times higher than the highest frequency.
Take the same sampling frequency as the one used to implement the
controller.


Choosing a convenient model set


• Make a choice within all possible mathematical models that can be used to
represent the system: choice is very important but often difficult.
• Continuous- or discrete-time models: depends on the further application of
the model/measurement configuration.
• For digital control applications: discrete-time models (see later).
• A priori available system information can help:
– certain physical laws are known to hold true for the system,
– preliminary data analysis: step or frequency response.
• If no information is available: apply trial-and-error procedure.


Match the selected model structure to the measurements (parameter estimation)

• Determine, within the set of models, the model that is the “best”
approximation or provides the “best” explanation of the observed data.
• We need a criterion to measure the model quality: the estimation of the
model parameters corresponds to the minimization of the chosen criterion.
• The choice of the criterion is extremely important because it determines
the stochastic properties of the estimator.
• This criterion or cost function defines a distance between the experimental
data and model: can be chosen on an ad-hoc basis using intuitive insight,
or more systematically based on stochastic arguments.


Validating the model


• How do you know if the model is satisfactory? Use it and check whether it serves
its purpose.
• This is often too risky in practice: instead, apply model validation criteria to get
some feeling for the accuracy of the model and confidence in its value.
– Simulate and compare with different sets of measurements: in practice,
the best model (yielding the smallest errors) is not always preferred:
often simpler models that describe the system within user-specified
error bounds are preferred.
– Compare parameter estimates with expectations or values found using
other measuring techniques (if available).
• During validation, keep the application in mind (determines what
properties are critical), test the model under the same conditions, avoid
extrapolation as much as possible.


The identification loop


The measurement configuration


Two most common measurement configurations: continuous-time (A) and
zero-order-hold (zoh) (B) measurement configuration


Continuous-time measurement configuration


• External excitation source generates continuous-time band-limited signal.
• Input and output signals are measured with measurement equipment that
samples these continuous-time signals synchronously.
• Both measured input and output signals can be perturbed by measurement
noise.
• Dynamics of sampling/DAC are not present in measurements.
• Continuous-time models are appropriate.


zoh-measurement configuration
• The excitation signal is generated by the measurement device that also
measures the system output. This measurement device is typically a digital
control computer also used to control the system.
• The excitation signal is a discrete-time signal u(kT ), free of noise.
• Through a zero-order-hold interpolation performed by the digital-to-analog
converter (DAC) of the control computer a continuous-time signal u(t) is
generated, a sequence of steps, that is applied to the (continuous-time)
system.
• At the system output, the continuous-time signal y(t) is sampled yielding
the discrete-time signal y(kT ) that is stored in the control computer
memory.
• The sampled output signal may be perturbed by measurement noise.
• Discrete-time models including dynamics of zoh-interpolation are
appropriate.

zoh-measurement configuration (2)


zoh-configuration corresponds to digital control computer implementation.


zoh-measurement configuration (3)


Relation discrete-time and continuous-time signals in zoh measurement
configuration.


zoh-measurement configuration (4)


Relation between discrete-time signal u(kT) (grey dots) and continuous-time
signal u(t) (black line).

Relation between continuous-time signal y(t) (black line) and discrete-time
signal y(kT) (grey dots).


Models and prediction


Overview
• Discrete-time representation of continuous-time systems
• Prediction
• Discrete-time input-output models for linear time-invariant systems


Discrete-time representation of continuous-time systems


Discrete-time models for the zoh measurement configuration
• Output of a linear time-invariant system for a given input u(t) and impulse
response g(t):

    y(t) = ∫_{τ=0}^{∞} g(τ) u(t − τ) dτ.

• The Laplace transform of the impulse response {g(τ)}_{τ=0}^{∞} is called the
transfer function G(s):

    G(s) = (d0 s^na + d1 s^(nd−1) + · · · + d_nd) / (c0 s^nc + c1 s^(nc−1) + · · · + c_nc),

with nd ≤ nc (system is proper).
• We consider the output only at discrete times tk = kT, for k = 1, 2, ...:

    y(kT) = ∫_{τ=0}^{∞} g(τ) u(kT − τ) dτ.


Discrete-time models for the zoh measurement configuration (2)


• Due to zero-order-hold (zoh) conditions, the input u(t) is kept constant
between the sampling instants:

    u(t) = u_k,  kT ≤ t < (k+1)T.

• This yields:

    y(kT) = ∫_{τ=0}^{∞} g(τ) u(kT − τ) dτ = Σ_{l=1}^{∞} ∫_{τ=(l−1)T}^{lT} g(τ) u(kT − τ) dτ
          = Σ_{l=1}^{∞} ( ∫_{τ=(l−1)T}^{lT} g(τ) dτ ) u_{k−l} = Σ_{l=1}^{∞} g_T(l) u_{k−l},

where we define the (discrete) impulse response of that system:

    g_T(l) = ∫_{τ=(l−1)T}^{lT} g(τ) dτ.
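As a check on this definition, g_T(l) can be computed numerically. The sketch below is illustrative only (the example system G(s) = 1/(s + 1) and the helper names g and gT are assumptions, not part of the course material); for this G(s) the integral has the closed form e^(−(l−1)T) − e^(−lT):

```python
import math

T = 0.1  # sampling period (illustrative value)

def g(tau):
    # continuous-time impulse response of the example system G(s) = 1/(s + 1)
    return math.exp(-tau)

def gT(l, n=1000):
    # discrete impulse response gT(l) = integral of g(tau) over [(l-1)T, lT],
    # approximated with a midpoint Riemann sum
    lo = (l - 1) * T
    h = T / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

# closed form for this particular g(tau): exp(-(l-1)T) - exp(-lT)
for l in range(1, 6):
    exact = math.exp(-(l - 1) * T) - math.exp(-l * T)
    assert abs(gT(l) - exact) < 1e-8
```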


Discrete-time models for the zoh measurement configuration (3)


• We omit T:

    y(t) = Σ_{k=1}^{∞} g(k) u(t − k),  for t = 1, 2, ...

• The z-transform of the discrete impulse response {g(k)}_{k=1}^{∞} is called the
discrete-time transfer function G(z):

    G(z) = (b0 z^nb + b1 z^(nb−1) + · · · + b_nb) / (a0 z^na + a1 z^(na−1) + · · · + a_na),

with nb < na (due to the strict causality condition, i.e. g(k) = 0 for
k = 0).
• The relationship between the parameters of the transfer function of
continuous time system and the parameters of its zoh discrete-time
equivalent can be calculated using published tables or CACSD (MATLAB)
software.


Discrete-time models for the zoh measurement configuration (4)


• The previous result (g(k) = 0 for k = 0 and nb < na) is correct if the
continuous-time system G(s) is strictly proper, that is, nd < nc.
• If the continuous-time system is bi-proper, that is, nd = nc, both the continuous-
and discrete-time systems have a direct feedthrough between input and
output, and hence for the discrete-time system the orders of numerator and
denominator are (also) equal, nb = na, and g(k) ≠ 0 for k = 0.
• g(k) = 0 for k = 0 does not mean that the input is delayed by one
sampling time T. The effective delay is approximately T/2.
• To illustrate this, consider the following continuous-time system:

    G(s) = ωn² / (s² + 2ζωn s + ωn²),

with ωn = 2π rad/s and ζ = 0.7.


Discrete-time models for the zoh measurement configuration (5)


In following figure:
• Blue: the (continuous-time) impulse response of G(s)
• Red: the (continuous-time) response of G(s) to the ZOH realization of a
Dirac impulse, which is a block pulse of width T and height 1/T. T = 0.1 s.
• Red dots: Samples of red curve with sampling time T = 0.1 s.
• An effective delay of approximately T /2 = 0.05 s can be observed.


Timing diagram
• The derived ZOH relation assumes following timing of the input and
output:
[Timing diagram: at each sampling instant k the output y_k is measured, u is
computed, and the input u_k is applied within the same sampling interval;
time axis t/Ts.]


Timing diagram (2)


• In a feedback control configuration, a delay ∆k is introduced between
measurement of the output and sending out the control command (D/A
conversion). This time delay is due to the processing of the output
measurement to obtain the control command.
• This delay is not known in advance (depends on required processing time)
and can vary.
• In most applications this delay is small compared to sampling period and
hence can be neglected. For the project of this course, this delay can be
significant when implementing an (E)KF.
[Timing diagram: the computation of u introduces a delay ∆k between
measuring y_k and applying u_k within each sampling interval; time axis t/Ts.]

Timing diagram (3)


• The real-time software MicroOS used on the MECOtrons in your project
stores the control command calculated during discrete-time interval k in a
memory buffer until discrete-time instance k + 1.
[Timing diagram: u_k is computed during interval k but stored and only sent
out at instant k + 1; time axis t/Ts.]

• Advantage: the effective delay between measurement and control action is
constant and known.
• Disadvantage: this effective delay is increased to about 3T/2.
• Remark: this buffering operation also takes place during the identification
experiment.

Introducing disturbances
• We assume that the disturbances and noise can be lumped into an additive
term v(t) at the output:

    y(t) = Σ_{k=1}^{∞} g(k) u(t − k) + v(t).

• Sources of disturbances:
– Measurement noise: the sensors that measure the signals are subject to
noise and drift.
– Uncontrollable inputs: the system is subject to signals that have the
character of inputs, but are not controllable by the user.
• This model does not consider input disturbances, for example noise on the
measurements of the input data sequence.


Introducing disturbances (2)


• Time-domain identification approach to model disturbances:

    v(t) = Σ_{k=0}^{∞} h(k) e(t − k),

where e(t) is a sequence of independent (identically distributed) random
variables with a certain probability density function, and h(0) = 1.
• The mean value of e is equal to zero, yielding:

    E{v(t)} = Σ_{k=0}^{∞} h(k) E{e(t − k)} = 0.

• The covariance of v(t) equals:

    E{v(t) v(t − τ)} = σ² Σ_{k=0}^{∞} h(k) h(k − τ),

where σ² is the variance of e(t).



Introducing disturbances (3)


Introducing the forward shift operator q and the delay (backward shift) operator q⁻¹, e.g.

    q u(t) = u(t + 1),  q⁻¹ u(t) = u(t − 1),

yields:

    y(t) = Σ_{k=1}^{∞} g(k) u(t − k) + Σ_{k=0}^{∞} h(k) e(t − k)
         = Σ_{k=1}^{∞} g(k) q^{−k} u(t) + Σ_{k=0}^{∞} h(k) q^{−k} e(t)
         = [ Σ_{k=1}^{∞} g(k) q^{−k} ] u(t) + [ Σ_{k=0}^{∞} h(k) q^{−k} ] e(t)
         = G(q) u(t) + H(q) e(t),

with:

    G(q) = Σ_{k=1}^{∞} g(k) q^{−k},   H(q) = Σ_{k=0}^{∞} h(k) q^{−k}.

Prediction
The prediction of future outputs of a system is essential for the
development of the time-domain identification methods discussed here.
One-step-ahead prediction of v(t)

    v(t) = H(q) e(t) = Σ_{k=0}^{∞} h(k) e(t − k).

We assume that H(q) is stable:

    Σ_{k=0}^{∞} |h(k)| < ∞,

and define the z-transform of its impulse response:

    H(z) = Σ_{k=0}^{∞} h(k) z^{−k}.


One-step-ahead prediction of v(t) (2)


We assume that H(q) is invertible: that is, if v(s), s ≤ t, are known, then we
shall be able to compute e(t) as:

    e(t) = H̃(q) v(t) = Σ_{k=0}^{∞} h̃(k) v(t − k),

with H̃(q) stable, i.e. Σ_{k=0}^{∞} |h̃(k)| < ∞.
Then the function 1/H(z) is analytic in |z| ≥ 1 (which means that 1/H(z) is
stable, or that H(z) does not have zeros on or outside the unit circle):

    1/H(z) = Σ_{k=0}^{∞} h̃(k) z^{−k}.

Hence we can define:

    H̃(q) = H^{−1}(q) = Σ_{k=0}^{∞} h̃(k) q^{−k}.


One-step-ahead prediction of v(t) (3)


Assume now that we have observed v(s) for s ≤ t − 1 and that we want to
predict the value of v(t) (one step ahead).

    v(t) = e(t) + Σ_{k=1}^{∞} h(k) e(t − k).

The knowledge of v(s) for s ≤ t − 1 implies the knowledge of e(s) for s ≤ t − 1.

So replace e(t) by the value at which its p.d.f. has its maximum, yielding the
most probable value of v(t) (called the maximum a posteriori (MAP)
prediction):

    v̂(t|t − 1) = 0 + Σ_{k=1}^{∞} h(k) e(t − k)
               = [H(q) − 1] e(t)
               = ([H(q) − 1]/H(q)) v(t) = [1 − H^{−1}(q)] v(t).
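For a concrete check, assume a first-order moving-average noise model H(q) = 1 + c q⁻¹ (the value c = 0.5 and all variable names are arbitrary illustrations, not from the course notes). The inverse filter recovers the innovations, and the one-step prediction error indeed equals e(t):

```python
import random

random.seed(0)
c = 0.5                      # h(1); assumed noise model H(q) = 1 + c q^-1
N = 200
e = [random.gauss(0.0, 1.0) for _ in range(N)]   # innovations

# v(t) = e(t) + c*e(t-1)
v = [e[0]] + [e[t] + c * e[t - 1] for t in range(1, N)]

# recover the innovations with the inverse filter e(t) = v(t) - c*e(t-1),
# then predict v(t) from data up to t-1: vhat(t|t-1) = c*e(t-1)
e_rec = [v[0]]
for t in range(1, N):
    e_rec.append(v[t] - c * e_rec[t - 1])

for t in range(1, N):
    vhat = c * e_rec[t - 1]
    # the one-step prediction error equals the innovation e(t)
    assert abs((v[t] - vhat) - e[t]) < 1e-9
```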


One-step-ahead prediction of y(t)


Assume that y(s) and u(s) are known for s ≤ t − 1. Hence v(s) is known for
s ≤ t − 1:

    v(s) = y(s) − G(q)u(s).

Since G(q) is strictly causal (g(0) = 0), G(q)u(t) does not depend on u(t) itself,
and the one-step-ahead prediction of y(t) equals:

    ŷ(t|t − 1) = G(q)u(t) + v̂(t|t − 1)
               = G(q)u(t) + [1 − H^{−1}(q)] v(t)
               = G(q)u(t) + [1 − H^{−1}(q)] [y(t) − G(q)u(t)]
               = H^{−1}(q)G(q)u(t) + [1 − H^{−1}(q)] y(t).

The prediction error equals:

    ε(t|t − 1) = y(t) − ŷ(t|t − 1) = −H^{−1}(q)G(q)u(t) + H^{−1}(q)y(t) = e(t).

The variable e(t) represents that part of y(t) that cannot be predicted from
past data: the innovation at time t.

Discrete-time input-output models for linear time-invariant systems


• Rather than using g(k) and h(k), which contain an infinite number of
parameters, use parameterizations of G(q) and H(q) that are finite.
• They correspond to those used in the MATLAB System Identification
Toolbox.
• Extend the model with a parameter vector θ:

    y(t) = G(q, θ)u(t) + H(q, θ)e(t).

• Prediction error:

    ε(t|t − 1, θ) = y(t) − ŷ(t|t − 1, θ) = −H^{−1}(q, θ)G(q, θ)u(t) + H^{−1}(q, θ)y(t).

• ε(t|t − 1, θ) = e(t) (a sequence of independent random variables) if θ = θ0
(the exact parameter vector).


Equation error model structure


• ARX model structure
• ARMAX model structure
• Other equation-error-type and output-error-type model structures
We discuss only the ARX model structure!


ARX model structure


The simplest input-output model is a linear difference equation:

    y(t) + a1 y(t−1) + . . . + ana y(t−na) = b1 u(t−nk) + . . . + bnb u(t−nk−nb+1) + e(t).

The white-noise term e(t) enters as a direct error in the difference equation:
equation error model.
The model parameter vector:

    θ = [a1 a2 . . . ana b1 . . . bnb]^T.

If we introduce:

    A(q, θ) = 1 + a1 q^{−1} + . . . + ana q^{−na},
    B(q, θ) = b1 + b2 q^{−1} + . . . + bnb q^{−nb+1},

we get:

    A(q, θ)y(t) = B(q, θ)u(t − nk) + e(t).
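A minimal simulation of this difference equation with zero initial conditions can make the structure concrete. The helper name simulate_arx and all parameter values below are illustrative assumptions, not from the course notes:

```python
def simulate_arx(u, a, b, nk=1, e=None):
    """Simulate A(q)y(t) = B(q)u(t-nk) + e(t) with zero initial conditions.
    a = [a1, ..., a_na], b = [b1, ..., b_nb]."""
    N = len(u)
    e = e if e is not None else [0.0] * N
    y = [0.0] * N
    for t in range(N):
        acc = e[t]
        for i, ai in enumerate(a, start=1):      # autoregressive part
            if t - i >= 0:
                acc -= ai * y[t - i]
        for j, bj in enumerate(b, start=1):      # exogenous input part
            k = t - nk - (j - 1)
            if k >= 0:
                acc += bj * u[k]
        y[t] = acc
    return y

# unit-impulse input: the response is the discrete impulse response of G(q)
u = [1.0] + [0.0] * 9
y = simulate_arx(u, a=[-0.5], b=[1.0])   # y(t) = 0.5*y(t-1) + u(t-1)
assert y[0] == 0.0 and abs(y[1] - 1.0) < 1e-12 and abs(y[2] - 0.5) < 1e-12
```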


ARX model structure (2)


This corresponds to:

    G(q, θ) = q^{−nk} B(q, θ)/A(q, θ),   H(q, θ) = 1/A(q, θ).

ARX model: AR refers to the autoregressive part A(q, θ)y(t) and X to the
extra input B(q, θ)u(t − nk) (called the eXogenous variable in econometrics).


ARX model structure: linear regressor


One-step-ahead prediction:

    ŷ(t|θ) = B(q, θ)u(t − nk) + [1 − A(q, θ)]y(t).

Now introduce the vector:

    ϕ(t) = [−y(t−1) . . . −y(t−na) u(t−nk) . . . u(t−nk−nb+1)]^T.

Then:

    ŷ(t|θ) = θ^T ϕ(t) = ϕ^T(t)θ.

The model is linear in the parameters: linear regression model.
The vector ϕ(t) is known as the regression vector.
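A small numerical check, with illustrative orders na = nb = 2, nk = 1 and arbitrary data, that the regression form reproduces the written-out difference-equation prediction:

```python
# parameters for na = 2, nb = 2, nk = 1 (illustrative values)
a = [0.3, -0.1]              # a1, a2
b = [1.0, 0.5]               # b1, b2
theta = a + b                # theta = [a1 a2 b1 b2]^T

y = [0.2, -0.4, 0.9, 0.1]    # some past outputs
u = [1.0, 0.0, -1.0, 2.0]    # some past inputs
t = 3                        # predict y(t) from data up to t-1

# phi(t) = [-y(t-1), -y(t-2), u(t-1), u(t-2)]^T  (nk = 1)
phi = [-y[t - 1], -y[t - 2], u[t - 1], u[t - 2]]
yhat_regression = sum(p * th for p, th in zip(phi, theta))

# same prediction written out from the difference equation
yhat_direct = (-a[0] * y[t - 1] - a[1] * y[t - 2]
               + b[0] * u[t - 1] + b[1] * u[t - 2])
assert abs(yhat_regression - yhat_direct) < 1e-12
```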


Other model structures

Figure 1: ARMAX model structure


Figure 2: ARARMAX model structure


Other model structures (2)

Figure 3: Output error model structure


Figure 4: Box-Jenkins model structure


Time domain parameter estimation


Overview
• Parameter estimation
• Prediction error identification method (PEM)
• Least-squares parameter estimation for ARX models


Parameter estimation
• We have selected a certain set of candidate models:

    M* = {M(θ) | θ ∈ D_M}.

• Each model M(θ) represents a way of predicting future outputs:

    M(θ): ŷ(t|t − 1) = [1 − H^{−1}(q, θ)]y(t) + H^{−1}(q, θ)G(q, θ)u(t).

• We are also in the situation that we have collected a batch of data from the
system:

    Z^N = [y(1), u(1), y(2), u(2), . . . , y(N), u(N)]^T.

• Parameter estimation: mapping the data Z^N to the set D_M:

    Z^N → θ̂_N ∈ D_M.


Prediction error identification method (PEM)


General PEM formulation
• We consider that the essence of a model is its prediction aspect:

    ε(t, θ*) = y(t) − ŷ(t|θ*).

• The prediction error sequence for Z^N can be seen as a vector in R^N.
• The “size” (norm) of this vector can be taken as a measure for the
“quality” of a model.
• Let the prediction error sequence be filtered through a stable linear filter
L(q):

    ε_f(t, θ) = L(q)ε(t, θ),  1 ≤ t ≤ N.

Then use the following form:

    V_N(θ, Z^N) = (1/N) Σ_{t=1}^{N} l(ε_f(t, θ)),

where l(·) is a scalar-valued function.



General PEM formulation (2)


The estimate θ̂_N corresponds to:

    θ̂_N = arg min_{θ ∈ D_M} V_N(θ, Z^N).

This approach is called a prediction error identification method (PEM).

Quadratic norm
Take:

    l(ε) = (1/2)ε²,

and omitting the filter yields:

    V_N(θ, Z^N) = (1/N) Σ_{t=1}^{N} (1/2)ε²(t, θ),

with

    ε(t, θ) = H^{−1}(q, θ) [y(t) − G(q, θ)u(t)].


Least-squares parameter estimation for ARX models


Formulation of LSE
• Assume that the system that has to be identified behaves according to an
ARX model structure.
• Assume e(t) ∈ N(0, σ²): a sequence of independent zero-mean Gaussian
random variables:

    E{e(t)} = 0,
    E{e(t)e(t1)} = δ(t, t1)σ²,

where E{·} denotes the expected value and δ denotes the Kronecker delta.
• Prediction error:

ε(t|t − 1, θ) = y(t) − ŷ(t|t − 1, θ).


Formulation of LSE (2)


• For the ARX model, the prediction equals:

    ŷ(t|t − 1, θ) = ϕ^T(t)θ,

with

    ϕ(t) = [−y(t−1) . . . −y(t−na) u(t−nk) . . . u(t−nk−nb+1)]^T.

• Combining the quadratic criterion (omitting 1/N) with the prediction error
expression yields the following least-squares (LS) criterion:

    V_N(θ, Z^N) = Σ_{t=1}^{N} (1/2)(y(t) − ϕ^T(t)θ)².

• The so-called least-squares estimate (LSE) equals:

    θ̂_N^LS = arg min_θ V_N(θ, Z^N) = [ (1/N) Σ_{t=1}^{N} ϕ(t)ϕ^T(t) ]^{−1} (1/N) Σ_{t=1}^{N} ϕ(t)y(t).


Matrix formulation of the LSE


• Introduce the following (N − na)-dimensional column vectors:

    ŷ(θ) = [ŷ(na+1|θ) ŷ(na+2|θ) . . . ŷ(N|θ)]^T,
    y = [y(na+1) y(na+2) . . . y(N)]^T,

• and the following (N − na) × d matrix (d is the number of model parameters):

    Φ = [ϕ^T(na+1); ϕ^T(na+2); . . . ; ϕ^T(N)].

• The vector containing the output predictions then equals:

    ŷ(θ) = Φθ.

Matrix formulation of the LSE (2)


• The vector containing the prediction errors equals:

    ε(θ) = y − ŷ(θ).

• The LS criterion equals:

    V_N(θ, Z^N) = (1/2) ε(θ)^T ε(θ),

and the LSE equals:

    θ̂_N^LS = [Φ^T Φ]^{−1} Φ^T y = Φ⁺ y,

where Φ⁺ denotes the pseudo-inverse of Φ.
• The output predictions for the LSE equal:

    ŷ(θ̂_N^LS) = Φ [Φ^T Φ]^{−1} Φ^T y.

• Calculate the pseudo-inverse using the singular value decomposition.


Consistency of the LSE


• A parameter estimate is consistent if the estimate θ(Z^N) converges in
probability to the true parameter vector, indicated as θ0:

    plim_{N→∞} θ(Z^N) = plim θ(Z^N) = θ0,

which means that the probability P(|θ(Z^N) − θ0| > ε) → 0 for N → ∞, for
every ε > 0.


Consistency of the LSE (2)


• What if the innovations do not satisfy these conditions? Consider:

    y(t) = ϕ^T(t)θ0 + ν0(t).

{ν0(t)} is called the regression-equation error sequence.
• Introduce the matrices:

    R(N) = (1/N) Σ_{t=1}^{N} ϕ(t)ϕ^T(t)

and

    f(N) = (1/N) Σ_{t=1}^{N} ϕ(t)ν0(t).

• The LSE equals:

    θ̂_N^LS = R(N)^{−1} (1/N) Σ_{t=1}^{N} ϕ(t)[ϕ^T(t)θ0 + ν0(t)]
            = θ0 + R(N)^{−1} f(N).

Consistency of the LSE (3)


• The probability limits of these matrices (we assume they exist) are:

    R* = plim R(N),
    h* = plim f(N).

• The LSE has a probability limit:

    plim θ̂_N^LS = plim (θ0 + R(N)^{−1} f(N))
                = θ0 + plim R(N)^{−1} plim f(N)
                = θ0 + (R*)^{−1} h*.


Consistency of the LSE (4)


• For the LSE to be consistent, we thus have to require that:
(i) R* is nonsingular.
(ii) h* = 0. This will be the case if either:
(iia) {ν0(t)} is a sequence of independent random variables with zero
mean values: E{ϕ(t)ν0(t)} = 0.
(iib) The input sequence {u(t)} is independent of the zero-mean noise
sequence {ν0(t)} and na = 0.
• In cases (i) and (iia) it can be shown that the random variable

    √N (θ̂_N^LS − θ0)

converges in distribution to the normal distribution with zero mean and
covariance matrix λ0 (R*)^{−1}, where λ0 is the variance of ν0(t). Experiment
design issues therefore deal with the problem of making R* “large” subject
to given constraints.

Example
• Consider the following system (not an ARX model structure!):

    y(t) = (B(q)/A(q)) u(t − 1) + e(t),

with

    A(q) = 1 + a1 q^{−1},
    B(q) = b1 + b2 q^{−1},

which can be rewritten as:

    y(t) + a1 y(t − 1) = b1 u(t − 1) + b2 u(t − 2) + e(t) + a1 e(t − 1).

• This relation can be written as y(t) = ϕ^T(t)θ0 + ν0(t), with:

    ϕ(t) = [−y(t − 1) u(t − 1) u(t − 2)]^T,
    ν0(t) = e(t) + a1 e(t − 1).

Example (2)
• The sequence ν0(t) is not a sequence of independent random variables. For
example, ν0(t) and ν0(t − 1) are related since they both depend on e(t − 1).
As a result, ν0(t) is also related to y(t − 1), i.e. to some of the
elements of ϕ(t).
• In matrix form:

    [ y(3) ]   [ −y(2)    u(2)    u(1)   ]          [ ν0(3) ]
    [ y(4) ] = [ −y(3)    u(3)    u(2)   ] [ a1 ]   [ ν0(4) ]
    [  ...  ]   [   ...      ...       ...    ] [ b1 ] + [  ...   ]
    [ y(N) ]   [ −y(N−1)  u(N−1)  u(N−2) ] [ b2 ]   [ ν0(N) ]


Practical issues of least-squares ARX model parameter estimation

Overview
• Frequency domain interpretation of the quadratic prediction error criterion
• Low-pass data filtering to improve parameter estimation
• Iterative weighted least squares approach
• Identifying a partially known system


Frequency domain interpretation of the quadratic prediction error criterion

Frequency domain interpretation: take the DFT of the prediction error and
apply Parseval’s theorem:

    V_N(θ, Z^N) = (1/N²) Σ_{k=0}^{N−1} (1/2)|E_N(2πk/N, θ)|²
                = (1/N²) Σ_{k=0}^{N−1} (1/2)|Ĝ_N(e^{j2πk/N}) − G(e^{j2πk/N}, θ)|² Q_N(e^{j2πk/N}, θ) + R_N,

with

    Q_N(2πk/N, θ) = |U_N(e^{j2πk/N})|² / |H(e^{j2πk/N}, θ)|²,   R_N ≤ C/√N.

Ĝ_N(e^{j2πk/N}) is called the empirical transfer function estimate (ETFE) and
equals:

    Ĝ_N(e^{j2πk/N}) = Y_N(e^{j2πk/N}) / U_N(e^{j2πk/N}).

Frequency domain interpretation for ARX model structure


For the ARX model structure,

    H(q, θ) = 1/A(q, θ),

yielding:

    V_N(θ, Z^N) ≈ (1/N²) Σ_{k=0}^{N−1} (1/2)|Ĝ_N(e^{j2πk/N}) − G(e^{j2πk/N}, θ)|² |U_N(e^{j2πk/N})|² |A(e^{j2πk/N}, θ)|².

The difference between the empirical transfer function estimate and the
frequency response function of the model is weighted by two terms: the
periodogram of the input and the squared magnitude of the frequency
response of the denominator of the model.


Low-pass data filtering to improve parameter estimation


Example
Consider:

    B(q)/A(q) = (b1 q^{−1} + b2 q^{−2}) / (1 + a1 q^{−1} + a2 q^{−2})
              = (0.0484 q^{−1} + 0.0479 q^{−2}) / (1 − 1.8727 q^{−1} + 0.9691 q^{−2}),

which is the zero-order-hold discrete-time equivalent, for a sampling rate
fs = 100 Hz, of a second-order continuous-time system with undamped natural
frequency fn = 5 Hz, damping ratio ζ = 0.05 and a DC gain equal to one.


Example (2)
Excited with a step-up and step-down excitation:

[Figure 5: Input excitation for the second-order system: step-up and step-down
signal (input vs. time [samples]).]


Example (3)
Gaussian noise is added to the simulated output:

    y(t) = (B(q)/A(q)) u(t) + e(t),

with e(t) ∈ N(0, σ²), σ = 0.05.


[Figure 6: Noise-free (red) and noisy (blue) response to the step-up and
step-down input excitation (output vs. time [samples]).]


Example (4)
LSE:

    y = [y(2) y(3) . . . y(999)]^T,

    Φ = [ −y(1)    −y(0)    u(1)    u(0)  ;
          −y(2)    −y(1)    u(2)    u(1)  ;
            ...       ...      ...     ...  ;
          −y(998)  −y(997)  u(998)  u(997) ],

and

    θ̂_LS = [â1 â2 b̂1 b̂2]^T = Φ⁺ y.


Example (5)
Inaccurate results are obtained, due to the high-frequency weighting introduced by the
denominator term |A(e^{j2πk/N}, θ)|².


[Figure 7: Magnitude [dB] and phase [degrees] versus frequency [Hz] of the
exact model (red) and the identified model (blue).]


Example (6)
• Next, the identification is repeated after filtering the input and output data
with the same low-pass filter.
• A second-order Butterworth filter with a cut-off frequency of 7 Hz is used. Both
input and output signals are filtered with the same filter, such that the
relation between the filtered signals is not altered.

    V_N(θ, Z^N) ≈ (1/N²) Σ_{k=0}^{N−1} (1/2)|Ĝ_N − G(θ)|² |U_N(e^{j2πk/N})|² |A(e^{j2πk/N}, θ)|² |H_f(e^{j2πk/N})|²,

with H_f(e^{j2πk/N}) the frequency response of the applied filter at the
considered frequencies.
• In this criterion, the high-pass characteristic of A(e^{j2πk/N}, θ) is
compensated by the low-pass characteristic of H_f(e^{j2πk/N}).
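That filtering input and output with the same filter leaves their relation unchanged can be verified directly for finite impulse responses, since convolution is associative and commutative. All sequences below are arbitrary illustrations:

```python
def conv(x, h):
    """Full discrete convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

g = [0.0, 1.0, 0.5, 0.25]        # a (truncated) system impulse response
f = [0.25, 0.5, 0.25]            # an illustrative low-pass FIR filter
u = [1.0, -1.0, 2.0, 0.0, 1.0]   # arbitrary input

y = conv(u, g)                   # system output
yf = conv(y, f)                  # filtered output
uf = conv(u, f)                  # filtered input
y_from_uf = conv(uf, g)          # system driven by the filtered input

# filtering u and y with the same filter leaves the u -> y relation unchanged
for p, q in zip(yf, y_from_uf):
    assert abs(p - q) < 1e-9
```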


Example (7)
Improved result:


[Figure 8: Magnitude [dB] and phase [degrees] versus frequency [Hz] of the
exact model (red) and the identified models obtained without data filtering
(blue) and with data filtering (green) respectively.]

Example (8)


Figure 9: Time-domain evaluation: exact output (red) and simulated output with the identified models obtained without data filtering (blue) and with data filtering (green), respectively. Axes: output versus time [samples].

Iterative weighted least-squares approach


Sanathanan-Koerner procedure
Step 1: Identify the model from the measured data, using either no filter or a
simple low-pass filter. The obtained model is denoted as B(q, θ̂_1)/A(q, θ̂_1).
Step 2: Repeat the identification using the measured data filtered with the
following low-pass filter obtained from the denominator of the model
obtained in step 1: Hf1 (q) = 1/A(q, θ̂ 1 ). The obtained model is denoted as:
B(q, θ̂ 2 )/A(q, θ̂ 2 ).
Step 3: . . .
Step i: Repeat the identification using the measured data filtered with the
following low-pass filter obtained from the denominator of the model
obtained in step i − 1: Hf(i−1) (q) = 1/A(q, θ̂ (i−1) ). The obtained model is
denoted as: B(q, θ̂ i )/A(q, θ̂ i ).
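The steps above can be sketched as a short iteration loop. This is an illustrative sketch: the helper names (`arx_ls`, `sk_iterations`), model orders and test data are assumptions, and the ARX fit assumes a unit input delay.

```python
import numpy as np
from scipy import signal

def arx_ls(u, y, na, nb):
    """One least-squares ARX fit (unit delay); returns polynomials (a, b)."""
    n = max(na, nb)
    N = len(y)
    cols = [-y[n - i:N - i] for i in range(1, na + 1)] \
         + [u[n - i:N - i] for i in range(1, nb + 1)]
    theta, *_ = np.linalg.lstsq(np.column_stack(cols), y[n:], rcond=None)
    a = np.concatenate(([1.0], theta[:na]))   # A(q) = 1 + a1 q^-1 + ...
    b = np.concatenate(([0.0], theta[na:]))   # B(q) = b1 q^-1 + ...
    return a, b

def sk_iterations(u, y, na, nb, n_iter=3):
    """Sanathanan-Koerner: re-estimate after filtering both signals
    with 1/A(q, theta) from the previous step."""
    a_hat = np.array([1.0])                   # step 1: no data filter
    for _ in range(n_iter):
        u_f = signal.lfilter([1.0], a_hat, u)   # H_f = 1/A(q, theta_prev)
        y_f = signal.lfilter([1.0], a_hat, y)
        a_hat, b_hat = arx_ls(u_f, y_f, na, nb)
    return a_hat, b_hat

# Noise-free sanity check: data generated by a known second-order model
rng = np.random.default_rng(0)
u = rng.standard_normal(2000)
a_true = np.array([1.0, -1.5, 0.7])
b_true = np.array([0.0, 1.0, 0.5])
y = signal.lfilter(b_true, a_true, u)
a_hat, b_hat = sk_iterations(u, y, na=2, nb=2)
```

Note that the filter 1/A(q, θ̂) is only usable when the current denominator estimate is stable; in practice the estimated poles may need to be reflected inside the unit circle before filtering.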


Sanathanan-Koerner procedure (2)


• Iteration step i:

V_N(θ, θ̂_(i−1), Z^N) ≈ (1/N²) Σ_{k=0}^{N−1} ½ |Ĝ_N − G(θ)|² |U_N(e^{j2πk/N})|² · |A(e^{j2πk/N}, θ)|² / |A(e^{j2πk/N}, θ̂_(i−1))|².

• If the iterative procedure converges, so that at a certain point no further model improvement is obtained, then A(e^{j2πk/N}, θ̂_(i−1)) comes arbitrarily close to A(e^{j2πk/N}, θ̂_i), and the high-frequency emphasis introduced by A(e^{j2πk/N}, θ̂_i) is completely compensated.


Example revisited


Figure 10: Magnitude and phase of the exact model (red) and the identified models obtained in the first (blue), second (green) and third (yellow) step of the iterative procedure, respectively. Axes: magnitude [dB] and phase [degrees] versus frequency [Hz].

Example revisited (2)


Figure 11: Time-domain evaluation: difference between the exact model output and the simulated output with the model identified with low-pass data filtering (green) and the model from the third step of the iterative procedure (yellow), respectively. Axes: simulated output error versus time [samples].

Identifying a partially known system


Approach: scheme 1
• Assume that some poles and/or zeros of the system are known:
G(q, θ) = G_u(q, θ) G_k(q) = q^{-nk} (B_u(q, θ)/A_u(q, θ)) (B_k(q)/A_k(q)),

with B_k(q)/A_k(q) the known part of the system model, and q^{-nk} B_u(q, θ)/A_u(q, θ) the unknown part of the system model, dependent on the parameter vector θ.
• Introducing the ARX model structure:

Ak (q)Au (q, θ)y(t) = Bk (q)Bu (q, θ)u(t − nk ) + e(t).


Approach: scheme 1 (2)


• Now filter y(t) with A_k(q) and u(t) with B_k(q):

y*(t) = A_k(q) y(t),
u*(t) = B_k(q) u(t).

This yields a similar ARX model structure, now containing only the unknown part of the system model:

A_u(q, θ) y*(t) = B_u(q, θ) u*(t − nk) + e(t),

which is equivalent to:

y*(t) = (B_u(q, θ)/A_u(q, θ)) u*(t − nk) + (1/A_u(q, θ)) e(t),

and is referred to as scheme 1.
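The two filtering operations of scheme 1 are plain FIR filters and can be sketched in Python. The known part used here (a pure integrator, A_k(q) = 1 − q^{-1} with B_k(q) = q^{-1}) and the stand-in signals are assumptions for illustration.

```python
import numpy as np
from scipy import signal

A_k = [1.0, -1.0]   # A_k(q) = 1 - q^-1 (known pole at z = 1)
B_k = [0.0, 1.0]    # B_k(q) = q^-1

rng = np.random.default_rng(2)
u = rng.standard_normal(1000)
y = np.cumsum(u)    # stand-in output: a pure integration of the input

# Scheme 1: y*(t) = A_k(q) y(t) and u*(t) = B_k(q) u(t)
y_star = signal.lfilter(A_k, [1.0], y)
u_star = signal.lfilter(B_k, [1.0], u)

# In this toy case the known part IS the whole system, so y*(t) = u(t);
# the unknown part B_u/A_u left to identify from (u*, y*) is trivial.
```

In a real application (u*, y*) would next be passed to the standard least-squares ARX estimator to identify B_u(q, θ)/A_u(q, θ).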


Approach: scheme 2
• Filter only the input with the known part of the model:

u*(t) = (B_k(q)/A_k(q)) u(t),

yielding:

y(t) = (B_u(q, θ)/A_u(q, θ)) u*(t − nk) + (1/(A_k(q) A_u(q, θ))) e(t).

The corresponding "ARX" model structure equals:

A_u(q, θ) y(t) = B_u(q, θ) u*(t − nk) + (1/A_k(q)) e(t).


Approach: scheme 3
• Filter the output with the inverse of the known part of the model:

y*(t) = (A_k(q)/B_k(q)) y(t),

yielding:

y*(t) = (B_u(q, θ)/A_u(q, θ)) u(t − nk) + (1/(B_k(q) A_u(q, θ))) e(t).

The corresponding "ARX" model structure equals:

A_u(q, θ) y*(t) = B_u(q, θ) u(t − nk) + (1/B_k(q)) e(t).


Frequency domain interpretation

V_N(θ, Z^N) ≈ (1/N²) Σ_{k=0}^{N−1} ½ |Y*_N/U*_N − G_u(θ)|² |U*_N(e^{j2πk/N})|² |A_u(e^{j2πk/N}, θ)|²,

with
• for scheme 1: u* = B_k u and y* = A_k y,
• for scheme 2: u* = (B_k/A_k) u and y* = y,
• and for scheme 3: u* = u and y* = (A_k/B_k) y.


Remarks
• If data filtering is applied to improve the accuracy of the parameter estimate, it should compensate for the high-frequency weighting |A_u(e^{j2πk/N}, θ)|.
• Depending on the applied scheme, |U_N(e^{j2πk/N})| can also emphasize certain frequencies and should be checked.
• In some situations, low-frequency distortions (DC offset and/or drift) are present on the measurements, also on the inputs. These can be amplified by the pre-filtering with the known part of the model, e.g. if the system contains a pure integration or differentiation. In that case, apply band-pass filters to remove these low-frequency distortions.


Example
Consider the system:

B(q)/A(q) = (b1 q^{-1} + b2 q^{-2} + b3 q^{-3}) / ((1 − q^{-1})(1 + a1 q^{-1} + a2 q^{-2}))
          = (0.1656·10⁻³ q^{-1} + 0.6580·10⁻³ q^{-2} + 0.1651·10⁻³ q^{-3}) / ((1 − q^{-1})(1.0000 − 1.8962 q^{-1} + 0.9937 q^{-2})),

which is the zero-order-hold discrete-time equivalent, for a sampling rate fs = 100 Hz, of a second-order continuous-time system with undamped natural frequency fn = 5 Hz and damping ratio ζ = 0.01, augmented with an integrator.
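These discrete-time coefficients can be reproduced with SciPy's `cont2discrete`. The continuous-time transfer function G(s) = ωn²/(s(s² + 2ζωn s + ωn²)), a unit-DC-gain second-order resonance times an integrator, is an assumption of this sketch, chosen to be consistent with the stated fn, ζ, fs and the printed coefficients.

```python
import numpy as np
from scipy import signal

fs = 100.0                   # sampling rate [Hz]
fn, zeta = 5.0, 0.01         # undamped natural frequency [Hz], damping ratio
wn = 2 * np.pi * fn

# Assumed continuous-time model: G(s) = wn^2 / (s (s^2 + 2 zeta wn s + wn^2))
num = [wn ** 2]
den = np.polymul([1.0, 0.0], [1.0, 2 * zeta * wn, wn ** 2])

# Zero-order-hold equivalent at sampling period T = 1/fs
num_d, den_d, _ = signal.cont2discrete((num, den), 1 / fs, method='zoh')
num_d = np.squeeze(num_d)    # cont2discrete returns a 2-D numerator array

# den_d matches the expansion of (1 - q^-1)(1 - 1.8962 q^-1 + 0.9937 q^-2)
```

The integrator pole s = 0 maps to the discrete pole z = 1, which is why the factor (1 − q^{-1}) appears in the discrete denominator.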


Example (2)
The discrete-time system is excited with a random excitation:

Figure 12: Input excitation for the second-order system. Axes: input versus time [samples].


Example (3)
Gaussian noise is added to the simulated output:

y(t) = (B(q)/A(q)) u(t) + e(t),

with e(t) ~ N(0, σ²) and σ = 0.01.
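Generating such a noisy data set can be sketched as follows. A stable stand-in model is used here instead of the marginally stable example system (the integrator pole at z = 1 would make a long open-loop simulation drift); only the noise level σ = 0.01 is taken from the example.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(3)
u = rng.standard_normal(1000)          # random excitation

# Illustrative stable stand-in for B(q)/A(q)
b = [0.0, 1.0, 0.5]                    # B(q) = q^-1 + 0.5 q^-2
a = [1.0, -1.5, 0.7]                   # A(q) = 1 - 1.5 q^-1 + 0.7 q^-2

sigma = 0.01                           # noise standard deviation, as in the example
y = signal.lfilter(b, a, u) + sigma * rng.standard_normal(1000)
```

Note that the noise is added to the output after the simulation (an output-error setting), so a plain ARX fit on (u, y) will be slightly biased.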


Figure 13: Noise-free (red) and noisy (blue) response to the step-up and step-down input excitation. Axes: output versus time [samples].


Example (4)
The known and unknown parts of the model equal:

B_k(q)/A_k(q) = q^{-1}/(1 − q^{-1}),
B_u(q, θ)/A_u(q, θ) = (b1 + b2 q^{-1} + b3 q^{-2})/(1 + a1 q^{-1} + a2 q^{-2}).

We apply scheme 1:

u*(t) = q^{-1} u(t) = u(t − 1),
y*(t) = (1 − q^{-1}) y(t) = y(t) − y(t − 1).

We apply either low-pass data filtering or the Sanathanan-Koerner approach.


Example (5)


Figure 14: Magnitude and phase of the exact model (red), the identified model using low-pass filtered data (blue), and the model obtained after one Sanathanan-Koerner step (green). Axes: magnitude [dB] and phase [degrees] versus frequency [Hz].

Example (6)


Figure 15: Magnitude and phase of the exact model (red) and the identified models using low-pass filtered data, with the a priori knowledge taken into account (blue) and without accounting for the a priori knowledge (green). Axes: magnitude [dB] and phase [degrees] versus frequency [Hz].

Example (7)


Figure 16: Time-domain evaluation: comparison of the simulated outputs of three models for the step-up and step-down input signal: (1) the exact model (red), (2) the model with a pole at z = 1 (a priori system knowledge taken into account) (blue), and (3) the model with a pole at z = 0.9987 (full model identified) (green). Axes: simulated output versus time [samples].

Conclusions: revisit the objectives


• The student has basic knowledge on system identification in general and on
least-squares time-domain identification (LS-TDI) in particular.
• The student is able to apply the LS-TDI procedure to linear time-invariant
systems with one input and one output. This includes selection of model
structure, evaluation of model accuracy, application of simple measures to
improve model accuracy.

