Material: Part I: Introduction To System Identification

sc4110

X. Bombois
P.M.J. Van den Hof
Lecture hours (see schedule for details):
• Monday (15:45-17:30) in Room D (3Me)
• Thursday (15:45-17:30) in Room D (3Me)
• Friday (13:45-15:30) in Room A (API)
Uses of a model:
• simulation
• diagnosis / fault detection

Data-based modeling: determine a model of the dynamical relation existing between the voltage u(t) driving a motor and the angular speed y(t) of the rotor.
System Identification: Part I

How to proceed?

• Excite the system by applying the following sequence for the voltage u(t) during 20 seconds:

(figure: applied voltage u(t) [V] versus time (sec), 0-20 s, ranging between -6 and 6 V)

applied u(t) → motor dynamics → measured y(t)
• Measure the resulting angular speed y(t)

(figure: rotor speed y(t) [rad/s] versus time (sec), 0-20 s)

• Identify a model from these data; the residual (filtered ε(t); see later) quantifies the modeling error

(figure: frequency response, magnitude (abs) versus frequency (rad/sec), of the identified model, from u1 to y1)

Note: ε(t) contains not only the model inaccuracy, but also the noise acting on the system.
However, data-based modeling is often as important as first-principle modeling.

Example: an arm driven by the current i(t) of a motor:

θref(t) → controller → i(t) → positioning mechanism → θ(t)

⇒ a model is needed. The controller designed with this physical model could not achieve the desired bandwidth without inducing vibrations.
Data-based modeling: wind turbine blades

The blades of a wind turbine are subject to high vibration loads due to wind gusts, periodic rotations, ...

For control design, we need a model of the dynamics between the pitch and flap actuators and the strain in the blade (including the actuator dynamics).

First-principle modeling: a model based on aerodynamic and mechanical laws — a linear model of order 28 — has many physical parameters to determine ⇒ high uncertainty.

(figure: magnitude and phase [deg] versus frequency [Hz] of the model, from the pitch actuator [deg] and from smart actuators 1&2 [V]; the 1st and 2nd flapping modes are visible)

Note: the parameters of the first-principle model have in fact been tuned with the identified model.
Data-based modeling: mobile telephony

An antenna on the ground emits a signal u(t); a mobile phone receives a signal y(t). The received signal is made up of several delayed versions of the emitted signal u(t), plus noise:

y(t) = g1 u(t - n1) + g2 u(t - n2) + ... + noise

⇒ distorted signal. A model of the so-called channel is required to reconstruct u(t) from the distorted y(t).

This model cannot be determined in advance, since the position of the mobile phone is mobile (by definition): the model is identified at each received call.

(figure: a known sequence and the signal of interest, both distorted by the channel, versus time)

Denote by yknown(t) and yinterest(t) the received signals corresponding to uknown(t) and uinterest(t), respectively.
This model can then be used to determine an appropriate filter to reconstruct uinterest(t) from yinterest(t).

Drawbacks of first-principle modeling:
• the model is generally more complicated than with system identification
• missing actuator/sensor dynamics and phenomena can be forgotten
• sometimes impossible to determine (as in example 3, but also in the process industry)
• no disturbance model

System identification delivers, from the measured u(t) and y(t), a model of the transfer G0 (the contribution of u(t) to y(t), i.e. G0 u(t)) and of the disturbance v(t).

The identification loop:

Experiment design → Data → Identification (with a Model Set and a Criterion) → Model → Validate against the intended model application; if NOT OK, redo the loop; if OK, stop.

Criterion: measures the "distance" between a data set (u, y), t = 1,...,N, and a particular model. In this course, we will consider two criteria:
• the Empirical Transfer Function Estimate (ETFE), delivering an estimate of the frequency response of G0
• prediction error methods, delivering a discrete-time transfer function as model of G0
Experiment Design

• Choice of the type of excitation:
  - sum of sinusoids (multisine)
  - realization of (filtered) white noise or alike
• Which frequency content?
• Which duration?

Model validation

• Comparing the actual output of the system with the output predicted by the model
• Determining the uncertainty of the system, e.g. in the frequency domain

(figure: amplitude and phase of a frequency response with uncertainty bands, versus frequency)
• Basic principle (least squares, LS) from Gauss (1809): data → model
• Development based on theories of
  - stochastic processes
  - statistics
• Strong growth in the sixties and seventies: Åström and Bohlin (1965), Åström and Eykhoff (1971)
• Brought to technological tools in the nineties (Matlab toolboxes for either time-domain or frequency-domain identification), as well as to professional industrial control packages (Aspen, SMOC-PRO, IPCOS, Tai-Ji Control, AdaptX, ...)

Model-based control: data → model → controller, with the controller connected in feedback with the process (reference, input, output and disturbance signals).

(figure: block diagrams of the identification and feedback control configurations)
Variance of an estimate θ̂N: cov(θ̂N) = E(θ̂N - E θ̂N)(θ̂N - E θ̂N)^T

(figure: three estimate distributions — middle: a biased estimate with small variance; right: an unbiased estimate with large variance)
2. Discrete-time systems

u → ZOH (Ts) → ucont → Continuous system → ycont → Sampling (Ts) → y

The system is excited via the discrete sequence u(t), t = 0, 1, 2, ..., generated by a PC. This discrete signal is made continuous by the Zero Order Hold (ZOH):

ucont(tc) = u(t)  for  t Ts ≤ tc < (t+1) Ts

The continuous output ycont(tc) (tc ∈ R) of the system is sampled with sampling time Ts. This sampling delivers the discrete measurements y(t), t = 0, 1, 2, ..., i.e. y(t) = ycont(t Ts).

Example: sampling time Ts = 0.04 s and input sequence u(t) = 0 for 0 ≤ t ≤ 2, ..., u(t) = 0.5 for 18 ≤ t ≤ 40.

(figure: the staircase signal ucont(tc) versus time (s), 0-1.6 s)
The continuous output signal is then sampled with a sample period Ts = 0.04 s (upper plot, blue circles). This delivers the discrete sequence y(t) of 41 samples (t = 0...40) (bottom plot).

(figure: ycont(tc) with the sampling instants versus time (s); the samples y(t) versus sample index, 0-40)

Discrete-time transfer function

Does there exist a transfer function relation between y(t) and u(t)?

u → ZOH (Ts) → Continuous system → Sampling (Ts) → y   ⟺   u → G0(z) → y

G0(z) ≜ Y(z)/U(z) = ( Σ_{t=-∞}^{+∞} y(t) z^{-t} ) / ( Σ_{t=-∞}^{+∞} u(t) z^{-t} )

When G0(s) = a/(s+a) and Ts = 0.04 s, the discrete-time transfer function between y(t) and u(t) is

G0(z) = (1-b) z^{-1} / (1 - b z^{-1})   with b = e^{-a Ts}

Thus:

G0(s) = 10/(s+10)  ⟷  G0(z) = 0.33 z^{-1} / (1 - 0.67 z^{-1})

G0(z) can be computed from G0(s) with the function c2d.m of Matlab (ZOH methodology)
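The G0(s) ↔ G0(z) pair above can be cross-checked numerically. A minimal sketch in Python, where scipy's cont2discrete plays the role of Matlab's c2d.m (the numerical values are those of the example):

```python
import numpy as np
from scipy.signal import cont2discrete

# G0(s) = 10/(s+10), sampled with Ts = 0.04 s using a zero-order hold
num_d, den_d, dt = cont2discrete(([10.0], [1.0, 10.0]), dt=0.04, method='zoh')

# theory: G0(z) = (1-b) z^{-1} / (1 - b z^{-1}) with b = e^{-a Ts}
b = np.exp(-10.0 * 0.04)          # ≈ 0.6703, so 1-b ≈ 0.3297
```

The denominator returned is [1, -b] and the numerator [0, 1-b], matching the 0.33/0.67 values quoted above.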
Proof: suppose u(t) is a discrete step; then ucont(tc) is a continuous step. The step response of G0(s) = a/(s+a) is, for tc > 0, 1 - e^{-a tc}, so that

y(t) = ycont(t Ts) = 1 - e^{-a t Ts} = 1 - b^t

and therefore

G0(z) = Y(z)/U(z) = ( 1/(1-z^{-1}) - 1/(1-b z^{-1}) ) / ( 1/(1-z^{-1}) ) = (1-b) z^{-1} / (1 - b z^{-1})

Time delays carry over as well: for G0(s) = e^{-αs} 10/(s+10) with α = β Ts (β an integer number of sampling intervals), the corresponding rational discrete transfer function is z^{-β} 0.33 z^{-1} / (1 - 0.67 z^{-1}).

Properties of the discrete-time transfer function

With some abuse of notation, we will write y(t) = G0(z) u(t), where z^{-1} is the delay operator:

z^{-1} u(t) ≜ u(t-1)

Example:

y(t) = ( b z^{-1} / (1 - a z^{-1}) ) u(t)  ⟺  y(t) - a y(t-1) = b u(t-1)
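The equivalence between the transfer-function form and the difference equation can be verified numerically; a small sketch (the values a = 0.67, b = 0.33 are taken from the example above; scipy's lfilter implements the transfer-function form):

```python
import numpy as np
from scipy.signal import lfilter

a_, b_ = 0.67, 0.33               # G0(z) = b z^{-1} / (1 - a z^{-1})
rng = np.random.default_rng(0)
u = rng.standard_normal(50)

# transfer-function form: y(t) = G0(z) u(t)
y_filt = lfilter([0.0, b_], [1.0, -a_], u)

# explicit recursion: y(t) = a*y(t-1) + b*u(t-1), with u(t<0) = y(t<0) = 0
y_rec = np.zeros_like(u)
for t in range(1, len(u)):
    y_rec[t] = a_*y_rec[t-1] + b_*u[t-1]
```

Both computations produce the same output sequence.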
G0(z) can be written as an (infinite) impulse response by dividing the numerator of G0(z) by its denominator:

G0(z) = Σ_{k=0}^{∞} g0(k) z^{-k}

A transfer function is stable ⟺ the poles of G0(z) are all located within the unit circle.

Example: b z^{-1} / (1 - a z^{-1}) is stable ⟺ |a| < 1. Indeed, z = a is the unique pole of b z^{-1} / (1 - a z^{-1}).

The frequency response of G0(z) is given by the transfer function evaluated at z = e^{jω}, i.e. on the unit circle: G0(z = e^{jω}). Only the frequency response between [0 π] is relevant. A discrete frequency ω ∈ [0 π] corresponds to the actual frequency ωactual = ω/Ts (ωactual lies within the interval between 0 and the Nyquist pulsation).
General interpretation: for u(t) = sin(ω0 t) (t = -∞...+∞), the output is

y(t) = |G0(e^{jω0})| sin( ω0 t + ∠G0(e^{jω0}) )

(figure: Bode plot — magnitude (dB) and phase (deg) versus frequency (rad/sec))

See the end of the course for methodologies to choose Ts. For systems with several operating points, a model can be identified for each of them (and coupled with a scheduling function).
Input u(t): e.g. a multisine or a realization of (filtered) white noise; output y(t): a stochastic signal.

(figures: a stochastic signal, a multisine and a realization of (filtered) white noise, each versus time, 0-100)
Observations

The signal y(t) can be made up of a combination of stochastic and deterministic signals (e.g. when u(t) is a multisine). A new theory is necessary to deal with such signals, called quasi-stationary signals (see later).

(figure: an ensemble of three realizations of a stochastic process versus time; the realizations differ, BUT each realization has the same power content over ω, i.e. the same Φ(ω))

Stationarity also implies that the mean of the signal and the auto-correlation function are time-invariant.

A quasi-stationary signal is a finite-power signal which can be a combination of stochastic and deterministic components. In identification, the deterministic signals are the multisines. The analysis is very close to the one of stationary stochastic signals (see WB2310 S&R3).
⇒ New operator Ē:

Ē u(t) ≜ lim_{N→∞} (1/N) Σ_{t=1}^{N} E u(t)

Power spectrum:

Φu(ω) = Σ_{τ=-∞}^{+∞} Ru(τ) e^{-jωτ}   with   Ru(τ) ≜ Ē( u(t) u(t-τ) )

Example 1: Φu(ω) and Pu when u(t) is a white noise of variance σu²?

Ru(τ) = lim_{N→∞} (1/N) Σ_{t=1}^{N} E( u(t) u(t-τ) ) = E( u(t) u(t-τ) )   (by stationarity)
      = σu² when τ = 0,  0 when τ ≠ 0

⇒ Φu(ω) = σu² for all ω ∈ [-π π], and Pu = Ru(0) = σu²

(figure: the flat spectrum Φu(ω) = σu² over ω ∈ [-π π])
Example 2: Φu(ω) and Pu when u(t) = A sin(ω0 t + φ)?

Ru(τ) = Ē( u(t) u(t-τ) )
      = Ē( A² sin(ω0 t + φ) sin(ω0 t - ω0 τ + φ) )
      = Ē( (A²/2) cos(ω0 τ) - (A²/2) cos(2ω0 t - ω0 τ + 2φ) )
      = lim_{N→∞} (1/N) Σ_{t=1}^{N} [ (A²/2) cos(ω0 τ) - (A²/2) cos(2ω0 t - ω0 τ + 2φ) ]

⇒ Ru(τ) = (A²/2) cos(ω0 τ)

and thus, in the fundamental frequency range [-π π],

Φu(ω) = (A² π / 2) ( δ(ω - ω0) + δ(ω + ω0) )

and Pu = Ru(0) = A²/2.

The power spectrum of the sinusoid is independent of its phase shift φ and is equal to 0 except in ±ω0, where it is infinite.

(figure: Φu(ω) — two impulses at ±ω0)

Properties of the power spectrum:

• y(t) = G(z) u(t)  ⇒  Φy(ω) = |G(e^{jω})|² Φu(ω)
• y(t) = s1(t) + s2(t) with s1(t) independent of s2(t)  ⇒  Φy(ω) = Φs1(ω) + Φs2(ω)
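The result Pu = A²/2 is easy to verify numerically; a quick sketch (the amplitude, frequency and phase values are arbitrary illustrative choices):

```python
import numpy as np

A, w0, phi = 2.0, 0.63, 0.4       # arbitrary illustrative values
t = np.arange(200000)
u = A*np.sin(w0*t + phi)

# time-average power approximates Ru(0)
Pu = np.mean(u*u)
# theory: Pu = A^2/2 = 2.0, independent of phi
```

Repeating the computation with a different φ gives the same power, as predicted.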
Estimates from a finite data record

Given N samples of u(t), the auto-correlation function can be estimated as

R̂u^N(τ) = (1/N) Σ_{t=1}^{N-τ} u(t+τ) u(t),   and R̂u^N(τ) = 0 for |τ| > N-1

NB: Ru(τ) = Ruu(τ), and when the signals y(t) and u(t) are independent, Ryu(τ) = 0 ∀τ. The estimate degrades for large |τ| (fewer products u(t) u(t-τ) in the sum).

With the discrete Fourier transform

UN(ω) = (1/√N) Σ_{t=0}^{N-1} u(t) e^{-jωt}

the periodogram Φ̂u^N(ω) = |UN(ω)|² estimates the power spectrum of deterministic as well as stochastic signals:

lim_{N→∞} E Φ̂u^N(ω) = Φu(ω)

(figure: a realization of u(t) versus time, 0-1000, and its periodogram (logarithmic scale) versus ω ∈ [0 3])

Φ̂u^N(ω) is an erratic function fluctuating around Φu(ω).
Example: periodogram of a multisine (fundamental period = 10 time-samples):

(figure: u(t) versus time, 0-100, and its periodogram versus ω ∈ [0 3])

It can be proven that the values at ω1 = 0.63, ω2 = 1.26 and ω3 = 1.89 are given by N Ai²/4, where Ai (i = 1, 2, 3) is the amplitude of the sinusoid of frequency ωi.
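The value N Ai²/4 can be reproduced with an FFT; a small sketch (the amplitudes are illustrative; the three sinusoids sit exactly on DFT bins, so there is no leakage):

```python
import numpy as np

N = 100
t = np.arange(N)
amps = [2.0, 1.0, 0.5]            # illustrative amplitudes A1, A2, A3
w1 = 2*np.pi/10                   # fundamental period = 10 samples, w1 ≈ 0.63
u = sum(A*np.sin((k+1)*w1*t) for k, A in enumerate(amps))

U = np.fft.fft(u) / np.sqrt(N)    # U_N(omega) on the DFT frequency grid
per = np.abs(U)**2                # periodogram

# bins 10, 20, 30 correspond to w1, 2*w1, 3*w1; value there = N*A_i^2/4
```

With A1 = 2, A2 = 1, A3 = 0.5 and N = 100, the periodogram values at the three bins are 100, 25 and 6.25.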
The true system S:

y(t) = G0(z) u(t) + H0(z) e(t),   with v(t) ≜ H0(z) e(t)

(block diagram: u(t) → G0(z) → + → y(t), with the disturbance v(t) = H0(z) e(t) entering at the summation)
Assumptions on the true system S:
• e(t) is a white noise (no assumption is made on the probability density function)
• the disturbance v(t) = H0(z) e(t) is independent of the input signal u(t), with E v(t) = 0 and Φv(ω) = |H0(e^{jω})|² σe²

Model set, parametrized by θ = (θ1, θ2)^T:

M = { G(z, θ), H(z, θ)  ∀θ ∈ R² }

Note: H(z, θ) is always chosen as a monic transfer function (like H0).

Full-order model structure: ∃θ0 such that G(z, θ0) = G0(z) and H(z, θ0) = H0(z), i.e. S ∈ M. Consider the following true system:

y(t) = G0(z) u(t) + H0(z) e(t) = G(z, θ0) u(t) + H(z, θ0) e(t)

from which N input and output data have been measured:

Z^N = { u(t), y(t) | t = 1...N }

The objective can therefore be restated as follows: given the parametrization G(z, θ) and H(z, θ), find (an estimate of) the unknown parameter vector θ0 using the data set Z^N.

In other words: θ = θ0 minimizes the power of y(t) - ŷ(t, θ).
Prediction error:

ǫ(t, θ) = H(z, θ)^{-1} ( y(t) - G(z, θ) u(t) )

Properties of the prediction error ǫ(t, θ):

Property 1. Given θ and Z^N, ǫ(t, θ) is computable ∀t = 1...N.

Property 2. ǫ(t, θ0) = e(t) (something really unpredictable at time t). Indeed,

ǫ(t, θ) = H(z, θ)^{-1} ( G0(z) u(t) + H0(z) e(t) - G(z, θ) u(t) )
        = ( (G0(z) - G(z, θ)) / H(z, θ) ) u(t) + ( H0 / H(z, θ) ) e(t)

⇒ ǫ(t, θ0) = e(t)

Property 3. ǫ(t, θ) ≠ white noise for all θ ≠ θ0 (provided an appropriately chosen signal u(t)).

Example:

G(z, θ) = θ1 z^{-1} / (1 + θ2 z^{-1}),   H(z, θ) = 1 / (1 + θ2 z^{-1}),   θ = (θ1, θ2)^T

ǫ(t, θ) = (1 + θ2 z^{-1}) y(t) - θ1 z^{-1} u(t) = y(t) + θ2 y(t-1) - θ1 u(t-1)

Notes:
• it is typically assumed that u(t < 0) = y(t < 0) = 0
• H^{-1}(z, θ) is always causal since H(z, θ) is monic!

Since ǫ(t, θ0) = e(t), we have Ē ǫ²(t, θ0) = σe². Writing ǫ(t, θ) = e(t) + s1(t, θ) + s2(t, θ), with u(t) and e(t) uncorrelated and e(t) white noise:

Ē ǫ²(t, θ) = σe² + Ē s1²(t, θ) + Ē s2²(t, θ)   ⇒   Ē ǫ²(t, θ) > σe²  ∀θ ≠ θ0

θ = θ0 minimizes both Ē s1²(t, θ) and Ē s2²(t, θ) by making them equal to 0 (the latter provided u(t) has been chosen appropriately)

⇒ θ = θ0 minimizes Ē ǫ²(t, θ)
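The property ǫ(t, θ0) = e(t) can be illustrated numerically for the first-order ARX-type example above; a sketch with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2000
th1, th2 = 0.5, -0.7              # true parameters theta0 (illustrative values)
u = rng.standard_normal(N)
e = rng.standard_normal(N)

# true system: (1 + th2 z^-1) y(t) = th1 z^-1 u(t) + e(t)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -th2*y[t-1] + th1*u[t-1] + e[t]

# prediction error at theta0: eps = y(t) + th2*y(t-1) - th1*u(t-1)
eps = y.copy()
eps[1:] += th2*y[:-1] - th1*u[:-1]
```

For t ≥ 1 the prediction error reproduces the white noise e(t) exactly.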
Example (model structure defined later): θ = (b0, b1, f1, f2, f3, f4)^T with true value

θ0 = ( 0.103, 0.181, -1.991, 2.203, -1.841, 0.894 )^T

(figure: ǫ(t, θ0) and ǫ(t, θ1) for some θ1 ≠ θ0, versus t, 0-2000; the auto-correlations of ǫ(t, θ0) (red) and ǫ(t, θ1) (blue) versus tau, 0-40: ǫ(t, θ0) is white noise as opposed to ǫ(t, θ1))

Estimated power of ǫ(t, θ1): 1.4678. Note: the estimated power is R̂ǫ^N(0).

Summary:
• ǫ(t, θ) is a computable quantity comparing the output y(t) of the true system and the predicted output of a model
• θ = θ0 minimizes the power of ǫ(t, θ)
3.1. An ideal criterion

Denote by θ* the solution of the minimization of the power of the prediction error:

θ* = arg min_θ V̄(θ),   V̄(θ) = Ē ǫ²(t, θ)

V̄(θ) has a unique minimum θ*, and θ* = θ0.

Remark: there is no difference between θ* and θ0 at this stage of the course, since we suppose S ∈ M and u(t) appropriate. (In general V̄(θ) may have several minima and θ* represents the set of these minima, while θ0 is one single parameter vector.)

This ideal criterion cannot be used in practice: the power of the prediction error cannot be exactly computed with only one experiment and only N measured data. Parameter estimation is therefore done by minimizing VN:

θ̂N = arg min_θ VN(θ, Z^N)
VN(θ, Z^N) = (1/N) Σ_{t=1}^{N} ǫ²(t, θ)

Properties:
• θ̂N ∼ AsN(θ*, Pθ)
• θ̂N → θ* with probability 1 when N → ∞ (i.e. Pθ → 0 when N → ∞)

Example: we have applied 20 times the same sequence u(t) of length N = 200 and we have measured the corresponding y(t). For these 20 experiments, we have computed the estimate θ̂N of θ0 = (0.3, 0.7)^T.

(figure: the 20 estimates, plotted as b versus a, scattered around θ0)

How is θ̂N computed in practice? In order to answer this question, the parametrization of G(z, θ) and H(z, θ) must be defined more precisely.
Model structure: M = { (G(z, θ), H(z, θ)), θ ∈ R^{nθ} }

General parametrization used in the Matlab Toolbox:

G(z, θ) = z^{-nk} B(z, θ) / ( F(z, θ) A(z, θ) ),   H(z, θ) = C(z, θ) / ( D(z, θ) A(z, θ) )

θ = ( a1 ... a_{na} b0 ... f_{nf} )^T

B(z, θ) = b0 + b1 z^{-1} + ... + b_{nb-1} z^{-nb+1}
A(z, θ) = 1 + a1 z^{-1} + ... + a_{na} z^{-na}
C(z, θ) = 1 + c1 z^{-1} + ... + c_{nc} z^{-nc}
D(z, θ) = 1 + d1 z^{-1} + ... + d_{nd} z^{-nd}
F(z, θ) = 1 + f1 z^{-1} + ... + f_{nf} z^{-nf}

na, nb are the numbers of parameters in the A and B polynomials; nk is the number of time delays.

Model structure     G(z, θ)                     H(z, θ)
ARX                 z^{-nk} B(z, θ) / A(z, θ)   1 / A(z, θ)
ARMAX               z^{-nk} B(z, θ) / A(z, θ)   C(z, θ) / A(z, θ)
OE - Output Error   z^{-nk} B(z, θ) / F(z, θ)   1
FIR                 z^{-nk} B(z, θ)             1
BJ - Box-Jenkins    z^{-nk} B(z, θ) / F(z, θ)   C(z, θ) / D(z, θ)

For OE, FIR and BJ there are no common parameters in G and H ⇒ advantages for independent identification of G and H.
5.1 Case of a predictor linear in θ (ARX, FIR)

In this case the predictor is linear in θ: ŷ(t, θ) = φ(t)^T θ, so that

θ̂N = arg min_θ VN(θ, Z^N),   VN(θ, Z^N) = (1/N) Σ_{t=1}^{N} ( y(t) - φ(t)^T θ )²

can be determined analytically using

∂VN(θ, Z^N)/∂θ |_{θ = θ̂N} = 0

Indeed:

∂VN(θ, Z^N)/∂θ = -(2/N) Σ_{t=1}^{N} [ φ(t) y(t) - φ(t) φ^T(t) θ ]

Putting the derivative to 0 in θ = θ̂N delivers:

[ (1/N) Σ_{t=1}^{N} φ(t) φ^T(t) ] θ̂N = (1/N) Σ_{t=1}^{N} φ(t) y(t)

As a consequence:

θ̂N = R(N)^{-1} f(N),   with R(N) = (1/N) Σ_{t=1}^{N} φ(t) φ^T(t) and f(N) = (1/N) Σ_{t=1}^{N} φ(t) y(t)

• Analytical solution through simple matrix operations.
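The analytical least-squares solution above can be sketched in a few lines of Python; the first-order ARX system and its parameter values are illustrative choices, not from the course:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5000
a1, b0 = -0.8, 0.5                # true ARX parameters (illustrative)
u = rng.standard_normal(N)
e = 0.1*rng.standard_normal(N)

# true ARX system: y(t) + a1 y(t-1) = b0 u(t-1) + e(t)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a1*y[t-1] + b0*u[t-1] + e[t]

# regressor phi(t) = [-y(t-1), u(t-1)]^T so that yhat(t, theta) = phi(t)^T theta
Phi = np.column_stack([-y[:-1], u[:-1]])   # rows phi(t)^T, t = 1..N-1
Y = y[1:]
R = Phi.T @ Phi / len(Y)                   # R(N)
f = Phi.T @ Y / len(Y)                     # f(N)
theta_hat = np.linalg.solve(R, f)          # theta_hat = R(N)^{-1} f(N)
```

With this much data, theta_hat reproduces (a1, b0) closely.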
5.2 Case of a predictor nonlinear in θ (OE, BJ, ARMAX)

Example of the OE model structure:

G(θ) = z^{-nk} B(θ) / F(θ);   H(θ) = 1;   θ = ( b0 ... b_{nb-1} f1 f2 ... f_{nf} )^T

Predictor ŷ(t, θ):

ŷ(t, θ) = H(θ)^{-1} G(θ) u(t) + [1 - H(θ)^{-1}] y(t) = z^{-nk} ( B(θ)/F(θ) ) u(t) = φ(t, θ)^T θ

with

φ(t, θ) = ( u(t-nk), ..., u(t-nk-nb+1), -ŷ(t-1, θ), ..., -ŷ(t-nf, θ) )^T

NONLINEAR in θ, since the regressor φ(t, θ) itself depends on θ. Consequently,

θ̂N = arg min_θ (1/N) Σ_{t=1}^{N} ( y(t) - φ(t, θ)^T θ )²

can NOT be determined analytically using ∂VN(θ, Z^N)/∂θ |_{θ = θ̂N} = 0, since this derivative is a very complicated expression which is nonlinear in θ, and since this derivative is (generally) equal to 0 for several θ (local minima).

The solution θ̂N will therefore be computed iteratively. Risk of finding a local minimum!
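The iterative minimization can be sketched with a generic nonlinear least-squares solver (a minimal illustration, not the specific algorithm used by the Matlab toolbox; the first-order OE system and parameter values are assumed for the example, and the bounds simply keep F(z, θ) stable during the search):

```python
import numpy as np
from scipy.signal import lfilter
from scipy.optimize import least_squares

rng = np.random.default_rng(2)
N = 3000
b0_true, f1_true = 0.5, -0.8      # true OE parameters (illustrative)
u = rng.standard_normal(N)
# OE data: y = (b0 z^-1 / (1 + f1 z^-1)) u + white measurement noise
y = lfilter([0.0, b0_true], [1.0, f1_true], u) + 0.1*rng.standard_normal(N)

def residuals(theta):
    # prediction error eps(t, theta) = y(t) - (b0 z^-1 / (1 + f1 z^-1)) u(t)
    b0, f1 = theta
    return y - lfilter([0.0, b0], [1.0, f1], u)

# iterative search; the result may depend on the initial guess (local minima)
sol = least_squares(residuals, x0=[0.1, 0.0],
                    bounds=([-2.0, -0.99], [2.0, 0.99]))
```

Here the solver converges to the true parameters; with a richer structure or a poor initial guess, it may end in a local minimum, which is exactly the risk stated above.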
The ideal identification criterion has a unique solution θ* (i.e. θ* = θ0 when S ∈ M) if the input signal u(t) that is chosen to generate the experimental data is sufficiently rich.

Counterexample. Consider u(t) = 0 ∀t as input signal and a full-order model structure M for S:

ǫ(t, θ) = ( (G0(z) - G(z, θ)) / H(z, θ) ) u(t) + ( H0 / H(z, θ) ) e(t),   with the first term = 0

The power Ē ǫ²(t, θ) is then minimized for each θ making H(z, θ) = H0. With, e.g., H(z, θ) = 1/(1 + d z^{-1}) and H0 = 1/(1 + d0 z^{-1}):

ǫ(t, θ) = ( (1 + d z^{-1}) / (1 + d0 z^{-1}) ) e(t)

We know that Ē ǫ²(t, θ) is minimum for θ making ǫ(t, θ) = e(t), i.e. d = d0, while the parameters b and f of G(z, θ) remain completely free:

θ* = ( b, d0, f )^T   ∀b ∈ R and ∀f ∈ R

⇒ no unique solution: u(t) = 0 is not sufficiently rich.

Notion of signal richness: persistently exciting input signals

A quasi-stationary signal u is persistently exciting of order n if the (Toeplitz) matrix R̄n is non-singular:

R̄n := [ Ru(0)    Ru(1)   ···  Ru(n-1) ]
      [ Ru(1)    Ru(0)   ···  Ru(n-2) ]
      [   ⋮        ⋮      ⋱      ⋮    ]
      [ Ru(n-1)   ···   Ru(1)  Ru(0)  ]
Examples:

• A white noise process (Ru(τ) = σu² δ(τ)) is persistently exciting of infinite order. Indeed, R̄n = σu² In.

• A block signal with

Ru(0) = 1,  Ru(1) = 1/3,  Ru(2) = -1/3,  Ru(3) = -1,  Ru(4) = -1/3,  Ru(5) = 1/3,  Ru(6) = 1,  etcetera

(figure: the periodic block signal versus time, 0-16)

gives

R̄4 = [  1    1/3  -1/3   -1  ]
     [ 1/3    1    1/3  -1/3 ]
     [ -1/3  1/3    1    1/3 ]
     [  -1  -1/3   1/3    1  ]

R̄3 is regular, R̄4 is singular. Consequently, u is p.e. of order 3.
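Checking the order of persistent excitation amounts to testing the singularity of growing Toeplitz matrices; a minimal sketch using the block-signal autocorrelation values above (the helper pe_order is my own illustrative function, not course notation):

```python
import numpy as np
from scipy.linalg import toeplitz

# autocorrelation values Ru(0), Ru(1), ... of the block signal above
r = [1, 1/3, -1/3, -1, -1/3, 1/3, 1]

def pe_order(r):
    """Largest n such that the Toeplitz matrix Rbar_n is non-singular."""
    n = 0
    while n < len(r):
        R = toeplitz(r[:n+1])                 # symmetric Rbar_{n+1}
        if np.linalg.matrix_rank(R) < n + 1:  # singular -> stop
            break
        n += 1
    return n
```

For these values, Rbar_3 is regular and Rbar_4 is singular (its first and last rows sum to zero), so the function returns 3, matching the slide.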
Sketch of the proof (case of a FIR model structure):

ǫ(t, θ) = y(t) - Σ_{k=1}^{nb} bk u(t-k)   (nk = 1)

θ* is characterized by:

[ Ru(0)     ···  Ru(nb-1) ] [ b1*  ]   [ Ryu(1)  ]
[ Ru(1)     ···  Ru(nb-2) ] [ b2*  ] = [ Ryu(2)  ]
[   ⋮        ⋱      ⋮     ] [  ⋮   ]   [   ⋮     ]
[ Ru(nb-1)  ···  Ru(0)    ] [ bnb* ]   [ Ryu(nb) ]

Consequence: θ* can uniquely be identified if and only if u is persistently exciting of order ≥ nb.

What can we say about the identification of θ̂N? θ̂N will be the (consistent) estimate of θ* = θ0 (the unique solution of the ideal criterion) if the input signal is sufficiently exciting of order ≥ nG.

Remark. In the sequel, we will always assume that the signal u(t) has been chosen such that it is persistently exciting of sufficient order.
Example: first experiment on S

y(t) = ( z^{-3} (0.103 + 0.181 z^{-1}) / (1 - 1.991 z^{-1} + 2.203 z^{-2} - 1.841 z^{-3} + 0.894 z^{-4}) ) u(t) + e(t)

M = { G(z, θ) = z^{-3} (b0 + b1 z^{-1}) / (1 + f1 z^{-1} + f2 z^{-2} + f3 z^{-3} + f4 z^{-4});  H(z, θ) = 1 }

θ = ( b0, b1, f1, f2, f3, f4 )^T  ⇒  nG = 6

(figure: input u(t) and output y(t) versus t, 0-2000)

Using the 2000 recorded data, we have identified θ̂N.
G(z, θ̂N) (blue) is compared with G(z, θ0) (red):

(figure: Bode plots from u1 to y1 — amplitude and phase (degrees) versus frequency (rad/s))

Second experiment on S: we have applied a white noise u(t) (u p.e. of order ∞) to S and collected N = 2000 IO data.

(figure: the collected input and output data versus t, 0-2000, and the Bode plot of the identified model from u1 to y1)

Properties of the estimate when S ∈ M:

θ̂N ∼ N(θ0, Pθ)   (asymptotically in N; the first property mentioned earlier is in fact θ̂N ∼ AsN(θ0, Pθ))

• θ̂N is an unbiased estimate of θ0: E θ̂N = θ0, which means that

lim_{p→∞} (1/p) Σ_{i=1}^{p} θ̂N^{(i)} = θ0

• its covariance is

Pθ ≜ E (θ̂N - θ0)(θ̂N - θ0)^T = (σe²/N) ( Ē ψ(t, θ0) ψ^T(t, θ0) )^{-1}

with ψ(t, θ0) = ∂ŷ(t, θ)/∂θ |_{θ=θ0} = -∂ǫ(t, θ)/∂θ |_{θ=θ0}
Property 1. Pθ is a function of the chosen input signal u(t) and of the number N of data used for the identification.

Proof:

ǫ(t, θ) = ( (G0(z) - G(z, θ)) / H(z, θ) ) u(t) + ( H0 / H(z, θ) ) e(t)   ⇒

ψ(t, θ0) = -∂ǫ(t, θ)/∂θ |_{θ=θ0} = ( ΛG(z, θ0) / H(z, θ0) ) u(t) + ( ΛH(z, θ0) / H(z, θ0) ) e(t)

with ΛG(z, θ) = ∂G(z, θ)/∂θ and ΛH(z, θ) = ∂H(z, θ)/∂θ.

Now defining ΓG = ΛG ΛG* / (H H*) and ΓH = ΛH ΛH* / (H H*) and using Parseval's theorem:

Pθ = (σe²/N) ( Ē ψ(t, θ0) ψ^T(t, θ0) )^{-1}
   = (σe²/N) [ (1/2π) ∫_{-π}^{π} ( ΓG(e^{jω}, θ0) Φu(ω) + ΓH(e^{jω}, θ0) σe² ) dω ]^{-1}

⇒ Pθ is a function of u(t) and N. We can therefore influence the value of Pθ by appropriately choosing u(t) and N.

An estimate of Pθ:

P̂θ = (σ̂e²/N) [ (1/N) Σ_{t=1}^{N} ψ(t, θ̂N) ψ^T(t, θ̂N) ]^{-1},   with σ̂e² = (1/N) Σ_{t=1}^{N} ǫ(t, θ̂N)²

If we could collect N = ∞ data from S, then the identified parameter vector θ̂N→∞ would have the following distribution: θ̂N→∞ ∼ N(θ0, Pθ) with Pθ = 0, i.e. θ̂N→∞ is always equal to θ0.
Example: the linear-in-θ case.

Predictor: ŷ(t, θ) = φ(t)^T θ with φ^T(t) = ( u(t), u(t-1) ), and y(t) = φ(t)^T θ0 + e(t). Then

θ̂N = R^{-1} ( (1/N) Σ_{t=1}^{N} φ(t) ( φ(t)^T θ0 + e(t) ) ),   R = (1/N) Σ_{t=1}^{N} φ(t) φ^T(t)

   = θ0 + R^{-1} (1/N) Σ_{t=1}^{N} φ(t) e(t)      (estimation error)

⇒ θ̂N is a random variable and is (asymptotically) normally distributed. Indeed:
• e(t) is a random process, and
• the central limit theorem applies.

Mean:

E θ̂N = θ0 + E [ R^{-1} (1/N) Σ_{t=1}^{N} φ(t) e(t) ]

Since φ(t) and R are deterministic (not stochastic):

E θ̂N = θ0 + R^{-1} (1/N) Σ_{t=1}^{N} φ(t) E e(t) = θ0   (since E e(t) = 0)

Covariance:

Pθ = E [ R^{-1} (1/N) Σ_{t=1}^{N} φ(t) e(t) ] [ (1/N) Σ_{s=1}^{N} e(s) φ^T(s) ] R^{-1}
   = R^{-1} (1/N²) Σ_{t=1}^{N} Σ_{s=1}^{N} φ(t) E( e(t) e(s) ) φ^T(s) R^{-1}
   = R^{-1} (σe²/N) [ (1/N) Σ_{t=1}^{N} φ(t) φ^T(t) ] R^{-1}      (since E(e(t)e(s)) = σe² δ(t-s))
   = (σe²/N) R^{-1} R R^{-1} = (σe²/N) R^{-1}

Note that Pθ = (σe²/N) R^{-1} converges when N → ∞ to the asymptotic expression (σe²/N) ( Ē ψ(t, θ0) ψ^T(t, θ0) )^{-1}, since

ŷ(t, θ) = φ^T(t) θ  ⇒  ψ(t, θ) = φ(t) ∀θ
Confidence regions. Since θ̂N ∼ N(θ0, Pθ) with θ ∈ R^k, we have

(θ0 - θ̂N)^T Pθ^{-1} (θ0 - θ̂N) ∼ χ²(k)

and θ0 thus lies, with a probability determined by α, in the ellipsoid

U = { θ ∈ R^k | (θ - θ̂N)^T Pθ^{-1} (θ - θ̂N) ≤ α }

The larger Pθ, the larger the ellipsoid and thus the larger the uncertainty.

Example: we have applied a sequence u(t) of length N = 1000 to S and we have measured the corresponding y(t). With k = 2 and a 95% confidence level, α = 5.99:

U = { θ ∈ R² | (θ - θ̂N)^T Pθ^{-1} (θ - θ̂N) ≤ 5.99 }

(figure: the ellipsoid U in the (a, b) parameter plane; the in-practice-unknown θ0, represented by the red circle, lies in U as expected)

Variance of the identified frequency responses

The identified models G(z, θ̂N) (and H(z, θ̂N)) are also random variables:

• G(z, θ̂N) is an (asymptotically) unbiased estimate of G(z, θ0)
• the variance of G(z, θ̂N) is defined in the frequency domain as

cov(G(e^{jω}, θ̂N)) ≜ E |G(e^{jω}, θ̂N) - G(e^{jω}, θ0)|²

Remark: G(z, θ0) lies with the same probability in a corresponding uncertainty region around the frequency response of G(z, θ̂N).
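The bound α = 5.99 used above is simply the 95% quantile of the χ²(2) distribution; a one-line check in Python:

```python
from scipy.stats import chi2

# 95% confidence level in k = 2 dimensions gives the ellipsoid bound alpha
alpha = chi2.ppf(0.95, df=2)
# alpha ≈ 5.99
```

Other confidence levels or parameter dimensions k simply change the quantile used in the ellipsoid definition.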
Property 1. cov(G(e^{jω}, θ̂N)) is a function of Pθ and of ΛG(z, θ) = ∂G(z, θ)/∂θ (obtained using a first-order approximation and the assumption that N is large enough); it is therefore a function of u(t) and N, a direct consequence of the fact that Pθ is a function of these quantities.

Property 2. A simple asymptotic approximation of cov(G(e^{jω}, θ̂N)) (see the final note of this part) is obtained by assuming that the McMillan degree n of the model G(z, θ) in M tends to ∞.

Property 3. An estimate of cov(G(e^{jω}, θ̂N)) can nevertheless be computed using the data and θ̂N.
Validation using cov(G(e^{jω}, θ̂N))

Since cov(G(e^{jω}, θ̂N)) ≜ E |G(e^{jω}, θ̂N) - G(e^{jω}, θ0)|², it is a measure of the modeling error and allows one to deduce uncertainty bands around the frequency response of the identified model G(z, θ̂N).

More precisely, since G(z, θ̂N) is normally distributed, we have at each frequency ω that G(z, θ̂N) is a close estimate of G0 if the standard deviation √cov(G(e^{jω}, θ̂N)) of G(e^{jω}, θ̂N) is small w.r.t. |G(e^{jω}, θ̂N)|.
What is a small standard deviation √cov(G(e^{jω}, θ̂N)) (or a small modeling error) w.r.t. |G(e^{jω}, θ̂N)|? Highly dependent on the expected use for the model! For example, if we want to use the model for control, the modeling error (measured by cov(G(e^{jω}, θ̂N))) has to be much smaller around the cross-over frequency than at the other frequencies. See the literature on "identification for robust control" to know how large cov(G(e^{jω}, θ̂N)) may be.

What to do if the variance appears too large?

If the variance cov(G(e^{jω}, θ̂N)) appears too large, then we cannot guarantee that G(z, θ̂N) is a close estimate of G0(z). A new identification experiment has then to be achieved in order to obtain a better model; for this purpose, we have to take care that the variance in this new identification is smaller.

How to reduce the variance in a new identification? cov(G(e^{jω}, θ̂N)) can be reduced by
• increasing the number of data N;
• or increasing the power spectrum Φu(ω) of the input signal at the frequencies where cov(G(e^{jω}, θ̂N)) was too large.

Example. Let us consider the same flexible transmission system S (in the ARX form). In this example, we need √cov(G(e^{jω}, θ̂N)) / |G(e^{jω}, θ̂N)| < 0.1 ∀ω ∈ [0 1].

First identification experiment: we apply a white noise input signal u(t) of variance σu² = 0.005 to S, collect N = 2000 IO data and identify a model G(z, θ̂N) in M.

(figure: identified G (red) and standard deviation (blue) versus omega) — cov(G(e^{jω}, θ̂N)) is too large!

We want to reduce the variance of the identified model: we apply a white noise input signal u(t) of variance σu² = 1 to S, collect N = 2000 IO data and identify a model G(z, θ̂N) in M.

(figure: identified G (red) and standard deviation (blue) versus omega) — the variance is still too large around the first peak for our control purpose!

We want to reduce the variance of the identified model further around the 1st peak. Let us for this purpose increase the power of u(t) around this first peak.

(figure: identified G (red) and standard deviation (blue) versus omega) — cov(G(e^{jω}, θ̂N)) is now OK for our control purpose!

Final note: cov(H(e^{jω}, θ̂N)) can be deduced using a similar reasoning as for cov(G(e^{jω}, θ̂N)).
What if S ∉ M?

Consider a model structure M = { G(z, θ); H(z, θ) } such that S ∉ M and an input signal u(t) sufficiently exciting of order ≥ nG. Then:
• θ̂N → θ* w.p. 1 when N → ∞
• θ̂N ∼ AsN(θ*, Pθ)   (Pθ having a more complicated expression than when S ∈ M)

In general, G(z, θ*) ≠ G0(z) and H(z, θ*) ≠ H0(z).

One exception though: the model structure M used for identification purposes is such that ∃θ0 with G(z, θ0) = G0(z) but H(z, θ0) ≠ H0(z), i.e. S ∉ M with G0 ∈ G.

To see when G(z, θ*) = G(z, θ0) = G0 in that case, we distinguish two classes of model structures M:
• M with common parameters in G(θ) and H(θ) (i.e. ARX, ARMAX)
• M without common parameters in G(θ) and H(θ) (i.e. OE, BJ, FIR)

For a chosen model structure M = { G(z, θ), H(z, θ) } such that ∃θ0 with G(z, θ0) = G0(z) but H(z, θ0) ≠ H0(z):
• if M is OE, BJ or FIR, then G(z, θ*) = G(z, θ0) = G0
• if M is ARX or ARMAX, then G(z, θ*) ≠ G(z, θ0) and H(z, θ*) ≠ H0
Example:

y(t) = ( z^{-3} (0.103 + 0.181 z^{-1}) / (1 - 1.991 z^{-1} + 2.203 z^{-2} - 1.841 z^{-3} + 0.894 z^{-4}) ) u(t) + v(t)

with v(t) = H0(z) e(t); H0(z) very complicated, i.e. S is not ARX, not OE!

We have applied a powerful white noise input signal (σu² = 5) to S and collected a large number of IO data (N = 5000) ⇒ small variance ⇒ θ̂N ≈ θ*.

Using these IO data, we have identified a model in two model structures such that S ∉ M with G0 ∈ G:

Marx = ARX(na = 4, nb = 2, nk = 3)
Moe = OE(nb = 2, nf = 4, nk = 3)

Let us denote G(z, θ̂N^{arx}) and G(z, θ̂N^{oe}) the models identified in Marx and Moe, respectively.

(figure: Bode plots — amplitude and phase (degrees) versus frequency (rad/s) — of G0 and the two identified models)

As expected, we obtain G(z, θ̂N^{oe}) ≈ G0(z) and G(z, θ̂N^{arx}) very different from G0.
Three possible situations:
• S ∈ M
• S ∉ M with G0 ∈ G
• S ∉ M with G0 ∉ G

How can we verify these assumptions? A solution: model structure validation. Based on θ̂N and Z^N, determine which of the three situations the chosen model structure M corresponds to.
The identified parameter vector is then θ*. Model structure validation is performed by considering Rǫ(τ) and Rǫu(τ) of

ǫ(t, θ*) = H(θ*)^{-1} ( y(t) - G(θ*) u(t) )

Due to the fact that

ǫ(t, θ*) = ( (G0 - G(θ*)) / H(θ*) ) u(t) + ( H0 / H(θ*) ) e(t),

three situations can occur for the quantities Rǫ(τ) and Rǫu(τ).

Situation A. We observe:

Rǫ(τ) = σe² δ(τ) = { σe² for τ = 0; 0 elsewhere },   and Rǫu(τ) = 0 ∀τ

This situation occurs when

ǫ(t, θ*) = 0 × u(t) + ( H0 / H(θ*) ) e(t) = e(t)   ⟺   G(θ*) = G0 and H(θ*) = H0   ⟺   S ∈ M

Situation B. We observe Rǫu(τ) = 0 ∀τ, but Rǫ(τ) ≠ σe² δ(τ). This occurs when

ǫ(t, θ*) = 0 × u(t) + ( H0 / H(θ*) ) e(t) ≠ e(t)   ⟺   G(θ*) = G0 and H(θ*) ≠ H0
⟺   S ∉ M with G0 ∈ G, for M OE, BJ or FIR

Situation C. We observe Rǫu(τ) ≠ 0 for some τ. This occurs when G(θ*) ≠ G0, i.e.
• either S ∉ M with G0 ∈ G, for M ARX or ARMAX (situation B only applies to OE, BJ, FIR),
• or S ∉ M with G0 ∉ G.
No distinction can be made between G0 ∈ G and G0 ∉ G.
In practice, Rǫ(τ) and Rǫu(τ) are unknown and are replaced by their estimates

R̂ǫu^N(τ) = (1/N) Σ_{t=1}^{N-τ} ǫ(t+τ, θ̂N) u(t)
R̂ǫ^N(τ)  = (1/N) Σ_{t=1}^{N-τ} ǫ(t+τ, θ̂N) ǫ(t, θ̂N)

and by considering 99%-confidence regions for these estimates. To construct these confidence regions, we use the following result:

• if Rǫ(τ) = σe² δ(τ), then √N R̂ǫ^N(τ)/R̂ǫ^N(0) ∼ AsN(0, 1)
• if Rǫu(τ) = 0 ∀τ, then √N R̂ǫu^N(τ) ∼ AsN(0, P), with an estimable P

In particular, R̂ǫu^N(τ) lies in its confidence region ∀τ ⇒ (w.p.) Rǫu(τ) = 0 ∀τ.

Conclusions when M is OE, FIR or BJ (w.p. = with high probability):
• both R̂ǫ^N(τ) and R̂ǫu^N(τ) are in their confidence regions ∀τ ⇒ (w.p.) S ∈ M
• R̂ǫu^N(τ) is in its confidence region ∀τ while R̂ǫ^N(τ) is not completely in its confidence region ⇒ (w.p.) S ∉ M with G0 ∈ G
• both R̂ǫ^N(τ) and R̂ǫu^N(τ) are not completely in their confidence regions ⇒ (w.p.) S ∉ M with G0 ∉ G

Note: for M ARX or ARMAX, no distinction can be made between G0 ∈ G and G0 ∉ G.
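These residual tests are easy to implement; a minimal sketch of the auto-correlation (whiteness) test, where the 99% bound ±2.58 R̂ǫ(0)/√N follows from the asymptotic normality result above. For the illustration, the residual is simulated as true white noise, i.e. the S ∈ M case:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 2000
eps = rng.standard_normal(N)       # residuals of a validated model: white noise

def autocorr(x, taumax):
    """Biased auto-correlation estimate Rhat(tau), tau = 0..taumax."""
    n = len(x)
    return np.array([x[tau:] @ x[:n-tau] / n for tau in range(taumax + 1)])

R = autocorr(eps, 25)
# 99% bound for sqrt(N)*R(tau)/R(0) ~ N(0,1): |R(tau)| <= 2.58*R(0)/sqrt(N)
bound = 2.58 * R[0] / np.sqrt(N)
white = np.all(np.abs(R[1:]) <= bound)
```

For a white residual, roughly 99% of the lags fall inside the band; systematic excursions indicate that H(θ̂N) does not describe the noise dynamics. The cross-correlation test R̂ǫu^N(τ) is implemented analogously.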
'
System Identification
$
121
60
20
10
Based on the first analysis of S, first choice for M:
0
M = BJ (nb = 2, nc = 2, nd = 2, nf = 2, nk = 3)
& % & %
−10
0 20 40 60 80 100 120 140 160 180 200
t
From this behaviour, we can conclude that G0 has a limited order and, from a detailed observation, we see that the delay is nk = 3.

We can identify a parameter vector θ̂N in this M using Z^N.

[Figure: auto-correlation function of the residuals (lags 0…25) and cross-correlation function between input u1 and residuals (lags −25…25), with their 99%-confidence regions; neither estimate lies completely in its confidence region.]

w.p. =⇒ S ∉ M with G0 ∉ G

Let us increase the order for G(z, θ) and H(z, θ):

M = BJ (nb = 3, nc = 3, nd = 3, nf = 3, nk = 3)

and identify θ̂N in this new model structure using the data Z^N.

[Figure: the same correlation functions for the new model; R̂εu^N(τ) now lies in its confidence region ∀τ, while R̂ε^N(τ) does not.]

w.p. =⇒ S ∉ M with G0 ∈ G

By a simple iteration, we can then find a model set M that has the property S ∈ M.

[Figure: auto-correlation function of the residuals and cross-correlation function between input u1 and residuals from output y1 for the final model; both lie within their confidence regions.]

w.p. =⇒ S ∈ M
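The slides iterate over BJ structures identified with a prediction-error method; as a self-contained sketch of the same iterate-until-the-residuals-are-white loop, here is a least-squares ARX variant with a made-up true system (all orders and coefficients below are assumptions, not the slides' example):

```python
import numpy as np

# Made-up ARX "true system": y(t) = 1.5 y(t-1) - 0.7 y(t-2) + u(t-1) + 0.5 u(t-2) + e(t)
rng = np.random.default_rng(1)
N = 4000
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(2, N):
    y[t] = 1.5 * y[t - 1] - 0.7 * y[t - 2] + u[t - 1] + 0.5 * u[t - 2] + e[t]

def fit_arx(y, u, n):
    """Least-squares ARX(na=n, nb=n, nk=1) fit; returns estimate and residuals."""
    Phi = np.array([np.concatenate([y[t - n:t][::-1], u[t - n:t][::-1]])
                    for t in range(n, len(y))])
    Y = y[n:]
    theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return theta, Y - Phi @ theta

def is_white(eps, max_lag=20, z=2.58):
    """99% whiteness test on the normalized residual autocorrelation."""
    eps = eps - eps.mean()
    Nn = len(eps)
    r0 = np.dot(eps, eps) / Nn
    rho = np.array([np.dot(eps[k:], eps[:Nn - k]) / Nn
                    for k in range(1, max_lag + 1)]) / r0
    return bool(np.all(np.abs(rho) < z / np.sqrt(Nn)))

# Iterate on the order until the residual test passes
for n in (1, 2, 3):
    theta, eps = fit_arx(y, u, n)
    if is_white(eps):
        break
```

At order n = 1 the residuals are clearly coloured; at the true order the residual variance drops to (about) the noise variance.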
However, a successful model structure validation does not necessarily imply that G(z, θ̂N) and H(z, θ̂N) are close estimates of G0(z) = G(z, θ0) and H0(z) = H(z, θ0) (the variance can still be large !!!)

Computing √cov(G(e^{jω}, θ̂N)) allows one to verify whether G(z, θ̂N) is close to G0 (and eventually √cov(H(e^{jω}, θ̂N)) for H(z, θ̂N)).

The full identification procedure:

1. choose the input signal and collect Z^N
2. choose a model structure M
3. identification of the models G(z, θ̂N) and H(z, θ̂N)
4. verify if S ∈ M; if it is the case, go to item 5; if not, go to item 2 and choose another model structure M
5. verify if √cov(G(e^{jω}, θ̂N)) (and eventually √cov(H(e^{jω}, θ̂N))) are small; if not, go back to item 1; if yes, stop

Possible additional tests for item 5:

• simulation of the identified model
• observation of the poles and zeros of the identified models
• comparison of the frequency response of the identified models with the ETFE (see later) and/or with the physical equations

Recall that, with n large and N, Φu(ω) limited,

cov(G(e^{jω}, θ̂N)) ≈ (n/N) · (Φv(ω)/Φu(ω))

=⇒ perform the identification experiment in such a way that the identified model is a close estimate of S in the important frequency range
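A small numerical reading of this asymptotic formula (all values illustrative):

```python
import numpy as np

# Asymptotic variance of the frequency-response estimate:
#   cov(G(e^{jw})) ~ (n/N) * Phi_v(w) / Phi_u(w)
n, N = 10, 10000        # model order and number of samples (example values)
Phi_v = 0.1             # noise spectrum at the frequency of interest (assumed)
Phi_u = 0.5             # input spectrum at that frequency (assumed)
var_G = (n / N) * Phi_v / Phi_u
std_G = np.sqrt(var_G)
# Doubling the input power at this frequency halves the variance:
var_G_boosted = (n / N) * Phi_v / (2 * Phi_u)
```

This makes the design trade-off explicit: the variance shrinks with more data (larger N) and with more input power where accuracy matters (larger Φu(ω)).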
We consider the true system S: y(t) = G0(z)u(t) + v(t), and M = {G(z, θ); H(z, θ) = 1} is an OE model structure such that ∄ θ0 with G(z, θ0) = G0(z).

θ̂N is a random variable due to the stochastic disturbance v(t) corrupting the data. Its covariance Pθ can not be determined analytically, but Pθ → 0 when N → ∞ =⇒ θ̂N → θ* w.p. 1 when N → ∞.

The modeling error G0(z) − G(z, θ̂N) is decomposed into two contributions:

G0(z) − G(z, θ̂N) = (G0(z) − G(z, θ*)) + (G(z, θ*) − G(z, θ̂N))

• G0 − G(θ*) is called the bias error and is due to the fact that S ∉ M with G0 ∉ G. Note: when S ∈ M, G0(z) − G(z, θ*) = 0; here, ∄ θ0 with G(z, θ0) = G0(z) =⇒ G(z, θ*) ≠ G0(z).
• G(θ*) − G(θ̂N) is called the variance error and is due to the fact that N < ∞.

In the sequel, we focus successively:

• on the bias error
• on the variance error
=⇒ θ* = arg min_θ V̄(θ)

with

V̄(θ) = Ē ε(t, θ)² = (1/2π) ∫_{−π}^{π} Φε(ω, θ) dω

(Parseval; both expressions are equal to Rε(0))

Consequently,

θ* = arg min_θ (1/2π) ∫_{−π}^{π} Φε(ω, θ) dω
   = arg min_θ (1/2π) ∫_{−π}^{π} ( |G0(e^{jω}) − G(e^{jω}, θ)|² Φu(ω) + Φv(ω) ) dω

Notes:

• Φv(ω) does not depend on θ; it therefore does not influence the bias error, but it influences the variance error.
• Φu(ω) acts as a frequency weighting on the bias error: the bias will be the smallest at those ω's where Φu(ω) is relatively the largest.
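A minimal numerical illustration of this frequency weighting, with a made-up second-order G0 fitted by a 2-tap FIR model on a frequency grid (a sketch, not the slides' example):

```python
import numpy as np

# The weight below plays the role of Phi_u(w) in the bias criterion.
w = np.linspace(1e-3, np.pi, 400)
z = np.exp(1j * w)
G0 = 0.1 * z**-1 / (1 - 1.7 * z**-1 + 0.72 * z**-2)   # made-up "true" system

def fit_fir2(weight):
    """Real-coefficient weighted LS fit of G(theta) = th0 + th1*z^-1 to G0."""
    X = np.column_stack([np.ones_like(z), z**-1])
    sw = np.sqrt(weight)
    A = np.vstack([(sw[:, None] * X).real, (sw[:, None] * X).imag])
    b = np.concatenate([(sw * G0).real, (sw * G0).imag])
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return X @ theta

G_flat = fit_fir2(np.ones_like(w))        # flat input spectrum
G_low = fit_fir2(1.0 / (0.05 + w**2))     # Phi_u concentrated at low frequencies
band = w < 0.3
err_flat = np.mean(np.abs(G0[band] - G_flat[band])**2)
err_low = np.mean(np.abs(G0[band] - G_low[band])**2)
# err_low < err_flat: the bias is smallest where Phi_u is relatively largest
```

The same reduced-order model fits the low band much better when the weighting (i.e. the input power) is concentrated there.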
Prefiltering the data:

uF(t) = L(z) u(t) and yF(t) = L(z) y(t)

The prediction error based on the filtered data is then εF(t, θ) = L(z) ε(t, θ), where ε(t, θ) is the prediction error if we would have used u(t) and y(t). Consequently,

ΦεF(ω, θ) = |L(e^{iω})|² · Φε(ω, θ)

Result: if you use the data uF(t) and yF(t) for the identification, the weighting function shaping the bias error is therefore

W(ω) = Φu(ω) |L(e^{iω})|²
13.5 Example

We want a model G(z, θ̂N) that is a good estimate of G0(z) in the frequency range [0 0.7], in the reduced order model structure:

M = OE (nb = 2, nf = 2, nk = 3)

[Figure: G0(z) (red) and G(z, θ̂N) (blue) identified with the data in Z^N; amplitude and phase vs. frequency (rad/s).]

=⇒ G(z, θ̂N) is KO

• large N =⇒ small variance error
• we want a small bias error in the frequency range [0 0.7] =⇒ we choose a prefilter L(z): a low-pass filter with cut-off frequency 0.7 rad/s

[Figure: amplitude and phase of the filter L.]

We filter u and y collected from S by this L, and we obtain a new estimate G(z, θ̂N) from the filtered data.

[Figure: G0(z) (red) and the new G(z, θ̂N) (blue).]

=⇒ G(z, θ̂N) is OK

13.6 What about a Box-Jenkins model structure?

For a BJ model structure, the weighting function W(ω) for the bias error G0(e^{jω}) − G(e^{jω}, θ*) is

W(ω) = Φu(ω) |L(e^{iω})|² / |H(e^{jω}, θ*)|²

=⇒ the identified noise model also influences the bias error of the G-model !!
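The prefiltering step itself is straightforward; a sketch with a hypothetical 5th-order Butterworth low-pass as L(z) (the slides do not specify the filter type or the data, so all values here are assumptions):

```python
import numpy as np
from scipy import signal

# Bias shaping by prefiltering: both u and y are passed through the SAME
# low-pass L(z), which multiplies the bias weighting by |L(e^{jw})|^2.
rng = np.random.default_rng(2)
N = 2000
u = rng.standard_normal(N)
b0, a0 = [0.0, 0.1, 0.05], [1.0, -1.2, 0.5]          # made-up "true" system
y = signal.lfilter(b0, a0, u) + 0.05 * rng.standard_normal(N)

wc = 0.7                                  # desired cut-off in rad/sample
b_L, a_L = signal.butter(5, wc / np.pi)   # butter() cut-off is normalized to Nyquist = 1
uF = signal.lfilter(b_L, a_L, u)
yF = signal.lfilter(b_L, a_L, y)
# (uF, yF) are then used instead of (u, y) in the identification criterion.
```

The filter leaves the band of interest untouched and strongly attenuates the rest.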
Sc4110: part IV

Based on these N time-domain data, we want to estimate the frequency response G0(e^{jω}) (amplitude and phase) of the true plant transfer function.

The data are transformed to the frequency domain:

{ y(t) | t = 0…(N−1) } ⟷ Y_N(ω) = (1/√N) Σ_{t=0}^{N−1} y(t) e^{−jωt}

The Empirical Transfer Function Estimate (ETFE) is

Ĝ(e^{jω}) = Y_N(ω)/U_N(ω) = |Ĝ(e^{jω})| e^{j∠Ĝ(e^{jω})}

Ĝ(e^{jω}) is therefore only computed at those frequencies ωk for which U_N(ωk) ≠ 0.
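The ETFE is a one-liner once the DFTs are available; a minimal sketch:

```python
import numpy as np

def etfe(u, y):
    """ETFE G(e^{jw_k}) = Y_N(w_k)/U_N(w_k) on the DFT grid w_k = 2*pi*k/N."""
    N = len(u)
    U = np.fft.rfft(u) / np.sqrt(N)   # the 1/sqrt(N) scaling cancels in the ratio
    Y = np.fft.rfft(y) / np.sqrt(N)
    w = 2 * np.pi * np.arange(len(U)) / N
    # only divide where the input has (numerically) nonzero energy
    mask = np.abs(U) > 1e-12
    G = np.full(len(U), np.nan + 0j)
    G[mask] = Y[mask] / U[mask]
    return w, G
```

For a pure (circular) one-sample delay, the ETFE recovers e^{−jω} exactly on the DFT grid.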
Illustration

The true system S is

y(t) = G0(z) u(t) + H0(z) e(t)

with

G0(z) = z^{−3} (0.103 + 0.181 z^{−1}) / (1 − 1.991 z^{−1} + 2.203 z^{−2} − 1.841 z^{−3} + 0.894 z^{−4}),

H0 = 1/den(G0), and e(t) a white noise disturbance of variance σe² = 0.1.

We collect N = 10000 data on this true system, subsequently with two different input signals having the same power Pu = 0.5 = 5 σe².

Input signal 1: a multisine of fundamental frequency ω0 = 2π/100 ≈ 0.06 (power = 0.5):

u(t) = (1/√30) Σ_{k=1}^{30} sin(k ω0 t)

The Fourier transform U_N(ω) of such a signal is indeed only significant at the (active) harmonics of ω0; Ĝ(e^{jω}) will therefore only be computed at those harmonics. The ETFE is thus computed at the 30 harmonics of ω0 present in u(t).
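The multisine of input signal 1 can be generated directly from its definition:

```python
import numpy as np

# Multisine: fundamental w0 = 2*pi/100, 30 active harmonics, amplitudes
# 1/sqrt(30) so that the total power is 0.5 (as on the slide).
N = 10000
w0 = 2 * np.pi / 100
t = np.arange(N)
u = np.sum([np.sin(k * w0 * t) for k in range(1, 31)], axis=0) / np.sqrt(30)
Pu = np.mean(u**2)   # each sine contributes (1/30)/2, so Pu = 30/60 = 0.5
```

Since N = 10000 is an integer number of periods (period 100), the sample power equals the theoretical power.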
[Figure: above, the ETFE at the 30 harmonics of ω0; below, the same together with the frequency response of G0(e^{jω}) (blue).]

We see that the ETFE is a good estimate of G0(e^{jω}) at the harmonics of ω0.

Input signal 2 excites all frequencies; the ETFE is then computed at all the N/2 = 5000 frequencies ωk.
[Figure: above, the ETFE (modulus) at all ωk; below, the same together with the frequency response of G0(e^{jω}) (blue).]

We see that this ETFE is an erratic and poor estimate of G0(e^{jω}).

How can we explain this? For this purpose, we need to understand the statistical properties of the ETFE.
=⇒ the ETFE is a random variable: it is different at each experiment. Writing

Ĝ(e^{jω}) = G0(e^{jω}) + V_N(ω)/U_N(ω)

with V_N(ω) defined as Y_N(ω) and U_N(ω), the variance of the ETFE at a frequency ωk is Φv(ωk)/|U_N(ωk)|².

For the multisine (input signal 1),

|U_N(e^{jωk})|² = N A_k²/4 = 10000/120

since the amplitude A_k of each sine is 1/√30 and N = 10000. Since |U_N|² is proportional to N and to A_k², the variance is proportional to 1/N and to 1/A_k² =⇒ the variances of Ĝ(e^{jωk}) are small for all ωk.

For input signal 2, cov(Ĝ(e^{jω})) tends, for increasing values of N, to Φv(ω)/Φu(ω). Unlike for a multisine u(t), the variance is thus not proportional to 1/N: it is only proportional to 1/σu² and does not decrease with N.

For equal power, the variance is therefore much smaller for the multisine u(t).

How can we then get a relatively good estimate with such an input? How can we reduce the variance?
Smoothing is motivated by the fact that the ETFE estimates are independent for different ωk's. The smoothed ETFE is

Ĝsm(e^{jω}) = ( ∫_{−π}^{π} Wγ(ξ − ω) Ĝ(e^{iξ}) dξ ) / ( ∫_{−π}^{π} Wγ(ξ − ω) dξ )

with Ĝ(e^{jω}) the unsmoothed ETFE and Wγ(ω) a positive real-valued frequency-function (window).

Ĝsm(e^{jωk}) at a particular frequency ωk is thus obtained by averaging Ĝ(e^{jω}) in an interval [ωk − Δω, ωk + Δω].

[Figure: Hamming frequency window Wγ(ω) for different resolutions: γ = 10 (solid), γ = 20 (dash-dotted) and γ = 40 (dashed).]

γ is a measure for the width of the window.

• the choice of the window depends on the expected smoothness of G0(e^{iω})
• a window that is too wide leads to possible smoothing of the dynamics

[Figure: the smoothed ETFE, together with the frequency response of G0(e^{jω}) (blue).]
Besides trial-and-error coupled with physical insights on G0(z), is there another way to select γ? Yes....

The ETFE can be rewritten as

Ĝ(e^{jω}) = Y_N(ω)/U_N(ω) = ( Y_N(ω) U_N*(ω) ) / ( U_N(ω) U_N*(ω) ) = ( Σ_{τ=−∞}^{+∞} R̂yu^N(τ) e^{−jωτ} ) / ( Σ_{τ=−∞}^{+∞} R̂u^N(τ) e^{−jωτ} )

where the last step follows from expressions (3.13) and (3.19) in the lecture notes, with

R̂yu^N(τ) = (1/N) Σ_{t=0}^{N−1} y(t) u(t−τ) for 0 < τ < N−1, and 0 elsewhere

(and similarly for R̂u^N(τ)). The numerator and denominator are thus approximations of the spectra, obtained by taking the Fourier transforms of the estimates R̂yu^N(τ) and R̂u^N(τ) of the exact correlation functions.
Moreover, it can be shown that

Ĝsm(e^{jω}) = ( Σ_{τ=−∞}^{+∞} wγ(τ) R̂yu^N(τ) e^{−jωτ} ) / ( Σ_{τ=−∞}^{+∞} wγ(τ) R̂u^N(τ) e^{−jωτ} )

with wγ(τ) obtained as the inverse Fourier transform of the frequency window Wγ(ω).

[Figure: a typical R̂yu^N(τ) (solid) together with the Hamming lag-windows w10(τ) (dotted), w30(τ) (dashed) and w70(τ) (dash-dotted).]
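The lag-window form can be sketched directly (the numpy Hamming window below stands in for wγ(τ); this is an illustration, not the lecture's exact implementation):

```python
import numpy as np

def smoothed_etfe(u, y, gamma, n_freq=200):
    """Smoothed ETFE as a ratio of lag-windowed spectral estimates."""
    N = len(u)
    taus = np.arange(-gamma, gamma + 1)
    def corr(a, b, tau):                     # (1/N) * sum_t a(t+tau) b(t)
        if tau >= 0:
            return np.dot(a[tau:], b[:N - tau]) / N
        return np.dot(a[:N + tau], b[-tau:]) / N
    r_yu = np.array([corr(y, u, t) for t in taus])
    r_u = np.array([corr(u, u, t) for t in taus])
    w_lag = np.hamming(2 * gamma + 1)        # lag window, maximal at tau = 0
    w = np.linspace(0.05, np.pi, n_freq)
    E = np.exp(-1j * np.outer(w, taus))      # matrix of e^{-j*w*tau}
    return w, (E @ (w_lag * r_yu)) / (E @ (w_lag * r_u))
```

For a simple delayed-gain system with white input, the smoothed estimate hovers around the true gain.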
The lag window wγ(τ) removes, from the estimate Φ̂yu^N(ω) of Φyu(ω), the elements of R̂yu^N(τ) for τ > γ.

This is relevant since Ryu(τ) → 0 for τ → ∞ (G0(z) stable), and since the accuracy of R̂yu^N(τ) becomes smaller and smaller for increasing values of τ (R̂yu^N(τ) is computed with less data points): for large τ, the values R̂yu^N(τ) are small w.r.t. |R̂yu^N(0)| and "less reliable".

[Figure: estimated Ryu(τ) for τ up to 9000; we see the inaccuracy of the estimate: R̂yu^N(τ) does not tend to 0 for τ → ∞.]
Let us focus on the first 500 τ's.

[Figure: estimated Ryu(τ) for τ = 0…500.]

We see that, after τ = 100, R̂yu^N(τ) increases again, which is very unlikely for the true Ryu(τ) =⇒ we select γ = 100.

[Figure: the smoothed ETFE obtained with γ = 100; above, at the frequencies ωk; below, together with the frequency response of G0(e^{jω}) (blue).]

Limitations of the (smoothed) ETFE:

• the ETFE gives a discrete estimate of the frequency response G0(e^{jω}), and not the rational transfer function G0(z)
• it gives no information about the noise spectrum Φv(ω), while this information is important for e.g. disturbance rejection

=⇒ parametric identification (prediction error identification)
[Figure: comparison of the smoothed ETFE with the frequency response of a model obtained by prediction error identification (PEI).]

This comparison shows that PEI delivers much better results, even with ten times less data points.

The ETFE nevertheless remains useful before a parametric identification, e.g. to determine the delay of the system.
SC4110: Part V

The ETFE obtained with these data can be represented up to ωs/2, which can be compared with the system bandwidth ωb as observed in the ETFE. Data with a smaller ωs can be obtained:

• either by re-collecting data with a smaller ωs
• or by decimating the data obtained with the high ωs (+ anti-aliasing filter)

Considering now the normalized frequency ω = ωactual · Ts, we note that the interval [0 ωs/2] (actual frequencies) corresponds to the main interval [0 π] when considering normalized frequencies. Indeed,

ωs/2 = π/Ts (actual frequency) =⇒ normalized ω = (π/Ts) · Ts = π
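Decimation with an anti-aliasing filter is available off-the-shelf; a sketch with illustrative values (scipy's decimate applies the low-pass before down-sampling):

```python
import numpy as np
from scipy import signal

# Decimating by a factor q replaces (Ts, ws) by (q*Ts, ws/q). The signal
# below (made up for illustration) has a 5 Hz component of interest and a
# 270 Hz component that would alias without the anti-aliasing filter.
fs = 1000.0                       # original sampling frequency [Hz]
t = np.arange(0, 2.0, 1 / fs)
x = np.sin(2 * np.pi * 5 * t) + 0.1 * np.sin(2 * np.pi * 270 * t)
q = 10                            # decimation factor -> new fs = 100 Hz
x_dec = signal.decimate(x, q)     # 270 Hz is filtered out, not aliased to 30 Hz
```

Without the filter, the 270 Hz line would fold down to 30 Hz in the decimated data.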
Desired properties of the input signal:

• its spectrum Φu(ω) should be shaped so as to increase the accuracy of the identified model
• the amplitude of the time-domain signal should be bounded/limited in order not to damage the actuators and in order not to excite the nonlinearities

Multisine: shaping Φu(ω) is very easy, but there is no a-priori bound on the amplitude of u(t) !! However, the phase shifts φk can be optimized in order to reduce the maximal amplitude of u(t) without any effect on the power spectrum Φu(ω).
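One classical choice for such a phase optimization is the Schroeder phase φk = −π k(k−1)/K (an assumption here; the slides do not prescribe a particular method). Compared with all-zero phases, it strongly reduces the peak amplitude for exactly the same Φu(ω):

```python
import numpy as np

K, N = 30, 1000
w0 = 2 * np.pi / N
t = np.arange(N)
A = 1 / np.sqrt(K)                 # equal amplitudes, total power 0.5

def multisine(phases):
    return sum(A * np.cos(k * w0 * t + phases[k - 1]) for k in range(1, K + 1))

u_aligned = multisine(np.zeros(K))                       # all phases zero
u_schroeder = multisine(np.array([-np.pi * k * (k - 1) / K
                                  for k in range(1, K + 1)]))
peak_aligned = np.max(np.abs(u_aligned))      # = K*A at t = 0
peak_schroeder = np.max(np.abs(u_schroeder))  # much smaller, same spectrum
```

Both signals have identical amplitude spectra (hence identical Φu(ω)); only the time-domain crest differs.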
The RBS (random binary signal) has the maximal power Pu = Ēu²(t) = c² that can be attained by a signal with |u(t)| ≤ c ∀t.

[Figure: (a) typical RBS with clock period equal to the sampling interval (ν = 1); (b) RBS with increased clock period ν = 2.]

Influence of ν on Φu(ω):

[Figure: spectrum (1/2π) Φu(ω) of a (P)RBS with basic clock period ν = 1 (black), ν = 3 (green), ν = 5 (red), and ν = 10 (blue).]

Compared to the multisine: less flexibility, but bounded amplitude !!
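An RBS with clock period ν can be generated by holding random ±c values for ν samples; a minimal sketch:

```python
import numpy as np

def rbs(N, c=1.0, nu=1, seed=0):
    """Random binary signal of amplitude +/- c, held constant over nu
    consecutive samples (clock period nu)."""
    rng = np.random.default_rng(seed)
    n_clock = -(-N // nu)                        # ceil(N / nu) clock intervals
    levels = c * rng.choice([-1.0, 1.0], size=n_clock)
    return np.repeat(levels, nu)[:N]

u = rbs(1000, c=2.0, nu=5)
Pu = np.mean(u**2)   # = c^2: the maximal power for |u(t)| <= c
```

Increasing ν concentrates the power at low frequencies while keeping the amplitude bound intact.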
4 Data (pre)processing

• anti-aliasing filter
• outliers/spikes
• non-zero mean and drift in disturbances: detrending

Unstable systems can not be identified in open loop. Experiments have to be done with a stabilizing controller C in closed loop:

y(t) = (G0 C)/(1 + G0 C) r(t) + H0/(1 + G0 C) e(t)

From the data {r(t), y(t)}, one can identify the closed-loop transfer function T̂(z) between r and y; an estimate of the open-loop plant is then recovered as

Ĝ(z) = T̂(z) / ( C(z) (1 − T̂(z)) )
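The indirect recovery Ĝ = T̂/(C(1 − T̂)) can be checked on a frequency grid; a noise-free sketch with a made-up plant and controller:

```python
import numpy as np

# Indirect closed-loop identification sketch: given the (estimated)
# closed-loop response T_hat and the known controller C, recover the plant.
w = np.linspace(0.01, np.pi, 200)
z = np.exp(1j * w)

G0 = 0.5 / (1 - 0.9 * z**-1)          # made-up plant (only to check the algebra)
C = 2.0 * np.ones_like(z)             # a simple proportional controller
T_hat = G0 * C / (1 + G0 * C)         # "identified" closed-loop response (noise-free)

G_hat = T_hat / (C * (1 - T_hat))     # indirect recovery of the open-loop plant
```

In the noise-free case the recovery is exact; with an identified T̂(z), the errors on T̂ propagate to Ĝ through the same formula.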