
Filtering and Identification

Lecture 2: Random processes and Linear Least Squares

Michel Verhaegen and Jan-Willem van Wingerden



Delft Center for Systems and Control


Delft University of Technology
Overview
• Dynamical Systems and Random Processes
• Deterministic least squares (LS) problems
• Statistical Properties LS estimates
• Application of LS to System Identification


Dynamical Systems (Chapter 3)

[Block diagram: system Σ with “input” u(k), disturbance v(k), and “output” y(k).]

Σ: LTI (Linear Time Invariant) system with representation:
• IIR: $y(k) = \sum_{\ell=-\infty}^{\infty} g(\ell)\,u(k-\ell) + v(k)$
• (Or) State Space Model:
  $x(k+1) = A x(k) + B u(k)$
  $y(k) = C x(k) + D u(k) + v(k)$
• (Or) ...
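
The state-space representation can be simulated by stepping the two equations forward in time. The following is a minimal sketch, assuming NumPy; the function name `simulate_state_space` and the first-order example system are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_state_space(A, B, C, D, u, v=None, x0=None):
    """Simulate x(k+1) = A x(k) + B u(k), y(k) = C x(k) + D u(k) + v(k).

    A, B, C, D are expected as 2-D arrays (or nested lists); u has one row per sample.
    """
    A, B, C, D = (np.atleast_2d(M).astype(float) for M in (A, B, C, D))
    u = np.asarray(u, dtype=float).reshape(len(u), -1)    # N x (number of inputs)
    N = u.shape[0]
    n_outputs = C.shape[0]
    # The disturbance defaults to zero when none is supplied
    v = np.zeros((N, n_outputs)) if v is None else np.asarray(v, dtype=float).reshape(N, -1)
    x = np.zeros(A.shape[0]) if x0 is None else np.asarray(x0, dtype=float)
    y = np.zeros((N, n_outputs))
    for k in range(N):
        y[k] = C @ x + D @ u[k] + v[k]    # output equation
        x = A @ x + B @ u[k]              # state update
    return y

# Hypothetical first-order example: x(k+1) = 0.5 x(k) + u(k), y(k) = x(k)
u = np.array([1.0, 0.0, 0.0, 0.0])        # unit pulse input
y = simulate_state_space(A=[[0.5]], B=[[1.0]], C=[[1.0]], D=[[0.0]], u=u)
print(y.ravel())                          # samples of the impulse response g(0), g(1), ...
```

With a unit pulse as input and no disturbance, the output samples are the impulse response coefficients g(ℓ) of the IIR representation, which links the two descriptions on the slide.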

Signals (Chapter 4: 4.1 - 4.4)

[Block diagram: system Σ with “input” u(k), disturbance v(k), and “output” y(k).]

1. The quantities u(k), y(k) are (measured) signals (OR) discrete-time sequences (OR) sampled input-output data sequences, generally denoted as:

Given: $\{u(k), y(k)\}_{k=1}^{N}$, $N \in \mathbb{N}$

2. The quantity v(k) is a stochastic process (generally unknown) (OR) a discrete-time sequence of random variables with a (Gaussian) probability density. In this course we restrict ourselves to its mean and covariance function.

Stochastic Signals

[Block diagram: e(k) → Σn → v(k)]

In this course: a stochastic process v(k) is stationary and “assumed” to result from filtering zero-mean white noise e(k) with an LTI system Σn.


Stationarity
Definition (wide-sense stationarity, WSS): A random process $x(k) \in \mathbb{R}$ is WSS if the following three conditions are satisfied:
1. the mean is constant, $m_x(k) = m_x$
2. the auto-correlation function $R_x(k, \ell) = E[x(k)x(\ell)]$ only depends on the lag $k - \ell$
3. the variance $E[(x(k) - m_x)^2]$ is finite
Hence,
$$R_x(k, \ell) = R_x(k - \ell) = R_x(\tau) = E[x(k)x(k - \tau)]$$

White noise $e(k) \sim (0, \sigma_e^2)$
A zero-mean white noise sequence (ZMWN): The random process e(k) is a ZMWN if it has mean zero and its auto-covariance (auto-correlation) function equals:
$$E[e(k)e(\ell)] = \begin{cases} \sigma_e^2 & \text{for } k = \ell \\ 0 & \text{otherwise} \end{cases}$$

Denoted as $R_e(\tau) = E[e(k)e(k - \tau)] = \sigma_e^2 \Delta(\tau)$, with $\Delta(\tau)$ the unit pulse.


RPs in the time-domain
If the (real) RPs x(k) and y(k) are wide sense stationary (WSS), then these RPs are fully characterized in the time-domain by their means
$$E[x(k)] = m_x, \quad E[y(k)] = m_y$$
and their auto-, cross-covariance functions:
$$C_x(\tau) = E\left[(x(k) - m_x)(x(k - \tau) - m_x)^T\right]$$
$$C_{xy}(\tau) = E\left[(x(k) - m_x)(y(k - \tau) - m_y)^T\right]$$


RPs in the time-domain
An equivalent characterization is to replace the auto-, cross-covariance functions by the auto-, cross-correlation functions:
$$R_x(\tau) = E\left[x(k)x(k - \tau)^T\right] = E\left[x(k + \tau)x(k)^T\right]$$
$$R_{xy}(\tau) = E\left[x(k)y(k - \tau)^T\right]$$


RPs in the time-domain
The numerical calculation may proceed via the assumption of ergodicity, which makes it possible to prove relationships like:
$$\Pr\left[\lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} x(k)x(k - \tau)^T = R_x(\tau)\right] = 1$$
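
The ergodicity relation suggests estimating $R_x(\tau)$ from a single finite record by a time average. Below is a minimal sketch, assuming NumPy and a Gaussian white noise test signal; the helper `sample_autocorrelation`, the seed and the value of `sigma_e` are hypothetical. The average is taken over the N − τ products that a finite record provides, which differs from the 1/N normalization only by a factor that tends to one.

```python
import numpy as np

def sample_autocorrelation(x, max_lag):
    """Time-average estimate of R_x(tau) = E[x(k) x(k - tau)] for tau = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # Average x(k) x(k - tau) over the N - tau products available in the record
    return np.array([np.dot(x[tau:], x[:N - tau]) / (N - tau)
                     for tau in range(max_lag + 1)])

rng = np.random.default_rng(0)       # fixed seed, chosen only for repeatability
sigma_e = 2.0                        # hypothetical standard deviation of the white noise
e = sigma_e * rng.standard_normal(100_000)

R_hat = sample_autocorrelation(e, max_lag=3)
print(R_hat)   # close to sigma_e**2 at lag 0 and close to 0 at the other lags
```

For a ZMWN sequence the estimate should approximate $\sigma_e^2 \Delta(\tau)$, with the deviation shrinking as the record length grows.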


Overview
• Dynamical Systems and Random Processes
• Deterministic least squares (LS) problems
• Statistical Properties LS estimates
• Application of LS to System Identification


Deterministic least squares problem
Given $F \in \mathbb{R}^{N \times n}$, $y \in \mathbb{R}^{N}$, the problem is:

$$\min_x \epsilon^T \epsilon \quad \text{subject to: } y = Fx + \epsilon$$

The argument that minimizes this problem is the least squares solution and is denoted as $\hat{x}$.
For all $x \in \mathbb{R}^n$, it satisfies
$$\|y - F\hat{x}\|_2^2 \le \|y - Fx\|_2^2$$


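Deterministic least squares problem

$$\|y - F\hat{x}\|_2^2 \le \|y - Fx\|_2^2$$

[Figure: the vector y, the plane span(F) spanned by f1 and f2, the points Fx and ŷ in that plane, and the residual ε.]

where $\hat{y} = F\hat{x} = \begin{bmatrix} f_1 & f_2 \end{bmatrix} \hat{x}$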


The classical solution
Lemma: Let the matrix F in
$$\min_x \epsilon^T \epsilon \quad \text{subject to: } y = Fx + \epsilon$$
have full column rank, then the solution $\hat{x}$ is:
$$\hat{x} = (F^T F)^{-1} F^T y$$

This follows from the normal equations:
$$F^T F \hat{x} = F^T y$$
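
As a numerical illustration of the lemma, the sketch below solves the normal equations for a randomly generated full column rank F and compares the result with NumPy's least squares routine. The dimensions, the seed, `x_true` and the noise level are hypothetical choices made only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)           # fixed seed, chosen only for repeatability
N, n = 50, 3
F = rng.standard_normal((N, n))          # tall random matrix, full column rank in practice
x_true = np.array([1.0, -2.0, 0.5])      # hypothetical parameter vector
y = F @ x_true + 0.1 * rng.standard_normal(N)

# Solve the normal equations F^T F x = F^T y without forming an explicit inverse
x_hat = np.linalg.solve(F.T @ F, F.T @ y)

# Reference solution from NumPy's least squares routine
x_ref, *_ = np.linalg.lstsq(F, y, rcond=None)

residual = y - F @ x_hat
print(x_hat, np.allclose(x_hat, x_ref))
print(F.T @ residual)   # numerically zero: the residual is orthogonal to the columns of F
```

The last line restates the normal equations as $F^T (y - F\hat{x}) = 0$, which is the orthogonality shown in the geometric picture of the least squares problem.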


Proof of the classical solution
Via the completion of squares. For all x and $\hat{x}$ satisfying:

$$(F^T F)\hat{x} = F^T y$$

we can write the least squares cost function as:

$$\|y - Fx\|_2^2 = (y - Fx)^T (y - Fx)$$
$$= y^T y - x^T F^T y - y^T F x + x^T F^T F x$$
$$= y^T y - y^T F \hat{x} + (x - \hat{x})^T F^T F (x - \hat{x})$$

Therefore,
$$\arg\min_x \|y - Fx\|_2^2 = \hat{x}$$
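
The last equality in the derivation skips a step. A sketch of the missing expansion, using only the normal equations $F^T F \hat{x} = F^T y$ and the fact that $x^T F^T y = y^T F x$ is a scalar:

```latex
\begin{align*}
(x-\hat{x})^T F^T F (x-\hat{x})
  &= x^T F^T F x - 2\,x^T F^T F \hat{x} + \hat{x}^T F^T F \hat{x} \\
  &= x^T F^T F x - 2\,x^T F^T y + y^T F \hat{x}
  && \text{(using } F^T F \hat{x} = F^T y \text{)} \\[4pt]
y^T y - y^T F \hat{x} + (x-\hat{x})^T F^T F (x-\hat{x})
  &= y^T y - 2\,x^T F^T y + x^T F^T F x \\
  &= y^T y - x^T F^T y - y^T F x + x^T F^T F x .
\end{align*}
```

Only the quadratic term depends on x. It is nonnegative because $F^T F$ is positive semidefinite, and it vanishes at $x = \hat{x}$, which is why $\hat{x}$ attains the minimum.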


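Proof of the classical solution
$$\|y - Fx\|_2^2 = \begin{bmatrix} 1 & x^T \end{bmatrix} \underbrace{\begin{bmatrix} y^T y & -y^T F \\ -F^T y & F^T F \end{bmatrix}}_{M} \begin{bmatrix} 1 \\ x \end{bmatrix}$$
$$M = \begin{bmatrix} 1 & -\hat{x}^T \\ 0 & I \end{bmatrix} \begin{bmatrix} y^T y - y^T F \hat{x} & 0 \\ 0 & F^T F \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -\hat{x} & I \end{bmatrix},$$
for $\hat{x}$ satisfying
$$F^T F \hat{x} = F^T y.$$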


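Proof of the classical solution (Ct’d)
$$\|y - Fx\|_2^2 = \begin{bmatrix} 1 & x^T \end{bmatrix} \begin{bmatrix} y^T y & -y^T F \\ -F^T y & F^T F \end{bmatrix} \begin{bmatrix} 1 \\ x \end{bmatrix}$$
$$= \begin{bmatrix} 1 & x^T \end{bmatrix} \begin{bmatrix} 1 & -\hat{x}^T \\ 0 & I \end{bmatrix} \begin{bmatrix} y^T y - y^T F \hat{x} & 0 \\ 0 & F^T F \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -\hat{x} & I \end{bmatrix} \begin{bmatrix} 1 \\ x \end{bmatrix}$$
$$= \begin{bmatrix} 1 & (x - \hat{x})^T \end{bmatrix} \begin{bmatrix} y^T y - y^T F \hat{x} & 0 \\ 0 & F^T F \end{bmatrix} \begin{bmatrix} 1 \\ x - \hat{x} \end{bmatrix}$$

for $\hat{x}$ satisfying $F^T F \hat{x} = F^T y$.

$$\|y - Fx\|_2^2 = (y^T y - y^T F \hat{x}) + (x - \hat{x})^T F^T F (x - \hat{x}).$$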


Overview
• Dynamical Systems and Random Processes
• Deterministic least squares (LS) problems
• Statistical Properties LS estimates
• Application of LS to System Identification


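“Measurement Errors” $\epsilon \sim (0, I)$

Given $F \in \mathbb{R}^{N \times n}$, $y \in \mathbb{R}^{N}$, the problem is:

$$\min_x \epsilon^T \epsilon \quad \text{subject to: } y = Fx + \epsilon$$

• F is a known full column rank matrix.
• x is an unknown, deterministic vector.
• $\epsilon$ is a zero-mean random vector with $E[\epsilon \epsilon^T] = I$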


Linear Estimators for the least squares problem

The least squares solution
$$\hat{x} = (F^T F)^{-1} F^T y$$
is a linear estimator: it is linear in y.
Definition: A linear estimator for x given y has the form:
$$\tilde{x} = My$$
with $M \in \mathbb{R}^{n \times N}$


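Unbiased and minimum variance
The linear estimator $\hat{x} = \hat{M} y$ is unbiased if
$$E[\hat{x} - x] = 0$$

The linear estimator $\hat{x} = \hat{M} y$ is called the minimum variance estimator if
$$E\left[(\hat{x} - x)(\hat{x} - x)^T\right] \le E\left[(\tilde{x} - x)(\tilde{x} - x)^T\right]$$
for all unbiased linear estimators $\tilde{x} = My$, where the inequality is understood in the positive semidefinite sense.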


The Gauss-Markov theorem
The least squares solution
$$\hat{x} = (F^T F)^{-1} F^T y$$
is an unbiased minimum variance estimate (UMVE) and has covariance matrix:
$$E\left[(x - \hat{x})(x - \hat{x})^T\right] = (F^T F)^{-1}$$


Proof of the Gauss-Markov theorem
Linear estimator which is UNBIASED
$$\tilde{x} = My = MFx + M\epsilon \;\Rightarrow\; \tilde{x} - x = (MF - I)x + M\epsilon$$
Consider the mean of $\tilde{x} - x$:
$$E[\tilde{x} - x] = (MF - I_n)x + M E[\epsilon] = (MF - I)x$$
The linear estimator is unbiased provided,

$$E[\tilde{x} - x] = 0 \;\Leftrightarrow\; MF = I$$
The least squares estimator $M = (F^T F)^{-1} F^T$ clearly satisfies $MF = I_n$

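Proof of Minimum Variance Property
Recall
$$\tilde{x} - x = (MF - I)x + M\epsilon = M\epsilon$$
Then, the covariance matrix of the unbiased linear estimate $\tilde{x} = My$ with M satisfying $MF = I$:
$$E\left[(\tilde{x} - x)(\tilde{x} - x)^T\right] = M E[\epsilon \epsilon^T] M^T = M M^T$$

For the least squares solution $\hat{x}$ (with $\hat{M} = (F^T F)^{-1} F^T$), its covariance matrix equals,
$$E\left[(\hat{x} - x)(\hat{x} - x)^T\right] = (F^T F)^{-1} F^T F (F^T F)^{-1} = (F^T F)^{-1}$$

Since $M M^T - (F^T F)^{-1} = (M - \hat{M})(M - \hat{M})^T \ge 0$ whenever $MF = I$, no unbiased linear estimator has a smaller covariance matrix.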
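
The two covariance expressions can be checked numerically. The sketch below is a hedged illustration, assuming NumPy and Gaussian errors with $E[\epsilon \epsilon^T] = I$; the dimensions, the seed, `x_true` and the matrix `Z` are hypothetical. It builds a second unbiased linear estimator of the form $M = \hat{M} + Z(I - F\hat{M})$, which satisfies $MF = I$ for any Z, and compares covariances.

```python
import numpy as np

rng = np.random.default_rng(2)           # fixed seed, chosen only for repeatability
N, n = 20, 2
F = rng.standard_normal((N, n))
x_true = np.array([1.0, -1.0])           # hypothetical deterministic parameter vector

M_ls = np.linalg.solve(F.T @ F, F.T)     # least squares estimator (F^T F)^{-1} F^T

# Another unbiased linear estimator: M F = I holds because (I - F M_ls) F = 0
Z = rng.standard_normal((n, N))
M_alt = M_ls + Z @ (np.eye(N) - F @ M_ls)

cov_ls = M_ls @ M_ls.T                   # theoretical covariance, equals (F^T F)^{-1}
cov_alt = M_alt @ M_alt.T                # theoretical covariance of the alternative
gap_eigenvalues = np.linalg.eigvalsh(cov_alt - cov_ls)   # nonnegative up to rounding

# Monte Carlo check of the LS covariance: one row per realization of y = F x + eps
trials = 20_000
eps = rng.standard_normal((trials, N))
Y = F @ x_true + eps
errors = Y @ M_ls.T - x_true             # rows are x_hat - x
cov_mc = errors.T @ errors / trials

print(np.round(cov_ls, 4))
print(np.round(cov_mc, 4))
print(gap_eigenvalues)
```

The nonnegative eigenvalues of `cov_alt - cov_ls` illustrate the ordering in the positive semidefinite sense for one particular competitor; they do not replace the proof.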


lsq_mvar.m


Overview
• Dynamical Systems and Random Processes
• Deterministic least squares (LS) problems
• Statistical Properties LS estimates
• Application of LS to System Identification


A simple System Identification Problem

[Block diagram: system Σ with input u(k), noise e(k), and output y(k).]

With Σ given by the following (2nd order) difference equation:

$$\underbrace{y(k) + a_1 y(k-1) + a_2 y(k-2)}_{\text{AR}} = \underbrace{b_1 u(k-1) + b_2 u(k-2)}_{\text{X}} + e(k)$$

This can be written in transfer function form:

$$y(k) = G(q)u(k) + H(q)e(k), \quad G(q) = \frac{b_1 q^{-1} + b_2 q^{-2}}{1 + a_1 q^{-1} + a_2 q^{-2}}, \quad H(q) = \frac{1}{1 + a_1 q^{-1} + a_2 q^{-2}}$$

Then the identification problem is: $\{u(k), y(k)\}_{k=1}^{N} \to \hat{\Sigma}$

Solving the ARX Identification Problem
We can denote the difference equation as:
$$y(k) = \begin{bmatrix} -y(k-1) & -y(k-2) & u(k-1) & u(k-2) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ b_1 \\ b_2 \end{bmatrix} + e(k)$$

If we define this data relationship as the (k − 2)th row of the following matrix equation (for k = 3 : N):
$$y = Fx + \epsilon$$
then the ARX parameters $\begin{bmatrix} a_1 & a_2 & b_1 & b_2 \end{bmatrix}$ can be found by solving the least squares problem:
$$\min_x \epsilon^T \epsilon \quad \text{subject to: } y = Fx + \epsilon$$
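
The row-by-row construction of F can be sketched as follows, assuming NumPy. The helper name `arx_regression` and the parameter values are hypothetical; the data are generated without noise so that the least squares solution recovers the parameters exactly.

```python
import numpy as np

def arx_regression(u, y):
    """Stack y(k) = [-y(k-1), -y(k-2), u(k-1), u(k-2)] x + e(k) for k = 3..N."""
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    # With 0-based arrays, sample k of the slides is stored at index k - 1
    F = np.column_stack([-y[1:-1], -y[:-2], u[1:-1], u[:-2]])
    return F, y[2:]

# Hypothetical stable second-order ARX parameters, used only for this illustration
a1, a2, b1, b2 = -0.5, 0.2, 1.0, 0.5

rng = np.random.default_rng(3)           # fixed seed, chosen only for repeatability
N = 200
u = rng.standard_normal(N)
y = np.zeros(N)                          # zero initial conditions (an assumption)
for k in range(2, N):
    # Noise-free difference equation, solved for the newest output sample
    y[k] = -a1 * y[k - 1] - a2 * y[k - 2] + b1 * u[k - 1] + b2 * u[k - 2]

F, y_vec = arx_regression(u, y)
x_hat, *_ = np.linalg.lstsq(F, y_vec, rcond=None)
print(F.shape, x_hat)                    # N - 2 rows, estimates of [a1, a2, b1, b2]
```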

lsq_demo.m
Consider the following AR(MA)X model:
$$y(k) - 1.5y(k-1) + 0.7y(k-2) = u(k-1) - u(k-2) + e(k) - e(k-1) + 0.2e(k-2)$$

with u(k) and e(k) independent zero-mean white noise sequences of unit variance and length 1000. Using $\{u(k), y(k)\}_{k=1}^{903}$, estimate the parameters of a 2nd order ARX model for (a) the MA part zero and (b) the MA part as given.
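
The experiment can be sketched in Python as follows. This is not the content of lsq_demo.m, only an illustration under stated assumptions: NumPy, Gaussian white noise, zero initial conditions, a fixed seed, and the hypothetical helper names `simulate_armax` and `fit_arx2`.

```python
import numpy as np

def simulate_armax(u, e, c1, c2):
    """y(k) - 1.5 y(k-1) + 0.7 y(k-2) = u(k-1) - u(k-2) + e(k) + c1 e(k-1) + c2 e(k-2)."""
    y = np.zeros(len(u))                 # zero initial conditions (an assumption)
    for k in range(2, len(u)):
        y[k] = (1.5 * y[k - 1] - 0.7 * y[k - 2] + u[k - 1] - u[k - 2]
                + e[k] + c1 * e[k - 1] + c2 * e[k - 2])
    return y

def fit_arx2(u, y):
    """Least squares estimate of [a1, a2, b1, b2] for a 2nd order ARX model."""
    F = np.column_stack([-y[1:-1], -y[:-2], u[1:-1], u[:-2]])
    x_hat, *_ = np.linalg.lstsq(F, y[2:], rcond=None)
    return x_hat

rng = np.random.default_rng(4)           # fixed seed, chosen only for repeatability
N_total, N_used = 1000, 903              # record length and number of samples used
u = rng.standard_normal(N_total)
e = rng.standard_normal(N_total)

x_true = np.array([-1.5, 0.7, 1.0, -1.0])            # [a1, a2, b1, b2] of the model
y_a = simulate_armax(u, e, 0.0, 0.0)                 # (a) MA part zero
y_b = simulate_armax(u, e, -1.0, 0.2)                # (b) MA part as given

x_hat_a = fit_arx2(u[:N_used], y_a[:N_used])
x_hat_b = fit_arx2(u[:N_used], y_b[:N_used])
print(x_true, x_hat_a, x_hat_b)
```

In case (a) the equation error is white, so the regressors are uncorrelated with it and the estimate should lie close to the true parameters. In case (b) the regressors y(k − 1) and y(k − 2) are correlated with the coloured equation error, so the least squares estimate is in general biased.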


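The use of the QR factorization

The QR-Theorem: Let $A \in \mathbb{R}^{m \times n}$ ($m \ge n$), then there exists an orthogonal matrix $Q \in \mathbb{R}^{m \times m}$ that can be partitioned as:
$$Q = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}, \quad Q_1 \in \mathbb{R}^{m \times n}$$

such that,
$$Q^T A = \begin{bmatrix} R \\ 0 \end{bmatrix} \quad \text{with } R \in \mathbb{R}^{n \times n} \text{ and } R \text{ upper-triangular}$$

[or the matrix A is factorized as $Q_1 R$.]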



QR Solution to LS problem
LSQR-Theorem: Consider the LS problem $\min_x \|y - Fx\|_2$ and consider the following QR factorization of F and the application of $Q^T$ to y as,
$$F = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R \\ 0 \end{bmatrix}, \quad \begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix} y = \begin{bmatrix} d_1 \\ d_2 \end{bmatrix}$$

Consider the matrix F to have full column rank, then the LS solution $\hat{x}$ and the LS residual satisfy:

$$\hat{x} = R^{-1} d_1, \quad \|y - F\hat{x}\|_2 = \|d_2\|_2$$
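
A minimal numerical sketch of the LSQR-Theorem, assuming NumPy; the random data, dimensions and seed are hypothetical. `np.linalg.qr` with `mode="complete"` returns the full N × N matrix Q, so that both $d_1$ and $d_2$ are available.

```python
import numpy as np

rng = np.random.default_rng(5)           # fixed seed, chosen only for repeatability
N, n = 30, 3
F = rng.standard_normal((N, n))          # full column rank in practice
y = rng.standard_normal(N)

# Full QR factorization: Q is N x N and the top n rows of R_full hold the triangular R
Q, R_full = np.linalg.qr(F, mode="complete")
R = R_full[:n, :]

d = Q.T @ y
d1, d2 = d[:n], d[n:]

# x_hat = R^{-1} d1; a dedicated triangular solver would exploit the structure of R
x_hat = np.linalg.solve(R, d1)
residual_norm = np.linalg.norm(y - F @ x_hat)

print(x_hat)
print(residual_norm, np.linalg.norm(d2))   # the two norms agree
```

Working with R instead of $F^T F$ avoids forming the normal equations, whose condition number is the square of that of F.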


SensNeq.m


Summary of Lecture 2
• Refresher on the characterization of RPs in the time (and frequency) domain
• The linear least squares problem: unknown x deterministic.
• Gauss-Markov theorem (MVUE).
• Ready for the first homework: download it now!


Next Instruction session
Preparation:
Study Chapters 2 (2.6 - 2.7) and 4 (4.1 - 4.5.2)
Get Homework 1

Next lecture:
Addressing your questions on Homework 1.
Tuesday 17-11-2015
