
ECE5530: Multivariable Control Systems II.


VECTOR RANDOM PROCESSES

2.1: Scalar random variables

■ We will be working with systems whose state is affected by noise and whose output measurements are also corrupted by noise.

[Block diagram: a system with state $x(t)$ is driven by a disturbance $w(t)$ and produces an output $y(t)$ that is corrupted by sensor noise $v(t)$.]

■ By definition, noise is not deterministic; it is random in some sense.

■ So, to discuss the impact of noise on the system dynamics, we must review the concept of a "random variable" (RV) X.

◦ We cannot predict exactly what we will get each time we measure or sample the random variable, but

◦ We can characterize the probability of each sample value by the "probability density function" (pdf).

Probability density functions (pdf)

■ We denote the probability density function (pdf) of RV X as $f_X(x)$.

■ $f_X(x_0)\,dx$ is the probability that the random variable X lies in the interval $[x_0,\, x_0 + dx]$.

[Figure: sketch of a pdf $f_X(x)$ with a strip of width $dx$ at $x_0$.]

Lecture notes prepared by Dr. Gregory L. Plett. Copyright © 2016, 2010, 2008, 2006, 2004, 2002, 2001, 1999, Gregory L. Plett
■ Properties that are true of all pdfs:

1. $f_X(x) \ge 0 \;\;\forall\, x$.

2. $\displaystyle\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.

3. $\Pr(X \le x_0) = \displaystyle\int_{-\infty}^{x_0} f_X(x)\,dx \triangleq F_X(x_0)$, which is the RV's cumulative distribution function (cdf).¹
■ Problem: Apart from simple examples, it is often difficult to determine $f_X(x)$ accurately. ➠ Use approximations to capture the key behavior.

■ We need to define key characteristics of $f_X(x)$.
EXPECTATION: Describes the expected outcome of a random trial.

$\bar{x} = E[X] = \displaystyle\int_{-\infty}^{\infty} x\,f_X(x)\,dx$.

■ Expectation is a linear operator (very important for working with it).

■ So, for example, the first moment about the mean is $E[X - \bar{x}] = 0$.
STATISTICAL AVERAGE: Different from expectation.

■ Consider a (discrete) RV X that can assume n values $x_1, x_2, \ldots, x_n$.

■ Define the average by making many measurements, $N \to \infty$. Let $m_i$ be the number of times the value of the measurement is $x_i$. Then,

$\bar{x} = \dfrac{1}{N}(m_1 x_1 + m_2 x_2 + \cdots + m_n x_n) = \dfrac{m_1}{N}x_1 + \dfrac{m_2}{N}x_2 + \cdots + \dfrac{m_n}{N}x_n$.

■ In the limit, $m_i/N \to \Pr(X = x_i)$ and so forth (assuming ergodicity), so

$\bar{x} = \displaystyle\sum_{i=1}^{n} x_i \Pr(X = x_i)$.

¹ Some call this a probability distribution function, with acronym PDF (uppercase), as different from a probability density function, which has acronym pdf (lowercase).

■ So, statistical means can converge to expectation in the limit. (One can show a similar property for continuous RVs, but it is harder to do.)
VARIANCE: The second moment about the mean.

$\mathrm{var}(X) = E[(X - \bar{x})^2] = \displaystyle\int_{-\infty}^{\infty}(x - \bar{x})^2 f_X(x)\,dx = E[X^2] - (E[X])^2$;

that is, the variance equals the mean-square minus the squared mean.
STANDARD DEVIATION: A measure of dispersion of the samples of X about the mean: $\sigma_X = \sqrt{\mathrm{var}(X)}$.
■ The expectation and variance capture key features of the actual pdf.
Higher-order moments are available, but we won’t need them!
KEY POINT FOR UNDERSTANDING VARIANCE: Chebyshev's inequality.

■ Chebyshev's inequality states (for positive $\varepsilon$)

$\Pr(|X - \bar{x}| \ge \varepsilon) \le \dfrac{\sigma_X^2}{\varepsilon^2}$,

which implies that probability is concentrated around the mean.
■ It may be proven as follows:

$\Pr(|X - \bar{x}| \ge \varepsilon) = \displaystyle\int_{-\infty}^{\bar{x}-\varepsilon} f_X(x)\,dx + \displaystyle\int_{\bar{x}+\varepsilon}^{\infty} f_X(x)\,dx$.

■ For the two regions of integration, $|x - \bar{x}|/\varepsilon \ge 1$, or $(x - \bar{x})^2/\varepsilon^2 \ge 1$. So,

$\Pr(|X - \bar{x}| \ge \varepsilon) \le \displaystyle\int_{-\infty}^{\bar{x}-\varepsilon}\dfrac{(x - \bar{x})^2}{\varepsilon^2}\,f_X(x)\,dx + \displaystyle\int_{\bar{x}+\varepsilon}^{\infty}\dfrac{(x - \bar{x})^2}{\varepsilon^2}\,f_X(x)\,dx$.
■ Since $f_X(x)$ is positive, we also have

$\Pr(|X - \bar{x}| \ge \varepsilon) \le \displaystyle\int_{-\infty}^{\infty}\dfrac{(x - \bar{x})^2}{\varepsilon^2}\,f_X(x)\,dx = \dfrac{\sigma_X^2}{\varepsilon^2}$.
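■ A quick Monte-Carlo sanity check of the inequality can be run in MATLAB. This is an illustrative sketch only; the Gaussian distribution and the values of $\varepsilon$ below are arbitrary choices, not part of the notes.

N = 1e6; xbar = 2; sigma = 3;
X = xbar + sigma*randn([N 1]);                % samples of X ~ N(xbar, sigma^2)
for epsilon = [3 6 9]                         % several test values of epsilon
  empirical = mean(abs(X - xbar) >= epsilon); % estimate of Pr(|X - xbar| >= epsilon)
  bound = sigma^2/epsilon^2;                  % Chebyshev upper bound
  fprintf('epsilon = %g: Pr = %.4f <= bound = %.4f\n', epsilon, empirical, bound);
end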
■ This inequality shows that probability is clustered around the mean, and that the variance is an indication of the dispersion of the pdf.

■ That is, variance (later on, covariance too) informs us of how uncertain we are about the value of a random variable.

◦ Low variance means that we are very certain of its value;

◦ High variance means that we are very uncertain of its value.

■ The mean and variance give us an estimate of the value of a random variable, and how certain we are of that estimate.

The most important distribution for this course

■ The Gaussian (normal) distribution is of key importance to us. (We will explain why this is true later; see "main point #7" later in this chapter.)

■ Its pdf is defined as:

$f_X(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma_X}\exp\left(-\dfrac{(x - \bar{x})^2}{2\sigma_X^2}\right)$.

[Figure: Gaussian pdfs of differing widths, centered at $\bar{x}$.]

■ Symmetric about $\bar{x}$.

■ Peak proportional to $\dfrac{1}{\sigma_X}$, located at $\bar{x}$.

■ Notation: $X \sim \mathcal{N}(\bar{x}, \sigma_X^2)$.

■ The probability that X is within $\pm\sigma_X$ of $\bar{x}$ is 68%; within $\pm 2\sigma_X$ of $\bar{x}$, 95%; within $\pm 3\sigma_X$ of $\bar{x}$, 99.7%.

◦ A $\pm 3\sigma_X$ range almost certainly covers observed samples.

■ "Narrow" distribution ➠ sharp peak. High confidence in predicting X.

■ "Wide" distribution ➠ poor knowledge of what to expect for X.

2.2: Vector random variables

■ With very little change in the preceding, we can also handle vectors of random variables:

$X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}$; let $x_0 = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$.

■ X is described by the (scalar-valued) joint pdf $f_X(x)$ of the vector X.

■ $f_X(x_0)$ means $f_X(X_1 = x_1,\, X_2 = x_2,\, \ldots,\, X_n = x_n)$.

■ That is, $f_X(x_0)\,dx_1\,dx_2\cdots dx_n$ is the probability that X is between $x_0$ and $x_0 + dx$.
■ Properties of the joint pdf $f_X(x)$:

1. $f_X(x) \ge 0 \;\;\forall\, x$. Same as before.

2. $\displaystyle\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f_X(x)\,dx_1\,dx_2\cdots dx_n = 1$. Basically the same.

3. $\bar{x} = E[X] = \displaystyle\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} x\,f_X(x)\,dx_1\,dx_2\cdots dx_n$. Basically the same.

4. Correlation matrix: Different.

$R_X = E[XX^T]$ (an outer product)

$= \displaystyle\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} x\,x^T f_X(x)\,dx_1\,dx_2\cdots dx_n$.

5. Covariance matrix: Different. Define $\tilde{X} = X - \bar{x}$. Then,

$\Sigma_X = R_{\tilde{X}} = E[(X - \bar{x})(X - \bar{x})^T]$

$= \displaystyle\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} (x - \bar{x})(x - \bar{x})^T f_X(x)\,dx_1\,dx_2\cdots dx_n$.


$\Sigma_X$ is symmetric and positive semidefinite (psd). This means $y^T\Sigma_X y \ge 0 \;\;\forall\, y$.

PROOF: For all $y \ne 0$,

$0 \le E\big[\big(y^T(X - \bar{x})\big)^2\big] = y^T E[(X - \bar{x})(X - \bar{x})^T]\,y = y^T\Sigma_X y$.
■ Notice that correlation and covariance are the same for zero-mean random vectors.

■ The covariance entries have specific meaning:

$(\Sigma_X)_{ii} = \sigma_{X_i}^2$

$(\Sigma_X)_{ij} = \rho_{ij}\,\sigma_{X_i}\sigma_{X_j} = (\Sigma_X)_{ji}$.

◦ The diagonal entries are the variances of each vector component;

◦ The correlation coefficient $\rho_{ij}$ is a measure of linear dependence between $X_i$ and $X_j$; $|\rho_{ij}| \le 1$.

The most important multivariable distribution for this course

■ The multivariable Gaussian is of key importance for Kalman filtering.

[Figure: surface plot of a two-dimensional Gaussian pdf, centered at $(\bar{x}_1, \bar{x}_2)$.]


$f_X(x) = \dfrac{1}{(2\pi)^{n/2}|\Sigma_X|^{1/2}}\exp\left(-\dfrac{1}{2}(x - \bar{x})^T\Sigma_X^{-1}(x - \bar{x})\right)$,

where $|\Sigma_X| = \det(\Sigma_X)$, and $\Sigma_X^{-1}$ requires a positive-definite $\Sigma_X$.

■ Notation: $X \sim \mathcal{N}(\bar{x}, \Sigma_X)$.

■ Contours of constant $f_X(x)$ are hyper-ellipsoids, centered at $\bar{x}$, with directions governed by $\Sigma_X$. The principal axes decouple $\Sigma_X$ (they are its eigenvectors).
■ Two-dimensional zero-mean case (let $\sigma_1 = \sigma_{X_1}$ and $\sigma_2 = \sigma_{X_2}$):

$\Sigma_X = \begin{bmatrix}\sigma_1^2 & \rho_{12}\sigma_1\sigma_2 \\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2\end{bmatrix}, \qquad |\Sigma_X| = \sigma_1^2\sigma_2^2\,(1 - \rho_{12}^2),$

$f_{X_1,X_2}(x_1, x_2) = \dfrac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho_{12}^2}}\exp\left(-\dfrac{\dfrac{x_1^2}{\sigma_1^2} - 2\rho_{12}\dfrac{x_1 x_2}{\sigma_1\sigma_2} + \dfrac{x_2^2}{\sigma_2^2}}{2(1 - \rho_{12}^2)}\right)$.
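■ A short MATLAB sketch can visualize these elliptical contours. The numeric values of $\sigma_1$, $\sigma_2$, and $\rho_{12}$ below are arbitrary choices for illustration.

sigma1 = 1; sigma2 = 2; rho12 = 0.7;              % arbitrary example values
SigmaX = [sigma1^2, rho12*sigma1*sigma2; ...
          rho12*sigma1*sigma2, sigma2^2];
[x1, x2] = meshgrid(linspace(-6, 6, 200));
Xg = [x1(:), x2(:)];                              % each row is one point [x1 x2]
q  = sum((Xg / SigmaX) .* Xg, 2);                 % quadratic form x'*inv(SigmaX)*x
f  = exp(-q/2) / (2*pi*sqrt(det(SigmaX)));        % pdf values
contour(x1, x2, reshape(f, size(x1)));            % tilted elliptical contours
axis equal; xlabel('x_1'); ylabel('x_2');

■ The contours are ellipses whose tilt is governed by $\rho_{12}$, consistent with the claim that the principal axes are eigenvectors of $\Sigma_X$.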

2.3: Uncorrelated versus independent

INDEPENDENCE: Jointly-distributed RVs are independent if and only if

$f_X(x_1, x_2, \ldots, x_n) = f_{X_1}(x_1)\,f_{X_2}(x_2)\cdots f_{X_n}(x_n)$.

■ The joint distribution can be split up into the product of individual distributions for each RV.

■ Equivalent condition: for all functions $f(\cdot)$ and $g(\cdot)$,

$E[f(X_1)\,g(X_2)] = E[f(X_1)]\,E[g(X_2)]$.

■ "The particular value of the random variable $X_1$ has no impact on what value we would obtain for the random variable $X_2$."

UNCORRELATED: Two jointly-distributed RVs $X_1$ and $X_2$ are uncorrelated if their second moments are finite and

$\mathrm{cov}(X_1, X_2) = E[(X_1 - \bar{x}_1)(X_2 - \bar{x}_2)] = 0$,

which implies $\rho_{12} = 0$.

■ Uncorrelated means that there is no linear relationship between $X_1$ and $X_2$.

MAIN POINT #1: If jointly-distributed RVs $X_1$ and $X_2$ are independent, then they are uncorrelated. Independence implies uncorrelatedness.

■ To see this, notice that independence means $E[X_1 X_2] = E[X_1]\,E[X_2]$. Therefore,

$\mathrm{cov}(X_1, X_2) = E[X_1 X_2] - E[X_1]\,E[X_2] = 0$,

so they are uncorrelated.

■ Does uncorrelatedness imply independence? Consider the example

$Y_1 = \sin(2\pi X)$ and $Y_2 = \cos(2\pi X)$,

with X uniformly distributed on $[0, 1]$.

■ We can show that $E[Y_1] = E[Y_2] = E[Y_1 Y_2] = 0$.

■ So, $\mathrm{cov}(Y_1, Y_2) = 0$ and $Y_1$ and $Y_2$ are uncorrelated.

■ But $Y_1^2 + Y_2^2 = 1$, and therefore $Y_1$ and $Y_2$ are clearly not independent.
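■ A quick MATLAB sketch makes this concrete. The sample size is an arbitrary choice; the final two lines use $f(Y_1) = Y_1^2$ and $g(Y_2) = Y_2^2$ to expose the dependence.

N  = 1e6;
X  = rand([N 1]);                                    % X uniform on [0,1]
Y1 = sin(2*pi*X);  Y2 = cos(2*pi*X);
sampleCov = mean((Y1 - mean(Y1)).*(Y2 - mean(Y2)))   % approx 0: uncorrelated
lhs = mean(Y1.^2 .* Y2.^2)                           % approx 1/8
rhs = mean(Y1.^2) * mean(Y2.^2)                      % approx 1/4: E[f(Y1)g(Y2)] differs from E[f(Y1)]E[g(Y2)]

■ The sample covariance is near zero, yet $E[Y_1^2 Y_2^2] \ne E[Y_1^2]\,E[Y_2^2]$, confirming that the variables are uncorrelated but dependent.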
■ Note: $\mathrm{cov}(X_1, X_2) = 0$ (uncorrelated) implies that

$E[X_1 X_2] = E[X_1]\,E[X_2]$,

but independence requires

$E[f(X_1)\,g(X_2)] = E[f(X_1)]\,E[g(X_2)]$

for all $f(\cdot)$ and $g(\cdot)$.

■ Therefore, independence is much stronger than uncorrelatedness.
COROLLARY: Consider an RV X with uncorrelated components. The covariance matrix $\Sigma_X = E[(X - \bar{x})(X - \bar{x})^T]$ is diagonal.

PROOF: Notice that:

◦ The diagonal elements are $(\Sigma_X)_{ii} = E[(X_i - \bar{x}_i)^2] = \sigma_i^2$.

◦ The off-diagonal elements are

$(\Sigma_X)_{ij} = E[(X_i - \bar{x}_i)(X_j - \bar{x}_j)] = E[X_i X_j - X_i\bar{x}_j - \bar{x}_i X_j + \bar{x}_i\bar{x}_j] = E[X_i]\,E[X_j] - 2E[X_i]\,E[X_j] + E[X_i]\,E[X_j] = 0$.


MAIN POINT #2: If jointly normally distributed RVs are uncorrelated, then
they are independent. (This is a special case.)

2.4: Functions of random variables

MAIN POINT #3: Suppose we are given two random variables Y and X with $Y = g(X)$. Also assume that $g^{-1}$ exists and that g and $g^{-1}$ are continuously differentiable. Then,

$f_Y(y) = f_X\!\left(g^{-1}(y)\right)\left\|\dfrac{\partial g^{-1}(y)}{\partial y}\right\|$,

where $\|\cdot\|$ means to take the absolute value of the determinant.


■ So what? Say we know the pdf of $X_1$ and $X_2$, and

$\left.\begin{aligned} Y_1 &= X_1 + X_2 \\ Y_2 &= X_2 \end{aligned}\right\}\;\; Y = AX;$

we can now form the pdf of Y ➠ handy!

EXAMPLE: $Y = kX$ and $X \sim \mathcal{N}(0, \sigma_X^2)$.

■ $X = \dfrac{1}{k}Y$, and so $g^{-1}(y) = \dfrac{1}{k}y$. Then, $\dfrac{\partial g^{-1}(y)}{\partial y} = \dfrac{1}{k}$.

$f_Y(y) = f_X\!\left(\dfrac{y}{k}\right)\left|\dfrac{1}{k}\right| = \dfrac{1}{|k|}\dfrac{1}{\sqrt{2\pi}\,\sigma_X}\exp\left(\dfrac{-(y/k)^2}{2\sigma_X^2}\right) = \dfrac{1}{\sqrt{2\pi}\,\sigma_X|k|}\exp\left(\dfrac{-y^2}{2(\sigma_X k)^2}\right)$.

■ Therefore, multiplication by gain k results in a normally-distributed RV with a scale change in standard deviation: $Y \sim \mathcal{N}(0, k^2\sigma_X^2)$.

EXAMPLE: $Y = AX + B$, where A is a constant (non-singular) matrix, B is a constant vector, and $X \sim \mathcal{N}(\bar{x}, \Sigma_X)$.

■ $X = A^{-1}Y - A^{-1}B$, so $g^{-1}(y) = A^{-1}y - A^{-1}B$. Then, $\dfrac{\partial g^{-1}(y)}{\partial y} = A^{-1}$.

■ $f_X(x) = \dfrac{1}{(2\pi)^{n/2}|\Sigma_X|^{1/2}}\exp\left(-\dfrac{1}{2}(x - \bar{x})^T\Sigma_X^{-1}(x - \bar{x})\right)$.

■ Therefore,

$f_Y(y) = \dfrac{|A^{-1}|}{(2\pi)^{n/2}|\Sigma_X|^{1/2}}\exp\left(-\dfrac{1}{2}\left(A^{-1}(y - B) - \bar{x}\right)^T\Sigma_X^{-1}\left(A^{-1}(y - B) - \bar{x}\right)\right)$

$= \dfrac{1}{(2\pi)^{n/2}\left(|A|\,|\Sigma_X|\,|A^T|\right)^{1/2}}\exp\left(-\dfrac{1}{2}(y - \bar{y})^T(A^{-1})^T\Sigma_X^{-1}A^{-1}(y - \bar{y})\right)$

$= \dfrac{1}{(2\pi)^{n/2}|\Sigma_Y|^{1/2}}\exp\left(-\dfrac{1}{2}(y - \bar{y})^T\Sigma_Y^{-1}(y - \bar{y})\right)$,

if $\Sigma_Y = A\Sigma_X A^T$ and $\bar{y} = A\bar{x} + B$. That is, $Y \sim \mathcal{N}(A\bar{x} + B,\, A\Sigma_X A^T)$.
CONCLUSION: Sums and linear transformations of Gaussians are Gaussian; this is a very special case.
RANDOM NOTE: How do we use randn.m to simulate non-zero-mean Gaussian noise with covariance $\Sigma_Y$?

■ We want $Y \sim \mathcal{N}(\bar{y}, \Sigma_Y)$, but randn.m returns $X \sim \mathcal{N}(0, I)$.

■ Try $y = \bar{y} + A^T x$, where A is square with the same dimension as $\Sigma_Y$ and $A^T A = \Sigma_Y$. (A is the Cholesky factor of the positive-definite symmetric matrix $\Sigma_Y$.)

ybar  = [1; 2];
covar = [1, 0.5; 0.5, 1];
A = chol(covar);                        % A'*A = covar
x = randn([2, 5000]);                   % 5000 samples of N(0, I)
y = repmat(ybar, 1, 5000) + A'*x;       % 5000 samples of N(ybar, covar)

[Figure: scatter plot of 5000 samples with mean [1; 2] and covariance [1, 0.5; 0.5, 1].]

■ When $\Sigma_Y$ is not positive definite (but is still positive semidefinite), chol fails; an LDL factorization can be used instead:

[L, D] = ldl(covar);                    % covar = L*D*L'
x = randn([2, 5000]);
y = repmat(ybar, 1, 5000) + (L*sqrt(D))*x;


MAIN POINT #4: Normality is preserved in a linear transformation (an extension of the above example).

■ Let X and W be independent vector random variables and let a be a constant vector: $X \sim \mathcal{N}(\bar{x}, \Sigma_X)$, $W \sim \mathcal{N}(\bar{w}, \Sigma_W)$.

■ Let

$Z = AX + BW + a$. Then,

$E[Z] = \bar{z} = A\bar{x} + B\bar{w} + a$,

$\Sigma_Z = A\Sigma_X A^T + B\Sigma_W B^T$.

■ Z is also a normal RV, and $Z \sim \mathcal{N}(\bar{z}, \Sigma_Z)$.
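■ The mean and covariance formulas are easy to spot-check by sampling in MATLAB. This is an illustrative sketch; all numeric values below are arbitrary choices.

A = [1, 0.2; 0, 1];  B = [0.5; 1];  a = [1; -1];
xbar = [2; 0];  SigmaX = [1, 0.3; 0.3, 2];
wbar = 0.5;     SigmaW = 0.25;                 % scalar W in this sketch
N = 1e5;
X = repmat(xbar, 1, N) + chol(SigmaX)'*randn(2, N);
W = wbar + sqrt(SigmaW)*randn(1, N);
Z = A*X + B*W + repmat(a, 1, N);
zbar_theory   = A*xbar + B*wbar + a            % compare with mean(Z, 2)
SigmaZ_theory = A*SigmaX*A' + B*SigmaW*B'      % compare with cov(Z')
zbar_sample   = mean(Z, 2)
SigmaZ_sample = cov(Z')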

2.5: Conditioning

MAIN POINT #5: Conditional probabilities.

■ Given jointly-distributed RVs, it is often of extreme interest to find the pdf of some of the RVs given known values for the rest.

■ For example, given the joint pdf $f_{X,Y}(x, y)$ for RVs X and Y, we want the conditional pdf $f_{X|Y}(x \mid y)$, which is the pdf of X for a known value $Y = y$.

■ We can also think of it as $f_{X|Y}(X = x \mid Y = y)$.

DEFINE: The conditional pdf

$f_{X|Y}(x \mid y) = \dfrac{f_{X,Y}(x, y)}{f_Y(y)}$

is the probability that $X = x$ given that $Y = y$ has happened.

NOTE I: The marginal probability $f_Y(y)$ may be calculated as

$f_Y(y) = \displaystyle\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx$.

For each y, integrate out the effect of X.

NOTE II: If X and Y are independent, $f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$. Therefore,

$f_{X|Y}(x \mid y) = \dfrac{f_{X,Y}(x, y)}{f_Y(y)} = \dfrac{f_X(x)\,f_Y(y)}{f_Y(y)} = f_X(x)$.

Knowing that $Y = y$ has occurred provides no information about X. (Is this what you would expect?)
DIRECT EXTENSION:

$f_{X,Y}(x, y) = f_{X|Y}(x \mid y)\,f_Y(y) = f_{Y|X}(y \mid x)\,f_X(x)$.

Therefore,

$f_{X|Y}(x \mid y) = \dfrac{f_{Y|X}(y \mid x)\,f_X(x)}{f_Y(y)}$.

■ This is known as Bayes' rule. It relates the a posteriori probability to the a priori probability.

■ It forms a key step in the Kalman filter derivation.

MAIN POINT #6: Conditional expectation.

■ Now that we have a way of expressing a conditional probability, we can also express a conditional mean.

■ What do we expect the value of X to be, given that $Y = y$ has happened?

$E[X = x \mid Y = y] = E[X \mid Y] = \displaystyle\int_{-\infty}^{\infty} x\,f_{X|Y}(x \mid Y)\,dx$.

■ Conditional expectation is not a constant (as expectation is) but a random variable. It is a function of the conditioning random variable (i.e., Y).

■ Note: Conditional expectation is critical. The Kalman filter is an algorithm to compute $E[x_k \mid \mathbb{Z}_k]$, where we define $\mathbb{Z}_k$ later.

MAIN POINT #7: Central limit theorem.

■ If $Y = \displaystyle\sum_i X_i$, the $X_i$ are independent and identically distributed (IID), and the $X_i$ have finite mean and variance, then Y will be approximately normally distributed.

■ The approximation improves as the number of summed RVs gets large.
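■ The effect is easy to visualize in MATLAB. The sketch below sums IID uniform RVs (an arbitrary choice of distribution) and plots histograms for increasing numbers of summands.

N = 1e5;                                % number of Monte-Carlo trials
nvals = [1 2 10 50];                    % numbers of RVs summed
for idx = 1:length(nvals)
  n = nvals(idx);
  Y = sum(rand([n N]), 1);              % each column: sum of n IID uniforms
  subplot(2, 2, idx);
  histogram(Y, 50, 'Normalization', 'pdf');
  title(sprintf('n = %d', n));
end

■ Even for modest n, the histogram is already close to a Gaussian bell shape.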

■ Since the state of our dynamic system adds up the effects of lots of
independent random inputs, it is reasonable to assume that the
distribution of the state tends to the normal distribution.
■ This leads to the key assumptions for the derivation of the Kalman
filter, as we will see:

◦ We will assume that state $x_k$ is a normally-distributed RV;

◦ We will assume that process noise $w_k$ is a normally-distributed RV;

◦ We will assume that sensor noise $v_k$ is a normally-distributed RV;

◦ We will assume that $w_k$ and $v_k$ are uncorrelated with each other.

■ Even when these assumptions are broken in practice, the Kalman


filter often works quite well.
■ Exceptions to this rule tend to be with very highly nonlinear systems,
for which particle filters must sometimes be employed to get good
estimates.

2.6: Vector random (stochastic) processes

■ A stochastic random process is a family of random vectors indexed by a parameter set ("time" in our case).

◦ For example, the random process X(t).

◦ The value of the random process at any time $t_0$ is the RV $X(t_0)$.

■ Usually we assume stationarity.

◦ The pdfs of the random variables are time-shift invariant.

◦ Therefore, $E[X(t)] = \bar{x}$ for all t, and $E[X(t_1)X^T(t_2)] = R_X(t_1 - t_2)$.

Properties

1. Autocorrelation: $R_X(t_1, t_2) = E[X(t_1)X^T(t_2)]$. If stationary,

$R_X(\tau) = E[X(t)X^T(t + \tau)]$.

■ Provides a measure of correlation between elements of the process having time displacement $\tau$.

■ $R_X(0) = \sigma_X^2$ for zero-mean X.

■ $R_X(0)$ is always the maximum value of $R_X(\tau)$.

2. Autocovariance: $C_X(t_1, t_2) = E[(X(t_1) - E[X(t_1)])(X(t_2) - E[X(t_2)])^T]$. If stationary,

$C_X(\tau) = E[(X(t) - \bar{x})(X(t + \tau) - \bar{x})^T]$.

“White” noise

■ Correlation from one time instant to the next is a key property of a random process. ➠ $R_X(\tau)$ and $C_X(\tau)$ are very important.

■ Some processes have a unique autocorrelation:
1. Zero mean,

2. $R_X(\tau) = E[X(t)X(t + \tau)^T] = S_X\,\delta(\tau)$, where $\delta(\tau)$ is the Dirac delta.

■ Therefore, the process is uncorrelated in time.

■ Clearly an abstraction, but it proves to be a very useful one.
[Figure: sample realizations of white noise (left) and correlated noise (right) plotted versus time.]
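■ A short MATLAB sketch can generate plots like these and estimate the autocorrelations. The first-order filter coefficients are arbitrary choices, and xcorr assumes the Signal Processing Toolbox is available.

N = 1000;
w = randn([N 1]);                       % discrete-time white noise
x = filter(0.05, [1 -0.95], w);         % first-order low-pass shaping -> correlated noise
subplot(1,2,1); plot(w); title('White noise');
subplot(1,2,2); plot(x); title('Correlated noise');
[Rw, lags] = xcorr(w, 50, 'unbiased');  % autocorrelation estimates
[Rx, ~]    = xcorr(x, 50, 'unbiased');
figure; plot(lags, Rw, lags, Rx); legend('white', 'correlated');

■ The white-noise autocorrelation estimate is nearly an impulse at zero lag, while the correlated noise decays slowly with lag.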

Power spectral density (PSD)

■ Consider stationary random processes with autocorrelation defined as $R_X(\tau) = E[X(t)X(t + \tau)^T]$.
■ If a process varies slowly, time-adjacent samples are very correlated,
and RX .%/ will drop off slowly ➠ Mostly low-frequency content.
■ If a process varies quickly, time-adjacent samples aren’t very
correlated, and RX .%/ drops off quickly ➠ More high-freq. content.
■ Therefore, the autocorrelation function tells us about the frequency
content of the random signal.
■ Consider scalar case. We define the power spectral density (PSD) as
the Fourier transform of the autocorrelation function:

$S_X(\omega) = \displaystyle\int_{-\infty}^{\infty} R_X(\tau)\,e^{-j\omega\tau}\,d\tau$

$R_X(\tau) = \dfrac{1}{2\pi}\displaystyle\int_{-\infty}^{\infty} S_X(\omega)\,e^{j\omega\tau}\,d\omega$.

■ $S_X(\omega)$ gives a frequency-domain interpretation of the "power" (energy) in the random process X(t), and it is real, symmetric, and positive semidefinite for real random processes.

EXAMPLE: $R_X(\tau) = \sigma^2 e^{-\alpha|\tau|}$ for $\alpha > 0$. Then,

$S_X(\omega) = \displaystyle\int_{-\infty}^{\infty}\sigma^2 e^{-\alpha|\tau|}e^{-j\omega\tau}\,d\tau = \dfrac{2\sigma^2\alpha}{\omega^2 + \alpha^2}$.
[Figure: the autocorrelation $R_X(\tau)$ has a peak of $\sigma^2$ at $\tau = 0$; the PSD $S_X(\omega)$ has a peak of $2\sigma^2/\alpha$ at $\omega = 0$.]

■ $\alpha$ is a bandwidth measure.
INTERPRETATION:

1. The area under the PSD $S_X(\omega)$ for $\omega_1 \le \omega \le \omega_2$ provides a measure of the energy in the signal in that frequency range.

2. White noise can be thought of as a limiting process. From our prior example, let $\alpha \to \infty$.

■ As $\alpha \to \infty$, $R_X(\tau) \to \delta(\tau)$; therefore, white.

■ The bandwidth of the spectral content is approximately $\alpha$, so it goes to infinity.

■ An abstraction, since it implies that the signal has infinite energy!

2.7: Discrete-time dynamic systems with random inputs

■ What are the statistics of the state of a system driven by a random process?

■ That is, how do the system dynamics influence the time-propagation of x(t)? ➠ We do not know x(t) exactly, but we can develop a pdf for x(t).

■ Primary example: a linear system driven by white noise w(t), possibly in addition to a deterministic input u(t).

[Block diagram: a state-space system $\{A, B, C, D\}$ with deterministic input $u(t)$ and noise input $w(t)$, internal state $x(t)$, and output $y(t)$.]

■ Contrast this to a deterministic simulation, which can be simulated easily:

$\dot{x}(t) = Ax(t) + Bu(t)$, with $x(0) = x_0$; $x(0)$ and $u(t)$ known.

■ The system response is completely specified. Therefore, there is no uncertainty.

■ The stochastic propagation is different, since there are inputs that are not known exactly. ➠ The best we can do is to say how uncertainty in the state changes with time.

[Figure: deterministic case, a single state trajectory from $x(t_0)$ to $x(t_f)$ in the $(x_1, x_2)$ plane; stochastic case, a pdf $\Pr(x)$ for the state that spreads as it propagates from $x(t_0)$ to $x(t_f)$.]

■ Deterministic: $x(t_0)$ and inputs known ➠ state trajectory deterministic.
■ Stochastic: pdf for $x(t_0)$ and inputs known ➠ can find only a pdf for x(t).

■ We will work with Gaussian noises to a large extent, which are uniquely defined by the first and second central moments of their statistics ➠ but the Gaussian assumption is not essential.

■ We will only ever track the first two moments.

NOTATION: Until now, we have always used capital letters for random variables. The state of a system driven by a random process is a random vector, so we could now call it X(t) or $X_k$. However, it is more common to retain the standard notation x(t) or $x_k$ and to understand from the context that we are now discussing an RV.

Discrete-time systems

■ We will start with a discrete-time system and then look at how to get an equivalent answer for continuous-time systems.

■ Model:

$x_k = A_{d,k-1}x_{k-1} + B_{d,k-1}u_{k-1} + w_{k-1}$,

where

$x_k$: state vector; a random process;

$u_k$: deterministic control inputs;

$w_k$: noise that drives the process (process noise); a random process.

■ $A_d$ and $B_d$ are assumed known. They can be time-varying; for now, assume they are fixed. Generalize later if necessary.

■ Key assumptions about the driving noise:

1. Zero mean: $E[w_k] = 0 \;\;\forall\, k$.

2. White: $E[w_{k_1}w_{k_2}^T] = \Sigma_w\,\delta(k_1 - k_2)$.

■ $\Sigma_w$ is the covariance matrix of the noise signal $w_k$.

■ Uncertainty at startup: statistics of the initial condition,

$E[x_0] = \bar{x}_0$;

$E[(x_0 - \bar{x}_0)w_k^T] = 0 \;\;\forall\, k$;

$\Sigma_{x,0} = E[(x_0 - \bar{x}_0)(x_0 - \bar{x}_0)^T]$.

■ Key question: How do we propagate the statistics of this system?

Mean value

■ The mean value is

$E[x_k] = \bar{x}_k = E[A_d x_{k-1} + B_d u_{k-1} + w_{k-1}] = A_d\bar{x}_{k-1} + B_d u_{k-1}$.

Therefore, the mean propagation is

$\bar{x}_0$: given;

$\bar{x}_k = A_d\bar{x}_{k-1} + B_d u_{k-1}$.

Deterministic simulation and mean values are treated the same way.

Variations about the mean

■ To study the random variations about the mean, we need to form the second central moment of the statistics:

$\Sigma_{x,k} = E[(x_k - \bar{x}_k)(x_k - \bar{x}_k)^T]$.
■ This is easiest to study if we note that

$x_k - \bar{x}_k = A_d x_{k-1} + B_d u_{k-1} + w_{k-1} - A_d\bar{x}_{k-1} - B_d u_{k-1} = A_d(x_{k-1} - \bar{x}_{k-1}) + w_{k-1}$.

■ Thus,

$\Sigma_{x,k} = E\big[\big(A_d(x_{k-1} - \bar{x}_{k-1}) + w_{k-1}\big)\big(A_d(x_{k-1} - \bar{x}_{k-1}) + w_{k-1}\big)^T\big]$.

■ Three terms:

1. $E[A_d(x_{k-1} - \bar{x}_{k-1})(x_{k-1} - \bar{x}_{k-1})^T A_d^T] = A_d\Sigma_{x,k-1}A_d^T$.

2. $E[w_{k-1}w_{k-1}^T] = \Sigma_w$.

3. $E[A_d(x_{k-1} - \bar{x}_{k-1})w_{k-1}^T] = \,?$

■ The third term is a cross-correlation term.

◦ But $x_{k-1}$ depends only on $x_0$ and the inputs $w_m$ for $m = 0, \ldots, k-2$.

◦ $w_{k-1}$ is white noise, uncorrelated with $x_0$.

◦ Therefore, the third term is zero.

Therefore, the covariance propagation is

$\Sigma_{x,0}$: given;

$\Sigma_{x,k} = A_d\Sigma_{x,k-1}A_d^T + \Sigma_w$.

In this equation, $A_d\Sigma_{x,k-1}A_d^T$ is the homogeneous part; $\Sigma_w$ is the driving term.

■ Note: If $A_d$ and $\Sigma_w$ are constant and $A_d$ is stable, there is a steady-state solution.

◦ As $k \to \infty$, $\Sigma_{x,k} = \Sigma_{x,k+1} = \Sigma_x$.
◦ Then, $\Sigma_x = A_d\Sigma_x A_d^T + \Sigma_w$.

■ This form is called a discrete Lyapunov equation. In MATLAB, dlyap.m.

EXAMPLE: We can solve by hand in the scalar case.

■ Consider $x_k = \alpha x_{k-1} + w_{k-1}$, with $E[w_{k-1}] = 0$, $\Sigma_w = 1$, $\bar{x}_0 = 0$.

■ Then,

$\Sigma_x = \alpha\Sigma_x\alpha^T + 1$

$\Sigma_x(1 - \alpha^2) = 1$

$\Sigma_x = \dfrac{1}{1 - \alpha^2}$.

This is valid for $|\alpha| < 1$; otherwise the system is unstable.

■ In the example below, we plot 100 random trajectories for $\alpha = 0.75$ and $\Sigma_{x,0} = 50$.

■ Compare the propagation of $\Sigma_{x,k}$ with the steady-state dlyap.m solution, $\Sigma_x = 2.29$.
[Figure: left, propagation of $\Sigma_{x,k}$ versus time, converging to the steady-state value $\Sigma_x = 2.29$; right, 100 simulated state trajectories plotted versus time with error bounds.]

■ The error bounds plotted are $3\sigma$ bounds (i.e., $\pm 3\sqrt{\Sigma_{x,k}}$).
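■ A MATLAB sketch of this example follows; dlyap.m requires the Control System Toolbox.

alpha = 0.75;  Sw = 1;  Sx0 = 50;  K = 20;
Sx = zeros(1, K+1);  Sx(1) = Sx0;
for k = 2:K+1
  Sx(k) = alpha*Sx(k-1)*alpha' + Sw;    % covariance recursion
end
Sx_ss = dlyap(alpha, Sw)                % steady state: 1/(1 - alpha^2) = 2.29
plot(0:K, Sx, 0:K, Sx_ss*ones(1, K+1), '--');
xlabel('Time step k'); ylabel('\Sigma_{x,k}');

■ The recursion converges to the dlyap.m value, matching the figure.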

2.8: Continuous-time dynamic systems with random inputs
■ For continuous-time systems we have

$\dot{x}(t) = A(t)x(t) + B_u(t)u(t) + B_w(t)w(t)$,

where

$x(t)$: state vector; a random process;

$u(t)$: deterministic control inputs;

$w(t)$: noise driving the process (process noise); a random process.

Further,

$E[w(t)] = 0$; $\quad E[w(t)w(\tau)^T] = S_w\,\delta(t - \tau)$;

$E[x(0)] = \bar{x}(0)$; $\quad E[(x(0) - \bar{x}(0))(x(0) - \bar{x}(0))^T] = \Sigma_x(0)$.

■ A, $B_u$, and $B_w$ are assumed known. They can be time-varying; for now, assume they are fixed. Generalize later if necessary.

■ $S_w$ is the spectral density of w(t).

■ The easiest analysis is to discretize the model, use the results obtained earlier, and let $\Delta t \to 0$.

◦ Drop the deterministic inputs u(t) for simplicity. (They affect only the mean, and in known ways.)

Continuous-time propagation of statistics

Mean value

■ Starting with the discrete case, we have

$\bar{x}_k = A_d\bar{x}_{k-1} + B_d u_{k-1}$,

$A_d = e^{A\Delta t} = I + A\Delta t + O(\Delta t^2)$.


■ Let $\Delta t \to 0$ and consider the case when $u_k = 0$ for all k:

$\bar{x}_k = (I + A\Delta t)\bar{x}_{k-1}$

$\dfrac{\bar{x}_k - \bar{x}_{k-1}}{\Delta t} = A\bar{x}_{k-1}$.

■ As $\Delta t \to 0$,

$\dot{\bar{x}}(t) = A\bar{x}(t)$,

as expected.

Therefore, the mean propagation is

$\bar{x}(0)$: given;

$\dot{\bar{x}}(t) = A\bar{x}(t) + B_u u(t)$.

Deterministic simulation and mean values are treated the same way.

Variations about the mean

■ The result for discrete-time systems was: $\Sigma_{x,k} = A_d\Sigma_{x,k-1}A_d^T + \Sigma_w$.

■ But we need a way to relate the discrete $\Sigma_w$ to the continuous spectral density $S_w$ before we can proceed.

■ Recall the discrete system response in terms of the continuous system matrices:

$x_k = e^{A\Delta t}x_{k-1} + \displaystyle\int_{(k-1)\Delta t}^{k\Delta t} e^{A(k\Delta t - \tau)}B_w w(\tau)\,d\tau = e^{A\Delta t}x_{k-1} + w_{k-1}$.

■ The integral explicitly accounts for variations in the noise during $\Delta t$.

RECALL: Discrete white noise has the form

$E[w_k w_l^T] = \begin{cases}\Sigma_w, & k = l; \\ 0, & k \ne l.\end{cases}$
■ But the noise input term to our converted state dynamics is defined as an integral:

$w_{k-1} = \displaystyle\int_{(k-1)\Delta t}^{k\Delta t} e^{A(k\Delta t - \tau)}B_w w(\tau)\,d\tau$.

■ Form the outer product to get the equivalent $\Sigma_w$:

$\Sigma_w = E\left[\left(\displaystyle\int_{(k-1)\Delta t}^{k\Delta t} e^{A(k\Delta t - \tau)}B_w w(\tau)\,d\tau\right)\left(\displaystyle\int_{(k-1)\Delta t}^{k\Delta t} e^{A(k\Delta t - \gamma)}B_w w(\gamma)\,d\gamma\right)^T\right]$

$= E\left[\displaystyle\int_{(k-1)\Delta t}^{k\Delta t}\int_{(k-1)\Delta t}^{k\Delta t} e^{A(k\Delta t - \tau)}B_w w(\tau)w(\gamma)^T B_w^T e^{A^T(k\Delta t - \gamma)}\,d\tau\,d\gamma\right]$.

■ This is a big, ugly mess, but it has two saving graces:

1. The expectation can go inside the integrals.

2. $E[w(\tau)w(\gamma)^T] = S_w\,\delta(\tau - \gamma)$ ➠ one of the integrals drops out!

■ So, we have

$\Sigma_w = \displaystyle\int_{(k-1)\Delta t}^{k\Delta t} e^{A(k\Delta t - \tau)}B_w S_w B_w^T e^{A^T(k\Delta t - \tau)}\,d\tau$.

KEY POINT: While $S_w$ may have a simple form, $\Sigma_w$ will in general be a full matrix.

■ To solve the integral, we can approximate: as $\Delta t \to 0$, $k\Delta t - \tau \to 0$ and $e^{A(k\Delta t - \tau)} \approx I + A(k\Delta t - \tau) + \cdots$

■ That is, $e^{A(k\Delta t - \tau)} \approx I$. Then,

$\Sigma_w \approx \left(B_w S_w B_w^T\right)\Delta t$.

■ We will see a better method to evaluate $\Sigma_w$ when $\Delta t \ne 0$, but for now we continue with this result to determine the continuous-time covariance propagation.

■ Start with $\Sigma_{x,k} = A_d\Sigma_{x,k-1}A_d^T + \Sigma_w$, and substitute $\Sigma_w \approx B_w S_w B_w^T\,\Delta t$. Also, use $A_d \approx I + A\Delta t$:

$\Sigma_{x,k} \approx (I + A\Delta t)\Sigma_{x,k-1}(I + A\Delta t)^T + B_w S_w B_w^T\,\Delta t$

$= \Sigma_{x,k-1} + \Delta t\left(A\Sigma_{x,k-1} + \Sigma_{x,k-1}A^T + B_w S_w B_w^T\right) + O(\Delta t^2)$

$\dfrac{\Sigma_{x,k} - \Sigma_{x,k-1}}{\Delta t} = A\Sigma_{x,k-1} + \Sigma_{x,k-1}A^T + B_w S_w B_w^T + O(\Delta t)$.

■ As $\Delta t \to 0$,

$\dot{\Sigma}_x(t) = A\Sigma_x(t) + \Sigma_x(t)A^T + B_w S_w B_w^T$,

initialized with $\Sigma_x(0)$.

■ This is a matrix differential equation.

■ It is symmetric, so we don't need to solve for every element.

■ Two effects:

◦ $A\Sigma_x(t) + \Sigma_x(t)A^T$: the homogeneous part. Contractive for stable A; reduces covariance.

◦ $B_w S_w B_w^T$: the impact of process noise. Tends to increase covariance.

■ Steady-state solution: the effects balance for systems with constant dynamics (A, $B_w$, $S_w$) and stable A:

$A\Sigma_{x,ss} + \Sigma_{x,ss}A^T + B_w S_w B_w^T = 0$.

■ This is a continuous-time Lyapunov equation. In MATLAB, lyap.m.

The covariance propagation is

$\Sigma_x(0)$: given;

$\dot{\Sigma}_x(t) = A\Sigma_x(t) + \Sigma_x(t)A^T + B_w S_w B_w^T$.

In this equation, $A\Sigma_x(t) + \Sigma_x(t)A^T$ is the homogeneous part; $B_w S_w B_w^T$ is the driving term.

EXAMPLE: $\dot{x}(t) = Ax(t) + B_w w(t)$, where A and $B_w$ are scalars.

■ Then, $\dot{\Sigma}_x = 2A\Sigma_x + B_w^2 S_w$. This can be solved in closed form to find

$\Sigma_x(t) = \dfrac{B_w^2 S_w}{2A}\left(e^{2At} - 1\right) + \Sigma_x(0)\,e^{2At}$.

■ If $A < 0$ (stable), then the initial-condition contribution goes to zero:

$\Sigma_{x,ss} = \dfrac{-B_w^2 S_w}{2A}$,

independent of $\Sigma_x(0)$.

■ Example: Let $A = -1$, $B_w = 1$, $S_w = 2$, and $\Sigma_x(0) = 5$. Find $\Sigma_x(t)$.

Solution: $\Sigma_x(t) = 1 + 4e^{-2t}$, so $\Sigma_{x,ss} = 1$.

[Figure: $\Sigma_x(t)$ decaying from 5 toward the steady-state value $\Sigma_{x,ss} = 1$.]

■ $\Sigma_{x,ss}$ increases as more noise is added (the $B_w^2 S_w$ term).

■ $\Sigma_{x,ss}$ decreases as A becomes "more stable".
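■ A MATLAB sketch of this example follows; lyap.m requires the Control System Toolbox.

A = -1;  Bw = 1;  Sw = 2;  Sx0 = 5;
dSx = @(t, Sx) A*Sx + Sx*A' + Bw*Sw*Bw';       % covariance differential equation
[t, Sx] = ode45(dSx, [0 5], Sx0);              % numerically propagate Sigma_x(t)
Sx_ss = lyap(A, Bw*Sw*Bw')                     % steady state: -Bw^2*Sw/(2*A) = 1
plot(t, Sx, t, Sx_ss*ones(size(t)), '--');
xlabel('Time'); ylabel('\Sigma_x(t)');

■ The numerically integrated covariance decays from 5 to the lyap.m steady-state value of 1, matching the closed-form solution.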

2.9: Relating $\Sigma_w$ to $S_w$ precisely: A little trick

■ We want to find $\Sigma_w$ precisely, regardless of $\Delta t$. Assume we know $S_w$ and the plant dynamics:

$\Sigma_w = \displaystyle\int_0^{\Delta t} e^{A(\Delta t - \tau)}B_w S_w B_w^T e^{A^T(\Delta t - \tau)}\,d\tau$.

■ Define

$Z = \begin{bmatrix} -A & B_w S_w B_w^T \\ 0 & A^T\end{bmatrix}, \qquad C = e^{Z\Delta t} = \begin{bmatrix} c_{11} & c_{12} \\ 0 & c_{22}\end{bmatrix}$.

■ We can compute $C = \mathcal{L}^{-1}\!\left[(sI - Z)^{-1}\right]\big|_{t=\Delta t}$ and find that

$c_{11} = \mathcal{L}^{-1}\!\left[(sI - z_{11})^{-1}\right]\big|_{t=\Delta t} = \mathcal{L}^{-1}\!\left[(sI + A)^{-1}\right]\big|_{t=\Delta t} = e^{-A\Delta t}$

$c_{12} = \mathcal{L}^{-1}\!\left[(sI - z_{11})^{-1}z_{12}(sI - z_{22})^{-1}\right]\big|_{t=\Delta t} = \mathcal{L}^{-1}\!\left[(sI + A)^{-1}B_w S_w B_w^T(sI - A^T)^{-1}\right]\big|_{t=\Delta t}$

$c_{22} = \mathcal{L}^{-1}\!\left[(sI - z_{22})^{-1}\right]\big|_{t=\Delta t} = \mathcal{L}^{-1}\!\left[(sI - A^T)^{-1}\right]\big|_{t=\Delta t} = e^{A^T\Delta t} = A_d^T$.

■ So, $c_{22}^T = A_d$. Also, $c_{22}^T c_{12} = \Sigma_w$. With one expm.m command, we can quickly switch from continuous time to discrete time.

■ To show the second identity, recognize that $c_{12}$ is a convolution of two impulse responses: one due to the system $(sI + A)^{-1}B_w$ and the other due to the system $S_w B_w^T(sI - A^T)^{-1}$.

■ The first of these has impulse response $e^{-At}B_w$ and the second has impulse response $S_w B_w^T e^{A^T t}$. Convolving them, we get

$c_{12} = \displaystyle\int_0^{\Delta t} e^{-A\tau}B_w S_w B_w^T e^{A^T(\Delta t - \tau)}\,d\tau$.

■ Finally, consider

$c_{22}^T c_{12} = \displaystyle\int_0^{\Delta t} e^{A(\Delta t - \tau)}B_w S_w B_w^T e^{A^T(\Delta t - \tau)}\,d\tau = \Sigma_w$.

EXAMPLE: Let

$\dot{x}(t) = \begin{bmatrix} 0 & 1 \\ -1 & -1\end{bmatrix}x(t) + \begin{bmatrix} 0 \\ 1\end{bmatrix}w(t)$

with $S_w = 0.06$ and $\Delta t = 2\pi/16$.

$Z = \begin{bmatrix} 0 & -1 & 0 & 0 \\ 1 & 1 & 0 & 0.06 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & -1\end{bmatrix}; \qquad C = \begin{bmatrix} 0.91 & -0.47 & -0.006 & -0.0046 \\ 0.47 & 1.38 & 0.0046 & 0.023 \\ 0 & 0 & 0.93 & -0.32 \\ 0 & 0 & 0.32 & 0.62\end{bmatrix}.$

■ So,

$A_d = \begin{bmatrix} 0.93 & 0.32 \\ -0.32 & 0.62\end{bmatrix}; \qquad \Sigma_w = \begin{bmatrix} 0.0009 & 0.0030 \\ 0.0030 & 0.0156\end{bmatrix}; \qquad B_w S_w B_w^T = \begin{bmatrix} 0 & 0 \\ 0 & 0.06\end{bmatrix}.$
■ The exact $\Sigma_w$ and the approximate $\Sigma_w$ are very different due to the large $\Delta t$.

■ Compare predictions of steady-state performance:

◦ Continuous: $A\Sigma_{x,ss} + \Sigma_{x,ss}A^T + B_w S_w B_w^T = 0$; $\quad \Sigma_{x,ss} = \begin{bmatrix} 0.03 & 0 \\ 0 & 0.03\end{bmatrix}$.

◦ Discrete: $\Sigma_{x,ss} = A_d\Sigma_{x,ss}A_d^T + \Sigma_w$; $\quad \Sigma_{x,ss} = \begin{bmatrix} 0.03 & 0 \\ 0 & 0.03\end{bmatrix}$.

■ Because we used discrete white noise scaled to be equivalent to the continuous noise, the steady-state predictions are the same.
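■ In MATLAB, this example is a few lines; lyap.m and dlyap.m require the Control System Toolbox.

A  = [0, 1; -1, -1];  Bw = [0; 1];  Sw = 0.06;  dt = 2*pi/16;
n  = size(A, 1);
Z  = [-A, Bw*Sw*Bw'; zeros(n), A'];
C  = expm(Z*dt);
Ad     = C(n+1:end, n+1:end)';              % Ad = c22'
SigmaW = Ad * C(1:n, n+1:end);              % SigmaW = c22'*c12
SigmaX_ss_cont = lyap(A, Bw*Sw*Bw')         % continuous-time steady state
SigmaX_ss_disc = dlyap(Ad, SigmaW)          % discrete-time steady state (matches)

■ The two steady-state covariances agree, confirming the equivalence of the exact discretized noise covariance.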
■ Conversion now complete:

DISCRETE: $A_d$, $\Sigma_w$, $B_d$:

Given $\Sigma_{x,0}$: $\quad \Sigma_{x,k} = A_d\Sigma_{x,k-1}A_d^T + \Sigma_w$;

Given $\bar{x}_0$: $\quad \bar{x}_k = A_d\bar{x}_{k-1} + B_d u_{k-1}$.

CONTINUOUS: A, $S_w$, $B_u$, $B_w$:

Given $\Sigma_x(0)$: $\quad \dot{\Sigma}_x(t) = A\Sigma_x(t) + \Sigma_x(t)A^T + B_w S_w B_w^T$;

Given $\bar{x}(0)$: $\quad \dot{\bar{x}}(t) = A\bar{x}(t) + B_u u(t)$.

■ All matrices may be time-varying.

■ An equivalent $\Sigma_w$ for a given $S_w$ is available.

2.10: Shaping filters

■ We have assumed that the inputs to the linear dynamic systems are white (Gaussian) noises.

■ Therefore, the PSD is flat ➠ the input has content at all frequencies.

■ This is a pretty limiting assumption, but one that can be easily fixed ➠ we can use a second linear system to "shape" the noise and modify the PSD as desired.

[Block diagrams: previous picture, white noise w(t) drives G(s) to produce y(t); new picture, shaped noise $w_1(t)$ drives G(s), where $w_1(t)$ is the output of a shaping filter H(s) that is itself driven by white noise $w_2(t)$.]

■ Therefore, we can drive our linear system with noise that has a desired PSD by introducing a shaping filter H(s) that itself is driven by white noise.

■ The combined system G(s)H(s) looks exactly the same as before, but the system G(s) is no longer driven by pure white noise.
■ The analysis is quite simple: augment the original model with the filter states.

■ The original system has

$\dot{x}(t) = Ax(t) + B_w w_1(t)$,

$y(t) = Cx(t)$.

■ The new shaping filter with white input and desired-PSD output has

$\dot{x}_s(t) = A_s x_s(t) + B_s w_2(t)$,

$w_1(t) = C_s x_s(t)$.

■ Combine into one system:

$\begin{bmatrix}\dot{x}(t) \\ \dot{x}_s(t)\end{bmatrix} = \begin{bmatrix} A & B_w C_s \\ 0 & A_s\end{bmatrix}\begin{bmatrix} x(t) \\ x_s(t)\end{bmatrix} + \begin{bmatrix} 0 \\ B_s\end{bmatrix}w_2(t)$

$y(t) = \begin{bmatrix} C & 0\end{bmatrix}\begin{bmatrix} x(t) \\ x_s(t)\end{bmatrix}$.

■ The augmented system is just a larger-order linear system driven by white noise.

KEY QUESTION: Given a PSD for $w_1$, how do we find H(s) (or $A_s$, $B_s$, $C_s$)?

■ Fact: If $w_2$ is unit-variance white noise, then the PSD of $w_1$ is

$S_{w_1}(\omega) = H(-j\omega)H^T(j\omega)$.

■ Find $H(j\omega)$ by spectral factorization of $S_{w_1}(\omega)$. Take the minimum-phase part (keep the poles and zeros that have negative real part).

EXAMPLE: Let

$S_{w_1}(\omega) = \dfrac{2\sigma^2\alpha}{\omega^2 + \alpha^2} = \dfrac{\sqrt{2\alpha}\,\sigma}{\alpha + j\omega}\cdot\dfrac{\sqrt{2\alpha}\,\sigma}{\alpha - j\omega}$.

Then,

$H(s) = \dfrac{\sqrt{2\alpha}\,\sigma}{s + \alpha}$,

$\dot{x}_s(t) = -\alpha x_s(t) + \sqrt{2\alpha}\,\sigma\,w_2(t)$,

$w_1(t) = x_s(t)$.
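■ A MATLAB sketch of building this shaping filter and the augmented model follows. The plant G(s) used here is an arbitrary example (not one from the notes), and ss/bode require the Control System Toolbox.

sigma = 1;  alpha = 2;                            % arbitrary shaping-filter parameters
As = -alpha;  Bs = sqrt(2*alpha)*sigma;  Cs = 1;
A  = [0, 1; -1, -1];  Bw = [0; 1];  C = [1, 0];   % example plant, x' = A*x + Bw*w1
Aaug = [A, Bw*Cs; zeros(1, 2), As];               % augmented dynamics, driven by white w2
Baug = [0; 0; Bs];
Caug = [C, 0];
sysAug = ss(Aaug, Baug, Caug, 0);                 % y response to the white-noise input w2
bode(ss(As, Bs, Cs, 0));                          % |H(jw)|^2 = 2*sigma^2*alpha/(w^2 + alpha^2)

■ The augmented model can then be used directly in the covariance-propagation equations of Sections 2.7–2.8, with $w_2$ as the white process noise.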

EXAMPLE: Wind model.

■ Wind gusts have some high-frequency content, but typically they are dominated by low-frequency disturbances ➠ is white noise a good model?

■ We would typically think of a PSD for the wind that puts more emphasis on the low-frequency component of the signal.

■ Think of this as the output of a shaping filter $H(s) = \dfrac{1}{\tau_w s + 1}$.

NOTE: If the bandwidth of the wind filter $(1/\tau_w)$ is greater than the bandwidth of the plant dynamics, then white noise is a good approximation to the system input.

■ Otherwise, we must use shaping filters in the plant model and drive them with white noise.

■ The size of the A matrix becomes a limitation. Use the simplest noise model possible.
