
ECE5530: Multivariable Control Systems II.


VECTOR RANDOM PROCESSES

2.1: Scalar random variables

■ We will be working with systems whose state is affected by noise and whose output measurements are also corrupted by noise.

[Block diagram: a system with state $x(t)$ is driven by a disturbance $w(t)$ and produces an output $y(t)$ that is corrupted by sensor noise $v(t)$.]

■ By definition, noise is not deterministic; it is random in some sense.

■ So, to discuss the impact of noise on the system dynamics, we must review the concept of a "random variable" (RV) X.

◦ We cannot predict exactly what we will get each time we measure or sample the random variable, but

◦ We can characterize the probability of each sample value by the "probability density function" (pdf).

Probability density functions (pdf)

■ We denote the probability density function (pdf) of RV X as $f_X(x)$.

■ $f_X(x_0)\,dx$ is the probability that the random variable X lies in the interval $[x_0,\, x_0 + dx]$.

[Figure: sketch of a pdf $f_X(x)$ with a strip of width $dx$ at $x_0$.]

Lecture notes prepared by Dr. Gregory L. Plett. Copyright © 2016, 2010, 2008, 2006, 2004, 2002, 2001, 1999, Gregory L. Plett
■ Properties that are true of all pdfs:

1. $f_X(x) \ge 0 \;\;\forall\, x$.

2. $\displaystyle\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.

3. $\Pr(X \le x_0) = \displaystyle\int_{-\infty}^{x_0} f_X(x)\,dx \triangleq F_X(x_0)$, which is the RV's cumulative distribution function (cdf).¹
■ Problem: Apart from simple examples, it is often difficult to determine $f_X(x)$ accurately. ➠ Use approximations to capture the key behavior.

■ We need to define key characteristics of $f_X(x)$.
EXPECTATION: Describes the expected outcome of a random trial.

$\bar{x} = E[X] = \displaystyle\int_{-\infty}^{\infty} x\,f_X(x)\,dx$.

■ Expectation is a linear operator (very important for working with it).

■ So, for example, the first moment about the mean is $E[X - \bar{x}] = 0$.
STATISTICAL AVERAGE: Different from expectation.

■ Consider a (discrete) RV X that can assume n values $x_1, x_2, \ldots, x_n$.

■ Define the average by making many measurements, $N \to \infty$. Let $m_i$ be the number of times the value of the measurement is $x_i$. Then,

$\bar{x} = \dfrac{1}{N}(m_1 x_1 + m_2 x_2 + \cdots + m_n x_n) = \dfrac{m_1}{N}x_1 + \dfrac{m_2}{N}x_2 + \cdots + \dfrac{m_n}{N}x_n$.

■ In the limit, $m_i/N \to \Pr(X = x_i)$ and so forth (assuming ergodicity), so

$\bar{x} = \displaystyle\sum_{i=1}^{n} x_i \Pr(X = x_i)$.

¹ Some call this a probability distribution function, with acronym PDF (uppercase), as different from a probability density function, which has acronym pdf (lowercase).

■ So, statistical means can converge to expectation in the limit. (One can show a similar property for continuous RVs, but it is harder to do.)
VARIANCE: The second moment about the mean.

$\mathrm{var}(X) = E[(X - \bar{x})^2] = \displaystyle\int_{-\infty}^{\infty}(x - \bar{x})^2 f_X(x)\,dx = E[X^2] - (E[X])^2$;

that is, the variance equals the mean-square minus the squared mean.
STANDARD DEVIATION: A measure of dispersion of the samples of X about the mean: $\sigma_X = \sqrt{\mathrm{var}(X)}$.
■ The expectation and variance capture key features of the actual pdf.
Higher-order moments are available, but we won’t need them!
KEY POINT FOR UNDERSTANDING VARIANCE: Chebyshev's inequality.

■ Chebyshev's inequality states (for positive $\varepsilon$)

$\Pr(|X - \bar{x}| \ge \varepsilon) \le \dfrac{\sigma_X^2}{\varepsilon^2}$,

which implies that probability is concentrated around the mean.
■ It may be proven as follows:

$\Pr(|X - \bar{x}| \ge \varepsilon) = \displaystyle\int_{-\infty}^{\bar{x}-\varepsilon} f_X(x)\,dx + \displaystyle\int_{\bar{x}+\varepsilon}^{\infty} f_X(x)\,dx$.

■ For the two regions of integration, $|x - \bar{x}|/\varepsilon \ge 1$, or $(x - \bar{x})^2/\varepsilon^2 \ge 1$. So,

$\Pr(|X - \bar{x}| \ge \varepsilon) \le \displaystyle\int_{-\infty}^{\bar{x}-\varepsilon}\dfrac{(x - \bar{x})^2}{\varepsilon^2}\,f_X(x)\,dx + \displaystyle\int_{\bar{x}+\varepsilon}^{\infty}\dfrac{(x - \bar{x})^2}{\varepsilon^2}\,f_X(x)\,dx$.
■ Since $f_X(x)$ is positive, we also have

$\Pr(|X - \bar{x}| \ge \varepsilon) \le \displaystyle\int_{-\infty}^{\infty}\dfrac{(x - \bar{x})^2}{\varepsilon^2}\,f_X(x)\,dx = \dfrac{\sigma_X^2}{\varepsilon^2}$.
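■ A quick Monte-Carlo sanity check of the inequality can be run in MATLAB. This is an illustrative sketch only; the Gaussian distribution and the values of $\varepsilon$ below are arbitrary choices, not part of the notes.

N = 1e6; xbar = 2; sigma = 3;
X = xbar + sigma*randn([N 1]);                % samples of X ~ N(xbar, sigma^2)
for epsilon = [3 6 9]                         % several test values of epsilon
  empirical = mean(abs(X - xbar) >= epsilon); % estimate of Pr(|X - xbar| >= epsilon)
  bound = sigma^2/epsilon^2;                  % Chebyshev upper bound
  fprintf('epsilon = %g: Pr = %.4f <= bound = %.4f\n', epsilon, empirical, bound);
end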
■ This inequality shows that probability is clustered around the mean, and that the variance is an indication of the dispersion of the pdf.

■ That is, variance (later on, covariance too) informs us of how uncertain we are about the value of a random variable.

◦ Low variance means that we are very certain of its value;

◦ High variance means that we are very uncertain of its value.

■ The mean and variance give us an estimate of the value of a random variable, and how certain we are of that estimate.

The most important distribution for this course

■ The Gaussian (normal) distribution is of key importance to us. (We will explain why this is true later; see "main point #7" later in this chapter.)

■ Its pdf is defined as:

$f_X(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma_X}\exp\left(-\dfrac{(x - \bar{x})^2}{2\sigma_X^2}\right)$.

[Figure: Gaussian pdfs of differing widths, centered at $\bar{x}$.]

■ Symmetric about $\bar{x}$.

■ Peak proportional to $\dfrac{1}{\sigma_X}$, located at $\bar{x}$.

■ Notation: $X \sim \mathcal{N}(\bar{x}, \sigma_X^2)$.

■ The probability that X is within $\pm\sigma_X$ of $\bar{x}$ is 68%; within $\pm 2\sigma_X$ of $\bar{x}$, 95%; within $\pm 3\sigma_X$ of $\bar{x}$, 99.7%.

◦ A $\pm 3\sigma_X$ range almost certainly covers observed samples.

■ "Narrow" distribution ➠ sharp peak. High confidence in predicting X.

■ "Wide" distribution ➠ poor knowledge of what to expect for X.

2.2: Vector random variables

■ With very little change in the preceding, we can also handle vectors of random variables:

$X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}$; let $x_0 = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$.

■ X is described by the (scalar-valued) joint pdf $f_X(x)$ of the vector X.

■ $f_X(x_0)$ means $f_X(X_1 = x_1,\, X_2 = x_2,\, \ldots,\, X_n = x_n)$.

■ That is, $f_X(x_0)\,dx_1\,dx_2\cdots dx_n$ is the probability that X is between $x_0$ and $x_0 + dx$.
■ Properties of the joint pdf $f_X(x)$:

1. $f_X(x) \ge 0 \;\;\forall\, x$. Same as before.

2. $\displaystyle\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f_X(x)\,dx_1\,dx_2\cdots dx_n = 1$. Basically the same.

3. $\bar{x} = E[X] = \displaystyle\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} x\,f_X(x)\,dx_1\,dx_2\cdots dx_n$. Basically the same.

4. Correlation matrix: Different.

$R_X = E[XX^T]$ (an outer product)

$= \displaystyle\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} x\,x^T f_X(x)\,dx_1\,dx_2\cdots dx_n$.

5. Covariance matrix: Different. Define $\tilde{X} = X - \bar{x}$. Then,

$\Sigma_X = R_{\tilde{X}} = E[(X - \bar{x})(X - \bar{x})^T]$

$= \displaystyle\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} (x - \bar{x})(x - \bar{x})^T f_X(x)\,dx_1\,dx_2\cdots dx_n$.


$\Sigma_X$ is symmetric and positive semidefinite (psd). This means $y^T\Sigma_X y \ge 0 \;\;\forall\, y$.

PROOF: For all $y \ne 0$,

$0 \le E\big[\big(y^T(X - \bar{x})\big)^2\big] = y^T E[(X - \bar{x})(X - \bar{x})^T]\,y = y^T\Sigma_X y$.
■ Notice that correlation and covariance are the same for zero-mean random vectors.

■ The covariance entries have specific meaning:

$(\Sigma_X)_{ii} = \sigma_{X_i}^2$

$(\Sigma_X)_{ij} = \rho_{ij}\,\sigma_{X_i}\sigma_{X_j} = (\Sigma_X)_{ji}$.

◦ The diagonal entries are the variances of each vector component;

◦ The correlation coefficient $\rho_{ij}$ is a measure of linear dependence between $X_i$ and $X_j$; $|\rho_{ij}| \le 1$.

The most important multivariable distribution for this course

■ The multivariable Gaussian is of key importance for Kalman filtering.

[Figure: surface plot of a two-dimensional Gaussian pdf, centered at $(\bar{x}_1, \bar{x}_2)$.]


$f_X(x) = \dfrac{1}{(2\pi)^{n/2}|\Sigma_X|^{1/2}}\exp\left(-\dfrac{1}{2}(x - \bar{x})^T\Sigma_X^{-1}(x - \bar{x})\right)$,

where $|\Sigma_X| = \det(\Sigma_X)$, and $\Sigma_X^{-1}$ requires a positive-definite $\Sigma_X$.

■ Notation: $X \sim \mathcal{N}(\bar{x}, \Sigma_X)$.

■ Contours of constant $f_X(x)$ are hyper-ellipsoids, centered at $\bar{x}$, with directions governed by $\Sigma_X$. The principal axes decouple $\Sigma_X$ (they are its eigenvectors).
■ Two-dimensional zero-mean case (let $\sigma_1 = \sigma_{X_1}$ and $\sigma_2 = \sigma_{X_2}$):

$\Sigma_X = \begin{bmatrix}\sigma_1^2 & \rho_{12}\sigma_1\sigma_2 \\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2\end{bmatrix}, \qquad |\Sigma_X| = \sigma_1^2\sigma_2^2\,(1 - \rho_{12}^2),$

$f_{X_1,X_2}(x_1, x_2) = \dfrac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho_{12}^2}}\exp\left(-\dfrac{\dfrac{x_1^2}{\sigma_1^2} - 2\rho_{12}\dfrac{x_1 x_2}{\sigma_1\sigma_2} + \dfrac{x_2^2}{\sigma_2^2}}{2(1 - \rho_{12}^2)}\right)$.
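■ A short MATLAB sketch can visualize these elliptical contours. The numeric values of $\sigma_1$, $\sigma_2$, and $\rho_{12}$ below are arbitrary choices for illustration.

sigma1 = 1; sigma2 = 2; rho12 = 0.7;              % arbitrary example values
SigmaX = [sigma1^2, rho12*sigma1*sigma2; ...
          rho12*sigma1*sigma2, sigma2^2];
[x1, x2] = meshgrid(linspace(-6, 6, 200));
Xg = [x1(:), x2(:)];                              % each row is one point [x1 x2]
q  = sum((Xg / SigmaX) .* Xg, 2);                 % quadratic form x'*inv(SigmaX)*x
f  = exp(-q/2) / (2*pi*sqrt(det(SigmaX)));        % pdf values
contour(x1, x2, reshape(f, size(x1)));            % tilted elliptical contours
axis equal; xlabel('x_1'); ylabel('x_2');

■ The contours are ellipses whose tilt is governed by $\rho_{12}$, consistent with the claim that the principal axes are eigenvectors of $\Sigma_X$.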

2.3: Uncorrelated versus independent

INDEPENDENCE: Jointly-distributed RVs are independent if and only if

$f_X(x_1, x_2, \ldots, x_n) = f_{X_1}(x_1)\,f_{X_2}(x_2)\cdots f_{X_n}(x_n)$.

■ The joint distribution can be split up into the product of individual distributions for each RV.

■ Equivalent condition: for all functions $f(\cdot)$ and $g(\cdot)$,

$E[f(X_1)\,g(X_2)] = E[f(X_1)]\,E[g(X_2)]$.

■ "The particular value of the random variable $X_1$ has no impact on what value we would obtain for the random variable $X_2$."

UNCORRELATED: Two jointly-distributed RVs $X_1$ and $X_2$ are uncorrelated if their second moments are finite and

$\mathrm{cov}(X_1, X_2) = E[(X_1 - \bar{x}_1)(X_2 - \bar{x}_2)] = 0$,

which implies $\rho_{12} = 0$.

■ Uncorrelated means that there is no linear relationship between $X_1$ and $X_2$.

MAIN POINT #1: If jointly-distributed RVs $X_1$ and $X_2$ are independent, then they are uncorrelated. Independence implies uncorrelatedness.

■ To see this, notice that independence means $E[X_1 X_2] = E[X_1]\,E[X_2]$. Therefore,

$\mathrm{cov}(X_1, X_2) = E[X_1 X_2] - E[X_1]\,E[X_2] = 0$,

so they are uncorrelated.

■ Does uncorrelatedness imply independence? Consider the example

$Y_1 = \sin(2\pi X)$ and $Y_2 = \cos(2\pi X)$,

with X uniformly distributed on $[0, 1]$.

■ We can show that $E[Y_1] = E[Y_2] = E[Y_1 Y_2] = 0$.

■ So, $\mathrm{cov}(Y_1, Y_2) = 0$ and $Y_1$ and $Y_2$ are uncorrelated.

■ But $Y_1^2 + Y_2^2 = 1$, and therefore $Y_1$ and $Y_2$ are clearly not independent.
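■ A quick MATLAB sketch makes this concrete. The sample size is an arbitrary choice; the final two lines use $f(Y_1) = Y_1^2$ and $g(Y_2) = Y_2^2$ to expose the dependence.

N  = 1e6;
X  = rand([N 1]);                                    % X uniform on [0,1]
Y1 = sin(2*pi*X);  Y2 = cos(2*pi*X);
sampleCov = mean((Y1 - mean(Y1)).*(Y2 - mean(Y2)))   % approx 0: uncorrelated
lhs = mean(Y1.^2 .* Y2.^2)                           % approx 1/8
rhs = mean(Y1.^2) * mean(Y2.^2)                      % approx 1/4: E[f(Y1)g(Y2)] differs from E[f(Y1)]E[g(Y2)]

■ The sample covariance is near zero, yet $E[Y_1^2 Y_2^2] \ne E[Y_1^2]\,E[Y_2^2]$, confirming that the variables are uncorrelated but dependent.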
■ Note: $\mathrm{cov}(X_1, X_2) = 0$ (uncorrelated) implies that

$E[X_1 X_2] = E[X_1]\,E[X_2]$,

but independence requires

$E[f(X_1)\,g(X_2)] = E[f(X_1)]\,E[g(X_2)]$

for all $f(\cdot)$ and $g(\cdot)$.

■ Therefore, independence is much stronger than uncorrelatedness.
COROLLARY: Consider an RV X with uncorrelated components. The covariance matrix $\Sigma_X = E[(X - \bar{x})(X - \bar{x})^T]$ is diagonal.

PROOF: Notice that:

◦ The diagonal elements are $(\Sigma_X)_{ii} = E[(X_i - \bar{x}_i)^2] = \sigma_i^2$.

◦ The off-diagonal elements are

$(\Sigma_X)_{ij} = E[(X_i - \bar{x}_i)(X_j - \bar{x}_j)] = E[X_i X_j - X_i\bar{x}_j - \bar{x}_i X_j + \bar{x}_i\bar{x}_j] = E[X_i]\,E[X_j] - 2E[X_i]\,E[X_j] + E[X_i]\,E[X_j] = 0$.


MAIN POINT #2: If jointly normally distributed RVs are uncorrelated, then
they are independent. (This is a special case.)

2.4: Functions of random variables

MAIN POINT #3: Suppose we are given two random variables Y and X with $Y = g(X)$. Also assume that $g^{-1}$ exists and that g and $g^{-1}$ are continuously differentiable. Then,

$f_Y(y) = f_X\!\left(g^{-1}(y)\right)\left\|\dfrac{\partial g^{-1}(y)}{\partial y}\right\|$,

where $\|\cdot\|$ means to take the absolute value of the determinant.


■ So what? Say we know the pdf of $X_1$ and $X_2$, and

$\left.\begin{aligned} Y_1 &= X_1 + X_2 \\ Y_2 &= X_2 \end{aligned}\right\}\;\; Y = AX;$

we can now form the pdf of Y ➠ handy!

EXAMPLE: $Y = kX$ and $X \sim \mathcal{N}(0, \sigma_X^2)$.

■ $X = \dfrac{1}{k}Y$, and so $g^{-1}(y) = \dfrac{1}{k}y$. Then, $\dfrac{\partial g^{-1}(y)}{\partial y} = \dfrac{1}{k}$.

$f_Y(y) = f_X\!\left(\dfrac{y}{k}\right)\left|\dfrac{1}{k}\right| = \dfrac{1}{|k|}\dfrac{1}{\sqrt{2\pi}\,\sigma_X}\exp\left(\dfrac{-(y/k)^2}{2\sigma_X^2}\right) = \dfrac{1}{\sqrt{2\pi}\,\sigma_X|k|}\exp\left(\dfrac{-y^2}{2(\sigma_X k)^2}\right)$.

■ Therefore, multiplication by gain k results in a normally-distributed RV with a scale change in standard deviation: $Y \sim \mathcal{N}(0, k^2\sigma_X^2)$.

EXAMPLE: $Y = AX + B$, where A is a constant (non-singular) matrix, B is a constant vector, and $X \sim \mathcal{N}(\bar{x}, \Sigma_X)$.

■ $X = A^{-1}Y - A^{-1}B$, so $g^{-1}(y) = A^{-1}y - A^{-1}B$. Then, $\dfrac{\partial g^{-1}(y)}{\partial y} = A^{-1}$.

■ $f_X(x) = \dfrac{1}{(2\pi)^{n/2}|\Sigma_X|^{1/2}}\exp\left(-\dfrac{1}{2}(x - \bar{x})^T\Sigma_X^{-1}(x - \bar{x})\right)$.

■ Therefore,

$f_Y(y) = \dfrac{|A^{-1}|}{(2\pi)^{n/2}|\Sigma_X|^{1/2}}\exp\left(-\dfrac{1}{2}\left(A^{-1}(y - B) - \bar{x}\right)^T\Sigma_X^{-1}\left(A^{-1}(y - B) - \bar{x}\right)\right)$

$= \dfrac{1}{(2\pi)^{n/2}\left(|A|\,|\Sigma_X|\,|A^T|\right)^{1/2}}\exp\left(-\dfrac{1}{2}(y - \bar{y})^T(A^{-1})^T\Sigma_X^{-1}A^{-1}(y - \bar{y})\right)$

$= \dfrac{1}{(2\pi)^{n/2}|\Sigma_Y|^{1/2}}\exp\left(-\dfrac{1}{2}(y - \bar{y})^T\Sigma_Y^{-1}(y - \bar{y})\right)$,

if $\Sigma_Y = A\Sigma_X A^T$ and $\bar{y} = A\bar{x} + B$. That is, $Y \sim \mathcal{N}(A\bar{x} + B,\, A\Sigma_X A^T)$.
CONCLUSION: Sums and linear transformations of Gaussians are Gaussian; this is a very special case.
RANDOM NOTE: How do we use randn.m to simulate non-zero-mean Gaussian noise with covariance $\Sigma_Y$?

■ We want $Y \sim \mathcal{N}(\bar{y}, \Sigma_Y)$, but randn.m returns $X \sim \mathcal{N}(0, I)$.

■ Try $y = \bar{y} + A^T x$, where A is square with the same dimension as $\Sigma_Y$ and $A^T A = \Sigma_Y$. (A is the Cholesky factor of the positive-definite symmetric matrix $\Sigma_Y$.)

ybar  = [1; 2];
covar = [1, 0.5; 0.5, 1];
A = chol(covar);                        % A'*A = covar
x = randn([2, 5000]);                   % 5000 samples of N(0, I)
y = repmat(ybar, 1, 5000) + A'*x;       % 5000 samples of N(ybar, covar)

[Figure: scatter plot of 5000 samples with mean [1; 2] and covariance [1, 0.5; 0.5, 1].]

■ When $\Sigma_Y$ is not positive definite (but is still positive semidefinite), chol fails; an LDL factorization can be used instead:

[L, D] = ldl(covar);                    % covar = L*D*L'
x = randn([2, 5000]);
y = repmat(ybar, 1, 5000) + (L*sqrt(D))*x;


MAIN POINT #4: Normality is preserved in a linear transformation (an extension of the above example).

■ Let X and W be independent vector random variables and let a be a constant vector: $X \sim \mathcal{N}(\bar{x}, \Sigma_X)$, $W \sim \mathcal{N}(\bar{w}, \Sigma_W)$.

■ Let

$Z = AX + BW + a$. Then,

$E[Z] = \bar{z} = A\bar{x} + B\bar{w} + a$,

$\Sigma_Z = A\Sigma_X A^T + B\Sigma_W B^T$.

■ Z is also a normal RV, and $Z \sim \mathcal{N}(\bar{z}, \Sigma_Z)$.
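■ The mean and covariance formulas are easy to spot-check by sampling in MATLAB. This is an illustrative sketch; all numeric values below are arbitrary choices.

A = [1, 0.2; 0, 1];  B = [0.5; 1];  a = [1; -1];
xbar = [2; 0];  SigmaX = [1, 0.3; 0.3, 2];
wbar = 0.5;     SigmaW = 0.25;                 % scalar W in this sketch
N = 1e5;
X = repmat(xbar, 1, N) + chol(SigmaX)'*randn(2, N);
W = wbar + sqrt(SigmaW)*randn(1, N);
Z = A*X + B*W + repmat(a, 1, N);
zbar_theory   = A*xbar + B*wbar + a            % compare with mean(Z, 2)
SigmaZ_theory = A*SigmaX*A' + B*SigmaW*B'      % compare with cov(Z')
zbar_sample   = mean(Z, 2)
SigmaZ_sample = cov(Z')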

2.5: Conditioning

MAIN POINT #5: Conditional probabilities.

■ Given jointly-distributed RVs, it is often of extreme interest to find the pdf of some of the RVs given known values for the rest.

■ For example, given the joint pdf $f_{X,Y}(x, y)$ for RVs X and Y, we want the conditional pdf $f_{X|Y}(x \mid y)$, which is the pdf of X for a known value $Y = y$.

■ We can also think of it as $f_{X|Y}(X = x \mid Y = y)$.

DEFINE: The conditional pdf

$f_{X|Y}(x \mid y) = \dfrac{f_{X,Y}(x, y)}{f_Y(y)}$

is the probability that $X = x$ given that $Y = y$ has happened.

NOTE I: The marginal probability $f_Y(y)$ may be calculated as

$f_Y(y) = \displaystyle\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx$.

For each y, integrate out the effect of X.

NOTE II: If X and Y are independent, $f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$. Therefore,

$f_{X|Y}(x \mid y) = \dfrac{f_{X,Y}(x, y)}{f_Y(y)} = \dfrac{f_X(x)\,f_Y(y)}{f_Y(y)} = f_X(x)$.

Knowing that $Y = y$ has occurred provides no information about X. (Is this what you would expect?)
DIRECT EXTENSION:

$f_{X,Y}(x, y) = f_{X|Y}(x \mid y)\,f_Y(y) = f_{Y|X}(y \mid x)\,f_X(x)$.

Therefore,

$f_{X|Y}(x \mid y) = \dfrac{f_{Y|X}(y \mid x)\,f_X(x)}{f_Y(y)}$.

■ This is known as Bayes' rule. It relates the a posteriori probability to the a priori probability.

■ It forms a key step in the Kalman filter derivation.

MAIN POINT #6: Conditional expectation.

■ Now that we have a way of expressing a conditional probability, we can also express a conditional mean.

■ What do we expect the value of X to be, given that $Y = y$ has happened?

$E[X = x \mid Y = y] = E[X \mid Y] = \displaystyle\int_{-\infty}^{\infty} x\,f_{X|Y}(x \mid Y)\,dx$.

■ Conditional expectation is not a constant (as expectation is) but a random variable. It is a function of the conditioning random variable (i.e., Y).

■ Note: Conditional expectation is critical. The Kalman filter is an algorithm to compute $E[x_k \mid \mathbb{Z}_k]$, where we define $\mathbb{Z}_k$ later.

MAIN POINT #7: Central limit theorem.

■ If $Y = \displaystyle\sum_i X_i$, the $X_i$ are independent and identically distributed (IID), and the $X_i$ have finite mean and variance, then Y will be approximately normally distributed.

■ The approximation improves as the number of summed RVs gets large.
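■ The effect is easy to visualize in MATLAB. The sketch below sums IID uniform RVs (an arbitrary choice of distribution) and plots histograms for increasing numbers of summands.

N = 1e5;                                % number of Monte-Carlo trials
nvals = [1 2 10 50];                    % numbers of RVs summed
for idx = 1:length(nvals)
  n = nvals(idx);
  Y = sum(rand([n N]), 1);              % each column: sum of n IID uniforms
  subplot(2, 2, idx);
  histogram(Y, 50, 'Normalization', 'pdf');
  title(sprintf('n = %d', n));
end

■ Even for modest n, the histogram is already close to a Gaussian bell shape.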

■ Since the state of our dynamic system adds up the effects of lots of
independent random inputs, it is reasonable to assume that the
distribution of the state tends to the normal distribution.
■ This leads to the key assumptions for the derivation of the Kalman
filter, as we will see:

◦ We will assume that state $x_k$ is a normally-distributed RV;

◦ We will assume that process noise $w_k$ is a normally-distributed RV;

◦ We will assume that sensor noise $v_k$ is a normally-distributed RV;

◦ We will assume that $w_k$ and $v_k$ are uncorrelated with each other.

■ Even when these assumptions are broken in practice, the Kalman


filter often works quite well.
■ Exceptions to this rule tend to be with very highly nonlinear systems,
for which particle filters must sometimes be employed to get good
estimates.

2.6: Vector random (stochastic) processes

■ A stochastic random process is a family of random vectors indexed by a parameter set ("time" in our case).

◦ For example, the random process X(t).

◦ The value of the random process at any time $t_0$ is the RV $X(t_0)$.

■ Usually we assume stationarity.

◦ The pdfs of the random variables are time-shift invariant.

◦ Therefore, $E[X(t)] = \bar{x}$ for all t, and $E[X(t_1)X^T(t_2)] = R_X(t_1 - t_2)$.

Properties

1. Autocorrelation: $R_X(t_1, t_2) = E[X(t_1)X^T(t_2)]$. If stationary,

$R_X(\tau) = E[X(t)X^T(t + \tau)]$.

■ Provides a measure of correlation between elements of the process having time displacement $\tau$.

■ $R_X(0) = \sigma_X^2$ for zero-mean X.

■ $R_X(0)$ is always the maximum value of $R_X(\tau)$.

2. Autocovariance: $C_X(t_1, t_2) = E[(X(t_1) - E[X(t_1)])(X(t_2) - E[X(t_2)])^T]$. If stationary,

$C_X(\tau) = E[(X(t) - \bar{x})(X(t + \tau) - \bar{x})^T]$.

“White” noise

■ Correlation from one time instant to the next is a key property of a random process. ➠ $R_X(\tau)$ and $C_X(\tau)$ are very important.

■ Some processes have a unique autocorrelation:
1. Zero mean,

2. $R_X(\tau) = E[X(t)X(t + \tau)^T] = S_X\,\delta(\tau)$, where $\delta(\tau)$ is the Dirac delta.

■ Therefore, the process is uncorrelated in time.

■ Clearly an abstraction, but it proves to be a very useful one.
[Figure: sample realizations of white noise (left) and correlated noise (right) plotted versus time.]
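■ A short MATLAB sketch can generate plots like these and estimate the autocorrelations. The first-order filter coefficients are arbitrary choices, and xcorr assumes the Signal Processing Toolbox is available.

N = 1000;
w = randn([N 1]);                       % discrete-time white noise
x = filter(0.05, [1 -0.95], w);         % first-order low-pass shaping -> correlated noise
subplot(1,2,1); plot(w); title('White noise');
subplot(1,2,2); plot(x); title('Correlated noise');
[Rw, lags] = xcorr(w, 50, 'unbiased');  % autocorrelation estimates
[Rx, ~]    = xcorr(x, 50, 'unbiased');
figure; plot(lags, Rw, lags, Rx); legend('white', 'correlated');

■ The white-noise autocorrelation estimate is nearly an impulse at zero lag, while the correlated noise decays slowly with lag.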

Power spectral density (PSD)

■ Consider stationary random processes with autocorrelation defined as $R_X(\tau) = E[X(t)X(t + \tau)^T]$.
■ If a process varies slowly, time-adjacent samples are very correlated,
and RX .%/ will drop off slowly ➠ Mostly low-frequency content.
■ If a process varies quickly, time-adjacent samples aren’t very
correlated, and RX .%/ drops off quickly ➠ More high-freq. content.
■ Therefore, the autocorrelation function tells us about the frequency
content of the random signal.
■ Consider scalar case. We define the power spectral density (PSD) as
the Fourier transform of the autocorrelation function:

$S_X(\omega) = \displaystyle\int_{-\infty}^{\infty} R_X(\tau)\,e^{-j\omega\tau}\,d\tau$

$R_X(\tau) = \dfrac{1}{2\pi}\displaystyle\int_{-\infty}^{\infty} S_X(\omega)\,e^{j\omega\tau}\,d\omega$.

■ $S_X(\omega)$ gives a frequency-domain interpretation of the "power" (energy) in the random process X(t), and it is real, symmetric, and positive semidefinite for real random processes.

EXAMPLE: $R_X(\tau) = \sigma^2 e^{-\alpha|\tau|}$ for $\alpha > 0$. Then,

$S_X(\omega) = \displaystyle\int_{-\infty}^{\infty}\sigma^2 e^{-\alpha|\tau|}e^{-j\omega\tau}\,d\tau = \dfrac{2\sigma^2\alpha}{\omega^2 + \alpha^2}$.
[Figure: the autocorrelation $R_X(\tau)$ has a peak of $\sigma^2$ at $\tau = 0$; the PSD $S_X(\omega)$ has a peak of $2\sigma^2/\alpha$ at $\omega = 0$.]

■ $\alpha$ is a bandwidth measure.
INTERPRETATION:

1. The area under the PSD $S_X(\omega)$ for $\omega_1 \le \omega \le \omega_2$ provides a measure of the energy in the signal in that frequency range.

2. White noise can be thought of as a limiting process. From our prior example, let $\alpha \to \infty$.

■ As $\alpha \to \infty$, $R_X(\tau) \to \delta(\tau)$; therefore, white.

■ The bandwidth of the spectral content is approximately $\alpha$, so it goes to infinity.

■ An abstraction, since it implies that the signal has infinite energy!

2.7: Discrete-time dynamic systems with random inputs

■ What are the statistics of the state of a system driven by a random process?

■ That is, how do the system dynamics influence the time-propagation of x(t)? ➠ We do not know x(t) exactly, but we can develop a pdf for x(t).

■ Primary example: a linear system driven by white noise w(t), possibly in addition to a deterministic input u(t).

[Block diagram: a state-space system $\{A, B, C, D\}$ with deterministic input $u(t)$ and noise input $w(t)$, internal state $x(t)$, and output $y(t)$.]

■ Contrast this to a deterministic simulation, which can be simulated easily:

$\dot{x}(t) = Ax(t) + Bu(t)$, with $x(0) = x_0$; $x(0)$ and $u(t)$ known.

■ The system response is completely specified. Therefore, there is no uncertainty.

■ The stochastic propagation is different, since there are inputs that are not known exactly. ➠ The best we can do is to say how uncertainty in the state changes with time.

[Figure: deterministic case, a single state trajectory from $x(t_0)$ to $x(t_f)$ in the $(x_1, x_2)$ plane; stochastic case, a pdf $\Pr(x)$ for the state that spreads as it propagates from $x(t_0)$ to $x(t_f)$.]

■ Deterministic: $x(t_0)$ and inputs known ➠ state trajectory deterministic.
■ Stochastic: pdf for $x(t_0)$ and inputs known ➠ can find only a pdf for x(t).

■ We will work with Gaussian noises to a large extent, which are uniquely defined by the first and second central moments of their statistics ➠ but the Gaussian assumption is not essential.

■ We will only ever track the first two moments.

NOTATION: Until now, we have always used capital letters for random variables. The state of a system driven by a random process is a random vector, so we could now call it X(t) or $X_k$. However, it is more common to retain the standard notation x(t) or $x_k$ and to understand from the context that we are now discussing an RV.

Discrete-time systems

■ We will start with a discrete-time system and then look at how to get an equivalent answer for continuous-time systems.

■ Model:

$x_k = A_{d,k-1}x_{k-1} + B_{d,k-1}u_{k-1} + w_{k-1}$,

where

$x_k$: state vector; a random process;

$u_k$: deterministic control inputs;

$w_k$: noise that drives the process (process noise); a random process.

■ $A_d$ and $B_d$ are assumed known. They can be time-varying; for now, assume they are fixed. Generalize later if necessary.

■ Key assumptions about the driving noise:

1. Zero mean: $E[w_k] = 0 \;\;\forall\, k$.

2. White: $E[w_{k_1}w_{k_2}^T] = \Sigma_w\,\delta(k_1 - k_2)$.

■ $\Sigma_w$ is the covariance matrix of the noise signal $w_k$.

■ Uncertainty at startup: statistics of the initial condition,

$E[x_0] = \bar{x}_0$;

$E[(x_0 - \bar{x}_0)w_k^T] = 0 \;\;\forall\, k$;

$\Sigma_{x,0} = E[(x_0 - \bar{x}_0)(x_0 - \bar{x}_0)^T]$.

■ Key question: How do we propagate the statistics of this system?

Mean value

■ The mean value is

$E[x_k] = \bar{x}_k = E[A_d x_{k-1} + B_d u_{k-1} + w_{k-1}] = A_d\bar{x}_{k-1} + B_d u_{k-1}$.

Therefore, the mean propagation is

$\bar{x}_0$: given;

$\bar{x}_k = A_d\bar{x}_{k-1} + B_d u_{k-1}$.

Deterministic simulation and mean values are treated the same way.

Variations about the mean

■ To study the random variations about the mean, we need to form the second central moment of the statistics:

$\Sigma_{x,k} = E[(x_k - \bar{x}_k)(x_k - \bar{x}_k)^T]$.
■ This is easiest to study if we note that

$x_k - \bar{x}_k = A_d x_{k-1} + B_d u_{k-1} + w_{k-1} - A_d\bar{x}_{k-1} - B_d u_{k-1} = A_d(x_{k-1} - \bar{x}_{k-1}) + w_{k-1}$.

■ Thus,

$\Sigma_{x,k} = E\big[\big(A_d(x_{k-1} - \bar{x}_{k-1}) + w_{k-1}\big)\big(A_d(x_{k-1} - \bar{x}_{k-1}) + w_{k-1}\big)^T\big]$.

■ Three terms:

1. $E[A_d(x_{k-1} - \bar{x}_{k-1})(x_{k-1} - \bar{x}_{k-1})^T A_d^T] = A_d\Sigma_{x,k-1}A_d^T$.

2. $E[w_{k-1}w_{k-1}^T] = \Sigma_w$.

3. $E[A_d(x_{k-1} - \bar{x}_{k-1})w_{k-1}^T] = \,?$

■ The third term is a cross-correlation term.

◦ But $x_{k-1}$ depends only on $x_0$ and the inputs $w_m$ for $m = 0, \ldots, k-2$.

◦ $w_{k-1}$ is white noise, uncorrelated with $x_0$.

◦ Therefore, the third term is zero.

Therefore, the covariance propagation is

$\Sigma_{x,0}$: given;

$\Sigma_{x,k} = A_d\Sigma_{x,k-1}A_d^T + \Sigma_w$.

In this equation, $A_d\Sigma_{x,k-1}A_d^T$ is the homogeneous part; $\Sigma_w$ is the driving term.

■ Note: If $A_d$ and $\Sigma_w$ are constant and $A_d$ is stable, there is a steady-state solution.

◦ As $k \to \infty$, $\Sigma_{x,k} = \Sigma_{x,k+1} = \Sigma_x$.
◦ Then, $\Sigma_x = A_d\Sigma_x A_d^T + \Sigma_w$.

■ This form is called a discrete Lyapunov equation. In MATLAB, dlyap.m.

EXAMPLE: We can solve by hand in the scalar case.

■ Consider $x_k = \alpha x_{k-1} + w_{k-1}$, with $E[w_{k-1}] = 0$, $\Sigma_w = 1$, $\bar{x}_0 = 0$.

■ Then,

$\Sigma_x = \alpha\Sigma_x\alpha^T + 1$

$\Sigma_x(1 - \alpha^2) = 1$

$\Sigma_x = \dfrac{1}{1 - \alpha^2}$.

This is valid for $|\alpha| < 1$; otherwise the system is unstable.

■ In the example below, we plot 100 random trajectories for $\alpha = 0.75$ and $\Sigma_{x,0} = 50$.

■ Compare the propagation of $\Sigma_{x,k}$ with the steady-state dlyap.m solution, $\Sigma_x = 2.29$.
[Figure: left, propagation of $\Sigma_{x,k}$ versus time, converging to the steady-state value $\Sigma_x = 2.29$; right, 100 simulated state trajectories plotted versus time with error bounds.]

■ The error bounds plotted are $3\sigma$ bounds (i.e., $\pm 3\sqrt{\Sigma_{x,k}}$).
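■ A MATLAB sketch of this example follows; dlyap.m requires the Control System Toolbox.

alpha = 0.75;  Sw = 1;  Sx0 = 50;  K = 20;
Sx = zeros(1, K+1);  Sx(1) = Sx0;
for k = 2:K+1
  Sx(k) = alpha*Sx(k-1)*alpha' + Sw;    % covariance recursion
end
Sx_ss = dlyap(alpha, Sw)                % steady state: 1/(1 - alpha^2) = 2.29
plot(0:K, Sx, 0:K, Sx_ss*ones(1, K+1), '--');
xlabel('Time step k'); ylabel('\Sigma_{x,k}');

■ The recursion converges to the dlyap.m value, matching the figure.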

2.8: Continuous-time dynamic systems with random inputs
■ For continuous-time systems we have

$\dot{x}(t) = A(t)x(t) + B_u(t)u(t) + B_w(t)w(t)$,

where

$x(t)$: state vector; a random process;

$u(t)$: deterministic control inputs;

$w(t)$: noise driving the process (process noise); a random process.

Further,

$E[w(t)] = 0$; $\quad E[w(t)w(\tau)^T] = S_w\,\delta(t - \tau)$;

$E[x(0)] = \bar{x}(0)$; $\quad E[(x(0) - \bar{x}(0))(x(0) - \bar{x}(0))^T] = \Sigma_x(0)$.

■ A, $B_u$, and $B_w$ are assumed known. They can be time-varying; for now, assume they are fixed. Generalize later if necessary.

■ $S_w$ is the spectral density of w(t).

■ The easiest analysis is to discretize the model, use the results obtained earlier, and let $\Delta t \to 0$.

◦ Drop the deterministic inputs u(t) for simplicity. (They affect only the mean, and in known ways.)

Continuous-time propagation of statistics

Mean value

■ Starting with the discrete case, we have

$\bar{x}_k = A_d\bar{x}_{k-1} + B_d u_{k-1}$,

$A_d = e^{A\Delta t} = I + A\Delta t + O(\Delta t^2)$.


■ Let $\Delta t \to 0$ and consider the case when $u_k = 0$ for all k:

$\bar{x}_k = (I + A\Delta t)\bar{x}_{k-1}$

$\dfrac{\bar{x}_k - \bar{x}_{k-1}}{\Delta t} = A\bar{x}_{k-1}$.

■ As $\Delta t \to 0$,

$\dot{\bar{x}}(t) = A\bar{x}(t)$,

as expected.

Therefore, the mean propagation is

$\bar{x}(0)$: given;

$\dot{\bar{x}}(t) = A\bar{x}(t) + B_u u(t)$.

Deterministic simulation and mean values are treated the same way.

Variations about the mean

■ The result for discrete-time systems was: $\Sigma_{x,k} = A_d\Sigma_{x,k-1}A_d^T + \Sigma_w$.

■ But we need a way to relate the discrete $\Sigma_w$ to the continuous spectral density $S_w$ before we can proceed.

■ Recall the discrete system response in terms of the continuous system matrices:

$x_k = e^{A\Delta t}x_{k-1} + \displaystyle\int_{(k-1)\Delta t}^{k\Delta t} e^{A(k\Delta t - \tau)}B_w w(\tau)\,d\tau = e^{A\Delta t}x_{k-1} + w_{k-1}$.

■ The integral explicitly accounts for variations in the noise during $\Delta t$.

RECALL: Discrete white noise has the form

$E[w_k w_l^T] = \begin{cases}\Sigma_w, & k = l; \\ 0, & k \ne l.\end{cases}$
■ But the noise input term to our converted state dynamics is defined as an integral:

$w_{k-1} = \displaystyle\int_{(k-1)\Delta t}^{k\Delta t} e^{A(k\Delta t - \tau)}B_w w(\tau)\,d\tau$.

■ Form the outer product to get the equivalent $\Sigma_w$:

$\Sigma_w = E\left[\left(\displaystyle\int_{(k-1)\Delta t}^{k\Delta t} e^{A(k\Delta t - \tau)}B_w w(\tau)\,d\tau\right)\left(\displaystyle\int_{(k-1)\Delta t}^{k\Delta t} e^{A(k\Delta t - \gamma)}B_w w(\gamma)\,d\gamma\right)^T\right]$

$= E\left[\displaystyle\int_{(k-1)\Delta t}^{k\Delta t}\int_{(k-1)\Delta t}^{k\Delta t} e^{A(k\Delta t - \tau)}B_w w(\tau)w(\gamma)^T B_w^T e^{A^T(k\Delta t - \gamma)}\,d\tau\,d\gamma\right]$.

■ This is a big, ugly mess, but it has two saving graces:

1. The expectation can go inside the integrals.

2. $E[w(\tau)w(\gamma)^T] = S_w\,\delta(\tau - \gamma)$ ➠ one of the integrals drops out!

■ So, we have

$\Sigma_w = \displaystyle\int_{(k-1)\Delta t}^{k\Delta t} e^{A(k\Delta t - \tau)}B_w S_w B_w^T e^{A^T(k\Delta t - \tau)}\,d\tau$.

KEY POINT: While $S_w$ may have a simple form, $\Sigma_w$ will in general be a full matrix.

■ To solve the integral, we can approximate: as $\Delta t \to 0$, $k\Delta t - \tau \to 0$ and $e^{A(k\Delta t - \tau)} \approx I + A(k\Delta t - \tau) + \cdots$

■ That is, $e^{A(k\Delta t - \tau)} \approx I$. Then,

$\Sigma_w \approx \left(B_w S_w B_w^T\right)\Delta t$.

■ We will see a better method to evaluate $\Sigma_w$ when $\Delta t \ne 0$, but for now we continue with this result to determine the continuous-time covariance propagation.

■ Start with $\Sigma_{x,k} = A_d\Sigma_{x,k-1}A_d^T + \Sigma_w$, and substitute $\Sigma_w \approx B_w S_w B_w^T\,\Delta t$. Also, use $A_d \approx I + A\Delta t$:

$\Sigma_{x,k} \approx (I + A\Delta t)\Sigma_{x,k-1}(I + A\Delta t)^T + B_w S_w B_w^T\,\Delta t$

$= \Sigma_{x,k-1} + \Delta t\left(A\Sigma_{x,k-1} + \Sigma_{x,k-1}A^T + B_w S_w B_w^T\right) + O(\Delta t^2)$

$\dfrac{\Sigma_{x,k} - \Sigma_{x,k-1}}{\Delta t} = A\Sigma_{x,k-1} + \Sigma_{x,k-1}A^T + B_w S_w B_w^T + O(\Delta t)$.

■ As $\Delta t \to 0$,

$\dot{\Sigma}_x(t) = A\Sigma_x(t) + \Sigma_x(t)A^T + B_w S_w B_w^T$,

initialized with $\Sigma_x(0)$.

■ This is a matrix differential equation.

■ It is symmetric, so we don't need to solve for every element.

■ Two effects:

◦ $A\Sigma_x(t) + \Sigma_x(t)A^T$: the homogeneous part. Contractive for stable A; reduces covariance.

◦ $B_w S_w B_w^T$: the impact of process noise. Tends to increase covariance.

■ Steady-state solution: the effects balance for systems with constant dynamics (A, $B_w$, $S_w$) and stable A:

$A\Sigma_{x,ss} + \Sigma_{x,ss}A^T + B_w S_w B_w^T = 0$.

■ This is a continuous-time Lyapunov equation. In MATLAB, lyap.m.

The covariance propagation is

$\Sigma_x(0)$: given;

$\dot{\Sigma}_x(t) = A\Sigma_x(t) + \Sigma_x(t)A^T + B_w S_w B_w^T$.

In this equation, $A\Sigma_x(t) + \Sigma_x(t)A^T$ is the homogeneous part; $B_w S_w B_w^T$ is the driving term.

EXAMPLE: $\dot{x}(t) = Ax(t) + B_w w(t)$, where A and $B_w$ are scalars.

■ Then, $\dot{\Sigma}_x = 2A\Sigma_x + B_w^2 S_w$. This can be solved in closed form to find

$\Sigma_x(t) = \dfrac{B_w^2 S_w}{2A}\left(e^{2At} - 1\right) + \Sigma_x(0)\,e^{2At}$.

■ If $A < 0$ (stable), then the initial-condition contribution goes to zero:

$\Sigma_{x,ss} = \dfrac{-B_w^2 S_w}{2A}$,

independent of $\Sigma_x(0)$.

■ Example: Let $A = -1$, $B_w = 1$, $S_w = 2$, and $\Sigma_x(0) = 5$. Find $\Sigma_x(t)$.

Solution: $\Sigma_x(t) = 1 + 4e^{-2t}$, so $\Sigma_{x,ss} = 1$.

[Figure: $\Sigma_x(t)$ decaying from 5 toward the steady-state value $\Sigma_{x,ss} = 1$.]

■ $\Sigma_{x,ss}$ increases as more noise is added (the $B_w^2 S_w$ term).

■ $\Sigma_{x,ss}$ decreases as A becomes "more stable".
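■ A MATLAB sketch of this example follows; lyap.m requires the Control System Toolbox.

A = -1;  Bw = 1;  Sw = 2;  Sx0 = 5;
dSx = @(t, Sx) A*Sx + Sx*A' + Bw*Sw*Bw';       % covariance differential equation
[t, Sx] = ode45(dSx, [0 5], Sx0);              % numerically propagate Sigma_x(t)
Sx_ss = lyap(A, Bw*Sw*Bw')                     % steady state: -Bw^2*Sw/(2*A) = 1
plot(t, Sx, t, Sx_ss*ones(size(t)), '--');
xlabel('Time'); ylabel('\Sigma_x(t)');

■ The numerically integrated covariance decays from 5 to the lyap.m steady-state value of 1, matching the closed-form solution.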

2.9: Relating $\Sigma_w$ to $S_w$ precisely: A little trick

■ We want to find $\Sigma_w$ precisely, regardless of $\Delta t$. Assume we know $S_w$ and the plant dynamics:

$\Sigma_w = \displaystyle\int_0^{\Delta t} e^{A(\Delta t - \tau)}B_w S_w B_w^T e^{A^T(\Delta t - \tau)}\,d\tau$.

■ Define

$Z = \begin{bmatrix} -A & B_w S_w B_w^T \\ 0 & A^T\end{bmatrix}, \qquad C = e^{Z\Delta t} = \begin{bmatrix} c_{11} & c_{12} \\ 0 & c_{22}\end{bmatrix}$.

■ We can compute $C = \mathcal{L}^{-1}\!\left[(sI - Z)^{-1}\right]\big|_{t=\Delta t}$ and find that

$c_{11} = \mathcal{L}^{-1}\!\left[(sI - z_{11})^{-1}\right]\big|_{t=\Delta t} = \mathcal{L}^{-1}\!\left[(sI + A)^{-1}\right]\big|_{t=\Delta t} = e^{-A\Delta t}$

$c_{12} = \mathcal{L}^{-1}\!\left[(sI - z_{11})^{-1}z_{12}(sI - z_{22})^{-1}\right]\big|_{t=\Delta t} = \mathcal{L}^{-1}\!\left[(sI + A)^{-1}B_w S_w B_w^T(sI - A^T)^{-1}\right]\big|_{t=\Delta t}$

$c_{22} = \mathcal{L}^{-1}\!\left[(sI - z_{22})^{-1}\right]\big|_{t=\Delta t} = \mathcal{L}^{-1}\!\left[(sI - A^T)^{-1}\right]\big|_{t=\Delta t} = e^{A^T\Delta t} = A_d^T$.

■ So, $c_{22}^T = A_d$. Also, $c_{22}^T c_{12} = \Sigma_w$. With one expm.m command, we can quickly switch from continuous time to discrete time.

■ To show the second identity, recognize that $c_{12}$ is a convolution of two impulse responses: one due to the system $(sI + A)^{-1}B_w$ and the other due to the system $S_w B_w^T(sI - A^T)^{-1}$.

■ The first of these has impulse response $e^{-At}B_w$ and the second has impulse response $S_w B_w^T e^{A^T t}$. Convolving them, we get

$c_{12} = \displaystyle\int_0^{\Delta t} e^{-A\tau}B_w S_w B_w^T e^{A^T(\Delta t - \tau)}\,d\tau$.

■ Finally, consider

$c_{22}^T c_{12} = \displaystyle\int_0^{\Delta t} e^{A(\Delta t - \tau)}B_w S_w B_w^T e^{A^T(\Delta t - \tau)}\,d\tau = \Sigma_w$.

EXAMPLE: Let

$\dot{x}(t) = \begin{bmatrix} 0 & 1 \\ -1 & -1\end{bmatrix}x(t) + \begin{bmatrix} 0 \\ 1\end{bmatrix}w(t)$

with $S_w = 0.06$ and $\Delta t = 2\pi/16$.

$Z = \begin{bmatrix} 0 & -1 & 0 & 0 \\ 1 & 1 & 0 & 0.06 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & -1\end{bmatrix}; \qquad C = \begin{bmatrix} 0.91 & -0.47 & -0.006 & -0.0046 \\ 0.47 & 1.38 & 0.0046 & 0.023 \\ 0 & 0 & 0.93 & -0.32 \\ 0 & 0 & 0.32 & 0.62\end{bmatrix}.$

■ So,

$A_d = \begin{bmatrix} 0.93 & 0.32 \\ -0.32 & 0.62\end{bmatrix}; \qquad \Sigma_w = \begin{bmatrix} 0.0009 & 0.0030 \\ 0.0030 & 0.0156\end{bmatrix}; \qquad B_w S_w B_w^T = \begin{bmatrix} 0 & 0 \\ 0 & 0.06\end{bmatrix}.$
■ The exact $\Sigma_w$ and the approximate $\Sigma_w$ are very different due to the large $\Delta t$.

■ Compare predictions of steady-state performance:

◦ Continuous: $A\Sigma_{x,ss} + \Sigma_{x,ss}A^T + B_w S_w B_w^T = 0$; $\quad \Sigma_{x,ss} = \begin{bmatrix} 0.03 & 0 \\ 0 & 0.03\end{bmatrix}$.

◦ Discrete: $\Sigma_{x,ss} = A_d\Sigma_{x,ss}A_d^T + \Sigma_w$; $\quad \Sigma_{x,ss} = \begin{bmatrix} 0.03 & 0 \\ 0 & 0.03\end{bmatrix}$.

■ Because we used discrete white noise scaled to be equivalent to the continuous noise, the steady-state predictions are the same.
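■ In MATLAB, this example is a few lines; lyap.m and dlyap.m require the Control System Toolbox.

A  = [0, 1; -1, -1];  Bw = [0; 1];  Sw = 0.06;  dt = 2*pi/16;
n  = size(A, 1);
Z  = [-A, Bw*Sw*Bw'; zeros(n), A'];
C  = expm(Z*dt);
Ad     = C(n+1:end, n+1:end)';              % Ad = c22'
SigmaW = Ad * C(1:n, n+1:end);              % SigmaW = c22'*c12
SigmaX_ss_cont = lyap(A, Bw*Sw*Bw')         % continuous-time steady state
SigmaX_ss_disc = dlyap(Ad, SigmaW)          % discrete-time steady state (matches)

■ The two steady-state covariances agree, confirming the equivalence of the exact discretized noise covariance.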
■ Conversion now complete:

DISCRETE: $A_d$, $\Sigma_w$, $B_d$:

Given $\Sigma_{x,0}$: $\quad \Sigma_{x,k} = A_d\Sigma_{x,k-1}A_d^T + \Sigma_w$;

Given $\bar{x}_0$: $\quad \bar{x}_k = A_d\bar{x}_{k-1} + B_d u_{k-1}$.

CONTINUOUS: A, $S_w$, $B_u$, $B_w$:

Given $\Sigma_x(0)$: $\quad \dot{\Sigma}_x(t) = A\Sigma_x(t) + \Sigma_x(t)A^T + B_w S_w B_w^T$;

Given $\bar{x}(0)$: $\quad \dot{\bar{x}}(t) = A\bar{x}(t) + B_u u(t)$.

■ All matrices may be time-varying.

■ An equivalent $\Sigma_w$ for a given $S_w$ is available.

2.10: Shaping filters

■ We have assumed that the inputs to the linear dynamic systems are white (Gaussian) noises.

■ Therefore, the PSD is flat ➠ the input has content at all frequencies.

■ This is a pretty limiting assumption, but one that can be easily fixed ➠ we can use a second linear system to "shape" the noise and modify the PSD as desired.

[Block diagrams: previous picture, white noise w(t) drives G(s) to produce y(t); new picture, shaped noise $w_1(t)$ drives G(s), where $w_1(t)$ is the output of a shaping filter H(s) that is itself driven by white noise $w_2(t)$.]

■ Therefore, we can drive our linear system with noise that has a desired PSD by introducing a shaping filter H(s) that itself is driven by white noise.

■ The combined system G(s)H(s) looks exactly the same as before, but the system G(s) is no longer driven by pure white noise.
■ The analysis is quite simple: augment the original model with the filter states.

■ The original system has

$\dot{x}(t) = Ax(t) + B_w w_1(t)$,

$y(t) = Cx(t)$.

■ The new shaping filter with white input and desired-PSD output has

$\dot{x}_s(t) = A_s x_s(t) + B_s w_2(t)$,

$w_1(t) = C_s x_s(t)$.

■ Combine into one system:

$\begin{bmatrix}\dot{x}(t) \\ \dot{x}_s(t)\end{bmatrix} = \begin{bmatrix} A & B_w C_s \\ 0 & A_s\end{bmatrix}\begin{bmatrix} x(t) \\ x_s(t)\end{bmatrix} + \begin{bmatrix} 0 \\ B_s\end{bmatrix}w_2(t)$

$y(t) = \begin{bmatrix} C & 0\end{bmatrix}\begin{bmatrix} x(t) \\ x_s(t)\end{bmatrix}$.

■ The augmented system is just a larger-order linear system driven by white noise.

KEY QUESTION: Given a PSD for $w_1$, how do we find H(s) (or $A_s$, $B_s$, $C_s$)?

■ Fact: If $w_2$ is unit-variance white noise, then the PSD of $w_1$ is

$S_{w_1}(\omega) = H(-j\omega)H^T(j\omega)$.

■ Find $H(j\omega)$ by spectral factorization of $S_{w_1}(\omega)$. Take the minimum-phase part (keep the poles and zeros that have negative real part).

EXAMPLE: Let

$S_{w_1}(\omega) = \dfrac{2\sigma^2\alpha}{\omega^2 + \alpha^2} = \dfrac{\sqrt{2\alpha}\,\sigma}{\alpha + j\omega}\cdot\dfrac{\sqrt{2\alpha}\,\sigma}{\alpha - j\omega}$.

Then,

$H(s) = \dfrac{\sqrt{2\alpha}\,\sigma}{s + \alpha}$,

$\dot{x}_s(t) = -\alpha x_s(t) + \sqrt{2\alpha}\,\sigma\,w_2(t)$,

$w_1(t) = x_s(t)$.
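■ A MATLAB sketch of building this shaping filter and the augmented model follows. The plant G(s) used here is an arbitrary example (not one from the notes), and ss/bode require the Control System Toolbox.

sigma = 1;  alpha = 2;                            % arbitrary shaping-filter parameters
As = -alpha;  Bs = sqrt(2*alpha)*sigma;  Cs = 1;
A  = [0, 1; -1, -1];  Bw = [0; 1];  C = [1, 0];   % example plant, x' = A*x + Bw*w1
Aaug = [A, Bw*Cs; zeros(1, 2), As];               % augmented dynamics, driven by white w2
Baug = [0; 0; Bs];
Caug = [C, 0];
sysAug = ss(Aaug, Baug, Caug, 0);                 % y response to the white-noise input w2
bode(ss(As, Bs, Cs, 0));                          % |H(jw)|^2 = 2*sigma^2*alpha/(w^2 + alpha^2)

■ The augmented model can then be used directly in the covariance-propagation equations of Sections 2.7–2.8, with $w_2$ as the white process noise.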

EXAMPLE: Wind model.

■ Wind gusts have some high-frequency content, but typically they are dominated by low-frequency disturbances ➠ is white noise a good model?

■ We would typically think of a PSD for the wind that puts more emphasis on the low-frequency component of the signal.

■ Think of this as the output of a shaping filter $H(s) = \dfrac{1}{\tau_w s + 1}$.

NOTE: If the bandwidth of the wind filter $(1/\tau_w)$ is greater than the bandwidth of the plant dynamics, then white noise is a good approximation to the system input.

■ Otherwise, we must use shaping filters in the plant model and drive them with white noise.

■ The size of the A matrix becomes a limitation. Use the simplest noise model possible.
