
Random Variables Overview
• Definitions
• Cumulative distribution function
• Probability density function
• Functions of random variables
• Expected values
• Mean & variance
• Markov & Chebyshev inequalities
• Independence & marginal distributions
• Bayes rule and conditional probability
• Mean square estimation
• Linear prediction

Definition: Random Variable
[Figure: an outcome ζ in the sample space S is mapped by X(ζ) to a point on the real line; the set of values reached is the range]
• Definition: a random variable X is a function that assigns a real number, X(ζ), to each outcome ζ in the sample space of a random experiment.
• The sample space S is the domain of the random variable
• The set of all values that X can have is the range of the random variable
• This is a many-to-one mapping. That is, a set of points ζ1, ζ2, ... may take on the same value of the random variable
• Will abbreviate as simply “RV”

Example 1: Random Variable Definitions
Suppose that a coin is tossed three times and the sequence of heads and tails is noted. What is the sample space for this experiment? Let X be the number of heads in three coin tosses. What is the range of X? List all of the points in the sample space (the domain) and the corresponding values of X.

Cumulative Distribution Function
The cumulative distribution function (CDF) of a random variable X is defined as the probability of the event {X ≤ x}:

F(x) = P[X ≤ x]

• Sometimes just called the distribution function
• Here X is the random variable and x is a non-random variable
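A sketch of one consistent enumeration for Example 1 (the ordering of outcomes is arbitrary):

S = {TTT, TTH, THT, HTT, THH, HTH, HHT, HHH}
X(ζ) = 0, 1, 1, 1, 2, 2, 2, 3, respectively, so the range of X is {0, 1, 2, 3}.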
Properties of the CDF
1. 0 ≤ F(x) ≤ 1
2. lim_{x→+∞} F(x) = 1
3. lim_{x→−∞} F(x) = 0
4. F(x) is a nondecreasing function of x. Thus, if a < b, then F(a) ≤ F(b).
5. F(x) is continuous from the right. That is, for h > 0, F(b) = lim_{h→0} F(b + h) = F(b⁺)
6. P[a < X ≤ b] = F(b) − F(a)
7. P[X = b] = F(b) − F(b⁻)

Example 2: Distribution Functions
The arrival time of Joe's email obeys the exponential probability law with parameter λ:

P[X > x] = 1 for x < 0, and P[X > x] = e^{−λx} for x ≥ 0.

Find the CDF of X for λ = 2 and plot F(x) versus x.
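A sketch of the solution, consistent with the MATLAB code that follows: for x ≥ 0,

F(x) = P[X ≤ x] = 1 − P[X > x] = 1 − e^{−λx}

so with λ = 2, F(x) = 1 − e^{−2x} for x ≥ 0 and F(x) = 0 for x < 0.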

Example 2: Distribution Plot
[Figure: “Exponential Cumulative Distribution Function”; F(x) = 1 − e^{−2x} plotted for x on [0, 2], rising from 0 toward a dotted asymptote at 1]

Example 2: MATLAB Code
function [] = ExponentialCDF();
close all;
%FigureSet(1);
%figure(1);
FigureSet(1,'LTX');
lambda = 2;
x = 0:0.01:2;
y = 1-exp(-lambda*x);
h = plot(x,y,'b',[0 100],[1 1],'k:');
set(h,'LineWidth',1.5);
axis([0 max(x) 0 1.1]);
xlabel('x');
ylabel('F(x)');
title('Exponential Cumulative Distribution Function');
set(gca,'Box','Off');
AxisSet(8);
print -depsc ExponentialCDF;
Empirical Distribution Function
Let X1, X2, ..., Xn be a random sample. The empirical distribution function (EDF) is a function of x which equals the fraction of the Xi that are less than or equal to x, for each x, −∞ < x < ∞.
• The “true” CDF is never known
• All we have is data
• The EDF is a rough estimate of the CDF
• Piecewise-constant function (stairs)
• Assuming the sample consists of distinct values, each step has height 1/n
• Minimum value: 0; maximum value: 1
• Nondecreasing
• Is a random function
(A short sketch of computing the EDF directly follows the plot below.)

Example 3: Empirical Distribution Function Plot
[Figure: “Exponential Empirical Distribution Function N:25”; the staircase EDF S(x) of 25 exponential samples (λ = 2) plotted against the true CDF on [0, 2]]
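The EDF is simple to evaluate directly: at each query point it is just the fraction of samples at or below that point. A minimal sketch (variable names are illustrative; exprnd requires the Statistics Toolbox):

lambda = 2;
R  = exprnd(1/lambda, 25, 1);           % random sample of 25 points
xq = linspace(0, max(R), 200);          % query points
S  = arrayfun(@(x) mean(R <= x), xq);   % fraction of samples <= each x
stairs(xq, S);                          % piecewise-constant EDF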

Example 3: MATLAB Code
function [] = ExponentialEDF();
close all;
FigureSet(1,'LTX');
lambda = 2;
N = 25;
R = exprnd(1/lambda,N,1);
x = 0:0.02:max(R);
F = 1-exp(-lambda*x);
h = cdfplot(R);
hold on;
plot(x,F,'r',[0 100],[1 1],'k:');
hold off;
grid off;
set(h,'LineWidth',1.0);
set(gca,'XLim',[0 max(R)]);
set(gca,'YLim',[0 1.1]);
xlabel('x');
ylabel('S(x)');
title(sprintf('Exponential Empirical Distribution Function N:%d',N));
box off;
AxisSet(8);
print -depsc ExponentialEDF;

Note: MATLAB parameterizes the exponential distribution by its mean, f(x) = (1/λ) e^{−x/λ} u(x), rather than by the rate, f(x) = λ e^{−λx} u(x); hence the call exprnd(1/lambda,N,1). (A quick numerical check appears after the next slide.)

Random Variable Types
Discrete: An RV whose CDF is a right-continuous, staircase (piecewise constant) function of x.
• Takes on values only from a finite or countably infinite set
• Encountered often in applications involving counting
Continuous: An RV whose CDF is continuous everywhere.
• Can be written as an integral of some nonnegative function f(x): F(x) = ∫_{−∞}^{x} f(u) du
• Implies P[X = x] = 0 everywhere
• In words, there is an infinitesimal probability that X will be equal to any specific number x
• Nonetheless, an experiment will cause X to equal some value
Mixed: An RV with a CDF that has jumps on a countable set of points, but also increases continuously over one or more intervals. IOW, everything else.
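A quick numerical check of the exprnd parameterization noted in Example 3 (a sketch; output is approximate):

lambda = 2;
R = exprnd(1/lambda, 1e5, 1);  % exprnd takes the mean 1/lambda, not the rate
mean(R)                        % approximately 0.5 = 1/lambda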
Definition: Probability Density Function (PDF)
The probability density function (PDF) of a continuous RV is defined as the derivative of F(x):

f(x) = dF(x)/dx

Alternatively,

f(x) = lim_{ε→0} [F(x + ε) − F(x − ε)] / (2ε)

• Conceptually, it is more useful than the CDF
• Does not technically exist for discrete or mixed RVs
  – Can finesse with impulse functions
  – δ(x) = du(x)/dx, where u(x) is the unit step function
• The PDF represents the density of probability at the point x

Properties of the PDF
1. f(x) ≥ 0
2. P[a ≤ X ≤ b] = ∫_{a}^{b} f(u) du
3. F(x) = ∫_{−∞}^{x} f(u) du
4. ∫_{−∞}^{+∞} f(u) du = 1
5. A valid PDF can be formed from any nonnegative, piecewise continuous function g(x) that has a finite integral
6. The PDF must be defined for all real values of x
7. If X does not take on some values, this implies f(x) = 0 for those values
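Since the PDF is the derivative of the CDF, a finite-difference check is straightforward; a minimal sketch for the exponential case:

lambda = 2;
x  = 0:0.005:2;
F  = 1 - exp(-lambda*x);                             % exponential CDF
fd = diff(F) ./ diff(x);                             % finite-difference estimate of dF/dx
xc = x(1:end-1) + diff(x)/2;                         % midpoints for the estimate
plot(xc, fd, 'b', x, lambda*exp(-lambda*x), 'r--');  % estimate vs. true PDF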

Example 4: Exponential PDF
[Figure: “Exponential CDF and PDF”; top panel: F(x) for the exponential distribution with λ = 2 on [−0.5, 2]; bottom panel: f(x) on the same interval]

Example 4: MATLAB Code
function [] = ExponentialPDF();
close all;
FigureSet(1,'LTX');
lambda = 2;
x = -0.5:0.005:2;
xl = [min(x) max(x)];
F = zeros(size(x));
id = find(x>=0);
F(id) = 1-exp(-lambda*x(id));
f = zeros(size(x));
f(id) = lambda*exp(-lambda*x(id));
subplot(2,1,1);
h = plot(x,F,'b',xl,[1 1],'k:');
set(h,'LineWidth',1.5);
xlim(xl);
ylim([0 1.1]);
ylabel('F(x)');
title('Exponential CDF and PDF');
box off;
subplot(2,1,2);
h = plot(x,f,'g');
set(h,'LineWidth',1.5);
xlim(xl);
ylim([0 2.1]);
xlabel('x');
ylabel('f(x)');
box off;
AxisSet(8);
print -depsc ExponentialPDF;

Histograms
Let X1, X2, ..., Xn be a random sample. The histogram is a function of x which equals the fraction of the Xi that fall within specified intervals (bins).
• Like the CDF, the “true” PDF is never known
• The histogram is a rough estimate of the PDF
• Usually shown in the form of a bar plot
• Minimum value: 0; maximum value: ∞
• Is a random function
• Perhaps the most common graphical representation of estimated PDFs
(A short sketch of a histogram-based density estimate follows.)
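A minimal sketch of a histogram used as a density estimate (assumes MATLAB R2014b or later for histogram with the 'pdf' normalization):

lambda = 2;
R = exprnd(1/lambda, 1000, 1);
histogram(R, 20, 'Normalization', 'pdf');  % bar areas sum to 1
hold on;
x = 0:0.01:max(R);
plot(x, lambda*exp(-lambda*x), 'r');       % true PDF for comparison
hold off;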

Example 5: Histograms
[Figure: “Exponential Empirical Distribution Function N:100”; three panels on [0, 5]: the true PDF, the estimated PDF (histogram), and the two overlaid]

Histogram Comments
• Histograms can be misleading
• The apparent shape of the histogram is sensitive to
  – The bin locations
  – The bin widths
• It can be shown that the bin width affects the bias and the variance of this estimator of the PDF
Histogram Accuracy

ISE = ∫_{−∞}^{∞} ( f̂(u) − f(u) )² du

Bias[f̂(x)] = (1/2) f′(x) [h − 2(x − b_j)] + O(h²)   for x ∈ (b_j, b_{j+1}]

Var[f̂(x)] = f(x)/(nh) + O(n⁻¹)

MISE = 1/(nh) + h² R(f′)/12 + O(n⁻¹) + O(h³)

where h is the bin width, b_j is the jth bin boundary, and R(φ) = ∫ φ(u)² du
• The bin width controls the bias-variance tradeoff
• More on all of this later

Example 5: Histograms with Different Bin Centers
[Figure: the same three-panel comparison (N = 100) as the previous example, but with shifted bin centers; the estimated shape changes noticeably]

Example 6: Uniform Distribution
Plot the CDF and PDF for a uniform random variable X ∼ U[a, b].
Note: X ∼ U[a, b] denotes that X is drawn from a uniform distribution and has a range of [a, b].
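For reference, a sketch of the standard answer (assuming a < b):

f(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise
F(x) = 0 for x < a;  (x − a)/(b − a) for a ≤ x ≤ b;  1 for x > b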

Gaussian RV's
[Figure: two panels on x ∈ [−5, 5]: the “Gaussian Distribution Function” F(x) and the “Gaussian Density Function” f(x)]

f(x) = (1/(√(2π) σ)) e^{−(x−m)²/(2σ²)}

F(x) = P[X ≤ x] = (1/(√(2π) σ)) ∫_{−∞}^{x} e^{−(u−m)²/(2σ²)} du

• Denoted as X ∼ N(μ_X, σ_X²)
• Also called the normal distribution
• Arises naturally in many applications
• Central limit theorem (more later)

Functions of RV's
• We will work with functions of RVs: Y = g(X)
• Y is also an RV
• Example: Y = aX + b

F_Y(y) = P[Y ≤ y]
       = P[aX + b ≤ y]
       = P[X ≤ (y − b)/a]
       = F_X((y − b)/a)

for a > 0.
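A simulation sketch of this result for a standard Gaussian X (normcdf requires the Statistics Toolbox; values below are illustrative):

a = 2; b = 1; y = 0.5;
X = randn(1e5, 1);            % X ~ N(0,1)
Y = a*X + b;
mean(Y <= y)                  % empirical F_Y(y)
normcdf((y - b)/a)            % F_X((y-b)/a), approximately equal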

Expected Values Overview
• To completely describe all of the known information about an RV, we must specify the CDF or PDF
• Given a data set, estimating the CDF/PDF is one of the most difficult problems we will discuss (density estimation)
• Often, much less information about the distribution of X is sufficient
  – Mean
  – Median
  – Standard deviation
  – Range
• These scalar descriptive statistics are called point estimates

Expected Values Defined

E[X] = ∫_{−∞}^{+∞} x f(x) dx

• The expected value of a random variable X is denoted E[X]
• This is called the mean of X
• The expected value of X is only defined if the integral converges absolutely:

∫_{−∞}^{∞} |x| f(x) dx < ∞

• The “best” estimate of the mean of X given a data set is the sample average,

X̄ = (1/N) Σ_{i=1}^{N} x_i ≈ E[X]
Average versus Mean

E[X] = μ_X = ∫_{−∞}^{+∞} x f(x) dx        x̄ = μ̂_X = (1/N) Σ_{i=1}^{N} x_i

Note the distinction between the average and the mean.
• The average is
  – an estimate of the mean
  – calculated from a data set
  – a random variable
• The mean is
  – calculated from a PDF
  – not a random variable
  – a property of the PDF

Expected Values of Functions
We can also calculate the expected values of functions of random variables. Let Y = g(X). Then,

E[Y] = ∫_{−∞}^{∞} g(x) f(x) dx

Example: Let g(X) = I(X), where I(X) is the indicator function of the event {X in C} and C is some interval of the real line:

g(X) = 0 if X is not in C;  g(X) = 1 if X is in C

Then

E[Y] = ∫ g(x) f(x) dx = ∫_C f(x) dx = P[X ∈ C]

Thus, the expected value of the indicator of an event is equal to the probability of the event.
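A sketch verifying this numerically for an exponential RV with λ = 2 and C = (1, ∞):

lambda = 2;
X = exprnd(1/lambda, 1e5, 1);
mean(X > 1)      % sample average of the indicator, approx P[X > 1]
exp(-lambda)     % true P[X > 1] = e^{-2}, about 0.135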

Expected Value Properties
1. E[c] = c, where c is a constant
2. E[cX] = c E[X]
3. E[Σ_{k=1}^{N} g_k(X)] = Σ_{k=1}^{N} E[g_k(X)]
4. Proof left as a homework assignment

Variance
The variance of a random variable is defined as follows:

σ_X² ≡ E[(X − μ_X)²]

The nth moment of an RV is defined as

E[Xⁿ] ≡ ∫_{−∞}^{∞} xⁿ f(x) dx

• Variance is a measure of how wide a distribution is
• A measure of dispersion
• There are others as well
• The standard deviation is defined as σ ≡ √(σ²)
• σ_X² = E[X²] − E[X]²
• Both are properties of the CDF and are not RVs
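The identity σ_X² = E[X²] − E[X]² follows by expanding the square:

σ_X² = E[(X − μ_X)²] = E[X² − 2μ_X X + μ_X²] = E[X²] − 2μ_X E[X] + μ_X² = E[X²] − E[X]²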
Markov Inequality
The mean and variance of an RV X give us sufficient information to establish bounds on certain probabilities. Suppose that X is a nonnegative random variable.
Markov inequality:

P[X ≥ a] ≤ E[X]/a

Proof:

E[X] = ∫_{0}^{a} x f(x) dx + ∫_{a}^{∞} x f(x) dx
     ≥ ∫_{a}^{∞} x f(x) dx
     ≥ ∫_{a}^{∞} a f(x) dx
     = a P[X ≥ a]

Example 7: Markov Inequality
The mean height of children in a kindergarten class is 3.5 feet. Find the bound on the probability that a kid in the class is taller than 9 feet.
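A sketch of the solution: height is nonnegative, so the Markov inequality applies directly:

P[X ≥ 9] ≤ E[X]/9 = 3.5/9 ≈ 0.39

The true probability is essentially zero; the bound is valid but very loose.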

Chebyshev Inequality

P[|X − μ| ≥ a] ≤ σ²/a²

Proof: Let D² = (X − μ)², where μ = E[X]. Then apply the Markov inequality:

P[|X − μ| ≥ a] = P[D² ≥ a²] ≤ E[(X − μ)²]/a² = σ²/a²

• These bounds are very loose (see the sketch below)
• Note: if σ² = 0, the Chebyshev inequality implies P[X = μ] = 1

Multiple Random Variables
A vector random variable is a function that assigns a vector of real numbers to each outcome ζ in S, the sample space of the random experiment.
• Example: randomly select a student
• X ≡ [H(ζ), W(ζ), A(ζ)]
• Where
  – H(ζ) = height of student ζ
  – W(ζ) = weight of student ζ
  – A(ζ) = age of student ζ
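A simulation sketch of how loose the Chebyshev bound can be, for X ~ N(0, 1) and a = 2:

a = 2;
X = randn(1e5, 1);      % mu = 0, sigma = 1
mean(abs(X) >= a)       % empirical probability, approx 0.046
1/a^2                   % Chebyshev bound sigma^2/a^2 = 0.25, much larger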
Jointly Continuous Random Variables
Random variables X and Y are jointly continuous if the probabilities of events involving (X, Y) can be expressed as an integral of a PDF. In other words, there is a joint probability density function defined on the real plane such that for any event A,

P[X, Y in A] = ∫∫_A f_{X,Y}(u, v) du dv

Properties
• ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} f_{X,Y}(u, v) du dv = 1
• f_{X,Y}(x, y) = ∂²F_{X,Y}(x, y) / (∂x ∂y)

Example 8: Jointly Continuous RV
[Figure: contour plot of a bivariate “Gaussian Density Function f(x,y)” over roughly x, y ∈ [−0.5, 1.5], with density values from about 0.05 to 0.25]

Example 8: MATLAB Code
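The original listing is not legible in this copy. A minimal sketch that produces a comparable surface plot of a bivariate Gaussian density; the means, standard deviations, and grid below are assumptions, not the original values:

mx = 0.5; my = 0.5;          % assumed means
sx = 0.4; sy = 0.4;          % assumed standard deviations (independent X, Y)
[x, y] = meshgrid(-0.5:0.02:1.5, -0.5:0.02:1.5);
f = 1/(2*pi*sx*sy) * exp(-0.5*((x - mx).^2/sx^2 + (y - my).^2/sy^2));
surf(x, y, f);               % 3D surface of f(x,y)
shading interp;
xlabel('x'); ylabel('y'); zlabel('f(x,y)');
title('Gaussian Density Function f(x,y)');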

Example 8: Jointly Continuous RV Continued
[Figure: surface plot of the bivariate “Gaussian Density Function f(x,y)” over roughly x ∈ [−0.5, 1.5] and y ∈ [−1, 2], with a peak height near 0.25]
Joint Cumulative Distribution Function (CDF)
[Figure: three sketches of the (X, Y) plane showing the regions whose probabilities give F_{X,Y}(x, y), F_X(x), and F_Y(y)]
There is also a joint CDF:

F_{X,Y}(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f_{X,Y}(u, v) dv du
             = P[X ≤ x & Y ≤ y]
             = P[X ≤ x, Y ≤ y]

Marginal PDF's
The marginal PDFs f_X(x) and f_Y(y) are given as follows:

f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy
f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx

• f_X(x) is the same as the PDF of X, as if Y had not been considered
• The marginal PDFs can be obtained from the joint PDF
• The joint PDF cannot be obtained from the marginal PDFs

Independence
Two random variables X and Y are independent if and only if their joint PDF is equal to the product of the marginal PDFs:

f_{X,Y}(x, y) = f_X(x) f_Y(y)

Equivalently, they are independent if and only if their joint CDF is equal to the product of the marginal CDFs:

F_{X,Y}(x, y) = F_X(x) F_Y(y)

• If X and Y are independent, the random variables W = g(X) and Z = h(Y) are also independent

Conditional CDF's & Bayes' Theorem
The conditional CDF of Y given X = x:

F_Y(y|x) = lim_{h→0} F_Y(y | x < X ≤ x + h) = ( ∫_{−∞}^{y} f_{X,Y}(x, y′) dy′ ) / f_X(x)

Proof omitted.
The conditional PDF of Y given X = x:

f_Y(y|x) = (d/dy) F_Y(y|x) = f_{X,Y}(x, y) / f_X(x)

• This can be viewed as a form of Bayes' theorem
• Gives the a posteriori probability that Y is close to y given that X is close to x
Conditional CDF's & Independence
If X and Y are independent,

f_{X,Y}(x, y) = f_X(x) f_Y(y)

and the conditional PDF of Y given X = x is

f_Y(y|x) = f_{X,Y}(x, y)/f_X(x) = f_X(x) f_Y(y)/f_X(x) = f_Y(y)

Conditional Expectation
The conditional expectation of Y given X = x is defined by

E_Y[Y|X = x] = ∫_{−∞}^{∞} y f_Y(y|x) dy

• E_Y[Y|X = x] can be viewed as a function of x: g(x) = E_Y[Y|x]
• g(X) = E_Y[Y|X] is a random variable
• It can be shown that E_Y[Y] = E_X[g(X)] = E_X[E_Y[Y|X]]
• More generally, E_Y[h(Y)] = E_X[E_Y[h(Y)|X]], where

E_X[E_Y[h(Y)|X]] = ∫_{−∞}^{∞} ( ∫_{−∞}^{∞} h(y) f_Y(y|x) dy ) f_X(x) dx
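A simulation sketch of the identity E_Y[Y] = E_X[E_Y[Y|X]], using an assumed model X ~ U(0, 1) with Y|X ~ N(X, 1):

N = 1e5;
X = rand(N, 1);           % X ~ U(0,1)
Y = X + randn(N, 1);      % E[Y|X = x] = x
mean(Y)                   % approx 0.5
mean(X)                   % approx 0.5 = E[E[Y|X]]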

Correlation and Covariance
The jkth moment of X and Y is defined as

E[X^j Y^k] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x^j y^k f_{X,Y}(x, y) dx dy

• The correlation of X and Y is defined as E[XY]
• If E[XY] = 0, we say X and Y are orthogonal
• The covariance of X and Y is defined as

σ²_{X,Y} = E[(X − μ_X)(Y − μ_Y)]

• If X and Y are independent, σ²_{X,Y} = 0 (see homework)

Correlation Coefficient
The correlation coefficient of X and Y is defined as

ρ_{X,Y} = σ²_{X,Y} / (σ_X σ_Y)

• −1 ≤ ρ_{X,Y} ≤ 1
• Extreme values of ρ_{X,Y} indicate a linear relationship between X and Y: Y = aX + b
• ρ_{X,Y} = 1 implies a > 0; ρ_{X,Y} = −1 implies a < 0
• X and Y are said to be uncorrelated if ρ_{X,Y} = 0
• If X and Y are independent, ρ_{X,Y} = 0
• If ρ_{X,Y} = 0, X and Y may not be independent
• Uncorrelated variables are not necessarily independent (see the sketch below)
• However, if X and Y are Gaussian random variables, then ρ_{X,Y} = 0 implies X and Y are independent
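A sketch of the classic counterexample: X ~ N(0, 1) and Y = X² are uncorrelated but clearly dependent (corr requires the Statistics Toolbox):

X = randn(1e5, 1);
Y = X.^2;          % completely determined by X, hence dependent
corr(X, Y)         % approx 0, since E[XY] = E[X^3] = 0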
Mean Square Estimation
[Diagram: observed variables x1, ..., xn enter a model whose output is the estimate y]
• Often we will want to estimate the value of one RV Y from one or more other RVs X: Ŷ = g(X)
• Encountered often in nonlinear modeling and classification
• It may be that Y = g(X)
• The estimation error is defined as Y − g(X)
• We will assign a cost to each error, c(Y − g(X))
• Goal: find the g(X) that minimizes E[c(Y − g(X))]
• The most common cost function is the mean squared error (MSE):

MSE = E[(Y − g(X))²]

Example 9: Minimum MSE Estimation
Suppose we wish to estimate a random variable Y with a constant a. What is the best value of a that minimizes the MSE?
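A sketch of the standard derivation:

E[(Y − a)²] = E[Y²] − 2a E[Y] + a²
(d/da) E[(Y − a)²] = −2 E[Y] + 2a = 0   ⟹   a* = E[Y]

So the best constant estimate is the mean, and the resulting MMSE is E[(Y − E[Y])²] = σ_Y².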

Example 10: Minimum Linear MSE Estimation
Suppose we wish to estimate a random variable Y with a linear function of X, Ŷ = aX + b. What values of a and b minimize the MSE?
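A sketch of the derivation; the result matches the discussion slide below. Setting ∂/∂b E[(Y − aX − b)²] = 0 gives b* = E[Y] − a E[X]. Substituting b* and writing X̃ = X − E[X], Ỹ = Y − E[Y], setting ∂/∂a E[(Ỹ − aX̃)²] = 0 gives

a* = E[X̃Ỹ]/E[X̃²] = σ²_{X,Y}/σ_X² = ρ_{X,Y} σ_Y/σ_X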

Example 10: Workspace (1)

Example 10: Workspace (2)

MMSE Linear Estimation Discussion

Ŷ = a*X + b* = ρ_{X,Y} σ_Y (X − E[X])/σ_X + E[Y]

• Note that (X − E[X])/σ_X is just a scaled version of X
  – Zero mean
  – Unit variance
  – Sometimes called a z-score
• X_s = σ_Y (X − E[X])/σ_X has the variance of Y
• The term E[Y] ensures that E[Ŷ] = E[Y]
• ρ_{X,Y} specifies the sign and extent of Y relative to X_s
• If uncorrelated, Ŷ = E[Y]
• If perfectly correlated, Ŷ = ±σ_Y (X − E[X])/σ_X + E[Y] = Y

Orthogonality Condition
The orthogonality condition states that the error of the best linear estimator is orthogonal to the observation X − E[X].
• Fundamental result in mean square estimation
• Central to the area of linear estimation
• Enables us to more easily find the minimum MSE of the best linear estimator
• The notation is simplified by defining

X̃ = X − E[X]        Ỹ = Y − E[Y]

• These are called centered random variables

Best Linear Estimator MMSE

MMSE = E[{(Y − E[Y]) − a*(X − E[X])}²]
     = E[(Ỹ − a*X̃)²]
     = E[(Ỹ − a*X̃)Ỹ] − a* E[(Ỹ − a*X̃)X̃]
     = E[(Ỹ − a*X̃)Ỹ]          (the second term is zero by orthogonality)
     = σ_Y² − a* σ²_{X,Y}
     = σ_Y² − (σ²_{X,Y}/σ_X²) σ²_{X,Y}
     = σ_Y² (1 − ρ²_{X,Y})
Best Linear Estimator MMSE

MMSE = E[{(Y − E[Y]) − a*(X − E[X])}²] = σ_Y² (1 − ρ²_{X,Y})

• When ρ_{X,Y} = ±1, MMSE = 0
• Perfect correlation implies perfect prediction
• No correlation (ρ_{X,Y} = 0) implies MMSE = σ_Y²

Nonlinear Estimation
• In general, the best estimator of Y given X will be nonlinear
• Suppose we wish to find the g(X) that best approximates Y in the MMSE sense:

min_{g(·)} E_{X,Y}[(Y − g(X))²]

Using conditional expectation,

E_{X,Y}[(Y − g(X))²] = E_X[ E_Y[(Y − g(x))² | X = x] ]
  = ∫_{−∞}^{∞} E_Y[(Y − g(x))² | X = x] f_X(x) dx
  = ∫_{−∞}^{∞} ( ∫_{−∞}^{∞} (y − g(x))² f_Y(y|x) dy ) f_X(x) dx

Nonlinear Estimation Continued

E_{X,Y}[(Y − g(X))²] = ∫_{−∞}^{∞} ( ∫_{−∞}^{∞} (y − g(x))² f_Y(y|x) dy ) f_X(x) dx

• The integrand is positive for all x
• Minimized by minimizing E_Y[(Y − g(x))² | X = x] for each x
• g(x) is a constant relative to E_Y[·]
• Reduces to the equivalent example earlier: estimate Y with a constant g(x)
• Therefore, the g(x) that minimizes the MSE is

Ŷ = g*(x) = E_Y[Y | X = x]

• The function g*(x) is called the regression curve
• Has the smallest possible MSE
• Linear estimators are generally worse (larger MSE)

Random Vectors
Let X be a random vector,

X = [X1, X2, ..., XL]ᵀ

Then the expected value of X is defined as

E[X] = [E[X1], E[X2], ..., E[XL]]ᵀ
Linear Estimation with Vectors
Suppose that we wish to estimate Y with a linear sum of random variables X1, X2, ..., XL:

Ŷ = Xᵀw = Σ_{i=1}^{L} X_i w_i

Then the error Y − Ŷ can be written as

ε = Y − Xᵀw

and the squared error can be written as

ε² = (Y − Xᵀw)² = Y² + wᵀX Xᵀw − 2Y Xᵀw

Linear Estimation Error

ε² = (Y − Xᵀw)² = Y² + wᵀX Xᵀw − 2Y Xᵀw

The expected value of the squared error is

MSE ≡ E[ε²] = E[(Y − Xᵀw)²] = E[Y²] + wᵀE[X Xᵀ]w − 2 E[Y Xᵀ]w

Correlation Matrix
Let X be a zero-mean random vector. The variance-covariance matrix of the vector X, also called the correlation matrix, can be written as

R = σ²{X} = E[X Xᵀ] =
⎡ σ²_{X1}      σ²_{X1,X2}   ...  σ²_{X1,XL} ⎤
⎢ σ²_{X2,X1}   σ²_{X2}      ...  σ²_{X2,XL} ⎥
⎢ ...          ...          ...  ...        ⎥
⎣ σ²_{XL,X1}   σ²_{XL,X2}   ...  σ²_{XL}    ⎦

R is a symmetric matrix: Rᵀ = R.

Cross-Correlation Matrix
Let Y be a zero-mean scalar random variable. Define P, the cross-correlation matrix, as

P = E[Y X] = [E[Y X1], E[Y X2], ..., E[Y XL]]ᵀ
J. McNames Portland State University ECE 4/557 Random Variables Ver. 1.06 63 J. McNames Portland State University ECE 4/557 Random Variables Ver. 1.06 64
Minimum Mean Squared Error
Using the matrices R and P, the MSE can be rewritten as

E[ε²] = E[(Y − Xᵀw)²] = E[Y²] + wᵀE[X Xᵀ]w − 2 E[Y Xᵀ]w = σ_Y² + wᵀRw − 2wᵀP

Take the gradient of the MSE above with respect to w and set the resulting expression equal to zero:

∇_w E[ε²] = ∇_w (σ_Y² + wᵀRw − 2wᵀP) = Rᵀw + Rw − 2P = 2Rw − 2P = 0

Solving for w*, we obtain

w* = R⁻¹P

Minimum Mean Squared Error

w* = R⁻¹P

Find the minimum MSE by substitution into the equation for the MSE:

min E[ε²] = σ_Y² + w*ᵀRw* − 2Pᵀw*
          = σ_Y² + (R⁻¹P)ᵀR(R⁻¹P) − 2Pᵀ(R⁻¹P)
          = σ_Y² + PᵀR⁻¹P − 2PᵀR⁻¹P
          = σ_Y² − PᵀR⁻¹P
          = σ_Y² − Pᵀw*
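A simulation sketch of the vector solution w* = R⁻¹P, using an assumed linear model Y = 2X₁ − X₂ + noise with zero-mean variables:

N = 1e4;
X = randn(N, 2);                         % two zero-mean regressors
Y = 2*X(:,1) - X(:,2) + 0.1*randn(N,1);
R = (X'*X)/N;                            % sample estimate of E[X X^T]
P = (X'*Y)/N;                            % sample estimate of E[Y X]
w = R\P                                  % w* = R^{-1} P, approx [2; -1]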

Closing Comments
• In general, we cannot calculate anything discussed so far
• Everything discussed requires that the true PDF (or CDF) be known
• In practice, we'll have data, not PDFs
• This represents a best-case scenario
• How closely can we approximate the true point estimate given only data?
• Will compare our estimators on cases where the true PDF is known
