Random Variables

• Definitions
• Cumulative distribution function
• Probability density function

[Diagram: a random variable X maps each outcome ζ in the sample space S to a point X(ζ) on the real line.]
Properties of the CDF

1. 0 ≤ F(x) ≤ 1
2. lim_{x→+∞} F(x) = 1
3. lim_{x→−∞} F(x) = 0
4. F(x) is a nondecreasing function of x. Thus, if a < b, then F(a) ≤ F(b).
5. F(x) is continuous from the right. That is, for h > 0, F(b) = lim_{h→0} F(b + h) = F(b⁺)
6. P[a < X ≤ b] = F(b) − F(a)
7. P[X = b] = F(b) − F(b⁻)

Example 2: Distribution Functions

The arrival time X of Joe's email obeys the exponential probability law with parameter λ:

P[X > x] = e^{−λx},  x ≥ 0.

Find the CDF of X for λ = 2 and plot F(x) versus x.
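Working the example through (consistent with the plot code that follows): for x ≥ 0, F(x) = P[X ≤ x] = 1 − P[X > x] = 1 − e^{−λx}, and F(x) = 0 for x < 0. With λ = 2, F(x) = 1 − e^{−2x}.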
% Example 2: plot the exponential CDF for lambda = 2.
% FigureSet and AxisSet are course helper functions (figure sizing and font
% size); comment them out if they are not available.
FigureSet(1,'LTX');
lambda = 2;
x = 0:0.01:2;
y = 1-exp(-lambda*x);          % F(x) = 1 - exp(-lambda*x) for x >= 0
plot(x,y);
xlabel('x');
ylabel('F(x)');
title('Exponential Cumulative Distribution Function');
set(gca,'Box','Off');
AxisSet(8);
print -depsc ExponentialCDF;

[Figure: the exponential CDF F(x) rising from 0 toward 1 over 0 ≤ x ≤ 2.]
Empirical Distribution Function

Let X1, X2, ..., Xn be a random sample. The empirical distribution function (EDF) is a function of x that equals the fraction of the Xi that are less than or equal to x, for each x, −∞ < x < ∞.

• Piecewise-constant function (stairs)
• Assuming the sample consists of distinct values, each step has height 1/n
• Minimum value: 0, maximum value: 1
• Nondecreasing
• Is a random function

Example 3: Empirical Distribution Function Plot

[Figure: empirical distribution function S(x) of an exponential sample (n = 25) plotted versus x, 0 ≤ x ≤ 2.]
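A minimal MATLAB sketch (not the original course script; the sample, rate λ = 2, and n = 25 are assumed for illustration) that builds and plots an empirical distribution function with stairs():

% Empirical distribution function of an exponential sample (assumed example).
lambda = 2;                          % assumed rate parameter
n      = 25;                         % assumed sample size
X      = -log(rand(n,1))/lambda;     % exponential samples via the inverse CDF
xs     = sort(X);                    % order statistics
S      = (1:n)'/n;                   % EDF jumps to k/n at the kth order statistic
stairs([0; xs], [0; S]);             % piecewise-constant (stairs) plot
hold on;
xg = 0:0.01:max(xs);
plot(xg, 1-exp(-lambda*xg), 'r--');  % true CDF for comparison
hold off;
xlabel('x'); ylabel('S(x)');
title('Empirical Distribution Function (n = 25)');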
Definition: Probability Density Function (PDF)

The probability density function (PDF) of a continuous RV is defined as the derivative of F(x):

f(x) = dF(x)/dx

Alternatively,

f(x) = lim_{ε→0} [F(x + ε) − F(x − ε)] / (2ε)

• Conceptually, it is more useful than the CDF
• Does not technically exist for discrete or mixed RV's
  – Can finesse with impulse functions
  – δ(x) = du(x)/dx, where u(x) is the unit step function
• The PDF represents the density of probability at the point x

Properties of the PDF

1. f(x) ≥ 0
2. P[a ≤ X ≤ b] = ∫_a^b f(u) du
3. F(x) = ∫_{−∞}^{x} f(u) du
4. ∫_{−∞}^{+∞} f(u) du = 1
5. A valid PDF can be formed from any nonnegative, piecewise continuous function g(x) that has a finite integral
6. The PDF must be defined for all real values of x
7. If X does not take on some values, this implies f(x) = 0 for those values
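A minimal MATLAB sketch (parameters assumed, not from the slides) that checks numerically that the exponential PDF is the derivative of its CDF:

% Finite-difference check of f(x) = dF(x)/dx for the exponential distribution.
lambda = 2;
x  = 0.01:0.01:2;
F  = 1 - exp(-lambda*x);                 % CDF
fd = diff(F)./diff(x);                   % finite-difference approximation of dF/dx
f  = lambda*exp(-lambda*x);              % true PDF
plot(x(1:end-1), fd, 'b.', x, f, 'r-');
xlabel('x'); ylabel('f(x)');
legend('diff(F)/diff(x)', 'true PDF');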
lambda = 2;
x = -0.5:0.005:2;
xl = [min(x) max(x)];
F = zeros(size(x));
id = find(x>=0);
F(id) = 1-exp(-lambda*x(id));
f = zeros(size(x));
f(id) = lambda*exp(-lambda*x(id));

subplot(2,1,1);
h = plot(x,F,'b',xl,[1 1],'k:');
set(h,'LineWidth',1.5);
xlim(xl);
ylim([0 1.1]);
ylabel('F(x)');
title('Exponential CDF and PDF');
box off;

subplot(2,1,2);
h = plot(x,f,'g');
set(h,'LineWidth',1.5);
xlim(xl);
ylim([0 2.1]);
xlabel('x');
ylabel('f(x)');
box off;
AxisSet(8);
print -depsc ExponentialPDF;

[Figure: the exponential CDF F(x) (top) and PDF f(x) (bottom) for λ = 2, plotted over −0.5 ≤ x ≤ 2.]

Histograms

Let X1, X2, ..., Xn be a random sample. The histogram is a function of x that counts the fraction of the Xi falling in each of a set of contiguous bins; scaled by the bin width, it serves as an estimate fˆ(x) of the PDF.
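A minimal MATLAB sketch (sample, bin width, and range assumed for illustration; not the original course script) of a density-scaled histogram compared with the true exponential PDF:

% Density-scaled histogram of an exponential sample versus the true PDF.
lambda = 2; n = 100;                 % assumed rate and sample size
X  = -log(rand(n,1))/lambda;         % exponential samples
h  = 0.25;                           % bin width
b  = 0:h:5;                          % bin edges (samples beyond 5 are ignored)
c  = histc(X, b);                    % counts per bin
fhat = c(1:end-1).'/(n*h);           % scale so the histogram integrates to ~1
bar(b(1:end-1)+h/2, fhat, 1);        % bars centered on the bins
hold on;
xg = 0:0.01:5;
plot(xg, lambda*exp(-lambda*xg), 'r', 'LineWidth', 1.5);  % true PDF
hold off;
xlabel('x'); ylabel('f(x)');
legend('Estimated', 'True');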
[Figure: histogram estimates ("Estimated") of the exponential PDF compared with the true density ("True"), plotted over 0 ≤ x ≤ 5.]
Histogram Accuracy

ISE = ∫_{−∞}^{∞} ( fˆ(u) − f(u) )² du

BIAS[ fˆ(x) ] = (1/2) f′(x) [ h − 2(x − bj) ] + O(h²)   for x ∈ (bj, bj+1]

Var[ fˆ(x) ] = f(x)/(nh) + O(n⁻¹)

MISE = 1/(nh) + h² R(f′)/12 + O(n⁻¹) + O(h³)

where h is the bin width, bj is the jth bin boundary, and

R(φ) = ∫ φ(u)² du

• The bin width controls the bias-variance tradeoff
• More on all of this later

Example 5: Histograms with Different Bin Centers

[Figure: three histogram estimates of the exponential PDF (n = 100), built with the same bin width but different bin centers, each compared with the true density over 0 ≤ x ≤ 5.]
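As a quick illustration of the tradeoff (a standard consequence of the MISE expression above, not stated on the slide), ignore the higher-order terms and set the derivative with respect to h to zero:

d/dh [ 1/(nh) + h² R(f′)/12 ] = −1/(nh²) + h R(f′)/6 = 0   ⟹   h* = ( 6 / (n R(f′)) )^{1/3}

so the MISE-optimal bin width shrinks like n^{−1/3} and the resulting MISE is of order n^{−2/3}.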
Gaussian (Normal) Random Variable

[Figure: the Gaussian CDF F(x) (left) and PDF f(x) (right) plotted for −5 ≤ x ≤ 5.]

f(x) = (1 / (√(2π) σ)) e^{−(x−m)² / (2σ²)}

F(x) = P[X ≤ x] = (1 / (√(2π) σ)) ∫_{−∞}^{x} e^{−(u−m)² / (2σ²)} du

• Denoted as X ∼ N(μX, σX²)
• Also called the normal distribution
• Arises naturally in many applications
• Central limit theorem (more later)
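A minimal MATLAB sketch (mean and standard deviation assumed) that plots the Gaussian CDF and PDF using only erf(), so no toolbox functions are needed:

% Gaussian CDF and PDF for assumed m = 0, sigma = 1.
m = 0; sigma = 1;
x = -5:0.01:5;
f = exp(-(x-m).^2/(2*sigma^2)) / (sqrt(2*pi)*sigma);   % PDF
F = 0.5*(1 + erf((x-m)/(sigma*sqrt(2))));              % CDF
subplot(2,1,1); plot(x, F); ylabel('F(x)'); title('Gaussian CDF and PDF'); box off;
subplot(2,1,2); plot(x, f); xlabel('x'); ylabel('f(x)'); box off;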
Functions of RV's

• We will work with functions of RV's: Y = g(X)
• Y is also an RV
• Example: Y = aX + b

FY(y) = P[Y ≤ y]
      = P[aX + b ≤ y]
      = P[X ≤ (y − b)/a]
      = FX((y − b)/a)

for a > 0.
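A minimal MATLAB sketch (a, b, and a standard Gaussian X are assumed for illustration) that checks FY(y) = FX((y − b)/a) by comparing an empirical CDF of Y = aX + b with FX evaluated at (y − b)/a:

% Check F_Y(y) = F_X((y-b)/a) for Y = aX + b with a > 0 and X ~ N(0,1).
a = 2; b = 1; n = 1e5;
X = randn(n,1);
Y = a*X + b;
y = -5:0.5:7;
Femp = mean(bsxfun(@le, Y, y), 1);               % empirical P[Y <= y]
Fthy = 0.5*(1 + erf(((y - b)/a)/sqrt(2)));       % F_X((y-b)/a) for X ~ N(0,1)
plot(y, Femp, 'bo', y, Fthy, 'r-');
xlabel('y'); ylabel('F_Y(y)');
legend('Empirical', 'F_X((y-b)/a)', 'Location', 'SouthEast');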
Average versus Mean

E[X] = μX = ∫_{−∞}^{+∞} x f(x) dx   (the mean)

x̄ = μ̂X = (1/N) Σ_{i=1}^{N} xi   (the average)

Note the distinction between the average and the mean.

• The average is
  – an estimate of the mean
  – calculated from a data set
  – a random variable

Expected Values of Functions

We can also calculate the expected values of functions of random variables. Let Y = g(X). Then

E[Y] = ∫_{−∞}^{∞} g(x) f(x) dx

Example: Let g(X) = I(X), where I(X) is the indicator function of the event {X in C} and C is some interval of the real line: g(X) = 1 if X is in C and 0 if X is not in C.
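Carrying the example through (the standard result): E[g(X)] = ∫_{−∞}^{∞} I(x) f(x) dx = ∫_C f(x) dx = P[X in C], so the expected value of an indicator function is the probability of the corresponding event.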
2. E[cX] = c E[X]
3. E[ Σ_{k=1}^{N} gk(X) ] = Σ_{k=1}^{N} E[gk(X)]
4. Proof left as a homework assignment

The variance of an RV is defined as

σX² ≡ E[(X − μX)²]

The nth moment of an RV is defined as

E[Xⁿ] ≡ ∫_{−∞}^{∞} xⁿ f(x) dx
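A useful identity connecting the two definitions above (standard, though not shown on the slide): expanding the square, σX² = E[(X − μX)²] = E[X²] − 2μX E[X] + μX² = E[X²] − μX², so the variance is the second moment minus the squared mean.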
Markov Inequality

The mean and variance of an RV X give us sufficient information to establish bounds on certain probabilities. Suppose that X is a nonnegative random variable.

Markov inequality:

P[X ≥ a] ≤ E[X]/a

Proof:

E[X] = ∫_0^a x f(x) dx + ∫_a^∞ x f(x) dx
     ≥ ∫_a^∞ x f(x) dx
     ≥ ∫_a^∞ a f(x) dx
     = a P[X ≥ a]

Example 7: Markov Inequality

The mean height of children in a kindergarten class is 3.5 feet. Find the bound on the probability that a kid in the class is taller than 9 feet.
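A worked solution (a standard application of the inequality): height is nonnegative, so with E[X] = 3.5 and a = 9, P[X ≥ 9] ≤ 3.5/9 ≈ 0.39. The bound is valid but very loose.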
Jointly Continuous Random Variables

Random variables X and Y are jointly continuous if the probabilities of events involving (X, Y) can be expressed as an integral of a PDF. In other words, there is a joint probability density function defined on the real plane such that for any event A,

P[(X, Y) in A] = ∫∫_A fX,Y(u, v) du dv

Properties

• ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} fX,Y(u, v) du dv = 1
• fX,Y(x, y) = ∂²FX,Y(x, y) / (∂x ∂y)

Example 8: Jointly Continuous RV

[Figure: contour plot of a Gaussian joint density f(x, y) over roughly −0.5 ≤ x, y ≤ 1.5.]
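A minimal MATLAB sketch (means and standard deviations assumed) that evaluates an independent bivariate Gaussian density on a grid, draws its contours, and checks numerically that it integrates to approximately one:

% Bivariate Gaussian density on a grid; assumed parameters.
mx = 0.5; my = 0.5; sx = 0.3; sy = 0.3;    % assumed means and standard deviations
[x, y] = meshgrid(-1:0.01:2, -1:0.01:2);
f = (1/(2*pi*sx*sy)) * exp(-0.5*(((x-mx)/sx).^2 + ((y-my)/sy).^2));
contour(x, y, f);                          % contours of f(x,y)
xlabel('x'); ylabel('y'); title('Gaussian Density Function f(x,y)');
total = sum(f(:)) * 0.01 * 0.01;           % Riemann-sum approximation of the integral
fprintf('Numerical integral of f(x,y): %.4f\n', total);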
[Figure: surface plot of F(x, y) over the (x, y) plane.]
Joint Cumulative Distribution Function (CDF)

[Diagram: FX,Y(x, y) is the probability of the region {X ≤ x, Y ≤ y} in the (X, Y) plane.]

Marginal PDF's

[Diagrams: the regions defining the marginals FX(x) and FY(y).]
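For reference, the standard relationships these diagrams illustrate: FX,Y(x, y) = P[X ≤ x, Y ≤ y], the marginal CDFs are FX(x) = FX,Y(x, ∞) and FY(y) = FX,Y(∞, y), and the marginal PDFs are obtained by integrating out the other variable: fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy and fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx.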
Conditional CDF's & Independence

If X and Y are independent,

fX,Y(x, y) = fX(x) fY(y)

and the conditional PDF of Y given X = x is

fY(y|x) = fX,Y(x, y) / fX(x) = fX(x) fY(y) / fX(x) = fY(y)

Conditional Expectation

The conditional expectation of Y given X = x is defined by

EY[Y|X = x] = ∫_{−∞}^{∞} y fY(y|x) dy

• EY[Y|X = x] can be viewed as a function of x: g(x) = EY[Y|x]
• g(X) = EY[Y|X] is a random variable
• It can be shown that EY[Y] = EX[g(X)] = EX[EY[Y|X]]
• More generally, EY[h(Y)] = EX[EY[h(Y)|X]], where

  EX[EY[h(Y)|X]] = ∫_{−∞}^{∞} ( ∫_{−∞}^{∞} h(y) fY(y|x) dy ) fX(x) dx
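A minimal MATLAB sketch (the model X ~ Uniform(0,1), Y = X + N with standard Gaussian N, is assumed for illustration) that checks EY[Y] = EX[EY[Y|X]] by simulation:

% Tower property check: for this model E[Y|X] = X, so E[Y] = E[X] = 0.5.
n = 1e6;
X = rand(n,1);
Y = X + randn(n,1);
g = X;                                   % g(X) = E[Y|X] for this model
fprintf('mean(Y)    = %.4f\n', mean(Y));
fprintf('mean(g(X)) = %.4f\n', mean(g));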
Mean Square Estimation

[Diagram: the observed variables x1, ..., xn are the input to a model that produces the output y.]

Example 9: Minimum MSE Estimation

Suppose we wish to estimate a random variable Y with a constant a. What is the best value of a that minimizes the MSE?
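One standard way to work the example: the MSE is E[(Y − a)²] = E[Y²] − 2a E[Y] + a². Setting the derivative with respect to a to zero gives −2 E[Y] + 2a = 0, so a* = E[Y], and the resulting minimum MSE is E[(Y − E[Y])²] = σY².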
Example 10: Minimum Linear MSE Estimation

Suppose we wish to estimate a random variable Y with a linear function of X, Ŷ = aX + b. What values of a and b minimize the MSE?

Example 10: Workspace (1)
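One standard way to fill in the workspace: set the partial derivatives of E[(Y − aX − b)²] with respect to a and b to zero. This gives b* = E[Y] − a* E[X] and a* = Cov(X, Y)/Var(X) = ρX,Y σY/σX, which leads to the expression for Ŷ on the next slide.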
Example 10: Workspace (2)

MMSE Linear Estimation Discussion

Ŷ = a*X + b* = ρX,Y σY (X − E[X])/σX + E[Y]

• Note that (X − E[X])/σX is just a scaled version of X
  – Zero mean
  – Unit variance
  – Sometimes called a z-score
• Xs = σY (X − E[X])/σX has the variance of Y
• The term E[Y] ensures that E[Ŷ] = E[Y]
• ρX,Y specifies the sign and extent of Y relative to Xs
• If uncorrelated, Ŷ = E[Y]
• If perfectly correlated, Ŷ = ±σY (X − E[X])/σX + E[Y] = Y
MMSE = σY² (1 − ρX,Y²)
Best Linear Estimator

MMSE = E{ ((Y − E[Y]) − a*(X − E[X]))² }
     = σY² (1 − ρX,Y²)

• When ρX,Y = ±1, MMSE = 0
• Perfect correlation implies perfect prediction
• No correlation (ρX,Y = 0) implies MMSE = σY²

MMSE Nonlinear Estimation

• In general, the best estimator of Y given X will be nonlinear
• Suppose we wish to find the g(X) that best approximates Y in the MMSE sense:

min_{g(·)} EX,Y[(Y − g(X))²]

Using conditional expectation,

EX,Y[(Y − g(X))²] = EX[ EY[(Y − g(x))² | X = x] ]
                  = ∫_{−∞}^{∞} EY[(Y − g(x))² | X = x] fX(x) dx
                  = ∫_{−∞}^{∞} ( ∫_{−∞}^{∞} (y − g(x))² fY(y|x) dy ) fX(x) dx
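Minimizing the inner integral separately for each x gives the well-known conclusion (stated here for reference): the optimal nonlinear MMSE estimator is the conditional mean, g*(x) = EY[Y | X = x].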
Linear Estimation with Vectors

Suppose that we wish to estimate Y with a linear sum of random variables X1, X2, ..., XL:

Ŷ = Xᵀw = Σ_{i=1}^{L} Xi wi

Then the error Y − Ŷ can be written as

ε = Y − Xᵀw

Linear Estimation Error

ε² = (Y − Xᵀw)² = Y² + wᵀX Xᵀw − 2Y Xᵀw

The expected value of the squared error is

MSE ≡ E[ε²] = E[(Y − Xᵀw)²] = E[Y²] + wᵀE[X Xᵀ]w − 2 E[Y Xᵀ]w
Define R ≡ E[X Xᵀ] and P ≡ E[Y X]. R is a symmetric matrix: R = Rᵀ.
Minimum Mean Squared Error

Using the matrices R and P, the MSE can be rewritten as

E[ε²] = E[(Y − Xᵀw)²]
      = E[Y²] + wᵀE[X Xᵀ]w − 2 E[Y Xᵀ]w
      = σY² + wᵀRw − 2wᵀP

Take the gradient of the MSE above with respect to w and set the resulting expression equal to zero:

∇w E[ε²] = ∇w(σY² + wᵀRw − 2wᵀP)
         = Rᵀw + Rw − 2P
         = 2Rw − 2P
         = 0

Solving for w*, we obtain

w* = R⁻¹P

Find the minimum MSE by substituting w* into the equation for the MSE:

min E[ε²] = σY² + w*ᵀRw* − 2Pᵀw*
          = σY² + (R⁻¹P)ᵀR(R⁻¹P) − 2Pᵀ(R⁻¹P)
          = σY² + PᵀR⁻¹P − 2PᵀR⁻¹P
          = σY² − PᵀR⁻¹P
          = σY² − Pᵀw*
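A minimal MATLAB sketch (synthetic data and true weights assumed) that forms sample estimates of R and P and solves w* = R⁻¹P with the backslash operator, which avoids computing an explicit inverse:

% Estimate the MMSE weights from samples of X and Y.
n = 1000; L = 3;
X = randn(n, L);                      % each row is one realization of X'
wtrue = [1; -2; 0.5];                 % assumed true weights
Y = X*wtrue + 0.1*randn(n,1);         % linear model plus noise
R = (X'*X)/n;                         % sample estimate of E[X X']
P = (X'*Y)/n;                         % sample estimate of E[Y X]
w = R\P;                              % w* = R^{-1} P
disp([wtrue w]);                      % compare true and estimated weights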
Closing Comments
• In general, we cannot calculate anything discussed so far
• Everything discussed requires the true PDF (or CDF) be known
• In practice, we’ll have data, not PDF’s
• Represents a best-case scenario
• How closely can we approximate the true quantities given only data?
• Will compare our estimators on cases where the true PDF is known