
Statistical Signal Processing

Chapter 1: INTRODUCTION
Instructor:

Phuong T. Tran, PhD.


(slides are borrowed from Prof. Natasha Devroye, UIC)

Ton Duc Thang University


Faculty of Electrical and Electronics Engineering

Apr. 27 2019
Outline

1 Definitions
2 Continuous Random Variables and Probability Density Functions (PDF)
3 Cumulative distribution function (CDF)
4 Gaussian Random Variable
5 Multiple Continuous Random Variables
6 Conditional Probability
7 Derived Distributions
8 Textbooks and References

Apr. 27 2019 EE702030 - Lecture 2: Review of Probability Theory 2 / 33





Definitions

Probabilistic models

There is an underlying process, called an experiment, that produces exactly ONE outcome.
A probabilistic model consists of a sample space and a probability law.
Sample space (denoted Ω): the set of all possible outcomes of an experiment.
Event: any subset of the sample space.
Probability law: assigns a probability to every set A of possible outcomes (event).
Choice of sample space (or universe): the elements should be distinct and mutually exclusive (disjoint), and the space should be "collectively exhaustive" (every possible outcome of the experiment is included).



Definitions

Probability axioms

1 Nonnegativity. P(A) ≥ 0 for every event A.
2 Additivity. If A and B are two disjoint events, then P(A ∪ B) = P(A) + P(B) (this extends to any countable number of disjoint events).
3 Normalization. The probability of the entire sample space is P(Ω) = 1.
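As a quick sanity check, the three axioms can be verified numerically on a small discrete example. The fair six-sided die, the events A and B, and the helper `prob` below are illustrative choices, not part of the lecture:

```python
# Hypothetical finite example: a fair six-sided die.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}                      # sample space
P = {w: Fraction(1, 6) for w in omega}          # probability law on outcomes

def prob(event):
    """Probability of an event (any subset of omega)."""
    return sum(P[w] for w in event)

A = {1, 2}          # event "roll is 1 or 2"
B = {5, 6}          # event "roll is 5 or 6", disjoint from A

assert prob(A) >= 0                              # nonnegativity
assert prob(A | B) == prob(A) + prob(B)          # additivity for disjoint events
assert prob(omega) == 1                          # normalization
```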



Definitions

Properties of probability laws

1 If A ⊆ B, then P(A) ≤ P(B).
2 P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
3 P(A ∪ B) ≤ P(A) + P(B)
4 P(A ∪ B ∪ C) = P(A) + P(Ac ∩ B) + P(Ac ∩ Bc ∩ C), where Ac = Ω \ A.



Definitions

Random variables

A random variable is a real-valued function of the outcome of an experiment.
Example: coin toss. The r.v. X = 1 if heads and X = 0 if tails (Bernoulli r.v.).
A function of a r.v. defines another r.v.
Discrete r.v.: X takes values in a countable set (e.g., the integers).
Continuous r.v.: X takes values in the set of real numbers (we will focus on this).





Continuous Random Variables and Probability
Density Functions (PDF)

Continuous random variables


A r.v. X is called continuous if there is a function fX(x) with fX(x) ≥ 0, called the probability density function (PDF), s.t. P(X ∈ B) = ∫_B fX(x)dx for every subset B of the real line.
Specially, for B = [a, b], P(a ≤ X ≤ b) = ∫_a^b fX(x)dx, which can be interpreted as the area under the graph of the PDF fX(x).
Sample space: Ω = (−∞, +∞).
Normalization: P(Ω) = 1, so ∫_{−∞}^{+∞} fX(x)dx = 1.
Expected value: E[X] = ∫_{−∞}^{+∞} x fX(x)dx.
Variance: Var[X] = E[(X − E[X])²] = ∫_{−∞}^{+∞} (x − E[X])² fX(x)dx.
Continuous Random Variables and Probability
Density Functions (PDF)

Families of continuous random variables

Uniform r.v.: X ∼ U[a, b] ⇒ fX(x) = 1/(b − a) if x ∈ [a, b], and 0 if x ∉ [a, b].
Exponential r.v.: X ∼ Exp(λ) ⇒ fX(x) = λe^{−λx} if x ≥ 0, and 0 if x < 0.
Gaussian r.v.: will be discussed later in this lecture.
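A quick numerical sketch of the exponential family (the rate λ = 2 is an arbitrary choice): a plain midpoint-rule integral confirms that the PDF integrates to 1 and that the mean comes out as 1/λ:

```python
import math

lam = 2.0  # hypothetical rate for X ~ Exp(lambda)

def f_exp(x):
    """Exponential PDF: lambda * exp(-lambda * x) for x >= 0, else 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

# Midpoint-rule integration on [0, 20]; the tail beyond 20 is negligible
n = 200_000
dx = 20.0 / n
xs = [(i + 0.5) * dx for i in range(n)]
total = sum(f_exp(x) * dx for x in xs)     # normalization: ~1
mean = sum(x * f_exp(x) * dx for x in xs)  # expected value: ~1/lambda

assert abs(total - 1.0) < 1e-6
assert abs(mean - 1.0 / lam) < 1e-6
```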





Cumulative distribution function (CDF)

Cumulative distribution function (CDF)

Definition: FX(x) ≜ P(X ≤ x) (the probability of the event {X ≤ x}).
For continuous r.v.'s:

FX(x) = ∫_{−∞}^{x} fX(t)dt

Note: the PDF fX(x) is NOT the probability of any event, and it can be > 1. But FX(x) is the probability of the event {X ≤ x} for both continuous and discrete r.v.'s, so it must be ≤ 1.
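To illustrate the note above: for a uniform r.v. on a short interval the PDF exceeds 1 everywhere on its support, while the CDF (its running integral) never does. The interval [0, 0.5] is an arbitrary choice for this sketch:

```python
def f_uniform(x, a=0.0, b=0.5):
    """PDF of U[a, b]: here 1/(b - a) = 2, which exceeds 1."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def cdf(x, n=100_000):
    """Numeric F_X(x) = integral from 0 to x of f_X(t) dt (support starts at 0)."""
    dt = x / n
    return sum(f_uniform((i + 0.5) * dt) * dt for i in range(n))

assert f_uniform(0.25) == 2.0            # the PDF may exceed 1...
assert cdf(0.25) <= 1.0 + 1e-9           # ...but the CDF never does
assert abs(cdf(0.25) - 0.5) < 1e-3
assert abs(cdf(1.0) - 1.0) < 1e-3
```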



Cumulative distribution function (CDF)

Properties of CDF

FX(x) is monotonically nondecreasing in x.
FX(x) → 0 as x → −∞ and FX(x) → 1 as x → ∞.
FX(x) is continuous and nonnegative for continuous r.v.'s.
fX(x) = dFX(x)/dx.





Gaussian Random Variable

Gaussian random variables


The most commonly used r.v. in communications and signal processing.
X is normal or Gaussian if it has a PDF of the form

fX(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}

where one can show that µ = E[X] and σ² = Var[X].
Standard normal r.v.: the normal r.v. with µ = 0 and σ² = 1.
CDF of a standard normal Z, denoted Φ(z):

Φ(z) ≜ P(Z ≤ z) = (1/√(2π)) ∫_{−∞}^{z} e^{−t²/2} dt.

It is recorded as a table.
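In code, the table lookup can be replaced by the error function, using the standard identity Φ(z) = ½(1 + erf(z/√2)); `math.erf` is in the Python standard library:

```python
import math

def Phi(z):
    """Standard normal CDF via the error function: Phi(z) = (1 + erf(z/sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

assert abs(Phi(0.0) - 0.5) < 1e-12           # symmetry: half the mass below 0
assert abs(Phi(1.96) - 0.975) < 1e-3         # familiar table value
assert abs(Phi(-1.0) - (1.0 - Phi(1.0))) < 1e-12
```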
Gaussian Random Variable

Properties of Gaussian random variables

Theorem (Theorem 1)
Let X be a normal r.v. with mean µ and variance σ². Then Y = (X − µ)/σ is a standard normal r.v.

Computing the CDF of any normal r.v. X using the table for Φ:

FX(x) = Φ((x − µ)/σ)
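A small sketch of Theorem 1 in use: probabilities for a general normal r.v. reduce to Φ evaluations. The parameters µ = 5 and σ = 2 are hypothetical choices:

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """F_X(x) = Phi((x - mu) / sigma) for X ~ N(mu, sigma^2)."""
    return Phi((x - mu) / sigma)

# Hypothetical example: X ~ N(5, 2^2).
# P(X <= 5) = 0.5 and P(3 <= X <= 7) is the ~68% one-sigma mass.
assert abs(normal_cdf(5.0, 5.0, 2.0) - 0.5) < 1e-12
assert abs((normal_cdf(7.0, 5.0, 2.0) - normal_cdf(3.0, 5.0, 2.0)) - 0.6827) < 1e-3
```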

A normal r.v. models the additive effect of many independent factors well. This is formally stated as the central limit theorem: the sum of a large number of independent and identically distributed (not necessarily normal) r.v.'s has an approximately normal CDF.
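The central limit theorem can be illustrated by simulation: standardized sums of i.i.d. uniform r.v.'s (decidedly non-normal) already follow the normal "about 68% within one σ" rule closely. The choices n = 30 and 20000 trials, and the fixed seed, are arbitrary:

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

n, trials = 30, 20_000
mu, var = 0.5, 1.0 / 12.0              # mean and variance of U[0, 1]

# Standardized sums (S_n - n*mu) / sqrt(n*var): approximately standard normal by the CLT
zs = [(sum(random.random() for _ in range(n)) - n * mu) / math.sqrt(n * var)
      for _ in range(trials)]

within_1sigma = sum(abs(z) <= 1.0 for z in zs) / trials
assert abs(within_1sigma - 0.6827) < 0.02      # matches the normal ~68% rule
```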





Multiple Continuous Random Variables

Topics to review

Conditioning on an event

Joint and Marginal PDF

Expectation, Independence, Joint CDF, Bayes rule


Derived distributions
Function of a single random variable: Y = g(X ) for any function g
Function of a single random variable: Y = g(X ) for linear function g
Function of a single random variable: Y = g(X ) for strictly
monotonic function g
Function of two random variables: Z = g(X , Y ) for any function g



Multiple Continuous Random Variables

Joint PDF

Two r.v.'s X and Y are jointly continuous iff there is a function fX,Y(x, y) with fX,Y(x, y) ≥ 0, called the joint PDF, s.t. P((X, Y) ∈ B) = ∫∫_B fX,Y(x, y)dxdy for every subset B of the 2D plane.
Specifically, for B = [a, b] × [c, d] ≜ {(x, y) : a ≤ x ≤ b, c ≤ y ≤ d},

P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_{x=a}^{b} ∫_{y=c}^{d} fX,Y(x, y) dy dx



Multiple Continuous Random Variables

Marginal PDF

Marginal PDF: the PDF obtained by integrating the joint PDF over the entire range of one r.v. (in general, over a set of r.v.'s):

P(a ≤ X ≤ b) = P(a ≤ X ≤ b, −∞ ≤ Y ≤ +∞)
             = ∫_{x=a}^{b} ∫_{y=−∞}^{+∞} fX,Y(x, y) dy dx

⇒ fX(x) = ∫_{y=−∞}^{+∞} fX,Y(x, y) dy
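Numerically, marginalization is just an integral over the other variable. The joint PDF f(x, y) = x + y on the unit square is a hypothetical example chosen because its marginal has the closed form fX(x) = x + 1/2:

```python
def f_joint(x, y):
    """Hypothetical joint PDF f(x, y) = x + y on the unit square (it integrates to 1)."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def f_marginal(x, n=1000):
    """f_X(x) = integral over y of f_{X,Y}(x, y), by the midpoint rule."""
    dy = 1.0 / n
    return sum(f_joint(x, (j + 0.5) * dy) * dy for j in range(n))

# Closed form for this example: f_X(x) = x + 1/2
assert abs(f_marginal(0.3) - 0.8) < 1e-6
assert abs(f_marginal(0.7) - 1.2) < 1e-6
```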



Multiple Continuous Random Variables

Joint CDF

Joint CDF:

FX,Y(x, y) := P(X ≤ x, Y ≤ y) = ∫_{t=−∞}^{y} ∫_{s=−∞}^{x} fX,Y(s, t) ds dt

Obtain the joint PDF from the joint CDF:

fX,Y(x, y) = ∂²FX,Y(x, y) / ∂x∂y





Conditional Probability

Conditioning on an event

fX|A(x) := fX(x)/P(A) if A occurs, and 0 otherwise.

Consider the special case A := {X ∈ R}, e.g. the region R can be the interval [a, b]. In this case we should write fX|{X∈R}, but to keep things simple we abuse notation and also write

fX|R(x) := fX(x)/P(X ∈ R) if x ∈ R, and 0 otherwise
         = fX(x) / ∫_{t∈R} fX(t)dt if x ∈ R, and 0 otherwise.
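A sketch of fX|R for a hypothetical exponential r.v. restricted to R = [1, 2]: rescaling by P(X ∈ R) makes the conditional PDF integrate to 1 over R, with no mass outside R:

```python
import math

lam = 1.0                                # hypothetical rate; X ~ Exp(1)
a, b = 1.0, 2.0                          # condition on the event R = [1, 2]

def f_exp(x):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

p_R = math.exp(-lam * a) - math.exp(-lam * b)   # P(X in R), in closed form

def f_cond(x):
    """f_{X|R}(x) = f_X(x) / P(X in R) on R, and 0 elsewhere."""
    return f_exp(x) / p_R if a <= x <= b else 0.0

# The conditional PDF integrates to 1 over R (midpoint rule)
n = 10_000
dx = (b - a) / n
total = sum(f_cond(a + (i + 0.5) * dx) * dx for i in range(n))

assert abs(total - 1.0) < 1e-6
assert f_cond(0.5) == 0.0                # no mass outside the conditioning event
```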



Conditional Probability

Conditional PDF

The conditional PDF of X given that Y = y is defined as

fX|Y(x|y) ≜ fX,Y(x, y) / fY(y)

For any y, fX|Y(x|y) is a legitimate PDF: it integrates to 1.
In general, for any region A, we have

P(X ∈ A | Y = y) = lim_{δ→0} P(X ∈ A | y ≤ Y ≤ y + δ) = ∫_{x∈A} fX|Y(x|y) dx



Conditional Probability

Expectation and independence

Expectation: please review the following: E[g(X)|Y = y], E[g(X, Y)|Y = y], and the total expectation theorem for E[g(X)] and for E[g(X, Y)].
For any y, fX|Y(x|y) is a legitimate PDF: it integrates to 1.
Independence: X and Y are independent iff fX|Y = fX (or iff fX,Y = fX fY, or iff fY|X = fY).
If X and Y are independent, any two events {X ∈ A} and {Y ∈ B} are independent.
If X and Y are independent, E[g(X)h(Y)] = E[g(X)] · E[h(Y)] and Var[X + Y] = Var[X] + Var[Y] (show this?).
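The two identities in the last bullet can be checked exactly on a small example. A discrete pair is used purely for easy exact arithmetic; the PMFs and the functions g, h are arbitrary choices, and the joint PMF is built as the product of the marginals so the pair is independent by construction:

```python
import itertools

px = {0: 0.3, 1: 0.7}          # PMF of X (hypothetical)
py = {-1: 0.4, 2: 0.6}         # PMF of Y, independent of X

g = lambda x: x + 1
h = lambda y: y * y

# E[g(X)h(Y)] factors as E[g(X)] * E[h(Y)] under independence
E_gh = sum(px[x] * py[y] * g(x) * h(y) for x, y in itertools.product(px, py))
E_g = sum(px[x] * g(x) for x in px)
E_h = sum(py[y] * h(y) for y in py)
assert abs(E_gh - E_g * E_h) < 1e-12

# Var[X + Y] = Var[X] + Var[Y] under independence
EX = sum(px[x] * x for x in px)
EY = sum(py[y] * y for y in py)
VX = sum(px[x] * (x - EX) ** 2 for x in px)
VY = sum(py[y] * (y - EY) ** 2 for y in py)
V_sum = sum(px[x] * py[y] * (x + y - EX - EY) ** 2
            for x, y in itertools.product(px, py))
assert abs(V_sum - (VX + VY)) < 1e-12
```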



Conditional Probability

Conditional CDF

Conditional CDF:

FX|Y(x|y) := P(X ≤ x | Y = y) = lim_{δ→0} P(X ≤ x | y ≤ Y ≤ y + δ) = ∫_{t=−∞}^{x} fX|Y(t|y) dt



Conditional Probability

Bayes’ rule

Bayes' rule when the unobserved phenomenon is continuous:

fX|Y(x|y) ≜ fX,Y(x, y) / fY(y) = fY|X(y|x) fX(x) / fY(y)

Bayes' rule when the unobserved phenomenon is discrete:

pN|X(i|x) = P(N = i | X = x) = lim_{δ→0} P(N = i | X ∈ [x, x + δ])
          = pN(i) fX|N=i(x) / Σ_j pN(j) fX|N=j(x)
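A sketch of the discrete-Bayes formula for a hypothetical binary source observed in Gaussian noise; the uniform priors and the unit noise variance are illustrative choices:

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    """Gaussian density N(mu, sigma^2) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# Hypothetical binary source: N = 0 or 1 with priors 0.5 each,
# observed through X = N + Gaussian noise, so f_{X|N=i} is N(i, 1).
prior = {0: 0.5, 1: 0.5}

def posterior(i, x):
    """p_{N|X}(i|x) = p_N(i) f_{X|N=i}(x) / sum_j p_N(j) f_{X|N=j}(x)."""
    num = prior[i] * normal_pdf(x, i)
    den = sum(prior[j] * normal_pdf(x, j) for j in prior)
    return num / den

assert abs(posterior(0, 0.5) - 0.5) < 1e-12   # midpoint: both hypotheses equally likely
assert posterior(1, 2.0) > 0.8                # an observation near 2 strongly favors N = 1
x = 0.2
assert abs(posterior(0, x) + posterior(1, x) - 1.0) < 1e-12  # posterior is a valid PMF
```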



Conditional Probability

Bayes’ rule (cont.)

Bayes' rule with conditioning on events: suppose that events A1, A2, ..., An form a partition, i.e. they are disjoint and their union is the entire sample space. The simplest example is n = 2, A1 = A, A2 = Ac. Then:

P(Ai | X = x) = P(Ai) fX|Ai(x) / Σ_j P(Aj) fX|Aj(x)





Derived Distributions

Obtaining the PDF of Y = g(X)

ALWAYS use the following 2-step procedure:
Compute the CDF first: FY(y) = P(g(X) ≤ y) = ∫_{x: g(x)≤y} fX(x)dx.
Obtain the PDF by differentiating FY, i.e. fY(y) = dFY(y)/dy.

Special Case 1 (linear): Y = aX + b. One can show that

fY(y) = (1/|a|) fX((y − b)/a)

Special Case 2 (strictly monotonic): consider Y = g(X) with g a strictly monotonic function of X. Thus g is one-to-one, so there exists a function h s.t. y = g(x) iff x = h(y) (i.e. h is the inverse function of g, often denoted h ≜ g^{−1}). Then one can show that

fY(y) = fX(h(y)) |dh(y)/dy|
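A check of Special Case 1 with hypothetical choices X ∼ U[0, 1], a = 2, b = 1: the formula predicts that Y = 2X + 1 is uniform on [1, 3] with density 1/2:

```python
def f_X(x):
    """X ~ U[0, 1] (hypothetical choice for illustration)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def f_Y(y, a=2.0, b=1.0):
    """Linear case: Y = aX + b, so f_Y(y) = f_X((y - b)/a) / |a|."""
    return f_X((y - b) / a) / abs(a)

# Y = 2X + 1 should be uniform on [1, 3] with density 1/2
assert f_Y(2.0) == 0.5
assert f_Y(0.5) == 0.0 and f_Y(3.5) == 0.0

# Sanity check: the derived PDF still integrates to 1 (midpoint rule on [1, 3])
n = 2000
dy = 2.0 / n
total = sum(f_Y(1.0 + (i + 0.5) * dy) * dy for i in range(n))
assert abs(total - 1.0) < 1e-6
```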
Derived Distributions

Functions of two random variables

Two possible ways to solve this, depending on which is easier. Try the first method first: if it is easy to find the region to integrate over, just do that.
1 Do the following:
(a) Compute the CDF of Z = g(X, Y), i.e. compute FZ(z). In general this is computed as
FZ(z) = P(g(X, Y) ≤ z) = ∫∫_{(x,y): g(x,y)≤z} fX,Y(x, y) dy dx.
(b) Differentiate w.r.t. z to get the PDF, i.e. compute fZ(z) = ∂FZ(z)/∂z.
2 Use a three-step procedure:
(a) Compute the conditional CDF, FZ|X(z|x) := P(Z ≤ z | X = x).
(b) Differentiate w.r.t. z to get the conditional PDF, fZ|X(z|x) = ∂FZ|X(z|x)/∂z.
(c) Compute fZ(z) = ∫ fZ,X(z, x)dx = ∫ fZ|X(z|x) fX(x)dx.
Derived Distributions

Functions of two random variables (cont.)

Special case: the PDF of Z = X + Y when X and Y are independent is the convolution of the PDFs of X and Y. Here one needs to use the second method.
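A numeric sketch of the convolution special case for two hypothetical independent Exp(1) r.v.'s; the known closed form fZ(z) = z e^{−z} (the Erlang-2 density) gives a reference to compare against:

```python
import math

def f(x):
    """Exp(1) PDF, used for both independent X and Y."""
    return math.exp(-x) if x >= 0 else 0.0

def f_Z(z, n=2000):
    """f_Z(z) = integral of f_X(x) f_Y(z - x) dx: the convolution, by midpoint rule."""
    dx = z / n
    return sum(f((i + 0.5) * dx) * f(z - (i + 0.5) * dx) * dx for i in range(n))

# Closed form for the sum of two Exp(1) r.v.'s: f_Z(z) = z * exp(-z) (Erlang-2)
for z in (0.5, 1.0, 2.0):
    assert abs(f_Z(z) - z * math.exp(-z)) < 1e-6
```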



Textbooks and References

Bibliography

Steven M. Kay. Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory, 1st ed. Prentice-Hall PTR, 1993.
Steven M. Kay. Fundamentals of Statistical Signal Processing, Volume II: Detection Theory, 1st ed. Prentice-Hall PTR, 1998.
Steven M. Kay. Fundamentals of Statistical Signal Processing, Volume III: Practical Algorithm Development, 1st ed. Prentice-Hall PTR, 2013.
T. K. Moon and W. C. Stirling. Mathematical Methods and Algorithms for Signal Processing. Pearson, 1999.
Harry L. Van Trees. Detection, Estimation, and Modulation Theory, Part I: Detection, Estimation, and Filtering Theory, 2nd ed. John Wiley & Sons, 2013.

