
Probability Theory: Review

Anubha Gupta, PhD.


Professor
SBILab, Dept. of ECE,
IIIT-Delhi, India
Contact: anubha@iiitd.ac.in; Lab: http://sbilab.iiitd.edu.in
Machine Learning in Hindi

Probability Theory: Review


(Random Variables)

Learning Objectives
● Definition of a random variable
● Types: discrete and continuous
● Probability Distribution Function (PDF) OR Cumulative Distribution Function (CDF)
○ Examples
● Probability Density Function (pdf)
● Examples of famous random variables
● Expectation & Variance
● Joint distribution and density functions
● Central and non-central moments
● Independence, uncorrelatedness, orthogonality of random variables
● Central limit theorem
● Random vector: mean vector, covariance matrix

Reference text: Stark, Henry, and John William Woods. Probability and Random Processes with Applications to Signal Processing. 2002.

Random Variable
X: Ω → ℝ

What is random here?


X is actually not a variable but a function (a mapping) whose domain is the sample space Ω and whose range is the real line ℝ.
Consider the experiment: Tossing of a coin; X: {H} → 1; X: {T} → 0

If the coin is fair (unbiased or not biased to generate any specific outcome)
P[X=1]=P[H]=0.5; P[X=0]=P[T]= 0.5

Compute:
(a) P[X < 0] event is: {X < 0}
(b) P[X ≤ 0] event is: {X ≤ 0}
(c) P[X ≤ 0.4] event is: {X ≤ 0.4}
(d) P[X ≤ 1] event is: {X ≤ 1}
Or, an event is specified as {X ≤ x}

Random Variable
X: (Ω, F, P) → (ℝ, B, P_X)

Sigma field: F = {∅, {H}, {T}, Ω}

Borel field: B = the Borel subsets of the real line
P = probability measure associated with the events in F.
P_X = probability measure associated with the events in B.

X is a random variable (r.v.) only if the inverse image under X of every Borel subset of ℝ making up the Borel field B is a valid event in F.
Or, X⁻¹(B) = E_B = a valid event in F, where B ∈ B.

Let us Try!
A bus arrives at random in [0, T]. Let t be the time of arrival of the bus. The bus is equally likely to
arrive at any time t in the interval [0, T], i.e., the arrival time is uniformly distributed over [0, T].

Example 1: Define a discrete (binary) random variable X ∈ {0, 1} as below:

Define X(t) = 1 if t ∈ [T/4, T/2]
              0 otherwise

(a) P[X(t) = 1] =
(b) P[X(t) = 0] =
(c) P[X(t) ≤ 2] =

Example 2: Define a continuous random variable X ∈ [0, T] as below:
Define X = time of arrival t

(a) P[X ≤ 0] =
(b) P[X ≤ T] =
(c) P[X > T] =

Probability Distribution Function (PDF) or


Cumulative Distribution Function (CDF)
This is a point-wise function and is defined as

FX(x) = P[X ≤ x] = P[X ∈ (−∞, x]]

where X is the notation of the random variable and x is the value taken by the random variable.

Properties of a PDF (CDF):

1) FX(∞) = 1; FX(−∞) = 0

2) If x1 ≤ x2, then FX(x1) ≤ FX(x2),
   i.e., FX(x) is a non-decreasing function of x.

3) FX(x) is continuous from the right, i.e.,
   FX(x) = lim(ε→0) FX(x + ε), where ε > 0.
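The CDF properties above can be checked empirically. A minimal sketch in Python (an illustration, not part of the slides; the seed and sample count are arbitrary choices), using the fair-coin random variable X from the earlier slide:

```python
import random

# Sample the fair-coin random variable: X = 1 for heads, 0 for tails.
random.seed(0)
samples = [random.randint(0, 1) for _ in range(100_000)]

def empirical_cdf(x, data):
    """Fraction of samples <= x, which estimates F_X(x) = P[X <= x]."""
    return sum(1 for s in data if s <= x) / len(data)

# Non-decreasing, 0 at -inf, 1 at +inf, with jumps only at 0 and 1.
print(empirical_cdf(-1, samples))   # 0.0: no mass below 0
print(empirical_cdf(0.4, samples))  # ~0.5: only the event {X = 0}
print(empirical_cdf(1, samples))    # 1.0: all mass captured
```

Note that the empirical CDF at x = 0.4 equals the value at x = 0, mirroring the step shape of a discrete CDF.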

Examples of CDF
A bus arrives at random in [0, T]. Let t be the time of arrival of the bus. The bus is equally likely to
arrive at any time t in the interval [0, T], i.e., the arrival time is uniformly distributed over [0, T].

Example 1: Define a discrete (binary) random variable X ∈ {0, 1} as below:

Define X(t) = 1 if t ∈ [T/4, T/2]
              0 otherwise
Compute and plot the CDF of the above random variable.

Examples of CDF
A bus arrives at random in [0, T]. Let t be the time of arrival of the bus. The bus is equally likely to
arrive at any time t in the interval [0, T], i.e., the arrival time is uniformly distributed over [0, T].

Example 1: Define a discrete (binary) random variable X ∈ {0, 1} as below:

Define X(t) = 1 if t ∈ [T/4, T/2]
              0 otherwise
Compute and plot the CDF of the above random variable.
Solution: Since the bus is equally likely to arrive at any time within [0, T],
P[X = 1] = (T/2 − T/4)/T = 1/4 and P[X = 0] = 3/4.

1) Case 1: −∞ < x < 0
   FX(x) = P[X ≤ x] = 0, because no value of X lies in (−∞, x)
   (X can take only the two values 0 and 1).

2) Case 2: 0 ≤ x < 1
   FX(x) = P[X ≤ x] = P[X = 0] = 3/4.

3) Case 3: 1 ≤ x < ∞
   FX(x) = P[X ≤ x] = P[X = 0] + P[X = 1] = 1;
   hence, P[X ≤ ∞] = … = P[X ≤ 1] = 1.
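The probabilities used in this solution can be estimated by simulating the experiment. A sketch in Python (the value of T and the trial count are arbitrary choices for illustration):

```python
import random

# Simulate the bus arrival: t is uniform on [0, T]; X = 1 iff t is in [T/4, T/2].
random.seed(1)
T = 10.0
trials = 200_000

xs = [1 if T / 4 <= random.uniform(0, T) <= T / 2 else 0 for _ in range(trials)]

p1 = sum(xs) / trials   # estimates P[X = 1] = 1/4
p0 = 1 - p1             # estimates P[X = 0] = 3/4
print(p1, p0)           # ~0.25, ~0.75
```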

Examples of CDF
A bus arrives at random in [0, T]. Let t be the time of arrival of the bus. The bus is equally likely to
arrive at any time t in the interval [0, T], i.e., the arrival time is uniformly distributed over [0, T].

Example 2: Define a continuous random variable X as the time of arrival t. Compute and plot the
CDF of the above random variable.

Examples of CDF
A bus arrives at random in [0, T]. Let t be the time of arrival of the bus. The bus is equally likely to
arrive at any time t in the interval [0, T], i.e., the arrival time is uniformly distributed over [0, T].

Example 2: Define a continuous random variable X as the time of arrival t. Compute and plot the
CDF of the above random variable.

Solution:
1) For t < 0:
   FX(t) = P[X ≤ t] = 0, because the bus arrives only at times t ≥ 0;
   in particular, FX(−∞) = 0.

2) For 0 ≤ t ≤ T:
   FX(t) = P[X ≤ t] = t/T, because the arrival time is uniform over [0, T].

3) For T ≤ t < ∞:
   FX(t) = P[X ≤ t] = 1, because the bus must have arrived by time T;
   hence, P[X ≤ ∞] = … = P[X ≤ T] = 1.
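The piecewise CDF above can be verified against an empirical CDF of simulated arrival times. A Python sketch (T and the sample count are arbitrary illustrative choices):

```python
import random

# Uniform arrival times on [0, T]; the CDF should satisfy F_X(t) = t/T there.
random.seed(2)
T = 8.0
arrivals = [random.uniform(0, T) for _ in range(200_000)]

def F_hat(t):
    """Empirical CDF: fraction of arrival times <= t."""
    return sum(1 for a in arrivals if a <= t) / len(arrivals)

for t in (0.0, T / 4, T / 2, T):
    print(t, F_hat(t), t / T)   # empirical value vs. theoretical t/T
```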

Probability Density Function (pdf)


If FX(x) is continuous and differentiable, then the probability density function is defined as

   fX(x) = d FX(x) / dx

The pdf, if it exists, satisfies the following properties:

1. fX(x) ≥ 0

2. FX(x) = ∫_{−∞}^{x} fX(ξ) dξ

   FX(∞) = 1 ⟹ ∫_{−∞}^{∞} fX(ξ) dξ = 1, i.e., the area under the pdf = 1

3. P[x1 < X ≤ x2] = ∫_{x1}^{x2} fX(ξ) dξ
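Property 2 (unit area under the pdf) can be checked numerically for any concrete pdf. A sketch in Python using the exponential pdf as an example (the rate value and the integration grid are arbitrary choices):

```python
import math

# Exponential pdf f(x) = lam * exp(-lam * x) for x >= 0.
lam = 2.0
f = lambda x: lam * math.exp(-lam * x)

# Trapezoidal rule on [0, 20]; the tail beyond 20 is negligible for lam = 2.
n, a, b = 200_000, 0.0, 20.0
h = (b - a) / n
area = h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))
print(area)   # ~1.0, the area under the pdf
```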

Mean and Variance of a Random Variable

• Mean (μ): E[X] = μ = ∫_{−∞}^{∞} x fX(x) dx

• Variance (σ²):

   σ² = E[(X − μX)²] = ∫_{−∞}^{∞} (x − μX)² fX(x) dx

• Standard deviation (σ):

   σ = √( E[(X − μX)²] )

Examples of Continuous Random Variable


1. Uniformly distributed random variable

The pdf is given by

   fX(x) = 1/(x2 − x1),  x1 < x ≤ x2
           0,            otherwise

• Mean = (x1 + x2)/2
• Variance = (x2 − x1)²/12

Check that the area under the pdf is 1:

   ∫_{−∞}^{∞} fX(x) dx = ∫_{x1}^{x2} 1/(x2 − x1) dx = [x]_{x1}^{x2} / (x2 − x1) = (x2 − x1)/(x2 − x1) = 1

(Plot: x1 = 2, x2 = 3, number of samples = 10 lakh.)
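The mean and variance formulas can be checked against samples, matching the x1 = 2, x2 = 3, 10-lakh-sample setting quoted for the plot. A Python sketch (the seed is an arbitrary choice):

```python
import random
import statistics

# Draw 10 lakh (1,000,000) samples from Uniform(x1, x2).
random.seed(3)
x1, x2 = 2.0, 3.0
data = [random.uniform(x1, x2) for _ in range(1_000_000)]

print(statistics.fmean(data))      # ~2.5    = (x1 + x2)/2
print(statistics.pvariance(data))  # ~0.0833 = (x2 - x1)^2 / 12
```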

Examples of Continuous Random Variable


2. Gaussian (Normal) distributed random variable

The pdf is given by

   fX(x) = (1/√(2πσ²)) e^{−(x − μ)²/(2σ²)}

where
   μ : mean of the Gaussian random variable
   σ : standard deviation of the Gaussian random variable
   σ² : variance of the Gaussian random variable

(Plot: X ∼ 𝒩(0, σ²), number of samples = 1,000.)

Examples of Continuous Random Variable


• Since a Gaussian random variable is symmetric about the mean, the area under fX(x) splits equally:

   ∫_{−∞}^{μ} fX(x) dx = 1/2   (−∞ < x ≤ μ)
   ∫_{μ}^{∞} fX(x) dx = 1/2   (μ < x < ∞)

• Mean (μ): E[X] = μ = ∫_{−∞}^{∞} x fX(x) dx

• Mode (Mo[X]): the value of x at which d fX(x)/dx = 0

• Median (xm): ∫_{−∞}^{xm} fX(x) dx = ∫_{xm}^{∞} fX(x) dx

For a Gaussian random variable, mean = mode = median = μ.

Let us define Y = (X − μ)/σ ∼ 𝒩(0, 1). Then

   fY(y) = (1/√(2π)) e^{−y²/2}

(Plot: number of samples = 1,000.)

Examples of Continuous Random Variable


The 68-95-99.7 Rule

• 68% of the population lies within 1 standard deviation of the mean.
• 95% of the population lies within 2 standard deviations of the mean.
• 99.7% of the population lies within 3 standard deviations of the mean.

(Plot: μ = 0, σ² = 1.)
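The 68-95-99.7 rule is easy to confirm on standard-normal samples. A Python sketch (the seed and sample count are arbitrary choices):

```python
import random

# Draw standard-normal samples and measure the mass within k standard deviations.
random.seed(4)
zs = [random.gauss(0.0, 1.0) for _ in range(500_000)]

fracs = {k: sum(1 for z in zs if abs(z) <= k) / len(zs) for k in (1, 2, 3)}
print(fracs)   # ~{1: 0.683, 2: 0.954, 3: 0.997}
```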

Examples of Continuous Random Variable


3. Rayleigh distributed random variable

The pdf is given by

   fX(x) = (x/σ²) e^{−x²/(2σ²)},  x > 0
           0,                     otherwise

where the scale parameter σ > 0.

• Mean = σ √(π/2)
• Variance = σ² (2 − π/2)
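Rayleigh samples can be drawn by inverse-transform sampling, since the CDF F(x) = 1 − e^{−x²/(2σ²)} inverts in closed form. A Python sketch checking the mean formula (σ, the seed, and the sample count are arbitrary choices):

```python
import math
import random

# Inverse transform: if U ~ Uniform(0, 1), then
# X = sigma * sqrt(-2 ln(1 - U)) is Rayleigh with scale sigma.
random.seed(5)
sigma = 2.0
xs = [sigma * math.sqrt(-2.0 * math.log(1.0 - random.random()))
      for _ in range(500_000)]

mean = sum(xs) / len(xs)
print(mean, sigma * math.sqrt(math.pi / 2))   # both ~2.5066
```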

Examples of Continuous Random Variable


4. Laplacian distributed random variable

The pdf is given by

   fX(x) = (c/2) e^{−c|x|}

where the scale parameter c > 0.

Examples of Continuous Random Variable


5. Exponentially distributed random variable

The pdf is given by

   fX(x) = λ e^{−λx},  x ≥ 0
           0,          x < 0

where the rate parameter λ > 0.

• Mean = 1/λ
• Variance = 1/λ²
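The mean and variance of the exponential distribution can be checked with the standard library's `random.expovariate`, whose argument is the rate λ. A Python sketch (λ, the seed, and the sample count are arbitrary choices):

```python
import random
import statistics

# Sample from the exponential distribution with rate lam.
random.seed(6)
lam = 0.5
xs = [random.expovariate(lam) for _ in range(500_000)]

print(statistics.fmean(xs))      # ~2.0 = 1/lam
print(statistics.pvariance(xs))  # ~4.0 = 1/lam^2
```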

Examples of Discrete Random Variable


1. Bernoulli distributed random variable

The probability mass function (PMF) is given by

   f(x) = p,  x = 1
          q,  x = 0

where
   p : probability of success
   q = 1 − p : probability of failure

• Mean = p
• Variance = pq = p(1 − p)

Examples of Discrete Random Variable


2. Binomial distributed random variable

The PMF is given by

   P[X = k] = ⁿCₖ p^k q^{n−k},  k = 0, 1, …, n

where
   p : probability of success
   q : probability of failure
   n : number of trials
   k : number of successes

• p + q = 1
• Mean = np
• Variance = npq
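The binomial PMF can be computed directly with `math.comb` and checked against the facts above: the probabilities sum to 1 and the mean is np. A Python sketch (n and p are arbitrary choices):

```python
import math

# Binomial PMF: P[X = k] = C(n, k) p^k q^(n - k).
n, p = 10, 0.3
q = 1 - p
pmf = [math.comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

print(sum(pmf))                               # ~1.0 (total probability)
print(sum(k * pmf[k] for k in range(n + 1)))  # ~3.0 = n * p
```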

Examples of Discrete Random Variable


3. Poisson distributed random variable

The PMF is given by

   P[X = k] = e^{−a} a^k / k!

where
   k = 0, 1, 2, …
   a > 0

• Mean = a
• Variance = a
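The equality of the Poisson mean and variance can be checked by summing the PMF numerically; using the recurrence P[X = k+1] = P[X = k] · a/(k+1) avoids large factorials. A Python sketch (a and the truncation point are arbitrary choices; for a = 4 the tail beyond k = 60 is negligible):

```python
import math

# Build the Poisson(a) PMF iteratively via its recurrence.
a = 4.0
pmf = []
p = math.exp(-a)          # P[X = 0] = e^{-a}
for k in range(60):
    pmf.append(p)
    p *= a / (k + 1)      # P[X = k + 1] = P[X = k] * a / (k + 1)

mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
print(mean, var)   # both ~4.0: mean = variance = a
```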

Two Random Variables


Let there be two random variables X and Y.
Then,
• fXY(x, y): joint density function
• FXY(x, y): joint distribution function

   FXY(∞, ∞) = 1
   FXY(∞, y) = FY(y)   [marginal distribution function of Y]
   FXY(x, ∞) = FX(x)   [marginal distribution function of X]
   FXY(−∞, y) = 0
   FXY(x, −∞) = 0

• fXY(x, y) = ∂²FXY(x, y) / (∂x ∂y)

Jointly Gaussian Random Variables


Correlated jointly Gaussian random variables (zero mean, common variance σ²):

   fXY(x, y) = (1 / (2πσ² √(1 − ρ²))) exp( −(x² + y² − 2ρxy) / (2σ²(1 − ρ²)) )

where ρ is the correlation coefficient, i.e.,

   ρ = E[(X − μX)(Y − μY)] / √( E[(X − μX)²] E[(Y − μY)²] )
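The correlation coefficient ρ can be estimated from samples exactly as the formula reads. A Python sketch that builds correlated Gaussian pairs via Y = ρX + √(1 − ρ²)W (a standard construction; ρ, the seed, and the sample count are arbitrary choices):

```python
import math
import random

# X, W independent standard normals; Y = rho*X + sqrt(1 - rho^2)*W has
# unit variance and correlation rho with X.
random.seed(7)
rho = 0.6
pairs = []
for _ in range(300_000):
    x = random.gauss(0, 1)
    w = random.gauss(0, 1)
    pairs.append((x, rho * x + math.sqrt(1 - rho**2) * w))

mx = sum(x for x, _ in pairs) / len(pairs)
my = sum(y for _, y in pairs) / len(pairs)
cov = sum((x - mx) * (y - my) for x, y in pairs) / len(pairs)
vx = sum((x - mx) ** 2 for x, _ in pairs) / len(pairs)
vy = sum((y - my) ** 2 for _, y in pairs) / len(pairs)

rho_hat = cov / math.sqrt(vx * vy)   # sample version of the formula for rho
print(rho_hat)                       # ~0.6
```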

Cases of Joint Random Variables


• Case 1: Uncorrelated random variables: E[XY] = E[X] E[Y]

• Case 2: Statistically independent random variables [S.I.R.V.]:

   Proof:
   If A and B are two independent events, then
   P[AB] = P[A ∩ B] = P[A] P[B]

   Correspondingly,
   FXY(x, y) = FX(x) FY(y)
   ∬_{−∞}^{(x, y)} fXY(ξ, η) dξ dη = ∫_{−∞}^{x} fX(ξ) dξ · ∫_{−∞}^{y} fY(η) dη
   fXY(x, y) = fX(x) fY(y)

• Case 3: Orthogonal random variables: E[XY] = 0

• Case 4: Independent and identically distributed random variables

Jointly Gaussian Random Variables


Let there be two jointly Gaussian random variables X ∼ 𝒩(0, σ²) and Y ∼ 𝒩(0, σ²) with ρ = 0.
Then,
• Joint distribution function: FXY(x, y)
• Joint density function:

   fXY(x, y) = (1/(2πσ²)) e^{−(x² + y²)/(2σ²)}
             = (1/√(2πσ²)) e^{−x²/(2σ²)} · (1/√(2πσ²)) e^{−y²/(2σ²)}
             = fX(x) fY(y)

Thus, X and Y are jointly Gaussian random variables and, since the joint density factors, they are statistically independent (S.I.) random variables.

I.I.D. (independent and identically distributed) random variables: having identical density functions and being statistically independent.

Central and Non-Central Moments

• Central moments: E[(X − μ)], E[(X − μ)²], …, E[(X − μ)^i]
  where

   E[(X − μ)^i] = ∫_{−∞}^{∞} (x − μ)^i fX(x) dx

• Non-central moments: E[X], E[X²], …, E[X^j]

Central Limit Theorem


The normalized sum of a large number of independent random variables X1, X2, X3, …, Xn,
each with mean zero and finite variance σ1², σ2², …, σn², tends to be a normal random
variable, provided that the individual variances σk², k = 1, 2, …, n, are small compared
with sn², where sn² = Σ_{i=1}^{n} σi². This condition on the variances is also called the
Lindeberg condition.

Define

   Zn ≜ (X1 + X2 + ⋯ + Xn) / sn

If ε > 0, n is sufficiently large, and each of the σk satisfies

   σk < ε sn  for k = 1, 2, …, n,

then Zn → 𝒩(0, 1) in distribution.
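The theorem can be illustrated numerically: the normalized sum of i.i.d. zero-mean uniforms already looks standard normal for moderate n. A Python sketch (n, the seed, and the repetition count are arbitrary choices; Uniform(−1, 1) has variance 1/3, so sn² = n/3):

```python
import random
import statistics

# Z_n = (X_1 + ... + X_n) / s_n for X_i ~ Uniform(-1, 1), s_n = sqrt(n/3).
random.seed(8)
n = 50
s_n = (n / 3) ** 0.5
zs = [sum(random.uniform(-1, 1) for _ in range(n)) / s_n
      for _ in range(50_000)]

print(statistics.fmean(zs))      # ~0.0: mean of N(0, 1)
print(statistics.pvariance(zs))  # ~1.0: variance of N(0, 1)
frac = sum(1 for z in zs if abs(z) <= 1) / len(zs)
print(frac)                      # ~0.68, matching the Gaussian 68% rule
```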

Vector Random Variables


X = [X1, X2, …, Xn]ᵀ = n-dimensional random vector with PDF FX(x)

where FX(x) = P[X1 ≤ x1; X2 ≤ x2; …; Xn ≤ xn]
            = P[X ≤ x]
Or, FX(x) = F_{X1 X2 … Xn}(x1, x2, …, xn)

1) P[X ≤ ∞] = P[X1 ≤ ∞; X2 ≤ ∞; …; Xn ≤ ∞] = 1.

2) P[X1 ≤ −∞; X2 ≤ x2; …; Xn ≤ xn] = P(∅ ∩ A2 ∩ … ∩ An) = P(∅) = 0
   Or, P[X1 ≤ x1; X2 ≤ −∞; …; Xn ≤ xn] = P(A1 ∩ ∅ ∩ … ∩ An) = P(∅) = 0
   Or, P[X1 ≤ x1; X2 ≤ x2; …; Xn ≤ −∞] = P(A1 ∩ A2 ∩ … ∩ ∅) = P(∅) = 0

   Any of the above expressions has probability equal to zero: setting any one
   argument to −∞ makes the corresponding event empty.

Vector Random Variables


   fX(x) = ∂ⁿFX(x) / (∂x1 ∂x2 … ∂xn)

   FX(x) = ∫_{−∞}^{x} fX(x′) dx′   (n-fold integral)

Mean of a random vector:

   E[X] = μ_X = [μ1, μ2, …, μn]ᵀ = mean vector

where, e.g.,

   μ1 = ∫⋯∫_{−∞}^{∞} x1 fX(x) dx1 … dxn

Covariance matrix of a random vector


Covariance matrix of a random vector X = [X1, …, Xn]ᵀ:

   K = E[(X − μ)(X − μ)ᵀ]

i.e., the (i, j) entry of K is Kij = E[(Xi − μi)(Xj − μj)], with Kii = σi²:

       ⎡ σ1²  K12  …  K1n ⎤
   K = ⎢ K21  σ2²  …  K2n ⎥
       ⎢  ⋮    ⋮   ⋱   ⋮  ⎥
       ⎣ Kn1  Kn2  …  σn² ⎦

For complex random variables, K = E[(X − μ)(X − μ)ᴴ], where H denotes the
conjugate transpose operation.
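The sample version of K = E[(X − μ)(X − μ)ᵀ] can be computed directly from data. A NumPy sketch (the true covariance, mean, seed, and sample count are arbitrary choices for illustration):

```python
import numpy as np

# Draw samples from a 2-D Gaussian with known covariance, then estimate K.
rng = np.random.default_rng(0)
K_true = np.array([[2.0, 0.8],
                   [0.8, 1.0]])
X = rng.multivariate_normal([1.0, -1.0], K_true, size=200_000)

mu = X.mean(axis=0)                       # sample mean vector
K_hat = (X - mu).T @ (X - mu) / len(X)    # sample covariance matrix

print(K_hat)                              # close to K_true
print(np.allclose(K_hat, K_hat.T))        # symmetric, as property 1 states
print(np.linalg.eigvalsh(K_hat))          # eigenvalues: all > 0 here (p.s.d.)
```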

Properties of a covariance matrix


1) A covariance matrix K is always a symmetric matrix.

   For real-valued random variables, K = Kᵀ.
   For complex-valued random variables, K is Hermitian, i.e., K = Kᴴ.

   Verify that K21 = K12.

2) A covariance matrix is at least positive semi-definite (p.s.d.).

3) A real symmetric covariance matrix K is similar to a diagonal matrix Λ, i.e.,

   U⁻¹KU = Λ,

   where the columns of U contain the ordered orthonormal unit eigenvectors
   φ1, φ2, …, φn of K, and Λ holds the corresponding eigenvalues.

Multidimensional Gaussian Law


Given X = [X1, …, Xn]ᵀ, where the Xi are independent Gaussian random variables,
Xi ∼ 𝒩(μi, σi²), the probability density function of X is given by

   fX(x1, …, xn) = ∏_{i=1}^{n} fXi(xi)

In general, the probability density function of a Gaussian random vector X is given by

   fX(x1, …, xn) = (1 / ( (det K)^{1/2} (2π)^{n/2} )) exp( −(1/2) (x − μ)ᵀ K⁻¹ (x − μ) )

where μ = mean vector of X and K is the covariance matrix of X.
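Both forms of the density can be compared numerically: for a diagonal K (independent components), the vector formula must equal the product of 1-D Gaussian pdfs. A NumPy sketch (the means, variances, and evaluation point are arbitrary choices):

```python
import numpy as np

def gaussian_pdf(x, mu, K):
    """Gaussian random-vector pdf: exp(-0.5 d^T K^{-1} d) / (det(K)^{1/2} (2 pi)^{n/2})."""
    n = len(mu)
    d = x - mu
    norm = np.sqrt(np.linalg.det(K)) * (2 * np.pi) ** (n / 2)
    return float(np.exp(-0.5 * d @ np.linalg.inv(K) @ d) / norm)

def gaussian_1d(x, mu, var):
    """One-dimensional Gaussian pdf with mean mu and variance var."""
    return float(np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var))

mu = np.array([0.0, 1.0])
K = np.diag([1.0, 4.0])      # diagonal K: independent components
x = np.array([0.5, 2.0])

joint = gaussian_pdf(x, mu, K)
product = gaussian_1d(0.5, 0.0, 1.0) * gaussian_1d(2.0, 1.0, 4.0)
print(joint, product)        # equal: the joint pdf factors
```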

Some Theorems
Theorem: Let X be an n-dimensional normal random vector with a positive definite
covariance matrix K and a mean vector μ. Let A be an m×n matrix of rank m.

Then, Y = AX has an m-dimensional normal pdf with positive definite covariance
matrix Q and mean vector ν given by
   Q = AKAᵀ
   ν = Aμ

Theorem: Let X be an n-dimensional normal random vector with a positive definite
covariance matrix K and a mean vector μ. Let U be the matrix consisting of the
eigenvectors of K and Z be the diagonal matrix consisting of the corresponding
eigenvalues on its diagonal. Then, Y defined as below is a normal random vector
with mean vector Aμ that consists of uncorrelated Gaussian random variables of
unit variance:
   Y = AX,
where A = Z^{−1/2} Uᵀ.
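The second theorem is the familiar whitening transform: with K = U Z Uᵀ, the matrix A = Z^{−1/2} Uᵀ makes cov(AX) = A K Aᵀ the identity. A NumPy sketch (the covariance matrix K is an arbitrary illustrative choice):

```python
import numpy as np

# A positive definite covariance matrix.
K = np.array([[2.0, 0.8],
              [0.8, 1.0]])

eigvals, U = np.linalg.eigh(K)        # K = U @ diag(eigvals) @ U.T
A = np.diag(eigvals ** -0.5) @ U.T    # A = Z^{-1/2} U^T

Q = A @ K @ A.T                       # covariance of Y = A X (theorem 1: Q = A K A^T)
print(Q)                              # identity matrix (up to rounding): Y is whitened
```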
