
Probability Theory: Review

Anubha Gupta, PhD.


Professor
SBILab, Dept. of ECE,
IIIT-Delhi, India
Contact: anubha@iiitd.ac.in; Lab: http://sbilab.iiitd.edu.in
Machine Learning in Hindi

Probability Theory: Review


(Random Variables)

Learning Objectives
● Definition of a random variable
● Types: discrete and continuous
● Probability Distribution Function (PDF) OR Cumulative Distribution Function (CDF)
○ Examples
● Probability Density Function (pdf)
● Examples of famous random variables
● Expectation & Variance
● Joint distribution and density functions
● Central and non-central moments
● Independence, uncorrelatedness, orthogonality of random variables
● Central limit theorem
● Random vector: mean vector, covariance matrix

Reference text: Stark, Henry, and John William Woods. Probability and Random Processes with Applications to Signal Processing. 2002.

Random Variable
X: Ω → ℝ

What is random here?


X is actually not a variable but a function (a mapping) whose domain is the sample space Ω and whose range is the real line ℝ.
Consider the experiment: Tossing of a coin; X: {H} → 1; X: {T} → 0

If the coin is fair (unbiased or not biased to generate any specific outcome)
P[X=1]=P[H]=0.5; P[X=0]=P[T]= 0.5

Compute:
(a) P[X < 0] event is: {X < 0}
(b) P[X ≤ 0] event is: {X ≤ 0}
(c) P[X ≤ 0.4] event is: {X ≤ 0.4}
(d) P[X ≤ 1] event is: {X ≤ 1}
Or, an event is specified as {X ≤ x}

Random Variable
X: (Ω, F, P) → (ℝ, B, P_X)

Sigma field: F = {∅, {H}, {T}, Ω}

Borel field: B = the Borel subsets of the real line
P = probability measure associated with the events in F.
P_X = probability measure associated with the events in B.

X is a random variable (r.v.) only if the inverse image under X of every Borel subset of ℝ making up the Borel field B is a valid event in F.
Or, X⁻¹(B) = E_B = a valid event in F, where B ∈ B.

Let us Try!
A bus arrives at random in [0, T]. Let t be the time of arrival of the bus. The bus is equally likely to
arrive at any time t in the interval [0, T], i.e., the arrival time is uniformly distributed over [0, T].

Example 1: Define a discrete (binary) random variable X ∈ {0, 1} as below:

Define X(t) = 1 if t ∈ [T/4, T/2]
              0 otherwise

(a) P[X(t) = 1] =
(b) P[X(t) = 0] =
(c) P[X(t) ≤ 2] =

Example 2: Define a continuous random variable X ∈ [0, T] as below:
Define X = time of arrival t

(a) P[X ≤ 0] =
(b) P[X ≤ T] =
(c) P[X > T] =

Probability Distribution Function (PDF) or


Cumulative Distribution Function (CDF)
This is a point-wise function and is defined as

FX(x) = P[X ≤ x] = P[X ∈ (−∞, x]]

where X is the notation of the random variable and x is the value taken by the random variable.

Properties of a PDF (CDF):

1) FX(∞) = 1; FX(−∞) = 0

2) If x1 ≤ x2, then FX(x1) ≤ FX(x2),
   i.e., FX(x) is a non-decreasing function of x.

3) FX(x) is continuous from the right, i.e.,
   FX(x) = lim(ε→0) FX(x + ε), where ε > 0.
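The CDF properties above can be checked empirically. A minimal sketch in Python (an illustration, not part of the slides; the seed and sample count are arbitrary choices), using the fair-coin random variable X from the earlier slide:

```python
import random

# Sample the fair-coin random variable: X = 1 for heads, 0 for tails.
random.seed(0)
samples = [random.randint(0, 1) for _ in range(100_000)]

def empirical_cdf(x, data):
    """Fraction of samples <= x, which estimates F_X(x) = P[X <= x]."""
    return sum(1 for s in data if s <= x) / len(data)

# Non-decreasing, 0 at -inf, 1 at +inf, with jumps only at 0 and 1.
print(empirical_cdf(-1, samples))   # 0.0: no mass below 0
print(empirical_cdf(0.4, samples))  # ~0.5: only the event {X = 0}
print(empirical_cdf(1, samples))    # 1.0: all mass captured
```

Note that the empirical CDF at x = 0.4 equals the value at x = 0, mirroring the step shape of a discrete CDF.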

Examples of CDF
A bus arrives at random in [0, T]. Let t be the time of arrival of the bus. The bus is equally likely to
arrive at any time t in the interval [0, T], i.e., the arrival time is uniformly distributed over [0, T].

Example 1: Define a discrete (binary) random variable X ∈ {0, 1} as below:

Define X(t) = 1 if t ∈ [T/4, T/2]
              0 otherwise
Compute and plot the CDF of the above random variable.

Examples of CDF
A bus arrives at random in [0, T]. Let t be the time of arrival of the bus. The bus is equally likely to
arrive at any time t in the interval [0, T], i.e., the arrival time is uniformly distributed over [0, T].

Example 1: Define a discrete (binary) random variable X ∈ {0, 1} as below:

Define X(t) = 1 if t ∈ [T/4, T/2]
              0 otherwise
Compute and plot the CDF of the above random variable.
Solution: Since the bus is equally likely to arrive at any time within [0, T],
P[X = 1] = (T/2 − T/4)/T = 1/4 and P[X = 0] = 3/4.

1) Case 1: −∞ < x < 0
   FX(x) = P[X ≤ x] = 0, because no value of X lies in (−∞, x)
   (X can take only the two values 0 and 1).

2) Case 2: 0 ≤ x < 1
   FX(x) = P[X ≤ x] = P[X = 0] = 3/4.

3) Case 3: 1 ≤ x < ∞
   FX(x) = P[X ≤ x] = P[X = 0] + P[X = 1] = 1;
   hence, P[X ≤ ∞] = … = P[X ≤ 1] = 1.
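The probabilities used in this solution can be estimated by simulating the experiment. A sketch in Python (the value of T and the trial count are arbitrary choices for illustration):

```python
import random

# Simulate the bus arrival: t is uniform on [0, T]; X = 1 iff t is in [T/4, T/2].
random.seed(1)
T = 10.0
trials = 200_000

xs = [1 if T / 4 <= random.uniform(0, T) <= T / 2 else 0 for _ in range(trials)]

p1 = sum(xs) / trials   # estimates P[X = 1] = 1/4
p0 = 1 - p1             # estimates P[X = 0] = 3/4
print(p1, p0)           # ~0.25, ~0.75
```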

Examples of CDF
A bus arrives at random in [0, T]. Let t be the time of arrival of the bus. The bus is equally likely to
arrive at any time t in the interval [0, T], i.e., the arrival time is uniformly distributed over [0, T].

Example 2: Define a continuous random variable X as the time of arrival t. Compute and plot the
CDF of the above random variable.

Examples of CDF
A bus arrives at random in [0, T]. Let t be the time of arrival of the bus. The bus is equally likely to
arrive at any time t in the interval [0, T], i.e., the arrival time is uniformly distributed over [0, T].

Example 2: Define a continuous random variable X as the time of arrival t. Compute and plot the
CDF of the above random variable.

Solution:
1) For t < 0:
   FX(t) = P[X ≤ t] = 0, because the bus arrives only at times t ≥ 0;
   in particular, FX(−∞) = 0.

2) For 0 ≤ t ≤ T:
   FX(t) = P[X ≤ t] = t/T, because the arrival time is uniform over [0, T].

3) For T ≤ t < ∞:
   FX(t) = P[X ≤ t] = 1, because the bus must have arrived by time T;
   hence, P[X ≤ ∞] = … = P[X ≤ T] = 1.
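The piecewise CDF above can be verified against an empirical CDF of simulated arrival times. A Python sketch (T and the sample count are arbitrary illustrative choices):

```python
import random

# Uniform arrival times on [0, T]; the CDF should satisfy F_X(t) = t/T there.
random.seed(2)
T = 8.0
arrivals = [random.uniform(0, T) for _ in range(200_000)]

def F_hat(t):
    """Empirical CDF: fraction of arrival times <= t."""
    return sum(1 for a in arrivals if a <= t) / len(arrivals)

for t in (0.0, T / 4, T / 2, T):
    print(t, F_hat(t), t / T)   # empirical value vs. theoretical t/T
```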

Probability Density Function (pdf)


If FX(x) is continuous and differentiable, then the probability density function is defined as

   fX(x) = d FX(x) / dx

The pdf, if it exists, satisfies the following properties:

1. fX(x) ≥ 0

2. FX(x) = ∫_{−∞}^{x} fX(ξ) dξ

   FX(∞) = 1 ⟹ ∫_{−∞}^{∞} fX(ξ) dξ = 1, i.e., the area under the pdf = 1

3. P[x1 < X ≤ x2] = ∫_{x1}^{x2} fX(ξ) dξ
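Property 2 (unit area under the pdf) can be checked numerically for any concrete pdf. A sketch in Python using the exponential pdf as an example (the rate value and the integration grid are arbitrary choices):

```python
import math

# Exponential pdf f(x) = lam * exp(-lam * x) for x >= 0.
lam = 2.0
f = lambda x: lam * math.exp(-lam * x)

# Trapezoidal rule on [0, 20]; the tail beyond 20 is negligible for lam = 2.
n, a, b = 200_000, 0.0, 20.0
h = (b - a) / n
area = h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))
print(area)   # ~1.0, the area under the pdf
```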

Mean and Variance of a Random Variable

• Mean (μ): E[X] = μ = ∫_{−∞}^{∞} x fX(x) dx

• Variance (σ²):

   σ² = E[(X − μX)²] = ∫_{−∞}^{∞} (x − μX)² fX(x) dx

• Standard deviation (σ):

   σ = √( E[(X − μX)²] )

Examples of Continuous Random Variable


1. Uniformly distributed random variable

The pdf is given by

   fX(x) = 1/(x2 − x1),  x1 < x ≤ x2
           0,            otherwise

• Mean = (x1 + x2)/2
• Variance = (x2 − x1)²/12

Check that the area under the pdf is 1:

   ∫_{−∞}^{∞} fX(x) dx = ∫_{x1}^{x2} 1/(x2 − x1) dx = [x]_{x1}^{x2} / (x2 − x1) = (x2 − x1)/(x2 − x1) = 1

(Plot: x1 = 2, x2 = 3, number of samples = 10 lakh.)
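The mean and variance formulas can be checked against samples, matching the x1 = 2, x2 = 3, 10-lakh-sample setting quoted for the plot. A Python sketch (the seed is an arbitrary choice):

```python
import random
import statistics

# Draw 10 lakh (1,000,000) samples from Uniform(x1, x2).
random.seed(3)
x1, x2 = 2.0, 3.0
data = [random.uniform(x1, x2) for _ in range(1_000_000)]

print(statistics.fmean(data))      # ~2.5    = (x1 + x2)/2
print(statistics.pvariance(data))  # ~0.0833 = (x2 - x1)^2 / 12
```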

Examples of Continuous Random Variable


2. Gaussian (Normal) distributed random variable

The pdf is given by

   fX(x) = (1/√(2πσ²)) e^{−(x − μ)²/(2σ²)}

where
   μ : mean of the Gaussian random variable
   σ : standard deviation of the Gaussian random variable
   σ² : variance of the Gaussian random variable

(Plot: X ∼ 𝒩(0, σ²), number of samples = 1,000.)

Examples of Continuous Random Variable


• Since a Gaussian random variable is symmetric about the mean, the area under fX(x) splits equally:

   ∫_{−∞}^{μ} fX(x) dx = 1/2   (−∞ < x ≤ μ)
   ∫_{μ}^{∞} fX(x) dx = 1/2   (μ < x < ∞)

• Mean (μ): E[X] = μ = ∫_{−∞}^{∞} x fX(x) dx

• Mode (Mo[X]): the value of x at which d fX(x)/dx = 0

• Median (xm): ∫_{−∞}^{xm} fX(x) dx = ∫_{xm}^{∞} fX(x) dx

For a Gaussian random variable, mean = mode = median = μ.

Let us define Y = (X − μ)/σ ∼ 𝒩(0, 1). Then

   fY(y) = (1/√(2π)) e^{−y²/2}

(Plot: number of samples = 1,000.)

Examples of Continuous Random Variable


The 68-95-99.7 Rule

• 68% of the population lies within 1 standard deviation of the mean.
• 95% of the population lies within 2 standard deviations of the mean.
• 99.7% of the population lies within 3 standard deviations of the mean.

(Plot: μ = 0, σ² = 1.)
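The 68-95-99.7 rule is easy to confirm on standard-normal samples. A Python sketch (the seed and sample count are arbitrary choices):

```python
import random

# Draw standard-normal samples and measure the mass within k standard deviations.
random.seed(4)
zs = [random.gauss(0.0, 1.0) for _ in range(500_000)]

fracs = {k: sum(1 for z in zs if abs(z) <= k) / len(zs) for k in (1, 2, 3)}
print(fracs)   # ~{1: 0.683, 2: 0.954, 3: 0.997}
```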

Examples of Continuous Random Variable


3. Rayleigh distributed random variable

The pdf is given by

   fX(x) = (x/σ²) e^{−x²/(2σ²)},  x > 0
           0,                     otherwise

where the scale parameter σ > 0.

• Mean = σ √(π/2)
• Variance = σ² (2 − π/2)
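Rayleigh samples can be drawn by inverse-transform sampling, since the CDF F(x) = 1 − e^{−x²/(2σ²)} inverts in closed form. A Python sketch checking the mean formula (σ, the seed, and the sample count are arbitrary choices):

```python
import math
import random

# Inverse transform: if U ~ Uniform(0, 1), then
# X = sigma * sqrt(-2 ln(1 - U)) is Rayleigh with scale sigma.
random.seed(5)
sigma = 2.0
xs = [sigma * math.sqrt(-2.0 * math.log(1.0 - random.random()))
      for _ in range(500_000)]

mean = sum(xs) / len(xs)
print(mean, sigma * math.sqrt(math.pi / 2))   # both ~2.5066
```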

Examples of Continuous Random Variable


4. Laplacian distributed random variable

The pdf is given by

   fX(x) = (c/2) e^{−c|x|}

where the scale parameter c > 0.

Examples of Continuous Random Variable


5. Exponentially distributed random variable

The pdf is given by

   fX(x) = λ e^{−λx},  x ≥ 0
           0,          x < 0

where the rate parameter λ > 0.

• Mean = 1/λ
• Variance = 1/λ²
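The mean and variance of the exponential distribution can be checked with the standard library's `random.expovariate`, whose argument is the rate λ. A Python sketch (λ, the seed, and the sample count are arbitrary choices):

```python
import random
import statistics

# Sample from the exponential distribution with rate lam.
random.seed(6)
lam = 0.5
xs = [random.expovariate(lam) for _ in range(500_000)]

print(statistics.fmean(xs))      # ~2.0 = 1/lam
print(statistics.pvariance(xs))  # ~4.0 = 1/lam^2
```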

Examples of Discrete Random Variable


1. Bernoulli distributed random variable

The probability mass function (PMF) is given by

   f(x) = p,  x = 1
          q,  x = 0

where
   p : probability of success
   q = 1 − p : probability of failure

• Mean = p
• Variance = pq = p(1 − p)

Examples of Discrete Random Variable


2. Binomial distributed random variable

The PMF is given by

   P[X = k] = ⁿCₖ p^k q^{n−k},  k = 0, 1, …, n

where
   p : probability of success
   q : probability of failure
   n : number of trials
   k : number of successes

• p + q = 1
• Mean = np
• Variance = npq
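The binomial PMF can be computed directly with `math.comb` and checked against the facts above: the probabilities sum to 1 and the mean is np. A Python sketch (n and p are arbitrary choices):

```python
import math

# Binomial PMF: P[X = k] = C(n, k) p^k q^(n - k).
n, p = 10, 0.3
q = 1 - p
pmf = [math.comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

print(sum(pmf))                               # ~1.0 (total probability)
print(sum(k * pmf[k] for k in range(n + 1)))  # ~3.0 = n * p
```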

Examples of Discrete Random Variable


3. Poisson distributed random variable

The PMF is given by

   P[X = k] = e^{−a} a^k / k!

where
   k = 0, 1, 2, …
   a > 0

• Mean = a
• Variance = a
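The equality of the Poisson mean and variance can be checked by summing the PMF numerically; using the recurrence P[X = k+1] = P[X = k] · a/(k+1) avoids large factorials. A Python sketch (a and the truncation point are arbitrary choices; for a = 4 the tail beyond k = 60 is negligible):

```python
import math

# Build the Poisson(a) PMF iteratively via its recurrence.
a = 4.0
pmf = []
p = math.exp(-a)          # P[X = 0] = e^{-a}
for k in range(60):
    pmf.append(p)
    p *= a / (k + 1)      # P[X = k + 1] = P[X = k] * a / (k + 1)

mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
print(mean, var)   # both ~4.0: mean = variance = a
```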

Two Random Variables


Let there be two random variables X and Y.
Then,
• fXY(x, y): joint density function
• FXY(x, y): joint distribution function

   FXY(∞, ∞) = 1
   FXY(∞, y) = FY(y)   [marginal distribution function of Y]
   FXY(x, ∞) = FX(x)   [marginal distribution function of X]
   FXY(−∞, y) = 0
   FXY(x, −∞) = 0

• fXY(x, y) = ∂²FXY(x, y) / (∂x ∂y)

Jointly Gaussian Random Variables


Correlated jointly Gaussian random variables (zero mean, common variance σ²):

   fXY(x, y) = (1 / (2πσ² √(1 − ρ²))) exp( −(x² + y² − 2ρxy) / (2σ²(1 − ρ²)) )

where ρ is the correlation coefficient, i.e.,

   ρ = E[(X − μX)(Y − μY)] / √( E[(X − μX)²] E[(Y − μY)²] )
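The correlation coefficient ρ can be estimated from samples exactly as the formula reads. A Python sketch that builds correlated Gaussian pairs via Y = ρX + √(1 − ρ²)W (a standard construction; ρ, the seed, and the sample count are arbitrary choices):

```python
import math
import random

# X, W independent standard normals; Y = rho*X + sqrt(1 - rho^2)*W has
# unit variance and correlation rho with X.
random.seed(7)
rho = 0.6
pairs = []
for _ in range(300_000):
    x = random.gauss(0, 1)
    w = random.gauss(0, 1)
    pairs.append((x, rho * x + math.sqrt(1 - rho**2) * w))

mx = sum(x for x, _ in pairs) / len(pairs)
my = sum(y for _, y in pairs) / len(pairs)
cov = sum((x - mx) * (y - my) for x, y in pairs) / len(pairs)
vx = sum((x - mx) ** 2 for x, _ in pairs) / len(pairs)
vy = sum((y - my) ** 2 for _, y in pairs) / len(pairs)

rho_hat = cov / math.sqrt(vx * vy)   # sample version of the formula for rho
print(rho_hat)                       # ~0.6
```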

Cases of Joint Random Variables


• Case 1: Uncorrelated random variables: E[XY] = E[X] E[Y]

• Case 2: Statistically independent random variables [S.I.R.V.]:

   Proof:
   If A and B are two independent events, then
   P[AB] = P[A ∩ B] = P[A] P[B]

   Correspondingly,
   FXY(x, y) = FX(x) FY(y)
   ∬_{−∞}^{(x, y)} fXY(ξ, η) dξ dη = ∫_{−∞}^{x} fX(ξ) dξ · ∫_{−∞}^{y} fY(η) dη
   fXY(x, y) = fX(x) fY(y)

• Case 3: Orthogonal random variables: E[XY] = 0

• Case 4: Independent and identically distributed random variables

Jointly Gaussian Random Variables


Let there be two jointly Gaussian random variables X ∼ 𝒩(0, σ²) and Y ∼ 𝒩(0, σ²) with ρ = 0.
Then,
• Joint distribution function: FXY(x, y)
• Joint density function:

   fXY(x, y) = (1/(2πσ²)) e^{−(x² + y²)/(2σ²)}
             = (1/√(2πσ²)) e^{−x²/(2σ²)} · (1/√(2πσ²)) e^{−y²/(2σ²)}
             = fX(x) fY(y)

Thus, X and Y are jointly Gaussian random variables and, since the joint density factors, they are statistically independent (S.I.) random variables.

I.I.D. (independent and identically distributed) random variables: having identical density functions and being statistically independent.

Central and Non-Central Moments

• Central moments: E[(X − μ)], E[(X − μ)²], …, E[(X − μ)^i]
  where

   E[(X − μ)^i] = ∫_{−∞}^{∞} (x − μ)^i fX(x) dx

• Non-central moments: E[X], E[X²], …, E[X^j]

Central Limit Theorem


The normalized sum of a large number of independent random variables X1, X2, X3, …, Xn,
each with mean zero and finite variance σ1², σ2², …, σn², tends to be a normal random
variable, provided that the individual variances σk², k = 1, 2, …, n, are small compared
with sn², where sn² = Σ_{i=1}^{n} σi². This condition on the variances is also called the
Lindeberg condition.

Define

   Zn ≜ (X1 + X2 + ⋯ + Xn) / sn

If ε > 0, n is sufficiently large, and each of the σk satisfies

   σk < ε sn  for k = 1, 2, …, n,

then Zn → 𝒩(0, 1) in distribution.
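The theorem can be illustrated numerically: the normalized sum of i.i.d. zero-mean uniforms already looks standard normal for moderate n. A Python sketch (n, the seed, and the repetition count are arbitrary choices; Uniform(−1, 1) has variance 1/3, so sn² = n/3):

```python
import random
import statistics

# Z_n = (X_1 + ... + X_n) / s_n for X_i ~ Uniform(-1, 1), s_n = sqrt(n/3).
random.seed(8)
n = 50
s_n = (n / 3) ** 0.5
zs = [sum(random.uniform(-1, 1) for _ in range(n)) / s_n
      for _ in range(50_000)]

print(statistics.fmean(zs))      # ~0.0: mean of N(0, 1)
print(statistics.pvariance(zs))  # ~1.0: variance of N(0, 1)
frac = sum(1 for z in zs if abs(z) <= 1) / len(zs)
print(frac)                      # ~0.68, matching the Gaussian 68% rule
```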

Vector Random Variables


X = [X1, X2, …, Xn]ᵀ = n-dimensional random vector with PDF FX(x)

where FX(x) = P[X1 ≤ x1; X2 ≤ x2; …; Xn ≤ xn]
            = P[X ≤ x]
Or, FX(x) = F_{X1 X2 … Xn}(x1, x2, …, xn)

1) P[X ≤ ∞] = P[X1 ≤ ∞; X2 ≤ ∞; …; Xn ≤ ∞] = 1.

2) P[X1 ≤ −∞; X2 ≤ x2; …; Xn ≤ xn] = P(∅ ∩ A2 ∩ … ∩ An) = P(∅) = 0
   Or, P[X1 ≤ x1; X2 ≤ −∞; …; Xn ≤ xn] = P(A1 ∩ ∅ ∩ … ∩ An) = P(∅) = 0
   Or, P[X1 ≤ x1; X2 ≤ x2; …; Xn ≤ −∞] = P(A1 ∩ A2 ∩ … ∩ ∅) = P(∅) = 0

   Any of the above expressions has probability equal to zero: setting any one
   argument to −∞ makes the corresponding event empty.

Vector Random Variables


   fX(x) = ∂ⁿFX(x) / (∂x1 ∂x2 … ∂xn)

   FX(x) = ∫_{−∞}^{x} fX(x′) dx′   (n-fold integral)

Mean of a random vector:

   E[X] = μ_X = [μ1, μ2, …, μn]ᵀ = mean vector

where, e.g.,

   μ1 = ∫⋯∫_{−∞}^{∞} x1 fX(x) dx1 … dxn

Covariance matrix of a random vector


Covariance matrix of a random vector X = [X1, …, Xn]ᵀ:

   K = E[(X − μ)(X − μ)ᵀ]

i.e., the (i, j) entry of K is Kij = E[(Xi − μi)(Xj − μj)], with Kii = σi²:

       ⎡ σ1²  K12  …  K1n ⎤
   K = ⎢ K21  σ2²  …  K2n ⎥
       ⎢  ⋮    ⋮   ⋱   ⋮  ⎥
       ⎣ Kn1  Kn2  …  σn² ⎦

For complex random variables, K = E[(X − μ)(X − μ)ᴴ], where H denotes the
conjugate transpose operation.
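The sample version of K = E[(X − μ)(X − μ)ᵀ] can be computed directly from data. A NumPy sketch (the true covariance, mean, seed, and sample count are arbitrary choices for illustration):

```python
import numpy as np

# Draw samples from a 2-D Gaussian with known covariance, then estimate K.
rng = np.random.default_rng(0)
K_true = np.array([[2.0, 0.8],
                   [0.8, 1.0]])
X = rng.multivariate_normal([1.0, -1.0], K_true, size=200_000)

mu = X.mean(axis=0)                       # sample mean vector
K_hat = (X - mu).T @ (X - mu) / len(X)    # sample covariance matrix

print(K_hat)                              # close to K_true
print(np.allclose(K_hat, K_hat.T))        # symmetric, as property 1 states
print(np.linalg.eigvalsh(K_hat))          # eigenvalues: all > 0 here (p.s.d.)
```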

Properties of a covariance matrix


1) A covariance matrix K is always a symmetric matrix.

   For real-valued random variables, K = Kᵀ.
   For complex-valued random variables, K is Hermitian, i.e., K = Kᴴ.

   Verify that K21 = K12.

2) A covariance matrix is at least positive semi-definite (p.s.d.).

3) A real symmetric covariance matrix K is similar to a diagonal matrix Λ, i.e.,

   U⁻¹KU = Λ,

   where the columns of U contain the ordered orthonormal unit eigenvectors
   φ1, φ2, …, φn of K, and Λ holds the corresponding eigenvalues.

Multidimensional Gaussian Law


Given X = [X1, …, Xn]ᵀ, where the Xi are independent Gaussian random variables,
Xi ∼ 𝒩(μi, σi²), the probability density function of X is given by

   fX(x1, …, xn) = ∏_{i=1}^{n} fXi(xi)

In general, the probability density function of a Gaussian random vector X is given by

   fX(x1, …, xn) = (1 / ( (det K)^{1/2} (2π)^{n/2} )) exp( −(1/2) (x − μ)ᵀ K⁻¹ (x − μ) )

where μ = mean vector of X and K is the covariance matrix of X.
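Both forms of the density can be compared numerically: for a diagonal K (independent components), the vector formula must equal the product of 1-D Gaussian pdfs. A NumPy sketch (the means, variances, and evaluation point are arbitrary choices):

```python
import numpy as np

def gaussian_pdf(x, mu, K):
    """Gaussian random-vector pdf: exp(-0.5 d^T K^{-1} d) / (det(K)^{1/2} (2 pi)^{n/2})."""
    n = len(mu)
    d = x - mu
    norm = np.sqrt(np.linalg.det(K)) * (2 * np.pi) ** (n / 2)
    return float(np.exp(-0.5 * d @ np.linalg.inv(K) @ d) / norm)

def gaussian_1d(x, mu, var):
    """One-dimensional Gaussian pdf with mean mu and variance var."""
    return float(np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var))

mu = np.array([0.0, 1.0])
K = np.diag([1.0, 4.0])      # diagonal K: independent components
x = np.array([0.5, 2.0])

joint = gaussian_pdf(x, mu, K)
product = gaussian_1d(0.5, 0.0, 1.0) * gaussian_1d(2.0, 1.0, 4.0)
print(joint, product)        # equal: the joint pdf factors
```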

Some Theorems
Theorem: Let X be an n-dimensional normal random vector with a positive definite
covariance matrix K and a mean vector μ. Let A be an m×n matrix of rank m.

Then, Y = AX has an m-dimensional normal pdf with positive definite covariance
matrix Q and mean vector ν given by
   Q = AKAᵀ
   ν = Aμ

Theorem: Let X be an n-dimensional normal random vector with a positive definite
covariance matrix K and a mean vector μ. Let U be the matrix consisting of the
eigenvectors of K and Z be the diagonal matrix consisting of the corresponding
eigenvalues on its diagonal. Then, Y defined as below is a normal random vector
with mean vector Aμ that consists of uncorrelated Gaussian random variables of
unit variance:
   Y = AX,
where A = Z^{−1/2} Uᵀ.
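The second theorem is the familiar whitening transform: with K = U Z Uᵀ, the matrix A = Z^{−1/2} Uᵀ makes cov(AX) = A K Aᵀ the identity. A NumPy sketch (the covariance matrix K is an arbitrary illustrative choice):

```python
import numpy as np

# A positive definite covariance matrix.
K = np.array([[2.0, 0.8],
              [0.8, 1.0]])

eigvals, U = np.linalg.eigh(K)        # K = U @ diag(eigvals) @ U.T
A = np.diag(eigvals ** -0.5) @ U.T    # A = Z^{-1/2} U^T

Q = A @ K @ A.T                       # covariance of Y = A X (theorem 1: Q = A K A^T)
print(Q)                              # identity matrix (up to rounding): Y is whitened
```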
