
Probabilistic Machine Learning (2):

Probability Basics (Contd)

CS771: Introduction to Machine Learning


Piyush Rai
Expectation
 Expectation of a random variable $X$ tells us the expected or average value it takes

 Expectation of a discrete random variable $X$ having PMF $p(x)$:

$$\mathbb{E}[X] = \sum_{x \in S_X} x \, p(x)$$

(here $p(x)$ is the probability that $X = x$, and $S_X$ is the support of $X$)
 Expectation of a continuous random variable $X$ having PDF $p(x)$:

$$\mathbb{E}[X] = \int_{x \in S_X} x \, p(x) \, dx$$

(here $p(x)$ is the probability density at $X = x$; note that this expectation is w.r.t. the distribution of the r.v.)
 The definition applies to functions of the r.v. too (e.g., $\mathbb{E}[f(X)]$ for some function $f$), but do keep in mind the underlying distribution

 Expectation is always w.r.t. the prob. dist. of the r.v. and is often written as $\mathbb{E}_{p(x)}[X]$; the subscript is often omitted (a small numerical sketch follows below)
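A minimal numerical sketch of the discrete case (not from the slides; NumPy and the fair six-sided die example are assumptions made for illustration):

```python
# E[X] for a discrete r.v., computed directly from its PMF.
# Example r.v. (assumed): a fair six-sided die.
import numpy as np

support = np.arange(1, 7)            # S_X = {1, 2, ..., 6}
pmf = np.full(6, 1.0 / 6.0)          # p(x) = 1/6 for every x in S_X

expectation = np.sum(support * pmf)  # E[X] = sum over x of x * p(x)
print(expectation)                   # 3.5
```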
Expectation: A Few Rules
 Expectation of the sum of two r.v.'s: $\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]$ ($X$ and $Y$ need not even be independent, and can be discrete or continuous)


 Proof is as follows (for the discrete case; replace sums by integrals for the continuous case)
 Define $Z = X + Y$; then

$$\mathbb{E}[Z] = \sum_{x, y} (x + y) \, p(x, y) = \sum_x x \sum_y p(x, y) + \sum_y y \sum_x p(x, y) = \sum_x x \, p(x) + \sum_y y \, p(y) = \mathbb{E}[X] + \mathbb{E}[Y]$$

(used the rule of marginalization of the joint dist. of two r.v.'s: $\sum_y p(x, y) = p(x)$ and $\sum_x p(x, y) = p(y)$); a Monte Carlo check of this rule is sketched below
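A quick Monte Carlo check of the sum rule (a sketch, not from the slides; NumPy and the particular dependent pair of r.v.'s below are assumptions chosen for illustration):

```python
# Monte Carlo check: E[X + Y] = E[X] + E[Y], even though Y is
# deliberately constructed to depend on X.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 2.0 * x + rng.normal(size=100_000)   # Y depends on X by construction

print(np.mean(x + y))                    # direct estimate of E[X + Y]
print(np.mean(x) + np.mean(y))           # matches up to Monte Carlo error
```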

Expectation: A Few Rules (Contd)

 Expectation of a scaled r.v.: $\mathbb{E}[aX] = a \, \mathbb{E}[X]$ ($a$ is a real-valued scalar)

 Linearity of expectation: $\mathbb{E}[aX + b] = a \, \mathbb{E}[X] + b$ ($a$ and $b$ are real-valued scalars)

 (More general) linearity of expectation: $\mathbb{E}[a f(X) + b g(X)] = a \, \mathbb{E}[f(X)] + b \, \mathbb{E}[g(X)]$ ($f$ and $g$ are arbitrary functions)
 Expectation of the product of two independent r.v.'s: $\mathbb{E}[XY] = \mathbb{E}[X] \, \mathbb{E}[Y]$
 Law of the Unconscious Statistician (LOTUS): Given an r.v. $X$ with a known prob. dist. $p(x)$ and another random variable $Y = g(X)$ for some function $g$,

$$\mathbb{E}[Y] = \mathbb{E}[g(X)] = \sum_{y \in S_Y} y \, p(y) = \sum_{x \in S_X} g(x) \, p(x)$$

(the middle expression requires finding $p(y)$; the rightmost requires only $p(x)$, which we already have; LOTUS is also applicable for continuous r.v.'s, and a numerical check is sketched below)

 Rule of iterated expectation: $\mathbb{E}[X] = \mathbb{E}_{p(y)}\left[\mathbb{E}_{p(x|y)}[X \mid Y]\right]$
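A small numerical check of LOTUS (a sketch; the toy support, PMF, and $g(x) = x^2$ below are assumed for illustration): the derive-the-PMF-of-$Y$ route and the LOTUS route give the same answer.

```python
# LOTUS check on a toy discrete r.v. (support, PMF, and g chosen arbitrarily).
import numpy as np

support = np.array([-1, 0, 1, 2])       # S_X
pmf = np.array([0.1, 0.4, 0.3, 0.2])    # p(x)
g = lambda x: x ** 2                    # Y = g(X) = X^2

# LOTUS route: needs only p(x), which we already have
e_lotus = np.sum(g(support) * pmf)

# Direct route: first derive the PMF of Y, then compute E[Y] = sum_y y * p(y)
e_direct = sum(y * pmf[g(support) == y].sum() for y in np.unique(g(support)))

print(e_lotus, e_direct)                # both print 1.2
```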

Variance and Covariance
 Variance of a scalar r.v. $X$ tells us about its spread around its mean value $\mu = \mathbb{E}[X]$

$$\text{var}[X] = \mathbb{E}[(X - \mu)^2] = \mathbb{E}[X^2] - \mu^2$$

 Standard deviation is simply the square root of the variance: $\text{std}[X] = \sqrt{\text{var}[X]}$


 For two scalar r.v.'s $X$ and $Y$, the covariance is defined by

$$\text{cov}[X, Y] = \mathbb{E}\left[\{X - \mathbb{E}[X]\}\{Y - \mathbb{E}[Y]\}\right] = \mathbb{E}[XY] - \mathbb{E}[X] \, \mathbb{E}[Y]$$

 For two vector r.v.'s $X$ and $Y$ (assume column vectors), the covariance matrix is defined by

$$\text{cov}[X, Y] = \mathbb{E}\left[\{X - \mathbb{E}[X]\}\{Y^\top - \mathbb{E}[Y^\top]\}\right] = \mathbb{E}[XY^\top] - \mathbb{E}[X] \, \mathbb{E}[Y^\top]$$

 Cov. of the components of a vector r.v. $X$: $\text{cov}[X] = \text{cov}[X, X] = \mathbb{E}[XX^\top] - \mathbb{E}[X] \, \mathbb{E}[X]^\top$


 Note: The definitions apply to functions of r.v.'s too (e.g., $\text{cov}[f(X), g(Y)]$)

 Note: Variance of the sum of independent r.v.'s (an important result): $\text{var}[X + Y] = \text{var}[X] + \text{var}[Y]$, checked numerically in the sketch below
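A quick numerical check of this result (a sketch, not from the slides; the standard normal and uniform r.v.'s below are assumed examples):

```python
# Check: var[X + Y] = var[X] + var[Y] for independent X and Y.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200_000)     # var[X] = 1
y = rng.uniform(size=200_000)    # independent of x; var[Y] = 1/12

print(np.var(x + y))             # approx 1 + 1/12
print(np.var(x) + np.var(y))     # matches up to Monte Carlo error
```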


Transformation of Random Variables
 Suppose $Y = AX + b$ is a linear function of a vector-valued r.v. $X$ ($A$ is a matrix and $b$ is a vector, both constants)
 Suppose $\mathbb{E}[X] = \mu$ and $\text{cov}[X] = \Sigma$; then, for the vector-valued r.v. $Y$,

$$\mathbb{E}[Y] = \mathbb{E}[AX + b] = A\mu + b$$
$$\text{cov}[Y] = \text{cov}[AX + b] = A \Sigma A^\top$$
 Likewise, suppose $Y = a^\top X + b$ is a linear function of a vector-valued r.v. $X$ ($a$ is a vector and $b$ is a scalar, both constants)
 Suppose $\mathbb{E}[X] = \mu$ and $\text{cov}[X] = \Sigma$; then, for the scalar-valued r.v. $Y$,

$$\mathbb{E}[Y] = \mathbb{E}[a^\top X + b] = a^\top \mu + b$$
$$\text{var}[Y] = \text{var}[a^\top X + b] = a^\top \Sigma a$$

(both results are verified numerically in the sketch below)
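A sampling-based check of both transformation rules (a sketch; the values of $\mu$, $\Sigma$, $A$, $a$, and $b$ below are arbitrary assumptions):

```python
# Verify E[AX + b] = A mu + b, cov[AX + b] = A Sigma A^T, and
# var[a^T X] = a^T Sigma a by sampling a Gaussian X.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])                  # E[X]
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])              # cov[X]
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
a = np.array([1.0, -1.0])
b = np.array([0.5, -1.0])

X = rng.multivariate_normal(mu, Sigma, size=500_000)  # one sample per row
Y = X @ A.T + b                                       # Y = AX + b, per sample

print(Y.mean(axis=0), A @ mu + b)        # empirical vs. exact mean
print(np.cov(Y.T), A @ Sigma @ A.T)      # empirical vs. exact covariance
print(np.var(X @ a), a @ Sigma @ a)      # scalar case (b does not affect var)
```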

Common Probability Distributions
Important: We will use these extensively to model data as well as parameters of models
 Some common discrete distributions and what they can model
 Bernoulli: Binary numbers, e.g., outcome (head/tail, 0/1) of a coin toss
 Binomial: Bounded non-negative integers, e.g., # of heads in $N$ coin tosses
 Multinomial/multinoulli: One of $K$ (> 2) possibilities, e.g., outcome of a die roll
 Poisson: Non-negative integers, e.g., # of words in a document
 Some common continuous distributions and what they can model (a sampling sketch for both the discrete and continuous distributions follows this list)
 Uniform: Numbers defined over a fixed range
 Beta: Numbers between 0 and 1, e.g., probability of heads for a biased coin
 Gamma: Positive unbounded real numbers
 Dirichlet: Vectors of non-negative numbers that sum to 1 (e.g., fractions of data points in different clusters)
 Gaussian: Real-valued numbers or real-valued vectors
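A minimal sampling sketch covering the distributions above (NumPy's Generator API; all parameter values are arbitrary assumptions, chosen only to show the shape of each call):

```python
# Drawing samples from each of the listed distributions.
import numpy as np

rng = np.random.default_rng(0)

rng.binomial(n=1, p=0.7, size=5)             # Bernoulli (= Binomial, n=1)
rng.binomial(n=10, p=0.5, size=5)            # Binomial: # heads in 10 tosses
rng.multinomial(n=1, pvals=[0.2, 0.5, 0.3])  # Multinoulli: one of K outcomes
rng.poisson(lam=4.0, size=5)                 # Poisson: non-negative integers
rng.uniform(low=0.0, high=2.0, size=5)       # Uniform over a fixed range
rng.beta(a=2.0, b=5.0, size=5)               # Beta: numbers in (0, 1)
rng.gamma(shape=2.0, scale=1.0, size=5)      # Gamma: positive reals
rng.dirichlet(alpha=[1.0, 1.0, 1.0])         # Dirichlet: vector summing to 1
rng.normal(loc=0.0, scale=1.0, size=5)       # Gaussian: real numbers
```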

Coming up next
 Probabilistic Modeling
 Basics of parameter estimation for probabilistic models
