
EEL 6935 Data Analytics

LECTURE 2

Probability Theory

Jan. 11, 2018


Uncertainty & Probability
• Uncertainty in data:
• inherent in the observed physical process (e.g., voltage measurement in power grid, # customers in a market)
• noise in measurement (e.g., hardware/software limitations)
• finite data size (i.e., lack of access to the entire population)
• Probability:
• a consistent framework for quantification and manipulation of uncertainty
• helps in decision making (e.g., the way our brains reason under uncertainty)
• Frequentist probability ~ frequency of observations
• marginal, joint, and conditional probabilities (illustrated by a 2×2 table of counts $n_{xy}$ for $X \in \{r, b\}$ and $Y \in \{o, g\}$, with row/column totals $n_x$, $n_y$ and grand total $n$):

$p(Y = o) = \dfrac{n_o}{n}, \qquad p(X = r, Y = g) = \dfrac{n_{rg}}{n}, \qquad p(Y = g \mid X = b) = \dfrac{n_{bg}}{n_b}$

• Sum rule: $p(X) = \sum_{Y} p(X, Y)$

• Product rule: $p(X, Y) = p(Y \mid X)\, p(X)$

• Bayes' theorem: $p(X \mid Y) = \dfrac{p(Y \mid X)\, p(X)}{p(Y)} = \dfrac{p(Y \mid X)\, p(X)}{\sum_{X} p(Y \mid X)\, p(X)}$

• Example question: $p(X = b \mid Y = g) = \,?$
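As a concrete illustration of these rules, here is a small Python sketch that computes marginal, conditional, and posterior probabilities from a table of counts and applies Bayes' theorem; the counts in the 2×2 table are made up for illustration.

```python
import numpy as np

# Hypothetical counts n_xy for the 2x2 example above:
# rows index X in {r, b}, columns index Y in {o, g}; all counts are made up.
counts = np.array([[30, 10],    # n_ro, n_rg
                   [15, 45]])   # n_bo, n_bg
n = counts.sum()

joint = counts / n                    # p(X, Y) = n_xy / n
p_X = joint.sum(axis=1)               # sum rule: p(X) = sum_Y p(X, Y)
p_Y = joint.sum(axis=0)               # sum rule: p(Y) = sum_X p(X, Y)
p_Y_given_X = joint / p_X[:, None]    # from the product rule: p(Y | X) = p(X, Y) / p(X)

# Bayes' theorem: p(X = b | Y = g) = p(Y = g | X = b) p(X = b) / p(Y = g)
b, g = 1, 1                           # indices of X = b and Y = g
p_b_given_g = p_Y_given_X[b, g] * p_X[b] / p_Y[g]
print(p_b_given_g)                    # equals n_bg / n_g, here 45 / 55
```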
Probability
• Probability density of a continuous variable $x$:

$p(x \in (a, b)) = \int_a^b p(x)\, dx, \qquad p(x) \ge 0, \qquad \int_{-\infty}^{\infty} p(x)\, dx = 1$

• Cumulative distribution function: $P(y) = \int_{-\infty}^{y} p(x)\, dx$

• For a vector $\mathbf{x}$: $p(\mathbf{x}) \ge 0, \qquad \int p(\mathbf{x})\, d\mathbf{x} = 1$

• Sum and product rules for densities: $p(x) = \int p(x, y)\, dy, \qquad p(x, y) = p(y \mid x)\, p(x)$

• Expectation of a function $f(x)$:

$E[f] = \sum_x f(x)\, p(x) \;\;\text{(discrete)}, \qquad E[f] = \int f(x)\, p(x)\, dx \;\;\text{(continuous)}$

• Expectation with respect to one variable, and conditional expectation:

$E_x[f(x, y)] = \int f(x, y)\, p(x)\, dx, \qquad E_{x \mid y}[f(x, y) \mid y] = \int f(x, y)\, p(x \mid y)\, dx$

• Variance and covariance:

$\mathrm{Var}[f] = E\big[(f(x) - E[f(x)])^2\big] = E[f(x)^2] - E[f(x)]^2$

$\mathrm{Cov}[x, y] = E_{x, y}\big[(x - E[x])(y - E[y])\big] = E_{x, y}[xy] - E[x]\, E[y]$

$\mathrm{Cov}[\mathbf{x}, \mathbf{y}] = E_{\mathbf{x}, \mathbf{y}}\big[(\mathbf{x} - E[\mathbf{x}])(\mathbf{y} - E[\mathbf{y}])^{\mathsf{T}}\big] = E_{\mathbf{x}, \mathbf{y}}[\mathbf{x}\mathbf{y}^{\mathsf{T}}] - E[\mathbf{x}]\, E[\mathbf{y}]^{\mathsf{T}}$
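The following short Python sketch checks these moment identities numerically by Monte Carlo; the bivariate Gaussian and its parameters are chosen only for illustration, since its moments are known in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo check of E[f], Var[f], and Cov[x, y] against closed-form values,
# using an arbitrary bivariate Gaussian (parameters made up for illustration).
mean = np.array([1.0, -2.0])
cov = np.array([[2.0, 0.6],
                [0.6, 1.0]])
x, y = rng.multivariate_normal(mean, cov, size=200_000).T

f = x**2                                                   # f(x) = x^2
print(f.mean(), cov[0, 0] + mean[0]**2)                    # E[x^2] = Var[x] + E[x]^2
print(f.var(), 2*cov[0, 0]**2 + 4*cov[0, 0]*mean[0]**2)    # Var[x^2] for a Gaussian

# Cov[x, y] = E[xy] - E[x] E[y]
print(np.mean(x*y) - x.mean()*y.mean(), cov[0, 1])
```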
Bayesian Probability
• Classical/Frequentist interpretation of probability ~ frequencies of repeatable events
• Bayesian probability ~ a quantification of uncertainty
• repeatable and non-repeatable events, e.g., the probability of a dragon flying through the window
• update beliefs with evidence, e.g., suppose it is shown that dragons exist in Florida and that small ones can fit through a window.
$p(\mathbf{x} \mid D) = \dfrac{p(D \mid \mathbf{x})\, p(\mathbf{x})}{p(D)}$

posterior $\propto$ likelihood $\times$ prior

• The prior probability is not an arbitrary choice: it should reflect prior knowledge/common sense, or be deliberately uninformative

• Challenge: for predictions and model comparison, the required marginalization is typically difficult:

$p(D) = \int p(D \mid \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}$
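For a single parameter, the marginalization can be approximated on a grid. Here is a minimal sketch of a Bayesian update for a coin's heads-probability; the data and uniform prior below are assumptions made up for illustration.

```python
import numpy as np

# Grid approximation of the posterior for one parameter theta (heads-probability
# of a coin); the observed data and the uniform prior are made up.
theta = np.linspace(0.0, 1.0, 1001)              # grid over the parameter
dtheta = theta[1] - theta[0]
prior = np.ones_like(theta)                      # uninformative (uniform) prior density
heads, tails = 7, 3                              # observed data D
likelihood = theta**heads * (1 - theta)**tails   # p(D | theta)

unnormalized = likelihood * prior                # p(D | theta) p(theta)
evidence = np.sum(unnormalized) * dtheta         # p(D) ~ integral of p(D | theta) p(theta) dtheta
posterior = unnormalized / evidence              # p(theta | D)

print(evidence)                                  # marginal likelihood (model evidence)
print(theta[np.argmax(posterior)])               # posterior mode, approximately 0.7
```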
Bayesian vs. Frequentist
• Likelihood: Bayesian: fixed data, random parameters; frequentist: random data, fixed parameters

• Model selection (Occam's razor): Bayesian: training data only, via the evidence; frequentist: training + validation data, via cross validation (may be computationally cumbersome)

• Regularization (prevents overfitting): Bayesian: naturally provided by the prior; frequentist: needs an additional penalty

• Accuracy (quality evaluation): Bayesian: naturally provided by the posterior; frequentist: needs additional techniques (confidence interval, bootstrap)

• The Bayesian prior may not be realistic, but its effect diminishes as the amount of training data grows.

• Advances in computational power, together with techniques for computing the posterior and the marginal likelihood (e.g., sampling methods such as MCMC and approximate inference methods such as variational Bayes), have promoted the Bayesian approach and enable its use on big datasets.
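As an illustration of the sampling route, here is a minimal sketch of the Metropolis algorithm (the simplest MCMC method) for a one-dimensional posterior; the Gaussian model, prior, and data below are assumptions made up for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Minimal Metropolis (random-walk MCMC) sampler for a 1-D posterior.
# Assumed toy model: data y_i ~ N(mu, 1) with a N(0, 10^2) prior on mu.
data = rng.normal(3.0, 1.0, size=50)

def log_posterior(mu):
    log_prior = -0.5 * (mu / 10.0) ** 2           # N(0, 10^2) prior, up to a constant
    log_like = -0.5 * np.sum((data - mu) ** 2)    # N(mu, 1) likelihood, up to a constant
    return log_prior + log_like

samples, mu = [], 0.0
for _ in range(20_000):
    proposal = mu + rng.normal(0.0, 0.5)          # symmetric random-walk proposal
    # Accept with probability min(1, p(proposal | D) / p(mu | D))
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(mu):
        mu = proposal
    samples.append(mu)

print(np.mean(samples[5_000:]), data.mean())      # posterior mean is close to the sample mean here
```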
Gaussian Distribution
Gaussian Mean and Variance
Multivariate Gaussian
Gaussian Parameter Estimation

Likelihood function
Maximum (Log) Likelihood
Properties of the maximum likelihood estimates $\mu_{ML}$ and $\sigma^2_{ML}$
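These topics are only listed here as an outline. As a preview, the following sketch shows maximum likelihood estimation of a Gaussian's mean and variance on synthetic data; the true parameters are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Maximum likelihood estimation of a Gaussian's mean and variance on synthetic
# data; the true parameters (mean 2, standard deviation 3) are made up.
x = rng.normal(loc=2.0, scale=3.0, size=10_000)

mu_ml = x.mean()                        # ML estimate of the mean
var_ml = np.mean((x - mu_ml) ** 2)      # ML estimate of the variance (biased by a factor (N-1)/N)
var_unbiased = x.var(ddof=1)            # unbiased estimate for comparison

print(mu_ml, var_ml, var_unbiased)      # close to 2.0, 9.0, 9.0
```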
