LECTURE 2
Probability Theory
$p(X = b \mid Y = g) = \;?$
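The opening question can be answered with Bayes' rule. Below is a minimal sketch with made-up box and fruit probabilities — the actual numbers from the lecture's figure are not recoverable here, so every value is an assumption:

```python
# Hypothetical two-box example (all numbers are illustrative, not from the
# lecture): box X ∈ {r, b}, fruit colour Y ∈ {g, o}.
p_x = {"r": 0.4, "b": 0.6}                    # assumed prior over boxes
p_y_given_x = {"r": {"g": 0.25, "o": 0.75},   # assumed fruit mix per box
               "b": {"g": 0.75, "o": 0.25}}

# Bayes' rule: p(X=b | Y=g) = p(Y=g | X=b) p(X=b) / p(Y=g)
p_y_g = sum(p_y_given_x[x]["g"] * p_x[x] for x in p_x)  # marginal p(Y=g)
p_b_given_g = p_y_given_x["b"]["g"] * p_x["b"] / p_y_g
print(round(p_b_given_g, 3))  # 0.45 / 0.55 ≈ 0.818
```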
Probability
• Probability density of a continuous variable:
$$p\bigl(x \in (a, b)\bigr) = \int_a^b p(x)\,dx, \qquad p(x) \ge 0, \qquad \int_{-\infty}^{\infty} p(x)\,dx = 1$$
• Cumulative distribution function:
$$P(y) = \int_{-\infty}^{y} p(x)\,dx$$
• Multivariate density:
$$p(\mathbf{x}) \ge 0, \qquad \int p(\mathbf{x})\,d\mathbf{x} = 1$$
• Sum rule and product rule:
$$p(x) = \int p(x, y)\,dy, \qquad p(x, y) = p(y \mid x)\,p(x)$$
• Expectation of a function:
$$\mathbb{E}[f] = \sum_x f(x)\,p(x) \ \text{(discrete)}, \qquad \mathbb{E}[f] = \int f(x)\,p(x)\,dx \ \text{(continuous)}$$
• Expectation over one variable, and conditional expectation:
$$\mathbb{E}_x[f(x, y)] = \int f(x, y)\,p(x)\,dx, \qquad \mathbb{E}_{x|y}\bigl[f(x, y) \mid y\bigr] = \int f(x, y)\,p(x \mid y)\,dx$$
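The sum rule, product rule, and expectation identities above can be checked numerically on a small discrete joint distribution. A sketch (the grid size and random joint are arbitrary choices):

```python
import numpy as np

# A random discrete joint p(x, y) on a 3x4 grid, normalized to sum to 1.
rng = np.random.default_rng(0)
joint = rng.random((3, 4))
joint /= joint.sum()

# Sum rule: p(x) = sum_y p(x, y)
p_x = joint.sum(axis=1)

# Product rule: p(x, y) = p(y | x) p(x)
p_y_given_x = joint / p_x[:, None]
assert np.allclose(p_y_given_x * p_x[:, None], joint)

# Expectation: E[f] = sum_x f(x) p(x), for some function f(x)
f = np.arange(3.0)
E_f = (f * p_x).sum()

# Law of iterated expectation: E[f] = E_y[ E_{x|y}[f(x) | y] ]
p_y = joint.sum(axis=0)
p_x_given_y = joint / p_y[None, :]
inner = (f[:, None] * p_x_given_y).sum(axis=0)   # E_{x|y}[f(x) | y]
assert np.isclose((inner * p_y).sum(), E_f)
```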
• The prior probability is not an arbitrary choice: it reflects common sense (or is deliberately chosen to be uninformative).
• Evidence (marginal likelihood):
$$p(D) = \int p(D \mid \mathbf{x})\,p(\mathbf{x})\,d\mathbf{x}$$
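The evidence integral can be evaluated on a grid for a toy model. The sketch below (a coin with a uniform prior on its head probability — my choice of example, not the lecture's) marginalizes the likelihood numerically and compares with the known closed form:

```python
import math
import numpy as np

# Evidence p(D) = ∫ p(D | theta) p(theta) dtheta for k heads in n flips,
# with a uniform prior p(theta) = 1 on [0, 1].
n, k = 10, 7
theta = np.linspace(0.0, 1.0, 100001)
likelihood = math.comb(n, k) * theta**k * (1 - theta)**(n - k)

# Trapezoid rule for the marginalization integral.
h = theta[1] - theta[0]
evidence = h * (likelihood.sum() - 0.5 * (likelihood[0] + likelihood[-1]))

# Closed form for a uniform prior: p(D) = C(n,k) B(k+1, n-k+1) = 1/(n+1).
assert abs(evidence - 1.0 / (n + 1)) < 1e-8
```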
Bayesian vs. Frequentist
| | Bayesian | Frequentist |
|---|---|---|
| Likelihood | fixed data, random parameters | random data, fixed parameters |
| Model selection (Occam's razor) | training data: evidence | training + validation data: cross-validation (may be computationally cumbersome) |
| Regularization (prevents overfitting) | naturally provided by the prior | needs an additional penalty |
| Accuracy (quality evaluation) | naturally provided by the posterior | needs additional techniques (confidence interval, bootstrap) |
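The regularization row of the comparison can be made concrete: for a 1-D linear model, the MAP estimate under a zero-mean Gaussian prior on the weight coincides with the ridge (L2-penalized) least-squares solution. A sketch with assumed data, noise variance, and prior precision:

```python
import numpy as np

# Toy data for y = w*x + noise (all numbers below are assumptions).
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(scale=0.5, size=50)

sigma2 = 0.25    # noise variance (assumed known)
alpha = 4.0      # precision of the zero-mean Gaussian prior on w (assumed)

# Bayesian route: the posterior over w is Gaussian; its mean is the MAP.
post_precision = alpha + (x @ x) / sigma2
w_map = (x @ y) / sigma2 / post_precision

# Frequentist route: minimize sum((y - w*x)**2) + lam * w**2 in closed form.
lam = sigma2 * alpha                 # the L2 penalty the prior corresponds to
w_ridge = (x @ y) / (x @ x + lam)

assert abs(w_map - w_ridge) < 1e-12  # same point estimate
```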
• The Bayesian prior may not be realistic, but as the amount of training data grows, the effect of the prior diminishes.
• Advances in computational power, as well as techniques for computing the posterior and the marginal likelihood (e.g., sampling techniques such as MCMC, and approximate inference such as variational Bayes), promote the Bayesian approach and enable its use on big datasets.
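To give a flavour of the sampling techniques mentioned above, here is a minimal random-walk Metropolis sampler (one of the simplest MCMC methods). The target is a standard normal for brevity; in practice it would be an unnormalized posterior:

```python
import math
import random

def log_target(x):
    """Log of the unnormalized target density; here log N(0, 1)."""
    return -0.5 * x * x

random.seed(0)
x, samples = 0.0, []
for _ in range(50000):
    proposal = x + random.gauss(0.0, 1.0)   # symmetric random-walk proposal
    # Accept with probability min(1, target(proposal) / target(x)).
    if math.log(random.random()) < log_target(proposal) - log_target(x):
        x = proposal
    samples.append(x)

burned = samples[5000:]                     # discard burn-in
mean = sum(burned) / len(burned)
var = sum((s - mean) ** 2 for s in burned) / len(burned)
# mean ≈ 0 and var ≈ 1, matching the N(0, 1) target
```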
Gaussian Distribution
Gaussian Mean and Variance
Multivariate Gaussian
Gaussian Parameter Estimation
Likelihood Function
Maximum (Log) Likelihood
Properties of $\mu_{ML}$ and $\sigma^2_{ML}$
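The maximum-likelihood results the outline points to can be verified numerically. A sketch, assuming the standard univariate case: $\mu_{ML}$ is the sample mean and $\sigma^2_{ML}$ the sample variance normalized by $N$ (hence biased by a factor $(N-1)/N$):

```python
import numpy as np

# Synthetic Gaussian data (true parameters chosen for illustration).
rng = np.random.default_rng(42)
data = rng.normal(loc=3.0, scale=2.0, size=1000)

N = data.size
mu_ml = data.sum() / N                          # ML estimate of the mean
sigma2_ml = ((data - mu_ml) ** 2).sum() / N     # divides by N, not N - 1

assert np.isclose(mu_ml, data.mean())
assert np.isclose(sigma2_ml, data.var())        # np.var defaults to ddof=0

# The unbiased estimator rescales by N / (N - 1):
sigma2_unbiased = sigma2_ml * N / (N - 1)
assert np.isclose(sigma2_unbiased, data.var(ddof=1))
```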