MST482 - Seminar Presentation
CONTENTS
1. Introduction to Bayesian Statistics
2. Key Terms
3. Example
4. Benefits of Bayesian Statistics
5. Key differences between Bayesian and Frequentist approaches
6. How to choose Prior and Posterior distribution
7. MCMC Methods for Sampling
8. Bayesian Decision Making
9. Loss Function and Minimizing Expected Loss
10. Applications of Bayesian Statistics
Introduction to Bayesian Statistics
● Bayesian statistics is a way of analyzing data and making predictions using probability
theory. It involves incorporating our prior knowledge or beliefs about a situation and
updating them with new data to get a better understanding of what's going on.
● Bayesian methods are particularly useful in situations with limited data or complex
models, where prior knowledge can help provide stability and regularization.
● Bayesian statistics also allows for iterative learning, where the posterior distribution
obtained from one analysis can be used as the prior distribution for the next
analysis, incorporating new data as it becomes available.
Key Terms
● Prior Distribution: The prior distribution in Bayesian analysis represents the initial
belief or knowledge about uncertain parameter(s) before observing any data. It is a
probability distribution that describes the uncertainty in the parameters before
incorporating the observed data. The prior distribution can be subjective, based on
prior beliefs or expert knowledge, or objective, using non-informative or weakly
informative distributions.
● Likelihood: The likelihood function is a measure of how likely the observed data
are given the values of the unknown parameter(s). It quantifies the support that the
data provide for different values of the parameter(s). The likelihood is constructed
from the assumed statistical model that describes the relationship between the data
and the parameters.
Posterior = (Likelihood × Prior) / Normalising Constant
● Updating Process: The Bayesian updating process involves starting with a prior
distribution, updating it with observed data using the likelihood function, and
obtaining the posterior distribution as the updated belief. This process can be
repeated iteratively, using the posterior distribution from one analysis as the prior
for the next analysis, incorporating new data as it becomes available.
Example: Biasedness of a coin!
Let 𝜃 = proportion of heads, and suppose n = 1 flip is observed, with heads = 1.

𝑃(𝜃|𝑑𝑎𝑡𝑎) = 𝑃(𝑑𝑎𝑡𝑎|𝜃) 𝑃(𝜃) / 𝑃(𝑑𝑎𝑡𝑎)
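For this example the update has a closed form if we use a conjugate Beta prior. A minimal sketch, assuming a uniform Beta(1, 1) prior (the slides do not specify one):

```python
# Beta-Binomial conjugate update for the coin example.
# Assumption: a uniform Beta(1, 1) prior on theta (not stated in the slides).

def update_beta_prior(a, b, heads, tails):
    """Posterior Beta parameters after observing coin flips.

    With a Beta(a, b) prior and a binomial likelihood, Bayes' theorem
    gives a Beta(a + heads, b + tails) posterior in closed form.
    """
    return a + heads, b + tails

# n = 1 flip, 1 head, as in the example above.
a_post, b_post = update_beta_prior(1, 1, heads=1, tails=0)
post_mean = a_post / (a_post + b_post)  # posterior mean of theta

print(a_post, b_post)  # 2 1 -> a Beta(2, 1) posterior
print(post_mean)       # 0.666... -> belief shifts toward heads
```

Re-running the same function with the previous posterior as the new prior is exactly the iterative updating described above.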
Benefits of Bayesian Statistics
● Uncertainty Quantification
● Interpretability
● Iterative Learning
Key Differences between Bayesian and Frequentist Approaches
● Probability Interpretation:
○ Bayesian: In Bayesian statistics, probabilities can be interpreted as degrees of belief or
subjective probabilities. The probabilities can reflect our uncertainty about the true values
of parameters.
○ Frequentist: Frequentist statistics interprets probabilities as long-run frequencies. It
focuses on the probability of observing the data given that a specific hypothesis or model
is true.
● Parameter Estimation:
○ Bayesian: Bayesian statistics provides a posterior distribution that represents updated
beliefs about the unknown parameters after incorporating the data. The posterior
distribution summarizes the uncertainty in the parameter estimates.
○ Frequentist: Frequentist statistics typically provides point estimates of parameters, such
as maximum likelihood estimators, without directly quantifying uncertainty. Confidence
intervals are used to estimate the range of plausible parameter values.
● Hypothesis Testing:
○ Bayesian: Bayesian statistics evaluates hypotheses by comparing the posterior
probabilities of different hypotheses. The Bayes factor is commonly used to quantify the
strength of evidence in favor of one hypothesis over another.
○ Frequentist: Frequentist statistics uses hypothesis tests based on p-values. The p-value
measures the strength of evidence against a specific null hypothesis and assesses whether
the data are consistent with the null hypothesis.
● Iterative Learning:
○ Bayesian: Bayesian statistics naturally allows for iterative learning. The posterior
distribution obtained from one analysis can be used as the prior distribution for the next
analysis, updating beliefs as new data becomes available.
○ Frequentist: Frequentist statistics treats each analysis as independent and has no
built-in mechanism for incorporating previous knowledge.
Summary
Frequentist:
• Focus is on the parameter, which is assumed to be a fixed constant.
• Confidence intervals are read in terms of repeated sampling: "95% of similarly sized
intervals from repeated samples of size n will contain 𝜃."

Bayesian:
• Focus is on subjective probability, taking a priori beliefs into account.
• Credible intervals are read in terms of subjective uncertainty: "There is a 95% chance
that 𝜃 lies within the interval."
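The interval wording above can be made concrete. A minimal sketch, assuming the Beta(2, 1) posterior from the coin example: its CDF is F(x) = x², so the equal-tailed 95% credible interval comes straight from the quantile function F⁻¹(q) = √q.

```python
# 95% equal-tailed credible interval for theta under a Beta(2, 1)
# posterior (assumed; the posterior from the coin example above).
# For Beta(2, 1) the CDF is F(x) = x**2, so the quantile function
# is F^-1(q) = q ** 0.5 and no library is needed.
lo, hi = 0.025 ** 0.5, 0.975 ** 0.5

# Bayesian reading: P(lo < theta < hi | data) = 0.95, a direct
# probability statement about theta, unlike a confidence interval.
print(round(lo, 3), round(hi, 3))  # 0.158 0.987
```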
How to Choose the Prior and Posterior Distribution
● Prior Information: If you have prior knowledge or expert opinions about the
parameters, try to incorporate them into the prior distribution.
● Sensitivity Analysis: It is essential to assess the sensitivity of the results to the choice
of prior distribution. Perform sensitivity analyses to explore how the priors influence
the posterior distribution and inference.
● Robustness: Consider using robust priors that are less sensitive to outliers or
extreme data points. Robust priors can help reduce the influence of influential
observations on the posterior results.
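A sensitivity analysis can be as simple as repeating the conjugate update under several priors and comparing the posteriors. The data (7 heads in 10 flips) and the three Beta priors below are illustrative assumptions, not from the slides:

```python
# Prior sensitivity check for a coin-bias estimate.
# Assumed data: 7 heads in 10 flips (illustrative only).
heads, tails = 7, 3

priors = {
    "uniform Beta(1,1)":       (1, 1),
    "informative Beta(10,10)": (10, 10),
    "heads-leaning Beta(5,1)": (5, 1),
}

for name, (a, b) in priors.items():
    # Conjugate update: Beta(a, b) prior -> Beta(a + heads, b + tails).
    a_post, b_post = a + heads, b + tails
    mean = a_post / (a_post + b_post)
    print(f"{name}: posterior mean = {mean:.3f}")
```

With only 10 flips the posterior mean ranges from about 0.57 to 0.75 across these priors, which is exactly the kind of influence a sensitivity analysis is meant to expose.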
● Note that the posterior distribution is not directly chosen but is calculated as the
outcome of Bayesian inference: it is derived from the combination of the prior
distribution and the likelihood function using Bayes' theorem.
MCMC Methods
● MCMC methods are powerful techniques used to sample from complex probability
distributions, particularly in Bayesian analysis.
● They provide a way to explore the posterior distribution of parameters, which may
be analytically intractable or computationally challenging.
● MCMC methods use the principles of Markov chains and Monte Carlo simulation to
generate a sequence of samples that approximate the desired distribution.
Metropolis-Hastings Algorithm
● The algorithm iteratively proposes new candidate states and accepts or rejects them
based on a defined acceptance probability.
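The algorithm can be sketched in a few lines. Below is a random-walk variant targeting the unnormalized coin posterior p(𝜃 | data) ∝ 𝜃 on (0, 1) (uniform prior, one observed head); the Gaussian step size of 0.5 is an arbitrary assumption:

```python
import math
import random

def metropolis_hastings(log_target, x0, n_samples, step=0.5):
    """Random-walk Metropolis-Hastings sketch.

    Proposes x' = x + Normal(0, step) and accepts with probability
    min(1, target(x') / target(x)); the proposal is symmetric, so the
    Hastings correction cancels.
    """
    x, samples = x0, []
    for _ in range(n_samples):
        proposal = x + random.gauss(0.0, step)
        # Compare on the log scale for numerical stability.
        if math.log(random.random()) < log_target(proposal) - log_target(x):
            x = proposal  # accept; otherwise keep the current state
        samples.append(x)
    return samples

# Unnormalized posterior for the coin example (1 head, uniform prior):
# p(theta | data) is proportional to theta on (0, 1).
def log_post(theta):
    return math.log(theta) if 0.0 < theta < 1.0 else -math.inf

random.seed(0)
draws = metropolis_hastings(log_post, x0=0.5, n_samples=20000)
burned = draws[2000:]  # discard burn-in
print(sum(burned) / len(burned))  # roughly 2/3, the Beta(2, 1) mean
```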
Gibbs Sampling
● Gibbs sampling is another widely used MCMC method, especially for problems with
multiple parameters.
● It allows sampling from the joint posterior distribution by iteratively sampling from
each conditional distribution while fixing other parameters.
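The two-parameter case can be sketched with an assumed toy model: a standard bivariate normal with correlation ρ, whose full conditionals are the known univariate normals x | y ~ N(ρy, 1 − ρ²):

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampler sketch for a standard bivariate normal.

    Alternates draws from the two full conditionals
    x | y ~ N(rho * y, 1 - rho**2) and y | x ~ N(rho * x, 1 - rho**2),
    fixing the other coordinate at its current value each time.
    """
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho ** 2)
    x = y = 0.0
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, sd)  # sample x from p(x | y)
        y = rng.gauss(rho * x, sd)  # sample y from p(y | x)
        samples.append((x, y))
    return samples

pairs = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
xs = [p[0] for p in pairs[1000:]]  # drop burn-in
print(sum(xs) / len(xs))  # near 0, the marginal mean of x
```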
● MCMC methods have revolutionized Bayesian analysis and have found applications
in various fields.
Summary
● MCMC methods are used to sample from complex probability distributions.
Bayesian Decision Making
● However, the posterior distribution on its own is not always sufficient. Sometimes
the inference we want to express is a credible interval, because it indicates a range
of likely values for the parameter. On other occasions, one needs to make a
single-number guess about the value of the parameter: for example, declaring the
average payoff for an insurance claim, or telling a patient how much longer he or
she has to live.
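That single-number guess is chosen by minimizing expected posterior loss: the posterior mean minimizes squared-error loss, while the posterior median minimizes absolute-error loss. A sketch using draws from the coin example's assumed Beta(2, 1) posterior (sampled by inverse CDF, since its quantile function is √q):

```python
import random

# Point estimates as minimizers of expected posterior loss.
# Draws from a Beta(2, 1) posterior (assumed, from the coin example):
# its CDF is x**2, so inverse-CDF sampling is sqrt of a uniform draw.
random.seed(0)
theta = sorted(random.random() ** 0.5 for _ in range(100001))

post_mean = sum(theta) / len(theta)   # minimizes squared-error loss
post_median = theta[len(theta) // 2]  # minimizes absolute-error loss

def expected_loss(estimate, loss, draws):
    """Monte Carlo estimate of E[loss(theta, estimate) | data]."""
    return sum(loss(t, estimate) for t in draws) / len(draws)

squared = lambda t, d: (t - d) ** 2
# The mean achieves lower expected squared loss than the median:
print(expected_loss(post_mean, squared, theta)
      <= expected_loss(post_median, squared, theta))  # True
print(round(post_mean, 3), round(post_median, 3))     # ~0.667 vs ~0.707
```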
Loss Functions and Minimizing Expected Loss
● Hinge Loss
○ Hinge loss is commonly used in Support Vector Machines (SVMs) for binary
classification.
○ It penalizes misclassifications and encourages a margin between classes.
○ Minimizing hinge loss helps in finding the optimal decision boundary.
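The definition behind these bullets, for labels y ∈ {−1, +1} and a real-valued classifier score (the specific scores below are made-up illustrations):

```python
def hinge_loss(y_true, score):
    """Hinge loss max(0, 1 - y * s) for a label y in {-1, +1} and a
    real-valued classifier score s. Correct predictions with margin
    at least 1 cost nothing; violations grow linearly."""
    return max(0.0, 1.0 - y_true * score)

print(hinge_loss(+1, 2.5))            # 0.0: correct, outside the margin
print(round(hinge_loss(+1, 0.3), 1))  # 0.7: correct but inside the margin
print(round(hinge_loss(-1, 0.3), 1))  # 1.3: misclassified, penalized more
```

Summing this loss over a training set (plus a regularizer) and minimizing it is the SVM training objective these bullets refer to.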
Applications of Bayesian Statistics
● Environmental Science
THANK YOU