
Seminar Presentation

Paper Code – MST482

Topic: Bayesian Statistics


Name – Nadashree Bose
Registration Number – 2248138

MISSION: CHRIST is a nurturing ground for an individual's holistic development to make effective contribution to the society in a dynamic environment.

VISION: Excellence and Service

CORE VALUES: Faith in God | Moral Uprightness | Love of Fellow Beings | Social Responsibility | Pursuit of Excellence

CONTENTS
1. Introduction to Bayesian Statistics
2. Key Terms
3. Example
4. Benefits of Bayesian Statistics
5. Key differences between Bayesian and Frequentist approaches
6. How to choose Prior and Posterior distribution
7. MCMC Methods for Sampling
8. Bayesian Decision Making
9. Loss Function and Minimizing Expected Loss
10. Applications of Bayesian Statistics


Basic Idea about Bayesian Statistics

● Bayesian statistics is a way of analyzing data and making predictions using probability
theory. It involves incorporating our prior knowledge or beliefs about a situation and
updating them with new data to get a better understanding of what's going on.

● In Bayesian statistics, we treat unknown quantities as random variables and assign probability distributions to them. These distributions represent our beliefs about the possible values these variables could take.


Definition and Key Concepts


Bayesian statistics is a branch of statistics that deals with the analysis and interpretation
of data using the principles of Bayesian inference. It is named after Thomas Bayes, an
18th-century British mathematician. At its core, Bayesian statistics uses probability theory
to model uncertainty and make probabilistic statements about the parameters or
hypotheses of interest.

● Bayesian methods are particularly useful in situations with limited data or complex
models, where prior knowledge can help provide stability and regularization.

● Bayesian statistics also allows for iterative learning, where the posterior distribution
obtained from one analysis can be used as the prior distribution for the next
analysis, incorporating new data as it becomes available.


Key Terms
● Prior Distribution: The prior distribution in Bayesian analysis represents the initial
belief or knowledge about uncertain parameter(s) before observing any data. It is a
probability distribution that describes the uncertainty in the parameters before
incorporating the observed data. The prior distribution can be subjective, based on
prior beliefs or expert knowledge, or objective, using non-informative or weakly
informative distributions.

● Likelihood: The likelihood function is a measure of how likely the observed data are given the values of the unknown parameter(s). It quantifies the support that the data provide for different values of the parameter(s). The likelihood is constructed from the assumed statistical model that describes the relationship between the data and the parameters.

● Posterior Distribution: The posterior distribution in Bayesian analysis represents the updated belief about the uncertain parameter(s) after incorporating the prior information and observed data. It combines the prior distribution and the likelihood function to provide a complete probabilistic summary of the parameter(s) of interest.


● Bayes' Theorem: Bayes' theorem is a fundamental equation in Bayesian inference. It provides a formal expression for calculating the posterior distribution based on the prior distribution, likelihood, and marginal likelihood (evidence). The equation is as follows:

P(θ | data) = P(data | θ) · P(θ) / P(data)

(Posterior = Likelihood × Prior / Normalising constant)
● Updating Process: The Bayesian updating process involves starting with a prior
distribution, updating it with observed data using the likelihood function, and
obtaining the posterior distribution as the updated belief. This process can be
repeated iteratively, using the posterior distribution from one analysis as the prior
for the next analysis, incorporating new data as it becomes available.
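
As a brief sketch of how this chaining works (assuming, for illustration, that the two batches of data x₁ and x₂ are conditionally independent given θ):

p(θ | x₁) ∝ p(x₁ | θ) · p(θ)        (the prior is updated by the first batch)
p(θ | x₁, x₂) ∝ p(x₂ | θ) · p(θ | x₁)   (the old posterior acts as the prior for the second batch)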


Biasedness of a coin!
Example: Let θ = proportion of heads.

Let n = 1 toss, with heads = 1 observed, so the likelihood of the data is P(data | θ) = θ.

By Bayes' theorem,

P(θ | data) = P(data | θ) · P(θ) / P(data)
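
A minimal numerical sketch of this example, assuming a uniform Beta(1, 1) prior on θ (the slide does not specify a prior), using the conjugate Beta–Binomial update:

```python
from scipy import stats

# Coin example from the slide: n = 1 toss, 1 head observed.
# Assumed (not stated on the slide): a uniform Beta(1, 1) prior on theta.
a_prior, b_prior = 1, 1
heads, n = 1, 1

# The Beta prior is conjugate to the Bernoulli likelihood, so the posterior
# is Beta(a_prior + heads, b_prior + n - heads) = Beta(2, 1).
posterior = stats.beta(a_prior + heads, b_prior + n - heads)

print("Posterior mean of theta:", posterior.mean())        # 2/3
print("95% credible interval:", posterior.interval(0.95))
```

With a single observed head, the posterior density 2θ already tilts belief toward a heads-biased coin, while the wide credible interval reflects how little one toss can tell us.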


Benefits of Bayesian Statistics:

● Incorporation of Prior Knowledge

● Uncertainty Quantification

● Flexibility in Model Complexity

● Interpretability

● Iterative Learning

● Regularization and Shrinkage


Key differences between Bayesian and frequentist approaches
● Prior Knowledge and Beliefs:
○ Bayesian: Bayesian statistics explicitly incorporates prior knowledge and beliefs into the
analysis. It starts with a prior distribution representing initial beliefs about the unknown
quantities.
○ Frequentist: Frequentist statistics does not incorporate prior knowledge. It relies solely on
the data observed and does not assign probabilities to parameters or hypotheses.

● Probability Interpretation:
○ Bayesian: In Bayesian statistics, probabilities can be interpreted as degrees of belief or
subjective probabilities. The probabilities can reflect our uncertainty about the true values
of parameters.
○ Frequentist: Frequentist statistics interprets probabilities as long-run frequencies. It
focuses on the probability of observing the data given that a specific hypothesis or model
is true.


● Parameter Estimation:
○ Bayesian: Bayesian statistics provides a posterior distribution that represents updated
beliefs about the unknown parameters after incorporating the data. The posterior
distribution summarizes the uncertainty in the parameter estimates.
○ Frequentist: Frequentist statistics typically provides point estimates of parameters, such
as maximum likelihood estimators, without directly quantifying uncertainty. Confidence
intervals are used to estimate the range of plausible parameter values.

● Hypothesis Testing:
○ Bayesian: Bayesian statistics evaluates hypotheses by comparing the posterior
probabilities of different hypotheses. The Bayes factor is commonly used to quantify the
strength of evidence in favor of one hypothesis over another.
○ Frequentist: Frequentist statistics uses hypothesis tests based on p-values. The p-value
measures the strength of evidence against a specific null hypothesis and assesses whether
the data are consistent with the null hypothesis.
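
As a small illustration of the Bayes factor idea mentioned above (the data here are hypothetical, not from the slides): compare H0: θ = 0.5 against H1: θ ~ Uniform(0, 1) for a coin-tossing experiment.

```python
from scipy import stats
from scipy.integrate import quad

# Hypothetical data: 6 heads in 10 tosses.
heads, n = 6, 10

# Marginal likelihood under H0 (theta fixed at 0.5): the binomial pmf.
m0 = stats.binom.pmf(heads, n, 0.5)

# Marginal likelihood under H1 (theta ~ Uniform(0, 1)):
# integrate the binomial likelihood over the prior.
m1, _ = quad(lambda t: stats.binom.pmf(heads, n, t), 0, 1)

bf01 = m0 / m1
print(f"Bayes factor BF01 = {bf01:.2f}")  # > 1 favours H0, < 1 favours H1
```

Here BF01 ≈ 2.3, which is only mild evidence in favour of the fair-coin hypothesis.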


● Sample Size and Replication:


○ Bayesian: Bayesian statistics can accommodate small sample sizes and easily handle cases
with limited data. It can also incorporate prior information to provide more stable
estimates.
○ Frequentist: Frequentist statistics often requires larger sample sizes for accurate
estimations and hypothesis testing. Replication of experiments is crucial for obtaining
reliable results.

● Iterative Learning:
○ Bayesian: Bayesian statistics naturally allows for iterative learning. The posterior
distribution obtained from one analysis can be used as the prior distribution for the next
analysis, updating beliefs as new data becomes available.
○ Frequentist: Frequentist statistics treats each analysis as independent and does not have a
built-in mechanism for incorporating previous knowledge.


Summary

Frequentist
• Focus is on the parameter θ, which is assumed to be a fixed constant.
• Confidence intervals are read in terms of repeated sampling: "95% of similar-sized intervals from repeated samples of size n will contain θ."

Bayesian
• Focus is on subjective probability, taking into account a priori predictions.
• Credible intervals are read in terms of subjective uncertainty: "There is a 95% chance that θ lies within the interval."


How to choose Prior Distribution?

● Prior Information: If you have prior knowledge or expert opinions about the
parameters, try to incorporate them into the prior distribution.

● Non-Informative Priors: In the absence of prior information, non-informative or weakly informative priors can be used. These priors express minimal assumptions and let the data drive the posterior distribution.

● Sensitivity Analysis: It is essential to assess the sensitivity of the results to the choice of prior distribution. Perform sensitivity analyses to explore how the priors influence the posterior distribution and inference (a small sketch follows this list).

● Robustness: Consider using robust priors that are less sensitive to outliers or
extreme data points. Robust priors can help reduce the influence of influential
observations on the posterior results.
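
A small sensitivity-analysis sketch, assuming hypothetical binomial data and two candidate Beta priors (none of these numbers come from the slides):

```python
from scipy import stats

# Hypothetical data: 7 successes in 20 trials.
successes, n = 7, 20

# Compare two Beta priors: a flat Beta(1, 1) and a weakly informative Beta(2, 2).
priors = {"flat Beta(1,1)": (1, 1), "weak Beta(2,2)": (2, 2)}

for name, (a, b) in priors.items():
    post = stats.beta(a + successes, b + n - successes)   # conjugate update
    lo, hi = post.interval(0.95)
    print(f"{name}: posterior mean = {post.mean():.3f}, "
          f"95% credible interval = ({lo:.3f}, {hi:.3f})")
```

If the posterior summaries barely move between priors, the inference is robust to the prior choice; large shifts signal that the prior is doing much of the work.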


How to choose Posterior Distribution?

● It's important to note that the posterior distribution is not chosen but rather calculated as the result of Bayesian inference: it is derived from the combination of the prior distribution and the likelihood function using Bayes' theorem.

● When the posterior cannot be derived in closed form, it is approximated numerically, for example with the MCMC methods discussed next.


MCMC Methods
● MCMC methods are powerful techniques used to sample from complex probability
distributions, particularly in Bayesian analysis.
● They provide a way to explore the posterior distribution of parameters, which may
be analytically intractable or computationally challenging.
● MCMC methods use the principles of Markov chains and Monte Carlo simulation to
generate a sequence of samples that approximate the desired distribution.

Illustration of MCMC Sampling Process


Visual representation of MCMC sampling process:
• Initialization: Starting point in the parameter space.
• Iterations: Sequentially moving through the parameter space, sampling points
based on the acceptance/rejection mechanism.
• Convergence: As the number of iterations increases, the chain converges to the
desired posterior distribution.


Metropolis-Hastings Algorithm

● The Metropolis-Hastings algorithm is one of the commonly used MCMC methods.

● It allows sampling from a target distribution by constructing a Markov chain with the desired stationary distribution.

● The algorithm iteratively proposes new candidate states and accepts or rejects them
based on a defined acceptance probability.


Metropolis-Hastings Algorithm Steps


1. Start with an initial state.
   ○ Choose an initial value for the parameters or variables of interest.
2. Propose a new state by perturbing the current state according to a proposal distribution.
   ○ Generate a candidate state by sampling from a proposal distribution, often a symmetric distribution centered around the current state.
3. Calculate the acceptance probability based on the ratio of the target distribution evaluated at the proposed state and the current state.
   ○ Evaluate the target distribution (e.g., the posterior distribution) at both the current and proposed states.
   ○ For a symmetric proposal, the acceptance probability is min(1, ratio of the proposed state's target distribution value to the current state's target distribution value); an asymmetric proposal adds a correction factor for the proposal densities.
4. Accept the proposed state with the acceptance probability; otherwise, stay in the current state.
   ○ Randomly accept or reject the proposed state based on the acceptance probability.
   ○ If accepted, transition to the proposed state; otherwise, remain in the current state.
5. Repeat steps 2-4 for a predetermined number of iterations.
   ○ Continue generating new candidate states, calculating acceptance probabilities, and updating the current state.
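
A minimal random-walk Metropolis-Hastings sketch of these steps in Python. The target, proposal scale, and data are illustrative assumptions, not part of the slides; the target is the unnormalised posterior of a coin's heads-probability under a flat prior.

```python
import numpy as np

# Hypothetical data: 6 heads in 10 tosses; flat prior on theta in (0, 1).
heads, n = 6, 10

def log_target(theta):
    if theta <= 0 or theta >= 1:
        return -np.inf                               # zero density outside (0, 1)
    # Log-likelihood of the binomial data; the flat prior only adds a constant.
    return heads * np.log(theta) + (n - heads) * np.log(1 - theta)

rng = np.random.default_rng(0)
theta = 0.5                                          # step 1: initial state
samples = []
for _ in range(10_000):                              # step 5: repeat
    proposal = theta + rng.normal(scale=0.1)         # step 2: symmetric random-walk proposal
    log_alpha = log_target(proposal) - log_target(theta)   # step 3: log acceptance ratio
    if np.log(rng.uniform()) < log_alpha:            # step 4: accept with prob min(1, ratio)
        theta = proposal
    samples.append(theta)

burned = np.array(samples[2_000:])                   # discard burn-in
print("Posterior mean estimate:", burned.mean())     # close to (6+1)/(10+2) ≈ 0.583
```

Because only a ratio of target values is needed, the normalising constant P(data) never has to be computed, which is exactly what makes MCMC useful when that constant is intractable.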


Gibbs Sampling

● Gibbs sampling is another widely used MCMC method, especially for problems with
multiple parameters.

● It allows sampling from the joint posterior distribution by iteratively sampling from
each conditional distribution while fixing other parameters.


Gibbs Sampling Steps

1. Initialize the parameters with arbitrary values.
   ○ Assign initial values to each parameter of interest.
2. Select one parameter and sample from its conditional distribution given the current values of the other parameters.
   ○ Choose one parameter at a time and sample from its conditional distribution.
   ○ The conditional distribution depends on the other parameters and the observed data.
3. Repeat step 2 for each parameter, sampling one at a time.
   ○ Continue sampling for each parameter, moving through the parameter set in a cyclic manner.
4. Continue sampling iteratively until convergence is achieved or a sufficient number of samples is obtained.
   ○ Repeat the process for a predetermined number of iterations or until convergence criteria, such as stable posterior distributions, are met.
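
A minimal Gibbs-sampling sketch. The bivariate normal target and its correlation value are illustrative assumptions, chosen because its full conditionals are available in closed form:

```python
import numpy as np

# Target: bivariate normal with zero means, unit variances, correlation rho.
# Each full conditional is itself normal, so it can be sampled directly.
rho = 0.8
rng = np.random.default_rng(0)

x, y = 0.0, 0.0          # step 1: arbitrary initial values
samples = []
for _ in range(10_000):  # step 4: iterate until enough samples are drawn
    # step 2: sample x from p(x | y) = N(rho * y, 1 - rho**2)
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))
    # step 3: sample y from p(y | x) = N(rho * x, 1 - rho**2)
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))
    samples.append((x, y))

draws = np.array(samples[1_000:])                          # drop burn-in
print("Sample correlation:", np.corrcoef(draws.T)[0, 1])   # close to 0.8
```

Gibbs sampling trades the tuning of a proposal distribution for the requirement that every full conditional can be sampled exactly.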


Benefits of MCMC Methods

● MCMC methods provide a flexible and efficient way to sample from complex posterior distributions.

● They allow for exploration of high-dimensional parameter spaces and enable estimation and inference even in challenging scenarios.

● MCMC methods have revolutionized Bayesian analysis and have found applications in various fields.

Summary
● MCMC methods are used to sample from complex probability distributions.

● The Metropolis-Hastings algorithm and Gibbs sampling are common MCMC techniques.

● MCMC allows exploration of posterior distributions in Bayesian analysis and enables estimation, inference, and prediction.


Bayesian Decision Making


● To a Bayesian, the posterior distribution is the basis of any inference, since it integrates both one's prior opinions and knowledge and the new information provided by the data. It contains everything one believes about the distribution of the unknown parameter of interest.

● However, the posterior distribution on its own is not always sufficient. Sometimes
the inference we want to express is a credible interval, because it indicates a range
of likely values for the parameter. And on other occasions, one needs to make a
single number guess about the value of the parameter. For example, you might want
to declare the average payoff for an insurance claim or tell a patient how much
longer he/she has to live.

● Therefore, the Bayesian perspective leads directly to decision theory. And in decision theory, one seeks to minimize one's expected loss.


Introduction to Loss Function


● A loss function quantifies the penalty or cost associated with incorrect predictions or
decisions in a statistical or machine learning model.
● It measures the discrepancy between predicted values and the true values or the
deviation from desired outcomes.
● Loss functions play a crucial role in model training, evaluation, and decision-making.

Types of Loss Functions


● Different types of loss functions are used based on the specific problem and the
nature of the data.
● Commonly used loss functions include:
○ Mean Squared Error (MSE)
○ Mean Absolute Error (MAE)
○ Binary Cross-Entropy Loss
○ Categorical Cross-Entropy Loss
○ Hinge Loss
○ Log Loss (Negative Log-Likelihood)


● Mean Squared Error (MSE)


o MSE is a commonly used loss function for regression problems.
o It measures the average squared difference between predicted and true values.
o The goal is to minimize the MSE to obtain a model that provides more accurate
predictions.

● Mean Absolute Error (MAE)


o MAE is another loss function for regression tasks.
o It measures the average absolute difference between predicted and true values.
o Unlike MSE, MAE is less sensitive to outliers and provides a more robust
measure of error.

● Binary Cross-Entropy Loss


o Binary Cross-Entropy Loss is often used in binary classification problems.
o It quantifies the dissimilarity between predicted probabilities and true binary
labels.
o The aim is to minimize the cross-entropy loss to achieve better classification
accuracy.
● Categorical Cross-Entropy Loss
o Categorical Cross-Entropy Loss is employed in multi-class classification problems.
o It computes the divergence between predicted class probabilities and true class
labels.
o Minimizing the cross-entropy loss helps in training models that assign accurate
class probabilities.

● Hinge Loss
o Hinge Loss is commonly used in Support Vector Machines (SVMs) for binary
classification.
o It penalizes misclassifications and encourages a margin between classes.
o Minimizing hinge loss helps in finding the optimal decision boundary.

● Log Loss (Negative Log-Likelihood)


o Log Loss is used in probabilistic classification tasks.
o It measures the discrepancy between predicted probabilities and true class labels.
o Minimizing log loss leads to models that provide more accurate probability
estimates.
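
Hedged NumPy sketches of some of the listed losses (function names and test values are illustrative, not a standard API). Note that log loss coincides with binary cross-entropy when the labels are binary.

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)           # Mean Squared Error

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))          # Mean Absolute Error

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    p = np.clip(p_pred, eps, 1 - eps)                # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def hinge(y_true, score):                            # y_true in {-1, +1}
    return np.mean(np.maximum(0.0, 1 - y_true * score))

# Hypothetical predictions for a quick check
y = np.array([1.0, 0.0, 1.0, 1.0])
p = np.array([0.9, 0.2, 0.7, 0.4])
print("MSE:", mse(y, p), "MAE:", mae(y, p), "BCE:", binary_cross_entropy(y, p))
```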

Minimizing Expected Loss


● The goal of model training and decision-making is to minimize the expected loss.
● Expected loss is the average loss over all possible outcomes, weighted by their
probabilities.
● By minimizing the expected loss, we aim to find the optimal model or decision rule
that minimizes overall prediction or decision errors.

Minimizing Expected Loss Process


Minimizing expected loss involves:
• Selecting an appropriate loss function based on the problem.
• Optimizing model parameters or decision rules to minimize the chosen loss
function.
• Evaluating the model or decision rule based on the expected loss or
performance metrics.
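
A sketch of this process for the coin example, assuming (hypothetically) a Beta(7, 5) posterior: the point estimate that minimizes posterior expected squared-error loss is approximately the posterior mean, while absolute-error loss leads to the posterior median.

```python
import numpy as np
from scipy import stats

posterior = stats.beta(7, 5)                         # assumed posterior for theta
draws = posterior.rvs(size=20_000, random_state=0)   # Monte Carlo draws from it

candidates = np.linspace(0.0, 1.0, 201)              # candidate point estimates
# Posterior expected loss of declaring estimate "a", approximated over the draws.
sq_loss = [np.mean((draws - a) ** 2) for a in candidates]     # squared-error loss
abs_loss = [np.mean(np.abs(draws - a)) for a in candidates]   # absolute-error loss

print("Minimizer under squared loss :", candidates[np.argmin(sq_loss)])   # ~ posterior mean
print("Minimizer under absolute loss:", candidates[np.argmin(abs_loss)])  # ~ posterior median
print("Posterior mean  :", posterior.mean())
print("Posterior median:", posterior.median())
```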


Applications of Bayesian Statistics


Bayesian statistics has a wide range of applications across various fields. Here are some
notable applications of Bayesian statistics:
● Bayesian Inference

● Decision Making and Risk Analysis

● Machine Learning and Artificial Intelligence

● Epidemiology and Public Health

● Finance and Econometrics

● Environmental Science

● Genetics and Genomics

● Natural Language Processing and Text Analysis


THANK YOU
