
Introduction to Data Science

Bayesian statistics II

Applications of Bayes theorem


The concept of probability in the two frameworks
• Frequentist conception: Relative frequency of the outcome of interest as a proportion of the whole sample space, in the long run. (“Objective” probability)
• Bayesian conception: Degree of belief, plausibility. Beliefs are constantly updated with new information. (“Subjective” probability)

Probability vs. Likelihood: Probability

Probability

p( z > 1.65 | normal distribution with mean = 0, SD = 1) = 0.05


Probability vs. Likelihood: Probability

Probability

p( z > 1.96 | normal distribution with mean = 0, SD = 1) = 0.025
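These two tail probabilities can be checked numerically with SciPy's standard normal survival function (1 − CDF); a minimal sketch:

```python
# Tail probabilities under the standard normal N(0, 1),
# matching the two slides above.
from scipy.stats import norm

p_165 = norm.sf(1.65)  # p(z > 1.65)
p_196 = norm.sf(1.96)  # p(z > 1.96)
print(round(p_165, 3), round(p_196, 3))  # → 0.049 0.025
```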


Probability vs. Likelihood: Likelihood

[Figure: probability density of a normal distribution (mean = 0, SD = 1.3) over x from −5 to 5, with the density at x = 2 marked.]

L(normal distribution with mean = 0, SD = 1.3 | x = 2) = 0.094


Probability vs. Likelihood: Likelihood

[Figure: probability density of a normal distribution (mean = 1, SD = 1.3) over x from −5 to 5, with the density at x = 2 marked.]

L(normal distribution with mean = 1, SD = 1.3 | x = 2) = 0.2283
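The two likelihood values on these slides are just the normal density evaluated at the observed data point; a quick sketch with SciPy, using the parameters from the slides:

```python
# Likelihood of each candidate model given the observation x = 2:
# the density of the model's distribution evaluated at x.
from scipy.stats import norm

L0 = norm.pdf(2, loc=0, scale=1.3)  # model: mean = 0, SD = 1.3
L1 = norm.pdf(2, loc=1, scale=1.3)  # model: mean = 1, SD = 1.3
print(round(L0, 3), round(L1, 4))   # → 0.094 0.2283
```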


Maximum likelihood estimates

Introducing Bayes Factors
• NHST has a somewhat convoluted (yet solid) logic: assume that something makes no difference, show that the observed results are unlikely given that assumption, reject the assumption on these grounds, and conclude that there probably was a difference after all.
• It is also somewhat weak: how large is the difference?
• Bayes Factors provide an alternative/complement to classical null hypothesis significance testing that addresses both of these issues:
• p(D | H) → p(H | D)
• Bayes’ theorem allows us to invert conditional probabilities if (and only if) we know the prior probabilities of the hypotheses.
The Bayes Factor

p(H1 | D) / p(H0 | D) = [ p(D | H1) / p(D | H0) ] × [ p(H1) / p(H0) ]

Posterior odds = Bayes Factor × Prior odds
So what are Bayes factors?
• In essence, likelihood ratios:

p(A | B) = p(A) × p(B | A) / p(B)
(posterior = prior × likelihood / evidence)

BF = p(D | H1) / p(D | H0)
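In odds form, this is a one-liner. A sketch (the two likelihood values plugged in below are the illustrative ones from the earlier likelihood slides; the function name is hypothetical):

```python
def posterior_odds(likelihood_h1, likelihood_h0, prior_odds=1.0):
    """Bayes' rule in odds form: posterior odds = BF × prior odds."""
    bf = likelihood_h1 / likelihood_h0  # Bayes factor BF10
    return bf * prior_odds

# With equal prior odds, the data favor H1 by a factor of ~2.4:
print(round(posterior_odds(0.2283, 0.094), 2))  # → 2.43
```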
The history of Bayes factors
• Developed by Jeffreys (1935), who called them “significance tests”.
• This alternative was not really appreciated until about the 1990s.
• They are still “catching on”.
What do Bayes Factors give us?
• They quantify the strength of the evidence in favor
of a hypothesis, given the data.
• Can be used for model comparison (which model is
more likely, given the data).
Deriving the Bayes factor
Strength of evidence for H1 or H0, given they are
equally likely a priori, from Lee & Wagenmakers, 2013
Example: Determining the most likely effectiveness of
a medication using maximum likelihood estimation
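This example can be sketched numerically. Assuming hypothetical trial data (14 of 20 patients respond to the medication), the maximum likelihood estimate of the response rate θ is found by scanning candidate values of θ for the one that maximizes the binomial likelihood:

```python
import numpy as np

# Hypothetical data: 14 responders out of n = 20 patients.
n, k = 20, 14
thetas = np.linspace(0.01, 0.99, 99)
# Binomial log-likelihood, up to a constant that doesn't depend on theta
loglik = k * np.log(thetas) + (n - k) * np.log(1 - thetas)
theta_mle = thetas[np.argmax(loglik)]
print(round(theta_mle, 2))  # → 0.7, matching the analytic MLE k/n
```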
Bayes’ theorem also allows us to assess
the veracity of the published literature
• What is the probability that a published result is actually true?
• If we set the significance level α to 0.05, does this mean that the false positive rate in the field is 5%?
• No! This is a common misconception.
• The false positive rate also depends on the prior probability of a true effect in a given field, as well as on the statistical power of the studies.
Introducing PPV (Ioannidis, 2005):
• PPV = “positive predictive value” – the *post* study probability that a result is true.
• R: Ratio of true to false effects in a field.
• PPV links power (1−β), α and R:

PPV = (1−β)R / ((1−β)R + α) = (1−β)R / (R − βR + α)
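The formula translates directly into code; a minimal sketch, with illustrative values for power, α, and R:

```python
def ppv(power, alpha, R):
    """Ioannidis (2005): post-study probability that a significant
    result reflects a true effect. power = 1 - beta."""
    return power * R / (power * R + alpha)

# E.g. power = 0.8, alpha = 0.05, R = 0.25 (1 true effect per 4 false ones):
print(round(ppv(0.8, 0.05, 0.25), 2))  # → 0.8
```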
Deriving PPV

RESEARCH \ REALITY | TRUE  | FALSE | TOTAL
SIG                | 1−β   | α     | 1−β + α
NONSIG             | β     | 1−α   | β + 1−α
TOTAL              | 1     | 1     | 2

With NT true and NF false effects in the field (NT + NF = c):

RESEARCH \ REALITY | TRUE      | FALSE     | TOTAL
SIG                | NT·(1−β)  | NF·α      | NT·(1−β) + NF·α
NONSIG             | NT·β      | NF·(1−α)  | NT·β + NF·(1−α)
TOTAL              | NT        | NF        | NT + NF = c

Since R = NT/NF, we have NT = R·NF, so R·NF + NF = c.
Hence NF = c/(R+1) and NT = c·R/(R+1).
Deriving PPV

Substituting NT = c·R/(R+1) and NF = c/(R+1):

RESEARCH \ REALITY | TRUE            | FALSE         | TOTAL
SIG                | c·(1−β)·R/(R+1) | c·α/(R+1)     | c·(R·(1−β)+α)/(R+1)
NONSIG             | c·β·R/(R+1)     | c·(1−α)/(R+1) | c·(1 + R·β − α)/(R+1)
TOTAL              | c·R/(R+1)       | c/(R+1)       | c
Deriving PPV

PPV = p(effect true | significant) = p(sig | true)·p(true) / p(sig)
p(true) = NT/c = R/(R+1)
p(sig | true) = 1−β
p(sig) = p(sig | true)·p(true) + p(sig | false)·p(false)
p(sig) = (1−β)·R/(R+1) + α/(R+1)
PPV = (1−β)·[R/(R+1)] / [(1−β)·R/(R+1) + α/(R+1)]
PPV = (1−β)·R / ((1−β)·R + α)
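As a sanity check on this derivation, a hypothetical Monte Carlo sketch (all parameter values are assumed): simulate many studies and compare the empirical share of true effects among significant results with the closed-form PPV.

```python
import numpy as np

rng = np.random.default_rng(0)
R, alpha, power = 0.25, 0.05, 0.8
n = 200_000

# Each study tests a true effect with probability p(true) = R/(R+1).
true_effect = rng.random(n) < R / (R + 1)
# p(sig | true) = 1 - beta = power; p(sig | false) = alpha.
significant = rng.random(n) < np.where(true_effect, power, alpha)

ppv_sim = true_effect[significant].mean()
ppv_formula = power * R / (power * R + alpha)
print(round(ppv_sim, 2), round(ppv_formula, 2))  # both ≈ 0.8
```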
A study converts the prior odds of the effect being true (R) into the posterior probability (PPV), as a function of statistical power:
Confidence interval vs. credible interval
• Frequentist approach: The value of a parameter 𝜽 is unknown, but fixed.
• We can estimate it by taking samples from the population. This yields a (tychenic) sampling distribution of sample means, which we can use to calculate the CI.
• Example: You are an epidemiologist and want to estimate the prevalence of Herpes simplex in the population using a frequentist approach:
𝜽: The prevalence of Herpes simplex in
the population
[Figure: sampling distribution of sample proportions, centered near 0.5; x-axis from 0.44 to 0.56, y-axis: proportion.]
In a Bayesian framework, 𝜽 is a RV and has
a probability distribution
• This probability distribution corresponds to our
degree of belief.
• If this is our prior belief about the value of the
parameter, this is called the prior distribution.
• The prior distribution can take any shape. A
completely flat prior distribution is called an
“uninformative prior”.
• The sharper the prior distribution, the more
informative it is.
• Often used to model the prior distribution of a
proportion: The Beta distribution
Prior distributions of varying degrees of
informativeness:
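The varying informativeness can be illustrated with SciPy's Beta distribution; the three parameter pairs below are illustrative assumptions, from a flat prior to a sharp one:

```python
from scipy.stats import beta

# Three Beta priors for a proportion; larger parameters = sharper prior.
priors = {"flat Beta(1,1)": (1, 1),
          "mild Beta(5,5)": (5, 5),
          "sharp Beta(50,50)": (50, 50)}
for name, (a, b) in priors.items():
    # The prior SD shrinks as the prior becomes more informative.
    print(name, round(beta(a, b).std(), 3))
```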
So what?
• In Bayesian analysis, we can use the data from a study (which yield the likelihood) in combination with the prior distribution to compute a posterior distribution:
• Posterior ∝ Prior × Likelihood
• For something like Herpes simplex, we could model the likelihood with a binomial distribution (how many people infected, as a proportion of sample size):
• p(𝜽 | y) ∝ p(𝜽) × p(y | 𝜽)
• Data = y
Example: We are epidemiologists and want to
know the likely value of 𝜽 in a certain location.
• We have a somewhat informative prior from the
literature (say a Beta distribution with α = β = 10).
• We take a local sample of 100 people and see how
many are infected with the virus.
• This yields a posterior distribution of 𝜽:
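Because the Beta prior is conjugate to the binomial likelihood, the posterior has a closed form: Beta(α + y, β + n − y). A sketch, assuming a hypothetical count of 35 infected out of 100 (not given on the slide):

```python
# Beta-binomial conjugate update: Beta(10, 10) prior from the literature.
a_prior, b_prior = 10, 10
n, y = 100, 35            # assumed: 35 of 100 people infected

a_post = a_prior + y      # posterior is Beta(a_prior + y, b_prior + n - y)
b_post = b_prior + n - y
post_mean = a_post / (a_post + b_post)
post_mode = (a_post - 1) / (a_post + b_post - 2)
print(a_post, b_post, round(post_mean, 3), round(post_mode, 3))
# → 45 75 0.375 0.373
```

Under this assumed count, the posterior mode (44/118 ≈ 0.373) coincides with the point estimate on the following slide.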
What does our study yield?

[Figure: prior, likelihood, and posterior distributions of 𝜽 over 0 to 1.]

Posterior ∝ Prior × Likelihood

How do we get a new estimate of 𝜽 from the credible interval?
The credible interval covers a specified proportion (e.g. 95%) of the area under the posterior distribution:

[Figure: posterior distribution of 𝜽 over 0 to 1, with the 95% credible interval shaded around the point estimate 𝜽 = 0.373.]


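A 95% equal-tailed credible interval can be read off the posterior with the Beta quantile function. A sketch assuming a hypothetical Beta(45, 75) posterior (e.g. a Beta(10, 10) prior plus 35 infections in a sample of 100):

```python
from scipy.stats import beta

# Equal-tailed 95% credible interval: 2.5th and 97.5th posterior percentiles.
lo, hi = beta.ppf([0.025, 0.975], 45, 75)
print(round(lo, 3), round(hi, 3))
```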
The relative strength of the prior distribution to
determine the location and shape of the
posterior distribution at fixed likelihood:
The impact of larger samples (stronger
likelihood) on the posterior distribution at
a fixed prior
At small sample sizes, the prior matters a lot:
[Figure: 3 × 3 grid of posterior distributions of 𝜽 over 0 to 1, each at n = 20, for priors of varying informativeness.]


At large sample sizes, the prior doesn’t
matter much:
[Figure: 3 × 3 grid of posterior distributions of 𝜽 over 0 to 1, each at n = 500, for the same priors.]

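The pattern in these two grids can be reproduced with a few lines of arithmetic; the priors and observed counts below are illustrative assumptions (data simulated at a fixed true rate of 0.3):

```python
# Posterior mean of a Beta-binomial model under three priors centered at 0.5,
# for a small and a large sample. The prior pulls the estimate toward 0.5
# strongly at n = 20 but barely at n = 500.
for n in (20, 500):
    y = round(0.3 * n)                         # assumed observed count
    for a, b in ((1, 1), (10, 10), (50, 50)):  # flat → sharp prior
        post_mean = (a + y) / (a + b + n)
        print(n, (a, b), round(post_mean, 3))
```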