Bayesian statistics II
Probability
[Figure: two probability density curves, with probability density plotted against x over the range -5 to 5.]
Bayes' theorem:

p(A | B) = p(A) × p(B | A) / p(B)

where p(A) is the prior, p(B | A) is the likelihood, and p(A | B) is the posterior.
The Bayes factor:

BF = p(D | H1) / p(D | H0)
The history of Bayes factors
• Developed by Jeffreys (1935).
• Jeffreys called them “significance tests”.
• This alternative to frequentist testing was not widely appreciated until about the 1990s.
• They are still “catching on”.
What do Bayes Factors give us?
• They quantify the strength of the evidence in favor
of a hypothesis, given the data.
• Can be used for model comparison (which model is
more likely, given the data).
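As a sketch of how such a model comparison works (the numbers and the choice of hypotheses are illustrative, not from the slides): for binomial data we can compare H0: θ = 0.5 against H1: θ ~ Uniform(0, 1), since both marginal likelihoods have closed forms.

```python
from math import comb

def bayes_factor_binomial(k, n):
    """BF10 for k successes in n trials:
    H0: theta = 0.5  vs  H1: theta ~ Uniform(0, 1)."""
    # Marginal likelihood under H0: the binomial likelihood at theta = 0.5.
    m0 = comb(n, k) * 0.5 ** n
    # Under H1 the binomial likelihood, integrated against a uniform prior,
    # is 1 / (n + 1) for every k (a Beta-binomial identity).
    m1 = 1.0 / (n + 1)
    return m1 / m0

# 70 heads in 100 flips: strong evidence against a fair coin.
print(bayes_factor_binomial(70, 100))
# 50 heads in 100 flips: the data favour the simpler H0.
print(bayes_factor_binomial(50, 100))
```

A BF above 1 favours H1, below 1 favours H0; conventional rough cut-offs (e.g. BF > 3, BF > 10) grade the strength of evidence.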
Deriving the Bayes factor
The Bayes factor quantifies the strength of evidence for H1 over H0, assuming both are equally likely a priori (from Lee & Wagenmakers, 2013).
Example: Determining the most likely effectiveness of
a medication using maximum likelihood estimation
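A minimal sketch of the maximum likelihood idea, with hypothetical trial numbers: we evaluate the binomial log-likelihood on a grid and pick the effectiveness value that maximizes it.

```python
import numpy as np

# Hypothetical trial: 14 of 20 patients respond to the medication.
k, n = 14, 20

# Binomial log-likelihood over a grid of candidate effectiveness values.
theta = np.linspace(0.001, 0.999, 999)
log_lik = k * np.log(theta) + (n - k) * np.log(1 - theta)

# The MLE is the grid point with the highest log-likelihood.
theta_hat = theta[np.argmax(log_lik)]
print(theta_hat)
```

For binomial data the grid search just recovers the closed-form MLE k/n = 0.7; the grid version generalizes to models without a closed form.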
Bayes theorem also allows us to assess
the veracity of the published literature
• What is the probability that a published result
is actually true?
• If we set the significance level α to 0.05, does
this mean that the false positive rate in the
field is 5%?
• No! This is a common misconception.
• The probability that a published positive result is true also depends on the prior probability of a true effect in the field and on the statistical power of the studies.
Introducing PPV (Ioannidis, 2005):
• PPV = “positive predictive value”: the *post-study* probability that a significant result is true.
• R: the ratio of true to false effects in a field (the prior odds of a true effect).
• PPV links power (1 − β), α, and R:

PPV = (1 − β)R / ((1 − β)R + α)
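The formula is a one-liner; the example values below are illustrative.

```python
def ppv(power, alpha, R):
    """Positive predictive value (Ioannidis, 2005): the probability
    that a significant result reflects a true effect."""
    return power * R / (power * R + alpha)

# In a field with low prior odds of a true effect (R = 0.1),
# even 80% power and alpha = 0.05 give a PPV well below 1:
print(ppv(0.80, 0.05, 0.1))   # ~0.62
# With even prior odds (R = 1) the same study design does much better:
print(ppv(0.80, 0.05, 1.0))
```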
Deriving PPV

Per-study probabilities of each outcome:

RESEARCH \ REALITY   TRUE     FALSE    TOTAL
SIG                  1 − β    α        (1 − β) + α
TOTAL                1        1        2

Counts, with NT true effects and NF false effects studied, c = NT + NF:

RESEARCH \ REALITY   TRUE        FALSE        TOTAL
SIG                  NT(1 − β)   NF·α         NT(1 − β) + NF·α
NONSIG               NT·β        NF(1 − α)    NT·β + NF(1 − α)
TOTAL                NT          NF           NT + NF = c

Solving for NT and NF in terms of R and c:
R = NT/NF  ⟹  NT = R·NF
NT + NF = c  ⟹  R·NF + NF = c  ⟹  NF(R + 1) = c
NF = c/(R + 1),  NT = cR/(R + 1)
Substituting NT = cR/(R + 1) and NF = c/(R + 1):

RESEARCH \ REALITY   TRUE              FALSE            TOTAL
SIG                  c(1 − β)R/(R+1)   cα/(R+1)         c((1 − β)R + α)/(R+1)
NONSIG               cβR/(R+1)         c(1 − α)/(R+1)   c(1 + βR − α)/(R+1)
TOTAL                cR/(R+1)          c/(R+1)          c
PPV = p(effect true | significant) = p(sig | true) p(true) / p(sig)

p(true) = NT/c = R/(R + 1)
p(sig | true) = 1 − β
p(sig) = p(sig | true) p(true) + p(sig | false) p(false)
       = (1 − β)R/(R + 1) + α/(R + 1)

PPV = [(1 − β)R/(R + 1)] / [(1 − β)R/(R + 1) + α/(R + 1)]
    = (1 − β)R / ((1 − β)R + α)
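A quick Monte Carlo sanity check of the derivation (the parameter values are arbitrary): simulate many studies, let a fraction R/(R+1) investigate true effects, apply the power and false positive rate, and compare the simulated PPV with the closed form.

```python
import random

random.seed(1)

def simulate_ppv(power=0.8, alpha=0.05, R=0.25, n_studies=200_000):
    """Monte Carlo check of PPV = (1-beta)R / ((1-beta)R + alpha)."""
    sig_true = sig_total = 0
    p_true = R / (R + 1)                 # prior probability the effect is real
    for _ in range(n_studies):
        is_true = random.random() < p_true
        p_sig = power if is_true else alpha
        if random.random() < p_sig:      # the study comes out significant
            sig_total += 1
            sig_true += is_true
    return sig_true / sig_total

print(simulate_ppv())                          # simulated PPV
print(0.8 * 0.25 / (0.8 * 0.25 + 0.05))        # analytic PPV = 0.8
```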
A study converts the prior odds of the effect being true (R) into a posterior probability that it is true, the PPV, which can be plotted as a function of statistical power:
Confidence interval vs. credible interval
• Frequentist approach: the value of a parameter 𝜽 is unknown, but fixed.
• We can estimate it by taking samples from the population. This yields a sampling distribution of the sample estimates, which we can use to calculate the confidence interval (CI).
• Example: You are an epidemiologist and want to
estimate the prevalence of Herpes Simplex in the
population using a frequentist approach:
𝜽: The prevalence of Herpes simplex in
the population
[Figure: sampling distribution of the sample proportion, centered near 0.5, with the x-axis running from 0.44 to 0.56.]
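A minimal frequentist sketch with hypothetical sample numbers: a normal-approximation (Wald) 95% CI for the prevalence, built from a single sample proportion and its standard error.

```python
import math

# Hypothetical sample: 52 of 100 people test positive for the virus.
k, n = 52, 100
p_hat = k / n

# Standard error of the sample proportion, then a Wald 95% CI.
se = math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)
print(ci)
```

The frequentist reading: under repeated sampling, 95% of intervals constructed this way would cover the fixed true prevalence 𝜽.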
In a Bayesian framework, 𝜽 is a random variable (RV) and has a probability distribution
• This probability distribution corresponds to our
degree of belief.
• If this is our prior belief about the value of the
parameter, this is called the prior distribution.
• The prior distribution can take any shape. A
completely flat prior distribution is called an
“uninformative prior”.
• The sharper the prior distribution, the more
informative it is.
• Often used to model the prior distribution of a
proportion: The Beta distribution
Prior distributions of varying degrees of
informativeness:
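The varying informativeness can be made concrete by evaluating Beta densities at 𝜽 = 0.5 (the parameter values here are illustrative): a Beta(1, 1) prior is flat (uninformative), and the density peak grows as the prior sharpens.

```python
from math import gamma

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * x ** (a - 1) * (1 - x) ** (b - 1)

# Flat (uninformative) vs increasingly sharp (informative) priors at x = 0.5:
for a in (1, 2, 10, 50):
    print(a, beta_pdf(0.5, a, a))
```

Beta(1, 1) has density 1 everywhere; larger, equal shape parameters concentrate the prior ever more tightly around 0.5.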
So what?
• In Bayesian analysis, we can use the data from a
study (which yields the likelihood) in combination
with the prior distribution to compute a posterior
distribution:
• Posterior ∝ Prior × Likelihood
• For something like Herpes simplex, we could model the likelihood with a binomial distribution (how many people are infected, out of the sample size):
• p(𝜽 | y) ∝ p(𝜽) × p(y | 𝜽)
• y = the data
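This prior-times-likelihood update can be sketched with a simple grid approximation (the counts are made up): evaluate prior and binomial likelihood on a grid of 𝜽 values, multiply, and renormalize.

```python
import numpy as np

# Grid approximation of the posterior: p(theta | y) ∝ p(theta) p(y | theta).
theta = np.linspace(0.001, 0.999, 999)

prior = np.ones_like(theta)          # flat (uninformative) prior
k, n = 30, 100                       # y: 30 infected out of 100 sampled
likelihood = theta ** k * (1 - theta) ** (n - k)

posterior = prior * likelihood
posterior /= posterior.sum() * (theta[1] - theta[0])   # normalise to a density

print(theta[np.argmax(posterior)])   # posterior mode; with a flat prior = k/n
```

With a flat prior the posterior mode coincides with the MLE; an informative prior would pull it toward the prior's peak.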
Example: We are epidemiologists and want to
know the likely value of 𝜽 in a certain location.
• We have a somewhat informative prior from the
literature (say a Beta distribution with α = β = 10).
• We take a local sample of 100 people and see how
many are infected with the virus.
• This yields a posterior distribution of 𝜽:
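Because the Beta prior is conjugate to the binomial likelihood, the posterior is again a Beta distribution with simple updated parameters. A sketch with an assumed sample outcome (30 of 100 infected; the slides do not give the count):

```python
import numpy as np

# Conjugacy: Beta(a, b) prior + binomial data (k of n) -> Beta(a + k, b + n - k).
a, b = 10, 10          # somewhat informative prior from the literature
k, n = 30, 100         # hypothetical local sample: 30 of 100 infected

a_post, b_post = a + k, b + n - k
post_mean = a_post / (a_post + b_post)
print(post_mean)       # 40 / 120 = 1/3

# 95% credible interval from the grid-evaluated posterior CDF.
theta = np.linspace(0, 1, 100_001)
pdf = theta ** (a_post - 1) * (1 - theta) ** (b_post - 1)
cdf = np.cumsum(pdf) / pdf.sum()
lo, hi = theta[np.searchsorted(cdf, 0.025)], theta[np.searchsorted(cdf, 0.975)]
print(lo, hi)
```

Unlike the frequentist CI, this credible interval has the direct reading: given prior and data, 𝜽 lies in the interval with 95% probability.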
What does our study yield?
[Figure: six posterior distributions, each based on a sample of n = 20.]