
Bayes factor

From Wikipedia, the free encyclopedia

In statistics, the use of Bayes factors is a Bayesian alternative to classical hypothesis testing.
Bayesian model comparison is a method of model selection based on Bayes factors.
Definition

The posterior probability Pr(M|D) of a model M given data D is given by Bayes' theorem:

    Pr(M|D) = Pr(D|M) Pr(M) / Pr(D)
The key data-dependent term Pr(D|M) is a likelihood, and represents the probability that some
data are produced under the assumption of this model, M; evaluating it correctly is the key to
Bayesian model comparison.
Given a model selection problem in which we have to choose between two models on the
basis of observed data D, the plausibility of the two different models M1 and M2, parametrised
by model parameter vectors θ1 and θ2, is assessed by the Bayes factor K given by

    K = Pr(D|M1) / Pr(D|M2)
      = ∫ Pr(θ1|M1) Pr(D|θ1, M1) dθ1 / ∫ Pr(θ2|M2) Pr(D|θ2, M2) dθ2
If instead of the Bayes factor integral, the likelihood corresponding to the maximum likelihood
estimate of the parameter for each model is used, then the test becomes a classical
likelihood-ratio test. Unlike a likelihood-ratio test, this Bayesian model comparison does not
depend on any single set of parameters, as it integrates over all parameters in each model
(with respect to the respective priors). However, an advantage of the use of Bayes factors is
that they automatically, and quite naturally, include a penalty for including too much model
structure, and thus guard against overfitting. For models where an explicit version of the
likelihood is not available or too costly to evaluate numerically, approximate Bayesian
computation can be used for model selection in a Bayesian framework,
with the caveat that
approximate-Bayesian estimates of Bayes factors are often biased.
Other approaches are:
to treat model comparison as a decision problem, computing the expected value or cost
of each model choice;
to use minimum message length (MML).
Bayes factor - Wikipedia, the free encyclopedia
1 of 6 10/09/2014 1:47 PM
Interpretation

A value of K > 1 means that M1 is more strongly supported by the data under consideration
than M2. Note that classical hypothesis testing gives one hypothesis (or model) preferred
status (the 'null hypothesis'), and only considers evidence against it. Harold Jeffreys gave a
scale for interpretation of K:
K             | dB       | bits       | Strength of evidence
< 1:1         | < 0      | < 0        | Negative (supports M2)
1:1 to 3:1    | 0 to 5   | 0 to 1.6   | Barely worth mentioning
3:1 to 10:1   | 5 to 10  | 1.6 to 3.3 | Substantial
10:1 to 30:1  | 10 to 15 | 3.3 to 5.0 | Strong
30:1 to 100:1 | 15 to 20 | 5.0 to 6.6 | Very strong
> 100:1       | > 20     | > 6.6      | Decisive
The second column gives the corresponding weights of evidence in decibans (tenths of a
power of 10); bits are added in the third column for clarity. According to I. J. Good, a change
in weight of evidence of 1 deciban, or about 1/3 of a bit (i.e. a change in odds from evens to
about 5:4), is about as fine a change in degree of belief in a hypothesis as humans can
reasonably perceive in everyday use.
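The conversions between odds, decibans, and bits are simple logarithms; a small sketch, with the cutoff values taken from Jeffreys's table above:

```python
from math import log10, log2

def evidence_strength(K):
    """Classify a Bayes factor K on Jeffreys's scale, using the
    weight of evidence dB = 10 * log10(K) in decibans."""
    dB = 10 * log10(K)
    if dB < 0:
        return "Negative"
    if dB < 5:
        return "Barely worth mentioning"
    if dB < 10:
        return "Substantial"
    if dB < 15:
        return "Strong"
    if dB < 20:
        return "Very strong"
    return "Decisive"

K = 50.0
dB = 10 * log10(K)   # weight of evidence in decibans
bits = log2(K)       # the same evidence expressed in bits
print(dB, bits, evidence_strength(K))  # ≈ 16.99 decibans, 5.64 bits, "Very strong"
```

A K of 50 (between 30:1 and 100:1) lands in the "Very strong" band, matching the table.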
An alternative table, widely cited, is provided by Kass and Raftery (1995):
2 ln K  | K         | Strength of evidence
0 to 2  | 1 to 3    | Not worth more than a bare mention
2 to 6  | 3 to 20   | Positive
6 to 10 | 20 to 150 | Strong
> 10    | > 150     | Very strong
The use of Bayes factors or classical hypothesis testing takes place in the context of inference
rather than decision-making under uncertainty. That is, we merely wish to find out which
hypothesis is true, rather than actually making a decision on the basis of this information.
Frequentist statistics draws a strong distinction between these two because classical
hypothesis tests are not coherent in the Bayesian sense. Bayesian procedures, including
Bayes factors, are coherent, so there is no need to draw such a distinction. Inference is then
simply regarded as a special case of decision-making under uncertainty in which the resulting
action is to report a value. For decision-making, Bayesian statisticians might use a Bayes
factor combined with a prior distribution and a loss function associated with making the wrong
choice. In an inference context the loss function would take the form of a scoring rule. Use of
a logarithmic score function, for example, leads to the expected utility taking the form of the
Kullback–Leibler divergence.
Example

Suppose we have a random variable that produces either a success or a failure. We want to
compare a model M1 where the probability of success is q = 1/2, and another model M2 where
q is completely unknown and we take a prior distribution for q which is uniform on [0,1]. We
take a sample of 200, and find 115 successes and 85 failures. The likelihood can be
calculated according to the binomial distribution:

    Pr(115 successes, 85 failures | q) = C(200, 115) q^115 (1 − q)^85

So we have

    Pr(D|M1) = C(200, 115) (1/2)^200 ≈ 0.00596
    Pr(D|M2) = ∫ C(200, 115) q^115 (1 − q)^85 dq = 1/201 ≈ 0.00497   (integral over [0, 1])
The ratio Pr(D|M1)/Pr(D|M2) is then 1.197..., which is "barely worth mentioning" even though it
points very slightly towards M1.
This is not the same as a classical likelihood-ratio test, which would have found the maximum
likelihood estimate for q, namely 115/200 = 0.575, and used that to get a ratio of 0.1045...
(rather than averaging over all possible q), and so pointing towards M2. Alternatively,
Edwards's "exchange rate" of two units of likelihood per degree of freedom suggests that
M2 is preferable (just) to M1, as 1/0.1045 ≈ 9.57 and e² ≈ 7.39: the extra likelihood
compensates for the unknown parameter in M2.
A frequentist hypothesis test of M1 (here considered as a null hypothesis) would have
produced a more dramatic result, saying that M1 could be rejected at the 5% significance
level, since the probability of getting 115 or more successes from a sample of 200 if q = 1/2 is
0.0200..., and as a two-tailed test the probability of getting a figure as extreme as or more
extreme than 115 is 0.0400... Note that 115 is more than two standard deviations away from 100.
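The numbers in this example can be reproduced exactly; a short check in Python, using the fact that the uniform-prior marginal likelihood for k successes in n trials has the closed form 1/(n + 1):

```python
from math import comb

n, k = 200, 115

# Pr(D|M1): binomial likelihood at q = 1/2.
pD_M1 = comb(n, k) * 0.5**n

# Pr(D|M2): integral of C(n,k) q^k (1-q)^(n-k) dq over [0, 1],
# which equals 1/(n+1) under a uniform prior on q.
pD_M2 = 1 / (n + 1)

K = pD_M1 / pD_M2  # Bayes factor, ≈ 1.197

# Classical likelihood ratio at the MLE q_hat = k/n = 0.575.
q_hat = k / n
lr = (0.5**n) / (q_hat**k * (1 - q_hat)**(n - k))  # ≈ 0.1045, favouring M2

# One-sided frequentist p-value: Pr(X >= 115 | q = 1/2), ≈ 0.0200.
p_one_sided = sum(comb(n, i) for i in range(k, n + 1)) * 0.5**n

print(K, lr, p_one_sided)
```

The three quantities illustrate the contrast drawn in the text: the Bayes factor is nearly indifferent between the models, while the maximum-likelihood ratio and the significance test both point away from M1.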
M2 is a more complex model than M1 because it has a free parameter which allows it to model
the data more closely. The ability of Bayes factors to take this into account is a reason why
Bayesian inference has been put forward as a theoretical justification for and generalisation of
Occam's razor, reducing Type I errors.
See also
Akaike information criterion
Approximate Bayesian Computation
Deviance information criterion
Model selection
Schwarz's Bayesian information criterion
Wallace's Minimum Message Length (MML)
Statistical ratios
Odds ratio
Relative risk
References

Goodman S (1999). "Toward evidence-based medical statistics. 1: The P value fallacy". Ann
Intern Med 130 (12): 995–1004. doi:10.7326/0003-4819-130-12-199906150-00008.
PMID 10383371.
Goodman S (1999). "Toward evidence-based medical statistics. 2: The Bayes factor". Ann
Intern Med 130 (12): 1005–13. doi:10.7326/0003-4819-130-12-199906150-00019.
PMID 10383350.

Robert E. Kass and Adrian E. Raftery (1995). "Bayes Factors". Journal of the American
Statistical Association 90 (430): 773–795.
Toni, T.; Stumpf, M.P.H. (2009). "Simulation-based model selection for dynamical systems in
systems and population biology". Bioinformatics 26 (1): 104–10.
doi:10.1093/bioinformatics/btp619. PMC 2796821. PMID 19880371.
Robert, C.P.; Cornuet, J.; Marin, J.; Pillai, N.S. (2011). "Lack of confidence in approximate
Bayesian computation model choice". Proceedings of the National Academy of Sciences 108
(37): 15112–15117. doi:10.1073/pnas.1102900108. PMID 21876135.
H. Jeffreys (1961). The Theory of Probability (3rd ed.). Oxford. p. 432.
Good, I.J. (1979). "Studies in the History of Probability and Statistics. XXXVII A. M. Turing's
statistical work in World War II". Biometrika 66 (2): 393–396. doi:10.1093/biomet/66.2.393.
MR 82c:01049.
"Sharpening Ockham's Razor On a Bayesian Strop".

Further reading

Gelman, A.; Carlin, J.; Stern, H.; Rubin, D. (1995). Bayesian Data Analysis. London:
Chapman and Hall. ISBN 0-412-03991-5.
Bernardo, J.; Smith, A. F. M. (1994). Bayesian Theory. New York: John Wiley.
ISBN 0-471-92416-4.
Lee, P. M. (1989). Bayesian Statistics. Arnold. ISBN 0-85264-298-9.
Denison, D. G. T.; Holmes, C. C.; Mallick, B. K.; Smith, A. F. M. (2002). Bayesian
Methods for Nonlinear Classification and Regression. New York: John Wiley.
ISBN 0-471-49036-9.
Duda, Richard O.; Hart, Peter E.; Stork, David G. (2000). "Section 9.6.5". Pattern
Classification (2nd ed.). Wiley. pp. 487–489. ISBN 0-471-05669-3.
Chapter 24 in Probability Theory: The Logic of Science by E. T. Jaynes, 1994.
David J.C. MacKay (2003). Information Theory, Inference and Learning Algorithms. CUP.
ISBN 0-521-64298-1 (also available online).
Winkler, Robert (2003). Introduction to Bayesian Inference and Decision (2nd ed.).
Probabilistic. ISBN 0-9647938-4-9.
External links
Web-based Bayes-factor calculator for t-tests, regression designs, and binomially
distributed data.
BayesFactor, an R package for computing Bayes factors in common research designs
The on-line textbook Information Theory, Inference, and Learning Algorithms, by David J.C.
MacKay, discusses Bayesian model comparison in Chapter 28, p. 343.

Categories: Bayesian inference | Model selection | Statistical ratios
This page was last modified on 22 July 2014 at 13:44.
Text is available under the Creative Commons Attribution-ShareAlike License; additional
terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy.
Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit
organization.