Abstract: We give suggestions for the presentation of research results from frequentist, information-theoretic, and Bayesian analysis paradigms, followed by several general suggestions. The information-theoretic and Bayesian methods offer alternative approaches to data analysis and inference compared to traditionally used methods. Guidance is lacking on the presentation of results under these alternative procedures and on non-testing aspects of classical frequentist methods of statistical analysis. Null hypothesis testing has come under intense criticism. We recommend less reporting of the results of statistical tests of null hypotheses in cases where the null is surely false anyway, or where the null hypothesis is of little interest to science or management.

Key words: AIC, Bayesian statistics, frequentist methods, information-theoretic methods, likelihood, publication guidelines.
For many years, researchers have relied heavily on testing null hypotheses in the analysis of fisheries and wildlife research data. For example, an average of 42 P-values testing the statistical significance of null hypotheses was reported for articles in The Journal of Wildlife Management during 1994-1998 (Anderson et al. 2000). This analysis paradigm has been challenged (Cherry 1998, Johnson 1999), and alternative approaches have been offered (Burnham and Anderson 2001). The simplest alternative is to employ a variety of classical frequentist methods (e.g., analysis of variance or covariance, or regression) that focus on the estimation of effect size and measures of its precision, rather than on statistical tests, P-values, and arbitrary, dichotomous statements about statistical significance or lack thereof. Estimated effect sizes (e.g., the difference between the estimated treatment and control means) are the results useful in future meta-analysis (Hedges and Olkin 1985), while P-values are almost useless in these important syntheses. A second alternative is relatively new and based on criteria that estimate Kullback-Leibler information loss (Kullback and Leibler 1951). These information-theoretic approaches allow a ranking of various research hypotheses (represented by models), and several quantities can be computed to estimate a formal strength of evidence for alternative hypotheses. Finally, methods based on Bayes' theorem have become useful in applied sciences due mostly to advances in computer technology (Gelman et al. 1995).

Over the years, standard methods for presenting results from statistical hypothesis tests have evolved. The Wildlife Society (1995a,b), for example, addressed Type II errors, statistical power, and related issues. However, articles by Cherry (1998), Johnson (1999), and Anderson et al. (2000) provide reason to reflect on how research results are best presented. Anderson et al. (2000) estimated that 47% of the P-values reported recently in The Journal of Wildlife Management were naked (i.e., only the P-value is presented with a statement about its significance or lack of significance, without estimated effect size or even the sign of the difference being provided). Reporting of such results provides no information and is thus without meaning. Perhaps more importantly, there are thousands of null hypotheses tested and reported each year in biological journals that are clearly false on simple a priori grounds (Johnson 1999). These are called "silly nulls" and account for over 90% of the null hypotheses tested in Ecology and The Journal of Wildlife Management (Anderson et al. 2000). We seem to be failing by addressing so many trivial issues in theoretical and applied ecology.

¹ Employed by U.S. Geological Survey, Biological Resources Division.
² E-mail: anderson@cnr.colostate.edu
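The information-theoretic ranking of research hypotheses mentioned in the introduction can be sketched numerically. The candidate models, log-likelihood values, and parameter counts below are hypothetical, chosen only to show how AIC, ΔAIC, and Akaike weights are computed from a model set:

```python
import math

# Hypothetical candidate models: (name, maximized log-likelihood, no. parameters).
# These numbers are invented for illustration, not taken from the article.
models = [("constant survival", -420.3, 2),
          ("sex effect",        -417.9, 3),
          ("sex + age effect",  -417.2, 5)]

# AIC = -2 log(L) + 2K for each model
aics = {name: -2.0 * loglik + 2.0 * k for name, loglik, k in models}

best = min(aics.values())
deltas = {name: aic - best for name, aic in aics.items()}        # Delta-AIC
raw = {name: math.exp(-0.5 * d) for name, d in deltas.items()}   # relative likelihoods
total = sum(raw.values())
weights = {name: r / total for name, r in raw.items()}           # Akaike weights (sum to 1)

for name, _, _ in models:
    print(f"{name}: AIC = {aics[name]:.1f}, dAIC = {deltas[name]:.1f}, w = {weights[name]:.2f}")
```

The model with the smallest AIC is best supported by the data, and the weights provide the formal strength of evidence for each hypothesis in the set; Burnham and Anderson (1998) discuss these quantities in detail.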
374 ANALYTICAL PRESENTATION SUGGESTIONS • Anderson et al. J. Wildl. Manage. 65(3):2001
Articles that employ silly nulls and statistical tests of hypotheses known to be false severely retard progress in our understanding of ecological systems and the effects of management programs (O'Connor 2000). The misuse and overuse of P-values is astonishing. Further, there is little analogous guidance for authors to present results of data analysis under the newer information-theoretic or Bayesian methods.

We suggest how to present results of data analysis under each of these 3 statistical paradigms: classical frequentist, information-theoretic, and Bayesian. We make no recommendation on the choice of analysis; instead, we focus on suggestions for the presentation of results of the data analysis. We assume authors are familiar with the analysis paradigm they have used; thus, we will not provide introductory material here.

Frequentist Methods
Frequentist methods, dating back at least a century, are much more than merely test statistics and P-values. P-values resulting from statistical tests of null hypotheses are usually of far less value than estimates of effect size. Authors frequently report the results of null hypothesis tests (test statistics, degrees of freedom, P-values) when, in less space, they could often report more complete, informative material conveyed by estimates of effect size and their standard errors or confidence intervals (e.g., the effect of the neck collars on winter survival probability was -0.036, SE = 0.012).

The prevalence of testing null hypotheses that are uninteresting (or even silly) is quite high. For example, Anderson et al. (2000) found an average of 6,188 P-values per year (1993-1997) in Ecology and 5,263 per year (1994-1998) in The Journal of Wildlife Management and suggested that these large frequencies represented a misuse and overuse of null hypothesis testing methods. Johnson (1999) and Anderson et al. (2000) give examples of null hypotheses tested that were clearly of little biological interest, or were entirely unsupported before the study was initiated. We strongly recommend a substantial decrease in the reporting of results of null hypothesis tests when the null is trivial or uninteresting.

Naked P-values (i.e., those reported without estimates of effect size, its sign, and a measure of precision) are especially to be avoided. Nonparametric tests (like their parametric counterparts) are based on estimates of effect size, although usually only the direction of the effect is reported (a nearly naked P-value). The problem with naked and nearly naked P-values is that their magnitude is often interpreted as indicative of effect size. It is misleading to interpret small P-values as indicating large effect sizes, because small P-values can also result from low variability or large sample sizes. P-values are not a proper strength of evidence (Royall 1997, Sellke et al. 2001).

We encourage authors to carefully consider whether the information they convey in the language of null hypothesis testing could be greatly improved by instead reporting estimates and measures of precision. Emphasizing estimation over hypothesis testing in the reporting of the results of data analysis helps protect against the pitfalls associated with the failure to distinguish between statistical significance and biological significance (Yoccoz 1991).

We do not recommend reporting test statistics and P-values from observational studies, at least not without appropriate caveats (Sellke et al. 2001). Such results are suggestive rather than conclusive given the observational nature of the data. In strict experiments, these quantities can be useful, but we still recommend a focus on the estimation of effect size rather than on P-values and their supposed statistical significance.

The computer output of many canned statistical packages contains numerous test statistics and P-values, many of which are of little interest; reporting these values may create an aura of scientific objectivity when both the objectivity and substance are often lacking. We encourage authors to resist the temptation to report dozens of P-values only because these appear on computer output.

Do not claim to have proven the null hypothesis; this is a basic tenet of science. If a test yields a nonsignificant P-value, it may not be unreasonable to state that "the test failed to reject the null hypothesis" or that "the results seem consistent with the null hypothesis" and then discuss Type I and II errors. However, these classical issues are not necessary when discussing the estimated effect size (e.g., "The estimated effect of the treatment was small," and then give the estimate and a measure of precision).

Do not report estimated test power after a statistical test has been conducted and found to be nonsignificant, as such post hoc power is not meaningful (Goodman and Berlin 1994). A priori power and sample size considerations are important in planning an experimental design, but estimates of post hoc power should not be reported (Gerard et al. 1998, Hoenig and Heisey 2001).
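The contrast drawn in this section, between a naked P-value and an estimate with a measure of precision, can be illustrated with simulated data. Everything below is invented for illustration: the group sizes, the survival-like measurements, and the large-sample z-test, which stands in for whatever test an author might actually report. The point is that a very small P-value can accompany a biologically trivial effect when samples are large:

```python
import math
import random

random.seed(42)

# Simulated example (not real data): two groups whose true means differ by a
# biologically trivial 0.5 units, measured with a very large sample size.
n = 100_000
control = [random.gauss(100.0, 10.0) for _ in range(n)]
treatment = [random.gauss(100.5, 10.0) for _ in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

effect = mean(treatment) - mean(control)                 # estimated effect size
se = math.sqrt(var(treatment) / n + var(control) / n)    # its standard error

# Large-sample z-test of the (uninteresting) null of exactly zero difference.
z = effect / se
p = math.erfc(abs(z) / math.sqrt(2.0))                   # two-sided P-value

# The P-value is tiny, yet the effect itself is small relative to the
# within-group variability; the estimate and interval are what inform readers.
lo, hi = effect - 1.96 * se, effect + 1.96 * se
print(f"effect = {effect:.3f}, SE = {se:.3f}, 95% CI = ({lo:.3f}, {hi:.3f}), P = {p:.2g}")
```

Reporting "effect = 0.5, SE = 0.045" conveys both the direction and the magnitude of the difference; reporting only "P < 0.001" conveys neither.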
…Discussion section. Give estimates of the important parameters (e.g., effect size) and measures of precision (preferably a confidence interval).

…in sufficient detail. In particular, MCMC requires diagnostics to indicate that the posterior distribution has been adequately estimated.
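The fragment above notes that MCMC output requires convergence diagnostics before the posterior can be trusted. One widely used check is the Gelman-Rubin potential scale reduction factor, computed over several parallel chains. The sketch below applies it to toy sequences of draws rather than output from any real sampler:

```python
import math
import random

random.seed(1)

def potential_scale_reduction(chains):
    """Gelman-Rubin R-hat for one scalar parameter; values near 1 suggest mixing.

    chains: list of equal-length lists of posterior draws, one list per chain.
    """
    m = len(chains)                       # number of chains
    n = len(chains[0])                    # draws per chain
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)   # between-chain variance
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)            # within-chain variance
            for c, mu in zip(chains, means)) / m
    var_plus = (n - 1) / n * w + b / n    # pooled estimate of posterior variance
    return math.sqrt(var_plus / w)

# Toy "chains": independent draws from the same normal target, so R-hat is near 1.
mixed = [[random.gauss(0.0, 1.0) for _ in range(5000)] for _ in range(4)]

# Chains stuck in different regions (a sampler that never mixed): R-hat is large.
stuck = [[random.gauss(mu, 1.0) for _ in range(5000)] for mu in (0.0, 5.0, -5.0, 10.0)]

print("mixed R-hat:", round(potential_scale_reduction(mixed), 3))
print("stuck R-hat:", round(potential_scale_reduction(stuck), 3))
```

Reporting such a diagnostic (along with chain length, burn-in, and the number of chains) is one concrete way to document that the posterior distribution has been adequately estimated.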
…explanatory or predictor variables (see McCullagh and Nelder 1989:8).

Avoid confusing low frequencies with small sample sizes. If one finds only 4 birds on 230 plots, the proportion of plots with birds can be precisely estimated. Alternatively, if the birds are the object of study, the 230 plots are irrelevant, and the sample size (4) is very small.

It is important to separate analysis of results based on questions and hypotheses formed before examining the data from results found after sequentially examining the results of data analyses. The first approach tends to be more confirmatory, while the second approach tends to be more exploratory. In particular, if the data analysis suggests a particular pattern leading to an interesting hypothesis, then at this midway point few statistical tests or measures of precision remain valid (Lindsey 1999a,b; White 2000). That is, an inference concerning patterns or hypotheses as being an actual feature of the population or process of interest is not well supported (e.g., likely to be spurious). Conclusions reached after repeated examination of the results of prior analyses, while interesting, cannot be taken with the same degree of confidence as those from the more confirmatory analysis. However, these post hoc results often represent intriguing hypotheses to be readdressed with a new, independent set of data. Thus, as part of the Introduction, authors should note the degree to which the study was exploratory versus confirmatory. Provide information concerning any post hoc analyses in the Discussion section.

Statistical approaches are increasingly important in many areas of applied science. The field of statistics is a science, with new discoveries leading to changing paradigms. New methods sometimes require new ways of effectively reporting results. We should be able to evolve as progress is made and changes are necessary. We encourage wildlife researchers and managers to capitalize on modern methods and to suggest how the results from such methods might be best presented. We hope our suggestions will be viewed as constructive.

LITERATURE CITED
ANDERSON, D. R., K. P. BURNHAM, W. R. GOULD, AND S. CHERRY. 2001. Concerns about finding effects that are actually spurious. Wildlife Society Bulletin 29:311-316.
———, ———, AND W. L. THOMPSON. 2000. Null hypothesis testing: problems, prevalence, and an alternative. Journal of Wildlife Management 64:912-923.
———, ———, AND G. C. WHITE. 1994. AIC model selection in overdispersed capture-recapture data. Ecology 75:1780-1793.
BARNETT, V. 1999. Comparative statistical inference. John Wiley & Sons, New York, USA.
BURNHAM, K. P., AND D. R. ANDERSON. 1998. Model selection and inference: a practical information-theoretic approach. Springer-Verlag, New York, USA.
———, AND ———. 2001. Kullback-Leibler information as a basis for strong inference in ecological studies. Wildlife Research 28: in press.
CARLIN, B. P., AND T. A. LOUIS. 1996. Bayes and empirical Bayes methods for data analysis. Chapman & Hall, New York, USA.
CHAMBERLIN, T. C. 1965 (1890). The method of multiple working hypotheses. Science 148:754-759. (Reprint of 1890 paper in Science.)
CHERRY, S. 1998. Statistical tests in publications of The Wildlife Society. Wildlife Society Bulletin 26:947-953.
ELLISON, A. M. 1996. An introduction to Bayesian inference for ecological research and environmental decision-making. Ecological Applications 6:1036-1046.
GELMAN, A., J. B. CARLIN, H. S. STERN, AND D. B. RUBIN. 1995. Bayesian data analysis. Chapman & Hall, New York, USA.
———, AND X.-L. MENG. 1996. Model checking and model improvement. Pages 189-201 in W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, editors. Markov chain Monte Carlo in practice. Chapman & Hall, New York, USA.
GERARD, P. D., D. R. SMITH, AND G. WEERAKKODY. 1998. Limits of retrospective power analysis. Journal of Wildlife Management 62:801-807.
GEYER, C. J. 1992. Practical Markov chain Monte Carlo (with discussion). Statistical Science 7:473-503.
GOODMAN, S. N., AND J. A. BERLIN. 1994. The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Annals of Internal Medicine 121:200-206.
HEDGES, L. V., AND I. OLKIN. 1985. Statistical methods for meta-analysis. Academic Press, London, United Kingdom.
HOENIG, J. M., AND D. M. HEISEY. 2001. The abuse of power: the pervasive fallacy of power calculations for data analysis. The American Statistician 55:19-24.
JOHNSON, D. H. 1999. The insignificance of statistical significance testing. Journal of Wildlife Management 63:763-772.
KULLBACK, S., AND R. A. LEIBLER. 1951. On information and sufficiency. Annals of Mathematical Statistics 22:79-86.
LEE, P. M. 1997. Bayesian statistics, an introduction. Second edition. John Wiley & Sons, New York, USA.
LIANG, K.-Y., AND P. MCCULLAGH. 1993. Case studies in binary dispersion. Biometrics 49:623-630.
LINDSEY, J. K. 1999a. Some statistical heresies. The Statistician 48:1-40.
———. 1999b. Revealing statistical principles. Oxford University Press, New York, USA.
MCCULLAGH, P., AND J. A. NELDER. 1989. Generalized linear models. Chapman & Hall, London, United Kingdom.
O'CONNOR, R. J. 2000. Why ecology lags behind biology. The Scientist 14(20):35.
ROYALL, R. M. 1997. Statistical evidence: a likelihood paradigm. Chapman & Hall, London, United Kingdom.
RUBIN, D. B. 1984. Bayesianly justifiable and relevant frequency calculations for the applied statistician. Annals of Statistics 12:1151-1172.
SCHMITT, S. A. 1969. Measuring uncertainty: an elementary introduction to Bayesian statistics. Addison-Wesley, Reading, Massachusetts, USA.
SCHWARZ, G. 1978. Estimating the dimension of a model. Annals of Statistics 6:461-464.
SELLKE, T., M. J. BAYARRI, AND J. O. BERGER. 2001. Calibration of p values for testing precise null hypotheses. The American Statistician 55:62-71.
TUFTE, E. R. 1983. The visual display of quantitative information. Graphics Press, Cheshire, Connecticut, USA.
WADE, P. R. 2000. Bayesian methods in conservation biology. Conservation Biology 14:1308-1316.
WHITE, G. C., D. R. ANDERSON, K. P. BURNHAM, AND D. L. OTIS. 1982. Capture-recapture and removal methods for sampling closed populations. Los Alamos National Laboratory Report LA-8787-NERP, Los Alamos, New Mexico, USA.
WHITE, H. 2000. A reality check for data snooping. Econometrica 68:1097-1126.
THE WILDLIFE SOCIETY. 1995a. Journal news. Journal of Wildlife Management 59:196-198.
———. 1995b. Journal news. Journal of Wildlife Management 59:630.
YOCCOZ, N. G. 1991. Use, overuse, and misuse of significance tests in evolutionary biology and ecology. Bulletin of the Ecological Society of America 72:106-111.