Editorial
Adv Physiol Educ 31: 294, 2007; doi:10.1152/advan.00096.2007.

Update on statistics guidelines for American Physiological Society journals

Dee U. Silverthorn
Section of Integrative Biology, University of Texas, Austin, Texas

THREE YEARS AGO, American Physiological Society (APS) journals published an editorial by Curran-Everett and Benos with suggested guidelines for reporting statistics in research articles (1). The authors then tracked whether their guidelines were implemented and prepared a followup report (2). Their request to have the sequel published in all APS journals met with unexpected resistance, however, and engendered considerable discussion, both formal and informal, at the March 2007 Editors' Meeting. Some editors fully supported the guidelines, but others had reservations about the recommendations, feeling that they might be viewed as a mandate and thereby drive authors to less prescriptive journals. The outcome of the discussion was the decision to publish the followup article, along with invited commentaries, in Advances in Physiology Education, in the hope of further educating authors about the rationale for appropriate statistical reporting. The sequel article, commentaries, and closing comments by Curran-Everett and Benos appear in this issue of Advances in Physiology Education.

REFERENCES

1. Curran-Everett D, Benos DJ. Guidelines for reporting statistics in journals published by the American Physiological Society. Adv Physiol Educ 28: 85–87, 2004.
2. Curran-Everett D, Benos DJ. Guidelines for reporting statistics in journals published by the American Physiological Society: the sequel. Adv Physiol Educ; doi:10.1152/advan.00022.2007.

Address for reprint requests and other correspondence: D. U. Silverthorn, Section of Integrative Biology, Univ. of Texas, 1 University Station, C0930, Austin, TX 78712 (e-mail: silverthorn@mail.utexas.edu).


Perspectives
Adv Physiol Educ 31: 295–298, 2007; doi:10.1152/advan.00022.2007.

Guidelines for reporting statistics in journals published by the American Physiological Society: the sequel

Douglas Curran-Everett¹,²,³ and Dale J. Benos⁴
¹Division of Biostatistics, National Jewish Medical and Research Center, and Departments of ²Preventive Medicine and Biometrics and ³Physiology and Biophysics, School of Medicine, University of Colorado at Denver and Health Sciences Center, Denver, Colorado; and ⁴Department of Physiology and Biophysics, University of Alabama, Birmingham, Alabama

WE SCIENTISTS rely on statistics. In part, this is because we use statistics to report our own science and to interpret the published science of others. For some of us, reporting and interpreting statistics can be like reading an unfamiliar language: it is awkward to do, and it is easy to misinterpret meaning. To facilitate these tasks, in 2004, we wrote an editorial (8) in which we proposed specific guidelines to help investigators analyze data and communicate results. These guidelines for reporting statistics can be accessed through the American Physiological Society (APS) Information for Authors (4).

In this followup editorial, we report initial reactions to the guidelines and the subsequent impact of the guidelines on reporting practices. We also revisit three guidelines. In 2004, we hoped the guidelines would improve and standardize the caliber of statistical information reported throughout journals published by the APS. We still do.

Initial Reactions

Initial reactions to the guidelines were mixed. What we heard, however, was quite positive.

    This is really very helpful indeed. I wish all journals would adopt this as standard.
    Physiologist¹

We were delighted that many of the people who congratulated—even thanked—us were statisticians.

    I have just read the guidelines published in Physiological Genomics and wish to congratulate you on a nice job! [About] 6 years ago I reviewed a paper for an APS journal . . . I offered several trivial biological suggestions and then asked [the authors] to report the [standard deviation] rather than [standard error] to represent the variability about their sample mean. The authors adopted all of the biological suggestions but rejected the one statistical critique stating that it was standard policy to report the [standard error] and their colleagues all expected it.

    I am quite certain that these guidelines will prompt much grousing from the biologists about being too theoretical and unnecessary, as well as from the statisticians that you left out the rules nearest and dearest to their hearts.
    Statistician

If statisticians groused about the guidelines, they never groused to us.

    Good guidelines . . . Quite reasonable without being overly fussy. The interpretation of P values is much more sensible than one often gets with medical journals, where P < 0.05 is all they care about.
    Statistician

Every now and then, a biologist did grouse:

    I do not agree with the edict about presenting data as [standard deviations] rather than [standard errors of the mean]. These presentations are for visual effect only . . . To me, this edict is silly, particularly since showing [standard deviations rather than standard errors of the mean] is a cosmetic issue only.
    Physiologist

One biologist was moved to write a Letter to the Editor (15). For the most part, when someone did complain about the guidelines, it was

    Guideline 5. Report variability using a standard deviation.

about which they complained.

Subsequent Impact

Within a year, the Editor of Clinical and Experimental Pharmacology and Physiology had solicited a critique of the guidelines (18). The critique set the stage for the journal to revise its own guidelines for reporting statistics (J. Ludbrook, personal communications).²

In May 2006, we received an email from Australia that hinted the guidelines had little immediate impact on reporting practices:

    Many of us were pleased to see this article come out, particularly in a major, trend-setting publication. Early optimism faded as authors and editors did not appear to want to reinforce your message. . . Most of my colleagues [report standard errors and P < 0.05] and, when questioned, tell me that they are using the appropriate method: that is what I will find in the journals, that is what their predecessors used, they know what it means (so why don't I?), and in any case [the result is obvious].
    Physiologist

To estimate the actual impact of the guidelines on reporting practices, we sampled original articles published from August 2003 through July 2004, the year before the guidelines, and we reviewed all original articles published from August 2005 through July 2006, the second year after the guidelines. If the guidelines affected the reporting of statistics, we expected the incidence of standard errors to decrease and the incidence of standard deviations, confidence intervals, and precise P values to increase.

Address for reprint requests and other correspondence: D. Curran-Everett, Div. of Biostatistics, M222, National Jewish Medical and Research Center, 1400 Jackson St., Denver, CO 80206 (e-mail: EverettD@njc.org).

¹Because these comments reflect unsolicited personal correspondence, we have elected to withhold the names.
²It has: see http://www.blackwellpublishing.com/submit.asp?ref=0305-1870&site=1.



Table 1. American Physiological Society journal manuscripts in 1996, 2003, and 2006: reporting of statistics

                                         n                Standard Deviation      Standard Error       Confidence Interval     Precise P Value
Journal                          1996  2003  2006       1996  2003  2006       1996  2003  2006       1996  2003  2006       1996  2003  2006

Am J Physiol
  Cell Physiol                     43    30   322         21    20    19         88    73    78          0     0     1          7    13    13
  Endocrinol Metab                 28    28   302         18     7    13         86    89    87          0     4     2          4   39*    30
  Gastrointest Liver Physiol       26    28   272          8    25    24         92    79    77          0     0     2          4    14    17
  Heart Circ Physiol               60    62   627         17    23    22         87    76    77          0     5     1         10    19    20
  Lung Cell Mol Physiol            25    26   261         20    19    22         84    88   81*          0     0     2          4    19    18
  Regul Integr Comp Physiol        41    29   384         17    10    12         88    90    90          0     0     1         15   41*    27
  Renal Physiol                    27    25   289         15    12    15         93    80    79          0     4     1          7     4   17*
J Appl Physiol                     62    57   519         24    39    35         79    67    65          0    7*     4          6   26*    34
J Neurophysiol                     58    61   699         36    23    30         69    64    57          2     5     6          5   30*    38

Values are the number of manuscripts reviewed (n) and the percentages of those manuscripts that reported a standard deviation, a standard error of the mean, a confidence interval, or a precise P value, for example, P = 0.03 (rather than P < 0.05) or P = 0.12 (rather than P > 0.05). In 1996, these journals published a total of 3,693 original articles; the number of articles reviewed represents a 10% sample (systematic random sampling, fixed start) of the original articles published by each journal (9). From August 2003 through July 2004, these journals published a total of roughly 3,500 original articles; the number of articles reviewed represents a 10% sample (systematic random sampling, fixed start) of the original articles published by each journal. From August 2005 through July 2006, these journals published a total of 3,675 original articles; the number of articles reviewed represents a complete survey of the original articles published by each journal. *Asterisks mark pairs of values that differ statistically from 2003 to 2006 (0.05 ≤ P ≤ 0.10, exact binomial probability, 1-tailed) or from 1996 to 2003 (0.001 ≤ P ≤ 0.05, Fisher's exact test, 2-tailed).
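The sampling scheme named in the legend, systematic random sampling with a fixed start, is simple enough to sketch. The Python fragment below is an illustration, not code from the original survey; the function name and the article count are invented.

    import random

    def systematic_sample(articles, fraction=0.10):
        """Systematic random sample with a fixed start: choose a random
        offset once, then take every kth article thereafter."""
        k = round(1 / fraction)          # every 10th article for a 10% sample
        start = random.randrange(k)      # the fixed random start, 0 <= start < k
        return articles[start::k]

    articles = list(range(620))          # a hypothetical journal year
    print(len(systematic_sample(articles)))   # 62 articles, a 10% sample

Given the underlying counts, the between-survey comparisons reported in the legend could then be rerun with standard exact tests (for example, scipy.stats.fisher_exact).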

What did our literature review reveal? That the guidelines had virtually no impact on the occurrence of standard errors, standard deviations, confidence intervals, and precise P values (Table 1). There were two exceptions: in one journal, the use of standard errors decreased from 88% to 81%; in another journal, the use of precise P values increased from 4% to 17%.

The 2004 Guidelines Revisited

These guidelines addressed the reporting of statistics in the RESULTS section of a manuscript:

Guideline 5. Report variability using a standard deviation.

Guideline 6. Report uncertainty about scientific importance using a confidence interval.

Guideline 7. Report a precise P value.

Our 2004 editorial (8) summarized the theoretic rationale for each of these guidelines. The subsequent publication of Scientific Style and Format (6) reinforced their application. Because they offer the greatest benefit to authors and readers of RESULTS sections, we revisit each of these guidelines.

Fig. 1. The distinction between standard deviation and standard error of the mean. Suppose random variable Y is distributed normally with mean μ = 0 and standard deviation σ = 1 (inset). If you draw from this population random samples with sizes that range from 5 to 10,000 observations, for each sample you can estimate the mean, the standard deviation, and the standard error of the mean (see Eq. 1). As sample size increases, the sample standard deviation fluctuates more tightly about the population standard deviation σ. If sample size is infinite–if the entire population is sampled–then the sample standard deviation s will equal the population standard deviation σ. As sample size increases, the standard error of the mean decreases progressively because of its dependence on sample size. If sample size is infinite, then the standard error of the mean will be 0, and the sample mean ȳ will equal the population mean μ: there is no uncertainty about the true value of the population mean.
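The behavior that Fig. 1 summarizes is easy to reproduce. The sketch below is a minimal Python illustration (not part of the original article; the helper name is invented): it computes s and SE{ȳ} as defined in Eq. 1 for progressively larger samples drawn from a standard normal population, and it then checks the coverage claim made for Eq. 2, that ȳ ± SE{ȳ} captures the population mean only about 68% of the time, whereas ȳ ± 2 SE{ȳ} captures it about 95% of the time.

    import math
    import random

    random.seed(1)

    def sample_stats(sample):
        """Sample mean, standard deviation s, and SE of the mean (Eq. 1)."""
        n = len(sample)
        mean = sum(sample) / n
        s = math.sqrt(sum((y - mean) ** 2 for y in sample) / (n - 1))
        return mean, s, s / math.sqrt(n)

    # As in Fig. 1: draw samples of increasing size from a normal
    # population with mu = 0 and sigma = 1.
    for n in (5, 10, 100, 1000, 10000):
        mean, s, se = sample_stats([random.gauss(0, 1) for _ in range(n)])
        print(f"n = {n:>5}: s = {s:.3f} (fluctuates about sigma = 1), "
              f"SE = {se:.4f} (shrinks with n)")

    # Coverage of the interval in Eq. 2 for n = 35, the example in the text.
    trials = 5000
    hit1 = hit2 = 0
    for _ in range(trials):
        mean, s, se = sample_stats([random.gauss(0, 1) for _ in range(35)])
        hit1 += abs(mean) <= se          # population mean is 0
        hit2 += abs(mean) <= 2 * se
    print(f"coverage of mean +/- 1 SE: {hit1 / trials:.1%}")   # about 68%
    print(f"coverage of mean +/- 2 SE: {hit2 / trials:.1%}")   # about 95%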


Guideline 5. Report variability using a standard deviation.³ The distinction between standard deviation and standard error of the mean is far more than cosmetic: it is an essential one. These statistics estimate different things: a standard deviation estimates the variability among individual observations in a sample, but a standard error of the mean estimates the theoretical variability among sample means (8, 9).

Individual observations in a sample differ because the population from which they were drawn is distributed over a range of possible values. The study of this intrinsic variability is important: it may reveal something novel about underlying scientific processes (12). The standard deviation describes the variability among the observations we investigators measure; it characterizes the dispersion of sample observations about the sample mean.

In contrast, the standard error of the mean provides an answer to a theoretical question: If I repeat my experiment an infinite number of times, by how much will the possible sample means vary about the population mean?

The fundamental difference between standard deviation and standard error of the mean is reflected further in how these statistics are defined. The standard deviation s and the standard error of the mean SE{ȳ} are

    s = √[Σ(yᵢ − ȳ)² / (n − 1)]  and  SE{ȳ} = s/√n,    (1)

where n is the number of observations in the sample, yᵢ is an individual observation, and ȳ is the sample mean. Because it incorporates information about sample size, the standard error of the mean is a misguided estimate of variability among observations (Fig. 1 and Ref. 8). By itself, the standard error of the mean has no particular value (9). Even with a large sample size (n = 35), the interval

    [ȳ − SE{ȳ}, ȳ + SE{ȳ}]    (2)

is just a 68% confidence interval. In other words, we can declare, with modest 68% confidence, that the population mean is included in the interval [ȳ − SE{ȳ}, ȳ + SE{ȳ}].

To summarize, in most experiments, it is essential that an investigator reports on the variability among the actual individual measurements. A sample standard deviation does this: it describes the variability among the actual experimental measurements. On the other hand, if an investigator were to repeat an experiment many times, and each time calculate a sample mean, the average of those sample means will be the population mean; the standard deviation of those sample means will be the standard error of the mean (9). A standard error is simply the standard deviation of a statistic: here, the sample mean. In nearly all experiments, however, a single sample mean is computed. Therefore, it is inappropriate to report a standard error of the mean–a theoretical estimate of the variability of possible values of a sample mean about a population mean–as an estimate of the variability among actual experimental measurements.

Guideline 6. Report uncertainty about scientific importance using a confidence interval.⁴ A confidence interval focuses attention on the magnitude and uncertainty of an experimental result. In essence, a confidence interval helps answer the question, is the experimental effect big enough to be relevant? A confidence interval is a strong tool for inference: it provides the same statistical information as the P value from a hypothesis test, it circumvents the drawbacks inherent to a hypothesis test, and it provides information about scientific importance (9).

In biomedical research, the routine use of confidence intervals is recommended (1–3, 8, 9, 13), and, in clinical medicine, the use of confidence intervals is indeed widespread (2, 3). In journals published by APS, however, confidence intervals were as rare in 2006 as they were in 1996 (see Table 1).

Guideline 7. Report a precise P value.⁵ In 2004 we wrote that a precise P value communicates more information with the same amount of ink, and it permits readers to assess a statistical result according to their own criteria (8). You would think most authors would report a precise P value. Most do not (see Table 1).

On occasion, an author who reported a precise P value did so with unnecessary precision:

    P = 0.030711 or P = 10⁻³¹.

The guidelines for rounding P values to sufficient precision are listed in Table 2.

Table 2. Guidelines for rounding P values to sufficient precision

P Value Range         Rounding Precision
0.01 ≤ P ≤ 1.00       Round to 2 decimal places: round P = 0.031 to P = 0.03
0.001 ≤ P ≤ 0.009     Round to 3 decimal places: round P = 0.0066 to P = 0.007
P < 0.001             Report as P < 0.001; more precision is unnecessary
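The rounding rules in Table 2 are mechanical enough to automate. As a companion to the table, here is a minimal Python helper; it is an illustration rather than part of the guidelines, and the function name is invented.

    def format_p(p):
        """Format a P value to sufficient precision, per Table 2."""
        if p < 0.001:
            return "P < 0.001"        # more precision is unnecessary
        if p < 0.01:
            return f"P = {p:.3f}"     # 0.001 <= P <= 0.009: 3 decimal places
        return f"P = {p:.2f}"         # 0.01 <= P <= 1.00: 2 decimal places

    # The overly precise examples quoted above come out sensibly:
    for p in (0.030711, 0.0066, 1e-31):
        print(format_p(p))            # P = 0.03, P = 0.007, P < 0.001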
The Next Step

All of us–authors, reviewers, editors–gravitate toward the familiar. For most of us, reporting standard deviations, confidence intervals, and precise P values is quite unfamiliar. To make matters worse, these reporting practices have been unfamiliar for decades. There is considerable inertia to overcome before standard deviations, confidence intervals, and precise P values become commonplace in journals published by the APS.

Reform is difficult (5, 7, 10, 11, 14, 16). The question is, how can we help it along? In part, the answer is that all of us–authors, reviewers, editors, Publications Committee–must make a concerted effort to use and report statistics in ways that are consistent with best practices. With these guidelines (8), we summarized the best practices of statistics.

In 2004, we hoped that the guidelines would improve and standardize the caliber of statistical information reported throughout journals published by the APS. It is clear that the mere publication of the guidelines failed to impact reporting practices. We still have an opportunity.

³This guideline is mirrored in Scientific Style and Format Sections 12.5.2.3 and 12.5.2.4 of Ref. 6.
⁴This guideline is mirrored in Scientific Style and Format Section 12.5.2.2 of Ref. 6.
⁵This guideline is mirrored in Scientific Style and Format Section 12.5.1.2 of Ref. 6.




ACKNOWLEDGMENTS

We thank Margaret Reich (Director of Publications and Executive Editor, American Physiological Society) for a 1-yr print subscription to APS journals, and we thank Matthew Strand (National Jewish Medical and Research Center, Denver, CO) and the Editors of the American journals for comments and suggestions.

REFERENCES

1. Altman DG. Statistics in medical journals: developments in the 1980s. Stat Med 10: 1897–1913, 1991.
2. Altman DG. Statistics in medical journals: some recent trends. Stat Med 19: 3275–3289, 2000.
3. Altman DG, Machin D, Bryant TN, Gardner MJ. Statistics with Confidence. Bristol: BMJ Books, 2000.
4. American Physiological Society. Instructions for Preparing Your Manuscript. Manuscript Sections (online). http://www.the-aps.org/publications/i4a/prep_manuscript.htm#manuscript_sections [16 April 2007].
5. Belia S, Fidler F, Williams J, Cumming G. Researchers misunderstand confidence intervals and standard error bars. Psychol Methods 10: 389–396, 2005.
6. Council of Science Editors, Style Manual Subcommittee. Scientific Style and Format: the CSE Manual for Authors, Editors, and Publishers (7th ed.). Reston, VA: Rockefeller Univ. Press, 2006.
7. Cumming G, Fidler F, Leonard M, Kalinowski P, Christiansen A, Kleinig A, Lo J, McMenamin N, Wilson S. Statistical reform in psychology: is anything changing? Psychol Sci. In press.
8. Curran-Everett D, Benos DJ. Guidelines for reporting statistics in journals published by the American Physiological Society. Am J Physiol Heart Circ Physiol 287: H447–H449, 2004. http://ajpheart.physiology.org/cgi/content/full/287/2/H447.
9. Curran-Everett D, Taylor S, Kafadar K. Fundamental concepts in statistics: elucidation and illustration. J Appl Physiol 85: 775–786, 1998. http://jap.physiology.org/cgi/content/full/85/3/775.
10. Fidler F, Cumming G, Thomason N, Pannuzzo D, Smith J, Fyffe P, Edmonds H, Harrington C, Schmitt R. Toward improved statistical reporting in the Journal of Consulting and Clinical Psychology. J Consult Clin Psychol 73: 136–143, 2005.
11. Fidler F, Thomason N, Cumming G, Finch S, Leeman J. Editors can lead researchers to confidence intervals, but can't make them think: statistical reform lessons from medicine. Psychol Sci 15: 119–126, 2004.
12. Fisher RA. Statistical Methods for Research Workers. New York: Hafner, 1954, p 3.
13. International Committee of Medical Journal Editors. Uniform Requirements for Manuscripts Submitted to Biomedical Journals. IV. Manuscript Preparation and Submission (online). http://www.icmje.org/index.html#manuscript [14 April 2007].
14. Kendall PC. Editorial. J Consult Clin Psychol 65: 3–5, 1997.
15. Koehnle T, Curran-Everett D, Benos DJ. The proof is not in the P value. Am J Physiol Regul Integr Comp Physiol 288: R777–R778, 2005. http://ajpregu.physiology.org/cgi/content/full/288/3/R777.
16. La Greca AM. Editorial. J Consult Clin Psychol 73: 3–5, 2005.
17. Lang TA, Secic M. How to Report Statistics in Medicine: Annotated Guidelines for Authors, Editors, and Reviewers. Philadelphia, PA: American College of Physicians, 1997.
18. Ludbrook J. Comments on journal guidelines for reporting statistics. Clin Exp Pharmacol Physiol 32: 324–326, 2005.



Commentary
Adv Physiol Educ 31: 299, 2007; doi:10.1152/advan.00032.2007.

The need for accurate statistical reporting. A commentary on "Guidelines for reporting statistics in journals published by the American Physiological Society: the sequel"
Tom Lang
Tom Lang Communications and Training, Davis, California

I THINK IT OBVIOUS that authors should be required to report their statistical analyses accurately and completely and that the standard error of the mean (SEM) should not be used when other statistics are more appropriate. To argue otherwise is to promote a lower standard of research. Although an outsider to your field, I have been asked to participate in this debate, so I will do so, first by addressing the misuse of SEM and then the need to require good reporting practices.

In the 1990s, I completed a systematic review of the studies identifying statistical reporting errors in biomedical journals. The results are published as a comprehensive collection of statistical reporting guidelines (1). The third most commonly mentioned error in these studies was inappropriately reporting the SEM, by using it as a descriptive statistic instead of the standard deviation (SD) or as an inferential statistic instead of the 95% confidence interval. (The two most commonly mentioned errors were confusing statistical significance with clinical importance and describing the dispersion of skewed data with the standard deviation.)

By definition, the SEM is about a 68% confidence interval; that is, it is a measure of precision for an estimated characteristic or treatment effect. It is simply wrong to use it as a measure of dispersion for a set of measurements. In this case, the SD is preferred if the data are approximately normally distributed, and the range or interpercentile range is preferred if they are not. As a measure of precision for an estimate (in clinical medicine at least), a 68% confidence interval is too close for comfort to the 50% confidence interval (which will not contain the true population value in half of similar estimates for the same population), so the more conservative 95% confidence interval is preferred.

When the SEM is used incorrectly as a measure of dispersion for a set of measurements, readers may well assume that the measurements have less variability than they actually do: the SEM is always smaller than the SD. (Indeed, an often-cited explanation for this inappropriate use of the SEM in the literature is specifically to make measurements appear to be more precise than they really are.) When the SEM is used correctly, as a 68% confidence interval, estimates are reported with less precision, which, in turn, means less confidence in subsequent inferences.

Meeting high standards should be required in all research and publication efforts, not merely recommended. We require investigators to use the scientific method; we do not just recommend that they do. We require investigators to explain their experimental procedures; we do not just recommend that they do. We even require investigators to format their references correctly; we do not just recommend that they do. Authors should be required to report statistics as completely and as accurately as every other aspect of the research. To allow ignorance, tradition, personal preference, or the practices of other journals to justify anything less is to legitimize the very forces that science attempts to overcome.

Publication is the final stage of research and, in many ways, it is the most important part: it is the beginning of the formal debate about the research, the most widely distributed announcement of the research, and usually the only lasting record of the research. Why, then, should we expect less of authors at this stage of research than we do in the others?

REFERENCE

1. Lang T, Secic M. How to Report Statistics in Medicine: Annotated Guidelines for Authors, Editors, and Reviewers (2nd ed.). Philadelphia, PA: American College of Physicians, 2006.

Address for reprint requests and other correspondence: T. Lang, 1925 Donner Ave., No. 3, Davis, CA 95618 (e-mail: tomlangcom@aol.com).


Commentary
Adv Physiol Educ 31: 300–301, 2007; doi:10.1152/advan.00069.2007.

Statistics: not a confidence trick. A commentary on "Guidelines for reporting statistics in journals published by the American Physiological Society: the sequel"
P. K. Rangachari
Department of Medicine, Bachelor of Health Sciences (Hons) Programme, Faculty of Health Sciences, McMaster University,
Hamilton, Ontario, Canada

RECENTLY, Curran-Everett and Benos (5) developed guidelines for reporting statistics in journals published by the American Physiological Society. In this brief commentary, I will comment on those guidelines as well as present my own personal view of this discipline to make this meaningful for readers of Advances in Physiology Education.

My interest in statistics was sparked by a rather unfortunate remark made by one of my teachers in medical school over 40 years ago. Emphatically stressing the wrong syllable and with a characteristic jiggle of the head, he declared "Statistics is an extremely boring subject, but I'm afraid I have to teach it to you." I was cussed enough to find inspiration in that remark.

Statistics is far too exciting to be labeled dull or boring. The essence of any inquiry is to tease out meaningful information from a welter of noise. Proper use of statistics can give us the confidence that any statements we make at the end of our research can be justified or have "warranted assertibility," to use Dewey's expressive term (7). The guidelines prepared by Curran-Everett and Benos (5) provide a checklist of do's and don'ts. In several earlier reports (3, 4), Curran-Everett and others have provided the appropriate background material, and reading those reports is crucial to appreciate the logic underlying the guidelines.

Modern science, which is built on 17th century foundations, seeks to infer events in the real world by manipulating it. Scientists provoke changes and assess the meaning and directions of those changes with calibrated instruments. They can select, at best, a small sample from a larger population (rats, mice, dogs, or humans) and attempt to infer from the effects observed in the small sample, what the effects would be if the entire population had been subject to the same manipulations. Knowledge of statistics helps them to extrapolate the information they gather from these limited samples to the larger population with varying degrees of confidence.

This helps them deal with two fundamental questions: 1) Has there been a change in any of the quantities measured and 2) Is the change large enough to be meaningful? The first question is answered by hypothesis testing and the second by estimation. It is important to establish whether or not a change has occurred that cannot be accounted for by random variations. However, hypothesis testing is fairly limited, and the authors point out that it is largely an artificial construct. The more important issue is whether the magnitude and direction of the change have any relevance. Unfortunately, considerable misunderstanding exists as to what statistics can and cannot do, and the authors deal with some very important elements in their short review. There is an unwarranted obsession with P values, in particular, and this may lead to either exaggerated or understated claims. This point is made more trenchantly by Goodman (8), when he refers to the P value fallacy. The authors emphasize that there is a clear distinction between statistical significance and scientific significance, with hypothesis testing pointing to the first, but only estimations revealing the latter.

The later set of guidelines is based on the information presented in this brief review. The guidelines are clear and provide very useful information for both authors and reviewers. Their suggestion that authors report a precise P value is a good one, since it does provide the reader with the options of considering or discarding the results presented. However, this point could be disputed, since some texts would state that there really is no point of quibbling once you have set a significance value that is acceptable. Daniel (6) also noted that authors have a tendency to place "more and more zeroes to the right of the decimal place to make a calculated P value more noteworthy," so that the reported probabilities look like the scoreboard for a no-hitter in baseball, even though this may have "absolutely nothing to do with the practical significance of the results."

There are other instances where the authors appear to be excessively prescriptive, particularly for contributors to this journal.

Consider, for example, the injunction to report variability using a standard deviation. This is excellent advice. However, for some studies reported in this journal, it may not be the best approach. Authors should be able to provide their data in the format that best suits their purposes without being constrained. In their review, Curran-Everett et al. (3) do agree that although a standard deviation is useful, it could be a deceptive index of variability, since even subtle departures from a normal distribution can make it useless. In educational settings, deviations from a normal distribution may occur fairly often when one is dealing with selected samples of students. As the authors are not unaware of this issue, it is surprising, then, that in their guidelines they are so prescriptive. Several years ago, I published a report (11) in this journal, where I reported on a course designed to teach students the elements of scientific discovery. I included a table that showed student evaluations of the course, where I showed data as median, mode, and range of responses to each question on a 5-point scale. I could well have given just the mean and standard deviation, but as the responses were skewed toward the higher end, giving the data in the format I chose was more meaningful.
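The point is easy to see with numbers. The sketch below is a hypothetical Python illustration (the responses are made up, not data from the cited report): for 5-point-scale responses skewed toward the high end, the mean and standard deviation suggest a symmetric spread, while the median, mode, and range make the pile-up at 5 explicit.

    from statistics import mean, median, mode, stdev

    # Hypothetical course-evaluation responses on a 5-point scale,
    # skewed toward the high end.
    responses = [3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5]

    print(f"mean = {mean(responses):.2f}, SD = {stdev(responses):.2f}")
    print(f"median = {median(responses)}, mode = {mode(responses)}, "
          f"range = {min(responses)}-{max(responses)}")
    # -> mean = 4.58, SD = 0.67   (the shape of the distribution is hidden)
    # -> median = 5.0, mode = 5, range = 3-5   (the skew is apparent)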

Curran-Everett and Benos (5) also cautioned authors against presenting data giving variability in terms of standard errors of the mean rather than standard deviations. This is a reasonable position but could, again, be more rigid than necessary. I would argue that what we are really trying to do in most studies is to gauge what the population parameters are based on our sample characteristics. Thus, giving the standard error of the mean allows the reader to have some idea as to how closely the mean values reported approximate the population mean. Furthermore, giving standard errors of the mean and n values helps readers gauge the confidence limits of the mean.

Another point on which authors should be given leeway is to choose not to do any statistical tests at all. The authors point out, in an earlier report (3), the problem with tests of significance. This particular issue has been debated at length (1, 2, 6). Daniel (6) made much the same point when he suggested that editors should "require authors to avoid using SSTs (statistical significance tests) where not appropriate." Carver (1), in an earlier commentary, was less conciliatory, stating bluntly that "Statistical significance testing has involved more fantasy than fact. The emphasis on statistical significance testing over scientific significance in educational research represents a corrupt form of the scientific method. Educational research would be better off if it stopped testing its results for statistical significance." I would have liked the authors in their guidelines to make an explicit statement, by stating that when changes are clear enough, there is really no need to do any tests at all or fall back on what pharmacologists call "the bloody obvious test" (9). Motulsky (10) made this point quite explicit when he said that "If your data speak for themselves, don't interrupt." Although I am not disputing the need for clear guidelines and that the authors have succeeded admirably in their aims, I feel a bit uncomfortable that these may be interpreted in a narrow sense to exclude good information from being published in this journal.

REFERENCES

1. Carver RP. The case against statistical significance testing. Harvard Educ Rev 48: 378–399, 1978.
2. Cohen J. The Earth is round (p<0.05). Am Psychol 49: 997–1003, 1994.
3. Curran-Everett D, Taylor S, Kafadar K. Fundamental concepts in statistics: elucidation and illustration. J Appl Physiol 85: 775–786, 1998.
4. Curran-Everett D. Multiple comparisons: philosophies and illustrations. Am J Physiol Regul Integr Comp Physiol 279: R1–R8, 2000.
5. Curran-Everett D, Benos DJ. Guidelines for reporting statistics in journals published by the American Physiological Society. Adv Physiol Educ 28: 85–87, 2004.
6. Daniel LG. Statistical significance testing: a historical overview of misuse and misinterpretation with the implications for the editorial policies of educational journals. Res Schools 5: 23–32, 1998.
7. Dewey J. Logic: The Theory of Inquiry. New York: Irvington, 1938/1982, p. 9 and 104–105.
8. Goodman SN. Toward evidence-based medical statistics. 1: the P value fallacy. Ann Intern Med 130: 995–1004, 1999.
9. Kitchen I. Statistics and pharmacology: the bloody obvious test. Trends Pharmacol Sci 8: 252–253, 1987.
10. Motulsky H. Intuitive Biostatistics. London: Oxford Univ. Press, 1995.
11. Rangachari PK. Exploring the context of biomedical research through a problem-based course for undergraduate students. Adv Physiol Educ 23: 40–51, 2000.

Address for reprint requests and other correspondence: P. K. Rangachari, Dept. of Medicine, Bachelor of Health Sciences (Hons) Programme, Faculty of Health Sciences, McMaster Univ., Hamilton, ON, Canada L8N 3Z5 (e-mail: chari@mcmaster.ca).



Commentary
Adv Physiol Educ 31: 302–304, 2007; doi:10.1152/advan.00084.2007.

How should we achieve high-quality reporting of statistics in scientific journals? A commentary on "Guidelines for reporting statistics in journals published by the American Physiological Society"

Murray K. Clayton
Departments of Statistics and Plant Pathology, University of Wisconsin, Madison, Wisconsin

SOME DISTURBING NEWS was reported in the March 2005 issue of Significance, a publication of the Royal Statistical Society (7). Based on a variety of audits, it was found that 38% of the articles published in 2001 in Nature contained some statistical "incongruence": a disparity between reported test statistics (t-tests, F-tests, etc.) and their corresponding P values. The British Medical Journal (BMJ) fared a little better, with 25% of the articles containing at least one incongruence. A subsequent audit of Nature Medicine for 2000 "showed clear evidence that the authors did not even understand the meaning of P values."

With understated regret, the Significance article notes that, although all Nature journals will eventually adopt guidelines regarding the use of statistics, "the journal will not introduce a statistical review process." The approach taken by BMJ's Deputy Editor seems to reflect philosophical resignation: "Research done at the BMJ shows that peer reviewers identify only a minority of major errors in a manuscript–so what hope is there of them identifying these minor ones?" This seems only a short step away from saying "Why bother? And, besides, it doesn't matter anyway."

Well, of course it does matter. Faulty statistical analyses can result in wasted research resources and, worse still, compromise the health of research animals, human subjects, and the ultimate recipients of therapies. The American Physiological Society (APS) is therefore to be commended for taking on this issue. What precisely should be done about the matter is a greater challenge, however. In 2004, Curran-Everett and Benos leapt into the fray by providing 10 guidelines regarding statistical approaches (4) and now report that nobody appears to be heeding their advice but, at least, they didn't hear any grousing from statisticians. Of course, the fact that they didn't hear grousing doesn't mean that it doesn't exist, and so I'll go on record: although I believe that their intentions are good, in some ways I fear that they have just muddled the issue further.

They're just guidelines, right? So what's the problem? Let me digress for a moment. For the most part, I think statistics is poorly taught. Not that the instructors are bad, necessarily, but rather that we aim too high. Realistically, we might hope that a student who has finished a one-semester, introductory course is able to read reports and statistics cited in a newspaper more critically. With loftier goals, we try to jam in more information, and this leads to a formulaic, algorithmic approach: if the data look like this, then the analysis should be that; if the P value falls below this particular value, then you should reject the null hypothesis; if you have two groups, it's a t-test; three groups and it's ANOVA, etc. We bypass the complexities and leave students believing that, upon the completion of the semester, they are now qualified to "do statistics." That's hopelessly naïve; most experiments yield data far more complex than can be handled by a semester or two of statistics.

Of course, an algorithmic approach is easier to learn and teach (and grade), but it ultimately does not serve the practitioner well. Caveats notwithstanding, I fear Curran-Everett and Benos are reinforcing this perspective. Too many of these guidelines have a nearly algorithmic, dictatorial sound: "the right thing is to do x." This is too easily transmuted into rules that take the form of "if you don't do x, then it's wrong," followed by a citation to the guidelines. I have, on a number of occasions, had precisely this happen: a reviewer, not knowing a lot of statistics, but knowing the guidelines well, will reject a manuscript because the statistics are "wrong." Well, no, they're not wrong, but they involve subtleties that the guidelines don't cover. In an attempt to clarify and simplify, the guidelines actually prove to be an annoying, if not worse, tool in the hands of the nonexpert.

Despite this potential for annoyance, I must say that some of the guidelines I absolutely endorse. I can't, for example, object to guideline 1 (Consult a statistician), although I think a critical point of having a statistician involved in project and experimental design is that it can lead to simpler, clearer analyses and interpretations. (I also appreciate and agree with the notion that the statistician "can help;" I'm sorry that some statisticians take it upon themselves to be officious, instead.) Guideline 3 (Identify your methods, references, and software), guideline 7 (Report precise P values), and the interpretations of P values in Table 1 all sit well. [Some of these points can also be found in the elegant, less prescriptive 1988 offering by the preeminent statisticians Mosteller and Bailar (2).]

From this point the waters get deep. Curran-Everett and Benos describe their guidelines as representing best practices in statistics. I cannot agree–certainly they would not receive uniform support among statisticians. That great pioneer of statistics, Fisher, would certainly appreciate guideline 7 but would abhor guideline 2 (Use a predetermined critical value). Guideline 2 reflects more the philosophical approach of Neyman and Pearson, also giants in the field, who specifically advocated the use of critical values like α. [There was mutual animosity between Fisher and the Neyman-Pearson school, as evidenced by their occasionally intemperate remarks on these issues (5, 6).] In a curious ecumenicism, however, the authors' interpretation of "if the P value is less than α, then the experimental effect is likely to be real" fits nicely with the Bayesian school but is a patently incorrect interpretation of a P value from either a Neyman-Pearson or Fisherian view.

Guideline 10 (Interpret each main result with a confidence interval and precise P value) repeats the error of interpretation and compounds the problem with poor advice: "If either bound of the confidence interval is important from a scientific perspective, then the experimental effect may be large enough to be relevant." The "may be" leaves the authors an out, but the critical issue is that, if the sample is too small, then the confidence interval will be large, thus increasing the chance that one of the bounds of the confidence interval is, apparently, "important from a scientific perspective." This confusion underlies guideline 6 (Report uncertainty about scientific importance using a confidence interval) as well. Although I appreciate the technical accuracy of the definition of a confidence interval, the authors oversimplify. It is correct to say that a confidence interval characterizes uncertainty about the true value of a parameter but incorrect to say that it provides an assessment of scientific importance.

Both in their advice regarding critical values and confidence intervals, Curran-Everett and Benos have overstated what these things actually provide, perhaps in favor of what they wish they would provide. The discussion of guideline 6 leads to another issue, by the way: for a data graphic, they advise using a confidence interval. Not everyone agrees; it depends in part on the intent, as noted by Andrews et al. (1).

I have a minor quibble with guideline 4 (Control for multiple comparisons), whose principle I support completely. However, in 1973, Carmer and Swanson made it pretty clear that the Newman-Keuls procedure is inferior to the protected least-significant difference procedure (3). Referring to an outmoded technique again leads to the risk that such mention reflects an endorsement.

We're left with guideline 5 (Report variability using the SD), which I understand raised a lot of concerns among some APS members. That's understandable: it appears to go against tradition. My concern is that it might equally go against logic. Yes, the SD does indicate variability of a single observation. But it's actually the rare instance where we want to know such a thing. Such an instance can arise when we want to develop normal ranges for some diagnostic result: it's useful to know the SD of systolic blood pressure, because then we can calculate a 95% interval and, more importantly, know when someone's blood pressure lies outside the "normal range."

But, in most circumstances in reporting experimental results, we want to compare means of different groups, and the immediately relevant quantity is the SE. Yes, citing the mean ± SE only gives a 68% confidence interval, but a quick trick is that the mean ± 2 × SE gives a good approximation to a 95% confidence interval. And, although it's true that the SD and SE are related, performing the mental math to convert SD to SE is nontrivial. But why not make a more useful recommendation: report, on a graph or in text, the SD or SE, depending on what information you want to convey. If you want to compare means, use SE; if you want to look at variation among a group of individuals, use SD. For other purposes, you might use something else again (1).

I truly appreciate the authors' goals: to improve the quality of statistics in the journals and, of course, in the accompanying science. I also have sympathy for authors, reviewers, and editors. Who doesn't want a short, simple set of rules? Unfortunately for the nonspecialist, the field of statistics is deep, complex, and evolving. In each of the last several years, for example, the journal Statistics in Medicine has been publishing about 4,000 pages of articles focused principally on the development of new methods for analyzing data. Over the last 20 years or so, the manual for the statistical analysis component of SAS has expanded from about 1,000 pages to roughly 5,000 pages. Aspects of apparently routine methods such as regression and ANOVA are under continuous refinement, and methods employed today are often quite different from those used even a few decades ago. It is difficult to keep up.

So what to do? I have two suggestions. First, I think the APS editorial board should indicate in their instructions that authors are responsible for ensuring that the statistics in articles are correct, appropriate, follow modern practices, and are well presented and that they are subject to review. And mean it. This is effectively what the Food and Drug Administration does in drug trials and what the National Institutes of Health does in evaluating grant applications involving human subjects. The burden is on the author, of course, to ensure that there is adequate quality in the work.

Second, I think we need to move away from the idea that statistics is a technical tool, like a pH meter, and recognize that it is a scientific discipline, requiring considerable training, skill, and practice. We recognize more and more the need for interdisciplinary teams to solve research problems, and so, for example, a given project might benefit from the input of a neurochemist, a muscle physiologist, an expert in ion channel function, and a proteomicist. Each contributes their own expertise, each acknowledges their own limitations, and the combined effort is a superior product. Journals do not need to provide detailed guidelines on how to conduct or report the research insofar as high standards result from the review process. So why not treat statisticians on an equal footing, both as collaborators and reviewers? All too often there is an asymmetry in the way that statistics are treated in a research project compared with, say, biochemistry, genomics, etc.

Another curious asymmetry arises in how we handle data. It is curious to me that researchers might spend months collecting tissue samples, say, and then many more months performing assays on these samples, but the actual data analysis often gets short shrift. Why? Why, given the extensive resources and time it takes to collect the data, do some people expect to be able to do the analysis in an afternoon? Why would they want to?

In 2000, in its endorsement of the Mathematical Association of America Guidelines for Programs and Departments in Undergraduate Mathematical Sciences, the American Statistical Association noted that "Generic packages such as Excel are not sufficient even for the teaching of statistics, let alone for research and consulting" (http://www.amstat.org/Education/index.cfn?fusoradian-ASAendorsement). Numerous other articles document numerical errors, misstatements, and various weaknesses in that software. Why is there acceptance, therefore, of using mediocre analysis tools in statistics when high standards are held for the use of other tools and techniques in the sciences?

I don't think it is needlessly idealistic to think that statistics can be an equal partner in the sciences.




I have seen this model work countless times, and I have had the good fortune to participate in it. Despite my quibbling with the details of the suggestions by Curran-Everett and Benos, I have complete sympathy with their efforts to improve the situation. However, rather than promulgating a handful of guidelines, I believe it will take support from APS members as a whole, and not just a few advocates, to make positive change.

REFERENCES

1. Andrews HP, Snee RD, Sarner MH. Graphical display of means. Am Statistician 34: 195–199, 1980.
2. Bailar JC, Mosteller F. Guidelines for statistical reporting in articles for medical journals–amplifications and explanations. Ann Intern Med 108: 266–273, 1988.
3. Carmer SG, Swanson MR. An evaluation of ten pairwise multiple comparison procedures by Monte Carlo methods. J Am Statist Assoc 68: 66–74, 1973.
4. Curran-Everett D, Benos DJ. Guidelines for reporting statistics in journals published by the American Physiological Society. Adv Physiol Educ 28: 85–87, 2004.
5. Fisher RA. Statistical Methods and Scientific Inference (3rd ed). New York: Hafner, 1973, p. 79–82.
6. Neyman J. "Inductive behavior" as a basic concept of philosophy of science. Rev Int Stat Inst 25: 7–22, 1957.
7. Royal Statistical Society. More scrutiny for scientific statistics. Significance 2: 2, 2005.

Address for reprint requests and other correspondence: M. K. Clayton, Depts. of Statistics and Plant Pathology, Univ. of Wisconsin-Madison, Madison, WI 53706 (e-mail: clayton@stat.wisc.edu).



Commentary
Adv Physiol Educ 31: 305, 2007; doi:10.1152/advan.00087.2007.

Sustained efforts should promote statistics literacy in physiology. Commentary on "Guidelines for reporting statistics in journals published by the American Physiological Society: the sequel"
Bryan Mackenzie
Department of Molecular and Cellular Physiology, University of Cincinnati College of Medicine, Cincinnati, Ohio

DOUGLAS CURRAN-EVERETT, DALE BENOS, and the editors of the American Physiological Society journals are to be commended for their efforts to raise the standards of statistics reporting in physiology journals. I read with interest the followup (in this issue) to the 2004 guidelines (2), and I am glad to see those efforts sustained. Curran-Everett and Benos reported that "the mere publication of the guidelines failed to impact reporting practices" (at least with regard to the use of standard deviations, confidence intervals, and precise P values), but I offer that it is still too early to assess. Change comes about slowly. If investigators fully embraced the guidelines, which include guidelines on planning the study and establishing the critical significance level α, we should only now begin to see significant change in the resulting publications. Still, there are those who remain to be persuaded.

As I have talked with colleagues about the 2004 guidelines, the issue of standard deviation versus standard error has drawn the strongest reaction, so I am pleased that this is given special attention in the followup. Figure 1 there illustrates that the sample standard deviation (s, SD), but not the standard error of the sample mean (SEM), describes the variability within the sample, and your other commentators have more eloquently argued this point than can I. Figure 1 in the followup also reminds us that just how well sample standard deviation approximates the population standard deviation (σ) depends on sample size (n); hence, the importance of specifying the value of n despite the common idea that it is not necessary when reporting sample standard deviations. Meanwhile, the utility of SEM in making between-group inferences (when its relationship to confidence intervals is understood) has not been understated (see also Ref. 1).

Investigators may be more readily accepting of the push to report precise P values. I would add that reiterating "not significant" alongside the precise P value may be helpful when accepting the null hypothesis since readers may be accustomed to seeing P values only when they are used to report significant differences.

A common reporting deficiency not discussed in the guidelines is the omission of the absolute value of the 100% or control value (and a measure of its variability) in the presentation of normalized data. Adding it to the figure legend is no difficult task, and its inclusion is necessary for other investigators in planning future studies.

While some may disagree with specific recommendations in the guidelines and others may fear them too prescriptive, the guidelines for reporting statistics certainly have physiologists talking more about statistics in general. That's good. Perhaps the wider benefit will be to promote statistics literacy within our discipline. After all, the literature contains more egregious statistical errors than error bar deficiencies, and from such errors spring faulty conclusions. The guidelines now need to be backed by efforts on at least two fronts. The first is in better educating our students. A new graduate course "Statistical Methods in Physiology" at this institution is set to become part of the core curricula for our PhD programs in physiology, systems biology, neuroscience, and pharmacology and our MS program in physiology. I hope that other institutions are giving the same priority to statistics courses. The second is in the review process. Tom Lang, in his commentary, places the ball squarely in the court of editors when he "doubt[s] that anything will change until journals stop accepting manuscripts in which the statistics are incorrectly reported." Editors need to provide reviewers explicit permission to call attention to errors in statistical analysis or reporting in manuscripts under review, and editors should weigh such criticisms as they would criticisms of the biology or experimental approach. I suspect that it is presently all too easy for authors to dismiss criticisms concerning statistics, as in the example cited by one of the commentators in the followup. We need to be reminded that careful statistical analyses are not icing; they are central to reaching valid and unbiased conclusions.

REFERENCES

1. Cumming G, Fidler F, Vaux DL. Error bars in experimental biology. J Cell Biol 177: 7–11, 2007.
2. Curran-Everett D, Benos DJ. Guidelines for reporting statistics in journals published by the American Physiological Society. Am J Physiol Gastrointest Liver Physiol 287: G307–G309, 2004.

Address for reprint requests and other correspondence: B. Mackenzie, Dept. of Molecular and Cellular Physiology, Univ. of Cincinnati College of Medicine, PO Box 670576, Cincinnati, OH 45267-0576 (e-mail: bryan.mackenzie@uc.edu).


Last Word
Adv Physiol Educ 31: 306–307, 2007; doi:10.1152/advan.00089.2007.

Last Word on Perspectives "Guidelines for reporting statistics in journals published by the American Physiological Society: the sequel"

Douglas Curran-Everett¹,²,³ and Dale J. Benos⁴
¹Division of Biostatistics, National Jewish Medical and Research Center, and Departments of ²Preventive Medicine and Biometrics and ³Physiology and Biophysics, School of Medicine, University of Colorado at Denver and Health Sciences Center, Denver, Colorado; and ⁴Department of Physiology and Biophysics, University of Alabama, Birmingham, Alabama

WE APPRECIATE ALL COMMENTS, past (13, 15) and present (2, 14, 16, 17), on our guidelines (6) for reporting statistics in journals published by the American Physiological Society (APS). In 2004 we wrote that the guidelines embodied fundamental concepts in statistics (8) and that they were consistent with the Uniform Requirements (12), used by roughly 650 biomedical journals, and with Scientific Style and Format (4), used by APS publications. In 2007 we will go farther: the guidelines reflect mainstream statistical concepts and accepted statistical practices.
Although we published guidelines–not dictums–for reporting statistics, Clayton (2) and Rangachari (17) described our wording of the guidelines as dictatorial and prescriptive. These reactions raise a question not about statistics but about composition: if you want to write an effective guideline, just how do you do it? We followed three principles of composition from The Elements of Style (19):

15. Put statements in positive form.
16. Use definite, specific, concrete language.
17. Omit needless words.

Because it was impossible for each guideline, so written, to fully explain itself, we included a brief explanation or example to elucidate subtleties the guideline itself could not possibly accommodate.¹ In addition, we provided other resources (1, 3, 5, 8–10, 12, 18) interested readers could use in concert with the guidelines.

¹Imagine the guideline Report variability using a standard deviation written as Report variability using a standard deviation, unless the underlying distribution is nonnormal. In that case, report variability using an interquartile range.

Address for reprint requests and other correspondence: D. Curran-Everett, Div. of Biostatistics, M222, National Jewish Medical and Research Center, 1400 Jackson St., Denver, CO 80206 (e-mail: EverettD@njc.org).

Specific Guidelines

In this section, we address comments made about specific guidelines.

Guideline 2. Define and justify a critical significance level α appropriate to the goals of your study. This guideline reinforces the notion that a statistical benchmark of 0.05, that is, α = 0.05, is not always the optimum choice. As Clayton (2) points out, there are different statistical philosophies about the interpretation and use of the critical significance level α and the observed significance level P. The explanation of this guideline reflects not a specific philosophy but the reality of use: in science, the statistical philosophies about α and P have been melded together.

Guideline 4. Control for multiple comparisons. Clayton (2) is concerned that we mentioned the Newman-Keuls procedure in a footnote to the exposition of this guideline (6). In that footnote, we also listed the Bonferroni and least significant difference procedures as examples of common multiple comparison procedures. By the mere mention of these procedures, we did not mean to endorse them: a review (5) cited in the same footnote illustrates that each of these three procedures is of limited practical value.
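The mechanics the guideline asks for can be sketched briefly. The example below is not from the guidelines themselves; it assumes the Python statsmodels library and uses invented P values. Bonferroni appears only because it is mentioned above; Holm's step-down procedure is shown as a simple, uniformly more powerful refinement, and, as the review cited as Ref. 5 illustrates, better procedures still exist:

    # Illustrative sketch only; the raw P values are invented.
    from statsmodels.stats.multitest import multipletests

    raw_p = [0.010, 0.020, 0.030, 0.040]  # four hypothetical comparisons

    # Bonferroni: scale each P value by the number of comparisons (conservative).
    reject_b, p_bonf, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")

    # Holm: step-down Bonferroni; never less powerful.
    reject_h, p_holm, _, _ = multipletests(raw_p, alpha=0.05, method="holm")

    for p, pb, ph in zip(raw_p, p_bonf, p_holm):
        print(f"raw P = {p:.3f}  Bonferroni P = {pb:.3f}  Holm P = {ph:.3f}")

With these four P values, only the smallest survives either adjustment at α = 0.05, which is precisely the discipline the guideline asks for.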
Guideline 5. Report variability using a standard deviation. This guideline reinforces the essential difference between a standard deviation and a standard error (see Refs. 6–8). Clayton (2) expresses concern about the logic of this guideline but then promptly reinforces it. Variability among sample observations is a basic scientific and statistical characteristic. Sir Ronald Fisher (11) argued its value:

    [Populations] always display variation . . . The variation itself was not an object of study, but was recognised rather as a troublesome circumstance which detracted from the value of the average . . . [The] study of the causes of variation of any variable phenomenon . . . should be begun by the examination and measurement of the variation which presents itself.

Guideline 6. Report uncertainty about scientific importance using a confidence interval and Guideline 10. Interpret each main result by assessing the numerical bounds of the confidence interval and by considering the precise P value. Clayton (2) argues these guidelines harbor errors of interpretation and give poor advice. We disagree.

When we wrote this guideline, we synthesized a lot of information into the phrase scientific importance, and we packed a lot of information into the explanation:

    If either bound of the confidence interval is important from a scientific perspective, then the experimental effect may be large enough to be relevant.

The examples below illustrate interpretations and advantages of confidence intervals.

Suppose you study the impact of new drugs on systemic hypertension. You find three studies in the American Journal of Physiology-Heart and Circulatory Physiology that independently investigated the effect of three different drugs on systemic hypertension. Each study involved a sample of 25 subjects, and, because they wanted to be especially confident of their findings, each reported a 99% confidence interval for the mean change in systemic blood pressure. For each drug, these are
the sample mean ȳ, sample standard deviation s, P value, and 99% confidence interval:

    Drug      Sample Mean ȳ    Sample Standard Deviation s    P Value    99% Confidence Interval
    Drug A    −20              18                             <0.001     −30 to −10
    Drug B    −0.2             0.18                           <0.001     −0.3 to −0.1
    Drug C    −20              54                             0.07       −50 to +10

    (Values are changes in systemic blood pressure, in mmHg.)

How do you interpret these results?
Drug A decreased blood pressure by 20 mmHg, a change that differed convincingly from 0 (P < 0.001). The confidence interval suggests the true mean impact of drug A is likely to be between a 10- and 30-mmHg decrease in blood pressure, a change that is scientifically meaningful and reasonably precise. Drug A produced a convincing change of scientific importance.

Drug B decreased blood pressure by 0.2 mmHg, a change that also differed convincingly from 0 (P < 0.001). The confidence interval suggests the true mean impact of drug B is likely to be between a 0.1- and 0.3-mmHg decrease in blood pressure, a change that is scientifically trivial but quite precise. Drug B produced a convincing change of no scientific importance.

Drug C decreased blood pressure by 20 mmHg, a change consistent with 0 (P = 0.07). The confidence interval suggests the true impact of drug C could range from a 10-mmHg increase to a 50-mmHg decrease in blood pressure. If the true impact of drug C is a 10-mmHg increase in blood pressure, then drug C is not a viable drug with which to decrease blood pressure. In contrast, if the true impact of drug C is a 50-mmHg decrease in blood pressure, then drug C does decrease blood pressure. Because it is relatively long, the confidence interval for drug C is an imprecise estimate of the true impact of drug C on blood pressure. Drug C bears further study using a larger sample size.
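The arithmetic behind the table invites a quick check. The sketch below is not from the article; it assumes SciPy and recomputes each study's 99% confidence interval as ȳ ± t(0.995, n−1) × s/√n, together with the two-sided P value for a zero mean change:

    # Minimal sketch reproducing the drug table from the summary statistics.
    from math import sqrt
    from scipy import stats

    n = 25                                  # subjects per study
    t_crit = stats.t.ppf(0.995, df=n - 1)   # ~2.797; a two-sided 99% CI uses the 0.995 quantile

    for drug, ybar, s in [("A", -20.0, 18.0), ("B", -0.2, 0.18), ("C", -20.0, 54.0)]:
        sem = s / sqrt(n)
        lo, hi = ybar - t_crit * sem, ybar + t_crit * sem
        p = 2 * stats.t.sf(abs(ybar) / sem, df=n - 1)  # two-sided P for H0: mean change = 0
        print(f"Drug {drug}: 99% CI = ({lo:6.1f}, {hi:6.1f}) mmHg,  P = {p:.3g}")

Drug C's wide interval traces directly to its large standard deviation relative to √n; because the interval's width shrinks roughly as 1/√n, quadrupling the sample size would about halve it.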
Note that the scientific importance of the upper and lower bounds of a confidence interval depends on scientific context.

Although a standard error characterizes uncertainty about the true value of some population characteristic (for example, the mean), a confidence interval is a more useful estimate (6–8, 14).
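The conversion is mechanical, as a worked version of drug A's numbers shows (this arithmetic is added here for illustration and is not part of the original Last Word):

    \mathrm{SEM} = \frac{s}{\sqrt{n}} = \frac{18}{\sqrt{25}} = 3.6~\mathrm{mmHg}, \qquad
    99\%~\mathrm{CI} = \bar{y} \pm t_{0.995,\,24} \times \mathrm{SEM}
                     = -20 \pm 2.797 \times 3.6 \approx (-30,\ -10)~\mathrm{mmHg}.

The standard error is thus an ingredient; the confidence interval is the quantity a reader can interpret directly against scientific importance.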
Summary

Clayton (2) reminds us that statistics, like science, continues to evolve. Of course it does. But fundamental concepts of statistics–concepts of statistical significance, scientific importance, variability, uncertainty, multiple testing–remain unchanged. The guidelines we published in 2004 (6) embody those fundamental concepts. We continue to believe the guidelines offer a concise, accurate framework that we hope will help improve the caliber of statistical information reported in articles published by the American Physiological Society.

REFERENCES

1. Bailar JC III, Mosteller F. Guidelines for statistical reporting in articles for medical journals. Ann Intern Med 108: 266–273, 1988.
2. Clayton MK. How should we achieve high-quality reporting of statistics in scientific journals? A commentary on “Guidelines for reporting statistics in journals published by the American Physiological Society: the sequel”. Adv Physiol Educ; doi:10.1152/advan.00084.2007.
3. Conover WJ. Practical Nonparametric Statistics (2nd ed.). New York: Wiley, 1980.
4. Council of Science Editors, Style Manual Subcommittee. Scientific Style and Format: the CSE Manual for Authors, Editors, and Publishers (7th ed.). Reston, VA: Rockefeller Univ. Press, 2006.
5. Curran-Everett D. Multiple comparisons: philosophies and illustrations. Am J Physiol Regul Integr Comp Physiol 279: R1–R8, 2000.
6. Curran-Everett D, Benos DJ. Guidelines for reporting statistics in journals published by the American Physiological Society. Adv Physiol Educ 28: 85–87, 2004. http://advan.physiology.org/cgi/content/full/28/3/85.
7. Curran-Everett D, Benos DJ. Guidelines for reporting statistics in journals published by the American Physiological Society: the sequel. Adv Physiol Educ; doi:10.1152/advan.00022.2007.
8. Curran-Everett D, Taylor S, Kafadar K. Fundamental concepts in statistics: elucidation and illustration. J Appl Physiol 85: 775–786, 1998.
9. Draper NR, Smith H. Applied Regression Analysis (2nd ed.). New York: Wiley, 1981.
10. Ehrenberg ASC. Rudiments of numeracy. J R Stat Soc Ser A 140: 277–297, 1977.
11. Fisher RA. Statistical Methods for Research Workers. New York: Hafner, 1954, p. 3.
12. International Committee of Medical Journal Editors. Uniform requirements for manuscripts submitted to biomedical journals. Ann Intern Med 126: 36–47, 1997.
13. Koehnle T, Curran-Everett D, Benos DJ. The proof is not in the P value. Am J Physiol Regul Integr Comp Physiol 288: R777–R778, 2005.
14. Lang T. The need for accurate statistical reporting. A commentary on “Guidelines for reporting statistics in journals published by the American Physiological Society: the sequel”. Adv Physiol Educ; doi:10.1152/advan.00032.2007.
15. Ludbrook J. Comments on journal guidelines for reporting statistics. Clin Exp Pharmacol Physiol 32: 324–326, 2005.
16. Mackenzie B. Sustained efforts should promote statistics literacy in physiology. A commentary on “Guidelines for reporting statistics in journals published by the American Physiological Society: the sequel”. Adv Physiol Educ; doi:10.1152/advan.00087.2007.
17. Rangachari P. Statistics: not a confidence trick. A commentary on “Guidelines for reporting statistics in journals published by the American Physiological Society: the sequel”. Adv Physiol Educ; doi:10.1152/advan.00069.2007.
18. Snedecor GW, Cochran WG. Statistical Methods (7th ed.). Ames, IA: Iowa State Univ. Press, 1980.
19. Strunk W Jr, White EB. The Elements of Style (3rd ed.). New York: Macmillan, 1979, p. 19–25.

