
Open access, freely available online

Essay

Why Most Published Research Findings Are False
John P. A. Ioannidis

Citation: Ioannidis JPA (2005) Why most published research findings are false. PLoS Med 2(8): e124.

Copyright: © 2005 John P. A. Ioannidis. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abbreviation: PPV, positive predictive value

John P. A. Ioannidis is in the Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece, and Institute for Clinical Research and Health Policy Studies, Department of Medicine, Tufts-New England Medical Center, Tufts University School of Medicine, Boston, Massachusetts, United States of America. E-mail: jioannid@cc.uoi.gr

Competing Interests: The author has declared that no competing interests exist.

DOI: 10.1371/journal.pmed.0020124

The Essay section contains opinion pieces on topics of broad interest to a general medical audience.

PLoS Medicine | www.plosmedicine.org 0696 August 2005 | Volume 2 | Issue 8 | e124

Summary

There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.

Published research findings are sometimes refuted by subsequent evidence, with ensuing confusion and disappointment. Refutation and controversy are seen across the range of research designs, from clinical trials and traditional epidemiological studies [1–3] to the most modern molecular research [4,5]. There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims [6–8]. However, this should not be surprising. It can be proven that most claimed research findings are false. Here I will examine the key factors that influence this problem and some corollaries thereof.

Modeling the Framework for False Positive Findings

Several methodologists have pointed out [9–11] that the high rate of nonreplication (lack of confirmation) of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05. Research is not most appropriately represented and summarized by p-values, but, unfortunately, there is a widespread notion that medical research articles should be interpreted based only on p-values. Research findings are defined here as any relationship reaching formal statistical significance, e.g., effective interventions, informative predictors, risk factors, or associations. “Negative” research is also very useful. “Negative” is actually a misnomer, and the misinterpretation is widespread. However, here we will target relationships that investigators claim exist, rather than null findings.

As has been shown previously, the probability that a research finding is indeed true depends on the prior probability of it being true (before doing the study), the statistical power of the study, and the level of statistical significance [10,11]. Consider a 2 × 2 table in which research findings are compared against the gold standard of true relationships in a scientific field. In a research field both true and false hypotheses can be made about the presence of relationships. Let R be the ratio of the number of “true relationships” to “no relationships” among those tested in the field. R is characteristic of the field and can vary a lot depending on whether the field targets highly likely relationships or searches for only one or a few true relationships among thousands and millions of hypotheses that may be postulated. Let us also consider, for computational simplicity, circumscribed fields where either there is only one true relationship (among many that can be hypothesized) or the power is similar to find any of the several existing true relationships. The pre-study probability of a relationship being true is R⁄(R + 1). The probability of a study finding a true relationship reflects the power 1 − β (one minus the Type II error rate). The probability of claiming a relationship when none truly exists reflects the Type I error rate, α. Assuming that c relationships are being probed in the field, the expected values of the 2 × 2 table are given in Table 1. After a research finding has been claimed based on achieving formal statistical significance, the post-study probability that it is true is the positive predictive value, PPV. The PPV is also the complementary probability of what Wacholder et al. have called the false positive report probability [10]. According to the 2 × 2 table, one gets PPV = (1 − β)R⁄(R − βR + α). A research finding is thus


more likely true than false if (1 − β)R > α. Since usually the vast majority of investigators depend on α = 0.05, this means that a research finding is more likely true than false if (1 − β)R > 0.05.

Table 1. Research Findings and True Relationships

Research Finding    True Relationship: Yes    True Relationship: No    Total
Yes                 c(1 − β)R/(R + 1)         cα/(R + 1)               c(R + α − βR)/(R + 1)
No                  cβR/(R + 1)               c(1 − α)/(R + 1)         c(1 − α + βR)/(R + 1)
Total               cR/(R + 1)                c/(R + 1)                c

DOI: 10.1371/journal.pmed.0020124.t001

What is less well appreciated is that bias and the extent of repeated independent testing by different teams of investigators around the globe may further distort this picture and may lead to even smaller probabilities of the research findings being indeed true. We will try to model these two factors in the context of similar 2 × 2 tables.

Bias

First, let us define bias as the combination of various design, data, analysis, and presentation factors that tend to produce research findings when they should not be produced. Let u be the proportion of probed analyses that would not have been “research findings,” but nevertheless end up presented and reported as such, because of bias. Bias should not be confused with chance variability that causes some findings to be false by chance even though the study design, data, analysis, and presentation are perfect. Bias can entail manipulation in the analysis or reporting of findings. Selective or distorted reporting is a typical form of such bias. We may assume that u does not depend on whether a true relationship exists or not. This is not an unreasonable assumption, since typically it is impossible to know which relationships are indeed true. In the presence of bias (Table 2), one gets PPV = ([1 − β]R + uβR)⁄(R + α − βR + u − uα + uβR), and PPV decreases with increasing u, unless 1 − β ≤ α, i.e., 1 − β ≤ 0.05 for most situations. Thus, with increasing bias, the chances that a research finding is true diminish considerably. This is shown for different levels of power and for different pre-study odds in Figure 1.

Conversely, true research findings may occasionally be annulled because of reverse bias. For example, with large measurement errors relationships are lost in noise [12], or investigators use data inefficiently or fail to notice statistically significant relationships, or there may be conflicts of interest that tend to “bury” significant findings [13]. There is no good large-scale empirical evidence on how frequently such reverse bias may occur across diverse research fields. However, it is probably fair to say that reverse bias is not as common. Moreover, measurement errors and inefficient use of data are probably becoming less frequent problems, since measurement error has decreased with technological advances in the molecular era and investigators are becoming increasingly sophisticated about their data. Regardless, reverse bias may be modeled in the same way as bias above. Also, reverse bias should not be confused with chance variability that may lead to missing a true relationship because of chance.

Testing by Several Independent Teams

Several independent teams may be addressing the same sets of research questions. As research efforts are globalized, it is practically the rule that several research teams, often dozens of them, may probe the same or similar questions. Unfortunately, in some areas, the prevailing mentality until now has been to focus on isolated discoveries by single teams and interpret research experiments in isolation. An increasing number of questions have at least one study claiming a research finding, and this receives unilateral attention. The probability that at least one study, among several done on the same question, claims a statistically significant research finding is easy to estimate. For n independent studies of equal power, the 2 × 2 table is shown in Table 3: PPV = R(1 − βⁿ)⁄(R + 1 − [1 − α]ⁿ − Rβⁿ) (not considering bias). With increasing number of independent studies, PPV tends to decrease, unless 1 − β < α, i.e., typically 1 − β < 0.05. This is shown for different levels of power and for different pre-study odds in Figure 2. For n studies of different power, the term βⁿ is replaced by the product of the terms βᵢ for i = 1 to n, but inferences are similar.

Corollaries

A practical example is shown in Box 1. Based on the above considerations, one may deduce several interesting corollaries about the probability that a research finding is indeed true.

Corollary 1: The smaller the studies conducted in a scientific field, the less likely the research findings are to be true. Small sample size means smaller power and, for all functions above, the PPV for a true research finding decreases as power decreases towards 1 − β = 0.05. Thus, other factors being equal, research findings are more likely true in scientific fields that undertake large studies, such as randomized controlled trials in cardiology (several thousand subjects randomized) [14] than in scientific fields with small studies, such as most research of molecular predictors (sample sizes 100-fold smaller) [15].

Corollary 2: The smaller the effect sizes in a scientific field, the less likely the research findings are to be true. Power is also related to the effect size. Thus, research findings are more likely true in scientific fields with large effects, such as the impact of smoking on cancer or cardiovascular disease (relative risks 3–20), than in scientific fields where postulated effects are small, such as genetic risk factors for multigenetic diseases (relative risks 1.1–1.5) [7]. Modern epidemiology is increasingly obliged to target smaller

Table 2. Research Findings and True Relationships in the Presence of Bias

Research Finding    True Relationship: Yes        True Relationship: No       Total
Yes                 (c[1 − β]R + ucβR)/(R + 1)    (cα + uc[1 − α])/(R + 1)    c(R + α − βR + u − uα + uβR)/(R + 1)
No                  (1 − u)cβR/(R + 1)            (1 − u)c(1 − α)/(R + 1)     c(1 − u)(1 − α + βR)/(R + 1)
Total               cR/(R + 1)                    c/(R + 1)                   c

DOI: 10.1371/journal.pmed.0020124.t002
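As a minimal sketch (not part of the original essay), the PPV expression for the presence of bias can be written as a short Python function. Setting u = 0 recovers the bias-free expression PPV = (1 − β)R⁄(R − βR + α). The function name `ppv` and its argument names are illustrative choices, not the author's.

```python
def ppv(R, power, alpha=0.05, u=0.0):
    """Post-study probability that a claimed research finding is true.

    R     -- pre-study odds (ratio of true to no relationships)
    power -- 1 - beta
    alpha -- Type I error rate
    u     -- bias: proportion of probed analyses that would not have been
             findings but are nevertheless reported as such
    """
    beta = 1.0 - power
    # Numerator: "Yes/Yes" cell of the bias table, without the common factor c/(R + 1).
    claimed_and_true = (1.0 - beta) * R + u * beta * R
    # Denominator: the "Yes" row total, without the same common factor.
    claimed_total = R + alpha - beta * R + u - u * alpha + u * beta * R
    return claimed_and_true / claimed_total


if __name__ == "__main__":
    # Illustrative inputs only: 1:1 pre-study odds and 80% power.
    for u in (0.0, 0.1, 0.3, 0.5):
        print(f"u = {u:.1f}  PPV = {ppv(R=1.0, power=0.8, u=u):.3f}")
```

With these illustrative inputs the printed PPV falls as u grows, which is the behavior the text describes for power above α.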



effect sizes [16]. Consequently, the proportion of true research findings is expected to decrease. In the same line of thinking, if the true effect sizes are very small in a scientific field, this field is likely to be plagued by almost ubiquitous false positive claims. For example, if the majority of true genetic or nutritional determinants of complex diseases confer relative risks less than 1.05, genetic or nutritional epidemiology would be largely utopian endeavors.

Corollary 3: The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true. As shown above, the post-study probability that a finding is true (PPV) depends a lot on the pre-study odds (R). Thus, research findings are more likely true in confirmatory designs, such as large phase III randomized controlled trials, or meta-analyses thereof, than in hypothesis-generating experiments. Fields considered highly informative and creative given the wealth of the assembled and tested information, such as microarrays and other high-throughput discovery-oriented research [4,8,17], should have extremely low PPV.

Figure 1. PPV (Probability That a Research Finding Is True) as a Function of the Pre-Study Odds for Various Levels of Bias, u
Panels correspond to power of 0.20, 0.50, and 0.80.
DOI: 10.1371/journal.pmed.0020124.g001

Corollary 4: The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true. Flexibility increases the potential for transforming what would be “negative” results into “positive” results, i.e., bias, u. For several research designs, e.g., randomized controlled trials [18–20] or meta-analyses [21,22], there have been efforts to standardize their conduct and reporting. Adherence to common standards is likely to increase the proportion of true findings. The same applies to outcomes. True findings may be more common when outcomes are unequivocal and universally agreed (e.g., death) rather than when multifarious outcomes are devised (e.g., scales for schizophrenia outcomes) [23]. Similarly, fields that use commonly agreed, stereotyped analytical methods (e.g., Kaplan-Meier plots and the log-rank test) [24] may yield a larger proportion of true findings than fields where analytical methods are still under experimentation (e.g., artificial intelligence methods) and only “best” results are reported. Regardless, even in the most stringent research designs, bias seems to be a major problem. For example, there is strong evidence that selective outcome reporting, with manipulation of the outcomes and analyses reported, is a common problem even for randomized trials [25]. Simply abolishing selective publication would not make this problem go away.

Corollary 5: The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true. Conflicts of interest and prejudice may increase bias, u. Conflicts of interest are very common in biomedical research [26], and typically they are inadequately and sparsely reported [26,27]. Prejudice may not necessarily have financial roots. Scientists in a given field may be prejudiced purely because of their belief in a scientific theory or commitment to their own findings. Many otherwise seemingly independent, university-based studies may be conducted for no other reason than to give physicians and researchers qualifications for promotion or tenure. Such nonfinancial conflicts may also lead to distorted reported results and interpretations. Prestigious investigators may suppress via the peer review process the appearance and dissemination of findings that refute their findings, thus condemning their field to perpetuate false dogma. Empirical evidence on expert opinion shows that it is extremely unreliable [28].

Corollary 6: The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true. This seemingly paradoxical corollary follows because, as stated above, the PPV of isolated findings decreases when many teams of investigators are involved in the same field. This may explain why we occasionally see major excitement followed rapidly by severe disappointments in fields that draw wide attention. With many teams working on the same field and with massive experimental data being produced, timing is of the essence in beating competition. Thus, each team may prioritize pursuing and disseminating its most impressive “positive” results. “Negative” results may become attractive for dissemination only if some other team has found a “positive” association on the same question. In that case, it may be attractive to refute a claim made in some prestigious journal. The term Proteus phenomenon has been coined to describe this phenomenon of rapidly

Table 3. Research Findings and True Relationships in the Presence of Multiple Studies

Research Finding    True Relationship: Yes    True Relationship: No        Total
Yes                 cR(1 − βⁿ)/(R + 1)        c(1 − [1 − α]ⁿ)/(R + 1)      c(R + 1 − [1 − α]ⁿ − Rβⁿ)/(R + 1)
No                  cRβⁿ/(R + 1)              c(1 − α)ⁿ/(R + 1)            c([1 − α]ⁿ + Rβⁿ)/(R + 1)
Total               cR/(R + 1)                c/(R + 1)                    c

DOI: 10.1371/journal.pmed.0020124.t003
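The multiple-studies expression, PPV = R(1 − βⁿ)⁄(R + 1 − [1 − α]ⁿ − Rβⁿ), can be sketched in Python as follows. This is an illustrative sketch rather than the author's code. The function names are hypothetical, bias is not considered, and the unequal-power variant simply replaces βⁿ with the product of the individual βᵢ, as the text describes.

```python
import math


def ppv_multiple_studies(R, power, n, alpha=0.05):
    """PPV when at least one of n independent, equally powered studies
    claims a statistically significant finding (no bias)."""
    beta = 1.0 - power
    return R * (1.0 - beta**n) / (R + 1.0 - (1.0 - alpha)**n - R * beta**n)


def ppv_multiple_studies_unequal(R, powers, alpha=0.05):
    """Same quantity for studies of different power: beta**n is replaced
    by the product of the individual beta_i."""
    n = len(powers)
    beta_product = math.prod(1.0 - p for p in powers)
    return R * (1.0 - beta_product) / (R + 1.0 - (1.0 - alpha)**n - R * beta_product)


if __name__ == "__main__":
    # Illustrative inputs only: 1:1 pre-study odds and 80% power.
    for n in (1, 5, 10):
        print(f"n = {n:2d}  PPV = {ppv_multiple_studies(R=1.0, power=0.8, n=n):.3f}")
```

For n = 1 the function reduces to the single-study expression (1 − β)R⁄(R − βR + α), which is a quick consistency check on the formula.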



alternating extreme research claims and extremely opposite refutations [29]. Empirical evidence suggests that this sequence of extreme opposites is very common in molecular genetics [29].

These corollaries consider each factor separately, but these factors often influence each other. For example, investigators working in fields where true effect sizes are perceived to be small may be more likely to perform large studies than investigators working in fields where true effect sizes are perceived to be large. Or prejudice may prevail in a hot scientific field, further undermining the predictive value of its research findings. Highly prejudiced stakeholders may even create a barrier that aborts efforts at obtaining and disseminating opposing results. Conversely, the fact that a field is hot or has strong invested interests may sometimes promote larger studies and improved standards of research, enhancing the predictive value of its research findings. Or massive discovery-oriented testing may result in such a large yield of significant relationships that investigators have enough to report and search further and thus refrain from data dredging and manipulation.

Figure 2. PPV (Probability That a Research Finding Is True) as a Function of the Pre-Study Odds for Various Numbers of Conducted Studies, n
Panels correspond to power of 0.20, 0.50, and 0.80.
DOI: 10.1371/journal.pmed.0020124.g002

Most Research Findings Are False for Most Research Designs and for Most Fields

In the described framework, a PPV exceeding 50% is quite difficult to get. Table 4 provides the results of simulations using the formulas developed for the influence of power, ratio of true to non-true relationships, and bias, for various types of situations that may be characteristic of specific study designs and settings. A finding from a well-conducted, adequately powered randomized controlled trial starting with a 50% pre-study chance that the intervention is effective is eventually true about 85% of the time. A fairly similar performance is expected of a confirmatory meta-analysis of good-quality randomized trials: potential bias probably increases, but power and pre-test chances are higher compared to a single randomized trial. Conversely, a meta-analytic finding from inconclusive studies where pooling is used to “correct” the low power of single studies is probably false if R ≤ 1:3. Research findings from underpowered, early-phase clinical trials would be true about one in four times, or even less frequently if bias is present. Epidemiological studies of an exploratory nature perform even worse, especially when underpowered, but even well-powered epidemiological studies may have only a one in five chance of being true, if R = 1:10. Finally, in discovery-oriented research with massive testing, where tested relationships exceed true ones 1,000-fold (e.g., 30,000 genes tested, of which 30 may be the true culprits) [30,31], PPV for each claimed relationship is extremely low, even with considerable

Box 1. An Example: Science at Low Pre-Study Odds

Let us assume that a team of investigators performs a whole genome association study to test whether any of 100,000 gene polymorphisms are associated with susceptibility to schizophrenia. Based on what we know about the extent of heritability of the disease, it is reasonable to expect that probably around ten gene polymorphisms among those tested would be truly associated with schizophrenia, with relatively similar odds ratios around 1.3 for the ten or so polymorphisms and with a fairly similar power to identify any of them. Then R = 10/100,000 = 10⁻⁴, and the pre-study probability for any polymorphism to be associated with schizophrenia is also R/(R + 1) = 10⁻⁴. Let us also suppose that the study has 60% power to find an association with an odds ratio of 1.3 at α = 0.05. Then it can be estimated that if a statistically significant association is found with the p-value barely crossing the 0.05 threshold, the post-study probability that this is true increases about 12-fold compared with the pre-study probability, but it is still only 12 × 10⁻⁴.

Now let us suppose that the investigators manipulate their design, analyses, and reporting so as to make more relationships cross the p = 0.05 threshold even though this would not have been crossed with a perfectly adhered to design and analysis and with perfect comprehensive reporting of the results, strictly according to the original study plan. Such manipulation could be done, for example, with serendipitous inclusion or exclusion of certain patients or controls, post hoc subgroup analyses, investigation of genetic contrasts that were not originally specified, changes in the disease or control definitions, and various combinations of selective or distorted reporting of the results. Commercially available “data mining” packages actually are proud of their ability to yield statistically significant results through data dredging. In the presence of bias with u = 0.10, the post-study probability that a research finding is true is only 4.4 × 10⁻⁴. Furthermore, even in the absence of any bias, when ten independent research teams perform similar experiments around the world, if one of them finds a formally statistically significant association, the probability that the research finding is true is only 1.5 × 10⁻⁴, hardly any higher than the probability we had before any of this extensive research was undertaken!
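The single-study figures in Box 1 can be checked with a few lines of Python. This is a hedged sketch: the function `ppv` is an illustrative implementation of the essay's bias formula (u = 0 gives the bias-free case), and it covers only the 12 × 10⁻⁴ and 4.4 × 10⁻⁴ values, not the ten-team scenario.

```python
def ppv(R, power, alpha=0.05, u=0.0):
    """PPV = ([1 - beta]R + u*beta*R) / (R + alpha - beta*R + u - u*alpha + u*beta*R)."""
    beta = 1.0 - power
    return ((1.0 - beta) * R + u * beta * R) / (
        R + alpha - beta * R + u - u * alpha + u * beta * R
    )


# Box 1 inputs: about ten true associations among 100,000 polymorphisms,
# 60% power, alpha = 0.05.
R = 10 / 100_000
pre_study = R / (R + 1)

no_bias = ppv(R, power=0.60)            # single study, no bias
with_bias = ppv(R, power=0.60, u=0.10)  # single study, bias u = 0.10

print(f"pre-study probability : {pre_study:.2e}")
print(f"PPV, no bias          : {no_bias:.2e}")
print(f"PPV, u = 0.10         : {with_bias:.2e}")
print(f"fold increase, no bias: {no_bias / pre_study:.1f}")
```

The bias-free value is roughly twelve times the pre-study probability, in line with the "about 12-fold" statement in the box.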



standardization of laboratory and statistical methods, outcomes, and reporting thereof to minimize bias.

Claimed Research Findings May Often Be Simply Accurate Measures of the Prevailing Bias

As shown, the majority of modern biomedical research is operating in areas with very low pre- and post-study probability for true findings. Let us suppose that in a research field there are no true findings at all to be discovered. The history of science teaches us that scientific endeavor has often in the past wasted effort in fields with absolutely no yield of true scientific information, at least based on our current understanding. In such a “null field,” one would ideally expect all observed effect sizes to vary by chance around the null in the absence of bias. The extent that observed findings deviate from what is expected by chance alone would be simply a pure measure of the prevailing bias.

For example, let us suppose that no nutrients or dietary patterns are actually important determinants for the risk of developing a specific tumor. Let us also suppose that the scientific literature has examined 60 nutrients and claims all of them to be related to the risk of developing this tumor with relative risks in the range of 1.2 to 1.4 for the comparison of the upper to lower intake tertiles. Then the claimed effect sizes are simply measuring nothing else but the net bias that has been involved in the generation of this scientific literature. Claimed effect sizes are in fact the most accurate estimates of the net bias. It even follows that between “null fields,” the fields that claim stronger effects (often with accompanying claims of medical or public health importance) are simply those that have sustained the worst biases.

For fields with very low PPV, the few true relationships would not distort this overall picture much. Even if a few relationships are true, the shape of the distribution of the observed effects would still yield a clear measure of the biases involved in the field. This concept totally reverses the way we view scientific results. Traditionally, investigators have viewed large and highly significant effects with excitement, as signs of important discoveries. Too large and too highly significant effects may actually be more likely to be signs of large bias in most fields of modern research. They should lead investigators to careful critical thinking about what might have gone wrong with their data, analyses, and results.

Of course, investigators working in any field are likely to resist accepting that the whole field in which they have spent their careers is a “null field.” However, other lines of evidence, or advances in technology and experimentation, may lead eventually to the dismantling of a scientific field. Obtaining measures of the net bias in one field may also be useful for obtaining insight into what might be the range of bias operating in other fields where similar analytical methods, technologies, and conflicts may be operating.

How Can We Improve the Situation?

Is it unavoidable that most research findings are false, or can we improve the situation? A major problem is that it is impossible to know with 100% certainty what the truth is in any research question. In this regard, the pure “gold” standard is unattainable. However, there are several approaches to improve the post-study probability.

Better powered evidence, e.g., large studies or low-bias meta-analyses, may help, as it comes closer to the unknown “gold” standard. However, large studies may still have biases and these should be acknowledged and avoided. Moreover, large-scale evidence is impossible to obtain for all of the millions and trillions of research questions posed in current research. Large-scale evidence should be targeted for research questions where the pre-study probability is already considerably high, so that a significant research finding will lead to a post-test probability that would be considered quite definitive. Large-scale evidence is also particularly indicated when it can test major concepts rather than narrow, specific questions. A negative finding can then refute not only a specific proposed claim, but a whole field or considerable portion thereof. Selecting the performance of large-scale studies based on narrow-minded criteria, such as the marketing promotion of a specific drug, is largely wasted research. Moreover, one should be cautious that extremely large studies may be more likely to find a formally statistically significant difference for a trivial effect that is not really meaningfully different from the null [32–34].

Second, most research questions are addressed by many teams, and it is misleading to emphasize the statistically significant findings of any single team. What matters is the

Table 4. PPV of Research Findings for Various Combinations of Power (1 − β), Ratio of True to Not-True Relationships (R), and Bias (u)

1 − β    R          u       Practical Example                                                        PPV
0.80     1:1        0.10    Adequately powered RCT with little bias and 1:1 pre-study odds           0.85
0.95     2:1        0.30    Confirmatory meta-analysis of good-quality RCTs                          0.85
0.80     1:3        0.40    Meta-analysis of small inconclusive studies                              0.41
0.20     1:5        0.20    Underpowered, but well-performed phase I/II RCT                          0.23
0.20     1:5        0.80    Underpowered, poorly performed phase I/II RCT                            0.17
0.80     1:10       0.30    Adequately powered exploratory epidemiological study                     0.20
0.20     1:10       0.30    Underpowered exploratory epidemiological study                           0.12
0.20     1:1,000    0.80    Discovery-oriented exploratory research with massive testing             0.0010
0.20     1:1,000    0.20    As in previous example, but with more limited bias (more standardized)   0.0015

The estimated PPVs (positive predictive values) are derived assuming α = 0.05 for a single study.
RCT, randomized controlled trial.

DOI: 10.1371/journal.pmed.0020124.t004
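The PPV column of Table 4 can be recomputed from the bias formula with α = 0.05 for a single study. The following Python sketch is illustrative, not the author's simulation code. The function `ppv`, the `TABLE_4` list, and the shortened row labels are names chosen here for the example.

```python
def ppv(R, power, alpha=0.05, u=0.0):
    """PPV = ([1 - beta]R + u*beta*R) / (R + alpha - beta*R + u - u*alpha + u*beta*R)."""
    beta = 1.0 - power
    return ((1.0 - beta) * R + u * beta * R) / (
        R + alpha - beta * R + u - u * alpha + u * beta * R
    )


# (power, R, u, shortened description, PPV as published in Table 4)
TABLE_4 = [
    (0.80, 1 / 1,    0.10, "Adequately powered RCT, little bias",       0.85),
    (0.95, 2 / 1,    0.30, "Confirmatory meta-analysis of good RCTs",   0.85),
    (0.80, 1 / 3,    0.40, "Meta-analysis of small inconclusive studies", 0.41),
    (0.20, 1 / 5,    0.20, "Underpowered, well-performed phase I/II RCT", 0.23),
    (0.20, 1 / 5,    0.80, "Underpowered, poorly performed phase I/II RCT", 0.17),
    (0.80, 1 / 10,   0.30, "Adequately powered exploratory epi study",  0.20),
    (0.20, 1 / 10,   0.30, "Underpowered exploratory epi study",        0.12),
    (0.20, 1 / 1000, 0.80, "Discovery-oriented, massive testing",       0.0010),
    (0.20, 1 / 1000, 0.20, "Same, with more limited bias",              0.0015),
]

computed = [ppv(R, power, u=u) for power, R, u, _, _ in TABLE_4]

if __name__ == "__main__":
    for (power, R, u, label, published), value in zip(TABLE_4, computed):
        print(f"{label:46s} published {published:<7} computed {value:.4f}")
```

Each computed value rounds to the published entry, which suggests the table follows directly from the closed-form expression rather than from stochastic simulation. That reading is an inference from the arithmetic, not something the essay states.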



totality of the evidence. Diminishing many relationships are expected to be 19. Ioannidis JP, Evans SJ, Gotzsche PC, O’Neill
RT, Altman DG, et al. (2004) Better reporting
bias through enhanced research true among those probed across the of harms in randomized trials: An extension
standards and curtailing of prejudices relevant research fields and research of the CONSORT statement. Ann Intern Med
may also help. However, this may designs. The wider field may yield some 141: 781–788.
20. International Conference on Harmonisation
require a change in scientific mentality guidance for estimating this probability E9 Expert Working Group (1999) ICH
that might be difficult to achieve. for the isolated research project. Harmonised Tripartite Guideline. Statistical
In some research designs, efforts Experiences from biases detected in principles for clinical trials. Stat Med 18: 1905–
1942.
may also be more successful with other neighboring fields would also be 21. Moher D, Cook DJ, Eastwood S, Olkin I,
upfront registration of studies, e.g., useful to draw upon. Even though these Rennie D, et al. (1999) Improving the quality
randomized trials [35]. Registration would pose a challenge for hypothesis-generating research. Some kind of registration or networking of data collections or investigators within fields may be more feasible than registration of each and every hypothesis-generating experiment. Regardless, even if we do not see a great deal of progress with registration of studies in other fields, the principles of developing and adhering to a protocol could be more widely borrowed from randomized controlled trials.

Finally, instead of chasing statistical significance, we should improve our understanding of the range of R values (the pre-study odds) where research efforts operate [10]. Before running an experiment, investigators should consider what they believe the chances are that they are testing a true rather than a non-true relationship. Speculated high R values may sometimes then be ascertained. As described above, whenever ethically acceptable, large studies with minimal bias should be performed on research findings that are considered relatively established, to see how often they are indeed confirmed. I suspect several established “classics” will fail the test [36].

Nevertheless, most new discoveries will continue to stem from hypothesis-generating research with low or very low pre-study odds. We should then acknowledge that statistical significance testing in the report of a single study gives only a partial picture, without knowing how much testing has been done outside the report and in the relevant field at large. Despite a large statistical literature for multiple testing corrections [37], usually it is impossible to decipher how much data dredging by the reporting authors or other research teams has preceded a reported research finding. Even if determining this were feasible, this would not inform us about the pre-study odds. Thus, it is unavoidable that one should make approximate assumptions on how

assumptions would be considerably subjective, they would still be very useful in interpreting research claims and putting them in context.

References

1. Ioannidis JP, Haidich AB, Lau J (2001) Any casualties in the clash of randomised and observational evidence? BMJ 322: 879–880.
2. Lawlor DA, Davey Smith G, Kundu D, Bruckdorfer KR, Ebrahim S (2004) Those confounded vitamins: What can we learn from the differences between observational versus randomised trial evidence? Lancet 363: 1724–1727.
3. Vandenbroucke JP (2004) When are observational studies as credible as randomised trials? Lancet 363: 1728–1731.
4. Michiels S, Koscielny S, Hill C (2005) Prediction of cancer outcome with microarrays: A multiple random validation strategy. Lancet 365: 488–492.
5. Ioannidis JPA, Ntzani EE, Trikalinos TA, Contopoulos-Ioannidis DG (2001) Replication validity of genetic association studies. Nat Genet 29: 306–309.
6. Colhoun HM, McKeigue PM, Davey Smith G (2003) Problems of reporting genetic associations with complex outcomes. Lancet 361: 865–872.
7. Ioannidis JP (2003) Genetic associations: False or true? Trends Mol Med 9: 135–138.
8. Ioannidis JPA (2005) Microarrays and molecular research: Noise discovery? Lancet 365: 454–455.
9. Sterne JA, Davey Smith G (2001) Sifting the evidence—What’s wrong with significance tests. BMJ 322: 226–231.
10. Wacholder S, Chanock S, Garcia-Closas M, El ghormli L, Rothman N (2004) Assessing the probability that a positive report is false: An approach for molecular epidemiology studies. J Natl Cancer Inst 96: 434–442.
11. Risch NJ (2000) Searching for genetic determinants in the new millennium. Nature 405: 847–856.
12. Kelsey JL, Whittemore AS, Evans AS, Thompson WD (1996) Methods in observational epidemiology, 2nd ed. New York: Oxford U Press. 432 p.
13. Topol EJ (2004) Failing the public health—Rofecoxib, Merck, and the FDA. N Engl J Med 351: 1707–1709.
14. Yusuf S, Collins R, Peto R (1984) Why do we need some large, simple randomized trials? Stat Med 3: 409–422.
15. Altman DG, Royston P (2000) What do we mean by validating a prognostic model? Stat Med 19: 453–473.
16. Taubes G (1995) Epidemiology faces its limits. Science 269: 164–169.
17. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, et al. (1999) Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286: 531–537.
18. Moher D, Schulz KF, Altman DG (2001) The CONSORT statement: Revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet 357: 1191–1194.

of reports of meta-analyses of randomised controlled trials: The QUOROM statement. Quality of Reporting of Meta-analyses. Lancet 354: 1896–1900.
22. Stroup DF, Berlin JA, Morton SC, Olkin I, Williamson GD, et al. (2000) Meta-analysis of observational studies in epidemiology: A proposal for reporting. Meta-analysis of Observational Studies in Epidemiology (MOOSE) group. JAMA 283: 2008–2012.
23. Marshall M, Lockwood A, Bradley C, Adams C, Joy C, et al. (2000) Unpublished rating scales: A major source of bias in randomised controlled trials of treatments for schizophrenia. Br J Psychiatry 176: 249–252.
24. Altman DG, Goodman SN (1994) Transfer of technology from statistical journals to the biomedical literature. Past trends and future predictions. JAMA 272: 129–132.
25. Chan AW, Hrobjartsson A, Haahr MT, Gotzsche PC, Altman DG (2004) Empirical evidence for selective reporting of outcomes in randomized trials: Comparison of protocols to published articles. JAMA 291: 2457–2465.
26. Krimsky S, Rothenberg LS, Stott P, Kyle G (1998) Scientific journals and their authors’ financial interests: A pilot study. Psychother Psychosom 67: 194–201.
27. Papanikolaou GN, Baltogianni MS, Contopoulos-Ioannidis DG, Haidich AB, Giannakakis IA, et al. (2001) Reporting of conflicts of interest in guidelines of preventive and therapeutic interventions. BMC Med Res Methodol 1: 3.
28. Antman EM, Lau J, Kupelnick B, Mosteller F, Chalmers TC (1992) A comparison of results of meta-analyses of randomized control trials and recommendations of clinical experts. Treatments for myocardial infarction. JAMA 268: 240–248.
29. Ioannidis JP, Trikalinos TA (2005) Early extreme contradictory estimates may appear in published research: The Proteus phenomenon in molecular genetics research and randomized trials. J Clin Epidemiol 58: 543–549.
30. Ntzani EE, Ioannidis JP (2003) Predictive ability of DNA microarrays for cancer outcomes and correlates: An empirical assessment. Lancet 362: 1439–1444.
31. Ransohoff DF (2004) Rules of evidence for cancer molecular-marker discovery and validation. Nat Rev Cancer 4: 309–314.
32. Lindley DV (1957) A statistical paradox. Biometrika 44: 187–192.
33. Bartlett MS (1957) A comment on D.V. Lindley’s statistical paradox. Biometrika 44: 533–534.
34. Senn SJ (2001) Two cheers for P-values. J Epidemiol Biostat 6: 193–204.
35. De Angelis C, Drazen JM, Frizelle FA, Haug C, Hoey J, et al. (2004) Clinical trial registration: A statement from the International Committee of Medical Journal Editors. N Engl J Med 351: 1250–1251.
36. Ioannidis JPA (2005) Contradicted and initially stronger effects in highly cited clinical research. JAMA 294: 218–228.
37. Hsueh HM, Chen JJ, Kodell RL (2003) Comparison of methods for estimating the number of true null hypotheses in multiplicity testing. J Biopharm Stat 13: 675–689.

