Essay
Summary

There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when the effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.

Published research findings are sometimes refuted by subsequent evidence, with ensuing confusion and disappointment. Refutation and controversy is seen across the range of research designs, from clinical trials and traditional epidemiological studies [1–3] to the most modern molecular research [4,5]. There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims [6–8]. However, this should not be surprising. It can be proven that most claimed research findings are false. Here I will examine the key factors that influence this problem and some corollaries thereof.

Modeling the Framework for False Positive Findings

Several methodologists have pointed out [9–11] that the high rate of nonreplication (lack of confirmation) of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05. Research is not most appropriately represented and summarized by p-values, but, unfortunately, there is a widespread notion that medical research articles should be interpreted based only on p-values. Research findings are defined here as any relationship reaching formal statistical significance, e.g., effective interventions, informative predictors, risk factors, or associations. “Negative” research is also very useful. “Negative” is actually a misnomer, and the misinterpretation is widespread. However, here we will target relationships that investigators claim exist, rather than null findings.

As has been shown previously, the probability that a research finding is indeed true depends on the prior probability of it being true (before doing the study), the statistical power of the study, and the level of statistical significance [10,11]. Consider a 2 × 2 table in which research findings are compared against the gold standard of true relationships in a scientific field. In a research field both true and false hypotheses can be made about the presence of relationships. Let R be the ratio of the number of “true relationships” to “no relationships” among those tested in the field. R is characteristic of the field and can vary a lot depending on whether the field targets highly likely relationships or searches for only one or a few true relationships among thousands and millions of hypotheses that may be postulated. Let us also consider, for computational simplicity, circumscribed fields where either there is only one true relationship (among many that can be hypothesized) or the power is similar to find any of the several existing true relationships. The pre-study probability of a relationship being true is R/(R + 1). The probability of a study finding a true relationship reflects the power 1 − β (one minus the Type II error rate). The probability of claiming a relationship when none truly exists reflects the Type I error rate, α. Assuming that c relationships are being probed in the field, the expected values of the 2 × 2 table are given in Table 1. After a research finding has been claimed based on achieving formal statistical significance, the post-study probability that it is true is the positive predictive value, PPV. The PPV is also the complementary probability of what Wacholder et al. have called the false positive report probability [10]. According to the 2 × 2 table, one gets PPV = (1 − β)R/(R − βR + α). A research finding is thus

The Essay section contains opinion pieces on topics of broad interest to a general medical audience.

Citation: Ioannidis JPA (2005) Why most published research findings are false. PLoS Med 2(8): e124.

Copyright: © 2005 John P. A. Ioannidis. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abbreviation: PPV, positive predictive value

John P. A. Ioannidis is in the Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece, and Institute for Clinical Research and Health Policy Studies, Department of Medicine, Tufts-New England Medical Center, Tufts University School of Medicine, Boston, Massachusetts, United States of America. E-mail: jioannid@cc.uoi.gr

Competing Interests: The author has declared that no competing interests exist.

DOI: 10.1371/journal.pmed.0020124
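
The relation PPV = (1 − β)R/(R − βR + α) from the framework section can be checked numerically. The following is a minimal sketch, not the author's code: the function names are hypothetical, and the four cell counts are reconstructed here from the stated definitions of R, α, β, and c rather than copied from Table 1. It ignores bias, so it is not expected to reproduce figures in the essay that account for bias.

```python
import math


def pre_study_probability(R: float) -> float:
    """Pre-study probability that a probed relationship is true: R/(R + 1)."""
    return R / (R + 1.0)


def expected_table(c: float, R: float, alpha: float, beta: float) -> dict:
    """Expected 2 x 2 counts when c relationships are probed.

    Reconstructed from the definitions in the text (an assumption, not a
    copy of Table 1): c*R/(R+1) relationships are true, c/(R+1) are not.
    """
    n_true = c * R / (R + 1.0)
    n_null = c / (R + 1.0)
    return {
        "true_positive": n_true * (1.0 - beta),   # true, found significant
        "false_negative": n_true * beta,          # true, missed
        "false_positive": n_null * alpha,         # not true, claimed anyway
        "true_negative": n_null * (1.0 - alpha),  # not true, not claimed
    }


def ppv(R: float, alpha: float, beta: float) -> float:
    """Closed form from the text: PPV = (1 - beta)R / (R - beta*R + alpha)."""
    return (1.0 - beta) * R / (R - beta * R + alpha)


def ppv_from_table(c: float, R: float, alpha: float, beta: float) -> float:
    """PPV as true positives divided by all claimed (significant) findings."""
    t = expected_table(c, R, alpha, beta)
    return t["true_positive"] / (t["true_positive"] + t["false_positive"])


if __name__ == "__main__":
    # Illustrative inputs only: alpha = 0.05 and power 0.80 (beta = 0.20).
    for R in (1.0, 0.1, 0.001):
        closed = ppv(R, alpha=0.05, beta=0.20)
        table = ppv_from_table(1000, R, alpha=0.05, beta=0.20)
        assert math.isclose(closed, table)  # c cancels out of the ratio
        print(f"R = {R}: pre-study p = {pre_study_probability(R):.4f}, PPV = {closed:.4f}")
```

With these illustrative inputs, the number of probed relationships c cancels out of the ratio, which is why the closed form depends only on R, α, and β. The sketch also makes it easy to see how quickly PPV falls as R shrinks while α and power stay fixed.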
Figure 2. PPV (Probability That a Research Finding Is True) as a Function of the Pre-Study Odds for Various Numbers of Conducted Studies, n
Panels correspond to power of 0.20, 0.50, and 0.80.

alternating extreme research claims and extremely opposite refutations [29]. Empirical evidence suggests that this sequence of extreme opposites is very common in molecular genetics [29].

These corollaries consider each factor separately, but these factors often influence each other. For example, investigators working in fields where true effect sizes are perceived to be small may be more likely to perform large studies than investigators working in fields where true effect sizes are perceived to be large. Or prejudice may prevail in a hot scientific field, further undermining the predictive value of its research findings. Highly prejudiced stakeholders may even create a barrier that aborts efforts at obtaining and disseminating opposing results. Conversely, the fact that a field is hot or has strong invested interests may sometimes promote larger studies and improved standards of research, enhancing the predictive value of its research findings. Or massive discovery-oriented testing may result in such a large yield of significant relationships that investigators have enough to report and search further and thus refrain from data dredging and manipulation.

Most Research Findings Are False for Most Research Designs and for Most Fields

In the described framework, a PPV exceeding 50% is quite difficult to get. Table 4 provides the results of simulations using the formulas developed for the influence of power, ratio of true to non-true relationships, and bias, for various types of situations that may be characteristic of specific study designs and settings. A finding from a well-conducted, adequately powered randomized controlled trial starting with a 50% pre-study chance that the intervention is effective is eventually true about 85% of the time. A fairly similar performance is expected of a confirmatory meta-analysis of good-quality randomized trials: potential bias probably increases, but power and pre-test chances are higher compared to a single randomized trial. Conversely, a meta-analytic finding from inconclusive studies, where pooling is used to “correct” the low power of single studies, is probably false if R ≤ 1:3. Research findings from underpowered, early-phase clinical trials would be true about one in four times, or even less frequently if bias is present. Epidemiological studies of an exploratory nature perform even worse, especially when underpowered, but even well-powered epidemiological studies may have only a one in five chance of being true, if R = 1:10. Finally, in discovery-oriented research with massive testing, where tested relationships exceed true ones 1,000-fold (e.g., 30,000 genes tested, of which 30 may be the true culprits) [30,31], PPV for each claimed relationship is extremely low, even with considerable