John P. A. Ioannidis Why Most Published Research Findings Are False

There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.
PLoS Medicine | www.plosmedicine.org
Essay
Open access, freely available online
August 2005 | Volume 2 | Issue 8 | e124
Published research findings are sometimes refuted by subsequent evidence, with ensuing confusion and disappointment. Refutation and controversy is seen across the range of research designs, from clinical trials and traditional epidemiological studies [1–3] to the most modern molecular research [4,5]. There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims [6–8]. However, this should not be surprising. It can be proven that most claimed research findings are false. Here I will examine the key factors that influence this problem and some corollaries thereof.
The Essay section contains opinion pieces on topics of broad interest to a general medical audience.
Citation: Ioannidis JPA (2005) Why most published research findings are false. PLoS Med 2(8): e124.
Copyright: © 2005 John P. A. Ioannidis. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abbreviation: PPV, positive predictive value
John P. A. Ioannidis is in the Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece, and Institute for Clinical Research and Health Policy Studies, Department of Medicine, Tufts-New England Medical Center, Tufts University School of Medicine, Boston, Massachusetts, United States of America. E-mail: jioannid@cc.uoi.gr
Competing Interests: The author has declared that no competing interests exist.
DOI: 10.1371/journal.pmed.0020124
Modeling the Framework for False Positive Findings
Several methodologists have pointed out [9–11] that the high rate of nonreplication (lack of confirmation) of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05. Research is not most appropriately represented and summarized by p-values, but, unfortunately, there is a widespread notion that medical research articles should be interpreted based only on p-values. Research findings are defined here as any relationship reaching formal statistical significance, e.g., effective interventions, informative predictors, risk factors, or associations. “Negative” research is also very useful. “Negative” is actually a misnomer, and the misinterpretation is widespread. However, here we will target relationships that investigators claim exist, rather than null findings.
As has been shown previously, the probability that a research finding is indeed true depends on the prior probability of it being true (before doing the study), the statistical power of the study, and the level of statistical significance [10,11]. Consider a 2 × 2 table in which research findings are compared against the gold standard of true relationships in a scientific field. In a research field both true and false hypotheses can be made about the presence of relationships. Let R be the ratio of the number of “true relationships” to “no relationships” among those tested in the field. R is characteristic of the field and can vary a lot depending on whether the field targets highly likely relationships or searches for only one or a few true relationships among thousands and millions of hypotheses that may be postulated. Let us also consider, for computational simplicity, circumscribed fields where either there is only one true relationship (among many that can be hypothesized) or the power is similar to find any of the several existing true relationships. The pre-study probability of a relationship being true is R⁄(R + 1). The probability of a study finding a true relationship reflects the power 1 − β (one minus the Type II error rate). The probability of claiming a relationship when none truly exists reflects the Type I error rate, α. Assuming that c relationships are being probed in the field, the expected values of the 2 × 2 table are given in Table 1. After a research finding has been claimed based on achieving formal statistical significance, the post-study probability that it is true is the positive predictive value, PPV. The PPV is also the complementary probability of what Wacholder et al. have called the false positive report probability [10]. According to the 2 × 2 table, one gets PPV = (1 − β)R⁄(R − βR + α). A research finding is thus more likely true than false if (1 − β)R > α. Since usually the vast majority of investigators depend on α = 0.05, this means that a research finding is more likely true than false if (1 − β)R > 0.05.
What is less well appreciated is that bias and the extent of repeated independent testing by different teams of investigators around the globe may further distort this picture and may lead to even smaller probabilities of the research findings being indeed true. We will try to model these two factors in the context of similar 2 × 2 tables.
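As a minimal sketch of the relationship PPV = (1 − β)R⁄(R − βR + α), the Python functions below evaluate the PPV and the "more likely true than false" condition for a given pre-study odds R, power 1 − β, and significance level α. The function and parameter names are illustrative choices, not part of the essay, and the example values of R are arbitrary.

```python
def ppv(R, power, alpha=0.05):
    """Post-study probability that a claimed finding is true (no bias).

    R     -- pre-study odds of a true relationship (true : no relationship)
    power -- 1 - beta, the chance of detecting a true relationship
    alpha -- Type I error rate
    """
    beta = 1.0 - power
    return (power * R) / (R - beta * R + alpha)


def more_likely_true_than_false(R, power, alpha=0.05):
    # PPV > 0.5 is algebraically equivalent to (1 - beta) * R > alpha.
    return power * R > alpha


# Illustrative values only: well-powered studies at three pre-study odds.
for odds in (1.0, 0.1, 0.01):
    print(f"R = {odds:<5} PPV = {ppv(odds, 0.80):.3f}  "
          f"majority true: {more_likely_true_than_false(odds, 0.80)}")
```

Keeping the threshold test as a separate function makes the equivalence between PPV > 0.5 and (1 − β)R > α easy to check numerically.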
Bias
First, let us define bias as the combination of various design, data, analysis, and presentation factors that tend to produce research findings when they should not be produced. Let u be the proportion of probed analyses that would not have been “research findings,” but nevertheless end up presented and reported as such, because of bias. Bias should not be confused with chance variability that causes some findings to be false by chance even though the study design, data, analysis, and presentation are perfect. Bias can entail manipulation in the analysis or reporting of findings. Selective or distorted reporting is a typical form of such bias. We may assume that u does not depend on whether a true relationship exists or not. This is not an unreasonable assumption, since typically it is impossible to know which relationships are indeed true. In the presence of bias (Table 2), one gets PPV = ([1 − β]R + uβR)⁄(R + α − βR + u − uα + uβR), and PPV decreases with increasing u, unless 1 − β ≤ α, i.e., 1 − β ≤ 0.05 for most situations. Thus, with increasing bias, the chances that a research finding is true diminish considerably. This is shown for different levels of power and for different pre-study odds in Figure 1.
Conversely, true research findings may occasionally be annulled because of reverse bias. For example, with large measurement errors relationships are lost in noise [12], or investigators use data inefficiently or fail to notice statistically significant relationships, or there may be conflicts of interest that tend to “bury” significant findings [13]. There is no good large-scale empirical evidence on how frequently such reverse bias may occur across diverse research fields. However, it is probably fair to say that reverse bias is not as common. Moreover measurement errors and inefficient use of data are probably becoming less frequent problems, since measurement error has decreased with technological advances in the molecular era and investigators are becoming increasingly sophisticated about their data. Regardless, reverse bias may be modeled in the same way as bias above. Also reverse bias should not be confused with chance variability that may lead to missing a true relationship because of chance.
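The effect of bias on PPV can be sketched numerically. The function below is a direct transcription of the Table 2 formula with bias proportion u. Its name and the example values (R = 1, power 0.80) are assumptions made for illustration.

```python
def ppv_with_bias(R, power, alpha=0.05, u=0.0):
    """PPV when a proportion u of would-be non-findings is reported as findings."""
    beta = 1.0 - power
    numerator = power * R + u * beta * R
    denominator = R + alpha - beta * R + u - u * alpha + u * beta * R
    return numerator / denominator


# Illustrative values only: PPV for increasing bias at R = 1 and power 0.80.
for bias in (0.0, 0.1, 0.3, 0.5, 0.8):
    print(f"u = {bias:.1f}  PPV = {ppv_with_bias(1.0, 0.80, u=bias):.3f}")
```

With u = 0 the expression reduces to the bias-free PPV, and when the power equals α the PPV stays at R⁄(R + 1) for every u, which matches the stated exception 1 − β ≤ α.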
Testing by Several Independent Teams
Several independent teams may be addressing the same sets of research questions. As research efforts are globalized, it is practically the rule that several research teams, often dozens of them, may probe the same or similar questions. Unfortunately, in some areas, the prevailing mentality until now has been to focus on isolated discoveries by single teams and interpret research experiments in isolation. An increasing number of questions have at least one study claiming a research finding, and this receives unilateral attention. The probability that at least one study, among several done on the same question, claims a statistically significant research finding is easy to estimate. For n independent studies of equal power, the 2 × 2 table is shown in Table 3: PPV = R(1 − β^n)⁄(R + 1 − [1 − α]^n − Rβ^n) (not considering bias). With increasing number of independent studies, PPV tends to decrease, unless 1 − β < α, i.e., typically 1 − β < 0.05. This is shown for different levels of power and for different pre-study odds in Figure 2. For n studies of different power, the term β^n is replaced by the product of the terms β_i for i = 1 to n, but inferences are similar.
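A small sketch of the multiple-studies formula follows. It takes one Type II error rate per study, so the equal-power case β^n and the unequal-power product of the β_i are handled by the same code. The function name and the example values are illustrative assumptions.

```python
from math import prod


def ppv_multiple_studies(R, betas, alpha=0.05):
    """PPV of 'at least one significant study' among len(betas) independent studies.

    betas -- Type II error rate of each study; pass [beta] * n for equal power.
    """
    n = len(betas)
    all_miss = prod(betas)                  # equals beta**n for equal power
    no_false_positive = (1.0 - alpha) ** n  # chance that no study errs under the null
    return R * (1.0 - all_miss) / (R + 1.0 - no_false_positive - R * all_miss)


# Illustrative values only: R = 1, power 0.80, growing number of teams.
for n in (1, 2, 5, 10):
    print(f"n = {n:<2} PPV = {ppv_multiple_studies(1.0, [0.2] * n):.3f}")
```

For n = 1 the expression collapses to the single-study PPV, which is a convenient sanity check on the transcription.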
Corollaries
A practical example is shown in Box 1. Based on the above considerations, one may deduce several interesting corollaries about the probability that a research finding is indeed true.
Corollary 1: The smaller the studies conducted in a scientific field, the less likely the research findings are to be true.
Small sample size means smaller power and, for all functions above, the PPV for a true research finding decreases as power decreases towards 1 − β = 0.05. Thus, other factors being equal, research findings are more likely true in scientific fields that undertake large studies, such as randomized controlled trials in cardiology (several thousand subjects randomized) [14] than in scientific fields with small studies, such as most research of molecular predictors (sample sizes 100-fold smaller) [15].
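To make the link between sample size, power, and PPV concrete, the sketch below uses a textbook normal approximation for the power of a two-sided, two-sample test with equal arms. This power model, the standardized effect size of 0.2, and the pre-study odds of 0.25 are assumptions introduced for illustration. The essay itself does not specify how power is computed.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test (assumed model).

    effect_size -- standardized mean difference between the two arms
    n_per_group -- number of subjects in each of two equal arms
    """
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_size * (n_per_group / 2.0) ** 0.5
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def ppv(R, power, alpha=0.05):
    """Bias-free PPV = (1 - beta) R / (R - beta R + alpha)."""
    beta = 1.0 - power
    return (power * R) / (R - beta * R + alpha)


# Illustrative values only: same effect and odds, 100-fold range of sample size.
for n in (20, 200, 2000):
    pw = approx_power(0.2, n)
    print(f"n per group = {n:<5} power = {pw:.3f}  PPV = {ppv(0.25, pw):.3f}")
```

At zero effect size the approximate power equals α, which is the lower limit of 1 − β = 0.05 mentioned above.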
Corollary 2: The smaller the effect sizes in a scientific field, the less likely the research findings are to be true.
Power is also related to the effect size. Thus research findings are more likely true in scientific fields with large effects, such as the impact of smoking on cancer or cardiovascular disease (relative risks 3–20), than in scientific fields where postulated effects are small, such as genetic risk factors for multigenetic diseases (relative risks 1.1–1.5) [7]. Modern epidemiology is increasingly obliged to target smaller effect sizes [16]. Consequently, the proportion of true research findings is expected to decrease. In the same line of thinking, if the true effect sizes are very small in a scientific field, this field is likely to be plagued by almost ubiquitous false positive claims. For example, if the majority of true genetic or nutritional determinants of complex diseases confer relative risks less than 1.05, genetic or nutritional epidemiology would be largely utopian endeavors.
Table 1. Research Findings and True Relationships
Research Finding | True Relationship: Yes | True Relationship: No | Total
Yes | c(1 − β)R/(R + 1) | cα/(R + 1) | c(R + α − βR)/(R + 1)
No | cβR/(R + 1) | c(1 − α)/(R + 1) | c(1 − α + βR)/(R + 1)
Total | cR/(R + 1) | c/(R + 1) | c
DOI: 10.1371/journal.pmed.0020124.t001
Table 2. Research Findings and True Relationships in the Presence of Bias
Research Finding | True Relationship: Yes | True Relationship: No | Total
Yes | (c[1 − β]R + ucβR)/(R + 1) | (cα + uc[1 − α])/(R + 1) | c(R + α − βR + u − uα + uβR)/(R + 1)
No | (1 − u)cβR/(R + 1) | (1 − u)c(1 − α)/(R + 1) | c(1 − u)(1 − α + βR)/(R + 1)
Total | cR/(R + 1) | c/(R + 1) | c
DOI: 10.1371/journal.pmed.0020124.t002
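The expected counts of Table 2 can be checked by simulation. The sketch below draws c hypothetical relationships, applies power or α to decide significance, and then lets a proportion u of the non-significant analyses be reported as findings, which is one reading of how bias enters Table 2. The function names, the seed, and the default of 200,000 simulated relationships are assumptions made for illustration.

```python
import random


def simulate_ppv(R, power, alpha, u, c=200_000, seed=0):
    """Empirical PPV among c simulated relationships with bias proportion u."""
    rng = random.Random(seed)
    p_true = R / (R + 1.0)  # pre-study probability of a true relationship
    true_claims = 0
    all_claims = 0
    for _ in range(c):
        is_true = rng.random() < p_true
        # Unbiased analysis: significant with probability power (true) or alpha (null).
        claimed = rng.random() < (power if is_true else alpha)
        # Bias: a proportion u of non-significant analyses is reported anyway.
        if not claimed and rng.random() < u:
            claimed = True
        if claimed:
            all_claims += 1
            true_claims += is_true
    return true_claims / all_claims


def ppv_with_bias(R, power, alpha, u):
    """Closed-form PPV from the 'Yes' row of Table 2."""
    beta = 1.0 - power
    return (power * R + u * beta * R) / (
        R + alpha - beta * R + u - u * alpha + u * beta * R
    )
```

Calling `simulate_ppv(0.25, 0.5, 0.05, 0.2)` and `ppv_with_bias(0.25, 0.5, 0.05, 0.2)` should give values that agree up to sampling noise.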
Corollary 3: The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true.
As shown above, the post-study probability that a finding is true (PPV) depends a lot on the pre-study odds (R). Thus, research findings are more likely true in confirmatory designs, such as large phase III randomized controlled trials, or meta-analyses thereof, than in hypothesis-generating experiments. Fields considered highly informative and creative given the wealth of the assembled and tested information, such as microarrays and other high-throughput discovery-oriented research [4,8,17], should have extremely low PPV.
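The dependence on pre-study odds can be turned around: solving the bias-free formula PPV = (1 − β)R⁄(R − βR + α) for R gives the smallest odds at which a chosen PPV is reached, R = PPV · α⁄([1 − β][1 − PPV]). The sketch below applies this rearrangement. The function names and the example of one true relationship per thousand tested are illustrative assumptions.

```python
def ppv(R, power, alpha=0.05):
    """Bias-free PPV = (1 - beta) R / (R - beta R + alpha)."""
    beta = 1.0 - power
    return (power * R) / (R - beta * R + alpha)


def min_odds_for_ppv(target_ppv, power, alpha=0.05):
    """Smallest pre-study odds R at which the bias-free PPV reaches target_ppv."""
    if not 0.0 < target_ppv < 1.0:
        raise ValueError("target_ppv must lie strictly between 0 and 1")
    return target_ppv * alpha / (power * (1.0 - target_ppv))


# Illustrative values only: odds needed for PPV = 0.5 at power 0.80,
# and the PPV of a discovery-oriented screen with R = 1:1000.
print(f"R needed for PPV 0.5: {min_odds_for_ppv(0.5, 0.80):.4f}")
print(f"PPV at R = 0.001:     {ppv(0.001, 0.80):.4f}")
```

For a target PPV of 0.5 the result reduces to α⁄(1 − β), the same threshold as the condition (1 − β)R > α.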
Corollary 4: The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true.
Flexibility increases the potential for transforming what would be “negative” results into “positive” results, i.e., bias, u. For several research designs, e.g., randomized controlled trials [18–20] or meta-analyses [21,22], there have been efforts to standardize their conduct and reporting. Adherence to common standards is likely to increase the proportion of true findings. The same applies to outcomes. True findings may be more common when outcomes are unequivocal and universally agreed (e.g., death) rather than when multifarious outcomes are devised (e.g., scales for schizophrenia outcomes) [23]. Similarly, fields that use commonly agreed, stereotyped analytical methods (e.g., Kaplan-Meier plots and the log-rank test) [24] may yield a larger proportion of true findings than fields where analytical methods are still under experimentation (e.g., artificial intelligence methods) and only “best” results are reported. Regardless, even in the most stringent research designs, bias seems to be a major problem. For example, there is strong evidence that selective outcome reporting, with manipulation of the outcomes and analyses reported, is a common problem even for randomized trials [25]. Simply abolishing selective publication would not make this problem go away.
Corollary 5: The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true.
Conflicts of interest and prejudice may increase bias, u. Conflicts of interest are very common in biomedical research [26], and typically they are inadequately and sparsely reported [26,27]. Prejudice may not necessarily have financial roots. Scientists in a given field may be prejudiced purely because of their belief in a scientific theory or commitment to their own findings. Many otherwise seemingly independent, university-based studies may be conducted for no other reason than to give physicians and researchers qualifications for promotion or tenure. Such nonfinancial conflicts may also lead to distorted reported results and interpretations. Prestigious investigators may suppress via the peer review process the appearance and dissemination of findings that refute their findings, thus condemning their field to perpetuate false dogma. Empirical evidence on expert opinion shows that it is extremely unreliable [28].
Corollary 6: The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true.
This seemingly paradoxical corollary follows because, as stated above, the PPV of isolated findings decreases when many teams of investigators are involved in the same field. This may explain why we occasionally see major excitement followed rapidly by severe disappointments in fields that draw wide attention. With many teams working on the same field and with massive experimental data being produced, timing is of the essence in beating competition. Thus, each team may prioritize on pursuing and disseminating its most impressive “positive” results. “Negative” results may become attractive for dissemination only if some other team has found a “positive” association on the same question. In that case, it may be attractive to refute a claim made in some prestigious journal. The term Proteus phenomenon has been coined to describe this phenomenon of rapidly
Table 3. Research Findings and True Relationships in the Presence of Multiple Studies
Research Finding | True Relationship: Yes | True Relationship: No | Total
Yes | cR(1 − β^n)/(R + 1) | c(1 − [1 − α]^n)/(R + 1) | c(R + 1 − [1 − α]^n − Rβ^n)/(R + 1)
No | cRβ^n/(R + 1) | c(1 − α)^n/(R + 1) | c([1 − α]^n + Rβ^n)/(R + 1)
Total | cR/(R + 1) | c/(R + 1) | c
DOI: 10.1371/journal.pmed.0020124.t003
Figure 1. PPV (Probability That a Research Finding Is True) as a Function of the Pre-Study Odds for Various Levels of Bias, u
Panels correspond to power of 0.20, 0.50, and 0.80.
DOI: 10.1371/journal.pmed.0020124.g001