

Annals of Science

DECEMBER 13, 2010 ISSUE

The Truth Wears Off
Is there something wrong with the scientific method?
BY JONAH LEHRER

[Illustration by Laurent Cilluffo. Caption: Many results that are rigorously proved and accepted start shrinking in later studies.]

On September 18, 2007, a few dozen neuroscientists, psychiatrists, and drug-company executives gathered in a hotel conference room in Brussels to hear some startling news. It had to do with a class of drugs known as atypical or second-generation antipsychotics, which came on the market in the early nineties. The drugs, sold under brand names such as Abilify, Seroquel, and Zyprexa, had been tested on schizophrenics in several large clinical trials, all of which had demonstrated a dramatic decrease in the subjects' psychiatric symptoms. As a result, second-generation antipsychotics had become one of the fastest-growing and most profitable pharmaceutical classes. By 2001, Eli Lilly's Zyprexa was generating more revenue than Prozac. It remains the company's top-selling drug.

But the data presented at the Brussels meeting made it clear that something strange was happening: the therapeutic power of the drugs appeared to be steadily waning. A recent study showed an effect that was less than half of that documented in the first trials, in the early nineteen-nineties. Many researchers began to argue that the expensive pharmaceuticals weren't any better than first-generation antipsychotics, which have been in use since the fifties. "In fact, sometimes they now look even worse," John Davis, a professor of psychiatry at the University of Illinois at Chicago, told me.

Before the effectiveness of a drug can be confirmed, it must be tested and tested again. Different scientists in different labs need to repeat the protocols and publish their results. The test of replicability, as it's known, is the foundation of modern research. Replicability is how the community enforces itself. It's a safeguard against the creep of subjectivity. Most of the time, scientists know what results they want, and that can influence the results they get. The premise of replicability is that the scientific community can correct for these flaws.

But now all sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It's as if our facts were losing their truth: claims that have been enshrined in textbooks are suddenly unprovable. This phenomenon doesn't yet have an official name, but it's occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread, affecting not only antipsychotics but also therapies ranging from cardiac stents to Vitamin E and antidepressants: Davis has a forthcoming analysis demonstrating that the efficacy of antidepressants has gone down as much as threefold in recent decades.

For many scientists, the effect is especially troubling because of what it exposes about the scientific process. If replication is what separates the rigor of science from the squishiness of pseudoscience, where do we put all these rigorously validated findings that can no longer be proved? Which results should we believe? Francis Bacon, the early-modern philosopher and pioneer of the scientific method, once declared that experiments were essential, because they allowed us to "put nature to the question." But it appears that nature often gives us different answers.

Jonathan Schooler was a young graduate student at the University of Washington in the nineteen-eighties when he discovered a surprising new fact about language and memory. At the time, it was widely believed that the act of describing our memories improved them. But, in a series of clever experiments, Schooler demonstrated that subjects shown a face and asked to describe it were much less likely to recognize the face when shown it later than those who had simply looked at it. Schooler called the phenomenon "verbal overshadowing."

The study turned him into an academic star. Since its initial publication, in 1990, it has been cited more than four hundred times. Before long, Schooler had extended the model to a variety of other tasks, such as remembering the taste of a wine, identifying the best strawberry jam, and solving difficult creative puzzles. In each instance, asking people to put their perceptions into words led to dramatic decreases in performance.

But while Schooler was publishing these results in highly reputable journals, a secret worry gnawed at him: it was proving difficult to replicate his earlier findings. "I'd often still see an effect, but the effect just wouldn't be as strong," he told me. "It was as if verbal overshadowing, my big new idea, was getting weaker." At first, he assumed that he'd made an error in experimental design or a statistical miscalculation. But he couldn't find anything wrong with his research. He then concluded that his initial batch of research subjects must have been unusually susceptible to verbal overshadowing. (John Davis, similarly, has speculated that part of the drop-off in the effectiveness of antipsychotics can be attributed to using subjects who suffer from milder forms of psychosis which are less likely to show dramatic improvement.) "It wasn't a very satisfying explanation," Schooler says. "One of my mentors told me that my real mistake was trying to replicate my work. He told me doing that was just setting myself up for disappointment."

Schooler tried to put the problem out of his mind; his colleagues assured him that such things happened all the time. Over the next few years, he found new research questions, got married and had kids. But his replication problem kept on getting worse. His first attempt at replicating the 1990 study, in 1995, resulted in an effect that was thirty percent smaller. The next year, the size of the effect shrank another thirty percent. When other labs repeated Schooler's experiments, they got a similar spread of data, with a distinct downward trend. "This was profoundly frustrating," he says. "It was as if nature gave me this great result and then tried to take it back." In private, Schooler began referring to the problem as "cosmic habituation," by analogy to the decrease in response that occurs when individuals habituate to particular stimuli. "Habituation is why you don't notice the stuff that's always there," Schooler says. "It's an inevitable process of adjustment, a ratcheting down of excitement. I started joking that it was like the cosmos was habituating to my ideas. I took it very personally."

Schooler is now a tenured professor at the University of California at Santa Barbara. He has curly black hair, pale-green eyes, and the relaxed demeanor of someone who lives five minutes away from his favorite beach. When he speaks, he tends to get distracted by his own digressions. He might begin with a point about memory, which reminds him of a favorite William James quote, which inspires a long soliloquy on the importance of introspection. Before long, we're looking at pictures from Burning Man on his iPhone, which leads us back to the fragile nature of memory.

Although verbal overshadowing remains a widely accepted theory (it's often invoked in the context of eyewitness testimony, for instance), Schooler is still a little peeved at the cosmos. "I know I should just move on already," he says. "I really should stop talking about this. But I can't." That's because he is convinced that he has stumbled on a serious problem, one that afflicts many of the most exciting new ideas in psychology.

One of the first demonstrations of this mysterious phenomenon came in the early nineteen-thirties. Joseph Banks Rhine, a psychologist at Duke, had developed an interest in the possibility of extrasensory perception, or E.S.P. Rhine devised an experiment featuring Zener cards, a special deck of twenty-five cards printed with one of five different symbols: a card was drawn from the deck and the subject was asked to guess the symbol. Most of Rhine's subjects guessed about twenty percent of the cards correctly, as you'd expect, but an undergraduate named Adam Linzmayer averaged nearly fifty percent during his initial sessions, and pulled off several uncanny streaks, such as guessing nine cards in a row. The odds of this happening by chance are about one in two million. Linzmayer did it three times.

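The one-in-two-million figure is easy to check. The sketch below assumes each guess is independent, with a one-in-five chance of naming the right symbol; the function name `streak_odds` is illustrative and does not come from Rhine's work. (A real deck drawn without replacement would shift the numbers slightly.)

```python
from fractions import Fraction

def streak_odds(streak_length: int, symbols: int = 5) -> Fraction:
    """Probability of guessing `streak_length` cards in a row by chance,
    assuming independent guesses among `symbols` equally likely symbols."""
    return Fraction(1, symbols) ** streak_length

odds = streak_odds(9)
print(f"1 in {odds.denominator:,}")  # 1 in 1,953,125
```

Five to the ninth power is 1,953,125, which is where "about one in two million" comes from. This is the chance for any single run of nine guesses, not the chance that such a run turns up somewhere in a long session.
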
Rhine documented these stunning results in his notebook and prepared several papers for publication. But then, just as he began to believe in the possibility of extrasensory perception, the student lost his spooky talent. Between 1931 and 1933, Linzmayer guessed at the identity of another several thousand cards, but his success rate was now barely above chance. Rhine was forced to conclude that the student's "extra-sensory perception ability has gone through a marked decline." And Linzmayer wasn't the only subject to experience such a drop-off: in nearly every case in which Rhine and others documented E.S.P., the effect dramatically diminished over time. Rhine called this trend the "decline effect."

Schooler was fascinated by Rhine's experimental struggles. Here was a scientist who had repeatedly documented the decline of his data; he seemed to have a talent for finding results that fell apart. In 2004, Schooler embarked on an ironic imitation of Rhine's research: he tried to replicate this failure to replicate. In homage to Rhine's interests, he decided to test for a parapsychological phenomenon known as precognition. The experiment itself was straightforward: he flashed a set of images to a subject and asked him or her to identify each one. Most of the time, the response was negative; the images were displayed too quickly to register. Then Schooler randomly selected half of the images to be shown again. What he wanted to know was whether the images that got a second showing were more likely to have been identified the first time around. Could subsequent exposure have somehow influenced the initial results? Could the effect become the cause?

The craziness of the hypothesis was the point: Schooler knows that precognition lacks a scientific explanation. But he wasn't testing extrasensory powers; he was testing the decline effect. "At first, the data looked amazing, just as we'd expected," Schooler says. "I couldn't believe the amount of precognition we were finding. But then, as we kept on running subjects, the effect size" (a standard statistical measure) "kept on getting smaller and smaller." The scientists eventually tested more than two thousand undergraduates. "In the end, our results looked just like Rhine's," Schooler said. "We found this strong paranormal effect, but it disappeared on us."

The most likely explanation for the decline is an obvious one: regression to the mean. As the experiment is repeated, that is, an early statistical fluke gets cancelled out. The extrasensory powers of Schooler's subjects didn't decline; they were simply an illusion that vanished over time. And yet Schooler has noticed that many of the data sets that end up declining seem statistically solid; that is, they contain enough data that any regression to the mean shouldn't be dramatic. "These are the results that pass all the tests," he says. "The odds of them being random are typically quite remote, like one in a million. This means that the decline effect should almost never happen. But it happens all the time! Hell, it's happened to me multiple times." And this is why Schooler believes that the decline effect deserves more attention: its ubiquity seems to violate the laws of statistics. "Whenever I start talking about this, scientists get very nervous," he says. "But I still want to know what happened to my results. Like most scientists, I assumed that it would get easier to document my effect over time. I'd get better at doing the experiments, at zeroing in on the conditions that produce verbal overshadowing. So why did the opposite happen? I'm convinced that we can use the tools of science to figure this out. First, though, we have to admit that we've got a problem."
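Regression to the mean can be illustrated with a small simulation. The sketch below assumes an effect whose true size is zero, measured with Gaussian noise of standard deviation 1; the threshold of 0.4, the sample size, and the number of labs are arbitrary choices for illustration, not figures from Schooler's experiments.

```python
import random
import statistics

def run_study(true_effect: float, n: int, rng: random.Random) -> float:
    """Average of n noisy measurements (Gaussian noise, standard deviation 1)."""
    return statistics.fmean(rng.gauss(true_effect, 1.0) for _ in range(n))

def initial_vs_replication(true_effect=0.0, n=20, labs=2000, threshold=0.4, seed=1):
    """Return (mean of striking first results, mean of their replications)."""
    rng = random.Random(seed)
    striking, replications = [], []
    for _ in range(labs):
        first = run_study(true_effect, n, rng)
        if first > threshold:  # only a striking first result gets a follow-up
            striking.append(first)
            replications.append(run_study(true_effect, n, rng))
    return statistics.fmean(striking), statistics.fmean(replications)

first, repeat = initial_vs_replication()
print(f"striking first results: {first:.2f}; replications: {repeat:.2f}")
```

Only labs whose first study looked striking are followed up, so the first average sits above the threshold by construction, while the second drifts back toward the true value of zero.
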

In 1991, the Danish zoologist Anders Møller, at Uppsala University, in Sweden, made a remarkable discovery about sex, barn swallows, and symmetry. It had long been known that the asymmetrical appearance of a creature was directly linked to the amount of mutation in its genome, so that more mutations led to more "fluctuating asymmetry." (An easy way to measure asymmetry in humans is to compare the length of the fingers on each hand.) What Møller discovered is that female barn swallows were far more likely to mate with male birds that had long, symmetrical feathers. This suggested that the picky females were using symmetry as a proxy for the quality of male genes. Møller's paper, which was published in Nature, set off a frenzy of research. Here was an easily measured, widely applicable indicator of genetic quality, and females could be shown to gravitate toward it. Aesthetics was really about genetics.

In the three years following, there were ten independent tests of the role of fluctuating asymmetry in sexual selection, and nine of them found a relationship between symmetry and male reproductive success. It didn't matter if scientists were looking at the hairs on fruit flies or replicating the swallow studies: females seemed to prefer males with mirrored halves. Before long, the theory was applied to humans. Researchers found, for instance, that women preferred the smell of symmetrical men, but only during the fertile phase of the menstrual cycle. Other studies claimed that females had more orgasms when their partners were symmetrical, while a paper by anthropologists at Rutgers analyzed forty Jamaican dance routines and discovered that symmetrical men were consistently rated as better dancers.

Then the theory started to fall apart. In 1994, there were fourteen published tests of symmetry and sexual selection, and only eight found a correlation. In 1995, there were eight papers on the subject, and only four got a positive result. By 1998, when there were twelve additional investigations of fluctuating asymmetry, only a third of them confirmed the theory. Worse still, even the studies that yielded some positive result showed a steadily declining effect size. Between 1992 and 1997, the average effect size shrank by eighty percent.

And it's not just fluctuating asymmetry. In 2001, Michael Jennions, a biologist at the Australian National University, set out to analyze "temporal trends" across a wide range of subjects in ecology and evolutionary biology. He looked at hundreds of papers and forty-four meta-analyses (that is, statistical syntheses of related studies), and discovered a consistent decline effect over time, as many of the theories seemed to fade into irrelevance. In fact, even when numerous variables were controlled for (Jennions knew, for instance, that the same author might publish several critical papers, which could distort his analysis), there was still a significant decrease in the validity of the hypothesis, often within a year of publication. Jennions admits that his findings are troubling, but expresses a reluctance to talk about them publicly. "This is a very sensitive issue for scientists," he says. "You know, we're supposed to be dealing with hard facts, the stuff that's supposed to stand the test of time. But when you see these trends you become a little more skeptical of things."

What happened? Leigh Simmons, a biologist at the University of Western Australia, suggested one explanation when he told me about his initial enthusiasm for the theory: "I was really excited by fluctuating asymmetry. The early studies made the effect look very robust." He decided to conduct a few experiments of his own, investigating symmetry in male horned beetles. "Unfortunately, I couldn't find the effect," he said. "But the worst part was that when I submitted these null results I had difficulty getting them published. The journals only wanted confirming data. It was too exciting an idea to disprove, at least back then." For Simmons, the steep rise and slow fall of fluctuating asymmetry is a clear example of a scientific paradigm, one of those intellectual fads that both guide and constrain research: after a new paradigm is proposed, the peer-review process is tilted toward positive results. But then, after a few years, the academic incentives shift (the paradigm has become entrenched) so that the most notable results are now those that disprove the theory.

Jennions, similarly, argues that the decline effect is largely a product of publication bias, or the tendency of scientists and scientific journals to prefer positive data over null results, which is what happens when no effect is found. The bias was first identified by the statistician Theodore Sterling, in 1959, after he noticed that ninety-seven percent of all published psychological studies that reported tests of statistical significance found the effect they were looking for. A "significant" result is defined as any data point that would be produced by chance less than five percent of the time. This ubiquitous test was invented in 1922 by the English mathematician Ronald Fisher, who picked five percent as the boundary line, somewhat arbitrarily, because it made pencil and slide-rule calculations easier. Sterling saw that if ninety-seven percent of psychology studies were proving their hypotheses, either psychologists were extraordinarily lucky or they published only the outcomes of successful experiments. In recent years, publication bias has mostly been seen as a problem for clinical trials, since pharmaceutical companies are less interested in publishing results that aren't favorable. But it's becoming increasingly clear that publication bias also produces major distortions in fields without large corporate incentives, such as psychology and ecology.
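Sterling's reasoning can be made concrete with a short calculation. The sketch below assumes every significant result gets published, while a non-significant one is published only with some smaller probability; the fifty-percent success rate and the three-percent publication rate for null results are hypothetical inputs chosen for illustration, not figures from Sterling's paper.

```python
def share_positive(power: float, null_publication_rate: float) -> float:
    """Fraction of the published literature that reports a significant result.

    Assumes every significant result is published and each non-significant
    result is published with probability `null_publication_rate`.
    `power` is the chance that a study comes out significant at all.
    """
    published_positive = power
    published_null = (1.0 - power) * null_publication_rate
    return published_positive / (published_positive + published_null)

# With no filter, the literature simply mirrors the studies that were run.
print(round(share_positive(0.5, 1.0), 2))   # 0.5
# If only about 3 in 100 null results reach print, positives dominate.
print(round(share_positive(0.5, 0.03), 2))  # 0.97
```

Under these assumptions, experiments that succeed only half the time still yield a literature that is ninety-seven percent positive, which is the second of Sterling's two possibilities.
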

While publication bias almost certainly plays a role in the decline effect, it remains an incomplete explanation. For one thing, it fails to account for the initial prevalence of positive results among studies that never even get submitted to journals. It also fails to explain the experience of people like Schooler, who have been unable to replicate their initial data despite their best efforts. Richard Palmer, a biologist at the University of Alberta, who has studied the problems surrounding fluctuating asymmetry, suspects that an equally significant issue is the selective reporting of results: the data that scientists choose to document in the first place. Palmer's most convincing evidence relies on a statistical tool known as a funnel graph. When a large number of studies have been done on a single subject, the data should follow a pattern: studies with a large sample size should all cluster around a common value (the true result), whereas those with a smaller sample size should exhibit a random scattering, since they're subject to greater sampling error. This pattern gives the graph its name, since the distribution resembles a funnel.

The funnel graph visually captures the distortions of selective reporting. For instance, after Palmer plotted every study of fluctuating asymmetry, he noticed that the distribution of results with smaller sample sizes wasn't random at all but instead skewed heavily toward positive results. Palmer has since documented a similar problem in several other contested subject areas. "Once I realized that selective reporting is everywhere in science, I got quite depressed," Palmer told me. "As a researcher, you're always aware that there might be some nonrandom patterns, but I had no idea how widespread it is." In a recent review article, Palmer summarized the impact of selective reporting on his field: "We cannot escape the troubling conclusion that some, perhaps many, cherished generalities are at best exaggerated in their biological significance and at worst a collective illusion nurtured by strong a priori beliefs often repeated."

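The funnel pattern, and the skew that selective reporting introduces, can be reproduced with simulated studies. This is a minimal sketch, assuming a true effect of 0.1 and Gaussian noise of standard deviation 1; the reporting filter that drops every non-positive estimate is a hypothetical stand-in for the behavior Palmer describes, not his method.

```python
import random
import statistics

def simulate_studies(true_effect, sample_sizes, studies_per_size, rng):
    """Return {n: [estimates]}; each estimate is the mean of n noisy draws."""
    return {
        n: [statistics.fmean(rng.gauss(true_effect, 1.0) for _ in range(n))
            for _ in range(studies_per_size)]
        for n in sample_sizes
    }

rng = random.Random(0)
funnel = simulate_studies(true_effect=0.1, sample_sizes=[10, 100, 1000],
                          studies_per_size=200, rng=rng)

for n, estimates in funnel.items():
    # Hypothetical filter: only positive estimates get written up.
    reported = [e for e in estimates if e > 0]
    print(f"n={n:4d}  spread of all studies={statistics.stdev(estimates):.3f}  "
          f"mean of reported studies={statistics.fmean(reported):.3f}")
```

The spread narrows as the sample size grows, which is the funnel. The filter barely touches the large studies but pushes the average of the small ones well above the true value, the same one-sided skew Palmer saw at the wide end of his graph.
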
Palmer emphasizes that selective reporting is not the same as scientific fraud. Rather, the problem seems to be one of subtle omissions and unconscious misperceptions, as researchers struggle to make sense of their results. Stephen Jay Gould referred to this as the "shoehorning" process. "A lot of scientific measurement is really hard," Simmons told me. "If you're talking about fluctuating asymmetry, then it's a matter of minuscule differences between the right and left sides of an animal. It's millimetres of a tail feather. And so maybe a researcher knows that he's measuring a good male" (an animal that has successfully mated) "and he knows that it's supposed to be symmetrical. Well, that act of measurement is going to be vulnerable to all sorts of perception biases. That's not a cynical statement. That's just the way human beings work."

One of the classic examples of selective reporting concerns the testing of acupuncture in different countries. While acupuncture is widely accepted as a medical treatment in various Asian countries, its use is much more contested in the West. These cultural differences have profoundly influenced the results of clinical trials. Between 1966 and 1995, there were forty-seven studies of acupuncture in China, Taiwan, and Japan, and every single trial concluded that acupuncture was an effective treatment. During the same period, there were ninety-four clinical trials of acupuncture in the United States, Sweden, and the U.K., and only fifty-six percent of these studies found any therapeutic benefits. As Palmer notes, this wide discrepancy suggests that scientists find ways to confirm their preferred hypothesis, disregarding what they don't want to see. Our beliefs are a form of blindness.

John Ioannidis, an epidemiologist at Stanford University, argues that such distortions are a serious issue in biomedical research. "These exaggerations are why the decline has become so common," he says. "It'd be really great if the initial studies gave us an accurate summary of things. But they don't. And so what happens is we waste a lot of money treating millions of patients and doing lots of follow-up studies on other themes based on results that are misleading." In 2005, Ioannidis published an article in the Journal of the American Medical Association that looked at the forty-nine most cited clinical-research studies in three major medical journals. Forty-five of these studies reported positive results, suggesting that the intervention being tested was effective. Because most of these studies were randomized controlled trials (the "gold standard" of medical evidence), they tended to have a significant impact on clinical practice, and led to the spread of treatments such as hormone replacement therapy for menopausal women and daily low-dose aspirin to prevent heart attacks and strokes. Nevertheless, the data Ioannidis found were disturbing: of the thirty-four claims that had been subject to replication, forty-one percent had either been directly contradicted or had their effect sizes significantly downgraded.

The situation is even worse when a subject is fashionable. In recent years, for instance, there have been hundreds of studies on the various genes that control the differences in disease risk between men and women. These findings have included everything from the mutations responsible for the increased risk of schizophrenia to the genes underlying hypertension. Ioannidis and his colleagues looked at four hundred and thirty-two of these claims. They quickly discovered that the vast majority had serious flaws. But the most troubling fact emerged when he looked at the test of replication: out of four hundred and thirty-two claims, only a single one was consistently replicable. "This doesn't mean that none of these claims will turn out to be true," he says. "But, given that most of them were done badly, I wouldn't hold my breath."

According to Ioannidis, the main problem is that too many researchers engage in what he calls "significance chasing," or finding ways to interpret the data so that it passes the statistical test of significance: the ninety-five-percent boundary invented by Ronald Fisher. "The scientists are so eager to pass this magical test that they start playing around with the numbers, trying to find anything that seems worthy," Ioannidis says. In recent years, Ioannidis has become increasingly blunt about the pervasiveness of the problem. One of his most cited papers has a deliberately provocative title: "Why Most Published Research Findings Are False."

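One way to see why "playing around with the numbers" pays off: if a researcher tries several analyses of data that contain no real effect, the chance that at least one clears the five-percent bar grows quickly. A minimal sketch, assuming the analyses are independent (analyses of a single data set usually are not, so the numbers are a rough guide only):

```python
def chance_of_false_positive(analyses: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `analyses` independent tests
    comes out significant at level `alpha` when no real effect exists."""
    return 1.0 - (1.0 - alpha) ** analyses

for k in (1, 5, 20):
    print(k, round(chance_of_false_positive(k), 2))
# 1 0.05
# 5 0.23
# 20 0.64
```

Under this assumption, twenty tries at a nonexistent effect give roughly a two-in-three chance of finding something that passes Fisher's test.
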
The problem of selective reporting is rooted in a fundamental cognitive flaw, which is that we like proving ourselves right and hate being wrong. "It feels good to validate a hypothesis," Ioannidis said. "It feels even better when you've got a financial interest in the idea or your career depends upon it. And that's why, even after a claim has been systematically disproven" (he cites, for instance, the early work on hormone replacement therapy, or claims involving various vitamins) "you still see some stubborn researchers citing the first few studies that show a strong effect. They really want to believe that it's true."

That's why Schooler argues that scientists need to become more rigorous about data collection before they publish. "We're wasting too much time chasing after bad studies and underpowered experiments," he says. "The current obsession with replicability distracts from the real problem, which is faulty design." He notes that nobody even tries to replicate most science papers; there are simply too many. (According to Nature, a third of all studies never even get cited, let alone repeated.) "I've learned the hard way to be exceedingly careful," Schooler says. "Every researcher should have to spell out, in advance, how many subjects they're going to use, and what exactly they're testing, and what constitutes a sufficient level of proof. We have the tools to be much more transparent about our experiments."

In a forthcoming paper, Schooler recommends the establishment of an open-source database, in which researchers are required to outline their planned investigations and document all their results. "I think this would provide a huge increase in access to scientific work and give us a much better way to judge the quality of an experiment," Schooler says. "It would help us finally deal with all these issues that the decline effect is exposing."

Although such reforms would mitigate the dangers of publication bias and selective reporting, they still wouldn't erase the decline effect. This is largely because scientific research will always be shadowed by a force that can't be curbed, only contained: sheer randomness. Although little research has been done on the experimental dangers of chance and happenstance, the research that exists isn't encouraging.

In the late nineteen-nineties, John Crabbe, a neuroscientist at the Oregon Health and Science University, conducted an experiment that showed how unknowable chance events can skew tests of replicability. He performed a series of experiments on mouse behavior in three different science labs: in Albany, New York; Edmonton, Alberta; and Portland, Oregon. Before he conducted the experiments, he tried to standardize every variable he could think of. The same strains of mice were used in each lab, shipped on the same day from the same supplier. The animals were raised in the same kind of enclosure, with the same brand of sawdust bedding. They had been exposed to the same amount of incandescent light, were living with the same number of littermates, and were fed the exact same type of chow pellets. When the mice were handled, it was with the same kind of surgical glove, and when they were tested it was on the same equipment, at the same time in the morning.

The premise of this test of replicability, of course, is that each of the labs should have generated the same pattern of results. "If any set of experiments should have passed the test, it should have been ours," Crabbe says. "But that's not the way it turned out." In one experiment, Crabbe injected a particular strain of mouse with cocaine. In Portland the mice given the drug moved, on average, six hundred centimetres more than they normally did; in Albany they moved seven hundred and one additional centimetres. But in the Edmonton lab they moved more than five thousand additional centimetres. Similar deviations were observed in a test of anxiety. Furthermore, these inconsistencies didn't follow any detectable pattern. In Portland one strain of mouse proved most anxious, while in Albany another strain won that distinction.

The disturbing implication of the Crabbe study is that a lot of extraordinary scientific data are nothing but noise. The hyperactivity of those coked-up Edmonton mice wasn't an interesting new fact; it was a meaningless outlier, a by-product of invisible variables we don't understand. The problem, of course, is that such dramatic findings are also the most likely to get published in prestigious journals, since the data are both statistically significant and entirely unexpected. Grants get written, follow-up studies are conducted. The end result is a scientific accident that can take years to unravel.

This suggests that the decline effect is actually a decline of illusion. While Karl Popper imagined falsification occurring with a single, definitive experiment (Galileo refuted Aristotelian mechanics in an afternoon), the process turns out to be much messier than that. Many scientific theories continue to be considered true even after failing numerous experimental tests. Verbal overshadowing might exhibit the decline effect, but it remains extensively relied upon within the field. The same holds for any number of phenomena, from the disappearing benefits of second-generation antipsychotics to the weak coupling ratio exhibited by decaying neutrons, which appears to have fallen by more than ten standard deviations between 1969 and 2001. Even the law of gravity hasn't always been perfect at predicting real-world phenomena. (In one test, physicists measuring gravity by means of deep boreholes in the Nevada desert found a two-and-a-half-percent discrepancy between the theoretical predictions and the actual data.) Despite these findings, second-generation antipsychotics are still widely prescribed, and our model of the neutron hasn't changed. The law of gravity remains the same.

Such anomalies demonstrate the slipperiness of empiricism. Although many scientific ideas generate conflicting results and suffer from falling effect sizes, they continue to get cited in the textbooks and drive standard medical practice. Why? Because these ideas seem true. Because they make sense. Because we can't bear to let them go. And this is why the decline effect is so troubling. Not because it reveals the human fallibility of science, in which data are tweaked and beliefs shape perceptions. (Such shortcomings aren't surprising, at least for scientists.) And not because it reveals that many of our most exciting theories are fleeting fads and will soon be rejected. (That idea has been around since Thomas Kuhn.) The decline effect is troubling because it reminds us how difficult it is to prove anything. We like to pretend that our experiments define the truth for us. But that's often not the case. Just because an idea is true doesn't mean it can be proved. And just because an idea can be proved doesn't mean it's true. When the experiments are done, we still have to choose what to believe.

JONAH LEHRER
