Mathematical Association of America is collaborating with JSTOR to digitize, preserve and extend access to
The American Mathematical Monthly.
1978] CONTROVERSIES IN THE FOUNDATIONS OF STATISTICS 231
BRADLEY EFRON
1. Introduction. Statistics seems to be a difficult subject for mathematicians, perhaps because its elusive and wide-ranging character militates against the traditional theorem-proof method of presentation. It may come as some comfort then that statistics is also a difficult subject for statisticians. We are now celebrating the approximate bicentennial of a controversy concerning the basic nature of statistics. The two main factions in this philosophical battle, the Bayesians and the frequentists, have alternated dominance several times, with the frequentists currently holding an uneasy upper hand. A smaller third party, perhaps best called the Fisherians, snipes away at both sides.
Statistics, by definition, is uninterested in the special case. Averages are the meat of statisticians, where "average" here is understood in the wide sense of any summary statement about a large population of objects. "The average I.Q. of a college freshman is 109" is one such statement, as is "the probability of a fair coin falling heads is 1/2." The controversies dividing the statistical world revolve on the following basic point: just which averages are most relevant in drawing inferences from data? Frequentists, Bayesians, and Fisherians have produced fundamentally different answers to this question.
This article will proceed by a series of examples, rather than an axiomatic or historical exposition of the various points of view. The examples are artificially simple for the sake of humane presentation, but readers should be assured that real data are susceptible to the same disagreements. A counter-warning is also apt: these disagreements haven't crippled statistics, either theoretical or applied, and have as a matter of fact contributed to its vitality. Important recent developments, in particular the empirical Bayes methods mentioned in Section 8, have sprung directly from the tension between the Bayesian and frequentist viewpoints.
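2. The normal distribution. All of our examples will involve the normal distribution, which for various reasons plays a central role in theoretical and applied statistics. A normal, or Gaussian, random variable $x$ is a quantity which possibly can take on any value on the real axis, but not with equal probability. The probability that $x$ falls in the interval $[a, b]$ is given by the area under Gauss' famous bell-shaped curve,
    $\mathrm{Prob}\{a \le x \le b\} = \int_a^b \phi_{\mu,\sigma}(x)\,dx$,
where
    $\phi_{\mu,\sigma}(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\}$.
For convenience we indicate such a random variable by
(2.3) $x \sim \mathcal{N}(\mu, \sigma^2)$.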
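The statement that the probability of an interval is an area under the bell-shaped curve can be checked numerically. The following is a minimal Python sketch, not part of the original article: the function names, the midpoint rule, and the example values (mean 0, standard deviation 1, interval [-1, 1]) are all illustrative assumptions. It compares a direct numerical integration of the density with the normal distribution function from the standard library.

```python
import math
from statistics import NormalDist


def normal_density(x, mu, sigma):
    """Bell-shaped curve phi_{mu,sigma}(x) of a N(mu, sigma^2) variable."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi * sigma ** 2)


def interval_probability(a, b, mu, sigma, steps=10_000):
    """Approximate Prob{a <= x <= b} by a midpoint Riemann sum of the density."""
    width = (b - a) / steps
    heights = (normal_density(a + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return sum(heights) * width


# Illustrative values (assumptions): standard normal, interval [-1, 1].
area = interval_probability(-1.0, 1.0, mu=0.0, sigma=1.0)
reference = NormalDist(0.0, 1.0).cdf(1.0) - NormalDist(0.0, 1.0).cdf(-1.0)
print(round(area, 4), round(reference, 4))
```

The two numbers should agree to several decimal places, which is just the area interpretation of probability stated above.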
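The parameter $\mu$ is the "mean" or "expectation" of the random quantity $x$. Using "E" to indicate expectation,
    $\mu = E\{x\} = \int_{-\infty}^{\infty} x\,\phi_{\mu,\sigma}(x)\,dx$.
The reader may wish to think of $E\{g(x)\}$ for an arbitrary function $g(x)$ as just another notation for the integral of $g(x)$ with respect to $\phi_{\mu,\sigma}(x)\,dx$,
    $E\{g(x)\} = \int_{-\infty}^{\infty} g(x)\,\phi_{\mu,\sigma}(x)\,dx$.
Intuitively, $E\{g(x)\}$ is the weighted average of the possible values of $g(x)$, weighted according to the probabilities $\phi_{\mu,\sigma}(x)\,dx$ for the infinitesimal intervals $[x, x + dx]$. In other words, $E\{g(x)\}$ is a theoretical average of an infinite population of $g(x)$ values, where the $x$'s occur in proportion to $\phi_{\mu,\sigma}(x)$.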
It is easy to see, by symmetry, that $\mu$ is indeed the theoretical average of $x$ itself when $x \sim \mathcal{N}(\mu, \sigma^2)$. A more difficult calculation (though easy enough for friends of the gamma function) gives the expectation of $g(x) = (x - \mu)^2$,
    $E\{(x - \mu)^2\} = \sigma^2$.
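The description of an expectation as a weighted average over infinitesimal intervals translates directly into a numerical sum. The sketch below is an illustration under stated assumptions rather than anything from the article: the helper names, the truncation of the real axis at ten standard deviations, and the values 100 and 15 for the mean and standard deviation are arbitrary choices.

```python
import math


def normal_density(x, mu, sigma):
    """Density phi_{mu,sigma}(x) of a N(mu, sigma^2) variable."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def expectation(g, mu, sigma, half_width=10.0, steps=20_000):
    """Approximate E{g(x)} as a sum of g(x) * phi(x) * dx over small intervals.

    The real axis is truncated to mu +/- half_width * sigma (an assumption that
    is harmless here because the density is negligible beyond that range).
    """
    lo = mu - half_width * sigma
    dx = 2.0 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx          # midpoint of the interval [x, x + dx]
        total += g(x) * normal_density(x, mu, sigma) * dx
    return total


mu, sigma = 100.0, 15.0                  # illustrative values (assumptions)
mean_x = expectation(lambda x: x, mu, sigma)
mean_sq_dev = expectation(lambda x: (x - mu) ** 2, mu, sigma)
print(round(mean_x, 3), round(mean_sq_dev, 3))
```

With these inputs the first sum should land near the mean and the second near the squared standard deviation, matching the two expectations discussed above.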
... probability calculation shows that the random quantities $x_1 - \bar{x}, x_2 - \bar{x}, x_3 - \bar{x}, \ldots, x_n - \bar{x}$ have a joint distribution which does not depend in any way on the unknown parameter $\mu$. In other words, what's left over in the data after the statistician learns $\bar{x}$ is devoid of information about $\mu$. (This deceptively simple principle eluded both Gauss and Laplace!)
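One way to see why the deviations from the average carry no information about the mean is to write each observation as the mean plus a noise term; the mean then cancels in every difference. The sketch below illustrates this under that assumption (observations of the form mean plus independent normal noise, with hypothetical names and arbitrary values). It reuses the same noise draws with two different means and compares the resulting deviations.

```python
import random


def residuals(mu, noise):
    """Return the deviations x_i - xbar for data x_i = mu + noise_i."""
    xs = [mu + e for e in noise]
    xbar = sum(xs) / len(xs)
    return [x - xbar for x in xs]


rng = random.Random(0)                          # fixed seed for reproducibility
noise = [rng.gauss(0.0, 1.0) for _ in range(5)]

res_a = residuals(0.0, noise)                   # true mean 0
res_b = residuals(50.0, noise)                  # true mean 50, same noise draws
max_gap = max(abs(a - b) for a, b in zip(res_a, res_b))
print(max_gap)                                  # differs from 0 only by rounding
```

The deviations coincide up to floating-point rounding, whatever value the mean takes.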
Nobody doubts that these results are true. The question raised by Bayesians and Fisherians is whether frequentist averages are really relevant to the process of inference scientists use in reasoning from noisy data back to the underlying mathematical models. We turn next to the Bayesian point of view.
4. Bayesian estimation of the mean. So far we have considered $\mu$ to be a fixed, albeit unknown, quantity. Suppose though that $\mu$ itself is a random variable, known to have the normal distribution with mean $m$ and standard deviation $s$,
(4.1) $\mu \sim \mathcal{N}(m, s^2)$,
$m$ and $s$ being constants known to the statistician. For example, if $\mu$ is the true I.Q. of a person randomly chosen from the population of the United States, (4.1) holds with $m = 100$ and $s = 15$ (approximately). About 68% of I.Q.'s are between 85 and 115, about 95% between 70 and 130, etc. Information like (4.1), a "prior distribution for $\mu$" in the language of the Bayesians, changes the nature of the estimation process.
Standard I.Q. tests are constructed so that if we test our randomly chosen person to discover his particular $\mu$ value, the overall test score*, say $\bar{x}$, is a normally distributed unbiased estimator of $\mu$ as in Section 3,
(4.2) $\bar{x} \mid \mu \sim \mathcal{N}(\mu, \sigma^2/n)$,
with $\sigma/\sqrt{n}$ about 7.5. We can expect $\bar{x}$ to be within 7.5 I.Q. points of $\mu$ 68% of the time, etc. The notation "$\bar{x} \mid \mu$" emphasizes that the $\mathcal{N}(\mu, \sigma^2/n)$ distribution for $\bar{x}$ is conditional on the particular value taken by the random quantity $\mu$. The reason for this change in notation will be made clearer soon.
Bayes' theorem, originally discovered by the remarkable Reverend Thomas Bayes around 1750, is a mathematical formula for combining (4.1) and (4.2) to obtain the conditional distribution of $\mu$ given $\bar{x}$. In this case the formula gives
(4.3) $\mu \mid \bar{x} \sim \mathcal{N}(m + C(\bar{x} - m), D)$,
where
(4.4) $C = \frac{n/\sigma^2}{1/s^2 + n/\sigma^2}$ and $D = \frac{1}{1/s^2 + n/\sigma^2}$.
* The symbols $\bar{x}$ for the test score and $\sigma/\sqrt{n}$ for its standard deviation are chosen to agree with our previous notation, even though real I.Q. scores aren't actually the average of $n$ independent test items. Perfect normality, as expressed in (4.2), is an ideal only approximated by actual test scores.
236 BRADLEY EFRON [April
[Figure 2: plot with curves labeled "prior distribution of I.Q. scores in population" and "posterior distribution of true I.Q. for a person scoring 160 on test"; the Bayes estimate, the observed score $\bar{x}$, and a 95% probability interval are marked.]
FIG. 2. I.Q. scores have a $\mathcal{N}(100, (15)^2)$ distribution in the population as a whole. A randomly selected person scoring 160 on a normal unbiased I.Q. test with standard deviation 7.5 points is estimated to have a true I.Q. of 148. The probability is 95% that the person's true I.Q. is in the interval [134.6, 161.4].
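Confidence intervals have an obvious Bayesian analogue, from (4.3),
(4.7) $\mathrm{Prob}\{\mu^*(\bar{x}) - 2\sqrt{D} \le \mu \le \mu^*(\bar{x}) + 2\sqrt{D} \mid \bar{x}\} = .95$,
where $\mu^*(\bar{x}) = m + C(\bar{x} - m)$ is the mean of the conditional distribution (4.3). The notation $\mathrm{Prob}\{\cdot \mid \bar{x}\}$ indicates probability conditional on the observed value of $\bar{x}$. In the I.Q. example, $\mathrm{Prob}\{134.6 \le \mu \le 161.4 \mid \bar{x} = 160\} = .95$.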
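The arithmetic behind the I.Q. example can be traced with a few lines of Python. This is a minimal sketch: the function name `normal_posterior` and the argument name `se` (standing for the standard deviation of the test score, 7.5 here) are illustrative assumptions, while the formulas are those of (4.3), (4.4), and (4.7).

```python
import math
from statistics import NormalDist


def normal_posterior(m, s, xbar, se):
    """Mean and variance of mu given xbar, following (4.3)-(4.4).

    Prior: mu ~ N(m, s^2).  Data: xbar | mu ~ N(mu, se^2), where se plays the
    role of sigma / sqrt(n).
    """
    prior_precision = 1.0 / s ** 2
    data_precision = 1.0 / se ** 2               # n / sigma^2
    C = data_precision / (prior_precision + data_precision)
    D = 1.0 / (prior_precision + data_precision)
    return m + C * (xbar - m), D


post_mean, D = normal_posterior(m=100.0, s=15.0, xbar=160.0, se=7.5)
half_width = 2.0 * math.sqrt(D)
lower, upper = post_mean - half_width, post_mean + half_width

# Conditional probability of the interval; two standard deviations give about .954,
# which the text rounds to .95.
posterior = NormalDist(post_mean, math.sqrt(D))
coverage = posterior.cdf(upper) - posterior.cdf(lower)
print(round(post_mean, 1), round(lower, 1), round(upper, 1), round(coverage, 3))
```

Here C works out to 0.8 and D to 45, so the observed score of 160 is pulled four fifths of the way from the population mean of 100, giving the estimate of 148 quoted in the caption of Fig. 2.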
Nobody (well, almost nobody) disagrees with the use of Bayesian methods in situations like the I.Q. problem where there is a clearly defined and well-known prior distribution for $\mu$. The Bayes theory, as we shall see, offers some striking advantages in clarity and consistency. These advantages are due to the fact that Bayesian averages involve only the data value $\bar{x}$ actually seen, rather than a collection of theoretically possible other $\bar{x}$ values.
Difficulties and controversies arise because Bayesian statisticians wish to use Bayesian methods when there is no obvious prior distribution for $\mu$, or going even further, when it is clear that the unknown $\mu$ is a fixed constant with no random character at all. (For example, if $\mu$ is some physical constant, such as the speed of light, being experimentally estimated.) It is not perversity that motivates this Bayesian impulse, but rather a well-documented casebook of unpleasant inconsistencies in the frequentist approach.
As an example of the kind of difficulties frequentists experience, let us reconsider the I.Q. estimation problem, but without assuming knowledge of the prior distribution (4.1) for $\mu$. In other words, assume only that we observe $\bar{x} \sim \mathcal{N}(\mu, \sigma^2/n)$, $\sigma/\sqrt{n} = 7.5$, and wish to estimate $\mu$. Having observed $\bar{x} = 160$, the results of Section 3 tell us to estimate $\mu$ by $\hat{\mu} = 160$, with 95% confidence interval $[\hat{\mu} - 2\sigma/\sqrt{n}, \hat{\mu} + 2\sigma/\sqrt{n}] = [145, 175]$.
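The frequentist meaning of "95%" here is an average over hypothetical repetitions of the test with the true value held fixed. The simulation below is a rough sketch of that idea, not a procedure from the article; the function name, the number of trials, and the two trial values of the true I.Q. (130 and 160) are arbitrary assumptions, since in practice the true value is unknown.

```python
import random


def coverage_rate(mu, se=7.5, n_trials=100_000, seed=1):
    """Fraction of repeated tests whose interval [xbar - 2*se, xbar + 2*se]
    contains the fixed true value mu, when xbar ~ N(mu, se^2)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        xbar = rng.gauss(mu, se)
        if xbar - 2.0 * se <= mu <= xbar + 2.0 * se:
            hits += 1
    return hits / n_trials


# Arbitrary illustrative true values; the coverage should not depend on them.
rate_130 = coverage_rate(130.0)
rate_160 = coverage_rate(160.0)

# The interval actually reported for the observed score xbar = 160.
observed_interval = (160.0 - 2.0 * 7.5, 160.0 + 2.0 * 7.5)
print(rate_130, rate_160, observed_interval)
```

Both rates should come out close to .95 regardless of the true value, which is exactly the kind of average over unobserved scores that the Bayesian calculation avoids.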
Suppose now that the frequentist receives a letter from the company which administered the I.Q. test: "On the day the score of $\bar{x} = 160$ was reported, our test-grading machine was malfunctioning. Any score $\bar{x}$ below 100 was reported as 100. The machine functioned perfectly for scores $\bar{x}$ above 100."
(6.7) $\hat{\theta} = \arctan(\bar{x}_2/\bar{x}_1)$, $r = \sqrt{\bar{x}_1^2 + \bar{x}_2^2}$.
Then $\hat{\theta}$ is the obvious estimator of $\theta$. It is unbiased, $E\hat{\theta} = \theta$, with expected squared error
(6.8) $E(\hat{\theta} - \theta)^2 = .12$
(obtained by numerical integration; (6.8) makes the convention that $\hat{\theta} - \theta$ ranges from $-\pi$ to $\pi$ for any value of $\theta$, the largest possible estimation error occurring if $(\bar{x}_1, \bar{x}_2)$ is antipodal to $(\mu_1, \mu_2)$. This convention is unimportant because the probability of $|\hat{\theta} - \theta| > \pi/2$ is only .0014).
The unobvious fact pointed out by Fisher is that $r$ plays the same role as did "n" in examples (6.1)-(6.4).
(i) The distribution of $r$ does not depend on the true value of $\theta$. (For readers familiar with the
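[Figure 3: a circle centered at the origin, with labels indicating that $(\mu_1, \mu_2)$ is known to lie on this circle and that the data vector $(\bar{x}_1, \bar{x}_2)$ is observed.]
FIG. 3. The model $\bar{x}_1 \sim \mathcal{N}(\mu_1, 1)$ independent of $\bar{x}_2 \sim \mathcal{N}(\mu_2, 1)$, with $(\mu_1, \mu_2)$ known to lie on a circle of radius 3 centered at the origin. We wish to estimate the angular location $\theta$ of $(\mu_1, \mu_2)$ on the circle. The data vector $(\bar{x}_1, \bar{x}_2)$ is observed to have polar coordinates $(\hat{\theta}, r)$.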
$E\{(\hat{\theta} - \theta)^2 \mid r\}$   .26   .18   .14   .12   .10   .09   .08   .07   .12
TABLE 1. The conditional expected squared error of estimation in the circle problem, $E\{(\hat{\theta} - \theta)^2 \mid r\}$, as a function of the ancillary statistic $r = \sqrt{\bar{x}_1^2 + \bar{x}_2^2}$. The accuracy of $\hat{\theta}$ improves as $r$ increases. Fisher argued that $E\{(\hat{\theta} - \theta)^2 \mid r\}$ is a more relevant measure of the accuracy of $\hat{\theta}$ than is the unconditional expectation $E(\hat{\theta} - \theta)^2$.
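The contrast between the unconditional error (6.8) and the conditional errors of Table 1 can be explored by simulation. The sketch below is a hedged illustration of the model in Fig. 3, not the numerical integration used in the article. The true angle 0.7, the number of simulations, and the two bins for $r$ are arbitrary assumptions, and `math.atan2` is used in place of a bare arctangent so that the estimated angle falls in the correct quadrant.

```python
import math
import random


def simulate_circle(theta=0.7, radius=3.0, n_sims=200_000, seed=0):
    """Draw (r, theta_hat - theta) pairs for the circle problem of Fig. 3."""
    rng = random.Random(seed)
    mu1, mu2 = radius * math.cos(theta), radius * math.sin(theta)
    draws = []
    for _ in range(n_sims):
        x1 = rng.gauss(mu1, 1.0)
        x2 = rng.gauss(mu2, 1.0)
        r = math.hypot(x1, x2)
        theta_hat = math.atan2(x2, x1)
        # Wrap the error into [-pi, pi), the convention stated with (6.8).
        err = (theta_hat - theta + math.pi) % (2.0 * math.pi) - math.pi
        draws.append((r, err))
    return draws


def mean_squared_error(draws, r_lo=0.0, r_hi=math.inf):
    """Average squared angular error over draws with r_lo <= r < r_hi."""
    sq = [e * e for r, e in draws if r_lo <= r < r_hi]
    return sum(sq) / len(sq)


draws = simulate_circle()
overall = mean_squared_error(draws)               # compare with (6.8)
small_r = mean_squared_error(draws, 1.25, 1.75)   # data vector near the origin
large_r = mean_squared_error(draws, 4.25, 4.75)   # data vector far from the origin
print(round(overall, 3), round(small_r, 3), round(large_r, 3))
```

The overall figure should fall near the value in (6.8), while the error for small $r$ is several times the error for large $r$, which is the pattern Fisher's argument relies on.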
...? The answer seems clearly not, but it is difficult to codify a way of avoiding such traps. To put things the other way, suppose we know the weights $w_1, w_2, w_3, \ldots, w_{2n}$ of the mice before we begin the experiment. Under reasonable frequentist assumptions there will be a unique best way {Y(x), Y(y)} of assigning the mice to the treatments for the purpose of testing treatment x versus treatment y, one that optimally equalizes the weight assignments to the two groups. Statisticians trained in the Fisherian tradition find it difficult to accept such "optimal experimental designs" because the element of randomization has been eliminated.
8. Stein's Phenomenon. The reader may have noticed that the controversies so far have been more academic than practical. All philosophical factions agree that in the absence of prior knowledge $[\bar{x} - 2\sigma/\sqrt{n}, \bar{x} + 2\sigma/\sqrt{n}]$ is a 95% interval for $\mu$, the disagreement being over what "95%" means. This situation changes, for the worse, when we consider the simultaneous estimation of many parameters.
Suppose then that we have several normal means $\mu_1, \mu_2, \ldots, \mu_k$ to estimate, for each one of which we observe an independent, unbiased normal estimate
(8.1) $\bar{x}_i \sim \mathcal{N}(\mu_i, 1)$ independently, $i = 1, 2, \ldots, k$.
(Once again we have taken the variance $\sigma^2/n$ equal to 1 for the sake of convenience.) The natural analogue of squared error loss when there are several parameters to estimate is Euclidean squared distance. To simplify notation, let $\bar{x} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k)$ be the vector of observed averages, $\mu = (\mu_1, \mu_2, \ldots, \mu_k)$ the vector of true means, and $\hat{\mu} = (\hat{\mu}_1, \hat{\mu}_2, \ldots, \hat{\mu}_k)$ the vector of estimates. Then the squared error misestimation penalty is
(8.2) $\|\hat{\mu} - \mu\|^2 = \sum_{i=1}^{k} (\hat{\mu}_i - \mu_i)^2$.
$\mathrm{Prob}\{\|\bar{x}\| > \|\mu\|\}$   1.00   .967   .904   .857   .822   .795   .762   .719
TABLE 2. The probability that $\|\bar{x}\| > \|\mu\|$ is always greater than .5. For the case $k = 10$ the probabilities are much greater than .5 for moderate values of $\|\mu\|$.
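The claim in Table 2 lends itself to a quick Monte Carlo check. The following sketch is illustrative only: the function name, the simulation size, and the grid of values for the length of the true mean vector (0, 2, 5, 10) are assumptions, and are not the grid used in the table. Because the distribution of the observed length depends on the true means only through their length, the code places all of that length on the first coordinate.

```python
import math
import random


def prob_norm_exceeds(mu_norm, k=10, n_sims=20_000, seed=2):
    """Monte Carlo estimate of Prob{||xbar|| > ||mu||} for xbar_i ~ N(mu_i, 1)
    independently, with mu = (mu_norm, 0, ..., 0)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sq_norm = rng.gauss(mu_norm, 1.0) ** 2
        sq_norm += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k - 1))
        if math.sqrt(sq_norm) > mu_norm:
            hits += 1
    return hits / n_sims


# Illustrative lengths of the true mean vector (assumptions, not the table's grid).
estimates = {norm: prob_norm_exceeds(norm) for norm in (0.0, 2.0, 5.0, 10.0)}
for norm, prob in estimates.items():
    print(norm, round(prob, 3))
```

Each estimated probability should exceed .5, falling from 1 toward .5 as the true mean vector gets longer.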
(8.9) $\tilde{\mu} = \left[1 - \frac{k - 2}{\|\bar{x}\|^2}\right] \bar{x}$
has
(8.10) $E\|\tilde{\mu} - \mu\|^2 < k$
for every $\mu$! (This particular form of $\tilde{\mu}$ was developed jointly with W. James in 1960.) From a frequentist point of view, $\tilde{\mu}$ estimates $\mu$ uniformly better than does $\hat{\mu}$. It is also better from a Bayesian point of view: given any prior distribution on $\mu$, estimating by $\tilde{\mu}$ rather than $\hat{\mu}$ results in a lower overall expected squared error of estimation (averaging now over the randomness in $\mu$ and the randomness in $\bar{x}$).
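Inequality (8.10) can be illustrated, though of course not proved, by simulation. The sketch below applies the shrinking factor of (8.9) to simulated data and averages the squared error penalty (8.2) for both the obvious estimate and Stein's estimate. The function names, the number of simulations, and the choice of ten true means all equal to 1 are illustrative assumptions.

```python
import random


def stein_estimate(xbar):
    """Stein's estimate (8.9): shrink the observed vector toward the origin."""
    k = len(xbar)
    sq_norm = sum(x * x for x in xbar)
    factor = 1.0 - (k - 2) / sq_norm
    return [factor * x for x in xbar]


def squared_error(estimate, mu):
    """Squared error penalty (8.2)."""
    return sum((e - m) ** 2 for e, m in zip(estimate, mu))


def average_losses(mu, n_sims=20_000, seed=3):
    """Average penalty of the obvious estimate xbar and of Stein's estimate."""
    rng = random.Random(seed)
    loss_obvious = loss_stein = 0.0
    for _ in range(n_sims):
        xbar = [rng.gauss(m, 1.0) for m in mu]
        loss_obvious += squared_error(xbar, mu)
        loss_stein += squared_error(stein_estimate(xbar), mu)
    return loss_obvious / n_sims, loss_stein / n_sims


mu = [1.0] * 10                      # illustrative true means (an assumption), k = 10
risk_obvious, risk_stein = average_losses(mu)
print(round(risk_obvious, 2), round(risk_stein, 2))
```

The average penalty of the obvious estimate should be close to $k = 10$, and that of Stein's estimate noticeably smaller. Trying other true mean vectors changes the size of the gain but, according to (8.10), never its sign.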
Stein's estimator is based on (8.3). Since $\|\hat{\mu}\| = \|\bar{x}\|$ tends to be greater than $\|\mu\|$ with high probability, a shrinking factor $[1 - (k - 2)/\|\bar{x}\|^2]$ is used to give an estimate nearer $\mu$. The shrinking factor is more drastic when $\|\bar{x}\|^2$ is small. With $k = 10$, $\|\bar{x}\|^2 = 12$, we have $\tilde{\mu} = [.333]\bar{x}$. If instead $\|\bar{x}\|^2$ ...
FIG. 4. Stein's estimate $\tilde{\mu}$ is obtained by shrinking the obvious estimate $\hat{\mu} = \bar{x}$ toward the origin 0. The shrinking factor is more extreme the closer $\|\bar{x}\|$ lies to 0. Stein and James showed that $E\|\tilde{\mu} - \mu\|^2 < E\|\hat{\mu} - \mu\|^2$ for every $\mu$. We can choose any other origin 0' and obtain a different Stein estimate $\tilde{\mu}'$, which also dominates $\hat{\mu}$.
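The role of the origin can be made concrete with a small sketch. The caption does not spell out the formula for a shifted origin; the code below assumes the natural reading, in which the shrinking factor is computed from, and applied to, the difference between the observed vector and the chosen origin. The function name, the data vector, and the alternative origin are hypothetical and chosen only so that the squared length equals 12 as in the worked example with $k = 10$.

```python
def stein_estimate_about(xbar, origin):
    """Shrink xbar toward `origin` using the factor 1 - (k - 2)/||xbar - origin||^2.

    Returns the estimate and the shrinking factor. The shifted-origin form is an
    assumed reading of the caption, not a formula quoted from the article.
    """
    k = len(xbar)
    diff = [x - o for x, o in zip(xbar, origin)]
    sq_norm = sum(d * d for d in diff)
    factor = 1.0 - (k - 2) / sq_norm
    return [o + factor * d for o, d in zip(origin, diff)], factor


# Ten equal coordinates with squared length 12 (an illustrative data vector).
xbar = [(12.0 / 10.0) ** 0.5] * 10
estimate, factor = stein_estimate_about(xbar, origin=[0.0] * 10)

# A different, hypothetical origin yields a different estimate from the same data.
shifted_estimate, shifted_factor = stein_estimate_about(xbar, origin=[-1.0] * 10)
print(round(factor, 3), round(shifted_factor, 3))
```

With the origin at 0 the factor reproduces the shrinkage of the worked example; moving the origin farther from the data makes the shrinkage milder, in line with the caption's remark that the factor is more extreme the closer the data lie to the origin.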