
Controversies in the Foundations of Statistics
Author(s): Bradley Efron

Source: The American Mathematical Monthly, Vol. 85, No. 4 (Apr., 1978), pp. 231-246
Published by: Mathematical Association of America
Stable URL: http://www.jstor.org/stable/2321163 .
CONTROVERSIES IN THE FOUNDATIONS OF STATISTICS
BRADLEY EFRON
1. Introduction. Statistics seems to be a difficult subject for mathematicians, perhaps because its elusive and wide-ranging character militates against the traditional theorem-proof method of presentation. It may come as some comfort then that statistics is also a difficult subject for statisticians. We are now celebrating the approximate bicentennial of a controversy concerning the basic nature of statistics. The two main factions in this philosophical battle, the Bayesians and the frequentists, have alternated dominance several times, with the frequentists currently holding an uneasy upper hand. A smaller third party, perhaps best called the Fisherians, snipes away at both sides.
[Bradley Efron received his Ph.D. in Statistics from Stanford in 1964 under the direction of Rupert Miller. He holds professorships at Stanford in both the Statistics Department and the Department of Preventive Medicine. His interests cover most of theoretical and applied statistics, with special emphasis on the application of geometrical methods to statistical problems. - Editors]
Statistics, by definition, is uninterested in the special case. Averages are the meat of statisticians, where "average" here is understood in the wide sense of any summary statement about a large population of objects. "The average I.Q. of a college freshman is 109" is one such statement, as is "the probability of a fair coin falling heads is 1/2." The controversies dividing the statistical world revolve on the following basic point: just which averages are most relevant in drawing inferences from data? Frequentists, Bayesians, and Fisherians have produced fundamentally different answers to this question.
This article will proceed by a series of examples, rather than an axiomatic or historical exposition of the various points of view. The examples are artificially simple for the sake of humane presentation, but readers should be assured that real data are susceptible to the same disagreements. A counter-warning is also apt: these disagreements haven't crippled statistics, either theoretical or applied, and have as a matter of fact contributed to its vitality. Important recent developments, in particular the empirical Bayes methods mentioned in Section 8, have sprung directly from the tension between the Bayesian and frequentist viewpoints.
2. The normal distribution. All of our examples will involve the normal distribution, which for various reasons plays a central role in theoretical and applied statistics. A normal, or Gaussian, random variable $x$ is a quantity which possibly can take on any value on the real axis, but not with equal probability. The probability that $x$ falls in the interval $[a, b]$ is given by the area under Gauss' famous bell-shaped curve,
(2.1) $\mathrm{Prob}\{a \le x \le b\} = \int_a^b \phi_{\mu,\sigma}(x)\,dx,$
where
(2.2) $\phi_{\mu,\sigma}(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right].$
For convenience we indicate such a random variable by
(2.3) $x \sim \mathcal{N}(\mu, \sigma^2),$
with $\sigma^2$ instead of $\sigma$ as the second argument by convention.
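Figure 1 illustrates the normal distribution. The high point of $\phi_{\mu,\sigma}(x)$ is at $x = \mu$, the curve falling off quickly for $|x - \mu| > \sigma$. Most of the probability, 99.7%, is within $\pm 3$ $\sigma$-units of the central value $\mu$. We can write $x \sim \mathcal{N}(\mu, \sigma^2)$ as $x = \mu + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma^2)$; adding the constant $\mu$ merely shifts $\epsilon \sim \mathcal{N}(0, \sigma^2)$ $\mu$ units to the right.
[Figure 1: bell-shaped curve with horizontal axis marked at $\mu-3\sigma$, $\mu-2\sigma$, $\mu-\sigma$, $\mu$, $\mu+\sigma$, $\mu+2\sigma$, $\mu+3\sigma$, and a shaded area over $[a, b]$.]
FIG. 1. The normal distribution. The random quantity $x \sim \mathcal{N}(\mu, \sigma^2)$ occurs in $[a, b]$ with probability equal to the shaded area. 68% of the probability is in the interval $[\mu - \sigma, \mu + \sigma]$, 95% in $[\mu - 2\sigma, \mu + 2\sigma]$, 99.7% in $[\mu - 3\sigma, \mu + 3\sigma]$.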
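The 68%, 95%, and 99.7% figures can be checked numerically. The following is a minimal sketch, assuming the standard identity that expresses the normal cumulative distribution through the error function; the helper names `normal_cdf` and `prob_interval` are illustrative and not part of the article.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_interval(a, b, mu=0.0, sigma=1.0):
    """Prob{a <= x <= b} for x ~ N(mu, sigma^2): the area under the curve in (2.1)."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

mu, sigma = 0.0, 1.0
# Probability of falling within k sigma-units of the central value, k = 1, 2, 3.
within = {k: prob_interval(mu - k * sigma, mu + k * sigma, mu, sigma) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within {k} sigma-units: {p:.4f}")
```

Because the probabilities depend only on the distance from $\mu$ measured in $\sigma$-units, the same three numbers result for any choice of `mu` and `sigma`.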
The parameter $\mu$ is the "mean" or "expectation" of the random quantity $x$. Using "$E$" to indicate expectation,
(2.4) $\mu = E\{x\} = \int_{-\infty}^{\infty} x\,\phi_{\mu,\sigma}(x)\,dx.$
The reader may wish to think of $E\{g(x)\}$ for an arbitrary function $g(x)$ as just another notation for the integral of $g(x)$ with respect to $\phi_{\mu,\sigma}(x)\,dx$,
(2.5) $E\{g(x)\} = \int_{-\infty}^{\infty} g(x)\,\phi_{\mu,\sigma}(x)\,dx.$
Intuitively, $E\{g(x)\}$ is the weighted average of the possible values of $g(x)$, weighted according to the probabilities $\phi_{\mu,\sigma}(x)\,dx$ for the infinitesimal intervals $[x, x + dx]$. In other words, $E\{g(x)\}$ is a theoretical average of an infinite population of $g(x)$ values, where the $x$'s occur in proportion to $\phi_{\mu,\sigma}(x)$.
It is easy to see, by symmetry, that $\mu$ is indeed the theoretical average of $x$ itself when $x \sim \mathcal{N}(\mu, \sigma^2)$. A more difficult calculation (though easy enough for friends of the gamma function) gives the expectation of $g(x) = (x - \mu)^2$,
(2.6) $E\{(x-\mu)^2\} = \int_{-\infty}^{\infty} (x-\mu)^2\,\phi_{\mu,\sigma}(x)\,dx = \sigma^2.$
The parameter $\sigma$, called the "standard deviation," sets the scale for the variability of $x$ about the central value $\mu$, as Figure 1 shows. A $\mathcal{N}(1, 10^{-6})$ random variable will have almost no perceptible variability under repeated trials, 997 out of 1000 repetitions occurring in $[.997, 1.003]$, since $\sigma = 10^{-3}$. A $\mathcal{N}(1, 10^{6})$ random variable is almost all noise and no signal, in the evocative language of communications theory.
The normal distribution has a very useful closure property that makes it as easy to deal with many observations as with a single one. Let $x_1, x_2, x_3, \ldots, x_n$ be $n$ independent observations, each of which is $\mathcal{N}(\mu, \sigma^2)$, $\mu$ and $\sigma$ being the same for all $n$ repetitions. Independence means that the value of $x_1$, say, does not affect any of the other values: observing $x_1 > \mu$ does not increase or decrease the 34% probability that $x_2 \in [\mu, \mu + \sigma]$, etc. A familiar (non-normal) example of independent variables $x_1, x_2, x_3, \ldots$ is given by successive observations of a well-rolled die.
Let
(2.7) $\bar{x} = \sum_{i=1}^{n} x_i / n$
be the observed average of the $n$ independent $\mathcal{N}(\mu, \sigma^2)$ variables. It is easy to show that
(2.8) $\bar{x} \sim \mathcal{N}(\mu, \sigma^2/n).$
The distribution of $\bar{x}$ is the same as that for the individual $x_i$ except that the scaling parameter has been reduced from $\sigma$ to $\sigma/\sqrt{n}$. By taking $n$ sufficiently large we can reduce the variability of $\bar{x}$ about $\mu$ to an arbitrarily small level, but of course in real problems $n$ is limited and $\bar{x}$ retains an irreducible component of random variability.
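Result (2.8) can be illustrated by simulation. The sketch below is only a sanity check under assumed values ($\mu = 5$, $\sigma = 2$, $n = 25$, a fixed random seed); none of these numbers comes from the article, and `simulate_means` is a hypothetical helper.

```python
import random
import statistics

def simulate_means(mu, sigma, n, reps, seed=0):
    """Return `reps` sample averages, each over n independent N(mu, sigma^2) draws."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        total = sum(rng.gauss(mu, sigma) for _ in range(n))
        means.append(total / n)
    return means

mu, sigma, n = 5.0, 2.0, 25          # illustrative values, not from the article
means = simulate_means(mu, sigma, n, reps=20000)

empirical_var = statistics.pvariance(means)
print("empirical variance of the average:", round(empirical_var, 4))
print("sigma^2 / n from (2.8)            :", sigma**2 / n)
```

The empirical variance of the simulated averages should land close to $\sigma^2/n$, which is the sense in which the scale shrinks from $\sigma$ to $\sigma/\sqrt{n}$.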
In all of our examples $\sigma$ will be assumed known to the statistician. The unknown parameter $\mu$ will be the object of interest, the goal being to make inferences about the value of $\mu$ on the basis of the data $x_1, x_2, x_3, \ldots, x_n$. In 1925 Sir Ronald Fisher made the fundamental observation that in this situation the average $\bar{x}$ contains all possible information about $\mu$. For any inference problem about $\mu$, knowing $\bar{x}$ is just as good as knowing the entire data set $x_1, x_2, x_3, \ldots, x_n$. In modern parlance, $\bar{x}$ is a "sufficient statistic" for the unknown parameter $\mu$.
It is easy to verify sufficiency in this particular case. Given the observed value of $\bar{x}$, a standard probability calculation shows that the random quantities $x_1 - \bar{x}, x_2 - \bar{x}, x_3 - \bar{x}, \ldots, x_n - \bar{x}$ have a joint distribution which does not depend in any way on the unknown parameter $\mu$. In other words, what's left over in the data after the statistician learns $\bar{x}$ is devoid of information about $\mu$. (This deceptively simple principle eluded both Gauss and Laplace!)
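One informal way to see why the residuals carry no information about $\mu$: writing $x_i = \mu + \epsilon_i$ gives $x_i - \bar{x} = \epsilon_i - \bar{\epsilon}$, in which $\mu$ has cancelled. The sketch below illustrates this cancellation with the same noise values and two different means; it is not the full probability calculation the text refers to, and the names and numbers are illustrative assumptions.

```python
import random

def residuals_from_mean(mu, noise):
    """Build x_i = mu + e_i and return the residuals x_i - xbar."""
    xs = [mu + e for e in noise]
    xbar = sum(xs) / len(xs)
    return [x - xbar for x in xs]

rng = random.Random(1)                      # fixed seed, purely for repeatability
noise = [rng.gauss(0.0, 1.0) for _ in range(5)]

res_a = residuals_from_mean(0.0, noise)     # data generated with mu = 0
res_b = residuals_from_mean(100.0, noise)   # same noise, mu = 100

# The two residual vectors agree up to floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(res_a, res_b))
print("largest difference between the residual vectors:", max_gap)
```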
3. Frequentist estimation of the mean. The statistician may wish to estimate the unobservable parameter $\mu$ on the basis of the observed data $x_1, x_2, x_3, \ldots, x_n$. "Estimate" usually means "make a guess $\hat{\mu}(x_1, x_2, x_3, \ldots, x_n)$ depending on $x_1, x_2, \ldots, x_n$, with the understanding that you will be penalized an amount which is a smooth increasing function of the error of estimation $|\hat{\mu} - \mu|$." The usual penalty function, which we shall also use here, is $(\hat{\mu} - \mu)^2$, the squared-error loss function originally introduced by Gauss.
Fisher's sufficiency principle says that we need only consider estimation rules which are a function of $\bar{x}$. The most obvious candidate is $\bar{x}$ itself,
(3.1) $\hat{\mu}(x_1, x_2, \ldots, x_n) = \bar{x}.$
This estimation rule is "unbiased" for $\mu$; no matter what the true value of $\mu$ is,
(3.2) $E\hat{\mu} = \mu.$
Unbiasedness is by no means a necessary condition for a good estimation rule, as we shall see later, but it does have considerable intuitive appeal as a guarantee that the statistician is not trying to slant the estimation process in favor of any particular $\mu$ value.
The expected penalty for using $\hat{\mu} = \bar{x}$ is, according to (2.6) and (2.8),
(3.3) $E(\hat{\mu} - \mu)^2 = \sigma^2/n.$
Gauss showed that among all unbiased estimation rules $\hat{\mu}(x_1, x_2, \ldots, x_n)$ which are linear in $x_1, x_2, x_3, \ldots, x_n$, the rule $\hat{\mu} = \bar{x}$ uniformly minimizes $E(\hat{\mu} - \mu)^2$ for every value of $\mu$. In the early 1940's this result was extended to include any unbiased estimator at all, linear or nonlinear. The proof, which depends on ideas Fisher developed in the 1920's, was put forth separately by H. Cramér in Sweden and C. R. Rao in India.
If we agree to abide by the unbiasedness criterion and to use squared-error loss, $\bar{x}$ seems to be the best estimator for $\mu$. It is helpful for the statistician to provide not only a "point estimator" for $\mu$, $\bar{x}$ in this case, but also a range of plausible values of $\mu$ consistent with the data. From (2.8) and Figure 1 we see that
(3.4) $\mathrm{Prob}\{|\bar{x} - \mu| \le 2\sigma/\sqrt{n}\} = .95,$
which is equivalent to the statement
(3.5) $\mathrm{Prob}\{\bar{x} - 2\sigma/\sqrt{n} \le \mu \le \bar{x} + 2\sigma/\sqrt{n}\} = .95.$
The interval $[\bar{x} - 2\sigma/\sqrt{n},\ \bar{x} + 2\sigma/\sqrt{n}]$ is called a "95% confidence interval" for $\mu$. The theory of confidence intervals was developed by J. Neyman in the early 1930's. As an example, suppose $n = 4$, $\sigma = 1$, and we observe $x_1 = 1.2$, $x_2 = 0.3$, $x_3 = 0.7$, $x_4 = 0.2$. Then $\bar{x} = 0.6$ and the 95% confidence interval for $\mu$ is $[-.4, 1.6]$, since $2\sigma/\sqrt{n} = 1$.
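The arithmetic of the four-observation example can be reproduced directly from (3.5). This is a minimal sketch; `confidence_interval` is an illustrative helper name, and the half-width of two standard errors follows the article's rounding of the 95% point.

```python
import math

def confidence_interval(xs, sigma, half_width_in_se=2.0):
    """Return (xbar, (lower, upper)) for the interval xbar +/- 2 sigma / sqrt(n) of (3.5)."""
    n = len(xs)
    xbar = sum(xs) / n
    se = sigma / math.sqrt(n)               # standard deviation of xbar, by (2.8)
    return xbar, (xbar - half_width_in_se * se, xbar + half_width_in_se * se)

# Data from the example in the text: n = 4, sigma = 1.
xbar, (lower, upper) = confidence_interval([1.2, 0.3, 0.7, 0.2], sigma=1.0)
print(f"xbar = {xbar:.2f}, interval = [{lower:.2f}, {upper:.2f}]")
```

With $\sigma = 1$ and $n = 4$ the half-width is $2 \cdot 1/\sqrt{4} = 1$, so the interval is simply $\bar{x} \pm 1$.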
All of this seems so innocuous and straightforward that the reader may wonder where the grounds for controversy lie. The fact is that all of the results presented so far are "frequentist" in nature. That is, they relate to theoretical averages with respect to the $\mathcal{N}(\mu, \sigma^2/n)$ distribution of $\bar{x}$, with $\mu$ assumed fixed at its true value, whatever that may be. Unbiasedness itself is a frequentist concept; the theoretical average of $\hat{\mu}$ with $\mu$ held fixed, $E\hat{\mu}$, equals $\mu$. Results (3.3) and (3.5), and the Cramér-Rao theorem, are frequentist statements. For example, the proper interpretation of (3.5) is that the interval $[\bar{x} - 2\sigma/\sqrt{n},\ \bar{x} + 2\sigma/\sqrt{n}]$ covers the true value of $\mu$ with frequency 95% in a long series of independent repetitions of $\bar{x} \sim \mathcal{N}(\mu, \sigma^2/n)$.
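The long-run reading of (3.5) can be mimicked by simulation: hold $\mu$ fixed, draw $\bar{x}$ repeatedly, and count how often the interval covers $\mu$. The sketch below assumes illustrative values ($\mu = 3$, $\sigma = 1$, $n = 4$) and a fixed seed; `coverage` is a hypothetical helper, not something from the article.

```python
import math
import random

def coverage(mu, sigma, n, reps, seed=0):
    """Fraction of repetitions in which [xbar - 2se, xbar + 2se] covers the fixed mu."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = rng.gauss(mu, se)            # xbar ~ N(mu, sigma^2 / n), by (2.8)
        if xbar - 2.0 * se <= mu <= xbar + 2.0 * se:
            hits += 1
    return hits / reps

rate = coverage(mu=3.0, sigma=1.0, n=4, reps=20000)
print("observed coverage frequency:", rate)
```

The observed frequency should sit near 95% (slightly above, since two standard errors is a rounded version of the exact 95% point). Note that $\mu$ never varies inside the loop; only the data do, which is exactly the frequentist averaging the text describes.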
Nobody doubts that these results are true. The question raised by Bayesians and Fisherians is whether frequentist averages are really relevant to the process of inference scientists use in reasoning from noisy data back to the underlying mathematical models. We turn next to the Bayesian point of view.
4. Bayesian estimation of the mean. So far we have considered $\mu$ to be a fixed, albeit unknown, quantity. Suppose though that $\mu$ itself is a random variable, known to have the normal distribution with mean $m$ and standard deviation $s$,
(4.1) $\mu \sim \mathcal{N}(m, s^2),$
$m$ and $s$ being constants known to the statistician. For example, if $\mu$ is the true I.Q. of a person randomly chosen from the population of the United States, (4.1) holds with $m = 100$ and $s = 15$ (approximately). About 68% of I.Q.'s are between 85 and 115, about 95% between 70 and 130, etc. Information like (4.1), a "prior distribution for $\mu$" in the language of the Bayesians, changes the nature of the estimation process.
Standard I.Q. tests are constructed so that if we test our randomly chosen person to discover his particular $\mu$ value, the overall test score*, say $\bar{x}$, is a normally distributed unbiased estimator of $\mu$ as in Section 3,
(4.2) $\bar{x} \mid \mu \sim \mathcal{N}(\mu, \sigma^2/n),$
with $\sigma/\sqrt{n}$ about 7.5. We can expect $\bar{x}$ to be within 7.5 I.Q. points of $\mu$ 68% of the time, etc. The notation "$\bar{x} \mid \mu$" emphasizes that the $\mathcal{N}(\mu, \sigma^2/n)$ distribution for $\bar{x}$ is conditional on the particular value taken by the random quantity $\mu$. The reason for this change in notation will be made clearer soon.
Bayes' theorem, originally discovered by the remarkable Reverend Thomas Bayes around 1750, is a mathematical formula for combining (4.1) and (4.2) to obtain the conditional distribution of $\mu$ given $\bar{x}$. In this case the formula gives
(4.3) $\mu \mid \bar{x} \sim \mathcal{N}(m + C(\bar{x} - m),\ D),$
where
(4.4) $C = \frac{n/\sigma^2}{1/s^2 + n/\sigma^2}$ and $D = \frac{1}{1/s^2 + n/\sigma^2}.$
For example, if $\bar{x} = 160$ (and $m = 100$, $s = 15$, $\sigma/\sqrt{n} = 7.5$) then
(4.5) $\mu \mid \bar{x} \sim \mathcal{N}(148, (6.7)^2).$
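The numbers in (4.5) follow from (4.3) and (4.4) by direct substitution. The sketch below carries out that substitution; the function name `normal_posterior` and the argument `se` (standing for $\sigma/\sqrt{n}$) are illustrative choices rather than notation from the article.

```python
import math

def normal_posterior(xbar, m, s, se):
    """Posterior mean and variance of mu given xbar, from (4.3) and (4.4).

    m, s : prior mean and prior standard deviation, as in (4.1)
    se   : sigma / sqrt(n), the standard deviation of xbar given mu, as in (4.2)
    """
    prior_precision = 1.0 / s**2
    data_precision = 1.0 / se**2            # equals n / sigma^2
    C = data_precision / (prior_precision + data_precision)
    D = 1.0 / (prior_precision + data_precision)
    return m + C * (xbar - m), D

post_mean, D = normal_posterior(xbar=160.0, m=100.0, s=15.0, se=7.5)
post_sd = math.sqrt(D)
print(f"posterior mean = {post_mean:.1f}, posterior sd = {post_sd:.1f}")
```

Here $C = 0.8$, so the observed score is pulled one fifth of the way back toward the population mean of 100, and $D = 45$, whose square root is about 6.7.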
Expression (4.5), or more generally (4.3), is the "posterior distribution for $\mu$ given the observed value of $\bar{x}$." It is possible to make such a statement in the Bayesian framework because we start out assuming that $\mu$ itself is random. In the Bayesian framework the averaging process is reversed; the data $\bar{x}$ is assumed fixed at its observed value while it is the parameter $\mu$ which varies. In (4.5) for example, the conditional average of $\mu$ given $\bar{x} = 160$ is seen to be 148. If we randomly selected an enormous number of people, gave them each an I.Q. test, and considered the subset of those who scored 160, this subset would have an average true I.Q. of 148; 68% of the true I.Q.'s would be in the interval $[148 - 6.7, 148 + 6.7]$, etc.
How should we estimate $\mu$ in the Bayesian situation? It seems natural to use the estimator $\mu^*$ which minimizes the conditional expectation of $(\mu - \mu^*)^2$ given the observed value of $\bar{x}$. From (4.3) it is easy to derive that this "Bayes estimator" is
(4.6) $\mu^*(\bar{x}) = m + C(\bar{x} - m),$
the mean of the posterior distribution of $\mu$ given $\bar{x}$. Having observed $\bar{x} = 160$, the Bayes estimate is 148, not 160. Even though we are using an unbiased I.Q. test, so many more true I.Q.'s lie below 160 rather than above that it lowers the expected estimation error to bias the observed score toward 100. Figure 2 illustrates the situation.
* The symbols $\bar{x}$ for the test score and $\sigma/\sqrt{n}$ for its standard deviation are chosen to agree with our previous notation, even though real I.Q. scores aren't actually the average of $n$ independent test items. Perfect normality, as expressed in (4.2), is an ideal only approximated by actual test scores.
[Figure 2: two bell-shaped curves, the prior distribution of I.Q. scores in the population and the posterior distribution of true I.Q. for a person scoring 160 on the test; horizontal axis marked at 70, 85, 100, 115, 130, 145, 148 (Bayes estimate), and 160 (observed score), with the 95% probability interval indicated.]
FIG. 2. I.Q. scores have a $\mathcal{N}(100, (15)^2)$ distribution in the population as a whole. A randomly selected person scoring 160 on a normal unbiased I.Q. test with standard deviation 7.5 points is estimated to have a true I.Q. of 148. The probability is 95% that the person's true I.Q. is in the interval [134.6, 161.4].
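Confidence intervals have an obvious Bayesian analogue, from (4.3),
(4.7) $\mathrm{Prob}\{\mu^*(\bar{x}) - 2\sqrt{D} \le \mu \le \mu^*(\bar{x}) + 2\sqrt{D} \mid \bar{x}\} = .95.$
The notation $\mathrm{Prob}\{\cdot \mid \bar{x}\}$ indicates probability conditional on the observed value of $\bar{x}$. In the I.Q. example, $\mathrm{Prob}\{134.6 \le \mu \le 161.4 \mid \bar{x} = 160\} = .95$, the endpoints being $148 \pm 2(6.7)$.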
Nobody (well, almost nobody) disagrees with the use of Bayesian methods in situations like the I.Q. problem where there is a clearly defined and well-known prior distribution for $\mu$. The Bayes theory, as we shall see, offers some striking advantages in clarity and consistency. These advantages are due to the fact that Bayesian averages involve only the data value $\bar{x}$ actually seen, rather than a collection of theoretically possible other $\bar{x}$ values.
Difficulties and controversies arise because Bayesian statisticians wish to use Bayesian methods when there is no obvious prior distribution for $\mu$, or going even further, when it is clear that the unknown $\mu$ is a fixed constant with no random character at all. (For example, if $\mu$ is some physical constant, such as the speed of light, being experimentally estimated.) It is not perversity that motivates this Bayesian impulse, but rather a well-documented casebook of unpleasant inconsistencies in the frequentist approach.
As an example of the kind of difficulties frequentists experience, let us reconsider the I.Q. estimation problem, but without assuming knowledge of the prior distribution (4.1) for $\mu$. In other words, assume only that we observe $\bar{x} \sim \mathcal{N}(\mu, \sigma^2/n)$, $\sigma/\sqrt{n} = 7.5$, and wish to estimate $\mu$. Having observed $\bar{x} = 160$, the results of Section 3 tell us to estimate $\mu$ by $\hat{\mu} = 160$, with 95% confidence interval $[\hat{\mu} - 2\sigma/\sqrt{n},\ \hat{\mu} + 2\sigma/\sqrt{n}] = [145, 175]$.
Suppose now that the frequentist receives a letter from the company which administered the I.Q. test: "On the day the score of $\bar{x} = 160$ was reported, our test-grading machine was malfunctioning. Any score $\bar{x}$ below 100 was reported as 100. The machine functioned perfectly for scores $\bar{x}$ above 100."
It may seem that the frequentist has nothing to worry about, since the score he received, $\bar{x} = 160$, was correctly reported. However, the reason he is using $\hat{\mu} = \bar{x}$ to estimate $\mu$ is that it is the best unbiased estimator. The malfunction of the grading machine implies that $\hat{\mu}$ is no longer even unbiased! If the true value of $\mu$ equals 100, the machine functioning as described in the letter produces $E\hat{\mu} = 103$, a bias of +3 points. To regain unbiasedness the frequentist must replace the estimation rule $\hat{\mu} = \bar{x}$ with $\hat{\mu}' = \bar{x} - \Delta(\bar{x})$, where the function $\Delta(\bar{x})$ is chosen to remove the bias caused by the machine malfunction.
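The +3 point bias can be reproduced in closed form. The sketch below assumes the standard formula for the mean of a normal variable floored at a constant, $E\max(x, c) = c\,\Phi(z) + \mu\,(1 - \Phi(z)) + \tau\,\phi(z)$ with $z = (c - \mu)/\tau$; this formula is a textbook fact brought in for illustration and is not stated in the article, and the helper names are hypothetical.

```python
import math

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mean_of_floored_score(mu, se, floor):
    """E[max(x, floor)] for x ~ N(mu, se^2): the expected score the faulty machine reports."""
    z = (floor - mu) / se
    return floor * std_normal_cdf(z) + mu * (1.0 - std_normal_cdf(z)) + se * std_normal_pdf(z)

se = 7.5                                    # sigma / sqrt(n) in the I.Q. example
bias_at_100 = mean_of_floored_score(100.0, se, floor=100.0) - 100.0
bias_at_160 = mean_of_floored_score(160.0, se, floor=100.0) - 160.0
print(f"bias when true mu = 100: {bias_at_100:.2f}")
print(f"bias when true mu = 160: {bias_at_160:.2e}")
```

When the true $\mu$ is 100 the bias is $7.5\,\phi(0)$, about 3 points, in agreement with the text. When the true $\mu$ is far above 100 the machine almost never intervenes and the bias is negligible.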
The correction term $\Delta(\bar{x})$ will be tiny for $\bar{x} = 160$, but it is disturbing that any change at all is necessary. The letter from the grading company contained no new information about the score actually reported, or about I.Q.'s in general. It only concerned something bad that might have happened but didn't. Why should we change our inference about the true value of $\mu$? Bayesian methods are free from this defect; the inferences they produce depend only on the data value $\bar{x}$ actually observed, since Bayesian averages such as (4.6), (4.7) are conditional on the observed $\bar{x}$.
How can a Bayesian analysis proceed in the absence of firm prior knowledge like (4.1)? Two different approaches are in use. The "subjectivist" branch of Bayesian statistics attempts to assess the statistician's subjective probability distribution for the unknown parameter $\mu$, before the data is collected, by a series of hypothetical wagers. These wagers are of the form "would you be willing to bet even money that $\mu > 85$ versus $\mu \le 85$? Would you be willing to bet two-to-one that $\mu < 150$ versus $\mu \ge 150$? ..." The work of L. J. Savage and B. de Finetti shows that a completely rational person should always be able to arrive at a unique (for himself) prior distribution on $\mu$ by sufficiently prolonged self-interrogation.
The subjectivist approach can be very fruitful in cases where the statistician (usually in collaboration with the experimenter, of course) has some vague prior opinions about the true value of $\mu$, which he is trying to update on the basis of the observed data $\bar{x}$. Because it is subjective, the method is not much used where objectivity is the prime consideration, for example in the publication of controversial new scientific results.
Another line of Bayesian thought, which might be (but usually isn't) called "objective Bayesianism," attempts, in the absence of prior knowledge, to produce a prior distribution that everyone would agree represents a completely neutral prior opinion about $\mu$. In the I.Q. problem, such a "flat" prior might take the form $\mu \sim \mathcal{N}(0, \infty)$, whereby we mean $\mu \sim \mathcal{N}(0, s^2)$ with $s^2$ going to infinity. From (4.3), (4.4) we get
(4.8) $\mu \mid \bar{x} \sim \mathcal{N}(\bar{x}, \sigma^2/n).$
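The passage from (4.3), (4.4) to (4.8) amounts to observing that $C \to 1$ and $D \to \sigma^2/n$ as $s^2 \to \infty$, so the posterior mean $m + C(\bar{x} - m)$ tends to $\bar{x}$ whatever $m$ is. A small numerical sketch of that limit, using $\sigma/\sqrt{n} = 7.5$ from the I.Q. example and an arbitrary, illustrative sequence of growing prior standard deviations:

```python
def posterior_constants(s, se):
    """C and D of (4.4), with se standing for sigma / sqrt(n)."""
    prior_precision = 1.0 / s**2
    data_precision = 1.0 / se**2            # equals n / sigma^2
    total = prior_precision + data_precision
    return data_precision / total, 1.0 / total

se = 7.5
limits = {}
for s in (15.0, 150.0, 1500.0, 1.5e6):      # increasingly flat priors
    C, D = posterior_constants(s, se)
    limits[s] = (C, D)
    print(f"s = {s:>9.1f}   C = {C:.6f}   D = {D:.4f}")

print("sigma^2 / n =", se**2)
```

As $s$ grows, $C$ approaches 1 and $D$ approaches $\sigma^2/n = 56.25$, which is the content of (4.8).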
This result has a lot of appeal. The Bayes estimator $\mu^*$ equals the frequentist estimator $\hat{\mu} = \bar{x}$. The 95% Bayes probability interval (4.7) is the same as the 95% frequentist confidence interval (3.5). Moreover, because (4.8) is a Bayesian statement, the letter from the I.Q. testing company has no effect on it. We seem to be enjoying the best of both the frequentist and Bayesian worlds.
An enormous amount of effort has been expended in codifying the objective Bayesian point of view. Bayes himself put forth this approach (apparently with considerable reservations: his paper appeared posthumously and only through the efforts of an enthusiastic friend) which was adopted unreservedly by Laplace. It fell into disrepute in the early 1900's, and has since been somewhat revived by the work of Harold Jeffreys. One difficulty is that a "flat" prior distribution for $\mu$ is not at all flat for $\mu^3$, say, so expressing ignorance seems to depend on which function of the unknown parameter one is interested in. A more pernicious difficulty is discussed in Section 8; in problems involving the estimation of several unknown parameters at once, what appears to be an eminently neutral prior distribution turns out to imply undesirable assumptions about the parameters.
5. Fisherian estimation of the mean. Ronald Fisher was one of the principal architects of frequentist theory. However, he was a lifelong critic, often vehemently so, of the standard frequentist approach. His criticisms moved along the same lines as those of the Bayesians: why should we be interested in theoretical averages concerning what happens if infinitely many $\bar{x}$ values are randomly generated from $\mathcal{N}(\mu, \sigma^2/n)$, with $\mu$ fixed? We only have one observed value of $\bar{x}$ in any one inference problem, and the inference process should concentrate on just that observed value.
Fisher was also opposed to the Bayesian approach, perhaps because the type of data analysis problems he met in his agricultural and genetical work were not well suited to the assessment of prior distributions. With characteristic ingenuity he produced another form of inference, neither Bayesian nor frequentist.
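The relation $\bar{x} \sim \mathcal{N}(\mu, \sigma^2/n)$ may be written
(5.1) $\bar{x} = \mu + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2/n).$
We obtain the observation $\bar{x}$ by adding normal noise, $\epsilon \sim \mathcal{N}(0, \sigma^2/n)$, to the unobservable mean $\mu$. Expression (5.1) can also be written as
(5.2) $\mu = \bar{x} - \epsilon.$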
It is obvious, or at least was obvious to Fisher, that in a situation where we know nothing a priori about $\mu$, observing $\bar{x}$ tells us nothing about $\epsilon$. As a matter of fact, said Fisher, if we can learn something about $\epsilon$ from $\bar{x}$ then model (5.1) by itself must be missing some important aspect of the statistical situation. We shall see this argument again, in more concrete form, in the next section. If $\epsilon \sim \mathcal{N}(0, \sigma^2/n)$ then $-\epsilon \sim \mathcal{N}(0, \sigma^2/n)$ because of the symmetry of the bell-shaped curve about its central point. Fisher's interpretation of (5.2) was
(5.3) $\mu \mid \bar{x} \sim \mathcal{N}(\bar{x}, \sigma^2/n).$
This looks just like the objectivist Bayesian statement (4.8), but has been obtained without recourse to prior distributions on $\mu$. The interval statement following from (5.3) is
(5.4) $\mathrm{Prob}\{\bar{x} - 2\sigma/\sqrt{n} \le \mu \le \bar{x} + 2\sigma/\sqrt{n} \mid \bar{x}\} = .95.$
This is a "fiducial" probability statement, in Fisher's terminology.
In the fiducial argument randomness resides neither in the data $\bar{x}$, as in frequentist calculations, nor in $\mu$, as in Bayesian calculations. Rather it lies in the mechanism which transforms the unobservable $\mu$ to the observed $\bar{x}$. (In the case at hand, this mechanism is the addition of $\epsilon \sim \mathcal{N}(0, \sigma^2/n)$ to $\mu$.) Fiducial statements such as (5.4) are obtained as averages over the random transformation mechanism.
The fiducial argument has fallen out of favor since its heyday in the 1940's. Most, though not all, contemporary statisticians consider it either a form of objective Bayesianism, or just plain wrong. Applied to the simultaneous estimation of several parameters, the fiducial argument can lead to disaster, as shown in Section 8.
Lest the reader feel sorry for Fisher, two other of his novel ideas on averaging, conditional inference and randomization, are still very much in vogue, and are the subjects of the next two sections.
6. Conditional inference. We return to the frequentist point of view, but with a twist, "conditioning," introduced by Fisher in 1934. Conditional inference illustrates another major source of ambiguity in the frequentist methodology, the choice of the collection of theoretically possible data values averaged over to obtain a frequentist inference.
Suppose again that we have independent normal variables $x_1, x_2, x_3, \ldots, x_n$, each $x_i \sim \mathcal{N}(\mu, \sigma^2)$, but that before observation begins the number $n$ is randomly selected by the flip of a fair coin,
(6.1) $n = 10$ with probability 1/2, $n = 100$ with probability 1/2.
We still wish to estimate $\mu$ on the basis of the data $x_1, x_2, x_3, \ldots, x_n$ and $n$, with $\sigma$ a known constant as before. The conditional distribution of $\bar{x}$ given the observed value of $n$ is
(6.2) $\bar{x} \mid n \sim \mathcal{N}(\mu, \sigma^2/n)$
as at (2.8). The observed average $\bar{x}$ by itself is not a sufficient statistic in this situation. We also need to know whether $n$ equals 10 or 100. Without this knowledge we still have an unbiased estimator of $\mu$, namely $\hat{\mu} = \bar{x}$, but we don't know the standard deviation of $\hat{\mu}$.
What is the expected squared error of $\hat{\mu} = \bar{x}$ in this situation? Averaging (3.3) over the two values of $n$ gives
(6.3) $E(\hat{\mu} - \mu)^2 = \frac{1}{2}\cdot\frac{\sigma^2}{10} + \frac{1}{2}\cdot\frac{\sigma^2}{100}.$
Fisher pointed out that this is a ridiculous calculation. It is obviously more appropriate to assess the accuracy of $\hat{\mu}$ conditional on the value of $n$ actually observed,
(6.4) $E\{(\hat{\mu} - \mu)^2 \mid n\} = \sigma^2/10$ if $n = 10$, and $\sigma^2/100$ if $n = 100$.
There is nothing wrong with (6.3), except that the average squared error it computes is irrelevant to any particular value of $n$ and $\bar{x}$ actually observed! If $n = 100$ then (6.3) is much too pessimistic about the accuracy of $\hat{\mu}$, while if $n = 10$ it is much too optimistic.
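To put numbers on the contrast between (6.3) and (6.4), take $\sigma^2 = 1$ (an arbitrary illustrative choice). The sketch below computes the unconditional average and the two conditional values; the function names are hypothetical.

```python
def conditional_mse(sigma2, n):
    """Expected squared error of xbar given the sample size n, as in (6.4)."""
    return sigma2 / n

def unconditional_mse(sigma2, size_probs):
    """Average of (6.4) over the random choice of n, as in (6.3)."""
    return sum(prob * conditional_mse(sigma2, n) for n, prob in size_probs.items())

sigma2 = 1.0                                # illustrative value of sigma^2
size_probs = {10: 0.5, 100: 0.5}            # the fair coin flip of (6.1)

avg = unconditional_mse(sigma2, size_probs)
print("unconditional (6.3):", avg)
for n in size_probs:
    print(f"conditional on n = {n}:", conditional_mse(sigma2, n))
```

The unconditional figure is $0.055\,\sigma^2$: more than five times the true expected squared error when $n = 100$, and little more than half of it when $n = 10$.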
This may all seem so obvious that it is hardly worth saying. Fisher's surprise was to show that exactly the same situation arises, more subtly, in other problems of statistical inference. We will illustrate this with an example involving the estimation of two different normal means, say $\mu_1$ and $\mu_2$, on the basis of independent unbiased normal estimates for each of them,
(6.5) $\bar{x}_1 \sim \mathcal{N}(\mu_1, 1), \quad \bar{x}_2 \sim \mathcal{N}(\mu_2, 1),$
$\bar{x}_1$ and $\bar{x}_2$ independent of each other. (For simplicity we have assumed that both estimates have $\sigma^2/n = 1$.) The two-dimensional data vector $(\bar{x}_1, \bar{x}_2)$ can take on any value in the plane, but with high probability lies no more than a few units away from the vector of means $(\mu_1, \mu_2)$.
Given no further information we would probably estimate $(\mu_1, \mu_2)$ by $(\bar{x}_1, \bar{x}_2)$. (But see Section 8!) However, we now add the assumption that $(\mu_1, \mu_2)$ is known to lie on the circle of radius 3 centered at the origin,
(6.6) $(\mu_1, \mu_2) = 3(\cos\theta, \sin\theta), \quad -\pi < \theta \le \pi.$
The statistical problem, as illustrated in Figure 3, is to estimate the unknown parameter $\theta$ on the basis of $(\bar{x}_1, \bar{x}_2)$.
Let us indicate the polar coordinates of $(\bar{x}_1, \bar{x}_2)$ by
(6.7) $\hat{\theta} = \arctan(\bar{x}_2/\bar{x}_1), \quad r = \sqrt{\bar{x}_1^2 + \bar{x}_2^2}.$
Then $\hat{\theta}$ is the obvious estimator of $\theta$. It is unbiased, $E\hat{\theta} = \theta$, with expected squared error
(6.8) $E(\hat{\theta} - \theta)^2 = .12$
(obtained by numerical integration; (6.8) makes the convention that $\hat{\theta} - \theta$ ranges from $-\pi$ to $\pi$ for any value of $\theta$, the largest possible estimation error occurring if $(\bar{x}_1, \bar{x}_2)$ is antipodal to $(\mu_1, \mu_2)$. This convention is unimportant because the probability of $|\hat{\theta} - \theta| > \pi/2$ is only .0014).
The unobvious fact pointed out by Fisher is that $r$ plays the same role as did "$n$" in examples (6.1)-(6.4).
(i) The distribution of $r$ does not depend on the true value of $\theta$. (For readers familiar with the bivariate normal density, this follows from the circular symmetry of the distribution (6.5) of $(\bar{x}_1, \bar{x}_2)$ about $(\mu_1, \mu_2)$.)
(ii) If $r$ is small, then $\hat{\theta}$ has less accuracy than (6.8) indicates, while if $r$ is large then $\hat{\theta}$ has greater accuracy than (6.8) indicates. Table 1 shows the conditional expected squared error $E\{(\hat{\theta} - \theta)^2 \mid r\}$ as a function of $r$.
[Figure 3: the true mean vector $(\mu_1, \mu_2) = 3(\cos\theta, \sin\theta)$ is known to lie on a circle of radius 3; the data vector $(\bar{x}_1, \bar{x}_2)$ is observed to lie on a circle of radius $r$.]
FIG. 3. The model $\bar{x}_1 \sim \mathcal{N}(\mu_1, 1)$ independent of $\bar{x}_2 \sim \mathcal{N}(\mu_2, 1)$, with $(\mu_1, \mu_2)$ known to lie on a circle of radius 3 centered at the origin. We wish to estimate the angular location $\theta$ of $(\mu_1, \mu_2)$ on the circle. The data vector $(\bar{x}_1, \bar{x}_2)$ is observed to have polar coordinates $(\hat{\theta}, r)$.
In Fisher's terminology, $r$ is an "ancillary" statistic. It doesn't directly contain information about $\theta$, because of property (i), but its value determines the accuracy of $\hat{\theta}$. It now seems obvious that we should condition our assessment of the accuracy of $\hat{\theta}$ on the observed value of $r$. If $r = 2$, as in Figure 3, then $E\{(\hat{\theta} - \theta)^2 \mid r\} = .18$ is more relevant to the accuracy of $\hat{\theta}$ than is the unconditional expectation $E(\hat{\theta} - \theta)^2 = .12$.
r                                          1.5   2     2.5   3     3.5   4     4.5   5     Unconditional value $E(\hat{\theta} - \theta)^2$
$E\{(\hat{\theta} - \theta)^2 \mid r\}$    .26   .18   .14   .12   .10   .09   .08   .07   .12
TABLE 1. The conditional expected squared error of estimation in the circle problem, $E\{(\hat{\theta} - \theta)^2 \mid r\}$, as a function of the ancillary statistic $r = \sqrt{\bar{x}_1^2 + \bar{x}_2^2}$. The accuracy of $\hat{\theta}$ improves as $r$ increases. Fisher argued that $E\{(\hat{\theta} - \theta)^2 \mid r\}$ is a more relevant measure of the accuracy of $\hat{\theta}$ than is the unconditional expectation $E(\hat{\theta} - \theta)^2$.
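The conditional values in Table 1 can be approximated by one-dimensional numerical integration. The sketch below rests on a step the article does not spell out: under model (6.5)-(6.6) the bivariate normal density, written in polar coordinates, makes the conditional density of $\hat{\theta} - \theta$ given $r$ proportional to $\exp\{3r\cos(\hat{\theta} - \theta)\}$ on $(-\pi, \pi]$. The function name and the midpoint-rule grid size are illustrative choices.

```python
import math

def conditional_mse(r, radius=3.0, steps=20000):
    """Approximate E{(theta_hat - theta)^2 | r} by the midpoint rule.

    Assumes the conditional density of d = theta_hat - theta given r is
    proportional to exp(radius * r * cos(d)) on (-pi, pi].
    """
    kappa = radius * r
    h = 2.0 * math.pi / steps
    numerator = 0.0
    normalizer = 0.0
    for i in range(steps):
        d = -math.pi + (i + 0.5) * h
        # Subtracting 1 inside the exponent only rescales the weights; it avoids overflow.
        weight = math.exp(kappa * (math.cos(d) - 1.0))
        numerator += d * d * weight
        normalizer += weight
    return numerator / normalizer

table = {r: conditional_mse(r) for r in (1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5)}
for r, value in table.items():
    print(f"r = {r:<4} conditional expected squared error ~ {value:.3f}")
```

The computed values decrease steadily as $r$ grows and should come out close to the entries of Table 1, for example roughly .18 at $r = 2$ and .12 at $r = 3$.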
Many real statistical problems have the property that some data values are obviously more informative than others. Conditioning is the intuitively correct way to proceed, but few situations are as clearly structured as the circle problem. Sometimes more than one ancillary statistic exists, and the same data value will yield different accuracy estimates depending on which ancillary is conditioned upon. More often no ancillary exists, but various approximate ancillary statistics suggest themselves.
What the circle example reveals is that frequentist statements like (6.8) may be true but irrelevant. Fisher's point was that the theoretical average of $(\hat{\theta} - \theta)^2$ should be taken not over all possible data values, but only over those containing the same amount of information for $\theta$. So far it has proved impossible to codify this statement in a satisfactory way.
A Bayesian would agree that it is correct to condition one's opinion of the accuracy of $\hat{\theta}$ on the observed value of $r$, but would ask why not go further and condition on the observed value of $(\bar{x}_1, \bar{x}_2)$ itself. This is impossible in the frequentist framework, since if we reduce our averaging set to one data point, there is nothing left to average over. Bayesian inferences are always conditional on the data point actually observed. In the circle problem the natural flat prior is a uniform distribution on $\theta \in [-\pi, \pi]$. With this prior distribution it turns out that $E\{(\hat{\theta} - \theta)^2 \mid (\bar{x}_1, \bar{x}_2)\}$ equals $E\{(\hat{\theta} - \theta)^2 \mid r = \sqrt{\bar{x}_1^2 + \bar{x}_2^2}\}$ as given in Table 1, so in this particular case the objective Bayesian and conditional frequentist points of view agree. (Notice that in the first expectation "$\theta$" is the random quantity, while in the second it is "$\hat{\theta}$" which varies.)
7. Randomization. Randomization is yet another form of inferential averaging introduced by R. A. Fisher. In order to discuss it simply we must change statistical problems, from estimation theory to "hypothesis testing." The data are now in the form of $2n$ independent normal observations $x_1, x_2, x_3, \ldots, x_n, y_1, y_2, y_3, \ldots, y_n$,
(7.1) $x_i \sim \mathcal{N}(\mu_1, \sigma^2), \quad y_i \sim \mathcal{N}(\mu_2, \sigma^2), \quad i = 1, 2, \ldots, n,$
with $\sigma$ known, $\mu_1$ and $\mu_2$ unknown. We wish to test the "null hypothesis" that $\mu_2 = \mu_1$ versus the "alternative hypothesis" that $\mu_2 > \mu_1$, often written
(7.2) $H: \mu_2 = \mu_1$ versus $A: \mu_2 > \mu_1.$
(For our purposes, $\mu_2 < \mu_1$ is assumed impossible.)
In hypothesis testing the null hypothesis H usually plays the role of a devil's advocate which the experimenter is trying to disprove. For example, the x's may represent responses to an old drug and the y's responses to a new drug that the experimenter hopes is an improvement. Because there is a vested interest in discrediting H, conservative statistical methods have been developed which demand a rather stiff level of evidence before H is declared invalid. The frequentist theory, which is dominant in hypothesis testing, accomplishes this by requiring that the probability of falsely rejecting H in favor of A, when H is true, be held below a certain small level, usually .05. A test satisfying this criterion is said to be ".05 level" for testing H versus A.
With the data as in (7.1) it seems natural to compute $\bar x = \sum x_i / n$, $\bar y = \sum y_i / n$, and reject H in favor of A if
(7.3) $\bar y - \bar x > c.$
The constant $c$ is chosen so that if H is true then $\mathrm{Prob}\{\bar y - \bar x > c\} = .05$. Standard probability calculations show that $c = 2.326\,\sigma/\sqrt{n}$ is the correct choice. The theory of optimal testing developed by J. Neyman and E. Pearson around 1930 shows that (7.3) is actually the best .05 level test of H versus A, in the sense that if A is actually true then the probability of rejecting H in favor of A is maximized.
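The constant 2.326 can be checked directly: under H the difference $\bar y - \bar x$ is normal with mean 0 and variance $2\sigma^2/n$, so $c$ is the upper 5% point of that distribution, $1.645\sqrt{2}\,\sigma/\sqrt{n}$. The following is a minimal Python sketch of that calculation and of test (7.3); the function names `critical_value` and `reject_h` are illustrative and do not come from the paper.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(sigma, n, level=0.05):
    """Cutoff c with Prob{ybar - xbar > c} = level when H is true."""
    z = NormalDist().inv_cdf(1.0 - level)  # upper `level` point of N(0, 1)
    return z * sigma * sqrt(2.0 / n)       # sd of ybar - xbar is sigma * sqrt(2/n)


def reject_h(x, y, sigma, level=0.05):
    """Test (7.3): reject H in favor of A when ybar - xbar exceeds c."""
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length n")
    return sum(y) / n - sum(x) / n > critical_value(sigma, n, level)
```

With `sigma = 1` and `n = 1`, `critical_value` returns approximately 2.326, the constant quoted in the text.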
The x's and y's we observe are actually measurements on some sort of experimental units, perhaps college freshmen or white mice or headache victims. Let us denote these units by $U_1, U_2, U_3, \ldots, U_{2n}$. The opportunity for randomization arises when we have an experiment in which we can decide beforehand which n of the units are to be x's, and which n are to be y's. If we are lazy we can just give the first n units we happen to have at hand the x treatment and the last n the y treatment. This is begging for disaster! The first n headache victims may be those with the worst headaches, the first n mice those in the cage with the heavier animals, etc. An experiment done in the lazy way may have probability of falsely rejecting the null hypothesis much greater than .05 because of such uncontrolled factors.
In his vastly influential work on experimental design, Fisher argued that the choice of experimental units be done by randomization. That is, the assignment of the n units to the x treatment group and the n units to the y treatment group be done with equal probability for each of the $(2n)!/(n!)^2$ such assignments. A random number generating device is used to carry out the randomization process.
Fisher pointed out that randomized studies were likely to be free of the type of experimental biases discussed above. Suppose for example that there is some sort of "covariate" connected with the experimental units, by which we mean a quantity which is thought to affect the observation on that unit no matter which treatment is given. For example, weight might be an important covariate for the white mice. Heavy mice might respond less well to the stimulus than light mice. If n is reasonably large, say 10, it is very unlikely that the randomized experiment will have all the heavy mice in the x group and the light mice in the y group. This statement applies equally to every covariate, whether or not we know it affects the response, and even if we are unaware of its existence.
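To put a number on "very unlikely," the sketch below counts the equally likely assignments and draws one at random. It assumes, for illustration only, that exactly n of the 2n mice count as heavy and the other n as light; the function names are hypothetical.

```python
import random
from math import comb


def number_of_assignments(n):
    """Number of ways to choose which n of the 2n units get treatment x."""
    return comb(2 * n, n)  # equals (2n)! / (n!)^2


def random_assignment(n, rng):
    """Draw one assignment uniformly at random; returns (x_units, y_units)."""
    x_units = sorted(rng.sample(range(2 * n), n))  # every n-subset equally likely
    chosen = set(x_units)
    y_units = [i for i in range(2 * n) if i not in chosen]
    return x_units, y_units


def prob_all_heavy_in_x(n):
    """Chance that the x group is exactly the n heavy units (assumed n heavy, n light)."""
    return 1.0 / number_of_assignments(n)
```

For n = 10 there are 184,756 possible assignments, so under this assumption the chance that randomization puts every heavy mouse in the x group is about 5 in a million.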
None of this has anything to do with averaging. The connection comes through Fisher's next suggestion: that we compute theoretical averages not over the hypothesized normal distributions, but instead over the randomization process itself. Suppose that if all 2n experimental units had received treatment x, the observations would have been $X_1, X_2, \ldots, X_{2n}$, $X_i$ being the observation on unit $U_i$. The capital letters indicate that these are hypothetical observations and not necessarily the observed data. Under the null hypothesis H, treatment y is the same as treatment x, so we can indeed consider all 2n units to have received treatment x. In this case the observed data $x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n$ coincide with the theoretical values $X_1, X_2, \ldots, X_{2n}$. Let $\mathcal{S}(x)$ be the indices of those units actually assigned to the x treatment and $\mathcal{S}(y)$ those assigned to the y treatment. Then, if H is true,
(7.4) $\bar x = \sum_{i \in \mathcal{S}(x)} X_i / n, \quad \bar y = \sum_{i \in \mathcal{S}(y)} X_i / n.$
If the study has been randomized then $\bar x$ is merely the average of n randomly selected X's and $\bar y$ the average of the remaining n X's.
The randomization (or "permutation") test of H analogous to (7.3) is constructed as follows:
(i) Given the observed data $x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n$, define $u_1 = x_1, u_2 = x_2, \ldots, u_n = x_n, u_{n+1} = y_1, \ldots, u_{2n} = y_n$. (Notice that, if H is true, the u's coincide with the X's of the previous paragraph.)
(ii) For each partition $P = \{\mathcal{S}_1, \mathcal{S}_2\}$ of $\{1, 2, \ldots, 2n\}$ into two disjoint subsets of size n, calculate
(7.5) $(\bar y - \bar x)_P = \sum_{i \in \mathcal{S}_2} u_i / n - \sum_{i \in \mathcal{S}_1} u_i / n.$
(iii) List all $(2n)!/(n!)^2$ values of $(\bar y - \bar x)_P$ in ascending order.
(iv) Reject H in favor of A if the value of $\bar y - \bar x$ actually observed is in the upper 5% of the list.
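Steps (i) through (iv) can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the paper's own procedure: it enumerates every assignment, so it is only practical for small n, and it reads "in the upper 5% of the list" as "the fraction of listed values at least as large as the observed one is at most .05," which is one possible convention for ties. The name `permutation_test` is hypothetical.

```python
from itertools import combinations


def permutation_test(x, y, level=0.05):
    """Exact randomization test of H; returns (observed, upper_fraction, reject)."""
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length n")
    u = list(x) + list(y)                      # step (i): pool the data
    total = sum(u)
    observed = sum(y) / n - sum(x) / n
    diffs = []
    for s1 in combinations(range(2 * n), n):   # step (ii): S1 gets "x", the rest "y"
        sum_s1 = sum(u[i] for i in s1)
        diffs.append((total - sum_s1) / n - sum_s1 / n)
    diffs.sort()                               # step (iii): ascending list
    # step (iv): fraction of the list at least as large as the observed value
    # (small tolerance guards against floating-point ties)
    upper = sum(d >= observed - 1e-12 for d in diffs) / len(diffs)
    return observed, upper, upper <= level
```

For example, with `x = [1, 2, 3, 4]` and `y = [5, 6, 7, 8]` the observed difference is the largest of the 70 listed values, so the upper fraction is 1/70 and H is rejected at the .05 level.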
The randomization test has a .05 chance of falsely rejecting H, where the probability .05 now refers to an average taken over all $(2n)!/(n!)^2$ random assignments of treatment types to experimental units. The test is still of the form "reject H in favor of A if $\bar y - \bar x > c$," except that $c$ no longer equals the constant $2.326\,\sigma/\sqrt{n}$. Instead $c$ is a function of the set of values $\{u_1, u_2, \ldots, u_{2n}\}$ constructed in (i). For each set $\{u_1, u_2, \ldots, u_{2n}\}$, $c$ is selected to satisfy (iv).
The randomization test has one big advantage over test (7.3). Its .05 probability of falsely rejecting H remains valid under any null hypothesis that says the 2n x's and y's are generated by the same probability distribution, normal or otherwise. As a matter of fact, no randomness at all in the observations need be assumed. We can just take the null hypothesis to be that each unit $U_i$ has a fixed response $X_i$ connected with it, no matter whether it is given the x or y treatment. This last statement reemphasizes that the randomization test must involve a non-frequentist form of averaging.
Randomization, or at least inference based on randomization, appears heretical to a Bayesian statistician. The true Bayesian must condition on the assignment $\{\mathcal{S}(x), \mathcal{S}(y)\}$ of units to treatments actually used, since this is part of the available data, and not average over all possible partitions that might have been. (Fisher's arguments on ancillarity seem to point in exactly the same direction, which is to say directly opposite to randomization!)
One aspect of randomization makes both frequentists and Bayesians uneasy. Suppose, just by bad luck, that the randomization process does happen to assign all heavy mice to the x treatment and all light mice to the y treatment. Can we still use the .05 level randomization test to reject H in favor of A? The answer seems clearly not, but it is difficult to codify a way of avoiding such traps. To put things the other way, suppose we know the weights $w_1, w_2, w_3, \ldots, w_{2n}$ of the mice before we begin the experiment. Under reasonable frequentist assumptions there will be a unique best way $\{\mathcal{S}(x), \mathcal{S}(y)\}$ of assigning the mice to the treatments for the purpose of testing treatment x versus treatment y, one that optimally equalizes the weight assignments to the two groups. Statisticians trained in the Fisherian tradition find it difficult to accept such "optimal experimental designs" because the element of randomization has been eliminated.
8. Stein's Phenomenon. The reader may have noticed that the controversies so far have been more academic than practical. All philosophical factions agree that in the absence of prior knowledge $[\bar x - 2\sigma/\sqrt{n},\ \bar x + 2\sigma/\sqrt{n}]$ is a 95% interval for $\mu$, the disagreement being over what "95%" means. This situation changes, for the worse, when we consider the simultaneous estimation of many parameters.
Suppose then that we have several normal means $\mu_1, \mu_2, \ldots, \mu_k$ to estimate, for each one of which we observe an independent, unbiased normal estimate
(8.1) $\bar x_i \sim \mathcal{N}(\mu_i, 1)$ independently, $i = 1, 2, \ldots, k.$
(Once again we have taken the variance $\sigma^2/n$ equal to 1 for the sake of convenience.) The natural analogue of squared error loss when there are several parameters to estimate is Euclidean squared distance. To simplify notation, let $\bar{\mathbf{x}} = (\bar x_1, \bar x_2, \ldots, \bar x_k)$ be the vector of observed averages, $\boldsymbol{\mu} = (\mu_1, \mu_2, \ldots, \mu_k)$ the vector of true means, and $\hat{\boldsymbol{\mu}} = (\hat\mu_1, \hat\mu_2, \ldots, \hat\mu_k)$ the vector of estimates. Then the squared error misestimation penalty is
(8.2) $\|\hat{\boldsymbol{\mu}} - \boldsymbol{\mu}\|^2 = \sum_{i=1}^{k} (\hat\mu_i - \mu_i)^2.$
Before pursuing the problem of estimating $\boldsymbol{\mu}$ on the basis of $\bar{\mathbf{x}}$, we note an elementary but important fact. This fact, which can be proved in one line by readers familiar with the multivariate normal distribution, is that for every parameter vector $\boldsymbol{\mu}$ we have
(8.3) $\mathrm{Prob}\{\|\bar{\mathbf{x}}\| > \|\boldsymbol{\mu}\|\} > .50.$
That is, the data vector $\bar{\mathbf{x}}$ tends to be farther away from the origin than does the parameter vector $\boldsymbol{\mu}$, no matter what $\boldsymbol{\mu}$ is. Table 2 shows that for k = 10 the probability is actually quite a bit greater than .50 for moderate values of $\|\boldsymbol{\mu}\|$.
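For readers who want the step spelled out, here is one way to see (8.3). It is a sketch of a standard argument, not necessarily the one-line proof the author had in mind; the symbol $\mathbf{z}$ for the noise vector is introduced here for illustration.

```latex
\begin{align*}
\bar{\mathbf{x}} &= \boldsymbol{\mu} + \mathbf{z},
  \qquad \mathbf{z} \sim \mathcal{N}_k(\mathbf{0}, I), \\
\|\bar{\mathbf{x}}\|^2 - \|\boldsymbol{\mu}\|^2
  &= 2\,\boldsymbol{\mu}\cdot\mathbf{z} + \|\mathbf{z}\|^2 .
\end{align*}
```

The inner product $\boldsymbol{\mu}\cdot\mathbf{z}$ is normal with mean 0, so it is nonnegative with probability at least one half, and on that event the right-hand side is at least $\|\mathbf{z}\|^2 > 0$. When $\boldsymbol{\mu}\cdot\mathbf{z} < 0$ there is still a positive chance that $\|\mathbf{z}\|^2$ outweighs $2\,|\boldsymbol{\mu}\cdot\mathbf{z}|$, which makes the inequality in (8.3) strict.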
Suppose that k = 10, and we observe a data vector $\bar{\mathbf{x}}$ with squared length $\|\bar{\mathbf{x}}\|^2 = 12$. Assume also that we have no prior knowledge about $\boldsymbol{\mu}$. Looking at Table 2, it seems to be a very good bet that $\|\boldsymbol{\mu}\|^2 < 12$. For $\|\boldsymbol{\mu}\|^2$ in the range [0, 40], which is almost certainly the case if $\|\bar{\mathbf{x}}\|^2 = 12$, more than 75% of the time we have $\|\bar{\mathbf{x}}\| > \|\boldsymbol{\mu}\|$. However, this is a frequentist "75%," calculated with $\boldsymbol{\mu}$ fixed and $\bar{\mathbf{x}}$ varying randomly according to (8.1). The analogue of the objective Bayesian argument presented in Section 4 gives quite different results.
$\|\boldsymbol{\mu}\|^2$                                              0     6     12    18    24    30    40    60
$\mathrm{Prob}\{\|\bar{\mathbf{x}}\| > \|\boldsymbol{\mu}\|\}$        1.00  .967  .904  .857  .822  .795  .762  .719
TABLE 2. The probability that $\|\bar{\mathbf{x}}\| > \|\boldsymbol{\mu}\|$ is always greater than .5. For the case k = 10 the probabilities are much greater than .5 for moderate values of $\|\boldsymbol{\mu}\|$.
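The entries of Table 2 can be checked approximately by simulation. The sketch below is a Monte Carlo check, not the exact calculation behind the table; the function name `prob_x_longer`, the seed, and the number of draws are arbitrary choices made for illustration. It uses the fact that only the length of $\boldsymbol{\mu}$ matters, so all of $\boldsymbol{\mu}$ is placed on the first coordinate.

```python
import random


def prob_x_longer(mu_sq, k=10, n_sim=20000, seed=0):
    """Estimate Prob{||xbar|| > ||mu||} under (8.1) for a given ||mu||^2."""
    rng = random.Random(seed)
    mu_first = mu_sq ** 0.5  # by rotational symmetry, put mu on coordinate 1
    hits = 0
    for _ in range(n_sim):
        x_sq = (mu_first + rng.gauss(0.0, 1.0)) ** 2
        x_sq += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k - 1))
        if x_sq > mu_sq:
            hits += 1
    return hits / n_sim
```

Calling `prob_x_longer(12)` should give a value close to the tabulated .904, up to simulation error of a few thousandths.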
Given our complete prior ignorance about the parameter vector $\boldsymbol{\mu}$, it seems natural to use a flat prior of the form $\mu_i \sim \mathcal{N}(0, \infty)$ (that is, $\mu_i \sim \mathcal{N}(0, s^2)$ with $s^2 \to \infty$) independently for $i = 1, 2, \ldots, k$. This leads to the posterior distribution (4.8) for each parameter $\mu_i$,
(8.4) $\mu_i \mid \bar x_i \sim \mathcal{N}(\bar x_i, 1)$
independently for $i = 1, 2, \ldots, k$. This of course is a Bayesian statement, with the $\bar x_i$'s fixed at their observed values and the $\mu_i$'s varying randomly according to (8.4). Reversing the names of the fixed and random quantities in Table 2 gives
(8.5) $\mathrm{Prob}\{\|\boldsymbol{\mu}\| > \|\bar{\mathbf{x}}\| \mid \|\bar{\mathbf{x}}\|^2 = 12\} = .904.$
It now seems to be a very good bet that $\|\boldsymbol{\mu}\| > \|\bar{\mathbf{x}}\|$. As a matter of fact,
(8.6) $\mathrm{Prob}\{\|\boldsymbol{\mu}\| > \|\bar{\mathbf{x}}\| \mid \bar{\mathbf{x}}\} > .50$
for every observed data vector $\bar{\mathbf{x}}$! Fisher's fiducial argument of Section 5 also leads to (8.4)-(8.6).
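Equations (8.3) and (8.6) show a clear contradiction between the frequentist and Bayesian points of view. Which is correct? There is a most surprising and persuasive argument in favor of the frequentist calculation (8.3). This was provided by Charles Stein in the mid 1950's and concerns the estimation of $\boldsymbol{\mu}$ on the basis of the data vector $\bar{\mathbf{x}}$ (or equivalently the estimation of the parameters $\mu_1, \mu_2, \ldots, \mu_k$ on the basis of $\bar x_1, \bar x_2, \ldots, \bar x_k$). The obvious estimator is
(8.7) $\hat{\boldsymbol{\mu}}(\bar{\mathbf{x}}) = \bar{\mathbf{x}},$
which estimates each $\mu_i$ by $\bar x_i$, as at (3.1). This estimate has expected squared error loss
(8.8) $E\|\hat{\boldsymbol{\mu}} - \boldsymbol{\mu}\|^2 = k$
for every parameter vector $\boldsymbol{\mu}$. What Stein showed is that if k, the number of means to be estimated, is at least 3, then the estimator
(8.9) $\tilde{\boldsymbol{\mu}}(\bar{\mathbf{x}}) = \left[1 - \frac{k-2}{\|\bar{\mathbf{x}}\|^2}\right] \bar{\mathbf{x}}$
has
(8.10) $E\|\tilde{\boldsymbol{\mu}} - \boldsymbol{\mu}\|^2 < k$
for every $\boldsymbol{\mu}$! (This particular form of $\tilde{\boldsymbol{\mu}}$ was developed jointly with W. James in 1960.) From a frequentist point of view, $\tilde{\boldsymbol{\mu}}$ estimates $\boldsymbol{\mu}$ uniformly better than does $\hat{\boldsymbol{\mu}}$. It is also better from a Bayesian point of view: given any prior distribution on $\boldsymbol{\mu}$, estimating by $\tilde{\boldsymbol{\mu}}$ rather than $\hat{\boldsymbol{\mu}}$ results in a lower overall expected squared error of estimation (averaging now over the randomness in $\boldsymbol{\mu}$ and the randomness in $\bar{\mathbf{x}}$).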
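Stein's estimator is based on (8.3). Since $\|\hat{\boldsymbol{\mu}}\| = \|\bar{\mathbf{x}}\|$ tends to be greater than $\|\boldsymbol{\mu}\|$ with high probability, a shrinking factor $[1 - (k-2)/\|\bar{\mathbf{x}}\|^2]$ is used to give an estimate nearer $\boldsymbol{\mu}$. The shrinking factor is more drastic when $\|\bar{\mathbf{x}}\|^2$ is small. With k = 10, $\|\bar{\mathbf{x}}\|^2 = 12$, we have $\tilde{\boldsymbol{\mu}} = [.333]\,\bar{\mathbf{x}}$. If instead $\|\bar{\mathbf{x}}\|^2 = 800$ then $\tilde{\boldsymbol{\mu}} = [.99]\,\bar{\mathbf{x}}$. Figure 4 gives a schematic illustration.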
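The shrinking factors quoted above, and the risk comparison between (8.8) and (8.10), can be reproduced with a short Python sketch. The simulation is illustrative only: the function names, seed, and number of draws are assumptions of this example, and a Monte Carlo average approximates the expected loss rather than proving the inequality.

```python
import random


def shrink_factor(k, x_sq):
    """Stein's shrinking factor 1 - (k - 2)/||xbar||^2 from (8.9)."""
    return 1.0 - (k - 2) / x_sq


def stein_estimate(x):
    """Shrink the observed vector toward the origin as in (8.9)."""
    factor = shrink_factor(len(x), sum(v * v for v in x))
    return [factor * v for v in x]


def simulated_risks(mu, n_sim=2000, seed=0):
    """Average squared-error loss of the obvious and Stein estimates at mu."""
    rng = random.Random(seed)
    loss_obvious = 0.0
    loss_stein = 0.0
    for _ in range(n_sim):
        x = [m + rng.gauss(0.0, 1.0) for m in mu]  # draw xbar as in (8.1)
        est = stein_estimate(x)
        loss_obvious += sum((xi - m) ** 2 for xi, m in zip(x, mu))
        loss_stein += sum((e - m) ** 2 for e, m in zip(est, mu))
    return loss_obvious / n_sim, loss_stein / n_sim
```

Here `shrink_factor(10, 12)` is 1 - 8/12, about .333, and `shrink_factor(10, 800)` is .99, matching the text. For a parameter vector such as `[1.0] * 10`, `simulated_risks` should return an average loss near k = 10 for the obvious estimate and a visibly smaller one for Stein's.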

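Notice that the origin 0 plays a special role in the construction of $\tilde{\boldsymbol{\mu}}$, even though there is nothing in the statement of the estimation problem that favors 0. As a matter of fact, we can change the origin to any other point in k dimensional space, O' say, and obtain a different Stein estimate,
(8.11) $\tilde{\boldsymbol{\mu}}' = O' + \left[1 - \frac{k-2}{\|\bar{\mathbf{x}} - O'\|^2}\right] (\bar{\mathbf{x}} - O'),$
which is also uniformly better than $\hat{\boldsymbol{\mu}}$.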
Stein's result has created a host of difficulties for frequentists and Bayesians alike, which we can't pursue here. The implications for objective Bayesians and fiducialists have been especially disturbing. The seemingly flat prior distribution leading to (8.4) isn't flat at all: it forces the parameter vector to be relatively far away from any prechosen origin O'. If a satisfactory theory of objective Bayesian inference exists, Stein's estimator shows that it must be a great deal more subtle than previously expected.
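[Figure 4: schematic showing the data vector $\bar{\mathbf{x}}$, Stein's estimate $\tilde{\boldsymbol{\mu}} = [1 - (k-2)/\|\bar{\mathbf{x}}\|^2]\,\bar{\mathbf{x}}$ relative to the origin 0, and Stein's estimator relative to O', some other origin.]
FIG. 4. Stein's estimate $\tilde{\boldsymbol{\mu}}$ is obtained by shrinking the obvious estimate $\hat{\boldsymbol{\mu}} = \bar{\mathbf{x}}$ toward the origin 0. The shrinking factor is more extreme the closer $\bar{\mathbf{x}}$ lies to 0. Stein and James showed that $E\|\tilde{\boldsymbol{\mu}} - \boldsymbol{\mu}\|^2 < E\|\hat{\boldsymbol{\mu}} - \boldsymbol{\mu}\|^2$ for every $\boldsymbol{\mu}$. We can choose any other origin O' and obtain a different Stein estimate, $\tilde{\boldsymbol{\mu}}'$, which also dominates $\hat{\boldsymbol{\mu}}$.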
The trouble with the multiparameter estimation problem is not that it is harder than estimating a single parameter. It is easier, in the sense that dealing with many problems simultaneously can give extra information not otherwise available. The trouble lies in finding and using the extra information. Consider the Bayesian model (4.1). With just a single $\mu$ to estimate this model must be taken on pure faith (or relevant experience). However, if we have several means to estimate, $\mu_1, \mu_2, \ldots, \mu_k$, each drawn independently from an $\mathcal{N}(m, s^2)$ population, the data $\bar x_1, \bar x_2, \ldots, \bar x_k$ allows us to estimate $m$ and $s^2$, instead of postulating their values. Plugging the estimated values into (4.6) gives an "empirical Bayes rule" very much like the Stein rule (8.11). Empirical Bayes theory, originally developed by Herbert Robbins in the early 1950's, offers some hope of a partial reconciliation between frequentists and Bayesians.
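The plug-in idea can be sketched as follows. This is a rough illustration under explicit assumptions rather than the rule the paper has in mind: it assumes that, with the variance scaled to 1 as in (8.1), the Bayes estimate (4.6) takes the usual normal form $m + [s^2/(s^2+1)](\bar x_i - m)$; it estimates $m$ by the average of the $\bar x_i$; and it estimates $1/(s^2+1)$ by $(k-3)/\sum(\bar x_i - \text{average})^2$, one common choice. The name `empirical_bayes_estimate` and the truncation of the weight at zero are also choices made for this example.

```python
def empirical_bayes_estimate(x):
    """Shrink each x_i toward the grand mean by a data-estimated weight."""
    k = len(x)
    if k < 4:
        raise ValueError("this sketch needs at least 4 means")
    m_hat = sum(x) / k                            # estimate of the prior mean m
    spread = sum((v - m_hat) ** 2 for v in x)     # marginal variance is s^2 + 1
    if spread == 0.0:
        return [m_hat] * k                        # all values equal: nothing to shrink
    # (k - 3)/spread estimates 1/(s^2 + 1); the weight estimates s^2/(s^2 + 1)
    weight = max(0.0, 1.0 - (k - 3) / spread)
    return [m_hat + weight * (v - m_hat) for v in x]
```

The result has the same shape as (8.11), with the grand mean playing the role of the origin O' and a data-determined shrinking factor.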
9. Some last comments. The field of statistics continues to flourish despite, and partly because of, its foundational controversies. Literally millions of statistical analyses have been performed in the past 50 years, certainly enough to make it abundantly clear that common statistical methods give trustworthy answers when used carefully. In my own consulting work I am constantly reminded of the power of the standard methods to dissect and explain formidable data sets from diverse scientific disciplines. In a way this is the most important belief of all, cutting across the frequentist-Bayesian divisions: that there do exist more or less universal techniques for extracting information from noisy data, adaptable to almost every field of inquiry. In other words, statisticians believe that statistics exists as a discipline in its own right, even if they can't agree on its exact nature.
What does the future hold? At a recent conference Dennis Lindley, of University College, London, gave a talk entitled, "The future of statistics: A Bayesian 21st century." My personal subjective probability is .15 on that eventuality. The big advantage of subjective Bayesianism, which is what Professor Lindley was referring to, is its logical consistency. Philosophers who investigate the foundations of scientific inference usually wind up being repelled by frequentism and attracted to the Bayesian argument.
But consistency isn't enough. Subjective Bayesianism must face the challenge of scientific objectivity. This is the ultimate stronghold of the frequentist viewpoint. If the 21st century is Bayesian, my guess is that it will be some combination of subjective, objective, and empirical Bayesian, not significantly less complicated and contradictory than the present situation. The complexity of the problems statisticians are asked to deal with is increasing at an alarming rate. It is not unusual these days to deal with data sets of a million numbers, and models with several thousand parameters. As Section 8 suggests, this trend is likely to exacerbate the difficulties of producing a logically consistent theory of statistics.
Annotated References
V. Barnett, Comparative Statistical Inference, Wiley, New York, 1973. [A clear discussion of the frequentist viewpoint as compared with Bayesian methods.]
A. Birnbaum, On the foundations of statistical inference (with discussion). J. Amer. Statist. Assoc., 57 (1962) 269-326. [A much deeper discussion of foundational controversies. The discussion is excellent in its own right. I stole Pratt's meterman example for Section 4.]
B. DeFinetti, Foresight: Its logical laws, its subjective sources. Studies in Subjective Probability, ed. by H. Kyburg and H. Smokler, 93-158, Wiley, New York, 1964. [The most extreme, and with Savage the most influential, subjectivist of our time wrote this seminal work in 1935. This volume also contains essays by Venn, Borel, Ramsey, Koopman, and Savage.]
B. Efron, Biased versus unbiased estimation. Advances in Math., No. 3, 16 (1975) 259-277. [Stein's estimator in theory and practice.]
R. A. Fisher, Statistical Methods and Scientific Inference. Oliver and Boyd, London, 1956. [Fisher's last major work. Fiducial and conditional arguments are persuasively advanced. Must be read with caution!]
H. Jeffreys, Theory of Probability, 3rd Edition. Clarendon Press, Oxford, 1967. [The most important modern work on objective Bayesianism.]
D. V. Lindley, Bayesian Statistics, A Review. SIAM Monographs in Applied Mathematics, SIAM, Philadelphia, (1971). [A good reference for the Bayesian point of view, both subjective and objective.]
D. V. Lindley, The future of statistics: a Bayesian 21st century. Proceedings of the Conference on Directions for Mathematical Statistics, (1974). Special supplement to Advances in Applied Probability, September 1975. [The essays by P. J. Huber and H. Robbins also relate to the future of statistics.]
L. J. Savage, The Foundations of Statistics. Wiley, New York, 1954. [This book sparked the revival of interest in the subjectivist Bayesian point of view.]
DEPARTMENT OF STATISTICS, STANFORD UNIVERSITY, STANFORD, CA 94305.
WHEN IS A FUNCTION THAT SATISFIES THE CAUCHY-RIEMANN EQUATIONS ANALYTIC?
J. D. GRAY AND S. A. MORRIS
1. The Looman-Menchoff theorem: an extension of Goursat's theorem. It is well known¹ that a complex-valued function $f = u + iv$, defined and analytic on a domain D in the complex plane satisfies the Cauchy-Riemann equations
$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}$ and $\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}$
throughout D. The standard textbooks, such as those authored by Ahlfors, Cartan, Churchill, Jameson, Knopp, Sansone and Gerretsen, avoid answering the question as to whether or not the converse holds. Most instead offer the following partial converse due to Goursat [13].
THEOREM 1. If $f = u + iv$, defined on a domain D, is such that
(i) $\partial u/\partial x$, $\partial u/\partial y$, $\partial v/\partial x$, $\partial v/\partial y$ exist everywhere in D,
(ii) u, v satisfy the Cauchy-Riemann equations everywhere in D, and if further
(iii) f is continuous in D,
(iv) $\partial u/\partial x$, $\partial u/\partial y$, $\partial v/\partial x$, $\partial v/\partial y$ are continuous in D,
then f is analytic in D.
This is a substantially revised version of an article by the present authors and S. A. R. Disney that appeared in the Gazette of the Australian Mathematical Society 2 (3) (1975), 67-81. S. A. R. Disney's name does not appear above only because he preferred it that way.
¹ This is a rare instance of a well-known result that is indeed well known.
