You are on page 1of 22

StatisticalScience

1986,Vol. 1, No. 1, 54-77

BootstrapMethodsforStandardErrors,
ConfidenceIntervals,and OtherMeasures of
StatisticalAccuracy
B. Efronand R. Tibshirani

Abstract.This is a reviewof bootstrapmethods,concentrating on basic


ideasand applications ratherthantheoreticalconsiderations.
It beginswith
an expositionof the bootstrapestimateof standarderrorforone-sample
situations.Severalexamples,someinvolving quitecomplicated statistical
procedures, are given.The bootstrapis thenextendedto othermeasuresof
statisticalaccuracysuchas bias and prediction error,and to complicated
data structures suchas timeseries,censoreddata,and regression models.
Severalmoreexamplesarepresented theseideas.The lastthird
illustrating
ofthepaperdeals mainlywithbootstrapconfidence intervals.
Key words: Bootstrapmethod,estimatedstandarderrors,approximate
confidence
intervals, methods.
nonparametric

1. INTRODUCTION particularlyEfron(1982a). Some of the discussion


hereis abridgedfromEfronand Gong(1983)and also
A typicalproblemin appliedstatisticsinvolvesthe
fromEfron(1984).
estimation ofan unknown parameter 0. The twomain
Beforebeginning the mainexposition, we willde-
questionsasked are (1) whatestimator0 shouldbe
scribehowthebootstrap worksin termsofa problem
used? (2) Havingchosento use a particular0, how
whereit is not needed,assessingthe accuracyof the
accurateis it as an estimatorof0? The bootstrapis a
sample mean. Suppose that our data consistsof a
generalmethodology foranswering the secondques-
randomsamplefroman unknownprobability distri-
tion.It is a computer-based method, whichsubstitutes
butionF on therealline,
considerable amountsofcomputation in place ofthe-
oreticalanalysis.As we shall see, the bootstrapcan (1.1) Xl X2, * , X.- F.
routinely answerquestionswhichare fartoo compli-
HavingobservedX1 = x1, X2 = x2, ... , Xn = xn, we
catedfortraditional statisticalanalysis.Even forrel-
computethe samplemeanx = 1 xn/n, and wonder
ativelysimpleproblemscomputer-intensive methods
how accurateit is as an estimateof the truemean
likethebootstrapare an increasingly gooddata ana-
6 = EFIX}.
lyticbargainin an eraofexponentially decliningcom-
IfthesecondcentralmomentofF is 182(F) EFX2
putationalcosts. - (EFX)2, then the standard errora(F; n, x), that is
This paper describesthe basis of the bootstrap
thestandarddeviationofi fora sampleofsize n from
theory, whichis verysimple,and givesseveralexam-
F, is
distribution
ples of its use. Relatedideas like the jackknife,the
deltamethod, andFisher'sinformation boundarealso (1.2) o(F) = [,M2(F)/n]112.
discussed.Mostoftheproofsandtechnicaldetailsare The shortened notationo(F) -(F; n, i) is allow-
omitted.These can be foundin the references given, ablebecausethesamplesizen and statisticofinterest
x are known,only being
F unknown. The standard
ofStatistics
B. Efronis Professor and
andBiostatistics, measureofi's accuracy.Un-
erroris the traditional
ChairmanoftheProgramin Mathematical and Com- fortunately,we cannotactuallyuse (1.2) to assessthe
putationalScienceat StanfordUniversity.His mailing accuracyofi, sincewe do notknowM2(F), butwe can
addressisDepartment SequoiaHall,Stan-
ofStatistics, use theestimated standarderror
fordUniversity,Stanford, CA 94305.R. Tibshiraniis
(1.3) = [#2/n]l/2
Fellowin theDepartment
a Postdoctoral ofPreventive
Medicineand Biostatistics,FacultyofMedicine,Uni- wherejX2= Ei (Xi-x)2/(n - 1), theunbiased estimate
versityof Toronto,McMurrickBuilding,Toronto, of A2(F).
Ontario,M5S 1A8,Canada. Thereis a moreobviouswayto estimateo(F). Let

54
BOOTSTRAP METHODS FOR MEASURES OF STATISTICAL ACCURACY 55

F indicatetheempiricalprobability
distribution, We willsee thatbootstrapconfidence intervalscan
automatically incorporate
trickslikethis,withoutre-
(1.4) F: probability
mass 1/non x1,x2,... , xn. quiringthedataanalysttoproducespecialtechniques,
Thenwecan simplyreplaceF byF in (1.2),obtaining likethetanh-1transformation,foreachnewsituation.
Animportant themeofwhatfollows is thesubstitution
(1.5) a - F) = [U2(P)/nl . ofrawcomputing powerfortheoretical analysis.This
as the estimatedstandarderrorfori. This is the is not an argumentagainsttheory,of course,only
bootstrapestimate.The reasonforthe name "boot- againstunnecessary theory.Mostcommonstatistical
strap"willbe apparentin Section2,whenweevaluate methodsweredevelopedinthe1920sand 1930s,when
v(F) forstatisticsmorecomplicatedthanx. Since computation was slowand expensive.Now thatcom-
putationis fastand cheapwecan hopeforand expect
changesin statisticalmethodology. This paper dis-
(1.6) 82(F) = X (
/2 a
cussesone suchpotentialchange,Efron(1979b)dis-
cussesseveralothers.
a is too
is notquitethesame as a-,butthedifference
in
smallto be important mostapplications.
Of course we do not reallyneed an alternative 2. THE BOOTSTRAPESTIMATEOF STANDARD
formulato (1.3) in this case. The troublebeginswhen ERROR
wewanta standarderrorforestimators morecompli- This sectionpresentsa morecarefuldescription of
catedthanx, forexample,a medianor a correlation thebootstrapestimateofstandarderror.For nowwe
ora slopecoefficientfroma robustregression. In most willassumethatthe observeddata y = (xl, x2, **
cases thereis no equivalentto formula(1.2), which xn)consistsofindependent andidenticallydistributed
expressesthestandarderrora(F) as a simplefunction (iid) observations X1,X2- .-, Xn fiidF, as in (1.1).
ofthe samplingdistribution F. As a result,formulas Here F represents an unknownprobability distribu-
like(1.3) do notexistformoststatistics. tion on r, the commonsample space of the observa-
This is wherethecomputercomesin. It turnsout tions. We have a statisticof interest,say 0(y), to
thatwecan alwaysnumerically evaluatethebootstrap whichwe wishto assignan estimatedstandarderror.
estimatea = a(F), withoutknowinga simpleexpres- Fig. 1 showsan example.The samplespace r is
sionfora(F). The evaluationofa is a straightforward n2+, the positivequadrantof the plane. We have
MonteCarloexercisedescribedin thenextsection.In observedn = 15 bivariatedata points,each corre-
a good computing environment, as describedin the spondingto an Americanlaw school.Each pointxi
remarksin Section2, the bootstrapeffectively gives consistsoftwosummary statisticsforthe1973enter-
the statisticiana simple formula like (1.3) for any ingclass at law schooli
no matterhowcomplicated.
statistic,
Standarderrorsare crudebut usefulmeasuresof (2.1) xi= (LSATi, GPAi);
statisticalaccuracy.They are frequentlyused to give
approximateconfidenceintervalsfor an unknown LSATi is the class' averagescore on a nationwide
parameter 0 examcalled "LSAT"; GPAi is the class' averageun-
dergraduate grades.The observedPearsoncorrelation
(1.7) 0 E 0 ? Sz(a),
wherez(a)is the100 * a percentilepointofa standard
3.5-
normalvariate,e.g., Z(95) = 1.645. Interval(1.7) is
sometimes good,and sometimes notso good.Sections GPA *1
7 and 8 discussa moresophisticated use oftheboot- 3.3t *2
strap,whichgivesbetterapproximate confidence in-
tervalsthan(1.7). 3.1 -10
*6
The standardinterval(1.7) is based on takinglit- GPA - 97 *4
erallythe largesamplenormalapproximation (f - 2.9 - *@14
15
0)/S N(0, 1). Appliedstatisticians use a varietyof 03
tricksto improvethisapproximation. For instanceif @13 *12
2.7 '- lI l 1I
0 is the correlation and 0 the samplecor-
coefficient 540 560 580 600 620 640 660 680
relation,thenthetransformation 4 = tanh-1(0), = LSAT
tanh-1(0)greatly improves thenormalapproximation,
FIG. 1. The law schooldata (Efron,1979b). The data points,begin-
at leastin thosecases wheretheunderlying sampling
ning with School 1, are (576, 3.39), (635, 3.30), (558, 2.81),
distribution is bivariatenormal.The correcttactic (578, 3.03), (666, 3.44), (580, 3.07), (555, 3.00), (661, 3.43),
thenis to transform, computetheinterval(1.7) for, (651, 3.36), (605, 3.13), (653, 3.12), (575, 2.74), (545, 2.76),
and transform thisintervalbackto the0 scale. (572, 2.88), (594, 2.96).
56 B. EFRON AND R. TIBSHIRANI

forthese15 pointsis 6 = .776.We wishto


coefficient Carloalgorithm willnotconverge to a'ifthebootstrap
assigna standarderrorto thisestimate. samplesize differs fromthetruen. Bickeland Freed-
Let o(F) indicatethe standarderrorof 0, as a man(1981) showhowto correctthealgorithm to give
functionoftheunknownsamplingdistribution F, a ifin factthebootstrapsamplesizeis takendifferent
than n, but so far theredoes not seem to be any
(2.2) a(F) = [VarF{Ny)
practicaladvantageto be gainedin thisway.
ofthesamplesize n
Ofcoursea (F) is also a function Fig. 2 showsthe histogram of B = 1000bootstrap
and the formof the statistic0(y), but sincebothof replicationsofthecorrelationcoefficient fromthelaw
these are knowntheyneed not be indicatedin the schooldata. For convenient reference the abscissais
notation.The bootstrapestimateofstandarderroris plottedin termsof 0* - 0 = 0* - .776. Formula (2.4)
gives6' = .127 as thebootstrapestimateof standard
(2.3) =
error.This can be comparedwiththe usual normal
whereF is the empiricaldistribution (1.4), putting theoryestimateofstandarderrorfor0,
probability1/non each observeddata pointxi.In the
lawschoolexample,F is thedistribution puttingmass (2.5) TNORM = (1 - )/(n - 3)1/ = .115,
1/15on each point in Fig. 1, and a is the standard [Johnson
and Kotz (1970,p. 229)].
deviationofthecorrelation coefficientfor15iidpoints
drawnfromF. REMARK.The Monte Carlo algorithm leadingto
In most cases, includingthat of the correlation is to
U7B(2.4) simple program. On the Stanfordversion
thereis no simpleexpression
coefficient, forthefunc- of the statisticalcomputinglanguageS, Professor
it is easyto numeri-
tiona(F) in (2.2). Nevertheless, Arthur Owenhas introduced a singlecommandwhich
callyevaluatea = 0(F) by meansof a MonteCarlo bootstraps anystatisticin theS catalog.For instance
algorithm, whichdependson the following notation: thebootstrapresultsin Fig. 2 are obtainedsimplyby
= (x4, 4, *, x*) indicatesn independent draws typing
fromF, called a bootstrap sample.Because F is the
tboot(lawdata, B = 1000).
correlation,
empiricaldistributionofthedata,a bootstrapsample
turnsout to be the same as a randomsampleof size The executiontimeis abouta factorofB greater
than
n drawnwithreplacement fromthe actual sample thatfortheoriginalcomputation.
..
{X1, X2, * * Xnl.
The MonteCarloalgorithm proceedsinthreesteps: There is anotherway to describethe bootstrap
(i) usinga randomnumbergenerator, independently standarderror:F is thenonparametric maximum like-
drawa largenumberofbootstrapsamples,say y*(1), lihoodestimate(MLE) oftheunknowndistribution F
y*(2), ***, y*(B); (ii) foreach bootstrap sample y*(b), (Kiefer and Wolfowitz, 1956). This means that the
evaluatethestatisticofinterest, say 0*(b)= 0(y*(b))g bootstrapestimateaf = a(F) is the nonparametric
b = 1, 2, * , B; and (iii) calculate the sample standard MLE ofv(F), thetruestandarderror.
deviationofthe0*(b)values In factthereis nothingwhichsays thatthe boot-
strapmustbe carriedoutnonparametrically. Suppose
Zb=1
A {8*(b)-
-A 0()2A
*.)}0/ 1/2 forinstancethatin thelaw schoolexamplewebelieve
the truesamplingdistribution F mustbe bivariate
(2.4) B-i
normal.ThenwecouldestimateF withitsparametric
l*(.)= >20*(b) MLE FNORM, thebivariatenormaldistribution having
the same meanvectorand covariancematrixas the
B~~~~~~
It is easy to see that as B 60, 5B will approach
-

a = (F), the bootstrapestimateof standarderror.


All we are doingis evaluatinga standarddeviation
by Monte Carlo sampling.Later, in Section 9, we NORMAL
will discusshow largeB need be taken. For most THEORYHITGA
DENSITY HISTOGRAM
situationsB in therange50 to 200 is quiteadequate.
In whatfollowswe willusuallyignorethe difference
between5B and a, callingbothsimply"a" HISTOGRAM
PERCENTILES
Whyis each bootstrapsampletakenwiththesame 160/o 50 84
samplesize n as theoriginaldata set?Remember that
o(F) is actually(F, n, 6), the standarderrorforthe -0.4 -0.3 -0.2 -0.1 0 0.1 0.2

statistic0( ) basedon a randomsampleofsize n from FIG. 2. Histogramof B = 1000 bootstrapreplicationsof 6*for the
the unknowndistribution F. The bootstrapestimate law schooldata. The normaltheorydensitycurvehas a similarshape,
f is actuallyo(F, n,0) evaluatedat F = F. The Monte butfallsoffmorequicklyat the uppertail.
BOOTSTRAP METHODS FOR MEASURES OF STATISTICAL ACCURACY 57

data. The bootstrapsamplesat step (i) of the algo- expression forthestandarderrorofan MLE), and in
rithmcouldthenbe drawnfromFNORM insteadofF, fact all analyticdifficulties of any kind. The data
and steps(ii) and (iii) carriedoutas before. analystis freeto obtain standarderrorsforenor-
The smoothcurvein Fig. 2 showsthe resultsof mouslycomplicatedestimators, subjectonlyto the
carrying out this "normaltheorybootstrap"on the constraints ofcomputer time.Sections3 and6 discuss
law schooldata. Actuallythereis no need to do the someinteresting appliedproblemswhichare fartoo
bootstrapsamplingin this case, because of Fisher's complicated forstandardanalyses.
formula forthesamplingdensityofa correlation coef- How welldoes the bootstrapwork?Table 1 shows
ficientin the bivariate normal situation(see Chapter the answerin one situation.Here r is the real line,
32 of Johnsonand Kotz, 1970). This densitycan be n = 15, and the statistic0 of interestis the 25%
thoughtof as the bootstrap distributionfor B = oo. trimmed mean.If thetruesamplingdistribution F is
Expression(2.5) is a close approximation to "1NORM= N(0, 1), thenthe truestandarderroris a(F) = .286.
o(FNORM), theparametricbootstrapestimate ofstand- The bootstrap estimate'ais nearlyunbiased,averaging
arderror. .287 in a largesamplingexperiment. The standard
In considering the meritsor demeritsof the boot- deviationofthebootstrapestimatea' is itself.071 in
strap, it is worth remembering that all of the usual thiscase,withcoefficient ofvariation.071/.287= .25.
for
formulas estimating standard errors,like g-l/2 (Notice that there are two levels of Monte Carlo
where J is the observed Fisher information,are es- involvedin Table 1: firstdrawingthe actualsamples
sentiallybootstrapestimatescarriedout in a para- y = (xl, x2, ..., x15)fromF, and thendrawingboot-
metricframework. This pointis carefully explainedin * *,x15)withy held fixed.The
strapsamples (x4, x2*, *
Section5 ofEfron(1982c).The straightforward non- bootstrapsamplesevaluatea' fora fixedvalue of y.
parametric algorithm(i)-(iii) has thevirtuesofavoid- The standarddeviation.071 refersto the variability
ing all parametricassumptions,all approximations ofa' dueto therandomchoiceofy.)
(such as thoseinvolvedwiththe Fisherinformation The jackknife,anothercommonmethodofassign-
ing nonparametric standarderrors,is discussedin
Section10. The jackknifeestimateCJ' is also nearly
TABLE 1
unbiasedfora(F), buthas highercoefficient ofvari-
A samplingexperimentcomparingthe bootstrapand jackknife
estimatesofstandarderrorforthe 25% trimmedmean, ation (CV). The minimum possibleCV fora scale-
sample size n = 15 invariantestimateof a(F), assumingfullknowledge
F standard F negative
of the parametricmodel,is shownin brackets.The
normal exponential nonparametric bootstrapis seen to be moderately
efficientin bothcases considered in Table 1.
Ave SD CV Ave SD CV
Table 2 returnsto the case of 0 the correlation
Bootstrap f .2s87 .071 .25 .242 .078 .32 coefficient.Insteadof real data we have a sampling
(B = 200) experiment in whichthe trueF is bivariatenormal,
Jackknife
6f .280 .084 .30 .224 .085 .38 0 = .50,samplesize n = 14. Table 2
truecorrelation
CV) .286
True(minimum (.19) .232 (.27)
is abstracted from a largertablein Efron(1981b),in
TABLE 2
F bivariatenormalwithtrue
C and forX = tanh-'6; sample size n 14, distribution
Estimatesofstandarderrorforthe correlationcoefficient
correlationp = .5 (froma largertablein Efron,1981b)

Summarystatisticsfor200 trials

Standard errorestimatesforC Standard errorestimatesforX


Ave SD CV MSE Ave SD CV -VKfM
1. BootstrapB = 128 .206 .066 .32 .067 .301 .065 .22 .065
2. BootstrapB = 512 .206 .063 .31 .064 .301 .062 .21 .062
3. Normal smoothedbootstrapB = 128 .200 .060 .30 .063 .296 .041 .14 .041
4. UniformsmoothedbootstrapB = 128 .205 .061 .30 .062 .298 .058 .19 .058
5. UniformsmoothedbootstrapB = 512 .205 .059 .29 .060 .296 .052 .18 .052

6. Jackknife .223 .085 .38 .085 .314 .090 .29 .091


7. Delta method .175 .058 .33 .072 .244 .052 .21 .076
(Infinitesimaljackknife)

8. Normal theory .217 .056 .26 .056 .302 0 0 .003

True standarderror .218 .299


58 B. EFRON AND R. TIBSHIRANI

whichsomeofthemethodsforestimating a standard (x = 0)
time(y) in weeksfortwogroups,treatment
errorrequiredthesamplesize to be even. and control(x = 1), and a 0-1 variable (bi) indicating
The leftside ofTable 2 refersto 0, whiletheright whetheror notthe remissiontimeis censored(0) or
side refersto X = tanh-'(0) = .5 log(1 + 6)/(1 -). complete(1). Thereare 21 micein each group.
For each estimatorof standarderror,the rootmean The standardregression modelforcensoreddata is
squarederrorof estimation[E(a - )2]1/2 iS givenin Cox's proportional
hazardsmodel(Cox, 1972).It as-
thecolumnheadedVMi.. sumesthatthehazardfunction h(tIx),theprobability
The bootstrapwas runwithB = 128 and also with of goingintoremissionin nextinstantgivenno re-
B = 512,thelattervalueyieldingonlyslightly better missionup to timet fora mousewithcovariatex, is
estimatesin accordancewiththeresultsofSection9. oftheform
FurtherincreasingB wouldbe pointless.It can be
(3.1) h(tIx) = ho(t)e:x.
shownthatB = oo givesVii-i = .063for0, only.001
less thanB = 512. The normaltheoryestimate(2.5), Hereho(t)is an arbitraryunspecifiedfunction.
Since
whichwe knowto be ideal forthissamplingexperi- x hereis a groupindicator,
thismeanssimplythatthe
ment,has ../i-Si= .056. hazardforthecontrolgroupis e: timesthehazardfor
We can compromise betweenthe totallynonpara- the treatment group.The regression parameterd is
metric bootstrap estimatea'andthetotallyparametric estimatedindependently of ho(t)throughmaximiza-
bootstrapestimateC7NORM. This is done in lines3, 4, tionoftheso called"partiallikelihood"
and 5 ofTable 2. Let 2 = Sin-l (xi - )(x- i)'/n be e,3xi
the samplecovariancematrixof the observeddata. (3.2) PL = 11 e-xi
The normalsmoothedbootstrap drawsthe bootstrap iED EiER, e i'
samplefromF (D N2(0, .252), (D indicatingconvolu- whereD is the set ofindicesofthe failuretimesand
tion.This amountsto estimating F by an equal mix- Ri is thesetofindicesofthoseat riskat timeyi.This
ture of the n distributions N2(xi,.252), thatis by a maximization requiresan iterative computer search.
normalwindowestimate.Each pointxi*in a smoothed The estimated forthesedata turnsoutto be 1.51.
bootstrapsampleis the sum of a randomlyselected Taken literally, thissaysthatthe hazardrateis e'5'
originaldata pointxj, plus an independent bivariate = 4.33 timeshigherin the controlgroupthanin the
-
normalpointzj N2(0,.252). Smoothing makes little treatment group,so the treatment is veryeffective.
difference on the leftside ofthe table, but is spectac- Whatis thestandarderroroffA? The usualasymptotic
ularlyeffective in the k case. The latterresultis maximum likelihoodtheory, one overthesquareroot
suspectsincethetruesamplingdistribution is bivar- oftheobservedFisherinformation, givesan estimate
iate normal,and the function q = tanh- O is specifi- of .41. Despitethe complicated natureoftheestima-
callychosento havenearlyconstantstandarderrorin tionprocedure, wecanalso estimate thestandarderror
the bivariatenormalfamily.The uniform smoothed using the bootstrap.We sample with replacement
bootstrapsamples from F (D W(0, .252), where fromthe triplesI(y', xi, 5k), *..*, (Y42, x42, 642)). For
WI(0,.252) is the uniform distribution on a rhombus each bootstrap sample $(y*, x*, 0), .., (y*, x4*2,
selectedso VIhas meanvector0 andcovariancematrix 6*)) we formthe partiallikelihoodand numerically
.25Z. It yieldsmoderatereductions in vMi-SR forboth maximizeit to producethe bootstrapestimateA*.A
sidesofthetable. histogram of1000bootstrap valuesis shownin Fig.3.
Line 6 ofTable 2 refersto thedeltamethod, which The bootstrapestimateof the standarderrorof A
is the mostcommonmethodof assigningnonpara- based on these 1000 numbersis .42. Althoughthe
metricstandarderror.Surprisingly enough,it is badly bootstrap and standardestimatesagree,it is interest-
biaseddownward on bothsidesofthetable.The delta ingto notethatthe bootstrapdistribution is skewed
method,also knownas the methodof statisticaldif- to the right.This leads us to ask: is thereother
ferentials,theTaylorseriesmethod, andtheinfinites- information thatwe can extractfromthe bootstrap
imaljackknife, is discussedin Section10. distribution otherthana standarderrorestimate?The
answeris yes-in particular, the bootstrapdistribu-
3. EXAMPLES tioncan be used to forma confidence intervalforfA,
as wewillsee in Section9. The shapeofthebootstrap
Example 1. Cox's ProportionalHazards Model
distributiion will help determinethe shape of the
In this sectionwe applybootstrapstandarderror confidence interval.
estimation statistics.
to somecomplicated In thisexampleourresampling unitwas thetriple
The data forthis examplecome froma studyof (yi,xi,bi),and we ignoredtheuniqueelementsofthe
leukemiaremissiontimesin mice,taken fromCox problem, and theparticularmodel
i.e.,the censoring,
(1972). They consistof measurements of remission beingused.In fact,thereare otherwaysto bootstrap
BOOTSTRAP METHODS FOR MEASURES OF STATISTICAL ACCURACY 59

Noticetherelationoftheprojection pursuitregres-
200 sion modelto the standardlinearregression model.
Whenthe function sj(-) is forced to be linear and is
estimatedby the usual least squaresmethod,a one-
termprojection pursuitmodelis exactlythe same as
150
the standardlinearregression model.That is to say,
the fittedmodels'(a'1 xi) exactlyequals the least
squaresfita' + jxi,. This is because the least
f31
100 squaresfit,bydefinition, findsthebestdirection and
the best linearfunctionof that direction.Note also
thataddinganotherlinearterms 2(& * X2)wouldnot
changethe fittedmodelsincethe sum of two linear
50 functions is anotherlinearfunction.
Hastieand Tibshirani(1984) appliedthebootstrap
to thelinearand projection pursuitregression models
01 to assess the variability of the coefficients in each.
0.5 1 1.5 2 2.5 3
The datatheyconsidered aretakenfromBreimanand
FIG. 3. Histogram of 1000 bootstrapreplicationsfor the mouse Friedman(1985). The responseY is Upland atmos-
leukemiadata. phericozoneconcentration (ppm);the covariatesX1
= SandburgAir Force base temperature (CO),X2 =
thisproblem.We willsee thiswhenwe discussboot- inversion base height(ft),X3 = Daggotpressuregra-
censoreddata in Section5.
strapping dient(mmHg), X4 = visibility (miles),and X5 = day
oftheyear.Thereare 330 observations. The number
Example2: Linearand ProjectionPursuit ofterms(m) inthemodel(3.4) is takentobe two.The
Regression projectionpursuitalgorithm chosedirections al = (.80,
-.38, .37, -.24, -.14)' and 62 = (.07, .16, .04, -.05,
We illustratean applicationof the bootstrapto
-.98)'. These directionsconsistmostlyof Sandburg
standardlinearleast squaresregression
as wellas to
Air Force temperature and day of the year,respec-
a nonparametric regression
technique.
Considerthestandardregression setup.We haven
observationson a responseY and covariates(X1,X2,
***,X,). Denotetheithobservedvectorofcovariates
by xi = (xil, xi2, ... , xip)'. The usual linear regression
modelassumes a5 I

p
(3.3) E(Yi) = a + E /fl1xi.
j=l
a4 A. I
Friedmanand Stuetzle(1981) introduced
a moregen-
model
eralmodel,theprojectionpursuitregression
m
(3.4) E(Yi)= X sj(aj - xi). a3

j=l

The p vectorsaj are unitvectors("directions"),


and
thefunctions
sj(.) are unspecified. a2

Estimation of la,, sl(.)), ..., {am1,Sm(-)} is per-


in a forward
formed stepwisemanneras follows.Con-
sider {al, s ( -)}. Given a directional, s, ()* is estimated a, L
bya nonparametric smoother(e.g.,runningmean)of
y on a, * x. The projectionpursuitregression
algo-
rithmsearchesover all unit directionsto findthe -1 -0.5 0 0.5 1
directional and associated functionsl(-) that mini- bootstrappedcoefficients
mize (y1-sl(is xa))2.
* Then residuals are taken
FIG. 4. Smoothedhistogramsofthebootstrapped forthe
coefficients
and the nextdirectionand functionare determined. firsttermin theprojectionpursuitregressionmodel.Solid histograms
This processis continueduntil no additionalterm are forthe usual projectionpursuitmodel;the dottedhistogramsare
reducestheresidualsumofsquares.
significantly forlinear9(*).
60 B. EFRON AND R. TIBSHIRANI

on 157patients.A pro-
(1). Thereare measurements
portionalhazardsmodelwas fitto thesedata,witha
quadraticterm,i.e, h(t I x) = ho(t)elx+i32x. Both #,and
f2are highlysignificant;
thebrokencurvein Fig.6 is
/3x+ f2x2as a functionofx.
For comparison,Fig. 6 shows(solid line) another
a4 estimate.This was computedusing local likelihood
estimation(Tibshiraniand Hastie, 1984). Given a
hazardsmodeloftheformh(tIx)
generalproportional
= ho(t)es(x),the local likelihood technique assumes
a3
nothingabouttheparametric formofs(x); insteadit
estimatess(x) nonparametrically usinga kindoflocal
averaging. The algorithm is verycomputationally in-
a2 tensive, andstandardmaximum likelihoodtheory can-
notbe applied.
A comparisonof the two functions revealsan im-
portantqualitative difference:the parametric estimate
al~~ ~ ~~~~~~ - p p suggeststhatthe hazarddecreasessharplyup to age
34,thenrises;the local likelihoodestimatestaysap-
-1 -0.5 0 0.5 1 proximately constantup to age 45 thenrises.Has the
bootstrapped coefficients forcedfitting ofa quadraticfunction produceda mis-
leadingresult?To answerthisquestion,we can boot-
FIG. 5. Smoothedhistogramsofthebootstrapped
coefficients
forthe
second termin theprojectionpursuitmodel. strapthe local likelihoodestimate.We samplewith
replacement fromthe triplesI(Yl, x1, 61) ... (Y157i
X157,6157) and applythelocal likelihood algorithm to
tively.(We do notshowgraphsoftheestimatedfunc- eachbootstrapsample.Fig. 7 showsestimatedcurves
tionss(*(.) and s2(. ) althoughin a fullanalysisofthe from20 bootstrapsamples.
data theywouldalso be of interest.)Forcings'( *) to Someofthecurvesare flatup to age 45,othersare
be linearresultsin the directiona' = (.90,-.37, .03, decreasing.Hence the originallocal likelihoodesti-
-.14, -.19)'. These are just the usual least squares mateis highlyvariablein thisregionand on thebasis
estimatesi1, *.,* ,, Ascaled so that EP12 = 1. ofthesedata we cannotdetermine the truebehavior
To assess the variability of the directions,a boot- ofthefunction there.A lookbackat theoriginaldata
strapsampleis drawnwithreplacement from(Yi, x11, showsthatwhilehalfof thepatientswereunder45,
. . ., X15), * * *, (Y330, X3301, *-- , X3305)and theprojection only13% of the patientswereunder30. Fig. 7 also
pursuitalgorithm is applied.Figs.4 and 5 showhis- showsthattheestimateis stablenearthemiddleages
togramsofthedirections a'* and a* for200 bootstrap butunstablefortheolderpatients.
replications. Also shownin Fig.4 (brokenhistogram)
are the bootstrap replicationsof a& with s.(.) forced 3
to be linear.
The firstdirectionofthe projectionpursuitmodel
is quite stableand onlyslightlymorevariablethan
thecorresponding linearregression
direction.But the 2
seconddirectionis extremely unstable!It is clearly
unwiseto putanyfaithin theseconddirection ofthe
originalprojectionpursuitmodel. t0 //~~~~~~~~~~
/ /
Example 3: Cox's Model and Local Likelihood \L //
a /t
Estimation
In this example,we returnto Cox's proportional
hazardsmodeldescribedin Example1,butwitha few
addedtwists.
The data thatwe willdiscusscomefromtheStan-
fordhearttransplant program and are givenin Miller 10 20 30 40 50 60
and Halpern(1982). The responsey is survivaltime age
in weeksaftera hearttransplant, the covariatex is FIG. 6. Estimates of log relativeriskfor the Stanfordheart trans-
age at transplant,and the 0-1 variable3 indicates plant data. Broken curve: parametric estimate. Solid curve: local
whether thesurvivaltimeis censored(0) or complete likelihoodestimate.
BOOTSTRAP METHODS FOR MEASURES OF STATISTICAL ACCURACY 61

TABLE 3
4 BHCGbloodserumlevelsfor54 patientshavingmetasticized
breast
cancerin ascending order <
0.1, 0.1, 0.2, 0.4, 0.4, 0.6, 0.8, 0.8, 0.9, 0.9, 1.3, 1.3, 1.4, 1.5, 1.6,
3 1.6, 1.7, 1.7, 1.7, 1.8, 2.0, 2.0, 2.2, 2.2, 2.2, 2.3, 2.3, 2.4, 2.4, 2.4,
2.4, 2.4, 2.4, 2.5, 2.5, 2.5, 2.7, 2.7, 2.8, 2.9, 2.9, 2.9, 3.0, 3.1, 3.1,
3.2, 3.2, 3.3, 3.3, 3.5, 4.4, 4.5, 6.4, 9.4

As an exampleconsiderthe blood serumdata of


Table 3. Supposewe wishto estimatethetruemean
A = EF{X I ofthispopulation using0,the25%trimmed
mean.We calculatej, = ,u(F) = 2.32,thesamplemean
ofthe54 observations, and0= 2.24,thetrimmed mean.
The trimmed meanis lowerbecauseit discountsthe
-11,..,.
10 20 30 40 50 60 effectof the largeobservations 6.4 and 9.4. It looks
age likethetrimmed meanmightbe morerobustforthis
typeof data, and as a matterof fact a bootstrap
ofthelocallikelihood
FIG.7. 20 bootstraps estimate
fortheStanford analysis,B = 1000, gave estimatedstandarderror
data.
hearttransplant = .16 for0, comparedto .21 forthe samplemean.
a

But whataboutbias?
The same 1000 bootstrapreplications whichgave
4. OTHER MEASURES OF STATISTICALERROR a
= .16 also gave0*(-) = 2.29,so

So farwe have discussedstatisticalerror,or accu- (4.5) = 2.29 - 2.32 = -0.03.


racy,intermsofthestandarderror.It is easyto assess accordingto (4.4). (The estimatedstandarddeviation
othermeasuresof statisticalerror,such as bias or of fB- due to the limitations
- of havingB = 1000
predictionerror,usingthebootstrap. bootstrapsis only0.005in thiscase,so we can ignore
Considertheestimation ofbias.Fora givenstatistic the differencebetweenfB and A3.)Whetheror not a
0(y),and a givenparameter,u(F),let bias of magnitude-0.03 is too largedependson the
(4.1) R(y, F) = 0(y) - A(F). contextoftheproblem.If we attemptto removethe
bias by subtraction,we get 0- = 2.24 - (-0.03)-
(It willhelpkeepournotationclearto call theparam- 2.27. Removingbias in thisway is frequently a bad
eterofinterestAratherthan0.) Forexample,,umight idea (see Hinkley,1978),but at least the bootstrap
bethemeanofthedistribution F, assumingthesample analysishas givenus a reasonablepictureofthebias
spaceX is therealline,and 0 the25% trimmed mean. and standarderrorof0.
The bias of0 forestimatingitis Here is anothermeasureof statisticalaccuracy,
(4.2) A(F) = EFR(Y, F) = EF10(y)) - A(F). different fromeitherbias or standarderror.Let 0(y)
be the 25% trimmed meanand M(F) be the meanof
The notationEF indicatesexpectation withrespectto F, as in the serumexample,and also let i(y) be the
the probability mechanismappropriate to F, in this interquartilerange,thedistancebetweenthe25thand
case y = (xl, x2, - - *,xn)a randomsample fromF. 75thpercentilesof the sampley = (x1, x2, *--, xn).
The bootstrapestimateofbias is Define
(4.3), fi= A(F) = EFR(y*, F) = Ep{0(Y*)) - U(F)
(4.6) R(y, F) = (y)- (F)
As in Section2, y* denotesa randomsample(x4,x, I(y)
***,4) fromF, i.e.,a bootstrapsample.To numeri- R is like a Student'st statistic,exceptthatwe have
callyevaluate,3,all we do is changestep (iii) of the substituted the 25% trimmedmean forthe sample
bootstrapalgorithm in Section2 to
mean and the interquartile rangeforthe standard
'A - 1 B deviation.
/B - - E R(y*(b), F ). Supposewe knowthe 5th and 95thpercentilesof
Bb=1
R(y,F), sayp(-05)(F)and p(.95)(F),wherethedefinition
b A ofp(.05)(F)is
(4.4) -
0*(b) -()
(4.7) ProbF,R(y,F) < p(5)(F) I = .05,
andsimilarlyforpf95)(F). The relationship
ProbFpt(.OS)
3
AsB-*oo,LXgoesto/F(4.3). s R < (95)} - .90 combineswithdefinition (4.6) to
62 B. EFRON AND R. TIBSHIRANI

givea central90% "t interval"forthemean,(F),


5. MORE COMPLICATEDDATA SETS
(4.8) y E [6_i(.95), 6 _ g(.05)] to situationswhere
The bootstrapis notrestricted
Of coursewe do not knowp`05)(F)and p(95)(F), the data is a simplerandomsample froma single
but we can approximatethem by theirbootstrap distribution.Supposeforinstancethatthe data con-
F and (95)(F). A bootstrap sample
estimates p(O5)(F) sistsoftwoindependent randomsamples,
y* gives a bootstrapvalue of (4.6), R(y*, F) = and
(5.1) U1, U2,---,Ur-F
(6(y*) - p(F))/i(y*),wherei(y*) is the interquartile
range of the bootstrap data x, *, * ., *. For V1,V2, ... Vn G,
any fixed numberp, the bootstrapestimate of distributions
whereF and G are possiblydifferent on
ProbF{R < p} based on B bootstrapsamples is therealline.Supposealso thatthestatisticofinterest
is theHodges-Lehmann shiftestimate
(4.9) F) < pJB.
#fR(y*(b), A

0=

By keepingtrack of the empiricaldistribution of (5.2)


R(y*(b),F), we can pick offthe values of p which medianIVj- Ui, i = 1, 2, *.. m,j = 1, 2, ..., n}.
make (4.9) equal .05 and .95. These approach F(05)(F) HavingobservedU1 = ul, U2 = u2, *--, Vn = Vn,
and p(95)(F) as B -* oo. we desirean estimatefora(F, G), the standarderror
Fortheserumdata,B = 1000bootstrap replications of0.
gave p( 5)(F) -.303 and p 95)(F)
= = .078. Substituting The bootstrapestimate of a(F, G) is a = G(F,G),
thesevalues into (4.9), and usingthe observedesti- of u1, u2, * .
whereF is the empiricaldistribution *,
mates0 = 2.24, i = 1.40, gives um, and G is the empiricaldistributionof v1, v2, ** ,
theMonteCarloalgorithm
vn. It is easyto modify of
(4.10) L E [2.13,2.66]
Section2 to numericallyevaluatev. Let y = (u1,u2,
as a central90% "bootstrapt interval"forthe true *-, vn) be the observeddata vector.A bootstrap
mean ,u(F). This is considerablyshorterthan the sampley*= (uiu, -* * *,u*4,v, v, *, v*) consists...

standardt intervalfor,tbased on 53 degreesoffree- of a random sample U*, U* from F and an


...,

dom,i ? 1.67W= [1.97,2.67]. Here-a= .21 is theusual independentrandomsample V*, - - -, V* fromG. With
estimateofstandarderror(1.3). onlythis modification, steps (i) through(iii) of the
Bootstrapconfidence arediscussedfurther
intervals MonteCarloalgorithm producefB, (2.4),approaching
in Sections7 and 8. They requiremorebootstrap aas B -- oo.
replications thando bootstrapstandarderrors, on the Table 4 reportson a simulationexperiment inves-
order of B = 1000 ratherthan B = 50 or 100. This tigating howwellthebootstrap worksonthisproblem.
pointis discussedbriefly in Section9. 100 trialsof situation(5.1) wererun,withm = 6,
By now it shouldbe clear that we can use any n = 9, F and G bothUniform[0, 1]. For each trial,
randomvariableR(y,F) to measureaccuracy, notjust bothB = 100andB = 200bootstrap replicationswere
(4.1) or (4.6), and thenestimateEFIR(y, F)} by its generated. The bootstrapestimateJB was nearlyun-
bootstrapvalue EpIR(y*,F1)}- b=1 R(y*(b),F)/B. biasedforthetruestandarderroru(F, G) = .167 for
Similarly we can estimateEFR(y,F)2 byEpR(y*,F)2, eitherB = 100orB = 200,witha quitesmallstandard
etc.Efron(1983) considerstheprediction problem,in deviation from trialtotrial.The improvement ingoing
whicha trainingset of data is used to constructa fromB = 100 to B = 200 is too smallto showup in
predictionrule. A naive estimateof the prediction thisexperiment.
rule'saccuracyis theproportion ofcorrectguessesit In practice,statisticiansmustoftenconsiderquite
makeson its owntrainingset,butthiscan be greatly complicated data structures:timeseriesmodels,mul-
overoptimistic sincethe prediction ruleis explicitly
constructed to minimize errorson thetrainingset.In TABLE 4
this case, a naturalchoice of R(y, F) is the over ofstandarderror
estimate
Bootstrap fortheHodges-Lehmann
optimism, the difference betweenthe naive estimate two-sample 100trials
shiftestimate;
and the actualsuccessrateoftheprediction rulefor forOB
statistics
Summary
newdata. Efron(1983) givesthe bootstrapestimate
Ave SD CV
ofoveroptimism, and showsthatit is closelyrelated
to cross-validation, the usual methodof estimating B = 100 .165 .030 .18
overoptimism. The papergoeson to showthatsome B = 200 .166 .031 .19
True o .167
modifications of the bootstrapestimategreatlyout
perform bothcross-validation and thebootstrap. Note: m = 6, n = 9; truedistributionsF and G both uniform[0, 1].
BOOTSTRAP METHODS FOR MEASURES OF STATISTICAL ACCURACY 63

tifactorlayouts,sequentialsampling,censoredand of/ andti,forinstance


iates;andg is a knownfunction
missingdata,etc.Fig. 8 illustrates howthebootstrap e:'ti. The ei are an iid sample from some unknown
estimationprocessproceeds in a generalsituation. F on therealline,
distribution
The actualprobability mechanism P whichgenerates
(5.4) e1, 2F,. . . En
the observeddata y belongsto some familyP of
possibleprobability mechanism. In the Hodges-Leh- whereF is usuallyassumedto be centered at 0 in some
mannexample,P (F, = G), a pair on
ofdistributions sense,perhapsEHe} 0 = or Prob$e < 0} = .5. The
therealline,P equalsthefamilyofall suchpairs, and probability modelis P (A,= F); (5.3) and (5.4) describe
y = (u1, U2, *--, Um, V1, v2, *--, vn) is generated thestepP -* 8.
y in Fig. The covariates t1, t2, *.. , tn
by randomsamplingm timesfromF and n times likethesamplesize n in the simple problem (1.1),are
fromG. considered fixed at their observed values.
We have a randomvariableof interestR(y, P), For everychoice of A we have a vectorg(3) =
whichdependson bothy and theunknownmodelP, (g(/, t1),g(B, t2), ** , g(/, tn)) ofpredicted valuesfor
and we wish to estimatesome aspect of the dis- y. Havingobserved y, we estimate A by minimizing
tributionof R. In the Hodges-Lehmannexample, somemeasureofdistancebetweeng(/3)and y,
R(y, P) = 0(y) - Ep{0j, and we estimatedo(P) = /: min D(y, g(/)).
the (5.5)
lEpR(y,p)2}1/2, thestandarderrorof0. As before,
notationEp indicatesexpectation wheny is generated
according to mechanism P. The most common choice of D is D(y, g) =
We assumethatwe have someway of estimating = 1$yi- g(/, ti)12.
the entireprobability modelP fromthe data y, pro- How accurateis as an estimateofd? Let R(y,P)
ducingthe estimatecalled P in Fig. 8. (In the two- equalthevector/ -. A familiarmeasureofaccuracy
sampleproblem,P = (F, G), the pair of empirical is themeansquareerrormatrix
distributions.) Thisis thecrucialstepforthebootstrap. Z(P) = Ep( - )( - /)'
(5.6)
It can be carriedouteitherparametrically or nonpar- = EpR(y, P)R(y, P)'.
ametrically, bymaximum likelihoodor bysomeother
estimation technique. The bootstrapestimateof accuracyZ = AP) is ob-
Oncewe haveP, we can use MonteCarlomethods tainedbyfollowing through Fig.8.
to generatebootstrapdata sets y*, accordingto the Thereis an obviouschoiceforP = (/3,F) in this
same rules by whichy is generatedfromP. The case. The estimate/3is obtainedfrom(5.5). ThenF is
bootstraprandomvariableR(y*, P) is observable, theempiricaldistribution oftheresiduals,
sincewe knowP as wellas y*,so the distribution of F: mass(1/n) on 90-9g(/, ti),
R(y*,P) can be foundbyMonteCarlosampling.The (5.7)
bootstrapestimateofEpR(y,P) is thenEfR(y*,P), i = 1, ... , n.
and likewisefor estimatingany other aspect of A bootstrapsampley* is obtainedby following rules
R(y,P)'s distribution. (5.3) and (5.4),
A regression modelis a familiarexampleofa com-
plicateddata structure. We observey = (Yi, Y2, * (5.8) Y' = g(, ti) + , i = 1,2, *** n,
Yn), where whereerc*',2 e* is an iid samplefromF. Notice
n. thatthe e* are independent bootstrapvariates,even
(5-3) yi = g(#, ti) + ei i = 1, 2, *--,
thoughthee'iarenotindependent variatesintheusual
HereA is a vectorofunknownparameters we wishto sense.
estimate;foreach i, tiis an observedvectorofcovar- Each bootstrap sampley*(b)givesa bootstrap value
Al*(b),
FAMILY OF
POSSIBLE
PROBABILITY
ACTUAL
PROBABILITY OBSERVED
ESTIMATED
PROBABILITY BOOTSTRAP (5.9) minD(y*(b), g(/3)),
MO*(b):
MODELS MODEL DATA MODEL DATA
-----------
pe P y P y
as in (5.5). The estimate

R(y,P) R(y*,P) (5.10) 2B


= Eb
1t{3*(b)
-
3*( )H}3*(b) -
B
RANDOM VARIABLE OF INTEREST BOOTSTRAP RANDOM VARIABLE

approaches the bootstrapestimate Z as B -- oo. (We


FIG. 8. A schematicillustration processfora general
ofthebootstrap
probabilitymodelP. The expectationof R(y, P) is estimatedby the
couldjustas welldividebyB 1 in (5.10).) -

bootstrapexpectationof R(y*, P). The double arrow indicates the In the case of ordinaryleast squares regression,
crucialstep in applyingthe bootstrap. where g(/, ti) = /'ti and D(y, g) = = (y1-
64 B. EFRON AND R. TIBSHIRANI

Section7 of Efron(1979a) showsthatthe bootstrap The statistician


getsto observe
estimate, B = oo, can be calculated without Monte
(5.14) Xi= min{X?, WiJ
Carlosampling,
and is
and
(5.11) (2 tit
2_ 2 1if Xi = X
(5.15) D= X
X = W.
This is the usual Gauss-Markovanswer,exceptfor
thedivisorn in thedefinition
ofa2. Note: 1 - S?(t) and 1 -R(t) are the cumulative
Thereis another, simplerwayto bootstrap
a regres- distribution
functions(cdf) forX? and Wi,respec-
sion problem.We can considereach covariate-re- tively;withcensoreddata it is moreconvenientto
sponse pair xi = (ti, yi) to be a single data point considersurvivalcurvesthancdf.
obtainedbysimplerandomsamplingfroma distribu- Underassumptions(5.12)-(5.15)thereis a simple
tionF. Ifthecovariatevectortiis p-dimensional, F is formulaforthe nonparametric MLE of S?(t), called
a distributionon p + 1 dimensions.Then we apply the Kaplan-Meier estimator(Kaplan and Meier,
the bootstrapas describedoriginally in Section2 to 1958).Forconvenience
supposex1< X2 < X3 < ... <
thedata setx1,X2, . ., Xn ~iid F. xn, n = 97. Then theKaplan-Meierestimateis
The twobootstrap methodsfortheregression prob-
lem are asymptotically equivalent,but can perform (5.16) SO) = l :)did
in smallsamplesituations.
quitedifferently The class
of possibleprobability modelsP is different forthe wherektis the value of k suchthatt E [Xk, Xk*1). In
twomethods. The simplemethod, described last,takes the case of no censoring, S?(t) is equivalentto the
less advantageof the specialstructure of the regres- observedempiricaldistribution ofx1,x2, **, xn, but
sion problem.It does not give answer(5.11) in the otherwise (5.16) corrects
theempiricaldistribution to
case ofordinaryleastsquares.On theotherhandthe accountforcensoring. Likewise
simplemethodgives a trustworthy
variability
estimateof f's
even if the regressionmodel (5.3) is not
correct.The bootstrap, as outlinedin Fig. 5, is very
(5.17) 9 R(t) =A
(= -; 1)d

general, butbecauseofthisgenerality therewilloften is the Kaplan-Meierestimateof thecensoringcurve


be morethanone bootstrapsolutionfora givenprob- R(t).
le.m Fig.9 showsS0(t) forthe ChanningHouse men.It
As the finalexampleof this section,we discuss crossesthe 50% survivallevel at 0 = 1044 months.
censored data. The ages of 97 men at a California Call thisvaluetheobservedmedianlifetime. We can
retirement center,ChanningHouse, were observed use the bootstrapto assigna standarderrorto the
eitherat death(an uncensored observation) or at the observedmedian.
timethe studyended (a censoredobservation).The The probabilitymechanismis P = (SO, R); P
data sety = {(x1,dl), (x2, d2), ***, (X97,d97)J,wherexi produces(X?, Di) according to (5.12)-(5.15),and y-
was theage oftheithmanobserved, and (xi,di), - (xn, dn)} by n = 97 independent repeti-
1 if xi uncensored tionsofthisprocess.Anobviouschoiceoftheestimate
{ 0 if xi censored. P in Fig. 8 is (S? R), (5.14), (5.15). The rest of
Thus (777, 1) represents a ChanningHouse manob-
servedto die at age 777 months, while(843,0) repre-
1.0
sentsa man 843 monthsold whenthe studyended.
His observationcouldbe writtenas "843+," and in
0.8 _
factdi is just an indicatorfortheabsenceorpresence
of "+." A fulldescriptionof the Channing House data 0.6 _
appearsin Hyde(1980).
A typicaldata point(Xi, Di) can be thoughtof as O. 4 -
generatedin the following
way:a real lifetimeX? is
selectedrandomly accordingto a survivalcurve 0.2 -

(5.12) S?(t) ProbJX9> t}, (0 c t < oo)


800 900 1000 1044 1100
and a censoringtime Wi is independently selected
to
according another
survival curve FIG. 9. Kaplan-Meier estimatedsurvival curve for the Channing
House men; t = age in months.The mediansurvivalage is estimated
(5.13) R(t) Prob{ W. > t), (0 < t < em). to be 1044 months(87 years).
BOOTSTRAP METHODS FOR MEASURES OF STATISTICAL ACCURACY 65

bootstrapprocess is automatic: S? and R replace S? 250


and R in (5.12) and (5.13); n pairs (Xi*, D:*) are
independentlygenerated according to rules (5.12)-
(5.15), givingthe bootstrapdata set y* = {xl*,d*, * , 200
(x*, dn*); and finally the bootstrap Kaplan-Meier
curve S* is constructedaccordingto formula(5.16),
and the bootstrapobservedmedian 0* calculated. For 150
the Channing House data, B = 1600 bootstraprepli-
cations of 0* gave estimated standard errora = 14.0
months for 0. An estimated bias of 4.1 months was 100
calculated as at (4.4). Efron (1981b) gives a fuller
description.
Once again thereis a simplerway to apply to boot- 50
strap. Consider each pair y-= (xi, di) as an observed
point obtained by simple random sampling from a
bivariate distributionF, and apply the bootstrap as
describedin Section 2 to the data set yl, Y2, , Yn 0.6 0.7 0.8 0.9
-iid F. This methodmakes no use of the special struc- FIG. 10. Bootstraphistogramof of, *, k10oofor the Wolfersun-
ture (5.12)-(5.15). Surprisingly,it gives exactly the spot data, model(6.1).
same answers as the more complicated bootstrap
methoddescribedearlier(Efron,1981a). This leads to
a surprisingconclusion:bootstrapestimatesof varia- that the distributionis skewed to the left, so a
bilityforthe Kaplan-Meier curve give correctstand- confidenceintervalfor'kmightbe asymmetricabout
ard errorseven whenthe usual assumptionsabout the 'kas discussed in Sections 8 and 9.
censoringmechanism,(5.12)-(5.15), fail. In bootstrappingthe residuals, we have assumed
that the first-orderautoregressivemodel is correct.
6. EXAMPLES WITH MORE COMPLICATED (Recall the discussionof regressionmodels in Section
DATA STRUCTURES 5.) In fact,the first-orderautoregressivemodel is far
from adequate for this data. A fit of second-order
Example 1: Autoregressive Time Series Model autoregressivemodel
This example illustrates an application of the (6.2) z- = azi-- + OZi-2 + e
bootstrapto a famoustime series.
The data are the Wolfer annual sunspot numbers gave estimates a' = 1.37, 0 = -.677, both with an
forthe years 1770-1889 (taken fromAnderson,1975). estimated standard error of .067, based on Fisher
Let the count forthe ith year be zi. Aftercentering informationcalculations. We applied the bootstrapto
the data (replacingz- by z- - z--),we fit a first-order this model, producing the histograms for a*t, *--,
autoregressivemodel a*too and O*, *--, o shown in Figs. 11 and 12,
respectively.
(6.1) Z. = kz-i-+ E- The bootstrap standard errorswere .070 and .068,
wheree- iid N(0, a2). The estimate k turnedout to respectively,both close to the usual value. Note that
be .815 withan estimatedstandarderror,one overthe the additional term has reduced the skewness of the
square rootof the Fisher information, of .053. firstcoefficient.
A bootstrapestimateof the standarderrorof k can
Example 2: Estimatinga Response Transformation
be obtained as follows.Define the residuals ki = Z-
in Regression
ozi-1 fori = 2, 3, -.., 120. A bootstrapsample z*, z*,
.-, *20is createdby sampling2, 3, *, 0 with Box and Cox (1964) introduceda parametricfamily
replacement from the residuals,then letting z* = zl, for estimatinga transformationof the response in
and z4* = Oz 1 + si*, i = 2, ..., 120. Finally, after a regression. Given regression data {(xl, yi), * ,
centeringthe time series z*, z*, *-, 420, g is the (Xn,Yn)}, theirmodel takes the form
estimateof the autoregressiveparameterforthis new
(6.3) z(X) = xi * +
time series. (We could, if we wished, sample the *
froma fittednormaldistribution.) where z-(X) = (yi - 1)/X for X $ 0 and log yi for
A histogramof 1000 such bootstrapvalues k*f, 2 , X = 0, and e, iid N(0, (2). Estimates of X and d are
* io,/40 is shown in Fig. 10. foundbyminimizing
Ej (Z1 - 1 )2-
The bootstrapestimateof standard errorwas .055, Breiman and Friedman (1985) proposeda nonpara-
agreeingnicelywiththe usual formula.Note however metricsolutionforthis problem.Their so called ACE
66 B. EFRON AND R. TIBSHIRANI

sponse Y being numberof cycles to failure,and the


factorslengthoftest specimen(X1) (250, 300, and 350
250 mm),amplitudeof loading cycle (X2) (8, 9, or 10 mm),
and load (X3) (40, 45, or 50 g). As in Box and Cox, we
treatthe factorsas quantitiveand allow only a linear
200 termforeach. Box and Cox foundthat a logarithmic
transformation was appropriate,withtheirprocedure
150 producinga value of -.06 forXwithan estimated95%
confidenceintervalof (-.18, .06).
Fig. 13 shows the transformationselected by the
100 ACE algorithm.For comparison,the log functionis
plotted(normalized)on the same figure.
The similarityis trulyremarkable!In orderto assess
50
the variabilityof the ACE curve, we can apply the
bootstrap.Since the X matrixin this problemis fixed
by design,we resampledfromthe residuals instead of
1.1 1.2 1.3 1.4 1.5
fromthe (xi, y-) pairs. The bootstrapprocedurewas
FIG. 11. Bootstraphistogramof & &,*.,&o for the Wolfersun-
the following:
spot data, model(6.2). Calculate residuals , = s(y,) - xi.*3, i = 1, 2, ... , n.
Repeat B times
300 Choose a sample l , *,n
withreplacementfrom el, En
Calculate y* = s-'(x, - + i = 1, 2, n
Compute s*(.) = resultof ACE algorithm
applied to (x1, y ), .*., (xn, y*)
200 End
The number of bootstrap replications B was 20.
Note thatthe residualsare computedon the s(*) scale,
not the y scale, because it is on the s(*) scale that the
true residuals are assumed to be approximatelyiid.
100
The 20 estimated transformations, s *- -, S20(
are shown in Fig. 14.
The tightclusteringof the smooths indicates that
theoriginalestimates((.) has low variability,especially
forsmallervalues of Y. This agrees qualitativelywith
C -0.8 -0.7 -0.6 -0.5
: ? W } f I I I --T I I I

FIG. 12. Bootstraphistogramof 0*, ..., 6100for the Wolfersun-


I

2
spot data, model(6.2).

(alternatingconditional expectation) model general- Estimated


izes (6.3) to
(6.4) S(y1) = x- ~
0
wheres(.) is an unspecifiedsmooth function.(In its
Log Function
mostgeneralform,ACE allows fortransformations of
the covariates as well.) The functions(.) and param-
eterj3are estimatedin an alternatingfashion,utilizing
a nonparametricsmootherto estimates(-).
In the followingexample,taken fromFriedmanand
Tibshirani (1984), we compare the Box and Cox pro- -2
, , ,, I , I I , . , .
cedure to ACE and use the bootstrap to assess the o 1000 2000 3000
y
variabilityof ACE.
The data fromBox and Cox (1964) consist of a 3 X fromACE and the logfunction
FIG. 13. Estimated transformation
3qX 3qexpenriment on the strengthof yar-ns,the re- forBox and Cox example.
BOOTSTRAP METHODS FOR MEASURES OF STATISTICAL ACCURACY 67
TABLE 5
2- Exact and approximatecentral90% confidenceintervalsfor0, the
truecorrelationcoefficient,
fromthe law schoQldata ofFig. 1
1. Exact (normal theory) [.496, .898] R/L = .44
2. Standard (1.7) [.587, .965] R/L = 1.00
3. Transformedstandard [.508, .907] R/L = .49
4. Parametricbootstrap(BC) [.488, .900] R/L = .43
5. Nonparametricbootstrap(BCa) [.43, .92] R/L = .42

Note: R/L = ratio of rightside of interval,measuredfrom0 = .776,


-1 to left side. The exact interval is strikinglyasymmetricabout 0.
Section 8 discusses the nonparametricmethodof line 5.

-2/ strikingly different fromtheexactnormaltheoryin-


terval based on the assumption ofa bivariatenormal
0 1000 2000 3000
y sampling distribution F.
In this case, it is well knownthat it is betterto
FIG. 14. BootstrapreplicationsofACE transformations
forBox and
Cox example.
makethetransformation k= tanh-(O),X = tanh-(0),
apply(1.7) on theX scale,and thentransform backto
the0 scale.The resulting interval, line3 ofTable 5, is
theshortconfidence intervalforXin theBox and Cox movedcloserto the exactinterval.However,thereis
analysis. nothing automaticaboutthe tanh-1transformation.
For a different statisticfromthe correlation coeffi-
7. BOOTSTRAPCONFIDENCEINTERVALS cient or a different distributional family from the
bivariatenormal,wemightverywellneedothertricks
This sectionpresentsthreecloselyrelatedmethods to make(1.7) perform satisfactorily.
ofusingthebootstrap to setconfidence intervals. The The bootstrapcan be usedto produceapproximate
discussionis in termsof simpleparametricmodels, confidence intervalsin an automaticway.The follow-
wherethe logicalbasis of the bootstrapmethodsis ing discussionis abridgedfromEfron (1984 and
easiestto see. Section8 extendsthemethodsto mul- 1985)and Efron(1982a,Chapter10). Line 4 ofTable
tiparameter and nonparametric models. 5 showsthattheparametric bootstrapintervalforthe
We have discussedobtaininga, the estimatedstand- correlation coefficient0 is nearlyidenticalwiththe
ard errorof an estimator0. In practice,0 and a' are exactinterval."Parametric" in thiscase meansthat
usuallyusedtogether to formtheapproximate confi- the bootstrapalgorithmbegins fromthe bivariate
denceinterval0 E 0 ? 5z" , (1.7), wherez is the normalMLE FNORM, as forthe normaltheorycurve
100 - a percentile pointofa standardnormaldistri- ofFig. 2. This goodperformance is no accident.The
bution.The interval(1.7) is claimedto haveapproxi- bootstrapmethodused in line 4 in effecttransforms
matecoverageprobability 1 - 2a. For the law school 0 tothebest(mostnormal)scale,findstheappropriate
exampleof Section2, the values 0 = .776,& = .115, interval,and transforms this intervalback to the 0
Z(05) = -1.645, give0 E [.587,.965]as an approximate
scale.Allofthisis doneautomatically bythebootstrap
90% centralintervalforthe truecorrelation coeffi- algorithm, without requiringspecialintervention from
cient. the statistician. The pricepaid is a largeamountof
We willcall (1.7) thestandardinterval for0. When computing, perhapsB = 1000bootstrapreplications,
working withinparametric familieslikethebivariate as discussedin Section10.
normal,a'in (1.7) is usuallyobtainedbydifferentiating Define G(s) to be the parametricbootstrapcdf
the log likelihoodfunction,see Section 5a of Rao of0*,
(1973),althoughin thecontextofthispaperwemight
preferto use theparametric bootstrapestimateof a, (7.1) G(s) = Prob*4o*< SI,
e.g., 1NORM in Section2.
The standardintervalsare an immenselyuseful whereProb*indicatesprobability computed according
statisticaltool. Thev have the greatvirtueof being to the bootstrapdistribution of 0*. In Fig. 2 G(s) is
automatic:a computer programcan be written which obtainedby integrating thenormaltheorycurve.We
produces(1.7) directly fromthe data y and the form willpresentthreedifferent kindsofbootstrapconfi-
of the densityfunctionfory, withno further input denceintervalsin orderof increasinggenerality. All
required fromthestatistician. Nevertheless thestand- threemethodsuse percentiles ofG to definethecon-
ardintervals canbe quiteinaccurateas Table 5 shows. fidenceinterval.Theydiffer in whichpercentiles are
The standardinterval(1.7), using 1NORM,(2.5), is used.
68 B. EFRON AND R. TIBSHIRANI

The simplest method is to take θ ∈ [Ĝ⁻¹(α), Ĝ⁻¹(1 − α)] as an approximate 1 − 2α central interval for θ. This is called the percentile method in Section 10.4 of Efron (1982a). The percentile method interval is just the interval between the 100·α and 100·(1 − α) percentiles of the bootstrap distribution of θ̂*.

We will use the notation θ̂[α] for the α level endpoint of an approximate confidence interval for θ, so θ ∈ [θ̂[α], θ̂[1 − α]] is the central 1 − 2α interval. Subscripts will be used to indicate the various different methods. The percentile interval has endpoints

(7.2)  θ̂_P[α] ≡ Ĝ⁻¹(α).

This compares with the standard interval,

(7.3)  θ̂_S[α] = θ̂ + σ̂z^(α).

Lines 1 and 2 of Table 6 summarize these definitions.

Suppose the bootstrap cdf Ĝ is perfectly normal, say

(7.4)  Ĝ(s) = Φ((s − θ̂)/σ̂),

where Φ(s) = ∫_{−∞}^{s} (2π)^{−1/2} e^{−t²/2} dt, the standard normal cdf. In other words, suppose that θ̂* has bootstrap distribution N(θ̂, σ̂²). In this case the standard method and the percentile method agree, θ̂_S[α] = θ̂_P[α]. In situations like that of Fig. 2, where Ĝ is markedly non-normal, the standard interval is quite different from (7.2). Which is better?

To answer this question, consider the simplest possible situation, where for all θ

(7.5)  θ̂ ~ N(θ, σ²).

That is, we have a single unknown parameter θ with no nuisance parameters, and a single summary statistic θ̂ normally distributed about θ with constant standard error σ. In this case the parametric bootstrap cdf Ĝ is given by (7.4), so θ̂_S[α] = θ̂_P[α]. (The bootstrap estimate σ̂ equals σ.)

Suppose though that instead of (7.5) we have, for all θ,

(7.6)  φ̂ ~ N(φ, τ²)

for some monotone transformation φ̂ = g(θ̂), φ = g(θ), where τ is a constant. In the correlation coefficient example the function g was tanh⁻¹. The standard limits (1.7) can now be grossly inaccurate. However it is easy to verify that the percentile limits (7.2) are still correct. "Correct" here means that (7.2) is the mapping of the obvious interval for φ, φ̂ + τz^(α), back to the θ scale, θ̂_P[α] = g⁻¹(φ̂ + τz^(α)). It is also correct in the sense of having exactly the claimed coverage probability 1 − 2α.

Another way to state things is that the percentile intervals are transformation invariant,

(7.7)  φ̂_P[α] = g(θ̂_P[α])

for any monotone transformation g. This implies that if the percentile intervals are correct on some transformed scale φ = g(θ), then they must also be correct on the original scale θ. The statistician does not need to know the normalizing transformation g, only that it exists. Definition (7.2) automatically takes care of the bookkeeping involved in the use of normalizing transformations for confidence intervals.

Fisher's theory of maximum likelihood estimation says that we are always in situation (7.5) to a first order of asymptotic approximation. However, we are also in situation (7.6), for any choice of g, to the same order of approximation. Efron (1984 and 1985) uses higher order asymptotic theory to differentiate between the standard and bootstrap intervals. It is the higher order asymptotic terms which often make exact intervals strongly asymmetric about the MLE θ̂, as in Table 5. The bootstrap intervals are effective at capturing this asymmetry.

The percentile method automatically incorporates normalizing transformations, as in going from (7.5) to (7.6). It turns out that there are two other important ways that assumption (7.5) can be misleading, the first of which relates to possible bias in θ̂. For example consider f_θ(θ̂), the family of densities for the observed correlation coefficient θ̂ when sampling n = 15 times from a bivariate normal distribution with true correlation θ. In fact it is easy to see that no monotone mapping φ = g(θ), φ̂ = g(θ̂) transforms this family to φ̂ ~ N(φ, τ²), as in (7.6). If there were such a g, then Prob_θ{θ̂ < θ} = Prob_φ{φ̂ < φ} = .50, but for θ = .776 integrating the density function f_.776(t) gives Prob_.776{θ̂ < .776} = .431.

TABLE 6
Four methods of setting approximate confidence intervals for a real valued parameter θ

1. Standard:        θ̂_S[α] = θ̂ + σ̂z^(α).  Correct if θ̂ ~ N(θ, σ²), σ constant.

   There exists a monotone transformation φ̂ = g(θ̂), φ = g(θ) such that:
2. Percentile:      θ̂_P[α] = Ĝ⁻¹(α).  Correct if φ̂ ~ N(φ, τ²), τ constant.
3. Bias-corrected:  θ̂_BC[α] = Ĝ⁻¹(Φ(2z₀ + z^(α))).  Correct if φ̂ ~ N(φ − z₀τ, τ²); z₀, τ constant.
4. BCa:             θ̂_BCa[α] = Ĝ⁻¹(Φ(z₀ + (z₀ + z^(α))/(1 − a(z₀ + z^(α))))).  Correct if φ̂ ~ N(φ − z₀τ_φ, τ_φ²), τ_φ = 1 + aφ; z₀, a constant.

Note: Each method is correct under more general assumptions than its predecessor. Methods 2, 3, and 4 are defined in terms of the percentiles of Ĝ, the bootstrap distribution (7.1).
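Lines 1 and 2 of Table 6 translate into a few lines of code. The following is a minimal sketch, not the authors' program: it draws B nonparametric bootstrap replications of an illustrative statistic (the sample correlation, with made-up data and our own function names) and prints the standard interval (7.3) alongside the percentile interval (7.2); B = 1000 follows the rough guidance discussed later in the paper.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

def standard_and_percentile(x, y, stat, B=1000, alpha=0.05):
    """Standard interval (7.3) and percentile interval (7.2) for theta."""
    n = len(x)
    reps = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)      # resample the pairs with replacement
        reps[b] = stat(x[idx], y[idx])
    theta_hat, se_boot = stat(x, y), reps.std(ddof=1)
    z = NormalDist().inv_cdf(1 - alpha)       # z^(alpha), about 1.645 for alpha = .05
    standard = (theta_hat - z * se_boot, theta_hat + z * se_boot)
    percentile = tuple(np.quantile(reps, [alpha, 1 - alpha]))  # percentiles of G-hat
    return standard, percentile

corr = lambda u, v: np.corrcoef(u, v)[0, 1]
x = rng.normal(size=15)                        # illustrative bivariate sample,
y = 0.7 * x + 0.7 * rng.normal(size=15)        # not the law school data
print(standard_and_percentile(x, y, corr))
```

When the bootstrap histogram is skewed, the two intervals printed by this sketch visibly disagree, which is exactly the situation the remaining methods of Table 6 address.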
The bias-corrected percentile method (BC method), line 3 of Table 6, makes an adjustment for this type of bias. Let

(7.8)  z₀ ≡ Φ⁻¹(Ĝ(θ̂)),

where Φ⁻¹ is the inverse function of the standard normal cdf. The BC method has α level endpoint

(7.9)  θ̂_BC[α] ≡ Ĝ⁻¹(Φ(2z₀ + z^(α))).

Note: if Ĝ(θ̂) = .50, that is if half of the bootstrap distribution of θ̂* is less than the observed value θ̂, then z₀ = 0 and θ̂_BC[α] = θ̂_P[α]. Otherwise definition (7.9) makes a bias correction.

Section 10.7 of Efron (1982a) shows that the BC interval for θ is exactly correct if

(7.10)  φ̂ ~ N(φ − z₀τ, τ²)

for some monotone transformation φ = g(θ), φ̂ = g(θ̂) and some constant z₀. It does not look like (7.10) is much more general than (7.6), but in fact the bias correction is often important.

In the example of Table 5, the percentile method (7.2) gives central 90% interval [.536, .911] compared to the BC interval [.488, .900] and the exact interval [.496, .898]. By definition the endpoints of the exact interval satisfy

(7.11)  Prob_{θ=.496}{θ̂ > .776} = .05 = Prob_{θ=.898}{θ̂ < .776}.

The corresponding quantities for the BC endpoints are

(7.12)  Prob_{θ=.488}{θ̂ > .776} = .0465,  Prob_{θ=.900}{θ̂ < .776} = .0475,

compared to

(7.13)  Prob_{θ=.536}{θ̂ > .776} = .0725,  Prob_{θ=.911}{θ̂ < .776} = .0293,

for the percentile endpoints. The bias correction is quite important in equalizing the error probabilities at the two endpoints. If z₀ can be approximated accurately (as mentioned in Section 9), then it is preferable to use the BC intervals.

Table 7 shows a simple example where the BC method is less successful. The data consists of the single observation θ̂ ~ θ·(χ²₁₉/19), the notation indicating an unknown scale parameter θ times a random variable with distribution χ²₁₉/19. (This definition makes θ̂ unbiased for θ.) A confidence interval is desired for the scale parameter θ. In this case the BC interval based on θ̂ is a definite improvement over the standard interval (1.7), but goes only about half as far as it should toward achieving the asymmetry of the exact interval.

TABLE 7
Central 90% confidence intervals for θ having observed θ̂ ~ θ·(χ²₁₉/19)

1. Exact                [.631·θ̂, 1.88·θ̂]   R/L = 2.38
2. Standard (1.7)       [.466·θ̂, 1.53·θ̂]   R/L = 1.00
3. BC (7.9)             [.580·θ̂, 1.69·θ̂]   R/L = 1.64
4. BCa (7.15)           [.630·θ̂, 1.88·θ̂]   R/L = 2.37
5. Nonparametric BCa    [.640·θ̂, 1.68·θ̂]   R/L = 1.88

Note: The exact interval is sharply skewed to the right of θ̂; R/L is the ratio of the interval's right arm to its left arm, e.g., (1.88 − 1)/(1 − .631) = 2.38 for the exact interval. The BC method is only a partial improvement over the standard interval. The BCa interval, a = .108, agrees almost perfectly with the exact interval.

It turns out that the parametric family θ̂ ~ θ·(χ²₁₉/19) cannot be transformed into (7.10), not even approximately. The results of Efron (1982b) show that there does exist a monotone transformation g such that φ̂ = g(θ̂), φ = g(θ) satisfy to a high degree of approximation

(7.14)  φ̂ ~ N(φ − z₀τ_φ, τ_φ²)   (τ_φ = 1 + aφ).

The constants in (7.14) are z₀ = .1082, a = .1077.

The BCa method (Efron, 1984), line 4 of Table 6, is a method of assigning bootstrap confidence intervals which are exactly right for problems which can be mapped into form (7.14). This method has α level endpoint

(7.15)  θ̂_BCa[α] ≡ Ĝ⁻¹(Φ(z₀ + (z₀ + z^(α))/(1 − a(z₀ + z^(α))))).

If a = 0 then θ̂_BCa[α] = θ̂_BC[α], but otherwise the BCa intervals can be a substantial improvement over the BC method, as shown in Table 7.

The constant z₀ in (7.15) is given by z₀ = Φ⁻¹(Ĝ(θ̂)), (7.8), and so can be computed directly from the bootstrap distribution. How do we know a? It turns out that in one-parameter families f_θ(θ̂) a good approximation is

(7.16)  a ≈ SKEW_{θ=θ̂}(ℓ̇_θ(t))/6,

where SKEW_{θ=θ̂}(ℓ̇_θ(t)) is the skewness at parameter value θ = θ̂ of the score statistic ℓ̇_θ(t) = (∂/∂θ) log f_θ(t). For θ̂ ~ θ·(χ²₁₉/19) this gives a ≈ .1081, compared to the actual value a = .1077 derived in Efron (1984). For the normal theory correlation family of Table 5, a ≈ 0, which explains why the BC method, which takes a = 0, works so well there.
The advantage of formula (7.18) is that we need not know the transformation g leading to (7.14) in order to approximate a. In fact θ̂_BCa[α], like θ̂_BC[α] and θ̂_P[α], is transformation invariant, as in (7.7). Like the bootstrap methods, the BCa intervals are computed directly from the form of the density function f_θ(·), for θ near θ̂.

Formula (7.16) applies to the case where θ is the only parameter. Section 8 briefly discusses the more challenging problem of setting confidence intervals for a parameter θ in a multiparameter family, and also in nonparametric situations where the number of nuisance parameters is effectively infinite.

To summarize this section, the progression from the standard intervals to the BCa method is based on a series of increasingly less restrictive assumptions, as shown in Table 6. Each successive method in Table 6 requires the statistician to do a greater amount of computation: first the bootstrap distribution Ĝ, then the bias correction constant z₀, and finally the constant a. However, all of these computations are algorithmic in character, and can be carried out in an automatic fashion.

Chapter 10 of Efron (1982a) discusses several other ways of using the bootstrap to construct approximate confidence intervals, which will not be presented here. One of these methods, the "bootstrap t," was used in the blood serum example of Section 4.

8. NONPARAMETRIC AND MULTIPARAMETER CONFIDENCE INTERVALS

Section 7 focused on the simple case θ̂ ~ f_θ, where we have only a real valued parameter θ and a real valued summary statistic θ̂ from which we are trying to construct a confidence interval for θ. Various favorable properties of the bootstrap confidence intervals were demonstrated in the simple case, but of course the simple case is where we least need a general method like the bootstrap.

Now we will discuss the more common situation where there are nuisance parameters besides the parameter of interest θ; or even more generally the nonparametric case, where the number of nuisance parameters is effectively infinite. The discussion is limited to a few brief examples. Efron (1984 and 1985) develops the theoretical basis of bootstrap approximate confidence intervals for complicated situations, and gives many more examples. The word "approximate" is important here since exact nonparametric confidence intervals do not exist for most parameters (see Bahadur and Savage, 1956).

Example 1. Ratio Estimation

The data consists of y = (y₁, y₂), assumed to come from a bivariate normal distribution with unknown mean vector η and covariance matrix the identity,

(8.1)  y ~ N₂(η, I).

The parameter of interest, for which we desire a confidence interval, is the ratio

(8.2)  θ = η₂/η₁.

Fieller (1954) provided well known exact intervals for θ in this case. The Fieller intervals are based on a clever trick, which seems very special to situation (8.1), (8.2).

Table 8 shows Fieller's central 90% interval for θ having observed y = (8, 4). Also shown is the Fieller interval for φ = 1/θ = η₁/η₂, which equals [.76⁻¹, .29⁻¹], the obvious transformation of the interval for θ. The standard interval (1.7) is satisfactory for θ, but not for φ. Notice that the standard interval does not transform correctly from θ to φ.

TABLE 8
Central 90% confidence intervals for θ = η₂/η₁ and for φ = 1/θ having observed (y₁, y₂) = (8, 4) from a bivariate normal distribution y ~ N₂(η, I)

                           For θ          For φ
1. Exact (Fieller)         [.29, .76]     [1.32, 3.50]
2. Parametric boot (BC)    [.29, .76]     [1.32, 3.50]
3. Standard (1.7)          [.27, .73]     [1.08, 2.92]

MLE: θ̂ = .5, φ̂ = 2

Note: The BC intervals, line 2, are based on the parametric bootstrap distribution of θ̂ = y₂/y₁.

Line 2 shows the BC intervals based on applying definitions (7.8) and (7.9) to the parametric bootstrap distribution of θ̂ = y₂/y₁ (or φ̂ = y₁/y₂). This is the distribution of θ̂* = y₂*/y₁* when sampling y* = (y₁*, y₂*) from F̂_NORM ~ N₂((y₁, y₂), I). The bootstrap intervals transform correctly, and in this case they agree with the exact interval to three decimal places.

Example 2. Product of Normal Means

For most multiparameter situations, there do not exist exact confidence intervals for a single parameter of interest. Suppose for instance that (8.2) is changed to

(8.3)  θ = η₁η₂,

still assuming (8.1). Table 9 shows approximate intervals for θ, and also for φ = θ², having observed y = (2, 4). The "almost exact" intervals are based on an analog of Fieller's argument (Efron, 1985), which with suitable care can be carried through to a high degree of accuracy. Once again, the parametric BC intervals are a close match to line 1. The fact that the standard intervals do not transform correctly is particularly obvious here.
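For Example 1 the parametric bootstrap is one line: sample y* from N₂((y₁, y₂), I) and form θ̂* = y₂*/y₁*. The sketch below is our illustrative code, not the authors'; it applies (7.8) and (7.9) directly, and with y = (8, 4) the result should be close to line 2 of Table 8.

```python
import numpy as np
from statistics import NormalDist

Phi, Phi_inv = NormalDist().cdf, NormalDist().inv_cdf
rng = np.random.default_rng(2)

def ratio_bc_interval(y1, y2, B=5000, alpha=0.05):
    """BC interval (7.8)-(7.9) for theta = eta2/eta1, via the parametric
    bootstrap y* ~ N2((y1, y2), I) and theta* = y2*/y1*."""
    theta_hat = y2 / y1
    theta_star = rng.normal(y2, 1.0, B) / rng.normal(y1, 1.0, B)
    z0 = Phi_inv(np.mean(theta_star < theta_hat))            # (7.8)
    lo = np.quantile(theta_star, Phi(2 * z0 + Phi_inv(alpha)))
    hi = np.quantile(theta_star, Phi(2 * z0 + Phi_inv(1 - alpha)))
    return lo, hi                                            # (7.9)

print(ratio_bc_interval(8.0, 4.0))   # compare with line 2 of Table 8
```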
TABLE 9
Central 90% confidence intervals for θ = η₁η₂ and for φ = θ² having observed y = (2, 4), where y ~ N₂(η, I)

                           For θ             For φ
1. Almost exact            [1.77, 17.03]     [3.1, 290.0]
2. Parametric boot (BC)    [1.77, 17.12]     [3.1, 239.1]
3. Standard (1.7)          [0.64, 15.36]     [−53.7, 181.7]

MLE: θ̂ = 8, φ̂ = 64

Note: The almost exact intervals are based on the high order approximation theory of Efron (1985). The BC intervals of line 2 are based on the parametric bootstrap distribution of θ̂ = y₁y₂.

The good performance of the parametric BC intervals is not accidental. The theory developed in Efron (1985) shows that the BC intervals, based on bootstrapping the MLE θ̂, agree to high order with the almost exact intervals in the following class of problems: the data y comes from a multiparameter family of densities f_η(y), both y and η k-dimensional vectors; the real valued parameter of interest θ is a smooth function of η, θ = t(η); and the family f_η(y) can be transformed to multivariate normality, say

(8.4)  g(y) ~ N_k(h(η), I),

by some one-to-one transformations g and h.

Just as in Section 7, it is not necessary for the statistician to know the normalizing transformations g and h, only that they exist. The BC intervals are obtained directly from the original densities f_η: we find η̂ = η̂(y), the MLE of η; sample y* ~ f_η̂; compute θ̂*, the bootstrap MLE of θ; calculate Ĝ, the bootstrap cdf of θ̂*, usually by Monte Carlo sampling; and finally apply definitions (7.8) and (7.9). This process gives the same interval for θ whether or not the transformation to form (8.4) has been made.

Not all problems can be transformed as in (8.4) to a normal distribution with constant covariance. The case considered in Table 7 is a one-dimensional counterexample. As a result the BC intervals do not always work as well as in Tables 8 and 9, although they usually improve on the standard method. However, in order to take advantage of the BCa method, which is based on more general assumptions, we need to be able to calculate the constant a.

Efron (1984) gives expressions for "a" generalizing (7.16) to multiparameter families, and also to nonparametric situations. If (8.4) holds, then "a" will have value zero, and the BCa method reduces to the BC case. Otherwise the two intervals differ.

Here we will discuss only the nonparametric situation: the observed data y = (x₁, x₂, ..., xₙ) consists of iid observations X₁, X₂, ..., Xₙ ~ F, where F can be any distribution on the sample space 𝒳; we want a confidence interval for θ = t(F), some real valued functional of F; and the bootstrap intervals are based on bootstrapping θ̂ = t(F̂), which is the nonparametric MLE of θ. In this case a good approximation to the constant a is given in terms of the empirical influence function Uᵢ⁰, defined in Section 10 at (10.11),

(8.5)  a ≈ (1/6) · Σᵢ (Uᵢ⁰)³ / (Σᵢ (Uᵢ⁰)²)^{3/2}.

This is a convenient formula, since it is easy to numerically evaluate the Uᵢ⁰ by simply substituting a small value of ε into (10.11).

Example 3. The Law School Data

For θ the correlation coefficient, the values of Uᵢ⁰ corresponding to the 15 data points shown in Fig. 1 are −1.507, .168, .273, .004, .525, −.049, −.100, .477, .310, .004, −.526, −.091, .434, .125, −.048. (Notice how influential law school 1 is.) Formula (8.5) gives a ≈ −.0817. B = 100,000 bootstrap replications, about 100 times more than was actually necessary (see Section 10), gave ẑ₀ = −.0927, and the central 90% interval θ ∈ [.43, .92] shown in Table 5. The nonparametric BCa interval is quite reasonable in this example, particularly considering that there is no guarantee that the true law school distribution F is anywhere near bivariate normal.

Example 4. Mouse Leukemia Data (the First Example in Section 3)

The standard central 90% interval for β in formula (3.1) is [.835, 2.18]. The bias correction constant ẑ₀ = .0275, giving BC interval [1.00, 2.39]. This is shifted far right of the standard interval, reflecting the long right tail of the bootstrap histogram seen in Fig. 3. We can calculate "a" from (8.5), considering each of the n = 42 data points to be a triple (yᵢ, xᵢ, δᵢ): a = −.152. Because a is negative, the BCa interval is shifted back to the left, equaling [.788, 2.10]. This contrasts with the law school example, where a, z₀, and the skewness of the bootstrap distribution added to each other rather than cancelling out, resulting in a BCa interval much different from the standard interval.

Efron (1984) provides some theoretical support for the nonparametric BCa method. However the problem of setting approximate nonparametric confidence intervals is still far from well understood, and all methods should be interpreted with some caution. We end this section with a cautionary example.

Example 5. The Variance

Suppose 𝒳 is the real line, and θ = Var_F X, the variance. Line 5 of Table 2 shows the result of applying the nonparametric BCa method to data sets x₁, x₂, ..., x₂₀ which were actually iid samples from a N(0, 1) distribution. The number .640 for example is the average of θ̂_BCa[.05]/θ over 40 such data sets, B = 4000 bootstrap replications per data set. The upper limit 1.68·θ is noticeably small, as pointed out by Schenker (1985). The reason is simple: the nonparametric bootstrap distribution of θ̂* has a short upper tail, compared to the parametric bootstrap distribution, which is a scaled χ²₁₉ random variable. The results of Beran (1984), Bickel and Freedman (1981), and Singh (1981) show that the nonparametric bootstrap distribution is highly accurate asymptotically, but of course that is not a guarantee of good small sample behavior. Bootstrapping from a smoothed version of F̂, as in lines 3, 4, and 5 of Table 2, alleviates the problem in this particular example.
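Formula (8.5), together with the finite-ε recipe for the Uᵢ⁰, takes only a few lines of code. The sketch below is ours; the weighted-variance functional and the N(0, 1) data are illustrative stand-ins for the θ(p) of (10.11).

```python
import numpy as np

def empirical_influence(stat, data, eps=1e-4):
    """U_i^0 of (10.11), approximated with a small epsilon: the derivative
    of theta(p) at p0 = (1/n, ..., 1/n) in the direction of point i."""
    n = len(data)
    p0 = np.full(n, 1.0 / n)
    base = stat(data, p0)
    U = np.empty(n)
    for i in range(n):
        p = (1 - eps) * p0
        p[i] += eps
        U[i] = (stat(data, p) - base) / eps
    return U

def accel_a(stat, data):
    """Acceleration constant via (8.5): (1/6) sum(U^3) / (sum(U^2))^{3/2}."""
    U = empirical_influence(stat, data)
    return np.sum(U**3) / (6.0 * np.sum(U**2) ** 1.5)

def weighted_var(data, p):
    """The variance functional theta(p), an illustrative statistic."""
    m = np.sum(p * data)
    return np.sum(p * (data - m) ** 2)

rng = np.random.default_rng(3)
x = rng.normal(size=20)
print(accel_a(weighted_var, x))
```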
9. BOOTSTRAP SAMPLE SIZES

How many bootstrap replications must we take? Consider the standard error estimate σ̂_B based on B bootstrap replications, (2.4). As B → ∞, σ̂_B approaches σ̂, the bootstrap estimate of standard error as originally defined in (2.3). Because F̂ does not estimate F perfectly, σ̂ = σ(F̂) will have a non-zero coefficient of variation for estimating the true standard error σ = σ(F); σ̂_B will have a larger CV because of the randomness added by the Monte Carlo bootstrap sampling. It is easy to derive the following approximation,

(9.1)  CV(σ̂_B) ≈ {CV(σ̂)² + (E(δ̂) + 2)/(4B)}^{1/2},

where δ̂ is the kurtosis of the bootstrap distribution of θ̂*, given the data y, and E{δ̂} its expected value averaged over y. For typical situations, CV(σ̂) lies between .10 and .30. For example, if θ̂ = x̄, n = 20, Xᵢ ~iid N(0, 1), then CV(σ̂) ≈ .16.

Table 10 shows CV(σ̂_B) for various values of B and CV(σ̂), assuming E{δ̂} = 0 in (9.1). For values of CV(σ̂) ≥ .10, there is little improvement past B = 100. In fact B as small as 25 gives reasonable results. Even smaller values of B can be quite informative, as we saw in the Stanford Heart Transplant Data (Fig. 7 of Section 3).

TABLE 10
Coefficient of variation of σ̂_B, the bootstrap estimate of standard error based on B Monte Carlo replications, as a function of B and CV(σ̂), the limiting CV as B → ∞

               B = 25   B = 50   B = 100   B = 200   B = ∞
CV(σ̂) = .25    .29      .27      .26       .25       .25
         .20    .24      .22      .21       .21       .20
         .15    .21      .18      .17       .16       .15
         .10    .17      .14      .12       .11       .10
         .05    .15      .11      .09       .07       .05
          0     .14      .10      .07       .05        0

Note: Based on (9.1), assuming E{δ̂} = 0.

The situation is quite different for setting bootstrap confidence intervals. The calculations of Efron (1984), Section 8, show that B = 1000 is a rough minimum for the number of Monte Carlo bootstraps necessary to compute the BC or BCa intervals. Somewhat smaller values, say B = 250, can give a useful percentile interval, the difference being that then the constant z₀ need not be computed. Confidence intervals are a fundamentally more ambitious measure of statistical accuracy than standard errors, so it is not surprising that they require more computational effort.
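Formula (9.1) is simple enough to tabulate directly. The sketch below is our transcription of it, under the same assumption E{δ̂} = 0; the printed rows should reproduce the pattern of Table 10.

```python
import math

def cv_sigma_B(cv_inf, B, kurtosis=0.0):
    """CV of the Monte Carlo estimate sigma_B, formula (9.1), as a function
    of the limiting CV (B -> infinity) and the expected kurtosis E{delta}."""
    return math.sqrt(cv_inf**2 + (kurtosis + 2.0) / (4.0 * B))

for cv in (0.25, 0.20, 0.15, 0.10, 0.05, 0.0):
    row = [round(cv_sigma_B(cv, B), 2) for B in (25, 50, 100, 200)]
    print(cv, row)   # compare with the corresponding row of Table 10
```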

10. THE JACKKNIFE AND THE DELTA METHOD

This section returns to the simple case of assigning a standard error to θ̂(y), where y = (x₁, ..., xₙ) is obtained by random sampling from a single unknown distribution, X₁, ..., Xₙ ~iid F. We will give another description of the bootstrap estimate σ̂, which illustrates the bootstrap's relationship to older techniques of assigning standard errors, like the jackknife and the delta method.

For a given bootstrap sample y* = (x₁*, x₂*, ..., xₙ*), as described in step (i) of the algorithm in Section 2, let Pᵢ* indicate the proportion of the bootstrap sample equal to xᵢ,

(10.1)  Pᵢ* = #{xⱼ* = xᵢ}/n,   i = 1, 2, ..., n,

and P* = (P₁*, P₂*, ..., Pₙ*). The vector P* has a rescaled multinomial distribution,

(10.2)  P* ~ Multₙ(n, p⁰)/n   (p⁰ = (1/n, 1/n, ..., 1/n)),

where the notation indicates the proportions observed from n random draws on n categories, each with probability 1/n.

For n = 3 there are 10 possible bootstrap vectors P*. These are indicated in Fig. 15 along with their multinomial probabilities from (10.2). For example, P* = (1/3, 0, 2/3), corresponding to x* = (x₁, x₃, x₃) or any permutation of these values, has bootstrap probability 1/9.

[FIG. 15. The bootstrap and jackknife sampling points in the case n = 3. The bootstrap points (·) are shown with their probabilities.]

To make our discussion easier suppose that the statistic of interest θ̂ is of functional form: θ̂ = θ(F̂), where θ(F) is a functional assigning a real number to any distribution F on the sample space 𝒳. The mean, the correlation coefficient, and the trimmed mean are all of functional form. Statistics of functional form have the same value as a function of F̂, no matter what the sample size n may be, which is convenient for discussing the jackknife and delta method.

For any vector p = (p₁, p₂, ..., pₙ) having non-negative weights summing to 1, define the weighted empirical distribution

(10.3)  F̂(p): probability pᵢ on xᵢ,   i = 1, ..., n.
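The n = 3 picture can be checked by brute force. The sketch below (our code, with an arbitrary toy statistic) enumerates all 10 rescaled multinomial vectors P* with their probabilities from (10.2), and computes the exact bootstrap standard error as a weighted finite sum, anticipating (10.5) below.

```python
import itertools
import math

n = 3
x = [1.0, 2.0, 6.0]                                       # toy data set
theta = lambda p: sum(pi * xi for pi, xi in zip(p, x))    # theta(p): weighted mean

points = []
for counts in itertools.product(range(n + 1), repeat=n):
    if sum(counts) != n:
        continue
    # Multinomial probability of these resampling counts under (10.2)
    prob = math.factorial(n) / math.prod(math.factorial(c) for c in counts) / n**n
    points.append(([c / n for c in counts], prob))

assert len(points) == 10                  # the 10 vectors of Fig. 15
mean = sum(prob * theta(p) for p, prob in points)
var = sum(prob * (theta(p) - mean) ** 2 for p, prob in points)
print(math.sqrt(var))                     # exact sigma-hat, the finite sum of (10.5)
```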
For p = p⁰ = (1/n, 1/n, ..., 1/n), the weighted empirical distribution equals F̂, (1.4). Corresponding to p is a resampled value of θ̂,

(10.4)  θ̂(p) ≡ θ(F̂(p)).

The shortened notation θ̂(p) assumes that the data (x₁, x₂, ..., xₙ) is considered fixed. Notice that θ̂(p⁰) = θ(F̂) is the observed value of the statistic of interest. The bootstrap estimate σ̂, (2.3), can then be written

(10.5)  σ̂ = [var* θ̂(P*)]^{1/2},

where var* indicates variance with respect to distribution (10.2). In terms of Fig. 15, σ̂ is the standard deviation of the ten possible bootstrap values θ̂(p*), weighted as shown.

It looks like we could always calculate σ̂ simply by doing a finite sum. Unfortunately, the number of bootstrap points is C(2n − 1, n), 77,558,760 for n = 15, so straightforward calculation of σ̂ is usually impractical. That is why we have emphasized Monte Carlo approximations to σ̂. Therneau (1983) considers the question of methods more efficient than pure Monte Carlo, but at present there is no generally better method available.

However, there is another approach to approximating (10.5). We can replace the usually complicated function θ̂(p) by an approximation linear in p, and then use the well known formula for the multinomial variance of a linear function. The jackknife approximation θ̂_J(p) is the linear function of p which matches θ̂(p), (10.4), at the n points corresponding to the deletion of a single xᵢ from the observed data set,

(10.6)  p_(i) ≡ (1, 1, ..., 1, 0, 1, ..., 1)/(n − 1),   i = 1, 2, ..., n,

the 0 occurring in the ith coordinate. Fig. 15 indicates the jackknife points for n = 3; because θ̂ is of functional form, (10.4), it does not matter that the jackknife points correspond to sample size n − 1 rather than n.

The linear function θ̂_J(p) is calculated to be

(10.7)  θ̂_J(p) = θ̂_(·) + (p − p⁰)′U,

where, in terms of θ̂_(i) ≡ θ̂(p_(i)) and θ̂_(·) = Σᵢ θ̂_(i)/n, U is the vector with ith coordinate

(10.8)  Uᵢ = (n − 1)(θ̂_(·) − θ̂_(i)).

The jackknife estimate of standard error (Tukey, 1958; Miller, 1974) is

(10.9)  σ̂_J ≡ [((n − 1)/n) Σᵢ (θ̂_(i) − θ̂_(·))²]^{1/2} = [Σᵢ Uᵢ²/(n(n − 1))]^{1/2}.

A standard multinomial calculation gives the following theorem (Efron, 1982a).

THEOREM. The jackknife estimate of standard error equals [n/(n − 1)]^{1/2} times the bootstrap estimate of standard error for θ̂_J,

(10.10)  σ̂_J = [(n/(n − 1)) var* θ̂_J(P*)]^{1/2}.

In other words, the jackknife estimate is itself almost a bootstrap estimate, applied to a linear approximation of θ̂. The factor [n/(n − 1)]^{1/2} in (10.10) makes σ̂_J² unbiased for σ² in the case where θ̂ = x̄, the sample mean. We could multiply the bootstrap estimate σ̂ by this same factor, and achieve the same unbiasedness, but there does not seem to be any consistent advantage to doing so. The jackknife requires n, rather than B = 50 to 200, resamples, at the expense of adding a linear approximation to the standard error estimate. Tables 1 and 2 indicate that there is some estimating efficiency lost in making this approximation. For statistics like the sample median which are difficult to approximate linearly, the jackknife is useless (see Section 3.4 of Efron, 1982a).
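As a check on (10.9), here is a direct transcription (our sketch, with illustrative data); for θ̂ = x̄ the jackknife standard error reproduces the usual estimate (1.3).

```python
import numpy as np

def jackknife_se(stat, data):
    """Jackknife estimate of standard error, formula (10.9)."""
    n = len(data)
    theta_i = np.array([stat(np.delete(data, i)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((theta_i - theta_i.mean()) ** 2))

rng = np.random.default_rng(4)
x = rng.normal(size=20)
# For the sample mean, (10.9) agrees exactly with the textbook estimate (1.3):
print(jackknife_se(np.mean, x), x.std(ddof=1) / np.sqrt(len(x)))
```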
There is a more obvious linear approximation to θ̂(p) than θ̂_J(p). Why not use the first-order Taylor series expansion for θ̂(p) about the point p = p⁰? This is the idea of Jaeckel's infinitesimal jackknife (1972). The Taylor series approximation turns out to be

θ̂_T(p) ≡ θ̂(p⁰) + (p − p⁰)′U⁰,

where

(10.11)  Uᵢ⁰ = lim_{ε→0} [θ̂((1 − ε)p⁰ + εδᵢ) − θ̂(p⁰)]/ε,

δᵢ being the ith coordinate vector. This suggests the infinitesimal jackknife estimate of standard error

(10.12)  σ̂_IJ ≡ [var* θ̂_T(P*)]^{1/2} = [Σᵢ (Uᵢ⁰)²/n²]^{1/2},

with var* still indicating variance under (10.2). The ordinary jackknife can be thought of as taking ε = −1/(n − 1) in the definition of Uᵢ⁰, while the infinitesimal jackknife lets ε → 0, thereby earning the name.

The Uᵢ⁰ are values of what Mallows (1974) calls the empirical influence function. Their definition is a nonparametric estimate of the true influence function

IF(x) = lim_{ε→0} [θ((1 − ε)F + εδₓ) − θ(F)]/ε,

δₓ being the degenerate distribution putting mass 1 on x. The right side of (10.12) is then the obvious estimate of the influence function approximation to the standard error of θ̂ (Hampel, 1974), σ(F) ≈ [∫ IF²(x) dF(x)/n]^{1/2}. The empirical influence function method and the infinitesimal jackknife give identical estimates of standard error.

How have statisticians gotten along for so many years without methods like the jackknife and the bootstrap? The answer is the delta method, which is still the most commonly used device for approximating standard errors. The method applies to statistics of the form t(Q̄₁, Q̄₂, ..., Q̄_A), where t(·, ·, ..., ·) is a known function and each Q̄_a is an observed average, Q̄_a = Σᵢ Q_a(xᵢ)/n. For example, the correlation θ̂ is a function of A = 5 such averages: the average of the first coordinate values, the second coordinates, the first coordinates squared, the second coordinates squared, and the cross-products.

In its nonparametric formulation, the delta method works by (a) expanding t in a linear Taylor series about the expectations of the Q̄_a; (b) evaluating the standard error of the Taylor series using the usual expressions for variances and covariances of averages; and (c) substituting γ(F̂) for any unknown quantity γ(F) occurring in (b). For example, the nonparametric delta method estimates the standard error of the correlation θ̂ by

[θ̂²/(4n) · (μ̂₄₀/μ̂₂₀² + μ̂₀₄/μ̂₀₂² + 2μ̂₂₂/(μ̂₂₀μ̂₀₂) + 4μ̂₂₂/μ̂₁₁² − 4μ̂₃₁/(μ̂₁₁μ̂₂₀) − 4μ̂₁₃/(μ̂₁₁μ̂₀₂))]^{1/2},

where, in terms of xᵢ = (yᵢ, zᵢ), μ̂_gh = Σᵢ (yᵢ − ȳ)^g (zᵢ − z̄)^h/n (Cramér (1946), p. 359).

THEOREM. For statistics of the form θ̂ = t(Q̄₁, ..., Q̄_A), the nonparametric delta method and the infinitesimal jackknife give the same estimate of standard error (Efron, 1982c).

The infinitesimal jackknife, the delta method, and the empirical influence function approach are three names for the same method. Notice that the results reported in line 7 of Table 2 show a severe downward bias. Efron and Stein (1981) show that the ordinary jackknife is always biased upward, in a sense made precise in that paper. In the authors' opinion the ordinary jackknife is the method of choice if one does not want to do the bootstrap computations.
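The theorem can be checked numerically. The following is an illustrative sketch (our code, with made-up data, not the authors'): it evaluates the displayed delta method formula for the correlation, and the infinitesimal jackknife (10.12) with the Uᵢ⁰ of (10.11) approximated by a small ε; the two standard errors should agree to several decimal places.

```python
import numpy as np

def mu(y, z, g, h):
    """Central sample moment mu_gh = sum((y - ybar)^g (z - zbar)^h)/n."""
    return np.mean((y - y.mean()) ** g * (z - z.mean()) ** h)

def delta_se_corr(y, z):
    """Nonparametric delta method SE of the correlation (Cramer, 1946, p. 359)."""
    n, rho = len(y), np.corrcoef(y, z)[0, 1]
    m20, m02, m11 = mu(y, z, 2, 0), mu(y, z, 0, 2), mu(y, z, 1, 1)
    bracket = (mu(y, z, 4, 0) / m20**2 + mu(y, z, 0, 4) / m02**2
               + 2 * mu(y, z, 2, 2) / (m20 * m02)
               + 4 * mu(y, z, 2, 2) / m11**2
               - 4 * mu(y, z, 3, 1) / (m11 * m20)
               - 4 * mu(y, z, 1, 3) / (m11 * m02))
    return np.sqrt(rho**2 * bracket / (4 * n))

def ij_se_corr(y, z, eps=1e-6):
    """Infinitesimal jackknife SE (10.12), via a finite-eps version of (10.11)."""
    n = len(y)
    p0 = np.full(n, 1.0 / n)
    def corr_w(p):                      # the correlation as a functional theta(p)
        my, mz = p @ y, p @ z
        c = p @ ((y - my) * (z - mz))
        return c / np.sqrt((p @ (y - my) ** 2) * (p @ (z - mz) ** 2))
    base = corr_w(p0)
    U = np.empty(n)
    for i in range(n):
        p = (1 - eps) * p0
        p[i] += eps
        U[i] = (corr_w(p) - base) / eps
    return np.sqrt(np.sum(U**2)) / n

rng = np.random.default_rng(5)
y = rng.normal(size=15)
z = 0.7 * y + 0.7 * rng.normal(size=15)
print(delta_se_corr(y, z), ij_se_corr(y, z))   # should agree closely
```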
ACKNOWLEDGMENT

This paper is based on a previous review article appearing in Behaviormetrika. The authors and Editor are grateful to that journal for graciously allowing this revision to appear here.

REFERENCES

ANDERSON, O. D. (1975). Time Series Analysis and Forecasting: The Box-Jenkins Approach. Butterworth, London.
BAHADUR, R. and SAVAGE, L. (1956). The nonexistence of certain statistical procedures in nonparametric problems. Ann. Math. Statist. 27 1115-1122.
BERAN, R. (1984). Bootstrap methods in statistics. Jahrb. Math. Ver. 86 14-30.
BICKEL, P. J. and FREEDMAN, D. A. (1981). Some asymptotic theory for the bootstrap. Ann. Statist. 9 1196-1217.
BOX, G. E. P. and COX, D. R. (1964). An analysis of transformations. J. R. Statist. Soc. Ser. B 26 211-252.
BREIMAN, L. and FRIEDMAN, J. H. (1985). Estimating optimal transformations for multiple regression and correlation. J. Amer. Statist. Assoc. 80 580-619.
COX, D. R. (1972). Regression models and life tables. J. R. Statist. Soc. Ser. B 34 187-202.
CRAMÉR, H. (1946). Mathematical Methods of Statistics. Princeton University Press, Princeton, New Jersey.
EFRON, B. (1979a). Bootstrap methods: another look at the jackknife. Ann. Statist. 7 1-26.
EFRON, B. (1979b). Computers and the theory of statistics: thinking the unthinkable. Soc. Ind. Appl. Math. 21 460-480.
EFRON, B. (1981a). Censored data and the bootstrap. J. Amer. Statist. Assoc. 76 312-319.
EFRON, B. (1981b). Nonparametric estimates of standard error: the jackknife, the bootstrap, and other resampling methods. Biometrika 68 589-599.
EFRON, B. (1982a). The jackknife, the bootstrap, and other resampling plans. Soc. Ind. Appl. Math. CBMS-Natl. Sci. Found. Monogr. 38.
EFRON, B. (1982b). Transformation theory: how normal is a one parameter family of distributions? Ann. Statist. 10 323-339.
EFRON, B. (1982c). Maximum likelihood and decision theory. Ann. Statist. 10 340-356.
EFRON, B. (1983). Estimating the error rate of a prediction rule: improvements in cross-validation. J. Amer. Statist. Assoc. 78 316-331.
EFRON, B. (1984). Better bootstrap confidence intervals. Tech. Rep., Stanford Univ. Dept. Statist.
EFRON, B. (1985). Bootstrap confidence intervals for a class of parametric problems. Biometrika 72 45-58.
EFRON, B. and GONG, G. (1983). A leisurely look at the bootstrap, the jackknife, and cross-validation. Amer. Statistician 37 36-48.
EFRON, B. and STEIN, C. (1981). The jackknife estimate of variance. Ann. Statist. 9 586-596.
FIELLER, E. C. (1954). Some problems in interval estimation. J. R. Statist. Soc. Ser. B 16 175-183.
FRIEDMAN, J. H. and STUETZLE, W. (1981). Projection pursuit regression. J. Amer. Statist. Assoc. 76 817-823.
FRIEDMAN, J. H. and TIBSHIRANI, R. J. (1984). The monotone smoothing of scatterplots. Technometrics 26 243-250.
HAMPEL, F. R. (1974). The influence curve and its role in robust estimation. J. Amer. Statist. Assoc. 69 383-393.
HASTIE, T. J. and TIBSHIRANI, R. J. (1985). Discussion of Peter Huber's "Projection Pursuit." Ann. Statist. 13 502-508.
HINKLEY, D. V. (1978). Improving the jackknife with special reference to correlation estimation. Biometrika 65 13-22.
HYDE, J. (1980). Survival analysis with incomplete observations. In Biostatistics Casebook. Wiley, New York.
JAECKEL, L. (1972). The infinitesimal jackknife. Memorandum MM 72-1215-11, Bell Laboratories, Murray Hill, New Jersey.
JOHNSON, N. and KOTZ, S. (1970). Continuous Univariate Distributions, Vol. 2. Houghton Mifflin, Boston.
KAPLAN, E. L. and MEIER, P. (1958). Nonparametric estimation from incomplete samples. J. Amer. Statist. Assoc. 53 457-481.
KIEFER, J. and WOLFOWITZ, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Statist. 27 887-906.
MALLOWS, C. (1974). On some topics in robustness. Memorandum, Bell Laboratories, Murray Hill, New Jersey.
MILLER, R. G. (1974). The jackknife - a review. Biometrika 61 1-17.
MILLER, R. G. and HALPERN, J. (1982). Regression with censored data. Biometrika 69 521-531.
RAO, C. R. (1973). Linear Statistical Inference and Its Applications. Wiley, New York.
SCHENKER, N. (1985). Qualms about bootstrap confidence intervals. J. Amer. Statist. Assoc. 80 360-361.
SINGH, K. (1981). On the asymptotic accuracy of Efron's bootstrap. Ann. Statist. 9 1187-1195.
THERNEAU, T. (1983). Variance reduction techniques for the bootstrap. Ph.D. thesis, Stanford University, Department of Statistics.
TIBSHIRANI, R. J. and HASTIE, T. J. (1984). Local likelihood estimation. Tech. Rep. 97, Stanford Univ. Dept. Statist.
TUKEY, J. (1958). Bias and confidence in not quite large samples, abstract. Ann. Math. Statist. 29 614.

Comment
J. A. Hartigan

J. A. Hartigan is Eugene Higgins Professor of Statistics, Yale University, Box 2179 Yale Station, New Haven, CT 06520.

Efron and Tibshirani are to be congratulated on a wide-ranging, persuasive survey of the many uses of the bootstrap technology. They are a bit cagey on what is or is not a bootstrap, but the description at the end of Section 4 seems to cover all the cases: some data y comes from an unknown probability distribution F; it is desired to estimate the distribution of some function R(y, F) given F; and this is done by estimating the distribution of R(y*, F̂) given F̂, where F̂ is an estimate of F based on y, and y* is sampled from the known F̂.

There will be three problems in any application of the bootstrap: (1) how to choose the estimate F̂? (2) how much sampling of y* from F̂? and (3) how close is the distribution of R(y*, F̂) given F̂ to that of R(y, F) given F?

Efron and Tibshirani suggest a variety of estimates F̂ for simple random sampling, regression, and autoregression; their remarks about (3) are confined mainly to empirical demonstrations of the bootstrap in specific situations.

I have some general reservations about the bootstrap based on my experiences with subsampling techniques (Hartigan, 1969, 1975). Let X₁, ..., Xₙ be a random sample from a distribution F, let Fₙ be the empirical distribution, and suppose that t(Fₙ) is an estimate of some population parameter t(F). The statistic t(Fₙ) is computed for several random subsamples (each observation appearing in the subsample with probability ½), and the set of t(Fₙ) values obtained is regarded as a sample from the posterior distribution of t(F). For example, the standard deviation of the t(Fₙ) is an estimate of the standard error of t(Fₙ) from t(F); however, the procedure is not restricted to real valued t.

The procedure seems to work not too badly in getting at the first- and second-order behaviors of t(Fₙ) when t(Fₙ) is near normal, but it is not effective in handling third-order behavior, bias, and skewness. Thus there is not much point in taking huge samples of t(Fₙ) since the third-order behavior is not relevant; and if the procedure works only for t(Fₙ) near normal, there are less fancy procedures for estimating standard error, such as dividing the sample up into 10 subsamples of equal size and computing their standard deviation. (True, this introduces more bias than having random subsamples each containing about half the observations.) Indeed, even if t(Fₙ) is not normal, we can obtain exact confidence intervals for the median of t(F_{n/10}) using the 10 subsamples. Even five subsamples will give a respectable idea of the standard error.

Transferring back to the bootstrap: (A) is the boot-
