Bootstrap Methods For Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy
B. Efron and R. Tibshirani
… $\hat F$ indicate the empirical probability distribution,

(1.4)  $\hat F$: probability mass $1/n$ on $x_1, x_2, \cdots, x_n$.

Then we can simply replace $F$ by $\hat F$ in (1.2), obtaining

(1.5)  $\hat\sigma = \sigma(\hat F) = [\mu_2(\hat F)/n]^{1/2}$

as the estimated standard error for $\bar x$. This is the bootstrap estimate. The reason for the name "bootstrap" will be apparent in Section 2, when we evaluate $\sigma(\hat F)$ for statistics more complicated than $\bar x$. Since

(1.6)  $\mu_2(\hat F) = \sum_{i=1}^{n} (x_i - \bar x)^2/n$,

$\hat\sigma$ is not quite the same as $\bar\sigma$, but the difference is too small to be important in most applications.

Of course we do not really need an alternative formula to (1.3) in this case. The trouble begins when we want a standard error for estimators more complicated than $\bar x$, for example, a median or a correlation or a slope coefficient from a robust regression. In most cases there is no equivalent to formula (1.2), which expresses the standard error $\sigma(F)$ as a simple function of the sampling distribution $F$. As a result, formulas like (1.3) do not exist for most statistics.

This is where the computer comes in. It turns out that we can always numerically evaluate the bootstrap estimate $\hat\sigma = \sigma(\hat F)$, without knowing a simple expression for $\sigma(F)$. The evaluation of $\hat\sigma$ is a straightforward Monte Carlo exercise described in the next section. In a good computing environment, as described in the remarks in Section 2, the bootstrap effectively gives the statistician a simple formula like (1.3) for any statistic, no matter how complicated.

Standard errors are crude but useful measures of statistical accuracy. They are frequently used to give approximate confidence intervals for an unknown parameter $\theta$,

(1.7)  $\theta \in \hat\theta \pm \hat\sigma z^{(\alpha)}$,

where $z^{(\alpha)}$ is the $100 \cdot \alpha$ percentile point of a standard normal variate, e.g., $z^{(.95)} = 1.645$. Interval (1.7) is sometimes good, and sometimes not so good. Sections 7 and 8 discuss a more sophisticated use of the bootstrap, which gives better approximate confidence intervals than (1.7).

The standard interval (1.7) is based on taking literally the large sample normal approximation $(\hat\theta - \theta)/\hat\sigma \sim N(0, 1)$. Applied statisticians use a variety of tricks to improve this approximation. For instance if $\theta$ is the correlation coefficient and $\hat\theta$ the sample correlation, then the transformation $\phi = \tanh^{-1}(\theta)$, $\hat\phi = \tanh^{-1}(\hat\theta)$ greatly improves the normal approximation, at least in those cases where the underlying sampling distribution is bivariate normal. The correct tactic then is to transform, compute the interval (1.7) for $\phi$, and transform this interval back to the $\theta$ scale.

We will see that bootstrap confidence intervals can automatically incorporate tricks like this, without requiring the data analyst to produce special techniques, like the $\tanh^{-1}$ transformation, for each new situation. An important theme of what follows is the substitution of raw computing power for theoretical analysis. This is not an argument against theory, of course, only against unnecessary theory. Most common statistical methods were developed in the 1920s and 1930s, when computation was slow and expensive. Now that computation is fast and cheap we can hope for and expect changes in statistical methodology. This paper discusses one such potential change; Efron (1979b) discusses several others.

2. THE BOOTSTRAP ESTIMATE OF STANDARD ERROR

This section presents a more careful description of the bootstrap estimate of standard error. For now we will assume that the observed data $y = (x_1, x_2, \cdots, x_n)$ consists of independent and identically distributed (iid) observations $X_1, X_2, \cdots, X_n \sim_{\mathrm{iid}} F$, as in (1.1). Here $F$ represents an unknown probability distribution on $\mathcal{X}$, the common sample space of the observations. We have a statistic of interest, say $\hat\theta(y)$, to which we wish to assign an estimated standard error.

Fig. 1 shows an example. The sample space $\mathcal{X}$ is $\mathcal{R}^2_+$, the positive quadrant of the plane. We have observed $n = 15$ bivariate data points, each corresponding to an American law school. Each point $x_i$ consists of two summary statistics for the 1973 entering class at law school $i$,

(2.1)  $x_i = (\mathrm{LSAT}_i, \mathrm{GPA}_i)$;

$\mathrm{LSAT}_i$ is the class' average score on a nationwide exam called "LSAT"; $\mathrm{GPA}_i$ is the class' average undergraduate grades. The observed Pearson correlation …

FIG. 1. The law school data (Efron, 1979b). The data points, beginning with School 1, are (576, 3.39), (635, 3.30), (558, 2.81), (578, 3.03), (666, 3.44), (580, 3.07), (555, 3.00), (661, 3.43), (651, 3.36), (605, 3.13), (653, 3.12), (575, 2.74), (545, 2.76), (572, 2.88), (594, 2.96). [Scatter plot; LSAT on the horizontal axis, GPA on the vertical axis.]
… statistic $\hat\theta(\cdot)$ based on a random sample of size $n$ from the unknown distribution $F$. The bootstrap estimate $\hat\sigma$ is actually $\sigma(\hat F, n, \hat\theta)$ evaluated at $F = \hat F$. The Monte …

FIG. 2. Histogram of B = 1000 bootstrap replications of $\hat\theta^*$ for the law school data. The normal theory density curve has a similar shape, but falls off more quickly at the upper tail.
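The Monte Carlo algorithm referred to below as steps (i)-(iii) is short enough to write out in full. The following is a minimal sketch, not taken from the paper: it resamples the 15 law school pairs of Fig. 1 with replacement, recomputes the correlation on each bootstrap sample, and takes the empirical standard deviation of the replications. The choice B = 1000 matches Fig. 2; the random seed is arbitrary.

```python
import numpy as np

# Law school data from Fig. 1: (LSAT_i, GPA_i) for n = 15 schools.
law = np.array([(576, 3.39), (635, 3.30), (558, 2.81), (578, 3.03),
                (666, 3.44), (580, 3.07), (555, 3.00), (661, 3.43),
                (651, 3.36), (605, 3.13), (653, 3.12), (575, 2.74),
                (545, 2.76), (572, 2.88), (594, 2.96)])
n = len(law)
rng = np.random.default_rng(1)

def corr(data):
    # Pearson correlation of the two columns.
    return np.corrcoef(data[:, 0], data[:, 1])[0, 1]

# Steps (i)-(iii): draw bootstrap samples of size n from F-hat by
# resampling rows with replacement, recompute theta-hat* on each,
# and report the standard deviation of the B replications.
B = 1000
theta_star = np.array([corr(law[rng.integers(0, n, n)]) for _ in range(B)])
print(corr(law), theta_star.std(ddof=1))
```

Later sketches in this section reuse the `law` array, the `corr` function, and `rng` defined here.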
data. The bootstrap samples at step (i) of the algorithm could then be drawn from $\hat F_{\mathrm{NORM}}$ instead of $\hat F$, and steps (ii) and (iii) carried out as before.

The smooth curve in Fig. 2 shows the results of carrying out this "normal theory bootstrap" on the law school data. Actually there is no need to do the bootstrap sampling in this case, because of Fisher's formula for the sampling density of a correlation coefficient in the bivariate normal situation (see Chapter 32 of Johnson and Kotz, 1970). This density can be thought of as the bootstrap distribution for $B = \infty$. Expression (2.5) is a close approximation to $\hat\sigma_{\mathrm{NORM}} = \sigma(\hat F_{\mathrm{NORM}})$, the parametric bootstrap estimate of standard error.

In considering the merits or demerits of the bootstrap, it is worth remembering that all of the usual formulas for estimating standard errors, like $\mathcal{I}^{-1/2}$ where $\mathcal{I}$ is the observed Fisher information, are essentially bootstrap estimates carried out in a parametric framework. This point is carefully explained in Section 5 of Efron (1982c). The straightforward nonparametric algorithm (i)-(iii) has the virtues of avoiding all parametric assumptions, all approximations (such as those involved with the Fisher information expression for the standard error of an MLE), and in fact all analytic difficulties of any kind. The data analyst is free to obtain standard errors for enormously complicated estimators, subject only to the constraints of computer time. Sections 3 and 6 discuss some interesting applied problems which are far too complicated for standard analyses.

How well does the bootstrap work? Table 1 shows the answer in one situation. Here $\mathcal{X}$ is the real line, $n = 15$, and the statistic $\hat\theta$ of interest is the 25% trimmed mean. If the true sampling distribution $F$ is $N(0, 1)$, then the true standard error is $\sigma(F) = .286$. The bootstrap estimate $\hat\sigma$ is nearly unbiased, averaging .287 in a large sampling experiment. The standard deviation of the bootstrap estimate $\hat\sigma$ is itself .071 in this case, with coefficient of variation $.071/.287 = .25$. (Notice that there are two levels of Monte Carlo involved in Table 1: first drawing the actual samples $y = (x_1, x_2, \cdots, x_{15})$ from $F$, and then drawing bootstrap samples $(x_1^*, x_2^*, \cdots, x_{15}^*)$ with $y$ held fixed. The bootstrap samples evaluate $\hat\sigma$ for a fixed value of $y$. The standard deviation .071 refers to the variability of $\hat\sigma$ due to the random choice of $y$.)

The jackknife, another common method of assigning nonparametric standard errors, is discussed in Section 10. The jackknife estimate $\hat\sigma_J$ is also nearly unbiased for $\sigma(F)$, but has higher coefficient of variation (CV). The minimum possible CV for a scale-invariant estimate of $\sigma(F)$, assuming full knowledge of the parametric model, is shown in brackets. The nonparametric bootstrap is seen to be moderately efficient in both cases considered in Table 1.

TABLE 1
A sampling experiment comparing the bootstrap and jackknife estimates of standard error for the 25% trimmed mean, sample size n = 15

                         F standard normal       F negative exponential
                         Ave     SD     CV       Ave     SD     CV
Bootstrap (B = 200)      .287    .071   .25      .242    .078   .32
Jackknife                .280    .084   .30      .224    .085   .38
True (minimum CV)        .286           (.19)    .232           (.27)

Table 2 returns to the case of $\hat\theta$ the correlation coefficient. Instead of real data we have a sampling experiment in which the true $F$ is bivariate normal, true correlation $\theta = .50$, sample size $n = 14$. Table 2 is abstracted from a larger table in Efron (1981b), in which some of the methods for estimating a standard error required the sample size to be even.
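The "normal theory bootstrap" described above replaces $\hat F$ by a bivariate normal distribution with the sample mean vector and covariance matrix. A minimal sketch of this parametric bootstrap, reusing the `law` array, `corr` function, and `rng` generator from the earlier sketch (B and the seed are again arbitrary choices):

```python
import numpy as np

# `law`, `corr`, and `rng` as defined in the earlier law school sketch.
mu = law.mean(axis=0)
Sigma = np.cov(law, rowvar=False)   # fitted bivariate normal F-hat_NORM

B = 1000
theta_star = np.array([
    corr(rng.multivariate_normal(mu, Sigma, size=len(law)))
    for _ in range(B)
])
print(theta_star.std(ddof=1))       # parametric bootstrap SE, sigma-hat_NORM
```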
TABLE 2
Estimates of standard error for the correlation coefficient $\hat\theta$ and for $\hat\phi = \tanh^{-1}\hat\theta$; sample size n = 14, distribution F bivariate normal with true correlation $\rho = .5$ (from a larger table in Efron, 1981b). Summary statistics for 200 trials.

The left side of Table 2 refers to $\hat\theta$, while the right side refers to $\hat\phi = \tanh^{-1}(\hat\theta) = .5 \log[(1 + \hat\theta)/(1 - \hat\theta)]$. For each estimator of standard error, the root mean squared error of estimation $[E(\hat\sigma - \sigma)^2]^{1/2}$ is given in the column headed $\sqrt{\mathrm{MSE}}$.

The bootstrap was run with $B = 128$ and also with $B = 512$, the latter value yielding only slightly better estimates, in accordance with the results of Section 9. Further increasing $B$ would be pointless. It can be shown that $B = \infty$ gives $\sqrt{\mathrm{MSE}} = .063$ for $\hat\theta$, only .001 less than $B = 512$. The normal theory estimate (2.5), which we know to be ideal for this sampling experiment, has $\sqrt{\mathrm{MSE}} = .056$.

We can compromise between the totally nonparametric bootstrap estimate $\hat\sigma$ and the totally parametric bootstrap estimate $\hat\sigma_{\mathrm{NORM}}$. This is done in lines 3, 4, and 5 of Table 2. Let $\hat\Sigma = \sum_{i=1}^{n} (x_i - \bar x)(x_i - \bar x)'/n$ be the sample covariance matrix of the observed data. The normal smoothed bootstrap draws the bootstrap sample from $\hat F \oplus N_2(0, .25\hat\Sigma)$, $\oplus$ indicating convolution. This amounts to estimating $F$ by an equal mixture of the $n$ distributions $N_2(x_i, .25\hat\Sigma)$, that is, by a normal window estimate. Each point $x_i^*$ in a smoothed bootstrap sample is the sum of a randomly selected original data point $x_j$, plus an independent bivariate normal point $z_j \sim N_2(0, .25\hat\Sigma)$. Smoothing makes little difference on the left side of the table, but is spectacularly effective in the $\hat\phi$ case. The latter result is suspect, since the true sampling distribution is bivariate normal, and the function $\phi = \tanh^{-1}\theta$ is specifically chosen to have nearly constant standard error in the bivariate normal family. The uniform smoothed bootstrap samples from $\hat F \oplus \mathcal{U}(0, .25\hat\Sigma)$, where $\mathcal{U}(0, .25\hat\Sigma)$ is the uniform distribution on a rhombus selected so $\mathcal{U}$ has mean vector 0 and covariance matrix $.25\hat\Sigma$. It yields moderate reductions in $\sqrt{\mathrm{MSE}}$ for both sides of the table.
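A sketch of the normal smoothed bootstrap just described, again reusing `law`, `corr`, and `rng` from the earlier sketches: each bootstrap point is a randomly selected original data point plus an independent $N_2(0, .25\hat\Sigma)$ perturbation.

```python
import numpy as np

# `law`, `corr`, and `rng` as in the earlier sketches.
n = len(law)
Sigma_hat = np.cov(law, rowvar=False, bias=True)  # divisor n, as in the text

def smoothed_sample():
    # Draw x_j at random from the data, then add independent
    # N_2(0, .25*Sigma-hat) noise: a draw from F-hat (+) N_2(0, .25*Sigma-hat).
    idx = rng.integers(0, n, n)
    noise = rng.multivariate_normal(np.zeros(2), 0.25 * Sigma_hat, size=n)
    return law[idx] + noise

B = 1000
theta_star = np.array([corr(smoothed_sample()) for _ in range(B)])
print(theta_star.std(ddof=1))
```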
Line 6 of Table 2 refers to the delta method, which is the most common method of assigning nonparametric standard errors. Surprisingly enough, it is badly biased downward on both sides of the table. The delta method, also known as the method of statistical differentials, the Taylor series method, and the infinitesimal jackknife, is discussed in Section 10.

3. EXAMPLES

Example 1. Cox's Proportional Hazards Model

In this section we apply bootstrap standard error estimation to some complicated statistics. The data for this example come from a study of leukemia remission times in mice, taken from Cox (1972). They consist of measurements of remission time ($y$) in weeks for two groups, treatment ($x = 0$) and control ($x = 1$), and a 0-1 variable ($\delta_i$) indicating whether or not the remission time is censored (0) or complete (1). There are 21 mice in each group.

The standard regression model for censored data is Cox's proportional hazards model (Cox, 1972). It assumes that the hazard function $h(t \mid x)$, the probability of going into remission in the next instant given no remission up to time $t$ for a mouse with covariate $x$, is of the form

(3.1)  $h(t \mid x) = h_0(t)e^{\beta x}$.

Here $h_0(t)$ is an arbitrary unspecified function. Since $x$ here is a group indicator, this means simply that the hazard for the control group is $e^\beta$ times the hazard for the treatment group. The regression parameter $\beta$ is estimated independently of $h_0(t)$ through maximization of the so called "partial likelihood"

(3.2)  $\mathrm{PL} = \prod_{i \in D} e^{\beta x_i} / \sum_{i' \in R_i} e^{\beta x_{i'}}$,

where $D$ is the set of indices of the failure times and $R_i$ is the set of indices of those at risk at time $y_i$. This maximization requires an iterative computer search. The estimated $\hat\beta$ for these data turns out to be 1.51. Taken literally, this says that the hazard rate is $e^{1.51} = 4.53$ times higher in the control group than in the treatment group, so the treatment is very effective.

What is the standard error of $\hat\beta$? The usual asymptotic maximum likelihood theory, one over the square root of the observed Fisher information, gives an estimate of .41. Despite the complicated nature of the estimation procedure, we can also estimate the standard error using the bootstrap. We sample with replacement from the triples $\{(y_1, x_1, \delta_1), \cdots, (y_{42}, x_{42}, \delta_{42})\}$. For each bootstrap sample $\{(y_1^*, x_1^*, \delta_1^*), \cdots, (y_{42}^*, x_{42}^*, \delta_{42}^*)\}$ we form the partial likelihood and numerically maximize it to produce the bootstrap estimate $\hat\beta^*$. A histogram of 1000 bootstrap values is shown in Fig. 3. The bootstrap estimate of the standard error of $\hat\beta$ based on these 1000 numbers is .42. Although the bootstrap and standard estimates agree, it is interesting to note that the bootstrap distribution is skewed to the right. This leads us to ask: is there other information that we can extract from the bootstrap distribution other than a standard error estimate? The answer is yes; in particular, the bootstrap distribution can be used to form a confidence interval for $\beta$, as we will see in Section 9. The shape of the bootstrap distribution will help determine the shape of the confidence interval.

In this example our resampling unit was the triple $(y_i, x_i, \delta_i)$, and we ignored the unique elements of the problem, i.e., the censoring, and the particular model being used. In fact, there are other ways to bootstrap this problem. We will see this when we discuss bootstrapping censored data in Section 5.
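For a single binary covariate, the partial likelihood (3.2) can be maximized by a one-dimensional search, so the whole bootstrap loop fits in a few lines. The sketch below is not the authors' code: it assumes the Breslow convention for tied times, a bounded search interval, and NumPy/SciPy; the mouse data themselves (in Cox, 1972) are not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def beta_hat(y, x, d):
    """Maximize the log partial likelihood (3.2); Breslow convention for
    ties: for each complete (d = 1) time y_i, the risk set is {j: y_j >= y_i}."""
    def neg_log_pl(beta):
        val = 0.0
        for i in np.flatnonzero(d == 1):
            risk = y >= y[i]
            val -= beta * x[i] - np.log(np.sum(np.exp(beta * x[risk])))
        return val
    return minimize_scalar(neg_log_pl, bounds=(-10, 10), method="bounded").x

def bootstrap_se(y, x, d, B=1000, seed=2):
    # Resample the triples (y_i, x_i, d_i) with replacement and refit.
    rng = np.random.default_rng(seed)
    n = len(y)
    reps = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, n)
        reps[b] = beta_hat(y[idx], x[idx], d[idx])
    return reps.std(ddof=1)
```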
FIG. 3. Histogram of 1000 bootstrap replications for the mouse leukemia data.

Example 2: Linear and Projection Pursuit Regression

We illustrate an application of the bootstrap to standard linear least squares regression, as well as to a nonparametric regression technique.

Consider the standard regression setup. We have $n$ observations on a response $Y$ and covariates $(X_1, X_2, \cdots, X_p)$. Denote the $i$th observed vector of covariates by $x_i = (x_{i1}, x_{i2}, \cdots, x_{ip})'$. The usual linear regression model assumes

(3.3)  $E(Y_i) = \alpha + \sum_{j=1}^{p} \beta_j x_{ij}$.

Friedman and Stuetzle (1981) introduced a more general model, the projection pursuit regression model

(3.4)  $E(Y_i) = \sum_{j=1}^{m} s_j(a_j \cdot x_i)$.

Notice the relation of the projection pursuit regression model to the standard linear regression model. When the function $s_1(\cdot)$ is forced to be linear and is estimated by the usual least squares method, a one-term projection pursuit model is exactly the same as the standard linear regression model. That is to say, the fitted model $\hat s_1(\hat a_1 \cdot x_i)$ exactly equals the least squares fit $\hat\alpha + \sum_j \hat\beta_j x_{ij}$. This is because the least squares fit, by definition, finds the best direction and the best linear function of that direction. Note also that adding another linear term $s_2(a_2 \cdot x)$ would not change the fitted model, since the sum of two linear functions is another linear function.

Hastie and Tibshirani (1984) applied the bootstrap to the linear and projection pursuit regression models to assess the variability of the coefficients in each. The data they considered are taken from Breiman and Friedman (1985). The response $Y$ is Upland atmospheric ozone concentration (ppm); the covariates are $X_1$ = Sandburg Air Force base temperature (°C), $X_2$ = inversion base height (ft), $X_3$ = Daggot pressure gradient (mm Hg), $X_4$ = visibility (miles), and $X_5$ = day of the year. There are 330 observations. The number of terms ($m$) in the model (3.4) is taken to be two. The projection pursuit algorithm chose directions $\hat a_1 = (.80, -.38, .37, -.24, -.14)'$ and $\hat a_2 = (.07, .16, .04, -.05, -.98)'$. These directions consist mostly of Sandburg Air Force temperature and day of the year, respectively.
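To make the bootstrap comparison concrete, here is a minimal sketch of resampling the directions: rows $(y_i, x_{i1}, \cdots, x_{ip})$ are drawn with replacement and the model is refit. Only the linear fit of (3.3) is implemented; a projection pursuit fitter would be substituted for `linear_direction` to reproduce the $\hat a_1^*, \hat a_2^*$ replications of Figs. 4 and 5 (no such fitter is assumed here). Note that directions are only determined up to sign, so replications may need a sign convention before histogramming.

```python
import numpy as np

def linear_direction(X, y):
    # Least squares fit (3.3); the slope vector scaled to unit length
    # is the "linear direction" a-hat discussed in the text.
    Xa = np.column_stack([np.ones(len(y)), X])
    coef = np.linalg.lstsq(Xa, y, rcond=None)[0][1:]  # drop the intercept
    return coef / np.linalg.norm(coef)

def bootstrap_directions(X, y, fit=linear_direction, B=200, seed=3):
    # Resample the observation rows with replacement and refit.
    rng = np.random.default_rng(seed)
    n = len(y)
    reps = []
    for _ in range(B):
        idx = rng.integers(0, n, n)
        reps.append(fit(X[idx], y[idx]))
    return np.array(reps)   # one resampled direction per row
```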
(We do not show graphs of the estimated functions $\hat s_1(\cdot)$ and $\hat s_2(\cdot)$, although in a full analysis of the data they would also be of interest.) Forcing $\hat s_1(\cdot)$ to be linear results in the direction $\hat a = (.90, -.37, .03, -.14, -.19)'$. These are just the usual least squares estimates $\hat\beta_1, \cdots, \hat\beta_5$, scaled so that $\sum_j \hat\beta_j^2 = 1$.

To assess the variability of the directions, a bootstrap sample is drawn with replacement from $\{(y_1, x_{11}, \cdots, x_{15}), \cdots, (y_{330}, x_{330,1}, \cdots, x_{330,5})\}$ and the projection pursuit algorithm is applied. Figs. 4 and 5 show histograms of the directions $\hat a_1^*$ and $\hat a_2^*$ for 200 bootstrap replications. Also shown in Fig. 4 (broken histogram) are the bootstrap replications of $\hat a$ with $\hat s_1(\cdot)$ forced to be linear.

The first direction of the projection pursuit model is quite stable, and only slightly more variable than the corresponding linear regression direction. But the second direction is extremely unstable! It is clearly unwise to put any faith in the second direction of the original projection pursuit model.

FIG. 4. [Smoothed histograms of the bootstrapped coefficients for the first term in the projection pursuit model; panels a1 to a5. Caption not recovered.]

FIG. 5. Smoothed histograms of the bootstrapped coefficients for the second term in the projection pursuit model.

Example 3: Cox's Model and Local Likelihood Estimation

In this example, we return to Cox's proportional hazards model described in Example 1, but with a few added twists.

The data that we will discuss come from the Stanford heart transplant program and are given in Miller and Halpern (1982). The response $y$ is survival time in weeks after a heart transplant, the covariate $x$ is age at transplant, and the 0-1 variable $\delta$ indicates whether the survival time is censored (0) or complete (1). There are measurements on 157 patients. A proportional hazards model was fit to these data, with a quadratic term, i.e., $h(t \mid x) = h_0(t)e^{\beta_1 x + \beta_2 x^2}$. Both $\hat\beta_1$ and $\hat\beta_2$ are highly significant; the broken curve in Fig. 6 is $\hat\beta_1 x + \hat\beta_2 x^2$ as a function of $x$.

For comparison, Fig. 6 shows (solid line) another estimate. This was computed using local likelihood estimation (Tibshirani and Hastie, 1984). Given a general proportional hazards model of the form $h(t \mid x) = h_0(t)e^{s(x)}$, the local likelihood technique assumes nothing about the parametric form of $s(x)$; instead it estimates $s(x)$ nonparametrically, using a kind of local averaging. The algorithm is very computationally intensive, and standard maximum likelihood theory cannot be applied.

FIG. 6. Estimates of log relative risk for the Stanford heart transplant data. Broken curve: parametric estimate. Solid curve: local likelihood estimate. [Horizontal axis: age.]

A comparison of the two functions reveals an important qualitative difference: the parametric estimate suggests that the hazard decreases sharply up to age 34, then rises; the local likelihood estimate stays approximately constant up to age 45, then rises. Has the forced fitting of a quadratic function produced a misleading result? To answer this question, we can bootstrap the local likelihood estimate. We sample with replacement from the triples $\{(y_1, x_1, \delta_1), \cdots, (y_{157}, x_{157}, \delta_{157})\}$ and apply the local likelihood algorithm to each bootstrap sample. Fig. 7 shows estimated curves from 20 bootstrap samples.

Some of the curves are flat up to age 45, others are decreasing. Hence the original local likelihood estimate is highly variable in this region, and on the basis of these data we cannot determine the true behavior of the function there. A look back at the original data shows that while half of the patients were under 45, only 13% of the patients were under 30. Fig. 7 also shows that the estimate is stable near the middle ages but unstable for the older patients.
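The resampling scheme behind Fig. 7 is the same triple-resampling used in Example 1; only the fitting step changes. In the sketch below, `fit_curve` is a hypothetical stand-in for the local likelihood algorithm, which is not reproduced here:

```python
import numpy as np

def bootstrap_curves(y, x, d, fit_curve, grid, B=20, seed=4):
    """Bootstrap an arbitrary curve estimator, as in Fig. 7.

    `fit_curve(y, x, d, grid)` is a hypothetical placeholder: it should
    refit the curve estimate on the resampled triples and return its
    values evaluated on `grid`.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    curves = np.empty((B, len(grid)))
    for b in range(B):
        idx = rng.integers(0, n, n)   # resample (y_i, x_i, delta_i) triples
        curves[b] = fit_curve(y[idx], x[idx], d[idx], grid)
    return curves  # the spread of these B curves shows where the fit is unstable
```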
4. OTHER MEASURES OF STATISTICAL ERROR

TABLE 3
BHCG blood serum levels for 54 patients having metastasized breast cancer, in ascending order

0.1, 0.1, 0.2, 0.4, 0.4, 0.6, 0.8, 0.8, 0.9, 0.9, 1.3, 1.3, 1.4, 1.5, 1.6, 1.6, 1.7, 1.7, 1.7, 1.8, 2.0, 2.0, 2.2, 2.2, 2.2, 2.3, 2.3, 2.4, 2.4, 2.4, 2.4, 2.4, 2.4, 2.5, 2.5, 2.5, 2.7, 2.7, 2.8, 2.9, 2.9, 2.9, 3.0, 3.1, 3.1, 3.2, 3.2, 3.3, 3.3, 3.5, 4.4, 4.5, 6.4, 9.4
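Because Table 3 lists the data in full, the bootstrap computations discussed in the fragments that follow can be sketched directly. The fragment below assumes, as the surrounding text suggests, that the statistic of interest is the 25% trimmed mean; the seed and B = 1000 are arbitrary choices.

```python
import numpy as np
from scipy.stats import trim_mean

# BHCG data of Table 3.
bhcg = np.array([0.1, 0.1, 0.2, 0.4, 0.4, 0.6, 0.8, 0.8, 0.9, 0.9, 1.3, 1.3,
                 1.4, 1.5, 1.6, 1.6, 1.7, 1.7, 1.7, 1.8, 2.0, 2.0, 2.2, 2.2,
                 2.2, 2.3, 2.3, 2.4, 2.4, 2.4, 2.4, 2.4, 2.4, 2.5, 2.5, 2.5,
                 2.7, 2.7, 2.8, 2.9, 2.9, 2.9, 3.0, 3.1, 3.1, 3.2, 3.2, 3.3,
                 3.3, 3.5, 4.4, 4.5, 6.4, 9.4])
rng = np.random.default_rng(5)
n = len(bhcg)

theta_hat = trim_mean(bhcg, 0.25)   # trims 25% from each tail
B = 1000
reps = np.array([trim_mean(bhcg[rng.integers(0, n, n)], 0.25)
                 for _ in range(B)])

se_boot = reps.std(ddof=1)          # compare sigma-hat = .16 quoted below
mean_boot = reps.mean()             # plays the role of theta-hat*(.)
print(theta_hat, se_boot, mean_boot - theta_hat)
```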
… degrees of freedom, $\bar x \pm 1.67\bar\sigma = [1.97, 2.67]$. Here $\bar\sigma = .21$ is the usual estimate of standard error (1.3).

But what about bias? The same 1000 bootstrap replications which gave $\hat\sigma = .16$ also gave $\hat\theta^*(\cdot) = 2.29$, so …

Bootstrap confidence intervals are discussed further in Sections 7 and 8. They require more bootstrap replications than do bootstrap standard errors, on the order of $B = 1000$ rather than $B = 50$ or 100. This point is discussed briefly in Section 9.

By now it should be clear that we can use any random variable $R(y, F)$ to measure accuracy, not just (4.1) or (4.6), and then estimate $E_F\{R(y, F)\}$ by its bootstrap value $E_{\hat F}\{R(y^*, \hat F)\} \doteq \sum_{b=1}^{B} R(y^*(b), \hat F)/B$. Similarly we can estimate $E_F R(y, F)^2$ by $E_{\hat F} R(y^*, \hat F)^2$, etc. Efron (1983) considers the prediction problem, in which a training set of data is used to construct a prediction rule. A naive estimate of the prediction rule's accuracy is the proportion of correct guesses it makes on its own training set, but this can be greatly overoptimistic, since the prediction rule is explicitly constructed to minimize errors on the training set. In this case, a natural choice of $R(y, F)$ is the overoptimism, the difference between the naive estimate and the actual success rate of the prediction rule for new data. Efron (1983) gives the bootstrap estimate of overoptimism, and shows that it is closely related to cross-validation, the usual method of estimating overoptimism. The paper goes on to show that some modifications of the bootstrap estimate greatly outperform both cross-validation and the bootstrap.

In practice, statisticians must often consider quite complicated data structures: time series models, mul…

… an independent random sample $V_1^*, \cdots, V_n^*$ from $\hat G$. With only this modification, steps (i) through (iii) of the Monte Carlo algorithm produce $\hat\sigma_B$, (2.4), approaching $\hat\sigma$ as $B \to \infty$.

Table 4 reports on a simulation experiment investigating how well the bootstrap works on this problem. 100 trials of situation (5.1) were run, with $m = 6$, $n = 9$, $F$ and $G$ both Uniform[0, 1]. For each trial, both $B = 100$ and $B = 200$ bootstrap replications were generated. The bootstrap estimate $\hat\sigma_B$ was nearly unbiased for the true standard error $\sigma(F, G) = .167$, for either $B = 100$ or $B = 200$, with a quite small standard deviation from trial to trial. The improvement in going from $B = 100$ to $B = 200$ is too small to show up in this experiment.

TABLE 4
Bootstrap estimate of standard error for the Hodges-Lehmann two-sample shift estimate; 100 trials. Summary statistics for $\hat\sigma_B$

             Ave     SD      CV
B = 100      .165    .030    .18
B = 200      .166    .031    .19
True sigma   .167

Note: m = 6, n = 9; true distributions F and G both Uniform[0, 1].
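A single trial of the Table 4 experiment can be sketched as follows. The only change from the one-sample algorithm, as the text notes, is that $U^*$ is drawn from $\hat F$ and $V^*$ independently from $\hat G$. The Hodges-Lehmann shift estimate is taken here to be the median of the pairwise differences, an assumption, since the paper's equation (5.1) is not reproduced in this extraction.

```python
import numpy as np

def hodges_lehmann(u, v):
    # Two-sample shift estimate: median of all pairwise differences v_j - u_i.
    return np.median(np.subtract.outer(v, u))

def two_sample_boot_se(u, v, B=200, seed=6):
    # Draw u* from F-hat and v* independently from G-hat, then refit.
    rng = np.random.default_rng(seed)
    m, n = len(u), len(v)
    reps = np.array([
        hodges_lehmann(u[rng.integers(0, m, m)], v[rng.integers(0, n, n)])
        for _ in range(B)
    ])
    return reps.std(ddof=1)

# One trial of the Table 4 setup: m = 6, n = 9, F and G both Uniform[0, 1].
rng = np.random.default_rng(7)
print(two_sample_boot_se(rng.uniform(size=6), rng.uniform(size=9)))
```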
… bootstrap expectation of $R(y^*, \hat F)$. The double arrow indicates the crucial step in applying the bootstrap. [Figure caption fragment.]

In the case of ordinary least squares regression, where $g(\beta, t_i) = \beta' t_i$ and $D(y, g) = \sum_i (y_i - g(\beta, t_i))^2$ …
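The least squares fragment above suggests the regression version of the bootstrap. One common implementation, sketched here on simulated data (this is one standard construction, not necessarily the one in the text missing from this extraction), resamples the estimated residuals and refits:

```python
import numpy as np

rng = np.random.default_rng(8)

# Simulated data for the linear model y_i = beta' t_i + eps_i.
n, p = 50, 3
T = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = T @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.lstsq(T, y, rcond=None)[0]
resid = y - T @ beta_hat              # estimated errors eps-hat_i

# Resample the residuals, rebuild y*, and refit: the spread of the
# beta* replications estimates the standard error of beta-hat.
B = 1000
reps = np.empty((B, T.shape[1]))
for b in range(B):
    y_star = T @ beta_hat + rng.choice(resid, size=n, replace=True)
    reps[b] = np.linalg.lstsq(T, y_star, rcond=None)[0]
print(reps.std(axis=0, ddof=1))
```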
… sunspot data, model (6.2). [Figure caption fragment.]
TABLE 6
Four methods of setting approximate confidence intervals for a real valued parameter $\theta$

Note: Each method is correct under more general assumptions than its predecessor. Methods 2, 3, and 4 are defined in terms of the percentiles of $\hat G$, the bootstrap distribution (7.1).
… average of $\hat\theta_{\mathrm{BC}}[.05]/\theta$ over 40 such data sets, $B = 4000$ bootstrap replications per data set. The upper limit $1.68 \cdot \hat\theta$ is noticeably small, as pointed out by Schenker (1985). The reason is simple: the nonparametric bootstrap distribution of $\hat\theta^*$ has a short upper tail, compared to the parametric bootstrap distribution, which is a scaled $\chi^2_{19}$ random variable. The results of Beran (1984), Bickel and Freedman (1981), and Singh (1981) show that the nonparametric bootstrap distribution is highly accurate asymptotically, but of course that is not a guarantee of good small sample behavior. Bootstrapping from a smoothed version of $\hat F$, as in lines 3, 4, and 5 of Table 2, alleviates the problem in this particular example.

The situation is quite different for setting bootstrap confidence intervals. The calculations of Efron (1984), Section 8, show that $B = 1000$ is a rough minimum for the number of Monte Carlo bootstraps necessary to compute the BC or BC$_a$ intervals. Somewhat smaller values, say $B = 250$, can give a useful percentile interval, the difference being that then the constant $z_0$ need not be computed. Confidence intervals are a fundamentally more ambitious measure of statistical accuracy than standard errors, so it is not surprising that they require more computational effort.
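For reference, the percentile interval and the bias-corrected (BC) interval are both simple functions of the bootstrap replications. A minimal sketch following the usual definitions (the BC$_a$ acceleration constant is omitted, and $0 < \hat G(\hat\theta) < 1$ is assumed so that $z_0$ is finite):

```python
import numpy as np
from scipy.stats import norm

def percentile_interval(reps, alpha=0.05):
    # Read the interval straight off the percentiles of G-hat,
    # the bootstrap distribution (7.1).
    return np.percentile(reps, [100 * alpha, 100 * (1 - alpha)])

def bc_interval(reps, theta_hat, alpha=0.05):
    # Bias-corrected interval: z0 corrects G-hat for median bias.
    z0 = norm.ppf(np.mean(reps < theta_hat))
    za = norm.ppf(alpha)
    lo = np.percentile(reps, 100 * norm.cdf(2 * z0 + za))
    hi = np.percentile(reps, 100 * norm.cdf(2 * z0 - za))
    return np.array([lo, hi])
```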
10. THE JACKKNIFE AND THE DELTA METHOD

… weighted empirical distribution

(10.3)  $\hat F(p)$: probability $p_i$ on $x_i$, $i = 1, \cdots, n$.

For $p = p^0 = (1/n, 1/n, \cdots, 1/n)$, the weighted empirical distribution equals $\hat F$, (1.4). Corresponding to $p$ is a resampled value of $\hat\theta$, …

FIG. 15. The bootstrap and jackknife sampling points in the case n = 3. The bootstrap points (·) are shown with their probabilities.

(10.6)  $p_{(i)} = \frac{1}{n-1}(1, 1, \cdots, 1, 0, 1, \cdots, 1)$, the 0 in the $i$th place,

$i = 1, 2, \cdots, n$. Fig. 15 indicates the jackknife points for $n = 3$; because $\hat\theta$ is the functional form, (10.4), it does not matter that the jackknife points correspond to sample size $n - 1$ rather than $n$.

The linear function $\hat\theta_J(p)$ is calculated to be

(10.7)  $\hat\theta_J(p) = \hat\theta_{(\cdot)} + (p - p^0)U$,

where, in terms of $\hat\theta_{(i)} \equiv \hat\theta(p_{(i)})$, $\hat\theta_{(\cdot)} = \sum_{i=1}^{n} \hat\theta_{(i)}/n$, and $U$ is the vector with $i$th coordinate

(10.8)  $U_i = (n - 1)(\hat\theta_{(\cdot)} - \hat\theta_{(i)})$.

The jackknife estimate of standard error (Tukey, 1958; Miller, 1974) is

(10.9)  $\hat\sigma_J = \left[\frac{n-1}{n} \sum_i (\hat\theta_{(i)} - \hat\theta_{(\cdot)})^2\right]^{1/2} = \left[\sum_i U_i^2/(n(n-1))\right]^{1/2}$.

A standard multinomial calculation gives the following theorem (Efron, 1982a).

THEOREM. The jackknife estimate of standard error equals $[n/(n-1)]^{1/2}$ times the bootstrap estimate of standard error for $\hat\theta_J$, …
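Formula (10.9) is easy to compute directly. A minimal sketch, applied to the law school correlation using the `law` array and `corr` function of the earlier sketches:

```python
import numpy as np

def jackknife_se(x, stat):
    # (10.9): theta-hat_(i) is the statistic recomputed with x_i deleted.
    n = len(x)
    theta_i = np.array([stat(np.delete(x, i, axis=0)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((theta_i - theta_i.mean())**2))

# Example: the law school correlation of Fig. 1.
print(jackknife_se(law, corr))
```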
… infinitesimal jackknife estimate of standard error

(10.12)  $\hat\sigma_{IJ} = [\mathrm{var}_* \hat\theta_T(p^*)]^{1/2} = \left[\sum_i (U_i^0)^2/n^2\right]^{1/2}$,

with $\mathrm{var}_*$ still indicating variance under (10.2). The ordinary jackknife can be thought of as taking $\epsilon = -1/(n-1)$ in the definition of $U_i^0$, while the infinitesimal jackknife lets $\epsilon \to 0$, thereby earning the name.

The $U_i^0$ are values of what Mallows (1974) calls the empirical influence function. Their definition is a nonparametric estimate of the true influence function

$\mathrm{IF}(x) = \lim_{\epsilon \to 0}\, [\theta((1 - \epsilon)F + \epsilon \Delta_x) - \theta(F)]/\epsilon$,

$\Delta_x$ being the degenerate distribution putting mass 1 on $x$. The right side of (10.12) is then the obvious estimate of the influence function approximation to the standard error of $\hat\theta$ (Hampel, 1974), $\sigma(F) \doteq [\int \mathrm{IF}^2(x)\, dF(x)/n]^{1/2}$. The empirical influence function method and the infinitesimal jackknife give identical estimates of standard error.

How have statisticians gotten along for so many years without methods like the jackknife and the bootstrap? The answer is the delta method, which is still the most commonly used device for approximating standard errors. The method applies to statistics of the form $t(\bar Q_1, \bar Q_2, \cdots, \bar Q_A)$, where $t(\cdot, \cdot, \cdots, \cdot)$ is a known function and each $\bar Q_a$ is an observed average, $\bar Q_a = \sum_{i=1}^{n} Q_a(x_i)/n$. For example, the correlation $\hat\theta$ is a function of $A = 5$ such averages: the average of the first coordinate values, the second coordinates, the first coordinates squared, the second coordinates squared, and the cross-products.

In its nonparametric formulation, the delta method works by (a) expanding $t$ in a linear Taylor series about the expectations of the $\bar Q_a$; (b) evaluating the standard error of the Taylor series using the usual expressions for variances and covariances of averages; and (c) substituting $\gamma(\hat F)$ for any unknown quantity $\gamma(F)$ occurring in (b). For example, the nonparametric delta method estimates the standard error of the correlation $\hat\theta$ by

$\left[\frac{\hat\theta^2}{4n}\left(\frac{\hat\mu_{40}}{\hat\mu_{20}^2} + \frac{\hat\mu_{04}}{\hat\mu_{02}^2} + \frac{2\hat\mu_{22}}{\hat\mu_{20}\hat\mu_{02}} + \frac{4\hat\mu_{22}}{\hat\mu_{11}^2} - \frac{4\hat\mu_{31}}{\hat\mu_{11}\hat\mu_{20}} - \frac{4\hat\mu_{13}}{\hat\mu_{11}\hat\mu_{02}}\right)\right]^{1/2}$,

where, in terms of $x_i = (y_i, z_i)$, $\hat\mu_{gh} = \sum_i (y_i - \bar y)^g (z_i - \bar z)^h/n$ (Cramér (1946), p. 359).

THEOREM. For statistics of the form $\hat\theta = t(\bar Q_1, \cdots, \bar Q_A)$, the nonparametric delta method and the infinitesimal jackknife give the same estimate of standard error (Efron, 1982c).
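The delta method estimate in the displayed formula is likewise a few lines of code. A sketch, applied here to the `law` data of the earlier examples (any bivariate sample works):

```python
import numpy as np

def delta_se_corr(data):
    # Nonparametric delta method estimate for the correlation, using the
    # central moments mu-hat_gh = sum (y_i - y-bar)^g (z_i - z-bar)^h / n.
    y, z = data[:, 0], data[:, 1]
    n = len(y)
    yc, zc = y - y.mean(), z - z.mean()
    mu = lambda g, h: np.mean(yc**g * zc**h)
    theta = mu(1, 1) / np.sqrt(mu(2, 0) * mu(0, 2))
    bracket = (mu(4, 0) / mu(2, 0)**2 + mu(0, 4) / mu(0, 2)**2
               + 2 * mu(2, 2) / (mu(2, 0) * mu(0, 2))
               + 4 * mu(2, 2) / mu(1, 1)**2
               - 4 * mu(3, 1) / (mu(1, 1) * mu(2, 0))
               - 4 * mu(1, 3) / (mu(1, 1) * mu(0, 2)))
    return np.sqrt(theta**2 / (4 * n) * bracket)

print(delta_se_corr(law))
```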
The infinitesimal jackknife, the delta method, and the empirical influence function approach are three names for the same method. Notice that the results reported in line 7 of Table 2 show a severe downward bias. Efron and Stein (1981) show that the ordinary jackknife is always biased upward, in a sense made precise in that paper. In the authors' opinion the ordinary jackknife is the method of choice if one does not want to do the bootstrap computations.

ACKNOWLEDGMENT

This paper is based on a previous review article appearing in Behaviormetrika. The authors and Editor are grateful to that journal for graciously allowing this revision to appear here.

REFERENCES

ANDERSON, O. D. (1975). Time Series Analysis and Forecasting: The Box-Jenkins Approach. Butterworth, London.
BAHADUR, R. and SAVAGE, L. (1956). The nonexistence of certain statistical procedures in nonparametric problems. Ann. Math. Statist. 27 1115-1122.
BERAN, R. (1984). Bootstrap methods in statistics. Jahrb. Math. Ver. 86 14-30.
BICKEL, P. J. and FREEDMAN, D. A. (1981). Some asymptotic theory for the bootstrap. Ann. Statist. 9 1196-1217.
BOX, G. E. P. and COX, D. R. (1964). An analysis of transformations. J. R. Statist. Soc. Ser. B 26 211-252.
BREIMAN, L. and FRIEDMAN, J. H. (1985). Estimating optimal transformations for multiple regression and correlation. J. Amer. Statist. Assoc. 80 580-619.
COX, D. R. (1972). Regression models and life tables. J. R. Statist. Soc. Ser. B 34 187-202.
CRAMÉR, H. (1946). Mathematical Methods of Statistics. Princeton University Press, Princeton, New Jersey.
EFRON, B. (1979a). Bootstrap methods: another look at the jackknife. Ann. Statist. 7 1-26.
EFRON, B. (1979b). Computers and the theory of statistics: thinking the unthinkable. SIAM Rev. 21 460-480.
EFRON, B. (1981a). Censored data and the bootstrap. J. Amer. Statist. Assoc. 76 312-319.
EFRON, B. (1981b). Nonparametric estimates of standard error: the jackknife, the bootstrap, and other resampling methods. Biometrika 68 589-599.
EFRON, B. (1982a). The jackknife, the bootstrap, and other resampling plans. Soc. Ind. Appl. Math. CBMS-Natl. Sci. Found. Monogr. 38.
EFRON, B. (1982b). Transformation theory: how normal is a one parameter family of distributions? Ann. Statist. 10 323-339.
EFRON, B. (1982c). Maximum likelihood and decision theory. Ann. Statist. 10 340-356.
EFRON, B. (1983). Estimating the error rate of a prediction rule: improvements in cross-validation. J. Amer. Statist. Assoc. 78 316-331.
EFRON, B. (1984). Better bootstrap confidence intervals. Tech. Rep., Stanford Univ. Dept. Statist.
EFRON, B. (1985). Bootstrap confidence intervals for a class of parametric problems. Biometrika 72 45-58.
EFRON, B. and GONG, G. (1983). A leisurely look at the bootstrap, the jackknife, and cross-validation. Amer. Statistician 37 36-48.
EFRON, B. and STEIN, C. (1981). The jackknife estimate of variance. Ann. Statist. 9 586-596.
FIELLER, E. C. (1954). Some problems in interval estimation. J. R. Statist. Soc. Ser. B 16 175-183.
FRIEDMAN, J. H. and STUETZLE, W. (1981). Projection pursuit regression. J. Amer. Statist. Assoc. 76 817-823.
Comment
J. A. Hartigan