Agresti - Caffo - 2000 Modificado PDF
American Statistical Association is collaborating with JSTOR to digitize, preserve and extend access to The
American Statistician.
Teacher's Corner
Simple and Effective Confidence Intervals for Proportions
and Differences of Proportions Result from Adding Two
Successes and Two Failures
Alan AGRESTI and Brian CAFFO
The standard confidence intervals for proportions and their differences used in introductory statistics courses have poor performance, the actual coverage probability often being much lower than intended. However, simple adjustments of these intervals based on adding four pseudo observations, half of each type, perform surprisingly well even for small samples. To illustrate, for a broad variety of parameter settings with 10 observations in each sample, a nominal 95% interval for the difference of proportions has actual coverage probability below .93 in 88% of the cases with the standard interval but in only 1% with the adjusted interval; the mean distance between the nominal and actual coverage probabilities is .06 for the standard interval, but .01 for the adjusted one. In teaching with these adjusted intervals, one can bypass awkward sample size guidelines and use the same formulas with small and large samples.

KEY WORDS: Binomial distribution; Score test; Small sample; Wald test.

1. INTRODUCTION

Let $X$ denote a binomial variate for $n$ trials with parameter $p$, denoted bin($n, p$), and let $\hat{p} = X/n$ denote the sample proportion. For two independent samples, let $X_1$ be bin($n_1, p_1$), and let $X_2$ be bin($n_2, p_2$). Let $z_a$ denote the $1 - a$ quantile of the standard normal distribution. Nearly all elementary statistics textbooks present the following confidence intervals for $p$ and $p_1 - p_2$:

* An approximate $100(1 - \alpha)\%$ confidence interval for $p$ is

$$\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p})/n}. \qquad (1)$$

* An approximate $100(1 - \alpha)\%$ confidence interval for $p_1 - p_2$ is

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}. \qquad (2)$$

These confidence intervals result from inverting large-sample Wald tests, which evaluate standard errors at the maximum likelihood estimates. For instance, the interval for $p$ is the set of $p_0$ values for which $|\hat{p} - p_0| / \sqrt{\hat{p}(1 - \hat{p})/n} < z_{\alpha/2}$; that is, the set of $p_0$ having $P$ value exceeding $\alpha$ in testing $H_0: p = p_0$ against $H_a: p \neq p_0$ using the approximately normal test statistic. The intervals are sometimes called Wald intervals. Although these intervals are simple and natural for students who have previously seen analogous large-sample formulas for means, a considerable literature shows that they behave poorly (e.g., Ghosh 1979; Vollset 1993; Newcombe 1998a, 1998b). This can be true even when the sample size is very large (Brown, Cai, and DasGupta 1999). In this article, we describe simple adjustments of these intervals that perform much better but can be easily taught in the typical non-calculus-based statistics course.

These references showed that a much better confidence interval for a single proportion is based on inverting the test with standard error evaluated at the null hypothesis, which is the score test approach. This confidence interval, due to Wilson (1927), is the set of $p_0$ values for which $|\hat{p} - p_0| / \sqrt{p_0(1 - p_0)/n} < z_{\alpha/2}$, which is

$$\hat{p}\left(\frac{n}{n + z_{\alpha/2}^2}\right) + \frac{1}{2}\left(\frac{z_{\alpha/2}^2}{n + z_{\alpha/2}^2}\right) \pm z_{\alpha/2} \sqrt{\frac{1}{n + z_{\alpha/2}^2}\left[\hat{p}(1 - \hat{p})\left(\frac{n}{n + z_{\alpha/2}^2}\right) + \frac{1}{2} \cdot \frac{1}{2}\left(\frac{z_{\alpha/2}^2}{n + z_{\alpha/2}^2}\right)\right]}.$$
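As an illustration of the two inversions just described, the following minimal Python sketch computes the Wald interval (1) and the Wilson score interval in its weighted-average form. The function names `wald_interval` and `score_interval` are hypothetical, chosen here for illustration; the normal quantile comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(x, n, conf=0.95):
    """Wald interval (1): standard error evaluated at the sample proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def score_interval(x, n, conf=0.95):
    """Wilson score interval, written in the weighted-average form."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = x / n
    w = n / (n + z ** 2)                  # weight given to the sample proportion
    mid = p_hat * w + 0.5 * (1 - w)       # weighted average of p_hat and 1/2
    var = (p_hat * (1 - p_hat) * w + 0.25 * (1 - w)) / (n + z ** 2)
    half = z * sqrt(var)
    return mid - half, mid + half


if __name__ == "__main__":
    # With no successes the Wald interval is degenerate; the score interval is not.
    print(wald_interval(0, 10))
    print(score_interval(0, 10))
```

Each endpoint of the score interval satisfies the defining equation $|\hat{p} - p_0| / \sqrt{p_0(1 - p_0)/n} = z_{\alpha/2}$, which gives a quick numerical check of the formula.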
... after adding $z_{.025}^2 = 1.96^2 \approx 4$ pseudo observations, two of each type. That is, their adjusted "add two successes and two failures" interval has the simple form

$$\tilde{p} \pm z_{.025} \sqrt{\tilde{p}(1 - \tilde{p})/\tilde{n}} \qquad (3)$$

but with $\tilde{n} = (n + 4)$ trials and $\tilde{p} = (X + 2)/(n + 4)$. The midpoint equals that of the 95% score confidence interval (rounding $z_{.025}$ to 2.0 for that interval), but the coefficient of $z_{.025}$ uses the variance $\tilde{p}(1 - \tilde{p})/\tilde{n}$ at the weighted average $\tilde{p}$ of $\hat{p}$ and 1/2 rather than the weighted average of the variances; by Jensen's inequality, the adjusted interval is wider than the score interval.

For small samples, the improvement in performance of the adjusted interval compared to the ordinary Wald interval is dramatic. To illustrate, Figure 1 shows the actual coverage probabilities for the nominal 95% Wald and adjusted intervals plotted as a function of $p$, for $n$ = 5, 10, and 20. For all $n$ great improvement occurs for $p$ near 0 or 1. For instance, Brown et al. (1999) stated that when $p = .01$, the size of $n$ required such that the actual coverage probability of a nominal 95% Wald interval is uniformly at least .94 for all $n$ above that value is $n = 7963$, whereas for the adjusted interval this is true for every $n$; when $p = .10$ the values are $n = 646$ for the Wald interval and $n = 11$ for the adjusted interval. The Wald interval behaves especially poorly with small $n$ for $p$ near the boundary, partly because of the nonnegligible probability of having $\hat{p}$ = 0 or 1 and thus the degenerate interval [0, 0] or [1, 1]. Agresti and Coull (1998) recommended the adjusted interval for use in elementary statistics courses, since the Wald interval behaves poorly yet the score interval is too complex for most students. Many students in non-calculus-based courses are mystified by quadratic equations (which are needed to solve for the score interval) and would have difficulty using the weighted average formula above. In such courses, it is often easier to show how to adapt a simple method so that it works well rather than to present a more complex method.

Figure 2. Boxplots of coverage probabilities for nominal 95% adjusted confidence intervals based on adding $t$ pseudo observations; distributions refer to 10,000 cases, with $n_1$ and $n_2$ each chosen uniformly between 10 and 30 and $p_1$ and $p_2$ chosen uniformly between 0 and 1. (Vertical axis: Coverage Probability; horizontal axis: $t$ Pseudo Observations, 0 to 8.)

Let $I_t(n, x)$ denote the adjustment of the Wald interval that adds $t/2$ successes and $t/2$ failures. With confidence levels $(1 - \alpha)$ other than .95, the Agresti and Coull approximation of the score interval uses $I_t(n, x)$ with $t = z_{\alpha/2}^2$ instead of $t = 4$, for instance adding 2.7 pseudo observations for a 90% interval and 5.4 for a 99% interval. Many instructors in elementary courses will find it simpler to tell students to use the same constant for all cases. One will do reasonably well, especially at high nominal confidence levels, by the recipe of always using $t = 4$. The performance of the adjusted interval $I_4(n, x)$ is much better than the Wald interval (1) for the usual confidence levels. To illustrate, Figure 1 also shows coverage probabilities for nominal 99% intervals, when $n$ = 5, 10, 20. Since the .95 confidence level is the most common in practice and since this "add two successes and two failures" adjustment provides strong improvement over the Wald for other levels as well, it is simplest for elementary courses to recommend that adjustment uniformly. Of the elementary texts that recommend adjustment of the Wald interval by adding pseudo observations, some (e.g., McClave and Sincich 2000) direct students to use $I_4(n, x)$ regardless of the confidence coefficient whereas others (e.g., Samuels and Witmer 1999) recommend $t = z_{\alpha/2}^2$.

The purpose of this article is to show that a simple adjustment, adding two successes and two failures (total), also works quite well for two-sample comparisons of proportions. The simple Wald formula (2) improves substantially after adding a pseudo observation of each type to each sample, regarding sample $i$ as $(n_i + 2)$ trials with $\tilde{p}_i = (X_i + 1)/(n_i + 2)$. There is no reason to expect an optimal interval to result from this method, or in particular from adding the same number of pseudo observations to each sample or even the same number of cases of each type, but we restricted attention to this form because of the simplicity of explaining it in a classroom setting.

Table 1 (fragment). Cov. Prob. < .93, $n$ = 10: .880 .090 .010 .100 .235 .072 .046
NOTE: Table reports mean of coverage probabilities $C_t(n, p_1; n, p_2)$, mean of distances $|C_t(n, p_1; n, p_2) - .95|$ from nominal level, mean of expected interval lengths, and proportion of cases with $C_t(n, p_1; n, p_2) < .93$.

2. COMPARING PERFORMANCE OF WALD INTERVALS AND ADJUSTED INTERVALS

For the two-sample comparison of proportions, we now study the performance of the Wald confidence formula (2) after adding $t$ pseudo observations, $t/4$ of each type to each sample, truncating when the interval for $p_1 - p_2$ contains values $< -1$ or $> 1$. Denote this interval by $I_t(n_1, x_1; n_2, x_2)$, or $I_t$ for short, so $I_0$ denotes the ordinary Wald interval. Our discussion refers mainly to the .95 confidence coefficient, but our evaluations also studied .90 and .99 coefficients. Let $C_t(n_1, p_1; n_2, p_2)$, or $C_t$ for short, denote the true coverage probability of a nominal 95% confidence interval $I_t$. We investigated whether there is a $t$ value for which $|C_t(n_1, p_1; n_2, p_2) - .95|$ tends to be small for most
$(p_1, p_2)$, even with small $n_1$ and $n_2$, with $C_t$ rarely very far (say .02) below .95. To explore the performance for a variety of $t$ with small $n_i$, we randomly sampled 10,000 values of $(n_1, p_1; n_2, p_2)$, taking $p_1$ and $p_2$ independently from a uniform distribution over [0, 1] and taking $n_1$ and $n_2$ independently from a uniform distribution over {10, 11, ..., 30}. For each realization we evaluated $C_t(n_1, p_1; n_2, p_2)$ for $t$ between 0 and 8. Figure 2 illustrates results, showing skeletal box plots of $C_t$ for $t$ = 0, 2, 4, 6, 8 (i.e., adding 0, .5, 1, 1.5, 2 observations of each type to each sample).

The ordinary 95% Wald interval behaves poorly. Its coverage probabilities tend to be too small, and they converge to 0 as each $p_i$ moves toward 1 or 0. The coverages for $I_t$ improve greatly for the positive values of $t$. The case $I_4$ with four pseudo observations behaves especially well, having relatively few poor coverage probabilities. For instance, the proportions of cases for $t$ = (0, 2, 4, 6, 8) that had $C_t < .93$ were (.572, .026, .002, .046, .171). Similarly, the proportions of nominal 99% intervals that had actual coverage probability below .97 were (.310, .012, .000, .000, .000), and the proportions of nominal 90% intervals that had actual coverage probability below .88 were (.623, .045, .016, .131, .255). The pattern exhibited here is illustrative of a variety of results from analyzing $C_t$ more closely, as we now discuss.

We analyzed the performance of the $I_t$ interval for various fixed $(n_1, n_2)$ combinations. Table 1 summarizes some characteristics, in an average sense based on taking $(p_1, p_2)$ uniform from the unit square, for $(n_1, n_2)$ = (10, 10), (20, 20), (30, 30), (30, 10). Although the adjusted interval $I_4$ tends to be conservative, it compares well to other cases in the mean of the distances $|C_t - .95|$ and especially the proportion of cases for which $C_t < .93$. For $n_i = 10$, for instance, the actual coverage probability is below .93 for 88% of such cases with the Wald interval, but for only 1% of them with $I_4$. Figure 3 shows the proportions of coverage probabilities that are below .93 as a function of $t$, for $(n_1, n_2)$ = (10, 10) and (30, 10). The improvement over the ordinary Wald interval from adding $t = 4$ pseudo observations is substantial. Remaining figures concentrate on this particular adjustment, which fared well in a variety of evaluations we conducted.

Figure 3. Proportion of $(p_1, p_2)$ cases with $p_1$ and $p_2$ chosen uniformly between 0 and 1 for which nominal 95% adjusted confidence intervals based on adding $t$ pseudo observations have actual coverage probabilities below .93, for $n_1 = n_2 = 10$ and $n_1 = 30$, $n_2 = 10$. (Vertical axis: Proportion Below .93; horizontal axis: $t$ from 0 to 8.)

Averaging performance over the unit square for $(p_1, p_2)$ can mask poor behavior in certain regions, and in practice certain pairings (e.g., $|p_1 - p_2|$ small) are often more common or more important than others. Thus, besides studying these summary expectations, we plotted $C_t$ as a function of $p_1$ for various fixed values of $p_2$, $p_1 - p_2$, and $p_1/p_2$. To illustrate, Figure 4 plots the Wald coverage $C_0$ and the coverage $C_4$ for the adjusted interval, fixing $p_2$ at .1, .3, and .5, for $n_1 = n_2 = 20$. The poor coverage spikes for the Wald interval disappear with $I_4$, but this adjustment is quite conservative when $p_1$ and $p_2$ are both close to 0 or both close to 1. The adjustment $I_4$ performs reasonably well, and much better than the Wald interval, even with very small or unbalanced sample sizes. Figure 5 illustrates, plotting $C_0$ and $C_4$ as a function of $p_1$ with $p_2$ fixed at .3, for $(n_1, n_2)$ = (10, 10), (20, 10), and (40, 10). Figure 6 shows $C_0$ and $C_4$ as a function of $p_1$ when $p_1 - p_2$ = 0 or .2 and when the relative risk $p_1/p_2$ = 2.0 or 4.0, when $n_1 = n_2 = 10$. In Figures 4-6, only rarely does the adjusted interval have coverage significantly below the nominal level. On the other hand, Figures 4 and 6 show that it can be very conservative when $p_1$ and $p_2$ are both close to 0 or 1, say with $(p_1 + p_2)/2$ below about .2 or above about .8 for the small sample sizes studied here. This is preferred, however, to the very low coverages of the Wald interval in these cases. Figures 7 and 8 illustrate their behavior, showing surface plots of $C_0$ and $C_4$ over the unit square when $n_1 = n_2 = 10$. The spikes at values of $p_1$ in Figures 4 and 5 become ridges at values of $p_1 - p_2$ in these figures.

The poor performance of the Wald interval does not occur because it is too short. In fact, for moderate-sized $p_i$ it tends to be too long. For instance, when $n_1 = n_2 = 10$, $I_0$ has greater expected length than $I_4$ for $p_2$ between .11 and .89 when $p_1 = .5$ and for $p_2$ between .18 and .82 when $p_1 = .3$. When $n_1 = n_2 = n$ and when $\hat{p}_1 = \hat{p}_2 = \hat{p}$, $I_0$ has greater length than $I_t$ when $\hat{p}$ falls within $\sqrt{.25 - n(4n + t)/[24n^2 + 12nt + 2t^2]}$ of .5. For all $t > 0$, this interval around .5 shrinks monotonically as $n$ increases to $.50 \pm .50/\sqrt{3}$, or (.21, .79), which applies also to the Agresti and Coull (1998) adjusted interval in the single-sample case. As in the single-proportion case, the Wald interval suffers from having the maximum likelihood estimate exactly in the middle of the interval.

There is nothing unique about $t = 4$ pseudo observations in providing good performance of adjusted intervals in the one- and two-sample problems. For instance, Figure 3 and Table 1 show that other adjustments often work well. A region of $t$ values provides substantial improvement over the Wald interval, with values near $t = 2$ being less conservative than $t = 4$. We emphasized the case $t = 4$ earlier for the two-sample case because it rarely has poor coverage. We believe it is worth permitting some conservativeness to
Figure 6. Coverage probabilities for nominal 95% Wald and adjusted confidence intervals (adding $t = 4$ pseudo observations) as a function of $p_1$ when $p_1 - p_2$ = 0 or .2 and when $p_1/p_2$ = 2 or 4, for $n_1 = n_2 = 10$. (Vertical axis: Coverage Probability; horizontal axis: $p_1$; legend: Wald, Adjusted.)
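The coverage probabilities $C_t$ plotted in the figures can be computed exactly by summing the joint binomial probabilities of all outcomes $(x_1, x_2)$ whose interval contains $p_1 - p_2$. The following Python sketch does this for the interval $I_t$ of Section 2. The function names are hypothetical, and treating the interval as closed (so that a degenerate interval covers a difference lying exactly on it) is an assumption of this sketch rather than something stated in the article.

```python
from math import comb, sqrt
from statistics import NormalDist


def adjusted_diff_interval(x1, n1, x2, n2, t=4, conf=0.95):
    """I_t: Wald formula (2) after adding t/4 successes and t/4 failures
    to each sample, truncated to [-1, 1]. t = 0 gives the ordinary Wald interval."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    m1, m2 = n1 + t / 2, n2 + t / 2            # adjusted sample sizes
    q1 = (x1 + t / 4) / m1                     # adjusted proportions
    q2 = (x2 + t / 4) / m2
    half = z * sqrt(q1 * (1 - q1) / m1 + q2 * (1 - q2) / m2)
    d = q1 - q2
    return max(-1.0, d - half), min(1.0, d + half)


def binom_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def coverage(n1, p1, n2, p2, t=4, conf=0.95):
    """Exact coverage probability C_t(n1, p1; n2, p2) by full enumeration.
    Assumption: the interval is treated as closed at both endpoints."""
    total = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            lo, hi = adjusted_diff_interval(x1, n1, x2, n2, t, conf)
            if lo <= p1 - p2 <= hi:
                total += binom_pmf(x1, n1, p1) * binom_pmf(x2, n2, p2)
    return total


if __name__ == "__main__":
    # Wald (t = 0) versus adjusted (t = 4) coverage, n1 = n2 = 10, p1 - p2 = .2
    for p1 in (0.25, 0.5, 0.75):
        print(p1, coverage(10, p1, 10, p1 - 0.2, t=0), coverage(10, p1, 10, p1 - 0.2, t=4))
```

Enumeration costs $(n_1 + 1)(n_2 + 1)$ interval evaluations per parameter setting, which is negligible for the sample sizes considered here.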
... with the interval for $p_1 - p_2$ extending below $-1$ or above $+1$ and thus requiring truncation. Overshoot for $I_t$ is less common as $t$ increases. For instance, for these randomly selected cases, the mean probability of overshoot for $t$ = (0, 2, 4, 6, 8) was (.048, .033, .016, .006, .000).

Since standard intervals for $p$ and $p_1 - p_2$ improve greatly with adjustment corresponding to shrinkage of point estimates, one would expect intervals resulting from a Bayesian approach with comparable shrinkage also to perform well in a frequentist sense. Carlin and Louis (1996, pp. 117-123) provided evidence of this type for estimating $p$. For $p_1 - p_2$, consider independent uniform prior distributions for $p_1$ and $p_2$. The posterior distribution of $p_i$ is beta with mean $\tilde{p}_i = (X_i + 1)/(n_i + 2)$ and variance $\tilde{p}_i(1 - \tilde{p}_i)/(n_i + 3)$. Using a crude normal approximation for the distribution of the difference of the posterior beta variates leads to the interval

$$(\tilde{p}_1 - \tilde{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\tilde{p}_1(1 - \tilde{p}_1)}{n_1 + 3} + \frac{\tilde{p}_2(1 - \tilde{p}_2)}{n_2 + 3}}. \qquad (4)$$

This has the same center as the adjusted interval $I_4$ but uses $n_i + 3$ instead of $n_i + 2$ in the denominators of the standard error. For elementary courses, this interval was suggested by Berry (1996, p. 291). Like Newcombe's hybrid score interval, it tends to perform quite well, being slightly shorter and less conservative than $I_4$ but suffering occasional poorer coverages (see Table 1). For sample size combinations we considered, its minimum coverage probability was only slightly below that for the adjusted interval. If conservativeness is a concern (e.g., if both $p_i$ are likely to be close to 0), the approximate Bayes and hybrid score intervals are slightly preferable to $I_4$.

The adjusted interval $I_4$ (and the similar approximate Bayes interval (4)) is simpler than other methods that improve greatly over the Wald interval. Thus, we believe it is appropriate for elementary statistics courses. We do not claim optimality in any sense or that other methods may not be better for some purposes. Some applications, for instance, may require that the true confidence level be no lower than the nominal level, mandating a method that is necessarily conservative (e.g., Chan and Zhang 1999). Also, we recommend $I_4$ for interval estimation and not for an implicit test of $H_0: p_1 - p_2 = 0$, although such a test would be more reliable than one based on the Wald interval. For a significance test, we would continue to teach the Pearson chi-squared test in elementary courses. The test based on $I_4$ is too conservative when the common value of $p_i$ under the null is close to 0 or close to 1, for most sample sizes more conservative than the Pearson test for such $p_i$. Although the adjusted interval is not guaranteed to be consistent with the result of the Pearson test, it usually does agree. For instance, for common values (.1, .2, .3, .4, .5) of $p_i$, the 95% version of $I_4$ and the Pearson test with nominal significance level of .05 agree with probability (.972, .996, .9996, 1.000, 1.000) when $n_1 = n_2 = 30$ and (1.0, 1.0, 1.0, 1.0, 1.0) when $n_1 = n_2 = 10$.

Finally, an alternative way to improve the Wald method is with a continuity correction (Fleiss 1981, p. 29). As with other continuity corrections, this generally results in conservative performance, usually more so than the adjusted interval. However, the coverage probabilities, like those of the Wald interval, can dip substantially below the nominal level when both $p_i$ are near 0 or 1.

4. TEACHING THE ADJUSTED INTERVALS

Agresti and Coull (1998) motivated their adjusted interval (3) for a single proportion as a simple approximation for the score 95% confidence interval. We know of no such simple motivation for the adjusted interval for the two-sample comparison, other than the similarity with the Bayesian interval (4). A problem for future research is to study whether theoretical support exists for this simple yet effective adjustment, such as Edgeworth or saddlepoint expansions that might provide improved approximations for the tail behavior of $\hat{p}_1 - \hat{p}_2$.

The motivation needed for teaching in the elementary statistics course is quite different. How can one motivate adding pseudo observations? In the single-sample case we remind students that the binomial distribution is highly skewed as $p$ approaches 0 and 1, and because of this perhaps $\hat{p}$ should not be the midpoint of the interval. As support for this, we have students use the software ExplorStat (available at http://www.stat.ufl.edu/-dwack/). Through simulation it shows how operating characteristics of statistical methods change as students vary sample sizes and population distributions. For instance, when $p$ takes values such as .10 or .90, students observe a relatively high proportion of Wald intervals failing to contain $p$ when $n$ is 30, the sample size their text suggests is adequate for large-sample inference for a mean.

Most students, however, seem more convinced by specific examples where the Wald method seems nonsensical, such as when $\hat{p}$ = 0 or 1. We often use data from a questionnaire administered to the students at the beginning of term. For instance, one of us (Agresti) taught a class to 24 honors students in fall 1999. In response to the question, "Are you a vegetarian?", 0 of the 24 students responded "yes," yet they realized that the Wald interval of [0, 0] was not plausible for a corresponding population proportion. We have also used homework exercises such as estimating the probability of success for a new medical treatment when all 10 subjects in a sample experience success, or estimating the probability of death due to suicide when a sample of 30 death records has no occurrences. (Again, the Wald interval is [0, 0], but the National Center for Health Statistics reports that in the United States the probability of death due to suicide is about .01.) Although one can amend the Wald method to improve its behavior when $\hat{p}$ = 0 or 1, such as by replacing the endpoints by ones based on the exact binomial test, making such exceptions from a general recipe distracts students from the main idea of taking the estimate plus and minus a normal-score multiple of a standard error.
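The classroom examples above can be reproduced with a few lines of Python. This is a minimal sketch comparing the Wald interval (1) with the adjusted "add two successes and two failures" interval (3) for the three data sets mentioned. The function names are hypothetical, $z_{.025}$ is taken as 1.96, and truncating the adjusted interval to [0, 1] is an assumption of this sketch.

```python
from math import sqrt

Z_95 = 1.96  # z_.025, the normal-score multiple for a 95% interval


def wald_interval(x, n, z=Z_95):
    """Wald interval (1) for a single proportion."""
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def adjusted_interval(x, n, z=Z_95):
    """Adjusted interval (3): the Wald formula applied to n + 4 trials
    with x + 2 successes. Truncation to [0, 1] is an assumption here."""
    n_tilde = n + 4
    p_tilde = (x + 2) / n_tilde
    half = z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return max(0.0, p_tilde - half), min(1.0, p_tilde + half)


# (successes, trials) for the examples described in the text
examples = {
    "vegetarians among 24 students": (0, 24),
    "successes among 10 treated subjects": (10, 10),
    "suicides among 30 death records": (0, 30),
}

if __name__ == "__main__":
    for label, (x, n) in examples.items():
        print(label, wald_interval(x, n), adjusted_interval(x, n))
```

In each case the Wald interval collapses to a single point, whereas the adjusted interval has positive width and keeps the same "estimate plus and minus a multiple of a standard error" recipe.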
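For the two-sample comparison, the adjusted interval $I_4$ and the approximate Bayes interval (4) differ only in the denominators of the standard error ($n_i + 2$ versus $n_i + 3$). The sketch below makes that relationship explicit. The function names and the example counts (7 of 10 versus 3 of 10) are hypothetical and chosen only for illustration, and truncation to $[-1, 1]$ is applied to both intervals as an assumption.

```python
from math import sqrt


def shrunk_diff_interval(x1, n1, x2, n2, denom_add, z=1.96):
    """Interval centered at the difference of (X_i + 1)/(n_i + 2), with
    standard error using n_i + denom_add in the denominators."""
    q1 = (x1 + 1) / (n1 + 2)
    q2 = (x2 + 1) / (n2 + 2)
    se = sqrt(q1 * (1 - q1) / (n1 + denom_add) + q2 * (1 - q2) / (n2 + denom_add))
    d = q1 - q2
    return max(-1.0, d - z * se), min(1.0, d + z * se)


def adjusted_i4(x1, n1, x2, n2, z=1.96):
    """Adjusted interval I_4: one pseudo success and one pseudo failure per sample."""
    return shrunk_diff_interval(x1, n1, x2, n2, denom_add=2, z=z)


def approx_bayes(x1, n1, x2, n2, z=1.96):
    """Approximate Bayes interval (4) under independent uniform priors."""
    return shrunk_diff_interval(x1, n1, x2, n2, denom_add=3, z=z)


if __name__ == "__main__":
    # Hypothetical data: 7 successes in 10 trials versus 3 successes in 10 trials
    print(adjusted_i4(7, 10, 3, 10))
    print(approx_bayes(7, 10, 3, 10))
```

Both intervals share the same center when no truncation occurs; the approximate Bayes interval is slightly shorter, consistent with the comparison made in the text.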