
Simple and Effective Confidence Intervals for Proportions and Differences of Proportions Result from Adding Two Successes and Two Failures


Author(s): Alan Agresti and Brian Caffo
Source: The American Statistician, Vol. 54, No. 4 (Nov., 2000), pp. 280-288
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2685779 .
Accessed: 08/03/2012 19:06

Teacher's Corner
The standard confidence intervals for proportions and their differences used in introductory statistics courses have poor performance, the actual coverage probability often being much lower than intended. However, simple adjustments of these intervals based on adding four pseudo observations, half of each type, perform surprisingly well even for small samples. To illustrate, for a broad variety of parameter settings with 10 observations in each sample, a nominal 95% interval for the difference of proportions has actual coverage probability below .93 in 88% of the cases with the standard interval but in only 1% with the adjusted interval; the mean distance between the nominal and actual coverage probabilities is .06 for the standard interval, but .01 for the adjusted one. In teaching with these adjusted intervals, one can bypass awkward sample size guidelines and use the same formulas with small and large samples.

KEY WORDS: Binomial distribution; Score test; Small sample; Wald test.

Alan Agresti is Professor, and Brian Caffo is a Graduate Student, Department of Statistics, University of Florida, Gainesville, FL 32611-8545 (E-mail: AA@STAT.UFL.EDU). This work was partially supported by grants from the National Institutes of Health and the National Science Foundation. The authors appreciate helpful comments from Brent Coull and Yongyi Min.

1. INTRODUCTION

Let $X$ denote a binomial variate for $n$ trials with parameter $p$, denoted bin$(n, p)$, and let $\hat{p} = X/n$ denote the sample proportion. For two independent samples, let $X_1$ be bin$(n_1, p_1)$, and let $X_2$ be bin$(n_2, p_2)$. Let $z_a$ denote the $1 - a$ quantile of the standard normal distribution. Nearly all elementary statistics textbooks present the following confidence intervals for $p$ and $p_1 - p_2$:

* An approximate $100(1 - \alpha)\%$ confidence interval for $p$ is

$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}. \qquad (1)$$

* An approximate $100(1 - \alpha)\%$ confidence interval for $p_1 - p_2$ is

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}. \qquad (2)$$

These confidence intervals result from inverting large-sample Wald tests, which evaluate standard errors at the maximum likelihood estimates. For instance, the interval for $p$ is the set of $p_0$ values for which $|\hat{p} - p_0| / \sqrt{\hat{p}(1-\hat{p})/n} < z_{\alpha/2}$; that is, the set of $p_0$ having P value exceeding $\alpha$ in testing $H_0: p = p_0$ against $H_a: p \neq p_0$ using the approximately normal test statistic. The intervals are sometimes called Wald intervals. Although these intervals are simple and natural for students who have previously seen analogous large-sample formulas for means, a considerable literature shows that they behave poorly (e.g., Ghosh 1979; Vollset 1993; Newcombe 1998a, 1998b). This can be true even when the sample size is very large (Brown, Cai, and DasGupta 1999). In this article, we describe simple adjustments of these intervals that perform much better but can be easily taught in the typical non-calculus-based statistics course.

These references showed that a much better confidence interval for a single proportion is based on inverting the test with standard error evaluated at the null hypothesis, which is the score test approach. This confidence interval, due to Wilson (1927), is the set of $p_0$ values for which $|\hat{p} - p_0| / \sqrt{p_0(1-p_0)/n} < z_{\alpha/2}$, which is

$$\hat{p}\left(\frac{n}{n+z_{\alpha/2}^2}\right) + \frac{1}{2}\left(\frac{z_{\alpha/2}^2}{n+z_{\alpha/2}^2}\right) \pm z_{\alpha/2}\sqrt{\frac{1}{n+z_{\alpha/2}^2}\left[\hat{p}(1-\hat{p})\left(\frac{n}{n+z_{\alpha/2}^2}\right) + \left(\frac{1}{2}\right)\left(\frac{1}{2}\right)\left(\frac{z_{\alpha/2}^2}{n+z_{\alpha/2}^2}\right)\right]}.$$

The midpoint is a weighted average of $\hat{p}$ and 1/2, and it equals the sample proportion after adding $z_{\alpha/2}^2$ pseudo observations, half of each type. The square of the coefficient of $z_{\alpha/2}$ in this formula is a weighted average of the variance of a sample proportion when $p = \hat{p}$ and the variance of a sample proportion when $p = 1/2$, using $n + z_{\alpha/2}^2$ in place of the usual sample size $n$. For the 95% case, Agresti and Coull (1998) used this representation to motivate approximating the score interval by the ordinary Wald interval (1)

The American Statistician, November 2000, Vol. 54, No. 4. © 2000 American Statistical Association.
[Figure 1: six panels of coverage probability (vertical axis) against p (horizontal axis, 0 to 1), for nominal 95% (top row) and 99% (bottom row) intervals and n = 5, 10, 20 (columns); curves: Wald, Adjusted.]

Figure 1. Coverage probabilities for the binomial parameter p with the nominal 95% and 99% Wald confidence interval and the adjusted interval based on adding four pseudo observations, for n = 5, 10, 20.
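
The coverage probabilities plotted in Figure 1 can be computed exactly rather than simulated: for fixed n and p, sum the binomial probabilities of every outcome x whose interval contains p. The following is a minimal sketch under that reading, not the authors' program; the function names are hypothetical, and the constant 1.96 for $z_{.025}$ is applied to both intervals.

```python
from math import comb, sqrt

Z95 = 1.96  # z_{.025}, the normal quantile for a nominal 95% interval


def wald_interval(x, n, z=Z95):
    """Ordinary Wald interval (1): p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def adjusted_interval(x, n, z=Z95):
    """Wald formula applied after adding two successes and two failures."""
    return wald_interval(x + 2, n + 4, z)


def coverage(interval, n, p, z=Z95):
    """Exact coverage: total binomial probability of outcomes whose interval contains p."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n, z)
        if lo <= p <= hi:
            total += comb(n, x) * p**x * (1 - p) ** (n - x)
    return total


# Compare the two intervals at p = .10 for the sample sizes shown in Figure 1.
for n in (5, 10, 20):
    print(n, round(coverage(wald_interval, n, 0.10), 3),
          round(coverage(adjusted_interval, n, 0.10), 3))
```

Evaluating `coverage` over a fine grid of p values would trace out curves of the kind shown in the figure; truncating the intervals to [0, 1] is omitted here because it does not change whether p is covered.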

after adding $z_{.025}^2 = 1.96^2 \approx 4$ pseudo observations, two of each type. That is, their adjusted "add two successes and two failures" interval has the simple form

$$\tilde{p} \pm z_{.025}\sqrt{\tilde{p}(1-\tilde{p})/\tilde{n}} \qquad (3)$$

but with $\tilde{n} = (n + 4)$ trials and $\tilde{p} = (X + 2)/(n + 4)$. The midpoint equals that of the 95% score confidence interval (rounding $z_{.025}$ to 2.0 for that interval), but the coefficient of $z_{.025}$ uses the variance $\tilde{p}(1-\tilde{p})/\tilde{n}$ at the weighted average $\tilde{p}$ of $\hat{p}$ and 1/2 rather than the weighted average of the variances; by Jensen's inequality, the adjusted interval is wider than the score interval.

For small samples, the improvement in performance of the adjusted interval compared to the ordinary Wald interval is dramatic. To illustrate, Figure 1 shows the actual coverage probabilities for the nominal 95% Wald and adjusted intervals plotted as a function of $p$, for $n$ = 5, 10, and 20. For all $n$ great improvement occurs for $p$ near 0 or 1. For instance, Brown et al. (1999) stated that when $p = .01$, the size of $n$ required such that the actual coverage probability of a nominal 95% Wald interval is uniformly at least .94 for all $n$ above that value is $n = 7963$, whereas for the adjusted interval this is true for every $n$; when $p = .10$ the values are $n = 646$ for the Wald interval and $n = 11$ for the adjusted interval. The Wald interval behaves especially poorly with small $n$ for $p$ near the boundary, partly because of the nonnegligible probability of having $\hat{p}$ = 0 or 1 and thus the degenerate interval [0, 0] or [1, 1]. Agresti and Coull (1998) recommended the adjusted interval for use in elementary statistics courses, since the Wald interval behaves poorly yet the score interval is too complex for most students. Many students in non-calculus-based courses are mystified by quadratic equations (which are needed to solve for the score interval) and would have difficulty using the weighted average formula above. In such courses, it is often easier to show how to adapt a simple method so that it works well rather than to present a more complex method.

[Figure 2: box plots of coverage probability (vertical axis, 0 to 1.0) against t pseudo observations (horizontal axis, t = 0, 2, 4, 6, 8).]

Figure 2. Boxplots of coverage probabilities for nominal 95% adjusted confidence intervals based on adding t pseudo observations; distributions refer to 10,000 cases, with n1 and n2 each chosen uniformly between 10 and 30 and p1 and p2 chosen uniformly between 0 and 1.

Let $I_t(n, x)$ denote the adjustment of the Wald interval that adds $t/2$ successes and $t/2$ failures. With confidence levels $(1 - \alpha)$ other than .95, the Agresti and Coull approximation of the score interval uses $I_t(n, x)$ with $t = z_{\alpha/2}^2$
Table 1. Summary of Performance of Nominal 95% Confidence Intervals for p1 - p2 Based on Adding t Pseudo Observations, Averaging with Respect to a Uniform Distribution for (p1, p2).

                               Number of Pseudo Observations t     Hybrid   Approximate
Characteristic     (n1, n2)      0       2       4       6       8    Score    Bayes
Coverage           (10, 10)    .891    .949    .960    .958    .945    .954    .952
                   (20, 20)    .924    .949    .956    .955    .948    .953    .951
                   (30, 30)    .933    .949    .954    .954    .949    .950    .951
                   (30, 10)    .895    .948    .959    .959    .950    .950    .952
Distance           (10, 10)    .059    .014    .013    .020    .035    .014    .012
                   (20, 20)    .026    .008    .008    .012    .022    .009    .007
                   (30, 30)    .017    .006    .006    .008    .016    .008    .006
                   (30, 10)    .055    .018    .012    .013    .023    .010    .011
Length             (10, 10)    .647    .670    .673    .668    .659    .654    .647
                   (20, 20)    .480    .487    .488    .487    .485    .481    .477
                   (30, 30)    .398    .401    .401    .401    .401    .398    .396
                   (30, 10)    .537    .551    .553    .551    .545    .537    .536
Cov. Prob. < .93   (10, 10)    .880    .090    .010    .100    .235    .072    .046
                   (20, 20)    .404    .016    .002    .046    .175    .020    .008
                   (30, 30)    .180    .005    .000    .023    .131    .009    .002
                   (30, 10)    .934    .112    .004    .028    .173    .029    .018

NOTE: Table reports mean of coverage probabilities Ct(n1, p1; n2, p2), mean of distances |Ct(n1, p1; n2, p2) - .95| from nominal level, mean of expected interval lengths, and proportion of cases with Ct(n1, p1; n2, p2) < .93.
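
The quantities summarized in Table 1 rest on the exact coverage probability of the two-sample interval that applies the Wald formula (2) after adding t/4 pseudo observations of each type to each sample, truncated to [-1, 1]. The sketch below shows one way to compute it and to form rough summaries; the function names are hypothetical, and the averaging uses a coarse grid of midpoints as an assumption in place of the uniform distribution in the table, so its output should not be expected to reproduce the tabled values exactly.

```python
from math import comb, sqrt


def binom_pmf(x, n, p):
    """Binomial probability P(X = x) for n trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def interval_t(x1, n1, x2, n2, t=4.0, z=1.96):
    """Wald formula (2) after adding t/4 successes and t/4 failures to each sample."""
    m1, m2 = n1 + t / 2, n2 + t / 2              # adjusted numbers of trials
    q1, q2 = (x1 + t / 4) / m1, (x2 + t / 4) / m2  # adjusted sample proportions
    half = z * sqrt(q1 * (1 - q1) / m1 + q2 * (1 - q2) / m2)
    d = q1 - q2
    return max(-1.0, d - half), min(1.0, d + half)  # truncate to [-1, 1]


def coverage_t(n1, p1, n2, p2, t=4.0, z=1.96):
    """Exact coverage: sum joint binomial probabilities of outcomes whose interval covers p1 - p2."""
    delta = p1 - p2
    total = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            lo, hi = interval_t(x1, n1, x2, n2, t, z)
            if lo <= delta <= hi:
                total += binom_pmf(x1, n1, p1) * binom_pmf(x2, n2, p2)
    return total


def grid_summary(n1, n2, t, steps=20):
    """Mean coverage and proportion of grid points with coverage below .93 (grid is an assumption)."""
    grid = [(i + 0.5) / steps for i in range(steps)]
    covs = [coverage_t(n1, p1, n2, p2, t) for p1 in grid for p2 in grid]
    return sum(covs) / len(covs), sum(c < 0.93 for c in covs) / len(covs)


for t in (0, 4):
    print(t, grid_summary(10, 10, t))
```

Setting t = 0 gives the ordinary Wald interval, so the same function covers every column of pseudo-observation counts in the table.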

instead of $t = 4$, for instance adding 2.7 pseudo observations for a 90% interval and 5.4 for a 99% interval. Many instructors in elementary courses will find it simpler to tell students to use the same constant for all cases. One will do reasonably well, especially at high nominal confidence levels, by the recipe of always using $t = 4$. The performance of the adjusted interval $I_4(n, x)$ is much better than the Wald interval (1) for the usual confidence levels. To illustrate, Figure 1 also shows coverage probabilities for nominal 99% intervals, when $n$ = 5, 10, 20. Since the .95 confidence level is the most common in practice and since this "add two successes and two failures" adjustment provides strong improvement over the Wald for other levels as well, it is simplest for elementary courses to recommend that adjustment uniformly. Of the elementary texts that recommend adjustment of the Wald interval by adding pseudo observations, some (e.g., McClave and Sincich 2000) direct students to use $I_4(n, x)$ regardless of the confidence coefficient whereas others (e.g., Samuels and Witmer 1999) recommend $t = z_{\alpha/2}^2$.

The purpose of this article is to show that a simple adjustment, adding two successes and two failures (total), also works quite well for two-sample comparisons of proportions. The simple Wald formula (2) improves substantially after adding a pseudo observation of each type to each sample, regarding sample $i$ as $(n_i + 2)$ trials with $\tilde{p}_i = (X_i + 1)/(n_i + 2)$. There is no reason to expect an optimal interval to result from this method, or in particular from adding the same number of pseudo observations to each sample or even the same number of cases of each type, but we restricted attention to this form because of the simplicity of explaining it in a classroom setting.

2. COMPARING PERFORMANCE OF WALD INTERVALS AND ADJUSTED INTERVALS

For the two-sample comparison of proportions, we now study the performance of the Wald confidence formula (2) after adding $t$ pseudo observations, $t/4$ of each type to each sample, truncating when the interval for $p_1 - p_2$ contains values < -1 or > 1. Denote this interval by $I_t(n_1, x_1; n_2, x_2)$, or $I_t$ for short, so $I_0$ denotes the ordinary Wald interval. Our discussion refers mainly to the .95 confidence coefficient, but our evaluations also studied .90 and .99 coefficients. Let $C_t(n_1, p_1; n_2, p_2)$, or $C_t$ for short, denote the true coverage probability of a nominal 95% confidence interval $I_t$. We investigated whether there is a $t$ value for which $|C_t(n_1, p_1; n_2, p_2) - .95|$ tends to be small for most
[Figure 3: two panels of proportion below .93 (vertical axis, 0 to 1) against t pseudo observations (horizontal axis, 0 to 8); panels: n1 = n2 = 10 and n1 = 30, n2 = 10.]

Figure 3. Proportion of (p1, p2) cases with p1 and p2 chosen uniformly between 0 and 1 for which nominal 95% adjusted confidence intervals based on adding t pseudo observations have actual coverage probabilities below .93, for n1 = n2 = 10 and n1 = 30, n2 = 10.

$(p_1, p_2)$, even with small $n_1$ and $n_2$, with $C_t$ rarely very far (say .02) below .95. To explore the performance for a variety of $t$ with small $n_i$, we randomly sampled 10,000 values of $(n_1, p_1; n_2, p_2)$, taking $p_1$ and $p_2$ independently from a uniform distribution over [0, 1] and taking $n_1$ and $n_2$ independently from a uniform distribution over {10, 11, ..., 30}. For each realization we evaluated $C_t(n_1, p_1; n_2, p_2)$ for $t$ between 0 and 8. Figure 2 illustrates results, showing skeletal box plots of $C_t$ for $t$ = 0, 2, 4, 6, 8 (i.e., adding 0, .5, 1, 1.5, 2 observations of each type to each sample).

The ordinary 95% Wald interval behaves poorly. Its coverage probabilities tend to be too small, and they converge to 0 as each $p_i$ moves toward 1 or 0. The coverages for $I_t$ improve greatly for the positive values of $t$. The case $I_4$ with four pseudo observations behaves especially well, having relatively few poor coverage probabilities. For instance, the proportion of cases for $t$ = (0, 2, 4, 6, 8) that had $C_t$ < .93 were (.572, .026, .002, .046, .171). Similarly, the proportion of nominal 99% intervals that had actual coverage probability below .97 were (.310, .012, .000, .000, .000), and the proportion of nominal 90% intervals that had actual coverage probability below .88 were (.623, .045, .016, .131, .255). The pattern exhibited here is illustrative of a variety of results from analyzing $C_t$ more closely, as we now discuss.

We analyzed the performance of the $I_t$ interval for various fixed $(n_1, n_2)$ combinations. Table 1 summarizes some characteristics, in an average sense based on taking $(p_1, p_2)$ uniform from the unit square, for $(n_1, n_2)$ = (10, 10), (20, 20), (30, 30), (30, 10). Although the adjusted interval $I_4$ tends to be conservative, it compares well to other cases in the mean of the distances $|C_t - .95|$ and especially the proportion of cases for which $C_t$ < .93. For $n_i$ = 10, for instance, the actual coverage probability is below .93 for 88% of such cases with the Wald interval, but for only 1% of them with $I_4$. Figure 3 shows the proportions of coverage probabilities that are below .93 as a function of $t$, for $(n_1, n_2)$ = (10, 10) and (30, 10). The improvement over the ordinary Wald interval from adding $t = 4$ pseudo observations is substantial. Remaining figures concentrate on this particular adjustment, which fared well in a variety of evaluations we conducted.

Averaging performance over the unit square for $(p_1, p_2)$ can mask poor behavior in certain regions, and in practice certain pairings (e.g., $|p_1 - p_2|$ small) are often more common or more important than others. Thus, besides studying these summary expectations, we plotted $C_t$ as a function of $p_1$ for various fixed values of $p_2$, $p_1 - p_2$, and $p_1/p_2$. To illustrate, Figure 4 plots the Wald coverage $C_0$ and the coverage $C_4$ for the adjusted interval, fixing $p_2$ at .1, .3, and .5, for $n_1 = n_2 = 20$. The poor coverage spikes for the Wald interval disappear with $I_4$, but this adjustment is quite conservative when $p_1$ and $p_2$ are both close to 0 or both close to 1. The adjustment $I_4$ performs reasonably well, and much better than the Wald interval, even with very small or unbalanced sample sizes. Figure 5 illustrates, plotting $C_0$ and $C_4$ as a function of $p_1$ with $p_2$ fixed at .3, for $(n_1, n_2)$ = (10, 10), (20, 10), and (40, 10). Figure 6 shows $C_0$ and $C_4$ as a function of $p_1$ when $p_1 - p_2$ = 0 or .2 and when the relative risk $p_1/p_2$ = 2.0 or 4.0, when $n_1 = n_2 = 10$. In Figures 4-6, only rarely does the adjusted interval have coverage significantly below the nominal level. On the other hand, Figures 4 and 6 show that it can be very conservative when $p_1$ and $p_2$ are both close to 0 or 1, say with $(p_1 + p_2)/2$ below about .2 or above about .8 for the small sample sizes studied here. This is preferred, however, to the very low coverages of the Wald interval in these cases. Figures 7 and 8 illustrate their behavior, showing surface plots of $C_0$ and $C_4$ over the unit square when $n_1 = n_2 = 10$. The spikes at values of $p_1$ in Figures 4 and 5 become ridges at values of $p_1 - p_2$ in these figures.

[Figure 4: three panels of coverage probability (vertical axis, .80 to 1.00) against p1 (horizontal axis, 0 to 1); panels: p2 = .1, p2 = .3, p2 = .5; curves: Wald, Adjusted.]

Figure 4. Coverage probabilities for nominal 95% Wald and adjusted confidence intervals (adding t = 4 pseudo observations) as a function of p1 when p2 = .1, .3, .5, with n1 = n2 = 20.

[Figure 5: three panels of coverage probability (vertical axis, .70 to 1.00) against p1 (horizontal axis, 0 to 1); panels: n1 = n2 = 10; n1 = 20, n2 = 10; n1 = 40, n2 = 10; curves: Wald, Adjusted.]

Figure 5. Coverage probabilities for nominal 95% Wald and adjusted confidence intervals (adding t = 4 pseudo observations) as a function of p1 when p2 = .3, for n1 = n2 = 10; n1 = 20, n2 = 10; and n1 = 40, n2 = 10.

The poor performance of the Wald interval does not occur because it is too short. In fact, for moderate-sized $p_i$ it tends to be too long. For instance, when $n_1 = n_2 = 10$, $I_0$ has greater expected length than $I_4$ for $p_2$ between .11 and .89 when $p_1 = .5$ and for $p_2$ between .18 and .82 when $p_1 = .3$. When $n_1 = n_2 = n$ and when $\hat{p}_1 = \hat{p}_2 = \hat{p}$, $I_0$ has greater length than $I_t$ when $\hat{p}$ falls within $\sqrt{.25 - n(4n + t)/[24n^2 + 12nt + 2t^2]}$ of .5. For all $t > 0$, this interval around .5 shrinks monotonically as $n$ increases to $.50 \pm .50/\sqrt{3}$, or (.21, .79), which applies also to the Agresti and Coull (1998) adjusted interval in the single-sample case. As in the single-proportion case, the Wald interval suffers from having the maximum likelihood estimate exactly in the middle of the interval.

There is nothing unique about $t = 4$ pseudo observations in providing good performance of adjusted intervals in the one- and two-sample problems. For instance, Figure 3 and Table 1 show that other adjustments often work well. A region of $t$ values provides substantial improvement over the Wald interval, with values near $t = 2$ being less conservative than $t = 4$. We emphasized the case $t = 4$ earlier for the two-sample case because it rarely has poor coverage. We believe it is worth permitting some conservativeness to
[Figure 6: four panels of coverage probability (vertical axis, .6 to 1.0) against p1 (horizontal axis, 0 to 1); panels: p1 - p2 = 0, p1 - p2 = .2, p1/p2 = 2, p1/p2 = 4; curves: Wald, Adjusted.]

Figure 6. Coverage probabilities for nominal 95% Wald and adjusted confidence intervals (adding t = 4 pseudo observations) as a function of p1 when p1 - p2 = 0 or .2 and when p1/p2 = 2 or 4, for n1 = n2 = 10.

ensure that the coverage probability rarely falls much below the nominal level. In the one-sample case the adjusted interval $I_2(n, x)$ is better than $I_4(n, x)$ in approximating the score interval with small confidence levels, such as 90%. An advantage of the interval $I_2(n, x)$ for $p$ is consistency between the single-sample case and our recommended adjustment $I_4(n_1, x_1; n_2, x_2)$ for two samples. For instance, as $n_2 \to \infty$ and the second sample yields a perfect estimate, the resulting "add two successes and two failures" two-sample interval uses the first sample in the same way as does the "add one success and one failure" single-sample interval. However, for the single-sample problem we prefer the $I_4(n, x)$ interval, since .95 is by far the most common confidence level in practice and this interval works somewhat better than $I_2(n, x)$ in that case.

3. COMPARING THE ADJUSTED INTERVAL WITH OTHER GOOD INTERVALS

Many methods have been proposed for improving on the ordinary Wald confidence interval for $p_1 - p_2$. Since this article discusses methods appropriate in elementary statistics courses, it focuses on the simple $I_t$ adjustment rather than methods that may be suggested by statistical principles. To find a good method more generally, one approach is to invert a test of $H_0: p_1 - p_2 = \Delta$ that has good properties, such as using the large-sample score test (Mee 1984) or profile likelihood methods (Newcombe 1998b). The score test of $p_1 - p_2 = 0$ is the familiar Pearson chi-squared test, so this approach has the advantage that the confidence interval is consistent with the most commonly taught test of the same nominal level. The method of obtaining the confidence interval is too complex for elementary courses, however, partly because the test of $p_1 - p_2 = \Delta$ requires finding the maximum likelihood estimates of $(p_1, p_2)$ for the standard error subject to the constraint $p_1 - p_2 = \Delta$.

Newcombe (1998b) evaluated various confidence interval methods for $p_1 - p_2$. He proposed a method that performs substantially better than the Wald interval and similar to the score interval, while being computationally simpler (although too complex for most elementary statistics courses). His method is a hybrid of results from the single-sample score intervals for $p_1$ and $p_2$. Specifically, let $(\ell_i, u_i)$ be the roots for $p_i$ in $z_{\alpha/2} = |\hat{p}_i - p_i| / \sqrt{p_i(1 - p_i)/n_i}$. Newcombe's
hybrid score interval is

$$\left( (\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\frac{\ell_1(1-\ell_1)}{n_1} + \frac{u_2(1-u_2)}{n_2}},\; (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\frac{u_1(1-u_1)}{n_1} + \frac{\ell_2(1-\ell_2)}{n_2}} \right).$$

Compared to the adjusted interval $I_4$, the hybrid score interval also is conservative when $p_1$ and $p_2$ are both close to 0 or 1; overall, it is less conservative, however, with mean coverage probability closer to the nominal level (see Table 1). Likewise, it tends to be a bit shorter. It has a somewhat higher proportion of cases with coverage probability being too small, mainly for values of $|p_1 - p_2|$ near 1; for the 10,000 randomly selected cases with $n_i$ also random between 10 and 30, the minimum coverage probability was .92 for the 95% adjusted interval and .86 for the 95% hybrid score interval.

[Figure 7: surface plot of coverage probability over (p1, p2) on the unit square, Wald interval.]

Figure 7. Coverage probabilities for 95% nominal Wald confidence interval as a function of p1 and p2, when n1 = n2 = 10.

[Figure 8: surface plot of coverage probability over (p1, p2) on the unit square, adjusted interval.]

Figure 8. Coverage probabilities for 95% nominal adjusted confidence interval (adding t = 4 pseudo observations) as a function of p1 and p2, when n1 = n2 = 10.

The adjusted interval $I_4$ and the hybrid score interval both have a greater tendency for distal non-coverage than mesial non-coverage. For instance, for the 10,000 randomly selected cases, the mean probability for which the lower limit exceeds $p_1 - p_2$ when $p_1 - p_2 > 0$ or the upper limit is less than $p_1 - p_2$ when $p_1 - p_2 < 0$ was .030 for $I_4$ and .033 for the 95% hybrid score interval, whereas the mean probability for which the upper limit is less than $p_1 - p_2$ when $p_1 - p_2 > 0$ or the lower limit exceeds $p_1 - p_2$ when $p_1 - p_2 < 0$ was .013 for $I_4$ and .014 for the 95% hybrid score. As $t$ increases for $I_t$, the ratio of incidence of distal non-coverage to mesial non-coverage increases; for these randomly selected cases, for $t$ = (0, 2, 4, 6, 8) it equals (.7, 1.2, 2.2, 4.3, 8.1). Unlike the adjusted interval and the Wald interval, the hybrid score interval cannot produce overshoot,

with the interval for $p_1 - p_2$ extending below -1 or above +1 and thus requiring truncation. Overshoot for $I_t$ is less common as $t$ increases. For instance, for these randomly selected cases, the mean probability of overshoot for $t$ = (0, 2, 4, 6, 8) was (.048, .033, .016, .006, .000).

Since standard intervals for $p$ and $p_1 - p_2$ improve greatly with adjustment corresponding to shrinkage of point estimates, one would expect intervals resulting from a Bayesian approach with comparable shrinkage also to perform well in a frequentist sense. Carlin and Louis (1996, pp. 117-123) provided evidence of this type for estimating $p$. For $p_1 - p_2$, consider independent uniform prior distributions for $p_1$ and $p_2$. The posterior distribution of $p_i$ is beta with mean $\tilde{p}_i = (X_i + 1)/(n_i + 2)$ and variance $\tilde{p}_i(1 - \tilde{p}_i)/(n_i + 3)$. Using a crude normal approximation for the distribution of the difference of the posterior beta variates leads to the interval

$$(\tilde{p}_1 - \tilde{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\tilde{p}_1(1-\tilde{p}_1)}{n_1 + 3} + \frac{\tilde{p}_2(1-\tilde{p}_2)}{n_2 + 3}}. \qquad (4)$$

This has the same center as the adjusted interval $I_4$ but uses $n_i + 3$ instead of $n_i + 2$ in the denominators of the standard error. For elementary courses, this interval was suggested by Berry (1996, p. 291). Like Newcombe's hybrid score interval, it tends to perform quite well, being slightly shorter and less conservative than $I_4$ but suffering occasional poorer coverages (see Table 1). For sample size combinations we considered, its minimum coverage probability was only slightly below that for the adjusted interval. If conservativeness is a concern (e.g., if both $p_i$ are likely to be close to 0), the approximate Bayes and hybrid score intervals are slightly preferable to $I_4$.

The adjusted interval $I_4$ (and the similar approximate Bayes interval (4)) is simpler than other methods that improve greatly over the Wald interval. Thus, we believe it is appropriate for elementary statistics courses. We do not claim optimality in any sense or that other methods may not be better for some purposes. Some applications, for instance, may require that the true confidence level be no lower than the nominal level, mandating a method that is necessarily conservative (e.g., Chan and Zhang 1999). Also, we recommend $I_4$ for interval estimation and not for an implicit test of $H_0: p_1 - p_2 = 0$, although such a test would be more reliable than one based on the Wald interval. For a significance test, we would continue to teach the Pearson chi-squared test in elementary courses. The test based on $I_4$ is too conservative when the common value of $p_i$ under the null is close to 0 or close to 1, for most sample sizes more conservative than the Pearson test for such $p_i$. Although the adjusted interval is not guaranteed to be consistent with the result of the Pearson test, it usually does agree. For instance, for common values (.1, .2, .3, .4, .5) of $p_i$, the 95% version of $I_4$ and the Pearson test with nominal significance level of .05 agree with probability (.972, .996, .9996, 1.000, 1.000) when $n_1 = n_2 = 30$ and (1.0, 1.0, 1.0, 1.0, 1.0) when $n_1 = n_2 = 10$.

Finally, an alternative way to improve the Wald method is with a continuity correction (Fleiss 1981, p. 29). As with other continuity corrections, this generally results in conservative performance, usually more so than the adjusted interval. However, the coverage probabilities, like those of the Wald interval, can dip substantially below the nominal level when both $p_i$ are near 0 or 1.

4. TEACHING THE ADJUSTED INTERVALS

Agresti and Coull (1998) motivated their adjusted interval (3) for a single proportion as a simple approximation for the score 95% confidence interval. We know of no such simple motivation for the adjusted interval for the two-sample comparison, other than the similarity with the Bayesian interval (4). A problem for future research is to study whether theoretical support exists for this simple yet effective adjustment, such as Edgeworth or saddlepoint expansions that might provide improved approximations for the tail behavior of $\hat{p}_1 - \hat{p}_2$.

The motivation needed for teaching in the elementary statistics course is quite different. How can one motivate adding pseudo observations? In the single-sample case we remind students that the binomial distribution is highly skewed as $p$ approaches 0 and 1, and because of this perhaps $\hat{p}$ should not be the midpoint of the interval. As support for this, we have students use the software ExplorStat (available at http://www.stat.ufl.edu/-dwack/). Through simulation it shows how operating characteristics of statistical methods change as students vary sample sizes and population distributions. For instance, when $p$ takes values such as .10 or .90, students observe a relatively high proportion of Wald intervals failing to contain $p$ when $n$ is 30, the sample size their text suggests is adequate for large-sample inference for a mean.

Most students, however, seem more convinced by specific examples where the Wald method seems nonsensical, such as when $\hat{p}$ = 0 or 1. We often use data from a questionnaire administered to the students at the beginning of term. For instance, one of us (Agresti) taught a class to 24 honors students in fall 1999. In response to the question, "Are you a vegetarian?", 0 of the 24 students responded "yes," yet they realized that the Wald interval of [0, 0] was not plausible for a corresponding population proportion. We have also used homework exercises such as estimating the probability of success for a new medical treatment when all 10 subjects in a sample experience success, or estimating the probability of death due to suicide when a sample of 30 death records has no occurrences. (Again, the Wald interval is [0, 0], but the National Center for Health Statistics reports that in the United States the probability of death due to suicide is about .01.) Although one can amend the Wald method to improve its behavior when $\hat{p}$ = 0 or 1, such as by replacing the endpoints by ones based on the exact binomial test, making such exceptions from a general recipe distracts students from the main idea of taking the estimate plus and minus a normal-score multiple of a standard error.
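
The classroom examples above are easy to reproduce. The sketch below computes the ordinary Wald interval and the "add two successes and two failures" interval (3) for the three data sets mentioned (0 of 24, 10 of 10, and 0 of 30). The function names and the dictionary labels are hypothetical, and truncating the adjusted interval to [0, 1] is an assumption made here for display; the article states truncation only for the two-sample interval.

```python
from math import sqrt


def wald(x, n, z=1.96):
    """Ordinary Wald interval (1) for a single proportion."""
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def add_two_successes_two_failures(x, n, z=1.96):
    """Interval (3): the Wald formula with x + 2 successes in n + 4 trials."""
    lo, hi = wald(x + 2, n + 4, z)
    # Truncation to [0, 1] is an assumption for presentation only.
    return max(0.0, lo), min(1.0, hi)


# (successes, trials) for the examples described in the text; labels are illustrative.
examples = {
    "vegetarians in class": (0, 24),
    "treatment successes": (10, 10),
    "suicides in death records": (0, 30),
}

for label, (x, n) in examples.items():
    w_lo, w_hi = wald(x, n)
    a_lo, a_hi = add_two_successes_two_failures(x, n)
    print(f"{label}: Wald [{w_lo:.3f}, {w_hi:.3f}], adjusted [{a_lo:.3f}, {a_hi:.3f}]")
```

In each case the Wald interval collapses to a single point, whereas the adjusted interval has positive width without any special-case rule for $\hat{p}$ = 0 or 1.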

Why four pseudo observations? In the single-sample case we explain that this approximates the results of a more complex method that does not require estimating the unknown standard error; here, we explain the concept of inverting the test with null standard error, or finding solutions of $(\hat{p} - p) = \pm 2\sqrt{p(1-p)/n}$ that do not require estimating $\sqrt{p(1-p)/n}$. In the two-sample case one could explain that this approximates a statistical analysis that represents prior beliefs about each $p_i$ by a uniform distribution. (Some instructors, of course, will prefer a more fully Bayesian approach, as in Berry 1996.)

The poor performance of the ordinary Wald intervals for $p$ and for $p_1 - p_2$ is unfortunate, since they are the simplest and most obvious ones to present in elementary courses. Also unfortunate for these intervals is the difficulty of providing adequate sample size guidelines. Introductory textbooks provide a variety of recommendations, but these have inadequacies (Leemis and Trivedi 1996; Brown et al. 1999). And, needless to say, most texts do not indicate what to do when the guidelines are violated, other than perhaps to consult a statistician. The results in this article suggest that for the "add two successes and two failures" adjusted confidence intervals, one might simply bypass sample size rules. The adjusted intervals have safe operating characteristics for practical application with almost all sample sizes. In fact, we note in closing (and with tongue in cheek) that the adjusted intervals $I_4(n, x)$ and $I_4(n_1, x_1; n_2, x_2)$ have the advantage that, as with Bayesian methods, one can do an analysis without having any data. In the single-sample case the adjusted sample then has $\tilde{p} = 2/4$, and the 95% confidence interval is $.5 \pm 2\sqrt{(.5)(.5)/4}$, or [0, 1]. In the two-sample case the adjusted samples have $\tilde{p}_1 = 1/2$ and $\tilde{p}_2 = 1/2$, and the 95% confidence interval is $(.5 - .5) \pm 2\sqrt{[(.5)(.5)/2] + [(.5)(.5)/2]}$, or [-1, +1]. Both analyses are uninformative, as one would hope from a frequentist approach with no data. No one will get into too much trouble using them!

[Received September 1999. Revised February 2000.]

REFERENCES

Agresti, A., and Coull, B. A. (1998), "Approximate is Better than 'Exact' for Interval Estimation of Binomial Proportions," The American Statistician, 52, 119-126.
Berry, D. A. (1996), Statistics: A Bayesian Perspective, Belmont, CA: Wadsworth.
Brown, L. D., Cai, T. T., and DasGupta, A. (1999), "Confidence Intervals for a Binomial Proportion and Edgeworth Expansions," technical report 99-18, Purdue University, Statistics Department.
Carlin, B. P., and Louis, T. A. (1996), Bayes and Empirical Bayes Methods for Data Analysis, London: Chapman and Hall.
Chan, I. S. F., and Zhang, Z. (1999), "Test-Based Exact Confidence Intervals for the Difference of Two Binomial Proportions," Biometrics, 55, 1202-1209.
Fleiss, J. L. (1981), Statistical Methods for Rates and Proportions (2nd ed.), New York: Wiley.
Ghosh, B. K. (1979), "A Comparison of Some Approximate Confidence Intervals for the Binomial Parameter," Journal of the American Statistical Association, 74, 894-900.
Leemis, L. M., and Trivedi, K. S. (1996), "A Comparison of Approximate Interval Estimators for the Bernoulli Parameter," The American Statistician, 50, 63-68.
McClave, J. T., and Sincich, T. (2000), Statistics (8th ed.), Englewood Cliffs, NJ: Prentice Hall.
Mee, R. W. (1984), "Confidence Bounds for the Difference Between Two Probabilities," Biometrics, 40, 1175-1176.
Newcombe, R. (1998a), "Two-Sided Confidence Intervals for the Single Proportion: Comparison of Seven Methods," Statistics in Medicine, 17, 857-872.
Newcombe, R. (1998b), "Interval Estimation for the Difference Between Independent Proportions: Comparison of Eleven Methods," Statistics in Medicine, 17, 873-890.
Samuels, M. L., and Witmer, J. W. (1999), Statistics for the Life Sciences (2nd ed.), Englewood Cliffs, NJ: Prentice Hall.
Vollset, S. E. (1993), "Confidence Intervals for a Binomial Proportion," Statistics in Medicine, 12, 809-824.
Wilson, E. B. (1927), "Probable Inference, the Law of Succession, and Statistical Inference," Journal of the American Statistical Association, 22, 209-212.
