
A New Confidence Interval Method Based on the Normal Approximation for the Difference

of Two Binomial Probabilities


Author(s): Peter H. Peskun
Source: Journal of the American Statistical Association, Vol. 88, No. 422 (Jun., 1993), pp. 656-661
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2290348
Accessed: 14/06/2014 01:06


This content downloaded from 195.34.79.101 on Sat, 14 Jun 2014 01:06:43 AM


All use subject to JSTOR Terms and Conditions
A New Confidence Interval Method Based on the Normal Approximation for the Difference of Two Binomial Probabilities

PETER H. PESKUN*

Using a general method for obtaining confidence intervals from samples from discrete distributions, this article introduces a new confidence interval method for estimating the difference of two binomial probabilities and compares it to three other confidence interval methods, one of which is the usual method with no continuity correction. Each of the other two confidence interval methods uses its own continuity correction; one combines it with an estimate of the standard error that is slightly different from that commonly used. Some values of the "exact" confidence interval limits are also derived. The four confidence interval methods, each of which is based on the normal approximation and can be carried out easily on a hand calculator, are compared in terms of their precision, the agreement of their coverage probabilities with nominal confidence level values, and the smallness of their sample sizes before the normal approximation can be considered appropriate. Coverage probabilities and measures of precision are computed exactly rather than estimated by simulation. On the basis of these comparisons, the new confidence interval method is recommended.

KEY WORDS: Coverage probability; Exact confidence limit; Precision.

* Peter H. Peskun is Associate Professor, Department of Mathematics and Statistics, York University, North York, Ontario M3J 1P3, Canada.

© 1993 American Statistical Association. Journal of the American Statistical Association, June 1993, Vol. 88, No. 422, Theory and Methods.

Let $X_1$ and $X_2$ be two statistically independent binomial random variables with parameters $n_1, p_1$ and $n_2, p_2$, $0 \le p_1, p_2 \le 1$. If $n_1$ and $n_2$ are each sufficiently large, then approximate $100(1-\alpha)\%$ one-sided or two-sided lower and upper confidence limits can be constructed for the assumed unknown parameter $p_1 - p_2$, $-1 \le p_1 - p_2 \le 1$, using the fact that the unbiased maximum likelihood estimator $X_1/n_1 - X_2/n_2$ of $p_1 - p_2$ has a sampling distribution that is approximately normal with mean $p_1 - p_2$ and variance $p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2$. For example, $(\hat p_1 - \hat p_2) \pm z_{\alpha/2}[\hat p_1(1-\hat p_1)/n_1 + \hat p_2(1-\hat p_2)/n_2]^{1/2}$ is one of seven different approximate $100(1-\alpha)\%$ two-sided confidence intervals for $p_1 - p_2$ compared by Hauck and Anderson (1986). Here $\hat p_1 = x_1/n_1$ and $\hat p_2 = x_2/n_2$, where $x_1$ and $x_2$ are the observed values of $X_1$ and $X_2$; $z_{\alpha/2}$ is the upper $\alpha/2$ percentage point of the standard normal distribution.

The purpose of this article is to introduce a new confidence interval method based on the normal approximation and to compare it to the methods recommended by Hauck and Anderson (1986). The precision of the method, the agreement of its coverage probabilities with nominal confidence level values, and the smallness of the sample sizes $n_1$ and $n_2$ before the normal approximation can be considered appropriate will be investigated. In contrast to the study conducted by Hauck and Anderson (1986), coverage probabilities will be computed exactly rather than estimated by simulation. Measures of precision will also be computed exactly. Some values of the "exact" confidence interval limits will be derived.

1. EXACT CONFIDENCE LIMITS

Peskun (1990) described a general method for obtaining well-defined one-sided and two-sided "exact" lower and upper confidence limits from a sample from a discrete distribution with an unknown parameter. This method can be applied to the discrete sampling distribution of the unbiased maximum likelihood estimator $T = t(X_1, X_2) = X_1/n_1 - X_2/n_2$ to obtain one-sided and two-sided exact lower and upper confidence limits for the unknown parameter $p_1 - p_2$. Here the term "exact" refers to the use of the exact discrete sampling distribution of $X_1/n_1 - X_2/n_2$ in computing the confidence limits for $p_1 - p_2$; it does not refer to the confidence levels being exactly equal to the stated values. In fact, the confidence levels are at least equal to the stated values.

If $t_0$ is an observed value of $T$, and if the general method is applied, then a one-sided exact lower $100(1-\alpha_L)\%$ confidence limit, $L(t_0)$, for $p_1 - p_2$ is provided by

$$L(t_0) = \inf_{0 \le p_1, p_2 \le 1} \{\, p_1 - p_2 \mid \Pr[T \ge t_0;\ n_1, p_1, n_2, p_2] > \alpha_L \,\}. \tag{1}$$

It easily follows from (1) that $L(-1) = -1$, and it is shown in Appendix A that for $m_1 = \max(n_1, n_2)$,

$$L(1/m_1 - 1) = -(1-\alpha_L)^{1/m_1}, \tag{2}$$

and that for $m_2 = \min(n_1, n_2)$,

$$L(1) = \begin{cases} (n_1+n_2)\left(\alpha_L / n_1^{n_1} n_2^{n_2}\right)^{1/(n_1+n_2)} - 1 & \text{if } \alpha_L^{1/n_1} \le n_1/n_2 \le \alpha_L^{-1/n_2}, \\ \alpha_L^{1/m_2} & \text{otherwise.} \end{cases} \tag{3}$$

Similarly, a one-sided exact upper $100(1-\alpha_U)\%$ confidence limit, $U(t_0)$, for $p_1 - p_2$ is provided by

$$U(t_0) = \sup_{0 \le p_1, p_2 \le 1} \{\, p_1 - p_2 \mid \Pr[T \le t_0;\ n_1, p_1, n_2, p_2] > \alpha_U \,\}. \tag{4}$$

It easily follows from (4) that $U(1) = 1$, and it can be similarly shown, as in Appendix A, that

$$U(1 - 1/m_1) = (1-\alpha_U)^{1/m_1}. \tag{5}$$
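The closed forms (2) and (3) are simple enough to evaluate directly. The following Python sketch is not part of the original article; the function names and the grid resolution are illustrative assumptions. It evaluates (2) and (3) and compares (3) with a crude grid search based on definition (1) at $t_0 = 1$, where $\Pr[T \ge 1] = p_1^{n_1}(1-p_2)^{n_2}$ (see Appendix A).

```python
from math import exp, log


def exact_lower_near_minus_one(n1, n2, alpha_l):
    """Equation (2): L(1/m1 - 1) = -(1 - alpha_L)^(1/m1), m1 = max(n1, n2)."""
    return -((1.0 - alpha_l) ** (1.0 / max(n1, n2)))


def exact_lower_at_one(n1, n2, alpha_l):
    """Equation (3): the exact lower limit L(1)."""
    if alpha_l ** (1.0 / n1) <= n1 / n2 <= alpha_l ** (-1.0 / n2):
        n = n1 + n2
        # Work on the log scale so that n1**n1 * n2**n2 cannot overflow.
        log_term = (log(alpha_l) - n1 * log(n1) - n2 * log(n2)) / n
        return n * exp(log_term) - 1.0
    return alpha_l ** (1.0 / min(n1, n2))


def grid_lower_at_one(n1, n2, alpha_l, steps=400):
    """Illustrative check of (1) at t0 = 1: minimise p1 - p2 over grid
    points satisfying p1^n1 * (1 - p2)^n2 > alpha_L."""
    best = 1.0
    for i in range(steps + 1):
        p1 = i / steps
        for j in range(steps + 1):
            p2 = j / steps
            if p1 ** n1 * (1.0 - p2) ** n2 > alpha_l:
                best = min(best, p1 - p2)
    return best


if __name__ == "__main__":
    print(exact_lower_near_minus_one(5, 2, 0.05))
    print(exact_lower_at_one(5, 5, 0.025), grid_lower_at_one(5, 5, 0.025))
```

Because the grid contains only some of the admissible points, the grid value can never fall below the closed form; it approaches it from above as `steps` grows.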

It can also be shown, as in Appendix A, that

$$U(-1) = \begin{cases} 1 - (n_1+n_2)\left(\alpha_U / n_1^{n_1} n_2^{n_2}\right)^{1/(n_1+n_2)} & \text{if } \alpha_U^{1/n_1} \le n_1/n_2 \le \alpha_U^{-1/n_2}, \\ -\alpha_U^{1/m_2} & \text{otherwise.} \end{cases} \tag{6}$$

If $\alpha = \alpha_L + \alpha_U$ and $0 < \alpha < 1$, then $L(t_0)$ and $U(t_0)$ provide exact two-sided lower and upper $100(1-\alpha)\%$ confidence limits for $p_1 - p_2$ which, without proof, I strongly feel are well defined in the sense that $L(t_0) \le U(t_0)$.

2. APPROXIMATE CONFIDENCE LIMITS

Using the fact that for $n_1$ and $n_2$ sufficiently large, $T = X_1/n_1 - X_2/n_2$ is approximately normally distributed with mean $p_1 - p_2$ and variance $p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2$, the probability $\Pr[T \ge t_0;\ n_1, p_1, n_2, p_2]$ in (1), written more conveniently as $\Pr[T \ge t_0]$, can be computed approximately as follows:

$$\Pr[T \ge t_0] = \Pr\left[\frac{T - (p_1 - p_2)}{\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}} \ge \frac{t_0 - (p_1 - p_2)}{\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}}\right] \approx \Pr\left[Z \ge \frac{t_0 - c_L - (p_1 - p_2)}{\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}}\right]. \tag{7}$$

Here $Z$ is the standard normal random variable and $c_L = c_L(n_1, x_1, n_2, x_2)$ is a continuity correction equal to half the absolute difference between $t_0 = x_1/n_1 - x_2/n_2$ and the next smaller possible value of $T$.

From (1) and (7), it follows that

$$L(t_0) \approx \min_{0 \le p_1, p_2 \le 1} \left\{\, p_1 - p_2 \;\middle|\; \frac{t_0 - c_L - (p_1 - p_2)}{[p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2]^{1/2}} \le z_{\alpha_L} \right\}. \tag{8}$$

Using the method of a Lagrange multiplier to minimize $f(p_1, p_2) = p_1 - p_2$ subject to the side condition

$$g(p_1, p_2) = \frac{t_0 - c_L - (p_1 - p_2)}{[p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2]^{1/2}} - z_{\alpha_L} = 0, \tag{9}$$

it can be shown that $p_1$ and $p_2$ must satisfy the following constraint:

$$n_2 p_1 + n_1 p_2 = (n_1 + n_2)/2 = n/2. \tag{10}$$

Letting $p_1 - p_2 = d$, it follows from (9) and (10) that

$$(t_0 - c_L - d)/(n/4n_1n_2 - d^2/n)^{1/2} = z_{\alpha_L}. \tag{11}$$

The smaller solution of this quadratic equation in $d$ gives $L(t_0)$ approximately; that is,

$$L(t_0) \approx L_1(t_0) = \frac{t_0 - c_L - z_{\alpha_L}\sqrt{(n + z_{\alpha_L}^2)/4n_1n_2 - (t_0 - c_L)^2/n}}{1 + z_{\alpha_L}^2/n}. \tag{12}$$

Note that this value of $d$ can always be used as an approximation for $L(t_0)$, but if $n_1 \ne n_2$ it might be smaller than the minimum defined by (8) because it is the minimum of $p_1 - p_2$ on the whole ellipse $g(p_1, p_2) = 0$ defined by (9), and the point $(p_1, p_2) = (.5 + n_1 d/n,\ .5 - n_2 d/n)$ at which it is achieved may be outside the unit square $0 \le p_1, p_2 \le 1$. If this is the case, then the desired minimum value of $p_1 - p_2$ defined by (8) is achieved at one of the points of intersection of the ellipse $g(p_1, p_2) = 0$ and the boundary of the unit square $0 \le p_1, p_2 \le 1$. A simple example is given in Appendix B.

Similarly, it can be shown that

$$U(t_0) \approx \max_{0 \le p_1, p_2 \le 1} \left\{\, p_1 - p_2 \;\middle|\; \frac{t_0 + c_U - (p_1 - p_2)}{[p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2]^{1/2}} \ge -z_{\alpha_U} \right\}, \tag{13}$$

and, as a result, an approximation for $U(t_0)$ is given by

$$U_1(t_0) = \frac{t_0 + c_U + z_{\alpha_U}\sqrt{(n + z_{\alpha_U}^2)/4n_1n_2 - (t_0 + c_U)^2/n}}{1 + z_{\alpha_U}^2/n}. \tag{14}$$

Here $c_U = c_U(n_1, x_1, n_2, x_2)$ is a continuity correction equal to half the absolute difference between $t_0 = x_1/n_1 - x_2/n_2$ and the next larger possible value of $T$. It can easily be shown that $c_L = c_U = 1/n$ if $n_1 = n_2$. But if $n_1 \ne n_2$, then $c_L$ and $c_U$ are not always equal. In Section 5, it is shown how the values of $c_L$ and $c_U$ can be determined in general.

Note that if $n_1 \ne n_2$ and if the point $(p_1, p_2) = (.5 + n_1 d/n,\ .5 - n_2 d/n)$ at which the maximum of $p_1 - p_2 = d$ on the ellipse $h(p_1, p_2) = [t_0 + c_U - (p_1 - p_2)]/[p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2]^{1/2} + z_{\alpha_U} = 0$ is achieved lies outside the unit square $0 \le p_1, p_2 \le 1$, then the desired maximum value of $p_1 - p_2$ defined by (13) is achieved at one of the points of intersection of the ellipse $h(p_1, p_2) = 0$ and the boundary of the unit square $0 \le p_1, p_2 \le 1$. Note that this maximum value of $p_1 - p_2$ will necessarily be smaller than $U_1(t_0)$.

To achieve good agreement between coverage probabilities and nominal confidence level values, as will be discussed in the next section, I recommend that the approximate confidence limits $L_1(t_0)$ and $U_1(t_0)$ always be used instead of the possibly larger minimum and smaller maximum values of $p_1 - p_2$ defined by (8) and (13). This recommendation is based on the results of a separate study that I have carried out but have not produced here.

For comparison, the following approximate confidence limits will also be considered:

$$L_2(x_1, x_2) = (\hat p_1 - \hat p_2) - z_{\alpha_L}\sqrt{\hat p_1(1-\hat p_1)/n_1 + \hat p_2(1-\hat p_2)/n_2}, \tag{15}$$
$$U_2(x_1, x_2) = (\hat p_1 - \hat p_2) + z_{\alpha_U}\sqrt{\hat p_1(1-\hat p_1)/n_1 + \hat p_2(1-\hat p_2)/n_2}, \tag{16}$$

$$L_3(x_1, x_2) = (\hat p_1 - \hat p_2) - \left[z_{\alpha_L}\sqrt{\hat p_1(1-\hat p_1)/n_1 + \hat p_2(1-\hat p_2)/n_2} + (1/2n_1 + 1/2n_2)\right], \tag{17}$$

$$U_3(x_1, x_2) = (\hat p_1 - \hat p_2) + \left[z_{\alpha_U}\sqrt{\hat p_1(1-\hat p_1)/n_1 + \hat p_2(1-\hat p_2)/n_2} + (1/2n_1 + 1/2n_2)\right], \tag{18}$$

$$L_4(x_1, x_2) = (\hat p_1 - \hat p_2) - \left[z_{\alpha_L}\sqrt{\hat p_1(1-\hat p_1)/(n_1-1) + \hat p_2(1-\hat p_2)/(n_2-1)} + 1/2\min(n_1, n_2)\right], \tag{19}$$

and

$$U_4(x_1, x_2) = (\hat p_1 - \hat p_2) + \left[z_{\alpha_U}\sqrt{\hat p_1(1-\hat p_1)/(n_1-1) + \hat p_2(1-\hat p_2)/(n_2-1)} + 1/2\min(n_1, n_2)\right]. \tag{20}$$

Here $\hat p_1 - \hat p_2 = x_1/n_1 - x_2/n_2 = t_0$. Note that $L_2$ and $U_2$ are the approximate confidence limits most commonly given in elementary statistics books. The approximate confidence limits $L_3$ and $U_3$ differ from $L_2$ and $U_2$ only by the inclusion of the so-called Yates continuity correction $(1/2n_1 + 1/2n_2)$. Hauck and Anderson (1986) recommended using $L_4$ and $U_4$, which differ from $L_3$ and $U_3$ in the choice of continuity correction, namely, $1/2\min(n_1, n_2)$, and in the choice of $n_i - 1$ rather than $n_i$ $(i = 1, 2)$ in the estimate of the standard error.

3. MINIMUM PROBABILITY OF COVERAGE COMPARISONS

For each pair of two-sided approximate $100(1-\alpha)\%$ lower and upper confidence limits $L_i$ and $U_i$ $(i = 1, 2, 3, 4)$, the minimum probability of coverage

$$\inf_{0 \le p_1, p_2 \le 1} \sum_{x_1=0}^{n_1} \sum_{x_2=0}^{n_2} \binom{n_1}{x_1}\binom{n_2}{x_2} p_1^{x_1}(1-p_1)^{n_1-x_1} p_2^{x_2}(1-p_2)^{n_2-x_2} I_{A_i}(x_1, x_2) \tag{21}$$

will be computed for sample sizes $n_1, n_2 = 2, 5, 10, 30,$ and $100$ with $n_1 \ge n_2$ because it can easily be shown, for example, that the minimum probability of coverage for $n_1 = 5$, $n_2 = 2$ is exactly the same as that for $n_1 = 2$, $n_2 = 5$. Here $I_{A_i}(\cdot)$ is the indicator function of the set $A_i = \{(x_1, x_2) \mid L_i(x_1, x_2) \le p_1 - p_2 \le U_i(x_1, x_2)\}$, and $L_1(x_1, x_2) = L_1(t_0)$ and $U_1(x_1, x_2) = U_1(t_0)$ for $t_0 = x_1/n_1 - x_2/n_2$.

For $\alpha_L = \alpha_U = \alpha/2$, it can easily be shown that for each pair of two-sided approximate $100(1-\alpha)\%$ lower and upper confidence limits $L_i$ and $U_i$ $(i = 1, 2, 3, 4)$,

$$L_i(x_1, x_2) = -U_i(n_1 - x_1, n_2 - x_2), \qquad 0 \le x_1 \le n_1,\ 0 \le x_2 \le n_2. \tag{22}$$

This leads to a reduction in the amount of computation involved in calculating (21), the minimum probability of coverage, by allowing the parameter space $0 \le p_1, p_2 \le 1$ to be replaced by either $0 \le p_1 \le p_2 \le 1$ or $0 \le p_2 \le p_1 \le 1$. The latter simplified form of the parameter space will be used in this article. To further reduce the amount of computation involved in calculating (21) exactly, the simplified parameter space $0 \le p_2 \le p_1 \le 1$ will be discretized to the finite grid of points $p_1, p_2 = .00(.01)1.00$ with $p_2 \le p_1$. Although this discretization will possibly result in the computed minimum probability of coverage being slightly larger than the actual value, it should not invalidate any comparisons made between the four confidence interval methods relative to the agreement of their respective coverage probabilities with nominal confidence level values.

Table 1. Minimum Probability of Coverage of the Two-Sided Approximate 95% Lower and Upper Confidence Limits for $p_1 - p_2$ Using Some Exact Confidence Limits

Confidence
interval                              n2
method      n1       2        5        10       30       100
1           2      .95703
            5      .91000   .95099
            10     .91000   .94564   .95842
            30     .91000   .93525   .94822   .94914
            100    .91000   .93122   .94324   .92979   .95048
2           2      .01990
            5      .01980   .04900
            10     .01980   .04900   .09550
            30     .01980   .04900   .09550   .26007
            100    .01980   .04900   .09550   .26007   .63339
3           2      .75990
            5      .59040   .68420
            10     .52390   .57893   .68793
            30     .46710   .47133   .51571   .70599
            100    .42297   .44093   .46123   .59876   .86724
4           2      .45240
            5      .45240   .44093
            10     .45240   .44093   .46123
            30     .45240   .44093   .46123   .45449
            100    .41361   .44093   .46123   .45449   .63386

In Table 1 note that the computed minimum probabilities of coverage, given for a nominal confidence level equal to .95, were obtained using the exact confidence limits given in Section 1 for the special values $t_0 = -1$, $1/m_1 - 1$, $1 - 1/m_1$, and $1$, no matter which pair of approximate confidence limits $L_i$ and $U_i$ $(i = 1, 2, 3, 4)$ was used for other $t_0$ values. As has been verified in a separate study not produced here, using these exact confidence limits results in a general increase in the minimum probability of coverage, especially for small sample sizes $n_1$ and $n_2$. More importantly, note that for most of the sample size combinations considered here, the agreement of the computed minimum probabilities of coverage with the nominal confidence level value .95 is quite good for confidence interval method 1 but very poor

for the other three methods. The same can also be said if the nominal confidence level is changed from .95 to either .90 or .99. I have verified this in a separate study not produced here.

To explain this apparent discrepancy with the findings of Hauck and Anderson (1986), Table 2, as an illustration, gives the values of $p_1$ and $p_2$ at which the computed minimum probabilities of coverage are achieved in Table 1. Note that for confidence interval methods 2, 3, and 4, the minimum expected cell size, $\mathrm{MCS} = \min[n_1p_1,\ n_1(1-p_1),\ n_2p_2,\ n_2(1-p_2)]$, is 0 more often than not, with a maximum value of .59. In contrast, the findings of Hauck and Anderson (1986) are based on $\mathrm{MCS} \ge 3$.

Table 2. Values of $p_1$ and $p_2$ at Which the Computed Minimum Probability of Coverage is Achieved for the Two-Sided Approximate 95% Lower and Upper Confidence Limits for $p_1 - p_2$

Confidence
interval           n2 = 2        n2 = 5        n2 = 10       n2 = 30       n2 = 100
method      n1     p1    p2      p1    p2      p1    p2      p1    p2      p1    p2
1           2      .91   .09
            5      1     .30     1     .01
            10     1     .30     1     .08     .50*  .49*
            30     1     .30     1     .53     1     .46     .94   .06
            100    1     .30     1     .55     1     .52*    1     .30     .61   .38
2           2      1*    .99*
            5      1     .99     1*    .99*
            10     1     .99     1     .99     1*    .99*
            30     1     .99     1     .99     1     .99     1     .99
            100    1     .99     1     .99     1     .99     1     .99     1     .99
3           2      1*    .49*
            5      1     .64     1*    .79*
            10     1     .69     1     .84     1*    .89*
            30     1     .73     1     .88     1     .93     1*    .96*
            100    .98   .71     1     .89     1     .94     1     .97     .02   0
4           2      1*    .74*
            5      1     .74     1*    .89*
            10     1     .74     1     .89     1     .94
            30     1     .74     1     .89     1     .94     1*    .98*
            100    .98   .72     1     .89     1     .94     1     .98     1     .99

* One of a possible set of values of $p_1$ and $p_2$ at which the computed minimum probability of coverage is achieved.

Table 3 is similar in content to Table 1, except that the computation of the minimum probability of coverage has been restricted to both $\mathrm{MCS} \ge 3$ and $\mathrm{MCS} \ge 5$, and the sample sizes $n_1, n_2 = 2, 5$, and $n_1, n_2 = 2, 5, 10$, have been dropped accordingly. As might be expected, the computed minimum probability of coverage has greatly improved for confidence interval methods 2, 3, and 4 but either remained the same or only slightly improved in most cases studied for confidence interval method 1. But it is still the case that the agreement of computed minimum probabilities of coverage with nominal confidence level value .95 is best for confidence interval method 1 in most cases studied. This observation is even more noteworthy, keeping in mind that in practice MCS is an unknown quantity. In addition, it does not appear from Table 3, as claimed by Hauck and Anderson (1986, p. 322), that for $\mathrm{MCS} \ge 3$, confidence interval method 3 is somewhat unnecessarily conservative for 95% intervals, and confidence interval method 4 provides adequate protection against coverage probabilities less than nominal for 95% intervals.

Table 3. Minimum Probability of Coverage of the Two-Sided Approximate 95% Lower and Upper Confidence Limits for $p_1 - p_2$ With Minimum Expected Cell Sizes of at Least 3 and 5

Minimum     Confidence
expected    interval                     n2
cell size   method      n1       10       30       100
3           1           10     .95842
                        30     .95456   .95287
                        100    .94968   .94976   .95048
            2           10     .88109
                        30     .88676   .87875
                        100    .84827   .88900   .89800
            3           10     .95248
                        30     .94068   .94471
                        100    .92004   .93657   .94462
            4           10     .93097
                        30     .94031   .91827
                        100    .92509   .92850   .91917
5           1           30              .95287
                        100             .94976   .95048
            2           30              .91655
                        100             .91550   .91948
            3           30              .95047
                        100             .95007   .95405
            4           30              .94220
                        100             .94718   .93191

4. PRECISION COMPARISONS

As a result of the minimum probability of coverage comparisons, only confidence interval methods 1, 3, and 4 will be compared in terms of their precision; that is, in terms of

their confidence interval widths $U_i - L_i$ $(i = 1, 3, 4)$, where $L_i$ is replaced by $-1$ if $L_i < -1$ and $U_i$ is replaced by $1$ if $U_i > 1$. The precision comparisons will be made only for sample sizes $n_1, n_2 = 10, 30, 100$, so that $\mathrm{MCS} \ge 3$ will be attained.

For $0 \le x_1 \le n_1$ and $0 \le x_2 \le n_2$, Table 4 gives the exact percentages of method 1 confidence interval widths $U_1(x_1, x_2) - L_1(x_1, x_2)$ that are at least as precise (narrow) as their method 3 and method 4 counterparts, $U_3(x_1, x_2) - L_3(x_1, x_2)$ and $U_4(x_1, x_2) - L_4(x_1, x_2)$. This comparison favors confidence interval method 1, especially over confidence interval method 3, because for most of the cases studied the percentages are near or at least 50.

Table 4. Percentage of Method 1 Approximate 95% Confidence Intervals That are at Least as Precise (Narrow) as Their Method 3 and 4 Counterparts

Confidence
interval                      n2
method      n1       10       30       100
3           10     68.60
            30     61.88    59.83
            100    54.10    54.65    46.46
4           10     50.41
            30     58.94    38.19
            100    58.06    51.64    21.30

Further insight into this comparison is given in Table 5, where for $0 \le x_1 \le n_1$ and $0 \le x_2 \le n_2$ the exactly computed lowest value (L), the first quartile ($Q_1$), the median ($Q_2$), the third quartile ($Q_3$), and the highest value (H) of the widths $U_i(x_1, x_2) - L_i(x_1, x_2)$ $(i = 1, 3, 4)$ are tabulated. It appears that the smaller (larger) confidence interval widths of method 1 tend to be larger (smaller) than their counterparts for methods 3 and 4.

Table 5. Lowest Value (L), First Quartile ($Q_1$), Median ($Q_2$), Third Quartile ($Q_3$), and Highest Value (H) of the Approximate 95% Confidence Interval Widths $U_i - L_i$ $(i = 1, 3, 4)$

                  Confidence
                  interval
n2      n1        method      L        Q1       Q2       Q3       H
10      10        1         .3369    .7964    .8549    .8825    .8859
                  3         .2000    .7259    .8790    .9840   1.0765
                  4         .1000    .6767    .8196    .9264   1.0240
        30        1         .3085    .6438    .6926    .7098    .7140
                  3         .1333    .5627    .7278    .7940    .8490
                  4         .1000    .5550    .7200    .7901    .8479
        100       1         .3085    .6193    .6388    .6467    .6487
                  3         .0842    .5112    .6407    .7244    .7601
                  4         .0793    .5202    .6562    .7469    .7824
30      30        1         .1193    .4604    .5007    .5178    .5219
                  3         .0667    .4213    .4806    .5293    .5727
                  4         .0333    .3940    .4543    .5038    .5481
        100       1         .1157    .3675    .3927    .4027    .4053
                  3         .0433    .3197    .3876    .4220    .4513
                  4         .0333    .3131    .3827    .4176    .4472
100     100       1         .0366    .2483    .2720    .2817    .2844
                  3         .0200    .2186    .2477    .2738    .2972
                  4         .0100    .2096    .2389    .2651    .2886

5. EXAMPLE

Triola (1992, ex. 15, p. 438) provided data obtained from a study conducted to investigate the use of seat belts in taxicabs (Welkon and Reisinger 1977). Among $n_1 = 72$ taxis observed in Pittsburgh, $x_1 = 36$ had seat belts at least partially visible. Among $n_2 = 129$ taxis observed in Chicago, $x_2 = 77$ had seat belts at least partially visible. In Exercise 16, Triola (1992, p. 438) asked the reader to construct a 95% interval estimate of the difference $p_1 - p_2$ between the two proportions of cabs with seat belts at least partially visible.

To determine the continuity corrections $c_L = c_L(n_1, x_1, n_2, x_2)$ and $c_U = c_U(n_1, x_1, n_2, x_2)$ needed to compute the two-sided approximate 95% lower and upper confidence limits $L_1(t_0)$ and $U_1(t_0)$, the next smaller and larger possible values of $T = X_1/n_1 - X_2/n_2$ relative to $t_0 = x_1/n_1 - x_2/n_2 = 36/72 - 77/129 = -.097$ must be found. This can be done most easily by working with the variable $n_1n_2t_0 = n_2x_1 - n_1x_2 = (129)(36) - (72)(77) = -900$ rather than $t_0$ itself and by noting the following simple ways of changing the value of $n_2x_1 - n_1x_2$: increasing or decreasing $x_1$ by 1 increases or decreases $n_2x_1 - n_1x_2$ by $n_2$; increasing or decreasing $x_2$ by 1 decreases or increases $n_2x_1 - n_1x_2$ by $n_1$; and increasing or decreasing both $x_1$ and $x_2$ by 1 changes $n_2x_1 - n_1x_2$ by $(n_2 - n_1)$ or $(n_1 - n_2)$.

Because $n_2 - n_1 = 129 - 72 = 57$ is smaller than $n_1 = 72$ and $n_2 = 129$, initially increasing or decreasing both $x_1 = 36$ and $x_2 = 77$ by 1 increases or decreases $n_2x_1 - n_1x_2$ by 57. A smaller increase can be achieved by first decreasing just $x_2 = 77$ by 1 to $x_2 = 76$, and then decreasing both $x_1 = 36$ and $x_2 = 76$ by 1. In this case $n_2x_1 - n_1x_2$ increases first by 72 and then decreases by 57 for a net increase of 15. More generally, decreasing just $x_2 = 77$ by $r$ to $x_2 = 77 - r$ and then decreasing both $x_1 = 36$ and $x_2 = 77 - r$ by $s$ would result in $n_2x_1 - n_1x_2$ increasing by $72r - 57s = 3(24r - 19s)$. In particular, for $r = 4$ and $s = 5$, the smallest possible increase in $n_2x_1 - n_1x_2$ is achieved, namely 3. Thus $c_U = 3/2n_1n_2 = 1/6{,}192$. Similarly, it can be shown that $c_L = 1/6{,}192$.

From (12) and (14), it now follows that the two-sided approximate 95% confidence interval $[L_1, U_1] = [-.237, .047]$. Note that it is more precise than any one of the other three two-sided approximate 95% confidence intervals $[L_i, U_i]$ $(i = 2, 3, 4)$, which are equal to $[-.240, .046]$, $[-.251, .057]$, and $[-.248, .054]$. Note also that $L_1$ and $U_1$ are not symmetrically located about $t_0 = -.097$ as are $L_i$ and $U_i$ $(i = 2, 3, 4)$.

6. CONCLUSION

With respect to the four confidence interval methods discussed in this article for constructing approximate $100(1-\alpha)\%$ two-sided lower and upper confidence limits for the unknown difference $p_1 - p_2$ of two binomial probabilities, I recommend method 1 because of its comparable precision, the good agreement of its coverage probabilities with nominal confidence level values, and the surprising smallness of its sample sizes $n_1$ and $n_2$ for which the normal approximation is deemed appropriate.
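The taxicab example of Section 5 can be checked numerically. The Python sketch below is not from the original article. It finds the continuity corrections by enumerating every possible value of $n_2x_1 - n_1x_2$, a brute-force substitute for the article's increment argument, and then evaluates (12) and (14) for method 1 and (15) through (20) for methods 2 to 4. All function names are illustrative assumptions, and the enumeration assumes that $t_0$ is neither the smallest nor the largest possible value of $T$.

```python
from math import sqrt
from statistics import NormalDist


def continuity_corrections(n1, x1, n2, x2):
    """Return (c_L, c_U): half the gaps between t0 and its nearest
    smaller and larger possible values of T (t0 must be interior)."""
    k0 = n2 * x1 - n1 * x2  # equals n1 * n2 * t0
    values = {n2 * a - n1 * b for a in range(n1 + 1) for b in range(n2 + 1)}
    below = max(v for v in values if v < k0)
    above = min(v for v in values if v > k0)
    scale = 2 * n1 * n2
    return (k0 - below) / scale, (above - k0) / scale


def method1_limits(n1, x1, n2, x2, alpha=0.05):
    """Equations (12) and (14) with alpha_L = alpha_U = alpha / 2."""
    n = n1 + n2
    t0 = x1 / n1 - x2 / n2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    c_l, c_u = continuity_corrections(n1, x1, n2, x2)

    def root(a, sign):
        disc = (n + z * z) / (4 * n1 * n2) - a * a / n
        return (a + sign * z * sqrt(disc)) / (1 + z * z / n)

    return root(t0 - c_l, -1), root(t0 + c_u, +1)


def methods_2_to_4(n1, x1, n2, x2, alpha=0.05):
    """Equations (15)-(20); returns the intervals for methods 2, 3, 4."""
    p1, p2 = x1 / n1, x2 / n2
    t0 = p1 - p2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    se4 = sqrt(p1 * (1 - p1) / (n1 - 1) + p2 * (1 - p2) / (n2 - 1))
    half_widths = (
        z * se,                                   # method 2
        z * se + 1 / (2 * n1) + 1 / (2 * n2),     # method 3 (Yates)
        z * se4 + 1 / (2 * min(n1, n2)),          # method 4
    )
    return [(t0 - h, t0 + h) for h in half_widths]


if __name__ == "__main__":
    print(continuity_corrections(72, 36, 129, 77))
    print(method1_limits(72, 36, 129, 77))
    print(methods_2_to_4(72, 36, 129, 77))
```

Rounded to three decimals, the limits produced for $n_1 = 72$, $x_1 = 36$, $n_2 = 129$, $x_2 = 77$ agree with the four intervals reported in Section 5.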

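For a fixed pair $(p_1, p_2)$, the double sum in (21) is a finite computation, which is why coverage probabilities can be obtained exactly rather than by simulation. The following Python sketch is illustrative only and is not the author's program. It uses the method 2 limits (15) and (16) throughout, without substituting the exact limits of Section 1 at the special values of $t_0$, so its results need not match Table 1. The function names are assumptions, and the treatment of floating-point ties at interval endpoints is left as is.

```python
from math import comb, sqrt
from statistics import NormalDist


def method2_limits(n1, x1, n2, x2, alpha=0.05):
    """Equations (15) and (16) with alpha_L = alpha_U = alpha / 2."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2 - half, p1 - p2 + half


def coverage(n1, n2, p1, p2, limits=method2_limits):
    """Inner double sum of (21): exact probability that the interval
    computed from (X1, X2) contains p1 - p2."""
    d = p1 - p2
    total = 0.0
    for x1 in range(n1 + 1):
        w1 = comb(n1, x1) * p1 ** x1 * (1 - p1) ** (n1 - x1)
        for x2 in range(n2 + 1):
            lo, hi = limits(n1, x1, n2, x2)
            if lo <= d <= hi:
                w2 = comb(n2, x2) * p2 ** x2 * (1 - p2) ** (n2 - x2)
                total += w1 * w2
    return total


def min_coverage_on_grid(n1, n2, step=0.01, limits=method2_limits):
    """Minimum of coverage over the grid p1, p2 = .00(.01)1.00, p2 <= p1."""
    k = round(1 / step)
    return min(
        coverage(n1, n2, i / k, j / k, limits)
        for i in range(k + 1)
        for j in range(i + 1)
    )


if __name__ == "__main__":
    print(coverage(2, 2, 1.0, 0.99))
    print(min_coverage_on_grid(2, 2))
```

Passing a different `limits` function (for example one implementing (12) and (14)) lets the same double sum be reused for the other methods.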
APPENDIX A: PROOF THAT $L(1/m_1 - 1) = -(1-\alpha_L)^{1/m_1}$

To prove that $L(1/m_1 - 1) = -(1-\alpha_L)^{1/m_1}$ for $m_1 = \max(n_1, n_2)$, note that the next largest value after the smallest possible value $0/n_1 - n_2/n_2 = -1$ of the statistic $T = X_1/n_1 - X_2/n_2$ is either $1/n_1 - n_2/n_2 = 1/n_1 - 1$ or $0/n_1 - (n_2 - 1)/n_2 = 1/n_2 - 1$. That is, it is $1/m_1 - 1$. Then

$$L(1/m_1 - 1) = \inf_{0 \le p_1, p_2 \le 1} \{\, p_1 - p_2 \mid \Pr[T \ge 1/m_1 - 1] > \alpha_L \,\},$$

where

$$\Pr[T \ge 1/m_1 - 1] = 1 - \Pr[T = -1] = 1 - \Pr[X_1 = 0, X_2 = n_2] = 1 - (1-p_1)^{n_1} p_2^{n_2}.$$

Thus

$$L(1/m_1 - 1) = \inf_{0 \le p_1, p_2 \le 1} \{\, p_1 - p_2 \mid 1 - (1-p_1)^{n_1} p_2^{n_2} > \alpha_L \,\}$$
$$= \inf_{0 \le p_1, p_2 \le 1} \{\, p_1 - p_2 \mid p_2 < [(1-\alpha_L)/(1-p_1)^{n_1}]^{1/n_2} \,\}$$
$$= \min_{0 \le p_1 \le 1} \{\, p_1 - [(1-\alpha_L)/(1-p_1)^{n_1}]^{1/n_2} \mid (1-\alpha_L)/(1-p_1)^{n_1} \le 1 \,\}$$
$$= \min_{0 \le p_1 \le 1} \{\, p_1 - [(1-\alpha_L)/(1-p_1)^{n_1}]^{1/n_2} \mid p_1 \le 1 - (1-\alpha_L)^{1/n_1} \,\};$$

that is, $L(1/m_1 - 1)$ is the minimum of the function

$$f(p_1) = p_1 - [(1-\alpha_L)/(1-p_1)^{n_1}]^{1/n_2}$$

for $0 \le p_1 \le 1 - (1-\alpha_L)^{1/n_1}$. Using the second-derivative test, it is easily shown that $f(p_1)$ has a relative maximum at $p_1 = 1 - [(n_1/n_2)^{n_2}(1-\alpha_L)]^{1/n}$, where $n = n_1 + n_2$. Thus the minimum of $f(p_1)$ for $0 \le p_1 \le 1 - (1-\alpha_L)^{1/n_1}$ is achieved at either $p_1 = 0$ or $p_1 = 1 - (1-\alpha_L)^{1/n_1}$. Note that

$$f(p_1) = \begin{cases} -(1-\alpha_L)^{1/n_2} & \text{for } p_1 = 0, \\ -(1-\alpha_L)^{1/n_1} & \text{for } p_1 = 1 - (1-\alpha_L)^{1/n_1}; \end{cases}$$

that is, $L(1/m_1 - 1) = -(1-\alpha_L)^{1/m_1}$.

Now, to prove that for $m_2 = \min(n_1, n_2)$,

$$L(1) = \begin{cases} (n_1+n_2)\left(\alpha_L / n_1^{n_1} n_2^{n_2}\right)^{1/(n_1+n_2)} - 1 & \text{if } \alpha_L^{1/n_1} \le n_1/n_2 \le \alpha_L^{-1/n_2}, \\ \alpha_L^{1/m_2} & \text{otherwise,} \end{cases}$$

note that because the largest value of the statistic $X_1/n_1 - X_2/n_2$ is $n_1/n_1 - 0/n_2 = 1$, then

$$L(1) = \inf_{0 \le p_1, p_2 \le 1} \{\, p_1 - p_2 \mid \Pr[T \ge 1] > \alpha_L \,\},$$

where

$$\Pr[T \ge 1] = \Pr[X_1 = n_1, X_2 = 0] = p_1^{n_1}(1-p_2)^{n_2}.$$

Thus

$$L(1) = \inf_{0 \le p_1, p_2 \le 1} \{\, p_1 - p_2 \mid p_1^{n_1}(1-p_2)^{n_2} > \alpha_L \,\}$$
$$= \inf_{0 \le p_1, p_2 \le 1} \{\, p_1 - p_2 \mid p_1 > [\alpha_L/(1-p_2)^{n_2}]^{1/n_1} \,\}$$
$$= \min_{0 \le p_2 \le 1} \{\, [\alpha_L/(1-p_2)^{n_2}]^{1/n_1} - p_2 \mid \alpha_L/(1-p_2)^{n_2} \le 1 \,\}$$
$$= \min_{0 \le p_2 \le 1} \{\, [\alpha_L/(1-p_2)^{n_2}]^{1/n_1} - p_2 \mid p_2 \le 1 - \alpha_L^{1/n_2} \,\};$$

that is, $L(1)$ is the minimum of the function

$$g(p_2) = [\alpha_L/(1-p_2)^{n_2}]^{1/n_1} - p_2$$

for $0 \le p_2 \le 1 - \alpha_L^{1/n_2}$. Using the second-derivative test, it is easily shown that $g(p_2)$ has a relative minimum at $p_2 = 1 - [(n_2/n_1)^{n_1}\alpha_L]^{1/(n_1+n_2)}$, this minimum value being equal to $(n_1+n_2)(\alpha_L/n_1^{n_1}n_2^{n_2})^{1/(n_1+n_2)} - 1$. But the minimum of $g(p_2)$ is desired for $0 \le p_2 \le 1 - \alpha_L^{1/n_2}$. Thus, for the minimum to be equal to $(n_1+n_2)(\alpha_L/n_1^{n_1}n_2^{n_2})^{1/(n_1+n_2)} - 1$,

$$0 \le 1 - [(n_2/n_1)^{n_1}\alpha_L]^{1/(n_1+n_2)} \le 1 - \alpha_L^{1/n_2};$$

that is,

$$\alpha_L^{1/n_1} \le n_1/n_2 \le \alpha_L^{-1/n_2}.$$

Otherwise, the minimum of $g(p_2)$ for $0 \le p_2 \le 1 - \alpha_L^{1/n_2}$ is achieved at either $p_2 = 0$ or $p_2 = 1 - \alpha_L^{1/n_2}$. Note that

$$g(p_2) = \begin{cases} \alpha_L^{1/n_1} & \text{for } p_2 = 0, \\ \alpha_L^{1/n_2} & \text{for } p_2 = 1 - \alpha_L^{1/n_2}; \end{cases}$$

that is,

$$L(1) = \begin{cases} (n_1+n_2)\left(\alpha_L / n_1^{n_1} n_2^{n_2}\right)^{1/(n_1+n_2)} - 1 & \text{if } \alpha_L^{1/n_1} \le n_1/n_2 \le \alpha_L^{-1/n_2}, \\ \alpha_L^{1/m_2} & \text{otherwise.} \end{cases}$$

It can be shown similarly that $U(1 - 1/m_1) = (1-\alpha_U)^{1/m_1}$ and that

$$U(-1) = \begin{cases} 1 - (n_1+n_2)\left(\alpha_U / n_1^{n_1} n_2^{n_2}\right)^{1/(n_1+n_2)} & \text{if } \alpha_U^{1/n_1} \le n_1/n_2 \le \alpha_U^{-1/n_2}, \\ -\alpha_U^{1/m_2} & \text{otherwise.} \end{cases}$$

APPENDIX B: MINIMUM OF $p_1 - p_2$

Suppose that $n_1 = 5$, $x_1 = 2$, $n_2 = 2$, and $x_2 = 2$. Then $t_0 = -.6$ and $c_L = .1$. The minimum of $p_1 - p_2$ on the ellipse $g(p_1, p_2) = 0$ defined by (9) with $\alpha_L = .05$ is $L(t_0) = d = -.9978$, but the point $(p_1, p_2) = (.5 + n_1d/n,\ .5 - n_2d/n) = (-.2127, .7851)$ at which it is achieved lies outside the unit square $0 \le p_1, p_2 \le 1$.

A larger value of $d$ is obtained by determining where the ellipse $g(p_1, p_2) = 0$ intersects the boundary of the unit square $0 \le p_1, p_2 \le 1$. This is easily done by substituting in turn the values 0 and 1 first for $p_1$ and then for $p_2$, and solving $g(p_1, p_2) = 0$ first for $p_2$ and then for $p_1$. In particular, it is easily shown that $g(0, p_2) = 0$ and $g(p_1, 1) = 0$ are both solvable with $p_2 = .9510$ or $p_2 = .2190$ and $p_1 = .0897$ or $p_1 = .6507$, and that both $g(1, p_2) = 0$ and $g(p_1, 0) = 0$ are not solvable. Thus the minimum of $p_1 - p_2$ on that part of the ellipse $g(p_1, p_2) = 0$ contained in the unit square $0 \le p_1, p_2 \le 1$ is achieved at the boundary point $(0, .9510)$ and is equal to $-.9510$.

[Received October 1991. Revised September 1992.]

REFERENCES

Hauck, W. W., and Anderson, S. (1986), "A Comparison of Large-Sample Confidence Interval Methods for the Difference of Two Binomial Probabilities," The American Statistician, 40, 318-322.
Peskun, P. H. (1990), "A Note on a General Method for Obtaining Confidence Intervals From Samples From Discrete Distributions," The American Statistician, 44, 31-35.
Triola, M. F. (1992), Elementary Statistics (5th ed.), Reading, MA: Addison-Wesley.
Welkon, C., and Reisinger, K. S. (1977), "The Phantom Taxi Seat Belt," American Journal of Public Health, 11, 1091-1092.
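The numbers quoted in Appendix B can be reproduced with a few lines of Python. This sketch is not part of the original article and its variable and function names are illustrative assumptions. It evaluates (12) for the tangent value $d$, locates the point at which it is attained, and then finds where the ellipse meets each edge of the unit square by solving the squared form of (9), which is a quadratic in the free coordinate. Squaring means that both branches of the ellipse are included, as in the Appendix B discussion.

```python
from math import sqrt
from statistics import NormalDist

n1, x1, n2, x2 = 5, 2, 2, 2
alpha_l, c_l = 0.05, 0.1           # c_L = .1 as stated in Appendix B
n = n1 + n2
a = x1 / n1 - x2 / n2 - c_l        # t0 - c_L = -.7
z = NormalDist().inv_cdf(1 - alpha_l)

# Equation (12): minimum of p1 - p2 on the whole ellipse.
d_min = (a - z * sqrt((n + z * z) / (4 * n1 * n2) - a * a / n)) / (1 + z * z / n)
tangent = (0.5 + n1 * d_min / n, 0.5 - n2 * d_min / n)


def real_roots(qa, qb, qc):
    """Real roots of qa*u^2 + qb*u + qc = 0 (empty list if none)."""
    disc = qb * qb - 4 * qa * qc
    if disc < 0:
        return []
    return [(-qb - sqrt(disc)) / (2 * qa), (-qb + sqrt(disc)) / (2 * qa)]


crossings = []
for c in (0.0, 1.0):
    # Edge p1 = c: ((a - c) + p2)^2 = (z^2 / n2) * p2 * (1 - p2)
    k = z * z / n2
    crossings += [(c, u)
                  for u in real_roots(1 + k, 2 * (a - c) - k, (a - c) ** 2)
                  if 0 <= u <= 1]
    # Edge p2 = c: ((a + c) - p1)^2 = (z^2 / n1) * p1 * (1 - p1)
    k = z * z / n1
    crossings += [(u, c)
                  for u in real_roots(1 + k, -(2 * (a + c) + k), (a + c) ** 2)
                  if 0 <= u <= 1]

boundary_min = min(p1 - p2 for p1, p2 in crossings)

if __name__ == "__main__":
    print(round(d_min, 4), tuple(round(v, 4) for v in tangent))
    print(sorted((round(p, 4), round(q, 4)) for p, q in crossings))
    print(round(boundary_min, 4))
```

Only the edges $p_1 = 0$ and $p_2 = 1$ yield real crossings, which matches the statement in Appendix B that $g(1, p_2) = 0$ and $g(p_1, 0) = 0$ are not solvable.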
