American Statistical Association is collaborating with JSTOR to digitize, preserve and extend access to Journal
of the American Statistical Association.
http://www.jstor.org
Peskun (1990) described a general method for obtaining well-defined one-sided and two-sided "exact" lower and upper confidence limits from a sample from a discrete distribution with an unknown parameter. This method can be applied to the discrete sampling distribution of the unbiased

$n_1, p_1, n_2, p_2] > \alpha_U\}$. (4)

It easily follows from (4) that $U(1) = 1$, and it can be similarly shown, as in Appendix A, that

\[ U(1 - 1/m_1) = (1 - \alpha_U)^{1/m_1} \tag{5} \]

if $\alpha_U^{1/n_1} \le n_1/n_2 \le \alpha_U^{-1/n_2}$,

$= -\alpha_U^{1/m_2}$ otherwise. (6)

If $\alpha = \alpha_L + \alpha_U$ and $0 < \alpha < 1$, then $L(t_0)$ and $U(t_0)$ provide exact two-sided lower and upper $100(1 - \alpha)\%$ confidence limits for $p_1 - p_2$ which, I conjecture without proof, are well defined in the sense that $L(t_0) \le U(t_0)$.

2. APPROXIMATE CONFIDENCE LIMITS

Using the fact that for $n_1$ and $n_2$ sufficiently large, $T = X_1/n_1 - X_2/n_2$ is approximately normally distributed with mean $p_1 - p_2$ and variance $p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2$, the probability $\Pr[T \ge t_0; n_1, p_1, n_2, p_2]$ in (1), written more conveniently as $\Pr[T \ge t_0]$, can be computed approximately as follows:

\[ \Pr[T \ge t_0] = \Pr\left\{ \frac{T - (p_1 - p_2)}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}} \ge \frac{t_0 - (p_1 - p_2)}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}} \right\} \approx \Pr\left\{ Z \ge \frac{t_0 - c_L - (p_1 - p_2)}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}} \right\}. \tag{7} \]

Here $Z$ is the standard normal random variable and $c_L = c_L(n_1, x_1, n_2, x_2)$ is a continuity correction equal to half the absolute difference between $t_0 = x_1/n_1 - x_2/n_2$ and the next smaller possible value of $T$.

From (1) and (7), it follows that

\[ L(t_0) \approx \min_{0 \le p_1, p_2 \le 1} \left\{ p_1 - p_2 \;\middle|\; \frac{t_0 - c_L - (p_1 - p_2)}{[p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2]^{1/2}} \le z_{\alpha_L} \right\}. \tag{8} \]

Using the method of a Lagrange multiplier to minimize $f(p_1, p_2) = p_1 - p_2$ subject to the side condition

\[ g(p_1, p_2) = \frac{t_0 - c_L - (p_1 - p_2)}{[p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2]^{1/2}} - z_{\alpha_L} = 0, \tag{9} \]

it can be shown that $p_1$ and $p_2$ must satisfy the following constraint:

\[ n_2 p_1 + n_1 p_2 = (n_1 + n_2)/2 = n/2. \tag{10} \]

Letting $p_1 - p_2 = d$, it follows from (9) and (10) that

\[ \frac{t_0 - c_L - d}{(n/4n_1n_2 - d^2/n)^{1/2}} = z_{\alpha_L}. \tag{11} \]

Squaring (11) and taking the smaller root of the resulting quadratic in $d$ gives

\[ L(t_0) \approx L_1(t_0) = \frac{t_0 - c_L - z_{\alpha_L}\sqrt{(n + z_{\alpha_L}^2)/4n_1n_2 - (t_0 - c_L)^2/n}}{1 + z_{\alpha_L}^2/n}. \tag{12} \]

Note that this value of $d$ can always be used as an approximation for $L(t_0)$, but if $n_1 \ne n_2$ it might be smaller than the minimum defined by (8) because it is the minimum of $p_1 - p_2$ on the whole ellipse $g(p_1, p_2) = 0$ defined by (9), and the point $(p_1, p_2) = (.5 + n_1 d/n, .5 - n_2 d/n)$ at which it is achieved may be outside the unit square $0 \le p_1, p_2 \le 1$. If this is the case, then the desired minimum value of $p_1 - p_2$ defined by (8) is achieved at one of the points of intersection of the ellipse $g(p_1, p_2) = 0$ and the boundary of the unit square $0 \le p_1, p_2 \le 1$. A simple example is given in Appendix B.

Similarly, it can be shown that

\[ U(t_0) \approx \max_{0 \le p_1, p_2 \le 1} \left\{ p_1 - p_2 \;\middle|\; \frac{t_0 + c_U - (p_1 - p_2)}{[p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2]^{1/2}} \ge -z_{\alpha_U} \right\}, \tag{13} \]

\[ U(t_0) \approx U_1(t_0) = \frac{t_0 + c_U + z_{\alpha_U}\sqrt{(n + z_{\alpha_U}^2)/4n_1n_2 - (t_0 + c_U)^2/n}}{1 + z_{\alpha_U}^2/n}. \tag{14} \]

Here $c_U = c_U(n_1, x_1, n_2, x_2)$ is a continuity correction equal to half the absolute difference between $t_0 = x_1/n_1 - x_2/n_2$ and the next larger possible value of $T$. It can easily be shown that $c_L = c_U = 1/n$ if $n_1 = n_2$. But if $n_1 \ne n_2$, then $c_L$ and $c_U$ are not always equal. In Section 5, it is shown how the values of $c_L$ and $c_U$ can be determined in general.

Note that if $n_1 \ne n_2$ and if the point $(p_1, p_2) = (.5 + n_1 d/n, .5 - n_2 d/n)$ at which the maximum of $p_1 - p_2 = d$ on the ellipse $h(p_1, p_2) = [t_0 + c_U - (p_1 - p_2)]/[p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2]^{1/2} + z_{\alpha_U} = 0$ is achieved lies outside the unit square $0 \le p_1, p_2 \le 1$, then the desired maximum value of $p_1 - p_2$ defined by (13) is achieved at one of the points of intersection of the ellipse $h(p_1, p_2) = 0$ and the boundary of the unit square $0 \le p_1, p_2 \le 1$. Note that this maximum value of $p_1 - p_2$ will necessarily be smaller than $U_1(t_0)$.

To achieve good agreement between coverage probabilities and nominal confidence level values, as will be discussed in the next section, I recommend that the approximate confidence limits $L_1(t_0)$ and $U_1(t_0)$ always be used instead of the possibly larger minimum and smaller maximum values of $p_1 - p_2$ defined by (8) and (13). This recommendation is based on the results of a separate study that I have carried out but have not reported here.

For comparison, the following approximate confidence limits will also be considered:

\[ L_2(x_1, x_2) = (\hat{p}_1 - \hat{p}_2) - z_{\alpha_L}\sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2}, \tag{15} \]
\[ U_2(x_1, x_2) = (\hat{p}_1 - \hat{p}_2) + z_{\alpha_U}\sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2}, \tag{16} \]

\[ L_3(x_1, x_2) = (\hat{p}_1 - \hat{p}_2) - \left[ z_{\alpha_L}\sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2} + (1/2n_1 + 1/2n_2) \right], \tag{17} \]

\[ U_3(x_1, x_2) = (\hat{p}_1 - \hat{p}_2) + \left[ z_{\alpha_U}\sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2} + (1/2n_1 + 1/2n_2) \right], \tag{18} \]

\[ L_4(x_1, x_2) = (\hat{p}_1 - \hat{p}_2) - \left[ z_{\alpha_L}\sqrt{\hat{p}_1(1 - \hat{p}_1)/(n_1 - 1) + \hat{p}_2(1 - \hat{p}_2)/(n_2 - 1)} + \frac{1}{2\min(n_1, n_2)} \right], \tag{19} \]

and

\[ U_4(x_1, x_2) = (\hat{p}_1 - \hat{p}_2) + \left[ z_{\alpha_U}\sqrt{\hat{p}_1(1 - \hat{p}_1)/(n_1 - 1) + \hat{p}_2(1 - \hat{p}_2)/(n_2 - 1)} + \frac{1}{2\min(n_1, n_2)} \right]. \tag{20} \]

Here $\hat{p}_1 - \hat{p}_2 = x_1/n_1 - x_2/n_2 = t_0$. Note that $L_2$ and $U_2$ are the approximate confidence limits most commonly given in elementary statistics books. The approximate confidence limits $L_3$ and $U_3$ differ from $L_2$ and $U_2$ only by the inclusion of the so-called Yates continuity correction $(1/2n_1 + 1/2n_2)$. Hauck and Anderson (1986) recommended using $L_4$ and $U_4$, which differ from $L_3$ and $U_3$ in the choice of continuity correction, namely, $1/[2\min(n_1, n_2)]$, and in the choice of $n_i - 1$ rather than $n_i$ $(i = 1, 2)$ in the estimate of the standard error.

3. MINIMUM PROBABILITY OF COVERAGE COMPARISONS

This leads to a reduction in the amount of computation involved in calculating (21), the minimum probability of coverage, by allowing the parameter space $0 \le p_1, p_2 \le 1$ to be replaced by either $0 \le p_1 \le p_2 \le 1$ or $0 \le p_2 \le p_1 \le 1$. The latter simplified form of the parameter space will be used in this article. To further reduce the amount of computation involved in calculating (21) exactly, the simplified parameter space $0 \le p_2 \le p_1 \le 1$ will be discretized to the finite grid of points $p_1, p_2 = .00(.01)1.00$ with $p_2 \le p_1$. Although this discretization will possibly result in the computed minimum probability of coverage being slightly larger than the actual value, it should not invalidate any comparisons made between the four confidence interval methods relative to the agreement of their respective coverage probabilities with nominal confidence level values.

In Table 1 note that the computed minimum probabilities of coverage, given for a nominal confidence level equal to .95, were obtained using the exact confidence limits given in Section 1 for the special values $t_0 = -1$, $1/m_1 - 1$, $1 - 1/m_1$, and $1$, no matter which pair of approximate confidence limits $L_i$ and $U_i$ $(i = 1, 2, 3, 4)$ was used for other $t_0$ values. As has been verified in a separate study not reported here, using these exact confidence limits results in a general increase in the minimum probability of coverage, especially for small sample sizes $n_1$ and $n_2$. More importantly, note that for most of the sample size combinations considered here, the agreement of the computed minimum probabilities of coverage with the nominal confidence level value .95 is quite good for confidence interval method 1 but very poor for the other three methods. The same can also be said if the nominal confidence level is changed from .95 to either .90 or .99. I have verified this in a separate study not reported here.

Table 1. Minimum Probability of Coverage of the Two-Sided Approximate 95% Lower and Upper Confidence Limits for $p_1 - p_2$ Using Some Exact Confidence Limits

Confidence
interval                      n2
method      n1      2     5     10    30    100

To explain this apparent discrepancy with the findings of Hauck and Anderson (1986), Table 2, as an illustration, gives the values of $p_1$ and $p_2$ at which the computed minimum probabilities of coverage are achieved in Table 1. Note that for confidence interval methods 2, 3, and 4, the minimum expected cell size, $\mathrm{MCS} = \min[n_1 p_1, n_1(1 - p_1), n_2 p_2, n_2(1 - p_2)]$, is 0 more often than not, with a maximum value of .59. In contrast, the findings of Hauck and Anderson (1986) are based on $\mathrm{MCS} \ge 3$.

Table 2.

Confidence                                       n2
interval           2            5            10           30           100
method     n1    p1    p2     p1    p2     p1    p2     p1    p2     p1    p2
  1         2   .91   .09
            5    1    .30     1    .01
           10    1    .30     1    .08    .50*  .49*
           30    1    .30     1    .53     1    .46    .94   .06
          100    1    .30     1    .55     1    .52*    1    .30    .61   .38
  2         2    1*   .99*
            5    1    .99     1*   .99*
           10    1    .99     1    .99     1*   .99*
           30    1    .99     1    .99     1    .99     1    .99
          100    1    .99     1    .99     1    .99     1    .99     1    .99
  3         2    1*   .49*
            5    1    .64     1*   .79*
           10    1    .69     1    .84     1*   .89*
           30    1    .73     1    .88     1    .93     1*   .96*
          100   .98   .71     1    .89     1    .94     1    .97    .02    0
  4         2    1*   .74*
            5    1    .74     1*   .89*
           10    1    .74     1    .89     1    .94
           30    1    .74     1    .89     1    .94     1*   .98*
          100   .98   .72     1    .89     1    .94     1    .98     1    .99

* One of a possible set of values of $p_1$ and $p_2$ at which the computed minimum probability of coverage is achieved.

Table 3 is similar in content to Table 1, except that the computation of the minimum probability of coverage has been restricted to both $\mathrm{MCS} \ge 3$ and $\mathrm{MCS} \ge 5$, and the sample sizes $n_1, n_2 = 2, 5$, and $n_1, n_2 = 2, 5, 10$, have been dropped accordingly. As might be expected, the computed minimum probability of coverage has greatly improved for confidence interval methods 2, 3, and 4 but either remained the same or only slightly improved in most cases studied for confidence interval method 1. But it is still the case that the agreement of computed minimum probabilities of coverage with nominal confidence level value .95 is best for confidence interval method 1 in most cases studied. This observation is even more noteworthy, keeping in mind that in practice MCS is an unknown quantity. In addition, it does not appear from Table 3, as claimed by Hauck and Anderson (1986, p. 322), that for $\mathrm{MCS} \ge 3$, confidence interval method 3 is somewhat unnecessarily conservative for 95% intervals, and confidence interval method 4 provides adequate protection against coverage probabilities less than nominal for 95% intervals.

Table 3. Minimum Probability of Coverage of the Two-Sided Approximate 95% Lower and Upper Confidence Limits for $p_1 - p_2$ With Minimum Expected Cell Sizes of at Least 3 and 5

Minimum     Confidence
expected    interval                     n2
cell size   method      n1      10        30        100
    3          1         10   .95842
                         30   .95456    .95287
                        100   .94968    .94976    .95048
               2         10   .88109
                         30   .88676    .87875
                        100   .84827    .88900    .89800
               3         10   .95248
                         30   .94068    .94471
                        100   .92004    .93657    .94462
               4         10   .93097
                         30   .94031    .91827
                        100   .92509    .92850    .91917
    5          1         30             .95287
                        100             .94976    .95048
               2         30             .91655
                        100             .91550    .91948
               3         30             .95047
                        100             .95007    .95405
               4         30             .94220
                        100             .94718    .93191

4. PRECISION COMPARISONS

As a result of the minimum probability of coverage comparisons, only confidence interval methods 1, 3, and 4 will be compared in terms of their precision; that is, in terms of
sample sizes $n_1, n_2 = 10, 30, 100$, so that $\mathrm{MCS} \ge 3$ will be attained.

For $0 \le x_1 \le n_1$ and $0 \le x_2 \le n_2$, Table 4 gives the exact percentages of method 1 confidence interval widths $U_1(x_1, x_2) - L_1(x_1, x_2)$ that are at least as precise (narrow) as their method 3 and method 4 counterparts, $U_3(x_1, x_2) - L_3(x_1, x_2)$ and $U_4(x_1, x_2) - L_4(x_1, x_2)$. This comparison favors confidence interval method 1, especially over confidence interval method 3, because for most of the cases studied the percentages are near or at least 50.

Further insight into this comparison is given in Table 5, where for $0 \le x_1 \le n_1$ and $0 \le x_2 \le n_2$ the exactly computed lowest value (L), the first quartile ($Q_1$), the median ($Q_2$), the third quartile ($Q_3$), and the highest value (H) of the widths $U_i(x_1, x_2) - L_i(x_1, x_2)$ $(i = 1, 3, 4)$ are tabulated. It appears that the smaller (larger) confidence interval widths of method 1 tend to be larger (smaller) than their counterparts for methods 3 and 4.

Table 5. Lowest Value (L), First Quartile ($Q_1$), Median ($Q_2$), Third Quartile ($Q_3$), and Highest Value (H) of the Approximate 95% Confidence Interval Widths $U_i - L_i$ $(i = 1, 3, 4)$

              Confidence
              interval
 n2     n1    method       L        Q1       Q2       Q3       H
 10     10      1        .3369    .7964    .8549    .8825    .8859
                3        .2000    .7259    .8790    .9840   1.0765
                4        .1000    .6767    .8196    .9264   1.0240
        30      1        .3085    .6438    .6926    .7098    .7140
                3        .1333    .5627    .7278    .7940    .8490
                4        .1000    .5550    .7200    .7901    .8479
       100      1        .3085    .6193    .6388    .6467    .6487
                3        .0842    .5112    .6407    .7244    .7601
                4        .0793    .5202    .6562    .7469    .7824
 30     30      1        .1193    .4604    .5007    .5178    .5219
                3        .0667    .4213    .4806    .5293    .5727
                4        .0333    .3940    .4543    .5038    .5481
       100      1        .1157    .3675    .3927    .4027    .4053
                3        .0433    .3197    .3876    .4220    .4513
                4        .0333    .3131    .3827    .4176    .4472
100    100      1        .0366    .2483    .2720    .2817    .2844
                3        .0200    .2186    .2477    .2738    .2972
                4        .0100    .2096    .2389    .2651    .2886

$= 36/72 - 77/129 = -.097$ must be found. This can be done most easily by working with the variable $n_1 n_2 t_0 = n_2 x_1 - n_1 x_2 = (129)(36) - (72)(77) = -900$ rather than $t_0$ itself and by noting the following simple ways of changing the value of $n_2 x_1 - n_1 x_2$: increasing or decreasing $x_1$ by 1 increases or decreases $n_2 x_1 - n_1 x_2$ by $n_2$; increasing or decreasing $x_2$ by 1 decreases or increases $n_2 x_1 - n_1 x_2$ by $n_1$; and increasing or decreasing both $x_1$ and $x_2$ by 1 changes $n_2 x_1 - n_1 x_2$ by $(n_2 - n_1)$ or $(n_1 - n_2)$.

Because $n_2 - n_1 = 129 - 72 = 57$ is smaller than $n_1 = 72$ and $n_2 = 129$, initially increasing or decreasing both $x_1 = 36$ and $x_2 = 77$ by 1 increases or decreases $n_2 x_1 - n_1 x_2$ by 57. A smaller increase can be achieved by first decreasing just $x_2 = 77$ by 1 to $x_2 = 76$, and then decreasing both $x_1 = 36$ and $x_2 = 76$ by 1. In this case $n_2 x_1 - n_1 x_2$ increases first by 72 and then decreases by 57 for a net increase of 15. More generally, decreasing just $x_2 = 77$ by $r$ to $x_2 = 77 - r$ and then decreasing both $x_1 = 36$ and $x_2 = 77 - r$ by $s$ would result in $n_2 x_1 - n_1 x_2$ increasing by $72r - 57s = 3(24r - 19s)$. In particular, for $r = 4$ and $s = 5$, the smallest possible increase in $n_2 x_1 - n_1 x_2$ is achieved, namely 3. Thus $c_U = 3/2n_1n_2 = 1/6{,}192$. Similarly, it can be shown that $c_L = 1/6{,}192$.

From (12) and (14), it now follows that the two-sided approximate 95% confidence interval is $[L_1, U_1] = [-.237, .047]$. Note that it is more precise than any one of the other three two-sided approximate 95% confidence intervals $[L_i, U_i]$ $(i = 2, 3, 4)$, which are equal to $[-.240, .046]$, $[-.251, .057]$, and $[-.248, .054]$. Note also that $L_1$ and $U_1$ are not symmetrically located about $t_0 = -.097$ as are $L_i$ and $U_i$ $(i = 2, 3, 4)$.

6. CONCLUSION

With respect to the four confidence interval methods discussed in this article for constructing approximate $100(1 - \alpha)\%$ two-sided lower and upper confidence limits for the unknown difference $p_1 - p_2$ of two binomial probabilities, I recommend method 1 because of its comparable precision, the good agreement of its coverage probabilities with nominal confidence level values, and the surprising smallness of its sample sizes $n_1$ and $n_2$ for which the normal approximation is deemed appropriate.
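The worked example of Section 5 ($n_1 = 72$, $x_1 = 36$, $n_2 = 129$, $x_2 = 77$) can be checked numerically. The following is a minimal sketch, not the author's program: the function names are hypothetical, the continuity corrections are found by brute-force enumeration of every possible value of $n_2 x_1 - n_1 x_2$ rather than by the hand argument of Section 5, and the limits follow (12), (14), and (15) to (20) with $\alpha_L = \alpha_U = .025$ assumed for a two-sided 95% interval.

```python
from math import sqrt
from statistics import NormalDist


def continuity_corrections(n1, x1, n2, x2):
    """Return (c_L, c_U) by brute-force enumeration of all values of T.

    Works on the integer k = n2*x1 - n1*x2 = n1*n2*t0, as Section 5 suggests.
    A correction is None when no smaller (or larger) value of T exists,
    that is, at t0 = -1 or t0 = 1; those cases are not handled here.
    """
    k0 = n2 * x1 - n1 * x2
    values = {n2 * a - n1 * b for a in range(n1 + 1) for b in range(n2 + 1)}
    below = [v for v in values if v < k0]
    above = [v for v in values if v > k0]
    c_l = (k0 - max(below)) / (2 * n1 * n2) if below else None
    c_u = (min(above) - k0) / (2 * n1 * n2) if above else None
    return c_l, c_u


def method1_limits(n1, x1, n2, x2, alpha_l=0.025, alpha_u=0.025):
    """Approximate limits L1 and U1 from equations (12) and (14)."""
    n = n1 + n2
    t0 = x1 / n1 - x2 / n2
    c_l, c_u = continuity_corrections(n1, x1, n2, x2)
    if c_l is None or c_u is None:
        raise ValueError("t0 = -1 or 1 is outside the scope of this sketch")
    z_l = NormalDist().inv_cdf(1 - alpha_l)  # upper alpha_L point of N(0, 1)
    z_u = NormalDist().inv_cdf(1 - alpha_u)
    a, b = t0 - c_l, t0 + c_u
    lower = (a - z_l * sqrt((n + z_l**2) / (4 * n1 * n2) - a * a / n)) / (1 + z_l**2 / n)
    upper = (b + z_u * sqrt((n + z_u**2) / (4 * n1 * n2) - b * b / n)) / (1 + z_u**2 / n)
    return lower, upper


def wald_type_limits(n1, x1, n2, x2, alpha_l=0.025, alpha_u=0.025):
    """Limits for methods 2, 3, and 4 from equations (15) to (20)."""
    p1, p2 = x1 / n1, x2 / n2
    t0 = p1 - p2
    z_l = NormalDist().inv_cdf(1 - alpha_l)
    z_u = NormalDist().inv_cdf(1 - alpha_u)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    se4 = sqrt(p1 * (1 - p1) / (n1 - 1) + p2 * (1 - p2) / (n2 - 1))
    yates = 1 / (2 * n1) + 1 / (2 * n2)        # correction used by method 3
    hauck_anderson = 1 / (2 * min(n1, n2))     # correction used by method 4
    return {
        2: (t0 - z_l * se, t0 + z_u * se),
        3: (t0 - (z_l * se + yates), t0 + (z_u * se + yates)),
        4: (t0 - (z_l * se4 + hauck_anderson), t0 + (z_u * se4 + hauck_anderson)),
    }


if __name__ == "__main__":
    print(continuity_corrections(72, 36, 129, 77))
    print(method1_limits(72, 36, 129, 77))
    print(wald_type_limits(72, 36, 129, 77))
```

Rounded to three decimals, the limits computed this way agree with the four intervals quoted in Section 5, and both corrections equal $1/6{,}192$. The enumeration costs $(n_1 + 1)(n_2 + 1)$ evaluations per call, which is acceptable for sample sizes of this order but would need replacing for large $n_1$ and $n_2$.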
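\[ L(1/m_1 - 1) = \inf_{0 \le p_1, p_2 \le 1} \{ p_1 - p_2 \mid \Pr[T \ge 1/m_1 - 1] > \alpha_L \}, \]

where

\[ \Pr[T \ge 1/m_1 - 1] = 1 - \Pr[T = -1] = 1 - \Pr[X_1 = 0, X_2 = n_2] = 1 - (1 - p_1)^{n_1} p_2^{n_2}. \]

Thus

where

to $(n_1 + n_2)(\alpha_L / n_1^{n_1} n_2^{n_2})^{1/(n_1 + n_2)} - 1$,

\[ 0 \le 1 - [(n_2/n_1)^{n_1} \alpha_L]^{1/(n_1 + n_2)} \le 1 - \alpha_L^{1/n_2}; \]

that is,

\[ \alpha_L^{1/n_1} \le n_1/n_2 \le \alpha_L^{-1/n_2}. \]

Otherwise, the minimum of $g(p_2)$ for $0 \le p_2 \le 1 - \alpha_L^{1/n_2}$ is achieved at either $p_2 = 0$ or $p_2 = 1 - \alpha_L^{1/n_2}$. Note that $g(p_2) = \alpha_L^{1/n_1}$ for $p_2 = 0$,

[Received October 1991. Revised September 1992.]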
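The closed-form expression in the appendix fragment above can be sanity-checked against a direct grid search. The sketch below rests on an assumption that the surviving text does not state outright: that the fragment concerns the lower limit at $t_0 = 1$, where $\Pr[T \ge 1] = p_1^{n_1}(1 - p_2)^{n_2}$, so that $g(p_2) = [\alpha_L/(1 - p_2)^{n_2}]^{1/n_1} - p_2$ is the value of $p_1 - p_2$ on the boundary $\Pr[T \ge 1] = \alpha_L$. This reading is consistent with $g(0) = \alpha_L^{1/n_1}$ and with the stated range of $p_2$. The function names are hypothetical.

```python
from math import exp, log


def closed_form_minimum(n1, n2, alpha_l):
    """Minimum of g(p2) using the closed form quoted in the appendix fragment."""
    n = n1 + n2
    if alpha_l ** (1 / n1) <= n1 / n2 <= alpha_l ** (-1 / n2):
        # n * (alpha_L / (n1^n1 * n2^n2))^(1/n) - 1, evaluated on the log
        # scale so that large n1^n1 * n2^n2 does not overflow.
        return n * exp((log(alpha_l) - n1 * log(n1) - n2 * log(n2)) / n) - 1
    # Otherwise the minimum sits at an endpoint: g(0) = alpha_L^(1/n1),
    # and g = alpha_L^(1/n2) at p2 = 1 - alpha_L^(1/n2), where p1 = 1.
    return min(alpha_l ** (1 / n1), alpha_l ** (1 / n2))


def grid_minimum(n1, n2, alpha_l, steps=100_000):
    """Brute-force minimum of the assumed g(p2) over 0 <= p2 <= 1 - alpha_L^(1/n2)."""
    p2_max = 1 - alpha_l ** (1 / n2)
    best = float("inf")
    for i in range(steps + 1):
        p2 = p2_max * i / steps
        p1 = (alpha_l / (1 - p2) ** n2) ** (1 / n1)
        best = min(best, p1 - p2)
    return best


if __name__ == "__main__":
    for n1, n2 in [(10, 10), (100, 2)]:
        print(n1, n2, closed_form_minimum(n1, n2, 0.025), grid_minimum(n1, n2, 0.025))
```

The pair $(10, 10)$ exercises the interior case and $(100, 2)$ the endpoint case. Agreement between the two functions only confirms the algebra under the stated assumption about $g$; it does not confirm that this is the exact quantity the appendix defines.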