Author(s): F. Yates
Source: Supplement to the Journal of the Royal Statistical Society, Vol. 1, No. 2 (1934), pp. 217-235
Published by: Wiley for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2983604
By F. YATES, B.A.
Introduction.
THERE has in the past been a good deal of argument as to the
appropriate statistical tests of independence for contingency tables,
particularly those in which each classification is a simple dichotomy
(i.e. 2 × 2 tables). It is probably now almost universally admitted
that when the numbers in the various cells are large the χ² test,
introduced by K. Pearson in 1900,⁴ with the vital modification
as to degrees of freedom established by R. A. Fisher in 1922,¹
and confirmed by Yule,⁵ is the appropriate one. The necessity
for this modification has been amply demonstrated both from
theoretical considerations¹,⁵ and by actual tests on material
known to be almost if not quite free from association,²,⁵ and there
is no need to discuss the matter further here. Several other forms
of test for 2 × 2 tables have been shown to be equivalent to the
χ² test.
The χ² test is admittedly approximate, for in order to establish
the test it is necessary to regard each cell value as normally
distributed with a variance equal to the expected value, the whole
set of values being subject to certain restrictions. The accuracy
of this approximation depends on the numbers in the various cells,
and in practice it has been customary to regard χ² as sufficiently
accurate if no cell has an expectancy of less than 5. It is with the
question of the applicability of χ² to 2 × 2 contingency tables
involving small expectancies that we are directly concerned in this
paper.
It was suggested to me by Professor Fisher that the probability
of any observed set of values in a 2 × 2 contingency table with given
marginal totals can be exactly determined. The method will be
explained in the next section. Armed with the exact distribution,
the divergence of the χ² test in any special case can be tested. It
will be shown that although the test as ordinarily applied becomes
inaccurate even with moderately small numbers in the cells, a simple
modification enables the range of usefulness to be considerably
extended.
The problem of testing the independence of contingency tables
involving small numbers is of considerable practical importance
c! d! p^c (1 − p)^d
Binomial Distributions with known p.
The examination of the χ² test in the case of simple binomial
distributions with known p will illustrate some important points.
We will first consider the symmetrical distribution
(½ + ½)^10.
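The comparison for this symmetrical binomial can be sketched in a few lines. The following is an illustration only, not the paper's own tabulation: the function names are hypothetical, the normal tail is taken from `math.erfc`, and the correction for continuity is assumed to take its usual form of reducing the deviation from expectation by ½ before dividing by the standard deviation.

```python
from math import comb, erfc, sqrt


def normal_upper_tail(z: float) -> float:
    """P(Z >= z) for a standard normal deviate."""
    return 0.5 * erfc(z / sqrt(2.0))


def binomial_upper_tail(n: int, p: float, c: int) -> float:
    """Exact P(successes >= c) for the binomial (q + p)^n."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


n, p = 10, 0.5
mean = n * p
sd = sqrt(n * p * (1 - p))

rows = []
for c in range(6, n + 1):
    exact = binomial_upper_tail(n, p, c)
    chi = (c - mean) / sd               # ordinary chi (square root of chi-squared)
    chi_corr = (c - 0.5 - mean) / sd    # assumed form of the continuity correction
    rows.append((c, exact, normal_upper_tail(chi), normal_upper_tail(chi_corr)))

for c, exact, p_chi, p_corr in rows:
    print(f"{c:2d}  exact={exact:.4f}  P(chi)={p_chi:.4f}  P(chi')={p_corr:.4f}")
```

For 8 or more successes the exact one-tail probability is 56/1024; comparing the last two printed columns against the first shows how far each normal approximation departs from it.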
TABLE I.
TABLE II.
Successes. p'. P. P(χ). Discrepancy. P(χ′). Discrepancy.
Discrepancies of the χ² Test after correcting for Continuity.
If any given contingency distribution is calculated by means of
the appropriate hypergeometric series the discrepancies between
the values of χ corresponding to the true probabilities and the
equivalent values of χ′ can be evaluated. These discrepancies
are more convenient to work with than the discrepancies between
the true and χ′ probabilities which were considered in the last
section: clearly they are immediately convertible the one into the
other by reference to a table of the normal probability integral.
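The conversion between a one-tail probability and the equivalent normal deviate χ can be sketched as follows. This is a minimal illustration using Python's `statistics.NormalDist` in place of a printed table of the normal probability integral; the function names are hypothetical, and the sign convention (discrepancy = χ′ minus the χ of the true probability) is an assumption inferred from the way discrepancies are later added to the normal points to give the tabulated values.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def chi_from_tail(p_one_tail: float) -> float:
    """Normal deviate chi whose upper-tail probability is p_one_tail."""
    return STD_NORMAL.inv_cdf(1.0 - p_one_tail)


def tail_from_chi(chi: float) -> float:
    """Upper-tail probability corresponding to a normal deviate chi."""
    return 1.0 - STD_NORMAL.cdf(chi)


def chi_discrepancy(true_tail: float, chi_dash: float) -> float:
    """Assumed convention: corrected chi' minus the chi of the true probability."""
    return chi_dash - chi_from_tail(true_tail)


# Illustrative values only: the tail of 8 or more successes in (1/2 + 1/2)^10,
# with chi' computed from a deviation reduced by 1/2.
true_tail = 56 / 1024
chi_dash = (8 - 0.5 - 5) / sqrt(2.5)
print(chi_from_tail(true_tail), chi_dash, chi_discrepancy(true_tail, chi_dash))
```

Because the two scales are linked by a monotone function, a discrepancy on the χ scale can always be turned back into a discrepancy in probability with `tail_from_chi`.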
There are, of course, in general no discrepancies corresponding
to the exact 2.5 per cent. and 0.5 per cent. points, but it is possible
to determine approximate hypothetical discrepancies corresponding
to these points on the true probability scale by interpolation. If
the discrepancies are plotted against the logarithms of the true
probabilities the resultant points lie on remarkably regular curves,
making graphical or other interpolation easy. Fig. 1 shows the
graph of these discrepancies in the case of the binomial of Table II.
For every distribution generated by a 2 × 2 contingency table
with fixed marginal totals but variable class numbers, therefore, a
[Fig. 1. Discrepancies plotted against log P; the 0.5 per cent. and 2.5 per cent. points of P are marked.]
where p = m/n, i.e. the expectation divided by the range less one.
In what follows it will be convenient to use m and p instead of m
and n + 1 as defining the class.
With expectation 4 and range 12 + 1, for example, we obtain
the series with sets of marginal totals (24, 12; 24, 12), (27, 12; 26,
13), (30, 12; 28, 14), etc., with the limiting binomial (⅓ + ⅔)^12.
With expectation 3½ and range 10 + 1 we obtain the series (30, 10;
26, 14), (50, 10; 39, 21), etc. In each case the first-named
distribution is the limiting contingency distribution.
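The membership of these series in a single class can be checked arithmetically. The sketch below is an illustration under stated assumptions: the helper name is hypothetical, each set is read as (row totals; column totals), the expectation m is taken as the product of the smallest row and column totals divided by the grand total, and n (the range less one) is taken as the smallest marginal total.

```python
from fractions import Fraction


def contingency_class(row_totals, col_totals):
    """Return (m, n, p) for a 2 x 2 table with the given marginal totals.

    m: smallest expectation; n: smallest marginal total (range less one);
    p: m / n. The reading of the margins is an assumption of this sketch.
    """
    if sum(row_totals) != sum(col_totals):
        raise ValueError("row and column totals must have the same sum")
    total = sum(row_totals)
    smallest_row, smallest_col = min(row_totals), min(col_totals)
    m = Fraction(smallest_row * smallest_col, total)
    n = min(smallest_row, smallest_col)
    return m, n, m / n


first_series = [((24, 12), (24, 12)), ((27, 12), (26, 13)), ((30, 12), (28, 14))]
second_series = [((30, 10), (26, 14)), ((50, 10), (39, 21))]

for rows_, cols_ in first_series + second_series:
    m, n, p = contingency_class(rows_, cols_)
    print(rows_, cols_, "m =", m, " range =", n, "+ 1", " p =", p)
```

Every member of the first series gives m = 4 with range 12 + 1, and every member of the second gives m = 7/2 with range 10 + 1, so each series shares one value of p.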
The utility of the above classification lies in the fact (established
by examination of special cases) that the χ′ discrepancies are similar
for all distributions in any one class, and in general decrease or increase
steadily with increasing N. Thus the knowledge of the χ′ discrepancy
for the limiting contingency distribution and the binomial of any class
sets definite limits to its value for any distribution of that class.
Moreover the χ′ discrepancies for the limiting contingency
distributions and the binomials vary in a regular manner as m and p
are varied. Fig. 2 illustrates this variation with variations of
p when m is equal to 4. There are four separate diagrams
corresponding to 2.5 per cent. and 0.5 per cent. points of the longer and shorter
tails. In each case the values actually calculated are marked. It
will be seen that the values fall very satisfactorily on to smooth
curves (the curves of binomial values are shown full, those of the
limiting contingency series values dotted).
[Fig. 2. χ′ discrepancies plotted against p for m = 4. Four panels: 2.5% point, Longer Tail; 2.5% point, Shorter Tail; 0.5% point, Longer Tail; 0.5% point, Shorter Tail.]
TABLE III.
The 2.5 per cent. and 0.5 per cent. points of χ′.
The values for the binomial distributions are shown in ordinary type, those
for the limiting contingency distributions in italics (here, the unlabelled
row beneath each value of m).
m = the smallest expectation.
p = (the smallest expectation) / (the smallest marginal total).

  m     2.5 per cent. points              0.5 per cent. points
        2.04   -     -                    2.67   -     -
  3     2.19  2.07  1.88  1.73   -        3.05  2.78
        2.02  1.93  1.83                  2.67  2.48
  4     2.16  2.06  1.90  1.77  1.68      2.97  2.76  2.41  2.18
        2.02  1.94  1.85                  2.67  2.50  2.32
  5     2.14  2.06  1.91  1.80  1.71      2.95  2.75  2.44  2.23  2.06
        2.01  1.94  1.86                  2.66  2.52  2.36
  6     2.13  2.05  1.92  1.82  1.74      2.92  2.73  2.47  2.27  2.13
        2.01  1.94  1.87                  2.66  2.53  2.38
  8     2.11  2.04  1.93  1.84  1.77      2.88  2.72  2.50  2.32  2.19
        2.00  1.95  1.89                  2.65  2.54  2.42
 12     2.08  2.02  1.94  1.87  1.81      2.83  2.70  2.52  2.38  2.27
        1.99  1.95  1.90                  2.64  2.55  2.46
 24     2.05  2.01  1.95  1.90  1.86      2.76  2.67  2.55  2.45  2.37
        1.99  1.95  1.92                  2.63  2.56  2.50
 48     2.03  2.00  1.96  1.91  1.89      2.70  2.64  2.56  2.49  2.43
        1.98  1.96  1.94                  2.62  2.57  2.52
 96     2.01  1.99  1.96  1.93  1.91      2.67  2.63  2.57  2.52  2.48
        1.97  1.96  1.94                  2.60  2.57  2.54
tail, and +0.095 and +0.173 on the longer tail. Adding the
2.5 per cent. discrepancies to 1.960 (the 2.5 per cent. point of χ)
and the 0.5 per cent. discrepancies to 2.576 (the 0.5 per cent. point
of χ) gives the tabulated values of 1.80, 2.23, 2.06 and 2.75
respectively.
Since the table only contains values for the binomials and the
limiting contingency distributions it only serves to provide upper
and lower limits to the actual 2.5 per cent. and 0.5 per cent. points
of χ′ for other contingency distributions. In testing any particular
Breast-fed ...     4    16    20
Bottle-fed ...     1    21    22
Total ...          5    37    42
No. of Normal
Breast-fed Children.    Multiplier.    Probability.
        0                              0.0309568
        1               5.20/1.18      0.171982
        2               4.19/2.19      0.343964
        3               3.18/3.20      0.309568
        4               2.17/4.21      0.125301 }
        5               1.16/5.22      0.018226 } 0.143527
                                       ---------
                                       0.999998
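The probabilities in this table can be reproduced from the marginal totals 20, 22; 5, 37 alone. The sketch below is an illustration, not the paper's own working: the function names are hypothetical, the probabilities are computed directly as hypergeometric terms, and the multipliers are read as the ratio of each probability to the one before it, with "5.20/1.18" standing for (5 × 20)/(1 × 18).

```python
from fractions import Fraction
from math import comb


def exact_distribution(row1: int, row2: int, col1: int) -> dict:
    """P(x) for the count x in cell (row 1, column 1), all margins fixed."""
    total = row1 + row2
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    denom = comb(total, col1)
    return {
        x: Fraction(comb(row1, x) * comb(row2, col1 - x), denom)
        for x in range(lowest, highest + 1)
    }


def multiplier(row1: int, row2: int, col1: int, x: int) -> Fraction:
    """Ratio P(x + 1) / P(x), the factor carrying one term to the next."""
    return Fraction((row1 - x) * (col1 - x), (x + 1) * (row2 - col1 + x + 1))


# Margins taken from the table above: 20 breast-fed, 22 bottle-fed, 5 normal.
dist = exact_distribution(row1=20, row2=22, col1=5)
for x, prob in dist.items():
    print(x, f"{float(prob):.7f}")

tail = dist[4] + dist[5]  # 4 or more normal breast-fed children
print("tail:", f"{float(tail):.6f}")
```

Working with `Fraction` keeps every term exact, so the terms sum to 1 exactly and any shortfall in the printed total is due only to rounding of the decimals.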
No. of Normal
Breast-fed Children.    Probability.
        0               0.12859
        1               0.31652
        2               0.31892
        3               0.17136
        4               0.05355 }
        5               0.00993 }
        6               0.00106 } 0.06460
        7               0.00006 }
        8               0.00000 }
                        -------
                        0.99999
Here the ordinary χ² test attains the 5 per cent. level of significance
(2.5 per cent. on one tail). The true probability, however, 0.0646
on the one tail, is nowhere near this, and again the correction for
continuity, which gives a probability of 0.0571, is a good
approximation. Here also χ′ is sufficiently small to make reference to Table III
unnecessary.
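The three probabilities compared in this example can be recomputed as a check. The sketch below rests on assumptions that should be read as such: the observed table is taken to be 4, 16 / 4, 68, which is inferred from the margins (20, 72; 8, 84) and from a tail beginning at 4 rather than quoted from this text; the correction for continuity is taken in its usual form of reducing |ad − bc| by N/2; and the function names are hypothetical.

```python
from math import comb, erfc, sqrt


def normal_upper_tail(z: float) -> float:
    """P(Z >= z) for a standard normal deviate."""
    return 0.5 * erfc(z / sqrt(2.0))


def exact_upper_tail(a: int, b: int, c: int, d: int) -> float:
    """Exact P(first cell >= a) with all four marginal totals fixed."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    top = min(row1, col1)
    return sum(comb(row1, x) * comb(row2, col1 - x) for x in range(a, top + 1)) / denom


def chi(a: int, b: int, c: int, d: int, corrected: bool) -> float:
    """Square root of chi-squared for a 2 x 2 table, optionally corrected."""
    n = a + b + c + d
    deviation = abs(a * d - b * c)
    if corrected:
        deviation = max(deviation - n / 2, 0.0)  # assumed continuity correction
    return deviation * sqrt(n / ((a + b) * (c + d) * (a + c) * (b + d)))


observed = (4, 16, 4, 68)  # assumed table, see the note above
p_exact = exact_upper_tail(*observed)
p_chi = normal_upper_tail(chi(*observed, corrected=False))
p_chi_corrected = normal_upper_tail(chi(*observed, corrected=True))
print(f"exact={p_exact:.4f}  chi={p_chi:.4f}  chi'={p_chi_corrected:.4f}")
```

The upper tail is used because the observed count of 4 exceeds its expectation of 20 × 8 / 92; a table whose first cell fell below expectation would call for the lower tail instead.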
Thus it will be seen that even on the most favourable grouping
of the data association of the degree observed might have arisen by
chance about once in eight times, so that Hellman's conclusions
cannot be regarded as established.
In neither of these examples has it been necessary to make any
exact reference to Table III. This is the case with the great majority
of tests, but to illustrate the application of the table we will suppose
that the results obtained in the second case were as follows:
Breast-fed ...                         5    15    20
Breast or breast and bottle-fed ...    3    69    72
Total ...                              8    84    92
TABLE V.
Discrepancies in a 2 × 3 Contingency Table.
Summary.
1. A method of obtaining the exact probability distribution
associated with a 2 × 2 contingency table with given marginal
totals is developed.
2. It is shown that the ordinary χ² test is liable to considerable
errors when the expectations are moderately small. A simple
modification is suggested which considerably increases the accuracy
of χ². Tables are given which enable the limits of applicability
of the modified test to be determined, and serve as a means of
increasing the accuracy of the modified test.
3. The applicability of the χ² test to contingency tables involving
more than one degree of freedom is briefly discussed.
References.
1 R. A. Fisher (1922). "On the Interpretation of χ² from Contingency
Tables, and the Calculation of P." Journ. Roy. Stat. Soc., Vol. LXXXV, pp.
87-94.
2 R. A. Fisher (1926). "Bayes' Theorem and the Fourfold Table."
Eugenics Rev., 18, pp. 32-33.
3 M. Hellman (1914). "A Study of some Etiological Factors of
Malocclusion." Dental Cosmos, Vol. LVI, pp. 1017-1032.
4 K. Pearson (1900). "On the criterion that a given system of deviations
from the probable in the case of a correlated system of variables is such that it
can be reasonably supposed to have arisen from random sampling." Phil.
Mag., 1, Series 5, Vol. L, pp. 157-175.
5 G. U. Yule (1922). "On the Application of the χ² Method to Association
and Contingency Tables, with Experimental Illustrations." Journ. Roy. Stat.
Soc., Vol. LXXXV, pp. 95-104.