
The Analysis of Multiple Classifications with Unequal Numbers in the Different Classes

Author(s): F. Yates
Reviewed work(s):
Source: Journal of the American Statistical Association, Vol. 29, No. 185 (Mar., 1934), pp. 51-66
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2278459 .
Accessed: 18/01/2013 00:00


This content downloaded on Fri, 18 Jan 2013 00:00:14 AM


All use subject to JSTOR Terms and Conditions

THE ANALYSIS OF MULTIPLE CLASSIFICATIONS WITH
UNEQUAL NUMBERS IN THE DIFFERENT CLASSES

BY F. YATES, Rothamsted Experimental Station, England

A type of problem which frequently confronts the statistician is the analysis of data which can be classified simultaneously in two or more different ways, as for example, the analysis of the incidence of disease in different factories, where the factories might be classified according to type of work and also according to geographical position. The statistical procedure appropriate to the case where the numbers in the various sub-classes are equal is specially simple, and has been very fully developed in connection with replicated field trials in agriculture. The procedure is a special case of the method known as the analysis of variance, which was first introduced by R. A. Fisher.

When analyzing tables in which the numbers of the various sub-classes are unequal the procedure appropriate to equal numbers requires considerable modification. A. E. Brandt¹ in a paper in this JOURNAL has set out a method of analyzing a 2 × s table. The present paper considers the more general case of a p × q table, and suggests certain corrections which appear to be necessary in Brandt's methods. The matter has already been discussed in a recent paper,² only somewhat incidentally, in connection with rather different problems of experimental design. It is proposed here to give a rather more detailed account of the methods of analysis and the logical principles underlying them.
¹ A. E. Brandt, "The Analysis of Variance in a 2 × s Table with Disproportionate Frequencies," this JOURNAL, 1933, 28, 164.
² F. Yates, "The Principles of Orthogonality and Confounding in Replicated Experiments," J. Agric. Sci., 1933, 23, 108.

ESTIMATION AND TESTS OF SIGNIFICANCE

In the analysis of any given body of data the practical statistician is usually concerned with the two fundamentally different statistical problems of estimation and tests of significance. The object of estimation is to obtain the best value available from the data for the magnitude of an effect assumed to exist, whereas the object of tests of significance is to decide whether there is any evidence that the effect exists at all. Both these problems necessitate certain assumptions before we can proceed to their solution.

In estimation the data are assumed to be a random sample from a population of some given mathematical type, but with certain constants undetermined. For example, in measuring the height of a mountain by observing its angle of elevation from a place a known distance away various values might be obtained. The common practice is to take the mean of all the values as the best estimate of the height. Yet the mean is only the best estimate if the observed values can be regarded as a sample of a normally distributed population of values (or of certain other special types of distribution). We may regard this as a reasonable assumption, partly in view of our knowledge of the sources of the errors to which the observations are subject, partly by previous experience with similar observations, partly by examination of the observations themselves. But the assumption (at least of approximate normality) must be made, for there are other types of distribution in which the observations might conceivably be distributed in which the mean is by no means the best estimate, some in fact in which the mean is a very poor estimate indeed. This is implicitly recognized by the surveyor in that he usually reserves himself the right to reject widely discordant observations.
Tests of significance depend on a similar assumption as to the existence of an infinite population of values giving rise to the particular values observed, but here the question propounded is somewhat different, being of the form: What is the probability that an estimate from a set of values can be regarded as a member of some population of estimates, or alternatively what is the probability that two or more estimates from different sets of values can be regarded as belonging to the same population of estimates? If this probability is small it is taken as evidence that the two sets of values are measures of different quantities. In all cases the form of the population of estimates is determined, once the form of the population of values is assumed (though the available methods of mathematical analysis may not be powerful enough to define it) but the observations themselves may be utilized to determine some or all of the parameters. Consider, for example, estimates of the height of children of a certain age made by measuring the height of two groups of children, one from well-to-do and the other from poor parents. Assuming that both sets of heights are samples from the same normally distributed population of heights it is possible to use the observations themselves to evaluate the probability of obtaining two sets of heights with means differing more widely than the means of the two observed sets differ. If such a probability is low (say less than 1 in 20 or 1 in 100) the two estimates of mean height are said to be significantly different, and if adequate precautions have been taken to eliminate or equalize other disturbing factors, this may be regarded as evidence that there is in fact a real difference attributable to causes associated with parentage.
In the case of multiple classifications with unequal class numbers, estimates and tests of significance vary according as certain assumptions are or are not made. We will here confine our attention to the discussion of a double (or two-way) classification, except for a brief note on multiple two-fold classifications. The extension to more complex classifications involves no new principle.
Assuming that each value can be classified in one of p classes representing effects A and simultaneously in one of q classes representing effects B, there will be pq sub-classes, each of which may contain several, one, or no values. The numbers in each sub-class (the class numbers, $n_{11}$, etc.) may be set out in a two-way table, Table I, the marginal totals being represented by $N_{1.}$, etc., $N_{.1}$, etc., and the general total by $N$.

TABLE I
CLASS NUMBERS

          A_1      A_2      ...   A_p      Total
B_1       n_{11}   n_{21}   ...   n_{p1}   N_{.1}
B_2       n_{12}   n_{22}   ...   n_{p2}   N_{.2}
...       ...      ...      ...   ...      ...
B_q       n_{1q}   n_{2q}   ...   n_{pq}   N_{.q}
Total     N_{1.}   N_{2.}   ...   N_{p.}   N

If $y$ be taken as the general symbol for an observed value the values in the sub-class $A_1B_1$ may be represented as $y_{111}$, $y_{112}$, etc. and the sub-class mean as $\bar{y}_{11}$. The mean of all the $A_1$ sub-class means may be written as $\bar{y}_{1.}$, and the mean of all the sub-class means as $\bar{y}$. Similarly the sum of all the values in the sub-class $A_1B_1$ may be written $Sy_{11}$, the sum of all the $A_1$ sub-class sums as $SSy_{1.}$ and the grand total $SSy$.

The values $y$ will be assumed to be normally distributed with the same variance about hypothetical sub-class means.
An estimate of this variance is immediately available from the differences of the individual values from their own sub-class means, the estimates from all the sub-classes being pooled in the ordinary manner of the analysis of variance with unequal classes (outlined in the next section). The difference between any two sub-class means can then be tested by means of the t test, and the difference between any pair of means of sub-class means, weighted or unweighted, may be tested in a similar manner. It should be borne in mind that such tests will frequently be adequate to decide on the significance or otherwise of the effects under consideration, and that the more elaborate tests about to be described will not then be necessary. But whether tests of significance involving several degrees of freedom are undertaken or not, the various possible estimates of the different effects should be clearly distinguished.
The pq hypothetical sub-class means, if known, might be combined in various ways corresponding to real physical effects. In particular, in a two-way classification the differences of the means of all sets of hypothetical sub-class means having the same A may be taken as representing (or more strictly as defining) the average A effects. The average B effects may be similarly defined.

The observed sub-class means are efficient estimates of the hypothetical sub-class means, and in the absence of any further assumptions the efficient estimates of the average A effects are obtained by taking the means of the observed sub-class means for like A over all B sub-classes. In the above notation these means are $\bar{y}_{1.}$, $\bar{y}_{2.}$, etc.
It may happen, however, that the phenomena we are investigating are such that the A and B effects are additive, so that the hypothetical sub-class means are of the form

$$\mu + \alpha_r + \beta_s; \quad r = 1, 2, \ldots, p; \quad s = 1, 2, \ldots, q;$$

where $\mu$ may be called the hypothetical general mean and the $\alpha$'s and $\beta$'s the hypothetical deviations due to the treatments, these being subject to the relations

$$\alpha_1 + \alpha_2 + \ldots + \alpha_p = 0,$$
$$\beta_1 + \beta_2 + \ldots + \beta_q = 0.$$

In this case the differences of $\bar{y}_{1.}$, $\bar{y}_{2.}$, etc., though still unbiased estimates of the average A effects (which now correspond to the differences of $\alpha_1$, $\alpha_2$, etc.), are no longer efficient estimates. Efficient estimates are given by the method of maximum likelihood, which is here equivalent to the method of least squares.
In certain cases it is possible to say definitely, from the nature of the experimental material, that the additive law holds, but more frequently the additive law must be regarded as a possible descriptive law, which is essentially an approximation to some more complex law. We may, however, be prepared to assume that the departures from the additive law are negligible on the ground that they are small in comparison with the errors of the data.
Departures from the additive law are generally termed interactions. The significance or otherwise of the interactions as a whole may be tested by finding the residual variance between sub-class means after fitting constants representing additive effects and comparing with the variance within classes. This, of course, does not provide any absolute criterion as to whether the assumption of negligible interactions is justified, but it is a useful indication.

Tests of significance of the average A effects are different according as the interactions are assumed non-existent, or not. If the assumption is made the fitted constants must be tested for significance, in the ordinary manner of the method of least squares; if the assumption is not made then the method of weighted squares of means described in the next section can be used.

METHOD OF WEIGHTED SQUARES OF MEANS

The appropriate procedure for the analysis of variance of a single classification with unequal numbers in the various classes has been described by Fisher,¹ § 44. Suppose that the class totals and means, and numbers of observations in each class are given by Table II.

TABLE II

Class                     A_1        A_2        ...   A_p        All
Class total               Sy_1       Sy_2       ...   Sy_p       SSy
Class mean                \bar{y}_1  \bar{y}_2  ...   \bar{y}_p  \bar{y}
Number of observations    n_1        n_2        ...   n_p        N

The total sum of squares is $SS(y-\bar{y})^2$, corresponding to $N-1$ degrees of freedom, which can be split up into $p-1$ degrees of freedom involving comparisons between the class means, and $N-p$ degrees of freedom involving comparisons within each class only. Class $A_1$ contributes $n_1-1$ degrees of freedom to the latter, and the corresponding sum of squares is clearly $S(y_1-\bar{y}_1)^2$, the total sum of squares for the $N-p$ degrees of freedom within classes being

$$S(y_1-\bar{y}_1)^2 + S(y_2-\bar{y}_2)^2 + \ldots$$

In virtue of the identity

$$SS(y-\bar{y})^2 = S(y_1-\bar{y}_1)^2 + S(y_2-\bar{y}_2)^2 + \ldots + n_1(\bar{y}_1-\bar{y})^2 + n_2(\bar{y}_2-\bar{y})^2 + \ldots$$

the remaining portion of the sum of squares for the $p-1$ degrees of freedom can be computed from the formula

$$Q = n_1(\bar{y}_1-\bar{y})^2 + n_2(\bar{y}_2-\bar{y})^2 + \ldots = n_1\bar{y}_1^2 + n_2\bar{y}_2^2 + \ldots - N\bar{y}^2$$

which is equivalent to the last set of terms of the identity. When $n_1$, $n_2$, . . . are all equal this reduces to the ordinary form used in the analysis of variance with equal numbers in the different classes.

¹ R. A. Fisher, Statistical Methods for Research Workers, 1925, Edinburgh: Oliver and Boyd. (4th Edition, 1932.)
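The identity for a single classification can be checked numerically. The following Python sketch is only an illustration: the three classes of unequal size and the function names are hypothetical and do not come from the paper.

```python
def between_class_ss(classes):
    """Q = n_1*(mean_1 - mean)^2 + n_2*(mean_2 - mean)^2 + ..."""
    n_total = sum(len(c) for c in classes)
    grand_mean = sum(sum(c) for c in classes) / n_total
    return sum(len(c) * (sum(c) / len(c) - grand_mean) ** 2 for c in classes)


def within_class_ss(classes):
    """Sum of squared deviations of each value from its own class mean."""
    return sum(sum((y - sum(c) / len(c)) ** 2 for y in c) for c in classes)


def total_ss(classes):
    """Sum of squared deviations of every value from the general mean."""
    values = [y for c in classes for y in c]
    mean = sum(values) / len(values)
    return sum((y - mean) ** 2 for y in values)


# Hypothetical data: classes with 2, 3 and 4 observations.
classes = [[4.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0, 6.0]]

q = between_class_ss(classes)       # p - 1 = 2 degrees of freedom
within = within_class_ss(classes)   # N - p = 6 degrees of freedom
total = total_ss(classes)           # N - 1 = 8 degrees of freedom
print(q, within, total)
```

For these made-up values the between-class and within-class parts add up to the total, as the identity requires.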
The expression $Q$, when divided by the number of degrees of freedom, $p-1$, provides an efficient estimate from the differences of the class means of the variance of the individual observations. It will hold equally when $\bar{y}_1$, $\bar{y}_2$, . . . are any numbers distributed about some (unknown) mean with variances $\sigma^2/n_1$, $\sigma^2/n_2$, . . ., $\sigma^2$ being also unknown but $n_1$, $n_2$, . . . known. We are thus led to the more general case where an estimate of a variance is required from differences of a set of numbers whose variances are known fractions of that variance. From the above expression for $Q$ this is seen to be the weighted sum of the squares of the deviations from the weighted mean of the numbers, divided by $p-1$, the weights being equal to the reciprocals of the known fractions. If the numbers are $u_1$, $u_2$, . . . and their variances $1/w_1$, $1/w_2$, . . . of the general variance $\sigma^2$, the estimate $s^2$ of $\sigma^2$ is given by

$$Q = (p-1)s^2 = w_1(u_1-\bar{u})^2 + w_2(u_2-\bar{u})^2 + \ldots = w_1u_1^2 + w_2u_2^2 + \ldots - (w_1+w_2+\ldots)\bar{u}^2 \qquad (A)$$

where

$$\bar{u} = \frac{w_1u_1 + w_2u_2 + \ldots}{w_1 + w_2 + \ldots}$$
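Equation (A) translates directly into a few lines of code. The sketch below is a minimal Python illustration; the function name and the example numbers are hypothetical. The example uses class numbers as weights, so that (A) reduces to the single-classification formula for Q.

```python
def weighted_squares_of_means(u, w):
    """Return (weighted mean, Q, s2) following equation (A).

    u: the numbers u_1, u_2, ...
    w: their weights w_1, w_2, ... (reciprocals of the known fractions).
    """
    total_weight = sum(w)
    u_bar = sum(wi * ui for ui, wi in zip(u, w)) / total_weight
    q = sum(wi * (ui - u_bar) ** 2 for ui, wi in zip(u, w))
    s2 = q / (len(u) - 1)  # estimate of the variance on p - 1 degrees of freedom
    return u_bar, q, s2


# Hypothetical example: three class means based on 2, 3 and 4 observations.
u = [5.0, 8.0, 3.0]
w = [2, 3, 4]
u_bar, q, s2 = weighted_squares_of_means(u, w)

# Second form of (A): sum of w*u^2 minus (sum of w) times the squared mean.
q_alt = sum(wi * ui ** 2 for ui, wi in zip(u, w)) - sum(w) * u_bar ** 2
print(u_bar, q, q_alt, s2)
```

The two forms of Q agree; the second is the one suited to hand computation, which is why the paper later recommends a working mean to limit rounding error.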

This result is immediately applicable to the analysis of multiple classifications. The variances of the marginal means of the sub-class means are known fractions of the variance of a single observed value, the variance of $\bar{y}_{1.}$ (Table I) being $\frac{1}{q^2}\left(\frac{1}{n_{11}} + \frac{1}{n_{12}} + \ldots\right)\sigma^2$, etc. Hence the efficient estimate of the variance from the A means of the sub-class means is given by the substitutions

$$\frac{1}{w_1} = \frac{1}{q^2}\left(\frac{1}{n_{11}} + \frac{1}{n_{12}} + \ldots\right), \text{ etc.,}$$
$$u_1 = \bar{y}_{1.}, \text{ etc.,}$$

in the above formula for $Q$, equation (A).
This estimate of the variance may be compared with the estimate of variance from the variation within sub-classes by means of the z test. The latter estimate is obtained by the methods described above appropriate to a single classification, regarding each sub-class as an independent class. It will be based on $N - pq$ degrees of freedom.

The estimate of variance from the A means of the sub-class means is based on $p-1$ degrees of freedom which are independent of (orthogonal to) the $N-pq$ degrees of freedom within classes, but they will be the appropriate set of degrees of freedom for testing the A effects if, and only if, they are composed of a set of single degrees of freedom which between them contain all the information on the A effects.

This will be the case unless the interactions are known not to exist (or we are prepared to assume their non-existence). In the latter event the method of weighted squares of means can only be regarded as an approximation to the rigorous method of least squares. The approximation will become less close as the class numbers become more unequal.
The method of weighted squares of means also has additional interest in a 2 × s table, in that it furnishes a method for computing the sum of squares appropriate to the efficient test of interactions. These sums of squares can be further used to construct the sums of squares required to test the main effects on the assumption of negligible interactions, or as a check on them.
We will now apply the method to the data given by Brandt in the paper already referred to. These data are reproduced in Table III (with the correction of one class number).

TABLE III

                       Female                  Male                   Total
Breed            Number   Log of per     Number   Log of per     Number   Log of per
                          cent bacon              cent bacon              cent bacon
Hampshire          33       66.55          89      181.04         122      247.59
Duroc Jersey       51       98.69         141      281.43         192      380.12
Tamworth           13       25.90          17       34.20          30       60.10
Yorkshire           4        7.62           9       17.58          13       25.20
Berkshire           8       14.64           4        8.20          12       22.84
Poland China       15       28.11          32       64.42          47       92.53
Chester White      35       66.90          47       90.52          82      157.42
All others         12       23.32          23       46.70          35       70.02
Total             171      331.73         362      724.09         533    1,055.82

The table of sub-class means is given in Table IV. They are given to six decimal places in order that the sum of squares between classes may be computed by the products of means and totals (using a working mean of 1.9).

The relative weights of the means of the sub-class means are obtained from the reciprocals of the sub-class numbers, Table V. That for Hampshire, for instance, is the reciprocal of $\frac{1}{4}\left(\frac{1}{33} + \frac{1}{89}\right)$ or 96.29. These weights are given in the last column of Table V.
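The weights can be recomputed from the class numbers of Table III. The following Python sketch is illustrative only; the dictionary and function names are hypothetical, and small discrepancies from the printed values are to be expected because the paper works from reciprocals rounded to five decimal places.

```python
# Class numbers from Table III: breed -> (females, males).
CLASS_NUMBERS = {
    "Hampshire": (33, 89),
    "Duroc Jersey": (51, 141),
    "Tamworth": (13, 17),
    "Yorkshire": (4, 9),
    "Berkshire": (8, 4),
    "Poland China": (15, 32),
    "Chester White": (35, 47),
    "All others": (12, 23),
}


def marginal_weight(numbers):
    """Weight of the unweighted mean of q sub-class means.

    The variance of that mean is (1/q^2) * (1/n_1 + 1/n_2 + ...) * sigma^2,
    so the weight is the reciprocal of the bracketed fraction.
    """
    q = len(numbers)
    return q * q / sum(1.0 / n for n in numbers)


weights = {breed: marginal_weight(ns) for breed, ns in CLASS_NUMBERS.items()}
total_weight = sum(weights.values())

for breed, weight in weights.items():
    print(f"{breed:14s} {weight:8.2f}")
print(f"{'Total':14s} {total_weight:8.2f}")
```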

TABLE IV

Breed               Female      Male        Unweighted mean   Difference
Hampshire           2.016667    2.034158    2.025412          0.017491
Duroc Jersey        1.935099    1.995958    1.965528          0.060859
Tamworth            1.992307    2.011765    2.002036          0.019458
Yorkshire           1.905000    1.953333    1.929167          0.048333
Berkshire           1.830000    2.050000    1.940000          0.220000
Poland China        1.874000    2.013125    1.943562          0.139125
Chester White       1.911429    1.925958    1.918694          0.014529
All others          1.943333    2.030434    1.986884          0.087101
Unweighted mean     1.925979    2.001841    1.963910          0.075862

TABLE V
RECIPROCALS OF SUB-CLASS NUMBERS

Breed               Female     Male       Total      Weight
Hampshire           0.03030    0.01124    0.04154     96.29
Duroc Jersey        0.01961    0.00709    0.02670    149.81
Tamworth            0.07692    0.05882    0.13574     29.47
Yorkshire           0.25000    0.11111    0.36111     11.08
Berkshire           0.12500    0.25000    0.37500     10.67
Poland China        0.06667    0.03125    0.09792     40.85
Chester White       0.02857    0.02128    0.04985     80.24
All others          0.08333    0.04348    0.12681     31.54
Total               0.68040    0.53427    1.21467    449.95

The primary analysis of variance within and between classes is given in Table VI. (The total sum of squares is taken from Brandt's paper.)

TABLE VI

                   Degrees of freedom   Sums of squares   Mean square   z
Between classes           15                1.2715           0.0848     0.659
Within classes           517               11.7427           0.0227
Total                    532               13.0142
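The z column of Table VI is Fisher's z, half the natural logarithm of the ratio of the two mean squares. A minimal Python sketch, with a hypothetical function name, recomputes it from the sums of squares and degrees of freedom printed in the table.

```python
import math


def fisher_z(ss_num, df_num, ss_den, df_den):
    """Fisher's z: half the natural log of the ratio of two mean squares."""
    ms_num = ss_num / df_num
    ms_den = ss_den / df_den
    return 0.5 * math.log(ms_num / ms_den)


# Between classes (15 d.f.) against within classes (517 d.f.), Table VI.
z = fisher_z(1.2715, 15, 11.7427, 517)
variance_ratio = math.exp(2 * z)  # the same comparison as a variance ratio
print(round(z, 3), round(variance_ratio, 2))
```

Since z and the variance ratio are monotonically related, either can be referred to the appropriate table; the paper uses tables of z.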

The effects of classification are clearly significant. On examination of Table IV it is apparent that sex has produced an effect, the males of all eight breeds giving higher values than the females. It is also apparent, without further analysis, that there are significant differences between the breeds. The estimate of the standard error of the Hampshire mean, 2.0254, is $\sqrt{0.0227/96.29}$ or 0.0154, and that of the Chester White mean, 1.9187, is 0.0168. The difference of the means, 0.1067, is more than 4½ times its standard error $\sqrt{0.0154^2+0.0168^2}$ or 0.0228. The significance is undoubted, even allowing for the fact that we have selected the greatest difference. We will, however, work out the sum of squares and mean square attributable to the whole 7 degrees of freedom by the method of weighted squares of means, both as an illustration of procedure, and in order later to make a comparison with the values obtained when interactions are assumed non-existent.
The sum of squares attributable to the whole 7 degrees of freedom is computed as follows. The sum of the weights is 449.95, and the weighted mean of the means is therefore

$$\bar{u} = (2.025412 \times 96.29 + 1.965528 \times 149.81 + \ldots)/449.95 = 1.970385.$$

Hence the appropriate sum of squares, using the working mean of 1.9, is given by

$$Q = 0.125412^2 \times 96.29 + 0.065528^2 \times 149.81 + \ldots - 0.070385^2 \times 449.95 = 0.6056.$$

It is important, when squaring means instead of totals, to employ a working mean somewhere near the true mean, as otherwise small inaccuracies in the means due to rounding off will seriously affect the result.

The corresponding mean square is 0.0865 (7 degrees of freedom) which when compared with the mean square within classes is quite clearly significant.
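The breed sum of squares can be reproduced from Table III. This Python sketch is a hedged illustration with hypothetical variable names. It computes the weighted sum of squared deviations directly rather than through a working mean, since floating-point arithmetic makes that device unnecessary here.

```python
# Table III: (breed, n_female, total_female, n_male, total_male).
PIGS = [
    ("Hampshire", 33, 66.55, 89, 181.04),
    ("Duroc Jersey", 51, 98.69, 141, 281.43),
    ("Tamworth", 13, 25.90, 17, 34.20),
    ("Yorkshire", 4, 7.62, 9, 17.58),
    ("Berkshire", 8, 14.64, 4, 8.20),
    ("Poland China", 15, 28.11, 32, 64.42),
    ("Chester White", 35, 66.90, 47, 90.52),
    ("All others", 12, 23.32, 23, 46.70),
]

# Unweighted mean of the female and male sub-class means for each breed.
u = [(tf / nf + tm / nm) / 2 for _, nf, tf, nm, tm in PIGS]
# Weight of each such mean: reciprocal of (1/4) * (1/n_female + 1/n_male).
w = [4 / (1 / nf + 1 / nm) for _, nf, tf, nm, tm in PIGS]

u_bar = sum(wi * ui for ui, wi in zip(u, w)) / sum(w)
q = sum(wi * (ui - u_bar) ** 2 for ui, wi in zip(u, w))
mean_square = q / (len(PIGS) - 1)  # 7 degrees of freedom

print(round(u_bar, 6), round(q, 4), round(mean_square, 4))
```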
The direct effect of sex is measured by the difference 0.075862, and the significance of the difference can be tested immediately by means of the t test. Referring to the table of reciprocals of the sub-class numbers, the variance of the mean of the female sub-class means is seen to be $0.68040\,\sigma^2/8^2$, and the corresponding variance for the males is $0.53427\,\sigma^2/8^2$. The estimate of the standard error of the difference of these means is therefore $\frac{1}{8}\sqrt{1.21467 \times 0.0227}$ or 0.0207, so that t is 3.66 (517 degrees of freedom) and the difference is clearly significant. This test can, if desired, be thrown in analysis of variance form, the relevant mean square being the square of the difference divided by the ratio of its variance to $\sigma^2$, i.e. $0.075862^2 \times 8^2/1.21467$, or 0.3032.
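The t test for sex can be sketched in Python as follows. The within-class mean square 0.0227 is taken from Table VI as a given constant, because the individual observations are not reproduced in the paper; the variable names are hypothetical.

```python
import math

# Table III: (breed, n_female, total_female, n_male, total_male).
PIGS = [
    ("Hampshire", 33, 66.55, 89, 181.04),
    ("Duroc Jersey", 51, 98.69, 141, 281.43),
    ("Tamworth", 13, 25.90, 17, 34.20),
    ("Yorkshire", 4, 7.62, 9, 17.58),
    ("Berkshire", 8, 14.64, 4, 8.20),
    ("Poland China", 15, 28.11, 32, 64.42),
    ("Chester White", 35, 66.90, 47, 90.52),
    ("All others", 12, 23.32, 23, 46.70),
]
WITHIN_MS = 0.0227  # mean square within classes, copied from Table VI

s = len(PIGS)
female_mean = sum(tf / nf for _, nf, tf, nm, tm in PIGS) / s
male_mean = sum(tm / nm for _, nf, tf, nm, tm in PIGS) / s
diff = male_mean - female_mean

# Variance of the difference as a fraction of sigma^2:
# (sum of 1/n_female + sum of 1/n_male) / s^2.
var_ratio = sum(1 / nf + 1 / nm for _, nf, tf, nm, tm in PIGS) / s ** 2

std_error = math.sqrt(var_ratio * WITHIN_MS)
t = diff / std_error
ms_sex = diff ** 2 / var_ratio  # the same test in analysis of variance form

print(round(diff, 6), round(std_error, 4), round(t, 2), round(ms_sex, 4))
```

Dividing `ms_sex` by the within-class mean square gives the square of t, which is why the two forms of the test are equivalent.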
It remains to test whether there is any interaction. In the absence of any interaction the variation between the differences of the male and female sub-class means given in Table IV will be wholly ascribable to error, and in any case the differences of these differences provide efficient estimates of the interactions when they exist. The method of weighted squares of means may therefore be applied to these differences, and will provide an efficient test of the significance of the interactions. The computation proceeds as before, with the exception that the weights are one-quarter of those shown in Table V. The value obtained for the weighted mean of the differences is 0.053014, and for the sum of squares (7 degrees of freedom) is 0.2299, giving a z of 0.185, whereas the 5 per cent point is about 0.35. The interactions, therefore, are not significant.
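The interaction test applies the same weighted computation to the male-minus-female differences. In this Python sketch the names are hypothetical and the within-class mean square is again copied from Table VI. The weight of each difference is the reciprocal of (1/n_female + 1/n_male), which is one-quarter of the Table V weight.

```python
import math

# Table III: (breed, n_female, total_female, n_male, total_male).
PIGS = [
    ("Hampshire", 33, 66.55, 89, 181.04),
    ("Duroc Jersey", 51, 98.69, 141, 281.43),
    ("Tamworth", 13, 25.90, 17, 34.20),
    ("Yorkshire", 4, 7.62, 9, 17.58),
    ("Berkshire", 8, 14.64, 4, 8.20),
    ("Poland China", 15, 28.11, 32, 64.42),
    ("Chester White", 35, 66.90, 47, 90.52),
    ("All others", 12, 23.32, 23, 46.70),
]
WITHIN_MS = 0.0227  # mean square within classes, copied from Table VI

# Difference of sub-class means (male minus female) for each breed.
d = [tm / nm - tf / nf for _, nf, tf, nm, tm in PIGS]
# Variance of each difference is (1/n_female + 1/n_male) * sigma^2.
w = [1 / (1 / nf + 1 / nm) for _, nf, tf, nm, tm in PIGS]

d_bar = sum(wi * di for di, wi in zip(d, w)) / sum(w)
ss_interaction = sum(wi * (di - d_bar) ** 2 for di, wi in zip(d, w))
ms_interaction = ss_interaction / (len(PIGS) - 1)  # 7 degrees of freedom
z = 0.5 * math.log(ms_interaction / WITHIN_MS)

print(round(d_bar, 6), round(ss_interaction, 4), round(z, 3))
```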
If the t test were applied to the difference between the greatest and least of the differences, the difference would be judged significant. Such a test is not valid because the greatest and least values have been chosen. The method of weighted squares of means takes into account the variation of all the differences instead of only a pair of them.
TABLE VII
ANALYSIS OF VARIANCE-INTERACTIONS NOT ASSUMED NON-EXISTENT

                Degrees of freedom   Sums of squares   Mean squares
Sex                     1                0.3032           0.3032
Breed                   7                0.6056           0.0865
Interactions            7                0.2299           0.0328

The results are given in Table VII. It should be noted that the sums of squares obtained for breed, sex, and interactions do not total to the sum of squares for the 15 degrees of freedom between sub-classes. The additive property only holds when the degrees of freedom appropriate to the testing of the various effects are mutually independent (orthogonal).

METHOD OF FITTING CONSTANTS

It was pointed out in the second section that the assumption that interactions are non-existent is equivalent to the assumption that the hypothetical sub-class means are additive functions of the constants $\mu$, $\alpha_r$, $\beta_s$. This is the assumption which Brandt implicitly makes in his discussion of the analysis of a 2 × s table, for he uses the fact that the marginal totals of a table constructed from the estimates of the constants must be identical with the actual marginal means.

In the general p × q table the ordinary procedure of the method of least squares gives the following equations for $m$, $a_r$, $b_s$, the estimates of $\mu$, $\alpha_r$, $\beta_s$.
Leading term
$m$      $Nm + N_{1.}a_1 + N_{2.}a_2 + \ldots + N_{.1}b_1 + N_{.2}b_2 + \ldots = SSy$
$a_1$    $N_{1.}m + N_{1.}a_1 + n_{11}b_1 + n_{12}b_2 + \ldots = SSy_{1.}$
$a_2$    $N_{2.}m + N_{2.}a_2 + n_{21}b_1 + n_{22}b_2 + \ldots = SSy_{2.}$
$b_1$    $N_{.1}m + n_{11}a_1 + n_{21}a_2 + \ldots + N_{.1}b_1 = SSy_{.1}$
$b_2$    $N_{.2}m + n_{12}a_1 + n_{22}a_2 + \ldots + N_{.2}b_2 = SSy_{.2}$

The rule of formation is as follows. Write, or imagine written, N equations for the N observed values of the form

$$y_{rst} = \mu + \alpha_r + \beta_s + x_{rst} \qquad (B)$$

where $x_{rst}$ is an error term. To form the equation in which $m$ is the leading term, sum the squares of the coefficients of $\mu$ in these N equations and the products of these coefficients with the corresponding coefficients of $\alpha_1$, $\alpha_2$, . . . $\beta_1$, $\beta_2$, . . . in turn. In this case the coefficient for $\mu$ is always unity so the first sum is $N$, the coefficient of $\alpha_1$ is unity in the $N_{1.}$ equations for the observations in all $A_1$ classes, and zero elsewhere, so that the sum of the products is $N_{1.}$, and so on. These sums give the coefficients of $m$, $a_1$, $a_2$, . . . $b_1$, $b_2$ . . . in the first equation. The numerical term is given by the sum of the products of each observed value with the coefficient of $\mu$ in the corresponding equation (B), and thus for the first equation by the sum of all the values. The other equations are formed in the same manner. The whole set of equations gives the values of $\mu$, the $\alpha$'s and the $\beta$'s which make the sum of the squares of the error terms $x_{rst}$ a minimum.

Only $p+q-1$ of the above $p+q+1$ equations are independent, the equation with leading term $m$ being the same as the sums of all the equations with an $a$ as leading term, and of all the equations with a $b$ as leading term.
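The rule of formation can be illustrated by building the coefficients directly from the class numbers. The Python sketch below uses a hypothetical 3 × 2 table (the numbers and totals are invented for illustration) and checks the stated dependence: the m equation equals both the sum of the a equations and the sum of the b equations.

```python
def normal_equations(n, totals):
    """Build the least-squares equations for unknowns [m, a_1..a_p, b_1..b_q].

    n[r][s] is the class number and totals[r][s] the sum of the values
    in sub-class A_(r+1) B_(s+1).
    """
    p, q = len(n), len(n[0])
    row_n = [sum(n[r]) for r in range(p)]                       # N_r.
    col_n = [sum(n[r][s] for r in range(p)) for s in range(q)]  # N_.s
    matrix = [[sum(row_n)] + row_n + col_n]                     # m equation
    rhs = [sum(sum(t) for t in totals)]
    for r in range(p):                                          # a equations
        diag = [row_n[r] if k == r else 0 for k in range(p)]
        matrix.append([row_n[r]] + diag + list(n[r]))
        rhs.append(sum(totals[r]))
    for s in range(q):                                          # b equations
        diag = [col_n[s] if k == s else 0 for k in range(q)]
        matrix.append([col_n[s]] + [n[r][s] for r in range(p)] + diag)
        rhs.append(sum(totals[r][s] for r in range(p)))
    return matrix, rhs


# Hypothetical class numbers and sub-class totals (p = 3, q = 2).
n = [[2, 3], [4, 1], [1, 5]]
totals = [[10.0, 18.0], [22.0, 4.0], [6.0, 31.0]]
p, q = 3, 2

matrix, rhs = normal_equations(n, totals)
width = 1 + p + q
sum_of_a_rows = [sum(matrix[1 + r][j] for r in range(p)) for j in range(width)]
sum_of_b_rows = [sum(matrix[1 + p + s][j] for s in range(q)) for j in range(width)]
print(matrix[0], sum_of_a_rows, sum_of_b_rows)
```

With two linear dependencies among the p + q + 1 equations, only p + q − 1 are independent, which is why the side conditions on the a's and b's are needed to obtain a unique solution.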
The solution is best effected by eliminating the b's or the a's (whichever are the more numerous). From the $b_1$ equation

$$N_{.1}b_1 = SSy_{.1} - N_{.1}m - n_{11}a_1 - n_{21}a_2 - \ldots \qquad (C)$$

Using this and the corresponding values for the other b's we obtain on substitution in the a equations and some simplification

$$\left(N_{1.} - \frac{n_{11}^2}{N_{.1}} - \frac{n_{12}^2}{N_{.2}} - \frac{n_{13}^2}{N_{.3}} - \ldots\right)a_1 - \left(\frac{n_{11}n_{21}}{N_{.1}} + \frac{n_{12}n_{22}}{N_{.2}} + \frac{n_{13}n_{23}}{N_{.3}} + \ldots\right)a_2 - \ldots = SSy_{1.} - \frac{n_{11}}{N_{.1}}SSy_{.1} - \frac{n_{12}}{N_{.2}}SSy_{.2} - \frac{n_{13}}{N_{.3}}SSy_{.3} - \ldots \qquad (D)$$

The a equations are now in the most convenient form for solution, $a_p$ being eliminated by means of the identity $a_1 + a_2 + \ldots + a_p = 0$. Only $p-1$ of the equations (D) are independent, the pth being derivable from the sum of all the others. This provides a check. The values of $m+b_1$, $m+b_2$, . . . can then be derived from (C) by substitution and finally $m$ determined from the identity $b_1 + b_2 + \ldots + b_q = 0$.
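The elimination described above can be sketched in Python for a general p × q table. This is an illustration rather than the author's computing scheme: the function names are hypothetical, and a small Gaussian elimination routine stands in for the hand solution of equations (D). It is applied to the pig data of Table III.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [list(row) + [val] for row, val in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, size):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, size + 1):
                aug[i][j] -= factor * aug[col][j]
    x = [0.0] * size
    for i in reversed(range(size)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, size))
        x[i] = (aug[i][size] - tail) / aug[i][i]
    return x


def fit_constants(n, totals):
    """Return (m, a, b) with sum(a) = 0 and sum(b) = 0."""
    p, q = len(n), len(n[0])
    row_n = [sum(n[r]) for r in range(p)]
    col_n = [sum(n[r][s] for r in range(p)) for s in range(q)]
    row_t = [sum(totals[r]) for r in range(p)]
    col_t = [sum(totals[r][s] for r in range(p)) for s in range(q)]
    # Equations (D): the b's have been eliminated by means of (C).
    coef = [[(row_n[r] if r == k else 0.0)
             - sum(n[r][s] * n[k][s] / col_n[s] for s in range(q))
             for k in range(p)] for r in range(p)]
    right = [row_t[r] - sum(n[r][s] * col_t[s] / col_n[s] for s in range(q))
             for r in range(p)]
    # Eliminate a_p using a_1 + ... + a_p = 0; keep the first p - 1 equations.
    reduced = [[coef[r][k] - coef[r][p - 1] for k in range(p - 1)]
               for r in range(p - 1)]
    a = solve(reduced, right[:p - 1])
    a.append(-sum(a))
    # Equation (C) gives m + b_s; m then follows from b_1 + ... + b_q = 0.
    m_plus_b = [(col_t[s] - sum(n[r][s] * a[r] for r in range(p))) / col_n[s]
                for s in range(q)]
    m = sum(m_plus_b) / q
    b = [value - m for value in m_plus_b]
    return m, a, b


# Table III: class numbers and totals, columns (female, male).
n = [[33, 89], [51, 141], [13, 17], [4, 9], [8, 4], [15, 32], [35, 47], [12, 23]]
totals = [[66.55, 181.04], [98.69, 281.43], [25.90, 34.20], [7.62, 17.58],
          [14.64, 8.20], [28.11, 64.42], [66.90, 90.52], [23.32, 46.70]]

m, a, b = fit_constants(n, totals)
print(round(m + a[0], 6), round(b[1] - b[0], 6))
```

The printed quantities are m + a_1 for Hampshire and the fitted male-minus-female difference.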
The formation of the a equations is facilitated by the construction of a table of the quantities $n_{11}/N_{.1}$, $n_{12}/N_{.2}$, etc. The numerical factors of the a terms can then be obtained as the straight sums of products of numbers in this table and the table of sub-class numbers.
After the values of the a's, b's and m have been obtained the reduction in the sum of squares due to fitting the constants can be calculated as the sum of the products of each constant with the numerical term of the equation of which it is the leading term. This includes the ordinary correction for the mean. Symbolically

$$SSy^2 - SS(y-Y)^2 = m\,SSy + a_1\,SSy_{1.} + \ldots + b_1\,SSy_{.1} + \ldots$$

Deducting this, less the correction for the mean, from the sum of squares between classes, we obtain the residual sum of squares between classes corresponding to $(p-1)(q-1)$ degrees of freedom, which is the appropriate sum of squares for testing the significance of the interactions.
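If there are only two B classes and $\frac{1}{2}b$ is substituted for $b_2$ (or $-b_1$), $b$ will be the efficient estimate of the difference of the two B effects. Rewriting equation (D) in terms of $b_1$ and $b_2$ we obtain after a little simplification

$$\left(\frac{n_{11}n_{12}}{n_{11}+n_{12}} + \frac{n_{21}n_{22}}{n_{21}+n_{22}} + \ldots\right)b = \frac{n_{11}n_{12}}{n_{11}+n_{12}}(\bar{y}_{12}-\bar{y}_{11}) + \frac{n_{21}n_{22}}{n_{21}+n_{22}}(\bar{y}_{22}-\bar{y}_{21}) + \ldots$$

In other words $b$ is simply the weighted mean of the differences of the pairs of A sub-class means, and is identical with Brandt's $b$. The above equation corresponds to his equation (6).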
The a's and m are given by the equations

$$m + a_1 = \frac{1}{n_{11}+n_{12}}\left[SSy_{1.} + \tfrac{1}{2}(n_{11}-n_{12})\,b\right],$$

etc. These equations give the efficient estimates of the A effects. They can be deduced from Brandt's equations (1) to (4) by the substitution of $m + a_1 - \frac{1}{2}b$ for x, etc.
In the example on pigs already analyzed the values of the a's and b were found to be

m + a_1 = 2.017259        m + a_5 = 1.912169
m + a_2 = 1.967367        m + a_6 = 1.959136
m + a_3 = 1.999799        m + a_7 = 1.915877
m + a_4 = 1.928267        m + a_8 = 1.992241
b = +0.053014

The discrepancy with Brandt's value for b appears to be due to a computational error. It should be noticed that b has already been determined in computing the interaction sum of squares in the analysis by weighted squares of means.
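For a 2 × s table the closed-form expressions for b and m + a_r can be evaluated directly. The following Python sketch, with hypothetical variable names, applies them to Table III, taking female as the first B class and male as the second so that b is the male-minus-female difference.

```python
# Table III: (breed, n_female, total_female, n_male, total_male).
PIGS = [
    ("Hampshire", 33, 66.55, 89, 181.04),
    ("Duroc Jersey", 51, 98.69, 141, 281.43),
    ("Tamworth", 13, 25.90, 17, 34.20),
    ("Yorkshire", 4, 7.62, 9, 17.58),
    ("Berkshire", 8, 14.64, 4, 8.20),
    ("Poland China", 15, 28.11, 32, 64.42),
    ("Chester White", 35, 66.90, 47, 90.52),
    ("All others", 12, 23.32, 23, 46.70),
]

# Weights n_r1 * n_r2 / (n_r1 + n_r2) and differences of sub-class means.
weights = [nf * nm / (nf + nm) for _, nf, tf, nm, tm in PIGS]
diffs = [tm / nm - tf / nf for _, nf, tf, nm, tm in PIGS]

# b is the weighted mean of the differences.
b = sum(w * d for w, d in zip(weights, diffs)) / sum(weights)

# m + a_r = [SSy_r. + (1/2) * (n_r1 - n_r2) * b] / (n_r1 + n_r2).
m_plus_a = [(tf + tm + 0.5 * (nf - nm) * b) / (nf + nm)
            for _, nf, tf, nm, tm in PIGS]

for (breed, *_), value in zip(PIGS, m_plus_a):
    print(f"{breed:14s} {value:.6f}")
print(f"b = {b:+.6f}")
```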
The reduction in the sum of squares due to the fitting of the constants, excluding the mean, is

$$2.017259 \times 247.59 + 1.967367 \times 380.12 + \ldots + \tfrac{1}{2} \times 0.053014\,(724.09 - 331.73) - (1055.82)^2/533.$$

In order to avoid the necessity of using an excessive number of decimal places it is better to employ a working mean of 1.9, as before, when the reduction in the sum of squares becomes

$$0.117259 \times 15.79 + 0.067367 \times 15.32 + \ldots + \tfrac{1}{2} \times 0.053014\,(36.29 - 6.83) - (43.12)^2/533 = 1.04146.$$
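The reduction can be recomputed as a check. This Python sketch uses hypothetical names and works without a working mean, since double-precision arithmetic carries enough digits; the working mean matters for hand computation with rounded values.

```python
# Table III: (breed, n_female, total_female, n_male, total_male).
PIGS = [
    ("Hampshire", 33, 66.55, 89, 181.04),
    ("Duroc Jersey", 51, 98.69, 141, 281.43),
    ("Tamworth", 13, 25.90, 17, 34.20),
    ("Yorkshire", 4, 7.62, 9, 17.58),
    ("Berkshire", 8, 14.64, 4, 8.20),
    ("Poland China", 15, 28.11, 32, 64.42),
    ("Chester White", 35, 66.90, 47, 90.52),
    ("All others", 12, 23.32, 23, 46.70),
]

# Fitted constants for the 2 x s case (weighted mean of differences, etc.).
weights = [nf * nm / (nf + nm) for _, nf, tf, nm, tm in PIGS]
diffs = [tm / nm - tf / nf for _, nf, tf, nm, tm in PIGS]
b = sum(w * d for w, d in zip(weights, diffs)) / sum(weights)
m_plus_a = [(tf + tm + 0.5 * (nf - nm) * b) / (nf + nm)
            for _, nf, tf, nm, tm in PIGS]

breed_totals = [tf + tm for _, nf, tf, nm, tm in PIGS]
female_total = sum(tf for _, nf, tf, nm, tm in PIGS)
male_total = sum(tm for _, nf, tf, nm, tm in PIGS)
grand_total = female_total + male_total
n_total = sum(nf + nm for _, nf, tf, nm, tm in PIGS)

# Each constant times the numerical term of its own equation,
# less the correction for the mean.
reduction = (sum(c * t for c, t in zip(m_plus_a, breed_totals))
             + 0.5 * b * (male_total - female_total)
             - grand_total ** 2 / n_total)
print(round(reduction, 5))
```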
It remains to determine the appropriate test of significance for the sex and treatment effects separately. In these tests we differ from Brandt, who deduces formulae from those applicable to equal class numbers. These formulae do not appear to be correct, except that for the interaction sum of squares. The disagreement of his value for the interaction sum of squares in the numerical example with the one given below appears to be due to a misinterpretation of the formula. The computation is also at fault in that the additive property of the various sums of squares is assumed; this does not hold.

The general rule applicable to any groups of fitted constants is to find the part of the sum of squares accounted for by fitting all the constants and deduct from it the part of the sum of squares accounted for by fitting all the constants except those to be tested. In the case of a double classification there is no difficulty in computing the latter sum of squares, even when each classification contains more than two classes. The sum of squares removed by the b constants alone, for instance, is the sum of squares obtained from the B marginal totals, $SSy_{.1}$, etc., it being remembered that these totals include unequal numbers of observations.
The results in the case of the example already analyzed are given in Table VIII. Using Table III, the sum of squares for breed alone is given by

$$\frac{1}{122}(247.59)^2 + \frac{1}{192}(380.12)^2 + \ldots - \frac{1}{533}(1055.82)^2.$$

The sum of squares for sex alone can be computed in the same manner, or by the simpler expression

$$\frac{(171 \times 724.09 - 362 \times 331.73)^2}{171 \times 362 \times 533}$$
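Both marginal sums of squares follow from the totals of Table III. In the Python sketch below (hypothetical names), the sex sum of squares is computed in the general way and by the simpler expression, and the two are compared.

```python
# Table III: (breed, n_female, total_female, n_male, total_male).
PIGS = [
    ("Hampshire", 33, 66.55, 89, 181.04),
    ("Duroc Jersey", 51, 98.69, 141, 281.43),
    ("Tamworth", 13, 25.90, 17, 34.20),
    ("Yorkshire", 4, 7.62, 9, 17.58),
    ("Berkshire", 8, 14.64, 4, 8.20),
    ("Poland China", 15, 28.11, 32, 64.42),
    ("Chester White", 35, 66.90, 47, 90.52),
    ("All others", 12, 23.32, 23, 46.70),
]

n_female = sum(nf for _, nf, tf, nm, tm in PIGS)
n_male = sum(nm for _, nf, tf, nm, tm in PIGS)
t_female = sum(tf for _, nf, tf, nm, tm in PIGS)
t_male = sum(tm for _, nf, tf, nm, tm in PIGS)
n_total = n_female + n_male
correction = (t_female + t_male) ** 2 / n_total  # correction for the mean

# Breed alone: squared marginal totals over their (unequal) class numbers.
ss_breed_alone = sum((tf + tm) ** 2 / (nf + nm)
                     for _, nf, tf, nm, tm in PIGS) - correction

# Sex alone, computed in the same manner ...
ss_sex_alone = t_female ** 2 / n_female + t_male ** 2 / n_male - correction
# ... and by the simpler expression for two classes.
ss_sex_short = ((n_female * t_male - n_male * t_female) ** 2
                / (n_female * n_male * n_total))

print(round(ss_breed_alone, 4), round(ss_sex_alone, 4), round(ss_sex_short, 4))
```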
TABLE VIII

                                  Degrees of freedom   Sums of squares   Mean square
Test for breed:
  Sex alone                               1                0.4224
  Breed                                   7                0.6191           0.0885
  Sex and breed (constants)               8                1.0415
Test for sex:
  Breed alone                             7                0.7253
  Sex                                     1                0.3162           0.3162
  Sex and breed (constants)               8                1.0415
Interactions                              7                0.2300           0.0329
Between classes                          15                1.2715

Comparing this with Table VII it will be seen that there is little material difference between the mean squares obtained for breed and sex by the two methods. The interaction mean square is the same by both methods, as indeed it should be, being based on the same assumptions. The two methods of computation provide a valuable check. The sum of squares due to sex can also be obtained directly. It is the square of b multiplied by its relative weight, 112.49.
PROPORTIONATE CLASS NUMBERS

If the class numbers may be represented as the product of two numbers,
one, $h$, depending on the A classification and the other, $k$, on the B
classification, so that $n_{rs} = h_r k_s$, they are styled by Brandt
proportionate, since the relation $n_{rt}/n_{rs} = n_{r't}/n_{r's}$
holds generally.
In this case means of the A marginal totals of the table of sub-class
totals are efficient estimates of A effects averaged over any class
numbers in the B classification proportional to the actual class
numbers, whether B effects or interactions exist or not. These totals
will not contain equal numbers of observations, but the sum of squares
appropriate to testing the A effects may be computed by the method
already described for a single classification with unequal numbers in
the various classes.
The B effects may be estimated and tested in a like manner, and
finally the interactions may be tested for significance by deducting
the two sums of squares already obtained from the total sum of squares
between classes, since in this case the three sums of squares, for A
effects, B effects, and interactions respectively, are additive.
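
The additive property with proportionate class numbers may be verified
numerically. The following Python sketch uses wholly hypothetical
factors h and k and hypothetical sub-class means (none of these
figures come from the paper). It obtains the interaction sum of
squares by deduction and compares it with a direct weighted
computation from the sub-class means.

```python
# Hypothetical proportionate class numbers n_rs = h_r * k_s.
h = [2, 3]       # factors depending on the A classification
k = [1, 2, 4]    # factors depending on the B classification
counts = [[hr * ks for ks in k] for hr in h]

# Hypothetical sub-class means (illustrative only).
means = [[10.0, 12.0, 11.0],
         [13.0, 12.5, 15.0]]

# Sub-class totals and the marginal totals with their class numbers.
totals = [[n * m for n, m in zip(nrow, mrow)] for nrow, mrow in zip(counts, means)]
a_totals = [sum(row) for row in totals]
a_counts = [sum(row) for row in counts]
b_totals = [sum(col) for col in zip(*totals)]
b_counts = [sum(col) for col in zip(*counts)]
grand_total = sum(a_totals)
n_total = sum(a_counts)
correction = grand_total ** 2 / n_total


def ss_from_totals(tots, ns):
    """Sum of squares from totals based on unequal numbers of observations."""
    return sum(t * t / n for t, n in zip(tots, ns)) - correction


flat_totals = [t for row in totals for t in row]
flat_counts = [n for row in counts for n in row]
ss_between = ss_from_totals(flat_totals, flat_counts)
ss_a = ss_from_totals(a_totals, a_counts)
ss_b = ss_from_totals(b_totals, b_counts)

# Interactions by deduction from the sum of squares between classes.
ss_interaction = ss_between - ss_a - ss_b

# Direct computation from the sub-class means, weighted by class numbers.
a_means = [t / n for t, n in zip(a_totals, a_counts)]
b_means = [t / n for t, n in zip(b_totals, b_counts)]
grand_mean = grand_total / n_total
ss_interaction_direct = sum(
    counts[r][s] * (means[r][s] - a_means[r] - b_means[s] + grand_mean) ** 2
    for r in range(len(h))
    for s in range(len(k))
)

print(ss_interaction, ss_interaction_direct)
```

The two values agree because the class numbers are proportionate; with
disproportionate numbers the deduction from marginal sums of squares
would not in general reproduce the direct value.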
ANALYSIS OF A 2 × 2 × 2 × . . . TABLE

On the assumption of negligible interactions the efficient estimate of
the sex effect in the 2 × s table was seen to be the weighted mean of
the s differences of the pairs of sub-class means for the s
treatments. All the main effects of a 2 × 2 × 2 × . . . table can be
estimated and tested by such weighted means, the t test being used. If
the interactions are not assumed negligible then the unweighted means
of the differences must be taken. The individual interactions can be
tested by unweighted means in the same manner.
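
The two estimates may be contrasted in a short Python sketch. The data
and function names below are hypothetical. The weights are assumed
here to be $n_{1s} n_{2s}/(n_{1s} + n_{2s})$, the reciprocal of the
variance factor of each difference when every sub-class mean has
variance $\sigma^2/n_{rs}$.

```python
def weighted_mean_difference(means_1, means_2, n_1, n_2):
    """Weighted mean of the s differences (interactions assumed negligible).

    Returns the estimate and the total weight; the variance of the
    estimate is sigma^2 / total_weight.
    """
    weights = [a * b / (a + b) for a, b in zip(n_1, n_2)]
    diffs = [x - y for x, y in zip(means_1, means_2)]
    total_weight = sum(weights)
    estimate = sum(w * d for w, d in zip(weights, diffs)) / total_weight
    return estimate, total_weight


def unweighted_mean_difference(means_1, means_2, n_1, n_2):
    """Unweighted mean of the s differences (interactions not assumed negligible).

    Returns the estimate and its variance factor; the variance of the
    estimate is sigma^2 * variance_factor.
    """
    diffs = [x - y for x, y in zip(means_1, means_2)]
    s = len(diffs)
    estimate = sum(diffs) / s
    variance_factor = sum(1.0 / a + 1.0 / b for a, b in zip(n_1, n_2)) / s ** 2
    return estimate, variance_factor


# Hypothetical 2 x 3 table: sub-class means and class numbers.
means_1 = [2.1, 1.9, 2.4]
means_2 = [1.8, 1.7, 2.3]
n_1 = [10, 4, 25]
n_2 = [6, 12, 9]

weighted_estimate, total_weight = weighted_mean_difference(means_1, means_2, n_1, n_2)
unweighted_estimate, unweighted_factor = unweighted_mean_difference(means_1, means_2, n_1, n_2)

# Sum of squares for the effect: square of the estimate times its weight.
ss_effect = weighted_estimate ** 2 * total_weight

print(weighted_estimate, 1.0 / total_weight)
print(unweighted_estimate, unweighted_factor)
print(ss_effect)
```

The variance factor of the weighted estimate, `1 / total_weight`, is
never larger than that of the unweighted estimate, which is the price
paid for not assuming the interactions negligible. The two estimates
coincide when all the class numbers are equal.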
APPROXIMATE METHOD OF ANALYSIS

The variance of any sub-class mean is $\sigma^2/n_{rs}$. If the
differences between the various class numbers are ignored, all the
sub-class means being assumed to have a variance equal to the mean of
all the true variances, the methods appropriate to the case of equal
class numbers may be employed. This approximation is only useful when
the class numbers do not differ very greatly.

Table IX shows the results of applying this method to the pig data.
The mean variance of the sub-class means is $0.07592\sigma^2$ (Table V).
Analyzing Table IV as if it were an ordinary 2 × 8 table with a single
value in each sub-class, and dividing the sums of squares obtained by
0.07592, Table IX is obtained.
TABLE IX

                 Degrees of freedom   Sums of squares   Mean square

Sex                      1                0.3032          0.3032
Breed                    7                0.2635          0.0376
Interactions             7                0.2387          0.0341
Total                   15                0.8054

Comparing these results with those of Table VII and Table VIII, it
will be seen that the sum of squares due to sex is the same as that of
Table VII, as it should be. The effect of breed, however, is very
seriously underestimated. The sub-class numbers are in fact too
unequal for this method to give even a tolerable approximation.
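
The steps of the approximate method may be sketched in Python as
follows. The function name and the example figures are hypothetical
and are not the pig data. The sketch analyzes the table of sub-class
means as an ordinary table with a single value in each sub-class and
divides the resulting sums of squares by the mean of the reciprocals
of the class numbers.

```python
def approximate_analysis(means, counts):
    """Approximate analysis of an r x s table of sub-class means.

    Every sub-class mean is treated as having variance equal to the mean
    of the true variances, sigma^2 * mean(1 / n_rs).
    """
    r = len(means)
    s = len(means[0])
    mean_variance_factor = sum(1.0 / n for row in counts for n in row) / (r * s)

    # Ordinary analysis with a single value in each sub-class.
    grand = sum(sum(row) for row in means) / (r * s)
    row_means = [sum(row) / s for row in means]
    col_means = [sum(col) / r for col in zip(*means)]
    ss_rows = s * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((m - grand) ** 2 for row in means for m in row)
    ss_interactions = ss_total - ss_rows - ss_cols

    # Divide by the mean variance factor so that the sums of squares are
    # comparable with the within-class mean square.
    return {
        "rows": ss_rows / mean_variance_factor,
        "columns": ss_cols / mean_variance_factor,
        "interactions": ss_interactions / mean_variance_factor,
        "total": ss_total / mean_variance_factor,
        "mean_variance_factor": mean_variance_factor,
    }


# Hypothetical 2 x 3 table with only slightly unequal class numbers.
example_means = [[2.1, 1.9, 2.4],
                 [1.8, 1.7, 2.3]]
example_counts = [[10, 9, 11],
                  [10, 12, 9]]

result = approximate_analysis(example_means, example_counts)
for name, value in result.items():
    print(name, round(value, 4))
```

With equal class numbers the divisor reduces to 1/n and the ordinary
analysis of variance is recovered exactly. The more unequal the class
numbers, the less the approximation can be trusted, as the comparison
with Tables VII and VIII shows.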
SUMMARY

The methods of estimation and tests of significance of the main
effects of a two-way table differ according as to whether interactions
are, or are not, assumed non-existent. If the assumption is made the
method of fitting constants provides efficient estimates and efficient
tests of significance. If it is not made the sub-class means provide
efficient estimates, and the method of weighted squares of means
efficient tests of significance.
The method of fitting constants also provides efficient tests of the
interactions. The method of weighted squares of means provides
efficient tests of the interactions in the special case of a 2 × s
table, giving a check on the constants.

If the class numbers are proportionate only slight modifications are
necessary in the methods of analysis appropriate to the case of equal
class numbers.

In the case of a 2 × 2 × 2 × . . . table all estimates and all tests
of significance may be made very simply whether interactions are
assumed non-existent or not.

Tables with only slight inequalities in the class numbers may be
analyzed by an approximate method based on the assumption that the
variances of all the sub-class means are equal.
