Biometrika (1966), 53, 3 and 4, p. 325
Printed in Great Britain

Some distance properties of latent root and vector methods used in multivariate analysis

BY J. C. GOWER
Rothamsted Experimental Station

SUMMARY
This paper is concerned with the representation of a multivariate sample of size n as points P_1, P_2, ..., P_n in a Euclidean space. The interpretation of the distance \Delta(P_i, P_j) between the ith and jth members of the sample is discussed for some commonly used types of analysis, including both Q and R techniques. When all the distances between n points are known a method is derived which finds their co-ordinates referred to principal axes. A set of necessary and sufficient conditions for a solution to exist in real Euclidean space is found. Q and R techniques are defined as being dual to one another when they both lead to a set of n points with the same inter-point distances. Pairs of dual techniques are derived. In factor analysis the distances between points whose co-ordinates are the estimated factor scores can be interpreted as D^2 with a singular dispersion matrix.

1. INTRODUCTION
If we have information on v variates for each of n individuals, this can be set out in a two-way table with x_{ij} as the value of the jth variate for the ith individual. This table is the starting-point for most multivariate statistical methods such as discriminant analysis, factor analysis and principal components analysis.
Different methods require different assumptions about the structure of the sample and about the hypothetical multivariate probability distribution from which the sample is drawn. Thus, in discriminant analysis we must be able to assign each individual to one of a set of predetermined groups each with a known distributional form, and in factor analysis the dispersion matrix must have a particular structure.
Recently there has been widespread interest in the possibility of using numerical methods as an aid to classification. Sokal & Sneath (1963, p. 178) give references to investigations using a variety of different numerical methods; these methods all begin with a multivariate sample, but so far as is known each individual may come from a different biological population or all individuals may come from the same population. An additional complication is that many or all of the variates may be qualitative, so that product moment correlations between variates may be inappropriate. The techniques used are based on an n x n symmetric matrix A whose (i, j)th element, a_{ij}, is a coefficient of association between the ith and jth individuals. Sokal & Sneath (1963) give a list of commonly used coefficients, many of which are familiar to statisticians.
In one type of analysis the matrix A is first calculated and then a method of forming groups of individuals known as a cluster analysis follows. Individuals are assigned to the same group when their coefficients in A obey certain criteria, which depend on the method of cluster analysis being used. These criteria always have a simple geometric interpretation when a_{ij}, or some function of a_{ij}, is regarded as the distance between the ith and jth individuals. Thus an arbitrary distance \delta may be chosen and a criterion defined by saying that all members of a group must be within distance \delta of all other members, or that it must be possible to find a chain linking all members of the group such that the length of no link is greater than \delta, or that the average distance between all members of the group is less than \delta, etc. The usual technique is to let \delta vary from a maximum to a minimum (sorting upwards) or from a minimum to a maximum (sorting downwards), and to examine how these groups split or combine at the different levels of \delta. Not only do different criteria give different groups, but some criteria form different groups for the same value of \delta when sorting upwards from when sorting downwards.
We are not primarily concerned with cluster analysis here but only wish to emphasize that the concept of distance between individuals is fundamental to these techniques.
Recognizing the metric nature of the matrix A, several workers have endeavoured to construct models in two or three dimensions which reflect the inter-individual distances; see, for example, Lysenko & Sneath (1959) and Bidwell & Hole (1964). One of the by-products of most standard multivariate statistical techniques is a representation of the multivariate sample in a small number of dimensions. There is a growing tendency to use these techniques formally on association matrices to derive multi-dimensional representations of the sample, although the standard underlying assumptions are not even approximately satisfied. In particular the association matrix is an n x n matrix formed from the comparison of all pairs of individuals, i.e. a so-called Q matrix, whilst standard techniques postulate a v x v dispersion or correlation matrix, i.e. a so-called R matrix, formed from comparisons between the variates. Despite this the techniques have been used successfully, in the sense that the expected relative magnitudes of inter-individual distances have been recovered. In this paper we shall investigate the extent to which this use is valid. To do this we shall need the method of principal components analysis, which is therefore discussed in § 2.

2. PRINCIPAL COMPONENT ANALYSIS


When all the variates are quantitative, the method of principal components can be used to construct multi-dimensional models of the type just discussed. An interpretation for the special case of qualitative (0, 1) data is given in § 4.1. Unlike other forms of multivariate analysis no assumption need be made about the distribution of the variates in the hypothetical population, except of course when significance tests are of interest. The multivariate sample is regarded as defining a set of n points P_i (i = 1, 2, ..., n) in v space, where P_i has co-ordinates (x_{i1}, x_{i2}, ..., x_{iv}) referred to rectangular axes. Thus the implied distance d_{ij} between P_i and P_j is given by

d_{ij}^2 = \sum_{r=1}^{v} (x_{ir} - x_{jr})^2,     (1)

and the spatial configuration of the sample is of interest only if d_{ij} satisfactorily measures the similarity between the ith and jth individuals.
As a measure of similarity d_{ij} has the obvious defect that it depends in a complex manner on the scales of measurement of the different variates. When different variates are measured in different scales, d_{ij} has nonsensical physical dimensions. To evade this difficulty it is common practice to normalize variates by dividing each by its sample standard error, but
other normalizers could be used, for example, the variate mean (when zero is not arbitrarily located), or the range or even the cube root of the sample third moment.
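As an illustrative sketch only (Python with numpy assumed, wholly hypothetical data; nothing here is part of the original argument), the candidate normalizers can be applied column by column before computing the squared distances of formula (1):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=3.0, size=(10, 4))  # 10 individuals, 4 variates

# Candidate normalizers for each variate (column): standard deviation,
# mean, range, and cube root of the third central moment.
sd   = X.std(axis=0, ddof=1)
mean = X.mean(axis=0)
rnge = X.max(axis=0) - X.min(axis=0)
m3   = np.cbrt(((X - mean) ** 3).mean(axis=0))

for name, s in [("sd", sd), ("mean", mean), ("range", rnge), ("cbrt m3", m3)]:
    Z = X / s                                                # normalized variates
    D2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)  # squared distances (1)
    print(name, D2[0, 1].round(3))
```

Each choice of normalizer yields a different, equally defensible, set of distances d_{ij}; this is exactly the scale-dependence the text describes.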
Formula (1) makes no attempt to allow for correlations. In this respect d_{ij} has similar properties to measures of distance based on various similarity coefficients currently in favour in classification work and contrasts with distances used in discriminant analysis, e.g. D^2.
Principal components are computed by evaluating the latent roots and vectors of the sums of squares and products matrix formed from the normalized variates. The vector corresponding to the largest root gives the direction cosines of a line through the sample mean, C, such that the sum of squares of the perpendicular distances from the P_i onto this line is a minimum. If the foot of the perpendicular from P_i to the line is H_i then since CP_i^2 = CH_i^2 + H_iP_i^2 we must have maximized the sum of squares \sum CH_i^2. In fact \sum CH_i^2 is equal to the largest latent root; principal components analysis is therefore often regarded as a means of finding the direction cosines of a line such that the sum of squares of the projections CH_i onto this line is a maximum. Having found one line with the above properties, we look for a second line, at right angles to the first and corresponding to the second largest latent root and vector, which minimizes the sum of squares of the perpendiculars onto the plane defined by the two lines. The sum of squares in this new direction is equal to the second largest latent root. Similar properties hold for further roots and vectors. Clearly if the sample fits exactly into k (< v) dimensions the last v - k latent roots must be zero.
Any dimension which has a small latent root will contribute little to the original distances between the normalized co-ordinates and may be ignored. When nearly all the variation is in two or three dimensions the co-ordinates of the feet of the perpendiculars H_i from P_i onto the reduced dimensions may be used to plot a graph or construct a model preserving as nearly as possible, in the defined sense, the inter-individual distances d_{ij}. The points found will have the property that the square of the distance from H_i to H_j summed over all pairs of individuals will be the maximum possible for the chosen number of dimensions. The possibility of associating definite properties, e.g. factors, with the reduced number of dimensions will not be discussed here.
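The geometry just described is easy to verify numerically; a minimal sketch (Python with numpy assumed; hypothetical data). The latent roots of the corrected sums of squares and products matrix are the sums of squares of the projections CH_i onto each principal axis, and distances computed from the first k axes approximate the original d_{ij}:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))          # 20 individuals, 5 variates
Xc = X - X.mean(axis=0)               # centre at the sample mean C

S = Xc.T @ Xc                         # corrected sums of squares and products
lam, U = np.linalg.eigh(S)            # latent roots and vectors
order = np.argsort(lam)[::-1]         # largest root first
lam, U = lam[order], U[:, order]

H = Xc @ U                            # projections CH_i onto the principal axes
# The sum of squares of the projections on axis r equals the rth latent root:
assert np.allclose((H ** 2).sum(axis=0), lam)

# Distances computed from the first k axes approximate the original d_ij:
k = 2
D_full = np.linalg.norm(Xc[:, None] - Xc[None, :], axis=2)
D_k    = np.linalg.norm(H[:, None, :k] - H[None, :, :k], axis=2)
print(abs(D_full - D_k).max())        # small when the last roots are small
```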

3. Q-TECHNIQUES AND THE TREATMENT OF QUALITATIVE VARIATES

The method of principal components is often used, and misused, by statisticians. When unordered qualitative variates occur it is not applicable, except possibly for the special case of (0, 1) data; see § 4.1. A different axis can be assigned to each level of every qualitative variate and a score of 1 given if this level occurs and 0 if it does not occur; the problem then reduces to that of (0, 1) data discussed in § 4.1.

Table 1. The latent roots and vectors of the symmetric matrix A
(\bar{c}_r is the mean value of the elements of the rth vector.)

                          Root
                \lambda_1   \lambda_2   ...   \lambda_n
Point  Q_1      c_{11}      c_{12}      ...   c_{1n}
       Q_2      c_{21}      c_{22}      ...   c_{2n}
       ...
       Q_n      c_{n1}      c_{n2}      ...   c_{nn}
Centroid \bar{Q}  \bar{c}_1   \bar{c}_2   ...   \bar{c}_n

An alternative treatment frequently used for
qualitative data, and sometimes for quantitative data, is to calculate an n x n Q matrix of coefficients of association between individuals. The co-ordinates of the point Q_i corresponding to the ith individual are now defined by the ith component of each of the n vectors of the Q matrix. It is hoped to get a low-dimensional representation of the sample by using co-ordinate axes corresponding to only a few large latent roots. Such methods have been found to lead to reasonable results, but the underlying theory is not so well understood as is that of principal components analysis. In particular two questions arise: what is the correct scaling for each latent vector and what is the distance \Delta(Q_i, Q_j) when this scaling is used? In the remainder of this section we first answer these questions, then show how the method can be improved and give a few ancillary results.
Suppose A is a symmetric matrix of order n with latent roots \lambda_1, \lambda_2, ..., \lambda_n and associated column vectors c_1, c_2, ..., c_n. We may write the vectors down in a square array as in Table 1 where the rth column is the vector c_r. The technique outlined above is to take the elements of the ith row as the co-ordinates of a point Q_i in n space and the distance d_{ij} between Q_i and Q_j is then given by

d_{ij}^2 = \sum_{r=1}^{n} c_{ir}^2 + \sum_{r=1}^{n} c_{jr}^2 - 2 \sum_{r=1}^{n} c_{ir} c_{jr}.     (2)

Now it is a well-known property of latent vectors that if they are normalized so that the sums of squares of their elements are equal to their corresponding latent roots then

A = c_1 c_1' + c_2 c_2' + ... + c_n c_n'.     (3)

Thus a_{ii} = \sum_{r=1}^{n} c_{ir}^2 and a_{ij} = \sum_{r=1}^{n} c_{ir} c_{jr},

so that (2) becomes d_{ij}^2 = a_{ii} + a_{jj} - 2a_{ij}. Thus when the vectors of Table 1 have been normalized so that \sum_{i=1}^{n} c_{ir}^2 = \lambda_r,

{\Delta(Q_i, Q_j)}^2 = a_{ii} + a_{jj} - 2a_{ij}.     (4)
The points Q_i often have the right sort of metric properties for representing the inter-relationships between the individuals when A is an association matrix. For if A is a similarity matrix or a formal product-moment correlation matrix between the individuals, \Delta(Q_i, Q_j) will be zero for complete identification and will attain its maximum value for complete opposites. In both these cases the diagonal elements of A are unity and (4) becomes

{\Delta(Q_i, Q_j)}^2 = 2(1 - a_{ij}).

It also follows from (4) that if we put a_{ij} = -\frac{1}{2} d_{ij}^{*2} and a_{ii} = 0 then \Delta(Q_i, Q_j) = d_{ij}^*, and this gives a direct method of finding the co-ordinates of a set of points given their inter-distances d_{ij}^*.
It is easy to see why a good representation can be obtained in a reduced number of dimensions when some of the \lambda_i are small. If \lambda_r is small then the contribution (c_{ir} - c_{jr})^2 to the distance between Q_i and Q_j will also be small; in fact the sum of squares of the co-ordinates along the \lambda_r axis is \lambda_r by definition. However, if \lambda_r is large but the c_{ir} corresponding to it are not very different then (c_{ir} - c_{jr})^2 will also be small. Thus, the only co-ordinates which contribute much to the distances are those for large \lambda_r which have wide variation in the elements of their vectors. In many applications it is found that the distances can be
adequately expressed in terms of two or three such vectors. Very often the vector corresponding to the largest root has more or less constant elements which by (3) will contribute constant amounts to all the a_{ij} and may be regarded as allowing for the mean value of all the elements of A. Clearly such a mean value is unimportant, for if we add any constant to all the elements of A and go through the same numerical procedure we will leave the distances \Delta(Q_i, Q_j) as given by (4) invariant. We will of course get different co-ordinate values, Q_i^*, in Table 1 and different roots but the distances will not have altered. These new points must be an orthogonal transformation of the first set after a change of origin and possibly a mirror image transformation, depending on the signs arbitrarily given to the vector elements; we can ask which of these transformations will give us the best fit with a reduced number of co-ordinates. This is precisely the situation for the use of a principal components analysis. If the elements of A are dimensionless, as is usual for association type matrices, then we have a set of co-ordinates given by Table 1, and principal components analysis will give the direction cosines of lines of best fit through the centroid \bar{Q} in as many dimensions as required. These direction cosines are the latent vectors of a corrected sums of squares and products matrix B where the columns of Table 1 are now regarded as variates. We have

\sum_{j=1}^{n} c_{ji}^2 = \lambda_i,   \sum_{j=1}^{n} c_{ji} c_{jk} = 0  (i \neq k),

so that b_{ii} = \lambda_i - n\bar{c}_i^2,   b_{ij} = -n\bar{c}_i\bar{c}_j  (i \neq j).     (5)

The characteristic equation for the latent roots \mu_i of B is

f(\mu) \equiv \bar{c}_1^2 L_1 + \bar{c}_2^2 L_2 + ... + \bar{c}_n^2 L_n - L_0/n = 0,     (6)

where L_0 = (\lambda_1 - \mu)(\lambda_2 - \mu) ... (\lambda_n - \mu)

and L_i = (\lambda_1 - \mu) ... (\lambda_{i-1} - \mu)(\lambda_{i+1} - \mu) ... (\lambda_n - \mu) = L_0/(\lambda_i - \mu).

Using equation (6) it is possible to relate the roots \mu to the roots \lambda. First (6) has a zero root, because it is always possible to fit n points into n - 1 dimensions, and secondly the remaining roots of (6) are separated by the roots \lambda. Thus a knowledge of the \lambda_i will tell us quite a lot about the best fit to be obtained from a principal components analysis of the Q_i.
about the best fitto be obtained froma principal componentsanalysis of the Qi.
If two, or more, of the \lambda_i are negative then we must have one, or more, negative values of \mu, indicating that points cannot be found in real space such that d_{ij}^2 = a_{ii} + a_{jj} - 2a_{ij}. When the negative \mu have small modulus they will have little effect on the values of d_{ij} as calculated from the real co-ordinates and will give no trouble. If, however, a large negative \mu occurs it may still be possible to find a low-dimensional model to give the right order of magnitude of d_{ij}^2 but this model will involve one or more purely imaginary sets of co-ordinate axes which will upset intuitive ideas of distance. Thus the points Q_1(1, i) and Q_2(-1, -i) will in fact be zero distance apart but in any real model will appear distant. Thus it becomes important to examine the conditions under which the matrix A will give rise to a real configuration.
Obviously it is sufficient for A to be positive semi-definite (p.s.d.) for then all \lambda and hence all \mu are non-negative. On the other hand if there is more than one negative \lambda a real configuration is impossible because this implies at least one negative \mu. If there is just one negative root \lambda_n then a real configuration is possible only when \mu_n \geq 0. If A has a zero root then a necessary and sufficient condition for a real configuration is that A is p.s.d., because if A has a negative root so will B. It is always possible to adjust A so that it has a zero root
without altering the distances \Delta(Q_i, Q_j), for if \bar{a}_i is the mean value of the ith row (or column) of A and \bar{a} is the overall mean we can define a matrix \alpha with elements given by

\alpha_{ij} = a_{ij} - \bar{a}_i - \bar{a}_j + \bar{a}.     (7)

It is easy to see that

\alpha_{ii} + \alpha_{jj} - 2\alpha_{ij} = a_{ii} + a_{jj} - 2a_{ij},     (8)

preserving the distance property. The sum of every row and column of \alpha is zero and consequently \alpha has a zero root.
A set of necessary and sufficient conditions for a real configuration of points Q_i to exist such that the distances d_{ij} are given by d_{ij}^2 = a_{ii} + a_{jj} - 2a_{ij} is that \alpha be p.s.d. For many frequently used coefficients of association it can be shown that A has this property. An account of this work will be given elsewhere.
We have now established that if we are given an association matrix A we can find a set of points Q_i (i = 1, ..., n) in n space such that the distance d_{ij} between Q_i and Q_j is given by d_{ij}^2 = a_{ii} + a_{jj} - 2a_{ij} and that for many coefficients of association this is a desirable property. Furthermore, it is legitimate to use the method of principal components on the co-ordinates of the Q_i to find the best fit in fewer dimensions. This method requires a two-stage computational procedure and at each stage the latent roots and vectors of an n x n matrix are required. It would be advantageous if these two stages could be collapsed into one. From (5) we see that B will become the diagonal matrix diag (\lambda_1, \lambda_2, ..., \lambda_n) if \bar{c}_i = 0 for all i and under these conditions the points Q_i are themselves the principal component projections because the direction cosines defined by the vectors of any diagonal matrix are parallel to the original axes.
Because the rows of \alpha all sum to zero the vector 1 = (1, 1, ..., 1)' is the latent vector of \alpha corresponding to the zero root, so that if v = (v_1, v_2, ..., v_n)' is any other latent vector, then 1'v = 0 and thus \sum v_i = 0.
If now Table 1 has arisen from \alpha rather than from A we have \bar{c}_i = 0, either because \lambda_i = 0 or by the above result. It follows from the previous paragraph that the points Q_i derived from \alpha are the principal component fit to the co-ordinates that would be arrived at after the two-stage calculation starting from A.
The computational procedure suggested is:
(i) Form the association matrix A.
(ii) Transform this to \alpha by equation (7).
(iii) Construct Table 1 by finding the latent roots and vectors of \alpha, scaling each vector so that its sum of squares is equal to its corresponding latent root.
(iv) The ith row of Table 1 may now be regarded as the co-ordinates of a set of points Q_i whose distances apart are given by the best approximations to (a_{ii} + a_{jj} - 2a_{ij})^{1/2} in the chosen number of dimensions.
(v) As in all principal components analyses, the sum of squares of the residuals (i.e. perpendiculars on to the reduced k-dimensional representation) will be the difference between the trace of \alpha and the sum of the k largest roots of \alpha.
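These five steps translate directly into a few lines of code. A minimal sketch (Python with numpy assumed; the similarity matrix below is hypothetical, with unit diagonal so that the fitted distances should equal {2(1 - a_{ij})}^{1/2}):

```python
import numpy as np

def principal_coordinates(A, k=2):
    """Steps (i)-(v): latent roots and vectors of the centred matrix alpha
    give points whose inter-point distances approximate
    (a_ii + a_jj - 2 a_ij)^(1/2) in k dimensions."""
    row = A.mean(axis=1)
    alpha = A - row[:, None] - row[None, :] + A.mean()   # equation (7)
    lam, C = np.linalg.eigh(alpha)
    order = np.argsort(lam)[::-1]                        # largest roots first
    lam, C = lam[order], C[:, order]
    Q = C[:, :k] * np.sqrt(np.maximum(lam[:k], 0.0))     # scale: sum of squares = root
    residual = alpha.trace() - lam[:k].sum()             # step (v)
    return Q, lam, residual

# A toy similarity matrix with unit diagonal:
A = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
Q, lam, res = principal_coordinates(A, k=2)
print(np.linalg.norm(Q[0] - Q[1]), np.sqrt(2 * (1 - A[0, 1])))  # these agree
```

With n = 3 points and k = 2 = n - 1 dimensions the fit is exact provided \alpha is p.s.d., so the two printed numbers coincide.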
Results similar to those given above have previously been reported by Torgerson (1958, p. 123) but this author does not appear to realize the close connexion with principal components analysis and suggests that \alpha can be analysed by any factor analytical method; such an approach would only obscure interpretation. Rao (1964), in a paper surveying the different uses of principal components, has quoted Torgerson's results but overlooks the fact that the co-ordinates obtained from \alpha have their centroid as origin, for he advocates shifting the origin to the centroid by replacing \alpha by (I - 11'/n) \alpha. Because the column means of \alpha are zero, 1'\alpha is a null row vector and the formula leaves \alpha unchanged.

4. DUALITY OF Q AND R TECHNIQUES

Q techniques are those which operate on an n x n matrix whose elements are measures of association between the individuals. R techniques operate on a v x v matrix defining relationships amongst the variates. Thus § 3 is concerned with a Q technique whilst principal components and most other statistical multivariate methods are R techniques. Both Q and R techniques give the co-ordinates of a set of n points in multidimensional space. If, given a multivariate sample, a Q and an R technique both give rise to the same set of points, in the sense that the distances between all pairs of points are duplicated, then we shall say that the Q and R techniques are dual to one another. In a few important cases we can find pairs of dual techniques.
4.1. The dual of principal components analysis
The Q-matrix defined by q_{ij} = -\frac{1}{2} d_{ij}^2, where d_{ij} is given by (1), must, using the method of § 3, give rise to the principal components fit to a set of points whose distances apart are (q_{ii} + q_{jj} - 2q_{ij})^{1/2} = d_{ij}. These are the original distances and it follows that the method of § 3 operating on -\frac{1}{2} d_{ij}^2 is dual to principal components analysis of the sum of squares and products matrix between the variates. We now give an algebraic proof of this; we shall need some of the results later.
Let X be the n x v data matrix composed entirely of variates with zero sample means, then the sums of squares and products matrix required for a principal component analysis is X'X. Suppose this matrix has a latent root \lambda and corresponding vector u. Thus

X'Xu = \lambda u.     (9)

Consider the Q matrix XX' which has a root \mu and vector v, then

XX'v = \mu v.     (10)

Pre-multiplying (9) by X we have

XX'(Xu) = \lambda(Xu),     (11)

so that \lambda = \mu and Xu = kv, where k is a constant relating the scaling of the two sets of vectors. The direction cosines for principal components are normalized such that u'u = 1. From (11), k^2 v'v = u'X'Xu = \lambda u'u = \lambda. If we normalize the v vectors so that v'v = \lambda as is required in § 3 then k = 1 and (11) becomes v = Xu.
The elements a_{ij} of the Q matrix XX' are given by

a_{ii} = \sum_{r=1}^{v} x_{ir}^2,   a_{ij} = \sum_{r=1}^{v} x_{ir} x_{jr}.     (12)

The method of § 3 will give rise to a set of co-ordinates Q_i whose inter-co-ordinate distances are given by

d_{ij}^2 = a_{ii} + a_{jj} - 2a_{ij} = \sum_{r=1}^{v} (x_{ir} - x_{jr})^2.     (13)

Thus a dual of principal components analysis is the method of § 3 operating on the Q matrix given by (12). Another Q matrix giving an identical solution is given by

q_{ij} = -\frac{1}{2} d_{ij}^2 = -\frac{1}{2} \sum_{r=1}^{v} (x_{ir} - x_{jr})^2,
of which (12) is the \alpha matrix of § 3. The co-ordinates given by the first t vectors of (12) are the same as those obtained from the projections on to the first t direction cosine vectors obtained from the principal component analysis. This follows from the fact that the technique of § 3 gives the principal component fit to a set of points whose distances apart are given by (13) which are distances between the co-ordinate points used for the principal components analysis. There are two important special cases of this duality of principal components. These are dealt with in the following two subsections.
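Before turning to those, the duality itself can be checked numerically. A minimal sketch (Python with numpy assumed; hypothetical data): the non-zero roots of X'X and XX' coincide, and the two sets of points reproduce the same inter-individual distances.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 3))
X = X - X.mean(axis=0)                     # zero sample means, as in the text

# R technique: principal components of X'X; co-ordinates are projections Xu.
lamR, U = np.linalg.eigh(X.T @ X)
coords_R = X @ U                           # n points referred to principal axes

# Q technique: latent vectors of XX' scaled so v'v = lambda (section 3).
lamQ, V = np.linalg.eigh(X @ X.T)
coords_Q = V * np.sqrt(np.maximum(lamQ, 0.0))

# The non-zero roots agree (equation (11): lambda = mu) ...
print(np.allclose(np.sort(lamR), np.sort(lamQ)[-3:]))

# ... and both co-ordinate sets give identical inter-point distances.
D = lambda Y: np.linalg.norm(Y[:, None] - Y[None, :], axis=2)
print(np.allclose(D(coords_R), D(coords_Q)))
```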

Sokal's measure of taxonomic distance. Suppose each x_{ij} is standardized by the variate standard error s_r, then principal components are derived from the correlation matrix. The distances are given by

d_{ij}^2 = \sum_{r=1}^{v} (x_{ir} - x_{jr})^2 / s_r^2,

which is the measure of taxonomic distance proposed by Sokal and others (Sokal & Sneath, 1963, p. 147). The technique of § 3 used on the matrix

a_{ij} = \sum_{r=1}^{v} x_{ir} x_{jr} / s_r^2

will lead to a reduced dimensional configuration identical to that obtained from a principal components analysis of the correlation matrix.
When n < v the Q technique is clearly more convenient computationally because it requires the roots and vectors of a smaller matrix. When n > v the Q technique may still be better because it leads directly to the co-ordinates of Q_i whilst the R technique needs a set of orthogonal transformations u'x to derive these co-ordinates.

Matching coefficients. Suppose now that X is composed entirely of (0, 1) data. When comparing two individuals we have, in the usual notation, a (1, 1), b (0, 1), c (1, 0) and d (0, 0) pairs where a + b + c + d = v, the number of variates. Sokal & Sneath (1963, p. 125) give a list of different matching coefficients which have been used from time to time. We are here only concerned with S_{ij} = (a + d)/v and note that

\sum_{r=1}^{v} (x_{ir} - x_{jr})^2 = b + c = v(1 - S_{ij})

and that (12) becomes

a_{ii} = number of characters present for ith individual,
a_{ij} = number of characters common to ith and jth individuals.     (14)

We shall ignore v in the following as it only has a constant multiplicative effect throughout and does not materially affect our results; in practice v may be important if there are many missing values. The technique of § 3 used on the matrix defined by (14) will give points distance {2(1 - S_{ij})}^{1/2} apart which are also the same as the distances given by a_{ij} = S_{ij}. The same points would be obtained by a principal components analysis on the matrix of corrected sums of squares and products between the variates. Although a conventional principal components analysis of (0, 1) data may seem of dubious validity, the above shows that it is exactly equivalent to assuming that the individuals are represented by points whose distances apart are proportional to (1 - S_{ij})^{1/2}.
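A minimal sketch of the identities used above, for (0, 1) data (Python with numpy assumed; the binary scores are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
X = (rng.random((6, 10)) < 0.5).astype(float)   # 6 individuals, 10 binary variates
n, v = X.shape

# Simple matching coefficient S_ij = (a + d) / v.
S = (X[:, None, :] == X[None, :, :]).sum(axis=2) / v

# Equation (14): a_ii = characters present, a_ij = characters in common.
A = X @ X.T

# b + c = v(1 - S_ij) is the squared Euclidean distance between the rows:
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
print(np.allclose(D2, v * (1 - S)))

# ... and it also equals a_ii + a_jj - 2 a_ij, as required by section 3:
print(np.allclose(D2, np.diag(A)[:, None] + np.diag(A)[None, :] - 2 * A))
```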

4.2. Mahalanobis's D^2 statistic and discriminant analysis
If we have k different multivariate normal populations with a common dispersion matrix W, then the Mahalanobis measure of distance between the means \bar{x}_i and \bar{x}_j of the ith and jth populations is given by

D_{ij}^2 = (\bar{x}_i - \bar{x}_j) W^{-1} (\bar{x}_i - \bar{x}_j)'.

Assuming, without loss of generality, that \bar{x}_{1r} + ... + \bar{x}_{kr} = 0 for r = 1, 2, ..., v, the \alpha matrix corresponding to D^2 is given by

\alpha_{ii} = \bar{x}_i W^{-1} \bar{x}_i',   \alpha_{ij} = \bar{x}_i W^{-1} \bar{x}_j'.     (15)

The method of § 3 will give a set of k points referred to rectangular principal axes with inter-distances D_{ij}. We shall show that the configuration using the space defined by some only of the co-ordinate axes is the same as that provided by the normal linear discriminant functions, or canonical variates, for the same dimensions. A closely related result is given by Rao (1952) where he showed that the first t canonical variates maximize the total D^2 in t dimensions.
The matrix \alpha of (15) may be written \bar{X} W^{-1} \bar{X}', where \bar{X} is the k x v matrix of the population means whose ith row gives the means for the ith population.
We put W^{-1} = UU'. This can always be done and in fact U may be upper triangular. Thus \alpha = (\bar{X}U)(\bar{X}U)' and this is of the same form as (10) and consequently the analysis of § 3 gives the same results as a principal components analysis of (\bar{X}U)'(\bar{X}U), that is U'\bar{X}'\bar{X}U. We require the latent vectors l of U'\bar{X}'\bar{X}U; that is

(U'\bar{X}'\bar{X}U - \lambda I) l = 0, or U'(\bar{X}'\bar{X} - \lambda U'^{-1}U^{-1}) Ul = 0.

Now U'^{-1}U^{-1} = (UU')^{-1} = W and therefore U'(\bar{X}'\bar{X} - \lambda W) Ul = 0.
This is the equation to be solved for finding the discriminant functions, whose coefficients are therefore given by Ul.
The co-ordinates of the k means given by the discriminant coefficients are \bar{X}Ul whilst those given by the principal components analysis are given by (11) and are also \bar{X}Ul. This proves the result.
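A numerical check of the result (Python with numpy assumed; the group means and dispersion matrix are hypothetical): the principal co-ordinates of the \alpha matrix of (15) reproduce the Mahalanobis inter-distances D_{ij} exactly.

```python
import numpy as np

rng = np.random.default_rng(4)
k, v = 4, 3
Xbar = rng.normal(size=(k, v))
Xbar -= Xbar.mean(axis=0)                  # column sums of the means are zero

M = rng.normal(size=(v, v))
W = M @ M.T + v * np.eye(v)                # a positive definite dispersion matrix

# Mahalanobis D2 between all pairs of group means:
Winv = np.linalg.inv(W)
diff = Xbar[:, None, :] - Xbar[None, :, :]
D2 = np.einsum('ijp,pq,ijq->ij', diff, Winv, diff)

# The alpha matrix of (15) and the principal co-ordinates of section 3:
alpha = Xbar @ Winv @ Xbar.T
lam, C = np.linalg.eigh(alpha)
order = np.argsort(lam)[::-1]
Q = C[:, order] * np.sqrt(np.maximum(lam[order], 0.0))

Dq = ((Q[:, None, :] - Q[None, :, :]) ** 2).sum(axis=2)
print(np.allclose(D2, Dq))                 # inter-distances are exactly D_ij
```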
It is instructive to see what happens if, given a single population, we define the distance d_{ij} between individuals by d_{ij}^2 = (x_i - x_j) W^{-1} (x_i - x_j)', where W is the dispersion matrix. We may do this with D^2 in mind, hoping to allow for correlations between the variates in our measure of distance. The previous algebraic treatment follows except that X'X is now W. Thus we have to solve (W - \lambda W) Ul = 0, an equation which only has a non-null solution when \lambda = 1 in which case any vector l will suffice. This implies that variation in all directions is homogeneous, or, what amounts to the same thing, the points distance d_{ij} apart lie at the vertices of a regular simplex in n - 1 dimensions projected onto a space of v dimensions. This result is satisfactory because no preference is given to any one individual out of the whole set which define the population and reflects the attitude that the group of individuals is homogeneous.

5. FACTOR ANALYSIS
In the last paragraph we showed that a reasonable measure of distance between sample members of a single population does not lead to a reduction in the number of dimensions of the sample from the smaller of n - 1 and v. Factor analysis achieves this by assuming that each variate can be expressed as a linear compound of k (< v) hypothetical variates, the common factors, plus an additional term depending only on the particular variate and known as the specific factor. Algebraically the variates are related to each other by

x_i = \sum_{j=1}^{k} l_{ij} f_j + e_i   (i = 1, 2, ..., v).

The specifics e_i are supposed to be independent of each other and of the f_j. There is no loss of generality in assuming that the f_j are uncorrelated and with unit variance. The covariance matrix C of the x's is C = LL' + V, where V is the diagonal matrix of the variances of the e_i and L is the v x k matrix of the coefficients l_{ij}, the factor loadings. The problem is to estimate L and V and also to estimate the factor scores f_j for each individual. The estimation procedure depends on what is assumed about the distribution of the x's.

5.1. Distances between individuals
It is usual in factor analysis to concentrate interest on the factor loadings but we are more concerned here with the distances between the individuals, regarding their factor scores as co-ordinates. We first note that the fundamental equation of factor analysis may be written

x - e = Lf.     (16)

Thus for any estimates of f for a set of individuals the corrected scores x - e lie in k-space as do the factor scores themselves. The factors f are uncorrelated with unit variance, though this is not true of the estimated factor scores using the methods discussed in the next section. Absence of correlation is retained for any orthogonal rotation of axes in factor space. Throughout this section we assume the theoretical unit dispersion matrix for the factor scores so that distances between the individuals with factor scores as co-ordinates can be calculated directly using D^2 with a unit dispersion matrix. The dispersion matrix for the corrected scores is C - V or LL' which has rank k and therefore no inverse in the ordinary sense. A slight extension to the definition will allow us to define D^2 for singular dispersion matrices; details are given in the Appendix.
Equation (16) relates the v variates x - e to the k variates f as described in the Appendix and therefore, using the extended definition of D^2, the distances between the estimated factor scores are equal to the distances between the observed variates corrected for the estimated specifics. This result is independent of the particular method used for estimating the factor scores.
An alternative way of looking at the factor analysis model is to regard the v-space of the x-variates as embedded in a (v + k)-space whose axes are orthogonal, v of these representing the specifics and k the common factors. Correcting the x's for the specifics ensures that a k-space is attained.
Writing \delta y for (x_i - e_i) - (x_j - e_j) and \delta f for f_i - f_j, the differences between the corrected scores and the estimated factor scores respectively of two individuals i and j, the equality between the two distances gives

\delta f' \delta f = \delta y' (LL')^{-} \delta y,     (17)

where (LL')^{-} is a suitably chosen generalized inverse of LL' depending on the method used for estimating factor scores; see below.

5.2. Distances using particular estimates of factor scores
There are two closely related methods for estimating factor scores, one due to Bartlett (1938) and the other to Thomson (1951); see also Lawley & Maxwell (1963, p. 88). For each of these methods we can find an algebraic form for \delta f' \delta f in terms of the original data and the matrices C, V, L.
Bartlett's method estimates the factor scores f by

f = J^{-1} L' V^{-1} x,     (18)

where J = L' V^{-1} L.     (19)

Thus on noting that \delta y = L \delta f and writing \delta x for x_i - x_j, equation (17) becomes

\delta x' V^{-1} L J^{-1} J^{-1} L' V^{-1} \delta x = \delta x' V^{-1} L J^{-1} L' (LL')^{-} L J^{-1} L' V^{-1} \delta x.     (20)

If we set

(LL')^{-} = V^{-1} L J^{-1} J^{-1} L' V^{-1},     (21)

we see that (20) is satisfied and it is easy to verify that this is a generalized inverse of LL'; for

(LL') V^{-1} L J^{-2} L' V^{-1} (LL') = LL'

follows immediately from (19).
Thus for Bartlett's method of estimation the distance is given by

\delta x' (LL')^{-} \delta x, which may be written \delta x' (C - V)^{-} \delta x,     (22)

with the particular generalized inverse given by (21). Equation (22) is Mahalanobis's D^2 with C - V in the role of the within population dispersion matrix.
Thomson's method of estimating factor scores gives

f = L' C^{-1} x.     (23)

Substituting into (17), we have that

\delta x' C^{-1} LL' C^{-1} \delta x = \delta x' C^{-1} LL' (LL')^{-} LL' C^{-1} \delta x,     (24)

a result which is true by definition of a generalized inverse.
Thus in Thomson's case the distance between the factor scores is given by

\delta x' C^{-1} (C - V) C^{-1} \delta x,     (25)

which may be written \delta x' [C (C - V)^{-} C]^{-} \delta x, so that C^{-1} (C - V) C^{-1} is a generalized inverse of C (C - V)^{-} C = C V^{-1} L J^{-1} J^{-1} L' V^{-1} C, which may be shown to be equivalent to L (I + J^{-1})^2 L'. Thus (25) may be written

\delta x' [L (I + J^{-1})^2 L']^{-} \delta x,     (26)

which is a more complex D^2 form than (22).
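The algebra above can be verified directly. A minimal sketch (Python with numpy assumed; L and V are hypothetical) checks the Bartlett distance (22) with the generalized inverse (21), the Thomson distance (25), and the fact that C^{-1}(C - V)C^{-1} is a generalized inverse of L(I + J^{-1})^2 L':

```python
import numpy as np

rng = np.random.default_rng(5)
v, k = 5, 2
L = rng.normal(size=(v, k))                       # factor loadings
V = np.diag(rng.uniform(0.5, 1.5, size=v))        # specific variances
C = L @ L.T + V                                   # covariance matrix of the x's
Vi, Ci = np.linalg.inv(V), np.linalg.inv(C)
J = L.T @ Vi @ L                                  # equation (19)
Ji = np.linalg.inv(J)

dx = rng.normal(size=v)                           # delta x = x_i - x_j

# Bartlett (18): delta f = J^{-1} L' V^{-1} delta x; its squared length
# equals the D2 of (22) computed with the generalized inverse (21).
df_b = Ji @ L.T @ Vi @ dx
print(np.isclose(df_b @ df_b, dx @ (Vi @ L @ Ji @ Ji @ L.T @ Vi) @ dx))

# Thomson (23): delta f = L' C^{-1} delta x; its squared length equals
# (25), delta x' C^{-1} (C - V) C^{-1} delta x.
df_t = L.T @ Ci @ dx
print(np.isclose(df_t @ df_t, dx @ (Ci @ (C - V) @ Ci) @ dx))

# (26): C^{-1}(C - V)C^{-1} is a generalized inverse of G = L(I + J^{-1})^2 L',
# i.e. G M G = G.
G = L @ (np.eye(k) + Ji) @ (np.eye(k) + Ji) @ L.T
M = Ci @ (C - V) @ Ci
print(np.allclose(G @ M @ G, G))
```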

5.3. Relation of factor analysis to principal components
This investigation of distance in factor analysis should not be taken to imply that the writer recommends the use of the method in the type of situation considered earlier where the multivariate sample cannot be considered as homogeneous. The method can be legitimately used only in those situations where its underlying assumptions are at least approximately fulfilled; in many published applications this is not so.
Cattell (1965), a leading exponent of factor analysis, discussing the heterogeneous population problem recommends that a cluster analysis should precede factor analysis in an attempt to separate out the different populations or 'species'. We can then separately factor analyse the members of each species to find factors that define individuals within the species, and do an inter-species factor analysis on the species means to find the factors that differentiate species. This is contrary to common practice which uses factor analysis on the complete heterogeneous sample as an alternative to cluster analysis in the hope that the first factor will differentiate species; but as Cattell points out, 'the factors obtained will neither be clear species differentiators nor optimum individual differentiators'. In the model-making situations preliminary to classification, discussed earlier, there usually seems to be little justification for using factor analysis and the simpler computational methods given above in § 3 are just as useful. It seems that the reason for obtaining meaningful results when using factor analysis in these situations is that the results obtained are often very close to the results obtained by a principal components analysis.
Some insight into the relationship between the two methods can be obtained by noting that the maximum likelihood estimates of factor loadings satisfy the equations

J L' V^{-1/2} = L' V^{-1/2} (V^{-1/2} A V^{-1/2} - I).     (27)

This is equation (2.11) of Lawley & Maxwell (1963) in a rearranged form. Thus L' V^{-1/2} are the first k latent vectors of V^{-1/2} A V^{-1/2} - I and J is the diagonal matrix of the latent roots. By setting J = L' V^{-1} L the vectors are scaled so that the sums of squares of their elements equal the roots. Alternatively L' V^{-1/2} are the latent vectors of V^{-1/2} A V^{-1/2} with roots given by I + J. In either case L' V^{-1/2} are the vectors of a matrix which is put into non-dimensional form by scaling by the diagonal matrix V^{-1/2}. The situation is very close to the principal components analysis of the correlation matrix where V^{-1/2} is replaced by S^{-1}, the matrix of standard errors given by S^2 = diag (A). Regarding J^{-1/2} L' V^{-1/2} as direction cosines derived from a principal components analysis of V^{-1/2} A V^{-1/2} we see that the projected values of scaled co-ordinates V^{-1/2} x onto k dimensions are given by

v = J^{-1/2} L' V^{-1} x.     (28)

This result differs from Bartlett's and Thomson's estimates of factor scores only in a scale factor along each of the k axes so that, except in exceptional circumstances, the same sort of spatial configuration can be expected from all the methods. Thus when V is proportional to S^2 a principal components analysis can be expected to give similar results to a factor analysis.

6. PROXIMITY ANALYSIS
An interesting technique has been described by Shepard (1962a, b) and further developed by Kruskal (1964a, b) who name their method Proximity Analysis. The method assumes the matrix A to be available and endeavours to find a set of points Q_i in a reduced number of dimensions k such that the distances \Delta(Q_i, Q_j) are monotonically related to the a_{ij}. Rather empirical methods are given for deciding on the optimum value of k required to preserve the monotonic relationship without using too many dimensions. The relationship of the a_{ij} to the \Delta(Q_i, Q_j) gives the monotonic transformation of a_{ij} required to reduce dimensionality. For example, the monotonic transformation from a_{ij} to -log a_{ij} may be such that points Q_i can be found in, say, two dimensions so that the ranking of the distances \Delta(Q_i, Q_j) is very nearly that of the -log a_{ij} and if this is so the graph of a_{ij} against \Delta(Q_i, Q_j) should recover this transformation. The transformation, although monotonic, need not have any simple functional form. Shepard has produced examples to show that the method does indeed recover a known transformation when it is used on dummy data in two or three dimensions.
The main interest of Shepard's work is that the numerical values of the a_{ij} are not used; only the rank order of their sizes is important. Shepard argues that when all the \frac{1}{2}n(n-1) distances are known, a knowledge of their rank order, rather than the actual distances, entails very little loss of information.
In § 3 we have shown that when all the diagonal elements are originally 1, before calculating the \alpha matrix, the distances are given by {2(1 - a_{ij})}^{1/2} and this is a particular monotonic transformation of the a_{ij}. Consequently if we get a good fit in, say, two dimensions using this distance function, we would expect high correlation with Shepard's solution. However, Shepard may be able to find a good fit in a low number of dimensions where the method of § 3 cannot. We are in effect using a particular distance function, though there is nothing to stop us transforming the values of a_{ij} ourselves if we have good reason to think that such a transformation will better reflect the inter-relationships between the individuals. As all similarities are positive numbers not greater than one, the points given by § 3 must lie within an n - 1 dimensional regular simplex. Thus the points for any three similarities s_{12}, s_{23}, s_{31} lie inside an equilateral triangle of side \sqrt{2}. The logarithmic transformation will convert this restricted space to one where a pair of completely dissimilar individuals will become points an infinite distance apart, but it may introduce difficulties should no solution exist in real space.
The relationship between Shepard's solution and the principal components solution needs investigation but as Shepard's computations are much more complex than those required to find latent roots and vectors of a symmetric matrix it is suggested that the method of § 3 be tried first and if this does not lead to a solution of sufficiently low dimensionality then the co-ordinates found in n - 1 dimensions can be used as a starting-point for Shepard's iterative method.
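The point about rank order can be illustrated in a few lines (Python with numpy and scipy assumed; the similarities are hypothetical): {2(1 - a_{ij})}^{1/2} is a monotone decreasing transformation of the a_{ij}, so its rank correlation with the similarities is exactly -1, which is all that Shepard's method requires of a distance function.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(7)
a = rng.uniform(0.1, 0.9, size=15)      # off-diagonal similarities a_ij
delta = np.sqrt(2.0 * (1.0 - a))        # distances from section 3
rho, _ = spearmanr(a, delta)
print(rho)                              # -1.0: the ranking is preserved exactly
```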

7. CONCLUSION
The work reported in this paper grew from a dissatisfaction with the many reported applications of factor analysis and principal components analysis of Q matrices found in classification work, particularly in the biological literature. The interpretation of such methods can be better understood by examining the distances, suitably defined, between the individuals and we have given a method for finding co-ordinates for each individual referred to principal axes which preserve these distances. To distinguish the method from classical principal components analysis it might be useful to refer to it as a principal co-ordinates analysis.
Perhaps the most important aspect of this type of analysis is the suitable choice of a distance function. When identification is required D^2 has certain optimal properties for normal populations whilst for non-normal populations, functions similar to D^2 could be devised which would immediately provide a canonical analysis and might lead to a practical solution of the discriminant problem. Rao (1948) has suggested a general approach to the problem of distance which might be useful here. Unfortunately it is not always possible to recognize the different populations in the original sample of individuals and these must first be found by using an analysis based on similarities or distances which do not allow for within population correlation. Further examination of the properties of such distances is needed.

I thank Mr M. J. R. Healy for many helpful discussions and also for the neat proof that \sum v_i = 0 given at the end of § 3. Much of this work was stimulated by Dr J. H. Rayner's concern with soil classification.

APPENDIX. GENERALIZED DISTANCE WHEN THE DISPERSION MATRIX IS SINGULAR

To define D^2 in the more general case we first note that if M is a non-singular k x k matrix and two sets of variates are related by y = Mx then D^2 for the x's is the same as D^2 for the y's, because

(y_i - y_j)' {E(yy')}^{-1} (y_i - y_j) = (x_i - x_j)' M' {M E(xx') M'}^{-1} M (x_i - x_j)
                                       = (x_i - x_j)' {E(xx')}^{-1} (x_i - x_j).

Thus a non-singular transformation from one set of k variates to any other set of k variates leaves D^2 unchanged. Now suppose M is a v x k matrix with v > k, then we derive v variates y from the k variates x and E(yy') = M E(xx') M' only has rank k. We may choose a subset of k variates out of the v y variates in many ways, and at least one such subset y_1 will have a dispersion matrix of rank k. Suppose in general that y_1, y_2, ..., y_r are all the different subsets of size k with dispersion matrices of rank k: then each subset is a non-singular transformation of the x variates and must give rise to the same distance D^2 as before. It is natural to choose this value of D^2 as the distance. The value of D^2 may be computed from a generalized inverse of the dispersion matrix (Rao, 1962).
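A numerical sketch of this definition (Python with numpy assumed; M and the dispersion of the x's are hypothetical): D^2 computed through a generalized inverse of the singular dispersion matrix of y = Mx agrees with the ordinary D^2 of the underlying x's.

```python
import numpy as np

rng = np.random.default_rng(6)
k, v = 2, 5
Sx = np.cov(rng.normal(size=(200, k)), rowvar=False)   # full-rank k x k dispersion
M = rng.normal(size=(v, k))                            # v > k, so Sy is singular

Sy = M @ Sx @ M.T                                      # rank-k dispersion of y = Mx
dx = rng.normal(size=k)                                # x_i - x_j
dy = M @ dx                                            # corresponding y_i - y_j

D2_x = dx @ np.linalg.inv(Sx) @ dx                     # ordinary D2 for the x's
D2_y = dy @ np.linalg.pinv(Sy) @ dy                    # generalized inverse (Rao, 1962)
print(np.isclose(D2_x, D2_y))                          # the two distances agree
```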

REFERENCES

BARTLETT, M. S. (1938). Methods of estimating mental factors. Nature, Lond. 141, 609-10.
BIDWELL, O. W. & HOLE, F. D. (1964). An experiment in the numerical classification of Kansas soils. Soil Sci. Soc. Amer. Proc. 28, 263-8.
CATTELL, R. B. (1965). Factor analysis: an introduction to essentials. II. The role of factor analysis in research. Biometrics 21, 405-35.
KRUSKAL, J. B. (1964a). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29, 1-27.
KRUSKAL, J. B. (1964b). Nonmetric multidimensional scaling: a numerical method. Psychometrika 29, 115-29.
LAWLEY, D. N. & MAXWELL, A. E. (1963). Factor Analysis as a Statistical Method. London: Butterworths.
LYSENKO, O. & SNEATH, P. H. (1959). The use of models in bacterial classification. J. Gen. Microbiol. 20, 284-90.
RAO, C. R. (1948). On the distance between two populations. Sankhya 9, 246.
RAO, C. R. (1952). Advanced Statistical Methods in Biometric Research. New York: John Wiley.
RAO, C. R. (1962). A note on a generalized inverse of a matrix with applications to problems in mathematical statistics. J. R. Statist. Soc. B 24, 152-8.
RAO, C. R. (1964). The use and interpretation of principal components analysis in applied research. Sankhya A 26, 329-58.
SHEPARD, R. N. (1962a). The analysis of proximities: multidimensional scaling with an unknown distance function. I. Psychometrika 27, 125-39.
SHEPARD, R. N. (1962b). The analysis of proximities: multidimensional scaling with an unknown distance function. II. Psychometrika 27, 219-46.
SOKAL, R. R. & SNEATH, P. H. (1963). Principles of Numerical Taxonomy. San Francisco and London: W. H. Freeman.
THOMSON, G. H. (1951). The Factorial Analysis of Human Ability. 5th ed. London University Press.
TORGERSON, W. S. (1958). Theory and Methods of Scaling. New York: John Wiley.
