Some Distance Properties of Latent Root and Vector Methods Used in Multivariate Analysis
Author(s): J. C. Gower
Source: Biometrika, Vol. 53, No. 3/4 (Dec., 1966), pp. 325-338
Published by: Biometrika Trust
Stable URL: http://www.jstor.org/stable/2333639
Biometrika (1966), 53, 3 and 4, p. 325
Printed in Great Britain

Some distance properties of latent root and vector methods used in multivariate analysis

BY J. C. GOWER
Rothamsted Experimental Station
SUMMARY

This paper is concerned with the representation of a multivariate sample of size n as points P_1, P_2, ..., P_n in a Euclidean space. The interpretation of the distance Δ(P_i, P_j) between the ith and jth members of the sample is discussed for some commonly used types of analysis, including both Q and R techniques. When all the distances between n points are known a method is derived which finds their co-ordinates referred to principal axes. A set of necessary and sufficient conditions for a solution to exist in real Euclidean space is found. Q and R techniques are defined as being dual to one another when they both lead to a set of n points with the same inter-point distances. Pairs of dual techniques are derived. In factor analysis the distances between points whose co-ordinates are the estimated factor scores can be interpreted as D^2 with a singular dispersion matrix.
1. INTRODUCTION

If we have information on v variates for each of n individuals, this can be set out in a two-way table with x_ij as the value of the jth variate for the ith individual. This table is the starting-point for most multivariate statistical methods such as discriminant analysis, factor analysis and principal components analysis.
Different methods require different assumptions about the structure of the sample and about the hypothetical multivariate probability distribution from which the sample is drawn. Thus, in discriminant analysis we must be able to assign each individual to one of a set of predetermined groups each with a known distributional form, and in factor analysis the dispersion matrix must have a particular structure.
Recently there has been widespread interest in the possibility of using numerical methods as an aid to classification. Sokal & Sneath (1963, p. 178) give references to investigations using a variety of different numerical methods; these methods all begin with a multivariate sample, but so far as is known each individual may come from a different biological population or all individuals may come from the same population. An additional complication is that many or all of the variates may be qualitative, so that product moment correlations between variates may be inappropriate. The techniques used are based on an n × n symmetric matrix A whose (i, j)th element, a_ij, is a coefficient of association between the ith and jth individuals. Sokal & Sneath (1963) give a list of commonly used coefficients, many of which are familiar to statisticians.
In one type of analysis the matrix A is first calculated and then a method of forming groups of individuals known as a cluster analysis follows. Individuals are assigned to the same group when their coefficients in A obey certain criteria, which depend on the method of cluster analysis being used. These criteria always have a simple geometric interpretation when a_ij, or some function of a_ij, is regarded as the distance between the ith and jth individuals. Thus an arbitrary distance δ may be chosen and a criterion defined by saying that all members of a group must be within distance δ of all other members, or that it must be possible to find a chain linking all members of the group such that the length of no link is greater than δ, or that the average distance between all members of the group is less than δ, etc. The usual technique is to let δ vary from a maximum to a minimum (sorting upwards) or from a minimum to a maximum (sorting downwards), and to examine how these groups split or combine at the different levels of δ. Not only do different criteria give different groups, but some criteria form different groups for the same value of δ when sorting upwards from when sorting downwards.
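The chain criterion above amounts to taking connected components of the graph whose edges join pairs of individuals at distance no greater than δ. The following is an illustrative sketch of our own (not from the paper), using a small made-up distance matrix:

```python
# Sketch of the chain (single-link) grouping criterion: individuals fall in
# the same group when a chain of links, each of length <= delta, joins them.
import numpy as np

def chain_groups(D, delta):
    """Connected components of the graph with an edge (i, j) when D[i, j] <= delta."""
    n = len(D)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if D[i, j] <= delta:
                parent[find(i)] = find(j)   # union the two components
    comps = {}
    for i in range(n):
        comps.setdefault(find(i), []).append(i)
    return sorted(comps.values())

D = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0],
              [3.0, 1.0, 0.0]])
print(chain_groups(D, 1.5))   # [[0, 1, 2]]: the chain 0-1-2 links all three
print(chain_groups(D, 0.5))   # [[0], [1], [2]]: no links at this delta
```

Note how lowering δ splits the single group, which is the "sorting" behaviour described above.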
We are not primarily concerned with cluster analysis here but only wish to emphasize that the concept of distance between individuals is fundamental to these techniques.
Recognizing the metricnature of the matrix A, several workershave endeavoured to
constructmodels in two or three dimensionswhich reflectthe inter-individualdistances;
see, for example, Lysenko & Sneath (1959) and Bidwell & Hole (1964). One of the by-products of most standard multivariate statistical techniques is a representation of the multivariate sample in a small number of dimensions. There is a growing tendency to use
these techniques formally on association matrices to derive multi-dimensional representations of the sample, although the standard underlying assumptions are not even approximately satisfied. In particular the association matrix is an n × n matrix formed from the comparison of all pairs of individuals, i.e. a so-called Q matrix, whilst standard techniques postulate a v × v dispersion or correlation matrix, i.e. a so-called R matrix, formed from
comparisonsbetweenthe variates. Despite this the techniques have been used successfully,
in the sense that the expected relative magnitudesof inter-individualdistances have been
recovered. In this paper we shall investigate the extent to which this use is valid. To do this we shall need the method of principal components analysis, which is therefore discussed in § 2.
Table 1. The latent vectors written as the columns of a square array; the rth column is the vector c_r, headed by its root λ_r (roots λ_1, λ_2, ..., λ_n)
variate and a score of 1 given if this level occurs and 0 if it does not occur; the problem then reduces to that of (0, 1) data discussed in § 4.1. An alternative treatment frequently used for qualitative data, and sometimes for quantitative data, is to calculate an n × n Q matrix of coefficients of association between individuals. The co-ordinates of the point Q_i corresponding to the ith individual are now defined by the ith component of each of the n vectors of the Q matrix. It is hoped to get a low-dimensional representation of the sample by using co-ordinate axes corresponding to only a few large latent roots. Such methods have been found to lead to reasonable results, but the underlying theory is not so well understood as is that of principal components analysis. In particular two questions arise: what is the correct scaling for each latent vector and what is the distance Δ(Q_i, Q_j) when this scaling is used? In the remainder of this section we first answer these questions, then show how the method can be improved and give a few ancillary results.
Suppose A is a symmetric matrix of order n with latent roots λ_1 ≥ λ_2 ≥ ... ≥ λ_n and associated column vectors c_1, c_2, ..., c_n. We may write the vectors down in a square array as in Table 1, where the rth column is the vector c_r. The technique outlined above is to take the elements of the ith row as the co-ordinates of a point Q_i in n space, and the distance d_ij between Q_i and Q_j is then given by

d_ij^2 = Σ_{r=1}^{n} c_ir^2 + Σ_{r=1}^{n} c_jr^2 − 2 Σ_{r=1}^{n} c_ir c_jr.   (2)

Now it is a well-known property of latent vectors that if they are normalized so that the sums of squares of their elements are equal to their corresponding latent roots then

A = c_1 c_1′ + c_2 c_2′ + ... + c_n c_n′.   (3)

Thus a_ii = Σ_{r=1}^{n} c_ir^2 and a_ij = Σ_{r=1}^{n} c_ir c_jr,

so that (2) becomes d_ij^2 = a_ii + a_jj − 2a_ij. Thus when the vectors of Table 1 have been normalized so that Σ_i c_ir^2 = λ_r,

{Δ(Q_i, Q_j)}^2 = a_ii + a_jj − 2a_ij.   (4)
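The identity (4) is easy to verify numerically. The sketch below is our own illustration (not from the paper), using a positive semi-definite matrix built from random data:

```python
# Check of (4): scale each latent vector so its sum of squares equals its
# root; the resulting points then satisfy d_ij^2 = a_ii + a_jj - 2 a_ij.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
A = X @ X.T                              # a symmetric (here p.s.d.) matrix

lam, C = np.linalg.eigh(A)               # columns of C are unit latent vectors
Q = C * np.sqrt(np.clip(lam, 0, None))   # normalize: sum of squares = root

i, j = 1, 3
d2 = np.sum((Q[i] - Q[j]) ** 2)
print(np.isclose(d2, A[i, i] + A[j, j] - 2 * A[i, j]))   # True
```

The scaling step is exactly the normalization that makes (3) hold, since Q Q′ = C diag(λ) C′ = A.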
The points Q_i often have the right sort of metric properties for representing the inter-relationships between the individuals when A is an association matrix. For if A is a similarity matrix or a formal product-moment correlation matrix between the individuals, Δ(Q_i, Q_j) will be zero for complete identification and will attain its maximum value for complete opposites. In both these cases the diagonal elements of A are unity and (4) becomes {Δ(Q_i, Q_j)}^2 = 2(1 − a_ij).

Using equation (6) it is possible to relate the roots μ to the roots λ. First, (6) has a zero root, because it is always possible to fit n points into n − 1 dimensions, and secondly the remaining roots of (6) are separated by the roots λ. Thus a knowledge of the λ_i will tell us quite a lot about the best fit to be obtained from a principal components analysis of the Q_i.
If two, or more, of the λ_i are negative then we must have one, or more, negative values of μ, indicating that points cannot be found in real space such that d_ij^2 = a_ii + a_jj − 2a_ij. When the negative μ have small modulus they will have little effect on the values of d_ij as calculated from the real co-ordinates and will give no trouble. If, however, a large negative μ occurs it may still be possible to find a low-dimensional model to give the right order of magnitude of d_ij^2, but this model will involve one or more purely imaginary sets of co-ordinate axes which will upset intuitive ideas of distance. Thus the points Q_1(1, i) and Q_2(−1, −i) will in fact be zero distance apart but in any real model will appear distant. Thus it becomes important to examine the conditions under which the matrix A will give rise to a real configuration.
Obviously it is sufficient for A to be positive semi-definite (p.s.d.), for then all λ and hence all μ are non-negative. On the other hand if there is more than one negative λ a real configuration is impossible because this implies at least one negative μ. If there is just one negative root λ_n then a real configuration is possible only when μ_n ≥ 0. If A has a zero root then a necessary and sufficient condition for a real configuration is that A is p.s.d., because if A has a negative root so will B. It is always possible to adjust A so that it has a zero root without altering the distances Δ(Q_i, Q_j), for if ā_i is the mean value of the ith row (or column) of A and ā is the overall mean we can define a matrix α with elements given by

α_ij = a_ij − ā_i − ā_j + ā,   (7)

preserving the distance property. The sum of every row and column of α is zero and consequently α has a zero root.
A set of necessary and sufficient conditions for a real configuration of points Q_i to exist such that the distances d_ij are given by d_ij^2 = a_ii + a_jj − 2a_ij is that α be p.s.d. For many frequently used coefficients of association it can be shown that A has this property. An account of this work will be given elsewhere.
We have now established that if we are given an association matrix A we can find a set of points Q_i (i = 1, ..., n) in n space such that the distance d_ij between Q_i and Q_j is given by d_ij^2 = a_ii + a_jj − 2a_ij and that for many coefficients of association this is a desirable property.
Furthermore, it is legitimate to use the method of principal components on the co-ordinates of the Q_i to find the best fit in fewer dimensions. This method requires a two-stage computational procedure and at each stage the latent roots and vectors of an n × n matrix are required. It would be advantageous if these two stages could be collapsed into one. From (5) we see that B will become the diagonal matrix diag (λ_1, λ_2, ..., λ_n) if c̄_i = 0 for all i, and under these conditions the points Q_i are themselves the principal component projections, because the direction cosines defined by the vectors of any diagonal matrix are parallel to the original axes.
Because the rows of α all sum to zero the vector 1 = (1, 1, ..., 1)′ is the latent vector of α corresponding to the zero root, so that if v = (v_1, v_2, ..., v_n)′ is any other latent vector, then 1′v = 0 and thus Σ v_i = 0.
If now Table 1 has arisen from α rather than from A we have c̄_i = 0, either because λ_i = 0 or by the above result. It follows from the previous paragraph that the points Q_i derived from α are the principal component fit to the co-ordinates that would be arrived at after the two-stage calculation starting from A.
The computational procedure suggested is:
(i) Form the association matrix A.
(ii) Transform this to α by equation (7).
(iii) Construct Table 1 by finding the latent roots and vectors of α, scaling each vector so that its sum of squares is equal to its corresponding latent root.
(iv) The ith row of Table 1 may now be regarded as the co-ordinates of a set of points Q_i whose distances apart are given by the best approximations to (a_ii + a_jj − 2a_ij)^½ in the chosen number of dimensions.
(v) As in all principal components analyses, the sum of squares of the residuals (i.e. perpendiculars on to the reduced k-dimensional representation) will be the difference between the trace of α and the sum of the k largest roots of α.
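Steps (i)-(v) can be sketched compactly in matrix form. The following is a minimal sketch of our own (the function name and test data are ours, not the paper's), using the fact that double-centring with I − 11′/n implements equation (7):

```python
# Sketch of the suggested procedure: double-centre A, extract latent roots
# and vectors, scale vectors to their roots, and report the residual of (v).
import numpy as np

def principal_coordinates(A, k):
    """Steps (ii)-(iv): centre A to alpha, find its latent roots and vectors,
    and scale each vector so its sum of squares equals its root."""
    n = A.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n     # centring operator
    alpha = H @ A @ H                       # rows and columns of alpha sum to zero
    lam, C = np.linalg.eigh(alpha)
    order = np.argsort(lam)[::-1]           # largest roots first
    lam, C = lam[order], C[:, order]
    coords = C[:, :k] * np.sqrt(np.clip(lam[:k], 0.0, None))
    residual = np.trace(alpha) - lam[:k].sum()   # step (v)
    return coords, residual

# Usage: for a rank-2 p.s.d. A, two dimensions recover the distances exactly.
rng = np.random.default_rng(1)
X = rng.standard_normal((6, 2))
A = X @ X.T
Q, res = principal_coordinates(A, 2)
d2 = np.sum((Q[0] - Q[1]) ** 2)
print(np.isclose(d2, A[0, 0] + A[1, 1] - 2 * A[0, 1]))   # True
```

Because A here has rank two, the residual of step (v) is zero to rounding error; for higher-rank matrices it measures the distortion of the k-dimensional fit.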
Results similar to those given above have previously been reported by Torgerson (1958, p. 123) but this author does not appear to realize the close connexion with principal components analysis and suggests that α can be analysed by any factor analytical method; such an approach would only obscure interpretation. Rao (1964), in a paper surveying the different uses of principal components, has quoted Torgerson's results but overlooks the fact that the co-ordinates obtained from α have their centroid as origin, for he advocates shifting the origin to the centroid by replacing α by (I − 11′/n) α. Because the column means of α are zero, 1′α is a null row vector and the formula leaves α unchanged.
which is the measure of taxonomic distance proposed by Sokal and others (Sokal & Sneath, 1963, p. 147). The technique of § 3 used on the matrix
Matching coefficients. Suppose now that X is composed entirely of (0, 1) data. When comparing two individuals we have, in the usual notation, a (1, 1), b (0, 1), c (1, 0) and d (0, 0) pairs, where a + b + c + d = v, the number of variates. Sokal & Sneath (1963, p. 125) give a list of different matching coefficients which have been used from time to time. We are here only concerned with S_ij = (a + d)/v and note that
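The counts a, b, c, d and the coefficient S_ij are easy to form directly from the (0, 1) scores. A minimal sketch of our own (the example vectors are ours):

```python
# Counting the four pair types for two (0,1) individuals and forming the
# simple matching coefficient S_ij = (a + d)/v.
import numpy as np

xi = np.array([1, 0, 1, 1, 0])      # two individuals on v = 5 (0,1) variates
xj = np.array([1, 1, 0, 1, 0])
a = np.sum((xi == 1) & (xj == 1))   # (1,1) pairs: positive matches
b = np.sum((xi == 0) & (xj == 1))   # (0,1) pairs
c = np.sum((xi == 1) & (xj == 0))   # (1,0) pairs
d = np.sum((xi == 0) & (xj == 0))   # (0,0) pairs: negative matches
v = xi.size
S = (a + d) / v                     # simple matching coefficient
print(a + b + c + d == v, S)        # True 0.6
```

Here three of the five variates agree, so S_ij = 3/5.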
5. FACTOR ANALYSIS

In the last paragraph we showed that a reasonable measure of distance between sample members of a single population does not lead to a reduction in the number of dimensions of the sample from the smaller of n − 1 and v. Factor analysis achieves this by assuming that each variate can be expressed as a linear compound of k (< v) hypothetical variates, the common factors, plus an additional term depending only on the particular variate and known as the specific factor. Algebraically the variates are related to each other by

x_i = Σ_{j=1}^{k} l_ij f_j + e_i   (i = 1, 2, ..., v).
5.1. Distances between individuals

It is usual in factor analysis to concentrate interest on the factor loadings but we are more concerned here with the distances between the individuals, regarding their factor scores as co-ordinates. We first note that the fundamental equation of factor analysis may be written

x − e = Lf.   (16)
Thus for any estimates of f for a set of individuals the corrected scores x − e lie in k-space, as do the factor scores themselves. The factors f are uncorrelated with unit variance, though this is not true of the estimated factor scores using the methods discussed in the next section. Absence of correlation is retained for any orthogonal rotation of axes in factor space. Throughout this section we assume the theoretical unit dispersion matrix for the factor scores, so that distances between the individuals with factor scores as co-ordinates can be calculated directly using D^2 with a unit dispersion matrix. The dispersion matrix for the corrected scores is C − V or LL′, which has rank k and therefore has no inverse in the ordinary sense. A slight extension to the definition will allow us to define D^2 for singular dispersion matrices; details are given in the Appendix.
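The Appendix's extended definition can be realized with a generalized inverse. The sketch below is our own illustration, taking the Moore-Penrose pseudoinverse as one admissible choice (the paper leaves the choice to the estimation method); the loadings matrix is invented for the example:

```python
# D^2 with a singular dispersion matrix: use a generalized inverse of LL'.
import numpy as np

def D2_singular(dy, Sigma):
    """Squared generalized distance dy' Sigma^- dy, with the Moore-Penrose
    pseudoinverse standing in for the generalized inverse."""
    return float(dy @ np.linalg.pinv(Sigma) @ dy)

# LL' with L of rank k < v has no ordinary inverse, but differences lying
# in the factor space still receive a well-defined distance.
L = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])           # v = 3 variates, k = 2 common factors
Sigma = L @ L.T                      # rank-2 dispersion of the corrected scores
df = np.array([1.0, -1.0])           # difference of two factor-score vectors
dy = L @ df                          # corresponding corrected-score difference
print(np.isclose(D2_singular(dy, Sigma), df @ df))   # True: equals unit-dispersion D^2
```

This reproduces the equality claimed in the text: the distance between corrected scores, measured with (LL′)⁻, equals the distance between the factor scores measured with the unit dispersion matrix.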
Equation (16) relates the v variates x - e to the k variates f as describedin the Appendix
and therefore,using the extended definitionof D2, the distances between the estimated
factor scores are equal to the distances between the observed variates correctedfor the
estimated specifics.This resultis independentof the particularmethod used forestimating
the factorscores.
An alternativeway of lookingat the factoranalysis model is to regardthe v-space of the
x-variatesas embedded in a (v + k)-space whose axes are orthogonal,v ofthese representing
the specificsand k the common factors. Correctingthe x's for the specificsensures that
a k-space is attained.
Writing δy for (x_i − e_i) − (x_j − e_j) and δf for f_i − f_j, the differences between the corrected scores and the estimated factor scores respectively of two individuals i and j, the equality between the two distances gives

δf′ M δf = δy′ (LL′)⁻ δy,   (17)

where (LL′)⁻ is a suitably chosen generalized inverse of LL′ depending on the method used for estimating factor scores; see below.
5.2. Distances using particular estimates of factor scores

There are two closely related methods for estimating factor scores, one due to Bartlett (1938) and the other to Thomson (1951); see also Lawley & Maxwell (1963, p. 88). For each of these methods we can find an algebraic form for f̂′ M f̂ in terms of the original data and the matrices C, V, L.
Bartlett's method estimates the factor scores f by

f̂ = J⁻¹ L′ V⁻¹ x,   (18)
5.3. Relation of factor analysis to principal components

This investigation of distance in factor analysis should not be taken to imply that the writer recommends the use of the method in the type of situation considered earlier where the multivariate sample cannot be considered as homogeneous. The method can be legitimately used only in those situations where its underlying assumptions are at least approximately fulfilled; in many published applications this is not so.
Cattell (1965), a leading exponent of factor analysis, discussing the heterogeneous population problem recommends that a cluster analysis should precede factor analysis in an attempt to separate out the different populations or 'species'. We can then separately factor analyse the members of each species to find factors that define individuals within the species, and do an inter-species factor analysis on the species means to find the factors that differentiate species. This is contrary to common practice, which uses factor analysis on the complete heterogeneous sample as an alternative to cluster analysis in the hope that the first factor will differentiate species; but as Cattell points out, 'the factors obtained will neither be clear species differentiators nor optimum individual differentiators'. In the model-making situations preliminary to classification, discussed earlier, there usually seems to be little justification for using factor analysis and the simpler computational methods given above in § 3 are just as useful. It seems that the reason for obtaining meaningful results when using factor analysis in these situations is that the results obtained are often very close to the results obtained by a principal components analysis.
Some insight into the relationship between the two methods can be obtained by noting that the maximum likelihood estimates of factor loadings satisfy the equations

J L′V^{-1/2} = L′V^{-1/2} (V^{-1/2} A V^{-1/2} − I).   (27)

This is equation (2.11) of Lawley & Maxwell (1963) in a rearranged form. Thus L′V^{-1/2} are the first k latent vectors of V^{-1/2} A V^{-1/2} − I and J is the diagonal matrix of the latent roots. By setting J = L′V⁻¹L the vectors are scaled so that the sums of squares of their elements equal the roots. Alternatively L′V^{-1/2} are the latent vectors of V^{-1/2} A V^{-1/2} with roots given by I + J. In either case L′V^{-1/2} are the vectors of a matrix which is put into non-dimensional form by scaling by the diagonal matrix V^{-1/2}. The situation is very close to the principal components analysis of the correlation matrix, where V^{-1/2} is replaced by S⁻¹, the matrix of standard errors given by S^2 = diag (A). Regarding J^{-1/2} L′V^{-1/2} as direction cosines derived from a principal components analysis of V^{-1/2} A V^{-1/2} we see that the projected values of the scaled co-ordinates V^{-1/2} x onto k dimensions are given by

v = J^{-1/2} L′V⁻¹ x.   (28)

This result differs from Bartlett's and Thomson's estimates of factor scores only in a scale factor along each of the k axes so that, except in exceptional circumstances, the same sort of spatial configuration can be expected from all the methods. Thus when V is proportional to S^2 a principal components analysis can be expected to give similar results to a factor analysis.
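The per-axis scale relationship between (18) and (28) is easy to confirm numerically. The sketch below is our own construction (not the paper's data): we choose L and V so that J = L′V⁻¹L is diagonal, as under the canonical maximum-likelihood constraint, and check that the two score vectors differ only by the scale J^{1/2} along each axis:

```python
# Compare Bartlett's factor-score estimate (18) with the scaled principal
# component projection (28); with J diagonal they differ only axis by axis.
import numpy as np

V = np.diag([1.0, 2.0, 0.5, 1.5])           # specific variances (invented)
L = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 0.0],
              [0.0, 3.0]])                   # chosen so L'V^{-1}L is diagonal
Vinv = np.linalg.inv(V)
J = L.T @ Vinv @ L                           # diagonal by construction here
x = np.array([0.3, -1.2, 0.7, 0.5])          # one individual's corrected scores

f = np.linalg.inv(J) @ L.T @ Vinv @ x                       # Bartlett's estimate (18)
v = np.diag(1.0 / np.sqrt(np.diag(J))) @ L.T @ Vinv @ x     # projection (28)
print(np.allclose(v, np.sqrt(np.diag(J)) * f))              # True: v = J^{1/2} f
```

Rescaling each axis leaves the spatial configuration of the individuals the same apart from a stretch along each axis, which is the point made in the text.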
6. PROXIMITY ANALYSIS

An interesting technique has been described by Shepard (1962a, b) and further developed by Kruskal (1964a, b), who name their method Proximity Analysis. The method assumes the matrix A to be available and endeavours to find a set of points Q_i in a reduced number of dimensions k such that the distances Δ(Q_i, Q_j) are monotonically related to the a_ij. Rather empirical methods are given for deciding on the optimum value of k required to preserve the monotonic relationship without using too many dimensions. The relationship of the a_ij to the Δ(Q_i, Q_j) gives the monotonic transformation of a_ij required to reduce dimensionality.
For example, the monotonic transformation from a_ij to −log a_ij may be such that points Q_i can be found in, say, two dimensions so that the ranking of the distances Δ(Q_i, Q_j) is very nearly that of the −log a_ij, and if this is so the graph of a_ij against Δ(Q_i, Q_j) should recover this transformation. The transformation, although monotonic, need not have any simple functional form. Shepard has produced examples to show that the method does indeed recover a known transformation when it is used on dummy data in two or three dimensions.
The main interest of Shepard's work is that the numerical values of the a_ij are not used; only the rank order of their sizes is important. Shepard argues that when all the n(n−1)/2 distances are known, a knowledge of their rank order, rather than the actual distances, entails very little loss of information.
In § 3, when all the diagonal elements are originally 1, before calculating the α matrix, we have shown that the distances are given by {2(1 − a_ij)}^½ and this is a particular monotonic transformation of the a_ij. Consequently if we get a good fit in, say, two dimensions using this distance function, we would expect high correlation with Shepard's solution. However, Shepard may be able to find a good fit in a low number of dimensions where the method of § 3 cannot. We are in effect using a particular distance function, though there is nothing to stop us transforming the values of a_ij ourselves if we have good reason to think that such a transformation will better reflect the inter-relationships between the individuals. As all similarities are positive numbers not greater than one, the points given by § 3 must lie within an (n − 1)-dimensional regular simplex. Thus the points for any three similarities s_12, s_23, s_31 lie inside an equilateral triangle of side √2. The logarithmic transformation will convert this restricted space to one where a pair of completely dissimilar individuals will become points an infinite distance apart, but it may introduce difficulties should no solution exist in real space.
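Since Shepard's method uses only the rank order of the a_ij, any two strictly monotonic transformations of the similarities are equivalent for it. A small check of our own (the similarity values are invented) that {2(1 − a_ij)}^½ and −log a_ij rank the pairs identically:

```python
# Both transformations are strictly decreasing in the similarity, so they
# induce the same rank order on the pairs, which is all Shepard's method uses.
import numpy as np

s = np.array([0.9, 0.5, 0.8, 0.2, 0.6])   # similarities a_ij for five pairs
d1 = np.sqrt(2 * (1 - s))                 # the distance of section 3
d2 = -np.log(s)                           # the logarithmic transformation
print(np.array_equal(np.argsort(d1), np.argsort(d2)))   # True: same rank order
```

The two distance functions differ in scale and in their behaviour as a_ij → 0 (bounded by √2 versus unbounded), which is the trade-off discussed above.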
The relationship between Shepard's solution and the principal components solution needs investigation, but as Shepard's computations are much more complex than those required to find latent roots and vectors of a symmetric matrix it is suggested that the method of § 3 be tried first, and if this does not lead to a solution of sufficiently low dimensionality then the co-ordinates found in n − 1 dimensions can be used as a starting-point for Shepard's iterative method.
7. CONCLUSION

The work reported in this paper grew from a dissatisfaction with the many reported applications of factor analysis and principal components analysis of Q matrices found in classification work, particularly in the biological literature. The interpretation of such methods can be better understood by examining the distances, suitably defined, between the individuals, and we have given a method for finding co-ordinates for each individual referred to principal axes which preserve these distances. To distinguish the method from classical principal components analysis it might be useful to refer to it as a principal co-ordinates analysis.
Perhaps the most important aspect of this type of analysis is the suitable choice of a distance function. When identification is required D^2 has certain optimal properties for normal populations, whilst for non-normal populations functions similar to D^2 could be devised which would immediately provide a canonical analysis and might lead to a practical solution of the discriminant problem. Rao (1948) has suggested a general approach to the problem of distance which might be useful here. Unfortunately it is not always possible to recognize the different populations in the original sample of individuals and these must first be found by using an analysis based on similarities or distances which do not allow for within-population correlation. Further examination of the properties of such distances is needed.
REFERENCES