The Interpretation of
Ecological Data
A Primer on Classification and Ordination
E. C. Pielou
University of Lethbridge
A Wiley-Interscience Publication
JOHN WILEY & SONS
New York • Chichester • Brisbane • Toronto • Singapore
Preface
The aim of this book is to give a full, detailed, introductory account of the
methods used by community ecologists to make large, unwieldy masses of
multivariate data comprehensible and interpretable. I am convinced
that there is need for such a book. There are more advanced books that
cover some of the same material, for example, L. Orlóci's Multivariate
Analysis in Vegetation Research (W. Junk, 1978) and A. D. Gordon's
Classification: Methods for the Exploratory Analysis of Multivariate Data
(Chapman and Hall, 1981), but they assume a much higher level of
mathematical ability in the reader than does this book. There are also more
general discussions of the material, as in H. G. Gauch's Multivariate
Analysis in Community Ecology (Cambridge University Press, 1982), or
many of the chapters in the two volumes edited by R. H. Whittaker,
Ordination of Plant Communities and Classification of Plant Communities
(W. Junk, 1978), but they are more concerned with general principles, and
with comparing the merits of different methods, than with explanations of
the actual details of individual techniques.
Such explanations are sorely needed. Through this century, ecologists
have used a series of aids to assist them in number-crunching, from tables of
logarithms, through mechanical, then electrical, and then electronic desk
calculators, to programmable computers. There has been no need for
ecologists to understand how these aids work. But over the past decade, a
new supply of "crutches" has appeared on the scene, namely, packaged
computer programs. These enable ecologists to carry out elaborate analyses
of their data without having to write their own programs. The packaged
programs are often long and intricate, the work of computer experts. It
would be unreasonable to demand that ecologists refrain from turning to
these experts for help.
However, it is not unreasonable to expect ecologists to understand what
the programs are doing for them even if they do not understand how. There
is a world of difference between the person who uses a ready-made program
to find the eigenvalues and eigenvectors of a large matrix and who under-
stands what these things are, and the person who delegates the whole task of
doing a principal component analysis (for instance) to such a program with
no understanding of what the analysis does. Packaged programs are a mixed
blessing. While they make it possible to analyze large bodies of data quickly,
accurately, and in a way that best reveals their ecological implications, they
also make it possible for inadequately trained people to go through the
motions of data analysis uncomprehendingly.
This book is designed to help those who want to gain a complete
understanding of the most popular techniques for analyzing multivariate
data. It does not offer any computer programs. Instead, it demonstrates and
explains the techniques using artificial data simple enough for all the steps
in an analysis to be followed in detail from start to finish. The prerequisites
are a knowledge of elementary algebra and coordinate geometry at about
the first year undergraduate level. To make the book useful for self-instruc-
tion, exercises have been given at the ends of the chapters. The answers to
them, and a comprehensive glossary, are at the end of the book.
I have written this book while holding an Alberta Oil Sands Technology
and Research Authority (AOSTRA) Research Professorship at the Univer-
sity of Lethbridge. I am greatly indebted to the Authority, and the Univer-
sity, whose support has made the work possible. I also thank William
Sllllenk of Lethbridge, who prepared all the figures.
E. C. PIELOU
Lethbridge, Alberta, Canada
April 1984
Contents

1 INTRODUCTION 1
1.1 Data Matrices and Scatter Diagrams, 3
1.2 Some Definitions and Other Preliminaries, 8
1.3 Aim and Scope of This Book, 11

2 CLASSIFICATION BY CLUSTERING 13
2.1 Introduction, 13
2.2 Nearest-Neighbor Clustering, 15
2.3 Farthest-Neighbor Clustering, 22
2.4 Centroid Clustering, 25
2.5 Minimum Variance Clustering, 32
2.6 Dissimilarity Measures and Distances, 40
2.7 Average Linkage Clustering, 63
2.8 Choosing Among Clustering Methods, 72
2.9 Rapid Nonhierarchical Clustering, 76
Exercises, 79

3 TRANSFORMING DATA MATRICES 83
3.1 Introduction, 83
3.2 Vector and Matrix Multiplication, 85

4 ORDINATION 133
4.1 Introduction, 133
4.2 Principal Component Analysis, 136
4.3 Four Different Versions of PCA, 152
4.4 Principal Coordinate Analysis, 165
4.5 Reciprocal Averaging or Correspondence Analysis, 176
4.6 Linear and Nonlinear Data Structures, 188
4.7 Comparisons and Conclusions, 197
Exercises, 199
Introduction
Probably all ecologists are familiar with field notebooks whose pages look
something like Figure 1.1. Probably most ecologists, even those still at the
beginnings of their careers, have drawn up tables like it. Their efforts may
be neater or messier, depending on the person and the circumstances (wind,
rain, mosquitoes, gathering darkness, a rising tide, or any other of the
stresses an ecologist is subject to). But such tables are all the same in
principle. They show the values of each of several variables (e.g., species
quantities) in each of several sampling units (e.g., quadrats). Tables such as
these are the immediate raw material for community study and analysis.
Although natural, living communities as they are found in the field are,
of course, an ecologist's ultimate raw material, it is impossible to come to
grips with them mentally without first representing them symbolically. A
table such as that in Figure 1.1, which is part of a data matrix,* is a typical
symbolic representation of a natural community. It is the very first representation
from which all subsequent analyses, and their representations, flow.
Therefore it is the first link in the chain leading from an actual, observed
community, to a theory concerning the community, and possibly to more
inclusive theories concerning ecological communities generally.
The interpretation of such data matrices is the topic of this book. This
introductory chapter first describes in general outline the procedures that
make data interpretation possible. Then, as a necessary preliminary to all
*Words italicized in the text are defined in the Glossary at the end of the book.
Figure 1.1. A page from a field notebook. This one records observations on the ground
in the floodplain of the Athabasca River, Alberta.
1.1. DATA MATRICES AND SCATTER DIAGRAMS

Consider the following data matrix:
0 0 3 0 1 1 4 3 0
0 0 0 0 0 4 0 1 2
0 4 1 0 2 0 1 0 0
3 0 0 4 1 0 2 3 2
1 3 4 0 3 0 2 0 0
4 0 1 3 2 0 3 2 1
2 2 3 1 4 0 3 0 0
1 0 0 2 0 2 0 3 4
0 0 0 1 0 3 0 2 3
4 3 0 1 2 2 3 0 1

As always, the rows represent species and the columns represent sampling
units. This matrix, undeniably, lacks any evident structure. But now
suppose the sampling units (columns) and the species (rows) were rearranged
in an appropriate fashion. The result is the "arranged matrix"

4 3 2 1 0 0 0 0 0
3 4 3 2 1 0 0 0 0
2 3 4 3 2 1 0 0 0
1 2 3 4 3 2 1 0 0
0 1 2 3 4 3 2 1 0
0 0 1 2 3 4 3 2 1
0 0 0 1 2 3 4 3 2
0 0 0 0 1 2 3 4 3
0 0 0 0 0 1 2 3 4
0 0 0 0 0 0 1 2 3

This matrix contains exactly the same information as the first; only the
orderings of the species, and of the sampling units, differ.
1.2. SOME DEFINITIONS AND OTHER PRELIMINARIES

Before proceeding it is most important to give the definitions used in this
book of two terms much used in community ecology: sample and clustering.
                              Statisticians and
                              Some Ecologists      Other Ecologists
Single unit (e.g., quadrat)   Sampling unit        Sample
Collection of units           Sample               Sample set
This book uses the terms in the left-hand column. Neither terminology is
entirely satisfactory, however, because it is a nuisance to have to use a
two-word term (either sampling unit or sample set) to denote a single entity.
Therefore, in this book I have used the word quadrat* to mean a sampling
unit of any kind, and have occasionally interpolated the additional words
"or sampling unit" as a reminder. This is a convenient solution to the
problem but it remains to be seen whether it will satisfy ecologists whose
sampling units are emphatically not quadrats: for example, students of river
fauna who collect their material with Surber samplers; palynologists, whose
sampling unit is a sediment core; planktonologists whose sampling unit is
the catch in a sampling net; entomologists, whose sampling unit is the catch
in a sweep net or a light trap; diatom specialists, whose sampling unit is a
microscope slide; foresters, whose sampling unit is a plot or stand that is
much larger than a traditional quadrat although, like a quadrat, it is a
delimited area of ground.
*The term quadrat is surely familiar to all ecologists. A. G. Tansley and T. F. Chipp, in their
classic Aims and Methods in the Study of Vegetation (British Empire Vegetation Committee,
1926), define a quadrat as simply "a square area temporarily or permanently marked off as a
sample of any vegetation it is desired to study closely." A more modern definition would omit
the word "square"; a quadrat can be any shape. Note that although the definition says
nothing about size, in ordinary usage a quadrat is thought of as smaller than a "plot."
However, there is no agreed upon upper limit to the size of a quadrat, nor lower limit to the
size of a plot; some "marked off areas" could reasonably be called by either name.
As far as possible I have avoided the word sample because of its
ambiguity, but where it does appear it is used in the statistical sense to
mean a whole collection of quadrats.
The word clustering is also ambiguous. Some ecologists treat it as
synonymous with agglomerative classification. Other ecologists treat it as
synonymous with classification in the general sense, including both
agglomerative and divisive methods. Schematically, the two possibilities
are as follows:

      Classification                  Classification (= clustering)
     /              \                    /               \
Agglomerative    Divisive         Agglomerative        Divisive
(= clustering)
1.3. AIM AND SCOPE OF THIS BOOK

The material covered in this book is listed in the Table of Contents. The aim
of the book (to paraphrase what has already been said in the Preface) is to
explain fully and in detail, at an elementary level, exactly how the tech-
niques described actually work.
Packaged computer programs are readily available that can perform all
these analyses quickly and painlessly. Too often users of these ready-made
programs do no more than enter their data, select a few options, and accept
the results in the printout with no comprehension of how the results were
derived. But unless one understands a technique, one cannot intelligently
judge the results.
Anyone who uses a ready-made program, for instance, to do a principal
component analysis, should be capable of doing the identical analysis of a
small, manageable, artificial data matrix entirely with a desk calculator, or …
Classification by Clustering
2.1. INTRODUCTION
Certain decisions need to be made before this process can be carried out.
The questions to be answered are:
1. How shall the similarity (or its converse, the dissimilarity) between two
individual quadrats be measured?
2. How shall the similarity between two clusters be measured when at least
one and possibly both clusters have more than one member quadrat?
The squared distance between quadrats j and k is

d²(j, k) = Σ_{i=1}^{s} (x_ij − x_ik)²,

where s is the number of species and x_ij is the amount* of species i in
quadrat j.
*Instead of measuring the amount of a species in a quadrat by counting the number of
individuals, it is sometimes preferable to measure the biomass or, for many plant species, the
areal "cover."
2.2. NEAREST-NEIGHBOR CLUSTERING
Figure 2.1. The distance between points 1 and 2, with coordinates (x11, x21) and (x12, x22),
respectively, is d(1, 2). From Pythagoras's theorem, d²(1, 2) = (x11 − x12)² + (x21 − x22)².
EXAMPLE. The following is a demonstration of the procedure applied to an
artificially simple data matrix representing the amounts of 2 species in 10
quadrats. With only two species, it is possible to plot the data points in a
plane and outline the successive clusters in the order in which they are
created (see Figure 2.3).
Table 2.1 shows the data matrix (Data Matrix #1) and, below it, the
distance matrix. In the distance matrix the numerical value of, for example,
d(3, 5), the distance between the third and the fifth points, appears in the
(3, 5)th cell, which is the cell where the third row crosses the fifth column. It
is d(3, 5) = 12.5. Since the distance matrix must obviously be symmetrical,
only its upper right half is shown.
The smallest distance in the matrix (in boldface type) is d(5, 8) = 2.2.
Therefore, the first cluster is formed by uniting quadrats 5 and 8. We call the
cluster [5, 8].
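The entries of the distance matrix can be recomputed directly from the data; the following is a minimal Python sketch (not part of the original text; the coordinates are those of Table 2.1):

```python
import math

# Data Matrix #1: quantities of 2 species in 10 quadrats (Table 2.1).
species1 = [12, 20, 28, 11, 22, 8, 13, 20, 39, 16]
species2 = [30, 18, 26, 5, 15, 34, 24, 14, 34, 11]

def distance(j, k):
    """Euclidean distance between quadrats j and k (1-based labels)."""
    return math.sqrt((species1[j-1] - species1[k-1]) ** 2
                     + (species2[j-1] - species2[k-1]) ** 2)

# Upper right half of the distance matrix, as printed in Table 2.1.
d = {(j, k): distance(j, k) for j in range(1, 11) for k in range(j + 1, 11)}

# The smallest distance identifies the first fusion.
pair = min(d, key=d.get)
print(round(d[(3, 5)], 1))      # 12.5
print(pair, round(d[pair], 1))  # (5, 8) 2.2
```

The smallest entry is indeed d(5, 8) = 2.2, so quadrats 5 and 8 unite first.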
The distance matrix is now reconstructed, as shown in Table 2.2. In this
new distance matrix the distances to every point from the newly formed
cluster [5, 8] are entered in row 5 and column 5; row 8 and column 8 are
filled with asterisks to show that quadrat 8 no longer exists as a separate
entity. The distance from [5, 8] to any point, for instance, to point 3, is the
lesser of d(3, 5) and d(3, 8). Since d(3, 5) = 12.5 and d(3, 8) = 14.4, the lesser
TABLE 2.1. DATA MATRIX #1. THE QUANTITIES OF 2 SPECIES IN
10 QUADRATS.

Quadrat    1   2   3   4   5   6   7   8   9  10
Species 1 12  20  28  11  22   8  13  20  39  16
Species 2 30  18  26   5  15  34  24  14  34  11

The Distance Matrix

Quadrat  1     2     3     4     5     6     7     8     9    10
1        0  14.4  16.5  25.0  18.0   5.7   6.1  17.9  27.3  19.4
2              0  11.3  15.8   3.6  20.0   9.2   4.0  24.8   8.1
3                    0  27.0  12.5  21.5  15.1  14.4  13.6  19.2
4                          0  14.9  29.2  19.1  12.7  40.3   7.8
5                                0  23.6  12.7   2.2  25.5   7.2
6                                      0  11.2  23.3  31.0  24.4
7                                            0  12.2  27.9  13.3
8                                                  0  27.6   5.0
9                                                        0  32.5
10                                                             0
TABLE 2.2. THE DISTANCE MATRIX AFTER THE FUSION OF QUADRATS 5 AND 8.

Quadrat  1     2     3     4   [5,8]   6     7     8     9    10
1        0  14.4  16.5  25.0  17.9   5.7   6.1    *   27.3  19.4
2              0  11.3  15.8   3.6  20.0   9.2    *   24.8   8.1
3                    0  27.0  12.5  21.5  15.1    *   13.6  19.2
4                          0  12.7  29.2  19.1    *   40.3   7.8
[5,8]                            0  23.3  12.2    *   25.5   5.0
6                                      0  11.2    *   31.0  24.4
7                                            0    *   27.9  13.3
8                                                 0     *     *
9                                                        0  32.5
10                                                             0
Figure 2.3. (a) The data points of Data Matrix #1 (see Table 2.1). The "contours" show the
successive fusions with nearest-neighbor clustering except that, for clarity, the final contour
enclosing all 10 points is omitted. (b) The corresponding dendrogram. Details are given in
Table 2.3. The height of each node in the dendrogram is the distance between the pair of
clusters whose fusion corresponds with the node.
of these values appears in the (3, 5)th cell of the reconstructed distance matrix.
The entries in the fifth row and column, which give the distances
d(j, [5, 8]) for all j ≠ 5 or 8, are the lesser of d(j, 5) and d(j, 8).
The smallest entry in the reconstructed distance matrix is 3.6 (shown in
boldface) in the (2, 5)th cell. Thus the next step in the clustering is the fusion
of point 2 with the cluster [5, 8].
TABLE 2.3. STEPS IN THE NEAREST-NEIGHBOR CLUSTERING OF DATA MATRIX #1.

Step                                       Nearest    Distance Between
Number   Fusions                           Points     Clusters
1        5, 8                              5, 8        2.2
2        [5, 8], 2                         2, 5        3.6
3        [5, 8, 2], 10                     8, 10       5.0
4        1, 6                              1, 6        5.7
5        [1, 6], 7                         1, 7        6.1
6        [5, 8, 2, 10], 4                  4, 10       7.8
7        [5, 8, 2, 10, 4], [1, 6, 7]       2, 7        9.2
8        [5, 8, 2, 10, 4, 1, 6, 7], 3      2, 3       11.3
9        [5, 8, 2, 10, 4, 1, 6, 7, 3], 9   3, 9       13.6
10       All points are in one cluster
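The whole fusion sequence of Table 2.3 can be reproduced by a short nearest-neighbor (single-linkage) routine; the following is a sketch, not part of the original text:

```python
import math

# Quadrat coordinates from Data Matrix #1 (Table 2.1).
pts = {1: (12, 30), 2: (20, 18), 3: (28, 26), 4: (11, 5), 5: (22, 15),
       6: (8, 34), 7: (13, 24), 8: (20, 14), 9: (39, 34), 10: (16, 11)}

def dist(a, b):
    return math.dist(pts[a], pts[b])

# Nearest-neighbor clustering: at each step unite the two clusters whose
# closest pair of member quadrats is nearest.
clusters = [frozenset([q]) for q in pts]
fusions = []
while len(clusters) > 1:
    a, b = min(((x, y) for i, x in enumerate(clusters) for y in clusters[i+1:]),
               key=lambda ab: min(dist(p, q) for p in ab[0] for q in ab[1]))
    d = min(dist(p, q) for p in a for q in b)
    clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    fusions.append(round(d, 1))

print(fusions)  # [2.2, 3.6, 5.0, 5.7, 6.1, 7.8, 9.2, 11.3, 13.6]
```

The printed distances match the last column of Table 2.3, step by step.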
ground.
There are occasions, however, when an "unnatural" classification (sometimes
called a dissection) is needed for practical purposes. For example, a
classification is required as a preliminary to vegetation mapping even if, in
fact, the plant communities on the ground merge into one another with
broad, indistinct ecotones. The lines separating communities on such a map
are analogous to contour lines on a relief map, and are no less useful.
How to distinguish clusters, given a dendrogram like that in Figure 2.3b,
is a matter of choice. Some common ways of classifying are as follows:
If the internodes of a dendrogram are of conspicuously different lengths,
with short ones at the bottom and long ones at the top, then it follows that
the points fall naturally into clusters without arbitrariness. Consider an
example.
TABLE 2.4. DATA MATRIX #2. THE QUANTITIES OF 3 SPECIES IN
10 QUADRATS.

Quadrat    1   2   3   4   5   6   7   8   9  10
Species 1 24  27  24   8  10  14  14  36  36  41
Species 2 32  30  29  20  18  20  22  14   9  12
Species 3  3   1   2  11  14  13  12   8   8   6
Figure 2.4. (a) The data points of Data Matrix #2 (see Table 2.4). The amount of species 3
in each quadrat is shown by the number of "spokes" attached to each point. (b) The
dendrogram yielded by nearest-neighbor clustering.
recognize them. What makes the task easy, however, is the fact that, with
only three species, the swarm of data points can be displayed in visualizable
three-dimensional space, or by the device used in Figure 2.4a of representing
the magnitude of the third coordinate by the number of "spokes"
radiating from the points in a two-dimensional coordinate frame. When
there are many species or, equivalently, when the swarm of data points
occupies a many-dimensional space, visualization becomes impossible and a
formal procedure is needed to produce a dendrogram.
Nearest-neighbor clustering is not often used in practice because it is
prone to chaining. Chaining is the tendency for early formed clusters to
grow by the accretion to them of single points one after another in
succession. The effect can be seen in Figure 2.3 where the first, tight
two-membered cluster [5, 8] picks up the points 2, then 10, and then 4, one at a
time, before uniting with another cluster containing more than one point. If
a classification is intended to define classes for a purpose such as vegetation
mapping, then a method that frequently leads to exaggerated chaining is
defective. It results in clusters of very disparate sizes. Thus, as shown before,
when the dendrogram in Figure 2.3b is used to define three clusters, two of
them are "singletons" and all the remaining eight points are lumped
together in the third cluster. For vegetation mapping, or for descriptive
classifications generally, one usually prefers a method that yields clusters of
roughly equal sizes. However, chaining may reveal a true relationship
among the quadrats. Therefore, if what is wanted is a dendrogram that is in
some sense "true to life," a clustering method that reveals natural chaining,
if it exists, certainly has an advantage.
Indeed, a dendrogram is more than merely a diagram on which a
classification can be based. It is a portrayal in two dimensions of a swarm of
points occupying many dimensions (as many as there are species). A
dendrogram need not be used to yield a classification. It can be studied in
its own right as a representation of the interrelationships among a swarm of
data points. Some workers find a dendrogram more informative than a
two-dimensional ordination.
TABLE 2.5. STEPS IN THE FARTHEST-NEIGHBOR CLUSTERING OF DATA MATRIX #1.

Step                    "Farthest"    Distance Between
Number   Fusions        Pointsª       Clusters

ªThese are the quadrats whose distance apart, shown in the last column, defines the distance
between the two clusters.
[Figure 2.5: (a) the data points of Data Matrix #1, plotted against Species 1 and Species 2;
(b) the farthest-neighbor dendrogram, with the quadrats in the order 7, 1, 6, 2, 5, 8, 4, 10, 3, 9.]
The distance between two populous neighboring clusters is often large in spite of the fact
that, as a whole, they may be very similar. Consequently, it is more likely
that an isolated unattached point at a moderate distance will be united with
one of them than that they will unite with each other. Hence an anomalous
quadrat may become a cluster member quite early in the clustering process
and the fact that it is anomalous will be overlooked.
When true natural clusters exist, the outcomes of nearest-neighbor and
farthest-neighbor clustering are usually very similar. Farthest-neighbor clustering
applied to Data Matrix #2 gives results indistinguishable from those
in Figure 2.4.
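The only change from the nearest-neighbor routine is the definition of intercluster distance. A sketch (not part of the original text) applying farthest-neighbor clustering to Data Matrix #1:

```python
import math

# Quadrat coordinates from Data Matrix #1 (Table 2.1).
pts = {1: (12, 30), 2: (20, 18), 3: (28, 26), 4: (11, 5), 5: (22, 15),
       6: (8, 34), 7: (13, 24), 8: (20, 14), 9: (39, 34), 10: (16, 11)}

def link(a, b):
    """Farthest-neighbor distance: the largest quadrat-to-quadrat distance."""
    return max(math.dist(pts[p], pts[q]) for p in a for q in b)

clusters = [frozenset([q]) for q in pts]
fusions = []
while len(clusters) > 1:
    a, b = min(((x, y) for i, x in enumerate(clusters) for y in clusters[i+1:]),
               key=lambda ab: link(*ab))
    fusions.append(round(link(a, b), 1))
    clusters = [c for c in clusters if c not in (a, b)] + [a | b]

print(fusions)  # [2.2, 4.0, 5.7, 7.8, 11.2, 13.6, 15.8, 29.2, 40.3]
```

The first fusion, [5, 8], is the same as before, but the later fusion distances grow much faster than under nearest-neighbor clustering, and no chaining occurs.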
2.4. CENTROID CLUSTERING
Centroid clustering is one of several methods designed to strike a happy
medium between the extremes of nearest-neighbor clustering on the one
hand and farthest-neighbor clustering on the other. Nearest- and farthest-
neighbor methods have the defect that they are influenced at every step by
the chance locations in the s-dimensional coordinate frame of only two
individual points. That is, it is the distance between two points only that
decides the outcome of each step. Centroid clustering overcomes this
drawback by using a definition of intercluster distance that takes account of
the locations of all the points in each cluster. To repeat, there are many
ways in which this might be done, and centroid clustering, described in this
section, is only one of the ways. A more general discussion of the various
methods and how they are interrelated is given in Section 2.7.
In centroid clustering the distance between two clusters is taken to be the
distance between their centroids. The centroid of a cluster is the point
representing the "average quadrat" of the cluster; that is, it is a hypothetical
quadrat containing the average quantity of each species, where the averaging
is over all cluster members. Hence if there are m cluster members and
s species, and if we write (c_1, c_2, …, c_s) for the coordinates of the centroid,
then

c_1 = (1/m)(x_11 + x_12 + ⋯ + x_1m) = (1/m) Σ_{j=1}^{m} x_1j

and, in general,

c_i = (1/m) Σ_{j=1}^{m} x_ij,   i = 1, 2, …, s.

For example, the centroid of the three-point cluster [2, 5, 8] in Data Matrix
#1 has coordinates [(20 + 22 + 20)/3, (18 + 15 + 14)/3] = (20.67, 15.67).
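The centroid calculation can be sketched in a few lines (not part of the original text; the three quadrats are those of Data Matrix #1):

```python
# Centroid of a cluster: the hypothetical "average quadrat" whose species
# quantities are the means over the cluster's members.
def centroid(cluster, pts):
    n = len(cluster)
    return tuple(sum(pts[q][i] for q in cluster) / n for i in range(2))

# Quadrats 2, 5, and 8 of Data Matrix #1.
pts = {2: (20, 18), 5: (22, 15), 8: (20, 14)}
print(tuple(round(c, 2) for c in centroid([2, 5, 8], pts)))  # (20.67, 15.67)
```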
The clustering procedure is carried out in the same way as for nearest-
neighbor and farthest-neighbor clustering. Thus each step consists in the
fusion of the two nearest clusters (as before, a cluster may have only a single
member); one finds which two clusters are nearest by searching the intercluster
distance matrix for its smallest element. As in the methods already
described, the distance matrix is reconstructed after each fusion by entering
Figure 2.6. The dendrogram produced by applying centroid clustering to Data Matrix #1.
TABLE 2.6. STEPS IN THE CENTROID CLUSTERING OF DATA
MATRIX #1.

Step                                         Square of
Number   Fusions                             Intercluster Distance
1        5, 8                                  5.00
2        2, [5, 8]                            13.25
3        1, 6                                 32.00
4        10, [2, 5, 8]                        43.56
5        7, [1, 6]                            73.00
6        4, [2, 5, 8, 10]                    162.50
7        3, 9                                185.00
8        [7, 1, 6], [2, 5, 8, 10, 4]         326.25
9        [7, 1, 6, 2, 5, 8, 10, 4], [3, 9]   456.83
10       All points are in one cluster
Hence the square of the distance between them is

d²(C′, C″) = (c′_1 − c″_1)² + (c′_2 − c″_2)² = 326.24.

This corresponds (except for a rounding error in the final digit) with the
squared intercluster distance opposite step #8 in Table 2.6.
Although this is, in principle, the most straightforward way of finding the
square of the distance between two cluster centroids, it is inconvenient in
practice. The inconvenience arises because the coordinates of the original
data points have to be used in the calculations every time. It is more efficient
computationally to derive the elements of each successive distance matrix
from the elements of its predecessors. The following is a demonstration of
the first few steps of the process applied to Data Matrix #1, after which the
generalized version of the equation is given.
Each element in the initial distance matrix (more precisely, a distance²
matrix) is the square of the distance between a pair of data points. After the
fusion of points 5 and 8, the distance² from any point j to the new cluster
[5, 8] is

d²(j, [5, 8]) = (1/2)d²(j, 5) + (1/2)d²(j, 8) − (1/4)d²(5, 8).   (2.2)
Thus, when j = 1,

d²(1, [5, 8]) = (1/2)(325) + (1/2)(320) − (1/4)(5) = 321.25.

TABLE 2.7. THE SUCCESSIVE DISTANCE² MATRICES IN THE CENTROID
CLUSTERING OF DATA MATRIX #1.ª

(1) The initial matrix:

      1    2    3    4    5    6    7    8    9   10
1     0  208  272  626  325   32   37  320  745  377
2          0  128  250   13  400   85   16  617   65
3               0  730  157  464  229  208  185  369
4                    0  221  850  365  162 1625   61
5                         0  557  162    5  650   52
6                              0  125  544  961  593
7                                   0  149  776  178
8                                        0  761   25
9                                             0 1058
10                                                 0

(2) The second matrix, after the fusion of points 5 and 8. Row 8 and column 8
are filled with asterisks, and the row for the new cluster is

         1       2       3       4    [5,8]     6       7       9      10
[5,8]  321.25  13.25  181.25  190.25    0    549.25  154.25  704.25  37.25

(3) The third matrix, after the fusion of 2 and [5, 8]. Row 2 and column 2 are
now starred out as well, and the row for the new cluster is

           1    [2,5,8]    3       4       6       7       9      10
[2,5,8]  280.56    0    160.56  207.22  496.56  128.22  672.22  43.56

ªIn matrices 2 and 3 only the elements differing from those in earlier
matrices are shown; all other elements are unchanged.
d²(j, [2, 5, 8]) = (1/3)d²(j, 2) + (2/3)d²(j, [5, 8]) − (2/9)d²(2, [5, 8]).   (2.3)

When j = 1,

d²(1, [2, 5, 8]) = (1/3)(208) + (2/3)(321.25) − (2/9)(13.25) = 280.56;

when j = 3,

d²(3, [2, 5, 8]) = (1/3)(128) + (2/3)(181.25) − (2/9)(13.25) = 160.56,
and so on. These values appear in the row and column labeled [2, 5, 8] in the
third matrix in Table 2.7.
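Equations (2.2) and (2.3) can be checked numerically; the following sketch (not from the original text) reproduces entries of Table 2.7 from the squared distances of the initial matrix:

```python
# Squared distances taken from the initial matrix of Table 2.7.
d2 = {(1, 2): 208, (1, 5): 325, (1, 8): 320, (2, 5): 13, (2, 8): 16,
      (3, 2): 128, (3, 5): 157, (3, 8): 208, (5, 8): 5}

def sq(j, k):
    """Symmetric lookup (no zero off-diagonal entries occur here)."""
    return d2.get((j, k)) or d2[(k, j)]

# Equation (2.2): distance-squared from point j to the cluster [5, 8].
def d2_58(j):
    return 0.5 * sq(j, 5) + 0.5 * sq(j, 8) - 0.25 * sq(5, 8)

# Equation (2.3): distance-squared from point j to the cluster [2, 5, 8].
def d2_258(j):
    return sq(j, 2) / 3 + 2 * d2_58(j) / 3 - 2 * d2_58(2) / 9

print(d2_58(1))             # 321.25
print(round(d2_258(3), 2))  # 160.56
```

Both results agree with the worked values in the text.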
Equations (2.2) and (2.3) are particular examples of a general equation,
which we now derive. It is the equation for the distance² from any point (or
cluster centroid) P to the centroid Q of an (m + n)-member cluster created
by the fusion of two clusters [M1, M2, …, Mm] and [N1, N2, …, Nn] with m
and n members, respectively. The centroids of these clusters are M and N.
The set-up is shown in Figure 2.7.
Figure 2.7. Illustration of the derivation of Equation (2.6). Q is the centroid of clusters M and N; it
is assumed that n > m and, therefore, Q is closer to
N than to M. See text for further details.
Write PQ = x, PM = a, PN = b, and MN = c. Since Q divides the line MN
in the ratio n : m, MQ = nc/(m + n) and NQ = mc/(m + n). Applying the
cosine rule to the triangles PQM and PQN gives

(PQ)² + (MQ)² − 2(PQ)(MQ)cos α = (MP)²

and

(PQ)² + (NQ)² − 2(PQ)(NQ)cos β = (NP)².

Equivalently,

x² + n²c²/(m + n)² − (2xnc/(m + n))cos α = a²,   (2.4)

x² + m²c²/(m + n)² − (2xmc/(m + n))cos β = b².   (2.5)

Since β = 180° − α, cos β = −cos α; multiplying (2.4) by m and (2.5) by n
and adding therefore eliminates the cosines, whence

x²(m + n) + mnc²/(m + n) = a²m + b²n,

so that

x² = (a²m + b²n)/(m + n) − mnc²/(m + n)².   (2.6)

With m = n = 1 this reduces to Equation (2.2), and with m = 1, n = 2 it
reduces to Equation (2.3).
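The general equation can be verified against a direct centroid calculation. In the sketch below (an illustration, not the author's own check), P is point 1 of Data Matrix #1 and the fused cluster is formed from [2] and [5, 8]:

```python
pts = {1: (12, 30), 2: (20, 18), 5: (22, 15), 8: (20, 14)}

def centroid(cluster):
    n = len(cluster)
    return (sum(pts[q][0] for q in cluster) / n,
            sum(pts[q][1] for q in cluster) / n)

def sqdist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

# General equation: P = point 1; M = [2] (m = 1), N = [5, 8] (n = 2).
m, n = 1, 2
M, N = centroid([2]), centroid([5, 8])
P = pts[1]
a2, b2, c2 = sqdist(P, M), sqdist(P, N), sqdist(M, N)
x2 = (a2 * m + b2 * n) / (m + n) - m * n * c2 / (m + n) ** 2

# Direct computation: squared distance from P to the centroid of [2, 5, 8].
direct = sqdist(P, centroid([2, 5, 8]))
print(round(x2, 2), round(direct, 2))  # 280.56 280.56
```

The two routes give the same value, matching d²(1, [2, 5, 8]) in Table 2.7.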
2.5. MINIMUM VARIANCE CLUSTERING

This is the last clustering method that is fully described in this book. Before
going into details, it is necessary to define the term within-cluster dispersion,
and to give two methods for computing it. The first of these methods is the
obvious one implied by the definition. The second, nonobvious method is a
way of obtaining the identical result by a computationally simpler route.
First, the definition: the within-cluster dispersion of a cluster of points is
defined as the sum of the squares of the distances between every point and
the centroid of the cluster.
Next, we illustrate the computations.
TABLE 2.8. DATA MATRIX #3. THE QUANTITIES OF 2 SPECIES IN
5 QUADRATS.

Quadrat    1   2   3   4   5
Species 1 11  36  16   8  28
Species 2 14  30  20  12  32

The centroid of the five points is ((11 + 36 + 16 + 8 + 28)/5,
(14 + 30 + 20 + 12 + 32)/5) = (19.8, 21.6), so the within-cluster dispersion is

Q[1, 2, 3, 4, 5] = (11 − 19.8)² + (14 − 21.6)² + ⋯ + (28 − 19.8)² + (32 − 21.6)²
                 = 892.
A simpler way of obtaining the same result is to use the equation

Q[1, 2, 3, 4, 5] = (1/n) Σ_{j<k} d²(j, k).
(For a proof, see Pielou, 1977, p. 320.) Here n = 5, the number of points in
the cluster; d²(j, k) is the squared distance between points j and k; the
summation is over every possible pair of points, taking each pair once. This
is the reason for putting the condition j < k below the summation sign. It
ensures that, for example, d²(1, 2) shall be a component of the sum, but not
d²(2, 1), which is merely a repetition of d²(1, 2). There are n(n − 1)/2
distinct pairs of points, and hence 10 distinct components of the sum
Σ d²(j, k). Thus

Q[1, 2, 3, 4, 5] = (1/5){d²(1, 2) + d²(1, 3) + ⋯ + d²(4, 5)}.
Hence, for a two-member cluster consisting only of points a and b,

Q[a, b] = (1/2)d²(a, b);

that is, it is half the square of the distance between them.
For a one-member cluster, say point a by itself, the within-cluster
dispersion is zero; that is, Q[a] = 0.
We are now in a position to describe minimum variance clustering. At
each step, those two clusters are to be united whose fusion yields the least
increase in within-cluster dispersion. It is important to notice that what
matters is not simply the value of the within-cluster dispersion of a newly
formed cluster, but the amount by which this value exceeds the sum of
the dispersions of the two clusters united to form it.
It is values such as q([a, b, c], [d, e]) that are the criteria for deciding
which two clusters should be united at each step of the clustering process.
At every step the clusters to be united are always the two for which the
value of q is least.
As with the clustering procedures described in earlier sections, the
minimum variance method also requires the construction of a sequence of
"criterion" matrices. Then the position in the matrix of the numerically
smallest element indicates which clusters are to be united next. The matrices
obtained when minimum variance clustering is applied to Data Matrix #3
are shown in Table 2.9. In addition to the sequence of criterion matrices Q1,
Q2, Q3, and Q4, and printed above them, is the matrix D². It is a distance²
matrix whose elements are the squares of the distances separating every pair
of points. The elements of D² are used to construct the successive criterion
matrices.
We now carry out minimum variance clustering on the data in Data
Matrix #3. Q1, the first criterion matrix, has as its elements the within-cluster
dispersion of every possible two-member cluster that could be formed by
uniting two individual points. It has already been shown that for a two-
member cluster consisting of points j and k, say, the within-cluster dispersion
is Q[j, k] = (1/2)d²(j, k).

TABLE 2.9. THE MATRICES USED IN THE MINIMUM VARIANCE
CLUSTERING OF DATA MATRIX #3.

The Distance² Matrix

        1    2    3    4    5
1       0  881   61   13  613
2            0  500 1108   68
3                 0  128  288
4                      0  800
5                           0

The Sequence of Criterion Matrices

Q1:
        1      2      3      4      5
1       0  440.5   30.5    6.5  306.5
2              0  250     554     34
3                     0    64    144
4                            0   400
5                                   0

Q2:
        [1,4]      2       3       5
[1,4]      0   660.83   60.83  468.83
2                   0  250        34
3                           0    144
5                                   0

Q3:
        [1,4]   [2,5]       3
[1,4]      0   830.25   60.83
[2,5]              0   251.33
3                           0

Q4:
          [1,3,4]   [2,5]
[1,3,4]       0    790.67
[2,5]                   0
The smallest element of Q1 is 6.5, so the first fusion unites points 1 and 4;
point 4 no longer exists as a separate entity. In the jth cell of the first row
and column (now the row and column for cluster [1, 4]) is entered q(j, [1, 4]).
Note that

Q[j] = 0,

and

Q[1, 4] = (1/2)d²(1, 4) = 6.5.
Therefore, letting j take the values 2, 3, and 5 in turn, and taking the
required distances from the matrix D², it is found that

q(2, [1, 4]) = Q[1, 2, 4] − Q[2] − Q[1, 4]
             = 667.33 − 0 − 6.5
             = 660.83.
Likewise,

q(3, [1, 4]) = Q[1, 3, 4] − Q[3] − Q[1, 4]
             = 67.33 − 0 − 6.5
             = 60.83,
and so on. It will be seen that the values just computed appear in the first
row of Q2 (they would also appear in the first column, of course, if the
whole matrix were shown, but it is unnecessary to print the matrix in full
because it is symmetric). The remaining elements in Q2 are the same as
in Q1.
The smallest element in Q2 is 34, the value of q[2, 5]. Hence [2, 5] is the
second cluster to be formed.
By a similar process, we calculate the terms of Q3. Since point 5 is no
longer separate, the elements in the fifth row and column are replaced with
asterisks. The second row and column become the row and column for the
new cluster [2, 5], so that the two two-member clusters [1, 4] and [2, 5] now
occupy the first and second positions in the matrix. The increase in within-cluster
dispersion that would result if they were united to make a four-member
cluster is

q([1, 4], [2, 5]) = Q[1, 2, 4, 5] − Q[1, 4] − Q[2, 5]
                  = 870.75 − 6.5 − 34
                  = 830.25.
The procedure for computing the elements of the Q matrices should now be clear. The smallest element in Q3 is 60.83, in cell ([1, 4], 3). Therefore, the next fusion unites point 3 with cluster [1, 4] to give the three-member cluster [1, 3, 4].
After the final fusion, when all five points have been united into one cluster, the within-cluster dispersion is

Q[1, 2, 3, 4, 5] = (1/5) Σ_{j<k} d²(j, k) = 892.
In words: the total dispersion is the sum of the smallest elements in the successive criterion matrices, the Qs. The clustering strategy consists in augmenting the total by the smallest amount possible at each step.
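The whole procedure can be reproduced with a short program. The following Python sketch (written for this primer; the book gives no code, and the function names are our own) carries out minimum variance clustering directly from the matrix D²:

```python
from itertools import combinations

# Squared distances between the five points of Data Matrix #3,
# copied from the matrix D2 above (keys are pairs of point labels).
d2 = {(1, 2): 881, (1, 3): 61, (1, 4): 13, (1, 5): 613,
      (2, 3): 500, (2, 4): 1108, (2, 5): 68,
      (3, 4): 128, (3, 5): 288, (4, 5): 800}

def sq(j, k):
    return d2[(min(j, k), max(j, k))]

def dispersion(cluster):
    """Within-cluster dispersion Q = (1/n) * (sum of d2 over all pairs)."""
    n = len(cluster)
    return sum(sq(j, k) for j, k in combinations(sorted(cluster), 2)) / n

def min_variance_clustering(points):
    """Merge, at each step, the pair whose union adds least to the total
    dispersion; return the heights (dispersions) of the successive nodes."""
    clusters = [frozenset([p]) for p in points]
    heights = []
    while len(clusters) > 1:
        # criterion q(A, B) = Q[A union B] - Q[A] - Q[B]
        a, b = min(combinations(clusters, 2),
                   key=lambda ab: dispersion(ab[0] | ab[1])
                                  - dispersion(ab[0]) - dispersion(ab[1]))
        merged = a | b
        heights.append(round(dispersion(merged), 2))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
    return heights

print(min_variance_clustering([1, 2, 3, 4, 5]))  # [6.5, 34.0, 67.33, 892.0]
```

Each merge height is the within-cluster dispersion of the newly formed cluster, matching the values 6.5, 34, 67.33, and 892 obtained above.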
Figure 2.8 shows the data points of Data Matrix # 3 and the dendrogram
we have just computed. The height of each node is the within-cluster
dispersion of the newly formed cluster that the node represents. Thus the heights are Q[1, 4] = 6.5, Q[2, 5] = 34, Q[1, 4, 3] = 67.33, and Q[1, 2, 3, 4, 5] = 892.

[Figure 2.8. The data points of Data Matrix #3 (see Table 2.8) and the dendrogram yielded by minimum variance clustering of the data.]

[Figure 2.9. The dendrogram produced by applying minimum variance clustering to Data Matrix #1. The scale on the left shows the within-cluster dispersion of each node (2.5, 11.3, 16, 30.5, 64.7, 92.5, 174, 765.2, and 1673.8); the quadrats, from left to right, are 3, 9, 7, 1, 6, 2, 5, 8, 4, 10.]
Figure 2.9 shows the results of applying minimum variance clustering to Data Matrix #1. The steps in the computations are not shown here since nine 10 × 10 matrices would be required. The exact value of the within-cluster dispersion of each newly formed cluster is shown on the scale to the left of the dendrogram to serve as a check for readers who wish to carry out minimum variance clustering on these data for themselves. It is interesting to compare this dendrogram with those in Figures 2.3, 2.5, and 2.6, which all relate to the same data.
DISSIMILARITY MEASURES AND DISTANCES

It was remarked in Section 1 of this chapter that the Euclidean distance between the points representing two quadrats is only one of many possible ways of defining the dissimilarity of the two quadrats. We used Euclidean distance as a dissimilarity measure in Sections 2, 3, 4, and 5, in which four different clustering procedures were described. This section describes some other possible dissimilarity measures and their advantages and disadvantages.

An important property of any dissimilarity measure is whether it obeys the triangle inequality. This is the common-sense axiom which states that the length of any one side of a triangle must be less than the sum of the lengths of the other two sides. Suppose we write d(A, B) for the length of side AB of triangle ABC, and analogously for the other two sides. Then the triangle inequality may be written

d(A, B) ≤ d(A, C) + d(C, B).

The equality sign applies when A, B, and C are in a straight line or, equivalently, when triangle ABC has been completely flattened to a straight line.
The triangle inequality is obviously true of Euclidean distances. However,
measures of the dissimilarity of the contents of two quadrats (or sampling
units of any appropriate kind) are often devised without any thought of the
geometrical representation of the quadrats as points in a many-dimensional
coordinate frame. These measures were not, when first invented, thought of
as distances. Only subsequent examination shows whether they are metric,
that is, whether they obey the triangle inequality or, in other words,
"behave" as distances.
Some examples are given after we have considered why metric dissimilarity measures are to be preferred to nonmetric ones. As remarked previously, when a metric measure is used to define the dissimilarity between two quadrats, then the dissimilarities behave like distances. As a result, it may be possible (sometimes) to plot the quadrats as points in a space of many dimensions with the distance between every pair of points being equal to the dissimilarity of the pair (see page 165). But when a nonmetric dissimilarity measure is used, this cannot be done.
..................................

S(A, B) = max(d) − d(A, B).

The percentage similarity PS of two quadrats is

PS = 200 Σ_{i=1}^s min(x_i1, x_i2) / Σ_{i=1}^s (x_i1 + x_i2).     (2.8)
A numerical example is shown in Table 2.10. Since, as shown in the table, PS = 58.67%, the percentage dissimilarity is PD = 41.33%.
TABLE 2.10. TO ILLUSTRATE THE CALCULATION OF THE PERCENTAGE DISSIMILARITY PD AND THE PERCENTAGE REMOTENESS PR OF TWO QUADRATSª

Species      Quadrat   Quadrat
Number i        1         2      min(x_i1, x_i2)   max(x_i1, x_i2)
    1          25         7             7                 25
    2          40        16            16                 40
    3          18        50            18                 50
    4          16        22            16                 22
    5           9        22             9                 22
Totals        108       117            66                159

PS = 200 × 66/(108 + 117) = 58.67%. Therefore, PD = 41.33%.
RI = 100 × 66/159 = 41.51%. Therefore, PR = 58.49%.
ªThe entries in the table are the quantities of each species in each quadrat.
RI = 100 Σ_{i=1}^s min(x_i1, x_i2) / Σ_{i=1}^s max(x_i1, x_i2).     (2.9)

Then

PR = 100 − RI.
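The calculations of Table 2.10 can be checked with a few lines of Python (an illustration written for this primer, not the book's own; the variable names are ours):

```python
# Quantities of the five species in the two quadrats of Table 2.10.
quadrat1 = [25, 40, 18, 16, 9]
quadrat2 = [7, 16, 50, 22, 22]

mins  = [min(a, b) for a, b in zip(quadrat1, quadrat2)]   # total 66
maxes = [max(a, b) for a, b in zip(quadrat1, quadrat2)]   # total 159

PS = 200 * sum(mins) / (sum(quadrat1) + sum(quadrat2))  # percentage similarity (2.8)
PD = 100 - PS                                           # percentage dissimilarity
RI = 100 * sum(mins) / sum(maxes)                       # the index of equation (2.9)
PR = 100 - RI                                           # percentage remoteness

print(round(PS, 2), round(PD, 2), round(RI, 2), round(PR, 2))
# 58.67 41.33 41.51 58.49
```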
= 18 + 24 + 32 + 6 + 13 = 93.
the two clusters are united for which the square of the distance separating them is least. This was done in the example in Section 2.6.
It should be noticed that although it is legitimate to use distance² as a clustering criterion, this is not equivalent to using distance² as a dissimilarity measure, since distance² is nonmetric. To see this, consider a numerical example. It is easy to construct a triangle with sides 3, 4, and 6 units long since 6 < 3 + 4. But, obviously, one cannot construct a triangle with sides 3², 4², and 6² units long since 6² > 3² + 4².
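The numerical example can be stated as a small test (our own Python helper, not code from the book):

```python
def satisfies_triangle_inequality(x, y, z):
    """True if three pairwise dissimilarities could be the sides of a triangle."""
    a, b, c = sorted((x, y, z))
    return c <= a + b

# Distances 3, 4, 6 satisfy the triangle inequality (6 <= 3 + 4) ...
assert satisfies_triangle_inequality(3, 4, 6)
# ... but their squares do not: 36 > 9 + 16.
assert not satisfies_triangle_inequality(9, 16, 36)
```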
Euclidean distance, city-block distance, and percentage remoteness provide a more than adequate armory of dissimilarity measures for use whenever nonnormalized distances are required. It is now necessary to consider the topic of normalized versus nonnormalized ("raw") data and to consider whether, and if so how, ecological data and dissimilarity measures derived from them should be normalized for analysis.
[Figure 2.10. Points A(0.2, 0.5), B(0.7, 0.1), and C(2.1, 0.3) described in the text. Clearly, d(A, B) < d(B, C).]
[Figure 2.11. Points A, B, and C are the same as in Figure 2.10. A′ and B′ are the projections of A and B onto a circle of unit radius. The chord distance and the geodesic metric (or geodesic distance) separating A and B are c(A, B) and g(A, B), respectively.]
Because C lies on the same radius as B, the point C′ is identical with B′, so that d(A′, C′) = d(A′, B′) and d(B′, C′) = 0. Equivalently, c(A, C) = c(A, B) and c(B, C) = 0.

We now derive c(A, B) in terms of the coordinates of points A and B. Let these coordinates be (x_1A, x_2A) for point A and (x_1B, x_2B) for point B. (Recall that the first subscript always refers to the species and the second to the quadrat or other sampling unit.)

First, for brevity, put OA = a, OA′ = a′, OB = b, and OB′ = b′. Write θ for angle AOB. Obviously, angle AOB = angle A′OB′. From the construction, we know that a′ = b′ = 1.

Applying Apollonius's theorem (see page 78) to triangles ABO and A′B′O, respectively, shows that
d²(A, B) = a² + b² − 2ab cos θ     (2.10)

and

c²(A, B) = a′² + b′² − 2a′b′ cos θ = 2(1 − cos θ).     (2.11)

Since a² = x²_1A + x²_2A, b² = x²_1B + x²_2B, and

cos θ = (a² + b² − d²(A, B)) / 2ab,

it follows that

cos θ = (x_1A x_1B + x_2A x_2B) / {(x²_1A + x²_2A)(x²_1B + x²_2B)}^1/2.     (2.12)

Hence to evaluate c²(A, B) for any pair of points A and B with known coordinates, first find cos θ using (2.12) and then substitute the result in (2.11).
For example, in Figure 2.11 the coordinates of A and B are, respectively, (0.2, 0.5) and (0.7, 0.1). Therefore,

cos θ = (0.2 × 0.7 + 0.5 × 0.1) / {(0.2² + 0.5²)(0.7² + 0.1²)}^1/2 = 0.19/√0.145 = 0.499,

so that c²(A, B) = 2(1 − 0.499) = 1.002 and c(A, B) = 1.001.
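The same computation can be sketched in Python for quadrats with any number of species (our own code; the geodesic metric g(A, B) is taken to be the arc length θ on the unit circle):

```python
from math import acos, sqrt

def chord_and_geodesic(a, b):
    """Chord distance c(A, B) and geodesic metric g(A, B) between two quadrats
    after projection onto the unit circle/sphere; cos(theta) as in (2.12)."""
    cos_theta = (sum(x * y for x, y in zip(a, b))
                 / sqrt(sum(x * x for x in a) * sum(y * y for y in b)))
    theta = acos(cos_theta)            # angular separation, in radians
    chord = sqrt(2 * (1 - cos_theta))  # equation (2.11)
    return chord, theta

# Points A and B of Figure 2.11:
c, g = chord_and_geodesic((0.2, 0.5), (0.7, 0.1))
print(round(c, 3), round(g, 3))
```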
TABLE 2.11. DATA MATRIX #4

Quadrat     1    2    3     4    5     6    7     8    9    10    11
Species 1   3    4    5   5.5    6     6   11  11.5   12    14  13.5
Species 2   3    7    7   5.5    4   6.5   11  13.5   13    11    15
seashore varies markedly with the exposure of the shore to waves; sheltered shores support a much larger crop than wave-battered shores. But these contrasted communities are not sparse and dense versions of the same species mixture; they differ, also, in species composition.

Likewise, the luxuriance of the ground vegetation in regions severely affected by air pollution is conspicuously less than that in clean areas. But it is not only less in amount; it is also much poorer in species.
Thus if clustering is done to disclose differences of an unspecified kind, raw data are better than normalized data. Differences in overall abundance are not "meaningless" and are not (usually) unaccompanied by at least some qualitative differences in the community of interest. Normalizing the data may inadvertently obscure real, but slight, differences among them at the same time as it (intentionally) obliterates the quantitative differences.

That is not to say, however, that there may not be situations in which normalization is called for; for example, one might wish to classify sample
[Figure 2.12. The data points of Data Matrix #4 (see Table 2.11) and two dendrograms obtained by clustering the data. Centroid clustering was used for both. The upper dendrogram was obtained using the Euclidean distance between each pair of raw data points as the measure of between-quadrat dissimilarity; the lower dendrogram used the geodesic metric.]
plots of the vegetation of a polluted area so as to disclose the probable prepollution clustering. Then, provided the qualitative differences among the preexisting clusters exceeded the qualitative differences induced by pollution, normalization would be desirable. It might prevent differences in the quantity of vegetation in the sample plots from overriding, and masking, the qualitative differences that persisted from the prepollution period.

To repeat: whether to use raw or normalized data is always a matter of judgment.
Presence-and-Absence Data
There is no objection to applying the clustering methods described in earlier sections of this chapter to the clustering of binary data. However, unless the total number of species is large, the result of a clustering process may seem somewhat arbitrary. This is because only a few values are possible for the distance separating any pair of points.

Consider the simple cases in Figure 2.13. In the two-species case the distance between any pair of noncoincident points must have one of two values, 1 or √2, depending on whether the two points are at the ends of a side or of a diagonal of the unit square; in the three-species case the possible distances are 1, √2, and √3.
[Figure 2.13. Presence-and-absence data points shown as vertices of the unit square (two species) and the unit cube (three species); the diagonal of the cube has length √3.]
and some quadrats contain species that are not present in quadrats 1 and 2. In other words, some of the total of s species represented in the data matrix are absent from both of the quadrats being compared. Hence these "joint absences" can be specified and counted. Now consider the following 2 × 2 table:

                                Quadrat 2
                           Number of Species
                           Present      Absent
Quadrat 1      Present        a            b
Number of
Species        Absent         c            d
Σ_{i=1}^s (x_i1 + x_i2) = (a + b) + (a + c) = 2a + b + c,     (2.14)

while Σ_{i=1}^s min(x_i1, x_i2) = a and Σ_{i=1}^s max(x_i1, x_i2) = a + b + c, so that

RI = 100 × a / (a + b + c).
This is Jaccard's index (as a percentage), the oldest similarity index used by ecologists (Goodall, 1978a) and as well known as Sørensen's. Thus with binary data the percentage remoteness PR is identical with the complement of Jaccard's index. Indeed, this complement (as a proportion rather than a percentage), namely,

1 − a/(a + b + c) = (b + c)/(a + b + c),
d(1, 2) = √(b + c).
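These identities for binary data are easy to confirm in Python (the two quadrats below are hypothetical example data, not from the book):

```python
from math import sqrt

# Two binary (presence/absence) quadrats over seven species.
q1 = [1, 1, 0, 1, 0, 1, 0]
q2 = [1, 0, 1, 1, 0, 0, 0]

a = sum(1 for x, y in zip(q1, q2) if x == 1 and y == 1)  # joint presences
b = sum(1 for x, y in zip(q1, q2) if x == 1 and y == 0)
c = sum(1 for x, y in zip(q1, q2) if x == 0 and y == 1)

# RI of equation (2.9) reduces to Jaccard's index with binary data:
ri = 100 * sum(min(x, y) for x, y in zip(q1, q2)) \
         / sum(max(x, y) for x, y in zip(q1, q2))
jaccard = 100 * a / (a + b + c)
assert ri == jaccard

# and the Euclidean distance between the two points is sqrt(b + c):
d = sqrt(sum((x - y) ** 2 for x, y in zip(q1, q2)))
assert d == sqrt(b + c)
```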
[Figure: two binary data points, A(0, 0, 1) and (1, 1, 0), shown as vertices of the unit cube.]
              Quadrat B                  Quadrat D
               +    −                     +    −
Quadrat A  +   1    2      Quadrat C  +   0    2
           −   0    0                 −   0    1

MS(C, D) = (2 + 0)/(0 + 2 + 0) = 1,

so that MS(A, B) ≠ MS(C, D). The difference between MS(A, B) and MS(C, D) is due to the term a (the number of species present in both quadrats) in the denominator of MS. In general, even if two pairs of points have the same number of "mismatches" (b + c), the pair with the larger number of species
Recall that four of the dissimilarity measures described here are the complements of similarity measures. The way in which they are paired is listed below:

[The list of pairings, and a diagram classifying the measures as binary, nonnormalized, and normalized, are printed here in the original.]
Figure 2.15. Two dendrograms produced by applying nearest-neighbor clustering to Data
Matrix # 5 (see Table 2.13). Euclidean distance was used as dissimilarity measure for the
dendrogram on the left, MS distance for the dendrogram on the right.
TABLE 2.13. DATA MATRIX #5. PRESENCES (1) AND ABSENCES (0) OF 10 SPECIES IN 8 QUADRATS.
Quadrat 1 2 3 4 5 6 7 8
Species 1 o 1 1 1 o o o
2 o 1 1 o
3 o
1 o o
1 1 1 o o o
4 1 o o o 1 1 1 o
5 1 1 1 1 1 o
6 1 1
1 1
7 1
1 1 o 1 1 o
1 1 o
8 o 1
1 o o
1 1 o
9 o 1
o o
1 1 o
10 o o o o
o o o o o
AVERAGE LINKAGE CLUSTERING
In Section 2.4 it was remarked that there are many possible ways of defining the distance between two clusters. In the clustering method described in that section (centroid clustering) the distance between two clusters is defined as the distance between their centroids. In this section we consider other definitions and their properties.

The most widely used intercluster distance is the average distance. This distance is most easily explained with the aid of a diagram; see Figure 2.16. The five data points with their coordinates given beside them show the amounts of species 1 and 2 in each of five quadrats. There are two obvious clusters, [A, B, C] and [D, E]. The average distance between these clusters, which will be written d_u([A, B, C], [D, E]), is defined as the arithmetic average of all distances between a point in one cluster and a point in the other. There are six such distances. Therefore,
[Figure 2.16. Data points showing the quantities of two species in five quadrats, to illustrate the definition of d_u([A, B, C], [D, E]). The points are A(4, 11), B(4, 7), C(6, 10), D(15, 6), and E(16, 9). See text.]
d(A, E) = √((4 − 16)² + (11 − 9)²) = 12.1655;
..................................
and
d(C, E) = √((6 − 16)² + (10 − 9)²) = 10.0499,
so that

d_u([A, B, C], [D, E]) = (1/6){d(A, D) + d(A, E) + d(B, D) + d(B, E) + d(C, D) + d(C, E)} = 11.2264,

or, equivalently,

d_u([M], [N]) = (1/mn) Σ_j Σ_k d(M_j, N_k),     (2.16)
it being understood that the summations are over all values of j and k.
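Definition (2.16) can be checked numerically for the clusters of Figure 2.16 (a Python sketch using the coordinates given in the figure; the function names are ours):

```python
from math import sqrt

# The five data points of Figure 2.16.
points = {'A': (4, 11), 'B': (4, 7), 'C': (6, 10), 'D': (15, 6), 'E': (16, 9)}

def d(p, q):
    """Euclidean distance between two labeled points."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(points[p], points[q])))

def average_distance(cluster1, cluster2):
    """Unweighted average distance d_u of equation (2.16)."""
    return (sum(d(p, q) for p in cluster1 for q in cluster2)
            / (len(cluster1) * len(cluster2)))

du = average_distance(['A', 'B', 'C'], ['D', 'E'])
print(round(du, 4))  # 11.2264
```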
Equation (2.16) is the symbolic form of the definition of average distance.
But when these distances are used to decide which pair of clusters should be
united at each of the successive steps of a clustering process, it is much more
economical computationally to derive each successive intercluster distance matrix from its predecessor rather than by using (2.16), which expresses each
distance in terms of the coordinates of the original data points. This may be
done as follows (Lance and Williams, 1966):
Consider three clusters [M1 , M 2 , .•. , Mm], [N1 , N2 , •.. , Nn], and [P1 , P2 ,
... ,PP] with m, n, and p members, respectively. In what follows, the
clusters are represented by the more compact symbols [M], [N], and [P].
Suppose [M] and [N] are united to form the new cluster [Q], with
q = m + n members. Then, from (2.16),
d_u([P], [Q]) = (1/pq) Σ_{j=1}^q Σ_{k=1}^p d(Q_j, P_k).     (2.17)
Now recall that the points belonging to the new cluster [Q] are M_1, M_2, ..., M_m, N_1, N_2, ..., N_n. Therefore, we can separate the right side of (2.17) into two components and write the expression as

d_u([P], [Q]) = (m/m)(1/pq) Σ_{j=1}^m Σ_{k=1}^p d(M_j, P_k) + (n/n)(1/pq) Σ_{j=1}^n Σ_{k=1}^p d(N_j, P_k).

Now

(1/mp) Σ_{j=1}^m Σ_{k=1}^p d(M_j, P_k) = d_u([M], [P])

and

(1/np) Σ_{j=1}^n Σ_{k=1}^p d(N_j, P_k) = d_u([N], [P]).
Recall that q = m + n and that [Q] contains all the members of [M] and [N], the two clusters that were combined to form it. Thus

d_u([P], [Q]) = (m/q) d_u([M], [P]) + (n/q) d_u([N], [P]).     (2.18)
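The update formula (2.18) can be verified against a direct computation from the data points (a Python sketch; the three clusters below are hypothetical example data):

```python
from math import sqrt, isclose

# Hypothetical clusters [M], [N], [P] of two-dimensional data points.
M = [(0, 0), (1, 1)]
N = [(5, 0), (6, 2), (7, 1)]
P = [(3, 8), (4, 9)]

def d(p, q):
    return sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def du(c1, c2):
    """Unweighted average distance, equation (2.16)."""
    return sum(d(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

m, n = len(M), len(N)
q = m + n
direct = du(P, M + N)                              # d_u([P],[Q]) from scratch
updated = (m / q) * du(M, P) + (n / q) * du(N, P)  # via the update (2.18)
assert isclose(direct, updated)
```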
The average ~istance between clusters [P] and [Q], previously denoted by
d_u([P], [Q]), is not the only way of measuring intercluster distance. Recall
and compare equations (2.18) and (2.7) (page 32). They constitute two
different answers to the question: What is the distance between two clusters
given that the first has just been created by the fusion of two preexisting
clusters each of which was at a known distance from the second cluster?
(Observe that the question asked is not the simpler one: What is the
distance between two clusters? The reason is that the answer sought is a
formula for computing the elements of each distance matrix from its
predecessor.)
Let us write [Q] for the first cluster, [M] and [N] for the preexisting
clusters from which [ Q] was formed, and [ P] for the second cluster. The
numbers of points in these clusters are q, m, n, and p, respectively, with
q = m + n.
The answer to the preceding question depends on how intercluster distance is defined. As we shall see, the defining equations are sometimes expressed in terms of distance d, and sometimes of distance squared d². To make the relationship among the definitions more apparent, the word "dissimilarity" is used here to mean either distance or distance², according to context. The symbol δ is used in the equations to denote either d or d², and after each equation its current meaning is specified.

If dissimilarity is defined as the average distance, the answer to the question is given by (2.18), rewritten with δ in place of d, namely,

δ([P], [Q]) = (m/q) δ([M], [P]) + (n/q) δ([N], [P]),

with δ denoting d.

..................................

If intercluster distance is defined as the median distance, the question becomes

δ_m([P], [Q]) = ½ δ([M], [P]) + ½ δ([N], [P]) − ¼ δ([M], [N]).     (2.22)

Here δ denotes d², and the subscript m stands for "median"; d_m is the median distance, sometimes known as the weighted centroid distance.
Equation (2.22) can be obtained directly by considering Figure 2.7. If we assume that, whatever the values of m and n, the centroid of the cluster formed by uniting [M] and [N] lies midway between them, at distance c/2 from each, then (2.22) follows in the same way that (2.20) follows from (2.7).
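With δ = d², the median update (2.22) says that the dissimilarity from [P] to the new cluster equals the squared distance from P to the midpoint of M and N; this is an exact algebraic identity, easily confirmed (our own sketch, with hypothetical coordinates for the three centroids):

```python
from math import isclose

# Hypothetical centroids of clusters [M], [N], [P].
M, N, P = (1.0, 2.0), (7.0, 4.0), (3.0, 9.0)

def d2(a, b):
    """Squared Euclidean distance."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

midpoint = ((M[0] + N[0]) / 2, (M[1] + N[1]) / 2)
lhs = d2(P, midpoint)                                       # distance2 to midpoint
rhs = 0.5 * d2(M, P) + 0.5 * d2(N, P) - 0.25 * d2(M, N)     # update (2.22)
assert isclose(lhs, rhs)
print(lhs)  # 37.0
```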
The Four Versions of Average Linkage Clustering
Four ways of measuring intercluster distance have now been described: the unweighted average distance, the weighted average distance, the centroid distance (unweighted), and its weighted equivalent, the median distance. Each of these differently defined distances can be used as the basis of a clustering process. At every step of such a process, the pair of clusters separated by the smallest distance (using whichever definition of distance has been chosen) is united.

The four clustering methods that use these distances are known, collectively, as average linkage clustering. Centroid clustering, described in detail in Section 2.4, is one of the four. The relationships among the four are most clearly shown by arraying them in a 2 × 2 table thus:

                          Intercluster Distance
                  Average of               Distance
                  Interpoint Distances    Between Centroids
Unweighted        Unweighted group        Centroid
                  average clustering      clustering
Weighted          Weighted group          Median
                  average clustering      clustering
EXAMPLE. Figure 2.17 shows the dendrograms obtained by applying the four clustering methods to Data Matrix #6 (see Table 2.14). The clustering criterion was d for the group average methods and d² for the centroid and median methods. The heights of the nodes in all four dendrograms are equal to values of d. The dendrogram produced by unweighted group average clustering is noticeably different from the other three. It does not follow, however, that this will be true with other data matrices.

The unweighted group average method is probably the clustering procedure most widely used by ecologists. To mention only a single example, ...
[Figure 2.17. Four dendrograms produced by applying different forms of average linkage clustering to Data Matrix #6 (see Table 2.14). (A) centroid clustering; (B) median clustering; (C) unweighted group average clustering; (D) weighted group average clustering. The clustering criterion is d² for A and B, and d for C and D.]
CHOOSING AMONG CLUSTERING METHODS

Seven clustering techniques have been described in this chapter: nearest- and farthest-neighbor clustering, minimum variance clustering, and the four forms of average linkage clustering, among which centroid clustering is included. There are many other, less well-known methods, devised for special purposes; accounts of them may be found in more advanced books such as Orlóci (1978), Sneath and Sokal (1973), and Whittaker (1978b). One or another of the last five methods described in this chapter should meet the needs of ecologists in all but exceptional contexts. It remains to compare the methods with one another.

Nearest-Neighbor and Farthest-Neighbor Clustering
These are rarely used nowadays. In these methods the two clusters to be united at any step are determined entirely by the distance between two individual data points, one in each cluster. Thus a cluster is always represented by only one of its points; moreover, the "representative point" (a different one at each step) is always "extreme" rather than "typical" of the cluster it represents.
..................................
or they formed several distinct classes, with all the quadrats in any one class constituting a random sample from the same population. In the former case every node in a clustering dendrogram is interesting and reveals (it is hoped) "true" relationships among dissimilar
things. In the latter case the first few fusions do no more than unite groups of quadrats that are not truly distinct from one another; the differences among the quadrats within such a group are due entirely to chance, and the order in which they are united is likewise a matter of chance.
With minimum variance clustering it is possible to do a statistical test of each fusion in order to judge whether the points (or clusters) being united are homogeneous (replicate samples from a single parent population) or heterogeneous (samples from different populations). This is equivalent to judging, objectively, the "information value" of each node in a dendrogram. Thus if the lowermost nodes represent fusions of homogeneous points or clusters, they have no information value; obviously, it is useful to distinguish them from nodes representing the fusions that do convey information about the relationships among the clusters and about their relative ecological "closeness." The reader is referred to Goodall (1978b, p. 270) or Orlóci (1978, p. 212) for instructions on how to do the test, which is beyond the scope of this book.
Minimum variance clustering, like farthest-neighbor clustering, tends to
give clusters of fairly equal size. If a single data point is equidistant from
two cluster centroids and the clusters do not have the same numbers of
members, then the data point will unite with the less populous cluster
(proved in Orlóci, 1978). The result is that, as clustering proceeds, small
clusters acquire new members faster than large ones and chaining is unlikely
to happen. This is a great advantage when clustering is done to provide a
descriptive classification, for mapping purposes, for instance. Of course, it
does not follow that a nicely balanced dendrogram gives a truer picture of
ecological relationships than a straggly one.
Turning now to the four average linkage clustering techniques, the first choice that must be made is between unweighted and weighted methods.
..................................
The choice between weighted and unweighted clustering is not always easy, but when in doubt, unweighted clustering is to be preferred.
It remains to choose between group average clustering and centroid clustering (or its weighted equivalent, median clustering). The pros and cons are very evenly divided. Each method has a notable advantage that the other lacks and, at the same time, a notable weakness which is a consequence of the advantage.
The strong point of centroid clustering is that each cluster, as it is formed, is represented by an exactly specifiable point, its centroid, and the distance between two clusters is the distance between their centroids. In group average clustering, there is no such geometrical realism: the clusters cannot be identified with precise representative points and, therefore, the concept of intercluster distance is unavoidably fuzzy. The device of using the average of all interpoint distances between two clusters as a measure of intercluster distance is just that, a device.
The weakness of centroid clustering, a weakness not shared by group average clustering, is that it is not monotonic. This term is most easily explained with a figure (Figure 2.18). The upper panel shows six data points and their coordinates in a two-dimensional space. Below are two dendrograms obtained from the data. The dendrogram on the left results from centroid clustering with d² as the clustering criterion; the scale shows the square root of the d² value corresponding to each node. The dendrogram on the right results from group average clustering with d as criterion.

As may be seen, the centroid clustering dendrogram contains two so-called reversals. For example, the height of the node (the intercluster distance) representing the fusion of E with [D, F] is less than that representing the fusion of points D and F. This is because (see the upper panel) although D and F are nearer to each other than either is to E, so that D and F are united first, the centroid of the new cluster after the fusion (the hollow dot labeled [D, F]) is nearer to E than either of its component points were before the fusion. The distances are easily found to be d(D, F) = √80 = 8.94, whereas the distance from E to the centroid of [D, F], the point (30, 8), is √66.64 = 8.16.
[Figure 2.18. Illustration of how reversals appear in a centroid clustering dendrogram (left) although they are absent from the group average clustering dendrogram constructed from the same data (right). The six data points that were clustered, A(6, 24), B(6, 12), C(16.4, 10.0), D(26, 10), E(34.2, 15), and F(34, 6), are plotted at the top. The hollow dots are the centroids of clusters [A, B] and [D, F]. The scales of the dendrograms have been adjusted to make the reversals conspicuous.]
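The reversal is easy to confirm from the coordinates in Figure 2.18 (a Python sketch written for this primer; the variable names are ours):

```python
from math import dist  # Python 3.8+

# The six data points of Figure 2.18.
pts = {'A': (6, 24), 'B': (6, 12), 'C': (16.4, 10.0),
       'D': (26, 10), 'E': (34.2, 15), 'F': (34, 6)}

d_DF = dist(pts['D'], pts['F'])                      # height of the first fusion
centroid_DF = ((pts['D'][0] + pts['F'][0]) / 2,      # hollow dot [D, F]
               (pts['D'][1] + pts['F'][1]) / 2)
d_E_to_centroid = dist(pts['E'], centroid_DF)        # height of the next fusion

print(round(d_DF, 2), round(d_E_to_centroid, 2))  # 8.94 8.16
assert d_E_to_centroid < d_DF  # the later fusion is lower: a reversal
```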
RAPID NONHIERARCHICAL CLUSTERING

All the clustering techniques so far described in this chapter have been hierarchical. A hierarchical clustering procedure does more than merely unite data points into clusters. It performs the fusions in a definite sequence and, therefore, the outcome can be displayed as a dendrogram, enabling one to discern the different degrees of relationship among the points.

With very large data matrices, it is often desirable to do a nonhierarchical clustering of the data points (quadrats or other sampling units) as a preliminary to hierarchical clustering. There are several reasons for doing this:

1. Hierarchical clustering by any method makes heavy demands on computer time and memory. For very large bodies of data, a computationally more economical procedure is desirable.
2. A dendrogram with a very large number (100 or more, say) of ultimate branches is too big to comprehend.
3. A large data matrix usually contains data from numerous replicate sampling units within each of the distinguishably different communities whose relationships are being investigated. Hence the earliest fusions in a hierarchical clustering are likely to be uninformative. They merely have the effect of pooling replicate quadrats, and the order of the fusions which bring this pooling about is of no interest.
APPENDIX

Apollonius's Theorem

To Prove:

c² = a² + b² − 2ab cos θ,

where c = c1 + c2 and θ = θ1 + θ2.

Also, since c1 = a sin θ1, c2 = b sin θ2, and the altitude h satisfies h = a cos θ1 = b cos θ2,

c1c2 = ab sin θ1 sin θ2     and     cos θ = cos θ1 cos θ2 − sin θ1 sin θ2 = h²/(ab) − sin θ1 sin θ2.

Now

c² = (c1 + c2)² = c1² + c2² + 2c1c2
   = (a² − h²) + (b² − h²) + 2ab sin θ1 sin θ2
   = a² + b² − 2(h² − ab sin θ1 sin θ2)
   = a² + b² − 2ab cos θ.     QED
EXERCISES
2.1. Given the following data matrix X, what is the Euclidean distance between: (a) points 1 and 5; (b) points 2 and 3; (c) points 3 and 5? (Each point is represented by a column of X, which gives its coordinates in four-space.)
l~ 8 2 -1
o 4 -2
3 -1 6
X=
-2 -4 -2 o
2.3. Suppose the data points whose coordinates are given by the columns of X in Exercise 2.1 were assigned to two classes: [1, 2] and [3, 4, 5]. Find the coordinates of the centroid of each of these classes. What is the distance between the two centroids?
2.4. Let M, N, and P denote the centroids of clusters of points in five-space with, respectively, m = 5, n = 15, and p = 6 members. Find the distance² between P and the centroid of the cluster formed by uniting clusters [M] and [N]. The coordinates of points M, N, and P are as follows:

          M     N     P
          1    −5    10
         −8     2    11
          8     5    −2
          9     1    −5
          7     4    −6
2.5. What is the within-cluster dispersion of the swarm of five points whose coordinates are given by X in Exercise 2.1?
2.6. For the two points in six-space whose coordinates are given by the columns of the following 6 × 2 matrix, find: (a) the chord distance; (b) the geodesic metric; (c) the angular separation between the two points.

          3    −1
          4    −3
         −2    −4
          1     0
          5     4
2.7. Obtain Jaccard's and Sørensen's indices of similarity (J and S) for the following three pairs of quadrats in Data Matrix #5 (Table 2.13, page 62): (a) quadrats 1 and 2; (b) quadrats 3 and 4; (c) quadrats 2 and ... . Prove that S must always exceed J except when S = J = 1.
[Q]
[M] [N] [P]
~ ~ ~
M1 M2 N1 N2 N3 PI P2
3
3
-1
8
7
9
5
7
5
6
-3
-2
o
-1)
-1
[! o
8
6
8
9 6 1
4
-2
Chapter Three
3.1. INTRODUCTION
This is a different order from that given in the preceding paragraph. Both lists are ordinations of the data in Data Matrix #7, and the fact that they are different shows that the result of an ordination depends on the method chosen for assigning scores to the quadrats or, equivalently, on the weights assigned to the different species.

             Quadrat
             1    2    3    4    5    6
Species 1   50   20   25   45   15   60
Species 2   11   16   20   33   14   17
Species 3   45   65   23   49   31   37
Species 4   12   82   15   23   70   10

The various ordination techniques described in Chapter 4 are all procedures for determining these weights objectively, instead of choosing them arbitrarily and subjectively as was done in the preceding examples.

VECTOR AND MATRIX MULTIPLICATION
     | x11  x12  x13  x14  x15  x16 |
X =  | x21  x22  x23  x24  x25  x26 |
     | x31  x32  x33  x34  x35  x36 |
     | x41  x42  x43  x44  x45  x46 |
The element in the ith row and jth column is written x_ij. Notice that the first subscript in x_ij is the number of the row in which the element appears, and the second subscript is the number of the column; this rule is invariable and is adhered to by all writers. In this book, and in most but not all ecological writing, data matrices are so arranged that the rows represent species and the columns represent sampling plots or quadrats. Therefore, when this system is used, x_ij means the amount of species i in quadrat j.

The single symbol X denotes the whole matrix, made up in this case of 24 distinct numbers. It does not denote a single number (a scalar). Boldface type is used for X to show that it is a matrix, not a scalar.
Now let us write y′ for the list of six scores; y′ is a matrix with only one row, otherwise known as a row vector. It is

y′ = (y1, y2, y3, y4, y5, y6)

(a matrix of one row and, in this case, six columns, with a single element in each column). The prime shows that it is a row vector. If the same array of elements were written as a column instead of a row, they would form a column vector and be denoted by y (without a prime).
Finally, let us write u' for the list of coefficients by which each element in a column of X is to be multiplied to yield an element of y'; u' is a row vector, and the number of elements it contains must obviously be the same as the number of elements in a column of X. (The number of elements in a column of X is, of course, the number of rows of X.) Hence

u' = (u1, u2, u3, u4).

Recall again the example in Section 3.1. The first list of quadrat scores, namely,

(118, 183, 83, 150, 130, 124) = y1',

was obtained by adding the elements in each column of X. That is, the score for the jth quadrat was given by

yj = x1j + x2j + x3j + x4j = Σ_{i=1}^{4} xij,

which amounts to taking

u' = (1, 1, 1, 1).

The elements in the second list of scores, namely,

(335, 340, 221, 400, 234, 375) = y2',

were obtained in the same way with u' = (4, 3, 2, 1).
Forming each score as a weighted sum of a column's elements amounts to multiplying the matrix X on the left by the row vector (or one-row matrix) u'. Thus y' is the product of u' and X. Written as an equation, this is

u'X = y'.   (3.1)
Written out in full, this is

(u1, u2, u3, u4) (x11 x12 x13 x14 x15 x16
                  x21 x22 x23 x24 x25 x26
                  x31 x32 x33 x34 x35 x36
                  x41 x42 x43 x44 x45 x46) = (y1, y2, y3, y4, y5, y6).
This extended version of the equation u'X = y' is itself a condensed form of six separate equations, of which the first and last are

y1 = u1 x11 + u2 x21 + u3 x31 + u4 x41

and

y6 = u1 x16 + u2 x26 + u3 x36 + u4 x46.

Thus the rule for calculating each of the six elements of y', that is, for calculating the elements in the product u'X, is the formula already given:

yj = Σ_{i=1}^{4} ui xij   for j = 1, 2, ..., 6.
The same rule holds for a data matrix of any size: if X has s rows and n columns, then u'X = y' is the n-element row vector in which

yj = Σ_{i=1}^{s} ui xij   for j = 1, 2, ..., n.
Let us rewrite Equation (3.1) with the sizes of the three matrices shown below them:

   u'       X    =    y'.
(1 × s)  (s × n)   (1 × n)
An obvious way is to carry out the described procedure twice over, using two different vectors of weighting coefficients, u1' and u2', say. Two vectors of scores, y1' and y2', are obtained, each with n elements. Thus each of the n points now has two scores, which can be treated as the coordinates of a point in two-dimensional space, enabling the data to be plotted in a two-dimensional scatter diagram.
To illustrate, consider Data Matrix #7 again. It has already been condensed to a one-dimensional list of scores in two different ways. The first condensation used the vector (1, 1, 1, 1) = u1', and gave the result (118, 183, 83, 150, 130, 124) = y1'. The second used the vector (4, 3, 2, 1) = u2' and gave the result (335, 340, 221, 400, 234, 375) = y2'. It is straightforward to combine these two sets of results. We let quadrat 1 be represented by the pair of scores (118, 335), quadrat 2 by the pair of scores (183, 340), and so on. Each pair of scores is treated as a pair of coordinates and the points are plotted in a two-dimensional coordinate frame, with the first scores measured along the abscissa and the second scores along the ordinate. The result, a two-dimensional ordination, is shown in Figure 3.1.
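The two condensations and their pairing can be sketched numerically. This is a minimal illustration in NumPy (a modern tool, not part of the book's own apparatus):

```python
import numpy as np

# Data Matrix #7: rows are species, columns are quadrats.
X = np.array([[50, 20, 25, 45, 15, 60],
              [11, 16, 20, 33, 14, 17],
              [45, 65, 23, 49, 31, 37],
              [12, 82, 15, 23, 70, 10]])

u1 = np.array([1, 1, 1, 1])   # equal weights for the four species
u2 = np.array([4, 3, 2, 1])   # a second, unequal weighting

y1 = u1 @ X                   # -> [118 183  83 150 130 124]
y2 = u2 @ X                   # -> [335 340 221 400 234 375]

# Pair the scores: quadrat 1 becomes the point (118, 335), and so on.
coords = np.column_stack([y1, y2])
print(coords[0])              # [118 335]
```

Each row of `coords` is one quadrat's position in the two-dimensional ordination.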
Figure 3.1. The data points of Data Matrix #7 after the transformation of their original coordinates in four-space, given in Table 3.1, to coordinates in two-space. The solid dots show the two-dimensional ordination of the original data described in the text. The hollow half-dots on the y1 and y2 axes show the two one-dimensional ordinations of the data yielded by vectors u1' and u2', respectively.
The two ordinations satisfy

u1'X = y1'   and   u2'X = y2',

and the two equations can be combined into the single matrix equation

UX = Y.   (3.2)

Here U has two rows, the first being u1' and the second u2'. That is, U is the 2 × 4 matrix

U = (1 1 1 1    (u11 u12 u13 u14
     4 3 2 1) =  u21 u22 u23 u24),

and Y is the 2 × 6 matrix whose rows are the two lists of scores:

Y = (y11 y12 y13 y14 y15 y16
     y21 y22 y23 y24 y25 y26).
Again showing the sizes of the matrices,

   U        X    =    Y.
(2 × s)  (s × n)   (2 × n)

The factors are so ordered (UX, not XU) that the number of columns in the first factor is equal to the number of rows in the second; both are s. The product Y has the same number of rows as the first factor, and the same number of columns as the second factor.
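The conformability rule can be checked directly; a brief NumPy sketch (not part of the book):

```python
import numpy as np

X = np.array([[50, 20, 25, 45, 15, 60],
              [11, 16, 20, 33, 14, 17],
              [45, 65, 23, 49, 31, 37],
              [12, 82, 15, 23, 70, 10]])
U = np.array([[1, 1, 1, 1],
              [4, 3, 2, 1]])   # rows are u1' and u2'

Y = U @ X
# (2 x s)(s x n) -> (2 x n): columns of U match rows of X (both s = 4).
print(Y.shape)   # (2, 6)
```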
We can generalize further. So far we have discussed one-dimensional ordination and two-dimensional ordination. There is no need to stop at two dimensions: using a p × s matrix of weights U gives a p-dimensional ordination,

   U        X    =    Y.   (3.3)
(p × s)  (s × n)   (p × n)
The (i, j)th element of Y is

yij = ui1 x1j + ui2 x2j + ··· + uis xsj = Σ_{r=1}^{s} uir xrj.

In words, the (i, j)th element of Y is the sum of the pairwise products of the elements in the ith row of the first factor, U, and the jth column of the second factor, X.

Equivalently, as was just done, one can think of each of the p rows of U as a row vector u', and the corresponding row of Y as a row vector y'. One then performs the multiplication u'X = y', p separate times. Finally, the p vectors y', each with n elements, are stacked on top of one another to give the p × n matrix Y.
Linear Transformations
In what follows, a matrix of size s × n, that is, with s rows and n columns, is called an s × n matrix.

It was shown in Equation (3.3) that when an s × n data matrix X is premultiplied by a p × s matrix U, the product Y is a p × n matrix. Now matrix X specifies the locations of n points in s-dimensional space (s-space for short). Indeed, each column of X is a list of the s coordinates of one of the points. Likewise, Y specifies the locations of n points in p-space; each of its columns is a list of the p coordinates of one of the points.

We can, therefore, regard Y as an altered form of X. Both matrices amount to instructions for mapping the same swarm of n points. X maps them in s-space; Y maps the same points in p-space. Therefore, if p < s, the p-dimensional swarm of points whose coordinates are given by the columns of Y is a "compressed" version of the original s-dimensional swarm of points whose coordinates were given by the columns of X. In other words, premultiplying X by the p × s matrix U has the effect of condensing the data and, inevitably, of obliterating some of the information the original data contained.

Now suppose that p = s or, equivalently, that U is an s × s matrix (a square matrix). Premultiplying X by U no longer condenses the data since the product Y is, like the original X, an s × n matrix. But the premultiplication does affect the shape of the swarm represented by X, and it is interesting to see how a very simple swarm is affected, geometrically, by a variety of different versions of the "transforming" matrix U.
To make the demonstration as clear as possible, we shall put

X = (1 10  1 10
     1  1 10 10).

Thus X is a 2 × 4 matrix representing a swarm of n = 4 points in s = two-space. The points are at the corners of a square (see Figure 3.2a).

We now evaluate UX using several different Us. The numerical equations are given in the following; X is written out in full only in the first equation and is left as the symbol X subsequently. The first factor in each equation is the matrix U whose effect is being examined. The results are plotted in Figure 3.2. All the transformations illustrated have their counterparts in spaces of more than two dimensions, of course, but these are difficult to draw.
Figure 3.2. (a) The four data points of the matrix X and also of IX = X (see text). (b)-(f) The same data points after transformation by the five different 2 × 2 matrices given in the text. The lines joining points A, B, C, and D have been drawn to emphasize the shape of the "swarm" of four points.
Had U been the matrix

U = (3 0
     0 3),

the original square would have remained a square but with sides three times as long.
(c) (1.5 0.1     (1.6 15.1  2.5 16
    (0.9 1.0) X = 1.9 10.0 10.9 19).

The square is transformed into a parallelogram (Figure 3.2c).
(d) (0.1 1.5     (1.6  2.5 15.1 16
    (1.0 0.9) X = 1.9 10.9 10.0 19).

The parallelogram is of the same shape as in (c) but the corners B and C are interchanged.
(e) (2.0 -0.4     ( 1.6 19.6 -2.0 16.0
    (0.8 -1.0) X = -0.2  7.0 -9.2 -2.0).
(f) ( 0.8 0.6     (1.4  8.6 6.8 14
    (-0.6 0.8) X = 0.2 -5.2 7.4  2).
The original square is still a square and its size is unchanged, but it has been rotated. A matrix U that has this effect is known as orthogonal.

If s were 1, the equation

   U        X    =    Y
(s × s)  (s × n)   (s × n)

would be reduced to

ux' = y',

where u is a scalar (an ordinary number) and x' and y' are both n-element row vectors (equivalently, 1 × n matrices).
The orthogonal matrix used in (f) was

U = ( 0.8 0.6
     -0.6 0.8)

(see Figure 3.4).
Figure 3.3. (a) Relative to the axes shown as solid lines, the points A, B, C, and D have coordinates given by the columns of

X = (1 10  1 10
     1  1 10 10).
Obviously, OR = OM + MR. It is seen that

OM = OA cos θ = x1 cos θ

and

MR = MN + NR
   = AN sin θ + NP sin θ
   = (AN + NP) sin θ
   = AP sin θ = x2 sin θ.
Figure 3.4. Illustrating the conversion of the coordinates of point P relative to the x-axes to coordinates relative to the y-axes.
Therefore,

y1 = x1 cos θ + x2 sin θ.   (3.4a)

Exactly analogous arguments (which the reader should check) show that

y2 = -x1 sin θ + x2 cos θ.   (3.4b)

Thus the pair of Equations (3.4a) and (3.4b) give the y-coordinates of the point in terms of the x-coordinates and the angle θ. Writing the pair of equations as a single equation representing the equality of two two-element column vectors gives

(y1)   ( cos θ  sin θ) (x1)
(y2) = (-sin θ  cos θ) (x2).

θ is the angle through which the axes are rotated. In the example on page 94, θ = 36.87°, whence cos θ = 0.8 and sin θ = 0.6.
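The rigidity of such a rotation is easy to confirm numerically; a NumPy sketch (not part of the book), using the angle just quoted:

```python
import numpy as np

theta = np.arctan2(0.6, 0.8)    # the angle with cos = 0.8, sin = 0.6 (36.87 deg)
U = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

X = np.array([[1, 10, 1, 10],
              [1, 1, 10, 10]], dtype=float)
Y = U @ X

# A rigid rotation leaves every inter-point distance unchanged:
d_before = np.linalg.norm(X[:, 0] - X[:, 1])
d_after = np.linalg.norm(Y[:, 0] - Y[:, 1])
print(round(d_before, 6), round(d_after, 6))   # 9.0 9.0
```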
An important property of orthogonal matrices must now be described. First, a definition is needed: the transpose of any matrix is the matrix obtained by writing its rows as columns or, equivalently, its columns as rows. For example, the transpose of the 2 × 3 matrix A, where

A = (a11 a12 a13
     a21 a22 a23),

is the 3 × 2 matrix

A' = (a11 a21
      a12 a22
      a13 a23).

For the rotation matrix U we therefore have

UU' = ( cos θ  sin θ) (cos θ  -sin θ)   (1 0
      (-sin θ  cos θ) (sin θ   cos θ) = (0 1),

or

UU' = I   (3.6)

since cos²θ + sin²θ = 1.
Equation (3.6) is true, in general, of all orthogonal matrices of any size. Orthogonal matrices are always square. Before discussing the general s × s orthogonal matrix, it is desirable to make a small change in the symbols. The required change is shown in Figure 3.5. It is seen that the angles have been relabeled thus:
Figure 3.5. The angles between the x-axes and the y-axes. (a) The y1-axis makes angles θ11 and θ12 with the x1- and x2-axes. (b) The y2-axis makes angles θ21 and θ22 with the x1- and x2-axes.
It is obvious from the figure that θ11 = θ22 is the same as the original θ; also that

θ12 = 90° - θ   or   θ = 90° - θ12,

and

θ21 = 90° + θ   or   θ = θ21 - 90°.

The reason for giving every angle a separate symbol becomes clear when we discuss the s-dimensional case.

Consider, now, how the change in symbols affects U. The old and new versions are as follows:
u11 =  cos θ = cos θ11;
u12 =  sin θ = sin(90° - θ12) = cos θ12;
u21 = -sin θ = -sin(θ21 - 90°) = cos θ21;
u22 =  cos θ = cos θ22.

Thus every element of U is the cosine of the angle between one of the y-axes and one of the x-axes: uij = cos θij.
3.3. THE PRODUCT OF A DATA MATRIX AND ITS TRANSPOSE

In this section we consider matrices of the form XX' and X'X. These are the matrix products formed when a data matrix X is postmultiplied and premultiplied, respectively, by the transpose of itself. If X is an s × n matrix, then XX' is an s × s matrix and X'X is an n × n matrix.
TABLE 3.2. DATA MATRIX #8, IN RAW FORM X, AND CENTERED FORM XR.

X = ( 4  8 10 14       x̄1 = (1/4) Σj x1j = 9
     17 11  3  1       x̄2 = (1/4) Σj x2j = 8
      2  5  5  4)      x̄3 = (1/4) Σj x3j = 4

XR = (-5 -1  1  5
       9  3 -5 -7
      -2  1  1  0)

The SSCP matrix R and the covariance matrix (1/n)R:

R = XR XR' = (-5 -1  1  5   (-5  9 -2      ( 52 -88  10
               9  3 -5 -7   (-1  3  1   =   -88 164 -20
              -2  1  1  0)  ( 1 -5  1        10 -20   6)
                            ( 5 -7  0)

(1/n)R = (var(x1)     cov(x1, x2)  cov(x1, x3)     ( 13  -22  2.5
          cov(x2, x1) var(x2)      cov(x2, x3)  =   -22   41   -5
          cov(x3, x1) cov(x3, x2)  var(x3))         2.5   -5  1.5)
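The computations in Table 3.2 can be reproduced step by step. A NumPy sketch (a modern convenience, not the book's own method), using the divisor n as in the text:

```python
import numpy as np

# Data Matrix #8: 3 species (rows) x 4 quadrats (columns)
X = np.array([[4, 8, 10, 14],
              [17, 11, 3, 1],
              [2, 5, 5, 4]], dtype=float)
n = X.shape[1]

XR = X - X.mean(axis=1, keepdims=True)   # center each species (row)
R = XR @ XR.T                            # SSCP matrix
cov = R / n                              # covariance matrix, divisor n

print(R)
# [[ 52. -88.  10.]
#  [-88. 164. -20.]
#  [ 10. -20.   6.]]
```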
Notice that if the right-hand side were divided by n,* it would give the variance of the observations xi1, xi2, ..., xin, that is, the variance of the variable "quantity of species i per quadrat." It should be recalled that the variance of a variable is the average of the squared deviations of the observations from their mean. In symbols, the variance of the quantity of species i per quadrat is

var(xi) = (1/n) Σ_{j=1}^{n} (xij - x̄i)².

The standard deviation of this variable, say σi, is the square root of the variance.

*n is used as divisor since we are assuming that the n quadrats examined constitute the total population of interest. If the quadrats are a sample of size n from a larger "parent population" for which the variance is to be estimated, the divisor would be n - 1.
The (i, i)th element of R is the product of the ith row of XR and the ith column of XR', that is,

(xi1 - x̄i, xi2 - x̄i, ..., xin - x̄i) (xi1 - x̄i
                                      xi2 - x̄i
                                      ...
                                      xin - x̄i) = Σ_{j=1}^{n} (xij - x̄i)²,   (3.8a)

which is n times the variance. Likewise, the (h, i)th element, with h ≠ i, is n times the covariance, or

Σ_{j=1}^{n} (xhj - x̄h)(xij - x̄i).   (3.8b)

Observe that when, as here, a matrix is multiplied by a scalar (in this case the scalar is n), it means that each individual element of the matrix is multiplied by the scalar. Thus the (h, i)th term of R is n cov(xh, xi). R is a symmetric matrix since, as is obvious from (3.8b), cov(xh, xi) = cov(xi, xh).
In the raw data matrix X previously discussed, the elements are the measured quantities of the different species in each of a sample of quadrats or other sampling units. Often, it is either necessary or desirable to standardize these data, that is, rescale the measurements to a standard scale.

Standardization is necessary if different species are measured by different methods in noncomparable units. For example, in vegetation sampling it may be convenient to use cover as the measure of quantity for some species, and numbers of individuals for other species; there is no objection to using incommensurate units such as these provided the data are standardized before analysis.
Standardization is sometimes desirable even when the same units (e.g., numbers of individuals) are used for the measurement of all species quantities. It has the effect of weighting the species according to their rarity, so that rare species have as big an influence as common ones on the results of an ordination. Sometimes this is desirable, sometimes not. One may or may not wish to prevent the common species from dominating an analysis. It is a matter of ecological judgment. A thorough discussion of the pros and cons of data standardization has been given by Noy-Meir, Walker, and Williams (1975).
The usual way of standardizing, or rescaling, the data is to divide the observed measurements on each species, after they have been centered (transformed to deviations from the respective species means), by the standard deviation of the species quantities. Thus the element xij in X is replaced by

zij = (xij - x̄i) / √var(xi) = (xij - x̄i) / σi,

say.
We now denote the standardized matrix by ZR, and examine the product ZR ZR' = SR, say. (ZR' is the transpose of ZR.)

The (h, i)th element of SR is the product of the hth row of ZR postmultiplied by the ith column of ZR'. Thus it is

Σ_{j=1}^{n} [(xhj - x̄h)/σh][(xij - x̄i)/σi] = n cov(xh, xi)/(σh σi) = n rhi,

where rhi is the correlation coefficient between species h and species i in the n quadrats.

Observe that the (i, i)th element of SR is n. This follows from the fact that cov(xi, xi) = var(xi).

The correlation matrix is obtained by dividing every element of SR by n. Thus
(1/n)SR = ( 1       -0.9529   0.5661
           -0.9529   1       -0.6376
            0.5661  -0.6376   1     ).
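Standardization and the resulting correlation matrix can be sketched in NumPy (not part of the book); note the population standard deviation, divisor n, is used throughout:

```python
import numpy as np

X = np.array([[4, 8, 10, 14],
              [17, 11, 3, 1],
              [2, 5, 5, 4]], dtype=float)
n = X.shape[1]

XR = X - X.mean(axis=1, keepdims=True)
sigma = X.std(axis=1, keepdims=True)   # population sd (divisor n)
ZR = XR / sigma                        # the standardized matrix
corr = (ZR @ ZR.T) / n                 # the correlation matrix (1/n)SR

print(np.round(corr, 4))
# [[ 1.     -0.9529  0.5661]
#  [-0.9529  1.     -0.6376]
#  [ 0.5661 -0.6376  1.    ]]
```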
The correlation matrix can also be obtained directly from the covariance matrix by pre- and postmultiplying it by the diagonal matrix of reciprocal standard deviations:

(1/n)SR = (1/√13    0      0      ( 13 -22  2.5   (1/√13    0      0
          (  0    1/√41    0      (-22  41   -5   (  0    1/√41    0
          (  0      0    1/√1.5)  (2.5  -5  1.5)  (  0      0    1/√1.5).

In the general case this is the product (1/n)BRB where B is the diagonal matrix whose (i, i)th element is 1/√var(xi) = 1/σi. Notice that when three matrices are to be multiplied (e.g., when the product LMN is to be found) it makes no difference whether one first obtains LM and then postmultiplies it by N, or first obtains MN and then premultiplies it by L. All that matters is that the order of the factors be preserved. The rule can be extended to the evaluating of a matrix product of any number of factors.
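The associativity rule is easy to verify on arbitrary conformable matrices; a small NumPy sketch (not part of the book):

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.random((2, 3))
M = rng.random((3, 4))
N = rng.random((4, 2))

# (LM)N equals L(MN); only the left-to-right order of the factors matters.
same = np.allclose((L @ M) @ N, L @ (M @ N))
print(same)   # True
```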
TABLE 3.4. THE TRANSPOSE X' OF DATA MATRIX #8, AND ITS ROW-CENTERED (QUADRAT-CENTERED) FORM XQ.

X' = ( 4 17  2      x̄1 = 7.67
       8 11  5      x̄2 = 8.00
      10  3  5      x̄3 = 6.00
      14  1  4)     x̄4 = 6.33

XQ = (-3.67  9.33 -5.67
       0     3    -3
       4    -3    -1
       7.67 -5.33 -2.33)

The SSCP matrix Q and the covariance matrix (1/s)Q:

Q = XQ XQ' = (132.67  45   -37   -64.67
               45     18    -6    -9
              -37     -6    26    49
              -64.67  -9    49    92.67)

(1/s)Q = ( 44.22  15   -12.33 -21.56
           15      6    -2     -3
          -12.33  -2     8.67  16.33
          -21.56  -3    16.33  30.89)
(If a species is absent from a quadrat, it is treated as "present" with quantity zero.)

Table 3.4 (which is analogous to Table 3.2) shows X', the transpose of Data Matrix #8, and its row-centered (quadrat-centered) form XQ in the upper panel. In the lower panel is the SSCP matrix Q = XQ XQ' (here XQ' is the transpose of XQ) and the covariance matrix (1/s)Q.

The (j, j)th element of (1/s)Q is the variance of the species quantities in quadrat j. The (j, k)th element is the covariance of the species quantities in quadrats j and k. These elements are denoted, respectively, by var(xj) and cov(xj, xk); the two symbols j and k both refer to quadrats. Notice that var(xj) could also be defined as the variance of the elements in the jth row of XQ, and cov(xj, xk) as the covariance of the elements in its jth and kth rows.
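The Q-type computations of Table 3.4 follow exactly the same pattern as the R-type ones, applied to X'. A NumPy sketch (not part of the book):

```python
import numpy as np

X = np.array([[4, 8, 10, 14],
              [17, 11, 3, 1],
              [2, 5, 5, 4]], dtype=float)
s = X.shape[0]

Xt = X.T                                   # rows are now quadrats
XQ = Xt - Xt.mean(axis=1, keepdims=True)   # center each quadrat
Q = XQ @ XQ.T                              # n x n SSCP matrix

print(np.round(Q / s, 2))                  # the covariance matrix (1/s)Q
```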
The correlation matrix is

(1/s)SQ = ( 1       0.9209 -0.6300 -0.5832
            0.9209  1      -0.2774 -0.2204
           -0.6300 -0.2774  1       0.9983
           -0.5832 -0.2204  0.9983  1     ).
Table 3.6 shows a tabular comparison of the two procedures just dis-
cussed. These procedures constitute the basic operations in an R-type and a
Q-type analysis. It is for this reason that the respective SSCP matrices have
been denoted by R and Q.
TABLE 3.6. COMPARISON OF THE BASIC MATRICES IN R-TYPE AND Q-TYPE ANALYSES.*

Centered matrix:
  R-type: XR is matrix X centered by rows (species). Its (i, j)th term is xij - x̄i, where x̄i = (1/n) Σ_{j=1}^{n} xij.
  Q-type: XQ is matrix X' centered by rows (quadrats). Its (j, i)th term (in row j, column i) is xij - x̄j, where x̄j = (1/s) Σ_{i=1}^{s} xij.

SSCP matrix:
  R-type: R = XR XR', where XR' is the transpose of XR. R is an s × s matrix; each of its elements is a sum of n squares or cross-products.
  Q-type: Q = XQ XQ', where XQ' is the transpose of XQ. Q is an n × n matrix; each of its elements is a sum of s squares or cross-products.

Covariance matrix:
  R-type: (1/n)R. The (i, i)th element var(xi) is the variance of the elements in the ith row of XR (quantities of species i). The (h, i)th element cov(xh, xi) is the covariance of the hth and ith rows of XR (quantities of species h and i).
  Q-type: (1/s)Q. The (j, j)th element var(xj) is the variance of the elements in the jth row of XQ (quantities in quadrat j). The (j, k)th element cov(xj, xk) is the covariance of the jth and kth rows of XQ (quantities in quadrats j and k).

Standardized matrix:
  R-type: ZR. Its (i, j)th term is (xij - x̄i)/√var(xi) = (xij - x̄i)/σi, where σi is the standard deviation of the quantities of species i.
  Q-type: ZQ. Its (j, i)th term is (xij - x̄j)/√var(xj) = (xij - x̄j)/σj, where σj is the standard deviation of the quantities in quadrat j.

Correlation matrix:
  R-type: (1/n)SR = (1/n) ZR ZR'. The (h, i)th element of (1/n)SR is rhi, the correlation coefficient between the hth and ith rows of XR (i.e., between species h and species i). The matrix has 1s on its main diagonal since rhh = 1 for all h.
  Q-type: (1/s)SQ = (1/s) ZQ ZQ'. The (j, k)th element of (1/s)SQ is rjk, the correlation coefficient between the jth and kth rows of XQ (i.e., between quadrats j and k). The matrix has 1s on its main diagonal since rjj = 1 for all j.

*Symbols h and i refer to species; symbols j and k refer to quadrats.
There is a shortcut for computing R that does not require centering the data first. Write X̄ for the s × n matrix each of whose n columns is the column of row means (x̄1, x̄2, ..., x̄s)'. Hence

X̄X̄' = (n x̄1²    n x̄1 x̄2  ···  n x̄1 x̄s
        n x̄2 x̄1  n x̄2²    ···  n x̄2 x̄s
        ···
        n x̄s x̄1  n x̄s x̄2  ···  n x̄s²).

Recall that the (h, i)th element of R is

Σ_{j=1}^{n} (xhj - x̄h)(xij - x̄i).
TABLE 3.7. COMPUTATION OF THE SSCP MATRIX R FOR DATA MATRIX #8, USING THE RELATION R = XX' - X̄X̄'.

XX' = ( 4  8 10 14   ( 4 17  2     (376 200 154
       17 11  3  1   ( 8 11  5  =   200 420 108
        2  5  5  4)  (10  3  5      154 108  70)
                     (14  1  4)

X̄X̄' = (324 288 144
        288 256 128
        144 128  64)

R = XX' - X̄X̄' = ( 52 -88  10
                  -88 164 -20
                   10 -20   6)

The (h, i)th element of R computed this way is

Σ_{j=1}^{n} xhj xij - n x̄h x̄i.
We now show that these two expressions are identical. In what follows, all sums are from j = 1 to n. Multiplying the factors in brackets in the first expression shows that

Σ (xhj - x̄h)(xij - x̄i) = Σ xhj xij - Σ x̄h xij - Σ x̄i xhj + Σ x̄h x̄i.

Now note that Σ x̄h xij = x̄h Σ xij and Σ x̄i xhj = x̄i Σ xhj, since x̄h and x̄i are constant with respect to j (i.e., are the same for all values of j). Similarly, Σ x̄h x̄i = n x̄h x̄i since it is the sum of n constant terms x̄h x̄i. Also,

Σ xhj = n x̄h   and   Σ xij = n x̄i.

Then

Σ (xhj - x̄h)(xij - x̄i) = Σ xhj xij - n x̄h x̄i - n x̄i x̄h + n x̄h x̄i = Σ xhj xij - n x̄h x̄i,

as was to be proved.
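The identity just proved can be spot-checked numerically on Data Matrix #8; a NumPy sketch (not part of the book):

```python
import numpy as np

X = np.array([[4, 8, 10, 14],
              [17, 11, 3, 1],
              [2, 5, 5, 4]], dtype=float)
n = X.shape[1]
xbar = X.mean(axis=1, keepdims=True)

R_centered = (X - xbar) @ (X - xbar).T       # sum of centered cross-products
R_shortcut = X @ X.T - n * (xbar @ xbar.T)   # XX' minus n*xbar*xbar'

print(np.allclose(R_centered, R_shortcut))   # True
```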
This section resumes the discussion in Section 3.2, where it was shown how the pattern of a swarm of data points can be changed by a linear transformation. It should be recalled that, for illustration, a 2 × 4 data matrix was considered. The swarm of four points it represented were the vertices of a square. Premultiplication of the data matrix by various 2 × 2 matrices brought about changes in the position or shape of the swarm; see Figure 3.2. When the transforming matrix was orthogonal it caused a rigid rotation of the swarm around the origin of the coordinates (page 94); and when the transforming matrix was diagonal it caused a change in the scales of the coordinate axes.

Clearly, one can subject a swarm of data points to a sequence of transformations one after another, as is demonstrated in the following. The relevance of the discussion to ecological ordination procedures will become clear subsequently.
Here is an example. As in Section 3.2, we use a 2 × 4 data matrix X representing a swarm of four points in two-space; this time the data swarm consists of the vertices of a rectangle (see Figure 3.6a). The first transformation is to cause a clockwise rotation of the rectangle around the origin through an angle of 25°. The orthogonal matrix required to produce this rotation is (see page 101)

U = ( cos 25°  sin 25°    ( 0.9063  0.4226
     -sin 25°  cos 25°) =  -0.4226  0.9063).
°
produce this rotat10n 1s (see page 101) ma nx required
::C2 :X:2
(a) (b)
5 5
D 5
:x:, 10
x,
X2
~
(d)
10 (e) 10
5 10 15 :x:,
5 'º
. . ( d) U' AVX· The lines
~e 16· The data swarms represented by: (a) X; ( b) UX; (e) A~, The eleroents of U and
ar:g .the ~oints are put in to make the shapes of the swarms apparen ·
given m the text.
The rotated data UX are plotted in Figure 3.6b. The second transformation rescales the coordinate axes: UX is premultiplied by the diagonal matrix

Λ = (2.4  0
      0  1.6).

The newly transformed data, ΛUX, are plotted in Figure 3.6c. The shape of the swarm has changed from a rectangle to a parallelogram.

The third and last transformation consists in rotating the parallelogram back, counterclockwise, through 25°. This is achieved by premultiplying ΛUX by U', the transpose of U; the result, U'ΛUX, is plotted in Figure 3.6d.
The three transformations together are equivalent to premultiplying X by the single matrix A = U'ΛU. Since

Λ = (λ1  0  ···  0
      0  λ2 ···  0
      ···
      0   0 ···  λs),

the eigenvalues of A are λ1, λ2, ..., λs. The eigenvalues of a matrix are denoted by λs by long-established custom; likewise, the matrix of eigenvalues is always denoted by Λ. This is why Λ was used for the diagonal matrix that rescaled the axes in the second of the three transformations performed previously.
The rows of U, which are s-element row vectors, are known as the eigenvectors of A (also called the latent vectors, or characteristic vectors, of A).
In the preceding numerical example we chose the elements of U (hence of U') and Λ, and then obtained A by forming the product U'ΛU. Therefore, we knew, because we had chosen them, the eigenvalues and eigenvectors of this A in advance. The eigenvalues are λ1 = 2.4 and λ2 = 1.6, and the eigenvectors are

u1' = (0.9063, 0.4226)   and   u2' = (-0.4226, 0.9063).

Now write

A = U'ΛU

and premultiply both sides by U to give

UA = UU'ΛU.

Since UU' = I, this is

UA = IΛU = ΛU.
Let us write this out in full for the s = 2 case. For U and A we write each separate element in the customary way, using the corresponding lowercase letter subscripted to show the row and column of the element. For Λ we use the knowledge we already possess, namely, that it is a diagonal matrix. Thus

(u11 u12  (a11 a12    (λ1  0  (u11 u12
 u21 u22) (a21 a22) = ( 0 λ2) (u21 u22),

that is,

(u11 a11 + u12 a21   u11 a12 + u12 a22    (λ1 u11  λ1 u12
(u21 a11 + u22 a21   u21 a12 + u22 a22) = (λ2 u21  λ2 u22),

which states the equality of two 2 × 2 matrices. Not only does the left side (as a whole) equal the right side (as a whole), but it follows also that any row of the matrix on the left side equals the corresponding row of the matrix on the right side. Thus considering only the top row,

(u11 a11 + u12 a21,  u11 a12 + u12 a22) = (λ1 u11,  λ1 u12),

an equation having two-element row vectors on both sides. This is the same as the more concise equation

u1'A = λ1 u1',
in which u1' is the two-element row vector constituting the first row of U, and λ1 is the only nonzero element in the first row of Λ. Hence λ1 and u1' together are an eigenvalue of A and its corresponding eigenvector.

Hotelling's method for obtaining the numerical values of λ1 and the elements of u1', when the elements of A are given, proceeds as follows. The steps are illustrated using

A = (2.2571 0.3064
     0.3064 1.7429).

Step 1. Choose arbitrary trial values for the elements of u1'. Denote this trial vector by w(0)'. It is convenient to start with w(0)' = (1, 1).
Step 2. Form the product w(0)'A. Thus

(1, 1)A = (2.5635, 2.0493).

Step 3. Factor out the first element of the product, so that the trial vector again has 1 as its first element:

(2.5635, 2.0493) = 2.5635 (1, 0.7994) = l1 w(1)',

say. Now w(1)' is to be used in place of w(0)' as the next trial vector.

Step 4. Do steps 2 and 3 again with w(1)' in place of w(0)', and obtain l2 and w(2)'. Continue in this way until successive trial vectors are the same to the accuracy required. The final vector and factor are

w(F)' = (1, 0.4663)   and   lF = λ1 = 2.4000.
We now wish to obtain u1' from w(F)'. Recall (page 102) that UU' = I or, what comes to the same thing, that the sum of squares of the elements in any row of U is 1. Hence u1' is obtained from w(F)' by dividing each element in w(F)' by the square root of the sum of squares of its elements. That is,

u1' = ( 1/√(1² + 0.4663²) ,  0.4663/√(1² + 0.4663²) ) = (0.9063, 0.4226).
[Table: the successive cycles of the iteration, listing for each cycle number the trial eigenvector w(i)' and the product w(i)'A = l(i+1) w(i+1)'; the entries are not legible in the scan.]
Recall that a 2 × 2 orthogonal matrix has the form

U = ( cos θ  sin θ
     -sin θ  cos θ),

and we have just obtained u1', the first row of U, which is

u1' = (0.9063, 0.4226).

Therefore,

U = ( 0.9063  0.4226
     -0.4226  0.9063).
Recall that A = U'ΛU. Premultiply both sides by U and then postmultiply both sides by U'. Hence

UAU' = UU'ΛUU' = Λ,

since UU' = U'U = I; numerically,

UAU' = (2.4  0
         0  1.6).

As a larger example, consider the 3 × 3 symmetric matrix

B = (6.0  0.2  2.4
     0.2  5.6 -0.4
     2.4 -0.4  5.2).
Applying Hotelling's method to B, the iterations converge to the trial vector (1, -0.058, 0.854), for which

(1, -0.058, 0.854)B = (8.038, -0.466, 6.864) = 8.038(1, -0.058, 0.854).

Therefore,

λ1 = 8.038,

and dividing each element of the trial vector by

√(1² + (-0.058)² + 0.854²) = 1.316

shows that

u1' = (0.760, -0.044, 0.649).
Now, to find the second eigenvalue and eigenvector, λ2 and u2', proceed as follows. Start by constructing a new matrix B1; it is known as the first residual matrix of B and is given by

B1 = B - λ1 u1 u1'

   = B - 8.038 ( 0.760
               (-0.044) (0.760, -0.044, 0.649)
               ( 0.649

   = (6.0  0.2  2.4    (4.639 -0.269  3.962
     (0.2  5.6 -0.4  - (-0.269  0.016 -0.230
     (2.4 -0.4  5.2)   (3.962 -0.230  3.383)

   = ( 1.361  0.469 -1.562
       0.469  5.584 -0.170
      -1.562 -0.170  1.817).
(Note: this value of B1 is only approximate; for accurate results at the next step, more decimal places would be needed in every entry.)

The values of λ2 and u2' may now be obtained from B1 in exactly the same way as λ1 and u1' were obtained from B. It is found that

λ2 = 5.671   and   u2' = (0.144, 0.984, -0.102).

Finally, since B is a 3 × 3 matrix there is a third eigenvalue-eigenvector pair still to be found, λ3 and u3'. To find them, compute B2, the second residual matrix of B, from the equation

B2 = B1 - λ2 u2 u2'

or, equivalently,

B2 = B - λ1 u1 u1' - λ2 u2 u2',

and operate on B2 in exactly the same way as B and B1 were operated on. It is found that

λ3 = 3.092   and   u3' = (-0.634, 0.171, 0.754).
As a check, recall that UBU' = Λ. Substituting the numerical results just obtained in the left side of this equation gives

UBU' = ( 0.760 -0.044  0.649   (6.0  0.2  2.4   (0.760  0.144 -0.634
       ( 0.144  0.984 -0.102   (0.2  5.6 -0.4   (-0.044 0.984  0.171
       (-0.634  0.171  0.754)  (2.4 -0.4  5.2)  (0.649 -0.102  0.754)

     = ( 8.043 -0.001  0         (8.038  0      0
       (-0.001  5.667  0.001  ≈  (0      5.671  0      = Λ.
       ( 0      0.001  3.091)    (0      0      3.092)

(The inexactness is because only three decimal places were used.)
A useful check is that the sum of the eigenvalues of a matrix equals the sum of its diagonal elements (its trace). Thus, for the 2 × 2 example,

Σ_{i=1}^{2} λi = 2.4 + 1.6 = 4 = 2.2571 + 1.7429,

and, for B,

Σ_{i=1}^{3} λi = 8.038 + 5.671 + 3.092 = 16.801 ≈ 6.0 + 5.6 + 5.2 = 16.8.
Consider the two symmetric matrices F = XX' (of size s × s) and G = X'X (of size n × n). Let λi and ui' be an eigenvalue and eigenvector of F, so that

ui'XX' = λi ui'.   (3.13)

Postmultiplying both sides by X gives

(ui'X)(X'X) = λi (ui'X)   (3.14)

or, equivalently,

(ui'X)G = λi (ui'X).

The factor ui'X on both sides is an n-element row vector. Comparing (3.13) and (3.14), it is evident that λi is an eigenvalue of G as well as of F, and that the corresponding eigenvector of G is either equal to or proportional to ui'X.

Thus the eigenvalues and eigenvectors of G can be derived from those of F, or vice versa. This fact is a great aid to computation, especially if either n or s is very large. Thus suppose n greatly exceeds s. A direct eigenanalysis of the n × n matrix G = X'X would entail very long computations.
For example, let

X = ( 3  8  7
     11  2  8).

Then

F = XX' = (122 105     and   G = X'X = (130  46 109
           105 189)                     ( 46  68  72
                                        (109  72 113).

This result can be checked by evaluating both sides of Equation (3.13) and finding that the eigenvector of G corresponding to the largest eigenvalue of F is proportional to

u1'X = (0.5899, 0.8074) ( 3  8  7   = (10.652, 6.334, 10.589).
                        (11  2  8)
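The shared eigenvalues of F and G can be confirmed numerically; a NumPy sketch (not part of the book; `eigvalsh` is used because both matrices are symmetric):

```python
import numpy as np

X = np.array([[3, 8, 7],
              [11, 2, 8]], dtype=float)
F = X @ X.T        # 2 x 2
G = X.T @ X        # 3 x 3

lF = np.linalg.eigvalsh(F)
lG = np.linalg.eigvalsh(G)
print(np.round(lF, 3))   # the two eigenvalues of F
print(np.round(lG, 3))   # the same two values, plus one (near-)zero
```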
EXERCISES
3.1. Given the three matrices A, B, and C below, what are the following products? (a) AB; (b) BC; (c) AC; (d) CA; (e) CB; (f) BCA; (g) CAB. [The entries of A, B, and C are not legible in the scan.]
3.2. See Figure 3.7. Four data points, A, B, C, and D, have coordinates in two-space given by the data matrix X. [Figure 3.7, the entries of X, and the intervening exercises, the last of which asks for a 2 × 2 correlation matrix, are not legible in the scan.]
3.6. Suppose A = UΛU', where U is orthogonal and Λ is a diagonal matrix whose elements include λ1 = 19.71 and λ4 = 0.02. [Part of this exercise is not legible in the scan.] What is λ2? [Note: there is no need to do an eigenanalysis to answer the question.]
3.9. Show, using symbols, and test with a numerical example of your own devising, that (XY)' = Y'X', where (XY)' is the transpose of XY. Show, likewise, that (ABC)' = C'B'A'. [Note: these results will be needed later.]
Chapter Four

Ordination

4.1. INTRODUCTION
Often one wants to use two or more different species-weighting systems, and then two or more different one-dimensional ordinations are obtained. These separate results can conveniently be combined, as shown in Figure 3.1 in Chapter 3, where two one-dimensional ordinations have been combined to give a two-dimensional ordination in which every point (quadrat) has as its coordinates two scores obtained from the two different weighting systems. Obviously, if one were to use s different weighting systems (where s is the number of species), the result would be an s-dimensional ordination; the swarm of data points would occupy an s-dimensional coordinate frame, and by projecting the points onto each axis in turn, one could recreate each of the s separate one-dimensional ordinations.
Data of this artificial kind make an argument easier to comprehend. Subsequently (page 142) the procedures devised for analyzing "unnatural" data (the corners of a box) are applied to more believable data swarms, that is, ones that are diffuse and irregular.

Consider the eight points at the corners of the box in Figure 4.2a (only seven of the corners are visible in the diagram since, for the sake of clarity, the box is shown as an opaque solid). The center of the box is at the origin of the coordinates. If the points were present alone, without the edges, and were projected onto the plane defined by the x1- and x2-axes (the x1, x2 plane), there would be a confusing pattern of points with no immediately obvious regularity; the same would be true if they were projected onto the x1, x3 plane, or the x2, x3 plane. However, if the box were rigidly rotated around the origin of the coordinates until it was oriented as in Figure 4.2b, and its corner points were then projected onto the three planes, each projection would show the points at the corners of a rectangle.
Figure 4.2. A box (cuboid) plotted in a three-dimensional coordinate frame in two different orientations. In (a) the box is oblique; in (b) it appears after rotation to an orientation that brings its edges parallel with the coordinate axes. The coordinates of the box's corners are denoted by xs in the upper graph and by ys in the lower graph. The width, height, and depth of the box are PQ, QR, and RS, respectively.
In the oblique orientation, on the other hand, each edge has a nonzero projection on all three axes, and all these projections are less than the edges' "true" lengths. We therefore require a rotation of the coordinate frame that will cause each edge of the box to have a nonzero projection (equal to its true length) on one axis only, and zero projections on the other two axes. This requirement specifies in mathematical terms exactly what the desired rotation is to achieve.

To clarify the next stages of the discussion, consider a numerical example and its graphical representation. The box to be rotated is the oblique box in Figure 4.3a.
Figure 4.3. The oblique box of the numerical example: (a) before and (b) after rotation. In both graphs (in contrast to Figure 4.2) the third axis is perpendicular to the plane of the page, so each drawing shows the box projected onto the plane of the other two axes.
TABLE 4.1. THE COORDINATES OF THE CORNER POINTS OF THE BOX IN FIGURE 4.3.ª

The matrix X gives the coordinates of the points at the corners of the oblique box shown in Figure 4.3a:

       A      B      C      D      E      F      G      H
X = ( -4.04  -8.66   1.73  -2.88   2.88  -1.73   8.66   4.04
       7.07   1.41   7.07   1.41  -1.41  -7.07  -1.41  -7.07
       3.26   0     -4.89  -8.16   8.16   0      4.89  -3.26 ).

The matrix Y gives the coordinates of the points as shown in Figure 4.3b, after rotation of the box:

       A   B   C   D   E   F   G   H
Y = (  6   6   6   6  -6  -6  -6  -6
       5   5  -5  -5   5   5  -5  -5
       4  -4   4  -4   4  -4   4  -4 ).

ªThe capital letter above each column is the label of the corresponding point in Figure 4.3a.
Figure 4.3a. The three coordinates of its eight corner points are the elements in the columns of the 3 × 8 data matrix X shown in Table 4.1. (The reason for choosing these coordinates becomes clear later.) In Figure 4.3 (in contrast to Figure 4.2) the three-dimensional graphs have their third axes perpendicular to the plane of the page. Therefore, what the drawing in Figure 4.3a shows is the projection of the oblique box onto the x1, x2 plane.

The size of the box can be found by applying the three-dimensional form of Pythagoras's theorem. Thus d(AB), the length of edge AB which joins the points A = (x11, x21, x31) and B = (x12, x22, x32), is

d(AB) = √[(−4.04 + 8.66)² + (7.07 − 1.41)² + (3.26 − 0)²] = 8.

Likewise,

d(AE) = √[(−4.04 − 2.88)² + (7.07 + 1.41)² + (3.26 − 8.16)²] = 12.

In the same way, it may be found that d(AC) = 10.
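These lengths can be checked numerically. The sketch below assumes the column-to-corner assignment used in the reconstruction of Table 4.1 above (A, B, C, E in columns 1, 2, 3, and 5):

```python
import numpy as np

# Corner points of the oblique box; columns are the points A-H of Table 4.1
# as reconstructed here.
X = np.array([
    [-4.04, -8.66, 1.73, -2.88, 2.88, -1.73, 8.66, 4.04],
    [7.07, 1.41, 7.07, 1.41, -1.41, -7.07, -1.41, -7.07],
    [3.26, 0.00, -4.89, -8.16, 8.16, 0.00, 4.89, -3.26],
])

def edge_length(j, k):
    """Three-dimensional Pythagoras: distance between the corner points
    in columns j and k (0-based)."""
    return float(np.sqrt(((X[:, j] - X[:, k]) ** 2).sum()))

print(round(edge_length(0, 1), 1))  # d(AB) -> 8.0
print(round(edge_length(0, 2), 1))  # d(AC) -> 10.0
print(round(edge_length(0, 4), 1))  # d(AE) -> 12.0
```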
PRINCIPAL COMPONENT ANALYSIS
The rotation is to be performed by premultiplying X by an orthogonal matrix U chosen so that

UX = Y.

Now consider the matrix

YY' = ( 288    0    0
          0  200    0
          0    0  128 ),

a diagonal matrix. It is diagonal because the points whose coordinates are the columns of Y form a box that is aligned with the coordinate axes. (This last point is not proved in this book; it is intuitively reasonable, however, and should seem steadily more reasonable as the rest of this section unfolds.)
Since we are to have UX = Y, we must also have

UX(UX)' = YY',     (4.1)

where (UX)' is the 8 × 3 transpose of the 3 × 8 matrix product UX. Now recall (from Exercise 3.9) that (UX)' = X'U'. Thus (4.1) becomes

UXX'U' = YY'.     (4.2)

Next observe that both XX' and YY' are SSCP matrices of the same form as R in Section 3.3. Both matrices are, of course, square and symmetric. We use the symbol R for XX' and denote YY' by RY. Then (4.2) becomes

URU' = RY.     (4.3)

Finally, compare this equation with Equation (3.12). It is clear that U, U', and RY here play the parts that U, U', and Λ play there: the rows of the required U are the eigenvectors of R, and the diagonal elements of RY are its eigenvalues.
TABLE 4.2. THE RESULT OF THE ROTATION.ª

UX = Y = (  6   6   6   6  -6  -6  -6  -6
            5   5  -5  -5   5   5  -5  -5
            4  -4   4  -4   4  -4   4  -4 )

ªX is the data matrix defining the corner points of the box in Figure 4.3a.
Now consider the application of this procedure to "realistic" data swarms. Suppose one had an s × n data matrix X listing the amounts of s species in each of n quadrats. The procedure is as follows.
1. Center the data by species (rows). Do this by subtracting from every element in X the mean of the elements in the same row. Call the centered data matrix XR.

2. Form the s × s SSCP matrix R = XR XR'.

3. Form the s × s covariance matrix (1/n)R. As we shall see, this step is not strictly necessary, but it is usually done.

4. Carry out an eigenanalysis of R or (1/n)R. The eigenvectors of these two matrices are identical. Combine these s eigenvectors, each with s elements, by letting them be the rows of an s × s matrix U; U is orthogonal. The eigenvalues of R are n times those of (1/n)R; hence it is immaterial whether R or (1/n)R is analyzed. Let Λ denote the s × s diagonal matrix whose nonzero elements are the eigenvalues of the covariance matrix (1/n)R. Then [compare Equation (3.12)]

URU' = nΛ,

and the nonzero elements of nΛ are the eigenvalues of R.

5a. Complete the PCA by forming the s × n matrix Y = U XR. Each column of Y gives the new set of s coordinates of one of the data points. If the points are plotted using these new coordinates, it is found that the pattern of the points relative to one another is unchanged. The only change produced is that the whole swarm has been rotated, as a single entity, around its centroid, which is the origin of the new coordinate frame.
Figure 4.4 shows graphically the results in Table 4.3. The original swarm of points, whose coordinates are the elements of X, is plotted in Figure 4.4a; the transformed swarm, which after PCA has the elements of Y as its coordinates, is plotted in Figure 4.4b.
TABLE 4.3. THE STEPS IN A PRINCIPAL COMPONENTS ANALYSIS OF DATA MATRIX #9.

The centered data matrix is

XR = ( -15  -9  -8  -7  -4  -2   4    6   7   13   15
         9   4  19   9   5  14  -6  -16  -8  -17  -13 ).

The SSCP matrix is

R = (   934  -1026
      -1026   1574 ).

The covariance matrix is

(1/n)R = (  84.9091  -93.2727
           -93.2727  143.0909 ).

The matrix of eigenvectors is

U = ( 0.592560  -0.805526
      0.805526   0.592560 ).

The eigenvalues of the covariance matrix are the nonzero elements of

Λ = ( 211.704    0
        0       16.296 ).

The transformed data matrix (after rounding to one decimal place) is

Y = ( -16.1  -8.6  -20.0  -11.4  -6.4  -12.5   7.2  16.4  10.6  21.4  19.4
       -6.7  -4.9    4.8   -0.3  -0.3    6.7  -0.3  -4.6   0.9   0.4   4.4 ).
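The arithmetic of Table 4.3 can be reproduced with a short NumPy sketch of the recipe in paragraphs 1 to 5a (this is not code from the book; note that an eigenroutine may return eigenvectors with signs opposite to those printed, which flips the signs of the corresponding scores):

```python
import numpy as np

# The centered data matrix XR of Data Matrix #9 (from Table 4.3).
XR = np.array([
    [-15, -9, -8, -7, -4, -2, 4, 6, 7, 13, 15],
    [9, 4, 19, 9, 5, 14, -6, -16, -8, -17, -13],
], dtype=float)
n = XR.shape[1]

R = XR @ XR.T                 # SSCP matrix (step 2)
C = R / n                     # covariance matrix, divisor n (step 3)

eigvals, eigvecs = np.linalg.eigh(C)   # eigenanalysis (step 4); ascending order
order = np.argsort(eigvals)[::-1]      # largest eigenvalue first
lam = eigvals[order]                   # 211.704 and 16.296
U = eigvecs[:, order].T                # rows of U are the eigenvectors

Y = U @ XR                             # principal component scores (step 5a)
```

The variance of each row of Y equals the corresponding eigenvalue, as the text goes on to assert.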
As may be seen, the centroid of the whole swarm (shown by a cross), which is at (x̄1, x̄2) = (35, 41) in Figure 4.4a, has been shifted to the origin at (y1, y2) = (0, 0) in Figure 4.4b, and the swarm as a whole has been rotated so that its long axis is parallel with the y1-axis (this statement is expressed more precisely later).

Paragraph 5a describes PCA as a process of rotating a swarm of points around its centroid. It is instructive to rephrase the paragraph, calling it 5b, so that it describes the process as one of rotating the coordinate frame relative to the swarm.
Figure 4.4. Two plots of Data Matrix #9. (a) The original, untransformed data, whose coordinates are given by X; the cross marks the centroid of the swarm. (b) The same data after the swarm has been rotated; the origin of the new coordinate axes y1 and y2 is at the swarm's centroid, and the coordinates of each point, measured on the new axes, are given by Y in Table 4.3.
This is easily done. Note that any imaginable point on the y1-axis has a coordinate of zero on the y2-axis, and vice versa. Hence the y1-axis is the set of all imaginable points, such as point k, of which it is true that

0.805526 x1k + 0.592560 x2k = 0.

Indeed, the set of all points conforming to this equation is the y1-axis, and its equation is

x2 = −(0.805526/0.592560) x1 = −1.359 x1.

This line is shown dashed in Figure 4.5 and is labeled y1. The y2-axis is found in the same way. It is the line

x2 = (0.592560/0.805526) x1 = 0.736 x1.
Figure 4.5. Another way of portraying the PCA of Data Matrix #9. The points were plotted using the coordinates in the centered data matrix XR (see Table 4.3) with the axes labeled x1 and x2. The new axes, the y1- and y2-axes, were found as explained in the text. Observe that the pattern of the points relative to the new axes is the same as the pattern in Figure 4.4b.
We have now done two PCAs, the first of the corner points of a three-dimensional box, and the second of an irregular two-dimensional swarm of points. It should now be clear that, in general (i.e., for any value of s), doing a PCA consists in doing an eigenanalysis of the SSCP matrix R (or of the covariance matrix (1/n)R) and then forming

Y = U XR,

where U is the orthogonal matrix whose rows are the eigenvectors of R and (1/n)R. One can then either plot the points using their new coordinates, the columns of Y, or find the equations of the rotated coordinate axes from the equation

Ux = 0,

where

x = (x1 x2 ... xs)'  and  0 = (0 0 ... 0)',

and the equation Ux = 0 denotes s equations of which the ith is

ui1 x1 + ui2 x2 + ... + uis xs = 0.
Write RY = YY' and, for the covariance matrix of the transformed data, (1/n)RY = (1/n)YY'. It should now be recalled, from Equation (4.3) and Table 4.2 (pages 141 and 142), that both RY and (1/n)RY are diagonal matrices: all their elements except for those on the main diagonal are zero. Now, we already know [see Equation (3.9), page 106] that

(1/n)RY = (1/n)YY' = ( 211.704    0
                         0       16.296 ) = Λ.
Thus the variances of the principal component scores are equal to the eigenvalues of the covariance matrix.

It is intuitively evident from inspection of Figures 4.4 and 4.5 that the greatest "spread" of the data points is in the direction of the y1-axis, and also that, although the points show obvious negative correlation when looked at in the frame of the x1 and x2-axes (Figure 4.4a), this correlation vanishes when the points are plotted in the frame of the y1 and y2-axes (Figure 4.4b).
PCA as here described is often used as an ordination method in ecological work. Such an ordination is a "success" if a large proportion of the total dispersion (or scatter) of the data is parallel with the first two or three principal axes; for then this large proportion of the information contained in the original, unvisualizable s-dimensional data swarm can be plotted in two-space or three-space and examined. This is what ordination by PCA sets out to achieve: the data swarm is to be projected onto the two-dimensional or three-dimensional frame (or frames) that most clearly reveals the real pattern of the data. When three axes are retained, as is very often done, the result is shown in print either as a two-dimensional perspective (or isometric) drawing of a three-dimensional graph, or else as a trio of two-dimensional graphs showing the swarm projected onto the y1, y2 plane, the y1, y3 plane, and the y2, y3 plane, respectively.
The statement that such a two- or three-dimensional display of the original s-dimensional data swarm reveals the real pattern of the data is intuitively reasonable, but it is desirable to define more precisely what is meant by "real pattern." The observed abundances of a large number of species co-occurring in an ecological sampling unit are governed by two factors: first, the joint responses of groups of species to persistent features of the environment; second, the "capricious," unrelated responses of a few individual members of a few species to environmental accidents of the sort that occur sporadically, here and there, and have only local and temporary effects. In the present context the joint, related responses of groups of species constitute "real pattern" or "interesting data structure," and the capricious, sporadic responses amount to "noise." (This is not to say that in other contexts, environmental accidents and the noise they produce may not be a researcher's chief interest.) It has been shown by Gauch (1982b) that displaying the results of a PCA, or indeed of any ordination, in only a few dimensions (typically two or three) does more than merely permit an unvisualizable s-dimensional pattern to be visualized; it also suppresses "noise." This is because the first few principal components of the data (those with the largest variances) nearly always reflect the concerted responses of groups of several species. When a group of species (hence numerous individuals) behave in concert, it is unlikely to be the result of localized, temporary "accidents." Moreover, the fact that many species do respond in concert to the "important" features of the environment
means that the data body as a whole contains redundancies; therefore, the number of coordinate axes needed to display the "interesting structure" of the data is far less than s, the total number of species observed.
To summarize: ordination permits us to profit from the redundancy in field data. Because of redundancy, not much information is lost by representing a swarm of data points in only a few dimensions. And the discarded information (on the disregarded axes, along which the variances are small) is mostly noise (Gauch, 1982b).
mostly n01se h0 d of domg '. a PCA ordination descnbed . m this sect10n can be
The met
. d. .
nous ways as shown in the next sect10n. An example of its use
mod1fie m va . 1 described may be found in Jeglum, Wehrhahn and
· the way prev10us Y . . ..
m
Swan (1971). They samp led the vegetation m vanous . commuruties
. in the
boreal forest of Sas k a tchewan and ordinated theIT data and vanous subsets
of it using PCA.
FOUR DIFFERENT VERSIONS OF PCA

The method given in the preceding section for carrying out a PCA can be modified in one or both of two ways.

First, one can standardize (or rescale) the data by dividing each element in the centered data matrix XR by the standard deviation of the elements in its row. The resulting standardized centered data matrix ZR then has as its (i, j)th element
zij = (xij − x̄i)/σi,

as we saw in Chapter 3 (page 107). The SSCP matrix divided by n [i.e., (1/n)ZR ZR'] is the correlation matrix (see Table 3.6, page 112). The PCA is now carried out by doing an eigenanalysis of the correlation matrix instead of the covariance matrix.

The second modification consists in using uncentered data. Instead of analyzing (1/n)XR XR' as was done in Section 4.2, one analyzes (1/n)XX'.

Of course, both these modifications can be made simultaneously. Thus one can analyze the matrix (1/n)ZZ', where Z is the standardized but uncentered matrix whose (i, j)th element is xij/σi.
Before discussing the advantages and disadvantages of these versions of PCA, we compare the results they give when applied to a two-dimensional swarm of 10 data points. The coordinates of the points are the columns of Data Matrix #10, given at the top of Table 4.4. In the separate sections in the lower part of the table are given, for each of the four forms of PCA: (1) the square symmetric matrix to be analyzed, (2) the matrix of eigenvectors U, and (3) the matrix of eigenvalues Λ.
The results are shown graphically in Figure 4.6. It should be noticed that the effect of standardizing the raw data (as in Figure 4.6b) is to make the variances of both sets of coordinates equal to unity. Thus the dispersions of the points along the x1/σ1 axis and along the x2/σ2 axis are the same. Standardizing the data, therefore, alters the shape of the swarm; after standardization, the swarm is noticeably less elongated than it was before.
PCA Using a Correlation Matrix

Analysis of the correlation matrix (1/n)ZR ZR' is a form of PCA that is frequently encountered in the ecological literature. As has already been explained, the correlation matrix is obtained from the standardized centered data matrix ZR.
Figure 4.6. Four versions of PCA applied to Data Matrix #10 (see Table 4.4). (a) Unstandardized data. The raw, uncentered coordinates are measured along the x1, x2 axes. Uncentered PCA rotates the axes into the solid lines labeled y1', y2'. Centered PCA shifts the origin to the centroid of the swarm (marked +) and rotates the axes into the dotted lines y1, y2. (b) Standardized data. The uncentered but standardized data are measured along the x1/σ1, x2/σ2 axes. Uncentered PCA rotates the axes into the solid lines y1'', y2''. Centered PCA shifts the origin to the centroid and rotates the axes into the dotted lines y1''', y2'''.
Contributing to y2: (notice the negative coefficients of z16 and z17) have the highest second principal component scores. Thus we can draw the two-dimensional coordinate frame shown in Figure 4.7a and label the four regions into which the axes divide it with a two-sentence description of the climate: the first sentence puts into words the meaning of high and low values of the first principal component, and analogously for the second sentence.
Figure 4.7. (a) The two-dimensional coordinate frame described in the text, divided into four regions by axes 1 and 2. (b) A scatter of data points plotted against axes 1 and 2.
The great majority of ecological ordinations are done with centered data, but this is not always the most appropriate procedure. Sometimes it is preferable to ordinate data in their raw, uncentered form. The reason for this will not become clear until we consider an example with more than two species, and hence a data swarm occupying more than two dimensions. First, however, it is worth looking at the results of doing both a centered and an uncentered PCA on the same, deliberately simplified two-dimensional data swarm, chosen to demonstrate as clearly as possible the contrast between the two methods.
Consider Figure 4.8. Both graphs show the same seven data points plotted in raw form in the frame defined by the x1- and x2-axes. The original
Figure 4.8. Both graphs show a row of seven data points plotted in an x1, x2 coordinate frame. (a) The dotted lines y1 and y2 are the first and second principal axes of a centered PCA; (b) the dashed lines y1' and y2' are the first and second principal axes of an uncentered PCA.
data matrix is

X = ( 8  7  6  5  4  3  2
      2  3  4  5  6  7  8 ).
Figure 4.8a shows the principal axes (the lines y1 and y2) yielded by a centered PCA. The intersection of these axes, which is the origin of the new frame, is at the centroid of the swarm (which coincides with the central point of the seven). The coordinates of the data points relative to the y1- and y2-axes are given by the matrix
Figure 4.9. (a) A plot of the data points in three-space. They form two qualitatively different clusters. (b) A plot of the points in the coordinate frame formed by the first two principal axes from an uncentered PCA. One cluster lies on axis 1 and the other very close to axis 2. (c) The corresponding plot after a centered PCA. Both clusters lie on axis 1.
o o
~) .
15 18 21 24
5 10 1 2
( ~
X= 10
o o o 8 15 10
a = Σ u₊² / Σ u²

or

a = Σ u₋² / Σ u²   (if Σ u₊² < Σ u₋²),

where u₊ denotes the positive elements, and u₋ the negative elements, of the eigenvector defining the axis.*

*This definition is not applicable to the axes of a centered PCA; the values are of interest only for uncentered ordinations.
Let us find the coefficients of asymmetry for axes 1 and 2 in Figure 4.9b. An uncentered (and unstandardized) PCA of Data Matrix #11 yields a matrix of eigenvectors U. Therefore, a = 1 for axis 1, since all elements in the first row of U are of the same sign. This result is also obvious from the figure, of course.

For axis 2, a is just less than 1. It is clear from Figure 4.9b that axis 2 should be treated as unipolar even though three of the points belonging to the cluster on axis 1 have small negative scores with respect to axis 2. It is these small negative scores that cause a to be just less than 1. Using Noy-Meir's criterion, whereby a value of a greater than 0.9 is treated as indicating a virtually unipolar axis, we may treat axis 2 in Figure 4.9b as unipolar.
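Noy-Meir's coefficient is easy to compute directly from an eigenvector. The eigenvector values below are hypothetical illustrations, not those of Data Matrix #11:

```python
import numpy as np

def asymmetry(u):
    """Noy-Meir's coefficient of asymmetry for the eigenvector u of an
    uncentered PCA axis: the fraction of the sum of squared elements
    contributed by the elements of the dominant sign."""
    u = np.asarray(u, dtype=float)
    pos = (u[u > 0] ** 2).sum()
    neg = (u[u < 0] ** 2).sum()
    return max(pos, neg) / (pos + neg)

print(asymmetry([0.5, 0.6, 0.4, 0.3]))          # all one sign -> a = 1.0 (unipolar)
print(asymmetry([0.7, 0.6, 0.3, -0.05, -0.1]))  # a just below 1: "virtually unipolar"
```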
A clear and detailed discussion of the use of centered and uncentered ordinations on different kinds of data has been given by Noy-Meir (1973a). For an example of the practical application of uncentered PCA to field data, see Carleton and Maycock (1980). These authors used the method to identify "vegetational noda" (qualitatively different communities) in the boreal forests of Ontario south of James Bay.
There are other ways (besides those described in the preceding pages) in which data can be transformed as a preliminary to carrying out a PCA. Data can be standardized in various different ways, and they can be centered in various different ways. Standardizing and centering can be done separately or in combination. There are numerous possibilities. Most of the methods are seldom used by ecologists and are not dealt with in this book. They are clearly discussed and compared in Noy-Meir (1973a) and Noy-Meir, Walker, and Williams (1975).

PRINCIPAL COORDINATE ANALYSIS
1. As a preliminary, note that all summations are over the range 1 to n. Therefore, to keep the symbols as uncluttered as possible, these limits are left unstated. However, it is very important to observe which of the subscripts varies each time a summation is done. It is specified by the symbol below the Σ. Bear in mind, for example, that a sum such as

Σi cij = c1j + c2j + ... + cnj   (in which i takes the series of values 1, 2, ..., n)

is not the same as

Σj crj = cr1 + cr2 + ... + crn   (in which j takes the series of values 1, 2, ..., n).
2. The coordinates sought are to be written as an n × n matrix C in which each column gives the n coordinates of one of the points. The points are to be centered. That is, the origin of the coordinates is to be at the centroid of the swarm of points. Equivalently, Σj crj = 0 for all r.

3. The squared distance, d²(j, k), between the jth and kth points is

d²(j, k) = Σr (crj − crk)².

Expanding,

d²(j, k) = Σr (crj² + crk² − 2 crj crk)
         = Σr crj² + Σr crk² − 2 Σr crj crk.     (4.4)
4. Now form the n × n matrix

A = C'C = ( c11  c21  ...  cn1      ( c11  c12  ...  c1n
            c12  c22  ...  cn2   ×    c21  c22  ...  c2n
            ................          ................
            c1n  c2n  ...  cnn )      cn1  cn2  ...  cnn ),

whose (j, k)th element is ajk = Σr crj crk. In this notation, Equation (4.4) is d²(j, k) = ajj + akk − 2 ajk.

7. Notice, for later use, that Σj ajk = 0. This follows from the fact that ajk = Σr crj crk, since the sum Σj crj is zero (see paragraph 2). Because A is symmetric, it is also true that Σk ajk = 0.
8. We now wish to find ajk as a function of d²(j, k). Rearranging Equation (4.4) shows that

ajk = −½[d²(j, k) − ajj − akk].     (4.5)

9. Summing every term in (4.4) over all values of j gives

Σj d²(j, k) = Σj ajj + Σj akk − 2 Σj ajk,

that is (writing x for Σj ajj, and recalling from paragraph 7 that Σj ajk = 0),

Σj d²(j, k) = x + n akk,  whence  akk = (1/n)(Σj d²(j, k) − x).

Likewise, summing every term in (4.4) over all values of k, it is seen that

Σk d²(j, k) = n ajj + x,  whence  ajj = (1/n)(Σk d²(j, k) − x).

Observe that Σk akk = Σj ajj, which we have already denoted by x.
10. Equation (4.5) thus becomes

ajk = −½[d²(j, k) − (1/n)(Σk d²(j, k) − x) − (1/n)(Σj d²(j, k) − x)].     (4.6)

It remains to evaluate x. From paragraph 4, ajj = Σr crj². Therefore,

Σj ajj = Σj Σr crj².     (4.7)

Recall from paragraph 2 that the centroid of the n points whose coordinates we seek is to be at the origin. Therefore, Σr crj² is the square of the distance of the jth point from the origin, and Σj Σr crj² is the sum of the squares of the distances of all n points from the origin. This latter sum is equal* to (1/n) times the sum of the squares of the distances between every pair of points. That is,

Σj Σr crj² = (1/n) Σj<k d²(j, k) = (1/2n) Σj Σk d²(j, k).

(The form Σj<k specifies that each pair of points shall be considered only once. The form Σj Σk specifies that d²(j, k) and d²(k, j), which is the same, shall both enter the sum; hence the 2 in the denominator.)
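The identity in paragraph 10 is easy to confirm numerically for an arbitrary centered swarm (a quick check, not part of the book's derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.normal(size=(3, 6))              # six arbitrary points in 3-space (columns)
C = C - C.mean(axis=1, keepdims=True)    # center them: centroid at the origin
n = C.shape[1]

sum_sq_from_origin = (C ** 2).sum()      # sum over j and r of crj^2

# d^2(j, k) for every ordered pair of points.
D2 = ((C[:, :, None] - C[:, None, :]) ** 2).sum(axis=0)

# Squared distances from the origin equal (1/2n) times the sum over ordered pairs.
assert np.isclose(sum_sq_from_origin, D2.sum() / (2 * n))
```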
11. Equation (4.6) now becomes

ajk = −½ d²(j, k) + (1/2n) Σj d²(j, k) + (1/2n) Σk d²(j, k) − (1/2n²) Σj Σk d²(j, k).     (4.8)
12. Now carry out an eigenanalysis of A. Then

A = U'ΛU,

where Λ is the diagonal matrix whose nonzero elements are the eigenvalues of A, and U is the orthogonal matrix whose rows are the corresponding eigenvectors of A; U' is the transpose of U.

13. Since Λ is a diagonal matrix, it can be replaced by the product Λ¹ᐟ²Λ¹ᐟ², in which the nonzero elements are λ1¹ᐟ², λ2¹ᐟ², ..., λn¹ᐟ². Thus

A = (U'Λ¹ᐟ²)(Λ¹ᐟ²U).

Recall, also, that A = C'C. Therefore,

U'Λ¹ᐟ² = C'  and  Λ¹ᐟ²U = C.
14. To find the elements of C, we therefore carry out an eigenanalysis of A. Then the first principal coordinates of the data points, which are the elements in the first row of C, are obtained from λ1¹ᐟ²u1 (u1 is the first eigenvector of A, i.e., the first row of U). The second principal coordinates are obtained from λ2¹ᐟ²u2, and so on. If, as is very often the case, a two-dimensional ordination is wanted, only the first two rows of C need be evaluated; they give the coordinates of the n points in two-space, with the points so arranged that the distance between every pair of points approximates as closely as possible their dissimilarity as calculated at the outset.
In practice, then, a PCO is carried out as follows.

1. Calculate the dissimilarity between every pair of quadrats, using some chosen dissimilarity measure. Denote by δ(j, k) the dissimilarity between quadrats j and k. Put the squares of these dissimilarities as the elements of an n × n matrix Δ.

2. Find the elements of the n × n symmetric matrix A from Equation (4.8), and carry out an eigenanalysis of A. The coordinates of the points on the first two principal axes are then

λ1¹ᐟ²(u11  u12 ... u1n)  and  λ2¹ᐟ²(u21  u22 ... u2n);

here λ1 and λ2 are the first and second eigenvalues of A; (u11 u12 ... u1n) and (u21 u22 ... u2n) are the respective eigenvectors.
TABLE 4.5. A PRINCIPAL COORDINATE ANALYSIS OF DATA MATRIX #12.

       1   2   3   4   5
X = (  1   9   8  15  23
       8  10  14   8   9 )   is Data Matrix #12.

The matrix of squared dissimilarities is

Δ = ( 0  100  169  196  529
          0   25   64  225
               0  169  400
                    0   81
                         0 ).

Matrix A, whose elements are given by Equation (4.8), is

A = ( 120.48  12.48  12.88  -25.92  -119.92
               4.48  26.88  -17.92   -25.92
                     74.28  -35.52   -78.52
                             23.68    55.68
                                     168.68 ).

An eigenanalysis of A shows that its eigenvalues (to two decimal places) are

λ1, λ2, λ3, λ4, λ5 = 310.77, 85.78, 4.50, 0, −9.45.
Figure 4.10. The solid dots are the data points (projected onto two-space) yielded by a PCO of data matrix X (Data Matrix #12) in Table 4.5. Each point is labeled with a number denoting the column of X that represents it. The hollow dots show the same data after unstandardized, centered PCA.
The quadrats are numbered 1 to 5, corresponding to the respective columns of X, and these numbers are used to label the points in Figure 4.10.

It is now necessary to choose a dissimilarity measure for measuring the dissimilarity between every pair of quadrats. Let us use the city-block distance CD (page 45). Then the dissimilarity between quadrats 3 and 5, for instance, is

CD(3, 5) = |x13 − x15| + |x23 − x25| = |8 − 23| + |14 − 9| = 15 + 5 = 20.
The dissimilarity between every pair of quadrats is measured in this way, and the squared dissimilarities are the elements of the 5 × 5 matrix Δ shown in Table 4.5; all the elements on the main diagonal of Δ are, of course, zero, since CD(j, j) = 0 for all j.

Next, the elements of A are determined from Equation (4.8). For example,

a35 = −400/2 + 763/10 + 1235/10 − 3916/50 = −78.52.
Since matrices Δ and A are symmetric, only their upper right halves are written out.

A is then analyzed. Its eigenvalues are given in the table, and also its first two eigenvectors.
Notice that one of the eigenvalues is negative. This implies that it is impossible to plot the five points in a coordinate frame of any number of dimensions in such a way that the distances between every pair of them exactly match the dissimilarities calculated at the outset.
These points are shown as the solid dots in Figure 4.10. For comparison, the results of carrying out an unstandardized, centered PCA on the same data are also shown (by hollow dots) on the same figure.

It is interesting to compare the desired interpoint distances (the squares of these distances are the elements of Δ) and the actual interpoint distances in the two-dimensional ordination yielded by PCO. The two matrices to be compared are shown in Table 4.6.
TABLE 4.6. DESIRED AND ACTUAL INTERPOINT DISTANCES.

Desired Interpoint Distancesª       Actual Interpoint Distancesᵇ
0  10  13  14  23                   0  9.8  13.0  13.7  23.0
    0   5   8  15                       0   5.8   7.7  16.0
        0  13  20                            0   13.0  19.9
            0   9                                  0    9.3
                0                                        0

ªThe elements of Δ in Table 4.5 are the squares of these dissimilarities.
ᵇComputed, using Pythagoras's theorem, from the coordinates of the points, which are given in the 2 × 5 matrix at the bottom of Table 4.5.
RECIPROCAL AVERAGING, OR CORRESPONDENCE ANALYSIS

RA as a Form of PCA

It was pointed out earlier that an "ordinary" PCA can be done in one of four different ways. One chooses first whether to center the data or leave them uncentered. Then, independently of this first choice, one chooses whether to standardize the data or leave them unstandardized (to standardize them, the elements in each row of the raw data matrix are divided by the standard deviation of all the elements in the row). In other words, the data matrix may be left untransformed, or it may be centered, or standardized, or both. Whichever of the four possibilities is chosen, the next steps are the same: the data matrix (whether transformed or not) is postmultiplied by its transpose and the product matrix is then eigenanalyzed (for examples, see Table 4.4, page 153). Then each eigenvector consists of a list of "weights" to be attached to each species so that "scores" (which are weighted sums of species quantities) can be computed for each quadrat. The scores are the coordinates of the points in a plot of the ordination.
RA differs from the four versions of PCA already discussed in the way in which the data matrix is transformed before the eigenanalysis, and in the way in which the eigenvectors are transformed into scores after the eigenanalysis. We consider these two procedures in turn. They are demonstrated in Tables 4.7 and 4.8, which show the RA ordination of a 3 × 5 matrix (Data Matrix #13). The reasons for the operations will not become clear until we attain the same result by "reciprocal averaging." Here they are presented in recipe form, without explanation.

Since in RA scores are assigned both to quadrats and to species, the procedure yields an R-type and a Q-type ordination simultaneously. (Recall that an R-type analysis gives an ordination of quadrats and a Q-type analysis an ordination of species.) In the following account we first consider the Q-type part of the analysis, which gives the species scores (Table 4.7), and then the R-type part of the analysis, which gives the quadrat scores (Table 4.8).
The data are not centered, and they are transformed as follows. Each element in the data matrix is divided by the square root of its row total and by the square root of its column total.

As always, let the number of species (the number of rows in the data matrix X) be s, and the number of quadrats (the number of columns in X) be n. Let cj = Σi xij be the total of the jth column of X, let ri be the total of the ith row, and let N be the grand total of all the elements of X.
TABLE 4.7. COMPUTATION OF THE SPECIES SCORES IN AN RA ORDINATION OF DATA MATRIX #13.

Data Matrix #13 is

X = ( 15   2   0   2   1
       9   6  15   0   0
       1   7   5   8  29 ),

with row totals (r1, r2, r3) = (20, 30, 50), column totals (c1, ..., c5) = (25, 15, 20, 10, 30), and grand total N = 100. The transformed matrix, whose (i, j)th element is xij/√(ri cj), is

M = ( 0.67082  0.11547  0        0.14142  0.04082
      0.32863  0.28284  0.61237  0        0
      0.02828  0.25560  0.15811  0.35777  0.74880 ).

The matrix of species scores [see Equation (4.11)] is

V = √N U R⁻¹ᐟ² = ( 1       1       1
                   1.102   0.929  −0.998
                   1.669  −1.213   0.060 ).
Then the (i, j)th element of the transformed matrix, say M, is

mij = xij / √(ri cj),  with i = 1, ..., s and j = 1, ..., n.
Let R denote the diagonal matrix of row totals:

R = ( r1   0   0        ( 20   0   0
       0  r2   0    =     0  30   0
       0   0  r3 )        0   0  50 ).

Next, note that

R⁻¹ᐟ² = ( r1⁻¹ᐟ²  0       0          ( 1/√20  0      0
          0       r2⁻¹ᐟ²  0       =    0      1/√30  0
          0       0       r3⁻¹ᐟ² )     0      0      1/√50 ).

(The reader should confirm, by matrix multiplication, that R⁻¹ᐟ² R⁻¹ᐟ² = R⁻¹.) Now carry out an eigenanalysis of the s × s matrix P to obtain the s eigenvalues of P and an s × s orthogonal matrix U whose rows are the corresponding eigenvectors. The species scores are then the rows of

V = U ( √(N/r1)  0        0
        0        √(N/r2)  0        = √N U R⁻¹ᐟ².     (4.11)
        0        0        √(N/r3) )
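Equation (4.11) can be checked numerically, assuming the reconstruction of Data Matrix #13 used in Table 4.7 and taking P = MM' (the definition of P, in Equation (4.10), falls outside this excerpt, so that form is an assumption here):

```python
import numpy as np

# Data Matrix #13 (rows = species, columns = quadrats), as reconstructed above.
X = np.array([
    [15, 2, 0, 2, 1],
    [9, 6, 15, 0, 0],
    [1, 7, 5, 8, 29],
], dtype=float)

r = X.sum(axis=1)    # row (species) totals: 20, 30, 50
c = X.sum(axis=0)    # column (quadrat) totals: 25, 15, 20, 10, 30
N = X.sum()          # grand total: 100

M = X / np.sqrt(np.outer(r, c))      # m_ij = x_ij / sqrt(r_i * c_j)

P = M @ M.T                          # assumed form of P (s x s)
eigvals, eigvecs = np.linalg.eigh(P)
order = np.argsort(eigvals)[::-1]
U = eigvecs[:, order].T              # rows are the eigenvectors of P

V = np.sqrt(N) * U / np.sqrt(r)      # Equation (4.11): V = sqrt(N) U R^(-1/2)
```

The first eigenvalue of P is exactly 1 and yields the trivial all-ones row of V; the second row reproduces the species scores 1.102, 0.929, −0.998 (possibly with all signs reversed).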
Figure 4.11. The solid dots show the outcome of RA ordination of Data Matrix #13 (Tables 4.7 and 4.8). The hollow dots show the same data after unstandardized, centered PCA; they are plotted in the plane of PCA axes 1 and 2.
As in the ordination of the species, the first set of scores (the first row of W) consists of ones and is of no interest. The scores on the first and second RA axes are given by the elements of the second and third rows of W. Using these scores as coordinates, the five points representing the quadrats give the two-dimensional RA ordination shown in Figure 4.11 (solid dots). The result of doing an unstandardized centered PCA on the same data is shown for comparison (hollow dots).
The computation uses Data Matrix #13,

X = ( 15   2   0   2   1
       9   6  15   0   0
       1   7   5   8  29 ),

together with a trial set of species scores v1⁽⁰⁾, v2⁽⁰⁾, ..., vs⁽⁰⁾. The score of the jth quadrat is the weighted average of the species scores,

wj⁽⁰⁾ = [x1j v1⁽⁰⁾ + x2j v2⁽⁰⁾ + ... + xsj vs⁽⁰⁾]/cj,     (4.16)

where cj is the jth column total of the data matrix.
TABLE 4.10. ORDINATION OF DATA MATRIX #13 BY RECIPROCAL AVERAGING.

                           v(0)    v(1) (%)      v(2) (%)      v(3) (%)
 15   2   0   2   1        100     64.0 (100)    69.9 (100)    76.9 (100)
  9   6  15   0   0         50     48.8 (68.9)   59.5 (80.1)   72.3 (91.8)
  1   7   5   8  29          0     15.1 (0)      17.7 (0)      20.9 (0)

w(0)    78.0   33.3   37.5   20.0   3.3
w(1)    84.8   40.9   51.7   20.0   3.3
w(2)    88.8   45.3   60.1   20.0   3.3

Final species scores (%):   100    91.8     0
Final quadrat scores (%):   100    52.1   73.0   18.6   0

Row 2 of V:   1.102   0.929  −0.998
Row 2 of W:   1.276   0.070   0.597  −0.772  −1.240

The data matrix is shown above and to the left. Successive approximations to the species scores are in the columns on the right, labeled v(0), v(1), .... Successive approximations to the quadrat scores are in the rows below, labeled w(0), w(1), ....
The procedure is continued until the vectors stabilize (i.e., until any further steps give unchanged results). The final results in the example are shown by the column on the extreme right in Table 4.10 (which gives the final species scores as percentages) and the row at the bottom (which gives the final quadrat scores as percentages). As is shown in the lower part of the table, these scores are the same (apart from being rescaled as percentages) as row 2 of V (in Table 4.7) and row 2 of W (in Table 4.8). Thus they are the required scores for, respectively, a one-dimensional ordination of the species, and a one-dimensional ordination of the quadrats.

The scores on the second RA axis (i.e., the third row of V and the third row of W) can be obtained by a similar, though computationally more laborious, procedure. It is not described here; details are given by Hill (1973). The reader should confirm that if v^(0) = (1, 1, 1), then w^(0) = (1, 1, 1, 1, 1). This is the trivial result mentioned on page 181.
We now show the equivalence between the reciprocal averaging procedure just described and the outcomes of the eigenanalyses of matrices P and Q in Equations (4.10) and (4.13).
Suppose reciprocal averaging has been continued back and forth (rescaling the species scores as percentages each time) until stability has been reached. Then we can rewrite Equations (4.17) and (4.16) (in that order), dropping the superscripts in parentheses. Thus (4.17) becomes

      V   =   R^(-1)   X    W         (4.20)
    (s×1)    (s×s)   (s×n) (n×1)

and

      W   =   C^(-1)   X'   V.        (4.21)
    (n×1)    (n×n)   (n×s) (s×1)
Substituting the right side of (4.21) for W in (4.20) gives

    V = (R^(-1) X)(C^(-1) X' V).      (4.22)

We now operate on (4.22). Notice that matrices may be factored or multiplied as may be convenient, provided their order is never changed. Parentheses are put in wherever they help to make the steps clearer. The reader should check the sizes of the matrices at every step to be sure that all the multiplications are possible.
First premultiply both sides of (4.22) by R^(1/2). Then

    V R^(1/2) ∝ U,                    (4.28)

where U is the s × s matrix whose rows are the eigenvectors of P. Postmultiplying both sides of (4.28) by R^(-1/2) shows that

    V ∝ U R^(-1/2).                   (4.29)
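The fixed-point relation behind (4.22) can be checked numerically. In this sketch (the 3 × 4 matrix is hypothetical, not one of the book's data matrices) one complete averaging cycle is multiplication by R⁻¹XC⁻¹X′; its largest eigenvalue is 1, belonging to the trivial constant scores mentioned earlier:

```python
import numpy as np

# A small hypothetical abundance matrix (3 species x 4 quadrats)
X = np.array([[4.0, 2.0, 0.0, 1.0],
              [1.0, 3.0, 2.0, 0.0],
              [0.0, 1.0, 3.0, 3.0]])
R_inv = np.diag(1.0 / X.sum(axis=1))   # reciprocals of the row totals
C_inv = np.diag(1.0 / X.sum(axis=0))   # reciprocals of the column totals

# One full cycle of reciprocal averaging applied to species scores:
A = R_inv @ X @ C_inv @ X.T

# Constant scores are a fixed point of the averaging (the trivial solution)
assert np.allclose(A @ np.ones(3), np.ones(3))
```

The nontrivial RA axes correspond to the remaining eigenvectors of A, whose eigenvalues are all less than 1.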
The methods of ordination discussed in this chapter so far (PCA, PCO, and RA) are all achieved by projecting an s-dimensional swarm of data points onto a space of fewer dimensions. In the simplest method (PCA) the coordinates of the points before projection are the measured quantities of the s species in each of the n quadrats; centering and standardizing the data (both optional) merely amount to changing the origin and the scale of measurement, respectively. In PCO and RA, the measurements are adjusted in a more elaborate fashion (as described in Sections 4.4 and 4.5) before the swarm is projected onto a space of fewer than s dimensions. But, to repeat, the final step in all these ordinations consists in projecting a swarm of points onto a line, a plane, or a three-space.

It is obvious that whenever such a projection is done, there is a risk that the original pattern of the swarm will be misinterpreted; this risk is the price that must be paid for a reduction in dimensionality. We now ask whether projection of the swarm is likely to produce a pattern that is positively misleading. The answer depends on whether the original data swarm has a linear or nonlinear structure.
Figure 4.12 demonstrates the difference. The three-dimensional swarm in the upper panel has a linear structure; if a one- or two-dimensional ordination of the swarm were done by projecting the points onto a line or plane,

LINEAR AND NONLINEAR DATA STRUCTURES    189
Figure 4.12. Linear (a) and nonlinear (b) data swarms (solid dots) in three-space. In each case the hollow dots are the projection of the swarm onto the two-dimensional "floor" of the coordinate frame.
the result would be satisfactory. Some of the information in the original data would be lost, of course, but the positions of the points relative to one another would be reasonably well preserved, in the sense that points close to each other in the original three-dimensional swarm would remain close to each other in the one- or two-dimensional projections.

The spiral swarm in the lower panel has a nonlinear structure. There is obviously no way of orienting a line or a plane so that when the swarm is projected onto it, the relationships of all the points to one another are even approximately preserved. For instance, suppose the swarm were projected onto the floor of the coordinate frame; it would be found that the points at
each end of the spiral, which are far apart in three-space, would be close together in two-space. Indeed, if the two-dimensional picture were the only available representation of the swarm, it would be impossible to judge whether its original three-dimensional shape had been that of a spiral, a hollow cylinder, or a doughnut.
It should now be clear that ordination by projection, for example by PCA, PCO, or RA, although entirely satisfactory if the data swarm is linear, may give misleading results if the swarm is nonlinear. It is sometimes said that PCA, for example, gives a distorted representation of nonlinear data. This is a misuse of the word "distorted." The picture of a many-dimensional swarm that PCA yields is no more distorted than, say, a photograph in which both distant and nearby objects appear. One would not call such a picture distorted because the images of a distant mountain peak and a nearby tree-top, say, were close together on the paper. In the same way, the circle of points on the floor of the coordinate frame in Figure 4.12b is not in the least distorted. But it is misleading. What we require is a method of ordination that deliberately introduces distortion of a well-planned, specially designed kind, that will correct the misleading impression sometimes given by truly undistorted data.
Various methods of ordination that achieve this result have been devised. They are known collectively as nonlinear ordination methods. A note on terminology is necessary here. The contrast between linear and nonlinear ordination methods is that they are appropriate for linear and nonlinear data structures, respectively. The term "linear ordination" should not be used (though it occasionally is) to mean a one-dimensional, as opposed to a two- or three-dimensional, ordination. The term catenation, suggested by Noy-Meir (1974), is a useful and unambiguous synonym for "nonlinear ordination."

We now consider how nonlinear data swarms can arise in practice. Then a good method of ordinating such data, known as detrended correspondence analysis, is described.
Figure 4.13a gives a diagrammatic portrayal of such a coenocline. Each curve represents one species; the horizontal axis measures distance along the gradient, and the height of a particular species' curve above this axis shows the way the species responds to the varying environmental conditions along the gradient.

Now imagine that the coenocline is sampled by placing a row of quadrats spaced at equal intervals along the gradient (up the mountainside). Will the resultant "data structure" (the shape of the swarm of data points) be linear or nonlinear? The answer depends on the length of the segment of gradient that is sampled.

Suppose the sampled segment is long, and contains the peaks of one or more species' response curves. These species do not respond monotonically to the gradient over the length of the segment; that is, they do not increase continuously, or decrease continuously, along it. Rather, they first increase and then decrease. As a consequence, the data structure is nonlinear. But if sampling is confined to only a short segment of the gradient (Figure 4.13b), over which the response curve of each species present within the segment is at least approximately linear, then the data structure itself is (again approximately) linear.

We now consider the shape of the nonlinear data swarm that results when several species first increase, and then decrease, along a sampled
[Figure 4.14: panels (a) and (b); quadrat points numbered 1-12; the dashed curve in (b) lies in the plane of axes 1 and 2.]
One would like the results of ordinating the quadrats observed along a gradient to have a linear pattern themselves, in the present case to form a more or less straight row in two-space. But they do not. The dashed curve in Figure 4.14b, which is almost a closed loop, gives a misleading idea of the gradient even though it is an undistorted picture of the data swarm. Suppose one were to ask for a one-dimensional ordination of these data. The ordering of the quadrats along axis 1 turns out to be

    3, 4, 5, 2, 1, 6, 7, 12, 8, 11, 9, 10,

an obviously meaningless result. But if ordination by PCA (or by RA, or by PCO) is performed on data with a linear structure (as in Figure 4.13b), the result accords with what one intuitively expects; for an example, see Exercise 4.10.
It might be argued that the PCA ordination in Figure 4.14b would not mislead in practice. The points are numbered according to their position on the gradient and can be joined, in proper order, by a smooth curve. But it should be recalled that this is an artificial example with only three species. Given field data with many species, one always has to project the data swarm onto a space of far fewer dimensions than it occupied originally; if the swarm is a "hyper-coil" (a multidimensional analogue of the dashed curve in Figure 4.14b), then when it is projected it will automatically "collapse" and yield as meaningless a pattern in the line, the plane, or the three-space as the one-dimensional ordering of the artificial example listed previously.
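The kind of pattern just described is easy to reproduce. The following sketch (all numbers hypothetical, not the book's example) builds a coenocline of Gaussian response curves, performs a centered PCA, and shows that the axis-2 scores are nearly a quadratic function of the axis-1 scores, so the quadrats plot as a curve rather than the straight row the gradient deserves:

```python
import numpy as np

# A hypothetical coenocline: s species with Gaussian response curves whose
# peaks are spread along a gradient sampled by n equispaced quadrats.
n, s = 30, 10
gradient = np.linspace(0.0, 1.0, n)
peaks = np.linspace(0.0, 1.0, s)
X = np.exp(-((gradient[:, None] - peaks[None, :]) / 0.25) ** 2)  # n x s abundances

# Unstandardized, centered PCA of the quadrat swarm, via the SVD
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S            # column k holds the quadrat scores on PCA axis k+1

# The arch: axis-2 scores track the square of the axis-1 scores, so the
# supposedly one-dimensional gradient bends into a curve in the plane.
arch = abs(np.corrcoef(scores[:, 1], scores[:, 0] ** 2)[0, 1])
```

Shortening the sampled segment (for example, widening the response curves relative to the gradient) weakens this correlation, in line with the linear-structure case of Figure 4.13b.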
The problem is, of course, compounded when the gradient sampled is less obvious than that of a mountainside, or is not even apparent at all. Indeed, if environmental variables such as soil moisture, soil texture, and the like are varying haphazardly in space, there may be no gradient in the ordinary sense. Then the quadrats will have no particular ordering before an analysis is carried out, and the purpose of the analysis is to perceive their ordering (if there is any) and diagnose its cause.

What is required, therefore, is an ordination method that is not subject to the arch effect, one which will ordinate a nonlinear data swarm in a way that clearly exhibits in one, two, or three dimensions the true interrelationships among the quadrats.

Let us first consider whether RA is an improvement over PCA. The solid curve in Figure 4.14b shows the RA ordination of the same 12 quadrats and
D NON LINEAR DA TA STRUCTURES
~¡N(.4~ AN
195
represents the sort of results that RA is found to yield in practice with real data. It is an improvement over PCA in that the arch effect is less exaggerated. However, the effect is still present: an arch would form, and although it does not give an erroneous ordering on axis 1, it still produces a meaningless pattern in the direction of axis 2. A true representation of the quadrats would obviously consist of a row of equispaced points along axis 1, with no component on axis 2. Also, there is a contraction of scale at each end of the gradient: the points at the ends are more closely spaced than those at the middle, and this variation in spacing does not correspond to any variation in the steepness of the environmental gradient.

Therefore, although RA does better than PCA in giving a true representation of the interrelationships of the quadrats, there is room for further improvement. One way of achieving this is to use detrended correspondence analysis, an ordination method that we now consider.
Figure 4.15. RA and DCA ordinations of 37 oxbow lakes in the valley of the Athabasca River, ordinated on the basis of the angiosperms (plus Chara and Nitella) growing in them. Three classes of lakes are distinguished: those with Typha latifolia (•), those with Triglochin maritima (○), and those with neither species (×). [Adapted from Lieffers (1984). The RA ordination of the same data was kindly provided by Lieffers (pers. comm.).]
COMPARISONS AND CONCLUSIONS    197
sample sites by RA and by DCA are shown in the upper and lower panels of the figure. Different symbols have been used for the sites depending on whether they contained Typha latifolia (which cannot tolerate saline water), Triglochin maritima (which thrives in saline water), or neither; the two species never occurred together. The RA ordination clearly exhibits both the arch effect and the scale contraction effect, and both effects disappear when the data are ordinated by DCA.
All four of the ordination methods described in this chapter have merit in appropriate circumstances.

Three of the methods are suitable for data with a linear structure, and such data are obtained very frequently in ecological studies (van der Maarel, 1980). PCA has the merit that it is the most straightforward, conceptually, of all the methods; it allows the user to look at a visible projection of a multidimensional and hence unvisualizable swarm of points. In addition, uncentered PCA aids in the recognition of distinct classes of quadrats (see page 162). PCO sometimes allows one to construct and inspect a data swarm in which the distances between every pair of points correspond (approximately) with some chosen measure of their dissimilarity. RA provides simultaneous ordinations of quadrats and species.
With nonlinear data, DCA is the ordination method at present favored by the majority of ecologists. Its advantages are that it removes two likely sources of error that arise when nonlinear data are ordinated by ordinary RA, namely, the arch effect and the scale contraction effect. Its defect is that it gains these advantages by data manipulation, that is, by deliberately flattening the arch and by applying local adjustments to the scales of the axes. If these troublesome effects are truly mathematical artifacts devoid of ecological meaning, then it is obviously desirable to remove them. But overzealous correction of suspected "defects" may sometimes lead to the unwitting destruction of ecologically meaningful information.

Other methods of ordinating nonlinear data exist but are beyond the scope of this book. An especially promising method has been devised by Shepard and Carroll (1966); it is described in Noy-Meir (1974) and briefly in Pielou (1977). It is known as "parametric mapping" or, alternatively, as "continuity analysis." It is mathematically more difficult than
7, 1, 10, 9, 5, 2, 8, 6, 4, 3,

and the ordering of the species is:

2, 9, 8, 1, 4, 6, 10, 7, 5, 3.

Let the data matrix be rewritten with the quadrats (columns) arranged in the order shown in the first list, and the species (rows) arranged in the order shown in the second list. The result is the lower matrix in Table 4.11. The pattern or "structure" of the data is now strikingly obvious.

An example using real data is given by Gauch (1982a).

An alternative method of rearranging data matrices in order to exhibit their structure has been devised by van der Maarel, Janssen, and Louppen (1978), who give a program, TABORD, for carrying it out.
TABLE 4.11

The unordered data matrix

    [10 species (rows) × 10 quadrats (columns); entries 1-4]

The same data with the rows and columns rearranged

    [with the species in the order 2, 9, 8, 1, 4, 6, 10, 7, 5, 3 and the quadrats in the order 7, 1, 10, 9, 5, 2, 8, 6, 4, 3, the nonzero entries form a band along the main diagonal, each row rising through 1, 2, 3 to a peak of 4 and falling away again]
EXERCISES

4.1. Consider Table 4.2. What are the eigenvalues of the covariance matrix yielded by the SSCP matrix R?

4.2. Let X be a row-centered data matrix and let Y = UX be the transformed matrix obtained by doing PCA. What do the quantities tr(XX') and tr(YY') represent in geometric terms? Why would you expect them to be equal? [Reminder: tr(A) is the trace of matrix A, i.e., the sum of the elements on the main diagonal.]
4.4. Refer to Table 4.4 and Figure 4.6. From the table, determine the angles between: (a) the x₁-axis and the y₁-axis in Figure 4.6a; (b) the x₁-axis and the y₁'-axis in Figure 4.6a; (c) the (x₁/a₁)-axis and the y₁''-axis in Figure 4.6b; (d) the (x₁/a₁)-axis and the y₁'''-axis in Figure 4.6b.

4.5. Refer to page 164. What is the coefficient of asymmetry of axis 3 in the example described in the text? (Note: axis 3 does not appear in Figure 4.9b because it is perpendicular to the plane of the page.)

4.6. Let A, M, N, and I all be n × n matrices. A is the matrix whose (i, j)th element is given in paragraph 12, page 170. The (j, k)th element of M is -½δ²(j, k). All the elements of N are equal to 1/n. I is the identity matrix. Show that (I - N)M(I - N) = A.
4.7. The quantities of two species in quadrats A, B, and C are given by the data matrix

         A  B  C
    X = ( …  5  … )
        ( 4  …  … )

4.8. Show that r₃² = λ₃. Here r₃ is the correlation between the species scores in row 3 of V (Table 4.7) and the quadrat scores in row 3 of W (Table 4.8).
4.9. Refer to Equation (4.27) on page 187, showing how the species scores for an RA ordination are related to the eigenvectors of P defined in Equation (4.12) (page 181). Derive the analogous relation between the quadrat scores and the eigenvectors of Q defined in Equation (4.13).
4.10.

    X = ( 11  12  13  14 )
        ( 17  19  21  23 )
        ( 22  27  32  37 )
        ( 30  27  24  21 )
        ( 34  28  22  16 )
Divisive Classification

5.1. INTRODUCTION
data set, these quadrats are likely to have a strong effect on the first round of a clustering process, and bad fusions at the beginning will influence all later fusions. The obvious (but, with one exception, impracticable) solution is to adapt the agglomerative methods so that they can be used the other way round. It is easy (in theory) to devise a method of classification by division that proceeds as follows. The whole collection of quadrats is first divided into two groups in every conceivable way, and one then judges which of the ways is "best" according to some chosen criterion. If there are n quadrats, there will be 2^(n-1) - 1 different divisions to compare with one another (for a proof, see Pielou, 1977). Having discovered the best possible division to make at the first stage, the whole process must be repeated on each of the two classes identified at this stage, and then on each of the four classes identified at the second stage, then on each of the eight classes identified at the third stage, and so on. Not surprisingly, the computing requirements of such methods are so excessive that they are infeasible unless n is very small.
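The count 2^(n-1) - 1 is easy to verify by brute force. This sketch (a hypothetical helper, not from the book) enumerates every division of a collection into two nonempty groups by fixing the first item in one group, which avoids counting each division twice:

```python
from itertools import combinations

def bipartitions(items):
    """Every way of dividing a collection into two nonempty groups."""
    items = list(items)
    first, rest = items[0], items[1:]
    result = []
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            group1 = [first, *combo]                      # first item always here
            group2 = [x for x in items if x not in group1]
            if group2:                                    # skip the empty division
                result.append((group1, group2))
    return result
```

For four quadrats this yields 2³ - 1 = 7 divisions, and the count doubles (plus one) with each added quadrat, which is why exhaustive divisive classification becomes infeasible so quickly.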
However, there is another method (actually, a whole set of related methods) of doing divisive classifications that avoids these computational difficulties. It consists in first doing an ordination of the data (in any way one chooses) and then dividing the ordinated swarm of data points with suitably placed partitions. The procedure is known as ordination-space partitioning. The term describes a large collection of methods, since one can choose any one of a number of ways of doing the initial ordination, and then any one of a number of ways of placing the partitions. Gauch (1982) has reviewed the development of these methods. Collectively, they constitute a battery of exceedingly powerful procedures for interpreting ecological data. They yield an ordination and a classification simultaneously, and the classification, being divisive, avoids the disadvantage of agglomerative classifications previously described.

It should be noticed, however, that ordination-space partitioning is a much more "rough and ready" method of classifying quadrats than the agglomerative methods described in Chapter 2. Even so, it is probably adequate for most, if not all, ecological applications. And, as so often happens in efforts to interpret ecological data, one is faced with the perennial problem of choosing, judiciously, one of a large number of possible and only slightly different procedures.

In the following sections, we consider a few representative methods.
5.2. CONSTRUCTING AND PARTITIONING A MINIMUM SPANNING TREE
Figure 5.1. The points of Data Matrix #1 linked by their minimum spanning tree. The coordinates of the quadrat points (here labeled with the letters A, B, ..., J) are given in Table 5.1. The distance between every pair of joined points is also shown.
describe the procedure in such a way that it is applicable whatever the value of s. It is still convenient to use Data Matrix #1 as an example.

If the data swarm is many-dimensional and hence unvisualizable, the line segments forming the minimum spanning tree must be found by inspecting the n × n distance matrix showing the distance between every pair of points.
The distance matrix for Data Matrix #1 is given in the upper panel of Table 5.1. It is identical with the distance matrix in Table 2.1 (page 25) except for two changes. In Table 5.1 the quadrats have been labeled with letters instead of numerals, as explained. And some of the distances in the matrix have been given superscript numbers; these label the segments of the minimum spanning tree in the order in which they are found. They are found as follows. (The method is due to Prim, and is described in Gower and Ross, 1969; Rohlf, 1973; Ross, 1969.)

The first segment of the tree corresponds to the shortest distance in the table. The shortest distance is 2.2 = d(E, H), the length of segment EH; therefore, superscript 1 is attached to this distance. We next find, by searching the rows and columns headed E and H, the shortest distance linking a third point to either E or H. It is the distance d(E, B) = 3.6;
Figure 5.2. Stages in the construction of the dendrogram giving a nearest-neighbor classification of Data Matrix #1. Each successive cut permits a division to be made. The final (ninth) cut gives the complete dendrogram, which is shown in Figure 2.3b.
therefore, superscript 2 is attached to this distance. We next find, by searching the rows and columns headed E, H, and B, the shortest distance linking a fourth point to any of E, H, or B. It is the distance d(H, J) = 5.0; therefore, superscript 3 is attached to this distance. Notice that although d(B, H) is shorter than d(H, J), segment BH is not admissible as part of the minimum spanning tree, as it would form a loop BHE, and loops are not permitted.

Continuing in the same way, all the n - 1 = 9 segments needed to complete the tree are found. They are listed, with their lengths, in the
order in which they were found, in the center panel of Table 5.1. Observe that they were not discovered in order of increasing length; some of the later-found segments are shorter than some found earlier. Now that the segments have been found, it is easy to draw a two-dimensional diagrammatic representation of the minimum spanning tree, as has been done in the bottom panel of Table 5.1. The segments are linked together in the order in
which they were identified; they have been assigned ranks according to their lengths, with 1 for the longest up to 9 for the shortest.

To obtain a nearest-neighbor classification from the minimum spanning tree, it remains to cut the tree's segments, one after another, beginning with the largest. This partitioning process has already been demonstrated in Figure 5.2. Of course, with a many-dimensional data swarm, which cannot be plotted and partitioned as was the two-dimensional swarm in Figure 5.1, the partitioning must be done on the diagrammatic minimum spanning tree constructed as shown in Table 5.1. The diagram can always be drawn in two dimensions, regardless of the value of s. In practice, of course, all the steps in the classification can be done by computer; a program has been given by Rohlf (1973). The foregoing description of the method explains its principles.
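Prim's procedure, as just described, translates directly into code. This sketch uses hypothetical coordinates (not Data Matrix #1); at each step the shortest segment joining a point in the tree to a point outside it is added, which automatically excludes loop-forming segments such as BH above:

```python
import math

def prim_mst(points):
    """Prim's algorithm on the complete graph of Euclidean distances.

    points: dict mapping labels (e.g. quadrat letters) to coordinate tuples.
    Returns the tree's segments in the order found, as (a, b, length) triples.
    """
    labels = list(points)
    dist = lambda a, b: math.dist(points[a], points[b])
    in_tree = {labels[0]}          # start the tree at an arbitrary point
    segments = []
    while len(in_tree) < len(labels):
        # shortest admissible segment: one end inside the tree, one outside,
        # so a segment that would close a loop is never even considered
        a, b = min(((p, q) for p in in_tree for q in labels if q not in in_tree),
                   key=lambda pair: dist(*pair))
        segments.append((a, b, round(dist(a, b), 2)))
        in_tree.add(b)
    return segments
```

Cutting the resulting segments in decreasing order of length then yields the nearest-neighbor classification: removing the longest segment splits the points into two clusters, the next longest into three, and so on.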
Figure 5.3. A two-dimensional ordination of a nine-dimensional data swarm showing the minimum spanning tree (dashed line). The original data were skull measurements on shrews, collected from 10 local populations. Five of the populations (solid dots) are from islands in the Scilly Isles. The hollow symbols refer to four of the Channel Islands (circles) and Cap Gris Nez (square); J = Jersey; S = Sark. (Adapted from Gower and Ross, 1969.) The inset map shows the locations of the two island groups (A = Scilly Is., B = Channel Is.); Cap Gris Nez is on the north coast of France, far to the east.
    Y_s = ( -  -  +  +  +  + )
          ( ·  ·  ·  ·  ·  · )
Figure 5.4a shows the data swarm projected onto two-space; equivalently, it is a two-dimensional PCA ordination of the data. The coordinates of the points are given by the first two rows of Y.

Matrix Y_s, in the bottom panel of the table, gives the signs of the corresponding elements in Y, which is all the information required for carrying out the classification. To make the first division, consider the first row of Y_s. We see that A and B both have minus signs, whereas C, D, E, and F all have plus signs. Hence the first division is into the two classes (A, B) and (C, D, E, F). The division is shown diagrammatically in the first dendrogram in Figure 5.4b. That this should be the first division is also obvious
from the scatter diagram.

Figure 5.4. Classification of a six-point data swarm. (a) A two-dimensional PCA ordination of the points. (b) The stages in the construction of the dendrogram.

The second division splits class (C, D, E, F).
The three classes recognized after the second division are, therefore, (A, B), (C, D, E), and (F), as shown in Figure 5.4b.
To do the third division, we return to Y_s. (The points cannot, of course, be visualized in two-dimensional space at this stage, however.) From row 3 of Y_s we see that C and F have positive coordinates and D and E have negative coordinates. We therefore separate those points that have not been separated before, namely, (C) from (D, E). The resulting classes are shown in the third dendrogram in Figure 5.4b.
The two two-member classes present after the third division are both split at the fourth division: (A, B) splits into (A) and (B); (D, E) splits into (D) and (E). This is clear from row 4 of Y_s, but it cannot be visualized geometrically, since we are concerned with the coordinates of the points on a fourth axis perpendicular to the other three.

Observe that points that have become separated at an early stage of the classification cannot become reunited at a later stage. For example, B and C become members of different classes at the first division because they are on opposite sides of the centroid as measured along axis 1. The fact that they are on the same side as measured along axes 3 and 4 (see rows 3 and 4 of Y_s) is irrelevant.

In the final dendrogram, the heights of the first, second, ..., nodes (counting from the top downwards) are proportional to the first, second, ..., eigenvalues. Hence the height of a node shows the relative length of the axis that was "broken" at the division forming the node.
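The sign-based division scheme just described can be sketched in a few lines. The six points below are hypothetical, not those of Figure 5.4; at each stage every current class whose members straddle the centroid on the next principal axis is split in two:

```python
import numpy as np

def sign_divisive_classification(X, n_axes=2):
    """Divide points by the signs of their successive principal-axis scores.

    X: n_points x n_vars data matrix.
    Returns the list of classes (tuples of point indices) after each stage.
    """
    Xc = X - X.mean(axis=0)                    # center the swarm on its centroid
    # principal-axis scores: project onto eigenvectors of the covariance matrix
    vals, vecs = np.linalg.eigh(np.cov(Xc.T))
    order = np.argsort(vals)[::-1]             # largest eigenvalue first
    Y = Xc @ vecs[:, order]                    # column k = scores on axis k+1
    classes = [tuple(range(len(X)))]
    stages = []
    for k in range(n_axes):
        new = []
        for cls in classes:
            neg = tuple(i for i in cls if Y[i, k] < 0)
            pos = tuple(i for i in cls if Y[i, k] >= 0)
            new.extend(c for c in (neg, pos) if c)   # split only occupied sides
        classes = new
        stages.append(classes)
    return stages
```

Because only the signs of the scores are used, the partition produced is unaffected by the arbitrary orientation of each eigenvector; once two points fall on opposite sides of the centroid on some axis, no later stage can reunite them.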
There is, of course, no need to continue the subdivision process right to the end, leaving every individual point (quadrat) isolated in a one-member cluster. One usually wishes to stop at a stage that leaves "real" clusters undivided; for instance, in the example in Figure 5.4 one might regard (A, B) and (D, E) as true, natural clusters and treat classification as complete at the third stage. This is one of the advantages of a divisive classification: the subdivision process need not be continued beyond the stage at which all truly separate clusters have been separated. The associated disadvantage is that a decision must be made as to how a "real" cluster shall be defined; equivalently, a rule has to be devised for deciding when the sequence of subdivisions should stop. Such a rule is unavoidably arbitrary, and there are several possible criteria for deciding when subdivision has gone far enough (Pielou, 1977). Discussion of these so-called stopping rules is beyond the scope of this book.
Noy-Meir's method bears a close relation to Lefkovitch's, but it is not quite so simple. As with Lefkovitch's method, the data to be classified are first ordinated (with no reduction in dimensionality) by PCA; the principal axes are then broken in two, one after another, starting with the first. Noy-Meir's method differs in the way in which the break point is chosen for each axis. It is so placed as to make the sum of the (within-group) variances of the principal component scores of the two groups of points, on either side of the break point, as small as possible.

The method is shown in full in Table 5.3. Each axis is treated in turn, and every possible break point along each axis is tested in turn so that the correct point may be determined. For instance, consider the first axis, and the result of breaking it between points D and E. There are then 4 points (A, B, C, and D) to the left of this break point, with scores

    (-36.8, -33.8, 30.5, 11.8)

and variance

    (1/(n₁ - 1)){ Σ_{i=1}^{4} y_i² - (1/n₁)( Σ_{i=1}^{4} y_i )² } = 1121.98,
TABLE 5.3. DIVISIVE CLASSIFICATION BY NOY-MEIR'S METHOD

First Break

The scores on the first principal axis are

Point:    A      B      C      D      E      F
Score:  -36.8  -33.8   30.5   11.8    7.5   20.9

                    Within-Group Variance
Break Between    Left Group   Right Group    Sum of Variances
A and B              0           608.17           608.17
B and C              4.50        104.31           108.81*
C and D           1445.46         46.81          1492.27
D and E           1121.98         89.78          1211.76
E and F            883.97          0              883.97

The smallest sum is marked with an asterisk.
Make the break between B and C.
Second Break

The scores on the second principal axis are

Point:    A     B      C      D      E      F
Score:   0.3   0.8  -14.3  -11.5   -7.2   31.9

There are 2 points (E and F) to the right of the first-axis break between D
and E; their scores on the first axis have variance

    (1/(n2 - 1)) [ Σ yi² - (1/n2)(Σ yi)² ] = 89.78,

with n2 = 2 and the sums taken over i = 5, 6. The sum of these two
variances, 1121.98 + 89.78, is 1211.76. (It should now be clear how all
the entries in the table are computed.)
It is seen that the smallest sum of variances is obtained when the break
on this axis is made between points B and C. Hence the first division of the
classification is into the classes (A, B) and (C, D, E, F).
The second division is made by breaking the second axis. In this case the
break comes between the points E and F. Therefore, the three classes
recognized after the second division are (A, B), (C, D, E), and (F). Likewise,
the four classes after the third division are (A, B), (C), (D, E), and (F). The
ultimate step, which needs no computation, is to break A from B and D
from E.
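The break-point search just described is easy to mechanize. The sketch below is illustrative (it is not from the book; the function names are invented): for scores on one axis, it sums the within-group sample variances on either side of every break between consecutive points and picks the smallest sum, reproducing the figures in Table 5.3.

```python
# Noy-Meir-style break search on one principal axis (illustrative sketch).
# For each break between consecutive points, sum the within-group sample
# variances of the two resulting groups; choose the break with the least sum.

def sample_variance(scores):
    n = len(scores)
    if n < 2:                       # a single point contributes zero variance
        return 0.0
    mean = sum(scores) / n
    return sum((y - mean) ** 2 for y in scores) / (n - 1)

def best_break(scores):
    sums = []
    for b in range(1, len(scores)):          # break after position b
        left, right = scores[:b], scores[b:]
        sums.append(sample_variance(left) + sample_variance(right))
    return sums.index(min(sums)) + 1, sums   # break position and all sums

# First-axis scores of points A..F from Table 5.3
axis1 = [-36.8, -33.8, 30.5, 11.8, 7.5, 20.9]
pos, sums = best_break(axis1)
print(pos)                        # 2: break between B and C
print([round(v, 2) for v in sums])  # cf. Table 5.3: 608.17, 108.81, 1492.27, 1211.76, 883.97
```

Running the same search on the second-axis scores locates the second break in the same way.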
gation, and then drawing the partitions that yield the classification directly
on the ordination scatter diagram. To complete the representation, the
classification dendrogram is given as well.
Figure 5.5 shows the result of an ordination-plus-classification of vegeta-
tion. It is adapted from a figure in Marks and Harcombe (1981). The scatter
diagram shows a two-dimensional RA ordination of 54 sample plots repre-
senting the range of natural vegetation in the coastal plain of southeastern
Texas. The data matrix was also classified, using the TWINSPAN program,
and gave the classification dendrogram shown as an inset on the graph. The
four groups of sample plots separated in the classification were then
outlined and labeled on the ordination.
[Figure 5.5. A two-dimensional RA ordination (RA Axis 1 against RA Axis 2) and a TWINSPAN classification of the vegetation of 54 sample plots in southeastern Texas. The four vegetation classes, labeled P, PO, HP, and F, are outlined on the scatter diagram, with the classification dendrogram inset.]
However, only two of the ways (that in Figure 5.5 and its mirror image)
show that, for example, HP is closer (more similar) than F to PO, and that
the greatest separation (dissimilarity) is between F and P. It should now be
clear that the numerous possible ways of drawing a dendrogram are not all
equally informative. One of the merits of the TWINSPAN program is that it
arranges the dendrogram's branches in a way that puts similar points close
to each other, so far as is possible in a two-dimensional representation.
Since a TWINSPAN classification entails a new one-dimensional RA
ordination at each step, the same result would be obtained if DCA ordina-
tions were used. This is because the order of the points on the first axis is
the same with a DCA as with an RA ordination. Partitioning a two-dimen-
sional DCA ordination is yet another way of doing a divisive classification;
it has been proposed and demonstrated by Gauch and Whittaker (1981),
who gave the procedure the name DCASP. The partitioning is done
subjectively. Partitions are drawn through parts of the scatter diagram
where the data points can be seen to be sparse, and therefore the method is
unlikely to make "false" divisions. But there is a risk that some "true"
divisions may escape notice. Data points may appear close in a two-dimen-
sional diagram even though they are far apart in many-dimensional space
(for an example, see Figure 5.3). It may be feasible to guard against being
misled by drawing the minimum spanning tree of the data swarm that is to
be partitioned.
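A minimum spanning tree is readily computed from a distance matrix. The sketch below is illustrative (not from the book); it uses Prim's algorithm, growing the tree one shortest available segment at a time, which matches the order-of-discovery listing asked for in the exercises.

```python
# Prim's algorithm: grow a minimum spanning tree from a full distance matrix.
# d[i][j] is the distance between points i and j; segments are returned in
# the order in which they are added to the tree.

def minimum_spanning_tree(d):
    n = len(d)
    in_tree = {0}                      # start the tree at the first point
    segments = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j not in in_tree and (best is None or d[i][j] < d[best[0]][best[1]]):
                    best = (i, j)
        segments.append((best[0], best[1], d[best[0]][best[1]]))
        in_tree.add(best[1])
    return segments

# A small illustrative distance matrix for four points (not the book's data)
d = [[0.0, 1.0, 4.0, 3.0],
     [1.0, 0.0, 2.0, 5.0],
     [4.0, 2.0, 0.0, 6.0],
     [3.0, 5.0, 6.0, 0.0]]
for seg in minimum_spanning_tree(d):
    print(seg)          # (0, 1, 1.0), then (1, 2, 2.0), then (0, 3, 3.0)
```

Each returned triple is (point already in the tree, point joined to it, segment length).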
EXERCISES
5.1. The following distance matrix gives the pairwise distances between
points in a swarm of 10 points in nine-space. The points are labeled
A, B, ..., J. Find the segments of the minimum spanning tree and list
them with their lengths in the order in which they were found (as in
the center panel of Table 5.1). Draw a diagram of the minimum
spanning tree.
     A     B     C     D     E     F     G      H      I     J
A    0    1.88  2.33  2.26  1.74  2.93  3.30  10.73   8.83  8.57
B          0    2.54  2.97  2.05  4.00  4.52  10.89   9.09  8.78
C                0    3.22  1.54  4.01  4.10  11.28   9.66  9.21
D                      0     …     …     …      …      …     …
E                            0     …     …      …      …     …
F                                  0     …      …      …     …
G                                        0    10.44   8.96  9.07
H                                               0     3.27  3.77
I                                                      0    3.00
J                                                            0
5.2. Refer to Table 5.2. If Data Matrix #14 is altered by putting x61 = x62
= 0, it is found that the matrix of principal component scores
becomes

         ( -37.5  -34.8   32.2   13.2    8.4   18.4 )
         (  32.5  -11.7   -7.7   -1.2    0.6  -12.5 )
    Y =  (  -6.4   -0.3    1.8    9.5   -7.0    2.5 )
         (  -3.8    4.3   -0.3   -0.7    0.4    0.2 )
         (  -0.2   -0.1    0.0    0.2   -1.3    1.4 )
         (   0.0   -0.0    0.0    0.0    0.0   -0.0 )

Do a divisive classification of these data by Lefkovitch's method. Plot
the two-dimensional ordination.

5.3. Carry out a divisive classification of the data described in Exercise 5.2
using Noy-Meir's method. Stop when four classes have been dis-
tinguished.
Chapter Six
Discriminant Ordination
6.1. INTRODUCTION
The data matrices that have been described, ordinated, and classified so far
in this book have all been treated in isolation. It has been assumed that an
investigator has only one data matrix to interpret at any one time. We now
suppose that several data matrices are to be interpreted jointly. It is desired
to ordinate all of them together, that is, in a common coordinate frame, and
an ordination method is wanted that emphasizes as much as possible the
contrasts among them.
Here are several examples of the kinds of investigations in which joint
ordinations are helpful.
1. Suppose one were investigating the emergent vegetation (or the
benthic invertebrate fauna, or the diatom flora) of several lakes. The data
would consist of several data matrices, one from each lake.
2. One might be sampling the insect fauna (or some taxonomic subset
of it) in wheat fields in July in several successive years. Then the data would
consist of several data matrices, one for each year.
3. One might be comparing environmental conditions in several geo-
graphically separate regions. Within each region a number of environmen-
tal variables are measured in each of a number of "quadrats" (or other
sampling stations), and the result is a data matrix summarizing conditions in
that region. Then the total data consist of several data matrices, one for
each region.
All ordination methods so far discussed in this book have entailed the
eigenanalysis of a symmetric square matrix. A discriminant ordination
requires that an unsymmetric square matrix be eigenanalysed. This section,
therefore, describes some of the properties of unsymmetric square matrices
and shows how they differ from symmetric matrices. In all that follows, the
symbols A and B are used for symmetric and unsymmetric square matrices,
respectively.
    AV = VΛ.     (6.3)

The columns of V (which are the rows of U) are the eigenvectors of A,
and the elements on the main diagonal of Λ are the eigenvalues of A.
Indeed, (6.3) is the equation that defines the eigenvalues and eigenvectors
of A.
An exactly analogous equation, namely,

    BW = WΛ,     (6.4)

defines the eigenvalues and eigenvectors of the unsymmetric matrix B. As
before, the elements on the diagonal of the diagonal matrix Λ are the
eigenvalues of B and the columns of W are its eigenvectors. But in this case
B ≠ WΛW′.
    MM⁻¹ = M⁻¹M = I.

(Note that when a matrix is multiplied by its inverse, the order of the
factors is immaterial. Only if M is orthogonal is M⁻¹ = M′.) Postmultiplying
both sides of (6.4) by W⁻¹ gives BWW⁻¹ = WΛW⁻¹ or, more simply,

    B = WΛW⁻¹.     (6.5)
The difference between (6.2) and (6.5) should be noted. It arises from the
fact that V, whose columns are the eigenvectors of a symmetric matrix, is
orthogonal. In contrast, W, whose columns are the eigenvectors of an
unsymmetric matrix, is nonorthogonal.
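The contrast can be checked numerically. The sketch below is illustrative (the matrices are invented for the demonstration; it uses NumPy): a symmetric matrix A is rebuilt as VΛV′ with orthogonal V, while an unsymmetric B must be rebuilt as WΛW⁻¹.

```python
import numpy as np

# A symmetric matrix: its eigenvector matrix V is orthogonal, so A = V Λ V'.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
evals_A, V = np.linalg.eigh(A)                # eigh is for symmetric matrices
L_A = np.diag(evals_A)
assert np.allclose(V @ V.T, np.eye(2))        # V is orthogonal
assert np.allclose(A, V @ L_A @ V.T)          # A = V Λ V'

# An unsymmetric matrix: W is not orthogonal, and B = W Λ W⁻¹, not W Λ W'.
B = np.array([[2.0, 1.0],
              [0.0, 3.0]])
evals_B, W = np.linalg.eig(B)
L_B = np.diag(evals_B)
assert not np.allclose(W @ W.T, np.eye(2))    # W is not orthogonal
assert np.allclose(B, W @ L_B @ np.linalg.inv(W))
assert not np.allclose(B, W @ L_B @ W.T)
print("checks passed")
```

The same checks work for matrices of any order, provided B has a full set of eigenvectors.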
As an example, let

    M = (  4  2 )
        ( -1  3 ).

Its inverse is

    M⁻¹ = (1/14) ( 3  -2 )
                 ( 1   4 ).

The reader should check that MM⁻¹ = M⁻¹M = I.
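The check is immediate by machine. A quick illustrative sketch (any nonsingular matrix will do; the values below are for demonstration only):

```python
import numpy as np

# Verify MM⁻¹ = M⁻¹M = I for a small nonsingular matrix.
M = np.array([[4.0, 2.0],
              [-1.0, 3.0]])
M_inv = np.linalg.inv(M)              # exists because det(M) = 14 ≠ 0
assert np.allclose(M @ M_inv, np.eye(2))
assert np.allclose(M_inv @ M, np.eye(2))
print(np.round(M_inv * 14))           # 14·M⁻¹ has the integer entries [[3, -2], [1, 4]]
```

A singular matrix would make `np.linalg.inv` raise an error instead.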
formation. It was shown in Chapter 3 (page 94) that the effect of premultiplying
a data matrix by an orthogonal matrix is, in geometric terms,
to rotate the whole "data swarm" (the points defined by the transformed
matrix) rigidly about the origin of the coordinate frame; equivalently, one
can think of the data swarm as fixed and the transformation as rotating the
coordinate frame (see Figure 3.3, page 97).
Now let us transform a data matrix by premultiplying it by a nonorthog-
onal matrix. As an example, let the matrix that is to be transformed be

    X = ( 1   1  -1  -1 )
        ( 1  -1  -1   1 ),

whose columns give the coordinates of the corners of a square with center at
the origin (see Figure 6.1a).
[Figure 6.1. The effect of transforming data with a nonorthogonal matrix. (a) The original "data swarm," a square X; (b) and (c) show the transformation of X into TX in two different ways: (b) shows the axes unaltered but the points moved; (c) shows the points unaltered but the axes rotated, independently, through the angles shown. (Compare Figure 3.3.)]
    (6.6)

choose different values for θ11 and θ22. This ensures that TT′ ≠ I and
therefore that T is nonorthogonal, as required.
As a particular example, let
Then BW = WΛ, with

    B = (  …    …  -148 )
        (  …    …   -42 )
        (  41   59   -92 ),

    Λ = ( 1  0  0 )
        ( 0  2  0 )
        ( 0  0  3 ),

and W the (nonorthogonal) matrix of eigenvectors of B.
N ow that the groundwork has been laid, we consider how several sets of
data may be simultaneously ordinated in a common coordinate frame in
such a way as to separate the different swarms of points as widely as
possible. The method is described here in recipe form because the underly-
ing theory is beyond the scope of this book. Theoretical accounts may be
found in, for example, Tatsuoka (1971) and Pielou (1977).
To illustrate the method, it is applied to real data. The data consist of
values of 4 climatic variables observed at 14 weather stations in 3 geographic
regions. The purpose of the analysis is to ordinate the stations on
the basis of their climates.
In detail, the data are as follows. The locations of the weather stations
are shown on the map in Figure 6.2, and the place names appear as column
headings in Table 6.1. The three regions are: 1, the southern part of Yukon
Territory, in the Canadian boreal forest; 2, northern Alberta, also in the
boreal forest but at a lower latitude; 3, southern Alberta, in the Canadian
prairies. The climatic variables are listed in a footnote to the table.
The data, which could be written out as three separate matrices, one for
each region, have thus been brought together as the single large matrix,
hereafter called X, shown in Table 6.1. The dashed vertical lines separate the
three regions. The climatic observations for each station are shown in rows 3
through 6 of the matrix. It remains to explain the elements in rows 1 and 2.
[Figure 6.2. Map showing the locations of the 14 weather stations listed in Table 6.1. Stations in Region 1 (Yukon) ○; stations in Region 2 (northern Alberta) ●; stations in Region 3 (southern Alberta) ⊗.]
These are "dummy" variables which show to which of the three regions
each of the stations belongs. As may be seen, every station has two dummy
variables associated with it, x1 and x2. They are assigned as follows.
TABLE 6.1

[Column headings: the place names of the 14 weather stations.]

        Region 1                  Region 2                        Region 3
x1    1     1     1     1  |    0     0     0     0     0  |    0    0    0     0    0
x2    0     0     0     0  |    1     1     1     1     1  |    0    0    0     0    0
x3 -18.9 -29.4 -25.0 -20.0 | -17.2 -12.8 -25.0 -22.8 -18.9 | -11.1 -8.9 -8.3 -11.1 -8.3
x4  12.8  15.6  14.4  14.4 |  15.6  15.6  16.7  16.1  15.6 |  20.0 17.8 18.3  20.6 18.9
x5  11.5  13.8  10.1  17.2 |  14.7  13.1  11.5  14.8  10.3 |   1.1 11.9 11.5   9.8 12.9
x6  11.3  18.3  18.4  23.0 |  31.9  34.2  20.4  30.1  31.8 |  29.1 26.2 26.6  22.8 26.2

a Data from "Climatic Summaries for Selected Meteorological Stations in the Dominion of Canada, Volume I." Meteorological Division, Department of Transport, Canada, Toronto, 1948.
b x1 and x2 are dummy variables; see text. x3 and x4 are daily mean temperatures in degrees C in January and July, respectively; x5 and x6 are precipitation in cm for October to March and April to September, respectively.
1. Center and standardize the data as described earlier. That is, replace
the (i, j)th element of X, xij, by (xij - x̄i)/σi, where x̄i and σi are the mean
and standard deviation of all the elements in the ith row. (Note: it makes
no difference to the result whether the dummy variables are standardized. In
the computations shown in Table 6.2 they are standardized. Every row must
be centered.)
2. Postmultiply the matrix by its transpose to obtain the SSCP matrix
S. S is shown in Table 6.2.
3. Partition S into four submatrices S11, S12, S21, and S22 as shown by
the dashed lines. The four parts into which S has been divided are as
follows: S11 is a (k - 1) × (k - 1) = 2 × 2 matrix giving the sums of
squares and cross-products of the two dummy variables x1 and x2; S22 is
the s × s = 4 × 4 matrix giving the sums of squares and cross-products of
the observed variables x3, x4, x5, and x6; S12 is a (k - 1) × s = 2 × 4
matrix whose elements are all sums of cross-products formed by multiplying
one of the dummy variables by one of the observed variables; S21 = S12′.
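Steps 1-3 are easily carried out by machine. The sketch below is illustrative (it uses NumPy and a randomly generated stand-in for the climatic rows; only the layout of Table 6.1, two dummy rows over four observed rows, is taken from the text): it standardizes each row, forms the SSCP matrix, and partitions it.

```python
import numpy as np

rng = np.random.default_rng(0)
k_minus_1, s, n = 2, 4, 14            # 2 dummy rows, 4 observed rows, 14 stations

# Toy stand-in for Table 6.1: dummy rows mark region membership (4, 5, 5 stations)
dummies = np.array([[1]*4 + [0]*10,
                    [0]*4 + [1]*5 + [0]*5], dtype=float)
observed = rng.normal(size=(s, n))    # illustrative "climatic" variables
X = np.vstack([dummies, observed])

# Step 1: center and standardize every row
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

# Step 2: the SSCP matrix S = XX'
S = X @ X.T

# Step 3: partition S along the dummy/observed boundary
S11 = S[:k_minus_1, :k_minus_1]       # 2 x 2, dummy variables
S12 = S[:k_minus_1, k_minus_1:]       # 2 x 4, dummy by observed
S21 = S[k_minus_1:, :k_minus_1]       # 4 x 2, equals S12'
S22 = S[k_minus_1:, k_minus_1:]       # 4 x 4, observed variables
assert np.allclose(S21, S12.T)
print(S11.shape, S12.shape, S22.shape)   # (2, 2) (2, 4) (4, 4)
```

Substituting the actual rows of Table 6.1 for the random block reproduces the partitioned S of Table 6.2.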
5. Find the matrix product D defined as
[Figure 6.3. Two ordinations of the 14 weather stations: (a) discriminant ordination; (b) PCA ordination (centered and standardized). The symbols for the three regions are as in Figure 6.2.]
The number of ways that now exist for classifying and ordinating
ecological data is already large. No doubt the invention of new, more
ingenious techniques will continue. It could be argued that the time has
come for calling a halt to the endless proliferation of new methods. If
ecologists are ever to fit their individual contributions together into a
unified body of scientific knowledge, it seems desirable that a few good
methods of data analysis should be adopted widely and used consistently,
and that unproven methods should be consigned to the scrap-heap.
Attractive though this argument may be, it collapses before a more
persuasive counterargument. This is that the development of data interpretation
EXERCISES
6.1. Let

    H1 = ( 1  1  1 )    H2 = ( 1  -1  -2 )    H3 = ( 1  2  -3 )
         ( 2  2  2 )         ( 2   0  -1 )         ( 0  1   2 )
         ( 3  3  3 ),        ( 3  -1  -3 ),        ( 2  4  -6 ).

Use each in turn to multiply the 3 × 8 matrix X (which represents a
cube), where

    X = ( 1  1  1  1  -1  -1  -1  -1 )
        ( 1  1 -1 -1   1   1  -1  -1 )
        ( 1 -1  1 -1   1  -1   1  -1 ).

What change in the dimensionality of the figure does each multiplication
bring about?
6.2. Construct a diagram like that in Figure 3.4 (page 98) but with the y1-
and y2-axes not perpendicular to each other. Let the angles between
the old and new axes θ11, θ12, θ21, and θ22 be defined as on page 100.
Derive equations analogous to (3.4a) and (3.4b) on page 99.
6.3. Suppose the 6 × 3 matrix X is partitioned as shown into two sub-
matrices denoted by X1 and X2.

        ( x11  x12  x13 )
        ( x21  x22  x23 )
        ( ------------- )
    X = ( x31  x32  x33 )  =  ( X1 )
        ( x41  x42  x43 )     ( X2 )
        ( x51  x52  x53 )
        ( x61  x62  x63 )

Write out the 6 × 6 SSCP matrix XX′ in full, and show how it can be
partitioned into four submatrices that are identical with those in the
product

    ( X1 ) ( X1′  X2′ ).
    ( X2 )

(Keep track of the sizes of the various submatrices and their products.)
Note that the multiplication of partitioned matrices is carried out
according to the same rules as ordinary matrix multiplication, except
that submatrices take the place of individual elements.
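The block identity that the exercise asks for can be confirmed numerically. An illustrative sketch (random entries, using NumPy):

```python
import numpy as np

# XX' of a partitioned matrix equals the 2 x 2 block matrix of products:
#   ( X1 )(X1' X2')  =  ( X1X1'  X1X2' )
#   ( X2 )              ( X2X1'  X2X2' )
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))
X1, X2 = X[:2, :], X[2:, :]            # 2 x 3 and 4 x 3 submatrices

full = X @ X.T                         # the 6 x 6 SSCP matrix
blocks = np.block([[X1 @ X1.T, X1 @ X2.T],
                   [X2 @ X1.T, X2 @ X2.T]])
assert np.allclose(full, blocks)
print(full.shape)                      # (6, 6)
```

Keeping track of shapes: X1X1′ is 2 × 2, X1X2′ is 2 × 4, X2X1′ is 4 × 2, and X2X2′ is 4 × 4, which together tile the 6 × 6 product.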
Answers to Exercises
CHAPTER 2
1    1, 3         1, 3    4.58
2    [1, 3], 2    2, 3    8.49
3    4, 5         4, 5    9.17

    (  5.5   0.333 )
    (  0     1.667 )
    (  2     2.667 )
    ( -3     1.667 )

The distance between them is 7.190.

2.4. d1([P], [M, N]) = 554.125. (Note: The result is independent of p.)

2.5. 173.2.
J = a/(a + f).
S/J = (2a + 2f)/(2a + f) > 1 when f > 0, or = 1 when f = 0.
CHAPTER 3

3.1. (a) ( -1  4  3 );   (b) …;   (c) cannot be formed;   (d) …;
         (  3  0  3 )

     (e) cannot be formed;   (f) …;   (g) … .

     [Note: To form the product BCA, for example, one may postmultiply
     BC by A, or premultiply CA by B.]
3.2. u1 = ( …  -1 );   u2 = ( -1  -1 ).
          ( …   3 )         ( -2   2 )

3.3. U2 is orthogonal. U1 is not.

3.4. Let XX′ = A. Then aij is the sum of cross-products of the ith row of
X and the jth column of X′ (which is the jth row of X). Likewise, aji
is the sum of cross-products of the jth row of X and the ith column of
X′ (which is the ith row of X). Hence aij = aji for all i, j.

3.5. ( -0.5  -0.5 )
     (  …     1   ).

3.6. The eigenvalues of A⁵ are 2⁵ = 32 and 3⁵ = 243. This follows from:

    A⁵ = (U′ΛU)(U′ΛU)(U′ΛU)(U′ΛU)(U′ΛU)
       = U′Λ(UU′)Λ(UU′)Λ(UU′)Λ(UU′)ΛU
       = U′Λ⁵U,   since UU′ = I.

Hence the eigenvalues of A⁵ are the eigenvalues of A raised to the fifth
power.
CHAPTER 4

4.3. Let the 2 × 2 correlation matrix be

    P = ( 1  ρ )
        ( ρ  1 ).

Let the matrix of eigenvectors be

    U = (  cos θ   sin θ )
        ( -sin θ   cos θ ).

Then since UPU′ = Λ, it follows that the (1, 2)th element of UPU′ is
0. That is, ρ(cos²θ - sin²θ) = 0. Hence cos²θ = sin²θ, and θ = 45°.
4.9. The following equations are numbered to correspond with those in the
text, except that primes have been added here.

    W = C^(-1) X′ R^(-1) X W     (4.22′)

    … = W′ C^(1/2) Q,     (4.27′)

whence

    W C^(1/2) ∝ U_Q,     (4.28′)

where U_Q is the n × n matrix of eigenvectors of Q. Therefore,

    W ∝ U_Q C^(-1/2).     (4.29′)
4.10.

    Y = ( -12.99  -4.33   4.33  12.99 )
        (   0       0       0     0   )
        (   0       0       0     0   )
        (   0       0       0     0   )
        (   0       0       0     0   )

This follows from the fact that the data points lie on a straight line in
s-space (i.e., they are confined to a one-dimensional subspace of the
s-space). The PCA places the first principal axis on this line;
therefore, the coordinates of the points can be found by determining
the distances separating them. An eigenanalysis of the covariance
matrix would show that it has only one nonzero eigenvalue. The
number of nonzero eigenvalues of a covariance matrix, which is equal
to the number of dimensions of the subspace in which the data points
lie, is known as the rank of the covariance matrix. In this example, the
rank is 1.
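The rank statement is easy to verify by machine. The sketch below is illustrative (invented collinear points in 5-space, using NumPy): their covariance matrix has exactly one nonzero eigenvalue.

```python
import numpy as np

# Points that lie on a straight line in 5-space: each point is a multiple
# of one direction vector, so the covariance matrix has rank 1.
direction = np.array([1.0, 2.0, 0.5, -1.0, 3.0])
t = np.array([-3.0, -1.0, 1.0, 3.0])             # positions along the line
points = np.outer(t, direction)                   # 4 points (rows) in 5-space

cov = np.cov(points, rowvar=False)                # 5 x 5 covariance matrix
evals = np.linalg.eigvalsh(cov)
nonzero = np.sum(evals > 1e-10)                   # count eigenvalues above noise level
print(nonzero)                                    # 1: the rank of the covariance matrix
```

If the points were confined to a plane instead, the same count would come out as 2.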
CHAPTER 5

5.1. The segments of the minimum spanning tree are:

[The list of segments, with their lengths in the order found, and a diagram of the minimum spanning tree.]

5.2. Successive divisions give the following classes:

    …
    4: (A) (B) (C) (D) (E) (F).

Notice that A and B are separated at the second step even though they
still form a close pair in the ordination pattern, which is almost
indistinguishable from the pattern in Figure 5.4a. This "unnatural"
separation occurs because the second principal axis passes between A
and B.

5.3. Successive divisions give the following classes:

    …

Observe that the close pair (A, B) has not been divided.
CHAPTER 6

6.1. The points of H1X form a straight line, hence form a figure of one
dimension. The points of H2X (and also of H3X) are confined to a
plane, and hence form a figure of two dimensions. This leads to the
conjecture that a matrix can be inverted only if it brings about no
reduction in the dimensionality of a swarm of points when it is used to
transform the swarm. Proof of the correctness of this conjecture, and
methods for judging whether a given matrix can be inverted, are
beyond the scope of this book. See, for example, Searle (1966),
Chapter 4, or Tatsuoka (1971), Chapter 5. Matrices that can, and
cannot, be inverted are called, respectively, "nonsingular" and "singu-
lar."
6.2. The equations are
Catenation. An ordination designed to show, as clearly as possible, the
structure of nonlinear data.
Centered data. Data in which the observations are expressed as deviations
from their mean value. Hence their sum is zero.
Centroid. The center of gravity (or "average point") of a swarm (or
cluster) of points in a space of any number of dimensions. The
coordinate of the centroid on each axis is the mean of the coordinates
on that axis of all the points in the swarm (or cluster).
Centroid clustering. A clustering technique in which the distance (or
dissimilarity) between two clusters is set equal to the distance (or
dissimilarity) between their centroids.
Centroid distance. See average linkage clustering criterion.
Chaining. In a clustering process, the tendency for one cluster to grow by
the repeated addition, one at a time, of single points.
Characteristic values or roots. Same as eigenvalues. See eigenanalysis.
Characteristic vector. Same as eigenvector. See eigenanalysis.
Chord distance. The shortest (straight line) distance between two points on
the same circle, sphere, or hypersphere.
City-block distance, CD. The distance between two points, in a coordinate
frame of any number of dimensions, measured as the sum of segments
parallel with the axes. For points j and k,

    CD = Σ (i = 1 to s) |xij - xik|.

Clustering. The process of classifying data points by combining similar
points to form small classes, then combining small classes into larger
classes, and so on. Same as agglomerative classification. Contrast divi-
sive classification.
Column vector. A matrix with only one column. Equivalently, a vector
written as a vertical column.

…

    S = 200 [ Σ (i = 1 to s) min(xij, xik) ] / [ Σ (i = 1 to s) (xij + xik) ] percent

…

    RI = 100 Σ (i = 1 to s) [ min(xij, xik) / max(xij, xik) ] percent
Tree diagram. Same as dendrogram.
Triangle inequality axiom. The axiom that the distance between any two
points A and B cannot exceed the sum of the distances from each of
them to a third point C; that is, d(A, B) ≤ d(A, C) + d(B, C).
Ultrametric distance measures. Those that cannot in any circumstances
cause a reversal in a clustering process.
Unipolar axis. An ordination axis on which the data points all have scores
of the same sign (all positive or all negative). Contrast bipolar axis.
Unsymmetric matrix. See symmetric matrix.
Unweighted average distance. Same as average distance. See average-lin-
kage clustering criteria.
Variance. The mean of the squared deviations, from their mean value, of a
set of observations.
Variance-covariance matrix. Same as covariance matrix.
Vector. A row vector or column vector. In some contexts, the n (say)
elements of a vector constitute the coordinates of a point in n-dimen-
sional space.
Weighted average distance. See average-linkage clustering criteria.
Weighted centroid distance. Same as median distance. See average-linkage
clustering criteria.
Weighted and unweighted clustering methods. Weighted methods treat
clusters as of equal weight irrespective of their numbers of points.
Unweighted methods treat data points as of equal weight so that the
weight of a cluster is proportional to its number of points.
Within-axes heterogeneity. The heterogeneity exhibited by a data swarm
consisting of two or more subswarms, when all subswarms occupy the
same many-dimensional space. Contrast between-axes heterogeneity.
Within-cluster dispersion of a cluster. The sum of squares of the distances
from every point in the cluster to the cluster's centroid.
Bibliography

Anderberg, M. R. (1973). Cluster Analysis for Applications. Academic Press, New
York.
Carleton, T. J., and P. F. Maycock (1980). Vegetation of the boreal forests south of
James Bay: non-centered component analysis of the vascular flora. Ecology 61:
1199-1212.
Delaney, M. J., and M. J. R. Healy (1966). Variation in the white-toothed shrews
(Crocidura spp.) in the British Isles. Proc. Roy. Soc. B 164: 63-74.
Gauch, H. G., Jr. (1979). COMPCLUS - A FORTRAN program for rapid initial
clustering of large data sets. Cornell University, Ithaca, N.Y.
Gauch, H. G. (1980). Rapid initial clustering of large data sets. Vegetatio 42:
103-111.
Gauch, H. G., Jr. (1982a). Multivariate Analysis in Community Ecology. Cambridge
University Press.
Gauch, H. G., Jr. (1982b). Noise reduction by eigenvector ordinations. Ecology 63:
1643-1649.
Gauch, H. G., Jr., and R. H. Whittaker (1972). Coenocline simulation. Ecology 53:
446-451.
Gauch, H. G., Jr., and R. H. Whittaker (1981). Hierarchical classification of
community data. J. Ecol. 69: 135-152.
Goodall, D. W. (1978a). Sample similarity and species correlation. In "Ordination
of Plant Communities" (R. H. Whittaker, Ed.), W. Junk, The Hague, pp.
99-149.
Goodall, D. W. (1978b). Numerical classification. In "Classification of Plant Com-
munities"