You are on page 1of 265

The lnterpretation of

Ecological Data
The lnterpretation of
Ecological Data
A Primer on Classification and Ordination

E. C. Pielou
University of Lethbridge

A Wiley-lnterscience Publication
JOHN WILEY & SONS
New York • Chichester. Brisbane. Taranta. Singapore
Preface

Thethaim
d of this
d b book is to .give a fu 11' d etailed,
. . introductory account of the
me lfo s. use fi /d cdommuruty ecologists to make large, unwieldy masses of
mu ivanat~ e ata comprehensible and interpretable. 1 am convinced
that there is need for such a book. There are more advanced books that
. . of the same material ' for example, L . or l'oc1., s M u¡twarzate
cover sorne · ·
Analysls zn Vegetation Research (W. Junk 1978) and A D G d '
Cl ifi · ' . . or on s
assz catwn. Methods for the Exploratory Analysis of Multiuariate Data
(Chapman and Hall, 1981), but they assume a much higher level of
mathematical ability in the reader than <loes this book. There are also more
general discussions of the material, as in H. G. Gauch's Multiuariate
Analysis in Community Ecology (Cambridge University Press, 1982), or
many of the chapters in the two volumes edited by R. H. Whittaker,
Ordination o/ Plant Communities and Classification o/ Plant Communities
(W. Junk, 1978), but .they are more concemed with general principles, and
with comparing the merits of different methods, than with explanations of
the actual details of individual techniques.
· Such explanations are sorely needed. Through this century, ecologists
have used a series of aids to assist them in number-crunching, from tables of
logarithms, through mechanical, then electrical, and then electronic desk
calculators, to programmable computers. There has been no need for
ecologists to understand how these aids work. But over the past decade, a
new supply of "crutches" has appeared on the scene, namely, packaged
computer programs. These enable ecologists to carry out elaborate analyses
of their data without having to write their own programs. The packaged
programs are often long and intricate, the work of computer expe~ts. It.
would be {inreasonable to demand that ecologists refrain from turnmg to
these experts for help.
vii
viii
to expect ecologists to understand what
. . t unreasonable d
However, it is no_ h en if they do not understan how. There
do1ng for t ero ev
the programs are the person who uses a ready-made progra:rn.
f difference between .
is a world o d . nvectors of a large matnx and who under.
· nvalues an eige
to find the eige . d the person who delegates the whole task of
h t these things are, an
stands w ~ . 1 ent analysis (for instance) to such a program With
d . a pnncipa compon .
omg d. f what the analysis <loes. Packaged programs are a Il11Xed
no understan mg o . b d. fd .
· Whil they make it possible to analyze large o 1es o ata qmckly,
blessmg. e · 1 · li ·
1 d · a way that best reveals their ecologica 1mp cat10ns, they
accurate y, an m .
also make it possible for inadequately tramed people to go through the
motions of data analysis uncomprehendingly.
This book is designed to help those who want to gain a complete
understanding of the most popular techniques for analyzing multivariate
data. It <loes not offer any computer programs. Instead, it demon strates and
explains the techniques using artificial data simple enough for all the steps
in an analysis to be followed in detail from start to finish. The prerequisites
are a knowledge of elementary algebra and coordinate geometry at about
the first year undergraduate level. To make the book useful for self-instruc-
tion, exercises have been given at the ends of the chapters. The answers to
them, and a comprehensive glossary, are at the end of the book.
1 have written this book while holding an Alberta Oil Sands Technology
a~d Research _Authority (AOSTRA) Research Professorship at the Univer-
s~ty of Lethbndge. I am greatly indebted to the Authority, and the Univer-
sity., whose support has made the work possible. I also thank William
Sllllenk of Lethbridge, who prepared all the figures.

E. C. PIELOU
Lethbridge, A/berta, Canada
April 1984
Contents

1 INTRODUCTION
1
1.1 Data Matrices and Scatter Diagrams, 3
1.2 S~me Definitions. and Other Preliminaries, 8
1.3 A1m and Scope of This Book 11
1

2 CLASSIFICATION BY CLUSTERING 13

2.1 lntroduction, 13
2.2 Nearest-Neighbor Clustering, 15
2.3 Farthest-Neighbor Clustering, 22
2.4 Centroid Clustering, 25
2.5 Mínimum Variance Clustering, 32
2.6 Dissimilarity Measures and Distances, 40
2.7 Average Linkage Clustering, 63
2.8 Choosing Among Clustering Methods, 72
2.9 Rapid Nonhierarchical Clustering, 76

Appendix: Apollonius's Theorem, 78

Exercises, 79

83
3 TRANSFORMING DATA MATRICES

3.1 lntroduction, 83
3.2 Vector and Matrix Multiplication, 85 ix
X

f Data Matrix and its Transpose, 102


3.3 The Product o a f a Square
The Eigenvalues and Eigenvectors o
3.4 6
Symmetric Matrix, 11
The Eigenanalysis of XX' and X 'X, 126
3.5
Exercises, 129

4 ORDINATION 133
4.1 lntroduction, 133
4.2 Principal Component Analysis, 136
4.3 Four Different Versions of PCA, 152
4.4 Principal Coordinate Analysis, 165
4.5 Reciproca! Averaging or Correspondence Analysis, 176
4.6 Linear and Nonlinear Data Structures, 188
4.7 Comparisons and Con el usions, 197

Exercises, 199

5 DIVISIVE CLASSIFICATION 203


5.1 lntroduction, 203
5.2 Constructing and Partitioning a Mínimum
Spanning Tree, 205
5.3 Partitioning a PCA Ordination, 211
5.4 Partitioning RA and DCA Ordinations, 218
Exercises, 221

6 DISCRIMINANT ORDINATION 223


6.1 lntroduction, 223
6.2 Unsymmetric Square Matrices 224
6.3 Discriminant Ordination of S~~eral Sets of Data 230
Exercises, 237 '
coNTENTS
xi
ANSWERS TO EXERCISES
239
GLOSSARY
247
BIBLIOGRAPHY
257
INDEX
261
The lnterpretation of
Ecological Data
Chapter One

lntroduction

Probably
. ali. ecologists
. are familiar with field noteb ooks whose pages look
so~et~g like Fi~ure l. l. Probably most ecologists, even those still at the
begmnmgs of the~ careers, have drawn up tables like it. Their efforts may
b~ neater or .mess1er, dep~nding on the person and the circumstances (wind,
ram, mosqmtoes, gathenng darkness, a rising tide, or any other of the
stresses an ecologist is subject to). But such tables are ali the same in
principie. They show the values of each of several variables (e.g., species
quantities) in each of severa! sampling units (e.g., quadrats). Tables such as
these are the immediate raw material for community study and analysis.
Although natural, living communities as they are found in the field are,
of course, an ecologist's ultimate raw material, it is impossible to come to
grips with them mentally without first representing them symbolically. A
table such as that in Figure 1.1, which is part of a data matrix, * is a typical
symbolic representation of a natural community. 1t is the very first represen-
tation from which all subsequent analyses, and their representations, flow.
Therefore it is the first link in the chain leading from an actual, observed
community ' to a theory concerning the community, and possibly to more
inclusive theories concerning ecological communities generally.
The interpretation of such data matrices is the topic of this book. This
introductory chapter first describes in general outline the p~oc~dures that
make data interpretation possible. Then, as a necessary preliminary to ali

*Words italicized in the text are defined in the Glossary at the end of the book.

1
f t,f-C, {JU- ~ ' c: ?oo "" ~ ~ '4...o,_._11;;
(P. .t 1-~1-t ~ <;1-e.ep~ R. ; Jf-0 .h-1 lv..!~j

~
~ /{-:W

Q~
.,. ,, #:12-. 13 /'f /':,- lb I /t ~

Jr' J_ 7 lo 13 1
II s t8 '1 f1 ~
I 8 t. 'l ~ 3 £" y
-- ;l.
I 11/f
,~
I I
I /
/
/
1

c, fl 2 I "") "J
2-
,,_I 3 ai 2 1 /
JS- 'O / J
,3 :i. /o fo 16 9
I q lb <R., ~ 19
6 0 :i. /O 'f 1 1~
~ I 3 / 1

• 4

I
Scanned by CamScanner

I I

I 3
I I
( I

't /
I (

11.. : /l ; /'l : /":> ;

-latd
111aw.1llllltebook. This one records observations on the ground
lD the floodplain of the Athabasca River, Alberta.
oATA MATRICES ANO SCATTER DIACRAMS

that follows, is a section on termino} . .


b . f ogy. As is m . t bl .
growing su ~ect, a ew of the technical t ev1 a e lil a rapidly
. erms are used . d.
·trerent
dl wnters. Therefore it is Ilece m lfferent senses by
. ' ssary to den .
senses in which they are used in this book . lne,. unamb1guously, the
' as is done m Section 1.2.
\

1.1. DATA MATRICES AND SCATTER DIAGRAMS


A data matrix in the most general sense of th t .
· d e erm is any table of
observat10ns ma e up of rows and column Th d .
. s. e ata matnces most
commonly encountered m community ecology t bl h ·
. . are a es s owmg the
amounts of severa!
. spec1es m
. each of a number of samp ling uruts. · Thus
there are obVIously two poss1ble ways of constructing a data matrix: either
one may let each row represent a different species and each column a
different sampling unit (as is done throughout this book), or vice versa. The
method used here is the one favored by the majority of ecologists.
Any matrix (and that includes data matrices) can be denoted by a single
symbol that represents, by itself, the whole array of numbers making up the
table. A symbol representing a matrix is usually printed in boldface. If the
matrix has s rows and n columns, it is described as an s X n matrix or,
equivalently, as a matrix of order s X n. The symbols s and n are used
throughout this book to denote the orders of data matrices for mnemonic
reasons: s stands for s pecies and n for n umber of sampling units. In
specifying the order, or size, of a matrix, the number of rows is always
written first and the number of columns second.
Now consider the symbolic representation of a 3 X 4 data matrix, say X,
using subscripted x s in place of actual numerical values. Then

X = ( ~:: ~:~ ~:: ~:: ) .


X31 X32 X33 X 34

t . that is every individual term,


As may be seen, every element 0 ~ the m:c~, the po~ition of the element in
has two subscripts. These subscnpts sp Yb f the row and the second
th . . es the num er o '
e matrix: the first subscn~t giv . e element occurs. For example, X24
the number of the column m which th h mn of x· in general,
w and fourt co1u ' .
denotes the elemen t in the secon d ro d .th column. This rule is
· the ith row an 1 ar lt
one writes x .. for the element m . . where matrices appe ·
adhered to uciversally in all mathematical wntmg
tli r ¡ l 1i
.0 11111111 , say, i..: onslitulcs a hst of lhe
11
111 11 d11:1 111.1t • (if a spccics is ahsent frorn
¡1
1 111111. ti 1 '' 11111' 11111' ti 11 . f thc
1iill1 ''' 11 lllll ·· 111 llH I ·• 1 . 1111 row is alisto thequantitic
, l ll q ti' 1 ) 1 1I t V1St ' I l l S Of
11 1tll\'l. 'l1 lll '
1111 1 if,\ '"'"' 1111 l 111111 .•i. . h .
1 th · s 11111 p 111nh le 111 O 1 l 1•I. 1•.1 mtcrprctat1on, t . e subJect
1 ,111 of tk
11 \1, I 11 ' " 'lll
1111
Vl 111 '"' ,., t tu l H 1 H' ll . . 111 . latcnl structure m a raw'' data
IVl
11 ¡~. ldn1 11 t.isv
0

¡,, ,, 1k '1• 1 i 11 11ht• 11 ·ll I , <>• cvcn to judge .


whether it has any
· . 1 1s '""'P' ' ' • W l 111 ' illl •.111 y 'systematac pattern that wout.1
1 l 1 1
1\\.1 1 '""I
. . i•roups of species tended to ""~-
111 111
. 111 1\ ti11l lh 's t " '
1111pk l 1t.1111 l"'I -~
11 "11 ·. 111· 1h.11. ,,,. ' ¡ •
1 , whcn appropriately arranged, WoUld
t 1 s·1111pl111¡• 11111 s, ..
1,, .l th ' · 11 111. 1 ll ·:
1
, , • ín Lhcir spccies compos1ttons.
' 1 • 1 llll(lllllOllS lt ll( 1 }Q d '
• t11h1t .1":"11" ·' . .. 1• . thc following 10 X atamatnx.It
:.. ,111 :11 lil 1ri:l l t , :1111pk, OllSI< CI

1 • 1 .m" n1.1tii .
() () 3 o 1 1 4 3
() () () o o 4 o 1 2
() 4 1 o 2 o 1 o o
~ o o 4 1 o 2 3 2
1 3 4 o 3 o 2 o o
4 o l 3 2 o 3 2 1
2 2 3 l 4 o 3 o o
l o o 2 o 2 o 3 4
o o o 1 o 3 o 2 3
()
4 3 o l 2 2 3 o 1
,,, .tlwa vs. thl: rnws rcprcscnt species and the columns represent s
uni ts.
undcniably, lacks any evident structure. But now
l'h1 s 111ati i: .
1h l sa111plmg units (rnlumns) and the species (rows) were rearr
1pptllp11at • fashion . Thc result is the "arranged matrix"

4 3 2 1 o o o o o
3 4 3 2 1 o o o o
2 3 4 3 2 1 o o o
1
2 3 4 3 2 1 o o
o 1 2 3 4 3 2 1 o
o o 1 2 3 4 3 2 1
o o o 1 2 3 4 3 2
o o o o 1 2 3 4 3
u o o o o 1 2 3 4
', . o o o o o. o
1h1s mnt1' . l. ontains xartly the s .
1 2 3
th ord rings of th. . .· ame infonnation as th
spt:ltcs, and of the sampling unit
DATA MATRICES ANO SCATIER DIAGRAMS
5

for instance, the. species


. labeled # 1 (therefo re, appeanng . m . the first row)
in the raw matnx IS labeled
. # 4 ( and ' therefo .
re, appears m the fourth row)
in the arranged matnx.
. .As to the colurnns ' samPling urut . # 1, m. column 1
of the ra~ matnx, for mstanc~, has been relabeled as # 2 and laced in
column 2 m the arranged matnx. P
The
. method by . which
. the labeling
. system th t d
a pro uces the arranged
matnx
. was
. determmed Is descnbed in Chapter 4 and th ese two matnces,
·
w1th the1r rows and columns labeled, are given again in Table 4.ll. lt
su~ces to remark here t~at the method is reasonably sophisticated. To
denve the a~~ange~ m~'tnx from the. raw version without a prescribed
method (an a~g?~Ithm ) would be time-consuming and rather like the
efforts of a norurutiate to solve a Rubik cube.
The preceding arranged matrix X is a clear example of a matrix with
"structure." Data interpretation, as the term is used in this book, consists of
methods of perceiving the structure in real data matrices even though these
matrices, in raw form, may be as seemingly unstructured as the raw version
of X. Table (or matrix) arrangement, as demonstrated here, is only one such
method. Two other techniques, classification and ordination, do more to
reveal the structure of a data matrix than <loes simple table arrangement,
and descriptions of these techniques form the bulk of this book. Only a few,
short introductory comments on these topics are necessary here.
Little need be said about classification. The word, as used in community
ecology, has its ordinary everyday meaning (that of grouping similar things
together in to classes) and requires no special definition. Obviously, one can
classify sampling units, uniting into classes those units that resemble one
another in species composition; or one can classify species, uniting into
classes those species that tend to occur together .. More will be said later
about these contrasted modes of classification.
Again, obviously, there are two contrasted ways in which one can go
about the task of classifying data. lf sampling units are to be classified, for
example, one can either start with the individual units and combine them
into small classes which are then themselves combined (and so on) thus
successively forming ever larger classes; this is agglomeratiue classi~cation
(or clustering) and is the topic of Chapter 2. Or else one can st~~ WI~h. the
whole set of sampling units as the first, all-inclu~ive class _and dlVlde It mto
smaller classes· this is divisive classification and IS the topic of Chapter 5.
We come n~w to ordination, a term that describes a whole battery of
· · · · 1 ense the
techniques that are useful in commuruty ecology. In Its ongma s '
Word means the same as "ordering." Consider, first, the data ª plant
INTRODUCTION
6

· a row of samPle plots ( quadrats) along


b servmg
1
eco og1s . t might collect .by o f mple up th e s1·ae of a mountain or across
vironmental grad1ent, or exa . th ordering of the quadrats is
an en In tbis case e h
al tmarsh from land to sea.
as . need for an "ordination." Now suppose t. e
· en in advance and there is no . in level mixed fores t w1th
giv h b ceous vegetat10n ' . h
ecologist samples the er a that the enwonment, t ough
d t and suppose a1so Id b
randomly scattered qua ra s, . . d fi .te gradients. There wou e no
exhibits no e m
moderately heterogeneous, d ts but it might be reasonable
. . to order the qua ra ' . .
immediately obv10us way . d "f nly one could discover it. The
1 d " existe 1 o
to suppose that a "natura or er h vironment of the forest consists
supposition amounts to assuming that t e en ily with clear boundaries
a·~ habitats (not necessar
of a mosaic of I erem d 1h t these different habitats have, them-
between the mosaic patches) an
1 t al order in the same way
ª that the successive habitats up a
.. .
se ves,taina naoruracross a saltmarsh , h ave a natural order. If this . suppos1t10n
. is.
moun h' t hnique can presumably be devised for d1scovenng this
correct, t en a ec hni · Id an ordina
natural order from the vegetation data. Such a tec que y1e s -
tion. ·· f
The preceding paragraph describes the airo of the first practlt10~ers o
ecological ordination. Now let us approach the t~pic _from a different
starting point. To malee the discussion concrete, V1Sualize the data that
would be amassed by a forest ecologist estimating the biomasses of the trees
of several different species in a number of sampling plots in mixed forest. It
is required to ordinate the plots. An obvious way of doing this would be to
rank them according to the quantity they contained of the most abundant
species. But why stop at one ordination? They could also be ranked
according to the quantity they contained of the second most abundant
species. An easy way to obtain these rankings would be to draw a scatter
diagram in which the quantity of the first species is measured along the
x-axis, and of the second species along the y-axis; each sample pl9t is
represented by a point. Clearly, if the points were projected onto the x-axis,
their order would correspond with that of the first ordination; likewise, if
they were projected onto the y-axis, their order would correspond with that
of the second ordination. But a clearer picture of community structure
would be g¡ven by the scatter diagram itself; nothing is gained by consider-
mg. the_ two_ axes separately. The scatter diagram can be considered as an
ordinat10n m two dunensio . ·1 · · .
rather than merely a list . of plot
ns, ilabels.
is a p1ctonal representation of the data,
I t should now be appare 1 h
dim ens1onal
. h argument is leading · If a two-
scatter diagram n(sh w ·ere the
owmg t e amounts in the plots of the two
DATA MATRICES AND SCAITER DIAGRAMS
7

rnost important species) is good ' then a three-d1me . · 1 ·


showing the three most important s . . ns10na scatter diagram
( pec1es) is e t 1·nl
harder to construct-it would have to b
.
d . er
e ma e with ·
ª Y better. Though
f .
stuck in corkboard-1t would contain . pms o vanous lengths
. . more mformaf B
thfee spec1es? Every time a species that . wn. ut why stop at
. . is present 1·s d' .
amount of mformation is sacrificed Th f isregarded, a certam
. . · ere ore the b t .
would be one d1splaymg all the data on all '. . es scatter diagram
' s spec1es m th f
of how large s may be. The only difficult . h e orest, regardless
. . . Y is t at a scatter d. ·
than three dlIDens1ons
.
is impossible even t . liz
o v1sua e let al
iagram m more
One is left with a "conceptual scatter diag ,, ' . one construct.
investigate. ram, an unsatisfactory object to
Ecological ordination
. as it is now practiced ' however, <loes start from
conceptual scatter
. .diagrams. The various techni ques amount to mappmg .
these.many-dlffiens10nal
. .pattems in two dimensions (or, occas10na
· lly, three)
In this way
.. the unvisualizable conceptual patterns can b e rou gh t b ack mto
b · ·
the real vlS1ble world to be looked at and studied.
The ?roce~s is analogous to t~at of drawing two-dimensional maps of the
three-dimens10nal globe. The d1fference is that when a geographer draws a
?f
map the world on paper, he can easily consult a three-dimensional globe
showmg the true pattem of land and sea; the reduction in dimensionality is
only from three to two. By contrast, an ecologist cannot see, or even
visualize, the many-dimensional pattem that has to be mapped, and the
required reduction in dimensionality is usually many times greater. All the
same, the analogy between geographic mapping and ecological ordination is
close, and it is also instructive. It shows that there are many possible
techniques for mapping a many-dimensional scatter diagram in two dimen-
sions, and that no one of them is automatically better than all the others. In
the same way that different map projections are suited to different purposes,
so different ordination techniques emphasize different aspects of the ecologi-
cal data being examined. Unfortunately, however, the matching of tech-
nique to purpose is far less clear-cut in ecology than in geography; the
purposes are not so well defined and methods for achieving them are not at
ali obvious.
We now tum to practica! considerations. To arrive at a two-dimensional
ordination of a many-dimensional scatter diagram, one h~s to operate on
the given data namely an s X n data matrix (recall that s is the number of
· ' ' · b h ght of as
species and n the number of sampling uruts). The data may et ou .
01 · · · tt d' ram plotted m a
b"Vlng the coordinates of the pomts m a sea er iag .
COordinate frame with s mutually perpendicular axes (one for each spec1es),
INTRODU( fl(J~

o Jing unit). The coord 1


ach sarnP .
. f data points (one for e 1 rnents in the J th co1umn c,f
· ung o n . by the e e . " ,,
and cons1s . t say are given . ts form.Ing a swarm or
h ·th pom , ' . f d ta potn ' .
nates of t el_ Th whole collectton o ª ny ordination techmque~
the data m~trIX. lel d a data swarm. All the frna pping an s-dimensjonaJ
" d ,, w1ll be ca e . ays o rna . .
clou ' . e amount to d1fferent w r What is obtamed is an
that ecolog1sts us heet of two-dimensional pape ~st widely used method~
data swarm on a s . h best-known, m . 1
. . f mpling uruts. T e ssary mathematica pre-
ordmation o sa 4 f llowing sorne nece
are described in Chapter ' o
Iiminaries in Chapter 3. . d that if it is Iegitimate to treat a bod~ of
No doubt the reader has not1c~ d. i·onal space (s-space) as JUst
. . ts ID s- imens . .
data as eqmvalent to n pom . . t treat it as s pomts m n-space.
described then it is equally leg1t1mate rº h column) of the data matrix
'
When this is done, eac row
h e 1nstead o eac
. Th ares points altogether and they
· f data pomt. ere .
gives the coordmates ª 0
. h mutually perpendicular axes, one
. d·nate
1 frame w1t n .
are plotted m a coor _ . t resents a species. Ordinat10n of the
for each sampling umt. Each pom . rep_ .
data swarm therefore, gives an ordmat10n of spec1es. . .
. ' .
An ordmauon of samp mg u
1· nits is known as
. . an R-type
. . ordznatwn ,
whereas an ordination of speci~s_i_La Q-type ordznation. Smrilarly, there are
R-type and Q-type classifications, but these terms are sel_don:1 used. . .
The techniques for performing R-type and Q-type ordmat10ns are 1dent1-
cal, and at first thought the two types of analyses seem equally legitimate.
Q-type analyses have one great drawback, however. If one plans to carry out
any statistical tests on the data, it is essential that the "objects" sampled be
independent of one another. Community sampling is nearly always done in
a way that ensures the mutual independence of the sampling units, and the
sampling units are the objects in an R-type ordination. But the species in
the sampling units are almost certainly not independent and it is the species
that. are the objects in a Q-type ordination. Statistical hypothesis testing is
outside the scope of this book, and we do not have occasion to consider the
randomness and independence of sampling units again. However, the con-
trast between R and Q-ty 1 f . . .
pe ana yses rom the statistical point of y¡eW
should be kept in mind.

1.2.
SOME DEFINITIONSANO OTHER PRELIMINARIES
Before proceeding it is most im .
book of two terms much u d . portant to give the definitions used in this
se m community ecology: sample and clustering.
SOME DEFINITIONS ANO OTHER PRELIM INARIES
9

The word sample is a source of


enormous co f .
most unfortunate. To illustrate· 1·m . n us10n in ecology which 1·s
. · agme that '
observat10ns on a number of quad t *
. h
ª plant ecologist has
ra s. To a st r ..
made
ecolog1sts, eac quadrat is a samnf" . a istician, and to many
. r mg unlt and th
quadrats.1s a sample. To other ecologists ( e whole collection of
quadrat 1s a "sample" and the whole lel.g.,. Gauch, 1982) each individual
co ection of d .
set." The muddle can be shown most el . qua rats is a "sample
1
words in the four cells are the name ~ar y m 2 X 2 table in which the
ª
s g1ven to the b · .
row labels by the people specified in th
eco1umn labels. Thus:
°
~ects spec1fied in the

Statisticians and
Sorne Ecologists Other Ecologists
Single unit (e.g., quadrat) Sampling unit Sample
Collection of uni ts Sample Sample set

!his boo~ uses the terms in the left-hand column. Neither terminology is
entlfely satisfactory, however, because it is a nuisance to have to use a
two-word term (either sampling unit or sample set) to denote a single entity.
Therefore, in this book 1 have used the word quadrat to mean a sampling
unit of any kind, and have occasionally interpolated the additional words
"or sampling unit" as a reminder. This is a convenient solution to the
problem but it remains to be seen whether it will satisfy ecologists whose
sampling units are emphatically not quadrats: for example, students of river
fauna who collect their material with Surber samplers; palynologists, whose
sampling unit is a sediment core; planktonologists whose sampling unit is
tbe catch in a sampling net; entomologists, whose sampling unit. is the .c~tch
in a sweep net or a light trap; diatom specialists, whose sampling umt is.ª
microscope slide; foresters, whose sampling unit is ~ plot or stand ~h~t is
much larger than a traditional quadrat although, like a quadrat, it is ª
delimited area of ground.
.. . t A G Tansley and T. F. Chipp, in their
*The term quadrat is surely fanuhar to all ecologi~ s. (B .f· h Empire Vegetation Committee,
class1c Aims and Methods in the Study 01 Vegetatzon n ~Is ermanently marked off as a
192 " · 1 are area temperan Y or P .
6), define a quadrat as s1mp .Y a squ ,, A ore modero definition would omit
1
sample of any vegetation it is desired to study e os~ y. al m that although the definition says
1
the word "square"; a quadrat can be any shape. ?te h soght ~f as smaller than a "plot."
nothi ng about size . . d. ge a quadrat is t ou 1· . t the
m or mary usa . f drat nor Iower lffilt 0
H ' limit to the size o a qua •
. owever, there is no agreed upon upper bl be called by either name.
1,ize of a plot; sorne" marked off areas" could reasona Y
1N TRO DUCT1o~

10
. d h word sample because of its
as possible 1 have avoide . t. e sed in the statistical sense to
As far . ear i t is u
. 't but where it does app '
amb1gw y, . f d ats · ·
a whole collect10n o qua r · . Some ecologists treat lt as
mean . al amb1guous.
The word clustering
.
is so ifi t. Other ecologists
( e class1 ca wn · .
treat it as
synonymous with agglo.mer~ IV . h general sense, including both ag.
synonymous with clas~i~c~twn I~o~se Schematically, the two possibilities
gl~merative and and divISive met .
are as follows:

Classification
Classification ( = clustering)
Agglornerative Divisive Agglomerative Divisive
(= clustering)

This book uses the scherne shown on the left.


It should also be remarked here that the word cluster is ambiguous too.
Consider a two-dimensional swarm of data points. Sorne writers would
apply the word cluster only to a "natural" (in other words, conspicuous)
cluster of points, that is, a group of points that are simultaneously close to
one another and far from all the rest. Other writers use the word cluster to
mean any group of points assigned to the same class when a classification is
done, no matter how arbitrary its boundaries. The word is not used in this
book; hence there is no need to make a choice between these definitions.
A note on symbols should make subsequent chapters easier to follow. It
was remarked that the term in the ith row and jth column of a data matrix
~also known as the (i, j)th term, oras X¡¡] denotes the quantity of species i
m q.uadrat J. Throughout this book, the symbol i always represents a
spec1es and the syrnbol j a quadrat.
Matrices that are not data matrices are encountered in the book Sorne of
these show the relationship b t . . ·
two d iuerent
·&r e ween parrs of spec1es necessitating the use of
symbols to repr t d'
symbols are h d . esen two ifferent species. In such cases the
an 1.

Likewise, sorne matrices show the rel . . .


necessitating the use of tw diIB ationship between parrs of quadrats,
quadrats. In such cases th: berent sy~bols to represent the two different
T . sym o1s are J and k
bis convention should be recall d . .
countered. It is used in all b t f e whenever paued subscripts are en·
u a ew special contexts that are explained as
¡\IM ANO SCOPE OF THIS BOOK
I
11
they arise. Thus a term such
species i in
. quadrat
. j. A ter as x,J, With i and 1.
m such as . as subscnpt .
1
to a relat10nship between th Yhn With h and . s, re ates to
e two s · l as subs ·
with j and k as subscripts pecies h and i A d cnpts, relates
k ' re1ates to a relationshi. bn a term such as z1 k
· P etween d '
1t has been remarked sev al . . qua rats j and
er tunes hi
data mat~ denotes the "amount" ~ t s ~hapter that each ele
The way m which the quantity of a or ~uantity" or a species in ament of a
course on the kind of . spec1es should be quadrat.
, . orgarusm concemed measured depends of
For most ammals, for most 1 · '
· P ankton 0 ·
poll en grams, and for seedling pla t f rgarusms (plant or animal) f
. di 'd al . n s o roughl ' or
m VI . u s 1s the simplest measure of ua Y . equal size' the numb er of
organisms as mat-forming plant q ntity. The amounts of su h
s, and sorne col . 1 e
bryozoans, are often best measu d orna corals, sponges and
· · · re as percen tag " ,, '
spec1es m which, though the m·d'1 'd . e cover. The amounts of
VI ua1s are d1 ( h
in size (such as the trees in uneven-a ed f s mct, t ey are very unequal
biomass of the individuals present . tgh orest) are best measured by the
In all . m e quadrat.
commumty studies it is imporiant to decid
measure species quantities and the t k e upon the best way to
n o ma e the mea
However, these matters are outside the . f _surements carefully.
referred to again. purVIew o this book and are not

1.3. AIM ANO SCOPE OF THIS BOOK

The material covered in this book is listed in the Table of Contents. The aim
of the book (to paraphrase what has already been said in the Preface) is to
explain fully and in detail, at an elementary level, exactly how the tech-
niques described actually work.
Packaged computer programs are readily available that can perform ali
these analyses quickly and painlessly. Too often users of these ready-made
programs do no more than enter their data, select a few options, and accept
the results in the printout with no comprehension of how the results were
derived. But unless one understands a technique, one cannot intelligently
judge the results.
Anyone who uses a ready-made program, for instance, to do a pr~cipal
component analysis, should be capable of doing the identical analysts of .ª
small, manageable, artificial data matrix entirely with a desk calculator, or ú
INTRODU CTIO~

12

en with programs written by oneself. N obody can claim to


on a cornpu t er, th
understand a technique completely who 1s not capabl~ of domg this. .
. . .
It will be noticed that the book contains no ment10n of such top1cs as
sampling errors, confidence intervals, and hypothesis tests. This is because
the procedures described are treated as techniques for interpreting bodies 0¡
data that are interesting in their own right, and are not regarded merely as
samples from sorne larger population. The techniques can safely be applied,
as here described, as long as one realizes that what they reveal is the
structure of the data actually in hand. A large body of data can certainly, by
itself, provide rewarding ecological insights. But if it is intended to infer the
structure of sorne parent population of which the data in hand are regarded
only as1 a . sample, then statistical matters do have to be considered . For
examp e: it would probably be necessary, at the outset, to transform the
observat10~s so as to make their distribution multivariate normal. One is
then. entenng the field of multivariate statistical analysis which is h 11
outs1de the scope of this book. ' w o Y
The distinction just made between in .
multidimensional data swarm' d t~rpretmg the patterns of given
s, an analyzmg m lf · . .
deserves emphasis· it is too oft bl d u ivanate statlstlcal data,
, en urre The t b'
and mastery of the first i·s . wo su ~ects are quite distinct
a necessary pre · ·
second. Students who wish t reqms1te to appreciating the
1. . o go on from the .
mu tlvanate statistical analys· . present book mto the study of
Morrison (1976 ) and Tatsuokais(l 97l)w111 can
find many b. 00k s to choose from.
be especially recommended.
Chapter Two

Classification by Clustering

2.1. INTRODUCTION

The t~sk describe~ in t~s chapter is that of classifying, by clustering, a


collectlon of sampling umts. In all that follows, the sampling units are called
quadrats for brevity and convenience. As described in Chapter 1, the data
matrix has s rows, representing species, and n columns, representing
quadrats. The ( i, j)th element of the matrix represents the amount of
species i (for i = 1, ... , s) in quadrat j (for j = 1, ... , n ). We wish to
classify the n quadrats by clustering or, as it is also called, by agglomeration.
To begin, each individual quadrat is treated as a cluster with only the one
quadrat as member. As the first step, the two most similar clusters (i.e.,
quadrats) are united to form a two-member cluster. There are now (n - 1)
clusters, one with two members and all the rest still with only one member.
Next, the two most similar of these (n - 1) clusters are united so that the
total number of clusters becomes ( n - 2). The two clusters united may be
single quadrats (one-member clusters), in which case two of the (n - 2)
clusters have two members and the rest one. Or else one of the two clusters
united with another may be the two-member cluster previously formed; in
that case one of the ( n _ 2) clusters has three members an~ the rest ?ne.
Again, the two most similar clusters are united. And agam and agam and
· . ll h · inal quadrats have been
agam. The process continues unt11 a t e n ong
agglomerated into a single ali-inclusive cluster.
13
CLASSIFICATION BY CLUSTER.!
~~
14

Certain decisions need to be made before this process can be carried out
The questions to be answered are:
t. ttow shall the similarity (or its converse, the dissimilarity) between tWo
individual quadrats be measured?
2. How shall the sinúlarity between two clusters be measured when at leas1
one and possibly both clusters have more than one member quadrat?

Both these questions can be answered in numerous ways. First, to answer


question l. Recall (see page 8) that, given n quadrats and s species, the data
can be portrayed, conceptually, as n points (representing the quadrats) in
an s-dimensional coordinate frame. Therefore, one possible way of measur.
ing the di~i~arity between two quadrats is to use the Euclidean distance
in this s-space, between the points representing the quadrats. The coorcti'.
nates of the jth of these n points are (x 11 , x 21 , ... , xs)· This records the
fact ~hat quadrat j co~tai~s. x 11 individuals* of species 1, x 21 individuals of
spec1es 2, ... , and xsJ mdlVlduals of species s.
The distance in s-dimensional space between the j th and k th ·
denoted by d(j, k ), is, therefore, pomts,

d(j,k) = V(x11 - xa)2 +(x21 - x2k)2 + ... +(x SJ. - x sk )2

s
L (x;J - X;k)2.
i=l

This is simply an extension to a space of . .


of Pythagoras's theorem whose t d. s ~imens1ons of the familiar resull
21 ·· wo- imens10n I vers10n
· is ª
· shown in Figure

The Euc~dean distance between the . .


?f the ways m which the dissimil . pomts representmg them is only one
1s the measure we adopt ¡ d.anty of. two qu ªdrats nnght
. be measured. JI
techniq n iscussmg f f
26 ues. 0 ther ways of measurin d. . ~ur . requently used clustering
·· g 1ss1milanty are d.iscussed in Section
.

*lnstead of meas .
ind.ivid · · unng the amou
uals, lt is sometime nt of a species in
areal "cover." s preferable to meas h a _quadrat by counting the numbef ol
ure t e b1omass or, f or many plant spec1es,
. t!Jc
NfAREST·NEIGHBOR CLUSTER! NG
15

::x::22 - - - - - - - - - - - - - - - - 2

N
(/)::X:
w 21
______ I
1
: r
! rx,,
l.)
w 1---::x::l2 - ::X: -....!
1 11 1
(L
(/) 1 1
1 1
1 1
1 1
1 1
1 1
1 1

SPECIES 1
Figure .2.1. .The distance between points 1 and 2, Wl'th coord'mates (x 11 x ) and ( )
respect1vely, is d(l , 2). From Pythagoras's theorem, ' 21 x12, X22 ,

Next for question 2, on how to measure the dissirnilarity (now distance)


between two clusters when each may contain more than one point (i.e.,
quadrat): the different ways in which this can be done are the distinguishing
properties of the first three clustering methods described in the following.

2.2. NEAREST-N EIGHBOR CLUSTERING

In nearest-neighbor clustering, also known as single-linkage clustering, ~he


distance between two clusters is taken to be the distance separatmg
the closest pair of points such that one is in one cluster and the other in the
other (see Figure 2.2).

E · f h d re applied toan
XAMPLE. The following is a demonstration t e proce u . . 10 °
artificiaUy simple data matrix representing the amounts of 2 spec1es m
CLASSIFICATION BY CLUSTERI NG

16

/
/
-. . " "
........

I \
I • \
\" ", ..._ . \1
_.,.., /
------- N
,..,..,.,.----.......

.
F / "-.
f \
( \
1 1
\ I
\ /
"-, -- ----- / /

. f th distance between two clusters. The nearest-neighbor


Figure 2.2.. Two poss1bledi~easures o d ~e farthest-neighbor distance F is the longest distance
distance N is the shortest stance, an
between a ~er of one cluster anda member of the other.

quadrats. With only two species, it is possible to 12lot the data points in a
plane and .o.Ullille the successive clusters in the order in which they are
created .(see Figure 2.3).
Table 2.1 shows the data matrix (Data Matrix # 1) and, bel?w it, the
distance matrix. In the distance matrix the numerical value of, for example,
d(3 , 5), the distance between the third and the fifth poiñ.ts, 'lQP~S in the
(3, 5)th cell, which is the cell where the third row crosses the fifth column. It
is d(3 , 5) = 12.5. Since the distance matrix m7st ol;iously be symmetrical,
only its l!PlleJ: right half is sh9wn.

The smallest distance in the matrix (in bold face tyPe) is d(5 , 8) = 2.2.
Therefore, the first cluster is formed by uniting quiidrats 5 and 8. we call the
cluster (5, 8). '
The distance matrix · " .
d . is now
new istance matrix distance t0 reconstructed. as shown in Table 2 2
· · In thís
[5, 8) are entered in s every pomt from the newly formed cluster
. row 5 and column 5· 8 d ·h
asterisks to show that d ' row an column 8 are filled wit
qua rat 8 no long · Th
distance from [5 8) to . · er exists as a separate entity. e
' any pomt f · . f
d(3 , 5) and d(3 , 8). Since d ' _0! _mstance, to pomt 3, is the lesser 9
35
( ' ) - 12.5 and d(3, 8) = 14.4, the distance
f,.\BLE 2.t.
pATA MATRIX #1. THE QUANTITIES OF 2 SPE
A. UADRATS. CIES IN
10 Q r

-----
---
Quadrat

species 1
species 2
12
30
1

20
18
2 3

28
26
4

11
5
5

22
15
6

8
34
7

13
24
8

20
14
9

39 .
34
10

16
11

B rHEDISTANCEMATRIX (THE ROW AND COLUMN


. QUADRAT NUMBERS) LABELS ARE THE

Quadrat

I í
1 2 3 4 5 6 7 8 9 JO
1 o 14.4 16.5 25.0 18.0 5.7 6.1 17.9 27.3 19.4
2 o 11.3 15.8 3.6 20.0 9.2 4.0 24.8 8.1
3 o 27.0 12.5- 21.5 15.1 14.4 13.6 19.2
4 o 14.9 29.2 19.l 12.7 40.3 7.8;
5 o -23.6 12.7 2.2 25.5 7.2
6 o 11.2 23.3 31.0 24.4
7 o 12.2 27.9 13.3
8 o 27.6 5.0
9 o 32.5
10 o

TABLE2.2. THE RECONSTRUCTED DISTANCE MATRIX AFTER THE


FUSION OF QUADRATS 5 AND 8.

1 2 3 4 [5, ~ ] 6 7 8 9 JO
1 o 14.4 16.5 25.0 17.9 5.7 6.1 * 27.3 19.4
2 o 11.3 15.8 3.6 20.0 9.2 * 24.8 8.1
13.6 19.2
3 o 27.0 12.5' 21.5 15.1 *
7.8
4 o 12.7 29.2 19.1 * 40.3
[5, 8] o 23.3 12.2 , 25.5 5.0
*
6 o 11.2 * 31.0 24.4
7 o * 27.9 13.3
8 o * *
9 o 32.5

10 o

17
CLASSIFICATION BY CLUSTER!~

18

.
9

30

(f) 20
l±:!
l..)
w
o..
(f)

®
SPECIES 1

12

w
l..)
z
¡:!
(f)

o 4

® o
9 3 7 6 4 'º 2 5 8
Figure 2.3. (a) The data points of Data Matrix #1 (see Table 2.1). The "contours" show th1
successive fusions with nearest-neighbor clustering except that, for clarity, the final contou
enclosing ali 10 points is omitted. ( b) The corresponding dendrogram. Details are given ii
Table 2.3. The height of each node in the dendrogram is the distance between the pair o
clusters whose fusion corresponds with the node.

from point 3 to the cluster [5, 8] is

d(3 , [5,8]) = 12.5.

!~s values ª.PPe~rs in the (3, S)th cell of the reconstructed distance mató>
d . ~e 8entnes m. the fifth row and column, which give the distance
(~h[e sm est entry in the
*
' Dallfor all 1 5 or 8, are the lesser of d(j 5) a.nd d(j 8)
' ' . ·r
boldface) in the (2, 5)th cell ;econstructed dist~nce matrix is 3.6 (shown. I
· hus the next step m the clustering is the fusior
, RfST-NEIGHBOR CLUSTERING
NEA 19

Step Nearest
Fusionsª Distance Between
Number Pointsb Ousters
1 5,8 5, 8 2.2
2 [5, 8], 2 2, 5 3.6
3 [5, 8, 2], 10 8, 10 5.0
4 1, 6 1,6 5.7
5 [1, 6], 7 1, 7 6.1
6 '[5, 8, 2, 10], 4 4, 10 7.8
7 [5, 8, 2, 10, 4], [1, 6, 7] 2, 7 9.2
8 [5, 8, 2, 10, 4, 1, 6, 7], 3 2, 3 11.3
9 [5, 8, 2, 10, 4, 1, 6, 7, 3], 9 3,9 13.6
10 Ali points are in one cluster

ªUnbracketed· numbers refer to individual quadrats · The numb ers m


· square b rackets are
the quadr ats m a e1uster.
hThe distance between these two points defines the distance (given in the last column)
between the two clusters united at this step.

of the existing two-member cluster [5, 8] with quadrat 2 to form the


three-member cluster [2, 5, 8] .
. The distance matrix is reconstructed again, by adjusting the entries in
row and column 2 and putting asterisks in row and column 5. And the
procedure continues. The succession of steps is summarized in Table 2.3. At
every step a new cluster is formed by the fusion of two previously existing
clusters. (This includes one-member "clusters.") The final column in Table
2.3 shows the distance separating the clusters united at each step.
The procedure is shown graphically in Figure 2.3a; Figure 2.3b shows the
result of the clustering as a tree diagram or dendrogram. The horizontal
links in the dendrogram are known as nodes and the vertical lines as
i~ternodes. The height of each node above the base is set equal to the
d1stance between the two clusters whose fusion the node reptesents. These
distances are shown on the vertical scale on the left.
It should be noticed that the ordering of the points (quadrats) along the
?0 ttom is to sorne extent optional. Thus if the labels 1 a~d ~ were
mterchanged, or 5 and 8, there would be no change in the implications ~f
the dendrogram. The dendrogram may be thought of as a hanging mobile
20
. bl t swivel freely where it is attached to the intern 0
w1 th each node ª e 0 de
above it. . h d t divide the quadrats mto · e1asses, the r e are obviou
· 1
Jf we w1s · e hich
o ·t could be done, all of them arbitrary.
. The arbitras.l
m e the points exhibtt no natural clustenng. The contours.n.
several. ways w 1 . . .
b
ness anses ecaus . . .. 10
Figure do not represen! abrupt d1scontmu1t1es any more than the
contour230
lines on a relief map of hilly country represen! steps VlSlble on the

ground.
There are occasions, however, when an "unnatural" classification (some.
times called a dissection) is needed for practical purposes. For example
. mapping even if' 1·n,
classification is required as a preliminary to vegetation
fact, the plant communities on the ground merge mto one another with
broad, indistinct ecotones. The lines separating communities on such ama
are analogous to contour Jines on a relief map, and are no less useful. p
How to distinguish clusters, given a dendrogram like that in Figure 2.3b
is a matter of choice. Sorne common ways of classifyin,g are as follows : '

l. T~e number of cl~sses to be recognized is decided beforehand. Thus


~uppose it had been dec1ded to classify the 10 points in . Data Matrix #1
mto 4 classes. The memberships of the classes are found b d .
horizontal
. line across the dendrogram at a level h . Y rawmg. a
nodes It will b w ere 1t cuts four rnter·
(4, 10."2, 5, 8]. e seen that the resultan! classes are [9], [3], (7, 1, 6], and

2. The minimum distance that mu


recognized as distinct may be d .d dst separate clusters for them to be
distance of 10 units were chos ~c1 hie. beforehand. Suppose a minimum
b . en m t s example Th hr
e recogruzed, namely (9) ¡ d · en t ee classes would
, ' 31 ' an [7, 1, 6, 4, 10, 2, 5, 8].

lf the internodes of a de d
with short ones at the botto: r:;~alm are of conspicuously different lengths
the polints fall naturally into clust~ng onehs at the top, then it follows that
examp e. rs w1t out arbitrariness. Consider an

~XAMPLE. Data Matrix #


m 10 quadrats Th 2 (see Table 2.4) sh
the dendrogr . e data points are show ows the amounts of 3 speciel
am resulting f n graphically · F' d
The separation . t rom nearest-neighb . in igure 2.4a, aJI
obvious in both ~·º
three classes, namely (1 3or clustenng is in Figure z.4b·
iagrams, and no for i' ' '2], [4, 5, 6, 7] and (8 9 10] Jl

•llliill~~~~~~~~~~~~=m=a~c~l~u~st~erin g procedure
' is needed
' ' to
f;\BLE 2.4. DATA MATRIX #2. THE QUAN
QUADRATS. TlTIES OF 3 SPECIES IN
10
Quadrat 1 2 3 4 5 6 7 8 9 10
Species 1 24 27 24 8 10 14 14 36 36 41
Species 2 32 30 29 20 18 20 22 14 9 12
Species 3 3 1 2 11 14 13 12 8 8 6

40

¡ ' - - .......
I IA \
3+ • 2 )
1 \
30
1
----- .........
\ , ___ ./ /

/
/
' \\
N
I
/
*7
.6 l


(f) 20 4-=* I
w
u
1
1 I /
/----- ........
..... ,
w \
5 /
o...
(f) \
"
........._ __ .,......,.. _,,/
/
I
!
ª* * 10
\
1
10 1
1
\
9*__
'- .__ ........... /
/
I
J

@ 10 20 30 40
SPECIES 1
20

15

w
u
z
<C
1- 10
(f)

F· ® o
1 3 2 4 s 6 1 e 9 10

in~;·4· (a) The data points of Data Matrix #2 (see Table 2.4). The amount of species J
detu1ro :drat ts shown by the number of "spokes" attached to each point. ( b) The
gr yielded by nearest-neighbor clustering.
21
22
bowever, is the fact that, Witq
. them. What makes the task e_asy, an be displayed in visualizab¡
recogruze f da ta pom ts e e
three species, the swarm o . used in figure 2.4b of represent
on ly b the dev1ce " ·
three-dimensional space, or ~ d. ate by the number of spokes,,
ing the magnitude of the .third coodr. mensional coordinate frame. When
· a two- 1m
radiating from the po1Ilts 1Il . 1 tly when the swarm of data Püints
· r equ1va en ' . .
there are many spec1es 0 ' . ¡¡z·ation becomes 1mposs1ble anct a
. · al space, v1sua
occupies a many-d1mens10n d a dendrogram.
formal procedure is needed to pro uce .
. . t often used in practice because it is
Nearest-neighbor clu~t~nn~ ishnot ndency for early formed clusters to
. · Cha1Il1Ilg is t e e
prone to ehammg. . f . gle points one after another in
w b the accret1on to them 0 Slfl .
gro .Y a- be seen in Figure 2.3 where the first, tigh¡
success1on. The eu ect can h 4
icks up the points 2, then 10, and t en , one ata
two-membere1us er , P
t [5 81 .
time, before uniting with another cluster containing more than one pomt.' H
a classification is intended to define classes for a purpose such as ve~e~ation
mapping, then a method that frequently leads to exaggerated chammg is
defective. It results in clusters of very disparate sizes. Thus, as shown before,
when the dendrogram in Figure 2.3b is used to define three clusters, two ol
them are "singletons" and ali the remaining eight points are lumped
together in the third cluster. For vegetation mapping, or for descriptive
classifications generally, one usually prefers a method that yiefds clusters ol
roughly equal sizes. However, chaining may reveal a true relationship
among the quadrats. Therefore, if what is wanted is a dendrogram that is in
sorne sense "true to life," a clustering method that reveals natural chaining,
if it exists, certainly has an advantage.
l~deed~ a dendrogram is more than merely a diagram on which a
cl~ssificatton can be based. It is a portrayal in two dimensions of a swarm of
pomts occupying many dimen s1ons · (as many as there are species). A
~tendrogr.aghm need not be used to yield a classification. I t can be studied in
1 s own n tas a representation f th · · . among a swarm of
d · 0 e mterrelat10nships
1
a a pomts. Sorne workers find a d d
two-dimensional 0 d" t' en rogram more informative than a
r ma ion.

2.3. FARTHEST-NEIG HBOR CLUSTERING

In f arthest-neighbor clustering (al kn


the distance between two so . own as complete-linkage clustering)
1
b'tween a point in one clustc usters is defined as the maximum distance
er and a po · t ·
m m the other (see Figure 2.2).
T-NEIGHBOR CLUSTERING
f,i\l~rHES 23

LE z.5. STEPS IN THE FARTHEST-NEIGHBOR


f,.\ B ~KATRIX #l. CLUSTERING OF

-----
DATA !V~----------:::--------
SteP
Nurnber
Fusions "Farthest"
Pointsª
--~~~~~~~~::::__
n·1stance between
Ousters

ºThese are the quadrats whose distance apart, shown in the last column, defines the distance
between the two clusters.

To apply the method to Data Matrix #1 , we again start with the


distance matrix in Table 2.1 and unite the two closest clusters (at this stage,
individual points) which are, of course, points 5 and 8 as before. But in
compiling each of the sequence of reconstructed distance matrices, we use
the greater rather than the lesser of two distances. For example, the distance
between cluster [5, 8] and point 2 is defined* as
d(2, [5, 8]) = max[ d(2 , 5), d(2, 8)] = max(3.6, 4.0) = 4.0,

for farthest-neighbor clustering, whereas it was defined as


d(2, [5, 8]) = min[ d(2, 5), d(2, 8)] = min(3.6, 4.0) = 3.6
for nearest-neighbor clustering. The two clustering procedures (neares~­
neighbor and farthest-neighbor) are the same in all respects ex~ept for this
changed definition of intercluster distance. The result of clustenng the data
in Data Matrix # 1 by farthest-neighbor clustering is. show.n in Table 2.5
~ Figure 2.5. The figure should be compared w1th Figure 2.3. The
difprence
·-a~,miesis conspicuous.
· "
· 1
"t" th t i·t tends to y1eld e usters
. ..:i t-ne1ghbor clustering has the men a .
fairly equal in size. This is because the farthest-neighbor d1stance

lienotes the max.imum of x and y, and analogously for min(x, y).


A TION BY CLUSTERIN"
CLASSI FIC "

24

30

N
20
(f)
LLl
G
LLl
o...
(f)

10

40~
30

w
u
z
<t 20
1---
(f)

o
10

® o
7 1 6 2 5 8 4 10 3 9

Figure 2.5. Farthest-neighbor clustering applied to Data M atnx


· # 1. The dendrogram :
based on Table 2.5.

·
between two populous ne1ghbonng · o f ten 1arge m
· clusters is · .sp~·1e of the. fa'el
that, as a whole, they may be very similar. Consequently, 1t is more lik.
that an isolated unattached point at a moderate distance will be united wit
one of them than that they will unite with each other. Hence an anomaloV
quadrat may become a cluster member quite early in the clustering proce~
and the fact that it is anomalous will be overlooked.
When true natural clusters exist, the outcomes of nearest-neighbor 0, ª
farthest-neighbor clustering are usually very similar. Farthest-neighbor clO·
tering applied to Data Matrix # 2 gives results indistinguishable frotn tbº5
in Figure 2.4.
CtUSTERING 25
. 1rllº'º
cEr•

CENTROIO CLUSTERING
2.4.
(entrOl
.d c/ustering is one of several methods designed t O St rik e a h appy
· di rn between the extremes of nearest-neighbor clustering on the
rne u . hb . one
halld and farthest-neig or e1ustenng on the other. Nearest and farthest-
neighbor rnethod~ hav~ the defe~t that_ they are in~uenced at every step by
the chance l~cat10ns m. th~ ~-dunen~10nal coordmate frame of only two
. dividual pomts. That IS, It IS the distance between two points only that
ind ides the outcome of each step. Centroid clustering overcomes this
ec ..
drawback by ~sing a defim~10n ~f intercluster distance that takes account of
the locations of all the pomts m each cluster. To repeat, there are many
ways in which this might be done, and centroid clustering, described in this
section, is only one of the ways. A more general discussion of the various
roethods and how they are interrelated is given in Section 2.7.
In centroid clustering the distance between two clusters is taken to be the
distance between their centroids. The centroid of a cluster is the point
representing the "average quadrat" of the cluster; that is, it is a hypothetical
quadrat containing the average quantity of each species, where the aver-
aging is over all cluster members. Hence if there are m cluster members and
sspecies, and if we write ( c1 , c2 , ... , es) for the coordinates of the centroid,
then
1 1 m
c1 = - (x 11 + x 12 + · · · +x1m)
m
= - L X1J'
m J=l

and, in general,

For example, the centroid of the three-point cluster [2, 5, 8) in Data Matrix
#1 has coordinates [(20 + 22 + 20)/3, (18 + 15 + 14)/3) = (20.67, 15.67).
.The clustering procedure is carried out in the same way as f~r n~arest­
netghbor and farthest-neighbor clustering. Thus each step consists I~ the
fuston of the two nearest clusters (as before, a cluster may have only a ~mgle
lllQQiber); one finds which two clusters are nearest by searching the mter-
r distance matrix for its smallest element. As in the methods alre~dy
·......i • d f h fusion by entermg
~, the d1stance matrix is reconstructe a ter eac
CLASSIFICATION BY CLUSTER
I~~
26

. . h wly formed cluster to every other cluster. B


in 1t the d1stances from l e ne ·d Th · Ut
now the distances are those separating cluster centrdo1 s .. b de wfay m Which
· constructed 1s escn e a ter we h
the successive distance matnces are ave

The dendrogram produced by applying the centro1d clusten~g procedure


looked at sorne results. . .

to Data Matrix # 1 is shown in Figure 2.6. It should be notlced that th


dendrogram is intermediate between that yielded b)' nearest-neighbor cluse
tering (Figure 2.3b) and that yielded by farthest-neighbor clustering (Figure
2.5b ). Thus in centroid clustering, as in farthest-neighbor clustering, points
3 and 9 unite to forro the cluster [3, 9], whereas in nearest-neighbor
clustering these two points are chained, one after the other, to the cluster
formed by the remaining eight points. But centroid clustering, like nearest.
neighbor. clustering, c~ains point I?
and then point 4 to cluster [2, 5, 8],
whereas m farthest-ne1ghbor clustenng, cluster [10, 4] is formed first and 18·
only later united with [2, 5, 8].
Ta~le 2.6, which resembles Tables 2.3 and 2.5, shows the ste s .
centro1d
. . clustering
. of Data Matrix # 1· Ob serve t h at mstead
. of ,ap e m1 the
g¡vmg the dtstance between the two clusters united at each o uw
column gIVlng the square of this a· t Th step, there IS a
Is anee. e reason for this, together with

20

15

LU
u
z 10
~
Cf)
i5

11ie deftdro o
l 6
7 ' •
&ram PrOduced by
'
s
la io 4
• ~~· 3
Ylng centroid clust . 9
enng to Data Matrix #l. Jt
r 1
OID CLUSTERING 27
eEN TR
TABLE 2.6. STEPS IN THE CENTROID CLU
:MATRIX #l. STERINGOF DATA

Step Square of
Number Fusions lntercluster Distance

1 5, 8 5.00
2 2, [5, 8] 13.25
3 1,6 32.00
4 10, [2, 5, 8] 43 .56
5 7, [l, 6] 73.00
6 4, [2, 5, 8, 10] 162.50
7 3,9 185.00
8 [7, 1, 6], [2, 5, 8, 10, 4] 326.25
9 [7, 1, 6, 2, 5, 8, 10, 4], [3, 9] 456.83
10 Ali points are in one cluster

the derivation of these values, is explained in the following. Distances rather


than distances-squared have been used to fix the heights of the nodes in the
dendrogram so that the three dendrograms so far obtained from Data
Matrix # 1 may be easily compared.
We now consider the computations required in centroid clustering.
Determining the distance between two cluster centroids is perfectly
straightforward. Suppose there are s species, so that the distance required is
that between two points in s-dimensional space or s-space. Let the two
centroids be labeled C' and C"; their coordinates, obtained from Equation
(2.1), are (e{, cí, ... , e;) and (c-í', e'{, ... , e;'), respectively. Then the square
of the distance between e/ e" is
and
2 ( )2 ( / ")2
d2 (C',C") = (cí - cí' ) + cí - e'{ + · · · + es - es ·
As an example, recall Data Matrix # 1 (see Table 2.1) and !et. C' and C ",
be the clusters [7, 1, 6] and [2, 5, 8, 10, 4], respectively. The coordmates of C
are
(e;, cí) = [t(13 + 12 + 8), 1(24 + 30 + 34)] = (11, 29.33);
analogous calculations show that the coordinates of C" are
( ci', en= (17.8, 12.6).
CLASSIFICATION BY CLUSTEl{i~C

28
Hence the square of the distance between them is

d²(C′, C″) = (c1′ − c1″)² + (c2′ − c2″)² = 326.24.

This corresponds (except for a rounding error in the final digit) with the squared intercluster distance opposite step #8 in Table 2.6.
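The direct calculation is easy to express in code. The following Python sketch is illustrative only (it is not from the book); the coordinates are those quoted in the example above, and the second centroid is taken from the text rather than recomputed from Data Matrix #1.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points (Equation 2.1)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

c1 = centroid([(13, 24), (12, 30), (8, 34)])   # centroid of [7, 1, 6] -> (11, 29.33)
c2 = (17.8, 12.6)                              # centroid of [2, 5, 8, 10, 4], from the text
print(c1, squared_distance(c1, c2))            # approximately 326.2
```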
Although this is, in principle, the most straightforward way of finding the square of the distance between two cluster centroids, it is inconvenient in practice. The inconvenience arises because the coordinates of the original data points have to be used in the calculations every time. It is more efficient computationally to derive the elements of each successive distance matrix from the elements of its predecessors. The following is a demonstration of the first few steps of the process applied to Data Matrix #1, after which the generalized version of the equation is given.
Each element in the initial distance matrix, more precisely a distance² matrix, is the square of the corresponding element in the distance matrix in Table 2.1. This initial matrix is at the top in Table 2.7. Its smallest element (in boldface) is d²(5, 8) = 5.00. Therefore, once again, step #1 is the formation of cluster [5, 8].
Now we do the first reconstruction of the distance² matrix. As before, since point 8 no longer exists as a separate entity, the elements of row and column 8 are replaced with asterisks. The fifth row and column, now labeled [5, 8], contain the distances-squared from the centroid of [5, 8] to every other point. The required distance-squared from the jth point is (as proved later)

d²(j, [5, 8]) = (1/2) d²(j, 5) + (1/2) d²(j, 8) − (1/4) d²(5, 8).     (2.2)

Thus, when j = 1,

d²(1, [5, 8]) = (1/2) d²(1, 5) + (1/2) d²(1, 8) − (1/4) d²(5, 8)
             = 325/2 + 320/2 − 5/4 = 321.25.

Likewise, when j = 2,

d²(2, [5, 8]) = 13/2 + 16/2 − 5/4 = 13.25,

and so on.
These values appear in the row and column labeled [5, 8] in the second distance² matrix in Table 2.7. All other elements remain as they were. It is seen that the smallest element in the new matrix is 13.25, in row 2 of column [5, 8]. Hence the second step in the clustering is the fusion of 2 and [5, 8] to form the three-member cluster [2, 5, 8].
TABLE 2.7. THE FIRST THREE DISTANCE² MATRICES CONSTRUCTED DURING THE CENTROID CLUSTERING OF DATA MATRIX #1.ª

(1) The initial matrix. Each element is a distance² between two points; its smallest element (in boldface) is d²(5, 8) = 5.00.
(2) The second matrix, after the fusion of points 5 and 8. Row and column 8 are replaced with asterisks, and the row and column labeled [5, 8] give the distances² from the centroid of [5, 8] to every other point; the smallest element is d²(2, [5, 8]) = 13.25.
(3) The third matrix, after the fusion of 2 and [5, 8].

ªIn matrices 2 and 3 only elements differing from those in earlier matrices are shown. Unchanged elements are shown as dots.
The distance² matrix must now be reconstructed anew. The elements of row and column [5, 8] are replaced with asterisks. The new elements for the second row and column, now labeled [2, 5, 8], are found from the formula

d²(j, [2, 5, 8]) = (1/3) d²(j, 2) + (2/3) d²(j, [5, 8]) − (2/9) d²(2, [5, 8]).     (2.3)

When j = 1,

d²(1, [2, 5, 8]) = 208/3 + (2/3) × 321.25 − (2/9) × 13.25 = 280.56;

when j = 3,

d²(3, [2, 5, 8]) = 128/3 + (2/3) × 181.25 − (2/9) × 13.25 = 160.56;

and so on. These values appear in the row and column labeled [2, 5, 8] in the third matrix in Table 2.7.
Equations (2.2) and (2.3) are particular examples of a general equation which we now derive. It is the equation for the distance² from any point (or cluster centroid) P to the centroid Q of an (m + n)-member cluster created by the fusion of two clusters [M1, M2, ..., Mm] and [N1, N2, ..., Nn] with m and n members, respectively. The centroids of these clusters are M and N. The set-up is shown in Figure 2.7.

Let MP = a, NP = b, and MN = c.

Let the angles MQP = α and NQP = β, with α + β = 180°.

The distance² required is x², where PQ = x. Recall that x² is needed as an element of the row or column headed [M1, M2, ..., Mm, N1, N2, ..., Nn] in a distance² matrix undergoing reconstruction as one of the steps in a clustering operation. The values of a², b², and c² are known since they are elements in the distance² matrix constructed at an earlier step.
[Figure 2.7. Illustration of the derivation of Equation (2.7). Q is the centroid of clusters M and N; it is assumed that n > m and, therefore, Q is closer to N than to M. See text for further details.]
As a preliminary to finding x² it is necessary to find MQ (or NQ = c − MQ). Since M and N are, respectively, the centroids of m-member and n-member clusters, and Q is at their center of gravity, it is clear that

MQ = nc/(m + n)   and   NQ = mc/(m + n).
Now, from Apollonius's theorem,*

(PQ)² + (MQ)² − 2(PQ)(MQ) cos α = (MP)²

and

(PQ)² + (NQ)² − 2(PQ)(NQ) cos β = (NP)².

Equivalently,

x² + n²c²/(m + n)² − [2xnc/(m + n)] cos α = a²,     (2.4)

x² + m²c²/(m + n)² − [2xmc/(m + n)] cos β = b².     (2.5)

Multiply (2.5) by n/m, and make the substitution cos β = cos(180° − α) = −cos α. Then

(n/m) x² + nmc²/(m + n)² + [2xnc/(m + n)] cos α = b²n/m.     (2.6)

Add (2.4) and (2.6) to eliminate cos α. The sum is

x² (1 + n/m) + nc²/(m + n) = a² + b²n/m,

whence

x² (m + n)/m + nc²/(m + n) = (a²m + b²n)/m.

*A proof of Apollonius's theorem is given as an appendix to this chapter.



Multiply through by m/(m + n) to obtain

x² + mnc²/(m + n)² = (a²m + b²n)/(m + n)

or

x² = [m/(m + n)] a² + [n/(m + n)] b² − [mn/(m + n)²] c².     (2.7)

It is seen that (2.7) is the required general form of (2.2) and (2.3).
It is now apparent that when centroid clustering is being done, the most convenient measure of the dissimilarity between two clusters is the square of the distance separating their centroids rather than the distance itself (but see page 46). The terms x², a², b², and c² in Equation (2.7) are all squared distances, and use of this equation makes it easy to construct each distance² matrix from its predecessor as clustering proceeds from step to step. There is no simple relation among x, a, b, and c when they are not squared.
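Equation (2.7) translates directly into a small update routine. The sketch below (illustrative Python, not from the book) computes the new squared distance from a point or cluster centroid to the centroid of a newly fused cluster, and reproduces the values 13.25 and 280.56 obtained earlier with Equations (2.2) and (2.3).

```python
def fused_sq_dist(a2, b2, c2, m, n):
    """Equation (2.7): squared distance from P to the centroid of the cluster
    formed by fusing an m-member cluster M (P-to-M distance^2 = a2) with an
    n-member cluster N (P-to-N distance^2 = b2); c2 is the M-to-N distance^2."""
    return (m * a2 + n * b2) / (m + n) - m * n * c2 / (m + n) ** 2

# Equation (2.2): fusing the single points 5 and 8 (m = n = 1);
# d^2(2,5) = 13, d^2(2,8) = 16, d^2(5,8) = 5.
print(fused_sq_dist(13, 16, 5, 1, 1))           # 13.25

# Equation (2.3): fusing point 2 (m = 1) with cluster [5, 8] (n = 2);
# d^2(1,2) = 208, d^2(1,[5,8]) = 321.25, d^2(2,[5,8]) = 13.25.
print(fused_sq_dist(208, 321.25, 13.25, 1, 2))  # 280.56 (to two decimals)
```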
Centroid clustering has been described with greater mathematical rigor by Orlóci (1978), who gives an example. He uses the name "average linkage clustering" for the method. Average linkage clustering is an inclusive term for several similar clustering procedures, of which centroid clustering is one. The interrelationships among the several methods are briefly described in Section 2.7.

2.5. MINIMUM VARIANCE CLUSTERING

This is the last clustering method that is fully described in this book. Before going into details, it is necessary to define the term within-cluster dispersion and to give two methods for computing it. The first of these methods is the obvious one implied by the definition. The second, nonobvious method is a way of obtaining the identical result by a computationally simpler route.
First, the definition: the within-cluster dispersion of a cluster of points is defined as the sum of the squares of the distances between every point and the centroid of the cluster.
Next, we illustrate the computations.

EXAMPLE. Consider Data Matrix #3 shown in Table 2.8. It lists the quantities of each of two species in five quadrats; it can be represented graphically by a swarm of five points (representing the quadrats) in a space of two dimensions, that is, a two-dimensional coordinate frame with axes representing the species.

TABLE 2.8. DATA MATRIX #3. THE QUANTITIES OF TWO SPECIES IN FIVE QUADRATS.

Quadrat      1    2    3    4    5
Species 1   11   36   16    8   28
Species 2   14   30   20   12   32
Suppose the five points have been combined into a single cluster, as they
will have been when the last step in a clustering process is complete. The
centroid of the five-point cluster has coordinates
(c1, c2) = ((11 + 36 + 16 + 8 + 28)/5, (14 + 30 + 20 + 12 + 32)/5) = (19.8, 21.6).
Now write Q[1, 2, 3, 4, 5] for the within-cluster dispersion of the cluster of points 1, 2, 3, 4, and 5; let d²(j, C) be the square of the distance from the centroid to the jth point.
Then, from the definition,

Q[1, 2, 3, 4, 5] = Σ d²(j, C), summed over j = 1, ..., 5,
                 = d²(1, C) + d²(2, C) + ··· + d²(5, C)
                 = [(11 − 19.8)² + (14 − 21.6)²]
                 + [(36 − 19.8)² + (30 − 21.6)²]
                 + ··· + [(28 − 19.8)² + (32 − 21.6)²]
                 = 892.

A simpler way of obtaining the same result is to use the equation

Q[1, 2, 3, 4, 5] = (1/n) Σ d²(j, k), summed over all pairs with j < k.
(For a proof, see Pielou, 1977, p. 320.) Here n = 5, the number of points in the cluster; d²(j, k) is the squared distance between points j and k; the summation is over every possible pair of points, taking each pair once. This is the reason for putting the condition j < k below the summation sign: it ensures that, for example, d²(1, 2) shall be a component of the sum, but not d²(2, 1), which is merely a repetition of d²(1, 2). There are n(n − 1)/2 = 10 distinct pairs of points, and hence 10 distinct components of the sum. Thus
Q[1, 2, 3, 4, 5] = (1/5){d²(1, 2) + d²(1, 3) + ··· + d²(4, 5)}

with 10 components between the braces. Now

d²(1, 2) = (11 − 36)² + (14 − 30)² = 881,
d²(1, 3) = (11 − 16)² + (14 − 20)² = 61,
. . .
d²(4, 5) = (8 − 28)² + (12 − 32)² = 800.

Hence

Q[1, 2, 3, 4, 5] = (1/5) × 4460 = 892,

as before.
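As a check on the arithmetic, the two computations can be written out in a few lines of Python (an illustrative sketch, not from the book, using the Data Matrix #3 quantities from Table 2.8).

```python
# Quadrat coordinates from Data Matrix #3 (species 1, species 2).
pts = [(11, 14), (36, 30), (16, 20), (8, 12), (28, 32)]

def sq_dist(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

# Method 1: sum of squared distances from each point to the centroid.
n = len(pts)
centroid = tuple(sum(p[i] for p in pts) / n for i in range(2))
Q_direct = sum(sq_dist(p, centroid) for p in pts)

# Method 2: (1/n) * sum of squared distances over all distinct pairs.
Q_pairs = sum(sq_dist(pts[j], pts[k]) for j in range(n) for k in range(j + 1, n)) / n

print(centroid, Q_direct, Q_pairs)   # (19.8, 21.6), 892.0, 892.0
```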

For a two-member cluster, say of points a and b, the within-cluster dispersion is

Q[a, b] = (1/2) d²(a, b);

that is, it is half the square of the distance between them. For a one-member cluster, say point a by itself, the within-cluster dispersion is zero; that is, Q[a] = 0.
We are now in a position to describe minimum variance clustering. At each step, those two clusters are to be united whose fusion yields the least increase in within-cluster dispersion. It is important to notice that what matters is not simply the value of the within-cluster dispersion of a newly formed cluster, but the amount by which this value exceeds the sum of
the within-cluster dispersions of the two separate clusters whose fusion formed the new cluster.
For example, consider the two clusters [a, b, c] and [d, e], having within-cluster dispersions of Q[a, b, c] and Q[d, e], respectively. Suppose they are united to form a new cluster whose within-cluster dispersion is Q[a, b, c, d, e]. The increase in within-cluster dispersion that this fusion has brought about, denoted by q([a, b, c], [d, e]), is

q([a, b, c], [d, e]) = Q[a, b, c, d, e] − Q[a, b, c] − Q[d, e].

It is values such as q([a, b, c], [d, e]) that are the criteria for deciding which two clusters should be united at each step of the clustering process. At every step the clusters to be united are always the two for which the value of q is least.
As with the clustering procedures described in earlier sections, the minimum variance method also requires the construction of a sequence of "criterion" matrices. Then the position in the matrix of the numerically smallest element indicates which clusters are to be united next. The matrices obtained when minimum variance clustering is applied to Data Matrix #3 are shown in Table 2.9. In addition to the sequence of criterion matrices Q1, Q2, Q3, and Q4, and printed above them, is the matrix D². It is a distance² matrix whose elements are the squares of the distances separating every pair of points. The elements of D² are used to construct the successive criterion matrices.
We now carry out minimum variance clustering on the data in Data Matrix #3. Q1, the first criterion matrix, has as its elements the within-cluster dispersion of every possible two-member cluster that could be formed by uniting two individual points. It has already been shown that for a two-member cluster consisting of points j and k, say, the within-cluster dispersion is

Q[j, k] = (1/2) d²(j, k).

Therefore, the elements of Q1 are simply one-half the values of the corresponding elements of D².
The smallest element in Q1 is 6.5 (shown in boldface) in cell (1, 4). Therefore, the first cluster to be formed is [1, 4]. We now construct the next criterion matrix, Q2. It has asterisks in row and column 4, since point 4 no longer exists as a separate entity.
TABLE 2.9. SUCCESSIVE MATRICES CONSTRUCTED IN THE MINIMUM VARIANCE CLUSTERING OF DATA MATRIX #3.

The distance² matrix:

            1      2      3      4      5
D² =  1     0    881     61     13    613
      2            0    500   1108     68
      3                   0    128    288
      4                          0    800
      5                                 0

The sequence of criterion matrices:

            1      2      3      4      5
Q1 =  1     0  440.5   30.5    6.5  306.5
      2            0    250    554     34
      3                   0     64    144
      4                          0    400
      5                                 0

         [1,4]     2      3      4      5
Q2 = [1,4]  0  660.83  60.83     *  468.83
      2            0    250      *     34
      3                   0      *    144
      4                          *      *
      5                                 0

         [1,4]  [2,5]     3      4      5
Q3 = [1,4]  0  830.25  60.83     *      *
     [2,5]         0  251.33     *      *
      3                   0      *      *
      4                          *      *
      5                                 *

       [1,4,3]  [2,5]     3      4      5
Q4 = [1,4,3]  0 790.67    *      *      *
     [2,5]         0      *      *      *
      3                   *      *      *
      4                          *      *
      5                                 *

Asterisks mark the rows and columns of points that no longer exist as separate entities.
In the jth cell of the first row and column (now the row and column for cluster [1, 4]) is entered

q(j, [1, 4]) = Q[j, 1, 4] − Q[j] − Q[1, 4].

These terms are evaluated for every j not equal to 1 or 4, that is, for j = 2, 3, and 5. Recall that for any j,

Q[j, 1, 4] = (1/3){d²(j, 1) + d²(j, 4) + d²(1, 4)},

Q[j] = 0,

and

Q[1, 4] = (1/2) d²(1, 4).
Therefore, letting j take the values 2, 3, and 5 in turn, and taking the required distances² from the matrix D², it is found that

q(2, [1, 4]) = Q[1, 2, 4] − Q[2] − Q[1, 4]
            = (1/3){d²(1, 2) + d²(1, 4) + d²(2, 4)} − 0 − (1/2) d²(1, 4)
            = (1/3){881 + 13 + 1108} − 0 − 13/2
            = 660.83.

Likewise,

q(3, [1, 4]) = Q[1, 3, 4] − Q[3] − Q[1, 4]
            = 67.33 − 0 − 6.5
            = 60.83,

and q(5, [1, 4]) = 468.83.

It will be seen that the values just computed appear in the first row of Q2 (they would also appear in the first column, of course, if the whole matrix were shown, but it is unnecessary to print the matrix in full because it is symmetric). The remaining elements in Q2 are the same as in Q1.
The smallest element in Q2 is 34, the value of q[2, 5]. Hence [2, 5] is the second cluster to be formed.
By a similar process, we calculate the terms of Q3. Since point 5 is no longer separate, the elements in its row and column are replaced with asterisks. The second row and column become the row and column for the new cluster [2, 5], so that the two two-member clusters [1, 4] and [2, 5] now occupy the first and second positions in the matrix. The increase in within-cluster dispersion that would result if they were united to make a four-member cluster is

q([1, 4], [2, 5]) = Q[1, 2, 4, 5] − Q[1, 4] − Q[2, 5]
                 = (1/4){d²(1, 2) + d²(1, 4) + d²(1, 5) + d²(2, 4) + d²(2, 5) + d²(4, 5)} − (1/2) d²(1, 4) − (1/2) d²(2, 5)
                 = 830.25.

The procedure for computing the elements of the Q matrices should now be clear.
The smallest element in Q3 is 60.83, in cell ([1, 4], 3). Therefore, the next fusion creates the cluster [1, 4, 3].
The gain in within-cluster dispersion produced by the final fusion between [1, 4, 3] and [2, 5] is 790.67, the only numerical element in Q4. It is found from the relation

q([1, 3, 4], [2, 5]) = Q[1, 2, 3, 4, 5] − Q[1, 3, 4] − Q[2, 5].

After the final fusion, when all five points have been united into one cluster, the within-cluster dispersion is

Q[1, 2, 3, 4, 5] = (1/5) Σ d²(j, k), summed over all pairs with j < k, = 892,

as already derived at the beginning of this section.


To summarize, let us consider the step-by-step increases that took place in the total within-cluster dispersion (hereafter called the total dispersion) as clustering proceeded. At the start, there were five separate points (or one-member clusters), all with zero within-cluster dispersion. Therefore, the total dispersion was zero. Formation of cluster [1, 4] raised the total dispersion by the amount of its own within-cluster dispersion, namely, 6.5. Likewise, formation of cluster [2, 5] added 34 to this total, bringing it to 40.5. Next, formation of cluster [1, 3, 4] brought an increment of 60.83 to the total (recall that the elements in the criterion matrices are the increases in dispersion that the different possible fusions would bring about, not the within-cluster dispersions themselves). The final fusion of [1, 3, 4] and [2, 5] brought an increment of 790.67. Thus, in numbers,

6.5 + 34 + 60.83 + 790.67 = 892.

In words: the total dispersion is the sum of the smallest elements in the successive criterion matrices, the Qs. The clustering strategy consists in augmenting the total by the smallest amount possible at each step.
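The whole procedure is compact enough to express as a short program. The following Python sketch (illustrative only, not from the book) runs minimum variance clustering on Data Matrix #3 and prints the successive increases in total dispersion: 6.5, 34, 60.83, and 790.67.

```python
from itertools import combinations

pts = [(11, 14), (36, 30), (16, 20), (8, 12), (28, 32)]   # Data Matrix #3

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def dispersion(cluster):
    """Within-cluster dispersion: (1/n) * sum of d^2 over all distinct pairs."""
    n = len(cluster)
    return sum(sq_dist(pts[j], pts[k]) for j, k in combinations(cluster, 2)) / n

# Point indices are 0-based here: quadrats 1-5 correspond to indices 0-4.
clusters = [[i] for i in range(len(pts))]        # start with one-member clusters
while len(clusters) > 1:
    # Find the pair of clusters whose fusion least increases the total dispersion.
    (a, b), q = min(
        (((a, b), dispersion(a + b) - dispersion(a) - dispersion(b))
         for a, b in combinations(clusters, 2)),
        key=lambda item: item[1],
    )
    clusters.remove(a); clusters.remove(b); clusters.append(a + b)
    print(sorted(a), "+", sorted(b), "-> increase =", round(q, 2))
# increases printed: 6.5, 34.0, 60.83, 790.67
```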
Figure 2.8 shows the data points of Data Matrix #3 and the dendrogram we have just computed. The height of each node is the within-cluster dispersion of the newly formed cluster that the node represents. Thus the heights are Q[1, 4] = 6.5, Q[2, 5] = 34, Q[1, 4, 3] = 67.33, and Q[1, 2, 3, 4, 5] = 892.

[Figure 2.8. The data points of Data Matrix #3 (see Table 2.8) and the dendrogram yielded by minimum variance clustering of the data. The vertical scale shows within-cluster dispersion.]

[Figure 2.9. The dendrogram produced by applying minimum variance clustering to Data Matrix #1. The scale on the left shows the within-cluster dispersion of each node: 2.5, 11.3, 16, 30.5, 64.7, 92.5, 174, 765.2, and 1673.8; the quadrats are ordered 3, 9, 7, 1, 6, 2, 5, 8, 4, 10 along the base.]
Figure 2.9 shows the results of applying minimum variance clustering to Data Matrix #1. The steps in the computations are not shown here since nine 10 × 10 matrices would be required. The exact value of the within-cluster dispersion of each newly formed cluster is shown on the scale to the left of the dendrogram to serve as a check for readers who wish to carry out minimum variance clustering on these data for themselves. It is interesting to compare this dendrogram with those in Figures 2.3, 2.5, and 2.6, which all relate to the same data.

2.6. DISSIMILARITY MEASURES AND DISTANCES

It was remarked in Section 1 of this chapter that the Euclidean distance between the points representing two quadrats is only one of many possible ways of defining the dissimilarity of the two quadrats. We used Euclidean distance as a dissimilarity measure in Sections 2, 3, 4, and 5, in which four different clustering procedures were described. This section describes some other possible dissimilarity measures and their advantages and disadvantages.

Metric and Nonmetric Measures

First it must be noticed that dissimilarity measures are of two kinds, metric and nonmetric; the distinction between them is very important.
A metric measure or, more briefly, a metric has the geometric properties of a distance. In particular, it is subject to the triangle inequality axiom. This is the common-sense axiom which states that the length of any one side of a triangle must be less than the sum of the lengths of the other two sides. Suppose we write d(A, B) for the length of side AB of triangle ABC, and analogously for the other two sides. Then the triangle inequality may be written

d(A, B) ≤ d(A, C) + d(B, C).

The equality sign applies when A, B, and C are in a straight line or, equivalently, when triangle ABC has been completely flattened to a straight line.
The triangle inequality is obviously true of Euclidean distances. However,
measures of the dissimilarity of the contents of two quadrats (or sampling
units of any appropriate kind) are often devised without any thought of the
geometrical representation of the quadrats as points in a many-dimensional
coordinate frame. These measures were not, when first invented, thought of
as distances. Only subsequent examination shows whether they are metric,
that is, whether they obey the triangle inequality or, in other words,
"behave" as distances.
Some examples are given after we have considered why metric dissimilarity measures are to be preferred to nonmetric ones. As remarked previously, when a metric measure is used to define the dissimilarity between two quadrats, then the dissimilarities behave like distances. As a result, it may be possible (sometimes) to plot the quadrats as points in a space of many dimensions with the distance between every pair of points being equal to the dissimilarity of the pair (see page 165). But when a nonmetric dissimilarity measure is used, this cannot be done.

Of course, if Euclidean distance as already defined were used as the dissimilarity measure, then the pattern of points would be the same as that produced when each point has as its coordinates the amounts of the different species in the quadrat it represents. But if some other metric dissimilarity measure were used, it would give a different pattern of points. However, if a nonmetric dissimilarity measure were used, no swarm of points could be constructed of any pattern whatever.
To see this, let us invent a dissimilarity measure simply for purposes of illustration. Suppose we define the dissimilarity between points A and B as

δ(A, B) = 100 / (max(d) − d(A, B)).

Here d(A, B) is the ordinary Euclidean distance as previously used in this chapter, and max(d) is the distance separating the farthest pair of points. For concreteness, let max(d) = 100. Obviously, increasing values of d(A, B) within the observed range of 0 to 100 give increasing values of δ(A, B) and, therefore, δ(A, B) could reasonably be used as a measure of dissimilarity.
Now imagine three points, A, B, and C. Let the Euclidean distances between each pair be

d(A, B) = 90;   d(A, C) = 75;   d(B, C) = 50.

These distances conform with the triangle inequality; that is,

d(A, B) ≤ d(A, C) + d(B, C)

and, therefore, the points can be plotted in a two-dimensional space (for instance, a sheet of paper) in the form of a triangle with sides 90, 75, and 50. Now consider the dissimilarities defined previously:

δ(A, B) = 100/(100 − d(A, B)) = 100/(100 − 90) = 10;

likewise, δ(A, C) = 4 and δ(B, C) = 2. Clearly, it is not true that

δ(A, B) ≤ δ(A, C) + δ(B, C)

and, as a consequence, one cannot construct a triangle with δ(A, B), δ(A, C), and δ(B, C) as its sides. It is impossible. This is another way of saying that the δs, although they could serve as dissimilarity measures, are nonmetric.
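A quick numerical check of the triangle inequality, using the three distances just given, can be written as follows (an illustrative Python sketch, not part of the original text).

```python
def delta(d, d_max=100):
    """The invented (nonmetric) dissimilarity: 100 / (max(d) - d)."""
    return 100 / (d_max - d)

d_AB, d_AC, d_BC = 90, 75, 50
print(d_AB <= d_AC + d_BC)                          # True: the distances are metric
print(delta(d_AB) <= delta(d_AC) + delta(d_BC))     # False: 10 > 4 + 2
```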
To repeat, the merit of metric dissimilarity measures is that they often permit the quadrats to be represented as a swarm of points in many-dimensional space. Such a representation is not strictly necessary if all we want to do with the data is classify them. Nearest-neighbor and farthest-neighbor clustering, as examples, can be done just as well with nonmetric dissimilarities as with metric. But often, indeed usually, we want to ordinate the data as well as classify them. As we see in Chapter 4, ordination procedures use swarms of data points as their raw material. Obviously, it is desirable that the two methods of analysis, ordination and classification (or clustering), be carried out on identical bodies of data, that is, on identical swarms. Hence metric dissimilarity measures are to be preferred to nonmetric ones. Their use permits a clustering procedure and an ordination to be performed on the same swarm of data points.
The following are examples of two dissimilarity measures which do not, at first glance, look very different; however, one is metric and the other nonmetric.
The better known of the two is the nonmetric measure percentage dissimilarity PD (also known as the percentage difference or percentage distance). It is the complement of percentage similarity PS (also known as Czekanowski's index of similarity; see Goodall, 1978a). Since PS, now to be defined, is a percentage, PD is set equal to 100 − PS.
The percentage similarity of a pair of quadrats, say quadrat 1 and quadrat 2, is defined as follows. Let the number of species found in one or both quadrats be s. Let x_i1 and x_i2 be the amount of species i in quadrats 1 and 2, respectively (i = 1, 2, ..., s). Then

PS = 200 × [Σ min(x_i1, x_i2)] / [Σ (x_i1 + x_i2)],   summed over i = 1, ..., s.     (2.8)

A numerical example is shown in Table 2.10. Since, as shown in the table, PS = 58.67%, the percentage dissimilarity is PD = 41.33%.

TABLE 2.10. TO ILLUSTRATE THE CALCULATION OF THE PERCENTAGE DISSIMILARITY PD AND THE PERCENTAGE REMOTENESS PR OF TWO QUADRATS.ª

Species Number i   Quadrat 1   Quadrat 2   min(x_i1, x_i2)   max(x_i1, x_i2)
 1                     25           7             7               25
 2                     40          16            16               40
 3                     18          50            18               50
 4                     16          22            16               22
 5                      9          22             9               22
Totals                108         117            66              159

PS = 200 × 66/(108 + 117) = 58.67%. Therefore, PD = 41.33%.
RI = 100 × 66/159 = 41.51%. Therefore, PR = 58.49%.
ªThe entries in the table are the quantities of each species in each quadrat.

Calculation of the second dissimilarity measure mentioned, the metric measure, is also shown in Table 2.10. There appears to be no accepted name for it, so it is here called percentage remoteness, PR. It is the complement of Ružička's index of similarity, RI (Goodall, 1978a), which is

RI = 100 × [Σ min(x_i1, x_i2)] / [Σ max(x_i1, x_i2)],   summed over i = 1, ..., s.     (2.9)

Then

PR = 100 − RI.
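Both indices are easy to compute from paired species quantities. The sketch below (illustrative Python, using the quantities of Table 2.10) reproduces PS = 58.67, PD = 41.33, RI = 41.51, and PR = 58.49.

```python
q1 = [25, 40, 18, 16, 9]   # quantities of species 1-5 in quadrat 1 (Table 2.10)
q2 = [7, 16, 50, 22, 22]   # quantities of species 1-5 in quadrat 2

PS = 200 * sum(min(a, b) for a, b in zip(q1, q2)) / sum(a + b for a, b in zip(q1, q2))
RI = 100 * sum(min(a, b) for a, b in zip(q1, q2)) / sum(max(a, b) for a, b in zip(q1, q2))

print(round(PS, 2), round(100 - PS, 2))   # PS = 58.67, PD = 41.33
print(round(RI, 2), round(100 - RI, 2))   # RI = 41.51, PR = 58.49
```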

Both PD and PR take values in the range 0 to 100. It is easily seen that if the two quadrats have no species in common, then all terms of the form min(x_i1, x_i2) are zero and thus PD = PR = 100%. At the other extreme, if the contents of the two quadrats are identical, so that x_i1 = x_i2 for all i, then PD = PR = 0%. Therefore, either measure could be used as a measure of dissimilarity and, if it were not for the superiority of metric over nonmetric measures, there would be little to choose between them. However, since PR is metric, it is superior. A proof that PR is metric can be found in Levandowsky and Winter (1971), and a demonstration that PD is nonmetric can be found in Orlóci (1978) (but see page 57 of this book). An example of the use of PR in ecological work has been given by Levandowsky (1972); he used it to measure the dissimilarity between the phytoplankton in water samples collected from temporary beach ponds on the shores of Long Island Sound.
Another metric dissimilarity measure that has much to commend it is the city-block distance CD (sometimes called the Manhattan metric). It is the sum of the differences in species amounts, for all species, in the two sampling units being compared. In symbols,

CD = Σ |x_i1 − x_i2|,   summed over i = 1, ..., s,

where |x_i1 − x_i2| denotes the absolute magnitude of the difference between x_i1 and x_i2, taken as positive irrespective of the sign of (x_i1 − x_i2). Thus for the data in Table 2.10,

CD = |25 − 7| + |40 − 16| + |18 − 50| + |16 − 22| + |9 − 22|
   = 18 + 24 + 32 + 6 + 13 = 93.
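In code this is a one-liner; the short illustrative Python sketch below (again using the Table 2.10 quantities) reproduces CD = 93.

```python
q1 = [25, 40, 18, 16, 9]
q2 = [7, 16, 50, 22, 22]
CD = sum(abs(a - b) for a, b in zip(q1, q2))
print(CD)   # 93
```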

In a way, CD is the most intuitively attractive of the dissimilarity measures. It amounts to a numerical value for the difference an observer consciously sees on looking at two sampling units, for example, two quadrats in a salt marsh, two trays full of benthos from Surber samplers, or two plots in a forest. Thus suppose one were to inspect two forest plots and count the number of trees of each species in each plot; then, for many people, the spontaneous answer to the question, "How great is the difference between the plots?" might well be arrived at simply by adding together the differences between the plots in species content, taking all species into account. This is the city-block distance, and it has the metric property.
The units in which city-block distance and Euclidean distance are measured are the same as the units in which species quantities are measured and, as explained before (page 11), vary from one type of community to another. When an author gives a numerical value for a dissimilarity in a
research paper, the units are almost always omitted. Purists may disapprove, but the custom seems to be universal and leads to no misunderstanding, provided all units are fully and clearly defined at the outset. With percentage measures such as PD and PR the problem of units does not arise.
We have now considered three metric measures of dissimilarity: Euclidean distance, percentage remoteness, and city-block distance. Often it is convenient to use the square of Euclidean distance rather than the distance itself as the clustering criterion; that is, at each step of a clustering process the two clusters are united for which the square of the distance separating them is least. This was done in the example in Section 2.6.
It should be noticed that although it is legitimate to use distance² as a clustering criterion, this is not equivalent to using distance² as a dissimilarity measure, since distance² is nonmetric. To see this, consider a numerical example. It is easy to construct a triangle with sides 3, 4, and 6 units long since 6 < 3 + 4. But, obviously, one cannot construct a triangle with sides 3², 4², and 6² units long since 6² > 3² + 4².
Euclidean distance, city-block distance, and percentage remoteness provide a more than adequate armory of dissimilarity measures for use whenever nonnormalized distances are required. It is now necessary to consider the topic of normalized versus nonnormalized ("raw") data and to consider whether, and if so how, ecological data and dissimilarity measures derived from them should be normalized for analysis.

Raw versus Normalized Data

Data are said to be normalized when every point is placed at the same distance from the origin of the coordinates so that all that distinguishes the points from one another is their direction from the origin. This is equivalent to disregarding the absolute quantities of each species and considering only the relative quantities.
To see why this is sometimes thought desirable, consider Figure 2.10. The points A, B, and C represent three quadrats laid down in a two-species community, and it is clear that d(A, B) < d(B, C). However, the relative proportions of the two species in quadrats B and C are identical. Their great dissimilarity, as measured by d(B, C), arises solely from the fact that the total quantity of the two species combined is much greater in C than in B.

[Figure 2.10. Points A (0.2, 0.5), B (0.7, 0.1), and C (2.1, 0.3) described in the text. Clearly, d(A, B) < d(B, C).]

Conversely, the very slight dissimilarity represented by the short distance

d(A, B) arises from the fact that both the quadrats A and B contain small amounts of both species; although the ratio of species 1 to species 2 is much greater in B than in A, this difference is not reflected in their dissimilarity as measured by d(A, B).
It can be argued that dissimilarity should be measured in a way that lays more stress on the relative proportions of the species in a quadrat and correspondingly less stress on absolute quantities, in other words, that the raw observations should be normalized. However, this is a matter of opinion and is one of the decisions that must be made before data are analyzed. It should be emphasized that there is no single answer to the question: Should data be normalized before analysis? Whatever the decision, it is a subjective choice. Some guidance towards making the choice is offered in the following. First we consider two ways of measuring the dissimilarity between a pair of data points so that only the relative proportions, not the absolute amounts, of the species are taken into account.
These dissimilarities are the chord distance and the geodesic metric. Both are metric measures. They are shown diagrammatically in Figure 2.11. As always, the simple, two-dimensional (two species) case is used for illustration, and the resulting formulas are then generalized to the s-dimensional (s species) case.
The chord distance is derived as follows (see Figure 2.11). Let the original data points be projected onto a circle of unit radius, and write A′ and B′ for the projections of points A and B. Then d(A′, B′), the Euclidean distance between A′ and B′, is the chord distance between A and B, which we shall denote by c(A, B). In the figure, since point C represents a quadrat in which the species are present in the same relative proportions as in quadrat
[Figure 2.11. Points A, B, and C are the same as in Figure 2.10. A′ and B′ are the projections of A and B onto a circle of unit radius. The chord distance and the "geodesic metric" (or geodesic distance) separating A and B are c(A, B) and g(A, B), respectively.]

B, the point C′ is identical with B′, so that d(A′, C′) = d(A′, B′) and d(B′, C′) = 0. Equivalently, c(A, C) = c(A, B) and c(B, C) = 0.
We now derive c(A, B) in terms of the coordinates of points A and B. Let these coordinates be (x_1A, x_2A) for point A and (x_1B, x_2B) for point B. (Recall that the first subscript always refers to the species and the second to the quadrat or other sampling unit.)
First, for brevity, put OA = a, OA′ = a′, OB = b, and OB′ = b′. Write θ for angle AOB. Obviously, AOB = A′OB′.
From the construction, we know that a′ = b′ = 1.
Applying Apollonius's theorem (see page 78) to triangles ABO and A′B′O, respectively, shows that

d²(A, B) = a² + b² − 2ab cos θ     (2.10)

and

c²(A, B) = a′² + b′² − 2a′b′ cos θ
         = 1² + 1² − 2 cos θ
         = 2(1 − cos θ).     (2.11)

Now use (2.10) to express cos θ in terms of x_1A, x_2A, x_1B, and x_2B.

From Pythagoras's theorem,

a² = x²_1A + x²_2A,   b² = x²_1B + x²_2B,

and

d²(A, B) = (x_1A − x_1B)² + (x_2A − x_2B)².

Then, since from (2.10)

cos θ = [a² + b² − d²(A, B)] / (2ab),

it follows that

cos θ = [(x²_1A + x²_2A) + (x²_1B + x²_2B) − ((x_1A − x_1B)² + (x_2A − x_2B)²)] / [2√((x²_1A + x²_2A)(x²_1B + x²_2B))].     (2.12)

Hence to evaluate c²(A, B) for any pair of points A and B with known coordinates, first find cos θ using (2.12) and then substitute the result in (2.11).
For example, in Figure 2.11 the coordinates of A and B are, respectively,

(x_1A, x_2A) = (0.2, 0.5)   and   (x_1B, x_2B) = (0.7, 0.1).

Therefore,

cos θ = [(0.2 × 0.7) + (0.5 × 0.1)] / √[(0.2² + 0.5²)(0.7² + 0.1²)] = 0.4990

from (2.12), whence c²(A, B) = 1.0021 and c(A, B) = 1.0010 from (2.11); also, θ = 1.0484 radians or 60.07°.
The equations can be directly generalized to the s-species case, when the data points form an unvisualizable swarm in a conceptual space of s dimensions (a coordinate frame with s mutually perpendicular axes). Thus

cos θ = [Σ x_iA x_iB] / {(Σ x²_iA)(Σ x²_iB)}^(1/2),   each sum taken over i = 1, ..., s,

and Equation (2.11) is as already given, whatever the number of species.
The maximum and minimum possible values for the chord distance between a pair of points in a space of any number of dimensions are √2 and 0, respectively. This follows from (2.11) and the fact that cos θ must lie in the range [−1, 1]. Thus when OA and OB are parallel, cos θ = 1, c²(A, B) = 0, and c(A, B) = 0; when OA and OB are perpendicular to each other, cos θ = 0, c²(A, B) = 2, and c(A, B) = √2.
Another obvious dissimilarity measure is the geodesic metric, shown as g(A, B) in Figure 2.11. It is the distance from A′ to B′ along the arc of the unit circle; to be exact, one should stipulate that the distance is to be measured along the shorter arc, not the longer arc formed by the rest of the circle. It is seen that, since the circle has unit radius, the arc distance g(A, B) is the same as the angle θ measured in radians. To find the angle, one must first evaluate cos θ = S_AB, say, and then find g(A, B) from g(A, B) = arccos S_AB.
S_AB is known as the cosine separation of the quadrats (Orlóci, 1978, p. 199). In the two-species example considered previously and shown in Figure 2.11, where the coordinates of A and B are (0.2, 0.5) and (0.7, 0.1), respectively, cos θ = S_AB = 0.4990, as already determined. Hence

g(A, B) = arccos S_AB = 1.0484 units of length.
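The chord distance and geodesic metric for the worked example are easily checked in code. The following Python sketch is illustrative only (not from the book) and uses the coordinates of A and B given above.

```python
import math

def cosine_separation(a, b):
    """S_AB = cos(theta), Equation (2.12) generalized to s species."""
    num = sum(ai * bi for ai, bi in zip(a, b))
    den = math.sqrt(sum(ai * ai for ai in a) * sum(bi * bi for bi in b))
    return num / den

A, B = (0.2, 0.5), (0.7, 0.1)
s_ab = cosine_separation(A, B)
chord = math.sqrt(2 * (1 - s_ab))    # Equation (2.11)
geodesic = math.acos(s_ab)           # arc length on the unit circle

print(round(s_ab, 4), round(chord, 4), round(geodesic, 4))   # 0.499, 1.001, 1.0484
```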


The range of possible values of θ, and hence of g(A, B), is from 0 to π/2 ≈ 1.571. In the simple two-species case illustrated in Figure 2.11 the geodesic metric is the length of an arc of the unit circle. In the three-species (three-dimensional) case the metric is the length of a geodesic on a sphere of unit radius, and the word geodesic has its customary meaning, namely, the shortest on-the-surface distance, or great circle distance, between two points on a sphere. In the s-species case the geodesic metric is a great circle on an s-dimensional hypersphere.

TABLE 2.11. DATA MATRIX #4. THE QUANTITIES OF TWO SPECIES IN ELEVEN QUADRATS.

Quadrat      1   2   3    4    5    6    7     8    9   10    11
Species 1    3   4   5   5.5   6    6   11  11.5   12   14  13.5
Species 2    3   7   7   5.5   4   6.5  11  13.5   13   11    15

EXAMPLE. We now examine the outcomes of clustering the same set of data twice, using the centroid clustering method both times. First, the data are left in raw form and Euclidean distance is used as the dissimilarity measure. Second, the data are normalized and the geodesic metric is used as the dissimilarity measure. The data (Data Matrix #4) are tabulated in Table 2.11 and plotted in Figure 2.12.
There are two "natural" clusters, but they differ from each other chiefly in the quantities of the two species they contain; the relative proportions of the species are not very different. Thus if the points represented randomly placed vegetation quadrats, one would infer that the area sampled was a mosaic of fertile areas and sterile areas, but that the vegetation in these two areas differed mainly in its abundance and hardly at all in its species composition. As one would expect, if clustering is done using Euclidean distance as the dissimilarity measure (upper dendrogram in the figure), the two natural clusters are separated clearly; whereas if one uses the geodesic metric as the dissimilarity measure (lower dendrogram), the two clusters are intermingled.
Which is "better" is obviously a matter of choice. Even the meaning of "better" is undefined unless the investigator has some definite object in view, that is, some clearly formulated question for which an answer is sought. Then whatever clustering method gives an unambiguous answer to the question (if any method does) is obviously the best.

Communities which differ from place to place only in overall abundance, and not at all in species composition, are most unlikely to be found in nature. For instance, the abundance of the marine macroalgae on a rocky seashore varies markedly with the exposure of the shore to waves; sheltered shores support a much larger crop than wave-battered shores. But these contrasted communities are not sparse and dense versions of the same species mixture; they differ, also, in species composition.
Likewise, the luxuriance of the ground vegetation in regions severely affected by air pollution is conspicuously less than that in clean areas. But it is not only less in amount; it is also much poorer in species.
Thus if clustering is done to disclose differences of an unspecified kind, raw data are better than normalized data. Differences in overall abundance are not "meaningless" and are not (usually) unaccompanied by at least some qualitative differences in the community of interest. Normalizing the data may inadvertently obscure real, but slight, differences among them at the same time as it (intentionally) obliterates the quantitative differences.
That is not to say, however, that there may not be situations in which normalization is called for; for example, one might wish to classify sample plots of the vegetation of a polluted area so as to disclose the probable prepollution clustering. Then, provided the qualitative differences among preexisting clusters exceeded the qualitative differences induced by pollution, normalization would be desirable. It might prevent differences in the quantity of vegetation in the sample plots from overriding, and masking, the qualitative differences that persisted from the prepollution period.
To repeat: whether to use raw or normalized data is always a matter of judgment.

[Figure 2.12. The data points of Data Matrix #4 (see Table 2.11) and two dendrograms obtained by clustering the data. Centroid clustering was used for both. The upper dendrogram was obtained using the Euclidean distance between each pair of raw data points as the measure of between-quadrat dissimilarity; the lower dendrogram used the geodesic metric.]

Presence-and-Absence Data

In some ecological investigations it often seems better simply to list the species present in each sampling unit than to attempt to measure or estimate the quantities. When this is done, the resulting data are known as presence-absence data, binary data, or (0, 1) data, and the elements in the data matrix consist entirely of 0s and 1s.
Suppose the community being sampled appears to vary appreciably from place to place; then for a given outlay of time and effort one may be able to acquire a larger amount of information, or more useful information, by examining many quadrats quickly rather than a few quadrats slowly and carefully; the quickest way to record a quadrat's contents is, of course, just to list the species in it. Again, suppose the organisms comprising the community vary enormously in size. They might range from tall trees to dwarf shrubs, for example. It might then be impossible to find a quadrat size that was large enough for use with the trees and, at the same time, small enough for it to be practicable to measure the amounts of each species of ground vegetation. In such a case, use of binary data overcomes the difficulty. As Goodall (1978a) has written, in highly heterogeneous communities, "quantitative measures add little useful information" to that yielded by a simple species list for each quadrat.
Now consider the graphic representation of a binary data matrix. In the simple, visualizable two- and three-species cases, all the data points must fall on the vertices of a square or a cube, respectively. This amounts to saying that there are respectively only four or eight possible positions for the data points in these cases. Thus, in the two-species case, the coordinates of the four possible data points are (0, 0), (0, 1), (1, 0), and (1, 1). In the three-species case, the eight possible data points have coordinates (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), and (1, 1, 1) (see Figure 2.13).
Now extend the argument to binary data from an s-species community, plotted (conceptually) in an s-dimensional coordinate frame. It is intuitively clear that the possible data points are the vertices of an s-dimensional hypercube, and the number of these vertices is 2^s.
There is no objection to applying the clustering methods described in earlier sections of this chapter to binary data. However, unless the total number of species is large, the result of a clustering process may seem somewhat arbitrary. This is because only a few values are possible for the distance separating any pair of points.
Consider the simple cases in Figure 2.13. In the two-species case the distance between any pair of noncoincident points must have one of two values, 1 or √2, depending on whether the two points are at the ends of a side of the square or at the ends of a diagonal. In the three-species case there are three possible nonzero distances: these are 1 if the two points are at the ends of one edge of the cube; √2 if they are at the ends of a diagonal of one of the square faces of the cube; √3 if they are at the ends of a diagonal that crosses the cube. Again, the argument can be generalized to the s-species case. When there are s species, the distance between any pair of noncoincident points must have one of only s distinct values, namely, 1, √2, √3, ..., √s. The distance is the square root of the number of species found in one or the other (but not both) of the quadrats represented by the points. Thus the distance between the points (1, 0, 1, 0, 1, 1, 1, 1, 0) and (0, 1, 0, 0, 1, 1, 1, 1, 1) in nine-dimensional space is √4 = 2, since there are four mismatches between these two lists. In the s-species case, if the two quadrats together contain all s species but have no species in common, then the distance between the points representing them is √s.

[Figure 2.13. All possible binary data points, and the distances separating them, in two-dimensional (upper) and three-dimensional (lower) coordinate frames.]
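With 0/1 data the Euclidean distance is therefore just the square root of the number of mismatches, as the following illustrative Python sketch (using the two nine-species lists above) shows.

```python
import math

p = (1, 0, 1, 0, 1, 1, 1, 1, 0)
q = (0, 1, 0, 0, 1, 1, 1, 1, 1)

mismatches = sum(pi != qi for pi, qi in zip(p, q))
print(mismatches, math.sqrt(mismatches))   # 4 mismatches, distance = 2.0
```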
It follows that if a clustering procedure starts with construction of a distance matrix (e.g., like that in Table 2.1, page 18), whose elements are the distances between every possible pair of points, then unless the number of species is very large there are likely to be several "ties" in the matrix. That is, several elements may all have the same value. If this also happens to be the smallest value, then several fusions become "due" simultaneously. The same thing happens with minimum variance clustering; if two or more elements in a criterion matrix (such as Q1 in Table 2.9, page 36) are equal to one another and smaller than all the others, then again the indicated fusions are due simultaneously. When this happens, the "due" fusions should be carried out simultaneously before the next distance matrix (or criterion matrix) is constructed; otherwise, errors will occur.
We now consider other ways of measuring the dissimilarity between pairs of quadrats (data points) when the data are in binary or (0, 1) form. The Euclidean distance between two points, which, as we have seen, is always the distance between two vertices of a hypercube, is not the only way of measuring the dissimilarity of the points. One can also use percentage dissimilarity PD and percentage remoteness PR, which were defined earlier (see page 43).
These measures can be calculated as already shown in Table 2.10 (page 44); alternatively, they can be derived from a 2 × 2 table, as we shall now see. Suppose a 2 × 2 table is constructed to permit two chosen quadrats (quadrats 1 and 2, say) from a sample of several quadrats to be compared. Assume that species lists have been compiled for all the quadrats sampled,
and some quadrats contain species that are not present in quadrats 1 and 2. In other words, of the total of s species represented in the data matrix, some are absent from both the quadrats being compared. Hence these "joint absences" can be specified and counted. Now consider the following 2 × 2 table:

                                     Quadrat 2 (number of species)
                                       Present        Absent
Quadrat 1             Present             a              b
(number of species)   Absent              c              d

Recall Equation (2.8) on page 43, the definition of percentage similarity PS. Clearly, when x_i1 and x_i2 are either 0 or 1 for all values of i (i.e., for all s species),

Σ min(x_i1, x_i2) = a,     (2.13)

the number of species present in both quadrats. Similarly,

Σ (x_i1 + x_i2) = (a + b) + (a + c) = 2a + b + c;     (2.14)

this is the number of species in quadrat 1 plus the number in quadrat 2, counting the a "joint presences" (species present in both quadrats) twice over.
Substituting from (2.13) and (2.14) into (2.8) gives

PS = 200 × a/(2a + b + c) = 100 × 2a/(2a + b + c)

as the percentage similarity between two quadrats when the data are in binary form. This is identical with Sørensen's similarity index (as a percentage), one of the best known and most widely used of the similarity indices available to ecologists. It follows that, with binary data, the percentage dissimilarity PD is the complement of Sørensen's index.
Next recall (2.9), the formula for Ružička's similarity index RI (page 44). The term in the denominator is

Σ max(x_i1, x_i2) = a + b + c;     (2.15)

this is the number of species in the two quadrats combined, not counting the joint presences twice.
Substituting from (2.13) and (2.15) into (2.9) gives

RI = 100 × a/(a + b + c).

This is Jaccard's index (as a percentage), the oldest similarity index used by ecologists (Goodall, 1978a) and as well known as Sørensen's. Thus with binary data the percentage remoteness PR is identical with the complement of Jaccard's index. Indeed, this complement (as a proportion rather than a percentage), namely,

1 − a/(a + b + c) = (b + c)/(a + b + c),

is known as the Marczewski-Steinhaus distance (Orlóci, 1978). It is the ratio of the number of "single" occurrences (species in one but not both of the two quadrats being compared) to the total number of species (those in one or other or both of the quadrats).
The numerical example in Table 2.12 illustrates the relationships among the various measures discussed. To summarize: when the data are binary, percentage dissimilarity PD is identical to the complement of Sørensen's index, and percentage remoteness PR is identical to the Marczewski-Steinhaus distance MS. (It is assumed that the measures are either all in the form of percentages or all in the form of proportions.)
This statement enables one to choose wisely between the competing measures. It has already been mentioned (page 45) that PR is metric and PD nonmetric. It follows that MS, which is no more than a particular form of PR, is metric; a proof, which is rather long, has been given by Levandowsky and Winter (1971). Similarly, the complement of Sørensen's index, which is no more than a particular form of PD, is nonmetric; Orlóci (1978, p. 61) demonstrates the truth of this with an example. Hence MS is the better dissimilarity measure of the two.


TABLE 2.12. ILLUSTRATION OF THE CALCULATION OF DISSIMILARITY MEASURES WITH BINARY DATA.

1. The presences and absences of 12 species in 2 quadrats (compare Table 2.10).

Species Number   Quadrat 1   Quadrat 2   min(x_i1, x_i2)   max(x_i1, x_i2)
 1                   1           1             1                 1
 2                   1           1             1                 1
 3                   0           0             0                 0
 4                   1           1             1                 1
 5                   0           1             0                 1
 6                   1           1             1                 1
 7                   1           1             1                 1
 8                   1           0             0                 1
 9                   0           1             0                 1
10                   1           1             1                 1
11                   0           1             0                 1
12                   0           0             0                 0
Totals               7           9             6                10

PS = 200 × 6/(7 + 9) = 75%. Therefore, PD = 25%.
RI = 100 × 6/10 = 60%. Therefore, PR = 40%.

2. The same data in the form of a 2 × 2 table.

                               Quadrat 2
                       Species present   Species absent
Quadrat 1   Species present   a = 6          b = 1
            Species absent    c = 3          d = 2

Complement of Sørensen's index = 100(b + c)/(2a + b + c) = 25%.
MS distance = 100(b + c)/(a + b + c) = 40%.
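The 2 × 2 frequencies and the derived indices can be computed directly from the two presence-absence lists, as in this illustrative Python sketch (the lists are the Table 2.12 columns).

```python
q1 = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0]   # quadrat 1, species 1-12 (Table 2.12)
q2 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0]   # quadrat 2

a = sum(x == 1 and y == 1 for x, y in zip(q1, q2))   # joint presences
b = sum(x == 1 and y == 0 for x, y in zip(q1, q2))   # present in quadrat 1 only
c = sum(x == 0 and y == 1 for x, y in zip(q1, q2))   # present in quadrat 2 only

sorensen_complement = 100 * (b + c) / (2 * a + b + c)   # = PD for binary data
ms_distance = 100 * (b + c) / (a + b + c)               # = PR for binary data

print(a, b, c)                              # 6 1 3
print(sorensen_complement, ms_distance)     # 25.0 40.0
```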
Since MS and the complement of Sørensen's index can both be expressed as functions of the cell frequencies in the 2 × 2 table given, it is interesting to enquire whether (assuming binary data) the Euclidean distance between two data points can also be expressed in terms of these frequencies. Recall that the distance between quadrats 1 and 2, d(1, 2), is the square root of the number of species that occur in one or the other, but not both, of the quadrats. Hence

d(1, 2) = √(b + c).

We must now compare the Euclidean distance d with the Marczewski-Steinhaus distance MS in an attempt to decide which (if either) is the better. Since both are metric, some other criterion is needed for judging between them.
The distinctive characteristic of MS is that it takes no account of species that are absent from both the quadrats being compared. This is regarded as a great advantage by ecologists who argue that presences and absences should not be given equal weight, especially for a community made up of sessile organisms. A "presence" conveys the unambiguous information that the species concerned can and does occur in the quadrat concerned, but an "absence" may mean either that the species cannot survive in the quadrat or that it is absent merely by chance. Thus a dissimilarity measure, or "distance," that ignores joint absences appears to have an advantage. Euclidean distance treats presences and absences equally, as demonstrated in the following.

[Figure 2.14. The Euclidean distance between points A and B and that between points C and D are equal. But the corresponding Marczewski-Steinhaus distances are not equal. See text.]
The disadvantage of MS, a fatal disadvantage according to Orlóci (1978, p. 62), is that it has no uniform scale of measure. This is most easily seen from Figure 2.14, which shows four data points A, B, C, and D in three-space. Clearly, the distance between points A and B is equal to the distance between points C and D, and both are equal to √2. That is,

d(A, B) = d(C, D) = √2.

This is obvious geometrically. The same result can be derived by constructing 2 × 2 tables for each pair of points; thus

    Pair (A, B)                       Pair (C, D)

              Quadrat A                          Quadrat C
                +    −                             +    −
Quadrat B  +    1    2             Quadrat D  +    0    2
           −    0    0                        −    0    1

(+ denotes presences and − absences).

Hence d(A, B) = d(C, D) = √(b + c) = √(2 + 0) = √2, as before.
Now consider the MS distances, say MS(A, B) and MS(C, D), between the two pairs of points. From the frequencies in the 2 × 2 tables,

MS(A, B) = (b + c)/(a + b + c) = (2 + 0)/(1 + 2 + 0) = 2/3

and

MS(C, D) = (2 + 0)/(0 + 2 + 0) = 1,

so that MS(A, B) ≠ MS(C, D).
The difference between MS(A, B) and MS(C, D) is due to the term a (the number of species present in both quadrats) in the denominator of MS. In general, even if two pairs of points have the same number of "mismatches" (b + c), the pair with the larger number of species for the combined pair (a + b + c) will seem to be the "closer" if MS distances are used as measures of dissimilarity.
In spite of the theoretical contrast between Euclidean distance and MS distance, the difference is probably unimportant in practice; it is unlikely to have much effect on the form of the dendrogram produced by a clustering procedure.

EXAMPLE. For instance, Figure 2.15 shows the results of applying nearest-neighbor clustering to Data Matrix #5 (see Table 2.13). The clustering was performed twice, once with Euclidean distances in the distance matrices (see Section 2.2), giving the dendrogram on the left, and once with MS distances in the distance matrices, giving the dendrogram on the right. As may be seen, they are very similar.

To conclude this section, here is a diagram showing the way in which the dissimilarity measures described are related. The measures in boldface are metric, those in italics nonmetric. The arrows lead to dissimilarities usable with binary data from their quantitative "parents."

  Binary                               Quantitative
                                Nonnormalized               Normalized

  Euclidean distance         <- Euclidean distance          Chord distance
  Marczewski-Steinhaus dist. <- Percent remoteness          Geodesic metric
  Sørensen complement        <- Percent dissimilarity

Recall that four of the dissimilarity measures described here are the complements of similarity measures. The way in which they are paired is listed below:

  Similarity Measure                Dissimilarity Measure

  Percentage similarity, PS         Percentage dissimilarity, PD
  Ružička's Index, RI               Percentage remoteness, PR
  Jaccard's Index                   Marczewski-Steinhaus distance, MS
  Sørensen's Index                  Complement of Sørensen's Index
                                    (no other name has been devised)
Figure 2.15. Two dendrograms produced by applying nearest-neighbor clustering to Data
Matrix # 5 (see Table 2.13). Euclidean distance was used as dissimilarity measure for the
dendrogram on the left, MS distance for the dendrogram on the right.

TABLE 2.13. DATA MATRIX #5. PRESENCES (1) AND ABSENCES (0) OF 10 SPECIES IN 8 QUADRATS.
Quadrat 1 2 3 4 5 6 7 8
Species 1 o 1 1 1 o o o
2 o 1 1 o
3 o
1 o o
1 1 1 o o o
4 1 o o o 1 1 1 o
5 1 1 1 1 1 o
6 1 1
1 1
7 1
1 1 o 1 1 o
1 1 o
8 o 1
1 o o
1 1 o
9 o 1
o o
1 1 o
10 o o o o
o o o o o


2.7. AVERAGE LINKAGE CLUSTERING

In Section 2.4 it was remarked that there are many possible ways of defining the distance between two clusters. In the clustering method described in that section (centroid clustering) the distance between two clusters is defined as the distance between their centroids. In this section we consider other definitions and their properties.

The Average Distance between Clusters

The most widely used intercluster distance is the average distance. This distance is most easily explained with the aid of a diagram; see Figure 2.16. The five data points with their coordinates given beside them show the amounts of species 1 and 2 in each of five quadrats. There are two obvious clusters, [A, B, C] and [D, E]. The average distance between these clusters, which will be written d_u([A, B, C], [D, E]), is defined as the arithmetic average of all distances between a point in one cluster and a point in the other. There are six such distances. Therefore,

d_u([A, B, C], [D, E]) = (1/6){d(A, D) + d(A, E) + d(B, D) + d(B, E) + d(C, D) + d(C, E)}.

Figure 2.16. Data points showing the quantities of two species in five quadrats, to illustrate the definition of d_u([A, B, C], [D, E]). See text.

Here, as always, d(A, D), for example, denotes the Euclidean distance between the two individual points A and D. Now

d(A, D) = √((4 - 15)² + (11 - 8)²) = 11.4018;

d(A, E) = √((4 - 16)² + (11 - 9)²) = 12.1655;

. . . . . . . . . . . . . . . . . . . . . . . .

and

d(C, E) = √((6 - 16)² + (10 - 9)²) = 10.0499.

It is easily found, after calculating all six interpoint distances, that

d_u([A, B, C], [D, E]) = 11.0079.
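The six distances and their average are quickly verified with a small Python sketch; the point coordinates are assumptions read off Figure 2.16 (with D taken as (15, 8), the value implied by the worked distances in the text):

```python
import numpy as np
from itertools import product

pts = {"A": (4, 11), "B": (4, 7), "C": (6, 10), "D": (15, 8), "E": (16, 9)}

def d(p, q):
    return np.linalg.norm(np.subtract(pts[p], pts[q]))

def d_u(cluster1, cluster2):
    # Unweighted average distance: mean of all between-cluster point-to-point distances
    return np.mean([d(p, q) for p, q in product(cluster1, cluster2)])

print(round(d_u(["A", "B", "C"], ["D", "E"]), 4))   # 11.0079
```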

Now consider the general case. We require an equation for the average distance between an m-member cluster [M_1, M_2, ..., M_m] and an n-member cluster [N_1, N_2, ..., N_n]. There are clearly mn point-to-point distances to be averaged. Therefore,

d_u([M], [N]) = (1/mn) Σ_{j=1}^{m} { Σ_{k=1}^{n} d(M_j, N_k) }

or, equivalently,

d_u([M], [N]) = (1/mn) Σ_{k=1}^{n} { Σ_{j=1}^{m} d(M_j, N_k) }.

Notice that the order of summation is immaterial and the large brackets are unnecessary. Thus we may write

d_u([M], [N]) = (1/mn) Σ Σ d(M_j, N_k),     (2.16)

it being understood that the summations are over all values of j and k.
Equation (2.16) is the symbolic form of the definition of average distance.
But when these distances are used to decide which pair of clusters should be
united at each of the successive steps of a clustering process, it is much more
economical computationally to derive each successive intercluster distance-
matrix from its predecessor rather than by using (2.16) which expresses each
distance in terms of the coordinates of the original data points. This may be
done as follows (Lance and Williams, 1966):
Consider three clusters [M_1, M_2, ..., M_m], [N_1, N_2, ..., N_n], and [P_1, P_2, ..., P_p] with m, n, and p members, respectively. In what follows, the clusters are represented by the more compact symbols [M], [N], and [P]. Suppose [M] and [N] are united to form the new cluster [Q], with q = m + n members. Then, from (2.16),

d_u([P], [Q]) = (1/pq) Σ_{j=1}^{q} Σ_{k=1}^{p} d(Q_j, P_k).     (2.17)

Now recall that the points belonging to the new cluster [Q] are M_1, M_2, ..., M_m, N_1, N_2, ..., N_n. Therefore, we can separate the right side of (2.17) into two components and write

d_u([P], [Q]) = (1/pq) Σ_{j=1}^{m} Σ_{k=1}^{p} d(M_j, P_k) + (1/pq) Σ_{j=1}^{n} Σ_{k=1}^{p} d(N_j, P_k).

Now multiply the first term on the right side by m/m = 1 and the second term by n/n = 1. This maneuver obviously does not alter the value of the expression. That is,

d_u([P], [Q]) = (m/q) (1/mp) Σ_{j=1}^{m} Σ_{k=1}^{p} d(M_j, P_k) + (n/q) (1/np) Σ_{j=1}^{n} Σ_{k=1}^{p} d(N_j, P_k).

From (2.16) it is seen that

(1/mp) Σ Σ d(M_j, P_k) = d_u([M], [P])     and     (1/np) Σ Σ d(N_j, P_k) = d_u([N], [P]).

Recall that q = m + n and that [Q] contains all the members of [M] and [N], the two clusters that were combined to form it. Thus

d_u([P], [Q]) = (m/(m + n)) d_u([M], [P]) + (n/(m + n)) d_u([N], [P]).     (2.18)
As a numerical example, consider the points in Figure 2.16 again. When these points are clustered by any method, it is obvious that points D and E will be united first and then points A and C. After these two fusions have been done there are three clusters, which will be labeled as follows:

[M] is the one-member cluster consisting of point B;
[N] is the two-member cluster consisting of points A and C;
[Q] is the three-member cluster consisting of points B, A, and C, formed by uniting clusters [M] and [N];
[P] is the two-member cluster consisting of points D and E.

Thus m = 1, n = 2, q = 3, and p = 2. From the definition of (2.16),

d_u([M], [P]) = (1/2){d(B, D) + d(B, E)} = 11.6054;
d_u([N], [P]) = (1/4){d(A, D) + d(A, E) + d(C, D) + d(C, E)} = 10.7092.

Hence, from (2.18),

d_u([P], [Q]) = (1/3) × 11.6054 + (2/3) × 10.7092 = 11.0079.

This is, as it should be, the same as d_u([A, B, C], [D, E]) as given on page 64.
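Continuing the sketch given after the six-distance calculation, the update rule (2.18) can be checked numerically (this is only an illustration of the formula, not a full clustering program):

```python
# d and d_u as defined in the earlier sketch
d_M_P = d_u(["B"], ["D", "E"])          # 11.6054
d_N_P = d_u(["A", "C"], ["D", "E"])     # 10.7092
m, n = 1, 2
print(round(m / (m + n) * d_M_P + n / (m + n) * d_N_P, 4))   # 11.0079, as in the text
```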

Unweighted and Weighted Distances as Clustering Criteria

The average distance between clusters [P] and [Q], previously denoted by d_u([P], [Q]), is not the only way of measuring intercluster distance. Recall
and compare equations (2.18) and (2.7) (page 32). They constitute two
different answers to the question: What is the distance between two clusters
given that the first has just been created by the fusion of two preexisting
clusters each of which was at a known distance from the second cluster?
(Observe that the question asked is not the simpler one: What is the
distance between two clusters? The reason is that the answer sought is a
formula for computing the elements of each distance matrix from its
predecessor.)
Let us write [Q] for the first cluster, [M] and [N] for the preexisting
clusters from which [ Q] was formed, and [ P] for the second cluster. The
numbers of points in these clusters are q, m, n, and p, respectively, with
q = m + n.
The answer to the preceding question depends on how intercluster distance is defined. As we shall see, the defining equations are sometimes expressed in terms of distance d, and sometimes of distance squared, d². To make the relationship among the definitions more apparent, the word "dissimilarity" is used here to mean either distance or distance², according to context. The symbol δ is used in the equations to denote either d or d², and after each equation its current meaning is specified.
If dissimilarity is defined as the average distance, the answer to the question is given by (2.18), rewritten with δ in place of d, namely,

δ_u([P], [Q]) = (m/(m + n)) δ_u([M], [P]) + (n/(m + n)) δ_u([N], [P]).     (2.19)

Here δ denotes d.
On the other hand, if dissimilarity is defined as the distance² between cluster centroids (i.e., as the squared centroid distance), the answer to the question becomes

δ_c([P], [Q]) = (m/(m + n)) δ_c([M], [P]) + (n/(m + n)) δ_c([N], [P]) - (mn/(m + n)²) δ_c([M], [N]).     (2.20)
This is Equation (2.7) with x² = δ_c([P], [Q]), a² = δ_c([M], [P]), b² = δ_c([N], [P]), and c² = δ_c([M], [N]). In (2.20) δ denotes d². The subscripts in δ_u and δ_c stand for "unweighted" and "centroid," respectively; δ_u may be described as the unweighted average distance.
Both these dissimilarities are described as unweighted because they
attach equal weight to every individual point. Therefore, the weight of a
cluster is treated as proportional to the number of points it contains. As a
result, the centroid (center of gravity) of a pair of clusters is not at the
midpoint between the centroids of the separate clusters but is closer to the
cluster with the larger number of members (see Figure 2. 7, page 30).
We now consider "weighted dissimilarities." These are defined in a way that attaches equal weight to every cluster, and hence unequal weights to the individual points. Therefore, the definitions are very easily obtained by setting m = n = 1 in Equations (2.19) and (2.20). Thus from (2.19) we get

δ_w([P], [Q]) = (1/2) δ_w([M], [P]) + (1/2) δ_w([N], [P]).     (2.21)

Here δ denotes d and the subscript w stands for "weighted"; d_w is the weighted average distance.
Similarly, (2.20) is replaced by

δ_m([P], [Q]) = (1/2) δ_m([M], [P]) + (1/2) δ_m([N], [P]) - (1/4) δ_m([M], [N]).     (2.22)

Here δ denotes d² and the subscript m stands for "median"; d_m is the median distance, sometimes known as the weighted centroid distance.
Equation (2.22) can be obtained directly by considering Figure 2.7. If we assume that, whatever the values of m and n, the centroid of the cluster formed by uniting [M] and [N] lies midway between them at distance c/2 from each, then it is clear that

x² = (1/2) a² + (1/2) b² - (1/4) c²,

from which (2.22) follows in the same way that (2.20) follows from (2.7).*
The Four Versions of Average Linkage Clustering

Four ways of measuring intercluster distance have now been described: the
unweighted average distance, the weighted average distance, the centroid
distance (unweighted), and its weighted equivalent, the median distance.
Each of these differently defined distances can be used as the basis of a
clustering process. At every step of such a process, the pair of clusters
separated by the smallest distance (using whichever definition of distance
has been chosen) is united.
The four clustering methods that use these distances are known, collec-
tively, as average linkage clustering. Centroid clustering, described in detail
in Section 2.4, is one of the four. The relationships among the four are most
clearly shown by arraying them in a 2 X 2 table thus (below the name of
each method is given the number of the equation to be used in the
computations):

                          Intercluster Distance
                  Average of                  Distance between
                  interpoint distances        centroids

  Unweighted      Unweighted group            Centroid method
                  average method (2.19)       (2.20)

  Weighted        Weighted group              Median method
                  average method (2.21)       (2.22)

The methods were named by Lance and Williams (1966).*

* For summary definitions of these four measures of the dissimilarity between two clusters, one of which has been formed by uniting two preexisting clusters, the reader is referred to the Glossary; see under Average Linkage Clustering Criteria.
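The four update equations are easily written as functions. The sketch below is a minimal illustration of Equations (2.19)-(2.22) only; it is not a complete clustering program, and the function names are invented for the example:

```python
def unweighted_group_average(d_MP, d_NP, d_MN, m, n):
    # Eq. (2.19); here the dissimilarity is a distance d
    return m / (m + n) * d_MP + n / (m + n) * d_NP

def centroid(d2_MP, d2_NP, d2_MN, m, n):
    # Eq. (2.20); here the dissimilarity is a squared distance d^2
    return m / (m + n) * d2_MP + n / (m + n) * d2_NP - m * n / (m + n) ** 2 * d2_MN

def weighted_group_average(d_MP, d_NP, d_MN, m=None, n=None):
    # Eq. (2.21); cluster sizes are ignored
    return 0.5 * d_MP + 0.5 * d_NP

def median(d2_MP, d2_NP, d2_MN, m=None, n=None):
    # Eq. (2.22); the weighted analogue of the centroid rule
    return 0.5 * d2_MP + 0.5 * d2_NP - 0.25 * d2_MN

# With m = 1, n = 2 and the distances of the Figure 2.16 example:
print(unweighted_group_average(11.6054, 10.7092, None, 1, 2))   # about 11.0079
```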
Three points should be noticed before examples are given; they are discussed in the following paragraphs.

1. All four methods have the great computational advantage of being combinatorial (Lance and Williams, 1966). That is, once the distances between every pair of points in the original swarm of data points have been computed and entered in a distance-matrix, the coordinates of the points are not needed again. Each succeeding distance-matrix is calculable from its predecessor, using the appropriate equation as indicated in the preceding table.
2. All four methods can quite easily be carried out using either d or d² in place of δ in Equations (2.19), (2.20), (2.21), and (2.22). Thus each method can be made to yield two different dendrograms since d and d² do not give the same results. However, there seems to be no good reason for using d² rather than d for either of the group average methods. For the centroid and median methods, on the other hand, d² is preferable to d as clustering criterion. As noted on page 32, Equations (2.7) and, likewise, (2.20) and (2.22) with δ set equal to d², have a definite geometric meaning; this is not so if δ is set equal to d. With the centroid and median methods, therefore, it is best to use values of d² as clustering criteria (i.e., to unite the cluster pair for which d² is a minimum at each step). But this does not amount to using d² as a dissimilarity measure. Rather, one uses d as dissimilarity measure and its square as clustering criterion.
3. It is worth noticing that the terms "weighted" and "unweighted"
have been used here as Lance and Williams (1966) and Sneath and Sokal
(1973) use them. Their usage is unexpected and apt to mislead unless it is
remembered that the word "unweighted" applies to the original data points
(see Gower, 1967).

EXAMPLE. Figure 2.17 shows the dendrograms obtained by applying the four clustering methods to Data Matrix #6 (see Table 2.14). The clustering criterion was d for the group average methods and d² for the centroid and median methods. The heights of the nodes in all four dendrograms are equal to values of d. The dendrogram produced by unweighted group average clustering is noticeably different from the other three. It does not follow, however, that this will be true with other data matrices.
The unweighted group average method is probably the clustering procedure most widely used by ecologists. To mention only a single example, it
Figure 2.17. Four dendrograms produced by applying different forms of average linkage clustering to Data Matrix #6 (see Table 2.14). (A) centroid clustering; (B) median clustering; (C) unweighted group average clustering; (D) weighted group average clustering. The clustering criterion is d² for A and B, and d for C and D.

was used by Strauss (1982) to cluster 43 species of fish occurring in the


Susquehanna River drainage of Pennsylvania (this is an example of Q-type
clustering; see page 8). As Strauss remarks, "any clustering technique might
have been used."
It is, unfortunately, true that no one clustering method is better than all
the others in every respect. To choose a method wisely, it is necessary to

TABLE 2.14. DATA MATRIX #6. THE QUANTITIES OF 2 SPECIES IN 10 QUADRATS.
8 9 10
Quadrat 4 5 6 7
1 2 3
54 64 66 82
Species 1 33 32 34 51
15 21 20 15
27 45 42
Species 2 75 72 58 46 32
balance the advantages and disadvantages of each and decide which advantages are most desirable and which disadvantages can be tolerated. The decision is often difficult; choosing the best trade-offs in a given context is always, in the end, somewhat subjective. We now discuss some of the most crucial decisions.

2.8. CHOOSING AMONG CLUSTERING METHODS

Seven clustering techniques have been described in this chapter: nearest- and farthest-neighbor clustering, minimum variance clustering, and the four forms of average linkage clustering among which centroid clustering is included. There are many other, less well-known methods, devised for special purposes; accounts of them may be found in more advanced books such as Orlóci (1978), Sneath and Sokal (1973), and Whittaker (1978b). One or another of the last five methods described in this chapter should meet the needs of ecologists in all but exceptional contexts. It remains to compare the methods with one another.

Nearest- and Farthest-Neighbor Clustering

These are rarely used nowadays. In these methods the two clusters to be united at any step are determined entirely by the distance between two individual data points, one in each cluster. Thus a cluster is always represented by only one of its points; moreover, the "representative point" (a different one at each step) is always "extreme" rather than "typical" of the cluster it represents.

Minimum Variance Clustering

This is a useful technique when there is reason to suspect that some (or all) of the quadrats belong to one or more homogeneous classes. For example, suppose data had been collected by sampling, with randomly placed quadrats, a rather heterogeneous tract of forest and scrub. One might be uncertain whether all the quadrats should be thought of as unique or whether, on the contrary, they formed several distinct classes, with all the quadrats in any one class constituting a random sample from the same population. In the former case every node in a clustering dendrogram is interesting and reveals (it is hoped) "true" relationships among dissimilar things. In the latter case the first few fusions do no more than unite groups of quadrats that are not truly distinct from one another; the differences among the quadrats within such a group are due entirely to chance, and the order in which they are united is likewise a matter of chance.
With minimum variance clustering it is possible to do a statistical test of each fusion in order to judge whether the points (or clusters) being united are homogeneous (replicate samples from a single parent population) or heterogeneous (samples from different populations). This is equivalent to judging, objectively, the "information value" of each node in a dendrogram. Thus if the lowermost nodes represent fusions of homogeneous points or clusters, they have no information value; obviously, it is useful to distinguish them from nodes representing the fusions that do convey information about the relationships among the clusters and about their relative ecological "closeness." The reader is referred to Goodall (1978b, p. 270) or Orlóci (1978, p. 212) for instructions on how to do the test, which is beyond the scope of this book.
Minimum variance clustering, like farthest-neighbor clustering, tends to give clusters of fairly equal size. If a single data point is equidistant from two cluster centroids and the clusters do not have the same numbers of members, then the data point will unite with the less populous cluster (proved in Orlóci, 1978). The result is that, as clustering proceeds, small clusters acquire new members faster than large ones and chaining is unlikely to happen. This is a great advantage when clustering is done to provide a descriptive classification, for mapping purposes, for instance. Of course, it does not follow that a nicely balanced dendrogram gives a truer picture of ecological relationships than a straggly one.

Average Linkage Clustering

Turning now to the four average linkage clustering techniques, the first choice that must be made is between unweighted and weighted methods. In the great majority of cases an unweighted method, which assigns equal weight to each data point and hence weights each cluster according to its size, is better. But if one were studying a mixture of communities and knew that they were very unequally represented in the data, then a weighted method, which assigns equal weight to the clusters irrespective of their sizes, would be useful; it would prevent the abundantly sampled community from having an overly large influence on the shape of the dendrogram. How large is "overly large" is, of course, a question of judgment. Choosing wisely between weighted and unweighted clustering is not always easy but, when in doubt, unweighted clustering is to be preferred.
It remains to choose between group average clustering and centroid clustering (or its weighted equivalent, median clustering). The pros and cons are very evenly divided. Each method has a notable advantage that the other lacks and, at the same time, a notable weakness which is a consequence of the advantage.
The strong point of centroid clustering is that each cluster, as it is formed, is represented by an exactly specifiable point, its centroid, and the distance between two clusters is the distance between their centroids. In group average clustering, there is no such geometrical realism: the clusters cannot be identified with precise representative points and, therefore, the concept of intercluster distance is unavoidably fuzzy. The device of using the average of all interpoint distances between two clusters as a measure of intercluster distance is just that, a device.
The weakness of centroid clustering, a weakness not shared by group average clustering, is that it is not monotonic. This term is most easily explained with a figure (Figure 2.18). The upper panel shows six data points and their coordinates in a two-dimensional space. Below are two dendrograms obtained from the data. The dendrogram on the left results from centroid clustering with d² as the clustering criterion; the scale shows the square root of the d² value corresponding to each node. The dendrogram on the right results from group average clustering with d as criterion.
As may be seen, the centroid clustering dendrogram contains two so-called reversals. For example, the height of the node (the intercluster distance) representing the fusion of E with [D, F] is less than that representing the fusion of points D and F. This is because (see the upper panel) although D and F are nearer to each other than either is to E, so that D and F are united first, the centroid of the new cluster after the fusion (the hollow dot labeled [D, F]) is nearer to E than either of its component points were before the fusion. The distances are easily found to be

d(E, D) = 9.604;   d(E, F) = 9.002;   d(D, F) = 8.944;   and   d(E, [D, F]) = 8.163.

Clearly,

d(E, [D, F]) < d(D, F) < d(E, D) and d(E, F).

The reversal where B joins [A, C] has the same cause.
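The reversal is easy to reproduce numerically; the coordinates of D, E, and F below are assumptions read off Figure 2.18, chosen because they reproduce the distances just quoted:

```python
import numpy as np

D, E, F = np.array([26.0, 10.0]), np.array([34.2, 15.0]), np.array([34.0, 6.0])
dist = lambda p, q: np.linalg.norm(p - q)

print(round(dist(D, F), 3))             # 8.944 -> D and F are united first
print(round(dist(E, (D + F) / 2), 3))   # 8.163 -> E is now closer to the new centroid: a reversal
```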
There are no reversals in the dendrogram on the right.
Indeed, it can be proved that reversals cannot occur in a dendrogram obtained by the group average clustering methods (Lance and Williams, 1966), and hence these methods are preferred by those who regard reversals in a dendrogram as a fatal defect.
If a clustering method is incapable of giving reversals, the measure of intercluster distance that it uses is said to be ultrametric; with an ultrametric measure, the sequence of intercluster distance values between the pair of clusters united at each successive fusion is always a monotonically
Figure 2.18. Illustration of how reversals appear in a centroid clustering dendrogram (left) although they are absent from the group average clustering dendrogram constructed from the same data (right). The six data points that were clustered are plotted at the top. The hollow dots are the centroids of clusters [A, C] and [D, F]. Note the scales of the dendrograms, which have been adjusted to make the reversals conspicuous.
(continuously) increasing sequence. Therefore, the clustering method is called monotonic. In group average clustering the measure of intercluster distance (the average of all the point-to-point distances between a point in one cluster and a point in the other) is ultrametric; hence the method is monotonic and the dendrograms it gives are free of reversals.
If a clustering method can give reversals, the measure of intercluster distance that it uses is not ultrametric and the method is not monotonic. In centroid clustering, the measure of intercluster distance (the distance between the two cluster centroids) is not ultrametric, as Figure 2.18 shows. Hence centroid clustering is nonmonotonic and it can give reversals.
In sum: group average clustering gives clusters with undefined centers and monotonic dendrograms; centroid (including median) clustering gives clusters with exactly defined centers and dendrograms that may contain reversals. It is logically impossible to have the best of both worlds, if a guaranteed absence of reversals is indeed "best." Reversals, where they occur, suggest that the difference between the clusters being united is negligible; unfortunately, one cannot make the converse inference, that an absence of reversals implies distinctness of all the clusters.

2.9. RAPID NONHIERARCHICAL CLUSTERING

All the clustering techniques so far described in this chapter have been hierarchical. A hierarchical clustering procedure does more than merely unite data points into clusters. It performs the fusions in a definite sequence and, therefore, the outcome can be displayed as a dendrogram, enabling one to discern the different degrees of relationship among the points.
With very large data matrices, it is often desirable to do a nonhierarchical clustering of the data points (quadrats or other sampling units) as a preliminary to hierarchical clustering. There are several reasons for doing this:

1. Hierarchical clustering by any method makes heavy demands on computer time and memory. For very large bodies of data, a computationally more economical procedure is desirable.
2. A dendrogram with a very large number (100 or more, say) of ultimate branches is too big to comprehend.
3. A large data matrix usually contains data from numerous replicate sampling units within each of the distinguishably different communities whose relationships are being investigated. Hence the earliest fusions in a hierarchical clustering are likely to be uninformative. They merely have the effect of pooling replicate quadrats, and the order of the fusions which bring this pooling about is of no interest.

Therefore, it is desirable to subject very large data matrices to nonhierarchical clustering at the outset of an analysis. The clustering should be done by as economical a method, in computational terms, as possible. Also, so far as possible, the clusters it defines should be homogeneous. This preliminary clustering should have the effect of condensing a large data matrix. It should permit batches (or "pools") of replicate quadrat records in the large matrix to be replaced by the average for each pool. Then the centroids of these batches, or pools, of virtually indistinguishable quadrats can become data points for a hierarchical clustering that will reveal their relationships.
It is unfortunate that the word clustering is at present used both for the hierarchical clustering procedures discussed in earlier sections of this chapter, and also for the rapid, preliminary nonhierarchical clustering that we are considering now. The objectives of the two operations are entirely different. Nonhierarchical clustering is just a way of boiling down unmanageably large data matrices in order to make hierarchical clustering (or other analyses) computationally feasible and ecologically informative. To avoid ambiguity, there should be different names for the two operations; perhaps rapid nonhierarchical clustering could be called pooling, since that is what it does, and a cluster defined by such a process could be called a pool, as in the preceding paragraph. These newly coined terms are used in what follows.
Several methods of data pooling have been devised; probably the best is Gauch's (1980, 1982a) technique which he calls "composite clustering." A computer program for doing it is available (Gauch, 1979). In outline, the process is as follows.
There are two phases. In the first phase, points are selected at random from the swarm of data points, and all other points within a specified radius of each selected point are assigned to a pool centered on that point. The random points that act as pool centers (they are not, of course, centroids) are chosen one after another. The earliest pools are hyperspherical in shape
(circular in the two-dimensional case). Later pools are not allowed to overlap earlier pools (i.e., a point must remain a member of the pool it joined first) and hence these later pools tend to be small and "spikey" since they occupy the interstices among earlier formed pools. Therefore, the procedure has a second phase in which pools with fewer than a specified quota of points are broken up. Their member points are reassigned to the nearest large pool, provided that they are within a predetermined distance of it; i.e., on the second round the radii of some pools are slightly increased. The number of pools formed is under the control of the investigator, who must choose the radius to be used at each phase of the process. The smaller the radii, the smaller and more numerous the pools, and the more confident one can be that they are homogeneous. Points that fail to become members of any pool are rejected as "outliers." After the pooling has been completed, each pool can be replaced by an average point (the centroid of all the points in the pool) and these average points then become the data points of further investigations.
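The sketch below is a much simplified illustration of the two-phase idea, not Gauch's program; the function and parameter names are invented for the example, and the reassignment rule is only one of several reasonable choices:

```python
import numpy as np

def pool(points, radius, min_size, radius2):
    """Phase 1: random centers collect all unassigned points within `radius`.
    Phase 2: pools smaller than `min_size` are broken up; their members join the
    nearest large pool if within `radius2`, otherwise they become outliers."""
    rng = np.random.default_rng(0)
    unassigned, pools = set(range(len(points))), []
    while unassigned:
        centre = int(rng.choice(sorted(unassigned)))
        members = [i for i in unassigned
                   if np.linalg.norm(points[i] - points[centre]) <= radius]
        pools.append(members)
        unassigned -= set(members)
    big = [p for p in pools if len(p) >= min_size]
    outliers = []
    for p in (p for p in pools if len(p) < min_size):
        for i in p:
            d = [min(np.linalg.norm(points[i] - points[j]) for j in q) for q in big] if big else []
            if d and min(d) <= radius2:
                big[int(np.argmin(d))].append(i)
            else:
                outliers.append(i)
    # Each surviving pool is then replaced by its centroid for hierarchical clustering
    return [points[p].mean(axis=0) for p in big], outliers
```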

APPENDIX

Apollonius's Theorem

To Prove:

c² = a² + b² - 2ab cos θ,

where c = c_1 + c_2 and θ = θ_1 + θ_2.

Proof: First recall that

cos(θ_1 + θ_2) = cos θ_1 cos θ_2 - sin θ_1 sin θ_2.

Observe, from the figure, that

h = a cos θ_1 = b cos θ_2;   h² = ab cos θ_1 cos θ_2;

c_1 = a sin θ_1   and   c_2 = b sin θ_2.

Also,

c_1² = a² - h²   and   c_2² = b² - h².

Now

c² = (c_1 + c_2)² = c_1² + c_2² + 2c_1 c_2
   = (a² - h²) + (b² - h²) + 2ab sin θ_1 sin θ_2
   = a² + b² - 2(h² - ab sin θ_1 sin θ_2)
   = a² + b² - 2ab(cos θ_1 cos θ_2 - sin θ_1 sin θ_2)
   = a² + b² - 2ab cos θ.
QED
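A quick numerical spot-check of the identity (the values are chosen arbitrarily, with b recomputed so that both expressions for h agree):

```python
import numpy as np

a, th1, th2 = 3.0, 0.4, 0.7
h = a * np.cos(th1)
b = h / np.cos(th2)                      # ensures a*cos(th1) == b*cos(th2) == h
c = a * np.sin(th1) + b * np.sin(th2)    # c = c1 + c2
print(np.isclose(c**2, a**2 + b**2 - 2*a*b*np.cos(th1 + th2)))   # True
```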

EXERCISES
2.1. Given the following data matrix X, what is the Euclidean distance between: (a) points 1 and 5; (b) points 2 and 3; (c) points 3 and 5? (Each point is represented by a column of X, which gives its coordinates in four-space.)

l~ 8 2 -1
o 4 -2
3 -1 6
X=
-2 -4 -2 o
2.2. Cluster the five quadrats in Exercise 2.1 using farthest-neighbor clustering. Tabulate the results in a table like Table 2.5.

2.3. Suppose the data points whose coordinates are given by the columns of X in Exercise 2.1 were assigned to two classes: [1, 2] and [3, 4, 5]. Find the coordinates of the centroid of each of these classes. What is the distance between the two centroids?

2.4. Let M, N, and P denote the centroids of clusters of points in five-space with, respectively, m = 5, n = 15, and p = 6 members. Find the distance² between P and the centroid of the cluster formed by uniting clusters [M] and [N]. The coordinates of points M, N, and P are as follows:

      M    N    P
1 -5 10
-8 2 11
8 5 -2
9 1 -5
7 4 -6

2.5. What is the within-cluster dispersion of the swarm of five points whose coordinates are given by X in Exercise 2.1?

2.6. For the two points in six-space whose coordinates are given by the columns of the following 6 × 2 matrix, find: (a) the chord distance; (b) the geodesic metric; (c) the angular separation between the two points.

3 -1
4 -3
-2 -4
1 o
5 4
2.7. Obtain Jaccard's and Sørensen's indices of similarity (J and S) for the following three pairs of quadrats in Data Matrix #5 (Table 2.13, page 62): (a) quadrats 1 and 2; (b) quadrats 3 and 4; (c) quadrats 2 and . Prove that S must always exceed J except when S = J = 1.

2.8. The columns of the following matrix give the coordinates in four-space of seven points grouped into clusters as shown.

[Q]
[M] [N] [P]
~ ~ ~
M1 M2 N1 N2 N3 PI P2
3
3
-1
8
7
9
5
7
5
6
-3
-2
o
-1)
-1
[! o
8
6
8
9 6 1
4
-2

Find the following measures of the dissimilarity between clusters [P] and [Q]: (a) the unweighted average distance; (b) the centroid distance; (c) the weighted average distance; (d) the median distance.

Chapter Three

Transforming Data Matrices

3.1. INTRODUCTION

This chapter provides an elementary introduction to the mathematics neces-


sary for an understanding of the ordination techniques described in Chapter
4. But to begin, it is desirable to demonstrate a crude form of ordination to
show what the purpose of ordination is and how this purpose is achieved.
Consider Data Matrix #7 (Table 3.1) which shows the quantities of four
species in six quadrats (or other sampling units ). Suppose one were asked to
list the .quadrats "in order" or, equivalently, to rank them. Clearly, there is
no "natural" way to do this; the data points do not have any intrinsic order.
The task would be simple if one species only had been recorded; then the
quadrats could be ranked in order of increasing (or decreasing) quantity of
the single species. When two or more species are recorded for each quadrat,
however, the data points do not, usually, fall in a natural sequence. Although such natural ordering is not logically impossible, in practice one is far more likely to find that a set of observed data points represents a diffuse swarm in a space of many dimensions. Therefore, if one wishes to rank the points, it is necessary first to prescribe some method of assigning a single numerical score to each quadrat. Then, and only then, can the points (quadrats) be ranked, using the scores to decide the ranking.
To illustrate, suppose the quantities in Data Matrix #7 are cover values of four species of forest plants. Let species 1 be a canopy tree, species 2 a

subdominant tree, species 3 a tall shrub, and species 4 a low shrub. One way to assign a score to each quadrat would be simply to add the cover values of all four species. Thus, letting x_ij denote the cover of species i in quadrat j, the score of quadrat j is (x_1j + x_2j + x_3j + x_4j). Using this scoring method, the scores of quadrats 1 through 6 are found to be, respectively,

118   183   83   150   130   124;

the ranking of the quadrats, from that with the smallest to that with the largest score, is then

#3, #1, #6, #5, #4, #2.

Alternatively, one might choose to weight the species according to their sizes instead of treating them all equally. There are infinitely many ways in which this could be done. For example, one might assign to quadrat j the score (4x_1j + 3x_2j + 2x_3j + x_4j). Using this formula, the scores of quadrats 1 through 6 are, respectively,

335   340   221   400   234   375

and the ranking of the quadrats becomes

#3, #5, #1, #2, #6, #4.

This is a different order from that given in the preceding paragraph. Both lists are ordinations of the data in Data Matrix #7, and the fact that they are different shows that the result of an ordination depends on the method chosen for assigning scores to the quadrats or, equivalently, on the weights assigned to the different species.

TABLE 3.1. DATA MATRIX #7. THE QUANTITIES OF s = 4 SPECIES IN n = 6 QUADRATS.

  Quadrat      1    2    3    4    5    6

  Species 1   50   20   25   45   15   60
  Species 2   11   16   20   33   14   17
  Species 3   45   65   23   49   31   37
  Species 4   12   82   15   23   70   10
The various ordination techniques described in Chapter 4 are all procedures for determining these weights objectively instead of choosing them arbitrarily and subjectively, as was done in the preceding.
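A minimal Python sketch of the two scorings just described (Data Matrix #7 as in Table 3.1):

```python
import numpy as np

X = np.array([[50, 20, 25, 45, 15, 60],     # species 1
              [11, 16, 20, 33, 14, 17],     # species 2
              [45, 65, 23, 49, 31, 37],     # species 3
              [12, 82, 15, 23, 70, 10]])    # species 4

for u in (np.array([1, 1, 1, 1]), np.array([4, 3, 2, 1])):
    scores = u @ X                        # one score per quadrat
    ranking = np.argsort(scores) + 1      # quadrat numbers, smallest score first
    print(scores, ranking)
# [118 183  83 150 130 124] [3 1 6 5 4 2]
# [335 340 221 400 234 375] [3 5 1 2 6 4]
```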

3.2. VECTOR AND MATRIX MULTIPLICATION

The operation just performed, that of transforming a data matrix to a list of scores that can be ranked, can be represented symbolically.

Vector × Matrix Multiplication

Let us write X for the data matrix which in the previous example is an array of numbers (elements) arranged in four rows and six columns enclosed in large parentheses. It is a 4 × 6 matrix:

X = ( x_11  x_12  x_13  x_14  x_15  x_16 )
    ( x_21  x_22  x_23  x_24  x_25  x_26 )
    ( x_31  x_32  x_33  x_34  x_35  x_36 )
    ( x_41  x_42  x_43  x_44  x_45  x_46 ).

The element in the ith row and jth column is written x_ij. Notice that the first subscript in x_ij is the number of the row in which the element appears, and the second subscript is the number of the column; this rule is invariable and is adhered to by all writers. In this book, and in most but not all ecological writing, data matrices are so arranged that the rows represent species and the columns represent sampling plots or quadrats. Therefore, when this system is used, x_ij means the amount of species i in quadrat j.
The single symbol X denotes the whole matrix, made up in this case of 24 distinct numbers. It does not denote a single number (a scalar). Boldface type is used for X to show that it is a matrix, not a scalar.
Now let us write y' for the list of six scores; y' is a matrix with only one row, otherwise known as a row vector. It is

y' = (y_1, y_2, y_3, y_4, y_5, y_6).

As before, the boldface type shows that y' is a matrix. The lowercase letter (y, not Y) shows that it is a vector (a matrix with only one row, or only one column). The prime shows that it is a row vector. If the same array of elements were written as a column instead of a row, they would form a column vector and be denoted by y (without a prime).
Finally, let us write u' for the list of coefficients by which each element in a column of X is to be multiplied to yield an element of y'; u' is a row vector and the number of elements it contains must obviously be the same as the number of elements in a column of X. (The number of elements in a column of X is, of course, the number of rows of X.) Hence u' = (u_1, u_2, u_3, u_4). Recall again the example in Section 3.1. The first list of quadrat scores, namely,

y' = (118, 183, 83, 150, 130, 124),

was obtained by adding the elements in each column of X. That is, the score for the jth quadrat was given by

y_j = x_1j + x_2j + x_3j + x_4j = Σ_{i=1}^{4} x_ij.

Alternatively, this can be written as

y_j = u_1 x_1j + u_2 x_2j + u_3 x_3j + u_4 x_4j

with u_1 = u_2 = u_3 = u_4 = 1. Therefore, these scores were obtained using the row vector

u' = (1, 1, 1, 1).

The elements in the second list of scores, namely,

y' = (335, 340, 221, 400, 234, 375),

were obtained from the same formula (y_j = Σ_{i=1}^{4} u_i x_ij) but with u_1 = 4, u_2 = 3, u_3 = 2, and u_4 = 1. Hence in the second case we had

u' = (4, 3, 2, 1).


The operation by which y' was obtained from u' and X in the two cases is a form of matrix multiplication. It is called the multiplication of a matrix by a row vector (or one-row matrix). Thus y' is the product of u' and X. Written as an equation, this is

u'X = y'.

In words: the coefficient vector u' times the data matrix X is the score vector y'. This is identical in meaning to the much clumsier equation

(u_1, u_2, u_3, u_4) ( x_11  x_12  x_13  x_14  x_15  x_16 )
                     ( x_21  x_22  x_23  x_24  x_25  x_26 )  =  (y_1, y_2, y_3, y_4, y_5, y_6).
                     ( x_31  x_32  x_33  x_34  x_35  x_36 )
                     ( x_41  x_42  x_43  x_44  x_45  x_46 )

This extended version of the equation u'X = y' is itself a condensed form of six separate equations, of which the first and last are:

u_1 x_11 + u_2 x_21 + u_3 x_31 + u_4 x_41 = y_1
. . . . . . . . . . . . . . . . . . . . . . . .
u_1 x_16 + u_2 x_26 + u_3 x_36 + u_4 x_46 = y_6.

Thus the rule for calculating each of the six elements of y', that is, for calculating the elements in the product u'X, is the formula already given:

y_j = Σ_{i=1}^{4} u_i x_ij    for j = 1, 2, ..., 6.

To generalize: suppose an s × n data matrix X records the amounts of s species in n quadrats. Let u' be an s-element row vector (i.e., a 1 × s matrix) of weighting coefficients; these are the weights to be assigned to each species in order to calculate the score for a quadrat. Let the resultant scores be listed in the n-element row vector y'. Then

u'X = y'     (3.1)

in which

y_j = Σ_{i=1}^{s} u_i x_ij    for j = 1, 2, ..., n.

Let us rewrite Equation (3.1) with the sizes of the three matrices shown below them:

   u'       X     =    y'.
(1 × s)  (s × n)     (1 × n)

For the multiplication to be possible, the number of columns in the first factor, u', must be the same as the number of rows in the second factor, X. Since u' has s columns and X has s rows, the product y' = u'X can indeed be formed. It has the same number of rows as the first factor, u', and the same number of columns as the second factor, X. In other words, the size of y' is 1 × n.
As should now be clear, the factors in a matrix product must appear in correct order. Equation (3.1) cannot be written as Xu' = y'. The product Xu' does not exist, since n, the number of columns in X, is not equal to 1, the number of rows in u'.
The product u'X is described as X premultiplied by u' or as u' postmultiplied by X.

Matrix × Matrix Multiplication

The preceding paragraphs showed how to premultiply a data matrix X by a vector u' of weighting coefficients to obtain a vector y' of quadrat scores. Before proceeding, it is worthwhile to recall the purpose of the operation. It is to replace a large, perhaps confusing, data matrix by a list of scores that is much more easily comprehended. To put the argument in geometric terms, an s × n data matrix is equivalent to a swarm of n points in s-dimensional space. Therefore, unless s ≤ 3, the swarm is impossible to visualize. However, if the original data matrix is transformed into a list of "quadrat scores" by the procedure previously described, the multidimensional swarm of points is transformed into a one-dimensional row of points that can easily be plotted on one axis to make a one-dimensional graph.
Reducing an s-dimensional swarm to a one-dimensional row entails considerable sacrifice of information, of course. This raises the question: Need multidimensional data be so severely condensed to make them comprehensible? The answer is obviously no. A two-dimensional swarm (a conventional scatter diagram) is quite as easy to understand; it can be plotted on a sheet of paper. How, then, can the original s-dimensional swarm be transformed to a two-dimensional swarm?
An obvious way is to carry out the described procedure twice over, using two different vectors of weighting coefficients, u'_1 and u'_2, say. Two vectors y'_1 and y'_2 of scores are obtained, each with n elements. Thus each of the n points now has two scores, which can be treated as the coordinates of a point in two-dimensional space, enabling the data to be plotted in an ordinary scatter diagram.
To illustrate, consider Data Matrix #7 again. It has already been condensed to a one-dimensional list of scores in two different ways. The first condensation used the vector (1, 1, 1, 1) = u'_1, and gave the result (118, 183, 83, 150, 130, 124) = y'_1. The second used the vector (4, 3, 2, 1) = u'_2 and gave the result (335, 340, 221, 400, 234, 375) = y'_2. It is straightforward to combine these two sets of results. We let quadrat 1 be represented by the pair of scores (118, 335), quadrat 2 by the pair of scores (183, 340), and so on. Each pair of scores is treated as a pair of coordinates and the points are plotted in a two-dimensional coordinate frame, with the first scores measured along the abscissa and the second scores along the ordinate. The result, a two-dimensional ordination, is shown in Figure 3.1.

Figure 3.1. The data points of Data Matrix #7 after the transformation of their original coordinates in four-space, given in Table 3.1, to coordinates in two-space. The solid dots show the two-dimensional ordination of the original data described in the text. The hollow half-dots on the y_1 and y_2 axes show the two one-dimensional ordinations of the data yielded by vectors u'_1 and u'_2, respectively.
The two operations just performed on data matrix X could be represented symbolically by the two equations

u'_1 X = y'_1     and     u'_2 X = y'_2.

However, there is a still more compact representation, namely,

UX = Y.     (3.2)

Here U has two rows, the first being u'_1 and the second u'_2. That is, U is the 2 × 4 matrix

U = ( 1  1  1  1 )  =  ( u_11  u_12  u_13  u_14 )
    ( 4  3  2  1 )     ( u_21  u_22  u_23  u_24 ).

Notice that the 2 × 4 matrix U is denoted by a capital letter since lowercase letters are reserved for vectors. Also, the elements of U now require a pair of subscripts to define their locations in the matrix; the first subscript specifies the row and the second the column.
Likewise, matrix Y in Equation (3.2) is a matrix with two rows and six columns. It is

Y = ( 118  183   83  150  130  124 )  =  ( y_11  y_12  y_13  y_14  y_15  y_16 )
    ( 335  340  221  400  234  375 )     ( y_21  y_22  y_23  y_24  y_25  y_26 ).

To generalize: Equation (3.2) specifies that X is to be premultiplied by U to give Y. Suppose data matrix X were of size s × n. Then writing (3.2) with the dimensions of each matrix shown gives

   U        X     =    Y.
(2 × s)  (s × n)     (2 × n)

The factors are so ordered (UX, not XU) that the number of columns in the first factor is equal to the number of rows in the second; both are s. The product Y has the same number of rows as the first factor and the same number of columns as the second factor.
We can generalize further. So far we have discussed one-dimensional ordination and two-dimensional ordination. There is no need to stop at two
VECTOR AND MATRIX M U LTIPLICATION
91

dimensions. I t is true that a three-dirnensio na1 ordmat . . .


dimensional swarm of points that can onl b ion y1elds a three-
. fi Y e plotted 0
perspect1ve gure or as three separate two-a· . n paper as a
. d· · . 1Illens1onal gra h . hi h
dimens10na1or mat10ns requue even more two-a· . P s, g er-
1Illens10nal g h f h .
portrayal. However, the process of ordinating a . . _rap s or t err
· mu1tidunens10n 1
data points obVIously entails a trade-off One m t b
· . · us a1anee the ad
ª swarm of
of condensmg the data agamst the disadvantages f .fi . . vantages
. o sacn cmg mfor t"
It is often desirable to keep more than three d · . . ma wn.
. . rrnens10ns m transformed
data, eqmvalently, to do a more-than-three-dimensional a· .
· · · d · Ch or mat10n The
top1c 1s discusse
. m apter
. 4.. For the present ' let us cons1·aer the symbolic · .
representat10n
. of a p-d1mens10nal
. ordination of an s x n data matrix. · X.
The requued r~presentat10n has already been given: it is Equation (3.2),
unchanged. A difference appears only if the sizes of the matrices are shown.
We then have

U X = Y . (3.3)
(pXs) (sXn) (pXn)

Each of the p rows of U is a set of s weighting coefficients (how to find


numerical values for these coefficients, objectively, is considered in Chapter
4). Each of the n columns of Y is a set of p seores for one of the quadrats;
treating these seores as coordinates permits the points to be plotted (concep-
tually) in a p-dimensional coordinate frame.
The rule for calculating the value of the (i, j)th element of Y, tha~ is, the
i th score of the j th poin t, is summed up in the equation

s
Yij = U¡¡X¡j + ui2x2j + ... +u¡sXsj = 2: U¡,Xrj•
r=l

· · d ts of the
n words the ( i j)th element of Y is the sum of the pairwise pro uc
lements' in the' ith row of the first factor, U, and the Jth column of the
second factor X
' · hink 0 f each of the p rows of u
.Equivalently, as was just done, one can t t y' One
d
. w of y as a row vec or .
s a row vector u', and the correspon mg ro . p· ally the p
h . . · 'X ' p separate tlffies. m '
en performs the mult1plicat10n u = Y ' ther to give
ectors y', each with n elements, are stacked on top of one ano
he P X n matrix Y.
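In matrix terms the whole two-dimensional ordination of Data Matrix #7 is a single product, as the following minimal sketch shows:

```python
import numpy as np

X = np.array([[50, 20, 25, 45, 15, 60],
              [11, 16, 20, 33, 14, 17],
              [45, 65, 23, 49, 31, 37],
              [12, 82, 15, 23, 70, 10]])
U = np.array([[1, 1, 1, 1],     # u'_1
              [4, 3, 2, 1]])    # u'_2

Y = U @ X          # a 2 x 6 matrix; column j holds the two scores of quadrat j
print(Y)
# [[118 183  83 150 130 124]
#  [335 340 221 400 234 375]]
```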
Linear Transformations

In what follows, a matrix of size s × n, that is, with s rows and n columns, is called an s × n matrix.
It was shown in Equation (3.3) that when an s × n data matrix X is premultiplied by a p × s matrix U, the product Y is a p × n matrix. Now matrix X specifies the locations of n points in s-dimensional space (s-space for short). Indeed, each column of X is a list of the s coordinates of one of the points. Likewise, Y specifies the locations of n points in p-space; each of its columns is a list of the p coordinates of one of the points.
We can, therefore, regard Y as an altered form of X. Both matrices amount to instructions for mapping the same swarm of n points. X maps them in s-space; Y maps the same points in p-space. Therefore, if p < s, the p-dimensional swarm of points whose coordinates are given by the columns of Y is a "compressed" version of the original s-dimensional swarm of points whose coordinates were given by the columns of X. In other words, premultiplying X by the p × s matrix U has the effect of condensing the data and, inevitably, of obliterating some of the information the original data contained.
Now suppose that p = s or, equivalently, that U is an s × s matrix (a square matrix). Premultiplying X by U no longer condenses the data since the product Y is, like the original X, an s × n matrix. But the premultiplication does affect the shape of the swarm represented by X, and it is interesting to see how a very simple swarm is affected, geometrically, by a variety of different versions of the "transforming" matrix U.
To make the demonstration as clear as possible, we shall put

X = ( 1  10   1  10 )
    ( 1   1  10  10 ).

Thus X is a 2 × 4 matrix representing a swarm of n = 4 points in s = two-space. The points are at the corners of a square (see Figure 3.2a).
We now evaluate UX using several different Us. The numerical equations are given in the following; X is written out in full only in the first equation and is left as the symbol X subsequently. The first factor in each equation is the matrix U whose effect is being examined. The results are plotted in Figure 3.2.
Figure 3.2. (a) The four data points of the matrix X and also of IX = X (see text). (b)-(f) The same data points after transformation by the five different 2 × 2 matrices given in the text. The lines joining points A, B, C, and D have been drawn to emphasize the shape of the "swarm" of four points.

All the transformations illustrated have their counterparts in spaces of more than two dimensions, of course, but these are difficult (when s = 3) or impossible (when s > 3) to draw. The reader should experiment with other Us as well.

(a)   ( 1  0 ) ( 1  10   1  10 )   =   ( 1  10   1  10 )
      ( 0  1 ) ( 1   1  10  10 )       ( 1   1  10  10 ).

Here U is the so-called identity matrix, always denoted by I, which has 1s on the main diagonal (top left to bottom right) and 0s elsewhere. In symbols the equation is, therefore, IX = X. It is apparent that premultiplication of X by I leaves X, and the square it represents, unchanged.

(b)   ( 2    0  ) X   =   (  2    20    2   20 )
      ( 0   1.2 )         ( 1.2  1.2   12   12 ).

This U is a diagonal matrix; that is, its only nonzero elements are those on the main diagonal. It is seen that each row of Y = UX is the corresponding row of X multiplied by the single nonzero element in the same row of U. The geometrical effect is to change the scales of the two axes, and the original square becomes a rectangle. Obviously, if we had put

U = ( 3  0 )
    ( 0  3 ),

the original square would have remained a square but with sides three times as long.
(c)   ( 1.5  0.1 ) X   =   ( 1.6  15.1   2.5  16 )
      ( 0.9  1.0 )         ( 1.9  10.0  10.9  19 ).

The square is now transformed to a parallelogram.

(d)   ( 0.1  1.5 ) X   =   ( 1.6   2.5  15.1  16 )
      ( 1.0  0.9 )         ( 1.9  10.9  10.0  19 ).

The parallelogram is of the same shape as in (c) but the corners B and C are interchanged.

(e)   ( 2.0  -0.4 ) X   =   (  1.6  19.6  -2.0  16.0 )
      ( 0.8  -1.0 )         ( -0.2   7.0  -9.2  -2.0 ).

There is no reason why all the elements of U or all the coordinates of the data points should be positive; although no measured species quantities are negative, of course, it is often desirable to convert these measurements to deviations from their mean values, as shown in Section 3.4, and when this is done some elements of X must be negative. This example illustrates the effect of setting some of the elements of U negative.

(f)   (  0.8  0.6 ) X   =   ( 1.4   8.6  6.8  14 )
      ( -0.6  0.8 )         ( 0.2  -5.2  7.4   2 ).

The original square is still a square and its size is unchanged, but it has been rotated. A matrix U that has this effect is known as orthogonal. Because of their great importance in data transformations, orthogonal matrices require detailed description in the following.
First, however, a word on terminology. All the operations on X previously illustrated, and others like them in which U is an s × s matrix, are known as linear transformations of X. The word linear implies that each element in Y is a linear function of the elements of X, that is, one in which the elements are multiplied by constants and added, but are not multiplied by each other or squared or raised to higher powers. In other words, all the xs are said to be in the first degree.
For instance, recall that

y_ij = u_i1 x_1j + u_i2 x_2j + ... + u_is x_sj.

In this equation, which expresses y_ij as a function of x_1j, x_2j, ..., x_sj, the factors u_i1, u_i2, ..., u_is are constants. They are independent of j.
Now suppose that the original data had been only one-dimensional, that is, that only one species had been observed so that s = 1. Then the transformation equation

   U        X     =    Y
(s × s)  (s × n)     (s × n)

would be reduced to

ux' = y'

where u is a scalar (an ordinary number) and x' and y' are both n-element row vectors (equivalently, 1 × n matrices).
Written out in extenso, the last equation is

u(x_1, x_2, ..., x_n) = (ux_1, ux_2, ..., ux_n) = (y_1, y_2, ..., y_n).

To multiply a vector by a scalar, one simply multiplies each separate element of the vector by the scalar. Thus the equation ux' = y' is a condensed form of the n separate equations

ux_1 = y_1;   ux_2 = y_2;   ...;   ux_n = y_n.

Each of these is the equation of a straight line. Hence the adjective linear to describe the transformation.
Orthogonal Matrices and Rigid Rotations

It was already mentioned that the 2 × 2 matrix

U = (  0.8  0.6 )
    ( -0.6  0.8 )

is described as orthogonal. We saw that the transformed data matrix Y = UX specifies a swarm of points with the same pattern as that specified by the original X; the only change brought about by the transformation is that the swarm as a whole has a new position relative to the axes of the coordinate frame.
We are, therefore, free to regard the transformation as a movement of the swarm relative to the axes, or of the axes relative to the swarm (see Figure 3.3). In both cases, as the figure shows, the movement consists of a rigid rotation around the origin of the coordinates. In Figure 3.3b the swarm of points, behaving as a rigid, undeformable unit, has rotated clockwise around the origin. In Figure 3.3c the axes have rotated counterclockwise around the origin, relative to the swarm.
To answer the question, envisage a single datum point in two-space with
coordinates (x 1 , x 2 ). The data matrix is, therefore, the 2 x 1 matrix (or
two-element column vector)

(N otice that since x denotes a column vector it is printed as a Iowercase


boldface letter without a prime).
N ex t suppose th e axes are rotated counterclockwise around the ongiII
··
through angle 8. Let the coordinates of the datum point relative to the new,
rotated axes be given by

y=(~~)
(see Figure 3.4).
Figure 3.3. (a) Relative to the axes shown as solid lines, the points A, B, C, and D have coordinates given by the columns of

X = ( 1  10   1  10 )
    ( 1   1  10  10 ).

Relative to the dashed axes the coordinates are

Y = ( 1.4   8.6  6.8  14 )
    ( 0.2  -5.2  7.4   2 ).

(b) and (c) show the transformation of X to Y in two different ways: (b) shows the axes unaltered but the points moved; (c) shows the points unaltered but the axes moved.
We need to find y_1 and y_2 in terms of x_1, x_2, and θ from straightforward geometrical and trigonometrical considerations.
Consider Figure 3.4. The points are labeled so that OA = BP = x_1; OB = AP = x_2; OR = SP = y_1; OS = RP = y_2. AM is the perpendicular from A to the y_1-axis.
Obviously, OR = OM + MR. It is seen that

OM = OA cos θ = x_1 cos θ,

and

MR = MN + NR = AN sin θ + NP sin θ = (AN + NP) sin θ = AP sin θ = x_2 sin θ.

A
\
\
\
\
\
\
Figure 3.4. lllustrating the co · f .
\ s to
coordinates relative t th nversion
o e y-axes.
°
the coordmates of point P relative to the x-axe
VECTOR AND MATRIX M ULTIPLICATION

Tberefore,
Y¡ = X1COS (} + X
2
sin O•
3
Exactly analogous arguments (which the reader should h ( .4a)
e eck) show that
Ji = - X1 sin(/ + X2COS (/
. f (3.4b)
Thus the pair o Equations (3.4a) and (3 4b) .
· · t f th · · give the y-co d.
pomt m erms o e x-coordmates and th or mates of the
· · . e angle O w ·f
equauons as a smg1e equat10n representin th . · n mg the pair of
column vectors gives g e equality of two two-element

Here the left-hand side is y. The right-hand side is


· the matnx
· product

cos (} sin(}) ( X1)


( - sin8 cos8 X2 =Ux (3 .5)

in which the 2 X 2 matrix U has elements

Un = U22 = COS (}; u12 = sinO; U21 = -sin8.

We have now discovered how to construct ali possible 2 X 2 orthogonal


matrices. Ali have the form

cos (} sin 8).


( - sin8 cos (} '

O is the angle through which the axes are rotated. In the example on page
94, 8 = 36.87º, whence cos (} = 0.8 and sin 8 = 0.6.
An important property of orthogonal matrices must now be describe?·
F" · · the matnx
rrst, a defi.nition is needed: the transpose of any matnx is
obt · · I ntly its columns as
amed by writing its rows as columns or, equiva e '
rows. For example, the transpose of the 2 X 3 matrix A, where
100
TRANSFORMING DATA
MAr~ICEs

is the 3 X 2 matrix

Obviously, the transpose of an s X n ~atrix (for instance) is an n >< s


matrix. The transpose of a matrix is always denoted by the same symbol as
the untransposed matrix with a prime added. Thus the transpose of Ais
denoted by A', and the transpose of a column vector, say x, is the row vector
x'.
Now let us obtain U' the transpose of U in Equation (3.5) and then fonn
the product UU'.
-sin O)
coso
and, therefore,

sin O) ( cos O -sin O)


coso sin O coso
_ ( cos 2 0 + sin2 0 - sin Ocos O + sin Ocos 8 )
- sinOcos O+ sinOcos O sin2 0 + cos 2 0

or

UU' = l (3.6)
since cos 2 0 + sin2 0 = l.
Equation (3.6) is true, in general, of all orthogonal matrices of any size.
Orthogonal matrices are always square. Before discussing the general s >< s
orthogo 1 · · ·
n~ matru, It is desirable to make a small change in the symbols.
The reqmred change · h · ·
been relabeled thus: is s own m Figure 3.5. It is seen that the angles bave

(Jn is the angle between the Y -axis and the . .


() . i x 1-axis,
i2 is the angle between the y -axis and th . .
() . i e x 2 -axis
is
21 the angle between the . '
. Y2-axis and the x 1-axis.
022 is the angle between the . '
Ji-axis and the x 2 -axis.
AND MA TRIX MUL TIPUCA TION
VfCTOR
101
Xz y2 ::x:2
Y2 \
\ \
\ \
ij,
\ \
\ \
\ \
__..- - Y1

--
\ 9 12 \ e22 ________ .-- y1
\ \ - ----
\ / 9 11 ----
,/ :x::I
/
\ xi
\ \
(a) \
\ (b)
Figure 3.5. The angles between the x-axes and the y-ax . Th .
and 012 with the x 1- and xr axes; (b) The Yraxis makese:.n(~) (}e Y1-axis m~es angles 011
g es 21 and 022 with the x1- and
xr axes .

It is obvious from the figure that 811 = 822 is the same as the original O; also
that
812 = 90º - 8 or 8 = 90º - 812'.
and
821 = 90º +8 or 8 = 821 - 90º.

The reason for giving every angle a separate symbol becomes clear when we
discuss the s-dimensional case.
Consider, now, how the change in symbols affects U. The old and new
versions are as follows:

sin8) = ( cos 811 cos 812) (3.7)


cos o cos 821 cos 822 .

This result is reached using the relationships:

Un = COS 8= COS 811;

u
12
= sin 8 = sin(90º - 812) = cos 812;
u 21 = -
. 8 = - sm
sm · (821 - 90º) = cos821;

U2 2 = COS 8 = COS 822 ·


Th . of the corresponding angle
ese equations express each u. · as the cosme
O. . i1
'J"
TRANSFORMIN G DATA
102
MAl~tc~

It is now intuitively clear how an s X s orth?gonal matrix U shou}d b


constructed (the proof is beyond the scope of this book). Thus e

Equation (3.6), namely,


U U' = 1
(s Xs) (s Xs) (s Xs)

remains true. This equation states the diagnostic property of orthogonal


matrices: that is, a square matrix U is orthogonal if and only if UU' === l.
When U is orthogonal, the transformation
U X = Y
(sXs) (sXn) (s X n)

brings about a rigid rotation of the s-dimensional coordinate axes on which


are measured the coordinates of an s-dimensional swarm of n data points;
their coordinates are given by the columns of X. The coordinates of the
points relative to the new coordinate axes, which are still s-dimensional, are
given by the columns of Y.
Finally, it should be remarked that the elements of U are known as the
direction cosines of the new axes relative to the old. For instance, the
element u¡¡ of U, which is U¡¡ = cos ()ii' is the direction cosine of the y¡-axis
relative to the x¡-axis in s-space.
The problem of finding numerical values for the elements of an orthogo·
nal matrix when s > 2 is dealt with in Section 3.4. One cannot, as in the
two-dimensional case described in detail previously, simply choose one
angle and derive all the elements of U from it. A rotation in s-space requires
that s angles be known, and because they are mutually dependent, theY
cannot be chosen arbitrarily.

3.3. THE PRODUCT OF A DATA MATR X


ANO ITS TRANSPOSE

In t~s section we consider matrices of the form XX' and X 'X. These are tbe
matnx ?r~ducts formed when a data matrix X is postmultiplied alld
premultiplied r t. 1 · X11
' espec ive y, by the transpose of itself. If X is an s
TRANSFORMING DATA
MATRICES
TABLEJ.2. DATA MATRIX #8, IN RAW FO
------------~ RM X, AND CENTERED X(e)·
- 1
X1 =-t4 =9X1¡

x+~ l!)
j
8 10
11
5
3
5
-
X
2
=-41 t =8X2¡
j

-
X
3 =-41 t =4X3¡
j

(-5 -1 1
XR =
-2
9 3
1
-5
1 -~)
The SSCP matrix R and the covariance matrix (l/n)R.

(-5
-~)(=~
-1 1 9
R = XRXR =
-2
9 3
1
-5
1 -5
-7
3
-ri
-( -~~ 10
-88
164
-20
-2~
10)
var( x1) cov(x 1 , x 2 ) cov(x1, x 3 ) )
~R = cov( x 2 , x 1 ) var(x 2 ) cov(x 2 , x 3 ) =
( 13
-22
-22
41 -5
2.5)
( cov(x x var(x 3 ) 2.5 -5 1.5
3, 1) cov(x 3 , x 2 )

Notice that if the right-hand side were divided by n, * it would give the
variance of the observations xil, x¡ 2 , ••• , xin' that is, the variance of the
variable "quantity of species i per quadrat." lt should be recalled that
the variance of a variable is the average of the squared deviations of the
observations from their mean. In symbols, the variance of the quantity of
species i per quadrat is

1 n - 2
var(x¡) = - L (xiJ - x;) ·
n J=l

The standard deviation of this variable, say <1;, is the square root of the
*n is . . . d ts examined constitute the total
po ul u~ed as d1v1sor since we are assurrung that the n. qua ra a lar er "parent population"
f P ~tion of interest. If the quadrats are a sample of size n from g
or Which the variance is to be estimated, the divisor would be n - l.
THE PRODUCT OF A DA TA MATRIX ANO ITS TRANSPOSE

matrix, then XX' is an s x s matrix and X'X an n X n. matrix. These


matrices are needed in many ordination procedures as explamed in Chapter
4. For clarity we consider XX' in detail first, and note the analogous
properties of X'X subsequently.

The Variance-Covariance Matrix

As always, we denote a data matrix by the symbol X. Its elements are in


"raw" form ; that is, they are the data as recorded in the field. The (i, J)th
element is the quantity of species i in quadrat j; i ranges from 1 tos and j
ranges from 1 to n.
We now require a centered data matrix XR.* Its (i , j )th element is the
amount by which species i in quadrat j deviates from the mean quantity of
species i in all n quadrats. Thus the (i, j)th element is x; - X¡ where
1
X¡= (l/n)L)= 1 xiJ. That is, X¡ is the mean quantity of species i averaged
over the n quadrats; equivalently, it is the mean of the n elements in the ith
row of X. A simple example in which X is a 3 X 4 matrix (Data Matrix # 8)
is shown in the upper panel of Table 3.2. Because of the way in which it is
constructed, all rows of X R must sum to zero.
N ow form the product

where XR is the transpose of XR. R is a square s x s matrix; the ith


element on its main diagonal (i.e., its (i, i)th element) is obtained, as usual,
by postmultiplying the s-element row vector constituting thé ith row of XR
~y the s-element column vector constituting the ith column of Xá. Since XR
is the transpose of XR, these vectors are" the same" except that one is a row
vector and the other a column vector. Thus their product is

X¡¡ - X¡
(x '.1 - x.1' x z2. - x- _) X¡2 - X¡ n
¡, · · ·, X¡n - X¡
L (x;1 - x; )
2
(3.8a)
J=l

*The symbol R is used as subscri t in X


described are part of an R-t pal . R and Sa , and also by itself because the procedures
YPe an ys1s. '
THE pRODUCT OF A DATA MATRIX ANO ITS TRANSPOSE
105

variance, or

var(x I.) = 0 I.2 •

Next consider the (h, i)th element of R Th .


. (. .) h . e switch from the f .li
symbol pair z, J to t e unfamiliar ( h , i) is b ecause h and ¡ b h arru ar
species, namely, the h th and i th species where . h . ot represent
. . ' as m t e pa1r ( · ·)
hitherto m this chapter, i denotes a species and . z, l as used
element of R is ª
1 quadrat. The ( h, i )th

n
L (xhj - xh)(xij - X¡) (3.8b)
j=l

This is n times the covariance of species h and species i in the n quadrats,


written cov(xh, x¡). When two variables (such as species quantities) are
observed on each of n sampling units (such as quadrats), the covariance of
the variables is the mean of n cross-products. The cross-product for species
h and i in quadrat j is the product of the deviation from its mean of the
amount of species h in quadrat j (which is xhJ - xh) and the deviation
from its mean of the amount of species i in quadrat j (which is x¡ 1 - x¡).
For given h and i, there are n such cross-products, one for every quadrat,
and their average* is the covariance cov(xh, x¡). There are s(s - 1)/2
covariances altogether, one for every pair of species. Notice that if we put
h = i and calculate cov(x.,, x 1.) ' it is identical with var(x¡).
Writing R out in full, it is seen that

L(xlj - l\)2 ... L(x¡j - X¡)(x,j - x,))


R= ( [(~:J·~ ·;,)(~1> ~:; ............. i(~,~ ~ ~~;,° .. .
*As . . . . n when the n quadrats are
15
Withthe vanance, the divisor used to calculate this average d as a sample from
tre~ted as a whole "population ,, and n - 1 when the quadrats are treate
Which the covanance
· . a larger 'populat10n
m · is· to be est1mate
. d·
TRANSFORMIN G DATA MA
106
lR1cls

where ali the summations are from J = 1 to n · R i~ known as a surr¡.


squares-and-cross-products matnx· or an SS CP matrzx for short · I t is· 0an1,
s X s matrix.
Alternatively, one may write

var( x 1 ) cov(x 1, ~2) cov(x 1 , xs)

R = n cov(x 2 , x 1 ) var(x 2 ) cov(x 2 , xs)


. . . . . . . . . . . . . . . . . . . . . ... .......... (3.9)

ª
Observe that when, as here, a matrix is multiplied by scalar (in this case
the scalar is n ), it means that each individual ele~ent of the matrix is
multiplied by the scalar. Thus the (h, i)th term of R is n cov(xh, X;). R isa
syrnmetric matrix since, as is obvious from (3.8b ),

The matrix (1 / n )R is known as the variance-covariance matrix of the


data, or often simply as the covariance matrix.
The lower panel of Table 3.2 shows the SSCP matrix R and the
covariance matrix (1 / n )R for the 3 X 4 data matrix in the upper panel.
To calculate the elements of a covariance matrix, one may use the
procedure demonstrated in Table 3.2, or the computationally more conve·
nient procedure described at the end of this section.

The Correlation Matrix

In the raw data matrix X previously discussed the elements are the mea·
sured quantities of the di.tferent species in each of a sample of quadrats or
other sampling units. Often, it is either necessary or desirable to standardize
these data, that is, rescale the measurements to a standard scale.
Standardization is necessary if di.tferent species are measured by different
methods in noncomparabl · p . . 1; ... g it
mayb · e uruts. or example ' m vegetat10n samplJJJ ·
e converuent to use cover as the measure of quantity for sorne spec1es,
~nd numbers of individuals for other species · there is no ob1ection to usíng
mcommensurate u ·t8 h ' J d·zed
bef ore analys1.s. ru suc as these provided the data are standar 1
pRODUCT OF A DATA MATRIX ANO ITS TR
Tt-ff ANSPosE

107
Standardization is sometimes desirabl
.. d 1 e even wh
robers of ind1v1 ua s) are used for the m en the same units (
nu l=J' f . . easurement f e.g.,
t1·es · It has the euect o . we1ghtmg
.
the species
accordm
. o all species quant·
. i-
are species have as b1g an mfluence as cornn-. g to theu rarity so that
r . hi . ..uiuon ones o th
Ordination. Sometunes t s is desirable somet· n e results of an
' unes not o
wish to prevent the common species from d . . · ne may or may not
matter of ecological judgment. A thorough di.ºsinin~tmg an analysis. It is a
. . cuss1on of the
of data standardizat10n has been given by Noy-M . pros and cons
(1975). eu, Walker, and Williams
The usual way of standardizing, or rescaling th d t . . .
observed measurements on each species after ' they e ha a isbto d1v1de the
. . ' ave een centered
(transformed. t~ deviat10ns fr?m the respective species means), by the
standard deviatlon of the spec1es quantities. Thus the element x .. in X is
replaced by IJ

Jvar(x¡)
say.
We now denote the standardized matrix by ZR, and examine the product
~RZR. = SR, say. (ZR is the transpose of ZR.)
The (h, i)th element of SR is the product of the hth row of ZR
postmultiplied by the ith column of ZR. Thus it is

X·in - X·1

CJ¡

f (xhJ - xh)(xiJ - .X¡)


J-l
108 TRANSFO RMINC DATA M
Al~1c~s

W
h ere rhr. is the correlation coefficient between species h and species i tn
.
the
n quadrats. . .
Observe that the (i , i)th element of SR 1s n. This follows from the fac¡
that cov(x¡, X¡)= var(x¡)·
The correlation matrix is obtained by dividing every element of SR by n.
Thus

lt is a symmetric matrix since, obviously, rhi = r ¡h·


Table 3.3 shows the standardized form ZR of Data Matrix # 8 (see Table
3.2), its SSCP matrix ZRZ~ , and its correlation matrix. The elements of the
correlation matrix may be evaluated either by postmultiplying ZR by ZR
and dividing by n, or else by dividing the (h , i)th element of the covariance

TABLE3.3. COMPUTATION OF THE CORRELATION MATRIX


FOR DATA MATRIX #8 (SEE TABLE 3.2). '

The standardized data matrix is


-5 -1 1 5
v'IT
9
v'IT v'IT m
3 -5 -7
ZR =
v'4f 141 141 141
-2 1 1
o
v'f5 v'f5 ru
The SSCP matrix for the standardized data is

sR = zRza =j -i.sm -3.8117


4
2.2646)
- ~ . 5503 .
2.2646 -2.5503
The correlation matrix is

~~:)
1
= ( - 0.~529
0.5661
-º·{ 529

- 0.6376
__?~1~~~6).
--------~----~~~~----
~ooLJCT Of A DATA MATRIX AND ITS TR
rt-IE P ANSPOSE

roa tfÍX, wb.ich is cov(xh,x¡),


. . by /var(x h )var ( x .
109
.,,T1ces trom the mam diagonal of th
va.flOJ-• e covariai) ' taking the v 1
a ues of h
a.rnple, the (1, 2)th element of the n~e matrix. Thu . t e
ex 2/. 'ÍfX4l = - 0.9529. correlation mat . . s m the
-2 VlJ -"" "T.1. nx is r12 _
Yet another way of obtaining
. . . the correlafion matnx . . -
nu.rnerical examp1e) it is g1ven by the product is to note that (in the

o
/41""
o
o
~
l( 13
-22
2.5
-22
41
-5
_;.sl(~
1.5 o

o
o {E
o . l
In the general case this is the product (1/n)BRB where B is th d·
rnatrix
.
whose (i, i)th .element
.
is Jvar(x i.) =a¡. N ot.ice that when
e iagonal
three
rnatnces are. to be multiplied (e.g., when the product LMN is · to be found) 1.t
malees no difference. whether one first obtains. LM
. and then postmu1tiplies
· . ' 1.t
by N, or first obtams MN and then premultipbes it by L· Ali that matters is ·
that the order of the factors be preserved. The rule can be extended to the
evaluating of a matrix product of any number of factors.

The R Matrix and the Q Matrix

Up to this point we have been considering the matrix product obtained


when a data matrix or its centered or standardized equivalent is postmulti-
plied by its transpose. The product, whether it be XX', XRXR., or ZRZR., is
of size s X s. Now we discuss the n X n matrix obtained when a data
matrix. is premultiplied by its transpose.
First, with regard to centering: recall that to form XR, the matrix X was
centered by row means; that is, it was centered by subtracting, from every
element, the mean of its row. Equivalently, the data were centered by
species means since each row of X lists observations on one species. .
T B t this time, centenng by
o center X' we again center by row means. u 0 f X'
ro · ' . d t eans since each row
li w
t
means is equivalent to centenng by qua rad fm of X' w1ll . be deno ted*
s s observations on one quadrat. The centere orm . h the quantity
by X' Th . h unt by whic xJi'
. Q· e(}, i)th element of X 1s t e amo
0 1 of all s species
fty
in quadrat J of species i, deviates from the average quan
analysis.
*Th t of a Q-type
e symbol Q is used because the procedures described are par
TABLE3.4. DATA MATRIX #8 TRANSPOSED, X', ANDTttEN
CENTERED BY ROWS, Xo.

x 1 = 7.67
17
11 x2 = 8.00 -----------
X'= r10: 3
l) x3 = 6.00
14 1 x4 = 6.33

X'Q --
r-3.67
O
9.33
3
-5.67)
-3
4 -3 -1
7.67 -5.33 -2.33
The SSCP matrix Q and the covariance matrix (1 / s )Q

45 -37
18 -64.67)
-6 -9
-6 26 49
-9 49 92.67
44.22 15 -12.33
1 15 6 -2 - 21.56)
-3
;Q = [ -12.33 - 2 8.67 16.33
- 21.56 -3 16.33 30.89
in the quadrat. (If a species is absent from a quadrat, it is treated as
"present" with quantity zero.)
Table 3.4 (which is analogous to Table 3.2) shows X', the transpose o
Data Matrix #8, and its row-centered (quadrat-centered) form XQ in the
upper panel. In the lower panel is the SSCP matrix Q = X XQ (here XQ is
the transpose of XQ) and the covariance matrix (l/s)Q. 0 .
The (j, j)th element of (l/s)Q is the variance of the species quantities in
quadrat j. The (j, k)th element is the covariance of the species quantities in
quadrats j and k. These elements are denoted, respectively, by var(xj ) and
cov(x1, xk); the two symbols j and k both refer to quadrats. Notice that
var(x) could also be defined as the variance of the elements in the jth row
of XQ, and cov(x1, xk) as the covariance of the elements in its jth and ktb
rows.

Next XQ is standardized to give ZQ; we then obtain the producl


SQ = ZQZQ wh~re ZQ is the transpose of ZQ. Finally, (ljs)~
is_ 1:;
correlatJon matnx whose elements are the correlations between every pi!ll
quadrats. Table 3.5,
which is analogous to Table
calculations for Data Matrix # 8.
3.3,
shows the steps in the
RooUCT Of A DATA MATRIX ANO ITS
rHf p TRANSPOSE
,,,
TABLEJ.5. COMPUTATION OFTHE
pOR THE TRANSPOSE OF DATA MATCORRELATION MA
-- RIX # 8. TRIX
Tbe standardized form of X' is
- 3.67 9.33 - 5.67
v'44.22 v'44.22 ~
v44.22
3 -3
o
16 16
4 -3 -1
v'8.67 v'8.67 i/8.67
7.67 - 5.33 -2.33
v'30.89 v'30.89 v'30.89
The SSCP matrix for the standardized data is
3 2.7626 -1.8900 -1.7497)
s - Z' z _ 2.7626 3 - 0.8321 - 0.6611
Q - Q Q -
( - 1.8900 - 0.8321 3 2.9948 .

l
-1.7497 -0.6611 2.9948 3
The correlation matrix is
1 0.9209 -0.6300 -0.5832
1s - 0.9209 1 -0.2774 -0.2204
-; Q - -0.6300 -0.2774 1 0.9983 .
( 0.9983 1
-0.5832 -0.2204

Table 3.6 shows a tabular comparison of the two procedures just dis-
cussed. These procedures constitute the basic operations in an R-type and a
Q-type analysis. It is for this reason that the respective SSCP matrices have
been denoted by R and Q.

Computation of a Covariance Matrix

his subsection is a short digression on the subject of computations.


. · 1 · iples and who do not
eaders who are concentratmg exclusive Y on pnnc . 3
ish to be distracted by practica! details should skip to Sectwn .4. product
1 · It is the
Consider R, the SSCP matrix used in an R-typehi oduct as it stands, it..is
ana ysis. .
f
aXR. Instead of evaluating the elements o t s pr
sually more convenient to note that
(3.10)
X RX'R = XX' - XX'
and · ·de of this equation.
to evaluate the expression on the nght si
TABLE 3.6. A COMPARISON BETWEEN PRODUCTS OF THE FORM XX' AND X'X.ª

Centered XR is matrix X centered by rows (species). Its XQ is matrix X' centered by rows (quadrats).
matrix (i, j)th term is
X¡¡ - X¡ where Its (j, i)th term (in row j , column i) is
1 n
X¡= - L xiJ.
n J=l

R = XRXR Q = X 0XQ
SSCP where X Ris the transpose of X R. where XQ is the transpose of x 0.
matrix R is an s X s matrix; each of its elements is a sum of Q is an n X n matrix; each of its elements is
n squares or cross-products. a sum of s squares or cross-products.

.!:_R lQ
s
n
Covariance The (i, i)th element var(x;) is the variance of the The (j, j)th element var(x) is the variance of
matrix elements in the ith row of XR (quantities of species i). the elements in the jth row of X 0 (quantities
The (h, i)th element cov(xh, X;) is the covariance in quadrat j). The (j, k)th element cov(x./, x;.)
of the hth and ith rows of XR (quantities of species is the covariance of the j th and k tb rows of
h and i). X é, (quantiti es i n qua dra t s _/ a.n d k ).
ZR Z'Q
Standardized Its (i, j)th term is
lts (j, i )th term is
matrix x,, - x, x1, - xj
Jvar( x,) a,
vvar(xj)
where a, is the standard deviation of the where ªJ is the standard deviation of the
quantities of species i. quantities in quadrat j.

1 1 I
- SR= -ZRZR
n n
Correlation The (h, i)th element of (l/n)SR is r 17 ¡ , the The (j, k )tb element of (1/s )S0 is ')k, the
matrix correlation coefficient between the h th and i th rows of correlation coefficient between the jth and k tH
XR (i.e., between species h and species i). rows of X 0 (i.e., between quadrats j and k ).
The matrix has ls on its main diagonal since rhh = 1 Tbe matrix has ls on its main diagonal since
for all h. '.iJ = 1 for a11 j.
ªSymbols h and i refer to species; syrnbols j and k refer to quadrats.
114
TRANSFORMING D
AlA~A
l~I(¡,

Here X is an s X n matrix in which every element in the ith .


row is
1 n
X¡= -
n ;=
. 1
X¡J' L
the mean over all quadrats of species i. Thus X has n identical s-elern
columns. It is en

X=

with n columns.
Hence

nx-21
- -
~~2.~1 . . . ~~~
nx- 1x- 2
-2 -
n.x.2 :~
••• '. '.'. • •

nx- 1xs
-

rnxsxl
- - - -
nxsx2 ... -2
nxs

which, like XX', is an s X s matrix. .


The subtraction in (3.10) is done simply by subtracting every element ~
XX' from the corresponding elemen t in XX', that is, the ( i, j )th element 0
the former from the (i, j)th element of the latter.
These computations are illustrated in Table 3.7, in which matrix R f:r
Data Matrix # 8 is obtained using the right side of Equation (3.1~). \:
table should be compared with Table 3.2 in which R was obtained using 1
left side of (3 ..10). As may be s~en, the results are the s_ame. i)lh
Now cons1der the symbolic representation of this result. The (h,
element of XRXR is [see Equation (3.8b))
n

L (X hJ - X,;) (X ij - X;) .
j=l
RODUCT Of A DATA MATRIX ANO ITS T
rt-IE P RANSPOSE
115
TABLE3.7. COMPUTATION OF THE sscp

~:~(81;S~G:;:;¡¡~IO~t~XX)~(~::R::;
2 5 5 4 10 3 5 = 200 420 108

xx' = ( 4~ 4~ 4~ ~ )(~ 4 9
9
r:)1-(:24 ::: 1:) 70
8
8
4
4
- 288
144
256
128
128
64
I ( 52 -88 10)
R =XX ' - XX = -88 164 -20
10 -20 6

The (h, i)tb element of XX' - XX' is

n
L: xhjxij - nxhxi.
j=l

We now show that these two expressions are identical. In what follows,
ali sums are from j = 1 to n.
Multiplying the factors in brackets in the first expression shows that

Now note that Lx x .. = x I:x 1.. and LX¡XhJ = X¡LXhJ since xh and X¡ are
h 11 h 1 ¡ 0 f ·) Similarly
onstant with respect to j (i.e., are the same for all va ~e~ 1 · '
Cxhx¡ = nxhx; since it is the sum of n constant terms xhxi. Thus

ext make the substitutions

and "X·
Í-J I}
. = nX¡·
116 TRANSFORMING DAT
AMAlR1ct~

Then

= '"'Xh
~ J·X··
lj - nXhX1·

as was to be proved.

3.4. THE EIGENVALUES AND EIGENVECTORS OF A


SQUARE SYMMETRIC MA TRIX

This section resumes the discussion in Section 3.2, where it was shown how
the pattem of a swarm of data points can be changed by a linear transfor-
mation. It should be recalled that, for illustration, a 2 X 4 data matrix was
considered. The swarm of four points it represented were the vertices of a
square. Premultiplication of the data matrix by various 2 X 2 matrices
brought about changes in the position or shape of the swarm; see Figure 3.2.
When the transforrning matrix was orthogonal it caused a rigid rotation of
the swarm around the origin of the coordinates (page 94) ; and when the
transformation matrix was diagonal it caused a change in the scales of the
coordinate axes.
Clearly, one can subject a swarm of data points to a sequence of
transformations one after another, as is demonstrated in the following. The
relevance of the discussion to ecological ordination procedures will become
clear subsequently.

Rotating and Rescaling a Swarm of Data Points

To begin, consider the following sequence of transformations:

l. Premultiplication of a data matrix X by an orthogonal matriX (}.


(Throughout the rest of this book, the symbol U always denotes ª11
orthogonal matrix.)
2· Premultiplication of UX by a diagonal matrix A (capital Greek lambda:
the reason for using this symbol is explained later).
3
· Premultiplication of AUX by U', the transpose of U, giving VJ\VX·
NVALUES ANº t1uc1H e'""'º"" vt A SQu
HE flGE ARE SY MMET RIC MATRIX
117

is an example. As in section 3 2
f{ere . · , we us
enting a swarm of four pomts in two-sp e a 2 X 4 data mat .
pres ace. This time let nx

X= (11 15 14 45) .
s the data swarm consists
. of the vertices of a rectangle (
hu e first transformahon
. .
1s to cause a clockw·1se rotati seef Figure 3.6a) ·
Th
rO
und the origin through an angle of 25 °. The 0 th
. .
on the rectangle
r ogonal t·
°
produce this rotat10n 1s (see page 101) ma nx required

u - ( cos 25º cos 65º) - ( 0.9063 0.4226)


cos 115º cos 25º - -0.4226 0.9063 .

Jt is found that the transformed data matrix is

ux = ( 1.329 4.954 2.597 6.222)


0.484 -1.207 3.203 1.512 .

::C2 :X:2
(a) (b)
5 5

D 5
:x:, 10
x,

X2
~
(d)
10 (e) 10

5 10 15 :x:,
5 'º
. . ( d) U' AVX· The lines
~e 16· The data swarms represented by: (a) X; ( b) UX; (e) A~, The eleroents of U and
ar:g .the ~oints are put in to make the shapes of the swarms apparen ·
given m the text.
TRANSFORMING DAT
118
A"1Al~IC¡1

The swarm of points represented by this matrix, which is mer


.. . h . p· 3 e1y tn
riginal rectangle in a new pos1tton, 1s s own m 1gure .6b. e
o 1 .
The second transformation is to be an a terat10n of the coordinate
. (th e a b sc1ssa
. ) b y a factor of ~ scalei
Let us increase the scale on the x 1-mas
1 :::: 4·
and on the x 2 -axis (the ordinate) by a factor of A2 = 1.6. This is equival~~
to putting

Then

AUX = ( 3.189 11.890 6 .232 14.933)


0.774 -1.931 5.124 2.419 .

The newly transformed data are plotted in Figure 3.6c. The shape of the
swarm has changed from ~ rectangle to a parallelogram.
The third and last transformation consists in rotating the parallelogram
back, counterclockwise, through 25º. This is achieved by premultiplying
AUX by U', the transpose of U. It is found that

U'AUX = ( 2.563 11.592 3.483 12.511)


2.049 3.275 7 .278 8.504 .

These points are plotted in Figure 3.6d.


Now observe that we could have achieved the same result by premultiply·
ing the original X by the matrix A where

A= U'AU = ( 2.2571 0 .3064) (3 .11)


0 .3064 1.7429 .

Observe that A is a square symmetric matrix.


We n~w make the following assertion, without proof. Any s X s square
symmetnc matrix A is the product of three factors that may be writt~
U'Al!; U and its transpose U' are orthogonal matrices; A is a diagoil d
matnx. In the general d. · s all
. s- 1mens1onal (or s-species) case all three factor
A itself are s X s matrices. '
A (Thle elements on the main diagonal of A are known as the eigenvafues o
ª so called the latent va¡ues or roots, or characteristic values or roots, 0
ENVALUES AND EIGENVECTORS OF A SQUA
HE EIG RE SY MMETRIC MA TRIX
119

l
). That is, since

A1 O o
A=
[ ¿.. ~: ..::·... ~: ,

the eigenvalues of A are A1,. A2, ... ' As. The eigenvalues of ama tnx . are
1uAenoted by AS by long-established. .
custom· '
likewise '
the matrix· f ·
o eigenva1-
ues is always denoted ?Y A. This IS why A was used for the diagonal matrix
that rescaled the axes m the second of the three transformations performed
previously.
The rows of U, which are s-element row vectors, are known as the
eigenvectors of A ( also called the latent vectors, or characteristic vectors, of
A).
In the preceding numerical example we chose the elements of U (hence of
U') and A, and then obtained A by forming the product U'AU. Therefore,
we knew, because we had chosen them, the eigenvalues and eigenvectors of
this A in advance. The eigenvalues are A1 = 2.4 and °"A. 2 = 1.6. And the
eigenvectors are

ui = ( 0.9063 0.4226)

and

u'2 = ( -0.4226 0.9063 ),

the two rows of U. . tnx· A and


.h h e symmetnc ma
Now suppose we had started wit t e squar ·ble to
W ld it have been possI
had not known the elements of U and A · ou · yes The
f A? The answer IS ·
determine U and A knowing only the elements 0 · A _ u 'AU in
. . d U d A such that - '
analys1s which, starting with A, fin s .ªn n eigenanalysis. In
which U is orthogonal and A diagonal, IS kn~wn as ª·s forros the heart
. . d an eigenana1ysI
nearlY all ecological ordmat10n proce ures,
of the computations, as shown in Chapter 4 · . hich this may be
· f one way in w ·
d
The next step here is a demonstrat10n
.
°
· ly constructe
d and then w1th ª
. d
one, first with the 2 x 2 matnx A previous d be generalize to
3 . · h h metho can
>< 3 symmetric matrix. The way m whic t e . .th s > 3 will then be
Per · · trie matnx wi
rrut e1genanalysis of an s X s symme
120

clear. Of course, with large s the computations exhaust the paf


anything but a computer. ience or

Hotelling's Method of Eigenanalysis

To begin, recall Equation (3.11)

A= U'AU
and premultiply both sides by U to give

UA = uu~u.

N ow observe that, since U is orthogonal, UU' = 1 by the definition of an


orthogonal matrix. Therefore,

UA = IAU = AU.

Let us write this out in full for the s = 2 case. For U and A we write each
separate element in the customary way, using the corresponding lowercase
letter subscripted to show the row and column of the element. For A we use
the knowledge we already possess, namely, that it is a diagonal matrix. Thus

Ü ) (Un
A2 U21

On multiplying out both sides, this becomes

Un ªn + U12 ª21
(
U21 ªn + U22 ª21

which states the equality of two 2 X 2 matrices. N ot only does the left side
(as a whole) equal the right side (as a whole), but it follows also that anY
row of the matrix on the left side equals the corresponding row of tbe
matrix on the right side. Thus considering only the t9p row,

(unan+ U12ª21' Unª12 + U12ª22) = (A. 1un, A1U1z) ,

an equation having two-element row vectors on both sides. This is the same
as the more concise equation
ENVALUES ANO EIGENVECTORS OF A
¡Jff LIG SQUARE S\'M
METRIC MATRIX
. 121
.ch ui 1s the two-element row vect
in wl1l or con . .
nonzero element in tl ti stituting the fi
an d "A i is the only. le rst r rst row of U
ther are an e1genvalue of A and its e ow of A. Hen , ,
ioge orresponct· . ce 1\1 and '
tJotelling's method for obtaining the . ing e1genvect U1
r1 nuinenc l or.
ele ments of ui when the
. elements of A ar .
e g1ven p ª
values of A i an
d the
stePs are illustrated usmg roceeds as foll ows. The

A= ( 2.2571 0.3064)
0.3064 1.7429 '

the symmetric matrix whose factors we alread Yk now.

Step J. Choose
. arbitrary tria! values for the e1ements of u' D .
tnal vector by w(Ó)· It is convenient to t . l· enote this
s art with w(Ó) = (l, l).
/
Step 2. F orm th e prod uct Wco) A. Thus

( 1, 1 )A = ( 2.5635, 2.0493 ).

Let the largest element on the right be denoted by 11. Thus


11 = 2.5635.
Step 3. Divide each element in the vector on the right by /1 to give

2.5635( 1, 0.7994) = l1wó_),

say. Now wci) is to be used in place of WcÓ) as the next trial vector.
Step 4. Do steps 2 and 3 again with wá) in place of WcÓ), and obtain /2 and
Wá)·

Continue the cycle of operations (steps 2 and 3) until a trial vector is


obtained that is exactly equal, within a chosen number ~f dec~al P,~aces, ,~º
the preceding one. Denote this vector by wcf-> (the subscnpt F 18 for final ).
~ben the elements of w' are proportional to the elements of ui the first
e1g (F) · ' A 15
· ual to ~ 1 the
envector of A and / the largest element m WcF) , eq '
~ ' p,
rgest
Th eigenvalue
. of A. d . h f0 ur decimal places
us m the example at the nineteenth cycle an wit
\Ve obtain '
0.4663 ).
( 1, 1.1191) = 2.4000( 1,
0.4663 )A = ( 2.4000,
122 TRANSFORMINC DATA
"-'Al~IC¡
That is,

w<F> = ( 1, 0.4663)

and

lp = ~l = 2.4000.

We now wish to obtain uí from w<~· Recall (page 102) that UU 1 ::::
1
what comes to the same thing, that the sum of squares of the element~ ~
any row of U is l. Hence uí is obtained from w<~ by dividing each elemen
in w<~ by the square root of the sum of squares of its elements. That is,

1i2 + ~.46332
0.4633 )
u; = ( , 2
/1 + o.4633 2

= ( 0.9063, 0.4226 ).

These steps are summarized in Table 3.8.


Having obtained ~ 1 and uí, the first eigenvalue and eigenvector of A, iti
easy (when s = 2) to obtain the second pair.

TABLE3.8. EIGENANALYSISOFTHE2 X 2MATRIXA


BY HOTELLING'S METHOD.ª

Cycle Tri al
Number Eigenvector
w{i) I
W(i)
A -- I i+IW(i+l)
I

o (1, 1) (2.5635, 2.0493) = 2.5635(1, 0.7994)


1 (1, 0.7994) (2.5020, 1.6997) = 2.5020(1, 0.6793)
2 (1, 0.6793) (2.4652, 1.4904) = 2.4652(1, 0.6046)

19 (1, 0.4663) (2.4000, 1.1191) = 2.4000(1, 0.4663)


20 (1, 0.4663) (2.4000, 1.1191) = 2.4000(1, 0.4663)
Hence A1 = 2.4000
and ui is proportional to (1, 0.4663).

ªGiven A = ( 2.2571 0.3064)


0.3064 1.7429 .
fNVALUl:~ ANLJ CIUC l"IVtl...IURS OF
rHE flG A SQuARE SYM
METRIC MATRIX
123
We knOW (page 101) that U has the form

U = ( c?s () sin o)
-sm() coso
d we have just obtained uí the first row f U
an o Which is
0 í = (0.9063, 0.4226).

Therefore,

u= ( 0.9063 0.4226)
-0.4226 0.9063

and u'2 is the second row of U.


To find A2 , the eigenvalue corre3ponding to u;, recall Equation (3.ll),
namely,

A= Ul\U.

Premultiply both si des by U and then postmultiply both sides by U'. Hence

UAU' = UU1\UU'.

Since U is orthogonal, UU' = I; therefore,


(3.12)
UAU' = IAI = A.

Hence we may find A by evaluating UAU'. lt is found that


2

UAU' = (o2.4 O ).
1.6

Thus A1 = 2.4 (as was found before) and A1 = 1.6. h d to a 3 X 3 matrix,


Next consider the application of H otelling's met od . 1
with a numenca
sa B te the proce ure
Y · Again it is convenient to demonstra
example. Let

{~-º 0.2
5.6
2.4)
-0.4 .
~-2
B =
-0.4 5.2
TRANSFORM ING DA
124 TA MAl~ICt\

. b tinding Ai and ui as was done with a 2 X 2 matrix. As b


Begm Y . Th t · ef
"th the trial matnx (1, 1, 1). e compu at10ns proceed as f 0rel
start w1 . oU0
decimal places are shown, but 12 were used to obtam these w
(on1y 3 results).
(1, 1, l)B=(8.6, 5.4, 7.2)=8 .6( 1, 0.628, o. 837 )
( 1, 0.628, 0.837 )B =
(8.135, 3.381, 6.502) = 8.135(1, 0.416 , 0.799);
• • . • . . • . • • . • • • . • • • • • • • • • • • • • • • • • • • • • • • • . . . . 1.

( 1, -0.058 , 0.854 )B =
( 8.038, -0.466 , 6.864) = 8.038( 1, -0.058 , 0.854).

Therefore,
A1 = 8.038,

and ui is proportional to (1, - 0.058, 0.854). Dividing through the elements


of this last vector by

/i 2 2
+ ( -0 .058) + 0.854 2 = 1.316
shows that

ui = ( 0.760, - 0.044, 0.649 ).

Now, to find the second eigenvalue and eigenvector, A2 and u;, proceed
as follows. Start by constructing a new matrix B · it is known as the first
1
residual matrix of B and is given by '

B¡ = B - A1U1Ui

0.760)
= B - 3 .o 33 -0.044 ( 0.760, -0 .044, 0.649)
(
0.649
6.0 0.2
= 0.2 2.4) ( 4.639 -0 .269 3.962)
( 5.6 -0.4 - - 0.269 -0.230
2.4 -0.4 1.565
5 .2 3.962 -0.230 3.383
1.361 0.469
= ( 0.469 -1.562)
4.035 -0.170 .
-1.562 -0 .170 1.817
ENVALUES AND EIGENVECTORS OF A
ft·IE EIG SQUARE SYM
METRIC MATRIX
. n1 . 125
Note: this is o y approximate; for accurate r
(more decimal places would be needed ·) esults at the next step m
The values of A2 and u'2 may now b b . ' any
A d ' . e o tallled f
sarne way as 1 an "1 were obtamed from B. lt . rom B1 in exactly th
is found that e
A. 2 = 5.671 and u'2 = ( o. 144
. . ' 0.984, -0.102).
finally, smce B is a 3 X 3 matrix there ;_ 5 a third .
pair still to be found, A3 and u3. To find th e1genvalue-eigenvector
residual matrix of B from the equation em, compute B2 the second

or, equivalently,

and operate on B2 in exactly the same way as B and B1 were operated on. It
is found that

A. 3 = 3.092 and u'3 = ( -0.634, 0.171, 0.754).

The eigenanalysis of B is now complete. As a check, recall that, applying


Equation (3.12), we should have

UBU' =A.
Substituting the numerical results just obtained in the left side of this
equation gives

UBU' =
( 0.760
0.144
-0.044
0.984
0.649) 0.2
-0.102 (6·º -0.4
0.2
5.6
2.4)
-0.4
5.2
0.171 0.754 2.4
-0 .634

( 0.760 0.144 -0.634)


0.984 0.171
X -0.044
-0.102 0.754
0.649

=
8.043
-0.001
-0.001
5.667
O
0.001
)
~
( 8.0
o
33
~.o671 ~3.092 ) =A.
( o 0.001 3.091 o d
(Tb . . al Jaces were use .)
e mexactness is because only three decnn P
126 TRANSFORMING DATA
MATRICts

' I t should now be clear how a square symmetric matrix of any s· .


analyzed by Hotelling . ' s method . The eigenva
. 1ues are always obtainect
tze .is
decreasing order of magnitude; that is, given an s X s matrix, we alway~
have A. 1 > A. 2 > · · · > A. s .
A way of reducing the large numbers of computational cycles that are
often needed to find each eigenvalue- eigenvector pair is outlined in Tatsuoka
(1971); it is beyond the scope of this book.
Finally, it is worth noticing, though it is not proved here, that the sum of
the eigenvalues is always equal to the sum of the diagonal terms of the
matrix analyzed, which is known as the trace of the matrix.
Thus the trace of the 2 X 2 matrix A analyzed before is

tr(A) = a 11 + a 22 = 2.2571 + 1.7429 = 4

and
2
L A¡ = 2.4 + 1.6 = 4.
i=l

F or the 3 X 3 matrix B we have

tr(B) = b11 + b22 + b33 = 6.0 + 5.6 + 5.2 = 16.8


and

3
LA¡= 8.038 + 5.671 + 3.092 = 16.801
i=l

(the discrepancy is merely a rounding error).

3.5. THE EIGENANALYSIS OF XX' AND X'X


It was remark.ed m · section
· 3.3 that, in ordinating ecolog1cal . data, 011e
11
frequently begms by forming the product XX' or X'X where X is an s '/...
· 1y, th ese two products are related ' in sorne way. Eacb 15
data matrix · Ob VIous
*Th e theoret.Ical
· · thí5 eJeIJJ eotaíY
. .
dlSCUSSlOil. possi'b'li · a pau
1 ty of finding . of equal eigenvalues is ignored lll
1~
oduct of the same two factors and 1
~~ ~Y~m~
Let us put of the factors differs.

F = XX' and G = X'X.

(The syrobols R and·


Q are not used here sm· R
ce and Q
w-centered matnces; see Table 3.6. The ro f are the products f
ro ws o X and X, o
in forrning the products F and G.) are not centered
Clearly, F is a symmetric s x s matrix d G .
an is a s .
rnatrix. Suppose we were to do eigenanalyses 00 both F Ymmetnc n x n
the results be related? and G. How would
We first answer this question in symbols and th .
en exanune a nu . 1
example. At every step the reader should check th t th . menea
side of an equals sign are of the same size. ª
e matnces on each
Let "A¡ be the. i th. eigenvalue of F ' and let the s-element row vector u'. be
the corresponding e1genvector. I t follows that '

(3.13)
or, equivalently,
¡~
!~i
l

Postmultiply both sides of this equation by X to give

Then, since X'X = G by definition, the equation becomes


(3.14)
( u~X)G = A¡ (u~X) ·

The factor u'X on both sides is an n-element row v~ctor. · value of Gas
z • • • h t A. is an eigen
Comparing (3.13) and (3.14), it is ev1dent t a i f Gis either equal to
wU d. ·genvector o
e as of F, and that the correspon mg ei
or proportional to u'.1 X. b derived froro those of
Thus the eigenvalues
. •
and e1genvec t 0 rs of G can ·
. 1·r ei"ther .n
e espec1allY
F · ·d t omputation, . alys1s
or s ·
ª
or vice versa. This fact is a great 1 0 e
tly exceeds s·
A direct eigenan .
f ons 1f n
o is very large. Thus suppose n grea . ve long computa I
f the n X n matnx . G = X'X would entail ry
TRANSFORMINC DATA M
128 Al~ICts

e results could be obtained much faster by analyzin


were large. Th e sam g the
smaller s X s matnx · F -- XX' · ·
Now cons1.der a numerical example. . The results
. . are g1ven. here to 3
.
decimal p1aces, although 12 were used m the ongmal computations.
The 2 x 3 data matrix is

8
X= (1i 2

Then

130 46 109)
F =XX'= ( 122 105 ) and G = X 'X = 46 68 72 .
105 189 ( 109
72 113

The first eigenvalue and eigenvector of F, which can be found by


Hotelling's method (or by other methods not described in this book), are

;\ 1 =265.714 and ui=(0.590, 0.807).

This result can be checked by evaluating both sides of Equation (3.13) and
finding that

uiF = A1ui = (156.755, 214.552).

From the previous argument we know that A¡ = 265.714 is also a


eigenvalue of G, and that the corresponding eigenvector is proportional t
uiX which is

~) =
8
2 ( 10.652, 6.334, 10.589 ).

Then to find this eigenvect , · . liZ


· d or, say V1, of G lt is only necessary to norma
the vector u'1 X· thi 18
' s one by dividing every element in u' X by the squar
root of the sum of f· 1
squares 0 lts elements, namely, 16.301. Thus

v{ = ( 0.653, 0.389, 0.650 ),


and, as is necessary f or an e. t
unity. igenvector, the squares of its elements sull1
129

ns a check that ;\1 and


A Ví are an eigenvalUe-e1g.
envector pair of G
that , note
v{G = ;\1ví = ( 173 .631,
103.255, 172.610 ) .
finding the second eigenvalue of both F and
ective eigenvectors u'2 and v{ is straightf G, say "-2, and their
res P . . . orward Th .
arises as to what is the third e1genvalue A. of th 3 · e _quest10n now
2 matrix F has no third eigenvalue ~he a e x_ matnx G, since the
3
2X . . · nswer is A. = o F
eiaenanalys1s of G gives as the third eigenvector , 3 .' urther, an
º V3 correspondmg to A
3'

v; = ( - 0.4S 6 , -0.483, 0.748)

and it will be found that

v{G = A.3G = (O, O, o)

(disregarding minar rounding errors).


To summarize: suppose F = XX' is an s X s matrix and G = X'X is an
n X n matrix. Suppose n > s and let n - s = d. Then F and G have s
identical eigenvalues and the remaining d eigenvalues of G are all zero. The
seigenvectors of G that belong with its nonzero eigenvalues can be found
from the corresponding eigenvectors of F. To do this, note that the ith
eigenvector of G, say, v/, is proportional to u~X, where u~ is the ith
eigenvector of F. The elements of v/ are obtained by normalizing the
elements of the vector u~ X.

EXERCISES

.l. Consider the following three matrices:

A= uo 2 1).
-1 '
B= ( j
-2
1
-1
o
1
o
3
-!);
1
C=
[-! 1
2
6
130

What are the following products? (a) AB; (b) BC; (e) AC; (d) C .
CB; (f) BCA; (g) CAB. A, (e)
3.2. See Figure 3.7. Four data points, A, B, C, and D, have coorct ·
1nates ·
two-space given by the data matrix X, where in
B e D
2
1 -1
2 -1).
-1

Find the 2 X 2 transformation matrices U1 , U2 , and U3 that wi


respectively, transform X into Y1 , Y2 , and Y3 , as shown graphically ¡
separate coordinate frames in Figure 3.7. (The coordinates of ali th
poin ts are shown on the axes in the figure.)
3.3. Which of the following matrices U1 and U2 is orthogonal?

( 0.49237 -0.61546 0.61546)


U1 = 0.84515 0.50709 -0 .16903
0.30985 -0.51412 0.77814

( 0.49237 -0.61546 0.61546)


U2 = 0.84515 0.50709 -0.16903
-0.20806 0.60338 0.76983

B
A
4 4

D
2

o o
- 1 e
-2

-4
D
-5
- 3 -1 3 5 -6
e -7
-3 -1 o 2

-2

y y2 y3
1

Figure 3.7. See Exercise 3.2.


131
Prove that XX' is symmetric.
J.4.
J.5. Given the 2 X 3 data matrix X, where

4
1
find the 2 X 2 correlation matrix.
6 Suppose A = UAU ', where U is orthogonal and th d' .
B. · A IS. e iagonal matnx

whose diagonal elements are the eigenvalues of A. What are the


eigenvalues of A5 ?
7. Eigenanalysis of a 5 X 5 matrix showed that its first eigenvector was
proportional to (1, 0.87, 0.63, - 0.20, - 0.11). What is this eigenvector
in normalized form?
8. Eigenanalysis of the following matrix

10.16 6.16 -7.48


0.241
s= 6.16 5.36 -4.28 -1.16
-7.48 -4.28 5.84 -0.12
r 0.24 -1.16 -0.12 1.36

shows that the first, third, and fourth eigenvalues are

i\ = 19.71; A4 = 0.02.

d 0 an eigenanalysis of S to
What is A. 2? [Note: there is no need to
answer the question.] . le of your own
. h a numenca1examp .
Show, using symbols, and test wit XY)' . the transpose of XY,
devising, that (XV)' = Y'X'. _:'he;e
Show, likewise, that (ABC) - C B
i .
[N::e: these results will be

needed in Chapter 4.] The second eigenvector


. 1 on page 12 8. . 1 e and
Consider the numencal examp e the second eigenva u
of F =XX' is A2 = 45.285. What are
eigenvector of G = X 'X?
Chapter Four

Ordination

4.1. INTRODUCTION

Ordination is a procedure for adapting a multidimensional swarm of data


points in such a way that when it is projected onto a two-space (such as a
sheet of paper) any intrinsic pattem the swarm may possess becomes
apparent. Severa! different projections onto differently oriented two-spaces
may be necessary to revea! all the intrinsic pattern. Projections onto
three-spaces to give solid three-dimensional representations of the data can
also be made but the results, when reproduced on paper as perspective
drawings, are often unclear unless the data points are very few in number.
This definition of ordination may, at first glance, appear to contradict
that given at the beginning of Chapter 3. According to the earlier de~nition,
ordination consists in assigning suitably chosen weights to the different
. . . h " " can be calculated for
spec1es m a many-species commumty so t at a score d
. Th h adrats can be ordere
each quadrat (or other sampling umt). en t e qu . . 1
" d h
( ordinated") according to their seores, an t e resu lt is a one-dunens10na

ordination · · stems
. ·~ nt species-we1ghtmg sy ,
Often one wants to use two or more di ere . · are obtained.
. · al ordmat10ns
an d then two or more different one-dimenswn. h wn in Figure 3.1
Th . 1 b mbmed as s o . d
. ese separate results can convement Y e co . . h ve been combme
1l1 Ch ·
apter 3, where two one-dunens1ona· 1 ordmat10ns ª
133
134
ORD1NAl1a~
to give a two-dimensional ordin~tion in which every ?ºint (quadrat) ha
1·t coordina tes two seores obtamed from the two dtfferent weighr s as
s
1eros. Obviously' if one were to use s different weighting
. .
systems (whlllg 8Ys,
ere .
the number of species), the result would be an s-d1mens10nal ordinar .8 is
.
swarm of data points would occupy an s- d1mens10na . 1coordinate fram ion , the
. h . . 1
by projecting the pomts ~nto. eac ~s m tum, one cou d recreate each of
eand

the ene-dimensional ord~nat10n~ y1elded by one o~ th~ chosen species.


weighting systems. More mterestmgly, one could proJect 1t onto one of h
two-dimensional planes defined by a pair · o f axes an d ob tam
· a two-ctun· te
en.
sional ordination. There are s(~ - 1)/2 such pla~es, hence s(s _ l)/
2
different two-dimensional ordinattons would be poss1ble. In practice, proba.
bly only a few of them would be interesting.
N ow let us consider how species-weighting systems can best be devised.
Clearly, if they are chosen arbitrarily and subjectively, there are infinitely
many possibilities. What is required is an objective set of rules for assigning
weights to the species. A way of arriving at such a set of rules is simply to
treat every species in the same way, and conceptually plot the data in
s-space; the result is the familiar swarm of n data points in which the
coordinates of each point (representing a quadrat) are the amounts it
contains of each of the s species. One then treats the swarm as a sin~e
entity and adapts it (e.g., by one of the methods described in the following)
in a way that seems likely to reveal the intrinsic pattem of the swarm, if it
has one, when it is projected onto visualizable two or three-dimensional
spaces.
Many methods of adapting a swarm of raw data points have been
invented and sorne of them are described in succeeding sections of tbis
chapter. What unites the methods is that each amounts to a technique for
adapting raw observational data in a way that makes them (or is intended to
make them) more understandable. Since the initial output of each method is
an "adapted" swarm in s dimensions, we again have n data points eacb
with_ s coordinates; each coordinate of each point is a function of the
spec1es qua~tities. in the quadrat represented by that point.
The relattonship of the two definitions of ordination should now be cle~r.
When the n columns of an s-row data matrix are plotted as n points 111
s-space, the patt~m of the swarm can be changed by assigning diff~reo!
sco~es to the ~pec1es (equivalently, by multiplying each row of the matnx bY
a different we1ghf1n f t ) 0 . und,
g ac or · r, looking at the process the other way ro
t he swarm as a whol b . ore
e can e modified (to make its interna! pattern 111
¡NfflO oucr10N ns

erceptible) and, provided this is _done appropriately, the effect is to


cJearlY P t weights
. differen .. to the several spec1es.
giveI 15
. worth notlcmg the. parallei
. between ordinating
. a swarm of dat a
t
iJltS an d drawing a two-dimenswna]
h h b. geographic
. map of the whole earth or
poJarge par t of it. In both cases
. t e o llject is to .representa
. pattem in a space
a dimensions t at 1t actua y occup1es, while at the same time
1 of fewer 8 muchas possible of the infonnation it contains and (sometimes)
retauung;stortion to a minimum. (Distortion is not always abad thing; see
keeping
9
page i o.) apher's problem is, of course, much simpler than the ecologist's
The geogr apher always starts with a visualizable pattem in only three
· e the geogr · liz bl ·
swc
· .
s10ns, wh ereas the ecologist starts with an unv1sua a e pattern h m Th s
diJI1en . 8 • s is often Jarge, and differs from one case to anot er. e
diroenswnis ' the same, however. And just as one can choose among a large
princíple

/
/

( b)

(a) . t. ns illustrating the


. different map ProJeCee-dimens1on
10 . al globe,
America usmg art of the thr .
Figure 4.1. Two maps of s.o uth dun·ensional map of p f an s-dimens10nal d [The swarm of data
map
·
Parallel between constructmg a two- .
(an ordmatIO· n) o the techroq . t 0°w in (a )
· ue use ·
·
and constructing a two-dunensi ·onal map ngly intluence d by tral roen·a· an 1s a 6
.
Pomts. . d 15
The result obtame . very stro
hi (equaton"al)·' the cen 1
Projection for both maps is stereograp e
and at 120ºW in ( b ).]
ÜRDINAl
136 I()~

number of map projections when drawing a geographic map (see Figure 4


examples) so one can choose among a large number of ordin . .1
for two ' . Th . ati00
. when ordinating ecolog1cal data. e ments and drawba k
tech mques e s 0¡
the various ordination methods have been debated for years and the debate
is Iikely to continue.
The motives for drawing geographic maps are, of course, multifarious·
show climatic data, geological data, bird migration routes, shipping rou~:o
ocean currents, population densities, and so on; the list is endless. But th~
motive for doing ecological ordinations is always the same, namely, to revea]
what is hidden in a body of data, and it is at this point that the parall~J
between ecology and geography breaks down. A map of South America like
that in Figure 4.lb, although obviously "wrong" in a way that would
require severa! paragraphs to define precisely, would not mislead a
sophisticated map reader. This is because the true shape of South America is
thoroughly familiar, and if the distorted two-dimensional version <loes raise
any problems, one can always inspect the ultimate source, a three-dimen-
sional globe. An ecologist does not have these escape hatches: the ecologist's
data are always unfamiliar, and (except in rare cases when s ~ 3) inspection
of the source data is impossible.

4.2. PRINCIPAL COMPON ENT ANAL YSIS

Principal component analysis (PCA) is the simplest of all ordination meth·


od~. ~he data swarm is projected as it stands, without any differential
weightmg of the sp.e~ies, onto a differently oriented s-space. Equivalently,
the axes of the ongm~l coordinate frame in which the data points are
(conc~ptually) plotted is rotated rigidly around its origin. This rotation is
done m such a way th ª,t re¡ative
· to the new axes the pattern of the data
swarm shall be, colloquially speaking, as simple as' possible.

Ordinating an "Unnatural" Swarm With


a Regular Pattern

Before defining the phrase "a . . . a1


terms, it is instruct· .s simple as poss1ble" exactly, in mathemauc
ive to contmue th d.1 · I tbe
account .of PCA that f e scuss10n at an intuitive level. n
· ht pomts
. at the 0 11ows we env · · tbe
eig isage, as a swarm of data points,
.18 corners of a cub 1·d " rJll
utterly unnatural · · ° or box." The fact that such a swa
is irre1evan t '· u nnatural assumnfl()TI" ~TP \T~lW:lhle 1·¡ tbe
coMPONENT ANALYSIS
pRINCIPAL

137
argumen t easier to comprehend s b
ake an . · u sequently (
J1l d es devised for analyzmg "unnatural" d t page 142) the
oce ur a a (the com f
pr lied to more believable data swarms that i ers o a box)
are app , s, ones that are diffuse and
irregular. .
e on
sider the eight pomts at the corners of the box . p·
. . . . 10 igure 4.2a (only
of the comers are V1Slble m the d1agram since for th k .
seven . ' esa e of clanty
the box is s~own as an opaq~e solid). The center of the box is at the origi~
of the coordmates. If the pomts were present alone, without the edges, and
were Pro1ected
J
onto the plane defined

by the x 1- and x 2-axes (the xv x
plane), there w~uld be a confusmg pattern .ºf points with no irnmediately2
0 bvious regulanty; the same would be true if they were projected onto the
x plane, or the x 2 , x 3 plane. However, if the box were rigidly rotated
~~o:i the origin of the coordinates until it was oriented as in Figure 4.2b ,
and its comer points were then proJected onto the three planes, each

::x¡2
1 s
1

R
- - - - - :X:¡

( b)

G- - - - - - - yl
R

p 1 Q
1

: . two different
.1 coordina te traro e in. entation that
.
F· h dimensional .on to an on s are
l~e 4.2. A box (cuboid) plotted in ~ t ~ee~t appears aiter. rotaU of the box's co~e~epth
onentations. In (a) the box is oblique; m ( ) The coordinates ·dth height, an
bnngs
· its edges parallel with the coord.ma te . axes.
the Jower grap b· Tbe w1 •
~enoted by xs in the upper graph and b~ ys 1ll
f the box are PQ, QR, and RS, respect1vely.
138

. . Id show one of the three faces of the box as a rectangl


pr0Ject10n wou Id b d. e; the
"true" pattern of the points in three-space wou e isplayed as clearly as
"bl d the fact that they formed the corners of a box would be
poss1 e an come
obvious. . .
We now describe the task to be p~rformed m ma_thematical terms. To
repeat, we wish to rotate the box relativ~ t~ the coordmate f:ame or, Which
comes to the same thing, rotate the coordmate frame relative to the box
However, this sentence does not specify in operational terms exactly wha;
needs to be done (unless, of course, one were to do the job physically, with a
wood and wires model). The actual operation to be performed consists in
finding the coordinates in three-space of the corners of the newly oriented
box as shown in Figure 4.2b from a knowledge of their original coordinates
as in Figure 4.2a. We denote the original coordinates by xs and the new
coordinates after the rotation by ys. The axes in Figure 4.2a and 4.2b are
labeled with xs and ys accordingly.
The original coordinates (three for each of the eight points) form the
columns of a 3 X 8 data matrix. Therefore, to rotate the box, it is necessary
to premultiply the data matrix by a 3 X 3 orthogonal matrix (see Section
3.2) that will bring about the rotation required. The problem, therefore,
boils down to finding this orthogonal matrix.
To see how it can be found, notice that the projections of the width,
height, and depth of the box (the lengths PQ, QR, and RS, respectively, in
Figure 4.2) have their true lengths only when they are projected onto axes
parallel with the edges of the box, that is, onto the axes of the coordinate
frame when it is oriented as in Figure 4.2b. Given this orientation, the
projections of the edges on the axes can be seen to be as follows:
Projection of edge PQ ( PQ on the Ji-axis
( and the other three = o on the y -axis
2
edges parallel with it) O on the J3-a~s;

Projection of edge RS (O on the ri-axis


( and the other three = RS on the y2-axis
edges parallel with ¡t) o on the y 3-axis;
Projection of edge QR
( and the other three =

0
on the y 1-axis

edges parallel with it) QR on the Yr~s


. on the y 3-axis.
But given an 0 bli
que orientation lik · rY
' e that m Figure 4.2a for instance, eve
MPONENT ANALYSIS
NorAL co
pR I
139

dge has a nonzero lengths. on ali three axes, and ali these projections are
"true"projection
e
1 the . · f h ·
ess tbantherefor e requue a rotation o t e coordmate
. . frame that will cause
we d e of t he box to have a .nonzero
. h
PIOJection (equaJ to its true length)
. Only, and zero .proJectlons on t e other two axes.
each e gaJCIS
0
. This require-
o onet spec1fies m
. . mathematical terms exactly what the desued rotation is to
men
acbieve. . the next stages of the discussion, considera numerical examp_le
graphica1 representation. The box to be rotated 1s the oblique box m
andToitsclanfY.

f2
1

(a) 1
1
1
A 1 e

( b)
A
E

------yl

e
1

G 1 i· ue
1 • an ob iq
1

in Table 4.1. (a) X in tbe tabl~;,


The boX 1.11 (b) The
~ ~
'.
are cnven corners
. dinates ts of the matnxoord;.,ates of· ular to the
Figu,,, 4.3. The box whose_ coor ers are the elernen ate axes; tbe eare perpendic tbe widtb
Position· the coordinates of its coro 'th the coordin d y3-axes . therefore,
º'
b rotated' until its edges are parallel
are the elements of Y in the table.
P
Wl h xb~
- an tb page,
Observe that t. e plan< ol eoreshortenuig
ACGE 1s m t without f
.

Iane of the page. Hence m· ( b ) the tace


f the box are shown
d(Af:)"" 12 and height d(AC) = 10 0
140

41 DATA MATRICES POR THE POINTS (CORNERS OF A B


~~~~URE 4.3a.(MATRIX X) AND FIGURE 4.3b (MATRIX Y). O)()

The data matrix X.ª


A B e D E F G lI

X=
(-4.04
7.07
-8.66
1.41
1.73
7.07
-2.88
1.41
2.88
-1.41
-1.73
- 7.07
8.66
-1.41
4.04
-7.07
3.26 o -4.89 -8.16 8.16 o 4.89 -3.26
The matrix y giving the coordinates of the points as shown in Figure 4.3b ,
af ter rotation of the box.
A B e D E F G H
y=(~
6
5
6
-5
6 -6 -6 -6
-5 5 5 -5 -6)
-5
-4 4 -4 4 -4 4 -4

ªThe capital letter above each column is the label of the corresponding point in Figure 4.3a.

Figure 4.3a. The three coordinates of its eight comer points are the elements
in the columns of the 3 X 8 data matrix X shown in Table 4.1. (The reason
for choosing these coordinates becomes clear later.) In Figure 4.3 (in
contrast to Figure 4.2) the three-dimensional graphs have their third axes
perpendicular to the plane of the page. Therefore, what the drawing in
Figure 4.3a shows is the projection of the oblique box onto the x 1 , x 2 plane.
The size of the box can be found by applying the three-dimensional forro
of Pythagoras's theorem. Thus d(AB), the length of edge AB whichjoins the
points A = (xn, xw x 31 ) and B = (x 12 , x 22 , x 32 ), is

2
= /( -4.04 + 8.66) + (7.07 - 1.41-) 2 + (3.26 - 0) 2
= 8.
Likewise,

d(AE) = /(xll - X15)2 +(x21 - X2s>2 +(x31 - X35)2

2
= /( - 4 .o4 - 2 .88) + (7.07 + 1.41)2 + (3.26 - 8.66)2
= 12.
In the same wa .t
y, 1 may be found that d(AC) = 10.
AL coMPONENT ANALYSIS
pRrt•.iCIP

,...r w supposethat the box is rotated 141


1~º ~w·
dínate axes. Let the rotation brin t lts edges are
'ºº~re 4.Jb. The width d(AE) and hei~h~~ box into the p¿~:•llel With the
figthe rotatíon and are still 12 and lO u . (AC) of the box ion shown in
bY lllts, res · are uncha
i.; h cannot be seen because it is at right Pectively· its d h nged
w1vC . · angles to h ' ept d(AB)
stil l 8 units. It 1s easy to see
. that the coord"mates of tthe plane of the page 18. '
newly oriented box are given by the columns of the 3e corner points of ~he
. Table 4.1. Therefore, we now need to fi nd the orthoX 8 matrix y shown
in
which gonal matrix U for

UX=Y.

To do this notice first the form of the product of y . .


transpose Y'. It is the 3 X 3 matrix postmultiplied by its

YY' =
(
288
O
o
o
200
o
o '
125
l
a diagonal matrix. 1t is diagonal because the points whose coordinates are
the columns of Y form a box that is aligned with the coordinate axes. (This
last point is not proved in this book; it is intuitively reasonable, however,
and should seem steadily more reasonable as the rest of this section
unfolds.)
Since we are to have UX = Y, we must also have
(4.1)
UX(UX)' = YY'
t ·x product UX. Now
where (UX)' is the 8 X 3 transpose of the 3 X 8 man
, X'U' Thus (41) becomes
recall (from Exercise 3.9) that (UX) = · ·
(4.2)
UXX'U'
-,..._
= yY'.
. . of the same forro as
Next observe that both XX' and YY' are SSCP matnces d synunetric. We
R. f rse square an
in Section 3.3. Both matrices are, 0 c~u ' Then (4.2) becomes
use the symbol R for XX' and denote yY by Rv· (4.3 )
URU' = Ry· I

. It is ciear that U, U ,
Finan ·hE ation (3.lZ).
Y, compare this equation w1t qu
142

TABLE 4.2. THE EIGENANALYSIS OF R = XX'.ª

The SSCP matrix is


205.21 - 65.21 3.74)
R = XX' = _ 65.21 207.89 -46.06 .
( 3.74 -46.06 202.25
The eigenvectors of R are the rows of U where
-0.57735 0.70711 -0.40825)
u= - 0.57735
( 0.57735
o 0.81650 .
0.70711 0.40825

The eigen:~e:~~ ~e( ;e8no~:º elr)en~ :~~w;,e


o o 128
The coordinates of the box's comers when it is oriented as in Figure 4.3b
are given by the columns of

UX=Y=(~ ~ -~ -~ -~ -~ =~ =~)
4 -4 4 -4 4 -4 4 -4

ªX is the data matrix defining the comer points of the box in Figure 4.3a.

and R v can be obtained by doing an eigenanalysis of R; the nonzero


elements of Ry which, as we have seen, are ali on the main diagonal are the
eigenvalues of R.
Table 4.2 shows the outcome of doing an eigenanalysis of R. As may be
seen, the eigenanalysis gives the eigenvectors of R; these are the rows of U.
U is the orthogonal matrix we require. The product UX gives Y, the matrix
of coordinates of the comer points of the box in Figure 4.3b, which has
been rotated to the desired orientation with its edges parallel with the
coordinate axes.
Thus the original problem is solved. To summarize: the solution is found
by doing an eigenanalysis of the SSCP matrix R = XX' where X is tbe
original data matrix.

Ordinating a "Natural", Irregular Swarm

N ow consider th li · ar111s·
S e app catwn of this procedure to "realistic" data sw ·
uppose one had an s x d . cíes JU
n ata matnx X listing the amounts of s spe
L coMPONENT ANALYSIS
pfllfloJCIPA

drats (or other sampling units) and th 143


n ~ua represented by a swarm of n points . at these data are th
beJJlg Th d in s-spa OUght of
. regular. e proce ure for perfor . ce; the swarrn. . as
and 1r b 1 d mmg a PCA lS d1[us
nows (the sym o s use are the same, and h on such data is e
fo in Section 3.3 and Table 3.6. A num . ave the same rneanm· as
rhose . eneal exa 1 g, as
data IIlatrix is shown m Table 4.3). mp e using a 2 ><
11

1. Center theh
data by species (rows). Do tbi b
f h s y subtracting f
eleIIlent in X t e mean o t e elements in the same r rom every
data IIlatrix XR. ow. Call the centered
2. Forro the s X s SSCP. matrix R = XRXR.
3. Forro the s X s covanance matrix (l/n )R As w h .
· · · e s a11 see, tbis step
is not strictly necessary, b ut 1t 1s usually done.
4. Carry out ~ eige~analysis o~ R or (l/n )R. The eigenvectors of these
two matrices are 1dentical. Combme these s eigenvectors, each with s
elements, by letting them be the rows of an s X s matrix u; u is orthogo-
nal. Tue eigenvalues of R are n times those of (l/n )R; hence it is
immaterial whether R or (1 / n )R is analyzed. Let A denote the s x s
diagonal matrix whose nonzero elements are the eigenvalues of the covari-
ance matrix (1/n )R. Then [compare Equation (3.12)]

It follows that

URU' = nA
the eigenvalues of R.
and the nonzero elements of n A are tnx· y= uxR. Each
· g the s X n roa f
5a. Complete the PCA by f ornun . f of the data points. l
oordmates 0 one th
column of y CY1ves the new set o f s e . ·t is found that e
O" coordmates, i h
the points are plotted using these new . h nged. The only e ange
.
pattem of the points relat1ve to one
nother 1s une
ª . ª
.t has been rotate ar
d ound
a smgle entl Y
produced is that the whole swarm as a·nate frame.
· f h ew coor 1 . · 1swarm
tts centroid, which is the origin o t en . Table 4.3. The ong~na 4 4a·
· h sults m · figure · '
Figure 4.4 shows graphically t e re f X is plotted in 1 nts of
of . the elements o t the e eme
Pomts whose coordinates are h as coordina es f the whole
the transformed swarm which after pCA as the centroid o
y . ' ay be seen,
' is plotted in Figure 4.4b. As ro
TABLE4.3. THE STEPS IN A PRINCIPAL COMPONENTSANALYSIS OF DATA MATRIX #9.

The 2 X 11 data matrix is


X =(20 26 27 28 31 33 39 41 42 48 50)
50 45 60 50 46 55 35 25 33 24 28 .
The row-centered data matrix obtained by subtracting x1 = 35 and x 2 = 41
from the first and second rows of X, respectively, is

Xa = (-15
-9 -8 -7 -4 -2 4 6 7 13 15)
9 4 19 9 5 14 -6 -16 -8 -17 -13 .
The SSCP matrix is
( 934 -1026)
R = -1026 1574 .
The covariance matrix is
_!_ R = (
84.9091 -93.2727)
n -93.2727 143.0909 .
The matrix of eigenvectors is
u
= ( 0.592560 -0.805526)
0.805526 0.592560 .
The eigenvalues of the covariance matrix are the nonzero elements of
A= ( 211 704
0 16~95 ).
The transformed data matrix ( after rounding to one decimal place) is
y=( -16.1
-6.7
-8.6
-4.9
-20.0
4.8
-11.4
-0.3
-6.4
- 0 .3
-12.5
6.7
7.2
-0.3
16.4
-4.6
10.6
0.9
21.4
0.4
19.4)
4.4
145

a~.x"
F1gur~.
bown by a cross), which is X2) = (35, 41) in Figure 4.4a, has
swaIJTI (\ed to the ongm at (Y1, Y2) - (O,?! m 4.46, and the swann
been sJuf has been rotated so that its I_ong aius is parallel with the
as a .whole
bis statement is expressed more prec1sely later).
.
1
(t
•aJ(!S Sa describes PCA as a process of rotatmg a swann of points
.
paragraph centr01.d · lt is instructive to rephrase the. paragraph, calling it Sb,
ar0
that
und its1·t descn.b es the process as one of rotatmg the coordinate frame
so .
elat1ve to the swarm.

X2

60 •

50 • •
• •
40 +
(a) ••
30

• •
20

10

o 30 40 50 60 X¡
o 10 20

Y2
20

(b) 10

• • •
Y1
-20
-20 -10
• •
• - 10

d data whose
- 20
. inal un transforme warm.
( b) Af ter
The ong .d of the s has been
igure 4.4. Two plots of Data M4ª3trixThe # 9. (a)marksarm's
cross the centrmd 'fbe
centro1 .are swarm
given by Y in Table
eordmates
. .
are given
. . of tby Xm · Tabledinates
· · . at al
1s
be ne w coor. t measure d o
thengswaxes Y1 an d Y2•
A. The ongm
tated. The coordinates o f each pom ,
.3.
146
ORD1t-.¡Al
I()~

5b. Forro the matrix product Y= UXR as already directed 1'


the original data swarm after centenng · the raw data by rows· ·th henp¡
01
nates are the elements of XR (see Figure 4.5). We now wish ~o e coorct¡,
. . Thi . . 1 . rotate h
axes rigidly around the ongm. s 1s eqmva ent to drawmg new te
and y2-axes) which must go through the origin and be perpend~Xes (the
Y1 1 . h icutar t
each other in such a way that, re ative to t em, the points sh ll o
ª have
· to find the equaf
coordina tes given by Y. The pro blem, th eref ore, 1s
. h ionsor
the lines in the x 1 , x 2 coordmate frame t at serve as these new axes.

This is easily done. Note that any imaginable point on the Ji-axis h
. . as a
coordinate of zero on the y1-axis, and vice versa. Hence the y1-axis is thes
of ali imaginable points, such as point k, of which it is true that et

Indeed, the set of all points conforming to this equation is the y1-axis, and
its equation is

To draw this straight line, it is obviously necessary to find only two


points on it. One point is the origin, at (x 1 , x 2 ) = (O, O). Another pointcan
be found by assigning any convenient value to x 1 and solving for x2• In the
numerical example we are here considering, for instance, let us put X1 :::: lü.
Then

-0.805526
0 .592560 X lO = - l3 .5 9 .

Hence two points that define the Y1 -axis are

(x1, x2) = (O, O) and (x 1, xi)= (10, -13.59).

This li~e is shown dashed in Figure 4.5 and is labeled .Y1 · The Yfa,Xis ¡,
found m the same way. It is the line
coMPONE NT ANALYSIS
r.JClpAL
pfl J
147

Figure 4.5. An:other V:ªY of portraying the PC~ of Data Matrix # 9. The points were plotted
using the coordinates rn the centered data matnx XR (see Table 4.3) with the axes labeled x1
and Xi- The new axes, the y 1- and y 2 -axes, were found as explained in the text. Observe that
the pattern of the points relative to the new axes is the same as the pattern in Figure 4.4b.

We have now done two · PCAs, the first of the comer points of a
three-dimensional box, and the second of an irregular two-dimensional
swarm of points. It should now be clear that, in general (i.e., for any value
of s), doing a PCA consists in doing an eigenanalysis of the SSCP matrix

or of the covariance matrix

where Xa is the row-centered version of the original data matrix. . f


. f h data pomts rom
Then one can either (1) find new coordmates or t e n
the equation

y= UXR,
Whe . he eigenvectors of R and
re U is the orthogonal matrix whose rows are t
(1 /n)R·
a· ate axes from the
h
' or (2) find the equations of t e ro ª
t ted coor in
148

equation
Ux =O,

1 ment column vectors


where x an d Oare the s-e e

x= and o= rn
. Ux = 0 denotes s equations of which the i tb is
and the equat10n

The ith axis is defined by the s - 1 simultaneous equations u~x =O with


k = 1,2, .. .,(i -1),(i + l), ... , s.

Figures 4.4 and 4.5 amount to graphical demonstrations of alternatives


(1) and (2) when s = 2.
Regardless of how large s is, there are s mutually perpendicular axes,
both for the original coordinate frame and the rotated frame. The new axes,
namely, the y 1-axis, the Yrax.is, ... , and the Ys -axis, are known as the first,
second, ... , and s th principal axes of the data. The new coordinates of the
data points measured along these new axes are known as principal compo·
nent
IS seores. For example, the ith principal component score of the jth poinl

Y;J = U¡1X11 + U;2X2¡ + ... + U¡sXsj·


Thus it is the weighted d
by species m ) sum of the quantities (after they have been centere
eans of the s s · · the
weights. After PCA . pecies in the jth quadrat. The us are
. each pomt h . · f ach
spec1es in a quadrat b . as as coord1nates not the amount o e
quadrat. ' ut vanously W~ighted sums of all the species in the
The term princi
1
pa cornponent d po·
nent score for any d . enotes the variable" the principal coJIJ
data is ata pomt"; hence the ith principal componen! o! the
oMPONENT ANALYSIS
..i[IPAL C
p~lr•
149
al step in an ordination by PCA th
'[he fin d . . ' e step that b
CA to be interprete ' is to mspect the pattern of ena les the result of
aP projected onto planes defined by th the data points when
tbeY are d . e new t
. . al axes). The ata pomts can be proiect d ' ro ated axes (the
nnc1p 1 . J e onto the
P Jane the Yi, Y3 P ane, and, mdeed any pl Y1' Y2 plane, the
y3 P ' ' ane spe ·fi
J'P. rnetiroes helpful to look at a perspective d . ci ed by two axes.
U s~uck 0
in cork-board that shows the pattern ;~: ~ or solid model of
15 1
5
ª
pins 0f the principal axes, usually the y y and e ata points relative to
111ree . i, 2, Y3-axes.
We rnust now define prec1sely the consequences f .
. 1 d
. gular s-dimens10na ata swarm so as to make its P . . o rotatmg a diffuse,
irre . . (. . fOJections onto spaces
0f fewer dunens10ns m practice, onto spaces of two or thr d' .
'bl ,, h ee unens10ns)
"as simple as poss1 e or, per aps more accurately " lin
. , as revea g as
possible." Rec~ that .1f one does ~ PCA of the comer points of a box-and
it can be an s-dllllens10nal box w1th s taking any value-the pattem of the
points when projected onto any of the two-dimensional planes of the
rotated coordinate frame is always a rectangle. What can be said of
the projections onto different two-spaces, defined by principal axes, of a
diffuse swarm?
The answer is as follows. Consider the new SSCP matrix and the new
covariance matrix calculated from the principal component seores after the
PCA has been done and the coordinates of the points have been converted
from species quantities (xs) into principal component seores (ys). These
matrices are

Ry = YY' and

respectively
I · · 3) d Table 4 2 (pages 141
t should now be recalled, from Equat10n (4· . an . · s. all their
and 142), that both Ry and (1/n )Rv are diagonal mNatnce ~e already
· d.iagon al are zero. ow,
elernents except for those on the mam
know [see Equation (3.9), page 106] that

cov( Y1' Ys)


var(Yi) cov ( Ji, Y2 ) )
l cov(yz, Ys
;Rv = c?~~~2.' !:~ .. ~ª:~~2) ........... · · · · · ·.
var(Ys)
cov( Ys , Yi) cov( Ys , Y2 )
()~[)¡~
A.t1r,
150 11 . . . .
h t the COV an .anees of a paus of Principal C()O)
1 tberefore_ follows ª
. t her words, they are all uncorrelated . with ne: anP(inl
0 111
t
seores are zero; .in otuence of a PCA, however, . .1s the
1 .following
. . 1t (;·ci11ilr
The chief conseq 1977) that the first pnne1pa axis is so orient <tn bé
d( Pielou . · 1 Cd et
prove see e.g., . f ' the n first-pnnc1pa -eomponent . seores as· grc· \ l¡i
make the vanance o seores Yn• Y12• ... ' Yin• th~ c?ordmates of the n~t. a1
possible; these are the the y -axis (the first pnnc1pal axis). In COll d&1a
Points measured along h aJs is oriented in such a way that Whe oqu1a1
. ns that t e n the
terms, this mea . d nto it they have the greatest possible dis . n
data pom . ts are proJecte o Pers,
00
d"
or The
"sprea
second · pnnc1pa
. . 1 axis is so oriented 1as to make the variance of th en
. . 1-component seores (the va ues Y21,
nd-prmc1pa d Y22' . · · ·, Y2 n) as great.a~
seco. .
oss1ble subject o t the restriction that the seeon . 1 . must
axis f be perpenct¡cu.
P '
lar to (synonymous ly' orthogonal to) the first axis. t is ounct that the fi rst
and second-principal-component seores (the Yi s and Y2S) are uncorrelated
(have zero covariance). . . . . . .
Likewise, the third pnnc1pal axis 1s so onen ted as to make the vanance of
the n third-principal-component seores as great as possible, subject to the
restriction that it must be orthogonal to both the first and second axes. This
third set of seores is uncorrelated with either of the other two.
And so on, with each succeeding set of seores accounting foras muchas
possible of the remaining dispersion. Ali the new variables (the principal
component seores) are uncorrelated with one another.
The orientation of the final, s th, principal axis is fixed; it must be
orthogonal to the other s - 1 principal axes.
Returning to the numerical example in Table 4.3, it is easily found thal
the covanance matnx' of th · · .
e pnnc1pa1 component seores 1s

1 1
-Ry::::: -vY' = ( 211.704
n n O
16~95) =A.
Thus the Vari.ances of the . . the
eigenvalues of th · .
eu covanance Pnncipal
. eomponent seores are equal to
I t is. mtuitively
.
.d matnx.
greatest "spread" of the o·ºlll in~pection of Figures 4.4 and 4.5 t a Jso
evi ent fr · h t the
tlhakt, although !he data p mts is in the direction of the Y1-axis, and ªhen
oo ed at · Points sh . ·nw
lil the frame of th ow obVIous negative corre1at10 ·on
ex 1 an d Xi-axes (Figure 4.4a ), this. correlatt
coMPONENT ANALYSIS
..1(1PAL
p~I,,

151
. es when the points are plotted in the
11a111sh 4 4b ). fra:rne of the Y a d
f1gure · . . 1 n Y2-axe
( CA as here descnbed IS often used a s
p . . s an ordin .
. aJ work. Such an ordmation is a "success" .f ation method in
¡ogic . . I a lar . eco-
dispers1on (or scatter) of the data Is par ll . ge proportion of th
rot al . a e1 With th fi e
. cipal axes; for then this large proportion f h . e rst two or three
pru1 . . li b . o t e mfor .
. the original, unvISua za le s-dunensional d t mation contained
J1l a a swarm b
Spa ce or three-space and examined This · h can e plotted in
1wo· . · Is w at d. .
s out to acbieve: the data swarm is to be pro· t d or mation by PCA
se t . . ~ec e onto the t .
·onaJ or three-d1mens10nal frame (or frames) th t wo-dunen-
si a most clearly 1 the
real pattern of the data. When three axes are retained . reveas
. . . . ' as IS very often do
the result IS shown m pnnt either as a two-dimensional . ne,
· · f hr . . perspective (or
isometnc) drawmg o a t ee-dimens10nal graph or el .
. . . ' se as a tno of
two-dimens10nal graphs showmg the swarm projected onto the
Y1, Y2 p1ane,
the y¡ , y3 plane, and the Yi , y 3 plane, respectively.
The statement that such a two- or three-dimensional display of the
riginal s-dimensional data swarm reveals the real pattem of the data is
tuitively reasonable, but it is desirable to define more precisely what is
1
·~
p
eant by "real pattem." The observed abundances of a large number of
ecies co-occurring in an ecological sampling unit are govemed by two
ctors: first, the joint responses of groups of species to persistent features
f the environment; second, the "capricious," unrelated responses of a few
dividua} members of a few species to environmental accidents of the sort
at occur sporadically, here and there, and have only local and temporary
ects. In the present context the joint, related responses of groups of
· · " nd the
ec1es constitute "real pattem" or "interestmg data structure, a .
·. " . ,, (This is not to say that m
Pnc1ous, sporadic responses amount to no1se.
· they produce may not
er contexts, environmental accidents and the nms~ Gauch (I982b) that
ª researcher's chief interest.) It has been shown ~ . · ly a few
1 · f 0 rdmat10n, m on
PªYmg the results of a PCA, or indeed 0 any ely pennit an
e · d re than mer
. ns10ns (typically two or three) oes m? . d· it also suppresses
isualiz bl .
. a e s-d1mens10nal pattern
ise ''
.
f
° ..
t be v1sualize '
pnncipa1 co
mponents of t e
h
d
· This is because the first ew fiect the concerte
ª--those with the largest variances-nearly always re of species (hence
on · When a group 1t of
ses of groups of severa! spec1es. lik ly to be the resu
e · · · un e · do
/ºus mdividuals) behave in concert, It IS f t that manY species ~
izect, temporary "accidents." Moreover, the ac s of the environrnen
ect . t" feature
' respond in concert to the "unportan
(
~I

152 .
hole contains redundanc1es; therefore
t body as a w h ... · , the
means t ªh the . xes nee ed to display t e mterestmg
data d . structure,, of
num ber of coordmate ª
h s the to ta1 number of spec1es observed.
the ad ta is far Iess t an '.
. . dinat10n pernn·ts us to profit
. from. the. redundancy.lil
To sununanze. or d cy not much mformat10n 1s lost by rep
of redun an ' . . re.
fi eld data. Because . t ·n only a few d1mens10ns. And the discard d
of data pom s I . . e
senting a swarm . d d axes along which the vanances are srnall).
. formation (on the d1sregar e is
m . (Gauch 1982b). . . . .
mostly n01se h0 d of domg '. a PCA ordination descnbed . m this sect10n can be
The met
. d. .
nous ways as shown in the next sect10n. An example of its use
mod1fie m va . 1 described may be found in Jeglum, Wehrhahn and
· the way prev10us Y . . ..
m
Swan (1971). They samp led the vegetation m vanous . commuruties
. in the
boreal forest of Sas k a tchewan and ordinated theIT data and vanous subsets
of it using PCA.

4.3. FOUR DIFFERENT VERSIONS OF PCA

The method given in the preceding section for carrying out a PCA can be
modified in one or both of two ways.
First, one can standardize (or rescale) the data by dividing each element
in the centered data matrix XR by the standard deviation of the elements in
its row. The resulting standardizied centered data matrix ZR then has as its
(i, j)th element

as we saw in Chapter 3 ( 10 .
(l/n)Z Z' . ~age 7). The SSCP matrix divided by n [i.e.,
now ca R. da] is the correlation matrix (see Table 3.6, page 112). The PCA ~
rne out by domg a · . . . d
of the covanance
· .
matnx. n eigenanalys1s of the correlation matnx mstea
The seconct modification co . . .
analyzing Cl/n)X X' nsists m usmg uncentered data. Instead of
O R a as was do · s · XX'
f course' both th ese modifi ne. 1Il ection 4 .2 , one analyzes (l/n) ·
one can analyze the m t . cations can be made simultaneously. Thus
~efore
vers1ons f p
discussing ~~ (dl/n)ZZ' in Which the (i, j)th element is
a vantages d . · us
x/ª;·
. °
two-d1rne . CA, we eompare the an disadvantages of these. dvanto a
0
th ns1onal swarrn of 10 d resuits they give when apphe
e columns 0 f ata po· . are
Data Mat · lllts. The coordinates of the pomts
nx #10 · tbe
given at the top of Table 4.4. In
RfNT VERSIONS Of PCA
o ptfff
fOU" 153

A.BLE 4.4. FOVR DIFFERENT PCAS OF D


T . . ATA MATRIX #10.
fbe data matnx is
2 25 33 42 55 60
X= ( 62 65 92
20 30 13 30 17 42 27 25 99)
25 43 .
Unstandardized Uncentered PCA Unstandarct·ized Centered PCN
1XX'= ( 3644 1641) 1
; 1641 889 ;¡ XRXR = ( 781.9 132.3)
132.3 93.8
u= ( 0.906 0.423)
u =( 0.983 0.183)
-0.423 0.906
- 0.183 0.983
A = ( 4609 1~4) A= ( 806.4 o)
o
69.2
(Axes yí, y5. in Figure 4.6a) (Axes Y1, Yz in Figure 4.6a)
Standardized Uncentered PCA Standardized Centered PCA
!z;z1 =(4.66 6.06) !zRza=( i 0.488)
n 6.06 9.48 n 0.488 1
u= ( 0.561
-0.828
0.828)
0.561
u= ( 0.707 0.707)
-0.707 0.707
A = ( 13.593 O ) A= ( 1.488O )
o 0.548 o
0.512
(Axes yí', yí' in Figure 4.6b) (Axes y¡'", y{" in Figure 4.6b)

ªThis is the version of PCA described and demonstrated in Section 4.2.

separate sections in the lower part of the table are given, for each of the four
forms of PCA: (1) the square symmetric matrix to be analyzed, (2) the
matrix of eigenvectors U, and (3) the matrix of eigenvalues A.
The results are shown graphically in Figure 4.6. It should be noticed that
the effect of standardizing the raw data (as in Figure 4.6b) is to make the
Variances of both sets of coordinates equal to unity. Thus the dispersions of
the P0ints along the x / a axis and along the x 2/ ª2 axis are the same.
1
Standardizing the data t~erefore, alters the shape of the swarm; after
standard·IZation,
· the swarm ' IS
. noticeably
. less e1ongated th n it was before. ª
PCA Using a Correlation Matrix
Anaiysis 0 f ' . forro of PCA that is
frequ
enuy e
the correlation matrix (1/n )ZRZR IS
. 1 lit ture As as
ª
h already been
expJain ncountered in the ecologica era · d di ed centered
ed, the correlation matrix is obtained from the stan ar z
154

'h.'
• •

(a)
• ~-- ·-)· ····

. ··········· ......... y,

---+-----,=
75 00:- ::.X: I

y;'

.
,/ Y1'"

• .............·

(b)


1 2
4

Figure 4.6. Four versions of PCA applied . to Data Matnx · #10 (see the 4.x¡,4)x,
alon Table · ( axes. U•
) Unstan· ª
dardized data. The raw, nncentered coordmates are measured gd PCA shifts the ongJJ
centered PCA rota tes the axes in to the solid lines labeled YÍ, YÍ- . Cen teredotted lines y , y,. 1bl

PC~
to the centroid of tbe swarm (marked +) and rota tes the axes mto the the x ¡o,,x,/•1
1
Standardized data. The nncentered but standardized data are measured along d shifts th<
axes. Uncentered PCA rotales the axes into the salid lines y{', YÍ'· Centere
origin to the centroid and rotates tbe axes into tbe dotted lines y{" , y{" ·

11 . z•.. In what follows we discuss the ments


data matnx . of stan dardization
pros
t kin . f
ª g or granted that the data have first been centered by rows· The
anct cons of data centering is discussect in a subsequent subsection. jrabl•
In sorne
. thana1Yses
. · in
ºPtion
turn.
. . ·bly) des· .
· · standard1Zat1on of the data is a (possid .tuauo ns~
, o ers ll 1s a necessity. We consider these contraste s1
fERENT VERSIONS OF PCA
fOUR DI f
155

sean dardization is often . desirable


. as a
way of p .
tlJe uncommon spec1es m a community b reventmg the "swamping"
of vnJess data are standardized, the d Y_ the common or abu d
ones. al · Thi
,..,;nate the an ys1s. s happens beca
ºIlllnant .
spec1es are likely t
n ant
dow- . use the q .. o
·es tend to have higher variances (as ll uantities of abunda t
spec1 we as high n
nntities of uncommon species. Standardizat· .er means) than the
qull>' . h . ion equaliz ll
re ax.is rotat10n (t e analys1s itself) is c . es a the variances
be fo . arned out Th .
. bordinate spec1es to have an appreciable "' · us íf one wishes
~u . euect on th
o-ood idea to use standardized data. e outcome, it is a
º nªowever, this <loes not mean that standard·ization . .
is alw d ·
. a rnatter of judgment. It could well be argued th ays. esuable. It
J5 . at the domma t .
t to control the result srmply because they a d . n spec1es
oUgb re onunant F th h
is a risk that standardization may give rare speci · . ur er, t ere
. . es an undesuable prorni
nence; if their presence is due only to chance and · -
. . ' is not a response to an
environmental vanable of mterest, . they are merely " ·
n01se. ,, Therefo re
deciding whether to standardize
. . or not to standardiz·e en tai·¡s a trade-off'
between underemphaslZlilg . . and overemphasizing the less common spec1es. ·
A useful compronnse ~s to exclud~ truly rare species from the raw data, and
after that to standardize these ed1ted data for analysis (Nichols, 1977).
Standardization of the data must be done when the quantities of the
different species are measured in different units. In vegetation studies, for
instance, it is often convenient to record the amounts of sorne species by
counting individuals and of other species by measuring cover. When this is
done, the species quantities are obviously noncomparable in their raw form
and should be standardized before an analysis is done.
Data matrices whose elements are the values of noncomparable environ-
mental variables should also be standardized. Let us examine a particular
example in sorne detail.
Newnham (1968) did a PCA of a correlation matrix that showed the
pairwise correlations among 19 climatic variables measured at 70. weat~er
stations in British Columbia. Thus the data matrix had 19 rows m whi~h
· ·mum temperature m
were recorded such variables as average dailY IIl1Ill . d
· . . . th of frost-free penod an
wmter, average winter prec1p1tat10n, average 1eng . al · f
. h t t. n An -e1genan ys1s o
the like, and 70 columns one for each weat er s ª 10 · . b
th ' ere the correlat10ns e-
e 19 X 19 correlation matrix whose elements w . · Id d 19
' lim · anables y1e e
tween every possible pair (171 pairs) of e ath1c v_th i·genvector then
e1ge ) for t e z e '
nvectors. If we write (u. 1 , U¡ 2 , · • ·' U;,19
th 1. I, '
e th principal component is

Yl. = U.1, 1 Z 1 + U·I, 2 Zz + · · · +U· ¡9Z19


1
'
156

zs are the elements of the original data matrix after they h


where the . ave
d and standardized. N ewnham scaled the e1genvector ele
b een cen tere . ments
(the us) so that the !argest element of each was equal to umty (it is th
relative, not the absolute, magnitudes of the elements of an eigenvector tha:
matter, and one can choose whatever scale happens to be convenient).
The first two principal components-those with the largest eigenvalues--.
were as follows. Only the five terms with the largest coefficients (" weightq
are shown here:
y = 0.97z
3
+ z7 + 0.99z10 + z11 + 0.99z14 + 14 other terms;
1

Yi = z + 0.99z5 + 0.94z 13 - 0.73z 16 - 0.86z17 + 14 other terms.


4

The five most heavily weighted variables contributing to these two


principal components are as follows:
Contributing to y 1 :

z3 winter temperature: average daily maximum, ºF;


z7 winter temperature: average daily mínimum ºF·
' '
z10 fall temperature: average daily mínimum ºF·
' '
zu winter temperature: average daily mean ºF·
' '
Z14 fall temperature: average daily mean, ºF.

Contributing to y2 :

Z4 spring temperature: average daily maximum ºF·


Z5 summer temperature: average daily maximum' ºF·
'
Z13 summer temperature: average d il ' '
. a Y mean ºF·
z16 spnng precipitation: average m
. me
. hes· ' '
Z11 summer prec1p1tation:
· · '
average in inches.
These two com
the d ponents accounted f .
ata, respectively, for a tot or S7.4% and 29.4% of the variance 111
When we consider th . a1 of 86.8%.
two. pnnc1pal
· · components .t . th at are weighted most heavily in the firsl
e vanables
pomts) wh ere wmters
· 1
and' fallis seen th ªt the weather stations (tbe data
component seores, and the sta/ are mild have the highest first principal
and summers are bot a.Ild
10ns where ·
spnngs
RENT VERSIONS OF PCA
ll 01ff E 157
f0 U

. e the negative coefficients of z16 and z17 ) have the highest second
' (nouc
dr~ . al component seores. Thus we can draw the two-dimensional coordi-
princifpame shown in Figure 4.7a and label the four regions into which the
nate r . d . . .
divide it w1th a two-sentence escnpt10n of the climate: the first
atesten ce puts into words the meaning of high and low values of the first
sen
. ·pal component, and analogously for the second sentence.
pnDCl

Axis 2
(a)

l. Cold wlnters l. Mlld wlnters


2. Hot dry summers 2. Hot dry summers

- - - - - - - - - + -- - - - - - --Axis 1

l. Cold wínters l. Mild winters


2. Cool wet summers 2 . Cool wet summers

Axis 2

(b) 5

•• •
• • •••
••• • • ••
••
+• •
• 8
-5 +
• +
+
•• º~~ 5
Axis 1
o
++ ++ + o
+ ++ o
+
+ o
+ o
o o o o
+ + o o
+
-5 o o

. . h .ons in tbe ordination of 70 Britisb


Figure 4.7. (a) A "qualitative" graph labeling t e reg1 bt .ned from a PCA of climatic
Co¡ b. · ( b) The first two axes o ai
d um_ ia weather stations show~ m
.
· . Axis se arates the stations into those w1tb
1
ata d1v1de the coordinate frame mto four reg10ns. . p . t tbose with hot dry summers
rnild · tes the stat1ons m o
and those witb cold winters. Axis 2 separa . . al ordination by PCA of tbe
anct th . ( b) A two-d1mens1on '
corr l ·
ose Wllh cool wet summers.
,, · The symbo1• eno
d tes a station where ponderosa
. e ation matrix, of the 70 weather stat10ns. . . The two species are never found
i~ne occurs; O denotes sitka spruce; +, ne1tber spec1es.
gether. (Adapted from Newnham, 1968.)
158

The scatter diagram in Figure 4. ?b is a two-dimensional ordination f


. . db . o the
70 weather stations. Each stat10n 1s represente y .ª pomt having its fir
and second principal component seores as coordmates. Three diA- st
. . uerent
symbols have been used for the pomts: one for stat10ns where Sitka s
. Pruce
(Picea sitchensis) occurs ; one for stat10ns where ponderosa pine (Pin
ponderosa) occurs; one for stations where neither tree species is found.:
one would expect from a knowledge of these trees' habitat requirements an~
geographic ranges, Sitka spruce occurs predominantly at stations With
marine climates (mild winters and cool, wet summers) and ponderosa Pine
at interior stations with hot, dry summers.
The figure demonstrates very well how an ordination by PCA can clarify
what was originally a confusing and unwieldy 19 X 70 data matrix. Two
coordinate axes (instead of 19) suffice to portray ~ large proportion (86.8%)
of the information in the original data, and concrete meanings can be
attached to the seores measured along each axis.
This last point deserves strong emphasis. Ordinations, especially of
community data, are often presented in the ecological literature with the
axes cryptically labeled "Axis l," "Axis 2," and so on, with no explanation
as to the concrete meaning,. the actual ecological implications, of these
coordinate axes. Without such explanations the scatter diagrams yielded by
ordinations are uninterpretable. As Nichols (1977) has written, "The primary
effort in any PCA should be the examination of the eigenvector coeffi.cients
[to] determine which species [or environmental variables] combine to define
which axes, and why."

PCA Using Uncentered Data

The great majority of ecological ordinations are done-with centered data but
this is not always the most appropriate procedure. Sometimes it is preferable
to ordinate data in their raw, uncentered form. The reason for this will not
become clear until we consideran example with more than two species, an~
ª
?ence data ~warm occupying more than two dimensions. First, however, ~
is worth looking at the results of doing both a centered and an uncentere
PCA on the same, deliberately simplified two-dimensional data swaflll
chosen to demonstrate as clearly as possibie the contrast between the tWO
methods.
eons~'der Figure 4.8. Both graphs show the same seven data ~ oíntS ·nal
plotted m raw form in the frame defined by the X1 and X2-axes. Tbe ong1
DlfffRENT VERSIONS OF PCA
fOLJR 159

(a)
( b)
:X:

"•......... •
.. ..
)('
,
,
·-_,
'',,,

/ "
""" y~
Figure 4.8. Bo~ grap~ s~ow a row of seven data points plotted in an x 1 , x 2 coordina te frame.
(a) The dot~ed lin;s Yi an,d Y1 are the first and second principal axes of a centered PCA; ( b)
the dashed lines Y1 and Y2 are the first and second principal axes of an uncentered PCA.

data matrix is

X =(~ 7
3
6
4
5
5
4
6
3
7 ~ ).
Figure 4.8a shows the principal axes (the lines y 1 and Yi) yielded by a
centered PCA. The intersection of these axes, which is the origin of the new
frame, is at the centroid of the swarm (which coincides with the central
point of the seven). The coordinates of the data points relative to the y 1 and
Jraxes are given by the matrix

-2 .83 -1.41 o 1.41 2.83 4.24)


o o o o o o .
Clearly, the y 1-axis is so aligned that the points ha ve the maximum possible
"spread" along it; their spread relative to the y2-axis is zero.
Figure 4.8b shows the principal axes (the lines y{ and yí) yielded by an
uncentered PCA. The intersection of the new· axes coincides with the
intersection of the old axes; equivalently, the origin has not been shi~ted.
The coordinates of the data points relative to the y{ and y~-axes are given
by

7.07 7.07 7.07 7.07 7.07 7.07)


y(')= ( 7.07 4.24 .
-4.24 -2.83 -1.41 o 1.41 2.83
160

(The prime is in parentheses


.
to show that it <loes not indicate a tra
. h nsposect
matríx.) The y{-axis 1s . so. ahgned that t e sum_ of squares of the Y'
coordinates (their first-pnnc1pal-component seores) 1s as great as possibl 1
.
is 7 x 7.071 2 = 350. Any other d irect10n · f or Y' wou Id g1ve
· a smaller vale. Jt
1
The spread of the points along the y{-axis is irrelevant (it is zero in ~:
example). It is in this that the contras~ between a centered and an un.
centered PCA consists.
It should also be notíced that the data points' second-principal-compo.
nent seores, given by their projections onto the y{-axis, are the same (in this
geometrically regular case) as the first-principal-component seores yielded
by the centered PCA. It often happens, with real, many-dimensional data,
that that the second, thírd, fourth, . . . principal axes of an uncentered PCA
are roughly parallel with (hence give roughly the same seores as) the fust,
second, thírd, ... principal axes of a centered PCA; the equalíty is never
exact except in deliberately contrived, geometrically regular data such as
that in Figure 4.8. And it does not always happen, as we shall see.
1t is worth reemphasizing the contrast between the two analyses. In both
cases the first axis was aligned so as to maximize the sum of squares of the
seven points' coordinates measured along these axes. These sums of squares
are

7
L (Yli ). 2 and
i=I

in the c~n~ered and uncentered cases, respectively. With a centered PCA,


because it is centered, thís sum of squares is proportional to the variance of
the seven Yii values. With an uncentered PCA thís sum of squares is the
sum of squared distances of the points (after projection onto the y{-axi 5)
from the unshift d · · · '
e ongm; It bears no relation to the variance of the Yli
values, which in the example is zero.
Wf e now turn to ª three-dimensional example to illustrate the ecological
use ulness of an u t d p allY
f ncen ere CA. In practice of course there are usu
ar more than three axes (thr . ' ' rnust
li · ee spec1es) and it is unfortunate that we .
ffilt ourselves to three a· . . . aliZ'
abl Th imensions m order to make the analysis visu
e. e reader should fi d · the
many-d1m · n it easy to extrapolate the arguments to
L.1.uens10na1 case.
The three-dimensional e 1 . . . . the
three-dime · al xamp e IS shown m Figure 4 9 The pomts J.1l
•...u .. ns10n scatter di · · g to
agram at the top of the figure clearly belOI1
(a)

10 • •
15 • •
20

25 xi

AXIS 2


10 •

(b) •

AXIS 1

10 • 20 • 30

AXIS 2
(e)
5

• • AXIS 1
10 • 20
-20

-10

•~1t~ters
p·1¡,,, 4 -5

~ng ª
el · (.9.b) A( plot
) Seventh data pomts
. m. tbree-space. Tbey fof!ll two qualitat1vely
. . d11ferent
·
(e¡ from an u
01
e pomts m the coordinate 1rame fonned by tbe first two principal axes
rrespond' e A. One cluster lies on axis 1 and the other very close to axis 2.
ine e ncenter d PC .161
0 mg plot alter a centered PCA. Both clusters lie on axis l.
162

two qualitatively different clusters; there is evidently a four-member set of


quadrats (cluster 1) containing only species 1 and 2, anda three-member set
(cluster 2) containing species 3 together with smali amounts of species 2.
The data matrix (Data Matrix # 11) is

o o
~) .
15 18 21 24
5 10 1 2
( ~
X= 10
o o o 8 15 10

As may be seen, the axes yielded by an uncentered PCA (Figure 4.9b)


could be said to "define" the two contrasted clusters. The first axis trans-
fixes one of the clusters and the second axis grazes the other.
With a centered PCA, the first axis goes through both clusters (Figure
4.9c ). There is a perceptible gap between the clusters but their qualitative
dissimilari ty (cluster 1 lacks species 3, and cluster 2 lacks species 1) is not
nearly so weli brought out.
It should now be clear that in certain circumstances an uncentered PCA
reveals the structure of the data more clearly than <loes a centered PCA.
An uncentered PCA is calied for when the data exhibit between-axes
heterogeneity, that is, when there are clusters of data points such that each
cluster has zero (or negligibly smali) projections on sorne subset of the axes,
a different subset of axes for each cluster. Thus in the example in Figure 4.9,
cluster 1 has zero projection on the x 3 -axis, and cluster 2 has zero projection
on the x 1-axis. When an uncen tered PCA is done on data of this kind, each
of the first few principal axes passes through (or very close to) one of the
qualitatively different clusters. Moreover, these axes tend to be unipolar. On
a unipolar axis ali the data points have seores of the same sign, ali positive
or ali negative. In the example, as can be seen in Figure 4.9b, ali the points
of cluster 1 have positive projections on axis 1; likewise, ali the points of
cluster 2 have positive projections on axis 2.
A centered PCA is calied for when the data exhibit little orno between·
axes heterogeneity and nearly ali the heterogeneity in the data is within-a~es
h.eterogeneity or, equivalently, when the data points have appreciable proJec·
tions on ali axes. With a centered PCA ali the principal axes are bipolar: .ºn
each of them sorne of the data points have positive seores and some negauve
seores ..Axes 1 and 2 ~ Figure 4.9c are both bipolar. aJl
Puttmg thes~ r~qurrements into ecological terms, it is seen that s
uncentered ordmat1on is called for when the quadrats belong to grouP
ENT VERSIONS OF PCA
fl orffER
ro U IC1J

.dentical lists of common species A ,


· g no 111 · centcrcd PCA 15 ·
navin contrast. among the quadrats is less pro called for
t11e nounccd a ,, ti1 1·
~111iel1. ee rather than in kind. nu c r contents
·ff r in degr
dr e ractice data are often obtained for which 1 ·t . , .
In P b is not 1mmediatel
. 5 whether the etween-axes heterogeneity ex d . . Y
obv1ou . . Wh . cee s t1ie w1 thm-axes
eneity or vice versa. en this happens it · b
heterog . ' is est to do both a
t red and an uncentered PCA. If the between-axes h t .
cene . . e erogemty of the
is appreciable, then there w1ll . be . as many unipolar (or al mos t urnpo
. ar,
data . 1
see later) axes as th~re are quahtatively d11ferent clusters of data points. Of
course, the first axis of ~ uncentered PCA is automatically unipolar,
regardless of wh~t~er there is a~y between-axes heterogenity. If there is not,
then the first axis is merely a lme through the origin of the raw coordinate
frame passing close to the centroid of the whole data swarm as in Figure 4.6
(page 154), for instance.
Data are often obtained that do not clearly belong to one type or the
other. Then an uncentered PCA is likely to give one or more principal axes
(after the first) that, although technically bipolar, are so "unsymrnetrical"
that it seems reasonable to treat them as "virtually" unipolar. A bipolar axis
is said to be symrnetrical if, relative to it, the totals of the positive and
negative seores are equal. Obviously, bipolar axes can range from the
perfectly symmetrical to the strongly asymrnetrical; only in the limit, when
the asymmetry becomes total ( all seores of the same sign), is an axis
unipolar. Therefore, in ecological contexts an axis need not be strictly
unipolar to suggest the existence of qualitatively different clusters within a
body of data. Noy-Meir (1973a) has devised a coefficient of asymmetry for
principal axes that ranges from O (for perfect symmetry) to 1 (for co.mplete
asymmetry). He recommends that any axis for which the coefficient of
ª 5Ymmetry exceeds 0.9 be regarded as virtually unipolar. *
Given an uncentered PCA the coefficient, a, is defined as follows. Let
th ' . d b us· let U+
e elements of the eigenvector defining an axis be denote Y '. .
de Th n the coeffic1ent is
note a positive element and u_ a negative element. e

L:u~
a=l---
L:u~
•1his .. PCA since the ~ata points
lbems efi!l!tion is not applicable to the axes of a centered a}ues are only of int.erest for
dl

e Ves ha . .. d. tes As a v . phcable to


Uncent ve negat1ve as well as pos1tive coor !Ila · f to make it ap
erect . h f mula or a
centen~ct axes, there is no point in adaptmg t e or
axes.
164

or
[u~
a=l--- (ir ¿u¡ < ¿u:.).
[u:_
Let us find the coefficients of asymmetry for axes 1 and ~in Figure 4.9b.
An uncentered (and unstandardized) PCA of Data Matnx # 11 yields a
matrix of eigenvectors

0.931 0.364 0 .018)


u= (
- 0.078 0.151 0 .985 .
-0.356 0.919 -0.169

Therefore, a = 1 for axis 1, since all elements in the first row of U are of
the same sign. This result is also obvious from the figure, of course.
For axis 2,

O'. = 1- ( -0.078)2 = 0.994.


0.151 2 + 0.985 2

It is clear from Figure 4.9b that axis 2 should be treated as unipolar even
though three of the points belonging to the cluster on axis 1 have small
negative seores with respect to axis 2. It is these small negative seores that
cause a to be just less than l. Using Noy-Meir's criterion whereby a value
of a greater than 0.9 is treated as indicating a virtually unipolar axis, we
may treat axis 2 in Figure 4.9b as unipolar.
A clear and detailed discussion of the use of centered and uncentered
ordinations on different kinds of data has been given by Noy-Meir (1973a).
For an example of the practica! application of uncentered PCA to field
data, see Carleton and Maycock (1980). These authors used the method to
identify "vegetational noda" (qualitatively different communities) in the
boreal forests of Ontario south of James Bay.

Other Forms of PCA

Th~re are other ways (besides those described in the preceding pages) Íil
which data can be transformed as a preliminary to carrying out a pCA
Data ca~ be ~tandardized in various different ways and they can be
centered in vanou d'A" ' d ne
s Iuerent ways. Standardizing and centering can be 0
coORDINATE ANALYSIS
pRINCIPAL

165
ly or in combination. There are num
Para te erous pos ·b.li .
se t of the methods are seldom used by e . s1 i hes.
Ivfos co1og1sts anct
. book. They are clearly discussed anct e are not dealt with
~id ]lloy-Meir, Walker, and Wtl!iams (1975).
·n tlus . . omparect in N M .
oy- eu (1973a)

4.4. PRINCIPAL COORDINATE ANALYSIS

The methods of ordination so far discussed (the various .


opera te on a swarm of data pomts. . s-space and specify
m vers1ons
ct· A' of PCA) all
. . Iuerent ways of
projectmg these pomts onto a space of fewer than s dimensions. The origin
may be shifted and the scales of the axes may be changed, but at the outset
each data point has, as its coordinates, the amount of each species in the
quadrat represented by the point. Principal coordinate analysis diIBers from
principal component analysis in the way in which the data swarm is
constructed to begin with. The points are not plotted in an s-dimensional
coordinate frame. Instead, their locations are fixed as follows: the dissimilar-
ity between every pair of quadrats is measured, using sorne chosen measure
of dissimilarity, and the points are then plotted in such a way as to make the
distance between every pair of points as nearly as possible equal to their
dissimilarity. It should be noticed that the number of axes of the coordinate
frame in which the points are plotted depends on the number of points, not
on the number of species. Also, that the value of the coordinates are of no
· · · mterest;
mtnns1c · they merely ensure that the pom · ts shan have the desired
spacing.
Before descnbmg
· · ·
how the coordmates are found' we need an acronym
fo r " Pnnc1pal
· · . , None has come into common use. 1n
coordinate analys1s.'
What follows it is called PCO. . . . . must first be
To do a PCO, a measure of interquadrat d1ss1IDll~n~y h measure,
cho sen. Any metric measure may be used. w·11hou t spec1fymg .t e d k. We
!et u · . . . . b t n quadrats J an
.s Wnte 8(), k) for the d1ss11nilanty e wee h t the distance
requir . · pace such t ª al
b e to find coordinates for n pomts m n-s ( as nearly equ
etween Points 1· and k namely d(j, k ), shall be equa1 ore the points so
as p . ' ' 'bl t arrang
th oss1ble) to 8(J k ). Often it proves imposs1 e. o 1 and one must
at th . . . ' h qurred va ues
b eir Pairw1se distances have exactly t e re
e content · h ¡· · ffices to
1 Wit approximate equa 1ties. . . s always su
t
con . shou}d b e noticed that a space o n -f 1 dunens10n. ed in a one-s pace (a
talll n 0 . · be contain
P mts. Thus two points can a1ways
166
contained in a two-space (a plane) or.
always be . . · ' in a
. . three points can b colinear; sumlar1y, n pmnts can ahvays b
line), ace if they chance to le dimensions at most. Hence it could be .e
one-sp . ace of n - . . - sa1d
n of the n pomts m . ( n . 1)-space rather
co tained in a spfi d the coora·natesi
that we need to n. . . d d true but the argument 1s s1mpler anct clea
This 1s m ee ' . rer
than in n-space. . , d Th required .( n - 1)-space is a subspace of thi
· considere · e
15 . fl · s
if an n-space th t a two-dimens10na1 oor is a subset of
(. the same way a . h a
n-space m . room) Equivalently, all the pomts ave a coordinate of
three-dimenswnalof the n axes,· the same axis for all of them. There is no need to
keep ºº.
zero thisone
fact m. mm . d.' the required zeros emerge as part of the output of the

computations. ri1 · f · t PCA b


lt could be argued that PCO is necessa . " y m ,,enor b o ecause
· in
h
PCA eac pom 1 · t ·5 placed exactly where
. 1t
. ought to e, whereas m Peo
.
each point is so placed that interpomt d1stance.s are as closel_Y as possible
(but seldom exactly) equal to interquadrat. d1stances. ~rov1ded the ap-
roximation is close, the imprecision of PCO is of no practica! consequence.
~ith both PCO and PCA, the final step consists in projecting the swarm to
be examined onto a visualizable space of two or three dimensions, and far
more information is usually lost in this reduction of dimensionality than in
slight "misplacings" of the points in the n-dimensional swarm. However,
PCO is not suitable for ali bodies of data. We consider later (after
describing the method) how to judge when PCO is appropriate.
Now fer_ the method. Any metric measure of dissimilarity may be used.
The ob¡ect is to find coordinates for n points in n-space such that d(j, k),
the d1stance between pom· t · d k h .
0(1,. k ), theu. d1ssunilarity
. . shJ an , s hali be as nearly equal as .poss1ble to
F . the argume, t owever
or clanty ( hi h we
. ave chosen to measure it. .
numbered paragraphs
' n w e
After th t h is rather . long) is given in the followmg
.
reiterated m· rec1pe ª'
· f.orm R d t e operations . to be performed are bnefiy
delving into the rea . · ea ers who w1sh to try the method before
somng that u d li · d
return to the details later. n er es it should skip to the recipe an

l. As a pr limi
Th e nary, note that ll
erefore, to keep th
left u
ª
e symbols as u 1
sununations are over the range 1 ton.
. .
subscrinstated.
t .
However, lt· is
. very ne
. Uttered as possible ' these limíts are
h
symbol b 1P s vanes each f important
une a sunun · to observe which of the
ow the L. Bear In
"'-,e,1. -_ e e + · oUnd f is done. I t is specified byh t es
. ation
11
c21 + · · · +e . ' or example that a sum suc
nJ (In Whi h ' lues
ª
e r takes the series of va
coORDINATE ANALYSIS
pRl1,..10PAL
167
is not the same as .L .e . = e
2 ... ' !1 ) . J rJ rl + Cr2 + . . . .
J•• tl1e senes of values 1, 2, ... , n ). + crn (m Wruch J
1a
k~ .
The coordmates sough t are to be Writt
2. . h . en as an
. h ach column g1ves t e n coordmates of n >< n matrix e .
ivhJC e . . . one of the . in
be centered. That IS, the ongm of th . Pomts. The points
are to . e coordmat .
'd of the swarm of pomts. Equivalently " es is to be at the
centro1 2 2 . 'LiJc, . =O f ll
3· The distance , d (J, k), between the 1
·th a Jd k or .ª r.
n th pom ts is
d2(J,k)=[(c.-c )2
r
rJ rk ·

Here. r denotes the r tb row of matrix C. Each row of e .


. . corresponds with an
ovis lD the n-space that IS to COnta.m the SWarm of poínts b t ( '
ClJ'.' • u m contrast to
pCA) the axes do not represent species.
4. It follows that

2
d (J, k) = L (e;+ c;k - 2c,Jcrk)
r

= L:c;
r
+ L:c;k -
r
2LcrJcrk·

5. Next consider the n x n matrix, say A, formed by premultiplying C


by its transpose C'. 1t is

C11
c12
C21
c22 ••·
Cnl )
Cn2 r
C11
C21
C12
C22
. . . C111 )
C211
A=C'C = .... ............ . .............c..
e2n ···
reln Cnn C l n C11 2 •' . 1111

I:c;1 L c,¡c,2 LCr1Crn


r r
LCr2Crn
_Lc, 2c,1 .Lc?2
r
r r
. . . . . ... .. .. ..... .......
_Lc,ncr2 _Lc; 11
LCrnCrl
r r
fust find the
Ais oh . . paragraphs we d
el v1ously symmetrical. In the followmg . . hi h are calculate
· · ·1anues w e
rement8 of A from the interquadrat dissinu fA
f e trom those 0 ·
ro1n th fi d h eiements o
e eld observations. W e then fin t e
168
h 5 that if we write alk for thc ( k
lt follows from paragrap J' Jth
6· A we have
element of '
alk = L CrlCrk;
r

ltemative formula for d2(j, k) is


Therefore, an a
d2(J, k) = ª11 + akk - 2alk. (4.4)

7. Notice, for later use, that L1ª1k = O. This follows from the fact that

since the sum E¡crJ is zero (see paragraph 2). Because A is symmetric, it is
also true that l,kaJk = O.
8. We now wish to find a1k as a function of d 2 (j, k). Rearranging
Equation (4.4) shows that

(4.5)

9. We now find ªJJ and akk as functions of d 2 (j, k ). To do this, sum


every term in (4.4) over all values of j, from 1 to n. Thus

Ld
.
2
(i,k) = I:a
. n
.. +°"a
~ kk
-2°"a
~ lk·
J J j j

Put "[,1 ª . = x· note that t Jªkk =


L:1.a1k. == ó. Th,ere f ore,
1
nakk, and recall from paragraph 7 tbat

td2(J,k)=x+
j nakk whence akk = ni( Ld2(J, k) - )
X •
1
Likewise summi
, ng every term . ( ht
m 4.4) over ali values of k, it is seen t ª
L d 2 (J' k) = na .. + )
k 11 x Whence ajj = ~ ( L d 2 (j , k) - x .
Observe that " k
'--kªkk = t lall' Which
.
we have already denoted by x.
cooRDINAI t ANAL ni~
..,0 rAL
pRl ~1

169
. n (4.5) thus becomes
BquatlO

~!{-d 2 (J,k) +¿(Ld 2


(J,k)-x) +.!.("
01i 2 J n '-/2(J,k)-x))
~ -21 d2( ),. k) + l_ "d2( .
2n Í-J
1
J,k) +-¿d 2 ( . k) X
J 2n k J' - ; (4.6)

d it remains to express x as a function of d 2( . k)


an h 6. J, .
IO. In paragrap it was shown that

Q ..
}} =" 2
Í-J crj.
r

Therefore,
Lª11 =
J
Lj LC~· ·
r
(4.7)

Recall from paragraph 1 that the centroid of the n points whose coordinates
\\'e seek is to be at the origin. Therefore, L,/0 is the square of the distance of
the jth point from the origin, and L,¡LrC~· is the sum of the squares of the
distances of ali n points from tbe origin. This latter sum is equal* to (1/n)
times the sum of the squares of the distances between every pair of points.
Tbat is,
2
" " c 2. =
~~ 0
.!n "~ d 2 (J k) =
' 2n I:. l:d (J, k ).
2-
j r j<k J k

· h ll be considered only
(Th e form L. specifies that each pair of pomts s
1<k . d dz(k ·) which is the same, ª . .
once. The form L: ·L specifies that d 2 (J, k) an '1
shall both enter th/sum; hence the 2 in the denonúnator.)
11. Equation (4.6) now becomes

a. = _ _!d2(J k) +_!_~d2(J,k)
1k 2 ' 2n J

_!_ '°'d2(1' k) - ~ '[ '[d2(J, k).


+ 2n ~ k
' 2n J· k
. je!OU (1977), p. 320.
ltii~ fact w . roof is found in p
as used in another context earher. A P
170 . 2
, t d2( . /<) (th~ d1stanc.,c bc.:,t ..vecn fJ<,Jnt .
·t'pul at(; tJta ·1' . 1 '2( k Ja,,
12 w e noWS 1 ' 1 C'• ual atl po~f,Jb<.I) t<J () /; ) Wcthc f,
. • 1 • • i11al (or as nca r y '1 ie '
/1) IS lo ') (.i l:L1

pul
i_s 2( .,·, k> ' 2n
i i./>2U, k >
2 1

+ _J_ ~0 2 () , k)
2n k

. es of 0 2( 1· k) have been ca1cu1ated for every p&n (f


Thc numenca1 va1u ' . . '
. . d. , Hence ali the elements of A can be gJ ven numenca] ·i&Juer..
pomts J an 1<.. •·
It remains to fi.nd the elements of C.
13. Since A is a square symmetric matrix, we know that

A = U'AU,

where A is the diagonal matrix whose nonzero elements are the eigenvalues
of A, and U is the orthogonal matrix whose rows are the correspondmg
eigenvectors of A; U' is the transpose of U.
Since A is a diagonal matrix, it can be replaced by the product A112A11:
in which the nonzero elements are · A. 1(2, A.1{ 2, ... , A.1t2.
Thus

A= (U'A1/2)(A1/2U) .

Each of these factors is the transpose of the other.


Now recall (from paragraph 1) that, by definition,

A= C'C.
Therefore
'

U'A112
-- e, and A112u = e
14. To find th .
A e elements of C · of
. Then the first princi al ' .we therefore carry out an eigenanalys1s
e~ements in the first r p coordmates of the data points which are tbe
e1g ow of e ' first
\l /~nvector of A). The , are obtained from A.1/ 2u' (u' is the
/\2 U'2 and second p . . 1 1 i . d fro.rJJ
' so on. lf as is nncipal coordinates are obtaine .
' very oft h d. auoil
en t e case. a two-climensional or 10
AL coORDINATE ANALYSIS
pfllNCIP
171

¡5 wan '
ted only the first two rows of
. . e need be
dinates of the n pomts m two-space . evaluated· they .
coor . Wlth the . ' give the
distance between every pa.tr of points a . po1nts so arranged th
t11e . . al pproximat at
tbeir dissi.mJ.lanty as e cu1ated at the outset. es as closely as possible

To reiterate,. without explanations: the operations


. .
are the followmg. requ1red to do a PCO

l. Calculate. the
. . dissimilarity
. between every pau. of d
some chosen d1sslIIlllanty measure. Denote by 8 . k qua ~a~s, using
between quadrats j and k. Put the squares of th ese(J, d1ss
) .the·¡ d1ss11nilarity
··
elements of an n X n matrix A. irm anties as the
2. Find the elements of the n X n symmetric ·
(4.8) which is matnx A from Equation

'f} 2(J, k) is the sum of the elements in the jth row of A;


'f.k8 2(J, k) is the sum of the elements in the k th column of A;
'f.l,k8 2 (J, k) is the sum of all the elements in A.

Amore compact formula for determining A from A is given in Exercise 4.6.


3. Do an eigenanalysis of A.
4. For a two-dimensional PCO, calculate the first two rows of C which
are

and

liere ;\1 and A are the first and second eigenvalues of A; (un u12 ...
uln) and (u21 u222 . . . u2n) are the respective eigenvectors.
172

.nts. Their coordina tes are (e )


5. Plot the Pol n' c21 , (e i2,c
22 ).,
( C111• Cz ,,).

We now consider an example.

XAMPLE. A simple example }s shown in Table 4.5 and Figure 41


E . # 12 . h h · O. The
2 x 5 data matrix, Data Matnx ' is s own at t e top of the tabJ
· . . e. As
always, its (i , j)th eleme~t deno t es th e amo~n t of spec1es z m quadrat J, bu¡
these elements are not, m PCO, the . coordmates of the. data . points to be
ordinated. The quadrats have labeling numbers 1, ... , 5 (m Italics) above the

TABLE4.5. AN EXAMPLE OF PRINCIPAL COORDINATEANALYSIS


(SEE FIGURE 4.10).

2 1 3 4 5
2~ ) is Data Matrix # 12.
X=(¡ 9
10 14
8 15
8
The matrix of squared dissimilarities is
o 100 169 196 529
o 25 64 225
tl = o 169 400
o 81
Matrix A whose ele . O
ments are given by Equation (4.8) is
120.48 12.48 12.88 - 25.92
-119.92
A= 4.48 26.88 - 17.92
-25.92
74.28 - 35.52
- 78.52 .
23 .68
55.68
An eigenanalysis of A h 168.68
'
/\¡ , A , A A A _
s ows that its eigenvalues
· (to two decimal places) are
2
The first two ~¡ge4 , s - 310.77, 85.78, 4.50 O - 9 45
nvectors are ' ' . .

u} : ( -o. 517 -0.152 -O 3111 O233 o. 747 )


Ben u 2 - ( - O629 · · ).
ce the first t
Worows f · 0.169 O.721 -O .234 -o.oz7
o e = A112u . d' tes. are
cí) 'which are the required coor 1na
(
1 :::::( - 9.117 2 13176)
C2 - 5.823 - .675 - 5.485 4.101 0'254 .
l.567 6.681 - 2.171 - .
L coORDINATE ANALYSIS
rRH.JCIPA
173
Axis 2
10

.
3

03

2
2~
-15 -10

Qf
-5
4o
.
4
5
15
Axis 1

-5

- 10

Figure 4.10. The solid dots are the data points (projected 0 t t
of data matrix X (Data Matrix # 12) in Table 4.5. Each np~n;~~sfaªbcel) dyiel~ehd by a PCO
· th 1 f X th t · e e w1t a number
deoo!Jilg . e co umn o a represents lt. The hollow dots show the sam
ed
uns andardiz , centered PCA · e data after
t

respective columns of X, and these are used to label the points in Figure
4.10.
It is now necessary to choose a dissimilarity measure for measuring the
illssimilarity between every pair of quadrats. Let us use the city-block
distance CD (page 45). Then the dissimilarity between quadrats 3 and 5, for
instance, is
CD(3, 5) = jx13 - x 15 j + jx 23 - x 25 1 = 18 - 231+114 - 91

= 15 + 5 = 20.
The dissimilarity between every pair of quadrats is measured in this ~ay,
5
and the squared dissimilarities are the elements of the 5 X matnx !:::..f
are o · a· al of !:::..
shown in Table 4.5; all the elements on the mam iagon '
course, zero since CD(j, j) = O for all J. . (4 8) For exam-
Next, the elements of A are determined from Equatwn · ·
ple,
- -=.Af)Q + 763 + .ill2 - 3~56 = - 78 .52.
a 35 - 2 10 10
·ght halves are
Sine . · ly their upper n
w. e matnces A and A are symmetnc, on
rttten out. ble and also its first
A·18 · nin the ta
t\Vo .
then analyzed • Its eigenvalues are give
eigenvectors.
\ ·. 11 yaliv . Tliis i111pli '8 Lliat it i:) ¡,
1h:at ",, IH i' ' 'l!iq~~.il,¡
11 will h ~L ~ 11,
. e11 1· J
. , . "1 spa<.; of u11y lllllllr)t-r o t un t1:-i1011)
·11 1·111g f¡yi.; 1
• JOl'ldS 111 ,1 1 C.I '
. .
1 ll 1lrlYl!
. . .1
CXr1Cl y lJi(.; Y'il
' 111 '• 11 \;'l1:1
' ' . JHll WISC (
1'11.' i'llKC:S
,1 ' ' '
:-;fia
• 'lJ1.;~, 1

w· iy thal 1lu;n 1 . .1 Slll'lllci i11 iabsolutc mag111tuuc U , n A 11 4.


' : ne A' is mue ' • ' . . 11 1~JJ(j ~ ,
1low vc1, si . 1 • 1 caii 1-:afdy he 1gH<>l(..:;d. Wc a1e, 1t1 a11 y (.;¿
. tio1111111mLICli< • 'f l~L,,)J,
the d1stor . llw f)()Í1ILS JJJ lwo-spéHA..... o use <., ll Y
1 · · )fl~Sml ti 11g . 0101 1a11 t 1
intcrcste<. m "I . . 1 cxamplc ( 111 wh1d1 thcrc are 011ly lwo w,,
. • : · 111 thc P' usen . · · ' P01..1c
d111wnswns . . of an ordinal1on, wh1ch is to reduce lhe . \)
ld lcf "1 l thc puipo:-;c
<.;,¿Jr¡ ~a~il
(J111cr
1 1
wou <. .'
sionality ol t he tsp .iye. <l '. l· d dala. ,.
With
,
only lwo spcc1c.;,s, onc y Jll1Jt
l . t· X in two c.lrmumaons.
thc raw e''
Th~ ftrsl. " lWO pi incip·ll e coord111ates· o f ti·1c• eIé.I· t·c1 poin · t 8,· wJ11c.:h

·
are IC
11
dcmen ts of l 1w, 11rs·t two rows of C, é.lrc then c€Jlculatcd 1rom

These points are shown as the solid dots in Figure 4.1 O. For comparison, thc
results of carrying out an unstandardized, centered PCA on thc samc data
are also shown (by hollow dots) on the same figure.
It is interesting to compare the desired interpoint distances (the sq uares
of these distances are the elements of A) and the actual interpoint distances
in the two-dimensional ordination yielded by PCO. The two matrices to be
compared are shown in Table 4.6.

TABLE4.6. A COMPARISON OF (1) INTERQUADRAT DISSIMILARITJES


(CITY-BLOCK MEASURE) AND (2) INTERPOINT DISTANCES FOLLOWJNG
THE TWO-DJMENSIONAL Peo SHOWN IN TABLE 4.5 AND FIGURE4.JO.
---=--------------------~~~~~----
Interquadrat Dissimilaritiesª

---
1

o Interpoint Distances'
10 13 14 23
o 5 8 15 o 9.8 13.0 13.7 23.0
o 13 20 o 5.8 7.7 16.0
o 9 o 13.0 19.9
o o 9.3
o
:?he elemcnts of A in Table 4 5
2Computcd.
5 using Pythagoras' · thare the sq uarcs of lhcsc dissi mil ari 1i c.<. ven in Wi
X matrix at the botto 0 fsT eorem, from thc coordinates of thc points which are gi
m abJe 4.5.
AL coORDINATE ANALYSIS
p~1NCIP
175

, can be seen, the discrepancics b~tw~en d , .·


A~
ight. Such as t11ey are, th~y result r cs1rc<l '·md act ua, 1 d"1stances
:J.fe sl ro111 two ca , F" .
. ossible to plot the points with exactly lhc d ... · , ~ses. "irst, 1t js
unP · cstrcd d1stances b l
ry pair of them 111 any real space whatc . · · e ween
cve . . vei' no 111atter how m·
~:,,1 ensions the space has, the fact that matrix A 1 . . . any
u.u• d . d fi . . . . 1as a negat1ve e1genvaJue
, ows that the esire con 1gur..lt1on is 1mpossible s d
~11 . • econ , the "best" PCO
representat10n of the data would be obtained by using as
.· · 1 · many axes as there
are pos1t1ve eigenva ues of A, 111 the present example th (
, . . , ree note that
f.. ==o and hence c4 = O). ProJectmg the "best" configurat ·
4 • • • • 10 n on to a space
of fewer dnnens1~ns (m the pres~nt case, two d1mensions) is a further cause
of the discrepanc1es between desued and actual distances.
The problem now arises: how close an approximation between the ds
(the interpoint distances) and the 8s (the interquadrat dissimilarities) is
"good enough"? As we have seen, if sorne of the eigenvalues of A are
negative, it follows that a perfect configuration of the points (i.e., one in
which d(j, k) = 8(}, k) for ali (j, k) is unattainable. According to Gower
and Digby (1981), "there will be difficulties if [the negative eigenvalues]
domínate the positive eigenvalues." The decision as to whether the dis-
crepancies introduced by negative eigenvalues are large enough to affect the
interpretation of a PCO ordination is inevitably somewhat subjective.
Unless the negative eigenvalues are of very small absolute value (say less
than A2/10 when one is doing a two-dimensional ordination), it is probably
worthwhile to compute a pair of matrices such as the pair in Table 4.6,
which permit the ds and os to be directly compared. .
No research appears to have been done th at ffilg · ht answer the quest10n:
. d ?
Are some dissimilarity measures better than others 1·r a PCO 1s to be one.
The problem deserves investigation. . . of PCO is found in
A good example of the practica! applicatwn d h mpling units
Kempton (1981). The organisms stud1e · d were moths .an t e saplaced at 14
(" . b k) were hght traps
quadrats" in the terminology of this 00 d with discover-
locaf1 . t was concerne
. ons throughout Great Britam. Kemp on t·ons based on one
lllg h 1
w ether an ordination (by PCO) of th ese 14 oca 1 r after year. The
seaso ' . hly the same yea .
ns moth collections remamed roug f ig term valid1ty, or
1
~Uestion is: Is an ordination of moth communities oh o1 na-lysis of a single
is th that t e ª
,ere so much variation from year to year . 1 Iy is an important,
.
Year s b . ? This e ear , " e
th o servations is virtually mearung1ess. ' d that there was soro
ough
e . rarely pondered problem. emp K ton f oun .
. . consecuuve yea ' rs a
ons1st ,, ' . b . ed ll1 s1x
ency among the ordinations o tain
176

biogeographers who wish to ordinate ge


· g resu1t for . . . h . ograph¡
fairly reassunn. unity compos1t10n m s ort-lived organis e
. the basis of comm ms.
s1tes on

5 RECIPROCAL AVERAG ING,


~R CORRESPONDENCE ANALYSIS

. ·ng and correspondence analysis are alternative names f


Reciproca1 averag1 . . or
the same technique, one that is deservedly popular for ordmatmg ecologicat
data. lt is commonly known by the acron.ym RA (Gauch, 1982a). It is ye¡
another version of PCA (besides those d1scussed ~n page 152, and many
others) and, as such, might seem to have n.o cla1m to special mention.
However, as we shall see, it has one great ment shared by no other version
of PCA.
Thus consider one-dimensional ordination. Recall (page 83) that, by one
definition, a one-dimensional ordination consists in assigning a score to
each quadrat so that the quadrats can be ordered ("ordinated") along a
single axis according to these seores. Each quadrat's score is the weighted
sum of the species-quantities it contains. What differentiates one ordination
technique from another is the system used for assigning weights to the
species.
In RA the quadrats and the species are ordinated simultaneously. Seores
are ~s~igned to each quadrat and to each species in such a way as to
ma~e the correlation between quadrat seores and species seores (as
explamed later).
In the discussion ' web egm · . RA as merely another versio· n
· b Y cons1denng
' is s own how the same result (a scoring system for
of PCA. Af ter that 1·t · h
.
both spec1es and q d ) · l
averaPi ,, ua rats can be obtained by the so-called "reciproca
o-'ng procedure.

RA as a Form of PCA

It was pomted
· out li f
four different way eüar er that an "ordinary" PCA can be done in one o
s. ne choo fi Jeave
them uncentered Th . ses rst whether to center the data or 5
whether to standa a· en, mdependently of this first choice, one choose_
da a· r ize the dat
r IZe them, the eleme t .
bYthe st d
ª or leave them unstandardized (to. staJ1
·ded
n s m each · d1v1
an ard deviation f row of the raw data matnx are d tbe
data mat ·
nx may be left a11 the e¡ements in the row). In other wor s.daf'
0

un transfo d . staJI
rme or it may be centered, or
CAL AVERAGING, OR CORRESPONDENCE
RfCfpRO ANAL YSIS
..,..,

Or both centered and standardized Whi h '/ /

di ed ·
zh 'sen the next steps are the same: the dat
15· e o '
~ t e four po 'ibiliti1.:s'
e eve 0 f h
a matnx (
0
not) is postmultiplied by its transpose and the w1lether tr~ns~ormed
: analyzed (for examples, see Table 4 product matn · 1 then
e1gen . " . ·4, page 153) Th .
Onsists of a list of we1ghts" to be att h · en each e1genve1:-
wr e . ac ed to ea h .
" ores" (which are we1ghted sums of species . . e pec1e so that
se quantit1es) e b
for each quadrat. The seores are the coordinates of h ~n ~ computed
tbe or dm . ation · t e pornts m a plot of
RA differs from the four versions of PCA alread a· .
which the data matnx . is
. transformed before the eige Y iscussed
al .
111 the way in
.
. . nan ys1s, a.nd m the
way in which the e1genvectors are transformed into seo f h .
analysis. We cons1der . these two procedures in turn Theyres a dter t e e1gen-
. · are emonstrated
in Tables 4.7 and 4.8, which show the RA ordination of a 3 x 5 matrix
(Data Matrix # 13). The reasons for the operations will not become clear
until we attain the same result by "reciproca! averaging." Here they are
presented in recipe form, without explanation.
Since in RA seores are assigned both to quadrats and to species, the
procedure yields an R-type and a Q-type ordination simultaneously. (Recall
that an R-type analysis gives an ordination of quadrats and a Q-type
analysis an ordination of species.) In the following account we first consider
the Q-type part of the analysis which gives the species seores (Table 4.7),
and then the R-type part of the analysis, which gives the quadrat seores
(Table 4.8).
The data are not centered and they are transformed as follows. Each
element in the data matrix is divided by the square root of its row total and
by the square root of its column total. . d
As always, let the number of spec1es . (the num ber of rows rn the . ata X)
m· ( h ber of colurnns rn
atnx X) be s, and the number of quadrats t e num
ben.

Let r; = .;, x 11.. be the total of the ith row of X;


L.,.¡
j=l

s . h olurnn of X;
let c.=
J
'\""' x 1). . be the total of the Jt e
L.,.¡
i=l

s n n ~ be the arand total.


let N = '\""'
L.,.¡
'\""' x . .
L.,.¡ lj
= L..,¿
" eJ. = '=1
'-- r; l::J

i=l j=l )=l I


TABLE4.7. RA ORDINATION OF DATA MATRIX #13; THE EIGENANALYSIS GIVING THE SPEClES-SCORES.

The 3 X 5 data matrix X witb row and column totals shown is


15 2 o 2 1 20
x- 6 15 o o 30
- 19 7 5 8 29 50 .
25 15 20 10 30 100

1//25 o o o o

r: o
o
1//30

( 0.67082
l/~
o ) (15

0.11547
i
o
6
7
2
15
5
o

0.14142
2
o
8 2~)
0.04082)
o
o
o
o
1/fü
o
o
o
o
l/fiü
o
o
o
o
l/v'lO
o
o
o
o
1//30

= 0.32863 0.28284 0.61237 o o .


0.02828 0.25560 0.15811 0.35777 0.74878

( 0.48500 0.25311 0.12965)


P = MM' = 0.25311 0.56300 0.17842 .
0.12965 0.17842 0.77980
Eigenanalysis of P gives
o 0.44721 0.54772 0.70711)
0.56056 o ). u= ( 0.49281 0.50885 -0.70584 .
o 0.26715 ' 0.74641 -0.66413 0.04236
The matrix of species seores is

-0~998).
1
1 2
V= /NUR - / = ( i.i02 0.929
1.669 -1.213 0.060
ROCAL AVERAGING, OR CORRESPON
RfCIP DENCE i'\N
ALYs1s
179
tbe (i, j)th element of the tran f
fbell s orrnect m .
atnx, say M .
X¡) , lS
m· · = - -
11 ¡r;c; with i = 1
,... ,s and i :::: 1
, . .. , n.

The whole transformation


. can be eompactly ·
Rdenote the s X s diagonal matrix wh wntten in matrix f
. h ose nonzero 1 orm. Let
totals of X. Thus m t e example in Table 4.7, e ements are th e row

o o
R =
( r¡
~ '2
o)
o = (20o 30
o r3 O o sH
Next, note that

'1
-1/2 o 1//20 o o
R-1;2 = o '2
-1/2
oo 1- o 1//30 o
o o ,3-1/2 o o 1//50
(The reader should confinn, by matrix multiplication, that R- 112 R- 112 =

R-1, and then that R- 1 R = RR- 1 = 1, the identity matrix.)


The n X n diagonal matrix c- 1/ 2 is obtained from the column totals of
Xin the same way; its nonzero elements are the reciprocals of the square
roots of the column totals.
lf we now wri te
(4.9)
M = R-112xc-112
ad . d b h quation it is easily
n carry out the matrix multiplication spec1fie Yt e e '
seen that m the (i 1·)1h element of M, has the required value. M' Call
W IJ' ' . . d b its transpose .
e now find the product of M postmuluplie Y
the Product, which is an s X s matrix, P. Thus
(4.10)
P=MM'.
pis i enanalysis is performed
in the matrix that must now be analyzed. The e g 1..," A (whose nonzero
el
the u a· onal rnat rix V (whose rows are
sual way and yields an s X s iag
.lft

ernents , d s X s rnat
are the eigenvalues of P) an an
180

the corresponding eigenvectors of P). The results for the numerical


. examp¡
are shown in Table 4.7. 1t w11l be seen that A1, the largest eigenvalue of p .e
unity. This is always the case, and the explanation is given subsequent) , is
It remains to derive the species seores. This is done by postmultiply~·
by a diagonal matrix whose jth element is i/N
/'j. gU
Denoting by V the s X s matrix whose rows are the sets of species seores
we therefore have, when s = 3, ,

JN /r1 O o
V= U O JN/r2 O = /NUR-112. (4.11)
o o JN/r3

(The reader should check that postmultiplying U by a diagonal matrix has


the effect of multiplying each element in the jth column of U by the jth
element of the diagonal matrix.)
The rows of V are the required vectors of species seores. It will be seen
that the first row of V, corresponding to the largest eigenvalue, A1 = 1, is
(1, 1, 1). This result is true in general. That is, for any s, the largest
eigenvalue is always A1 = l. The s-element row vector of species seores
112
corresponding to this eigenvalue, which is the first row of V = IN UR- ,
is always (1, 1, ... , 1). It is a "trivial" result, and the reason why it is
invariably obtained becomes clear subsequently, when we use the reciproca!
averaging procedure to do the same RA ordination. Only the rows of V after
the first are of interest.
N ow for the R-type part of the analysis, which gives the quadrat seores.
Recall that for the Q-type part we analyzed the s x s matrix P, where

p =MM' and M = R-112xc-112.


Notice that
M' = c-1;2X'R-112.

(1 t should be recalled that the transpose of the product of two matn


·ces is
h . . d r see
t e product of thelf respective transposes multiplied in the reverse or e.'
Exercise 3.9, page 131. And it should also be noticed that transposlflg a
diagonal matrix leaves it unaltered.)
Thus, written out in full
'
(4JZ)
L AVERAGING, OR CORRESPONDENCE
JPRoCA ANALYSIS
~f C
181

. ow c}ear, from considerations of s


rt 1s nwe must analyze the n X n matrix YIIUnetry ' that to ordinate th
qtiadra ts e
Q = (c-112x,R - 112 )(R -112
xc - 112)
. ( 4.13)
surning that s < n, we find that Q has only .
As .h h . s nonzero e 1
e identical wit t e e1genvalues of p (see igenva ues and
u1eY ar . page 127).
Table 4.8 demonstrates the analys1s using Data M .
. . 5 atnx # 13 As 1
which in this case is a X 5 matrix) has the eige .· a ways,
V( . nvectors as its rows Th
X 5 matnx W, whose rows are the quadrat-scores, is · e
5
w = muc-112.
' (4.14)
this equation should be compared with (4.11 ).

TABLE4.8. RA ORDINATION OF DATA MATRIX #13'


THE EIGENANALYSIS GIVING THE QUADRAT-SCORES.

X' is the transpose of X in Table 4.7.


M' is the transpose of M in Table 4.7.
0.55880 0.17764 0.20572 0.10499 0.04856
0.15867 0.21362 0.10778 0.19610
Q = M'M = 0.40000 0.05657 0.11839
[ 0.14800 0.27366
0.56233
(Q, like P, is symmetrical and the elements below the principal diagonal
have here been omitted.)
Qhas the same nonzero eigenvalues as P, namely,
A1 = l; A2 = 0.56056; A3 = 0.26715.
The matrix of eigenvectors is
0.3162 0.5477
0.5000 0.3873 0.4472 -0.6790
-0.2442
0.6382 0.0273 0.2671 -0.1203
-0.2336
u= -0.5488 0.1757 0.7739
0.8657 -0.4539
- 0.1740 0.1113 0.0420 -0.1359
-0.1906
-0.1061 0.8978 -0.3577
The matrix of quadrat seores is W = ffeuc-1/ 2
1 1 1
1.276 0.070 0.597
- 1.098 0.454 l.730
-0.348 0.287 0.09 4
-0.212 2.318 -0.800
111 .l.
Ax i s 2
20

15

10
o3

02

1
- _¡_ - ~--....!_ Axis 1
1
1
- o~
0 .5 1.0 1.5
I '> 10 40

·~
-o5
.4
-10 .,
O!

.
l<tgurc . . Tfic, ~o ¡·i 0 d<>ls ,show the outcome of RA ordination of
411 . Data Matrix #13 (Tables

4 7 arid 4 H). ThL hollow dob show t~e same data after unstandardized, centered PCA; they are
plottcr.l in thl' planc of PCA axes l and 2.

As in the ordination of the speeies, the first set of seores (the first row of
W) consist~ of ones and is of no interest. The seores on the first and second
RA axes are given by the elements of the seeond and third rows of W. Using
thcsc seores as coordinates, the five points representing the quadrats give the
two-dimensional RA ordination shown in Figure 4.11 (solid dots). The
rcsult of doing an unstandardized centered PCA on the same data is shown
for comparison (hollow dots).

The ~orrelation Between Quadrat Seores and


Spec1es Seores

The analyses just describ d h . . (the


rows of V in Tabl 4 e ave prov1ded sets of seores for the species .
7
Tahlc 4.8) s e · ) and sets of seores for the quadrats (the rows of Win
· upposc the spec· . . of V,
and thc quadrats th ies are ass1gned the seores in the k th row .
TahJc 4.9 the . e seores in the k th row of W Then as demonstrated in
' square of th · , saY
e correlation coeffieient between these seores,
L AVERAGING, OR CORRESPONDENCE
IPRoCA ANAL Ys1s
Rf C
183
. ual to 'A", the kth eigenvalue of p
1 1 s eq anct Q T .
. ~ l, ... , s. .
r,. ·
. . · his holds t rue for
k correlauon coeffic1ent r k is calculated f
The rom the formula
1 s n
rk = N .L L
1=1 J=l
X;Juk;wk ..
J
(4.15)

Here xi;. (from the dataf matrix X) is treated h .


. .. as t ough lt were " h
freq uency of occurrence o spec1es r m quadrat . ,, S . t e
. J· ometunes the l
f Xare indeed frequenc1es; even when they aren t h e ernents
o d h . o ' owever, for exarnple
even when they recor t e b10masses of the species in th d
. e qua rats, they are
treated as frequenc1es for t~e purp~se of the present calculations. The term
v is the k th seo re of the z th spec1es; likewise w . is the k th
ki ' k1 score of the
jtb quadrat.
Table 4.9 shows the computation of r2 • As may be seen, r22 = A . The
2
reader is invited to check that r 32 = 'A 3 (see Exercise 4.8).
It is now obvious why the trivial result A1 = 1, with v1 = (1, 1, ... , 1) and
w1 = (1,1, ... , 1), is always obtained when P and Q are analyzed. If ali the
species and ali the quadrats are assigned a score of unity and, equivalently,
if we put v1¡ = 1 and w11 = 1 for ali i and j, then the right side of (4.15)

TABLE 4.9. THE CORRELATION BETWEEN MATCHED SETS OF SPECIES


SCORESAND QUADRAT SCORES FOR DATA MATRIX #13.

Computation of r2 . ª
Species
Seores

15 2 o o
2 1)
o
1.102
0.929
X= ( i ~ l~ 8 29 -0.998

Quadrat 1.276 0.070 0.597 -0.772 -J.240


r "" '
2
Seores . 29( - O. 998)( -1.240)}
100{15(1.102)(1.276) + 2(1.102)(0.070) + . . +
: : 0.7487
' ':::: 0.561 ~
2
- A2. . f iJ1 the second
''l- (' 'ta11cs) ro d
io lhe . f species seores in J s' trom tbe secon
r011, of yllgbt
. of h . . h ond
t e data matnx is t e .sec .
set o ond set of qua drat score ,
row of w 1 ~ Table 4.7. Below it (in italics) 15 tbe sec
in Table 4.8.
184

beco mes
r
i
= _!_ L LX;¡·
N . .
I J

. that r = 1 since, by definition, N = t;t ·X


utomatica11Y RA d. . 1 iJ· 1'
It follows ª ..
this tnvial resu is
I
lt . discarded when an
.
or mat10n is done
Th . .
0

repeat, . th repeating is the followmg. e spec1es seores


A ther pom t wor . . h and
no f d by RA are such as to maxuruze t e correlat'
d at seores oun f hi b k . ion
qua r them. Th e Proof is beyond the scope o t s oo ; it may be found
between
in Anderberg (1973, P· 215 )·

The Reciproca! Averaging Technique

RA ordination can also be done by "reciproca! averaging." In outline the


procedure is as follows. First, arbitrary trial .values are chosen for the species
seores. Next, a first set of quadrat seores is computed from these species
seores. Then a second set of species seores is computed from the first set of
quadrat seores, then a second set of quadrat seores from the second set
of species seores. And so on, back and forth reciprocally, until the vectors of
seores maintain constant relative proportions.
Table 4.10 illustrates the procedure numerically, using Data Matrix #13
again. At every stage, each quadrat score is the weighted average of the
last-derived species seores (and vice versa). In computing these averages, the
speeies seores being averaged are weighted by the amounts of the species in
the quadrat (and mutatis mutandis when quadrat seores are averaged).
1
In symbols
0
let vCº\ v< >, ... denote the successive vectors of speeies seores,
and let w< ) wC )
1 d h p·
' ' · · · enote t e successive vectors of quadrat seores. rrst,
values for the elements of vCO) are chosen. It is convenient to use per·
centages, ranging from O for the lowest score to 100 for the highest. In the
example the ehosen seores are

vCO) = ( u~O)' v~O)' u~O)) = (100, 50, O).


0The elements of wCO) ar . . t of
w< ) is e now computed. For mstance, the 1th elemen

(0) - [
wj - X vCO) o
Wh . lJ 1 + x 2J·V~)+ ... +x Sj.uCº)]¡c
s J
. (4.16)
ere cJ is the j th colu
mn total of the data matrix.
185

v<I>(%)
o v<2\%)
2 2 1 iOO V(%)
15 64.0 (lOO)
9 6 15 o o 50 48.8 (68.9) 69.9 (100) 76.9 (100)
7 5 8 29 o 15.J (O) 59.5 (80.l) 72.J (91.8)
17. 7 (O)
33.3 37.5 20.0 3.3 20.9 (O)
40.9 51.7 20.0 3.3
45.3 60.l 20.0 3.3
........ .... ..

93.0 50.0 68.8 20.0 3.3


) (100) (52.1) (73.0) (18.6) (O)

V o
1
91¡8 190
Row 2 of V - 0.998
0.929 1.102
w o 18.6
1
52.1
1
73.0 100
Row 2 of W - 1.240 0.772 0.070 0.597 1.276
The. data matrix is· shown above and to the left of the doubl e lin e. Success1ve
. a · ·
eCies seores are m the columns on the right, labeled v<º) v<l) S . ppro~mat~ons to the
uadrat-scores are in the rows below Iabeled w<O) w<l) ' '· · · · uccessive approximat10ns to the
' ' , ....

Thus in the numerical example

wfº> = ( (15 X 100) + (9 X 50) + (1 X 0)] /25 = 78.0.


1
When the n elements of w<º) have been found, v< ) is computed. lts ith
element 01CI> is
v<i 1) = [x i1w<º)
1
+ xi2 w<º)
2
+ · · · +x.rn w<º)]/r¡,
n
(4.17)

where r; is the i th row total of the data matrix.


Thus m · the numerical example

v\1> = ( (15 X 78.0) + (2 X 33.3) + (0 X 37.5)


64 0
+ (2 X 20.0) + (1 X 3.3)]/20 ~ ·
1'L f they are used to
co elements of v <1) are rescaled to percentages be ore 1 d but it is not
lHe
rnpute wCl) Id also be resca e
nece · (The successive w vectors cou
ssary.)
186

The procedure is continued until the vectors stab~ze (i.e., until any
o-ive unchanged results). The final results m the exampl
f ur th er stePs er . . . e are
hown by the column on the extreme nght m Table 4.10 (which gives the
final species seores as percentages) and the row at the bottom (which giv
the final quadrat seores as percentages). As is ~hown in the lower part of t~:
table, these seores are the same (apart from bemg rescaled as percentages) as
row 2 of v (in Table 4.7) and row 2 of W (in Table 4.8). Thus they are the
required seores for, respectively, a one-dimensional ordination of the species,
and a one-dimensional ordination of the quadrats.
The seores on the second RA axes (i.e., the third row of V and the third
row of W) can be obtained by a similar, though computationally more
laborious procedure. It is not described here. Details are given by Hill
(1973). The reader should confirm that if v<0) = (1 , 1, 1), then wC0J ==
(l. L 1, 1, 1). This is the trivial result mentioned on page 181.
We now show the equivalence between the reciproca} averaging proce-
dure just described and the outcomes of the eigenanalyses of matrices P and
Q in Equations (4.10) and (4.13).
Suppose reciproca! averaging has been continued back and forth (rescal-
ing the species seores as percentages each time) until stability has been
reached. Then we can rewrite Equations (4.17) and (4.16) (in that order),
dropping the superscripts in parentheses. Thus (4.17) becomes

for i = 1, ... , s; (4.18)


(4.16) becomes

wj = (x1jV1 + X2 }·D2 + · · · +x Sj.uS )/ej forj=l, ... ,n. (4.19)

Next let us write these t .


Let R-1 b h . wo equat10ns more compactly in matrix forro.
be the n X a·
e t e s X s diagonal ·
matnx whose ith element is l/r¡; let
c- 1
n iagonal matrix h . . ·
versions of (4.lS) and . w ose _Jth element is l/c1. The rnat~
are (419
· ), with the size off each matrix shown below it,

V = R-1 X (4.20)
(sXl) (sXs) (sXn) (n~l)
and

w
(n X 1)
=
(
c-1 X' V •
(4.21)
nXn) (nXs) (sXl)
CAL AVERAGING, OR CORRESPONDEN
RforRO CE ANAL Ys1s
187
_'tuting the right side of (4.21) for the .
subsll w in (4.20) gives
V = (R -lX)( c-1X'v)
now operate on (4.22). Notice that . (4.22)
We b . matrices I
mu lt iplied as may e. convement, provided the1r. ordernay. be factored or
ntheses are put m wherever they help to ak is never changed.
pare h . m e the st
der should check t e s1zes of the matrices at eps e1earer. The
rea . every step to b
multiplications are poss1ble. e sure that all
First premultiply both sides of (4.22) by R1;2. Then

R1f :!v = (R1;2 R - 1)(xc-1x')v


(4.23)
= R - 1;2 (xc-1x')(R - 112 R1;2 )v.
(4.24)
Here the interpolated factor (R - 112 R1l 2) is simply a factored form of the
identity matrix and leaves the right side of the equation unchanged. The
reason for interpolating it becomes clear in a moment.
Writing c- 1 = c- 112 c- 112 and rearranging parentheses, we now see
that
R1/ 2v = (R-112xc-112)(c-112x'R-1; 2)(R112v). (4.25)

On substituting from (4.12), this becomes


(4.26)
(Rl/2 V) = P(Rl/2 V).
. . 1 ment column vectors). Now
Both sides of (4.26) are s X 1 matnces (i.e., s-e e Th s transposing the
row vectors. u
transpose both sides to convert t h em t0
left si de gi ves

anct transposing the right side gives


, 1;2 p' === v R1 12 P ·
1

[P(R1;2v)]' = (R.1/2v )'P' = vR . that p === P'.


. etncaiso
Th that Pis synun
e last equality follows from the fact ?)
lience (4.2
'R1/2)P.
(v'R.1;2) = (v . . true for all s
rhis result 1s
lt f 1 tor of P.
o lows that (v'R1;2) is an eigenvec
188

vectors of species seores, that is, for all s rows of V. Hence

VR.112 ex U (4.28)
where U is the s X s matrix whose rows are th e eigenvectors of P. Postmul-
1 2
tiplying both sides of (4.28) by R- 1 shows that

V ex UR - 112 (4.29)

which, apart from the constant of proportionality IN, is identical with


(4.11).
This explains why the species seores can be obtained either by reciproca!
averaging or by eigenanalysis of P; the results are the same. It is left to the
reader (Exercise 4.9) to derive the analogous relation between matrix Qin
Equation (4.13) and the vectors of quadrat seores.

4..6. LINEAR ANO NONLINEAR DATA STRUCTURES

The methods of ordination di.s cussed in this chapter so far (PCA, PCO, and
RA) are ali achieved by projecting an s-dimensional swarm of data points
onto a space of fewer dimensions. In the simplest method (PCA) the
coordinates of the points before projection are the measured quantities of
the s species in each of the n quadrats; centering and standardizing the data
(both optional) merely amoun t to changing the origin and the scale of
measurement, respectively. In PCO and RA, the measurements are adjusted
in a more elabora te fashion (as described in Sections 4.4 and 4.5) before the
swarm is projected onto a space of fewer than s dimensions. But, to repeat.
the final step in all these ordinations consists in projecting a swann of
points onto a line, a plane, or a three-space.
It is obvious that whenever such a projection is done, there is a risk t~at
the original pattern of the swarm will be misinterpreted; this risk is the pnce
that must be paid for a reduction in dimensionality. We now ask wh~ther
projection of the swarm is likely to produce a pattern that is posiuvelY
. 1 d" baS a
~s ea mg. The answer depends on whether the original data swarrn
!mear or non linear structure. ·
Figure 4.12 demonstrates the d1fference.
. · m~
The three-dimens10nal swar . .
th 1h . . 10 rd111a
. e upper pane as a linear structure; if a one or two-dimens10na
hon 0 f th . . . plaJle.
e swarm were done by proJectmg the points onto a line or
Lli,~
.. ic¡\R AND NONLINEAR DATA STRUCTU RES

189

(a)
.. ....,..,.,'
,."',,,.
------
.,-,:

....
,/.
,,~·
""----- ----
--~-/

( b)

Figurethe4.12.
case Linear
hollow(a)dots and r( b). n_onlinear
are the . data swarms (solid dots) in three-space. In eacb
coordinate frame. p OJectwn of the swarm onto the two-dimensional "ftoor" of Lhe

the result
data wouldwould
b be sausfactory.
. .
Sorne of the mformation in the original
anothe e lost, of course, but the positions of the points relative to one
each othrwould
. b e reasonably well preserved m . the sense that p01nts
. clase to
each oth:r m the original three-dimensional swarm would remain clase to
Th ~ m the one or two-dimensional projections.
obvio e spual swarm m · the lower panel has a nonlinear structure. There is·
Pro;e ~Yoo
1 way of orienting a Jine or a plane so that when the swarm is .
ªPprJ cted
· ont 0 it. the relationships of ali tbe points to one another are even

0
~;:mately
nto preserved. For instance, suppose the swarm were projected
e floor of the coordinate frame; it would be found tbat the pomts at
190

each end of the spiral, which are far apart in three-space, would be
.f h d. . 1 . el ose
together in two-space. In d ee d , 1 t e two- lffiens10na p1cture were the '
available representation of the swarm, it would be impossible to . :~ly
whether its original three-dimensional shape had been that of a spi~ ge
a'1 a
hollow cylinder, or a doughnut.
It should now be clear that ordination by projection, for example b
PCA, PCO, or RA, although entirely satisfactory if the data swarm is linea;
may give misleading results if the swarm is nonlinear. 1t is sometimes said
that PCA, for example, gives a distorted representation of nonlinear data.
This is a misuse of the word " distorted." The picture of a many-dimensional
swarm that PCA yields is no more distorted than, say, a photograph in
which both distant and nearby objects appear.. One would not call such a
picture distorted because the images of a distant mountain peak and a
nearby tree-top, say, were close together on the paper. In the same way, the
circle of points on the ft.oor of the coordinate frame in Figure 4.l2b is not in
the least distorted. But it is rnisleading. What we require is a method of
ordination that deliberately introduces distortion of a well-planned, spe·
cially designed kind, that will correct the misleading impression sometimes
given by truly undistorted data.
Various methods of ordination that achieve this result have been devised.
They are known collectively as nonlinear ordination methods. A note on
terminology is necessary here. The contrast between linear and nonlinear
ordination methods is that they are appropriate for linear and nonlinear
data structures, respectively. The term "linear ordination" should not be
used (though it occasionally is) to mean a one-dimensional, as opposed to a
two or three-dimensional, ordination. The term catenation , suggested by
Noy-Meir (1974), is a useful and unambiguous synonym for " nonlinear
ordination."
W e now consider how nonlinear data swarms can arise in practice. Then
a good method of ordinating such data, known as detrended correspon·
dence analysis, is described.

The Arch Eff ect

~col?gical data often have a nonlinear structure. An example of an inves-


tigatwn that would yield such data is worth considering in detail. .
Imagi.ne an ecological community occupying a long environmental gradt'
ent, for.mstance the vegetation on a mountainside. The vegetation forros d
coenocline ' a commuru·tY wh ose spec1es-composition
· changes srnoo thlY aJl
LI - RUCTURES

·· ese rcipenie .
F1 gure .La giYe a diagrammatic portrayal of such a coenocline. Each
i.·Te repre~enL ne species. the horizontal axis measures distance along the
~3. ·em. and the height of a particular species' curve above this axis shows
the . ·~y the species responds to the varying environmental conditions along
íhe ~a dien t
·:-o · ima~e that the coenocline is sampled by placing a row of quadrats
· 'd ) Will the
- a ed at equal intenrals along the gradient (up the mountainsI e · .
f data points) be linear
"sultant ··data structure" (the shape of the swarm 0 .
h 0 f the segment of gradient
N nonlinear? The answer depends on the lengt
th .
ac 15 sampled. . ntains the peaks of one or
Suppose the sampled segment IS long, and co d monotonically
m . , .es do not respon
ore speaes response curves. These speci . th y do not increase
to th nt· that IS, e
e gradient oYer the length of the segme '. h they first increase
contmuouslv. or decrease continuously, along it. Rat er,is nonlinear.
anct then d;crease As a consequence, the data structurent of the gradient
. h rt segme . h
But if samplin º is confined to only a s o h species present w t .e
1f' 0 of eac · lf IS
tgure 4.13b) over which the response curve h data structure itse
s gm . li ar then t e
ent is at least approximately ne '
.ªºain that results
\, · ªPProximately) linear. onlinear data swarm a sampled
e now consider the shape of the n d crease, aiong
hen . and tben e
severa! species first IDcrease,
192

(a)

(b)

.. ·~.

· tal radient Utbe


Figure 4.13. (a) The response curves of eight species along an envi~onmen g tb ti.e data
1
community (a coenocline) were sampled at a series of points along 1ts whole Ieng ' ¡ed at
.
swarrn would be nonhnear. ( b) An enlarged segment of (a). lf the coenocline werebsamp
espoose
a number of closely spaced points within the segment, wbich is so short tha_t t e ~at Jea t
curves are not appreciably nonlinear within it, the data swarrn would be linear
approximately).

. . . here are onlY


grad1ent. Figure l4.4a shows a very simple artificial example, t uallY
· . d and eq ' ·
three spec1es and their response curves are identically shape d t 1he
· ¡0 cate
spaced. Assume that n = 12 quadrats are examined; they are . . e (oí
ª
. . . . . d VISUa 1lZ
s1tes marked on the honzontal axis. The reader 1s mVIte to uence
.h . . . the seq
construct, w1t stiff wire) the curve in three-space connecting . tes 1JJe
· ordJllª
of data pomts these quadrats would yield (each point has asco . repre·
oiJlt
amounts of each of the three species in the quadrat that the P bY tJ1e
sents). lt will be found that the curve is the same as that shoWI1
1l
t

IENT
1
-~
4 4
t:i

10 •
12

A 1 2

\
\
9 'q
\
\ _ .,..Axis 1
1

••
10

12 / ~
o

- d hne in Fi,,,nr 4.14b. The latter is a two-dimensional pCA ordination


dll.1. and a, u l . , d. t d .. picture" of the points; it shows
au~~mnr d L 1 is an un 1stor e
. d t theplanet1a
1 t fits thelll
,t . h' l uced ' ·ben they . . are proJecte on. ¡1 r:recf
1
°
(so111et101es called .
1 - · 1e 1.:urve e lub1ts the so-called 01e C.1. whtill data fro m a long

.. cr ). and the fact that it appear 1 by !lAl detracts
tt
'rdin ted by PC A ( and also. as shown ~~:· drawback is tbis:
fulne:: of PCA asan ordinat10n method.
ÜRDINI\ l l()N

¡';
one would like the result of ordinating the quadrats observ d .
. h li e a1ong a 11
grad1ent to ave a near pattern themselves, in the present near
. . case to for
more or less stra1ght row m two-space. But they do not. The d h ma
. . . . as ed curve.
Figure 4.l4b, which IS almost a closed loop, gives a misleadin 1"d r

. h h. . . g ea of th
grad1ent even t oug 1t 1s an und1storted picture of the data swa S
. . rm. uppo
one were to ask for a one-d1mens10nal ordination of these dat
· o f t h e quadrats along axis 1 turns out to be
ord ermg ª· Th

3 4 5 2 1 6 7 12 8 11 9 10
'
an ob~iously meaningless resu!t. Bu~ if ordination by PCA (or by RA or by
PCO) 1s performed on data w1th a lmear structure (as in Figure 4.13b), the
result accords with what one intuitively expects; for an example, s
Exercise 4.10.
It might be argued that the PCA ordination in Figure 4.14b would not
mislead in practice. The points are numbered according to their position on
the gradient and can be joined, in proper order, by a smooth curve. But it
should be recalled that this is an artificial example with only three species.
Given field data with many . species, one always has to project the data
swarm onto a space of far fewer dimensions than it occupied originally; ·
the swarm is a "hyper-coil" (a multidimensional analogue of the dashed
curve in Figure 4.14b ), then when it is projected it will automaticall
"collapse" and yield as meaningless a pattern in the line, the plane, or
three-space as the one-dimensional ordering of the artificial example listed
previously.
The problem is, of course, compounded when the gradient sampled is Iess
obvious than that of a mountainside, or is not even apparent at all. Indeed
if environmental variables such as soil moisture, soil texture, and the like are
varying haphazardly in space, there may be no gradient in the ordin~
sense. Then the quadrats will have no particular ordering befo~e an ~aly~~
is carried out, and the purpose of the analysis is to perceive theu ordenng
there is any) and diagnose its cause. . t to
What is required, therefore, is an ordination method that is not ~ubJe~aY
the arch effect, one which will ordinate a nonlinear data swarm in tion· ª
that clearly exhibits iri one, two, or three dimensions the true interrela
ships among the quadrats. oti·d
A The s
Let us first consider whether RA is an improvement over PC · and
curve in Figure 4.14b shows the RA ordination of the same 12 quadrats
D NON LINEAR DA TA STRUCTURES
~¡N(.4~ AN
195
s the sort of results that RA is f
resen t . ound t .
reP It is an 1mprovement over PCA in ° Y1eld in practice .
da 13 · ated. However, the effect is still p
gger
t~at
resent m .
the arch A' wi~h real
euect is 1
~,x d s not give an erroneous ordering
3
.nuld form, and 1h ess
il oe . h d. . on axis 1 . a t ough
·ngless pattem m t e uection of axis 2. , it still pr d
mea.fll b . , a true o uces a
? quadrats would o v10usly consist of a row f ~epresentation of th
1.- t . o equ1sp d e
axis. 1 with no componen. on axJ.s
. 2. Also, there is . a co ace . points along
h end of the grad1ent: the pomts at the end ntraction of scale at
eac . d . . . are more clo 1
ose at the nuddle an this vanation in spacing d se Yspaced than
th . . oes not corr
var iation m the steepness of the enVIronmental d.
gra ient espond to any
Therefore, although RA <loes better than PCA · . :
. 1 . hi m g1vmg a true repr
tation of the mterre at10ns ps of the quadrats th . esen-
. . . . ' ere is room for further
iJnprovement.
. One
. ·way of achievmg this is to use detrend d
e correspondence
analys1s, an ordinat10n method that we now consider.

Oetrended Correspondence Analysis

Detrended correspondence analysis (DCA) is an ordination method that


overcomes the two defects of ordinary RA. It fiattens out the misleading
arch, and it corrects the contraction in scale at each end of an RA-ordinated
data swarm. DCA does this by applying the requisite adjustments to an
ordinary RA ordination. In general terms, the adjustments (to a two-dimen-
sional ordination) are as follows. .
The arch effect is removed by dividing the RA-ordinated data swar~ mto
several short segments with dividing lines perpendicular to the first axis, and
. . th so that the arch
1hen slidmg the segments with respect to one ano er .
d. hift d up or down m such a
isappears. More precisely ' the segments. are s. hi e ch segment (.i.e., the
way that the average height of the po~nts wit n ~al The scale contrac-
~verage of their seores on the second axis) are all e~ htf~rward rescaling of
tion at each end of the swarm is corrected by stra~g d d cription of these
thos . . d d A detaile es
e parts of the axes where 1t 1s nee e · . h m out have been
P~ocedures and a FORTRAN program for carry1~; tr~cedures consist in
given by Hill (1979a) who devised the method. T . p order to force .an
overt ' . ·ed out U1 . · tuiuve
. ' systematic data manipulat10n, carn ossible with in
orct111 · . well as P . y sorne-
ation mto a forro that accords as s intuitions rna d ta
expect . . . k h t erroneou .th real a
tirn ations. Therefore, there is a ns t ª experience W1 therein)
atidest be forced upon a body of data. flow;~:Za and references
ests With artificial data (see Gauch,
suggest that the method gives useful results and permits ecologically e
.
interpretations to be denved from confusing multivariate data. orrect
An example, using real data, of the contrast between RA and DCA ¡
shown in Figure 4.15. The data consist of observations on the aquatic5
vegetation at 37 sites in oxbow lakes in the floodplain of the Athabasca
River in northem Alberta. The lakes dilfered among themselves in a variety
of abiotic factors, the most important of which was salinity. Ordination of

RA

X
X
o

DCA


.. •
• •


x•• • o o o

• • e

X
X

• xX •
e X

( X

X ru~
x f the Athabasca )
lakes in the valley o d Nitelfa
RA and DCA ordinations. ~f 37 oxbow. angiosperms (plus Cha~a ~ (•). those
or~inated on the basis of the c~~:t~ru;:;sd~sfti:i'::s~~d T~~a~~Xº~dinaUa~ ;'
Fi ure 4.15.
those with
growing m th~m.
Three cl(a~e)s
d those with neither spec1es (X). [ kindly provided by .
with Triglochin manttma ' an . . f the same data was
. a- (1984) . The RA ordmation o
adapted from L1e11ers
Lieffers (pers. comm.).]
NS ANO CONCLUSIONS
.Ap,ARIS
eº''' º
197
,..111ple sites by RA and by DCA are h
e sai.. . s own ·
!l1 of the figure. D1fferent symbols hav b 111 the upper a d
ane1s . e een used f n lower
P whether they contamed Typha latz:r ¡· or the sites d
. g 011 10 za (whi h epend
in ) Triglochin marítima (which thrives i . e cannot tolerate s lin-
water , n satine w a e
ecies never occurred together. The RA . ater), or neither th
tWO sP ord1naf , e
the arch effect and the scale contraer ion clearly exhibits
bo th d . ion effect d
. pear when the ata are ordmated by DCA ' an both effects
d1sap .

4.7. COMPARISONS AND CONClUSIONS

Ali four . of the. ordination methods described in this ehap ter have ment. m .
appropnate crrcumstances.
Three of the m~thods are suitable ~or data with a linear structure, and
such data are obtamed very frequently m ecological studies (van der Maarel
1980). PCA has the merit that it is the most straightforward, conceptually:
of all methods; it allows the user to look at a visible projection of a
multidimensional and hence unvisualizable swarm of points. In addition,
uncentered PCA aids in the recognition of distinct classes of quadrats (see
page 162). PCO sometimes allows one to construct and inspect a data
swarm in which the distances between every pair of points corresponds
(approximately) with sorne chosen measure of their dissimilarity. RA pro-
vides simultaneous ordinations of quadrats and species.
With nonlinear data, DCA is the ordination method at present fav?red
. . . d that it removes two likely
by the maJonty of ecologists. lts a vantages are .
. d t ordinated by ordmary
sources of error that arise when nonlinear ªªt" effect Its e ec is that
are d f t ·
RA, namely, the arch effect and the scale contr~c wn . · by deliberately
· . · lation that 1s,
tt gams these advantages by data mampu . ' tbe scales of the
ft · · 1 ¡ adJustments to
atterung the arch and by applymg oca . 1 artifacts devoid of
1 mathemauca
axes. lf these troublesome effects are tru Y . emove them. But
· . · 1 deslfable to r h
ecolog1cal meaning then it 1s obVIOUS Y etirnes lead to t e
o ' d "d fects" may soro
verzealous correction of suspecte e . f 1 ·nformation.
unwitting destruction of ecologically meamng u l . st but are beyond the
oh . nli ear data eXl d vised by
t er methods of ordinatmg no n . . method has be~n e 4 and
scope of this book. An especially pronus111g.b d in Noy-Meir (197al)t,erna-
She . · 1 0 descn e · g" or,
1
ª
b. Par~ and Carroll (1966); 1t is s " ararnetric 111appJJl d·fficult 1
than
/ efiy m Pielou (1977). lt is known. as pthernaticallY more
· " It 1s rna
ively ' as " continuity analys1s.
l~H

DCA, but is free of the rather contrived "corrections" that


. ·
ord mat10ns somewh a t sub.~ectlve.
· · mapping could make De A
Parametnc
. . profitabl b
tested in ecologtcal contexts; it may prove to have the merit Y. e
without its defect of artificiality. s of DCA
There is, however, a valuable byproduct of RA and DCA orct· .
inat1on~
that more than compensates for any defects they may have They .
. . · prov1de
the mformat10n needed to rearrange the rows and columns of a data m
atnx
in such a way as to make the raw data themselves easily interpretable. Thi
is possible because both methods ordinate the quadrats and the specie:
simultaneously.
Consider the artificial example in Table 4.11. (An artificial example ¡5
used for the sake of clarity.) The two matrices in the table contain identical
information. Both record the abundances of 10 species in 10 quadrats. The
upper matrix shows the data as they might have been collected in the field.
I t displays no discernible pattern, and there is no reason to suspect that it
contains a concealed pattern. It is typical of the sort of matrices obtained
when observations are first recorded in a field notebook. N either the species
nor the quadrats are listed in any particular order; indeed, one often <loes
not know what the intrinsic ordering is, or even if there is any.
N ow suppose that these data are ordinated by RA or DCA. We require
only a one-dimensional ordination from which the order of the points along
the axis can be obtained. (Since RA and DCA range the points in identical
order on the first axis, either method may be used.) The quadrats and the
species are then reordered according to the magnitudes of their seores. In
the example, the ordering of the quadrats on the first axis is

7,l,10,9 ,5,2,8,6,4,3,
and the ordering of the species is:

2,9,8,1,4,6,10,7,5,3.
Let the d ata matnx . gedin
be rewritten with the quadrats (columns) arran
th d h · · · the order
e or ~r s own m the_ first hst, and the species (rows) ~r~anged 111 l. The
shown m the second hst. The result is the lower matnx m Table 4.1
pattern or "structure" of the data is now strikingly obvious.
An example using real data is given by Gauch (1982a). x1 ·bit
11
An altemative method of rearranging data matrices in order to e en
their structure has been devised by van der Maarel Janssen, and LouPP
(1978), who give a program, TABORD, for carryin~ it out.
r

~ 1 ¡S
1\I
11)<)

1.J-' t t. NtllNli
f ,\JJ . s RUl UR
ns
¡ rRlN l --
- :irn rdcr~d dat ·1 matri ·
fhl' rtl\ '
Quadrnt
,
1 - ,l ' 4 5 (>
8 o 10
2
3 4 3
4 l
3 4 1
-
./ 1 4 2 3
5 l J 4 J 2
(1 4 l 3 .,
- 3
2 3 1 ..+ 3
8 3 1 2 2 3 -l
9 4 1 2 3
10 3 1 2 2 3 4
The sarne data váth the r w and e lumn" n~arrnng,ed

1 10 o 5
., 8 ó 4 _,'
-
., 4 3 2 1
-
9 3 4 3 1
8 2 3 4 3 1
1 1 2 3 4 3 1
4 1 3 4 3 2 1
3 4 3 2 l
6 1 2
3 4 3 '.!
10 1
4 3
7 1
2 3 .+ 3
5 1
2 3 .+
1
3

EXERCISES
4 , f thc e varinm:e
Consider Table 4.2. What are the eigenvnlue~
1
.l,
lllatrix yielded by the SSCP matrix R? \ = UX be the
4.2. L t tr1·x and let 1t1.tie
e X be a row-centered data roa ' CA \ hnt d 1 the quni
transformed matrix obtained by doine> p · ª
200

1
tr(X:X') and tr(YY represent in geometric terms? Why
)

. . Wou1
expect them to be equal? [Rerrunder: tr(A) 1s the trace of 0. You
i.e., the sum of the elements on the main diagonal.] rnatnx A 1

4.3. Show that the eigenvectors of a 2 X 2 correlation matrix are


a1Way~
(0.7071 0.7071) and ( -0.7071 0.7071) .

4.4. Refer to Table 4.4 and Figure 4.6. From the table, determine the
angles between: (a) the x 1-axis and the y 1-axis in Figure 4.6a ; (b)
the x 1-axis and the y{-axis in Figure 4.6a; (c) the (x 1/a 1)-axis and
the y{'-axis in Figure 4.6b; (d) the (x 1/a 1 )-axis and the y¡"'-axis in
Figure 4.6b.
4.5. Refer to page 164. What is the coefficient of asymmetry of axis 3in
the example described in the text? (Note: axis 3 does not appear in
Figure 4.9b because it is perpendicular to the plane of the page.)
4.6. Let A, M, N, and 1 ali be n X n matrices. A is the matrix whose
(i, j)tb element is given in paragraph 12, page 170. The (i, j)th
element of Mis - -!-8 2 (), k ). Ali the elements of N are equal to l/n.
1 is the identity matrix. Show that (1 - N)M(I - N) = A.
4.7. The quantities of two species in quadrats A, B, and C are given by
the data matrix

A B

X=(! 5
4

Pe.rfo~ a PCO on these data by simple geometric construction.


usmg city-block distance as measure of the dissimilarity between
quadrats. (Show the result as a diagram of the pattem of the tbJee
po~ts after the ordination; do nor compute the coordinates of the
pomts.)
4.8. Refer to Table 4 7 4 8
· , .. and 4.9. Confirm tbat

,32 =A 3.
Here r3 i the co r l · . 3 of V
(T bl r e ation between the species seores in row
a e 4. 7) and the quadrat seores in row 3 of W (Table 4.8).
201
Refer to Equation (4.27) on page 187 sh .
~.9. f r a RA ordination are related to ;he º:"'ing how the species seo ,
o e1genvectors 0 r res
Equation (4.12) (page 181). Derive the an J P defined in
. a ogous relat" b
the quadrat seores an d t h e e1genvectors of Q d ~on etween
(4.13). efined Jn Equation

4JO. Consider the following data matrix in which the


species and the columns quadrats. rows represent

11 12 13 14
17 19 21 23
X= 22 27 32 37
30 27 24 21
34 28 22 16

Find matrix Y, giving the coordinates of the four points after an


unstandardized, centered PCA of the data. (Hint: with these data
there is no need to construct the covariance matrix and do an
eigenanalysis. To perceive the structure of the data, inspect the
results of plotting against each other the quantities of every pair of
species.)
Chapter Five

Oivisive Classification

5.1. INTRODUCTION

In this chapter we return to the topic of classifying ecological data. Several


methods of classification were described in Chapter 2; all were so-called
agglomerative methods. Here we consider divisive methods. The distinction
is as follows.
In an agglomerative classification one begins by treating ali the quadrats
(or other sampling units) as separate entities. They are then combined and
recombined to form successively more inclusive classes. The process is often
called "clustering." Metaphorically, construction of the classificatory den-
drogram (tree diagram) starts with the twigs and progresses towards the
trunk.
A divisive classification goes the other way. lt starts with the trunk an.d
pr h llection of quadrats 1s
ogresses towards the twigs. That is, the w 0 1e co bd. ·_
treat d . h divided and the su iv1
. e as a single entity at the outset. 1t is t en
Stons re d.iv1ded,
. again and agam.. . hods have one nota-
Compared with divisive methods, agglomeratr:e mhet mallest units (the
blqe dis ªdvantage. It arises because they start w1 th t e .sal quadrats in the
uadrats themselves). lf, by chance, there are a f ew atyp1c
203
204 DIVISIVE ClASSIFICAnoN

data set, th.ese quadrats are ,~kel~' to ~ave a strong ~ffe~t on ~he first round
of a clustenng process, and bad fus10ns at the begmmng w1ll influence
later fusions. The obvious (but, with one exception, impracticable) solutio
is to adapt the agglomerative methods so that they can be used the oth
way round. It is easy (in theory) to devise a method of classification
division that proceeds as follows. The whole collection of quadrats is fu
divided into two groups in every conceivable way, and one then judg
which of the ways is "best" according to sorne chosen cri terion. lf there
n quadrats, there will be 2n - l - 1 different divisions to compare with 0
another (for a proof, see Pielou, 1977). Having discovered the best possib
division to make at the first stage, the whole process must be repeated
each of the two classes identified at this stage, and then on each of the fo
classes identified at the second stage, then on each of the eight class
identified at the third stage, and so on. N ot surprisingly, the comput
requirements of such methods are so excessive that they are infeasible unle
n is very small.
However, there is another method (actually, a whole set of relat
methods) of doing divisive classifications that avoids these computation
diffi.culties. lt consists in first doing an ordination of the data (in any wm
one chooses) and then dividing the ordinated swarm of data points wi
suitably placed partitions. The procedure is known as ordination-spa
partitioning. The term describes a large collection of methods since one e
choose any one of a number of ways of doing the initial ordination, an
then any one of a number of ways of placing the partitions. Gauch (198
has reviewed the development of these methods. Collectively, they constitu
a battery of exceedingly powerful procedures for interpreting ecologic
data. They yield an ordination and a classification simultaneously, and th
classification, being divisive, avoids the disadvantage of agglomerative clas
sifications previously described.
lt should be noticed, however, that ordination-space partitioning is
much more "rough and ready" method of classifying quadrats than th
agglomerative methods described in Chapter 2. Even so, it is probabl
adequate .for most, if not all, ecological applications. And, as so ofte
happe~s m efforts to interpret ecological data, one is faced with tli
pere~mal problem of choosing, judiciously, one of a large number 0
poss1ble and only slightly different procedures.
In the following sections, we considera few representative methods.
iNG ANO PARTITIONING A MINIM
coNS rRlJ(T UM SP
AN N1Ne TREE
205
2 coNSTRUCTING ANO p
~ 1 N1MUM SPANNING TREE ARTITIONINC A

, rernarked previously that one (and


lt was . . on1y one) f
met110d
s of classificat10n can be done in
. reverse. The
°
the agglomerat·
1ve
. l1bor clustenng (see page 15), also known a . .method is nearest-
ne1g . . s smgle li k .
o do a nearest-neighbor . . 1y the nn age
classification dI·v·ISIVe d clustermg·
T lotted in s-space ( n IS the number of quad t' ata points are
rs P
fi tnurnber . . . ra s to be cla ·fi d
of species); this is equivalent to dom· . . ssi e , and s
t11e . . . . g an ordmat10 f h
with no reducuon m dunens1onality. °
n t e data
The. points of
. the swarm . are then linked . . by a mimmum. . . tree A
spannzng
spanning tree IS a set of line. segments . linking
. . all the n pomts . m . the swarm
·
in such .a way that every pau of pomts 1s linked by one and on1Y one path
(i.e., a line segment, or sequence of connected line segments). None of the
patbs form closed loops. The length of the tree is the sum of the len ths of
¡15 constituent line segments. The minimum spanning tree of the sw~rm is
the spanning tree of mínimum length. (Note: Do not confuse a spanning
tree with a tree diagram or dendrogram.)
Figure 5.1 shows a simple example with n = 10 and s = 2. The coordi-
nates of the 10 data points in two-space are given in Data Matrix # 1 (see
Table 2.1, page 18), and the swarm of points is identical with the swarm in
Figure 2.3a (page 17). For clarity, the data points are here labeled with the
letters A, B, ... , J instead of with the numerals 1, 2, ... , 10. The length of
each line segment is shown in the figure.

Partitioning the Tree


. . . method is now done by
Nearest-neighbor classification by the divisi:e then the second
e tf · t link in the tree,
u mg, m succession, first the 1onges ess is illustrated, up
longest link, then the third longest, and so on. The procs i·n Figure 5.2. The
t h 0 f dendrogram
0 ~ e penultimate step, by the sequence . of all yields a dendrogram
~ltimate step, that of cutting the shortest link '
idenf ¡ . . . b 1 there are
ica With that m Figure 2. 3 · ;11 the examP e, .
Th t when, as .u.. h rniJUIDum
e procedure is very easy to carry ou lotted, and t e
only t m can be P Let us now
wo species· for then the data swar t of paper.
span · ' d. sional shee
mng tree drawn on a two- 1men
'
'\I

/\
1,¡ e
N G
lf) ') (' 11 \
w
13 .lb
w
o... H ?'t'
E
lf)
•,o
10
78
o

o 10 20 30 40
SPECIES 1
Figure 5.1. Thc points of Data Matrix #1 linked by their mínimum spanning tree. The
coordina tes of thc quadrnt poinl, (hcre Jabclcd with the lctters A, B, . . . , J) are given in Table
:U. Thc distancc bctwccn cvcry pair of joined points is also shown.

describe the procedure in such a way that it is applicable whatever the value
of s. It is still convenient to use Data Matrix #1 asan example.
lf the data swarm is man y-dimensional and hence unvisualizable, the line
segments forming the mínimum spanning tree must be found by inspecting
the n X n distance matrix showing the distance between every pair of
points.
The distance matrix for Data Matrix # 1 is given in the upper panel of
Table 5.1. It is identical with the distance matrix in Table 2.1 (page ~S)
except for two changes. In Table 5.1 the quadrats have been labeled with
. ' .
lette~s mstead of n~merals, as explained. And sorne of the d1stances the
· the
:r
matnx have been g1ven superscript numbers· these label the segments
. . . . , d TheY are
rrummum spanmng tree m the order in which they are foun ·
f . . oower
ound as follows. (The method is due to Prim, and is descnbed in
and Ross, 1969; Rohlf, 1973; Ross, 1969.) ·n the
The first segment ~f the tree corresponds to the shortest distance ~ Elt
table. The shortest d1stance is 2 2 = d(E H) the length of segmen bY
h . · · · ' ' . find,
t erefore, supe1scnpt 1 is attached to this d1stance. We next . ce
. t dista 11
s.ear~ hmg the rows and columns headed E and H, the shortes :; :; 3,6;
linkmg a third point to either E or H. It is the distance d(E, B)
RlJCflNG ANO PARTITIONING A M
coNlf INI MlJM SPA NNINC TREE
207

cur 1 CUT 2
CUT 3

CUTS 4 8 5 CUTS 6 87

I C G D 1 I C GAF D J
1

___ L
CUT 8

F' r co•FoJB
on of Data Matrix . the const (10º.0 f the dendrogram givmg
ti igure 5.2. St ages m# ruc . . a nearest-neighbor classifica-
cut !tves the complete d1- Each successive cut perrruts a division to be made. The final (ninth)
endrogram, which is shown in Figure 2.3b.

searchin~ ~uperscnpt
lherefore . 2 is attached to this distance. We next find, by
linlting a / e rows .and columns headed E, H, and B, the shortest distance
~~rth
~
therefore, pomt to _any of E, H, or B. It is the distance d(H, J) = 5.0;
d(B,H) perscnpt 3 is attached to this distance. Nouce that although
of th . .Ois shorter than d(H J) segment BH is not ad!Illss1ble as part
4
e nuni
not pe . mu spanrung. tree as ,it would
' forma loop BHE and Joops are
01
rnutted ·
ntinuing ts needed to
Co eteih
cornp1 · · m the same way• all the n - 1 ::::: 9 .segmenh nd in the
e tree are found. They are listed, with thelf Jengt s ª
208 DIVISIVE ClASSIFI(
ATION

TABLE 5.1. A DIVISIVE NEAREST-NEIGHBOR CLASSIFICATION


OF DATA MATRIX #1.ª

The distance matrix : tJ


A B e D E F G H J
7 6
A o 14.4 16.5 25.0 18.0 5.7 6.1 17.9 27.3 19.4
B o 11.38 15.8 3.6 2 20.0 9.2 5 4.0 24.8 8.1
e o 27.0 12.5 21.5 15.1 14.4 13.69 19.2
D o 14.9 29.2 19.l 12.7 40.3 7.8 4
E o 23.6 12.7 2.2 1 25.5 7.2
F o 11.2 23.3 31.0 24.4
G o 12.2 27.9 13.3
H o 27.6 5.0 3
o 32.5
J o
The lengths of the segments of the mínimum spanning tree, in the order
in which they were found:
1: d(E, H) = 2.2; 2: d(E, B) = 3.6; 3: d(H, J) = 5.0;
4: d(J, D) = 7.8; 5: d(B, G) = 9.2; 6: d(G,A) = 6.1;
7: d(A, F) = 5.7; 8: d(B, C) = 11.3; 9: d(C, I) = 13.6.
Diagram of the mínimum spanning tree, constructed from the segments
whose lengths are given above. The segments are labeled 1, 2, ... , 9 from
longest to shortest, showing the order in which they are to be cut.
FA G BE H J D
• 6 • 5 • l~ 8 • 9 • 7 • 4 •

:see Table 2.1 and Figures 5.1 and 5.2.


The row and column headings refer to the quadrats.

order in which they were found, in the center panel of Table 5.1. Observ
that they were not a·iscovered m . length; some of th
· order of mcreasmg
.
later-found segments are shorter than sorne found earlier. Now that th
seg~ents have been found, it is easy to draw a two-dimensional diagrarn
matic representat· f h · · ;,, the
IOn °
t e ffilrumum spanning tree as has been done ~· .
bottom panel of Table 5.1. The segments are linked together in the order in
fRUCTING ANO PARTITIONINC A M
coNS INIMUM SP
ANNINC TREE
·a . 209
. h theY were 1 entified; they have b
,vJJ.lC f h 1 een assig
¡enª {hs with 1 or t e ongest up to 9 f or th hned ranks acc or d.mg to th ·
º o obtain. a nearest-neighbor class·fi
T
. e s ortest.
1 cation f
e1r
e it remams to cut the tree's segment rom the minimum .
1re , . . . . s, one af te spanrung
e largest. This partitiorung process h r another, beginrun· . h
th . as airead b g w1t
·aure 5.2. Of course, w1th a many-dimen . y een demonstrated .
flº .. s1onal d t m
be plotted and partlt10ned as was the tw~-d· . a a swarm which cannot
.. . tb d imens1onal s .
!h
e part1t10nmg mus e one on the d1'ag . warm m Figure 5 1
. rammatic · · · '
constructed as shown m Table 5.1. The dia mirumum spanning tree
gram can alw b
dimensions, regardless of the value of s 1 . ays e drawn in two
. . · n practice of
in the classificat10n can be done by comput . ' course, all the steps
Rohlf (1973). The foregoing description ofe:h: pro~ram has ~ee~ given by
met od explams Its princi-
p1es.

Clarifying an Ordination with a


Minimum Spanning Tree

When s-dimensional data (with large s) have been ordinated in two-space,


for example by PCA, there is obviously always a risk that two points that
are far apart in s-space will appear close together in two-space. This is
particularly likely to happen if the data have a nonlinear structure (see
Chapter 4, Section 4.5). To avoid being misled by the spurious proximity of
points that are, in fact , widely dissimilar, it often helps to draw the
minimum spanning tree on an ordination diagram. Then the fact th~t
11
apparently similar points are not linked by a segment of the tree makes
obvious that the similarity is only apparent.
Figure 5.3 shows an example. The data are from Delaney and Healy
(1966) and the analysis from Gower and Ross (1 969 )· The purpose of. the
. . 1O isolated populat10ns
research was to investigate the relat10nships among t
f . f veral skull measuremen s.
0 shrews of the genus Crocidura on the basis 0
~e . 2 11 ws the 10
Ürdinating the data and reducing their dimensionality. to th: ~gure. The
dat · s shown in
ª
po l ·
pomts to be plotted in two-space, ª
h One group
comes trom five
pu at1ons belong to two groups of five eac . 1 d· the other group
of th . ( of Eng an , d
e Scilly Islands off the southwestem 1P 1 of France, an
co ' ff h north coas
rnes from four of the Channel Islands 0 t e d. tion did not show
frorn
th
e ap Gris N ez on the French mam an ·
.
1 d lf the or ina f h
elude that two 0 t e
e rn · b tural to con
Inimum spanning tree, it would e na
210 DIVISIVE CLASSIF
ICA110N

Channel Island populations (those from Jersey and Sark, labeled 1


were closely similar. In fact, as the mínimum spanning tree shows t:nd S)
more similar to one of the Scilly Island populations than to each other ey are
Thus a mínimum spanning tree is a useful adjunct to an ordinati~n
can be helpful i~ pre.venting misinterpre.ta~ions. When .the number of po:~
being ordinated is fa1rly low, then the rmmmum spanrung tree can be show
as part of the ordination, as in Figure 5.3. When there are a large number ~ 0
points, a diagram showing the tree as well as the points may be too
confusing to be useful as a final portrayal of one's results; even so,

J
~ s
1<?
11
11
\1
----<{---- - -
\1
o--- 1 --- 11
1
b

.
F1gure · g 1he
5.3. A two-d · · al . . . m shOWJll
. . . imens10n ordination of a ni.ne-dimensional data swar brews.
rruru mum spanrung t (d h . ts on s
ree as ed line). The original data were skull measuremen dia-ercol
JI
co ected from 10 lo 1 . . frorn 11
'
· d . h . ca popu 1ations. F1ve of the populations (solid dots) are · d (circlcsl
is1an s m t e Sc11ly Isl d Th 1 r5Jan s
and Ca G . an s. e hollow syrnbols refer to four of the Channe 196 9.) fhC
. p ns Nez (square) ; J = Jersey; S = Sark. (Adapted from Gower and Ross, 1I )' OP
~s~_t ~ap . shows the locations of the two island groups (A = Scilly Is., B = Cha.Dlle s. '
ns ez is on the north coast of France , far to th e eas.t
,,
111

t tm 1 n1 ,r
unn tt. d . th tn\lsti •·111011 so that

IN N

ning thod

'tmpk. unifü.:ial e ampk -erves a a demonstration. We


uadrat' that e 1ntain. t:1gether. eight species. The data are in
LIT = 1 . in the tL p panel :-if Table 5.2. The columns are headed
euer: . . . . . F. '·l ch ar the quadrat labels. An unstandardized,
p · - carried 1Ut on these data in order to determine the
' mp nent .-, r ( the e ordinates mea ured on the principal
e -~- 1
data p int ·. Th s cMrdinate , rounded to one decimal
e \ ·en - t e e [umn: f matrL Y in the center panel of Table 5.2.
ter !he p . the , t point: are rdinated in five-space. Hence
Ju e 1rr spondin,, l each row is shown on the
malle-1 tfi th ei~en
~ 1lu is much :maller than
a· g the others,
to one .ªºd
decunal
te~ n the fi.fth is ar all z ro after roun in
212

CLASSIFICATION OF SIX QUADRATS


TABLE 5.2.VITCH'S METHOD.
BYLEFKO :_:_:_~~~~~~~~~~~--------
Data Matrix # l 4 :
A B C D E F
9 10 4 o o 1
28 25 3 1 1 o
37 39 50 40 45 46
X= 14 15 65 50 42 40
2 1 20 8 10 o
8 11 19 15 12 o
1 3 21 10 11 50
7 7 30 25 24 23
The matrix of coordmates
. a ft er PCA (the principal component seores):
A B C D E F
- 36.8 - 33.8 30.5 11.8 7.5 20.9 ;\. 1 = 676.l
0.3 O.8 -14.3 -11.5 - 7.2 31.9 ,\2 = 234.l
Y=
2.2 2.2 9.5 - 7.2 - 6.8 0.1 ;x. 3 = 32.9
-0.5 0.1 0.2 - 3.9 4.3 - 0.3 .\4 = 5.6
O.O O.O O.O O.O O.O O.O ;x. 5 = 1.6
The matrix of signs:
A B e D E F
+ + +

~)
y=
s (++ +
+ +
+ + +

Figure 5.4a shows the data swarm projected onto two-space; equiva;
lently, it is a two-dimensional PCA ordination of the data. The coordmate
of the points are given by the first two rows of Y. h
Matrix Y, in the bottom panel of the table gives the signs of 1 e
correspondmg. elements in Y, which is all the informat10n . reqmre· d for
carrying out the classification. To make the first division, consider the :~1
row of Y,. We see that A and B both have rninus signs, whereas C, D, E, Bl
F ali have plus signs. Hence the first division is into the two classes (A.ro·
and (C,
10
D, E, F). The division is shown diagrammatically in the first us den~
gram . p·igure 5.4b. That this should be the first division is l obvio
. ªso
I'

( 11)
,,

•1

,,,
(\ lt
1
•111 11
,, / /
1,

11

(h) ,'111f 11111


1 1
1

1 1
1
¡1 ,, , , ¡ l / ,, 1 ,, ,, 1 1

11111 , 111111
1
"" • 111111
1

1 1

! 11 1 1

1 1 ' 1
,, 11 1

li'ii,:11rc• c;A, < '1:1 ,• ,if1< .1111111 of .1 ·.1~ 11111111 d.lf.1 "' 1 11 ¡ l ,1f~1,1it 1 11 1111 lf111rl r 11 ¡ ,, l t/'11h111•11
~ 10 11 .il Pe/\ c11d111:1111111 cil 1111 p1111i1 •,, ( ¡,) 1111 ',I q111 ,,, , •11 ,llJí'/ 1,f ,,,, ,,.. , ,,,,,,,. ,,,íl

fio111 thL· S< ;lllt•¡ cli;i¡•1;1111 Íll f •ig1111 · 1


1/lfl •¡¡IJ1,,11 JI J', ,i ~/¡ IJ1;if Jr'rJJd' , / :111d H
hav~ 11~1·at1v 1 o()idiii;tl( 'I,, ; 111 d 111 · 111rw11111i¡ j p111111, ¡111',Ji11 1 1,1,1,¡il111:it1,, 1,,,
axis l.
'l ltc s · ·n11d d1v1:)'"'' · 1il1 1 ~w1 ,c,1, u 111 f11,, 1JJ:1d1 , 1, J1!11 / 11¡ 1.:1111111111¡~ 1 111 l. ,,f 1
1

y· u1 l1rn11a¡•,1: 1111 1· al tli1; :,< :itlt r d1:wr:w1 <>11 :J1:I', í ¡1111111 1 f1:1, : ¡1 1 1',1111 1

collrdi11:111, :111d po1111 :, <', 1 ' :111d I ~ ha .;1, "' í')ilJ ¡1 '1' rl d111:i11 ¡,, I ¡, "' '• 1111, i

~l:l'{)Jld d1vi:,1 1111 ~. plil :, l:t, :, (< , IJ , J~, J ) ' I J11 tJ111 • 1 );1•, ,• , 11'1/I 1
1 1
,t , 1; 1111
1
,<I
I I/
214 '" '' / 1,,, ¡

. A B) (C '. IJ, n), íJIJ l ,, ., ;1 , ,Ji tU(IJ lll 11. ,• 'J1¡1J d1 r111t
are, thercforc, { • ' 11 1:, ,
11
in Figure 5.4h . ¡ '1
rd divi:m 111 w1 , tr11J h l 1, 1111 11J 1 ' , 111 ,,, J 1¡11 11 ,, ', ;¡ " 1
To do l 11 • ti 11 11111

. ,· · I (Thc )'COHl<,, 111<, l/llf>J1<.,¡d ,11111 ', C'H 1 ,f¡j j l,1 11, i.d1 1t,1l, 1 1
two-d1mcnswn.1 . ,1 ''"'
, ) l•roin iow 'i uf y ti 11, : ,(.,1 n l h:it, ' ''' f h1 lh1r 1:, 11 , 1 1,
stage howcv r. , •
F
and havc posttivc '()(JJdmal 1:, :1nd f) :w 1 I', 1i:J ¡1 i1 l''lf1; _, v,1,1d111:11i.; '.
We thcrcforc separa te tl1osc.., porn b l11:il, J1 :110 11111 ' ,ni 11~11, ¡,,, tfw /1r )1
time , namc 1y, (( ') f Olll (IJ • fl)' · 'I bi: 1'1' f',; , tli lfo 1tf Ji.,,,, 11 ,v1:11
, 11 11 , ¡ 1 ' irt
5.4b.
The two two-mcmbcr cla1->hcs p 1....t>cri l :1 l U1 1,, Uwd di 11 ',1 1111 : 11 1,, f,1,1r1 '.fiht íil
the fourth division : (A , B) splítH mlo (JI) ;wd (HJ , (J >, f '. J 1f11i '> irit1 , (fJJ íJrid
(E). Th.is is clcar from row 4 <Ji' Y , but 11 1 111 11 <...íJJ 111 <, t ~1c Vl',1:1lw,;d
geometrically sincc wi:, are cor1c(.;.rncd wit11 Oi e ,,,,r,,d in: ti.,1, 1,f IJ1L. f' ,ínL <in a
fourth axis perpendicular to thc <>Uicr lhn.;c.
Observe Lhat points that hlJV0 bc<,,'>rrn; 1cparnl(;,d ;1t f1{J CMIJ •,tfJgc; 1,! th1;
classífication cannol beco me f(jUflÍt0<l at (j lrJt(.,f ',tíJf.~1;. I (JI 1....í'.(JffJplc, B llfJd e
become members of a diffcrent dat;s at lb(.; fw,t di v'J ', Jf1n r, <.. .<..<JIJ ,e tr1<.;J íHc on
opposite sides of the ccntroid ali muJ~urcd <Jl<mg íJ1I '1 J. 'f h1.; ftJ et Ü1llt thcy
are on the sarne side as mcasurnd al<mg a1c11 3 :-:ind 4 (',rx.. r<m , 1 &nd 4 <11 Y,J
is irrelevant.
In the final dendrogram) the hcjght:, <Jf thr.,, fJrf> t, '">CVJIVJ , .. , n<1dc~
(countíng from the top <lownwarchJ ;;irc pr<1p<)rti<Jn(l.J u, ht- fir&t, \ C( /)n<l , ... ,
eígenvalues. Hence the height ()fa n<Jdc ._,h<>W'> thc rclativc Jcngth <Jf thc {JXÍ ~
that was "broken" at the <livision Jorming thc n<Jd0.
There is) of course, no necd to ccmtrnue thc ~ ubdiv1\J<)n prrJce~1i right to
the end) leaving every individua] point (quadrat; J)(Jlat0Cl rn <1 <JflC~memher
clus~e~. One us~ally wishes to ~top at a ~tagc that Jea ve'> " real" clu~ters
undlVlded; for mstance) in thc examplc in f igun 5 4 onc m1ght regard
(A, B) an_d (D, E) as true, natura] el u~tcr~ an<l trcat cla~'>i~cation a~ complete
at the third stage Th 1·s · . · . . . f' , tion:
. . . · ts one of the ad vanta''C'>
6
<Jf a d1 vi &1ve class1ica
the subd1vis10n pro h h ali
"t 1 ,) cess necd not be contrn ucd bcyond the stage at w JC
ru Y separate cluste h· b · d , ntage
. h .. ' fS ave CCn ~Gpé:tn:1tcd. rf he aS' ociated d1sa va .
is t at a dec1s1on must b d d fined.
equivalentl e ma e a.~ t<J h<>w a "real" el u&ter &hall be e f
Y, a rule has to b j · uence o
subdivisions sh ]d e < cvi.se:;d fr>r dcc1drng when the seq re
ou stop Such , 1 .
several possible e .t . ·' ª ru e Jh unav<Jidably arb1trary an d there a
. d far
n ena for <lec·1d' r
enough (Pielou, 1977) Disc .. ing w 1cn ~ubdiv1)ion has een . beyond
b carne
the scope of this book. . u~~1on <;f lhesc V>-cal lecJ stopptng rules is
Noy M · t' 1•' hlio11 UJ M ·lito 1

1111
111 "''1111d ¡d .11 h · 111. ti 1.i 1101 q111l1 111 1.11rq1I!,,. A, w1U1 L0Jkc,v1tch ':)
llllllirnl 1111 d,111 u< lt1 f 01d111nkd (w1ll1 11<1 11;,d11 licw 111 dírn e.11 i-,irniaJity)
liy I' A, lit< p1111 q1,tl .ixt, 01 tlll;JJ l11ok1;11 111 Lw1¡, cm aflc.r anoLhcr ,
~1:11 1111 • w11l 1 1111 111 .1. r >Y M 11 s 1111 ,ll1qd ddJl-1 :) i11 th 0 way 111 which thc
"111 al po111t ' 1 lio.111 101 ;i 11 111 ;;iY. . lt Jt, :,() pl ;1 ·d ;1~. t<J rnak0 tlic ~ uro
ni tl11 (w1lli111 1•1q11p) v: 11ia11 '} , qj tlic; prir1<,., Ípí1I wnp11n0nl :.,i.;orc~ of thc:,
¡11<111 p. 1d p11111 I., 111 1lli 1 :.id ()1 111 · IH erik p()llll a:) :,JTl<lll a:, pof,:.,ihk.

11,X AMPLH. 1, 1 11 l:i 1,,d y 1 ala M:1t11x I~ 14 <11!,HÍn , uhJrJ'' Noy-Meir'¡.;


1
,

111 lh11d '1111 .11 p' 111 11 11 ,., 1 10 ·:-,b :ir<,., 1hown JrJ ful! in 'I íi hl c ) .1.
hu li ;1x1·, 1 , ¡ 1111 ~ • 11 111 ¡ 11111 • L<1 ti p<1:11)1hl · hr e;1Y. p11inl aloug (.;rHJ1 axí:., is
l1 \ l1 d 111 111111 ',() lli:it lli . l.orr ·(.;.t pC1ínt trJHY b0 dd<.;rmrnt;d . l«1r in :-, tancc,
Cllll id11 1111 IJl',I a,1,1 ,, arad '" . (;1->U lt of hn;akrn'' Jl IJ dWf.)511 r<JÍnl:-. J~ ancJ L.
'1111 ·,1 , 11 ,
., ,,¡
¡,,, ni 4 p()trJL (A , B, <:, and IJ) lo th i.; lcll th1 :-. br(;ak
1 ,,¡
pq1111 :111

%.~ , 31 8, ~~().,) , J 1 .8 )

f ~ 11 ?.. l .'J8
1 \ ¿y,2
1 l
CLASS CA"nor"' crr

----
3
TABLES.· D
-
(J

BY NOY-MEIR'S METHO .

First Break . . .
The seores on the first pnne1pal ax.is are
A B C D E J
Point: - 36.8 - 33.8 30.5 11.8 7.5 20.9
Seore:
Within-Group Variance Sum of
Break Right Group Variance~
Between Left Group

o 608.17 608.17
Aand B
4.50 104.31 108.81*
B and C
1445.46 46.81 1492.27
C andD
1121.98 89.78 1211.76
D andE
Eand F 883.97 o 883.97
The smallest sum is marked with an asterisk.
Malee the brealc between B and C.
Second Break
The seores on the second principal axis are
Point: A B C D E F
Score: 0.3 0.8 -14.3 -11.5 - 7.2 31.9

Within-Group Variances Sumof


Break
Between Left Group Right Group Variances

A and B O 351. 70 351.70


C and D 73.57 571.81 645.38
D and E 61.65 764.41 826.06
E and F 46.45 o 46.45*
Malee the brea}{ between E and F.
Third Break
The seores on the third principal axis are
Point: A B C D E F
Score: 2.2 2.2 9.5 -7.2 -6.8 0.1

Break Within-Group Variance sumof


Between Left Group Right Group Variances
Aand B o 48.05
48.05
e and D 17.76 34.60*
16.84
D~dE
46.85 23 .81 70.66
Make the br ak b
- e etween C and D.
:~:ponent
Table 5.2 fer the
seores y
Da . ~
ta Matnx X and the matrix of pnncip
pARTITIONING A PCA OROINATION
217

fhe seores of the n 2 = 2 points (E and F) .


to the nght of the break point are

(y5, Y6) = (7.5,20.9)

with variance

6
1 {
n2 - 1 -~Y;2- ;¡ 6 Y; )2}
1 (L = 89.78.
1-5 2 i=5

The sum· of· these two variances is 1211 ·76 · (It sh ould now be clear how ali
the entnes rn the table are computed.)
lt ~s se~n. that the smallest sum of variances is obtained when the break
on this
ºfi axis
· is· made
· between points B and C. Hence the first d.lVlSlOil
· · Of th e
elass1 cahon is ~~º. the classes (A, B) and (C, D, E, F).
The second d1v1Slon is made by breaking the second axis. In this case the
break _comes between the points E and F. Therefore, the three classes
recogruzed after the second division are (A, B), (C, D, E), and (F). Likewise,
th~ four classes a~ter the third division are (A, B), (C), (D, E), and (F). The
ultimate step, which needs no computation, is to break A from B and D
from E.

It should be noticed that the sequence of breaks is identical with that


yielded by Lefkovitch's method.
In the example, each ax.is was broken exactly once. In another version of
the method, an ax.is may be broken more than once if such a break gives a
smaller sum of variances than would the breaking of a hitherto unbroken
axis. Details are given by Noy-Meir (1973b). He also discusses applications
of the method to data ordinated by other forrns of PCA (besides unstan-
dardized centered PCA as used here). And he suggests possible stopping
rules. The method does not lead to unwanted splitting of tight clusters of
points as Lefkovitch's method sometimes does when a cluster happens to be
skewered by one of the principal axes (see Exercise 5.2).
~t is interesting to note the resemblance of the method to rninim~1:1
vanance clustering (page 32). It is not true to say'. however, th~t Noy-Me~,r. s
Partitioning method amounts to rninimum vanance clustermg done m
reverse." Thus one <loes not examine every possible division of the points
into two groups to find which gives the mínimum sum of within-group
variances; such a procedure would be impracticable because of the excessive
DIVISIVE CLASSIF
218 ICAT10N

tation required (page 204). The only divisions exa .


amount of cornpu · · llllnect
nding to breaks of the pnnc1pa1 component axes· h
are those corresp O . . ' ence
. . there are only n - 1 poss1ble break pomts on each axi '
g1ven n pomts, s.

5.4. PARTITIONING RA ANO DCA ORDINATIONS


A artitioning method devised for application to PCA or PCO ordina-
ti:~s ~an, of course, be applied to RA and DCA ordinations as well, and
vice versa.
Hill (1979b; and see Hill, Bunce, and Shaw, 1975) developed a partition-
ing procedure that was applied to RA ordinations, but there is no reason
why it should not be used with PCA and PCO ordinations. In principie, it
consists in carrying out a one-dimensional RA ordination and breaking the
axis at the centroid so as to divide the data points into two classes. Each of
the two classes is then itself split, in exactly the same way, to give a total of
four classes; then each of the four classes is split to give eight classes, and so
on.
The method is known as two-way indicator species analysis; a computer
program for doing it, called TWINSPAN, is available (Hill, 1979b) and, as
its author comments, it is "long and rather complicated." This is because, at
each step, the required one-dimensional RA ordination is first done in the
ordinary way to give a "crude" partitioning of the data points; it is then
redone (at least once and sometimes twice) with the species quantities
~eighted in such a way as to emphasize the influence of especially useful
diagnostic species (i.e., of differential, or "indicator," species) identified by
the first ordination.
These (and other) refinements are thought to make the classification more
natural by ensuring that "indifferent" species (those that are not diagnostic
of true natu~al classes) do not affect the results. However, the price of such
refinements IS the lo 0 f · · . . h d of
. . ss s1mplic1ty. Whenever a simple basic met 0
~alys~s ~s refined and elaborated, the number of possible 'modified forros of
e ongmal rnethod in the111
b . . creases exponentially and choosing among
eco~es mcreasmgly subjective.
I t IS worth conside · NSP Al'..¡
analysis in orde t d nng an example (in Figure 5.5) of a TWI d
0
One can h rh emonstrate how clearly the results can be displaye ·
ave t e best of tw 0 · (on) bY
presenting a t wor1ds (classification and ordma 1 .
wo or three-dimensional ordination of the data under invesu·
p/4 RTITIONING RA ANO DCA ORDINATIONS
219

gation, and _the~ drawing the_ partitions that yield the cJassification directly
on the ordmahon scatter. diagram.
. To complete the representation, the
classification dendrogram is g1ven as well.
Figure 5.5 shows the result of an ordination-plus-classification of vegeta-
tion. It is adapted from a figure in Marks and Harcombe (1981). The scatter
diagram shows a two-dimensional RA ordination of 54 sampJe plots repre-
senting the range of natural vegetation in the coastaJ plain of southeastern
Texas. The data matrix was also classified, using the TWINSPAN program,
and gave the classification dendrogram shown asan inset on the graph. The
four groups <?f sample plots separated in the classification were then
outlined and labeled on the ordination.

• •
• • •

........•·:•,.


C\J

Vl •
>(
<{

<{
Q:

0
p PO HP F •

RA Ax i s 1 b
TW INSPAN classification of data on t e
Figure 5 5 A two-dimens10n · al ordin ation. and ª
. asoutheastern Texas. The vege t tion classes
ak .
v~g~tation· · of 54 sample ~lots o f vegetat10n
land ine forest and wetland pme sav~alain hardwood
m . · PO pme-o

d1stinguished are: P, sandhill and :~od


a:d pine forest; F, llatland and ~:~b~
(1981). In the
forest on upper slopes; HP, hard . k t [Adapted from Marks _and !far . artitioned iolo
forest and wetlands and shrub thic ~ s. f th r and the ordinatton diagram is P
0 . . ' . . carned ur e
. is
nginaJ paper the class1ficat10n
10 classes rather than merely 4.]
220 DIVISIVE CLAss
IFICA.110~

1t should be noticed that when an .ordination and a classification are


done simultaneously, it becomes poss1b~e to represent. the classification
dendrogram m . the mo st natural way poss1ble. Thus, cons1der
. Figure · . lf
55
the only ana1ys1s o . t whích the data had been subJected had been a
e1ass1.fica t.10n m. to four classes ' the resultant dendrogram, thought of as a
mo b11e capa e of swiveling at every node, could have been
. bl . drawn in any
one of e1.ght ways·, for instance ' one of. the other .poss1ble versions ('1n
addition to the one shown in Figure 5.5) is the followmg.

HP F PO p

However, only two of the ways (that in Figure 5.5 and its ·mirror image)
show that, for example, HP is closer (more similar) than F to PO, and that
the greatest separation (dissimilarity) is between F and P. It should now be
clear that the numerous possible ways of drawing a dendrogram are not ali
equally informative. One of the merits of the TWINSPAN program is that it
arranges the dendrogram's branches in a way that puts similar points close
to each other, so far as is possible in a two-dimensional representation.
Since a TWINSPAN classification entails a new one-dirnensional RA
ordination at each step, the same result would be obtained if DCA ordina·
tions were ~sed. This is because the order of the points on the first axis is
1
?e same with ªDCA as with an RA ordination. Partitioning a two-dimen·
~10nal DCA ordination is yet another way of doing a divisive classification;
it has been proposed and demonstrated by Gauch and Whittaker (19811
wh~ g~ve the procedure the name DCASP · The partitioning is· done
subJectively p ff am
1
h h · ar rnns are drawn through parts of the scatter diagr .
w ere t e data po· t hod is
unlikel 10 m s can be seen to be sparse and therefore the met ..
Y make "fal " d · · · '' true
divisions m se ivisions. But there is a risk that sorne . n·
ay escape notice D · · o-d11ne
sional diagram · ata pomts may appear close m a tw ace
even though th . · nal sp
(for an example s F" ey are far apart in many-dunensi~ beÍJlg
rnisled by drawi~geteh ig~r~ 5.3). It may be feasible to guard against t is to
b
e Partitioned. e nummum spanrung . tree of the data swarm tha
221

EXERCISES

5..1 The following distance matrix gives the pairwise distances between
points in a swarm of 10 points in nine-space. The points are labeled
· A, B, ... , J. Find the segments of the minimum spanning tree and list
them with their lengths in the order in which they were found (as in
the center panel of Table 5.1). Draw a diagram of the mínimum
spanning tree.

A B e D E F G H I J
A o 1.88 2.33 2.26 1.74 2.93 3.30 10.73 8.83 8.57
B o 2.54 2.97 2.05 4.00 4.52 10.89 9.09 8.78
e o 3.22 1.54 4.01 4.10 11.28 9.66 9.21

D o 2.68 4.51 3.46 10.01 8.20 8.24

E o 3.84 3.56 10.54 9.04 8.64

F o 3.37 10.99 9.00 8.74

G
o 10.44 8.96 9.07

H
o 3.27 3.77
o 3.00
I
o
J
. # 14 is altered by putting x61 = x62
5.2. Refer to Table 5.2. If Data Matr~. of principal component seores
= O, it is found that the ma nx
becomes
13.2 8.4 18.4
-37.5 -34.8 32.2 32.5
-11.7 -7.7
-1.2 0.6 -12.5
-6.4 -0.3
1.8 9 .5 -7.0
2.5 -3.8 4.3 -0.3
Y= -0.7
0.4 0.2 -0.2 -0.l
o.o 0.2
-1.3 1.4
o.o -O.O o.o
O·O o.o
-00 Pl
. b Lefkovitch's method. ot
. . of these data y
Do a divisive classificati~n .
the two-dimensional ordi~at10~. of the data described in Exercise 5.'2
5.3. Carry out a divisive class1ficast1on when four classes have been d1s-
.' ethod. top
using Noy-Meu s m
tinguished.
Chapter Six

Discriminant Ordination

6.1. INTRODUCTION

The data matrices that have been described, ordinated, and classified so far
in this book have all been treated in isolation. lt has been assumed that an
investigator has only one data matrix to interpret at any one time. We now
suppose that several data matrices are to be interpreted jointly. lt is desired
to ordinate all of them together, that is, in a common coordinate frame, and
an ordination method is wanted that emphasizes as much as possible the
contrasts among them.
Here are several examples of the kinds of investigations in which joint
ordinations are helpful.
l. Suppose one were investigating the emergent vegetation (or the
benthic invertebrate fauna, or the diatom flora) of several lakes. The data
would consist of several data matrices, one from each lake. .
2. One might be sampling the insect fauna (or some taxononuc subset
of it) in wheat fields in July in several successive years. Then the data would
consist of several data matrices, one for each year · . . . al _
3 O 'gh b mparing environmental cond1t1ons m sever geo
. ne mi t e co b
w· hin ch region a num ero en f vironmen-
graphically separate regions. it ea b' 0 f "quadrats" (or other
ª
tal variables are measured in each of num. er arizing conditions in
ª
8atnpling stations) and the result is data matnx sunun
223
224 UIKRIMINANT
ORD¡NAlio~
. Then the total data consist of severa! data mat .
that reg10n. nces, on
each region. e for

I t should now be clear that situations frequently arise in hi .


. . . w ch ll .
. ble to ordinate severa! data matnces JOmt1y. The research
desu a ts
er usu
ts to kn ow whether the separate data matrices (from dill'er ªY
11
wan . . w ent lakes
ions or whatever 1t may be) differ from one another and
ye ars , reg
1

' . . . . . ' may do a


multivariate analys1s of vanance to judge, objectively, whether they d B
independently of any statistical tests that may be done, it is clearly ad~· ut
. . d. . anta.
geous to be able to see, in a two- d1mens10na1 or mat10n on a sheet of paper.
how the severa! sets of data are interrelated.
A way of achieving this is to ordinate all the data matrices jointly by
means of a discriminant ordination (Pielou, unpublished). Before the method
is described, we devote a section to necessary mathematical preliminaries.

6.2. UNSYMMETRIC SQUARE MATRICES

Ali ordination methods so far discussed in this book have entailed the
eigenanalysis of a symmetric square matrix. A discriminant ordination
requires that an unsymmetric square matrix be eigenanalysed. This section,
therefore, describes sorne of the properties of unsymmetric square matrices
and shows how they differ from symmetric matrices. In all that follows, the
symbols A and B are used for symmetric and unsymmetric square matrices,
respecti vely.

Factorization of an Unsymmetric Square Matrix

Recall that o1v thogonal


.
matnx u d d'
ª .
~ en symmetnc matrix A one can always find an or
'
an a iagonal matrix A such that
(6.1)
A= U'AU.
As always U' denotes the trans
To make the d' . pose of U. puttiJ1g
iscuss1on e1earer, we now change the sym b0 Is by
U':::::: V·, consequentiy U_
' - V'. Equation (6.1) now becomes
(6.2)
1
UNSYMMETRIC SQUARE MATRICES
225

Let us rearrange (6.2). Postmultiplying both ·d b V d · h f


.
that smce . orthogonal, V , V = 1, lt
V IS . Is s1 es y an usmg t e act
. seen that

AV= VAV'V = VAI


or, more simply,

AV= VA.
(6.3)
The columns of V ( which are the rows of U) are the eigenvectors of A
and the elements on the main diagonal of A are the eigenvalues of A'.
Indeed, (6.3) is the equation that defines the eigenvalues and eigenvectors
of A.
An exactly analogous equation, namely,

BW=WA (6.4)
defines the eigenvalues and eigenvectors of the unsymmetric matrix B. As
before, the elements on the diagonal of the diagonal matrix A are the
eigenvalues of B and the columns of W are its eigenvectors. But in this case

B =t= WAW'.

This is because W is not an orthogonal matrix; in symbols, WW' * l.


The lnverse of a Square Matrix

We now ask whether, given the matrix W, another matrix, to ~e-~e?o:ed b~


w-1, can be found such that ww - 1 =l. The answer is yes; lS now
as the in verse of W. . . every square matrix
f1ons noted m Exerc1se 6.1'
lndeed, apart from excep . f s·deration here we can
has an inverse. Excludmg t e e~c sa M which may be symmetric or
. h ept10ns rom con i ,

say that for any square .matnx, _y


ca:i be found such that
unsymmetric, another matnx, say M '

MM - 1 = M-1M =l.
der of the factors
. lied by its inverse, the or
(Note that when a matnx is ~ultip onl if Mis orthogonal, M = M ·
. . -1 ' lf

does not matter.) Moreover if, and Y '


1
Mis not orthogonal, M - =t= M '.
DISCRIMIN
226 AN1 O~D
l~A
l1r)~
.
d 'ving an equation of the form of (6 )
Hence m en W 1( W .2 frorn
. 1 both sides of (6.4) by - not ') to get (6.4)
postmultlp Y we

eww- 1 = wAw- 1
or, more simply,
B = WAw- 1
(6,)¡
The diw ·a-erence between (6.2) and (6.5) should be noted. It arise f
. s rom L
h t V whose columns are the e1genvectors of a symmetn· t11e
fact t a , e matnx .
orthogonal . In contrast W, whose columns are the eigenvectors of 11s
unsy mmetn·c matrix ' is nonorthogonal. . an

Finding the inverse of an orthogonal matnx presents no proble .


has only to write down 1ts. transpose. But find.mg the mverse
. of a nonm,thone
. . . fil~
onal matrix reqmres very labonous computat10ns. We do not describe the
steps here. Clear expositions can be found in many books, for example
Searle (1966) and Tatsuoka (1971 ). For our purposes it suffices to note tha;
usually (but see Exercise 6.1) an inverse for a square matrix can be found
and that most computers have a function for obtaining it. As we see in the
following, finding the inverse of a nonorthogonal matrix is one of the steps
in carrying out a discriminant ordination.
As an illustrative example of a matrix and its inverse, suppose

4
M = (-; 2
-1 3
Its ínverse is

1
M-1 = ( ;
-7
7 -2)
-11 .
16-10
The reader should check that MM-1 = M-1 M = l.

The Geometry of Ortho onal


and Nonortho o 1 g
g na Transformations
Let us n . · 10

transfi ow mvestígate the results of using a nonorthogonal JJlat[lt


orrn another matrix.
uNSV MMHRIC SQUARE MATRICES
227

first, recall what happens when an orthogonal matrix is used to effect a

tran~ultiplying
formation. It was shown m Chapter 3 (page 94) that the effect of
a data matrix by an orthogonal matrix is, in geometric terms,
10
pre otate the whole "data swarm" (the points defined by the transformed
r b h · · f h ·
matrix) rigídly a out t e ongm o t e coordmate frame; equivalently, one
think of the data swarm as fixed and the transformation as rotating the
can dinate frame (see Figure 3.3, page 97).
coor
Now Jet us trans f orm a data matnx· by premu1Up
· 1ymg
· 1t · by a nonorthog-
onal matrix. As an example, let the matrix that is to be transformed be

X
=(-11 11 -11 -1)
-1 '
whose co1umns give the coordinates of the comers of a square with center at
the origin (see Figure 6.la).

D
A B
--
-

____..
D - e

(a)

(e)
. (a) The original
.th a nonorthogon al matnx.
. ·r:r
TX in two dlllere nt
sforming data Wl sformation of X mto ints unaltered but
Figure 6.1. The effect of tran d (e) show tbe tran d· (e) shows th.e po 3 3)
"data swarm " a square X·' (b) an d but the polll· ts move , (Compare Figure . ·
'
\Vays: ( b) shows the axes un altere
through the ang1es shown.
lhe axes rotated, independently,
:au

. ·tru c.: t a 2 / 2 ma trix T that is to be us<:d to tran~forrn X


Wc now con s . , .. L . · L·
. ~ ., e forrn as tli e 11gltt-liand matnx rn cquat1on (3.7) (pag, c:t
it be of lh<.i SrHn e l(JJ).
hlll í ~, let us pul

(6.6)

w1.th e12 = 90ºO 1 and 021 -


J
90• º t 022 • (Hcnce
.
the

sguares. of the el e-
. eacJi ·row of T ·sum to
men ts m urnty.) But, unlike
. U m Equat1on (3 ·7) we
1

choose different val ues for 011 and 022 • Thts en sures that Tf' =f. I and
therefore T is nonorthogonaJ as required.
A a particular example, Jet

T _ ( cos 30º cos 60º) = ( 0.866 0.500 )


- cos 110º cos 20º - o.342 0.940 .

Then

TX = (-0 .37 1.37 0.37 -1.37)


1.28 0.60 -1 .28 -0 .60 .

Matrix TX is plotted in Figure 6.1 in two different ways. Figure 6.lb


shows the points plotted in an "ordinary" coordinate frame with the axes
perpendicular to each other. As may be seen, the square in Figure 6.la has
been distorted into the shape of a rhomboid, as well as being rotated;
compare it with Figure 3.3b, in which the square, though rotated, is
undistorted. In Figure 6.lc (which is comparable to Figure 3.3c), the square
is the same shape and has the same orientation as in Figure 6.la , but the
coordinate axes have been rotated. In the present case (in contrast to Figure
3.3c) the axes ha ve not been rotated as a rigid frame; instead, each axis has
been rotated separately and the axes are no longer perpendicular. The
angles between the new (y) axes and the old ( x) axes are shown in the
figure. Exercise 6.2 invites the reader to look even more closely at
the geometry of nonorthogonal transforrnations. .
Of course, not all nonorthogonal matrices are of the form of T in wb.Icbd
the elements of each . . riente
d . . row are the duect10n cosines of a new1Y 0
coor mate axis with th · whose
transpose 1s
. not 1denf
. e axes
1 ·not. mutually
. perpendicular. Any matnx .. ·
d fj01U0 11
The effects of such ica. Wlth lts mverse is nonorthogonal by .e have
matnces when used to transform other matnces
229

been illustrated in Figure 3 2b- e (


.1¡rea dy · page 93) Th
' sc arately) and alter the scales as well. A matrix like. . ey rotate the axes
(. P,,..,es· it leaves the scale of each axis unch d T m (6.6) only rotates
111e
áft , • ange .
Lastly, it should. be noticed that although ' fo r convemence
. the d.
·scussion deals w1th a data swarm in two-di . 1 ' prece mg
d 1 • • mens10na space and a 2 x 2
1rnnsformat10n
. matnx. h T, all the arguments can be e t l d
x rapo ate to as many
dirnens1ons as we w1s .

Eigenanalysis of an Unsymmetric Square Matrix

The last matter to consider in this section on mathematical preliminaries is


the eigenanalyses of unsymmetric square matrices.
The eigenvalues and eigenvectors of such a matrix, say B, are found b
solving the set of equations implicit in Equation (6.4), namely, Y

BW =WA.

If Bis an n X n matrix, then there are n eigenvalues (the elements on the


main diagonal of A) and n eigenvectors (the columns of W). There are
various ways of solving (6.4) but they are not described in this book. The
principles are fully explained and illustrated in, for example, Searle (1966)
and Tatsuoka (1971); applying them, except in artificially simple cases,
entails heavy computations. Here we merely give a simple example to
illustrate the results of such an eigenanalysis.
Suppose

B = ( ~~ ~~ -148)
-42 .
\ 41 59 -92

. BW - WA is satisfied (as the reader should confirm) by


Then th e equat10n -
putting

o
A=(~ 2
o J) and W = (-;
-1

. a·agonal of A are the eigenval-


t on the mam i
It follows that the e1emen 5 .
1 to the eigenvectors of B;
ues of B. The columns of W are proport10na
DISCIUMH4AN 1
(}fU)INA
230 11( )~~

. . still satisfied if the elements in any 1


ation is co urn
notice that the equ t rnultiple of the values shown. To n n cil w
constan . <>rrnar·
are replaced Y
.
b ª. h rnes to the same thmg, to put their ele . '~e lhc
wluc co . . rncn t~
eigenvectors or, . it is necessary to d1v1de through each . ' 1n lhl
. tion cosines, . colurn
form of duec f the sum of squares of 1ts elements. Jn th n CJI
b
h are root o e c;xarnp1
W y t e squ of W is, therefore, L,
the normalized form

0.5345 0.7428 0.8018)


-0.8018 0.3714 0.2673 .
( -0.2673 0.5571 0.5345

6.3. DISCRIMINANT ORDINATION OF SEVERAL SETS


OF DATA

N ow that the groundwork has been laid, we consider how several sets of
data may be simultaneously ordinated in a common coordinate frame in
such a way as to separate the different swarms of points as widely as
possible. The method is described here in recipe form because the underly-
ing theory is beyond the scope of this book. Theoretical accounts may be
found in, for example, Tatsuoka (1971) and Pielou (1977).
To illustrate the method, it is applied to real data. The data consist of
values of 4 climatic variables observed at 14 weather stations in 3 geo·
graphic regions. The purpose of the analysis is to ordinate the stations on
the basis of their climates.
In detail, the data are as follows. The locations of the weather stations
are shown
. . on the map m · p igure
· 6.2, and the place names appear ascolurnn
hea~mgs ~ Table 6.1. The three regions are: 1 the southern part of Yukon
Terntory m th e d' ' · the
e ana tan boreal forest · 2 northern Alberta, also 1I1 .
boreal forest b t t 1 ' '
rairi Th ~ a. a ower latitude; 3, southern Alberta in the ana
e diaD
P es. e climatic v · bl 1
The d . ana es are listed in a footnote to the tab e.
ata, which could b . . one for
each reo1 0 h e wntten out as three separate matnces, ;"
b'" n, ave thus b b watrJJ''
hereafter called X h ~en rought together as the single large t ¡be
, s own m T bl 6 . epara e
hr
t ee regions Th lim .
hr . e e atic ob
ª e .1. The dashed vertical lines 5 . ...,5)
. ID rol'l
t OUgh 6 Of the m t · Servations for each Station are shOWJl nrid z.
a nx. 1t remains
· · ws 1 a.v
to explain the elements 1.l1 ro
10~1MINAN T ORDI NATION or SlVLR Al
5
.
' f>I l)t\IA
SLIS
:n1

ALAS KA /
I
.....
)
YU KO N )

o (

'
I
(
)

N.W T
\

-,_
• I
BRITI SH I
COLUMBIA
I
• I
• I
• I
I
• I
ALBERTA /

- -- --
:i::'.egion
'
6.2. Map showing the locafions ol the 14 weather stalions lisled in Table 6.1. Stations
1 (Yukon) O; stat10ns m Region 2 (northem Alberta) • ; statioos io Region 3
(southern Alberta) ® .

These are "dummy" variables which show to which of the three regions
each of the stations belongs. As may be seen, every station has two dummy
variables associated with it, x 1 and x 2 . They are assigned as follows.

for all stations in Region 1


for all stations in Region 2
for all stations in Region 3.
TABLE6.l. DATA MATRIX GIVING THE VALUES OF 4 CLIMATIC VARIABLES
AT 14 STATIONS IN 3 REGIONS.ª·b

Region 1 Region 2 Region 3


Yukon Northern Alberta 1 Southern Alberta
1

-=
j¡I-, 1
e
= =
t:
1

.¡ ~ =
~
1
1
~
o
=
,.J = ~ 1 Q.l
~(J
a
Oll ~
~
rl)
rl) e e
=
(J
rl) ;j
=
~
1
o ~
·e:
Q.l
e
·o
e
o

- ~
o ofil ~
1
e
~
o e e
rl)

~ =
j¡I-,
~
,,Q
= ofil
t:o t:o =
,,Q
1
~ ~ t:o :e
u Q ~ ~
~
< ~ ~ ~
=
~
1
~
1
o
~ ~ ~
Q.l
~
~
~

Xl 1 1 1 1 o o o o o 1
1
o o o o o
X2 o o o o 1 1 1 1 1 1
1 o o o o o
1
X3 -18.9 -29.4 -25.0 -20.0 -17.2 -12.8 -25.0 -22.8 -18.9 1 -11 .1 -8.9 -8.3 - 11.1 -8.3
1
X4 12.8 15.6 14.4 14.4 15.6 15.6 16.7 16.1 15.6 1 20.0 17.8 18.3 20.6 18.9
1
X5 11.5 13.8 10.1 17.2 14.7 13.1 11.5 14.8 10.3 1 1.1 11.9 11.5 9.8 12.9
1

x6 11.3 18.3 18.4 23.0 31.9 34.2 20.4 30.1 31.8 1 29.1 26.2 26.6 22.8 26.2
1

ªData from "Climatic Summaries for Selected Meteorological Stations in the Dominion of Canada, Volume I." Meteorological Division,
Department of Transport, Canada, Toronto, 1948.
b x 1 and x 2 are dummy variables; see text; x 3 and x 4 are daily mean temperatures in degrees C in January and July, re pectively: x 5
a.nd x 6 are precipitation in cm for October to March and April to September, respective1y.
isCRIMINANT ORDINATION OF SEVERAL
O SETS OF DATA
233

In. general, if there were k reg·ions, k _ d .


1
equired to label all the stations Th ummy vanables would b
r · ey would be e

(l,O, ... ,o) forall st ations


· in Region 1
\º.' ~ ~ ..' :'. ~) for all stations in Region 2
( 0 , 0 , .. ., 1)· · ·f~; ~il ~t~~i~~~ in. R.~~~~ ·k :__ ·i
(O' O' ... ' O) for all stations in Region k'

with k - 1 elements in each vector.


Thus,
. when there are n stations altogether grouped mto
. .
k reg10ns and s
vanables are observed at each station ' X h as s + k - 1 rows 'and n
columns.
. In the present case with s = 4 , k = 3, and n -- 14, x 1s
· a (6 X 14)
matnx.
The operations to be carried out on matrix X are now described in
numbered paragraphs.

l. Center and standardize the data as described earlier. That is, replace
the (i, j)th element of X, x; 1 , by (xiJ - x¡)/a; where X; and a; are the mean
and standard deviation of all the elements in the ith row. (Note: it makes
no difference to the result whether the dummy variables are standardized. In
the computations shown in Table 6.2 they are standardized. Every row must
be centered.)
2. Postmultiply the matrix by its transpose to obtain the SSCP matrix
S. Sis shown in Table 6.2.
3. Partition S into four submatrices Sw Sw Sw and S22 as shown by
the dashed lines. The four parts into which S has been divided are as
foil . S · (k _ 1) x (k - 1) = 2 x 2 matri.x giving the sums of
OWS. 11 lS a · ummy variables x 1 and x 2 ; S22 1s
·
squares and cross-products o f t h e two d f
th · · · the sums of squares and cross-products o
es X s = 4 x 4 matnx givmg . S is a (k _ 1) x s = 2 x 4
the ~bserved variables X3, X4, X5, and x~, ro~ucts formed by multiplying
matnx whose elements are all sums of cross P d · bl . s = S'
one of the dummy variables by one of the observe vana es, 21 12.

(See Exercise 6.3.) l 5-1


11
and s-221 . They are
4. Obtain the inverses of Sn and Sw name y,
Written out in full in Table 6.2.
TABLE 6.2. STEPS IN THE DISCRIMINANT ORDINATION OF THE DATA IN TABLE 6.1.
The SSCP matrix Sis
14.00 - 6.60 1 - 8.34 - 9.41 3.54 -10.44
-~~j~ __ }~~~J--~~~~--=--~~~---~~~---JY~
1
- 8.34 - 3.66 1 14.00 9.05 - 4.22 5.89
- 9.41 - 3.28 1 9.05 14.00 - 7.20 4.78
3.54 3.38 1 - 4.22 - 7.20 14.00 - 0.41
-10.44 7.88 1 5.89 4.78 -0.41 14.00
The inverses of S 11 and S 22 are
0.13303 -0.07552 0.00038
0.04329)· -0.03014)
11 =(º·º9184
s-1 s-1 _
22
-0.01552 0.15650 0.05717 -0.02004
0.04329 0.09184 ' - 0.00038 0.05717
The product matrix D is
r -0.03014 -0.02004
0.10047
-0.01677
-0.01677 .
0.09046

0.40734 0.41975 -0.25893


0.53979 -0.07804)
-1 -1 0.57768 -0.29694 0.21171
D = S22 S21 S11 S12 = - 0.00263

The eigenvalues of D are


r -0.05904
-0.00195
-0.02255
0.00330
0.11999
0.01165 .
0.57397

;\ 1 = O. 96396 and ;\ 2 = 0.59832.


The first two eigenvectors of D (normalized) are the rows of
W' -(-0.61003 -0.78007 0.00494 0.13902)
2
( ) - -0.35471 0.02458 0.01981 0.93444 .
oisCRIMINANT ORDINATION OF SEVERAL SETS OF DATA

235
s. Find the matrix product D defined as

n -- s-22 1s21 s-111s12 .


Dis shown in Table 6.2.
6. Do an eigenanalysis of the unsymmetric square matrix D. The results
are shown in Table 6.2. The number of nonzero eigenvalues is always the
Jesser of s (the number of variables of interest) and k - 1 (where k is the
number of groups of data). Hence in the example in which s = 4 and
k - 1 = 2 there are only two nonzero eigenvalues ,\ and ,\ , and they are
1 2
shown in the table. Only the two eigenvectors corresponding to the two
Jargest eigenvalues of D are required for a two-dimensional ordination. (In
the present case, of course, the two largest eigenvalues are the only nonzero
eigenvalues.) These eigenvectors are shown as the rows of the 2 X s = 2 X 4
matrix Wá). .
7. The required coordinates for the data pomts are given by the
columns of the 2 X n = 2 X 14 matrix Y, defined as

in which X 4 is the s X n = 4 X 14· matrix obtained by deleting. the


k - 1 = 2 rows
( ) of dummy vana· bles from the centered and standardized
data matrix.

The ordination is shown m . Figure


. 6·3ª· For comparison, a PCA ordina-
· of the same d ata is
hon P1ven in Figure .6.3b.
· o-- 111erent'1-
ch more clearly d'"'
As may be seen, the three sets of pomts arPeCmAu The only outlier in the
d' · than by · 1
ated by discriminan! or mat10n Alb t station that seems to be ong
discrirninant ordination is the northern h er ~h its own group. This outlier
With the Yukon group of sta t10 . n s more
. t an
bl Wl it has the col dest wm
· t ers
. b seen m Ta e 6.1'
IS Fort Chipewyan; as may e Alberta group. . . \
anct driest summers of the northem a· tion is that it penmts the dif-
The advantage of diseriminant or dma·th maximum e1an·tY· The process
ferences among data sets to be displayed w1 es) for several batches of data
fi
nds new coordinates (i.e., trans formeh batch
seor shall be as compact ' and as
p. s that eac
ºlllts in a way that ensure h s as possible.
'Nidely separated from the other bate e '
236

(a)
••••
®
o

(\J
®®
® o
(f)
- ® o
X
<l

AXIS

(b)
• •
• o
(\J ®
®®

(f)
-
X
<I: o
• o
o
®

AXIS 1
Fim•re 6 3 · (a) a·1scnmman
Two ordinations of the 14 weather stat10ns: · · t ordinatiw(b) ' s
PCA
'to~ • • (centered and standardized). The symbols for the three reg1ons
lil.

ordination · are the same a


Figure 6.2.

The number of ways that now exist for classifymg · and ordinating
ecological data is already large. No doubt the invent10n . of new. ' more
has
ingenious techniques will continue. It could be argued that the t!Dl; lf
come for calling a halt to the endless proliferation of new _metho ~le
ecologists are ever to lit their individual contributions together mto 5 gd ª
· · knowledge, it seems desirable that few gootly
um·11 ed body of sc1en1Jfic ª .
methods of data ana!ysis should be adopted widely and used consisten
and that unproven methods should be consigned to the scrap-heap. re
At . h a mo
tractJve t ough this argument may be, it collapses before ret·
persuas1ve counterargument. This is that the development of data interp
¡XI t< ISI S

ing ml'lhods is an int.,


cgr"l. 1 part f . 237
11nd s1lOll 1l t not come t . 11 ° sc1cntific
w1·11• 1o· ' fo11 ow d by nnt ,ª . alt: Ev ery 1mpr
· pr gress. and ' as such will
vei
e e 1ing un nent in e ' not

1
w11on, and 1f th interp . pr vements ¡11 t 1 . omputer capabilif
. d rnttlt1on f ec 111iques f 1es
un mm s of ecol gists th ec 1ogical data . . or data interpre-
J
1\l g ncral principie llt;d ?- musl f amiliarize Lh ts lo remain in the hands
er ymg meth ds f d· emselves thoroughly w·th 1
ata handling.

EXERCISES

6.1. :h following three 3 X 3


mv r e . matrices H 1, H 2, and H3 do not have

H,= n fr 1
2
3
H2 = ( ~
-1 -2
2
o -i);
-3

H, = (
-3
~
1
2
-6 -n
Use each in turn to multiply the 3 X 8 matrix X (which
cube) where represents a

Examine the products H 1 X, H 2 X, and H 3 X and determine the


dimensionality of the figure into which tbe cube has been transformed
in each case. What relationship does this suggest between matrices
that cannot be inverted and the transformations that such matrices

bring about?
6·2. Construct a diagram like that in Figure 3.4 (page 98) but with the Y1
and rraxes not perpendicular to each other. Let the angles between
the o1d an d ew axes /J , /J , 821 , an d 822 be defined as on page 100.
11 11 12
Denve equations analogous to (3.4a) and (3.4b) on page 99.
DISCRIMINANT
238
O~DINl\110~
e x 3 matrix X is partitioned as shown 1"\t
6.3. Suppose th 6 1~' o tWQ
. denoted by X1 and X2.
matnces SUb-

Xu X12 X13
Xz¡ Xzz X23
---------
X= X31
X41
X5¡
X32

X42
X52
X33
X43
X53
(~:)
X61 X62 x63

Write out t~~ _6_x -6 SSCP matrix XX' in full, and show how it can be
partitioned-into four submatrices that are identical with those in the
product

(Keep track of the sizes of the various submatrices and their products.)
Note that the multiplication of partitioned matrices is carried out
according to the same rules as ordinary matrix multiplication except
that submatrices take the place of individual elements.
Answers to Exercises

CHAPTER 2

.t. (a) 10.15; (b) 8.49; (e) 10.10.


2.2.

"Farthest Distance Between


Step Fusion Points" Clusters

1 1, 3 1, 3 4.58
2 [1, 3], 2 2, 3 8.49
3 4, 5 4, 5 9.17

2.3. The coordinates of the centroids are:

Cluster [1, 2] Cluster [3, 4, 5]

5.5 0.333

o 1.667

2 2.667

-3 1.667

them is 7.190. )
The distance between lt is independent of P·
2.4. d1([P], [M, N]) = 554.125. (Note: The resu
2.5. 173.2.
239
240

. (b) 1.297; (e) 1.297 radians = 74.3º .


2.6. (a) 0.2 705 ' 3 S _ 6. ( ) J _
i
2. 7. (a) J = 3,
s = i. (b) J = 4, - 7, e - S = 1.
12 '
Proof that S ~ :

S = 2a/(2a + b + e) = 2a/(2a + f) on putting b + e : : : f;

J=a/(a+f).
S/l = (2a + 2/)/(2a + f) > 1 when f > O or = 1 wh
en f:::: O.

CHAPTER 3

3.1.

11
- 1 4 3)· (b) - 7
3 o 3 ' (
12
(e) cannot be
formed;

o
(d) 1
23
~ -2
4

10
- ~).
- 3 '
- 1
(e) cannot be formed;

(- 6819 22 3 o 3
(f) - 14 -8)
- 3 (g)
[ 14
30 1 8 9
78 24 - 10 20 7 -4 3
124 13 20 33
[Note: To form the r0 d
BC by A B P uct BCA, for example one may postmultiply
or by CA.] '
3.2.

-1).
3 ' u2 = ( -1 -1).
- 2 2 '
3.3. U2 is Orthogonal. U .
3.4. L ' i is not
XetXX'=A.Thena .. ist~ of
11
. and the Jth col e sum of cross-products of the ith row
is the sum of eros ~mn of X' (which is the jth row of X). Likewise, ap
X' ( hi . s Product 0 f . ...,11 of
w ch is the ·th s the J th row of X and the i th colu11u·
z row of X) H .
· ence a l.j. = aji.. for all i, J.
241

.5.

( _¿.5 -0.5)
1 .
5
.6. The eigenvalues of A are 2 5 = 32 and 3s = Thi
243 · s follows from:
As = (U'AU)(U1\.U)(U'AU)(U1\.U)(U'AU)

= U'A(UU')A(UU')A(UU')A(UU')AU
= U'A5U since UU' =l.
5
Hence the eigenvalues of A are the eigenvalues of A raised to the fifth
power.

( 0.6733 0.5858 0.4242 -0.1347 -0.0741 ).


.8. .\ 2 = 2.55 [from tr(S) = L¡AJ
.10. .\ 2 is the same for G and F. Hence .\ 2 = 45.285. The second eigenvec-
tor of F is

Then the second eigenvector of G, namely, v{, is proportional to u'2X.


Hence v{ = ( 0.61 -0.78 -0.14 ).

CHAPTER 4

U. The covariance matrix is (l/n )R with n = 8. Hence Ai = 36; A.2 = 25;


.\ 3 = 16.
U.
is the sum of squares of the distances { x-coordinate framt
Tr(XX')} of ali points in the swarm from the y-coordinate framt:
Tr(YY ') origin of the
. ·d e at the centroid
d. ate frames comc1 .
Since the orio1ns of the two coor m hi 1·1 follows easily that
er , _ (YY'). [From t s
of the swarm, tr(XX ) - tr
(l/n)tr(XX') = L¡A¡; see page 126.]
242

2 X 2 correlation matrix be
4.3. Let the

p = (~
Let the matrix of eigenvectors be

U= ( C?S0 sinO)
- smO coso .

Then since UPU' = A, it follows that the (1, 2)th element of UPU' is
O. That is, p(cos20 - sin28) = O. Hence

cos O= ±sin O; o= 45°.


Therefore the eigenvectors are:

( cos O sin O) = ( 0.7071 0.7071)


and

( - sin O cos O) = ( -0.7071 0.7071 ).

4.4. (a) 10.6º; (b) 25.0º; (e) 55.9º; (d) 45º.


4.5. 0.816.
4.7. The ordinated data form a triangle the lengths of whose sides are:

d(A,B)=4; d(B,C)=3; d(A, C) = 7.

4.9. The following equations are numbered to correspond with those in tbe
text, except that primes have been added here.

w = c-1x,R-1xw (4.22')

c112w = c-1;2(X'R-1X)(c -112c112)w (4.24')

= (c- 112x,R-1;2 )(R-1;2 xc-1;2 )(c112w) (4.25')


= Q( c112w). (4.26')
243
Therefore.

'Cl 1
" - = W'Cl/ 2Q
whence (4.27')

wc1, 2 a: u
Q ( 4.28')
where UQ is the n
n matrix of e1genvectors of Q Th f
· ere ore,
w a: uQc-112.
( 4.29')
uo.
-12.99 -4.33 4.33 12.99
o o o o
Y= o o o o
o o o o
o o o o
Ibis follows from the fact that the data points líe on a straight line in
5se-space (i.e .. they are confined to a one-dimensional subspace of the
5xe-space ). The PCA places the first principal aus on this line;
therefore. the coordinates of the points can be found by determining
the distances separating them. An eigenanalysis of the covariance
matrix would show that it has only one nonzero eigenvalue. The
number of nonzero eigenvalues of a covariance matrix, which is equal
to the number of dimensions of the subspace in which the data points
lie, is known as the rank of the covariance matrix. In this example, the
rank is l.

CHAPTER 5
5.1. The segments of the minimum spanning tree are:

    1: d(C, E) = 1.54;   2: d(A, E) = 1.74;   3: d(B, A) = 1.88;
    4: d(D, A) = 2.26;   5: d(F, A) = 2.93;   6: d(G, A) = 3.30;
    7: d(I, D) = 8.20;   8: d(J, I) = 3.00;   9: d(H, I) = 3.27.
[Figure: upper panel, one diagrammatic representation of the minimum spanning tree; lower panel, the two-dimensional ordination of the data with the minimum spanning tree superimposed.]

One of the many possible diagrammatic representations of the minimum spanning tree is shown in the figure (upper panel). Compare it with the two-dimensional ordination of the data, with the minimum spanning tree superimposed, in the lower panel. The lower panel is a reproduction of Figure 5.3 (page 210) with the points labeled to match. The distance matrix in the exercise was taken from Gower and Ross (1969).
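For readers who want to reproduce such a tree on their own data, a minimum spanning tree can be obtained from a distance matrix with scipy; the 5 × 5 distance matrix below is hypothetical, not the Gower and Ross matrix used in the exercise:

    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree

    D = np.array([[0.0, 2.0, 9.0, 7.0, 8.0],
                  [2.0, 0.0, 6.0, 3.0, 5.0],
                  [9.0, 6.0, 0.0, 4.0, 7.0],
                  [7.0, 3.0, 4.0, 0.0, 6.0],
                  [8.0, 5.0, 7.0, 6.0, 0.0]])      # symmetric distances among 5 points

    tree = minimum_spanning_tree(D)                # sparse matrix holding the tree's segments
    rows, cols = tree.nonzero()
    for i, j in zip(rows, cols):
        print(f"segment {i}-{j}: length {tree[i, j]:.2f}")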
5.2. Successive divisions give the following classes:

1: (A, B) (C, D, E, F);

2: (A) (B) (C, D, E) (F);

3: (A) (B) (C) (D, E) (F);

4: (A) (B) (C) (D) (E) (F).
Notice that A and B are separated at the second step even though they still form a close pair in the ordination pattern, which is almost indistinguishable from the pattern in Figure 5.4a. This "unnatural" separation occurs because the second principal axis passes between A and B.
5.3. Successive divisions give the following classes:

1: (A, B) (C, D, E, F);

2: (A, B) (C, D, E) (F);

3: (A, B) (C) (D, E) (F).

Observe that the close pair (A, B) has not been divided.

CHAPTER 6

6.1. The points of H 1 X form a straight line, hence form a figure of one
dimension. The points of H 2 X (and also of H 3 X) are confined to a
plane, and hence form a figure of two dimensions. This leads to the
conjecture that a matrix can be inverted only if it brings about no
reduction in the dimensionality of a swarm of points when it is used to
transform the swarm. Proof of the correctness of this conjecture, and
methods for judging whether a given matrix can be inverted, are
beyond the scope of this book. See, for example, Searle (1966),
Chapter 4, or Tatsuoka (1971), Chapter 5. Matrices that can, and cannot, be inverted are called, respectively, "nonsingular" and "singular."
6.2. The equations are

    y1 = x1 cos θ11 + x2 cos θ12

    y2 = x1 cos θ21 + x2 cos θ22.
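In matrix form these are y = Cx, where C holds the direction cosines of the new axes relative to the old. A small numpy sketch for an ordinary rigid rotation through a hypothetical angle of 30°, so that θ11 = θ22 = 30°, θ12 = 60°, and θ21 = 120°:

    import numpy as np

    theta = np.radians(30.0)
    C = np.array([[np.cos(theta),              np.cos(np.pi / 2 - theta)],
                  [np.cos(np.pi / 2 + theta),  np.cos(theta)]])   # matrix of direction cosines

    x = np.array([2.0, 1.0])          # a point in the old coordinate frame
    y = C @ x                         # its coordinates in the rotated frame
    print(np.round(y, 4))
    print(np.round(C @ C.T, 4))       # identity matrix: the rotation is rigid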


Glossary

(Words in italics are defined elsewhere in the glossary.)

Agglomerative classification. Same as clustering.

Arch effect. The appearance of a projected data swarm as a curve ("arch") when the data were obtained from sampling units ranged along a one-dimensional gradient.

Asymmetry, coefficient of. A measure of the degree to which an ordination axis approaches unipolarity (see unipolar axis).

Average distance between two clusters. The arithmetic average of all the distances between a point in one cluster and a point in the other.

Average linkage clustering. Collective term for all clustering methods in which the distance between two clusters depends on the locations of all points in both clusters. Contrast nearest-neighbor and farthest-neighbor clustering.

Average linkage clustering criterion. The dissimilarity between clusters [P] and [Q], where [Q] is formed by the fusion of clusters [M] and [N]. Four measures of this dissimilarity are:
Centroid distance, from the centroid of [P] to the centroid of [Q].
Median distance, from the centroid of [P] to the midpoint of the line joining the centroids of [M] and [N].
Unweighted average distance, same as the average distance between [P] and [Q].
Weighted average distance, the average of the average distance between [P] and [M] and the average distance between [P] and [N].

Between-axes heterogeneity. The heterogeneity exhibited by a data swarm consisting of two or more subswarms confined (or almost confined) to different subspaces of the total space containing the whole swarm. Contrast within-axes heterogeneity.

Binary data. Data consisting entirely of zeros and ones: x_ij = 1 if species i is present in quadrat j, and 0 otherwise.

Bipolar axis. An ordination axis on which the data points have some positive and some negative scores. Contrast unipolar axis.
Catenation. An ordination designed to show, as clearly as possible, the structure of nonlinear data.

Centered data. Data in which the observations are expressed as deviations from their mean value. Hence their sum is zero.
Centroid. The center of gravity (or "average point") of a swarm (or
cluster) of points in a space of any number of dimensions. The
coordinate of the centroid on each axis is the mean of the coordinates
on that axis of all the points in the swarm (or cluster).
Centroid clustering. A clustering technique in which the distance (or
dissimilarity) between two clusters is set equal to the distance (or
dissimilarity) between their centroids.
Centroid distance. See average linkage clustering criterion.
Cbaining. In a clustering process, the tendency for one cluster to grow by
the repeated addition, one at a time, of single points.
Characteristic values or roots. Same as eigenvalues. See eigenanalysis.
Characteristic vector. Same as eigenvector: See eigenanalysis.
Chord distance. The shortest (straight line) distance between two points on
the same circle, sphere, or hypersphere.
City-block distance, CD. The distance between two points, in a coordinate frame of any number of dimensions, measured as the sum of segments parallel with the axes. For points j and k,

    CD = Σ_{i=1}^{s} |x_ij - x_ik|.
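A minimal Python sketch of this formula (the two quadrat vectors below are hypothetical):

    import numpy as np

    def city_block_distance(xj, xk):
        # Sum of absolute coordinate differences over all s species.
        xj, xk = np.asarray(xj, dtype=float), np.asarray(xk, dtype=float)
        return float(np.sum(np.abs(xj - xk)))

    print(city_block_distance([3, 0, 5], [1, 2, 5]))   # |3-1| + |0-2| + |5-5| = 4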
Clustering. The process of classifying data points by combining similar points to form small classes, then combining small classes into larger classes, and so on. Same as agglomerative classification. Contrast divisive classification.

Column vector. A matrix with only one column. Equivalently, a vector written as a vertical column.

Complete-linkage clustering. Same as farthest-neighbor clustering.

Combinatorial clustering methods. Those in which each successive distance matrix can be constructed from the preceding distance matrix; the raw data are needed only to construct the first distance matrix.
Correlation coefficient. A standardized form of covariance, obtained by dividing the covariance of two variables, say x and y, by the product of the standard deviations of x and y. Its value always lies in [-1, 1]. It measures the degree to which x and y are related.

Correlation matrix. A symmetric matrix in which the (h, i)th element, when h ≠ i, is the correlation coefficient between the hth and ith variables. All the elements on the main diagonal (top left to bottom right) are 1.

Covariance. When two variables, say x and y, are measured on each of a number of sampling units, their covariance is the mean of the cross-products of the centered data. The ith cross-product is (x_i - x̄)(y_i - ȳ), where x̄ and ȳ are the means of the xs and ys.

Covariance matrix. A symmetric matrix in which the (i, i)th element is the variance of the ith variable, and the (h, i)th element is the covariance of the hth and ith variables.
Czekanowski's Index of Similarity. Same as percentage similarity.
Data matrix. A numerical table in which each column lists the observations on one sampling unit (or quadrat) and each row lists the values of one of the observed variables in all quadrats.

Data point. A geometric representation in multidimensional space of one column of a data matrix.

Data swarm. The set of all data points, which usually occupies a space of many dimensions.

Dendrogram. A diagram showing the hierarchical relationships produced by a hierarchical classification.

Diagonal matrix. A square matrix in which all elements except those on the main diagonal (top left to bottom right) are zero.
Direction cosines. The cosines of the angles made by any line through the origin of a coordinate frame and the axes of the frame. Equivalently, the lengths of the projections onto the axes of a unit segment of the line.

Distance matrix. A matrix showing the distance from each point to every other point in a data swarm.

Divisive classification. The process of classifying data points by first dividing the whole swarm of points into classes, then redividing some or all of the classes into subclasses, and so on. Contrast clustering.
Eigenanalysis. The process of finding the eigenvalue-eigenvector pairs of a square matrix A. The eigenvalues are the elements of the diagonal matrix Λ and the eigenvectors are the rows of U (equivalently, the columns of U'), where A = U'ΛU.

Eigenvalue-eigenvector pair of a matrix A. An eigenvalue (a scalar number λ) and an eigenvector (a row vector u') related by the equation u'A = λu'. If A is an n × n matrix, there are n such pairs. See also eigenanalysis.
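As an illustration of these conventions, a minimal numpy sketch with a hypothetical symmetric matrix; note that numpy returns eigenvectors as the columns of its output, so they are transposed here to give the rows of U:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])                  # hypothetical symmetric matrix
    evals, V = np.linalg.eigh(A)
    U = V.T                                     # rows of U are the eigenvectors
    Lam = np.diag(evals)
    print(np.round(U.T @ Lam @ U, 6))           # reconstructs A, i.e. A = U' Lambda U
    print(np.round(U[0] @ A - evals[0] * U[0], 6))   # u'A = lambda u' (approximately zero)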
Element of a matrix. One of the individual numbers composing a matrix. The (i, j)th element is the number in the ith row and the jth column of the matrix.

Euclidean distance. The distance between two points in the ordinary sense in one, two, or three dimensions, or the conceptual analogue of distance in spaces of more than three dimensions.

Farthest-neighbor clustering. Clustering in which the distance (dissimilarity) between two clusters is taken to be the longest distance between a pair of points with one member of the pair in each cluster. Contrast nearest-neighbor clustering.
Geodesic metric. The great circle distance (shortest over-the-surface distance) between two points on a sphere or hypersphere.

Group-average clustering methods. Clustering methods that use the unweighted or weighted average distance as measures of the dissimilarity of a pair of clusters.

Hierarchical classification. A classification in which the classes are ranked. Every individual belongs to a class, and every class to a higher-ranking class, up to the highest class, which is the totality of all individuals.

Horseshoe effect. Same as arch effect.

Identity matrix. A square matrix in which all the elements on the main diagonal (top left to bottom right) are ones and all other elements are zeros.

Internode. See node.

Inverse of a square matrix. The inverse of an n × n matrix X is the n × n matrix X^(-1) such that XX^(-1) = X^(-1)X = I. (I is the n × n identity matrix.)

Jaccard's Index of Similarity between two quadrats. The ratio a/(a + f), where a is the number of species common to both quadrats and f is the number of species present in one or other (but not both) of the quadrats. Compare Sørensen's Index.
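Both indices are easily computed from presence-absence records; a minimal Python sketch with hypothetical data:

    import numpy as np

    def jaccard(x, y):
        # x, y: binary presence/absence vectors for two quadrats.
        x, y = np.asarray(x, dtype=bool), np.asarray(y, dtype=bool)
        a = int(np.sum(x & y))        # species common to both quadrats
        f = int(np.sum(x ^ y))        # species present in one quadrat but not both
        return a / (a + f)

    def sorensen(x, y):
        x, y = np.asarray(x, dtype=bool), np.asarray(y, dtype=bool)
        a = int(np.sum(x & y))
        f = int(np.sum(x ^ y))
        return 2 * a / (2 * a + f)

    x = [1, 1, 0, 1, 0]               # hypothetical presence/absence records
    y = [1, 0, 0, 1, 1]
    print(jaccard(x, y), sorensen(x, y))   # 0.5 and 0.666...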
Latent values or roots. Same as eigenvalues. See eigenvalue-eigenvector
pair.
Latent vector. Same as eigenvector. See eigenvalue-eigenvector pair.
Linear data. A data swarm is (approximately) linear if its projection onto any two-dimensional space, however oriented, gives a two-dimensional swarm whose long axis is (approximately) a straight line. If any projection yields a (projected) swarm with a curved axis, then the data are nonlinear.

Linear transformation. A transformation of one set of points into another done by defining the coordinates of the transformed points as linear functions of their coordinates before transformation. The original coordinates appear only in the first degree (i.e., they are never squared or raised to a higher power).
Manhattan metric. Same as city-block distance.
Marczewski-Steinhaus distance, MS. A measure of the dissimilarity of two quadrats, the complement of Jaccard's Index of Similarity. MS = f/(a + f), where f is the number of species present in only one (not both) of the quadrats, and a is the number of species common to both of them.

Matrix. A two-dimensional array of numbers. The meaning of each number (or element) depends on its position in the matrix, that is, on the row and the column in which it occurs.

Matrix multiplication. The formation of the product AB of two matrices A and B. The (i, j)th element of AB is the sum of the pairwise products of the elements in the ith row of A and the jth column of B. Hence AB exists only if the number of columns in A is equal to the number of rows in B; AB has the same number of rows as A and the same number of columns as B. AB is A postmultiplied by B or, equivalently, B premultiplied by A. In general, AB ≠ BA.
Median clustering method. A clustering method that uses the median distance between clusters as a dissimilarity measure.

Median distance. See average-linkage clustering criterion.
Metric measures of dissimilarity. Measures th at, like distance, satisfy the
triangle inequality axiom.
Minimum spanning tree. The shortest spanning tree that can be constructed in a given swarm of points.

Minimum variance clustering. Clustering in which the two clusters united at each step are those whose fusion brings about the smallest possible increase in within-cluster dispersion.

Monotonic clustering methods. Methods in which the occurrence of reversals is impossible.

Nearest-neighbor clustering. Clustering in which the distance (dissimilarity) between two clusters is taken to be the shortest distance between a pair of points with one member of the pair in each cluster. Contrast farthest-neighbor clustering.

Nodes and internodes. The parts of a dendrogram. The nodes are the horizontal lines linking classes of equal rank. The internodes are the vertical lines linking each class to the classes above and below it in rank.
Nonlinear data. See linear data.

Normalized data. The coordinates of a data point or the elements of a


vector rescaled so that their squares sum to unity.
Ordination. The ordering of a set of data points with respect to one or more axes. Alternatively, the displaying of a swarm of data points in a two- or three-dimensional coordinate frame so as to make the relationships among the points in many-dimensional space visible on inspection.

Ordination-space partitioning. The placing of partitions in an ordinated swarm of data points in order to separate the points into groups or classes. The result is a divisive classification.
Orthogonal matrix. A square matrix that, when used as a transformation matrix, causes a rigid rotation of the data swarm around the origin of the coordinate frame without any change of scale. The product of an orthogonal matrix and its transpose is the identity matrix.

Partitioned matrix. A matrix that has been subdivided into submatrices by placing one or more horizontal (between-row) partitions and/or one or more vertical (between-column) partitions so as to divide the matrix into rectangular blocks.

percentage difference. Same as percentage dissimilarity.


percentage dissimilarity, PD. The complement of percentage similarity PS.
PD = 100 - PS.
Percentage distance. Same as percentage dissimilarity.
Percentage remoteness, PR [new term]. The complement of Ruzicka's Index of Similarity RI. PR = 100 - RI.
Percentage similarity. The percentage similarity of quadrats j and k is

    PS = 200 Σ_{i=1}^{s} min(x_ij, x_ik) / Σ_{i=1}^{s} (x_ij + x_ik)   percent,

where x_ij and x_ik are the quantities of species i in quadrats j and k, and min(x_ij, x_ik) is the lesser of the two quantities.

Pool [in this book]. A batch of replicate quadrats from a homogeneous population. Differences among them are due only to chance.
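A minimal Python sketch of the percentage similarity calculation, assuming (as written above) that both the minima and the totals are summed over all species before the ratio is taken; the species quantities are hypothetical:

    import numpy as np

    def percentage_similarity(xj, xk):
        xj, xk = np.asarray(xj, dtype=float), np.asarray(xk, dtype=float)
        return 200.0 * np.sum(np.minimum(xj, xk)) / np.sum(xj + xk)

    print(round(percentage_similarity([10, 0, 5], [6, 2, 5]), 2))   # 200 * 11 / 28 = 78.57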
Postmultiply. See matrix multiplication.

Premultiply. See matrix multiplication.

Principal axes. The new coordinate axes for a swarm of data points, obtained by doing a principal component analysis of the data. Each axis represents a principal component of the data.

Principal components. New variables derived by a principal component analysis to describe a body of data. Each is a weighted sum of the "raw" (as originally measured) variables, or of the centered and/or standardized variables.

Principal component score. The value of a principal component for an individual point. Hence the coordinate of the point on the corresponding principal axis.
Q-type ordination. An ordination of species. The data points represent species and the coordinate axes (before ordination) represent quadrats. The (j, k)th element of the covariance matrix analyzed is the covariance of the quantities (of all species) in quadrats j and k. Contrast R-type ordination.

Quadrat. In this book, an ecological sampling unit of any kind.
R-type ordination. An ordination of quadrats; the more usual kind of ordination. The data points represent quadrats and the coordinate axes (before ordination) represent species. The (h, i)th element of the covariance matrix analyzed is the covariance of the quantities (in all quadrats) of species h and i. Contrast Q-type ordination.

Rank of a covariance matrix. The number of its nonzero eigenvalues. Equivalently, the number of dimensions of the space in which the data points lie.
Residual matrix. The rth residual matrix of the square symmetric matrix A is

    A_r = A - λ_1 u_1 u_1' - λ_2 u_2 u_2' - ... - λ_r u_r u_r',

where λ_i and u_i are the ith eigenvalue and eigenvector of A.

Reversal (in clustering). A reversal occurs when a fusion made late in a clustering process unites clusters that are closer together than were the clusters joined at an earlier fusion.

Row vector. A matrix with only one row. Equivalently, a vector written as a horizontal row.
Ruzicka's Index of Similarity, RI, between quadrats j and k is

    RI = 100 Σ_{i=1}^{s} min(x_ij, x_ik) / Σ_{i=1}^{s} max(x_ij, x_ik)   percent,

where x_ij and x_ik are the quantities of species i in quadrats j and k; min(x_ij, x_ik) and max(x_ij, x_ik) denote, respectively, the lesser and the greater of the two quantities.
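A companion Python sketch for Ruzicka's index, under the same assumption that the sums of the minima and of the maxima are taken over all species; the quantities are hypothetical:

    import numpy as np

    def ruzicka(xj, xk):
        xj, xk = np.asarray(xj, dtype=float), np.asarray(xk, dtype=float)
        return 100.0 * np.sum(np.minimum(xj, xk)) / np.sum(np.maximum(xj, xk))

    print(round(ruzicka([10, 0, 5], [6, 2, 5]), 2))   # 100 * 11 / 17 = 64.71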
Sample. A collection of sampling units or quadrats.

Sampling unit. An individual plot or quadrat. A collection of many such units, each of which is a different small fragment of the community under study, constitutes a sample.

Scalar. An "ordinary" number, in contrast to an array of numbers (a matrix).
Single-linkage clustering. Same as nearest-neighbor clustering.

Sørensen's Index of Similarity between two quadrats. The ratio 2a/(2a + f), where a is the number of species common to both quadrats and f is the number of species present in one or other (but not both) of the quadrats. Compare Jaccard's Index of Similarity.

Spanning tree. A set of line segments joining all the points in a swarm of points so that every pair of points is linked by only one path (i.e., there are no loops).

SSCP matrix. Same as sum-of-squares-and-cross-products matrix.


Standard deviation. The square root of the variance.

Standardized data. Data that have been rescaled by dividing every observation by the standard deviation of all the observations.

Stopping rule. A rule for deciding when a divisive classification should stop.

Submatrix. A subset of a given matrix that is itself a matrix. It is delimited en bloc from the "parent" matrix, with the arrangement of the elements unchanged. See also partitioned matrix.

Sum-of-squares-and-cross-products matrix. The matrix obtained by multiplying a data matrix by its transpose. The (i, i)th element is the sum of squares of the ith variable. The (h, i)th element is the sum of cross-products of the hth and ith variables.
Symmetric matrix. A square matrix that is symmetric about its main diagonal (top left to bottom right). Thus the (h, i)th element and the (i, h)th element are equal for all h, i. A square matrix of which this is not true is unsymmetric.

Trace of a square matrix. The sum of the elements on the main diagonal (top left to bottom right). The trace of A is written tr(A).

Transformation matrix. A matrix used to premultiply a data matrix in order to bring about a linear transformation of the data.

Transpose of a matrix. The transpose of the s × n matrix X is the n × s matrix having the rows of X as its columns (and, consequently, the columns of X as its rows). It is denoted by X'.

Tree diagram. Same as dendrogram.

Triangle inequality axiom. The axiom that the distance between any two points A and B cannot exceed the sum of the distances from each of them to a third point C; that is, d(A, B) ≤ d(A, C) + d(B, C).

Ultrametric distance measures. Those that cannot in any circumstances cause a reversal in a clustering process.

Unipolar axis. An ordination axis on which the data points all have scores of the same sign (all positive or all negative). Contrast bipolar axis.
Unsymmetric matrix. See symmetric matrix.

Unweighted average distance. Same as average distance. See average-linkage clustering criteria.
.
Vanance. Th e mean of the squared deviations, from their mean value, of a
set of observations.
Variance-covariance matrix. Same as covariance matrix.
Vector. A row vector or column vector. In some contexts, the n (say) elements of a vector constitute the coordinates of a point in n-dimensional space.

Weighted average distance. See average-linkage clustering criteria.

Weighted centroid distance. Same as median distance. See average-linkage clustering criteria.
Weighted and unweighted clustering methods. Weighted methods treat
clusters as of equal weight irrespective of their numbers of points.
Unweighted methods treat data points as of equal weight so that the
weight of a cluster is proportional to its number of points.
Within-axes heterogeneity. The heterogeneity exhibited by a data swarm consisting of two or more subswarms, when all subswarms occupy the same many-dimensional space. Contrast between-axes heterogeneity.

Within-cluster dispersion of a cluster. The sum of squares of the distances from every point in the cluster to the cluster's centroid.
Bibliography

Anderberg, M. R. (1973). Cluster Analysis for Applications. Academic Press, New York.
Carleton, T. J., and P. F. Maycock (1980). Vegetation of the boreal forests south of James Bay: non-centered component analysis of the vascular flora. Ecology 61: 1199-1212.
Delaney, M. J., and M. J. R. Healy (1966). Variation in the white-toothed shrews (Crocidura spp.) in the British Isles. Proc. Roy. Soc. B 164: 63-74.
Gauch, H. G., Jr. (1979). COMPCLUS - A FORTRAN Program for Rapid Initial Clustering of Large Data Sets. Cornell University, Ithaca, NY.
Gauch, H. G. (1980). Rapid initial clustering of large data sets. Vegetatio 42: 103-111.
Gauch, H. G., Jr. (1982a). Multivariate Analysis in Community Ecology. Cambridge University Press.
Gauch, H. G., Jr. (1982b). Noise reduction by eigenvector ordinations. Ecology 63: 1643-1649.
Gauch, H. G., Jr., and R. H. Whittaker (1972). Coenocline simulation. Ecology 53: 446-451.
Gauch, H. G., Jr., and R. H. Whittaker (1981). Hierarchical classification of community data. J. Ecol. 69: 135-152.
Goodall, D. W. (1978a). Sample similarity and species correlation. In "Ordination of Plant Communities" (R. H. Whittaker, Ed.), W. Junk, The Hague, pp. 99-149.
Goodall, D. W. (1978b). Numerical classification. In "Classification of Plant Communities" (R. H. Whittaker, Ed.), W. Junk, The Hague, pp. 249-285.
Gordon, A. D. (1981). Classification: Methods for the Exploratory Analysis of Multivariate Data. Chapman and Hall, New York.
Gower, J. C. (1967). A comparison of some methods of cluster analysis. Biometrics 23: 623-637.
Gower, J. C., and P. G. N. Digby (1981). Expressing complex relationships in two dimensions. In "Interpreting Multivariate Data" (V. Barnett, Ed.), Wiley, New York, pp. 83-118.
Gower, J. C., and G. J. S. Ross (1969). Minimum spanning trees and single linkage cluster analysis. Appl. Statist. 18: 54-64.
Hill, M. O. (1973). Reciprocal averaging: an eigenvector method of ordination. J. Ecol. 61: 237-251.
Hill, M. O. (1979a). DECORANA - A FORTRAN Program for Detrended Correspondence Analysis and Reciprocal Averaging. Cornell University, Ithaca, NY.
Hill, M. O. (1979b). TWINSPAN - A FORTRAN Program for Arranging Multivariate Data in an Ordered Two-Way Table by Classification of the Individuals and Attributes. Cornell University, Ithaca, NY.
Hill, M. O., R. G. H. Bunce, and M. W. Shaw (1975). Indicator species analysis, a divisive polythetic method of classification, and its application to a survey of native pinewoods in Scotland. J. Ecol. 63: 597-613.
Hill, M. O., and H. G. Gauch, Jr. (1980). Detrended correspondence analysis, an improved ordination technique. Vegetatio 42: 47-58.
Jeglum, J. K., C. F. Wehrhahn, and M. A. Swan (1971). Comparisons of environmental ordinations with principal component vegetational ordinations for sets of data having different degrees of complexity. Can. J. Forest Res. 1: 99-112.
Kempton, R. A. (1981). The stability of site ordinations in ecological surveys. In The Mathematical Theory of the Dynamics of Biological Populations II (R. W. Hiorns and D. Cooke, Eds.), Academic Press, New York, pp. 217-230.
Lance, G. N., and W. T. Williams (1966). A general theory of classificatory sorting strategies. I. Hierarchical systems. Computer J. 9: 373-380.
Lefkovitch, L. P. (1976). Hierarchical clustering from principal coordinates: an efficient method for small to very large numbers of objects. Math. Biosci. 31: 157-174.
Levandowsky, M. (1972). An ordination of phytoplankton populations in ponds of varying salinity and temperature. Ecology 53: 398-407.
Levandowsky, M., and D. Winter (1971). Distance between sets. Nature 234: 34-35.
Lieffers, V. J. (1984). Emergent plant communities of oxbow lakes in northeastern Alberta: salinity, water-level fluctuations and succession. Can. J. Botany 62: 310-316.
Maarel, E. van der (1980). On the interpretability of ordination diagrams. Vegetatio 42: 43-45.
Maarel, E. van der, J. G. M. Janssen, and J. M. W. Louppen (1978). TABORD, a program for structuring phytosociological tables. Vegetatio 38: 143-156.
Marks, P. L., and P. A. Harcombe (1981). Forest vegetation of the Big Thicket, southeast Texas. Ecol. Monogr. 51: 287-305.
Morrison, D. F. (1976). Multivariate Statistical Methods, 2nd ed. McGraw-Hill, New York.
Newnham, R. M. (1968). A classification of climate by principal component analysis and its relationship to tree species distribution. Forest Sci. 14: 254-264.
Nichols, S. (1977). On the interpretation of principal component analysis in ecological contexts. Vegetatio 34: 191-197.
Noy-Meir, I. (1973a). Data transformations in ecological ordination. I. Some advantages of non-centering. J. Ecol. 61: 329-341.
Noy-Meir, I. (1973b). Divisive polythetic classification of vegetation data by optimized divisions on ordination components. J. Ecol. 61: 753-760.
Noy-Meir, I. (1974). Catenation: quantitative methods for the definition of coenoclines. Vegetatio 29: 89-99.
Noy-Meir, I., D. Walker, and W. T. Williams (1975). Data transformations in ecological ordination. II. On the meaning of data standardization. J. Ecol. 63: 779-800.
Orlóci, L. (1978). Multivariate Analysis in Vegetation Research. W. Junk, The Hague.
Pielou, E. C. (1977). Mathematical Ecology. Wiley, New York.
Rohlf, F. J. (1973). Algorithm 67: Hierarchical clustering using the minimum spanning tree. Computer J. 16: 93-95.
Ross, G. J. S. (1969). Algorithms AS 13-15. Appl. Statist. 18: 103-110.
Searle, S. R. (1966). Matrix Algebra for the Biological Sciences. Wiley, New York.
Shepard, R. N., and J. D. Carroll (1966). Parametric representations of non-linear data structures. In "Multivariate Analysis" (P. R. Krishnaiah, Ed.), Academic Press, New York.
Sneath, P. H. A., and R. R. Sokal (1973). Numerical Taxonomy. W. H. Freeman & Co., San Francisco.
Strauss, R. E. (1982). Statistical significance of species clusters in association analysis. Ecology 63: 634-639.
Tatsuoka, M. M. (1971). Multivariate Analysis. Wiley, New York.
Whittaker, R. H. (Ed.) (1978a). Ordination of Plant Communities. W. Junk, The Hague.
Whittaker, R. H. (Ed.) (1978b). Classification of Plant Communities. W. Junk, The Hague.
Index

Page numbers in boldface indicate substantial treatment of a topic.
Anderberg, M. R., 184
Apollonius's theorem, 48, 78
Arch effect, 193, 196
Asymmetry, coefficient of, 163
Axes, principal, 148
Binary data, 53
Bipolar axis, 162
Bunce, R. G. H., 218
Carleton, T. J., 164
Carroll, J. D., 197
Catenation, 190
Centered and uncentered data, 103, 158
Centroid, 25
Chaining, 22
Characteristic value, see Eigenvalue
Characteristic vector, see Eigenvector
Classification:
  agglomerative, see Clustering
  divisive, 203
  hierarchical and nonhierarchical, see Pooling
Cluster, 10
Clustering, 10, 13, 203
  average linkage, 32, 63, 69, 73
  centroid, 25, 51, 70, 74
  complete linkage, 22, 72
  farthest-neighbor, 22, 72
  median, 70
  minimum variance, 32, 72
  nearest-neighbor, 15, 72
  single linkage, 15, 72
Clustering methods, combinatorial, 70
  group average, 70, 74
  monotonic, 74, 76
  nonhierarchical, see Pooling
  weighted and unweighted, 70, 73
Components, principal, 148
Correlation coefficient, 108, 183
  matrix, see Matrix, correlation
Correspondence analysis, see Reciprocal averaging
Cosine separation, 50
Covariance, 105
Covariance matrix, 103, 111
  rank of, 243
Czekanowski's index of similarity, see Similarity
Data, linear and nonlinear, see Linear data
Data matrix, 1
Data point, 8
Data swarm, 8
Delaney, M. J., 209
Dendrogram, 19, 220
  nodes and internodes of, 19
Detrended correspondence analysis, 190, 195
Difference, percentage, see Dissimilarity, percentage
Digby, P. G. N., 175
Direction cosines, 102
Discriminant ordination, 223
Dispersion, within-cluster, 32
Dissimilarity, percentage, 43, 55, 61
Eigenanalysis, 116, 126, 229
  Hotelling's method, 120
Eigenvalue, 116, 225
Eigenvector, 116, 225
Gauch, H. G., Jr., 9, 77, 151, 152, 176, 191, 195, 198, 203, 220
Geodesic metric, 47, 50, 51, 61
Goodall, D. W., 43, 44, 56, 57, 73
Gower, J. C., 70, 175, 206, 209, 244
Gradient, 191, 194
Harcombe, P. A., 219
Healy, M. J. R., 209
Heterogeneity:
  between-axes, 162
  within-axes, 162
Hill, M. O., 176, 195, 218
Horseshoe effect, see Arch effect
Indicator species, 218
Jaccard's index of similarity, 57, 61
Janssen, J. G., 198
Jeglum, J. K., 152
Kempton, R. A., 175
Lance, G. N., 65, 69, 70, 75
Latent value, see Eigenvalue
Latent vector, see Eigenvector
Lefkovitch, L. P., 211
Levandowsky, M., 45, 57, 58
Newnham, R. M., 155
Nichols, S., 155, 158
Noise, 151, 155
Nonlinear data, see Linear data
Normalized data, 46, 51, 61
Noy-Meir, I., 107, 161, 164, 215, 217
Ordination, 8, 136
  discriminant, 223
  Q-type, 8, 71
  R-type, 8, 101, 230, 218
Orlóci, L., 32, 45, 50, 52, 58
Pielou, E. C., 34, 150, 169n, 197, 204, 214
Pooling, 76, 77
Presence-and-absence data, see Binary data
Principal component analysis, 136, 152
Principal coordinate analysis, 165
Q-type ordination, see Ordination, Q-type
Quadrat, 9
Reciprocal averaging, 176, 184
Redundancy, 152
Remoteness, percentage, 44, 55, 61
Reversal, 74
Rohlf, F. J., 206, 209
Ross, G. J. S., 206, 209, 244
Rotation:
  nonrigid, 227
  rigid, 96, 116, 137
R-type ordination, see Ordination, R-type
Ruzicka's index of similarity, 44, 57, 61
Sample, 9
Sampling unit, 9
Scalar, 85
Scatter diagram, 6, 88
Searle, S. R., 226, 229, 245
Shaw, M. W., 218
Shepard, R. N., 197
Similarity, percentage, 43, 61
Sneath, P. H. A., 70, 72
Sokal, R. R., 70, 72
Sørensen's index of similarity, 56, 61
Spanning tree, 205
  minimum, 205
SSCP (sums-of-squares-and-cross-products) matrix, see Matrix, SSCP
Standard deviation, 104
Standardized data, 107, 155
Strauss, R. E., 71
Submatrix, 233
Swan, M. A., 152
Table arrangement, 4, 198, 199
TABORD, 198
Tatsuoka, M. M., 12, 126, 226, 229, 230, 245
Transformation:
  linear, 92
  orthogonal and nonorthogonal, 226
Tree diagram, see Dendrogram
Triangle inequality axiom, 41
TWINSPAN, 218, 220
Uncentered data, see Centered and uncentered data
Unipolar axis, 162
Variance, 104
Variance-covariance matrix, see Covariance matrix
Vector, 85
  column, 86, 96
  row, 85
Walker, D., 107, 165
Wehrhahn, C. F., 152
Williams, W. T., 65, 69, 70, 75, 107, 165
Winter, D., 45, 57