ACADEMIC PRESS LTD.
24-28 Oval Road, London NW1

United States Edition published by
ACADEMIC PRESS INC.
(Harcourt Brace Jovanovich, Inc.)
Orlando, Florida 32887

Copyright © 1984 by ACADEMIC PRESS INC. (LONDON) LTD.
Third Printing 1993

All Rights Reserved. No part of this book may be reproduced in any form by photostat, microfilm, or any other means, without written permission from the publishers.

British Library Cataloguing in Publication Data
Greenacre, Michael
Theory and applications of correspondence analysis.
1. Multivariate analysis
I. Title
519.5'35 QA278
ISBN 0-12-299050-1
LCCCN 83-72867

Filmset by Advanced Filmsetters (Glasgow) Ltd
Printed in Great Britain by Hartnolls Ltd, Bodmin, Cornwall


Preface
When I arrived in Paris, in July 1973, I had little idea what lay in store for me. After a traditional statistical education and with a Masters degree in multivariate analysis tucked under my arm, I embarked on a course which was to shake up most of my previous ideas on statistics as well as on life itself. It is impossible for anyone to spend two years as a student in Paris and, likewise, impossible for any statistician to spend two years in contact with the revolutionary Jean-Paul Benzécri without being radically affected as a consequence. It has taken me some time since those years of doctoral study in France to fully comprehend all that I learnt. This is an ongoing process, of course, and this book represents a milestone of 10 years' personal experience of the statistical method the French call analyse des correspondances, which has been obviously translated as "correspondence analysis".

My reasons for writing this book were twofold. First, in 1980 I was invited to give a paper on correspondence analysis at an international conference on multidimensional graphical methods, called "Looking at Multivariate Data", in Sheffield, England. There was considerable interest in my talk and I realized then, more than ever, the tremendous communication gap between Benzécri's group in France and the Anglo-American statistical school. I felt that it was almost my duty to undertake the writing of a book which would explain this important facet of French research to English-speaking statisticians, using not only a language but a mathematical style familiar to them.

Secondly, having gained experience of the tremendous versatility of correspondence analysis in describing graphically almost any rectangular table of data, I have tried to write a book which will be readable and helpful to researchers in the natural and human sciences in general. Of course it is impossible to avoid mathematical details in such a text and it is assumed that the reader has some mathematical background. However, the book can be read at different levels. On the one hand the theory of correspondence analysis is laid out systematically once and for all as a primary reference,
while on the other hand there are sufficient practical examples and applications to justify this text fully as a practical manual as well. I hope that this book will bring many researchers' ideas on the subject to a point where it serves as a springboard for a much wider and more routine application of correspondence analysis in the future.

This book is suitable as a course for statistics students, at either undergraduate or graduate level depending on the detail prescribed. Numerous examples, with solutions, terminate each of Chapters 2 to 8 and will familiarize students with the theoretical material. Notice that a basic knowledge of the algebra of matrices and vectors is assumed, but not of their geometry, which is described in detail. A course which concentrates less on mathematics would be more suitable for students of most other disciplines in which numerical data are collected and analyzed. Such a course would consist of a careful reading of the first 3 chapters, generally ignoring the theoretical examples, skimming through Chapters 4 to 8, and finally concentrating on an appropriate selection of applications in Chapter 9. This approach can be followed initially by readers who are not statisticians but who wish to gain a rapid overview of the method's basic concepts and interpretation. Subsequent reading of the theoretical examples and Chapters 4 to 8, as the need arises, will gradually fill in the details and demonstrate the wide applicability of the technique and its unique position in the field of multivariate analysis.

I would need many more pages to thank in full appreciation all the people that have led, directly or indirectly, to the publication of this book. I have dedicated this book to Jean-Paul Benzécri in acknowledgement of the role he has played in my education. To my friends and colleagues in Paris, Pierre Teillard, Michael Meimaris, Ludovic Lebart, Maurice Roux, Michel Jambu, Sylvie Stepan, Madame Laraise, Bernard Michau, René Dorr, Laurent Degos (to name but a few!), I extend my warmest thanks. Here in South Africa I owe a particular debt of gratitude to Michael Browne, who has always kept me on the straight and narrow path of rationality and common sense in spite of my many attempts to stray off! (I must add a similar word of appreciation to John Gower, who performed more or less the same task while I was at Rothamsted in England.) To my family, parents, friends and colleagues I apologize for their having to share with me that particular traumatic state that accompanies a venture of this nature and thank them for their patience and understanding during this period.

Final thanks go to the University of South Africa (UNISA) and the Council for Scientific and Industrial Research (CSIR), for their generous financial support on numerous occasions during crucial years of study and research, to Cas Crouse, Piet van der Westhuizen and the families Théron, Brink and Claassens for their encouragement, to Edna Schultz for her excellent and competent typing of my manuscript, to Lizzie Pieters who was always prepared for a last-minute typing crisis, to Lien Badenhorst for assistance with the graphical material, to colleagues who helped with the proofreading, to Tom Bishop for his dependable transporting of a heavy manuscript and original drawings to London on my behalf, and to Emily Wilkinson and Jeremy Lambert of Academic Press for their co-operation and expert guidance of the manuscript towards the final product in your hands now.

December 1983
Michael Greenacre
Contents

Preface

CHAPTER 1  Introduction

CHAPTER 2  Geometric Concepts in Multidimensional Space
2.1 Vectors and multidimensional space
2.2 Distance, angle and scalar product
2.3 Weighted Euclidean space
2.4 Assigning masses (weights) to vectors
2.5 Identifying optimal subspaces
2.6 Examples

CHAPTER 3  Simple Illustrations of Correspondence Analysis
3.1 A typical problem
3.2 The dual problem
3.3 Decomposition of inertia
3.4 Further illustration of the geometry
3.5 Examples

CHAPTER 4  Theory of Correspondence Analysis and Equivalent Approaches
4.1 Algebra of correspondence analysis
4.2 Reciprocal averaging
4.3 Dual (or optimal) scaling
4.4 Canonical correlation analysis
4.5 Simultaneous linear regressions
4.6 Examples

CHAPTER 5  Multiple Correspondence Analysis
5.1 Bivariate indicator matrices
5.2 Multivariate indicator matrices
5.3 Analysis of questionnaires and non-responses
5.4 Recoding of heterogeneous data
5.5 Examples

CHAPTER 6  Correspondence Analysis of Ratings and Preferences
6.1 Doubling, and its associated geometry
6.2 Comparison with other scaling techniques
6.3 Examples

CHAPTER 7  Use of Correspondence Analysis in Discriminant Analysis, Classification, Regression and Cluster Analysis
7.1 Discriminant analysis
7.2 Classification
7.3 Regression
7.4 Cluster analysis
7.5 Examples

CHAPTER 8  Special Topics
8.1 Stability and statistical inference
8.2 Reweighting and focusing
8.3 Horseshoe effect
8.4 Imposing additional constraints
8.5 Treatment of missing data
8.6 Analysis of symmetric matrices
8.7 Analysis of large data sets
8.8 Examples

CHAPTER 9  Applications of Correspondence Analysis
9.1 Eye colour and hair colour of 5387 schoolchildren
9.2 Principal worries of Israeli adults
9.3 Ratings of analgesic drugs
9.4 Multidimensional time series
9.5 Patterns in examination marks
9.6 Protein consumption in Europe and Russia
9.7 Seriation of the works of Plato
9.8 Antelope census data in African game reserves
9.9 HLA gene frequency data in population genetics
9.10 Measurements on fossil skulls, with missing data
9.11 Graphical weather forecasting
9.12 References to published applications

References

APPENDIX A  Singular Value Decomposition (SVD) and Multidimensional Analysis

APPENDIX B  Aspects of Computation

Subject Index

To Jean-Paul Benzécri

1
Introduction

1.1 DATA DESCRIPTION, MODELS AND OBJECTIVITY

The role of statistics is to summarize, to simplify and eventually to explain. Typically, a researcher who is studying the manifestation of a particular phenomenon tries to record or measure all the aspects that he considers relevant to an understanding of that phenomenon. It is rare, if not impossible, that the researcher acts in a totally objective manner that is independent of his own preconceived ideas. The statistician, on the other hand, usually has the advantage of neutrality with respect to the study, and he may help the researcher to interpret his observations without being influenced by the aims and hopes of the researcher.

Nevertheless, if one examines the set of conventional statistical techniques in use today it is clear that the statistician himself can rarely proceed without introducing a certain degree of subjectivity into his own framework. Often some general form of mathematical model is presumed to underlie the observed data and the statistical analysis involves estimating the particular form of the model which "best fits" the data. Because of the large degree of subjectivity involved in the selection of models, it is not surprising that such a strategy is often fraught with controversy. While some extremists might go so far as to consider their particular models as "laws of nature", the more rational analyst will admit that a model is at best a good summary of the data set at hand, simplifying the data in a style preconceived by the analyst himself and thus providing a very modest and tentative explanation of the phenomenon under study. It is unfortunate that so much emphasis is placed on a model as a representation of reality, which is usually unjustified, with
little or no attention paid to its ability to describe data meaningfully. In fact, the whole question of data description has not been given the attention it deserves, as pointed out by Finch (1981). Often the data set at hand is the one-and-only set of observations available, the "sampling units" constitute the population and the study is never to be repeated. In such a case the description of the data is of supreme importance.

Thus, in general, once a set of data has been collected we would advise that the first analytical step be some sort of descriptive summary. For example, the most familiar summary of a set of quantitative measurements is the arithmetic average, or mean, often reported along with the standard deviation. As far as the applied researcher is concerned (and some statisticians too) the values of these two quantities might totally replace the original set of measurements as he develops a mental picture of the data located around the mean with a spread summarized by the standard deviation. However, the statistician should be aware of the ideal circumstances (i.e. the model) in which such summary values are adequate and thus might consider different measures of location and dispersion (e.g. median and percentiles) to be more appropriate. Thus even at this low level of analysis, with minimal manipulation of the data, it is difficult to proceed objectively. Some structure is always presumed to underlie the observations by the very nature of our analysis. This leads us to the following aphorism (with apologies to George Orwell): no statistical methods are objective, but some are more objective than others.

While on the subject of a single set of quantitative measurements, let us lavish praise on John Tukey's ingenious stem-and-leaf plot (Tukey, 1977), which converts such data into a neat graphical description with no (or very little) loss of information. An example of a stem-and-leaf plot of a set of ages is given in Fig. 1.1, where the "stem" is the tens digit of the age and the "leaf" the units digit. This picture contains both a broad summary in the form of the outline of a conventional histogram, as well as the finer details of the data (for example, the high number of ages in the late 20s).

Ages:
43 30 22 41 29
27 24 33 51 63
35 29 36 18 45
38 38 49 20 39
19 30 53 28 28
30 55 65 46 66
29 18 35 46
51 31 71 24
47 21 50 29
32 28 58 44

Stem-and-leaf histogram:
1 | 889
2 | 0124478889999
3 | 000123556889
4 | 13456679
5 | 011358
6 | 356
7 | 1

FIG. 1.1. The ages of a group of 46 people (on the left) and the stem-and-leaf histogram of these data (on the right). For example, the first row of the histogram shows that there are two people of age 18 and one of age 19 in the group.
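The construction rule is purely mechanical: each age is split into a tens-digit stem and a units-digit leaf, and the leaves are listed against their stems. A minimal Python sketch of this construction (our own illustration, with the ages transcribed from Fig. 1.1; the variable names are ours):

from collections import defaultdict

ages = [43, 30, 22, 41, 29, 27, 24, 33, 51, 63,
        35, 29, 36, 18, 45, 38, 38, 49, 20, 39,
        19, 30, 53, 28, 28, 30, 55, 65, 46, 66,
        29, 18, 35, 46, 51, 31, 71, 24,
        47, 21, 50, 29, 32, 28, 58, 44]

# Group the units digits (leaves) under their tens digits (stems).
leaves = defaultdict(list)
for age in sorted(ages):
    leaves[age // 10].append(age % 10)

# One row per stem; the row lengths trace out the histogram outline.
for stem in sorted(leaves):
    print(stem, "|", "".join(str(leaf) for leaf in leaves[stem]))

Running this reproduces the seven rows shown in Fig. 1.1, from "1 | 889" down to "7 | 1".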
It is no coincidence that the stem-and-leaf plot is a graphical method, as it is our contention that graphical displays provide the best summaries of data: a picture is worth a thousand numbers. A graphical description is more easily assimilated and interpreted than a numerical one and can assist all three functions mentioned at the start of this introduction: summarizing a large mass of numerical data, simplifying the aspect of the data by appealing to our natural ability to absorb visual images, and (hopefully) providing a global view of the information, thereby stimulating possible explanations. In spite of these advantages, it is only in recent years that the value and tremendous potential of statistical graphics have been realized. This is mostly due to the rising importance of exploratory data analysis in a world which grows more complex and varied each day, where there is a continual proliferation of potentially interesting phenomena and an abundant supply of information available (or obtainable) to study them.

1.2 CORRESPONDENCE ANALYSIS AND MULTIDIMENSIONAL GRAPHICS

A complete data set is seldom so simple that a few histograms can summarize it adequately. The type of data which we shall be discussing in this book is of a multivariate nature, that is a number of qualities have been observed or measured on each unit of study, where this unit might be a person, a time period, geographical region or other object.

For example, in an opinion survey of a group of people we might have, for each person, his or her age, residential area and educational qualifications, as well as opinions on a number of controversial issues like the present government, women's rights, child education, etc. The resultant data are often summarized only marginally, by which we mean that the individual qualities are summarized separately: for example, the histogram of ages, the frequencies in each residential area and in each educational category and the frequencies of response categories of each question. Such a summary may be totally inadequate in revealing the interesting patterns of response amongst the group. For example, although there might be very few people in the sample who have had a university education and very few people who feel strongly about women's rights, it might just be that these are largely the same people. This would only be noticed if the categories of education were "crossed" with the opinion on women's rights in a two-way frequency table, which summarizes the association, or "interaction", of these two qualities. In a study involving a large number of observations, or responses, from each person, such patterns of association might be difficult to detect amongst a huge number of possible tables, and other means of describing such data are needed. Of course, it would be possible to investigate the patterns which we suspect a priori to exist in the data, but we rather want an exploratory framework where the patterns reveal themselves.

In order to fix our ideas, let us consider a specific data set concerning newspaper readership over a period of 5 years (Table 1.1). Each year a sample of 1000 people was asked which of 21 newspapers they read regularly (since one person can read more than one newspaper regularly, the total readership in each column is always greater than 1000). A potential advertiser would obviously be interested in the row and column totals of this data matrix, for example in the fact that newspaper H had the largest readership over the 5 years or that readership was highest in 1978. However, these marginal views do not describe the changing readership patterns, for example how readership of specific newspapers changes with time.

TABLE 1.1
Readership (in readers per 1000 per year) of 21 daily newspapers over 5 years.

Newspaper   1976   1977   1978   1979   1980   Totals
A             64     58     67     59     60      308
B             18     18     23     20     17       96
C             12     10      9     12      9       52
D             36     25     34     31     27      153
E             29     21     25     20     20      115
F            133    115    116    107     89      560
G             34     28     30     26     29      147
H            178    143    180    150    148      799
I              8      8      5      6      6       33
J            101    113    143    112    107      576
K             66     56     60     58     53      293
L             87     69     79     68     69      372
M             23     19     17     19     17       95
N             34     24     29     26     23      136
O             70     56     60     55     50      291
P             29     20     25     19     18      111
Q             46     40     38     38     33      195
R            123    122    149    122    112      628
S             79     68     70     61     57      335
T            130    109    148    110    100      597
U             22     17     19     15     16       89
Totals      1322   1139   1326   1134   1060     5981

An initial attempt at representing some of these patterns is illustrated in Fig. 1.2. The data matrix is represented as a long, flat table and the frequencies in each row, expressed as percentages of their respective row totals, may be displayed as histograms rising up from the table (for example, newspaper C had a total readership of 52 in the samples over 5 years; the readership of 12 in 1976 is 23% of this total so that the first "block" in row C has a height representing 23%). Each of these sets of relative frequencies is called a profile, in this case a profile of a particular newspaper's readership across the 5-year period. Only four profiles are displayed in Fig. 1.2 for purposes of illustration.

FIG. 1.2. The "profiles" of four newspapers (C, J, P and U) over the 5-year period, drawn as histograms rising from the rows of the data matrix.
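A profile is obtained simply by dividing each row of Table 1.1 by its row total. A small NumPy sketch of this computation (our own illustration, with the four rows copied from Table 1.1):

import numpy as np

# Readership of newspapers C, J, P and U over 1976-1980 (from Table 1.1).
rows = {"C": [12, 10, 9, 12, 9],
        "J": [101, 113, 143, 112, 107],
        "P": [29, 20, 25, 19, 18],
        "U": [22, 17, 19, 15, 16]}

for name, counts in rows.items():
    counts = np.array(counts, dtype=float)
    profile = counts / counts.sum()        # relative frequencies summing to 1
    percents = ", ".join(f"{100*p:.0f}%" for p in profile)
    print(f"newspaper {name}: {percents}")

# Newspaper C's first entry is 12/52, i.e. the 23% "block" described above.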
It is immediately apparent that the profiles of newspaper C and of newspaper J are quite different, with J rising to a peak in 1978 and C dropping to its lowest point in the same year. The profile of newspaper P shows a different pattern, with readership
generally falling off from the initial peak in 1976, while the profile of newspaper U is fairly similar to that of P.

Even though such a diagram is a simple transcription of the numerical data, it is clearly easier to interpret because of its visual impact. Yet problems will still arise if we try to picture and assimilate all 21 row profiles. A diagram like Fig. 1.2 will become as crowded as the skyline of Manhattan and it will no longer be so easy to evaluate the similarities and differences amongst the profiles. It would help if profiles which have "similar" shapes (e.g. those of newspapers P and U) could be grouped together somehow in order to achieve a global view of the data.

Although we have given no description of how correspondence analysis functions yet, let us now examine the graphical display of Fig. 1.3 which is obtained from the correspondence analysis of Table 1.1. In this display there are two sets of points, one representing the newspapers (rows) and the other representing the years (columns). Each point representing a newspaper can be considered a display of the complete profile of that newspaper. In other words, what we have assessed previously as a pattern consisting of five components has been somehow condensed into a single point. Furthermore, the distance between the newspaper points is intended to be a measure of similarity between their profiles. Thus newspapers C, J and P are far from each other because their profiles are different, while newspaper U is in the vicinity of P because their profiles are similar. All the newspaper points are situated in this display to reflect the relative positions of their profiles in as "correct" a manner as possible, given the constraints of the display.

FIG. 1.3. Two-dimensional display, obtained by correspondence analysis of the data of Table 1.1. (The horizontal axis has principal inertia λ1 = 0.00325, accounting for 63.2%; the vertical axis λ2 = 0.00094, accounting for 18.3%.)

In this case the display is two-dimensional, that is on a flat plane, and it is not possible to transcribe all the inter-profile information onto such a display. A measure of the completeness of the summary provided by the display is given by adding the percentages indicated on each rectangular co-ordinate axis: 63.2% + 18.3% = 81.5%. The display would be improved if we made it three-dimensional but this would be at the expense of complicating our view of the points. This is a general principle which permeates all descriptive statistical methods, namely that there is a trade-off between ease of interpretation and completeness of description. On the one hand, we have a graphical display like Fig. 1.2 which represents the profiles exactly but is difficult to assimilate globally when all the profiles are included, while in Fig. 1.3 we have sacrificed some "information" (as little as possible) to obtain a display which is much easier to interpret. The usefulness of a technique like correspondence analysis is that the gain in interpretability far exceeds the loss in information, as we hope to demonstrate many times throughout this book.

The points representing the years are interpreted in very much the same style, since correspondence analysis treats the rows and columns of a matrix in the same way. Thus each year point represents the profile of that year across the set of newspapers, that is, the frequencies in the columns of Table 1.1 relative to their column totals. It appears that there were relatively marked changes in readership patterns from 1976 through 1977 to 1978, after which the pattern tended back towards the 1977 pattern and started to stabilize.

The positions of the two sets of points with respect to each other are interpreted in a very special way in correspondence analysis, and this will be described in greater detail later. Roughly speaking, each newspaper point will lie more or less "in the direction of" the year in which the newspaper's profile is prominent. Newspaper P, for example, lies on the side of the year 1976 because it had a relatively high readership that year, while newspaper J lies completely on the other side of the display because its readership is relatively low in 1976 and high in 1978. Hence there is an agreement between the positions of the row and column points in terms of their association in the data matrix.
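Although the algebra is only developed from Chapter 4 onwards, the computation behind a display like Fig. 1.3 can already be sketched. The following Python fragment is a bare-bones sketch of the standard computation (our own illustration, not the book's program; the function name and layout are ours): the contingency table is divided by its grand total, the matrix of standardized residuals is formed, and its singular value decomposition yields the principal co-ordinates of the row and column points.

import numpy as np

def correspondence_analysis(N):
    """Principal co-ordinates of the rows and columns of a contingency table N."""
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # Standardized residuals: (P - rc') scaled by the square roots of the masses.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    F = U * sv / np.sqrt(r)[:, None]     # row (newspaper) principal co-ordinates
    G = Vt.T * sv / np.sqrt(c)[:, None]  # column (year) principal co-ordinates
    inertia = sv**2                      # principal inertias along each axis
    return F, G, inertia

# Applied to Table 1.1, F[:, :2] and G[:, :2] give a map like Fig. 1.3, and
# inertia[:2].sum() / inertia.sum() the 81.5% quality figure quoted above.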
1.3 HISTORICAL BACKGROUND

When tracing the historical development of correspondence analysis, it is necessary to distinguish the development of the analysis itself as a
multidimensional graphical technique from that of analyses which are based on the same algebra and numerical procedure, but which operate in completely different frameworks. Thus, although it is possible to trace the algebra of the technique back almost 50 years, it is only in the last 20 years that correspondence analysis has existed in the form that we describe in this book.

The "leading case" of a data matrix suitable for correspondence analysis is a two-way contingency table which expresses the observed association between two qualitative variables. In 1935 H. O. Hartley published a short mathematical article under his original German name (Hirschfeld, 1935) which gives an algebraic formulation of the "correlation" between the rows and columns of a contingency table. We can attribute the mathematical origins of correspondence analysis to this paper, although Richardson and Kuder (1933) and Horst (1935) independently suggested similar ideas non-mathematically in the psychometric literature, the latter author already coining the term "method of reciprocal averages". Later R. A. Fisher derived the same theory in the form of a discriminant analysis on a contingency table and applied it to data on the eye and hair colours of a group of schoolchildren (Fisher, 1940), now a classic example of a contingency table (reproduced in Table 9.1 of Chapter 9). Meanwhile Louis Guttman independently derived a method of constructing scales for categorical data (Guttman, 1941), again the same theory in a different context. Guttman treated the general case of more than two qualitative variables, and his ideas find their counterpart in what we call multiple correspondence analysis (Chapter 5). Since these two famous statisticians, Fisher and Guttman, presented essentially the same theory in biometric and psychometric contexts respectively, it is often the case that biometricians cite Fisher as the inventor of the technique while psychometricians cite Guttman. To this day the two schools have developed almost independently, with a strong school of ecologists worldwide using the method of "reciprocal averaging" (the term suggested by the psychologist Horst!) and the psychologists the method of "dual (or optimal) scaling". These techniques share the same mathematical theory and computational procedure as correspondence analysis, but lead to numerical, rather than graphical, results.

In the 1940s and 1950s further mathematical development took place, particularly in the psychometric literature by Guttman and his followers. In Japan, a group of data analysts led by Chikio Hayashi carried on a parallel development of Guttman's scaling ideas, which they called the "quantification of qualitative data" (Hayashi, 1950, 1952, 1954, 1968). It is interesting that as early as 1946 computing machines were being used to perform the calculations (Mosier, 1946). An often quoted technical report by Bock (1960) emphasized the basic principle of optimal scaling: "to assign numerical values to alternatives, or categories, so as to discriminate optimally among the objects in some sense". Bock's report included many simple examples as well as details of a computer program, and did much to popularize the method. The only paper of note during this time in the biometric literature was that of Williams (1952), which is also often cited. It is true to say that the method remained little known outside the psychometric world until the 1970s, which prompted Hill (1974) to label it a "neglected multivariate method". Further historical details and references may be found in the book on dual scaling by Nishisato (1980, Section 1.2).

Correspondence analysis, the geometric form of the above methods, originated in France in a completely different context (linguistics), and was the brainchild of Jean-Paul Benzécri. In the early 1960s a group of French analysts were studying large tables of data obtained from various literature sources. Benzécri (1977a, Section 3.2.4) recalls that the first table he studied was the one which appears in every modern Chinese language manual, where the rows are the consonants and the columns are the final vocals. Here the "data" consist of indications whether the various row-column combinations are permitted in the language or not. In such a context the French term correspondance was used to denote the "system of associations" between the elements of two sets, in this case the rows and the columns. In fact, the term came to represent a specific mathematical entity, namely the original table divided by its grand total. Since this table consists of non-negative elements only, a correspondence could be considered as a distribution of one unit of mass across the cells of a rectangular matrix. Thus the concept of a bivariate probability density is included as a special case. In other linguistic examples, the rows and columns might be (in theory) all the words of a language, the data being the number of times the words in the rows precede the words in the columns in a specific text. Here the discreteness of the table blurred into a quasi-continuum of language and there was little distinction in the minds of the French analysts between a discrete correspondence and a continuous correspondence, the latter being a surface covering a unit volume over a rectangular, possibly infinite, domain. A familiar special case of a continuous correspondence would thus be the bivariate normal probability density.

The French term analyse des correspondances literally means "analysis of correspondences", where the word correspondence has the specific technical meaning just discussed. In the usual translation, "correspondence analysis", English-speaking readers would tend quite naturally to think of correspondence in the less concrete sense of agreement. Thus the original meaning has been changed in translation.

Since the early 1960s, then, a small group of dedicated data analysts, led by Benzécri, gained extensive practical experience of correspondence analysis and other descriptive multivariate techniques like cluster analysis. They applied their ideas far beyond the initial linguistic context and gained a large
degree of fame within France and the continent, and a certain amount of notoriety in the English-speaking world. In order to put the position of this group into perspective, a few of their characteristics deserve mentioning.

First, their whole philosophy of data analysis is founded on inductive reasoning, proceeding from the particular to the general. The data set at hand and how one describes it are of importance, not the general framework or model that one might think the data fit. This standpoint is summarized strongly in Benzécri's second principle of data analysis: "The model must fit the data, not vice versa". Hand in hand with this principle is an initial rejection of probabilistic and mathematical modelling as presumptuous and irrelevant. While few statisticians would subscribe to such an extreme viewpoint, we would acknowledge that there are occasions when blind assumptions of models lead to serious defects in statistical analysis. However, Benzécri's first principle of data analysis that "statistics is not probability" and that "authors (who hardly ever write in our language)" (that is, French) "have erected a pompous discipline, rich in hypotheses which are never satisfied in practice" (Benzécri et al., 1973, p. 3) was destined to raise the anger of statisticians in general rather than win them over, even slightly, to his idealism.

Secondly, from the outset the descriptive techniques developed by this group were geometric ones. Data were transcribed to sets of points in multidimensional space, and points were grouped visually in the form of branches of a tree structure or actual clusters in a graphical display. This stress on graphics followed a tradition of geometry in French mathematics (l'esprit géométrique) and paralleled the growing interest in multidimensional scaling at more or less the same time across the Atlantic.

Thirdly, also owing to a longstanding and famous French tradition, their work has been couched in an extremely rigorous algebraic notation. To the initiated this notation is tremendously powerful in expressing exactly the function and characteristics of both operands and operators. It is unfortunate that this complex language in which the group chose to express itself almost completely closed the communication lines to the Anglo-American statistical schools, who have always used a much more pragmatic notational style. Recently, however, some of the students of the French school are changing their style to communicate and gain acceptance with the larger body of statisticians. The book by Lebart et al. (1977), for example, has been read and understood by French-reading statisticians (see the book review by Nash, 1979), whereas it is a pity that the crucial works of Benzécri et al. (1973) and the journal Les Cahiers de l'Analyse des Données will never be widely read because of their unfamiliar mathematical style. However, for a brief taste of Benzécri's philosophy, interested readers can consult Benzécri (1969b), the only article available in English and fairly devoid of mathematics.

Finally, we return to our statement at the start of this section that correspondence analysis is, numerically at least, similar to a number of other techniques. This does not mean that it is the same as these techniques. For example, dual scaling (see, for example, Nishisato, 1980) is concerned with deriving numerical scores for categories with certain properties, a method pioneered by Guttman. No geometry of such scores is mentioned or intended in this framework and the results are not reported in the form of graphical displays. Neither is the framework specifically multidimensional, but rather a sequence of one-dimensional frameworks. This distinction is an important one and should also be mentioned in the case of reciprocal averaging. An article by Hill (1974) popularized the name correspondence analysis but concentrated on single dimensions only. This paper forms the basis of a subsequent "definition" of correspondence analysis by Hill (1982). By contrast, correspondence analysis, as we know it, derives sets of multidimensional "scores" with a well-defined and intentional geometric interpretation.

For further commentary on correspondence analysis and Benzécri's ideas, see Mallows and Tukey (1982) and Gifi (1981).

1.4 OUTLINE OF THIS BOOK AND NOTATION

Correspondence analysis is basically a fairly simple technique from the mathematical and computational point of view. However, because it is primarily a geometric technique rather than a statistical one, it is necessary to introduce a number of geometric concepts which are crucial to a full understanding of the method (Chapter 2). Rather than follow immediately with an algebraic description, we have chosen to devote a whole chapter to some simple examples of correspondence analysis so that most of the computations and results can be described in detail (Chapter 3). A formal mathematical treatment is then given in Chapter 4 (Section 4.1) which can be skimmed through on a first reading and referred back to when necessary. The remainder of Chapter 4 deals with other analyses which are algebraically equivalent to correspondence analysis (cf. Section 1.3 above).

Chapters 5, 6 and 7 treat a wide variety of contexts in which correspondence analysis can be used. This is in accordance with the idea that correspondence analysis is a single technique capable of handling many different types of problems, in contrast to the usual approach in statistics where different methods are developed to solve different problems. Thus in Chapter 5 we discuss how correspondence analysis can be applied to data consisting of a number of qualitative variables and how a general data set can be recoded to be in a form suitable for such an analysis. In Chapter 6 the correspondence analysis of ratings and preference data is discussed and compared to existing
methods. In Chapter 7 four traditional areas of multivariate statistics are discussed and it is shown how correspondence analysis can be of use in each of these.

In Chapter 8 we have collected together a number of diverse topics of special interest. The most important of these is a discussion of the stability of the graphical displays obtained by correspondence analysis, as well as their probabilistic properties in appropriate situations.

Chapter 9 is probably the most important part of the book, where a number of applications to data sets from a wide spectrum of disciplines are discussed. These are not intended to be complete case studies, but rather illustrations of specific features of correspondence analysis.

We have assumed that the reader has some basic knowledge of matrices and vectors, at a level similar to that assumed by most modern textbooks on applied multivariate analysis. Most of these books do have a chapter devoted to matrix algebra and we have not included such a chapter here. If required, the reader should refer to the relevant chapters of Press (1972), Morrison (1976), Mardia et al. (1979) or, in particular, the text by Green and Carroll (1976), which we highly recommend. We have added an appendix specifically on the singular value decomposition because this crucial concept is given an inadequate or non-existent treatment in most textbooks (Appendix A). An appendix on computation and available computer programs is also given to assist those who wish to execute a correspondence analysis (Appendix B).

At the end of each of Chapters 2 to 8 there is a section entitled "Examples". These are usually the formal statements and proofs of theoretical results mentioned during the respective chapter, or numerical examples for purposes of illustration. They are assembled at the end of each chapter so that the text is as uninterrupted as possible.

Finally, we should mention some important notational devices that we have introduced. Following convention in typeset texts, we use italics to denote scalars (e.g. c, i, K), and boldface to denote vectors (in lower case, e.g. c, r) and matrices (in upper case, e.g. C, N). If a particular scalar, say j, is chosen as an index, then its respective capital is usually used as the total number in the indexed set: for example c1, c2, ..., cj, ..., cJ, or cj, j = 1 ... J. In order to save space when using the summation notation, we indicate the summation parameters as subscripts and superscripts, for example Σ_{i=1}^{I} r_i, often omitting parameters that are obvious in the context, for example Σ^{I} r_i or Σ_i r_i. Since summation most often starts at the first element and ends with the last, this is assumed when not specifically indicated, so that Σ_k λ_k stands for Σ_{k=1}^{K} λ_k, while Σ_{k=2}^{K} λ_k has to be denoted in full.

Some letters are informally reserved for specific entities and these will be explained as they are encountered. For example, we use I and J as the numbers of rows and columns respectively of a general matrix, and i and j as the respective indexing variables. The matrices F and G usually contain the co-ordinates of the row and column points respectively in a graphical display, for example the co-ordinates of the two sets of points in Fig. 1.3. Such conventions are very useful in the development of the algebra and in subsequent recall of formulae.

The notation ≡ indicates the definition of a symbol; for example, F ≡ UDα signifies that the matrix F is defined as UDα, as opposed to F = UDα, which means that F, previously defined, is equal to UDα.
2
Geometric Concepts in Multidimensional Space

Very few textbooks on multivariate analysis lay particular stress on the geometry of multivariate techniques. A notable exception is the text by Green and Carroll (1976), which gives an excellent conceptual introduction to vector and matrix algebra and associated geometry in the context of applied multivariate analysis. Their book overlaps and complements the material in this chapter, and is highly recommended as parallel reading matter, particularly if the reader has a limited mathematical background. In the present chapter we introduce all the geometric concepts that will be required for a thorough understanding of correspondence analysis, as well as related multidimensional analyses.

In Section 2.1 we introduce the basic geometric unit in multidimensional space, the point vector, as well as its co-ordinates with respect to a set of basis vectors. Subspaces are defined and the distinction is made between a dimension of the space and its dimensionality.

In Section 2.2 we begin to add structure to multidimensional space by defining distances, angles and scalar products between point vectors. These definitions are illustrated in the well-known context of Euclidean space.

In Sections 2.3 and 2.4 we generalize these concepts slightly to accommodate our later description. We need firstly to associate weights with the dimensions of the space and secondly to assign weights to the individual point vectors themselves. We call these latter weights (point) masses, in order to distinguish them from the former dimension weights.

In Section 2.5 we arrive at our objective of identifying low-dimensional subspaces which best "fit", or lie closest to, a given set of point vectors. The crucial concept of the singular value decomposition (SVD) and its geometry are discussed in this section as well as in Appendix A, where it is shown to underlie many different multidimensional techniques.

Section 2.6 concludes with some theoretical and practical examples of the material in this chapter.

2.1 VECTORS AND MULTIDIMENSIONAL SPACE

Column and row vectors

The concept of a vector is fundamental to our geometric description of data. For our purposes a vector x consists of a set of real numbers x1 ... xJ and it is conventional to write x as a column vector:

    [ x1 ]
x = [ .. ]
    [ xJ ]

The integer J is called the order of the vector and we often refer to such a vector as a J-vector. For example, suppose that we measure someone's height (cm), weight (kg), shoulder width (cm) and waist (cm). These measurements can be collected into a single vector, of order 4 (i.e. a 4-vector):

    [ 175 ] ← height
x = [  70 ] ← weight
    [  47 ] ← shoulder width
    [  84 ] ← waist

In order to write vectors as row vectors, the notation xT is introduced, where T denotes the transpose of a vector, for example:

xT = [175 70 47 84]   or   x = [175 70 47 84]T

Although the column vector notation is conventional it is wasteful of space in printed text, and we often denote particular column vectors in their transposed form as row vectors, as shown above.

Dimension, co-ordinates and basis

Initially, for purposes of illustration, let us suppose that we are interested only in a person's height and weight measurements. The data vector (of order 2)
for this person (let us say person A) is thus:

xA = [175 70]T

Let us further suppose that we have measured two other people (persons B and C) and that their data vectors are:

xB = [170 68]T   and   xC = [150 60]T

These 3 vectors can be displayed in conventional graphical form (Fig. 2.1), where each vector is now a point in the display. We say that these vectors are points in 2-dimensional space and that the 2 values in each vector are the co-ordinates of the point (or point vector) with respect to the 2 dimensions of height and weight respectively. Here the terms dimension and axis are synonymous. For example, point xA has co-ordinate 175 on the height dimension and co-ordinate 70 on the weight dimension.

FIG. 2.1. The three points A, B and C plotted against axes of height (cm) and weight (kg).

A vector is often considered as a physical movement from one point to another, and this is an alternative interpretation which is of great value in the understanding of vector geometry. For example, the vector xA = [175 70]T can be considered as the movement from the origin of the space (i.e. the zero vector 0 = [0 0]T) to the point, which we have depicted by the diagonal arrow in Fig. 2.2. This movement can in turn be considered as the sum of two movements along the respective axes: first a movement of 175 units parallel to the height axis and then a movement of 70 units parallel to the weight axis. The vector which denotes one unit of movement parallel to the horizontal axis (axis 1) is denoted by e1 = [1 0]T. Similarly e2 = [0 1]T is the unit vector along axis 2, and what we have just described geometrically may be written algebraically as:

xA = 175 e1 + 70 e2

i.e.

[175 70]T = 175 [1 0]T + 70 [0 1]T

FIG. 2.2. The point xA reached from the origin by the movement 175 e1 parallel to the height axis followed by 70 e2 parallel to the weight axis.

The vectors e1 and e2 are known as basis vectors and together they form a basis, called the standard basis, for the two-dimensional space. Any vector in this space may be obtained by moving in the direction of e1 and then in the direction of e2 in the way we have just described; in fact the co-ordinates indicate how far to move in the direction of these basis vectors. With this geometric interpretation it is easy to understand the algebraic properties of vector addition and scalar-vector multiplication. For example, the mean vector x̄ of the three vectors xA, xB and xC is

x̄ = (1/3)(xA + xB + xC)
  = (1/3)(175e1 + 70e2 + 170e1 + 68e2 + 150e1 + 60e2)
  = (1/3)(495e1 + 198e2)
  = 165e1 + 66e2

so that x̄ = [165 66]T. Thus vectors are summed by summing corresponding elements, and a scalar times a vector is equal to the vector of elements multiplied individually by that scalar.

Points in subspaces

If we look again at Fig. 2.1 it is clear that the three points lie on a straight line through the origin. This means that all three of our data vectors can be expressed as multiples of a single vector, for example b = [10 4]T:

xA = 17.5b   xB = 17.0b   xC = 15.0b
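These statements can be checked directly with array arithmetic. A small NumPy sketch (our own illustration; the variable names are ours):

import numpy as np

e1, e2 = np.array([1, 0]), np.array([0, 1])   # standard basis
xA, xB, xC = np.array([175, 70]), np.array([170, 68]), np.array([150, 60])

# The co-ordinates are the coefficients of the basis vectors.
assert np.array_equal(xA, 175 * e1 + 70 * e2)

xbar = (xA + xB + xC) / 3                      # mean vector
assert np.array_equal(xbar, np.array([165.0, 66.0]))

b = np.array([10, 4])                          # basis vector of the "size" line
for x, coef in [(xA, 17.5), (xB, 17.0), (xC, 15.0)]:
    assert np.array_equal(x, coef * b)         # each point is a multiple of b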
FIG. 2.3. The diagonal dimension in Fig. 2.2 on which all three points lie, at positions 17.5, 17.0 and 15.0 respectively with respect to the basis vector b = 10e1 + 4e2.

For example:

xA = [175 70]T = 17.5 [10 4]T

Thus these 3 points actually lie in a 1-dimensional subspace defined by the basis vector b, and have co-ordinates with respect to b of 17.5, 17.0 and 15.0 respectively (Fig. 2.3). This single dimension, which is a combination of the original height and weight dimensions, can be interpreted as a dimension of "size", and Fig. 2.3 shows where each person lies on this dimension: person B is smaller than A and C is much smaller than B.

Clearly there are an infinite number of ways of choosing a basis vector for the "size" dimension; we say that the basis vector is not identified. For example, c = [5 2]T is another basis vector and the co-ordinates of the 3 points with respect to c are 35, 34 and 30 respectively. Similarly the standard basis in the 2-dimensional space is only one out of an infinite number of possible bases. Later, when we discuss vector lengths and orthogonality, we shall restrict our attention to a particular type of basis. In the above example this would amount to fixing a unit length along the size dimension.
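The dependence of the co-ordinates on the choice of basis vector is easy to make concrete. A short NumPy aside (ours, not the book's; for a point that is exactly a multiple of the basis vector, the coefficient is recovered by projection):

import numpy as np

points = np.array([[175, 70], [170, 68], [150, 60]])    # persons A, B, C

for basis in (np.array([10, 4]), np.array([5, 2])):      # b and c from the text
    # For x = k * basis, the coefficient k equals (x . basis) / (basis . basis).
    coords = points @ basis / (basis @ basis)
    print(basis, "->", coords)
# [10 4] -> [17.5 17.  15. ]   whereas   [5 2] -> [35. 34. 30.]

The points are the same; only the scale along the "size" dimension changes with the basis vector.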
Centroid (centre of gravity)

Consider another artificial example in 2-dimensional space, where three vectors:

y1 = [1500 2000]T   y2 = [2000 1000]T   y3 = [2200 600]T

represent the exports and imports of a certain product in three consecutive years. These vectors are plotted as points in Fig. 2.4 and it is clear that they once again lie in a straight line. However, this is not a straight line through the origin and the vectors cannot be expressed simply as multiples of a single basis vector. Instead each vector is equal to a fixed vector from the origin to a point on the line, plus a multiple of a vector along the line. The fixed vector is conventionally taken to be the centroid (or centre of gravity or mean vector), which is the vector ȳ of the means of the respective co-ordinates: ȳ = [1900 1200]T. The vector b along the line can be chosen as b = [100 -200]T so that:

y1 = ȳ - 4b   y2 = ȳ + b   y3 = ȳ + 3b

FIG. 2.4. The three yearly export/import vectors plotted as points against axes of exports and imports; the points lie on a straight line which does not pass through the origin.

The example of y1 is shown geometrically in Fig. 2.4. As we shall describe in Section 2.5, there are distinct advantages in choosing the centroid as the fixed vector which defines the movement from the origin into the line.

Alternatively, we can express all the vectors as deviations from their centroid: z1 = y1 - ȳ, z2 = y2 - ȳ and z3 = y3 - ȳ (Fig. 2.5). This action of centering the vectors results in the centroid ȳ being transferred to the origin. Now, as before, the centred vectors z1, z2 and z3 are multiples of a single vector b. The 1-dimensional representation of the 3 points is given in Fig. 2.6, where the co-ordinates of the (centred) points are -4, 1 and 3 respectively. This dimension might be interpreted as a measure of the growth of the local manufacture of the product, with movement to the righthand side of this dimension indicating positive growth (we are assuming that local consumption of the product has not changed). Notice that in this final picture the information displayed is of a relative nature, firstly relative to the centroid ȳ of the data, and secondly relative to the basis vector b which we have labelled a "unit of positive growth". Often we shall only be interested in this picture and in descriptive results such as: the growth from year 1 to year 2 has been much faster than from year 2 to year 3. However, to interpret the picture in
absolute terms we have to relate the points' positions to knowledge of ȳ and b. Thus ȳ = [1900 1200]T tells us around which average export/import figures our data are centred, and b = [100 -200]T gives us the actual interpretation of one unit of growth, namely an increase in exports of 100 coupled with a decrease in imports of 200. In order to make a more realistic interpretation of the situation we need further information, for example the local consumption of the product from year to year, because the larger decrease in imports might be due to lower consumption. This increases the dimensionality of the problem and we have to move away from our neat, simple representations of the points on "flat" 2-dimensional sheets of paper towards conceptually more difficult displays in multidimensional spaces.

FIG. 2.5. The three export/import points re-expressed as deviations z1 = y1 - ȳ, z2 = y2 - ȳ and z3 = y3 - ȳ from their centroid, which becomes the new origin.

FIG. 2.6. 1-dimensional representation of point vectors y1, y2 and y3 along an axis through their centroid (the new origin), at co-ordinates -4, 1 and 3; one unit along this axis represents a unit of positive growth of the local manufacturing industry.
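The centring and the recovery of the co-ordinates -4, 1 and 3 can be verified numerically. A small NumPy check of the decomposition yi = ȳ + (co-ordinate) × b (our own illustration, using the figures above):

import numpy as np

Y = np.array([[1500, 2000], [2000, 1000], [2200, 600]], dtype=float)  # years 1-3
ybar = Y.mean(axis=0)                 # centroid [1900, 1200]
Z = Y - ybar                          # centred vectors z1, z2, z3
b = np.array([100.0, -200.0])         # one "unit of positive growth"

coords = Z @ b / (b @ b)              # co-ordinates along the line: [-4, 1, 3]
assert np.allclose(Y, ybar + np.outer(coords, b))
print(ybar, coords)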

Multidimensional space

When we want to represent points with respect to 3 co-ordinate axes, it is still possible to give the impression of a third dimension on paper by introducing perspective. For example, suppose that we have local consumption figures for the product discussed above so that our data are given by the matrix of Table 2.1.

TABLE 2.1
Artificial data to illustrate a 3-dimensional display (Fig. 2.7)

             Year 1   Year 2   Year 3
Exports       1500     2000     2200
Imports       2000     1000      600
Consumption   2500     2700     3200

FIG. 2.7. 3-dimensional display of the columns of Table 2.1; the three points lie in the vertical plane indicated.

We can depict the three years as points in a "room" where we are looking down towards the corner of the room (Fig. 2.7). The origin of the display is at this corner, with two of the axes running horizontally along the edges of the floor and one vertically up from the corner. We have indicated the "projections" of the points down onto the "floor" of the display (as if the points have been dropped down vertically), that is with respect to the exports
and imports axes, which is the 2-dimensional display we had previously. The projections onto the two "walls" are also shown, and it is clear that these no longer lie exactly on a straight line. However, since the points do lie on a straight line on the "floor" it is clear that they are contained exactly in a plane which stands vertically on this line. If we take this plane out of the room, as it were, turn it around (so that year 1 is on the left, as before), and lay it flat, then we have the display of Fig. 2.8. We have again centred the display at the centroid of the three points, ȳ = [1900 1200 2800]T, and the axes are those defined by the original consumption axis and by b. In order to move from the original origin of the display in Fig. 2.7 to a point, say y1, we have to move first to the centroid ȳ, which is the origin of the new display in Fig. 2.8, then -4 units along the b axis, then -300 units on the consumption axis. In short: y1 = ȳ - 4b - 300e3, where e3 is the unit vector along the consumption axis. Recalling the definition of b = 100e1 - 200e2, where e1 and e2 are unit vectors along the export and import axes respectively, we see that the above equation is correct:

y1 = ȳ - 4b - 300e3 = [1900 1200 2800]T - 4 [100 -200 0]T - 300 [0 0 1]T = [1500 2000 2500]T

FIG. 2.8. Positions of the three points in the plane indicated in Fig. 2.7 (viewed from the opposite side).

In Fig. 2.8 it is clear that the three points lie approximately in a straight line: for example, the dashed line defined by the vector c seems to follow the direction of spread of the three points. If we are willing to gloss over the deviations of the three points from this line, we could reduce the points' original 3-dimensional positions to a 1-dimensional display along the axis defined by c. The vector c is now a combination of the consumption axis and the axis defined by b, in other words a combination of all three original dimensions. In our "room" of Fig. 2.7, this is a vector cutting across the room and following the three points as closely as possible. The question arises as to how we could find the vector c which in fact comes the closest to all three points. In order to answer this question we have to define what we mean by "closeness" or, alternatively, distance in our space of points, and this is the subject of the next section.

The above example in three dimensions is a highly simplified example of reducing the dimensionality of a set, or cloud, of points in multidimensional space. In actual applications we shall usually be dealing with points in spaces of much higher dimensionality, but the question will still be the same: can we find a subspace of lower dimensionality which comes "close" to the cloud of points? To take a different example, in craniometry a large number of measurements might be made on an adult human skull in order to define it accurately. If the vector a contains 30 such measurements then it can be thought of as a point in the 30-dimensional space of possible vectors of description of human skulls. The vector of description c of a child's skull is also a point in this space and perhaps differs from the adult skull simply by a scalar quantity: c = ka. Or c might be obtained by combining the characteristics of a small number of different skull-types with appropriate scalings, for example: c = k1a1 + k2a2. If we have a large number of skulls whose measurements can be obtained by linearly combining two basic skull-types, with variation only in the coefficients k1 and k2, then we would say that the skull vectors lie in a two-dimensional subspace of 30-dimensional space. Again it will not usually be possible to generate the skull vectors exactly and we would try to identify the basis vectors a1 and a2 which approximately generate the skull vectors. Thus, although it is not possible to display the skull vectors in their original space, they can all lie (approximately) in a space of much lower dimensionality which can be visualized quite easily. Notice that in this example we have not centred the data explicitly, although it is more customary to investigate the vectors with reference to their centroid as origin; we shall return to this matter later.
I

24 Theory and Applications ofCorrespondence Analysis


I 2. Geometric Concepts in Multidimensional Space 25

I
in the direction of el' then x 2 units in the direction of e 2 , etc. The vectors Notice that we use the term "subspace" in a slightiy looser sense than the
x1e1,x1e2," .,xJeJ are cal1ed the eomponents of x; x is thus the sum of its usual mathematical definition. We shal1 cal1 a K-dimensional subspace of
components. J-dimensional space the set of vectors U+(X1V1 + ... +(XKVK, where u is any
The centroid of a set of points v1 •.• V¡ is a particular linear combination fixed J -vector, v1 ... V K are K linearly independent J-vectors and (Xl ... (XK are
(X1V1 + ... +(x¡v¡, where the coefficients add up to 1: :E¡(X¡ = 1 (such a linear
combination is often cal1ed a baryeentre). Previously we have used the term
centroid in the sense of average vector, that is when al1 the coefficients are I real numbers. Our definition ineludes the fixed vector u which can be thought
of either as the first step of transferring the vectors from the origin into the
subspace or as effectively redefining the origin of the space at the point u.

I
equal to l/l. However, we shal1 think of a centroid more often as a weighted
average vector, where the coefficients (Xl ... (X¡ are proportional to weights (or
masses) assigned to the respective vectors in the linear combination. Thus the 2.2 DISTANCE, ANGLE AND SCALAR PRODUCT
ordinary mean vector is the centroid when equal weight is assigned to al1 the
Although not mentioned explicitiy in the aboye discussion, the usual physical
vectors.
Although the standard basis is the set of vectors defining our original frame
of reference, as it were, we often re-express our vectors of interest as linear
combinations of other basis vectors, in other words with respect to other
I c-.2!1_c:~pts
of distance, length and direction have aiready been implied, for
example in Fig. 2.7. These concepts become cr1,!ciaJwhen we startto stu9J'
LǻIAa~a, ",:here it would be extremely improbable that the measurements. on

I
more than two people lie exactly along a single dimension or that consecutive
axes. An important property of a basis is that it consists of a set of linearly
years of export/import data be exactiy linearly related. T.!!~ ..qllestion. to
independent vectors-no basis vector is a linear combination of the other
º.~ªsked in a practical situation is not whether a set of points in multi­
basis vectors. Geometrical1y, each basis vector defines a new dimension in
dimensional sP::ice lies exactiy in sorne subspace of lower dimensionality, g.ut
space because its movement cannot be obtained by combining movements of
the other basis vectors.
In Fig. 2.7 the 3 vectors y 1 - y, y 2 - Y and y 3 - Y (the vectors from the
centroid y to the three respective points y l' Y2 and y 3) were defined in
I whet1J:er-iheo-set lies approximately in such a subspace. 1\.. first step' in
formaIi~Úlg this notion is to define a measure of distance (or metric) between
points in the multidimensional space of the data.
The concept of an angle in multidimensional space is a rather more

I
3-dimensional space, yet we saw that they can be expressed as linear
abstract idea, although we are familiar with direction in the physical world
combinations of 2 basis vectors, that is they lie in a subspace of dimensionality
around uso Notice, however, that distance and angle are both scalar quantities
2. Alternatively, we can say that none of the 3 vectors have components along
(real numbers) defined in terms of two points: distance is a value quantifying
a dimension which is linearly independent of the aboye subspace. It is
the eloseness of a point a to a point b, and angle is a value quantifying how
important to note the distinction between the terms "dimensionality" and
rapidly two vectors are diverging from a common origino This is represented
"dimension". The dimensionality of a set (or space) of point vectors is a fixed
in Fig. 2.9 fQr.ªIIY twopoints a and b and it appears intuitively that ifwe
integer value, while dimensions are themselves vectors (which we also caH
know the distances of a and b from the origin O (which we often call the
axes) in the space of the vectors.
lengths of the vectors a and b respectively) as wel1 as the angle between a and
distance from a to b b (i.e. how ra¡>~dly they are moving apart), then we can work out from this
information the distance from a to b. This is indeed true and it turns out that

C>
. (

-------- b
both concepts of distance and angle can be embodied in a single fundamental
concept in multidimensional space, cal1ed the sealar produet ~. in!1f[
produet).In order to introduce this Cüñcept we shal1 ftrst review the
definitions of distance and angle in the simple case of 2-dimensional physical
....0 '
.:o.",c.
space- the space with familiar "horizontal" (x) and "vertical" (y) axes.
~

~~

,,,,<;!' angle belween a and b


relalive lo origin O
I
2-dimensiona/ Euc/idean space
O The term "Euclidean space" is a more formal way of naming the physical
FIG.2.9 space to which we are accustomed, and it has been tacitiy assumed in aH the
26 Theory and Applications ofCorrespondence Analysis

axis 2

/ ·"\":~f
- \~

2. Geometric Concepts in Multidimensional Space 27

All of the aboye formulae can be expressed in terms of the scalar product of
a and b, denoted by (a, b): (a, b) == alb l +a 2 b2 •
a In the notation of vector algebra the expression al b l + a2 b 2 is just a Tb, the
~I -~.
transpose of a multiplied by b (in the sense of matrix multiplication). The
aboye formulae are thus:
Ilall = (a,a)1/2 = (a Ta )1/2
(2.2.1 )
o-b
Ilbll = (b,b// 2 = (b Tb)1/2
(i.e. the squared length of a vector is the scalar product of the vector with
itself)
d(a,b) = (a-b,a-b)1/2 = ((a_b)T(a_b))1/2 (2.2.2)
cos () = (a, b)/((a, a) (b, b) )1/2 = aTb/(aTab Tb)1/2 (2.2.3)
As mentioned above, d(a, b) can be evaluated in terms of lIall, Ilbll and cos ();
in fact d(a, b) squared is
J
01 _/)1
,11"'1 I
(],..
I .
aXIs I
d 2(a,b) = (a-b)T(a-b)
I "1
= a Ta + bTb-2aTb

FIG. 2.10 = lIal1 2 + IIbl1 2 - 211all 'lIbll' cos ()


which is the familiar "cosine rule".
spatial representations of points in the previous diagrams. Low-dimensional The angle cosine between the vector b, say, and the first axis is easily
Euclidean spaces are thus the most common geometric spaces to which we deduced. This axis is defined by the standard basis vector el == [1 O]T. The
are accuMomed: al-dimensional "straight" line, a 2-dimensional "flat" plane length of el is lIe l ll = 1 and the scalar product of b and el is bTel = b¡, the
and 3-dimensional "space" which we see around uso The weH-known x- and co-ordinate of b with respect to el' The cosine of the angle (f3l in Fig. 2.10) is
y-axes reference the dimensions of 2-dimensional Euclidean space, but thus cos f3I = bdllbll, the co-ordinate divided by the vector's length. Similarly
because we want to broaden our discussion to multidimensional space we call the cosine of angle f32 between b and the second co-ordinate axis, defined by
these the first and second axes respectively. Two points, a == [al a 2]T ando e 2, is cos f32 = b2/lIbll· Because bi + b~ = IIbll 2 we have the result:
b == [b l b 2]T, are shown in Fig. 2.10. From basic trigonometry we know the
cos 2 f3l +cos 2 f32 = 1
following results:
The lengths (denoted by 11 ... 11) of vectors a and b are cos f3l and cos f32 are often called the direction cosines of the point b with
respect to the co-ordinate axes, so that the sum of the squared direction
Ilall = (ai + aª)1/2 IIbll = (bi + b~ )1/2 cosines of a point is equal to 1.
(11 .. ·11 is often called the norm of the vector). Rere it is important to note that the aboye results depend entirely on the
The distance between points a and b (denoted by d(a, b)) is perpendicularity ofthe co-ordinate system-we say that the vectors el and e2
are orthogonal. Two vectors are orthogonal if their scalar product is zero,
d(a, b) = ((al -b¡}2 + (a 2 -b 2)2)1/2
in other words neither vector has a component in the direction ofthe other­
(Notice that the vector a-b, with co-ordinates al -b l and a2 -b 2 , has the they are at "right-angles". Clearly eie 2 = O and, in addition, el and e 2 have
same length and direction as the vector from point b to point a-the distance unit lengths (they are "normalized"), so we say that they are orthonormal. In
between a and b is thus the same as the length of the vector a - b, or of b - a.) fact they are an orthonormal basis for 2-dimensional Euclidean space. AH of
The angle () between a and b has cosine the aboye defimtions apply equally weH if the vectors are expressed in any
orthonormal co-ordinate system. On the other hand, if the basis vectors are
cos() = (alb l +a2b2)/((ai +a~)(bi +b~W/2 not orthonormal then the above formulae are not applicable.

28 Theory and Applications of Correspondence Analysis

Mu/tidimensiona/ Euclidea.n space


I 2. Geometric Concepts in Multidimensional Space

Clearly it is not desirable that the distances depend directly on the chosen
29

It is quite straightforward to extend the above mentioned definitions to


J-dimensional Euclidean space. The Euclidean scalar product of any two
J-vectors a == [al'" aj]T and b == [b 1 ... b j]T is defined as:
I scales of measurement. A common way of remedying this particular problem
is to divide the measurements by their respective standard deviations before
computing the Euclidean distance. This standardized form of the measure­

(a, b) == Ejajb j = a Tb
and the definitions of length, distance and angle are exactly as before (see
I ment remains the same for any units chosen originally. For example, suppose
that the standard deviations of height and weight in the sample of people
are 30 (cm) and 10 (kg) respectively. The height measurements are divided
by 30 and the weights by 10 and these are then plotted in 2-dimensional
equations (2.2.1-2.2.3)). Again we are assuming that the co-ordinates of a and
b are with respect to the standard basis el"'" ej, but the definitions would
still apply if a and b were expressed relative to any orthonormal basis. We
shall prove this result in the next section in the even more general context of
I Euclidean space. Two vectors x and y of measurements become standardized
as X s = [x¡j30 x 2/10]T and Ys = [y¡j30 Y2/IO]T and the scalar product of
2
X s and Ys is (xs,Ys) = x 1y¡j30 +x2Y2/102.

I
An equivalent way of describing the above strategy is in terms of differential
weighted Euclidean space. weighting of the co-ordinate axes. The original vectors of measurements are
Notice finally that although we encourage the concept of a vector as a
retained but the definition of scalar product contains a weighting factor on
point in space it should be remembered that the scalar product of two vectors
each termo For the example above, the weighting factors are 1/302 and 1/102

I
is dependent on the origin of the space. Distances between point vectors are, respectively, so that the scalar product between two vectors x and y of
by contrast, independent of the origin of the space.
measurementsis:
(x,y) == X1Y1/302+X2Y2/102 = X TO;>l y (2.3.1 )
2.3 WEIGHTED EUCLlDEAN SPACE

The concept of weighting the co-ordinate axes is fundamental to our


subsequent discussion. We shall illustrate this idea in a simple 2-dimensional
I where

0,1 - [1/30
s - O
2

O J
1/102
example and then describe its multidimensional extension, which we shall
need latero I is the diagonal matrix ofthe inverses ofthe variances.
Geometrically the vectors x and y are plotted in their original units but
scalar products (and therefore distances and lengths) in this space are
Examp/e
_.-_.__ in_--2-dimensiona/-weighted
,
Euc/idean space
Let us refer back to Fig. 2.1 and the example of the height and weight
measurements on different people and suppose that we are interested in
I calculated using (2.3.1), where the height measurement is down-weighted
relative to the weight measurement. We call this space weighted Euclidean
space, with the weights in this example equal to the inverses of the variances.
The space can be imagined to be stretched along the axes, in the sense that
defining distances in the space of these measurements. In Fig. 2.1 we plotted
points as if the space was Euclidean and the implied interpoint distances are
then just the usual "straight line" distances, which can be computed using
formula (2.2.2).
I relative to our physical way of thinking a shell of points equidistant to a fixed
point is not spherical, but elliptical. In our example above, the ellipse of
equidistant points has major axis parallel to the height axis (Fig. 2.11).
Another way of thinking of this is that the units of measurement are changed
However, there is a very good reason why the Euclidean distance is on each axis, with unit distances on axes being inversely proportional to the
unsuitable for this type of data. For example, because of the units of 1 respective weights.
measurement (cm for height, kg for weight), the height measurement always Once again we shall find it advantageous to work only with orthonormal
has a much higher value than the weight. The difference in height between bases. In the weighted Euclidean space with scalar product (2.3.1) the
two people is therefore usually higher than the difference in weight. Thus the I previous basis vectors el and e2 are still orthogonál but not of unit length. An
height measurement will contribute relatively more to the Euclidean distance, orthonormal basis is clearly 30e 1 and l0e 2, because
which depends on the sum of squares of these differences. On the other hand,
(30e¡}TO;>1(30e 1) = 1 and (10e2)TO;>1(10e2~ = 1
if we expressed height in m then the weight measurement would dominate the
Euclidean distance. (cf. the remark at the end of the previous paragraph). The co-ordinates of x
30 Theory and Applications 01 Correspondence Analysis 2. Geometric Concepts in Multidimensional Space 31

weighting matrix is any positive definite matrix Q, that is where the scalar
product between vectors x and y in J -dimensional space is defined by:
x TQy = I:.jI:.j'qjj'XjYj'
Weighl
060b e
Let us thus express x and y relative to any basis b 1 ·•· b J :
x = I:.jujb j y = I:.j'vj'bj'

Heighl
then their scalar product is
x TQy = (I:.jujb j )TQ(I:.j'vj'bj')
FIG.2.11. The computation of distance in weighted Euclidean space The ellipse
defines the set of points which are all equidistant to 9 for a given distance. If = I:.jI:.j'ujvj'bJQbj'
d(a, g) is 3 times d(c, g) then a difference in a unit of weight is, as far as thanks to the distributive nature of matrix multiplication. Now if the basis
computing interpoint distances is concerned, the same as a difference in 3 units of
height. In this way differences in scaies and in variabilities of measurements can
+
b ... b J is orthonormal then by definition bJQbj' = O ifj j' and bJQbj = 1,
1
be compensated foro j = 1 ... J. (In matrix notation we write this as: B T QB = 1, where I is the
identity matrix and Bis the matrix of column vectors b 1 .. , b J • We say that
the basis B is orthonormal in the metric Q.) Thus it fol1ows that:
with respect to this basis are xd30 and x 2 /10 respectively: xTQy = I:.jUjVj (2.3.4)
x = (xd30)30e 1 + (x 2 /10)10e 2 i.e. the weighted Euclidean scalar product is simply the Euclidean scalar
In other words the co-ordinates of x with respect to the orthonormal basis in product of the co-ordinates with respect to any orthonormal basis (ortho­
the weighted Euclidean space are exactly the co-ordinates of the standardized normal in the same weighted space).
vector x. in ordinary (unweighted) Euclidean space.
Distances between vectors of frequencies
~klimensiona/ weigh(ed Euc/idean space
One of the most common examples of a weighted Euclidean distance is
In general, weighted Euclidean space is defined by the scalar product: the chi-square (x2) statistic for testing whether a probability density con­
forms to sorne expected density. For example, suppose that the nation­
xTDqy = I:.jqjXjYj (2.3.2) wide results of a general election give the 5 parties contesting the election the
where ql'" qJ are positive real numbers defining the relative weights fol1owing numbers of votes (in thousands): 1548, 2693, 621, 950 and 283
assigned to the J respective dimensions. The squared distance between two respectively. Expressed as proportions of the total number of votes (6095
points x and y in this space is thus the weighted sum of squared differences in thousand), these are 0.254, 0.442, 0.102, 0.156 and 0.046 respectively. Assuming
co-ordinates : that every voter could choose from al1 five parties, we would expect the votes
in the different parts of the country to be roughly in the same proportions,
d2 (x,y) == (x-y)TDq(X-Y) = I:. j qj(Xj-Yj)2 (2.3.3)
unless other patterns of voting are present. Suppose that in a certain rural
This type of distance function is often referred to as a diagonal metric. area of the country a total of 5000 voters vote as fol1ows for the 5 respective
From our 2-dimensional example, it seems that as long as we maintain the parties: 1195, 2290, 545, 771 and 199, that is in proportions 0.239, 0.458,
prerequisite of an orthonormal basis, no matter how the scalar product is 0.109,0.154 and 0.040 respectively. If voting here had taken place in exactly
defined, then the usual (unweighted) definition of Euclidean scalar product the same proportions as in the nation as a whole, the expected number (or
(and thus distance, length and direction) may be applied to the co-ordinates frequency) of votes for each party would have been 1270,2210,510, 780 and
of the vectors with respect to that basis, in order to evaluate the respective 230 respectively (e.g. 1270 = 0.254 x 50(0). Thus there have been 75 votes less
quantities in the weighted space. than this expected frequency for party 1, 80 more for party 2, 35 more for
This result is easy to prove, even in the more general case where the party 3, 9less for party 4 and 31less for party 5. In order to test whether these
32 Theory and Applieations ofCorrespondenee Analysis

deviations represent statistically significant departures from the expected


I 2. Geometrie Coneepts in Multidimensional Spaee

introduces the sample size into the measure of difference between observed
33

frequencies, the following statistic is calculated:


X2 = I (observed frequency-expected frequency)2
expected frequency
(2.3.5)
I and expected frequencies.
The absolute value of n is important because it is the essential factor
in the statistical comparison of p to p. The value of (p - p)TD ¡ 1(p - p) is

I
14.01/5000 = 0.0028, which is independent of the total observed frequency.
75 2 80 2 35 2 92 31 2 Because the critical point at P = 0.05 of X2 (4) is 9.488, it is clear that if the
= 1270 + 2210 + 510 + 780 + 230 total observed frequency had been less than 3387, with the same relative
frequencies, then there would not have been enough evidence of difference
= 4.43 + 2.90+2.40+0.10+4.18
= 14.01
In order to perform the test, the result 14.01 is compared to the critical
I in the profiles. lf we now have another observed profile p' in another
voting area, we can again see how different this is from p by calculating
(p' - p)TD ¡ 1(p' - p); let us suppose this evaluates as 0.0112. This is 4 times the

I
points of the chi-square distribution with 4 degrees of freedom (i.e. X2 (4)), squared distance of p from p, in other words p' is twice as far from p as is p in
and because 14.01 is greater than 13.28 (P = 0.01) and less than 14.86 the weighted Euclidean space of the profiles. But suppose that the total
(P = 0.005), the set of observed frequencies is said to be significantly different observed frequency in this new area is only 1000. The X2 statistic is
from the set of expected frequencies at a significance level of P < 0.01 (i.e. less 1000 x 0.0112 = 11.20 which is not as significant as the x2 statistic of 14.01
than 1 %).
Another way of thinking about formula (2.3.5) is to consider the two
sets offrequencies as two 5-vectors o=: [1195 2290 545 771 199]T and
I computed for p. Thus the relative values of the total observed frequencies are
important in the weighting of the observed profiles themselves so that we can
order these profiles with respect to sorne measure of "evidence of difference"
of the profiles. In the next section we shall continue discussing this topic after

I
e == [1270 221(j 510 780 230]T in 5-dimensional space. The X2 statistic
is then: a general introduction to the weighting of points in multidimensional space.
X2 = (o -e) TD; 1(0 -e)

I
in other words the squared distance between o and e in the weighted 2.4 ASSIGNING MASSES (WEIGHTS) TO VECTORS
Euclidean space with weights equal to the inverse expected frequencies.
lf we define p and p to be the vectors of relative frequencies and of expected In the previous section we discussed the differential weighting of the original
relative frequencies respectively: dimensions when calculating scalar products (and thus distances and lengths)
in the space of a set of point vectors. In this section we introduce what
p == (l/n)o = [0.239 0.458 0.109 0.154 0.040]T
is essentially a dual concept in correspondence analysis-the differential
P == (1/n)e = [0.254 0.442 0.102 0.156 0.046]T weighting of the points themselves.
where n is the total observed freguency, then the X2 statistic aboye is lt is not uncommon in many statistical methods to weight certain observa­
tions for justifiable reasons. For example, in an opinion survey it might be
X2 = n(p_p)TD¡1(p_p) = nr.ipj-pjf/pj (2.3.6)
difficult to obtain answers from female respondents for reasons which are
This type of formulation will be seen often in this book. Correspondence quite independent of the survey. In the final data set the female opinion is
analysis is concerned with vectors of relative frequencies, like p, as points in grossly under-represented and any summary statistic of general opinion is
multidimensional space. Such vectors are known as profiles, for example in male dominated. In this case it could be decided to assign higher weight (or
our voting illustration p is the profile of the rural area across the 5 polítical mass) to the female responses in order to equalize the contributions of the
parties. The vector p is in this case the average (or expected) profile of the two sexes in the calculation of means, regressions, etc.
whole country across the 5 parties. The squared distance from p to p is In order to enforce a distinction between the weighting of points and
(p - p)TD ¡ 1(p - p), a weighted Euclidean distance where the weights are the dimensions, we shall prefer the term mas s when referring to a quantity which
inverses of the expected relative frequencies. Because ít is proportional to weights a point while the term weight will usually refer to the weighting ofthe
the X2 statistic this distance function is caBed the chi-square distance (X 2 standard dimensions (axes) of a space. However, the verb "to weight" will be
distance). The proportionality factor is the total observed frequency n, which retained in both contexts.
2. Geometric Concepts in Multidimensional Space 35
34 Theory and Applications ofCorrespondence Analysis

In our study of the geometry of a set of vectors, the assigning of difTerent frequencies is the centroid of the individual vectors of relative frequencies,
masses to the vectors amounts to attaching difTerent degrees of importance to where each of the latter vectors is weighted by its associated total frequency
the positions of the points in space. Previously, in Section 2.2, we mentioned (number ofvotes); hence the term average profile for ji.
that our objective is to identify a low-dimensional subspace which comes The statistic XZ in (2.4.3) aboye can be sirnilarly described as a weighted
"closest" to aH the data points. When the~nts have difTerent masses then
sum of the squared di.~tances between Pi _a,n4~ Ir we introduce the foliowing
the subspace should lie even closer totile poirits' oC higher mass,-WlUle a iloÜí.iíonaldefinitións:

deviation from thepoilits-ofIciweimass wóuld be more'- e'asilY~folerated.


n == ~¡n¡ (the total number ofvoters)
Sometim~8.-W~J¡hall even assign_~~[Q.masses to certain points-these points
do ootaff~~Lour search for the "best" -subspace--at aH, but their positions w¡ == n¡ln. (the proportion of voters in area i)
relati~e to a s~bspace;reSiili-~i i~t~r¡;T whetherthey líe close to the sub­
space or n o 1 . ' .. ._--~---_._--
'dr == (p¡- ji) n ¡ (p¡ - p) (the squared distance between p¡ and p, in
T 1

the IIl~YAc de.fit.Ied by D¡ 1 )


'_"0" •

The centroid of a set ofpoints x1,XZ, ... ,x] with different masses W 1, wz,... , W]
is the weighted average point:
r-/,
--
in(I) == XZ In ----------
(which we shaH caH the (total) inertia of the set of 1 profile
vectors) then the centroid p and the inertia in(I) can both be
f'-i~i~~~'~¡/i¡:~ (2.4.1) expressed as weighted averages:
Hence x also tends in the direction of the points with higher mass. . p = ~¡w¡p¡
in(I) = ~iw¡df
x2 ana/ysis of a sel of vectors of frequencies .-i:..\
Thus the average profil(ji i~ a point vector which indicates the centroid of
Formula (2.3.6) describes a particular XZ statisfic as a squared distance individual profiles, while the total inertia is lí measure of how much the
between the vector p of observed relative frequencies and the vector p of individual profiles are spread around the centroid. Both ji and in(I) are
expected relative frequencies, multiplied by the total observed frequency n. In independent of the absolute frequencies that constitute the original data and
the context of the same example suppose now that we have the fuH would be identical if the data were multiplied by any constant value. The
breakdown of election results for aH the constituent areas of the country. absolute frequencies n¡ are only taken into account in relation to each other
That is, for each area i we have ni' the number of people who voted, and the and their relative values define the masses W¡ which are associated with the
profile Pi' the 5-vector of relative frequencies indicating how the n¡ people profiles.
voted. For each area we can calculate the statistic xf (ef. (2.3.6)) N otice that the term "inertia" is used here by analogy with the definition
x¡Z = n¡ ( p¡-p-)Tn-l( -)
p p¡-p (2.4.2) in applied mathematics of "moment of inertia"- the integral of mass times
squared distance to the centroid. In the statistical literature the total inertia
and add these up for aH the areas: is known as "Pearson's mean-square contingency coefficient" computed on
XZ = ~¡xf (2.4.3) the table offrequencies (cf. Section 4.1.6).

In this case XZ assesses aH the evidence in the data for differences in the voting
patterns between the various areas and the overaH voting pattern.
Ir we define the individual elements of the profile as p¡ == [Pil p¡z ... p¡s]T 2.5 IDENTIFYING OPTIMAL SUBSPACES
then the number of people in area i who voted for party j is n¡p¡j' The total
number of people in aH the areas who voted for party j is then ~¡n¡pij' Hence In the previous sections of this chapter we have defined a set of points in a
the relative frequency of votes for party j is ~¡n¡pul~¡n¡, which is what we multidimensional space, where distances and scalar products are calculated
have denoted previously by Pj, the jth element of ji. In vector notation this is: between points by weighting the dimensions and where each point is itself
weighted by an associated mass. Our object now is to identify the subspaces
ji = ~¡n¡pJ~¡n¡ (2.4.4) of lower dimensionality which best contain the set of points, that is the sub­
Comparing this to (2.4.1) we see that the vector ji of overaH relative spaces which come closest to the set of points.

'1

.... J._\ _ ---~


36 Theory and Applications ofCorrespondence Analysis 2. Geometric Concepts in Multidimensional Space 37

Closeness, or fie. of a subspace to a set ofpoints di, sayo If y¡ is weighted by mass W¡ (i = 1 ... 1) then our definition of the
eloseness of the whole set of points to the subspace S is:
The title of a paper by Karl Pearson: "On lines and planes of elosest fit to a
i system of points" (Pearson, 1901), shows that this objective has a long t/J(S;Yl'''YJ) == "I:.¡w¡df (2.5.1 )
history. Pearson's geometry was ordinary Euelidean and there was no where
concept of differentially weighting the points inherent in bis measure of fit,
df == lIy¡-Y;iIi>. == (Yi-y¡)TDq(y¡-y¡)
but the generalization of bis ideas is a straightforward extension of what is
now commonly called "principal components analysis". and D q is the diagonal matrix of positive dimension weights. The squared

r The first problem to address is how to define the eloseness of a set of points
to any given subspace. We have defined distances between any two given
points, so it is intuitive that the distance between a point and a given
distance df depends on the subspace S and our objective is thus to find the
subspace S* which minimizes the function t/J in (2.5.1).
In accordance with our definition of a subspace at the end of Section 2.1
subspace is the shortest of the distances between the point and all the points we can think of a single point s as a zero-dimensional subspace. The function
contained in the subspace. Thus we could define the eloseness of a set of (2.5.1) then becomes:
points to the subspace as an average, or weighted average, of the correspond­
ing set of such shortest distances. For reasons of algebraic simplicity as well t/J(S;Yl'''YJ) = "I:.¡w¡(y¡-s)TDiY¡-s)
as a host of other geometric conveniences, we base our measure of eloseness since Y¡ is equal to s for all i. The centroid y is the point which minimizes this
on the squared distances rather than the distances themselves. This is a function, a result easily shown by setting the function's derivatives with
common practice in statistics, for example in regression and in analysis of respect to the elements of s equal to zero (cf. Example 4.6.3). Thus the
variance where the model is fitted so as to minimize the sum of the squares of centroid is in this sense the elosest point to all the given points y 1 ... yJ.
the errors, not of the absolute errors themselves. Furthermore we can show that in our search for the optimal K*-dimensional
Figure 2.12 depicts a eloud of points in J-dimensional weighted Euclidean subspace S* we need only consider subspaces S which contain y, hence we
space with a subspace of lower dimensio~ality K* drawn schematically as a have drawn y in the candidate subspace of Fig. 2.12. This result is proved in
plane cutting through the space. For a typical point y¡, Yi represents the point Example 2.6.3. Hence any subspace S which is optimal in the sense of
in the subspace which is elosest to y¡, this minimum distance being equal to minimizing (2.5.1) must inelude the centroid, with the result that we can
restrict the approximations Y¡ of the points Y¡ to be of the following form:
Y¡ = Y+ "I:.f: d¡kVk where v1 ·•· VK. are basis vectors of the subspace. The
function (2.5.1) to be minimized can thus be written as:
- K· T - K·
t/J(S;Yl .. ·YJ) = "I:.¡w¡(y¡-y-"I:. k J¡kvd Dq(y¡-y-"I:. k J¡kVk) (2.5.2)

The variables of this objective function are the K* axes v1'" V K., implying
a total of J K* scalar variables. There is an additional problem of identifying
the optimal solution amongst the infinity of bases for the optimal subspace,
even if attention is restricted to orthonormal bases. Fortunately we do not
pace, S
have to resort to the use of optimization techniques to solve this problem, as
our particular choice of fit in terms of squared distances leads to considerable
simplification of the algebra and the algorithm to compute the solution.

Singular value decomposition (S VD) and low rank matrix approximation


The complete theoretical solution to the problem of II).inimizing (2.5.2) for
FIG.2.12. Points in multidimensional space and their proJections onto a sub­ any specified dimensionality K* is embodied in the concepts of singular value
space, depicted by aplane. decomposition (or basic structure) and low rank matrix approximation. The

... la

2. Geometric Concepts in Multidimensional Space 39


38 Theory and Applications of Correspondence Analysis.

singular value decomposition (which we henceforth denote by the abbrevia­ SVD form as:
tion SVD) is one of the most useful tools in matrix algebra and includes the A[K*] = U(K*)D~(K*)vTK*)
concept of the well-known eigenvalue/eigenvector decomposition (which we
where U (K*), V (K*) and D~(K*) are the relevant submatrices of U, V and D~.
call the eigendecomposition) as a special case. A few relevant results are
From (2.5.7) the rows and columns of A[K*] are equivalently the points
stated here, and we leave the reader to refer to the more detailed discussion
in respective subspaces of dimensionality K* which best fit the rows and
in Appendix A
columns of A in the sense of minimum sum of squared Euclidean distances.
The SVD is the expression of any real 1 x} matrix A of rank K in the
This efTectively solves the problem of minimizing a simpler form of (2.5.2), in
folIowing form:
the absence of masses and dimension weights, that is ordinary principal
A = U D~ VT components analysis. We now introduce straightforward generalizations of
(2.5.3)
Ix} IxK KxK Kx} the decomposition and of the matrix approximation to cope with these, thus
i.e. defining "generalized principal components analysis" (cf. Appendix A and
A = :r.f(X.UkV~ (2.5.4) Table Al (2)).
In Appendix A.l it is shown that any matrix A can be decomposed as
where UTU = 1 = VTV and (Xl ~ (X2 ~ ... ~ (XK > O. The K orthonormal A = NDI'MT where NTnN = MTcI>M = 1, .o and cI> being prescribed positive­
I-vectors u l ... UK of U, called the left singular vectors, are an orthonormal definite symmetric matrices. We call this the generalized singular value
basis for the columns of A and are the eigenvectors of AAT, with associated decomposition "in the metrics" .o and cI>. Let us now set n = D w, the masses,
eigenvalues <xi ... <xk· SimilarIy the K orthonormal }-vectors VI' v2, ... , VK and cI> = D q , the dimension weights. Then the matrix approximation
of V, called the right singular vectors, are an orthonormal basis for the T ~K* T
(transposed) rows of A and are the eigenvectors of ATA, with the same A[K*] = N (K*¡D I'(K*¡M(K*) = ¿'k Ilk o k m k

associated eigenvalues (Xi ... (Xk. The elements (Xl'" <XK of the diagonal minimizes:
matrix D~ are calIed the singular values (of A). The existence and the unique­ IIA-Xllb•. Dw == l¡ljw¡qj(aij-x¡Y
ness properties of the SVD are discussed in Appendix A. = l¡w¡(a¡-x¡}TDq(a¡-x¡} (2.5.8)
The matrices F == UD~ and G == VD~ contain the co-ordinates of the
rows and columns of A with respect to the respective basis vectors in V and amongst alI matrices X of rank at most K*, where a; and x; are the rows of
U. For example, if (2.5.3) is written as A = FV T, then the ¡th row a; of A can A and X respectively. Comparing this to (2.5.2) we see that this provides the
be written as: required solution where A is defined as the matrix of centred rows of y, that
a¡ = (2.5.5) is Y-ly T. From the form of the optimum, A[K*], the vectors mI'" mK* of
:r.f¡;kVk

so that the ith row of F contains the co-ordinates of a¡. SimilarIy the jth row
1I M(K*) define an orthonormal basis for the optimal subspace and the co­
ordinates of the vectors y¡ - Y with respect to this basis are in the rows of
of G contains the co-ordinates of the jth column of A with respect to the basis F(K*) == N(K*)DI'(K*)'
vectors in U. The (generalized) SVD provides the required solution for any prescribed
The beauty of the SVD for our present purpose is the fact that if the last dimensionality K*. For K* = 1 the first pair ofsingular vectors and associated
terms of (2.504) corresponding to the smallest singular values are dropped singular value (the largest) provide the optimal solution, for K* = 2 the first
then a least-squares approximation of the matrix A results. That is, if we and second pairs of singular vectors and associated singular values provide
define the matrix A[K*] as the first K* terms of (2.5.4): the optimal solution, and so on. This "additivity" of the dimensions leads to
K* T
A[K*] ==:r. k (XkUkVk (2.5.6) the basis vectors mI ... mK, in this case, being caBed principal axes of the
rows ofY.
then A[K*] minimizes: The squared singular values give an idea of how well the matrix is
IIA - XII 2 == :r.¡:r. j(aij - X¡J2 (2.5.7) represented along the principal axes. The total variation of a matrix A is
quantified by its squared norm, for eX'ample in the present case (cf. (2.5.8)):
amongst all 1 x} matrices X of rank at most K* (cf. (AlA)). A[K*] is called
the rank K* (least-squares) approximation of A and can itself be written in IIAllb•. Dw = l¡w¡a;Dqa¡ = lf= lll~ (2.5.9)

40 Theory and Applications 01Correspondence Analysis 2. GeQmetric Concepts in Multidimensional Space 41

SimilarIy the variation of the approximation A[K'] is: X2 y

II A[K*]lIb.,D w= r.t:, IJ1: (2.5.10)


and the "unexplained" variation is:
IIA-A[K*]llb.,Dw = r.f=K'+IJ1f
which is minimized. The "explained" variation (2.5.10), expressed as a
(2.5.11)

percentage r K' of the total variation (2.5.9), is often used informally to


~
quantify the quality of the K*-dimensional approximation of the matrix (cf.
,L ~. XI
(A. 1.7)). , 1 X

When a¡ = Yi-Y' (2.5.9) is the weighted sum of squared distances of the


FIG.2.13. Fitting al-dimensional subspace to a set of points by minimizing
vectors Yi to their centroid y, a type of generalized variance which we call the residual distances: (a) orthogonal to the subspace (principal components
the inertia of the set of vectors, or total inertia (cf. Section 2.4). Because the analysis); (b) parallel to the dimension defined by the dependent variable
total inertia is the sum of the squared singular values of A and because that (regression).
sum of squares splits up according to (2.5.10) and (2.5.11), it is clear that the
kth principal axis accounts for an amount tt:
of the total inertia. We say that y and an 1 x J matrix X respectively. The rows of the augmented matrix
the total inertia is decomposed along the principal axes. [y X] are points in (J + 1)-dimensional space and again the problem is one
The definitions and results of this section will be illustrated and developed of finding a subspace which fits these points, although not in the same way as
further in the examples of Section 2.6 and in later chapters which specifically we have described aboye. Rere the distance between a point and the subspace
treat correspondence analysis. So far we have not discussed the computations is not the closest distance, measured orthogonal to the subspace, but the
involved in obtaining the SVD, except to state its relationship with the more distance measured parallel to the dimension defined by the dependent
familiar eigendecomposition of a square symmetric matrix (illustrated in variable y. Figure 2.13 is a simple illustration of the two situations, showing
Example 2.6.4). We shall usually compute the generalized SVD by inter­ why the optimal fitting of subspaces is often referred to as "orthogonal
mediary of the ordinary SVD. For example, in order to compute the regression".
decomposition A = NDIlM T, where NTDwN = 1 = MTDqM, we proceed as
follows:
2.6 EXAMPLES
(i) Let B = D~2AD~/2.
This section is included to c1arify the concepts that have been described in the
(ii) Find the ordinary SVD of B: B = UD" VT. previous sections. Both theoretical and practical examples are given.
(iii) LetN = D~1/2U,M = D;I/2V,D Il = D",
(iv) Then A = NDIlM T is the generalized SVD required.
2.6.7 Distanees and sea/ar produets
This procedure is illustrated in Example 2.6.5. More details about the (a) Suppose that XI ... Xl are point vectors in J-dimensional Euclidean space and let
computation of the ordinary SVD are given in Appendix B, although most X == [XI'" Xl]. Then the symmetric [ x [ matrix A of squared distances between
readers who have access to a computer program which evaluates the SVD (or the vectors is given by:
the eigendecomposition) would probably prefer not to become involved in A = sI T + ls T - 2S (2.6.1 )
this subject. where S is the [x [ matrix of scalar products between the vectors, s is the [-vector
formed from the diagonal e1ements ofS and 1 is the [-vector of ones.
(b) Suppose conversely that we have the matrix A of squared Euc1idean distances
Comparison with mu/tip/e regression ana/ysis between 1 points. Then the matrix S of scalar products relative to a barycentre
of the points defined by the vector of masses w, where 1Tw = 1, is given by:
It is instructive to compare the fitting of subspaces to the fitting of multiple
S = -!fI)AfI) T (2.6.2)
regression models. In multiple regression analysis the I observations of the where
T
dependent variable and J independent variables are contained in an I-vector fI)=I-lw
42 Theory and App/ications o! Correspondence Ana/ysis 2. Geometric Concepts in Mu/tidimensiona/ Space 43

(a)
Proo!

S=

l
X~X¡
X2X¡

XIX¡
.
~¡.x~ ... x\X¡
T T
:

Xlx¡
I =xTx
another 3-vector y. A thus maps Xto y and because the matrix-vector multiplication
Ax is a linear operation, A is called a linear mapping.
Suppose that the co-ordinate system is changed from the standard basis el' e2, e to
a basis (¡, (2' (3 defined as follows:
(¡ = e¡ +2e 2 +e 3
(2 = e¡ + e 2 -e 3
3

The squared distance between Xi and Xi' is (3 = e 2 +e 3


b~, == (Xi - Xi') T(Xi - x¡oj = xiXi + X~Xi' - 2xiXi' What are the co-ordinates of X and y relative to this new basis and what form does
the linear mapping A take in this new co-ordinate system?
= Sii + Si'i' - 2Sii ,

Let s == [xix¡ ... XIX¡], the diagonal ofS, Then the matrix A == [b~,] can be written as So/ution
(2.6.1 ). Re-expressing vectors with respect to new bases can sometimes be confusing, so
(b) Whatever the original set of points x¡ ". X¡ might be, we know from (2.6.1~ and the following simple fact should be remembered: if a vector X has co-ordinates
from comment (2) below that A = sI T+ IsT- 2S, Since cJ)lsT = 1s T-1 wT15 = O xjB) ... xy!) with respect to a basis b¡ , .. bK then, by definition:
(wTI = 1) and similarly sI TcJ)T= O, we have cJ)AcJ)T= - 2cJ)ScJ)T. Thus we need to
X = lkX~BlJJk = BX(B) (2,6.3)
show that cJ)ScJ)T= S, which we do by showing that Sw = O. The ith element of SW is
wherex(B) == [xjB)",XY!l]T andB == [b¡".bKl
li'Sii'W i ' = li,(xi-i)T(Xi,-X)W i, where i = liWiXi
Thus if we let F == [(¡(2(3] in our problem above, then the co-ordinates X(F) and
= (X¡-i)T{li'Wi,(Xi,-x)} y(F) of Xand y respectively with respect to F must satisfy the equations:

= (Xi-X)T(i-i) X = FX(F) y = Fy(F)


=0 where
01
Comments
(1) Both results (2,6.1) and (2.6.2) are valid for a weighted Euclidean space, F = 2I 11 1 1
(2) Result (2.6.1) is independent ofthe origin ofthe space, that is if Sii' = (Xi -a) T(X i, -a)
for any a.
Now y = Ax can be written as:

I 1 -1 1
(3) The scalar product matrix S can only be recovered with respect to an origin
"interior" to the set of points, i.e. a barycentre, for example the centroid of the points Fy(F) = AFx(F)

(cf. Schoneman, 1970; Appendix). The set of barycentres of a cloud of points forms a"
convex set, bounded by the convex hull of the points. so that in terms ofthe new co-ordinates:

(4) The transformation cJ)AcJ)Tof A is called a doub/e-centering, The weighted average y(F) = F-¡AFx(F)

of each row is calculated (weighted by the respective elements of w) and subtracted


from the elements of the row (postmultiplication by cJ) T). This is repeated on the i,e, A (F) = F-¡ AF is the form of A in the new co-ordinate system. The inverse matrix
ofF is:
columns of the row-centered matrix (premultiplication by cJ)), Thus the matrix of
scalar products is -t times the double-centered matrix of squared distances.
2-1 1]
2.6.2 Data matrix as a linear mapping
Although we shall think of a data matrix as a set of column vectors or a set of row
vectors, it is also common to think of a matrix as a linear mapping. For example,
consider the following square matrix A:
and the answer is thus:
F-¡ =
l -1
-3
1-1
2-1

2 O0J
3-1 1]
A (F) = F-¡AF = O 1 O

A==

l 4
2
-1
-1
2
2
If we write y = Ax then A can be considered an operation on a 3-vector X to obtain
[O O 1
Thus in the new co-ordinate system the mapping A takes on a particular simple
form-the first co-ordinate is doubled and the other two co-ordinates remain
unchanged.
44 Theory and Applications o!Correspondence Analysis 2. Geometric Concepts in Multidimensional Space 45

Comment y¡
The aboye change of basis F was specificaHy chosen so that F-¡ AF = D A, a diagonal
matrix. This implies that AF = FD A, which looks like the eigenequation of the matrix
A. However, the vectors of F are not orthogonal to each other, hence they are not the
eigenvectors of A. The eigenvectors are thus an orthogonal basis (in fact, an ortho­
normal basis) with respect to which the matrix A takes the simple form of a diagonal s
matrix.

Exercise
Show that the matrix

l
7 -1 1
A == i -2 8-2

also takes the simple form

3 -3 9 J FIG.2.14

2 O 0
The second term of this expansion is simply equal to t TDqt, the squared distance

in the new co-ordinate system:


l 1
DA == O 1 O
O O 1
between y and Y'. The third term is zero, since y¡- y¡ = t, a constant vector, and
~¡Wi(y¡-'¡) = ~¡w¡y¡-~w¡Y¡ = y-~w¡(y;+t)

Thus:
= Y-(~w¡y;+t) = y-(Y' +t) = O

f¡ = e¡+2e 2 +e 3 ¡f¡(S'; y¡ ... y¡) = ~¡w¡(y¡ _y¡)TDq(Y¡ - y¡) +tTDqt


f 2 = -el + e 2 -e 3 Because the points y¡ (and y) lie in a subspace which is defined as aH the points of S'
f 3 = e¡ -e3 plus the vector t, then this equation shows that S is an improvement on S' by the
Show that in this case F is an orthogonal basis and hence derive the eigenvalues and positive quantity tTDqt = IItll!>,
eigenvectors of A.
2.6.4 Simple example of subspaee fitting
2.6.3 Optimal subspaee neeessarily eontains the eentroid Consider the 10 two-dimensional points defined by the columns of matrix X:
Given a set of points y¡ '" y1, with masses W¡ ". W1, in a K·dimensional weighted 2 3 5 6 7 9 9 11 13 15J
Euclidean space, show that an optimal K*-dimensional subspace (in the sense of X == [ 5 6 5 7 9 7 10 10 16 15
weighted least-squares) necessarily contains the centroid y ofthe points.
Find the ane-dimensional subspace which comes closest to aH the points (closest in
Proo! terms of least sum of squared distances), where we assume the two-dimensional space
We show that any subspace not containing y, for example S' in Fig. 2.14, is necessarily of the points to be ordinary (unweighted) Euclidean space and where each point has
sub-optimal. Ir we denote the point in S' which is closest to Y¡ by y; then the sum of equal mass. In addition, find the projections ofthe points onto this subspace.
the weighted squared distances from the points to S' is:
Solution
¡f¡(S';Y¡"'YI) == ~¡Wi(y¡-y;)TDq(y¡-yi) The centroid ofthe 10 column vectors ofX is
Without loss of generality we can assume that the masses W¡, which are non-negative,
add up to 1: ~¡w¡ = 1. Let Y' be the point in S' which is closest to y and let t be the
vector y-Y' which denotes the translation from Y' to y, as shown. FinaHy, let
x= Xl/lO = [:J
Y¡ == y; + t so that the movement from y; to y¡ is also t: y¡ - y; = t. The function ¡f¡ aboye where l is the lO-vector of ones.
may now be expanded as foHows: The matrix of deviations of the 10 points from their centroid xis thus
¡f¡(S';y¡"'YI) = ~iWi(Y¡-Y¡+Yi-y;)TDq(Y¡-Y¡+Y¡-Y;) X = [Xl -x X2 -x"'X¡O-x] = x-x¡T
= ~iWi(Y¡ _y¡)TDq(y¡_y¡)+ ~¡w¡(y¡-yi)TDiY¡-Y;)

I
=[-6-5-3-2-111357J
+ 2L¡w¡(y¡ -y¡)TDq(Yi - yi) -4 -3 -4 -2 O -2 1 1 7 6

l.
I

I[
2. Geometric Concepts in Multidimensional Space 47
46 Theory and Applications ofCorrespondence Analysis
.Ye
/
From our discussion in Section 2.5, the complete solution to the problem is contained r."'/ ·YIO
in the SYD ofX: ~<:$/
X = UDay T where UTU = yTy = I (o)
..§ J/
r;./
~~/
Since X has less rows than columns, we would first find the eigendecomposition ofXX T oQ./
(= UD;U T). The matrix XX Tis: /
/
xx T = [160 134J /
134 136
~ ¿(Y7 ·Ye
and its eigenvalues are the roots ofthe eigenequation:
/ y
IXX T -).11 = O /
Y4 /
where l...1 indicates the matrix determinant. That is: / ./ ·Ya
160-). 134 I !2
136-). = (160-),)(136-),)-134 2 = O
//
134 YI
1 • Y3

which reduces to the quadratic equation: "'Y~/


/
).2-296),+3804 = O /
/
The roots of this e$uation are 282.54 and 13.46 respectively (notice that the trace /
/
of the matrix XX is equal to the sum of the eigenvalues: 160 + 136 = 296 =
282.54+ 13.46). The eigenvector corresponding to the largest root 282.54 will define
the optimal subspace. This eigenvector satisfies the linear equation: XX TUl = 282.54u 1,
that is:
160a+ 134b = 282.54a (b) ongln

134a + 136b = 282.54b


-5
_~_ ..... _L. _1_-1-' ... _ , ...
!
-+_ ~_
5
... _+_L-+_-+_+ .... .........-L
10
¡;.. A ,..,.. "",.., 1 ,. "" I
where U1 == [a b]T, the first column ofU.
A A

YI Yz Y3 Y4 YsYa Y7 Ya Ye YIO
Either of these equations gives the relationship between a and b as a = 1.0935b, and
if we normalize U1 so that a2 + b2 = 1, we have that the unit vector defining the FIG.2.15. (a) Posltions of the point vectors in the full space. showing the
subspace is: optimalline; (b) optimal1-dimensional display of the points.
U1 = [0.738 0.675]T
These co-ordinates are thus calculated as: -7.13, -5.71, -4.91, -2.83, -0.74,
The 10 points and the unit vector U1 defining the closest subspace are given in
-0.61,1.41,2.89,8.41 and 9.21 respectively. For example:
Fig. 2.l5(a). We have also plotted the one-dimensional subspace separately in Fig.
2.15(b), with its origin at the centroid and the projections of the 10 points. There are ¡iu 1 = (-6xO.738)+(-4xO.675) = -7.128
two equivalent ways of calculating the positions of the projections in the subspace.
First we can calculate the first right singular vector v1 of X by using the result: Comments
(1) In practice we use an eigenvaluejeigenvector procedure on a computer to perform
T
V 1 = X u¡ja 1 the calculations. We have performed the calculations by hand in the above example
which is the first column of Y = XTUD; 1 (remember that we have worked with XX T to illustrate the numerical procedures involved. Alternatively a procedure to compute
here, not XTX). Then the co-ordinates with respect to u1 of the columns of X are the SYD directly can be used, if available (see Appendix B).
simply the elements of gl = al vl' which is again just the first column of G = VDa. (2) In this example we have in fact performed a principal components analysis of the
Secondly, because gl = al v1 = a 1X Tu¡ja 1 = X TU1, we can calculate gl simply as columns of the data matrix X (see Appendix A). The vector U1 defines the first
X TU1, which is the set of scalar products between the columns of X and the basis

I
principal axis of the columns of X and is the axis of maximum variance in the sense
vector U1: that the variance of the projections of the points onto U 1 is maximized (notice that in

-
T
X 1U 1 principal components analysis inertia and variance are identical). The sum of squares T
of the projections, i.e. gig1' is equal to the first eigenvalue ).1 = 282.54 of XX ,

r
-T - - T .
gl = X u1 = [X 1.. ·X 10] u1 = -T: (2.6.4)
X 10 U1
because: T r-T T T
glgl = u1XX U 1 = u 1UD,¡U U 1 =).1
48 Theory and Applications ofCorrespondence Analysis 2. Geometric Concepts in Multidimensional Space 49

(since UTU = 1) and hence the variance of the projections is gig¡/(lO-l) = 2¡/9 = the rows of y define 5 points in 4-dimensional weighted Euclidean space, where

1
31.39. The sums of squared deviations of the original rows of X are given by the weights are defined by the diagonal matrix:
diagonal of XX Tas 160 and 136 respectively, therefore the total variance of these two
rows is 160/9 + 136/9 = 32.89. Thus 100(31.39/32.89) = 95.4 % of the total variance of 17.3 O O O
the two rows of X is displayed by the projections onto the first principal axis. Since O 23.5 O O
the projections are of the form XTu l , where uiu I = 1, this shows that the elements of
D==
q O O 17.0 O
u I are the ones which maximize the variance of (normalized) linear combinations of [
O O O 42.2
the elements of the columns of X. This is the usual definition of principal components
analysis by Hotelling (1933) and is discussed later in the context of correspondence and let the points have associated masses proportional to 5.7, 9.3, 26.4, 45.6 and 13.0
analysis as well as in general in Appendix A. respectively (these values sum to 100). Calculate the first two principal axes of these 5
(3) Result (2.6.4) is a special case of the following general result in any scalar product points, the projections of the 5 points onto the principal plane and the percentage of
space: if u is a unit vector in the metric of the space, then the length of the (orthogonal) inertia of the points which is represented by this planeo
projection of a vector x onto the subspace defined by u is simply the scalar product of
Solution
x and u. For example, in weighted Euclidean space where scalar products are defined
by the diagonal matrix D q, with positive diagonal elements, then if u TDqu = 1, the r
Let the rows of Y be yi ... y where Yi is a 4-vector. Let w == [5.7 9.3 26.4 45.6 13.0]T
be the vector of masses of the points and D w == diag(w), the diagonal matrix of these
length of the projection of x onto the subspace defined by u is x TDqu. (In (2.6.4)
D q = l.) The projection is thus the vector (xTDqu)u and the vector from the projection masses. Since 1TW = ~iWi = 100.0, the centroid of the 5 points is y = ~iWiyJ100 =
to x is x - (x TDqu)u. It is easily seen that this latter vector is orthogonal to u (always [31.6 23.3 32.2 13.0]T.
in the metric D q' of course): The matrix of deviations from the centroid is thus:

uTDix - (x TDqu)u] = u TDqx - (x TDqu)u TDqu = O 4.8 -5.1 -4.9 5.2


-9.4 -6.6 6.7 9.2
Furthermore, an even more general result is that if the columns of the 1 x K matrix
U are orthonormal (in the metric D q ) then the orthogonal projection of a vector x
onto the K-dimensional subspace defined by U is the vector UUTDqx. The co­
- == Y-ly-T =
Y I 17.4 -3.7 -8.7 -5.2
-11.1 4.0 5.3 1.8
ordinates of x with respect to the basis vectors of U (cf. (2.6.3» are thus the elements
of UTDqx, which are the scalar products of each basis vector with x. The operation of 8.4 0.7 -4.2 -5.0
orthogonally projecting x onto the subspace defined by U is a linear mapping, defined From the results in Section 2.5 the results we require can be obtained from the
by the matrix UUTD q.
generalized SVD of Y in the metrics D w and D q respectively. This leads us to consider
Exercise
the ordinary SVD ofthe matrix S = D~2YD;:2
In Section 2.5 we discussed the generalized SVD of a matrix A = ND~MT as well as 47.7 -59.0 -48.2 80.7
the rank K* approximation A[K"] defined in (2.5.8). The scalar product and metric -119.2 -97.6 84.2 182.3
in the space of the columns of A is defined by D q and the singular vectors N are
orthonormal in this space, since NTDqN = 1. Verify that the columns of A[K"] are S= I 371.9 -92.2 -184.3 -173.6
orthogonal projections of the columns of A onto the subspace defined by N(K'h the -311.8 130.9 147.6 79.0
first K* columns ofN, that is:
126.0 12.2 -62.4 -117.1
A[K'] = N (K')N(K,)DqA (2.6.5)
(e.g. S12 = Y12(WIQ2)1/2 = -5.1(5.7 X 23.5)1 /2 = -59.0).

--
2.6.5
..."~.-
Another examp/e of subspace fitting
Consider the following data matrix of percentages:
We are interested in the rank 2 approximation ofS, which can be computed as:

0.058 0.462
-0.288 0.737
- 36.4
22.2
18.2 27.3
16.7 38.9
18.2
22.2 S[2] = U(2)D~(2)V(2) = I 0.718 0.048
Y
5x4
I
== 49.0 19.6 23.5 7.8 -0.572 -0.398
20.5 27.3 37.5 14.8 0.267 - 0.288
-'

40.0 24.0 28..0 8.0 639.4 O J[0.807 -0.177 -0.407 -0.389J


L [ O 233.3 0.171 -0.683 -0.042 0.709
where each row of Y adds up to 100 % (up to a possible rounding error of 0.1 %). Let
50 Theory and Applications of Correspondence Analysis 2. Geometric Concepts in Multidimensional Space 51

Then we know that N_(2) D_μ(2) M_(2)^T is the generalized rank 2 approximation of Ȳ, with N_(2) = D_w^{-1/2} U_(2) and M_(2) = D_q^{-1/2} V_(2), and that the two principal axes are defined by the orthonormal basis vectors in the columns of M_(2):

                                0.194   0.041
    M_(2) = D_q^{-1/2} V_(2) = -0.037  -0.141
                               -0.099  -0.010
                               -0.060   0.109

The projections of the points onto the subspace defined by these two axes are the rows of the matrix F ≡ N_(2) D_μ(2) (remember that we are dealing with the rows of the matrix Y):

                                                  15.5   45.1
                                                 -60.4   56.4
    F ≡ N_(2) D_μ(2) = D_w^{-1/2} U_(2) D_μ(2) =  89.4    2.2
                                                 -54.2  -13.8
                                                  47.3  -18.6

Since these co-ordinates are with respect to an orthonormal basis we can plot them on the usual rectangular co-ordinate system (Fig. 2.16).

[FIG. 2.16. Optimal 2-dimensional display of the rows of the matrix Y (points R1 to R5); the principal inertias marked on the axes are λ1 = 408 832 (87.8%) and λ2 = 54 429 (11.7%).]

In calculations of inertia it is customary to use the relative values of the masses (as in the centroid calculation); however this does not affect our determining the value of the inertia in the plane relative to the total inertia of the five points. The total inertia of the five points is their weighted sum of squared distances to the centroid: Σ_i w_i (y_i - ȳ)^T D_q (y_i - ȳ), while the inertia in the plane is the weighted sum of squared projected distances: Σ_i w_i f_i^T f_i. Of course we do not actually have to evaluate these sums because the squares of the singular values of S give the moments of inertia of the points along the respective principal axes, that is the principal inertias (cf. (2.5.9)-(2.5.11)). The third and fourth singular values of S are 47.8 and 1.8 respectively, and this gives a total inertia of (639.4)² + (233.3)² + (47.8)² + (1.8)² = 465549. The inertia in the plane is (639.4)² + (233.3)², which is 99.5% of the total inertia. Thus practically all of the variation of the points is contained in the subspace of the first two principal axes; Fig. 2.16 is an almost exact representation of the five points. The inertias along the axes are usually denoted by λ_k (k = 1 ... K).

Comments

(1) The total inertia is also the sum of squares of all the elements of S. (This general result is proved by expressing the sum of squares of all the elements of S as tr(SS^T).)
(2) If we place ourselves in the usual Euclidean reference system the basis vectors of M_(2) are neither orthogonal nor normalized. The sense of the orthonormality of M_(2) is with respect to D_q, the metric of the row vectors: M_(2)^T D_q M_(2) = I. Here we stress again the important fact that the rows of F, which are co-ordinates with respect to M_(2), can be considered to be ordinary Euclidean vectors for purposes of distance and scalar product calculations (cf. Section 2.3). So whereas we cannot easily imagine the space in which the row vectors reside, the plotting of F in a Euclidean space brings the points back into our familiar "frame of reference".
(3) The data matrix Y in the above problem has a property which has not been fully discussed, namely that the sum of each row is a constant (100%). This implies that the rank of Ȳ (and thus of S too) is actually 3, not 4, and the fact that we obtained a fourth singular value of 1.8 for S is merely rounding error; the fourth singular value is theoretically zero. Yet another property of the problem is that the inverses of the dimension weights 1/q_1 ... 1/q_4 are proportional to the elements ȳ_1 ... ȳ_4 of the row centroid. The next example shows that in this particular situation we can actually omit the centering of the rows and find the generalized SVD of Y itself, in which case the centroid of the rows is "contained" in the SVD. This is a situation which we shall meet in correspondence analysis.
2.6.6 Case when centroid vector is coincident with a "trivial" principal axis

Suppose Y (I x J) is a data matrix and that the sum of each row of Y is a constant c. Let the rows y_1^T ... y_I^T of Y define I points in J-dimensional weighted Euclidean space, where the weights are defined by the J x J diagonal matrix D_q. Let the points have associated masses contained in the I x 1 vector w and let D_w ≡ diag(w). Suppose further that the dimension weights q_1 ... q_J are inversely proportional to the elements ȳ_1 ... ȳ_J of the centroid ȳ of the I points. Show that the principal axes of the points may be obtained from the generalized SVD of the uncentered matrix Y.

Proof

As in Example 2.6.5 the principal axes of the rows of Y are the right singular vectors of the centered matrix Ȳ:

    Ȳ ≡ Y - 1ȳ^T = N D_μ M^T          (2.6.6)
where

    N^T D_w N = M^T D_q M = I

and

    ȳ = Y^T w / 1^T w = Σ_i w_i y_i / Σ_i w_i

We can thus write:

    Y = 1ȳ^T + N D_μ M^T

We first show that 1 is orthogonal to N in the metric D_w and ȳ is orthogonal to M in the metric D_q. To show the latter orthogonality the two given conditions are sufficient:

    q_j = α/ȳ_j        j = 1 ... J          (2.6.7)
    y_i^T 1 = c        i = 1 ... I          (2.6.8)

From (2.6.6) we have

    M = (Y^T - ȳ1^T) D_w N D_μ^{-1}

and thus

    ȳ^T D_q M = ȳ^T D_q (Y^T - ȳ1^T) D_w N D_μ^{-1} = 0^T

because

    ȳ^T D_q (Y^T - ȳ1^T) = α 1^T (Y^T - ȳ1^T) = α(c1^T - c1^T) = 0^T

where we have used the results ȳ = α D_q^{-1} 1 and Y1 = c1, which are (2.6.7) and (2.6.8) in matrix notation, and also the result that the sum of the elements of the centroid, like that of each row, is c:

    1^T ȳ = 1^T Y^T w / 1^T w = c 1^T w / 1^T w = c

Hence ȳ is orthogonal to M with respect to D_q.
Similarly, from (2.6.6) we have

    N = (Y - 1ȳ^T) D_q M D_μ^{-1}

and thus

    1^T D_w N = 1^T D_w (Y - 1ȳ^T) D_q M D_μ^{-1} = 0^T

because

    1^T D_w (Y - 1ȳ^T) = w^T Y - (w^T 1) w^T Y / w^T 1 = 0^T

and hence 1 is orthogonal to N with respect to D_w. The norms of 1 and ȳ with respect to D_w and D_q are respectively:

    1^T D_w 1 = 1^T w
    ȳ^T D_q ȳ = α ȳ^T D_q D_q^{-1} 1 = α ȳ^T 1 = αc

Therefore we define

    n_0 ≡ 1/(1^T w)^{1/2}        m_0 ≡ ȳ/(αc)^{1/2}        μ_0 ≡ (αc 1^T w)^{1/2}

and then

    Y = [n_0 N] [ μ_0  0^T ] [m_0 M]^T          (2.6.9)
                [  0   D_μ ]

is the generalized SVD of Y, with

    [n_0 N]^T D_w [n_0 N] = [m_0 M]^T D_q [m_0 M] = I

The singular value μ_0 must be the largest singular value of Y because μ_0 n_0 m_0^T = 1ȳ^T is the closest rank 1 matrix to Y, ȳ being the closest point to the cloud of points y_1 ... y_I in terms of weighted sum of squared distances.
Thus the SVD of Ȳ is a part of the SVD of Y: for example the first and second principal axes of the rows of Y (M_(2) in our previous notation) are given by the second and third right singular vectors of Y, and so on.

Comments

The result (2.6.9) also demonstrates that Ȳ is of rank one less than Y: the operation of centering Y in this particular case removes exactly one dimension from the row space of Y.
Another way of stating this is that the subspace of the row points, which is the subspace containing the vector deviations y_i - ȳ and defined by the orthonormal principal axes M, is orthogonal to the vector ȳ: this is just the result ȳ^T D_q M = 0^T which we have proved above.

Exercise

If you have access to a computer and also have a subroutine which can compute the eigendecomposition or SVD, compute the generalized SVD of the uncentered matrix Y in Example 2.6.5 and check that the rank 3 SVD of Y consists firstly of μ_0 n_0 m_0^T as in (2.6.9) (where c = 100 and α can be evaluated as approximately 547) and secondly of the rank 2 SVD of Ȳ as computed in Example 2.6.5.
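A minimal version of this exercise in Python/numpy follows (a sketch only; the weights are again reconstructed from α ≈ 547, so agreement with the quoted values is up to rounding):

    import numpy as np

    Y = np.array([[36.4, 18.2, 27.3, 18.2],
                  [22.2, 16.7, 38.9, 22.2],
                  [49.0, 19.6, 23.5,  7.8],
                  [20.5, 27.3, 37.5, 14.8],
                  [40.0, 24.0, 28.0,  8.0]])
    w = np.array([5.7, 9.3, 26.4, 45.6, 13.0])
    alpha, c = 547.0, 100.0                 # alpha is approximate, as stated in the exercise
    ybar = w @ Y / w.sum()
    q = alpha / ybar                        # condition (2.6.7)
    Su = np.diag(np.sqrt(w)) @ Y @ np.diag(np.sqrt(q))    # uncentered matrix
    s = np.linalg.svd(Su, compute_uv=False)
    print(np.round(s, 1))                   # approx [2338.8 639.4 233.3 47.8]
    print(round(np.sqrt(alpha*c*w.sum()), 1))  # mu_0 = (alpha c 1'w)^(1/2), approx 2338.8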
3. Simple Illustrations of Correspondence Analysis

Correspondence analysis is a technique for displaying the rows and columns of a data matrix (primarily, a two-way contingency table) as points in dual low-dimensional vector spaces. In this chapter we shall treat some small data matrices, which permit the introduction of the special concepts, notation, computations and style of the analysis. These examples, although fairly trivial, should be fully understood before proceeding to subsequent chapters.
In Section 3.1 we discuss a typical problem which leads to the rows of a data matrix being displayed. In Section 3.2 the dual problem of displaying the columns of the same data matrix is discussed, as well as its relationship to the display of the rows. Section 3.3 deals with the interpretation of the displays in terms of various contributions to the so-called "inertia" of the matrix. The important concept of a supplementary point in the display is also discussed. In Section 3.4 the geometry of correspondence analysis is further illustrated by showing that the row and column points reside in "stretched" barycentric co-ordinate spaces. The chapter concludes with some complementary theoretical examples.

3.1 A TYPICAL PROBLEM

The data are collected in the following context. After publication of a nationwide study of the dangers of smoking, the head of a large company is concerned about the cigarette smoking habits of his staff and he decides to conduct a survey. In consultation with the company's statistician, he decides that the staff members may be categorized into five groups of interest: (1) senior management, (2) junior management, (3) senior employees, (4) junior employees and (5) secretarial staff. A 10% random sample is drawn within each of these five groups and every person in this sample is asked whether he or she (a) does not smoke; (b) smokes 1-10 cigarettes a day; (c) smokes 11-20 cigarettes a day; or (d) smokes more than 20 cigarettes a day. These categories are chosen to separate respectively non-smokers, light, medium and heavy smokers. Data from 193 people are collected and the results are contained in the contingency table of Table 3.1.

TABLE 3.1
Matrix of artificial data: the frequencies of different types of smokers in a sample of personnel from a fictitious organization.

                                       Smoking category                    Row
    Staff group            (1) None  (2) Light  (3) Medium  (4) Heavy   totals
    (1) Senior managers        4         2          3           2         11
    (2) Junior managers        4         3          7           4         18
    (3) Senior employees      25        10         12           4         51
    (4) Junior employees      18        24         33          13         88
    (5) Secretaries           10         6          7           2         25
    Column totals             61        45         62          25        193
Relative frequencies of smoking categories within each staff group are given in Table 3.2 and this allows for easier comparison between the groups. To facilitate interpretation even further, the statistician draws a histogram next to each of these rows. Each row of relative frequencies is the row of original frequencies, divided by its total, and is often multiplied by 100 to be expressed as a percentage. In conventional statistical terminology a set of relative frequencies (which add up to 1) is often called a sample probability density. However, in correspondence analysis we shall use the term profile (see Section 2.4) because in some other contexts a probabilistic interpretation is not applicable.
Thinking of the profiles as points in 4-dimensional space, we have already justified the weighting of each profile by the number of sampling units (e.g. people) that constitute the profile (see Section 2.4), as well as the weighting of the dimensions by the respective inverses of the expected, or average, profile (thus defining chi-square distances between profiles, see Section 2.3). The masses allocated to the 5 points are thus the row totals of Table 3.1 divided by the total of the matrix, namely 11/193 = 0.057, 18/193 = 0.093, etc., as indicated in the last column of Table 3.2.
TABLE 3.2
Relative frequencies of smoking categories within staff groups (i.e. row profiles), each depicted graphically in the form of a histogram, and the row masses.

                          (1) None  (2) Light  (3) Medium  (4) Heavy   Masses
    (1) Senior managers     0.364     0.182      0.273       0.182      0.057
    (2) Junior managers     0.222     0.167      0.389       0.222      0.093
    (3) Senior employees    0.490     0.196      0.235       0.078      0.264
    (4) Junior employees    0.205     0.273      0.375       0.148      0.456
    (5) Secretaries         0.400     0.240      0.280       0.080      0.130
    Total sample            0.316     0.233      0.321       0.130

It is easy to show that the centroid of the row profiles with these masses is the profile of the column totals of Table 3.1, the average profile given in the last row of Table 3.2.
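The profiles, masses and centroid are immediate to compute from Table 3.1; a short sketch (Python with numpy assumed; not part of the text) that reproduces Table 3.2 and checks the centroid property just stated:

    import numpy as np

    # Table 3.1 as a frequency matrix
    N = np.array([[ 4,  2,  3,  2],
                  [ 4,  3,  7,  4],
                  [25, 10, 12,  4],
                  [18, 24, 33, 13],
                  [10,  6,  7,  2]])
    n = N.sum()                            # 193
    r = N.sum(axis=1) / n                  # row masses: 0.057, 0.093, 0.264, 0.456, 0.130
    R = N / N.sum(axis=1, keepdims=True)   # row profiles of Table 3.2
    c = N.sum(axis=0) / n                  # average profile: 0.316, 0.233, 0.321, 0.130
    print(np.round(R, 3))
    print(np.allclose(r @ R, c))           # True: the weighted centroid is the average profile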
This is exactly the situation described in Example 2.6.6: the weights on the dimensions are inversely proportional to the co-ordinates of the centroid and in fact the proportionality constant α of (2.6.7) is equal to 1. Thus the geometry of the 5 profile points is completely specified by computing either the generalized SVD of the matrix of profiles or that of the matrix of the profiles' deviations from their centroid.
In fact if we compare the present problem to that of Example 2.6.5 we notice that the only numerical difference is one of scale in each of the three entities (the triplet) describing the problem. First, the data in that example are 100 times our present profile data. Secondly, the masses there are also 100 times our present masses. And thirdly, the dimension weights defining the diagonal metric are approximately 5.47 times the present dimension weights. (Notice that in Example 2.6.5 the dimension weights were chosen to add up to 100, whereas here the inverses of the weights add up to 1.) In Example 3.5.1 we shall show the general result that any changes in scale of the triplet of data merely filter through as related changes in scale of the final co-ordinates and principal axes of the points. This result is intuitive because of the matrix-multiplicative form of the SVD and its orthonormalization conditions on the singular vectors. However, notice that this result is only true in our present situation where the inverse relationship of the dimension weights and the centroid elements ensures that the SVD of the uncentered matrix "contains" the SVD of the centered matrix.
Let us denote the matrix of row profiles in Table 3.2 by R, and the diagonal matrices of the masses and the centroid elements by D_r and D_c respectively, so that the centroid is c = D_c 1 and the metric in the space of the profiles is defined by D_c^{-1}. The co-ordinates of the profiles in the optimal two-dimensional subspace are thus provided by the rows of the 5 x 2 matrix F_(2):

    F_(2) = N_(2) D_μ(2)          (3.1.1)

where N_(2) and D_μ(2) are the appropriate submatrices of the generalized SVD of R - 1c^T (or of R, omitting the trivial dimension):

    R - 1c^T = N D_μ M^T    where    N^T D_r N = M^T D_c^{-1} M = I          (3.1.2)

The generalized singular values of R - 1c^T are computed to be 0.2734, 0.1001 and 0.0203. The co-ordinate matrix (3.1.1) is:

              0.066  -0.194
             -0.259  -0.243
    F_(2) =   0.381  -0.011          (3.1.3)
             -0.233   0.058
              0.201   0.078

and can be seen to differ from the matrix F_(2) of Example 2.6.5 by a constant of proportionality, as expected (the present F_(2) should be multiplied by approximately 234). Notice that the signs of the columns of F_(2) are not determined and any changes in sign of the present columns of F_(2) (compared to F_(2) of Example 2.6.5, say) will be accompanied by a change in sign of the corresponding singular vectors of M_(2), computed as:

              0.455  -0.096
    M_(2) =  -0.085   0.329          (3.1.4)
             -0.231   0.024
             -0.139  -0.256

Compared to F_(2) and M_(2) of Example 2.6.5, the sign of the second column is reversed, otherwise the values can be seen to be in direct proportion.
We can again plot the rows of F_(2) as points in the usual rectangular co-ordinate system (Fig. 3.1) and the relative positions of the points are identical to those of Fig. 2.16. Because of the change in sign of the co-ordinates on the second dimension, each figure is a mirror image of the other. The display of points in Fig. 3.1 may be called the correspondence analysis of the row profiles of the original data matrix N.

[FIG. 3.1. Graphical display of the row profiles of Table 3.2 with respect to the best-fitting plane. The inertias, denoted by λ1 = 0.0748 (87.8%) and λ2 = 0.0100 (11.7%), and their percentages are shown on their respective axes.]
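The singular values and the co-ordinates in (3.1.3) can be reproduced directly; a sketch assuming Python with numpy (column signs may differ, as noted above):

    import numpy as np

    N = np.array([[4, 2, 3, 2], [4, 3, 7, 4], [25, 10, 12, 4],
                  [18, 24, 33, 13], [10, 6, 7, 2]])
    n = N.sum(); r = N.sum(1)/n; c = N.sum(0)/n
    R = N / N.sum(1, keepdims=True)
    # generalized SVD of R - 1c^T in the metrics D_r and D_c^(-1), via the
    # ordinary SVD of D_r^(1/2) (R - 1c^T) D_c^(-1/2), cf. (3.1.2)
    S = np.diag(np.sqrt(r)) @ (R - c) @ np.diag(1/np.sqrt(c))
    U, mu, Vt = np.linalg.svd(S, full_matrices=False)
    print(np.round(mu, 4))                      # 0.2734, 0.1001, 0.0203, ~0
    F2 = (np.diag(1/np.sqrt(r)) @ U)[:, :2] * mu[:2]
    print(np.round(F2, 3))                      # (3.1.3), up to the sign of each column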

Notice that in correspondence analysis the three entities of the triplet are all relative in nature. The profiles are vectors of relative frequencies: frequencies relative to their respective totals. The masses assigned to the profiles are relative masses: masses relative to the total mass. (In fact the set of masses also defines a profile, as we shall see in the next section.) The dimension weights are the inverses of the centroid elements, where the centroid is also a profile (the average row profile). Each of these profile vectors comprises a set of values which sum to 1, and with this as a prerequisite in the definition of the triplet any changes in scale of the row profiles, masses or metric are "divided out" again. In fact, just one quantity links the correspondence analysis described above with the absolute values of the data in N, namely the total of N, the number of units partitioned in the contingency table.
The overall quality of representation of the points in Fig. 3.1, calculated in terms of the relative values of sums of squared singular values, is exactly the same as in Example 2.6.5, namely 99.5%. (The total inertia is equal to the sum of the squared singular values: (0.2734)² + (0.1001)² + (0.0203)² = 0.08518. Conventionally the percentages of inertia are written on the axes; for example (0.2734)² = 0.0748 and (0.1001)² = 0.0100 are 87.8% and 11.7% of the total inertia respectively, totalling 99.5% of the inertia in the plane.) We can thus interpret Fig. 3.1 as giving the almost exact positions of the points.
Although this example has been constructed for illustrative purposes rather than as a serious application of correspondence analysis, let us nevertheless comment briefly on the interpretation of the display. Notice that the senior employees and secretaries are relatively similar to each other in terms of their smoking habits. Junior managers and junior employees are relatively far from those groups, with senior managers lying almost midway between junior managers and senior employees. In this way the rows of data are mapped to points in a plane and our examination of the relative positions of the points suggests similarities and differences amongst the staff's smoking habits.

3.2 THE DUAL PROBLEM

In Section 3.1 we have investigated the geometry of the row profiles of the contingency table N. In a similar and symmetric fashion we can investigate the geometry of the column profiles of N. As we shall illustrate in our simple example and show more formally in Chapter 4, the geometry of the column profiles is directly related to the geometry of the row profiles in a number of ways, hence the name "correspondence analysis".
Let us thus consider N as a set of columns rather than a set of rows. A convenient way of thinking about this is that we apply all our discussion of Section 3.1 precisely to N^T. If we divide each row of N^T by its total we have a matrix C (J x I) of column profiles. In our example this is the profile of a smoking category across the staff groups. The 4 smoking category profiles, given in Table 3.3, define points in 5-dimensional space. If these profiles are weighted by masses proportional to the column totals then the centroid of the profiles turns out to be the profile r of the row totals. The masses are in fact equal to the elements of c, the centroid of the row profiles (cf. Table 3.2) and the centroid r of the column profiles is, symmetrically, the vector of masses of the row profiles. In the space of the column profiles the metric is defined in a symmetric fashion by D_r^{-1} so that each dimension is again weighted inversely by the element of the average (or expected) profile. The triplet defining the dual problem is thus C, c and D_r^{-1}. The co-ordinates of the column profiles with respect to their optimal 2-dimensional subspace are thus provided by the rows of the 4 x 2 matrix G_(2):

    G_(2) = M̃_(2) D_μ(2)          (3.2.1)

where M̃_(2) and D_μ(2) are the appropriate submatrices of the generalized SVD of C - 1r^T (or of C, omitting the first trivial dimension):

    C - 1r^T = M̃ D_μ Ñ^T    where    M̃^T D_c M̃ = Ñ^T D_r^{-1} Ñ = I          (3.2.2)

We compute the matrix G_(2) of co-ordinates and the matrix Ñ_(2) defining the two principal axes in 5-dimensional space as:

              0.393  -0.031
    G_(2) =  -0.100   0.141          (3.2.3)
             -0.196   0.007
             -0.294  -0.198

              0.014  -0.110
             -0.088  -0.226
    Ñ_(2) =   0.368  -0.028          (3.2.4)
             -0.388   0.263
              0.095   0.102

The singular values are 0.2734, 0.1001 and 0.0203, exactly the same as those of the analysis of the row profiles; hence our use of the same notation μ for the singular values in both cases. Notice that the 4 points occupy a 3-dimensional space.
As we shall show more formally in Chapter 4 the matrices G and Ñ are
related in a very simple way to the matrices M and F respectively of Section 3.1 (see (3.1.3), (3.1.4)):

    G = D_c^{-1} M D_μ    or    M = D_c G D_μ^{-1}          (3.2.5)

    Ñ = D_r F D_μ^{-1}    or    F = D_r^{-1} Ñ D_μ          (3.2.6)

For example, the element m_31 of M is -0.231. The corresponding element of G is thus (from (3.2.5)): m_31 μ_1/c_3 = (-0.231)(0.273)/(0.321) = -0.196, which checks with (3.2.3). Notice that the signs of the columns of G and M as well as those of F and Ñ must agree. If we computed the solutions of the two problems separately it could happen that the signs of the columns of these matrices might differ. However, having realized that the two problems are related in this way we no longer treat them separately. Solving one problem is sufficient since we can obtain the other solution thanks to (3.2.5) and (3.2.6) if need be, in which case the agreement of signs is implicit.
The 2-dimensional graphical display of the points representing the 4 smoking types (the rows of G_(2)) is given in Fig. 3.2. Notice that the principal inertias and thus their percentages are identical to those of the corresponding Fig. 3.1.

TABLE 3.3
Relative frequencies of staff groups within smoking categories (i.e. column profiles), and the average column profile.

                          (1) None  (2) Light  (3) Medium  (4) Heavy   Average
    (1) Senior managers     0.066     0.044      0.048       0.080      0.057
    (2) Junior managers     0.066     0.067      0.113       0.160      0.093
    (3) Senior employees    0.410     0.222      0.194       0.160      0.264
    (4) Junior employees    0.295     0.533      0.532       0.520      0.456
    (5) Secretaries         0.164     0.133      0.113       0.080      0.130

[FIG. 3.2. Graphical display of the column profiles of Table 3.3 with respect to the best-fitting plane. The axes carry the inertias λ1 = 0.0748 (87.8%) and λ2 = 0.0100 (11.7%); the points are no smoking, light smoking, medium smoking and heavy smoking.]
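The relations (3.2.5) and (3.2.6) are easy to confirm numerically by computing both analyses and comparing; a sketch in Python/numpy (column signs may differ between the two SVDs, so absolute values are compared):

    import numpy as np

    N = np.array([[4, 2, 3, 2], [4, 3, 7, 4], [25, 10, 12, 4],
                  [18, 24, 33, 13], [10, 6, 7, 2]])
    n = N.sum(); r = N.sum(1)/n; c = N.sum(0)/n
    R = N / N.sum(1, keepdims=True)              # row profiles
    C = (N / N.sum(0, keepdims=True)).T          # column profiles (4 x 5)

    # row problem (3.1.2)
    U, mu, Vt = np.linalg.svd(np.diag(np.sqrt(r)) @ (R - c) @ np.diag(1/np.sqrt(c)),
                              full_matrices=False)
    M = (np.diag(np.sqrt(c)) @ Vt.T)[:, :2]      # principal axes of the row problem
    F = (np.diag(1/np.sqrt(r)) @ U)[:, :2] * mu[:2]
    G = np.diag(1/c) @ M * mu[:2]                # (3.2.5)
    Ntil = np.diag(r) @ F / mu[:2]               # (3.2.6)

    # dual problem (3.2.2), solved directly
    Ud, mud, Vtd = np.linalg.svd(np.diag(np.sqrt(c)) @ (C - r) @ np.diag(1/np.sqrt(r)),
                                 full_matrices=False)
    Gd = (np.diag(1/np.sqrt(c)) @ Ud)[:, :2] * mud[:2]
    print(np.allclose(abs(G), abs(Gd)))          # True: (3.2.5) reproduces the dual co-ordinates
    print(np.round(Ntil, 3))                     # (3.2.4), up to the sign of each column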

Formulae (3.2.5) and (3.2.6) tell us that the co-ordinates of the profile points with respect to their principal axes in the one problem are related (by simple pre- and postmultiplication of diagonal matrices) to the actual principal axes of the profile points in the other problem, and vice versa. This symmetry of the two problems, along with the fact that the singular values, and thus their squares, the principal inertias, are the same in both problems, is the heart of the duality.
In practice we are usually not interested in the matrices M and Ñ which define the principal axes in the dual problems. In the displays of Figs 3.1 and 3.2 the co-ordinate system has become that of the principal axes and because we are interested in the relative positions of the points, say the row profiles in Fig. 3.1, the relationship between the new and the old co-ordinate systems in the row profile space is of secondary importance. However, based on (3.2.5) in this case, our interest in the position of the column profiles in Fig. 3.2 is seen to be related to that very change of co-ordinate system. This is an important point which needs careful consideration in order to understand fully the duality of the geometric concepts in correspondence analysis, where practically all of the entities are serving dual purposes.
Another way of writing (3.2.5) and (3.2.6), in terms of the co-ordinate matrices F and G only, is as follows:

    G = C F D_μ^{-1}          (3.2.7)
    F = R G D_μ^{-1}          (3.2.8)

These formulae, which we shall prove in Chapter 4, are known as the transition formulae, because they describe how to pass between the co-ordinate matrices of the two dual problems. As an illustration of the geometric implications of these formulae let us suppose that we know the co-ordinates of the column profiles with respect to their first two principal axes, that is we know the matrix G_(2) of (3.2.3). The first row of R, the matrix of row profiles, is:

    r_1^T = [0.364  0.182  0.273  0.182]

(this is the profile of senior managers across the smoking categories). In terms of transition formula (3.2.8), the co-ordinates of this row profile with respect to the first two principal axes of the row profiles are given by the first row of F_(2):

    f_1^T = r_1^T G_(2) D_μ(2)^{-1}
          = (0.364 g_1^T + 0.182 g_2^T + 0.273 g_3^T + 0.182 g_4^T) D_μ(2)^{-1}          (3.2.9)

where g_1^T ... g_4^T are the rows of G_(2). The expression in parentheses is a barycentre of the 4 column profile points, since the sum of the elements of the profile vector r_1 is 1. The postmultiplication by D_μ(2)^{-1} means that the co-ordinates of the resultant barycentre are divided by the singular values μ_1 = 0.2734 and μ_2 = 0.1001 respectively. Thus the co-ordinates of the first row profile are:

    [(0.364 × 0.393) + (0.182 × (-0.100)) + (0.273 × (-0.196)) + (0.182 × (-0.294))]/0.2734 = 0.065

    [(0.364 × (-0.031)) + (0.182 × 0.141) + (0.273 × 0.007) + (0.182 × (-0.198))]/0.1001 = -0.197

Allowing for rounding error these values agree with the first row of F_(2) in (3.1.3).
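In code the transition (3.2.9) is one line; a sketch assuming numpy, using the rounded values of G_(2), so the result matches (3.1.3) only approximately:

    import numpy as np

    r1 = np.array([0.364, 0.182, 0.273, 0.182])     # senior managers profile
    G2 = np.array([[ 0.393, -0.031],
                   [-0.100,  0.141],
                   [-0.196,  0.007],
                   [-0.294, -0.198]])               # column co-ordinates (3.2.3)
    mu = np.array([0.2734, 0.1001])                 # singular values
    f1 = (r1 @ G2) / mu                             # transition formula (3.2.9)
    print(np.round(f1, 3))                          # approx [0.065 -0.197]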
Geometrically it is clear that a particular row profile tends to a position (in its space) which corresponds to the smoking categories which are prominent in that row profile. For example, the "non-smoking" point, defined by the first column profile, lies on the positive side (0.393) of the first principal axis and any row profile which is relatively high on non-smokers will lie on the positive side of its first principal axis. The "expansion" of the co-ordinates by dividing by the respective singular values is necessary because there is a symmetric transition formula from the set of row profile points to the individual column profile points and the two sets of points cannot both be barycentres of each other. In our example transition formula (3.2.7) would mean that, given the display of the row profiles (the staff groups), a particular smoking category tends along principal axes in the direction of the staff groups which are relatively prominent in that category.
Because of the geometric correspondence of the two clouds of points, both in position as well as in inertia, the displays of Figs 3.1 and 3.2 may be merged into one joint display (Fig. 3.3). There are advantages and disadvantages of this simultaneous display. Clearly an advantage is the very concise graphical display expressing a number of different features of the data in a single picture. The display of each cloud of points indicates the nature of similarities and dispersion within the cloud, while the joint display indicates the correspondence between the clouds. Notice, however, that we should avoid the danger of interpreting distances between points of different clouds, since no such distances have been explicitly defined. Distances between points within the same cloud are defined in terms of the relevant chi-square distance, while the between-cloud correspondence is governed by the barycentric nature of the transition formulae, as described above.

Principle of distributional equivalence

A further advantage of the use of the chi-square distance and the resultant duality between the two clouds of points is called the "principle of distributional equivalence". This principle is very important to the French statisticians
who developed the technique in the context of linguistics (Benzécri, 1963). Briefly, this principle states that if two profiles, row profiles say, are identical ("distributionally equivalent"), then these two rows of the original contingency table may be added together to give a single row without affecting the geometry of the column profiles. A symmetric result is true for identical column profiles. Geometrically this means that we can merge two points of a cloud which lie at identical positions into a new point which has the mass of both points, and this does not affect the geometry of the points in the other cloud. This unique result, which is peculiar to the geometry of correspondence analysis, is proved in Section 4.1.17.

[FIG. 3.3. Correspondence analysis of the data in Table 3.1, with the points displayed in the principal plane; λ1 = 0.0748 (87.8%), λ2 = 0.0100 (11.7%). This is the joint display of Figs 3.1 and 3.2. Row points: senior managers (SM), junior managers (JM), senior employees (SE), junior employees (JE), secretaries (SC); column points: no smoking (NO), light smoking (LI), medium smoking (ME), heavy smoking (HV).]

3.3 DECOMPOSITION OF INERTIA

In Sections 3.1 and 3.2 we have described the two dual problems which make up a correspondence analysis, how the displays of the row and column profiles are obtained and the reasons for merging these two displays into one. The display in Fig. 3.3 represents the graphical result of the correspondence analysis of Table 3.1. In this particular example this is an almost exact representation of the points, which are actually in 3-dimensional space, because only 0.5% of the total inertia of the points is not represented in this 2-dimensional subspace. In practice, however, when we deal with much larger data matrices, we seldom obtain such excellent 2-dimensional displays. If a large percentage of the total inertia lies along other principal axes then it means that some points are not being well represented with respect to the first two principal axes. Since the actual 2-dimensional display shows the projections of the true points onto the plane and does not show which points lie close to the plane and which are further off, we need to consider additional information if we are to interpret the display correctly. Remember that we are trying to understand the geometry of a set of high-dimensional points through an approximate low-dimensional display and we must know where the display is accurate and where not. This is analogous to many other areas of statistics, for example in constructing a model for data, where we must study both the model as well as the quality of fit of that model to the data, where the model fits the data well and where not.

Contributions to inertia

To illustrate the principles involved, let us suppose that we choose a 1-dimensional correspondence analysis of the data of Table 3.1, in other words the final display is given by Fig. 3.4. This represents an approximate view of the data and we know that it is still a very good overall view because 87.8% of the inertia is represented along this dimension. We can informally interpret this dimension as separating the "smokers" on the left from the "non-smokers" on the right. More formally, however, we can quantify the part played by each point in establishing this particular dimension as the first principal axis. The inertia along this axis, 0.07475 = (0.2734)², is equal to the weighted sum of squared distances to the origin of the displayed row profiles or, equivalently, the corresponding weighted sum for the displayed column profiles, the weights being the masses of the respective points. Each term in these sums can thus be expressed as a percentage of this first principal inertia, and we call these the contributions by the points to the principal inertia or to the principal axis (Table 3.4). For example, the point "medium smoking" has a mass of 0.321 and a co-ordinate of -0.196 with respect to the centroid (origin of Fig. 3.4) on this axis (cf. (3.2.3)). Its absolute contribution to the first principal inertia is thus 0.321 × (-0.196)² = 0.01237, which is 16.5% of 0.07475. In this example we see that the points representing the senior and junior employees contribute over 80% of this principal inertia of the row profiles, while amongst the column profiles the point "no smoking" contributes 65% just by itself. If we think of the points in their fixed positions in the two corresponding spaces as exerting forces of attraction for the principal axis by virtue of their positions
and their masses, then it is these points with high contributions which have played the major role in causing the final orientation of the principal axis. Here we notice that it is the points with highest mass which have contributed the most to the first axis. This is not surprising, since the principal axis tends more towards the higher mass points. Yet there are often cases, which we shall consider in later applications, where a point has fairly low mass but nevertheless a high contribution to inertia because of its large distance from the centroid.

[FIG. 3.4. 1-dimensional correspondence analysis of the data in Table 3.1 (i.e. the points in Fig. 3.3 with respect to the horizontal axis, with inertia 0.0748 (87.8%)).]

[TABLE 3.4. Decomposition of inertia along the first principal axis: the contribution of each row and column point to the first principal inertia, together with the relative contribution (squared cosine) of the axis to each point and the angle between each point and the axis. The values are discussed in the text and correspond to those of Table 3.6.]
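These absolute contributions are simple to compute from the masses and co-ordinates already given; a sketch (Python/numpy assumed) using the rounded co-ordinates of (3.2.3), so the shares agree with the printed values only to rounding:

    import numpy as np

    # absolute contributions of the column points to the first principal inertia:
    mass = np.array([0.316, 0.233, 0.321, 0.130])       # NO, LI, ME, HV
    coord = np.array([0.393, -0.098, -0.196, -0.293])   # first co-ordinates, cf. (3.2.3)
    lam1 = 0.07475                                      # first principal inertia
    ctr = mass * coord**2 / lam1
    print(np.round(ctr, 3))   # approx [0.653 0.030 0.165 0.149]: "no smoking" gives ~65%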
LDM"'LD Based on the positions ofthe points on the first principal axis (Fig. 3.4) and
C'0.~ (j) . - ~.D ..CI +-' 0._ OCXl~C'0r-­
LD M <D ~ ~
UJ,,-:-=CX)Q) Q)
--,o.o'<t.r::(j)c 's.~ 'ü knowledge of which points have contributed the most to this axis, we can
~+-, r-I--:S(ñ c o c
.....
o 0..", assign sorne descriptive name to the axis to guide us in our interpretation. In
r-- ~ 0.0· o
-'::Sog-gu
Q)oco,.;:::¡=Q)
U o.
this case the axis clearly lines up the groups in terms of their level of smoking.
-E~E.E~~ en In fact, to be precise this is rather in terms of their degree of non-smoking
OLQ)·.:::::VJCO
+-'.~
....... e 'x
LD'<tC'J<DLD ~Mr--'<t
+-'
+-,
_ _ + cQ)"O
-,"­ 0'0 +-' ro
C'J C'J M r-- C'J 00 M C'0N because of the particularly high contribution of the point "non-smoking".
uoroou~ co Q..~ ro O<DCXl'<tLD CONNr­
Q)(/)Q.UCtlco ';:::;o.>-a. OOMC'JO ","Or-r­ The actuallevel of smoking-light, medium or heavY-does not playas large
o. ro·u ro o. ~ Q) ~ e 'ü 00000 0000
U')';:::; e(J) cr c O O c 00000 0000 a role as the distinction between smokers and non-smokers.
~ Ci>'~'~ en = o. .",o. Having interpreted this dimension of the points we would like to know
.c .~ ~ ro.2 ~
:=:(])c.n-cQ)+-'
5: (J) .~ +-'..c en how close each point lies to this 1-dimensional subspace. For example, we can
Q) - - ~ +-' . ­
en en
look at the angle () between the true profile point vector and the principal axis
~ -5.5'~'~ E (J)CJ)Q)Q) O'
~ (j) (j)
~
.'=0'
(j)(j)>->­
.- --
oOOC
Q) +-,.­
co 0 ' '0
ro 0 '_
0_ 0
0'-'< c
c 0·­ (Fig. 3.5). It is convenient to examine the squared cosine of this angle because
Q.cn;:='o Q. cco.o.en g~ E-3
(j)E~roo.ro
.r:: ~ e.9- (j)
u..c. o EE~ ~.~
.- O en
-3EEen
E
O (; O (; ~
+-' U') 1- +-'
Ilh row profile
'+-Q.> Q)c+-, U) E ~.2 5;­
o > O).~
(/)·,;:;cQ.o~
ro u·­
,+-'­
'c
(j)
'c'c'c U
(j) :l (j)
~
Cl)CfJ....cU co
c:: O Ol (j) (j)
§Z:.:i2I
wilh mass r¡
....
......
0----­
'';::' Q) -o Q).~ e ~(j)J(j)J(j) :

----- -a---­
Qj5r3-5t~ di ....
e ~ o.~..s
e o:: >.....1---­
~C'JM'<tLD (,~C'JM'<t

....
.... ....
:
cenlrold e
-< .... 81 f¡k
ei "klh principal a~is
FIG. 3.5. Co-ordinate f¡k of ith row profile with respect to kth principal axis. If the
row profile point is at a distance di from the centroid e then the angle it subtends e
e
with the axis is given by cos = f¡k/d¡. The quantity cos" is called the (relative) e
contribution ofaxis k to the ith point.
for each point the squared cosines of these angles with the full set of orthogonal principal axes add up to 1. Another way of describing this is that the inertia r_i d_i² of the profile point vector (that is the ith row profile with mass r_i and distance d_i from the centroid) is decomposed along the principal axes. The part of this inertia along the first axis, say, is r_i f_i1², where f_i1 is the co-ordinate of the point on this axis. Expressed as a proportion of the point's total inertia this is r_i f_i1²/r_i d_i² = (f_i1/d_i)² = cos² θ. The amount cos² θ is thus called the contribution of the axis to the inertia of the point. If cos² θ is high, then the axis explains the point's inertia very well; equivalently θ is low and the profile vector is said to lie in the direction of the axis, or to "correlate" with the axis. The values of cos² θ, which are also called the relative contributions because they are independent of the mass of the point, are also given in Table 3.4, as well as the angle between each point and the principal axis.
For example, the total inertia of the point "medium smoking" in its true 3-dimensional position is its mass times its squared distance to the centroid, that is 0.321 × 0.0392 = 0.01260. (The squared chi-square distance is calculated directly as the weighted sum of squared differences between the profile of "medium smoking" and the average profile, given in Table 3.3, with the inverses of the average profile elements as weights: 0.0392 = (0.048 - 0.057)²/0.057 + (0.113 - 0.093)²/0.093 + ... etc.) From Table 3.4 the part of the inertia displayed on the first axis is 0.01237, thus the value of cos² θ is 0.01237/0.0126 = 0.981. This very high value indicates that the point "medium smoking" is practically on the first principal axis and there is hardly any error in its display in Fig. 3.4. The other relative contributions indicate the quality of representation of each individual point. Generally a high contribution of the point to the inertia of the axis implies a high relative contribution of the axis to the inertia of the point, but not conversely. The point "secretaries" on the first axis is extremely well represented, but its contribution to the axis is minimal. The point "senior managers", on the other hand, is poorly represented here and its position is almost orthogonal to the first principal axis.
In Section 4.1.11 we shall show how the total inertia of a data matrix may be decomposed in several different ways, in much the same spirit as the decomposition of the sum of squares in analysis of variance. The different decompositions lead to the various definitions of contributions to inertia which we have introduced above.
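The decomposition for "medium smoking" can be checked in a few lines of plain Python (a sketch using the rounded values quoted above, so the result is approximate):

    # relative contribution (squared cosine) of axis 1 to "medium smoking"
    mass = 0.321
    d2 = 0.0392          # squared chi-square distance to the centroid
    f1 = -0.196          # co-ordinate on the first principal axis
    total_inertia = mass * d2        # approx 0.0126
    inertia_axis1 = mass * f1**2     # approx 0.0124
    print(round(inertia_axis1 / total_inertia, 3))   # approx 0.98 (0.981 in the text)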
Supplementary profiles

A very important concept in correspondence analysis (as well as in all the displays based on the SVD, described in Appendix A) is that of supplementary rows and columns which are represented on an existing display. In the context of our simple example, suppose that the percentages of non-smokers, light, medium and heavy smokers are reported by the nationwide survey to be 42%, 29%, 20% and 9% respectively in the country as a whole. This set of values defines a point in the space of the row profiles of our example and it is a simple matter to represent this point in the existing display by projecting the point perpendicularly onto the plane. To evaluate its co-ordinates, f_s say, we use the relevant transition formula (3.2.9):

    f_s^T = (0.42 g_1 + 0.29 g_2 + 0.20 g_3 + 0.09 g_4)^T D_μ(2)^{-1} = [0.258  0.118]

[FIG. 3.6. Display of a supplementary row profile ("nationwide average") and two supplementary column profiles ("drinking" and "non-drinking") in the analysis of Fig. 3.3 (supplementary data given in Table 3.5).]

The supplementary point is displayed in Fig. 3.6 and is seen to lie relatively far from the centroid of the points, approximately midway between the points representing "no smoking" and "light smoking". This shows that our sample consists of generally high proportions of smokers compared to the nationwide average. Of course we can actually see this fact quite clearly by inspecting the data, but the display of the points is much more informative and lends itself conveniently to making comparisons and identifying patterns in the profiles.
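Projecting a supplementary profile is the same barycentric operation as in (3.2.9); a sketch with the nationwide percentages (numpy assumed; with the rounded G values the second co-ordinate comes out 0.114 rather than the text's more precise 0.118):

    import numpy as np

    h = np.array([0.42, 0.29, 0.20, 0.09])    # nationwide supplementary profile
    G2 = np.array([[ 0.393, -0.031],
                   [-0.100,  0.141],
                   [-0.196,  0.007],
                   [-0.294, -0.198]])
    mu = np.array([0.2734, 0.1001])
    fs = (h @ G2) / mu                         # same transition as (3.2.9)
    print(np.round(fs, 3))                     # approx [0.258 0.114]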
In a symmetric fashion we could display supplementary column profiles, this time using the transition formula (3.2.7) from rows to columns. For example, we might have an additional classification of the sample in terms of whether a person consumes alcoholic beverages or not. Table 3.5 shows our original data, with two extra columns showing how the sample is divided according to this question.

[TABLE 3.5. The original data of Table 3.1 with a supplementary row, the nationwide percentages of the four smoking categories (42%, 29%, 20% and 9%), and two supplementary columns dividing each staff group according to whether the person consumes alcoholic beverages or not.]

Each column defines a column profile in the same space as the profiles of the smoking categories across the staff groups and can be projected perpendicularly onto the plane of the first two principal axes. Their positions are also shown in Fig. 3.6. Because of the orientation of these two points, more in line with the second principal axis than the first, we can see that in our sample there is no strong association between non-drinking and non-smoking. However, the alignment of the points does suggest a possible association between drinking and level of smoking amongst the smokers, with relatively more drinkers in the high smoking group.
Notice that we are not making any statements about the statistical significance of the associations and patterns observed in these graphical displays. The displays are simply representations of the data where we can view the data in a form which is more convenient for interpretation (cf. Section 8.1, where the stability of these displays is discussed).
We can again compute relative contributions (squared angle cosines) of each of the supplementary points in order to quantify how well the points are displayed. The point representing the nationwide average has relative contributions 0.631 and 0.131 respectively by axes 1 and 2. Adding these together we obtain 0.762 as the relative contribution of the plane to the point, which we call the quality of the (planar) display of the point, in other words the squared cosine of the angle the point subtends with the plane (this simple geometric result is illustrated in Fig. 3.7). The two points representing the drinking categories subtend the same angles with the axes and the plane; in fact they are joined by a straight line through the origin. Their relative contributions are both 0.040 and 0.398 respectively, implying a quality of representation of 0.438, that is an angle with the plane of 49°. Thus the line connecting these two points lies slightly more outside the plane than inside it, and is much more associated with the second axis than with the first.
Because supplementary rows and columns do not play any part in defining the chi-square distance function nor in determining the principal axes, the contribution by these points to the axes is not really defined. It is convenient to think of supplementary profiles in a correspondence analysis as points with zero mass. Thus they really have no inertia at all in the analysis and do not attract the axes in any way, yet they have positions which can be examined relative to the principal axes of the points with positive mass. Often the supplementary points do not have any natural mass in the context of a particular example, for example the supplementary row of Table 3.5. On the other hand, if a point does have a natural mass, then the value of its inertia can be examined relative to the inertia of a principal axis. For example, the two supplementary columns of Table 3.5 represent a partition of the same sample of 193 people and their masses (0.119 and 0.881 respectively) are
comparable to those of the smoking category profiles. Their inertias in the direction of the second axis are 0.01562 and 0.00211 respectively, giving a combined inertia of 0.01773, which is well over the inertia (0.01001) of all the smoking points in this direction. This would indicate to us that these points would have a large attraction on the present second axis if they were allowed to enter the analysis. If we remember that the positions of these points is more in a third dimension than in the plane, then including them in the analysis would re-orientate the second axis to line up more with this "drinking"/"non-drinking" dimension.

[FIG. 3.7. 3-dimensional position of a profile point, subtending angles θ1, θ2 and θ3 with the 3 orthogonal axes. Simple application of Pythagoras' theorem shows that cos² θ1 + cos² θ2 + cos² θ3 = 1 and that the angle θ between the profile point vector and the plane of the first two axes, say, is given by: cos² θ = cos² θ1 + cos² θ2.]

Table 3.6 represents the complete numerical output of a correspondence analysis, in the output format of the correspondence analysis computer program by Tabet (1973). Notice that in order to eliminate decimal points and thus facilitate printing and examination, all quantities of a relative nature are multiplied by 1000 and expressed as integers. The co-ordinates, too, are multiplied by 1000 and rounded to integers (see the Table legend for details).

TABLE 3.6
Numerical output of the correspondence analysis of Table 3.1, which, together with the graphical display, forms the complete results of the analysis. (a) The principal inertias (eigenvalues) and the total inertia, the percentages of inertia, cumulated percentages, and a histogram. (b) The co-ordinates, relative contributions (squared correlations) and absolute contributions (decomposition of inertia) for the row points with respect to the first two principal axes. All quantities are expressed in permills (thousandths) or multiplied by 1000. Thus the mass of the first row point SM is 0.057, its inertia in the full space is 0.031, its first principal co-ordinate is 0.066, its squared correlation with the first axis is 0.092 and it contributes 0.003 of the inertia of the axis (i.e. 0.003 × 0.07475909); its second principal co-ordinate is -0.193, its squared correlation with the second axis is 0.800 and it contributes 0.214 of the second principal inertia (i.e. 0.214 × 0.01001718). The quality (QLT) of representation of this point in the two-dimensional display is 0.892, the squared correlation (cosine) with the plane, which is the sum of the individual squared correlations (cf. Fig. 3.7). (c) Similar printout for the column points.

(a)
    Eigenvalue       %     CUM    Histogram
    0.07475909     87.8    87.8   ..............................
    0.01001718     11.8    99.5   ....
    0.00041358      0.5   100.0
    0.08518985

(b)
    Name     QLT  MASS  INR   K=1   COR  CTR   K=2   COR  CTR
    (1) SM    892   57   31    66    92    3  -193   800  214
    (2) JM    991   93  139  -258   526   84  -242   465  551
    (3) SE   1000  264  450   381   999  512   -10     1    3
    (4) JE   1000  456  308  -232   942  331    58    58  152
    (5) SC    998  130   71   201   865   70    79   133   81

(c)
    Name     QLT  MASS  INR   K=1   COR  CTR   K=2   COR  CTR
    (1) NO   1000  316  577   393   994  654   -29     6   29
    (2) LI    984  233   83   -98   327   31   141   657  463
    (3) ME    983  321  148  -195   982  166     7     1    2
    (4) HV    994  130  192  -293   684  150  -197   310  506

From our description above, the contributions are interpreted in two ways. First, for each principal axis (dimension) we look down the column headed CTR in order to interpret the dimension. Secondly, for each point we scan across the values in the COR columns to identify the axes which represent the
point well. The values in the QLT column (sum of the COR columns) summarize the quality of representation of the points in the subspace of chosen dimensionality. This table of contributions supports the interpretation of the graphical output of the analysis (cf. the applications in Chapter 9).
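All the quantities of Table 3.6 derive from the masses, profiles and the generalized SVD; the following sketch assembles the row panel (b) in permills (Python/numpy assumed; the variable names are our own, signs of the co-ordinate columns may flip, and small discrepancies with the printed table are rounding):

    import numpy as np

    N = np.array([[4, 2, 3, 2], [4, 3, 7, 4], [25, 10, 12, 4],
                  [18, 24, 33, 13], [10, 6, 7, 2]])
    n = N.sum(); r = N.sum(1)/n; c = N.sum(0)/n
    R = N / N.sum(1, keepdims=True)
    U, mu, Vt = np.linalg.svd(np.diag(np.sqrt(r)) @ (R - c) @ np.diag(1/np.sqrt(c)),
                              full_matrices=False)
    K = 3                                             # drop the null fourth dimension
    F = (np.diag(1/np.sqrt(r)) @ U)[:, :K] * mu[:K]   # principal co-ordinates
    d2 = ((R - c)**2 / c).sum(1)                      # squared chi-square distances
    inr = r * d2 / (r * d2).sum()                     # INR: share of the total inertia
    cor = F**2 / d2[:, None]                          # COR: squared cosines per axis
    ctr = r[:, None] * F**2 / mu[:K]**2               # CTR: share of each axis' inertia
    qlt = cor[:, :2].sum(1)                           # QLT: quality in the plane
    table = np.c_[qlt, r, inr, F[:, 0], cor[:, 0], ctr[:, 0],
                  F[:, 1], cor[:, 1], ctr[:, 1]]
    print(np.rint(1000*table).astype(int))            # rows of panel (b) of Table 3.6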
3.4 FURTHER ILLUSTRATION OF THE GEOMETRY

In the simple illustration of correspondence analysis described in the previous sections, we saw that the row and column profiles, vectors of order 4 and 5 respectively, actually lie exactly in a 3-dimensional subspace. In fact the exact dimensionality of the set of row and column profiles of a data matrix in correspondence analysis is always 1 less than the minimum of the number of rows and the number of columns. We shall illustrate this fact using a matrix with 18 rows and 3 columns, which is thus of dimensionality 2. Because the example is exactly 2-dimensional, this allows us to illustrate clearly the effect of the dimension weighting as well as other aspects of the geometry.
The example concerns a large company which has three types of client, which we refer to as clients A, B and C respectively. A random sample of clients in each of these categories is drawn and amongst the information of interest there are 5 demographic variables: sex, marital status, age group, income group and region of residence. The data matrix of interest is given in Table 3.7 and shows the breakdown of the 3 types of customer across the categories, 18 in all, of these variables. This matrix is actually composed of 5 contingency tables, each partitioning the sample of over 8000 people according to a different categorization.

TABLE 3.7
Frequencies of 3 types of client (columns) within 18 categories (rows) of 5 demographic variables.

                                      Type of client
                                        A      B      C
    Sex             Male              2444    853   1547
                    Female             712    551    923
    Marital status  Unmarried          523    290    519
                    Married           2630   1106   1946
    Age             A1 (16-24 yr)      189     70    136
                    A2 (25-34 yr)      796    133    444
                    A3 (35-49 yr)     1100    314    706
                    A4 (50+ yr)       1070    882   1187
    Income          I1 (lowest)        273    336    427
                    I2                1005    422    739
                    I3                1049    305    609
                    I4 (highest)       767    250    612
    Regions         R1                 436    142    315
                    R2                 843    226    494
                    R3                 243     84    453
                    R4                 346    145    248
                    R5                 775    584    708
                    R6                 519    226    263

The row profiles are 3-vectors with elements summing to 1; for example, amongst the 4844 males in the sample the proportions of the 3 types of client are 0.50, 0.18 and 0.32 respectively. If we depict this point and the other profile points in 3-dimensional space in the usual way, we see that they all lie in an equilateral triangle whose vertices are at the points [1 0 0]^T, [0 1 0]^T and [0 0 1]^T (Fig. 3.8(a)). This triangle represents all 3-vectors whose elements are non-negative and add up to 1, with the vertices representing the 3 most "polarized" profiles. Thus we can take the triangle out of the space and represent all the row profiles exactly in a 2-dimensional triangle (Fig. 3.8(b)). This is in fact the triangular (or barycentric) co-ordinate system which is often used to represent data consisting of sets of three values with a constant sum, for example the percentage composition of three constituents of soil samples (see Jöreskog et al., 1976, pp. 93 and 160 for examples in geology).
The relative proportions of the three types of client in the sample are roughly 0.45, 0.20 and 0.35 respectively and hence this is the centroid of all the row profiles. Notice that this is also the centroid of each subset of points corresponding to the categories of one of the variables. (There are slight deviations amongst each subset because some people refuse to respond to some questions, especially the question on income.)
This triangular co-ordinate system is an ordinary Euclidean display of the row profiles and the display of the points in this system is not quite the correspondence analysis display, where we know that the dimensions are weighted inversely by the centroid values, thus defining chi-square distances between the points. The easiest way to think of the dimension weighting is to imagine the triangle to be elastic, then stretched differentially along its 3 sides to form a new triangle, no longer equilateral but with sides inversely proportional to the square roots of the centroid values: 1/(0.45)^{1/2} = 1.49, 1/(0.20)^{1/2} = 2.24 and 1/(0.35)^{1/2} = 1.69 (Fig. 3.9). The side of the triangle most stretched is thus the one corresponding to the least frequent type of client, client C, and the side least stretched corresponds to the most frequent type, client A. The points are situated in this new triangular co-ordinate system in the same way as before, using vectors parallel to the 3 sides.
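The stretching can be checked directly: dividing each profile co-ordinate by the square root of the corresponding centroid element turns ordinary Euclidean distance into the chi-square distance. A sketch, numpy assumed, with the two sex profiles computed from Table 3.7:

    import numpy as np

    centroid = np.array([0.45, 0.20, 0.35])
    male = np.array([2444, 853, 1547]) / 4844      # profile of "males": approx [0.50 0.18 0.32]
    female = np.array([712, 551, 923]) / 2186      # profile of "females"
    stretch = 1 / np.sqrt(centroid)                # [1.49 2.24 1.69], the side stretchings
    d_euclid = np.linalg.norm(male*stretch - female*stretch)
    d_chi2 = np.sqrt(((male - female)**2 / centroid).sum())
    print(np.round(stretch, 2), round(d_euclid, 4), round(d_chi2, 4))   # distances agree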
[FIG. 3.8. (a) Position of the profile point "males" in the 2-dimensional triangle. (b) The triangular (or barycentric) co-ordinate system of the row profiles of Table 3.7, showing the positions of "males" and "females" as well as the row centroid.]

[FIG. 3.9. The "stretched" barycentric co-ordinate system which is the weighted Euclidean space of the correspondence analysis. Distances between the points are chi-square distances.]

Distances in this new display are the actual chi-square distances of the correspondence analysis. Hence correspondence analysis of the row profiles can be considered as finding the principal axes of the points (with associated masses) in this new triangular co-ordinate system.
Figure 3.10 is the joint graphical display of the row and column profiles in the correspondence analysis of Table 3.7. The positions of the column points in Fig. 3.10 are also in so-called "principal co-ordinates", that is with respect to the principal axes of the column profiles. However, it is often useful to display the column profiles in what we shall call "standard co-ordinates". These represent the columns as the original unit vectors in the stretched triangular co-ordinate system of the rows, that is by the vertices of the triangle in Fig. 3.9. In this case the row points are exactly weighted averages of the column points, the weights being the elements of the respective row profiles (see Sections 4.1.15 and 4.1.16).

[FIG. 3.10. Correspondence analysis of the data in Table 3.7, with the points displayed exactly in the principal plane; the first axis accounts for 74.9% of the inertia and the second for 25.1% (λ2 = 0.0083). The points are clients A, B and C and the 18 demographic categories (males, females, married, unmarried, A1-A4, I1-I4, R1-R6).]

In this example it is easier to follow the geometry because the points lie
exactly in 2-dimensional space. In our previous example (Sections 3.1-3.3) the row profiles, which are 4-vectors, actually lie within a 3-dimensional tetrahedron (i.e. a pyramid) with vertices at the unit values of the 4 dimensions. This tetrahedron can similarly be imagined to be differentially stretched along each dimension to provide the positions of the points which we subsequently investigate. In general if a data matrix has I rows and J columns, where we suppose that J ≤ I for ease of description, then the space of the row profiles is a (J-1)-dimensional polyhedron with J vertices. The sides of the polyhedron are proportionally stretched according to the inverse square roots of the co-ordinates of the centroid and we can think of correspondence analysis as the study of the points in this new polyhedron.
This description of the barycentric co-ordinate system underlying the points in correspondence analysis is particularly useful in our understanding of certain peculiarities of the analysis, for example the Guttman effect (or "horseshoe" effect), described in Section 8.3.
In conclusion let us briefly interpret the display of Fig. 3.10. Because this is an exact display we do not really need the tables of contributions unless the principal axes are individually of interest. Our interpretation must be in terms of the positions of the categories of each variable relative to the three points representing the 3 types of client. It is interesting to note that the age groups lie more or less on a straight line approximately orthogonal to the vector representing clients C. Thus there is very little difference in the proportions of clients C across the age groups, but rather an interchange between clients B (proportionally high in the oldest age group) and clients A (proportionally high in the younger age groups, especially the 25-34 years group). Other interesting facts displayed in the analysis are that the lowest income group has high proportions of clients B and C, while the highest income group has high proportions of clients A and C (when we say "high" we mean high relative to the centroid profile, which is [0.45 0.20 0.35]^T). Notice that we must interpret different row profiles strictly in terms of their profiles of client types. Thus the proximity of the lowest income point and the point representing females means that they have similar usage of the company's services and not that females have generally a low income. In order to investigate this latter possibility we would need to know the contingency table crossing the sex and income variables.

Example 3.5.1

Suppose that the following are defined:

    Y    (I x J)    a matrix of I row vectors
    w    (I x 1)    the masses assigned to the rows of Y
    D_q  (J x J)    the diagonal matrix defining scalar products and distances between rows of Y

As in Example 2.6.6, suppose that the dimension weights are inversely proportional to the elements of the centroid of the points: q_j = α/ȳ_j, j = 1 ... J.
Suppose now that we rescale Y, w and D_q to be:

    Y* = βY        w* = γw        D_q* = δD_q

Show that the co-ordinates of the rows of Y* with respect to their principal axes are, apart from a uniform change in scale, the same as the co-ordinates of the rows of Y with respect to their own principal axes.

Proof

The generalized SVD which defines the complete principal axes geometry of the rows of Y is (cf. Section 2.5 and Example 2.6.6):

    Y = [n_0 N] [ μ_0  0^T ] [m_0 M]^T          (3.5.1)
                [  0   D_μ ]

where

    [n_0 N]^T D_w [n_0 N] = [m_0 M]^T D_q [m_0 M] = I          (3.5.2)

The columns of M define the principal axes and the rows of F ≡ N D_μ define the co-ordinates of the rows with respect to the principal axes. We can write (3.5.1) as:

    βY = (1/γ^{1/2}) [n_0 N] (βγ^{1/2}δ^{1/2}) [ μ_0 0^T ; 0 D_μ ] (1/δ^{1/2}) [m_0 M]^T

i.e.

    Y* = [n_0* N*] [ μ_0*  0^T  ] [m_0* M*]^T          (3.5.3)
                   [  0    D_μ* ]

where the matrices on the right-hand side have been rescaled by the respective scaling factors preceding them above. The rescaling implies that

    [n_0* N*]^T D_w* [n_0* N*] = [m_0* M*]^T D_q* [m_0* M*] = I          (3.5.4)

so that (3.5.3) and (3.5.4) define the complete principal axes geometry of the rows of Y* (the inverse relationship of dimension weights to centroid elements is still true). Thus the columns of

    M* = (1/δ^{1/2})M          (3.5.5)

define the principal axes, and the rows of
income variables. F* == N*DI'. = (fJ~1/2)NDI' = (fJ~1/2)F (3.5.6)
are the co-ordinates of the rows with respect to these principal axes.
3.5 EXAM PLES
3.5.2 Correspondence analysis as two dual principal co-ordinates

3.5.1 Invariance ofprincipal axes underrescaling ofpoints. masses analyses

and metric Principal co-ordinates analysis accepts a square symmetric matrix A of squared
Suppose we have the following triplet defining 1 points, with masses, in weighted distances between a set of objects and produces displays of the objects in subspaces of
Euclidean space:
82 Theory and Applications 01 Correspondence Analysis

chosen dimensionality (cf. Appendix A). Show that the principal co-ordinates analysis
of the matrix of chi-square distances between a set of profiles, where each profile is
weighted by its usual mass, yields the same solution as the correspondence analysis of
the profiles.

Prool
Let us suppose that the profiles are row profiles contained in the 1 x J matrix R. The
matrix ,i of squared chi-square distances is given by: ,i = sI T+ 15T- 2S, where S is
the matrix of chi-square scalar products with respect to any origin (for example:
S == RDc-1R T) and s is the vector of diagonal elements ofS (cf. Example 2.6.1 (a)).
As in Example 2.6.1~) the first stage of a principal co-ordinates analysis, namely
the operation -tCi),iCi) (where Ci) == 1 -Ir T) recovers the chi-square scalar products
with respect to the centroid profile r TR = eT: [!J
-tCi),iCi)T = (R-lc T)Dc- 1(R-lc T)T
The second stage of a principal co-ordinates analysis, where the points are weighted
by the elements of r, is to compute the eigendecomposition of D;/2( -tCi),iCi)T)D;/2,
Theory of Correspondence
say VD.lV T, and then display the objects in principal co-ordinates by the rows of
D,-1/2VDl/ 2. This is the same as the correspondence analysis solution (in principal Analysis and Equivalent
co-ordinates), given by (3.1.1) and (3.1.2), because (from (3.1.2)):
D;/2(R-lcT)Dc- 1(R _lc T)TD;/2 = D;/2ND~MTDc-lMD~NTD;/2
Approaches
= D;/2ND~NTD;/2
since MTDc-1M = 1. Hence:
D;/2ND~NTD;/2 = VD.lV T
i.e. In this chapter we first present a formal treatment of the algebra of
(ND ~)(ND~)T = (D,-IJ2VDl/2)(D,-1/2VDl/2)T correspondence analysis (Section 4.1). We then discuss various aIternative
and the result follows by the uniqueness of the eigendecomposition. approaches that have originated in different contexts: reciprocal averaging
(Section 4.2), dual scaling (Section 4.3), canonical correlation analysis of con­
tingency tables (Section 4.4) and simuItaneous linear regressions (Section 4.5).
As discussed in Chapter 1, aH these techniques share the same numerical pro­
cedure to arrive at their respective solutions, but they each have completely
different rationales and interpretations. The uniqueness of the rationale and
interpretation of correspondence analysis lies in the muItidimensional geo­
metric framework in which the problem is casto A number of examples once
again concIudes the chapter, serving to complement and ilIustrate the
, chapter's material (Section 4.6).

4.1 ALGEBRA OF CORRESPONDENCE ANALYSIS

By nature of the material to be described in this section we have chosen a


difTerent format of presentation. Each numbered subsection treats a single
COncept in correspondence analysis, for example a definition or a property.
The formal presentation of each concept is displayed between rules for em­
. phasis and for easy reference. In each subsection this may be preceded by a
verbal description of the concept, and foHowed by a proof and/or comments.
84 Theory and Applications ofCorrespondence Analysis 4. Theory and Equivalent Approaches 85

4.1.1. Suppose N is a matrix of non-negative numbers, such that its row and total n... It is slight1y easier notationally to work with P than with N, since
column sums are non-zero. The correspondence matrix P is defined as the correspondence analysis is only concerned with the relative values of the data
matrix of elements of N divided by the grand total of N. The vectors of row and is thus invariant with respect to n...
and column sums of Pare denoted by r and e respectively and the diagonal
matrices ofthese sums by D r and De respectively. 4.1.3. The row and column profiles define two clouds of points in respective
J- and /-dimensional weighted Euclidean spaces.

Data matrix
Row cloud Column cloud
N(I x J) == [n¡J, nij ~ O Points: The / row profiles rl ' " r1 Points: The J column profiles cl, .. cJ
Correspondence matrix in J-dimensional space in / -dimensional space

P == (l/n. .)N, where n.. = 1TN l (4.1.1) Masses: The / elements ofr Masses: The J elements ofe
Row and column sums Metric: Weighted Euclidean with Metric: Weighted Euclidean with
dimension weights defined by dimension weights defined by
r == PI and e == P TI (4.1.2) the inverses of the elements the inverses of the elements
where r¡ > O(i = 1 ... 1), cj > O (j = 1 .,. J) of e (chi-square metric), that ofr (chi-square metric), that
isD;l isD r- l
D r == diag(r) and De == diag(e) (4.1.3)

Comment: The terms "distance" and "metric" are synonymous here. The chi­
Comment: The sum of the elements of Pis 1. When N is a contingency table, square metric is an example of a diagonal metric defined by the distance
P can be considered to be a probability density on the cel1s of the / x J matrix, function (2.3.3). The diagonal matrix involved in the distance and scalar
and r and e the marginal densities. This is only an analogy when N is another product is itself often referred to as the metric, for example the metric D e- 1.
type of matrix. Notice that 1 == [1 ... 1]T denotes an /-vector or a J-vector
of ones, its order being deduced from the particular context. 4.1.4. The centroids of the row and column clouds in their respective spaces
are e and r respectively.
4.1.2. The row and column profiles of P (equivalently ofN) are defined as the
vectors of rows and columns of P divided by their respective sums.
Row centroid: e = R Tr Column centroid: r = eTe (4.1.5)

Matrices ofrow and column profiles

~ {~J e ~ n,-'p ~ l:J


Proo!, The jth element of each row profile is pjr¡, where r¡ is the ith e1ement
T (4.1.4) of r. Thus the jth element of the centroid is "i¡r¡(pjr¡)/,i¡r¡ = "i¡p¡j (because
R D,-'P
"i¡r¡ = 1), which is cj, the jth e1ement of e. In matrix notation the result is
easily proved: the row centroid (as a row vector) is
r TR/r Tl = rTR = rTDr-lp = l Tp = e T
Comment: Both the row profiles r¡ (i = 1... 1) and column profiles cj (j = 1... J)
are written in the rows of R and e respectively. These profiles are clearly (becauserTD;l = l T andr Tl = 1).
identical to the rows and columns of N divided by their respective sums, just Similarly it is proved that the centroid of the column profiles is r, the vector
as r and e are identical to the row and column sums of N divided by the grand of row masses.
86 Theory and Applications ofCorrespondence Analysis 4. Theory antt .t;qulValem ""ppruucnes

4.1.5. The overal1 spatial variation of each cloud of points is quantified by Proof" From (4.1.6) we have:
their total inertia, that is the weighted sum of squared distances from the in(I) = I:¡r¡I:j(p;)r¡-cY/cj
points to their respective centroids, the masses and the metric being defined
= I:¡I:/p¡j-r¡c)2/ r ¡Cj
in 4.1.3.
and in(J) = I:hI:¡(p¡/cj-rY/r¡
= I:¡I:j(p¡j-r¡cY/r¡c j
Total inertia of row points Total inertia ofcolumn points
in(I) = I:¡r¡(r¡ - e)TD c- (i\ -e)
1
in(J) = I:jcj(cj-r)TDr-l(cj-r) Hence
in(I) = in(J)
(4.1.6)
In the X2 formula nij = n.. Pij and thus the "expected" value in a cell is:
l.e. l.e.
eij == (I: jni) (I:¡ni)/n ..
in(I) = trace[D r(R-lcT)D;l(R-IeT)T] in(J) = trace[Dc(C-lrT)D;l(C-IrT)T]
= (n ..r¡) (n .. c)/n ..
(4.1.7)
= n..ricj

This implies that X2 = n.. in(I) = n..in(J), hence (4.1.8).


Comment: We use the notation in(...) to mean "inertia of. ..", with the
argument l or J to indicate the row or column cloud of points respectively. 4.1.7. The respective K*-dimensional subspaces of the row and column
Later we shal1 use a similar notation to indicate the inertia in(J1) of a subset clouds which are closest to the points in terms of weighted sum of squared
of JI column points, say, or of a single column, for example inU). The distances are defined by the K* right and left (generalized) singular vectors
notation is a bit lax, since both the inertia of all J columns and that ofthe Jth respectively of P - re T, in the metrics D; 1 and D; 1, corresponding to the
column alone are denoted by in(J), but the meaning is always clear from the K* largest singular values. In other words the right and left singular vectors
context. We often think of the number of elements of a set (e.g. l, J, JI' ... ) define the principal axes ofthe row and column clouds respectively.
equivalentIy as the index set of the elements, for example l == {1, 2, 3, ... , l}.

4.1.6. The total inertia is the same in both clouds and is equal to the mean­
Principal axes
square contingency coefficient calculated on N, that is the chi-square statistic
for "independence" divided by the grand total n.. (calculated as if N were a Let the generalized SVD of P - re Tbe:
contingency table). P-re T = ADJlB T where A TD r- 1A = BTD c- 1 B = 1 (4.1.9)
/11 ~ ... ~ /1K
> O. Then the columns of A and B define the principal axes of
the column and row clouds respectively.
in(I) = in(J) = I:¡I: j (p¡j-r¡c j )2 = x2/n ..
r¡cj
l.e. Proof: Let us consider the cloud of row points defined by the row profiles in
= trace[D r- 1 (P - re T)Dc- 1 (P - re T)T] (4.1.8) R == D r- 1P, with associated masses in the diagonal of D r and in weighted
where Euclidean space defined by the diagonal metric Dc- 1. From Section 2.5 we
2 = ~.~. (nij-eij)2
know that the principal axes as well as the co-ordinates of the row profiles
X - ~1~J with respect to these axes are obtainable from the generalized SVD of R -le T
eij
(the centered row profiles), where the left and right singular vectors are ortho­
and eij == n¡.n)n.., the "expected" value in the (i,j)th cel1 of the matrix based normalized with respect to D r and D c- 1 respectively. That is, if:
on the row and column marginals ni. and n. j •
D;IP-Ie T = LDcJ>MT where LTDrL = M TD;lM = 1 (4.1.10)
4. Theory and Equivalent Approaches 89
88 Theory and Applications ofCorrespondence Analysis

then the columns of M define the principal axes and the rows of LD q, define axes B (in the chi-square metric axes A (in the chi-square metric
the co-ordinates. D.- 1). Then:
0;1). Then:
Ifwe multiply (4.1.10) on the left by D. we obtain
P-reT=(D.L)Dq,MT where (D.L)TD;I(D.L)=M TD e- l M=I (4.1.11)
.
F = D- IAD
"
G = De-lBDI' (4.1.14)

which is in the form of (4.1.9) and shows that the columns of M (the principal Proof: Let us consider the co-ordinates of the row profiles, for example.
axes) are identical to those of B. It follows in a similar and symmetric fashion l
Notice that, because the principal axes B are orthonormal (BTDe- B = 1),
that the principal axes of the column cloud, which are defined in ¡-dimensional tbese co-ordinates are just the scalar products of the centred profiles R -leT
space by the right singular vectors of C -Ir Tin the following decomposition: with B (cf. Section 2.3), hence our definition (4.1.13). We can show (4.1.14) in
D e- l p T-lr T = YD",ZT where yTDeY = Z TD.- I Z = 1 (4.1.12) two equivalent ways. The direct proof is to rewrite (4.1.13) as follows, for
example:
are identical to the columns of A.
F = D.- l (P -re T)D; 1B (4.1.15)
Comment: Notice that the sets of singular values /11 ... /1K in (4.1.9), ePI .•. eP K
in (4.1.10) and IjJl ... 1jJ K in (4.1.12) are identical. We tacitly assume that each (using 1 = D.-Ir). Multiplying the generalized SVD (4.1.9) of P -reT on the
singular value is difTerent, in which case the singular vectors are uniquely right by D;IB we obtain:
defined up to reflections only (see Appendix A). Strictly speaking, then, we (P - re T)De- 1B = AD"
should say that the principal axes M of the row cloud are identical to the
columns of B up to reflections. If there are equal singular values amongst the hence the expression (4.1.15) becomes F = D.- IAD", the desired resulto An
first K* then the corresponding columns of M and B are only identical up to alternative proof is possible, since we know from (4.1.10) and (4.1.11) that
reflections and rotations; however, the subspace defined by M is the same as F = LD q,' where D.L is the matrix A and D q, = D w This immediately gives
that of B, which is what we are really interested in. The real problem is when the desired resulto The symmetric result G = D; 1BD" is similarly proved by
the K*th and (K* + 1)th singular values are the same, in which case the K*­ either of the aboye arguments.
dimensional subspace is not uniquely defined in its K*th dimensiono Even
Comment: The expressions (4.1.14) define the co-ordinates of the row and
though this will never occur exactly in practice, there are nevertheless
column profiles with respect to all the principal axes (the co-ordinates of
practical issues of stability which crop up when singular values are almost
individual poinis are contained in the rows of F and G). The co-ordinates of
equal, which we shall discuss later in Chapter 8 and in the course of various
tbe points with respect to an optimal K*-dimensional subspace are contained
applications.
in the rows of the first K* columns of F and G. F or example, if we write F(2)
and G(2) as the first two columns of F and G respectively, then the rows of
4.1.8. The respective co-ordinates of the row and column profiles with
F(2) and G(2) define the projections of the row and column profiles onto
respect to their own principal axes (Le. the principal co-ordinates) are related
to the principal axes of the other cloud of profiles by simple rescalings. respectively optimal planes.

4.1.9. As an immediate consequence of (4.1.9) and (4.1.14) the two sets of


co-ordinates F and G are related to each other by the following formulae.
Principal co-ordinates of Principal co-ordinates of
row profiles column profiles
Let: Let: Transitionfrom rows (F) Transitionfrom columns (G)
F == (D;IP-le T)De- l B G == (D e- l p T _lr T)D.- l A to columns (G) to rows (F)
IxK IxJ JxJ JxK JxK JxI IxI IxK
G = D- l p TFD- l = CFD- l F = D;lpGD~l = RGD~1 (4.1.16)
(4.1.13) e " "
i.e. GD = D- l p TF i.e. FD" = D;IPG
be the co-ordinates of the row be the co-ordinates of the column " e
profiles with respect to principal profiles with respect to principal
71
4. Theory and Equivalent Approaches
90 Theory and Applications o[Correspondence Analysis

Comment: Thus the ¡th row of Gis equal to a barycentre cJF of the rows of Decomposition ofinertia
F foBowed by an expansion in scale of 1/¡.¡,. on the kth dimension, for
axes
k = 1 ... K. The coefficients of the barycentre are the elements of the column total
profilec¡' the ¡th row of C. Symmetrically the ith row of F is equal to a bary­ 1 2 .. , K
centre r¡ G of the rows of G followed by similar scale expansions, where the rtft2 .. , rlftK 'lr.k!?k
coefficients of the barycentre are the elements ofthe row profile ri, the ith row
rtf?1
... r 2r.dlk
ofR. 2 r2fll r2h~ rdA

roWS
r¡f¡~ .. , rifA r¡r.d/k
4.1.1 O. With respect to the principal axes, the respective clouds of row and 1 rdA
in(l) = in(J)
column profiles have centroids at the origin. The weighted sum of squares of
total Al := ¡.¡,1 A2 == ¡.¡,~ ." AK := ¡.¡,k
the points' co-ordinates (i.e. weighted variance or (moment of) inertia) along 2 2 Clkkg~k
clg 12 ." c Ig lK
the kth principal axis in each cloud is equal to ¡.¡,;, which we denote by Ak and 1 clg11
call the kth principal inertia. The weighted sum of cross-products of the 2 c2g1K c2 r. kg1k
2 c~11 C~22 ."
co-ordinates (or weighted covariance) is zero.
columns
CJg JI
2 CJgn
2 .. , CJg~K CJr.kg}k
J
Centroid ofrows ofF Centroid ofrows ofG
rTF = OT cTG=OT (4.1.17)
These tables form the numerical support for the graphical display. We call the
Principal inertias of row cloud Principal inertias ofcolumn cloud columns of these tables contributions of the roWS and columns respectively to
the inertia of an axis. We can express each of these contributions as
FTDrF = D~ == D;. GTDcG = D~ == D;. (4.1.18) proportions of the respective inertia Ak (=- ¡.¡,~) in order to interpret the axis
itself. These contributions are often caBed "absolute contributions" because
they are affected by the mass of each point. Each row of these tables eontains
Proo!" The centerings (4.1.17) are obvious because the rows of F and G are the contributions of the axes to the inertia of the respective profile point.
merely the respective sets of centred profiles with respect to new reference Again we can express each of these as proportio ns of the point's inertia in
systems ofaxes. A prooffollows immediately from (4.1.13), for example: order to interpret how well the point is represented on the axes. These are
often cal1ed "relative contributions" because the masses are divided out (ef.
rT(Dr-1P-lc T)= lTp_cT =cT_c T =OT Section 3.3).

The results (4.1.18) pertaining to the weighted sum-of-squares and cross­ 4.1.12. In eorrespondence analysis the centering of the row and column
products of the principal co-ordinates follow directly from the standardiza­ profiles is a symmetric operation which removes exaetly one dimension from
tion ofthe principal axes in (4.1.9), and from (4.1.14). the original spaces of these profiles. This is embodied in the result that the
SVD of the uncentered matrix P "contains" the SVD of the centered matrix
P_rc T •
4.1.11. As a consequence of (4.1.6), (4.1.7) and (4.1.18) the total inertia of
each cloud of points is decomposed along the principal axes and amongsr
the points themselves in a similar and symmetric fashion. This gives a Let the generalized SVD of P be:
(4.1.19)
decomposition of inertia for each cloud of points which is analogous to a p = AD¡IBT where ATn;lA = BTn;lB = 1
decomposition of variance.
4. Theory and Equivalent Approaches 93
92 Theory and Applications ofCorrespondence Analysis
roots of the principal inertias: (Dc-1pT)F = GD,U' Applying the matrix of
while that ofP-re Tis given by (4.1.9). Then:
row profiles R == D; 1P to this result leads to:
Á. = [r A] (4.1.20) (D;IP)(Dc-1pT)F = (D;IP)GDI' = FD~
B= [e B] (4.1.21) because R maps G to a similady rescaled F in a symmetric fashion:

D~=[~ DI'
OTJ (Dr-1P)G = FD,U'
(4.1.22)

Comment: The aboye eigenequations should not be used separately to obtain


that is, there is a trivial part of the SVD of P consisting of a singular value of
F and G. Not only would this be a wasteful computational method but there
1 and associated left and right singular vectors r and e respectively, while the
would be sorne inevitable "errors" in the sign of corresponding eigenvectors,
remainder of the SVD is exactIy that of P - re T. Because 1 is the largest
because the signs of eigenvector solutions are not identified.
singular value, the non-trivial singular values are al1less than or equal to 1.

4.1.14. As an immediate consequence of (4.1.9) and (4.1.14) we have the


Proof: We only need to prove that r and e are respectively orthogonal to A fol1owing formula fo! reconstituting the correspondence matrix P from the
and B and correctIy standardized (in their respective metrics), and that 1 is matrices F, G and D ¡." and an approximate formula using the submatrices
the largest (generalized) singular value of P. The former result follows from F (K.), G (K.) and D J1(K.) of the rank K* weighted least squares approximation.
(4.1.9), or from subsequent resu1ts (4.1.14) and (4.1.17). The standardizations
are trivial: rTD;lr = eTDc-1e = 1. Finally, the matrix re T must be the
"closest" rank 1 matrix to P (in the metrics D; 1 and D c- 1), for the same Reeonstitution formula
reason that the centroids r T and e Tare the closest points to the rows of
D c- 1P Tand D r- 1P respectively. P = re T+D r FD-1GTD
J1 c
(4.1.25)
~ re T+ DrF(K.¡D ;(k·)G(K·)D e (4.1.26)

4.1.13. The columns of F and of G are (non-trivial) eigenvectors of the Le.


Pij = r¡cj(l + r.f fikgjk/Jlk) (4.1.27)
respective matrices RC and CR, standardized according to (4.1.18). The (non­
trivial) eigenvalues of both these matrices are the principal inertias. ~ r¡e)l + r. k

/ikgjk/l1k) (4.1.28)

Row eo-ordinates as eigenveetors Column eo-ordinates as eigenveetors


The approximate reconstitution of the Pi¡ from the principal axes display can
(RC)F = FD A (CR)G = GD A be used to impute missing values in the data matrix-see Sections 8.5 and 8.6.
l.e. i.e.
(Dr-lPDc-lpT)F = FD A (Dc-lpTDr-lP)G = GD,¡ (4.1.23) 4.1.15. The standardization (4.1.18) of the principal co-ordinates, restated in
(4.1.24) is the "natural" standardization imposed by our definition ofthe two
with the standardization: with the standardization:
dual and symmetric geometries. However, there wil1 be many situations
FTDrF = DA (=D~) GTDcG = DA (=D~) (4.1.24) where we are willing to sacrifice the symmetry of the definition in order to
gain other advantages. Another standardization which we shall use is that of
unit inertias along principal axes, and we denote the row and column
co-ordinate matrices with this standardization by «1> and r respectively. We
Proof: This is a direct consequence of the transition formulae (4.1.16). For
cal1 these standard eo-ordinates to distinguish them from the matrices F and
example, the matrix C == D; 1P Tof column profiles, considered as a mapping,
transforms the columns of F to the columns of G, "shrunk" by the square G of principal co-ordinates.
4. Theory and Equivalent Approaches 95
94 Theory and Applications ofCorrespondence Analysis

the display in standard co-ordinates.) The column points are actually the
projections of J "unit profiles" (the rows of the J x J identity matrix) onto the
Standard row co-ordinates Standard column co-ordinates principal subspace. This result is proved in Section 4.4 and illustrated in
(J) == FD; 1 r == GD; 1 (4.1.29) Section 3.4.
Hence the columns of (J) are Hence the columns of r are
standardized as: standardized as: 4.1.17. The following property, called the "principIe of distributional equiva­
lence" (Benzécri et al., 1973), is peculiar to correspondence analysis and, in
(J)TDr(J) = I rTDcr = I (4.1.30) particular, to the display in principal co-ordinates. If two row points, say,
occupy identical positions in multidimensional space, then they may be
merged into one point, whose mass is the sum of the two masses, without
4.1.16. By an asymmetric display we mean that the standardizations imposed afTecting the masses and interpoint distances of the column points. Similarly,
on the two sets of points is different. Most commonly, one of the sets is a row of data may be subdivided into two (or more) rows of data, each of
represented in principal co-ordinates while the other set is represented in which is proportional to the original row, leaving the geometry of the column
standard co-ordinates. The transition formulae between these points are then points invariant.
asyrnmetric, as is the interpretation of the display.

Principie ofdistributional equivalence


Asymmetric transitionformulae between F and r
If two row profiles (say) are identical then the corresponding two
Rows to columns Columns to rows
rows of the original data matrix may be replaced by their summa­
tion (a single row) without aiTecting the geometry of the column
r = D- 1p TFD- 2
c Ji
F = Dr-1pr (4.1.31)
profiles.
I.e.
rD~ = Dc-1pTF
Proof" Without loss of generality we suppose that the first and second rows
Asymmetric transitionformulae between (J) and G
of N have the same profile: n1j/nl. = n2 )nV j = 1 ... J. We remove rows 1
Rows to columns Columns to rows
and 2 from N and create a new first row with elements nlj+n 2j,j = 1... J.
The new matrix Ñ has one row less and the profile of its first row has a mass
G = D-1pT(J)
c
(J) = D r- 1pGD;2 (4.1.32) equal to the sum of the masses of the first two row profiles of N.
I.e. The masses of the column profiles are clearly unafTected by this replace­
(J)D 2Ji = D-1pG ment. The squared distance between two column profilesj and 1is now
r

r.{:2{ (ñu/n) - (ñ¡l/n.lW/(ñ¡./n. J


Proof" These results are a direct consequence of (4.1.16) and (4.1.29). whereas it was previously:

r.{: 1{(nu/n) - (n¡¡jn.lW /(n¡./n. J

Comment: In a principal axes display of the rows, say, in principal co­


ordinates and the columns in standard co-ordinates, the row points are Terms from i = 3 onwards are identical in these two expressions. Thus we
need to show that the first term of the first expression is equal to the first
exactly at barycentres of the column points, where the barycentric weights are
the elements of the respective row profiles. This might be advantageous in two terms of the second expression. This is easily proved by substituting
sorne practical situations, especially if the principal inertias are fairly high.
ñ = n j+n2j and using the given condition that nlj/n1. and n2j/n 2. are equal
2j 1

(When they are low the display in principal co-ordinates is much smaller than and thus also equal to (nlj+n2j)/U11. +n2J = ñ2)ñ2.·

96 Theory and Applications ofCorrespondence Analysis


4. Theory and Equivalent Approaches 97
4.2 RECIPROCAL AVERAGING
Yjs entering (4.2.1), so that we could introduce a scaling factor into (4.2.2),
say:
Hill (1973) introduced the use of reciprocal averaging in the analysis of
ecological data. A subsequent paper (Hill, 1974) popularized the term Yj = ~r¡(ndn)x¡ (4.2.3)
correspondence analysis as a translation of "analyse des correspondances", (where ~ > 1) in order to enable (4.2.1) and (4.2.3) to be soluble. An
but did not present the geometric description and the tables of contributions alternative formulation is to split the "expansion factor" ~ equal1y between
(decomposition ofinertia) which are the foundations ofthe French approach. the two stages of averaging so that the equations to be solved are:
This latter paper is widely cited today as a description of correspondence
analysis (for example, Mardia et al., 1979, Section 8.5) and essential1y the X¡ = ~1/2r)ndn¡JYj (4.2.4)
same material appears as a definition of the technique (Hill, 1982). However, Yj = ~1/2 r¡(ndn)x¡ (4.2.5)
we would maintain the use of the term reciprocal averaging when referring to
descriptions of this kind. Reciprocal averaging is defined as the computation of solutions x 1 .•• XI
Reciprocal averaging is defined by the transition formulae (4.1.16) with an and Yi . .. YJ which satisfy either pair of formulae and for which ~ is a
arbitrarily chosen set of identification conditions. The ecological context minimum. In other words we want to rescale as little as possible in (4.2.3) to
where these equations are applicable is usually in the determination of an recover the Yjs, or equivalently, we want (4.2.4) and (4.2.5) to be as close as
ecological gradient from a matrix N of observed frequencies or abundances possible to the reciprocal averaging relationships (4.2.1) and (4.2.2) for which
of 1 vegetational species, say, at J sites. If we knew that the sites lay along ~ = 1. ~
sorne ecological gradient, quantified by values Y1" .YJ (e.g. altitude), we The objective of minimizing ~ is equivalent to maximizing the "shrinkage
could think of e1ch species also on this gradient at a position defined by a factor" ). = 19 (where). < 1), in which case (4.2.4) and (4.2.5) are identical to
weighted average (barycentre) of the site positions, the weights being equal to the transition formulat: (4.1.16) in l-dimensional space. Hence the present
the relative abundances of the species across the sites. That is, the position of problem is equivalent to finding the first principal dimension in the
the ith species is: correspondence analysis of N and the optimal ~ wil1 be the inverse of the first
principal inertia ).1'
x¡ = rj(ndndYj (4.2.1 ) Formulae such as (4.2.4) and (4.2.5) do not identify the origin and scale of
the X¡S and Yjs and one way to achieve particular solutions is to impose
where ni. is the total number of species i in al1 the sites. Sets of values Xi'
conditions of origin and scale, for example:
i = 1 ... 1, and Yj' j = 1 ... J, are often cal1ed "ordinations" of the species and
sites respectively. r¡(n¡./n Jx¡ = rj(n)n. ,)Yj = O (4.2.6)
Correspondingly, given an ordination Xi' i = 1 ... 1, of the species we can

think of the sites ordinated at positions which are weighted averages

r¡(n¡./n. ,)x? = r)n)n ,)yJ = 1 (4.2.7)


(barycentres) of the species positions, the weights being equal to the relative
that is, in our previous notation:
abundances of each species at the site; that is, the position of the jth site is:
r Tx = e Ty = O
Yj = r¡(ndn)x¡ (4.2.2)
xTDrx = yTDcY = 1
Under these identification conditions the solutions x and y are exactIy the
where n. j is the total number of species at the jth site.
standard co-ordinates of the rows and columns on the first principal axis in
Unfortunately (4.2.1) and (4.2.2) cannot hold simultaneously for any X¡S
the correspondence analysis of N. In fact, notice that the centering of one set
and Yjs apart from the trivial case when al1 the X¡S and Yjs are 1, or when
of values implies the same centering of the other set of values: r Tx = e T y , so
the matrix N has a special block structure (see Example 4.6.7). For non­
that it is sufficient to centre one set of values only (Example 4.6.2). Similarly,
trivial Yjs, the X¡S defined by (4.2.1) are necessarily "interior" to the YjS, and
the standardization of only one set of values is sufficient, the other being
similarly the barycentres on the right-hand side of (4.2.2) are in turn
implied by the particular formulation of reciprocal averaging. When the
"interior" to the set of X¡S (by "interior" we mean the range of the set of values
"symmetric" formulae (4.2.4) and (4.2.5) are used, the standardization of the
has decreased). Therefore the Yjs produced by (4.2.2) must be interior to the
solutions x and y are necessarily identical: xTDrx = YTDcY.
98 Theory and Applications ofCorrespondence Analysis 4. Theory and Equivalent Approaches 99

Other ways of identifying the solutions are to fix any two values of one set The identified values ofy(l) are obtained by dividing the unidentified values by
(say the Yjs), usually two "end point" values, or to impose the centering (4.2.6) the square root of this inertia, for example - 0.09962/(0.004912)1/2 = -1.421.
and then rescale in order to fix one value. Notice that in the present case of Under the second choice (b) of constraints, the initial values are already
1-dimensional ordination the choice of identification conditions imposed on identified and after a complete reciprocal averaging these have to be
the final solution is immaterial to their interpretation, and is relevant only in recentered and rescaled so that Yl1) = 200 and y~1) = 1000. The difference
the geometry of multidimensional ordinations. between the unidentified values of y~l) and Yl1) is 556.3 - 507.4 = 48.9, which
is equivalent to 1000-200 = 800 on the identified scale, hence the identified
value of yi1), for example, is calculated as:
Computation by reciproca/ averaging
{(539.7 - 507.4) x (8oo/48.9)} + 200 = 728.8
Iterative application of the reciprocal averaging formulae, incorporating
identification of each successive set of trial solutions, wil1 actual1y converge (Note that the left-hand side of this expression is subject to rounding error­
at the optimal solution. We illustrate this procedure using the matrix of the value on the right is the result of the more accurate calculation.)
Table 4.1, the abundances (which can be frequencies or areal coverage, for During the 8th iteration (reciprocal averaging), the identified ordination
example) of 5 species of trees in 4 different sites on a mountain slope. (These has converged sufficiently to terminate computations. By convergence we
are the same artificial data as in Table 3.1, presumed to occur in the present mean that the identified set of values Mk ) •.• y~k) in our example) are close
ecological context.) As an initial set of values for the 4 sites, we can use their enough to the previous identified set Mk-l) ... y~-l») to be cal1ed identical.
altitude values: 200, 500, 700 and 1000. Centered and standardized according The difference between'" the unidentified and identified solutions at this
to (4.2.6) and (4.2.7) these are -1.241, -0.127,0.616 and 1.730 respectively. convergence point wil1 enable us to compute the optimal value of the
As an alternative we can fix the values of Yl and Y4 to be 200 and 1000 expansion factor~, equivalently the shrinkage factor 19. Under the choice (a)
respective1y, their respective altitude values. Table 4.2 shows the initial of identification conditions, and using (4.2.3) in this case, we need only inspect
computational steps for both choices of identification conditions, as wel1 as scale differences to deduce that 19 = 0.07476 (e.g. =0.1075/1.438). Under
the final solution. A complete reciprocal averaging is performed before the conditions (b), however, the scale change should be evaluated using un­
values used need to be identified. Under the first choice (a) of identification identified and identified deviations from the centroid (which is 657.9 in this
conditions, the unidentified y(l) already satisfies eTy(l) = O and its weighted case); or, equivalently, using the unidentified and identified differences
sum of squares (inertia) is: between two individual values, for example, between those specifical1y used
in the identification (= (683.5 - 623.7)/(1000 - 200)). Notice that, as expected
y(l) TDcy(l) = (61/193) x (-0.09962f +... + (25/193) x (0.08215f under conditions (a), the optimal ordination of the columns (sites) is the set
= 0.004912 of standard column co-ordinates in the correspondence analysis of Table 4.1,
while their averages xl7 ) ... X~7) (ordination of the species) are the principal
TABLE 4.1 row co-ordinates.
Same data as Table 3.1. but presumed to Occur in an ecological context. This algorithm is a special case of the alternating least squares algorithm
which derives the largest singular value and associated pair of singular
Sites (average altitude) vectors of a rectangular matrix. In Appendix B we discuss the convergence
Site 1 Site 2 Site 3 Site 4
Trees properties of this algorithm.
(200 m) (500 m) (700 m) (1000 m)
In practice, reciprocal averaging is used chiefly to obtain a single pair of
Species 1 4 2 3 2 11
ordinations x and y, as described aboye. This process can be repeated to
Species 2 4 3 7 obtain another pair of ordinations "orthogonal to" the first pair, equivalent
4 18
Species 3 25 10 12
Species 4
4 51 to the second principal dimension of correspondence analysis. The computa­
18 24 33 13 88 tion of this second pair is often useful in order to check the stability of the
Species 5 10 6 7 2 25 first. By stability we mean that smal1 changes in the data matrix do not lead
61 45 62 25 193
to the algorithm converging at a dramatical1y different solution. This topic is
treated in more detail in Section 8.1.
TABLE 4.2

Some initial and final steps of the reciprocal averaging computations on Table 4.1, performed under two sets of

identification conditions on the column scale values: (a) centered at mean zera. standardized to have unit variance;

(b) two fixed scale values.

(a) (b)

Identification condition
cTy=O, T
y O cy =l I
Y, =200, Y4=1000
Initial values: yIO) ... y~O)
-1.2411 -0.1270 06157 1.7298 I 200 500 700 1000
First stage of averaging to obtain xIO) . . x~O), as in (4.21)
0.008047 0.3269. -0.3527 0.1980 -0.2161 I 536.4 622.2 439.2 587.5 476.0
Second stage of averaging to obtain y;') ... y~), as in (4.2.2)
-0.09962 002052 0.04999 0.08215 I 507.4 539.7 547.7 556.3
Identified yl') ... y~')
-1.42140.29280.71331.1723 I 200728.8858.4 1000
First stage of averaging to obtain xi') ... x~')
-0.05597 02708 -0.3796 0.2298 -0.2048 I 6212 7220 5213 709.3 575.3

Second stage of averaging to obtain Yi


2
).
. y~2)

-01073 0.02645 0.05357 0.08122 \ 6t5.3 646.6


654.9 663.5

ldentified y;2). . y~)

-1.4364 0.3543 0.7175 1.0877 \ 200 767.5 882.7 1000

etc.

Identífied Yi?)· . y~)


-1.438 0.3637 0.7180 10744 \ 200 773.7 886.5 1000
Fi rst stage of averagi ng to obtai n xi?) ... xt¿)
-006577 02590 -03806 0.2330 -0.2011 \ 637.0 740.4 536.8 732.1 598.9

Second stage of averaging to obtain Yi S


) ... y~S)

-0.1075 002719 0.05368 008032 \ 623.7 6666 675.0 6835


Identified y;S) . . y~S)

-1.438 0.3637 0.7180 1.0744 \ 200 7737 8865 1000


103
4. Theory and Equivalent Approaches
102 Theory and Applications of Correspondence Analysis

It is unfortunate that this stepwise computational procedure has led to ¡nitiol


column scole volues
Cl
-
o
e2
.
C3
2
C4
3
excessively large stress being laid on the undimensionality of the ordinations
and their separate interpretations. The second pair of ordinations is often
regarded with a degree of suspicion because of its "artificial" orthogonality
with the first. Correspondence analysis, however, views the problem multi­ initiol R3R5RIR4R2
• ~ •••
I
dimensional1y, so that the combination of the first and second ordinations, ro'N scores o k 3
say, suitably standardized and plotted with respect to principal axes, is
considered an optimal 2-dimensional ordination rather than the "addition"
oftwo 1-dimensionalones. 1-dimensional ordinations, like the principal axes,
are guidelines for the interpretation of higher dimensional ordinations. They
can be given descriptive names but they do not necessarily reflect the el C2. C3 C4
existence of a true latent variable such as an ecological gradient. At best we
optima\
column scale volues" _\ o . . ,..
can say that sorne external variable is "highly associated" with a particular
ordination, otherwise we see no valid reason to think of the ordinations
individually. R5
optlmol
ro'N scores
I
_\
R3. '.1 oRtR2
1, I
1

4.3 DUAL (OR OPTIMAL) SCALlNG FIG.41. Initial and optimal sea le values for the eolumns of Table 3.1 (or Table
4.1) and derived roW seores. The optimal seale values have been eentered and
i sealed to be directly comparable with the initial values. The dispersion (as
Fírst approaeh: maxímízíng seore varíanee measured by the inertia) of the optimal row scores is larger, although this may not
be obvious by inspeetion, In faet the inertia of the initial seores is 0.0660 while the
Let us use the same data (Table3.1) once again in order to introduce the inertia of the optimal seores is known to be 0.0748. the first princ'lpal inertia.
concepts of dual scaling, but return to the context original1y described in
Section 3.1, where the rows are staff groups and the columns are categories
optimal solution for the scale values, we conventionally fix their mean and
of smoking. In this example we can think of assigning a scale value to each of
the four categories of smoking, say O, 1,2 and 3, and then evaluate a position °
variance over the whole sample to be and 1 respectively:
for each of the staff groups on this scale as an average of the scale values of (61YI + 45Y2 + 62Y3 + 25Y4)/193 = O
the members in the group. For example, senior managers have 4 non­ (61yi+45y~+63y~+25y¡)/193 = 1
smokers, 2 light smokers, 3 medium smokers and 2 heavy smokers, giving an
In our general notation ofSection 4.1, this is recognized to be:
average value of {(4 x O) + (2 x 1)+ (3 x 2) + (2 x 3)}/11 = 1.27. Values for the
other four staff groups can be similarly evaluated and the resultant "scaling" cTy = ° (4.3.1)
is shown in Fig. 4.1. Thus for a particular set of scale values y 1 ... y 4 for the yTO cY = 1 (4.3.2)
columns, the way we obtain seale values for the rows, which we shall cal1 row
scores, is precisely as described in the previous section, using the averaging The values of the row scores are the elements of the l-vector O; IPy (where
formula (4.2.1). If we now think of the seale values for the smoking categories 1 = 5 in our example) and (4.3.1) is equivalent to the mean of the row scores
as variables YI .. 'Y4 then we can pose the problem of determining values of across the whole sample being zero: rT(O; IPy) = 0, as shown in Example
YI" 'Y4 which optimize sorne suitable criterion defined on the row scores. 4.6.2. (Notice that the mean and variance are defined over the whole sample,
Our object in deriving scores for the staff groups is elearly to investigate i.e. 193 people in our example, ofwhich 11 are assigned the first row score, 18
the differences between them and a natural criterion is thus the variance of the seeond, and so on.) The variance of the row seores is then the average sum
the row scores, which we would want to maximize. Clearly this variance is ofsquares:

unaffected by adding any eonstant to the scale values and can be increased at (Or-IPy)TO~(O;lpy) = yTp O;lPy

T
will by increasing the spread of the seale values. In order to identify an
104 Theory and Applications ofCorrespondence Analysis
M
(J)
To maximize this function subject to the constraint (4.3.2) we introduce a en
N ~
___

Lagrange multiplier Aand define the Lagrangian function: ~


--- --­ ..
~ ~ cE
N N N
L(y, A) == yTpTDr-1PY+A(1_yTDcY) (4.3.3) U? ' + +
i::!
O
+'" ~
'" ~
'"
Differentiation of this function with respect to the elements of y leads to: u ~ r-- N
~
U?
+
M + co
aL/ay = 2pTDr-1PY-2ADcY U? N;: +
(4.3.4 ) ~ ~ co ;:
aL/ay denotes the column vector of partial derivatives aL/aYj, j = 1... J; to ~
(j)
+ + r
en
-q­
> r ~ +
obtain (4.3.4) we have used the result a(y TAy)/ay = 2Ay, for A square <l: ~ O r
~ ~

symmetric, which is ~asily proved (Example 4.6.3). By setting (4.3.4) equal to ~

co
zero we obtain flieé"quation:
,; ><"' 11
1><
Dc-lpTDr-1Py = AY (4.3.5)
which is precisely the eigenequation of (4.1.23). Since the eigenvalue (Lagrange ~
multiplier) Ais the score variance itself: A = YTp TDr-l Py (proved by multi­ gs ~ ~ 'I~
I :: :: ~
plying (4.3.5) on the left by y TDc and using (4.3.2)), it is clear that this solution ~ ~

is once again equivalent to the (1-dimensional) correspondence analysis :::..


solution. The optimum Ais Al' the first principal inertia, and y is the vector
of standard co-ordinates of the column profiles with respect to the first
E ¡"
M ::J '" ~
. ~ M
principal axis. The resultant row Scores x = D r- 1Py have variance A1 so that -q- U? "O • ~~
C") I
I~
M

'j .~ Q) I ~ M M N
these are exactly the principal co-ordinates of the row profiles with respect to o~l¡" ~~
the first principal axis. In the notation of Section 4.1: y = [y 11 ••• y}1] T and
tIl
~ gM ~
+-' -...-
¡.,
>..>..
¡" CO

ro
x = [fu· . .fIl]T. In Fig. 4.1 we compare the results of this optimal scaling u
al •
with those obtained from the preset scale values described at the start of this e
.- +-' >..
N

section. The optimal scale values indicate relatively smal1 differences between ~ ,-
E
-§, ~
N
~
Ñ I ~
N

the 3 categories of smoking, and a larger difference between these categories --.J • • en
(f) ~ ;: ~~ I-q­
and the "non-smoking" category. N 1
Ñ Ñ
~~

Second approach: optimizing interna/ consistencv (j) ~ .;::.;::

~ ~~
The approach to dual scaling described by Nishisato (1980) is practically
identical, except that he sets the problem in a context which is reminiscent of ;=-
.;::

.;;:
.;::
-;;:-:::-:::
.;::.;::.;::
13
1

c.o
the sum of squares decomposition in analysis of variance and discriminant
analysis. Given the scale values Yl" 'Yl of the columns, let us consider
replacing each integer in the 1 x J contingency table by just as many ~
(j)
al
repetitions ofthe corresponding scale value. For example, for the contingency U?
Q. ro

::J e U?

table of Table 3.1 we replace 4 by 4 Y1S, 2 by 2 Y2S, 3 by 3 Y3 S, 2 by 2 Y4 S, and o ro (j)

so on for all the rows (Table 4.3). The first row is thus characterized by 11 o,
'+-
E ro
+-'

values and we summarize these values by their mean, the row score Xl' The ro I\ .S?e
+-'
1-

~
ü
(f)

oro
objective which is imposed in order to calculate the scale values is cal1ed the (f) ji ji t-­

criterion of internal consistency and has its origins in the writings of Guttman ~ en
(1941,1950), Maung (1941) and Bock (1960). The idea is tbat the set ofvalues
11II

4. Theory and Equivalent Approaches 107


106 Theory and Applications 01 Correspondence Analysis

SSb = n..xTO,x = n..yTpTO;lPy


characterizing a particular row should be as similar as possible, while the
averages (scores) should be as ditTerent as possible. Similarity ofthe values in where n.. is the total of the contingency table (n .. = 193 in our example).
the rows can be measured by the sum of squared deviations of these values Similarly (4.3.8) is:
from their mean, the row score. For example, in the case of the first row of
SSr = n.. y TO cY
Table 4.3, this "within-row" similarity is:
4(yl -X l )2 + 2(y2 - X l )2 + 3(y3 -X l )2 + 2(y4 -X l )2 so that the squared correlation ratio is:
r¡2 = yTpTO,-lPy/yTO cY (4.3.10)
which can be written aIternatively as
4Yi +2y~ + 3y~ +2y¡' -llxi The vector of partial derivatives of r¡2 with respect to y is:

Summed over all the rows we obtain the sum of squared deviations within Or¡2/oy = {(yTO cy )2PTO,-lPy _ (y TpTO; lPy)20 cY}/(y TOcy )2
rows, SSw: using the result of Example 4.6.3 once again and the usual quotient rule of
SSw = 61yi +45y~ +62y~+25y¡' -llxi -18x~ -51x~ -88x¡' -25x~ (4.3.6) derivatives. Setting this equal to zero we obtain:
The sum of squared deviations between rows, that is between the scores (yTOcy)pTO;lPy = (yTpTO;lPy)OcY
assigned to the 193 people, is SSb:
Using (4.3.10) we can rewrite the right-hand side ofthis equation in terms of
SSb = 1l(x l -x)2+ ... +25(x s -X)2 r¡2 and divide both sides by yTO cY to obtain:
= llxi + ... + 25x~ -193x 2 (4.3.7)
pTO,-lPy = r¡ 20 cY
In an analysis of variance fashion, SSb and SSw sum to SSr, the total sum of
squared deviations between all193 values in Table 4.3 and their mean x: l.e.
0-lpTO-1Py = r¡2 y (4.3.11)
SSt = 61 (Yl - X)2 +... + 25(y4 - X)2 c ,

= 61yi +... + 25y¡' -193x 2 which is an eigenequation identical to (4.3.5). In fact the only ditTerence
= SSb+SSw (4.3.8) between the previous objective (4.3.3) and the present one (4.3.10) is that the
former fixes in advance the total sum of squares SSr and incorporates this
According to the criterion of internal consistency we want to maximize SSb
constraint in the objective, whereas here the objective is expressed relative to
while minimizing SSw. Because these two quantities add up to SSr, the overall SSr' Here, of course, we ultimately have to impose a constraint on SSr anyway
variation amongst the scores, it is clear that both these objectives are satisfied
to identify the eigenvector solution of (4.3.11).
simultaneously for a given SSr' Putting this another way we must maximize This technique is called dual scaling because the symmetric problem of
SSb and simultaneously minimize SSw relative to SSr' If we define the ratio assigning standardized scale values to the rows of the table to maximize the
r¡2 = SSb/SSr then (4.3.8) can be written as:
resuIting scores of the columns is dual to the above problem. The optimal row
r¡2 + SSw/SSr = 1 (4.3.9) scale values are the elements of the first (non-trivial) eigenvector satisfying
the following eigenequation:
r¡2 is called the squared correlation ratio (Guttman, 1941) and it clearly
lies between O and 1. The objective is thus to find the scale values which O-lPO-lpT
, c x = r¡2 X
maximize r¡2.
The value of r¡2 is clearly unatTected by the value of x, because x is the mean If the scale values are standardized as xTO,X = 1, the column scores
y = 0c- lpTx have maximal variance of r¡2 = A. l , the first principal inertia,
of all the quantities on which both SSb and SSr are based. Therefore we
choose x = O, which is equivalent to the centering of the scale values and are the first principal co-ordinates of the column profiles. Notice how the
61Yl + ... +25Y4 = O (Le. cTy = O), as we have already noted above. From scale values and scores for each problem play dual roles-the scale values in
here on it is easier if we use matrix notation for the general problem. Since one problem, multiplied by the corresponding correlation ratio (square root
x = D,-lpy, (4.3.7) is in general: of principal inertia) r¡ = (A. l )1/2, are the scores of the symmetric problem.
109
4. Theory and Equivalent Approaches
108 Theory and Applications ofCorrespondence Analysis
J variables
The reason why '1 can be considered a corre1ation will become clear in the , ,
next section where we show yet another context in which correspondence JI variables J2 variables
analysis may be defined, canonical correlation analysis. , ~

4.4 CANONICAL CORRELATION ANALYSIS


1 cases ZI
Z2

In this section we shall describe the geometry of canonical corre1ation


analysis and how it applies to the special case of qualitative data. We shall
show how this geometry is related to the geometry of correspondence
analysis, leading to the alternative interpretation of the principal inertias
as squared canonical correlations. The geometry of canonical correlation FIG.4.2 Typical format of multivariate data suitable for canonical correlation
analysis is not often discussed in the literature, yet it is a context which analysis, with the variables naturally dividing themselves ¡nta two disjaint sets
justifies the fundamental concepts of profile, mass and chi-square distance in
within ZI and within Z2 respectively. A complete analysis consists of
correspondence analysis. We shall thus enter into more detail than usual in
this section, which may be omitted by the reader who is less interested in the identifying further linear combinations U k = Zl a k and Vk = Z2 bk' each un­
theoretical background of correspondence analysis. correlated with previous linear combinations UI .,. Uk-l and VI'" Vk-l, which
have maximum correlation. If J 1 ~ J 2' then the procedure identifies at most
Fisher (1940) originally pointed out the relationship between the optimal
scaling and canonical correlation analysis of a contingency table, in the
K = JI canonical correlations Pk' k = 1 ... JI' in descending order. It is easy
to show that the associated vectors a k and b k of canonical weights can be
context ofthe data described in Section 9.1.
obtained from the left and right singular vectors of the matrix Sl//2S12S22112
(cL Appendix A). Specifically, the SVD of this matrix is:
A/gebra of canonica/ corre/ation ana/vsis
Slll/2S12S2F2 = WDpXT with WTW = XTX =1 (4.4.2)
Canonical correlation analysis was defined originally by Hotelling (1936).
where the singular values down the diagonal of D pare the canonical
The algebra of canonical correlation analysis of quantitative data is well
known and most comprehensive textbooks on multivariate analysis contain correlations, and the matrices A == [al'" aKJ and B == [h l ... bKJ of canonical
descriptions of the technique (e.g. Morrison, 1976; Tatsuoka, 1971; Mardia weights are simply: 2 (4.4.3)
et al., 1979). In addition there are a number of articles, mainly in the human A = Slll/2W and B = s2F X
sciences literature, where the analysis is applied to qualitative data in the The usual standardization of the singular vectors of W and X to be ortho­
form of a two-way contingency table (e.g. Srikantan, 1970; Holland et al., normal as in (4.4.2) implies that A and B are standardized as follows:
1980,1981; Kaiser and Cerny, 1981).
Data suitable for canonical correlation analysis are in a typical cases x A TS A=B TS 22 B=I (4.4.4)
ll
variables format (Fig. 4.2), with the variables naturally dividing themselves This is the usual standardization that the vectors of canonical scores are all
into two subsets of JI and J 2 variables respectively. Interest is focused on of unit variance (and uncorrelated). In general, standardizations of the form
linear relationships between the two sets of variables as observed across the A TS A = DI and B TS B = D 2 (where DI and D 2 are diagonal matrices
sample of 1 cases (thus generalizing multiple regression analysis where one of 22
withl positive
l
diagonals) do not affect the canonical correlations. Thus (4.4.4)
the subsets of variables is just a single variable). The objective is expressed is actually a set of identification conditions on the scale of the canonical
formally as finding linear combinations\u = Zla and v = Z2b of each set of weights and, equivalently, of the canonical scores in each U k and vk· In order
variables which have maximum correlation p: to identify the origins of the vectors of canonical scores, their means are con­
p = (a TS I2 b)/«aTS 11a )(b TS 22 b))1/2 (4.4.1 ) ventionally set at zero, which is equivalent to each variable (Le. each column)
of Z I and Z2 being centered with respect to its mean.
where S12, Sl1 and S22 are the covariance matrices between ZI and Z2'
./

111
4. Theory and Equivalent Approaches
110 Theory and Applications ofCorrespondence Analysis
FIG. 4.3. A bivariate indicator matrix (cf. Section 5.1) which has two sets of columns representing the respective categories of two discrete variables. (The diagram shows I cases as rows and J indicator, or dummy, variables as columns, partitioned into Z1 with J1 categories and Z2 with J2 categories.)

A two-way contingency table N is the condensation of a cases x variables data matrix of a particularly simple form (Fig. 4.3). The "variables" are called "dummy variables" (or "pseudovariables", cf. McKeon, 1966) or "indicator variables", and they indicate to which categories of two discrete variables each case belongs. Such a data matrix is often called an indicator matrix (de Leeuw, 1973), an incidence matrix (Hill, 1974) or a response pattern table (Nishisato, 1980). Application of canonical correlation analysis to such data breaks down because the covariance matrices S11 and S22 are singular. In fact if we let r and c be the vectors of the means of the columns of Z1 and Z2 respectively, and Dr and Dc be diagonal matrices of r and c respectively, then the covariance matrices are simply:

S11 = Dr - rr^T    S22 = Dc - cc^T    (4.4.5)

and are of ranks (J1 - 1) and (J2 - 1) respectively. We can use one of a number of generalized inverses to carry the classical theory through: for example, S11^- ≡ Dr^{-1} - 11^T and S22^- ≡ Dc^{-1} - 11^T. In practice, it might be more convenient to omit one dummy variable from each set (e.g. Holland et al., 1980, 1981), a strategy which can also be described as using generalized inverses (Example 4.6.5). However, in this particular situation it turns out that the complete solution to the problem is contained in the canonical correlation analysis of the data without prior centering of the columns of Z1 and Z2. Variances and covariances are defined with respect to the origin (not the mean) and the analysis yields a trivial maximal solution where the canonical correlation is 1 and associated canonical weight vectors are 1 (J1-vector of ones) and 1 (J2-vector of ones) respectively, after which the canonical solutions are those of the centered problem. Here the non-trivial solutions are centered and thus identified with respect to the origin by virtue of their uncorrelation with the trivial solution.

We can describe the geometry in two different, though equivalent, ways. The question which is of particular interest to us is the following: if we define A(K*) ≡ [a1 ... a_{K*}] and B(K*) ≡ [b1 ... b_{K*}] as the respective submatrices of the first K* canonical weight vectors (i.e. the first K* columns of A and B of (4.4.3) respectively), what interpretation can we give to a plot of the rows of A(K*) and of B(K*) together in a K*-dimensional Euclidean space (e.g. a 2-dimensional plot, when K* = 2)?

Geometry of the columns

The first way of describing canonical correlation analysis geometrically is to think of the columns of Z1 and Z2 as points in I-dimensional Euclidean space (Kendall (1961, p. 61)). In the case of quantitative data the columns of Z1 and Z2 are centered, which in geometric terms means that they have been projected orthogonally onto the (I - 1)-dimensional subspace orthogonal to 1 (Fig. 4.4). The sets of all linear combinations of the columns of Z1 and of Z2 form J1- and J2-dimensional subspaces respectively, and the cosine of the angle between any two vectors is equal to their correlation. Hence canonical correlation analysis is the search for any two vectors, u and v, in these respective subspaces, which subtend the smallest angle. The procedure is repeated in the (J1 - 1)- and (J2 - 1)-dimensional subspaces orthogonal to the canonical score vectors u and v to obtain a second canonical correlation and score vectors, and so on. Clearly if J1 ≤ J2, then we would eventually end up with a set of J1 canonical score vectors in each subspace, the first set "explaining" all J1 dimensions of the column space of Z1, while the second set leaves J2 - J1 dimensions of the column space of Z2 "unexplained".

FIG. 4.4. Geometry in I-dimensional space of the centered variables (columns) in a conventional canonical correlation analysis. (The diagram shows the (I - 1)-dimensional subspace orthogonal to 1, containing the J1-dimensional subspace of the first set of centred columns and the J2-dimensional subspace of the second set.)
In this framework the correlations between the new canonical variables and the original variables are simply the angle cosines between the canonical score vectors and the columns of Z1 and Z2.

When Z1 and Z2 are the (uncentered) indicator matrices of Fig. 4.3, then the vector 1, being the sum of the columns of Z1 and likewise of Z2, is actually common to both subspaces (Fig. 4.5). Centering, that is projecting onto the subspace orthogonal to 1, would reduce the dimensionality of each subspace and we would not be able to identify the score vectors in this lost dimension. However, if we omit centering the columns it is clear that the highest canonical correlation is 1, when u and v are both collinear with the vector 1 and thus subtend a zero angle. This is the trivial solution of which we have spoken previously. Subsequent canonical score vectors are orthogonal to 1 in each subspace, and are thus centered as required. If J1 ≤ J2, there will now be only J1 - 1 non-trivial canonical correlations for such data, as opposed to J1 in the usual case.

FIG. 4.5. Geometry in I-dimensional space of the uncentered dummy variables in the canonical correlation analysis of the indicator matrix of Fig. 4.3. The vector 1 is common to the subspaces of both sets of variables. (The diagram shows the J1- and J2-dimensional subspaces of the two sets of uncentred columns intersecting in the direction of 1.)

Geometry of the rows

The second way of studying canonical correlation analysis geometrically is to think of the rows of Z1 and Z2 as points in corresponding spaces of dimensionalities J1 and J2 respectively (Fig. 4.6). This framework is often described as "Q-mode" to indicate geometry of the cases, whereas the previous framework is called "R-mode" to indicate geometry of the variables. In the present situation there is a strong relationship between these two frameworks; in fact, in the case of dummy variables we shall show that there is hardly any difference between the row and column geometries.

We call the spaces of Fig. 4.6 Mahalanobis spaces, because the metrics imposed on these spaces are defined by the inverse covariance matrices S11^{-1} and S22^{-1}, conventionally called Mahalanobis metrics (see, for example, Mardia et al., 1979, Section 2.2.3). This can be thought of as "sphericizing" the two clouds of points so that variances of points along any dimension in each respective cloud are the same, namely 1 in this case. The columns S11 a_k of S11 A and S22 b_k of S22 B can be considered orthonormal basis vectors in these two spaces, since:

(S11 A)^T S11^{-1} (S11 A) = (S22 B)^T S22^{-1} (S22 B) = I

which is equivalent to standardization (4.4.4). The co-ordinates of the rows of Z1 and of Z2 with respect to S11 a_k and S22 b_k, respectively, are the corresponding vectors of canonical scores:

Z1 S11^{-1} (S11 a_k) = Z1 a_k = u_k    (4.4.6)
Z2 S22^{-1} (S22 b_k) = Z2 b_k = v_k    (4.4.7)

Thus the Q-mode interpretation of canonical correlation analysis is that it investigates the extent to which the two clouds of points occupy similar positions in their respective Mahalanobis spaces, and in the process identifies canonical axes of greatest positional correlation. A worthwhile discussion of this topic in the context of an actual application is given by Falkenhagen and Nash (1978).

FIG. 4.6. ("Q-mode") J1- and J2-dimensional geometries of the rows of Z1 and Z2 respectively (cf. the "R-mode" geometry of Figs 4.4 and 4.5). The co-ordinates of the row points with respect to the canonical axes are the canonical scores. In these geometries the variables are represented as "unit points", with the canonical weights as co-ordinates.
Remembering now that our present interest is directed more towards a geometry of the rows of A(K*) and B(K*), let us define J1 and J2 "unit" point vectors respectively in the Mahalanobis spaces of Fig. 4.6. By a unit point e_i, for example, we mean a vector of zeros except for a 1 in the ith position. The co-ordinate of e_i with respect to a typical canonical axis S11 a_k of the first space is simply the ith element of a_k: e_i^T S11^{-1}(S11 a_k) = a_ik. Thus for K* = 2, say, the plots of the J1 rows of A(2) and the J2 rows of B(2) give displays where the points can be interpreted as the positions of the fixed unit points with respect to the plane of the first two canonical axes in each space.

We can amalgamate these two displays into one joint display if we take care in the interpretation of the between-set positions. From (4.4.2) and (4.4.3) we see that the two sets of co-ordinates A(K*) and B(K*) are related as follows (where K* = 2, say):

S11^{-1} S12 B(K*) = A(K*) D_rho(K*)    (4.4.8)
S22^{-1} S21 A(K*) = B(K*) D_rho(K*)    (4.4.9)

where D_rho(K*) denotes the K* x K* diagonal matrix of the first K* canonical correlations rho_1 ... rho_{K*}. If we define R ≡ S11^{-1} S12, for example, then for K* = 2 a particular point [a_{i1} a_{i2}]^T of the first cloud is seen to be a linear combination sum_j r_ij [b_{j1} b_{j2}]^T of the J2 points of the other cloud, followed by an expansion in scale of 1/rho_1 and 1/rho_2 on respective dimensions. Notice that the columns of the J1 x J2 matrix R are the vectors of regression coefficients in the multiple regressions of the respective columns of Z2 on the set of columns of Z1. The interpretation of (4.4.8) (and similarly (4.4.9)) is not easy, though it is apparent that a specific point [a_{i1} a_{i2}]^T will tend away from the origin of the display in the direction of those points [b_{j1} b_{j2}]^T corresponding to variables which exhibit large positive regression coefficients on variable i.

In the special case of the (uncentered) indicator matrices, the situation is considerably simplified. The "uncentered" covariance matrices are simply Dr and Dc respectively. The I row points are coincident with the unit points in each space and are distributed over these points in proportions given by the respective elements of r and c (Fig. 4.7). The trivial solution takes the form of trivial canonical axes r = Dr 1 and c = Dc 1 in the respective spaces. With respect to these axes the co-ordinates of the rows of Z1 and Z2 are Z1 Dr^{-1} r = Z1 1 = 1 and Z2 Dc^{-1} c = Z2 1 = 1, and the uncentered correlation of these two vectors of identical elements is 1. The non-trivial solutions thus appear from the second canonical axes onwards and these are all orthogonal to their respective trivial axes:

r^T Dr^{-1}(Dr a_k) = 0    i.e.    r^T a_k = 0    k = 1 ... J1 - 1
c^T Dc^{-1}(Dc b_k) = 0    i.e.    c^T b_k = 0    k = 1 ... J1 - 1

(where we number the non-trivial solutions from 1 onwards, with the trivial solution numbered as 0, and assume, as always, that J1 ≤ J2). Notice that these particular centerings r^T a_k = 0 and c^T b_k = 0 of the non-trivial vectors of canonical weights imply that the variances of the canonical scores are independent of centering the columns of Z1 and Z2: a_k^T Dr a_k = a_k^T(Dr - rr^T)a_k = a_k^T S11 a_k and similarly b_k^T Dc b_k = b_k^T S22 b_k (cf. (4.4.5)). The non-trivial canonical correlations are likewise independent of the centering, since the between-set covariance matrix S12 is:

S12 = (1/n..)Z1^T Z2 - rc^T = P - rc^T    (4.4.10)

(where P ≡ (1/n..)N = (1/n..)Z1^T Z2 is the correspondence matrix, cf. (4.1.1)), so that:

rho_k = a_k^T S12 b_k = a_k^T(P - rc^T)b_k = a_k^T P b_k    (4.4.11)

Equations (4.4.8) and (4.4.9) can similarly be expressed in terms of uncentered covariance matrices:

Dr^{-1} P B(K*) = A(K*) D_rho(K*)    (4.4.12)
Dc^{-1} P^T A(K*) = B(K*) D_rho(K*)    (4.4.13)

which are just the transition formulae of (4.1.16) in K*-dimensional space. Here the matrices of regression coefficients of each set of dummy variables on the other set are R = Dr^{-1} P and C = Dc^{-1} P^T, the matrices of row and column profiles respectively (cf. (4.1.4)), and the canonical correlations rho_1, rho_2, ... are the square roots (lambda_1)^{1/2}, (lambda_2)^{1/2}, ... (denoted previously by mu_1, mu_2, ..., cf. (4.1.18)) of the principal inertias.

FIG. 4.7. As Fig. 4.6, when Z1 and Z2 consist of dummy variables. Here there is no distinction between the display of the rows and columns. For example, a proportion r_i of the rows of Z1 (i.e. I r_i rows) coincide with the unit point e_i representing column i.

Summary

In summary, we have drawn attention to the geometric relationship between the variables and the cases in canonical correlation analysis. In the Q-mode geometry of the cases, the variables can be represented by unit points in the space of the cases and projected onto canonical subspaces along with the cases (this would be a canonical correlation "biplot", cf. Appendix A). When the data are in the special form of an indicator matrix, the cases themselves occur only at the unit points and there is, in fact, no geometric difference between the cases and the variables. The concept of assigning a mass to the categories of the discrete variables (i.e. columns of the indicator matrix) is justified here by the "piling up" of the cases at each unit point representing the variable. The chi-square metrics in the two corresponding spaces of the cases are equivalent to Mahalanobis metrics, and the principal inertias of the correspondence analysis are squared canonical correlations. Finally, from (4.4.12) and (4.4.13) and the standardization (4.4.4), it is clear that the display of the rows of A(K*) and B(K*), the matrices of canonical weights, is the same as the K*-dimensional display of the row and column profiles in standard co-ordinates obtained from the correspondence analysis of N. The principal co-ordinates are thus the canonical weights scaled by the canonical correlations: F(K*) = A(K*) D_rho(K*) and G(K*) = B(K*) D_rho(K*).

In Example 4.6.6 we illustrate the results of this section by recovering the correspondence analysis of Table 3.1 from the canonical correlation analysis of the associated indicator matrix (cf. Table 5.1).

4.5 SIMULTANEOUS LINEAR REGRESSIONS

Yet another definition of correspondence analysis is the so-called "simultaneous linear regression" method, which is usually associated with the author Lingoes (Lingoes, 1964, 1968, 1977). This approach is of historical importance because it is the context in which Hirschfeld (1935) first derived the algebra of the technique which was to become so widespread under so many different guises.

Once again we consider the data of Table 3.1 and the initial (non-optimal) scale values for the columns and derived row scores, as in Fig. 4.1. Instead of plotting the row and column values on the same (or parallel) scales, we plot them "against" each other, in a typical "x-y" plot (Fig. 4.8). In the original context of these data the 4 non-smoking senior managers, for example, are represented in Fig. 4.8 by 4 points coincident at [0, 1.27]^T. All 193 people constituting the contingency table are thus represented at 20 discrete points of this plot, and they "pile up" at each point in the frequencies given by the contingency table.

FIG. 4.8. Plot of the initial scale values (horizontal axis) against the derived row scores (vertical axis). The size of each point is roughly proportional to the respective element of the data matrix (Table 3.1 or Table 4.1). The centroid of each vertical set of points is indicated by a x and it is clear that these are not exactly linear. Because the row scores are the centroids of each horizontal set of points in this plot, the regression of scale values on scores is exactly linear with a slope of 1.

If we performed least squares linear regression of y on x this would amount to concentrating the mass of each vertical column of points at their centroid, and fitting a straight line to these weighted points. This is again a consequence of the fact that the centroid is the closest point to a set of points in terms of least squares. Symmetrically we can think of regressing x on y by least squares, that is minimizing the horizontal sum of squared deviations to a line, which amounts to fitting the line to the centroids of the horizontal rows of points. Is there a solution for a and b in this situation such that, in Hirschfeld's own words, "both regressions are linear", that is the centroids of the columns and of the rows lie exactly on the respective regression lines?

It turns out that such a solution does always exist and that the first non-trivial solution of correspondence analysis provides this solution (Fig. 4.9). This result is easily shown to be another way of interpreting the pair of transition formulae, for example (4.2.4) and (4.2.5).

For a candidate solution a and b the centroids of the columns and rows are Dc^{-1} P^T a and Dr^{-1} P b respectively. If a and b are a solution pair of the symmetric transition formulae then Dc^{-1} P^T a = mu b and Dr^{-1} P b = mu a, so that Dc^{-1} P^T a is linear against b (with slope mu) and Dr^{-1} P b is linear against a (with slope 1/mu). Of course any pair of solutions has this property, but our objective is to find the solution to the simultaneous linear regression problem for which the two lines are as collinear as possible. This means that we want to minimize the angle between them, 1/mu - mu = (1 - mu^2)/mu, which is minimized by maximizing mu. Thus the solution provided by the first principal axis of correspondence analysis is the one we require. From our discussion in Section 4.4 it is clear why this objective is satisfied when the canonical correlation between the two sets of scores is maximized, because the canonical correlation is precisely the correlation of all the points that make up Fig. 4.9. Notice that in Fig. 4.9 the transition between the scale values and scores is not a symmetric one, but that this does not affect the objective of finding the most collinear simultaneous linear regressions (cf. Fig. 4.9 caption).

FIG. 4.9. As Fig. 4.8, for the optimal scale values and scores. The regression of scores on scale values is now an exact linear one, as shown by the dotted line through the centroids of the vertical sets of points. Because the transition from scores a back to scale values b is of the form b = (1/mu^2) Dc^{-1} P^T a, it follows that the slope of this regression line is mu^2, the first principal inertia, which is 0.0748, so that the angle of slope is tan^{-1} 0.0748 = 4.3 degrees. The regression of scale values on scores is still exactly linear with slope 1, since a = Dr^{-1} P b. The objective has still been to find the "most collinear" simultaneous regressions (i.e. minimize 1 - mu^2 in this case).

4.6 EXAMPLES

4.6.1 Biplot interpretation of joint display of the row and column points

In a biplot of a matrix Y ≡ [y_ij], each row and each column are represented by vectors in a low-dimensional Euclidean space such that the "between-sets" (row-column) scalar products approximate the elements y_ij, where the approximation is traditionally in the sense of least-squares or weighted least-squares (see Appendix A). Show that in an "asymmetric" correspondence analysis display of the rows and columns (for example where the rows are displayed in principal co-ordinates and the columns in standard co-ordinates) the between-set scalar products approximate the quantities (p_ij - r_i c_j)/(r_i c_j). In what specific way are these quantities approximated?

Solution
When the rows are represented by the rows of F and the columns by the rows of Gamma, the reconstitution formula (4.1.27) may be written as:

p_ij = r_i c_j (1 + sum_k f_ik gamma_jk)

That is:

(p_ij - r_i c_j)/(r_i c_j) = sum_k f_ik gamma_jk    (4.6.1)

The right-hand side of (4.6.1) is the scalar product of the ith row of F and the jth row of Gamma in the full space. Therefore in the principal K*-dimensional subspace, the scalar products are approximations:

(p_ij - r_i c_j)/(r_i c_j) ≈ sum_k^{K*} f_ik gamma_jk    (4.6.2)

The sense of the approximation is weighted least-squares. More specifically, if we denote the quantities (p_ij - r_i c_j)/(r_i c_j) by y_ij then the function which is being minimized is:

sum_i sum_j r_i c_j (y_ij - sum_k^{K*} u_ik v_jk)^2    (4.6.3)

where sum_k u_ik v_jk is the scalar product between row and column points in K*-dimensional space, the points' co-ordinates being the variables of the minimization. The weights of the least-squares approximation are thus r_i c_j on the squared residuals, in other words the residuals themselves are weighted by r_i^{1/2} c_j^{1/2}, and it is easily shown that this is identical to the approximation implied by the decomposition (4.1.9). From the low rank approximation theorem (cf. (A.1.12), (2.5.8)), the rank K* approximation of P - rc^T in the metrics Dr^{-1} and Dc^{-1} minimizes trace {Dr^{-1}(P - rc^T - S) Dc^{-1}(P - rc^T - S)^T} over all matrices S of rank not greater than K*. This can be rewritten as trace {Dr^{1/2}(Y - S~) Dc^{1/2} Dc^{1/2}(Y - S~)^T Dr^{1/2}}, where S~ ≡ [s~_ij] = Dr^{-1} S Dc^{-1} is of the same rank as S. The set of scalar products sum_k u_ik v_jk is just a reparametrization of the elements s~_ij and the optimal s~_ij are the scalar products sum_k f_ik gamma_jk.

Comment
The optimal matrix of scalar products is F(K*) Gamma(K*)^T, where F(K*) and Gamma(K*) are the first K* columns of F and Gamma respectively. While this matrix may be unique, its decomposition as the product of two matrices is not. For example, F(K*) Gamma(K*)^T = Phi(K*) G(K*)^T, so that exactly the same biplot interpretation and sense of the approximation is valid in the dual asymmetric display. This is again the question of how we choose to identify the solutions, which we have discussed at great length throughout this chapter in different guises (see also Gabriel, 1978). In the display of the rows of F(K*) and Gamma(K*) the rows lie exactly at the barycentres of the columns, which may be desirable in certain situations. The standardization is prescribed by the data analyst, not by the analysis. By default the display is "symmetric" and in principal co-ordinates.

4.6.2 Invariance of centroid with respect to reciprocal averaging

Suppose y is a set of scale values for the columns of a correspondence matrix P and that the centroid of y is c^T y = beta. Show that the scores x = Dr^{-1} P y are still centered at beta.

Solution
The centroid of the scores is r^T x = r^T Dr^{-1} P y = 1^T P y = c^T y = beta.

Comment
The variance of the scores as defined above is clearly less than that of the scale values and is, in general, less than or equal to the variance of the scale values multiplied by the largest principal inertia lambda_1 of P. Only when the scale values are optimal (in the sense of Section 4.3) does the row score variance reach this upper limit.

4.6.3 Vector derivative of a quadratic form

Consider the quadratic form y^T A y, where A is a J x J symmetric matrix. Show that the vector derivative of y^T A y with respect to y is 2Ay.

Solution
By definition d(y^T A y)/dy is the vector of scalar derivatives d(y^T A y)/dy_j, j = 1 ... J. The terms involving y_j in y^T A y are a_jj y_j^2 (with derivative 2 a_jj y_j) and then J - 1 terms of the form (a_jj' + a_j'j) y_j y_j', j' = 1 ... J, j' not equal to j. Since A is symmetric these latter J - 1 terms are of the form 2 a_jj' y_j y_j', with derivatives 2 a_jj' y_j'. Hence d(y^T A y)/dy_j is 2 sum_j' a_jj' y_j', which is precisely the jth element of 2Ay.

4.6.4 Correlation between dummy variables and canonical variates

Let Z ≡ [Z1 Z2] be a bivariate indicator matrix and let a and b be vectors of canonical weights associated with the highest (non-trivial) canonical correlation rho between the columns of Z1 and Z2. Show that the correlation coefficient between the ith column of Z1 and the vector Z1 a of canonical scores is phi_i r_i^{1/2}/(1 - r_i)^{1/2}, where phi_i is the standard co-ordinate of the ith row of the contingency table N ≡ Z1^T Z2 on the first principal axis of the correspondence analysis. Show furthermore that the correlation coefficient between the ith column of Z1 and the vector Z2 b of canonical scores is f_i r_i^{1/2}/(1 - r_i)^{1/2}, where f_i is the ith first principal co-ordinate. (Thus, since f_i = rho phi_i, the within-set and between-set correlations differ only by a scaling factor, the canonical correlation.)

Solution
In general, the correlation between an I-vector z of 0s and 1s and a continuous variable x, called the "point-biserial" correlation, simplifies to:

{z^T x - (z^T 1) x_bar} / {(z^T 1)(I - z^T 1) s_x^2}^{1/2}

where x_bar and s_x^2 are the mean and variance of x. Since correlations are independent of centering and scaling we can use canonical scores Z1 a and Z2 b which are centered and have variance 1, that is a = phi and b = gamma (the first standard co-ordinate vectors), so that the correlation simplifies further as (z^T x)/{(z^T 1)(I - z^T 1)}^{1/2}. If z is the ith column z_i of Z1, then z_i^T 1 = I r_i, where r_i is the usual mass of the ith row of N ≡ Z1^T Z2. For within-set correlations x is Z1 phi and z_i^T Z1 phi = I r_i phi_i, so that the correlation is:

(I r_i phi_i)/{(I r_i)(I - I r_i)}^{1/2} = {r_i/(1 - r_i)}^{1/2} phi_i

For between-set correlations x is Z2 gamma and z_i^T Z2 gamma = I r_i {(z_i^T Z2)/(I r_i)} gamma = I rho r_i phi_i, by the transition formula from columns to rows, since (z_i^T Z2)/(I r_i) is the ith row profile of N: rho phi_i = {(z_i^T Z2)/(I r_i)} gamma. Hence the between-set correlation is just rho (= (lambda_1)^{1/2}) times the within-set correlation.

Comment
To illustrate these results, consider the contingency table of Table 8.5. Correspondence analysis yields a first principal axis with inertia 0.1043 and a principal co-ordinate for the first column, say, of g1 = 0.443, with mass 124/390 = 0.318. The above formulae hold in a symmetric fashion for the between- and within-set correlations relative to the columns of Z2, so that these correlations are respectively: 0.443(0.318)^{1/2}/(0.682)^{1/2} = 0.303 and (dividing the above by the canonical correlation, the square root of the inertia) 0.937. These figures agree with the correlations computed by Holland et al. (1981, Table 2) up to a change in sign. In their case they compute the canonical solutions using the large indicator matrix and then actually compute correlations in the usual way between the vectors of dummy variables and scores. Our object in this example is to show how much simpler it can be to obtain their results, working directly on the contingency table, and to show that the between-set ("interset") correlations are merely a scaled-down version of the within-set ("intraset") correlations, a fact which the above authors appear to ignore.

4.6.5 Generalized inverses in canonical correlation analysis of dummy variables

Let Z ≡ [Z1 Z2] be an I x J indicator matrix and consider the canonical correlation analysis of the two sets of J1 and J2 columns (dummy variables). If the respective covariance matrices S11 and S22 of the two sets of variables were non-singular then the vectors a and b of optimal canonical weights are eigenvectors of the matrices S11^{-1} S12 S22^{-1} S21 and S22^{-1} S21 S11^{-1} S12 respectively, associated with the highest eigenvalue rho^2, with the usual identification/standardization a^T S11 a = b^T S22 b = 1. (This is equivalent to (4.4.2-4.4.4).) Of course in the present case the covariance matrices (4.4.5) of the sets of dummy variables are singular and their inverses do not exist. However, certain generalized inverses of these matrices can be defined and substituted for the ordinary inverses to allow the usual theory to remain applicable.

(i) Show that the generalized inverses S11^- ≡ Dr^{-1} - 11^T and S22^- ≡ Dc^{-1} - 11^T lead to the canonical weight vectors a1 and b1 which are identical to the first standard row and column co-ordinates in the correspondence analysis of N ≡ Z1^T Z2.

(ii) Consider the generalized inverses:

S11* = [ Dr*^{-1}  0 ]        S22* = [ Dc*^{-1}  0 ]
       [ 0         0 ]               [ 0         0 ]

where Dr* and Dc* are the diagonal matrices of any (J1 - 1) and (J2 - 1) elements respectively of r and c. For ease of exposition, we have assumed that the last elements r_{J1} and c_{J2} are respectively excluded. Show that these generalized inverses lead to solutions of the form a* = [a1* ... a*_{J1-1} 0]^T and b* = [b1* ... b*_{J2-1} 0]^T and that these solutions yield those obtained in (i) as follows:

a_{J1} = -sum_{i=1}^{J1-1} r_i a_i*    a_i = a_i* + a_{J1},  i = 1 ... J1 - 1
b_{J2} = -sum_{j=1}^{J2-1} c_j b_j*    b_j = b_j* + b_{J2},  j = 1 ... J2 - 1    (4.6.4)

Solution
(i) S11^- S12 S22^- S21 = (Dr^{-1} - 11^T)(P - rc^T)(Dc^{-1} - 11^T)(P - rc^T)^T = Dr^{-1} P Dc^{-1} P^T

all other terms cancelling out. This is the matrix whose eigenvectors yield the co-ordinates of the row profiles in the correspondence analysis of Z1^T Z2. The standardization of the eigenvectors a^T S11 a = a^T(Dr - rr^T)a = 1 is that of the standard co-ordinates (a^T Dr a = 1), since r^T a = 0 by the usual orthogonality conditions with the trivial eigenvector 1.

A similar argument applies to S22^- S21 S11^- S12, which turns out to be Dc^{-1} P^T Dr^{-1} P, yielding the standard co-ordinates of the column profiles of Z1^T Z2.

(ii) It is obvious that the generalized inverses S11* and S22* will nullify the last element of each eigenvector. Rather than derive the relationship (4.6.4) it is easier to show directly that the a_i's and b_j's satisfy the relevant conditions. For example, the centering of the canonical weights a1 ... a_{J1} is correct:

r^T a = sum_i r_i a_i
      = sum_{i=1}^{J1-1} r_i (a_i* + a_{J1}) + r_{J1} a_{J1}
      = sum_{i=1}^{J1-1} r_i a_i* + (1 - r_{J1}) a_{J1} + r_{J1} a_{J1}
      = -a_{J1} + a_{J1}
      = 0

The unit standardizations of a and b and the fact that they yield the largest canonical correlation rho = a^T P b follow in a similar fashion, but are a little more tedious to show.

Comments
There is an abundant literature on the subject of generalized inverses. Pringle and Rayner (1971) give a complete treatment of the subject and most textbooks with a detailed treatment of matrix algebra will include a short section on the topic, for example Graybill (1976, Section 1.5).

4.6.6 Recovering a correspondence analysis from the canonical correlation analysis of the indicator matrix

Let Z ≡ [Z1 Z2] be the 193 x 9 indicator matrix corresponding to the 5 x 4 contingency table Z1^T Z2 of Table 3.1. Perform a conventional canonical correlation analysis on Z* ≡ [Z1* Z2*], where a column of Z1 and of Z2 has been omitted. Use the results (4.6.4) to recover eventually the principal co-ordinates of the rows and columns of Z1^T Z2, given in the first columns of (3.1.3) and (3.2.3) respectively.

Solution
We use program BMDP6M (Dixon et al., 1981) to perform the canonical correlation analysis of Z* (193 x 7), where the last columns of Z1 and Z2 have been omitted. The resultant canonical weights, with the corresponding "masses" r_i and c_j written below them, are:

a1* = -0.4936   a2* = -1.6782   a3* = 0.6548   a4* = -1.5833        b1* = 2.5064   b2* = 0.7089   b3* = 0.3555
r1 = 0.0570     r2 = 0.0933     r3 = 0.2642    r4 = 0.4560          c1 = 0.3161    c2 = 0.2332    c3 = 0.3212

and the largest canonical correlation is 0.2734. From (4.6.4) the standard co-ordinates are obtained by first computing:

a5 = 0.7337        b4 = -1.0718

so that

a = [0.2401  -0.9445  1.3885  -0.8496  0.7337]^T        b = [1.4346  -0.3629  -0.7163  -1.0718]^T

If these are multiplied by the canonical correlation then the first principal co-ordinates of the correspondence analysis are indeed obtained.

Comment
As in Example 4.6.4 it would clearly be a waste of time to perform the computations on the indicator matrix, and we include this example to illustrate the relationship and to show how much easier it is to work on the contingency table. This fact has been known for a long time, at least since the publication of the excellent monograph by McKeon (1966). Yet, surprisingly, it is still not widely known. Many authors, like Holland et al. (1980, 1981), still proceed via the indicator matrix, while Kaiser and Cerny (1981) think that they have discovered the relationship with the SVD of the contingency table for the first time. Mardia et al. (1979) treat the canonical correlation analysis of dummy variables and correspondence analysis (in their case, reciprocal averaging) in separate sections of their book without cross-referencing.
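The recovery step (4.6.4) is easily reproduced from the weights printed above; a minimal numpy sketch of ours:

```python
import numpy as np

# Canonical weights and masses from the BMDP6M output quoted above.
a_star = np.array([-0.4936, -1.6782, 0.6548, -1.5833])
b_star = np.array([2.5064, 0.7089, 0.3555])
r = np.array([0.0570, 0.0933, 0.2642, 0.4560])   # masses of the retained rows
c = np.array([0.3161, 0.2332, 0.3212])           # masses of the retained columns
rho = 0.2734                                      # largest canonical correlation

a_J1 = -(r @ a_star)                  # omitted element, by (4.6.4): 0.7337
b_J2 = -(c @ b_star)                  # omitted element, by (4.6.4): -1.0718
a = np.append(a_star + a_J1, a_J1)    # first standard row co-ordinates
b = np.append(b_star + b_J2, b_J2)    # first standard column co-ordinates
print(np.round(rho * a, 4))           # first principal row co-ordinates
print(np.round(rho * b, 4))           # first principal column co-ordinates
```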

4.6.7 Correspondence matrices with a block structure

Suppose that a correspondence matrix has a block structure of the following form:

P = [ P(1)   0   ]    (I1 and I2 rows; J1 and J2 columns)
    [ 0    P(2)  ]

where the total of P is 1 (by definition) and the totals of P(1) (I1 x J1) and P(2) (I2 x J2) are denoted by t1 and t2 respectively, so that t1 + t2 = 1. Let F(1), G(1), D_lambda(1) and F(2), G(2), D_lambda(2) denote the complete results of the respective correspondence analyses of P(1) and P(2).

Show that the largest non-trivial principal inertia in the correspondence analysis of P is 1 and that the other principal inertias are those of D_lambda(1) and D_lambda(2) arranged in descending order. Show that the principal co-ordinates F and G in the analysis of P are of the following form (where we have ignored the re-ordering of the columns in terms of the principal inertias):

F = [ alpha1 1   beta1 F(1)   0          ]        G = [ alpha1 1   beta1 G(1)   0          ]
    [ alpha2 1   0            beta2 F(2) ]            [ alpha2 1   0            beta2 G(2) ]

(with I1 and I2 rows in the blocks of F, and J1 and J2 rows in those of G), associated with principal inertias:

D_lambda = [ 1   0             0           ]
           [ 0   D_lambda(1)   0           ]
           [ 0   0             D_lambda(2) ]

where the scalars alpha1, alpha2, beta1 and beta2 depend only on the values of t1 and t2.

Solution
If r(1), c(1) and r(2), c(2) are the masses in the respective analyses of P(1) and P(2), then r and c are:

r = [ t1 r(1) ]        c = [ t1 c(1) ]
    [ t2 r(2) ]            [ t2 c(2) ]

(since t1 + t2 = 1). The matrix of row profiles R = Dr^{-1} P is just the similarly blocked matrix of row profiles R(1) and R(2) of the submatrices:

R = [ R(1)   0   ]
    [ 0    R(2)  ]

with a similar result for the matrix C of column profiles. Since

R [ alpha1 1 ] = [ alpha1 R(1) 1 ] = [ alpha1 1 ]        and        C [ alpha1 1 ] = [ alpha1 1 ]
  [ alpha2 1 ]   [ alpha2 R(2) 1 ]   [ alpha2 1 ]                     [ alpha2 1 ]   [ alpha2 1 ]

(noting the different orders of the vector 1 on the left- and right-hand sides of these expressions), the largest principal inertia is 1 and the values of alpha1 and alpha2 satisfy the centering and standardization conditions [alpha1 1^T  alpha2 1^T] r = alpha1 t1 + alpha2 t2 = 0 and [alpha1 1^T  alpha2 1^T] Dr [alpha1 1^T  alpha2 1^T]^T = alpha1^2 t1 + alpha2^2 t2 = 1, which imply that alpha1 = -(t2/t1)^{1/2} and alpha2 = (t1/t2)^{1/2}.

In a similar fashion we can show that beta1 = t1^{-1/2} and beta2 = t2^{-1/2} and that the columns of F and G satisfy the conditions F^T Dr F = D_lambda = G^T Dc G and r^T F = 0^T = c^T G.
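A small numerical experiment makes the result concrete. The sketch below is ours (assuming numpy, with arbitrary random tables, illustrative only): it builds a block-diagonal correspondence matrix and compares its principal inertias with those of the two blocks.

```python
import numpy as np

def principal_inertias(P):
    """Principal inertias of a correspondence matrix P (total of P is 1):
    squared singular values of the standardized residual matrix."""
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return np.linalg.svd(S, compute_uv=False) ** 2

rng = np.random.default_rng(0)
N1 = rng.integers(1, 9, (3, 4)).astype(float)
N2 = rng.integers(1, 9, (4, 5)).astype(float)
t1, t2 = 0.4, 0.6                          # block totals, t1 + t2 = 1
P = np.block([[t1 * N1 / N1.sum(), np.zeros((3, 5))],
              [np.zeros((4, 4)), t2 * N2 / N2.sum()]])

# Largest non-trivial inertia of P is 1; the remainder are the inertias
# of the two subanalyses, in descending order.
print(np.round(principal_inertias(P), 4))
print(np.round(principal_inertias(N1 / N1.sum()), 4))
print(np.round(principal_inertias(N2 / N2.sum()), 4))
```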

5
Multiple Correspondence Analysis

In Chapters 3 and 4 we have discussed the simplest and most fundamental example of correspondence analysis, when the data are in the form of a two-way contingency table. In Section 4.4 we showed that this is equivalent to the canonical correlation analysis of the data in the form of an indicator matrix. The rows of the indicator matrix correspond to the observational units (individuals, cases, subjects, ...) of the study, while the columns correspond to the categories of the two discrete variables defining the rows and columns of the contingency table. Each row has two non-zero elements (usually ones) which indicate the categories into which the observational unit falls.

In this chapter we first demonstrate the close relationship between the correspondence analysis of the contingency table and that of the indicator matrix itself (Section 5.1). We then consider the correspondence analysis of a more general indicator matrix, where more than two discrete variables have been observed on each unit (Section 5.2). This matrix consists of Q sets of columns, with Q ones in each row. The correspondence analysis of this matrix is called multiple correspondence analysis and is equivalent to a particular generalization of canonical correlation analysis when there are more than two sets of variables.

In Section 5.3 we discuss the analysis of the responses of a group of individuals to a questionnaire, and specifically how to treat such data in the presence of non-responses. Here we often relax the strict zero/one logical coding of the indicator matrix and use values between 0 and 1 as well, which is known as "fuzzy" coding. This more general type of coding is also useful in transforming "heterogeneous" data, that is data on different types of variables, into a form which is suitable for collective input to multiple correspondence analysis (Section 5.4). In fact, since most data can be reduced to qualitative form, practically any type of data matrix can eventually be explored by correspondence analysis.

The examples of Section 5.5 are mostly proofs of results stated in the previous sections.

5.1 BIVARIATE INDICATOR MATRICES

An example

The 5 x 4 contingency table of Table 3.1 can be considered as the condensation of a 193 x 9 indicator matrix, each row of which consists of 7 zeros and 2 ones. The columns of the indicator matrix refer to the 5 categories of the row discrete variable (i.e. "staff group" in the example) and the 4 categories of the column discrete variable ("smoking"), so that the ones indicate the categories to which each person belongs. For example, the 4 senior managers who do not smoke are coded as 4 identical rows of [1 0 0 0 0 1 0 0 0] in the indicator matrix (the first 4 rows of Table 5.1). Because there are only two discrete variables, the only information that we lose by condensing the indicator matrix in the form of a two-way contingency table is the identification of each person in the study.

We have already discussed the theory of canonical correlation analysis of such an indicator matrix in Section 4.4 and have illustrated that theory on the present data in Example 4.6.6. Our attention is now turned to the correspondence analysis of the indicator matrix and its relationship to our previous analysis of the contingency table. Notice that in canonical correlation analysis the geometric framework of the columns is two dual subspaces (of dimensionalities 5 and 4 respectively in our example), while the correspondence analysis of the indicator matrix presumes the columns to lie in a single space. The 2-dimensional display of the columns of the indicator matrix is shown in Fig. 5.1 as well as some selected rows, all points being displayed in principal co-ordinates. (Remember that the columns are vectors originally in 193-dimensional space, and lie in a subspace of dimensionality at most 8.) Apart from the large change in scale, the configuration of the column points closely resembles the configuration of the rows and columns of the contingency table, given in Fig. 3.3. What seems to have happened is that the display has been stretched out along the second axis. Indeed, we shall prove later that relative positions along these axes remain identical apart from different changes in scale along the axes.
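The coding is mechanical; the following sketch of ours (assuming numpy) expands a contingency table into its bivariate indicator matrix and checks that the condensation recovers the table. The figures used for N are the staff-by-smoking counts implied by the masses quoted in Example 4.6.6; they are an assumption, not a quotation of Table 3.1 itself.

```python
import numpy as np

# Staff groups (5 rows) x smoking categories (4 columns), 193 cases in all,
# reconstructed from the masses quoted in Example 4.6.6 (an assumption).
N = np.array([[4, 2, 3, 2],
              [4, 3, 7, 4],
              [25, 10, 12, 4],
              [18, 24, 33, 13],
              [10, 6, 7, 2]])

def indicator_from_table(N):
    """Expand a two-way contingency table into the bivariate indicator
    matrix Z = [Z1 Z2]: one row per case, with exactly two ones."""
    J1, J2 = N.shape
    rows = []
    for i in range(J1):
        for j in range(J2):
            z = np.zeros(J1 + J2)
            z[i], z[J1 + j] = 1.0, 1.0
            rows.extend([z] * int(N[i, j]))   # N[i, j] identical cases
    return np.array(rows)

Z = indicator_from_table(N)
print(Z.shape)                                    # (193, 9)
print(np.array_equal(Z[:, :5].T @ Z[:, 5:], N))   # the condensation recovers N
```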
TABLE 5.1. The 193 x 9 bivariate indicator matrix Z ≡ [Z1 Z2] of the data of Table 3.1, one row per person; its column sums are identical to the row and column sums of Table 3.1.

FIG. 5.1. 2-dimensional correspondence analysis of the 193 x 9 indicator matrix of Table 5.1, showing all the columns and some selected rows (lambda_1 = 0.6367, 18.2%; lambda_2 = 0.5500, 15.7%).

The total inertia and the principal inertias are much higher and there are less dramatic differences in successive percentages of inertia; for example, the first and second principal inertias are 0.6367 and 0.5500 respectively (18.2% and 15.7% of the total inertia respectively), compared to the values 0.07476 and 0.01002 (87.8% and 11.7% respectively) in the analysis of the contingency table. In fact we obtain 7 non-trivial principal axes of the indicator matrix, whereas the contingency table yields only 3.
In Fig. 5.1 we have also indicated the positions of rows labelled "3,1", "3,2", "3,3" and "3,4" (i.e. senior employees in the 4 smoking categories, in the context of the example) and "4,1", "4,2", "4,3" and "4,4" (i.e. junior

employees in the 4 smoking categories). Since the indicator matrix has 25 identical rows corresponding to senior employees who do not smoke, for example, these 25 points pile up at exactly the same position in the display, hence our labelling of these row points by their code. After the following theoretical treatment of the analysis we shall discuss the transition between the row points and the column points in this display as well as in other displays where either the row or the column points are in standard co-ordinates (Figs 5.3 and 5.4).

Column geometry

To anticipate the generality of the remaining sections of this chapter, we denote the numbers of rows and columns of the contingency table N by J1 and J2 respectively. The associated indicator matrix is denoted by Z, with I rows and (J1 + J2) columns, and is partitioned as Z ≡ [Z1 Z2] so that:

N = Z1^T Z2    (5.1.1)

We shall temporarily use the superfix Z to distinguish the correspondence analysis of Z from that of N, otherwise we use the same basic notation as Chapter 4.

The row masses r_i^Z are all equal to 1/I, while the column masses c^Z are equal to the row and column masses of N divided by 2 (cf. the column sums of Table 5.1, which are identical to the row and column sums of Table 3.1):

c^Z = (1/2) [ r ]    (5.1.2)
            [ c ]

Thus the correspondence matrix and diagonal matrices of row and column masses which define the correspondence analysis of Z are respectively:

P^Z = (1/2I) Z    (5.1.3)
D_r^Z = (1/I) I    (5.1.4)
D_c^Z = (1/2) [ Dr  0 ]    (5.1.5)
              [ 0  Dc ]

It is convenient to show the relationship between the two analyses in terms of the standard co-ordinate matrices Phi and Gamma (in the analysis of N) and Gamma^Z (in the analysis of Z), since we can avoid the question of rescaling during the discussion and introduce it later as an option in the display. From (4.1.23) the standard co-ordinates of the (J1 + J2) columns of Z are obtained from the non-trivial eigenvectors of D_c^{Z-1} P^{Z T} D_r^{Z-1} P^Z, which by (5.1.3)-(5.1.5) is:

(1/2I) [ Dr^{-1}  0       ] [ Z1^T Z1   Z1^T Z2 ] [ Gamma1^Z ] = [ Gamma1^Z ] D_lambda^Z    (5.1.6)
       [ 0       Dc^{-1}  ] [ Z2^T Z1   Z2^T Z2 ] [ Gamma2^Z ]   [ Gamma2^Z ]

where we have partitioned Gamma^Z into Gamma1^Z and Gamma2^Z, with J1 and J2 rows respectively. Since the correspondence matrix in the analysis of N is P = (1/I)N = (1/I)Z1^T Z2 and since Z1^T Z1 = I Dr and Z2^T Z2 = I Dc, (5.1.6) simplifies as the following pair of equations:

Gamma1^Z + Dr^{-1} P Gamma2^Z = 2 Gamma1^Z D_lambda^Z    (5.1.7)
Dc^{-1} P^T Gamma1^Z + Gamma2^Z = 2 Gamma2^Z D_lambda^Z    (5.1.8)

Multiplying (5.1.7) on the left by Dc^{-1} P^T and using the expression for Dc^{-1} P^T Gamma1^Z in (5.1.8), we obtain:

Dc^{-1} P^T Dr^{-1} P Gamma2^Z = Gamma2^Z (2 D_lambda^Z - I)(2 D_lambda^Z - I)    (5.1.9)

Similarly, after premultiplying (5.1.8) by Dr^{-1} P and using the expression for Dr^{-1} P Gamma2^Z in (5.1.7), we obtain:

Dr^{-1} P Dc^{-1} P^T Gamma1^Z = Gamma1^Z (2 D_lambda^Z - I)(2 D_lambda^Z - I)    (5.1.10)

Eigenequations (5.1.9) and (5.1.10) involve the same matrices as (4.1.23), hence the solutions in the analysis of N are solutions of these equations. However, the principal co-ordinates will be subject to different rescalings along the principal axes. It is clear when comparing (5.1.9) and (5.1.10) with (4.1.23) that the relationship between the eigenvalues of the two analyses is:

lambda = (2 lambda^Z - 1)^2    (5.1.11)

or, inversely:

lambda^Z = (1/2)(1 ± lambda^{1/2})    (5.1.12)

In Fig. 5.2 we fully account for all (J1 + J2) dimensions, both trivial and non-trivial, in the analysis of Z (remembering that we are at present studying the column points only). For ease of exposition and without loss of generality, we assume that J2 ≤ J1, so that there are J2 dimensions in the analysis of N, the first of which is trivial. These yield twice as many dimensions in the analysis of Z, through (5.1.12). The trivial dimension (lambda = 1) yields the expected trivial dimension associated with lambda^Z = 1, as well as a "null" dimension lambda^Z = 0 (hence the 7 dimensions of Table 5.1, rather than the expected 8). The non-trivial dimensions yield a set of J2 - 1 dimensions with principal inertias lambda^Z = (1/2)(1 + lambda^{1/2}) greater than 1/2, and J2 - 1 dimensions with inertias lambda^Z = (1/2)(1 - lambda^{1/2}) less than 1/2, with standard co-ordinates as shown in Fig. 5.2.
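Relationship (5.1.12) can be checked directly against the figures quoted above for Fig. 5.1; a two-line numpy sketch of ours:

```python
import numpy as np

lam = np.array([0.07476, 0.01002])   # principal inertias of N (Table 3.1)
lam_Z = 0.5 * (1 + np.sqrt(lam))     # (5.1.12), taking the "+" branch
print(lam_Z)                         # approx. 0.6367 and 0.5500, as in Fig. 5.1
print((2 * lam_Z - 1) ** 2)          # (5.1.11) recovers 0.07476 and 0.01002
```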

This leaves J1 + J2 - 2J2 = J1 - J2 dimensions unaccounted for. These are the counterparts of the null dimensions of the analysis of N, namely the (J1 - J2) dimensions in the J1-dimensional space of column profiles along which there is no inertia (lambda = 0). Their existence is irrelevant to the analysis of N but here they emerge as (J1 - J2) dimensions associated with the inertia lambda^Z = 1/2 (cf. (5.1.12)). The associated "co-ordinates" Phi° satisfy the condition of uncorrelation with Phi: Phi^T Dr Phi° = 0, but have undetermined orientation if (J1 - J2) ≥ 2. Anyway it is clear that the last J1 dimensions in this analysis are artefacts, with dimensions 2, 3, ..., J2 being the only ones of interest. These correspond exactly to those in the analysis of N.

FIG. 5.2. (a) The complete matrix of standard co-ordinates (including the trivial and null dimensions) in the correspondence analysis of the bivariate indicator matrix Z ≡ [Z1 Z2], with associated principal inertias (eigenvalues). Thus Gamma^Z is the (J1 + J2) x (J1 + J2 - 2) matrix excluding the first and last columns. The first column corresponds to the usual trivial dimension which centres the cloud of J1 + J2 points, while the last column corresponds to a null dimension created by the additional linear dependency amongst the column profiles. (b) The complete matrix of standard co-ordinates Phi and Gamma in the correspondence analysis of the contingency table N ≡ Z1^T Z2, with co-ordinates Phi° in J1 - J2 null dimensions of the column profiles such that Phi^T Dr Phi° = 0.

To summarize, so far, there is no difference between the display in standard co-ordinates of the rows and columns of N and that of the columns of Z, where we disregard all dimensions of Z associated with inertias of 1/2 and less. However, there is a substantial difference in the principal inertias, which will affect the display in principal co-ordinates. First the percentages of inertia in the analysis of Z will be very much lower and secondly the differences between their values are less dramatic: the column profiles of Z are dispersed more "spherically" than the row and column profiles of N. Because the interesting principal inertias in the analysis of Z are above 1/2, it seems that the percentages of inertia should be calculated on the quantities lambda_k^Z - 1/2, k = 1 ... J2 - 1, which in our example of Fig. 5.1 would be 0.1367, 0.0500 and 0.0102 respectively, that is percentages of 69.4%, 25.4% and 5.2%. These percentages reflect the relative values of the square roots of the principal inertias in the analysis of N. In Section 5.2 we shall discuss the computation of percentages of inertia in a more general situation.

Row geometry of the indicator matrix

The I row profiles are vectors originally in (J1 + J2)-dimensional space, but they occur at only J1 J2 distinct positions. The frequencies in N indicate how many "pile up" at each of these positions and it is equivalent to consider the geometry of the J1 J2 distinct points with masses equal to p_ij. (By the principle of distributional equivalence this agglomeration of the rows does not affect the geometry of the columns, so that the columns can be considered initially as points in (J1 J2)-dimensional space.) Different subsets of the rows collectively define a column of the indicator matrix, so there is only a subtle difference between the rows and columns of the indicator matrix. For example, with the labelling of the rows "i1, i2", i = 1 ... I, where i1 and i2 indicate the response of row i (i.e. the categories of the two discrete variables to which i belongs), all the rows with i1 fixed collectively represent that category (cf. Fig. 5.1, where all the points labelled "3,1" ... "3,4" represent the group SE, the third category of the first discrete variable). In this case there is not only the usual transition formula from the column points to the individual row points but also a close relationship between the centroid of such a subset of rows and the column point representing the particular category. In Figs 5.3 and 5.4 we show the results compatible in scale with Fig. 5.1, when firstly the column points and secondly the row points are displayed in standard co-ordinates. Only in Fig. 5.4, where the row points are in standard co-ordinates and the column points are in principal co-ordinates, do these centroids of the rows coincide with the corresponding column point. Otherwise some rescaling along the principal axes is necessary to make them coincide, where the rescaling depends on the principal inertias. Table 5.2 summarizes the transitions between the rows (both individually and in groups) and the columns in each of the three displays. These results are proved in Example 5.5.1.

Notice that it is geometrically impossible to obtain a display where the individual row points lie midway between their response categories and, simultaneously, the group centroids coincide with the corresponding column point. This is reminiscent of the transition formulae between the two clouds of points in the analysis of the contingency table N. Once again the display in principal co-ordinates can be viewed as a compromise between these two competing objectives and has the advantage here that the rescaling in the transition to individual rows as well as to row centroids is, at least, the same.

FIG. 5.3. Same analysis as Fig. 5.1, with the column points in standard co-ordinates. Each row point lies at a position midway between the corresponding column points, for example the 10 points labelled "3,2" (the 10 senior employees who smoke lightly) lie exactly midway between points SE and LI.

FIG. 5.4. Same analysis as Fig. 5.1, with the row points in standard co-ordinates. The centroid of the 51 row points which have labels "3,1", "3,2", "3,3" and "3,4", for example, and which occur only at the four positions indicated, coincides with the column point SE, the third category of the first discrete variable.

Relationship with dual scaling

To conclude this section let us show how the dual scaling of the indicator matrix Z ≡ [Z1 Z2] is related to our analyses above. The objective of dual scaling is to assign scale values to the columns (categories) so as to maximize the variance of the row scores. Let us denote the (J1 + J2) scale values by the vector v, partitioned into v1 (J1 x 1) and v2 (J2 x 1), and the I row scores by the vector s, where the ith of these is of the form (1/2)(v_{i1} + v_{i2}), the average of the scale values of the responses of i.

It is clear that this objective is equivalent to our objective in the correspondence analysis of Z. In fact the optimal scale values and scores can be obtained exactly from the first principal axis of Fig. 5.3 if the following identification conditions for v are chosen:

r^T v1 = c^T v2 = 0
v1^T Dr v1 = v2^T Dc v2 = 1    (5.1.13)

These imply the centering and standardization of v in terms of the column masses of Z (cf. (5.1.2)):

(c^Z)^T v = 0        v^T D_c^Z v = 1    (5.1.14)

but these latter conditions do not in general imply those of v1 and v2 individually in (5.1.13). However, under the constraints of (5.1.14) on v, the optimal solution does in fact turn out to satisfy (5.1.13). In effect we can view the last J2 "uninteresting" dimensions of Fig. 5.2(a) as forcing the separate centerings and standardizations of v1 and v2. Here v1 and v2 are the first columns of Phi and Gamma respectively, associated with the inertia (i.e. score variance in dual scaling) of (1/2)(1 + lambda_1^{1/2}). There is also a vector of standard co-ordinates, v1 and -v2, the first columns of Phi and -Gamma, associated with the inertia (1/2)(1 - lambda_1^{1/2}). The conditions that these be respectively uncorrelated with the trivial vector of co-ordinates 1 are:

1^T D_c^Z [ v1 ] = 0        and        1^T D_c^Z [ v1  ] = 0
           [ v2 ]                                 [ -v2 ]

i.e.

r^T v1 + c^T v2 = 0        and        r^T v1 - c^T v2 = 0

Together these conditions are equivalent to the individual centering constraints of (5.1.13). Similarly, the conditions that these vectors be of unit inertia and uncorrelated with each other are:

[ v1^T  v2^T ] D_c^Z [ v1 ] = 1        and        [ v1^T  v2^T ] D_c^Z [ v1  ] = 0
                     [ v2 ]                                            [ -v2 ]

i.e.

v1^T Dr v1 + v2^T Dc v2 = 2        and        v1^T Dr v1 - v2^T Dc v2 = 0

Together these are equivalent to the individual standardizing constraints of (5.1.13).

TABLE 5.2. Summary of the transitions between the column points and the row points (both individually and as group centroids) in the three displays of Figs 5.1, 5.3 and 5.4.

5.2 MULTIVARIATE INDICATOR MATRICES

An example of a trivariate indicator matrix is the original data underlying Table 3.5. Here we have an additional discrete variable with two categories
adds up to Q, while the column sums 1 TZ show the marginal distribution of
of response ("do drink" and "do not drink"), so that the indicator matrix is
responses over all the categories (see Fig. 5.5). We use the index j to refer to
of the form Z == [ZI Z2 Z3] with 1 = 193 rows and J = JI +J 2 +J 3 = Z
a column of Z and jq to refer to a column of Zq. The vector C of column
5 + 4 + 2 = 11 columns. Table 3.5 is therefore the particular condensation Z
masses of Z is given by C = (1/QI)Z T1, while the subset of masses for the
ZI[Z2 Z3] = [Zi Z2 zI Z3]. There is now a c1ear difference between
the information contained in Z and the information contained in this fre­
columns of Zq are denoted by the vector c;,
where c;
= (l/QI)Z;l. For sake
of c1arity we shall use the superfix Z to designate the correspondence
quency table, since the latter completely ignores the direct association
between variables 2 and 3 ("smoking" and "drinking"), as embodied in the analysis of Z in this section.
We shall now summarize sorne aspects of the row and column geometries
matrix ZiZ3' in the correspondence analysis of Z. These resuIts are all proved in Example
Underlying Table 3.7 is a 6-variate indicator matrix Z == [ZI ... Z6]'
5.5.2 and apply to the special case when Q = 2, which was discussed in
where the columns of Z6 refer to the three types of company c1ient. The table
Section 5.1. We assume throughout that there are responses in all the
is thus the condensation [ZI .,. Z5tZ6' which summarizes the association
categories of response, in other words there is no column of Z which consists
of each of the first 5 variables with variable 6, but ignores all associations
purely of zeros.
amongst the first 5.
Clearly when the study involves more than 2 discrete variables, then the (a) The sum of the masses of the columns of Zq is l/Q, for all q = 1 ... Q.
analysis of the indicator matrix and the analysis of any such condensation of Thus each discrete variable receives the same mass, which is distributed
it into a two-way frequency table might differ quite drastically. In this section over the categories according to the frequencies of response.
we shall c1arify the geometry of a multivariate indicator matrix (so-called (b) The centroid of the column profiles of Zq is at the origin of the display,
"muItiple correspondence analysis") and of certain two-way contingency that is at the centroid of all the column profiles. Thus each subc10ud of
tables derived from it. categories is balanced at the origino
(c) The total inertia of the column profiles (and of the row profiles) is:
in(J) = J/Q-1 (5.2.1)
Row and co/umn geometry

Let us consider a general Q-variate indicator matrix Z == [ZI ... ZQ], with (d) The inertia of the column profiles of Zq is:
1 rows and J = JI +... + J Q columns, where the qth matrix Zq (corresponding in(J q ) = (J q -1)/Q (5.2.2)
to the qth discrete variable) has J q columns (corresponding to J q categories).
There are QI ones scattered throughout Z, 1 in each submatrix Zq, otherwise Hence the inertia contributed by a discrete variable increases linearly
the elements of Z are zeros. Each row of Zq adds up to 1, and each row of Z with the number of response categories.
(e) The inertia of a particular category j is:
JI J2 JQ ¡nO) = l/Q - cf (5.2.3)
~.---"-

Hence the inertia contributed by a category increases as the response to


this category decreases, with an upper bound of l/Q.
(f) The number of non-trivial dimensions with positive inertia is at most
001 1°100 J - Q, in other words the dimensionality of the row and column profiles
is at most J -Q.
1
(g) The row profiles lie at the equal-weighted barycentre of the column
profiles representing their responses, up to a rescaling by the inverse
square root of the principal inertias along respective principal axes. There
is a similar relationship between the centroid of a group of rows with a
common response and the column point representing the response, as in
the bivariate case. Thus the results of Table 5.2 extend to the multivariate
FIG.5.5. A multivariate (Q-variate) indicator matrix. showing a typical row. case.
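Properties (a), (c) and (e) are easy to check numerically from the definition of inertia. The following sketch (in Python with numpy, added here purely as an illustration; the indicator matrix is simulated and assumes every category is observed at least once) verifies (5.2.1) and (5.2.3).

import numpy as np

rng = np.random.default_rng(0)
I, Js = 100, [3, 4, 2]                     # I individuals, Q = 3 variables
Q, J = len(Js), sum(Js)

# Build Z = [Z1 ... ZQ]: each individual has exactly one 1 per variable
Z = np.hstack([np.eye(Jq)[rng.integers(0, Jq, I)] for Jq in Js])

P = Z / Z.sum()                            # correspondence matrix (total QI)
r, c = P.sum(axis=1), P.sum(axis=0)        # row and column masses
E = np.outer(r, c)                         # expected values under independence

total_inertia = ((P - E) ** 2 / E).sum()
print(np.isclose(total_inertia, J / Q - 1))          # property (c), eq. (5.2.1)

category_inertia = ((P - E) ** 2 / E).sum(axis=0)
print(np.allclose(category_inertia, 1 / Q - c))      # property (e), eq. (5.2.3)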
The "Burt matrix"

It is instructive to compare the analysis of Z with that of the symmetric J x J matrix Z^T Z, which is called the Burt matrix in recognition of an article by Burt (1950) (cf. Benzécri, 1976; Lebart et al., 1977; Example 5.5.3). This matrix has the following block structure:

          [ Z_1^T Z_1   Z_1^T Z_2   ...   Z_1^T Z_Q ]
  Z^T Z = [ Z_2^T Z_1   Z_2^T Z_2   ...   Z_2^T Z_Q ]      (5.2.4)
          [    ...         ...               ...    ]
          [ Z_Q^T Z_1   Z_Q^T Z_2   ...   Z_Q^T Z_Q ]

Each "off-diagonal" submatrix Z_q^T Z_q' (q ≠ q') is a two-way contingency table which condenses the association between variables q and q' across the I individuals. Each "diagonal" submatrix Z_q^T Z_q is the diagonal matrix of column sums of Z_q, which we have previously denoted in the vector QIc_q^z. Because the Burt matrix is positive semidefinite symmetric it is clear that its correspondence analysis produces two identical sets of co-ordinates for the rows and the columns. In Example 5.5.3 we prove that the standard co-ordinates (of the rows or columns) in the analysis of Z^T Z are identical to the standard co-ordinates of the columns in the analysis of Z. Again the only difference lies in the values of the principal inertias, which will affect the scales of the principal co-ordinates. In this respect we also show that the principal inertias λ^B in the analysis of the Burt matrix are the squares of those of the indicator matrix:

  λ^B = (λ^Z)²      (5.2.5)

In the bivariate case (Q = 2) the Burt matrix is simply:

  Z^T Z ≡ [ Z_1^T Z_1   Z_1^T Z_2 ] ≡ [ I D_r    N   ]      (5.2.6)
          [ Z_2^T Z_1   Z_2^T Z_2 ]   [ N^T    I D_c ]

Now the standard co-ordinates of the rows and columns of N = Z_1^T Z_2 provide those of the columns of Z = [Z_1 Z_2] (cf. Fig. 5.2), which are identical to those of the columns (or rows) of Z^T Z. The principal inertias λ^B of Z^T Z are thus related to those of N by (cf. (5.1.12)):

  λ^B = ¼(1 ± λ^{1/2})²      (5.2.7)

In Section 8.4 we describe an example by Healy and Goldstein (1976) where the data are reported in the form of a Burt matrix.

The fact that the analysis of the multivariate indicator matrix Z is equivalent to that of the Burt matrix illustrates that these analyses should be described as "joint bivariate" rather than multivariate (de Leeuw, 1973, Section 3.9). The Burt matrix is the analogue of the covariance matrix of Q continuous variables, where each J_q x J_q' submatrix is analogous to a covariance. Classical multivariate analysis of data on Q continuous variables rarely proceeds beyond considering second-order moments, thanks to the usual distributional assumptions of multinormality. Analogously, the correspondence analysis of Z (or, equivalently, of Z^T Z) does not take into account associations amongst more than two discrete variables but rather looks at all the two-way associations jointly. In the jargon of multiway contingency table analysis we consider only second-order interactions. Thus the correspondence analysis treatment of a multivariate indicator matrix Z seems to be at an interface between the classical joint bivariate treatment of continuous multivariate data and the complex interaction modelling of multiway contingency tables.

Because the row and column co-ordinates of the Burt matrix (5.2.4) are identical we use one notation Γ^B (respectively G^B) to denote the standard co-ordinates (respectively principal co-ordinates). Since the sum of the rows of each matrix Z_q^T Z_q', for all q and q', is the vector QIc_q^z, it follows that the rows of Z^T Z sum to Q²Ic^z, so that the matrix R^B of row profiles of Z^T Z is of the following form:

              [  I     R_12   R_13   ...   R_1Q ]
  R^B = (1/Q) [ R_21    I     R_23   ...   R_2Q ]
              [  ...                        ... ]
              [ R_Q1    ...                  I  ]

where R_qq' is the matrix of row profiles of the two-way contingency table Z_q^T Z_q'. There is only one transition, from the column co-ordinates to themselves, for example in standard co-ordinates:

  Γ^B = R^B Γ^B (D_λ^B)^{-1/2}

Γ^B can be partitioned into Q sets of rows Γ_q^B (q = 1 ... Q), in which case:

  Γ_q^B = (1/Q)(Γ_q^B + Σ_{q'≠q} R_qq' Γ_q'^B)(D_λ^B)^{-1/2}      q = 1 ... Q

Collecting terms in Γ_q^B and remembering that Γ^Z = Γ^B and D_λ^Z = (D_λ^B)^{1/2}, we have the following expression for the co-ordinates of the categories of variable q in terms of those of the other variables in the correspondence analysis of Z:

  Γ_q^Z(Q D_λ^Z − I) = Σ_{q'≠q} R_qq' Γ_q'^Z      (5.2.8)

As a special case of (5.2.8), when Q = 2, we have the pair of equations (5.1.7) and (5.1.8). This illustrates once again how special the bivariate case really is, because there is only one term in the sum on the right-hand side of (5.2.8).
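The identity (5.2.5) is easily confirmed numerically. The following sketch (Python with numpy; an illustrative addition with simulated data, assuming no empty categories) computes the principal inertias of Z and of its Burt matrix as the squared singular values of the standardized residuals, and checks that the latter are the squares of the former.

import numpy as np

def principal_inertias(N):
    # Squared singular values of the standardized residuals of N
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return np.linalg.svd(S, compute_uv=False) ** 2

rng = np.random.default_rng(1)
I, Js = 200, [2, 3, 3]
Z = np.hstack([np.eye(Jq)[rng.integers(0, Jq, I)] for Jq in Js])
B = Z.T @ Z                                    # the Burt matrix

K = sum(Js) - len(Js)                          # at most J - Q non-trivial axes
lam_Z = principal_inertias(Z)[:K]
lam_B = principal_inertias(B)[:K]
print(np.allclose(lam_B, lam_Z ** 2))          # eq. (5.2.5)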

In very special situations the analysis of a Q-variate indicator matrix Z (for Q > 2) will be equivalent to that of a two-way frequency table. Suppose that the Q variables can be divided into two subsets of Q_1 and Q_2 variables respectively, such that the variables within each subset are pairwise independent of one another. We shall show how the analysis of Z is related to that of the table which crosses the categories of the Q_1 variables with those of the Q_2 variables. Without loss of generality, let us suppose that the first Q_1 and last Q_2 (= Q − Q_1) variables of Z are the two subsets in question, such that any pair of variables of the first set are independent and, similarly, any pair of the second set are independent, that is:

  Z_q^T Z_q' = I c_q c_q'^T   for q, q' = 1 ... Q_1, q ≠ q'
                              and for q, q' = Q_1+1 ... Q, q ≠ q'      (5.2.9)

(where c_q and c_q' are the row and column masses of the contingency table Z_q^T Z_q'). The two-way table in question is the (J_1 + ... + J_{Q_1}) x (J_{Q_1+1} + ... + J_Q) table:

  [ Z_1^T Z_{Q_1+1}      Z_1^T Z_{Q_1+2}      ...   Z_1^T Z_Q     ]
  [      ...                  ...                       ...       ]      (5.2.10)
  [ Z_{Q_1}^T Z_{Q_1+1}  Z_{Q_1}^T Z_{Q_1+2}  ...   Z_{Q_1}^T Z_Q ]

The equations defining the column co-ordinates of Z are of the form (5.2.8), with the terms on the right-hand side subdividing into two sets. If q' is in the same set as q then from (5.2.9) R_qq' = 1c_q'^T, hence the term R_qq'Γ_q'^Z is zero because the centroid of the columns of Z_q' is at the origin (the masses c_q'^z of these columns are proportional to the masses c_q' of the columns of Z_q^T Z_q'). Thus the right-hand side of (5.2.8) involves variables of the other set only, resulting in two transition formulae between the co-ordinates of each set:

  for q = 1 ... Q_1:      Γ_q^Z(Q D_λ^Z − I) = Σ_{q'=Q_1+1}^{Q} R_qq' Γ_q'^Z
  for q = Q_1+1 ... Q:    Γ_q^Z(Q D_λ^Z − I) = Σ_{q'=1}^{Q_1} R_qq' Γ_q'^Z      (5.2.11)

In order to describe the correspondence analysis of (5.2.10), we shall use the notation Γ for the matrix of standard co-ordinates, with its rows partitioned exactly as those of Γ^Z (i.e. the first Q_1 sets of rows of Γ are row co-ordinates and the remaining Q_2 sets of rows are column co-ordinates). Thus the usual transition formulae between row and column co-ordinates are:

  (columns to rows)  for q = 1 ... Q_1:    Γ_q D_λ^{1/2} = (1/Q_2) Σ_{q'=Q_1+1}^{Q} R_qq' Γ_q'
  (rows to columns)  for q = Q_1+1 ... Q:  Γ_q D_λ^{1/2} = (1/Q_1) Σ_{q'=1}^{Q_1} R_qq' Γ_q'      (5.2.12)

In order to get (5.2.12) into a form comparable to (5.2.11) we have to factorize 1/Q_2, for example, as (Q_1^{1/2}/Q_2^{1/2})/(Q_1^{1/2}Q_2^{1/2}). The Q_1^{1/2}Q_2^{1/2} factor is associated with D_λ^{1/2} so that we have the following relationship between the principal inertias of the two problems:

  Q_1^{1/2} Q_2^{1/2} D_λ^{1/2} = Q D_λ^Z − I      (5.2.13)

The relationship between the standardized co-ordinates is a little more complicated to determine, owing to the different partitioning of the masses in the two problems. For example, the masses associated with the first Q_1 sets of columns of Z sum to Q_1/Q, whereas the masses associated with the rows of (5.2.10) sum to 1. From (5.2.12) and (5.2.13) we know, for example, that for q = 1 ... Q_1: Q_2^{1/2} Γ_q = β Γ_q^Z, for a scaling constant β. From the standardizations of Γ_q and Γ_q^Z, β = Q_1^{1/2}Q_2^{1/2}/Q^{1/2}, so that:

  Γ_q = (Q_1/Q)^{1/2} Γ_q^Z      (5.2.14)

and similarly, for q = Q_1+1 ... Q (the column co-ordinates):

  Γ_q = (Q_2/Q)^{1/2} Γ_q^Z      (5.2.15)

Relationship to canonical correlation analysis

The relationship between the canonical correlation analysis of the bivariate indicator matrix [Z_1 Z_2] and the correspondence analysis of the contingency table Z_1^T Z_2 has been discussed in Section 4.4. Recall that we described the canonical correlation analysis as the search for maximally intercorrelated linear combinations u and v of the two sets of indicator variables, that is vectors u and v subtending a minimum angle, where u and v are identified by the usual standardization conditions of zero mean and unit variance. It is not possible to generalize this definition to the multivariate case, because each submatrix Z_q defines a subspace and the concept of an angle between more than two subspaces cannot be generalized.

However, there are alternative definitions of canonical correlation analysis which are readily generalized. For example, an equivalent objective is to find u and v, with the same identification conditions, so that the sum of their squared correlations with a third vector w (whose components also sum to zero) is a maximum (Carroll, 1968). This is in turn equivalent to finding u and v so that their sum u + v has maximum distance from the origin, subject to overall centering of u + v to have zero mean and a single standardization condition that the mean squared distance of u and v to the origin is a constant (Lebart et al., 1977). At the optimum u and v turn out to be individually identified to have zero mean and the same variance.

In order to generalize to Q sets of variables, let u_q (I x 1) denote the linear
combination of the qth set of variables (q = 1 ... Q), where the coefficients are denoted by a_q (J_q x 1):

  u_q = Z_q a_q

We can then search for solutions a_q, q = 1 ... Q, which yield maximum sum of squared correlations between the u_q's and a (Q+1)th vector w (I x 1), subject to the usual identification conditions on each a_q via those on u_q. Or, equivalently, we can search for solutions which result in u_1 + ... + u_Q having maximum distance from the origin, subject to overall centering and the single identification condition that the mean squared lengths of the u_q's is a constant. Either of these objectives is equivalent to the correspondence analysis of the Q-variate indicator matrix Z (or of Z^T Z). Again at the optimum the u_q's are individually centered and standardized to have equal variance, i.e. equal (squared) distance from the origin, or squared length.

Relationship to dual scaling

The second equivalent definition of canonical correlation analysis which we have discussed above is exactly the objective of the dual scaling of Z: namely to assign scale values a_1 ... a_J to all the categories of the variables (columns of Z) so as to maximize the variance of the row scores. The elements of the vector u_1 + ... + u_Q are Q times the row scores (the scores are conventionally averages of scale values) and they sum to zero, so that maximizing the distance of u_1 + ... + u_Q to the origin is equivalent to maximizing the row score variance. As discussed above, the optimum scale values are such that the u_q's are individually centered and standardized in the same way. In other words the overall identification conditions of location and scale on the complete set of scale values imply individual conditions on the subsets a_1 ... a_J at the optimum. We shall discuss this property further in Section 8.4, where different ways of constraining the solutions are described.

Artificial dimensions and the calculation of percentages of inertia

In Section 5.1 we showed that in the special case Q = 2 the principal inertias λ_k^Z of the indicator matrix Z ≡ [Z_1 Z_2] are related to those λ_k of the contingency table as follows (cf. (5.1.11)):

  λ_k = 4(λ_k^Z − ½)²      (5.2.16)

We also showed that the "interesting" inertias λ_k^Z are those above the value of ½, the rest being artifacts of the analysis. In the context of dual scaling these minor dimensions (λ_k^Z < ½) serve to centre and standardize the two subsets of scale values for each major dimension (λ_k^Z > ½). When Q = 3, it turns out that if there are K⁺ principal axes with principal inertias above the value ⅓, then there are K⁻ = 2K⁺ principal axes with principal inertias less than ⅓ which imply the individual centering and standardization of the 3 subsets of scale values. In general, only the principal inertias above the value 1/Q are "interesting", and it is clear that a rather pessimistic impression of the quality of a display is obtained by the usual percentages of inertia. Also, if the J_q categories are derived from segmenting the range of a continuous variable, then we have the rather undesirable result that the usual percentages tend to zero, even on the major dimensions, as the subdivisions are made finer and finer. In other words, the process of subdividing introduces new dimensions as J increases (while Q is fixed), some of which are of no direct interest to the study of the interrelationships between the variables.

If we consider the analysis of the indicator matrix Z° with J_1 x J_2 x ... x J_Q rows, one row for each of the possible responses to the Q questions, then the J − Q principal inertias are all 1/Q. This is an additional justification for taking 1/Q as a "baseline" value for the principal inertias of an indicator matrix, which is essentially a row-reweighted version of Z°.

Benzécri (1979) thus proposes that the percentages of inertia be computed on the values λ_k^Z − (1/Q) and only for those inertias that are above 1/Q. In the case of the Burt matrix the percentages should be based on the quantities:

  σ(λ_k^Z) ≡ {Q/(Q−1)}² {λ_k^Z − (1/Q)}²      (5.2.17)

which vary from 0 to 1 as λ_k^Z varies from (1/Q) to 1. Notice that (5.2.16) is admitted as a special case when Q = 2. Again, only λ_k^Z's greater than (1/Q) should be taken into account. The values σ(λ_k^Z) also occur as principal inertias in the analysis of the Burt matrix for which the diagonal has been set to zero. We shall discuss this property further in Section 8.6, which deals with the analysis of symmetric correspondence matrices.
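In practice the adjustment amounts to discarding the inertias at or below 1/Q and rescaling the rest by (5.2.17) before percentaging. A minimal sketch (Python; the function name is ours and the input inertias in the example call are invented for illustration):

import numpy as np

def adjusted_percentages(lam_z, Q):
    # Keep only principal inertias above 1/Q and rescale them by (5.2.17)
    lam_z = np.asarray(lam_z, dtype=float)
    kept = lam_z[lam_z > 1.0 / Q]
    sigma = (Q / (Q - 1.0)) ** 2 * (kept - 1.0 / Q) ** 2
    return 100.0 * sigma / sigma.sum()

# Four hypothetical indicator-matrix inertias with Q = 5 (baseline 1/Q = 0.2)
print(adjusted_percentages([0.3676, 0.3336, 0.26, 0.21], Q=5))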
Special case: binary variables

When each variable has only two categories, that is J_q = 2 for all q and J = 2Q, the correspondence analysis of Z is closely related to the principal components analysis of a matrix Y with Q columns, where each of these columns is one of a pair of columns of Z and is standardized to have unit variance. Thus in the matrix Y each discrete variable is represented by just one of its categories.

It is slightly easier to show the relationship between the correspondence analysis of the 2Q x 2Q Burt matrix B = Z^T Z and the principal components analysis of the Q x Q correlation matrix (1/I)Y^T Y. If V denotes the matrix of eigenvectors of the correlation matrix: {(1/I)Y^T Y}V = V D_μ, then V is related
to the matrix Γ⁺ composed of the Q corresponding rows of the standard co-ordinate matrix Γ^B (or Γ^Z) as follows:

  Γ⁺ = D_ψ V

where D_ψ is a diagonal matrix with typical diagonal element:

  ψ_q = {(I − b_qq)/b_qq}^{1/2}

where q refers to the selected category of the qth discrete variable and b_qq is the corresponding diagonal element of B. Notice that the ψ_q's rescale the rows of V to obtain the rows of Γ⁺ which represent the selected categories. Thus the relative positions of the categories on every principal axis are changed by this rescaling, and the difference between the configuration of rows of V and that of Γ⁺ is by no means trivial. This difference affects the positions of the rows of Z and the rows of Y as well, since these depend on the scalings of the Q variables. The value of ψ_q depends on the variance of the qth variable and, since the relevant column of Z contains only 0s and 1s, this variance is easily seen to be z̄_q(1 − z̄_q), where z̄_q is the mean of this column (i.e. the variance of a Bernoulli variable). Since z̄_q = b_qq/I this variance may be written equivalently as b_qq(I − b_qq)/I², and so the nearer b_qq is to I, say, the closer ψ_q is to zero. If b_qq = ½I, that is there is no "polarization" on variable q, then there is no difference between the qth rows of Γ⁺ and V.

These results come up again in Section 6.2 as a special case of the more general discussion of "doubled" data in Chapter 6, where we amplify the concept of polarization of bipolar observations.

5.3 ANALYSIS OF QUESTIONNAIRES AND NON-RESPONSES

The most common example of a multivariate indicator matrix arises as the result of a sample survey where I individuals respond to Q questions of a questionnaire. There are many ways to conduct a survey; for example, a question may be posed along with a fixed number of alternatives from which the respondent must make exactly one selection. In some cases it will be difficult to specify all possible responses beforehand, so that the question is "open-ended" and a categorization of responses has to be made after all the questionnaires have been completed and studied. This latter strategy is naturally more problematic and involves a large amount of work before the statistical investigation even commences. Anyway, whatever the method of data collection, the result is a multivariate indicator matrix, with each question q having a fixed number J_q of responses. A special case of such a questionnaire is a multiple choice examination, where a number of questions are posed and alternative answers are provided, one of which is the correct one. The fact that a particular answer is of special importance distinguishes this situation from a typical survey where no special emphasis is placed on particular responses.

Data emanating from questionnaires involve all the peculiarities that might be expected when dealing with people. People often deliberately omit answering a question (e.g. on their income); sometimes they do not know what to respond, or their responses do not coincide exactly with any of the alternatives provided. A questionnaire which has been carefully prepared includes such alternatives as "Not prepared to answer", "Don't know" or "None of these" to allow for these eventualities, but having achieved this, how are these data to be analysed? We shall discuss this problem from a number of viewpoints and illustrate a number of ways that correspondence analysis can cope with such data.

Non-responses when the variables (questions) are binary

Let us start by considering a fairly simple questionnaire where each of the Q questions has only two possible responses (e.g. "Yes" or "No", "True" or "False", ...). The response "Yes" to a question is coded 1 and 0 in the two relevant columns of the indicator matrix, while a "No" is coded 0 and 1. A non-response might then be coded as ½ and ½ to indicate indifference to a "Yes" and to a "No". Alternatively, some other coding scheme α_q and 1 − α_q might be used with a different justification for the choice of α_q. Another alternative is to create a third response for each question, or to create just one extra variable which records the total number of non-responses to the set of questions by each individual.

We shall briefly describe some ways of handling non-responses in the context of the data in Table 5.3. These are the (fictitious) results of a survey of 100 randomly selected people, where each person has responded to a questionnaire consisting of 5 questions, also listed in Table 5.3. Only two alternative responses to each question are provided and 25 people choose not to respond to at least one question, with a total of 36 non-responses: 8, 16, 7 and 5 respectively to questions 2, 3, 4 and 5.

Since a quarter of the sample has not responded in some way, we can consider creating a third alternative for questions 2 to 5, so that the multivariate indicator matrix Z is a 100 x 14 matrix, with 2 columns (1a and 1b) for question 1 and 3 columns (2a, 2b, 2*; 3a, 3b, 3*; etc.) for the other questions (Table 5.5(a)). The Burt table Z^T Z is given in Table 5.4 (including a row and column of zeros marked 1*) and shows the marginal frequencies down the diagonal and the between-question contingency tables off the diagonal.
TABLE 5.3
Questionnaire with Q = 5 questions and J_q = 2 possible responses to each question q. The responses of 100 people in a fictitious survey are shown, where * denotes a non-response.

Question 1   Sex: (a) male; (b) female
Question 2   Age: (a) under 30; (b) 30 or older
Question 3   Annual income (before tax): (a) below £8000 per annum; (b) above £8000 per annum
Question 4   Are you (a) optimistic; (b) pessimistic about the future of Britain's economy?
Question 5   Are you (a) for; (b) against the present government's economic policies?

[The body of Table 5.3, the 100 five-letter response patterns (e.g. aaaaa, aabba, abbab, baabb, a*abb, ...), and Table 5.4, the 14 x 14 Burt table Z^T Z of these data, are not reproduced legibly here.]

Figure 5.6 shows the two-dimensional correspondence analysis of Z. The direction of spread along the first axis separates the younger, lower income respondents who are pessimistic about the economy and generally anti-government from the older, higher income respondents who are optimistic and pro-government. There does not seem to be much association between this feature and the sex of the respondent. The 4 non-response points are quite separate from the others, and determine the second principal axis,
with the point "female" being the only one which looks like it tends in the direction of the non-responses.

[FIG. 5.6. Correspondence analysis of the 100 x 14 indicator matrix derived from the data of Table 5.3 (i.e. with coding of Table 5.5(a)); λ_1 = 0.3676 (20.4%), λ_2 = 0.3336 (18.5%). The inertias and percentages of inertia are those of the usual analysis, whereas if they are based on the quantities (5.2.17), the percentages of inertia are 56.2% and 35.7% respectively.]

[TABLE 5.5, which sets out the recoding schemes (a) to (e) for the non-responses, is not reproduced legibly here.]

It seems then that the female respondents refused to answer the questions more often than males; in fact they refused to answer 19 questions, whereas we would expect 15 non-responses based on the proportion of females in the sample. At this stage we are not interested in the statistical significance of this observation, but we can evaluate the chi-square statistic between the frequencies 17 and 19 respectively of male and female non-responses and the expected frequencies based on the proportions 58 and 42 respectively of males and females in the sample.
The statistic is 1.717, which is not significant.
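The arithmetic is elementary; a short check (Python, illustrative only):

import numpy as np

observed = np.array([17.0, 19.0])        # male and female non-responses
expected = 36 * np.array([0.58, 0.42])   # i.e. 20.88 and 15.12
chi2 = ((observed - expected) ** 2 / expected).sum()
print(round(chi2, 3))                    # 1.717, below the 5% point (3.84)
                                         # of chi-square on 1 degree of freedom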
The spread of the non-response points relative to the spread of the other points is indicative of another interesting feature in the sample. Points representing non-response to questions 4 and 5 lie on the side of the older, higher income group. Looking at the data we see that 12 people did not respond to one of these questions, 7 of whom were "older", 2 were "younger" and 3 were of unknown age. This indicates that in the sample older people were indeed more reticent about answering these questions. Again we are not interested in testing whether this feature is statistically significant. Our objective here is exploratory, not confirmatory.

Instead of creating a special column for each question's non-response, we can create just one column which records the total number of non-responses, irrespective of the question (Table 5.5(b)). Geometrically we have merged the 4 columns 2*, 3*, 4* and 5* into one point ** in Fig. 5.7, which represents the centroid of these points. The displayed positions of the other points hardly change and the only difference is that we no longer have the spread of the individual non-response points, as observed in Fig. 5.6. In other words, the principal plane is very stable with respect to merging these 4 points.

[FIG. 5.7. Correspondence analysis of the 100 x 11 indicator matrix where all the non-responses are recoded in one column (i.e. with coding of Table 5.5(b)); λ_1 = 0.3578 (26.4%), λ_2 = 0.325 (24.0%).]

If the non-responses are of negligible importance in the survey we can choose the coding scheme illustrated in Table 5.5(c). Here there are only 2 columns per question and a non-response is coded as a ½ in each column, that is the non-response to a question is counted as half a response a and half a response b. (But, in general, we can code this as α and β, where α + β = 1; see below and Table 5.5(e).) Geometrically, the 16 non-responses to question 3, say, are concentrated into the point 3* of Fig. 5.6. The present coding re-allocates the mass of 3* in equal amounts to the points 3a and 3b. Because the non-response points almost completely determine the 2nd principal axis in the analysis of Fig. 5.6, we expect a new axis to emerge as a result of the "disintegration" of these points. Figure 5.8 shows the new analysis and we see that the first principal axis is very similar to those of previous analyses, while the second is now determined almost exclusively by the sex of the respondents. This is just a stronger indication that sex is not associated with the feature which we have interpreted along the first axis.

[FIG. 5.8. Correspondence analysis of the 100 x 10 indicator matrix where the non-responses are recoded as ½ and ½ for responses a and b (i.e. with coding of Table 5.5(c)); λ_2 = 0.2076 (22.7%). The individual non-response points (2*, 3*, 4* and 5*) are displayed as supplementary points but are poorly correlated with this principal plane.]

One property which is common to all these schemes is that each response or non-response generates a 1 in the recoded matrix, so that the total of each recoded matrix is a constant QI, in this case 5 x 100 = 500. In Table 5.5(a) the 1 for a non-response is placed in a special column, in Table 5.5(b) all the 1s for non-responses are collected into a single column, and in Table 5.5(c) each 1 for a non-response is split equally between the possible responses. Various combinations of these coding schemes are possible, for example in Table 5.5(d); here the column names are the same as those in Table 5.5(a), but only ½ of the mass of a non-response is allocated to the non-response column, the other half being divided between the responses a and b in equal parts. The analysis of this matrix, shown in Fig. 5.9, is thus "halfway" between Figs 5.6 and 5.8.

[FIG. 5.9. Correspondence analysis of the 100 x 14 indicator matrix where the non-responses are recoded as ¼, ¼ and ½ for responses a, b and * respectively (i.e. with coding of Table 5.5(d)); λ_1 = 0.3388 (25.5%), λ_2 = 0.2203 (16.6%).]

In general we have a coding scheme which allocates δ to the non-response column and ½(1 − δ) to each of the response columns, where δ can be any value from 0 to 1. It would be interesting to observe how the display changes as δ decreases in small steps from 1 to 0, as the focus is gradually taken off the non-responses until their mass is absorbed completely into that of the responses. Remember that the positions of the non-response points are fixed throughout this process and that it is the orientation of the principal plane that is changing as the "attractive force" (mass) of the non-response points is decreased to zero. In the limit, as δ tends to 0, these positions can still be represented as supplementary points, as in Fig. 5.8. In this particular example, because the first principal axis is very stable, the principal plane is actually pivoting around this axis as the masses of the non-response points are varied.

The most general way of recoding the non-responses prior to correspondence analysis is illustrated in Table 5.5(e). Here a non-response to question q is given a relative mass of δ_q, while the remainder of 1 − δ_q is divided into a part α_q for response a and β_q for response b (so that α_q + β_q + δ_q = 1). The values of α_q and β_q can be the same, as previously, or different if we wish to view the non-responses as missing values. For example, individual number 98 is an "older male", and of the 20 "older males" in the data who have responded to question 3 we see that 11 have a "lower income" and 9 a "higher income". We could thus allocate the remainder of 1 − δ_3 in proportions 11 to 9 to responses a and b respectively: α_3 = 11(1 − δ_3)/20, β_3 = 9(1 − δ_3)/20. These values take into account the relatively higher proportion of "higher incomes" amongst "older males", compared to the marginal frequencies of the whole sample, which are 66 and 18 respectively for responses 3a and 3b. Clearly our previous allocation of equal values of ½(1 − δ_3) to each response ignores even these marginals.

In many cases these different recodings of the data will lead to negligible differences in the correspondence analyses, especially when the frequency of non-response is fairly low. There is no "correct" way to recode the data; each study has its own peculiarities and objectives which will largely determine the approach of the data analyst. When faced with large frequencies of non-responses, we suggest that different recoding schemes be attempted and the subsequent analyses compared. As in the above example, each one of these focuses on different aspects of the non-responses and their association with the features in the main body of data.

Example 5.5.4 treats the problem of non-responses more theoretically. Although we have illustrated the approach in the simple situation where each question has just two possible responses, the principles remain identical in a more general situation. Each line of the recoded matrix has a total of Q, a 1 for each question q (q = 1 ... Q). In the case of a non-response this unit of mass can be assigned totally or partially to a non-response category. If partially assigned, the remainder must be divided between the actual response categories, equally or in proportion to some justifiable distribution. Geometrically, we create new (column) points for each question's non-response and then divide the mass of the non-responses between these points and those of the other (column) points.
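The general coding of Table 5.5(e), with the schemes of Tables 5.5(a), (c) and (d) as special cases, can be written as a tiny function. The sketch below (Python; the function name is ours) shows the triple of values (qa, qb, q*) contributed by one individual for one question; scheme (b), which pools all non-responses into a single extra column, is not covered by this parametrization.

import numpy as np

def recode(response, alpha, beta, delta):
    # One individual's entries in columns (qa, qb, q*); a response always
    # contributes a single 1, a non-response contributes (alpha, beta, delta)
    # with alpha + beta + delta = 1, so each row still sums to Q in total
    assert abs(alpha + beta + delta - 1.0) < 1e-12
    if response == 'a':
        return np.array([1.0, 0.0, 0.0])
    if response == 'b':
        return np.array([0.0, 1.0, 0.0])
    return np.array([alpha, beta, delta])    # non-response

print(recode(None, 0.00, 0.00, 1.00))   # scheme (a): own column
print(recode(None, 0.50, 0.50, 0.00))   # scheme (c): split equally
print(recode(None, 0.25, 0.25, 0.50))   # scheme (d): "halfway"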
the voting at the United Nations in 1967. Correlations between the
positions of the 122 countries on principal axes which are higher than
Case study: United Na tions , resolutions (05)1/2 = 0.707 are reported. For example. the correlation between
these positions on axis 1 of analysis A and on axis 1 of analysis C is
Hamrouni and Benzécri (1976) discuss a case study of the pattern of voting 0986, so these axes "agree"
by member countries of the United Nations on 62 resolutions of the general
Avs B Avs C Avs O B vs C
assembly during 1967. Their discussion centres mostly around the way to
deal with abstentions and absences by various countries. Initially they 1,1:0.987 1.1 :0.986 1,10.995 1.1 :0.998
criticize previous work by Deutsch and Martin (1971), who coded the data in 2,2:0.994 2.3:0.972 3,2: 0.938 2.3:0.973
binary form as described aboye, corresponding to votes of "yes" and "no", 3.3:0.966 3.4:0.857 4.3 0.739 3.4: 0.876
with both abstentions and absences coded as t and 1. Because there are 4.4:0.969 4.5: 0.909 4,5: 0.935
5.5 0.850
large frequencies of abstention during most votes and because certain 6,6: 0.877
countries are characterized by high absenteeism, this is clearly an undesirable 7,7: 0.939
coding scheme. An abstention in this context is not a mere non-response, but 8.8: 0.838
a definite attitude and should be treated as a third distinct category. Whether
or not we should treat an absence as an additional category or whether we
should spread it across the three categories is not as clear, and Hamrouni and In order to compare the analyses we can compute the correlations between
Benzécri describe a number of different analyses to compare possible coding the positions of the 122 countries (rows of Z) on the principal axes of each
schemes as well as to check the stability of their results. In one of these an analysis. Thus there appear to be relatively minor differences between
absence is coded as three zeros in the three categories. In this case the coded analyses A and B, because their respective principal axes (as far as the 8th)
values do not sum to 1, as we have described previously, and the particular are highly correlated (see Table 5.6). In analysis C the first four axes of
country receives a smaller mass. The position of such a country would also analyses A and B are recovered, but the second axis of analysis C is unique.
be affected by this coding, in contrast to the coding schemes which try to In fact it turns out to be an "absenteeism axis" and strings out the countries
interpolate the absent vote by sorne set of expected values. which are frequently absent when a vote is taken. This feature is naturally
To illustrate the methodology of Hamrouni and Benzécri we have selected absent in the other analyses where absence is not treated quite as distinctIy.
4 out of the 10 correspondence analyses which they performed on these data In analysis D the second axis of analyses A and B appears to have not re­
(their analyses 1, 2, 5 and 7). These analyses are characterized by the number appeared. This is an axis which separates Portugal and South Africa from the
of columns J q of the matrix Z which are used for each resolution q and how other countries because they alone voted "no" to 9 resolutions. These isolated
the abstentions and absences have been coded prior to analysis: votes have been lost in the coding scheme of analysis D where a "no" and an
"abstention" are equivalent.
Analysis A-J q = 3: q+(yes), q-(no), qo (abstention); absence coded as Hamrouni and Benzécri (1976) make a substantial interpretation of the
0,0,0. results of these analyses and manage to describe as many as 7 distinct
principal axes. Of these at least 4 can be described as stable in the sense that
Analysis B-J q = 3: q+(yes), q-(no), qo (abstention); absence coded as
they reappear in different analyses.
oc q _, ocqO , the respective proportions of countries who were present that
OC q +,
voted in each of the three possible ways.

Analysis C-J q = 4: q+(yes), q-(no), qo (abstention), q* (absence); 5.4 RECODING OF HETEROGENEOUS DATA
absence has its own distinct column.
A data analyst is often faced with a study which involves different types of
Analysis D-Jq = 2: q+(yes), q-(no or abstention); absence coded as O, O. variables. For example, a questionnaire might consist of several questions to
which there are only categorical answers (e.g. yes/no, or disagree/undecided/agree) as well as questions which elicit a numerical response, like age and income. So far we have considered using correspondence analysis exclusively on discrete data such as contingency tables of counts and indicator matrices which record qualitative information on a set of individuals. In this section we shall show how different types of data can be recoded into a standard form so that they may be explored collectively using correspondence analysis.

Let us consider a general situation where data is collected on I individuals, where each individual i is observed or measured according to Q variables (or "responds" to Q questions). Each individual i is characterized by a vector of alphanumeric data which is clearly divisible into Q distinct parts. For subsets of variables of the same, or similar, type we might have a good idea how to analyse the data (e.g. principal components analysis of the continuous data, correspondence analysis of the discrete data), but this would not give us a single analysis which describes the data globally. In order to achieve a global description, we first need to convert all the data into a common form. Because all measurements may be regarded as discrete, to varying degrees, it is clear that all data can be considered in a discrete framework.

When a discrete variable has only a "small" number of categories, we have already seen that these categories can be identified with columns of an indicator matrix containing 0s and 1s to indicate each individual's category. A variable like "age (in years)" is also discrete but has many categories, one for each year in the data set. Here it would usually be impractical to create a different column in the indicator matrix for each of these years, unless the number I of individuals was so high that there were no low frequencies in any one year. Finally, a variable like temperature, recorded to one decimal place, say, is discrete on categories of temperature in tenths of a degree, although most data analysts would consider this to be a continuous measurement. (Most analysts would even gloss over the discreteness of the age measurement, treating it also as continuous.) Another way of describing these variables is that they are basically continuous phenomena, but our observation of them (or our measuring instrument) causes a discretization of their value.

We can continue this process of discretization and define a smaller set of categories of age, say, so that each individual is recorded as belonging to one of a set of age groups. This can be viewed either as the segmenting of the range of a continuous variable, or as the agglomeration of the categories of a discrete variable. Information is clearly lost in the process. For example, if temperature is recoded into three discrete categories: less than 10.0°C, 10.0 to 20.0°C and above 20.0°C, then observations of 5.2°C and 9.8°C, say, are not distinguishable after recoding. A priori this seems a disastrous loss of information, while on the other hand it might turn out that this hardly affects our final results at all, while gaining the advantage of having reduced the variable to a discrete form.

There are a number of ways to recode an observation on a continuous variable to be discrete, involving only a few categories. The one just discussed is the coarsest way and causes the greatest loss of information for a given number of categories. This loss of information can be attenuated in a number of ad hoc ways. For example, Guittonneau and Roux (1977) propose that near the boundaries of the categories the strict 0/1 coding be relaxed, as depicted in Fig. 5.10. This permits the indication by the coding that a value is near the boundary point and not "completely" in the category. This is called fuzzy coding (codage flou in French) as opposed to our previous logical coding. Quite apart from the problem of deciding how many categories we should use in recoding the variable, we now have additional decisions to make concerning the width of the fuzzy areas around the boundaries. This width should be related to the particular characteristics of the variable, for example measurement error, but as yet there has been no in-depth study of the effect of different choices of fuzzy coding. Whether the information saved can produce noticeably improved results is still an open question.

[FIG. 5.10. A typical example of fuzzy coding of a quantitative measurement into three categories. A value just below the upper cut-off point of the first category is recoded with most of its unit value in the first category and the small remainder in the second, rather than with the strict logical coding of 1, 0, 0.]
A different recoding scheme has been proposed by Escofier (1979), who sets up just 2 columns in the (recoded) indicator matrix for each continuous variable. Suppose that x_iq, i = 1 ... I, are the (mean) centred and (variance) standardized values of the observations on the qth variable across the individuals, that is the mean and variance of the x_iq are 0 and 1 respectively. The two columns of the recoded matrix are labelled q+ and q−, and the ith individual (row) is coded as:

  z_iq+ = (1 + x_iq)/2      z_iq− = (1 − x_iq)/2      (5.4.1)
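A sketch of this "doubling" (Python; illustrative, with invented data):

import numpy as np

def double(x):
    # Escofier's recoding (5.4.1): centre and standardize, then split each
    # observation into a positive pole q+ and a negative pole q-
    x = np.asarray(x, dtype=float)
    x = (x - x.mean()) / x.std()
    return np.column_stack([(1 + x) / 2, (1 - x) / 2])

temperatures = [12.1, 15.4, 9.8, 20.3, 17.0]     # hypothetical interval data
Zq = double(temperatures)
print(Zq.sum(axis=1))                            # each row: z+ + z- = 1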
Because z_iq+ + z_iq− = 1, the centroid of the subset of columns (q+ and q−) pertaining to variable q is still at the origin of the correspondence analysis display. The masses of columns q+ and q− are identical, because the x_iq are centered, and thus the two column points must be equidistant from their centroid (origin). This coding essentially creates a pair of positive and negative poles for each continuous variable, and the coding reflects to what extent the variable lies on the positive or negative side of the observed mean. Because of the mean centering, this type of recoding is particularly suitable for "interval" variables (i.e. having an arbitrary origin, e.g. temperature in °C), or for "ratio" variables which have an origin so far from the range in which we observe them that we can regard our observations as being of an interval nature.

Centering does not have to be performed with respect to the mean; in fact any point of central tendency may be used, e.g. the median. Similarly a different measure of spread may be used to standardize the observations before applying the transformation (5.4.1). The column points q+ and q− will now have different masses and will not lie exactly equidistant from the centroid. Notice that it is of minor consequence that some of the recoded values z_iq+ will be greater than one, with their associated z_iq− negative. This departs slightly from our concept of the correspondence matrix as a distribution of a unit of positive mass amongst a matrix of cells. An alternative coding which ensures that z_iq+ and z_iq− vary between 0 and 1 is to equate the minimum and maximum values min(q) and max(q) of the variable to the values 0 and 1 of z_iq+, and define z_iq− = 1 − z_iq+, rescaling the observations between these extremes. This is equivalent to centering with respect to the midpoint of the extreme values, ½(min(q) + max(q)), standardizing by half the range, ½(max(q) − min(q)), and then using (5.4.1). The only advantage of using the mean centering and variance standardization is that we know in advance that the inertia of the columns q+ and q− is exactly 1/Q. This is the same as the inertia of a binary discrete variable (J_q = 2) which has been coded logically in an indicator matrix, whereas we know from (5.2.3) (proved in Example 5.5.2(e)) that any other coding which forces the recoded values to lie between 0 and 1 necessarily leads to the inertia of the variable being less than 1/Q.

Case study: taxonomy of a plant genus

Guittonneau and Roux (1977) consider data on 75 species of the plant genus Erodium. There are a total of 38 characteristics (variables) which can be observed on each plant:

15 qualitative variables, 2 categories each (e.g. ridge of mericarp: feathery or not?)
13 qualitative variables, 3 categories each (e.g. leaves: wide, average or narrow?)
4 qualitative variables, 4 categories each (e.g. under the foveoles: two grooves, one groove, a fold or nothing?)
1 qualitative variable, 6 categories (6 shapes of leaf)
5 quantitative variables (e.g. length of the petals)

Within a particular species the 33 qualitative variables are a constant, while the quantitative measurements are the averages obtained from a sample of plants of that species. In the case of the variables with 3 categories, the second category is always intermediate to the first and the third, and the authors decide to code each of these in 2 columns rather than 3, with the intermediate category coded as (½, ½). Thus the first 28 variables generate 30 + 26 = 56 columns of the matrix Z, while the remaining 5 qualitative variables generate 16 + 6 = 22 columns. After inspection of the histograms of the 5 quantitative variables, these are each discretized into 4 categories, thus generating 20 columns of Z. Observations near the boundaries of these intervals are indicated by fuzzy coding, as in Fig. 5.10, although the authors do not report their exact scheme. The recoded indicator matrix is thus of the following form:

  Z = [Z_1 Z_2 Z_3 Z_4 Z_5]      75 x 98 = 75 x (30 + 26 + 16 + 6 + 20)

with Z_1, Z_3 and Z_4 in logical coding and Z_2 and Z_5 in fuzzy coding.

Various correspondence analyses are now performed on Z, including individual analyses of [Z_1 Z_2], [Z_3 Z_4] and Z_5, the submatrices of Z which are homogeneous. We know that the inertia of a variable increases linearly with the number of categories J_q, hence an analysis of Z could yield features arising from the heterogeneity of inertias amongst the variables. Since the total inertias of the above three homogeneous submatrices in their individual analyses are "standardized" (and thus comparable) measures of the variation in the respective submatrices, we can reweight each submatrix in a global analysis so that its part of the inertia is proportional to its inertia in the individual analysis. The way to reweight the submatrices (i.e. groups of variables) can be obtained from Example 5.5.5, where we treat the reweighting of a single qualitative variable in an indicator matrix.

Again, in order to compare the results of different analyses, correlations between the positions of the species on principal axes are calculated. The first four axes are recovered in all the analyses with high intercorrelations
and it turns out, for example, that reweighting does not change the results dramatically. This does not mean that we need not reweight. Reweighting in this analysis is preferable because we eliminate possible features which are artifacts of the coding. The fact that the results are stable means that these features are not strong enough to obscure our view of the "true" features in the data. The topic of reweighting is discussed in more detail in Section 8.2.

5.5 EXAMPLES

5.5.1 Transition between rows and columns of a bivariate indicator matrix

Prove the results of Table 5.2 which summarize the transition formulae between the rows and columns in three different displays in the correspondence analysis of the I x (J_1 + J_2) indicator matrix Z ≡ [Z_1 Z_2].

Solution
(a) Both row and column points in principal co-ordinates. The transition formula from columns to rows is (cf. (4.1.16)):

  F^Z = R^Z G^Z (D_μ^Z)^{-1} = R^Z [G_1^Z; G_2^Z] (D_μ^Z)^{-1}

The ith row profile (ith row of R^Z) is zero apart from two values of ½ indicating the categories of the two discrete variables to which i belongs. Hence f_i^Z is just the average of the vectors g_j^Z and g_j'^Z, where j indicates the jth row (category) of G_1^Z and j' the j'th row (category) of G_2^Z, followed by the rescaling (D_μ^Z)^{-1}, i.e. by the inverse square roots of the principal inertias, along principal axes:

  f_i^Z = (D_μ^Z)^{-1} ½(g_j^Z + g_j'^Z)      (5.5.1)

Now because G_1^Z and G_2^Z are identically rescaled versions of F and G (from the analysis of N ≡ Z_1^T Z_2), we have the transition from G_1^Z to G_2^Z:

  G_1^Z = R G_2^Z D_μ^{-1}      (5.5.2)

where R and D_μ pertain to the analysis of N.

The centroid f̄_(j)^Z of all the points f_i^Z which represent rows with category j of the first discrete variable, say, is the average of (5.5.1) as i ranges over the rows with responses (j, j') for j' = 1 ... J_2. Because j is regarded as fixed, the average of the terms g_j^Z is still g_j^Z. The average of the terms g_j'^Z is the jth row of R G_2^Z in (5.5.2), so that

  f̄_(j)^Z = (D_μ^Z)^{-1} ½(g_j^Z + D_μ g_j^Z)
          = ½(D_μ^Z)^{-1}(I + D_μ) g_j^Z

Since D_μ = D_λ^{1/2} and D_μ^Z = (D_λ^Z)^{1/2}, we have from (5.1.12):

  f̄_(j)^Z = D_μ^Z g_j^Z      (5.5.3)

which yields the result in Table 5.2 that g_j^Z = (D_μ^Z)^{-1} f̄_(j)^Z.

(b) Row points in principal co-ordinates, column points in standard co-ordinates. The transition formula from columns to rows is (cf. (4.1.31)):

  F^Z = R^Z Γ^Z = R^Z [Γ_1^Z; Γ_2^Z]

The result that the individual row is at the midpoint of its two responses is clear from our argument above, that is (where f_(j,j')^Z corresponds to a row with responses (j, j')):

  f_(j,j')^Z = ½(γ_j^Z + γ_j'^Z)

The average f̄_(j)^Z here is the same as in (a) above, hence

  γ_j^Z = (D_μ^Z)^{-1} g_j^Z = (D_μ^Z)^{-2} f̄_(j)^Z

which is the result we require, since D_λ^Z = (D_μ^Z)².

(c) Row points in standard co-ordinates, column points in principal co-ordinates. The transition formula from columns to rows is (cf. (4.1.32)):

  Φ^Z = R^Z G^Z (D_μ^Z)^{-2} = R^Z [G_1^Z; G_2^Z] (D_μ^Z)^{-2}

The transition to an individual row point is thus easily seen to be:

  φ_(j,j')^Z = (D_μ^Z)^{-2} ½(g_j^Z + g_j'^Z)

The average φ̄_(j)^Z is thus (D_μ^Z)^{-1} f̄_(j)^Z (cf. (5.5.1)) and hence (from (5.5.3))

  φ̄_(j)^Z = g_j^Z

which is the required result.
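Result (a) can be confirmed numerically from a single singular value decomposition, so no matching of axes or signs between separate analyses is needed. A sketch (Python with numpy; simulated data, assuming no empty categories):

import numpy as np

rng = np.random.default_rng(3)
I, J1, J2 = 50, 3, 4
Z = np.hstack([np.eye(J1)[rng.integers(0, J1, I)],
               np.eye(J2)[rng.integers(0, J2, I)]])

P = Z / Z.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

F = (U * sv) / np.sqrt(r)[:, None]       # row points, principal co-ordinates
G = (Vt.T * sv) / np.sqrt(c)[:, None]    # column points, principal co-ordinates

i = 0
j, jp = np.flatnonzero(Z[i])             # the two responses of individual i
k = J1 + J2 - 2                          # the J - Q non-trivial axes
# Table 5.2(a): average of the two response points, rescaled on each axis
# by the inverse square root of the principal inertia (the singular value)
print(np.allclose(F[i, :k], 0.5 * (G[j, :k] + G[jp, :k]) / sv[:k]))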
5.5.2 Correspondence analysis of a multivariate indicator matrix

Prove the results in Section 5.2 concerning the correspondence analysis of the multivariate indicator matrix Z ≡ [Z_1 ... Z_Q] (see p. 139).

Solution
(a) 1^T Z_q 1 = I, i.e. there are I ones in each matrix Z_q, q = 1 ... Q. Thus the masses of the columns of Z_q add up to I/QI = 1/Q.

(b) If c_q^z ≡ (1/QI)Z_q^T 1 are the masses of the columns of Z_q, then the centroid of these column profiles, weighted by their masses, is

  {(1/QI) Z_q 1}/(1^T c_q^z) = {(1/QI) 1}/(1/Q) = (1/I) 1

since each row of Z_q sums to 1, i.e. Z_q 1 = 1. This is identical to the centroid of all the columns of Z, which is the vector (1/I)1 of row masses.

(c)-(e) We first prove (5.2.3). The inertia of the jth column profile is its mass (c_j^z) times its squared distance to the centroid. The chi-square distance between columns is simply proportional to the Euclidean distance because the row masses are constant. Elements of the profile are either zero or 1/(QIc_j^z), corresponding to elements 0 or 1 respectively of the jth column of Z. Corresponding terms in the squared distance computation are {0 − (1/I)}²/(1/I) = 1/I and {(1/(QIc_j^z)) − (1/I)}²/(1/I) = (1/I){1/(Qc_j^z) − 1}² respectively. There are QIc_j^z of the latter terms and I − QIc_j^z of the former, so that the squared distance is:

  Qc_j^z{1/(Qc_j^z) − 1}² + 1 − Qc_j^z = (1 − Qc_j^z)/Qc_j^z      (5.5.4)
The inertia is thus:

  in(j) = c_j^z(1 − Qc_j^z)/Qc_j^z = (1/Q) − c_j^z

(5.2.2) follows by summing in(j) over j = j_q = 1 ... J_q:

  in(J_q) = J_q(1/Q) − (1/Q) = (J_q − 1)/Q

since the masses of these columns add up to (1/Q).

(5.2.1) follows by summing in(J_q) over q = 1 ... Q:

  in(J) = J/Q − 1

(f) Each cloud of J_q column profiles has the same centroid and occupies a subspace of dimensionality at most J_q − 1. All J profiles thus occupy a space of dimensionality at most Σ_q(J_q − 1) = J − Q.

(g) The transition from columns to rows is:

  F^Z = R^Z G^Z (D_λ^Z)^{-1/2}

where the ith row profile in R^Z is a vector of zeros and Q values of 1/Q indicating the responses j_1 ... j_Q, say. Thus f_i^Z is the average (1/Q)(g_{j_1}^Z + ... + g_{j_Q}^Z) of the column points corresponding to these responses, followed by the usual rescaling along principal axes by the inverse square roots of respective principal inertias (cf. (5.5.1), where Q = 2). The centroid of all row profiles f_i^Z which have response j_1 = j to the first variable, say, is of the (row vector) form:

  (1/Q){(g_j^Z)^T + jth row of R_12 G_2^Z + ... + jth row of R_1Q G_Q^Z}

This yields the required result, using the principal co-ordinate form of (5.2.8).

5.5.4 Inertia of a question in the presence of non-responses

In an opinion survey I individuals respond to a questionnaire consisting of Q questions, each of which has only two possible responses. A number of non-responses are recorded in the survey and all the results are coded in a multivariate indicator matrix Z ≡ [Z_1 ... Z_Q], where each Z_q (q = 1 ... Q) consists of 3 columns labelled qa, qb and q*, corresponding to responses a and b and the non-response respectively to question q. Let h_qa, h_qb and h_q* denote the relative frequencies of these 3 respective possibilities amongst the I individuals. Now suppose that a non-response to question q is coded in general as α_q, β_q and δ_q in the 3 respective columns, where α_q + β_q + δ_q = 1. In terms of these values and the relative frequencies of response, evaluate the total inertia in(J) of the cloud of column points and the inertia in(J_q) of the subcloud of points associated with question q. Also give the results for the special cases, firstly when δ_q = 1 and secondly when δ_q = 0, α_q = β_q = ½.

Solution
The above coding of a non-response preserves the grand total of Z as QI. In our previous notation we might denote the column masses of Z by c_j (j = 1 ... J), say, so that the column totals of Z are QIc_j. Our present notation involves the relative frequencies of response (and non-response); for example h_qa is the number of times response a is given to question q, divided by the "sample size" (number of rows) I. Thus QIc_qa, the sum of column qa, is equal to Ih_qa + Iα_q h_q*. The following relationships between the column masses c_qa, c_qb and c_q* of the columns of Z_q and the relative frequencies h_qa, h_qb and h_q* are easily deduced:

  c_qa = (1/Q)(h_qa + α_q h_q*),   c_qb = (1/Q)(h_qb + β_q h_q*),   c_q* = (1/Q)δ_q h_q*

As in Example 5.5.2, the centroid of the columns of Z_q is identical to that of all the columns of Z, namely [1/I ... 1/I]^T. The inertia of column qa, say, is computed as c_qa times its squared distance to the centroid, involving Ih_qa terms of the form (1/I){1/(Qc_qa) − 1}², Ih_qb terms of the form 1/I and Ih_q* terms of the form {(α_q/(QIc_qa)) − (1/I)}²/(1/I) = (1/I){α_q/(Qc_qa) − 1}² (cf. the simpler calculation in Example 5.5.2). The inertia is evaluated as:

  in(qa) = (1/Q){1 − (h_qa + α_q h_q*) − h_q* α_q(1 − α_q)/(h_qa + α_q h_q*)}      (5.5.5)

and, similarly:

  in(qb) = (1/Q){1 − (h_qb + β_q h_q*) − h_q* β_q(1 − β_q)/(h_qb + β_q h_q*)}      (5.5.6)

The squared distance between column q* and the centroid involves I(h_qa + h_qb) terms of the form 1/I and Ih_q* terms of the form {(δ_q/(QIc_q*)) − (1/I)}²/(1/I) = (1/I){1/h_q* − 1}². The squared distance is thus (1/h_q*) − 1 and the inertia of column q* is c_q* times this quantity, leading to the following expression:

  in(q*) = (1/Q){δ_q − δ_q h_q*}      (5.5.7)

The row profile matrix R Z is simply (l/Q)Z, while the column profile matrix C Z is

(QID;)-IZT. Since the column masses of B are identical to those of Z, the aboye The inertia of the J q = 3 columns qa, qb and q* is the sum of the expressions (5.5.5-7):
eigenequation can be written as: in(Jq) = (l/Ql{ 1+ ~q - hq.Otq(l - Otq)/(h qa + Otqh q.) - hgo{3q(l - {3q)/(hqb + {3 qhq.l} (5.5.8)
(Q 2ID:)-IZ TZr Z = rZDf The special case of ~q = 1 (i.e. !1. q = {3q = O) simplifies as in(Jq) = 2/Q, which is not
which is precisely the transition formula in the analysis of the Hurt matrix. Hence surprising because this is a special case of (5.2.2) where the number of categories
r Z = r B and Df = (Df)I/2. J q = 3. The total inertia is then in(J) = 2.
i,
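The indicator and Burt matrix results above are easy to check numerically. The following short Python sketch (ours, not part of the original text: the helper name ca_inertias, the random indicator matrix and the test dimensions are all illustrative choices) computes the principal inertias of the correspondence analyses of Z and of its Burt matrix from the singular value decomposition of the standardized residuals, assuming every category is observed at least once:

import numpy as np

def ca_inertias(N):
    # principal inertias of the correspondence analysis of N:
    # squared singular values of the standardized residuals matrix
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return np.sort(np.linalg.svd(S, compute_uv=False)**2)[::-1]

rng = np.random.default_rng(0)
I, Jq = 100, [2, 3, 4]                   # Q = 3 questions with 2, 3, 4 categories
Q, J = len(Jq), sum(Jq)
Z = np.hstack([np.eye(j)[rng.integers(0, j, I)] for j in Jq])

lam_Z = ca_inertias(Z)                   # indicator matrix analysis
lam_B = ca_inertias(Z.T @ Z)             # Burt matrix analysis
print(np.isclose(lam_Z.sum(), J/Q - 1))  # total inertia (5.2.1): True
print(np.allclose(lam_B, lam_Z**2))      # lambda^B = (lambda^Z)^2: True

Both checks print True: the total inertia of the indicator matrix is J/Q - 1 = 2, as in (5.2.1), and the principal inertias of the Burt matrix are the squares of those of Z, as proved in Example 5.5.3.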

When ξ_q = 0, α_q = β_q = 1/2, (5.5.8) simplifies as:

in(J_q) = (1/Q){1 - h_q*/(4h̄_q) - h_q*/(4(1 - h̄_q))}

where h̄_q ≡ h_qa + (1/2)h_q*, hence 1 - h̄_q = h_qb + (1/2)h_q*. This simplifies further as:

in(J_q) = (1/Q){1 - h_q*/(4h̄_q(1 - h̄_q))}   (5.5.9)

The total inertia is then:

in(J) = 1 - (1/4Q)Σ_q h_q*/(h̄_q(1 - h̄_q))   (5.5.10)

Comments
(1) The inertia in(J_q) of question q is at its highest when ξ_q = 1, α_q = β_q = 0, that is when the non-response is coded fully as an extra response. As mass is transferred from column q* to columns qa and qb the inertia must decrease, as all three columns come closer to the centroid [1/I ... 1/I]^T. We are, of course, more interested in the relative values of in(J_q). When ξ_q = 1 the values of in(J_q) are the same, 2/Q, for all q. At the other extreme when ξ_q = 0 and α_q = β_q = 1/2, say, we can deduce from (5.5.9) that the value of in(J_q) is highest when h_q* is the lowest and h̄_q is 1/2, i.e. h_qa = h_qb, in which case in(J_q) = (1/Q)(1 - h_q*). In other words, for a fixed frequency of non-response h_q* across the questions, the inertia of a question decreases as polarization of response increases. The degree of polarization, measured by 1/{h̄_q(1 - h̄_q)}, is a quantity which is discussed further in Section 6.1.

(2) We can illustrate these results in the analyses displayed in Figs 5.6, 5.8 and 5.9. From the marginals down the diagonal of Table 5.4 we can obtain the relative frequencies h_qa, h_qb and h_q* (q = 1...5):

          q=1     q=2     q=3     q=4     q=5
h_qa:     0.58    0.54    0.66    0.41    0.25
h_qb:     0.42    0.38    0.18    0.52    0.70
h_q*:     0.00    0.08    0.16    0.07    0.05

The values of h̄_q, 1 - h̄_q and the polarization factor 1/{4h̄_q(1 - h̄_q)} used in (5.5.9) are:

                       q=1     q=2     q=3     q=4     q=5
h̄_q:                  0.58    0.58    0.74    0.445   0.275
1 - h̄_q:              0.42    0.42    0.26    0.555   0.725
1/{4h̄_q(1 - h̄_q)}:   1.03    1.03    1.30    1.01    1.25

For each of the three analyses the total inertia in(J), the inertias of the questions in(J_q) and their values as percentages of in(J) are evaluated as:

                               q=1      q=2      q=3      q=4      q=5      total
ξ_q = 1, α_q = β_q = 0:        0.200    0.400    0.400    0.400    0.400    1.800
(Fig. 5.6)                     (11.1%)  (22.2%)  (22.2%)  (22.2%)  (22.2%)
ξ_q = 1/2, α_q = β_q = 1/4:    0.200    0.287    0.264    0.289    0.289    1.329
(Fig. 5.9; using (5.5.8))      (15.0%)  (21.6%)  (19.9%)  (21.7%)  (21.7%)
ξ_q = 0, α_q = β_q = 1/2:      0.200    0.184    0.158    0.186    0.190    0.918
(Fig. 5.8; using (5.5.9))      (21.8%)  (20.0%)  (17.2%)  (20.3%)  (20.7%)

(3) Notice that because question 1 has no non-response, its inertia remains a constant 0.2 in each of the three analyses above. As a percentage, however, its inertia varies from 11.1% to 21.8%, depending on how the non-responses in the other questions are recoded. In the first analysis, where ξ_q = 1 (q = 1...Q), the contrast in inertias is the greatest. We might want to eliminate this peculiarity of the coding scheme by increasing the mass of the first question in this analysis. This is done simply by multiplying columns 1a and 1b of the recoded indicator matrix Z by 2, i.e. code "male" as 2 0 and "female" as 0 2, which will double the mass of question 1 so that its inertia is identical to that of the other questions. To prove this, notice that the profiles of columns 1a and 1b as well as their centroid are unaffected by this increase in mass. The masses of the other column profiles are decreased so that each subset of columns corresponding to a question q has a total mass of 2/(Q+1) for q = 1, and 1/(Q+1) for q = 2...Q, where Q = 5. Since the masses of columns 1a and 1b have been doubled without affecting the squared distances to their centroids, it is clear that the inertia of question 1, as well as the inertias of the other questions, will now be 2/(Q+1). In the next example we prove a more general result concerning the reweighting of questions (see also Example 5.5.8 and Section 8.2 on the reweighting of questions).

5.5.5 Reweighting of discrete variables

Suppose that Z ≡ [Z_1 ... Z_Q] is a multivariate indicator matrix in its most general form, i.e. the submatrix Z_q corresponding to variable (or question) q consists of rows of non-negative numbers which sum to 1 (logical or fuzzy coding). Let in(q) denote the inertia of the qth subset of J_q columns of Z (note that the notation in(q) is equivalent to in(J_q) in the previous example). Suppose we wish to reweight the variables so that their inertias are proportional to pre-assigned values v_1 ... v_Q. Show that this is achieved by the following rescaling of the submatrices:

Z_q* = {v_q/in(q)}Z_q   (5.5.11)

Solution
The J_q columns of each submatrix Z_q have the same centroid [1/I ... 1/I]^T, and this is the centroid of all the columns of Z. This property is unchanged when the columns of Z_q are rescaled by a factor τ_q (q = 1...Q). In general, let Z_q* ≡ τ_q Z_q. The sum of all the elements of Z_q* is thus τ_q I, and let τ ≡ Σ_q τ_q, so that the mass of the qth variable is τ_q/τ, compared to the constant mass of 1/Q for all the variables of Z. The distances from the column points of Z* to the centroid are identical to those of Z, as the profiles and the metric are exactly the same. Therefore the inertia of the qth variable (i.e. qth subset of column points) of Z* is (τ_q/τ)/(1/Q) times the inertia in(q), which can be written as:

in*(q) = in(q)τ_q Q/τ

It is clear that the values in*(q) are in the same proportion to each other as in(q)τ_q, hence we should use scaling factors τ_q = v_q/in(q) in order that the inertias in*(q) be proportional to given values v_q.

Comments
(1) When the elements of Z are 0s and 1s only (logical coding), as in Section 5.2, we have seen that the inertia of the qth variable is (J_q - 1)/Q, where J_q is the number of categories of this variable (or number of different responses to question q). In order

to equalize the inertias of the variables we can multiply the qth subset of variables by any convenient value in proportion to 1/(J_q - 1). For example, if Q = 3 and J_1 = 2, J_2 = 3, J_3 = 4, then we know that the inertias will be in(1) = 1/3, in(2) = 2/3 and in(3) = 1, with a total inertia of 2. To maintain integer values in the reweighted matrix Z* it is convenient to multiply the 1st set of 2 columns of Z by 6, the second set of 3 columns by 3 and the third set of 4 columns by 2. This is equivalent to coding observations on the three variables by values 6, 3 and 2 respectively instead of ones. The new inertias are all equal to 6/11, with a total inertia of 18/11.
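As a quick numerical illustration of this comment (our own sketch, assuming every category occurs at least once; the helper name column_inertias is illustrative, not from the text), the following Python fragment codes the three variables with the values 6, 3 and 2 and recovers the inertias 6/11:

import numpy as np

def column_inertias(N):
    # inertia of each column point: mass times squared chi-square
    # distance of the column profile from the average profile
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    prof = P / c                                    # column profiles
    d2 = (((prof - r[:, None])**2) / r[:, None]).sum(axis=0)
    return c * d2

rng = np.random.default_rng(1)
I, Jq, w = 50, [2, 3, 4], [6, 3, 2]                 # weights prop. to 1/(J_q - 1)
blocks = [np.eye(j)[rng.integers(0, j, I)] for j in Jq]
Zw = np.hstack([wq * B for wq, B in zip(w, blocks)])

inert = np.split(column_inertias(Zw), np.cumsum(Jq)[:-1])
print([round(g.sum(), 4) for g in inert])           # each 0.5455, i.e. 6/11
print(round(sum(g.sum() for g in inert), 4))        # 1.6364, i.e. total 18/11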

(2) There is a parallel between reweighting the (discrete) variables, as we have described above, and the rescaling of continuous variables which is common practice in more conventional statistical analysis. In both cases we are adjusting the original data vectors multiplicatively in order to achieve a pre-assigned amount of "variation" (inertia in the present case, usually variance or range in the case of continuous data). However, there is a subtle difference between these two situations. In correspondence analysis the positions of the profile points are unchanged, only the masses assigned to the points are adjusted. On the other hand, in principal components analysis, for example, the standardization of the variables to have unit variance affects the position of each point representing a variable. This distinction might seem trivial because the variance standardization can be thought of as assigning different masses to the variables. The important point here is that this is not the common way of approaching the problem geometrically, whereas in correspondence analysis a clear distinction is made between the concepts of mass and position. The reweighting of points is discussed further in Section 8.2.

6
Correspondence Analysis of Ratings and Preferences

The collection of data in the form of ratings is frequent in the social sciences, where measurements are attempted on non-physical concepts such as aggression (psychology), knowledge of a subject (education), confidence in the political system (sociology), economic potential of a country (economics) and
pain (clinical research). The format of the data is typically a matrix of I rows
(subjects) by Q columns (objects, or variables), where each variable is
considered to be ordinal, that is for the qth variable (q = 1...Q) m_q responses
are possible, conventionally numbered either by 1 to m_q or by 0 to m_q - 1, with
these responses having an inherent ordering. Because these ordinal variables
each have an upper and lower extreme, or "pole", we call them bipolar
variables.
Within a particular study it is common for the set of variables to be rated
"on the same scale". For example, in an opinion survey people might be asked
whether they (a) agree, (b) are undecided or (c) disagree with a set of Q
statements. Each subject responds to these statements on a "3-point scale" of
agreement, where we could assign scale values of 1, 2 and 3 to the responses
(a), (b) and (c) respectively. An important point about such data, which is so
obvious that it is often overlooked, is that the way the results are analysed
should be invariant to the "direction" of the statements. For example, if one
of the statements is: "The present government has a sound economic policy",
then it should really make no difference if the statement had been rephrased
as: "The present government does not have a sound economic policy". A
rating of agreement here is equivalent to a complementary rating of disagree­
ment on the former statement, although it might be argued that there is a

psychological difference between the two ways of asking the subject to respond.

In order to allow the respondent more freedom of choice in describing his feelings, the scale might be made 5-point, for example, by introducing intermediate possibilities of "disagree slightly" and "agree slightly". A problem with the analysis of such data is the choice of the original scale values: why do we not choose -2, -1, 0, 1, 2 for the scale values, why not 0, 4, 5, 6, 10?

Two alternatives to the above system of collecting bipolar data are possible, one very restrictive and the other very unrestrictive. The first is to ask each respondent to rank the set of statements rather than assess each one (clearly this is only possible when each scale of response is the same, for example agree/disagree). In the above example the respondents would have to say which of the statements they agreed with the most, which second most, etc., until the statement they disagreed with the most. In the case of a market research survey people may not be asked to assess individual products but rather to look at the set as a group and rank them in order of preference. This system is very restrictive because the subject is not allowed to express dislike for all the products, for example. On the other hand, it might very well be the intention of the market researcher to force an order of preference from a respondent in a situation where the information in relative terms would otherwise be very limited. Another possibility, which is desirable when a large number of objects have to be ranked, is to ask the subjects to choose small subsets of most preferred and most disliked objects.

The second alternative is to allow the respondents great freedom in expressing their feelings. Ratings are made on a quasi-continuous scale, say from 0 (totally against, say) to 10 (totally in favour), allowing first decimals: for example, 8.3. In effect this is a 101-point rating scale. Sometimes such data are collected by asking subjects to mark their response on a given straight line. We have found such ways of collecting data to work well when the respondents are themselves numerically minded and trained as respondents, whereas we doubt whether such a scheme would appeal to the general public. In this case the presentation of a discrete scale with descriptive naming of a small number of scale points has clear advantages.

In Chapter 5 we discussed how "optimal" scale values for the categories of response can be derived, by treating each category as a separate column of a multivariate indicator matrix. In that case the categories are treated like those of nominal variables and the inherent ordering of the categories is ignored (unless an analysis under order constraints is performed, cf. Section 8.4). In the present chapter we shall go to almost the opposite extreme by accepting the scale decided upon by the researcher. At least, we shall accept the two extreme poles of the scale and the way the researcher has decided to gradate the categories between the two poles.

In Section 6.1 we introduce the concept of "doubling" bipolar data so that the matrix to be analysed is a special case of a multivariate indicator matrix. The row and column geometry of correspondence analysis in this case is shown to depend on what we term the polarization of the observations and their average on each variable. Similarities and differences between the correspondence analysis of the doubled data matrix and other applicable techniques, such as the vector and unfolding models and principal components analysis, are discussed in Section 6.2. The chapter again concludes with some theoretical examples (Section 6.3).

6.1 DOUBLING AND ITS ASSOCIATED GEOMETRY

An example

In order to introduce the correspondence analysis of bipolar data, let us consider the 7 x 4 data matrix of Table 6.1(a). These are artificial data which we presume to be average ratings of 4 candidates in an election by 7 groups of subjects. For example, suppose that the rows represent different socio-economic groups and that the candidates A, B, C and D are classifiable politically as left, independent, centre and right respectively. The original rating scale was 0 (totally unfavourable), 1, 2, 3, 4, 5 (undecided), 6, 7, 8, 9, 10 (totally favourable), an 11-point scale. The first group of electors thus gave the leftwing candidate A an average rating of 2.2.

Figure 6.1 shows the 2-dimensional correspondence analysis of these data, which is an almost exact representation of the row and column profiles. This display is a good indication of the political tendencies of the 7 groups of electors, with groups 6 and 7 tending out with leftwing candidate A as opposed to group 1 tending out with rightwing candidate D. However, there are two apparently minor changes in the way the data were collected which could dramatically affect this analysis.

In the first instance much of the success of the analysis is due to the fact that the 7 groups represent a fairly complete spectrum of opinion. The situation changes quite dramatically if groups 1, 2 and 3 are omitted so that the remaining electors represent a predominantly "leftwing" preference. The correspondence analysis of the last four rows of Table 6.1(a) strings out these four groups across the display and group 4 is seen to be clearly on the "rightwing" side (Fig. 6.2). This illustrates how correspondence analysis displays the positions of the individual groups relative to the spectrum of groups included in the analysis: there is still a left-to-right dimension amongst these four groups and group 4 is more to the right than the others.

Secondly, the analysis is not invariant to the direction of the rating scale.
[TABLE 6.1. (a) Artificial data: average ratings of the four candidates A, B, C and D (columns) by 7 groups of electors (rows), on the 11-point scale from 0 (totally unfavourable) to 10 (totally favourable) described in the text. (b) The same ratings "reflected", i.e. subtracted from 10.]

FIG. 6.1. 2-dimensional correspondence analysis of Table 6.1(a).

FIG. 6.2. 2-dimensional correspondence analysis of the last four rows of Table 6.1(a).

If each of the ratings in Table 6.1(a) is "reflected" by subtracting it from 10 (Table 6.1(b)) so that the value 0 corresponds to totally favourable and 10 to totally unfavourable, then the correspondence analysis would change character completely: groups of electors would be associated with the candidates they do not like and would be displayed in opposite directions to the candidates they favour. This is counter-intuitive because we are used to interpreting a display in a positive sense, as we interpreted Fig. 6.1. Of course the reason is simply that Table 6.1(a) can be viewed as measuring positive association between the groups of electors and the candidates, whereas it is more difficult conceptually to fit the measures of negative association, in Table 6.1(b), into our idea of a correspondence matrix as a two-way distribution of mass. However, as we shall show in Section 6.2, we can think of the values in Table 6.1(b) as dissimilarities, or distances, in which case the technique of multidimensional unfolding can be used to display the data.

In order to take into account the absolute nature of the ratings and the fact that they are bipolar, correspondence analysis may be applied to the doubled data matrix comprising both the original and reflected forms of the data. In the above example, Tables 6.1(a) and (b) would together form the 7 x 8 doubled data matrix, and there would thus be two columns for each candidate: a "+" column indicating the measure of positive association between the electors and the candidate, and a "-" column indicating the complementary measure of negative association (e.g. A+ and A-). The correspondence analysis of the doubled matrix and of the last 4 rows of the doubled matrix are shown in Figs 6.3 and 6.4 respectively. The positions of the electors and the "+" candidate points in Fig. 6.3 are not much different to the points of Fig. 6.1, although there is a separation of the elector cloud and the "+" candidate cloud in the doubled analysis (cf. unfolding analysis of Fig. 6.8).

FIG. 6.3. 2-dimensional correspondence analysis of the doubled data of Table 6.1, that is of Tables 6.1(a) and 6.1(b) combined into a 7 x 8 matrix.

FIG. 6.4. 2-dimensional correspondence analysis of the last four rows of the doubled data matrix.

Figure 6.4 is quite a lot different to Fig. 6.2, though, and elector 4 is seen to be relatively unaligned while 5 tends out in the direction of "leftwing" (A+) and 6 and 7 tend out along with B-, C- and D-, that is "anti-rightwing".

Doubling establishes a symmetry between the two poles of each bipolar variable and the correspondence analysis is invariant with respect to the choice of scale direction. Each subject's "response" is treated as a positive mass divided between the two poles, analogous to a pair of probabilities assigned to each pole. Hence this is really a special case of a multivariate indicator matrix with J_q = 2, q = 1...Q, where all the observations are recorded using a type of fuzzy coding (Section 5.4). Similarly, a question which is answered on a 5-point scale, say, can be considered 5-polar with a response allocating positive mass to one of these poles. Whether we replace such a question by a set of 5 new variables or by a set of 2 depends on how much we assume and what sort of pattern we are interested in observing in the data. For example, the 5-polar approach would be useful if we wished to explore possible associations between "undecided"s (category 3) for a certain question and the other responses, which might not be clear in the bipolar approach.

Geometry of a doubled matrix

Let us consider a typical pair of doubled columns which we index by q+ and q- respectively, that is the values in these columns are y_iq and t_q - y_iq respectively, i = 1...I, where t_q is the "upper bound" of the qth bipolar variable, q = 1...Q. The row sums of this matrix are equal to a constant t. = Σ_q t_q, hence the row masses are all equal to 1/I (Fig. 6.5(a)). The column sums of the qth pair of columns are y._q and It_q - y._q respectively, where y._q is the sum of the ratings y_iq across the subjects i = 1...I, so that the column masses c_{q+} and c_{q-} are ȳ_q/t. and (t_q - ȳ_q)/t. respectively, where ȳ_q is the average y._q/I. The pair of column masses thus add up to t_q/t.. If the t_q are the same for all q, the pair of columns q+ and q- jointly receive equal mass: c_{q+} + c_{q-} = 1/Q, for all q = 1...Q.

Because the two columns of a doubled pair sum to a constant column vector it is clear that the centroids of each pair of column points q+ and q- are all at the origin of the correspondence display, which is the centroid r = [1/I ... 1/I]^T of all the columns. Thus with respect to any dimension of the space of columns the points q+ and q- are balanced at the origin with the lighter point lying proportionally further away from the origin (in mechanics this is known as the law of the lever).

FIG. 6.5. Basic geometry of correspondence analysis of a doubled matrix of ratings: (a) a typical doubled matrix, where each original variable is represented by a pair of columns, and the marginal totals; (b) the positions of a typical pair of column points q+ and q- in the full space, and their projections onto any axis.

Thus if g_{q+} and g_{q-} are the co-ordinates of these two points with respect to any axis of the column space (Fig. 6.5(b)), then:

c_{q+} g_{q+} + c_{q-} g_{q-} = 0   (6.1.1)

(Notice that most of these results are just special cases of the more general results obtained in Chapter 5 for multivariate indicator matrices.)

Furthermore it is easy to show that the respective distances d_{q+} and d_{q-} between the two points q+ and q- and the origin in the full space are equal to the coefficient of variation (standard deviation divided by mean) of the respective columns of the doubled matrix. For example, the squared distance between the q+ column and the centroid in the metric D_r^{-1} (= I times the identity matrix, since every row mass is 1/I) is:

I Σ_i (y_iq/y._q - 1/I)² = I Σ_i {(y_iq/y._q)² - 2y_iq/(Iy._q) + 1/I²}
                        = I{(Is_q² + Iȳ_q²)/(I²ȳ_q²)} - 2 + 1
                        = s_q²/ȳ_q²   (6.1.2)

where s_q² is the variance (Σ_i y_iq² - Iȳ_q²)/I of the ratings. Thus d_{q+} = s_q/ȳ_q and, similarly, d_{q-} = s_q/(t_q - ȳ_q). The sum of these distances is thus:

d_{q+} + d_{q-} = (s_q/t_q)/{π_q(1 - π_q)}   (6.1.3)

where π_q ≡ ȳ_q/t_q and 1 - π_q = (t_q - ȳ_q)/t_q, so that their product π_q(1 - π_q) is inversely related to what we call the polarization of the average of the qth question or variable, with low π_q(1 - π_q) indicating high polarization and high π_q(1 - π_q) (when π_q is near to 1/2) indicating low polarization. We formally define the polarization of the average (or polarity) as the quantity 1/{π_q(1 - π_q)}, which is thus greater than or equal to 4.

Let us suppose that we have an almost exact representation of the doubled data matrix in a 2-dimensional display, so that we can ignore for the moment the type of approximation implied when this display is less accurate, and let Fig. 6.6 depict a few typical pairs of doubled column points.

FIG. 6.6. Examples of pairs of column points: (a) different lengths between the points of each pair, but the origin cuts off sublengths in the same proportions; (b) same lengths between the points of each pair, but the origin cuts off sublengths in different proportions.

In Fig. 6.6(a) we see three pairs for which the origin marks off approximately the same ratio of longer to shorter distances max{d_{q+}, d_{q-}}/min{d_{q+}, d_{q-}}. This means that the quantities π_q(1 - π_q) are the same in each case so that the lengths between opposite poles are proportional to the standard deviations s_q/t_q of the "fractional" ratings, i.e. the ratings divided by their respective t_q. Thus the polarization is the same for each variable, but the variance of opinion is highest for question 2 and lowest for question 1.

If the lengths between opposite points are the same but the origin divides these lengths into different ratios then the relationship between standard deviation and polarization is such that the least polarized set of responses must have the highest standard deviation. Thus in Fig. 6.6(b) the variance of question 4 is higher than that of question 5. If both total lengths and ratios of the subdivisions are different, as is often the case, then there is an interaction between standard deviation and polarization in (6.1.3) to give the total length, which increases both by increasing standard deviation and by increasing polarization.

The variation of each pair of doubled columns is measured by the sum of their inertias:

in(q) = c_{q+} d_{q+}² + c_{q-} d_{q-}²
      = {ȳ_q(s_q/ȳ_q)² + (t_q - ȳ_q)(s_q/(t_q - ȳ_q))²}/t.
      = (t_q/t.) d_{q+} d_{q-}
      = (t_q/t.)(s_q/t_q)²/{π_q(1 - π_q)}   (6.1.4)

Thus this inertia depends multiplicatively on the mass attached to the variable as indicated by the relative length of the scale, (t_q/t.), the variance of the "fractional" ratings, (s_q/t_q)², and the polarization 1/{π_q(1 - π_q)}.

The angle between two lines joining opposite points indicates the correlation between the two respective sets of ratings. The cosine of the angle between point vectors q+ and q'+, for example, is their scalar product divided by the product of their lengths (remembering that the scalar product is in the metric D_r^{-1}):

cos θ = I Σ_i (y_iq/y._q - 1/I)(y_iq'/y._q' - 1/I)/(d_{q+} d_{q'+})
      = {(Σ_i y_iq y_iq')/(Iȳ_qȳ_q') - 1 - 1 + 1}/{(s_q s_q')/(ȳ_qȳ_q')}
      = (1/I){Σ_i y_iq y_iq' - Iȳ_qȳ_q'}/(s_q s_q')   (6.1.5)
      = correlation between columns q+ and q'+

Let us now turn our attention to the dual geometry of the row points. The row profiles of the data matrix of Fig. 6.5(a) define I points in 2Q-dimensional space. Because there are Q linear dependencies amongst the columns, the dimensionality of the rows is actually equal to Q, just one dimension more

than the dimensionality of the row profiles of the undoubled matrix. Equal masses of 1/I are assigned to each row profile and the chi-square metric between the row profiles implies that the squared distance between subjects i and i' is:

d²_{ii'} = (1/t.)Σ_q {(y_iq - y_i'q)²/ȳ_q + (t_q - y_iq - t_q + y_i'q)²/(t_q - ȳ_q)}
        = (1/t.)Σ_q (y_iq - y_i'q)² t_q/{ȳ_q(t_q - ȳ_q)}
        = Σ_q (t_q/t.)(π_iq - π_i'q)²/{π_q(1 - π_q)}   (6.1.6)

where π_iq is defined as the fractional rating y_iq/t_q. Here the fractional rating is analogous to a probability: subject i is at pole q+ with a "probability" of π_iq and at the opposite pole q- with a "probability" of 1 - π_iq. Thus the squared distance is a sum of Q terms where each term depends multiplicatively on the relative mass (t_q/t.) of the question, the squared difference in the fractional ratings and the polarization. If the ratings were all on a 2-point scale so that π_iq could only be either 0 or 1, then clearly the variance of the π_iq (i = 1...I) would be equal to π_q(1 - π_q), the familiar variance of the Bernoulli variate, so that the distance between rows would be equivalent to the ordinary Euclidean distance between rows of standardized data. When the rating scale is generalized to more than 2 points, then π_q(1 - π_q) overestimates the variance of the π_iq by an amount (1/I)Σ_i π_iq(1 - π_iq):

(1/I){Σ_i π_iq² - Iπ_q²} = π_q(1 - π_q) - (1/I)Σ_i π_iq(1 - π_iq)   (6.1.7)

In the case of a 2-point scale where either π_iq or 1 - π_iq is zero, each π_iq(1 - π_iq) is zero and hence their average is also zero. In general π_iq(1 - π_iq) is a maximum of 1/4 for an individual observation which is least polarized, i.e. π_iq = 1/2. Analogous to our definition of polarization of the average ratings we define the polarization of the individual ratings as the inverse of this average: 1/{(1/I)Σ_i π_iq(1 - π_iq)}, which is also greater than or equal to 4. We can then rewrite (6.1.7) for the qth bipolar variable as:

(s_q/t_q)² = (1/polarization of average rating) - (1/polarization of individuals' ratings)

The polarization of the individual observations, which must be greater than the polarization of their average, summarizes how near the poles the individual observations lie. This can be very high if all the responses are extreme ones (e.g. strong disagreement or agreement) or very low if all the responses are "intermediate" (e.g. undecided). From (6.1.4) the inertia of the qth variable is related to the ratio of the polarization of the average to that of the individuals (which we call the relative polarization) as follows:

in(q) = (t_q/t.)(1 - relative polarization)

where

relative polarization = polarization of average rating / polarization of individual ratings
                      = (1/I)Σ_i π_iq(1 - π_iq)/{π_q(1 - π_q)}

These results tie up with those of Example 5.5.4 (cf. (5.5.9) and comments 1 and 2). For a fixed polarization of individual ratings, the inertia of a variable actually decreases when the average is more polarized. For a fixed polarization of the average, the inertia increases when the individual ratings are more polarized, with a maximum inertia of t_q/t. (the mass of the variable) when there are individual ratings only at the poles. Many other results of Chapter 5 carry over to the present situation, for example the reweighting of the variables in Example 5.5.5.

6.2 COMPARISON WITH OTHER SCALING TECHNIQUES

Vector and unfolding models

In the literature there are two "classic" geometric frameworks for displaying matrices of ratings and preferences, called respectively the "vector model" (or "scalar product model") and the "unfolding model" (or "distance model"). A good introduction to these models in the context of preference data is given by Carroll (1972). (A revised version of this paper is given by Carroll (1980), while an excellent taxonomy and literature survey of multidimensional scaling is provided by Carroll and Arabie (1980).)

Both models aim to represent the rows and columns of the data matrix as points in a joint space of low dimensionality, but their interpretations of interpoint positions are different. For example, Fig. 6.7 depicts possible situations in 2-dimensional space where we have shown the positions of 5 variables (often called "stimuli" in this context) and 2 respondents (subjects). The vector model attempts to represent the stimuli as points and each subject by a vector through a fixed origin such that the perpendicular projections of the stimuli onto this vector produce a set of values which approximate the (centered) data vector of that subject (Fig. 6.7(a)). The unfolding model attempts to represent both stimuli and subjects as points so that the set of distances from the variables to a particular subject approximates the data vector of that subject. The set of distances from the 5 stimuli to the point 1 in Fig. 6.7(b) can be thought of as a folding of this fan of lines about the pivot point at 1 onto a common line, hence the term unfolding to describe the reverse process of taking the data vectors and opening out these fans to obtain the display. Thus for any line with origin at 1 the set of distances is

obtained by what might be described as "circular projection" of the stimulus points onto this line. In the vector model the set of distances is obtained by perpendicular projections, which can be thought of as circular projections if the subject points are taken out to infinity. Thus, as Carroll (1972, 1980) points out, the unfolding model is more flexible in the sense that it admits the vector model as a special case.

FIG. 6.7. The two "classic" geometric models for preference and ratings data: (a) the vector model, where the perpendicular projections of the so-called "stimulus" points (usually the columns of the data matrix) onto the subject vector approximate the subject's (centered) data; (b) the unfolding model, where the distances from the subject to the set of stimuli approximate the subject's data (the "folded" distances are indicated below).

It is interesting to reveal now that the artificial data of Table 6.1(b) are the actual distances between the two sets of points in Fig. 6.8. Thus the unfolding model would be the most suitable technique in the sense that it correctly assumes the data to be row-to-column distances. Figure 6.8 should be recovered exactly by a "metric" unfolding analysis of the data of Table 6.1(b) in 2-dimensional space.

FIG. 6.8. The "true" underlying positions of electors and candidates which generate the distances equal to the reflected ratings in Table 6.1(b).

In many applications, especially when the data involve rankings, there is a strong case for ignoring the actual values of the data and performing analyses which depend only on the ordering of the data. In the context of multidimensional scaling these are known as "non-metric" techniques. Hence a non-metric unfolding will try to represent the subjects and objects so that distances from the variables to a subject are ordered as similarly as possible to the subject's data vector. The non-metric approach is in many ways a more complicated context for analysing the data of interest here and we shall not pursue its discussion. For an introduction to non-metric scaling the reader is referred to Greenacre and Underhill (1982, Sections 4 and 5).

Although we do not consider multidimensional unfolding in any detail in this book, it serves as an instructive comparison with correspondence analysis. In correspondence analysis there is a "barycentric" relationship between the row and column points in the display, as defined by the relevant transition formulae. In unfolding there is a perhaps more direct interpretation of the display in terms of interpoint distances. It is easy to fall into the trap of interpreting a correspondence analysis as if it were an unfolding and this danger should be avoided. No distance between a row point and a column point is included or implied in correspondence analysis, whereas this is the basic concept in unfolding. If the asymmetric correspondence analysis display is chosen (Section 4.1.16), then the biplot interpretation applies, that is between-set (row-to-column) scalar products may be interpreted. If the subjects are represented in principal co-ordinates, say, and the (doubled) stimuli in standard co-ordinates, as in (4.6.1) and (4.6.2), then the reconstitution formula (4.6.1) can be written as:

(y_iq - ȳ_q)/ȳ_q = Σ_k f_ik γ_qk   (6.2.1)

where q refers to the original (q+) variable of the doubled pair. Hence, if the subjects are displayed as vectors, in the style of the vector model (cf. Fig. 6.7(a)), then the projections of the stimulus points onto a subject vector are proportional to the quantities (y_iq - ȳ_q)/ȳ_q, the usual deviation of a rating relative to the mean rating. Notice that the means here are across subjects (i.e. stimulus means, in this case, column means), whereas the vector model often displays the deviations with respect to the means across stimuli (i.e. subject means, in this case, row means).

Relationship to principal components analysis

From (6.1.6) it can be seen that the geometry of the rows in the correspondence analysis of a doubled data matrix is the same as the row geometry of a principal components analysis of the original (undoubled) matrix with a particular rescaling of the columns (see Example 6.3.1). In principal components analysis it is common to rescale the columns to have unit variance. In our present notation and in terms of the fractional ratings π_iq, with variances (s_q/t_q)², this would lead to squared distances between rows of the form:

d'²_{ii'} = Σ_q (π_iq - π_i'q)²/(s_q/t_q)²   (6.2.2)

In (6.1.6) the rescaling involves the quantity π_q(1 - π_q) in the denominator which from (6.1.7) exceeds the variance (s_q/t_q)² by the average quantity (1/I)Σ_i π_iq(1 - π_iq), the inverse of what we have called the polarization of the individual ratings. Ignoring for the moment the additional rescaling by the mass (t_q/t.), we see that correspondence analysis puts less emphasis on variables which have low polarization across the ratings, compared to the standardization (6.2.2), since the quantity π_q(1 - π_q) is then in large excess of the variance. Putting this another way, the chi-square metric defining the positions of the row profiles accentuates the questions where polarization of individual ratings is the greatest, over and above the accentuation already induced by the standardization of the variables.

This is an important concept in the analysis of bipolar data and illustrates that the standardization of variables to have equal variances is not fully justifiable in this context where the data are definitely not of an "interval" nature. In correspondence analysis the more the group is polarized on a certain question, both individually and on average, the higher the importance given to that variable in the calculation of the between-subject distances. Polarization seems to be a more useful concept here than variance and these coincide only when ratings are made exclusively at the poles of the scale.

Since we can think of rescaling in the dual sense of assigning masses to the variables there is no change of position of the points (in the full space) representing the variables (stimuli) and the correlation coefficients are still the angle cosines (cf. (6.1.5)), as in the covariance biplot (Appendix A: Table A.1(4)). Reweighting does affect the principal axes of these points, so that their approximately displayed positions in lower dimensional spaces will be different.

6.3 EXAMPLES

6.3.1 Equivalence of correspondence analysis of a doubled matrix to a principal components analysis

Show that the correspondence analysis of the matrix of bipolar data y_iq (i = 1...I, q = 1...Q), where the qth column is doubled with respect to t_q (q = 1...Q), is equivalent to the principal components analysis of the matrix Y*:

y*_iq = (κ_q)^{1/2} y_iq

where

κ_q = (t_q/t.)/{ȳ_q(t_q - ȳ_q)}   (6.3.1)

Solution
The principal components analysis of Y* (cf. Appendix A, Table A.1(1)) situates the rows of Y* in an ordinary Euclidean space with masses 1/I and squared interpoint distances between rows i and i' of Σ_q (y*_iq - y*_i'q)². These are exactly the same masses and relative positions as the rows of the doubled matrix (cf. (6.1.6)).

6.3.2 Correspondence analysis of undoubled and doubled preferences

Suppose that σ_i ≡ [σ_i1 σ_i2 ... σ_iQ]^T is a vector of preferences by subject i on Q stimuli, i = 1...I, where σ_iq indicates the ranking of the qth stimulus, with a ranking of Q being the most preferred. For example, if Q = 4 and subject i ranks stimulus 3 as the most preferred, followed by stimuli 2, 4 and 1, then σ_i = [1 3 4 2]^T. Derive the centered positions of the ith subject in (ordinary) Q-dimensional Euclidean space in:
(a) the correspondence analysis of the matrix of preferences σ_iq, i = 1...I, q = 1...Q;
(b) the correspondence analysis of the doubled matrix of preferences σ_iq+ = σ_iq, σ_iq- = (Q+1) - σ_iq.
When are the two analyses equivalent in their displays of the subjects?

Solution
(a) The sum Σ_q σ_iq = (1/2)Q(Q+1) for all i, so the ith row profile has elements r_iq = σ_iq/{(1/2)Q(Q+1)}. If we denote the mean ranking (1/I)Σ_i σ_iq by σ̄_q, the mass of the qth column is c_q = Iσ̄_q/{(1/2)IQ(Q+1)} = σ̄_q/{(1/2)Q(Q+1)}. The qth co-ordinate of subject i's profile in ordinary Euclidean space is (r_iq - c_q)/(c_q)^{1/2}, where the division by (c_q)^{1/2} takes care of the weighting of the dimensions implied by the chi-square metric:

(r_iq - c_q)/(c_q)^{1/2} = (σ_iq - σ̄_q)/{(1/2)Q(Q+1)σ̄_q}^{1/2}   (6.3.2)

(b) From (6.3.1) we know that the row profiles of the doubled matrix can be equivalently situated in Euclidean space by the vectors with elements:

σ*_iq = (κ_q)^{1/2} σ_iq

where

κ_q = (1/Q)/{σ̄_q(Q + 1 - σ̄_q)}

The mean of the σ*_iq is (κ_q)^{1/2}σ̄_q, so that the centered position of the ith subject is just the deviation:

(κ_q)^{1/2}(σ_iq - σ̄_q) = (σ_iq - σ̄_q)/{Qσ̄_q(Q + 1 - σ̄_q)}^{1/2}   (6.3.3)

(6.3.2) and (6.3.3) are equal if and only if σ̄_q = (1/2)(Q+1) for all q, that is all stimuli receive the same average rankings, in which case the qth co-ordinate of subject i is {σ_iq - (1/2)(Q+1)}/{(1/2)(Q+1)Q^{1/2}}.

7
Use of Correspondence Analysis in Discriminant Analysis, Classification, Regression and Cluster Analysis

Most applications of multivariate analysis involve a matrix of I rows (individuals, cases, ...) and Q columns (variables, questions, ...) and are concerned with one or more of the following types of analysis:
(1) Discriminant analysis: There is a given partition of the I subjects into
H groups and it is investigated which variables and what patterns of
observations characterize the groups and separate them.
(2) Classification: Again the subjects are grouped but we also have an
additional set of observations on ungrouped subjects that we wish to
classify (this is often the ultimate goal of a discriminant analysis).
(3) Regression: One of the variables, usually a quantitative variable, is re­
garded as "dependent" on the other "independent" variables and the
nature of this dependency is investigated. If the dependent variable is
qualitative then regression is essentially a discriminant analysis. If new
sets of observations on the independent variables become available, the
value of the dependent variable may be predicted, or forecasted, using the
established regression relationship. If the dependent variable is qualita­
tive, then this is essentially a classification problem. Hence all the above
techniques can be broadly considered under the title of regression

analysis, including analysis of variance and covariance, although it is usually preferable to treat them separately because of their individual peculiarities.

(4) Cluster analysis (often called classification automatique in the French literature): Similarities of observations between subjects are studied with the aim of forming groups of similar subjects, that is creating a partition, or sequence of partitions, of the subjects.

Notice the difference between discriminant analysis, where there is a particular partition of interest known in advance, and cluster analysis, where partitions are generated by the analysis. Just as discrimination can be considered a discrete form of regression, so cluster analysis can be considered a discrete form of multidimensional scaling, in the following sense: scaling techniques (like correspondence analysis) generate sets of scale values from the data matrix in the form of points in the continuum of multidimensional space, whereas clustering techniques generate sets of discrete values which allocate the subjects to groups. In Fig. 7.1 we have attempted a schematic view of all the above mentioned multivariate methods.

FIG. 7.1. Schematic view of four areas of multivariate analysis. X is an I x J observation matrix of I subjects on J variables (quantitative and/or qualitative). Vectors y and z represent a set of quantitative and qualitative values respectively for the subjects. Thus: (a) Regression establishes a relationship between X and y (both of which are given) and prediction (forecasting) uses this relationship to derive y* from an additional X*. (b) Discriminant analysis/classification are similar to regression/prediction, but the dependent variable is qualitative. (c) Scaling (e.g. correspondence analysis) derives sets of quantitative values (e.g. principal co-ordinates) for the subjects (and/or the variables). (d) Clustering derives sets of qualitative values in the form of partitions of the subjects (and/or the variables).

In the present chapter we shall demonstrate how a scaling technique like correspondence analysis can be used to solve problems in these various contexts. Most of the material presented here is not peculiar to correspondence analysis but refers to a wider class of scaling techniques, although correspondence analysis lends itself with particular ease to the treatment of a wide variety of problems.

7.1 DISCRIMINANT ANALYSIS

Up to now we have considered correspondence analysis as a technique for displaying the profiles of the I rows and J columns of a suitable data matrix with respect to optimal subspaces. We now turn our attention to a partition of the rows (and/or a partition of the columns), that is their grouping into an exhaustive and disjoint set of classes.

For example, if the rows i = 1...I refer to the districts in a country and the columns j = 1...J to different types of jobs (e.g. primary school teacher, shop assistant, dentist, etc.), then a partition of the rows would be the grouping of the districts into regions while the jobs could be grouped into broader classes of employment (e.g. public sector, small business, health care, etc.). Let us suppose that the datum n_ij is the number of people in district i with job j, so that the rows of the matrix N corresponding to districts of the same region are simply added together to give the frequencies n'_hj of people in region h with job j. Algebraically, this is handled more simply by defining an I x H logical indicator matrix Z_0 whose ith row is a vector of zeros apart from a 1 in the appropriate column to indicate the region to which district i belongs. The H x J matrix N' of regional frequencies is then simply N' = Z_0^T N. In a similar fashion the columns of N can be condensed to give the frequencies n''_il of people in district i with a job of class l by defining an L x J logical indicator matrix Z̃_0 whose columns classify the activities, so that the I x L matrix N'' = N Z̃_0^T. Finally, we can obtain the H x L matrix of frequencies n'''_hl of people in region h having class l of employment by condensing row- and columnwise: N''' = Z_0^T N Z̃_0^T (i.e. Z_0^T N'' or N' Z̃_0^T). All of these matrices are depicted schematically in Fig. 7.2 and a number of different correspondence analyses are possible, depending on the objective of the study.

It can be easily shown that each row profile of N' is the centroid of the constituent group of row profiles of N (Example 7.5.1). In addition, because the column sums of N' are the same as those of N the metric in each space of

row profiles is the same, namely D_c^{-1}. Similarly, the column profiles of N'' are centroids of the constituent groups of column profiles of N and the column metrics in the correspondence analyses of N and N'' are the same, namely D_r^{-1}. Hence there is just one metric framework for the data and possible partitions thereof, namely that of the original matrix N.

FIG. 7.2. The basic data matrix N and various condensations of its rows and columns. The row and column groupings are given by the logical indicator matrices Z_0 and Z̃_0 respectively.

At the most detailed level N itself is analysed and principal axes of its profiles computed. The rows of N' and the columns of N'' may be displayed as supplementary points with respect to any subspace either by computing the appropriate centroids or by using the relevant transition formulae (cf. Example 7.5.1).

To place more emphasis on differences between groups of rows (regions) N' may be analysed. As far as the row profiles are concerned this is an analysis of the row group centroids in the same space, as if the masses of the cloud of I points are removed and concentrated in the respective centroids. The computation of principal axes of the centroids is analogous to canonical variate analysis, which can be thought of as a weighted principal components analysis of group centroids in Mahalanobis space (cf. Appendix A). In the analysis of N' the rows of N and the columns of N''' can be displayed as supplementary points using the relevant transition formulae (cf. Example 7.5.1).

Similar descriptions hold for the separate analyses of N'' and of N''', for example in the analysis of N''' we are effectively investigating the dual principal axes of two sets of centroids and the rows of N'' and the columns of N' can be displayed a posteriori as supplementary points.

The correspondence analysis of N', N'' or N''' is effectively a discriminant analysis between the groups defining the rows and/or columns. We shall take the analysis of N' as an illustration and suppose that H ≤ J ≤ I. The (centered) row profiles of N lie in a (J-1)-dimensional space, while the set of centroids (row profiles of N') lie in an (H-1)-dimensional subspace. Thus (J-H) dimensions of the original row space are eliminated by focusing the study on the group centroids. A particular group of rows of N (group of districts in our example above) reflects the variability within the corresponding row of N' (the region). This variability is not necessarily of a probabilistic nature, as exemplified by our example in which the regions, districts and data are assumed complete and exhaustive. Notice that the analysis of N, too, may be considered a discriminant analysis where each group is a set of one row alone with no variability within the group (apart from possible measurement error in collecting the data).

It is up to the investigator to decide whether he is interested in the analysis of N (the inter-district analysis) or the analysis of N' (the inter-regional analysis). Nevertheless it is often interesting to compare the principal axes emanating from these two analyses, especially when the principal axes of N' are recovered amongst those of N. Clearly there is more inertia in the analysis of N not only in the full space but also in any subspace. In other words, the process of condensing the points into groups reduces the moments of inertia of the cloud in all directions. In fact, with respect to any subspace we have the classic result that the sum of the inertias of the individual profile points is equal to the sum of the inertias of the group centroids ("between-group" or "interclass" inertia) plus the sum of the individual inertias within each group ("within-group" or "intraclass" inertia). This result is stated more formally and proved in Example 7.5.3. Thus the sum of the inertias of the centroids is always less than the sum of the individual inertias by an amount equal to the within-group inertia. Because the correspondence analysis of N' identifies the principal axes which reflect maximum inertia of the centroids, the objective of minimizing within-group inertia in the subspace of these axes is equivalently satisfied. It is this property which characterizes the discrimination: loosely speaking, the group centroids are pushed apart while, simultaneously, the within-group variability is tightened.

In terms of the principal inertias λ_k of N and λ'_k of N' we have the further result that λ_1 ≥ λ'_1, λ_2 ≥ λ'_2, ... (Example 7.5.4). The process of condensing the data, that is of amalgamating groups of points, necessarily leads to smaller principal inertias.

The application in Section 9.8 treats a matrix of frequencies and a particular condensation of the rows of this matrix.

7.2 CLASSIFICATION

Classification is an extremely broad subject and it has an abundant literature, particularly in the biological sciences. As mentioned in this chapter's introduction, classification is often the justification for performing a discriminant analysis: we first want to know in which respects the given groups of individuals are different, after which we apply this knowledge to classifying new individuals into the groups.

This is the typical scenario of pattern recognition, which is a more general term for the problem of discrimination/classification, often used in the computer science literature. In pattern recognition there is a "training set" (or "design set") of data on which an algorithm can be developed to recognize patterns, in this case to distinguish between the groups. Unclassified data, the "test set", subsequently become available and the algorithm is then executed to perform the classification.

FIG. 7.3. Neighbourhood of an unclassified vector of observations (denoted by "?"), with radius r. Since there is a majority of vectors known to be from group "2" in this neighbourhood, the predicted classification of "?" is "2".

From this description it sounds as if the classifying algorithm is fairly well defined at the end of the design, or discrimination, phase. In most traditional statistical approaches to the problem, for example linear discriminant analysis, the classification is performed by evaluating and comparing several functions of the new data. Here the design set has been used to compute the parameters of these functions, after which it is effectively discarded. Such methods have optimal properties in discriminating amongst the individuals of the design set alone, and only when certain distributional assumptions on the data are satisfied. The typical assumption is that the data vectors of individuals composing each group follow multinormal distributions with the same covariance matrix in each group. However, in many situations the individuals of interest may not even be obtained by any sampling process at all or are in fact regarded as a complete population; and even if they were sampled, it would be a rare miracle indeed if these assumptions were satisfied!

Classification using neighbourhoods

By contrast, we prefer a less theoretical and more pragmatic approach to the problem, based on the concept of a neighbourhood of a point. Our approach in this book has been to stress the geometry of a set of points in multidimensional space, each point being characterized by its observed vector of data. Our prime example has been the geometric framework of correspondence analysis, which we have shown to be particularly versatile. If we imagine each point in the design set being labelled by its group affiliation then the most obvious way of classifying an unlabelled point is in terms of its set of nearest neighbours, the points which are most similar to it (Fig. 7.3). This is more easily said than done, however, and a number of decisions need to be made before arriving at a classification algorithm: what is the most suitable measure of distance between individuals in the classification space, what is the size and shape of the neighbourhood of a point, and how is the actual classification decision to be performed?

As far as choice of distance is concerned, we already assume that the metric in the full space of the points has been chosen by the analyst. For example, if the data are being analysed by correspondence analysis then the chi-square metric is assumed. Were it not for our interest in groups of points, the distance (and thus the similarity) between two points would simply be the distance calculated in the full space. "Close" individuals would thus have similar observations across the complete set of variables. Suppose now that there exists a variable which is totally unrelated to the discrimination of the groups, in other words two individuals of the same group can have quite different observations on this variable. It is clearly superfluous to take this variable into account when trying to assess the closeness of individuals with the ultimate objective of classifying one of them. It seems that for classification purposes the distance should be computed in a subspace which exhibits the between-group differences, rather than the between-point differences. Analogous to the canonical space of classical discriminant analysis, we shall thus compute distances in the subspace of the group centroids (or, possibly, a principal subspace thereof). Here all variation of the individual points
orthogonal to the variation of the centroids has been eliminated as "unimportant" for classification. In doing this we do make the assumption that each group of points occupies an approximately convex region of the full space. If one of the groups were highly non-convex then distances computed in the centroid subspace could be highly misleading for classification (Fig. 7.4). Notice that the usual multinormal assumptions are equivalent to assuming ellipsoidal clouds of points, which are convex.

FIG. 7.4. Example of classification subspace (dashed line) of the centroids (open triangle and circle) of two groups of points (solid triangles and circles) when one group is non-convex (the circles roughly define a banana-shaped cloud of points). A poor classification of the points will result in this subspace because the two groups overlap considerably when projected onto the subspace, even though they are quite separate in the full space.

It might prove worthwhile to perform separate analyses on each group of points to increase understanding of their spatial distributions. For example, if a group of points is found to occupy two separate regions then it would be advisable to enlarge the classification space by adding the dimension which coincides with this division. This may be easily achieved by replacing the original centroid of the group by the two centroids of its subgroups prior to determining the classification space.

Conventionally the neighbourhood of a point is a multidimensional sphere, or spheroid, and its radius is to be specified by the analyst. Empirical guidelines for this choice are that the sphere should not be so small that it includes very few neighbours, and not so large that it encompasses points which the analyst considers too dissimilar. Between these two extremes there is a value (or a range of values) that can be used. A choice with more optimal properties can probably be made by employing some type of cross-validatory scheme, which we shall discuss below.

The actual mechanics of the classification decision are usually quite simple once the neighbourhood of an unclassified point is identified: the point is classified into the group which has highest mass in this neighbourhood. If individual points have the same mass then this is equivalent to the highest frequency in the neighbourhood. This strategy takes into account the relative proportions of individuals in each group in a natural way and there is no need to adjust for prior probabilities of the groups, unless these are quite different from the proportions in the design set. For example, if a particular group is doubly over-represented in the design set then the masses of all its members can be halved initially so that each member eventually counts as "half" a member for purposes of classification.

Cross-validation

The classification procedure which we have outlined above is not optimal in any theoretical sense because we are deliberately avoiding the mathematical assumptions by which a measure of its performance can be judged. It would make more sense here to judge the procedure in a cross-validatory fashion by dividing the design set randomly into a design subset and test subset. Classification of the test subset can be performed and the results validated against the actual classification which is known. If this is repeated a number of times the range of performance of a particular procedure can be obtained. This whole process can be repeated on a variant of the procedure, for example where extra dimensions are added to or subtracted from the classification space, or where the neighbourhoods are made wider or narrower. This can direct the analyst towards an improved strategy, but it is a great deal of extra effort.

The above procedure is presently being used for forecasting the weather in a meteorological experiment and the study is outlined in Section 9.11.
7.3 REGRESSION

A typical regression analysis attempts to "explain" the values of a quantitative variable y (which we call the predictand) in terms of a number of predictor variables $x_1 \ldots x_J$, possibly of different types. The classical approach is to set up a regression model for the I sets of observations: $y_i = f(x_{i1} \ldots x_{iJ}; \beta) + e_i$, $i = 1 \ldots I$, where $\beta$ is a vector of parameters and the $e_i$s are residuals, or
deviations, from the model. The analysis does not determine the form of the model but estimates the parameters $\beta$ for the prescribed model with a view to minimizing the residuals in some global sense. Most commonly the prescribed model is a linear one: $\beta_0 + \beta_1 x_{i1} + \cdots + \beta_J x_{iJ}$, and the parameters are estimated by least-squares, that is by minimizing $\sum_i e_i^2 = \sum_i (y_i - \beta_0 - \sum_j \beta_j x_{ij})^2$. The situation can be described geometrically as the fitting of a hyperplanar response surface and is depicted in Fig. 7.5. It seems highly unlikely that the values $y_i$ should all lie near such a hyperplane, as implied by the linear model. In the case of a predictand like rainfall, for example, there might be many zero values, in which case the linear model is unrealistic. The careful data modeller would investigate the data more closely and introduce relevant functions of the predictors into the regression model so that the response surface follows the values $y_i$ more closely.

FIG. 7.5. Illustration of the linear regression model (a plane in this case) where the number of predictor variables is J = 2.

If a model can be established which fits reasonably, the data is effectively discarded and the estimated model is used both as a description of the relationship and to predict new values $y^*$ from new vectors of observations $x^*$ on the predictors. Confidence intervals on these predicted $y^*$ are part of such an analysis and are usually the same width irrespective of the values of $x^*$, as if there were a confidence band on either side of and parallel to the response surface.

We have mixed feelings about the use of such models, especially in the analysis of large data sets. On the one hand, when the data analyst has some prior justifications for describing his observations in terms of a model and has fairly firm ideas about the form the model should take, then modelling seems perfectly suitable and justifiable. On the other hand, it may be that the analyst has no fixed ideas about the relationship and resorts to a multiple regression computer program to take the burden off his shoulders. This would be fine if the analyses were viewed as exploratory, but more often than not the resultant regression model is adopted and then regarded as the "correct" model. This attitude is more prevalent in large studies where it is expensive to perform further analyses to cross-validate the data and diagnose the regression relationship more accurately (if it indeed exists!).

As an alternative we again try a less formal approach by using neighbourhoods of points, the only problem being to decide in which space to calculate the neighbourhoods. As in discrimination/classification we would want a space which somehow shows up the relationship between the predictand and the predictors, so that a new vector $x^*$ is not matched to vectors $x$ on features that are unassociated with the variation in y. Correspondence analysis provides a framework for investigating this association and then deriving such a regression space. First, the range of y is segmented into a number of classes, the analyst being guided by the histogram of the variable and his experience of its values. Using the example of daily rainfall again, one class of rainfall might be zero rainfall, the next 0–½ mm, ½–1 mm, and so on, where the intervals are meteorologically relevant. The more observations there are in the training set of data, the more subdivisions can be imposed to make subsequent analysis more sensitive to finer variations in the predictand. As far as the predictors are concerned, they too need to be recoded into categories or pairs of doubled variables, as described in Section 5.4, more especially if the variables are of diverse types. This whole recoding process aims at producing a matrix which summarizes the association between y and x by crossing the categories of the predictand with those of the predictors (Fig. 7.6).

FIG. 7.6. Data matrix for setting up a regression space by correspondence analysis. The (i,j)th cell of this matrix contains the frequency of association (or other recoded measure of association) of category i of the predictand with category j of the predictors.

The correspondence analysis of this matrix will provide a space which
discriminates between the classes of the predictand. The individual (recoded) data vectors of predictors can be projected onto this regression space, as described in Section 7.1, as well as new vectors of predictors, whose neighbourhoods can be determined. The actual prediction need not be a classification into a predictand group, but a summary statistic evaluated on all the $y_i$s whose predictors fall in the neighbourhood, for example the mean of the $y_i$s, or their median, or some more sophisticated statistic taking into account the fact that y is a regionalized variable (in the regression space). Finally, the whole procedure, with its host of ad hoc choices, can be fine-tuned using some cross-validation scheme.

The efficacy of such a strategy for performing regression has not yet been fully investigated, although the method has a lot of appealing properties. The main problem with researching the method is the development of flexible computer programs to perform the large numbers of calculations involved, especially in cross-validation studies. One program, at least, is described by Lebeaux (1974, 1977), who first worked in this area in collaboration with Benzécri. Cazes (1978), in the third of a series of articles on regression methods, also discusses this strategy, which the French call régression par boule (bubble regression).
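The prediction step of this strategy is equally simple to express in code. The fragment below is a hypothetical Python sketch (the names and data are our own; numpy is assumed, and the design co-ordinates are taken to lie in the regression space derived from the correspondence analysis):

import numpy as np

def bubble_regression_predict(x_new, design_coords, y_values, radius):
    """Predict the response at x_new as the mean of the y-values of the design
    points whose co-ordinates lie within the given radius of x_new."""
    dist = np.linalg.norm(design_coords - x_new, axis=1)
    inside = dist <= radius
    if not inside.any():
        return np.nan                  # empty bubble: widen the radius
    return y_values[inside].mean()     # the median or another summary could be used

# Hypothetical training set projected onto a 2-D regression space
coords = np.array([[0.0, 0.1], [0.2, 0.0], [0.9, 1.1], [1.1, 0.8]])
rain = np.array([0.0, 0.2, 4.5, 5.1])  # daily rainfall (mm)
print(bubble_regression_predict(np.array([1.0, 1.0]), coords, rain, radius=0.4))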
7.4 CLUSTER ANALYSIS

Cluster analysis has a vast theoretical and applied literature and it would be beyond the scope of this book to enter into a comprehensive review of the topic. Instead, we shall briefly describe the differences between hierarchical and non-hierarchical clustering and then demonstrate how a particular form of hierarchical clustering is related geometrically to correspondence analysis.

The aim of a cluster analysis is to derive a partition, or a sequence of partitions, of a set of objects based on their similarities (equivalently, their distances) to one another, so that objects clustered into the same group (or class) are similar, or close, to one another, while those of different groups are dissimilar, or far apart. Before clustering even begins, several decisions need to be made by the data analyst, the most crucial being how the inter-object similarity or distance is to be measured. If there are I objects, an I × I symmetric matrix of similarities, or distances, is computed. To avoid repetition we shall describe clustering in terms of inter-object distances, which are monotonically inversely related to similarities (i.e. the less similar two objects are, the further they are apart).

Hierarchical clustering

In hierarchical clustering the I objects are regarded initially as I clusters of one object each and the analysis proceeds sequentially to agglomerate clusters into larger clusters until all the objects form a single cluster. The method is attractive because it is non-iterative and can be represented graphically in the form of a binary tree (Fig. 7.7). The only other decision needed in order to carry out this type of clustering is how distances between clusters of more than one object are to be measured. The three most common choices for defining inter-cluster distance are the minimum of all inter-object distances between the two clusters ("single linkage" clustering), the maximum inter-object distance ("diameter" clustering) and an average inter-object distance ("average linkage" clustering). At each step of the clustering the two closest clusters are agglomerated, corresponding to one of the branches, or nodes, of the tree. Hence there are I - 1 steps of the analysis, that is I - 1 nodes of the tree, to complete the clustering of all the objects, and each node may be indexed by the distance between the two clusters which the node brings together. The nodes of the tree are displayed on a vertical scale according to these distances (Fig. 7.7). Any horizontal cross-section of the tree reveals a partition of the objects, characterized by the distance d of the "slice". For example, if the "diameter" method is used then the clusters are such that all inter-object distances within each cluster are less than d, whereas in "single linkage" clustering all the clusters are separated from each other by distances greater than d. An "average linkage" clustering compromises between the former criterion of within-cluster similarity and the latter criterion of between-cluster separability.

FIG. 7.7. Binary tree which summarizes a hierarchical clustering. Since there are 9 objects, there are 8 nodes of the tree, which are formed in the order indicated (i.e. in order of increasing inter-cluster distance). The horizontal "slice" at distance d partitions the objects into 3 clusters: {c, a, e}, {h, b} and {i, f, g, d}.

Hierarchical clustering is useful if the analyst has no prior ideas about how
many clusters he expects or might like to have. Having obtained the results in the form of a binary tree, some obvious choice of where to make the cross-section might become apparent. For example, in Fig. 7.7, the large jump between the distance values of nodes 6 and 7 suggests that the cross-section can be satisfactorily made between these nodes, so that 3 clusters are obtained.

Non-hierarchical clustering

In non-hierarchical clustering attention is directed at some prespecified number of clusters, to be obtained by attempting to optimize a criterion of "clusteredness", that is an overall measure of within-cluster compactness and between-cluster separation. The clustering proceeds from a reasonable initial partition of the objects in an iterative fashion towards partitions which are more and more "clustered". One such algorithm transfers one object between any two clusters at each iteration so as to produce the maximum increase in the chosen measure of clusteredness. This can involve a large amount of computation at each iteration, but there are simpler special cases, for example the present situation of I objects, with masses, in a weighted Euclidean space where the clusteredness is defined as between-cluster inertia. Maximizing the between-cluster inertia is equivalent to minimizing the within-cluster inertia, as shown in Example 7.5.3. In this case the gain in between-cluster inertia resulting from the transfer of one object can be evaluated quite simply from the distances of the object to the centroids, with an adjustment which depends on the masses of the clusters and of the object (see Example 7.5.5 and the sketch below). This is a variant of the non-hierarchical clustering algorithm called "k-means clustering" (MacQueen, 1967) where objects are assigned to the clusters with the closest centroids. The centroids of the new clusters are recomputed after each assignment and the process continues until no more objects can change clusters. To initiate this algorithm a set of random points are often computed as seeds for the first clustering of the objects.

The results of such a clustering are not unique and depend on the initial clusters as well as the order of the objects in the data file. To eliminate the latter dependency the algorithm can be modified to allow all the objects to be assigned before recomputation of the new cluster centroids (Forgy, 1965). The former dependency can be investigated by repeating the clustering algorithm with different choices of initial clusters. The final solution which provides the optimal clusteredness can then be retained.
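The transfer step referred to above can be sketched as follows (a hypothetical Python fragment, assuming numpy and, for simplicity, an identity metric): it evaluates the gain in between-cluster inertia obtained by moving a single point from one cluster to another, using the mass-adjusted squared distances of equation (7.5.3) in Example 7.5.5.

import numpy as np

def transfer_gain(y0, w0, centroid_from, mass_from, centroid_to, mass_to):
    """Increase in between-cluster inertia when point y0 of mass w0 is moved
    from the first cluster to the second (cf. (7.5.3) in Example 7.5.5)."""
    d_from = np.sum((y0 - centroid_from) ** 2)   # squared distance to old centroid
    d_to = np.sum((y0 - centroid_to) ** 2)       # squared distance to new centroid
    return (w0 * mass_from / (mass_from - w0)) * d_from \
         - (w0 * mass_to / (mass_to + w0)) * d_to

# A transfer is worthwhile (clusteredness increases) when the gain is positive:
y0 = np.array([1.0, 0.9])
print(transfer_gain(y0, 0.1, np.array([0.0, 0.0]), 0.5, np.array([1.0, 1.0]), 0.4))

An iteration of the transfer algorithm would evaluate this gain for every point and every candidate destination cluster and perform the transfer yielding the largest positive gain.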
cancel out, by definition (7.4.1), except the inertia ofthe final cluster of al1 the
Hierarchical c/ustering and correspondence analysis objects in(l). Hence:
For the remainder of this section we shal1 discuss a particular form of in(l) = ~(Vl (7.4.3 )

There is an interesting analogy between clustering the profiles in this way and displaying them multidimensionally. In the latter situation we choose a set of orthogonal axes to represent the profiles and the total inertia is decomposed along these axes. If the axes are principal axes then the decomposition is optimal in the usual sense that the axes reflect maximum inertia in an ordered fashion: $\lambda_1 \ge \lambda_2 \ge \cdots$. On the other hand, when a hierarchical clustering is performed on the profiles, the sequence of nodes defines a sequence of partitions of the objects and the total inertia can be decomposed amongst the nodes according to the quantities $v_l$. When these quantities are minimized at each step then the decomposition is again optimal, in the sense that the between-cluster inertia decreases the least at each step. In this case there is also an ordering within the decomposition: $v_1 \le v_2 \le \cdots \le v_L$, that is in terms of the values of the nodes on the vertical scale of the binary tree. The analogy between axes and nodes is not complete, however, since in no sense are the nodes "orthogonal" to one another. In fact, the various partitions of the objects, derived from horizontal cross-sections of the binary tree, are highly dependent on one another because of the hierarchical style of clustering.

By analogy with the principal inertias we shall call the quantities $v_1, v_2, \ldots$ nodal inertias, or the contributions of the nodes to the total inertia. Percentages of inertia can be computed as before: $v_l/in(I)$, and a particular partition is suggested where there is a large jump in these percentages, just as a particular subspace is suggested when there is a large drop in the principal inertias.

There are some interesting inequality relationships between the principal inertias and the nodal inertias. The highest principal inertia $\lambda_1$, for example, always exceeds the highest nodal inertia $v_L$, the inertia of the node that forms the cluster of all the objects. If this were not so, it is easily shown that the axis which joins the centroids of the last two clusters would reflect a higher moment of inertia of the cloud of objects than $\lambda_1$, which is impossible since $\lambda_1$ is the highest. More generally, the sum of the first $K^*$ principal inertias is higher than the sum of the $K^*$ largest nodal inertias. The superiority of the principal inertias over the nodal inertias is most dramatic amongst the higher inertias. In fact, as shown by Benzécri and Cazes (1978), the largest nodal inertia can be extremely small compared to $\lambda_1$, in which case the cluster analysis is much less effective in analysing the data than correspondence analysis. A typical example of this would be when the profiles occupy a "continuous" region of multidimensional space and hardly cluster at all.

Mutual contributions of nodes and axes

The most interesting outcome of this nodal decomposition of inertia is that in correspondence analysis, for example, the nodes may also be displayed as points and their contributions to the principal axes as well as the axes' contributions to the nodes may be computed. The nodal inertia $v_l$ is a weighted squared distance between the centroids of the pair of clusters l' and l'' agglomerated at the node (cf. (7.4.2)). This squared distance may be expressed as the sum of squared differences in principal co-ordinates of the centroid profiles:

$$\|\bar{y}^{(l')} - \bar{y}^{(l'')}\|^2_{D_c^{-1}} = \sum_k (f_k^{(l')} - f_k^{(l'')})^2$$

(where $f_k^{(l')}$, for example, is the principal co-ordinate of the l'th centroid on the kth principal axis). Consequently, the total inertia can be decomposed by nodes and by axes in a two-way table of quantities:

$$v_{lk} = \frac{r^{(l')} r^{(l'')}}{r^{(l)}} (f_k^{(l')} - f_k^{(l'')})^2$$   (7.4.4)

where $v_l = \sum_k v_{lk}$, by definition, and $\lambda_k = \sum_l v_{lk}$, since (7.4.1) applies in a similar fashion to the profiles projected onto the kth principal axis.

As in the usual decomposition of inertia by points and by axes (cf. Section 4.1.11), the quantities $v_{lk}/\lambda_k$ can be computed to investigate the contribution of the nodes to the kth principal axis. If such a contribution is near to 1 then this means that the dispersion of the cloud of points along axis k is associated almost exclusively with points clustered at node l. The quantity $v_{lk}/v_l$ is similarly called the contribution of the kth principal axis to the node l and may be interpreted as a squared angle cosine (Fig. 7.8). The centroids of the pair of clusters l' and l'' constituting the node l define a direction in multidimensional space, subtending an angle of $\phi_{lk}$ with principal axis k, such that $\cos^2 \phi_{lk} = v_{lk}/v_l$ (proved in a more general situation in Example 7.5.6). Hence if $v_{lk}/v_l$ is close to 1 then the separation of the centroids of the clusters l' and l'' is almost exactly along the kth principal axis. As before these squared cosines may be added together to give the squared cosine of the inter-centroid vectors with respect to any subspace of the principal axes.

FIG. 7.8. Geometry of the contribution of the kth principal axis to the lth node. $v_{lk}/v_l = \cos^2 \phi_{lk}$ (see Example 7.5.6(b)).

Display of the nodes with respect to principal axes

The nodes may thus be represented in a correspondence analysis display by the centroids of the profiles which they bring together into a cluster. Thus the terminal node L is displayed at the origin itself, while the two clusters preceding L are displayed by two centroids on either side of the origin, "balanced" at the origin by their respective masses. In general, the centroids of the two clusters (nodes) l' and l'' preceding node l are displayed on either side of the centroid of l and are balanced at this centroid by their respective masses, which in previous notation can be written in the full space as:

$$r^{(l')}(\bar{y}^{(l')} - \bar{y}^{(l)}) + r^{(l'')}(\bar{y}^{(l'')} - \bar{y}^{(l)}) = 0$$

and with respect to any principal axis k as:

$$r^{(l')}(f_k^{(l')} - f_k^{(l)}) + r^{(l'')}(f_k^{(l'')} - f_k^{(l)}) = 0$$

Computationally, the display co-ordinates are obtained either by direct calculation of the appropriate centroids of the clusters of points, or by "condensing" the clusters of points into their centroids, as described in Section 7.1 and Example 7.5.1, and then using the appropriate transition formula to represent them as supplementary points.

Notice that all the results of this section apply to the more general situation of a set of points, with pre-assigned masses, in a weighted Euclidean space structured by any positive-definite symmetric matrix. We state and prove the results in Section 7.5 for this general case and these apply as a special case to correspondence analysis.

Benzécri et al. (1980) present FORTRAN programs to compute various tables which enhance the interpretation of a cluster analysis in the framework of correspondence analysis. A complete treatment of this subject is given by Jambu (1978, 1983).

7.5 EXAMPLES

7.5.1 Correspondence analysis of a "condensed" matrix

Let P (I × J) be a correspondence matrix and P' (H × J) the correspondence matrix formed by adding together mutually exclusive groups of rows of P, which we can denote as follows: $P' = Z^{\mathsf{T}} P$, where Z (I × H) is a logical indicator matrix. Show that each row profile of P' is the centroid of the set of row profiles of P which was grouped to form the respective row. Show how the rows of P' can be displayed as supplementary points in the correspondence analysis of P and, conversely, how the rows of P can be displayed as supplementary points in the correspondence analysis of P'.

Solution
By definition Z is a matrix of 0s and 1s with a single 1 in each row: $Z\mathbf{1} = \mathbf{1}$. If $D_r^{-1} P$ is the matrix of row profiles of P, then the rows of $Z^{\mathsf{T}} D_r D_r^{-1} P = Z^{\mathsf{T}} P = P'$ contain the weighted sums of each group of row profiles. To obtain the centroids, each of these weighted sums must be divided by the total mass of the respective group. These masses are contained in the vector $Z^{\mathsf{T}} D_r \mathbf{1} = Z^{\mathsf{T}} r$, which is just the vector of masses r' of the row profiles of P', i.e. the row sums of P': $r' = P'\mathbf{1} = Z^{\mathsf{T}} P \mathbf{1} = Z^{\mathsf{T}} r$. Hence the centroids are $D_{r'}^{-1} P'$, the row profiles of P'.

Suppose that F, G and $D_\lambda$ are the principal co-ordinates and principal inertias in the correspondence analysis of P. The rows of P' can be displayed as supplementary points by evaluating the centroids of the groups of rows of F: $D_{r'}^{-1} Z^{\mathsf{T}} D_r F$, where, as before, $D_r$ assigns the masses to individual rows of F, $Z^{\mathsf{T}}$ sums these rows in their various groups and $D_{r'}^{-1}$ divides the weighted sums by the group masses. Alternatively, the usual transition formula from columns to rows can be applied to the profiles $D_{r'}^{-1} P'$: $D_{r'}^{-1} P' G D_\lambda^{-1/2}$.

Suppose now that F', G' and $D_{\lambda'}$ are the principal co-ordinates and principal inertias in the correspondence analysis of P'. The rows of P can be displayed as supplementary points by applying the column-to-row transition formula to the profiles $D_r^{-1} P$: $D_r^{-1} P G' D_{\lambda'}^{-1/2}$. The row profiles $D_{r'}^{-1} P'$, displayed by F', are still at the centroids of the groups of supplementary profiles: $D_{r'}^{-1} Z^{\mathsf{T}} D_r D_r^{-1} P G' D_{\lambda'}^{-1/2} = D_{r'}^{-1} P' G' D_{\lambda'}^{-1/2} = F'$.

7.5.2 Huygens' theorem

Let $y_1 \ldots y_I$ be a cloud of points, with masses $w_1 \ldots w_I$, in a multidimensional weighted Euclidean space where the metric is defined by the positive-definite symmetric matrix Q. Let $\bar{y}$ denote the centroid of the points: $\bar{y} = \sum_i w_i y_i / \sum_i w_i$. Show that the total inertia of the cloud with respect to any point y is equal to its total inertia with respect to the centroid plus the squared distance between y and $\bar{y}$ weighted by the total mass of the cloud:

$$\sum_i w_i \|y_i - y\|^2_Q = \sum_i w_i \|y_i - \bar{y}\|^2_Q + (\sum_i w_i) \|\bar{y} - y\|^2_Q$$   (7.5.1)

where:

$$\|a - b\|^2_Q \equiv (a - b)^{\mathsf{T}} Q (a - b)$$

Solution
The total inertia of the cloud with respect to y, i.e. the left-hand side of (7.5.1), can be written as:

$$\sum_i w_i (y_i - \bar{y} + \bar{y} - y)^{\mathsf{T}} Q (y_i - \bar{y} + \bar{y} - y) = \sum_i w_i (y_i - \bar{y})^{\mathsf{T}} Q (y_i - \bar{y}) + \sum_i w_i (\bar{y} - y)^{\mathsf{T}} Q (\bar{y} - y) + 2 \sum_i w_i (y_i - \bar{y})^{\mathsf{T}} Q (\bar{y} - y)$$

By definition of the centroid $\bar{y}$ the cross-product term is zero:

$$\{\sum_i w_i (y_i - \bar{y})\}^{\mathsf{T}} Q (\bar{y} - y) = \{\sum_i w_i y_i - \sum_i w_i \bar{y}\}^{\mathsf{T}} Q (\bar{y} - y) = 0$$

hence the result.
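The identity (7.5.1) is easy to check numerically. The following hypothetical Python fragment (assuming numpy, taking Q as the identity and normalizing the masses to a total of 1, so that the last term reduces to the squared distance) confirms it for a random cloud:

import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(10, 3))             # 10 points in 3 dimensions
w = rng.random(10); w /= w.sum()         # masses, normalized to total mass 1
ybar = w @ Y                             # centroid of the cloud
y = np.array([1.0, -2.0, 0.5])           # an arbitrary reference point

lhs = np.sum(w * np.sum((Y - y) ** 2, axis=1))
rhs = np.sum(w * np.sum((Y - ybar) ** 2, axis=1)) + np.sum((ybar - y) ** 2)
print(np.isclose(lhs, rhs))              # True: (7.5.1) with Q = I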
7.5.3 Between- and within-group inertia

Let $y_i^{(1)}$, $i = 1 \ldots I_1$; $y_i^{(2)}$, $i = 1 \ldots I_2$; ...; $y_i^{(H)}$, $i = 1 \ldots I_H$, be H clouds of points with respective masses $w_i^{(h)}$, $i = 1 \ldots I_h$, $h = 1 \ldots H$, in a multidimensional weighted Euclidean space structured by the positive-definite symmetric matrix Q. Let $\bar{y}^{(h)} \equiv \sum_i w_i^{(h)} y_i^{(h)} / w^{(h)}$, $h = 1 \ldots H$, where $w^{(h)}$ is the collective mass of the hth cloud: $w^{(h)} = \sum_i w_i^{(h)}$. Show that
the total inertia of all $I = \sum_h I_h$ points with respect to their overall centroid $\bar{y}$ is equal to the sum of the inertias of the H group centroids (i.e. the between-group inertia) plus the sum of the inertias of each group with respect to its respective centroid:

$$\sum_h \sum_i w_i^{(h)} \|y_i^{(h)} - \bar{y}\|^2_Q = \sum_h w^{(h)} \|\bar{y}^{(h)} - \bar{y}\|^2_Q + \sum_h \sum_i w_i^{(h)} \|y_i^{(h)} - \bar{y}^{(h)}\|^2_Q$$   (7.5.2)

(where the squared distance $\|a - b\|^2_Q$ is defined as in Example 7.5.2).

Solution
This result is a direct application of Huygens' theorem to each group of points, where the point y of (7.5.1) is the overall centroid $\bar{y}$. Thus the inertia of the hth group of points may be expressed with respect to $\bar{y}$ as:

$$\sum_i w_i^{(h)} \|y_i^{(h)} - \bar{y}\|^2_Q = \sum_i w_i^{(h)} \|y_i^{(h)} - \bar{y}^{(h)}\|^2_Q + w^{(h)} \|\bar{y}^{(h)} - \bar{y}\|^2_Q$$

Summing this equation over the groups, $h = 1 \ldots H$, gives the result (7.5.2).

Comment
This decomposition of the total inertia holds in the full space of the points as well as in any subspace onto which the points are projected.

7.5.4 Principal inertias of a cloud of centroids

Given the same situation as in Example 7.5.3, show that the principal inertias $\lambda_1, \lambda_2, \ldots$ of the original points are greater than (or equal to) the respective principal inertias $\lambda_1', \lambda_2', \ldots$ of the centroids: $\lambda_1 \ge \lambda_1'$, $\lambda_2 \ge \lambda_2'$, ....

Solution (cf. Deniau and Oppenheim, 1979)
Let us suppose that the dimensionality of the I points is K and that of the H centroids is K', where $K' \le K$. The result is thus trivially true for $k > K'$, since $\lambda_k' = 0$ in this case. For $k \le K'$, consider the following two subspaces of the K-dimensional space in which all the points lie: first, the subspace of all vectors with respect to which the (moment of) inertia of all the points is $\le \lambda_k$; secondly, the subspace of all vectors with respect to which the inertia of the centroids is $\ge \lambda_k'$. The first subspace excludes the first k - 1 principal axes of the I points and is thus of dimensionality K - (k - 1) = K - k + 1, while the second subspace includes the first k principal axes of the H centroids and is thus of dimensionality k. Since the sum of these dimensionalities is K - k + 1 + k = K + 1, the intersection of these two subspaces must have a dimensionality of at least 1.

We can thus assume the existence of a vector u which is common to both subspaces. With respect to this vector, the result of Example 7.5.3 applies and the total inertia of all the points projected onto u must be greater than the inertia of their centroids: $in_u(I) \ge in_u(H)$. By definition of the two subspaces to which u belongs: $\lambda_k \ge in_u(I)$ and $in_u(H) \ge \lambda_k'$, which implies $\lambda_k \ge \lambda_k'$.

7.5.5 Change in between-groups inertia induced by transfer of one point

Given the same situation as in Example 7.5.3, consider the transfer of one point $y_o$ (with mass $w_o$) from group h' to group h''. Let in(h) denote the inertia of the hth group centroid (with respect to the overall mean) before the transfer, which in our previous notation is: $in(h) = w^{(h)} \|\bar{y}^{(h)} - \bar{y}\|^2_Q$. Show that the increase in between-groups inertia $\sum_h in(h)$ resulting from the transfer is equal to:

$$\frac{w_o w^{(h')}}{w^{(h')} - w_o} \|y_o - \bar{y}^{(h')}\|^2_Q - \frac{w_o w^{(h'')}}{w^{(h'')} + w_o} \|y_o - \bar{y}^{(h'')}\|^2_Q$$   (7.5.3)

where $w^{(h')}$ and $\bar{y}^{(h')}$, for example, are the mass and centroid of the h'th group of points, before transfer of $y_o$.

Solution
When $y_o$ is transferred from group h' to group h'' the centroids of these two groups are translated, respectively away from and towards the position of $y_o$. This transfer only affects the inertia of these two groups: the within-groups inertia of the other groups is clearly unaffected and, because the transfer does not alter the position of the overall centroid $\bar{y}$, the inertias of the other centroids remain constant. Because the between- and within-groups inertias sum to a constant (Example 7.5.3) we can evaluate the increase in the between-groups inertia equivalently by the decrease in the within-groups inertia.

Before transfer, the inertia within group h', comprising $I_{h'}$ points, which we denote by $in(I_{h'})$, is:

$$in(I_{h'}) = \sum_i w_i^{(h')} \|y_i^{(h')} - \bar{y}^{(h')}\|^2 = w_o \|y_o - \bar{y}^{(h')}\|^2 + \sum w_i^{(h')} \|y_i^{(h')} - \bar{y}^{(h')}\|^2$$

where the summation in the second term extends over all points in group h' except the point $y_o$. By Huygens' theorem this term is the inertia of the group of points without $y_o$, which we denote by $in(I_{h'} - 1)$, plus the new group's mass $w^{(h')} - w_o$ multiplied by the squared distance between the old centroid $\bar{y}^{(h')}$ and the new centroid denoted by $\bar{y}_o^{(h')}$, that is:

$$in(I_{h'}) = w_o \|y_o - \bar{y}^{(h')}\|^2 + in(I_{h'} - 1) + (w^{(h')} - w_o) \|\bar{y}^{(h')} - \bar{y}_o^{(h')}\|^2$$

The decrease in the inertia within group h' is thus:

$$in(I_{h'}) - in(I_{h'} - 1) = w_o \|y_o - \bar{y}^{(h')}\|^2 + (w^{(h')} - w_o) \|\bar{y}_o^{(h')} - \bar{y}^{(h')}\|^2$$

This can be simplified as:

$$in(I_{h'}) - in(I_{h'} - 1) = \frac{w_o w^{(h')}}{w^{(h')} - w_o} \|y_o - \bar{y}^{(h')}\|^2$$

using the expression for the position of the new centroid:

$$\bar{y}_o^{(h')} = \frac{w^{(h')} \bar{y}^{(h')} - w_o y_o}{w^{(h')} - w_o}$$

which implies that the difference between old and new centroids is:

$$\bar{y}_o^{(h')} - \bar{y}^{(h')} = -\frac{w_o}{w^{(h')} - w_o} (y_o - \bar{y}^{(h')})$$

This argument can now be repeated for the group h'', where the point $y_o$ is added to the group. This leads to the following decrease in within-group inertia:

$$in(I_{h''}) - in(I_{h''} + 1) = -w_o \|y_o - \bar{y}^{(h'')}\|^2 + (w^{(h'')} + w_o) \|\bar{y}_o^{(h'')} - \bar{y}^{(h'')}\|^2 = -\frac{w_o w^{(h'')}}{w^{(h'')} + w_o} \|y_o - \bar{y}^{(h'')}\|^2$$

which is negative, as expected, because the within-group inertia must increase when a
point is added. The increase in between-groups inertia (7.5.3) follows by adding together these within-group decreases.

7.5.6 Contributions of orthogonal axes to the nodes of a hierarchical clustering

Let $y_i$, $i = 1 \ldots I$, be a cloud of points, with masses $w_i$, $i = 1 \ldots I$, in a multidimensional weighted Euclidean space where the metric is defined by the positive-definite matrix Q. Suppose that we have a multidimensional Euclidean representation of these points with respect to any orthogonal system of axes, with the centroid of the points at the origin. Let the rows of the matrix F (I × K) contain the co-ordinates of these points in the full space, in other words for any i, i': $(y_i - y_{i'})^{\mathsf{T}} Q (y_i - y_{i'}) = \sum_k (f_{ik} - f_{i'k})^2$. Now suppose that we have any hierarchical clustering of the I points where at any stage of the clustering a node l unites two clusters l' and l'' of $I_{l'}$ and $I_{l''}$ points respectively into a single cluster of $I_l$ points. Let $f_k^{(l')}$ denote the co-ordinate on the kth axis of the centroid $\bar{y}^{(l')}$ of the l'th cluster of points and let $w^{(l')}$ be the sum of the masses of these points. Define the quantities: $v_{lk} \equiv (w^{(l')} w^{(l'')}/w^{(l)}) (f_k^{(l')} - f_k^{(l'')})^2$ and $v_l \equiv \sum_k v_{lk}$.
(a) Show that $v_l$ is the difference between the (within-cluster) inertia of cluster l and the sum of the inertias of the clusters l' and l'': $v_l = in(I_l) - (in(I_{l'}) + in(I_{l''}))$.
(b) Show that the ratio $v_{lk}/v_l$ is the square of the angle cosine subtended by the kth principal axis and the line joining the centroids $\bar{y}^{(l')}$ and $\bar{y}^{(l'')}$ (see Fig. 7.8).

Solution
(a) By the present definition of $v_{lk}$ and $v_l$:

$$v_l = \frac{w^{(l')} w^{(l'')}}{w^{(l)}} \sum_k (f_k^{(l')} - f_k^{(l'')})^2 = \frac{w^{(l')} w^{(l'')}}{w^{(l)}} \|\bar{y}^{(l')} - \bar{y}^{(l'')}\|^2_Q$$

since $f_k^{(l')}$, $k = 1 \ldots K$, for example, are the co-ordinates of the centroid $\bar{y}^{(l')}$ in the full Euclidean space of the display. (Notice that in (7.4.1) we define $v_l$ as the difference in inertias, which we shall now show to be equivalent to the above result.) Applying Huygens' theorem to the inertias of the two clusters of $I_{l'}$ and $I_{l''}$ points with respect to the new joint centroid $\bar{y}^{(l)}$, we have:

$$in(I_l) = in(I_{l'}) + w^{(l')} \|\bar{y}^{(l')} - \bar{y}^{(l)}\|^2_Q + in(I_{l''}) + w^{(l'')} \|\bar{y}^{(l'')} - \bar{y}^{(l)}\|^2_Q$$

Hence the increase in inertia $in(I_l) - (in(I_{l'}) + in(I_{l''}))$ is just the sum of the inertias of the two centroids (which is to be expected from Example 7.5.3). The result follows trivially on substituting for the joint centroid:

$$\bar{y}^{(l)} = (w^{(l')} \bar{y}^{(l')} + w^{(l'')} \bar{y}^{(l'')})/(w^{(l')} + w^{(l'')})$$

(b) This result is easily obtained from simple trigonometry in Fig. 7.8. The absolute value of the angle cosine is any of the following three ratios:

$$\frac{|f_k^{(l')} - f_k^{(l)}|}{\|\bar{y}^{(l')} - \bar{y}^{(l)}\|_Q} = \frac{|f_k^{(l'')} - f_k^{(l)}|}{\|\bar{y}^{(l'')} - \bar{y}^{(l)}\|_Q} = \frac{|f_k^{(l')} - f_k^{(l'')}|}{\|\bar{y}^{(l')} - \bar{y}^{(l'')}\|_Q}$$

which are equal because $\bar{y}^{(l')}$, $\bar{y}^{(l)}$ and $\bar{y}^{(l'')}$ lie on a straight line. The third ratio is exactly $(v_{lk}/v_l)^{1/2}$.

Comment
This result is applicable to any display of the points with respect to orthogonal axes and any type of hierarchical clustering. Most often, however, we are interested in the display with respect to principal axes and the clustering which minimizes $v_l$ at each node.

8
Special Topics

In the present chapter we have gathered together a number of special topics in correspondence analysis, devoting a section to each topic.

Section 8.1 deals with the stability of the graphical displays and the statistical sampling properties of the principal inertias (squared canonical correlations).

Section 8.2 treats the topic of assigning new masses to the rows and/or columns of a data matrix with a specific objective, usually some kind of inertia standardization. Here we also introduce the general concept of focusing the display on a particular feature of the data.

Section 8.3 illustrates the horseshoe effect which is often observed in the displays and explains how it comes about.

In Section 8.4 we discuss alternative ways of constraining the solution to satisfy certain prior conditions.

A method of dealing with missing data is discussed in Section 8.5, based on the reconstitution formula of correspondence analysis, so that data with missing values can still be displayed.

In Section 8.6 the peculiarities of analysing symmetric matrices by correspondence analysis are discussed, including further discussion of the so-called Burt matrix associated with an indicator matrix.

Section 8.7 is a brief discussion of special algorithms and strategies to cope with large data sets, and the chapter is again concluded with a set of complementary theoretical and practical examples.

8.1 STABILITY AND STATISTICAL INFERENCE

Although we have stressed that correspondence analysis is an exploratory
technique, we must face the question whether the features in the displays that
we finally interpret are "significant" in some statistical sense. This question seems to be relevant only when the data arise from some random sampling scheme, in other words when we assume that the data are a representative "image" of an underlying population. In fact this ideal situation, on which most conventional statistical inference is based, occurs relatively infrequently and the data are more often than not collected in a deliberate non-random fashion. For this reason we prefer to consider the wider issue of stability of the results, which includes the conventional notions of statistical significance as a special case when the data are representative samples in the usual sense.

Internal and external stability

To introduce our study of stability, suppose that a 2-dimensional display of the rows and columns of a data matrix has been obtained by correspondence analysis. We shall call this display stable at two different levels: first at the level of the data matrix itself, and secondly at the level of the wider population (should the data be sampled from a population). In other words, if the rows, say, are indeed a random sample from a multivariate population, then the planar display is a partial view of "reality" at two levels. First, it is a partial view of the multidimensional scatter of the data points, being optimally orientated to reflect as much of the points' inertia as possible. Secondly, the data points are themselves a partial view of a theoretical geometric distribution of points in the population.

At the first level we say that the plane is internally stable, hence also the scatter of points projected onto the plane, if the plane's orientation is not determined by isolated features of the given sample. An example of internal instability is thus a single "outlying" data point which has caused the principal plane to swing around excessively in its direction, so that removing this point changes the plane's orientation quite dramatically.

At the second level we say that the plane is externally stable if its orientation is minimally altered by considering further samples from the same population. An example of external instability is thus a sample which is not large enough to characterize the population patterns with low variability, so that other samples of the same size lead to different principal planes. Notice that when the data are not collected by sampling, then we are only interested in the internal stability of a display.

In order to illustrate this distinction further, a parallel might be drawn with regression analysis. External stability of an estimate of a regression coefficient would mean that the estimate has low standard deviation and would not vary unduly if the study were repeated. Internal stability of the estimate would mean that no isolated elements of the sample itself have contributed excessively to the value of the point estimate, so that a quite different value might have been obtained in the absence of these elements. Thus a robust regression procedure should be internally stable. Our present situation, however, is more problematic in that the planar display as a whole is analogous to the estimate, that is it is an estimate of some theoretical plane, not a single scalar.

If the columns of a data matrix represent a set of preselected variables, it might be of additional interest to investigate the stability of the display with respect to omitting each of the variables in turn. This stability is characterized as internal because the variables are not a sample of a potential set. Notice that we do not want to remove attention from rows or columns of a data matrix which cause internal instability, but rather recognize the strong role they play in the display. If we could see the data vectors in their true high-dimensional positions, there would be no problem of internal stability.

Attempts have been made to structure multidimensional scaling as a statistical technique so that confidence regions on the displayed points can be derived (for example Ramsay, 1977, 1978). While being of interest within the particular data context (which is usually quite specialized) these rely on questionable assumptions and introduce a whole new spectrum of complication into the analysis, owing to large numbers of parameters that have to be estimated. As an alternative, we shall suggest a non-parametric approach using the ideas of jackknifing and bootstrapping. These have wide applicability, but admittedly lack the mathematical rigour of the traditional statistical approach. For this reason we prefer the more physical term stability as opposed to the statistical term confidence. An investigation of the stability of a configuration of points often suggests very strongly that there is a "statistically significant" pattern in the data, which can then be confirmed by formal analysis (if possible!). We are thus trying to push our "pattern recognizing" exploratory analysis as far as possible in the direction of confirmation of the patterns without having to assume a specific mathematical framework for the data.

Jackknifing and bootstrapping

Jackknifing (reviewed by Miller, 1974) and bootstrapping (Efron, 1979) provide convenient frameworks for investigating the internal and external stabilities respectively of planar displays. Usually they are used to investigate the variability of statistics calculated on sampled data, what we would term the external stability of these statistics. In our context we are firstly concerned with the internal stability of the display with respect to a set of given entities, be they rows or columns of the data matrix. We shall use the term jackknifing to mean the deletion of each one of these entities in turn, followed by an assessment of how seriously this affects the display of points or orientation of the plane. Bootstrapping, on the other hand, suggests generating a large

number of additional "data matrices" by resampling, with replacement, from of a 2-dimensional display of a set of 1 points with masses w¡, i = 1 ... 1, where
the sampled entities in the data, with the assumption that the original sample };k denotes the principal co-ordinate of the ith point on the kth principal axis
is the best available representation of the underlying population. The changes and Ak == /lf denotes the kth principal inertia, k = 1 ... K. Rather than repeat
induced by such resampling wiH give us a fair indication of the external the analysis 1 times in order to recompute the principal plane each time a
stability of the display. Clearly, if we could resample from the parent point is omitted, the stability can be judged by inspection of the table of
population itself we would obtain a more correct picture of the external contributions (i.e. decomposition of inertia, Section 4.1.11). If a row point
stability. However, we cannot do this and Efron's idea is to "puH oneself up i = s is removed, the displáy will have a new centroid and the best-fitting
by one's own bootstraps" and resample from the sample itself. plane will have a new orientation since it is no longer attracted to the position
Thus if the rows of an 1 x J data matrix represent a sample of people, we held by S. Depending on the sizes of the contributions w.f.~ and wJ.~, one of I
could obtain as many bootstrapped data matrices as we liked by resampling the following four situations can result:
I

l
from the set of rows. Because a bootstrapped sample is the same size as the (1) The plane and the axes remain stable.
original sample, it is inevitable that it contains sorne rows (people) more than (2) The plane remains stable and axes 1 and 2 change orientation, often
once, others not at aH. If we think of each row being assigned a mass W¡ interchanging.
(i = 1 ... 1), then the original rows have masses aH equal to 1/1, whereas in (3) The plane remains partially stable-for example, axis 1 remains stable
each bootstrapped sample the masses are multiples of 1/1, and zero for a row but axis 2 rotates out of the plane, often interchanging with axis 3.
which is not included in the sample. Thus bootstrapping merely redistributes (4) The plane is unstable and assumes a completely new orientation.
the unit of mass amongst the rows in discrete units of l/l. If we generalize this
idea to the redistribution of the mass in continuous amounts, then the jack­ These are clearly extreme situations and one of many intermediate outcomes
knife is subsumed as a special case of the bootstrap. Jackknifing (the rows) is possible.
involves 1 repetitions of the analysis, omitting each row in turn from the So far we have avoided a definition of stability, but clearly a quantifiable
analysis. These are just 1 particular bootstraps, where masses 1/(1-1) are measure of the change of the principal axis is the angle </> through which it
assigned to the (1- 1) included rows and O to the excluded row. This rotates. Escofier and Le Roux (1976) remark that if </> is less than 45° then the
emphasizes the close relationship of our investigations of internal and old and the new axes are closer to each other than to any other axes. Hence
external stability, the former being a special case of the latter. The jack­ we shalllabel a rotation of more than 45° instability, with stability increasing
knifing of the display, however, is so much simpler that it can be described monotonically from 45° to O°. The same rule can apply to the plane in terms
analytically to a certain extent and thus merits special consideration. of the planar rotation, which is defined as the maximum angle between a
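In computational terms the bootstrapped masses are trivial to generate. The sketch below is a hypothetical Python fragment (our own names, assuming numpy): each replicate draws a multinomial count vector over the I rows and rescales it to a mass vector in multiples of 1/I, with zeros marking the rows left out; the jackknife masses appear as the special case described above.

import numpy as np

def bootstrap_row_masses(I, n_boot, rng):
    """Mass vectors of n_boot bootstrap replicates: resampling I rows with
    replacement redistributes the unit mass in discrete steps of 1/I."""
    counts = rng.multinomial(I, np.full(I, 1 / I), size=n_boot)
    return counts / I

rng = np.random.default_rng(0)
print(bootstrap_row_masses(I=8, n_boot=3, rng=rng))   # each row sums to 1

# The jackknife is the special case assigning mass 1/(I-1) to every row but one:
jack_masses = (np.ones((8, 8)) - np.eye(8)) / 7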
Very often a study involves a substantial number of sampling units which are important only in reflecting information about groups. Considerations of internal and external stability remain essentially the same, but the emphasis is now on the stability of the display of the group summaries, usually the group centroids or the geometric scatter of the groups. In the correspondence analysis of a contingency table the rows and columns are both preselected sets of attributes and the sample is condensed into the cells of the table. Here we are usually most interested in the stability, usually external, of the points representing the rows and the columns. Of course in both of the above situations, if the number of sampling units is fairly low, or if the data do not arise from random sampling, then a specific investigation of the internal stability (of the points and the axes) would be relevant.

Effect of individual points on the principal axes (jackknifing)

Let us illustrate the jackknifing approach by considering the internal stability of a 2-dimensional display of a set of I points with masses $w_i$, $i = 1 \ldots I$, where $f_{ik}$ denotes the principal co-ordinate of the ith point on the kth principal axis and $\lambda_k \equiv \mu_k^2$ denotes the kth principal inertia, $k = 1 \ldots K$. Rather than repeat the analysis I times in order to recompute the principal plane each time a point is omitted, the stability can be judged by inspection of the table of contributions (i.e. decomposition of inertia, Section 4.1.11). If a row point i = s is removed, the display will have a new centroid and the best-fitting plane will have a new orientation since it is no longer attracted to the position held by s. Depending on the sizes of the contributions $w_s f_{s1}^2$ and $w_s f_{s2}^2$, one of the following four situations can result:

(1) The plane and the axes remain stable.
(2) The plane remains stable and axes 1 and 2 change orientation, often interchanging.
(3) The plane remains partially stable: for example, axis 1 remains stable but axis 2 rotates out of the plane, often interchanging with axis 3.
(4) The plane is unstable and assumes a completely new orientation.

These are clearly extreme situations and one of many intermediate outcomes is possible.

So far we have avoided a definition of stability, but clearly a quantifiable measure of the change of the principal axis is the angle $\phi$ through which it rotates. Escofier and Le Roux (1976) remark that if $\phi$ is less than 45° then the old and the new axes are closer to each other than to any other axes. Hence we shall label a rotation of more than 45° instability, with stability increasing monotonically from 45° to 0°. The same rule can apply to the plane in terms of the planar rotation, which is defined as the maximum angle between a vector in one of the planes and its orthogonal projection onto the other plane. In both cases maximum stability is reached at 0° ($\cos^2 \phi = 1$) and maximum instability at 90° ($\cos^2 \phi = 0$), with a borderline at 45° ($\cos^2 \phi = \frac{1}{2}$).

Apart from the trivial situation where s lies exactly at the centroid, the simplest case is where s lies exactly along a principal axis, say the first (Fig. 8.1(a)): that is $f_{sk} = 0$ for $k \ne 1$. Removal of s causes the centroid of the points to translate away from s to a position at $c = -\{w_s/(1-w_s)\} f_{s1}$ on the first axis. The mass of each of the remaining points is scaled up by $1/(1-w_s)$ and the moments of inertia along the old principal axes are respectively:

$$\sum_{i \ne s} \{w_i/(1-w_s)\} (f_{i1} - c)^2 = [\lambda_1 - \{w_s/(1-w_s)\} f_{s1}^2]/(1-w_s)$$   (8.1.1)

$$\sum_{i \ne s} \{w_i/(1-w_s)\} f_{ik}^2 = \lambda_k/(1-w_s) \qquad k = 2, 3, \ldots$$   (8.1.2)

Clearly if the amount $\{w_s/(1-w_s)\} f_{s1}^2$ is large enough, a new ordering of the principal inertias may result, with some axes "shifting up" in rank. As long as
the value of (8.1.1) does not equal one of the values of (8.1.2), this is the only possible change since no new orientations of axes are possible. A similar argument applies to any principal axis on which point s lies exactly. Also if s lies exactly in the principal plane (Fig. 8.1(b)), say, then its removal causes a reduction in the first two principal inertias of the form (8.1.1) while the remaining principal inertias are simply rescaled as in (8.1.2). A number of new situations can result, for example if the moments of inertia along the (old) axes defining the principal plane are still the largest and if neither $f_{s1}$ nor $f_{s2}$ is zero, then a rotation of the first two principal axes takes place in the plane.

FIG. 8.1. Example of a point lying (a) on a principal axis; (b) in the principal plane; (c) off the principal plane.

Of course these simple cases never occur in practice, but serve to illustrate the jackknifing idea. In general, we have a point s which lies in multidimensional space (Fig. 8.1(c)), the removal of which causes a translation of the centroid away from s by a vector:

$$t = -\{w_s/(1-w_s)\} f_s$$   (8.1.3)

as well as a re-orientation of all the principal axes, including the principal plane. If $f_{s1} = 0$, say, then as before the only change that can take place is in the space orthogonal to the first principal axis, and the question would be how much the second principal axis rotates into the subspace of the remaining axes. The moments of inertia along all the previous axes are of the form (8.1.1) and thus if:

$$\lambda_2 - \{w_s/(1-w_s)\} f_{s2}^2 < \lambda_3 - \{w_s/(1-w_s)\} f_{s3}^2$$   (8.1.4)

then rotation of the second axis out of the plane is already greater than 45° in the direction of the previous third principal axis. In other words the new second principal axis will be more aligned with the previous third axis than with the second. If we express the negation of (8.1.4) in a slightly more relaxed form by dropping the term $\{w_s/(1-w_s)\} f_{s3}^2$, then we have the following condition which is sufficient for the second principal axis to be "stable" (i.e. $\phi < 45°$):

$$\lambda_2 - \{w_s/(1-w_s)\} f_{s2}^2 > \lambda_3$$   (8.1.5)

This condition provides a quick and easy method of checking on the stability of any principal axis k by scanning the corresponding set of inertia contributions $w_i f_{ik}^2$, $i = 1 \ldots I$. The subscripts 2 and 3 in (8.1.5) can be any k and k+1 respectively, but notice that our argument assumes that the "higher" principal axes are negligibly rotated by removing s (hence our assumption above that $f_{s1} = 0$). Notice too that we ignore effects due to the possible change in metric induced by removing s. In correspondence analysis the effect on the metric of removing a row or column might be fairly substantial (see (8.1.14) below).
induced by removing S. In correspondence analysis the elfect on the metric of
FIG.8.1 Example af a paint Iying (a) an a principal axis; (b) in the principal removing a row or column might be fairly substantial (see (8.1.14) below).
plane; (e) aff the principal plane. If (8.1.5) is satisfied, Escofier and Le Roux (1976) give a set of upper bounds
for the rotation angle cjJ of the kth principal axis when s is removed. These
the value of (8.1.1) does not equal one of the values of (8.1.2), this is the only upper bounds are only approximate when the subspace of the first (k -1)
possible change since no new orientations ofaxes are possible. A similar principal axes is itself rotated when s is removed. We define a parameter:
argument applies to any principal axis on which point s lies exactly. Also if s
lies exactly in the principal plane (Fig. 8.1 (b)), say, then its removal causes a h ={wj(l-ws)} (Ü+1.~k+l +,..)/(Ak- Ak+l) (8.1.6)
reduction in the first two principal inertias of the form (8.1.1) while the namely that part of the inertia of point s which lies in the subspace of the
remaining principal inertias are simply rescaled as in (8.1.2). A number ofnew principal axes k, k + 1, ... , relative to the difference between Ak and Ak + 1 and
situations can result, for example if the moments of inertia along the (old) adjusted for the new centroid by dividing by (1- ws ). Another quantity of
axes defining the principal plane are still the largest and if neither 1s1 nor 1.2 importance is the "relative contribution" (cL Sections 3.3 and 4.1.11):
is zero, then a rotation of the first two principal axes takes place in the planeo
Of course these simple cases never occur in practiee, but serve to illustrate cos 2esk = ws1.Uin(s) = 1.~rr'k· 1.~' (8.1. 7)
the jackknifing idea. In general, we have a point s which lies in multi· that is the fraction of the inertia of the point s which lies along the kth
dimensional space (Fig. 8.1 (c», the removal of which causes a translation of principal axis, which is also the squared cosine of the angle esk subtended by
the centroid away from s by a vector: the point vector s and the kth principal axis. The simplest upper bound for cjJ
t = -{w s/(l-w s)}fs (8.1.3 ) is then:
as well as a re-orientation of all the principal axes, including the principal sin 2cjJ ~ h (8.1.8 )
plane. If 1.1 = 0, say, then as before the only change that can take place is in while more refined upper bounds depend on h and the angle esk :
the space orthogonal to the first principal axis, and the question would be
how much the second principal axis rotates into the subspace of the if h ~ 1: tan 2cjJ ~ h sin 2e sk /(l- h cos 2 esk ) (8.1.9)
remaining axes. The moments of inertia along all the previous axes are of the if h < 1: tan 24> ~ h sin 2e sk /(l- h cos 2esd (8.1.10)
form (8.1.1) and thus if:
If the difference Al - A2 is quite small relative to A2 - A3 , say, then we might
A2-{ws/(1-ws)}1s~ < A3-{ws/(1-ws)}fs~ (8.1.4) find instability ofaxis 1 but a high stability of the principal planeo (Again for
then rotation of the second axis out of the plane is already greater than 45° subscripts 1, 2 and 3 we can substitute k, k + 1, and k + 2.) The aboye
214 Theory and Applieations ofCorrespondenee Analysis 8. Special Topies 215

conditions can be generalized to investigating the stability of this plane by sampling. Instead we shall resort to the computer, following Efron and Gong
adding the contributions in the formulae. Thus if: (1981) who "show by example how a simple idea combined with massive
computation can solve a problem that is hopelessly beyond traditional
A2 - {ws/(l-ws)}(fs~ +fs~) > A3 (8.1.11)
theoretical solutions".
then the moment of inertia of the cloud of points without s along any line in Our proposal is that a random sample of the set of possible bootstrapped
the plane is higher than along any of the axes 3,4, ... , that is the plane will matrices be drawn, and then their geometry be related to that of the original
not rotate through cP greater than 45°. Upper bound formulae (8.1.8-10) are data matrix. Various scalar statistics can be computed, for example the angle
the same, but we define: between corresponding principal axes in the original analysis and each
h ={ws/(l- wsn (fs~ + fs~ + ... )/(A2 - A3) (8.1.12)
replicated one, eventually leading to confidence intervals in the usual style of
the bootstrap. An alternative strategy, which is less heavy computationally
and use the angle 0S.12 that s makes with the principal plane: and which yields highly satisfactory results, is to fix the original plane, say, as
the viewing plane for the replications. The replicated row and/or column
cos 2 0S.12 = cos 2 0sl +cos 2 0s2 (8.1.13)
points are then projected onto this plane in order to explore the stability of
Further detailed discussion of these formulae can be found in Escofier and the points themselves as well as, indirectly, the stability of the original planeo
Le Roux (1976), as well as a discussion of the efTect introduced by modifica­ If the plane representing the original points is unstable, the replicated points
tion of the metric in the case of correspondence analysis. This efTect is also will have large variability in the full space as well as on the viewing plane, and
expressed as an upper bound on the angle of rotation cP of the kth principal vice versa.
axis in the form: Most of our discussion aboye applies to a wide class of multidimensional
techniques. As an initial illustration of our proposed strategy in the context
sin 2cP ~ Ak[~ - P/{ e(l + P)}] (8.1.14)
of a correspondence analysis, let us return to our simple 5 x 4 data matrix of
where, in the usual notation of correspondence analysis and assuming that Table 3.1, a contingency table involving a sample of 193 cases. The analysis
the sth row of the correspondence matrix P is removed: ofthis matrix is given by Fig. 3.3 and Table 3.6. By drawing a random sample,
~ =max {Ps/(cj - Ps); j 1 = J}
with replacement, from these 193 cases we obtain a replicated contingency
tableo (This is equivalent to drawing a random sample of 193 cases from the
P=min {Ps/(cj - Psj); j = 1 J} multinomial distribution defined by the 20 cells of the original correspondence
e =min{Ak-l -Ak,Ak-Ak+d
matrix.) The replicated row and column profiles are then projected onto the
respective principal planes, say, of the original profiles, using the relevant
If Ak+1 is close to Ak so that it is a question of the efTect of the metric on the transition formulae of (4.1.16). For example, if D,: 1p* is a replicate set of row
plane of the kth and (k + 1)th principal axes, the same authors give the profiles (as row vectors), then F* = D,: 1P*GD; 1 is the matrix of projected
corresponding result for the angle of rotation of the planeo This is in the same co-ordinates. Figure 8.2 shows the replicate profiles and the convex hulls of
form as (8.1.14), with the only difTerence that: each set, as they appear on the original principal planeo All the convex hulls
in Fig. 8.2(a) intersect, while in Fig. 8.2(b) the convex hull of the set of first
e = min{Ak-l -Ak,t(Ak+l -Ak+2n column profiles does not intersect those of the third and fourth columns. This
Examples of the use of these results are given in Section 9.6. would suggest that evidence of association between the rows and columns is
not strong, in fact the X2 statistic for independence in Table 3.1 may be calcu­
lated as X2 (12) = 16.4, which is not significant (P > 0.10). Figure 8.2(b) does
Bootstrapping of the samp/e to assess externa/ stabi/irv
suggest, however, that if there is a significant difTerence to be found, it is along
If the data are based on sorne sampling scheme, we can consider the external the first principal axis (the horizontal axis) which separates the first column
stability of the low-dimensional display. Although we have described boot­ from the remaining columns.
strapping of the given sample notionally as a generalization of the above Figure 8.3 is the analysis repeated on the same data matrix which has been
jackknifing procedure, it will no longer be possible to derive similar results to multiplied by 2, as if twice the number of cases have been sampled and the
bound the angle cP over the set of replicated data matrices obtained by re­ same relative frequencies observed. The correspondence matrix is unchanged
216 Theory and AppLieations ofCorrespondenee Analysis 8. Special Topies 217

(a) (a)

~c.a.e.e
---..
O. I

(b)
/:..
4
(b)


\
...\l
.1\
I
~ca.ie
----.
O. I
­- '4
.
- .. ~
I

I
HO.t'"

O~ I

I
FIG.8.2. Replicated row and column profiles (displayed separately by (a) and
(b) respectively). projected onto the principal planes of the original row and FIG.8.3. As Fig. 8.2. except the data matrix of Table 3.1 has been multiplied by
column profiles of Table 3.1. The convex hull of each set of replicates is indicated. 2. as if there were twice the number of observations.
218 Theory and Applieations ofCorrespondenee Analysis 8. Special Topies 219

and there is no difference in the display of the original row and column when a projective geometry exists in the vector space. For example, in
profiles, but the replicated profiles are more compactly gathered about the multidimensional scaling of distance or similarity matrices the final result is
original profiles. The convex hulls of rows 3 and 4 have separated in Fig. a configuration of points, not aplane cutting through a higher-dimensional
8.3(a) while the convex hull of column 1 has separated from the remaining space of the points. Replicated distance matrices, say, can still be generated
ones in Fig. 8.3(b). The association is now significant and this is observable in terms of the underlying sampling scheme and these can be completely re­
in the separation of these profiles. analysed to obtain new configurations of points, which can then be related to
Further examples of the use of bootstrapping in correspondence analysis the original one by Procrustes analysis (Schonemann and Carroll, 1970). This 1

are given in Sections 9.2, 9.3 and 9.7. is a translation, rotation and rescaling of the replicated configurations to "fit" 1

the original one. Here the Procrustes fit statistic can be used to quantify the l

closeness of the configurations, that is the external stability of the graphical


Discussion display. In sorne situations where metric scaling is used and where the i
As far as internal stability is concerned, there is nothing special about our distances are considered to have absolute as well as relative meaning, then it
jackknifing with respect to single entities. When a large number of cases is might be preferable to omit the rescaling option, so that "smaller" replicated I
'1

involved, it would make sense to perform sorne prior clustering of the cases configurations are displayed as such. In order to investigate the internal
and then investigate the stability with respect to omitting each cluster. stability of a display obtained by multidimensional scaling, each replicated

Because the most outlying cluster of points is likely to cause the most configuration of (I - 1) points can be fitted, as described aboye, to the

instability, this would be another way of identifying outliers. In gradient corresponding (I - 1) points of the original configuration.

analysis in ecology, where the number of samples can be extremely high,


an initial clustering is routinely used to "trim off" small subsets of outlying Statistica/ inference on the canonica/ corre/ations
samples (Gauch, 1980, 1982). What we are proposing here is a more specific
investigation of the effect of each of these subsets on the stability of the ecolo­ The aboye investigation of the stability of a correspondence analysis is much
gical gradient, say (usually the first principal axis). The outlying samples are less formal than the conventional approach to the statistical properties of the
not necessarily causing large instability of the derived gradient, and it is analysis. In the literature more formal attention has been paid to the
desirable to know how much they really "perturb" the analysis. A slightly distribution theory and inference concerning the principal inertias of a
different approach is described by Wold (1978), who randomly partitions the contingency table (i.e. the squares of the canonical correlations).
data matrix into groups of elements. Each group is omitted in turn and then We have already seen (for example in Section 4.1.6) that the usual
imputed in a cross-validatory fashion in order to determine how many axes chi-square statistic for testing independence can be partitioned as X2 =
are necessary to describe the "systematic" part of the data. n(A l + A2 +... + AK), where n is the total of the table and Al'" AK are
In our study of external stability of a display, we think of bootstrapping as the complete set of principal inertias (i.e. squared canonical correlations
a random redistribution of mass amongst the sample units. This random pi ... P1d· In the original version of Kendall and Stuart (1961) it was
process might from time to time yield an unusual redistribution. In Fig. erroneously stated that each "component of chi-square" nAk ( = npí) followed
8.3(b), for example, a replicate of column point 3 is seen to be quite different an asymptotic chi-square distribution. Lancaster (1963) subsequently demon­
from the others, indicating an unusual bootstrap replicate. In other words, strated that this was incorrect and would lead to optimistic significance levels
the convex hull is itself unstable and, no doubt, more unusual replicates for the larger canonical correlations and pessimistic leveIs for the smaller
would arise if further bootstraps were performed. It appears that a much ones. To this day there is still a great deal of misunderstanding of the
larger number of bootstrap replicates needs to be generated, followed by a distribution of these particular canonical correlations. In a recent paper by
"peeling" of the convex hulls of each group of replicated points, that is the Kaiser and Cerny (1980) it is suggested again incorrectly that their variances
removal of the outer convex hulls, resulting in a more stable convex hull (for be computed ES (1 - pí)2 In, which is the asymptotic result given by Lawley
a description of convex hull peeling, see Green, 1981). (1959) for normally distributed data. Kshirsagar (1972, Section 9.6) makes a
Our proposed strategy for relating the replicated matrices (Le. the repli­ similar suggestion, but simulations demonstrate this to be an unsatisfactory
cated sets of point vectors) to the original matrix is to fix the geometry of the approximation (Lebart et al., 1977).
latter as a framework for viewing the former. This strategy is only applicable The correct asymptotic theory relies on the multivariate normal approxi­
220 Theory and Applieations ofCorrespondenee Analysis 8. Special Topies 221

mation to the multinomial distribution of the IJ elements of the contingency standard co-ordinate vectors <P(k) and 1(k) (the kth columns of cj) and r
table as their total n increases. Thus, in the simplest case when independence respectively). Formulae for the 2nd-order moments are given by ü'Neill
does in fact hold, that is the theoretical canonical correlations are zero, the (1980, 1981) and are cited in Example 8.8.1, as well as an illustration of
asymptotic distribution of the components can be shown to be the same as their application to a small contingency tableo Notice that, unlike the
the distribution of the eigenvalues of a central Wishart matrix variate case of normaHy distributed data, the sample canonical correlations are not
W J _ 1(1 - 1), that is of order J - 1 and with 1 - 1 degrees of freedom, asymptotically uncorrelated.
assuming again that J ~ 1 (this result was first given by Lebart (1975, 1976) In the case of Table 3.1 we can test the significance of the first principal
and Corsten (1976). For the definition ofthe central Wishart distribution, see inertia Al = 0.0748 by comparing the X2 component npi = 193Al = 14.4
Anderson, 1958). The critical points for certain eigenvalues are tabled, for with the upper percentage points of the largest eigenvalue of a (J - 1) = 3­
example, by Clemm et al. (1973), who give P = 0.10, 0.05 and 0.01 points of variate Wishart matrix with (1-1) = 4 degrees offreedom. The critical value
the largest and smallest eigenvalues of Wishart matrices up to an order of 20. at P = 0.05 is 15.24, so that the significance level is not less than 0.05. By
In other words these tables can be used for 1 x J contingency tables where contrast the (incorrect) conjecture of Kendall and Stuart (1961) that npi be
min {I, J} is not greater than 21. Table 51 of Pearson and Hartley (1972) gives asymptotically distributed as X2 (5) would yield the optimistic significance
P = 0.05 and 0.01 points only for the same statistics for matrices up to an level of P < 0.01.
order of 10, and also interpolation formulae for P = 0.10, 0.025 and 0.005. Another test, proposed by Bock (1960) and used extensively by Nishisato
Lebart (1975) gives approximate tables at significance level P = 0.05, based (1980) is derived from Bartlett's X2 approximation to the likelihood ratio test
on Monte Carlo studies, for the first five eigenvalues of a contingency table for testing canonical correlations when at least one set is multinormal
of order 1 x J not exceeding 100 x 50, as well as for the associated percentages (Bartlett, 1951), although the aboye authors gloss over this condition. Under
of inertia (we discuss the testing of the percentages of inertia at the end of this the independence model, the test statistic for the kth canonical correlation is
section). ü'Neill (1981) also gives the means and variances/covariances of all - {n -1 - t(l + J - 1)} 10ge(1 - p~) which is compared to the X2 (v )-distribu­
the eigenvalues of Wishart matrices up to an order of 4. tion with v = 1 + J - (2k + 1). In our example aboye with k = 1, pi = 0.0748,
The asymptotic theory when dependence exists is given by ü'Neill (1978a, the statistic is evaluated as 14.6 and, using the critical points of the X2 (6)­
1978b, 1980, 1981), and the following is a summary of his results. When K* distribution, would be judged significant at P < 0.025, another over-optimistic
of the true canonical correlations are non-zero and distinct, so that the result.
remaining K - K* are zeros, we know that the bivariate probabilities In practice we would personally interpret these significance leve1s, inc1uding
(elements p¡j of the true correspondence matrix) can be expressed as the those provided by the correct asymptotic theory, with due caution, especially
decomposition: when on the borderline of conventional "significance", for example P ~ 0.05.
We prefer to note the groupings and separations of the rows and columns of
Pij = r¡ci l + '1:.~"(Ak)1/2cP¡djk) (8.1.15) the data matrix, which can then lead to formal hypothesis testing, if
that is the reconstitution formula (4.1.27) expressed in terms of the theoretical necessary. For example, Fig. 8.2(b) suggests that columns 2, 3 and 4 be
standard co-ordinates. Here the notation - indicates theoretical values. The grouped-the X2 statistic on the resultant 5 x 2 matrix is then computed as
cases k > K* and k ~ K* are treated separately. The components nA k , X2 (4) = 13.9, which is significant at P < 0.01. In the context of the example,
k = K* + 1 ... K, are distributed asymptotically as the roots of a central this grouping is a sensible one in that it brings together all the "smokers" into
Wishart W K-K. (1- J + K - K *) only if the following condition is satisfied : a single category (see also the discussion in Section 9.3).

for aH s, t, u, v > K*: '1:.~"(Ák)1/2('1:.¡cP¡scPitcP¡kr¡)('1:. /ljuijvYjkcj) = O (8.1.16)


Significance of the percentages of inertia
Such a condition is only of theoretical interest and the asymptotic distribu­
tion appears to be used notwithstanding the possibility that the condition The eigenvalu::s of a central Wishart matrix divided by their sum (i.e. the
might not be satisfied. When k ~ K*, the variable n l/2 (pk - Pk), where percentages of variance) are distributionally independent of their sum (Lebart,
Pk = (Ak)1/2 is the theoretical canonical correlation, is asymptotically normal 1975). This result carries over, although only by approximation, to the
with zero mean and 2nd-order moments depending on both the true principal inertias relative to the total inertia (i.e. the percentages of inertia) in
canonical correlations and the 3rd- and 4th-order moments of the theoretical a correspondence analysis. Thus, even if the usual X2 test on the total inertia
222 Theory and Applieations ofCorrespondenee Analysis 8. Special Topies 223

does not reject independence of the rows and columns of a contingency table, This attempt at "correcting" the geometry of the row profiles is dual1y
the major percentages of inertia might still be significantly high. Conversely, justifiable in the geometry ofthe column profiles, as discussed in Section 9.8.2.
when the hypothesis of independence is rejected, it may be that the per­ In the application of Section 9.9, involving gene frequencies in a set of
centages of inertia are not significant, implying that correspondence analysis human populations, there is again sorne arbitrariness in the choice of
is a poor "model" of the row-column dependence. populations. This can be partially eliminated by a prior down-weighting of
Lebart et al. (1977) give curves which serve as approximate critical points samples from the same or similar populations, as described in Section 9.9.4.
(P = 0.05) for the five largest percentages of inertia under the null hypothesis A number of reweighting schemes often suggest themselves, in which case
of independence. it is advisable to try them al1 out and observe which features in the resultant
graphical displays are stable across the difIerent analyses. In our experience 1

the first few principal axes are often quite stable with respect to reweighting.

8.2 REWEIGHTING AND FOCUSING Ir a certain principal axis is observed to be quite difIerent when a new set of

masses is assigned, then this would strongly indicate that the interpretation
The masses assigned to the row and column profiles are a distinctive feature should proceed with great caution (cf. Sections 9.8.4 and 9.9.4).
of correspondence analysis. When the data are in the form of a contingency
table the masses, proportional to the row and column sums of the table, are
Reweighting of measurement data
the "natural" ones to use in this context. In general data analysis, however,
the masses can be varied in order to explore difIerent features of the multi­ The application of correspondence analysis is readily extended to data which
dimensional point clouds. are measurements of positive physical quantities like rainfall, height, chlorine
We have already come across the concept of reweighting the points, that is content, etc. (These are often called "ratio" variables because they have a
allocating new masses to them, in various situations. In Chapter 5 we "zero-point" (origin) which is relevant to the study, so that relative values are
discussed the inertias of discrete variables and showed (in Example 5.5.4) how important, as opposed to "interval" variables, like time and temperature,
these could be modified in a very simple fashion by pre-transforming the where difIerences are important.) Rere we meet two situations, first where a
indicator matrix so as to assign difIerent masses to each subset of columns. In group of variables is at least measured in the same units (homogeneous
Chapter 7 we discussed the analysis of a cloud of points and a set of subclouds measurements), secondly where the variables are in different units (hetero­
(Le. a partition of the points) where we investigated difIerences between the geneous measurements).
subclouds by removing the masses from the individual points and concen­ In the case of homogeneous measurements, correspondence analysis can
trating these into their respective subcloud's centroid. In this section we wish often be applied to the raw data, as in the application of Section 9.10 where
to develop these ideas a Httle further and also introduce a general strategy in the data are measurements on fossil skulls. Notice that the analysis displays
correspondence analysis which we callfocusing. difIerences in the shapes (Le. profiles) of the skulls, while the total of a skull's
measurements, interpreted as a type of "size" quantification of the skull, is
absorbed into the mass of the skull profile.
Reweighting to obtain prescribed masses
When the measurements are heterogeneous we are faced with the question
In many situations data are collected in a deliberate and somewhat arbitrary of assigning comparable units to the variables, which in correspondence
manner, not according to sorne elegant sampling scheme. We can attempt to analysis is a problem of reweighting the variables (cf. our previous discussion
correct the imbalances and poor representativeness of a data set by reweight­ which was more concerned with reweighting the observational units or
ing the set of points which is of primary interest. "subjects", e.g. the wildlife regions, human populations). Ideally this is a
The application of Section 9.8 is a good example of such data, which question to be dealt with by the specialist involved in the study, who has
consist of frequencies of antelope tribes in a set of African wildlife regions experience in the accuracy of his measurements and their particular levels of
(Table 9.13). Rere it is decided to reweight the regions so that their masses "significance" (Benzécri, 1977c). Thus a pollution expert might put on a
are proportional to their respective total antelope frequencies per unit area. comparable basis Sj parts per million of chemical j and si' kg of industrial
This is done quite simply by dividing all the frequencies in the matrix by the wastej' from a certain factory, in which case we would reweight the raw data
surface area of the respective region, prior to the correspondence analysis. for variablesj andj' by dividing their respective values by Sj and Sj" Because
224 Theory and Applieations ofCorrespondenee Analysis 8. Special Topies 225

of the inevitable degree of arbitrariness in such a procedure we would again In a large survey there are usually many ways of partitioning the data both
suggest that difTerent schemes be attempted to reveal stable features in the row- and column-wise. For example, the respondents may be categorized
resuiting displays which do not depend on the particular masses assigned to according to a number of demographic variables-age, sex, income group,
the variables. If a range for each Sj can be adopted then random weights etc.-which we call the secondary (or background) data. The primary (or
within these ranges can be assigned a number of times and a picture of the foreground) data, often expressions of opinion, are used to represent each
uncertainty induced by the weighting scheme can be built up, in much the individual as a point in multidimensional space. Each secondary variable
same style as the bootstrapping of Section 8.1. defines a partition of the respondents (usually the rows of the data matrix)
and thus a set of group centroids. Combinations of secondary variables
provide finer partitions of the rows and centroids of smaller subgroups,
Reweighting to obtain prescribed inertias
for example the average response of young males in the highest income
In the absence of a prescribed system of masses for the possibly unrepresenta­ group. The choice of how finely we wish to partition the data will depend
tive or heterogeneous rows and columns of a data matrix, as described above, on how much data are available and the depth of interpretation we want to
we might derive the masses so that the inertias of the displayed points have pursue.
certain relative values. The only analogy in conventional statistical analysis All of these centroids exist as supplementary points in the space of the row
is the standardization of heterogeneous variables to have unit variance. In the profiles and can be projected onto any principal subspace of the row profiles.
present situation we might want to equalize the inertias of each point or In this way a contrast of opinion along an axis may be found to associate
equalize the inertias of groups of points. Thus, if a study involves L groups of quite strongly with income, say, by observing that the supplementary points
variables, where each group is internally homogeneous, but where variables representing the income group centroids line up in their natural order along
of difTerent grou~s are heterogeneous, then we could reweight each group so this axis. Notice that if the rows and columns are in standard and principal
that the inertia of each group is equal, or proportional to sorne prescribed co-ordinates respectively then the centroids occupy the same positions as the
value. In this way heterogeneous groups of variables can be standardized columns of the indicator matrix which expresses the partition, where these
against each other while conserving the common measurement unit within columns are supplementary points in the column space of the primary data
each subset. (cf. Table 5.2 and Example 5.5.1(c)).
Unfortunately the derivation of such masses cannot be achieved in a closed We can focus on specific between-group difTerences at any chosen level of
form solution, except in special cases like the multivariate indicator matrix partitioning by analysing the respective set of group centroids. Computa­
(Example 5.5.4), where the difTerent groups of points have a common tionally this is done by analysing the primary data matrix condensed, by
centroid. In general the centroid of the display changes with every new set of simple addition of the respective rows, into the groups of interest, as
masses and an iterative procedure is required to solve the problem. Benzécri described in Section 7.1. This leads to the mass of the group being the sum of
(1977c) gives a detailed description of the problem and its solution, while a the masses of its members. This is usually desirable unless certain groups are
companion paper by Hamrouni and Grelet (1977) describes a computer incorrectly represented in the sample of respondents, in which case the masses
program to perform the calculations. may be modified before analysis according to external information.
All of the above applies in a symmetric fashion to the partitioning of the
columns of the primary data matrix. In a survey of the consumption of
Focusing on poims
beverages, for example, respondents give a detailed account of the various
Focusing is a term which we use to describe the general process of reweighting alcoholic and non-alcoholic drinks they consume in a typical week. These
a cloud of points to satisfy certain objectives. In the above discussion, for may be partitioned (i.e. condensed) in various ways depending on the level of
example, our view of a set of heterogeneous measurements is incorrectly detail required in the exploration of the data.
focused if certain variables are more influential merely because they have This strategy provides the data analyst with a very flexible technique for
higher values, so we try to focus more equitably on all the variables by focusing attention on difTerent multidimensional features of the point cloud.
reweighting them. When points are grouped we have shown in Chapter 7 how In any single analysis the principal subspaces are determined solely by the
to shift the focus from between-point difTerences to between-group difTerences points with non-zero mass, while all other points can be projected onto the
by transferring the mass of the points to their respective group centroids. subspaces to enhance the interpretation of the display.
226 Theory and Applieations ofCorrespondenee Analysis 8. Special Topies 227

Focusing on subspaces b.. tl.e' 0.4333 (26.5"1.)

-
4
If the original row profiles of a matrix are points in a K-dimensional n.. ~ 8 m

: ~.
space and if we focus on the centroids of H groups of points, then we are
actualIy restricting our investigation to an (H -1 )-dimensional subspace.
't
.~f
'.. 0.5 I
Each original point can be expressed as the sum of two components, one in
this centroid subspace and the other in the (K - H + 1)-dimensional subspace
orthogonal to it. The latter subspace is often calIed the orthogonal complement, , I '. '. I f :,",.,..,
, _5 M3~
complementing in this case the centroid subspace.
J..
There may be occasions when we deliberately focus on a subspace of points _6
in order to investigate the dispersion of points in the orthogonal complement _9
.. a ood9
of this subspace. F or example, in an opinion survey we might be interested in 2­
the dispersion of opinion which is in sorne sense unassociated with the sex of .. e
the respondent. If we gradualIy transfer mass from the points to the two
group centroids representing males and females, then the first principal axis _3
will eventually be forced to líe almost exactly through the centroids. The idea • ond k.._ 7
is to leave some mass with the original points so that the remaining principal
axes can still be determined. Dispersion of the respondents along these axes
..o
will be orthogonal to the male-female difference vector. The same result can
be achieved by orthogonalizing the centered profiles with respect to this .. 1
vector when calculating principal axes, but the advantage of focusing is that
the computations are a standard application of correspondence analysis to
the reweighted data. A further advantage is that we can easily focus on FIG.84. 2-dimensional correspondence analysis of Table 8.1. showing the
horseshoe pattern of the rows and the columns.
subspaces of points in a similar fashion. For example, if there are four income
groups then we can focus (almost exactly) on the subspace of the four income
group centroids, which might be 1-, 2- or 3-dimensional, and then study
the dispersion in the orthogonal complement of the subspace which is un­ correspondence analysis as well as other multidimensional scaling techniques.
associated with income differences. Figure 8.4 shows a typical example when the ecological presencejabsence
In Section 9.5 we discuss a set of examination marks and focus the first data ofTable 8.1 are analysed by correspondence analysis. In each display the
principal axis on the vector of total marks of the students so that information positions of the sites describe a parabolic-shaped curve, hence the term arch
which is uncorrelated with the total mark can be displayed and interpreted. or horseshoe. In Fig. 8.4 the set of species also describes a horseshoe pattern.
A number of practical issues are treated in this application, for example how The third principal axis reveals yet a further non-linear relationship and the
the focusing affects the calculation of inertias and percentages of inertia. points plotted with respect to axes 1 and 3 describe a cubic-shaped curve
(Fig.8.5).
Before attempting to explain this phenomenon let us first observe that if
8.3 HORSESHOE EFFECT the rows and columns of the data matrix are re-ordered as they appear on the
first principal axis of Fig. 8.4, then a diagonal band of positive mass is
Even though the principal axes are orthogonal to each other in a linear sense revealed (Table 8.2). This conforms to a model of a single underlying gradient
they can still be approximately related to each other in a non-linear sense. which simultaneously orders the sites and the species in this way. For
The prime example of this phenomenon is the so-called horseshoe effect (e.g. example, if we suppose that this gradient is rainfall, sites 3 and 2 at one end
Kendall, 1971), alternatively known as the arch (e.g. Gauch, 1982) or Guttman of the horseshoe might be in low rainfall areas, characterized by species
effect (Benzécri, 1973), which is often observed in the displays resulting from o, c, a ... which prefer drier conditions, while sites 9 and 7 at the other end
228 Theory and Applieations of Correspondenee Analysis 8. Special Topies 229

TABLE8.1 TABLE8.2

Presence/absence of 15 plant species at 1O sites (1 indicates presence).


Data matrix of Table 8.1, with rows and columns rearranged in terms of their
ordering on the first principal axis of the correspondence analysis (Fig 84)
Species
Species
Sites a b e d e f 9 h i i k / m n o

Sites o e a 9 i f I m b n d h k e

1 1 1 1 1 1 1 1 ­
2 1 1 1 1 1 3
3 1 1 1 1 1 2

4 1 1 1 1 1 6

5 1 1 1 1 1 1 1 5

6 1 1 1 1 1 1 1

7 1 1 1 1 1 8

8 1 1 1 1 1 1 1 10

9 1 1 1 1 1 4

10 1 1 1 1 1 1 1 9

receive higher rainfal1 and are consequentIy characterized by the presence of


X3= 0.18961 (116%) scale

different species (1, e, k, ... ). In the present case where the data are either Os or
.. 1
I 0.5 ls, a matrix in the form of Table 8.2 is cal1ed a Petrie matrix, after the
archaeologist Flinders Petrie (cf. Kendal1, 1971). When the rows and the
columns of a data matrix can be permuted to obtain a Petrie matrix, it is easy

.
.7
i..
. .1
f
..
5 6

.2

to show that the first principal axis of correspondence analysis will provide
that correct ordering. This can be proved for a more general Petrie matrix of
non-negative numbers where from row to row the mass in the row profiles
e and k j.. a..and Q shifts monotonical1y, say from the left to the right. Equivalently, this is a
8
XI = 0.7887
-------e
.10 1m .e --'--*48.3%) matrix where the mass in the column profiles shifts monotonical1y from top
to bottom as we move across the columns.
9
• .
h .. d .. b
The existence of the horseshoe is more easily visualized in the case of
.• 4
n
correspondence analysis, where the row and column profiles are represented
in barycentric co-ordinates. F or example, if there are just 3 columns, the row
profiles aH lie within a triangle whose apices represent profiles which are
concentrated entirely in one of the columns (cf. Section 3.4). Thus, as we move
.3
down the rows of the matrix depicted in Fig. 8.6(a), the row profiles trace a
path which starts near the apex representing the first column, curves around

r ..0
as it passes that of the second column and final1y heads towards the third
column (Fig. 3.6(b)). The row profiles of a similarly patterned matrix with 4

columns would aH lie within a tetrahedron. The 3-dimensional path of the


profiles as we move down the rows must migrate from column 1 (i.e. the apex
F IG. 8.5. Display with respect to pri ncipal axes 1 and 3 of the correspondence representing column 1), curve in the vicinity of column 2 and then near
analysis of Table 8.1, showing the cubic pattern of the rows and the columns. column 3, and finaHy head towards column 4. To enable the visualization of
8. Special Tapies 231
v Q) '" Q) e
..eQ)..e0
+-' u+-'._
O '0. >- tí this path we show the projections onto two faces of the tetrahedron in Fig.
+-''0=(1)
-
(j)..e:::J0
ro'~ 8.6(c). The two curves of the path usually reveal themselves along the third
(J.-=:' C/') o.. principal axis of the points.
,,~:::JQ)
~ 2 Another way of explaining the existence of the horseshoe is in terms of the
..e
ICU)O
Ol 0-5
e
distances between points (e.g. between sites in the example of Table 8.2). For
~.~.~ ~ example, the similarity (and thus the distance) between sites 2 and 4 is
-+-'(1»
o.Q)XO
rt)
o..e ro..e
+-'+-'M(J)
identical to that between 3 and 7, since each pair has five species present at
Q) e ",­ each site with no common species. The analysis is thus trying simultaneously
e
-E0~c
E
E''=' - E to represent the distant pairs of sites (e.g. 3 and 7, 2 and 7, 3 and 9, etc....) as
:::J
'O 0'==0.2 equidistant, and the neighbouring sites (e.g. 4, 9 and 7) as progressively
u "::E..e°
'" u u dissimilar. This results in a bending of the positions of the sites as the
e "''''ro'<t
ro:c Q)..e
E E - e .'= extremes are pulled in like a bow. This argument applies to a lesser degree to
:::J
ooo~g the intermediate sites as well, resulting in further twistings of the site positions
'O
e u -o ~ -:S .~ -O
C_O';:;<D in multidimensional space.
E ro \.i= e ro.r.
:::J .!Je.!!!E~ When there are two underlying gradients the situation becomes even more
o..~ ~ :§
u "8 rt) g'
''¡:; ~ Q) 3=E..e complicated and it is possible that the horseshoe effect of the stronger
e ~ O
..c en._ gradient is sufficientiy important to dominate the weaker gradient, which is
E lt.... +-'
C/')(1)+-,V>,+­
:::J >-..e ro ro O itself only revealed along a subsequent principal axis or even completely ob­
'O =+-'..c,+-V)
u ro >- - O Q)
:::J.!J Q)", U scured by the twistings ofthe stronger gradient (Austin and Noy-Meir, 1971).
~ 5·g ~ 2 This is a problem which has plagued ecologists ever since it was anticipated
0,0000
ro"Z~~ by Goodall (1963). In artificial examples of sites spaced at regular intervals
Q) ­
~u~-:So on a single gradient, usually assuming sorne type of Gaussian abundance of
-ro"''+-+-'
~;:;~o§ each of a set of species along the gradient, the sites and species are, at least,
.~.t= e Q) U)

ro
E
roo. .Q
- 0.'­
~ ~ correctly ordered by a 1-dimensional correspondence analysis (reciprocal
~-o~~'§ averaging). However, there is a gradual bunching up of site positions at the
C\I ~~(/)~o.
e extremes where the horseshoe is steepest and this does not concord with their
Ol~.~a; ~
§ 4 <--o .~ C/) ~U)..c e assumed regular spacing. Valiant attempts have been made to straighten out
~d>+-,~Q)
'O o o .~ -:5 a; the horseshoe (Swan, 1970; Williamson, 1978; Hill and Gauch, 1980). The
u
..e..e0l- .....
U:~o.._o
approach of Williamson (1978) concentrates specifically on the fact that the
c ­
usual interpoint distance functions do not include knowledge of the chaining
v>U)+-'ü
..c e lt....·

e E ..e0:::J
_ '<t.

:::J • . between distant pairs of sites that is so obvious from the Petrie matrix of
E -Q),,(")

:::J O t: =
Q) e
Table 8.2. He thus proposes that the distances between unconnected (or
"8 u,
M-etI·­
..c..out)
O
almost unconnected) pairs of points, that is pairs of points at approximately
~Q)
.~ == ~ (/) the same maximum distance apart, be recomputed as the sum of the distances
Q) ro e over a few intermediate linking points. This method, appropriately called
x u~.-
'5 -o en-o "step-across", also includes the use of the so-called "city-block" distance
------II~~:
ro e Q) Q)
"",ro=.!J
------ ~ .... .:. . ¿..c O '¡:: function to evaluate distances between points which are linked. This concords
--.. I lt.... U
.:~: '" en +-' Q. en
with the requlrement that the distances be additive along the single gradient.
------ .,' .. ~..e
CJ).~
Q)
-o
------ :.:'.=; .:. . '¡;::
<OE:::Jro
e (J) Effectively the distant pairs of points which previously had zero similarities
~
COOQ)<D are assigned negative similarities, these being increasingly more negative as
e d~:5E
-O+-'CO the similarities between the linking sites decrease. This filters through as
u...!Jro",
negative eigenvalues in the principal co-ordinates analysis which is used to
232 Theory and Applieations ofCorrespondenee Ana/ysis 8. Specia/ Topies 233

map the similarities, but these are usual1y too small in absolute value to be constraints on the solution, since the points are customarily displayed in
problematic and the horseshoe is largely eliminated in the principal subspace. principal co-ordinates as the orthogonal projections of the fixed row and
A similar strategy in the context of multidimensional scaling is to place more column profiles onto optimal1y fitting subspaces. (The principal axes them­
emphasis on local structure (i.e. larger similarities, equivalently smal1er selves require identification conditions but we are usual1y more interested in
distances) and treat al1 zero similarities as missing values in the process oC the display itself.) In the other forms of the analysis, for example reciprocal
fitting the data to a Euc1idean representation (cf. Greenacre and Underhil1, averaging, the scale values (co-ordinates) are unidentified and identification
1982).
conditions are imposed to obtain a unique solution. As iIIustrated in Table
The method cal1ed "detrended correspondence analysis" (Hill and Gauch, 4.2, two constraints on either set of scale values are sufficient to identify the
1980; described also by Gauch, 1982) is attractive in that no specific solution. These are usual1y in the form of a mean centering and a variance
functional form of the non-linearities is assumed. The iterative algorithm standardization or the fixing of two scale values.
itself which performs the reciprocal averaging is modified to eliminate not In certain circumstances the data analyst might require that the scale
only linear relationships with higher order axes, but also non-linear relation­ values satisfy another set of conditions which actual1y change the domain in
ships of a fairly general nature. This process of "detrending" involves the set which the optimal solution is to be sought (in optimization theory this
of species only (the applications being chiefly in community ecology), while domain is cal1ed the feasible region). In this section we shal1 briefly discuss
the sites are the usual barycentres of the detrended species positions. In the two different types of constraints, both of which usually lead to different
process, however, control over the geometry is lost and it is possible that, just solutions to those obtained by a "standard" correspondence analysis.
as the non-linearities can mask less important gradients, so the detrending
might introduce further artifacts into the results. Our personal view is that
correspondence analysis and other scaling techniques which rely on data on Endpoint constraints in mu/tip/e correspondence ana/vsis
pairs of points are basical1y unsuitable for the quantitative identification oC Healy and Goldstein (1976) raise the question ofidentification constraints in
more than one such gradient in data of high diversity. Specific gradient the context of optimal scaling ofthe data ofTable 8.3. In our terminology this
models with relevance to the particular application (e.g. ecology) seem more is a Burt matrix of order J = 9, where 3 discrete variables (Q = 3) have 3
appropriate here, yet these should be considerably more flexible than the categories each (J q = 3, q = 1 ... Q). The object of these authors is to derive
usual artificial examples of Gaussian abundance curves.
Otherwise the horseshoe pattern often crops up in the results of a
correspondence analysis without causing too much concern (see, for example, T ABLE 8,3

the graphical displays in Sections 9.1, 9.3, 9.4 and 9.7). Since we are in an Matrix of frequencies of responses by 12232 mothers of 11 -year-old children to

exploratory framework we can see no reason not to interpret the positions of three questions: (A) Does the child destroy its own or others' belongings 7 (B)

points along an approximate curve rather than on a straight line. AIso, as in Does the child fight with other children 7 (C) Is the child disobedient at home?

There are 3 categories of response: (1) Never; (2) Sometimes; (3) Frequently (see

Fig. 9.7, the curved pattern of the points can enrich the interpretation when Healy and Goldstein, 1976. Table 1)

there are sorne points on the concave and/or convex sides of the curve.
It is of theoretical interest that for certain ideal examples of data in the Response categories
form of a continuous two-way table, the functional form of the co-ordinates Al A2 A3 Bl B2 B3 Cl C2 C3
along principal axes of a correspondence analysis can actual1y be derived.
Example 8.8.2 deals with an ideal Petrie matrix and derives the exact Al 11440 O O 5923 5134 383 5957 5254 229
quadratic relationship between the first and second principal axes. We have A2 O 667 O 143 440 84 135 468 64
not seen similar proofs in the context of any other scaling technique. A3 O O 125 22 62 41 18 70 37
Bl 5923 143 22 6088 O O 3896 2111 81
B2 5134 440 62 O 5636 O 2084 3387 165
B3 383 84 41 O O 508 130 294 84
8.4 IMPOSING ADDITIONAL CONSTRAINTS ON THE DISPLAY Cl 5957 135 18 3896 2084 130 6110 O O
C2 5254 468 70 2111 3387 294 O 5792 O
In its geometric form correspondence analysis does not usual1y require any C3 229 64 37 81 165 84 O O 330
234 Theory and Applieations ofCorrespondenee Analysis 8. Special Topies 235

scale values (which they call "attribute scores") for the 9 categories so that TABLE 8.4
disagreement within subjects across their respective attribute scores and their Relationship between the first principal co-ordinates of the correspondence
analysis of Table 8.3 and the "attribute scores", under quadratie eonstraints, of
(weighted) average score is minimized. This is the internal consistency
Healy and Goldstein (1976, Table 2).
definition of dual scaling defined in Section 4.3 and the option of weighting is
a standard re-assignment of masses to the variables of the Burt matrix, Correspondence Healyand

equivalently to the variables (i.e. groups of columns) of the underlying analysis Goldstein's

indicator matrix. co-ordinates seores

Healy and Goldstein first consider a mean centering and a quadratic


constraint in order to identify the optimal solution. In the notation of Section
0.115 o O
-1.415 1.530 100
19.7
4.3 their quadratic constraint is a fixing of the quadratic form SSb to be equal -2.940 3.055 x 7.755 39.4

to 1, and their objective is to minimize SSw (cf. (4.3.6) and (4.3.7)). It follows 0.418 O O

that their solution should be equivalent to the optimal 1-dimensional -0.294 reeenter to have 0.712 identify max. 9.2

solution by correspondence analysis, that is the co-ordinates of the points on -1.726 O seo res for eaeh 2.144 seore of 100 27.6

0.424 category"never" O with sum of three O

the first principal axis in Fig. 8.7. Table 8.4 shows how to pass from these -0.325 0.749 eategories 9.7

co-ordinates to their attribute scores, each subset ofwhich has been translated -2.132 2.556 "frequently" 33.0

so that the lowest category "never" of each variable has score O, followed by
a uniform rescaling of all the scores so that the sum of the three highest
category scores gives a total of 100. a quadratic programming problem (e.g. see Walsh, 1975), which leads to a
Unfortunately, these adjusted scores in the last column of Table 8.4 are no system of linear equations, not an eigenvalue/eigenvector problem as before.
longer optimal, although they are perhaps more readily interpretable. This is The subsequent optimal solution, transformed in the same way as in Table
because they have been obtained from the optimal co-ordinates by more than 8.4, is quite difTerent from the one under the quadratic constraint. In this case
a mere overall centering and rescaling. It is clear from Fig. 8.7 that the we know that the optimal scaling of the 9 categories (under quadratic
re-centering of individual variables to have the same bottom endpoint must constraint) necessarily satisfies a number of additional properties, namely
make the solution sub-optimal-the categories of A are displaced towards that each of the 3 subsets of categories are individually centered and
the left, away from the almost perfect association between variables B and C. standardized (cf. relationship to dual scaling in Sections 5.1 and 5.2), even
Next, Healy and Goldstein consider optimal solutions under linear con­ though these have not been imposed a priori. The solution under the
straints which fix the "lowest" and "highest" points of the scale. This' is now endpoint constraints does not (necessarily) satisfy similar conditions, even
under a simple translation and rescaling ofthe scores together, so in this sense
A3 ' , ' O.16~O 1(23.1%)
it is not surprising that difTerent scores will result. This illustrates the fact that
• "a particular scoring system can be almost as much determined by the
\

\
constraints imposed as by the data on which it is based" (Healy and
\ C.3
Goldstein, 1976), which is a peculiarity of the scaling of multivariate indicator
\ matrices.
\ 8

\ ~\
\ •• Order constraints
\
),,-0.2412
The subject of optimization under order constraints is a vast topic in the
.-
A2 ­
82
leale
(33.8%)

literature. Jn the context of correspondence analysis, the analyst might


require the solution for a set of co-ordinates (scores) to be ordered in sorne
10 pre-specified way. It is clear that the imposition of such additional constraints
FIG.8.7. Optimal 2-dimensional display, by correspondence analysis, of the Surt must reduce the feasible region of the problem and thus lead to a solution
matrix ofTable 8.3. which is less optimal than the unconstrained one. However, this loss might
236 Theory and Applieations ofCorrespondenee Analysis 8. Speeial Topies 237

wel1 be minimal, while the gain in the interpretative value of an ordered present situation is complicated by the fact that the non-zero weights depend
solution might be large. on the row and column margins of the matrix, which are unknown unless the
Nishisato and Arri (1975) and Nishisato (1980, Section 8.1) discuss this missing values are known or are estimated.
subject, as wel1 as the special algorithms to obtain solutions of the dual
scaling problem under order constraints. Heiser (1981) and Gifi (1981) also
Imputing the missing values by iteratian af the carrespandence analysis
treat this problem at length. Historical1y, the subject was first deaIt with by
Bradley et al. (1962), and its development has been very much related to that We shal1 discuss an alternative way of tackling the missing data problem,
of non-metric scaling. originating in the work of Mutombo (1973) and Nora-Chouteau (1974).
Here the available data in the matrix "impute" the missing values, using the
same graphical "model" of the data as the analysis itself. The term "impute"
8.5 TREATMENT OF MISSING DATA means to ascribe and is often used in the context of missing data estimation.
This strategy is in the spirit of the "E-M algorithm", reviewed by Dempster
Missing data are relatively easy to handle, at least in principie, in correspond­ et al. (1977).
ence analysis as welI as in the general analysis described in Appendix A. 2. In order to introduce the algorithm, let us suppose that a single value nab
Computational1y, however, the execution time of an algorithm to perform an is missing. If we knew that the data table satisfied the folIowing "model"
analysis in the presence of missing data is greatIy increased. exactIy (i.e. the independence, or homogeneity, model, when N is a contin­
gency table): nij = ni.njn.. (i.e. p¡j = r¡cJ, then the missing value nab is the
Weighting af individual elements af the carrespandence matrix only unknown of the implicit equation: nab = (n~. + nab)(n?b + nab)/(n? + nab )
(where the superfix ° indicates that summations are performed with a zero as
In correspondence analysis we have seen that a display of the rows and the (a, b )th element of the matrix). Thus nab can be directly evaluated as:
columns of a matrix in a low-dimensional Euclidean space is obtained by
weighted least-squares fitting of the row and column points. EquivalentIy, we
n ab = (n~.n?b)/{n? -(n~. +nOb)} (8.5.2)
can think of the analysis as a weighted least-squares approximation of the When the data deviate from this model, this value still solves the implicit
elements nij of the matrix. In a K*·dimensional correspondence analysis the equation and thus does not contribute to any measure of fit of the data to the
approximation ñij of n¡j is the reconstitution formula (cf. (4.1.28»: model. However, this value need not provide the closest fit of the available
data to the model. Conversely, the value which leads to the closest fit does
ñ¡j = (n¡.n)n. J (1 + r.{"·/;kgjk/Ai/2) (8.5.1) not necessarily satisfy the implicit equation, as demonstrated in Example
This is the correspondence "model", and its "parameters" (the /;k S, gjk S and
8.8.3. In any case, the data are more likely to deviate quite dramaticalIy from
AkS) are identified in the usual way:
this simple multiplicative model, in which case the dimensionality K* of
(8.5.1) as wel1 as the unknown "parameters" stilI need to be derived.
r.¡r¡/;k = r. hgjk = O, r.¡r¡Ü = r.jCjgJk = Ak, If K* were known a priori, then the E-M algorithm could provide an
r.¡r¡/;k/;k· = r. jCjgjkgjk' = 0, where k, k' = l ... K*, k f- k' estimate of nab as follows:

"Estimation" of the parameters is performed by minimizing the weighted (a) Start with an initial value for nabo
(b) Perform the K*-dimensional correspondence analysis of the complete
least-squares function r.¡r. jW¡j(nij - ñ¡Y, where wij is equal to 1/(ni. n .Jo
matrix (the M-step, or "maximization" step, of the E-M algorithm).
If sorne of the elements are missing, or perhaps fixed by the structure of the
(c) Use (8.5.1) to obtain a new value for nab (the E-step, or "expectation"
matrix (e.g. structural zeros), then the usual algorithm is no longer applicable.
step, of the E- M algorithm).
Instead, we could pose the objective of minimizing the same function as
(d) Iterate steps (b) and (c) until the new value entering the correspondence
aboye, but letting the double summation extend only over the cel1s (i,j) of the
analysis is practical1y the same as the estimate resulting from the analysis.
matrix with valid data, in other words w¡j is equal to s;)(n¡.n.J, where sij = 1
if the datum n¡j is present, otherwise zero. The general question of weighted Again this wiII mean that the (a, b )th celI of the matrix does not contribute to
least-squares fitting is reviewed by Gabriel and Zamir (1979). Notice that the the measure offit, which we know to be r.f=K.+IAt evaluated in the eventual
238 Theory and Applieations of Correspondenee Analysis 8. Special Topies 239

correspondence analysis during the last iteration. Since K* is usually un­ this yields another set of imputed values, using (8.5.1) with K* = 2. The
known, the aboye algorithm can be repeated for increasing values of K*, 1-dimensional problem must then be repeated using the updated matrix
being initialized using the value (8.5.2) which we know solves the algorithm before the next iteration in the second dimension is performed, and so on
for the "zero-dimensional" correspondence analysis (K* = O). Steps (b) to (d) until the whole procedure stabilizes. Notice that the repetition of a 1­
are then executed for K* = 1, resulting in a value which can be used to dimensional analysis for every iteration in the second dimension is usually
initialize the algorithm for K* = 2, and so on until the fit is deemed satis­ quite rapid, especial1y near convergence, because we naturally use the
factory. When there are many missing values, then the algorithm is applied in previous solution (in the first dimension) to initialize the iterations which lead
a similar fashion, with a set of values being inserted, estimated, re-inserted, to the next solution.
etc., and the only difTerence is that the algorithm for K* = Ois now iterative
as well.
Convergence of the imputed va/ues
Although the algorithm converges for a given K* in "well-behaved"
situations (e.g. a whole row or column is not missing) we have no assurance It is an unfortunate fact that the convergence of the imputed values is usually
about the uniqueness of the resultant estimate, nor do we have any knowledge very slow and it is advisable to incorporate some acceleration technique to
of the optimality of the display. This is not too serious since we view the speed up convergence (cf. discussion of paper by Dempster et al., 1977). In
whole procedure as a strategem to allow correspondence analysis to be other situations where implicit equations need to be solved, the acceleration "

performed on all the rows and columns of the data matrix, rather than as an technique of Ramsay (1975) has been found to be extremely efTective (for
optimal way of imputing the missing values. This eliminates the highly example, in multidimensional unfolding, see Greenacre, 1978), and such a
unsatisfactory alternative of omitting the rows and columns which contain technique needs to be investigated in the present contexto
the missing data. Notice, furthermore, that the principal axes are not
"additive" as in the case of complete data-the correspondence analysis for
/mputing data which are not missing
K* = 2, for example, does not necessarily contain the principal axis of the
analysis for K* = 1, although the axes can be approximately additive if the The imputation of data values has many other applications. For example, in
imputed values are similar in these respective analyses. a correspondence analysis of a complete data matrix, each datum can be
deleted in turn and then imputed. The difTerence between this imputed value
and the value "predicted" by the original analysis is an external measure of
/mputing the missing va/ues during reciproca/ averaging the datum's infiuence on the analysis and can help in identifying extreme
An alternative computational procedure is to incorporate the imputation of data, or outliers. A global measure of the difTerences between those imputed
the data into the iterative procedure for computing the correspondence values and the data themselves provides an external measure of the fit of the
analysis. Ir reciprocal averaging is chosen as the method of computation, for graphical display to the data, in the style of the jackknife (Miller, 1974).
example, the aboye algorithm can still be used to solve the problem for Cross-validation of the graphical "model" may also be performed by
K* = O. Then, for K* = 1, reciprocal averaging is applied (cf. Section 4.2), deleting random portions of the data matrix, imputing these values and
but at the end of each iteration the missing values are updated using (8.5.1), comparing them with the original data (cf. Wold, 1978). In Section 8.6 we
or the equivalent reconstitution formula if the scale values (co-ordinates) are shall treat a specific missing value problem in the case of certain symmetric
standardized difTerently. Updating leads to new row and column margins, so data matrices, where we wish to ignore the diagonal elements.
the scale values need to be recentered before the next iteration. In this way
convergence to the 1-dimensional solution as well as convergence of the
imputed values occur simultaneously. For higher values of K*, (K*-l)­ 8.6 ANALYSIS OF SYMMETRIC MATRICES
dimensional reciprocal averagings become nested within the procedure. For
example, when K* = 2, a set of values (initially, the solution for K* = 1) is The correspondence analysis of a square matrix has special properties and
inserted into the matrix and a standard 1-dimensional reciprocal averaging is merits separate treatment. Symmetric data matrices, where the rows and
performed. A single reciprocal averaging iteration in the second dimension is columns refer to the same set of objects, frequently occur in practice, for
then performed, with the usual centering and orthogonality constraint, and example, the Burt matrices of Chapter 5, matrices of correlations, associations
240 Theory and Applieations ofCorrespondenee Analysis 8. Special Topies 241

or similarities between a set of objects, the frequencies of co-occurrence of a where a and pare any real numbers. By definition we are only interested in
set of indicators (e.g. species, artifacts) in ecological and archaeological values of a and f3 which generate matrices p(o:,P) of non-negative elements and
studies, and the total traffic or migration between areas. In many studies it is not difficult to show that the set of such values forms a 2-dimensional
involving such data, the diagonal of the matrix contains sorne structural convex polygon. The following is an outline of the proof. If we define the sets
values, the maximum similarity or association between objects, or is related ofvalues:
to the off-diagonal elements or sorne other aspect of the study. We shall
S+ == {(a,f3)lp¡\~'P) ~ O}
initially consider the given diagonal values to be included in the analysis and
then discuss how we might ignore them. Notice that matrices of distances and SO == {(a,f3)lpi\~'P) ~ O, i =f i/}
dissimilarities are not considered here, unless they can be transformed by
Sd == {(a,f3)lpi(~'P) ~ O}
sorne inverse monotonic function to values which can be regarded as a
distribution of non-negative mass over the cells of the matrix and thus
then S+, the set of values of interest, is the intersection of SO and Sd. SO can
suitable for correspondence analysis.
be shown to be a cone with vertex at the point (0,1), while Sd is a convex
polygon with at most 1 sides. Their intersection is thus a convex polygon with
Direct and in verse singular vectors at most 1 + 2 sides (Fig. 8.8). El Borgi (1978) gives an algorithm and computer
program in FORTRAN to compute the convex polygon, and shows that
The eigendecomposition of a symmetric matrix A is of the form A = UD,¡U T, certain vertices of the polygon are of special interest.
where all the elements of U and D,¡ are real. If there are no negative " Rewriting the reconstitution formula (4.1.27) for the symmetric P in terms
eigenvalues (i.e. A is positive semi-definite, or non-negative definite) then the ofthe standard co-ordinates, we have:
SVD of A is identical to the eigendecomposition, whereas if there are sorne
negative eigenvalues then the SVD takes a slightly different form, since the Pw = r¡r¡, (l + I: k<¡Jik<¡J¡'k(Al/ 2 Ek» (8.6.2)
singular values are non-negative by definition. If Ak is a negative eigenvalue
corresponding to the eigenvector U k then f1k = - Ak is a singular value of A where Ek , called the parity of the axis (or dimension), has a value of either
associated with left and right singular vectors Uk and - Uk respectively. Such + 1 or -1, depending on whether the principal axis is direct or inverse res­
a pair of singular vectors is called inverse, while a pair of identical singular pectively. It is not difficult to show that the correspondence p(o:,P) has the
vectors associated with a positive eigenvalue is called direct (Benzécri .(?t al.,
1973, Volume 11 B no. 9; El Borgi, 1978). The singular values and associated
vectors will be ordered in descending order of the absolute eigenvalues. Since
lO.l)
we measure the quality of a least-squares approximation of A in terms of the
singular values it could well turn out that an inverse pair ofvectors associated
with a large negative eigenvalue becomes important.

Convex polygon of a symmetric correspondence


Let P(I x l) be any symmetric correspondence (matrix), in which case the row
and column margins are identical: r = c. D r is the diagonal matrix of these
margins and rr T is the matrix of "expected" values based on the margins, that
is the trivial rank 1 weighted least-squares approximation of P. In the present
context we call D r the diagonal correspondence (matrix) and rr T the product
correspondence (matrix). SO
Sd s+= SOn Sd
We consider the family of correspondence matrices defined as follows:
p(o:,P) == aP + PD r + (1- a - fJ)rr T (8.6.1) FIG.8.8
242 Theory and Applieations ofCorrespondenee Analysis 8. Special Topies 243

same margins as P and the same principal axes, the only difference being in the use of the convex polygon in analysing a symmetric matrix of traffic flows
the principal inertias. In fact, the reconstitution formula for p(IX,Pl is: and compares his results with those obtained by the fitting of traffic models
to the data. Even though the symmetric correspondence analysis might
p¡\~,PJ = r¡r¡,(l + r. k({J¡k({J¡'k(aAi/ 2ek+ [3)) (8.6.3)
provide an impressive fit to the data, as in this application, its "parameters"
where the square roots of the principal inertias of p(IX,PJ, afTected by their are not really interpretable like those of a sensibly constructed parametric
respective parities, are of the form aAi/ 2ek+ [3. Notice, though, that the model. We prefer to see the introduction of the parameters a and [3 merely as
ordering ofthe terms in (8.6.2) and (8.6.3) is not necessarily the same. one of the possible ways of coping with the diagonal of a symmetric matrix.
The introduction of the additional "parameters" a and [3 into the
correspondence "model" can result in a dramatic improvement in the re­ Treating the diagonal as missing data
constitution of the data. In fact, in a space of specified dimensionality K* the
weighted sum of squared residuals (or residual inertia) is: When the diagonal values are completely irrelevant to the study, then there
is no need to fit them and we can treat them as missing data. An iterative
r.¡r.¡o{P¡¡o-Pi¡Y/r¡r¡, = r.f=KO+l(At/ 2ek+p)2 (8.6.4)
algorithm to impute values in the diagonal may be used (cf. Section 8.5),
where p == [3/a (the fit to the matrix p(IX,PJ is a2 times the aboye). This is a resulting in a solution where the imputed diagonal fits the "model" exact1y
quadratic in p and the value which minimizes the fit is: and does not contribute to the residual inertia. Since each missing value is
effectively an extra parameter, this solution should be better fitting than the
p* = - r.f=KO+1W2ek/(K -K*) (8.6.5)
one obtained from the convex polygon. The latter procedure, resulting in the
the minimum obtained being: minimum (8.6.6) can still be applied to this problem as a comparison, and has
a possible advantage that the margins of the original matrix are preserved.
r. f=KO+ lAk - (r. f=KO+1W 2 ed /(K - K*) (8.6.6) The weighted least-squares fit to the off-diagonal elements can also be
In the usual analysis of P the fit is just the first term of (8.6.6) and the performed by assigning zero weights to the diagonal. This will provide the
minimum is the sum of the smallest K - K* principal inertias. In the present best fit, but the geometry of the results is no longer clear.
situation however, the minimum might very well involve a different set of
principal axes, depending on which subset of K - K* principal inertias
Burt matrices
minimizes (8.6.6),
For example (Kazmierczak, 1978, p. 215), if P has 5 non-zero principal A Burt matrix B == ZTZ, where Z == [Zl,,,ZQ] is a Q-variate indicator
inertias (all with positive parity): matrix, is a special type of symmetric matrix. Ir Z is a logical indicator matrix,
then B has Q diagonal matrices ZiZ1 ... Z~ZQ down its diagonal (cf. (5.2.4)),
Al = 0.99 A2 = 0.98 A3 = 0.03 A4 = 0.02 AS = 0.01
so that the diagonal elements are the column sums of Z. The row (and
then for K* = 2, minimum residual inertia is provided by retaining Al and A2, column) margins of B are also proportional to these sums so that the J x J
while for K* = 3, the minimum is provided by retaining A3, A4 and As, the correspondence matrix P == B/b.. has the unique property of having its
three smallest eigenvalues! (the latter mínimum is 0.1269 x 10- 4 , whereas if diagonal being proportional to its margins, in fact Pjj = cj/Q, where cj is the
Al' A2 and A3 are retained, the residual inertia is 8.579 x 10- 4 ). The inertia on jth marginal sum of P.
the first two principal axes is readily absorbed into the "parameter" p, whose In Section 5.2 the calculation of percentages of inertia in the analyses of Z
optimal value (8.6.5) is high when Al and A2 are omitted. and B was discussed and the proposal was made that these be based on the
Notice that the fit is exact when K* = K -1, that is when any one of the quantities (cf. (5.2.17)):
principal axes is omitted, since (8.6.6) is zero. Thus in the presence of p, the
a(A!) == {Q/(Q-1WV!-(1/QW (8.6.7)
dimensionality of an exact fit is at most 1 -2, compared to 1 -1 in the usual
case. This is reminiscent of the situation in multidimensional scaling where for principal inertias A! of Z which are greater than l/Q. We now show that
the inclusion of an extra parameter called the "additive constant" also these are actually principal inertias in the analysis of B with its diagonal set
reduces the dimensionality of a cloud of points by one. to zero, called the modified Burt matrix. The associated correspondence
Results (8.6.3-6) are proved by Kazmierczak (1978), who also illustrates matrix is of the form p(IX,Pl in (8.6.1), where a = Q/(Q -1) and [3 = -l/(Q -1),
244 Theory and Applieations oJCorrespondenee Analysis 8. Special Topies 245

as shown in Example 8.8.4. Hence p = -l/Q and the square roots of the 8.7 ANALYSIS OF LARGE DATA SETS
inertias are, from (8.6.3):
Throughout this book we have assumed that the algorithm which performs
tx{(At)1/2 ek + p } = {Q/(Q-1)} {(At)1/2_(1/Q)} (8.6.8) correspondence analysis does not depend on the size or type of data matrix
at hand. The algorithms which evaluate either the relevant SVD (cf. (4.1.9))
Clearly the associated principal axes will be direct if (At)1/2 (i.e. At) is greater
or the relevant eigendecomposition and transition formula (cf. (4.1.23) and
than (l/Q) and inverse otherwise, in other words the parities e1ct ,Pl in the
(4.1.16) respectively) will clearly become unwieldy when the order ofthe data
analysis of p(ct,Pl are positive for At > l/Q, negative for Af < l/Q. The per­
matrix becomes quite large. Since one of the major uses of correspondence
centages of inertia are thus based upon the principal inertias (the squares of
analysis is the exploration of large data sets, typically sample surveys in
(8.6.8)) associated with the direct principal axes in the analysis of the modified
sociology, psychology, marketing, education and epidemiology, for example,
Burt matrix, that is the quantities of (8.6.7). In this analysis the inverse
it is necessary to consider special algorithms to ease the computational
principal axes are the artifacts which imply the individual centerings and
burden.
standardizations of the subsets of columns of B (or of Z) in the optimal
solution. In Section 4.2 we have already illustrated the reciprocal averaging algorithm,
which can be used to compute the principal co-ordinates one axis at a time.
As an illustration consider the correspondence analysis of the 9 x 9 Burt
If the data are very high-dimensional then this algorithm al10ws us to
matrix B of Table 8.3, where Q = 3, as well as that of the modified Burt
evaluate the first few sets of co-ordinates only. A further advantage is that the
matrix, with zero diagonal. The resultant principal inertias and their parities
are as follows: data may be conveniently retained in secondary storage (disk or tape) and
re-read each time the reciprocal averaging is performed (see Appendix B).
Burt matrix Further savings, both in data storage and computational time, are possible
Modified Burt matrix
in special cases. For example, when the data are in the form of a (logical)
principal inertias parities principal inertias parities multivariate indicator matrix with a large number of columns (i.e. large J),
(1) 0.260021 +1 (1) 0.25‫סס‬OO -1 for example from a large questionnaire, then it is only necessary to store the
(2) 0.172187 +1 (2) 0.25‫סס‬OO -1 addresses ofthe non-zero elements (ones) ofthe data, in other words the code
(3) 0.097292 +1 (3) 0.070164 +1 number of each response. The reciprocal averaging then proceeds as before
(4) 0.078493 +1 (4) 0.025119 -1 but involves the addressed elements of the matrix only, with all the zeros
(5) 0.065212 +1 (5) 0.014990 +1 being correctly ignored. The same strategy is applicable to any so-cal1ed
(6) 0.051835 +1 (6) 0.013677 -1 "sparse" data matrix, such as the abundance matrices occurring frequentIy in
(7) O (7) 0.006360 -1 ecological studies. Here the number of sites and the number of species may
(8) O (8) 0.001032 -1 both be very large, but with a relatively smal1 number of species present at
any one site.
It can be easily verified that only the first two principal inertias At of the Burt
matrix correspond to principal inertias At = (At)1/2 greater than 1/3 and that
these correspond to the principal inertias 3 and 5 of the modified Burt matrix Stochastic approximation
which have positive parity (e.g. 0.070164 = (9/4){ (0.260021)1/2 -1/3}2). Notice It is instructive to consider more closely the particular form of the re­ 1:
how the Q - 1 zero inertias of the Burt matrix turn up as inertias with value ciprocal averaging equations in the case of a multivariate indicator matrix
1/(Q - 1)2 in the analysis of the modified matrix. The final outcome of al1 this Z == [Zl ... ZQ] (see Lebart et al., 1977, V.2). If z"[ denotes the ith row of Z
is that we would interpret only the first two principal axes of the Burt matrix and D the diagonal matrix of column sums of Z, then the reciprocal
and assign them percentages of inertia of 82.4 % and 17.6 %respectively. The averaging (double transition) of a J-vector Yo can be shown to be:
fact that these add up to 100 % does not mean that we have explained the I

data exactIy, but it does mean that al1 the "interesting" variation is explained, Y1 = (1/Q)~{D-1Zi(Z"[YO) (8.7.1)
in the sense described aboye (cf. also the relevant discussion of "interesting" Since the metric between rows is defined by the inverse diagonal matrix of
dimensions in Sections 5.1 and 5.2). column masses: (D!)-l = QID- 1 , the scalar quantity s(i, Yo) == (l/Q)z;ryo

L 1 .. 1
246 Theory and Applieations oJCorrespondenee Analysis 8. Special Topies 247

can be considered the co-ordinate of the point (row profile) (l/Q)z¡ on the 8.8 EXAMPLES
axis (l/QI)Dyo' Only Q terms are involved in the scalar product calculation
8.8. 1 Asymptotic distribution of the canonical correlations
of s(i, Yo)' The linear operation (8.7.1) can be written as:
(of a contingency table) in the case of dependence
Yl = (1jI) r.{ s(i, Yo)1t¡ (8.7.2) Let Pk, k = 1... K*, be the observed canonical correlations (square roots of principal
inertias) of a discrete bivariate sample density Pij (i = 1., . l, j = 1... J), based on a
where the vector 1t¡ == lD- 1 z¡ = (D;)-l(1/Q)Z¡ is associated with the projec­ sample of size n, and let Pk be the theoretical canonical correlations, assumed distinct,
of the underlying bivariate density Pij (i = 1... l, j = 1... J). Then the variables
tion onto the row profile vector (l/Q)z¡. Thus y 1 is a weighted average of the n l12 (pk - Pk) are asymptotically normal with zero means and variances and covariances
1ti' where the computations are achieved at the tth iteration by l updates of given by:
the vecto~ YI ofthe form:
uf = (1 +tpf) {1 + ~IK'PIE(iP(l)ep[k))E(l(l);dk))} -iN {E(iP~»)+E(l~»)}
-2 1 - - -"4PkPk'
Gkk' = "'iPkPk' 3- - {E( fP(k)fP(k')
- 2 - 2 ) + E(-1(k)1(k')
2 - 2 )}
YI +-- YI + (1/I)s(i, YI - 1 )1t; (8.7.3 )
+ ~IK' PI [E(iP(l)iP(k)iP(k·»)E(l(l)1 (k>1(k'»)
where i = 1 ... l. Each update is achieved after accessing the Q responses of +lpkPk' {E(iP(l)iP[k»)E(l(l)l[k») + E(ep(l)iP[k,»)E(l(l)l~'»)}]
the ith case (row) and the ordering of the rows is clearly immaterial to the We shall not provide a proof of these rather lengthy results. An example of the
final result. AIso the value of YI -1 remains constant throughout such an definition one of the moments in the aboye formulae is: E(cP(l)cPtk») == r.ir.jcPilcPfkP¡j =
iteration. r.¡r. j cPilcPM:¡cj (l +r.fpk,cPik'1jk') using (8.1.15). For further details see O'Neill (1980,
The form of (8.7.3) suggests a more general updating scheme: 1981).

Application
Yt.i+ 1 +-- YI.¡ + h(i, t)s(i, YI.;}1t; (8.7.4) An example in O'Neill (1981) illustrates the aboye results. A theoretical bivariate
density is defined by the contingency table:
Each update by a row of the data changes the present y, which is in turn used
9 14 25
in the following update. The updating is no longer invariant with respect to
the ordering of the rows. (This is reminiscent of the k-means clustering
algorithm where centroids are re-computed after each individual point re­
assignment, cf. Section 7.4.) The "weights" h(i, t) are usually a non-increasing
function of i and a strictly decreasing function of t, for example:
1 18
9 38
which has marginal densities i' = [t t t]T and
20 10

correlations are easily evaluated as PI = t and P2


J e = [l
1
t l]T. The canonical
= (1/24)1/2, and the vectors of
standard co-ordinates on the two principal axes as
h(i,t) = e/{i+(t-1)l}" where e> 1,! < IX:::;; 1 iP(l) = (3/2)1/2[ -1 O l]T, iP(2) = (1/2)1/2[1 -2 l]T

This way of computing the solutions is known as stochastic approximation


1(1) = (2/3)1/2[0 1 _2]T, 1(2) = (1/3)1/2[ -3 1 l]T
and has its origins (in the present context) in the work of Benzécri (1969a). Examples of moments of the standard co-ordinates are:
f
Lebart (1974) and Lebart et al. (1977, V.3) discuss the convergence properties E(iP~) = r.;é¡,r,r¡r.A(l + r.fpk,cPik,Yjk')
of stochastic approximation and also the conditions in which it holds an = ~icPrlri (since ~AYjk' = O)
advantage over the usual reciprocal averaging iterations (8.7.3). Often a
= (3/2)3/2( -r, +r3)
single iteration of stochastic approximation aIready provides a solution y
near the optimum, and the savings in computation and reading of the data =0
12
can be tremendous. When a K*-dimensional solution is required, the and (derived similarly): E(iP[l)iPd = 1/2 1/2, E(iPtl)) = 1.5, E(ltl)1(2») = 1/3 / ,

algorithm need not proceed dimension by dimension, as in reciprocal E(1(l)) = 2.

Thus u~ = 9/16 = 0.5625. The other two moments are computed as u~ = 1.0712

averaging. Instead, it is quicker to find an approximation to the subspace of


and u12 = 0.1850.

the first K* principal axes, thus allowing large savings in computation time
at the expense of an acceptably smallloss in the solution's accuracy. Comment
This method is further illustrated by Lebart (1982a, 1982b). O'Neill (1981) also gives the first- and second-order moments of the central Wishart
248 Theory and Applieations ofCorrespondenee Analysis 8. Speeial Topies 249

matrix variate Wm(\I), for m = 2,3 and 4 and values oÍ\! up to 9. For m = 2, \1 = 2, the p(x,y)
(a 1
means of the chi-square components (under the assumption of independence) are
given as 3.571 and 0.429 respectively, with variances 6.674 and 0.391 respectively and
covariance 0.467. If the aboye contingency table were actually observed then both
components 144(t)l = 36 and 144(6/144) = 6 are together highly significant, rejecting
the null hypothesis of independence. The null hypothesis that PI 4' O and P2 = O is
also rejected quite strongly, because the second component np~ is asymptotically
WI (1), that is X2 (1), so that the observed value of 6 is significant at P < 0.001.

8.8.2 Correspondence ana/ysis of a continuous Petrie matrix


x
Consider the following continuous mass distribution:
p(x,y) = 1 x ~ y ~ x+1, O~x~l
=0 otherwise ~K 2 • JI (e)
o I

,I=r""
4(
(b)
Perform a correspondence analysis on this two-way distribution and show that the
principal co-ordinates of the "rows" (which are continuous functions of x) are poly­
nomials in x. Derive the specific relationship between the first and second principal
co-ordinate functions of the "rows" (x) as well as the "columns" (y). ~
x x
Solution
Theintegralj Jp(x,y)dxdy = 1, so thatp(x,y)dxdy (O ~ x ~ 1,0 ~ Y ~ 2)is thecon­
tinuous counterpart of the correspondence matrix Pij (i = 1 ... 1, j = 1 ... J). The c(y)
(d)

'~
marginal densities are thus:
fX+I
r(x) = Jx dy =1 O~x~l

O 1 2 Y

f: dx = y O~y~l
FIG.8.9. (a) The eontinuous mass distribution p(x. y), viewed in 3-dimensional
perspeetive. (b) p(x. y) seen "from the top", the eontinuous eounterpart of a
e(y)=
eorrespondenee matrix (e) The row mass funetion r(x). (d) the eolumn mass
funetion c(y).
J I
,-1
dx = 2-y 1~y~2

where we have used the result that 1 - zP+ 1 = (1 - z)(zP +Zp-I + ... + 1) for z = y-1.
The bivariate density p(x,y) and its marginals are sketched in Fig. 8.9. The con­ Applying the averaging on fip(y) from columns to rows we arrive at a function IXp(X):
tinuous counterparts of the row and column masses are r(x )dx and e(y)dy respectively.
We first show how the function x P (for any non-negative integer p) is afTected by the
process of reciprocal averaging. Applying the averaging process (i.e. transition
IXp(X) = f: fip(y)p(x,y)dxdy/{r(x)dx}

formula) in its continuous forro from rows to columns we arrive at a function fip(Y): that is:
fl
Jx JI +x
{(y-1)P+(y-l)r l + ... +1}dy

r
(p+1)lX p (x) = yPdy + 1

{
forO~y~l: f' xPp(x,y) dx dy/{e(y)dy} = f' x Pdx=yP/(p+1)
fip(y) = Jo
forl~y~2(similarly) ... ={l/(2-y)} f
(l/y)
1 Jo
x Pdx={(y-1)P+(y-l)P-I
= yPdy + f: (yp+ yp-I+ ... +1)dy

J,-I where we have made a change of variable in the second integral, replacing y -1 by y.
+ ... +l}/(p+l) Thus: (p+1)lX p (x) = 1/(p+1)-xP+I/(p+1)+xP+I/(p+1)+xP/p+ ... +x.
250 Theory and Applieations ofCorrespondenee Analysis 8. Special Topies 251

Hence ~p(x) is a polynomial in x of degree p: these are computed to be:


~p(x) = xP/{(p+ l)p} +XP-I/{(P+ 1)(P-l)}+ ... +x/(p+ 1)+ 1/(P+ 1)2 fl (x) = (3/2 )1/2(2x - 1) f2(X) = (5/6)1/2(6x 2 -6x+l)
It follows that the reciprocal average of a polynomial in x of degree p is another
It can be easily checked that the reciprocal averaging of fl (x) and of f2 (x) leads
polynomial in x of degree p, so that the principal co-ordinate functionsfl (X),f2(X), ...
respectively to fl (x)/2 andf2(x)/6, in accordance with the eigenvalues.
are orthogonal polynomials of x. In fact, the pth functionfp(x), that is the co-ordinates
The relationship 1?etweenf2 == f2(X) andfl == fl (x) is thus:
with respect to the pth principal axis, is the polynomial of degree p whose coefficients
are identified by the usual centering and standardization conditions and orthogonality f2 = (5/6)1/2(f12 -t)
with functions fl (x), ... ,fp-I (x). The "shrinkage" in scale of the polynomial which
satisfies the reciprocal averaging eigenequation is c1early the coefficient of x P in ~p(x), which is a parabola, symmetric with respect to thef2 axis (Fig. 8.10).
in other words the pth eigenvalue is: The corresponding principal co-ordinate functions 91(y) and 92(y) can be derived
Ap = 1/{(P+ l)p} fromfl(x) andf2(x):

r fl(x)dx
{
forO~y~
y
which decreases as p increases. 1: = (3/2)1/2(y-1)
It is a fairly simple matter to evaluate the specific form of the first two principal
co-ordinate functionsfl(x) andf2(x). We know thatfl(x) is a linear function of x,
ax+b, and thatf2(x) is a quadratic, ex 2 +dx+e. Using the centering, standardizing
and orthogonality conditions:
(A I )1/2 91 (y) =

for 1 ~ Y ~ 2: f
JO
I

y- 1
f¡{x)dx = (3/2)1/2(y-1)

Jjp(x)r(x)dx = O p = 1,2 so that91(Y) = 3 1/2 (y_1), O ~ Y ~ 2.


Similarly, we obtain:
Jj/(x)r(x)dx = Ap p = 1,2

{
forO~y~ 1:51/2(2y-1)(y-1)
Jfp(x)fp.(x)r(x)dx = O p = l,p' = 2
92(y)= for1 ~y~2:51/2(2y-3)(y-1)
'2 I 92 and the relationship between 92 == 92(y) and 91 == 91 (y):

for 91 < O: 92 = (5 1/2 /3)91(291 +3 1/2 )


92 = {
y=O. y=2 for 91 > O: 92 = (5 1/2 /3)91(291 _3 1/2 )
\ / In the 2-dimensional display of Fig. 8.10, the "columns" y form two parabolas with
\ / the property that at the origin where they meet, their tangents connect with the end­
/ points (91(0),92(0)) and (9¡{2),92(2)).
\

\
/
/ Comment

JI
\ x=O x = 1/ p(x,y) can also be considered as the continuous version of a logical indicator matrix,

\\ '1
-
91
where the number of categories for each "question" q is J q = 2. In fact, for y < 1, y and
y+ 1 form a doubled pair, so that points on the "column" curve of Fig. 8.10
corresponding to y and y + 1 connect through the origin and are balanced at the origin
in the usual way.

8.8.3 /mpvting a missing va/ve


Consider the 12 x 5 contingency table of Table 8.5, where the top left-hand element is
a structural zero by nature of the problem (see comments later).
scale
(a) Impute a value in this position which obeys the "independence model" (zero­
1.0
dimensional correspondence analysis).
FIG.8.10. Plot of the continuous row (solid curve) and column (dashed curve) (b) Show that this is not the value which optimizes the usualleast-squares fit to the
principal co-ordinate functions in the correspondence analysis of p(x. V) of actual data.
Fig.8.9. (c) Impute a value which fits the 1-dimensional correspondence "model".
252 Theory and Applications ofCorrespondence Analysis 8. SpeciaI Topics 253

TABLE8.5
Frequencies of violent and non-violent convictions amongst 390 criminals 10

(Holland el al, 1981)

Violent convictions
9

.. ....
Non-violent impuled
8

......
convictions O 1 2 3 ~4 Total

volue
7

6
...............

O O 13 5 2 O 20 5

1 3 16 6 2 3 30

2 6 19 5 2 2 34

3 10 24 5 7 2 48
5 10 15 20 25 30
4 15 13 5 3 4 40
¡lerolions
5 16 15 11 3 2 47

6 13 8 6 5 2 34
FIG.8.12. Iterations to impute the missing value, using the reconstitution formula

7 9 10 3 4 1 27
of a correspondence analysis, in the style of the E-M algorithm.

8 8 5 4 2 1 20

9-10 13 10 4 2 O 29

11-12 15 7 6 O O 28
SoIution
~ 13 16 9 3 1 4 33
(a) Because only one element is "missing", the imputed value nl1 such that PI1 = r l CI
is, from (8.5.2):
Total 124 149 63 33 21 390
nl1 = (20 x 124)/(390-20-124) = 10.08
(b) Using the rounded value of nl1 = 10 the total inertia in the data matrix can be
evaluated to be 0.140181 to 6 decimals. There is a slight contribution by nll to
this inertia due to the rounding, but this is only evident in the 6th decimal. In
Fig. 8.11 we plot the least-squares fit to the data for a range of values of nl1,
where the fit is evaluated on all actual data. That is, the fit is the inertia of the
data matrix minus the contribution of the (1, l)th element. The integer 15 is
closest to the minimum of this fit and is optimal in this sense, but does not obey
0.15 the independence model since (35 x 139)/405 = 12.01.
(c) In order to impute a value which fits the 1-dimensional correspondence model
Pij = r¡cj(l +hlgjIf..1.f/2) we have to iterate as described in Section 8.5. The
progress of these iterations, initialized by the value 10 imputed in (a), is shown in
Fig. 8.12. The rate of convergence can be seen to be very slow and eventually a
value (rounded to an integer) of 5 is reached.

0.14 Comments
The value of O in the (1, l)th cell of Table 8.5 is clearly a structural zero since the
sample contains no non-criminals. In their analysis of these data Holland et al. (1981)
overlook the presence of this zero and use it as actual data, resulting in a maximal
canonical correlation of 0.323, that is a first principal inertia of the order of 0.1043. If
the imputed value of 5 is used, according to result (c) above, then the maximal
canonical correlation is 0.286 (first principal inertia of 0.0818). Although we always
0.13 I I 1 I 1 1 I I I

report positive values for the above correlations, an examination of the canonical
o 10 20 30 40
weights (or co-ordinates on the first principal axis) shows that the correlation between
violent and non-violent convictions appears to be negative, when the ordering of the

rows and columns is taken into account. (For example, the first row and first column
FIG.8.11. Least-squares fit to all data except the (1,1 )th element. for a series of appear on opposite sides of the first axis.) The above authors do acknowledge,
imputed values for this element. however, that the evidence of negative correlation is "exceedingly minimal". Our
254 Theory and Applications ofCorrespondence Analysis

analysis aboye shows that this negative correlation, however minimal, is to a certain
extent due to the use of the structural zero as an actual datum, which naturally
reinforces the negative association ofthe ordered rows and colurnns.

8.8.4 Modified Burt matrix


Let B' be the Burt rnatrix B rnodified by setting the diagonal equal to zero. Show that
the rnodified correspondence rnatrix P' is a mernber of the convex polygon of B, hence
derive the relationship between the correspondence analyses ofB' and B.

Solution
We need to show that for sorne a and /3 (cf. (8.6.1)):
00
P' = IXP+/3D c +(l-IX-/3)cc T
so that P' has zero diagonal. That is:
Applications of

ac/Q+{Jc j +(1-IX-{J)cf = O Correspondence Analysis

where c/Q is the diagonal element of P and Cj is the jth marginal sum (mass) of P.
Hence:
IX/Q + {J + (1-1X - {J)cj = O
which can only be true for general cj ifa + /3 = 1, in which case IX/Q + (1- a) = O, thus
a=Q/(Q-I)and{J= -1/(Q-1).
From (8.6.3) the principal axes of the two analyses are the same up to possible As an introduetion to the potential of correspondence analysis as an
inversions. The principal inertias of B' are of the forrn: exploratory data·analytic technique, we now present 11 different applications
{1X(2:)1/2 + P}2 = {Q/(Q _1)2} {(J.e)I!2 - (l/QW in a wide variety of eontexts. In order to inelude so many applications, we
(i.e. the quantities defined in (8.6.7), where 2f = (2:)1/2), and their parities are the have kept the presentation of eaeh one quite brief.
sarne as the sign ofthe differences (2:)1/2 - (l/Q). The fields of application covered here are genetics (both human and
population), social psyehology, clinical researeh, education, eriminology, food
science, linguistics, ecology, palaeontology and meteorology. Further refer­
ences to published articles in an even wider variety of fields are given in
Section 9.12.

9.1 EYE COLOUR AND HAIR COLOUR OF 5387 SCHOOLCHILDREN

This application is included in acknowledgement of its historical value, since


Fisher (1940) first defined the eanonieal analysis of a eontingeney table using
these data. The same data set was also used by Maung (1941) and, very
recently, by Goodman (1981). It is also used by Hill (1982) to illustrate the
reciprocal averaging form of correspondence analysis.

9.7.7 Oatd
Table 9.1 shows the 4 x 5 contingency table of 5387 schoolchildren from
Caithness, Scotland, classified according to the two diserete variables, eye
colour and hair colour. The profiles of the hair colours (column profiles) are
256 Theory and Applications ofCorrespondence AnaJysis

TABLE 9.1

Eye and hair colour data (Fisher, 1940; Maung, 1941) The rows and eolumns

have been arranged in terms of the ordering (positive to negative) on the first

principal axis of Fig 9.1 (cf. eolumn ·'K = 1" 01 Table 92). The column profiles

are also given, In Italies.

¡~
ctj "O .~
a. 'OC'O
Hair eolour
.-
U
.!:
~
OCI~<or--lD ~'<tNC"lO
r-- r-- O) <O
C 1-­ Nr--Lt"l'<t
rf)
. _ Q) V) • U t""""" C1:)t""""" U N Lt"l
Eye colour Fair Red Medium Dark Q.O)c :
Blaek 'O E . •
Q)c:J;;:- ....

Light 688 047 116 0.41


.!:<l)_.........
"""ü° evj oc C"lO).-lD a: C"lC"l.-0'<t
584027 188014 4003 1580 ..
O O o)C"l<oC"l<O
Blue 3260.22 380.13 241 011 1100.08 30.03 718
t;:u uaJ : U
'<tC"lOOC"l
~ O> U O>
Medium 343024 84029 ..:= a.-g::a E :

909043 412030 26 022 1774 <ll 'O 1-'0 'O •

Dark 980.07 48017 ,,-..., > .......


'<tcor--'<t<O
403 079 681 049 85072 1315 '~"2'O
C"? OJ :
N lDOO'<t'<t N
r--<:tOOCO
r--'O::l o 1I ;:: 00 ~ ~ 11 N~N
m"S0C: tí :
1455 286 2137 1391 118 '-Eo.g I : >c: >c: I
5387 +-i :J Ü Q. ..
<llU<ll<tJ •
-o .>U • oc r--<ONLt"l oc .- '<t '<t '" N
ro.'O (1)-0 : 1-­ Oco O 0.- '<tC"l
!-E-cQ)(/)ct1
... . . . U .- N <O (:) '<t '<t .­
> e o~ ...... • ..
-0._
also given and it can be seen that the larger values of the profiles lie (1J : :

E- ~ E oc <O<OOOLt"l a: r--O 0)0> '<t


approximately on the "diagonal" of this matrix, suggesting a high association LO o Q) O­ O C"?LD~c.o O or--(V)(OC"l
..... U)..!:"+­ 00 O) O) U '" r-- O) O)
between the two discrete variables, Q)<J)+-'(l) U
N 00l"".!:
..... cur.n­
.0._0) .
j
O')-c c ­ Ot"""""MN <:tC"lNOOC"l

9.7.2 Method
wr.nQ)o~
~.C¡; ~ E rf) I ~
<l:><ll'O<llj:=>
I--ro0..'0 U ¡;¡
<O <O O O
(00)00
000)00
11
>c:
0<:tC"l0
<:t<:tlr-­
I
11
>c:

<:tC'?'<tCOo)
lDN lDO
,.- I
Fisher's aim in studying these data was to quantify this association by e <¡) ' . ; : : ­
ro..c ..... ro
assigning scale values (or "scores") to each of the 9 categories such that the <Il
.... <lla.
c·­ oc ~OJCOC") oc M <000 t""""" N

u
e ._
'0'.- U
e
z ~LOoo<:t
~ N LO z oo~r--ON
(V) '<t .­
correIation between the two discrete variables was maximized. We have aI­ (J) t
'+-
o'~

ready shown in Section 4.4 that this probIem differs froro the correspondence -g~ca.
0.- o o (f) (f)

o.. roe ~
(f) C"lC"l0)'<t (f) OC"lr--OON
analysis of TabIe 9.1 onIy in the standardization of the scale values. In <O~'<tO <{ C"lO>N'<t <{ r-- lD O) LO N
~o~V; cf. <6MOO ~
~NMN
~
N C"lN
'::: +-' 0..:= co ~
0-oE­
u c o <Il O)lDO)O OC"lOO)OO
)..2= 0.03011 (13.1%) E'O~-5 1­
.....J r--0>0)0 ~ 0000)0)
o~ ~-o O) 0)0)0)0 OOOOcno>
C O O
.
rf)
black halr

.. .
-<Il<110
...... :J..c­
dark eyes blue eyes fair holr ='ro-ro ~

• ..dark holr
D.>c:'2'
..... e (l) :J
<Il
::l
r--OOlDO
'<t<ocnO
o <Il -, Cloc <Il
OCOCla:<{
-~U

.
red holr light eyes ::l <Il.!: o
0.91 ...... ­
_ <Il .. o
<tJ~Eu
m
>
e
<Il
'<tOOlDO
NOOOO
0)000
0)C"l00
O)
O
C"l
Z
E
<tJ
::::;I--w<{
a:l:::J~Cl
wwww
E
<tJ
Z
;::¡:w~<!.....J
LLOC"",Cla:l
)..1 =0,1992 (86.6"10)
.~ en ~ Ol ~ooo N
<Il<tJOl W 0000 O
E·t o .-NC"l'<t ~N('l')-.:;;;tL.O
=,Q)t;
scole

medlum ha ir z c·­.!:
.-
W medium eyes .j- ..

FIG. 9.1. Optimal 2-dimensional display, by correspondence analysis, of the


eontingeney table ofTable 9.1.
258 Theory and Applications ofCorrespondence Analysis 9. Applications ofCorrespondence Analysis 259

correspondence analysis the scale values are (usually) the principal co­ analysis can be applied to data with less stringent properties, for example
ordinates, while in canonical correlation analysis the standardization is not sparse matrices and matrices with sorne low cell frequencies (less than 5, say).
identified by the objective but is usually prescribed to be "unit", that is On the other hand, modelling of contingency tables is usually applied to fairly
equivalent to what we have called the standard co-ordinates. small matrices with substantial cell frequencies, sometimes at the expense of
agglomerating sorne rows or columns.

9.1.3 Resu/ts and interpreta tian


9.2 PRINCIPAL WORRIES OF ISRAEL! ADULTS
Figure 9.1 is the 2-dimensional éorrespondence analysis of these data and 1
Table 9.2 contains the numerical results. The ordering of each set of This application is also of sorne historical interest because it is reported by
categories along the first axis is not unexpected, but the relative positioning Guttman (1971), Guttman being the originator of correspondence analysis in
is interesting. Blue and light eyes are close together in terms of their profiles the form of a psychometric scaling technique (Guttman, 1941). Here we
across the hair colours. Red hair lies approximately midway between medium illustrate the use of bootstrapping to assess the external stability of the
and fair in terms of profiles across the eye colours. The largest "jump" is from display.
medium to dark (both eyes and hair), which might be expected because ofthe
,1

high association of the dominant "dark" genetic characteristics. 1

The parabolic configuration of the points in the plane is the horseshoe 9.2.1 Data
efTect which characterizes data with a strong "diagonal band" in the profiles 1554 Israeli adults are cross-classified in a two-way contingency table
(see Section 8.3). Notice that the rows and columns of Table 9.1 have been according to their 7 "principal worries" and an eighth category "more than
intentionally arranged in the same order as the first principal co-ordinates to one", and according to 5 groups which depend on their place ofresidence and
reveal this band of association. that of their fathers (Table 9.3).

9.1.4 Discussian
Goodman (1981) discusses these data in detail and models the frequencies TABLE 9.3
using his association models. He shows that his "RC-model" for the ex­ Principal worries of Israeli adults (Guttman. 1971).
pected frequencies Pij = E(pij) = rx.J3 je V'I';YI is closely linked with the canonical
correlation approach when the rows and columns derive from the discretiza­ m m
u .S!
tion of a bivariate normal density (or a transformed bivariate normal Q¡ ~ Q¡
Q) m E Q¡
density). Since the parameters are estimated by maximum likelihood, this m
u E
E.~
Q)
.;:: <t: m ~ -5<t:
m ___ -5
framework is very valuable for formal statistical inference. Hypotheses like m

({Ji = ({Jo + i({Jd (i.e. the theoretical "scale values" ({Ji are equally spaced) can be
:>
~
~ ---o..
Q)
:.~
---m
'+- Q)
.:..c o..
'+­

...o
...o
---
m
.¡¡; 2 -Q)
m·- Vl
Q) O
m ~
:::J
Q)
m m
Q)

tested quite easily, which can be very useful in a variety of circumstances (cf. <t: <t: w
:::J ~

~<t:
~

~w Vl Vl

Section 9.3). However, because of the large sample size, Goodman finds that
ASAF EUAM IFAA IFEA IFI
the aboye RC-model does not "fit the data ... so well", and proposes a more
general mode!. No doubt even the more general model would then be rejected Enlisted relative ENR 61 104 8 22 5

if the sample size were increased, one of the dilemmas of hypothesis testing. Sabotage SAB 70 117 9 24 7

By contrast, correspondence analysis has no aspirations of modelling the Military situation MIL 97 218 12 28 14

data statistically. The data serve as the population and every "degree of Political situation POL 32 118 6 28 7

Economic situation ECO 4 11 1 2 1

freedom", as it were, is worth looking al. A displayed point is not a parameter 128 14 52 12

Other OTH 81
estimate in the usual sense, although the stability of the display can be More than one worry MTO 20 42 2 6 O
explored by analogy with conventional estimation and inference (cf. Sections Personal economics PER 104 48 14 16 9

8.1, 9.2, 9.3 and 9.7)~ AIso, being a more modest technique, correspondence
260 Theory and Applications ofCorrespondence Analysis

9.2.2 Method
Correspondence analysis of the data is performed and then the stability of the
display is explored by bootstrapping the sample 100 times and projecting the
replicate profiles onto the principal subspace (a plane in this case). Q)

-f a: !CONO)LDOO)COO a: 1 (")COO)LDLD
Ol l- ~LDLO roCOr- 1­ LDO~O)N
e U N LD U ~ r--.
o
9.2.3 Resu/ts and interpretation ro
~
(j) a: O'<!"O~O(")~(") a: '<!"CONO)O
The principal inertias and their percentages are 0.05967 (77.0 %), 0.01533 Q) O '<!"(")cor--. O)LD O NrDr--.,<!"M
O) ~
U Nr--.'<!" 0)(") U
(19.8 %), 0.00240 (3.1 %) and 0.00010 (0.1 %) respectively. Therefore the 2­ ~
(j)

dimensional correspondence analysis, containing 96.8 %of the total inertia is '+­
o N OrDO)NOONLD N Nr--.(")LDN
CO N LD (")
(") (") N LDLDCONO
an almost exact display of the profiles (Fig. 9.2). (j)
Q.
11 ,...... I N,...... 11 I (") ~
The first axis is determined almost exclusively by the opposition amongst ::J ::..:: 1 ::..:: I I
2
the worries of "personal economics" to "political situation" and "military Ol
II.1
a: OO,<!",<!"LD~N'<!" a: O(")NCOr--.
situation", with a corresponding opposition of Israelis living in Asia/Africa (j)
e 1­ coco
,......('1") 1­ '<!"COrD
LD (")
U r--. U
to those living in Europe/America. The contributions of these points to the E
::J
l'

first axis are very high (Table 9.4). The interpretation of this contrast is ou a: a:
(j)
LD'<!"CO'<!"LDr--.~rD NN'<!"O)r--. 1

obvious: people in the former group, living in the developing countries, have "O
e x
ro ro
Q) O
U
LD LD N (")
'<!"O)LD
LD O)
NO)
O
U
r--.(")O)(")'<!"
O) O) co I
financial problems, while those in the latter group are concerned about the
q- .-. co
wider issues mentioned above, as well as the "economic situation", rather
I
(j)Q.
O) Q)'­ '<!"NLDr--.NCOr--.r--. r--.~,<!"LDO 1'
.- u ~ INO)rD~NO) N,...... 0)(,0,......
than their personal finances. ~ ~
m O'¡::
e 11 ,...... N,...... ,...... LO 11 (")NNI~
1

<i :;: Q. ::..:: I ::..:: I '

I-~o

:;: ....
(j) :;: 1'1

>'2=0.01531 (9.8%) a: rD(")r--.MCOr--.COr--. a: COrD(")rDCO


o .... OLD ~(")rD N,...... LO <.O M I
~ ~ Z LD Z '<!"(")
Q)~

..
More Ihan one -f
....
(j)
(/)
(/) O)rDr--.(")NLDLD(")
(/)
(/) NrDNLDLD
1

Military siluation Ol
« N'<!"(")N~CO'<!"N
« OO'<!"~(")

.
e
EURO PE IAMERICA
Sabotage
•l Enlisled relalive
.
ASIA/AFRICA Personal
eeonomies •
o
E
ro
~
,......,......N,......
~
M LD :\1

Economie silualion
....ro ~
LDCOCOLDLDONO)
0)(")(")0)(")000) ~
rDOrDCOr--.
0)0 rD cor--.
O)OO)O)N
Nr--.O)O)LDOrDO)
O O
.
Q)
e
Polilieal silualion
• ISRAEl. falher 'O
ISRAEL' falher ASIAlAFRICA e Q) Q) LL~
ISRAEL o E a:a:l-l-lOIOa: E ««««
';:0
ro z«-Oul-l- w ro (/)~«w_

.
OIher
(j)
o
Q.

E
z W(/)~o....wO~o.... z «W
LLLLLL

o
sea le u ~N(")'<!"LDrDr--.CO ,......NM"Í" LO
Q)
t-------I
0.1 O
ISRAEL~ fal,*,r

EUROPE/AMERICA

FIG.9.2. Optimal 2-dimensional display, by correspondence analysis, of the


contingency table of Table 9.3.
262 Theory and Applications ofCorrespondence Analysis 9. Applications of Correspondence Analysis 263

that the second canonical correlation is highly "significant", which is indeed


the case. The second X2-component is 1554 x 0.01533 = 23.8, which can be
tested using the critical points of the largest eigenvalue of a Wishart W3 (6)
variate. From Pearson and Hartley (1972, Table 51), the re1evant critical
point is 23.69 for P = 0.01, so that the component is significant at P < 0.01.
Notice that this is the test under dependence, the null hypothesis being that
the first canonical correlation is the only one that is non-zero.

9.3 RATINGS OF ANALGESIC DRUGS

This application shows how well a bootstrapped correspondence analysis can


support the statistical analysis and modelling of a contingency tableo

9.3.1 Data
The data are from a study by Calimlim et al. (1982) and are reproduced in
Table 9.5. 121 hospital patients have been randomly assigned to four groups
and each group receives a different drug (A, B, C or D). Each patient rates the
drug's effect on a 5-point scale worded poor, fair, good, very good and
excellent.
sea le
>------l
0.1 9.3.2 Methods and results
FIG.9.3. Bootstrapped display of the columns (groups of Israelis); the number­ The 2-dimensional correspondence analysis of the data is given in Fig. 9.4.
ing of the columns from 1 to 5 is the same as in Table 9.3. The co-ordinates on the first principal axis are optimal "scores" and "scale
values" respectively for the drugs and the categories of the rating scale (see
The second axis separates out the three groups in Israel from the two groups Section 4.3). These scores and the scale values are related by the "symmetric"
outside Israel, mainly because of the response "other". This suggests either transition formulae (4.1.16) (cf. also (4.2.4) and (4.2.5)), so that the scores are
that the Israeli inhabitants are reticent about answering the questionnaire, or not, strictly speaking, weighted averages of the scale values. The choice of
that they really have worries which cannot be easily c1assified into the given
categories.
Notice from column QLT (quality) of Table 9.4 that the row "enlisted TABLE 9.5

relative" (ENR) and the column "Israel: father Israel" (IFI) are not well Responses of 121 hospital patients in a survey of the effectiveness

represented in this display. They do in fact lie in the third dimension but this of four drugs; each drug has been rated on a verbal 5-point scale:

poor/fair/good/very good/excellent.

is a very unstable feature, due to very few individuals.


Figure 9.3 shows the bootstrapping of the column profiles only. The Poor Fair Good V.good Excellent
contrast along the first axis between columns 1 and 2 is seen to be very stable.
The sample sizes of columns 3 and 5 are too small to determine a stable Drug A 5 1 10 8 6
profile point. However, along the second axis, the convex hull of column 4 Drug B 5 3 3 8 12
Drug e 10 6 12 3 O
(Israe1is in EuropejAmerica) is very c1early separated from those of columns Drug o 7 12 8 1 1
1 and 2, confirming the (external) stability of the second axis. This indicates
264 Theory and Applications ofCorrespondence Analysis 9. Applications of Correspondence Analysis 265

~Z=0.07731 (19.9%)

falr ..

.
exeellent
e drug B
edrull o

poor.. ~I =0.3047
very llood .. (78.3%l

drug e
edrug A llood"

seale
1----<
0.2

FIG. 9.4. Optimal 2-dimensional display, by correspondence analysis, of Table


95.
6ca.ie
standardization of the row and column co-ordinates does not affect our o--------<
O. 2
discussion of the results in any way.
The bootstrapping of this display is slightly difTerent from the previous
example in that resampling should be carried out within each row rather than FIG.9.6. Bootstrapped display of the columns (rating scale) of Fig. 9.4; replicated
points for poor, fair. good, very good and excellent are indicated by 1, 2, 3, 4
and 5 respectively.

over the whole table in order to obtain replicates of the row profiles. In other
words we assume the rows to be four independent samples from four multi­
nomial populations. The replicated row profiles are then projected onto the
principal plane as before (Figs 9.5 and 9.6). These strongly suggest that the
drugs separate into two groups: A with B and C with D, the first group being
more favourably rated than the second. Correspondingly, there does not
seem to be enough evidence to separate the responses "very good" and
"excel1ent", while there is sorne confusion at the lower end of the scale, with
"poor" and "good" being similarly scaled and "fair" occupying an anomalous
position.

9,3.3 Discussion
Cox and Chuang (1982) propose various analyses based on logit functions of
ca.le
&
>--------i
the multinomial probabilities (Le. the elements of the row profiles). Because
0.2 of the problems involved with low-frequency cel1s, they combine the cate­
gories "very good" and "excel1ent" so that the contingency table is of the
FIG.9.5. Bootstrapped display of the rows (drugs) of Fig. 9.4; replicated points order 4 x 4. The x2-statistic computed on this condensed table is 43.9, with 9
for drugs A B, e and D are indicated by 1,2,3 and 4 respectively. degrees of freedom, compared to the value of 47.1 on the original table, with
266 Theory and Applications ofCorrespondence Analysis 9. A pplications ofCorrespondence Analysis 267

12 degrees of freedom. In either case, the statistic is highly significant by Cox and Chuang (1982), which involve fitting various models to functions
(P < 0.001). of the logits, lead to the same conclusions suggested by our analysis of Section
The x2-statistic is asymptotically equivalent to the likelihood ratio statistic 9.3.2.
G2 (see, for example, Fienberg, 1980), which also tests a contingency table for The anomalous position of the rating "fair" in Fig. 9.4 deserves comment.
homogeneity (Le. row-column independence). Chuang (1982) shows that G2 It is readily seen that this is almost entirely due to the relatively high
can be partitioned into J -1 components, in this case, which are themselves frequency of this rating for drug D. Although this involves only a few people,
G2-statistics for testing for homogeneity respectively on J -1 subtables of the bootstrapped profiles of "fair" nevertheless separate out from the other
order 1 x 2 of the following form: profiles (Fig. 9.6). This equivalently explains the anomaly that the second
column of Table 9.6 shows a highly significant result, while the first column
r. j" > jn 1 j"1

l
nlj
does not. If the people were perceiving the drugs on a "bad-to-good"
n2 ·
·J
r. "> ·n2·'J
J.J j = 1... J - 1 dimension and, correspondingly, if the drugs were perceivable on such a scale,
·· .. then we would expect the 5 ratings to be in the correct order along the first
principal axis. An explanation that we would venture for the actual outcome
nlj r.j">jnlj"
is that drug D, which we know to be drug C in a much lower dosage, had
Since the J categories are ordered from least to most favoured, the ratio minimal analgesic effect. Since there is no category of response labeHed "1 do
nu/r.j" > jn¡j" is the relative chance of rating j compared to aH more favourable not know", the respondent might opt for "fair" as an easy way-out, which
ones, called the continuation ratio (Fienberg, 1980). Each G2 , with 3 degrees' does not mean that he considers the effect to be between "poor" and "good".
of freedom, corresponding to such a subtable can be further partitioned into This is only a possible explanation of what might have happened, the lesson
3 components with 1 degree of freedom each. Because of the similarities of being that verbal rating scales of this sort might not be interpreted in the
drugs A and B and drugs C and D, this finer partitioning can be carried out expected way by the respondents.
in terms of these comparisons. In Table 9.6 we give a summary of this
partitioning, derived from Cox and Chuang (1982), in terms of the X2
9.4 MULTIDIMENSIONAL TIME SERIES

Many studies involve vectors of frequencies or percentages observed at different points of time. Here we briefly describe two such data sets and show how effectively correspondence analysis can summarize them.

9.4.1 Science doctorates conferred in the USA, 1960-1975

The data, given by Gabriel and Zamir (1979), are reproduced in Table 9.7 and described in the table caption. There are only two principal inertias of interest and the principal plane (Fig. 9.7) explains 95% of the inertia. In 1960 there appears to have been relatively more agriculture, earth science and chemistry degrees, as opposed to engineering/mathematics, while the trend from 1965 to 1975 appears to be away from the physical sciences towards the social sciences, sociology, psychology and anthropology.

Notice that the total inertia of the data matrix (0.01318) is quite low: the profiles are changing fairly "slowly". Nevertheless, the display does show that they are changing methodically and regularly from year to year, so the trend is definite. An informal forecast of the profile in 1976 may be obtained by using the co-ordinates g.1 = -0.135 and g.2 = 0.075 of the extrapolated point in Fig. 9.7, which is at an approximate position conforming to the observed trend.
TABLE 9.7
Data on science doctorates in the USA (source: Statistical Abstract of the United States, 1976, Table 958). The last column consists of estimated frequencies based on the extrapolated point in the display of Fig. 9.7 and a total of 18352 doctorates (the total is 18361 owing to rounding errors, the estimated frequencies being accurate to only 3 significant figures).

                       1960   1965   1970   1971   1972   1973   1974   1975   1976 (est.)

Engineering             794   2073   3432   3495   3475   3338   3144   2959   2773
Mathematics             291    685   1222   1236   1281   1222   1196   1149   1099
Physics                 530   1046   1655   1740   1635   1590   1334   1293   1254
Chemistry              1078   1444   2234   2204   2011   1849   1792   1762   1804
Earth Sciences          253    375    511    550    580    577    570    556    584
Biological Sciences    1245   1963   3360   3633   3580   3636   3473   3498   3541
Agricultural Sciences   414    576    803    900    855    853    830    904    908
Psychology              772    954   1888   2116   2262   2444   2587   2749   2822
Sociology               162    239    504    583    638    599    645    680    687
Economics               341    538    826    791    863    907    833    867    879
Anthropology             69     82    217    240    260    324    381    385    394
Other Social Sciences   314    502   1079   1392   1500   1609   1531   1550   1616

Totals                 6263  10477  17731  18880  18940  18948  18316  18352  (18361)

"Tl "Tl
G) G)

w W
co -..J
o
:::J

30 O . o .0
~
rou -g. -CD' 'O
O
~3'
°1'
3 O
::o
.Do>
e -
lO
9 oo
o
og:
o> S
:I
g:
'"
'<

.
N
ro N , 'O
::J ,
00.
CD'
~.:1
o
CD
o:¡; o. lO
O
o lO '<
lO
o
:o
-< _. a> o o
e 3
o
o :I ;¡..
0.3 01 7<'
3 :::J O
..... ro -9: o' O
QJ ro
::J .g o.
,8' -i::J lO
o O
O
'".... ~ """"
~
~ o ¡¡j'
o>(/)
ú)
ro
o'
::J ::o
~
7<'
I
a>
N ~
Q)
g::o
~.

ro ::J
w~ 3
. ,.'"
:::J
o
lO
'<

'"
,,'
.... ~~
a>
a:..
,... ~
o'
ro
nO.
QJ CD
o,o
i a>
O
~ O
<::>
:..,.¡g,
.

:I ,,
:¡1 '""
O
;:,;
too

~ 8 <Q.,
O ~(/)
:::Y _.
&"(11 ;;: ¡:;¡ ..... u 3 r-..l O
QJ (/)
~
q,u
ro' O) r
<D(I) !2.
o
a> (0)
3-:::

o'
lO
,01
1
'-..1
O
01
N .
(J
<:>
--<
o. ­
0>0'
::o
CD
oCD<:::J
"e
oa
o
¡:¡
CD
'"
~
~ o,g
.~
¡Ji
a>
;!.
~
W-<
O'
.•
:::J
.",
I l:T
o' ... ... o
~
a>
Ñ
""
too

"";:,;<:>
.
Q)O I
e '< o' ~
6,-< CD
o. OS; .~. I
O oo
.-..1 e '-J
nO '"
N>~
:I:
CD
8 ro g: ,,­ '" '< :::J
o Q .~ ;!. I:l.
""...,
;:,;
o S; (}I.... ; ;

tn. ~
~(/) ;j' 1 3 ::r !2.
=ro(/) 2-, • • w U
'" '-..1 o' lO
""
'"O o lO

'"
lO
::J • J> CD -..JO .·0 o ;¡..
(/)u 0'0 01
'O' (i'
- o
~::J
Wo.
a
'O

'"
~
"'8"
"':I
!2.
(}1::J
~o.
. ro
::J
O
:I',
'<
lO
o' ,
I
.
:::J
o
lO
n (DO
:I:::J
CDo
~.g: ~
;:,;
¡:,

too
COro .~ ¡;;.
-­ - --­ -.~
lO
ro
.O::J
o o> le»
---~---
-iro
o>
0"0>
-::J o
?'
"
::J
o>
-<
\<11
.- --­
,...
5' (/)

!tCD:I: ooa>
ro o> -O
y>
.w-<­ -..1\ 0
.
N~.
--:-y> .~ 3 N
<D
s.
.-+
0
.
"'<D
O

J>
·cr
a:
CD
~
N
:.,.
:I:
o
3
:::Y
ro
o.
-rf!. UI
O

0l:: IV
e
o >!!
!.. o~ ~
o>
O-­
\O
::;
"
These co-ordinates can be substituted into the reconstitution formula:

$$\hat p_{i.}/c_{.} = r_i \left( 1 + f_{i1} g_{.1}/\lambda_1^{1/2} + f_{i2} g_{.2}/\lambda_2^{1/2} \right)$$

to give an estimated profile, i = 1 ... 12. Since the total number of doctorates is fairly steady from 1974 to 1975, the estimated frequencies for 1976 are n.(p̂_{i.}/c.), where n. = 18352, the same number as in 1975 (see last column of Table 9.7).
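The forecast step is easily sketched in code. In the following hypothetical fragment the row masses r, the discipline co-ordinates F and the principal inertias lam are assumed to be available from a prior correspondence analysis of Table 9.7; only the extrapolated co-ordinates and the assumed 1976 total are taken from the text.

```python
import numpy as np

def forecast_1976(r, F, lam, g=(-0.135, 0.075), n_total=18352):
    # reconstitution formula: p_i/c = r_i (1 + sum_k f_ik g_k / sqrt(lam_k))
    # r   : (12,) row masses of the disciplines
    # F   : (12, 2) their principal co-ordinates on the first two axes
    # lam : (2,) the first two principal inertias
    g = np.asarray(g, dtype=float)
    lam = np.asarray(lam, dtype=float)
    profile = r * (1.0 + np.asarray(F) @ (g / np.sqrt(lam)))
    return n_total * profile          # estimated 1976 frequencies

# usage (with arrays from a prior analysis): forecast_1976(r, F, lam)
```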
The horseshoe effect is strikingly evident in the pattern of column points representing the years. The row points, representing the various disciplines, do not follow the same pattern. The points representing agricultural sciences, earth sciences and economics, for example, lie clearly within the horseshoe, which means that the profiles of these disciplines are higher than average in the early and later years. This display should be compared to those of Figs 9.1 and 9.4, where both sets of points exhibit approximately the same horseshoe effect owing to more ordered row-column associations. Notice how the 2-dimensional curved representation of what is basically a regular trend allows for a richer description of the data.

9.4.2 Recorded number of offences of 18 types of crime, 1950-1963

The data matrix is given by Chatfield and Collins (1980) and consists of the frequencies of 18 types of crime (the rows) in each of the 14 years 1950-1963 (the columns) in the USA. The total inertia of the matrix is 0.008688 and the percentages of inertia are 72.4%, 15.6%, 4.3%, 2.8%, ..., so that it is again clear that a planar display is quite adequate (Fig. 9.8). Even though the total inertia is small, the changes from year to year are still methodical, but not as regular as in Fig. 9.7. There is a trend from the crimes of violence, indecency and homosexuality in the early 1950s to the crimes of theft as well as motor vehicle related crimes in the 1960s. Remember that the display does not represent the change in the total number of offences, but rather the changing emphasis in the relative frequencies of offences.

The isolated position of the point representing homosexuality-related crimes shows that its development over the years has not been the same as other crimes. This can be checked back to the data and it is indeed evident that, in spite of a steadily increasing frequency of crimes in general, the frequencies of this particular offence rise to a peak in the mid-1950s and then show a general, though not steady, decrease. The "bump" in the sequence of year points in Fig. 9.8 demonstrates this pattern, which is unique amongst the various crimes, probably due to changing public and legal opinion about this controversial offence.

9.5 PATTERNS IN EXAMINATION MARKS

This application illustrates the analysis of a doubled matrix as well as the strategy of focusing a principal axis on an obvious feature of the data set.

9.5.1 Data

Table 9.8 gives the marks for a class of 38 students on 8 questions in an examination, as well as the total marks (out of 100) and the average marks for each question. In accordance with the discussion of Chapter 6 these marks are doubled with respect to their respective maximum marks, so that the data matrix analysed is of the order 38 × 16.

9.5.2 Method and results

Scaling of the students provided by the first principal axis

Since the first principal axis of correspondence analysis provides a scaling of the students which has maximum discriminability (as quantified by inertia), it would be interesting to see how this scaling compares to the scaling provided by the total marks, which is usually taken as the best summary of the students' achievements.

Suppose that f_i is the first principal co-ordinate of the ith student (i = 1 ... I), and that g_{q+}, g_{q-} are the first principal co-ordinates of the pair of points corresponding to question q (q = 1 ... Q). The transition formula from columns to rows is:

$$f_i = (1/\sqrt{\lambda}) \sum_q \{ y_{iq} g_{q+} + (t_q - y_{iq}) g_{q-} \} / t_. \qquad (9.6.1)$$

where √λ is the square root of the first principal inertia, t_q is the maximum mark for question q and t. is the maximum total mark, 100 in this case (we use the same notation as in Section 6.1). The minimum value of the f_i, when all marks y_{iq} (q = 1 ... Q) are zero, is:

$$f_{\min} = (1/\sqrt{\lambda}) \sum_q t_q g_{q-} / t_.$$

In order to scale the f_i to lie between 0 and t., as a more convenient comparison with the total mark which also lies between 0 and t., it is clear that we first need to subtract f_min from all the f_i, which gives:

$$f_i - f_{\min} = (1/\sqrt{\lambda}) \sum_q y_{iq} (g_{q+} - g_{q-}) / t_. \qquad (9.6.2)$$

and then multiply f_i − f_min by a constant l so that the maximum value of l(f_i − f_min), when all the individual marks are maximum, y_{iq} = t_q (q = 1 ... Q), is t.:

$$l (1/\sqrt{\lambda}) \sum_q t_q (g_{q+} - g_{q-}) / t_. = t_.$$

i.e.:

$$l = \sqrt{\lambda}\, t_.^2 / \sum_q t_q (g_{q+} - g_{q-}) \qquad (9.6.3)$$
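The doubling itself is a one-line matrix operation, as the following sketch (with toy marks, not those of Table 9.8) illustrates.

```python
import numpy as np

def double_matrix(Y, t):
    # columns q+ = marks obtained, columns q- = marks lost (t_q - y_iq)
    Y = np.asarray(Y, dtype=float)
    return np.hstack((Y, t - Y))

Y = np.array([[3, 7], [2, 6], [0, 4]])   # toy marks on 2 questions
t = np.array([3, 8])                     # maximum marks of the questions
print(double_matrix(Y, t))
# [[3. 7. 0. 1.]
#  [2. 6. 1. 2.]
#  [0. 4. 3. 4.]]
```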
TABLE 9.8
Marks obtained by a class of students in an actual examination, showing the maximum mark possible for each question, the average mark for each question across the class and the total mark of each student. The students are identified by their positions in the class according to their total marks in this examination.

Questions (maximum marks in brackets)

Position in class   1 (3)  2 (8)  3 (10)  4 (13)  5 (8)  6 (5)  7 (8)  8 (45)  Total (100)

 1   3  7   9  10  7  5  8  43   92
 2   2  6  10   9  8  5  8  43   91
 3   3  7   8  12  8  5  8  39   90
 4   3  8   9   8  7  5  5  45   90
 5   1  7  10   9  5  5  5  45   87
 6   3  6  10   6  8  5  8  41   87
 7   3  6  10  12  7  5  0  43   86
 8   1  8  10   9  8  5  8  36   85
 9   3  8   9  10  6  5  6  36   83
10   3  7  10   6  6  5  5  40   82
11   3  8   9   6  8  5  8  32   79
12   3  8  10   9  8  5  6  29   78
13   1  7   9   9  6  5  8  31   76
14   3  6  10  12  5  5  7  28   76
15   2  7   8  10  8  5  7  26   73
16   1  6  10   8  6  5  8  29   73
17   3  8  10   8  8  5  8  22   72
18   3  8   7   6  7  5  8  28   72
19   3  6   8   9  7  5  7  25   70
20   1  7   9   9  6  4  0  33   69
21   3  8   4   9  8  5  8  24   69
22   3  6   0   8  8  5  3  35   68
23   3  8  10  13  6  5  8  15   68
24   0  6   8   3  5  3  8  34   67
25   3  6   2   6  8  5  6  30   66
26   3  5  10   3  7  5  7  24   64
27   1  6  10   6  6  5  8  21   63
28   3  7   8   0  7  5  7  23   60
29   2  4  10   9  7  5  0  22   59
30   3  6   4   8  7  5  6  19   58
31   3  7   9   0  5  5  6  20   55
32   1  8   1   6  2  2  8  27   55
33   3  4   2   5  8  5  0  26   53
34   2  4   0   5  2  5  6  28   52
35   1  5   4   3  0  5  7  21   46
36   1  5   2   6  0  5  0  26   45
37   0  6   1   0  5  5  0  28   45
38   0  4   7   6  0  5  0  10   32

Average  2.2  6.5  7.3  7.2  6.1  4.9  5.7  29.7   69.6

In which respects could the scaling obtained by correspondence analysis be different from the scaling by the total mark? In view of our discussion in Section 6.1 it is clear that the correspondence analysis will be sensitive to the polarization of the marks on each question, whereas this feature is not taken into account at all by the total mark. Figure 9.9 shows the correspondence analysis of the doubled data matrix.

[Fig. 9.9 appears here (λ₁ = 0.112, 36.3%; λ₂ = 0.0591, 19.2%).] FIG. 9.9. Optimal 2-dimensional display, by correspondence analysis, of the data matrix of Table 9.8, doubled column-wise with respect to the maximum mark for each question; point q+ indicates the original qth column and q− its doubled counterpart, q = 1 ... 8. The vector of total marks and its doubled counterpart are represented as supplementary column points T+ and T− respectively.

The linear combination of the individual marks y_{iq} (q = 1 ... Q) to obtain the position on the redefined scale, with minimum at 0 and maximum at t. = 100, is, from (9.6.2) and (9.6.3), l(f_i − f_min) = ∑_q α_q y_{iq}, where the coefficients α_q are:

$$\alpha_q = t_. (g_{q+} - g_{q-}) / \sum_q t_q (g_{q+} - g_{q-}) \qquad (9.6.4)$$

The values of α_q, as well as the maximum marks t_q and the average marks ȳ_q for this particular example, are as follows:

Question q:        1      2      3      4      5      6      7      8
Maximum t_q:       3      8     10     13      8      5      8     45
Average ȳ_q:      2.18   6.47   7.29   7.18   6.05   4.87   5.68  29.66
Coefficient α_q:  1.111  0.811  1.672  0.757  1.540  1.009  1.227  0.810

It is easily checked that ∑_q α_q t_q = 100. The fact that the coefficient α₃ = 1.672 is substantially greater than 1 means effectively that the 10 marks allocated to this question are upgraded to over 16 (1.672 × 10 = 16.72). The marks for this question are both quite variable and polarized. Thus the few students who receive less than full marks, and in this way fail to achieve what most of the others succeed in achieving, are penalized more than their loss of marks implies. In a similar fashion, if the students had mostly obtained 0 for a question, then the student who does manage to gain marks receives more than his actual marks gained, since he succeeds where most others have failed. This is one of the features which distinguishes the correspondence analysis scaling from other scalings of such data.
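These coefficients are easily checked numerically; the snippet below uses the rounded α_q and t_q quoted above, and restates (9.6.4) as a function of the principal co-ordinates g_{q+} and g_{q−}.

```python
import numpy as np

t_q   = np.array([3, 8, 10, 13, 8, 5, 8, 45])
alpha = np.array([1.111, 0.811, 1.672, 0.757, 1.540, 1.009, 1.227, 0.810])
print(np.sum(alpha * t_q))   # approximately 100, up to rounding of the alphas

def alphas(g_plus, g_minus, t_q, t_tot=100.0):
    # equation (9.6.4): the alphas are proportional to g_{q+} - g_{q-}
    d = np.asarray(g_plus) - np.asarray(g_minus)
    return t_tot * d / np.sum(np.asarray(t_q) * d)
```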
The doubled profile of the total marks can be represented as two supplementary points in the same space as the doubled questions (Fig. 9.9). The correlation between the total mark profile vector and the first principal axis (i.e. the new linear combination of the marks defined above) is found to be 0.982, implying an angle of just under 11 degrees.

Focusing the total mark vector on the first principal axis

In most cases we shall want to accept the total mark as the best overall summary of the students' ability. Then the question of interest is how much variation and what sort of patterns are in the data which are uncorrelated with the total mark. In Fig. 9.9 the dimension defined by the total mark is lined up quite well with the first principal axis of the correspondence analysis, but we really want to force the first principal axis to coincide with this dimension. A convenient way of doing this is to introduce the doubled total mark into the analysis as a pair of included variables T+ and T−, and then to increase their mass (if necessary) until the first axis is totally aligned with this dimension. Notice that this strategy of focusing affects neither the centroid of the display nor the relative values of the inertias of the component questions (cf. (6.1.4)).

If T+ and T− are assigned masses which sum to 1, then their inertia in
the full space is computed as 0.1009, roughly a third of the total inertia (0.3020) of the 8 doubled questions. When the individual marks and the total marks are analysed simultaneously, the set of 8 questions receives half the mass it had before and the total mark also receives half the mass on which the above inertia calculation is based. Because the centroid has not changed, the inertias of the individual questions are halved, hence stay in the same proportion, so that the sum of the inertias of the 8 questions is 0.1510, and the inertia of T+ and T− is similarly 0.0505. This gives a total inertia of 0.1510 + 0.0505 = 0.2015. Because the profiles T+ and T− are averages of the respective sets of profiles 1+ ... 8+ and 1− ... 8−, it is clear that the introduction of the total marks into the analysis can only decrease the total inertia of the cloud of points.

It can be similarly argued that the first principal inertia (0.1055) in the combined analysis must be less than the first principal inertia (0.1116) of the analysis of the questions alone. Of course the percentage of inertia represented by the first axis can increase, as it does in this example: the introduction of T+ and T− increases the percentage of inertia from 37.0% to 52.4% because of the high correlation between the total mark and the first axis. Geometrically this is obvious because mass is being concentrated very close to the original first principal axis.
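Computationally, the focusing amounts to appending the doubled total-mark columns with a controllable weight before repeating the correspondence analysis. A minimal sketch, assuming Z is the 38 × 16 doubled marks matrix and recalling that column masses in correspondence analysis are proportional to column totals:

```python
import numpy as np

def focus_on_total(Z, total, t_tot=100.0, w=1.0):
    # append T+ and T-, scaled so the pair carries w times the mass of the
    # 8 doubled questions together; increasing w pulls the first principal
    # axis further toward the total-mark dimension
    T = np.column_stack((total, t_tot - total)).astype(float)
    T *= w * Z.sum() / T.sum()
    return np.hstack((Z, T))
```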


In the correspondence analysis of the combined data, the first principal axis is almost exactly aligned with the total mark dimension (the angle cosine is 0.998). The mass of T+ and T− can be increased, but the focusing of the axis is for all practical purposes complete. The correlations (angle cosines) of the individual questions with the first principal axis in the original analysis and in the "focused" analysis are:

Questions q:             1      2      3      4      5      6      7      8
Original correlations:  0.452  0.574  0.706  0.546  0.706  0.190  0.494  0.698
Focused correlations:   0.436  0.565  0.643  0.567  0.678  0.164  0.452  0.764

The focusing, which has increased the correlation with the total mark dimension from 0.982 to 0.998, generally decreases the correlations with the individual questions, except for questions 4 and 8 which have the highest maximum marks.

In order to investigate the subsequent axes for possible patterns orthogonal to the total mark, we consider the display of the points with respect to axes 2 and 3 of the focused analysis (Fig. 9.10). The first 5 questions play minor roles in this display and this plane thus shows the variation amongst the students in tackling the last 3 questions. For example, students ranked 15, 17, 23, 26 and 27 performed relatively badly on question 8. The large distance between 7+ and 7− is chiefly due to students 7 and 20 who obtained 0 for this question.

[Fig. 9.10 appears here (λ₂ = 0.0301, 15.0%; λ₃ = 0.02111, 10.8%).] FIG. 9.10. Display with respect to the second and third principal axes of the correspondence analysis of the doubled matrix of marks, where the first principal axis has been focused on the total mark. (In other words, this is the optimal 2-dimensional display orthogonal to the dimension defined by the total marks; the contribution of the total mark vector to this display is (almost) zero.) Only the students which have prominent correlations with this plane are indicated. Notice that the question points are allocated a total mass of ½ in this particular focusing, so that these principal inertias should be multiplied by 2 for comparison with those of the unfocused analysis, assuming a zero contribution by the points T+ and T−.
TABLE 9.9
Estimated protein consumption, in g per head per day, in 25 countries and from 9 protein sources (from Weber, 1973, Agrarpolitik im Spannungsfeld der internationalen Ernaehrungspolitik, Kiel: Institut fuer Agrarpolitik und Marktlehre (mimeographed); see Gabriel, 1981, p. 151).

                               MEAT  PIPL  EGGS  MILK  FISH  CERS  STAR  NUTS  FRVG

Albania             ALBA       10.1   1.4   0.5   8.9   0.2  42.3   0.6   5.5   1.7
Austria             AUST        8.9  14.0   4.3  19.9   2.1  28.0   3.6   1.3   4.3
Belgium/Luxembourg  BELX       13.5   9.3   4.1  17.5   4.5  26.6   5.7   2.1   4.0
Bulgaria            BULG        7.8   6.0   1.6   8.3   1.2  56.7   1.1   3.7   4.2
Czechoslovakia      CZEC        9.7  11.4   2.8  12.5   2.0  34.3   5.0   1.1   4.0
Denmark             DENM       10.6  10.8   3.7  25.0   9.9  21.9   4.8   0.7   2.4
East Germany        EGER        8.4  11.6   3.7  11.1   5.4  24.6   6.5   0.8   3.6
Finland             FINL        9.5   4.9   2.7  33.7   5.8  26.3   5.1   1.0   1.4
France              FRAN       18.0   9.9   3.3  19.5   5.7  28.1   4.8   2.4   6.5
Greece              GREE       10.2   3.0   2.8  17.6   5.9  41.7   2.2   7.8   6.5
Hungary             HUNG        5.3  12.4   2.9   9.7   0.3  40.1   4.0   5.4   4.2
Ireland             IREL       13.9  10.0   4.7  25.8   2.2  24.0   6.2   1.6   2.9
Italy               ITAL        9.0   5.1   2.9  13.7   3.4  36.8   2.1   4.3   6.7
Netherlands         NETH        9.5  13.6   3.6  23.4   2.5  22.4   4.2   1.8   3.7
Norway              NORW        9.4   4.7   2.7  23.3   9.7  23.0   4.6   1.6   2.7
Poland              POLA        6.9  10.2   2.7  19.3   3.0  36.1   5.9   2.0   6.6
Portugal            PORT        6.2   3.7   1.1   4.9  14.2  27.0   5.9   4.7   7.9
Rumania             RUMA        6.2   6.3   1.5  11.1   1.0  49.6   3.1   5.3   2.8
Spain               SPAI        7.1   3.4   3.1   8.6   7.0  29.2   5.7   5.9   7.2
Sweden              SWED        9.9   7.8   3.5  24.7   7.5  19.5   3.7   1.4   2.0
Switzerland         SWIT       13.1  10.1   3.1  23.8   2.3  25.6   2.8   2.4   4.9
United Kingdom      UK         17.4   5.7   4.7  20.6   4.3  24.3   4.7   3.4   3.3
Russia              USSR        9.3   4.6   2.1  16.6   3.0  43.6   6.4   3.4   2.9
West Germany        WGER       11.4  12.5   4.1  18.8   3.4  18.6   5.2   1.5   3.8
Yugoslavia          YUGO        4.4   5.0   1.2   9.5   0.6  55.9   3.0   5.7   3.2
Students 24 and 32, who stumbled on question 6 (where everyone else had done so well), account for the large contribution of this question on the third axis. Notice the position of student 29 who did well on 6, badly on 7 and not too well on 8, compared with the average across the class. This plane thus shows the students who did not follow the average pattern for the last 3 questions. This information could be useful, for example, if there seems to have been insufficient time to complete the examination, in which case the lecturer might consider the marks obtained by the students for questions 1 to 5 and questions 6 to 8 separately in his assessment of the class.

9.5.3 Discussion

The chief reason, if not the only reason, of an examination is to arrive at an ordering of the students. If the examination has been carefully constructed with marks allocated in terms of the importance of different sections of the syllabus, then the total mark certainly provides that ordering. However, given a specific set of results, the total mark is almost certainly not the most discriminating linear combination of the marks in any specific statistical sense. Correspondence analysis of the doubled matrix of marks is a technique of identifying a linear combination of the marks which maximizes a measure of discrimination between students. Notice that we are not saying that this is a more equitable way of combining the marks together in order to obtain a total mark, but rather that this is a different way, based on the global set of marks of the actual class, and that it might be of interest to study these results to understand more fully the way the examination has tested the students. The strategy of focusing also helps in the understanding of more subtle features in the marks which are uncorrelated with the usual ordering of the students in terms of their total marks.

9.6 PROTEIN CONSUMPTION IN EUROPE AND RUSSIA

In this application we compare the results of correspondence analysis with those of principal components analysis, using the same data set. The question of internal stability of the displays is discussed, and it is shown why a very highly contributing row or column is best treated as a supplementary point.

9.6.1 Data

These data are estimated protein consumptions from 9 different sources, by inhabitants of 25 countries (Table 9.9). Here the data are neither contingency nor frequency in nature, as in the previous applications, but they are analogous to frequencies in that a total mass of protein is distributed over the cells of the matrix in units of 0.1 g (per head per day).

9.6.2 Method

Two analyses are performed on these data:

Analysis 1. A principal components analysis on the data centred with respect to the column means (Appendix A, Table A.1(1)). When a data set involves measurements on different scales, they are usually pre-standardized so that each vector of measurement has unit variance. However, the scale of measurement is the same throughout the present table and rescaling seems unnecessary, although it will become apparent that the largest protein sources do play an overwhelming role in the analysis.

Analysis 2. A correspondence analysis. A row point thus represents the profile of protein consumption in the particular country. The total consumption is not used in the point's position but rather as a mass to weight the point. It is thus not the absolute amounts but the dietary preferences which are displayed, while the χ²-distance (between countries, say) tries to correct for the large differences between highly consumed proteins.

9.6.3 Results and interpretation

Tables 9.10 and 9.11 list the complete numerical results from both analyses. Notice that all co-ordinates are principal co-ordinates, including the "co-ordinates" of the proteins in the principal components analysis (standard computer packages usually give standard co-ordinates, sometimes called "standardized scores"). In both analyses a few points contribute substantially to the major principal axes. We first investigate the internal stabilities of the displays before proceeding with their interpretation.

Principal components analysis

The display of the countries with respect to the first principal plane is shown in Fig. 9.11 (the arrows are explained later). Bulgaria and Yugoslavia contribute the most to the first principal axis, 0.183 × 149.0 = 27.3 and 0.178 × 149.0 = 26.5 respectively. Because there is such a large difference between the first and second principal inertias (variances), namely 149.0 − 29.5 = 119.5, it is clear that the first axis would not rotate very much if either of these points were removed. To put an upper bound on the angle of rotation of the first axis if Bulgaria, say, were removed, the quantity h of (8.1.6) is first computed as: h = (1/0.96)(0.134 × 209.7)/(149.0 − 29.5) = 0.245.
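Readers who wish to reproduce the two analyses numerically may use the following compact sketch (standard SVD computations, not the software used for this book); it should recover the row co-ordinates of Tables 9.10 and 9.11 up to reflections of the axes.

```python
import numpy as np

def pca_row_coords(X, k=2):
    # Analysis 1: principal components of the column-centred data
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]                    # principal co-ordinates of rows

def ca_row_coords(N, k=2):
    # Analysis 2: correspondence analysis via the standardized residuals
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    return U[:, :k] * s[:k] / np.sqrt(r)[:, None]

# X = ...  # the 25 x 9 matrix of Table 9.9 as a numpy array
# pca_row_coords(X); ca_row_coords(X)
```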
TABLE 9.10
Decomposition of inertia (variance) in the principal components analysis of Table 9.9, in a similar format to that of a correspondence analysis (cf. Table 9.11), for the first two principal axes. QLT and MASS are multiplied by 1000, COR by 1000, and INR and CTR are expressed as permills (thousandths); the co-ordinates (K=1, K=2) have not been multiplied by 1000 in this case. The total variance is 209.7 and the first two principal variances are 149.0 (71.1%) and 29.5 (14.1%) respectively.

(a) Name   QLT  MASS  INR   K=1   COR  CTR   K=2   COR  CTR
 1  ALBA   776   40    49  -2.8   769   53  -0.3     7    2
 2  AUST   434   40    14   1.1   402    8   0.3    32    3
 3  BELX   753   40    10   1.2   711   10  -0.3    42    3
 4  BULG   983   40   134  -5.2   967  183   0.7    16   15
 5  CZEC   324   40     9  -0.7   232    3  -0.4    92    6
 6  DENM   885   40    42   2.8   876   52   0.3     9    3
 7  EGER   253   40    23   1.0   196    6  -1.7    57   95
 8  FINL   831   40    64   2.5   450   40   2.3   381  173
 9  FRAN   398   40    19   1.3   394   11   0.1     4    1
10  GREE   611   40    28  -1.8   549   22   0.6    62   12
11  HUNG   684   40    34  -2.2   653   31  -0.5    31    8
12  IREL   941   40    34   2.4   784   38   1.1   157   38
13  ITAL   763   40    10  -1.3   731   11  -0.3    32    2
14  NETH   825   40    33   2.4   799   37   0.4    26    6
15  NORW   721   40    32   2.2   721   33   0.0     0    0
16  POLA   342   40     9  -0.5   142    2   0.6   200   12
17  PORT   856   40    63  -0.2     2    0  -3.4   854  381
18  RUMA   990   40    71  -3.8   972   98   0.5    18    9
19  SPAI   834   40    26  -0.4    27    1  -2.1   807  149
20  SWED   926   40    45   3.0   924   59   0.1     2    1
21  SWIT   879   40    21   1.8   736   22   0.8   143   22
22  UK     615   40    27   1.9   612   23   0.1     3    1
23  USSR   880   40    28  -2.1   753   30   0.9   127   26
24  WGER   887   40    42   2.7   836   49  -0.7    51   15
25  YUGO   991   40   130  -5.1   973  178   0.7    18   17

(b) Name   QLT  MASS  INR    K=1   COR  CTR   K=2   COR  CTR
 1  MEAT   363  111    51    1.8   315   23   0.7    48   18
 2  PIPL   195  111    62    1.6   191   17   0.2     4    2
 3  EGGS   573  111     6    0.8   562    5   0.1    11    0
 4  MILK   976  111   231    5.2   556  181   4.5   420  690
 5  FISH   442  111    53    1.6   216   16  -1.6   226   85
 6  CERS   997  111   551  -10.5   955  741   2.2    42  165
 7  STAR   326  111    12    0.8   260    4  -0.4    66    6
 8  NUTS   549  111    18   -1.4   511   13  -0.4    38    5
 9  FRVG   290  111    15   -0.2    20    0  -0.9   270   29
TABLE 9.11
Decomposition of inertia in the correspondence analysis of Table 9.9, for the first two principal axes. The information for the third principal axis is also given for future reference, but this has not been included in the quality (QLT) of the points' planar display. The total inertia is 0.1690 and the first three principal inertias are 0.0865 (51.2%), 0.0390 (23.1%) and 0.0200 (11.8%) respectively.

(a) Name   QLT  MASS  INR   K=1  COR  CTR   K=2  COR  CTR   K=3  COR  CTR
 1  ALBA   763   33    74  -530  744  108   -85   19    6  -242  156   98
 2  AUST   676   40    24   149  222   10  -212  454   47   148  218   44
 3  BELX   590   41    10   159  581   12   -19    9    0    53   65    6
 4  BULG   910   42    76  -516  881  130   -92   29    9   -10    0    0
 5  CZEC   343   39    16   -42   27    1  -146  316   21   178  467   62
 6  DENM   837   42    48   387  777   72   107   60   12   -26    4    1
 7  EGER   196   35    25   151  189    9    30    7    1   274  621  133
 8  FINL   429   42    58   312  421   47   -42    8    2  -312  423  206
 9  FRAN   385   46    20   167  372   15    31   13    1    35   16    3
10  GREE   602   46    35  -220  376   26   171  226   34  -146  165   49
11  HUNG   615   39    43  -293  470   39  -162  145   27   221  265   96
12  IREL   884   43    32   281  617   39  -184  267   37   -43   15    4
13  ITAL   604   39    16  -197  561   18    54   43    3   -17    5    1
14  NETH   843   39    30   263  530   32  -201  313   41    84   54   14
15  NORW   739   38    41   286  452   36   228  287   51  -177  175   60
16  POLA    86   43    12   -13    4    0   -60   82    4   102  229   23
17  PORT   941   35   128   -69    8    2   757  933  518   174   50   54
18  RUMA   955   41    51  -439  911   91   -95   44   10   -31    5    2
19  SPAI   789   36    43  -156  122   10   367  667  125    91   41   15
20  SWED   816   37    37   367  795   58    60   21    3  -130  101   32
21  SWIT   683   41    20   178  390   15  -153  293   25   -41   21    4
22  UK     320   41    28   191  320   17    -5    0    0  -125  139   33
23  USSR   469   43    18  -181  463   16   -20    6    0   -90  116   18
24  WGER   802   37    30   309  694   41  -121  108   14   148  160   41
25  YUGO   953   41    85  -569  934  155   -80   19    7   -41    5    4

(b) Name   QLT  MASS  INR   K=1  COR  CTR   K=2  COR  CTR   K=3  COR  CTR
 1  MEAT   336  115    65   176  322   41   -36   14    4   -63   42   23
 2  PIPL   483   92   116   223  234   53  -230  249  126   316  468  461
 3  EGGS   640   34    28   284  590   32   -81   50    6   104   79   19
 4  MILK   754  199   173   315  679  229  -104   75   56  -170  199  291
 5  FISH   962   50   198   355  188   73   720  774  663    -2    0    0
 6  CERS   966  376   235  -317  956  438   -31   10   10   -19    4    8
 7  STAR   364   50    44   203  276   24   115   88   17   165  183   68
 8  NUTS   740   36    87  -506  625  106   218  115   44   -71   13    9
 9  FRVG   354   48    54   -76   32    3   246  322   75   224  266  121
(Notice that the inertia of the 4th point is 0.134 × the total inertia, this being equal to w_4(f_{41}^2 + f_{42}^2 + ...) in (8.1.6).) Hence, by (8.1.8), φ < 7.1°. To evaluate the tighter bound we first compute θ_{41} to be 10.5°, since cos² θ_{41} = 0.967 in Table 9.10(a). Hence, by (8.1.10), φ < 3.3°. Clearly the first axis is internally stable.

On the second axis the point Portugal contributes 0.381 of the inertia, namely 0.381 × 29.5 = 11.2. The inertia of Portugal which is along axes 2, 3, ... etc. is its total inertia 0.063 × 209.7 = 13.2 minus that part along the first axis 0.002 × 149.0 = 0.3. Since λ₃ = 15.0, h is evaluated as 0.927 and the rough upper bound for φ (if Portugal were removed) is 34°, with the tighter bound of 31°. Although the second axis would undergo a substantial rotation, it would not be enough to label the axis unstable. Notice that the points discussed above lie very close to the principal axes mentioned, so that there is no need to consider the possibility of "diagonal" spatial rotations of the plane.

From Table 9.10(b) it can be seen that the variables milk and cereals play an overwhelming role in determining the principal plane: their joint contributions to the plane are proportions 0.922 and 0.855 of the respective principal inertias. Reasoning as before, we would expect that the principal plane would hardly change if the analysis were repeated using just these two sources of protein as variables. The arrows in Fig. 9.11 show the approximate movements of the points when this is done, and the change in the configuration is minimal. This illustrates how the most consumed proteins dominate the principal components analysis through their high variance. Some type of standardization is needed, or alternatively, the columns of high magnitude can be downweighted, possibly in steps, so that patterns of a more multidimensional nature come into view.

[Fig. 9.11 appears here (λ₁ = 149.0, 71.1%; λ₂ = 29.5, 14.1%).] FIG. 9.11. Optimal 2-dimensional display, by principal components analysis, of the data matrix of Table 9.9 which has been centred with respect to column means. The lines emanating from each point indicate changes in position when the analysis is repeated using the data for columns MILK and CEREALS only.

[Fig. 9.12 appears here (λ₁ = 0.0865, 51.2%; λ₂ = 0.0390, 23.1%).] FIG. 9.12. Optimal 2-dimensional display, by correspondence analysis, of Table 9.9. Notice that the co-ordinates on the second axis are opposite in sign to those in Table 9.11. We have reversed the second axis in the display to facilitate comparison with Fig. 9.11.

Correspondence analysis
The correspondence analysis display of Fig. 9.12 only represents the differences in "shape" of the data vectors, while the principal components analysis display represents both "size" and "shape". Whereas the point Portugal contributed highly to the second axis of Fig. 9.11 because of Portugal's generally low protein consumption (a feature of "size"), in Fig. 9.12 it has a very high contribution to the second axis (51.8%, see Table 9.11(a)) because of its unusually high consumption of fish (a feature of "shape"). The stability of the second axis with respect to the removal of Portugal can be investigated as before. Condition (8.1.5) is not satisfied in this case, so it seems likely that the second axis could be unstable.
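The contributions just quoted (the CTR columns of Tables 9.10 and 9.11) can be recomputed from first principles, as in the following sketch; each column of the result sums to 1 over the rows.

```python
import numpy as np

def ca_row_ctr(N, k=2):
    # CTR of row i on axis m: r_i * f_im^2 / lambda_m, the proportion of
    # the m-th principal inertia accounted for by that row (Portugal's
    # value on the second axis is 0.518 in the present analysis)
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    F = U[:, :k] * s[:k] / np.sqrt(r)[:, None]
    return r[:, None] * F**2 / s[:k]**2
```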
The correspondence analysis is thus repeated with Portugal as a supplementary point (Fig. 9.13 and Table 9.12). This 2-dimensional display is now stable and provides an excellent "protein map" of the countries, with clearly defined and well separated regions corresponding to southern Europe, eastern Europe and northern/central Europe.

[Fig. 9.13 appears here (λ₁ = 0.0900, 58.5%; λ₂ ≈ 0.0248, 16.1%).] FIG. 9.13. Optimal 2-dimensional display, by correspondence analysis, of Table 9.9, with Portugal excluded but displayed as a supplementary point.

[Table 9.12, the corresponding decomposition of inertia with Portugal as a supplementary point, appears here as a rotated page; its values are not recoverable from this copy.]
9.6.4 Discussion

Figure 9.13 shows more variation amongst the non-Mediterranean countries than Fig. 9.12, thanks to the removal of Portugal as a contributing point. In Fig. 9.12 the second axis is more than 50% determined by Portugal and, in particular, by a single element of Portugal's profile, the fish consumption. It is preferable to remove the influence of this obvious and isolated feature from the display so that more subtle multidimensional patterns can be investigated.

Principal components analysis and correspondence analysis treat the data differently and it is thus unfair to judge either as being better. However, these data are definitely ratio measurements, as opposed to interval measurements, and we do feel that correspondence analysis is better suited to ratio data. Gabriel (1981) illustrates the biplot using these data, and his analysis is a variation of principal components analysis which we call the "covariance biplot" (see Appendix A, Table A.1(4)). Here the columns rather than the rows are scaled by the singular values. As in Fig. 9.11, it is difficult to separate the differences in "size" (absolute protein consumption) and "shape" (relative consumption) of the data vectors, whereas correspondence analysis concentrates on shape patterns only. It is convenient in correspondence analysis to represent the "size" (i.e. mass in this case) in the form of the size, for example, of the displayed point.

9.7 SERIATION OF THE WORKS OF PLATO

This application shows how well the bootstrapping of a correspondence analysis display can agree with more conventional statistical analysis of a contingency table.

9.7.1 Data

These data, published originally by Cox and Brandwood (1959), are also given by Mardia et al. (1979, p. 314). Each of 7 works of Plato (Republic, Laws, Critias, Philebus, Politicus, Sophist and Timaeus) is characterized by the frequency distribution of 32 types of sentence endings, that is the 2⁵ possible ways the last 5 syllables of each sentence can be classified, each syllable being classified long or short. Because it is known that Republic and Laws were written, respectively, before and after the other 5 works, Cox and Brandwood develop a method of discriminating between these two major works in order to derive an ordering of the other works between these two "endpoints". The assumption is that "Plato's change in literary style was monotone in time", and their results consist of estimated scores and standard errors for each of the intermediate works.
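A two-line illustration of the row labelling used in Fig. 9.14 below, where each 5-syllable ending is read as a binary number with long = 1 and short = 0:

```python
def ending_label(pattern):
    # "long short short short long" -> "10001" in binary -> 17 in decimal
    bits = "".join("1" if syllable == "long" else "0" for syllable in pattern)
    return int(bits, 2)

print(ending_label(["long", "short", "short", "short", "long"]))   # 17
```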
The term "seriation", a synonym for ordination (cf. Section 4.2), is often used in the context of archaeological and historical data, especially when the ordering is presumed to be temporal (see Kendall, 1971).

[Fig. 9.14 appears here (λ₁ = 0.0917, 69.0%; λ₂ = 0.0212, 15.9%).] FIG. 9.14. Optimal 2-dimensional display, by correspondence analysis, of the Plato data (Cox and Brandwood, 1959). The rows are labelled by the decimal equivalent of the 5-syllable sentence endings considered as binary numbers. For example, the sentence ending of 1 long syllable, 3 short and 1 long is coded as 10001 in binary, hence 17 in decimal.

9.7.2 Method and results

Figure 9.14 shows the 2-dimensional correspondence analysis of these data. This display was bootstrapped in the way described previously (Section 8.1) and Fig. 9.15 shows the replicates of the works (columns) only. Notice that we are resampling from what is essentially population data (cf. the justification by Cox and Brandwood, 1959, that there is a dispersion of sentence endings between different books of the same work).
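A sketch of this column-wise bootstrap, resampling each work's endings as a multinomial with the work's own total (the replicate profiles are then projected onto the plane of Fig. 9.14 by the usual transition formulae):

```python
import numpy as np

rng = np.random.default_rng(1959)

def bootstrap_tables(N, n_rep=100):
    # N[i, j] = count of ending i in work j; each replicate resamples
    # every column independently, keeping the column totals fixed
    N = np.asarray(N)
    reps = []
    for _ in range(n_rep):
        cols = [rng.multinomial(N[:, j].sum(), N[:, j] / N[:, j].sum())
                for j in range(N.shape[1])]
        reps.append(np.column_stack(cols))
    return reps
```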
Since the first principal axis provides optimal scores of the works (Section 4.3), it is interesting to compare it with the scores derived by Cox and Brandwood. Figure 9.16 shows the two sets of scores plotted against each other, where the optimal scores have been rescaled to have the same variance as Cox and Brandwood's scores, the score for each work being weighted by the number of sentences in the work. For each score, 2 standard errors are indicated in the case of Cox and Brandwood's analysis, while 2 standard deviations are indicated in the case of the bootstrapped correspondence analysis. The concordance, both of the scores and their variabilities, is astounding and the only clear difference is that correspondence analysis gives a lower score for Politicus.

[Fig. 9.15 appears here.] FIG. 9.15. Bootstrapped display of the columns (works of Plato) of Fig. 9.14; replicated points are labelled as follows: 1. Republic, 2. Laws, 3. Critias, 4. Philebus, 5. Politicus, 6. Sophist, 7. Timaeus.

9.7.3 Discussion

Correspondence analysis results in a similar seriation of the works because the two "endpoint" works, Republic and Laws, are in fact the most different in terms of profiles of sentence endings, that is the χ²-distance between them is the highest. Also their masses (proportional to number of sentence endings, that is size of the work) are the highest, which further establishes their serving as endpoints (or poles) along the first principal axis. Agreement might not
<.O
co
»
z
-i
m
r
O
"lJ
m
n
m
Z
rJ)
e
rJ)

o
»
-i
»
z
»
11
:JJ

9I~.. -j~ f
n
»
z -
G)
»
~ ~
m en
:JJ
m
rJ)
m
:JJ
<
m
rJ)

TABLE 9.13
Frequencies of antelope tribes in African wildlife areas (Greenacre and Vrba, 1984); the rows and columns have been rearranged according to the first principal co-ordinates (e.g. the ordering in Fig. 9.17). For census sources see Vrba (1980). The last column gives the surface area surveyed, in km².

Wildlife areas
and abbreviations      ANT     ALC    HIP    RED    TRA    NEO   CEP     AEP    BOV     Total   Area

Ngorongoro    NGO     5000   13628      0    120    400      0     0       0     60     19208    360
Lake Turkana  LAK     1087    2023   1342      0      0      0     0       0      0      4452   2050
Etosha        ETO    12000    4600   4796      0   2500    820   250       0      0     24966  22270
Kalahari      KAL    24041   20556  16073      0   6569   1645   710       0      0     69594  36692
Serengeti     SER   190000  455000   5000   5500   9500      0     0   65000  50000    780000  25500
Nairobi       NAI      845    1348      0    105     78      6     2     633      0      3017    122
Kafue         KAF        0     190     31    198     42     90    30       2     85       668     91
Bicuar        BIC        0     500    500    150    450    350   500     150    100      2700   7900
Luando        LUA        0       0   3000   2250    900    250  1100       0    150      7650   8280
Cuelei        CUE        0     250    200   1600    750   1000   500       0      0      4300   4500
Hluhluwe      HLU        0    3509      0   1009   3202      0    15    4894   2195     14824    960
Mkuzi         MKU        0    1397      0     69    533      6     4    9394      0     11403    323
Wankie        WAN        0    2630   2620   1250   5450   4000  2000    8000  13000     38950  13300
Quicama       QUI        0       0   1500   1500   5500      0  3500       0   8000     20000   9960
Kruger        KRU        0   10000   1400   4730  11124   5700  1300  153000  24200    211454  19084
Manyara       MAN        0       0      0     55     25      0     0     700   1500      2280     93

Total               232973  515631  36462  18536  47023  13867  9911  241773  99290   1215466
9.8.1 Data

The original data of this study are observed numerical frequencies of antelope at the generic and tribal levels in 16 sub-Saharan African wildlife areas. There are 17 genera which constitute 9 tribes. There are thus two data matrices of interest: the 16 × 9 matrix of tribal frequencies (reproduced in Table 9.13) and the 16 × 17 matrix of generic frequencies respectively (see Greenacre and Vrba, 1984, Table 1). The sources of the census data are given by Vrba (1980, Table 14.2), who also performs a detailed interpretation of these data.

Also available are a number of categorizations of the areas in terms of their rainfall, bush cover, altitude, longitude, biomass, tribal diversity, etc., given also by Greenacre and Vrba (1984). Although these are rather crude summaries of complex ecological variables (only 3 categories for each variable), they nevertheless provide useful supplementary information for interpreting the frequency data.

9.8.2 Method

The wide range of the total antelope frequencies in each game reserve reflects the tremendous differences in size of these areas (see last two columns of Table 9.13). The profiles of each game reserve would normally be assigned masses proportional to these totals, with larger reserves like Kruger Park and Serengeti receiving very high masses. There is in fact no ecological significance in assigning masses to game reserves proportional to their total antelope population, since it is primarily history and politics that have determined their sizes. This is a case where correspondence analysis should not be applied to the raw data, even though they are frequencies. Instead, it was decided to re-express the data as frequencies per unit area, which is equivalent to reweighting the game reserve profiles by their total number of antelope per unit area (square km). This puts the different areas more on a par with one another for purposes of comparison. The actual mechanics of this procedure is simply to divide the rows of Table 9.13 by the respective surface areas prior to performing the correspondence analysis. This changes the row masses, not the row profiles, but the metric in the space of the row profiles is altered, as are the antelope profiles in the dual space. In this way less emphasis is placed on antelope tribes which had high mass merely because they are present in the larger game reserves.

The supplementary categorical data can be coded as a logical indicator matrix and each category can be represented in the graphical displays to assist the interpretation. Also a cluster analysis of the game reserves can be represented on the same displays, as described in Section 7.4.
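The area correction is a simple row reweighting, as the following sketch with toy numbers confirms: the row profiles are untouched while the masses change.

```python
import numpy as np

def area_correct(N, areas):
    # divide each reserve's counts by its surface area; this changes the
    # row masses (and hence the metric) but leaves the row profiles intact
    return np.asarray(N, dtype=float) / np.asarray(areas, dtype=float)[:, None]

N = np.array([[10.0, 30.0], [20.0, 20.0]])
A = np.array([5.0, 100.0])
Nc = area_correct(N, A)
print(N / N.sum(axis=1, keepdims=True))    # original profiles
print(Nc / Nc.sum(axis=1, keepdims=True))  # identical profiles, new masses
```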
9.8.3 Results and interpretation

In the correspondence analysis of the area-corrected tribal data of Table 9.13, the game reserves and antelope tribes are projected onto the first principal axis as shown in Fig. 9.17. Above each reserve point a code indicates the reserve's category of bushcover: L for low bushcover, M for medium and H for high. The supplementary points for these categories are also displayed. It is immediately clear that all the reserves with low bushcover are on the negative side of this axis and there is a wide gap between these and the other reserves. It was concluded that this principal axis, which is determined solely by the profiles of antelope frequencies in the reserves, is highly associated with the environmental variable bushcover. Antelope tribes which lie on the negative side of this axis would thus be associated with low bushcover reserves.

[Fig. 9.17 appears here (λ₁ = 0.645, 40.1%).] FIG. 9.17. Optimal 1-dimensional display, by correspondence analysis, of the area-corrected data of Table 9.13, with rows indicated above the axis and the columns below. Above each wildlife area a category (L: low, M: medium, H: high) of bushcover is given. The three dummy variables for the discrete variable bushcover can also be displayed as supplementary points in the positions indicated.

The interpretation continued in the same vein and the second, third and fourth axes were attributed to the supplementary variables longitude, biomass and rainfall, respectively (only the first two principal axes are reported by Greenacre and Vrba, 1984, as axes 3 and 4 appear to be unstable). Figure 9.18 shows the display of the points in two dimensions and the ordination of the reserves along the second axis (the vertical axis) turns out to be almost identical to their west-to-east longitude across Africa, as indicated by the positions of the "longitude" categories.

A further interesting aspect of this particular analysis is the way these graphical displays may be used to display comparable fossil frequency data. There were in fact five fossil sites in southern Africa where skulls of the same antelope tribes have been found, dating back to 1½ to 2½ million years ago, the period called the Plio-Pleistocene. Each site thus has a profile across the antelope tribes and its position with respect to the principal axes may be represented as a supplementary point, using the appropriate transition formula.
[Fig. 9.18 appears here (λ₁ = 0.645, 40.1%; λ₂ = 0.385, 23.9%).] FIG. 9.18. Optimal 2-dimensional display, by correspondence analysis, of the area-corrected data of Table 9.13. Dummy variables representing categories of bushcover and longitude are also displayed as supplementary column points. Five supplementary row profiles, derived from frequencies of antelope fossil skulls in 5 fossil sites, are also displayed as supplementary points (indicated by F). Notice that Fig. 9.17 is the projection of this display onto the first (horizontal) axis.

Their positions in the plane of the first two principal axes are also shown in Fig. 9.18. It turned out that four of these sites correlate highly with the first axis and lie well to the negative side of the axis, suggesting that in the Plio-Pleistocene the fossil sites were part of a low bushcover environment.
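The projection in question is easily sketched; here G and lam are assumed to be the tribe co-ordinates and principal inertias of the area-corrected analysis, and h a fossil site's vector of tribal frequencies.

```python
import numpy as np

def project_supplementary(h, G, lam):
    # transition formula: f_m = sum_j (h_j / h.) g_jm / sqrt(lambda_m);
    # the supplementary profile is a weighted average of the column
    # points, expanded on each axis by 1/sqrt(lambda_m)
    h = np.asarray(h, dtype=float)
    return (h / h.sum()) @ np.asarray(G) / np.sqrt(np.asarray(lam))
```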
Although we have not given the subdivision of the tribal frequencies into generic frequencies here, it is interesting to report that axes 1 and 2 of the present tribal analysis are recovered as axes 1 and 3 of the generic analysis (Greenacre and Vrba, 1984). The principal inertias are quite similar:

Tribal analysis:  axis 1, λ₁ = 0.6452;  axis 2, λ₂ = 0.3852
Generic analysis: axis 1, λ₁ = 0.6730;  axis 3, λ₃ = 0.3996

but slightly lower in the tribal analysis, as expected by the results of Example 7.5.4, the tribal frequencies being a condensation of the generic frequencies. Axis 2 of the generic analysis reflects a feature of generic variation within tribes which is obviously absent in the tribal analysis.

9.9 HLA GENE FREQUENCY DATA IN POPULATION GENETICS

This section is a summary of the analyses described by Greenacre and Degos (1977). This is a particularly nice illustration of a case where several principal axes can be interpreted.

9.9.1 Data

There are two sets of data, emanating respectively from the Fifth and Sixth International Histocompatibility Workshops (Bodmer et al., 1973; Bodmer, 1976) (histocompatibility ≡ compatibility of tissue). These were two extensive surveys of human populations aimed principally at studying the so-called HLA chromosomic region (HLA ≡ human leucocyte antigen) on human chromosome number 6. At the time of the Fifth Workshop, in 1972, the HLA complex was known to include 2 serologically defined genes, now called HLA-A (at locus A) and HLA-B (at locus B). Twelve alleles for HLA-A and 15 alleles for HLA-B had been identified, thus already establishing the extraordinary polymorphism of the complex. In this workshop an attempt to test all human populations was made. By the time of the Sixth Workshop, which aimed mainly at an intensive study of Caucasoid populations, knowledge had progressed to the extent of identifying a total of 15 alleles for HLA-A, 20 for HLA-B and 5 for an additional gene in the system, HLA-C. The populations studied and alleles tested in the Fifth Workshop are given in Table 9.14 (see Greenacre and Degos, 1977, for Sixth Workshop populations and alleles).
[Table 9.14, listing (a) the populations studied and (b) the alleles tested in the Fifth Workshop, appears here as a rotated page; its entries are not recoverable from this copy.]

Because each person has a pair (maternal and paternal) of sixth chromosomes and because the HLA system is codominant, a typical HLA typing (or phenotype) might be:

HLA-A3, A28; -B8; -CW2

(The "w" signifies a "workshop" allele which is still in the process of definition.) The interpretation of the A locus is clear since the person is definitely heterozygous with respect to this gene, i.e. the chromosomes are different at this locus, respectively A3 and A28. The fact that only one allele (B8) is identified at the B locus can theoretically mean one of two things: either the person is homozygous for this gene (both chromosomes have the B8 allele) or there is an as yet unidentifiable allele on one of the chromosomes, known as the "blank" allele. If the person were French, say, then we would be almost sure that he is actually homozygous since the B locus alleles are all well-defined and identifiable for European Caucasoid populations thanks to thorough family studies. The C locus result is a bit more difficult since this gene is not yet so well defined, and the possibility exists that the other allele is the "blank" allele.

In any case, given a random sample of individuals from a certain population and their HLA phenotypes it is not difficult to compute an estimate of the frequency of the different alleles (including the "blank") at each locus. The sum of these gene frequencies at a particular locus is 1. In the case of the Fifth Workshop populations we obtained the gene frequencies directly from the Joint Report (Bodmer et al., 1973), while for the Sixth Workshop populations we computed the frequencies from the raw data using random, unrelated individuals only.
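One way of carrying out such an estimation is a small EM-type iteration under Hardy-Weinberg proportions. The sketch below is our own illustration, not the Workshops' procedure, and it ignores the unobservable (and rare) blank/blank class.

```python
import numpy as np

def allele_frequencies(two, one, n_iter=200):
    # two[i, j] (i < j): people typed with antigens i and j (heterozygotes)
    # one[i]           : people showing antigen i alone (i/i or i/blank)
    # returns frequencies of the k identified alleles plus the blank
    k = len(one)
    p = np.ones(k + 1) / (k + 1)              # last entry = blank allele
    n = two.sum() + np.sum(one)               # number of people
    for _ in range(n_iter):
        copies = np.zeros(k + 1)
        for i in range(k):
            copies[i] += two[i, :].sum() + two[:, i].sum()   # one copy each
            # "i alone" is i/i (prob p_i^2) or i/blank (prob 2 p_i p_blank)
            pii, pib = p[i] ** 2, 2 * p[i] * p[k]
            copies[i] += one[i] * (2 * pii + pib) / (pii + pib)
            copies[k] += one[i] * pib / (pii + pib)
        p = copies / (2 * n)                  # 2n gene copies in the sample
    return p                                  # sums to 1 over alleles + blank
```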
9.9.2 Method

The data suggest two correspondence analyses:

Analysis 1. Analysis of all the populations in both workshops with respect to the set of alleles defined in the Fifth Workshop (hence the alleles common to both workshops). Because of the overwhelming majority of Caucasoid populations in the Sixth Workshop we determined the principal axes of inertia using the data from the Fifth Workshop only, with the Sixth Workshop populations as supplementary elements. The allele AW33 and the "blank" alleles at both the loci under consideration were also made supplementary elements after a preliminary analysis.
302 Theory and Applications ofCorrespondence Analysis 9. Applications of Correspondence Analysis 303

mentary elements after a preliminary analysis. Hence this analysis included


samples from 49 Fifth Workshop populations, with 74 Sixth Workshop
samples as supplementary elements. Average profiles of ethnic groups can
also be displayed as supplementary elements, if required. s
Analysis 2. Analysis of the populations from the Sixth Workshop with respect to all the alleles (including those newly defined). Because the majority of populations were European Caucasoid we adopted the following strategy. The principal axes of inertia were determined using the 58 samples of European Caucasoid populations, with the other samples as supplementary elements. Several new alleles had to be made supplementary elements as well, these being outliers with very low masses (i.e. low frequencies of occurrence).

Here we shall mainly discuss the first analysis.
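The supplementary elements used in both analyses are simply additional profiles projected onto principal axes that were determined without them. A minimal sketch of this projection follows; the random matrices merely stand in for the gene-frequency tables and the dimensions are illustrative.

import numpy as np

def ca_axes(N):
    # Principal axes of a correspondence analysis of the active table N.
    P = N / N.sum()                                 # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)             # row and column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    return Vt.T / np.sqrt(c)[:, None]               # column standard coordinates

def row_coords(N, G):
    # Principal coordinates of row profiles, active or supplementary:
    # every profile is projected by the same transition formula.
    return (N / N.sum(axis=1, keepdims=True)) @ G

rng = np.random.default_rng(0)
active = rng.random((49, 24))     # e.g. the Fifth Workshop samples
extra = rng.random((74, 24))      # e.g. the Sixth Workshop samples
G = ca_axes(active)               # axes determined by the active rows only
F_active = row_coords(active, G)
F_extra = row_coords(extra, G)    # displayed, but carrying no mass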

9.9.3 Results and interpretation


In the displays that follow there are 4 different types of points:

Fifth Workshop populations: represented by their abbreviations (see Table 9.14(a)) and framed by a rectangle.

Alleles: represented by their abbreviations (see Table 9.14(b)) and framed by an ellipse.

Sixth Workshop populations: represented simply by a single letter abbreviation.

Ethnic averages for the Sixth Workshop data: represented by one of the following numbers framed by a square:

1 = European Caucasoid
2 = Middle Eastern Caucasoid
3 = Far Eastern Caucasoid
4 = African Negroid
5 = mixed Negroid and Caucasoid
6 = Asian Mongoloid

Note that the ethnic average can be considered as a centroid of all the individuals belonging to the particular ethnic group, although strictly speaking it is not the average gene frequency computed from the raw data.

We shall push the interpretation as far as the sixth principal axis, even though it will appear that only the first four principal axes are stable.

FIG. 9.19. Optimal 2-dimensional display, by correspondence analysis, of the HLA gene frequency data (for abbreviations of the populations, see Table 9.14(a)). The ethnic averages for the Sixth Workshop samples are also displayed as supplementary points (1 to 6, see text). (Axis 1: 26.8% of inertia; axis 2: 13.4%.)

First and second principal axes (Fig. 9.19)

This principal plane, representing 26.8% + 13.4% = 40.2% of the total inertia, clearly demonstrates the degree of separation of the different ethnic groups according to the HLA system. The Oceanic and American Indian samples distinguish themselves clearly from the Asiatic, Middle Eastern, European and African groups. The projections onto the plane of these 4 latter groups occupy different, though overlapping, regions. In the centre of the display are the Asian samples, stretching from Japanese and South East Asian to the Far Eastern Caucasoid samples (India and Pakistan) which lie close to the Middle Eastern and European Caucasoid groups. The Middle Eastern Caucasoid samples (Turks, Yemenites, Lebanese and Arabs) themselves determine a separate region between the Asian region and the region on the positive side of the first axis shared by the Negroid and European groups. Many of the main alleles in European populations are present in the Negroid populations, and in comparison to the large differences of the other populations these two groups are quite similar.

Examining the projections of the alleles in this plane, we notice a corresponding spread of alleles in accordance with their predominance in a particular ethnic group. The alleles which contribute strongly to these axes are BW22, A9, BW40, A2 and BW35.

Although not indicated in Fig. 9.19, the Sixth Workshop samples are consistently situated, without any exceptions, in the region corresponding to their ethnic group. The ethnic averages are displayed and vouch for this fact.

FIG. 9.20. Display with respect to principal axes 3 and 4 of the HLA gene frequency data. The region encased by dotted lines is enlarged in Fig. 9.21.

Thus the main aspects of this data set are displayed as the separation of the Oceanic populations (first axis), particularly from the European and Negroid groups, and then the separation of the American Indian populations from the others (second axis).

Third and fourth principal axes (Figs 9.20 and 9.21)

The percentages of inertia explained by these axes are respectively 9.3% and 8.8%. The proximity of these values suggests two phenomena of almost equal importance and the orientation of the principal axes is likely to be unstable. Therefore an interpretation is more valid in the plane rather than along the separate axes.

The plane clearly represents the spread of the populations which are situated on the positive half of the first principal axis of Fig. 9.19. Stretching diagonally across the plane (from bottom left-hand corner to top right), there is a clear spread of the Caucasoid samples. The main contrast is between Northern Europe and the two samples from Sardinia associated with alleles BW21 and B18. The positions of the Sixth Workshop European populations confirm the interpretation of this factor (see Fig. 9.21), and with very few exceptions the Mediterranean populations can be separated from the Northern European and English populations. Other Caucasoid populations are situated between these two extremes, the only notable tendency being the position of the Middle Eastern Caucasoids among the Southern Europeans. Notice the great spread of the United States samples in Fig. 9.21, which is indicative of the heterogeneity of the American people with respect to this genetic system.

FIG. 9.21. Enlarged display of the positive third principal axis, showing the Sixth Workshop Caucasoid populations as supplementary row points. The abbreviations are: N, Norway; E, England; U, USA; D, Denmark; G, Germany; W, Sweden; S, Switzerland; F, France; P, Netherlands; V, Austria; X, Finland; Y, Israel; C, Czechoslovakia; R, Russia; I, Italy; H, Hungary.

Stretching out perpendicularly from the line of spread of the Caucasoids are all the Negroid samples, opposed chiefly to the English and Scottish samples. The alleles AW19.2 (sum of AW30 and AW31) and BW17 alone contribute a large amount to the inertia represented by this plane.

Remaining principal axes (not illustrated)

The fifth and sixth principal axes, explaining 6.9% and 5.4% of the total inertia respectively, still appear interesting, but they represent oppositions
among relatively few samples. Thus the fifth axis is principally an opposition between the samples of the coastal populations of New Guinea and Australia (aboriginal), owing chiefly to the allele A11, of very high frequency in the New Guinean sample. The sixth axis contrasts the samples from Easter Island and Lapland and, correspondingly, the alleles A11 and BW10 are opposed to BW35 and BW15. However, the orientation of these minor axes is found to be quite sensitive to changes in the mass accorded to the population sample points and the relative importance of these phenomena must be judged with this reservation in mind (see discussion below).

In the second analysis (of the Sixth Workshop European Caucasoid populations) the principal inertias are much lower, indicating the relative homogeneity of the European Caucasoids genetically. The first principal axis also represents the contrast between northern and southern populations seen on the fourth axis of analysis 1 (Figs 9.20 and 9.21). The first principal inertia (0.029) is of the same order as the fourth principal inertia (0.025) of the first analysis, but slightly higher because of the additional alleles included in analysis 2. For further results and interpretation, see Greenacre and Degos (1977).

Jenkins (1983) uses the same methodology in an analysis of gene frequency data of African populations.

FIG. 9.22. Binary tree which shows the cluster analysis, by maximum linkage (or "diameter clustering"), of the populations in terms of their χ²-distances. The slice through the tree at a distance of 1.08 partitions the populations into 21 groups.

9.9.4 Discussion

The ordering of the principal axes in terms of the values of the principal inertias informally quantifies the heterogeneity of the groups of populations concerned. Thus the main ethnic differences appear along the major axes, followed by differences within ethnic groups. The minor axes represent contrasts between single populations, which can suffer from the internal instability discussed in Section 8.1.

In these analyses we have retained the "natural" masses in the gene frequency matrix, that is, equal masses for all samples. It would be clearly meaningless to assign masses to the samples proportional to the sample sizes, since these were determined from practical considerations (for example, it is easier to obtain European Caucasoids than coastal New Guineans), and even less justified to assign masses proportional to the population size (China would dominate such an analysis). Benzécri (personal communication) advised that we attempt "to obtain clusterings of the populations in order to reveal close populations which are just variants of the same allelic equilibrium and which should be each counted with a weight inversely proportional to the number of populations in its class". A hierarchical clustering of the Fifth Workshop populations, based on the χ²-distances and with inter-cluster distance computed by the diameter method (maximum linkage), yields the
clustering tree of Fig. 9.22. Slicing the tree at the χ²-distance of 1.08 results in 21 clusters, some of which (e.g. Lapps, North Vietnamese) are single populations. Each population's set of gene frequencies is then divided by the number of populations in the respective cluster, so that each cluster of populations receives an equal mass. The correspondence analysis of this (row-)reweighted matrix did not differ noticeably from Analysis 1 above with respect to the first four principal axes. However, different local features in the data appeared from the fifth axis, indicating internal instability of these axes (cf. Section 8.1). This justifies our halting of the interpretation at the fourth axis.
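In outline, and only as an illustration (the data matrix is invented, and SciPy's complete-linkage routine stands in for the clustering program actually used), the reweighting may be sketched thus:

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def reweight_by_cluster(F, cut=1.08):
    # Cluster the row profiles by their chi-squared distances using the
    # diameter (complete linkage) method, slice the tree at `cut`, and
    # divide each row by its cluster size so that every cluster of
    # similar populations carries the same total mass.
    c = F.sum(axis=0) / F.sum()                     # column masses
    profiles = F / F.sum(axis=1, keepdims=True)
    d = pdist(profiles / np.sqrt(c), metric="euclidean")  # chi-squared distances
    labels = fcluster(linkage(d, method="complete"), t=cut,
                      criterion="distance")
    sizes = np.bincount(labels)[labels]             # cluster size of each row
    return F / sizes[:, None]

F = np.random.default_rng(1).random((49, 24))       # stand-in for the data
F_reweighted = reweight_by_cluster(F)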
The question of external stability and of statistical significance in this context is an interesting one. In order to bootstrap the display of the populations and the alleles, independent resampling with replacement must be carried out within each sample of people from a given population and a replicate matrix of estimated gene frequencies computed. The sampling units in this study are the individuals themselves, not the populations. If a number of replicate matrices were derived, an idea of the variability of the population and allele points would be obtained by the projection of the replicated profiles onto the principal axes, as described in Section 8.1. If another type of scaling technique were used, perhaps with a different definition of inter-population genetic distance, then bootstrapping can again be performed but each bootstrapped distance matrix might have to be re-analysed to obtain a new display which is then fitted to the original display (see discussion in Section 8.1). There is scope for some interesting work in this area, especially since there is a vast literature on genetic distances and their statistical properties (for example, Balakrishnan and Sanghvi, 1968; Jacquard, 1973; Edwards, 1971; Smith, 1977).
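Such a bootstrap might be sketched as follows, assuming (purely for the example) that each population's raw data are available as a matrix of per-individual allele indicators, and that G holds the column standard coordinates of the original analysis, as in the earlier sketch:

import numpy as np

def bootstrap_clouds(samples, G, n_rep=100, seed=0):
    # Resample individuals (the sampling units) with replacement within
    # each population, recompute the gene-frequency profile, and project
    # the replicate profile onto the original principal axes.
    rng = np.random.default_rng(seed)
    clouds = []
    for X in samples:                    # X: individuals x alleles
        n = X.shape[0]
        reps = []
        for _ in range(n_rep):
            draw = X[rng.integers(0, n, size=n)].sum(axis=0)
            reps.append((draw / draw.sum()) @ G)   # supplementary projection
        clouds.append(np.array(reps))
    return clouds    # one cloud of replicate points per population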

9.10 MEASUREMENTS ON FOSSIL SKULLS, WITH MISSING DATA

This application illustrates the analysis and interpretation of data which, like
the data of Section 9.6, are measurements on a ratio scale. In the data set of
interest there are a substantial number of missing values, which need to be
imputed if all the rows and columns are to be displayed. The final results of the correspondence analysis illustrate how the displays may be used for informal classification of additional data.

This section is a summary of the unpublished report by Greenacre (1974).

FIG. 9.23. Measurements taken on the skull fossil of a cave bear in the studies of Marinelli (1931), Mottl (1933) and Cordy (1972); diagram according to Cordy (1972).

9.10.1 Data

Cordy (1972) describes a collection of fossil skulls of cave bears gathered between the years 1829 and 1836 by the palaeontologist Schmerling in the province of Liège, Belgium. As a reference data set, Cordy uses the data of Marinelli (1931) on the cave bear skulls from the Drachenhöhle caves at Mixnitz, Austria, as well as the data of Mottl (1933) on the cave bear skulls from the Igrichöhle caves at Pest, Hungary. We shall use the same data sets
in our analysis, although we stress the obvious point that these do not represent an exhaustive sampling of cave bear skulls.

The basic data sets thus comprise:

(1) 47 skulls from the Drachenhöhle,
(2) 77 skulls from the Igrichöhle,
(3) 12 skulls from the Schmerling collection.

Fortunately, the same measurements were made on all these skulls and these are indicated in Fig. 9.23. We ourselves introduced a new variable LI = LF - (LD + LM) and then omitted LF, which is now decomposed into three segments: LI, LD and LM. Similarly, LB was omitted because it is the sum of the facial and cerebral lengths: LB = LF + LC. Because the depth G of the glabella had not been measured on most of the Drachenhöhle skulls, we also decided to omit this measurement, especially when a preliminary analysis showed that it caused instability in the displays.

Amongst the skulls themselves, 4 of the Schmerling skulls were so damaged that more than half their measurements were missing, so these had to be omitted. This left a data matrix of order 132 x 17, given by Greenacre (1974).

9.10.2 Method

Approximately one third of the 132 skulls have complete data on the variables retained. The remaining skulls have missing values on an average of 3 variables. By imputing these missing values the data set is thus effectively tripled. We chose to use a "second-order approximation" in the imputation, that is, the reconstitution formula of a 2-dimensional correspondence analysis recovers the imputed values exactly (cf. Section 8.5).
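In outline, such an imputation alternates between fitting and filling: start the missing cells at crude values, fit a 2-dimensional correspondence analysis, replace the missing cells by their rank-2 reconstitution, and repeat until the values stabilize. The sketch below illustrates the idea behind the reconstitution formula (8.5.1); the starting values, iteration count and positivity safeguard are our own choices for the example.

import numpy as np

def impute_by_reconstitution(X, missing, K=2, n_iter=100):
    # Alternate between a rank-K correspondence analysis of the completed
    # matrix and replacement of the missing cells by their reconstituted
    # values, so that at convergence the K-dimensional analysis recovers
    # the imputed cells exactly.
    X = X.copy()
    X[missing] = X[~missing].mean()              # crude starting values
    for _ in range(n_iter):
        P = X / X.sum()
        E = np.outer(P.sum(axis=1), P.sum(axis=0))    # "expected" values
        U, sv, Vt = np.linalg.svd((P - E) / np.sqrt(E),
                                  full_matrices=False)
        SK = (U[:, :K] * sv[:K]) @ Vt[:K]             # rank-K residuals
        P_hat = E + np.sqrt(E) * SK                   # reconstitution
        X[missing] = np.maximum(X.sum() * P_hat[missing], 1e-12)
    return X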


Separate analyses of the Drachenhöhle and Igrichöhle skulls were undertaken, as well as an analysis of both these data sets with the Schmerling skulls as supplementary points. Here we shall discuss this last analysis only.

FIG. 9.24. Optimal 2-dimensional display, by correspondence analysis, of the skull fossil data, where the missing elements have been interpolated using the reconstitution formula (8.5.1), also in 2 dimensions. The dispersion of the skull points is summarized at bottom left. (Legend: small (female) and large (male) skulls from the Drachenhöhle; small (female) and large (male) skulls from the Igrichöhle; skulls from the Schmerling collection.)

9.10.3 Results and interpretation

Figure 9.24 shows the 2-dimensional display obtained by correspondence analysis. Notice how low the principal inertias are (λ₁ = 0.00052, λ₂ = 0.00048), since the profiles are quite similar to each other, the skulls all being derived from the same biological species (Ursus spelaeus). Nevertheless, over 50% of the inertia is represented by this display and there are clear separations between the "large" (male) and "small" (female) skulls as well as between the skulls from the Drachenhöhle and the Igrichöhle. Because the first 2 principal inertias are close to each other (26.7% and 24.7% of the inertia respectively), the principal axes will be relatively unstable in the plane. It is interesting to
observe that the sex group separations do take place along slightly diagonal axes, as indicated at bottom left of Fig. 9.24.

Amongst the variables it is seen (and may be verified by the tables of contributions, see Greenacre, 1974) that the male-female dimorphism is explained by the corresponding opposition of variables WC, WZ, HF, HI, HM against LD, LO, LM, WT, LP. The larger male skulls are clearly wider and higher in profile than the smaller female skulls, which are characterized by the length measurements, especially facial lengths. Notice that the separation of the sexes is less successful amongst the Drachenhöhle skulls, possibly due to the low proportion of females in that sample.

The display may be used for purposes of informal classification. For example, the large Igrichöhle skull indicated by the arrow in Fig. 9.24 seems more like a female in terms of its profile. This is in agreement with the assertion of Cordy (1972), who re-classifies this skull as female. The sex of the Schmerling skulls may be predicted similarly by their positions in the display, and they can also be seen to be more similar to the Drachenhöhle skulls than the Igrichöhle skulls. More detailed interpretation of Fig. 9.24, as well as of other principal axes, is given by Greenacre (1974).

9.10.4 Discussion

Remember that correspondence analysis displays differences in the shape of the skulls, not in their size. In this application the shape differences on the first dimension are correlated with size differences (see Benzécri, 1977a, Section 3.7.2; 1978). By contrast, a principal components analysis of the data usually produces a first dimension of "size" and then dimensions orthogonal to (uncorrelated with) "size" (cf. Section 9.6).

The justification for using correspondence analysis on these data is again that the measurements do have a positive mass interpretation: the addition of two measurements is a physical reunion of quantities, as in Section 9.6. Exclusive use of the χ²-distance is not easily defended, but then any choice of metric has a certain ad hoc quality. The χ²-distance does have the advantage of the principle of distributional equivalence which ensures a certain stability of the skull positions if variables are grouped or subdivided (Section 4.1.17).

As stated in Section 8.5, the imputation of the missing values is a strategy to complete the data matrix so that all rows and columns may be displayed, rather than an estimation of the missing data. We would thus restrain the interpretation of points which have a relatively high frequency of missing data. For example, two of the Schmerling skulls have measurement HI missing, which is an important variable on the second axis. The imputed values are fairly low, which contributes to their lying towards the positive side of the second axis. The positions of these particular skulls on the second axis should thus be interpreted with caution.

The informal classification described in this study is a nice fringe benefit of the graphical display. In the next section data are analysed with the specific purpose of performing classifications.

9.11 GRAPHICAL WEATHER FORECASTING

This section illustrates the material of Sections 5.4, 7.1 and 7.2, using meteorological data from a weather-modification experiment in an important river catchment area in South Africa. The experiment is still in the preliminary stages where weather and rainfall patterns are being investigated before the actual statistical and physical cloud-seeding experiment.

Classification plays a major role in such an experiment. It is crucial to be able to identify as accurately as possible in advance the type of weather situation which is imminent, in order to declare the day (or other fixed time period) an experimental unit or not. Here we shall describe how a large heterogeneous data set is recoded and analysed by correspondence analysis, and how the resultant graphical displays are used to arrive at daily weather forecasts on an operational basis.

9.11.1 Data

The basic data set consists of 485 days of data, gathered during three rainfall seasons, on relevant meteorological variables. These variables are of various types, but mostly quantitative variables such as temperatures at various levels of the atmosphere and wind vectors (wind speeds and directions).

Each of these days has already been assigned to one of 5 weather types, briefly described as follows: (1) fair weather days; (2) general rain days; and then 3 categories of days on which convective activity is present: (3) convective days that do not meet the seeding criteria of the experiment; (4) convective days that do meet the seeding criteria; (5) convective days where hail is present in the clouds. This decision is made by the project leaders at the end of the day, once the actual weather situation has been observed from the ground, from the air and on radar.

9.11.2 Method

In order to use data on all the variables in a global analysis, an overall recoding of the data set into a multivariate indicator matrix was performed. It was convenient to recode each variable into 9 discrete categories (Jq = 9). The categories were not chosen on a purely ad hoc basis, but in close collaboration with the meteorologists involved in the project. Thus the range of a continuous variable like surface temperature is divided into 9 intervals, taking into account the rounding error in the measurements and meteorologically relevant boundaries. This process of discretization is especially suitable for recoding the wind vectors, which are otherwise quite unmanageable quantities in conventional multivariate statistical analysis. An example of one of the categories of the wind vector at the 300 millibar level is category 8: speed greater than 20 m/sec, direction between 220° and 255°.

In the context of the experiment it was found convenient to separate the study into two parts, first the distinction between fair weather (type 1), general rain (type 2) and convective (types 3 to 5) days, and secondly the distinction amongst the three types of convective days. Here we shall discuss the results of the first part of the study only, although the methodology in both parts is essentially the same.
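As an illustration of the recoding described above, a wind vector may be assigned to one of 9 categories and then dummy-coded; the bin boundaries below are invented for the example, not those chosen with the meteorologists.

import numpy as np

def wind_category(speed, direction):
    # Recode a wind vector into one of 9 discrete categories:
    # 3 speed bands crossed with 3 direction sectors (illustrative cuts).
    s = int(np.digitize(speed, [5.0, 20.0]))    # light / moderate / strong
    d = int((direction % 360.0) // 120.0)       # three 120-degree sectors
    return 3 * s + d                            # category 0, 1, ..., 8

def indicator(category, n_categories=9):
    # Dummy coding of a single recoded variable; one such block per
    # variable is concatenated to build the multivariate indicator matrix.
    z = np.zeros(n_categories)
    z[category] = 1.0
    return z

row_block = indicator(wind_category(speed=23.0, direction=240.0))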

For the purposes of this study we presumed that only the first two seasons' data (258 days) were available to design a classification strategy, and that the third season lay ahead, as it were, in order to evaluate the strategy's effectiveness. In the notation of Section 7.1 (see Fig. 7.2), N is the totally recoded matrix of design data and Z₀ is the logical indicator matrix with 3 columns indicating the category fair, general or convective. The matrix N′ = Z₀ᵀN thus has general element n′hj = the number of days of type h for which variable category j was observed. The test data may be denoted by N* and the problem is thus to predict the classification of each row of N* and compare it to the actual classification.

As described in Section 7.1 the analysis of N′ discriminates between the centroids of the 3 subclouds of points. The advantage of having 3 groups of points is that the subspace of the centroids may be represented exactly in a plane. The rows of the original data matrix N are projected onto this subspace and are identified by their weather type (Fig. 9.25). The rows of the "new" data N* are also projected onto this plane and are classified according to the frequencies of neighbouring points of known weather type.
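A sketch of the whole scheme follows; the matrices, the two-dimensional truncation and the radius are stand-ins for the actual computations of the study.

import numpy as np
from collections import Counter

def centroid_plane(N, Z0):
    # Correspondence analysis of N' = Z0^T N; with 3 groups the centroid
    # subspace is exactly a plane, spanned by the first two axes.
    Nq = Z0.T @ N
    P = Nq / Nq.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    return Vt.T[:, :2] / np.sqrt(c)[:, None]   # column standard coordinates

def project(N, G):
    # Project row profiles (design days or new days) onto the plane.
    return (N / N.sum(axis=1, keepdims=True)) @ G

def classify(point, design_points, design_types, radius):
    # Forecast the most frequent weather type among the design-day
    # points lying within `radius` of the new day's point;
    # `design_types` is an array of type labels.
    d = np.linalg.norm(design_points - point, axis=1)
    votes = Counter(design_types[d <= radius])
    return votes.most_common(1)[0][0] if votes else None

# e.g. G = centroid_plane(N, Z0); pts = project(N, G)
# forecast = classify(project(N_new, G)[0], pts, types, radius=0.1)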
It was not too difficult to arrive at a reasonable radius of a neighbourhood, indicated in Fig. 9.25, which represents a set of points considered to be similar in the context of these particular data. Again this decision was made in consultation with project members familiar with all the days in the study. Here it proved both instructive and informative to examine the neighbourhoods for a set of increasing radii, noting how the relative frequencies of the weather types change as "rings of similarity" are added to the neighbourhoods. When two weather types have the same frequencies in a neighbourhood we have assigned the weather type with the closest centroid, although there is still considerable room for improvement in the actual classification decision (see Section 9.11.3).

The results of the test classification are summarized in Table 9.15.

FIG. 9.25. Exact 2-dimensional display of the three centroids (circled points 1, 2 and 3) of the three groups of days (1 = fair, 2 = general rain, 3 = convective). The individual days, labelled by their weather types, are projected onto this plane, that is, displayed as supplementary points. As examples of the many variables used in this analysis, the 9 categories of the variable "precipitable water", which has high association with the weather types, are indicated, as well as 2 categories of wind, "moderate to strong north to north-westerly winds", which associate strongly with general rain days. An example of a new day, indicated by ?, and its neighbourhood are shown; the forecast would clearly be a type 3 day. (Axis 1: λ₁ = 0.1661, 65.2% of inertia; axis 2: λ₂ = 0.08871, 34.8%.)

TABLE 9.15
Table of correct and incorrect forecasts, using the classification strategy based on neighbourhoods, applied to 127 new days.

                         Observed
Predicted      Fair    General    Convective    Total
Fair            23        3           16          42
General          1        3            4           8
Convective       8        4           65          77
Total           32       10           85         127

9.11.3 Discussion

There are two major advantages of the above forecasting strategy over more conventional statistical techniques. First, important data on the speed and
direction of the wind are taken into account by categorizing the wind vectors. The recoding of all the observations to be discrete results in a homogeneous data set which can be analysed globally. Secondly, the spatial framework and neighbourhood concept is particularly useful when applying the classification procedure operationally. When the predictor information of a new day is known, the day can be immediately situated as a point in the display of the historical data points. Not only can the types of neighbouring days be identified, but also their actual dates. Relevant information on the weather situations on these dates can be recalled in order to remind the human forecaster of these similar situations in the past. This should assist him in making a more accurate forecast of the situation at hand.

More work still has to be done on the actual classification decision. For example, the distances of the neighbouring points to the new point can be taken into account, so that closer points "count more" than points further away. The use of cross-validation, as described in Section 7.2, can assist in finding out if additional sophistication of the procedure leads to improved classifications.
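For instance, the suggested distance weighting might replace the simple counts by inverse-distance weights; the weighting function below is one arbitrary choice among many.

import numpy as np

def weighted_classify(point, design_points, design_types, radius, eps=1e-9):
    # As in the neighbourhood classifier sketched earlier, but neighbours
    # now vote with weight 1/distance, so that closer points "count more".
    d = np.linalg.norm(design_points - point, axis=1)
    inside = d <= radius
    scores = {}
    for t, dist in zip(np.asarray(design_types)[inside], d[inside]):
        scores[t] = scores.get(t, 0.0) + 1.0 / (dist + eps)
    return max(scores, key=scores.get) if scores else None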
Finally, notice that similar geometric frameworks have other useful applications in experimental design. For example, the most fundamental requirement of treatment and control groups of experimental units is that they be balanced with respect to their covariates. In a weather modification experiment there are a number of important variables which can be associated with the response variable of interest, say rainfall. Many experiments have been heavily criticized by statisticians who have discovered later that the randomized division of the experimental units into treatment and control groups has a strong association with some important covariate, which could explain the observed between-group differences. When there are many such covariates available, the chances of nullifying the results of an experiment by such an argument are very good. The same problem crops up in the design of clinical trials, where the treatment and control groups are rarely similar across the many covariates like age, severity of illness, etc. Since the vector of covariates (or recoded covariates) defines a point in multidimensional space, the desired strategy assigns treatment and control labels to each point in such a way that the two clouds of points are as confounded ("unclustered") as possible.

In the case of some clinical trials where the cases are all available before the randomization occurs this could be performed by identifying the low-dimensional subspace close to all the cases, subdividing this subspace and then randomly allocating treatments and controls within each multidimensional "block". Working in this subspace does not necessarily avoid the danger that the response in the experiment is associated with the residual space of the covariates, in which the treatments and controls have not been equally spread around. It again seems that we should be balancing treatments and controls in the subspace which exhibits the association between the covariates and the response. This subspace can be identified by conducting the exploratory phase of the experiment, which is standard practice now in weather modification experiments. For a one-off experiment like the typical clinical trial with a modest sample size, avoiding the pitfalls of the randomization can be tricky. We would suggest a sequential allocation of treatments and controls if the response is known after a relatively short time, so that the subspace can be estimated during the experiment in an attempt to balance the groups. Otherwise the only other satisfactory design seems to involve matching treatment and control units as closely as possible, for example in matched pairs. Notice that even when an exploratory phase is possible it is good practice to "update" the response-covariate subspace during the course of the experiment, as the association between covariates and response might be changing.

9.12 REFERENCES TO PUBLISHED APPLICATIONS

This section gives a comprehensive list of references to published applications of correspondence analysis, classified by field of application. Because most of these are in French (and mostly published in the journal Les Cahiers de l'Analyse des Données), we specifically indicate when the articles are in English. We also give a very brief summary, in telegraphic style, of the data and special features of the application.

Art and archaeology

Prehistoric art in Southern Europe-Roux et al. (1976); various contingency tables, published by Leroi-Gourhan (1965), for example nij = number of times theme j (e.g. a horse) is found in region i; nodes of a clustering tree are displayed (cf. Section 7.4).

Typology of stone-age tools-F. Benzécri and Djindjian (1977); 359 x 18 data matrix (tools x variables), recoded as a 359 x 70 indicator matrix.

Analysis of musical scores, illustrated by a choral work of Bach-Morando (1980); various matrices are suggested, for example nii′ = number of times note i is followed by note i′.

Epidemiological, biomedical and pharmaceutical

Variations of lymphoidal leukemias and their evolution-Bastin (1976);
detailed medical files on 102 hospital cases; all data recoded into discrete form, resulting in a 102 x 91 indicator matrix (cf. Section 5.4).

Death or survival from myocardial infarct-Nakache et al. (1977); 101 x 15 matrix (patients x variables), recoded as a 101 x 83 indicator matrix; various types of discriminant analysis, some based on correspondence analysis, are performed and compared.

Retrospective study of infant mortality in Mali-Abbaoui-Maiti (1979); 811 mothers of deceased children respond to a detailed biographical and medical questionnaire; various subtables are analysed.

Relationship between training and performance in sport-Fouillot and Tekaia (1979); continuous and discrete variables are observed during a number of training sessions and are recoded as dummy variables.

Introduction to correspondence analysis, using medical data-F. Benzécri (1980); 7 x 6 table nij = number of times medicine j is administered for disease i.

Typology and pathology of lymphocytes-Bastin and Flandrin (1980); 4944 x 40 indicator matrix zij = 1 if lymphocyte i is in category j (there are 8 continuous variables, recoded into 5 categories each).

Clinical trials of antibiotics-Assouly et al. (1980); 187 x 69 indicator matrix (patients x categories of discrete variables).

Biology and ecology

10 different applications of numerical taxonomy in botany, phytosociology, zoology and ecology-Benzécri et al. (1973, Volume 1, Part C, which we shall refer to as Vol. 1C).

Taxonomy of a plant genus-Guittoneau and Roux (1977); described in Section 5.4.

Taxonomy of small mammal genus Crocidura, based on skull measurements-Abi-Boutros and Bellier (1977); 238 x 20 matrix of measurements (individuals x variables).

Comparison of correspondence analysis (in the form of reciprocal averaging) to non-metric scaling and principal components analysis-Fasham (1977), in English; data sets which are simulated to represent ecological gradients are used.

Taxonomy of the horse genus Equus, based on skull and jaw measurements-Eisenmann and Turlot (1978); 349 x 25 matrix of measurements (individuals x variables).

Ecology of Orthoptera acridiae (locusts, grasshoppers) in west Africa-Dahdouh et al. (1978); a large and very detailed data set on 105 species of the insect, as well as 429 species of vegetation at the 35 sites surveyed, 97 environmental variables of a permanent nature, 6 variables of a temporary nature; subsets of variables are selected and various analyses are reported.

Estimation of palaeo-environments according to marine fossil remains-Roux (1979); the framework of this study is very similar to that of Greenacre and Vrba (1984), summarized in Section 9.8.

Relationship between locust population changes and meteorological variables-Meimaris (1979); a climatology of the region of interest (Madagascar) is first developed, using cluster analysis of the days in terms of the meteorological data.

Primary and secondary protein structures (sequences of 5 amino acids, called pentapeptides, are considered)-Mullon and Colonna (1980); for example, the 100 x 16 table n(il)j = the number of times amino acid i (i = 1 ... 20) is in position l of a pentapeptide (l = 1 ... 5) with a spatial (secondary) structure of type j, e.g. a helix (j = 1 ... 16); attempt at classifying a pentapeptide in terms of its structure (j), given its (primary) sequential structure, by nearest neighbours (cf. Section 7.2).

Morphology of Brachiopods (types of shellfish)-Gaspard and Mullon (1980); 356 x 13 matrix of measurements (individuals x variables), the individuals being obtained from three different coastal areas of France.

Psychology and animal behaviour

Child fears-in Benzécri et al. (1973, Vol. 1C, no. 13); 34 x 34 symmetric matrix of frequencies nii′ = number of children who express fears i and i′ together; interpretation is carried out as far as the tenth principal axis; the study is followed by a commentary by the psychologist who collected the data.
Relationships amongst a group of children-Lebeaux et al. (1976); 30 x 30 table njj′ = acceptance/indifference/rejection of child j towards child j′.

Perception of the countryside-Brun-Chaize (1978); pairs of photographs are shown to each of 324 people and the more preferred member is selected in each case; the matrix nii′ = number of people who prefer photograph i to photograph i′ is analysed, as well as supplementary background information on the respondents.

Mating parade of the albatros-Blosseville (1981); data are of the form nij = number of times attitude j is observed during "individual-sequence" i, that is, each row represents a particular bird during a demarcated time period of activity; see also Spence (1978).

Geology

Geomorphological study of granulometric data-F. Benzécri et al. (1976); 102 x 20 table nij = mass of sand in sample i which falls in granulometric class j (e.g. between 0.4 and 0.5 mm in diameter).

Soil mechanics, marine sediments-Benzécri et al. (1981); two sets of samples of marine deposits collected off the Irish coast; categorizing of continuous variables.

Chemical analyses of geological samples from Canadian volcanic belt-David et al. (1974), in English, with outline of the methodology; data analysed including all 22 elements, then using trace elements only; analyses compared.

Chemical analyses of geological samples from Ethiopian volcanic belt-Teil and Cheminée (1975), in English, with outline of the methodology in Teil (1975), in English but using Benzécri's notation; study very similar to that of David et al. (1974) described above; groups of samples are investigated.

Pattern recognition

Electron microscopy-Van Heel and Franck (1980), in English; images are divided up into a 32 x 32 matrix of cells and the data are in the form nij = intensity of cell j in image i; this application is discussed by Benzécri (1981a).

Pattern recognition of alphabetic characters-Chaumereuil and Villard (1981); a rectangle in which a letter is written is subdivided into a 10 x 7 matrix of cells and the data are coded as nij = 1 if the ith character passes through the jth cell, = 0 otherwise; 11 replications in different handwriting styles of the characters a, b, d and p are considered.

Discussion of pattern recognition and the recoding of images-Benzécri (1981b).

Education

Marks in the admission examination of the Ecole Polytechnique, 1970 and 1971-Nakhlé (1976); 911 x 20 table yij = mark on subject j by candidate i; illustrates doubling (cf. Section 6.1).

Tertiary education in Greece and the profession of the fathers of students-Meimaris (1978); 75 x 9 matrix nij = number of students from faculty i at one of the 4 Greek universities, with fathers in profession j.

Results of multiple-choice mathematics test given to 1300 pupils-Murtagh (1981); 155 multiple-choice items partitioned into 55 groups; comparisons are made between test and re-test data on the same pupils and between boys and girls; Guttman effect illustrated and discussed.

Primary school children and their families-Kubow-Ivarson (1982); 774 children surveyed, 89 questions with a total of 632 response categories, leading to a 632 x 632 Burt matrix; illustrates the analysis of various "slices" of the Burt matrix, i.e. subtables of order J x J1, where J = 632 and J1 < 632.

Social surveys

Living conditions in Lebanon, 1960-1970-in Benzécri et al. (1973: Vol. 2C, no. 4, 5); 60 x 140 matrix of ratings yiq = rating (0 to 4) in region i on question q; correspondence analysis of doubled ratings; comparison made between surveys in 1960 and in 1970.

Rural development in Colombia-in Benzécri et al. (1973: Vol. 2C, no. 6); 45 x 306 matrix of ratings yiq = rating (0 to 4) in region i on question q; correspondence analysis of doubled ratings; separate analysis performed to reveal patterns in missing data.

Attitude of students in Zaïre to acts perpetrated against people and their possessions-in Benzécri et al. (1973: Vol. 2C, no. 9); 535 x 54 matrix yiq =
severity rating (0 to 10) given by student i on act q (e.g. accidental poisoning); correspondence analysis of doubled ratings.

Public image of the judiciary in France-in Benzécri et al. (1973: Vol. 2C, no. 10); for example, 1072 x (69 + 366) recoded indicator matrix (respondents x categories), where there are 69 categories of personal information and 366 categories of opinion; illustrates analysis of a large survey and the recoding of open-ended questions.

Comparison of activities of high-school boys and girls-Goudard et al. (1977); 676 x 33 table (pupils x questions), recoded as a 676 x 100 indicator matrix; part of this study is an illustration of the theory of Yagolnitzer (1977), who compares correspondence analyses of two matrices with the same row and column entities.

Survey of daily activity and attitude to work of 423 Parisians on the eve of their retirement-Blosseville and Cribier (1979); 19 questions, recoded as 60 dummy variables.

Survey on attitudes of French army, navy and air force non-commissioned officers-Rosenzveig and Thomas (1979); 6000 respondents, 186 questions.

Opinion survey amongst walkers in Fontainebleau forest-Mandille et al. (1979, 1980); 633 people respond to 22 questions (for example: "Are you aware that trees have been cut down in the centre of the forest? If so, does this shock you?").

General discussion of opinion surveys, illustrated by survey of commuters in Paris on their satisfaction/dissatisfaction with public transport-Benzécri (1980b).

Analysis of open-ended questions in surveys-Lebart (1982a, b), former reference in English; illustrates the correspondence analysis of large sparse data matrices and the use of stochastic approximation (Section 8.7).

Politics

United Nations general assembly resolutions in 1976-Hamrouni and Benzécri (1976); (cf. Section 5.3).

Voting attitudes in national elections in Greece from 1958 to 1977-Tsébélis (1979).

Operations research, multiple-criteria decision making

Evaluation of environmental impact and cost in situating an autoroute-Gutsatz (1976); I = 7 alternatives, Q = 9 heterogeneous criteria, recoded as J = 23 dummy variables.

Installation of services in a building-Vasserot (1977); 105 x 105 matrix (not symmetric) yii′ = rating of how much contact section i needs with section i′; various analyses, including cluster analysis, are reported.

Usage of a large computer in terms of different types of jobs submitted and their various execution times-Carreiro (1978); various analyses are performed, including principal components analyses.

Comparison of correspondence analysis and factor analysis in the context of multiple-criteria decision making-Stewart (1980), in English; a critical study of these two techniques when applied to two small data matrices; correspondence analysis was generally the most successful.

Economics

Naming of activities within a particular industry-Volle (1970); summarized in Benzécri et al. (1973: Vol. 1C, no. 12); 432 x 39 matrix nij = 1 if company i produces article j; clustering and correspondence analysis illustrated.

French foreign trade in 1832-Robert (1976); 28 x 10 table nij = value (in francs) of product i traded in manner j (e.g. imported by ship); illustrates calculation of mutual contributions of nodes of cluster analysis (of the products) and principal axes (cf. Section 7.4).

Foreign investment ratings-Greenacre and Benzécri (1976); 43 x 15 table yij = average rating of country i on criterion j (e.g. political stability); ratings doubled (cf. Section 6.1).

Multidimensional time series of economic indices-Teillard (1976); for example, the 144 x 112 table yij = value of index j in month i; illustrates the use of the tables of contributions to identify "aberrant" indices.

Multidimensional time series of Brasilian imports of machines and mechanical tools-Gouvea (1977); 38 x 17 matrix nij = value (in units of $100 000) of machine type i imported in year j.

Review article on data analysis ("analyse des données") in economics-Benzécri (1980a).

Agricultural development in Syria from 1960 to 1977-Arbache (1982); 63 x 7 x 18 matrix nijt = production of product i in region j in year t; all possible two-way tables are analysed and compared.

Foreign subsidiaries of USA-based multinational companies-Cholakian (1980a); 3-way 42 x 28 x 7 table nijt = number of foreign subsidiaries in industrial sector i in the country (or region) j during the period t; various analyses of 2-way contingency tables derived from this table are reported, some of which are marginal tables (e.g. the 42 x 28 table of nij., summed over t), others obtained by combining two of the indices into one (e.g. the 42 x 196 table of ni(jt), where the columns represent countries at particular time periods); the use of supplementary points and cluster analysis is extensively illustrated; the same data is used in a subsequent article (Cholakian, 1980b) to illustrate the adjustment of a contingency table to have prescribed margins (Madre, 1980; Benzécri, Bourgarit and Madre, 1980).

Exports from India, 1963 to 1975-Gopalan (1980); multidimensional time series nijt = value of export of product i to country j during year t; similar approach to Cholakian (1980a) above.

Supply and demand of employment, and patterns of unemployment-Cabannes (1980); multidimensional time series, for example nit = number of jobs offered in industrial sector i during month t; displays seasonal patterns within each year and trends from year to year.

Evolution of balance of payments from 1967 to 1978 in 21 European countries-Ibrahim (1981); 3-way data matrix nijt = value of item j (e.g. freight and merchandise: credit) in country i during year t.

Market research

Choice of a product name in market research-Vasserot (1976); 17 x 19 table of frequencies nij = number of people who associate name i with adjective j; background data on the people interviewed are also analysed.

Effectiveness and effect of washing powders, their chemical composition and ecotoxicity-Benzécri and Grelet-Puterflam (1981); recoding of heterogeneous data.

Consumer survey of cigarette smoking-in Benzécri et al. (1973: Vol. 2C, no. 3); for example, the 30 x 12 table of frequencies nij = number of people who say cigarette i falls in category j (e.g. rather pleasant); Guttman (horseshoe) effect illustrated, especially the situation when some points lie inside the horseshoe.

Miscellaneous

Ratings of films by newspaper film critics-Tagliante et al. (1976); 242 x 8 table yij = rating of film i by critic j; each column is doubled and a third column is also added for each critic to record omissions (i.e. non-responses). The coding scheme is of type (a) in Table 5.5.

Interurban traffic flow in the southern suburbs of Paris-Kazmierczak (1978); 15 x 15 matrix nii′ = number of people who travel from home in zone i to work in zone i′ (not a symmetric matrix); various analyses are performed, including the computation of the convex polygon of the symmetrized data matrix (cf. Section 8.6).

Quality control and durability of shoes, studied both in natural conditions and in laboratory tests-Hariri (1980); a questionnaire comprising 21 questions, yielding 63 categories of response, is completed for each pair of shoes in the study at various points of time; various condensations of the data are considered (cf. discussion of focusing in Section 8.2).
References 327

Benzécri, J.-P. (1969b). Statistical analysis as a tool to make patterns emerge from
data. In "Meihodologies of Pattern Recognition, (Watanabe, S., ed.), pp. 35-74.
Academic Press, New York.
Benzécri, J.-P. et al. (1973). "L'Analyse des Données". Tome (Vol.) 1: La Taxinomie.
Tome 2: L'Analyse des Correspondances. Dunod, Paris.
Benzécri, J.-P. (1977a). Histoire et préhistoire de l'analyse des données. 5-L'analyse
des correspondances. Cahiers de rAnalyse des Données 2, 9-40.
Benzécri, J.-P. (1977b). Sur l'analyse des tableaux binaires associés a une corres­
pondance multiple. Cahiers de l'Analyse des Données 2, 55-71.
References
Benzécri, J.-P. (1977c). Choix des unités et des poids dans un tableau en vue d'une
analyse de correspondance. Cahiers de l'Analyse des Données 2, 333-352.
Benzécri, J.-P. (1978). Note de lecture: l'allométrie. Cahiers de l'Analyse des Données 1,
371-376.
Benzécri, J.-P. (1979). Sur le calcul des taux d'inertie dans I'analyse d'un questionnaire.
Addendum et erratum a [BIN.MULT.]. Cahiers de rAnalyse des Données 4,
377-378.
Benzécri, J.-P. (1980a). Analyse des données en économie. Cahiers de l'Analyse des
Abbaoui-Maiti, S. (1979). Cause de mortalité infantile au Mali: enquete rétrospective
Données 5,9-16.
aupres des meres d'enfants décédés. Cahiers de rAnalyse des Données 4, 49-59. Benzécri, J.-P. (1980b). Les sondages d'opinion: point de vue d'un statisticien.
Abi-Boutros, B. and Bellier, L. (1977). Contribution a la taxinomie des micro­ Cahiers de l'Analyse des Données 5,475-480.
mammiferes. Application au genre Crocidura. Cahiers de rAnalyse des Données 2,
Benzécri, J.-P. (1981a). Mémoires re~us: c1assification d'objets dans des micrographies
435-450. électroniques brouillées, au moyen de I'analyse des correspondances. Cahiers de
Alvey, N., Galwey, N. and Lane, P. (1982). "An Introduction to GENSTAT".
r Analyse des Données 6,101-107.
Academic Press, London. Benzécri, J.-P. (1981b). La reconnaissance des formes: le~on d'introduction en"forme
Anderson, T. W. (1958). "An lntroduction to Multivariate Statistical Analysis". Wiley,
de dialogue. Cahiers de l'Analyse des Données 6, 157-174.
NewYork. Benzécri, J.-P. and Cazes, P. (1978). Probleme sur la c1assification. Cahiers de
Arbache, C. (1982). Evolution des productions de l'agriculture syrienne par régions,
l'Analyse des Données 3,95-101.
de 1960 a 1977. Cahiers de l'Analyse des Données 7, 67-91. Benzécri, J.-P. and Benzécri, F. (1980). "L'Analyse des Correspondances: Exposé
Assouly, P., Maccario, J. and Auget, J. L. (1980). Analyse des essais d'un antibiotique. Elémentaire." Dunod, Paris.
r
Cahiers de Analyse des Données 5,361-367. Benzécri, J.-P., Bourgarit, C. and Madre, J. L. (1980). Probleme: ajustement d'un
Austin, M. P. and Noy-Meir, I. (1971). The problem of non-Iinearity in ordination:
tableau a ses marges d'apres la formule de reconstitution. Cahiers de rAnalyse des
experiments with two-gradient models. J. Ecol. 59, 763-773. Données 5, 163-172.
Balakrishnan, V. and Sanghvi, L. D. (1968). Distance between populations on the Benzécri, J.-P., Lebeaux, M. O. and Jambu, M. (1980). Aides a l'interprétation en
basis of attribute data. Biometrics 24, 859-865.
Bartlett, M. S. (1951). The goodness of fit of a single hypothetical discriminant
r
c1assification automatique. Cahiers de Analyse des Données 5, 101-123.
Benzécri, J.-P., Biarez, J. and Favre, J.-L. (1981). L'analyse des données en mécanique
function in the case of several groups. Ann. Eugen. 16, 119-214. des soIs. Application a des sédiments marins. Cahiers de l'Analyse des Données 6,
Bastin, C. (1976). Les leucémies Iymphoides chroniques: la diversité des cas et leur
39-57.
évolution. Cahiers de l'Analyse des Données 1, 419-440. Benzécri, J.-P. and Grelet-Puterflam, Y. (1981). Sur les poudres de lessive utilisées
Bastin, C. and Flandrin, G. (1980). Typologie des Iymphocytes et pathologie Iympho­
pour le lavage en machine: efficacité, usure de linge, composition chimique et
cytaire (étude préliminaire). Cahiers de rAnalyse des Données 5,347-359. écotoxicité. Cahiers de l'Analyse des Données 6, 415-437.
Benzécri, F. (1980). Introduction a l'analyse des correspondances d'apres un example
Blosseville, J.-M. (1981). Analyse des dialogues: la parade de l'albatros. Cahiers de
de données médicales. Cahiers de l'Analyse des Données 5, 283-310. r Analyse des Données 6,345-376.
Benzécri, F., Bressolier, C. and Thomas, Y. (1976). Deux analyses de données
Blosseville, J. M. and Cribier, F. (1979). Les types de métiers: une analyse multi­
granulométriques en géomorphologie. Cahiers de rAnalyse des Données 1, 145-160.
dimensionel1e des caractéristiques socioprofessionnelles de Parisiens a la veille de la
Benzécri, F. and Djindjian, F. (1977). Typologie de l'outillage préhistorique en pierre
retraite. Cahiers de rAnalyse des Données 4,29-47.
taillée. Application a la définition du type burin de Noailles. Cahiers de l'Analyse
Bodmer, J. G. et al. (1973). "Joint report of the Fifth Histocompatibility Workshop".
des Données 2, 215-238. Histocompatibility Testing 1972,619-719. Munksgaard, Copenhagen.
Benzécri, J.-P. (1963). "Cours de Linguistique Mathématique". Université de Rennes, Bodmer, J. G. (1976). The ABC of HLA. In "Histocompatibility Testing 1975", pp.
Rennes, France. 21-99. Munksgaard, Copenhagen.
Benzécri, J.-P. (1969a). Approximation stochastique dans une algebre normée non Bradley, R. A., Katti, S. K. and Coons, 1. J. (1962). Optimal scaling for ordered
commutative. Bull. Soco Math. France 97,225-241. categories. Psychometrika 27,355-374.
,j
i
328 Theory and Applicarions ofCorrespondence Analysis
References
329
Bradu, D. and Gabriel, K. R. (1978). The biplot as a diagnostic tool for models of two-way tables. Technometrics 20, 47-68.
Bradu, D. and Grine, F. E. (1979). Multivariate analysis of Diademodontine crania from South Africa and Zambia. S. Afr. J. Sci. 75, 441-448.
Bretaudière, J.-P., Dumont, G., Rej, R. and Bailly, M. (1981). Suitability of control materials. General principles and methods of investigation. Clin. Chem. 27, 798-805.
Bretaudière, J.-P., Rej, R., Drake, P., Vassault, A. and Bailly, M. (1981). Suitability of control materials for determination of α-amylase activity. Clin. Chem. 27, 806-815.
Brun-Chaize, M. C. (1978). Le paysage forestier: analyse des critères de préférence du public à partir de photographies. Cahiers de l'Analyse des Données 3, 65-78.
Bryant, E. H. and Atchley, W. R. (eds) (1975). "Multivariate Statistical Methods: Within-group Covariation". Halsted Press, Stroudsburg, Pennsylvania.
Burt, C. (1950). The factorial analysis of qualitative data. Br. J. Psychol. (Statistical Section) 3, 166-185.
Burt, C. (1953). Scale analysis and factor analysis. Comments on Dr. Guttman's paper. Br. J. Statist. Psychol. 6, 5-23.
Cabannes, J. P. (1980). Analyse de quelques séries relatives au chômage. Cahiers de l'Analyse des Données 5, 443-474.
Calimlim, J. F., Wardell, W. M., Cox, C., Lasagna, L. and Sriwatanakul, K. (1982). Analgesic efficiency of orally administered Zomipirac sodium. Abstract presented at the 83rd annual meeting of the American Society of Clinical Pharmacology and Therapeutics, Lake Buena Vista, Florida, March 17-20, 1982. Clin. Pharmacol. Therap. 31, 208.
Carreiro, S. (1978). L'utilisation des ressources d'un ordinateur: diversité des travaux et leur variation dans le temps. Cahiers de l'Analyse des Données 3, 343-354.
Carroll, J. D. (1968). Generalization of canonical correlation analysis to three or more sets of variables. "Proceedings of the 76th Annual Convention of the American Psychological Association", 3, 227-228.
Carroll, J. D. (1972). Individual differences and multidimensional scaling. In "Multidimensional Scaling: Theory and Application in the Behavioral Sciences" (Shepard, R. N., Romney, A. K. and Nerlove, S., eds), Vol. 1, pp. 105-155. Seminar Press, New York.
Carroll, J. D. (1980). Models and methods for multidimensional analysis of preferential choice (or other dominance) data. In "Similarity and Choice" (Lantermann, E. D. and Feger, H., eds), pp. 234-289. Hans Huber Publishers, Bern.
Carroll, J. D. and Arabie, P. (1980). Multidimensional scaling. Ann. Rev. Psychol. 31, 607-649.
Cazes, P. (1978). Méthodes de régression. III: L'analyse des données. Cahiers de l'Analyse des Données 3, 385-391.
Cazes, P. (1980). L'analyse de certains tableaux rectangulaires décomposés en blocs: généralisation des propriétés rencontrées dans l'étude des correspondances multiples. I: Définitions et applications à l'analyse canonique des variables qualitatives. II: Questionnaires: variantes de codages et nouveaux calculs de contributions. Cahiers de l'Analyse des Données 5, 145-161, 387-406.
Cazes, P. (1981). L'analyse de certains tableaux rectangulaires décomposés en blocs: généralisation des propriétés rencontrées dans l'étude des correspondances multiples. III: Codage simultané de variables qualitatives et quantitatives. IV: Cas modèles. Cahiers de l'Analyse des Données 6, 9-18, 135-143.
Chambers, J. M. (1977). "Computational Methods for Data Analysis". Wiley, New York.
Chatfield, C. and Collins, A. J. (1980). "An Introduction to Multivariate Analysis". Chapman and Hall, London.
Chaumereuil, P. and Villard, J. P. (1981). Un exemple de discrimination de figures par l'analyse des correspondances. Cahiers de l'Analyse des Données 6, 108-114.
Cholakian, V. (1980a). Les filiales étrangères des entreprises multinationales originaires des États-Unis: analyse de leur répartition par industrie, pays et date de création. Cahiers de l'Analyse des Données 5, 17-43.
Cholakian, V. (1980b). Un exemple d'application de diverses méthodes d'ajustement d'un tableau à des marges imposées. Cahiers de l'Analyse des Données 5, 173-176.
Chuang, C. (1982). On the decomposition of G² for two-way contingency tables. Unpublished manuscript, Department of Statistics and Division of Biostatistics, University of Rochester, Rochester, New York.
Clemm, D. S., Krishnaiah, P. R. and Waikar, V. B. (1973). Tables of the extreme roots of a Wishart matrix. J. Statist. Comput. Simul. 2, 65-92.
Cordy, J.-M. (1972). Étude de la variabilité des crânes d'ours des cavernes de la collection Schmerling. Annls. Paléontologie, pp. 151-207.
Corsten, L. C. A. (1976). Matrix approximation, a key to application of multivariate methods. "Proceedings of the 9th International Conference of the Biometric Society", 1, pp. 61-77. Raleigh, North Carolina, USA.
Cox, C. and Chuang, C. (1982). Comparison of analytical approaches for ordinal data from pharmaceutical studies. Unpublished manuscript, Division of Biostatistics, University of Rochester, Rochester, New York.
Cox, D. R. and Brandwood, L. (1959). On a discriminatory problem connected with the works of Plato. J. R. Statist. Soc. B 21, 195-200.
Dahdouh, B., Duranton, J. F. and Lecoq, M. (1978). Analyse des données sur l'écologie des acridiens d'Afrique de l'ouest. Cahiers de l'Analyse des Données 3, 459-482.
David, M., Campiglio, C. and Darling, R. (1974). Progress in R- and Q-mode analysis: correspondence analysis and its application to the study of geological processes. Can. J. Earth Sci. 11, 131-146.
De Leeuw, J. (1973). Canonical analysis of categorical data. Thesis, Psychological Institute, University of Leiden, The Netherlands.
De Leeuw, J. (1982). Nonlinear principal component analysis. In "COMPSTAT 1982" (Caussinus, H., Ettinger, P. and Tomassone, R., eds), pp. 77-89. Physica-Verlag, Vienna.
De Leeuw, J. and Van Rijckevorsel, J. (1980). HOMALS and PRINCALS: some generalizations of principal components analysis. In "Data Analysis and Informatics" (Diday, E. et al., eds), pp. 231-242. North Holland, Amsterdam.
Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. R. Statist. Soc. B 39, 1-38.
Deniau, C. and Oppenheim, G. (1979). Effet de l'affinement d'une partition sur les valeurs propres issues d'un tableau de correspondance. Cahiers de l'Analyse des Données 4, 289-297.
Devijver, P. A. and Kittler, J. (1982). "Pattern Recognition: A Statistical Approach". Prentice-Hall, London.
Deutsch, S. B. and Martin, J. J. (1971). An ordering algorithm for analysis of data arrays. Operations Res. 19, 1350-1362.
Dixon, W. J. et al. (1981). "BMDP Statistical Software 1981". University of California Press, Berkeley, California.
Eckart, C. and Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika 1, 211-218.
Edwards, A. W. F. (1971). Distances between populations on the basis of gene frequencies. Biometrics 27, 873-881.
Efron, B. (1979). Bootstrap methods: another look at the jackknife. Ann. Statist. 7, 1-26.
Efron, B. and Gong, G. (1981). Statistical theory and the computer. In "Computer Science and Statistics: Proceedings of the 13th Symposium on the Interface" (Eddy, W. F., ed), pp. 3-7. Springer-Verlag, New York.
Eisenmann, V. and Turlot, J.-C. (1978). Sur la taxinomie du genre Equus. Cahiers de l'Analyse des Données 3, 179-201.
El Borgi, Y. (1978). Programme de tracé de polygone convexe associé à une loi symétrique. Cahiers de l'Analyse des Données 3, 219-234.
Escofier-Cordier, B. (1965). L'analyse des correspondances. Doctoral thesis, Université de Rennes. Later published in Cahiers du Bureau Universitaire de Recherche Opérationnelle, no. 13 (1969), 25-39.
Escofier, B. (1979). Traitement simultané de variables qualitatives et quantitatives en analyse factorielle. Cahiers de l'Analyse des Données 4, 137-146.
Escofier, B. and Le Roux, B. (1976). Influence d'un élément sur les facteurs en analyse des correspondances. Cahiers de l'Analyse des Données 1, 297-318.
Falkenhagen, E. R. and Nash, S. W. (1978). Multivariate classification in provenance research. A comparison of two statistical techniques. Silvae Genet. 27, 14-23.
Fasham, M. J. R. (1977). A comparison of nonmetric multidimensional scaling, principal components and reciprocal averaging for the ordination of simulated coenoclines and coenoplanes. Ecology 58, 551-561.
Fénelon, J.-P. (1981). "Qu'est-ce que l'Analyse des Données?" Lefonen, Paris.
Fienberg, S. E. (1980). "The Analysis of Cross-Classified Categorical Data", 2nd edition. MIT Press, Cambridge, Massachusetts.
Finch, P. D. (1981). On the role of description in statistical enquiry. Br. J. Philosophy Sci. 32, 127-144.
Fisher, R. A. (1940). The precision of discriminant functions. Ann. Eugen. 10, 422-429.
Forgy, E. W. (1965). Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometrics 21, 768-769.
Fouillot, J.-P. and Teakaia, F. (1979). Élaboration d'un langage commun entre médecins et sportifs par l'analyse des données. Cahiers de l'Analyse des Données 4, 231-252.
Francis, I. (1981). "Statistical Software: a Comparative Review". North Holland, New York.
Francis, I. and Lauro, N. (1982). An analysis of developers' and users' ratings of statistical software using multiple correspondence analysis. In "COMPSTAT 1982" (Caussinus, H., Ettinger, P. and Tomassone, R., eds), pp. 212-217. Physica-Verlag, Vienna.
French, S. (1981). Measurement theory and examinations. Br. J. Math. Statist. Psychol. 34, 38-49.
Friedman, J. H. et al. (1975). An algorithm for finding nearest neighbors. IEEE Trans. Computing 24, 1000-1006.
Gabriel, K. R. (1971). The biplot: graphic display of matrices with application to principal component analysis. Biometrika 58, 453-467.
Gabriel, K. R. (1972). Analysis of meteorological data by means of canonical decomposition and biplots. J. Appl. Meteorology 11, 1071-1077.
Gabriel, K. R. (1978). Least-squares approximation of matrices by additive and multiplicative models. J. R. Statist. Soc. B 40, 186-196.
Gabriel, K. R. (1981). Biplot display of multivariate matrices for inspection of data and diagnosis. In "Interpreting Multivariate Data" (Barnett, V., ed), pp. 147-174. Wiley, Chichester, UK.
Gabriel, K. R. and Zamir, S. (1979). Lower rank approximation of matrices by least squares with any choice of weights. Technometrics 21, 489-498.
Gaspard, D. and Mullon, C. (1980). Étude de la différenciation spécifique sur trois populations de térébratules biplissées du Cénomanien. Cahiers de l'Analyse des Données 5, 193-211.
Gauch, H. G. (1977). "ORDIFLEX: A Flexible Computer Program for Four Ordination Techniques: Weighted Averages, Polar Ordination, Principal Components Analysis and Reciprocal Averaging, Release B". Cornell University, Ithaca, N.Y.
Gauch, H. G. (1979). "COMPCLUS: A FORTRAN Program for Rapid Initial Clustering of Large Data Sets". Cornell University, Ithaca, N.Y.
Gauch, H. G. (1980). Rapid initial clustering of large data sets. Vegetatio 42, 103-111.
Gauch, H. G. (1982). "Multivariate Analysis in Community Ecology". Cambridge University Press, Cambridge.
Gauch, H. G. and Stone, E. L. (1979). Vegetation and soil pattern in a mesophytic forest in Ithaca, New York. Am. Midl. Nat. 102, 332-345.
Gauch, H. G. and Wentworth, T. R. (1976). Canonical correlation analysis as an ordination technique. Vegetatio 33, 17-22.
Gauch, H. G., Whittaker, R. H. and Singer, S. B. (1981). A comparative study of nonmetric ordinations. J. Ecol. 69, 135-152.
Gauch, H. G., Whittaker, R. H. and Wentworth, T. R. (1977). A comparative study of reciprocal averaging and other ordination techniques. J. Ecol. 65, 157-174.
Gifi, A. (1981). "Nonlinear Multivariate Analysis". Department of Data Theory, University of Leiden, The Netherlands.
Gittins, R. (1979). Ecological applications of canonical analysis. In "Multivariate Methods in Ecological Work" (Orloci, L., Rao, C. R. and Stiteler, W. M., eds), pp. 309-535. International Co-operative Publishing House, Fairland, Maryland, USA.
Gnanadesikan, R. (1977). "Methods for Statistical Data Analysis of Multivariate Observations". Wiley, New York.
Golub, G. H. and Reinsch, C. (1971). The singular value decomposition. In "Handbook for Automatic Computation" (Wilkinson, J. H. and Reinsch, C., eds). Springer-Verlag, Berlin.
Good, I. J. (1969). Some applications of the singular decomposition of a matrix. Technometrics 11, 823-831.
Goodall, D. W. (1963). The continuum and the individualistic association. Vegetatio 11, 297-316.
Goodman, L. A. (1981). Association models and canonical correlation in the analysis of cross-classifications having ordered categories. J. Am. Statist. Ass. 76, 320-334.
Gopalan, T. (1980). L'évolution du commerce d'exportation de l'Inde entre 1963 et 1975. Cahiers de l'Analyse des Données 5, 407-442.
Gordon, A. D. (1981). "Classification: Methods for the Exploratory Analysis of Multivariate Data". Chapman and Hall, London.
Goudard, J., Grelet, Y. and Benzécri, J.-P. (1977). Les lycéens du second cycle: comparaison entre filles et garçons. Cahiers de l'Analyse des Données 2, 273-291.
Gouvea, V. (1977). Analyse des importations brésiliennes de machines et outils mécaniques. Cahiers de l'Analyse des Données 2, 293-302.
Gower, J. C. (1966). Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53, 325-338.
Gower, J. C. and Digby, P. G. N. (1981). Expressing complex relationships in two dimensions. In "Interpreting Multivariate Data" (Barnett, V., ed), pp. 119-146. Wiley, Chichester, UK.
Graybill, F. A. (1976). "Theory and Application of the Linear Model". Duxbury Press, Massachusetts, USA.
Green, P. E. and Carroll, J. D. (1976). "Mathematical Tools for Applied Multivariate Analysis". Academic Press, New York.
Green, P. J. (1981). Peeling bivariate data. In "Interpreting Multivariate Data" (Barnett, V., ed), pp. 3-20. Wiley, Chichester, UK.
Greenacre, M. J. (1974). Analyse des correspondances: les crânes d'ours des cavernes. Unpublished dissertation, Diplôme d'Études Approfondies, Université de Paris VI, France.
Greenacre, M. J. (1978a). Quelques méthodes objectives de représentation graphique d'un tableau de données. Thèse de doctorat, 3e cycle, l'Université Pierre et Marie Curie, Paris.
Greenacre, M. J. (1978b). Some Objective Methods of Graphical Display of a Data Matrix (the English translation of the 1978 doctoral thesis). Special Report, Department of Statistics and Operations Research, University of South Africa (November).
Greenacre, M. J. (1981). Practical correspondence analysis. In "Interpreting Multivariate Data" (Barnett, V., ed), pp. 119-146. Wiley, Chichester, UK.
Greenacre, M. J. and Benzécri, J.-P. (1976). Tables à l'usage des investisseurs à l'étranger. Cahiers de l'Analyse des Données 1, 137-143.
Greenacre, M. J. and Degos, L. (1977). Correspondence analysis of HLA gene frequency data from 124 population samples. Am. J. Hum. Genet. 29, 60-75.
Greenacre, M. J., Hudak, D. R. and Kahn, A. (1983). Graphical approach to weather prediction in the context of the Bethlehem weather modification experiment. Progress Report, Bethlehem Weather Modification Experiment, Private Bag X15, Bethlehem, South Africa.
Greenacre, M. J. and Underhill, L. G. (1982). Scaling a data matrix in low-dimensional Euclidean space. In "Topics in Applied Multivariate Analysis" (Hawkins, D. M., ed), pp. 183-268. Cambridge University Press, Cambridge, UK.
Greenacre, M. J. and Vrba, E. S. (1984). Graphical display and interpretation of antelope census data in African wildlife areas, using correspondence analysis. Ecology (in press).
Guitonneau, G. G. and Roux, M. (1977). Sur la taxinomie du genre Erodium. Cahiers de l'Analyse des Données 2, 97-113.
Gutsatz, M. (1976). L'analyse des correspondances: système de décision multidimensionnelle. Cahiers de l'Analyse des Données 1, 47-59.
Guttman, L. (1941). The quantification of a class of attributes: A theory and method of scale construction. In "The Prediction of Personal Adjustment" (Horst, P., ed), pp. 319-348. Social Science Research Council, New York.
Guttman, L. (1950). The principal components of scale analysis. In "Measurement and Prediction" (Stouffer, S. A., Guttman, L., Suchman, E. A., Lazarsfeld, P. F., Star, S. A. and Clausen, J. A., eds). Princeton University Press, Princeton.
Guttman, L. (1953). A note on Sir Cyril Burt's "Factorial analysis of qualitative data". Br. J. Statist. Psychol. 6, 1-4.
Guttman, L. (1959). Metricizing rank-ordered and ordered data for a linear factor analysis. Sankhyā 21, 257-268.
Guttman, L. (1971). Measurement as structural theory. Psychometrika 36, 329-347.
Haberman, S. J. (1981). Tests for independence in two-way contingency tables based on canonical correlation and on linear-by-linear interaction. Ann. Statist. 9, 1178-1186.
Hamrouni, A. and Benzécri, J.-P. (1976). Les scrutins de 1967 à l'Assemblée des Nations Unies. Cahiers de l'Analyse des Données 1, 161-195, 259-286.
Hamrouni, A. W. and Grelet, Y. (1977). Programme de calcul des contributions relatives pour plusieurs groupes homogènes de variables. Cahiers de l'Analyse des Données 2, 353-359.
Hariri, A. (1980). La résistance à l'usure: usage naturel et essais de laboratoire. Applications à cinq modèles de chaussure. Cahiers de l'Analyse des Données 5, 177-191.
Hatheway, W. H. (1971). Contingency table analysis of rain forest vegetation. In "Statistical Ecology", Vol. 3: Many Species Populations, Ecosystems and Systems Analysis (Patil, G. P., Pielou, E. C. and Waters, W. E., eds). Pennsylvania State University Press, University Park, Pennsylvania.
Hayashi, C. (1950). On the quantification of qualitative data from the mathematico-statistical point of view. Ann. Inst. Statist. Math. 2, 35-47.
Hayashi, C. (1952). On the prediction of phenomena from qualitative data and the quantification of qualitative data from the mathematico-statistical point of view. Ann. Inst. Statist. Math. 3, 69-98.
Hayashi, C. (1954). Multidimensional quantification, with applications to analysis of social phenomena. Ann. Inst. Statist. Math. 16, 231-245.
Hayashi, C. (1968). One dimensional quantification and multidimensional quantification. Ann. Jap. Ass. Philosophy Sci. 3, 115-120.
Healy, M. J. R. and Goldstein, H. (1976). An approach to the scaling of categorized attributes. Biometrika 63, 219-229.
Heiser, W. J. (1981). Unfolding analysis of proximity data. Doctor of Social Sciences thesis, Department of Data Theory, University of Leiden, The Netherlands.
Hill, M. O. (1973). Reciprocal averaging: an eigenvector method of ordination. J. Ecol. 61, 237-251.
Hill, M. O. (1974). Correspondence analysis: a neglected multivariate method. Appl. Statist. 23, 340-354.
Hill, M. O. (1979). "DECORANA: A FORTRAN Program for Detrended Correspondence Analysis and Reciprocal Averaging". Cornell University, Ithaca, N.Y.
Hill, M. O. (1982). Correspondence analysis. In "Encyclopedia of Statistical Sciences" (Kotz and Johnson, eds), 2, pp. 204-210. Wiley, New York.
Hill, M. O. and Gauch, H. G. (1980). Detrended correspondence analysis, an improved ordination technique. Vegetatio 42, 47-58.
Hill, M. O. and Smith, A. J. E. (1976). Principal component analysis of taxonomic data with multi-state discrete characters. Taxon 25, 249-255.
Hills, M. (1969). On looking at large correlation matrices. Biometrika 56, 249-253.
Hirschfeld, H. O. (1935). A connection between correlation and contingency. Cambridge Philosophical Soc. Proc. (Math. Proc.) 31, 520-524.
Holland, T. R., Levi, M. and Watson, C. G. (1980). Canonical correlation in the analysis of a contingency table. Psychol. Bull. 87, 334-336.
Holland, T. R., Levi, M. and Beckett, G. E. (1981). Associations between violent and non-violent criminality: a canonical contingency-table analysis. Multivariate Behav. Res. 16, 237-241.
Horst, P. (1935). Measuring complex attitudes. J. Social Psychol. 6, 369-374.
Horst, P. (1936). Obtaining a composite measure from a number of different measures of the same attribute. Psychometrika 1, 53-60.
Horst, P. (1963). "Matrix Algebra for Social Scientists". Holt, Rinehart and Winston, New York.
Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24, 417-441, 498-520.
Hotelling, H. (1936). Relationships between two sets of variates. Biometrika 28, 321-377.
Householder, A. S. and Young, G. (1938). Matrix approximation and latent roots. Am. Math. Monthly 45, 165-171.
Ibrahim, C. (1981). La balance des paiements de 21 pays de l'O.C.D.E.: son évolution de 1967 à 1978. Cahiers de l'Analyse des Données 6, 261-296.
Jacquard, A. (1973). Distances généalogiques et distances génétiques. Cahiers d'Anthropologie et d'Écologie Humaine 1, 11-124.
Jambu, M. (1976). Programme de calcul des contributions mutuelles entre classes d'une hiérarchie et facteurs d'une correspondance. Cahiers de l'Analyse des Données 1, 77-92.
Jambu, M. (1978). "Classification Automatique pour l'Analyse des Données. I: Méthodes et Algorithmes". Dunod, Paris.
Jambu, M. (1983). "Cluster Analysis in Data Analysis". North Holland, Amsterdam.
Jenkins, T. (1983). Human evolution in southern Africa. In "6th International Congress of Human Genetics" (Motulsky, A. G. et al., eds), Vol. 1. Alan R. Liss Inc., New York.
Johnson, P. O. (1950). The quantification of qualitative data in discriminant analysis. J. Am. Statist. Ass. 45, 65-76.
Johnson, R. M. (1963). On a theorem stated by Eckart and Young. Psychometrika 28, 259-263.
Jöreskog, K. G., Klovan, J. E. and Reyment, R. A. (1976). "Methods in Geomathematics 1: Geological Factor Analysis". Elsevier, Amsterdam.
Kaiser, H. F. and Cerny, B. A. (1980). On the canonical analysis of contingency tables. Educ. Psychol. Measurement 40, 95-99.
Karchoud, A. (1981). Étude de la taxinomie des Équidés d'après les mesures squelettales. Cahiers de l'Analyse des Données 6, 453-463.
Kazmierczak, J. B. (1978). Migrations interurbaines dans la banlieue sud de Paris. Cahiers de l'Analyse des Données 3, 203-218.
Kendall, D. G. (1971). Seriation from abundance matrices. In "Mathematics in the Archaeological and Historical Sciences" (Hodson, C. R., Kendall, D. G. and Tăutu, P., eds), pp. 215-252. Edinburgh University Press, Edinburgh.
Kendall, M. G. (1961). "A Course in the Geometry of n Dimensions". Statistical Monograph No. 8, Griffin, London.
Kendall, M. G. (1975). "Multivariate Analysis". Hafner Press, New York.
Kendall, M. G. and Stuart, A. (1961). "The Advanced Theory of Statistics", Vol. 2. Griffin, London.
Kendall, M. G. and Stuart, A. (1973). "The Advanced Theory of Statistics", Vol. 2, 3rd edn. Griffin, London.
Kettenring, J. R. (1971). Canonical analysis of several sets of variables. Biometrika 58, 433-451.
Kshirsagar, A. M. (1972). "Multivariate Analysis". Marcel Dekker, New York.
Kubow-Ivarson, W. (1982). L'enfant au cycle primaire et sa famille: un exemple d'analyse par bandes du tableau de Burt issu d'une enquête. Cahiers de l'Analyse des Données 7, 45-65.
Lancaster, H. O. (1958). The structure of bivariate distributions. Ann. Math. Statist. 29, 719-736.
Lancaster, H. O. (1963). Canonical correlations and partitions of χ². Q. J. Math. 14, 220-224.
Lancaster, H. O. (1966). Kolmogorov's remark on the Hotelling canonical correlations. Biometrika 53, 585-588.
Lancaster, H. O. (1969). "The Chi-Squared Distribution". Wiley, New York.
Lawley, D. N. (1956). Tests of significance for the latent roots of covariance and correlation matrices. Biometrika 43, 128-136.
Lawley, D. N. (1959). Tests of significance in canonical analysis. Biometrika 46, 59-66.
Lawley, D. N. and Maxwell, A. E. (1971). "Factor Analysis as a Statistical Method", 2nd edn. Butterworth, London.
Lebart, L. (1974). On Benzécri's method for finding eigenvectors by stochastic approximation (the case of binary data). In "Proceedings in Computational Statistics (COMPSTAT)", pp. 202-211. Physica-Verlag, Vienna.
Lebart, L. (1975). Validité des résultats en analyse des données. Report CREDOC-DGRST, 142 rue du Chevaleret, 75013 Paris.
Lebart, L. (1976). The significance of eigenvalues issued from correspondence analysis. In "Proceedings in Computational Statistics (COMPSTAT)", pp. 38-45. Physica-Verlag, Vienna.
Lebart, L. (1981). Une procédure d'analyse lexicale écrite en langage FORTRAN. Cahiers de l'Analyse des Données 6, 229-241.
Lebart, L. (1982a). Exploratory analysis of large sparse matrices with application to textual data. In "COMPSTAT 1982" (Caussinus, H., Ettinger, P. and Tomassone, R., eds), pp. 67-76. Physica-Verlag, Vienna.
Lebart, L. (1982b). L'analyse statistique des réponses libres dans les enquêtes socio-économiques. Consommation: Revue de Socio-Économie 1, 39-62.
Lebart, L., Morineau, A. and Fenelon, J.-P. (1979). "Traitement des Données Statistiques". Dunod, Paris.
Lebart, L., Morineau, A. and Tabard, N. (1977). "Techniques de la Description Statistique: méthodes et logiciels pour l'analyse des grands tableaux". Dunod, Paris.
Lebeaux, M.-O. (1974). Programmes de régression et de classification utilisant la notion de voisinage. Doctoral thesis, Université Pierre et Marie Curie, Paris.
Lebeaux, M.-O. (1977). Notice sur l'utilisation du programme POUBEL. Cahiers de l'Analyse des Données 2, 467-481.
Lebeaux, M.-O., Stepan, S. and Benzécri, J.-P. (1976). Analyse de liens au sein d'un groupe d'enfants. Cahiers de l'Analyse des Données 1, 197-216.
Lebras, H. (1974). Vingt analyses multivariées d'une structure connue. Math. Sci. Hum. 47, 37-55.
Leroi-Gourhan, A. (1965). "Préhistoire de l'Art Occidental". Mazenod, Paris.
Lingoes, J. C. (1963). Multivariate analysis of contingencies: an IBM 7090 program for analyzing metric/nonmetric or linear/nonlinear data. Computational Report 2, 1-24. (Computing Center, University of Michigan.)
Lingoes, J. C. (1964). Simultaneous linear regression: an IBM 7090 program for analyzing metric/nonmetric or linear/nonlinear data. Behav. Sci. 9, 87-88.
Lingoes, J. C. (1968). The multivariate analysis of qualitative data. Multivariate Behav. Res. 3, 61-94.
Lingoes, J. C. (1977). With contributions by Borg, I., De Leeuw, J., Guttman, L., Heiser, W., Lissitz, R. W., Roskam, E. E. and Schönemann, P. H. "Geometric Representations of Relational Data: Readings in Multidimensional Scaling". Mathesis Press, Ann Arbor.
Lord, F. M. (1958). Some relations between Guttman's principal components of scale analysis and other psychometric theory. Psychometrika 23, 291-296.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In "Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability" (Le Cam, L. M. and Neyman, J., eds), 1, pp. 281-297.
Madre, J. L. (1980). Méthodes d'ajustement d'un tableau à des marges. Cahiers de l'Analyse des Données 5, 87-99.
Mahé, J. (1974). L'analyse factorielle des correspondances et son usage en paléontologie et dans l'étude de l'évolution. Bull. Soc. Géol. France 7-série 16, 336-340.
Maïti, D. (1979). Programme d'homogénéisation et d'analyse d'un tableau de données hétérogènes. Cahiers de l'Analyse des Données 4, 465-487.
Mallows, C. L. and Tukey, J. W. (1982). An overview of techniques of data analysis, emphasizing its exploratory aspects. In "Some Recent Advances in Statistics" (de Oliveira, J. T. and Epstein, B., eds), pp. 111-172. Academic Press, London.
Mandel, J. (1982). Use of the singular value decomposition in regression analysis. Am. Statistician 36, 15-24.
Mandille, J., Kalaora, B. and Bedeneau, M. (1979, 1980). La forêt de Fontainebleau. Dépouillement d'une enquête faite auprès des promeneurs. Cahiers de l'Analyse des Données 4, 313-330; 5, 65-74.
Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). "Multivariate Analysis". Academic Press, London.
Marinelli, W. (1931). Schädel des Höhlenbären. In "Die Drachenhöhle bei Mixnitz" (Abel and Kyrle, eds), pp. 332-497. Österreichische Staatsdruckerei, Vienna.
Marshall, A. W. and Olkin, I. (1979). "Inequalities: Theory of Majorization and its Applications". Academic Press, New York.
Maung, K. (1941). Measurement of association in a contingency table with special reference to the pigmentation of hair and eye colours of Scottish school children. Ann. Eugen. 11, 189-205.
McKeon, J. J. (1966). Canonical analysis: some relations between canonical correlation, factor analysis, discriminant function analysis, and scaling theory. Psychometric Monograph No. 13, Psychometric Society.
Meimaris, M. (1978). Statistique de l'enseignement en Grèce: étude des différents établissements d'enseignement supérieur suivant l'origine socioprofessionnelle de leurs étudiants. Cahiers de l'Analyse des Données 3, 355-365.
Meimaris, M. (1979). Analyse des relations entre variables météorologiques et flux de population: application au criquet migrateur malgache. Cahiers de l'Analyse des Données 4, 95-106.
Miller, R. G. (1974). The jackknife: a review. Biometrika 61, 1-15.
Morando, B. (1980). L'analyse statistique des partitions de musique. Cahiers de l'Analyse des Données 5, 213-228.
Morrison, D. F. (1976). "Multivariate Statistical Methods", 2nd edn. McGraw Hill, New York.
Mosier, C. I. (1946). Machine methods in scaling by reciprocal averages. Proceedings, Research Forum, 35-39. International Business Machines Corporation, Endicott, N.Y.
Mottl, M. (1933). Zur Morphologie der Höhlenbärenschädel aus der Igric-Höhle. Jahrb. Kgl. Ung. Geol. Reichsanst. 29, 187-246.
Mullon, C. and Colonna, F. (1980). Correspondance entre structures primaire et secondaire dans les protéines. Cahiers de l'Analyse des Données 5, 75-85.
Murtagh, F. (1981). Recherche d'un scalogramme sur les réponses de 1300 élèves à une batterie d'épreuves de mathématique. Cahiers de l'Analyse des Données 6, 297-318.
Mutumbo, F. K. (1973). Traitement des données manquantes et rationalisation d'un réseau de stations de mesures. Doctoral thesis, Université Pierre et Marie Curie, Paris.
Nakache, J.-P., Lorente, P., Benzécri, J.-P. and Chastang, J.-F. (1977). Aspects pronostiques et thérapeutiques de l'infarctus myocardiaque aigu compliqué d'une défaillance sévère de la pompe cardiaque. Application des méthodes de discrimination. Cahiers de l'Analyse des Données 2, 415-434.
Nakhlé, F. (1976). Sur l'analyse d'un tableau de notes dédoublées. Cahiers de l'Analyse des Données 1, 243-257, 367-379.
Naouri, J. C. (1970). Analyse factorielle des correspondances continues. Publications de l'Institut de Statistique de l'Université de Paris 19, 1-100.
Nash, S. W. (1978). Review of "Techniques de la Description Statistique" by Lebart, Morineau and Tabard. J. Am. Statist. Ass. 74, 254-255.
Nishisato, S. (1980). "Analysis of Categorical Data: Dual Scaling and its Applications". University of Toronto Press, Toronto.
Nishisato, S. and Arri, P. S. (1975). Nonlinear programming approach to optimal scaling of partially ordered categories. Psychometrika 40, 525-548.
Nora-Chouteau, C. (1974). Une méthode de reconstitution et d'analyse de données incomplètes. Doctoral thesis, Université Pierre et Marie Curie, Paris.
O'Neill, M. E. (1978a). Asymptotic distributions of the canonical correlations from contingency tables. Aust. J. Statist. 20, 75-82.
O'Neill, M. E. (1978b). Distributional expansions for canonical correlations from contingency tables. J. R. Statist. Soc. B 40, 303-312.
O'Neill, M. E. (1980). The distribution of higher-order interactions in contingency tables. J. R. Statist. Soc. B 42, 357-365.
O'Neill, M. E. (1981). A note on the canonical correlations from contingency tables. Aust. J. Statist. 23, 58-66.
Orlóci, L. (1978). "Multivariate Analysis in Vegetation Research". W. Junk, The Hague.
Pearson, K. (1901). On lines and planes of closest fit to a system of points in space. Philosophical Magazine and Journal of Science, Series 6, 2, 559-572.
Pearson, E. S. and Hartley, H. O. (1972). "Biometrika Tables for Statisticians", Volume 2. Cambridge University Press, Cambridge.
Pringle, R. M. and Rayner, A. E. (1971). "Generalized Inverse Matrices with Applications to Statistics". Griffin, London.
Ramsay, J. O. (1975). Solving implicit equations in psychometric data analysis. Psychometrika 40, 337-360.
Ramsay, J. O. (1977). Maximum likelihood estimation in multidimensional scaling. Psychometrika 42, 241-266.
Ramsay, J. O. (1978). Confidence regions for multidimensional scaling analysis. Psychometrika 43, 145-160.
Rao, C. R. (1980). Matrix approximations and reduction of dimensionality in multivariate statistical analysis. In "Multivariate Analysis" (Krishnaiah, P. R., ed), Vol. 5, pp. 3-22. North Holland, Amsterdam.
Richardson, M. and Kuder, G. F. (1933). Making a rating scale that measures. Personnel J. 12, 36-40.
Robert, J. (1976). État du commerce extérieur de la France en 1832. Cahiers de l'Analyse des Données 1, 71-75.
Rosenzveig, C. (1978). Une chaîne d'analyse des correspondances sur micro-ordinateur. Cahiers de l'Analyse des Données 3, 418-434.
Rosenzveig, C. and Thomas, J. P. H. (1979). Attitudes des sous-officiers des trois armées: dépouillement d'une enquête de sociologie militaire. Cahiers de l'Analyse des Données 4, 7-27.
Roux, M. (1979). Estimation des paléoclimats d'après l'écologie des foraminifères. Cahiers de l'Analyse des Données 4, 61-79.
Roux, M., Robert, J. and Benzécri, J.-P. (1976). Analyse de données sur l'art préhistorique. Cahiers de l'Analyse des Données 1, 61-70.
Schmetterer, L. (1969). Multidimensional stochastic approximation. In "Multivariate Analysis" (Krishnaiah, P. R., ed), Vol. 2, pp. 443-460. Academic Press, New York.
Schönemann, P. H. (1970). On metric multidimensional unfolding. Psychometrika 35, 349-366.
Schönemann, P. H. and Carroll, R. M. (1970). Fitting one matrix to another under choice of a central dilation and a rigid motion. Psychometrika 35, 245-255.
Slater, P. (1960). The analysis of personal preferences. Br. J. Statist. Psychol. 3, 119-135.
Smith, C. A. B. (1977). A note on genetic distance. Ann. Hum. Genet. (Lond.) 40, 463-479.
Spence, I. (1978). Multidimensional scaling. In "Quantitative Ethology" (Colgan, P. W., ed). Wiley, New York.
Srikantan, K. S. (1970). Canonical association between nominal measurements. J. Am. Statist. Ass. 65, 284-292.
Stewart, G. W. (1973). "Introduction to Matrix Computations". Academic Press, New York.
Stewart, T. J. (1981). A descriptive approach to multiple-criteria decision making. J. Operational Res. Soc. 32, 45-53.
Swan, J. M. A. (1970). An examination of some ordination problems by use of simulated vegetational data. Ecology 51, 89-102.
Tabet, N. (1973). Programme d'analyse des correspondances. Part of doctoral thesis, 3e cycle, Université de Paris VI.
Tagliante, P., Chaumereuil, P. F. and Villard, J. P. (1976). Les critiques de cinéma d'après la cote des films publiée par l'hebdomadaire Pariscope. Cahiers de l'Analyse des Données 1, 381-400.
Tatsuoka, M. M. (1971). "Multivariate Analysis". Wiley, New York.
Teil, H. (1975). Correspondence factor analysis: an outline of its method. Math. Geol. 7, 3-12.
Teil, H. and Cheminée, J. L. (1975). Application of correspondence factor analysis to the study of major and trace elements in the Erta Ale chain (Afar, Ethiopia). Math. Geol. 7, 13-30.
Teillard, P. (1976). L'évolution de la production industrielle française de 1963 à 1975. Cahiers de l'Analyse des Données 1, 401-417.
Tsébélis, G. (1979). Géographie électorale de la Grèce: analyse des attitudes de vote aux scrutins nationaux de 1958 à 1977. Cahiers de l'Analyse des Données 4, 423-436.
Tsianco, M. C., Odoroff, C. L., Plumb, S. and Gabriel, K. R. (1981). BGRAPH: a program for biplot multivariate graphics, Version 1. User's guide. Technical Report 81/20, Department of Statistics and Division of Biostatistics, University of Rochester, Rochester, New York 14642.
Tucker, L. R. (1960). Intra-individual and inter-individual multidimensionality. In "Psychological Scaling: Theory and Applications" (Gulliksen, H. and Messick, S., eds), pp. 155-167. Wiley, New York.
Tukey, J. W. (1977). "Exploratory Data Analysis". Addison-Wesley, Reading, Massachusetts.
Tukey, P. A. and Tukey, J. W. (1981). Graphical display of data sets in 3 or more dimensions. 1. Preparation; prechosen sequence of views. 2. Data-driven view selection; agglomeration and sharpening. 3. Summarization; smoothing; supplemented views. In "Interpreting Multivariate Data" (Barnett, V., ed), pp. 189-278. Wiley, Chichester, UK.
Vasserot, G. (1976). L'analyse des correspondances appliquée au marketing. Le choix du nom d'un produit. Cahiers de l'Analyse des Données 1, 319-333.
Vasserot, G. (1977). L'implantation des services d'une société. Cahiers de l'Analyse des Données 2, 303-311.
Van Heel, M. and Frank, J. (1980). Classification of particles in noisy electron micrographs, using correspondence analysis. In "Pattern Recognition in Practice" (Gelsema, E. S. and Kanal, L. N., eds). North-Holland, Amsterdam.
Volle, M. (1970). La construction des nomenclatures d'activités économiques de l'industrie. Annales de l'I.N.S.E.E. 4, 101-131.
Vrba, E. S. (1980). The significance of bovid remains as indicators of environment and predation patterns. In "Fossils in the Making: Vertebrate Taphonomy and Paleoecology" (Behrensmeyer, A. K. and Hill, A. P., eds). University of Chicago Press, Chicago.
Walsh, G. R. (1975). "Methods of Optimization". Wiley, New York.
Wilkinson, J. H. and Reinsch, C. (1971). "Handbook for Automatic Computation", Vol. II: Linear Algebra (Bauer, F. L., ed). Springer-Verlag.
Williams, E. J. (1952). Use of scores for the analysis of association in contingency tables. Biometrika 39, 274-298.
Williamson, M. H. (1978). The ordination of incidence data. J. Ecol. 66, 911-920.
Wold, S. (1978). Cross-validatory estimation of the number of components in factor and principal components models. Technometrics 20, 397-405.
Yagolnitzer, E. (1977). Comparaison de deux correspondances entre les mêmes ensembles. Cahiers de l'Analyse des Données 2, 251-264.
Young, G. (1937). Matrix approximations and subspace fitting. Psychometrika 2, 21-25.
Young, G. and Householder, A. S. (1938). Discussion of a set of points in terms of their mutual distances. Psychometrika 3, 19-22.
Appendix A

Singular Value Decomposition (SVD) and Multidimensional Analysis

The singular value decomposition (SVD) is one of the most useful tools in matrix algebra, yet it is still not treated in many textbooks for statisticians. Its origins can be traced back to the work of French and Italian mathematicians in the 1870s (see, for example, Marshall and Olkin, 1979, Chapter 19). One of its largest fields of application, namely low rank matrix approximation, was first reported by Eckart and Young (1936) in the first volume of Psychometrika, and psychometricians often use the term "Eckart-Young decomposition" in their honour. Other synonyms evident in the literature are: "basic structure" (Horst, 1963; Green and Carroll, 1976, pp. 230-240), "canonical form" (Eckart and Young, 1936; Johnson, 1963), "singular decomposition" (Good, 1969; Kshirsagar, 1972) and "tensor reduction" (translation of réduction tensorielle, Benzécri et al., 1973). The name "basic structure" is perhaps the most descriptive in that the adjective "basic" implies the notion of inherency, of something fundamental, as well as suggesting the geometric concept of a basis for a space (cf. Sections 2.1 and 2.2). However, the term singular value decomposition is in more common usage, mainly for historical reasons, even though it has a rather obscure justification. The abbreviation SVD is often used in writing and speech, and we have adopted it in place of its rather lengthy parent term.

The framework of the SVD can accommodate a wide range of multidimensional techniques and thus unifies what might seem superficially to be quite different analyses. For this reason we have found it to be an ideal approach for teaching. In this appendix we show how the SVD underlies principal components analysis, the biplot, canonical correlation analysis, canonical variate analysis and correspondence analysis. These techniques are all variations of a theme, and that theme is the algebra and geometry of the SVD.

Further relevant literature on the SVD in statistics is provided by Good (1969), Chambers (1977), Gabriel (1978), Rao (1980), Mandel (1982) and Greenacre and Underhill (1982).

A.1 SVD AND LOW RANK MATRIX APPROXIMATION

Usual form of the SVD in Euclidean geometry

The SVD of a matrix is the decomposition of the matrix as the product of three matrices of particularly simple form and geometric interpretation. The fundamental theorem is that any real I x J matrix A can be expressed as:

    A = U D_α V^T    (A.1.1)
  (I×J) (I×K)(K×K)(K×J)

where:

    D_α is a diagonal matrix of positive numbers α_1 ... α_K;
    K is the rank of A (K ≤ min{I, J});
    U^T U = V^T V = I, i.e. the columns of U and V are orthonormal (in the ordinary Euclidean sense).

An equivalent way of writing (A.1.1) is:

    A = Σ_{k=1}^K α_k u_k v_k^T    (A.1.2)

where u_1 ... u_K and v_1 ... v_K are the columns of U and V. The values α_k, k = 1 ... K, are called the singular values of A, while the vectors u_k and v_k, k = 1 ... K, are called the left and right singular vectors respectively. The left singular vectors form an orthonormal basis for the columns of A in I-dimensional space and the right singular vectors form an orthonormal basis for the rows of A in J-dimensional space. The SVD in the form of (A.1.2) can be interpreted as a linear combination of the "standardized" rank 1 matrices u_k v_k^T, k = 1 ... K, with the singular values indicating the magnitude of the matrix in each of its K "dimensions". The origin of the name singular (value) decomposition is probably the fact that the subtraction of any term α_k u_k v_k^T from the matrix A results in a singular matrix.
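These properties are easy to verify numerically. The following minimal sketch, written in Python with the NumPy library purely as an illustration (the matrix is artificial, and this is not one of the programs referred to in Appendix B), computes the SVD of a small matrix and checks (A.1.1) and (A.1.2):

```python
import numpy as np

# an artificial 5 x 3 data matrix for illustration
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))

# U is I x K, alpha holds the singular values in descending order,
# and Vt is K x J, with K = min(I, J) here
U, alpha, Vt = np.linalg.svd(A, full_matrices=False)

# (A.1.1): A is recovered as U D_alpha V^T ...
assert np.allclose(A, U @ np.diag(alpha) @ Vt)

# (A.1.2): ... equivalently as a sum of rank 1 matrices alpha_k u_k v_k^T
assert np.allclose(A, sum(a * np.outer(u, v)
                          for a, u, v in zip(alpha, U.T, Vt)))

# the columns of U and V are orthonormal: U^T U = V^T V = I
assert np.allclose(U.T @ U, np.eye(alpha.size))
assert np.allclose(Vt @ Vt.T, np.eye(alpha.size))
```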

A familiar special case of the SVD is the eigenvaluejeigenvector decomposi­ f


reflections in corresponding singular vectors. If two singular values are
tion, or eigendecomposition, of a real symmetric matrix B (J x J) of rank 1:
identical, say IXk = IX k+ l ' then the corresponding pairs of singular vectors are
K ~J: determined only up to rotations in their respective 2-dimensional subspaces.
T ¡... Although it is rare that singular values turn out to be equal in practice, it may
B = V DA V = r.fAkVkVI
JxJ JxK KxK KxJ happen that they are approximately equal, leading to instability of the
associated singular vectors with respect to small changes in the data matrix
In this case the left and right singular vectors are identical and are commonly (cf. Section 8.1).
known as the eigenvectors of B, while the singular values are called eigen­ p

values (synonyms are latent or characteristic vectors and latent or character­ I


istic roots respectively). Note that the SVD consists of real matrices and exists Complete form of the SVD
for any rectangular matrix, whereas the eigendecomposition of a square " It is often useful to "complete" the orthonormal bases of U and V in
matrix often involves complex elements if the matrix is non-symmetric. their respective spaces to obtain square matrices O == [u 1 ... UK'" UI] and
Proof of the existence of the SVD of A is usually facilitated by assuming 11 V == [VI'" vK", vJ]. For example, UK+l'" UI are 1- K orthonormal vectors
the existence of the eigendecomposition of the square symmetric positive­ which are also orthogonal to u1 .. ,UK, so that OTO = I. Similarly, VTV = l.
semidefinite matrix B == ATA = VD AV T, which has positive eigenvalues. It is r, The SVD can then be written as:
then easy to show that the SVD of A is A = UD <xV T, where V is the above !
matrix of eigenvectors of B, that the singular values are the square (oots of A = O A VT (A.l.3)
IxJ Ixl IxJ JxJ
the eigenvalues: D<x = Dif2, and that U = AVD;1 (cf. Mardia et al., 1979,
pp. 473-474). A more fundamental proof assumes the existence of only one "\, where
singular value and pair of associated singular vectors and then derives the
SVD by an inductive argument (Greenacre, 1978, Appendix A). ~ A==[~<X ~J
A slightly different approach to the definition of the SVD is described by
Greenacre (1978, Appendix A), more like the usual definition of an eigenvalue
A and eigenvector x of a square matrix B: Bx = AX. A non-zero scalar IX is
called a singular value and a pair of vectors u and vare called singular vectors ,
1
1
,~
which we call the complete SVD of A (see Green and Carroll, 1976, p. 234).
Notice that there are usually an infinity of ways of completing the bases, that
is of choosing orthonormal bases for the orthogonal complements of U and

j, I~1
of a rectangular matrix A if Av = IXU and ATu = IXV hold simultaneously. V. The complete form is sometimes called the SVD itself (cf. Kshirsagar, 1972,
pp. 247-249).
Since the pair of vectors u and - v will also satisfy this pair of formulae for a ~:~,';I
singular value of -IX, it is assumed that the singular values are positive. There , ..;I
is also an indeterminacy in the scale of u and v, but the singular vectors do Low rank matrix approximation
fl'
have equal Euclidean norms (u Tu = vTv), so that it is sufficient to standardize
one of the vectors, usually to be of unit norm. The only indeterminacy that
.~¡ In view of form (A.1.2) of the SVD it seems that if singular values IXKO+l" .IXK
,
remains is a simultaneous reflection (multiplication by -1) of u and v in their .J are small compared to 1X 1 ... IXKO, then dropping the last K -K* terms of the
respective dual spaces, but this is usually of no consequence. It can be easily right-hand side of (A.l.2) gives a good approximation to A and has lower
proved that singular vectors associated with distinct singular values are rank than A. The approximation is in fact a least squares one, and it is this
necessarily orthogonal to each other in their respective dual spaces. result which makes the SVD so useful. The theorem of low rank approxima­
tion (first stated and proved by Eckart and Young, 1936) is as follows:

Uniqueness of the SVD Let A[KO] == r.folXkUkvI be the 1 x J matrix of rank K* formed from the
largest K* singular values and corresponding singular vectors of A. Then
From now on it is assumed that the singular values are arranged in A[KO] is the rank K*least squares approximation of A in that it minimizes:
descending order: IX1 ~ 1X 2 ~ ... ~ IX K > 0, with the singular vectors ordered
accordingly. If strict inequalities order the singular values, that is there is no r.{r.f(a;j-x;j)2 = trace{(A-X)(A-X)T} (A. 1.4)
multiplicity of singular values, then the SVD is uniquely determined up to for all matrices X ofrank K* or less.
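The existence argument above is also a recipe for computation: the eigendecomposition of A^T A supplies V and the squared singular values, after which U = A V D_α^{−1}; truncating the decomposition then yields the least squares approximation of the theorem. A sketch of both steps, again in Python/NumPy on artificial data and intended as an illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 4))

# SVD via the eigendecomposition of B = A^T A: the eigenvalues of B are
# the squared singular values, V collects its eigenvectors, U = A V D^{-1}
lam, V = np.linalg.eigh(A.T @ A)      # eigenvalues in ascending order
lam, V = lam[::-1], V[:, ::-1]        # reorder so the largest comes first
alpha = np.sqrt(lam)
U = (A @ V) / alpha                   # divides column k by alpha_k
assert np.allclose(A, U @ np.diag(alpha) @ V.T)

# rank K* least squares approximation A[K*] (Eckart and Young, 1936)
Ks = 2
A_Ks = U[:, :Ks] @ np.diag(alpha[:Ks]) @ V[:, :Ks].T

# the minimized residual sum of squares (A.1.4) equals the sum of the
# discarded squared singular values
assert np.isclose(((A - A_Ks) ** 2).sum(), (alpha[Ks:] ** 2).sum())
```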
It is not difficult to prove this result, and a possible proof runs along the same lines as existing proofs for the special case when A is square (see, for example, Kshirsagar, 1972, pp. 429-430; Stewart, 1973). Here we need to use the complete SVD of A, given in (A.1.3). The objective function (A.1.4) can then be written as:

    trace{Ũ Ũ^T (A − X) Ṽ Ṽ^T (A − X)^T} = trace{(Λ − G)(Λ − G)^T}
                                         = Σ_{k=1}^K (α_k − g_kk)² + ΣΣ_{i≠j} g_ij²

where G (I x J) ≡ Ũ^T X Ṽ. Since G is of the same rank as X, it is clear that the optimal G must have a "diagonal" of α_1 ... α_{K*} and otherwise be zero, which implies that X = A[K*] is optimal.

We require a notation for the submatrices of U, D_α and V:

    U ≡ [U(K*) U(K−K*)]    D_α ≡ [ D_α(K*)      0      ]    V ≡ [V(K*) V(K−K*)]
                                 [    0     D_α(K−K*)  ]

so that U(K*) (I×K*), D_α(K*) (K*×K*) and V(K*) (J×K*) compose the SVD of A[K*]:

    A[K*] = U(K*) D_α(K*) V(K*)^T    (A.1.5)

The matrix of residuals is clearly:

    A − A[K*] = U(K−K*) D_α(K−K*) V(K−K*)^T    (A.1.6)

Because the sum of squared elements of a matrix Y is equal to trace(YY^T), we have from (A.1.1), (A.1.5) and (A.1.6) that the sums of squared elements of A, A[K*] and A − A[K*] are respectively Σ_{k=1}^K α_k², Σ_{k=1}^{K*} α_k² and Σ_{k=K*+1}^K α_k². A traditional measure of the quality of the approximation of A by A[K*] is the percentage of the sum of squares:

    τ_{K*} ≡ 100 Σ_{k=1}^{K*} α_k² / Σ_{k=1}^K α_k²    (A.1.7)

Generalized SVD

We now need to introduce a slight generalization of the definition of the SVD in order to accommodate our later description. If Ω (I x I) and Φ (J x J) are given positive-definite symmetric matrices, then any real I x J matrix A of rank K can be expressed as:

    A = N D_α M^T = Σ_{k=1}^K α_k n_k m_k^T    (A.1.8)
  (I×J) (I×K)(K×K)(K×J)

where the columns of N and M are orthonormalized with respect to Ω and Φ respectively:

    N^T Ω N = M^T Φ M = I    (A.1.9)

The columns of N and M may be called generalized left and right singular vectors respectively. They are still orthonormal bases for the columns and rows of A, but the metrics imposed in the I- and J-dimensional spaces are no longer simple Euclidean, but generalized (or weighted) Euclidean metrics defined by Ω and Φ respectively (cf. Section 2.3). Similarly, the diagonal elements of the diagonal matrix D_α may be called generalized singular values, ordered from largest to smallest.

The generalized SVD is easily proved by assuming the ordinary SVD of Ω^{1/2} A Φ^{1/2}, where we use the symmetric matrix square roots (i.e. if Ω has eigendecomposition Ω = W D_μ W^T, then Ω^{1/2} = W D_μ^{1/2} W^T):

    Ω^{1/2} A Φ^{1/2} = U D_α V^T, where U^T U = V^T V = I    (A.1.10)

Letting:

    N ≡ Ω^{−1/2} U and M ≡ Φ^{−1/2} V    (A.1.11)

we have (A.1.8) and (A.1.9).

The corresponding generalization which is induced in the theorem of low rank approximation is as follows. If the last K − K* terms of (A.1.8) are dropped, then A[K*] ≡ Σ_{k=1}^{K*} α_k n_k m_k^T = N(K*) D_α(K*) M(K*)^T is the generalized rank K* least-squares approximation of A in that it minimizes:

    trace{Ω (A − X) Φ (A − X)^T}    (A.1.12)

amongst all matrices X of rank K* (or less).

When Ω, say, is a diagonal matrix D_w of positive numbers w_1 ... w_I, the fit (A.1.12) can be written as:

    trace{D_w (A − X) Φ (A − X)^T} = Σ_{i=1}^I w_i (a_i − x_i)^T Φ (a_i − x_i)    (A.1.13)

where a_i and x_i are the rows of A and X respectively, written as column vectors. This function essentially defines a generalized principal components analysis in the spirit of Pearson (1901) and Young (1937), as described in Section 2.5. The rows of A are considered to be a cloud of I points in J-dimensional generalized (or weighted) Euclidean space, where the metric is defined by Φ. (In Section 2.5 Φ is also a diagonal matrix, often called a "diagonal metric".) The values w_1 ... w_I are masses (or weights) which are assigned to the row points themselves. The rows of X are unknown points in a K*-dimensional subspace, and the minimum of (A.1.13), attained by X ≡ A[K*], identifies the subspace which is closest to the cloud of points in terms of the weighted sum of squared distances. In this case the vectors m_1 ... m_{K*} define orthonormal principal axes of the subspace, while the rows of the matrix N(K*) D_α(K*) = [α_1 n_1 ... α_{K*} n_{K*}] define the co-ordinates (with respect to these axes) of the projections of the cloud of points onto the subspace. Remember that the orthonormality of the axes and of the projections is defined in terms of the metric Φ. Notice that the matrix approximation A[K*], or equivalently the optimal subspace defined by M(K*), is unique as long as α_{K*} is strictly greater than α_{K*+1} (cf. the above discussion of the uniqueness of the SVD). In practice this means that the dimensionality K* of the approximation should be chosen where there is a clear difference between α_{K*} and α_{K*+1} (cf. Section 8.1).
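Equations (A.1.10) and (A.1.11) likewise translate directly into a computation of the generalized SVD from an ordinary SVD routine. In the sketch below the function name gsvd, like the test data, is ours and purely illustrative; it uses symmetric matrix square roots and checks (A.1.8), (A.1.9) and the quality measure corresponding to (A.1.7):

```python
import numpy as np

def gsvd(A, Omega, Phi):
    """Generalized SVD of A in the metrics Omega and Phi, computed via
    the ordinary SVD of Omega^{1/2} A Phi^{1/2}, cf. (A.1.10)-(A.1.11)."""
    def sym_sqrt(S):
        mu, W = np.linalg.eigh(S)               # S = W D_mu W^T
        return W @ np.diag(np.sqrt(mu)) @ W.T
    Os, Ps = sym_sqrt(Omega), sym_sqrt(Phi)
    U, alpha, Vt = np.linalg.svd(Os @ A @ Ps, full_matrices=False)
    N = np.linalg.solve(Os, U)                  # N = Omega^{-1/2} U
    M = np.linalg.solve(Ps, Vt.T)               # M = Phi^{-1/2} V
    return N, alpha, M

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 3))
Omega = np.diag(rng.uniform(0.5, 1.5, size=5))  # e.g. row masses
Phi = np.eye(3)

N, alpha, M = gsvd(A, Omega, Phi)
assert np.allclose(A, N @ np.diag(alpha) @ M.T)           # (A.1.8)
assert np.allclose(N.T @ Omega @ N, np.eye(alpha.size))   # (A.1.9)
assert np.allclose(M.T @ Phi @ M, np.eye(alpha.size))

# percentage quality of a rank 2 approximation, cf. (A.1.7)
tau = 100 * (alpha[:2] ** 2).sum() / (alpha ** 2).sum()
```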
Usually we would plot the rows of F ≡ N(K*) D_α(K*) in K*-dimensional Euclidean space (K* is most often equal to 2 or 3) in order to explore the multidimensional configuration of the rows of the data matrix A. However, it is possible to plot the rows of G ≡ M(K*) in the same space, as described by Gabriel (1971, 1981). This particular joint display of points representing both the rows and the columns of a matrix is called a biplot, and is essentially the same as the vector model for preferences suggested by Tucker (1960), but applicable in a wider context of data. In a biplot display the scalar product of the ith row point vector (i.e. the ith row f_i^T of F) and the jth column point vector (i.e. the jth row g_j^T of G) approximates the datum a_ij:

    a_ij ≈ f_i^T g_j = (length of f_i) × (length of g_j) × (cosine of the angle between f_i and g_j)    (A.1.14)

The data values a_ij are usually centered in some way, for example with respect to the column means, and thus a positive deviation a_ij > 0 is indicated by vectors f_i and g_j subtending an acute angle, while a negative deviation a_ij < 0 is indicated by vectors subtending an obtuse angle.

A.2 A GENERAL ANALYSIS AND ITS SPECIAL CASES

The above description suggests the following general analysis of a rectangular data matrix Y.

PHASE 1
Pre-process Y in some way, usually by some type of centering or recoding of the data. This results in a matrix A.

PHASE 2
Compute the generalized SVD of A for given Ω and Φ (cf. (A.1.10) and (A.1.11)):

    A = N D_α M^T, where N^T Ω N = M^T Φ M = I

and select a rank K* approximation:

    A[K*] = N(K*) D_α(K*) M(K*)^T

PHASE 3
Obtain a graphical display of the rows and/or the columns of the data by plotting the rows of

    F ≡ N(K*) D_α(K*)^a and/or G ≡ M(K*) D_α(K*)^b

for given a and b, with respect to K* Cartesian axes.

This analysis can be programmed as a computer subroutine, procedure or macro, and a variety of analyses are made possible by supplying the following "parameters" to the program (a simple realization is sketched after this list):

(1) The type of centering/recoding of the data matrix in Phase 1.
(2) The positive-definite symmetric matrices Ω and Φ in Phase 2.
(3) The scalars a and b, which indicate how the singular values are apportioned to rescale the left and right singular vectors respectively prior to plotting.
Table A.1 illustrates a number of well-known special cases of the above analysis. We also give a very brief description of the geometric interpretation of the plots that result from these analyses, leaving the reader to refer to the more detailed descriptions which exist in the textbooks and journal articles cited under "References". The data analyst is also free to experiment with the "parameters" in the three phases of the analysis in the context of a specific data set.

TABLE A.1
A number of multidimensional techniques defined in terms of generalized singular value decompositions. For each analysis we give the transformed matrix A of Phase 1, the metrics Ω (I×I) and Φ (J×J) of Phase 2, the scalars a and b of Phase 3, an outline of the geometric interpretation, and references.

(1) Principal components analysis (or principal components biplot if the variables are displayed). The data matrix Y is typically cases (rows) by variables (columns).
Phase 1: A = Y − (1/I) 1 1^T Y. Phase 2: Ω = (1/I) I, Φ = I. Phase 3: a = 1, b = 0.
Interpretation: The origin of the display is the centroid of the rows. The displayed row points are the orthogonal projections of the J-dimensional row points onto the "closest" (i.e. best fitting) K*-dimensional subspace. Variables are plotted (if desired) as vectors and the biplot interpretation is valid. The columns of G are the eigenvectors of the covariance matrix.
References: Pearson (1901) and Hotelling (1933), both reproduced in Bryant and Atchley (1975) along with other articles on principal components analysis; Morrison (1976); Gabriel (1971); Kendall (1975); Chambers (1977).

(2) Generalized principal components analysis (or biplot). D_w is a diagonal matrix of positive weights, assumed to have sum 1.
Phase 1: (i) A = Y − 1 1^T D_w Y, or (ii) A = {Y − 1 1^T D_w Y} Φ^{1/2}. Phase 2: Ω = D_w; Φ as given in option (i), Φ = I in option (ii). Phase 3: a = 1, b = 0.
Interpretation: As in (1), except that the space of the rows is generalized Euclidean according to the metric Φ and the row points are weighted respectively by w_1 ... w_I, the diagonal of D_w. The quality of approximation of each A is the same. Options (i) and (ii) give the same configuration F of row points but different configurations G of column points (see (3) below).
References: Many well-known techniques can be considered as special cases, cf. analyses (3), (7), (8) and (9). The potential of other choices of Ω and Φ has yet to be explored.

(3) Principal components analysis (or biplot) of standardized data. D_s is the diagonal matrix of standard deviations of the variables.
Phase 1: (i) A = Y − (1/I) 1 1^T Y, or (ii) A = {Y − (1/I) 1 1^T Y} D_s^{−1}. Phase 2: Ω = (1/I) I; Φ = D_s^{−2} in option (i), Φ = I in option (ii). Phase 3: a = 1, b = 0.
Interpretation: Well-known special case of (2) in which the variance of the row points along each of the original axes is made equal. The columns of G in option (ii) are the usual eigenvectors of the correlation matrix. Option (i) results in a G̃ which is related to that G by G̃ = D_s G, so that the standard deviations are approximately represented by the lengths of the column point vectors, but not the covariance structure (see (4) below).
References: Most textbooks on multivariate analysis treat this problem specifically, under the guise of principal components analysis of the correlation matrix (i.e. option (ii)).

(4) Covariance biplot.
Phase 1: A = {Y − (1/I) 1 1^T Y}/(I−1)^{1/2}. Phase 2: Ω = I, Φ = I. Phase 3: a = 0, b = 1.
Interpretation: The origin of the display is the centroid of the rows. The displayed row points are the orthogonal projections of the J-dimensional row points in Mahalanobis space onto the "closest" (i.e. best fitting) K*-dimensional subspace. The variables are plotted as vectors whose lengths approximate the respective standard deviations of the variables, and the angle cosines between these vectors approximate the correlations between the respective variables.
References: Gabriel (1971, 1972, 1981).

(5) Correlation biplot. D_s as in (3) above.
Phase 1: A = {Y − (1/I) 1 1^T Y} D_s^{−1}/(I−1)^{1/2}. Phase 2: Ω = I, Φ = I. Phase 3: a = 0, b = 1.
Interpretation: As for the covariance biplot, except that the column points are at length 1 from the origin in J-dimensional space. The quality of the display of each variable may thus be gauged by observing how near these points lie to the unit K*-dimensional hypersphere (e.g. the unit circle in two dimensions).
References: Hills (1969).

(6) Symmetric biplot. ȳ is the mean of all the elements of the data matrix.
Phase 1: A = Y − ȳ 1 1^T. Phase 3: a = b = 1/2.
Interpretation: This biplot focuses attention on the structure of the matrix itself rather than on the geometry of the rows and the columns. Only the between-set biplot interpretation is valid.
References: Bradu and Gabriel (1978); Gabriel (1981).

(7) Canonical correlation analysis. Let Y = [Y1 Y2], where the variables divide naturally into two subsets. Suppose Y is centered with respect to the variable (column) means, and let S_11 and S_22 represent the within-set covariance matrices and S_12 the between-sets covariance matrix.
Phase 1: A = S_11^{−1} S_12 S_22^{−1}. Phase 2: Ω = S_11, Φ = S_22.
Interpretation: The columns of F and G are the canonical loadings, i.e. the coefficients of the original variables which give the canonical variables. If the canonical scores are plotted, i.e. the rows of Y1 F and Y2 G, then these may be considered to be two clouds of points representing the cases in their respective Mahalanobis spaces (metrics S_11^{−1} and S_22^{−1} respectively), projected onto the K*-dimensional subspaces which exhibit the greatest positional correlation of the two clouds.
References: Anderson (1958); Tatsuoka (1971); Morrison (1976); Mardia et al. (1979); Falkenhagen and Nash (1978); Chambers (1977); Gittins (1979).

(8) Canonical variate analysis. The rows of Y fall into H groups. Let Ȳ be the H×J matrix of group means on the J variables, D_w the diagonal matrix of the proportions of rows in each group, and S the usual pooled within-groups covariance matrix.
Phase 1: A is the matrix Ȳ of group means, centered with respect to the overall variable means. Phase 2: Ω = D_w, Φ = S^{−1}. Phase 3: a = 1, b = 0.
Interpretation: The origin of the display is the centroid of the rows of Ȳ, i.e. of the rows of Y too. This is a special case of generalized principal components analysis, as defined in (2), where we identify the principal axes of the group centroids, weighted by the number in each group, in the Mahalanobis space defined in terms of the within-groups pooled covariance matrix S.
References: Same references as for canonical correlation analysis, as well as most textbooks on discriminant analysis.

(9) Correspondence analysis (synonyms: dual scaling, reciprocal averaging, canonical analysis of contingency tables). If Y is a contingency table, say, let P be Y divided by its sum, and let D_r and D_c be the diagonal matrices of the row and column sums (r and c respectively) of P.
Phase 1: A = D_r^{−1} (P − r c^T) D_c^{−1}. Phase 2: Ω = D_r, Φ = D_c. Phase 3: a = 1, b = 1.
Interpretation: This analysis may be described as a special case of (7) above when the data are discrete. The rows and the columns of the contingency table Y are represented by points whose positions indicate the associations between the rows and the columns of Y. The row and column displays are dual generalized principal components analyses (or principal co-ordinates analyses, cf. Example 3.5.2).
References: Benzécri et al. (1973); Hill (1974); Greenacre and Degos (1977); Greenacre (1978); Nishisato (1980); Greenacre (1981); Gifi (1981); Gauch (1982); this book.

(4) ... (rows of F) and variables (columns of Y or A) in principal components analysis. These are very simple to add to the basic method if one uses a programming language like SAS or GENSTAT (see Appendix B).

(5) In variations (4) and (5) the data are divided initially by (I - 1)^(1/2) so that the true scale of variance (respectively, correlation) is recovered in the display. For example, in the covariance biplot:

    {Y - (1/I)11'Y}/(I - 1)^(1/2) = N D_α M'

so that M D_α² M' equals the covariance matrix:

    {Y - (1/I)11'Y}'{Y - (1/I)11'Y}/(I - 1)

Hence the scalar products of the rows of G ≡ M_(K*) D_α(K*) approximate the respective covariances, which implies the geometric interpretation in terms of standard deviations and correlations (Gabriel, 1971).

(6) Notice the distinction between options (i) and (ii) in analyses (2) and (3). There is no difference in the final display of the row points, but the displays of the column points are different because different matrices are being biplotted.

(7) Notice that the inverse of the matrix defining the metric acts as a "weighting matrix" in the dual problem. In canonical correlation analysis and correspondence analysis the matrix "parameters" Ω and Φ are defined as the weighting matrices, whose inverses imply the usual metrics in the dual spaces. For example, the configuration of the row points F in correspondence analysis, which is obtained from Table A.1(9), is the same as that of the analysis of A = D_r^(-1)P - 11'D_c, with Ω = D_r, Φ = D_c^(-1) and a = 1. This is a generalized principal components analysis (Table A.1(2)), where the rows of A are the centered row profiles of P, weighted by the masses in the diagonal of D_r (cf. Section 2.5). Symmetrically, the configuration of column points G is the same as in the generalized principal components analysis of A = D_c^(-1)P' - 11'D_r, with Ω = D_c, Φ = D_r^(-1) and b = 1. As shown in Chapter 4, variations of correspondence analysis in the literature, for example reciprocal averaging and dual scaling, differ algebraically only in the definition of the parameters a and b, which are often both set to zero. If a = 1 and b = 0 in correspondence analysis, then each row point lies at a particular barycentre (weighted average) of the column points, where the column points are weighted proportionally to the respective elements in that row of the contingency table.
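As a concrete illustration of note (7), the following sketch computes the row principal co-ordinates of a contingency table as a generalized principal components analysis of the centered row profiles, obtaining the generalized SVD from an ordinary SVD. (The sketch is in modern NumPy, purely for illustration; the function and variable names are our own and appear nowhere in the text.)

    import numpy as np

    def ca_row_coordinates(N, K=2):
        # Generalized PCA of the centered row profiles A = Dr^(-1)P - 11'Dc,
        # with row weights Dr and chi-square metric Dc^(-1), as in note (7).
        # The generalized SVD is computed from the ordinary SVD of
        # Dr^(1/2) A Dc^(-1/2).
        P = N / N.sum()                            # correspondence matrix
        r, c = P.sum(axis=1), P.sum(axis=0)        # row and column masses
        A = P / r[:, None] - c[None, :]            # centered row profiles
        Z = np.sqrt(r)[:, None] * A / np.sqrt(c)[None, :]
        U, sv, Vt = np.linalg.svd(Z, full_matrices=False)
        F = (U / np.sqrt(r)[:, None] * sv)[:, :K]  # F = N D_alpha, first K* columns
        return F, sv[:K] ** 2                      # co-ordinates and principal inertias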

Principal co-ordinates analysis

Another framework which accommodates most of the techniques of Table A.1 is that of principal co-ordinates analysis (Gower, 1966). This is a slightly wider framework in that it relies on an input matrix of interpoint distances rather than the original rectangular data matrix. Here we generalize the usual definition of the analysis to accommodate masses assigned to the points. A matrix S of scalar products is first calculated (cf. Example 2.6.1(b)) and then its generalized eigendecomposition calculated: S = N D_μ N', say, where N' D_w N = I, D_w being the diagonal matrix of point masses. The optimal (least-squares) K*-dimensional display of the scalar products is provided by the rows of F ≡ N_(K*) D_μ(K*)^(1/2). The generalized eigendecomposition is usually computed via the ordinary eigendecomposition, like the generalized SVD (cf. Section 2.5 and Appendix B): that is, find the eigenvalues/eigenvectors of D_w^(1/2) S D_w^(1/2) = U D_μ U', then N = D_w^(-1/2) U. Example 3.5.2 shows how correspondence analysis may be defined as two dual principal co-ordinates analyses.
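The recipe of the preceding paragraph translates directly into code. The following sketch (ours, not the book's; it assumes a symmetric matrix S of scalar products and a vector w of point masses summing to 1) computes the generalized eigendecomposition via an ordinary eigenroutine:

    import numpy as np

    def principal_coordinates(S, w, K=2):
        # S = N Dmu N' with N' Dw N = I, computed via the ordinary
        # eigendecomposition of Dw^(1/2) S Dw^(1/2) = U Dmu U'.
        sw = np.sqrt(w)
        mu, U = np.linalg.eigh(sw[:, None] * S * sw[None, :])
        idx = np.argsort(mu)[::-1][:K]             # largest eigenvalues first
        mu = np.clip(mu[idx], 0.0, None)           # guard tiny negative roundoff
        Nmat = U[:, idx] / sw[:, None]             # N = Dw^(-1/2) U
        return Nmat * np.sqrt(mu)                  # F = N_(K*) Dmu(K*)^(1/2)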
Weighting of individual data

A problem which is quite external to the above framework is that of approximating the matrix A by weighted least squares, where each term a_ij is weighted by a prescribed w_ij. This is discussed by Gabriel and Zamir (1979) and is useful:

(1) for the treatment of missing data, for which w_ij can be set to 0;
(2) when we have available a measure of confidence for each datum, which we equate proportionally to w_ij;
(3) in the treatment of outliers, which may be individually down-weighted.

Notice that our framework does allow a weighting scheme of the form w_ij = s_i t_j, so that individual rows and/or columns of A can be weighted in the least-squares approximation. This is useful, for example, in reducing the rôle played by an outlying vector of data. However, in order to accommodate a general weighting scheme we can no longer rely on the theory based on the SVD, with its neat algorithm and nice optimality properties. The usual row and column vector geometries are also lost when general weighting is introduced.
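To show what a general weighted approximation involves in practice, here is a simple alternating least-squares sketch in the spirit of Gabriel and Zamir's criss-cross regressions. The scheme below is our own illustrative version, not the authors' algorithm or code, and it inherits the difficulties just mentioned (no SVD theory, possible local optima):

    import numpy as np

    def weighted_lowrank(A, W, K=2, iters=50, seed=0):
        # Minimize sum_ij W[i,j] * (A[i,j] - (X @ Y.T)[i,j])**2 by
        # alternately refitting the rows of X and of Y by weighted least
        # squares.  W[i,j] = 0 simply drops a_ij, e.g. for missing data.
        I, J = A.shape
        rng = np.random.default_rng(seed)
        X = rng.standard_normal((I, K))
        Y = rng.standard_normal((J, K))
        for _ in range(iters):
            for i in range(I):                     # refit row i of X against Y
                sw = np.sqrt(W[i, :])
                X[i] = np.linalg.lstsq(sw[:, None] * Y, sw * A[i, :], rcond=None)[0]
            for j in range(J):                     # refit row j of Y against X
                sw = np.sqrt(W[:, j])
                Y[j] = np.linalg.lstsq(sw[:, None] * X, sw * A[:, j], rcond=None)[0]
        return X, Y                                # A is approximated by X @ Y.T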

Appendix B

Computational Aspects

The basic computations of correspondence analysis are very simple, especially if subroutines which perform matrix computations are available, or if high-level statistical languages like GENSTAT or SAS are used.

Skeleton program in GENSTAT

For example, Table B.1 contains a listing of a simple GENSTAT program, along with input data and the resultant output, to compute the row and column principal co-ordinates and the row and column decompositions of inertia for the data matrix of Table 3.1. The central computational statement is:

    'SVD' X; VAL; F; G

which computes the (ordinary) SVD in one line. Apart from input and output statements, the rest of the program involves simple matrix calculations like multiplications, rescalings (i.e. multiplications by diagonal matrices) and summations (i.e. multiplications by vectors of ones). For an introduction to GENSTAT see Alvey et al. (1982). Notice that the trivial solution is obtained in this program because no prior centering of the data matrix is performed (cf. Sections 4.1.7, 4.1.8 and 4.1.12), and also that the decompositions of inertia are not rescaled in the usual form of absolute and relative contributions (cf. Table 3.6).
Computations using the eigendecomposition

If no SVD routine is available, but an eigenvalue/eigenvector routine is, then the computations are only slightly more cumbersome than the algorithm defined by Table B.1. Either one of the eigenequations of (4.1.23) can be used, whichever involves the matrix of smallest order, and then the other set of co-ordinates can be obtained using the relevant transition formula of (4.1.16). We are not particularly concerned about the loss of accuracy implied by computing the solution of what is essentially an SVD problem by means of the eigendecomposition (cf. Golub and Reinsch, 1971; Chambers, 1977). This would be of concern only in rare instances.
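In outline, that route looks as follows; the sketch below (in NumPy, our own illustration of the idea behind (4.1.23) and (4.1.16), not code from the book) eigendecomposes whichever cross-product matrix is smaller and recovers the other set of co-ordinates from the transition formula:

    import numpy as np

    def ca_via_eigen(N, K=2):
        P = N / N.sum()
        r, c = P.sum(axis=1), P.sum(axis=0)
        Z = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
        rows_smaller = N.shape[0] < N.shape[1]
        S = Z @ Z.T if rows_smaller else Z.T @ Z     # matrix of smallest order
        vals, vecs = np.linalg.eigh(S)
        idx = np.argsort(vals)[::-1][:K]
        lam, E = np.clip(vals[idx], 0.0, None), vecs[:, idx]
        if rows_smaller:
            F = E / np.sqrt(r)[:, None] * np.sqrt(lam)          # row principal co-ords
            G = (P.T / c[:, None]) @ (E / np.sqrt(r)[:, None])  # columns via transition
        else:
            G = E / np.sqrt(c)[:, None] * np.sqrt(lam)          # column principal co-ords
            F = (P / r[:, None]) @ (E / np.sqrt(c)[:, None])    # rows via transition
        return F, G, lam                                        # lam: principal inertias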
Computations by reciprocal averaging

If none of the above software is available and if the user is programming the analysis from scratch, then the reciprocal averaging algorithm (described in Section 4.2 and illustrated in Table 4.2) is to be recommended (see Hill, 1973). This algorithm has the advantage that only the co-ordinates in a subspace of required dimensionality need to be evaluated, one dimension at a time. However, it is then necessary to compute separately the total inertia of the data so that percentages of inertia can be calculated. We also recommend that the solution in at least one additional dimension be computed, so that the stability of the principal axes can be assessed (cf. uniqueness of the SVD, Appendix A; and Section 8.1). Remember that from the second dimension (principal axis) onwards, the orthogonality relationships embodied in (4.1.30) have to be maintained, as well as the usual centering constraint. (The centering condition is equivalently described as orthogonality with the trivial axis.)

Usually there are no problems with the convergence of this algorithm in each dimension, unless successive principal inertias are identical. In practice, due to inevitable rounding errors, no two principal inertias will be exactly the same, but convergence will nevertheless be extremely slow in this situation. Thus a slow rate of convergence would definitely indicate that at least one more principal axis should be computed.
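A bare-bones version of the iteration for the first non-trivial axis might look like the following (our own sketch of the algorithm of Section 4.2, not code from the book):

    import numpy as np

    def first_axis(N, tol=1e-12, max_iter=10000):
        # Row scores are weighted averages of column scores and vice versa.
        # The column scores are re-centered (kept orthogonal to the trivial
        # axis) and re-normalized each cycle; the normalizing factor
        # converges to the first principal inertia.
        P = N / N.sum()
        r, c = P.sum(axis=1), P.sum(axis=0)
        g = np.arange(N.shape[1], dtype=float)     # arbitrary starting scores
        g -= c @ g
        g /= np.sqrt(c @ g**2)
        lam_old = 0.0
        for _ in range(max_iter):
            f = (P @ g) / r                        # row scores from column scores
            g = (P.T @ f) / c                      # column scores from row scores
            g -= c @ g                             # re-center
            lam = np.sqrt(c @ g**2)                # weighted length of the scores
            g /= lam
            if abs(lam - lam_old) < tol:
                break
            lam_old = lam
        return f, g, lam       # f: row principal co-ordinates on this axis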
Plotting

The basic algorithm yields co-ordinates of two sets of points in a low-dimensional Euclidean space. Various programs are available to perform plotting of these points, but there is unfortunately very little standardization of software in this area. For example, the program BGRAPH (Tsianco et al., 1981) is an extremely versatile plotting package, geared specifically to the biplot, which also involves the simultaneous display of two sets of points. Various other useful features, like the grouping of points, the drawing of concentration ellipses and the projection onto oblique planes, are incorporated in this program. However, the FORTRAN code concerned with the plotting is hardly portable and the program awaits conversion to other computer installations. (At present it runs on the DEC 10, with Tektronix 4010 series graphics terminals, and relies on NCAR graphics software or the graphics package DISSPLA.) We used BGRAPH to plot Figs 8.2, 8.3, 9.3, 9.5, 9.6 and 9.15.
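In a present-day environment such a joint map is easy to draw directly; the following matplotlib sketch (our own illustration, unrelated to BGRAPH; the function and its arguments are hypothetical) plots the two sets of points with equal scales on both axes, as the geometry requires:

    import matplotlib.pyplot as plt

    def plot_joint_display(F, G, row_labels, col_labels):
        # F, G: row and column co-ordinates on the first two principal axes.
        fig, ax = plt.subplots()
        ax.scatter(F[:, 0], F[:, 1], marker='o', label='rows')
        ax.scatter(G[:, 0], G[:, 1], marker='^', label='columns')
        for (x, y), lab in zip(F[:, :2], row_labels):
            ax.annotate(lab, (x, y))
        for (x, y), lab in zip(G[:, :2], col_labels):
            ax.annotate(lab, (x, y))
        ax.axhline(0.0, linewidth=0.5)
        ax.axvline(0.0, linewidth=0.5)
        ax.set_xlabel('axis 1')
        ax.set_ylabel('axis 2')
        ax.set_aspect('equal')                 # equal scale on both axes
        ax.legend()
        plt.show()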

TABLE B.1
A simple GENSTAT program to perform a correspondence analysis, along with input data and results. Notice that the trivial solution (eigenvalue of 1 and associated eigenvectors) is included.

Correspondence analysis and decomposition of inertia using GENSTAT

Vectors and matrices used in the program:

Name   Order   Description
X      n x m   Data matrix, replaced by X divided by its sum
ONEN   n x 1   Vector of ones
ONEM   m x 1   Vector of ones
XR     n x 1   Vector of row sums of X
XC     m x 1   Vector of column sums of X
DR     n x n   Diag(XR) (replaced by sqrt(inv(DR)) in program)
DC     m x m   Diag(XC) (replaced by sqrt(inv(DC)) in program)
F      n x m   Row co-ordinate matrix (eventually inertia decompn.)
G      m x m   Column co-ordinate matrix (eventually inertia decompn.)

(NB. The first columns of F and G respectively are the 'trivial' solutions, i.e. we compute the uncentered solution.)

The program follows.

'REFE' CORR
'SCAL' N,M
'READ' N,M
'RUN'
5 4
'MATR' X $ N,M : F $ N,M : G $ M,M : XR $ N,1 : XC $ M,1
  : ONEN $ N,1 = (1)N : ONEM $ M,1 = (1)M : XXR $ N,1 : XXC $ M,1
'DIAG' DR $ N : DC $ M : VAL $ M
'SCAL' XSUM
'READ' X
'CALC' XSUM=SUM(X) : X=X/XSUM
'CALC' XR=PDT(X;ONEM) : XC=TPDT(X;ONEN)
  : XXR=1/SQRT(XR) : XXC=1/SQRT(XC)
'EQUA' DR,DC=XXR,XXC
'CALC' X=PDT(DR;PDT(X;DC))
'SVD' X; VAL; F; G
'CALC' F=PDT(DR;PDT(F;VAL)) : G=PDT(DC;PDT(G;VAL)) : VAL=VAL*VAL
'CAPT' ''ROW(F) AND COLUMN(G) CO-ORDINATES''
'PRINT' F,G $ 8.3
'EQUA' DR,DC=XR,XC
'CALC' F=PDT(DR;F*F) : G=PDT(DC;G*G)
'LINES' 6
'CAPT' ''ROW(F) AND COLUMN(G) DECOMPOSITIONS OF INERTIA AND INERTIAS(VAL)''
'PRINT' F,G,VAL $ 10.5
'RUN'
4 2 3 2
4 3 7 4
25 10 12 4
18 24 33 13
10 6 7 2
'EOD'
'CLOSE'
'STOP'

ROW(F) AND COLUMN(G) CO-ORDINATES

F
        1       2       3       4
1  -1.000   0.066  -0.194   0.071
2  -1.000  -0.259  -0.243  -0.034
3  -1.000   0.381  -0.011  -0.005
4  -1.000  -0.233   0.058   0.003
5  -1.000   0.201   0.079  -0.008

G
        1       2       3       4
1  -1.000   0.393  -0.030  -0.001
2  -1.000  -0.099   0.141   0.022
3  -1.000  -0.196   0.007  -0.026
4  -1.000  -0.294  -0.198   0.026

ROW(F) AND COLUMN(G) DECOMPOSITIONS OF INERTIA AND INERTIAS(VAL)

F
         1        2        3        4
1  0.05699  0.00025  0.00214  0.00029
2  0.09326  0.00625  0.00552  0.00011
3  0.26425  0.03828  0.00003  0.00001
4  0.45596  0.02474  0.00152  0.00000
5  0.12953  0.00524  0.00081  0.00001

G
         1        2        3        4
1  0.31606  0.04889  0.00029  0.00000
2  0.23316  0.00231  0.00464  0.00011
3  0.32124  0.01238  0.00002  0.00021
4  0.12953  0.01118  0.00507  0.00009

VAL
1.00000  0.07476  0.01002  0.00041
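For comparison, the same computation can be transcribed into modern NumPy (our own sketch, not part of the book). Like the GENSTAT program it performs no prior centering, so the trivial solution appears as the first column, and the printed values should match Table B.1 up to possible sign reversals of whole columns:

    import numpy as np

    N = np.array([[ 4,  2,  3,  2],
                  [ 4,  3,  7,  4],
                  [25, 10, 12,  4],
                  [18, 24, 33, 13],
                  [10,  6,  7,  2]], dtype=float)
    X = N / N.sum()                                # X divided by its sum
    xr, xc = X.sum(axis=1), X.sum(axis=0)          # row and column sums of X
    U, val, Vt = np.linalg.svd(X / np.sqrt(np.outer(xr, xc)), full_matrices=False)
    F = U / np.sqrt(xr)[:, None] * val             # row co-ordinates (col 1 trivial)
    G = Vt.T / np.sqrt(xc)[:, None] * val          # column co-ordinates
    print(F.round(3), G.round(3), sep='\n')        # co-ordinates, as in Table B.1
    print((xr[:, None] * F**2).round(5))           # row decomposition of inertia
    print((xc[:, None] * G**2).round(5))           # column decomposition of inertia
    print((val**2).round(5))                       # 1.00000 and the principal inertias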
Specialized programs with French documentation

Many users will prefer to receive a portable FORTRAN program to do the computations as well as handle the drudgery of input, possible recoding of the data, output and line printer plotting. Such software is available from a number of sources in France. For example, a complete library of programs is available by subscription, and includes a number of recoding programs (doubling, creating indicator matrices, etc.) as well as programs to perform various cluster analyses and compute inertia contributions. Enquiries can be made to:

    ADDAD (Association pour le Développement et la Diffusion de l'Analyse des Données),
    Laboratoire de Statistique,
    Tour 45-55, 2ème étage,
    4 Place Jussieu,
    75005 Paris, France

Another library of programs, including correspondence analysis and multiple correspondence analysis, geared to the analysis of survey data, is published in the book of Lebart et al. (1977). A new version of these programs exists, called SPAD-1983, and may be obtained through:

    CESIA (Centre de Statistique et d'Informatique Appliquées),
    82 rue de Sèvres,
    75007 Paris, France

A set of program subroutines in FORTRAN and in APL is also published in the book by Lebart et al. (1979). Enquiries can also be made to the above address.

Specialized programs with English documentation

Various programs (see references, Gauch, 1977, 1979) aimed specifically at ecologists are available at a reasonable price from:

    Hugh G. Gauch, Jr.,
    Ecology and Systematics,
    Cornell University,
    Ithaca, New York 14850, USA

These are well organized and are complemented by published articles in the ecological literature.

The book by Nishisato (1980) also gives listings of programs to perform various types of dual scaling, including the imposition of order constraints on the solutions. Enquiries can be made to:

    S. Nishisato,
    Ontario Institute for Studies in Education,
    252 Bloor Street West,
    Toronto, Canada

At present we ourselves are also occupied with the development and full documentation of a suite of portable FORTRAN programs to perform the analyses described in this book, including some graphics routines according to an international standard. Interested parties can be kept in contact about the availability of this software by writing to:

    Michael Greenacre,
    Department of Statistics,
    University of South Africa,
    PO Box 392,
    Pretoria 0001, South Africa

All the correspondence analyses in this book were performed using the program written in France by Tabet (1973).
Subject Index

A
Angle, 25, 26
  between profile vector and principal axis, 70, 91
  between profile vector and principal plane, 73-4
Applications of correspondence analysis
  antelope census in game reserves, 294-9
  criminal offences, 270
  ecological data, 226-32
  examination marks, 271-80
  eye and hair colour data, 255-9
  fossil skull measurements with missing data, 308-12
  HLA gene frequencies in human populations, 299-308
  in art and archaeology, 317
  in biology and ecology, 318-9
  in economics, 323-4
  in education, 321
  in epidemiology, medicine and pharmaceutics, 317-8
  in geology, 320
  in market research, 324-5
  in operations research and multiple criteria decision making, 323
  in pattern recognition, 320-1
  in politics, 322-3
  in psychology and animal behaviour, 319-20
  in social surveys, 321-2
  principal worries of Israeli adults, 259-63
  protein consumption in Europe and Russia, 280-91
  questionnaire with non-responses, 147-55
  ratings in an election, 171-4
  ratings of analgesic drugs, 263-7
  science doctorates conferred in the USA, 267-70
  seriation of the works of Plato, 291-4
  survey of clients, 76-80
  survey of types of smokers, 54-76
  taxonomy of a plant genus, 160-2
  United Nations resolutions, 156-7
  weather forecasting, 312-6
Asymmetric display, 94-5
  biplot interpretation of, 119-20
Asymmetric transition formulae, see Transition formulae
Average profile, 35
Axis, 16

B
Barycentre, 24, 42, 90
Barycentric co-ordinate system, 77-80
Basis, 17
Basis vector, 17
  identification of, 18
Biplot, see also Vector model, 119-20, 346, 347, 348-9, 350
Bipolar variables, 169-71
Bootstrapping, 209-10, 214-8, 262, 292-4
Burt matrix, 140-1, 243-4
  correspondence analysis of, 164
  modified, 243-4, 254

C
Canonical correlation, 115, 120
  asymptotic distribution of, 219-221
  asymptotic distribution under dependence, 247-8, 263
Canonical correlation analysis, 108-16, 349
  algebra of, 108-11
  generalizations of, 143-4
  geometry of, 111-6
Canonical score, 109, 111, 113
Canonical variate analysis, 188, 349
Canonical weight, 109
Centering
  of correspondence matrix, 91-2
  of indicator matrices in canonical correlation analysis, 111-2
  of row and column profiles, 91-2
Centre of gravity, see Centroid
Centroid, 17, 18-20, 24, 34, 85, 90
  coincident with trivial principal axis, 51-3
  contained in the optimal subspace, 44-5
  invariance with respect to reciprocal averaging, 120
Chi-square distance, 70, 77-8, 82, 281, 312
  as a Mahalanobis distance, 116
  between rows of a doubled matrix, 178, 182
Chi-square scalar product, 82
Chi-square statistic, 31-3
  proportional to the total inertia, 86-7
Classification, 185-6, 190-3, 308-12, 312-7
Cluster analysis, 185-7, 196-202
  hierarchical, 197-8
  non-hierarchical, 198
Co-ordinates, 16
  standard, see Standard co-ordinates
  with respect to principal axes, see Principal co-ordinates
Column vector, 15
Component, 24
Computer programs, 356-7
Constraints on the display, 232-6
Continuation ratio, 266
Contributions
  absolute, see Contributions of points to a principal axis
  interpretation of, 74-6
  of nodes to principal axes, 201
  of nodes to total inertia, 200
  of orthogonal axes to nodes, 206
  of points to a principal axis, 67, 91
  of points to a principal inertia, see Contributions of points to a principal axis
  of principal axes to nodes, 201, 206
  of principal axes to the inertia of a point, 70, 91
  of supplementary points to principal axes, 73-4
  relative, see Contributions of principal axes to the inertia of a point
  table of, 75, 257, 261, 282-5, 289-90
  to inertia, 67-70, 211, 213
Convex hull, 42, 215-8
  peeling of, 218
Convex polygon of a symmetric correspondence, 241, 254
Correlation
  between dummy variables and canonical variates, 120-2
  between profile vector and principal axis, 70, 91
  canonical, see Canonical correlation
Correlation ratio, 106
Correspondence analysis
  algebra of, 83-95
  and classification, 190-3
  and discriminant analysis, 187-90
  and hierarchical cluster analysis, 198-202
  and multidimensional unfolding, 181
  and principal components analysis, 181-3, 281-91
  and regression, 193-6
  as a display in a triangular (barycentric) co-ordinate system, 76-80
  as two dual principal co-ordinates analyses, 81-2
  based on generalized singular value decomposition, 349
  biplot interpretation of, 119-20, 181-2
  computation of, 352-7
  detrended, 232
  equivalent approaches, 7-9, 96-125
  equivalent to canonical correlation analysis of indicator matrix, 114-6
  equivalent to dual scaling, 104
  equivalent to reciprocal averaging, 97
  equivalent to simultaneous linear regressions, 117-8
  geometry of, 76-80
  historical background, 7-11
  introduction, 3-7
  invariance with respect to total of data matrix, 85
  multiple, see Multiple correspondence analysis
  numerical output of, see also Contributions, table of, 75
  of Burt matrix, 164
  of Petrie matrix, 226-9, 248-51
  of bivariate indicator matrix, 127-37
  of doubled and undoubled preferences, 183-4
  of doubled matrix - relationship to principal components analysis, 181-3
  of large matrices, 245-6
  of multivariate indicator matrix, 137-46
  of preferences, 169-84
  of ratings, 169-83
  of symmetric matrix, 239-44
  theory of, 83-125
  when display is asymmetric, 94-5
Correspondence matrix, 84, 115
  diagonal, 240
  product, 240
  symmetric, 240
  with block structure, 123-5
Cosine rule, 27
Criterion of internal consistency, see Internal consistency
Cross-validation
  of correspondence analysis, 239
  of classification, 193, 316

D
Decomposition of inertia
  in terms of nodes of clustering tree, 199-202
  in terms of principal axes, 40, 66-70
  interpretation of, 74
  of bivariate indicator matrix, 133-4, 136
Detrended correspondence analysis, 232
Dimension, 16, 24
Dimension weighting, 77-8
Dimensionality, 24
Direction cosine, 27
Discriminant analysis, 185-6, 187-90
Display
  asymmetric, see Asymmetric display
  joint, see Joint display
Distance, 25-7
  chi-square, see Chi-square distance
  genetic, 308
  in terms of scalar product, 27, 41-2
Distributional equivalence, see Principle of distributional equivalence
Double-centering, 42
Doubling, 171-9, 271
Dual scaling, 8, 11, 102-108, 233-5
  equivalent to 1-dimensional correspondence analysis, 104
  relationship to multiple correspondence analysis, 135
  with order constraints, 236
Duality, 60-6
Dummy variables, see Indicator variables

E
Ecological data, 96, 226-32
Ecological gradient, 96, 218, 227-32
Eigendecomposition, 38, 342
  computation of, 46
  generalized, 351
  of symmetric matrix, 240
Eigenvalue, 38
Eigenvector, 38
Euclidean space
  multidimensional, 28
  two-dimensional, 25-8
Experimental design, 316

F
Focusing, 222-6, 275-80
Fractional ratings, 177, 178
Fuzzy coding, 159, 174

G
Generalized inverse of covariance matrix of dummy variables, 110, 121-2
GENSTAT, 352, 354-5
Group centroids, 188-9
Guttman effect, see Horseshoe effect

H
Horseshoe effect, 80, 226-32, 258
Huyghen's theorem, 203, 204, 206

I
Identification of solutions, 120
  in dual scaling, 137
  in reciprocal averaging, 97-8, 233
Imputation of missing data, 236-9, 251-4
Incidence matrix, see Indicator matrix
Indicator matrix, 110-2, 114-6, 187
Indicator variables, 110
Inertia, see also Contributions, principal inertias, nodal inertias, 35, 40
  between-group, 203-4
  contributions to, see also Contributions, 67-70
  decomposition along principal axes
    of the column profiles, 60-3
    of the row profiles, 54-60
  decomposition of, 66-70
  moment of, 35
  of question with non-responses, 165-7
  percentages of, 50-1
  recovered from canonical correlation analysis of indicator matrix, 122-3
  stability of, 156-7, 207-22
  total, 86
  within-group, 203-4
Inner product, see Scalar product
Internal consistency, 104-6, 234
Interval variable, 160, 223

J
Jackknifing, 209-14
Joint display, 65
  biplot interpretation of, 119-20
  in canonical correlation analysis, 114
  interpretation of, 181

L
Least squares matrix approximation, 343-4
Length, 25, 26
Lever principle, 175
Likelihood ratio statistic, 266
Linear combination, 23
Linear independence (of vectors), 24
Linear mapping, 42-3
Logical coding, 159
Low rank matrix approximation, 38-40, 93, 343-4

M
Mahalanobis distance (or metric), 113, 116
Mahalanobis space, 112-3
Mass, 35-6, 85, 296, 306-8
  in canonical correlation analysis of indicator matrix, 116
  role of masses in determining principal axes, 67-8
Mean vector, see also Centroid, 17, 24
Mean-square contingency coefficient, 35
Metric, see also Distance, 25, 85
Missing data, 236-9
  imputation of, 93, 310, 312
Models for contingency tables, 258
Multidimensional scaling, 219
Multidimensional space, 20-23
Multiple correspondence analysis, 8, 126-68
  analogy to classical multivariate methods, 140-1
  artificial dimensions in, 144-5
  joint bivariate nature of, 140-1
  of indicator matrix of binary variables, 145-6
  of questionnaire data with non-responses, 146-57
  percentages of inertia in, 144-5
  relationship to dual scaling, 135-7, 144
  relationship to generalization of canonical correlation analysis, 143-4
  relationship to modelling of contingency tables, 141
  when equivalent to correspondence analysis of a 2-way table, 142-3
Multivariate indicator matrix, 138, 174

N
Neighbourhood (of a point), 190-3, 315, 316
Nodal inertias, 200-2
Node, 197
  display with respect to principal axes, 202
Non-trivial solution
  in canonical correlation analysis of indicator matrix, 110
Norm, see also Length, 26
Normalization, 27

O
Optimal scaling, see Dual scaling
Optimal subspaces, 35-41
Ordination, 96
Orthogonal, 27
Orthogonal complement, 226
Orthonormal, 27, 31
Orthonormal basis, 27
Outliers, 218, 239, 302

P
Parity (of a principal axis), 241
Pattern recognition, 190
Percentages of inertia, 60
  of Burt matrix, 140, 243-4
Petrie matrix, 229
  continuous, 248-9
Point (or point vector), 23
Polarization, 166, 182
  of average rating, 176
  of individual ratings, 178
  relative, 178-9
Principal axes, 39, 87-8, 345
  computation of, 45-51
  interpretation of, 69-70, 102
  invariance under rescaling of points, masses and metric, 80-1
Principal co-ordinates, 88-9, 94-5
  as eigenvectors, 92-3
  computation of, 93
  standardization of, 92, 93-4
Principal co-ordinates analysis, 81-2, 350-1
Principal components analysis, 36, 39, 47-8, 145-6, 182-3, 281-7, 312, 348
  generalized, 39-40, 345-6, 348, 350
Principal inertias, 51, 90-1
  equal to squared canonical correlations, 115
  asymptotic distribution of, 219-21, 247-8
  of Burt matrix, 140, 243-4
  of bivariate indicator matrix, 131-3
  of cloud of centroids, 204
  of symmetric correspondence matrix, 242
Principle of distributional equivalence, 65-6, 95, 312
Procrustes analysis, 219
Profile, 5-6, 32-3, 77, 84-5
Projection
  orthogonal, 48

Q
Q-mode geometry, 112
Quadratic form
  derivative of, 120
Quality
  of display of a point, 73, 76
  of matrix approximation, 344
Questionnaire
  analysis of, 146-57

R
R-mode geometry, 112
Ratio variables, 160, 223
Reciprocal averaging, 8, 11, 96-102, 245, 353
  computation by, 98-102
  equivalent to 1-dimensional correspondence analysis, 97
Recoding
  of heterogeneous data, 157-62, 313
  of non-responses in a questionnaire, 147-56
Reconstitution formula, 93, 119, 236, 270
Regression, 40-1, 185-6, 193-6, 208-9
  orthogonal, 41
Relative contributions, see Contributions of principal axes to the inertia of a point
  of the supplementary points, 73
Response pattern table, see Indicator matrix
Reweighting, 222-6, 296, 306-8
  of discrete variables to have prescribed inertias, 167-8
  of submatrices to have prescribed inertias, 161-2, 224

S
Scalar product, 25, 27
  chi-square, see Chi-square scalar product
  dependency on origin of space, 28
  in terms of distance, 41-2
Scale value, 102
  optimal, 104, 107
Scaling, 179-83
  dual, see Dual scaling
Score, 102
Seriation, 292
Simultaneous linear regressions, 116-9
Singular value decomposition (SVD), 37-40, 340-51
  complete, 343
  existence of, 342
  generalized, 39-40, 51-3, 81, 87, 91, 344-5
    computation of, 40
  uniqueness of, 342-3
Singular values, 38, 341
  equality of, 88
  trivial and non-trivial, 92
Singular vectors, 38, 341
  direct and inverse, 240
Size and shape, 291, 312
Stability, 207-19
  external, 208, 214-9, 308
  internal, 208, 210-4, 281-91, 306
  of correspondence analysis displays, 207-22
  of multidimensional scaling, 219
  of principal axes, 99, 157, 210-4
Standard basis, 17, 23
Standard co-ordinates, 93-5
Standardization
  of measurement units, 28-30
  of solutions, see also Identification of solutions, 256-8
Stem-and-leaf histogram, 2-3
Stochastic approximation, 245-6
Subspace, 17-18, 23, 24
  fitting of, 45-51
Supplementary points, 70-4, 188-9, 296-9, 301-3, 310
  as points with zero mass, 73
  projection onto principal axes, 71
Supplementary profiles, see Supplementary points
Surveys, 225
SVD, see Singular value decomposition

T
Total inertia
  decomposition of, 90-1
  equals the mean-square contingency coefficient, 86-7
  is the same in the row and column clouds, 86-7
Transition formulae, 64-5, 71, 89-90, 92-3, 115, 117, 181, 203, 248
  as definition of reciprocal averaging, 97
  asymmetric, 94, 162-3
  in correspondence analysis of bivariate indicator matrix, 162-3
Triangular co-ordinate system, see Barycentric co-ordinate system
Triplet (i.e. profiles, masses and metric), 58, 60
  of the dual problem, 61
Trivial axis, 51-3
Trivial solution, 92
  in canonical correlation analysis of indicator matrix, 110, 112, 114
  in multiple correspondence analysis, 131-2

U
Unfolding model, 179-81
Unit vector
  projected onto canonical axes, 113-6

V
Vector, 15-17, 23-25
  order of, 15
  transpose of, 15
Vector model, see also Biplot, 179-80, 181-2, 346

W
Weighted Euclidean space, 28-33
Weighted least squares matrix approximation, 119, 236, 345
Wishart distribution, 220
  moments of, 247-8