Multivariate Data Analysis: The French Way

Susan Holmes∗, Stanford University
Abstract: This paper presents exploratory techniques for multivariate data,
many of them well known to French statisticians and ecologists, but few well
understood in North American culture.
We present the general framework of duality diagrams which encompasses
discriminant analysis, correspondence analysis and principal components and
we show how this framework can be generalized to the regression of graphs on
covariates.
1. Motivation
David Freedman is well known for his interest in multivariate projections [5] and his skepticism with regard to model-based multivariate inference, in particular in cases where the number of variables and the number of observations are of the same order (see Freedman and Peters [12, 13]).
Brought up in a completely foreign culture, I would like to share an alien approach to some modern multivariate statistics that is not well known in North American statistical culture. I have written the paper 'the French way', with theorems and abstract formulation at the beginning and examples in the latter sections; Americans are welcome to skip ahead to the motivating examples.
Some French statisticians, fed Bourbakist mathematics and category theory in the 60's and 70's as all mathematicians in France were at the time, suffered from abstraction envy. Having completely rejected the probabilistic enterprise as useless for practical reasons, they composed their own abstract framework for talking about data in a geometrical context. I will explain the framework known as the duality diagram developed by Cazes, Cailliez, Pagès, Escoufier and their followers. I will try to show how aspects of the general framework are still useful today, and how much every idea, from Benzécri's correspondence analysis to Escoufier's conjoint analysis, has been rediscovered many times. Section 2.1 sets out the abstract picture. Sections 2.2-2.6 treat extensions of classical multivariate techniques (principal components analysis, instrumental variables, canonical correlation analysis, discriminant analysis, correspondence analysis) from this unified view. Section 3 shows how the methods apply to the analysis of network data.
2. The Duality Diagram
Established by the French school of "Analyse des Données" in the early 1970's, this approach was only published in a few texts [1] and technical reports [9], none of
∗Research funded by NSF DMS-0241246.
AMS 2000 subject classifications: Primary 60K35, 60K35; secondary 60K35
Keywords and phrases: duality diagram, bootstrap, correspondence analysis, STATIS, RV coefficient
imsartlnms ver. 2005/05/19 file: dfc.tex date: July 30, 2006
S. Holmes/The French way. 2
which were translated into English. My PhD advisor, Yves Escoufier [8, 10], publicized the method to biologists and ecologists, presenting a formulation based on his RV-coefficient, which I will develop below. The first software implementation of the duality-based methods described here was LEAS (1984), a Pascal program written for Apple II computers. The most recent implementation is the R package ade4 (see Appendix A for a review of various implementations of the methods described here).
2.1. Notation
The data are p variables measured on n observations. They are recorded in a matrix X with n rows (the observations) and p columns (the variables). $D_n$ is an n × n matrix of weights on the "observations", most often diagonal. We will also use a "neighborhood" relation (thought of as a metric on the observations) defined by taking a symmetric positive definite matrix Q. For example, to standardize the variables Q can be chosen as
$$Q = \begin{pmatrix}
\frac{1}{\sigma_1^2} & 0 & 0 & \cdots & 0\\
0 & \frac{1}{\sigma_2^2} & 0 & \cdots & 0\\
0 & 0 & \frac{1}{\sigma_3^2} & \cdots & 0\\
\vdots & & & \ddots & \vdots\\
0 & \cdots & & 0 & \frac{1}{\sigma_p^2}
\end{pmatrix}.$$
These three matrices form the essential "triplet" (X, Q, D) defining a multivariate data analysis. As the approach here is geometrical, it is important to see that Q and D define geometries, or inner products, in $\mathbb{R}^p$ and $\mathbb{R}^n$ respectively, through
$$x^tQy = \langle x, y\rangle_Q, \qquad x, y \in \mathbb{R}^p,$$
$$x^tDy = \langle x, y\rangle_D, \qquad x, y \in \mathbb{R}^n.$$
From these definitions we see there is a close relation between this approach and kernel based methods; for more details see [24]. Q can be seen as a linear function from $\mathbb{R}^p$ to $\mathbb{R}^{p*} = L(\mathbb{R}^p)$, the space of scalar linear functions on $\mathbb{R}^p$. D can be seen as a linear function from $\mathbb{R}^n$ to $\mathbb{R}^{n*} = L(\mathbb{R}^n)$. Escoufier [8] proposed to associate to a data set an operator from the space of observations $\mathbb{R}^p$ into the dual of the space of variables $\mathbb{R}^{n*}$. This is summarized in the following diagram [1], which is made commutative by defining V and W as $V = X^tDX$ and $W = XQX^t$ respectively.
$$\begin{array}{ccc}
\mathbb{R}^{p*} & \xrightarrow{\;\;X\;\;} & \mathbb{R}^{n}\\
{\scriptstyle Q}\uparrow\downarrow{\scriptstyle V} & & {\scriptstyle W}\uparrow\downarrow{\scriptstyle D}\\
\mathbb{R}^{p} & \xleftarrow{\;\;X^t\;\;} & \mathbb{R}^{n*}
\end{array}$$
This is known as the duality diagram because knowledge of the eigendecomposition of $X^tDXQ = VQ$ leads to that of the dual operator $XQX^tD$. The main consequence is an easy transition between principal components and principal axes, as we will see in the next section. The terms duality diagram and triplet are often used interchangeably.
Notes:
1. Suppose we have data and inner products defined by Q and D:
$$(x, y) \in \mathbb{R}^p \times \mathbb{R}^p \longrightarrow x^tQy = \langle x, y\rangle_Q \in \mathbb{R},$$
$$(x, y) \in \mathbb{R}^n \times \mathbb{R}^n \longrightarrow x^tDy = \langle x, y\rangle_D \in \mathbb{R}.$$
We say an operator O is B-symmetric if $\langle x, Oy\rangle_B = \langle Ox, y\rangle_B$, or equivalently $BO = O^tB$. The duality diagram is equivalent to three matrices (X, Q, D) such that X is n × p and Q and D are symmetric matrices of the right size (Q is p × p and D is n × n). The operators defined as $XQX^tD = WD$ and $X^tDXQ = VQ$ are called the characteristic operators of the diagram [8]; in particular VQ is Q-symmetric and WD is D-symmetric.
2. $V = X^tDX$ will be the variance-covariance matrix if X is centered with respect to D ($X^tD1_n = 0$).
3. There is an important symmetry between the rows and columns of X in the diagram, and one can imagine situations where the role of observation or variable is not uniquely defined. For instance, in microarray studies the genes can be considered either as variables or as observations. This makes sense in many contemporary situations which evade the more classical notion of n observations seen as a random sample of a population. It is certainly not the case that the 30,000 probes are a sample of genes, since these probes aim to be an exhaustive set.
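As a concrete illustration of the triplet and the symmetry properties of its characteristic operators, here is a minimal numpy sketch; the data, metric and weights are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 3
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)            # center X so that X^t D 1_n = 0 (uniform weights)

D = np.eye(n) / n                 # uniform weights on the observations
Q = np.diag(1 / X.var(axis=0))    # standardizing metric: inverse variances

V = X.T @ D @ X                   # variance-covariance matrix (X is D-centered)
W = X @ Q @ X.T

# V Q is Q-symmetric and W D is D-symmetric: B O = O^t B
assert np.allclose(Q @ (V @ Q), (V @ Q).T @ Q)
assert np.allclose(D @ (W @ D), (W @ D).T @ D)
```

The assertions check the B-symmetry condition $BO = O^tB$ stated in Note 1 for both characteristic operators.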
2.1.1. Properties of the Diagram
Here are some of the properties that prove useful in various settings:
• Rank of the diagram: X, $X^t$, VQ and WD all have the same rank, which will usually be smaller than both n and p.
• For Q and D symmetric matrices, VQ and WD are diagonalisable and have the same eigenvalues. We write them in decreasing order
$$\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \cdots \ge \lambda_r > 0,$$
the remaining eigenvalues being zero.
• Eigendecomposition of the diagram: VQ is Q-symmetric, thus we can find Z such that
$$VQZ = Z\Lambda, \quad Z^tQZ = I_p, \quad \text{where } \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_p). \tag{2.1}$$
In practical computations, we start by finding the Cholesky decompositions of Q and D, which exist as long as these matrices are symmetric and positive definite; call these $H^tH = Q$ and $K^tK = D$, where H and K are triangular. Then we use the singular value decomposition of KXH:
$$KXH = UST^t, \quad \text{with } T^tT = I_p,\; U^tU = I_n,\; S \text{ diagonal}.$$
Then $Z = (H^{-1})^tT$ satisfies (2.1) with $\Lambda = S^2$. The renormalized columns of Z, $A = ZS$, are called the principal axes and satisfy
$$A^tQA = \Lambda.$$
imsartlnms ver. 2005/05/19 file: dfc.tex date: July 30, 2006
S. Holmes/The French way. 4
Similarly, we can define $L = K^{-1}U$, which satisfies
$$WDL = L\Lambda, \quad L^tDL = I_n, \quad \text{where } \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_r, 0, \dots, 0). \tag{2.2}$$
$C = LS$ is usually called the matrix of principal components. It is normed so that
$$C^tDC = \Lambda.$$
• Transition formulæ: Of the four matrices Z, A, L and C we only have to compute one; all the others are obtained through the transition formulæ provided by the duality property of the diagram:
$$XQZ = LS = C, \qquad X^tDL = ZS = A.$$
• The quantity Trace(VQ) = Trace(WD) is often called the inertia of the diagram (inertia in the sense of Huygens' inertia formula, for instance), the inertia with respect to a point a of a cloud of points $x_i$ with weights $p_i$ being $\sum_{i=1}^n p_i d^2(x_i, a)$. For ordinary PCA with $Q = I_p$, $D = \frac{1}{n}I_n$ and centered variables, this is the sum of the variances of all the variables; if the variables are standardized, Q is the diagonal matrix of inverse variances and the inertia is the number of variables p.
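The practical computation described above (Cholesky factors, then one SVD, then back-transformation) can be checked numerically. This is a sketch in numpy with made-up data; the metric Q is an arbitrary positive diagonal matrix and D gives uniform weights.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10, 4
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)
Q = np.diag(rng.uniform(0.5, 2.0, p))   # a positive definite metric on R^p
D = np.eye(n) / n                       # uniform observation weights

# Cholesky factors H^t H = Q and K^t K = D (numpy returns lower-triangular C
# with C C^t = Q, so take H = C^t)
H = np.linalg.cholesky(Q).T
K = np.linalg.cholesky(D).T

# SVD of K X H, then back-transform to get Z = (H^{-1})^t T
U, s, Tt = np.linalg.svd(K @ X @ H, full_matrices=False)
Z = np.linalg.inv(H).T @ Tt.T
Lam = np.diag(s**2)                     # Lambda = S^2

V = X.T @ D @ X
assert np.allclose(V @ Q @ Z, Z @ Lam)          # (2.1): V Q Z = Z Lambda
assert np.allclose(Z.T @ Q @ Z, np.eye(p))      # Z^t Q Z = I_p

# transition formulae: X Q Z = L S = C and X^t D L = Z S = A
L = np.linalg.inv(K) @ U
assert np.allclose(X @ Q @ Z, L @ np.diag(s))
assert np.allclose(X.T @ D @ L, Z @ np.diag(s))
```

The two final assertions verify the transition formulæ, so in practice only one of Z, A, L, C need be computed.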
2.2. Comparing Two Diagrams: the RV coefficient
Many problems can be rephrased in terms of comparisons between two "duality diagrams" or, put more simply, two characterizing operators built from two "triplets", usually with one of the triplets being a response or having constraints imposed on it. Most often what is done is to compare two such diagrams and try to get one to match the other in some optimal way.
To compare two symmetric operators there is either a vector covariance
$$\mathrm{covV}(O_1, O_2) = \mathrm{Tr}(O_1^tO_2)$$
or their vector correlation [8]
$$RV(O_1, O_2) = \frac{\mathrm{Tr}(O_1^tO_2)}{\sqrt{\mathrm{Tr}(O_1^tO_1)\,\mathrm{Tr}(O_2^tO_2)}}.$$
If we were to compare the two triplets $(X_{n\times 1}, 1, \frac{1}{n}I_n)$ and $(Y_{n\times 1}, 1, \frac{1}{n}I_n)$ we would have $RV = \rho^2$.
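This special case is easy to check numerically. The following numpy sketch (made-up data) computes the RV coefficient of the two rank-one characterizing operators and compares it to the squared correlation of the centered variables.

```python
import numpy as np

def rv(O1, O2):
    """RV coefficient between two operators given as matrices."""
    return np.trace(O1.T @ O2) / np.sqrt(np.trace(O1.T @ O1) * np.trace(O2.T @ O2))

rng = np.random.default_rng(2)
n = 50
x = rng.standard_normal(n)
y = 2 * x + rng.standard_normal(n)
x = x - x.mean()
y = y - y.mean()

# triplets (x, 1, I/n) and (y, 1, I/n): characterizing operators x x^t D and y y^t D
D = np.eye(n) / n
O1 = np.outer(x, x) @ D
O2 = np.outer(y, y) @ D

r2 = (x @ y)**2 / ((x @ x) * (y @ y))   # squared correlation (data are centered)
assert np.isclose(rv(O1, O2), r2)       # RV = rho^2 in the univariate case
```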
PCA can be seen as finding the matrix Y which maximizes the RV coefficient between characterizing operators, that is, between $(X_{n\times p}, Q, D)$ and $(Y_{n\times q}, I, D)$, under the constraint that Y be of rank q < p:
$$RV\left(XQX^tD,\; YY^tD\right) = \frac{\mathrm{Tr}\left(XQX^tD\,YY^tD\right)}{\sqrt{\mathrm{Tr}\left((XQX^tD)^2\right)\,\mathrm{Tr}\left((YY^tD)^2\right)}}.$$
This maximum is attained when Y is chosen as the first q eigenvectors of $XQX^tD$, normed so that $Y^tDY = \Lambda_q$. The maximum RV is
$$RV_{\max} = \sqrt{\frac{\sum_{i=1}^q \lambda_i^2}{\sum_{i=1}^p \lambda_i^2}}.$$
Of course, classical PCA has $Q = I_p$ and $D = \frac{1}{n}I_n$, but the extra flexibility is often useful. We define the distance between the triplets (X, Q, D) and (Z, M, D), where Z is also n × p, as the distance deduced from the RV inner product between the operators $XQX^tD$ and $ZMZ^tD$. In fact, the reason the French like this scheme so much is that most linear methods can be reframed in these terms. We will give a few examples: Principal Component Analysis (ACP in French), Correspondence Analysis (AFC in French), Discriminant Analysis (AFD in French), PCA with respect to instrumental variables (ACPVI in French) and Canonical Correlation Analysis (AC).
2.3. Explaining one diagram by another
Principal Component Analysis with respect to Instrumental Variables (PCAIV) was a technique developed by C. R. Rao [25] to find the best set of coefficients in a multivariate regression setting where the response, given by a matrix Y, is multivariate. In terms of diagrams and RV coefficients, this problem can be rephrased as that of finding the M to associate to X so that (X, M, D) is as close as possible to (Y, Q, D) in the RV sense. The answer is provided by defining M such that
$$YQY^tD = \lambda XMX^tD.$$
If this is possible, then the two eigendecompositions of the triplets give the same answers. We simplify notation by the following abbreviations:
$$X^tDX = S_{xx}, \qquad Y^tDY = S_{yy}, \qquad X^tDY = S_{xy}, \qquad R = S_{xx}^{-1}S_{xy}QS_{yx}S_{xx}^{-1}.$$
Then
$$\left\|YQY^tD - XMX^tD\right\|^2 = \left\|YQY^tD - XRX^tD\right\|^2 + \left\|XRX^tD - XMX^tD\right\|^2.$$
The first term on the right hand side does not depend on M, and the second term will be zero for the choice M = R.
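A quick numerical check of this choice, sketched in numpy with made-up data: with M = R, the fitted operator $XRX^tD$ coincides with the response operator $YQY^tD$ projected (D-orthogonally) onto the column space of X, which is what makes the residual term in the decomposition orthogonal to the model term.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 12, 4, 3
X = rng.standard_normal((n, p)); X = X - X.mean(axis=0)
Y = rng.standard_normal((n, q)); Y = Y - Y.mean(axis=0)
D = np.eye(n) / n
Q = np.eye(q)

Sxx = X.T @ D @ X
Sxy = X.T @ D @ Y
Syx = Sxy.T
R = np.linalg.inv(Sxx) @ Sxy @ Q @ Syx @ np.linalg.inv(Sxx)

# P is the D-orthogonal projection onto the column space of X;
# X R X^t D equals the response operator Y Q Y^t D sandwiched by P
P = X @ np.linalg.inv(Sxx) @ X.T @ D
assert np.allclose(X @ R @ X.T @ D, P @ (Y @ Q @ Y.T @ D) @ P)
```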
In order to have rank $q < \min(\mathrm{rank}(X), \mathrm{rank}(Y))$, the optimal choice of a positive definite matrix M of rank q is $M = RBB^tR$, where the columns of B are the renormalized eigenvectors of $X^tDXR$:
$$B = \left(\frac{1}{\sqrt{\lambda_1}}\beta_1, \dots, \frac{1}{\sqrt{\lambda_q}}\beta_q\right),$$
$$X^tDXR\beta_k = \lambda_k\beta_k, \qquad \beta_k^tR\beta_k = \lambda_k, \quad k = 1, \dots, q, \qquad \lambda_1 > \lambda_2 > \dots > \lambda_q.$$
The PCA with respect to instrumental variables of rank q is equivalent to the rank q PCA of the triplet (X, R, D), with
$$R = S_{xx}^{-1}S_{xy}QS_{yx}S_{xx}^{-1}.$$
2.4. One Diagram to replace Two Diagrams
Canonical correlation analysis was introduced by Hotelling [18] to find the common structure in two sets of variables $X_1$ and $X_2$ measured on the same observations. This is equivalent to merging the two matrices columnwise to form a large matrix with n rows and $p_1 + p_2$ columns, and taking as the weighting of the variables the matrix defined by the two diagonal blocks $(X_1^tDX_1)^{-1}$ and $(X_2^tDX_2)^{-1}$:
$$Q = \begin{pmatrix} (X_1^tDX_1)^{-1} & 0 \\ 0 & (X_2^tDX_2)^{-1} \end{pmatrix}.$$
$$\begin{array}{ccc}
\mathbb{R}^{p_1*} & \xrightarrow{\;X_1\;} & \mathbb{R}^{n}\\
{\scriptstyle I_{p_1}}\uparrow\downarrow{\scriptstyle V_1} & & {\scriptstyle W_1}\uparrow\downarrow{\scriptstyle D}\\
\mathbb{R}^{p_1} & \xleftarrow{\;X_1^t\;} & \mathbb{R}^{n*}
\end{array}
\qquad
\begin{array}{ccc}
\mathbb{R}^{p_2*} & \xrightarrow{\;X_2\;} & \mathbb{R}^{n}\\
{\scriptstyle I_{p_2}}\uparrow\downarrow{\scriptstyle V_2} & & {\scriptstyle W_2}\uparrow\downarrow{\scriptstyle D}\\
\mathbb{R}^{p_2} & \xleftarrow{\;X_2^t\;} & \mathbb{R}^{n*}
\end{array}
\qquad
\begin{array}{ccc}
\mathbb{R}^{(p_1+p_2)*} & \xrightarrow{\;[X_1;X_2]\;} & \mathbb{R}^{n}\\
{\scriptstyle Q}\uparrow\downarrow{\scriptstyle V} & & {\scriptstyle W}\uparrow\downarrow{\scriptstyle D}\\
\mathbb{R}^{p_1+p_2} & \xleftarrow{\;[X_1;X_2]^t\;} & \mathbb{R}^{n*}
\end{array}$$
This analysis gives the same eigenvectors as the analysis of the triplet $(X_2^tDX_1, (X_1^tDX_1)^{-1}, (X_2^tDX_2)^{-1})$, also known as the canonical correlation analysis of $X_1$ and $X_2$.
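To see numerically that this triplet really recovers canonical correlations, here is a numpy sketch with made-up data: the nonzero eigenvalues of the triplet's characteristic operator are compared with the squared canonical correlations obtained by the standard whitened-SVD computation.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p1, p2 = 30, 3, 2
X1 = rng.standard_normal((n, p1)); X1 = X1 - X1.mean(axis=0)
X2 = rng.standard_normal((n, p2)); X2 = X2 - X2.mean(axis=0)
D = np.eye(n) / n
S11 = X1.T @ D @ X1; S22 = X2.T @ D @ X2
S12 = X1.T @ D @ X2; S21 = S12.T

# characteristic operator V Q of the triplet (X2^t D X1, S11^{-1}, S22^{-1})
VQ = S12 @ np.linalg.inv(S22) @ S21 @ np.linalg.inv(S11)
evals = np.sort(np.linalg.eigvals(VQ).real)[::-1]

# squared canonical correlations via inverse symmetric square roots and an SVD
w1, v1 = np.linalg.eigh(S11); S11_isqrt = v1 @ np.diag(w1**-0.5) @ v1.T
w2, v2 = np.linalg.eigh(S22); S22_isqrt = v2 @ np.diag(w2**-0.5) @ v2.T
rho = np.linalg.svd(S11_isqrt @ S12 @ S22_isqrt, compute_uv=False)

assert np.allclose(evals[:p2], np.sort(rho**2)[::-1])
```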
2.5. Discriminant Analysis
If we want to find the linear combinations of the original variables $X_{n\times p}$ that best characterize a group structure given by a zero/one group coding matrix Y, with as many columns as groups, we can phrase the problem as a duality diagram. Suppose that the observations are given individual weights in the diagonal matrix D, and that the variables are centered with respect to these weights.
Let A be the q × p matrix of group means in each of the p variables. It satisfies $Y^tDX = \Delta_Y A$, where $\Delta_Y$ is the diagonal matrix of group weights.
The variance-covariance matrix will be denoted $T = X^tDX$, with elements
$$t_{jk} = \mathrm{cov}(x_j, x_k) = \sum_{i=1}^n d_i (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k).$$
The between-group variance-covariance matrix is
$$B = A^t\Delta_Y A.$$
The duality diagram for linear discriminant analysis is
$$\begin{array}{ccc}
\mathbb{R}^{p*} & \xrightarrow{\;\;A\;\;} & \mathbb{R}^{q}\\
{\scriptstyle T^{-1}}\uparrow\downarrow{\scriptstyle B} & & {\scriptstyle AT^{-1}A^t}\uparrow\downarrow{\scriptstyle \Delta_Y}\\
\mathbb{R}^{p} & \xleftarrow{\;\;A^t\;\;} & \mathbb{R}^{q*}
\end{array}$$
corresponding to the triplet $(A, T^{-1}, \Delta_Y)$. Because
$$(X^tDY)\Delta_Y^{-1}(Y^tDX) = A^t\Delta_Y A,$$
this gives results equivalent to those of the triplet $(Y^tDX, T^{-1}, \Delta_Y^{-1})$.
The discriminating variables are the eigenvectors of the operator $A^t\Delta_Y A T^{-1}$. They can also be seen as the PCA with respect to instrumental variables of $(Y, \Delta_Y^{-1}, D)$ with respect to (X, M, D).
2.6. Correspondence Analysis
Correspondence analysis can be used to analyse several types of multivariate data, all involving some categorical variables. Here are some examples of the types of data that can be decomposed using this method:
• Contingency tables (cross-tabulations of two categorical variables).
• Multiple contingency tables (cross-tabulations of several categorical variables).
• Binary tables obtained by cutting continuous variables into classes and then recoding both these variables and any extra categorical variables into 0/1 tables, 1 indicating presence in that class. For instance, a continuous variable cut into three classes will provide three new binary variables, of which only one can take the value one for any given observation.
To a first approximation, correspondence analysis can be understood as an extension of principal components analysis (PCA) where the variance in PCA is replaced by an inertia proportional to the $\chi^2$ distance of the table from independence. CA decomposes this measure of departure from independence along axes that are orthogonal according to a $\chi^2$ inner product. If we are comparing two categorical variables, the simplest possible model is that of independence, in which case the counts in the table would approximately obey the margin products identity. Take an m × p contingency table N with $n = \sum_{i=1}^m\sum_{j=1}^p n_{ij} = n_{\cdot\cdot}$ observations, and associate to it the frequency matrix
$$F = \frac{N}{n}.$$
Under independence the approximation
$$n_{ij} \doteq \frac{n_{i\cdot}}{n}\,\frac{n_{\cdot j}}{n}\,n$$
can also be written
$$N \doteq n\,rc^t,$$
where $r = \frac{1}{n}N1_p$ is the vector of row sums of F and $c^t = \frac{1}{n}1_m^tN$ gives the column sums.
The departure from independence is measured by the $\chi^2$ statistic
$$X^2 = \sum_{i,j} \frac{\left(n_{ij} - \frac{n_{i\cdot}n_{\cdot j}}{n}\right)^2}{\frac{n_{i\cdot}n_{\cdot j}}{n}}.$$
Under the usual validity assumptions that the cell counts $n_{ij}$ are not too small, this statistic is distributed as a $\chi^2$ with (m−1)(p−1) degrees of freedom if the data are independent. If we do not reject independence, there is no more to be said about the table: there is no interaction of interest to analyse, and in fact no 'multivariate' effect. If on the contrary this statistic is large, we decompose it into one-dimensional components.
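For concreteness, here is a small numpy computation of $X^2$ on a made-up 2 × 3 table; the alternative form $\sum n_{ij}^2/e_{ij} - n$ used in the check follows from expanding the square.

```python
import numpy as np

N = np.array([[20., 10., 5.],
              [10., 15., 10.]])              # a made-up 2x3 contingency table
n = N.sum()
r = N.sum(axis=1) / n                        # row masses
c = N.sum(axis=0) / n                        # column masses
E = n * np.outer(r, c)                       # expected counts under independence

X2 = ((N - E)**2 / E).sum()                  # the chi-square statistic
dof = (N.shape[0] - 1) * (N.shape[1] - 1)    # (m-1)(p-1) degrees of freedom
```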
Correspondence analysis is equivalent to the eigendecomposition of the triplet (X, Q, D) with
$$X = D_r^{-1}FD_c^{-1} - 1_m1_p^t, \qquad Q = D_c, \qquad D = D_r,$$
where $D_c = \mathrm{diag}(c)$ and $D_r = \mathrm{diag}(r)$. Note that $(D_r^{-1}FD_c^{-1})^tD_r1_m = 1_p$: the weighted average of each column is one.
Notes:
1. Consider the matrix $D_r^{-1}FD_c^{-1}$ and take the principal components with respect to the weights $D_r$ for the rows and $D_c$ for the columns. The recentered matrix $D_r^{-1}FD_c^{-1} - 1_m1_p^t$ has a generalized singular value decomposition
$$D_r^{-1}FD_c^{-1} - 1_m1_p^t = USV^t, \quad \text{with } U^tD_rU = I_m,\; V^tD_cV = I_p,$$
having total inertia
$$\mathrm{Tr}\left[\left(D_r^{-1}FD_c^{-1} - 1_m1_p^t\right)^t D_r \left(D_r^{-1}FD_c^{-1} - 1_m1_p^t\right) D_c\right] = \frac{X^2}{n}.$$
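The claimed value of the total inertia can be checked numerically; this numpy sketch uses a made-up 3 × 3 table.

```python
import numpy as np

N = np.array([[16., 14., 6.],
              [14., 10., 11.],
              [8., 10., 12.]])               # a made-up 3x3 contingency table
n = N.sum()
F = N / n
r = F.sum(axis=1); c = F.sum(axis=0)
Dr = np.diag(r); Dc = np.diag(c)

# the recentered CA matrix and its total inertia in the (Dc, Dr) geometry
X = np.linalg.inv(Dr) @ F @ np.linalg.inv(Dc) - 1.0
inertia = np.trace(X.T @ Dr @ X @ Dc)

# chi-square statistic of the table
E = n * np.outer(r, c)
X2 = ((N - E)**2 / E).sum()
assert np.isclose(inertia, X2 / n)           # total inertia equals X^2 / n
```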
2. Correspondence analysis is also the PCA of the row profiles $D_r^{-1}F$, taken with row weights $D_r$ and the metric $Q = D_c^{-1}$.
3. Notice that
$$\sum_i f_{i\cdot}\left(\frac{f_{ij}}{f_{i\cdot}f_{\cdot j}} - 1\right) = 0 \qquad \text{and} \qquad \sum_j f_{\cdot j}\left(\frac{f_{ij}}{f_{i\cdot}f_{\cdot j}} - 1\right) = 0:$$
the row and column profiles are centered.
This method has been rediscovered many times, most recently by Jon Kleinberg in his method for analysing hubs and authorities [19]; see Fouss, Saerens and Renders, 2004 [11] for a detailed comparison.
In statistics the most commonplace use of correspondence analysis is in ordination or seriation, that is, the search for a hidden gradient in contingency tables. As an example we take data analysed by Cox and Brandwood [4] and [6], who wanted to seriate Plato's works using the proportion of sentence endings in a given book with a given stress pattern. We propose the use of correspondence analysis on the table of frequencies of sentence endings; for a detailed analysis see Charnomordic and Holmes [2].
The first 10 profiles (as percentages) look as follows:
Rep Laws Crit Phil Pol Soph Tim
UUUUU 1.1 2.4 3.3 2.5 1.7 2.8 2.4
UUUU 1.6 3.8 2.0 2.8 2.5 3.6 3.9
UUUU 1.7 1.9 2.0 2.1 3.1 3.4 6.0
UUUU 1.9 2.6 1.3 2.6 2.6 2.6 1.8
UUUU 2.1 3.0 6.7 4.0 3.3 2.4 3.4
UUUU 2.0 3.8 4.0 4.8 2.9 2.5 3.5
UUU 2.1 2.7 3.3 4.3 3.3 3.3 3.4
UUU 2.2 1.8 2.0 1.5 2.3 4.0 3.4
UUU 2.8 0.6 1.3 0.7 0.4 2.1 1.7
UUU 4.6 8.8 6.0 6.5 4.0 2.3 3.3
.......etc (there are 32 rows in all)
The eigenvalue decomposition (the scree plot) of the chi-square distance matrix (see [2]) shows that two axes, out of a possible 6 (the matrix is of rank 6), provide a summary of 85% of the departure from independence; this suggests that a planar representation will provide a good visual summary of the data.
Eigenvalue inertia % cumulative %
1 0.09170 68.96 68.96
2 0.02120 15.94 84.90
3 0.00911 6.86 91.76
4 0.00603 4.53 96.29
5 0.00276 2.07 98.36
6 0.00217 1.64 100.00
Figure 1: Correspondence Analysis of Plato's Works. [Planar display of the seven works (Rep, Laws, Crit, Phil, Pol, Soph, Tim); Axis 1: 69%, Axis 2: 16%.]
We can see from the plot that there is a seriation that, as in most cases, follows a parabola or arch [16], from Laws at one extreme, the latest work, to the Republic, the earliest among those studied.
3. From Discriminant Analysis to Networks
Consider a graph whose vertices are the members of a group, with an edge between two members if they interact. We suppose each vertex comes with an observation vector $x_i$, and that each has the same weight $\frac{1}{n}$. In the extreme case of discriminant analysis, the graph is supposed to connect all the points of a group in a complete graph, disconnected from the other observations. Discriminant analysis is just the explanation of this particular graph by linear combinations of variables; what we propose here is to extend this to more general graphs in a similar way. We will suppose all the observations are the nodes of the graph and each has the same weight $\frac{1}{n}$. The basic decomposition of the variance is written
$$\mathrm{cov}(x_j, x_k) = t_{jk} = \frac{1}{n}\sum_{i=1}^n (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k).$$
Call the group means $\bar{x}_{gj} = \frac{1}{n_g}\sum_{i\in G_g} x_{ij}$, $g = 1, \dots, q$. The cross-products vanish within each group:
$$\sum_{i\in G_g}(x_{ij} - \bar{x}_{gj})(\bar{x}_{gk} - \bar{x}_k) = (\bar{x}_{gk} - \bar{x}_k)\sum_{i\in G_g}(x_{ij} - \bar{x}_{gj}) = 0.$$
This gives Huygens' formula $t_{jk} = w_{jk} + b_{jk}$, where
$$w_{jk} = \frac{1}{n}\sum_{g=1}^q \sum_{i\in G_g}(x_{ij} - \bar{x}_{gj})(x_{ik} - \bar{x}_{gk}) \qquad \text{and} \qquad b_{jk} = \sum_{g=1}^q \frac{n_g}{n}(\bar{x}_{gj} - \bar{x}_j)(\bar{x}_{gk} - \bar{x}_k),$$
that is,
$$T = W + B.$$
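This decomposition can be verified directly; a numpy sketch with made-up data and three equal-sized groups (the global mean is zero after centering, so the between term simplifies as above).

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, q = 12, 3, 3
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)                            # global centering
g = np.repeat(np.arange(q), n // q)               # three groups of equal size

T = (X.T @ X) / n                                 # total covariance, weight 1/n
means = np.stack([X[g == k].mean(axis=0) for k in range(q)])

# within-group part: averaged squared deviations around the group means
W = sum((X[g == k] - means[k]).T @ (X[g == k] - means[k]) for k in range(q)) / n
# between-group part: group means weighted by group proportions n_g / n
ng = np.bincount(g)
B = sum((ng[k] / n) * np.outer(means[k], means[k]) for k in range(q))

assert np.allclose(T, W + B)                      # Huygens: T = W + B
```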
As we showed above, linear discriminant analysis finds the linear combinations a such that
$$\frac{a^tBa}{a^tTa}$$
is maximized. This is equivalent to maximizing the quadratic form $a^tBa$ in a, subject to the constraint $a^tTa = 1$. As we saw above, this is solved by the eigenvalue problem
$$Ba = \lambda Ta, \quad \text{or } T^{-1}Ba = \lambda a \text{ if } T^{-1} \text{ exists};$$
then $a^tBa = \lambda a^tTa = \lambda$. We are going to extend this to graphs by relaxing the group definition, partitioning the variation into local and global components.
3.1. Decomposing the Variance into Local and Global Components
Lebart was a pioneer in adapting eigenvector decompositions to cater to spatial structure in the data (see [20, 21, 22]).
$$\mathrm{cov}(x_j, x_k) = \frac{1}{n}\sum_{i=1}^n (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k) = \frac{1}{2n^2}\sum_{i=1}^n\sum_{i'=1}^n (x_{ij} - x_{i'j})(x_{ik} - x_{i'k})$$
$$\mathrm{var}(x_j) = \frac{1}{2n^2}\left[\sum_{(i,i')\in E}(x_{ij} - x_{i'j})^2 + \sum_{(i,i')\notin E}(x_{ij} - x_{i'j})^2\right]$$
Let M be the incidence matrix of the directed graph: $m_{ij} = 1$ if i points to j. Suppose for the time being that M is symmetric (the graph is undirected). The degree of vertex i is $m_i = \sum_{i'=1}^n m_{ii'}$. We take the convention that there are no self-loops. Then another way of writing the variance formula is
$$\mathrm{var}(x_j) = \frac{1}{2n^2}\left[\sum_{i=1}^n\sum_{i'=1}^n m_{ii'}(x_{ij} - x_{i'j})^2 + \sum_{(i,i')\notin E}(x_{ij} - x_{i'j})^2\right]$$
$$\mathrm{var}_{loc}(x_j) = \frac{1}{2m}\sum_{i=1}^n\sum_{i'=1}^n m_{ii'}(x_{ij} - x_{i'j})^2, \qquad \text{where } m = \sum_{i=1}^n\sum_{i'=1}^n m_{ii'}.$$
The total variance is the variance of the complete graph. Geary's ratio [14] is used to see whether the variable $x_j$ can be considered independent of the graph structure. If the neighboring values of $x_j$ are correlated, then the local variance will be an underestimate of the variance:
$$G = c(x_j) = \frac{\mathrm{var}_{loc}(x_j)}{\mathrm{var}(x_j)}.$$
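These formulas are easy to exercise numerically. The following numpy sketch uses a made-up 6-cycle graph: it checks the pairwise-difference identity for the variance and computes the Geary ratio.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 6
# adjacency matrix of an undirected 6-cycle (symmetric, no self-loops)
M = np.zeros((n, n))
for i in range(n):
    M[i, (i + 1) % n] = M[(i + 1) % n, i] = 1.0
m = M.sum()                                    # sum over all pairs of m_{ii'}
x = rng.standard_normal(n)

# variance as the average of all pairwise squared differences
var = ((x[:, None] - x[None, :])**2).sum() / (2 * n**2)
assert np.isclose(var, x.var())

# local variance: only pairs joined by an edge, weighted by 1/(2m)
var_loc = (M * (x[:, None] - x[None, :])**2).sum() / (2 * m)
G = var_loc / var                              # Geary's ratio
```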
Call D the diagonal matrix with the total degree of each node on the diagonal, $D = \mathrm{diag}(m_i)$. For all variables taken together, $j = 1, \dots, p$, write the local covariance matrix $V = \frac{1}{2m}X^t(D - M)X$; if the graph is just made of disjoint groups of the same size, this is proportional to the within-class variance-covariance matrix W. The proportionality can be accomplished by taking the average of the sum of squares to the average of the neighboring nodes ([23]). We can generalize the Geary index to account for irregular graphs coherently: in this case we weight each node by its degree. Then we can write the Geary ratio for any n-vector x as
$$c(x) = \frac{x^t(D - M)x}{x^tDx}, \qquad D = \begin{pmatrix}
m_1 & 0 & \cdots & 0\\
0 & m_2 & & \vdots\\
\vdots & & \ddots & 0\\
0 & \cdots & 0 & m_n
\end{pmatrix}.$$
We can ask for the coordinate(s) most correlated with the graph structure. To minimize the Geary ratio, we choose x such that c(x) is minimal; this is equivalent to minimizing $x^t(D - M)x$ under the constraint $x^tDx = 1$. This can be solved by finding the smallest eigenvalue $\mu$ with eigenvector x such that
$$(D - M)x = \mu Dx, \qquad D^{-1}(D - M)x = \mu x, \qquad (1 - \mu)x = D^{-1}Mx.$$
This is exactly the defining equation of the correspondence analysis of the matrix M. This can be extended to as many coordinates as we like; in particular, we can take the eigenvectors associated with the two largest eigenvalues $(1 - \mu)$ and provide the best planar representation of the graph in this way.
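The eigen-equations above translate directly into a small numpy sketch; the graph (two 4-cycles joined by an edge) is made up for the example.

```python
import numpy as np

# a small undirected graph on 8 nodes: two 4-cycles joined by the edge (3, 4)
n = 8
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4), (4, 5), (5, 6), (6, 7), (7, 4)]
M = np.zeros((n, n))
for i, j in edges:
    M[i, j] = M[j, i] = 1.0
D = np.diag(M.sum(axis=1))                     # diagonal matrix of degrees

# (D - M) x = mu D x  is equivalent to  D^{-1} M x = (1 - mu) x
vals, vecs = np.linalg.eig(np.linalg.inv(D) @ M)
order = np.argsort(-vals.real)
assert np.isclose(vals.real[order[0]], 1.0)    # trivial eigenvalue, constant vector

# skip the trivial solution; the next two eigenvectors give the planar layout
coords = vecs.real[:, order[1:3]]
```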
3.2. Regression of graphs on node covariates
The covariables measured on the nodes can be essential to understanding the fine structure of graphs. We call X the n × p matrix of measurements at the vertices of the graph; they may be a combination of both categorical variables (gene families, GO classes) and continuous measurements (expression scores). We can apply the PCAIV method defined in Section 2 to the eigenvectors of the graph defined above. This provides a method that uses the covariates in X to explain the graph. To be more precise, given a graph (V, E) with adjacency matrix M, define the operator
$$L = D^{-1}M - I, \qquad D = \mathrm{diag}(d_1, d_2, \dots, d_n) \text{ the diagonal matrix of degrees}.$$
Using the eigenanalysis of the graph, we can summarize the graph with a few variables, the first few relevant eigenvectors of L; these can then be regressed on the covariates using Principal Components with respect to Instrumental Variables [25], as defined above, to find the linear combinations of node covariates that best explain the graph variables.
Appendix A: Resources
A.1. Reading
There are few references in English explaining the duality/operator point of view, apart from the already cited references of Escoufier [8, 10]. Frédérique Glaçon's PhD thesis [15] (in French) clearly lays out the duality principle before going on to explain its application to the conjoint analysis of several matrices, or data cubes. The interested reader fluent in French could also consult any one of several Masters-level textbooks on the subject for many details and examples.
Brigitte Escofier and Jérôme Pagès [7] have a textbook with many examples; although their approach is geometric, they do not delve into the duality diagram beyond explaining, on page 100, its use in the transition formulæ between eigenbases of the different spaces.
[22] is one of the broader books on multivariate analyses, making connections between modern uses of eigendecomposition techniques, clustering and segmentation. This book is unique in its chapter on stability and validation of results (without going as far as speaking of inference).
Cailliez and Pagès [1] is hard to find, but it was the first textbook completely based on the diagram approach; as was the case in the earlier literature, they use transposed matrices.
A.2. Software
The methods described in this article are all available in the form of R packages, which I recommend. The most complete package is ade4 [3], which covers almost all the problems I mention except that of regressing graphs on covariates. However, a complete understanding of the duality diagram terminology and philosophy is necessary, as these provide the building blocks for all the functions, in the form of a class called dudi (which actually stands for duality diagram). One of the most important features of all the 'dudi.*' functions is that when the argument scannf is at its default value TRUE, the first step imposed on the user is the perusal of the screeplot of eigenvalues. This can be very important: choosing to retain 2 values by default, before consulting the eigenvalues, can lead to the main mistake that can be made when using these techniques, the separation of two close eigenvalues. When two eigenvalues are close the plane will be stable, but not each individual axis or principal component, resulting in erroneous results if, for instance, the 2nd and 3rd eigenvalues were very close and the user chose to take 2 axes [17].
Another useful addition also comes from the ecological community and is called vegan. Here is a list of suggested functions from several packages:
• Principal Components Analysis (PCA) is available in prcomp and princomp in the standard package stats, as pca in vegan, and as dudi.pca in ade4.
• Two versions of PCAIV are available: one is called Redundancy Analysis (RDA) and is available as rda in vegan and pcaiv in ade4.
• Correspondence Analysis (CA) is available as cca in vegan and as dudi.coa in ade4.
• Discriminant analysis is available as lda in stats and as discrimin in ade4.
• Canonical Correlation Analysis is available in cancor in stats (beware: cca in ade4 is Canonical Correspondence Analysis).
• STATIS (conjoint analysis of several tables) is available in ade4.
Acknowledgements
I would like to thank Elizabeth Purdom for discussions about multivariate analysis, Yves Escoufier for reading this paper and for teaching me much about duality over the years, and Persi Diaconis for suggesting the Plato data in 1993 and for many conversations about the American way. This work was funded under NSF DMS award 0241246.
References
[1] F. Cailliez and J. P. Pagès, Introduction à l'analyse des données, SMASH, Paris, 1976.
[2] B. Charnomordic and S. Holmes, Correspondence analysis for microarrays, Statistical Graphics and Computing Newsletter, 12 (2001).
[3] D. Chessel, A. B. Dufour, and J. Thioulouse, The ade4 package - I: One-table methods, R News, 4 (2004), pp. 5-10.
[4] D. R. Cox and L. Brandwood, On a discriminatory problem connected with the works of Plato, J. Roy. Statist. Soc. Ser. B, 21 (1959), pp. 195-200.
[5] P. Diaconis and D. Freedman, Asymptotics of graphical projection pursuit, The Annals of Statistics, 12 (1984), pp. 793-815.
[6] P. Diaconis and J. Salzmann, Projection pursuit for discrete data, IMS, 2006, ch. tba, p. tba.
[7] B. Escofier and J. Pagès, Analyses factorielles simples et multiples : Objectifs, méthodes et interprétation, Dunod, 1998.
[8] Y. Escoufier, Operators related to a data matrix, in Recent Developments in Statistics, J. Barra et al., eds., North Holland, 1977, pp. 125-131.
[9] Y. Escoufier, Cours d'analyse des données, Cours Polycopié 7901, IUT, Montpellier, 1979.
[10] Y. Escoufier, The duality diagram: a means of better practical applications, in Developments in Numerical Ecology, P. Legendre and L. Legendre, eds., 1987, pp. 139-156.
[11] F. Fouss, J.-M. Renders, and M. Saerens, Some relationships between Kleinberg's hubs and authorities, correspondence analysis, and the SALSA algorithm, in JADT 2004, International Conference on the Statistical Analysis of Textual Data, Louvain-la-Neuve, 2004, pp. 445-455.
[12] D. A. Freedman and S. C. Peters, Bootstrapping a regression equation: Some empirical results, Journal of the American Statistical Association, 79 (1984), pp. 97-106.
[13] D. A. Freedman and S. C. Peters, Bootstrapping an econometric model: Some empirical results, Journal of Business & Economic Statistics, 2 (1984), pp. 150-158.
[14] R. Geary, The contiguity ratio and statistical mapping, The Incorporated Statistician, 5 (1954), pp. 115-145.
[15] F. Glaçon, Analyse conjointe de plusieurs matrices de données. Comparaison de différentes méthodes, PhD thesis, Grenoble, 1981.
[16] M. Hill and H. Gauch, Detrended correspondence analysis, an improved ordination technique, Vegetatio, 42 (1980), pp. 47-58.
[17] S. Holmes, Outils Informatiques pour l'Evaluation de la Pertinence d'un Résultat en Analyse des Données, PhD thesis, Montpellier, USTL, 1985.
[18] H. Hotelling, Relations between two sets of variates, Biometrika, 28 (1936), pp. 321-327.
[19] J. M. Kleinberg, Hubs, authorities, and communities, ACM Comput. Surv., 31 (1999), p. 5.
[20] L. Lebart, Traitement des Données Statistiques, Dunod, Paris, 1979.
[21] L. Lebart, A. Morineau, and K. M. Warwick, Multivariate Descriptive Statistical Analysis, Wiley, 1984.
[22] L. Lebart, M. Piron, and A. Morineau, Statistique exploratoire multidimensionnelle, Dunod, Paris, France, 2000.
[23] A. Mom, Méthodologie statistique de la classification des réseaux de transports, PhD thesis, Montpellier, USTL, 1988.
[24] E. Purdom, Comparative Multivariate Methods, PhD thesis, Stanford University, Sequoia Hall, Stanford, CA 94305, 2006.
[25] C. R. Rao, The use and interpretation of principal component analysis in applied research, Sankhyā A, 26 (1964), pp. 329-359.
Sequoia Hall, Stanford, CA 94305
Email: susan@stat.stanford.edu
The first software implementation of the duality-based methods described here was done in LEAS (1984), a Pascal program written for Apple II computers. The most recent implementation is the R package ade4 (see Appendix A for a review of various implementations of the methods described here). My PhD advisor, Yves Escoufier [8, 10], publicized the method to biologists and ecologists, presenting a formulation based on his RV-coefficient. Few of the original references were translated into English.

2.1. Notation

The data are p variables measured on n observations. They are recorded in a matrix X with n rows (the observations) and p columns (the variables). D is an n x n matrix of weights on the "observations", most often diagonal. We will also use a "neighborhood" relation (thought of as a metric on the variables) defined by taking a symmetric positive definite matrix Q. For example, to standardize the variables, Q can be chosen as

    Q = diag(1/sigma_1^2, 1/sigma_2^2, ..., 1/sigma_p^2).

These three matrices form the essential "triplet" (X, Q, D) defining a multivariate data analysis. As the approach here is geometrical, it is important to see that Q and D define geometries, or inner products, in R^p and R^n respectively, through

    x^t Q y = <x, y>_Q,   x, y in R^p,
    x^t D y = <x, y>_D,   x, y in R^n.

From these definitions we see there is a close relation between this approach and kernel-based methods; for more details see [24].

Q can be seen as a linear function from R^p to R^p* = L(R^p), the space of scalar linear functions on R^p. D can be seen as a linear function from R^n to R^n* = L(R^n). Escoufier [8] proposed to associate to a data set an operator from the space of observations R^p into the dual of the space of variables R^n*. This is summarized in the following diagram [1], which is made commutative by defining V and W as X^t D X and X Q X^t respectively:

              X
    R^p* ---------> R^n
      ^              |
    Q |   V     W    | D
      |              v
    R^p  <--------- R^n*
             X^t

This is known as the duality diagram because knowledge of the eigendecomposition of X^t D X Q = V Q leads to that of the dual operator X Q X^t D. The main consequence is an easy transition between principal components and principal axes that I will develop below.
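To make this concrete, here is a minimal numerical sketch (numpy; the data matrix and the particular choices of Q and D are invented for illustration) checking that the two operators V Q = X^t D X Q and W D = X Q X^t D share their nonzero eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 7, 3
X = rng.standard_normal((n, p))
Q = np.diag(1.0 / X.var(axis=0))   # standardizing metric on the variables
D = np.eye(n) / n                  # uniform weights on the observations

VQ = X.T @ D @ X @ Q               # operator on R^p
WD = X @ Q @ X.T @ D               # dual operator on R^n

# both spectra are real and nonnegative; sort them in decreasing order
ev_p = np.sort(np.linalg.eigvals(VQ).real)[::-1]
ev_n = np.sort(np.linalg.eigvals(WD).real)[::-1]
print(np.allclose(ev_p, ev_n[:p]))  # the p nonzero eigenvalues coincide
```

In practice only one of the two eigendecompositions ever needs to be computed, whichever of n and p is smaller.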
The duality diagram is equivalent to three matrices (X, Q, D) such that X is n x p and Q and D are symmetric matrices of the right size (Q is p x p and D is n x n). Suppose we have data and inner products defined by Q and D:

    (x, y) in R^p x R^p --> x^t Q y = <x, y>_Q in R,
    (x, y) in R^n x R^n --> x^t D y = <x, y>_D in R.

We say an operator O is B-symmetric if <x, Oy>_B = <Ox, y>_B, or equivalently B O = O^t B. The operators defined as X Q X^t D = W D and X^t D X Q = V Q are called the characteristic operators of the diagram [8]; in particular, V Q is Q-symmetric and W D is D-symmetric. V = X^t D X will be the variance-covariance matrix if X is centered with regards to D (X^t D 1_n = 0), as we will see in the next section.

Notes:
1. The terms duality diagram and triplet are often used interchangeably.
2. There is an important symmetry between the rows and columns of X in the diagram, and one can imagine situations where the role of observation or variable is not uniquely defined. For instance, in microarray studies the genes can be considered either as variables or as observations. This makes sense in many contemporary situations which evade the more classical notion of n observations seen as a random sample of a population. It is certainly not the case that the 30,000 probes are a sample of genes, since these probes try to be an exhaustive set.

2.2. Properties of the Diagram

Here are some of the properties that prove useful in various settings:

• Rank of the diagram: X, X^t, V Q and W D all have the same rank, which will usually be smaller than both n and p.

• For Q and D symmetric matrices, V Q and W D are diagonalisable and have the same eigenvalues. We note them in decreasing order

    lambda_1 >= lambda_2 >= lambda_3 >= ... >= lambda_r >= 0 >= ... >= 0.

• Eigendecomposition of the diagram: V Q is Q-symmetric, thus we can find Z such that

    V Q Z = Z Lambda,  Z^t Q Z = I_p,  where Lambda = diag(lambda_1, lambda_2, ..., lambda_p).    (2.1)

In practical computations, we start by finding the Cholesky decompositions of Q and D, which exist as long as these matrices are symmetric and positive definite; call these H^t H = Q and K^t K = D. Here H and K are triangular. Then we use the singular value decomposition of K X H:

    K X H = U S T^t,  with T^t T = I_p, U^t U = I_n, S diagonal.

Then Z = (H^{-1})^t T satisfies (2.1) with Lambda = S^2. The renormalized columns of Z, A = S Z, are called the principal axes and satisfy

    A^t Q A = Lambda.
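This computational recipe translates line for line into numpy. A sketch on invented data (diagonal Q and D are an assumption taken for simplicity; any symmetric positive definite matrices would do), verifying that Z = (H^{-1})^t T satisfies (2.1) with Lambda = S^2:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 8, 4
X = rng.standard_normal((n, p))
Q = np.diag(rng.uniform(0.5, 2.0, p))      # metric on the variables
D = np.diag(rng.uniform(0.5, 2.0, n))      # weights on the observations

H = np.linalg.cholesky(Q).T                # H^t H = Q, H upper triangular
K = np.linalg.cholesky(D).T                # K^t K = D

U, s, Tt = np.linalg.svd(K @ X @ H, full_matrices=False)
T = Tt.T
Lam = np.diag(s**2)                        # Lambda = S^2

Z = np.linalg.inv(H).T @ T                 # Z = (H^{-1})^t T
V = X.T @ D @ X

print(np.allclose(V @ Q @ Z, Z @ Lam),     # V Q Z = Z Lambda
      np.allclose(Z.T @ Q @ Z, np.eye(p))) # Z^t Q Z = I_p
```

Note that numpy's cholesky returns the lower triangular factor, so its transpose is taken to match the convention H^t H = Q used here.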
Similarly, we can define L = K^{-1} U, which satisfies

    W D L = L Lambda,  L^t D L = I_n.    (2.2)

C = L S is usually called the matrix of principal components. It is normed so that C^t D C = Lambda.

• Transition formulae: Of the four matrices Z, A, L and C we only have to compute one; all the others are obtained by the transition formulae provided by the duality property of the diagram:

    X Q Z = L S = C,    X^t D L = Z S = A.

• Trace(V Q) = Trace(W D) is often called the inertia of the diagram (inertia in the sense of Huyghens' inertia formula, for instance), the inertia with regards to a point a of a cloud of p_i-weighted points being

    sum_{i=1}^n p_i d^2(x_i, a).

When we look at ordinary PCA with Q = I_p, D = (1/n) I_n and centered variables, this inertia is the sum of the variances of all the variables. If the variables are standardized, Q is the diagonal matrix of inverse variances, and the inertia is the number of variables p.

2.3. Comparing Two Diagrams: the RV Coefficient

Many problems can be rephrased in terms of the comparison of two "duality diagrams" or, put more simply, two characterizing operators built from two "triplets", usually with one of the triplets being a response or having constraints imposed on it. Most often what is done is to compare two such diagrams and try to get one to match the other in some optimal way. To compare two symmetric operators, there is either a vector covariance

    covV(O_1, O_2) = Tr(O_1 O_2)

or their vector correlation [8]

    RV(O_1, O_2) = Tr(O_1 O_2) / sqrt( Tr(O_1^t O_1) Tr(O_2^t O_2) ).

If we were to compare the two triplets (X_{n x 1}, 1, (1/n) I_n) and (Y_{n x 1}, 1, (1/n) I_n), we would have RV = rho^2.

PCA can be seen as finding the matrix Y which maximizes the RV coefficient between characterizing operators, that is, between (X_{n x p}, Q, D) and (Y_{n x q}, I, D), under the constraint that Y be of rank q < p:

    RV( X Q X^t D, Y Y^t D ) = Tr( X Q X^t D Y Y^t D ) / ( sqrt(Tr((X Q X^t D)^2)) sqrt(Tr((Y Y^t D)^2)) ).

This maximum is attained when Y is chosen as the first q eigenvectors of X Q X^t D, normed so that Y^t D Y = Lambda_q. The maximum RV is

    RVmax = sqrt( sum_{i=1}^q lambda_i^2 / sum_{i=1}^p lambda_i^2 ).

Of course, classical PCA has Q the identity and D = (1/n) I_n, but the extra flexibility is often useful. We define the distance between triplets (X, Q, D) and (Z, M, D), where Z is also n x p, as the distance deduced from the RV inner product between the operators X Q X^t D and Z M Z^t D.
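A quick numerical check of the RV = rho^2 remark, with simulated centered vectors x and y and D = (1/n) I_n (the data are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = rng.standard_normal(n); x -= x.mean()
y = 0.6 * x + rng.standard_normal(n); y -= y.mean()

def rv(O1, O2):
    """Vector correlation between two symmetric operators."""
    return np.trace(O1 @ O2) / np.sqrt(np.trace(O1 @ O1) * np.trace(O2 @ O2))

D = np.eye(n) / n
O1 = np.outer(x, x) @ D        # characteristic operator of (x, 1, (1/n)I_n)
O2 = np.outer(y, y) @ D

rho2 = np.corrcoef(x, y)[0, 1] ** 2
print(np.isclose(rv(O1, O2), rho2))
```

The RV coefficient thus plays, for operators, the role that squared correlation plays for single variables.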
In fact, the reason the French like this scheme so much is that most linear methods can be reframed in these terms. We will give a few examples, such as Principal Component Analysis (ACP in French), PCA with regards to instrumental variables (ACPVI in French), Canonical Correlation Analysis (AC), Discriminant Analysis (AFD in French) and Correspondence Analysis (AFC in French).

2.4. Explaining One Diagram by Another

Principal Component Analysis with respect to Instrumental Variables was a technique developed by C. R. Rao [25] to find the best set of coefficients in a multivariate regression setting where the response, given by a matrix Y, is multivariate. In terms of diagrams and RV coefficients, this problem can be rephrased as that of finding a metric M to associate to X so that (X, M, D) is as close as possible to (Y, Q, D) in the RV sense. If this were exactly possible, the two eigendecompositions of the triplets would give the same answers. The answer is provided by defining M such that Y Q Y^t D = lambda X M X^t D. We simplify notation with the following abbreviations:

    X^t D X = S_xx,  Y^t D Y = S_yy,  X^t D Y = S_xy,  and  R = S_xx^{-1} S_xy Q S_yx S_xx^{-1}.

Then

    || Y Q Y^t D - X M X^t D ||^2 = || Y Q Y^t D - X R X^t D + X R X^t D - X M X^t D ||^2
                                  = || Y Q Y^t D - X R X^t D ||^2 + || X R X^t D - X M X^t D ||^2.

The first term on the right-hand side does not depend on M, and the second term will be zero for the choice M = R. In order to have rank q < min(rank(X), rank(Y)), the optimal choice of a positive definite matrix M is M = R B B^t R, where the columns of B are the eigenvectors of X^t D X R with

    X^t D X R beta_k = lambda_k beta_k,  beta_k^t R beta_k = 1/lambda_k,  k = 1, ..., q,
    B = ( beta_1/sqrt(lambda_1), ..., beta_q/sqrt(lambda_q) ),  lambda_1 > lambda_2 > ... > lambda_q.

The PCA with regards to instrumental variables of rank q is equivalent to the PCA of rank q of the triplet (X, M, D).

2.5. One Diagram to Replace Two Diagrams

Canonical correlation analysis was introduced by Hotelling [18] to find the common structure in two sets of variables X_1 and X_2 measured on the same observations. This is equivalent to merging the two matrices columnwise to form a large matrix with n rows and p_1 + p_2 columns, and taking as the weighting of the variables the matrix defined by the two diagonal blocks (X_1^t D X_1)^{-1} and (X_2^t D X_2)^{-1}:

    Q = [ (X_1^t D X_1)^{-1}           0
                 0            (X_2^t D X_2)^{-1} ].

Each of the triplets (X_1, I_{p_1}, D) and (X_2, I_{p_2}, D) has its own duality diagram, and the merged analysis is the diagram of the triplet ([X_1, X_2], Q, D). This analysis gives the same eigenvectors as the analysis of the triplet

    ( X_2^t D X_1, (X_1^t D X_1)^{-1}, (X_2^t D X_2)^{-1} ),

also known as the canonical correlation analysis of X_1 and X_2.

Discriminant Analysis

If we want to find the linear combinations of the original variables X_{n x p} that best characterize a group structure on the points, given by a zero/one group coding matrix Y with as many columns as groups, we can phrase the problem as a duality diagram. Suppose that the observations are given individual weights in the diagonal matrix D, and that the variables are centered with regards to these weights. The variance-covariance matrix will be denoted T = X^t D X, with elements

    t_jk = cov(x_j, x_k) = sum_{i=1}^n d_i (x_ij - xbar_j)(x_ik - xbar_k).

Let A be the q x p matrix of group means in each of the p variables. This satisfies Y^t D X = Delta_Y A, where Delta_Y is the diagonal matrix of group weights. The between-group variance-covariance matrix is B = A^t Delta_Y A. The duality diagram for linear discriminant analysis corresponds to the triplet (A, T^{-1}, Delta_Y). The discriminating variables are the eigenvectors of the operator

    A^t Delta_Y A T^{-1}.

They can also be seen as the PCA with regards to instrumental variables of (Y, Delta_Y^{-1}, D) with regards to (X, Q, D); because (X^t D Y) Delta_Y^{-1} (Y^t D X) = A^t Delta_Y A, this gives equivalent results to the triplet (Y^t D X, T^{-1}, Delta_Y^{-1}).
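As a numerical sanity check on the discriminant triplet, a small simulation (the two groups, their separation and the uniform weights are all invented): with q = 2 groups the operator T^{-1} B has rank q - 1 = 1, and its nonzero eigenvalue lies between 0 and 1, since T = W + B.

```python
import numpy as np

rng = np.random.default_rng(3)
n_per, p = 30, 3
# two made-up groups with shifted means
X1 = rng.standard_normal((n_per, p)) + np.array([2.0, 0.0, 0.0])
X2 = rng.standard_normal((n_per, p))
X = np.vstack([X1, X2])
g = np.repeat([0, 1], n_per)
n = 2 * n_per

D = np.eye(n) / n
X = X - X.mean(axis=0)                       # center wrt the uniform weights

T = X.T @ D @ X                              # total variance-covariance
A = np.vstack([X[g == k].mean(axis=0) for k in (0, 1)])  # q x p group means
Dy = np.diag([n_per / n, n_per / n])         # group weights Delta_Y
B = A.T @ Dy @ A                             # between-group covariance

lam = np.sort(np.linalg.eigvals(np.linalg.inv(T) @ B).real)[::-1]
print(lam)   # one substantial eigenvalue, the rest numerically zero
```

The single nonzero eigenvalue is the usual discriminant criterion a^t B a / a^t T a at its maximizer.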
2.6. Correspondence Analysis

Correspondence analysis can be used to analyse several types of multivariate data; all involve some categorical variables. Here are some examples of the types of data that can be decomposed using this method:

• Contingency tables (cross-tabulations of two categorical variables).
• Multiple contingency tables (cross-tabulations of several categorical variables).
• Binary tables obtained by cutting continuous variables into classes and then recoding both these variables and any extra categorical variables into 0/1 tables. So, for instance, a continuous variable cut into three classes will provide three new binary variables, of which only one can take the value one for any given observation, 1 indicating presence in that class.

If we are comparing two categorical variables, the simplest possible model is that of independence, in which case the counts in the table would approximately obey the margin products identity. For an m x p contingency table N with n = sum_{i=1}^m sum_{j=1}^p n_ij = n.. observations and associated frequency matrix F = N/n, under independence we have the approximation

    n_ij ≈ n_i. n_.j / n,  that is,  F ≈ r c^t,

where r = (1/n) N 1_p is the vector of row sums of F and c^t = (1/n) 1_m^t N are the column sums. The departure from independence is measured by the chi-square statistic

    X^2 = sum_{i,j} (n_ij - n_i. n_.j / n)^2 / (n_i. n_.j / n).

Under the usual validity assumptions that the cell counts n_ij are not too small, this statistic is distributed as a chi-square with (m-1)(p-1) degrees of freedom if the data are independent. If we do not reject independence, there is no more to be said about the table: there is in fact no "multivariate" effect, no interaction of interest to analyse. On the contrary, if this statistic is large, we decompose it into one-dimensional components.

To a first approximation, correspondence analysis can be understood as an extension of principal components analysis (PCA) where the variance in PCA is replaced by an inertia proportional to the chi-square distance of the table from independence. CA decomposes this measure of departure from independence along axes that are orthogonal according to a chi-square inner product. Correspondence analysis is equivalent to the eigendecomposition of the triplet (X, Q, D) with

    X = D_r^{-1} F D_c^{-1} - 1_m 1_p^t,  Q = D_c,  D = D_r,

where D_c = diag(c) and D_r = diag(r).
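The identity between the triplet's inertia and X^2/n can be verified directly; here is a sketch with an invented contingency table:

```python
import numpy as np

# a small made-up contingency table
N = np.array([[20,  5,  3],
              [ 8, 15,  4],
              [ 2,  6, 12]], dtype=float)
n = N.sum()
F = N / n
r, c = F.sum(axis=1), F.sum(axis=0)
Dr, Dc = np.diag(r), np.diag(c)

E = n * np.outer(r, c)                        # expected counts under independence
chi2 = ((N - E) ** 2 / E).sum()               # the X^2 statistic

X = np.diag(1/r) @ F @ np.diag(1/c) - 1.0     # the CA triplet matrix
inertia = np.trace(X.T @ Dr @ X @ Dc)         # Tr(VQ) with Q = Dc, D = Dr

print(np.isclose(inertia, chi2 / n))
```

So the total inertia decomposed by CA is exactly the chi-square statistic divided by the sample size.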
Notes:
1. The row and column profiles are centered:

    sum_j ( f_ij / (f_i. f_.j) - 1 ) f_.j = 0,    sum_i ( f_ij / (f_i. f_.j) - 1 ) f_i. = 0.

2. The recentered matrix D_r^{-1} F D_c^{-1} - 1_m 1_p^t has a generalized singular value decomposition

    D_r^{-1} F D_c^{-1} - 1_m 1_p^t = U S V^t,  with U^t D_r U = I_m, V^t D_c V = I_p,

having total inertia

    Tr( D_r (D_r^{-1} F D_c^{-1} - 1_m 1_p^t) D_c (D_r^{-1} F D_c^{-1} - 1_m 1_p^t)^t ) = X^2 / n.

3. Equivalently, one can consider the matrix D_r^{-1} F D_c^{-1} and take the principal components with regards to the weights D_r for the rows and D_c for the columns; this is the PCA of the row profiles D_r^{-1} F, taken with the weights D_r and the metric Q = D_c^{-1}.
4. This method has been rediscovered many times, most recently by Jon Kleinberg in his method for analysing hubs and authorities [19]; see Fouss, Saerens and Renders, 2004 [11] for a detailed comparison.

In statistics, the most commonplace use of correspondence analysis is in ordination or seriation, that is, the search for a hidden gradient in contingency tables. As an example we take data analysed by Cox and Brandwood [4] and [6], who wanted to seriate Plato's works using the proportion of sentence endings in a given book with a given stress pattern; for a detailed analysis see Charnomordic and Holmes [2]. We propose the use of correspondence analysis on the table of frequencies of sentence endings. The first 10 profiles (as percentages) look as follows:

    [Table: the first 10 of the 32 sentence-ending stress profiles (rows labelled by stress patterns such as UUUUU, UUUU., ...), as percentages, for the works Rep, Laws, Crit, Phil, Pol, Soph and Tim.]

The eigenvalue decomposition of the chi-square distance matrix (the scree plot, see [2]) shows that two axes out of a possible 6 (the matrix is of rank 6) provide a summary of 85% of the departure from independence; this suggests that a planar representation will provide a good visual summary of the data.
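The generalized singular value decomposition underlying correspondence analysis takes only a few lines of numpy. A sketch on a randomly generated table, purely illustrative (dudi.coa in ade4 is the production route):

```python
import numpy as np

rng = np.random.default_rng(4)
N = rng.integers(1, 20, size=(6, 5)).astype(float)  # made-up contingency table
F = N / N.sum()
r, c = F.sum(axis=1), F.sum(axis=0)
Dr, Dc = np.diag(r), np.diag(c)

X = np.diag(1/r) @ F @ np.diag(1/c) - 1.0
# generalized SVD: X = U S V^t with U^t Dr U = I, V^t Dc V = I,
# obtained from the ordinary SVD of sqrt(Dr) X sqrt(Dc)
U0, s, V0t = np.linalg.svd(np.diag(np.sqrt(r)) @ X @ np.diag(np.sqrt(c)))
U = np.diag(1/np.sqrt(r)) @ U0
V = np.diag(1/np.sqrt(c)) @ V0t.T

k = min(X.shape)
# principal coordinates of the rows in the first plane: U[:, :2] * s[:2]
print(np.allclose(U[:, :k] @ np.diag(s) @ V.T, X),
      np.allclose(U.T @ Dr @ U, np.eye(len(r))),
      np.allclose(V.T @ Dc @ V, np.eye(len(c))))
```

For a table with a strong hidden gradient, plotting the first two row coordinates typically displays the arch-shaped seriation discussed below.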
    Axis   Eigenvalue   Inertia %   Cumulative %
     1      0.09170       68.96        68.96
     2      0.02120       15.94        84.90
     3      0.00911        6.86        91.76
     4      0.00603        4.53        96.29
     5      0.00276        2.07        98.36
     6      0.00217        1.64       100.00

    [Figure 1: Correspondence analysis of Plato's works. The planar representation uses Axis 1 (69% of the inertia) and Axis 2 (16%), with the works Rep, Laws, Crit, Phil, Pol, Soph and Tim plotted as points.]

We can see from the plot that there is a seriation which, as in most cases, follows a parabola or arch [16], from Laws at one extreme, being the latest work, to Republica, being the earliest among those studied.

3. From Discriminant Analysis to Networks

Consider a graph with vertices the members of a group, and edges if two members interact. In the extreme case of discriminant analysis, the graph is supposed to connect all the points of a group in a complete graph, and to be disconnected from the other observations. Discriminant analysis is just the explanation of this particular graph by linear combinations of variables; what we propose here is to extend this to more general graphs in a similar way. We suppose each vertex comes with an observation vector x_i, that all the observations are the nodes of the graph, and that each has the same weight 1/n.

3.1. Decomposing the Variance into Local and Global Components

Lebart was a pioneer in adapting eigenvector decompositions to cater for spatial structure in the data (see [20, 21, 22]). The basic decomposition of the variance is written

    cov(x_j, x_k) = (1/n) sum_{i=1}^n (x_ij - xbar_j)(x_ik - xbar_k)
                  = (1/(2 n^2)) sum_{i=1}^n sum_{i'=1}^n (x_ij - x_i'j)(x_ik - x_i'k),

    var(x_j) = (1/(2 n^2)) sum_i sum_{i'} (x_ij - x_i'j)^2.

Call the group means xbar_gj = (1/n_g) sum_{i in G_g} x_ij. Using sum_{i in G_g} (x_ij - xbar_gj) = 0, we obtain Huyghens' formula

    t_jk = w_jk + b_jk,  that is,  T = W + B,

where

    w_jk = (1/n) sum_{g=1}^q sum_{i in G_g} (x_ij - xbar_gj)(x_ik - xbar_gk),
    b_jk = sum_{g=1}^q (n_g / n) (xbar_gj - xbar_j)(xbar_gk - xbar_k).

As we showed above, linear discriminant analysis finds the linear combinations a such that a^t B a / a^t T a is maximized. This is equivalent to maximizing the quadratic form a^t B a subject to the constraint a^t T a = 1, and is solved by the solution of the eigenvalue problem

    B a = lambda T a,  or  T^{-1} B a = lambda a if T^{-1} exists;

then a^t B a = lambda a^t T a = lambda.

We are going to extend this to graphs by relaxing the group definition, partitioning the variation into local and global components:

    var(x_j) = (1/(2 n^2)) [ sum_{(i,i') in E} (x_ij - x_i'j)^2 + sum_{(i,i') not in E} (x_ij - x_i'j)^2 ].

Let M be the incidence matrix of the directed graph: m_ii' = 1 if i points to i'. The degree of vertex i is m_i = sum_{i'} m_ii'. We take the convention that there are no self-loops, and suppose for the time being that M is symmetric (the graph is undirected). Then another way of writing the variance formula is

    var(x_j) = (1/(2 n^2)) [ sum_i sum_{i'} m_ii' (x_ij - x_i'j)^2 + sum_{(i,i') not in E} (x_ij - x_i'j)^2 ],

and the local variance is defined as

    varloc(x_j) = (1/(2 m)) sum_i sum_{i'} m_ii' (x_ij - x_i'j)^2,  where m = sum_i sum_{i'} m_ii'.

The total variance is the variance of the complete graph; the proportionality can be accomplished by taking the average of the sum of squares to the average over the neighboring nodes ([23]).

Geary's ratio [14] is used to see whether the variable x_j can be considered independent of the graph structure. If the neighboring values of x_j seem correlated, then the local variance will only be an underestimate of the variance:

    G = c(x_j) = varloc(x_j) / var(x_j).

We can generalize the Geary index to account for irregular graphs coherently; in this case we weight each node by its degree. Call D the diagonal matrix with the total degrees of each node on the diagonal, D = diag(m_1, m_2, ..., m_n). Then we can write the Geary ratio for any n-vector x as

    c(x) = x^t (D - M) x / (x^t D x).

We can ask for the coordinate(s) that are the most correlated to the graph structure: if we want to minimize the Geary ratio, we choose x such that c(x) is minimal. This is equivalent to minimizing x^t (D - M) x under the constraint x^t D x = 1, which can be solved by finding the smallest eigenvalue mu, with eigenvector x, such that

    (D - M) x = mu D x,
    D^{-1} (D - M) x = mu x,
    (1 - mu) x = D^{-1} M x.

This is exactly the defining equation of the correspondence analysis of the matrix M. This can be extended to as many coordinates as we like; in particular, we can take the first 2 eigenvectors and provide the best planar representation of the graph in this way.

For all the variables taken together, j = 1, ..., p, note the local covariance matrix

    V = (1/(2m)) X^t (D - M) X.

If the graph is just made of disjoint groups of the same size, this is proportional to the within-class variance-covariance matrix W.

3.2. Regression of Graphs on Node Covariates

The covariables measured on the nodes can be essential to understanding the fine structure of graphs. We call X the n x p matrix of measurements at the vertices of the graph.
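The minimization just described is a small generalized eigenproblem. A toy sketch on an invented 5-node path graph (numpy only; for real data one would use a sparse eigensolver):

```python
import numpy as np

# undirected path graph 0-1-2-3-4, as a symmetric adjacency matrix
n = 5
M = np.zeros((n, n))
for i in range(n - 1):
    M[i, i + 1] = M[i + 1, i] = 1.0
D = np.diag(M.sum(axis=1))                 # degree matrix

# minimize x^t (D - M) x subject to x^t D x = 1:
# generalized eigenproblem (D - M) x = mu D x
mu, Xv = np.linalg.eig(np.linalg.inv(D) @ (D - M))
order = np.argsort(mu.real)
mu, Xv = mu.real[order], Xv.real[:, order]

# mu[0] ~ 0 with a constant eigenvector (the trivial solution);
# the eigenvector for mu[1] is the best graph-respecting coordinate
# among vectors D-orthogonal to the constant
print(mu)
```

The trivial constant solution achieves Geary ratio 0 and is discarded; the next eigenvector plays the role of the first correspondence-analysis axis of M.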
To be more precise, given a graph (V, E) with adjacency matrix M, define the Laplacian L = D^{-1}(M - I), where D = diag(d_1, d_2, ..., d_n) is the diagonal matrix of degrees. Using the eigenanalysis of the graph, we can summarize the graph with a few variables: the first few relevant eigenvectors of L. We can apply the PCAIV method defined in Section 2 to the eigenvectors of the graph defined above; these can then be regressed on the covariates using Principal Components with respect to Instrumental Variables [25], as defined above, to find the linear combinations of node covariates that explain the graph variables the best. This provides a method that uses the covariates in X to explain the graph. The covariates may be a combination of both categorical variables (gene families, GO classes) and continuous measurements (expression scores).

Appendix A: Resources

A.1. Reading

There are few references in English explaining the duality/operator point of view. [22] is one of the broader books on multivariate analyses, making connections between modern uses of eigendecomposition techniques, clustering and segmentation; although their approach is geometric, the authors do not delve into the duality diagram beyond explaining, on page 100, its use in the transition formulae between the eigenbases of the different spaces. This book is unique in its chapter on stability and validation of results (without going as far as speaking of inference).

The interested reader fluent in French could also consult any one of several Masters-level textbooks on the subject for many details and examples, apart from the already cited references of Escoufier [8, 10]. Brigitte Escofier and Jérôme Pagès [7] have a textbook with many examples. Cailliez and Pagès [1] is hard to find and, as was the case in the earlier literature, they use transposed matrices, but it was the first textbook completely based on the diagram approach. Frédérique Glaçon's PhD thesis [15] (in French) clearly lays out the duality principle before going on to explain its application to the conjoint analysis of several matrices.

A.2. Software

The methods described in this article are all available in the form of R packages, which I recommend. The most complete package is ade4 [3], which covers almost all the problems I mention except that of regressing graphs on covariates. However, a complete understanding of the duality diagram terminology and philosophy is necessary, as these provide the building blocks for all the functions, in the form of a class called dudi (this actually stands for duality diagram). One of the most important features in all the 'dudi.*' functions is that when the argument scannf is at its default value TRUE, the first step imposed on the user is the perusal of the scree plot of eigenvalues. This can be very important, as choosing to retain 2 values by default before consulting the eigenvalues can lead to the main mistake that can be made when using these techniques: the separation of two close eigenvalues. When two eigenvalues are close, the plane they span will be stable, but not each individual axis or principal component, resulting in erroneous results if, for instance, the 2nd and 3rd eigenvalues were very close and the user chose to take 2 axes [17]. Another useful addition also comes from the ecological community and is called vegan.

Here is a list of suggested functions from several packages:

• Principal Components Analysis (PCA) is available as prcomp and princomp in the standard package stats, as pca in vegan, and as dudi.pca in ade4.
• Correspondence Analysis (CA) is available as cca in vegan and as dudi.coa in ade4.
• Discriminant Analysis is available as lda in MASS and as discrimin in ade4.
• Two versions of PCAIV are available: one is called Redundancy Analysis (RDA) and is available as rda in vegan and as pcaiv in ade4.
• Canonical Correlation Analysis is available as cancor in stats (beware: cca in ade4 is Canonical Correspondence Analysis).
• STATIS (conjoint analysis of several tables) is available in ade4.

Acknowledgements

I would like to thank Elizabeth Purdom for discussions about multivariate analysis, Yves Escoufier for reading this paper and teaching me much about duality over the years, and Persi Diaconis for suggesting the Plato data in 1993 and for many conversations about the American way. This work was funded under NSF DMS award 0241246.

References

[1] F. Cailliez and J.-P. Pagès, Introduction à l'analyse des données, SMASH, Paris, 1976.
[2] B. Charnomordic and S. Holmes, Correspondence analysis for microarrays, Statistical Graphics and Computing Newsletter, 12 (2001).
[3] D. Chessel, A. B. Dufour and J. Thioulouse, The ade4 package - I: One-table methods, R News, 4 (2004), pp. 5-10.
[4] D. R. Cox and L. Brandwood, On a discriminatory problem connected with the works of Plato, J. Roy. Statist. Soc. Ser. B, 21 (1959), pp. 195-200.
[5] P. Diaconis and D. Freedman, Asymptotics of graphical projection pursuit, The Annals of Statistics, 12 (1984), pp. 793-815.
[6] P. Diaconis and J. Salzmann, Projection pursuit for discrete data, IMS, 2006, tba.
[7] B. Escofier and J. Pagès, Analyses factorielles simples et multiples : objectifs, méthodes et interprétation, Dunod, Paris, 1998.
[8] Y. Escoufier, Operators related to a data matrix, in Recent Developments in Statistics (J. R. Barra et al., eds.), North Holland, 1977, pp. 125-131.
[9] ———, Cours d'analyse des données, Cours Polycopié 7901, IUT, Montpellier, 1979.
[10] ———, The duality diagram: a means of better practical applications, in Developments in Numerical Ecology (P. Legendre and L. Legendre, eds.), 1987, pp. 139-156.
[11] F. Fouss, M. Saerens and J.-M. Renders, Some relationships between Kleinberg's hubs and authorities, correspondence analysis, and the SALSA algorithm, in JADT 2004, International Conference on the Statistical Analysis of Textual Data, Louvain-la-Neuve, 2004, pp. 445-455.
[12] D. Freedman and S. Peters, Bootstrapping a regression equation: Some empirical results, Journal of the American Statistical Association, 79 (1984), pp. 97-106.
[13] ———, Bootstrapping an econometric model: Some empirical results, Journal of Business & Economic Statistics, 2 (1984), pp. 150-158.
[14] R. C. Geary, The contiguity ratio and statistical mapping, The Incorporated Statistician, 5 (1954), pp. 115-145.
[15] F. Glaçon, Analyse conjointe de plusieurs matrices de données. Comparaison de différentes méthodes, PhD thesis, Grenoble, 1981.
[16] M. O. Hill and H. G. Gauch, Detrended correspondence analysis, an improved ordination technique, Vegetatio, 42 (1980), pp. 47-58.
[17] S. Holmes, Outils informatiques pour l'évaluation de la pertinence d'un résultat en analyse des données, PhD thesis, USTL, Montpellier, 1985.
[18] H. Hotelling, Relations between two sets of variates, Biometrika, 28 (1936), pp. 321-327.
[19] J. Kleinberg, Hubs, authorities, and communities, ACM Comput. Surv., 31 (1999), p. 5.
[20] L. Lebart and A. Morineau, Traitement des données statistiques, Dunod, Paris, 1979.
[21] L. Lebart, A. Morineau and K. Warwick, Multivariate Descriptive Statistical Analysis, Wiley, 1984.
[22] L. Lebart, A. Morineau and M. Piron, Statistique exploratoire multidimensionnelle, Dunod, Paris, 2000.
[23] A. Mom, Méthodologie statistique de la classification des réseaux de transports, PhD thesis, USTL, Montpellier, 1988.
[24] E. Purdom, Comparative Multivariate Methods, PhD thesis, Stanford University, 2006.
[25] C. R. Rao, The use and interpretation of principal component analysis in applied research, Sankhyā A, 26 (1964), pp. 329-358.

Sequoia Hall, Stanford, CA 94305
Email: susan@stat.stanford.edu