You are on page 1of 27

# Overview

Principal component analysis
Herve´ Abdi1∗ and Lynne J. Williams2
Principal component analysis (PCA) is a multivariate technique that analyzes a data
table in which observations are described by several inter-correlated quantitative
dependent variables. Its goal is to extract the important information from the table,
to represent it as a set of new orthogonal variables called principal components, and
to display the pattern of similarity of the observations and of the variables as points
in maps. The quality of the PCA model can be evaluated using cross-validation
techniques such as the bootstrap and the jackknife. PCA can be generalized
as correspondence analysis (CA) in order to handle qualitative variables and as
multiple factor analysis (MFA) in order to handle heterogeneous sets of variables.
Mathematically, PCA depends upon the eigen-decomposition of positive semidefinite matrices and upon the singular value decomposition (SVD) of rectangular
matrices.  2010 John Wiley & Sons, Inc. WIREs Comp Stat 2010 2 433–459

P

rincipal component analysis (PCA) is probably the
most popular multivariate statistical technique
and it is used by almost all scientific disciplines. It
is also likely to be the oldest multivariate technique.
In fact, its origin can be traced back to Pearson1 or
even Cauchy2 [see Ref 3, p. 416], or Jordan4 and also
Cayley, Silverster, and Hamilton, [see Refs 5,6, for
more details] but its modern instantiation was formalized by Hotelling7 who also coined the term principal
component. PCA analyzes a data table representing
observations described by several dependent variables, which are, in general, inter-correlated. Its goal
is to extract the important information from the data
table and to express this information as a set of new
orthogonal variables called principal components.
PCA also represents the pattern of similarity of the
observations and the variables by displaying them as
points in maps [see Refs 8–10 for more details].

PREREQUISITE NOTIONS AND
NOTATIONS
Matrices are denoted in upper case bold, vectors are
denoted in lower case bold, and elements are denoted
in lower case italic. Matrices, vectors, and elements

∗ Correspondence

to: herve@utdallas.edu

from the same matrix all use the same letter (e.g.,
A, a, a). The transpose operation is denoted by the
superscriptT . The identity matrix is denoted I.
The data table to be analyzed by PCA comprises
I observations described by J variables and it is
represented by the I × J matrix X, whose generic
element !is x"i,j . The matrix X has rank L where
L ≤ min I, J .
In general, the data table will be preprocessed
before the analysis. Almost always, the columns of X
will be centered so that the mean of each column
is equal to 0 (i.e., XT 1 = 0, where 0 is a J by
1 vector of zeros and 1 is an I by 1 vector of
ones).
addition, each element of X is divided
√ If in √
by I (or I − 1), the analysis is referred to as
a covariance PCA because, in this case, the matrix
XT X is a covariance matrix. In addition to centering,
when the variables are measured with different units,
it is customary to standardize each variable to unit
norm. This is obtained by dividing each variable by
its norm (i.e., the square root of the sum of all the
squared elements of this variable). In this case, the
analysis is referred to as a correlation PCA because,
then, the matrix XT X is a correlation matrix (most
statistical packages use correlation preprocessing as a
default).
The matrix X has the following singular value
decomposition [SVD, see Refs 11–13 and Appendix B
for an introduction to the SVD]:

1 School

of Behavioral and Brain Sciences, The University of Texas
at Dallas, MS: GR4.1, Richardson, TX 75080-3021, USA

2 Department

of Psychology, University of Toronto Scarborough,
DOI: 10.1002/wics.101

Volume 2, July /August 2010

X = P!QT

(1)

where P is the I × L matrix of left singular vectors,
Q is the J × L matrix of right singular vectors, and !

 2010 John Wiley & Sons, Inc.

433

Overview

www.wiley.com/wires/compstats

is the diagonal matrix of singular values. Note that
!2 is equal to " which is the diagonal matrix of the
(nonzero) eigenvalues of XT X and XXT .
The inertia of a column is defined as the sum of
the squared elements of this column and is computed
as
γ 2j =

I
#

x2i,j .

(2)

i

The sum of all the γ 2j is denoted I and it is called
the inertia of the data table or the total inertia. Note
that the total inertia is also equal to the sum of
the squared singular values of the data table (see
Appendix B).
The center of gravity of the rows [also called
centroid or barycenter, see Ref 14], denoted g, is the
vector of the means of each column of X. When X is
centered, its center of gravity is equal to the 1 × J row
vector 0T .
The (Euclidean) distance of the i-th observation
to g is equal to
2
=
di,g

J
#
\$
%2
xi,j − gj .

(3)

j

When the data are centered Eq. 3 reduces to
2
=
di,g

J
#

x2i,j .

Finding the Components
In PCA, the components are obtained from the SVD
of the data table X. Specifically, with X = P!QT
(cf. Eq. 1), the I × L matrix of factor scores, denoted
F, is obtained as:
F = P!.

j

GOALS OF PCA
The goals of PCA are to
(1) extract the most important information from the
data table;
(2) compress the size of the data set by keeping only
this important information;
(3) simplify the description of the data set; and
(4) analyze the structure of the observations and the
variables.

In order to achieve these goals, PCA computes
new variables called principal components which
are obtained as linear combinations of the original
variables. The first principal component is required

(5)

The matrix Q gives the coefficients of the linear
combinations used to compute the factors scores.
This matrix can also be interpreted as a projection
matrix because multiplying X by Q gives the values
of the projections of the observations on the principal
components. This can be shown by combining Eqs. 1
and 5 as:
F = P! = P!QT Q = XQ.

(4)

2
Note that the sum of all di,g
is equal to I which is the
inertia of the data table .

434

to have the largest possible variance (i.e., inertia and
therefore this component will ‘explain’ or ‘extract’
the largest part of the inertia of the data table).
The second component is computed under the
constraint of being orthogonal to the first component
and to have the largest possible inertia. The other
components are computed likewise (see Appendix A
for proof). The values of these new variables for
the observations are called factor scores, and these
factors scores can be interpreted geometrically as the
projections of the observations onto the principal
components.

(6)

The components can also be represented
geometrically by the rotation of the original axes.
For example, if X represents two variables, the length
of a word (Y) and the number of lines of its dictionary
definition (W), such as the data shown in Table 1, then
PCA represents these data by two orthogonal factors.
The geometric representation of PCA is shown in
Figure 1. In this figure, we see that the factor scores
give the length (i.e., distance to the origin) of the
projections of the observations on the components.
This procedure is further illustrated in Figure 2. In
this context, the matrix Q is interpreted as a matrix
of direction cosines (because Q is orthonormal). The
context, the matrix X can be interpreted as the
matrix as:
X = FQT

with FT F = !2 and QT Q = I.

(7)

This decomposition is often called the bilinear
decomposition of X [see, e.g., Ref 15].

 2010 Jo h n Wiley & So n s, In c.

Vo lu me 2, Ju ly /Au gu s t 2010

The contributions and the squared cosines are multiplied by 100 for ease of reading. and Squared Cosines of the Observations for the Example Length of Words (Y) and Number of Lines (W) TABLE 1 Raw Scores.00 0. Infectious & 4 13 10 Arise Pretentious For Neither 9 8 4 With 12 5 9 Scoundrel 4 11 5 Slope 5 8 6 Relief This 4 9 Monastery Solid 9 2 By 2 9 6 Insane 4 11 2 On 7 7 6 11 14 3 Across W Bag Y Gravity.70 4.32 1.92 0.76 −0. and the negative important contributions are represented in bold.89 0.30 3 1 3 8 9 4 31.99 2.29 3.99 0.71 44.77 0.54 0.14 3.84 6.29 4.52 −7.Volume 2. Squared Coordinates on the Components.71 8 12 30. July /August 2010 8 1 3 9 Therefore Generality  2010 John Wiley & Sons. Contributions of the Observations to the Components.30 2.71 11 8 1.69 0.06 −5.15 17.29 3.23 −2.00 −4.84 4.53 −1.05 0.48 F22 392 22. 15 6 160 5 4 10 120 Blot 1 0 4 −2 −1 4 3 −3 −2 −1 5 −2 3 −1 0 3 0 −4 −4 0 −3 y 0 −2 7 5 −4 −7 0 1 4 −4 −6 0 −3 3 0 −4 1 1 3 −1 6 w 0 −3.07 −4.23 −1.61 3 7 8 6 56.95 0 0.84 1.59 0.60 1. Coordinates.61 1.51 1.07 0.68 −0.52 F12 2 6 6 0 1 ctr2 × 100 0 11 ctr1 × 100 74 92 87 95 97 29 74 90 90 85 29 95 94 0 99 53 71 88 71 99 cos21 × 100 1 26 8 13 5 3 71 26 10 10 15 71 5 6 0 1 47 29 12 29 cos22 × 100 MW = 8.38 −2. Inc.07 1.54 −1.52 1.07 −2.69 F2 1 9.67 F1 0 2.84 0.51 6.29 0.59 12 3 1 14 14.65 2.71 1 15 21.30 2.85 0 6 0 8.85 0.49 444 I λ2 λ1 20 53 26 32 58 9 5 17 41 37 4 18 10 0 25 17 1 25 1 45 d2 52 5.98 4.71 100 10 100 4 48.92 3. Deviations from the Mean.15 8. MY = 6.00 0.49 3. The following abbreviations are used to label the columns: w = (W − MW ).38 −1. The positive important contributions are italicized .85 0.29 5.41 1.68 15.39 1.15 5 2 1 0 5 3 0 24.15 1.41 2. Squared Distances to the Center of WIREs Computational Statistics Principal component analysis 435 . y = (Y − MY ).91 −6.76 −5.83 6.11 0.35 36.

Note that the final graph could have been obtained directly by plotting the observations from the coordinates given in Table 1. the same preprocessing should be applied to the supplementary observations prior to the computation of their factor scores. It has Ysur = 3 letters. Because the principal components and the original variables are in the same space.com/wires/compstats Projecting New Observations onto the Components (a) 9 8 5 wsur = Ysur − MY = 3 − 6 = −3 3 Blot Solid This On For 2 Arise Bag By 1 2 1 3 4 5 6 7 9 10 11 12 13 14 15 8 Number of lines of the definition Pretentious 5 Generality 4 Infectious 3 Scoundrel 2 Monastery and = Wsur − MW = 12 − 8 = 4. the coordinates of the projections on the components can be directly computed 436 Insane Slope Relief With 4 (8) If the data table has been preprocessed (e. Specifically a 1 × J row vector xTsup .e. Then we plot the supplementary word in the graph that we have already used for the active analysis. These observations are called supplementary or illustrative observations. In c. Add a second component orthogonal to the first such that the sum of the squared distances is minimum. denoted fsup which is computed as: Pretentious Infectious −3 Therefore Neither On Bag 1 By 3 Scoundrel 1 Monastery −7 −6 −5 −4 −3 −2 −1 Across 7 Blot Solid For 2 Generality 6 This −4 (c) 5 Arise −1 2 Solid Insane Slope 1 2 3 4 Relief With −2 −3 For This Arise 5 Blot Bag 6 7 1 On By FIGURE 1 | The geometric steps for finding the components of a principal component analysis. the projections of the supplementary observation give its coordinates (i. So. but we would like to know how it relates to the English vocabulary. This is shown in Figure 3.g. we decided to treat this word as a supplementary observation. To find the components (1) center the variables then plot them against each other. Ju ly /Au gu s t 2010 . Equivalently. As an illustration. This matrix can also be used to compute factor scores for observations that were not included in the PCA. gives the 1 × L vector of factor scores. suppose that—in addition to the data presented in Table 1—we have the French word ‘sur’ (it means ‘on’).Overview www. By contrast.. 6. and our French dictionary reports that its definition has Wsur = 12 lines. Scoundrel Therefore 7 (b) T fsup Pretentious Generality Infectious Monastery 11 10 Number of letters of the word Equation 6 shows that matrix Q is a projection matrix which transforms the original data matrix into factor scores. can be projected into the PCA space using Eq. rotate the figure in order to position the first component horizontally (and the second component vertically). This T . the values of this observation are transformed into deviations from the English center of gravity. The factor scores for supplementary observations are obtained by first positioning these observations into the PCA space and then projecting them onto the principal components. centered or normalized). We find the following values: ysur Across Neither 6 Therefore Neither 1 −7 −6 −5 −4 −3 −2 −1 Across Insane 1 2 3 4 Relief Slope −1 With −2 2 = xTsup Q. (2) Find the main direction (called the first component) of the cloud of points such that we have the minimum of the sum of the squared distances from the points to the component. (3) When the components have been found.  2010 Jo h n Wiley & So n s. Vo lu me 2. the observations actually used to compute the PCA are called active observations. then erase the original axes..wiley. The first step is to preprocess this supplementary observation in a identical manner to the active observations. Because the data matrix was centered. we do not want to include it in the analysis. factor scores) on the components. Because sur is not an English word.

obtained as: i Volume 2.60 and −2.60 Scoundrel Therefore 2 2 Pretentious Generality On second component 5 Generality (a) Projection of “neither” On first component FIGURE 3 | How to find the coordinates (i. A useful heuristic is to base the interpretation of a component on the observations whose contribution is larger than the average contribution (i. the importance of an observation for a component can be obtained by the ratio of the squared factor score of this observation by the eigenvalue associated with that component. In our example (see Table 2) the inertia of the first component is equal to 392 and this corresponds to 83% of the total inertia. The projections (or coordinates) of the word ‘neither’ on the first and the second components are equal to −5.# fi.e. (9) INTERPRETING PCA Inertia explained by a component The importance of a component is reflected by its inertia or by the proportion of the total inertia ‘‘explained’’ by this factor. Inc. Therefore. The factor scores of the supplementary observations are not used to compute the eigenvalues and therefore their contributions are generally not computed. from Eq.# = # = 2 λ# fi. Formally. for a given component.38.5369 ' ( = 4. and (b) The projections of the sur on the principal components give its coordinates. the sum of the contributions of all observations is equal to 1. Squared Cosine of a Component with an Observation The squared cosine shows the importance of a component for a given observation.# 7 1 2 3 4 5 6 Insane Arise Relief Slope 1 By Generality ctri. the more the observation contributes to the component.38 4 Infectious Monastery 3 4.3835 .e. The larger the value of the contribution. 437 .5369 0.38 −3 7 2 −2 With Solid This For On −2 4 Infectious 3 Scoundrel 2 Neither 1 −7 −6 −5 −4 −3 −2 −1 1 2 3 4 5 6 Insane Arise Relief Slope Across −1 Bag −4 (b) components. denoted ctri.# .8437 0. 8 (see also Table 3 for the values of Q) as: ) * ' ( −0.. The value of a contribution is between 0 and 1 and. where λ# is the eigenvalue of the #-th component. with the first and second 2 2 fi. observations whose contribution is larger than 1/I). July /August 2010 With Solid This For On Sur By Blot Bag 1 Pretentious (10) 5 −0.99 Scoundrel Therefore 2 Neither 1 −7 −6 −5 −4 −3 −2 −1 1 2 3 4 5 6 7 Insane Arise Across ReliefSlope −1 With Blot Solid −2 This −3 For On Bag 1 Sur −4 By 2 FIGURE 2 | Plot of the centered data. Contribution of an Observation to a Component Recall that the eigenvalue associated to a component is equal to the sum of the squared factor scores for this component. The squared  2010 John Wiley & Sons.. factor scores) on the principal components of a supplementary observation: (a) the French word sur is plotted in the space of the active observations from its deviations to the W and Y variables.8437 T = xTsup Q = −3 4 × fsup 0. the contribution of observation i to component # is. The observations with high contributions and different signs can then be opposed to help interpret the component because these observations represent the two endpoints of this component.WIREs Computational Statistics Principal component analysis Pretentious 1 Neither −7 −6 −5 −4 −3 −2 −1 Across −1 −4 −2 −3 Blot . This ratio is called the contribution of the observation to the component.# 5 Monastery Therefore 4 Infectious Monastery 3 −5 .9853 − 0.

In this case. and Q with only their first M components. the goal is to estimate the value of new observations from this population. we evaluate the similarity 440 In most applications. Therefore. The smaller the value of RESS. Several coefficients can be used between X and X for this task [see. +[M] is built back using Eq. = trace ET E PC1 =I− (b) λ# (15) #=1 where & & is the norm of X (i.Overview www. In c.e.. incidentally. we cannot use standard parametric procedures. A popular cross-validation technique is the jackknife (aka ‘leave one out’ procedure). a matrix denoted X. Note that the supplementary variables are not positioned on the unit circle. Using this procedure. !. better estimation of X the matrix X is always perfectly reconstituted with L components (recall that L is the rank of X). a larger M gives a +[M] . sup = xsup Q Frequency (16) Random Effect Model FIGURE 4 | Circle of correlations and plot of the loadings of (a) the variables with principal components 1 and 2. The predicted observations are then stored in . 12 can be adapted to compute the estimation of the supplementary observations as PC 2 Length (number of letters) M # Number of lines of the definition PC1 # Entries [M] [M]T + x[M] Q .19 The most popular coefficient. that Eq. and (b) the variables and supplementary variables with principal components 1 and 2. the set of observations represents a sample from a larger population. 16) the left-out observation which constitutes the testing set. this can also be done with a squared coefficient of correlation or (better) with the RV  2010 Jo h n Wiley & So n s. X−X To evaluate the quality of the reconstitution of X with M components. The overall quality of the PCA random effect model using M components is evaluated as the . the performance of the PCA model is evaluated using computer-based resampling techniques such as the bootstrap and cross-validation techniques where the data are separated into a learning and a testing set. and Q[M] represent. As with the fixed similarity between X and X effect model. For a fixed effect model.[M] . e. 7 can be rewritten in the current context as: +[M] + E = F[M] Q[M]T + E X=X (14) (where E is the error matrix. The learning set is then used to estimate (using Eq. 12 Then. This corresponds to a random effect model. In order to estimate the generalization capacity of the PCA model. It is computed as: PC 2 Length (number of letters) Number of lines of the definition +[M] &2 RESSM = &X − X . is the residual sum of squares (RESS). The squared coefficient of correlation is sometimes used. In addition. Note. the square root of the sum of all the squared elements of X). Vo lu me 2.g. In the jackknife.wiley. For a fixed effect model. as well as the RV coefficient. and where the trace of a matrix is the sum of its diagonal elements. the better the PCA model. the matrix X keeping only the first M components: +[M] = P[M] ![M] Q[M]T X = F[M] Q[M]T = XQ[M] Q[M]T (13) where P[M] . which is equal to +[M] ). Refs 16–18]. each observation is estimated according to a random effect model.18.com/wires/compstats (a) +[M] . ![M] . Eq.20–22 each observation is dropped from the set in turn and the remaining observations constitute the learning set. respectively the matrices P. however. Ju ly /Au gu s t 2010 ..

In contrast to Q2# . This procedure.Refs 9. but there are some guidelines [see. A more elaborated approach [see e.’’ see Refs 8. the information in the learning set is not useful to fit the testing set). somewhat subjective.[M] &2 PRESSM = &X − X (17) The smaller the PRESS the better the quality of the estimation for a random model. How Many Components? Often. It is computed as: . In this case. this rule boils down to the standard advice to ‘keep only the eigenvalues larger than 1’ [see. df# is the number of degrees of freedom for the #-th component equal to L λ# > 1# 1 λ# = I L L (18) df# = I + J − 2#. only the important information needs to be extracted from a data matrix.24 and Table 2] and to see if there is a point in this graph (often called an ‘elbow’) such that the slope of the graph goes from ‘steep’ to ‘‘flat’’ and to keep only the components which are before the elbow. Refs 27–31] begins by computing.WIREs Computational Statistics Principal component analysis coefficient.0975) are kept [an alternative set of critical values sets the threshold to 0. This problem is still open. Ref 25]. Inc. # = J(I − 1) − # # k=1 (I + J − 2k) = J(I − 1) − #(I + J − # − 1). The value of dfresidual. the index W# .e.05 when I ≤ 100 and to 0 when I > 100.8. Another approach—based on cross-validation— to decide upon the number of components to keep uses the index W# derived from Refs 32 and 33. when using a random model.g. which depends on RESS and PRESS. Similar to RESS. e. July /August 2010 (21) and dfresidual. PRESS# df# (20) where PRESS0 is the inertia of the data table. 441 . # (where L is the rank of X). This is particularly the case when the number of variables is larger than the number of observations (a configuration known as the ‘small N large P’ problem in the literature). this amount to keeping the #-th component if the optimal number of components to keep when the goal is to generalize the conclusions of an analysis to new data. When the quality of the prediction decreases as the number of components increases this is an indication that the model is overfitting the data (i. the problem is to figure out how many components need to be considered.g..952 = 0. For a correlation PCA. the value of both Q2L and WL is meaningless because they both involve a division by zero. see Ref 28]. quality typically increases and then decreases. However. the matrix X is not always perfectly reconstituted with all L components...  2010 John Wiley & Sons.. this procedure can lead to ignoring important information [see Ref 26 for an example of this problem]. A simple approach stops adding components when PRESS decreases. In fact. # is obtained as: dfresidual. it is important to determine Volume 2. # PRESS#−1 − PRESS# × . for each component #. Another standard tradition is to keep only the components whose eigenvalue is larger than the average. When J is smaller than I. the quality of the prediction does not always increase with the number of components of the model.23]. Contrary to what happens with the fixed effect model. one can use the predicted residual sum of squares (PRESS). e. A first procedure is to plot the eigenvalues according to their size [the so called ‘‘scree. It is computed for the #-th component as W# = dfresidual. is called the scree or elbow test. but W# can give a more conservative estimate of the number of components to keep than Q2# . a quantity denoted Q2# is defined as: Q2# = 1 − PRESS# RESS#−1 (19) with PRESS# (RESS# ) being the value of PRESS (RESS) for the #-th component (where RESS0 is equal to the total inertia). depends only upon PRESS. Random Model As mentioned earlier. # is the residual number of degrees of freedom which is equal to the total number of degrees of freedom of the table [equal to J(I − 1)] minus the number of degrees of freedom used by the previous components. Formally. when the number of variables exceeds the number of observations. Q2# and W# will agree on the number of components to keep.g. Therefore. Only the components with Q2# greater or equal to an arbitrary critical value (usually 1 − 0. (22) Most of the time.

and how well it goes with meat (i. it appears that the first component represents characteristics that are inversely correlated with a wine’s price while the second component represents the wine’s sweetness. F1 F2 ctr1 ctr2 cos21 cos22 −1. A PCA of this table extracts four factors (with eigenvalues of 4.61 23 21 69 24 0.76.07.89 17 41 50 46 Wine 5 1. 0. The second component contrasts the wine’s hedonic qualities. July /August 2010 We can see from Figure 5 that the first component separates Wines 1 and 2 from Wines 4 and 5.08 0. The data [from Refs 40.35. factors (whose eigenvalues are clearly larger than one) and the noise (represented by factors with eigenvalues clearly smaller than one). The cosines show that Component 1 contributes highly to Wines 1 and 5. Volume 2. we examine the loadings of the variables on the first two components (see Table 8) and the circle of correlations (see Figure 6 and Table 9). and the negative important contributions are represented in bold. we can apply a varimax rotation. From this. its sugar and alcohol content. 1. To strengthen the interpretation. and how much the wine goes with meat. λ2 = 1. TABLE 7 PCA Wine Characteristics Factor scores. For convenience. Each wine is also described by its price. The examination of the values of the contributions and cosines. while the second component separates Wines 2 and 5 from Wines 1 and 4. Correlation PCA Wine 2 PC2 FIGURE 5 | PCA wine characteristics. while Component 2 contributes most to Wine 4. respectively).19 0 2 7 34 Wine 4 0. these two components account for 94% of the inertia. acidity.81.81. The factor scores for the first two components are given in Table 7 and the corresponding map is displayed in Figure 5.44] are given in Table 6. Suppose that we have five wines described by the average ratings of a set of experts on their hedonic dimension. how much the wine goes with dessert. together. Inc. Factor scores of the EXAMPLES Wine 1 Wine 2 0.76.86 32 20 78 19 Wine 3 −1.17 −0. its acidity.e. However. λ1 = 4. shown in Table 7.23 −0. then the rotation is likely to provide a solution that is more reliable than the original solution. and squared cosines of the observations on principal components 1 and 2. the wine tasters preferred inexpensive wines). To find the variables that account for these differences. then rotation will make the solution less replicable and potentially harder to interpret because the mathematical properties of PCA have been lost. which gives a clockwise rotation  2010 John Wiley & Sons. and its acidity. 443 .. its amount of alcohol. Only two components have an eigenvalue larger than 1 and. squared cosines and contributions have been multiplied by 100 and rounded.04 Wine 5 Wine 3 PC1 Wine 1 Wine 4 observations plotted on the first two components. τ 2 = 26%. if this model does not accurately represent the data. and alcohol content with its sugar content and how well it goes with dessert.WIREs Computational Statistics Principal component analysis TABLE 6 An (Artificial) Example of PCA using a Centered and Normalized Matrix Hedonic For Meat For Dessert Price Sugar Alcohol Acidity Wine 1 14 7 8 7 7 13 7 Wine 2 10 7 6 4 3 14 7 Wine 3 8 5 5 10 5 12 5 Wine 4 2 4 7 16 7 11 3 Wine 5 6 2 4 13 3 10 3 Five wines are described by seven variables [data from Ref 44].61 The positive important contributions are italicized. complements and refines this interpretation because the contributions suggest that Component 1 essentially contrasts Wines 1 and 2 with Wine 5 and that Component 2 essentially contrasts Wines 2 and 5 with Wine 4. and 0. τ 1 = 68%.55 29 17 77 17 0. From these. we see that the first component contrasts price with the wine’s hedonic qualities. contributions of the observations to the components.

and squared cosines of the observations on principal components 1 and 2 F1 635.20 −0.44 11 Blue collar −588.32 11 Blue collar 264.86 0 1 40 22 Upper class 4 Children −206. 445 .33 8 7 86 7 −139.13 −0.41 −0.41 −0.94 1. squared cosines and contributions have been multiplied by 100 and rounded.75 0 7 26 40 12.56 Upper class 2 Children Blue collar 3 Children −112.44 −0.02 −0. and the negative important contributions are represented in bold.17 8 0 98 0 Upper class 3 Children 12 89 9 4 Children −188. July /August 2010  2010 John Wiley & Sons. Q matrix).76 15 86 11 5 Children −571. For convenience. Amount of Francs Spent (per month) by Food Type.WIREs Computational Statistics Principal component analysis TABLE 10 PCA Wine Characteristics: Loadings (i.54 4 7 83 15 White collar 4 Children 57.05 9 0 100 0 Blue collar 2 Children White collar 2 Children 488. and Number of Children. Volume 2.92 3 19 57 36 Upper class 5 Children −296.04 97.46 PC 1 PC 2 0.38 −992. Inc.47 0 24 2 79 White collar 5 Children 235.89 13 5 95 3 −142.95 39.11 0.05 F2 ctr1 ctr2 cos21 cos22 −120.83 The positive important contributions are italicized. of the Variables on the First Two Components For For Hedonic Meat Dessert Price Sugar Alcohol Acidity −0.51 42.03 White collar 3 Children 485.e.21 −0. Social Class.03 TABLE 11 Average Number of Francs Spent (per month) on Different Types of Food According To Social Class and Number of Children [dataset from Ref 45] Bread Vegetables 332 Type of Food Fruit Meat Poultry Milk Wine 354 247 427 Blue collar 2 Children 428 1437 526 White collar 2 Children 293 559 388 1527 567 239 258 Upper class 2 Children 372 767 562 1948 927 235 433 Blue collar 3 Children 406 563 341 1507 544 324 407 White collar 3 Children 386 608 396 1501 558 319 363 Upper class 3 Children 438 843 689 2345 1148 243 341 Blue collar 4 Children 534 660 367 1620 638 414 407 White collar 4 Children 460 699 484 1856 762 400 416 Upper class 4 Children 385 789 621 2366 1149 304 282 Blue collar 5 Children 655 776 423 1848 759 495 486 White collar 5 Children 584 995 548 2056 893 518 319 Upper class 5 Children 515 1097 887 2630 1167 561 284 447 732 505 1887 803 358 369 107 189 165 396 250 117 72 Mean Sˆ TABLE 12 PCA Example.01 333. After Varimax Rotation.05 −0.17 144.. Factor scores.71 −0.15 33 3 97 1 520. contributions of the observations to the components.63 −0.48 0.

In CA. .095. Bread PC2 Milk Vegetable Wine Computations PC1 The first step of the analysis is to compute the probability matrix Z = N−1 X. the elements of c). the W values of 1. These are obtained in a similar way as for standard PCA. τ 2 = 8%.e. the contribution of row i to component # and of column j to component # are obtained respectively as: ctri. Note that a negative Q2 value suggests that a component should not be kept.84.WIREs Computational Statistics Principal component analysis the rows and one for the columns.24.82 and 0.. T −1 . The Q2 values of 0. and therefore we have contributions and cosines for both sets. CA will provide two sets of factor scores: one for Volume 2.. c the vector of the columns totals. PT D−1 r P = Q Dc Q = I (24) The row and (respectively) column factor scores are obtained as: . The grand total of the table is denoted N. λ1 = 3. Inc.# = 2 ri fi. F = D−1 r P! and . and Dc = diag {c}. Correlations (and circle of correlations) of the variables with Components 1 and 2. We denote r the vector of the row totals of Z..# λ# and ctrj. These factor scores are. July /August 2010 / 0 . the rows and the columns of the table have a similar role.# = 2 cj gj. indicating that both components should be kept. and dc = diag GGT dr = diag FFT (27) As for PCA. with 1 being a conformable vector of 1s).. T with . Dr = diag {r}. In contrast.023.31 and 0. Therefore. the elements of r) and weights (i. (i. G = D−1 c Q! (25) In CA. we look at Q2 and W. To confirm the number of components to keep. contributions help locating the observations or variables important for a given component. The factor scores are obtained from the following generalized SVD (see Appendix ): Meat Fruit Poultry FIGURE 9 | PCA example: Amount of Francs spent (per month) on food type by social class and number of children.45 for the first two components suggest that only the first component should be kept because only W1 is greater than 1.# λ# (26) (with ri begin the ith element of r and cj being the j-th element of c). τ 1 = 88%. . in general..Q P! Z − rcT = . It can be interpreted as a particular case of generalized PCA for which we take into account masses (for the rows) and weights (for the columns). λ2 = 290.141. we keep the name component for both PCA and CA).e. the total inertia in CA is equal to the sum of the eigenvalues. As for standard PCA. but the computations of the contributions need to integrate the values of the masses (i. r = Z1. . The vector of the squared (χ 2 ) distance from the rows and columns to their respective barycenter are obtained as: .575. the inertia of the contingency table is proportional to the χ 2 which can be computed to test the independence of the rows and the columns of this table. here. these components are often called factors rather than components. Specifically. the factor scores in CA decompose this independence χ 2 into orthogonal components (in the CA tradition. SOME EXTENSIONS OF PCA Correspondence Analysis Correspondence analysis (CA. By contrast with PCA. see Refs 46–51) is an adaptation of PCA tailored to handle nominal variables. for coherence.e. CA analyzes a contingency table and provides factor scores for both the rows and the columns of the contingency table.37 for Components 1 and 2 both exceed the critical value of 0. Notations The I × J contingency table to be analyzed is denoted X. the total inertia can also be computed equivalently as the weighted sum of the squared distances of the rows or  2010 John Wiley & Sons. scaled such that their inertia is equal to their eigenvalue (some versions of CA compute row or column factor scores normalized to unity). . 447 .

722. We can see from Figure 10 that the first component separates Proust and Zola’s pattern of punctuation from the pattern of punctuation of the other four authors.01 3.. Eigenvalues.717.88 3.23 0.25 PC 6 PC 7 & λ PRESS W −1. the variables (columns) in CA are interpreted identically to the rows.e.96 121.04 3. PRESS.27 27.992.249.08 0.00 0. The projection formula. colon.50 1.023.82 1.00 3.313. Their coordinates of the illustrative rows (denoted fsup ) and column (denoted gsup ) are obtained as: / 0−1 . Example For this example.0056.37 0. The data are shown in Table 15.26 0. and Zola contributing most to the component.com/wires/compstats TABLE 14 PCA Example: Amount of Francs Spent (per month) on Food Type by Social Class and Number of children.811.435.44 0. The factor scores of the observations (rows) and variables (columns) are shown in Tables 16 and the corresponding map is displayed in Figure 10.98 52.00 0.04 −0. interrogation mark. respectively). This shows that the second component is essential to understand Giraudoux’s pattern of punctuation (see Table 16). for Giraudoux the highest squared cosine (94%).434. The squared cosines show that the first component accounts for all of Zola’s pattern of punctuation (see Table 16).737.00 0.803.141.01 3. we use a contingency table that gives the number of punctuation marks used by the French writers Rousseau. The second component separates Giraudoux’s pattern of punctuation from that of the other authors. but the CA formula needs to take into account masses and weights.05 0. A CA of the punctuation table extracts two components which together account for 100% of the inertia (with eigenvalues of .12 0. squared cosines help locating the components important for a given observation or variable. Vo lu me 2.92 0.01 −0. the inertia can be computed as: I= L # l λ# = rT dr = cT dc (28) The squared cosine between row i and component # and column j and component # are obtained respectively as: cos2i. and semicolon). and all other punctuation marks combined (i. these terms are superfluous]. Proust.19 0.89 0.78 8.32 0.575.i and dc.13 0. Giraudoux also has the highest contribution indicating that Giraudoux’s pattern of punctuation is important for the second component. The factor scores for the variables (columns) are shown in  2010 Jo h n Wiley & So n s.45 0. supplementary or illustrative elements can be projected onto the components. Hugo.438.141.919.472. Proust.532. Specifically.# = 2 fi.00 1.02 1.446.407.31 0.84 0.023. Ju ly /Au gu s t 2010 .51 0.00 723. prior to projection.37 0.18 0.52 0. cumulative eigenvalues.22 I the columns to their respective barycenter. RESS.75 3.25 0.978.75 1. the comma. Chateaubriand.92 0.795. let iTsup being an illustrative row and jsup being an illustrative column to be projected (note that in CA.525. −1 = jTsup 1 jTsup F! (30) / 0−1 / 0−1 [note that the scalar terms iTsup 1 and jTsup 1 are used to ensure that the sum of the elements of isup 448 RESS/I or jsup is equal to one.31 259.515. And just like for PCA. In c.wiley.0178 and .# 2 dc.00 7.68 0.02 PC 4 25.01 152.j .07 PC 3 68.08 3. with Chateaubriand. being respectively the i-th element of dr and the j-th element of dc ).49 0. is called the transition formula and it is specific to CA.i and cos2j.108. is obtained for Component 2.Overview www. In contrast with PCA.435.231.00 λ λ/I PC 1 3. Q2 .00 0.j (29) 2 2 (with dr.00 3.35 1. exclamation mark.83 723.02 155. a illustrative row or column is re-scaled such that its sum is equal to one).00 ∞ λ/I RESS 0. Formally.99 0. and Giraudoux [data from Ref 52].49 0.# = 2 gj.00 54. −1 and gsup fsup = iTsup 1 iTsup G! / 0−1 .98 1. In addition.512.95 PC 5 22.430.298.249.28 4.444. Just like for PCA.24 0.382.00 3. Zola. if this is already the case.88 412.00 −0.# 2 dr. and W values for all seven components & & PRESS/I Q2 610.24 PC 2 290.58 0. This table indicates how often each writer used the period.

0002 0.0835 0. r.0004 . The centroid row.615 184. 449 .0000 cT 0.WIREs Computational Statistics Principal component analysis TABLE 15 The Punctuation Marks of Six French Writers (from Ref 52). inertia to barycenter.159 0.0012 28 29 0. λ2 = 0. Volume 2.037 0. Author’s Name Period Comma Other xi+ Rousseau 7.0016 15 4 .0000 0.10 0.983 197. cT .0029 92 8 19 0 .20 1 58 0.479 62.00 −0.05 ri × ri × F12 F22 2 dr.0835 x+j 423.226 359.367 14.84.112 6.754 565. contributions.0033 100 0 −0. and the negative important contributions are represented in bold.i 2 0.09 Zola 0.1094 Giraudoux 46.03 −0.0056 .580 803.0027 0.382 0. The vector of mass for the rows. July /August 2010 Table 17 and the corresponding map is displayed in the same map as the observations shown in Figure 10.2973 0. τ 1 = 76. and squared cosines for the rows. mass.0033 0.670 155.06 31 8 . The row labeled x+j gives the total number of times each punctuation mark was used.0055 0. mass × squared factor scores.0001 0. gives the proportion of each punctuation mark in the sample (cj = x+j /N).2522 0.4951 1.3966 0. Rousseau Chateaubriand Hugo F1 F2 −0.1094 0. Inc.0066 76 24 0.1393 0.  2010 John Wiley & Sons. PC2 Chateaubriand Proust Comma PC1 Zola Rousseau Hugo Other marks Period Giraudoux FIGURE 10 | CA punctuation. From Figure 10 we can see that the first component also separates the comma from the ‘others’ punctuation marks.026 26.974 Chateaubriand r 0.177 105.655 102. λ1 = 0.836 13.0056.16.2522 Zola 161.388 N = 142.451 0.101 12.5642 0.0189 53.0002 ctr2 6 −0. The cosines also support this interpretation because the first component accounts for 88% of the use of the comma and 91% of the use of the ‘others’ punctuation marks (see Table 17). squared cosines and contributions have been multiplied by 100 and rounded.1393 Hugo 115. τ 2 = 23.07 −0.3966 Proust 38.371 58. This is supported by the period’s high contribution to the second component and the component’s contribution to the use of the period (see Table 17).383 42. The projections of the rows and the columns are displayed in the same map.0034 6 94 — 100 100 — .1385 The column labeled xi+ gives the total number of punctuation marks used by each author.19 0.0059 93 7 0.0011 0.11 ri × ri ctr1 cos21 cos22 91 9 0. For convenience.0234 λ1 λ2 76% 24% I τ1 τ2 — The positive important contributions are italicized. The second component separates the period from both the comma and the ‘other’ punctuation marks. This is supported by the high contributions of ‘others’ and comma to the component. is the proportion of punctuation marks used by each author (ri = xi+ /N).299 119.413 198.22 Proust Giraudoux & −0. Factor scores.24 −0. N is the grand total of the data table.0189 0.948 0.926 340. TABLE 16 CA punctuation.0178 .541 59.0178.0050 0.0032 0.

variable loadings are correlations between original variables and global  2010 Jo h n Wiley & So n s.1385 0. To normalize a subtable. In order to make the subtables comparable. 450 The #-th eigenvalue of the t-th subtable is denoted [t] (# . In order to do so.e. As in standard PCA. where I is the number of observations and [t] J the number of variables of the t-th subtable.e. The number of variables in each group may differ and the nature of the variables (nominal or quantitative) can vary from one group to the other but the variables should be of the same nature in a given group.0053 0..0011 0. the square root of the first eigenvalue) is the normalizing factor which is used to divide the elements of this subtable.29 ctr2 cj cj × F12 F22 cj × 2 dc. In c.e. A PCA is then run on Z to get a global solution. So. contributions.09 — ctr1 cj × λ1 λ2 76% 24% τ1 τ2 The positive important contributions are italicized. see Refs 53–55] is used to analyze a set of observations described by several groups of variables. mass × squared factor scores. The normalized subtables are computed as: [t]Z =√ 1 [t] ( 1 × [t]X = 1 × [t] ϕ 1 [t]X (32) The normalized subtables are concatenated into an I × J matrix called the global data matrix and denoted Z. The #-th singular value of the t-th subtable is denoted [t] ϕ # .0178 0.. Each data set is called a subtable. inertia to barycenter. we first compute a PCA for this subtable. different subtables) describing the same observations.e. (31) t Each subtable is preprocessed (e. Computations The goal of MFA is to integrate different groups of variables (i. Vo lu me 2.j cos21 cos22 0.0129 91 9 — 100 100 — . formally. Ju ly /Au gu s t 2010 . The first singular value (i. denoted [t]Y. with: J= # [t] J .g.. and contributions have been multiplied by 100 and rounded. while Chateaubriand’s is characterized by his larger than average use of other types of punctuation marks than the period and the comma.2973 0.0037 0.0056 0. The total number of variables is equal to J.11 4 66 .. Zola’s œuvre is characterized by his larger than average use of the comma.. and the negative important contributions are represented in bold.wiley. The analysis derives an integrated picture of the observations and of the relationships between the groups of variables. we need to normalize them. each [t]X) is projected into the global space as a supplementary element. in general.0007 0. Multiple Factor Analysis Multiple factor analysis [MFA. To find out how each subtable performs relative to the global solution. Z is centered but it is not normalized (i.04 30 14 . Such a step is needed because the straightforward analysis obtained by concatenating all variables would be dominated by the subtable with the strongest structure (which would be the subtable with the largest first singular value). In addition. columns from different subtables have. Together.5642 0.Overview www. and squared cosines for the columns F1 Period Comma Other & F2 −0. squared cosines. each subtable (i. Note that because the subtables have previously been centered and normalized with their first singular value. Each subtable is an I ×[t] J rectangular data matrix.com/wires/compstats TABLE 17 CA punctuation.05 0. Giraudoux’s œuvre is characterized by a larger than average use of the period. Specifically.0234 I −0. Factor scores. mass.0118 0.0008 0.0044 16 84 −0. Notations The data consists of T data sets.10 −0. For convenience. centered and normalized) and the preprocessed data matrices actually used in the analysis are denoted [t]X.0061 88 12 66 20 . the pattern of distribution of the points representing the authors and the punctuation marks suggest that some of the differences in the authors’ respective styles can be attributed to differences in their use of punctuation. different norms). the first step is to make these subtables comparable.

59 0.08 1.99 0. or cross-product matrices.32 −0..80 1.20 −2. and barycentric discriminant analysis techniques such as discriminant CA [see e.10 0. Specifically PCA is obtained from the eigen-decomposition of a covariance or a correlation matrix.69 sup.92 −0.62 2.05 1. (A.89 0.23 0.27 1. Ju ly /Au gu s t 2010 .28 −0.28 −0.62 −0. Expert 3 Woody Fruity Butter Woody 0.1) When rewritten.23 0.13 0.Overview www.43 1.39 1.97 3 0.90 2.13 −2.20 0.12 −0.30 Expert 3sup [3] F2 [2] F2 2.11 1. correlations) on the Principal Components of the Global Analysis of the Original Variables.30 0.83 85 0. the most common approach defines an eigenvector of the matrix A as a vector u that satisfies the following equation: Eigenvectors and eigenvalues are numbers and vectors associated to square matrices.wiley. (A.96 0. Refs 58–61].51 −0.24 −0. Even though the eigendecomposition does not exist for all square matrices.02 −0. Expert 2. and Expert 3 for the First Two Components Expert 1sup [1] F2 Global F1 F2 Wine 1 2.95 0. The eigen-decomposition of this type of matrices is important because it is used to find Au = λu .49 1.83 Wine 6 Expert 2sup [1] F1 0.83 1.38 −0.85 −3.98 2 0.80 Vo lu me 2.00 −0.00  2010 Jo h n Wiley & So n s.02 Coffee 0.49 0. Only the First Three Dimensions are Kept Loadings with Original Variables Expert 2 Expert 1 452 PC λ τ (%) Fruity Woody 1 2.56 Wine 3 Wine 4 Wine 5 −0.99 −1. the equation becomes: (A − λI)u = 0.50 −2.44 −0.77 −1.06 −0.61 −0.14 −0.28 −1.19 0.18 Wine 2 −0.21 Roasted Vanillin 0.37 Fruit −0. e. [2] F1 0.g. Loadings (i. Together they provide the eigen-decomposition of a matrix.21 [3] F1 −0.11 0.98 1.40 0.29 0. Factor Scores for the Global Analysis. supplementary element. APPENDIX A: EIGENVECTORS AND EIGENVALUES Notations and Definition There are several ways to define eigenvectors and eigenvalues. Expert 1.. the maximum (or minimum) of functions involving these matrices.36 11 −0. Ref 47].com/wires/compstats TABLE 18 Raw Data for the Wine Example [from Ref 55 Expert 1 Wines Oak Type Fruity Woody Expert 2 Coffee Red Fruit Roasted Expert 3 Vanillin Woody Fruity Butter Woody Wine 1 1 1 6 7 2 5 7 6 3 6 7 Wine 2 2 5 3 2 4 4 4 2 4 4 3 Wine 3 2 6 1 1 5 2 1 1 7 1 1 Wine 4 2 7 1 2 7 2 1 2 2 2 2 Wine 5 1 2 5 4 3 5 6 5 2 6 6 Wine 6 1 3 4 4 3 5 4 5 1 7 5 canonical variate analysis.93 −0.95 −0.00 0.76 −2.86 −0. which analyzes the structure of this matrix.24 0.58 −1.76 −0. it has a particularly simple expression for matrices such as correlation.2) TABLE 19 MFA Wine Ratings and Oak Type.97 −0.81 −1. In c.15 −0.e. linear discriminant analysis [see.10 −0.56 TABLE 20 MFA Wine Ratings and Oak Type. covariance.g..08 0.54 −0.12 3 0.

76 Only the first three dimensions are kept.98 2 0.e. transform them such that their length is equal to one).6) Traditionally.84 0.10) for a certain matrix X (containing real numbers).WIREs Computational Statistics Principal component analysis TABLE 21 MFA Wine Ratings and Oak Type.35 0. we put the set of eigenvectors of A in a matrix denoted U.83 85 . In particular. correlation matrices. where λ is a scalar called the eigenvalue associated to the eigenvector.g. It is important to note that not all matrices have an eigen-decomposition. The important properties of a positive semidefinite matrix is that its eigenvalues are always positive or null. where the diagonal elements gives the eigenvalues (and all the other values are zeros). So.. ) * 0 1 of the matrix .14 [1] PC2 [2] PC1 [2] PC2 [3] PC1 [3] PC2 0. therefore uT u = 1 (A.94 −. correlations) on the Principal Components of the Global Analysis of the Principal Components of the Subtable PCAs Loadings with First Two Components from Subtable PCAs Expert 1 Expert 2 Expert 3 PC λ τ (%) [1] PC1 1 2. for proofs see. This implies that a positive semi-definite matrix is always symmetric. the eigenvectors and the eigenvalues of a matrix constitute the eigen-decomposition of this matrix. This is the case.05 0. A.28 −0. Loadings (i.01 For the previous example we obtain: A = U"U−1 ) *) *) * 3 −1 4 0 2 2 = 2 1 0 −1 −4 6 ) * 2 3 = (A.16 0..08 0. and has a particularly convenient form.. e. we can also say that a vector u is an eigenvector of a matrix A if the length of the vector (but not its direction) is changed when it is multiplied by A.7) A = U"U−1 .35 0. 453 . and cross-product matrices are all positive semi-definite matrices.94 −0.e.36 11 3 0.1 as: AU = "U. In a similar manner. e.4) and 0. (A. A matrix is said to be positive semidefinite when it can be obtained as the product of a matrix by its transpose. For example. The eigenvectors are also composed of real values (these last two properties are a consequence of the symmetry of the matrix..8) or also as: Volume 2.12 3 −0.  2010 John Wiley & Sons. formally.15 −0.58 0. the matrix: ) * 2 3 A= 2 1 (A. The eigenvalues are stored in a diagonal matrix (denoted ").9) 2 1 Together.99 −0.g. and that its eigenvectors are pairwise orthogonal when their eigenvalues are different. The eigen-decomposition of these matrices always exists. July /August 2010 A type of matrices used very often in statistics are called positive semi-definite.09 −0.3) has for eigenvectors: ) * 3 u1 = 2 with eigenvalue λ1 = 4 (A. Each column of U is an eigenvector of A. the matrix A is positive semi-definite if it can be obtained as: A = XXT (A.5) For most applications we normalize the eigenvectors (i. We can rewrite the Eq.13 −0. Inc. Positive (semi-)Definite Matrices ) * −1 u2 = 1 with eigenvalue λ2 = −1 (A. Also some matrices can have 0 0 imaginary eigenvalues and eigenvectors. covariance. (A.

We impose as a constraint that the coefficients of the linear combinations are finite and this constraint is.12) can be decomposed as: Roasted PC1 UT U = I with where U is the matrix storing the normalized eigenvectors.14)  ) *  = 1 0 (A. it is possible to store all the eigenvectors in an orthogonal matrix (recall that a matrix is orthogonal when the product of this matrix by its transpose is a diagonal matrix). 454 1 32 1 2 3   3 * 1  4 0  32 0 2 1 1 32 1 2 1 32 − 12 3  3 ) * 3 1 = 1 3 (c) (A. Ju ly /Au gu s t 2010 . Because eigenvectors corresponding to different eigenvalues are orthogonal. This amounts to defining the factor score matrix as:  2010 Jo h n Wiley & So n s.16) Vo lu me 2. the matrix: ) * 3 1 A= (A.13) 1 3 Expert 1 (b) (A. we want to find row factor scores. In c.wiley.com/wires/compstats (a) Refs 62. expressed as imposing to the sum of squares of the coefficients of each linear combination to be equal to unity.11) 1 32 − 12  1 32 1 2 ) 2 3 1 32 − 12 3 1 32 − 12   (A. in PCA.15) 0 1 Statistical Properties of the Eigen-decomposition The eigen-decomposition is important because it is involved in problems of optimization. if these are not normalized then UT U is a diagonal matrix. in general. express the positive semi-definite matrix A as: A = U"UT PC2 2PC2 Red Fruit Woody A = U"UT 2PC1  3 Vanillin = with  3 Expert 2 3PC2 PC2  3PC1 Butter Woody PC1 Fruity Expert 3 FIGURE 12 | MFA wine ratings and oak type. For example. We can. (A. . therefore.63). Each experts’ variables have been separated for ease of interpretation.Overview www. obtained as linear combinations of the columns of X such that these factor scores ‘explain’ as much of the variance of X as possible and such that the sets of factor scores are pairwise orthogonal. Specifically. Circles of correlations for the original variables. F = XQ. This implies the following equality: PC 2 Fruity PC1 1PC2 Coffee 1PC1 Woody U−1 = UT .

in order to give the following expression 0 / " QT Q − I (A. 1 ! = " 2 with " being the diagonal matrix of the eigenvalues of matrix AAT and of the matrix AT A (as they are the same). If A is a rectangular matrix.18) (i. The columns of Q are called the right singular vectors of A. A.g. = trace QT XT XQ − " QT Q − I (where the trace {} operator gives the sum of the diagonal elements of a square matrix). (A. Finally.23) Because " is diagonal.22) This implies also that XT X = Q"QT . and this indicates that " is the matrix of eigenvalues of the positive semi-definite matrix XT X ordered from the largest to the smallest and that Q is the matrix of eigenvectors of XT X associated to ". 455 . we find that the factor matrix has the form F = XQ. (B. Inc. ∂Q (A.17) is a diagonal matrix (i. the diagonal elements of the matrix " 2 which are the standard deviations of the factor scores are called the singular values of matrix X (see Section on Singular Value Decomposition).. / 0L = trace FT F − " QT Q − I (A.e. The SVD is a straightforward consequence of the eigen-decomposition of positive semi-definite matrices [see.11.47.64.. we first compute the derivative of L relative to Q: ∂L = 2XT XQ − 2Q". In order to find the values of Q which give the maximum values of L.19) (see Refs 62. (A. B. this is clearly an eigendecomposition problem.20) / 0.1 can also be rewritten as: A = P!QT = L # δ # p# qT# .e..21) and then set this derivative to zero: T T X XQ − Q" = 0 ⇐⇒ X XQ = Q". and q# being (respectively) the #-th singular value. e. • Q: The (normalized) eigenvectors of the matrix AT A (i.65]. (A. • !: The diagonal matrix of the singular values. 1 Incidently. this shows that the first factor scores ‘extract’ as much of the variance of the original data as possible.1) with • P: The (normalized) eigenvectors of the matrix AAT (i. The columns of P are called the left singular vectors of A. QT Q = I). Note that Eq. This shows that X can be  2010 John Wiley & Sons.25) Taking into account that the sum of the eigenvalues is equal to the trace of XT X.e. p# . its SVD gives A = P!QT . denoted ".. The solution of this problem can be obtained with the technique of the Lagrangian multipliers where the constraint from Eq.. left and right singular vectors of X. The SVD decomposes a rectangular matrix into three simple matrices: two orthogonal matrices and one diagonal matrix.18 is expressed as the multiplication with a diagonal matrix of Lagrangian multipliers.2) #=1 with L being the rank of X and δ # .WIREs Computational Statistics Principal component analysis (with the matrix Q being the matrix of coefficients of the ‘to-be-found’ linear combinations) under the constraints that T T T F F = Q X XQ (A. and so on for the remaining factors. Refs 5. PT P = I).e.24) The variance of the factors scores is equal to the eigenvalues because: FT F = QT XT XQ = QT Q"QT Q = ". July /August 2010 (A. APPENDIX B: SINGULAR VALUE DECOMPOSITION (SVD) The SVD is a generalization of the eigendecomposition. F is an orthogonal matrix) and that QT Q = I (A. This amount for defining the following equation . for details). Q is an orthonormal matrix). Volume 2. and that the second factor scores extract as much of the variance left unexplained by the first factor. (B.

. .e. It is obtained as: T A[M] = P[M] ![M] Q[M] = M # δ m pm qTm . . . Ju ly /Au gu s t 2010 .7) by substitution: 1 1 AW− 2 A = M− 2 .Q A =.3) In other words.65]. P! . if A is an I × J matrix of rank L (i. . = I. = !.. The reconstructed matrix A[M] is said to be optimal (in a least squares sense) for matrices of rank M because it satisfies the following condition: 62 6 6 6 6A − A[M] 6  2010 Jo h n Wiley & So n s. A: We verify that: . . PT M. P=Q (B. in a least square sense.63. δ m . A = P!QT with: PT P = QT Q = I. .15) Vo lu me 2.. (B. P = M− 2 P and . . The GSVD gives a weighted generalized least square estimate of a given matrix by a lower rank matrix For a given I × J matrix A.T .8) To show that Condition B. . = QT W− 12 WW− 12 Q = QT Q = I. Ref 62. . if M is the I × I matrix expressing the constraints for the rows of A and W the J × J matrix of the constraints for the columns of A. qM (B. In particular. . A = M 2 AW 2 ⇐⇒ A = M− 2 .g. In c. and so on. Generalized Singular Value Decomposition The generalized singular value decomposition (GSVD) decomposes a rectangular matrix and takes into account constraints imposed on the rows and the columns of the matrix.3 holds. e. . We begin by defining the matrix . ![M] ) the matrix made of the first M columns of P (respectively Q. the δ # p# qT# terms). the generalized singular vectors are orthogonal under the constraints imposed by M and W. respectively. . Q (B.5) The matrices of the generalized eigenvectors are obtained as: 1 . in general. B.T . . This decomposition is obtained as a result of the standard SVD. δ M } . Formally. (B. P = PT M− 2 MM− 2 P = PT P = I (B. . and. Q (B. The matrix A is now decomposed into: . A as: 1 1 1 1 .11) ' ( Q[M] = q1 . . generalizing the SVD. 1 1 = M− 2 P!QT W− 2 . . . the sum of the first M matrices gives the best reconstitution of X with a matrix of rank M.e. qm . T WQ . .wiley.4) We then compute the standard singular value decomposition as . of any rectangular matrix by another rectangular matrix of same dimensions.Overview www. T WQ . (B. P!Q (from Eq.10) Mathematical Properties of the Singular Value Decomposition It can be shown that the SVD has the important property of giving an optimal approximation of a matrix by another matrix of smaller rank [see. ! .9) . pm . The first of these matrices gives the best reconstitution of X by a rank one matrix. Precisely. .com/wires/compstats reconstituted as a sum of L rank one matrices (i. AW− 2 . the SVD gives the best approximation. but smaller rank. the sum of the first two matrices gives the best reconstitution of X with a rank two matrix. P! 456 (B.T =.. = trace 7/ [M] A−A 6 62 = min 6A − X6 X 0/ [M] A−A 0T 8 (B. . PT M. These two matrices express constraints imposed on the rows and the columns of A. (B.12) ![M] = diag {δ 1 .6).14) m (with δ m being the m-th singular value). = W− 12 Q. . we denote it by P[M] (respectively Q[M] .13) The matrix A reconstructed from the first M eigenvectors is denoted A[M] . pM (B. . A contains L singular values that are not zero). involves using two positive definite square matrices with size I × I and J × J. !): ' ( P[M] = p1 . . with: . A as: .Q A =. (B. suffice it to show that: and 1 1 .6) The diagonal matrix of singular values is simply equal to the matrix of singular values of .

Jordan C. Relationships among various kinds of eigenvalue and singular value decompositions. On the number of principal components: a test of dimensionality based on measurements of similarity between matrices. 2nd ed. Biometrika 1956. Schonemann P. 307–330. 907–912. 1829. 10.  2010 John Wiley & Sons. Tokyo: Springer Verlag. Encyclopedia of Measurement and Statistics. ed. 849–853. 2007. Notes on bias and estimation. Eigen-decomposition: eigenvalues and eigenvectors. Principal Component Analysis. 6:559–572. The Rainbow of Mathematics. eds. 2009. ed. Kruskal JB. Boyer C.e. REFERENCES 1. Quenouille M. 45–56. 8. 3. Singular Value Decomposition (SVD) and Generalized Singular Value Decomposition(GSVD). 18. vol. Abdi H. This quantity is interpreted as the reconstructed proportion or the explained variance. 1:259–260. In: Yanai H. In: Hodson F. 43:353–360. Abdi H. 457 . Thousand Oaks: Sage Publications. Thousand Oaks: Sage Publications. 2007. The quality of the reconstruction is given by the ratio of the first M eigenvalues (i. Okada A. Sur l’´equation a` l’aide de laquelle on d´etermine les in´egalit´es s´eculaires des mouvements des plan`etes. New York: The Free Press. New York: Springer. Statistical methods of comparing different multivariate analyses of the same data. Psychometrika 1974. In: Salkind NJ.18 between the original matrix and its approximation. Pearson K. In: Govaert G. 15. 5. On lines and planes of closest fit to systems of points in space. Comput Stat Data Anal 2008. ed. 2. Takane Y. Jackson JE. Analysis of a complex of statistical variables into principal components. Dray S. 1–23. Edinburgh: Edingburh University Press. Mathemematics in the Archæological and Historical Sciences. X (B. New York: Norton. Encyclopedia of Measurement and Statistics. ed. Grattan-Guinness I. Niang N. Volume 2. 1997. New Developments in Psychometrics. In: Salkind NJ.g. Jolliffe IT. M´emoire sur les formes bilin´eaires. Oeuvres Compl`etes (II`eme S´erie). July /August 2010 16. A History of Mathematics.16) for the set of matrices X of rank smaller or equal to M. 1971. On the early history of the singular value decomposition. Principal component analysis: application to statistical process control. Kendall D. eds.. Abdi H. Inc.14. 2002. Factor analysis and principal component analysis: Bilinear methods. 11. 9. B. Wiley Interdisciplinary Reviews: Computational Statistics. Kano Y. Saporta G. Lingoes J. 304–308. London: John Wiley & Sons. 138–149. eds. Paris: Blanchard. International Encyclopedia of Statistics. 52:2228–2237. 2002. Centroid. 39:423–427. the squared singular values) to the sum of all the eigenvalues.14. 35:551–566. Tautu P. Encyclopedia of Measurement and Statistics. namely 9 . Abdi H. 19. Cauchy AL. Meulman J. Alternative measures of fit ¨ for the Schonemann-Carrol matrix fitting algorithm. Merzbach U. Shigemasu K. The quality of reconstruction can also be interpreted as the squared coefficient of correlation [precisely as the Rv coefficient.66]. In: Kruskal WH. 14.. Philos Mag A 1901. Tannur JM. SIAM Rev 1993. The GSVD minimizes an expression similar to Eq. A User’s Guide to Principal Components. 19:35–54. J Educ Psychol 1933. 9. Data Analysis. 25:417–441. B. 6. it corresponds to the inverse of the quantity minimized by Eq. Hotelling H. ACKNOWLEDGEMENT We would like to thank Yoshio Takane for his very helpful comments on a draft of this article. 1978. Ref 65. 2009. Thousand Oaks: Sage Publications. J Math Pure Appl 1874. New York: John Wiley & Sons. e. 13. Stewart GW. 20. Gower J. 12. ¨ 17. RV cofficient and congruence coefficient. New York: John Wiley & Sons. In: Salkind NJ. 7. 2007.WIREs Computational Statistics Principal component analysis for the set of matrices X of rank smaller or equal to M [see. -: A[M] = min trace M (A − X) W (A − X)T . 1991. 4. 1989.

Encyclopedia for Research Methods for the Social Sciences. 1993. Geisser S. Stone JV. Abdi H. Greenacre MJ. 1995. Wold S. 43. ed. Maydeu-Olivares A. 55. 36. Stone M. Pag`es J. Abdi H.wiley. Learning. m´ethodes. Takane Y. Technometrics 1982. 2007. In: Salkind NJ. 1984. Br J Math Stat Psychol 1961. multiple correspondence analysis and recent developments. Abdi H. ed. Valentin D. 51. Vols. eds. Jackknife. 40. 22. In: Millsap R. Lebart L. Independent Component Analysis: A Tutorial Introduction. Sheperd UL. A predictive approach to the random effect model. Valentin D. 23. 2002. Multiple Factor Analysis. Thurstone LL. interpr´etation. In: Salkind NJ. 1973. 16:39–50. ed. 1989. PLS for multivariate linear modeling. Using the bootstrap and the Rv coefficient in the multivariate context. 39. 46. 458 49. 44. Cambridge: MIT Press. 657–663. Correspondence analysis. Diaconis P. In: van de Waterbeemd H. Encyclopedia of Measurement and Statistics. Thousand Oaks.  2010 Jo h n Wiley & So n s. Neural Networks. Theory and Applications of Correspondence Analysis. Holmes S. Multivariate Behav Res 1966. Jackson DA. Paris: Dunod. Biometrika 1974. Abdi H. Paris: Dunod. The Jackknife. Escofier B. Efron B. Tomiuk MA. Vo lu me 2. 74:2204–2214. 248:116–130. Ju ly /Au gu s t 2010 . Wiley Interdisciplinary Reviews: Computational Statistics. 47. Malinowski ER. 10:405–411. Mehlman DW. 49:974–997. Chemometric Methods in Molecular Design. A low dimensional representation of faces in the higher dimensions of the space. A note on Guttman’s lower bound for the number of common factors. Jackson DA. Abdi H. Kaiser HF. 2010. 53. 48. CMBF-NSF Regional Conference Series in Applied Mathematics: New York SIAM. Encyclopedia of Measurement and Statistics. New York: John Wiley & Sons. Kelt DA. 54. 23:187–200. (In press). London: Academic Press. La r´egression pls. Correspondence Analysis in Practice. An Introduction to the Bootstrap. ed. (In press). New York: Chapman and Hall. 27. CA: Sage Publications. Technometrics 1978.Overview www. 669–702. 3rd ed. Sci Am 1983. The scree test for the number of factors. Eastment HT. Kung SY. 57. Bootstrapping principal components analysis: a comment. Faut-il pond´erer les donn´ees linguistiques. Data Analysis. 24:73–77. Stopping rules in principal aomponents analysis: a comparison of heuristical and statistical approaches. Efron B. Kaiser HF. 41. Abdi H. Diamantaras KI. Principal Component Neural Networks: Theory and Applications. J R Stat Soc [Ser A] 1974. 1947. Paris: Dunod. Jackson DA. Multiple factor analysis. Tennenhaus M. 33. Cross-validatory choice of the number of components from a principal component analysis. Pag`es J. 243–263. Handbook of Quantitative Methods in Psychology. 20:397–405. the Bootstrap and other Resampling Plans. Thousand Oaks. Encyclopedia of Research Design. Edelman B. 14:1–2. Ecology 1995. 61:101–107. Correspondence analysis. 1982. 1998. How many principal components? stopping rules for determining the number of non-trivial axes revisited. CA: Sage. FL: Chapman & Hall/CRC. 50. 45. 1 and 2. 26. New York: Nova Science. F´enelon JP. Statistique et informatique appliqu´ees. 1:245–276. vol. ed. Cattell RB. Williams LJ. IL: University of Chicago Press. 38. eds. 2007. 1975. Factor Analysis in Chemistry. Greenacre MJ. London: Sage Publications 2009. Efron B. Valentin D. Krzanowski WJ. 28. Escofier B. New York: John Wiley & Sons. Cross-validatory estimation of the number of components in factor and principal component analysis. 1999. 34. Analyses factorielles simples et multiples: objectifs. PLS-Regression. 52. Futing T. Psychometrika 1958. Wold S. Bootstrapped principal components analysis: a reply to Mehlman et al. 30. 83. Somers KM. Paris: Technip. 97–106. Encyclopedia of Research Design. Comput Stat Data Anal 1994. Ecology 1995. Tibshirani RJ. 35. Valentin D. CUMFID 1989. 56. Chicago. Brunet E. Symbolic and Numeric Knowledge. 24. 651–657. Thousand Oaks: Sage Publications. L’analyse des donn´ees. Comput Stat Data Anal 2005. CA: Sage Publications. 2010. 36:111–133. Peres-Neto PR. Multiple factor analysis (mfa). Thousand Oaks. Abdi H. 2 2010. 76:644–645. 1996. O’Toole AJ. Abdi H. Partial least square regression. ed. Weinheim: Wiley-VCH Verlag. 31. Deffenbacher KA. 1990. CA: Sage Publications 2003. The varimax criterion for analytic rotation in factor analysis. Boca Raton. Ecology 1993. Benz´ecri J-P. Multivariate analysis. In: Diday E. Thousand Oaks. Computer intensive methods in statistics. 32. Hwang H. Thousand Oaks: Sage Publications. In: Salkind NJ. 37. 2007. Cross-validatory choice and assessment of statistical prediction. 2004. 119–132. In: Salkind NJ. J Opt Soc Am [Ser A] 1993. 2nd ed. In c. Bryman A. Projection on latent structures Regression. 29. 25. Multiple correspondence analysis. In: Lewis-Beck M. 76:640–643. 18:121–140. 42. Williams LJ. 195–217.com/wires/compstats 21.

2010. Les Cahiers de. Nakache JP. The approximation of a matrix by another of a lower rank. Some applications of the singular value decomposition of a matrix. 11:823–831.. MA: Wellesley-Cambridge Press. Lorente P. CA: Sage Publications. Mahwah. Niang N. Math´ematiques pour les sciences cognitives. CA: Sage. 63. Williams LJ. Strang G. Abdi H. Benz´ecri JP. Encyclopedia of Measurement and Statistics. Inc. Volume 2. 62. Saporta G. In: Salkind NJ. (In press). Eckart C. 66. Correspondence analysis and classification. 270–275. Abdi H. In Lewis-Beck M. 60. Analyse des Donn´ees. Chastang JF. CA: Sage Publications. 61. NJ: Lawrence Erlbaum Associates. Psychometrika 1936. 64. Applied Matrix Algebra in the Statistical Sciences. Thousand Oaks. Harris RJ. 978–982.. 459 . 59. 1:211–218. A Primer of Multivariate Statistics. Thousand Oaks. 2003. 2006. Blasius J. Cambridge. 65. 2:415–534. eds. Futimg T.WIREs Computational Statistics Principal component analysis 58. In: Greenacre M. 2007. 371–392. eds. ed. Good I. In: Salkind NJ. Technometrics 1969. Abdi H. 2001. July /August 2010  2010 John Wiley & Sons. New York: Elsevier. FL: Chapman & Hall. Valentin D. 67. Encyclopedia of Research Design. Thousand Oaks. 1977. Bryman A. 2003. Multiple Correspondence Analysis and Related Methods. Factor Rotations. Discriminant correspondence analysis. Grenoble: PUG. Introduction to Linear Algebra. Young G. Boca Raton. ed. Encyclopedia for Research Methods for the Social Sciences. 1983. FURTHER READING Baskilevsky A. Aspect pronostics et th´erapeutiques de l’infarctus myocardique aigu. Barycentric discriminant analyis (BADIA). Abdi H. 2006.