SSD14 Dec 15

Correspondence Analysis (contd.)
Recapitulation: Steps Involved
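• Computing the expected frequencies $E_{ij} = x_{i\cdot}\, x_{\cdot j} / x_{\cdot\cdot}$ (row total × column total / grand total) from the $n \times p$ contingency table whose $(i, j)$-th cell contains the frequency $x_{ij}$
• Computing the $n \times p$ matrix $\mathbb{C}$ whose $(i, j)$-th element is
$c_{ij} = (x_{ij} - E_{ij}) / \sqrt{E_{ij}}$.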
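• Computing the row and column factors from the SVD of $\mathbb{C}$: the left singular vectors give the row factors and the right singular vectors give the column factors, corresponding to the $R$ nonzero singular values of $\mathbb{C}$. Equivalently, these are the eigenvectors of $\mathbb{C}\mathbb{C}'$ and $\mathbb{C}'\mathbb{C}$, whose nonzero eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_R$ are the squared singular values of $\mathbb{C}$.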
Statistical Structures in Data, PGDBA Programme, ISI Kolkata 2022 December 15, 2022
Recapitulation: Steps Involved (contd.)
• Computing the proportion of variance explained by each factor based on the corresponding singular value, as
$\lambda_k / \sum_{j=1}^{R} \lambda_j$.
• If the first two factors together explain most of the variance, that is, if
$(\lambda_1 + \lambda_2) / \sum_{j=1}^{R} \lambda_j$
is sufficiently close to 1, then a CA plot is generated for visualization.
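The steps above can be sketched in base R. This is a minimal illustration, not the course's own code: it assumes the `caith` eye-colour by hair-colour table shipped with MASS as input (chosen only because it is a real two-way table), and it takes λ to be the eigenvalues of the matrix C C′, i.e. the squared singular values of C.

```r
library(MASS)

X <- as.matrix(caith)   # eye colour (rows) by hair colour (columns)
N <- sum(X)

# Expected frequencies under independence: E_ij = row total * column total / N
E <- outer(rowSums(X), colSums(X)) / N

# Matrix C with c_ij = (x_ij - E_ij) / sqrt(E_ij)
C <- (X - E) / sqrt(E)

# SVD of C: left singular vectors -> row factors, right -> column factors
s <- svd(C)
R <- min(dim(X)) - 1            # number of non-trivial factors
lambda <- s$d[1:R]^2            # eigenvalues of C C' (squared singular values of C)
row_factors <- s$u[, 1:R]
col_factors <- s$v[, 1:R]

# Proportion of variance (inertia) explained by each factor
prop <- lambda / sum(lambda)
round(prop, 4)

# Share explained by the first two factors: is a 2-D CA plot adequate?
sum(prop[1:2])
```

Dividing each λ by N gives the principal inertias, and their square roots are the canonical correlations that `corresp()` reports; the λ also sum to the Pearson chi-square statistic of the table.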
Recapitulation: Interpretation of CA Plots
• The more variance is explained, the fewer insights will be missed.
• Proximity between row (column) labels generally indicates similarity.
• The further points are from the origin, the more discriminating they are.
• The closer row/column labels are to the origin, the less distinct they are likely to be.
• The association between a row label and a column label is indicated by the angle between them at the origin:
  • strong positive association if the angle is small
  • no association if the angle is about 90 degrees
  • negative association if the angle is close to 180 degrees
• The further a row label and a column label are from the origin, the stronger their positive or negative association.
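To make the angle rule concrete, the sketch below computes the angle at the origin between each row point and each column point. It assumes principal coordinates for both rows and columns on the first two dimensions (one common choice of map) and again uses MASS's `caith` table as illustrative input; `angle_deg()` is a hypothetical helper, and the angle reading remains a heuristic for symmetric maps rather than an exact measure of association.

```r
library(MASS)

X <- as.matrix(caith)
P <- X / sum(X)
rmass <- rowSums(P)                                   # row masses
cmass <- colSums(P)                                   # column masses
S <- (P - outer(rmass, cmass)) / sqrt(outer(rmass, cmass))
s <- svd(S)

# Principal coordinates on the first two dimensions
row_pc <- sweep(s$u[, 1:2] / sqrt(rmass), 2, s$d[1:2], "*")
col_pc <- sweep(s$v[, 1:2] / sqrt(cmass), 2, s$d[1:2], "*")
rownames(row_pc) <- rownames(X)
rownames(col_pc) <- colnames(X)

# Angle (degrees) at the origin between two points
angle_deg <- function(a, b) {
  cosang <- sum(a * b) / sqrt(sum(a^2) * sum(b^2))
  acos(min(1, max(-1, cosang))) * 180 / pi
}

angles <- outer(seq_len(nrow(row_pc)), seq_len(nrow(col_pc)),
                Vectorize(function(i, j) angle_deg(row_pc[i, ], col_pc[j, ])))
dimnames(angles) <- list(rownames(row_pc), rownames(col_pc))
round(angles)   # small: positive association; about 90: none; near 180: negative
```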
Functions for Implementing CA in R
Some commonly used functions (with names of packages in parentheses):
• corresp() (MASS)
• ca() (ca)
• CA() (FactoMineR)
• dudi.coa() (ade4)
• afc() (amap)
Irrespective of the function used for CA, the typical results should consist of
• a set of eigenvalues, which provide information on the variability in the data;
• a table with the row coordinates, which provides information about the structure of the rows in the analyzed table;
• a table with the column coordinates, which provides information about the structure of the columns in the analyzed table.
Illustration with the corresp function in MASS
• With the quine data frame in MASS

• Has 146 rows and 5 columns.
• Children from Walgett, New South Wales, Australia, were classified by ethnic background, age, sex and learner status, and the number of days absent from school in a particular school year was recorded.
• We use the categorical variables
  • Eth (ethnic background: Aboriginal or Not, "A" or "N")
  • Age (Primary "F0", or forms "F1", "F2" or "F3")
The Computation
library(MASS)
ct <- corresp(~ Age + Eth, data = quine, nf = 2)
ct$cor

> ct
First canonical correlation(s): 5.317534e-02 7.558758e-18

 Age scores:
         [,1]        [,2]
F0 -0.3344445 -2.24704927
F1  1.4246090  0.01582536
F2 -1.0320002  0.40182073
F3 -0.4612728  0.31144153

 Eth scores:
        [,1] [,2]
A -1.0563816    1
N  0.9466276    1

> ct$cor
[1] 5.317534e-02 7.558758e-18
The CA Plot
biplot(ct)
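Because Eth has only two categories, the Age × Eth table supports just min(4, 2) − 1 = 1 non-trivial dimension, which is why the second canonical correlation in the output is numerically zero. A quick base-R check on the same `quine` data (a sketch, not part of the original analysis) is that the first canonical correlation equals sqrt(χ²/N), since the squared canonical correlations sum to the Pearson chi-square statistic divided by N.

```r
library(MASS)

tab <- table(quine$Age, quine$Eth)    # 4 x 2 table of counts
N <- sum(tab)

# Number of non-trivial CA dimensions: min(rows, columns) - 1
n_dim <- min(dim(tab)) - 1
n_dim

# Squared canonical correlations sum to chi-square / N;
# with a single dimension, the first correlation is sqrt(chi-square / N)
X2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
sqrt(X2 / N)

ct <- corresp(~ Age + Eth, data = quine, nf = 2)
ct$cor
```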
Multiple Correspondence Analysis (MCA)
What is MCA?
• An extension of correspondence analysis (CA) to a large set of categorical variables
• Performed by applying CA to either
  • an indicator or design matrix (also called complete disjunctive table – CDT) or
  • a Burt table formed from these variables.
• An indicator matrix is an individuals × variables matrix, where the rows represent
individuals, and the columns are dummy variables representing categories of the
variables.
• The Burt table is the symmetric matrix of all two-way cross-tabulations between the categorical variables; if $Z$ denotes the indicator matrix, the Burt table is $Z'Z$.
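A sketch of how an indicator matrix and a Burt table can be built in base R. The choice of the four factors of MASS's `quine` data is only illustrative, and `indicator()` is a hypothetical helper defined here, not a package function.

```r
library(MASS)

# Illustrative input: four categorical variables from quine
vars <- quine[, c("Eth", "Sex", "Age", "Lrn")]

# Indicator matrix (complete disjunctive table): one 0/1 column per category
indicator <- function(df) {
  do.call(cbind, lapply(names(df), function(v) {
    f <- df[[v]]
    m <- sapply(levels(f), function(l) as.integer(f == l))
    colnames(m) <- paste(v, levels(f), sep = ".")
    m
  }))
}

Z <- indicator(vars)
dim(Z)        # individuals x categories
head(Z, 3)

# Burt table: all two-way cross-tabulations, B = Z'Z
B <- crossprod(Z)
B[c("Eth.A", "Eth.N"), c("Sex.F", "Sex.M")]   # same counts as table(quine$Eth, quine$Sex)
```

Each row of `Z` has exactly one 1 per variable, the diagonal blocks of `B` hold the marginal counts, and the off-diagonal blocks are the pairwise contingency tables.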
Illustration: Indicator Matrix
          SURVIVAL AGE                       LOCATION
Case No.  NO  YES  LESST50  A50TO69  OVER69  TOKYO  BOSTON  GLAMORGAN
1         0   1    0        1        0       0      0       1
2         1   0    1        0        0       1      0       0
3         0   1    0        1        0       0      1       0
4         0   1    0        0        1       0      0       1
...       .   .    .        .        .       .      .       .
762       1   0    0        1        0       1      0       0
763       0   1    1        0        0       0      1       0
764       0   1    0        1        0       0      0       1
Illustration: Burt Table
MCA versus PCA
• MCA is a PCA applied to the complete disjunctive table (CDT).
• To do this, the CDT must be transformed as follows.
• Let $y_{ik}$ denote the general term of the CDT: $y_{ik} = 1$ if individual $i$ is in category $k$, and $y_{ik} = 0$ otherwise.
• Let $p_k$ denote the proportion of individuals in category $k$.
• The transformed CDT (TCDT) has as its general term $x_{ik} = y_{ik}/p_k - 1$.
• The unstandardized PCA of the TCDT, with column $k$ given the weight $p_k$, leads to the results of MCA.
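The equivalence can be checked numerically. The sketch below, which reuses the four factors of `quine` purely as illustrative input, runs the weighted PCA on the TCDT and a plain CA of the indicator matrix. In this construction (row weights 1/I assumed), column weights $p_k$ give PCA eigenvalues equal to J times the CA eigenvalues, where J is the number of variables, so the factors and the proportions of explained inertia coincide; weights $p_k/J$ would reproduce the CA eigenvalues exactly.

```r
library(MASS)

# Illustrative input: the four factors of quine, coded as a CDT (y_ik)
vars <- quine[, c("Eth", "Sex", "Age", "Lrn")]
Z <- do.call(cbind, lapply(vars, function(f)
  sapply(levels(f), function(l) as.integer(f == l))))
I <- nrow(Z)                    # number of individuals
J <- ncol(vars)                 # number of categorical variables

# Transformed CDT: x_ik = y_ik / p_k - 1
p <- colMeans(Z)                # p_k: proportion of individuals in category k
X <- sweep(Z, 2, p, "/") - 1

# Unstandardized PCA with row weights 1/I and column weights p_k:
# eigenvalues are the squared singular values of D_row^(1/2) X D_col^(1/2)
M <- sweep(X, 2, sqrt(p), "*") / sqrt(I)
eig_pca <- svd(M)$d^2

# Plain CA of the indicator matrix, for comparison
P <- Z / sum(Z)
rmass <- rowSums(P)
cmass <- colSums(P)
S <- (P - outer(rmass, cmass)) / sqrt(outer(rmass, cmass))
eig_ca <- svd(S)$d^2

# Weights p_k give J times the CA eigenvalues; proportions are identical
round(cbind(pca = eig_pca, J_times_ca = J * eig_ca), 5)
round(cbind(pca = eig_pca / sum(eig_pca), ca = eig_ca / sum(eig_ca)), 4)
```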
An Example: Wine Tasting
https://personal.utdallas.edu/~herve/Abdi-MCA2007-pretty.pdf
• To evaluate the effect of the oak species on barrel-aged red Burgundy wines
• Wine coming from the same harvest of Pinot Noir was aged in six different barrels made with two types of oak.
• Wines 1, 5, and 6 were aged with the first type of oak, whereas wines 2, 3, and 4 were aged with the
second.
• Next, three wine experts were asked to choose from two to five variables to describe the wines.
• For each wine and for each variable, the expert was asked to rate the intensity. The answer given by the expert was coded either as a binary answer (e.g., fruity vs. non-fruity) or as a ternary answer (e.g., no vanilla, a bit of vanilla, clear smell of vanilla). Each binary answer is represented by 2 binary columns (e.g., the answer “fruity” is represented by the pattern 1 0 and “non-fruity” by 0 1). A ternary answer is represented by 3 binary columns (e.g., the answer “a bit of vanilla” is represented by the pattern 0 1 0).
• The results are presented in the Table on the next slide.
Example (contd.): The Data
Example (contd.): The Results
• Eigenvalues, corrected eigenvalues, proportion of explained inertia and corrected proportion of explained inertia.
• The eigenvalues from the CA of the Burt matrix are equal to the squared eigenvalues from the CA of the indicator matrix.
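The Burt/indicator relationship can be verified with a small helper, `ca_eigen()` (a hypothetical name, defined below), applied to both tables built from the four factors of `quine` as illustrative input. The sketch also applies the Benzécri correction, which we assume is the correction behind the corrected eigenvalues mentioned above: only eigenvalues larger than 1/J are kept (J = number of variables), and each is replaced by [J/(J − 1) (λ − 1/J)]².

```r
library(MASS)

vars <- quine[, c("Eth", "Sex", "Age", "Lrn")]
Z <- do.call(cbind, lapply(vars, function(f)
  sapply(levels(f), function(l) as.integer(f == l))))   # indicator matrix
B <- crossprod(Z)                  # Burt table
J <- ncol(vars)                    # number of variables
K <- ncol(Z)                       # number of categories

# Eigenvalues (principal inertias) of a simple CA of a two-way table
ca_eigen <- function(tab) {
  P <- tab / sum(tab)
  rmass <- rowSums(P)
  cmass <- colSums(P)
  S <- (P - outer(rmass, cmass)) / sqrt(outer(rmass, cmass))
  svd(S)$d^2
}

nd <- K - J                        # number of non-trivial MCA dimensions
eig_ind  <- ca_eigen(Z)[1:nd]
eig_burt <- ca_eigen(B)[1:nd]
round(cbind(indicator = eig_ind, squared = eig_ind^2, burt = eig_burt), 5)

# Benzecri correction (assumed): keep eigenvalues above 1/J and rescale them
keep <- eig_ind > 1 / J
eig_corr <- ((J / (J - 1)) * (eig_ind[keep] - 1 / J))^2
round(eig_corr / sum(eig_corr), 4)  # corrected proportions, normalized by their sum
```

Normalizing the corrected eigenvalues by their own sum is one convention; other adjustments of the total inertia (e.g. Greenacre's) give different corrected proportions.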
Example: The Plot
Projections on the first 2 dimensions. The eigenvalues (λ) and proportions of explained inertia (τ) are given for each dimension.
(a) The I set: rows (i.e., wines); wine ? is a supplementary element.
(b) The J set: columns (i.e., adjectives). Oak 1 and Oak 2 are supplementary elements.
PCA with discrete ordinal data
Polychoric PCA: Using the Polychoric Correlation Matrix
Polychoric Correlation
• A technique for estimating the correlation between two
hypothetical normally distributed continuous latent variables, from
two observed ordinal variables.
• Tetrachoric correlation is a special case when both observed
variables are binary.
• The names are derived from the polychoric and tetrachoric series
which are used for estimation of these correlations.
Tetrachoric Correlation
• It is the correlation coefficient between $X'$ and $Y'$, where $X'$ and $Y'$ are latent variables with a bivariate normal distribution, and
• $X = 1$ when $X' > c$ (otherwise $X = 0$) and $Y = 1$ when $Y' > d$ (otherwise $Y = 0$), for some unknown constants $c$ and $d$.
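Under this definition, a two-step estimate from a 2 × 2 table can be sketched in base R: the thresholds c and d are set from the marginal proportions, and ρ is then chosen so that the bivariate normal model reproduces the observed proportion in the (0, 0) cell; for a 2 × 2 table this is the maximum likelihood solution, because the model has as many parameters as free cell proportions. The helpers `bvn_rect()` and `tetrachoric()` are hypothetical names; packages such as psych and polycor provide production implementations.

```r
# P(a1 < U <= a2, b1 < V <= b2) for a standard bivariate normal (U, V)
# with correlation rho, by one-dimensional numerical integration
bvn_rect <- function(a1, a2, b1, b2, rho) {
  s <- sqrt(1 - rho^2)
  f <- function(u) dnorm(u) * (pnorm((b2 - rho * u) / s) - pnorm((b1 - rho * u) / s))
  integrate(f, a1, a2, rel.tol = 1e-8)$value
}

# Tetrachoric correlation from a 2 x 2 table of counts or proportions,
# rows = Y (0, 1), columns = X (0, 1)
tetrachoric <- function(tab) {
  p <- tab / sum(tab)
  c_thr <- qnorm(sum(p[, 1]))   # c: P(X' <= c) = P(X = 0)
  d_thr <- qnorm(sum(p[1, ]))   # d: P(Y' <= d) = P(Y = 0)
  # choose rho so that the model reproduces the observed (0, 0) proportion
  g <- function(rho) bvn_rect(-Inf, c_thr, -Inf, d_thr, rho) - p[1, 1]
  uniroot(g, c(-0.99, 0.99), tol = 1e-8)$root
}

# Check: with both thresholds at 0 and rho = 0.5,
# P(X' <= 0, Y' <= 0) = 1/4 + asin(0.5) / (2 * pi) = 1/3
tab <- matrix(c(1/3, 1/6, 1/6, 1/3), nrow = 2)
tetrachoric(tab)   # should be close to 0.5
```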
Pros and Cons
• They express association in a familiar form: a correlation coefficient.
• They provide a way to separately quantify association and similarity of category definitions.
• They do not depend on the number of levels; results can be compared across studies where the number of levels is different.
• They can be used even if different categorical variables have different
numbers of levels.
Example: Tetrachoric Correlation
• Two psychiatrists (Raters 1 and 2) make a diagnosis of the presence/absence of Major Depression.
• Though the diagnosis is binary, depression as a trait is
continuously distributed in the population.
Example (contd.)
[Figure: latent continuous variable (depression severity, Y) and discretizing threshold (t); cases below t are rated "not depressed", cases above t "depressed".]
Example (contd.)
[Figure: joint distribution (ellipse) of depression severity as judged by two raters (Y1 and Y2), with discretizing thresholds (t1 and t2).]

2 × 2 cross-classification of the raters' ratings:

                      Rater 1
                    -         +
               +---------+---------+
Rater 2    -   |    a    |    b    |   a + b
               +---------+---------+
           +   |    c    |    d    |   c + d
               +---------+---------+
                  a + c     b + d      1
Example (contd.)
• $X_1$ and $X_2$: discrete-valued variables representing the observed ratings by Raters 1 and 2.
• $Y_1$ and $Y_2$: latent continuous variables associated with $X_1$ and $X_2$; the pre-discretized, continuous "impressions" of the trait level, as judged by Raters 1 and 2.
• $T$: the true latent trait level of a case, with which a rating or diagnosis of a case begins.
• Each rater applies discretizing thresholds to this judged trait level to yield a dichotomous or ordered-category rating ($X_1$ and $X_2$).
• More formally,
$Y_1 = bT + u_1 + e_1,$
$Y_2 = bT + u_2 + e_2,$
where $b$ is a regression coefficient, $u_1$ and $u_2$ are the unique components of the raters' impressions, and $e_1$ and $e_2$ represent random error or noise.
Example (contd.)
• A simpler model is
$Y_1 = b_1 T + e_1,$
$Y_2 = b_2 T + e_2.$
• Assume: the latent trait $T$ is normally distributed.
• As scaling is arbitrary, we specify that $T \sim N(0, 1)$.
• Assume: errors are normally distributed (and independent of $T$, between raters and across cases).
• Assume: $\operatorname{var}(e_1) = \operatorname{var}(e_2)$.
• Thus $e_1, e_2 \sim N(0, \sigma^2)$.
Example (contd.)
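• $Y_1$ and $Y_2$ must then also be normally distributed.
• To fix the scale, specify that $\operatorname{var}(Y_1) = \operatorname{var}(Y_2) = 1$.
• It follows that $b_1 = b_2 = b$, the correlation of both $Y_1$ and $Y_2$ with the latent trait (since $\operatorname{var}(Y_i) = b_i^2 + \sigma^2 = 1$ and $\operatorname{cov}(Y_i, T) = b_i$).
• We define the tetrachoric correlation, $r^*$, as $r^* = b^2$; this is the correlation between $Y_1$ and $Y_2$, because $\operatorname{cov}(Y_1, Y_2) = b^2 \operatorname{var}(T) = b^2$.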
• Computational methods are available for determining 𝑟𝑟 ∗ from data.
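The step $r^* = b^2$ can be checked by simulating the simpler model with an illustrative b = 0.8 (so that σ² = 1 − b² keeps var(Y₁) = var(Y₂) = 1). The sketch also dichotomizes both ratings at an arbitrary, assumed threshold t = 1 to show that the ordinary correlation of the 0/1 ratings (the phi coefficient) is noticeably smaller than r*; recovering r* from such dichotomized data is what the tetrachoric correlation is for.

```r
set.seed(1)
n <- 1e5
b <- 0.8                         # illustrative loading; implies r* = b^2 = 0.64
sigma <- sqrt(1 - b^2)           # error SD chosen so that var(Y1) = var(Y2) = 1

T_lat <- rnorm(n)                # latent trait, T ~ N(0, 1)
Y1 <- b * T_lat + rnorm(n, sd = sigma)
Y2 <- b * T_lat + rnorm(n, sd = sigma)

cor(Y1, Y2)                      # close to b^2 = 0.64

# Dichotomize at a common, illustrative threshold t = 1
X1 <- as.integer(Y1 > 1)
X2 <- as.integer(Y2 > 1)
cor(X1, X2)                      # phi coefficient of the observed ratings: smaller
table(Rater2 = X2, Rater1 = X1) / n
```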
Discrete data PCA for SES
Polychoric correlation
• Involves two ordinal variables.
• Assumes an underlying bivariate normal distribution with $n_i$ cutoff points for the $i$-th variable, where $\alpha_{i,j}$ is the $j$-th cutoff point of the $i$-th variable, $i = 1, 2$ and $j = 1, 2, \ldots, n_i$.
• Is a maximum likelihood estimate of the correlation of that underlying bivariate normal distribution, and is asymptotically efficient.
• Requires iterative maximization, and is hence slow, especially in large data sets and with many variables.
Polychoric correlation: Illustration
• $\alpha_{1,1} = -2$
• $\alpha_{1,2} = -0.75$
• $\alpha_{1,3} = 0.5$
• $\alpha_{2,1} = -0.25$
• $\alpha_{2,2} = 1$
• The correlation of the underlying
bivariate normal is 0.2.
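The illustration can be reproduced numerically. The sketch below computes the 4 × 3 table of cell probabilities implied by the cutoffs above and ρ = 0.2, then runs a two-step maximum likelihood fit (cutoffs from the marginal proportions, then a one-dimensional search over ρ) on the expected counts for a hypothetical sample of 1000. `bvn_rect()` and `cell_probs()` are hypothetical helper names, not functions from a package.

```r
# P(a1 < U <= a2, b1 < V <= b2) for a standard bivariate normal with correlation rho
bvn_rect <- function(a1, a2, b1, b2, rho) {
  s <- sqrt(1 - rho^2)
  f <- function(u) dnorm(u) * (pnorm((b2 - rho * u) / s) - pnorm((b1 - rho * u) / s))
  integrate(f, a1, a2, rel.tol = 1e-8)$value
}

# Cell probabilities of the ordinal table implied by cutoffs a1, a2 and correlation rho
cell_probs <- function(a1, a2, rho) {
  e1 <- c(-Inf, a1, Inf)
  e2 <- c(-Inf, a2, Inf)
  outer(seq_len(length(e1) - 1), seq_len(length(e2) - 1),
        Vectorize(function(i, j) bvn_rect(e1[i], e1[i + 1], e2[j], e2[j + 1], rho)))
}

alpha1 <- c(-2, -0.75, 0.5)   # 3 cutoffs -> 4 categories of variable 1
alpha2 <- c(-0.25, 1)         # 2 cutoffs -> 3 categories of variable 2
pi_tab <- cell_probs(alpha1, alpha2, rho = 0.2)
round(pi_tab, 4)

# Two-step ML sketch on expected counts for a hypothetical sample of 1000
counts <- 1000 * pi_tab
n <- sum(counts)
a1_hat <- qnorm(cumsum(rowSums(counts))[-nrow(counts)] / n)   # cutoffs from margins
a2_hat <- qnorm(cumsum(colSums(counts))[-ncol(counts)] / n)
loglik <- function(r) sum(counts * log(cell_probs(a1_hat, a2_hat, r)))
rho_hat <- optimize(loglik, c(-0.99, 0.99), maximum = TRUE)$maximum
rho_hat   # close to 0.2
```

Each log-likelihood evaluation needs a full table of bivariate normal probabilities, which illustrates why polychoric estimation is slow with many variables or large tables.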
