Correspondence Analysis (contd.)
Recapitulation: Steps Involved
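• Computing the expected frequencies $E_{ij}$ from the $n \times p$ contingency table whose $(i, j)$-th cell contains the frequency $x_{ij}$, that is, $E_{ij} = x_{i\cdot}\, x_{\cdot j} / x_{\cdot\cdot}$ (row total times column total, divided by the grand total).
• Computing the $n \times p$ matrix $\mathbb{C}$ whose $(i, j)$-th element is $c_{ij} = (x_{ij} - E_{ij}) / \sqrt{E_{ij}}$.
• Computing the row and column factors as the left and right singular vectors of $\mathbb{C}$. The $R$ nonzero singular values $\lambda_i$ of $\mathbb{C}\mathbb{C}'$ are the squared singular values of $\mathbb{C}$.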
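The three steps above can be sketched in base R. This is only an illustrative sketch: the built-in HairEyeColor counts (summed over Sex) stand in for the contingency table and are not part of the lecture, and λ is taken to be the squared singular values of ℂ, which are the nonzero singular values of ℂℂ′.

```r
# Example table (an assumption for illustration): Hair x Eye counts, summed over Sex
x <- apply(HairEyeColor, c(1, 2), sum)

# Step 1: expected frequencies under independence,
# E_ij = (row total * column total) / grand total
E <- outer(rowSums(x), colSums(x)) / sum(x)

# Step 2: standardized residuals c_ij = (x_ij - E_ij) / sqrt(E_ij)
C <- (x - E) / sqrt(E)

# Step 3: SVD of C; the columns of u and v are the row and column factors
sv <- svd(C)
row_factors <- sv$u
col_factors <- sv$v
lambda <- sv$d^2   # squared singular values of C

# Sanity check: the lambdas add up to the Pearson chi-square statistic
chi2 <- sum(C^2)
```

The last singular value is numerically zero, because a table with min(n, p) = 4 has at most 3 nontrivial dimensions.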

Statistical Structures in Data, PGDBA Programme, ISI Kolkata 2022 December 15, 2022 2
Recapitulation: Steps Involved (contd.)
• Computing the proportion of variance explained by each factor based on the corresponding singular value, as $\lambda_k / \sum_{j=1}^{R} \lambda_j$.
• If the first two factors together explain most of the variance, that is, if $(\lambda_1 + \lambda_2) / \sum_{j=1}^{R} \lambda_j$ is sufficiently close to 1, then a CA plot is generated for visualization.
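As a rough illustration of these two steps, the proportions can be computed in base R as follows. The HairEyeColor table is again an assumed stand-in, and λ denotes the squared singular values of ℂ.

```r
# Example table (an assumption for illustration): Hair x Eye counts, summed over Sex
x <- apply(HairEyeColor, c(1, 2), sum)
E <- outer(rowSums(x), colSums(x)) / sum(x)
C <- (x - E) / sqrt(E)

# Keep the R = min(n, p) - 1 nontrivial values
R <- min(dim(x)) - 1
lambda <- (svd(C)$d^2)[seq_len(R)]

# Proportion of variance (inertia) explained by each factor
prop <- lambda / sum(lambda)

# Share explained by the first two factors; a value close to 1
# suggests that a two-dimensional CA plot loses little information
first_two <- sum(prop[1:2])
```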

Recapitulation: Interpretation of CA Plots
• The more variance the plot explains, the fewer insights will be missed.
• Proximity between two row labels (or two column labels) generally indicates similarity.
• The further a point is from the origin, the more discriminating it is.
• The closer row/column labels are to the origin, the less distinct they are likely to be.
• The association between a row label and a column label is indicated by the angle between the lines joining each of them to the origin:
  • strong positive association if the angle is small
  • absence of association if the angle is 90 degrees
  • negative association if the angle is close to 180 degrees
• The further a row and a column label are from the origin, the stronger their positive or negative association.
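The angle rule can be made concrete with a small helper. This is a minimal sketch: `angle_deg` is a hypothetical helper name and the coordinates are made-up points, not output from any analysis in these slides.

```r
# Angle (in degrees) subtended at the origin by two points of a CA plot
angle_deg <- function(u, v) {
  cosine <- sum(u * v) / sqrt(sum(u^2) * sum(v^2))
  cosine <- max(-1, min(1, cosine))  # guard against rounding just outside [-1, 1]
  acos(cosine) * 180 / pi
}

# Hypothetical 2-D coordinates, for illustration only
row_pt  <- c(1.0, 0.5)
col_pt1 <- c(2.0, 1.0)    # same direction as row_pt: small angle
col_pt2 <- c(-0.5, 1.0)   # perpendicular to row_pt: no association
col_pt3 <- c(-1.0, -0.5)  # opposite direction: negative association

angles <- c(angle_deg(row_pt, col_pt1),
            angle_deg(row_pt, col_pt2),
            angle_deg(row_pt, col_pt3))
```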

Functions for Implementing CA in R
Some commonly used functions (with names of packages in parentheses)
• corresp() (MASS)
• ca() (ca)
• CA() (FactoMineR)
• dudi.coa() (ade4)
• afc() (amap)

Irrespective of the function used for CA, the typical results should consist of
• a set of eigenvalues
  • provide information on the variability in the data.
• a table with the row coordinates
  • provide information about the structure of the rows in the analyzed table.
• a table with the column coordinates
  • provide information about the structure of the columns in the analyzed table.

Illustration with the corresp function in MASS

• With the quine data frame in MASS


• Has 146 rows and 5 columns.
• Children from Walgett, New South Wales, Australia, were classified by ethnic background, age, sex and learner status, and the number of days each child was absent from school in a particular school year was recorded.
• We use the categorical variables
  • Eth (ethnic background: Aboriginal or not, "A" or "N")
  • Age (age group: primary ("F0"), or forms "F1", "F2" or "F3")
The Computation

library(MASS)
ct <- corresp(~ Age + Eth, data = quine, nf = 2)
ct
ct$cor

Output:

> ct
First canonical correlation(s): 5.317534e-02 7.558758e-18

 Age scores:
         [,1]        [,2]
F0 -0.3344445 -2.24704927
F1  1.4246090  0.01582536
F2 -1.0320002  0.40182073
F3 -0.4612728  0.31144153

 Eth scores:
        [,1] [,2]
A -1.0563816    1
N  0.9466276    1

> ct$cor
[1] 5.317534e-02 7.558758e-18
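The canonical correlations reported by corresp() can be tied back to the recapitulated steps. The sketch below assumes the Age by Eth table is built with table(); it relies on the fact that the singular values of the standardized residual matrix ℂ equal √N times the canonical correlations, where N is the total count.

```r
library(MASS)

ct <- corresp(~ Age + Eth, data = quine, nf = 2)

# Age x Eth contingency table as a plain matrix of counts
tab <- unclass(table(quine$Age, quine$Eth))

# Expected frequencies and standardized residuals, as in the recapitulation
E <- outer(rowSums(tab), colSums(tab)) / sum(tab)
C <- (tab - E) / sqrt(E)

# Singular values of C, rescaled by sqrt(N), give the canonical correlations
can_cor <- svd(C)$d / sqrt(sum(tab))
```

Because Eth has only two levels, there is a single nontrivial dimension, which is why the second canonical correlation is numerically zero.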

The CA Plot
biplot(ct)

Multiple Correspondence Analysis (MCA)

What is MCA?
• An extension of correspondence analysis (CA) to a large set of categorical variables
• Performed by applying CA to either
  • an indicator or design matrix (also called the complete disjunctive table, CDT) or
  • a Burt table formed from these variables.
• An indicator matrix is an individuals × variables matrix, where the rows represent
individuals, and the columns are dummy variables representing categories of the
variables.
• The Burt table is the symmetric matrix of all two-way cross-tabulations between the
categorical variables

Illustration: Indicator Matrix
          SURVIVAL  AGE                       LOCATION
Case No.   NO  YES  LESST50  A50TO69  OVER69  TOKYO  BOSTON  GLAMORGAN
1           0    1        0        1       0      0       0          1
2           1    0        1        0       0      1       0          0
3           0    1        0        1       0      0       1          0
4           0    1        0        0       1      0       0          1
...         .    .        .        .       .      .       .          .
...         .    .        .        .       .      .       .          .
...         .    .        .        .       .      .       .          .
762         1    0        0        1       0      1       0          0
763         0    1        1        0       0      0       1          0
764         0    1        0        1       0      0       0          1
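An indicator matrix of this kind can be built from factors in base R. The sketch below uses only the first four cases shown in the table; `indicator_matrix` is a hypothetical helper name, not a function from any package.

```r
# First four cases from the table above
cases <- data.frame(
  SURVIVAL = factor(c("YES", "NO", "YES", "YES"), levels = c("NO", "YES")),
  AGE      = factor(c("A50TO69", "LESST50", "A50TO69", "OVER69"),
                    levels = c("LESST50", "A50TO69", "OVER69")),
  LOCATION = factor(c("GLAMORGAN", "TOKYO", "BOSTON", "GLAMORGAN"),
                    levels = c("TOKYO", "BOSTON", "GLAMORGAN"))
)

# One block of 0/1 dummy columns per variable, bound side by side
indicator_matrix <- function(df) {
  blocks <- lapply(names(df), function(v) {
    f <- df[[v]]
    z <- sapply(levels(f), function(l) as.integer(f == l))
    colnames(z) <- paste(v, levels(f), sep = ".")
    z
  })
  do.call(cbind, blocks)
}

Z <- indicator_matrix(cases)
```

Each row of Z contains exactly one 1 per variable, so every row sums to the number of variables (here 3).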

Illustration: Burt Table
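A Burt table can be obtained from an indicator matrix Z as Z′Z. The following is a minimal sketch under the assumption that the four categorical variables of the quine data (Eth, Sex, Age, Lrn) are used; that choice is for illustration only and is not the data behind the slide.

```r
# Four categorical variables from the quine data (an assumed example)
df <- MASS::quine[, c("Eth", "Sex", "Age", "Lrn")]

# Indicator matrix: one 0/1 column per category of each variable
Z <- do.call(cbind, lapply(names(df), function(v) {
  f <- df[[v]]
  z <- sapply(levels(f), function(l) as.integer(f == l))
  colnames(z) <- paste(v, levels(f), sep = ".")
  z
}))

# Burt table: all two-way cross-tabulations stacked into one symmetric matrix
B <- crossprod(Z)   # same as t(Z) %*% Z

# The off-diagonal block for Age and Eth is their ordinary contingency table
age_eth <- B[grep("^Age", rownames(B)), grep("^Eth", colnames(B))]
```

The diagonal blocks of B are diagonal matrices holding the category counts of each variable.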

MCA versus PCA
• MCA is a PCA applied to the complete disjunctive table (CDT).
• To do this, the CDT must be transformed as follows.
• Let $y_{ik}$ denote the general term of the CDT: $y_{ik}$ is equal to 1 if individual $i$ is in category $k$ and is 0 otherwise.
• Let $p_k$ denote the proportion of individuals in category $k$.
• The transformed CDT (TCDT) has as its general term $x_{ik} = y_{ik} / p_k - 1$.
• The unstandardized PCA applied to the TCDT, with column $k$ having the weight $p_k$, leads to the results of MCA.
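The equivalence can be checked numerically. The sketch below assumes the four categorical variables of the quine data as an example and uses row weights 1/n. With column weights $p_k$ the PCA eigenvalues equal J times the CA eigenvalues of the indicator matrix, where J is the number of variables; the factor J is only a scaling, so the components are the same.

```r
df <- MASS::quine[, c("Eth", "Sex", "Age", "Lrn")]   # assumed example data
J  <- ncol(df)

# Complete disjunctive table (CDT)
Y <- do.call(cbind, unname(lapply(df, function(f)
  sapply(levels(f), function(l) as.integer(f == l)))))

# Transformed CDT: x_ik = y_ik / p_k - 1 (columns are then centred)
p <- colMeans(Y)
X <- sweep(Y, 2, p, "/") - 1

# Unstandardized PCA of X with column weights p_k and row weights 1/n
Xw <- sweep(X, 2, sqrt(p), "*") / sqrt(nrow(X))
pca_eig <- svd(Xw)$d^2

# CA of the indicator matrix, via the SVD of its standardized residuals
P  <- Y / sum(Y)
E  <- outer(rowSums(P), colSums(P))
mca_eig <- svd((P - E) / sqrt(E))$d^2
```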
An Example: Wine Tasting
https://personal.utdallas.edu/~herve/Abdi-MCA2007-pretty.pdf

• Goal: to evaluate the effect of the oak species on barrel-aged red Burgundy wines.
• Wine coming from the same harvest of Pinot Noir was aged in six different barrels made with two types of oak.
  • Wines 1, 5, and 6 were aged with the first type of oak, whereas wines 2, 3, and 4 were aged with the second.
• Next, three wine experts were each asked to choose from two to five variables to describe the wines.
• For each wine and for each variable, the expert was asked to rate the intensity.
  • The answer given by the expert was coded either as a binary answer (e.g., fruity vs. non-fruity) or as a ternary answer (e.g., no vanilla, a bit of vanilla, clear smell of vanilla).
  • Each binary answer is represented by 2 binary columns (e.g., the answer "fruity" is represented by the pattern 1 0 and "non-fruity" by 0 1).
  • A ternary answer is represented by 3 binary columns (e.g., the answer "a bit of vanilla" is represented by the pattern 0 1 0).
• The results are presented in the table on the next slide.
Example (contd.): The Data

Example (contd.): The Results
• Eigenvalues, corrected eigenvalues, proportion of explained inertia and corrected proportion of explained inertia.
• The eigenvalues of the Burt matrix are equal to the squared eigenvalues of the indicator matrix.
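The relation between the two sets of eigenvalues can be verified numerically. The sketch below assumes the four categorical variables of the quine data rather than the wine data (which are not reproduced here), and `ca_inertia` is a hypothetical helper name.

```r
df <- MASS::quine[, c("Eth", "Sex", "Age", "Lrn")]   # assumed example data

# Indicator matrix and Burt table
Z <- do.call(cbind, unname(lapply(df, function(f)
  sapply(levels(f), function(l) as.integer(f == l)))))
B <- crossprod(Z)

# CA eigenvalues (principal inertias) of a non-negative table:
# squared singular values of its standardized residual matrix
ca_inertia <- function(N) {
  P <- N / sum(N)
  E <- outer(rowSums(P), colSums(P))
  svd((P - E) / sqrt(E))$d^2
}

eig_indicator <- ca_inertia(Z)
eig_burt      <- ca_inertia(B)
```

Comparing `eig_burt` with `eig_indicator^2` shows that they agree up to rounding error.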
Example: The Plot

Projections on the first 2 dimensions. The eigenvalues (λ) and proportion of explained inertia (τ)
(a) The I set: rows (i.e., wines), wine ? is a supplementary element.
(b) The J set: columns (i.e., adjectives). Oak 1 and Oak 2 are supplementary elements.
PCA with discrete ordinal data
Polychoric PCA: Using the Polychoric Correlation Matrix

Polychoric Correlation
• A technique for estimating the correlation between two
hypothetical normally distributed continuous latent variables, from
two observed ordinal variables.
• Tetrachoric correlation is a special case when both observed
variables are binary.
• The names are derived from the polychoric and tetrachoric series
which are used for estimation of these correlations.
Tetrachoric Correlation

• Is the correlation coefficient between $X'$ and $Y'$, where $X'$ and $Y'$ are latent variables that jointly follow a bivariate normal distribution.
• The observed binary variables are $X = 1$ when $X' > c$ and $Y = 1$ when $Y' > d$ (and 0 otherwise), for some unknown constants $c$ and $d$.
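The definition above suggests a simple way to estimate the tetrachoric correlation from a 2 x 2 table. The following is a minimal base-R sketch, not a reference implementation: `tetrachoric` is a hypothetical helper name, the data are simulated with an assumed latent correlation of 0.6, and the thresholds are taken from the marginal proportions before solving for the correlation that reproduces the observed proportion of (1, 1) pairs.

```r
tetrachoric <- function(tab) {
  p  <- tab / sum(tab)
  c0 <- qnorm(sum(p[1, ]))   # threshold c, from P(X = 0)
  d0 <- qnorm(sum(p[, 1]))   # threshold d, from P(Y = 0)
  # P(X' > c, Y' > d) for a standard bivariate normal with correlation rho
  p11 <- function(rho) {
    integrate(function(x) {
      dnorm(x) * pnorm((d0 - rho * x) / sqrt(1 - rho^2), lower.tail = FALSE)
    }, lower = c0, upper = Inf)$value
  }
  uniroot(function(rho) p11(rho) - p[2, 2], interval = c(-0.99, 0.99))$root
}

# Simulated latent variables (assumed values, for illustration only)
set.seed(1)
n   <- 20000
rho <- 0.6
x_latent <- rnorm(n)
y_latent <- rho * x_latent + sqrt(1 - rho^2) * rnorm(n)

# Observed binary variables, using thresholds c = 0.3 and d = -0.2
X <- as.integer(x_latent > 0.3)
Y <- as.integer(y_latent > -0.2)

r_tet     <- tetrachoric(table(X, Y))
r_pearson <- cor(X, Y)   # correlation of the 0/1 variables, for comparison
```

In simulations of this kind the Pearson correlation of the binary variables is typically noticeably smaller than the latent correlation, which is what motivates the tetrachoric estimate.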
Pros and Cons
• Express association in a familiar form--a correlation coefficient.
• They provide a way to separately quantify association and similarity of
category definitions.
• They do not depend on number of levels; results can be compared for
studies where the number of levels is different.
• They can be used even if different categorical variables have different
numbers of levels.

Example: Tetrachoric Correlation
• Two psychiatrists (Raters 1 and 2) making a diagnosis for
presence/ absence of Major Depression.
• Though the diagnosis is binary, depression as a trait is
continuously distributed in the population.

Example (contd.)
[Figure: bell-shaped distribution of the latent continuous variable (depression severity, Y), with the discretizing threshold t on the Y axis separating "not depressed" (below t) from "depressed" (above t).]

Latent continuous variable (depression severity, Y); and discretizing threshold (t).

Example (contd.)

[Figure: joint distribution (ellipse) of depression severity as judged by two raters (Y1 and Y2); and discretizing thresholds (t1 and t2).]

2 x 2 cross-classification of the raters' ratings:

                    Rater 1
                  -       +
              +-------+-------+
            - |   a   |   b   |  a + b
  Rater 2     +-------+-------+
            + |   c   |   d   |  c + d
              +-------+-------+
                a + c   b + d      1

Example (contd.)
• $X_1$ and $X_2$: discrete-valued variables representing observed ratings by Raters 1 and 2
• $Y_1$ and $Y_2$: latent continuous variables associated with $X_1$ and $X_2$; the pre-discretized, continuous "impressions" of the trait level, as judged by Raters 1 and 2
• $T$: the true latent trait level of a case, with which a rating or diagnosis of a case begins
• Each rater applies discretizing thresholds to the judged trait level ($Y_1$ or $Y_2$) to yield a dichotomous or ordered-category rating ($X_1$ or $X_2$).
• More formally,
  $Y_1 = bT + u_1 + e_1$,
  $Y_2 = bT + u_2 + e_2$,
  where $b$ is a regression coefficient, $u_1$ and $u_2$ are the unique components of the raters' impressions, and $e_1$ and $e_2$ represent random error or noise.
Example (contd.)
• A simpler model is
  $Y_1 = b_1 T + e_1$,
  $Y_2 = b_2 T + e_2$.
• Assume: the latent trait $T$ is normally distributed.
• As scaling is arbitrary, we specify that $T \sim N(0, 1)$.
• Assume: error is normally distributed (and independent both between raters and across cases).
• Assume: $\operatorname{var}(e_1) = \operatorname{var}(e_2)$.
• Thus $e_1, e_2 \sim N(0, \sigma^2)$.
Example (contd.)
• $Y_1$ and $Y_2$ must also be normally distributed.
• To fix the scale, specify that $\operatorname{var}(Y_1) = \operatorname{var}(Y_2) = 1$.
• It follows that $b_1 = b_2 = b$, the correlation of both $Y_1$ and $Y_2$ with the latent trait.
• We define the tetrachoric correlation, $r^*$, as $r^* = b^2$.
• Computational methods are available for determining $r^*$ from data.
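The step from the model to $r^* = b^2$ can be spelled out as follows. This short derivation assumes, in addition to what is stated on the slides, that the errors are independent of $T$ and that both loadings $b_1$ and $b_2$ are taken to be positive.

```latex
\begin{aligned}
\operatorname{var}(Y_i) &= b_i^2 \operatorname{var}(T) + \operatorname{var}(e_i)
  = b_i^2 + \sigma^2 = 1
  \;\Longrightarrow\; b_1^2 = b_2^2 = 1 - \sigma^2, \\
\operatorname{corr}(Y_i, T) &= \frac{\operatorname{cov}(b_i T + e_i,\, T)}
  {\sqrt{\operatorname{var}(Y_i)\operatorname{var}(T)}} = b_i = b, \\
\operatorname{corr}(Y_1, Y_2) &= \operatorname{cov}(b_1 T + e_1,\; b_2 T + e_2)
  = b_1 b_2 \operatorname{var}(T) = b^2 = r^*.
\end{aligned}
```

So $r^*$ is the correlation between the two raters' latent impressions $Y_1$ and $Y_2$.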

Discrete data PCA for SES

Polychoric correlation
• Involves two ordinal variables
• Assumes an underlying bivariate normal distribution with $n_i$ cutoff points $\alpha_{i,j}$, the $j$-th point corresponding to the $i$-th variable, $i = 1, 2$; $j = 1, 2, \dots, n_i$
• Is a maximum likelihood estimate of the correlation of that
underlying bivariate normal distribution: asymptotically efficient
• Requires iterative maximization, hence slow, especially in large
data sets and with many variables.
Polychoric correlation: Illustration
• $\alpha_{1,1} = -2$
• $\alpha_{1,2} = -0.75$
• $\alpha_{1,3} = 0.5$
• $\alpha_{2,1} = -0.25$
• $\alpha_{2,2} = 1$
• The correlation of the underlying bivariate normal is 0.2.
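The illustration can be mimicked by simulation. The sketch below is an assumption-laden example in base R: it generates data from a bivariate normal with correlation 0.2, discretizes with the cutoffs above, and then recovers the correlation with a two-step estimator (cutoffs from the marginal proportions, then maximization of the likelihood over the correlation only). This is a simplification of the full maximum likelihood estimate, and the sample size, seed and `rect_prob` helper are illustrative choices.

```r
set.seed(42)
n   <- 20000
rho <- 0.2
a1  <- c(-2, -0.75, 0.5)   # cutoffs for variable 1 (from the slide)
a2  <- c(-0.25, 1)         # cutoffs for variable 2 (from the slide)

# Latent bivariate normal, then discretization into ordinal categories
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)
x1 <- factor(findInterval(z1, a1) + 1L, levels = 1:4)
x2 <- factor(findInterval(z2, a2) + 1L, levels = 1:3)
tab <- table(x1, x2)

# Step 1: cutoffs estimated from cumulative marginal proportions
cuts <- function(m) qnorm(cumsum(m)[-length(m)] / sum(m))
t1 <- c(-Inf, cuts(rowSums(tab)), Inf)
t2 <- c(-Inf, cuts(colSums(tab)), Inf)

# Probability of a rectangle under a standard bivariate normal with correlation r
rect_prob <- function(r, lo1, hi1, lo2, hi2) {
  s <- sqrt(1 - r^2)
  integrate(function(x) {
    dnorm(x) * (pnorm((hi2 - r * x) / s) - pnorm((lo2 - r * x) / s))
  }, lower = lo1, upper = hi1)$value
}

# Step 2: multinomial log-likelihood as a function of r, maximized iteratively
loglik <- function(r) {
  ll <- 0
  for (i in seq_len(nrow(tab))) for (j in seq_len(ncol(tab))) {
    pr <- rect_prob(r, t1[i], t1[i + 1], t2[j], t2[j + 1])
    ll <- ll + tab[i, j] * log(max(pr, 1e-300))
  }
  ll
}
rho_hat <- optimize(loglik, interval = c(-0.99, 0.99), maximum = TRUE)$maximum
```

Every likelihood evaluation needs one numerical integral per cell, which gives a feel for why the estimate becomes slow with many variables.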
