Correspondence Analysis: Data Science and Analytics

12/15/2019 DSA SPSS Short Course Module 9 Correspondence Analysis
Data Science and Analytics
MODULE 9
Correspondence Analysis
Correspondence analysis is appropriate when attempting to determine the proximal relationships among two or more categorical variables. Using correspondence analysis for categorical variables is analogous to using correlation analysis and principal components analysis for continuous or nearly continuous variables: these techniques provide the researcher with information about the relationships among variables and about the dimensions or eigenvectors underlying them. A key part of correspondence analysis is the multi-dimensional map produced in the output. The correspondence map allows researchers to visualize the relationships among categories spatially on dimensional axes; in other words, it shows which categories are close to other categories on empirically derived dimensions.
Unlike correlation, correspondence analysis is nonparametric and does not offer a statistical significance test, because it is not based on a distribution (or distributional assumptions). Comparison of different models (e.g., different variables entered or removed) should be done with categorical or logistic regression. Again, correspondence analysis requires categorical variables only. It accepts nominal variables, ordinal variables, and/or discretized interval or ratio variables (e.g., quartiles), although creating discrete categories from a continuous variable is generally discouraged.
For the duration of this tutorial we will be using the IntroPsych_Fall2009.sav file, which is fictitious and contains 1500 participants' responses on the following variables: a participant identifier (sequential numbers which identify each participant); sex; age; family_income (four income brackets); HS_GPA (high school grade point average brackets); IQ (intelligence as measured by the Wechsler Adult Intelligence Scale, version IV); class_standing (freshman, sophomore, junior, senior); drinks_week (number of alcoholic drinks consumed per week); confidence (self rating of how much confidence the student has in their ability to achieve desired grades in college courses [possible range: 0-20]); hardwor… (self rating of how much effort the student puts toward their college classes [possible range: 0-20]); number_grade (numeric course grade for the Introduction to Psychology course); and a letter-grade variable (course letter grade for the Introduction to Psychology course).
1.) The first example will explore a two-way relationship between the 4 categories of family_income and the 4 categories of class_standing. We would expect weak relationships between family income and the members of each class. For example, family income should have no relation with whether a student is a freshman, sophomore, junior or senior.
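The expectation of "no relation" corresponds to statistical independence. Under independence, each cell of the cross-tabulation is expected to hold (row total times column total) divided by N cases. Correspondence analysis then describes how the observed table departs from these expected counts. Below is a minimal Python sketch. The function name `expected_counts` and the 4 x 4 counts are hypothetical, made up for illustration. They are not taken from IntroPsych_Fall2009.sav.

```python
# Expected cell counts if the row and column variables were independent.
# The table below is a hypothetical 4 x 4 set of counts, for illustration only.

def expected_counts(table):
    """Return (row total * column total) / N for every cell."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_totals] for r in row_totals]


toy = [
    [20, 25, 30, 25],
    [30, 30, 35, 30],
    [25, 35, 30, 35],
    [25, 30, 30, 40],
]

expected = expected_counts(toy)
print(round(expected[0][0], 2))  # 100 * 100 / 475 -> 21.05
```

The expected table keeps the observed row and column totals. Only the way cases are spread inside the table changes.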
Begin by clicking on Analyze, Data Reduction, Correspondence Analysis...
Next, highlight / select the family_income variable and use the top arrow button to move it into the Row: box. Then, click the top Define Range... button, type a 1 for the minimum value and a 4 for the maximum value, and click the Update button; then click the Continue button.
bayes.acs.unt.edu:8083/BayesContent/class/Jon/SPSS_SC/Module9/M9_Correspondence/SPSS_M9_Correspondence1.htm
Next, highlight / select the class_standing variable and use the bottom arrow button to move it to the Column: box. Then, click the Define Range... button. Next, type a 1 in the Minimum value: box and a 4 in the Maximum value: box, then click the Update button. Then, click the Continue button.
Next, click on the Statistics... button. By default the following should be selected: Correspondence table, Overview of row points, and Overview of column points. Also select Row profiles and Column profiles, as well as Confidence Statistics for Row points and Column points. Then, click the Continue button.
Next, click on the Plots... button and select: Row points, Column points, Transformed row categories, and Transformed column categories. By default, the Biplot should already be selected. Next, click the Continue button, then click the OK button.
The output should be similar to what is displayed below.
The Correspondence Table displays the frequency for each combination of a family income category and a class standing category. It is essentially a cross-tabulation frequency table.
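A cross-tabulation like the Correspondence Table can be sketched in a few lines of Python. The sketch makes these assumptions:

- Each case is stored as a pair of integer category codes.
- Only codes inside the minimum and maximum entered in the Define Range dialogs (1 to 4 here) are kept. Treating out-of-range codes as excluded is an assumption about SPSS's behavior.
- The function name and the example codes are hypothetical.

```python
from collections import Counter


def correspondence_table(row_codes, col_codes, row_range=(1, 4), col_range=(1, 4)):
    """Cross-tabulate paired integer codes into a list-of-lists frequency table.

    Cases whose row or column code falls outside the inclusive ranges are
    dropped (assumed to mirror the Define Range minimum/maximum).
    """
    rows = range(row_range[0], row_range[1] + 1)
    cols = range(col_range[0], col_range[1] + 1)
    counts = Counter(
        (r, c) for r, c in zip(row_codes, col_codes) if r in rows and c in cols
    )
    # Counter returns 0 for combinations that never occur.
    return [[counts[(r, c)] for c in cols] for r in rows]


# Hypothetical codes: family_income bracket and class_standing for six cases.
income = [1, 1, 2, 4, 3, 5]     # the code 5 is outside 1-4 and is dropped
standing = [1, 2, 2, 4, 3, 1]
table = correspondence_table(income, standing)
for row in table:
    print(row)
```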
The Row Profiles table displays the proportions of each column value across each row. For instance, there are 23 freshmen out of all 207 students whose family income is 00000 - 25000; 23 is 11.1% of 207. The Mass values across the bottom refer to each column's proportion of the total sample size. For instance, the 213 freshmen represent 14.2% of the 1500 student total sample.
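The arithmetic behind the Row Profiles table can be written out directly:

- A row profile is each cell divided by its row total.
- A column mass is each column total divided by the grand total.

The helper names below are hypothetical. The two printed percentages reproduce the figures quoted above (23 of 207, and 213 of 1500).

```python
def row_profiles(table):
    """Divide each cell by its row total, so every row profile sums to 1."""
    return [[cell / sum(row) for cell in row] for row in table]


def column_masses(table):
    """Each column total as a proportion of the grand total (the bottom Mass row)."""
    n = sum(sum(row) for row in table)
    return [sum(col) / n for col in zip(*table)]


# Figures quoted in the text.
print(f"{23 / 207:.1%}")    # freshmen within the 00000 - 25000 income row -> 11.1%
print(f"{213 / 1500:.1%}")  # freshmen as a share of all 1500 students -> 14.2%
```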
The Column Profiles table displays the proportions of each row value down each column. For instance, 23 of all 213 freshmen have a family income of 00000 - 25000; 23 is 10.8% of 213. The Mass values down the right-most column represent each row's proportion of the total sample size. For instance, the 207 students whose family income is 00000 - 25000 represent 13.8% of the 1500 student total sample.
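The Column Profiles table mirrors the row side:

- A column profile is each cell divided by its column total.
- A row mass is each row total divided by the grand total.

The helper names are hypothetical. The printed percentages reproduce the figures quoted above (23 of 213, and 207 of 1500).

```python
def column_profiles(table):
    """Divide each cell by its column total, so every column profile sums to 1."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[cell / total for cell, total in zip(row, col_totals)] for row in table]


def row_masses(table):
    """Each row total as a proportion of the grand total (the right-most Mass column)."""
    n = sum(sum(row) for row in table)
    return [sum(row) / n for row in table]


# Figures quoted in the text.
print(f"{23 / 213:.1%}")    # 00000 - 25000 income within the freshman column -> 10.8%
print(f"{207 / 1500:.1%}")  # 00000 - 25000 income as a share of all 1500 students -> 13.8%
```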
The Summary table displays a variety of useful information. First, we see that 3 dimensions were derived, but only two are interpretable (i.e., only two dimensions account for a supposedly meaningful proportion of the total inertia). The Singular Value column displays the canonical correlation between the two variables for each dimension; squaring a dimension's singular value gives its inertia. The Inertia column displays the inertia value for each dimension and the total inertia value. The total inertia equals the chi-square statistic divided by the sample size. It measures how far the observed correspondence table departs from what would be expected if the two variables were independent, and all of the dimensions together account for all of it. Each dimension's inertia value, thus, refers to the part of that total which is accounted for by that dimension. So, for instance, our total inertia is only 0.009, a very small departure from independence, and dimension 1 accounts for 0.008 of it. The chi-square test is testing the hypothesis that the total inertia value is not different than zero. Here, our Sig. or p-value is greater than 0.05 (a common cutoff value), which indicates our total inertia value is not significantly different than zero. Keep in mind, this chi-square is not a model fit statistic; it does not lend itself to comparing models with different variables, as chi-square often is used. It is only testing the total inertia against zero. The Proportion of Inertia columns represent the proportion of total inertia for each dimension; for example, dimension 1 (.008) accounts for 86.6% of total inertia. The Standard Deviation column refers to the standard deviation of the singular value(s), and the Correlation column refers to the correlation between dimensions.
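The Summary-table quantities can be reproduced from a singular value decomposition (SVD) of the standardized residuals of the correspondence table. This is the standard textbook formulation of correspondence analysis, sketched here with NumPy; it is not taken from SPSS's internals. Under this formulation:

- Each dimension's inertia is its squared singular value.
- The dimension inertias sum to the total inertia.
- The total inertia times N is the Pearson chi-square statistic.

For example, with N = 1500 a total inertia of about 0.009 corresponds to a chi-square of roughly 13.5 (approximate, since the inertia is rounded). The usual independence test uses (4 - 1) x (4 - 1) = 9 degrees of freedom. The function name and the 4 x 4 counts are hypothetical.

```python
import numpy as np


def ca_summary(table):
    """Summary-table quantities of a simple correspondence analysis."""
    counts = np.asarray(table, dtype=float)
    n = counts.sum()
    P = counts / n                       # correspondence matrix (cell proportions)
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    E = np.outer(r, c)                   # proportions expected under independence
    S = (P - E) / np.sqrt(E)             # standardized residuals
    sv = np.linalg.svd(S, compute_uv=False)
    k = min(counts.shape) - 1            # number of non-trivial dimensions
    sv = sv[:k]
    inertia = sv ** 2                    # inertia of each dimension
    total = inertia.sum()                # total inertia (assumed > 0 here)
    return {
        "singular_values": sv,
        "inertia": inertia,
        "total_inertia": total,
        "chi_square": n * total,
        "proportion_of_inertia": inertia / total,
    }


toy = [
    [20, 25, 30, 25],
    [30, 30, 35, 30],
    [25, 35, 30, 35],
    [25, 30, 30, 40],
]
summary = ca_summary(toy)
print(summary["singular_values"])
print(summary["proportion_of_inertia"])
```

For a 4 x 4 table the sketch keeps min(4, 4) - 1 = 3 dimensions, the same count the Summary table reports.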
The Overview Row Points table displays values which allow the researcher to evaluate how each row contributes to the dimensions and how each dimension contributes to each row. The Mass (as mentioned above) is simply the proportion of each row to the total (1500). The Score in Dimension columns display each row's score on dimension 1 and dimension 2. The scores are derived from the proportions (mass) for each cell, column, and row compared to the total sample. They are representative of dimensional distance and are used in the graphs below. The Inertia column shows how much of the total inertia each row accounts for; the row inertias sum to the total inertia. The Contribution Of Point to Inertia of Dimension columns show the role each row plays in each dimension; these are analogous to factor or component loadings. The Contribution Of Dimension to Inertia of Point columns show the role each dimension plays in each row. These are not the inverse or opposite of the previous two columns, because each dimension is composed of multiple points. The Total column represents the sum of each dimension's role in the row.
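The contribution columns can be sketched from the same SVD of standardized residuals, in the standard formulation:

- A row's Contribution Of Point to Inertia of Dimension equals the squared entry of the left singular vector for that row and dimension.
- The Contribution Of Dimension to Inertia of Point is that dimension's share of the row's own inertia.

These contributions do not depend on how the scores are normalized. The principal coordinates `F` below may differ in scale from the Score in Dimension values SPSS prints, which depend on the normalization option chosen, and the sign of each dimension is arbitrary. Function names and counts are hypothetical.

```python
import numpy as np


def row_point_overview(table):
    """Row masses, principal coordinates, inertias and contributions."""
    counts = np.asarray(table, dtype=float)
    P = counts / counts.sum()
    r = P.sum(axis=1)                                  # row masses
    c = P.sum(axis=0)                                  # column masses
    E = np.outer(r, c)
    S = (P - E) / np.sqrt(E)                           # standardized residuals
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    k = min(counts.shape) - 1                          # non-trivial dimensions
    U, sv = U[:, :k], sv[:k]
    F = U * sv / np.sqrt(r)[:, None]                   # principal row coordinates
    point_inertia = (S ** 2).sum(axis=1)               # each row's part of total inertia
    point_to_dim = U ** 2                              # each column sums to 1
    dim_to_point = (U * sv) ** 2 / point_inertia[:, None]  # each row sums to 1 over all k
    return r, F, point_inertia, point_to_dim, dim_to_point


toy = [
    [20, 25, 30, 25],
    [30, 30, 35, 30],
    [25, 35, 30, 35],
    [25, 30, 30, 40],
]
mass, scores, inertia, point_to_dim, dim_to_point = row_point_overview(toy)
print(np.round(point_to_dim[:, :2], 3))  # first two dimensions, as in the SPSS table
```

Summed over only the two retained dimensions, the Contribution Of Dimension to Inertia of Point values give the Total column and can fall below 1.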
The Overview Column Points table displays values which allow the researcher to evaluate how each column contributes to the dimensions and how each dimension contributes to each column. The Mass (as mentioned above) is simply the proportion of each column to the total (1500). The Score in Dimension columns display each column's score on dimension 1 and dimension 2. The scores are derived from the proportions (mass) for each cell, column, and row compared to the total sample. They are representative of dimensional distance and are used in the graphs below. The Inertia column shows how much of the total inertia each column accounts for; the column inertias also sum to the total inertia. The Contribution Of Point to Inertia of Dimension columns show the role each column plays in each dimension; these are analogous to factor or component loadings. The Contribution Of Dimension to Inertia of Point columns show the role each dimension plays in each column. These are not the inverse or opposite of the previous two columns, because each dimension is composed of multiple points. The Total column represents the sum of each dimension's role in the column.
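The column-side overview mirrors the row side but uses the right singular vectors of the standardized residual matrix. As a check, the column inertias add up to the same total inertia (chi-square divided by N) as the row inertias. Like the other sketches here, it uses hypothetical names and counts and the standard SVD formulation. Its coordinates may differ in scale and sign from SPSS's printed scores.

```python
import numpy as np


def column_point_overview(table):
    """Column masses, principal coordinates, inertias and contributions."""
    counts = np.asarray(table, dtype=float)
    P = counts / counts.sum()
    r = P.sum(axis=1)                          # row masses
    c = P.sum(axis=0)                          # column masses
    E = np.outer(r, c)
    S = (P - E) / np.sqrt(E)                   # standardized residuals
    _, sv, Vt = np.linalg.svd(S, full_matrices=False)
    k = min(counts.shape) - 1                  # non-trivial dimensions
    V, sv = Vt[:k].T, sv[:k]                   # one row of V per table column
    G = V * sv / np.sqrt(c)[:, None]           # principal column coordinates
    point_inertia = (S ** 2).sum(axis=0)       # each column's part of total inertia
    point_to_dim = V ** 2                      # each column of this array sums to 1
    dim_to_point = (V * sv) ** 2 / point_inertia[:, None]
    return c, G, point_inertia, point_to_dim, dim_to_point


toy = [
    [20, 25, 30, 25],
    [30, 30, 35, 30],
    [25, 35, 30, 35],
    [25, 30, 30, 40],
]
mass, scores, inertia, point_to_dim, dim_to_point = column_point_overview(toy)
print(np.round(inertia, 5))
```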
The confidence points tables display the standard deviation of each point's dimension scores, as well as the correlation between each point's dimension scores. Recall that the scores themselves are displayed in the previous tables (above).
The first two graphs show the score for each category of Family Income on dimension 1 and dimension 2.
The next two graphs show the score for each category of Class Standing on dimension 1 and dimension 2.
The next two graphs show the scores for each category on both dimensions (at once) for Family Income and Class Standing.
Finally, the correspondence map shows each category's scores on both dimensions at once, for both family income and class standing. Now we can see the scores as measures of distance on the two interpreted dimensions of our model. The scores allow us to compare categories across variables in what is, in this case, two-dimensional space. Correlation is a standardized measure of relationship between two (typically) continuous variables. Correspondence, by contrast, is a standardized measure of relationship (in spatial terms) between categories of multiple variables (in this case two). It is important to note that the dimensions are empirically derived axes or eigenvectors, not simply the variables entered into the analysis. So, we could say that Juniors appear to have family incomes between 50 and 75 thousand dollars. BUT, given our total inertia value of 0.009, which is not significantly different from zero, we really cannot have confidence in this data's ability to offer conclusions about the general population. The model is not good at all. A total inertia of only 0.009, summed over all three dimensions (only two of which were interpreted), shows that the original correspondence table departs very little from independence.
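The points on a correspondence map can be sketched as follows. The sketch assumes a symmetric normalization, in which both row and column coordinates are scaled by the square root of each singular value. This is one common convention; SPSS offers several normalization options, so the exact scale of its plotted scores may differ, and the sign of each axis is arbitrary. Only the first two dimensions are kept, matching the two interpreted dimensions. The category labels and counts are hypothetical.

```python
import numpy as np


def symmetric_map_coordinates(table, n_dims=2):
    """Row and column coordinates for a correspondence map (symmetric scaling)."""
    counts = np.asarray(table, dtype=float)
    P = counts / counts.sum()
    r = P.sum(axis=1)                                   # row masses
    c = P.sum(axis=0)                                   # column masses
    E = np.outer(r, c)
    S = (P - E) / np.sqrt(E)                            # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    scale = np.sqrt(sv[:n_dims])                        # square root of singular values
    rows = U[:, :n_dims] * scale / np.sqrt(r)[:, None]
    cols = Vt[:n_dims].T * scale / np.sqrt(c)[:, None]
    return rows, cols


toy = [
    [20, 25, 30, 25],
    [30, 30, 35, 30],
    [25, 35, 30, 35],
    [25, 30, 30, 40],
]
row_labels = ["income 1", "income 2", "income 3", "income 4"]  # hypothetical labels
col_labels = ["freshman", "sophomore", "junior", "senior"]
row_xy, col_xy = symmetric_map_coordinates(toy)
for label, (x, y) in zip(row_labels + col_labels, np.vstack([row_xy, col_xy])):
    print(f"{label:>10}: ({x: .3f}, {y: .3f})")
```

The resulting coordinates could be passed to a plotting routine such as matplotlib's `scatter` to draw the map. In both sets of points, the mass-weighted average is the origin.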
As with most of the tutorials / pages within this site, this page should not be considered an exhaustive review of the topic covered, and it should not be considered a substitute for a good textbook.
Contact Information
Jon Starkweather, PhD Jonathan.Starkweather@unt.edu 940-565-4066
Richard Herrington, PhD Richard.Herrington@unt.edu 940-565-2140
Last updated: 2018.11.27 by Jon Starkweather.