Strategy For Complete Discriminant Analysis
Outliers
Multicollinearity
Validation
Sample problem
We will use the script for testing normality. For any variable that fails to satisfy the criteria for normality, we will substitute the log, square root, or inverse transformation when that transformation induces normality.
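The substitution rule above can be sketched in a few lines. This sketch makes several assumptions that are not stated in the source. It treats "satisfies the criteria for normality" as skewness and excess kurtosis both falling within -1 to +1. It uses simple moment-based estimators, which differ slightly from the ones SPSS reports. It also shifts the data so that its minimum is 1 before transforming. The names `looks_normal` and `pick_transformation` are hypothetical.

```python
import math

def skew_kurtosis(xs):
    """Moment-based skewness and excess kurtosis (population formulas)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def looks_normal(xs, limit=1.0):
    """Assumed criterion: |skewness| and |excess kurtosis| both <= limit."""
    skew, kurt = skew_kurtosis(xs)
    return abs(skew) <= limit and abs(kurt) <= limit

# Candidate transformations, tried in the order the text lists them.
TRANSFORMS = {
    "log": math.log,
    "square root": math.sqrt,
    "inverse": lambda x: 1.0 / x,
}

def pick_transformation(xs):
    """Return "none", the first transformation that induces normality, or None."""
    if looks_normal(xs):
        return "none"
    # Hypothetical shift so every value is >= 1 (log and inverse need positive values).
    shift = 1 - min(xs) if min(xs) < 1 else 0
    shifted = [x + shift for x in xs]
    for name, transform in TRANSFORMS.items():
        if looks_normal([transform(x) for x in shifted]):
            return name
    return None  # no transformation helped; keep the original variable
```

Returning the first transformation that passes is a deliberate simplification. A fuller version might compare all three transformations and keep the one closest to normal.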
If we reject the null hypothesis and conclude that the variances are heterogeneous, we substitute separate-groups covariance matrices in the classification and evaluate whether our classification accuracy improves.
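The null hypothesis here is that the groups share one covariance matrix. A common test for it is Box's M, and the sketch below computes the textbook version with numpy. It assumes each group is an array with one row per case and one column per independent variable. SPSS reports an F approximation for Box's M, so the chi-square value here is an alternative approximation and will not match SPSS output exactly. The function name `box_m` is hypothetical.

```python
import numpy as np

def box_m(groups):
    """Box's M with its chi-square approximation.

    groups: list of (n_i x p) arrays, one per dependent-variable group.
    Returns (M, chi_square, df).
    """
    groups = [np.asarray(x, dtype=float) for x in groups]
    g = len(groups)
    p = groups[0].shape[1]
    ns = [x.shape[0] for x in groups]
    n_total = sum(ns)
    # Sample covariance matrices (divisor n_i - 1), kept 2-D even when p == 1.
    covs = [np.atleast_2d(np.cov(x, rowvar=False)) for x in groups]
    pooled = sum((n - 1) * s for n, s in zip(ns, covs)) / (n_total - g)
    logdet_pooled = np.linalg.slogdet(pooled)[1]
    m = (n_total - g) * logdet_pooled - sum(
        (n - 1) * np.linalg.slogdet(s)[1] for n, s in zip(ns, covs)
    )
    # Small-sample correction factor for the chi-square approximation.
    c = ((2 * p * p + 3 * p - 1) / (6.0 * (p + 1) * (g - 1))) * (
        sum(1.0 / (n - 1) for n in ns) - 1.0 / (n_total - g)
    )
    df = p * (p + 1) * (g - 1) // 2
    return m, m * (1 - c), df
```

A significant result, for example a small `scipy.stats.chi2.sf(chi_square, df)`, corresponds to rejecting the null hypothesis. That is the case in which the text re-runs the classification with separate-groups covariance matrices.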
Assumption of homogeneity of variance - 2
According to the SPSS Base 10.0 Applications Guide, page 259, "cases
with large values of Mahalanobis distance from their group mean can
be identified as outliers."
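To make the quoted definition concrete, here is a minimal numpy sketch of the squared Mahalanobis distance of each case from its own group mean. It uses that group's sample covariance matrix. SPSS may compute its distances differently, for example in discriminant-function space. Treat this as the textbook definition, not a reproduction of SPSS output. The function name is hypothetical.

```python
import numpy as np

def mahalanobis_sq_to_group_mean(group):
    """Squared Mahalanobis distance of each row from the group's mean vector."""
    x = np.asarray(group, dtype=float)
    centered = x - x.mean(axis=0)
    s_inv = np.linalg.inv(np.cov(x, rowvar=False))
    # Row-wise quadratic form: d2[i] = centered[i] @ s_inv @ centered[i]
    return np.einsum("ij,jk,ik->i", centered, s_inv, centered)
```

Large values mark candidate outliers. As a sanity check, the distances within one group always sum to (n - 1) times the number of variables.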
Second, type in 3 in the Maximum text box.
Third, click on the Continue button to close the dialog box.
Click on the OK button to request the output for the discriminant analysis.
Sample size – ratio of cases to variables
evidence and answer
Classification Results
Descriptives
Min. D Squared
Second, we scroll down the list of SPSS functions to highlight the one we need: IDF.CHISQ(p, df).
Third, we click on the up arrow button to move the function to the Numeric Expression text box.
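IDF.CHISQ(p, df) returns the value of the chi-square distribution with df degrees of freedom that has cumulative probability p. Outside SPSS, `scipy.stats.chi2.ppf(p, df)` computes the same quantity. The dependency-free sketch below does it with a series for the chi-square CDF and bisection. The cutoff probability 0.999 and the use of the number of independent variables as df are illustrative assumptions, not values taken from this section. All function names are hypothetical.

```python
import math

def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a            # n = 0 term of sum z**n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total

def chi2_ppf(p, df):
    """Inverse chi-square CDF by bisection (analogue of SPSS IDF.CHISQ)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:   # widen the bracket until it contains p
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def flag_outliers(d2_values, n_vars, p=0.999):
    """Indexes of cases whose D squared exceeds the chi-square critical value."""
    critical = chi2_ppf(p, n_vars)
    return [i for i, d2 in enumerate(d2_values) if d2 > critical]
```

For df = 2 the chi-square CDF has the closed form 1 - exp(-x/2). That gives an easy check: chi2_ppf(0.999, 2) should equal -2 ln(0.001), about 13.82.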
Completing the function arguments
To complete the request, we click on the OK button.
The omitted outlier
Classification Results
The values at group centroids for the first discriminant function were positive for the group who thought we spend about the right amount of money on welfare (.446) and negative for the group who thought we spend too little money on welfare (-.220) and the group who thought we spend too much money on welfare (-.311). This pattern distinguishes survey respondents who thought we spend about the right amount of money on welfare from survey respondents who thought we spend too little or too much money on welfare.

The values at group centroids for the second discriminant function were positive for the group who thought we spend too little money on welfare (.235) and negative for the group who thought we spend too much money on welfare (-.362). This pattern distinguishes survey respondents who thought we spend too little money on welfare from survey respondents who thought we spend too much money on welfare.

The answer to the question is true.
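The interpretation above reads each function by the sign of its group centroids. The sketch below makes that rule explicit using only the centroid values reported here. For the second function, only the two centroids the text names are included. It also runs a quick consistency check that assumes, as in SPSS, that discriminant scores are centered on the grand mean. Under that assumption, the centroids of the first function, weighted by the group sizes in the analysis (56, 50, 32), should average to roughly zero. The helper names are hypothetical.

```python
# Centroid values reported in the text; function 2 lists only the groups it names.
CENTROIDS = {
    1: {"about right": 0.446, "too little": -0.220, "too much": -0.311},
    2: {"too little": 0.235, "too much": -0.362},
}
GROUP_SIZES = {"too little": 56, "about right": 50, "too much": 32}

def contrast(centroids):
    """Split groups by the sign of their centroid on one function."""
    positive = sorted(g for g, v in centroids.items() if v > 0)
    negative = sorted(g for g, v in centroids.items() if v < 0)
    return positive, negative

def size_weighted_mean(centroids, sizes):
    """Mean of the centroids weighted by group size (should be near 0)."""
    total = sum(sizes[g] for g in centroids)
    return sum(v * sizes[g] for g, v in centroids.items()) / total

for fn, cents in CENTROIDS.items():
    pos, neg = contrast(cents)
    print(f"Function {fn}: {pos} versus {neg}")

# Only function 1 reports all three centroids, so only it can be checked.
print(round(size_weighted_mean(CENTROIDS[1], GROUP_SIZES), 4))
```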
Best subset of predictors - question
Self-employment can be characterized as the second-best predictor.
Relationship of second independent variable – evidence and answer: loadings on functions
Since "self-employment" is a dichotomous variable, the mean is not directly interpretable. Its interpretation must take into account the coding, by which 1 corresponds to self-employed and 2 corresponds to working for someone else. The higher mean for survey respondents who thought we spend too little money on welfare (mean=1.93), compared with the mean for survey respondents who thought we spend too much money on welfare (mean=1.75), implies that the group who thought we spend too little contained proportionally fewer survey respondents who were self-employed and more survey respondents who were working for someone else.
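The step from "higher mean" to "fewer self-employed" follows from the 1/2 coding. The mean equals 1 plus the proportion of cases coded 2. The short sketch below turns the two reported means into group proportions. The function name is hypothetical.

```python
def share_coded_two(mean):
    """With a 1/2-coded variable, mean = 1 + proportion coded 2."""
    return mean - 1.0

for label, mean in [("too little", 1.93), ("too much", 1.75)]:
    working_for_others = share_coded_two(mean)
    self_employed = 1.0 - working_for_others
    print(f"{label}: {working_for_others:.0%} work for someone else, "
          f"{self_employed:.0%} self-employed")
```

So roughly 7% of the "too little" group and 25% of the "too much" group were self-employed.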
WELFARE         Prior    Cases Used in Analysis
                         Unweighted   Weighted
1 TOO LITTLE    .406     56           56.000
2 ABOUT RIGHT   .362     50           50.000
3 TOO MUCH      .232     32           32.000
Total           1.000    138          138.000
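The Prior column in this table equals each group's share of the 138 cases. That is what you get when prior probabilities are computed from group sizes. A quick check:

```python
# Unweighted case counts from the table above.
group_sizes = {"TOO LITTLE": 56, "ABOUT RIGHT": 50, "TOO MUCH": 32}
total = sum(group_sizes.values())

# Prior probability of each group = its share of all cases, rounded as in the table.
priors = {group: round(n / total, 3) for group, n in group_sizes.items()}
print(total, priors)
```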
Classification Results
Complete discriminant analysis: sample size requirements - 1
Number of cases in smallest group greater than number of independent variables?
No: Inappropriate application of a statistic.
Yes: continue.
Complete discriminant analysis: sample size requirements - 2
Satisfies preferred DV group minimum size of 20 cases?
No: True with caution.
Yes: True.
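The two sample-size decisions above can be expressed as one function. This excerpt does not state the number of independent variables, so the value 4 in the usage line is hypothetical. The group sizes 56, 50, and 32 come from the table of cases used in the analysis. The function name is also hypothetical.

```python
def sample_size_verdict(group_sizes, n_predictors, preferred_min=20):
    """Apply the two sample-size rules from the flowcharts."""
    smallest = min(group_sizes)
    # Rule 1: smallest group must be larger than the number of independent variables.
    if smallest <= n_predictors:
        return "inappropriate application of a statistic"
    # Rule 2: preferred minimum of 20 cases in every DV group.
    if smallest < preferred_min:
        return "true with caution"
    return "true"

print(sample_size_verdict([56, 50, 32], n_predictors=4))  # hypothetical predictor count
```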
Complete discriminant analysis: assumption of normality
The variable satisfies criteria for a normal distribution?
No: False.
Yes: True.
Run a revised discriminant analysis using transformed variables and omitting outliers.
Complete discriminant analysis: Model selected for interpretation
Cross-validated accuracy for revised discriminant analysis > accuracy of baseline by 2% or more?
Yes: True.
No: False.
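The model-selection rule above compares cross-validated accuracies. The sketch below assumes that "by 2% or more" means 2 percentage points, with accuracies given in percent. A relative reading of the rule is also possible. The accuracy values in the usage line are hypothetical, and so is the function name.

```python
def select_model(baseline_cv_accuracy, revised_cv_accuracy, min_gain=2.0):
    """Choose the revised model only if it beats the baseline by min_gain points.

    Assumption: accuracies are percentages and the gain is in percentage points.
    """
    if revised_cv_accuracy - baseline_cv_accuracy >= min_gain:
        return "revised"
    return "baseline"

print(select_model(50.0, 52.0))  # hypothetical accuracies
```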
Complete discriminant analysis: Assumption of equal dispersion
Assumption of equal dispersion satisfied?
No: Re-run the discriminant analysis using separate-groups covariance matrices for classification. True.
Yes: True.
Complete discriminant analysis:
Sufficient statistically significant functions to distinguish DV groups?
No: False.
Yes: True.
Complete discriminant analysis: groups differentiated by functions
Pattern of functions evaluated at centroids correctly interpreted?
No: False.
Yes: True.
Complete discriminant analysis: individual relationships - 1
Best subset of predictors correctly identified?
No: False.
Yes: Relationships between individual IVs and DV groups interpreted correctly?
No: False.
Yes: continue.
Complete discriminant analysis: individual relationships - 2
Yes: True.
Complete discriminant analysis: classification accuracy
Cross-validated accuracy is 25% higher than proportional by chance accuracy rate?
No: False.
Yes: True.
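The proportional by chance accuracy rate is the sum of the squared group proportions. The criterion above requires cross-validated accuracy to be at least 1.25 times that rate. The sketch below uses the group sizes from the table of cases used in the analysis. The cross-validated accuracies in the usage lines are hypothetical, and so are the function names.

```python
def proportional_chance(group_sizes):
    """Proportional by chance accuracy: sum of squared group proportions."""
    total = sum(group_sizes)
    return sum((n / total) ** 2 for n in group_sizes)

def accuracy_is_useful(cv_accuracy, group_sizes):
    """True if cross-validated accuracy is at least 25% above the chance rate."""
    return cv_accuracy >= 1.25 * proportional_chance(group_sizes)

sizes = [56, 50, 32]  # TOO LITTLE, ABOUT RIGHT, TOO MUCH
chance = proportional_chance(sizes)
print(f"chance = {chance:.3f}, criterion = {1.25 * chance:.3f}")
print(accuracy_is_useful(0.45, sizes))  # hypothetical cross-validated accuracy
```

With these group sizes the chance rate is about .350, so the criterion is about .437.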
Complete discriminant analysis: validation
Cross-validated accuracy is 25% higher than proportional by chance accuracy rate?
No: False.
Yes: True.
Complete discriminant analysis: summary of findings - 1
Overall relationship correctly stated (significant function)?
No: False.
Yes: Individual relationship with IV and DV correctly stated?
No: False.
Yes: Classification accuracy supports useful model?
No: False.
Yes: continue.
Complete discriminant analysis: summary of findings - 2