
Discriminant Analysis

 A statistical technique that allows the researcher to determine which continuous
variables discriminate between two or more naturally occurring groups.
 Seeks to determine group membership from the predictor variables.

Two groups: discriminant analysis
More than two groups: multiple discriminant analysis (MDA)

Examples:

1. A medical researcher wants to investigate which variables discriminate between
smokers, ex-smokers, and non-smokers.
2. Predicting the success or failure of a new product.
3. Determining which category of credit risk a person falls into.
4. Classifying students according to vocational interests.
5. Deciding whether a student should be admitted to graduate school.
6. Predicting whether a company will succeed or not.

Checklist of requirements:

1. The dependent variable must be categorical (nominal).
2. The independent variables are metric (continuous).
3. The size of the sample has a direct impact on the stability of the results.
a. “Rule of thumb” suggests a ratio of 20 observations for each predictor variable
b. If group sizes do vary markedly, then randomly sample from the larger group(s) to
obtain group sizes comparable to the smaller group(s).

4. The allocation of cases to the dependent categories in the initial classification is known
to be correct.
5. The groups are mutually exclusive and collectively exhaustive.
6. Linearity:
an implicit assumption is that all relationships among all pairs of predictors within each
group are linear.
7. Multivariate normality:
scores on each predictor variable are normally distributed
8. Homogeneity of variance-covariance matrices:
a. When sample sizes are unequal and small, unequal covariance matrices can adversely
affect the results of significance testing.
b. A test of this assumption can be made via Box’s M (see the sketch after this list).
Because this test is overly sensitive (it increases the probability of a Type I error), an
alpha level of .001 is recommended.
9. Multicollinearity:
the predictor variables should not be highly correlated with one another.
10. Outliers:
MDA is highly sensitive to the presence of outliers. Eliminate significant outliers before
conducting MDA.
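As referenced in item 8b, Box’s M can also be computed outside SPSS. The following Python sketch is only an illustration of the test statistic and its chi-square approximation; the boxs_m_test function, the data array, and the group labels are hypothetical, not part of any SPSS output.

import numpy as np
from scipy.stats import chi2

def boxs_m_test(X, groups):
    """Box's M test for equality of group variance-covariance matrices.

    X      : (n, p) array of predictor values
    groups : (n,) array of group labels
    Returns M, its chi-square approximation, the degrees of freedom, and the p-value.
    """
    labels = np.unique(groups)
    k, p, N = len(labels), X.shape[1], X.shape[0]

    # Per-group covariance matrices and the pooled covariance matrix
    covs, ns = [], []
    for g in labels:
        Xg = X[groups == g]
        ns.append(len(Xg))
        covs.append(np.cov(Xg, rowvar=False))
    S_pooled = sum((n - 1) * S for n, S in zip(ns, covs)) / (N - k)

    # M statistic: compares log-determinants of group vs. pooled covariance matrices
    M = (N - k) * np.log(np.linalg.det(S_pooled)) - sum(
        (n - 1) * np.log(np.linalg.det(S)) for n, S in zip(ns, covs))

    # Box's chi-square approximation
    c = (sum(1.0 / (n - 1) for n in ns) - 1.0 / (N - k)) * \
        (2 * p**2 + 3 * p - 1) / (6.0 * (p + 1) * (k - 1))
    chi_sq = M * (1 - c)
    df = p * (p + 1) * (k - 1) / 2
    return M, chi_sq, df, chi2.sf(chi_sq, df)

# Hypothetical example: two groups, three predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
groups = np.repeat([0, 1], 30)
print(boxs_m_test(X, groups))   # compare the p-value with alpha = .001, as recommended above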

Discriminant Analysis Model:

D = b0 + b1X1 + b2X2 + b3X3 + . . . + bkXk

D = discriminant score
b's = discriminant coefficients or weights
X's = predictors or independent variables

 The coefficients, or weights (b), are estimated so that the groups differ as much as
possible on the values of the discriminant function.
 This occurs when the ratio of between-group sum of squares to within-group
sum of squares for the discriminant scores is at a maximum.
 The independent variables should be selected based on a theoretical model or
previous research, or the experience of the researcher.
 Direct Method: All the predictors (independent variables) are included
simultaneously, and the Discriminant function is then estimated.
 Stepwise Discriminant: The predictors are entered sequentially based on their
ability to discriminate among the groups.
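Outside SPSS, the direct method described above can be illustrated with scikit-learn’s LinearDiscriminantAnalysis, which estimates the coefficients for all predictors simultaneously. This is only a minimal sketch; the data, group labels, and sizes are hypothetical.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical data: 100 cases, 3 continuous predictors, 2 groups (0/1)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 3)),
               rng.normal(1.0, 1.0, size=(50, 3))])
y = np.repeat([0, 1], 50)

# Direct method: all predictors entered simultaneously
lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

print("Coefficients (b's):", lda.coef_)     # weights for X1..Xk
print("Intercept (b0):", lda.intercept_)
scores = lda.transform(X)                    # discriminant scores D for each case
print("First five discriminant scores:", scores[:5].ravel())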

Determine the Significance of Discriminant Function:


 This test is based on Wilks’ Lambda. The significance level is estimated based on
a chi-square statistic.
 If the null hypothesis is rejected, indicating significant discrimination, one can
proceed to interpret the results.
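For reference, Wilks’ Lambda and its chi-square approximation can be computed directly from the within-groups and total SSCP matrices. The sketch below is a minimal illustration on hypothetical data (the X and y arrays from the earlier sketch); SPSS reports the equivalent values in its output.

import numpy as np
from scipy.stats import chi2

def wilks_lambda(X, y):
    """Wilks' Lambda and Bartlett's chi-square approximation for discriminant analysis."""
    n, p = X.shape
    groups = np.unique(y)
    g = len(groups)

    grand_mean = X.mean(axis=0)
    T = (X - grand_mean).T @ (X - grand_mean)          # total SSCP matrix

    W = np.zeros((p, p))                                # within-groups SSCP matrix
    for gr in groups:
        Xg = X[y == gr]
        W += (Xg - Xg.mean(axis=0)).T @ (Xg - Xg.mean(axis=0))

    lam = np.linalg.det(W) / np.linalg.det(T)           # Wilks' Lambda
    chi_sq = -(n - 1 - (p + g) / 2.0) * np.log(lam)     # Bartlett's approximation
    df = p * (g - 1)
    return lam, chi_sq, df, chi2.sf(chi_sq, df)

# Usage with the hypothetical X, y from the sketch above:
# lam, chi_sq, df, p_value = wilks_lambda(X, y)
# Reject H0 (no discrimination between groups) when p_value < alpha.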

Interpret the Results:

 The interpretation of the discriminant weights, or coefficients, is similar to that in
multiple regression analysis.
 We can obtain some idea of the relative importance of the variables by examining
the absolute magnitude of the standardized discriminant function coefficients.
 Canonical loadings or discriminant loadings:
These simple correlations between each predictor and the discriminant function
represent the variance that the predictor shares with the function.

 A cutoff value (the average of the group centroids) for the discriminant function (D) is
determined to serve as the basis for future classification.
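To make the loadings and the cutoff concrete, here is a minimal Python sketch continuing the hypothetical two-group example; the data and the new case’s score are invented. Note that SPSS’s structure matrix uses pooled within-groups correlations, so its loadings can differ slightly from the total-sample correlations computed here.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical two-group data, as in the earlier sketch
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 3)),
               rng.normal(1.0, 1.0, size=(50, 3))])
y = np.repeat([0, 1], 50)

scores = LinearDiscriminantAnalysis().fit(X, y).transform(X).ravel()

# Discriminant loadings: simple correlation of each predictor with the discriminant scores
loadings = [np.corrcoef(X[:, j], scores)[0, 1] for j in range(X.shape[1])]
print("loadings:", np.round(loadings, 3))   # |loading| >= .30 is considered substantive

# Group centroids: mean discriminant score within each group
centroid_0, centroid_1 = scores[y == 0].mean(), scores[y == 1].mean()

# Cutoff = average of the two centroids (appropriate for equal group sizes)
cutoff = (centroid_0 + centroid_1) / 2.0

# Classify a hypothetical new case: assign it to the group whose centroid
# lies on the same side of the cutoff as its score
new_score = 0.35
if centroid_1 > centroid_0:
    predicted = 1 if new_score > cutoff else 0
else:
    predicted = 0 if new_score > cutoff else 1
print("cutoff:", round(cutoff, 3), "predicted group:", predicted)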

Assess the Validity of Discriminant Analysis:

 Compare the percentage of cases correctly classified by discriminant analysis to the
percentage that would be obtained by chance.
 Classification accuracy achieved by discriminant analysis should be at least 25% greater
than that obtained by chance.
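As a rough illustration of this rule of thumb, the sketch below compares a model’s hit ratio with one common chance benchmark, the proportional chance criterion (the sum of squared group proportions). The group sizes and the number of correctly classified cases are hypothetical.

import numpy as np

# Hypothetical group sizes and classification results
group_sizes = np.array([60, 40])      # two groups, 100 cases in total
correctly_classified = 78             # cases the discriminant function got right

hit_ratio = correctly_classified / group_sizes.sum()

# Proportional chance criterion: sum of squared group proportions
proportions = group_sizes / group_sizes.sum()
chance = np.sum(proportions ** 2)     # 0.6**2 + 0.4**2 = 0.52

# Rule of thumb: accuracy should be at least 25% greater than chance
threshold = 1.25 * chance             # 0.65
print(f"hit ratio = {hit_ratio:.2f}, chance = {chance:.2f}, threshold = {threshold:.2f}")
print("acceptable" if hit_ratio >= threshold else "not acceptable")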

SPSS:

Choose from menu [Analyze]  Classify  Discriminant…

Steps:

1. Move the dependent variable from the left pane to the “Grouping Variable” box
2. Press {Define Range}  enter values for (Minimum) & (Maximum)
3. Move the independent variables from the left pane to the “Independents” box
4. Press {Statistics}  Check (Means, Univariate ANOVAs, Box’s M)  Press {Continue}
5. Press {Classify}  Check (Replace missing values with mean), (Separate-groups) 
Press {Continue}
6. Press {Save}  Check (Predicted group membership, Discriminant scores, Probabilities of
group membership)  Press {Continue}
7. Press {OK}

Output:

1. Check the case counts and the number of missing cases

2. Check the significance of each predictor as a discriminating variable:

If any variable is not significant, it should be removed from the discriminant function

3. Check the Box’s M test (Homogeneity of variance-covariance matrices)


4. Check Wilks’ Lambda & the Canonical Discriminant Function:

a. Wilks’ Lambda: shows that the discriminant function is highly significant by
the chi-square test, χ² (df = 4) = 84.646, p < .001
b. Canonical Correlation: .659. Squaring this correlation (.659²) yields a value
of .434. Thus, 43.4% of the variance in the dependent variable (GROUP) can
be accounted for by this model that includes the five predictor variables.
c. Standardized Canonical Discriminant Function Coefficients:
i. Each coefficient represents the relative contribution of its associated predictor
variable to the discriminant function.
ii. Predictor variables with relatively larger coefficients contribute more to the
discriminating power of the function than do variables with smaller coefficients.
d. Structure Matrix: presents the correlation (discriminant loading) of each
predictor variable with the discriminant function.
i. In general, any variable with a loading of .30 or greater in absolute value is considered substantive.

5. Check the Group Centroids:


6. Check Classification Matrix:
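The classification matrix (also called the confusion matrix) cross-tabulates actual against predicted group membership; the proportion of cases on its diagonal is the hit ratio compared against chance in the validity check above. A minimal sketch using scikit-learn, continuing the hypothetical two-group example:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix

# Hypothetical two-group data, as in the earlier sketches
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 3)),
               rng.normal(1.0, 1.0, size=(50, 3))])
y = np.repeat([0, 1], 50)

lda = LinearDiscriminantAnalysis().fit(X, y)
y_pred = lda.predict(X)

# Classification (confusion) matrix: rows = actual group, columns = predicted group
print(confusion_matrix(y, y_pred))

# Hit ratio: proportion of correctly classified cases (the diagonal of the matrix)
hit_ratio = (y_pred == y).mean()
print(f"hit ratio = {hit_ratio:.2%}")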
