You are on page 1of 22

Discriminant Analysis – Basic

Relationships

DISCRIMINANT FUNCTIONS AND SCORES

DESCRIBING RELATIONSHIPS

CLASSIFICATION ACCURACY

SAMPLE PROBLEMS

STEPS IN SOLVING PROBLEMS


Discriminant analysis

Discriminant analysis is used to analyze relationships


between a non-metric dependent variable and metric or
dichotomous independent variables.

Discriminant analysis attempts to use the independent


variables to distinguish among the groups or categories
of the dependent variable.

The usefulness of a discriminant model is based upon its


accuracy rate, or ability to predict the known group
memberships in the categories of the dependent variable.
Discriminant scores

Discriminant analysis works by creating a new


variable called the discriminant function score which
is used to predict to which group a case belongs.
Discriminant function scores are computed similarly
to factor scores, i.e. using eigenvalues. The
computations find the coefficients for the
independent variables that maximize the measure of
distance between the groups defined by the
dependent variable.
The discriminant function is similar to a regression
equation in which the independent variables are
multiplied by coefficients and summed to produce a
score.
Discriminant functions

Conceptually, we can think of the discriminant


function or equation as defining the boundary
between groups.

Discriminant scores are standardized, so that if the


score falls on one side of the boundary (standard
score less than zero, the case is predicted to be a
member of one group) and if the score falls on the
other side of the boundary (positive standard score),
it is predicted to be a member of the other group.
Number of functions

 If the dependent variable defines two groups, one statistically


significant discriminant function is required to distinguish the
groups; if the dependent variable defines three groups, two
statistically significant discriminant functions are required to
distinguish among the three groups; etc.

 If a discriminant function is able to distinguish among groups,


it must have a strong relationship to at least one of the
independent variables.

 The number of possible discriminant functions in an analysis


is limited to the smaller of the number of independent
variables or one less than the number of groups defined by the
dependent variable.
Overall test of relationship

The overall test of relationship among the independent


variables and groups defined by the dependent variable is
a series of tests that each of the functions needed to
distinguish among the groups is statistically significant.

In some analyses, we might discover that two or more of


the groups defined by the dependent variable cannot be
distinguished using the available independent variables.
While it is reasonable to interpret a solution in which
there are fewer significant discriminant functions than
the maximum number possible, our problems will
require that all of the possible discriminant functions be
significant.
Interpreting the relationship between independent
and dependent variables
The interpretative statement about the relationship
between the independent variable and the
dependent variable is a statement like: cases in
group A tended to have higher scores on variable X
than cases in group B or group C.

This interpretation is complicated by the fact that


the relationship is not direct, but operates through
the discriminant function.

Dependent variable groups are distinguished by


scores on discriminant functions, not on values of
independent variables. The scores on functions are
based on the values of the independent variables that
are multiplied by the function coefficients.
Groups, functions, and variables

 To interpret the relationship between an independent variable and


the dependent variable, we must first identify how the discriminant
functions separate the groups, and then the role of the independent
variable is for each function.

 SPSS provides a table called "Functions at Group Centroids"


(multivariate means) that indicates which groups are separated by
which functions.

 SPSS provides another table called the "Structure Matrix" which,


like its counterpart in factor analysis, identifies the loading, or
correlation, between each independent variable and each function.
This tells us which variables to interpret for each function. Each
variable is interpreted on the function that it loads most highly on.
Which independent variables to interpret

In a simultaneous discriminant analysis, in which all


independent variables are entered together, we only
interpret the relationships for independent variables that
have a loading of 0.30 or higher one or more
discriminant functions. A variable can have a high
loading on more than one function, which complicates
the interpretation. We will interpret the variable for the
function on which it has the highest loading.

In a stepwise discriminant analysis, we limit the


interpretation of relationships between independent
variables and groups defined by the dependent variable
to those independent variables that met the statistical
test for inclusion in the analysis.
Discriminant analysis and classification
Discriminant analysis consists of two stages: in the
first stage, the discriminant functions are derived; in
the second stage, the discriminant functions are used
to classify the cases.
While discriminant analysis does compute
correlation measures to estimate the strength of the
relationship, these correlations measure the
relationship between the independent variables and
the discriminant scores.
A more useful measure to assess the utility of a
discriminant model is classification accuracy, which
compares predicted group membership based on the
discriminant model to the actual, known group
membership which is the value for the dependent
variable.
Evaluating usefulness for discriminant models

The benchmark that we will use to characterize a


discriminant model as useful is a 25% improvement over
the rate of accuracy achievable by chance alone.

Even if the independent variables had no relationship to


the groups defined by the dependent variable, we would
still expect to be correct in our predictions of group
membership some percentage of the time. This is
referred to as by chance accuracy.

The estimate of by chance accuracy that we will use is the


proportional by chance accuracy rate, computed by
summing the squared percentage of cases in each group.
Comparing accuracy rates
To characterize our model as useful, we compare the
cross-validated accuracy rate produced by SPSS to 25%
more than the proportional by chance accuracy.

The cross-validated accuracy rate is a one-at-a-time hold


out method that classifies each case based on a
discriminant solution for all of the other cases in the
analysis. It is a more realistic estimate of the accuracy
rate we should expect in the population because
discriminant analysis inflates accuracy rates when the
cases classified are the same cases used to derive the
discriminant functions.

Cross-validated accuracy rates are not produced by SPSS


when separate covariance matrices are used in the
classification, which we address more next week.
Steps in discriminant analysis: 1

Question: Variables included in the analysis satisfy the level of


measurement requirements?

Dependent non-metric? No Inappropriate


Independent variables application of
metric or dichotomous? a statistic

Yes

Ordinal independent Yes


variable included in True with caution
analysis?

No

True
Steps in discriminant analysis: 2a

Question: Number of variables and cases satisfy sample size


requirements?

Run discriminant analysis, using method for including


variables identified in the research question.

Ratio of cases to No Inappropriate


independent variables at application of
least 5 to 1?
a statistic

Yes

Number of cases in
smallest group greater No Inappropriate
than number of application of
independent variables? a statistic

Yes
Steps in discriminant analysis: 2b

Question: Number of variables and cases satisfy sample size


requirements? (continued)

Satisfies preferred ratio of No


cases to IV's of 20 to 1 True with caution

Yes

Satisfies preferred DV
No
group minimum size of 20
cases? True with caution

Yes

True
Steps in discriminant analysis: 3

Question: Sufficient statistically significant functions to


differentiate among groups?

Sufficient statistically No
significant functions to False
distinguish DV groups?

Yes

Caution for ordinal variable Yes


or sample size not meeting
preferred requirements?
True with caution

No

True
Steps in discriminant analysis: 4

Question: Groups defined by dependent variable differentiated


by discriminant functions?

Pattern of functions No
evaluated at centroids False
correctly interpreted?

Yes

True
Steps in discriminant analysis: 5a

Question: Interpretation of relationship between independent


variable and dependent variable groups?

Stepwise method of entry


used to include
independent variables?
Yes

No
Best subset of predictors
correctly identified?
No

False
Yes

Relationships between No
individual IVs and DV groups False
interpreted correctly?

Yes
Steps in discriminant analysis: 6b

Question: Interpretation of relationship between independent


variable and dependent variable groups? (cont’d)

Yes

Caution for ordinal variable Yes


or sample size not meeting True with caution
preferred requirements?

No

True
Steps in discriminant analysis: 7

Question: Classification accuracy sufficient to be characterized


as a useful model?

Cross-validated accuracy is No
25% higher than proportional False
by chance accuracy rate?

Yes
Steps in discriminant analysis: 8a

Question: Summary of findings correctly stated, including


cautions?

Overall relationship No
correctly stated (significant False
function)?

Yes

No
Individual relationship with
IV and DV correctly stated?
False

Yes

No
Classification accuracy False
supports useful model?

Yes
Steps in discriminant analysis: 8b

Question: Summary of findings correctly stated, including


cautions? (continued)

Caution for ordinal variable Yes


or sample size not meeting True with caution
preferred requirements?

No

True

You might also like