Discriminant Analysis
 Discriminant analysis addresses a classification
problem: two or more groups (clusters or
populations) are known a priori, and one or more
new observations are classified into one of the
known populations based on their measured
characteristics.
 Let us look at three different examples.
Example 1 - Swiss Bank Notes:
 We have two populations of bank notes, genuine, and
counterfeit. Six measures are taken on each note:
 Length
 Right-Hand Width
 Left-Hand Width
 Top Margin
 Bottom Margin
 Diagonal across the printed area
 Take a bank note of unknown origin and determine
just from these six measurements whether it is
genuine or counterfeit. Perhaps this is not as impractical
as it might sound. A more modern equivalent is a
scanner that measures the notes automatically
and makes a decision.
Example 2 - Pottery Data:
 Pottery shards are sampled from four sites: L)
Llanedyrn, C) Caldicot, I) Isle Thorns, and A)
Ashley Rails. The concentrations of the following
chemical constituents were measured at a laboratory:
 Al: Aluminum
 Fe: Iron
 Mg: Magnesium
 Ca: Calcium
 Na: Sodium
 An archaeologist encounters a pottery specimen of
unknown origin. To determine possible trade routes,
the archaeologist may wish to classify its site of
origin.
Example 3 - Insect Data:
 Data were collected on two species of insects in the
genus Chaetocnema: (a) Ch. concinna and (b) Ch.
heikertlingeri. Three variables were measured on each
insect:
 width of the 1st joint of the tarsus (legs)
 width of the 2nd joint of the tarsus
 Our objective is to obtain a classification rule for
identifying the insect species based on these three
variables. An entomologist can identify these two
closely related species, but the differences are so
subtle that considerable experience is needed to
tell them apart. If a classification rule
can be developed, it might offer a more accurate
way to differentiate between these two species.
Learning Objectives & Outcomes
 Upon completion of this lesson, you should
be able to do the following:
 Determine whether linear or quadratic
discriminant analysis should be applied to a
given data set;
 Carry out both types of
discriminant analysis using SAS/Minitab;
 Apply the linear discriminant
function to classify a subject by its
measurements;
 Understand how to assess the efficacy of a
discriminant analysis.
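To make the idea of applying a linear discriminant function concrete, here is a minimal sketch for two groups and two variables. It assumes equal prior probabilities, equal misclassification costs, and a common covariance matrix, and it uses made-up numbers rather than any data set from this lesson. The function names are illustrative only and do not correspond to SAS or Minitab procedures.

```python
def mean_vector(rows):
    """Mean of each of the two variables."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def pooled_covariance(group_1, group_2):
    """Pooled 2x2 within-group covariance matrix."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (group_1, group_2):
        m = mean_vector(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    total[i][j] += d[i] * d[j]
    dof = len(group_1) + len(group_2) - 2
    return [[total[i][j] / dof for j in range(2)] for i in range(2)]

def linear_discriminant(group_1, group_2):
    """Return coefficients a and a cutoff; assign group 1 when a . x > cutoff."""
    m1, m2 = mean_vector(group_1), mean_vector(group_2)
    s = pooled_covariance(group_1, group_2)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    # a = S_pooled^{-1} (mean_1 - mean_2)
    a = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Cutoff is the discriminant score at the midpoint of the two mean vectors
    mid = [(m1[0] + m2[0]) / 2, (m1[1] + m2[1]) / 2]
    cutoff = a[0] * mid[0] + a[1] * mid[1]
    return a, cutoff

def classify(x, a, cutoff):
    """Assign an observation to group 1 or group 2 from its two measurements."""
    return 1 if a[0] * x[0] + a[1] * x[1] > cutoff else 2

# Made-up training data: (variable 1, variable 2) for each subject
group_1 = [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0), (3.5, 4.5)]
group_2 = [(6.0, 1.0), (7.0, 1.5), (6.5, 2.0), (7.5, 2.5)]

a, cutoff = linear_discriminant(group_1, group_2)
print(classify((3.0, 4.0), a, cutoff))
print(classify((7.0, 2.0), a, cutoff))
```

With more than two variables the same rule applies, but the 2x2 inverse written out here would be replaced by a general matrix inverse.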
Prediction Accuracy
 A single interval variable might discriminate between groups
almost perfectly, not at all, or somewhere in between. For
example, to differentiate adult males from females, one could
record how many bras the person owns, the person's score on
the last statistics test, and the person's height.
 In the case of the number of bras, the discrimination would be
very good, but not perfect (some women don't own any bras,
some men do). In the case of the score on the last statistics
test, little discrimination would be possible because males and
females generally score about the same.
 In the case of height, some discrimination between adult
males and females would be possible, but it would be far
from perfect.
 In general, the larger the difference between the means of the
two groups relative to the within-group variability, the better
the discrimination between the groups.
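The relationship between mean difference, within-group variability, and discrimination can be sketched numerically. The example below is an illustration on simulated data, not the program referred to in this lesson. It computes the difference in means divided by the pooled within-group standard deviation, then the fraction of cases classified correctly by a cutoff halfway between the two means. The group means, standard deviation, and sample sizes are arbitrary assumptions.

```python
import random
import statistics

def separation(group_a, group_b):
    """Absolute mean difference divided by the pooled within-group SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return abs(statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

def midpoint_accuracy(group_a, group_b):
    """Fraction classified correctly by a cutoff halfway between the two means."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    cutoff = (mean_a + mean_b) / 2
    low, high = (group_a, group_b) if mean_a < mean_b else (group_b, group_a)
    correct = sum(x < cutoff for x in low) + sum(x >= cutoff for x in high)
    return correct / (len(group_a) + len(group_b))

rng = random.Random(0)
# Simulated data: same within-group SD (10), different gaps between the means
low_a = [rng.gauss(100, 10) for _ in range(500)]
low_b = [rng.gauss(102, 10) for _ in range(500)]
high_a = [rng.gauss(100, 10) for _ in range(500)]
high_b = [rng.gauss(130, 10) for _ in range(500)]

print(separation(low_a, low_b), midpoint_accuracy(low_a, low_b))
print(separation(high_a, high_b), midpoint_accuracy(high_a, high_b))
```

The accuracy here is computed on the same cases used to set the cutoff, so it is slightly optimistic.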
 The following program allows the student to explore data sets
with different degrees of discrimination ability.
Frequency Polygons and Means in
Discriminant Analysis
 The figure below shows the results of the program
when the discrimination is set to low.
 The next figure shows the results when the
discrimination is set to high.
 Note that the two frequency polygons overlap a great deal
when there is little or no discriminability between groups
and hardly at all when there is high discriminability.
 In the same vein, the means are fairly similar relative to
their standard deviations in the low discriminability
condition and different in the high discriminability
condition.
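One rough way to put a number on how much two frequency polygons overlap is to bin both groups on a common scale and add up, bin by bin, the smaller of the two relative frequencies. The sketch below does this on simulated data. The bin count, group means, and standard deviation are arbitrary assumptions, and the function names are illustrative.

```python
import random

def relative_frequencies(values, lo, hi, n_bins):
    """Proportion of values in each of n_bins equal-width bins on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # v == hi goes in the last bin
        counts[idx] += 1
    return [c / len(values) for c in counts]

def overlap(group_a, group_b, n_bins=20):
    """Sum over bins of the smaller relative frequency: 0 = none, 1 = identical."""
    lo = min(min(group_a), min(group_b))
    hi = max(max(group_a), max(group_b))
    freq_a = relative_frequencies(group_a, lo, hi, n_bins)
    freq_b = relative_frequencies(group_b, lo, hi, n_bins)
    return sum(min(a, b) for a, b in zip(freq_a, freq_b))

rng = random.Random(1)
# Simulated groups: small versus large gap between means, same spread
low_a = [rng.gauss(100, 10) for _ in range(500)]
low_b = [rng.gauss(102, 10) for _ in range(500)]
high_a = [rng.gauss(100, 10) for _ in range(500)]
high_b = [rng.gauss(130, 10) for _ in range(500)]

print(overlap(low_a, low_b))    # low discriminability: large overlap
print(overlap(high_a, high_b))  # high discriminability: small overlap
```

The value depends on the number of bins, so it is best used to compare conditions rather than as an absolute measure.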
Advantages of Discriminant Analysis
 Multiple dependent variables
 Reduced error rates
 Easier interpretation of between-group
differences: each discriminant function
measures something unique and different.
Disadvantages of Discriminant Analysis
 Interpretation of the discriminant functions:
 Can seem mystical, much like identifying factors in a factor analysis
 Assumptions:
– each discriminant function formed is distributed
normally in each group being compared.
– each discriminant function is assumed to show
approximately equal variances in each group.
– patterns of correlations between variables are
assumed to be equivalent from one group to the next
– the relationships between variables are assumed to be
linear in all groups
– no dependent variable may be perfectly correlated with a
linear combination of other variables
– discriminant analysis is extremely sensitive to outliers.
Interpretation of Discriminant Functions
 Interpretation begins with a series of univariate tests to
determine which of the original dependent
variables have contributed to the overall
significance of the discriminant functions.
 A discriminant function can be interpreted
by determining which groups it best
discriminates.
 Correlations between a discriminant function
and the original dependent variables can
reveal what conceptual variable the
discriminant function represents.
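The correlations mentioned in the last point can be computed directly once discriminant scores are available. The sketch below assumes the discriminant coefficients have already been estimated. The coefficients and data are made up for illustration. It uses simple total-sample Pearson correlations. Pooled within-group correlations are a common alternative.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def structure_correlations(rows, coefficients):
    """Correlate each original variable with the discriminant scores."""
    scores = [sum(c * v for c, v in zip(coefficients, row)) for row in rows]
    n_vars = len(coefficients)
    return [pearson([row[j] for row in rows], scores) for j in range(n_vars)]

# Made-up data: (variable 1, variable 2) for eight subjects from two groups
rows = [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0), (3.5, 4.5),
        (6.0, 1.0), (7.0, 1.5), (6.5, 2.0), (7.5, 2.5)]
coefficients = [-1.0, 1.0]  # hypothetical discriminant function coefficients

loadings = structure_correlations(rows, coefficients)
print(loadings)
```

A variable with a correlation near +1 or -1 is closely tied to what the function measures. A correlation near 0 suggests the variable contributes little to its meaning.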