
Discriminant Function Analysis

Overview

Discriminant function analysis, a.k.a. discriminant analysis or DA, is used to
classify cases into the values of a categorical dependent, usually a dichotomy. If
discriminant function analysis is effective for a set of data, the classification
table of correct and incorrect estimates will yield a high percentage correct.
Discriminant function analysis is found in SPSS under Analyze, Classify,
Discriminant. One gets DA or MDA from this same menu selection, depending on
whether the specified grouping variable has two categories or more than two.

Multiple discriminant analysis (MDA) is an extension of discriminant analysis and
a cousin of multivariate analysis of variance (MANOVA), sharing many of the same
assumptions and tests. MDA is used to classify a categorical dependent which has
more than two categories, using as predictors a number of interval or dummy
independent variables. MDA is sometimes also called discriminant factor analysis
or canonical discriminant analysis.

Contents
○ Key terms and concepts
○ Tests of significance
○ Effect size measures
○ Interpreting discriminant functions
○ SPSS output examples
○ Assumptions
○ Frequently asked questions
○ Bibliography
There are several purposes for DA and/or MDA:
 To classify cases into groups using a
discriminant prediction equation.
 To test theory by observing whether
cases are classified as predicted.
 To investigate differences between or
among groups.
 To determine the most parsimonious
way to distinguish among groups.
 To determine the percent of variance
in the dependent variable explained
by the independents.
 To determine the percent of variance
in the dependent variable explained
by the independents over and above
the variance accounted for by control
variables, using sequential
discriminant analysis.
 To assess the relative importance of
the independent variables in
classifying the dependent variable.
 To discard variables which are little
related to group distinctions.
 To infer the meaning of MDA
dimensions which distinguish groups,
based on discriminant loadings.
Discriminant analysis has two steps: (1) an F test
(Wilks' lambda) is used to test if the discriminant
model as a whole is significant, and (2) if the F test
shows significance, then the individual independent
variables are assessed to see which differ
significantly in mean by group and these are used to
classify the dependent variable.
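For readers working outside SPSS, a minimal sketch of this workflow is given below using scikit-learn's LinearDiscriminantAnalysis; the data, variable names (X, y), and group sizes are hypothetical, and the overall Wilks' lambda test of step (1) is sketched separately under Tests of significance.

```python
# A minimal sketch (not SPSS) of fitting a two-group DA and tabulating its
# classification accuracy; data and names are hypothetical.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
# 100 cases measured on three interval predictors, 50 in each group.
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(1, 1, (50, 3))])
y = np.repeat([0, 1], 50)                      # dichotomous grouping variable

da = LinearDiscriminantAnalysis().fit(X, y)    # estimate the discriminant function
scores = da.transform(X)                       # discriminant scores, one per case
predicted = da.predict(X)                      # step (2): classify each case

# Classification table: rows = observed group, columns = predicted group.
print(confusion_matrix(y, predicted))
print("Hit ratio:", (predicted == y).mean())
```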
Discriminant analysis shares all the usual
assumptions of correlation, requiring linear and
homoscedastic relationships, and untruncated
interval or near interval data. Like multiple
regression, it also assumes proper model
specification (inclusion of all important
independents and exclusion of extraneous
variables). DA also assumes the dependent variable
is a true dichotomy since data which are forced into
dichotomous coding are truncated, attenuating
correlation.
DA is an earlier alternative to logistic regression,
which is now frequently used in place of DA as it
usually involves fewer violations of assumptions
(independent variables needn't be normally
distributed, linearly related, or have equal within-
group variances), is robust, handles categorical as
well as continuous variables, and has coefficients
which many find easier to interpret. Logistic
regression is preferred when data are not normal in
distribution or group sizes are very unequal.
However, discriminant analysis is preferred when
the assumptions of linear regression are met since
then DA has more statistical power than logistic regression (less chance of Type
II error, that is, failing to reject a false null hypothesis). See also the
separate topic on
multiple discriminant function analysis (MDA) for
dependents with more than two categories.
Key Terms and Concepts
○ Discriminating variables: These are the independent variables, also called
predictors.
○ The criterion variable. This is the dependent variable, also called the grouping
variable in SPSS. It is the object of classification efforts.
○ Discriminant function: A discriminant function, also called a canonical root, is a
latent variable which is created as a linear combination of discriminating
(independent) variables, such that L = b1x1 + b2x2 + ... + bnxn + c, where the b's are
discriminant coefficients, the x's are discriminating variables, and c is a constant.
This is analogous to multiple regression, but the b's are discriminant coefficients
which maximize the distance between the means of the criterion (dependent)
variable. Note that the foregoing assumes the discriminant function is estimated
using ordinary least-squares, the traditional method, but there is also a version
involving maximum likelihood estimation.
 Pairwise group comparisons display the distances between group means
(of the dependent variable) in the multidimensional space formed by the
discriminant functions. (Not applicable to two-group DA, where there is
only one function). The pairwise group comparisons table gives an F test
of significance (based on Mahalanobis distances) of the distance of the
group means, enabling the researcher to determine if every group mean is
significantly distant from every other group mean. Also, the magnitude of
the F values can be used to compare distances between groups in
multivariate space. In SPSS, Analyze, Classify, Discriminant; check "Use
stepwise method"; click Method, check "F for pairwise distances."
 Number of discriminant functions. There is one discriminant function
for 2-group discriminant analysis, but for higher order DA, the number of
functions (each with its own cut-off value) is the lesser of (g - 1), where g
is the number of categories in the grouping variable, or p, the number of
discriminating (independent) variables. Each discriminant function is
orthogonal to the others. A dimension is simply one of the discriminant
functions when there are more than one, in multiple discriminant analysis.
The first function maximizes the differences between the values of the
dependent variable. The second function is orthogonal to it (uncorrelated
with it) and maximizes the differences between values of the dependent
variable, controlling for the first factor. And so on. Though
mathematically different, each discriminant function is a dimension which
differentiates a case into categories of the dependent (for example, religious affiliations)
based on its values on the independents. The first function will be the most
powerful differentiating dimension, but later functions may also represent
additional significant dimensions of differentiation.
 The eigenvalue, also called the characteristic root of each discriminant
function, reflects the ratio of importance of the dimensions which classify
cases of the dependent variable. There is one eigenvalue for each
discriminant function. For two-group DA, there is one discriminant
function and one eigenvalue, which accounts for 100% of the explained
variance. If there is more than one discriminant function, the first will be
the largest and most important, the second next most important in
explanatory power, and so on. The eigenvalues assess relative importance
because they reflect the percents of variance explained in the dependent
variable, cumulating to 100% for all functions. That is, the ratio of the
eigenvalues indicates the relative discriminating power of the discriminant
functions. If the ratio of two eigenvalues is 1.4, for instance, then the first
discriminant function accounts for 40% more between-group variance in
the dependent categories than does the second discriminant function.
Eigenvalues are part of the default output in SPSS (Analyze, Classify,
Discriminant).
 The relative percentage of a discriminant function equals a
function's eigenvalue divided by the sum of all eigenvalues of all
discriminant functions in the model. Thus it is the percent of
discriminating power for the model associated with a given
discriminant function. Relative % is used to tell how many
functions are important. One may find that only the first two or so
eigenvalues are of importance.
 The canonical correlation, R*, is a measure of the association
between the groups formed by the dependent and the given
discriminant function. When R* is zero, there is no relation
between the groups and the function. When the canonical
correlation is large, there is a high correlation between the
discriminant functions and the groups. Note that relative % and R*
do not have to be correlated. R* is used to tell how much each
function is useful in determining group differences. An R* of 1.0
indicates that all of the variability in the discriminant scores can be
accounted for by that dimension. Note that for two-group DA, the
canonical correlation is equivalent to the Pearsonian correlation of
the discriminant scores with the grouping variable.
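The quantities just described can be computed directly from the between- and within-group SSCP matrices. A sketch follows (not SPSS output); X and y are hypothetical, the eigenvalues are those of inv(W)B, and each canonical correlation is obtained from its eigenvalue as sqrt(lambda / (1 + lambda)).

```python
# Sketch of eigenvalues, relative %, and canonical correlations from the
# between- (B) and within-group (W) SSCP matrices; X (n cases x p predictors)
# and y (group labels) are hypothetical.
import numpy as np
from scipy.linalg import eigh

X = np.asarray(X, float)
groups = np.unique(y)
p = X.shape[1]
grand_mean = X.mean(axis=0)

W = np.zeros((p, p))                            # within-group SSCP
B = np.zeros((p, p))                            # between-group SSCP
for gval in groups:
    Xg = X[y == gval]
    dev = Xg - Xg.mean(axis=0)
    W += dev.T @ dev
    diff = (Xg.mean(axis=0) - grand_mean)[:, None]
    B += len(Xg) * (diff @ diff.T)

# Eigenvalues of inv(W) @ B via the generalized problem  B v = lambda W v.
eigvals = eigh(B, W, eigvals_only=True)[::-1]   # sorted descending
n_funcs = min(len(groups) - 1, p)               # number of discriminant functions
eigvals = eigvals[:n_funcs]

rel_pct = 100 * eigvals / eigvals.sum()         # relative % per function
can_corr = np.sqrt(eigvals / (1 + eigvals))     # canonical correlation R* per function
```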
○ The discriminant score, also called the DA score, is the value resulting from
applying a discriminant function formula to the data for a given case. The Z score
is the discriminant score for standardized data. To get discriminant scores in
SPSS, select Analyze, Classify, Discriminant; click the Save button; check
"Discriminant scores". One can also view the discriminant scores by clicking the
Classify button and checking "Casewise results."
○ Cutoff: If the discriminant score of the function is less than or equal to the cutoff,
the case is classed as 0, or if above it is classed as 1. When group sizes are equal,
the cutoff is the mean of the two centroids (for two-group DA). If the groups are
unequal, the cutoff is the weighted mean.
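The cutoff rule can be written out as below. This sketch assumes `scores` is a one-dimensional array of discriminant scores and `y` the 0/1 grouping variable (names hypothetical), and uses one common weighting for the unequal-groups case, with each centroid weighted by the size of the opposite group.

```python
# Sketch of the two-group cutoff rule; `scores` and `y` are assumed from an
# earlier DA fit and are hypothetical names.
import numpy as np

centroid0 = scores[y == 0].mean()              # mean discriminant score, group 0
centroid1 = scores[y == 1].mean()              # mean discriminant score, group 1
n0, n1 = int((y == 0).sum()), int((y == 1).sum())

if n0 == n1:
    cutoff = (centroid0 + centroid1) / 2                     # simple mean of centroids
else:
    cutoff = (n1 * centroid0 + n0 * centroid1) / (n0 + n1)   # weighted mean

predicted = (scores > cutoff).astype(int)      # classify as 1 if above the cutoff
```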
○ Unstandardized discriminant coefficients are used in the formula for making
the classifications in DA, much as b coefficients are used in regression in making
predictions. The constant plus the sum of products of the unstandardized
coefficients with the observations yields the discriminant scores. That is,
discriminant coefficients are the regression-like b coefficients in the discriminant
function, in the form L = b1x1 + b2x2 + ... + bnxn + c, where L is the latent variable
formed by the discriminant function, the b's are discriminant coefficients, the x's
are discriminating variables, and c is a constant. The discriminant function
coefficients are partial coefficients, reflecting the unique contribution of each
variable to the classification of the criterion variable. The standardized
discriminant coefficients, like beta weights in regression, are used to assess the
relative classifying importance of the independent variables.
If one clicks the Statistics button in SPSS after running discriminant analysis and
then checks "Unstandardized coefficients," then SPSS output will include the
unstandardized discriminant coefficients.
○ Standardized discriminant coefficients, also termed the standardized canonical
discriminant function coefficients, are used to compare the relative importance of
the independent variables, much as beta weights are used in regression. Note that
importance is assessed relative to the model being analyzed. Addition or deletion
of variables in the model can change discriminant coefficients markedly.
As with regression, since these are partial coefficients, only the unique
explanation of each independent is being compared, not considering any shared
explanation. Also, if there are more than two groups of the dependent, the
standardized discriminant coefficients do not tell the researcher between which
groups the variable is most or least discriminating. For this purpose, group
centroids and factor structure are examined. The standardized discriminant
coefficients appear by default in SPSS (Analyze, Classify, Discriminant) in a
table of "Standardized Canonical Discriminant Function Coefficients". In MDA,
there will be as many sets of coefficients as there are discriminant functions
(dimensions).
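The relation between the two sets of coefficients can be sketched as follows, under the common convention (assumed here) that standardized coefficients equal the unstandardized ones multiplied by the pooled within-group standard deviation of each predictor; W and b_unstd are hypothetical names taken from the earlier sketches.

```python
# Sketch: convert unstandardized discriminant coefficients to standardized ones
# by scaling with pooled within-group standard deviations (a common convention,
# assumed here). W is the within-group SSCP matrix from the eigenvalue sketch;
# b_unstd the unstandardized coefficients; y the grouping variable.
import numpy as np

n, g = len(y), len(np.unique(y))
pooled_within_sd = np.sqrt(np.diag(W) / (n - g))
b_std = b_unstd * pooled_within_sd    # larger |b_std| = larger unique contribution
```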
○ Functions at group centroids are the mean discriminant scores for each of the
dependent variable categories for each of the discriminant functions in MDA.
Two-group discriminant analysis has two centroids, one for each group. We want
the means to be well apart to show the discriminant function is clearly
discriminating. The closer the means, the more errors of classification there likely
will be. SPSS generates a table of "Functions at group centroids" by default when
Analyze, Classify, Discriminant is invoked.
 Discriminant function plots, also called canonical plots, can be created in
which the two axes are two of the discriminant functions (the dimensional
meaning of which is determined by looking at the structure coefficients,
discussed above), and circles within the plot locate the centroids of each
category being analyzed. The farther apart one point is from another on
the plot, the more the dimension represented by that axis differentiates
those two groups. Thus these plots depict discriminant function space.
For instance, occupational groups might be located in a space representing
educational and motivational dimensions. In the Plots area of the Classify
button, one can select Separate-group plots, a Combined-group plot, or a
territorial map. Separate and combined group plots show where cases are
located in the property space formed by two functions (dimensions). By
default, SPSS uses the first two functions. The territorial map shows inter-
group distances on the discriminant functions. Each function has a
numeric symbol: 1, 2, 3, etc. Cases falling within the boundaries formed
by the 2's, for instance, are classified as 2. The individual cases are not
shown in territorial maps under SPSS, however.
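A combined-groups plot of this kind can be sketched with matplotlib as below; it assumes a grouping variable with at least three categories so that two discriminant functions exist, and all data and names are hypothetical.

```python
# Sketch of a combined-groups plot in discriminant space (first two functions),
# with an X marking each group centroid; data and names are hypothetical.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)   # needs >= 3 groups
scores2 = lda.transform(X)                                   # cases in discriminant space

fig, ax = plt.subplots()
for gval in np.unique(y):
    pts = scores2[y == gval]
    ax.scatter(pts[:, 0], pts[:, 1], alpha=0.6, label=f"group {gval}")
    ax.scatter(*pts.mean(axis=0), marker="X", s=150, c="black")  # group centroid
ax.set_xlabel("Function 1")
ax.set_ylabel("Function 2")
ax.legend()
plt.show()
```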
○ Tests of significance
 (Model) Wilks' lambda is used to test the significance of the discriminant
function as a whole. In SPSS, the "Wilks' Lambda" table will have a
column labeled "Test of Function(s)" and a row labeled "1 through n"
(where n is the number of discriminant functions). The "Sig." level for this
row is the significance level of the discriminant function as a whole. The
researcher wants a finding of significance, and the smaller the lambda, the
more likely the function is significant. A significant lambda means one can reject the
null hypothesis that the two groups have the same mean discriminant
function scores and conclude the model is discriminating. Wilks's lambda
is part of the default output in SPSS (Analyze, Classify, Discriminant). In
SPSS, this use of Wilks' lambda is in the "Wilks' lambda" table of the
output section on "Summary of Canonical Discriminant Functions."
 Stepwise Wilks' lambda appears in the "Variables in the Analysis"
table of stepwise DA output, after the "Sig. of F to Remove"
column. The Step 1 model will have no entry, since removing the first
variable would remove the only variable. The Step 2 model will have
two predictors, each with a Wilks' lambda coefficient, which
represents what the model Wilks' lambda would be if that variable
were dropped, leaving only the other one. If V1 is entered at Step 1
and V2 is entered at Step 2, then the Wilks' lambda in the
"Variables in the Analysis" table for V2 will be identical to the
model Wilks' lambda in the "Wilks' Lambda" table for Step 1,
since dropping it would reduce the model to the Step 1 model. The
more important the variable in classifying the grouping variable,
the higher its stepwise Wilks' lambda.
Stepwise Wilks' lambda also appears in the "Variables Not in the
Analysis" table of stepwise DA output, after the "Sig. of F to
Enter" column. Here the criterion is reversed: the variable with the
lowest stepwise Wilks' lambda is the best candidate to add to the
model in the next step.
 (Model) Wilks' lambda difference tests are also used in a second context
to assess the improvement in classification when using sequential
discriminant analysis. There is an F test of significance of the ratio of two
Wilks' lambdas, such as between a first one for a set of control variables as
predictors and a second one for a model including both control variables
and independent variables of interest. The second lambda is divided by the
first (where the first is the model with fewer predictors) and an
approximate F value for this ratio is found using calculations reproduced
in Tabachnick and Fidell (2001: 491).
 ANOVA table for discriminant scores is another overall test of the
DA model. It is an F test, where a "Sig." p value < .05 means the
model differentiates discriminant scores between the groups
significantly better than chance (than a model with just the
constant). It is obtained in SPSS by asking for Analyze, Compare
Means, One-Way ANOVA, using discriminant scores from DA
(which SPSS will label Dis1_1 or similar) as dependent.
 (Variable) Wilks' lambda also can be used to test which independents
contribute significantly to the discriminant function. The smaller the
variable Wilks' lambda for an independent variable, the more that variable
contributes to the discriminant function. Lambda varies from 0 to 1, with 0
meaning group means differ (thus the more the variable differentiates the
groups), and 1 meaning all group means are the same. The F test of
Wilks's lambda shows which variables' contributions are significant.
Wilks's lambda is sometimes called the U statistic. In SPSS, this use of
Wilks' lambda is in the "Tests of equality of group means" table in DA
output.
 Dichotomous independents are more accurately tested with a chi-
square test than with Wilks' lambda for this purpose.
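The model and per-variable Wilks' lambda computations described above can be sketched as follows, reusing the within-group (W) and between-group (B) SSCP matrices from the eigenvalue sketch earlier; the chi-square significance uses Bartlett's approximation, and all data and names are hypothetical.

```python
# Sketch of the model and per-variable Wilks' lambda, reusing W and B from the
# eigenvalue sketch above; the significance level uses Bartlett's chi-square
# approximation. Names hypothetical.
import numpy as np
from scipy.stats import chi2

n, p, g = X.shape[0], X.shape[1], len(np.unique(y))
T = W + B                                          # total SSCP matrix

# Model Wilks' lambda: within-group over total generalized variance.
model_lambda = np.linalg.det(W) / np.linalg.det(T)
bartlett_v = -(n - 1 - (p + g) / 2) * np.log(model_lambda)   # Bartlett's V
p_value = chi2.sf(bartlett_v, df=p * (g - 1))

# Per-variable Wilks' lambda: within over total sum of squares for each
# predictor taken alone (smaller lambda = more discriminating variable).
variable_lambda = np.diag(W) / np.diag(T)
print(model_lambda, p_value, variable_lambda)
```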
○ Effect size measures
 Classification functions: There are multiple methods of actually
classifying cases in MDA. Simple classification, also known as Fisher's
classification function, simply uses the unstandardized discriminant
coefficients. Generalized distance functions are based on the Mahalanobis
distance, D-square, of each case to each of the group centroids. K-nearest
neighbor discriminant analysis (KNN) is a nonparametric method which
assigns a new case to the group to which its k nearest neighbors also
belong. The KNN method is popular when there are inadequate data to
define the sample means and covariance matrices. There are other
methods of classification.
 The classification table, also called a classification matrix (or a confusion,
assignment, or prediction matrix or table), is used to assess the
performance of DA; a computational sketch follows the related entries
below. This is simply a table in which the rows are the
observed categories of the dependent and the columns are the predicted
categories of the dependents. When prediction is perfect, all cases will lie
on the diagonal. The percentage of cases on the diagonal is the percentage
of correct classifications. This percentage is called the hit ratio.
 Expected hit ratio. Note that the hit ratio must be compared not to
zero but to the percent that would have been correctly classified by
chance alone. For two-group discriminant analysis with a 50-50
split in the dependent variable, the expected percent is 50%. For
unequally split 2-way groups of different sizes, the expected
percent is computed in the "Prior Probabilities for Groups" table in
SPSS, by multiplying the prior probabilities times the group size,
summing for all groups, and dividing the sum by N. If group sizes
are known a priori, the best strategy by chance is to pick the largest
group for all cases, so the expected percent is then the largest
group size divided by N.
 Cross-validation. Leave-one-out classification is available as a
form of cross-validation of the classification table. Under this
option, each case is classified using a discriminant function based
on all cases except the given case. This is thought to give a better
estimate of what classification results would be in the population.
In SPSS, select Analyze, Classify, Discriminant; select variables;
click Classify; select Leave-one-out classification; Continue; OK.
 Measures of association can be computed by the crosstabs
procedure in SPSS if the researcher saves the predicted group
membership for all cases. In SPSS, select Analyze, Classify,
Discriminant; select variables; click Save; check Predicted group
membership; Continue; OK.
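Outside SPSS, the hit ratio, the expected-by-chance hit ratio, and leave-one-out validation of the classification table can be approximated as in the sketch below, using scikit-learn; data and names are hypothetical, and priors are taken as the observed group proportions.

```python
# Sketch of the hit ratio, expected-by-chance hit ratio, and leave-one-out
# cross-validation of the classification table, as an approximation of the
# SPSS options; data and names hypothetical.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import confusion_matrix

# Each case is classified by a function estimated on all other cases.
pred_loo = cross_val_predict(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut())
print(confusion_matrix(y, pred_loo))                      # cross-validated table
print("Cross-validated hit ratio:", (pred_loo == y).mean())

props = np.bincount(y) / len(y)                           # observed group proportions
print("Proportional chance criterion:", (props ** 2).sum())
print("Maximum chance criterion:", props.max())           # always pick largest group
```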
 Mahalanobis D-Square, Rao's V, Hotelling's trace, Pillai's trace, and
Roy's gcr are indexes other than Wilks' lambda of the extent to which the
discriminant functions discriminate between criterion groups. Each has an
associated significance test. A measure from this group is sometimes used
in stepwise discriminant analysis to determine if adding an independent
variable to the model will significantly improve classification of the
dependent variable. SPSS uses Wilks' lambda by default but offers
Mahalanobis distance, Rao's V, unexplained variance, and smallest F ratio
also.
 Canonical correlation, Rc: Squared canonical correlation, Rc2, is the
percent of variation in the dependent discriminated by the set of
independents in DA or MDA. The canonical correlation of each
discriminant function is also the correlation of that function with the
discriminant scores. A canonical correlation close to 1 means that nearly
all the variance in the discriminant scores can be attributed to group
differences. The canonical correlation of any discriminant function is
displayed in SPSS by default as a column in the "Eigenvalues" output
table. Note the canonical correlations are not the same as the correlations
in the structure matrix, discussed below.
○ Interpreting discriminant functions
 Structure coefficients and structure matrix. Structure coefficients, also
called structure correlations or discriminant loadings, are the
correlations between a given independent variable and the discriminant
scores associated with a given discriminant function. They are used to tell
how closely a variable is related to each function in MDA. Looking at all
the structure coefficients for a function allows the researcher to assign a
label to the dimension it measures, much like factor loadings in factor
analysis. A table of structure coefficients of each variable with each
discriminant function is called a canonical structure matrix or factor
structure matrix. The structure coefficients are whole (not partial)
coefficients, similar to correlation coefficients, and reflect the uncontrolled
association of the discriminating variables with the criterion variable,
whereas the discriminant coefficients are partial coefficients reflecting the
unique, controlled association of the discriminating variables with the
criterion variable, controlling for other variables in the equation.
Technically, structure coefficients are pooled within-groups correlations
between the independent variables and the standardized canonical
discriminant functions. When the dependent has more than two categories
there will be more than one discriminant function. In that case, there will
be multiple columns in the table, one for each function. The correlations
then serve like factor loadings in factor analysis -- by considering the set
of variables that load most heavily on a given dimension, the researcher
may infer a suitable label for that dimension. The structure matrix
correlations appear in SPSS output in the "Structure Matrix" table,
produced by default under Analyze, Classify, Discriminant.
Thus for two-group DA, the structure coefficients show the order of
importance of the discriminating variables by total correlation, whereas
the standardized discriminant coefficients show the order of importance by
unique contribution. The sign of the structure coefficient also shows the
direction of the relationship. For multiple discriminant analysis, the
structure coefficients additionally allow the researcher to see the relative
importance of each independent variable on each dimension.
 Structure coefficients vs. standardized discriminant function
coefficients. The standardized discriminant function coefficients indicate
the semi-partial contribution (the unique, controlled association) of each
variable to the discriminant function(s), controlling the independent but
not the dependent for other independents entered in the equation (just as
regression coefficients are semi-partial coefficients). In contrast, structure
coefficients are whole (not partial) coefficients, similar to correlation
coefficients, and reflect the uncontrolled association of the discriminant
scores with the criterion variable. That is, the structure coefficients
indicate the simple correlations between the variables and the discriminant
function or functions. The structure coefficients should be used to assign
meaningful labels to the discriminant functions. The standardized
discriminant function coefficients should be used to assess the importance
of each independent variable's unique contribution to the discriminant
function.
 Mahalanobis distances are used in analyzing cases in discriminant
analysis. For instance, one might wish to analyze a new, unknown set of
cases in comparison to an existing set of known cases. Mahalanobis
distance is the distance between a case and the centroid for each group (of
the dependent) in attribute space (n-dimensional space defined by n
variables). A case will have one Mahalanobis distance for each group, and
it will be classified as belonging to the group for which its Mahalanobis
distance is smallest. Thus, the smaller the Mahalanobis distance, the closer
the case is to the group centroid and the more likely it is to be classed as
belonging to that group. Since Mahalanobis distance is measured in terms
of standard deviations from the centroid, a case which is more
than 1.96 Mahalanobis distance units from the centroid has less than a .05
chance of belonging to the group represented by the centroid; 3 units
would likewise correspond to less than a .01 chance. SPSS reports squared
Mahalanobis distance: click the Classify button and then check "Casewise
results."
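Classification by Mahalanobis distance can be sketched directly, using the pooled within-group covariance matrix; X and y are hypothetical, as in the earlier sketches.

```python
# Sketch of classification by Mahalanobis distance to each group centroid,
# using the pooled within-group covariance; X and y are hypothetical.
import numpy as np
from scipy.spatial.distance import mahalanobis

groups = np.unique(y)
n, g = len(y), len(groups)
# Pooled within-group covariance matrix and its inverse.
pooled_cov = sum((np.sum(y == gv) - 1) * np.cov(X[y == gv], rowvar=False)
                 for gv in groups) / (n - g)
VI = np.linalg.inv(pooled_cov)
centroids = {gv: X[y == gv].mean(axis=0) for gv in groups}

new_case = X[0]                                       # e.g., classify the first case
d = {gv: mahalanobis(new_case, centroids[gv], VI) for gv in groups}
print("Distances:", d, "-> classified as group", min(d, key=d.get))
```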
 Wilks's lambda tests the significance of each discriminant function in
MDA -- specifically the significance of the eigenvalue for a given
function. It is a measure of the difference between groups in the centroid
(vector) of means on the independent variables. The smaller the lambda,
the greater the differences. Lambda varies from 0 to 1, with 0 meaning
group means differ (thus the more the variable differentiates the groups),
and 1 meaning all group means are the same. The Bartlett's V
transformation of lambda is then used to compute the significance of
lambda. Wilks's lambda is used, in conjunction with Bartlett's V, as a
multivariate significance test of mean differences in MDA, for the case of
multiple interval independents and multiple (>2) groups formed by the
dependent. Wilks's lambda is sometimes called the U statistic.
○ Validation
 A hold-out sample is often used for validation of the discriminant
function. This is a split halves test, where a portion of the cases are assigned
to the analysis sample for purposes of training the discriminant function,
then it is validated by assessing its performance on the remaining cases in
the hold-out sample.
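A split-sample hold-out validation of this kind can be sketched with scikit-learn as follows; the 50/50 split fraction, data, and names are hypothetical.

```python
# Sketch of hold-out (split-sample) validation; the split fraction, data, and
# names are hypothetical.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.5, random_state=0)
da = LinearDiscriminantAnalysis().fit(X_train, y_train)   # train on the analysis sample
print("Hold-out hit ratio:", da.score(X_hold, y_hold))    # validate on hold-out cases
```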

SPSS Output Examples


○ Discriminant Function Analysis (two groups)
○ Multiple Discriminant Function Analysis (three groups)

Assumptions
○ Proper specification. The discriminant coefficients can change substantially if
variables are added to or subtracted from the model.
○ True categorical dependents. The dependent variable is a true dichotomy. When
the range of a true underlying continuous variable is constrained to form a
dichotomy, correlation is attenuated (biased toward underestimation). One should
never dichotomize a continuous variable simply for the purpose of applying
discriminant function analysis. To a progressively lesser extent, the same
considerations apply to trichotomies and higher. All cases must belong to a group
formed by the dependent variable. The groups must be mutually exclusive, with
every case belonging to only one group.
○ Independence. All cases must be independent. Thus one cannot have correlated
data (not before-after, panel, or matched pairs data, for instance).
○ No lopsided splits. Group sizes of the dependent are not grossly different. If this
assumption is violated, logistic regression is preferred. Some authors use 90:10 or
worse as the criterion.
○ Adequate sample size. There must be at least two cases for each category of the
dependent and the maximum number of independents is sample size minus 2.
However, it is recommended that there be at least four or five times as many cases
as independent variables.
○ Interval data. The independent variable or variables are interval. As with other
members of the regression family, dichotomies, dummy variables, and ordinal
variables with at least 5 categories are commonly used as well.
○ Variance. No independents have a zero standard deviation in one or more of the
groups formed by the dependent.
○ Random error. Errors (residuals) are randomly distributed.
○ Homogeneity of variances (homoscedasticity): Within each group formed by the
dependent, the variance of each interval independent should be similar between
groups. That is, the independents may (and will) have different variances one
from another, but for the same independent, the groups formed by the dependent
should have similar variances on that independent. Discriminant
analysis is highly sensitive to outliers. Lack of homogeneity of variances may
indicate the presence of outliers in one or more groups. Lack of homogeneity of
variances will mean significance tests are unreliable, especially if sample size is
small and the split of the dependent variable is very uneven. Lack of homogeneity
of variances and presence of outliers can be evaluated through scatterplots of
variables.
○ Homogeneity of covariances/correlations: within each group formed by the
dependent, the covariance/correlation between any two predictor variables should
be similar to the corresponding covariance/correlation in other groups. That is,
each group has a similar covariance/correlation matrix as reflected in the log
determinants (see "Large samples" discussion above).
 Box's M tests the null hypothesis that the covariance matrices do not
differ between groups formed by the dependent. This is an assumption of
discriminant analysis. Box's M uses the F distribution. If p(M)<.05, then
the covariance matrices are significantly different. The researcher wants M not
to be significant, failing to reject the null hypothesis that the covariance
matrices of the independents among categories of the categorical dependent are
homogeneous. That is, the researcher wants this test not to be significant, so
as to retain the null hypothesis that the groups do not differ. Thus, the
probability value of this F should be greater than .05 to demonstrate that
the assumption of homoscedasticity is upheld. This test is very sensitive to
meeting also the assumption of multivariate normality. Note, though, that
DA can be robust even when this assumption is violated. In SPSS, select
Analyze, Classify, Discriminant; click the Statistics button; check Box's
M.
Large samples. Where sample size is large, even small differences in
covariance matrices may be found significant by Box's M, when in fact no
substantial problem of violation of assumptions exists. Therefore, the
researcher should also look at the log determinants of the group
covariance matrices, which are printed along with Box's M. If the group
log determinants are similar, then a significant Box's M for a large sample
is usually ignored. Dissimilar log determinants indicate violation of the
assumption of equal variance covariance matrices, leading to greater
classification errors (specifically, DA will tend to classify cases in the
group with the larger variability). When violation occurs, quadratic DA
may be used (not supported by SPSS as of Version 13).
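The log-determinant check described above (and the Box's M statistic itself, without its significance approximation) can be sketched as below; X and y are hypothetical, as in the earlier sketches.

```python
# Sketch of the group log-determinant check and Box's M itself (without its
# significance approximation); X and y are hypothetical.
import numpy as np

groups = np.unique(y)
n, p, g = X.shape[0], X.shape[1], len(groups)

log_dets = {}
pooled = np.zeros((p, p))
for gval in groups:
    Xg = X[y == gval]
    Sg = np.cov(Xg, rowvar=False)                 # group covariance matrix
    log_dets[gval] = np.linalg.slogdet(Sg)[1]     # log determinant for the group
    pooled += (len(Xg) - 1) * Sg
pooled /= (n - g)                                 # pooled within-group covariance

# Box's M = (N - g) ln|S_pooled| - sum over groups of (n_i - 1) ln|S_i|.
box_m = (n - g) * np.linalg.slogdet(pooled)[1] - sum(
    (np.sum(y == gval) - 1) * log_dets[gval] for gval in groups)
print("Group log determinants:", log_dets)
print("Pooled log determinant:", np.linalg.slogdet(pooled)[1], "Box's M:", box_m)
```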
○ Absence of perfect multicollinearity. If one independent is very highly correlated
with another, or one is a function (ex., the sum) of other independents, then the
tolerance value for that variable will approach 0 and the matrix will not have a
unique discriminant solution. Such a matrix is said to be ill-conditioned.
Tolerance is discussed in the section on regression.
○ Low multicollinearity of the independents. To the extent that independents are
correlated, the standardized discriminant function coefficients will not reliably
assess the relative importance of the predictor variables. In SPSS, one check on
multicollinearity is looking at the "pooled within-groups correlation matrix,"
which is output when one checks "Within-groups correlation" from the Statistics
button in the DA dialog. "Pooled" refers to averaging across groups formed by the
dependent. Note that pooled correlation can be very different from normal (total)
correlation when two variables are less correlated within groups than between
groups (ex., race and illiteracy are little correlated within region, but the total r is
high because there are proportionately more blacks in the South where illiteracy is
high). When assessing the correlation matrix for multicollinearity, a rule of
thumb is no r > .90 and not several > .80.
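The pooled within-groups correlation matrix can also be computed outside SPSS, for example by converting the pooled within-group covariance matrix (`pooled`, a hypothetical name from the Box's M sketch above) into correlations; the .90/.80 screens follow the rule of thumb just stated.

```python
# Sketch of the pooled within-groups correlation matrix, derived from the
# pooled within-group covariance matrix `pooled` (hypothetical, from above).
import numpy as np

sd = np.sqrt(np.diag(pooled))
pooled_corr = pooled / np.outer(sd, sd)               # covariance -> correlation
high = np.abs(np.triu(pooled_corr, k=1)) > 0.90       # any single r above .90?
print(np.round(pooled_corr, 2))
print("Potential multicollinearity:", bool(high.any()))
```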
○ Assumes linearity (does not take into account exponential terms unless such
transformed variables are added as additional independents).
○ Assumes additivity (does not take into account interaction terms unless new
crossproduct variables are added as additional independents).
○ For purposes of significance testing, predictor variables follow multivariate
normal distributions. That is, each predictor variable has a normal distribution
about fixed values of all the other independents. As a rule of thumb, discriminant
analysis will be robust against violation of this assumption if the smallest group
has more than 20 cases and the number of independents is fewer than six. When
non-normality is caused by outliers rather than skewness, violation of this
assumption has more serious consequences as DA is highly sensitive to outliers. If
this assumption is violated, logistic regression is preferred.

Frequently Asked Questions


○ Isn't discriminant analysis the same as cluster analysis?
No. In discriminant analysis the groups (clusters) are determined
beforehand and the object is to determine the linear combination of
independent variables which best discriminates among the groups. In
cluster analysis the groups (clusters) are not predetermined and in fact the
object is to determine the best way in which cases may be clustered into
groups.
○ When does the discriminant function have no constant term?
When the data are standardized or are deviations from the mean.
○ How important is it that the assumptions of homogeneity of variances and of
multivariate normal distribution be met?
Lachenbruch (1975) indicates that DA is relatively robust even when there
are modest violations of these assumptions. Klecka (1980) points out that
dichotomous variables, which often violate multivariate normality, are not
likely to affect conclusions based on DA.
○ In DA, how can you assess the relative importance of the discriminating
variables?
The same as in regression, by comparing beta weights, which are the
standardized discriminant coefficients. If not output directly by one's
statistical package (SPSS does), one may obtain beta weights by running
DA on standardized scores. That is, betas are standardized discriminant
function coefficients. The ratio of the betas is the relative contribution of
each variable. Note that the betas will change if variables are added or
deleted from the equation.
Dummy variables. As in regression, dummy variables must be assessed as
a group, not on the basis of individual beta weights. This is done through
hierarchical discriminant analysis, running the analysis first with, then
without the set of dummies. The difference in the squared canonical
correlation indicates the explanatory effect of the set of dummies.
Alternatively, for interval independents, one can correlate the discriminant
function scores with the independents. The discriminating variables which
matter the most to a particular function will be correlated highest with the
DA scores.
○ In DA, how can you assess the importance of a set of discriminating variables
over and above a set of control variables? (What is sequential discriminant
analysis?)
As in sequential regression, in sequential discriminant analysis, control
variables may be entered as independent variables separately first. In a
second run, the discriminating variables of interest may be entered. The
difference in the squared canonical correlation indicates the explanatory
effect of discriminating variables over and above the set of control
variables. Alternatively, one could compare the hit rate in the two
classification tables.
○ What is the maximum likelihood estimation method in discriminant analysis
(logistic discriminant function analysis)?
Using MLE, a discriminant function is a function of the form T = k1X1 +
k2X2 + ... + knXn, where X1...Xn are the differences between the two
groups on the ith independent variable, k1...kn are the logit coefficients,
and T is a function which classes the case into group 0 or group 1. If the
data are unstandardized, there is also a constant term. The discriminant
function arrives at coefficients which set the highest possible ratio of
between-group to within-groups variance (similar to the ANOVA F test,
except that in DA the group variable is the dependent rather than the
independent). This method, called logistic discriminant function
analysis, is supported by SPSS.
○ What are Fisher's linear discriminant functions?
The classical method of discriminant classification calculated one set of
discriminant function coefficients for each dependent category, using
these to make the classifications. SPSS still outputs these coefficients if
you check the "Fisher's" box under the Statistics option in discriminant
function analysis. This outputs a table with the discriminant functions
(dimensions) as columns and the independent variables plus constant as
rows. The Fisher coefficients are used down the columns to compute a
discriminant score for each dimension and the case is classified in the
group generating the highest score. This method gives the same results as
using the discriminant function scores but is easier to compute.
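A sketch of computing Fisher's classification function coefficients directly is given below; it uses the pooled within-group covariance, and the ln(prior) constant term follows a common convention assumed here. All data and names are hypothetical.

```python
# Sketch of Fisher's linear classification functions: one coefficient vector and
# constant per group, with a case assigned to the group yielding the highest
# score. The ln(prior) term is a common convention assumed here.
import numpy as np

def fisher_classification_functions(X, y):
    X = np.asarray(X, float)
    groups = np.unique(y)
    n, g = len(y), len(groups)
    # Pooled within-group covariance matrix.
    Sw = sum((X[y == gv] - X[y == gv].mean(axis=0)).T @
             (X[y == gv] - X[y == gv].mean(axis=0)) for gv in groups) / (n - g)
    Sw_inv = np.linalg.inv(Sw)
    coef, const = {}, {}
    for gv in groups:
        mu = X[y == gv].mean(axis=0)
        prior = (y == gv).mean()
        coef[gv] = Sw_inv @ mu                               # coefficients for group gv
        const[gv] = -0.5 * mu @ Sw_inv @ mu + np.log(prior)  # constant for group gv
    return coef, const

coef, const = fisher_classification_functions(X, y)
case = X[0]                                                  # classify the first case
scores_by_group = {gv: case @ coef[gv] + const[gv] for gv in coef}
print(max(scores_by_group, key=scores_by_group.get))
```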
○ What is stepwise DA?
Stepwise procedures select the most correlated independent first, remove
the variance in the dependent, then select the second independent which
most correlates with the remaining variance in the dependent, and so on
until selection of an additional independent does not increase the R-
squared (in DA, canonical R-squared) by a significant amount (usually
signif=.05). As in multiple regression, there are both forward (adding
variables) and backward (removing variables) stepwise versions.
In SPSS there are several available criteria for entering or removing new
variables at each step: Wilks’ lambda, unexplained variance, Mahalanobis’
distance, smallest F ratio, and Rao’s V. The researcher typically sets the
critical significance level by setting the "F to remove" in most statistical
packages.
Stepwise procedures are sometimes said to eliminate the problem of
multicollinearity, but this is misleading. The stepwise procedure uses an
intelligent criterion to set order, but it certainly does not eliminate the
problem of multicollinearity. To the extent that independents are highly
intercorrelated, the standard errors of their standardized discriminant
coefficients will be inflated and it will be difficult to assess the relative
importance of the independent variables.
The researcher should keep in mind that the stepwise method capitalizes
on chance associations, so the true significance levels are worse (that is,
numerically higher) than the alpha levels reported. Thus a
reported significance level of .05 may correspond to a true alpha rate of
.10 or worse. For this reason, if stepwise discriminant analysis is
employed, use of cross-validation is recommended. In the split halves
method, the original dataset is split in two at random and one half is used
to develop the discriminant equation and the other half is used to validate
it.
○ I have heard DA is related to MANCOVA. How so?
Discriminant analysis can be conceptualized as the inverse of
MANCOVA. MANCOVA can be used to see the effect on multiple
dependents of a single categorical independent, while DA can be used to
see the effect on a categorical dependent of multiple interval independents.
The SPSS MANOVA procedure, which also covers MANCOVA, can be
used to generate discriminant functions as well, though in practical terms
this is not the easiest route for the researcher interested in DA.
○ How does MDA work?
A first function is computed on which the group means are as different as
possible. A second function is then computed uncorrelated with the first,
then a third function is computed uncorrelated with the first two, and so
on, for as many functions as possible. The maximum number of functions
is the lesser of g - 1 (number of dependent groups minus 1) or k (the
number of independent variables).
○ How can I tell if MDA worked?
SPSS will print out a table of Classification Results, in which the rows are
Actual and the columns are Predicted. The better MDA works, the more
the cases will all be on the diagonal. Also, below the table SPSS will print
the percent of cases correctly classified.
○ For any given MDA example, how many discriminant functions will there be,
and how can I tell if each is significant?
The answer is min(g-1,p), where g is the number of groups (categories)
being discriminated and p is the number of predictor (independent)
variables. The min() function, of course, means the lesser of the two. SPSS will
print Wilks's lambda and its significance for each function, and this tests
the significance of the discriminant functions.
○ In MDA there will be multiple discriminant functions, so therefore there will
be more than one set of unstandardized discriminant coefficients, and for
each case a discriminant score can be obtained for each of the multiple
functions. In dichotomous discriminant analysis, the discriminant score is
used to classify the case as 0 or 1 on the dependent variable. But how are the
multiple discriminant scores on a single case interpreted in MDA?
Take the case of three discriminant functions with three corresponding
discriminant scores per case. The three scores for a case indicate the
location of that case in three-dimensional discriminant space. Each axis
represents one of the discriminant functions, roughly analogous to factor
axes in factor analysis. That is, each axis represents a dimension of
meaning whose label is attributed based on inference from the structure
coefficients.
One can also locate the group centroid for each group of the dependent in
discriminant space in the same manner.
In the case of two discriminant functions, cases or group centroids may be
plotted on a two-dimensional scatterplot of discriminant space (a
canonical plot). Even when there are more than two functions,
interpretation of the eigenvalues may reveal that only the first two
functions are important and worthy of plotting.
○ Likewise in MDA, there are multiple standardized discriminant coefficients -
one set for each discriminant function. In dichotomous DA, the ratio of the
standardized discriminant coefficients is the ratio of the importance of the
independent variables. But how are the multiple sets of standardized
coefficients interpreted in MDA?
In MDA the standardized discriminant coefficients indicate the relative
importance of the independent variables in determining the location of
cases in discriminant space for the dimension represented by the function
for that set of standardized coefficients.
○ Are the multiple discriminant functions the same as factors in principal-
components factor analysis?
No. There are conceptual similarities, but they are mathematically
different in what they are maximizing. MDA is maximizing the difference
between values of the dependent. PCA is maximizing the variance in all
the variables accounted for by the factor.

Bibliography
○ Dunteman, George H. (1984). Introduction to multivariate analysis. Thousand
Oaks, CA: Sage Publications. Chapter 5 covers classification procedures and
discriminant analysis.
○ Huberty, Carl J. (1994). Applied discriminant analysis. NY: Wiley-Interscience.
(Wiley Series in Probability and Statistics).
○ Klecka, William R. (1980). Discriminant analysis. Quantitative Applications in
the Social Sciences Series, No. 19. Thousand Oaks, CA: Sage Publications.
○ Lachenbruch, P. A. (1975). Discriminant analysis. NY: Hafner.
○ McLachlan, Geoffrey J. (2004). Discriminant analysis and statistical pattern
recognition. NY: Wiley-Interscience. (Wiley Series in Probability and Statistics).
○ Press, S. J. and S. Wilson (1978). Choosing between logistic regression and
discriminant analysis. Journal of the American Statistical Association, Vol. 73:
699-705. The authors make the case for the superiority of logistic regression for
situations where the assumptions of multivariate normality are not met (ex., when
dummy variables are used), though discriminant analysis is held to be better when
assumptions are met. They conclude that logistic and discriminant analyses will
usually yield the same conclusions, except in the case when there are
independents which result in predictions very close to 0 and 1 in logistic analysis.
○ Tabachnick, Barbara G. and Linda S. Fidell (2001). Using multivariate statistics,
Fourth ed. Boston: Allyn and Bacon. Chapter 11 covers discriminant analysis.
