
Discriminant Function Analysis

Overview
Discriminant function analysis, also known as discriminant analysis or DA, is used to classify cases into the values of a categorical dependent, usually a dichotomy. If discriminant function analysis is effective for a set of data, the classification table of correct and incorrect estimates will yield a high percentage correct. Discriminant function analysis is found in SPSS under Analyze, Classify, Discriminant. One gets DA or MDA from this same menu selection, depending on whether the specified grouping variable has two categories or more than two. Multiple discriminant analysis (MDA) is an extension of discriminant analysis and a cousin of multivariate analysis of variance (MANOVA), sharing many of the same assumptions and tests. MDA is used to classify a categorical dependent which has more than two categories, using as predictors a number of interval or dummy independent variables. MDA is sometimes also called discriminant factor analysis or canonical discriminant analysis. There are several purposes for DA and/or MDA:

To classify cases into groups using a discriminant prediction equation.
To test theory by observing whether cases are classified as predicted.
To investigate differences between or among groups.
To determine the most parsimonious way to distinguish among groups.
To determine the percent of variance in the dependent variable explained by the independents.
To determine the percent of variance in the dependent variable explained by the independents over and above the variance accounted for by control variables, using sequential discriminant analysis.
To assess the relative importance of the independent variables in classifying the dependent variable.
To discard variables which are little related to group distinctions.
To infer the meaning of MDA dimensions which distinguish groups, based on discriminant loadings.

Discriminant analysis has two steps: (1) an F test (Wilks' lambda) is used to test if the discriminant model as a whole is significant, and (2) if the F test shows significance, then the individual independent variables are assessed to see which differ significantly in mean by group and these are used to classify the dependent variable.

Discriminant analysis shares all the usual assumptions of correlation, requiring linear and homoscedastic relationships and untruncated interval or near-interval data. Like multiple regression, it also assumes proper model specification (inclusion of all important independents and exclusion of extraneous variables). DA also assumes the dependent variable is a true dichotomy, since data which are forced into dichotomous coding are truncated, attenuating correlation.

DA is an earlier alternative to logistic regression, which is now frequently used in place of DA as it usually involves fewer violations of assumptions (independent variables needn't be normally distributed, linearly related, or have equal within-group variances), is robust, handles categorical as well as continuous variables, and has coefficients which many find easier to interpret. Logistic regression is preferred when data are not normal in distribution or group sizes are very unequal. However, discriminant analysis is preferred when the assumptions of linear regression are met, since then DA has more statistical power than logistic regression (less chance of Type II errors, that is, accepting a false null hypothesis). See also the separate topic on multiple discriminant function analysis (MDA) for dependents with more than two categories.

Key Terms and Concepts


Discriminating variables: These are the independent variables, also called predictors.

The criterion variable: This is the dependent variable, also called the grouping variable in SPSS. It is the object of classification efforts.

Discriminant function: A discriminant function, also called a canonical root, is a latent variable which is created as a linear combination of discriminating (independent) variables, such that L = b1x1 + b2x2 + ... + bnxn + c, where the b's are discriminant coefficients, the x's are discriminating variables, and c is a constant. This is analogous to multiple regression, but the b's are discriminant coefficients which maximize the distance between the means of the criterion (dependent) variable. Note that the foregoing assumes the discriminant function is estimated using ordinary least-squares, the traditional method, but there is also a version involving maximum likelihood estimation.

o Pairwise group comparisons display the distances between group means (of the dependent variable) in the multidimensional space formed by the discriminant functions. (Not applicable to two-group DA, where there is only one function.) The pairwise group comparisons table gives an F test of significance (based on Mahalanobis distances) of the distance of the group means, enabling the researcher to determine if every group mean is significantly distant from every other group mean. Also, the magnitude of the F values can be used to compare distances between groups in multivariate space. In SPSS, Analyze, Classify, Discriminant; check "Use stepwise method"; click Method, check "F for pairwise distances."

o Number of discriminant functions. There is one discriminant function for two-group discriminant analysis, but for higher-order DA, the number of functions (each with its own cut-off value) is the lesser of (g - 1), where g is the number of categories in the grouping variable, or p, the number of discriminating (independent) variables. Each discriminant function is orthogonal to the others. A dimension is simply one of the discriminant functions when there is more than one, in multiple discriminant analysis. The first function maximizes the differences between the values of the dependent variable. The second function is orthogonal to it (uncorrelated with it) and maximizes the differences between values of the dependent variable, controlling for the first function. And so on. Though mathematically different, each discriminant function is a dimension which differentiates a case into categories of the dependent (for example, religions) based on its values on the independents. The first function will be the most powerful differentiating dimension, but later functions may also represent additional significant dimensions of differentiation.
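As a minimal illustration of the two definitions above, the following Python sketch (not SPSS, and not part of the original notes) computes the number of discriminant functions and a discriminant score. The coefficients, constant, and case values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def n_discriminant_functions(n_groups, n_predictors):
    """Number of discriminant functions: the lesser of (g - 1) and p."""
    return min(n_groups - 1, n_predictors)


def discriminant_score(coefs, constant, case):
    """L = b1*x1 + b2*x2 + ... + bn*xn + c for a single case."""
    return sum(b * x for b, x in zip(coefs, case)) + constant


# Hypothetical unstandardized coefficients, constant, and case values.
b = [0.5, -0.25]
c = 1.0
case = [4.0, 2.0]

print(n_discriminant_functions(2, 5))  # two groups, five predictors
print(n_discriminant_functions(4, 2))  # four groups, two predictors
print(discriminant_score(b, c, case))
```

With two groups there is always a single function, however many predictors are used; with four groups and only two predictors, the number of predictors becomes the limit.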

The eigenvalue, also called the characteristic root of each discriminant function, reflects the ratio of importance of the dimensions which classify cases of the dependent variable. There is one eigenvalue for each discriminant function. For two-group DA, there is one discriminant function and one eigenvalue, which accounts for 100% of the explained variance. If there is more than one discriminant function, the first will be the largest and most important, the second next most important in explanatory power, and so on. The eigenvalues assess relative importance because they reflect the percents of variance explained in the dependent variable, cumulating to 100% for all functions. That is, the ratio of the eigenvalues indicates the relative discriminating power of the discriminant functions. If the ratio of two eigenvalues is 1.4, for instance, then the first discriminant function accounts for 40% more between-group variance in the dependent categories than does the second discriminant function. Eigenvalues are part of the default output in SPSS (Analyze, Classify, Discriminant).

The relative percentage of a discriminant function equals a function's eigenvalue divided by the sum of all eigenvalues of all discriminant functions in the model. Thus it is the percent of discriminating power for the model associated with a given discriminant function. Relative % is used to tell how many functions are important. One may find that only the first two or so eigenvalues are of importance.

The canonical correlation, R*, is a measure of the association between the groups formed by the dependent and the given discriminant function. When R* is zero, there is no relation between the groups and the function. When the canonical correlation is large, there is a high correlation between the discriminant functions and the groups. Note that relative % and R* do not have to be correlated. R* is used to tell how useful each function is in determining group differences. An R* of 1.0 indicates that all of the variability in the discriminant scores can be accounted for by that dimension. Note that for two-group DA, the canonical correlation is equivalent to the Pearsonian correlation of the discriminant scores with the grouping variable.

The discriminant score, also called the DA score, is the value resulting from applying a discriminant function formula to the data for a given case. The Z score is the discriminant score for standardized data. To get discriminant scores in SPSS, select Analyze, Classify, Discriminant; click the Save button; check "Discriminant scores". One can also view the discriminant scores by clicking the Classify button and checking "Casewise results."

Cutoff: If the discriminant score of the function is less than or equal to the cutoff, the case is classed as 0, or if above it is classed as 1. When group sizes are equal, the cutoff is the mean of the two centroids (for two-group DA). If the groups are unequal, the cutoff is the weighted mean.

Unstandardized discriminant coefficients are used in the formula for making the classifications in DA, much as b coefficients are used in regression in making predictions. The constant plus the sum of products of the unstandardized coefficients with the observations yields the discriminant scores. That is, discriminant coefficients are the regression-like b coefficients in the discriminant function, in the form L = b1x1 + b2x2 + ... + bnxn + c, where L is the latent variable formed by the discriminant function, the b's are discriminant coefficients, the x's are discriminating variables, and c is a constant. The discriminant function coefficients are partial coefficients, reflecting the unique contribution of each variable to the classification of the criterion variable. The standardized discriminant coefficients, like beta weights in regression, are used to assess the relative classifying importance of the independent variables. If one clicks the Statistics button in SPSS after running discriminant analysis and then checks "Unstandardized coefficients," then SPSS output will include the unstandardized discriminant coefficients.
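The relative percentage and the eigenvalue ratio described above are simple arithmetic. The Python sketch below is an illustration only, not SPSS output: the two eigenvalues (1.4 and 1.0) are hypothetical, chosen to reproduce the ratio of 1.4 used in the text.

```python
def relative_percent(eigenvalues):
    """Each function's eigenvalue as a percent of the sum of all eigenvalues."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]


eigenvalues = [1.4, 1.0]  # hypothetical eigenvalues for two functions
percents = relative_percent(eigenvalues)
ratio = eigenvalues[0] / eigenvalues[1]

for i, (ev, pct) in enumerate(zip(eigenvalues, percents), start=1):
    print(f"Function {i}: eigenvalue {ev:.3f}, relative % {pct:.1f}")
print(f"Ratio of first to second eigenvalue: {ratio:.2f}")
```

The relative percentages always sum to 100, so they describe how the model's discriminating power is divided among functions, not how strong the model is overall.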

Standardized discriminant coefficients, also termed the standardized canonical discriminant function coefficients, are used to compare the relative importance of the independent variables, much as beta weights are used in regression. Note that importance is assessed relative to the model being analyzed. Addition or deletion of variables in the model can change discriminant coefficients markedly. As with regression, since these are partial coefficients, only the unique explanation of each independent is being compared, not considering any shared explanation. Also, if there are more than two groups of the dependent, the standardized discriminant coefficients do not tell the researcher between which groups the variable is most or least discriminating. For this purpose, group centroids and factor structure are examined. The standardized discriminant coefficients appear by default in SPSS (Analyze, Classify, Discriminant) in a table of "Standardized Canonical Discriminant Function Coefficients". In MDA, there will be as many sets of coefficients as there are discriminant functions (dimensions).

Functions at group centroids are the mean discriminant scores for each of the dependent variable categories for each of the discriminant functions in MDA. Two-group discriminant analysis has two centroids, one for each group. We want the means to be well apart to show the discriminant function is clearly discriminating. The closer the means, the more errors of classification there likely will be. SPSS generates a table of "Functions at group centroids" by default when Analyze, Classify, Discriminant is invoked.

o Discriminant function plots, also called canonical plots, can be created in which the two axes are two of the discriminant functions (the dimensional meaning of which is determined by looking at the structure coefficients, discussed below), and circles within the plot locate the centroids of each category being analyzed. The farther apart one point is from another on the plot, the more the dimension represented by that axis differentiates those two groups. Thus these plots depict discriminant function space. For instance, occupational groups might be located in a space representing educational and motivational dimensions. In the Plots area of the Classify button, one can select Separate-group plots, a Combined-group plot, or a territorial map. Separate and combined group plots show where cases are located in the property space formed by two functions (dimensions). By default, SPSS uses the first two functions. The territorial map shows inter-group distances on the discriminant functions. Each group has a numeric symbol: 1, 2, 3, etc. Cases falling within the boundaries formed by the 2's, for instance, are classified as 2. The individual cases are not shown in territorial maps under SPSS, however.

Tests of significance

(Model) Wilks' lambda is used to test the significance of the discriminant function as a whole. In SPSS, the "Wilks' Lambda" table will have a column labeled "Test of Function(s)" and a row labeled "1 through n" (where n is the number of discriminant functions). The "Sig." level for this row is the significance level of the discriminant function as a whole. The researcher wants a finding of significance, and the smaller the lambda, the more likely it is significant. A significant lambda means one can reject the null hypothesis that the two groups have the same mean discriminant function scores and conclude the model is discriminating. Wilks' lambda is part of the default output in SPSS (Analyze, Classify, Discriminant). In SPSS, this use of Wilks' lambda is in the "Wilks' Lambda" table of the output section on "Summary of Canonical Discriminant Functions."

o Stepwise Wilks' lambda appears in the "Variables in the Analysis" table of stepwise DA output, after the "Sig. of F to Remove" column. The Step 1 model will have no entry, as removing the first variable is removing the only variable. The Step 2 model will have two predictors, each with a Wilks' lambda coefficient which represents what the model Wilks' lambda would be if that variable were dropped, leaving only the other one. If V1 is entered at Step 1 and V2 is entered at Step 2, then the Wilks' lambda in the "Variables in the Analysis" table for V2 will be identical to the model Wilks' lambda in the "Wilks' Lambda" table for Step 1, since dropping it would reduce the model to the Step 1 model. The more important the variable in classifying the grouping variable, the higher its stepwise Wilks' lambda. Stepwise Wilks' lambda also appears in the "Variables Not in the Analysis" table of stepwise DA output, after the "Sig. of F to Enter" column. Here the criterion is reversed: the variable with the lowest stepwise Wilks' lambda is the best candidate to add to the model in the next step.

(Model) Wilks' lambda difference tests are also used in a second context to assess the improvement in classification when using sequential discriminant analysis. There is an F test of significance of the ratio of two Wilks' lambdas, such as between a first one for a set of control variables as predictors and a second one for a model including both control variables and independent variables of interest. The second lambda is divided by the first (where the first is the model with fewer predictors) and an approximate F value for this ratio is found using calculations reproduced in Tabachnick and Fidell (2001: 491).

o ANOVA table for discriminant scores is another overall test of the DA model. It is an F test, where a "Sig." p value < .05 means the model differentiates discriminant scores between the groups significantly better than chance (than a model with just the constant). It is obtained in SPSS by asking for Analyze, Compare Means, One-Way ANOVA, using discriminant scores from DA (which SPSS will label Dis1_1 or similar) as dependent.

(Variable) Wilks' lambda also can be used to test which independents contribute significantly to the discriminant function. The smaller the variable Wilks' lambda for an independent variable, the more that variable contributes to the discriminant function. Lambda varies from 0 to 1, with 0 meaning group means differ (thus the more the variable differentiates the groups), and 1 meaning all group means are the same. The F test of Wilks' lambda shows which variables' contributions are significant. Wilks' lambda is sometimes called the U statistic. In SPSS, this use of Wilks' lambda is in the "Tests of equality of group means" table in DA output.

o Dichotomous independents are more accurately tested with a chi-square test than with Wilks' lambda for this purpose.
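For a single independent variable, Wilks' lambda reduces to the within-groups sum of squares divided by the total sum of squares, which is why it runs from 0 to 1. The Python sketch below illustrates this with made-up data; the function names are illustrative and are not SPSS procedures.

```python
def variable_wilks_lambda(groups):
    """Wilks' lambda for one predictor: within-groups SS / total SS.

    groups is a list of lists, one list of predictor values per group.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_within += sum((x - group_mean) ** 2 for x in g)
    return ss_within / ss_total


def f_from_lambda(lam, n_cases, n_groups):
    """One-way F statistic implied by a univariate Wilks' lambda."""
    return ((1.0 - lam) / lam) * (n_cases - n_groups) / (n_groups - 1)


# Made-up data: identical group means versus well-separated group means.
same_means = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
apart_means = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]]

lam_same = variable_wilks_lambda(same_means)
lam_apart = variable_wilks_lambda(apart_means)

print(lam_same, f_from_lambda(lam_same, 6, 2))
print(round(lam_apart, 4), round(f_from_lambda(lam_apart, 6, 2), 1))
```

When the group means coincide, all variation is within groups and lambda is 1; as the means move apart, lambda falls toward 0 and the F statistic grows.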

Measuring strength of relationships

Classification functions: There are multiple methods of actually classifying cases in MDA. Simple classification, also known as Fisher's classification function, simply uses the unstandardized discriminant coefficients. Generalized distance functions are based on the Mahalanobis distance, D-square, of each case to each of the group centroids. K-nearest neighbor discriminant analysis (KNN) is a nonparametric method which assigns a new case to the group to which its k nearest neighbors also belong. The KNN method is popular when there are inadequate data to define the sample means and covariance matrices. There are other methods of classification.

The classification table, also called a classification matrix, or a confusion, assignment, or prediction matrix or table, is used to assess the performance of DA. This is simply a table in which the rows are the observed categories of the dependent and the columns are the predicted categories of the dependent. When prediction is perfect, all cases will lie on the diagonal. The percentage of cases on the diagonal is the percentage of correct classifications. This percentage is called the hit ratio.
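The classification table and hit ratio can be sketched in a few lines of Python. This is an illustration with made-up observed and predicted group codes, not output from any particular analysis.

```python
def classification_table(observed, predicted, labels):
    """Rows are observed categories, columns are predicted categories."""
    table = {obs: {pred: 0 for pred in labels} for obs in labels}
    for o, p in zip(observed, predicted):
        table[o][p] += 1
    return table


def hit_ratio(table):
    """Share of cases on the diagonal (correctly classified)."""
    total = sum(sum(row.values()) for row in table.values())
    correct = sum(table[k][k] for k in table)
    return correct / total


# Made-up group codes for ten cases.
observed = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
predicted = [1, 1, 1, 2, 2, 2, 2, 2, 1, 1]

table = classification_table(observed, predicted, labels=[1, 2])
for obs, row in table.items():
    print(obs, row)
print("Hit ratio:", hit_ratio(table))
```

Here seven of the ten cases fall on the diagonal, so the hit ratio is 70%.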

Expected hit ratio. Note that the hit ratio must be compared not to zero but to the percent that would have been correctly classified by chance alone. For two-group discriminant analysis with a 50-50 split in the dependent variable, the expected percent is 50%. For unequally split two-way groups of different sizes, the expected percent is computed from the "Prior Probabilities for Groups" table in SPSS, by multiplying the prior probabilities times the group size, summing for all groups, and dividing the sum by N. If group sizes are known a priori, the best strategy by chance is to pick the largest group for all cases, so the expected percent is then the largest group size divided by N.

o Cross-validation. Leave-one-out classification is available as a form of cross-validation of the classification table. Under this option, each case is classified using a discriminant function based on all cases except the given case. This is thought to give a better estimate of what classification results would be in the population. In SPSS, select Analyze, Classify, Discriminant; select variables; click Classify; select Leave-one-out classification; Continue; OK.

o Measures of association can be computed by the crosstabs procedure in SPSS if the researcher saves the predicted group membership for all cases. In SPSS, select Analyze, Classify, Discriminant; select variables; click Save; select Predicted group membership; Continue; OK.

Mahalanobis D-Square, Rao's V, Hotelling's trace, Pillai's trace, and Roy's gcr are indexes other than Wilks' lambda of the extent to which the discriminant functions discriminate between criterion groups. Each has an associated significance test. A measure from this group is sometimes used in stepwise discriminant analysis to determine if adding an independent variable to the model will significantly improve classification of the dependent variable. SPSS uses Wilks' lambda by default but also offers Mahalanobis distance, Rao's V, unexplained variance, and smallest F ratio.

Canonical correlation, Rc: Squared canonical correlation, Rc², is the percent of variation in the dependent discriminated by the set of independents in DA or MDA. The canonical correlation of each discriminant function is also the correlation of that function with the discriminant scores. A canonical correlation close to 1 means that nearly all the variance in the discriminant scores can be attributed to group differences. The canonical correlation of any discriminant function is displayed in SPSS by default as a column in the "Eigenvalues" output table. Note the canonical correlations are not the same as the correlations in the structure matrix, discussed below.
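Returning to the expected hit ratio, the two chance baselines described above can be sketched in Python. The group sizes (971 voters, 374 non-voters) are taken from the worked example in this document; the sketch assumes priors equal to the observed group proportions, which is an assumption made here for illustration and not necessarily the setting used in any given SPSS run.

```python
def proportional_chance(group_sizes):
    """Sum over groups of (prior * group size), divided by N.

    Assumes each prior equals the observed group proportion.
    """
    n = sum(group_sizes)
    return sum((size / n) * size for size in group_sizes) / n


def maximum_chance(group_sizes):
    """Expected hit ratio when every case is assigned to the largest group."""
    return max(group_sizes) / sum(group_sizes)


sizes = [971, 374]  # voted, did not vote

print(f"Proportional chance: {proportional_chance(sizes):.3f}")
print(f"Maximum chance:      {maximum_chance(sizes):.3f}")
```

Under these assumptions the proportional baseline is about 60% and the largest-group baseline about 72%, so a hit ratio would need to exceed these figures, not merely 50%, to show improvement over chance.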

Interpreting the discriminant functions

Structure coefficients and structure matrix. Structure coefficients, also called structure correlations or discriminant loadings, are the correlations between a given independent variable and the discriminant scores associated with a given discriminant function. They are used to tell how closely a variable is related to each function in MDA. Looking at all the structure coefficients for a function allows the researcher to assign a label to the dimension it measures, much like factor loadings in factor analysis. A table of structure coefficients of each variable with each discriminant function is called a canonical structure matrix or factor structure matrix. The structure coefficients are whole (not partial) coefficients, similar to correlation coefficients, and reflect the uncontrolled association of the discriminating variables with the criterion variable, whereas the discriminant coefficients are partial coefficients reflecting the unique, controlled association of the discriminating variables with the criterion variable, controlling for other variables in the equation.

Technically, structure coefficients are pooled within-groups correlations between the independent variables and the standardized canonical discriminant functions. When the dependent has more than two categories there will be more than one discriminant function. In that case, there will be multiple columns in the table, one for each function. The correlations then serve like factor loadings in factor analysis: by considering the set of variables that load most heavily on a given dimension, the researcher may infer a suitable label for that dimension. The structure matrix correlations appear in SPSS output in the "Structure Matrix" table, produced by default under Analyze, Classify, Discriminant.

Thus for two-group DA, the structure coefficients show the order of importance of the discriminating variables by total correlation, whereas the standardized discriminant coefficients show the order of importance by unique contribution. The sign of the structure coefficient also shows the direction of the relationship. For multiple discriminant analysis, the structure coefficients additionally allow the researcher to see the relative importance of each independent variable on each dimension.

Structure coefficients vs. standardized discriminant function coefficients. The standardized discriminant function coefficients indicate the semi-partial contribution (the unique, controlled association) of each variable to the discriminant function(s), controlling the independent but not the dependent for other independents entered in the equation (just as regression coefficients are semi-partial coefficients). In contrast, structure coefficients are whole (not partial) coefficients, similar to correlation coefficients, and reflect the uncontrolled association of the discriminating variables with the criterion variable. That is, the structure coefficients indicate the simple correlations between the variables and the discriminant function or functions. The structure coefficients should be used to assign meaningful labels to the discriminant functions. The standardized discriminant function coefficients should be used to assess the importance of each independent variable's unique contribution to the discriminant function.

Mahalanobis distances are used in analyzing cases in discriminant analysis. For instance, one might wish to analyze a new, unknown set of cases in comparison to an existing set of known cases. Mahalanobis distance is the distance between a case and the centroid for each group (of the dependent) in attribute space (n-dimensional space defined by n variables). A case will have one Mahalanobis distance for each group, and it will be classified as belonging to the group for which its Mahalanobis distance is smallest. Thus, the smaller the Mahalanobis distance, the closer the case is to the group centroid and the more likely it is to be classed as belonging to that group. Since Mahalanobis distance is measured in terms of standard deviations from the centroid, a case which is more than 1.96 Mahalanobis distance units from the centroid has less than .05 chance of belonging to the group represented by the centroid; 3 units would likewise correspond to less than .01 chance. SPSS reports squared Mahalanobis distance: click the Classify button and then check "Casewise results."

Wilks' lambda tests the significance of each discriminant function in MDA, specifically the significance of the eigenvalue for a given function. It is a measure of the difference between groups of the centroid (vector) of means on the independent variables. The smaller the lambda, the greater the differences. Lambda varies from 0 to 1, with 0 meaning group means differ (thus the more the variable differentiates the groups), and 1 meaning all group means are the same. The Bartlett's V transformation of lambda is then used to compute the significance of lambda. Wilks' lambda is used, in conjunction with Bartlett's V, as a multivariate significance test of mean differences in MDA, for the case of multiple interval independents and multiple (>2) groups formed by the dependent. Wilks' lambda is sometimes called the U statistic.
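Classification by smallest Mahalanobis distance can be illustrated with a small Python sketch. Everything here is hypothetical: two groups labeled "A" and "B", two predictors, and a made-up pooled covariance matrix. The sketch is restricted to two variables so that the matrix inverse can be written out by hand.

```python
def squared_mahalanobis(x, centroid, cov):
    """Squared Mahalanobis distance for two variables; cov is a 2x2 matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - centroid[0], x[1] - centroid[1]]
    return (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))


def classify(x, centroids, cov):
    """Assign the case to the group with the smallest squared distance."""
    distances = {g: squared_mahalanobis(x, m, cov) for g, m in centroids.items()}
    return min(distances, key=distances.get), distances


# Hypothetical centroids and pooled covariance matrix.
centroids = {"A": (0.0, 0.0), "B": (4.0, 4.0)}
pooled_cov = [[4.0, 0.0], [0.0, 1.0]]

group, d2 = classify((3.0, 1.0), centroids, pooled_cov)
print(group, d2)
```

The case (3, 1) is equally far from both centroids in ordinary Euclidean terms, but because the first variable has the larger variance, its squared Mahalanobis distance to group A (3.25) is much smaller than to group B (9.25), and it is assigned to A.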

Validation

A hold-out sample is often used for validation of the discriminant function. This is a split-halves test, where a portion of the cases is assigned to the analysis sample for purposes of training the discriminant function, which is then validated by assessing its performance on the remaining cases in the hold-out sample.

Discriminant Function Analysis (Two Groups): SPSS Output


Notes

This example is from the SPSS 7.5 "Applications Guide" example for file "gss 93 subset.sav". The dependent is "vote92." The independents are age, educ, income91, sex, and polviews (which is a 7-point Likert scale from "Extremely liberal" to "Extremely conservative"). To obtain this output:

1. File, Open, point to gss 93 subset.sav.
2. Restrict vote92 to 1's and 2's by choosing Data, Select Cases, "If condition is satisfied". Click the If button and enter vote92 < 3. Click Continue, OK.
3. Statistics, Classify, Discriminant.
4. Select vote92 as the "grouping variable" (the dependent). As independents, select age, educ, income91, sex, and polviews. Check "Enter independents together" (i.e., not stepwise).
5. Click on Statistics and check all Descriptives and all Function Coefficients.
6. Click on Classify and check Results (limit to first 10), Summary Table, and all plots.
7. To run, click OK.

Comments in blue are by the instructor and are not part of SPSS output.

Discriminant
First come several blocks of general processing and descriptive statistics information.

Notes

Output Created: 02 Mar 98 14:11:35
Comments:
Input
  Data: Y:\PC\spss95\GSS93 subset.sav
  Filter: vote92 < 3 (FILTER)
  Weight: <none>
  Split File: <none>
  N of Rows in Working Data File: 1452
Missing Value Handling
  Definition of Missing: User-defined missing values are treated as missing in the analysis phase.
  Cases Used: In the analysis phase, cases with no user- or system-missing values for any predictor variable are used. Cases with user-, system-missing, or out-of-range values for the grouping variable are always excluded.
Syntax:
  DISCRIMINANT
    /GROUPS=vote92(1 2)
    /VARIABLES=sex age educ income91 polviews
    /ANALYSIS ALL
    /PRIORS EQUAL
    /STATISTICS=MEAN STDDEV UNIVF BOXM COEFF RAW TABLE
    /PLOT=COMBINED SEPARATE MAP
    /PLOT=CASES(10)
    /CLASSIFY=NONMISSING POOLED .
Resources
  Elapsed Time: 0:00:01.21

Analysis Case Processing Summary

Unweighted Cases                                                                                       N      Percent
Valid                                                                                                  1345   92.6
Excluded: Missing or out-of-range group codes                                                          0      .0
Excluded: At least one missing discriminating variable                                                 107    7.4
Excluded: Both missing or out-of-range group codes and at least one missing discriminating variable    0      .0
Excluded: Total                                                                                        107    7.4
Total                                                                                                  1452   100.0

Group Statistics

Voting in 1992 Election   Variable                                   Mean    Std. Deviation   Valid N (listwise): Unweighted   Weighted
voted                     Respondent's Sex                           1.55    .50              971                              971.000
voted                     Age of Respondent                          47.56   16.73            971                              971.000
voted                     Highest Year of School Completed           13.64   2.97             971                              971.000
voted                     Total Family Income                        15.51   5.00             971                              971.000
voted                     Think of Self as Liberal or Conservative   4.19    1.41             971                              971.000
did not vote              Respondent's Sex                           1.57    .50              374                              374.000
did not vote              Age of Respondent                          41.64   17.34            374                              374.000
did not vote              Highest Year of School Completed           11.84   2.84             374                              374.000
did not vote              Total Family Income                        12.60   5.77             374                              374.000
did not vote              Think of Self as Liberal or Conservative   4.10    1.21             374                              374.000
Total                     Respondent's Sex                           1.56    .50              1345                             1345.000
Total                     Age of Respondent                          45.91   17.10            1345                             1345.000
Total                     Highest Year of School Completed           13.14   3.04             1345                             1345.000
Total                     Total Family Income                        14.70   5.39             1345                             1345.000
Total                     Think of Self as Liberal or Conservative   4.17    1.36             1345                             1345.000

In the ANOVA table below, the smaller the Wilks' lambda, the more important the independent variable to the discriminant function. Wilks' lambda is significant by the F test for age, educ, and income91. We might consider dropping sex and polviews from the model.

Tests of Equality of Group Means

Variable                                   Wilks' Lambda   F         df1   df2    Sig.
Respondent's Sex                           1.000           .418      1     1343   .518
Age of Respondent                          .976            33.197    1     1343   .000
Highest Year of School Completed           .930            101.620   1     1343   .000
Total Family Income                        .941            83.840    1     1343   .000
Think of Self as Liberal or Conservative   .999            1.423     1     1343   .233
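The lambda and F columns in this table carry the same information. For a univariate test, lambda = 1 / (1 + F * df1 / df2), a standard identity that is not stated in the output itself. The Python sketch below applies it to the F values reported above as a consistency check; the short variable names are the SPSS names used in this example.

```python
def lambda_from_f(f, df1, df2):
    """Univariate Wilks' lambda implied by a one-way F statistic."""
    return 1.0 / (1.0 + f * df1 / df2)


# F values from the "Tests of Equality of Group Means" table (df1 = 1, df2 = 1343).
f_values = {
    "sex": 0.418,
    "age": 33.197,
    "educ": 101.620,
    "income91": 83.840,
    "polviews": 1.423,
}

lambdas = {name: lambda_from_f(f, 1, 1343) for name, f in f_values.items()}
for name, lam in lambdas.items():
    print(f"{name:10s} lambda = {lam:.3f}")
```

The computed values agree with the reported Wilks' Lambda column to three decimals, which also confirms that a large F goes with a small lambda.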

Analysis 1 Box's Test of Equality of Covariance Matrices


The larger the log determinant in the table below, the more that group's covariance matrix differs. The "Rank" column indicates the number of independent variables, 5 in this case. Since discriminant analysis assumes homogeneity of covariance matrices between groups, we would like to see the determinants be relatively equal. Box's M, next, tests the homogeneity of covariances assumption.

Log Determinants

Voting in 1992 Election   Rank   Log Determinant
voted                     5      10.006
did not vote              5      9.945
Pooled within-groups      5      10.019

The ranks and natural logarithms of determinants printed are those of the group covariance matrices.

Test Results

Box's M tests the assumption of homogeneity of covariance matrices. This test is also very sensitive to departures from multivariate normality. Discriminant function analysis is robust even when the homogeneity of variances assumption is not met, provided the data do not contain important outliers. For the data below, the test is significant, so we conclude the groups do differ in their covariance matrices, violating an assumption of DA. Note that when n is large, as it is here, small deviations from homogeneity will be found significant, which is why Box's M must be interpreted in conjunction with inspection of the log determinants, above.

Box's M      40.399
F  Approx.   2.679
   df1       15
   df2       2102169.732
   Sig.      .000

Tests null hypothesis of equal population covariance matrices.

Summary of Canonical Discriminant Functions


The table below shows the eigenvalues. The larger the eigenvalue, the more of the variance in the dependent variable is explained by that function. Since the dependent in this example has only two categories, there is only one discriminant function. However, if there were more categories, we would have multiple discriminant functions and this table would list them in descending order of importance. The "% of Variance" column lists the percent of variance explained by each function. The "Cumulative %" column is the cumulative percent of variance explained. The last column is the canonical correlation, where the squared canonical correlation is the percent of variation in the dependent discriminated by the independents in DA. Sometimes this table is used to decide how many functions are important (e.g., eigenvalues over 1, percent of variance more than 5%, cumulative percentage of 75%, canonical correlation of .6). This issue does not arise here since there is only one discriminant function, though we may note its canonical correlation is not high.

Eigenvalues
Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          .164(a)      100.0           100.0          .376

a First 1 canonical discriminant functions were used in the analysis.
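The eigenvalue, the canonical correlation, and the Wilks' lambda that SPSS reports for this function (.859) are tied together by standard identities: the canonical correlation is sqrt(eigenvalue / (1 + eigenvalue)), and Wilks' lambda is the product of 1 / (1 + eigenvalue) over the functions tested. The Python sketch below illustrates this with the rounded eigenvalue from the table; the function names are illustrative and are not part of SPSS, and results agree with the printed values only to within rounding.

```python
import math

def canonical_correlation(eigenvalue):
    """Canonical correlation implied by a discriminant function's eigenvalue."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))

def wilks_lambda(eigenvalues):
    """Wilks' lambda for a set of functions: product of 1 / (1 + eigenvalue)."""
    result = 1.0
    for ev in eigenvalues:
        result *= 1.0 / (1.0 + ev)
    return result

eigenvalue = 0.164  # from the Eigenvalues table (rounded to three decimals)
r_c = canonical_correlation(eigenvalue)
lam = wilks_lambda([eigenvalue])

print(f"canonical correlation = {r_c:.3f}")
print(f"squared canonical correlation = {r_c ** 2:.3f}")
print(f"Wilks' lambda = {lam:.3f}")
```

The squared canonical correlation of roughly .14 is what the text means by a canonical correlation that "is not high."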

This second appearance of Wilks' lambda serves a different purpose than its use in the ANOVA table above. In the table below it tests the significance of the eigenvalue for each discriminant function. In this example there is only one, and it is significant.

Wilks' Lambda
Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                     .859            203.909      5    .000
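The chi-square attached to Wilks' lambda can be approximated by hand. The sketch below assumes SPSS uses Bartlett's approximation, chi-square = -(N - 1 - (p + g)/2) ln(lambda), with N = 1345 cases used in the analysis, p = 5 predictors, and g = 2 groups; that assumption reproduces the reported 203.909 only to within rounding of lambda. The function names are illustrative.

```python
import math

def bartlett_chi_square(wilks_lambda, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for a Wilks' lambda."""
    multiplier = n_cases - 1 - (n_predictors + n_groups) / 2.0
    return -multiplier * math.log(wilks_lambda)

def degrees_of_freedom(n_predictors, n_groups, functions_removed=0):
    """df for testing the functions that remain after removing the first few."""
    return (n_predictors - functions_removed) * (n_groups - functions_removed - 1)

chi2 = bartlett_chi_square(0.859, n_cases=1345, n_predictors=5, n_groups=2)
df = degrees_of_freedom(n_predictors=5, n_groups=2)

print(f"approximate chi-square = {chi2:.1f} on {df} df")
```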

The standardized discriminant function coefficients in the table below serve the same purpose as beta weights in multiple regression: they indicate the relative importance of the independent variables in predicting the dependent.

Standardized Canonical Discriminant Function Coefficients
                                            Function 1
Respondent's Sex                            .011
Age of Respondent                           .657
Highest Year of School Completed            .712
Total family Income                         .423
Think of Self as Liberal or Conservative    .018

The structure matrix table below shows the correlations of each variable with each discriminant function. In this case, there is only one discriminant function. However, when the dependent has more categories there will be more discriminant functions. In that case, there will be additional columns in the table, one for each function. The correlations then serve like factor loadings in factor analysis -- that is, by identifying the largest absolute correlations associated with each discriminant function the researcher gains insight into how to name each function.

Structure Matrix
                                            Function 1
Highest Year of School Completed            .679
Total family Income                         .616
Age of Respondent                           .388
Think of Self as Liberal or Conservative    .080
Respondent's Sex                            -.044
Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions
Variables ordered by absolute size of correlation within function.

The table below contains the unstandardized discriminant function coefficients. These would be used like unstandardized b (regression) coefficients in multiple regression -- that is, they are used to construct the actual prediction equation which can be used to classify new cases.

Canonical Discriminant Function Coefficients
                                            Function 1
Respondent's Sex                            .021
Age of Respondent                           .039
Highest Year of School Completed            .243
Total family Income                         .081
Think of Self as Liberal or Conservative    .013
(Constant)                                  -6.253
Unstandardized coefficients
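As a minimal sketch of how the prediction equation is built, the Python below computes a discriminant score as the constant plus the sum of each unstandardized coefficient times the case's raw value. The short variable names and the example respondent (including how each variable is coded) are assumptions made for illustration; they do not come from the data file.

```python
# Unstandardized canonical discriminant function coefficients (Function 1).
COEFFICIENTS = {
    "sex": 0.021,
    "age": 0.039,
    "educ": 0.243,
    "income": 0.081,
    "polviews": 0.013,
}
CONSTANT = -6.253

def discriminant_score(case):
    """Return the constant plus the sum of coefficient * raw value."""
    return CONSTANT + sum(COEFFICIENTS[name] * case[name] for name in COEFFICIENTS)

# Hypothetical respondent; the values and their coding are assumptions.
example_case = {"sex": 1, "age": 50, "educ": 16, "income": 15, "polviews": 4}
score = discriminant_score(example_case)

print(f"discriminant score = {score:.3f}")
```

The resulting score is then compared with the cutting point derived from the group centroids.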

The table below is used to establish the cutting point for classifying cases. If the two groups are of equal size, the best cutting point is halfway between the values of the function at the group centroids (that is, their average). If the groups are unequal, the optimal cutting point is the weighted average of the two values. Because the centroid for "voted" (.251) is higher than the centroid for "did not vote" (-.653), cases which evaluate on the function above the cutting point are classified as "voted," while those evaluating below the cutting point are classified as "did not vote." Of course, the computer does the classification automatically, so these values are for informational purposes.

Functions at Group Centroids
Voting in 1992 Election    Function 1
voted                      .251
did not vote               -.653
Unstandardized canonical discriminant functions evaluated at group means
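The midpoint rule can be sketched as follows. Since the "voted" centroid (.251) lies above the "did not vote" centroid (-.653), a score above the cutting point is assigned to "voted." This illustration covers only the equal-weight midpoint; the weighted cutting point for unequal groups is not shown, and the function names are hypothetical.

```python
# Function 1 evaluated at the group means (Functions at Group Centroids).
CENTROIDS = {"voted": 0.251, "did not vote": -0.653}

def midpoint_cutoff(centroids):
    """Cutting point halfway between the two group centroids."""
    values = list(centroids.values())
    return sum(values) / len(values)

def classify(score, centroids, cutoff):
    """Assign the group whose centroid lies on the same side of the cutoff."""
    high_group = max(centroids, key=centroids.get)
    low_group = min(centroids, key=centroids.get)
    return high_group if score > cutoff else low_group

cutoff = midpoint_cutoff(CENTROIDS)
print(f"midpoint cutting point = {cutoff:.3f}")
for score in (0.8, -0.3):
    print(score, "->", classify(score, CENTROIDS, cutoff))
```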

Classification Statistics
The table below just tells the researcher about the status of cases in terms of processing.

Classification Processing Summary
Processed                                                    1452
Excluded   Missing or out-of-range group codes               0
           At least one missing discriminating variable      107
Used in Output                                               1345

Prior probabilities, shown below, are used in classification. They can be computed from the observed group sizes (marginals) in your sample, which is advisable when the groups differ in size, or set equal for all groups, which is a reasonable alternative when each group is of the same size. Note that the output below shows equal priors of .500 for each group, even though the groups differ in size (971 vs. 374 cases).

Prior Probabilities for Groups
                                     Cases Used in Analysis
Voting in 1992 Election    Prior     Unweighted    Weighted
voted                      .500      971           971.000
did not vote               .500      374           374.000
Total                      1.000     1345          1345.000

The table below is the result of checking "Fisher's" under "Function Coefficients" in the "Statistics" option of discriminant analysis. Two sets (one for each dependent group) of unstandardized linear discriminant coefficients are calculated, which can be used to classify cases. This is the classical method of classification, though now little used.

Classification Function Coefficients
                                            Voting in 1992 Election
                                            voted       did not vote
Respondent's Sex                            7.048       7.029
Age of Respondent                           .250        .215
Highest Year of School Completed            1.884       1.664
Total family Income                         .319        .246
Think of Self as Liberal or Conservative    2.187       2.175
(Constant)                                  -32.013     -26.541
Fisher's linear discriminant functions
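To show how Fisher's functions classify, the sketch below evaluates both classification functions for a case and assigns the group with the larger value. The short variable names and the two example respondents are hypothetical, as is the coding assumed for each variable.

```python
# Fisher's linear discriminant (classification) function coefficients.
FISHER = {
    "voted": {
        "sex": 7.048, "age": 0.250, "educ": 1.884,
        "income": 0.319, "polviews": 2.187, "constant": -32.013,
    },
    "did not vote": {
        "sex": 7.029, "age": 0.215, "educ": 1.664,
        "income": 0.246, "polviews": 2.175, "constant": -26.541,
    },
}

def classification_scores(case):
    """Evaluate each group's classification function for one case."""
    return {
        group: coef["constant"] + sum(coef[name] * value for name, value in case.items())
        for group, coef in FISHER.items()
    }

def predict(case):
    """Assign the case to the group with the highest classification score."""
    scores = classification_scores(case)
    return max(scores, key=scores.get)

# Hypothetical respondents; values and coding are assumptions.
older_case = {"sex": 1, "age": 50, "educ": 16, "income": 15, "polviews": 4}
younger_case = {"sex": 1, "age": 20, "educ": 10, "income": 5, "polviews": 4}

for label, case in (("older", older_case), ("younger", younger_case)):
    print(label, classification_scores(case), "->", predict(case))
```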

The table below results from checking "Casewise results" in the "Classify" options of discriminant function analysis. The table lists the actual group; the predicted group, based on the largest posterior probability; the conditional probability P(D>d | G=g) (the probability of the observed discriminant score given membership in the predicted group); the posterior probability P(G=g | D=d) (the chance the case belongs to the predicted group, based on the independents); the squared Mahalanobis distance of the case to the group centroid (large values indicate outliers); and the discriminant score for the case. The case is classified based on the discriminant score in relation to the cutoff (not shown). Misclassified cases are marked with asterisks. The "Second Highest Group" columns show the posterior probabilities and Mahalanobis distances for the case had it been classed based on the second highest posterior probability. Since there are only two groups in this example, the "second highest" is equivalent to the "other" group.

Casewise Statistics

Original
                  Highest Group                                          Second Highest Group              Discriminant Scores
Case     Actual   Predicted   P(D>d | G=g)        P(G=g | D=d)   Squared Mahalanobis      Group   P(G=g | D=d)   Squared Mahalanobis      Function 1
Number   Group    Group       p          df                      Distance to Centroid                            Distance to Centroid
1        1        2(**)       .774       1        .537           .082                     1       .463           .381                     -.3
2        1        1           .561       1        .718           .339                     2       .282           2.208                    .8
3        1        1           .528       1        .727           .399                     2       .273           2.358                    .8
4        1        1           .445       1        .750           .583                     2       .250           2.780                    1.0
5        1        1           .015       1        .932           5.941                    2       .068           11.165                   2.6
6        1        1           .622       1        .701           .243                     2       .299           1.951                    .7
7        1        1           .878       1        .634           .024                     2       .366           1.119                    .4
8        2        1(**)       .835       1        .645           .044                     2       .355           1.238                    .4
9        1        1           .390       1        .766           .738                     2       .234           3.107                    1.1
10       1        1           .430       1        .755           .624                     2       .245           2.870                    1.0
** Misclassified case
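The posterior probabilities in the casewise table can be checked from the squared Mahalanobis distances. The sketch below assumes normal densities with a pooled covariance matrix and the equal priors (.500 each) shown in the Prior Probabilities table, so that each group's weight is prior * exp(-D²/2); under those assumptions it reproduces the printed posteriors for cases 1 and 5 to within rounding. The function name is illustrative.

```python
import math

def posterior_probabilities(squared_distances, priors):
    """Posterior P(G=g | D=d) from squared Mahalanobis distances and priors."""
    weights = {
        group: priors[group] * math.exp(-0.5 * d2)
        for group, d2 in squared_distances.items()
    }
    total = sum(weights.values())
    return {group: weight / total for group, weight in weights.items()}

PRIORS = {1: 0.5, 2: 0.5}  # equal priors, as in the Prior Probabilities table

# Squared distances to the centroids of groups 1 and 2, from the casewise table.
case_1 = posterior_probabilities({1: 0.381, 2: 0.082}, PRIORS)
case_5 = posterior_probabilities({1: 5.941, 2: 11.165}, PRIORS)

print("case 1:", {g: round(p, 3) for g, p in case_1.items()})
print("case 5:", {g: round(p, 3) for g, p in case_5.items()})
```

Case 1 is nearly equidistant from the two centroids, which is why its posterior for the predicted group is only slightly above one half.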

Separate-Groups Graphs
The charts below result from checking "Combined-groups" and "Separate-groups" under "Plots" in the "Classify" options of discriminant analysis. If there were two or more discriminant functions, the charts below would be scatterplots showing the relation of the first two discriminant functions. As this example has only one discriminant function, bar charts are displayed instead. In a good discriminant function, the bar chart will have most cases near the mean, with small tails.


The table below is used to assess how well the discriminant function works, and if it works equally well for each group of the dependent variable. Here it correctly classifies about two-thirds of the cases, making about the same proportion of mistakes for both categories. This would normally not be considered a satisfactory level of discrimination and the researcher would seek to test other models.

Classification Results(a)
                                              Predicted Group Membership
                   Voting in 1992 Election    voted     did not vote     Total
Original   Count   voted                      649       322              971
                   did not vote               121       253              374
           %       voted                      66.8      33.2             100.0
                   did not vote               32.4      67.6             100.0
a 67.1% of original grouped cases correctly classified.
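The percentages in the classification table follow directly from the counts. The short sketch below recomputes the overall hit rate and the hit rate within each actual group; the function names are illustrative.

```python
# Rows are actual groups, columns are predicted groups (counts from the table).
CONFUSION = {
    "voted": {"voted": 649, "did not vote": 322},
    "did not vote": {"voted": 121, "did not vote": 253},
}

def overall_hit_rate(table):
    """Share of all cases that fall on the diagonal (correctly classified)."""
    correct = sum(table[group][group] for group in table)
    total = sum(sum(row.values()) for row in table.values())
    return correct / total

def group_hit_rates(table):
    """Share of each actual group that was classified into its own group."""
    return {group: row[group] / sum(row.values()) for group, row in table.items()}

overall = overall_hit_rate(CONFUSION)
by_group = group_hit_rates(CONFUSION)

print(f"overall: {100 * overall:.1f}%")
for group, rate in by_group.items():
    print(f"{group}: {100 * rate:.1f}%")
```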


Discriminant Function Analysis (Three Groups): SPSS Output


Notes
This example is from the SPSS 7.5 "Applications Guide" example for file "gss 93 subset.sav". The dependent is "race." The independents are agewed, educ, rincom91, sibs, rap, polviews (which is a 7-point Likert scale from "extremely liberal" to "extremely conservative"), and marital. To obtain this output:
1. File, Open, point to gss 93 subset.sav.
2. Statistics, Classify, Discriminant.
3. Select race as the "grouping variable" (the dependent). As independents, select agewed, educ, rincom91, sibs, rap, polviews, and marital. Check "Enter independents together" (i.e., not stepwise).
4. Click on Statistics and check Univariate ANOVA and Box's M.
5. Click on Classify and check Compute from group sizes, Summary table, and all plots.
6. To run, click OK.
Comments in blue are by the instructor and are not part of SPSS output.

Discriminant
First come several blocks of general processing and descriptive statistics information.

Notes
Output Created                                             03 Mar 98 13:12:51
Comments
Input                    Data                              Y:\PC\spss95\GSS93 subset.sav
                         Filter                            <none>
                         Weight                            <none>
                         Split File                        <none>
                         N of Rows in Working Data File    1500
Missing Value Handling   Definition of Missing             User-defined missing values are treated as missing in the analysis phase.
                         Cases Used                        In the analysis phase, cases with no user- or system-missing values for any predictor variable are used. Cases with user-, system-missing, or out-of-range values for the grouping variable are always excluded.
Syntax                   DISCRIMINANT
                           /GROUPS=race(1 3)
                           /VARIABLES=agewed educ rincom91 sibs rap polviews marital
                           /ANALYSIS ALL
                           /PRIORS SIZE
                           /STATISTICS=UNIVF BOXM TABLE
                           /PLOT=COMBINED SEPARATE MAP
                           /CLASSIFY=NONMISSING POOLED .
Resources                Elapsed Time                      0:00:02.20

Analysis Case Processing Summary

Unweighted Cases                                                              N       Percent
Valid                                                                         732     48.8
Excluded   Missing or out-of-range group codes                                0       .0
           At least one missing discriminating variable                       768     51.2
           Both missing or out-of-range group codes and
           at least one missing discriminating variable                       0       .0
           Total                                                              768     51.2
Total                                                                         1500    100.0

Group Statistics
Within each group the valid N (listwise) is the same for all seven variables (Age When First Married, Highest Year of School Completed, Respondent's Income, Number of Brothers and Sisters, Rap Music, Think of Self as Liberal or Conservative, Marital Status):

                      Valid N (listwise)
Race of Respondent    Unweighted    Weighted
white                 623           623.000
black                 73            73.000
other                 36            36.000
Total                 732           732.000

In the ANOVA table below, the smaller the Wilks' lambda, the more important the independent variable to the discriminant function. Wilks' lambda is significant by the F test for all variables except rincom91 and polviews, which we might consider dropping from the model.

Tests of Equality of Group Means
                                            Wilks' Lambda   F        df1   df2   Sig.
Age When First Married                      .992            3.118    2     729   .045
Highest Year of School Completed            .990            3.648    2     729   .027
Respondent's Income                         .996            1.459    2     729   .233
Number of Brothers and Sisters              .937            24.567   2     729   .000
Rap Music                                   .946            20.793   2     729   .000
Think of Self as Liberal or Conservative    .994            2.221    2     729   .109
Marital Status                              .981            6.992    2     729   .001
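For a single predictor, Wilks' lambda is the ratio of within-groups to total sum of squares, so the F statistic can be recovered as F = ((1 - lambda) / lambda) * (df2 / df1). The sketch below applies this to three rows of the table; because the printed lambdas are rounded to three decimals, the recovered F values match the reported ones only approximately. The function name is illustrative.

```python
def f_from_wilks(wilks_lambda, df1, df2):
    """Univariate F implied by Wilks' lambda for one predictor."""
    return ((1.0 - wilks_lambda) / wilks_lambda) * (df2 / df1)

# (Wilks' lambda, reported F) pairs from Tests of Equality of Group Means.
REPORTED = {
    "Number of Brothers and Sisters": (0.937, 24.567),
    "Rap Music": (0.946, 20.793),
    "Respondent's Income": (0.996, 1.459),
}

for name, (lam, reported_f) in REPORTED.items():
    approx_f = f_from_wilks(lam, df1=2, df2=729)
    print(f"{name}: F from lambda = {approx_f:.2f}, reported F = {reported_f}")
```

This makes concrete why a smaller lambda corresponds to a more important variable: as lambda falls, the implied F rises.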

Analysis 1 Box's Test of Equality of Covariance Matrices


Log Determinants
Race of Respondent      Rank   Log Determinant
white                   7      10.002
black                   7      12.168
other                   7      11.980
Pooled within-groups    7      10.537
The ranks and natural logarithms of determinants printed are those of the group covariance matrices.

Box's M tests the assumption of homogeneity of covariance matrices. The test is also very sensitive to departures from multivariate normality. For the data below, the test is significant, so we conclude the groups do differ in their covariance matrices, violating an assumption of DA. However, discriminant function analysis is robust even when the homogeneity of variances assumption is not met, provided the data do not contain important outliers. Also, when n is large, as it is here, small deviations from homogeneity will be found significant.

Test Results
Box's M              165.317
F        Approx.       2.792
         df1              56
         df2       32394.124
         Sig.           .000
Tests null hypothesis of equal population covariance matrices.

Summary of Canonical Discriminant Functions


The number of discriminant functions computed is the lesser of g - 1 (number of dependent groups minus 1) or k (the number of independent variables). Since the dependent, race, has three groups and there are seven independents, the number of discriminant functions computed is two. The eigenvalues show how much of the variance in the dependent, race, is accounted for by each of the functions. To attach meaning to the functions (like to factors in factor analysis) we will use the structure matrix later in the output. Wilks' lambda shows each function is significant.

Eigenvalues
Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          .145(a)      87.6            87.6           .356
2          .021(a)      12.4            100.0          .142
a First 2 canonical discriminant functions were used in the analysis.

Wilks' Lambda
Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1 through 2           .856            112.950      14   .000
2                     .980            14.783       6    .022
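The rule for the number of functions, and the "% of Variance" column, can be sketched as follows. Each function's share is its eigenvalue divided by the sum of the eigenvalues; because the printed eigenvalues are rounded, the shares computed here differ from the reported 87.6 and 12.4 by a fraction of a percentage point. The function names are illustrative.

```python
def number_of_functions(n_groups, n_predictors):
    """Lesser of (groups - 1) and the number of independent variables."""
    return min(n_groups - 1, n_predictors)

def percent_of_variance(eigenvalues):
    """Each function's eigenvalue as a percentage of the eigenvalue total."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]

n_functions = number_of_functions(n_groups=3, n_predictors=7)
shares = percent_of_variance([0.145, 0.021])  # rounded eigenvalues from the table

print(f"functions computed: {n_functions}")
for index, share in enumerate(shares, start=1):
    print(f"function {index}: about {share:.1f}% of variance")
```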

The standardized discriminant function coefficients in the table below serve the same purpose as beta weights in multiple regression: they indicate the relative importance of the independent variables in predicting the dependent.

Standardized Canonical Discriminant Function Coefficients
                                            Function
                                            1        2
Age When First Married                      .147     .579
Highest Year of School Completed            -.211    -.003
Respondent's Income                         .071     -.255
Number of Brothers and Sisters              .674     .246
Rap Music                                   -.644    .135
Think of Self as Liberal or Conservative    -.193    .086
Marital Status                              .190     -.716

The structure matrix table below shows the correlations of each variable with each discriminant function. The correlations serve like factor loadings in factor analysis -- that is, by identifying the largest absolute correlations associated with each discriminant function the researcher gains insight into how to name each function.

Structure coefficients vs. standardized discriminant function coefficients. The standardized discriminant function coefficients (above) indicate the partial contribution of each variable to the discriminant function(s), controlling for other independents entered in the equation. The structure coefficients (below) indicate the simple correlations between the variables and the discriminant function or functions. The structure coefficients should be used to assign meaningful labels to the discriminant functions. The standardized discriminant function coefficients should be used to assess each independent variable's unique contribution to the discriminant function.


As you can see from the example below, it is not easy to assign a meaningful label to each function. The first and most important function has to do with siblings, rap music, education, and political views. The second dimension (function) has to do with age married, marital status, and income. Could these functions be labeled "culture" and "marriage"?

Structure Matrix
                                            Function
                                            1           2
Number of Brothers and Sisters              .675(*)     .270
Rap Music                                   -.626(*)    .112
Highest Year of School Completed            -.260(*)    .099
Think of Self as Liberal or Conservative    -.200(*)    .117
Marital Status                              .232        -.744(*)
Age When First Married                      .105        .581(*)
Respondent's Income                         -.156       -.156(*)
Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions
Variables ordered by absolute size of correlation within function.
* Largest absolute correlation between each variable and any discriminant function

The table below is used to establish the cutting points for classifying cases. The optimal cutting point is the weighted average of the paired values. The cutting points set ranges of the discriminant score to classify cases as white, black, or other. Of course, the computer does the classification automatically, so these values are for informational purposes.

Functions at Group Centroids
                      Function
Race of Respondent    1        2
white                 -.158    -6.990E-03
black                 .982     -.219
other                 .738     .565
Unstandardized canonical discriminant functions evaluated at group means

Classification Statistics
The tables below just tell the researcher about the status of cases in terms of processing.

Classification Processing Summary
Processed                                                    1500
Excluded   Missing or out-of-range group codes               0
           At least one missing discriminating variable      768
Used in Output                                               732

Prior Probabilities for Groups
                                Cases Used in Analysis
Race of Respondent    Prior     Unweighted    Weighted
white                 .851      623           623.000
black                 .100      73            73.000
other                 .049      36            36.000
Total                 1.000     732           732.000
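The priors in this run come from the group sizes (the /PRIORS SIZE subcommand in the syntax above), so each prior is simply the group's share of the 732 cases used. The sketch below recomputes them and, for contrast, shows what equal priors would be; the function names are illustrative.

```python
# Unweighted cases used in the analysis, by group.
GROUP_SIZES = {"white": 623, "black": 73, "other": 36}

def priors_from_sizes(sizes):
    """Prior for each group = its share of the cases used in the analysis."""
    total = sum(sizes.values())
    return {group: n / total for group, n in sizes.items()}

def equal_priors(sizes):
    """Alternative: treat every group as equally likely beforehand."""
    return {group: 1.0 / len(sizes) for group in sizes}

size_priors = priors_from_sizes(GROUP_SIZES)
flat_priors = equal_priors(GROUP_SIZES)

for group in GROUP_SIZES:
    print(f"{group}: size-based {size_priors[group]:.3f}, equal {flat_priors[group]:.3f}")
```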

The territorial map below is a plot of the boundaries used for classifying cases into groups based on discriminant function scores. It is obtained by checking "Territorial map" in the "Classify" options of discriminant analysis. For the meaning of the symbols, note the legend below the map. For instance, where one sees "13" near the top of the map, this is a point in discriminant space where group 1 (whites) are differentiated from group 3 (other) on the two functions.
[Territorial Map: Canonical Discriminant Function 1 (horizontal axis, -3.0 to 3.0) by Canonical Discriminant Function 2 (vertical axis, -3.0 to 3.0). The upper rows of the map show the "13" boundary, the lower rows show the "12" boundary, and asterisks mark the group centroids.]

Symbols used in territorial map
Symbol   Group   Label
1        1       white
2        2       black
3        3       other
*                Indicates a group centroid

Separate-Groups Graphs
The charts below result from checking "Combined-groups" and "Separate-groups" under "Plots" in the "Classify" options of discriminant analysis. Since there are two or more discriminant functions, the charts are scatterplots showing the discriminant scores of the cases on the two discriminant functions. The first three charts show this separately for each of the three race groups, and the fourth chart shows the same information for the combined groups.


The table below is used to assess how well the discriminant function works, and if it works equally well for each group of the dependent variable. Here it correctly classifies about 85% of the cases, but this is not as good as it seems. DA gets almost all whites correctly classified. However, it misclassifies most of the "black" and "other" cases. The seemingly high 85% rating is obtained by classifying nearly everyone as white in a sample which is preponderantly white. This is not a satisfactory discriminant analysis. It would be better to train DA on an analysis set which was balanced in terms of numbers of people in each race group.

Classification Results(a)
                                         Predicted Group Membership
                   Race of Respondent    white    black    other    Total
Original   Count   white                 616      6        1        623
                   black                 64       8        1        73
                   other                 33       2        1        36
           %       white                 98.9     1.0      .2       100.0
                   black                 87.7     11.0     1.4      100.0
                   other                 91.7     5.6      2.8      100.0
a 85.4% of original grouped cases correctly classified.
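The point that 85% "is not as good as it seems" can be made concrete by comparing the overall hit rate with the rate obtained by assigning every case to the largest group. The sketch below uses the counts from the classification table; the baseline comparison and the function names are illustrative additions, not part of the SPSS output.

```python
# Rows are actual groups, columns are predicted groups (counts from the table).
CONFUSION = {
    "white": {"white": 616, "black": 6, "other": 1},
    "black": {"white": 64, "black": 8, "other": 1},
    "other": {"white": 33, "black": 2, "other": 1},
}

def hit_rate(table):
    """Share of all cases classified into their actual group."""
    correct = sum(table[group][group] for group in table)
    total = sum(sum(row.values()) for row in table.values())
    return correct / total

def majority_baseline(table):
    """Hit rate from assigning every case to the largest actual group."""
    sizes = [sum(row.values()) for row in table.values()]
    return max(sizes) / sum(sizes)

overall = hit_rate(CONFUSION)
baseline = majority_baseline(CONFUSION)
per_group = {group: row[group] / sum(row.values()) for group, row in CONFUSION.items()}

print(f"overall hit rate: {100 * overall:.1f}%")
print(f"assign-everyone-to-largest-group baseline: {100 * baseline:.1f}%")
for group, rate in per_group.items():
    print(f"{group}: {100 * rate:.1f}% correctly classified")
```

The overall rate exceeds the baseline by well under one percentage point, while the hit rates for the two smaller groups are very low.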
