
Discriminant Analysis and Classification

Discriminant Analysis as a Type of MANOVA


The good news about DA is that it is a lot like MANOVA; in fact, in the case of a factor with only two levels, it is the same thing.

It has the same assumptions as MANOVA: multivariate normality, independence of cases, and homogeneity of group covariances.

DA permits a multivariate analysis of variance test of the hypothesis that two or more groups (conditions, levels) differ significantly on a linear combination of discriminating variables. Another way to put this is: how well can the levels of the grouping variable be discriminated by scores on the discriminating variables?

In general it's good to use naturally occurring groups that are mutually exclusive and exhaustive of the domain, rather than median splits or arbitrary divisions.

Discriminant Analysis as a Type of MANOVA, contd


In the case where there are more than two groups, DA permits you to test the hypothesis that there is more than one significant way of describing how the groups differ on a weighted linear combination of the discriminating variables. You can think of these combinations, called canonical variables, as dimensions of difference. These variables will be uncorrelated with each other.

This way of using DA is called descriptive discriminant analysis.

Discriminant Analysis as Part of a System for Classifying Cases


Usually discriminant analysis is presented conceptually in an upside-down sort of way: what you would traditionally think of as dependent variables are actually the predictor variables, and the groups, rather than being the levels of the IV, are the categories whose membership is being predicted.

When it is used in this way, the hypothesis you are testing is that there is a linear combination of variables which, when appropriately weighted (like beta weights), will maximally discriminate between members of two or more groups and permit new cases to be classified into the groups.

In this mode, called predictive discriminant analysis, DA is used to develop a classification rule that will permit things like classifying people as potential Republican voters or not, or predicting whether they will be able to complete four years of college, or to pay their car loan.

Discriminant Analysis as Part of a System for Classifying Cases, contd


Discriminant analysis is part of the general linear model and combines some of the features familiar to you from multiple regression and some from MANOVA. It's basically multiple regression where the criterion variable is nominal rather than interval/ratio level.

When DA is used in this predictive way, it is usually followed up by classification procedures to classify new cases based on the obtained discriminant function(s).

Discriminant Analysis and MANOVA


Let's work through an example of discriminant analysis and show how it can approach a question from two sides: testing a MANOVA hypothesis and predicting group membership.

First let's consider the hypothesis that a nation's level of concentration of wealth (in the hands of a few, more widely distributed, or somewhere in between) has a significant impact on four dependent variables: human development score, political rights score, the gini (inequality) index, and civil liberties score.

Discriminant Analysis and MANOVA, contd


Note. In creating these three wealth concentration groups out of interval-level data I am not advocating this practice, only creating groups for purposes of illustration. Naturally occurring, clearly separated groups (e.g., males and females, or people who survived five years after diagnosis and people who didn't) are preferred for the grouping variable.

This sounds like a hypothesis that could be tested with MANOVA, and it is, but it can also be tested with discriminant analysis.

First let's look at what MANOVA will tell us about this hypothesis.

MANOVA test of the Hypothesis


Multivariate Tests (d)

Effect     Test                 Value    F            Hypothesis df  Error df  Sig.  Partial Eta Squared  Noncent. Parameter  Observed Power (a)
Intercept  Pillai's Trace        .980    467.961 (b)  4.000          39.000    .000  .980                 1871.844            1.000
Intercept  Wilks' Lambda         .020    467.961 (b)  4.000          39.000    .000  .980                 1871.844            1.000
Intercept  Hotelling's Trace   47.996    467.961 (b)  4.000          39.000    .000  .980                 1871.844            1.000
Intercept  Roy's Largest Root  47.996    467.961 (b)  4.000          39.000    .000  .980                 1871.844            1.000
WCONCENT   Pillai's Trace        .880      7.857      8.000          80.000    .000  .440                   62.852            1.000
WCONCENT   Wilks' Lambda         .205     11.793 (b)  8.000          78.000    .000  .547                   94.344            1.000
WCONCENT   Hotelling's Trace    3.468     16.473      8.000          76.000    .000  .634                  131.787            1.000
WCONCENT   Roy's Largest Root   3.344     33.443 (c)  4.000          40.000    .000  .770                  133.772            1.000

a. Computed using alpha = .05
b. Exact statistic
c. The statistic is an upper bound on F that yields a lower bound on the significance level.
d. Design: Intercept+WCONCENT

Here we see that the hypothesis is confirmed: a country's wealth concentration has a significant main effect on the set of four indicators.

Univariate F Tests of the Four Variables

As you can note from the output, the univariate F tests for each of the four variables are all significant at p < .001. But what this output doesn't tell us is what sort of combination of these four variables the countries differ on, or whether there is more than one combination on which they are significantly different.

More than MANOVA: Additional Information from Discriminant Analysis


Here is some of the additional information we can get from a discriminant analysis to help us understand the relationship between a country's concentration of wealth and the four variables.

DA transforms the original variables into one or more new variables, called canonical variables, that combine the four separate variables, appropriately weighted, into a new, single index which maximally discriminates between the countries in terms of concentration of wealth. That is, the procedure looks for a set of weights (the discriminant function) to apply to the discriminating variables that produces as much separation as possible among the levels of the grouping variable.

In the case of more than two levels of the grouping variable (for instance, concentration of wealth), there may be one or more additional ways of weighting and combining the variables (resulting in one or more additional canonical variables) that will maximize how the groups differ.

Number of Functions Extracted in DA

The discriminant analysis procedure extracts a maximum of m (the number of discriminating variables) or k - 1 underlying dimensions, or canonical discriminant functions, whichever is smaller, where k is the number of groups or categories of the nominal-level variable. For example, we have three categories of country's wealth concentration, so two of these functions are extracted. Think of the total amount of variation in country's wealth concentration that you could predict with one or more different combinations of the four variables (gini index, civil liberties score, etc.) as 100%. The first new canonical variable (weighted combination of the four) accounts for 96.4% of it, and the second canonical variable for the remaining 3.6%. Combining these two improves the prediction.

Wilks' Lambda

Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
1 through 2          .205           64.215      8   .000
2                    .890            4.726      3   .193

Here's Wilks' lambda again. Combining both discriminant functions allows you to predict all but .205 of the variation in level of wealth concentration.

Eigenvalues

Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1         3.344 (a)   96.4            96.4         .877
2          .124 (a)    3.6           100.0         .332

a. First 2 canonical discriminant functions were used in the analysis.

Of the variance explained in wealth concentration, 96.4% was explained by the first function and 3.6% by the second one. Some variance of course remains unexplained.
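The figures in the two tables above are tied together by the eigenvalues. The following is a minimal Python sketch, not SPSS's own code, of the standard relationships; the function names are made up for illustration, and the only inputs are the two eigenvalues reported above.

```python
import math

# Eigenvalues of the two canonical discriminant functions (from the Eigenvalues table).
EIGENVALUES = [3.344, 0.124]


def max_functions(n_variables, n_groups):
    """Number of functions extracted: the smaller of m and k - 1."""
    return min(n_variables, n_groups - 1)


def percent_of_variance(eigenvalues):
    """Share of the explained variation carried by each function, in percent."""
    total = sum(eigenvalues)
    return [100.0 * e / total for e in eigenvalues]


def canonical_correlations(eigenvalues):
    """Canonical correlation of each function: sqrt(eigenvalue / (1 + eigenvalue))."""
    return [math.sqrt(e / (1.0 + e)) for e in eigenvalues]


def wilks_lambda(eigenvalues):
    """Wilks' lambda for a set of functions: product of 1 / (1 + eigenvalue)."""
    result = 1.0
    for e in eigenvalues:
        result /= 1.0 + e
    return result


print("functions extracted:", max_functions(n_variables=4, n_groups=3))
print("% of variance:", [round(p, 1) for p in percent_of_variance(EIGENVALUES)])
print("canonical correlations:", [round(r, 3) for r in canonical_correlations(EIGENVALUES)])
print("lambda, functions 1 through 2:", round(wilks_lambda(EIGENVALUES), 3))
print("lambda, function 2 alone:", round(wilks_lambda(EIGENVALUES[1:]), 3))
```

Because the eigenvalues are rounded to three decimals in the output, the computed values should agree with the tables only up to rounding.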

Statistics Associated with the Two Discriminant Functions


Note that associated with each of these two functions is a level of Wilks' lambda. From the Wilks' lambda table, we can see that lambda is big (.89) for just the second canonical discriminant function, which means that using that combination of weights on the four dependent variables leaves about 89% of the variance in country's wealth concentration unexplained. But when you add the first function to the predictive equation, you reduce the unexplained variance to only about 20% (.205). The second function isn't significant, but the combination of the two is. This value of Wilks' lambda (.205) is the one that is tested for significance in the overall MANOVA test (see the WCONCENT rows of the Multivariate Tests table).

Wilks' Lambda

Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
1 through 2          .205           64.215      8   .000
2                    .890            4.726      3   .193

Eigenvalues

Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1         3.344 (a)   96.4            96.4         .877
2          .124 (a)    3.6           100.0         .332

a. First 2 canonical discriminant functions were used in the analysis.

Two other values that you see in the output are the eigenvalue and the canonical correlation. The eigenvalue can be interpreted as the amount of between-group variation on its discriminant function relative to the within-group variation. The canonical correlation is the correlation between the new canonical variable, formed by applying the weights from the discriminant function to the four predictors, and the levels of wealth concentration.
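To make the link between the DA output and the MANOVA table concrete, here is a small Python sketch that rebuilds the four multivariate statistics and the chi-square tests from the two eigenvalues. It rests on two assumptions that are not stated on the slides: that the chi-square values come from Bartlett's approximation, and that N = 45 (the 18 + 7 + 20 grouped cases in the classification table). The function names are illustrative only.

```python
import math

EIGENVALUES = [3.344, 0.124]   # from the Eigenvalues table
N_CASES = 45                   # assumption: 18 + 7 + 20 grouped cases
N_VARIABLES = 4                # discriminating variables
N_GROUPS = 3                   # levels of wealth concentration


def manova_statistics(eigenvalues):
    """The four multivariate test statistics expressed through the eigenvalues."""
    return {
        "wilks": math.prod(1.0 / (1.0 + e) for e in eigenvalues),
        "pillai": sum(e / (1.0 + e) for e in eigenvalues),
        "hotelling": sum(eigenvalues),
        "roy": max(eigenvalues),
    }


def bartlett_chi_square(eigenvalues, removed, n, p, k):
    """Chi-square test of the functions left after `removed` functions are set aside.

    Assumes Bartlett's approximation:
    chi2 = -(n - 1 - (p + k) / 2) * ln(lambda), df = (p - removed) * (k - 1 - removed).
    """
    lam = math.prod(1.0 / (1.0 + e) for e in eigenvalues[removed:])
    chi2 = -(n - 1 - (p + k) / 2.0) * math.log(lam)
    df = (p - removed) * (k - 1 - removed)
    return chi2, df


for name, value in manova_statistics(EIGENVALUES).items():
    print(f"{name}: {value:.3f}")

for removed in (0, 1):
    chi2, df = bartlett_chi_square(EIGENVALUES, removed, N_CASES, N_VARIABLES, N_GROUPS)
    print(f"functions {removed + 1} through 2: chi-square = {chi2:.3f}, df = {df}")
```

The results should match the Value column of the WCONCENT rows and the chi-square column of the Wilks' lambda table to within the rounding of the published eigenvalues.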

Standardized and Unstandardized Canonical Discriminant Function Coefficients


Standardized Canonical Discriminant Function Coefficients

Variable                                                   Function 1  Function 2
human devel score: hi=more                                 -.203        .689
Political rights score                                     -.528       -.437
Civil liberties score                                       .033        .641
Gini index: 0=perfect $ equality, 100=perfect inequality    .884        .482

Canonical Discriminant Function Coefficients (unstandardized coefficients)

Variable                                                   Function 1  Function 2
human devel score: hi=more                                 -1.240       4.207
Political rights score                                      -.366       -.303
Civil liberties score                                        .027        .535
Gini index: 0=perfect $ equality, 100=perfect inequality     .126        .069
(Constant)                                                 -2.384      -7.167

The standardized and unstandardized canonical discriminant function coefficients are like the beta and b weights in multiple regression. The unstandardized coefficients, which come with a constant, are like the b weights and the intercept: you use them with raw scores to classify new cases as to country's wealth concentration. The standardized coefficients put all the variables on the same scale, so the weights can be compared to determine the relative importance of each variable in explaining group separation (differences in level of wealth concentration).

Interpreting the Standardized Discriminant Function Coefficients


Standardized Canonical Discriminant Function Coefficients

Variable                                                   Function 1  Function 2
human devel score: hi=more                                 -.203        .689
Political rights score                                     -.528       -.437
Civil liberties score                                       .033        .641
Gini index: 0=perfect $ equality, 100=perfect inequality    .884        .482

These coefficients can be used to classify new cases if the four discriminating variables are expressed in standard (z) scores.

These coefficients or weights tell you how the four original variables combine to make a new one that maximally separates the countries based on their wealth concentration. You can interpret the standardized discriminant function coefficients as a measure of the relative importance of each of the original predictors. We will only interpret the first function, since it explains so much more of the variance in country's wealth concentration than the second one, and the second function was not significant. Function 1 could be labeled "inequality," since it is defined by the high positive loading of the gini index and the high negative loading of political rights. The human development score and civil liberties score are comparatively unimportant in describing the separation among the categories of country's wealth concentration.

Discriminant Functions at the Group Centroids


Functions at Group Centroids

Concentration of Wealth in Hands of Few  Function 1  Function 2
LowWealthConcentr                        -2.023       .148
ModerateWealthConcentr                    -.022      -.792
HighWealthConcentr                        1.828       .144

Unstandardized canonical discriminant functions evaluated at group means

Canonical Discriminant Function Coefficients (unstandardized coefficients)

Variable                                                   Function 1  Function 2
human devel score: hi=more                                 -1.240       4.207
Political rights score                                      -.366       -.303
Civil liberties score                                        .027        .535
Gini index: 0=perfect $ equality, 100=perfect inequality     .126        .069
(Constant)                                                 -2.384      -7.167

The centroids table shows the group centroids (vectors of means) on the two new canonical variables formed by applying the discriminant function weights. Notice how well function 1 separates the low wealth concentration countries from the high wealth concentration countries. You can think of the centroid for each group or level as that group's average discriminant score on that function, where for raw scores the discriminant score on function 1 is

D1 = -2.384 - 1.240 (human development score) - .366 (political rights score) + .027 (civil liberties score) + .126 (gini index)

New cases would be classified into the group whose centroid is closest to their own vector of discriminant scores.
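As a rough illustration of how the unstandardized coefficients and the centroids work together, the Python sketch below scores a case on both functions and assigns it to the nearest centroid. The coefficients and centroids are taken from the tables above; the two example countries and their values (including an HDI on a 0 to 1 scale) are hypothetical, and plain nearest-centroid assignment is a simplification of what a full classification procedure does (it ignores prior probabilities, for instance).

```python
import math

# Unstandardized coefficients from the output:
# (constant, human development, political rights, civil liberties, gini)
COEFFICIENTS = {
    1: (-2.384, -1.240, -0.366, 0.027, 0.126),
    2: (-7.167, 4.207, -0.303, 0.535, 0.069),
}

# Group centroids (function 1, function 2) from the output.
CENTROIDS = {
    "Low": (-2.023, 0.148),
    "Moderate": (-0.022, -0.792),
    "High": (1.828, 0.144),
}


def discriminant_scores(hdi, political_rights, civil_liberties, gini):
    """Raw-score discriminant scores on functions 1 and 2."""
    values = (1.0, hdi, political_rights, civil_liberties, gini)
    return tuple(
        sum(weight * value for weight, value in zip(COEFFICIENTS[f], values))
        for f in (1, 2)
    )


def nearest_centroid(scores):
    """Group whose centroid is closest (Euclidean distance) to the scores."""
    return min(CENTROIDS, key=lambda group: math.dist(scores, CENTROIDS[group]))


# Hypothetical countries; these values are invented purely for illustration.
country_a = discriminant_scores(hdi=0.9, political_rights=1, civil_liberties=1, gini=30)
country_b = discriminant_scores(hdi=0.9, political_rights=1, civil_liberties=1, gini=50)

print(country_a, nearest_centroid(country_a))
print(country_b, nearest_centroid(country_b))
```

Raising only the gini index moves a case toward the positive end of function 1, which is consistent with reading that function as inequality.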

Territorial Map from Discriminant Analysis

[Figure: territorial map with regions labeled Low wealth concentration, Medium, and High]

This territorial map plots the location of cases based on their discriminant scores. Note, for example, that most of the low wealth concentration cases (the 1s) are concentrated on the negative end of function 1 (i.e., they are negative on inequality) and the high wealth concentration cases (the 3s) are on the positive end (i.e., they are positive on inequality), consistent with the location of their group means (centroids) on the function (see arrows).

Functions at Group Centroids

Concentration of Wealth in Hands of Few  Function 1  Function 2
LowWealthConcentr                        -2.023       .148
ModerateWealthConcentr                    -.022      -.792
HighWealthConcentr                        1.828       .144

Unstandardized canonical discriminant functions evaluated at group means

Quadratic Classification

[Figure: territorial map with regions labeled Low Wealth Concentration, Medium, and High]

One way of handling the problem of unequal covariances across groups (i.e., you flunked the Box's M test) is to base the classification not on the pooled covariance matrix but on the separate group covariance matrices (this is an option in SPSS). Notice that you get a somewhat different result.

Using Classification Results to Evaluate the Discriminant Functions


Classification Results (a)

                                            Predicted Group Membership
          Concentration of Wealth           Low     Moderate  High    Total
          in Hands of Few
Original
Count     LowWealthConcentr                 17      1         0       18
          ModerateWealthConcentr             1      4         2        7
          HighWealthConcentr                 0      3         17      20
          Ungrouped cases                    4      15        9       28
%         LowWealthConcentr                 94.4    5.6       .0      100.0
          ModerateWealthConcentr            14.3    57.1      28.6    100.0
          HighWealthConcentr                  .0    15.0      85.0    100.0
          Ungrouped cases                   14.3    53.6      32.1    100.0

a. 84.4% of original grouped cases correctly classified.

Recall that the new canonical variables created by applying the discriminant function weights to the four original variables could be used to classify cases. It's best to have a holdout sample to test how well the new canonical variables classify cases that weren't part of the development or training sample, but we can go back and reclassify the existing cases to see how well the new canonical variables classify them back into the groups they belong to. According to the table above, when the discriminant functions were used to predict a country's level of wealth concentration from the four variables, 84.4% of the original grouped cases were correctly reclassified into their original categories (p(2), the hit rate). Note that the largest proportion of errors was in reclassifying the middle category (moderate wealth concentration), while the classification was nearly perfect for the low wealth concentration countries (only one error).
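The hit rate can be checked directly from the counts in the classification table. The short Python sketch below does this for the grouped cases only (ungrouped cases have no known category to compare against); the function names are illustrative.

```python
# Rows are actual groups, columns are predicted groups (counts from the table above).
COUNTS = {
    "Low": {"Low": 17, "Moderate": 1, "High": 0},
    "Moderate": {"Low": 1, "Moderate": 4, "High": 2},
    "High": {"Low": 0, "Moderate": 3, "High": 17},
}


def hit_rate(table):
    """Percentage of all grouped cases on the diagonal (correctly reclassified)."""
    correct = sum(table[group][group] for group in table)
    total = sum(sum(row.values()) for row in table.values())
    return 100.0 * correct / total


def per_group_hit_rates(table):
    """Percentage of each actual group that was reclassified correctly."""
    return {
        group: 100.0 * row[group] / sum(row.values())
        for group, row in table.items()
    }


print(round(hit_rate(COUNTS), 1))
print({group: round(rate, 1) for group, rate in per_group_hit_rates(COUNTS).items()})
```

The per-group rates make the pattern in the text visible: the moderate group is reclassified far less accurately than the other two.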

Classification Rules
Decision rules developed from discriminant analysis can be influenced by knowledge of, or expectations about, the relative size in the population of the levels of the grouping variable. E.g., approximately 5% of the population of mortgagees will default in a given year, so the prior probabilities are 5% for the default group and 95% for the non-default group.

In cases where these prior probabilities are not known, they are often based on the sample sizes for the levels of the grouping variable, if the sample is a random sample from the population.

Some decision rules treat the prior probabilities as equal across all levels and let the discriminating variables do all the classification work.
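One way to see what the prior probabilities do is to apply Bayes' rule by hand. In the Python sketch below, the 5% and 95% priors come from the mortgage example; the likelihood values (how typical a case's scores are of each group) are hypothetical numbers chosen only to show the effect, not output from any analysis.

```python
def posterior_probabilities(priors, likelihoods):
    """Bayes' rule: posterior is proportional to prior times likelihood."""
    joint = {group: priors[group] * likelihoods[group] for group in priors}
    total = sum(joint.values())
    return {group: value / total for group, value in joint.items()}


# Hypothetical likelihoods: this case looks much more like a defaulter.
likelihoods = {"default": 0.6, "no_default": 0.1}

population_priors = {"default": 0.05, "no_default": 0.95}
equal_priors = {"default": 0.5, "no_default": 0.5}

with_population_priors = posterior_probabilities(population_priors, likelihoods)
with_equal_priors = posterior_probabilities(equal_priors, likelihoods)

print(with_population_priors)
print(with_equal_priors)
```

With equal priors this case would be assigned to the default group, while with the 5%/95% priors the same evidence leaves it in the non-default group, which is why the choice of priors belongs in the decision rule.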

Classification Rules, contd

As mentioned earlier, sometimes a decision is made in advance to test a discriminant function by holding out a sample and then using the function obtained on the training sample to classify the new cases from the holdout sample.

An alternative approach is the leave-one-out method, which is an option in SPSS under the Classify button. Each case is deleted in turn from the training sample and is classified by means of the classification rule established on the remaining observations.
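The leave-one-out idea can be sketched in a few lines of Python. To keep the example self-contained, the classification rule here is a simple nearest-group-mean rule standing in for the discriminant functions, and the six cases are toy data invented for illustration; only the delete-one, refit, classify loop is the point.

```python
import math


def group_means(cases):
    """Mean vector of each group; `cases` is a list of (label, values) pairs."""
    sums = {}
    for label, values in cases:
        total, count = sums.get(label, ([0.0] * len(values), 0))
        sums[label] = ([t + v for t, v in zip(total, values)], count + 1)
    return {label: [t / count for t in total] for label, (total, count) in sums.items()}


def classify(values, means):
    """Stand-in rule: assign the case to the group with the nearest mean."""
    return min(means, key=lambda label: math.dist(values, means[label]))


def leave_one_out_hit_rate(cases):
    """Delete each case in turn, rebuild the rule on the rest, classify the deleted case."""
    hits = 0
    for i, (label, values) in enumerate(cases):
        training = cases[:i] + cases[i + 1:]
        if classify(values, group_means(training)) == label:
            hits += 1
    return hits / len(cases)


# Toy data, invented for illustration: two well-separated groups.
toy_cases = [
    ("A", (1.0, 1.0)), ("A", (1.2, 0.8)), ("A", (0.8, 1.1)),
    ("B", (3.0, 3.2)), ("B", (3.1, 2.9)), ("B", (2.9, 3.0)),
]

print(leave_one_out_hit_rate(toy_cases))
```

Because each case is classified by a rule that never saw it, the resulting hit rate is a less optimistic estimate than reclassifying the training cases themselves.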

Stepwise Discriminant Analysis


Recall that when we talked about regression we learned about a variation of multiple regression called stepwise, in which variables were entered into the regression equation based on the strength of their relationship with the criterion variable.

You can perform this same sort of stepwise procedure with discriminant analysis. At each step in the analysis, the variable which most reduces the overall Wilks' lambda (or does best on some related criterion) is entered, and if a variable doesn't make a significant contribution according to the F-to-enter and F-to-remove criteria that you set up, it will not be kept in the final equation.

Stepwise DA is useful when the number of potential discriminating variables is large and you need to reduce the number.

Example of Stepwise Discriminant Analysis


Standardized Canonical Discriminant Function Coefficients

Variable                                                   Function 1  Function 2
Political rights score                                     -.620        .804
Gini index: 0=perfect $ equality, 100=perfect inequality    .898        .472

Wilks' Lambda

Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
1 through 2          .222           62.440      4   .000
2                    .944            2.372      1   .124

The stepwise discriminant analysis tossed out two of the four variables for not measuring up: the two that had the lowest weights on the first function in the original DA. Note that these new canonical variables don't explain quite as much variance (lambda, at .222, is a little bigger than the .205 of the original analysis), and the classification correctness rate is lower (75.6% compared to 84.4%). The original seems better, as long as it is not your goal to find the most parsimonious solution using the fewest predictors.

Classification Results (a)

                                            Predicted Group Membership
          Concentration of Wealth           Low     Moderate  High    Total
          in Hands of Few
Original
Count     LowWealthConcentr                 17      1         0       18
          ModerateWealthConcentr             2      3         2        7
          HighWealthConcentr                 0      6         14      20
          Ungrouped cases                    4      14        10      28
%         LowWealthConcentr                 94.4    5.6       .0      100.0
          ModerateWealthConcentr            28.6    42.9      28.6    100.0
          HighWealthConcentr                  .0    30.0      70.0    100.0
          Ungrouped cases                   14.3    50.0      35.7    100.0

a. 75.6% of original grouped cases correctly classified.

Writing up the Results of Your Discriminant Analysis


Discriminant analysis was used to conduct a multivariate analysis of variance test of the hypothesis that countries with high, moderate, and low concentration of wealth would differ significantly on a linear combination of four variables: gini index, political rights score, civil liberties score, and human development score. The overall chi-square test was significant (Wilks' lambda = .205, chi-square = 64.215, df = 8, canonical correlation = .877, p < .001); the two functions extracted accounted for nearly 80% of the variance in country's wealth concentration, confirming the hypothesis. Table 1 presents the standardized discriminant function coefficients. Function 1 was labeled "inequality." The gini index, which measures inequality, had a high positive weight on the function, and the political rights score had a strong negative weight. Table 2 shows the two functions at the group centroids. Reclassification of cases based on the new canonical variables was highly successful: 84.4% of the cases were correctly reclassified into their original categories.

Table 1

Standardized Canonical Discriminant Function Coefficients

Variable                                                   Function 1  Function 2
human devel score: hi=more                                 -.203        .689
Political rights score                                     -.528       -.437
Civil liberties score                                       .033        .641
Gini index: 0=perfect $ equality, 100=perfect inequality    .884        .482

Table 2

Functions at Group Centroids

Concentration of Wealth in Hands of Few  Function 1  Function 2
LowWealthConcentr                        -2.023       .148
ModerateWealthConcentr                    -.022      -.792
HighWealthConcentr                        1.828       .144

Unstandardized canonical discriminant functions evaluated at group means

Now It's Time for You to Do a Discriminant Analysis in SPSS


Download the file NationsoftheWorldmodified.sav.

Let's test the hypothesis that Country's Wealth Concentration is significantly associated with a linear combination of three variables: number of peaceful political demonstrations, political rights, and number of strikes.

Go to Analyze/ Classify/ Discriminant.

Move the Country's Wealth Concentration variable into the Grouping window and set the range to a minimum of 1 and a maximum of 3.

Move the Number of peaceful political demonstrations, Political rights, and Number of strikes variables into the Independents box.

Select Enter Independents together (not stepwise for now).

Click on the Classify button; under Prior Probabilities select All Groups Equal, under Display select Summary table, and click Continue.

Click on the Statistics button; check Means, Univariate ANOVAs, Box's M, and Unstandardized function coefficients, and click Continue.

Click OK, and compare your output to the next several slides.

Important Statistics for this Discriminant Analysis


Wilks' Lambda

Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
1 through 2          .605           29.616      6   .000
2                    .990             .616      2   .735

Eigenvalues

Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1         .635 (a)    98.4            98.4         .623
2         .010 (a)     1.6           100.0         .102

a. First 2 canonical discriminant functions were used in the analysis.

Standardized Canonical Discriminant Function Coefficients

Variable                                               Function 1  Function 2
Number of peaceful political demonstrations             .311        .330
Political rights score                                 1.009        .022
Number of strikes of >1,000 indust or service workers  -.273        .856

Functions at Group Centroids

Concentration of Wealth in Hands of Few  Function 1  Function 2
LowWealthConcentr                        1.052       -.018
ModerateWealthConcentr                   -.384        .180
HighWealthConcentr                       -.658       -.079

Unstandardized canonical discriminant functions evaluated at group means

Classification Results (a)

                                            Predicted Group Membership
          Concentration of Wealth           Low     Moderate  High    Total
          in Hands of Few
Original
Count     LowWealthConcentr                 21      1         0       22
          ModerateWealthConcentr             5      1         8       14
          HighWealthConcentr                 7      4         16      27
          Ungrouped cases                   14      7         28      49
%         LowWealthConcentr                 95.5    4.5       .0      100.0
          ModerateWealthConcentr            35.7    7.1       57.1    100.0
          HighWealthConcentr                25.9    14.8      59.3    100.0
          Ungrouped cases                   28.6    14.3      57.1    100.0

a. 60.3% of original grouped cases correctly classified.

Lab #9, Question 2


Question 2. Duplicate the preceding data analysis in SPSS. Write up the results (the tests of the hypothesis about the relationship between country's wealth concentration and the three predictor variables: number of strikes, number of demonstrations, and political rights score) as if you were writing for publication. Put your paragraph in a Word document, and illustrate your results with tables from the output as appropriate (for example, the overall Wilks' lambda table, group centroids, classification results, etc.). Use the write-up from the previous discriminant analysis as a template.
