




Saurav Gupta

Discriminant Analysis

Introduction
Assumptions
Purpose
Computational Approach
Stepwise Discriminant Analysis
  Model
  Forward stepwise analysis
  Backward stepwise analysis
  F to enter, F to remove
  Capitalizing on chance
Discriminant Functions
Interpreting a Two-Group Discriminant Function
Discriminant Functions for Multiple Groups
  Canonical analysis
  Interpreting the discriminant functions
  Factor structure matrix
  Significance of discriminant functions
  Summary
Using SPSS
Applications
  Bankruptcy prediction
  Face recognition
  Marketing


Discriminant function analysis is a statistical technique for predicting a categorical dependent variable (called a grouping variable) from one or more continuous or binary independent variables (called predictor variables). The original dichotomous discriminant analysis was developed by Sir Ronald Fisher in 1936. It differs from ANOVA and MANOVA, which are used to predict one (ANOVA) or multiple (MANOVA) continuous dependent variables from one or more categorical independent variables. Discriminant function analysis is useful in determining whether a set of variables is effective in predicting category membership.

Discriminant analysis discriminates between two or more mutually exclusive and exhaustive groups on the basis of some explanatory variables. These groups are known a priori. When the criterion variable has two categories, the technique is known as two-group discriminant analysis. When three or more categories are involved, the technique is referred to as multiple discriminant analysis.

Discriminant analysis helps researchers who want to understand how consumers differ with respect to demographic and psychographic characteristics. It is also used to predict group membership. Banks, for example, use discriminant analysis to discriminate between customers who default and customers who repay a loan on time, based on their age, income, assets, number of dependents, previous outstanding loans, and so on.

The assumptions of discriminant analysis are the same as those for MANOVA. The analysis is quite sensitive to outliers, and the size of the smallest group must be larger than the number of predictor variables. The major underlying assumptions are:

- the observations are a random sample;
- each predictor variable is normally distributed;
- each case in the initial classification has been assigned to the correct category of the dependent variable;
- there must be at least two groups or categories, with each case belonging to only one group, so that the groups are mutually exclusive and collectively exhaustive (all cases can be placed in a group);
- each group or category must be well defined, clearly differentiated from any other group(s), and natural. Putting a median split on an attitude scale is not a natural way to form groups. Partitioning quantitative variables is only justifiable if there are easily identifiable gaps at the points of division; for instance, three groups taking three available levels of housing loan;
- the groups or categories should be defined before collecting the data;
- the attribute(s) used to separate the groups should discriminate quite clearly between the groups, so that group or category overlap is non-existent or minimal;
- group sizes of the dependent variable should not be grossly different and should be at least five times the number of independent variables.

There are several purposes of DA:

- To investigate differences between groups on the basis of the attributes of the cases, indicating which attributes contribute most to group separation. The descriptive technique successively identifies the linear combinations of attributes, known as canonical discriminant functions (equations), which contribute maximally to group separation.
- To assign new cases to groups (predictive DA). The DA function uses a person's scores on the predictor variables to predict the category to which the individual belongs.
- To determine the most parsimonious way to distinguish between groups.
- To classify cases into groups. Statistical significance tests using chi-square enable you to see how well the function separates the groups.
- To test theory by observing whether cases are classified as predicted.

The aim of the statistical analysis in DA is to combine (weight) the variable scores in some way so that a single new composite variable, the discriminant score, is produced. One way of thinking about this is in terms of a food recipe, where changing the proportions (weights) of the ingredients will change the characteristics of the finished cakes. Hopefully the weighted combinations of ingredients will produce two different types of cake.

Figure 1

Similarly, at the end of the DA process, it is hoped that each group will have a normal distribution of discriminant scores. The degree of overlap between the discriminant score distributions can then be used as a measure of the success of the technique, so that, like the different types of cake mix, we have two different types of groups (Fig. 1). For example, the top two distributions in Figure 1 overlap too much and do not discriminate as well as the bottom set.

Computationally, discriminant function analysis is very similar to analysis of variance (ANOVA). Let us consider a simple example. Suppose we measure height in a random sample of 50 males and 50 females. Females are, on average, not as tall as males, and this difference will be reflected in the difference in means for the variable Height. Therefore, Height allows us to discriminate between males and females with better than chance probability: if a person is tall, then he is likely to be male; if a person is short, then she is likely to be female.

We can generalize this reasoning to groups and variables that are less "trivial." For example, suppose we have two groups of high school graduates: those who choose to attend college after graduation and those who do not. We could have measured students' stated intention to continue on to college one year prior to graduation. If the means for the two groups (those who actually went to college and those who did not) are different, then we can say that intention to attend college, as stated one year prior to graduation, allows us to discriminate between those who are and are not college bound (and this information may be used by career counselors to provide appropriate guidance to the respective students).
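The height example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the heights are made-up numbers, and the midpoint between the two group means is used as the cutoff, which is a reasonable choice only when the groups have similar spread and similar size.

```python
from statistics import mean

# Hypothetical heights in cm (made-up numbers for illustration only).
male_heights = [178.0, 182.0, 171.0, 180.0, 189.0]
female_heights = [165.0, 160.0, 174.0, 162.0, 164.0]

male_mean = mean(male_heights)
female_mean = mean(female_heights)

# Assumed rule: with one predictor, similar spread and equal group sizes,
# use the midpoint between the two group means as the cutoff.
cutoff = (male_mean + female_mean) / 2


def classify(height_cm: float) -> str:
    """Predict the group from height alone using the midpoint cutoff."""
    return "male" if height_cm > cutoff else "female"


heights = male_heights + female_heights
labels = ["male"] * len(male_heights) + ["female"] * len(female_heights)

# Share of cases whose predicted group matches the actual group.
hits = sum(classify(h) == g for h, g in zip(heights, labels))
hit_ratio = hits / len(heights)
print(cutoff, hit_ratio)
```

With these numbers one short male and one tall female fall on the wrong side of the cutoff, so the rule is better than chance but not perfect, which is the situation the paragraph above describes.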


Probably the most common application of discriminant function analysis is to include many measures in the study, in order to determine the ones that discriminate between groups. For example, an educational researcher interested in predicting high school graduates' choices for further education would probably include as many measures of personality, achievement motivation, academic performance, etc. as possible in order to learn which one(s) offer the best prediction.

Put another way, we want to build a "model" of how we can best predict to which group a case belongs. In the following discussion we will use the term "in the model" in order to refer to variables that are included in the prediction of group membership, and we will refer to variables as being "not in the model" if they are not included.


In stepwise discriminant function analysis, a model of discrimination is built step-by-step. Specifically, at each step all variables are reviewed and evaluated to determine which one will contribute most to the discrimination between groups. That variable will then be included in the model, and the process starts again.



One can also step backwards; in that case all variables are included in the model and then, at each step, the variable that contributes least to the prediction of group membership is eliminated. Thus, as the result of a successful discriminant function analysis, one would only keep the "important" variables in the model, that is, those variables that contribute the most to the discrimination between groups.

The stepwise procedure is "guided" by the respective F to enter and F to remove values. The F value for a variable indicates its statistical significance in the discrimination between groups, that is, it is a measure of the extent to which a variable makes a unique contribution to the prediction of group membership. If you are familiar with stepwise multiple regression procedures, then you may interpret the F to enter/remove values in the same way as in stepwise regression.
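As a rough illustration of how an F to enter value guides selection, the sketch below covers only the first step of a forward analysis. At that point no variable is in the model yet, so each candidate's F to enter reduces to its univariate one-way ANOVA F. The variable names, the data, and the threshold of 4.0 are all assumptions made for the example; later steps use partial F values conditional on the variables already entered, which this sketch does not compute.

```python
from statistics import mean


def one_way_f(groups):
    """One-way ANOVA F: between-groups mean square / within-groups mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical data: variable name -> one list of scores per group.
candidates = {
    "intention": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    "shoe_size": [[1.0, 5.0, 3.0], [2.0, 4.0, 6.0]],
}

F_TO_ENTER = 4.0  # assumed threshold; statistical packages use their own defaults

# Evaluate every candidate and pick the one with the largest F.
f_values = {name: one_way_f(groups) for name, groups in candidates.items()}
best = max(f_values, key=f_values.get)

# The best candidate enters only if its F reaches the threshold.
entered = best if f_values[best] >= F_TO_ENTER else None
print(f_values, entered)
```

Here "intention" separates the two groups far better than "shoe_size", so it is the variable that would enter first.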

A common misinterpretation of the results of stepwise discriminant analysis is to take statistical significance levels at face value. By nature, the stepwise procedures will capitalize on chance because they "pick and choose" the variables to be included in the model so as to yield maximum discrimination. Thus, when using the stepwise approach the researcher should be aware that the significance levels do not reflect the true alpha error rate, that is, the probability of erroneously rejecting H0 (the null hypothesis that there is no discrimination between groups).

Discriminant analysis works by creating one or more linear combinations of predictors, creating a new latent variable for each function. These functions are called discriminant functions. The number of functions possible is either Ng - 1, where Ng is the number of groups, or p (the number of predictors), whichever is smaller. The first function created maximizes the differences between groups on that function. The second function maximizes differences on that function, but also must not be correlated with the previous function. This continues with subsequent functions, with the requirement that each new function not be correlated with any of the previous functions. Given group $j$, with $R_j$ sets of sample space, there is a discriminant rule such that if $x \in R_j$, then $x \in j$. Discriminant analysis then finds good regions $R_j$ to minimize classification error, leading to a high percentage of correctly classified cases in the classification table.


In the two-group case, discriminant function analysis can also be thought of as (and is analogous to) multiple regression (the two-group discriminant analysis is also called Fisher linear discriminant analysis after Fisher, 1936; computationally all of these approaches are analogous). If we code the two groups in the analysis as 1 and 2, and use that variable as the dependent variable in a multiple regression analysis, then we would get results that are analogous to those we would obtain via discriminant analysis. In general, in the two-group case we fit a linear equation of the type:

Group = a + b1*x1 + b2*x2 + ... + bm*xm

where a is a constant and b1 through bm are regression coefficients. The interpretation of the results of a two-group problem is straightforward and closely follows the logic of multiple regression: those variables with the largest (standardized) regression coefficients are the ones that contribute most to the prediction of group membership.
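The analogy between two-group discriminant analysis and regression on a 1/2 group code can be checked numerically. The following sketch uses made-up data with two predictors and hand-written 2x2 helpers (`col_means`, `scatter`, `solve2` are illustrative names, not from any library). It computes Fisher's weights as the inverse pooled within-groups scatter matrix times the difference in group means, then the least-squares slopes b1 and b2, and verifies that the two weight vectors point in the same direction.

```python
from statistics import mean

# Hypothetical data: two predictors (x1, x2) for two groups coded 1 and 2.
group1 = [(2.0, 3.0), (3.0, 3.0), (4.0, 5.0), (3.0, 5.0)]
group2 = [(6.0, 4.0), (7.0, 6.0), (8.0, 5.0), (7.0, 5.0)]


def col_means(rows):
    """Mean of each of the two predictor columns."""
    return [mean(r[j] for r in rows) for j in range(2)]


def scatter(rows, center):
    """2x2 matrix of sums of squares and cross-products around `center`."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - center[0], r[1] - center[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def solve2(a, b):
    """Solve the 2x2 linear system a @ x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


m1, m2 = col_means(group1), col_means(group2)
s1, s2 = scatter(group1, m1), scatter(group2, m2)
within = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]

# Fisher's discriminant weights: W^{-1} (m2 - m1).
fisher_w = solve2(within, [m2[0] - m1[0], m2[1] - m1[1]])

# Least-squares slopes for Group = a + b1*x1 + b2*x2, obtained from the
# normal equations on mean-centered data (the intercept drops out).
rows = group1 + group2
codes = [1.0] * len(group1) + [2.0] * len(group2)
mx, my = col_means(rows), mean(codes)
total = scatter(rows, mx)
xy = [sum((r[j] - mx[j]) * (c - my) for r, c in zip(rows, codes))
      for j in range(2)]
ols_b = solve2(total, xy)

# Parallel vectors have a zero 2-D cross product.
cross = ols_b[0] * fisher_w[1] - ols_b[1] * fisher_w[0]
print(fisher_w, ols_b, cross)
```

The regression slopes are a rescaled copy of the discriminant weights, so ranking predictors by (standardized) coefficient gives the same picture in either analysis.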


When there are more than two groups, then we can estimate more than one discriminant function like the one presented above. For example, when there are three groups, we could estimate (1) a function for discriminating between group 1 and groups 2 and 3 combined, and (2) another function for discriminating between group 2 and group 3. For example, we could have one function that discriminates between those high school graduates that go to college and those who do not (but rather get a job or go to a professional or trade school), and a second function to discriminate between those graduates that go to a professional or trade school versus those who get a job. The b coefficients in those discriminant functions could then be interpreted as before.

When actually performing a multiple group discriminant analysis, we do not have to specify how to combine groups so as to form different discriminant functions. Rather, you can automatically determine some optimal combination of variables so that the first function provides the most overall discrimination between groups, the second provides second most, and so on. Moreover, the functions will be independent or orthogonal, that is, their contributions to the discrimination between groups will not overlap. Computationally, you will perform a canonical correlation analysis that will determine the successive functions and canonical roots (the term root refers to the eigenvalues that are associated with the respective canonical function). The maximum number of functions will be equal to the number of groups minus one, or the number of variables in the analysis, whichever is smaller.


As before, we will get b (and standardized beta) coefficients for each variable in each discriminant (now also called canonical) function, and they can be interpreted as usual: the larger the standardized coefficient, the greater is the contribution of the respective variable to the discrimination between groups. (Note that we could also interpret the structure coefficients; see below.) However, these coefficients do not tell us between which of the groups the respective functions discriminate. We can identify the nature of the discrimination for each discriminant (canonical) function by looking at the means for the functions across groups. We can also visualize how the two functions discriminate between groups by plotting the individual scores for the two discriminant functions (see Figure 2).


Figure 2

In this example, Root (function) 1 seems to discriminate mostly between Setosa on the one hand and Virginic and Versicol combined on the other. In the vertical direction (Root 2), a slight trend for Versicol points to fall below the center line (0) is apparent.


Another way to determine which variables "mark" or define a particular discriminant function is to look at the factor structure. The factor structure coefficients are the correlations between the variables in the model and the discriminant functions; if you are familiar with factor analysis you may think of these correlations as factor loadings of the variables on each discriminant function. Some authors have argued that these structure coefficients should be used when interpreting the substantive "meaning" of discriminant functions. The reasons given by those authors are that (1) supposedly the structure coefficients are more stable, and (2) they allow for the interpretation of factors (discriminant functions) in the manner that is analogous to factor analysis. However, subsequent Monte Carlo research (Barcikowski & Stevens, 1975; Huberty, 1975) has shown that the discriminant function coefficients and the structure coefficients are about equally unstable, unless the n is fairly large (e.g., if there are 20 times more cases than there are variables). The most important thing to remember is that the discriminant function coefficients denote the unique (partial) contribution of each variable to the discriminant function(s), while the structure coefficients denote the simple correlations between the variables and the function(s). If one wants to assign substantive "meaningful" labels to the discriminant functions (akin to the interpretation of factors in factor analysis), then the structure coefficients should be used (interpreted); if one wants to learn what is each variable's unique contribution to the discriminant function, use the discriminant function coefficients (weights).
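The difference between a variable's weight and its structure coefficient can be seen in a small sketch. The data and the discriminant weights below are hypothetical, and the structure coefficients are computed as simple total-sample Pearson correlations between each variable and the discriminant scores, as described above; some packages report pooled within-groups correlations instead, which this sketch does not attempt.

```python
from math import sqrt
from statistics import mean


def pearson(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical cases (x1, x2) and hypothetical discriminant weights.
cases = [(2.0, 3.0), (3.0, 3.0), (4.0, 5.0), (6.0, 4.0), (7.0, 6.0), (8.0, 5.0)]
weights = (1.4, -0.5)

# Discriminant score of each case: weighted sum of its predictor values.
scores = [weights[0] * x1 + weights[1] * x2 for x1, x2 in cases]

# Structure coefficient of each variable: its correlation with the scores.
structure = [pearson([c[j] for c in cases], scores) for j in range(2)]
print(structure)
```

In this made-up example x2 carries a negative weight yet correlates positively with the scores, because it is itself correlated with x1. That is the kind of divergence between unique (partial) contribution and simple correlation the paragraph above warns about.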


One can test the number of roots that add significantly to the discrimination between groups. Only those found to be statistically significant should be used for interpretation; non-significant functions (roots) should be ignored.
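One common way to carry out such a test, sketched here as an assumption rather than as the procedure of any particular package, is to convert the eigenvalues (roots) into Wilks' lambda and then into Bartlett's chi-square approximation, removing one root at a time. The eigenvalues and sample sizes below are made up, and `residual_root_tests` is a hypothetical helper name.

```python
import math


def residual_root_tests(eigenvalues, n_cases, n_predictors, n_groups):
    """Wilks' lambda and Bartlett's chi-square after removing the first k roots.

    Eigenvalues must be sorted from largest to smallest.
    """
    results = []
    for k in range(len(eigenvalues)):
        # Lambda for the roots that remain after the first k are removed.
        wilks = math.prod(1.0 / (1.0 + ev) for ev in eigenvalues[k:])
        chi2 = -(n_cases - 1 - (n_predictors + n_groups) / 2) * math.log(wilks)
        df = (n_predictors - k) * (n_groups - k - 1)
        results.append({"roots_removed": k, "wilks_lambda": wilks,
                        "chi_square": chi2, "df": df})
    return results


# Hypothetical eigenvalues for 3 groups, 4 predictors and 150 cases.
tests = residual_root_tests([3.0, 0.25], n_cases=150, n_predictors=4, n_groups=3)
for row in tests:
    print(row)
```

Each chi-square value would then be compared with the critical value for its degrees of freedom. A row that is not significant indicates that the remaining roots add nothing reliable and should not be interpreted.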


To summarize, when interpreting multiple discriminant functions, which arise from analyses with more than two groups and more than one variable, one would first test the different functions for statistical significance, and only consider the significant functions for further examination. Next, we would look at the standardized b coefficients for each variable for each significant function. The larger the standardized b coefficient, the larger is the respective variable's unique contribution to the discrimination specified by the respective discriminant function. In order to derive substantive "meaningful" labels for the discriminant functions, one can also examine the factor structure matrix with the correlations between the variables and the discriminant functions. Finally, we would look at the means for the significant discriminant functions in order to determine between which groups the respective functions seem to discriminate.

You will now be taken through a discriminant analysis using a data set that includes demographic data and scores on various questionnaires. The variable smoke is a nominal variable indicating whether the employee smoked or not. The other variables to be used are age, days absent sick from work last year, self-concept score, anxiety score, and attitude to anti-smoking at work score. The aim of the analysis is to determine whether these variables will discriminate between those who smoke and those who do not. This is a simple discriminant analysis with only two groups in the DV. With three or more DV groupings a multiple discriminant analysis is involved, but this follows the same process in SPSS as described below, except that there will be more than one set of eigenvalues, Wilks' lambdas and beta coefficients. The number of sets is always one less than the number of DV groups.

1. Analyze >> Classify >> Discriminant.
2. Select smoke as your grouping variable and enter it into the Grouping Variable box (Fig. 3).
3. Click the Define Range button and enter the lowest and highest codes for your groups (here 1 and 2) (Fig. 4).
4. Click Continue.
5. Select your predictors (IVs) and enter them into the Independents box (Fig. 5), then select Enter Independents Together. If you planned a stepwise analysis, you would select Use Stepwise Method at this point instead.
6. Click the Statistics button and select Means, Univariate ANOVAs, Box's M, Unstandardized and Within-Groups Correlation (Fig. 6).
7. Continue >> Classify. Select Compute From Group Sizes, Summary Table, Leave-One-Out Classification, Within-Groups, and all Plots (Fig. 7).
8. Continue >> Save and select Predicted Group Membership and Discriminant Scores (Fig. 8).
9. Click OK.

In bankruptcy prediction based on accounting ratios and other financial variables, linear discriminant analysis was the first statistical method applied to systematically explain which firms entered bankruptcy vs. survived. Despite limitations including known nonconformance of accounting ratios to the normal distribution assumptions of LDA, Edward Altman's 1968 model is still a leading model in practical applications.


In computerized face recognition, each face is represented by a large number of pixel values. Linear discriminant analysis is primarily used here to reduce the number of features to a more manageable number before classification. Each of the new dimensions is a linear combination of pixel values, which forms a template. The linear combinations obtained using Fisher's linear discriminant are called Fisherfaces, while those obtained using the related principal component analysis are called eigenfaces.

In marketing, discriminant analysis was once often used to determine the factors which distinguish different types of customers and/or products on the basis of surveys or other forms of collected data. Logistic regression or other methods are now more commonly used. The use of discriminant analysis in marketing can be described by the following steps:

1. Formulate the problem and gather data.
   - Identify the salient attributes consumers use to evaluate products in this category.
   - Use quantitative marketing research techniques (such as surveys) to collect data from a sample of potential customers concerning their ratings of all the product attributes. The data collection stage is usually done by marketing research professionals. Survey questions ask the respondent to rate a product from 1 to 5 (or 1 to 7, or 1 to 10) on a range of attributes chosen by the researcher. Anywhere from five to twenty attributes are chosen. They could include things like ease of use, weight, accuracy, durability, colorfulness, price, or size. The attributes chosen will vary depending on the product being studied. The same question is asked about all the products in the study.
   - The data for multiple products are coded and entered into a statistical program such as R, SPSS or SAS. (This step is the same as in factor analysis.)
2. Estimate the discriminant function coefficients and determine their statistical significance and validity.
   - Choose the appropriate discriminant analysis method. The direct method involves estimating the discriminant function so that all the predictors are assessed simultaneously. The stepwise method enters the predictors sequentially. The two-group method should be used when the dependent variable has two categories or states. The multiple discriminant method is used when the dependent variable has three or more categorical states.
   - Use Wilks' lambda to test for significance in SPSS, or the F statistic in SAS.
   - The most common method used to test validity is to split the sample into an estimation or analysis sample and a validation or holdout sample. The estimation sample is used in constructing the discriminant function. The validation sample is used to construct a classification matrix, which contains the number of correctly classified and incorrectly classified cases. The percentage of correctly classified cases is called the hit ratio.
3. Plot the results on a two-dimensional map, define the dimensions, and interpret the results.
   - The statistical program (or a related module) will map the results. The map will plot each product (usually in two-dimensional space). The distance between products indicates how different they are.
   - The dimensions must be labeled by the researcher. This requires subjective judgment and is often very challenging.
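The classification matrix and hit ratio mentioned in the validation step can be sketched as follows. The holdout cases and the group labels "buyer" and "non-buyer" are made up for the example; in practice the predicted group would come from applying the discriminant function estimated on the analysis sample.

```python
from collections import Counter

# Hypothetical holdout sample: (actual group, predicted group) per case.
holdout = [
    ("buyer", "buyer"), ("buyer", "buyer"), ("buyer", "non-buyer"),
    ("non-buyer", "non-buyer"), ("non-buyer", "non-buyer"),
    ("non-buyer", "buyer"), ("buyer", "buyer"), ("non-buyer", "non-buyer"),
]

# Classification matrix: counts keyed by (actual, predicted).
matrix = Counter(holdout)

# Correctly classified cases sit on the diagonal (actual == predicted).
correct = sum(n for (actual, predicted), n in matrix.items()
              if actual == predicted)
hit_ratio = correct / len(holdout)

for (actual, predicted), n in sorted(matrix.items()):
    print(f"actual={actual:10s} predicted={predicted:10s} count={n}")
print(f"hit ratio = {hit_ratio:.2f}")
```

Because the hit ratio is computed on cases that were not used to estimate the function, it gives a less optimistic picture than classifying the estimation sample itself.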
