Discriminant Analysis

These are my personal notes only, so they are not complete. Most of the content was taken from the IBM SPSS Modeler training manual. Please refer to the training manual for a complete discussion.

• It is a technique designed to characterize the relationship between a set of variables and a grouping variable with a relatively small number of categories.
• It creates a linear combination of the predictors that best characterizes the differences among the groups.
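To make "a linear combination of the predictors that best characterizes the differences among the groups" more concrete, here is a minimal sketch of the classic two-group Fisher direction, w = Sw^-1 (m2 - m1), for two predictors in plain Python. The data points and the function names are made up for illustration, the code assumes the pooled scatter matrix is invertible, and SPSS Modeler scales its reported coefficients differently.

```python
def mean_vector(rows):
    """Column means of a list of (x1, x2) rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter_matrix(rows, mean):
    """2x2 sum-of-squares-and-cross-products matrix around the group mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(group1, group2):
    """Weights w = Sw^-1 (m2 - m1) for two groups and two predictors."""
    m1, m2 = mean_vector(group1), mean_vector(group2)
    s1, s2 = scatter_matrix(group1, m1), scatter_matrix(group2, m2)
    # Pooled within-group scatter (assumed invertible in this sketch).
    sw = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m2[0] - m1[0], m2[1] - m1[1]]
    return [inv[i][0] * diff[0] + inv[i][1] * diff[1] for i in range(2)]


def score(w, row):
    """Discriminant score: the linear combination of the two predictors."""
    return w[0] * row[0] + w[1] * row[1]


# Hypothetical data: two predictors measured on two small groups.
non_buyers = [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
buyers = [(4.0, 5.0), (5.0, 4.0), (5.0, 6.0), (6.0, 5.0)]

w = fisher_direction(non_buyers, buyers)
scores_non = [score(w, r) for r in non_buyers]
scores_buy = [score(w, r) for r in buyers]
print(w, scores_non, scores_buy)
```

With these made-up points, every score in the second group is higher than every score in the first group, which is what "best characterizes the differences" looks like in the ideal case.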

Types of Discriminant Analysis
• Predictive Discriminant Analysis (PDA): focuses on formulating a rule that predicts, or identifies, the group membership of a given unit.
  Sample rule: If HSGPA > 85, Average Grade in GE Subjects > 2.0, Average Grade in Major Subjects > 1.5, and Preboard Score > 80, then pass LET.
• Descriptive Discriminant Analysis (DDA): uses the grouping variable as the outcome variable and studies the relationships between it and the input variables.
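The PDA sample rule above can be written as a small function. This is only a sketch: the function and parameter names are made up, and it assumes that all four conditions must hold together (the notes list them without an explicit "and").

```python
def predicts_pass_let(hsgpa, ge_avg, major_avg, preboard_score):
    """Return True when every condition of the sample rule holds.

    Assumption: the four conditions are combined with AND.
    """
    return (
        hsgpa > 85
        and ge_avg > 2.0
        and major_avg > 1.5
        and preboard_score > 80
    )


# Hypothetical examinees.
print(predicts_pass_let(90, 2.5, 1.75, 85))  # all conditions met
print(predicts_pass_let(80, 2.5, 1.75, 85))  # HSGPA too low
```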

Discriminant Example
• Goal: To determine which set of demographic and attitude items best predicts which customers might buy another VCR.
• Data:
  Variable          Label
  buyyes            Willingness to buy another VCR
  age               Age of respondent
  complain          Performance: complaint resolution
  educ              Education of respondent
  (name not shown)  Did product ever fail to operate?
  pinnovat          Performance: innovative company
  preliabl          Performance: reliability
  puse              Performance: ease of use
  qual              Performance: overall quality
  (name not shown)  Frequency of use
  (name not shown)  Performance: good value for money

The standardized coefficients
• The value of the function for category 2 (those customers likely to buy) is positive (.613).
• This means that higher scores on a variable with a positive coefficient are associated with membership in group 2.
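A minimal sketch of how a discriminant score relates to the coefficients and the group values. Only the group 2 value (.613) comes from these notes. The coefficients, the group 1 value, and the respondent values are made up for illustration, and assigning a case to the nearest group value assumes equal prior probabilities.

```python
# Hypothetical standardized coefficients (not taken from the SPSS output).
coefficients = {"age": 0.40, "qual": 0.70, "complain": -0.30}

CENTROID_GROUP_2 = 0.613   # from the notes: customers likely to buy
CENTROID_GROUP_1 = -0.500  # hypothetical value for the other group


def discriminant_score(z_values):
    """Weighted sum of standardized (z-scored) predictor values."""
    return sum(coefficients[name] * z for name, z in z_values.items())


def nearest_group(score):
    """Assign the group whose centroid is closest to the score."""
    d1 = abs(score - CENTROID_GROUP_1)
    d2 = abs(score - CENTROID_GROUP_2)
    return 2 if d2 < d1 else 1


# Two hypothetical respondents who differ only on 'qual',
# a variable with a positive coefficient in this sketch.
low_quality = {"age": 0.0, "qual": -1.0, "complain": 0.0}
high_quality = {"age": 0.0, "qual": 1.0, "complain": 0.0}

score_low = discriminant_score(low_quality)
score_high = discriminant_score(high_quality)
print(score_low, nearest_group(score_low))
print(score_high, nearest_group(score_high))
```

The respondent with the higher value on the positively weighted variable gets the higher score and lands nearer the positive group 2 value.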

The Structure Matrix
• The structure coefficients are not affected by multicollinearity.
• They tend to be more stable in small samples.

Eigenvalue, Canonical correlation, and Lambda
• The Eigenvalues table provides information about the relative efficacy of each discriminant function. With only one function, the percent of variance is always 100%.
• The canonical correlation measures the association between the discriminant scores and the dependent variable (buyyes). A high correlation indicates high predictive accuracy.
• Wilks’ lambda (1 minus the squared canonical correlation when there is only one function) is the proportion of variance in the discriminant scores not explained by group membership.
• Lambda is used to test the null hypothesis that the means of the two groups on the discriminant function are equal.
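For a single discriminant function, the three quantities are tied together by standard identities: canonical correlation = sqrt(eigenvalue / (1 + eigenvalue)) and Wilks' lambda = 1 / (1 + eigenvalue) = 1 - (canonical correlation)^2. The sketch below checks these identities and computes Bartlett's chi-square approximation for testing lambda. The eigenvalue, sample size, and predictor count are made up, not taken from the VCR example.

```python
import math


def canonical_correlation(eigenvalue):
    """Canonical correlation implied by one function's eigenvalue."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


def wilks_lambda_single(eigenvalue):
    """Wilks' lambda when there is only one discriminant function."""
    return 1.0 / (1.0 + eigenvalue)


def lambda_chi_square(wilks, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation and its degrees of freedom."""
    stat = -(n_cases - 1 - (n_predictors + n_groups) / 2.0) * math.log(wilks)
    df = n_predictors * (n_groups - 1)
    return stat, df


eigenvalue = 0.25  # hypothetical value
r = canonical_correlation(eigenvalue)
lam = wilks_lambda_single(eigenvalue)
stat, df = lambda_chi_square(lam, n_cases=100, n_predictors=4, n_groups=2)

print(round(r, 4), round(lam, 4), round(1 - r ** 2, 4))
print(round(stat, 2), df)
```

A small lambda (large chi-square) is evidence against the null hypothesis that the group means on the function are equal.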

Testing Equality of Covariances
• Box’s M is used to test for equality of the group covariance matrices. Researchers want a nonsignificant Box’s M, because a nonsignificant result is consistent with the equal-covariance assumption.
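As a rough sketch of what the statistic measures, the code below computes M = (N - g) ln|S_pooled| - sum of (n_i - 1) ln|S_i| for groups with two predictors. The data are made up, the helper names are hypothetical, and the significance test itself (an F or chi-square approximation in SPSS) is left out.

```python
import math


def covariance_2x2(rows):
    """Sample covariance matrix (divisor n - 1) for rows of (x1, x2)."""
    n = len(rows)
    m = [sum(r[j] for r in rows) / n for j in range(2)]
    c = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = (r[0] - m[0], r[1] - m[1])
        for i in range(2):
            for j in range(2):
                c[i][j] += d[i] * d[j] / (n - 1)
    return c


def det_2x2(a):
    """Determinant of a 2x2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def box_m(groups):
    """Box's M for a list of groups, each a list of (x1, x2) rows."""
    n_total = sum(len(g) for g in groups)
    dof = n_total - len(groups)
    pooled = [[0.0, 0.0], [0.0, 0.0]]
    weighted_logs = 0.0
    for g in groups:
        cov = covariance_2x2(g)
        w = len(g) - 1
        weighted_logs += w * math.log(det_2x2(cov))
        # Accumulate the pooled covariance, weighted by n_i - 1.
        for i in range(2):
            for j in range(2):
                pooled[i][j] += w * cov[i][j] / dof
    return dof * math.log(det_2x2(pooled)) - weighted_logs


# Hypothetical groups: b has the same spread as a, c is more spread out.
group_a = [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
group_b = [(4.0, 5.0), (5.0, 4.0), (5.0, 6.0), (6.0, 5.0)]
group_c = [(0.0, 2.0), (2.0, 0.0), (2.0, 4.0), (4.0, 2.0)]

m_equal = box_m([group_a, group_b])
m_unequal = box_m([group_a, group_c])
print(m_equal, m_unequal)
```

M is zero when the group covariance matrices are identical and grows as they diverge, which is why a small, nonsignificant M is the desired outcome.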

The Fisher Linear Discriminant Functions
• Used for scoring

• Classification statistics
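A minimal sketch of scoring with Fisher classification functions: each group has its own linear function, a case is assigned to the group with the highest function value, and the classification statistics summarize actual versus predicted groups. All coefficients and cases below are made up for illustration and are not the values from the VCR example.

```python
# Hypothetical classification function coefficients, one function per group.
classification_functions = {
    1: {"constant": -3.0, "age": 0.05, "qual": 0.40},
    2: {"constant": -5.0, "age": 0.06, "qual": 0.90},
}


def classify(case):
    """Return the group whose classification function scores highest."""
    scores = {}
    for group, coefs in classification_functions.items():
        scores[group] = coefs["constant"] + sum(
            coefs[name] * value for name, value in case.items()
        )
    return max(scores, key=scores.get)


def classification_table(cases, actual):
    """Counts of (actual, predicted) pairs plus the overall hit rate."""
    table = {}
    hits = 0
    for case, group in zip(cases, actual):
        predicted = classify(case)
        table[(group, predicted)] = table.get((group, predicted), 0) + 1
        hits += predicted == group
    return table, hits / len(cases)


# Hypothetical cases with their actual group memberships.
cases = [
    {"age": 30, "qual": 2},
    {"age": 40, "qual": 6},
    {"age": 50, "qual": 2},
    {"age": 35, "qual": 5},
]
actual = [1, 2, 2, 2]

table, hit_rate = classification_table(cases, actual)
print(table, hit_rate)
```

The table keys are (actual group, predicted group) pairs, so the off-diagonal entries are the misclassified cases.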

