
BAN6065 Marketing Analysis

Discriminant Analysis

In class we discussed that it might be quite messy to directly use cluster analysis on your data. A
common procedure is to first run a factor analysis, and then run a cluster analysis to, say, segment
your customers, based on the identified factors. In other words, factor analysis can be used as a
technique prior to cluster analysis.

Discriminant analysis (DA), on the other hand, is a technique that often runs after cluster
analysis. DA is a dependence method, and is appropriate when one seeks to understand a
categorical dependent variable in terms of several independent variables. Groups identified by a
cluster analysis are just labels, and therefore, are categorical. One of the chief uses of DA is to
take groupings provided by cluster analysis, and then see whether it is possible to discriminate
those clusters based on other variables. Perhaps the most common usage along these lines is
segmenting (clustering) a group of customers according to their “needs, wants, and desires,” and
then attempting to distinguish (discriminate among) them in terms of their demographics and
psychographics. In this way, we can predict what a target customer might like, based on factual
data provided about them, such as their age, income, gender, and educational level. We could
also identify which demographic variables are good discriminators, and which are not; an
example might be “of these five mocked-up Super Bowl ads, we find that the biggest differences
in response are across gender and education, but there is almost no effect of urban versus rural.”
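The cluster-then-discriminate workflow described above can be sketched in R as follows. This is only an illustration under stated assumptions: the data are simulated, and the variable names (need_price, need_quality, age, income) are hypothetical stand-ins for survey "needs" ratings and demographics, not variables from any course dataset.

```r
library(MASS)  # provides lda()

set.seed(1)
n <- 150
# Simulated (hypothetical) survey: two "needs" ratings plus two demographics
segment_true <- rep(1:3, each = n / 3)
customers <- data.frame(
  need_price   = rnorm(n, mean = c(2, 5, 8)[segment_true]),
  need_quality = rnorm(n, mean = c(8, 5, 2)[segment_true]),
  age          = rnorm(n, mean = c(25, 40, 55)[segment_true], sd = 6),
  income       = rnorm(n, mean = c(30, 60, 90)[segment_true], sd = 12)
)

# Step 1: segment customers on the needs variables only
km <- kmeans(customers[, c("need_price", "need_quality")],
             centers = 3, nstart = 20)
customers$cluster <- factor(km$cluster)

# Step 2: try to discriminate among the clusters using demographics only
fit <- lda(cluster ~ age + income, data = customers)

# Step 3: how often do demographics alone recover the cluster label?
pred <- predict(fit)$class
hit_rate <- mean(pred == customers$cluster)
table(actual = customers$cluster, predicted = pred)
hit_rate
```

A high hit rate would suggest that the demographics discriminate well among the needs-based segments; a hit rate near the share of the largest cluster would suggest they do not.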

Marketing researchers apply DA to many distinct ends, since the method is versatile and can be
applied in a variety of real-world contexts:

(i) Discrimination. Which linear combinations of the given (independent) variables best
distinguish known groups (dependent variable)?
Example: Which information in our customer database (e.g., prior purchases, age,
income, other geodemographics) best explains which customers did reply to our
promotional offer last month?
(ii) Classification / Prediction. Given a new set of items (e.g., customers, products, firms)
whose group membership we do not know, which of the pre-established groups are
they likely to fall into?
Example: Can we now predict which customers will reply to next month’s promotional
offer?

(iii) Testing / Verification. Are the various groups significantly different, based on the
“profiles” (independent variables) of the individuals found in them?
Example: Does the customer data we spend so much to collect and store truly help
identify promotion-sensitive customers? Or could the groupings be arising from
chance alone?
(iv) Influence / Importance. Which input variables seem to best predict group
differences?
Example: We have all this data on each customer! But which variables are most helpful
in predicting who will reply to our promotions? Do some seem completely useless?
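The four uses listed above can each be tried on a small simulated example. The sketch below assumes a hypothetical customer database (db) with made-up variables, in which replying to the promotion depends on prior purchases and income but not on age or a pure-noise column. The Wilks' lambda test via manova() and the standardized coefficients are one common way to address (iii) and (iv), not the only one.

```r
library(MASS)

set.seed(42)
n <- 400
# Simulated customer database (all variables hypothetical)
db <- data.frame(
  prior_purchases = rpois(n, 4),
  age             = round(rnorm(n, 45, 12)),
  income          = rnorm(n, 60, 15),
  noise           = rnorm(n)   # deliberately unrelated to the response
)
# Simulated truth: replying depends on purchases and income only
score <- 0.6 * db$prior_purchases + 0.05 * db$income + rnorm(n)
db$replied <- factor(ifelse(score > median(score), "yes", "no"))

train <- db[1:300, ]    # customers with known response
new   <- db[301:400, ]  # customers treated as "unknown"

# (i) Discrimination: coefficients of the linear discriminant function
fit <- lda(replied ~ prior_purchases + age + income + noise, data = train)
fit$scaling

# (ii) Classification / prediction for customers not used in fitting
pred <- predict(fit, newdata = new)$class
holdout_accuracy <- mean(pred == new$replied)

# (iii) Testing: Wilks' lambda test of whether the group profiles differ
mv <- manova(cbind(prior_purchases, age, income, noise) ~ replied,
             data = train)
p_value <- summary(mv, test = "Wilks")$stats["replied", "Pr(>F)"]

# (iv) Importance: standardized coefficients
#      (raw coefficient times pooled within-group standard deviation)
X <- train[, c("prior_purchases", "age", "income", "noise")]
within_sd <- sqrt(sapply(X, function(v)
  sum((v - ave(v, train$replied))^2) /
    (nrow(train) - nlevels(train$replied))))
std_coef <- fit$scaling[, 1] * within_sd

holdout_accuracy
p_value
round(std_coef, 2)
```

Standardized coefficients close to zero flag variables that contribute little to separating the groups once the other predictors are in the model.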

To summarize, we want to use DA to: estimate “discriminant functions”; make group predictions;
determine whether the groups really seem different; and, if they do, identify the most useful
predictors in discrimination. Note the striking similarities between DA and multiple regression,
which yields, analogously, a linear combination of “independent” predictors, a “prediction” or
“fit” for each item, a test for the entire model (F-test), and separate tests for each variable
(t-tests). The only real difference is that DA analyzes a categorical dependent variable.
Consequently, when there are more than two groups to distinguish, DA can produce several
discriminant functions, not just one (up to the number of groups minus one, or the number of
predictors if that is smaller).
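To see how the number of discriminant functions depends on the number of groups, here is a minimal sketch using R's built-in iris data. It is only a convenient stand-in (three groups, four predictors), not marketing data.

```r
library(MASS)

# Two groups: drop one species so that only two levels remain
iris2 <- droplevels(subset(iris, Species != "setosa"))
fit2 <- lda(Species ~ ., data = iris2)
n_functions_2 <- ncol(fit2$scaling)   # one column per discriminant function

# Three groups: the full data set
fit3 <- lda(Species ~ ., data = iris)
n_functions_3 <- ncol(fit3$scaling)

# Share of between-group variance captured by each function
prop_trace <- fit3$svd^2 / sum(fit3$svd^2)

n_functions_2
n_functions_3
round(prop_trace, 3)
```

With two groups a single function results, as in a two-group regression-like setup; with three groups there are two, and prop_trace indicates how much of the between-group separation each one carries.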

Discriminant Analysis in R

In R, linear discriminant analysis can be run with the function lda() from the MASS package:

> library(MASS)

> fit <- lda(G ~ x1 + x2 + x3, data=mydata, na.action="na.omit", CV=TRUE)

Refer to https://www.statmethods.net/advstats/discriminant.html for more details of
conducting DA in R.
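When lda() is called with CV=TRUE, as in the call above, it returns leave-one-out cross-validated predictions (in fit$class and fit$posterior) rather than the discriminant functions themselves. A minimal sketch of how those predictions might be summarized follows; mydata here is simulated, with hypothetical groups A, B and C, purely so the example runs on its own.

```r
library(MASS)

set.seed(7)
# Simulated stand-in for mydata: three groups, three predictors
G <- factor(rep(c("A", "B", "C"), each = 50))
shift <- as.integer(G)
mydata <- data.frame(
  G  = G,
  x1 = rnorm(150, mean = shift),
  x2 = rnorm(150, mean = 2 * shift),
  x3 = rnorm(150)               # unrelated to group membership
)

fit <- lda(G ~ x1 + x2 + x3, data = mydata,
           na.action = "na.omit", CV = TRUE)

# Cross-tabulate actual groups against leave-one-out predicted groups
ct <- table(actual = mydata$G, predicted = fit$class)
ct

per_group <- diag(prop.table(ct, 1))     # proportion correct within each group
overall   <- sum(diag(prop.table(ct)))   # overall proportion correct
per_group
overall
```

To inspect the discriminant functions or to classify new customers with predict(), the model would be refit without CV=TRUE.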
