
DISCRIMINANT ANALYSIS

What is discriminant analysis?

■ Consider the beer people buy


– Regular beer
– Light beer
■ Predictors of the beer people buy
– Age, Income, Alcohol content, Education level, etc.
■ We will consider just two predictors to understand the
basic idea of discriminant analysis
What is discriminant analysis?

■ Consider two predictors


– Age
– Income
■ The outcome is a binary outcome – beer preference
– Regular beer
– Light beer
The basic idea of discriminant analysis?

■ To find a “line” which best separates the classes


– A “line” when we have two predictors
– A “plane” when we have three predictors
– A “hyperplane” when we have many predictors
■ Consider a sample of n = 100 people
– Suppose we want to build a predictive model that
would predict the beer preference of a new person
given only their age and their income level.
– How do we approach this problem using
discriminant analysis?
The basic idea of discriminant analysis?

■ We want to separate these two clouds by finding a “line”
that would slice between the two classes.
■ In a simple case like this, we could probably draw the line by hand
The basic idea of discriminant analysis?

■ The line is chosen by the algorithm so that it is equally
distant (in statistical distance) from each class center.
– Class center, also called a centroid: a vector (list)
of the means of each of the predictors, computed for
each class separately.
– Distance – measuring the distance between a record
and a centroid can be done in different ways
■ statistical distance or Mahalanobis distance
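
The centroid and statistical-distance ideas can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the eight (age, income) records are made up, the helper names `centroid`, `pooled_covariance` and `mahalanobis_sq` are hypothetical, and the covariance matrix is pooled across the two classes, which is the usual assumption in linear discriminant analysis.

```python
from statistics import mean

# Made-up (age, income in $K) records, for illustration only.
regular = [(25, 30.0), (32, 42.0), (28, 35.0), (35, 41.0)]
light = [(45, 70.0), (52, 85.0), (48, 72.0), (55, 81.0)]

def centroid(records):
    """Vector of per-predictor means for one class."""
    return tuple(mean(column) for column in zip(*records))

def pooled_covariance(classes):
    """2x2 covariance matrix pooled over all classes (assumption: shared covariance)."""
    sums = [[0.0, 0.0], [0.0, 0.0]]
    for records in classes:
        m = centroid(records)
        for r in records:
            d = (r[0] - m[0], r[1] - m[1])
            for i in range(2):
                for j in range(2):
                    sums[i][j] += d[i] * d[j]
    dof = sum(len(c) for c in classes) - len(classes)
    return [[sums[i][j] / dof for j in range(2)] for i in range(2)]

def mahalanobis_sq(x, m, cov):
    """Squared statistical distance (x - m)' S^-1 (x - m) for two predictors."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = (x[0] - m[0], x[1] - m[1])
    return sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))

cov = pooled_covariance([regular, light])
new_person = (30, 40.0)  # hypothetical new record: age 30, income $40K
distances = {
    "regular": mahalanobis_sq(new_person, centroid(regular), cov),
    "light": mahalanobis_sq(new_person, centroid(light), cov),
}
closest = min(distances, key=distances.get)
print(distances, closest)
```

Unlike plain Euclidean distance, the statistical distance rescales each predictor by its variability and accounts for the correlation between age and income, so a predictor measured in large units does not dominate.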
Uses of discriminant analysis

■ Profiling
– Explaining or determining factors that discriminate
between classes.
■ Classifying
– Predict class membership.
Example: Personal Loan offer by a bank

■ Objective: To identify customers most likely to accept the
loan offer.
■ Using data from a previous campaign on 5000 customers,
of whom 480 accepted
■ We build an algorithm that generates the required
predictions for the new customer base
Example: Personal Loan offer by a bank

■ Predictors – Age; Income; Experience in years; Education
level; Credit Card Average score; Family size, etc.
■ The outcome variable is Personal Loan
– A binary outcome
– Did the customer accept the personal loan offered in
the last campaign? 1 – Yes; 0 - No
CONSIDER JUST TWO PREDICTORS
Can you easily find a separating line?
And now?
If we are trying to separate our two classes, we cannot do it easily manually or by eye, and hence we need discriminant analysis to do all the calculations behind the scenes and find the “best line”.
Classifying a record

■ Suppose we have run the algorithm and we get the line;
how do we use it to classify the record?
– Define the center of each class
■ If we have x classes then we have x centers
– For a new record of interest
■ Measure the record's distance from the center of each class
■ Classify the record to the closest class
Centroid: the center of the class
Classification function
■ Software generates a classification function – “the line of best fit”
■ A linear combination of predictors for deriving proximity scores to each class

Variable     Classification Function
             Accept     Reject
Constant     -7.596     -2.096
Income        0.089      0.040
CCAvg         0.255      0.111

■ The table shows two classification functions for the classes – Accept buying the
loan and Reject buying the loan
– There are as many classification functions as there are classes.
Example: Classifying a record

■ How far is our new record from each of these classes?


■ Record: Income = 34K/year; CCAvg = 1.50/month
■ The Classification Scores (CS) for the classes Accept and Reject are:
– $CS_{Accept} = -7.596 + 0.089 \times 34 + 0.255 \times 1.50 \approx -4.20$
– $CS_{Reject} = -2.096 + 0.040 \times 34 + 0.111 \times 1.50 \approx -0.57$
■ −0.57 is the larger score, hence this customer is classified to the Reject class – You do not
send them the loan form
■ Where would you classify Record: Income = 114K/year; CCAvg = 3.80/month
– Are they more likely to accept the loan offer?
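
The score calculation can be checked with a short Python sketch. The coefficients are copied from the classification-function table; the names `CLASSIFICATION_FUNCTIONS`, `classification_scores` and `classify` are hypothetical and chosen only for this illustration. The rule assumed here is that a record is assigned to the class with the highest classification score.

```python
# Coefficients copied from the classification-function table.
CLASSIFICATION_FUNCTIONS = {
    "Accept": {"Constant": -7.596, "Income": 0.089, "CCAvg": 0.255},
    "Reject": {"Constant": -2.096, "Income": 0.040, "CCAvg": 0.111},
}

def classification_scores(income, ccavg):
    """Return the classification score of a record for every class."""
    return {
        label: f["Constant"] + f["Income"] * income + f["CCAvg"] * ccavg
        for label, f in CLASSIFICATION_FUNCTIONS.items()
    }

def classify(income, ccavg):
    """Assign the record to the class with the highest score."""
    scores = classification_scores(income, ccavg)
    return max(scores, key=scores.get)

# The two records discussed on the slide: (Income in $K/year, CCAvg per month).
for income, ccavg in [(34, 1.50), (114, 3.80)]:
    scores = classification_scores(income, ccavg)
    rounded = {label: round(s, 2) for label, s in scores.items()}
    print(income, ccavg, rounded, classify(income, ccavg))
```

With the rounded coefficients shown in the table, the first record's Accept score comes out near −4.19 rather than −4.20, presumably because the slide's figure was computed from unrounded coefficients. The second record scores higher on Accept than on Reject, so it is classified to the Accept class.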
Computing membership probabilities

■ Use the record classification scores to compute class membership probabilities


■ $p(C_j \mid X) = \dfrac{e^{CS(C_j)}}{e^{CS(C_0)} + e^{CS(C_1)} + \cdots + e^{CS(C_g)}}$
■ $p(C_{Accept} \mid X) = \dfrac{e^{-4.20}}{e^{-4.20} + e^{-0.57}} = 0.026$
– Do you want to send out loan forms to customers with such low probability
acceptance scores?
■ The decision depends on the application…
– $p(C_{Reject} \mid X) = \dfrac{e^{-0.57}}{e^{-4.20} + e^{-0.57}} = 0.974$
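
The conversion from classification scores to membership probabilities can be sketched as follows. The function name `membership_probabilities` is hypothetical; subtracting the largest score before exponentiating is an added numerical-stability step that does not change the resulting ratios.

```python
import math

def membership_probabilities(scores):
    """Convert classification scores to class membership probabilities."""
    # Subtracting the maximum score avoids overflow and leaves the ratios unchanged.
    top = max(scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exp_scores.values())
    return {label: e / total for label, e in exp_scores.items()}

# Scores from the slide for the record with Income = 34K and CCAvg = 1.50.
probs = membership_probabilities({"Accept": -4.20, "Reject": -0.57})
print({label: round(p, 3) for label, p in probs.items()})
# {'Accept': 0.026, 'Reject': 0.974}
```

The probabilities always sum to 1, and the class with the highest score also has the highest probability, so ranking customers by probability of acceptance is consistent with classifying them by score.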
Ranking predictor importance using
classification function
■ To compare predictors, you run DA on standardized predictors
■ Compare the difference in coefficients across the two classes
$b_{1,Accept} - b_{1,Reject}$

Variables    Classification Function    Difference
             1          0
Constant     -2.36      -0.72
Income_Z      2.02      -0.22           2.24
CCAvg_Z       0.20      -0.05           0.25
Standardized predictors

$\text{Income\_Z} = \dfrac{\text{Income} - \text{Mean}(\text{Income})}{\text{Std dev}(\text{Income})}$

$\text{CCAvg\_Z} = \dfrac{\text{CCAvg} - \text{Mean}(\text{CCAvg})}{\text{Std dev}(\text{CCAvg})}$
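
Standardizing a predictor can be sketched in Python as below. The income values are made up for illustration, the function name `standardize` is hypothetical, and the sample standard deviation (`statistics.stdev`) is an assumption, since the slides do not say whether the sample or population version is used.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # assumption: sample (n - 1) standard deviation
    return [(v - m) / s for v in values]

# Hypothetical incomes in $K/year, for illustration only.
incomes = [34, 114, 49, 80, 22]
income_z = standardize(incomes)
print([round(z, 2) for z in income_z])
```

After standardization every predictor has mean 0 and standard deviation 1, so coefficients estimated on the standardized predictors are on a comparable scale.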
Ranking predictor importance using
classification function
■ A larger difference indicates a greater impact on the
outcome variable
– The impact of the Income predictor is more than that of
the CCAvg score
– Income better separates the two classes.
■ If you have to choose a single predictor of whether a customer
will accept the loan, use their annual income
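
The ranking step can be sketched in Python using the standardized coefficients from the table earlier in this section. The names `STANDARDIZED_COEFFICIENTS` and `rank_predictors` are hypothetical, and sorting by the absolute value of the difference is an assumption made here so that large negative differences would also count as large.

```python
# Coefficients on standardized predictors, copied from the table
# (class "1" = Accept, class "0" = Reject).
STANDARDIZED_COEFFICIENTS = {
    "Income_Z": {"1": 2.02, "0": -0.22},
    "CCAvg_Z": {"1": 0.20, "0": -0.05},
}

def rank_predictors(coefficients):
    """Rank predictors by the size of the coefficient difference between classes."""
    differences = {
        name: c["1"] - c["0"] for name, c in coefficients.items()
    }
    # Assumption: importance is judged by the absolute difference.
    return sorted(differences.items(), key=lambda item: abs(item[1]), reverse=True)

ranking = rank_predictors(STANDARDIZED_COEFFICIENTS)
for name, difference in ranking:
    print(name, round(difference, 2))
```

The comparison is only meaningful because the predictors were standardized first; coefficients on the raw Income and CCAvg scales depend on the units of each predictor.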
Classification Function

■ “Classification functions”, estimated by software, are used to
– Classify records - computing “classification scores” for a
record.
– Compute class membership probabilities – propensity scores.
– Study predictor importance - ranking predictor importance.
