
Linear Discriminant Analysis

Predictive Modelling – Week 3


LDA – Linear Discriminant Analysis

LDA finds a linear combination of features (variables) that maximises the separation between the groups under consideration.

[Figure: the groups before applying LDA and after applying LDA]


Where did we see a linear combination of features before?
• Linear Regression: y = b0 + b1x1 + b2x2 + b3x3 + e
• Principal Component Analysis: PC1 = a1x1 + a2x2 + a3x3
• Linear Discriminant: LD1 = b0 + b1x1 + b2x2 + b3x3

Example: number of information sources used before purchasing the item
• Independent variables: cost of appliance purchased, family income, age, gender
• Dependent variable (information sources used): 0-1, 2-4, 5-6, more than 6
Technique           Independent Variable   Dependent Variable
ANOVA               Categorical            Numeric
Linear Regression   Numeric                Numeric
LDA                 Numeric                Categorical
How does LDA work?

[Figure: two groups plotted as Experience against Skills, showing the within-group variance of each group and the between-group variance]

• Minimise within-group variance
• Maximise between-group variance
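The two goals can be made concrete with a small sketch. It assumes hypothetical 1-D scores (for example, the data projected onto one candidate axis) and splits their spread into within-group and between-group sums of squares, the same decomposition used in ANOVA. The helper name `variance_components` is illustrative only.

```python
from statistics import mean


def variance_components(groups):
    """Return (within-group SS, between-group SS) for a list of 1-D groups."""
    grand = mean(x for g in groups for x in g)  # grand mean over all points
    within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    return within, between


# Hypothetical scores of two groups projected onto two candidate axes.
axis_a = [[1.0, 3.0, 5.0], [2.0, 4.0, 6.0]]  # groups overlap
axis_b = [[1.0, 1.5, 2.0], [5.0, 5.5, 6.0]]  # groups are tight and far apart

for name, groups in [("A", axis_a), ("B", axis_b)]:
    within, between = variance_components(groups)
    print(f"axis {name}: within = {within}, between = {between}, ratio = {between / within:.2f}")
```

Axis B, with a small within-group and a large between-group variance, is the kind of axis LDA searches for.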
Difference between LDA & Logistic Regression

• Logistic regression is more similar to LDA than ANOVA is, because both explain a categorical dependent variable using the values of continuous independent variables.
• Logistic regression is preferable in applications where it is not reasonable to assume that the independent variables are normally distributed, which is a fundamental assumption of the LDA method.
• LDA is often preferred when the dependent variable has more than two classes.
Difference between PCA & LDA

[Figure: PCA, with principal components PC1 and PC2, compared with LDA; both panels plot Experience against Skills]

PCA
1. PCA finds the direction in which there is maximum variance.
2. PCA is unsupervised learning.

LDA
1. LDA finds the direction that maximises class separability.
2. LDA is supervised learning.
3. Discriminant analysis is used when the groups are known a priori.
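A minimal NumPy sketch of the difference, assuming a hypothetical toy data set in which both classes are spread widely along Skills but differ only in Experience. PCA ignores the labels and picks the direction of maximum variance; LDA uses the labels and, for two classes, points along Sw^-1 (µ1 - µ0), where Sw is the within-class scatter matrix.

```python
import numpy as np

# Hypothetical toy data: both classes spread widely along x (Skills)
# but differ only in y (Experience).
x = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])
noise = np.array([0.2, -0.2, 0.1, -0.1, 0.0])
class0 = np.column_stack([x, noise])          # centred at y = 0
class1 = np.column_stack([x, 2.0 + noise])    # centred at y = 2
X = np.vstack([class0, class1])

# PCA (unsupervised): leading eigenvector of the pooled covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
pc1 = eigvecs[:, np.argmax(eigvals)]

# LDA (supervised, two classes): w proportional to Sw^-1 (mu1 - mu0).
mu0, mu1 = class0.mean(axis=0), class1.mean(axis=0)
Sw = (class0 - mu0).T @ (class0 - mu0) + (class1 - mu1).T @ (class1 - mu1)
w = np.linalg.solve(Sw, mu1 - mu0)
w /= np.linalg.norm(w)

print("PC1 direction:", np.round(pc1, 3))
print("LDA direction:", np.round(w, 3))
```

PC1 lies almost along the Skills axis, where the variance is largest, even though that axis does nothing to separate the classes; the LDA direction lies almost along Experience.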
LDA as dimension reduction
[Figure: two classes with means µ1 and µ2 in 2-dimensional space, projected onto 1 dimension along Line 1 and along Line 2; the labels mark correct classification and misclassification]

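• LDA uses both variables to create a new axis and projects the data onto it; the axis is chosen so that the 2 classes are separated as much as possible
• Here µ1 and µ2 are the means of the two groups, and S1² and S2² are the within-group variances (scatter) of the data projected onto the new axis

$$\text{Maximize} \quad \frac{(\mu_1 - \mu_2)^2}{S_1^2 + S_2^2}$$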
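The criterion can be checked numerically. The sketch below uses hypothetical 2-D data and an illustrative helper `fisher_ratio`, taking S1² and S2² as the within-group sums of squares of the projected scores. It scans directions in 0.1° steps, keeps the one with the largest ratio, and compares it with the closed-form Fisher direction Sw^-1 (µ2 - µ1).

```python
import numpy as np


def fisher_ratio(w, group1, group2):
    """(mu1 - mu2)^2 / (S1^2 + S2^2) for the data projected onto direction w."""
    w = w / np.linalg.norm(w)
    z1, z2 = group1 @ w, group2 @ w          # 1-D projections
    s1 = ((z1 - z1.mean()) ** 2).sum()       # within-group scatter of group 1
    s2 = ((z2 - z2.mean()) ** 2).sum()       # within-group scatter of group 2
    return (z1.mean() - z2.mean()) ** 2 / (s1 + s2)


# Hypothetical two-feature data for two groups.
group1 = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 1.0]])
group2 = np.array([[4.0, 5.0], [5.0, 6.0], [6.0, 5.0], [5.0, 4.0]])

# Brute force: try directions from 0 to 180 degrees in 0.1 degree steps.
angles = np.radians(np.arange(0.0, 180.0, 0.1))
ratios = [fisher_ratio(np.array([np.cos(a), np.sin(a)]), group1, group2) for a in angles]
best = angles[int(np.argmax(ratios))]
w_search = np.array([np.cos(best), np.sin(best)])

# Closed form: w proportional to Sw^-1 (mu2 - mu1), Sw = pooled within-group scatter.
mu1, mu2 = group1.mean(axis=0), group2.mean(axis=0)
Sw = (group1 - mu1).T @ (group1 - mu1) + (group2 - mu2).T @ (group2 - mu2)
w_closed = np.linalg.solve(Sw, mu2 - mu1)
w_closed /= np.linalg.norm(w_closed)

print("search:", np.round(w_search, 3), "closed form:", np.round(w_closed, 3))
```

The two directions agree up to the grid resolution, which is why LDA implementations can solve for the axis directly instead of searching.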
Multiclass classification

[Figure: three classes in the X-Y plane with discriminant directions d1, d2 and d3]

• If there are 2 classes, LDA separates the classes using a 1-dimensional vector
• If there are 3 classes, LDA separates the classes using a 2-dimensional vector
• If there are k classes, LDA separates the classes using at most a (k-1)-dimensional vector (and never more dimensions than there are features)
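Why k - 1? One way to see it, sketched here under assumptions: the discriminant directions can be obtained as eigenvectors of Sw^-1 Sb, where Sb = Σ n_k (µ_k - µ)(µ_k - µ)ᵀ is the between-class scatter matrix. The k centred class means satisfy one linear constraint (their weighted sum is zero), so Sb has rank at most k - 1. The data below are random and hypothetical (k = 3 classes, 4 features).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: k = 3 classes, 4 features, 20 points per class.
class_means = np.array([[0.0, 0.0, 0.0, 0.0],
                        [3.0, 1.0, 0.0, 2.0],
                        [1.0, 4.0, 2.0, 0.0]])
classes = [m + rng.normal(size=(20, 4)) for m in class_means]
grand_mean = np.vstack(classes).mean(axis=0)

# Between-class scatter: Sb = sum_k n_k (mu_k - mu)(mu_k - mu)^T
Sb = np.zeros((4, 4))
for c in classes:
    v = c.mean(axis=0) - grand_mean
    Sb += len(c) * np.outer(v, v)

# Count eigenvalues that are clearly non-zero (relative tolerance).
eigvals = np.linalg.eigvalsh(Sb)
rank = int(np.sum(eigvals > 1e-9 * eigvals.max()))
print("classes:", len(classes), "features:", Sb.shape[0], "rank of Sb:", rank)
```

Even with 4 features, only 2 discriminant directions carry between-class information for 3 classes.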
How does LDA predict and separate the classes?

• In most cases, LDA predicts the class in one of 2 ways:
1) Bayes rule
2) Distance
Application
a) Altman Z-Score
Bayes rule
Conditional probability is the likelihood of an outcome occurring, given that another outcome has already occurred.

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Posterior probability = (Prior probability × Likelihood) / Evidence

where
• P(A|B) is the posterior probability
• P(B|A) is the likelihood
• P(A) is the prior probability
• P(B) is the evidence (marginal probability)
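Equation on classes
Two classes, Y=1 and Y=0

For class = 1:
$$P(Y=1 \mid X) = \frac{P(X \mid Y=1)\,P(Y=1)}{P(X)}$$

For class = 0:
$$P(Y=0 \mid X) = \frac{P(X \mid Y=0)\,P(Y=0)}{P(X)}$$

The denominator is the same for both classes, P(X) = P(X|Y=1)P(Y=1) + P(X|Y=0)P(Y=0), so the observation is assigned to the class with the larger numerator.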
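LDA applies exactly this rule under one extra assumption: each class-conditional density P(X|Y) is normal, and all classes share the same variance. A minimal pure-Python sketch for a single predictor; the means, standard deviation and prior are hypothetical, and the helper names are illustrative.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def posterior_class1(x, mu0, mu1, sd, prior1):
    """P(Y=1 | X=x) for two normal classes sharing one standard deviation."""
    joint1 = normal_pdf(x, mu1, sd) * prior1        # P(X|Y=1) * P(Y=1)
    joint0 = normal_pdf(x, mu0, sd) * (1 - prior1)  # P(X|Y=0) * P(Y=0)
    return joint1 / (joint1 + joint0)               # divide by P(X)


# Hypothetical parameters: class 0 centred at 25, class 1 at 50, common sd 10.
for x in (25, 37.5, 50):
    print(x, round(posterior_class1(x, mu0=25, mu1=50, sd=10, prior1=0.5), 3))
```

Because the variance is shared, the log of the ratio of the two posteriors is linear in x, which is what makes the LDA decision boundary linear; with equal priors it sits at the midpoint of the two means.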
Example: What is the probability that a person will go out to play when the weather is Sunny and Hot?

Outlook     Yes   No   P(Outlook|Yes)   P(Outlook|No)
Sunny       2     3    2/9              3/5
Overcast    4     0    4/9              0
Rainy       3     2    3/9              2/5
Total       9     5    1                1

Temperature   Yes   No   P(Temp|Yes)   P(Temp|No)
Hot           2     2    2/9           2/5
Mild          4     2    4/9           2/5
Cool          3     1    3/9           1/5
Total         9     5    1             1

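Today = (Sunny, Hot) = (x1, x2)
P(Yes) = 9/14, P(No) = 5/14 (14 days in total)

Assuming Sunny and Hot are independent within each class (the naive Bayes assumption):

P(Yes|Today) = P(Today|Yes) × P(Yes) / P(Today)
= P(Sunny|Yes) × P(Hot|Yes) × P(Yes) / P(Today)
= (2/9 × 2/9 × 9/14) / P(Today) = 0.0317 / P(Today)

P(No|Today) = P(Today|No) × P(No) / P(Today)
= P(Sunny|No) × P(Hot|No) × P(No) / P(Today)
= (3/5 × 2/5 × 5/14) / P(Today) = 0.0857 / P(Today)

P(Today) is the same for both classes, so the prediction is No (0.0857 > 0.0317). Normalising with P(Today) = 0.0317 + 0.0857 gives P(Yes|Today) ≈ 0.27 and P(No|Today) ≈ 0.73.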
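The arithmetic of the example can be checked with exact fractions. This sketch makes the same naive independence assumption (Sunny and Hot independent within each class) and takes P(Today) as the sum of the two numerators.

```python
from fractions import Fraction as F

# Priors and class-conditional probabilities read off the tables.
p_yes, p_no = F(9, 14), F(5, 14)
p_sunny = {"yes": F(2, 9), "no": F(3, 5)}
p_hot = {"yes": F(2, 9), "no": F(2, 5)}

# Numerators of Bayes' rule under the naive independence assumption.
num_yes = p_sunny["yes"] * p_hot["yes"] * p_yes
num_no = p_sunny["no"] * p_hot["no"] * p_no
evidence = num_yes + num_no  # P(Today) under the same assumption

print(num_yes, num_no)                      # 2/63 3/35
print(num_yes / evidence, num_no / evidence)  # 10/37 27/37
```

Working in fractions avoids the rounding that creeps in when 0.0317 and 0.0857 are carried through by hand.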
LDA classification using distance

• Example: personal loan
• Variables: average spends on credit card, income, CIBIL score

Single factor (single variable): an individual's average spend on credit card is 10k/month.
Mean of acceptors = 25k, so Manhattan distance = |25 - 10| = 15k
Mean of non-acceptors = 50k, so Manhattan distance = |50 - 10| = 40k

On raw distance the individual is closer to the acceptors' mean, so they would be classified as an acceptor.

But did we consider the Standard Deviation (Spread)?
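To see why the spread matters, the sketch below repeats the single-variable comparison with standard deviations attached. The example gives no standard deviations, so the values 5k (acceptors) and 30k (non-acceptors) are hypothetical, chosen only to show that dividing by the spread can change which group is nearest. `nearest` is an illustrative helper.

```python
# Means come from the example; the standard deviations are HYPOTHETICAL.
groups = {
    "acceptors": {"mean": 25.0, "sd": 5.0},
    "non_acceptors": {"mean": 50.0, "sd": 30.0},
}


def nearest(spend, groups, standardise):
    """Name of the group whose mean is closest to `spend`."""
    def distance(g):
        d = abs(g["mean"] - spend)                # Manhattan distance in k
        return d / g["sd"] if standardise else d  # in SD units if standardised
    return min(groups, key=lambda name: distance(groups[name]))


spend = 10.0  # the individual's average monthly credit card spend (k)
print("raw distance:", nearest(spend, groups, standardise=False))
print("standardised distance:", nearest(spend, groups, standardise=True))
```

With these assumed spreads, 15k is 3 standard deviations from the acceptors' mean but 40k is only about 1.33 standard deviations from the non-acceptors' mean, so the decision flips.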


Problems with this distance calculation
• It ignores the variance (or standard deviation)
• It is scale dependent
• It ignores the correlation between variables
• Note: the first 2 problems can be handled by scaling the data, but correlation cannot be handled that easily
• Then, what is the solution?
• The solution is the statistical distance, or Mahalanobis distance

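Mahalanobis distance: apart from measuring distance across the plane, it also accounts for the spread (variance) of each variable and the correlation between variables, i.e. the "height" or "altitude" of the data cloud.

(Named after Prasanta Chandra Mahalanobis.)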


Mahalanobis distance equation

$$D_M(x) = \sqrt{(x - \mu)^{T} S^{-1} (x - \mu)}$$

where x is the observation, µ is the class centroid (mean vector), S is the covariance matrix of the predictors (the diagonal holds the variances and the off-diagonals hold the covariances of every pair of predictors), and T indicates that the vector is transposed.
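A minimal NumPy sketch of the formula, using a hypothetical 2-variable covariance matrix whose variables are strongly correlated (correlation 0.9). Two points at the same Euclidean distance from the centroid end up at very different statistical distances: the one lying along the direction of correlation is typical, the one lying against it is not. `mahalanobis` is an illustrative helper that solves S·y = (x - µ) rather than forming S^-1 explicitly; SciPy also provides `scipy.spatial.distance.mahalanobis(u, v, VI)`, which expects the inverse covariance VI.

```python
import numpy as np


def mahalanobis(x, mu, S):
    """Statistical distance sqrt((x - mu)^T S^-1 (x - mu))."""
    d = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return float(np.sqrt(d @ np.linalg.solve(S, d)))


mu = np.array([0.0, 0.0])    # hypothetical centroid
S = np.array([[1.0, 0.9],    # hypothetical covariance matrix:
              [0.9, 1.0]])   # unit variances, covariance 0.9

a = np.array([1.0, 1.0])     # lies along the direction of correlation
b = np.array([1.0, -1.0])    # lies against it

print("Euclidean:  ", np.linalg.norm(a - mu), np.linalg.norm(b - mu))
print("Mahalanobis:", mahalanobis(a, mu, S), mahalanobis(b, mu, S))
```

With S equal to the identity matrix the statistical distance reduces to the ordinary Euclidean distance.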
Computing the statistical distance of a customer from the centre of the acceptors class

New customer profile: average credit card spends = 2.7, age = 44, income = 100

                              Non-Acceptors   Acceptors (centroid)
Mean of average CC spends     1.73            3.91
Mean of age                   45.37           45.07
Mean of income                66.24           144.75

Covariance table for acceptors (symmetric; lower triangle shown)
             Income    CC spends   Age
Income       995.5
CC spends    14.21     4.39
Age          7.77      -0.06       134.07
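Computation of statistical distance

The difference vector must use the same variable order as the covariance table (Income, CC spends, Age):

x - µ = (100, 2.7, 44) - (144.75, 3.91, 45.07) = (-44.75, -1.21, -1.07)

$$S^{-1} = \begin{pmatrix} 995.5 & 14.21 & 7.77 \\ 14.21 & 4.39 & -0.06 \\ 7.77 & -0.06 & 134.07 \end{pmatrix}^{-1} \approx \begin{pmatrix} 0.0011 & -0.0034 & -0.0001 \\ -0.0034 & 0.2388 & 0.0003 \\ -0.0001 & 0.0003 & 0.0075 \end{pmatrix}$$

$$D_M^2 = (x-\mu)^{T} S^{-1} (x-\mu) = \begin{pmatrix} -44.75 & -1.21 & -1.07 \end{pmatrix} S^{-1} \begin{pmatrix} -44.75 \\ -1.21 \\ -1.07 \end{pmatrix} \approx 2.09$$

(computed with the unrounded inverse)

$$D_M \approx \sqrt{2.09} \approx 1.45$$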
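The hand computation can be reproduced with NumPy. This sketch uses the figures from the tables; the important detail is that the customer vector and the centroid are written in the same variable order as the covariance table (Income, CC spends, Age), otherwise each difference is scaled by the wrong variance.

```python
import numpy as np

# Variable order follows the covariance table: Income, CC spends, Age.
customer = np.array([100.0, 2.7, 44.0])
acceptors_centroid = np.array([144.75, 3.91, 45.07])
S_acceptors = np.array([
    [995.5, 14.21, 7.77],
    [14.21, 4.39, -0.06],
    [7.77, -0.06, 134.07],
])

diff = customer - acceptors_centroid                   # (-44.75, -1.21, -1.07)
d2 = float(diff @ np.linalg.solve(S_acceptors, diff))  # squared statistical distance
print(f"D^2 = {d2:.2f}, D = {np.sqrt(d2):.2f}")
```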
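Classification or Discriminant Score

$$cs(x, \bar{y}) = \bar{y}^{T} S^{-1} x - \frac{1}{2}\,\bar{y}^{T} S^{-1} \bar{y}$$

where ȳ is the class centroid. Expanding the squared Mahalanobis distance gives

$$D_M^2 = x^{T} S^{-1} x - 2\,cs(x, \bar{y})$$

and x^T S^-1 x is the same for every class, so when all classes share one covariance matrix S, assigning x to the class with the highest score is the same as assigning it to the nearest centroid.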
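A sketch that computes the score for both classes for the new customer and confirms it picks the same class as the smallest statistical distance. Assumption: the example only gives a covariance table for acceptors, so that table stands in for the common covariance matrix S here; a real analysis would use the pooled within-class covariance. `discriminant_score` is an illustrative helper.

```python
import numpy as np


def discriminant_score(x, centroid, S):
    """cs(x, ybar) = ybar^T S^-1 x - 0.5 * ybar^T S^-1 ybar (S symmetric)."""
    s_inv_ybar = np.linalg.solve(S, centroid)  # S^-1 ybar
    return float(s_inv_ybar @ x - 0.5 * (s_inv_ybar @ centroid))


# Order: Income, CC spends, Age. ASSUMPTION: the acceptors' covariance table
# stands in for the common covariance matrix shared by both classes.
S = np.array([[995.5, 14.21, 7.77],
              [14.21, 4.39, -0.06],
              [7.77, -0.06, 134.07]])
centroids = {
    "acceptors": np.array([144.75, 3.91, 45.07]),
    "non_acceptors": np.array([66.24, 1.73, 45.37]),
}
x = np.array([100.0, 2.7, 44.0])  # new customer profile

scores = {k: discriminant_score(x, c, S) for k, c in centroids.items()}
d2 = {k: float((x - c) @ np.linalg.solve(S, x - c)) for k, c in centroids.items()}

print("highest score:", max(scores, key=scores.get))
print("nearest centroid:", min(d2, key=d2.get))
```

The score avoids computing x^T S^-1 x at all, which is why it is linear in x.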
Altman Z-Score

• The Altman Z-score is the output of a credit-strength test that gauges a publicly traded manufacturing company's likelihood of bankruptcy. Its weights were originally estimated with discriminant analysis, so the score is itself a linear discriminant function.
• Z-Score = 1.2A + 1.4B + 3.3C + 0.6D + 1.0E, where
A = working capital / total assets
B = retained earnings / total assets
C = earnings before interest and tax / total assets
D = market value of equity / total liabilities
E = sales / total assets

• An Altman Z-score below about 1.8 suggests a company might be headed for bankruptcy, while a score above about 3 suggests the company is in a solid financial position.
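A short sketch of the score used as a scoring rule. The ratio values are hypothetical, and the cut-offs 1.81 and 2.99 are the commonly cited boundaries of the "distress" and "safe" zones (rounded to 1.8 and 3 in the text); `altman_z` and `zone` are illustrative names.

```python
def altman_z(A, B, C, D, E):
    """Altman Z-score: 1.2A + 1.4B + 3.3C + 0.6D + 1.0E (ratios as defined above)."""
    return 1.2 * A + 1.4 * B + 3.3 * C + 0.6 * D + 1.0 * E


def zone(z, distress=1.81, safe=2.99):
    """Map a Z-score to a zone using the commonly cited cut-offs."""
    if z < distress:
        return "distress"
    if z > safe:
        return "safe"
    return "grey"


# Hypothetical ratios for an illustrative company.
z = altman_z(A=0.1, B=0.2, C=0.1, D=1.0, E=1.0)
print(round(z, 2), zone(z))
```

Scores between the two cut-offs fall in a "grey" zone where the model gives no clear signal.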
