Linear Discriminant Analysis: Predictive Modelling - Week3


LDA – Linear Discriminant Analysis
LDA finds a linear combination of features (variables) that gives the maximum separation between the groups under consideration.
[Figure: the groups before applying LDA and after applying LDA]

Where did we see a linear combination of features before?
Linear Regression - y = b0 + b1x1 + b2x2 + b3x3 + e
Principal Component Analysis - PC1 = a1x1 + a2x2 + a3x3
Linear Discriminant - LD1 = b0 + b1x1 + b2x2 + b3x3
Example: number of information sources used before purchasing the item
Independent variables: cost of appliance purchased, family income, age, gender
Dependent variable (information sources used): 0-1, 2-4, 5-6, more than 6
Technique          Independent Variable   Dependent Variable
ANOVA              Categorical            Numeric
Linear Regression  Numeric                Numeric
LDA                Numeric                Categorical
How does LDA work?
[Figure: two groups plotted on Skills (x-axis) and Experience (y-axis), annotated with the within-group variance of each group and the between-group variance]
• Minimise within-group variance
• Maximise between-group variance
Difference between LDA & Logistic Regression
• Logistic regression is closer to LDA than ANOVA is, since both explain a categorical dependent variable by the values of continuous independent variables.
• Logistic regression is preferable in applications where it is not reasonable to assume that the independent variables are normally distributed, which is a fundamental assumption of the LDA method.
• LDA is preferred when the dependent variable has more than two classes.
Difference between PCA & LDA
[Figure: principal components PC1 and PC2 drawn on data plotted on Skills (x-axis) and Experience (y-axis)]
PCA
1. PCA finds the direction of maximum variance.
2. PCA is unsupervised learning.
LDA
1. LDA finds the direction of maximum class separability.
2. LDA is supervised learning.
3. Discriminant analysis is used when the groups are known a priori.
LDA as dimension reduction
[Figure: two classes with means µ1 and µ2 in a 2-dimensional plot, projected onto 1-dimensional lines (Line 1 and Line 2), with correctly classified and misclassified points marked]
• LDA uses both variables to create a new axis, chosen to maximise the separation between the 2 classes.
• Here µ1 and µ2 are the means of the two groups on the new axis, and S1 and S2 are the deviations within each group.
Maximize (µ1 - µ2)² / (S1² + S2²)
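A minimal sketch of this criterion, assuming the within-group spreads S1 and S2 enter as the sum of their squares in the denominator (the usual Fisher form). The data points, the two candidate directions and the function names are hypothetical and only illustrate that one projection line can separate the groups far better than another.

```python
from statistics import mean, pvariance

def project(points, direction):
    """Project 2-D points onto a direction vector (dot product)."""
    dx, dy = direction
    return [x * dx + y * dy for x, y in points]

def fisher_score(class1, class2, direction):
    """(mu1 - mu2)^2 / (s1^2 + s2^2), computed on the projected values."""
    p1 = project(class1, direction)
    p2 = project(class2, direction)
    return (mean(p1) - mean(p2)) ** 2 / (pvariance(p1) + pvariance(p2))

# Hypothetical (skills, experience) points for two groups
group_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0), (2.0, 2.5)]
group_b = [(3.0, 1.0), (4.0, 2.0), (5.0, 3.0), (4.0, 1.5)]

# Two hypothetical candidate axes
line_1 = (1.0, 1.0)    # runs along the elongated direction of both groups
line_2 = (1.0, -1.0)   # runs across the gap between the groups

score_1 = fisher_score(group_a, group_b, line_1)
score_2 = fisher_score(group_a, group_b, line_2)
print(f"Line 1 score: {score_1:.2f}, Line 2 score: {score_2:.2f}")
```

The axis with the larger score keeps the group means far apart relative to the spread inside each group, which is the direction LDA looks for.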
Multiclass classification
[Figure: three classes in the X-Y plane, with the distances d1, d2 and d3 to be maximized]
• If there are 2 classes, LDA separates the classes using 1 discriminant dimension.
• If there are 3 classes, LDA separates the classes using at most 2 discriminant dimensions.
• If there are k classes, LDA separates the classes using at most k-1 discriminant dimensions.
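One way to see where the k-1 limit comes from is through the between-group scatter matrix. The symbols below (n_i for the size of group i, µ_i for its mean, µ for the overall mean, S_B for the between-group scatter) are standard notation introduced here for illustration and do not appear on the slides.

```latex
S_B = \sum_{i=1}^{k} n_i (\mu_i - \mu)(\mu_i - \mu)^T,
\qquad
\mu = \frac{1}{n} \sum_{i=1}^{k} n_i \mu_i
\quad\Longrightarrow\quad
\sum_{i=1}^{k} n_i (\mu_i - \mu) = 0
```

Because the k vectors µ_i - µ satisfy one linear constraint, they span at most k-1 dimensions, so S_B has rank at most k-1 and there are at most k-1 useful discriminant directions.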
How does LDA predict and separate the classes?
• In most cases, LDA predicts classes in one of 2 ways:
1) Bayes rule
2) Distance
Application
a) Altman Z-Score
Bayes rule
Conditional probability is the likelihood of an outcome occurring, given that a previous outcome has occurred.
P(A|B) = P(B|A) * P(A) / P(B)
Posterior probability = Prior probability x Likelihood / Evidence
Here P(A|B) is the posterior probability, P(B|A) is the likelihood, P(A) is the prior probability and P(B) is the marginal probability (the evidence).
Equation on classes
Two classes, Y=1 and Y=0
For class = 1: P(Y=1|X) = P(X|Y=1) * P(Y=1) / P(X)
For class = 0: P(Y=0|X) = P(X|Y=0) * P(Y=0) / P(X)
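Example: What is the probability that a person will go out to play when it is Sunny & Hot?

Outlook    Yes  No  P(Outlook|Yes)  P(Outlook|No)
Sunny      2    3   2/9             3/5
Overcast   4    0   4/9             0
Rainy      3    2   3/9             2/5
Total      9    5   1               1

Temperature  Yes  No  P(Temperature|Yes)  P(Temperature|No)
Hot          2    2   2/9                 2/5
Mild         4    2   4/9                 2/5
Cool         3    1   3/9                 1/5
Total        9    5   1                   1

Today = (x1, x2) = (Sunny, Hot)
P(Yes) = 9/14, P(No) = 5/14 (14 observations in total)

P(Yes|Today) = P(Today|Yes) * P(Yes) / P(Today)
             = P(Sunny|Yes) * P(Hot|Yes) * P(Yes) / (P(Hot) * P(Sunny))
Numerator    = 2/9 * 2/9 * 9/14 = 0.032

P(No|Today)  = P(Today|No) * P(No) / P(Today)
             = P(Sunny|No) * P(Hot|No) * P(No) / (P(Hot) * P(Sunny))
Numerator    = 3/5 * 2/5 * 5/14 = 0.086

Outlook and Temperature are treated as independent within each class. The denominator P(Hot) * P(Sunny) is the same for both classes, so comparing the numerators is enough: 0.086 > 0.032, so the prediction for a Sunny and Hot day is No.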
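The arithmetic of this example can be checked with a short sketch. It uses the counts from the Outlook and Temperature tables and exact fractions; the variable and function names are illustrative, and the final normalisation step (dividing each score by the sum of both) is an addition that turns the two numerators into probabilities summing to 1.

```python
from fractions import Fraction as F

# Counts from the tables: value -> (count when Yes, count when No)
outlook = {"Sunny": (2, 3), "Overcast": (4, 0), "Rainy": (3, 2)}
temperature = {"Hot": (2, 2), "Mild": (4, 2), "Cool": (3, 1)}
n_yes, n_no = 9, 5
n_total = n_yes + n_no

def class_scores(outlook_value, temperature_value):
    """Return the numerators P(x1|class) * P(x2|class) * P(class) for Yes and No."""
    o_yes, o_no = outlook[outlook_value]
    t_yes, t_no = temperature[temperature_value]
    yes = F(o_yes, n_yes) * F(t_yes, n_yes) * F(n_yes, n_total)
    no = F(o_no, n_no) * F(t_no, n_no) * F(n_no, n_total)
    return yes, no

yes_score, no_score = class_scores("Sunny", "Hot")

# Normalise so the two posteriors sum to 1 (the shared denominator cancels)
total = yes_score + no_score
posterior_yes = yes_score / total
posterior_no = no_score / total
prediction = "Yes" if yes_score > no_score else "No"

print(f"Yes numerator: {float(yes_score):.4f}, No numerator: {float(no_score):.4f}")
print(f"P(Yes|Today) = {float(posterior_yes):.2f}, P(No|Today) = {float(posterior_no):.2f}")
print("Prediction:", prediction)
```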
LDA classification using distance
• Example: Personal loan
• Variables - average spend on credit card, income, CIBIL score
Single factor (single variable)
An individual's average credit card spend is 10k/month.
Mean of Acceptors - 25k; Manhattan distance - |25-10| = 15k
Mean of Non-Acceptors - 50k; Manhattan distance - |50-10| = 40k
But did we consider the standard deviation (spread)?
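To illustrate why the spread matters, the sketch below divides each distance by a group standard deviation. The means and the 10k spend come from the example; the standard deviations are hypothetical values chosen only for illustration, since the slides do not give them.

```python
spend = 10.0  # individual's average monthly spend in k, from the example

# Group means in k, from the example
means = {"acceptor": 25.0, "non-acceptor": 50.0}

# Hypothetical standard deviations in k (NOT from the slides)
std_devs = {"acceptor": 5.0, "non-acceptor": 20.0}

# Raw distance to each group mean
raw = {group: abs(m - spend) for group, m in means.items()}

# Distance measured in units of each group's standard deviation
standardised = {group: raw[group] / std_devs[group] for group in means}

nearest_raw = min(raw, key=raw.get)
nearest_standardised = min(standardised, key=standardised.get)

print("Raw distances:", raw, "-> nearest:", nearest_raw)
print("Standardised distances:", standardised, "-> nearest:", nearest_standardised)
```

With these assumed spreads the individual is 3 standard deviations from the acceptors but only 2 from the non-acceptors, so the nearest group changes once the spread is taken into account.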

Problems with this distance calculation
• Ignores the variance (or standard deviation)
• Scale dependent
• Ignores correlation
• Note: the first 2 points can be handled by scaling the data, but correlation cannot be handled that easily
• Then, what is the solution?
• The solution is statistical distance, or Mahalanobis distance (named after Prasanta Chandra Mahalanobis)
Mahalanobis distance - apart from calculating the distance across the plane, it also helps in capturing the height, or variation (altitude), of the data.

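Mahalanobis distance equation
$$D(x, \mu) = \sqrt{(x - \mu)^T S^{-1} (x - \mu)}$$
Where x is the observation, µ is the centre (centroid) of the class, and S is the covariance matrix of the predictors (the diagonal holds the variances and the off-diagonals hold the covariances of every pair of predictors).
T indicates that the vector should be transposed.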
Computing the statistical distance of a customer from the centre of the acceptors class

New customer profile
Average credit card spends  2.7
Age                         44
Income                      100

                           Non-Acceptors  Acceptors
Mean of average CC spends  1.73           3.91
Mean of Age                45.37          45.07
Mean of Income             66.24          144.75
(The Acceptors column is the centroid of acceptors.)

Covariance table for acceptors
           Income  CC spends  Age
Income     995.5
CC spends  14.21   4.39
Age        7.77    -0.06      134.07
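Computation of statistical distance
Difference = New profile - Acceptors centroid
(CC spends, Age, Income): (2.7, 44, 100) - (3.91, 45.07, 144.75) = (-1.21, -1.07, -44.75)

Inverse of the covariance matrix (order: Income, CC spends, Age)
[ 995.5   14.21    7.77 ]^-1   [  0.0011  -0.0034  -0.0001 ]
[ 14.21    4.39   -0.06 ]    = [ -0.0034   0.2388   0.0003 ]
[  7.77   -0.06  134.07 ]      [ -0.0001   0.0003   0.0075 ]

The difference vector must follow the same variable order as the covariance matrix, so for (Income, CC spends, Age) it is d = (-44.75, -1.21, -1.07):
d^T S^-1 d ≈ 2.09 (using the unrounded inverse)
Statistical distance = √2.09 ≈ 1.45
Leaving the difference in (CC spends, Age, Income) order pairs each difference with the wrong variance and gives 15.22 (distance √15.22 = 3.90) instead, so the ordering matters.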
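The computation can be reproduced with a short standard-library sketch. The profile, centroid and covariance values are taken from the tables above; the helper names `solve` and `mahalanobis` are illustrative. Note that the difference vector has to be in the same order as the covariance table (Income, CC spends, Age); with that ordering the sketch gives a squared distance of about 2.09, whereas pairing the (CC spends, Age, Income) differences with this matrix yields about 15.2.

```python
import math

def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - known) / aug[i][i]
    return z

def mahalanobis(x, centre, cov):
    """sqrt((x - centre)^T S^-1 (x - centre)) without forming S^-1 explicitly."""
    diff = [a - b for a, b in zip(x, centre)]
    z = solve(cov, diff)  # z = S^-1 @ diff
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))

# All vectors in the covariance table's order: Income, CC spends, Age
new_profile = [100.0, 2.7, 44.0]
acceptors_centroid = [144.75, 3.91, 45.07]
acceptors_cov = [
    [995.5, 14.21, 7.77],
    [14.21, 4.39, -0.06],
    [7.77, -0.06, 134.07],
]

distance = mahalanobis(new_profile, acceptors_centroid, acceptors_cov)
print(f"Squared distance: {distance ** 2:.2f}, distance: {distance:.2f}")
```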
Classification or Discriminant Score
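$$cs(x, \bar{y}) = \bar{y}^T S^{-1} x - \frac{1}{2} \bar{y}^T S^{-1} \bar{y}$$
Here x is the observation, \bar{y} is the mean vector (centroid) of the class and S is the covariance matrix.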
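A minimal sketch of how this score could be used to pick a class: compute the score against each class centroid and choose the larger one. It reuses the personal loan figures, ordered as Income, CC spends, Age. One loud assumption: the slides give a covariance table only for acceptors, so that table is used here as the shared matrix S for both classes, which is a stand-in rather than a proper pooled covariance. The helper names are illustrative.

```python
def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - known) / aug[i][i]
    return z

def classification_score(x, centroid, cov):
    """centroid^T S^-1 x - 0.5 * centroid^T S^-1 centroid (S is symmetric)."""
    w = solve(cov, centroid)  # w = S^-1 @ centroid
    linear_term = sum(wi * xi for wi, xi in zip(w, x))
    constant_term = 0.5 * sum(wi * ci for wi, ci in zip(w, centroid))
    return linear_term - constant_term

# Order: Income, CC spends, Age
new_profile = [100.0, 2.7, 44.0]
centroids = {
    "acceptor": [144.75, 3.91, 45.07],
    "non-acceptor": [66.24, 1.73, 45.37],
}
# ASSUMPTION: acceptors' covariance used as the shared S for both classes
shared_cov = [
    [995.5, 14.21, 7.77],
    [14.21, 4.39, -0.06],
    [7.77, -0.06, 134.07],
]

scores = {
    name: classification_score(new_profile, centroid, shared_cov)
    for name, centroid in centroids.items()
}
predicted = max(scores, key=scores.get)
print(scores, "->", predicted)
```

Choosing the class with the highest score is equivalent to choosing the class with the smallest Mahalanobis distance when the same S is used for every class, because the term that depends only on x is identical across classes.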
Altman Z-Score
• The Altman Z-score is the output of a credit-strength test that gauges a publicly traded manufacturing company's likelihood of bankruptcy.
• Z-Score = 1.2A + 1.4B + 3.3C + 0.6D + 1.0E
A = working capital / total assets
B = retained earnings / total assets
C = earnings before interest and tax / total assets
D = market value of equity / total liabilities
E = sales / total assets
• An Altman Z-score close to 1.8 suggests a company might be headed for bankruptcy, while a score closer to 3 suggests a company is in a solid financial position.
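The formula is itself a linear discriminant score, and can be written as a one-line function. The function name and the five input ratios below are hypothetical and do not describe a real company.

```python
def altman_z_score(a, b, c, d, e):
    """Z = 1.2A + 1.4B + 3.3C + 0.6D + 1.0E, with A..E as defined above."""
    return 1.2 * a + 1.4 * b + 3.3 * c + 0.6 * d + 1.0 * e

# Hypothetical ratios for illustration only
z = altman_z_score(
    a=0.2,   # working capital / total assets
    b=0.3,   # retained earnings / total assets
    c=0.1,   # earnings before interest and tax / total assets
    d=1.0,   # market value of equity / total liabilities
    e=1.5,   # sales / total assets
)
print(f"Altman Z-score: {z:.2f}")
```

For these made-up ratios the score is about 3.09, which falls on the financially solid side of the guidance above.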
