Discriminant Analysis

• Researchers often wish to classify people or objects
into two or more groups. One might need to classify
persons as buyers or non-buyers, or as good or bad
credit risks, or to classify products in some market
as superior, average or poor.
• DA is frequently used in Market Segmentation
research as a predictive or descriptive tool.
• DA creates an equation that minimizes the
probability of misclassifying cases into their
respective groups or categories. It is similar to
multiple regression, except that the DV is categorical.
• DA produces several outputs
– A Discriminant function
– A test of statistical significance
– Discriminant coefficients
– Percentage of correct classification
– How to classify a new entrant
• Types of DA –
– Linear (two-group) – the dependent variable has two categories,
e.g. male / female, good / bad, low / high
– Multiple – the dependent variable has three or more categories,
e.g. low / medium / high
• DV – categorical, IV – metric
• Hypothesis –
– H0: The group means of a set of independent variables are equal
across the two or more groups
– H1: The group means of a set of independent variables are not
equal across the two or more groups
– The group mean is referred to as the centroid
• Steps in DA
– Form Groups
– Estimate Discriminant Function
– Determine Significance of Function and variables
– Interpret the Discriminant Function
– Perform Classification and validation
Discriminant Distribution
[Figure: discriminant distribution; only the label "Good" survives]
Worked Out Example
• Suppose a bank wants to start a credit card division. It
wants to set up a system to screen applicants and classify
them as either 'low risk' or 'high risk' from the information
it has received.
• It obtains data on 18 clients from its apex bank, each of
whom turned out to be 'high risk' or 'low risk'.
• The data table is given below.
• Build the discriminant function (model) and find
– % of customers that it is able to classify correctly
– Statistical significance
– Which variables are relatively better at discriminating
– How to classify a new credit card applicant by building a
decision rule
Worked Out Example
Sr No   Risk (1 = Low, 2 = High)   Age   Income ('000)   Yrs Married
1       1                          35    40              8
2       1                          33    45              6
3       1                          29    36              5
4       2                          22    32              0
5       2                          26    30              1
6       1                          28    35              6
7       2                          30    31              7
8       2                          23    27              2
9       1                          32    48              6
10      2                          24    12              4
11      2                          26    15              3
12      1                          38    25              7
13      1                          40    20              5
14      2                          32    18              4
15      1                          36    24              3
16      2                          31    17              5
17      2                          28    14              3
18      1                          33    18              6
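The "build the discriminant function" step can be sketched directly from the 18 cases in the table. What follows is a minimal standard-library sketch of a two-group Fisher discriminant, not SPSS's own routine. The helper names (`mean_vector`, `pooled_within_cov`, `solve`, `dot`) are illustrative, income is kept in '000 as tabulated, and the scaling (pooled within-group variance of D equal to 1, constant chosen so that D is 0 at the grand mean) is an assumption about how unstandardized canonical coefficients are conventionally normalised.

```python
# Rows: (risk group, age, income in '000, years married); 1 = low risk, 2 = high risk.
DATA = [
    (1, 35, 40, 8), (1, 33, 45, 6), (1, 29, 36, 5), (2, 22, 32, 0),
    (2, 26, 30, 1), (1, 28, 35, 6), (2, 30, 31, 7), (2, 23, 27, 2),
    (1, 32, 48, 6), (2, 24, 12, 4), (2, 26, 15, 3), (1, 38, 25, 7),
    (1, 40, 20, 5), (2, 32, 18, 4), (1, 36, 24, 3), (2, 31, 17, 5),
    (2, 28, 14, 3), (1, 33, 18, 6),
]


def mean_vector(rows):
    """Column means of a list of equal-length rows (the group centroid of the IVs)."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def pooled_within_cov(groups):
    """Pooled within-group covariance matrix with (n - number of groups) d.o.f."""
    p = len(groups[0][0])
    dof = sum(len(g) for g in groups) - len(groups)
    s = [[0.0] * p for _ in range(p)]
    for g in groups:
        m = mean_vector(g)
        for r in g:
            d = [r[j] - m[j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    s[a][b] += d[a] * d[b]
    return [[v / dof for v in row] for row in s]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


low = [r[1:] for r in DATA if r[0] == 1]
high = [r[1:] for r in DATA if r[0] == 2]
m_low, m_high = mean_vector(low), mean_vector(high)

s_w = pooled_within_cov([low, high])
diff = [a - b for a, b in zip(m_low, m_high)]
raw = solve(s_w, diff)  # Fisher direction: S_w^-1 (mean_low - mean_high)

# Rescale so the pooled within-group variance of D is 1 (assumed convention).
raw_var = sum(raw[a] * s_w[a][b] * raw[b] for a in range(3) for b in range(3))
coeffs = [x / raw_var ** 0.5 for x in raw]
within_var = sum(coeffs[a] * s_w[a][b] * coeffs[b] for a in range(3) for b in range(3))

constant = -dot(coeffs, mean_vector(low + high))  # D = 0 at the grand mean
centroid_low = constant + dot(coeffs, m_low)
centroid_high = constant + dot(coeffs, m_high)

print("constant:", round(constant, 5))
print("coefficients (age, income in '000, yrs married):", [round(c, 5) for c in coeffs])
print("centroids (low, high):", round(centroid_low, 5), round(centroid_high, 5))
```

Because income is entered in '000 here, the income coefficient is per thousand; divide it by 1000 before comparing it with a function fitted on income in absolute units.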
SPSS
1. Analyze > Classify > Discriminant
2. Move the DV into 'Grouping Variable', then move the IVs: select
the respective variables and use the arrow button to move them
3. Select the DV and click 'Define Range'
4. Enter the values: Min as 1 and Max as 2 (we have coded Low Risk
as 1 and High Risk as 2), then click 'Continue'
5. Click on 'Statistics'
6. Click on 'ANOVA', 'Fisher's' and 'Unstandardized', then 'Continue'
7. Click on 'Classify'
8. Click on 'Summary table' and 'Leave-one-out classification',
then 'Continue'
9. Click on 'Save'
10. Click on all the options, then 'Continue'
11. Click on 'OK'
Worked Out Example
• Important System Outputs –
1. % of Customers that it is able to classify correctly
Classification matrix (discriminant analysis)
Rows: observed classification; columns: predicted classification

Group                  Percent correct   G1 predicted (P = 0.5)   G2 predicted (P = 0.5)
G1 – Low (observed)    100               9                        0
G2 – High (observed)   88.8889           1                        8
Total                  94.4444           10                       8
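The percentages in the classification matrix follow from its counts. A small sketch of that arithmetic, using only the four counts reported above (the function name `percent_correct` is illustrative):

```python
# Classification matrix counts: rows = observed group, columns = predicted group.
labels = ["G1 - Low", "G2 - High"]
matrix = [
    [9, 0],  # observed low risk: 9 predicted low, 0 predicted high
    [1, 8],  # observed high risk: 1 predicted low, 8 predicted high
]


def percent_correct(matrix):
    """Return (per-group percent correct, overall percent correct)."""
    per_group = [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]
    total = sum(sum(row) for row in matrix)
    hits = sum(matrix[i][i] for i in range(len(matrix)))
    return per_group, 100.0 * hits / total


per_group, overall = percent_correct(matrix)
for label, pct in zip(labels, per_group):
    print(f"{label}: {pct:.4f}% correct")
print(f"Overall: {overall:.4f}% correct")
```

The overall figure is the diagonal total (9 + 8 = 17) over all 18 cases.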
SPSS
• The equation (based on the unstandardized canonical
discriminant function coefficients only) is –
• D = -10.00335 + 0.24560 (AGE) + 0.00008 (INCOME) +
0.08465 (YRS MARRIED)
• Age is the most discriminating variable (based on the
standardized canonical discriminant function
coefficients only)
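As a check on the equation, it can be applied to the 18 cases of the worked example and the predictions cross-tabulated against the observed groups. This is a sketch under two assumptions that the deck does not state outright: income enters the equation in absolute units (the table lists it in '000, and the deck's own example substitutes 25000), and the cut-off is D = 0, the midpoint of the two group means when both groups have nine cases and equal priors.

```python
# Rows: (sr no, risk group, age, income in '000, years married); 1 = low, 2 = high.
CASES = [
    (1, 1, 35, 40, 8), (2, 1, 33, 45, 6), (3, 1, 29, 36, 5), (4, 2, 22, 32, 0),
    (5, 2, 26, 30, 1), (6, 1, 28, 35, 6), (7, 2, 30, 31, 7), (8, 2, 23, 27, 2),
    (9, 1, 32, 48, 6), (10, 2, 24, 12, 4), (11, 2, 26, 15, 3), (12, 1, 38, 25, 7),
    (13, 1, 40, 20, 5), (14, 2, 32, 18, 4), (15, 1, 36, 24, 3), (16, 2, 31, 17, 5),
    (17, 2, 28, 14, 3), (18, 1, 33, 18, 6),
]


def discriminant_score(age, income, yrs_married):
    """Unstandardized function from the deck; income in absolute units (assumed)."""
    return -10.00335 + 0.24560 * age + 0.00008 * income + 0.08465 * yrs_married


CUTOFF = 0.0  # assumed: midpoint of the group means 1.37793 and -1.37793

# table[observed - 1][predicted - 1] counts cases; group 1 = low, group 2 = high.
table = [[0, 0], [0, 0]]
misclassified = []
for sr_no, observed, age, income_k, yrs in CASES:
    score = discriminant_score(age, income_k * 1000, yrs)
    predicted = 1 if score > CUTOFF else 2
    table[observed - 1][predicted - 1] += 1
    if predicted != observed:
        misclassified.append(sr_no)

print("observed low :", table[0])
print("observed high:", table[1])
print("misclassified Sr No:", misclassified)
```

With these rounded coefficients the counts come out as 9 and 0 for observed low risk and 1 and 8 for observed high risk, the same as the classification matrix earlier in the deck; the single misclassified case is Sr No 7.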
SPSS
• Wilks' Lambda = Within S.S. (sum of squares) / Total S.S.
= 0.319
• Wilks' Lambda lies between 0 and 1. A value closer to 0
indicates better discriminating power
• If the model is good, the within S.S. should be as small as
possible
• Since the Sig. value is 0.001, the confidence level is
(1 – 0.001) × 100 = 99.9%
• The model is a good fit
SPSS
• Eigenvalue = Between S.S. (sum of squares) /
Within S.S. = 2.136
• If the model is good, the eigenvalue should be greater than 1,
i.e. Between S.S. > Within S.S.
• The higher the value, the better
• Canonical Correlation = square root of (Between S.S. / Total S.S.)
• Any value > 0.5 – accept the model
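The three measures are tied together by Total S.S. = Between S.S. + Within S.S. A short sketch of that relationship, assuming a single discriminant function (the two-group case of this example) and starting from the reported eigenvalue only:

```python
import math

eigenvalue = 2.136  # Between S.S. / Within S.S., as reported in the SPSS output

# Dividing Within and Between by Total = Between + Within gives:
wilks_lambda = 1.0 / (1.0 + eigenvalue)                               # Within / Total
canonical_correlation = math.sqrt(eigenvalue / (1.0 + eigenvalue))    # sqrt(Between / Total)

print(f"Wilks' Lambda         = {wilks_lambda:.3f}")
print(f"Canonical correlation = {canonical_correlation:.3f}")
```

The Wilks' Lambda recovered this way is about 0.319, in line with the reported value, and the canonical correlation is about 0.825, above the 0.5 rule of thumb.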
SPSS
• All independent variables are useful
• Age – (1 – 0.001) × 100 = 99.9% confidence level
• Income – (1 – 0.034) × 100 = 96.6% C.L.
• Yrs Married – (1 – 0.008) × 100 = 99.2% C.L.
• However, Age is the most discriminating factor (higher F value,
lower Wilks' Lambda and lower Sig. value)
Worked Out Example
4. How to classify a new credit card applicant by
building a decision rule
• System Output –
GROUP MEANS
G1 (LOW RISK) 1.37793
G2 (HIGH RISK) -1.37793

High Risk                    Low Risk
-1.37793         0           1.37793
Worked Out Example
4. How to classify a new credit card applicant by
building a decision rule
• D = -10.00335 + 0.24560 (AGE) + 0.00008
(INCOME) + 0.08465 (YRS MARRIED)
• e.g. Age – 40, Income – 25000, Yrs Married – 15.
Substituting in the equation above, D = ? Where
does it fit: Low Risk or High Risk?

High Risk                    Low Risk
-1.37793         0           1.37793
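The substitution can be worked through as below. This is a sketch of the decision rule, assuming the cut-off is D = 0 (the midpoint of the two group means) and that income is entered as 25000, exactly as the example states; the function name `discriminant_score` is illustrative.

```python
def discriminant_score(age, income, yrs_married):
    """Unstandardized discriminant function from the deck."""
    return -10.00335 + 0.24560 * age + 0.00008 * income + 0.08465 * yrs_married


CUTOFF = 0.0  # assumed: midpoint between the group means 1.37793 and -1.37793

score = discriminant_score(age=40, income=25000, yrs_married=15)
group = "Low Risk" if score > CUTOFF else "High Risk"
print(f"D = {score:.4f} -> {group}")
```

Under these assumptions D comes to about 3.09, which lies on the Low Risk side of the cut-off.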
Summary
• Look at
• % of customers that it is able to classify correctly – at least > 80,
ideally close to 100
• Unstandardized coefficients (constant included) – for forming
the equation
• Standardized coefficients – for understanding the discriminating
power of the IVs
• Wilks' Lambda – lies between 0 and 1; a value closer to 0
indicates better discriminating power
• Eigenvalue – if the model is good, the eigenvalue should be greater
than 1; the higher the value, the better
• Canonical Correlation – any value > 0.5 – accept the model
• Significance value / confidence level – very good at 99%,
where confidence level = (1 – Sig. value). That means the
discrimination between the two groups is highly significant
Similarities and Differences
No.   Analysis     ANOVA         Regression   DA            Remarks
1     No. of DVs   1             1            1             Similarity
2     No. of IVs   Multiple      Multiple     Multiple      Similarity
3     DVs          Metric        Metric       Categorical   Difference
4     IVs          Categorical   Metric       Metric        Difference
Practice Example
• A retail outlet wants to understand consumers' purchase
behaviour for products in two categories, National Brand
(Brand 1) and Local Brand (Brand 2), which would help it
place orders according to customer demand and requirements.
The outlet uses data on 20 customers, covering Annual Income
and Household Size, from a retail outlet in another location
to arrive at a decision about customers visiting its own
store. It wants to use Discriminant Analysis to screen the
responsiveness of customers towards the national brand
(Brand 1) and the local brand (Brand 2), and to find out the
following based on SPSS outputs –
Practice Example
1. The percentage of customers that it is able to
classify correctly.
2. Statistical significance of the discriminant
function.
3. Which variables are relatively better at
discriminating between consumers of the national and
local brands.
4. The discriminant function and the classification of new
customers into one of the two groups, namely
national and local brand acceptors.