Discriminant Analysis

• Researchers often wish to classify people or objects into two or more
groups. One might need to classify persons as buyers or non-buyers, or
as good or bad credit risks, or to classify products in some market as
superior, average or poor.
• DA is frequently used in market segmentation research as a predictive
or descriptive tool.
• DA creates an equation that minimizes the possibility of
misclassifying cases into their respective groups or categories. It is
similar to multiple regression, but the dependent variable (DV) is
categorical.
Discriminant Analysis
• DA produces several outputs
– A Discriminant function
– A test of statistical significance
– Discriminant coefficients
– Percentage of correct classification
– A rule for classifying a new entrant
• Types of DA
– Linear: the dependent variable has two groups, e.g. male / female,
good / bad, low / high
– Multiple: the dependent variable has three or more groups, e.g. low /
medium / high
• DV: categorical; IVs: metric
Discriminant Analysis
• Hypothesis –
– Ho: The group means of a set of independent variables for two or more
groups are equal
– H1: The group means of a set of independent variables for two or more
groups are not equal
– The group mean is referred to as the centroid
• Steps in DA
– Form Groups
– Estimate Discriminant Function
– Determine Significance of Function and variables
– Interpret the Discriminant Function
– Perform Classification and validation
Discriminant Distribution

[Figure: discriminant distribution; image not recoverable, surviving label: "Good"]
Worked Out Example
• Suppose a bank wants to start a credit card division. It wants to set
up a system to screen applicants and classify them as either ‘low risk’
or ‘high risk’ based on the information it has received.
• It obtains data on 18 clients from its apex bank, each of whom turned
out to be either ‘high risk’ or ‘low risk’.
• The data table is given on the next slide.
• Build the discriminant function (model) and find
– % of customers that it is able to classify correctly
– Statistical significance
– Which variables are relatively better at discriminating
– How to classify a new credit card applicant by building a decision
rule
Worked Out Example
Sr No   Risk (1 = Low, 2 = High)   Age   Income (in ‘000)   Yrs Married
1       1                          35    40                 8
2       1                          33    45                 6
3       1                          29    36                 5
4       2                          22    32                 0
5       2                          26    30                 1
6       1                          28    35                 6
7       2                          30    31                 7
8       2                          23    27                 2
9       1                          32    48                 6
10      2                          24    12                 4
11      2                          26    15                 3
12      1                          38    25                 7
13      1                          40    20                 5
14      2                          32    18                 4
15      1                          36    24                 3
16      2                          31    17                 5
17      2                          28    14                 3
18      1                          33    18                 6
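Before running SPSS, it can help to look at the raw group means (the inputs to the centroids mentioned earlier). The following is a minimal sketch in plain Python, assuming the risk coding 1 = low risk and 2 = high risk used on the SPSS slides; the names CLIENTS and group_means are illustrative only.

```python
# Data from the table above: (risk, age, income in '000, years married).
# Assumed coding, taken from the SPSS slides: 1 = low risk, 2 = high risk.
CLIENTS = [
    (1, 35, 40, 8), (1, 33, 45, 6), (1, 29, 36, 5), (2, 22, 32, 0),
    (2, 26, 30, 1), (1, 28, 35, 6), (2, 30, 31, 7), (2, 23, 27, 2),
    (1, 32, 48, 6), (2, 24, 12, 4), (2, 26, 15, 3), (1, 38, 25, 7),
    (1, 40, 20, 5), (2, 32, 18, 4), (1, 36, 24, 3), (2, 31, 17, 5),
    (2, 28, 14, 3), (1, 33, 18, 6),
]


def group_means(rows, group):
    """Return the mean (age, income, years married) for one risk group."""
    members = [row[1:] for row in rows if row[0] == group]
    count = len(members)
    # zip(*members) turns the list of rows into one tuple per variable.
    return tuple(sum(column) / count for column in zip(*members))


low_means = group_means(CLIENTS, 1)
high_means = group_means(CLIENTS, 2)
print("Low risk :", [round(m, 2) for m in low_means])
print("High risk:", [round(m, 2) for m in high_means])
```

The low-risk clients are, on average, older, better paid and married for longer, which is the kind of separation DA tries to capture in a single score.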
SPSS

1. Analyze
2. Classify
3. Discriminant
SPSS

1. Move the DV into Grouping Variable
2. Move the IVs into the Independents list

Select the respective variables and use the arrow button shown in the
screenshot to move them
SPSS

1. Select DV

2. Click ‘Define Range’

SPSS

Enter the values: Min as 1 and Max as 2
(we have coded Low Risk as 1 and High Risk as 2)

Continue
SPSS

Click on Statistics

SPSS

Click on ANOVA, Fisher’s & Unstandardized

Continue
SPSS

Click on ‘Classify’

SPSS

Click on ‘Summary Table’ & ‘Leave-one-out Classification’
& then ‘Continue’
SPSS

Click on ‘Save’

SPSS

Click on all the options & Continue
SPSS

Click on ‘OK’
Worked Out Example
• Important system outputs
1. % of customers that it is able to classify correctly

CLASSIFICATION MATRIX (discriminant analysis)
Rows: observed classification; columns: predicted classification

Group                  Percent correct   G1 (predicted, P = 0.5)   G2 (predicted, P = 0.5)
G1 – Low (observed)    100.0000          9                         0
G2 – High (observed)   88.8889           1                         8
Total                  94.4444           10                        8
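The percentages in the classification matrix follow directly from its counts. The sketch below recomputes them in plain Python; the variable names are illustrative, and the counts are copied from the matrix above.

```python
# Counts from the classification matrix: [predicted G1, predicted G2].
observed_low = [9, 0]   # G1, observed low risk
observed_high = [1, 8]  # G2, observed high risk


def percent(part, whole):
    """Return part / whole as a percentage."""
    return 100.0 * part / whole


# Correct cases sit on the diagonal: low predicted low, high predicted high.
pct_low = percent(observed_low[0], sum(observed_low))
pct_high = percent(observed_high[1], sum(observed_high))
pct_overall = percent(observed_low[0] + observed_high[1],
                      sum(observed_low) + sum(observed_high))

print(round(pct_low, 4), round(pct_high, 4), round(pct_overall, 4))
```

The overall figure is 17 correctly classified clients out of 18.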
SPSS

• The equation (based on the unstandardized canonical discriminant
function coefficients only) is
D = -10.00335 + 0.24560 (AGE) + 0.00008 (INCOME) + 0.08465 (YRS MARRIED)
• Age is the most discriminating variable (based on the standardized
canonical discriminant function coefficients only)
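As a rough check, the equation can be applied to the 18 clients in the data table. The sketch below makes two assumptions that are not stated on this slide: income is entered in absolute units (the table value multiplied by 1000, in line with the later example that substitutes Income = 25000), and a client is predicted low risk when D is above 0, the midpoint of the two group means reported later. Under those assumptions the counts match the classification matrix.

```python
# Unstandardized coefficients copied from the equation above.
B0, B_AGE, B_INCOME, B_YRS = -10.00335, 0.24560, 0.00008, 0.08465

# (risk, age, income in '000, years married); 1 = low risk, 2 = high risk.
CLIENTS = [
    (1, 35, 40, 8), (1, 33, 45, 6), (1, 29, 36, 5), (2, 22, 32, 0),
    (2, 26, 30, 1), (1, 28, 35, 6), (2, 30, 31, 7), (2, 23, 27, 2),
    (1, 32, 48, 6), (2, 24, 12, 4), (2, 26, 15, 3), (1, 38, 25, 7),
    (1, 40, 20, 5), (2, 32, 18, 4), (1, 36, 24, 3), (2, 31, 17, 5),
    (2, 28, 14, 3), (1, 33, 18, 6),
]


def discriminant_score(age, income, yrs_married):
    """Evaluate D for one applicant; income is in absolute units (assumed)."""
    return B0 + B_AGE * age + B_INCOME * income + B_YRS * yrs_married


def predict(age, income, yrs_married, cutoff=0.0):
    """Return 1 (low risk) if D exceeds the assumed cut-off, else 2."""
    return 1 if discriminant_score(age, income, yrs_married) > cutoff else 2


# Tally (observed, predicted) pairs, i.e. rebuild the classification matrix.
counts = {(1, 1): 0, (1, 2): 0, (2, 1): 0, (2, 2): 0}
for risk, age, income_k, yrs in CLIENTS:
    counts[(risk, predict(age, income_k * 1000, yrs))] += 1

print(counts)
```

The single misclassified case is client 7, who is observed as high risk but scores slightly above 0.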
SPSS

• Wilks’ Lambda = Within S.S. / Total S.S. = 0.319 (S.S. = sum of squares)
• Wilks’ Lambda lies between 0 and 1. A value closer to 0 indicates
better discriminating power
• If the model is good, the within S.S. should be as small as possible
• Since the Sig. value is 0.001, the confidence level is
(1 – 0.001) × 100 = 99.9%
• The model is a good fit
SPSS

• Eigenvalue = Between S.S. / Within S.S. = 2.136 (S.S. = sum of squares)
• If the model is good, the eigenvalue should be greater than 1, i.e.
Between S.S. > Within S.S.
• The higher the value, the better
• Canonical correlation = square root of (Between S.S. / Total S.S.)
• Any value > 0.5: accept the model
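The three statistics are linked. Assuming Total S.S. = Between S.S. + Within S.S. and a single discriminant function (as in this two-group example), Wilks' Lambda and the canonical correlation can both be derived from the eigenvalue. The sketch below checks this against the reported values; the variable names are illustrative.

```python
import math

eigenvalue = 2.136  # Between S.S. / Within S.S., from the SPSS output

# Within / Total = Within / (Between + Within) = 1 / (1 + eigenvalue)
wilks_lambda = 1.0 / (1.0 + eigenvalue)

# sqrt(Between / Total) = sqrt(eigenvalue / (1 + eigenvalue))
canonical_correlation = math.sqrt(eigenvalue / (1.0 + eigenvalue))

print(round(wilks_lambda, 3))           # agrees with the reported 0.319
print(round(canonical_correlation, 4))
```

The derived canonical correlation is well above the 0.5 threshold quoted above.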
SPSS

• All independent variables are useful
• Age: Sig. 0.001, confidence level (1 – 0.001) × 100 = 99.9%
• Income: Sig. 0.034, confidence level (1 – 0.034) × 100 = 96.6%
• Yrs Married: Sig. 0.008, confidence level (1 – 0.008) × 100 = 99.2%
• However, Age is the most discriminating factor (higher F value, lower
Wilks’ Lambda and lower Sig. value)
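The summary slide defines the confidence level as (1 - Sig. value). Expressed as a percentage, that gives the figures below for the three variables. This is a small sketch using the Sig. values quoted on this slide; the names are illustrative.

```python
# Sig. values for each independent variable, as quoted on this slide.
sig_values = {"Age": 0.001, "Income": 0.034, "Yrs Married": 0.008}


def confidence_level(sig):
    """Confidence level in percent, defined as (1 - Sig. value) * 100."""
    return (1.0 - sig) * 100.0


levels = {name: confidence_level(sig) for name, sig in sig_values.items()}
for name, level in levels.items():
    print(f"{name}: {level:.1f}%")
```

All three variables clear the 95% level, and Age has the smallest Sig. value.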
Worked Out Example
4. How to classify a new credit card applicant by building a decision rule
• System output

GROUP MEANS
G1 (LOW RISK)    1.37793
G2 (HIGH RISK)  -1.37793

High Risk                 Low Risk
-1.37793        0         1.37793
Worked Out Example
4. How to classify a new credit card applicant by building a decision rule
• D = -10.00335 + 0.24560 (AGE) + 0.00008 (INCOME) + 0.08465 (YRS MARRIED)
• e.g. Age = 40, Income = 25000, Yrs Married = 15. Substituting these
values in the equation above, D = ? Where does it fit: Low Risk or High Risk?

High Risk                 Low Risk
-1.37793        0         1.37793
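The question above can be worked through in a few lines. The sketch below assumes the decision rule implied by the number line: the cut-off is the midpoint of the two group means, and scores above it are classified as low risk. Function and variable names are illustrative.

```python
# Group means (centroids) from the system output.
CENTROID_LOW, CENTROID_HIGH = 1.37793, -1.37793


def discriminant_score(age, income, yrs_married):
    """Evaluate the unstandardized discriminant function D."""
    return -10.00335 + 0.24560 * age + 0.00008 * income + 0.08465 * yrs_married


def classify(score):
    """Assumed rule: compare D with the midpoint of the two centroids."""
    cutoff = (CENTROID_LOW + CENTROID_HIGH) / 2.0  # 0 for these centroids
    return "Low Risk" if score > cutoff else "High Risk"


d = discriminant_score(age=40, income=25000, yrs_married=15)
label = classify(d)
print(round(d, 4), label)
```

For this applicant D is about 3.09, which lies to the right of 0 and beyond the low-risk group mean, so the applicant falls on the low-risk side.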
Summary
• Look at
• % of customers that it is able to classify correctly: at least > 80,
ideally close to 100
• Unstandardized coefficients (constant included): for forming the
equation
• Standardized coefficients: for understanding the discriminating power
of the IVs
• Wilks’ Lambda: lies between 0 and 1; a value closer to 0 indicates
better discriminating power
• Eigenvalue: if the model is good, the eigenvalue should be greater
than 1; the higher the value, the better
• Canonical correlation: any value > 0.5, accept the model
• Significance value / confidence level: very good at 99%, where the
confidence level is (1 – Sig. value) × 100. That means the
discrimination between the two groups is highly significant
Similarities and Differences
No.   Analysis     ANOVA         Regression   DA            Remarks
1     No. of DVs   1             1            1             Similarity
2     No. of IVs   Multiple      Multiple     Multiple      Similarity
3     DVs          Metric        Metric       Categorical   Difference
4     IVs          Categorical   Metric       Metric        Difference
Practice Example
• A retail outlet wants to understand consumers’ purchase behaviour for
products in two categories, National Brand (Brand 1) and Local Brand
(Brand 2), which would help it place orders according to customer demand
and requirements. The outlet uses data on 20 customers (annual income
and household size) from a retail outlet in another location to arrive
at a decision about the customers visiting its own store. It wants to
use Discriminant Analysis to screen the responsiveness of customers
towards the national brand (Brand 1) and the local brand (Brand 2), and
to find out the following from the SPSS outputs:
Practice Example
1. The percentage of customers that it is able to classify correctly.
2. The statistical significance of the discriminant function.
3. Which variables are relatively better at discriminating between
consumers of the national and local brands.
4. The discriminant function and the classification of new customers
into one of the two groups, namely national and local brand acceptors.