
Linear Discriminant Analysis

HOW DO WE FIND OUT -

• What psychographic characteristics help differentiate between price-sensitive and
non-price-sensitive buyers of groceries?

• What are the drivers of consumer response to email solicitations?

Answer…

Discriminant Analysis

DISCRIMINANT ANALYSIS – BASIC CONCEPT

• What is discriminant analysis?

– A statistical technique that identifies the set of variables that can be used to
discriminate between, or classify, objects (individuals, respondents, firms, etc.)
belonging to two or more naturally occurring groups.

• What is a discriminant function?

– A linear combination of the independent variables that best discriminates
between the categories of the dependent variable.

D = b0 + b1 X1 + b2 X2 + … + bk Xk + … + bn Xn

where D : discriminant score
b0 : constant (intercept)
bk : discriminant coefficients or weights
Xk : predictors or independent variables
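
As a minimal sketch of how a discriminant score is obtained from the function above, the Python snippet below evaluates D for a single case. The function name `discriminant_score` and all coefficient and predictor values are hypothetical, chosen only for illustration; they are not estimated from any data.

```python
def discriminant_score(b0, weights, predictors):
    """Return D = b0 + b1*X1 + ... + bn*Xn for one case."""
    if len(weights) != len(predictors):
        raise ValueError("need exactly one weight per predictor")
    return b0 + sum(b * x for b, x in zip(weights, predictors))


# Hypothetical constant and weights (illustrative, not fitted to data)
b0 = -1.0
weights = [0.5, 0.25]

# Hypothetical predictor values X1, X2 for one respondent
case = [4.0, 2.0]

score = discriminant_score(b0, weights, case)
print(score)
```

In practice the weights would come from estimating the discriminant function on the analysis sample rather than being typed in by hand.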

DISCRIMINANT FUNCTION

[Figure: Group 1 and Group 2, separated by the best discriminant function]

If we want to draw a line to separate out the two groups here – what do we do?

Linear Discriminant Analysis helps in creating an axis that maximizes the separability
between categories

The new axis has been able to do a good job of separating out the two categories

How is this achieved?

There are two criteria used to find the ‘best’ line


1. Maximize the distance (d²)
The algorithm attempts to maximize the distance between the means of the two categories.
2. Minimize the variation within the categories, also known as the scatter
The algorithm attempts to minimize the variation (scatter) within each category.
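
The two criteria can be combined into a single ratio: the squared distance between the projected group means divided by the sum of the projected within-group scatters. The sketch below, in plain Python for two predictors, computes the direction that maximizes this ratio using the standard two-group result w ∝ Sw⁻¹(m1 − m2). The data points and the helper names (`mean`, `scatter`, `fisher_ratio`, `fisher_direction`) are made up for illustration.

```python
def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def scatter(points, m):
    """2x2 scatter matrix of one group: sum of (x - m)(x - m)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_ratio(w, g1, g2):
    """(distance between projected means)^2 / (sum of projected scatters)."""
    z1 = [w[0] * p[0] + w[1] * p[1] for p in g1]
    z2 = [w[0] * p[0] + w[1] * p[1] for p in g2]
    m1, m2 = sum(z1) / len(z1), sum(z2) / len(z2)
    s1 = sum((z - m1) ** 2 for z in z1)
    s2 = sum((z - m2) ** 2 for z in z2)
    return (m1 - m2) ** 2 / (s1 + s2)


def fisher_direction(g1, g2):
    """Direction proportional to Sw^-1 (m1 - m2), Sw = pooled scatter."""
    m1, m2 = mean(g1), mean(g2)
    a, b = scatter(g1, m1), scatter(g2, m2)
    sw = [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    dm = [m1[0] - m2[0], m1[1] - m2[1]]
    # Apply the inverse of the 2x2 matrix sw to dm
    return [(sw[1][1] * dm[0] - sw[0][1] * dm[1]) / det,
            (-sw[1][0] * dm[0] + sw[0][0] * dm[1]) / det]


# Made-up observations (x1, x2) for two groups
group1 = [(1, 2), (2, 3), (3, 3), (4, 5)]
group2 = [(4, 1), (5, 2), (6, 2), (7, 4)]

w = fisher_direction(group1, group2)
best = fisher_ratio(w, group1, group2)
along_x1 = fisher_ratio([1, 0], group1, group2)
along_x2 = fisher_ratio([0, 1], group1, group2)
print(w, best, along_x1, along_x2)
```

Projecting onto either original axis gives a smaller ratio than projecting onto `w`, which is the sense in which the new axis separates the categories better.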

How about LDA for 3 categories?

[Figure: three categories plotted on axes x1 and x2]

DATA TYPE

• Dependent variables – Non-metric

• Independent variables – Metric

Note: Non-metric = Qualitative / Categorical / Grouping variable; Metric = Quantitative

Dependent variable = Criterion / Grouping variable
Independent variable = Predictor / Discriminating variable

TERMINOLOGIES

• Analysis sample: The part of the total sample used to estimate the discriminant
function. Usually, 75% of the total sample constitutes the analysis sample. Also called
the estimation sample.

• Holdout sample: The part of the total sample used to check the results obtained from
the analysis sample. Usually, 25% of the total sample constitutes the holdout sample.
Also called the validation sample.

• Discriminant score: The predicted score obtained by plugging the values of the
independent variables into the discriminant function.

• Centroid: The mean of the discriminant scores for a particular group. There
are as many centroids as there are groups.
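
A minimal sketch of two of these terms, using only the Python standard library: a random 75/25 split into analysis and holdout samples, and group centroids computed as the mean discriminant score per group. The helper names `split_sample` and `centroids`, the case IDs, and the scores are all hypothetical.

```python
import random


def split_sample(cases, analysis_share=0.75, seed=0):
    """Randomly split cases into (analysis sample, holdout sample)."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * analysis_share))
    return shuffled[:cut], shuffled[cut:]


def centroids(scores, groups):
    """Mean discriminant score for each group."""
    totals, counts = {}, {}
    for s, g in zip(scores, groups):
        totals[g] = totals.get(g, 0.0) + s
        counts[g] = counts.get(g, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}


# 20 hypothetical case IDs: 15 go to the analysis sample, 5 to the holdout sample
cases = list(range(20))
analysis, holdout = split_sample(cases)

# Hypothetical discriminant scores and group memberships for four cases
scores = [-1.5, -0.5, 1.0, 2.0]
groups = ["A", "A", "B", "B"]
group_centroids = centroids(scores, groups)
print(len(analysis), len(holdout), group_centroids)
```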

TERMINOLOGIES

• F value: For each variable, the ratio of between-group variation to within-group
variation, i.e. the between-group sum of squares over the within-group sum of squares,
each divided by its degrees of freedom.

• Wilks’ Lambda: The ratio of the within-group sum of squares to the total sum of
squares for the entire set of variables in the analysis. Wilks’ Lambda varies between
0 and 1. Also called the U statistic.

• Classification matrix: A matrix containing the numbers of correctly classified and
misclassified cases.

• Hit ratio: The percentage of cases correctly classified by the discriminant function.
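
The sketch below illustrates three of these quantities in plain Python: a classification matrix, the hit ratio derived from it, and Wilks' Lambda for a single variable (within-group sum of squares over total sum of squares). The function names, group labels, predictions, and values are hypothetical and only meant to show the arithmetic.

```python
def classification_matrix(actual, predicted, labels):
    """Counts of cases, indexed as matrix[actual group][predicted group]."""
    matrix = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def hit_ratio(matrix):
    """Percentage of cases on the diagonal (correctly classified)."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[g][g] for g in matrix)
    return 100.0 * correct / total


def wilks_lambda(values_by_group):
    """Single-variable Wilks' Lambda: within SS / total SS."""
    all_values = [v for vs in values_by_group.values() for v in vs]
    grand_mean = sum(all_values) / len(all_values)
    total_ss = sum((v - grand_mean) ** 2 for v in all_values)
    within_ss = 0.0
    for vs in values_by_group.values():
        group_mean = sum(vs) / len(vs)
        within_ss += sum((v - group_mean) ** 2 for v in vs)
    return within_ss / total_ss


# Hypothetical actual and predicted groups for eight holdout cases
actual = ["A", "A", "A", "A", "B", "B", "B", "B"]
predicted = ["A", "A", "A", "B", "B", "B", "A", "B"]
matrix = classification_matrix(actual, predicted, ["A", "B"])
hit = hit_ratio(matrix)

# Hypothetical values of one predictor in each group
lam = wilks_lambda({"A": [1, 2, 3], "B": [5, 6, 7]})
print(matrix, hit, lam)
```

A Lambda close to 0 means most of the variation lies between groups, so the variable discriminates well; a value close to 1 means the group means barely differ.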

ASSUMPTIONS

• Groups must be mutually exclusive, with every case belonging to only one group.

• The ratio of predictor variables to sample size (base) should be at least 1:5 to 1:10,
i.e. 5 to 10 cases per predictor.

• Independent variables should follow a multivariate normal distribution.

• Independent variables are linearly related.

• Homogeneity of covariance/correlation matrices across groups.

Simulator

Thank you!
