Introduction
• Discriminant Analysis is a technique for analyzing data when the dependent variable
is categorical and the independent variables are metric. For example, the dependent
variable may be loyalty, with two groups (brand switchers and brand loyals, coded as 0
and 1), or firm type, where a firm is classified as either a business group or stand-alone.
• Categorical means any non-metric data (nominal or ordinal), while metric means
interval or ratio scale data.
Group 1: Would purchase
Respondent    Variable 1    Variable 2    Variable 3
1             8             9             6
2             6             7             5
3             10            6             3
4             9             4             4
5             4             8             2
Group Mean    7.4           6.8           4.0

Group 2: Would not purchase
Respondent    Variable 1    Variable 2    Variable 3
6             5             4             7
7             3             7             2
8             4             5             5
9             2             4             3
10            2             2             2
Group Mean    3.2           4.4           3.8
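The group means in the table can be checked with a few lines of Python. This is a minimal sketch: the names `would_purchase`, `would_not_purchase` and `group_means` are illustrative, and the three variables are left unnamed because the notes do not name them.

```python
# Ratings of the ten respondents in the table (three variables per respondent).
would_purchase = [(8, 9, 6), (6, 7, 5), (10, 6, 3), (9, 4, 4), (4, 8, 2)]
would_not_purchase = [(5, 4, 7), (3, 7, 2), (4, 5, 5), (2, 4, 3), (2, 2, 2)]


def group_means(rows):
    """Return the mean of each column, rounded to one decimal place."""
    n = len(rows)
    return [round(sum(col) / n, 1) for col in zip(*rows)]


means_1 = group_means(would_purchase)
means_2 = group_means(would_not_purchase)

# Difference between the two group means on each variable.
differences = [round(a - b, 1) for a, b in zip(means_1, means_2)]

print("Group 1 means:", means_1)
print("Group 2 means:", means_2)
print("Differences:  ", differences)
```

The first variable shows the largest gap between the two group means and the third shows almost none. Mean differences alone do not establish discriminating power, which also depends on the spread within each group.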
Criterion variable: This is the dependent variable, also called the grouping variable.
An n-group discriminant analysis yields at most (n-1) discriminant functions (more
precisely, the smaller of n-1 and the number of independent variables). For example, if the
dependent variable has three groups (switchers, undecided and loyal), there will be two
discriminant functions, provided there are at least two independent variables. Each
respondent then has two discriminant scores, one for each discriminant function.
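The counting rule can be written as a one-line check. This is a minimal sketch: the function name `n_discriminant_functions` is illustrative, and it assumes the usual bound that the number of functions cannot exceed either the number of groups minus one or the number of independent variables.

```python
def n_discriminant_functions(n_groups, n_predictors):
    """Maximum number of discriminant functions that can be extracted."""
    if n_groups < 2 or n_predictors < 1:
        raise ValueError("need at least two groups and one independent variable")
    return min(n_groups - 1, n_predictors)


# Three groups (switchers, undecided, loyal) with five independent variables.
print(n_discriminant_functions(3, 5))
# Two groups always give a single function.
print(n_discriminant_functions(2, 5))
```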
Assumptions of Discriminant Analysis
1) Multivariate normality of the independent variables
3) No multicollinearity
5) Absence of outliers
Differences Between Multiple Linear Regression and Discriminant Analysis
Multiple Linear Regression                      Discriminant Analysis
The independent variables can be continuous     The independent variables should be
or categorical                                  continuous
The dependent variable is always continuous     The dependent variable is always categorical
Only one group, so no question of unequal       Groups should primarily be of equal sizes
group sizes
Objective is to reduce Type II error            Objective is to reduce Type II error
Continuous variables: interval / ratio variables
Categorical variables: nominal / ordinal variables
Similarities and Differences between ANOVA, Regression,
and Discriminant Analysis
                                        ANOVA          Regression     Discriminant Analysis
Similarities
• Number of dependent variables         One            One            One
• Number of independent variables       Multiple       Multiple       Multiple
Differences
• Nature of the dependent variable      Metric         Metric         Categorical
• Nature of the independent variables   Categorical    Metric         Metric
Concept of Centroid
The discriminant score, also called the DA score, is the value obtained by applying
a discriminant function formula to the data for a given case. It is a standardized
value (a Z score) and is computed for every object in the study.
By averaging the discriminant scores for all the individuals within a
particular group, we arrive at a group mean or centroid. When there
are two groups we have two centroids and when there are three groups
we have three centroids.
The centroids indicate the most typical location of any member of a
particular group, and a comparison of the centroids shows how far apart
the groups are in terms of the discriminant function.
For example, with the discriminant function
Discriminant Score (L) = -1.505 + 0.137 (Avg tenure) + 0.043 (Size of the firm)
the discriminant score for each respondent is calculated by substituting
that respondent's values for average tenure and size of the firm.
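The scoring and averaging steps can be sketched in Python. The coefficients are the ones quoted in the function above. The four cases and the group labels `group_0` and `group_1` are hypothetical values made up for illustration, not data from these notes.

```python
from collections import defaultdict


def discriminant_score(avg_tenure, firm_size):
    """Apply the discriminant function quoted in the notes to one case."""
    return -1.505 + 0.137 * avg_tenure + 0.043 * firm_size


# Hypothetical cases: (group label, average tenure, size of the firm).
cases = [
    ("group_0", 4.0, 10.0),
    ("group_0", 6.0, 12.0),
    ("group_1", 9.0, 20.0),
    ("group_1", 11.0, 25.0),
]

# Collect the discriminant score of every case under its group.
scores_by_group = defaultdict(list)
for group, tenure, size in cases:
    scores_by_group[group].append(discriminant_score(tenure, size))

# A centroid is the mean discriminant score of the cases in a group.
centroids = {g: sum(s) / len(s) for g, s in scores_by_group.items()}

for group, centroid in sorted(centroids.items()):
    print(f"{group}: centroid = {centroid:.3f}")
```

The further apart the two printed centroids are, the better the function separates the groups on this made-up data.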
Steps of Discriminant Analysis (DA)
• Wilks' lambda is the ratio of the within-group sum of squares to the total sum of squares.
In this example, about 91.4% of the variance is not explained by group differences (that is,
lambda is about 0.914).
• A lambda of 1 occurs when the observed group means are equal, while a small lambda indicates
that the group means appear to differ.
• Wilks' lambda is used to test the null hypothesis that the means of all of the discriminating
variables are equal across groups of the dependent variable. If the means of the independent
variables are equal for all groups, the means will not be a useful basis for predicting the
group to which a case belongs, and hence there is no relationship between the
discriminating variables and the dependent variable.
2) If the chi-square test corresponding to Wilks' lambda is significant, the
individual independent (discriminating) variables are assessed to see which differ
significantly in mean by group, and these are used to classify the dependent variable.
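For a single variable, the ratio described above can be computed directly. This is a minimal sketch with an illustrative function name. It covers only the one-variable case, whereas the multivariate statistic is the ratio of the determinants of the within-group and total sums-of-squares-and-cross-products matrices. The data are the first variable from the purchase table earlier in these notes.

```python
def wilks_lambda_univariate(groups):
    """Within-group sum of squares divided by total sum of squares."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    if ss_total == 0:
        raise ValueError("all values are identical; lambda is undefined")
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_within += sum((x - group_mean) ** 2 for x in g)
    return ss_within / ss_total


# First variable of the purchase table: would purchase vs. would not purchase.
would_purchase = [8, 6, 10, 9, 4]
would_not_purchase = [5, 3, 4, 2, 2]
lam = wilks_lambda_univariate([would_purchase, would_not_purchase])
print(f"Wilks' lambda = {lam:.3f}")

# Two groups with identical means give a lambda of exactly 1.
same_means = wilks_lambda_univariate([[1, 2, 3], [1, 2, 3]])
print(f"Equal group means: lambda = {same_means:.1f}")
```

Here the within-group sum of squares is 30 and the total is 74.1, so lambda is about 0.405. That is well below 1, which fits the large gap between the two group means on this variable.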
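• The eigenvalue, also called the characteristic root, of each discriminant function is the
ratio of the between-group to the within-group sum of squares of the discriminant scores.
It reflects the relative importance of the dimensions (functions) which classify cases of
the dependent variable: the larger the eigenvalue, the more of the group separation that
function accounts for.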
• For two-group DA, there is one discriminant function and one eigenvalue, which
accounts for 100% of the explained variance.
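The share of explained variance attributed to each function is its eigenvalue divided by the sum of all eigenvalues. The sketch below uses hypothetical eigenvalues chosen only to illustrate the arithmetic, and the function name is illustrative.

```python
def percent_of_variance(eigenvalues):
    """Percent of the explained variance attributed to each function."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]


# Two-group case: a single (hypothetical) eigenvalue takes the whole share.
two_group = percent_of_variance([0.094])
# Three-group case: two hypothetical eigenvalues split the explained variance.
three_group = percent_of_variance([1.2, 0.3])

print([round(p, 1) for p in two_group])
print([round(p, 1) for p in three_group])
```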
• Canonical correlation, R*, is a measure of the association between the groups formed
by the dependent variable and the given discriminant function.
• When R* is zero, there is no relation between the groups and the function. When the
canonical correlation is large, there is a high correlation between the discriminant
function and the groups. The squared canonical correlation, (R*)², is the proportion of
variation in the dependent variable discriminated by the set of independent variables in
DA or MDA (multiple discriminant analysis).
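The canonical correlation, the eigenvalue and Wilks' lambda are linked by simple formulas when there is only one discriminant function. The sketch below assumes that the 91.4% unexplained variance quoted earlier in these notes came from such a two-group analysis, which the notes do not confirm. The function names are illustrative.

```python
import math


def canonical_correlation(eigenvalue):
    """Canonical correlation of one discriminant function from its eigenvalue."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


def eigenvalue_from_wilks(wilks_lambda):
    """Valid only when there is a single discriminant function (two groups)."""
    return (1.0 - wilks_lambda) / wilks_lambda


# Assumption: lambda = 0.914 comes from a two-group, one-function analysis.
wilks = 0.914
ev = eigenvalue_from_wilks(wilks)
rc = canonical_correlation(ev)

print(f"eigenvalue = {ev:.4f}")
print(f"canonical correlation = {rc:.4f}")
print(f"squared canonical correlation = {rc ** 2:.4f}")
```

Under that assumption the squared canonical correlation equals 1 minus lambda, that is 0.086. About 8.6% of the variation is then discriminated by the function, which matches the 91.4% left unexplained.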
(Model) Wilks' lambda: A significant lambda means one can reject the null hypothesis that
the groups have the same mean discriminant function scores and conclude that the
model is discriminating.
– The hit ratio (the percentage of cases correctly classified) must be compared
not to zero but to the percentage that would have been correctly classified by
chance alone. For two-group discriminant analysis with a 50-50 split in the
dependent variable, the expected percentage is 50%.
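The comparison can be sketched as follows. The classification table is hypothetical, and the chance benchmark used here is the proportional chance criterion (the sum of squared group proportions). That is one common choice and equals 50% for a 50-50 split.

```python
def hit_ratio(confusion):
    """Share of cases on the diagonal of an actual-by-predicted table."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def proportional_chance(group_sizes):
    """Expected accuracy if cases were assigned at random in proportion to group size."""
    n = sum(group_sizes)
    return sum((size / n) ** 2 for size in group_sizes)


# Hypothetical classification table (rows: actual group, columns: predicted group).
confusion = [
    [40, 10],
    [15, 35],
]
group_sizes = [sum(row) for row in confusion]  # actual group sizes: 50 and 50

observed = hit_ratio(confusion)
chance = proportional_chance(group_sizes)
print(f"hit ratio = {observed:.0%}, chance = {chance:.0%}")
```

In this made-up table the hit ratio is 75% against a chance level of 50%. With unequal groups the benchmark rises: a 70-30 split gives 0.49 + 0.09 = 58%.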