
Discriminant analysis

Presented by
Amritashish Bagchi, Anshuman Mishra & Sukanta Goswami
Definition
Discriminant analysis is a multivariate
statistical technique used for classifying a
set of observations into predefined
groups.

OBJECTIVE
To understand group differences and to predict the
likelihood that a particular entity will belong to a
particular class or group based on independent
variables.
Purpose
1) The main purpose is to classify a subject
into one of the two groups on the basis of
some independent traits.
2) A second purpose of the discriminant
analysis is to study the relationship
between group membership and the
variables used to predict the group
membership.
Situations for its use
When the dependent variable is
dichotomous or multichotomous.

Independent variables are metric, i.e.
interval or ratio.
Application of discriminant
analysis
To identify the characteristics on the
basis of which one can classify an
individual as:
1. A basketball or volleyball player on the
basis of anthropometric variables.
2. A high or low performer on the basis of
skill.
3. A junior or senior category player on the basis
of maturity parameters.
What we do in discriminant
analysis
It is also known as discriminant function analysis.
In discriminant analysis, the dependent variable is a
categorical variable, whereas the independent
variables are metric.
After developing the discriminant model, the discriminant function Z is
computed for a given set of new observations, and the subject/object is
assigned to the first group if the value of Z is less than 0 and to the second
group if it is more than 0. This criterion holds true if an
equal number of observations is taken in both
groups for developing the discriminant function.
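The assignment rule above can be sketched in a few lines of Python. The function name `assign_group`, the `cutoff` parameter and the Z values are hypothetical illustrations, and the default cutoff of 0 assumes equal group sizes, as the slide states.

```python
def assign_group(z, cutoff=0.0):
    """Assign a case to group 1 or group 2 from its discriminant score Z."""
    if z < cutoff:
        return 1          # Z below the cutoff: first group
    if z > cutoff:
        return 2          # Z above the cutoff: second group
    return None           # Z exactly on the cutoff: the rule does not decide


# Hypothetical discriminant scores for three new cases.
scores = [-2.3, 0.8, 4.1]
groups = [assign_group(z) for z in scores]
print(groups)
```

With unequal group sizes the cutoff would generally no longer be 0, which is why it is kept as a parameter here.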
Assumptions

1. Sample size
Group sizes of the dependent variable should not be
grossly different (e.g. 80:20); in that case logistic
regression may be preferred.
The sample size should be at least five times the number of
independent variables.

2. Normal distribution
Each of the independent variables is normally
distributed.
3. Homogeneity of variances / covariances
All variables have linear and homoscedastic
relationships.

4. Outliers
Outliers should not be present in the data.
DA is highly sensitive to the inclusion of
outliers.

5. Non-multicollinearity
There should not be any strong correlation among the
independent variables.
6. Mutually exclusive
The groups must be mutually exclusive, with
every subject or case belonging to only one
group.

7. Classification
Each of the allocations for the dependent
categories in the initial classification is
correctly classified.

8. Variability
No independent variables should have a zero
variability in either of the groups formed by
the dependent variable.
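A few of these assumptions can be screened mechanically before running the analysis. The sketch below is a minimal illustration that covers only the sample-size rule, group balance and the zero-variability check. The function name `check_basic_assumptions`, the report keys and the data are hypothetical.

```python
from statistics import pvariance


def check_basic_assumptions(groups, n_predictors):
    """groups maps a group label to a list of rows of predictor values."""
    sizes = {label: len(rows) for label, rows in groups.items()}
    total = sum(sizes.values())
    return {
        # Sample size: at least five cases per independent variable.
        "enough_cases": total >= 5 * n_predictors,
        # Group balance: a share near 0.8 signals an 80:20 style split.
        "largest_group_share": max(sizes.values()) / total,
        # Variability: (group, predictor index) pairs with zero variance.
        "zero_variance": [
            (label, j)
            for label, rows in groups.items()
            for j in range(n_predictors)
            if pvariance([row[j] for row in rows]) == 0
        ],
    }


# Hypothetical data: two predictors, five cases per group.
data = {
    "group_1": [[170.0, 5.0], [172.0, 5.0], [168.0, 5.0], [171.0, 5.0], [169.0, 5.0]],
    "group_2": [[180.0, 6.0], [182.0, 7.0], [179.0, 5.0], [181.0, 8.0], [183.0, 6.0]],
}
report = check_basic_assumptions(data, n_predictors=2)
print(report)
```

Here the second predictor is constant within `group_1`, so it is flagged. Normality, outliers and multicollinearity need separate checks.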
Terminology
1) Variables in the analysis
2) Discriminant function
A discriminant function is a latent variable which is
constructed as a linear combination of independent
variables, such that
Z = c + b1X1 + b2X2 + ... + bnXn
The discriminant function is also known as
canonical root. This discriminant function is used to
classify the subject/cases into one of the two
groups on the basis of the observed values of the
predictor variables.
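As an illustration of where the constant c and the coefficients b could come from, the sketch below fits a two-group function with two predictors using Fisher's rule: the coefficients are the inverse pooled within-group covariance matrix times the difference of the group means. This is an assumption for illustration. The slides do not state the estimation method, and SPSS scales its coefficients differently. All function names and data are hypothetical.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, m):
    """2x2 sums of squares and cross-products around the group mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fit_two_group(g1, g2):
    """Return (c, [b1, b2]) so that Z = c + b1*X1 + b2*X2."""
    m1, m2 = mean_vector(g1), mean_vector(g2)
    s1, s2 = scatter(g1, m1), scatter(g2, m2)
    dof = len(g1) + len(g2) - 2
    # Pooled within-group covariance matrix.
    p = [[(s1[i][j] + s2[i][j]) / dof for j in range(2)] for i in range(2)]
    det = p[0][0] * p[1][1] - p[0][1] * p[1][0]
    inv = [[p[1][1] / det, -p[0][1] / det], [-p[1][0] / det, p[0][0] / det]]
    d = [m2[0] - m1[0], m2[1] - m1[1]]
    b = [inv[0][0] * d[0] + inv[0][1] * d[1], inv[1][0] * d[0] + inv[1][1] * d[1]]
    # Constant chosen so that the midpoint between the group means scores 0.
    c = -sum(b[i] * (m1[i] + m2[i]) / 2 for i in range(2))
    return c, b


def score(c, b, x):
    return c + b[0] * x[0] + b[1] * x[1]


# Hypothetical measurements (X1, X2) for two groups of four cases each.
g1 = [[170.0, 40.0], [172.0, 42.0], [168.0, 41.0], [171.0, 39.0]]
g2 = [[180.0, 50.0], [182.0, 49.0], [179.0, 52.0], [181.0, 51.0]]
c, b = fit_two_group(g1, g2)
z1 = score(c, b, mean_vector(g1))  # score at the group 1 mean (negative)
z2 = score(c, b, mean_vector(g2))  # score at the group 2 mean (positive)
print(z1, z2)
```

With this choice of constant, the two group means score symmetrically around 0, which matches the "less than 0 / more than 0" assignment rule for equal group sizes.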
3) Classification matrix
In DA, it serves as a yardstick for measuring the
accuracy of a model in classifying an individual/case
into one of the two groups. It is also known as the confusion
matrix, assignment matrix, or prediction matrix. It tells
us what percentage of the existing data points is
correctly classified by the model developed in DA.
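A classification matrix and its percentage of correct classifications can be computed as follows. This is a minimal sketch. The function names and the actual/predicted group labels are hypothetical.

```python
def classification_matrix(actual, predicted, labels):
    """Rows are actual groups, columns are predicted groups."""
    matrix = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def percent_correct(matrix):
    total = sum(sum(row.values()) for row in matrix.values())
    hits = sum(matrix[g][g] for g in matrix)  # diagonal = correct cases
    return 100.0 * hits / total


# Hypothetical actual and predicted group memberships for eight cases.
actual = [1, 1, 1, 1, 2, 2, 2, 2]
predicted = [1, 1, 1, 2, 2, 2, 1, 2]
matrix = classification_matrix(actual, predicted, labels=[1, 2])
accuracy = percent_correct(matrix)
print(matrix, accuracy)
```

In this example three of the four cases in each group land on the diagonal, so six of eight cases are correctly classified.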
4) Stepwise method of discriminant analysis
The discriminant function can be developed either by
entering all independent variables together or
stepwise, depending upon whether the study is
confirmatory or exploratory.
5) Power of discriminatory variables
After developing the model in the discriminant analysis
based on the selected independent variables, it is important
to know the relative importance of the variables so selected.
6) Box's M Test
Using Box's M Test, we test the null hypothesis that the
covariance matrices do not differ between the groups formed by
the dependent variable. If Box's M Test is not significant, it
indicates that the assumption of homogeneous covariance matrices required for DA holds true.
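For illustration, the Box's M statistic itself can be computed from the group covariance matrices as M = (N - g) ln|S_pooled| - sum over groups of (n_i - 1) ln|S_i|, where N is the total sample size and g the number of groups. The sketch below handles two predictors only and stops at the statistic. The significance test (a chi-square or F approximation, as reported by SPSS) is not reproduced. Function names and data are hypothetical.

```python
import math


def covariance_2x2(rows):
    """Sample covariance matrix of two predictors within one group."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def box_m(groups):
    total = sum(len(g) for g in groups)
    k = len(groups)
    covs = [covariance_2x2(g) for g in groups]
    # Pooled covariance: group matrices weighted by their degrees of freedom.
    pooled = [
        [sum((len(g) - 1) * c[i][j] for g, c in zip(groups, covs)) / (total - k)
         for j in range(2)]
        for i in range(2)
    ]
    m = (total - k) * math.log(det2(pooled))
    m -= sum((len(g) - 1) * math.log(det2(c)) for g, c in zip(groups, covs))
    return m


# Hypothetical data: two groups, two predictors.
g1 = [[170.0, 40.0], [172.0, 42.0], [168.0, 41.0], [171.0, 39.0]]
g2 = [[180.0, 50.0], [182.0, 49.0], [179.0, 52.0], [181.0, 51.0]]
m_value = box_m([g1, g2])
print(m_value)
```

M is 0 when all group covariance matrices are identical and grows as they diverge.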
7) Eigenvalues
The eigenvalue is an index of overall fit.
8) Wilks' lambda
It measures the efficiency of the discriminant function
in the model.
Its value shows what proportion of the
variability in the dependent variable is not explained
by the independent variables.

9) Canonical correlation
The canonical correlation is the multiple
correlation between the predictors and the
discriminant function. With only one function it
provides an index of overall model fit; its square (R²) is
interpreted as the proportion of variance
explained.
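When there is a single discriminant function, the eigenvalue, Wilks' lambda and the canonical correlation are tied together: lambda = 1 / (1 + eigenvalue), and the canonical correlation is the square root of eigenvalue / (1 + eigenvalue), so lambda equals 1 minus the squared canonical correlation. The sketch below shows this relationship. The eigenvalue used is hypothetical.

```python
import math


def wilks_lambda(eigenvalue):
    """Wilks' lambda for a single discriminant function."""
    return 1.0 / (1.0 + eigenvalue)


def canonical_correlation(eigenvalue):
    """Canonical correlation for a single discriminant function."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


eigenvalue = 3.0  # hypothetical value, not taken from the slides
lam = wilks_lambda(eigenvalue)
rc = canonical_correlation(eigenvalue)
print(lam, rc, rc ** 2)
```

A large eigenvalue therefore means a small Wilks' lambda (little unexplained variability) and a canonical correlation close to 1.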
Detailed procedure
STEPS IN ANALYSIS:

STEP 1.
In step one, the independent variables that have discriminating power are chosen.

STEP 2.
A discriminant function model is developed using the coefficients of the independent variables.

STEP 3.
In step three, Wilks' lambda is computed to test the significance of the discriminant function.

STEP 4.
In step four, the independent variables that are important in discriminating between the groups are identified.

STEP 5.
In step five, subjects are classified into their respective groups.
APPLICATION OF SPSS
E.g. to classify players into different
categories during the selection process.
Group statistics
Box's Test of Equality of Covariance Matrices
Z = -24.880 + 0.169 X1 + 0.466 X2 - 0.423 X3 - 0.204 X4

Where,
X1 = Height
X2 = Back explosive power
X3 = Judgement
X4 = Patience


Means of the Transformed Group Centroids

Mean of group 1 (Batsmen): -4.390
Midpoint: 0
Mean of group 2 (Bowler): 4.390
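The fitted function and the centroids can be used together to classify a new player. The sketch below assigns a player to the group whose centroid is nearest to the player's Z score, which for two groups is the same as cutting at the midpoint 0. It rests on two assumptions that the slides do not confirm: the four coefficients are paired with Height, Back explosive power, Judgement and Patience in the order listed, and the player's measurements are invented for illustration.

```python
CONSTANT = -24.880
# Assumed pairing of coefficients with variables, in the order listed on the slide.
COEFFICIENTS = {
    "height": 0.169,
    "back_explosive_power": 0.466,
    "judgement": -0.423,
    "patience": -0.204,
}
CENTROIDS = {"Batsmen": -4.390, "Bowler": 4.390}


def z_score(player):
    """Discriminant score Z for one player's measurements."""
    return CONSTANT + sum(COEFFICIENTS[name] * player[name] for name in COEFFICIENTS)


def classify(z):
    """Assign the group whose centroid lies closest to Z."""
    return min(CENTROIDS, key=lambda group: abs(z - CENTROIDS[group]))


# Hypothetical player; units and scales are not given in the slides.
player = {"height": 175.0, "back_explosive_power": 20.0, "judgement": 15.0, "patience": 10.0}
z = z_score(player)
print(z, classify(z))
```

A score on the negative side of 0 would place the player with the batsmen, and a score on the positive side with the bowlers.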
Thank you
