
Discriminant Analysis

Introduction

During a study, researchers often face questions that must be answered, such as ‘Are the groups
different?’, ‘On which variables do the groups differ most?’ and ‘Can one predict which group a
person belongs to from those variables?’ Discriminant analysis is quite helpful in answering such
questions.

What is Discriminant Analysis?

Discriminant analysis is a technique used by the researcher to analyze research data when the
criterion (dependent) variable is categorical and the predictor (independent) variables are
interval in nature. A categorical dependent variable is one that is divided into a number of
categories. For example, a choice among three brands of computers, Computer A, Computer B and
Computer C, can be the categorical dependent variable.

Definition of Discriminant Analysis

Two definitions of discriminant analysis follow:

Discriminant analysis is a regression-based statistical technique used to determine which
particular classification or group (such as 'ill' or 'healthy') an item of data or an object (such as a
patient) belongs to on the basis of its characteristics or essential features. It differs from
group-building techniques such as cluster analysis in that the classifications or groups to choose
from must be known in advance.

Discriminant analysis is a form of multivariate analysis in which the objective is to establish a
discriminant function. The function (typically a mathematical formula) discriminates between
individuals in the population and allocates each of them to a group within the population. The
function is established on the basis of a series of measurements or observations made on the
individuals.

Objectives of Discriminant Analysis

The objective of discriminant analysis is to develop discriminant functions, which are linear
combinations of the independent variables that discriminate between the categories of the
dependent variable as well as possible. It enables the researcher to examine whether significant
differences exist among the groups in terms of the predictor variables. It also evaluates the
accuracy of the classification.
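
As a rough illustration of such a linear combination, the sketch below computes Fisher's linear discriminant for two groups measured on two predictors. It assumes equal prior probabilities, and the data values, group labels and function names are invented for the example. It is a minimal sketch, not a substitute for a statistics package.

```python
# Two-group linear discriminant for two predictors, in plain Python.
# All data values below are made up for illustration.

def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def scatter_matrix(rows, mean):
    # 2x2 within-group sums of squares and cross-products
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_discriminant(group_a, group_b):
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    sa, sb = scatter_matrix(group_a, ma), scatter_matrix(group_b, mb)
    # Pooled within-group scatter matrix
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    # Discriminant weights: inverse pooled scatter times the mean difference
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Cutoff halfway between the projected group means (equal priors assumed)
    cutoff = (w[0] * (ma[0] + mb[0]) + w[1] * (ma[1] + mb[1])) / 2
    return w, cutoff

def classify(x, w, cutoff):
    # Scores above the cutoff fall on group A's side of the boundary
    return "A" if w[0] * x[0] + w[1] * x[1] > cutoff else "B"

group_a = [(6, 8), (7, 9), (8, 7), (7, 8)]   # hypothetical group A cases
group_b = [(2, 3), (3, 2), (4, 4), (3, 3)]   # hypothetical group B cases

w, cutoff = fisher_discriminant(group_a, group_b)
print(w, cutoff)
print(classify((6, 7), w, cutoff), classify((3, 4), w, cutoff))
```

The weights in `w` play the role of the discriminant function coefficients, and the cutoff score separates the two groups.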

Discriminant analysis is described by the number of categories of the dependent variable. When the
dependent variable has two categories, the type used is two-group discriminant analysis. When the
dependent variable has three or more categories, the type used is multiple discriminant analysis.
The major distinction between the two types is that for two groups it is possible to derive only
one discriminant function. On the other hand, in the case of multiple discriminant analysis,
more than one discriminant function can be computed.
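
The count of available functions follows a standard rule: at most the smaller of the number of groups minus one and the number of predictors. The helper below is a small sketch of that rule, and its name is hypothetical.

```python
def max_discriminant_functions(n_groups, n_predictors):
    """Upper bound on the number of discriminant functions: min(g - 1, p)."""
    if n_groups < 2 or n_predictors < 1:
        raise ValueError("need at least two groups and one predictor")
    return min(n_groups - 1, n_predictors)

print(max_discriminant_functions(2, 5))  # two-group case
print(max_discriminant_functions(3, 5))  # multiple discriminant analysis
```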

Examples

Many examples illustrate when discriminant analysis fits. It can be used to determine whether
heavy, medium and light users of soft drinks differ in their consumption of frozen foods. In the
field of psychology, it can be used to differentiate between price-sensitive and
non-price-sensitive buyers of groceries in terms of their psychological attributes or
characteristics. In the field of business, it can be used to understand the attributes of
customers who are loyal to a store and of customers who are not.

Discriminant Analysis and ANOVA

For a researcher, it is important to understand the relationship of discriminant analysis with
regression and Analysis of Variance (ANOVA), with which it shares both similarities and
differences. The similarity is that all three procedures have a single dependent variable and
multiple independent variables. The difference lies in the nature of the variables. The dependent
variable is categorical or binary in discriminant analysis, but metric in the other two procedures.
The independent variables are categorical in Analysis of Variance (ANOVA), but metric in
regression and discriminant analysis.

Procedure for Discriminant Analysis

The steps involved in conducting discriminant analysis are as follows:


• Formulate the problem.
• Estimate the discriminant function coefficients.
• Determine the significance of the discriminant functions.
• Interpret the results obtained.
• Assess the validity of the analysis. This is the last and most important step.
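
Validity is commonly assessed by classifying cases that were not used to estimate the functions and checking how many land in the correct group. The sketch below computes the hit ratio and compares it with the proportional chance criterion (the sum of squared group proportions). The actual and predicted labels are invented holdout data, and the function names are hypothetical.

```python
from collections import Counter

def hit_ratio(actual, predicted):
    """Share of cases whose predicted group matches the actual group."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)

def proportional_chance(actual):
    """Accuracy expected by chance alone: sum of squared group proportions."""
    n = len(actual)
    return sum((count / n) ** 2 for count in Counter(actual).values())

def confusion_counts(actual, predicted):
    """Counts of (actual group, predicted group) pairs."""
    return Counter(zip(actual, predicted))

# Hypothetical holdout sample: true groups and the groups assigned by the functions
actual = ["A", "A", "A", "B", "B", "B", "B", "A"]
predicted = ["A", "A", "B", "B", "B", "A", "B", "A"]

print(hit_ratio(actual, predicted))
print(proportional_chance(actual))
print(confusion_counts(actual, predicted))
```

A hit ratio clearly above the chance value suggests that the functions classify better than group sizes alone would.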

Applications
Linear Discriminant Analysis is applied in positioning and product management. Other
applications are the following:

Marketing
In marketing, discriminant analysis was once often used to determine the factors which distinguish
different types of customers and/or products on the basis of surveys or other forms of collected
data. Logistic regression or other methods are now more commonly used.
Bankruptcy prediction
In bankruptcy prediction based on accounting ratios and other financial variables, linear
discriminant analysis was the first statistical method applied to systematically explain which
firms entered bankruptcy and which survived. Despite limitations, including known non-conformance
of accounting ratios to the normal distribution assumptions of LDA, Edward Altman's 1968 model is
still a leading model in practical applications.
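
For illustration, the sketch below evaluates the commonly cited form of Altman's Z-score for publicly traded manufacturing firms, with the ratios entered as decimals. The coefficients and the cutoffs of 1.81 and 2.99 are the values usually quoted for the 1968 model and are not given in this text. The firm's figures and the function names are invented.

```python
def altman_z(working_capital, retained_earnings, ebit,
             market_value_equity, sales, total_assets, total_liabilities):
    """Commonly cited form of the 1968 Z-score (an assumption, see the text above)."""
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_value_equity / total_liabilities
    x5 = sales / total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5

def z_zone(z):
    # Conventional cutoffs usually quoted with the model
    if z > 2.99:
        return "safe"
    if z < 1.81:
        return "distress"
    return "grey"

# Hypothetical firm, all amounts in the same currency unit
z = altman_z(working_capital=200, retained_earnings=300, ebit=150,
             market_value_equity=800, sales=1200,
             total_assets=1000, total_liabilities=500)
print(round(z, 3), z_zone(z))
```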

Face recognition
In computerized face recognition, each face is represented by a large number of pixel values.
Linear discriminant analysis is primarily used here to reduce the number of features to a more
manageable number before classification. Each of the new dimensions is a linear combination of
pixel values, which form a template. The linear combinations obtained using Fisher's linear
discriminant are called Fisherfaces, while those obtained using the related principal component
analysis are called eigenfaces.

Biomedical studies
The main application of discriminant analysis in medicine is the assessment of the severity of a
patient's condition and the prognosis of disease outcome. For example, during retrospective
analysis, patients are divided into groups according to severity of disease: mild, moderate and
severe. The results of clinical and laboratory analyses are then studied to reveal variables that
differ significantly between the groups. Using these variables, discriminant functions are built
which help to objectively classify disease in a future patient as mild, moderate or severe.

Earth Science
This method can be used to separate alteration zones. For example, when different data from
various zones are available, discriminant analysis can find the pattern within the data and classify
them effectively.

Cluster Analysis

What is Cluster Analysis?

Cluster analysis is an exploratory analysis that tries to identify structures within the data. Cluster
analysis is also called segmentation analysis or taxonomy analysis. More specifically, it tries to
identify homogeneous groups of cases when the grouping is not known in advance. Because it is
exploratory, it does not make any distinction between dependent and independent variables. The
different cluster analysis methods that SPSS offers can handle binary, nominal, ordinal, and scale
(interval or ratio) data.

Cluster analysis is often used in conjunction with other analyses (such as discriminant
analysis). The researcher must be able to interpret the cluster analysis based on their understanding
of the data to determine if the results produced by the analysis are actually meaningful.

Typical research questions the cluster analysis answers are as follows:


• Medicine – What are the diagnostic clusters? To answer this question the researcher would
devise a diagnostic questionnaire that includes possible symptoms (for example, in
psychology, anxiety, depression etc.). The cluster analysis can then identify groups of
patients that have similar symptoms.
• Marketing – What are the customer segments? To answer this question a market
researcher may conduct a survey covering needs, attitudes, demographics, and behavior of
customers. The researcher then may use cluster analysis to identify homogeneous groups
of customers that have similar needs and attitudes.
• Education – What are student groups that need special attention? Researchers may
measure psychological, aptitude, and achievement characteristics. A cluster analysis then
may identify what homogeneous groups exist among students (for example, high achievers
in all subjects, or students that excel in certain subjects but fail in others).
• Biology – What is the taxonomy of species? Researchers can collect a data set of different
plants and note different attributes of their phenotypes. A cluster analysis can group those
observations into a series of clusters and help build a taxonomy of groups and subgroups
of similar plants.

Other techniques you might want to try in order to identify similar groups of observations are Q-
analysis, multi-dimensional scaling (MDS), and latent class analysis.

The Cluster Analysis in SPSS

Our research question for this example cluster analysis is as follows:

What homogeneous clusters of students emerge based on standardized test scores in mathematics,
reading, and writing?

In SPSS, cluster analyses can be found under Analyze/Classify…. SPSS offers three methods for
cluster analysis: K-Means Cluster, Hierarchical Cluster, and Two-Step Cluster.

K-means cluster is a method to quickly cluster large data sets. The researcher defines the number
of clusters in advance. This is useful for testing different models with different assumed numbers
of clusters.
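
To make the idea concrete outside SPSS, the following is a minimal k-means sketch in plain Python. It is not the SPSS implementation. It starts from the first k cases as initial centers (a simplification chosen to keep the run deterministic), and the test scores are invented. The within-cluster sum of squares is one common way to compare solutions with different numbers of clusters.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, max_iter=100):
    # Simplification: use the first k cases as the starting cluster centers
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assign every case to its nearest center
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no case changed cluster, so the solution is stable
        assignment = new_assignment
        # Move each center to the mean of its current members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

def within_cluster_ss(points, assignment, centroids):
    return sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignment))

# Hypothetical (mathematics, reading, writing) scores for six students
scores = [(85, 80, 82), (40, 45, 42), (88, 84, 86),
          (42, 48, 40), (90, 79, 85), (38, 50, 45)]

for k in (1, 2):
    assignment, centroids = k_means(scores, k)
    print(k, assignment, round(within_cluster_ss(scores, assignment, centroids), 1))
```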

Hierarchical cluster is the most common method. It generates a series of models with cluster
solutions from 1 (all cases in one cluster) to n (each case is an individual cluster). Hierarchical
cluster also works with variables as opposed to cases; it can cluster variables together in a manner
somewhat similar to factor analysis. In addition, hierarchical cluster analysis can handle nominal,
ordinal, and scale data; however, it is not recommended to mix different levels of measurement.
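
The series of solutions can be sketched with a small agglomerative routine. The version below uses single linkage (nearest neighbor) with Euclidean distance, chosen here only for brevity and not as a statement of what SPSS uses by default. It records every solution from n clusters down to one. The scores and function names are invented.

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def single_linkage(points):
    """Return the cluster solutions from n clusters down to 1 (lists of case indices)."""
    clusters = [[i] for i in range(len(points))]
    history = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose closest members are nearest to each other
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        history.append([list(c) for c in clusters])
    return history

# Hypothetical (mathematics, reading, writing) scores for six students
scores = [(85, 80, 82), (40, 45, 42), (88, 84, 86),
          (42, 48, 40), (90, 79, 85), (38, 50, 45)]

history = single_linkage(scores)
for solution in history:
    print(len(solution), "clusters:", solution)
```

The researcher would then inspect the sequence and choose the number of clusters at which the merged groups still make substantive sense.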

Two-step cluster analysis identifies groupings by running pre-clustering first and then by running
hierarchical methods. Because it uses a quick cluster algorithm upfront, it can handle large data
sets that would take a long time to compute with hierarchical cluster methods. In this respect, it is
a combination of the previous two approaches. Two-step clustering can handle scale and ordinal
data in the same model, and it automatically selects the number of clusters.
