Introduction
During a study, researchers often face questions that must be answered, such as ‘are the groups
different?’, ‘on which variables do the groups differ most?’ and ‘can one predict which group a
person belongs to using those variables?’. Discriminant analysis is well suited to answering such
questions.
Discriminant analysis is a technique for analyzing research data when the criterion (dependent)
variable is categorical and the predictor (independent) variables are interval in nature. A
categorical variable is one whose values fall into a number of categories. For example, three
brands of computers, Computer A, Computer B and Computer C, can be the categories of a
categorical dependent variable.
The objective of discriminant analysis is to develop discriminant functions, which are linear
combinations of the independent variables that discriminate between the categories of the
dependent variable as well as possible. It enables the researcher to examine whether significant
differences exist among the groups in terms of the predictor variables. It also evaluates the
accuracy of the classification.
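As an illustration of the paragraph above, here is a minimal sketch of a two-group discriminant function using Fisher's rule, written in plain Python for two predictors. The data, the group labels and all function names are hypothetical, and the cutoff at the midpoint of the projected group means assumes equal group sizes and equal misclassification costs.

```python
# Hypothetical data: two interval-scaled predictors measured on two groups.
GROUP_A = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
GROUP_B = [(6.0, 5.0), (7.0, 8.0), (8.0, 6.0), (7.0, 7.0)]


def mean(points):
    """Mean vector of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, m):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix around mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def score(w, p):
    """Discriminant score: a linear combination of the predictors."""
    return w[0] * p[0] + w[1] * p[1]


def fit_discriminant(a, b):
    """Return weights w = Sw^-1 (mean_b - mean_a) and a midpoint cutoff."""
    ma, mb = mean(a), mean(b)
    axx, axy, ayy = scatter(a, ma)
    bxx, bxy, byy = scatter(b, mb)
    # Pooled within-group scatter matrix Sw.
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy
    det = sxx * syy - sxy * sxy
    dx, dy = mb[0] - ma[0], mb[1] - ma[1]
    # Multiply the inverse of the 2x2 matrix Sw by the mean difference.
    w = ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)
    cutoff = (score(w, ma) + score(w, mb)) / 2
    return w, cutoff


def classify(p, w, cutoff):
    """Assign a case to group B if its score exceeds the cutoff, else A."""
    return "B" if score(w, p) > cutoff else "A"


w, cutoff = fit_discriminant(GROUP_A, GROUP_B)
hits = sum(classify(p, w, cutoff) == "A" for p in GROUP_A)
hits += sum(classify(p, w, cutoff) == "B" for p in GROUP_B)
accuracy = hits / (len(GROUP_A) + len(GROUP_B))
print("weights:", w, "cutoff:", cutoff, "accuracy:", accuracy)
```

The accuracy computed here is measured on the same cases used to fit the function, so it tends to be optimistic; a holdout sample or cross-validation gives a fairer estimate.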
Discriminant analysis is described by the number of categories of the dependent variable.
When the dependent variable has two categories, the type used is two-group discriminant analysis.
If the dependent variable has three or more categories, the type used is multiple discriminant
analysis. The major distinction between the two types is that in the two-group case it is possible to
derive only one discriminant function. On the other hand, in the case of multiple discriminant
analysis, more than one discriminant function can be computed.
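The count of discriminant functions can be stated as a small rule: at most the smaller of the number of groups minus one and the number of predictors. The helper below is a hypothetical illustration of that rule, not part of any statistics package.

```python
def max_discriminant_functions(n_groups: int, n_predictors: int) -> int:
    """Upper bound on the number of discriminant functions.

    The bound is min(groups - 1, predictors): two groups always give a
    single function, however many predictors are available.
    """
    if n_groups < 2:
        raise ValueError("discriminant analysis needs at least two groups")
    if n_predictors < 1:
        raise ValueError("at least one predictor is required")
    return min(n_groups - 1, n_predictors)


# Two-group analysis with five predictors: one function.
print(max_discriminant_functions(2, 5))
# Three groups (e.g. heavy, medium and light users) with five predictors.
print(max_discriminant_functions(3, 5))
```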
Examples
Many situations illustrate where discriminant analysis fits. It can be used to determine
whether heavy, medium and light users of soft drinks differ in their consumption of frozen
foods. In the field of psychology, it can be used to differentiate between price-sensitive and
non-price-sensitive buyers of groceries in terms of their psychological attributes or
characteristics. In the field of business, it can be used to understand the attributes that
distinguish a customer who has store loyalty from one who does not.
Applications
Linear Discriminant Analysis is applied in positioning and product management. Other
applications are the following:
Marketing
In marketing, discriminant analysis was once often used to determine the factors which distinguish
different types of customers and/or products on the basis of surveys or other forms of collected
data. Logistic regression or other methods are now more commonly used.
Bankruptcy prediction
In bankruptcy prediction based on accounting ratios and other financial variables, linear
discriminant analysis was the first statistical method applied to systematically explain which firms
entered bankruptcy vs. survived. Despite limitations, including known non-conformance of
accounting ratios to the normal distribution assumptions of LDA, Edward Altman's 1968 model is
still a leading model in practical applications.
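Altman's model is itself a discriminant function: a weighted sum of five accounting ratios. The sketch below uses the commonly quoted ratio form of the 1968 Z-score for publicly traded manufacturing firms, with the usual cutoffs of 1.81 and 2.99. The function names and the example figures are hypothetical and are not taken from any real firm.

```python
def altman_z(working_capital, retained_earnings, ebit,
             market_value_equity, sales, total_assets, total_liabilities):
    """Altman (1968) Z-score in its commonly quoted ratio form."""
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_value_equity / total_liabilities
    x5 = sales / total_assets
    # The discriminant function: a linear combination of the five ratios.
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def zone(z):
    """Classify a score using the customary cutoffs."""
    if z > 2.99:
        return "safe"
    if z < 1.81:
        return "distress"
    return "grey"


# Hypothetical firm, figures in arbitrary currency units.
z = altman_z(working_capital=200, retained_earnings=300, ebit=150,
             market_value_equity=800, sales=1200,
             total_assets=1000, total_liabilities=400)
print(round(z, 3), zone(z))
```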
Face recognition
In computerized face recognition, each face is represented by a large number of pixel values.
Linear discriminant analysis is primarily used here to reduce the number of features to a more
manageable number before classification. Each of the new dimensions is a linear combination of
pixel values, which form a template. The linear combinations obtained using Fisher's linear
discriminant are called Fisherfaces, while those obtained using the related principal component
analysis are called eigenfaces.
Biomedical studies
The main application of discriminant analysis in medicine is the assessment of the severity of a
patient's condition and the prognosis of disease outcome. For example, during retrospective
analysis, patients are divided into groups according to severity of disease: mild, moderate and
severe. The results of clinical and laboratory analyses are then studied to reveal variables that
differ significantly between the groups. Using these variables, discriminant functions are built
that help to objectively classify disease in a future patient as mild, moderate or severe.
Earth Science
This method can be used to separate alteration zones. For example, when different data from
various zones are available, discriminant analysis can find the pattern within the data and classify
the zones effectively.
Cluster Analysis
Cluster analysis is an exploratory analysis that tries to identify structures within the data. Cluster
analysis is also called segmentation analysis or taxonomy analysis. More specifically, it tries to
identify homogeneous groups of cases when the grouping is not previously known. Because it is
exploratory, it does not make any distinction between dependent and independent variables. The
different cluster analysis methods that SPSS offers can handle binary, nominal, ordinal, and scale
(interval or ratio) data.
Cluster analysis is often used in conjunction with other analyses (such as discriminant
analysis). The researcher must be able to interpret the cluster analysis based on their understanding
of the data to determine if the results produced by the analysis are actually meaningful.
Other techniques you might want to try in order to identify similar groups of observations are
Q-analysis, multi-dimensional scaling (MDS), and latent class analysis.
A typical research question for cluster analysis is: what homogeneous clusters of students emerge
based on standardized test scores in mathematics, reading, and writing?
In SPSS, cluster analyses can be found under Analyze/Classify…. SPSS offers three methods for
cluster analysis: K-Means Cluster, Hierarchical Cluster, and Two-Step Cluster.
K-means cluster is a method to quickly cluster large data sets. The researcher defines the number
of clusters in advance. This makes it useful for testing different models, each with a different
assumed number of clusters.
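To make the idea concrete, the following is a minimal sketch of the standard k-means iteration (assign each case to its nearest center, then recompute the centers) in plain Python. It is not the SPSS implementation. The student scores are hypothetical, and seeding the centers with the first k cases is a simplifying assumption; practical implementations choose starting centers more carefully.

```python
# Hypothetical test scores (mathematics, reading, writing) for six students.
SCORES = [(52, 48, 50), (91, 88, 94), (55, 51, 47),
          (88, 92, 90), (49, 53, 52), (93, 90, 89)]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))


def kmeans(points, k, max_iter=100):
    """Return (labels, centers) for a number of clusters k chosen in advance."""
    centers = [tuple(points[i]) for i in range(k)]  # naive seeding
    labels = []
    for _ in range(max_iter):
        # Assignment step: each case goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: recompute each center; keep it if its cluster is empty.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(centroid(members) if members else centers[c])
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return labels, centers


def within_ss(points, labels, centers):
    """Total within-cluster sum of squares, used to compare solutions."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


# Compare models with different assumed numbers of clusters.
for k in (1, 2, 3):
    labels, centers = kmeans(SCORES, k)
    print(k, labels, round(within_ss(SCORES, labels, centers), 2))
```

Comparing the within-cluster sum of squares across values of k is one simple way to judge how many clusters the data support.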
Hierarchical cluster is the most common method. It generates a series of models with cluster
solutions from 1 (all cases in one cluster) to n (each case is an individual cluster). Hierarchical
cluster can also work with variables instead of cases, clustering variables together in a manner
somewhat similar to factor analysis. In addition, hierarchical cluster analysis can handle nominal,
ordinal, and scale data; however, mixing different levels of measurement is not recommended.
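The series of solutions from n clusters down to 1 can be sketched with a simple agglomerative procedure. The version below uses single linkage (the distance between two clusters is the distance between their closest members) and Euclidean distance; both are assumptions chosen for brevity, since SPSS offers several linkage methods and distance measures. The data and function names are hypothetical.

```python
import math

# Hypothetical test scores (mathematics, reading, writing) for six students.
SCORES = [(52, 48, 50), (91, 88, 94), (55, 51, 47),
          (88, 92, 90), (49, 53, 52), (93, 90, 89)]


def single_link(points, a, b):
    """Single-linkage distance: closest pair of cases across two clusters."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)


def agglomerate(points):
    """Return every cluster solution, from n singletons down to one cluster.

    Each solution is a list of clusters; each cluster is a sorted list of
    case indices.
    """
    clusters = [[i] for i in range(len(points))]
    history = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = single_link(points, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
        history.append([list(c) for c in clusters])
    return history


history = agglomerate(SCORES)
for solution in history:
    print(len(solution), "clusters:", sorted(solution))
```

The researcher then inspects the sequence and picks the solution that is most interpretable, for instance the two-cluster solution separating lower-scoring from higher-scoring students in this toy data.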
Two-step cluster analysis identifies groupings by running pre-clustering first and then by running
hierarchical methods. Because it uses a quick cluster algorithm upfront, it can handle large data
sets that would take a long time to compute with hierarchical cluster methods. In this respect, it is
a combination of the previous two approaches. Two-step clustering can handle scale and ordinal
data in the same model, and it automatically selects the number of clusters.
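The two-stage idea can be sketched as follows: a single fast pass groups cases into small sub-clusters, and a hierarchical method then merges the sub-cluster centroids. This is only an illustration of the principle, under loud assumptions: the distance threshold, the centroid-based merging and all names are hypothetical, the number of final clusters is passed in by hand, and the procedure SPSS actually uses (including its automatic choice of the number of clusters) is not reproduced here.

```python
import math

# Hypothetical test scores (mathematics, reading, writing) for six students.
SCORES = [(52, 48, 50), (91, 88, 94), (55, 51, 47),
          (88, 92, 90), (49, 53, 52), (93, 90, 89)]


def precluster(points, threshold):
    """Stage 1: one pass over the data.

    Each case joins the nearest sub-cluster whose centroid lies within
    `threshold`; otherwise it starts a new sub-cluster.
    Returns (centroids, sizes).
    """
    sums, sizes = [], []
    for p in points:
        best, best_d = None, threshold
        for i, (s, n) in enumerate(zip(sums, sizes)):
            d = math.dist(p, [v / n for v in s])
            if d <= best_d:
                best, best_d = i, d
        if best is None:
            sums.append(list(p))
            sizes.append(1)
        else:
            sums[best] = [a + b for a, b in zip(sums[best], p)]
            sizes[best] += 1
    centroids = [tuple(v / n for v in s) for s, n in zip(sums, sizes)]
    return centroids, sizes


def merge_centroids(centroids, sizes, k):
    """Stage 2: merge the two closest centroids until k clusters remain."""
    cents = [list(c) for c in centroids]
    w = list(sizes)
    while len(cents) > k:
        _, a, b = min((math.dist(cents[a], cents[b]), a, b)
                      for a in range(len(cents))
                      for b in range(a + 1, len(cents)))
        total = w[a] + w[b]
        # Size-weighted mean keeps the merged centroid equal to the case mean.
        merged = [(x * w[a] + y * w[b]) / total
                  for x, y in zip(cents[a], cents[b])]
        cents = [c for i, c in enumerate(cents) if i not in (a, b)] + [merged]
        w = [v for i, v in enumerate(w) if i not in (a, b)] + [total]
    return cents, w


pre_centroids, pre_sizes = precluster(SCORES, threshold=6.0)
final_centroids, final_sizes = merge_centroids(pre_centroids, pre_sizes, k=2)
print(len(pre_centroids), "sub-clusters ->", len(final_centroids), "clusters")
```

Because the expensive pairwise comparison in the second stage runs on sub-cluster centroids rather than on every case, the cost of the hierarchical step depends on the number of sub-clusters, not on the size of the data set.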