
Discriminant Analysis

Classify objects into two or more groups based on knowledge of some characteristics or attributes related to them.
• Theory of Discriminant Analysis

• Linear discriminant analysis (LDA) is used for pattern recognition and classification by forming a linear combination of the features or characteristics of objects or customers. The resulting combination may be used as a linear classifier.

• LDA approaches the problem by assuming that the class-conditional probability density functions are both normally distributed, each with its own mean and covariance parameters.

• Under this assumption, the Bayes optimal solution is to predict the class of a point by comparing the log of the likelihood ratio with some threshold T.
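
• Written out (a standard formulation; the density notation f1, f2 for the two classes and the direction of the ratio are assumptions here, not given in the slides): assign an observation x to the second class when

  ln( f2(x) / f1(x) ) > T

  and to the first class otherwise.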
• Applications

• Bankruptcy prediction
– Based on financial variables (e.g. ratios), discriminant analysis was first used to explain which firms entered bankruptcy and which survived.

• Marketing
– Determine the factors which distinguish different types of customers
and products on the basis of surveys or other forms of collected data.

• In discriminant analysis the dependent variable is categorical and the independent variables are interval in nature.
– When the dependent variable has two categories, the technique is known as two-group discriminant analysis.
– When three or more categories are involved, the technique is referred to as multiple discriminant analysis.
• Discriminant analysis can answer the following questions:

– Based on demographic characteristics, how do customers who exhibit store loyalty differ from those who do not?

– Do heavy, medium and light users of soft drinks differ in terms of their consumption of frozen foods?

– What psychographic characteristics help to differentiate between price-sensitive and non-price-sensitive buyers of groceries?

– Do different market segments differ in their media consumption habits?

– What is the difference between heavy patrons of regional department store chains and heavy patrons of national chains, based on lifestyle?
• Objectives of discriminant analysis are as follows:

– Development of discriminant functions of the predictor or independent variables, which will best discriminate between the categories of the criterion or dependent variable (groups)

– Examination of whether significant differences exist among the groups, in terms of the predictor variables

– Determination of which predictor variables contribute to most of the intergroup differences

– Classification of cases to one of the groups based on the values of the predictor variables

– Evaluation of the accuracy of classification


• The discriminant analysis model involves a linear combination of the following form: D = b1X1 + b2X2 + b3X3 + ... + bkXk

• where D = discriminant score, the b's = discriminant coefficients or weights, and the X's = predictor or independent variables.

• The coefficients or weights (b) are estimated so that the groups differ as much as possible on the values of the discriminant function, which occurs when the ratio of the between-group sum of squares to the within-group sum of squares for the discriminant scores is at a maximum.

• Any other linear combination of the predictors will result in a smaller ratio.
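
• A minimal sketch in R of computing such scores D = b1X1 + ... + bkXk, using hypothetical weights and data (the numbers are made up for illustration, not estimated from any sample):

• # discriminant scores as a weighted sum of predictors
• > b <- c(0.5, -1.2, 0.8)                      # hypothetical weights b1, b2, b3
• > X <- matrix(rnorm(15), nrow = 5, ncol = 3)  # 5 cases measured on 3 predictors
• > D <- as.vector(X %*% b)                     # one discriminant score per case
• > D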
• A geometrical exposition of two-group (G1 and G2) discriminant analysis, with members measured on two variables X1 and X2, is shown in the figure, where X1 and X2 are the two axes, members of G1 are denoted by 1, and members of G2 by 2.

• The resulting ellipses encompass some specified percentage of the points (members), say 93 percent in each group.

• A straight line is drawn through the two points where the ellipses intersect and then projected onto a new axis, D.

• The overlap between the univariate distributions G1 and G2, represented by the shaded area, is smaller than would be obtained with any other line drawn through the ellipses representing the scatterplots. Thus, the groups differ as much as possible on the D axis.
• Statistics associated with discriminant analysis are as follows:

• The centroid is the mean value of the discriminant scores for a particular group.
– There are as many centroids as there are groups, because there is one for each group.
– The means for a group on all the functions are the group centroids.
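
• A minimal sketch in R, using hypothetical discriminant scores and group labels (illustrative values, not from any real data):

• # centroids are the mean discriminant scores per group
• > scores <- c(1.2, 0.8, -0.3, -0.9, -1.1)
• > group <- factor(c("A", "A", "B", "B", "B"))
• > tapply(scores, group, mean)   # one centroid per group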

• Discriminant function coefficients are the multipliers of the variables, when the variables are in the original units of measurement.

• Discriminant scores are obtained by multiplying each coefficient by the value of the corresponding variable, summing these products, and adding the constant term.
• The classification matrix (also called the confusion or prediction matrix) contains the number of correctly classified and misclassified cases.

– Correctly classified cases appear on the diagonal, because the predicted and actual groups are the same.

– Off-diagonal elements represent the cases that have been incorrectly classified.

– The sum of the diagonal elements divided by the total number of cases represents the hit ratio.
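
• A minimal sketch in R of the hit ratio, using hypothetical actual and predicted group labels:

• # confusion matrix: rows = actual groups, columns = predicted groups
• > actual <- factor(c(1, 1, 2, 2, 2, 3, 3))
• > predicted <- factor(c(1, 2, 2, 2, 3, 3, 3))
• > cm <- table(actual, predicted)
• > sum(diag(cm)) / sum(cm)   # hit ratio = 5/7, about 0.71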

• The proportion of trace gives the percentage separation achieved by each discriminant function.
• Wine Classification

• The wine data set has 13 chemical concentrations describing wine samples from three cultivars.

• A cultivar is a variety of a plant developed from a natural species and maintained under cultivation.

• The objective is to find the linear combinations of the original variables (13 chemical concentrations) that give the best possible separation between the groups (wine cultivars).

• To separate the wines by cultivar: the wines come from three different cultivars, so the number of groups is G = 3, and the number of variables is 13 (chemical concentrations), i.e. p = 13.
• Wine Classification

• The maximum number of useful discriminant functions that can separate the wines by cultivar is the minimum of G − 1 and p, so in this case it is the minimum of 2 and 13.

• Thus, we can find at most 2 useful discriminant functions to separate the wines by cultivar, using the 13 chemical concentration variables.
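
• The same count, worked out in R:

• > G <- 3    # number of groups (cultivars)
• > p <- 13   # number of predictor variables
• > min(G - 1, p)   # returns 2: at most two useful discriminant functions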

• Install these packages:

– install.packages("MASS")
– install.packages("car")
– install.packages("caret")
– install.packages("klaR")
– install.packages("rattle.data")
• Load the wine data set, which ships with the rattle.data package:

• > data(wine, package = 'rattle.data')
• > attach(wine)
• > head(wine)
• > str(wine)

• The wine data show that the Type variable is categorical, consisting of 3 categories.

• Notice that there are more than 2 categories; thus linear discriminant analysis (in its multiple-group form) is the appropriate method in this case.

• The purpose of the linear discriminant analysis is to find the combination of the variables that gives the best possible separation between the groups (wine cultivars) in our data set.

• For convenience, the values of each discriminant function are scaled so that their mean is zero and their variance is one.

• > library(MASS)   # lda() comes from the MASS package installed earlier
• > wine.lda <- lda(Type ~ ., data = wine)
• > wine.lda

• The first linear discriminant function from the result is

– LD1 = −0.403∗Alcohol + 0.165∗Malic − 0.369∗Ash + 0.155∗Alcalinity − 0.002∗Magnesium + 0.618∗Phenols − 1.66∗Flavanoids − 1.496∗Nonflavanoids + 0.134∗Proanthocyanins + 0.355∗Color − 0.818∗Hue − 1.15∗Dilution − 0.003∗Proline
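
• These coefficients can also be read directly from the fitted object; MASS::lda() stores them in the scaling component:

• > wine.lda$scaling[, 1]   # raw coefficients of the first discriminant function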

• “Proportion of trace” gives the percentage separation achieved by each discriminant function.

– The first DF achieves a good separation between the three groups (three cultivars), but the second DF improves the separation of the groups by quite a large amount, so it is worth using the second DF as well. Therefore, to achieve a good separation of the groups (cultivars), it is necessary to use both of the first two discriminant functions.
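
• The proportions of trace can be recomputed from the singular values that the fitted lda object stores in its svd component:

• > prop <- wine.lda$svd^2 / sum(wine.lda$svd^2)
• > round(prop, 4)   # proportion of separation achieved by each function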
• Stacked Histogram of the LDA Values

• A stacked histogram displays the result of the linear discriminant analysis.

• Make predictions with the fitted LDA model and store them as an object. The predict() function generates values from the selected model.

• The number of predicted values corresponds to the number of cases in the processed data.

• Then use the ldahist() function to plot the histogram.

• > wine.lda.values <- predict(wine.lda)
• > ldahist(wine.lda.values$x[,1], g = Type)

• Make a stacked histogram of the second discriminant function’s values to see how it separates the cultivars:

• > ldahist(wine.lda.values$x[,2], g = Type)

• Make a scatterplot of the best two discriminant functions, with the data points labelled by cultivar, by typing:

• # make a scatterplot
• > plot(wine.lda.values$x[,1], wine.lda.values$x[,2])

• # add labels
• > text(wine.lda.values$x[,1], wine.lda.values$x[,2], wine$Type, cex=0.7, pos=4, col="red")
• Scatterplots of Discriminant Functions


• The wines from the three cultivars are well separated in the scatterplot.

– The 1st DF (x-axis) separates cultivars 1 and 3 very well, but does not perfectly separate cultivars 1 and 2, or cultivars 2 and 3.
– The 2nd DF (y-axis) achieves a fairly good separation of cultivars 1 and 3, and cultivars 2 and 3, although it is not totally perfect.
– To achieve a very good separation of the three cultivars, it is best to use both the 1st and 2nd DF together, since the 1st DF can separate cultivars 1 and 3 very well, and the 2nd DF can separate cultivars 1 and 2, and cultivars 2 and 3, reasonably well.
• For more advanced graphics, use the ggplot2 package, which works with data frame objects.

• Create a new object containing the Type variable from the wine data and the values from wine.lda.values, converted to a data frame:

• > newdata <- data.frame(type = wine[,1], lda = wine.lda.values$x)
• > install.packages("ggplot2")
• > library(ggplot2)
• > ggplot(newdata) + geom_point(aes(lda.LD1, lda.LD2, colour = type), size = 2.5)
• Prediction Accuracy

• The prediction accuracy of LDA is assessed by comparing the predictions from the model output with the actual data.

• First, load the caret package. Then create the prediction result using the train() function.

• We will use the confusionMatrix() command to see the prediction accuracy of the model.

• > library(caret)
• > install.packages("lattice")
• > library(lattice)
• > wine.lda.predict <- train(Type ~ ., method = "lda", data = wine)
• > confusionMatrix(wine$Type, predict(wine.lda.predict, wine))
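
• The same in-sample hit ratio can be cross-checked without caret, using only the fitted MASS model (a quick sanity check on the training data, not a substitute for proper validation):

• > tab <- table(wine$Type, predict(wine.lda)$class)   # actual vs predicted
• > sum(diag(tab)) / sum(tab)   # hit ratio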
• The klaR package provides a visualization tool:

• > install.packages("klaR")
• > library(klaR)
• > partimat(Type ~ Alcohol + Alcalinity, data = wine, method = "lda")

• Add more variables to examine their contribution to the wine type classification:

• > partimat(Type ~ Alcohol + Alcalinity + Ash + Magnesium, data = wine, method = "lda")
• Interpret the Results

• The value of the coefficient for a particular predictor depends on the other predictors included in the discriminant function.

• The signs of the coefficients are arbitrary, but they indicate which variable values result in large and small function values and associate them with particular groups.

• The relative importance of the predictors in discriminating between the groups is measured by examining the absolute magnitudes of the standardized discriminant function coefficients.

• Predictors with relatively large standardized coefficients contribute more to the discriminating power of the function compared with predictors with smaller coefficients, and are, therefore, more important.
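
• A minimal sketch of ranking predictors this way for the wine model; here the overall standard deviation of each predictor is used as a simple stand-in for the pooled within-group standard deviation, so the result is only approximate:

• > sds <- apply(wine[, -1], 2, sd)           # sd of each of the 13 predictors
• > std_coef <- wine.lda$scaling[, 1] * sds   # approximately standardized LD1 coefficients
• > sort(abs(std_coef), decreasing = TRUE)    # larger magnitude = more important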
