
Discriminant Analysis

Classify objects into two or more groups based on knowledge of some characteristics or attributes related to them.
• Theory of Discriminant Analysis

• Linear discriminant analysis (LDA) is used for pattern recognition and classification by forming a linear combination of the features or characteristics of objects or customers. The resulting combination may be used as a linear classifier.

• LDA approaches the problem by assuming that the class-conditional probability density functions are both normally distributed, each with its own mean and covariance parameters.

• Under this assumption, the Bayes optimal solution is to predict the class of a point by comparing the log of the likelihood ratio with some threshold T.
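
• Written out (a standard formulation; the density notation f1, f2 for the two classes and the direction of the ratio are assumptions here, not given in the slides): assign an observation x to the second class when

  ln( f2(x) / f1(x) ) > T

  and to the first class otherwise.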
• Applications

• Bankruptcy prediction
– Based on financial variables (e.g. ratios), discriminant analysis was first used to explain which firms entered bankruptcy and which survived.

• Marketing
– Determine the factors which distinguish different types of customers
and products on the basis of surveys or other forms of collected data.

• In discriminant analysis the dependent variable is categorical and the independent variables are interval in nature.
– When the dependent variable has two categories, the technique is known as two-group discriminant analysis.
– When three or more categories are involved, the technique is referred to as multiple discriminant analysis.
• Discriminant analysis can answer the following questions:

– Based on demographic characteristics, how do customers who exhibit store loyalty differ from those who do not?

– Do heavy, medium and light users of soft drinks differ in terms of their consumption of frozen foods?

– What psychographic characteristics help to differentiate between price-sensitive and non-price-sensitive buyers of groceries?

– Do different market segments differ in their media consumption habits?

– What is the difference between heavy patrons of regional department store chains and heavy patrons of national chains, based on lifestyle?
• Objectives of discriminant analysis are as follows:

– Development of discriminant functions of the predictor or independent variables, which will best discriminate between the categories of the criterion or dependent variable (groups)

– Examination of whether significant differences exist among the groups, in terms of the predictor variables

– Determination of which predictor variables contribute to most of the intergroup differences

– Classification of cases to one of the groups based on the values of the predictor variables

– Evaluation of the accuracy of classification


• The discriminant analysis model involves a linear combination of the following form: D = b1X1 + b2X2 + b3X3 + ... + bkXk

• where D = discriminant score, the b's = discriminant coefficients or weights, and the X's = predictor or independent variables.

• The coefficients or weights (b) are estimated so that the groups differ as much as possible on the values of the discriminant function, which occurs when the ratio of the between-group sum of squares to the within-group sum of squares for the discriminant scores is at a maximum.

• Any other linear combination of the predictors will result in a smaller ratio.
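
• A minimal sketch in R of computing such scores D = b1X1 + ... + bkXk, using hypothetical weights and data (the numbers are made up for illustration, not estimated from any sample):

• # discriminant scores as a weighted sum of predictors
• > b <- c(0.5, -1.2, 0.8)                      # hypothetical weights b1, b2, b3
• > X <- matrix(rnorm(15), nrow = 5, ncol = 3)  # 5 cases measured on 3 predictors
• > D <- as.vector(X %*% b)                     # one discriminant score per case
• > D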
• A geometrical exposition of two-group (G1 and G2) discriminant analysis, with members measured on two variables X1 and X2, is shown in the figure, where X1 and X2 are the two axes, members of G1 are denoted by 1, and members of G2 by 2.

• The resulting ellipses encompass some specified percentage of the points (members), say 93 percent in each group.

• A straight line is drawn through the two points where the ellipses intersect and then projected onto a new axis, D.

• The overlap between the univariate distributions G1 and G2, represented by the shaded area, is smaller than would be obtained with any other line drawn through the ellipses representing the scatterplots. Thus, the groups differ as much as possible on the D axis.
• Statistics associated with discriminant analysis are as follows:

• The centroid is the mean value of the discriminant scores for a particular group.
– There are as many centroids as there are groups, because there is one for each group.
– The means for a group on all the functions are the group centroids.
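
• A minimal sketch in R, using hypothetical discriminant scores and group labels (illustrative values, not from any real data):

• # centroids are the mean discriminant scores per group
• > scores <- c(1.2, 0.8, -0.3, -0.9, -1.1)
• > group <- factor(c("A", "A", "B", "B", "B"))
• > tapply(scores, group, mean)   # one centroid per group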

• Discriminant function coefficients are the multipliers of the variables, when the variables are in the original units of measurement.

• Discriminant scores are obtained by multiplying each coefficient by the value of the corresponding variable, summing these products, and adding the constant term.
• The classification matrix (also called the confusion or prediction matrix) contains the number of correctly classified and misclassified cases.

– Correctly classified cases appear on the diagonal, because the predicted and actual groups are the same.

– Off-diagonal elements represent the cases that have been incorrectly classified.

– The sum of the diagonal elements divided by the total number of cases represents the hit ratio.
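
• A minimal sketch in R of the hit ratio, using hypothetical actual and predicted group labels:

• # confusion matrix: rows = actual groups, columns = predicted groups
• > actual <- factor(c(1, 1, 2, 2, 2, 3, 3))
• > predicted <- factor(c(1, 2, 2, 2, 3, 3, 3))
• > cm <- table(actual, predicted)
• > sum(diag(cm)) / sum(cm)   # hit ratio = 5/7, about 0.71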

• The proportion of trace gives the percentage separation achieved by each discriminant function.
• Wine Classification

• The wine data set has 13 chemical concentrations describing wine samples from three cultivars.

• A cultivar is a variety of a plant developed from a natural species and maintained under cultivation.

• The objective is to find the linear combinations of the original variables (13 chemical concentrations) that give the best possible separation between the groups (wine cultivars).

• To separate the wines by cultivar: the wines come from three different cultivars, so the number of groups is G = 3, and the number of variables is 13 (chemical concentrations), i.e. p = 13.
• Wine Classification

• The maximum number of useful discriminant functions that can separate the wines by cultivar is the minimum of G − 1 and p, so in this case it is the minimum of 2 and 13.

• Thus, we can find at most 2 useful discriminant functions to separate the wines by cultivar, using the 13 chemical concentration variables.
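
• The same count, worked out in R:

• > G <- 3    # number of groups (cultivars)
• > p <- 13   # number of predictor variables
• > min(G - 1, p)   # returns 2: at most two useful discriminant functions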

• Install these packages:

– install.packages("MASS")
– install.packages("car")
– install.packages("caret")
– install.packages("klaR")
– install.packages("rattle.data")
• Load the wine data set, which ships with the rattle.data package:

• > data(wine, package = 'rattle.data')
• > attach(wine)
• > head(wine)
• > str(wine)

• The wine data show that the Type variable is categorical, consisting of 3 categories.

• Notice that there are more than 2 categories; thus linear discriminant analysis (in its multiple-group form) is the appropriate method in this case.

• The purpose of the linear discriminant analysis is to find the combination of the variables that gives the best possible separation between the groups (wine cultivars) in our data set.

• For convenience, the values of each discriminant function are scaled so that their mean is zero and their variance is one.

• > library(MASS)   # lda() comes from the MASS package installed earlier
• > wine.lda <- lda(Type ~ ., data = wine)
• > wine.lda

• The first linear discriminant function from the result is

– LD1 = −0.403∗Alcohol + 0.165∗Malic − 0.369∗Ash + 0.155∗Alcalinity − 0.002∗Magnesium + 0.618∗Phenols − 1.66∗Flavanoids − 1.496∗Nonflavanoids + 0.134∗Proanthocyanins + 0.355∗Color − 0.818∗Hue − 1.15∗Dilution − 0.003∗Proline
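
• These coefficients can also be read directly from the fitted object; MASS::lda() stores them in the scaling component:

• > wine.lda$scaling[, 1]   # raw coefficients of the first discriminant function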

• “Proportion of trace” gives the percentage separation achieved by each discriminant function.

– The first DF achieves a good separation between the three groups (three cultivars), but the second DF improves the separation of the groups by quite a large amount, so it is worth using the second DF as well. Therefore, to achieve a good separation of the groups (cultivars), it is necessary to use both of the first two discriminant functions.
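
• The proportions of trace can be recomputed from the singular values that the fitted lda object stores in its svd component:

• > prop <- wine.lda$svd^2 / sum(wine.lda$svd^2)
• > round(prop, 4)   # proportion of separation achieved by each function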
• Stacked Histogram of the LDA Values

• A stacked histogram displays the result of the linear discriminant analysis.

• Make predictions with the fitted LDA model and store them as an object. The predict() function generates values from the selected model.

• The number of predicted values corresponds to the number of cases in the processed data.

• Then use the ldahist() function to plot the histogram.

• > wine.lda.values <- predict(wine.lda)
• > ldahist(wine.lda.values$x[,1], g = Type)

• Make a stacked histogram of the second discriminant function’s values to see how it separates the cultivars:

• > ldahist(wine.lda.values$x[,2], g = Type)

• Make a scatterplot of the best two discriminant functions, with the data points labelled by cultivar, by typing:

• # make a scatterplot
• > plot(wine.lda.values$x[,1], wine.lda.values$x[,2])

• # add labels
• > text(wine.lda.values$x[,1], wine.lda.values$x[,2], wine$Type, cex=0.7, pos=4, col="red")
• Scatterplots of Discriminant Functions


• The wines from the three cultivars are well separated in the scatterplot.

– The 1st DF (x-axis) separates cultivars 1 and 3 very well, but does not perfectly separate cultivars 1 and 2, or cultivars 2 and 3.
– The 2nd DF (y-axis) achieves a fairly good separation of cultivars 1 and 3, and cultivars 2 and 3, although it is not totally perfect.
– To achieve a very good separation of the three cultivars, it is best to use both the 1st and 2nd DF together, since the 1st DF can separate cultivars 1 and 3 very well, and the 2nd DF can separate cultivars 1 and 2, and cultivars 2 and 3, reasonably well.
• For more advanced graphics, use the ggplot2 package, which works with data frame objects.

• Create a new object containing the Type variable from the wine data and the values from wine.lda.values, converted to a data frame:

• > newdata <- data.frame(type = wine[,1], lda = wine.lda.values$x)
• > install.packages("ggplot2")
• > library(ggplot2)
• > ggplot(newdata) + geom_point(aes(lda.LD1, lda.LD2, colour = type), size = 2.5)
• Prediction Accuracy

• The prediction accuracy of LDA is assessed by comparing the predictions from the model output with the actual data.

• First, load the caret package. Then create the prediction result using the train() function.

• We will use the confusionMatrix() command to see the prediction accuracy of the model.

• > library(caret)
• > install.packages("lattice")
• > library(lattice)
• > wine.lda.predict <- train(Type ~ ., method = "lda", data = wine)
• > confusionMatrix(wine$Type, predict(wine.lda.predict, wine))
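
• The same in-sample hit ratio can be cross-checked without caret, using only the fitted MASS model (a quick sanity check on the training data, not a substitute for proper validation):

• > tab <- table(wine$Type, predict(wine.lda)$class)   # actual vs predicted
• > sum(diag(tab)) / sum(tab)   # hit ratio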
• The klaR package provides a visualization tool:

• > install.packages("klaR")
• > library(klaR)
• > partimat(Type ~ Alcohol + Alcalinity, data = wine, method = "lda")

• Add more variables to examine their contribution to the wine type classification:

• > partimat(Type ~ Alcohol + Alcalinity + Ash + Magnesium, data = wine, method = "lda")
• Interpret the Results

• The value of the coefficient for a particular predictor depends on the other predictors included in the discriminant function.

• The signs of the coefficients are arbitrary, but they indicate which variable values result in large and small function values and associate them with particular groups.

• The relative importance of the predictors in discriminating between the groups is measured by examining the absolute magnitudes of the standardized discriminant function coefficients.

• Predictors with relatively large standardized coefficients contribute more to the discriminating power of the function compared with predictors with smaller coefficients, and are, therefore, more important.
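
• A minimal sketch of ranking predictors this way for the wine model; here the overall standard deviation of each predictor is used as a simple stand-in for the pooled within-group standard deviation, so the result is only approximate:

• > sds <- apply(wine[, -1], 2, sd)           # sd of each of the 13 predictors
• > std_coef <- wine.lda$scaling[, 1] * sds   # approximately standardized LD1 coefficients
• > sort(abs(std_coef), decreasing = TRUE)    # larger magnitude = more important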
