Series F

TQM Training Module on


Discriminant Analysis
Doc. No. F-09.01.20180602
Revision 01; 02nd June 2018
Authors: Pratik Kothari

For further clarifications, write to kothari.prartik@jsw.in © Total Quality Management, JSW Group
TQM Training Series: 6 series with 66 training modules
This is a training module on Discriminant Analysis (F-09)

Series-A: Basic Problem Solving Tools
A-01 Flow Charts
A-02 Cause & Effects Diagram
A-03 Stratification
A-04 Scatter Diagram
A-05 Control Charts
A-06 Check Sheets
A-07 Histogram
A-08 Pareto Charts
A-09 Graphs

Series-B: Basic Management Tools
B-01 Brainstorming
B-02 Affinity Diagram
B-03 Arrow Diagram
B-04 Tree Diagram
B-05 PDPC
B-06 Matrix Diagram
B-07 Matrix Data Analysis
B-08 Relation Diagram

Series-C: Quality Management Basics
C-01 Quality Mgmt. Basics
C-02 Basic Statistics
C-03 Statistical Process Control
C-04 KPI Drill Down
C-05 KPI Benchmarking
C-06 Strategic Analysis Tools
C-07 Policy Management
C-08 Policy Diagnosis
C-09 Daily Management
C-10 Daily Mgmt. in Maintenance
C-11 Cross Functional Mgmt.
C-12 Quality Assurance Basics
C-13 MSA
C-14 PFD, FMEA, Control Plan
C-15 Cost of Poor Quality (COPQ)
C-16 Improvement Fundamentals
C-17 4i Methodology
C-18 5S
C-19 Quality Circles
C-20 QC Story Approach
C-21 Kaizen, OPL, Poka Yoke

Series-D: Productivity & Efficiency Tools
D-01 Value Stream Mapping (VSM)
D-02 Time & Motion Study
D-03 SMED
D-04 Wrench Time Analysis
D-05 Queuing Theory
D-06 Inventory Management
D-07 Linear Programming Problem
D-08 Game Theory
D-09 OEE
D-10 PERT & CPM

Series-E: Decision-making Tools
E-01 Quality Function Deployment
E-02 Fault Tree Analysis
E-03 AHP & Paired Analysis
E-04 Pugh Matrix
E-05 Time Series Analysis

Series-F: Advanced Statistical Tools
F-01 Sampling & Distribution
F-02 Hypothesis Testing
F-03 Regression
F-04 Basics of DoE
F-05 Factorial DoE
F-06 Principal Component Analysis
F-07 Cluster Analysis
F-08 Conjoint Analysis
F-09 Discriminant Analysis
F-10 Factor Analysis
F-11 Response Surface Method
F-12 Taguchi DoE
F-13 Weibull Analysis

Flow of the module: Discriminant Analysis

1 Introduction to Discriminant Analysis
2 Objective of Discriminant Analysis
3 Examples

Introduction to Discriminant Analysis

Questions!!

• How do you decide whether
– a student should be admitted to college or not?
– a product is commercial or not?
– a person is suffering from a particular disease or not?

• The loans department of a bank wants to assess the creditworthiness of applicants before disbursing loans, in order to screen the applicants and prevent bad debt in the future. How do they do that?

History

• In the 1930s, three different people (R.A. Fisher in the UK, Hotelling in the US and Mahalanobis in India) were trying to solve the same problem of classifying objects, based on a set of features, via three different approaches.

• Later, their methods (Fisher's discriminant function, Hotelling's T² test and Mahalanobis' D² distance) were combined to devise Discriminant Analysis.

What is Discriminant analysis?

• Discriminant analysis is a multivariate statistical technique used for classifying a set of observations into pre-defined groups.
• It works with data that is already classified into groups to derive rules for classifying new (as yet unclassified) individuals on the basis of their observed variable values.
• E.g. when judging whether a pie is properly baked in an oven, the oven temperature and the baking duration are the observed variable values.

Why discriminant analysis?

To classify observations into two or more groups when you have a sample with known groups.
Using this analysis, you can do the following:
• Determine how accurately the observations are classified into the known groups
• Evaluate how the predictor variables differentiate the groups
• Predict the groups for observations whose groups are unknown
Objective of Discriminant Analysis

• To understand group differences and to predict the likelihood that a particular entity will belong to a particular class or group, based on the independent variables.

• To develop discriminant functions, i.e. linear combinations of the independent variables that discriminate between the categories of the dependent variable as well as possible.

• To examine whether significant differences exist among the groups in terms of the predictor variables.

• To evaluate the accuracy of the classification.

Predictor Variable?

• Variables in an experiment that affect the response and can be set or measured by the experimenter are called predictor, explanatory, or independent variables.

• E.g. the temperature of the oven while cooking, or the temperature of the furnace while making hot metal.
Example

Example 1

High school administrators assign each student to one of three educational tracks:
1 - for above-average students who can learn independently and have strong math and language skills
2 - for average students who learn best with a moderate amount of teacher attention and have average math and language skills
3 - for students who require substantial interaction with the teacher and have weak math and language skills

An intelligence test and a motivation assessment were administered to 60 students from each track. School officials want to know whether the students' intelligence test and motivation assessment scores accurately classify student placement.

The school administrator wants to create a model that classifies future students into one of the three educational tracks, based on the tracks of the current students. The model uses the data of the 180 current students.
Doing it in Minitab

1. Choose Stat > Multivariate > Discriminant Analysis.
2. In Groups, enter Track.
3. In Predictors, enter Test Score and Motivation.
4. Under Discriminant Function, select Linear.
Interpreting Results

Summary of classification

                  True Group
Put into Group      1      2      3
1                  59      5      0
2                   1     53      3
3                   0      2     57
Total N            60     60     60
N correct          59     53     57
Proportion      0.983  0.883  0.950

N = 180    N Correct = 169    Proportion Correct = 0.939

The diagonal entries (59, 53, 57) are the numbers of observations grouped correctly; the off-diagonal entries are the numbers of observations grouped incorrectly.

93.9% of the observations were placed in their true group. This high proportion indicates that the model has a good ability to group the observations correctly.
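The summary figures above can be recomputed directly from the classification counts. The following is a minimal sketch in plain Python; the function name `classification_summary` is only illustrative and is not part of Minitab.

```python
# Rows: group the model put the observation into; columns: true group.
# Counts are taken from the "Summary of classification" table above.
counts = [
    [59, 5, 0],
    [1, 53, 3],
    [0, 2, 57],
]

def classification_summary(counts):
    """Return per-group totals, correct counts, proportions and the overall proportion."""
    n_groups = len(counts)
    # Total N for a true group is the sum of its column.
    totals = [sum(row[j] for row in counts) for j in range(n_groups)]
    # Correctly grouped observations sit on the diagonal.
    correct = [counts[j][j] for j in range(n_groups)]
    proportions = [c / t for c, t in zip(correct, totals)]
    overall = sum(correct) / sum(totals)
    return totals, correct, proportions, overall

totals, correct, proportions, overall = classification_summary(counts)
print(totals, correct)
print([round(p, 3) for p in proportions], round(overall, 3))
```

The same function works for any number of groups, as long as the table is square and laid out with predicted groups in rows and true groups in columns.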
Interpreting Results

Squared Distance Between Groups

         1         2         3
1    0.0000   12.9853   48.0911
2   12.9853    0.0000   11.3197
3   48.0911   11.3197    0.0000

• The distance between groups 1 and 3 is the largest (48.09), indicating good discrimination between these two groups.
• The distance between groups 1 and 2 is 12.98, indicating somewhat better discrimination between groups 1 and 2 than between groups 2 and 3.
• The distance between groups 2 and 3 is the smallest of all pairs (11.32). These groups are closest to each other, so discrimination between them is weakest.

The squared distance reported here is the squared Mahalanobis distance. Mahalanobis distance is a measure of the distance between a point P and a distribution D, introduced by P. C. Mahalanobis in 1936. It is a multi-dimensional generalization of the idea of measuring how many standard deviations away P is from the mean of D. This distance is zero if P is at the mean of D, and grows as P moves away from the mean: along each principal component axis, it measures the number of standard deviations from P to the mean of D. If each of these axes is rescaled to have unit variance, then Mahalanobis distance corresponds to standard Euclidean distance in the transformed space. Mahalanobis distance is thus unitless and scale-invariant, and takes into account the correlations of the data set.
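To make the definition concrete, here is a minimal sketch of the squared Mahalanobis distance for two variables, written in plain Python. The mean and covariance values in the usage lines are hypothetical and are not taken from the student data; the function name is illustrative.

```python
def squared_mahalanobis(point, mean, cov):
    """Squared Mahalanobis distance (x - m)^T S^-1 (x - m) for two variables.

    point, mean: pairs of numbers; cov: 2x2 covariance matrix as nested lists.
    """
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det

# Hypothetical distribution: standard deviations 10 and 5.
mean = (100.0, 50.0)
uncorrelated = [[100.0, 0.0], [0.0, 25.0]]
correlated = [[100.0, 30.0], [30.0, 25.0]]

print(squared_mahalanobis((100.0, 50.0), mean, uncorrelated))  # point at the mean
print(squared_mahalanobis((120.0, 50.0), mean, uncorrelated))  # 2 standard deviations away
print(squared_mahalanobis((120.0, 60.0), mean, correlated))    # correlation shortens the distance
```

With the uncorrelated covariance, the point (120, 60) would have a squared distance of 8; with positive correlation between the two variables the same point is less unusual, and the squared distance drops to 5. This is what "takes into account the correlations" means in practice.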
Interpreting Results

Linear Discriminant Function for Groups

                   1         2         3
Constant     -237.85   -170.58   -115.65
IQ Score        1.57      1.19      0.90
Motivation      5.15      4.66      4.00

Group 1 = -237.85 + 1.57 (IQ score) + 5.15 (Motivation)
Group 2 = -170.58 + 1.19 (IQ score) + 4.66 (Motivation)
Group 3 = -115.65 + 0.90 (IQ score) + 4.00 (Motivation)

The group with the largest discriminant function contributes most to the classification of observations.

Now, if you have the IQ score and motivation level of a new student and you want to classify him/her into one of the groups, you use the linear discriminant functions for the groups. Put the IQ score and motivation value into each equation. The student belongs to the group whose equation gives the highest value.
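The classification rule described above can be sketched in a few lines of Python. The coefficients are the rounded values from the table; the student scores in the usage lines are hypothetical, and the function names are illustrative. Because the coefficients are rounded to two decimals, results for students very close to a group boundary may differ from Minitab's.

```python
# (constant, IQ score coefficient, motivation coefficient) per group,
# copied from the "Linear Discriminant Function for Groups" table.
COEFFICIENTS = {
    1: (-237.85, 1.57, 5.15),
    2: (-170.58, 1.19, 4.66),
    3: (-115.65, 0.90, 4.00),
}

def discriminant_scores(iq, motivation):
    """Evaluate each group's linear discriminant function for one student."""
    return {
        group: const + c_iq * iq + c_mot * motivation
        for group, (const, c_iq, c_mot) in COEFFICIENTS.items()
    }

def classify(iq, motivation):
    """Return the group whose discriminant function gives the highest value."""
    scores = discriminant_scores(iq, motivation)
    return max(scores, key=scores.get)

# Hypothetical new students (not from the data set).
for iq, motivation in [(130, 55), (110, 47), (90, 40)]:
    print(iq, motivation, classify(iq, motivation))
```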
Interpreting Results

                True    Pred             Squared
Observation     Group   Group   Group    Distance   Probability
4**             1       2       1           3.524         0.438
                                2           3.028         0.562
                                3          25.579         0.000
65**            2       1       1           2.764         0.677
                                2           4.244         0.323
                                3          29.419         0.000
71**            2       1       1           3.357         0.592
                                2           4.101         0.408
                                3          27.097         0.000
78**            2       1       1           2.327         0.775
                                2           4.801         0.225
                                3          29.695         0.000
79**            2       1       1           1.528         0.891
                                2           5.732         0.109
                                3          32.524         0.000
100**           2       1       1           5.016         0.878
                                2           8.962         0.122
                                3          38.213         0.000

The analysis suggests that the school administrator wrongly placed Observation 4 in group 1. Its estimated probability of belonging to group 2 (0.562) is higher than that of group 1 (0.438). Moreover, the squared distance of observation 4 from group 2 (3.028) is smaller than its distance from group 1 (3.524), indicating affinity towards group 2. The probability of observation 4 belonging to group 3 is 0, and its squared distance from group 3 is also high (25.579).
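The Probability column can be related to the Squared Distance column. The sketch below assumes equal prior probabilities for the groups and a common covariance matrix (the usual linear discriminant setting), in which case the probability of each group is proportional to exp(-0.5 × squared distance). This is an assumption about how the figures are linked, not a statement of Minitab's internal procedure, but the reported values are consistent with it.

```python
import math

def posterior_from_squared_distances(sq_distances):
    """Group probabilities from squared distances, assuming equal priors.

    sq_distances: dict mapping group label -> squared distance to that group.
    """
    weights = {g: math.exp(-0.5 * d2) for g, d2 in sq_distances.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

# Observation 4 from the table above.
obs4 = {1: 3.524, 2: 3.028, 3: 25.579}
for group, p in posterior_from_squared_distances(obs4).items():
    print(group, round(p, 3))
```

For observation 4 this reproduces the tabulated probabilities to three decimals, which shows why the group with the smallest squared distance is also the predicted group.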
Developing Thumb Rule

A thumb rule can be developed by observing the scatter plot: for example, students with an IQ score above 125 and a Motivation score above 50 are grouped in Track 1, and so on.
Example 2

Salmon from Alaska and Canada are born in freshwater and swim to the ocean after about a year. After about 2 years in the ocean, they return to their place of birth to spawn and die. They are harvested in the ocean when they are about to return as mature fish. To help regulate catches of salmon stocks, the fish sampled during the harvest must be identified as being of Alaskan or Canadian origin. Fish carry information about their origin in the growth rings of their scales: the rings associated with freshwater are smaller for Alaskan-born fish than for Canadian-born fish. Fifty fish from each place of origin were caught, and the growth ring diameters of their scales were measured for the time when they lived in freshwater and for the subsequent time when they lived in saltwater. The goal is to be able to identify newly caught fish as being from Alaskan or Canadian stocks.

The data is in Exh_mvar.MTW in the Minitab sample data folder.



Doing it in Minitab

1. Enter the column containing the group ID data.
2. Enter the columns containing the predictor variable data.
3. Click OK.
Summary of classification

                  True Group
Put into Group    Alaska   Canada
Alaska                44        1
Canada                 6       49
Total N               50       50
N correct             44       49
Proportion         0.880    0.980

N = 100    N Correct = 93    Proportion Correct = 0.930

For the fish data:
• Group Alaska has a correct placement of 88.0% (0.880)
• Group Canada has a correct placement of 98.0% (0.980)
Overall, 93 out of 100 fish, or 93.0% (0.930), are correctly placed.

Squared Distance Between Groups

          Alaska    Canada
Alaska         0   8.29187
Canada   8.29187         0

The squared distance between the two groups is 8.29187. The larger the distance, the more distinct the groups are.

Linear Discriminant Function for Groups

              Alaska   Canada
Constant     -100.68   -95.14
Marine          0.38     0.33
Freshwater      0.37     0.50
Summary of Misclassified Observations

These observations are wrongly grouped, as the squared distances show: the smaller the distance to a group, the greater the affinity to that group.

                                                   Squared
Observation   True Group   Pred Group   Group     Distance   Probability
1**           Alaska       Canada       Alaska       3.544         0.428
                                        Canada       2.960         0.572
2**           Alaska       Canada       Alaska      8.1131         0.019
                                        Canada      0.2729         0.981
12**          Alaska       Canada       Alaska      4.7470         0.118
                                        Canada      0.7270         0.882
13**          Alaska       Canada       Alaska      4.7470         0.118
                                        Canada      0.7270         0.882
30**          Alaska       Canada       Alaska       3.230         0.289
                                        Canada       1.429         0.711
32**          Alaska       Canada       Alaska       2.271         0.464
                                        Canada       1.985         0.536
71**          Canada       Alaska       Alaska       2.045         0.948
                                        Canada       7.849         0.052

The Probability column shows the estimated probability that a particular observation falls into each of the groups.
Developing Thumb Rule

[Figure: Scatterplot of Freshwater vs Marine, with points marked by SalmonOrigin (Alaska, Canada)]

Two thumb rules can be developed from the scatter plot:
• A fish with a ring diameter < 368 in marine and a ring diameter > 108 in freshwater can be safely classified as a Canadian fish.
• A fish with a ring diameter > 368 in marine and a ring diameter < 108 in freshwater can be safely classified as an Alaskan fish.
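The two thumb rules can be written as a small decision function. This is a minimal sketch using the cut-offs 368 (marine) and 108 (freshwater) quoted on the slide; the function name and the "undecided" label for fish that match neither rule are assumptions made here for illustration. Fish in the undecided region would need the linear discriminant functions instead.

```python
def thumb_rule_origin(marine, freshwater):
    """Apply the scatter-plot thumb rules to one fish's ring diameters."""
    if marine < 368 and freshwater > 108:
        return "Canada"
    if marine > 368 and freshwater < 108:
        return "Alaska"
    # Neither rule applies: the thumb rules make no safe call here.
    return "undecided"

# Hypothetical ring diameters (not taken from the data set).
print(thumb_rule_origin(350, 140))
print(thumb_rule_origin(450, 90))
print(thumb_rule_origin(400, 120))
```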
Please login to:
tqm.jsw.in
to read the training modules
THANK YOU
