Uploaded by calmchandan

Discriminant Analysis

Discriminant analysis helps in discriminating between two or more sets of objects or people based on the knowledge of some of their characteristics. Typical applications:

- Discriminating between the bones or skeletons of males and females
- Dividing people into potential buyers and non-buyers
- Classifying individuals as good or bad credit risks
- Classifying companies as good or bad investment risks
- Classifying consumers as brand loyal or brand switchers

Similarities and Differences between ANOVA, Regression, and Discriminant Analysis

                                         ANOVA         Regression    Discriminant Analysis
Similarities
  Number of dependent variables          One           One           One
  Number of independent variables        Multiple      Multiple      Multiple
Differences
  Nature of the dependent variable       Metric        Metric        Categorical
  Nature of the independent variables    Categorical   Metric        Metric

Discriminant analysis is a technique for analyzing data when the criterion or dependent variable is categorical and the predictor or independent variables are metric in nature. The objectives of discriminant analysis are as follows:

- Development of discriminant functions, or linear combinations of the predictor or independent variables, which will best discriminate between the categories of the criterion or dependent variable (groups).
- Classification of cases to one of the groups based on the values of the predictor variables.
- Evaluation of the accuracy of classification.

When the criterion variable has two categories, the technique is known as two-group discriminant analysis. When three or more categories are involved, the technique is referred to as multiple discriminant analysis.

The discriminant analysis model involves linear combinations of the following form:

$$D = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_k X_k$$

where

- $D$ = discriminant score
- $b_i$ = discriminant coefficients or weights
- $X_i$ = predictors or independent variables

The coefficients, or weights ($b_i$), are estimated so that the groups differ as much as possible on the values of the discriminant function.

Formulate the Problem

- Identify the objectives, the dependent variable, and the independent variables.
- The dependent variable must consist of two or more mutually exclusive and collectively exhaustive categories (for example gender, credit risk, or investment risk).
- The independent variables should be selected based on a theoretical model, previous research, or the experience of the researcher.
- Collect data on the independent variables for each category of the criterion variable.
- One part of the sample, called the estimation or analysis sample, is used for estimation of the discriminant function. The other part, called the holdout or validation sample, is reserved for validating the discriminant function.
- Often the distribution of the number of cases in the analysis and validation samples follows the distribution in the total sample.
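The split into analysis and holdout samples can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name `stratified_split`, the fixed seed, and the 42 placeholder labels are assumptions made for the example. It splits each group separately so that both samples keep the group proportions of the total sample.

```python
import random
from collections import defaultdict


def stratified_split(labels, holdout_fraction, seed=0):
    """Split case indices into analysis and holdout sets, group by group."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    by_group = defaultdict(list)
    for index, label in enumerate(labels):
        by_group[label].append(index)

    analysis, holdout = [], []
    for label in sorted(by_group):
        members = by_group[label][:]
        rng.shuffle(members)
        # Same holdout share in every group keeps the group proportions
        n_holdout = round(len(members) * holdout_fraction)
        holdout.extend(members[:n_holdout])
        analysis.extend(members[n_holdout:])
    return sorted(analysis), sorted(holdout)


# Hypothetical labels: 42 cases, 21 in group 1 and 21 in group 2
labels = [1] * 21 + [2] * 21
analysis_idx, holdout_idx = stratified_split(labels, holdout_fraction=12 / 42)
print(len(analysis_idx), len(holdout_idx))
```

With 42 balanced cases and a holdout share of 12/42, each group contributes 6 cases to the holdout sample and 15 to the analysis sample.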

Example

The aim is to determine the salient characteristics of families that visited a vacation resort during the last two years.

- Data were obtained from a sample of 42 families, of which 30 were included in the analysis sample and 12 in the validation sample.
- Households that visited the resort were coded as 1 and those that did not as 2.
- Both the analysis and holdout samples were balanced in terms of visits.

The independent variables were:

- Family income (V1)
- Attitude towards travel, measured on a 9-point scale (V2)
- Importance attached to family vacation, measured on a 9-point scale (V3)
- Household size (V4)
- Age of the head of the household (V5)

Analysis sample

Columns: No. = case number; Visit = resort visit (1 = visited, 2 = did not visit); Income = annual family income ($000); Travel = attitude toward travel; Vacation = importance attached to family vacation; HSize = household size; Age = age of head of household; Amount = amount spent on family vacation.

No.  Visit  Income  Travel  Vacation  HSize  Age  Amount
 1     1     50.2     5        8        3     43   M (2)
 2     1     70.3     6        7        4     61   H (3)
 3     1     62.9     7        5        6     52   H (3)
 4     1     48.5     7        5        5     36   L (1)
 5     1     52.7     6        6        4     55   H (3)
 6     1     75.0     8        7        5     68   H (3)
 7     1     46.2     5        3        3     62   M (2)
 8     1     57.0     2        4        6     51   M (2)
 9     1     64.1     7        5        4     57   H (3)
10     1     68.1     7        6        5     45   H (3)
11     1     73.4     6        7        5     44   H (3)
12     1     71.9     5        8        4     64   H (3)
13     1     56.2     1        8        6     54   M (2)
14     1     49.3     4        2        3     56   H (3)
15     1     62.0     5        6        2     58   H (3)
16     2     32.1     5        4        3     58   L (1)
17     2     36.2     4        3        2     55   L (1)
18     2     43.2     2        5        2     57   M (2)
19     2     50.4     5        2        4     37   M (2)
20     2     44.1     6        6        3     42   M (2)
21     2     38.3     6        6        2     45   L (1)
22     2     55.0     1        2        2     57   M (2)
23     2     46.1     3        5        3     51   L (1)
24     2     35.0     6        4        5     64   L (1)
25     2     37.3     2        7        4     54   L (1)
26     2     41.8     5        1        3     56   M (2)
27     2     57.0     8        3        2     36   M (2)
28     2     33.4     6        8        2     50   L (1)
29     2     37.5     3        2        3     48   L (1)
30     2     41.3     3        3        2     42   L (1)

Holdout sample

No.  Visit  Income  Travel  Vacation  HSize  Age  Amount
 1     1     50.8     4        7        3     45   M (2)
 2     1     63.6     7        4        7     55   H (3)
 3     1     54.0     6        7        4     58   M (2)
 4     1     45.0     5        4        3     60   M (2)
 5     1     68.0     6        6        6     46   H (3)
 6     1     62.1     5        6        3     56   H (3)
 7     2     35.0     4        3        4     54   L (1)
 8     2     49.6     5        3        5     39   L (1)
 9     2     39.4     6        5        3     44   H (3)
10     2     37.0     2        6        5     51   L (1)
11     2     54.5     7        3        3     37   M (2)
12     2     38.2     2        2        3     49   L (1)

The direct method involves estimating the discriminant function so that all the predictors are included simultaneously. In stepwise discriminant analysis, the predictor variables are entered sequentially, based on their ability to discriminate among groups.

Group Means

Visit    INCOME     TRAVEL    VACATION   HSIZE     AGE
1        60.52000   5.40000   5.80000    4.33333   53.73333
2        41.91333   4.33333   4.06667    2.80000   50.13333
Total    51.21667   4.86667   4.93333    3.56667   51.93333

Group Standard Deviations

Visit    INCOME     TRAVEL    VACATION   HSIZE     AGE
1         9.83065   1.91982   1.82052    1.23443   8.77062
2         7.55115   1.95180   2.05171    0.94112   8.27101
Total    12.79523   1.97804   2.09981    1.33089   8.57395
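The group means and standard deviations can be recomputed from the analysis sample with a few lines of Python. The sketch below is only a cross-check of the SPSS output: the names `VISIT`, `DATA`, `group_summary` and `SUMMARY` are illustrative, and the values are transcribed from the 30 analysis cases in case order (cases 1 to 15 visited the resort, cases 16 to 30 did not). Standard deviations use the sample (n - 1) formula.

```python
from statistics import mean, stdev

# Resort visit for the 30 analysis cases: 1 = visited, 2 = did not visit
VISIT = [1] * 15 + [2] * 15

# Predictor values transcribed from the analysis sample, cases 1-30 in order
DATA = {
    "INCOME": [50.2, 70.3, 62.9, 48.5, 52.7, 75.0, 46.2, 57.0, 64.1, 68.1,
               73.4, 71.9, 56.2, 49.3, 62.0, 32.1, 36.2, 43.2, 50.4, 44.1,
               38.3, 55.0, 46.1, 35.0, 37.3, 41.8, 57.0, 33.4, 37.5, 41.3],
    "TRAVEL": [5, 6, 7, 7, 6, 8, 5, 2, 7, 7, 6, 5, 1, 4, 5,
               5, 4, 2, 5, 6, 6, 1, 3, 6, 2, 5, 8, 6, 3, 3],
    "VACATION": [8, 7, 5, 5, 6, 7, 3, 4, 5, 6, 7, 8, 8, 2, 6,
                 4, 3, 5, 2, 6, 6, 2, 5, 4, 7, 1, 3, 8, 2, 3],
    "HSIZE": [3, 4, 6, 5, 4, 5, 3, 6, 4, 5, 5, 4, 6, 3, 2,
              3, 2, 2, 4, 3, 2, 2, 3, 5, 4, 3, 2, 2, 3, 2],
    "AGE": [43, 61, 52, 36, 55, 68, 62, 51, 57, 45, 44, 64, 54, 56, 58,
            58, 55, 57, 37, 42, 45, 57, 51, 64, 54, 56, 36, 50, 48, 42],
}


def group_summary(values, visit):
    """Return {group: (mean, sample SD)} for groups 1, 2 and 'Total'."""
    summary = {}
    for group in (1, 2):
        chosen = [v for v, g in zip(values, visit) if g == group]
        summary[group] = (mean(chosen), stdev(chosen))
    summary["Total"] = (mean(values), stdev(values))
    return summary


SUMMARY = {name: group_summary(values, VISIT) for name, values in DATA.items()}

for name, stats in SUMMARY.items():
    for group, (m, sd) in stats.items():
        print(f"{name:9s} {str(group):5s} mean={m:9.5f} sd={sd:8.5f}")
```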

Pooled Within-Groups Correlation Matrix

            INCOME     TRAVEL     VACATION   HSIZE      AGE
INCOME      1.00000
TRAVEL      0.19745    1.00000
VACATION    0.09148    0.08434    1.00000
HSIZE       0.08887   -0.01681    …          1.00000
AGE        -0.01431   -0.19709    …         -0.04301    1.00000

Wilks' λ (U-statistic) and univariate F ratio with 1 and 28 degrees of freedom

Variable    Wilks' λ    F         Significance
INCOME      0.45310     33.800    0.0000
TRAVEL      0.92479      2.277    0.1425
VACATION    0.82377      5.990    0.0209
HSIZE       0.65672     14.640    0.0007
AGE         0.95441      1.338    0.2572

Interpretation

- When the predictors are considered individually, only income, importance attached to family vacation, and household size significantly differentiate between those who visited the resort and those who did not (F ratio with k = 1 and n - k - 1 = 30 - 1 - 1 = 28 degrees of freedom).
- Wilks' lambda (also called the U statistic) is the ratio of the within-group sum of squares to the total sum of squares. Its value varies between 0 and 1.
- A small value of lambda indicates that the group means differ, and therefore that the variable has better discriminating power.
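The link between Wilks' lambda and the univariate F ratio can be checked from the summary statistics alone. The sketch below assumes the group and total standard deviations reported in the SPSS output; the function names `wilks_lambda` and `univariate_f` are illustrative. It rebuilds the within-group and total sums of squares from the standard deviations, and then uses the one-way ANOVA identity F = ((1 - λ) / λ) × (df2 / df1).

```python
GROUP_SIZES = (15, 15)

# (group 1 SD, group 2 SD), total SD, taken from the SPSS output
SDS = {
    "INCOME": ((9.83065, 7.55115), 12.79523),
    "TRAVEL": ((1.91982, 1.95180), 1.97804),
    "VACATION": ((1.82052, 2.05171), 2.09981),
    "HSIZE": ((1.23443, 0.94112), 1.33089),
    "AGE": ((8.77062, 8.27101), 8.57395),
}


def wilks_lambda(group_sds, group_sizes, total_sd):
    """Within-group SS divided by total SS for a single variable."""
    ss_within = sum((n - 1) * sd ** 2 for sd, n in zip(group_sds, group_sizes))
    ss_total = (sum(group_sizes) - 1) * total_sd ** 2
    return ss_within / ss_total


def univariate_f(lam, n_cases, n_groups):
    """F ratio implied by a univariate Wilks' lambda, with its degrees of freedom."""
    df1 = n_groups - 1
    df2 = n_cases - n_groups
    return ((1 - lam) / lam) * (df2 / df1), df1, df2


results = {}
for name, (group_sds, total_sd) in SDS.items():
    lam = wilks_lambda(group_sds, GROUP_SIZES, total_sd)
    f_ratio, df1, df2 = univariate_f(lam, n_cases=30, n_groups=2)
    results[name] = (lam, f_ratio, df1, df2)
    print(f"{name:9s} lambda={lam:.5f} F={f_ratio:7.3f} df=({df1}, {df2})")
```

Up to rounding of the reported standard deviations, the values agree with the Wilks' lambda and F columns of the table.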

The null hypothesis that, in the population, the means of discriminant functions in both groups are equal can be statistically tested. In SPSS this test is based on Wilks' λ. If the null hypothesis is rejected, indicating significant discrimination, one can proceed to interpret the results.

CANONICAL DISCRIMINANT FUNCTIONS

Function   Eigenvalue   % of Variance   Cum %    Canonical Correlation
1*         1.7862       100.00          100.00   0.8007

After Function   Wilks' λ   Chi-square   df   Significance
0                0.3589     26.130       5    0.0001
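For a single discriminant function, the other statistics in this table follow from the eigenvalue. The sketch below is an illustration under stated assumptions: the function name `single_function_summary` is made up for the example, and the chi-square uses Bartlett's approximation, -(n - 1 - (p + g) / 2) ln λ with p(g - 1) degrees of freedom, which reproduces the reported value here but is not confirmed by the source as the exact formula SPSS applies.

```python
import math


def single_function_summary(eigenvalue, n_cases, n_predictors, n_groups=2):
    """Wilks' lambda, canonical correlation and chi-square for one function."""
    wilks = 1.0 / (1.0 + eigenvalue)
    canonical_r = math.sqrt(eigenvalue / (1.0 + eigenvalue))
    # Bartlett's chi-square approximation (assumed, see the lead-in)
    multiplier = n_cases - 1 - (n_predictors + n_groups) / 2
    chi_square = -multiplier * math.log(wilks)
    df = n_predictors * (n_groups - 1)
    return wilks, canonical_r, chi_square, df


wilks, canonical_r, chi_square, df = single_function_summary(
    eigenvalue=1.7862, n_cases=30, n_predictors=5
)
print(f"Wilks' lambda = {wilks:.4f}")
print(f"Canonical correlation = {canonical_r:.4f}")
print(f"Chi-square = {chi_square:.3f} with {df} df")
```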

* marks the 1 canonical discriminant function remaining in the analysis.

Standardized Canonical Discriminant Function Coefficients

            FUNC 1
INCOME      0.74301
TRAVEL      0.09611
VACATION    0.23329
HSIZE       0.46911
AGE         0.20922

Structure Matrix: pooled within-groups correlations between discriminating variables and canonical discriminant functions (variables ordered by size of correlation within function)

            FUNC 1
INCOME      0.82202
HSIZE       0.54096
VACATION    0.34607
TRAVEL      0.21337
AGE         0.16354

Unstandardized Canonical Discriminant Function Coefficients

             FUNC 1
INCOME       0.08476710
TRAVEL       0.04964455
VACATION     0.1202813
HSIZE        0.4273893
AGE          0.02454380
(constant)  -7.975476

Canonical discriminant functions evaluated at group means (group centroids)

Group   FUNC 1
1        1.29118
2       -1.29118
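The standardized and unstandardized coefficients are related through the pooled within-group standard deviation of each predictor: each standardized coefficient is the unstandardized coefficient multiplied by that standard deviation. The sketch below checks this against the reported output; the names `pooled_sd`, `UNSTANDARDIZED` and `GROUP_SDS` are illustrative, and the group standard deviations are those from the SPSS output.

```python
import math

GROUP_SIZES = (15, 15)

# Unstandardized coefficients and (group 1 SD, group 2 SD) per predictor
UNSTANDARDIZED = {
    "INCOME": 0.08476710, "TRAVEL": 0.04964455, "VACATION": 0.1202813,
    "HSIZE": 0.4273893, "AGE": 0.02454380,
}
GROUP_SDS = {
    "INCOME": (9.83065, 7.55115), "TRAVEL": (1.91982, 1.95180),
    "VACATION": (1.82052, 2.05171), "HSIZE": (1.23443, 0.94112),
    "AGE": (8.77062, 8.27101),
}


def pooled_sd(group_sds, group_sizes):
    """Pooled within-group standard deviation of one predictor."""
    ss_within = sum((n - 1) * sd ** 2 for sd, n in zip(group_sds, group_sizes))
    return math.sqrt(ss_within / (sum(group_sizes) - len(group_sizes)))


standardized = {
    name: b * pooled_sd(GROUP_SDS[name], GROUP_SIZES)
    for name, b in UNSTANDARDIZED.items()
}
for name, value in standardized.items():
    print(f"{name:9s} {value:.5f}")
```

The results match the standardized coefficients reported by SPSS to within rounding.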

Classification results for cases selected for use in the analysis

                                Predicted Group Membership
Actual Group   No. of Cases     1              2
Group 1        15               12 (80.0%)      3 (20.0%)
Group 2        15                0 (0.0%)      15 (100.0%)

Interpretation

Since the significance level of 0.0001 is less than 0.05, we reject the null hypothesis that the group means of the discriminant function are equal. The discriminant function therefore discriminates significantly between the two groups.

The unstandardized discriminant function is

$$D = -7.975476 + 0.08476710\,\text{INCOME} + 0.04964455\,\text{TRAVEL} + 0.1202813\,\text{VACATION} + 0.4273893\,\text{HSIZE} + 0.02454380\,\text{AGE}$$

Group centroids are the values of the discriminant function at the group means. The average of the two group centroids gives the cutoff point (the two groups are of equal size here). The group centroids for the resort example are:

Group   Centroid
1        1.29118
2       -1.29118

The average of the two centroids is 0. Therefore any positive discriminant score leads to classification as Resort Visit, and any negative score to No Resort Visit.
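The centroids and the cutoff can be reproduced by evaluating the unstandardized function at the group means. The following is a minimal sketch; the names `discriminant_score`, `GROUP_MEANS` and `classify` are illustrative, the coefficients and group means are those reported in the SPSS output, and the midpoint cutoff assumes equal group sizes as in this example.

```python
COEFFICIENTS = {
    "INCOME": 0.08476710, "TRAVEL": 0.04964455, "VACATION": 0.1202813,
    "HSIZE": 0.4273893, "AGE": 0.02454380,
}
CONSTANT = -7.975476

# Group means of the predictors from the SPSS output
GROUP_MEANS = {
    1: {"INCOME": 60.52000, "TRAVEL": 5.40000, "VACATION": 5.80000,
        "HSIZE": 4.33333, "AGE": 53.73333},
    2: {"INCOME": 41.91333, "TRAVEL": 4.33333, "VACATION": 4.06667,
        "HSIZE": 2.80000, "AGE": 50.13333},
}


def discriminant_score(case):
    """Unstandardized discriminant score D for one case (dict of predictors)."""
    return CONSTANT + sum(b * case[name] for name, b in COEFFICIENTS.items())


# Centroid = discriminant function evaluated at the group means
centroids = {group: discriminant_score(m) for group, m in GROUP_MEANS.items()}
# Midpoint cutoff, appropriate here because both groups have 15 cases
cutoff = sum(centroids.values()) / len(centroids)


def classify(case):
    """Group 1 (resort visit) above the cutoff, group 2 (no visit) otherwise."""
    return 1 if discriminant_score(case) > cutoff else 2


for group, value in centroids.items():
    print(f"Group {group} centroid: {value:.5f}")
print(f"Cutoff: {cutoff:.5f}")
```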

Classification results for cases not selected for use in the analysis (holdout sample)

                                Predicted Group Membership
Actual Group   No. of Cases     1              2
Group 1        6                4 (66.7%)      2 (33.3%)
Group 2        6                0 (0.0%)       6 (100.0%)

Interpretation of Results

- Find out the percentage of cases correctly classified by the model.
- Find out which variables are relatively better at discriminating between the groups.
- Decide how to classify a new subject into one of the groups.

We can obtain some idea of the relative importance of the variables by examining the absolute magnitude of the standardized discriminant function coefficients. Some idea of the relative importance of the predictors can also be obtained by examining the structure correlations, also called canonical loadings or discriminant loadings. These simple correlations between each predictor and the discriminant function represent the variance that the predictor shares with the function. Another aid to interpreting discriminant analysis results is to develop a characteristic profile for each group by describing each group in terms of the group means for the predictor variables.

The discriminant weights, estimated by using the analysis sample, are multiplied by the values of the predictor variables in the holdout sample to generate discriminant scores for the cases in the holdout sample. The cases are then assigned to groups based on their discriminant scores and an appropriate decision rule. The hit ratio, or the percentage of cases correctly classified, can then be determined by summing the diagonal elements and dividing by the total number of cases.
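The validation step can be sketched in Python using the resort example. The holdout values below are read from the holdout table, assuming its predictor columns follow the same order as in the analysis sample (the extracted header is incomplete); the names `HOLDOUT`, `predict`, `confusion` and `hit_ratio` are illustrative. The weights are the unstandardized coefficients estimated on the analysis sample, and the decision rule is the cutoff of 0 between the two centroids.

```python
# (actual group, INCOME, TRAVEL, VACATION, HSIZE, AGE) for the 12 holdout cases
HOLDOUT = [
    (1, 50.8, 4, 7, 3, 45), (1, 63.6, 7, 4, 7, 55), (1, 54.0, 6, 7, 4, 58),
    (1, 45.0, 5, 4, 3, 60), (1, 68.0, 6, 6, 6, 46), (1, 62.1, 5, 6, 3, 56),
    (2, 35.0, 4, 3, 4, 54), (2, 49.6, 5, 3, 5, 39), (2, 39.4, 6, 5, 3, 44),
    (2, 37.0, 2, 6, 5, 51), (2, 54.5, 7, 3, 3, 37), (2, 38.2, 2, 2, 3, 49),
]

# Unstandardized weights estimated on the analysis sample
WEIGHTS = (0.08476710, 0.04964455, 0.1202813, 0.4273893, 0.02454380)
CONSTANT = -7.975476
CUTOFF = 0.0  # midpoint of the two group centroids


def predict(values):
    """Assign group 1 if the discriminant score exceeds the cutoff, else group 2."""
    score = CONSTANT + sum(w * x for w, x in zip(WEIGHTS, values))
    return 1 if score > CUTOFF else 2


# Classification matrix keyed by (actual group, predicted group)
confusion = {(a, p): 0 for a in (1, 2) for p in (1, 2)}
for actual, *values in HOLDOUT:
    confusion[(actual, predict(values))] += 1

# Hit ratio: diagonal elements divided by the total number of cases
hit_ratio = (confusion[(1, 1)] + confusion[(2, 2)]) / len(HOLDOUT)
print(confusion)
print(f"Hit ratio: {hit_ratio:.1%}")
```

Under these assumptions the classification matrix agrees with the holdout results reported by SPSS (4 of 6 visitors and 6 of 6 non-visitors correctly classified).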

Stepwise discriminant analysis is analogous to stepwise multiple regression in that the predictors are entered sequentially based on their ability to discriminate between the groups. An F ratio is calculated for each predictor by conducting a univariate analysis of variance in which the groups are treated as the categorical variable and the predictor as the criterion variable. The predictor with the highest F ratio is the first to be selected for inclusion in the discriminant function, if it meets certain significance and tolerance criteria. A second predictor is added based on the highest adjusted or partial F ratio, taking into account the predictor already selected.

Each predictor selected is tested for retention based on its association with other predictors selected. The process of selection and retention is continued until all predictors meeting the significance criteria for inclusion and retention have been entered in the discriminant function. The selection of the stepwise procedure is based on the optimizing criterion adopted. The Mahalanobis procedure is based on maximizing a generalized measure of the distance between the two closest groups. The order in which the variables were selected also indicates their importance in discriminating between the groups.
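The first entry step can be illustrated with the univariate F ratios from the resort example. This sketch covers only that first step: the 0.05 entry threshold and the name `entry_candidates` are assumptions for the example, the tolerance criterion is omitted, and later steps would need partial F ratios recomputed from the data after each entry.

```python
# Univariate F ratio and significance for each predictor (from the SPSS output)
UNIVARIATE_F = {
    "INCOME": (33.800, 0.0000),
    "TRAVEL": (2.277, 0.1425),
    "VACATION": (5.990, 0.0209),
    "HSIZE": (14.640, 0.0007),
    "AGE": (1.338, 0.2572),
}


def entry_candidates(f_table, alpha=0.05):
    """Predictors meeting the significance criterion, highest F ratio first."""
    eligible = [(name, f) for name, (f, p) in f_table.items() if p < alpha]
    eligible.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in eligible]


candidates = entry_candidates(UNIVARIATE_F)
first_entered = candidates[0]
print("Eligible at step 1:", candidates)
print("Entered first:", first_entered)
```

On these figures INCOME would be entered first, since it has the highest univariate F ratio among the predictors that meet the assumed threshold.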

SPSS Windows

The DISCRIMINANT program performs both two-group and multiple discriminant analysis. To select this procedure using SPSS for Windows click: Analyze>Classify>Discriminant

- Eigenvalue = between-group SS / within-group SS
- Wilks' lambda = within-group SS / total SS
- Canonical R = correlation between estimated Y and actual Y
