Session 5: Factor Analysis Handout

Multivariate Analysis

Factor Analysis

- Several variables are to be studied in multivariate analysis.
- These variables may or may not be mutually independent of each other.
- Some may hold strong correlations with other variables; multi-collinearity may exist among variables.
- Data analysis methods in this situation are called Interdependence Methods.

- Factor Analysis: to reduce several correlated variables into a few uncorrelated, meaningful factors.
- Cluster Analysis: to classify individual elements of the population into a few homogeneous groups.

Research Studies

- Several variables are to be studied.
- Purpose is to establish a cause-and-effect relationship.
- One dependent (effect) variable and several independent (cause) variables.
- Data are obtained on them from a sample.
- Data analysis methods in such situations are called Dependence Methods.

Dependent Variable   Independent Variables   Method
Metric               Categorical             Analysis of Variance
Metric               Metric                  Multiple Regression
Categorical          Categorical             Canonical Correlation
Categorical          Metric                  Multiple Discriminant Analysis

                                        ANOVA        Discriminant Analysis   Regression
Similarities:
  Number of dependent variables         One          One                     One
  Number of independent variables       Many         Many                    Many
Differences:
  Nature of the dependent variable      Metric       Categorical             Metric
  Nature of the independent variables   Categorical  Metric                  Metric

Major multivariate methods:
1. Factor Analysis
2. Cluster Analysis
3. Multivariate Discriminant Analysis
4. Multivariate Regression Analysis.

Factor Analysis

- To define the underlying structure among the variables in the analysis.
- Examines the interrelationships among a large number of variables and then attempts to explain them in terms of their common underlying dimensions, referred to as factors.
- Examines the entire set of interdependent relationships without making any distinction between dependent and independent variables.
- Reduces the total number of variables in the research study to a smaller number of factors by combining a few correlated variables into a factor.

What is a Factor

A factor is a linear combination of the observed original variables V1, V2, ..., Vn:

Fi = Wi1V1 + Wi2V2 + Wi3V3 + ... + WinVn

where
Fi = the ith factor (i = 1, 2, ..., m; m < n)
Wi = weight (factor score coefficient)
n = number of original variables.
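As a minimal runnable sketch of this definition (NumPy assumed; the weights and standardized variable values below are hypothetical illustrations, not taken from the handout), a factor score is simply a weighted sum, i.e. a dot product:

```python
import numpy as np

# Hypothetical factor score coefficients W_i1..W_i6 for one factor F_i,
# and one respondent's standardized values for V1..V6 (illustrative only).
weights = np.array([0.36, 0.00, 0.35, -0.02, -0.35, 0.05])
values = np.array([1.2, -0.5, 0.8, 0.1, -1.0, 0.3])

# F_i = W_i1*V1 + W_i2*V2 + ... + W_in*Vn
F_i = float(np.dot(weights, values))
print(F_i)  # ≈ 1.075
```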

Factor Analysis

- Discovers a smaller set of uncorrelated factors (m) to represent the original set of correlated variables (n) significantly (m < n).
- These factors do not have multi-collinearity, i.e. they are orthogonal to each other.
- They can then be used in further multivariate analysis (regression or discriminant analysis).

Example # 1

- Evaluate credit card usage & behavior of customers.

Initial set of variables is large: Age, Gender, Marital Status, Income, Education, Employment Status, Credit History, Family Background: Total 8 variables

Example # 1

Reduction of 8 variables into 3 factors (m = 3):
- Factor 1: heavy weightage for age, gender & marital status and low weightages to other variables.

Example # 1

These 3 uncorrelated factors can be identified by the common characteristics of the variables with heavy weightages & named accordingly as follows:
- Factor 1: (age, gender, marital status) as Demographic Status
- Factor 2: (income, education, employment status) as Socio-economic Status

- Factor 2: heavy weightage for income, education, employment status & low weightages to others.

- Factor 3: heavy weightage for credit history & family background and low weightages to other variables.

Example # 2

- Evaluate customer motivation for buying a two-wheeler.
- Initial set of variables is large:
  1. Affordable
  2. Sense of freedom
  3. Economical
  4. Man's vehicle
  5. Feel powerful
  6. Friends jealous
  7. Feel good to see ad of this brand
  8. Comfortable ride
  9. Safe travel
  10. Ride for three.

Example # 2

Reduction of 10 variables to 3 factors:
- Pride: (man's vehicle, feel powerful, sense of freedom, friends jealous, feel good to see ad of this brand)

- Enlist all variables that can be important in resolving the research problem.
- Collect metric data on each variable from all subjects sampled.
- Convert all data on each variable into standard format (mean: 0 & std. dev.: 1), since different variables may have different units of measurement.
- SPSS / SAS etc. do this automatically.
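A minimal sketch of this standardization step (NumPy assumed; the sample values are illustrative, not from the handout):

```python
import numpy as np

# Raw scores of 4 respondents on 3 variables (illustrative values with
# different scales).
X = np.array([[7.0, 3.0, 6.0],
              [1.0, 3.0, 2.0],
              [6.0, 2.0, 7.0],
              [4.0, 5.0, 4.0]])

# Standardize each variable (column) to mean 0 and standard deviation 1,
# the step SPSS / SAS perform automatically before factor analysis.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
```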

1. It determines the minimum number of factors that can comfortably represent all variables in the research study

- Obviously, the maximum number of factors equals the total number of variables.

2. It converts correlated variables into the desired number of uncorrelated factors. Tool: Principal Component Method.

- SPSS gives inter-variable correlations.
- PCM assists in checking the appropriateness of factor analysis (Bartlett's test).

Example # 3

- To determine the benefits consumers seek from purchase of a toothpaste.
- A sample of 30 persons was interviewed.

- Assists in checking adequacy of sample size (KMO test).
- Gives initial eigenvalues.
- They determine the minimum number of factors that can represent all variables.

- Respondents were asked to indicate their degree of agreement with the following statements using a 7-point scale (1 = strongly agree, 7 = strongly disagree):

V1: Buy a toothpaste that prevents cavities
V2: Like a toothpaste that gives shiny teeth
V3: Toothpaste should strengthen your gums
V4: Prefer toothpaste that freshens breath
V5: Prevention of tooth decay is not an important benefit
V6: Most important concern is attractive teeth

Data obtained are given in the next slide.

Respondent   V1    V2    V3    V4    V5    V6
 1          7.00  3.00  6.00  4.00  2.00  4.00
 2          1.00  3.00  2.00  4.00  5.00  4.00
 3          6.00  2.00  7.00  4.00  1.00  3.00
 4          4.00  5.00  4.00  6.00  2.00  5.00
 5          1.00  2.00  2.00  3.00  6.00  2.00
 6          6.00  3.00  6.00  4.00  2.00  4.00
 7          5.00  3.00  6.00  3.00  4.00  3.00
 8          6.00  4.00  7.00  4.00  1.00  4.00
 9          3.00  4.00  2.00  3.00  6.00  3.00
10          2.00  6.00  2.00  6.00  7.00  6.00
11          6.00  4.00  7.00  3.00  2.00  3.00
12          2.00  3.00  1.00  4.00  5.00  4.00
13          7.00  2.00  6.00  4.00  1.00  3.00
14          4.00  6.00  4.00  5.00  3.00  6.00
15          1.00  3.00  2.00  2.00  6.00  4.00
16          6.00  4.00  6.00  3.00  3.00  4.00
17          5.00  3.00  6.00  3.00  3.00  4.00
18          7.00  3.00  7.00  4.00  1.00  4.00
19          2.00  4.00  3.00  3.00  6.00  3.00
20          3.00  5.00  3.00  6.00  4.00  6.00
21          1.00  3.00  2.00  3.00  5.00  3.00
22          5.00  4.00  5.00  4.00  2.00  4.00
23          2.00  2.00  1.00  5.00  4.00  4.00
24          4.00  6.00  4.00  6.00  4.00  7.00
25          6.00  5.00  4.00  2.00  1.00  4.00
26          3.00  5.00  4.00  6.00  4.00  7.00
27          4.00  4.00  7.00  2.00  2.00  5.00
28          3.00  7.00  2.00  6.00  4.00  3.00
29          4.00  6.00  3.00  7.00  2.00  7.00
30          2.00  3.00  2.00  4.00  7.00  2.00

Variables     V1      V2      V3      V4      V5      V6
V1          1.000
V2         -0.530   1.000
V3          0.873  -0.155   1.000
V4         -0.086   0.572  -0.248   1.000
V5         -0.858   0.020  -0.778  -0.007   1.000
V6          0.004   0.640  -0.018   0.640  -0.136   1.000
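A sketch (NumPy assumed) that re-enters the rounded correlation matrix above and extracts its eigenvalues; the eigenvalues sum to the trace (6, the number of variables) and, matching the handout, exactly two exceed 1:

```python
import numpy as np

# Correlation matrix for V1..V6, entered from the handout (rounded values).
R = np.array([
    [ 1.000, -0.530,  0.873, -0.086, -0.858,  0.004],
    [-0.530,  1.000, -0.155,  0.572,  0.020,  0.640],
    [ 0.873, -0.155,  1.000, -0.248, -0.778, -0.018],
    [-0.086,  0.572, -0.248,  1.000, -0.007,  0.640],
    [-0.858,  0.020, -0.778, -0.007,  1.000, -0.136],
    [ 0.004,  0.640, -0.018,  0.640, -0.136,  1.000],
])

# Eigenvalues of the symmetric matrix, sorted largest first.
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
total_variance = float(eigvals.sum())      # equals trace(R) = 6
n_retained = int((eigvals > 1).sum())      # Kaiser criterion -> 2 factors
```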

Bartlett's Test

- For valid factor analysis, many variables must be correlated with each other.

- That means, if each original variable is completely independent of each of the remaining n-1 variables, there is no need to perform factor analysis.


- i.e., if there is zero correlation among all variables.
- H0: The correlation matrix is a unit (identity) matrix.

Unit Matrix

        V1   V2   V3   ...   Vn
V1       1    0    0   ...    0
V2       0    1    0   ...    0
V3       0    0    1   ...    0
...     ...  ...  ...  ...   ...
Vn       0    0    0   ...    1

Bartlett's Test

- For valid factor analysis, many variables must be correlated with each other.
- H0: The correlation matrix is a unit matrix.
- Here, SPSS gives p level < 0.05.
- Reject H0 with 95% level of confidence.
- So, the correlation matrix is not a unit matrix.
Conclusion: Factor analysis can be validly done.

KMO Test

- The Kaiser-Meyer-Olkin measure of sampling adequacy in this case = 0.660.
- Values of KMO between 0.5 and 1.0 suggest that the sample is adequate for carrying out factor analysis. Otherwise, we must draw an additional sample.
- Here, 0.660 > 0.5.
- Conclusion: The sample is adequate.
- Thus, these two tests together confirm the appropriateness of factor analysis.

Initial Eigenvalues

Factor   Eigenvalue   % of variance   Cumulative %
1          2.731         45.520          45.520
2          2.218         36.969          82.488
3          0.442          7.360          89.848
4          0.341          5.688          95.536
5          0.183          3.044          98.580
6          0.085          1.420         100.000

Eigenvalue

- Variance of each standardized variable is 1.
- Total variance in the study = number of variables (here 6).

Fi = Wi1V1 + Wi2V2 + Wi3V3 + ... + Wi6V6

- Each original variable has eigenvalue = 1 due to standardization.
- So, factors with eigenvalue < 1 are no better than a single variable.
- Only factors with eigenvalue ≥ 1 are retained.
- The Principal Component Method determines the least number of factors that explain the maximum variance.

- The variance explained by a factor is called the eigenvalue of that factor.
- It depends on (a) the weights for different variables and (b) the correlations between the factor & each variable (called Factor Loadings).
- The higher the eigenvalue of the factor, the bigger the amount of variance explained by the factor.

- Selects weights (i.e. factor score coefficients) in such a manner that the first factor explains the largest portion of the total variance:

F1 = W11V1 + W12V2 + W13V3 + ... + W1nVn

Factor   Eigenvalue   % of Variance   Cumulative %
1          2.731         45.520          45.520
2          2.218         36.969          82.488

F2 = W21V1 + W22V2 + W23V3 + ... + W2nVn

- so that the second factor accounts for most of the residual variance, subject to being uncorrelated with the first factor.
- The process goes on till the cumulative variance explained crosses a desired level, usually 60%.
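The extraction logic above (keep factors with eigenvalue ≥ 1; stop once cumulative variance passes roughly 60%) can be sketched in plain Python with the eigenvalues of Example #3:

```python
# Initial eigenvalues from Example #3 (6 standardized variables).
eigenvalues = [2.731, 2.218, 0.442, 0.341, 0.183, 0.085]
total = len(eigenvalues)  # total variance = number of variables

# Kaiser criterion: retain factors whose eigenvalue is at least 1.
kaiser = sum(1 for e in eigenvalues if e >= 1)

# Cumulative-variance criterion: add factors until >= 60% is explained.
cum, needed = 0.0, 0
for e in eigenvalues:
    cum += e / total
    needed += 1
    if cum >= 0.60:
        break

print(kaiser, needed)  # 2 2 (the first two factors explain ~82.5%)
```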


Factor Rotation

- The initial factor matrix rarely results in factors that can be easily interpreted.

Factor Matrix

Variables   Factor 1   Factor 2
V1            0.928      0.253
V2           -0.301      0.795
V3            0.936      0.131
V4           -0.342      0.789
V5           -0.869     -0.351
V6           -0.177      0.871

- Therefore, through a process of rotation, the initial factor matrix is transformed into a simpler matrix that is easier to interpret.

- It helps identify which factors are strongly associated with which original variables.

Rotation of Factors

In factor rotation, the reference axes of the factors are turned about the origin until some other position has been reached.

1. Orthogonal rotation: axes are maintained at 90 degrees.
2. Oblique rotation: axes are not maintained at 90 degrees.

Unrotated factor solutions extract factors in order of how much variance they account for, with each subsequent factor accounting for less variance. The ultimate effect of rotating the factor matrix is therefore to redistribute the variance from earlier factors to later ones, to achieve a simpler, theoretically more meaningful factor pattern.

[Figure: loadings of V1-V6 plotted against the unrotated Factor I and Factor II axes, with the rotated axes overlaid for an orthogonal rotation and an oblique rotation.]

Orthogonal rotation methods:
o are the most widely used rotational methods.
o are the preferred method when the research goal is data reduction to either a smaller number of variables or a set of uncorrelated measures for subsequent use in other multivariate techniques.

Oblique rotation methods:
o are best suited to the goal of obtaining several theoretically meaningful factors or constructs because, realistically, very few constructs in the real world are uncorrelated.

Simplification means attempting to drive loadings toward zero either:
o in rows (variables, i.e. maximizing a variable's loading on a single factor) - making as many values in each row as close to zero as possible, OR
o in columns (factors, i.e. making the number of high loadings as few as possible) - making as many values in each column as close to zero as possible.

Factor Rotation

In rotating the factors, we would like each factor to have significant loadings or coefficients for only some of the variables. The process of rotation is called orthogonal rotation if the axes are maintained at right angles. Let us see how it is done.

Let us take a simpler illustration. Suppose the factor loadings of 2 variables on 2 factors are:

        Factor 1   Factor 2
V1        0.6        0.7
V2        0.5       -0.5

- Variation explained by V1 = (0.6)² + (0.7)² = 0.85
- Variation explained by V2 = (0.5)² + (-0.5)² = 0.50

- None of the loadings is too large or too small to reach any meaningful conclusion.
- Let us rotate the two axes & see what happens.

[Figure: V1 and V2 plotted on the Factor 1 / Factor 2 axes, before and after rotation of the axes.]

After rotation, the factor loadings of the 2 variables on the 2 factors are:

        Factor 1   Factor 2
V1       -0.2        0.9
V2        0.7        0.1

- Variation explained by V1 = (-0.2)² + (0.9)² = 0.85
- Variation explained by V2 = (0.7)² + (0.1)² = 0.50

- Note that the variation explained remains unchanged.
- Now some of the loadings are distinctly large and others distinctly small.
- So we can reach a meaningful conclusion.
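A sketch of orthogonal rotation in NumPy, using the widely known SVD-based varimax algorithm (a reimplementation, not the handout's SPSS output), applied to the unrotated factor matrix of Example #3. Because the rotation is orthogonal, each variable's communality (row sum of squared loadings) is unchanged:

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate a loadings matrix with the classic SVD-based varimax scheme."""
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - (gamma / p) * L @ np.diag(np.diag(L.T @ L))))
        R = u @ vt
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:  # criterion stopped improving
            break
    return loadings @ R

# Unrotated factor matrix of Example #3 (rows V1..V6, columns F1, F2).
unrotated = np.array([
    [ 0.928,  0.253],
    [-0.301,  0.795],
    [ 0.936,  0.131],
    [-0.342,  0.789],
    [-0.869, -0.351],
    [-0.177,  0.871],
])

rotated = varimax(unrotated)
# Orthogonal rotation preserves each variable's communality.
communalities = (unrotated ** 2).sum(axis=1)
```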

Rotated Factor Matrix

Variables   Factor 1   Factor 2
V1            0.962     -0.027
V2           -0.057      0.848
V3            0.934     -0.146
V4           -0.098      0.845
V5           -0.933     -0.084
V6            0.083      0.885

Factor score coefficients:

Variables   Factor 1   Factor 2
V1            0.358      0.011
V2           -0.001      0.375
V3            0.345     -0.043
V4           -0.017      0.377
V5           -0.350     -0.059
V6            0.052      0.395

Fi = Wi1V1 + Wi2V2 + Wi3V3 + ... + Wi6V6

In Example #3:

F1 = 0.358V1 - 0.001V2 + 0.345V3 - 0.017V4 - 0.350V5 + 0.052V6

Interpretation of Factors

A factor can then be interpreted in terms of the variables that load high on it in the rotated factor matrix.
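As a sketch, the "load high" rule can be applied mechanically with the rotated loadings and the common |loading| > 0.5 cutoff:

```python
# Rotated factor loadings (Factor 1, Factor 2) for V1..V6, from the
# rotated factor matrix of Example #3.
loadings = {
    "V1": ( 0.962, -0.027),
    "V2": (-0.057,  0.848),
    "V3": ( 0.934, -0.146),
    "V4": (-0.098,  0.845),
    "V5": (-0.933, -0.084),
    "V6": ( 0.083,  0.885),
}

# Group each variable under the factor it loads on beyond |0.5|.
factor1 = [v for v, (f1, f2) in loadings.items() if abs(f1) > 0.5]
factor2 = [v for v, (f1, f2) in loadings.items() if abs(f2) > 0.5]
print(factor1)  # ['V1', 'V3', 'V5'] -> the "Health" factor
print(factor2)  # ['V2', 'V4', 'V6'] -> the "Aesthetic" factor
```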

FACTOR 1 has high coefficients for:
- V1: Buy a toothpaste that prevents cavities
- V3: Toothpaste should strengthen your gums
- V5: Prevention of tooth decay is not an important benefit (note: coefficient is negative)
FACTOR 1 may be labelled as the Health Factor.

Interpretation of Factors

F2 = 0.011V1 + 0.375V2 - 0.043V3 + 0.377V4 - 0.059V5 + 0.395V6

Conclusion

From the data gathered from 30 respondents on 6 basic variables, the most important benefits consumers seek from the purchase of a toothpaste are HEALTH and AESTHETICS. Health has 45.5% importance; Aesthetics has 36.9% importance.

FACTOR 2 has high coefficients for:
- V2: Like a toothpaste that gives shiny teeth
- V4: Prefer toothpaste that freshens breath
- V6: Most important concern is attractive teeth
FACTOR 2 may be labelled as the Aesthetic Factor.

- Sometimes, we do not want to discover new factors but want to stick to the original variables and know which ones are important.
- By examining the factor matrix, we could select for each factor just one variable with the highest loading for that factor, if possible.
- That variable could then be used as a surrogate variable for the associated factor.

- V1 has the highest loading on F1.
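A sketch of that surrogate-variable selection: for each factor, pick the variable with the largest absolute rotated loading:

```python
variables = ["V1", "V2", "V3", "V4", "V5", "V6"]
# Rotated loadings (Factor 1, Factor 2) per variable, from Example #3.
loadings = [( 0.962, -0.027), (-0.057,  0.848), ( 0.934, -0.146),
            (-0.098,  0.845), (-0.933, -0.084), ( 0.083,  0.885)]

surrogates = []
for f in range(2):  # one surrogate per factor
    best = max(range(len(variables)), key=lambda i: abs(loadings[i][f]))
    surrogates.append(variables[best])

print(surrogates)  # ['V1', 'V6']: V1 stands in for F1, V6 for F2
```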

Assumptions

Multicollinearity: assessed using MSA (measure of sampling adequacy).

Factor analysis is performed most often only on metric variables, although specialized methods exist for the use of dummy variables. A small number of dummy variables can be included in a set of metric variables that are factor analyzed.

If a study is being designed to reveal factor structure, strive to have at least five variables for each proposed factor.

For sample size:
o the sample must have more observations than variables.
o the minimum absolute sample size is 50 observations.
o aim for a number of observations per variable of at least five, and preferably ten.

The KMO statistic predicts whether data are likely to factor well, based on correlations and partial correlations.

KMO can be used to identify which variables to drop from the factor analysis because they lack multicollinearity.

There is a KMO statistic for each individual variable, and these combine into the overall KMO statistic. KMO varies from 0 to 1.0.

Overall KMO should be > 0.50 to proceed with factor analysis. If it is not, remove the variable with the lowest individual KMO statistic value one at a time until the overall KMO rises above 0.50 and each individual variable's KMO is above 0.50.

There must be a strong conceptual foundation to support the assumption that a structure does exist before the factor analysis is performed. A statistically significant Bartlett's test of sphericity (sig. < .05) indicates that sufficient correlations exist among the variables to proceed. Measure of Sampling Adequacy (MSA) values must exceed .50 for both the overall test and each individual variable. Variables with values < 0.50 should be omitted from the factor analysis one at a time, with the smallest one being omitted each time.

Although both component and common factor analysis models yield similar results in common research settings (>30 variables or communalities of >0.60 for most variables):

- the component analysis model is most appropriate when data reduction is paramount.
- the common factor model is best in well-specified theoretical applications.

Use several stopping criteria to determine the initial number of factors to retain:
- Factors with eigenvalues > 1.0.
- A pre-determined number of factors based on research objectives and/or prior research.
- Enough factors to meet a specified % of variance explained, usually > 60%.
- Factors shown by the scree test to have substantial amounts of common variance (i.e., factors before the inflection point).
- More factors when there is heterogeneity among sample subgroups.

Consideration of several alternative solutions (one more and one less factor than the initial solution) to ensure the best structure is identified.

Factor loadings greater than ±0.50 are considered necessary for practical significance.

An optimal structure exists when all variables have high loadings only on a single factor. Variables that cross-load (load highly on two or more factors) are usually deleted unless theoretically justified. Variables should generally have communalities of >0.50 to be retained in the analysis. Re-specification of a factor analysis can include options such as: o deleting a variable(s), o changing rotation methods, and/or o increasing or decreasing the number of factors.

To be considered significant:

o A smaller loading (e.g. ±0.30) is enough with either a larger sample size or a larger number of variables being analyzed.
o A larger loading (e.g. ±0.50 and above) is needed with a smaller sample size.

These guidelines are conservative and should be considered only as starting points for including a variable for further consideration.
