Synthesis of Hair, J.F., et al. (2006), Multivariate Data Analysis, Pearson, Prentice Hall, 6th ed., Ch. 3
1. Factor analysis has three primary objectives:
o Identification of the structure of relationships among either variables or respondents.
o Identification of representative variables from a much larger set of variables for use in subsequent multivariate analyses.
o Creation of an entirely new, much smaller set of variables to partially or completely replace the original set of variables in subsequent multivariate techniques.
1. There are two approaches to calculating the correlation matrix, which determine the type of factor analysis performed:
o R-type factor analysis: the input data matrix is computed from correlations between variables.
o Q-type factor analysis: the input data matrix is computed from correlations between individual respondents.
2. Variables in factor analysis are generally metric. Dummy variables may be used in special circumstances.
3. The researcher should minimize the number of variables included in the analysis, but include enough variables to represent each proposed factor (i.e., five or more per factor).
The sample size should be 100 or larger. Sample sizes between 50 and 100 may be analyzed but with extreme caution. The ratio of observations to variables should be at least 5 to 1 in order to provide the most stable results.
2. Factor analysis assumes the use of metric data, although dummy variables (coded 0-1) may be used in special cases. Factor analysis does not require multivariate normality; normality is necessary only if the researcher wishes to apply statistical tests for the significance of factors.
3. The data matrix must have sufficient correlations to justify the use of factor analysis. Rule of thumb: a substantial number of correlations greater than .30 is needed. Tests of appropriateness:
o the anti-image correlation matrix of partial correlations,
o the Bartlett test of sphericity, and
o the measure of sampling adequacy (MSA greater than .50).
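These appropriateness checks can be sketched in code. The following is a minimal illustration, not Hair et al.'s own procedure; the function names are assumptions made for the example.

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(R, n):
    """Bartlett test of sphericity: does the correlation matrix R
    (p x p, from n observations) differ from an identity matrix?"""
    p = R.shape[0]
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return chi2, stats.chi2.sf(chi2, df)

def msa_overall(R):
    """Overall measure of sampling adequacy (Kaiser-Meyer-Olkin);
    values above .50 support the use of factor analysis."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)   # anti-image partial correlations
    np.fill_diagonal(partial, 0.0)
    r = R.copy()
    np.fill_diagonal(r, 0.0)
    return (r ** 2).sum() / ((r ** 2).sum() + (partial ** 2).sum())
```

For a 2 x 2 correlation matrix with r = .60 and n = 100, for instance, the Bartlett statistic is large and its p-value essentially zero, so the hypothesis of no correlation among the variables would be rejected.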
o While the factors contain mostly common variance the plot line declines sharply, but once specific variance becomes too large the plot line turns horizontal.
o The point where the line becomes horizontal indicates the appropriate number of factors.
o The scree test almost always suggests more factors than the latent root criterion.
Heterogeneity of the Respondents
o In a heterogeneous sample, the first factors extracted are those that are more homogeneous across the entire sample.
o The factors that best discriminate among subgroups in the sample are extracted later in the analysis.
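The latent root and scree criteria can be illustrated numerically. The two-factor data below are simulated for the example; all loadings and sample sizes are assumptions, not values from the text.

```python
import numpy as np

# Simulated responses with two underlying factors (all values are
# assumptions made for this illustration).
rng = np.random.default_rng(0)
factors = rng.normal(size=(300, 2))
loadings = np.array([[.8, 0], [.7, 0], [.6, 0],
                     [0, .8], [0, .7], [0, .6]])
X = factors @ loadings.T + 0.5 * rng.normal(size=(300, 6))

R = np.corrcoef(X, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]  # order of extraction

# Latent root criterion: keep factors whose eigenvalue exceeds 1,
# i.e. factors explaining at least as much variance as one variable.
n_factors = int((eigenvalues > 1).sum())

# Scree data: plot eigenvalues against factor number and look for the
# point where the declining line turns horizontal (the elbow).
print(np.round(eigenvalues, 2), "-> retain", n_factors, "factors")
```

With this simulated structure the first two eigenvalues clearly exceed 1 while the rest fall below it, so both criteria point to two factors.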
1. Interpretation is assisted through selection of a rotational method. Rotation redistributes the variance from earlier factors to later factors by turning the axes of the factors about the origin until a new position is reached. Rotation is used as an aid in explaining the factors by providing a simpler structure in the factor matrix; it will not change the amount of variance extracted or the number of factors extracted. There are two general types of rotation, orthogonal and oblique. Orthogonal rotation maintains the axes at 90 degrees, thus the factors are uncorrelated. Three orthogonal approaches operate on different aspects of the factor matrix:
o QUARTIMAX attempts to simplify the rows of the matrix so that variables load highly on a single factor. This tends to create a large general factor.
o VARIMAX simplifies the columns of the factor matrix, indicating a clearer association and separation among variables and factors.
o EQUIMAX is a compromise between VARIMAX and QUARTIMAX in that it simplifies both the rows and the columns.
Oblique rotation methods such as OBLIMIN (SPSS) and PROMAX (SAS) allow correlated factors.
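As a sketch of what an orthogonal rotation does, below is a compact implementation of the VARIMAX criterion (Kaiser's algorithm in its common SVD formulation); the starting loading matrix is a made-up example. Rotation only turns the axes, so communalities and total variance extracted stay the same.

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-8):
    """Orthogonally rotate loading matrix L (variables x factors) to
    simplify its columns (Kaiser's VARIMAX, SVD formulation)."""
    p, k = L.shape
    T = np.eye(k)            # accumulated rotation, always orthogonal
    crit = 0.0
    for _ in range(max_iter):
        B = L @ T
        G = L.T @ (B ** 3 - B * (np.sum(B ** 2, axis=0) / p))
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt
        if s.sum() < crit * (1 + tol):
            break
        crit = s.sum()
    return L @ T, T

# A made-up simple-structure matrix, deliberately rotated 30 degrees away:
simple = np.array([[.80, .00], [.60, .00], [.00, .75], [.00, .50]])
theta = np.deg2rad(30)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
recovered, T = varimax(simple @ mix)
```

Because T remains orthogonal (the axes stay at 90 degrees), each variable's communality is identical before and after rotation; only the distribution of loadings across factors changes.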
2. Criteria for Practical and Statistical Significance of Factor Loadings
Magnitude for practical significance: factor loadings can be classified by magnitude:
o Greater than ±.30: minimum consideration level.
o ±.40: more important.
o ±.50: practically significant (the factor accounts for 25% of the variance in the variable).
Power and statistical significance: given the sample size, the researcher may determine the level of factor loading necessary to be significant at a predetermined power level. For example, in a sample of 100 at an 80% power level, factor loadings of .55 and above are significant. The loading level necessary for significance varies with several factors:
o Increases in the number of variables decrease the level needed for significance.
o Increases in the sample size decrease the level necessary to consider a loading significant.
o Increases in the number of factors extracted increase the level necessary to consider a loading significant.
3. Interpreting a Factor Matrix: look for a clear factor structure indicated by significant loadings on a single factor and high communalities. Variables that load across factors or that have low loadings or communalities may be candidates for deletion. Naming the factor is based on an interpretation of the factor loadings:
o Significant loadings: the variables that load most significantly on each factor should be used in naming the factors. The variables' magnitude and strength provide meaning to the factors.
o Impact of the rotation: the selection of a rotation method affects the interpretation of the loadings. With orthogonal rotation, each variable's loading on one factor is independent of its loading on another factor. With oblique rotation, this independence is not preserved and interpretation becomes more complex.
4. Respecification should always be considered. Some methods are:
o Deletion of a variable(s) from the analysis.
o Employing a different rotational method for interpretation.
o Extraction of a different number of factors.
o Employing a different extraction method.
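The sample-size guideline for loading significance (e.g. loadings of .55 for a sample of 100 at 80% power) can be captured as a lookup. The table values below follow Hair et al.'s published guideline as commonly reproduced; verify them against the text before relying on them.

```python
# Hair et al.'s guideline (as commonly reproduced; .05 significance,
# 80% power): minimum sample size at which a loading becomes significant.
MIN_N = {0.30: 350, 0.35: 250, 0.40: 200, 0.45: 150, 0.50: 120,
         0.55: 100, 0.60: 85, 0.65: 70, 0.70: 60, 0.75: 50}

def significant_loading(n):
    """Smallest loading magnitude considered significant for n observations."""
    for loading in sorted(MIN_N):
        if n >= MIN_N[loading]:
            return loading
    return None  # fewer than 50 observations: no loading qualifies

print(significant_loading(100))  # matches the .55 example in the text
```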
1. Validation assesses 1) the degree of generalizability of the findings and 2) the degree to which the results are influenced by individual cases. Results should be replicable. Confirmatory factor analysis is the most commonly used replication technique. Analyses can be run using a split sample or another new data set. The factor structure should be stable across additional analyses. Stability is highly dependent on sample size and the number of observations per variable. The impact of outliers should be determined by running the factor model with and without the influential observations.
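One way to check stability across a split sample is a factor-matching index; the sketch below uses Tucker's coefficient of congruence, a common choice, on simulated single-factor data (the data are an assumption of the example).

```python
import numpy as np

def congruence(a, b):
    """Tucker's coefficient of congruence between two loading vectors;
    values near 1 indicate the same factor emerged in both analyses."""
    return abs(a @ b) / np.sqrt((a @ a) * (b @ b))

def first_factor_loadings(X):
    """Loadings of the first principal factor of the correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    w, v = np.linalg.eigh(R)          # eigh sorts eigenvalues ascending
    return v[:, -1] * np.sqrt(w[-1])  # largest root -> first factor

# Simulated single-factor data (an assumption made for the example):
rng = np.random.default_rng(1)
f = rng.normal(size=(400, 1))
X = f @ np.array([[.8, .7, .6, .7]]) + 0.5 * rng.normal(size=(400, 4))

# Split-sample validation: the structure should replicate in both halves.
phi = congruence(first_factor_loadings(X[:200]), first_factor_loadings(X[200:]))
```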
Beyond the interpretation and understanding of the relationship among the variables, the researcher may wish to use the factor analysis results in subsequent analysis. Factor analysis may be used to reduce the data for further use by (1) the selection of a surrogate variable, (2) creation of a new variable with a summated scale, or (3) replacement of the factor with a factor score.
1. A surrogate variable that is representative of the factor may be selected as the variable with the highest loading.
2. All the variables loading highly on a factor may be combined (as a sum or an average) to form a replacement variable, a summated scale. Advantages of the summated scale:
o Measurement error is reduced by using multiple measures.
o It taps all aspects or domains of a concept with highly related multiple indicators.
Basic issues of scale construction:
o A conceptual definition is the starting point for creating a scale. The scale must appropriately measure what it purports to measure to ensure content or face validity.
o A scale must be unidimensional, meaning that all items are strongly associated with each other and represent a single concept.
o Reliability of the scale is essential. Reliability is the degree of consistency between multiple measurements of a variable. Test-retest reliability is one form; another is the internal consistency of the items in a scale. Measures of internal consistency include the item-to-total correlation, the inter-item correlation, and the reliability coefficient.
o Once content or face validity, unidimensionality, and reliability are established, other forms of scale validity should be assessed. Discriminant validity is the extent to which two measures of similar but different concepts are distinct. Nomological validity refers to the degree to which the scale makes accurate predictions of other concepts.
3. Factor scores, computed using all variables loading on a factor, may also be used as a composite replacement for the original variables. Factor scores may not be easy to replicate.
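Internal consistency is commonly summarized by Cronbach's coefficient alpha (a standard reliability coefficient; that it is the one meant by "the reliability coefficient" above is an assumption). A minimal sketch:

```python
import numpy as np

def cronbach_alpha(items):
    """Reliability coefficient for a scale: items is an
    (n_respondents, k_items) array. Rule of thumb: alpha >= .70."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def summated_scale(items):
    """Replacement variable: the average of the items chosen for the scale."""
    return items.mean(axis=1)
```

Strongly intercorrelated items yield a high alpha; unrelated items yield an alpha near zero, signalling that averaging them into one scale would not be justified.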
The number of factors to extract may be specified in advance of the analysis by the a priori criterion. If the research question is largely to explain a minimum amount of variance, then the percentage of variance criterion may be most important. When the objective of the research is to determine the number of latent factors underlying a set of variables, a combination of criteria, possibly including the a priori and percentage of variance criteria, may be used in selecting the final number of factors. The latent root criterion is the most commonly used technique: extract the number of factors having eigenvalues greater than 1, the rationale being that a factor should explain at least as much variance as a single variable. A related technique is the scree test criterion. To develop this test the latent roots (eigenvalues) are plotted against the number of factors in their order of extraction. The resulting plot shows an elbow in the sloped line where unique variance begins to dominate common variance. The scree test criterion usually indicates more factors than the latent root rule. One of these four criteria for the initial number of factors to be extracted should be specified. Then an initial solution and several trial solutions are calculated. These solutions are rotated and the factor structure is examined for meaning. The factor structure that best represents the data and explains an acceptable amount of variance is retained as the final solution.

(4) HOW DO YOU USE THE FACTOR-LOADING MATRIX TO INTERPRET THE MEANING OF FACTORS?
Answer
The first step in interpreting the factor-loading matrix is to identify the largest significant loading of each variable on a factor. This is done by moving horizontally across the factor matrix and underlining the highest significant loading for each variable. Once this is completed for each variable, the researcher continues to look for other significant loadings. If there is simple structure, only single significant loadings for each variable, then the factors are labeled.
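The row-wise scan of the factor-loading matrix described in this answer can be sketched as follows (the loading matrix and the .50 threshold are invented for illustration):

```python
import numpy as np

# Hypothetical rotated factor-loading matrix (4 variables x 2 factors):
loadings = np.array([[ .82,  .12],
                     [ .75,  .08],
                     [ .15,  .79],
                     [ .05,  .68]])

# Move horizontally across each row and flag the highest loading,
# checking it against a significance threshold (.50 assumed here):
best_factor = np.argmax(np.abs(loadings), axis=1)
best_loading = loadings[np.arange(loadings.shape[0]), best_factor]
significant = np.abs(best_loading) >= 0.50

# Communality: the share of each variable's variance the factors explain.
communalities = (loadings ** 2).sum(axis=1)
```

Here each variable has a single significant loading (simple structure): the first two variables define factor 1 and the last two define factor 2, so the factors can be labeled from those variables.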
Variables with high factor loadings are considered more important than variables with lower factor loadings in the interpretation phase. In general, factor names are assigned so as to express the variables that load most significantly on the factor.

(5) HOW AND WHEN SHOULD YOU USE FACTOR SCORES IN CONJUNCTION WITH OTHER MULTIVARIATE STATISTICAL TECHNIQUES?
Answer
When the analyst is interested in creating an entirely new, smaller set of composite variables to replace, in part or completely, the original set of variables, the analyst would compute factor scores to serve as those composite variables. Factor scores are composite measures for each factor representing each subject. The original raw data measurements and the factor analysis results are utilized to compute factor scores for each individual. Factor scores may not replicate as easily as a summated scale, and this must be considered in their use.
(6)
WHAT ARE THE DIFFERENCES BETWEEN FACTOR SCORES AND SUMMATED SCALES? WHEN IS EACH MOST APPROPRIATE?
Answer
The key difference between the two is that the factor score is computed on the basis of the factor loadings of all variables loading on a factor, whereas the summated scale is calculated by combining only selected variables. Thus, the factor score is characterized not only by the variables that load highly on a factor, but also by those that have lower loadings; the summated scale represents only those variables that load highly on the factor. Although both summated scales and factor scores are composite measures, there are differences that lead to certain advantages and disadvantages for each method. Factor scores have the advantage of representing a composite of all variables loading on a factor. This is also a disadvantage in that it makes interpretation and replication more difficult. Also, factor scores can retain orthogonality, whereas summated scales may not remain orthogonal. The key advantage of summated scales is that, by including only those variables that load highly on a factor, they make interpretation and replication easier. Therefore, the decision rule would be: if the data are used only in the original sample or orthogonality must be maintained, factor scores are suitable; if generalizability or transferability is desired, summated scales are preferred.
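A small numeric sketch of the contrast (regression-method factor scores versus a summated scale); the data, and the use of the true loadings in place of estimated ones, are simplifying assumptions of the example:

```python
import numpy as np

# Simulated data: five variables, the first three loading highly on one
# factor (loadings, sample size, and noise level are assumptions).
rng = np.random.default_rng(2)
f = rng.normal(size=(300, 1))
L = np.array([[.8, .75, .7, .3, .2]])
X = f @ L + 0.5 * rng.normal(size=(300, 5))
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Factor score (regression method): weights ALL variables via R^-1 L.
# (True loadings stand in for estimated ones to keep the sketch short.)
R = np.corrcoef(Z, rowvar=False)
factor_score = Z @ np.linalg.solve(R, L.T)

# Summated scale: average of only the variables loading highly (> .50).
scale = Z[:, :3].mean(axis=1)
```

Both composites track the underlying factor well here; the factor score uses all five variables (harder to replicate in a new sample), while the scale uses only the three a researcher would name the factor after.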
(7)
WHAT IS THE DIFFERENCE BETWEEN Q-TYPE FACTOR ANALYSIS AND CLUSTER ANALYSIS?
Answer
Both Q-type factor analysis and cluster analysis compare a series of responses to a number of variables and place the respondents into several groups. The difference is that the groups resulting from a Q-type factor analysis are based on the intercorrelations between the respondents, whereas in a typical cluster analysis approach, groupings are based on a distance measure between the respondents' scores on the variables being analyzed.
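A toy example makes the distinction concrete: two respondents with the same response pattern at different levels correlate perfectly, so Q-type analysis groups them together, while a distance measure can place a differently patterned respondent closer (all numbers below are invented):

```python
import numpy as np

# Three hypothetical respondents scored on five variables:
A = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
B = np.array([2.0, 4.0, 6.0, 8.0, 10.0])  # A's pattern at a higher level
C = np.array([5.0, 4.0, 3.0, 2.0, 1.0])   # A's level, reversed pattern

# Q-type factor analysis groups on correlations between respondents:
r_AB = np.corrcoef(A, B)[0, 1]   # 1.0: identical pattern
r_AC = np.corrcoef(A, C)[0, 1]   # -1.0: opposite pattern

# Cluster analysis groups on a distance measure between respondents:
d_AB = np.linalg.norm(A - B)
d_AC = np.linalg.norm(A - C)
```

Here d_AC < d_AB, so a distance-based clustering would put A nearer to C even though their patterns are opposite; the two techniques can form different groups from the same data.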
(8)
WHEN WOULD THE RESEARCHER USE AN OBLIQUE ROTATION INSTEAD OF AN ORTHOGONAL ROTATION? WHAT ARE THE BASIC DIFFERENCES BETWEEN THEM?
Answer
In an orthogonal factor rotation, the correlation between the factor axes is arbitrarily set at zero and the factors are assumed to be independent. This simplifies the mathematical procedures. In oblique factor rotation, the angles between axes are allowed to seek their own values, which depend on the
density of variable clusterings. Thus, oblique rotation is more flexible and more realistic (it allows for correlation of the underlying dimensions) than orthogonal rotation, although it is more demanding mathematically. In fact, there is as yet no consensus on a best technique for oblique rotation. When the objective is to utilize the factor results in a subsequent statistical analysis, the analyst may wish to select an orthogonal rotation procedure, because the factors are orthogonal (independent) and therefore eliminate collinearity. However, if the analyst is simply interested in obtaining theoretically meaningful constructs or dimensions, the oblique factor rotation may be more desirable because it is theoretically and empirically more realistic.