This manuscript has been reproduced from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer printer.

In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
UNIVERSITY OF NORTHERN COLORADO
Greeley, Colorado

Tian-Lu Ke

College of Education
Department of Applied Statistics and Research Methods
May 2001
UMI Number: 3006596

UMI Microform 3006596
Copyright 2001 by Bell & Howell Information and Learning Company.
All rights reserved. This microform edition is protected against
unauthorized copying under Title 17, United States Code.
© 2001
Tian-Lu Ke
THIS DISSERTATION WAS SPONSORED
BY
Tian-Lu Ke
DISSERTATION COMMITTEE

Advisory Professor
Jay Schaffer, Ph.D.

Faculty Representative
Charmayne B. Cullom, Ph.D.
ABSTRACT
Ke, Tian-Lu. Minimum Sample Sizes for Conducting Exploratory Factor Analyses. Published Doctor of Philosophy dissertation, University of Northern Colorado, 2001.
The purpose of this study was to investigate the relationship between the sample size, the number of variables, the number of factors, the level of communality, and the stability of factor analysis solutions. The minimum necessary sample sizes for each of 180 different conditions (six numbers of factors, ten ratios of the number of variables to the number of factors, and three levels of communality) were obtained using two values of the coefficient of congruence (0.92 and 0.98) as criteria. A total of 371,600 correlation matrices (population correlation coefficient matrices together with the sample correlation matrices generated from each population correlation coefficient matrix) were generated in this study. Three conclusions were obtained. First, the ratio of the sample size (N) to the number of variables (p) may not be an appropriate index for deciding the minimum necessary sample size. In this study, when the number of factors (f) is fixed, N and p bear an inverse relationship. Second, the difference in minimum necessary sample size between two different levels of communality will decrease as the p/f ratio increases. Finally, trying to give an absolute sample size is unrealistic. The minimum necessary sample sizes for these 180 conditions range from fifty to several thousand, so it is impossible to give a recommendation based only on absolute sample size. In this study, some guidelines about minimum necessary sample sizes for exploratory factor analysis are presented for various p/f ratios and 6 different conditions (3 levels of communality for each of the 2 criterion values of the coefficient of congruence).
ACKNOWLEDGMENT

I would like to express my deepest gratitude to my advisors, Dr. Dale Shaw and Dr. Daniel Mundfrom, not only because the topic of this study was suggested by them, but also because of their patient guidance and valued suggestions. Special thanks go to Dr. Jay Schaffer. Without his encouragement, I could not have persisted.

Thanks also go to Dr. Ann Thomas, Kim McFann, and Dawn Strongin. They were always so nice to me during my whole doctoral program. Also, special thanks go to Brittany Lane for her editing. It must have been a nightmare to correct my writing.
TABLE OF CONTENTS

CHAPTER
I. INTRODUCTION
II. REVIEW OF LITERATURE
III. METHODOLOGY
IV. RESULTS
    Factor-Orientated Section
    Relationship of Sample Size to Level of Communality
V. DISCUSSION
    Conclusions
    Limitations and Suggestions for Further Research
APPENDIX A
APPENDIX B
BIBLIOGRAPHY
LIST OF TABLES

Table
1. The correlation matrix of x, y1, y2, y3, y4, and y5
2-13. [Captions for Tables 2 through 13 are not legible in this reproduction.]
LIST OF FIGURES

Figure
1a. The minimum necessary sample sizes for one factor with the ratios of variables to factors ranging from 3 to 12 for the excellent-level criterion (0.98)
1b. The minimum necessary sample sizes for one factor with the ratios of variables to factors ranging from 3 to 12 for the good-level criterion (0.92)
3a. The minimum necessary sample sizes for three factors with the ratios of variables to factors ranging from 3 to 12 for the excellent-level criterion (0.98)
3b. The minimum necessary sample sizes for three factors with the ratios of variables to factors ranging from 3 to 12 for the good-level criterion (0.92)
4a. The minimum necessary sample sizes for four factors with the ratios of variables to factors ranging from 3 to 12 for the excellent-level criterion (0.98)
4b. The minimum necessary sample sizes for four factors with the ratios of variables to factors ranging from 3 to 12 for the good-level criterion (0.92)
5a. The minimum necessary sample sizes for five factors with the ratios of variables to factors ranging from 3 to 12 for the excellent-level criterion (0.98)
5b. The minimum necessary sample sizes for five factors with the ratios of variables to factors ranging from 3 to 12 for the good-level criterion (0.92)
6a. The minimum necessary sample sizes for six factors with the ratios of variables to factors ranging from 3 to 12 for the excellent-level criterion (0.98)
6b. The minimum necessary sample sizes for six factors with the ratios of variables to factors ranging from 3 to 12 for the good-level criterion (0.92)
7a. The minimum necessary sample sizes for 4 different factor numbers and high level of communality with the ratios of variables to factors ranging from 3 to 12 for the excellent-level criterion (0.98)
7b. The minimum necessary sample sizes for 5 different factor numbers and high level of communality with the ratios of variables to factors ranging from 3 to 12 for the good-level criterion (0.92)
8a. The minimum necessary sample sizes for 4 different factor numbers and wide level of communality with the ratios of variables to factors ranging from 3 to 12 for the excellent-level criterion (0.98)
8b. The minimum necessary sample sizes for 5 different factor numbers and wide level of communality with the ratios of variables to factors ranging from 3 to 12 for the good-level criterion (0.92)
9a. The minimum necessary sample sizes for 4 different factor numbers and low level of communality with the ratios of variables to factors ranging from 3 to 12 for the excellent-level criterion (0.98)
9b. The minimum necessary sample sizes for 5 different factor numbers and low level of communality with the ratios of variables to factors ranging from 3 to 12 for the good-level criterion (0.92)
10. The minimum necessary sample sizes for six conditions with the related number of variables
CHAPTER I
INTRODUCTION
Factor analysis is a statistical technique that may be used to simplify complex sets of data. With the advent of powerful computers and the creation of sophisticated software, the use of factor analysis has increased, especially in psychology and the social sciences. As with other inferential techniques, researchers who conduct a factor analysis make inferences from the data we observe to a model we believe accounts for or captures the variability in the data. The assumption is made that the information from the sample of observed data can reflect the information in the whole population. To a great extent, the accuracy of our inferences relies on the size of the obtained sample. Thus, determining an appropriate sample size becomes a critical matter when we plan to conduct a factor analysis.
Regarding the appropriate sample size one should use when conducting a factor analysis, Tanaka (1987) observed that no simple, general answer exists.
In the same article, Tanaka also argued that even though statisticians can find solace in asymptotic theory, applied researchers are often left wondering about the relevance of such theory for finite samples.

Ideally, the answer to the "How big is enough?" question (i.e., the minimum necessary sample size) would come from statistical theory; however, no theoretically derived formula for the minimum necessary sample size has been found. Some researchers (Girshick, 1939; Archer & Jennrich, 1973; Cudeck & O'Dell, 1994) have investigated a connection between standard errors of factor loadings and sample size by looking for a minimum sample size that can yield stable and adequately small standard errors.

Finding the standard errors (the sampling variability) of loadings in factor analysis can allow researchers to determine, on the basis of sample data, when a pattern of zero loadings is tenable in the population model. In simple structure, a factor loading of zero means this particular factor does not influence the corresponding variable. Even though this research cannot directly be used to determine the minimum necessary sample size for factor analyses, it can provide information regarding which conditions affect the stability of factor loadings.

Lawley (1967) identified the asymptotic standard errors of the unrotated loadings produced in maximum likelihood factor analysis. Jennrich (1973) used Lawley's (1967) results to derive standard errors for rotated loadings.
MacCallum, Widaman, Zhang, and Hong (1999) did a thorough review of the sample size issue in factor analysis, concluding: "Although this effect is well-defined theoretically and has been demonstrated with simulations, there is no guidance available to indicate how large N must be to obtain adequately small standard errors of loadings...."

Cudeck and O'Dell (1994) concluded that it is too difficult to derive the theoretical answer directly when all the parts that contribute to a factor analysis are considered: the method of estimation, the method of analytic rotation, the size of the sample, the number of factors, the clarity of the solution (i.e., the extent to which simple structure exists in the variables), the degree of correlation among the factors or among the variables, the number of coefficients estimated, and the interactions among these parts.
Most researchers agree that, in general, larger samples are better (MacCallum, Widaman, Zhang, & Hong, 1999; Kline, 1994; Cudeck & O'Dell, 1994; Comrey & Lee, 1992; Velicer, Peacock, & Jackson, 1982). However, if the question is changed to "how big is enough," the recommendations and findings are diverse and often contradictory. Gorsuch recommended that the minimum necessary sample size should not be smaller than 100. Comrey and Lee (1992) gave a rough scale for the adequacy of sample size: 50 - very poor; 100 - poor; 200 - fair; 300 - good; 500 - very good; and 1000 or more - excellent. Further, Comrey and Lee (1992) emphasized that if some kind of correlation coefficient other than the Pearson product-moment correlation coefficient is used, larger samples are needed to achieve the same level of stability in the factor solution. Kline (1994) agreed with Gorsuch's recommendation that 100 subjects is the minimum. However, Kline added another recommendation: that the ratio of subjects to variables be at least 2:1.
Some researchers, like Kline (1994), consider that the ratio of the number of subjects (N) to the number of variables (p) is a better way to decide on the minimum sample size. This recommendation seems reasonable because the more variables we measure, the larger the sample size we should use. However, Arrindell and van der Ende, after reviewing the ratio recommendations offered for factor analysis and component analysis, concluded that these recommendations are vague. For example, one recommendation called for a large ratio of observations to variables, with an absolute minimum of about 250 observations. However, Everitt (1975) argued, based on a Monte Carlo study, that perhaps 10 individuals for each variable may be a sufficient ratio of observations to variables to aim for, though even this may be rather optimistic. He also noted that a factor analysis in which the number of observations is less than 5 times the number of variables should be viewed with at least some skepticism.
In an early Monte Carlo study, Tucker, Koopman, and Linn (1969) used three different levels of communality, high (0.6 ~ 0.8), wide (0.2 ~ 0.8), and low (0.2 ~ 0.4), and two different ratios of the number of variables, p, to the number of factors, f: 20/3 and 20/7. Tucker's purpose was to study the effectiveness of factor analytic methods. However, he found that major differences in quality of results were associated with fewer factors, so he recommended that the ratio of the number of variables to the number of factors should be high.
In another Monte Carlo study, Geweke and Singleton (1980) used four different sample sizes (10, 30, 150, and 300) to examine the behavior of the likelihood ratio chi-square test statistic for assessing model fit in maximum likelihood factor analysis. They used the likelihood ratio statistic for testing the goodness of fit of the exploratory factor model. In their conclusions, they argued that the likelihood ratio statistic might be more reliable in small samples than previously believed. The sample size at which the statistic became reliable decreased as the number of factors being fit decreased, with the threshold being approximately 10 observations for one factor and perhaps 25 for two. They also considered the likelihood ratio test to have considerable utility in this setting.

Another researcher, in a comparison of factor analysis methods, also examined the effects of increasing the ratio of the number of observed variables to the number of factors, and of increasing sample size. He found that increasing the ratio of the number of observed variables to the number of factors, which increases the number of constraints on the correlations imposed by the factor analysis model, and increasing sample size, have the following effects:
(a) The accuracy of the estimates of the factor loadings is increased, with the increase being greater for maximum likelihood estimates than for other estimates.
(b) The probability of the occurrence of a maximum likelihood communality estimate of one is reduced.
(c) The number of iterations required for convergence of the maximum determinant solution is reduced.
In sum, these researchers demonstrated that not only is the number of variables related to the minimum necessary sample size, but the number of factors is as well. In fact, all of these researchers indicated that the minimum sample size is related to the variables-to-factors ratio.

Another characteristic that has also been shown to be related to minimum sample size is the size of communality. Velicer, Peacock, and Jackson (1982) investigated the effect that methods (maximum likelihood factor analysis, principal component analysis, image component analysis) would have on the factor patterns. Two sizes of communality (0.3 and 0.8) and two different sample sizes (144 and 288) were compared. All three methods performed better with larger sample sizes and with higher communalities.
MacCallum et al. (1999) expanded Tucker et al.'s (1969) study to compare three variables-to-factors ratios (10:3, 20:3, and 20:7) with each of 3 different communality ranges. Thus, MacCallum et al. used 9 different population correlation matrix conditions with four levels of sample size (60, 100, 200, 400) and generated 100 sample correlation matrices for each condition. They found that sample size, level of communality, and ratio of variables-to-factors all affect the recovery of the population factor structure, with poorer recovery at lower levels of communality. Although all of these studies showed that the effect of sample size is related to the level of communality and the ratio of the number of observed variables to the number of common factors, none of them provided a guideline for the minimum necessary sample size.

Tanaka (1987) admitted that Monte Carlo procedures could be of some utility in determining appropriate sample sizes. He argued, however, that even in the most comprehensive studies done to date, only small subsets of models have been investigated.
MacCallum et al. (1999) brought up another point. They suggested that previous recommendations regarding the issue of sample size in factor analysis were based on a misconception that appropriate sample size was influenced predominately by the number of variables. In their research, they demonstrated that with different communality levels and varying ratios of variables to factors, the minimum sample size required to achieve good recovery of the population factors varies considerably. MacCallum and Tucker (1991) distinguished between "model error," that arises from the lack of fit of the model in the population, and "sample error," that arises from the lack of exact correspondence between a sample and a population. Because MacCallum and Tucker's model can efficiently focus on the effect of sample size on both model error and sample error, this study will use their model as the framework for generating data.

The purpose of this study is to investigate the relationship between the sample size, the number of variables, the number of factors, the level of communality, and the stability of factor analysis results. Various recommendations relating to minimum sample size have been given in different studies, but they are limited in the number of situations they consider. Now, with recent advances in computer technology and software, it is possible to get more specific guidelines for the minimum sample size needed to conduct an exploratory factor analysis.
Computer Procedure

A Monte Carlo procedure was used to generate a variety of data sets that varied in the following ways. The numbers of factors varied from one to six, and ten ratios of the number of variables to the number of factors (p/f), ranging from 3 to 12, were used. For each combination of number of factors and ratio of variables to factors, three different levels of communality (high 0.6, 0.7, 0.8; wide 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8; low 0.2, 0.3, 0.4) were used. Therefore, a total of 6 x 10 x 3 = 180 different population situations were investigated. For each of these 180 situations, 100 population correlation matrices were generated using Tucker's procedure (Tucker et al., 1969). Hence, a total of 18,000 population correlation matrices were considered in this study. For the case of number of factors equal to one, a different population correlation matrix generating procedure was used; this procedure is described in Chapter III.

Then, sample correlation matrices were generated from each of these 18,000 population correlation matrices by using a small sample size as a starting point. The first sample size used in the procedure was dependent on the number of variables. The sample size was then increased according to a schedule that included the following steps:
(3) When sample size is between 100 and 300, it increases by 10.
(4) When sample size is between 300 and 500, it increases by 50.
This procedure was stopped when the population and sample correlation matrices matched both of two criteria; the criteria use the coefficient of congruence to represent the similarity of two matrices and are described later in this chapter. If either of these criteria could not be matched when the sample size exceeded 5000, the procedure was also stopped. All of the data generated in this study followed normal distributions.

Sample correlation matrices were generated for each sample size until the sample size made the coefficients of congruence match both criteria. Each of these sample matrices was analyzed using maximum likelihood factor analysis. The retained number of factors was set equal to the known number of factors in the population (i.e., from one to six). More detail about how the minimum necessary sample size was decided will be presented in Chapter 3.
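The stepping procedure described above can be sketched in code. This is a minimal illustration, not the study's actual program: it handles only the one-factor case, uses a fixed starting sample size and step (assumptions made here for brevity; the study tied the start to the number of variables and varied the step), and substitutes a simple eigendecomposition of the sample correlation matrix for the maximum likelihood extraction used in the study.

```python
import numpy as np

def congruence(a, b):
    """Tucker's coefficient of congruence between two loading vectors."""
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def minimum_sample_size(pop_loadings, criterion=0.92, step=10,
                        n_start=50, n_max=5000, seed=0):
    """Smallest N at which loadings recovered from a sample correlation
    matrix reach the congruence criterion (one-factor case only)."""
    rng = np.random.default_rng(seed)
    lam = np.asarray(pop_loadings, dtype=float)
    p = lam.size
    # One-factor population correlation matrix: R = lam lam' with unit diagonal.
    R_pop = np.outer(lam, lam)
    np.fill_diagonal(R_pop, 1.0)
    n = n_start
    while n <= n_max:
        x = rng.multivariate_normal(np.zeros(p), R_pop, size=n)
        R_samp = np.corrcoef(x, rowvar=False)
        # First principal-axis loadings as a stand-in for ML loadings.
        vals, vecs = np.linalg.eigh(R_samp)
        est = vecs[:, -1] * np.sqrt(vals[-1])
        if abs(congruence(est, lam)) >= criterion:
            return n
        n += step
    return None  # criterion not reached by n_max, as in the study's 5000 cutoff
```

A run such as `minimum_sample_size(np.full(6, 0.7))` returns the first N in the search schedule at which the congruence criterion is met.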
If the coefficient of congruence is 0.98 or larger, then the congruence between the population and the sample is excellent; a value between 0.92 and 0.98 is considered "good" agreement; and a value between 0.82 and 0.92 is borderline.
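The coefficient of congruence on which both criteria rest is simple to compute; the sketch below pairs it with the cutoffs quoted above (the function names are my own, and the label for values below 0.82 is a placeholder since that part of the scale is not given here).

```python
import numpy as np

def coefficient_of_congruence(lam1, lam2):
    """phi = sum(l1*l2) / sqrt(sum(l1^2) * sum(l2^2)) for two loading columns."""
    lam1 = np.asarray(lam1, dtype=float)
    lam2 = np.asarray(lam2, dtype=float)
    return float(lam1 @ lam2 / np.sqrt((lam1 @ lam1) * (lam2 @ lam2)))

def congruence_label(phi):
    """Qualitative labels matching the cutoffs cited in the text."""
    if phi >= 0.98:
        return "excellent"
    if phi >= 0.92:
        return "good"
    if phi >= 0.82:
        return "borderline"
    return "below borderline"
```

Identical loading columns give phi = 1, and proportional columns do as well, since phi is the cosine between the two vectors.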
Limitations

The first limitation is due to the nature of the distributions used in this study. For simplicity, all the data generated for this work have normal distributions. Consequently, minimum sample size recommendations made here may be inappropriate for non-normal data.

Second, Cudeck and O'Dell (1994) emphasized that in addition to sample size, the method of rotation, the number of factors, and the degree of correlation among the factors will all affect the standard errors of the factor loadings. Therefore, it is possible that the results of this study will not generalize to situations using other estimation procedures or rotation methods.

A third limitation stems from the decision not to investigate the effect of measurement error in this study. In the common factor model, the error of measurement contributes to the influence of the unique factor for a given variable. When different amounts of measurement error are present in the data, the minimum sample size recommendations made here may need to be adjusted to larger values.

Finally, there is no apparent research dealing with the effect of different orthogonal rotation methods. Therefore, we do not know if the decision to use another orthogonal rotation method would change the results obtained here.
CHAPTER II

REVIEW OF LITERATURE

The general description of the essential purpose of factor analysis is expressed as "to describe, if possible, the covariance relationships among many variables in terms of a few underlying, but unobservable, random quantities called factors" (Johnson & Wichern, 1998, p. 514).

McDonald (1985) gave a more specific description of this purpose. He asserted that common factor analysis uses the partial correlation aspect of regression theory to explain the correlations among the observed variables.
When considering the effect of the composition of the sample, Andrew and Howard (1993) provided an example to illustrate the importance of the range of variable scores in the data on factor analytic results. They measured psychological tests of Verbal Ability, Numerical Ability, Arithmetic Reasoning, Memory, and Perceptual Speed in two samples. The first random sample was taken from the general population. The second sample, of equal size to the first one, consists entirely of individuals who have an IQ of exactly 100. Andrew and Howard showed that a factor analysis of the intercorrelations would yield a general factor of Intelligence from the first sample but that the second sample would fail to produce such a factor.
Beyond illustrating the effect of sampling, this example can help us to understand the meaning of factors. To do this, consider a correlation matrix from McDonald (1985) in conjunction with Andrew and Howard's example. Let the independent variable x be the value of IQ, y1 be the test score of Verbal Ability, y2 be the test score of Numerical Ability, y3 be the test score of Arithmetic Reasoning, y4 be the test score of Memory, and y5 be the test score of Perceptual Speed. All x and y_j are standardized measures. A correlation matrix is given in Table 1.
Table 1

The correlation matrix of x, y1, y2, y3, y4, and y5

        x      y1     y2     y3     y4     y5
x       1      0.9    0.8    0.7    0.6    0.5
y1      0.9    1      0.72   0.63   0.54   0.45
y2      0.8    0.72   1      0.56   0.48   0.40
y3      0.7    0.63   0.56   1      0.42   0.35
y4      0.6    0.54   0.48   0.42   1      0.3
y5      0.5    0.45   0.40   0.35   0.3    1
Using the formula for the partial correlation between $y_j$ and $y_k$ with $x$ partialled out,

$$r_{jk \cdot x} = \frac{r_{jk} - r_{jx} r_{kx}}{\sqrt{(1 - r_{jx}^2)(1 - r_{kx}^2)}},$$

and calculating the matrix of partial correlations between the five dependent variables when the independent variable x is partialled out, we see that every partial correlation is zero. That is, a single independent variable explains all the correlations in the matrix of the five dependent variables.
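The claim that every partial correlation vanishes can be checked numerically from the entries of Table 1 (the helper function name below is my own):

```python
import numpy as np

# Correlation matrix from Table 1; variables ordered x, y1, y2, y3, y4, y5.
R = np.array([
    [1.00, 0.90, 0.80, 0.70, 0.60, 0.50],
    [0.90, 1.00, 0.72, 0.63, 0.54, 0.45],
    [0.80, 0.72, 1.00, 0.56, 0.48, 0.40],
    [0.70, 0.63, 0.56, 1.00, 0.42, 0.35],
    [0.60, 0.54, 0.48, 0.42, 1.00, 0.30],
    [0.50, 0.45, 0.40, 0.35, 0.30, 1.00],
])

def partial_corr_given_x(R, j, k):
    """r_{jk.x}: partial correlation of variables j and k with variable 0 (x)
    partialled out, using the standard first-order partial correlation formula."""
    num = R[j, k] - R[j, 0] * R[k, 0]
    den = np.sqrt((1 - R[j, 0] ** 2) * (1 - R[k, 0] ** 2))
    return num / den

# All ten partial correlations among y1..y5 given x are (numerically) zero.
vals = [partial_corr_given_x(R, j, k) for j in range(1, 6) for k in range(j + 1, 6)]
```

Each off-diagonal entry equals the product of the two variables' correlations with x, so every numerator $r_{jk} - r_{jx} r_{kx}$ is exactly zero.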
Now, suppose we had measured y1, y2, ..., y5, but had not chosen the measure x. We then have the (5 x 5) submatrix of Table 1 obtained by deleting the first row and the first column. Then we find that each correlation in the matrix is a product of two of a sequence of numbers, 0.9, 0.8, 0.7, 0.6, 0.5 (e.g., $r_{12} = 0.9 \cdot 0.8$, $r_{35} = 0.7 \cdot 0.5$). Hence, all the correlations can be reproduced from this single sequence of numbers. It needs to be noted that this situation will not be true in general of every correlation matrix. However, looking at the correlations among only the five dependent variables in Table 1, we can deduce from the regularity of its formation that there may exist an independent variable, which we have not observed, that would make all the partial correlations zero. Indeed, researchers have found many empirical correlation matrices that looked as if their correlations could be explained in this way.

This concept of one common factor can be extended to develop the concept of multiple common factors. McDonald (1985) used the following expression for the numerator of the partial correlation of $y_j$ and $y_k$ with $m$ independent variables partialled out:

$$r_{jk} - (r_{j1} r_{k1} + r_{j2} r_{k2} + \cdots + r_{jm} r_{km}),$$

where the independent variables $x_1, x_2, \ldots, x_m$ are mutually uncorrelated and $r_{jp}$ denotes the correlation of $y_j$ with $x_p$. McDonald did not write the expression for the denominator because the only condition considered here is that the partial correlation equals zero, and for that purpose only the numerator matters.
If all these partial correlations are zero, we know that $r_{jk} - (r_{j1} r_{k1} + r_{j2} r_{k2} + \cdots + r_{jm} r_{km}) = 0$, which can be arranged in the form $r_{jk} = r_{j1} r_{k1} + r_{j2} r_{k2} + \cdots + r_{jm} r_{km}$. That is, if $m$ mutually uncorrelated independent variables explain all the correlations between $n$ dependent variables, then each such correlation can be written as a sum of $m$ products of two numbers - the correlations of each dependent variable with each independent variable.
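This sum-of-products structure is easy to verify with a small hypothetical two-factor example (the loadings below are invented for illustration and are not taken from the study):

```python
import numpy as np

# Correlations of four variables with two mutually uncorrelated factors:
# row j holds (r_j1, r_j2). These values are hypothetical.
B = np.array([
    [0.8, 0.3],
    [0.7, 0.4],
    [0.6, 0.5],
    [0.5, 0.6],
])

# Off-diagonal correlations implied by the model: r_jk = sum_p r_jp * r_kp.
R = B @ B.T
np.fill_diagonal(R, 1.0)

# For instance, r_12 = 0.8*0.7 + 0.3*0.4 = 0.68.
```

Partialling both factors out of any pair of variables then leaves a zero numerator, exactly as in the one-factor case above.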
This statement implies that there exists a number of unobserved variables (common factors) that explain the observed correlations, in the sense that when these are partialled out, the partial correlations of our observed variables all become zero. Alternatively, we can say that each of our observed variables can be expressed as the sum of a (common) part, given by its regression on the factors, and a residual about that regression, and that the residuals are uncorrelated.
In scalar form, the model may be written as

$$y_j = b_{j1} x_1 + b_{j2} x_2 + \cdots + b_{jm} x_m + e_j \qquad (2\text{-}4)$$

where $j = 1, 2, \ldots, n$; $y_j$ is the j-th observed variable; $x_p$ is the p-th common factor, $p = 1, 2, \ldots, m$; $e_j$ is the residual of $y_j$ about its regression on the factors (the unique factor); and $b_{jp}$ is the loading of the j-th observed variable on factor p; together with the statement that the residuals are uncorrelated (McDonald, 1985). The model (2-4) is the general common factor model.
In matrix notation the model can be written as

$$\mathbf{y} = \mathbf{x} Q' \qquad (2\text{-}5)$$

where $\mathbf{x}$ is a row vector containing scores on common and unique factors and $Q$ is a matrix of population loadings for common and unique factors. These two matrices can be partitioned as:

$$\mathbf{x} = [\mathbf{x}_c, \mathbf{x}_u] \qquad (2\text{-}6)$$

$$Q = [\Lambda, \Psi] \qquad (2\text{-}7)$$

where $\mathbf{x}_c$ contains scores on the common factors, $\mathbf{x}_u$ contains scores on the unique factors, $\Lambda$ is a $p \times f$ matrix of population loadings for the common factors, and $\Psi$ is a $p \times p$ diagonal matrix of loadings for the unique factors.

Then the following equations can be obtained directly by substituting 2-6 and 2-7 into 2-5:

$$\Sigma_{yy} = Q \Sigma_{xx} Q' \qquad (2\text{-}8)$$

$$\Sigma_{yy} = \Lambda \Phi \Lambda' + \Psi^2 \qquad (2\text{-}9)$$
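Equations 2-8 and 2-9 can be checked numerically for arbitrary parameter values; the sketch below uses invented loadings and factor correlations purely for illustration.

```python
import numpy as np

p, f = 6, 2
rng = np.random.default_rng(1)

Lam = rng.uniform(0.3, 0.7, size=(p, f))       # common-factor loadings (hypothetical)
Phi = np.array([[1.0, 0.3], [0.3, 1.0]])       # common-factor correlation matrix
Psi = np.diag(rng.uniform(0.4, 0.8, size=p))   # diagonal unique-factor loadings

# Q = [Lam, Psi]; Sigma_xx = block-diag(Phi, I) under the model assumptions
# (standardized factors, unique factors uncorrelated with everything).
Q = np.hstack([Lam, Psi])
Sigma_xx = np.block([[Phi, np.zeros((f, p))],
                     [np.zeros((p, f)), np.eye(p)]])

lhs = Q @ Sigma_xx @ Q.T               # Equation 2-8
rhs = Lam @ Phi @ Lam.T + Psi @ Psi    # Equation 2-9 (Psi @ Psi = Psi^2, diagonal)
```

The two sides agree to machine precision, which is just the block-matrix multiplication written out.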
MacCallum and Tucker (1991) began their investigation from the perspective that no mathematical model will fit real-world phenomena exactly. Therefore, they represented lack of fit in the conceptual and mathematical expression of the model that consists of nonlinearity and minor factors. This notion may be expressed mathematically as follows:

$$\mathbf{y} = \mathbf{z} + \tilde{\mathbf{z}} \qquad (2\text{-}10)$$

where $\mathbf{z}$ is a vector representing that portion of $\mathbf{y}$ that is accounted for by the common factor model and $\tilde{\mathbf{z}}$ is a vector representing that portion of $\mathbf{y}$ not accounted for by the model.

MacCallum and Tucker (1991) emphasized that $\tilde{\mathbf{z}}$ is neither equivalent to the error of measurement nor equivalent to the unique factor in the model. The unique factor is part of the common factor model, and error of measurement is a phenomenon that contributes to the influence of the unique factor for a given variable. These influences are incorporated into the common factor model, and thus contribute to $\mathbf{z}$ and not to $\tilde{\mathbf{z}}$. Thus, the measured variables are defined as linear combinations of the factors plus a portion not fit by the model.
Given that $\Sigma_{yy}$ is the population covariance matrix for the measured variables, MacCallum and Tucker (1991) also defined a matrix $\Sigma_{zz}$ as the population covariance matrix for the modeled variables. The following factorial structure for $\Sigma_{zz}$ is easily obtained:

$$\Sigma_{zz} = Q \Sigma_{xx} Q' \qquad (2\text{-}13)$$

Matrix $\Sigma_{xx}$ is the population covariance matrix for the common and unique factors, partitioned as

$$\Sigma_{xx} = \begin{bmatrix} \Sigma_{cc} & \Sigma_{cu} \\ \Sigma_{uc} & \Sigma_{uu} \end{bmatrix} \qquad (2\text{-}14)$$

where $\Sigma_{cc}$ contains the population covariances among the common factors; $\Sigma_{uu}$ contains the population covariances among the unique factors; $\Sigma_{uc}$ contains the population covariances of unique factors with common factors; and $\Sigma_{cu}$ is the transpose of $\Sigma_{uc}$. Without loss of generality, MacCallum and Tucker (1991) defined all factors as being standardized in the population, which means that all entries in $\Sigma_{xx}$ are correlations. The unique factors are assumed to be uncorrelated with each other and with the common factors in the population. Thus, the structure of $\Sigma_{xx}$ simplifies to

$$\Sigma_{xx} = \begin{bmatrix} \Phi & 0 \\ 0 & I \end{bmatrix} \qquad (2\text{-}15)$$
Substituting from Equations 2-15 and 2-7 into Equation 2-13 yields the following:

$$\Sigma_{zz} = Q \Sigma_{xx} Q' = \Lambda \Phi \Lambda' + \Psi^2 \qquad (2\text{-}16)$$

From Equation 2-10, the following relation among covariance matrices can be derived easily:

$$\Sigma_{yy} = \Sigma_{zz} + \Sigma_{z\tilde{z}} + \Sigma_{\tilde{z}z} + \Sigma_{\tilde{z}\tilde{z}} \qquad (2\text{-}17)$$

A lack-of-fit term $\Delta$ is defined as

$$\Delta = \Sigma_{z\tilde{z}} + \Sigma_{\tilde{z}z} + \Sigma_{\tilde{z}\tilde{z}} \qquad (2\text{-}18)$$

Substituting 2-18 into 2-17, the relation between the observed covariance matrix $\Sigma_{yy}$ and the modeled covariance matrix becomes

$$\Sigma_{yy} = \Sigma_{zz} + \Delta \qquad (2\text{-}19)$$

Substituting 2-16 into 2-19 yields an expression for the factor structure of the observed covariance matrix:

$$\Sigma_{yy} = \Lambda \Phi \Lambda' + \Psi^2 + \Delta \qquad (2\text{-}20)$$

By using a procedure similar to that used to obtain the population factorial structure, the expression for the sample factorial structure is obtained:

$$C_{zz} = Q C_{xx} Q' = \Lambda C_{cc} \Lambda' + \Lambda C_{cu} \Psi + \Psi C_{uc} \Lambda' + \Psi C_{uu} \Psi \qquad (2\text{-}21)$$
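The expansion in 2-21 is a block-matrix identity that holds for any sample covariance matrix of the factors, including one whose unique-factor covariances are not zero. A numerical check with invented values:

```python
import numpy as np

p, f = 5, 2
rng = np.random.default_rng(7)

Lam = rng.uniform(0.3, 0.8, size=(p, f))      # hypothetical common-factor loadings
Psi = np.diag(rng.uniform(0.3, 0.7, size=p))  # hypothetical unique-factor loadings

# In a sample, covariances of unique factors with each other and with the
# common factors need not be zero; a random symmetric C_xx mimics that.
A = rng.normal(size=(f + p, f + p))
C_xx = A @ A.T / (f + p)
C_cc, C_cu = C_xx[:f, :f], C_xx[:f, f:]
C_uc, C_uu = C_xx[f:, :f], C_xx[f:, f:]

Q = np.hstack([Lam, Psi])
lhs = Q @ C_xx @ Q.T                                   # compact form of 2-21
rhs = (Lam @ C_cc @ Lam.T + Lam @ C_cu @ Psi
       + Psi @ C_uc @ Lam.T + Psi @ C_uu @ Psi)        # expanded form of 2-21
```

The compact and expanded forms agree exactly; the extra cross-terms are what disappear in the population (Equation 2-16), where $\Sigma_{cu}$, $\Sigma_{uc}$, and the off-diagonal of $\Sigma_{uu}$ are zero.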
19
MacCallum and Tucker (1991) considered the deviation from zero o f the
covariances in , Cuc, and off-diagonal entries in Cuu to give rise to one source o f
from this phenomenon. Incorporating this lack-of-fit term into the model 2-21 yields the
Cs = AC^A'-t-T2 + A. 2-22
Now, considering the final step in this development: the expression o f the model in
terms o f the factorial structure o f C , which contains the sample covariances o f the
measured variables.
Because the last three terms in 2-23 are covariance matrices involving those portions
o f the measured variables that are not fit by the model, those terms represent sample
Ac is defined by Ac = C , + C _ + C _ 2-24
Substituting from 2-22 into 2-23 yields the following model for C' :
c „ = (ACccA'+^F2 + A .) + Ac 2-25
This model expresses the factorial structure o f C and incorporates two distinct
sources o f error: (a) A ., representing lack o f fit as a result o f sampling error arising from
nonzero sample covariances of unique factors with each other and with common factors; and (b) Δ_c, representing lack of fit arising from model error in the sample.

MacCallum and Tucker (1991) explained the population and sample models as follows:
In the population, there is one source o f error called model error. It arises from lack
o f correspondence between the model and the real world in the sense that the
measured variables w ill not be exact linear combinations o f the common and unique
factors.... In the sample, five distinct sources o f error have been identified. One
such source is model error in the sample, arising in the same manner as in the
population. In addition, there are four distinct sources o f sampling error that
influence solutions. One involves sampling variability in the common-factor
covariances. The estimates o f factor loadings are, however, affected by other
sources o f sampling error. One involves sampling error arising from nonzero
covariances o f unique factors with each other and with common factors. As noted in
the sample model, the violation of the assumption that such covariances are zero in
the sample gives rise to a primary source of lack of fit of the model. The last two
are what the researcher discusses in this study. ... The final two sources of sampling
error arise from standardization o f common factors and measured variables in the
sample. In general, overall fit would not be affected by standardization of measured
variables and common factors in exploratory factor analysis (Cudeck, 1989). ...
However, model error in the sample and error arising from nonzero sample
covariances involving unique factors will result in a poorer fit between the model
and the sample data.
MacCallum and Tucker’s (1991) models clearly show the sources o f error, which is
very useful in selecting parameter values to use in generating the population correlation
matrices. In this study, the researcher will focus on the sources o f error involving
sampling variability in the common factor covariance structure and arising from nonzero
covariances o f the unique factors with each other and with the common factors. To
investigate the effect o f sample size on these sources o f error in various population
conditions involving the number o f factors, the ratio o f variables to factors, and the level
o f communality, the assumption o f no model error in the population and sample model is
made. The procedure used to generate population correlation matrices is based on this
assumption.
Why Use MLFA

Many different methods exist to extract common factors. The method used in this study is maximum likelihood factor analysis (MLFA). Several reasons have been given for preferring other extraction methods:

(2) A study may involve so many variables that its dimensions will not fit the capacity of the computer.

(3) The ordinary user does not even know about maximum likelihood factor analysis.

(4) The occurrence of Heywood cases in maximum likelihood factor analysis estimates leads some researchers to recommend other analyses that do not
It is obvious that the first three reasons to use other methods stem from the
limitations o f computers. However, the advent o f high-speed computers has made these
three reasons untenable today. In the case o f Heywood results, the fourth reason is
questionable. According to McDonald (1985), a Heywood result may indicate that the study has not been well designed, in the sense that not enough variables have been included to define each factor adequately. He argues that Heywood cases are not a reason to reject use of maximum likelihood factor analysis, but a caution about the design of the study.

The reason we have chosen the ML method for this study is the same as MacCallum et al.'s (1999):
Maximum likelihood estimation is based on the assumption that the common factor
model holds exactly in the population and that the measured variables follow a
multivariate normal distribution in the population, conditions that are inherent in the
simulation design and that imply that all lack o f fit and error o f estimation are due to
sampling error, which is our focus.
How MLFA Works

Lawley (1940) made a major breakthrough with the development of equations for the maximum likelihood estimation of factor loadings, and he also provided a framework for statistical testing in factor analysis. A more condensed derivation of the method appeared in later treatments of factor analysis. Howe (1955) showed that the maximum likelihood estimators of the factor loadings can be derived without distributional assumptions about the variates, and he also provided a Gauss-Seidel computing algorithm that, according to Mulaik (1972), was far superior to Lawley's for obtaining these estimates.
It has not been found possible to establish exact conditions under which the above
procedure converges, but in practice this is usually the case. Convergence is,
however, often very slow and, as Howe (1955) has pointed out, it is possible for
differences between successive iterates to be extremely small and yet to be far from
the exact solution.
Joreskog (1967) developed a new computational method, arising from private correspondence with Lawley, which has the advantage that the iterative procedure always converges to the desired solution. His follow-up work (Joreskog, 1975) demonstrated how the iterations can converge much more quickly. This method was used in this study, and its details can be described as follows.
x = μ + Λf + e    2-26

where x is a vector of p measured variables, μ is the population mean vector, f is a vector of f common factor scores, e is a vector of p residuals representing the combined effect of specific factors and random error, and Λ = (λ_jk) is a p × f matrix of factor loadings.
The residuals e are assumed to be uncorrelated with each other and with the common factors. The covariance matrices of f, e, and x are denoted respectively by Φ, Ψ², and Σ.
Joreskog (1975) assumed that the common factors have unit variance, so the diagonal elements of Φ are unities. If, in addition, for f > 1, the common factors are orthogonal or uncorrelated, then the nondiagonal elements of Φ are zeros and thus Φ = I. The covariance matrix of x then has the structure
Σ = ΛΦΛ′ + Ψ²    2-27
Equations 2-26 and 2-27 represent a model for a population of individuals. The parameters μ, Λ, Φ, and Ψ² characterizing the population are usually unknown and must be estimated from a sample of N observations, from which we compute the sample mean vector x̄ and the sample covariance matrix S = (s_ij), where

s_ij = (1/(N − 1)) Σ_{α=1..N} (x_αi − x̄_i)(x_αj − x̄_j)

The estimation problem is then to fit a matrix Σ of the form 2-27 to an observed covariance matrix S.
When f > 1, there is more than one common factor, and it is necessary to remove an element of indeterminacy in the basic model before the procedure for minimizing M can be applied.
This indeterminacy arises from the fact that there exist nonsingular linear transformations of the common factors that change Λ, and in general also Φ, but leave Σ, and therefore the function, unaltered. So we must impose some additional restrictions to obtain a unique solution. After these two steps (finding the equation of M and solving it), we can obtain the estimates.
Let γ₁ ≥ γ₂ ≥ . . . ≥ γ_p be the eigenvalues of Ψ^(−1/2)SΨ^(−1/2), and let w₁, w₂, . . . , w_p be an orthonormal set of corresponding eigenvectors. Let Γ₁ = diag(γ₁, . . . , γ_f) and Ω₁ = [w₁, w₂, . . . , w_f].
the mathematical procedures. The computational procedures used in this study combine Joreskog's (1967, 1975) procedure and Johnson and Wichern's (1998) scheme:
1. Choose initial estimates of the unique variances: ψ_i² = (1 − f/(2p)) (1/s^ii), where s^ii is the i-th diagonal element of S⁻¹.

2. Using the given Ψ², compute the first f distinct eigenvalues, γ₁ > γ₂ > . . . > γ_f, and the corresponding eigenvectors w₁, . . . , w_f of Ψ^(−1/2)SΨ^(−1/2), and obtain

Λ = Ψ^(1/2) Ω₁ (Γ₁ − I)^(1/2)    2-33

3. Using Λ, obtain a new Ψ² from the diagonal of S − ΛΛ′; the values ψ₁², ψ₂², . . . , ψ_p² obtained are the diagonal elements of S − ΛΛ′.

Steps (2) and (3) are repeated until convergence is achieved, i.e., until the differences between successive values of ψ_i² are negligible; Joreskog (1967), for example, stopped when successive values of ψ_i² agreed to a specified tolerance.
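The iterative scheme in steps (1) through (3) can be sketched as follows. This is an illustrative reading of the procedure, not the program used in the study; the function name and the demonstration matrices are our own.

```python
import numpy as np

# Sketch of the iteration in steps (1)-(3): Equation 2-33 for the loadings,
# then a new Psi^2 from the diagonal of S - Lam Lam'. Illustrative only.
def mlfa_iterate(S, f, n_iter=500, tol=1e-8):
    p = S.shape[0]
    psi2 = (1.0 - f / (2.0 * p)) / np.diag(np.linalg.inv(S))  # starting values
    for _ in range(n_iter):
        root = np.diag(psi2 ** -0.5)
        gamma, W = np.linalg.eigh(root @ S @ root)   # eigenvalues ascending
        gamma, W = gamma[::-1], W[:, ::-1]           # sort descending
        G1 = np.diag(np.sqrt(np.maximum(gamma[:f] - 1.0, 0.0)))
        Lam = np.diag(psi2 ** 0.5) @ W[:, :f] @ G1   # Equation 2-33
        new_psi2 = np.diag(S - Lam @ Lam.T).copy()
        if np.max(np.abs(new_psi2 - psi2)) < tol:    # negligible change: stop
            psi2 = new_psi2
            break
        psi2 = new_psi2
    return Lam, psi2

# Demonstration on a covariance matrix with an exact two-factor structure.
Lam0 = np.array([[.8, 0], [.7, 0], [.6, 0], [0, .8], [0, .7], [0, .6]])
psi0 = 1.0 - np.diag(Lam0 @ Lam0.T)
S_demo = Lam0 @ Lam0.T + np.diag(psi0)
Lam_hat, psi2_hat = mlfa_iterate(S_demo, 2)
```

With an exactly structured S, the converged solution reproduces S as ΛΛ′ plus the diagonal of unique variances; the Heywood problem discussed next corresponds to some ψ_i² going negative during this update.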
It often happens that some ψ_i² become negative. Such a solution is inadmissible and is said to be improper, or a Heywood case. Joreskog (1967) provided criteria to deal with Heywood cases: when some ψ_i² are smaller than ε (0.01), the estimation proceeds with those values held at the boundary. Let T be a matrix such that

R⁻¹ = TT′    2-35

and let d₁ > d₂ > . . . > d_p be the eigenvalues of T′(R − Ψ²)T = I_p − T′Ψ²T, with u₁, u₂, . . . the corresponding eigenvectors. In this study, when some ψ_i² became negative, the negative ψ_i² were handled by Joreskog's (1967) criteria.
Simple Structure
After the estimate of the factor loading matrix Λ (of order p × f) has been found, one further issue must be mentioned. When the number of columns, f, is greater than 1, there is some ambiguity associated with the factor model. To demonstrate this, let T be any f × f orthogonal matrix, so that TT′ = T′T = I; then

ΛΛ′ = ΛTT′Λ′ = Λ*Λ*′, where Λ* = ΛT    2-36
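The indeterminacy in Equation 2-36 is easy to verify numerically; the loading matrix and rotation angle below are illustrative only.

```python
import numpy as np

# Demonstration of Equation 2-36: an orthogonal T leaves Lam Lam' unchanged,
# so Lam and Lam* = Lam T fit the data equally well. Values are illustrative.
rng = np.random.default_rng(0)
Lam = rng.standard_normal((6, 2))
theta = 0.7
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Lam_star = Lam @ T
```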
This creates a problem at the stage at which we wish to understand our results. A resolution is contained in the notion of simple structure, which was advocated by Thurstone (1935). According to Harman (1976, p. 98), Thurstone's original three conditions for simple structure were:

1. Each row of the factor structure should have at least one zero.

2. Each column should have at least m zeros (m being the total number of common factors).

3. For every pair of columns there should be at least m variables whose entries vanish in one column but not in the other.

Thurstone held that if a reference "frame can be found such that each test vector is contained in one or more of the . . . coordinate hyperplanes, then the combined frame and configuration is called a simple structure" (p. 328). In the same book, two further conditions were added, and the full set of five conditions for simple structure was stated as follows:
1. Each row of the factor matrix should have at least one zero.

2. If there are m common factors, each column of the factor matrix should have at least m zeros.

3. For every pair of columns of the factor matrix, there should be several variables whose entries vanish in one column but not in the other.
4. For every pair of columns of the factor matrix, a large proportion of the variables should have vanishing entries in both columns when there are four or more factors.

5. For every pair of columns of the factor matrix, there should be only a small number of variables with nonvanishing entries in both columns.
McDonald (1985) believed that these five rules, resting partly on experience, are supposed to legislate an unambiguous choice among the alternative solutions that might otherwise be entertained. Mulaik (1972) mentioned that the simple-structure criteria do not necessarily require orthogonal reference axes; all that these criteria require of the m reference axes is that they be a set of linearly independent vectors in the common-factor space. Mulaik (1972) thought the basic idea is that in a simple-structure factor solution each variable is accounted for by fewer than the total number of common factors obtained in the analysis.

Therefore, once the estimated factor loading matrix Λ is obtained, it will be rotated according to the criteria of simple structure, to make the loadings more interpretable. As Johnson and Wichern (1998, p. 546) said, rotation in factor analysis may be likened ". . . to sharpening the focus of a microscope in order to see the detail more clearly."
Varimax Rotation

Many different methods of rotation exist. Some preserve orthogonality; some do not. The choice of rotation method is often subjective. In this study, the Varimax method was chosen as the rotation method because it is the most commonly used method (Dallas,
V = (1/p²) Σ_{j=1..f} [ p Σ_{i=1..p} (g_ij/h_i)⁴ − ( Σ_{i=1..p} (g_ij/h_i)² )² ]    2-37

where g_ij is the loading of the i-th variable on the j-th factor and h_i² is the communality of the i-th variable.

Kaiser (1958) proved that two factors could be rotated to maximize V, with the rotation angle φ satisfying

tan 4φ = ( D − 2AB/p ) / ( C − (A² − B²)/p )    2-39

where x_i = f_i1/h_i and y_i = f_i2/h_i are the normalized loadings of the i-th variable on the two factors, u_i = x_i² − y_i², v_i = 2x_i y_i, A = Σ u_i, B = Σ v_i, C = Σ (u_i² − v_i²), D = 2 Σ u_i v_i, f_ij is the factor loading, and h_i is the i-th communality.
Then, using Table 2 (Harman, 1976, p. 287), we can find the angle φ that maximizes V.

Table 2

The criteria for φ

Numerator  Denominator  Quadrant of 4φ  Range of φ
+          +            I               0° to 22.5°
+          −            II              22.5° to 45°
−          −            III             −45° to −22.5°
−          +            IV              −22.5° to 0°

Factors are rotated two at a time, and after each complete cycle over all pairs the value of V is calculated. In this study, when the difference between the values of V for two consecutive cycles is smaller than 0.0001, the rotation procedure is stopped and the Λ rotated from the final cycle is the result.
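One pairwise rotation step can be sketched as follows. Using the two-argument arctangent selects the correct quadrant of 4φ automatically, which serves the purpose of Table 2; the function name and the test pattern are illustrative, not from the dissertation.

```python
import numpy as np

# Sketch of one pairwise step of Equation 2-39; np.arctan2 picks the quadrant
# of 4*phi automatically, playing the role of Table 2. Names are illustrative.
def varimax_pair_angle(x, y):
    """Rotation angle phi for two columns of normalized loadings x and y."""
    p = len(x)
    u = x**2 - y**2
    v = 2.0 * x * y
    A, B = u.sum(), v.sum()
    C = (u**2 - v**2).sum()
    D = 2.0 * (u * v).sum()
    return 0.25 * np.arctan2(D - 2.0 * A * B / p, C - (A**2 - B**2) / p)

# A perfectly simple pattern needs no rotation; the same pattern rotated by
# 10 degrees is rotated back by (approximately) -10 degrees.
x0 = np.array([1.0, 1.0, 0.0, 0.0])
y0 = np.array([0.0, 0.0, 1.0, 1.0])
theta = np.deg2rad(10.0)
x1 = np.cos(theta) * x0 + np.sin(theta) * y0
y1 = -np.sin(theta) * x0 + np.cos(theta) * y0
```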
Procrustes Rotation

Mosier (1939) sought a transformation of an arbitrary factor solution that would lead to a least-squares fit to a specified factor pattern, but no exact solution was then available. Because of this restriction, Mosier suggested a method for obtaining an approximate least squares fit. Green (1952) solved this problem of orthogonal rotation to a least squares fit, and later work made the solution applicable to matrices A and B that are of less than full column rank. The methods mentioned previously all need two matrices A and B that are fully specified; methods for obtaining a least squares fit to a partially specified target matrix were presented by Browne (1972a, 1972b).

The most frequently used criterion in deriving solutions for Procrustes matching is least squares: finding the transformation which, when applied to the matrix A, will produce the greatest similarity between AT and B. Such least squares solutions are generally useful and have good statistical properties. However, Korth and Tucker (1976) argued that, in dealing with the matching of factor patterns, a different kind of criterion is preferable; this kind of criterion would involve normalization. To describe their idea, Korth and Tucker presented the patterns in Table 3.
R ep ro d u ced with p erm ission o f the copyright ow ner. Further reproduction prohibited w ithout p erm ission .
33
Table 3
According to Korth and Tucker (1976), this adjustment produced a strikingly different pattern. Pattern III contains coefficients that are half the values of the coefficients in Pattern I. The interpretation of this pattern would be very similar to that of Pattern I, but the sum of the squared differences for Patterns I and II is smaller than that for Patterns I and III.

As Korth and Tucker (1976) noted, a factor is likely to fluctuate in its importance from situation to situation, and hence all of the coefficients are expected to fluctuate along with it. A criterion that captures similarity in this context is the congruence coefficient. When congruence coefficients are calculated, all three factors in Pattern I and Pattern III have congruence coefficients of 1.0, while the factors of Patterns I and II have congruence coefficients of 0.992, 0.893, and 0.749, respectively. Korth and Tucker (1976) considered that although these three coefficients appear substantial, they represent poor matching of the factors.
The Procrustes method originated with Mosier (1939), who sought an approximate least squares transformation minimizing

tr(E′E), where E = AT − B    2-41

Korth and Tucker (1976) proved that Mosier's (1939) approximate solution to this problem is equivalent to maximizing the congruence coefficient. This procedure can be separated into two parts, minimization and normalization. First, T* is found as the minimizer of tr(E′E) (Equation 2-42). The congruence coefficient for column k (factor k), written in matrix form in Equation 2-43, can also be written as:

φ_k = Σ_{j=1..p} f_jk(s) f_jk(t) / [ ( Σ_{j=1..p} f_jk(s)² ) ( Σ_{j=1..p} f_jk(t)² ) ]^(1/2)    2-44
where f_jk(s) indicates that the factor loading comes from the sample solution, and f_jk(t) indicates that the loading comes from the target solution.
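Equation 2-44 can be sketched as a short function (names are illustrative). Note that, as in the Korth and Tucker (1976) Pattern I versus Pattern III comparison, rescaling every loading in a column leaves the coefficient unchanged.

```python
import numpy as np

# Sketch of Equation 2-44: column-wise congruence coefficients between a
# sample loading matrix and a target loading matrix. Names are illustrative.
def congruence(f_s, f_t):
    num = (f_s * f_t).sum(axis=0)
    den = np.sqrt((f_s**2).sum(axis=0) * (f_t**2).sum(axis=0))
    return num / den

# Halving every loading (Pattern III vs Pattern I) still gives coefficients
# of 1.0, which is the point of the normalization built into the criterion.
F = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.5]])
```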
CHAPTER III
METHODOLOGY
Tucker, Koopman, and Linn (1969) developed a procedure to generate simulated correlation matrices that tend to produce a fairly strong simple structure. By specifying a major factor domain, the number of factors in the major domain, and the number of variables, their procedure can easily generate a correlation matrix.
Although Tucker et al.'s (1969) original aim was to use this procedure to study the effectiveness of factor analytic methods, the matrices generated by the procedure can also be used as the population correlation matrices in this study. It permits us to control the number of variables, the number of common factors, and the level of communality.
This study requires 180 combinations of numbers of variables, common factors, and levels of communality. Tucker et al.'s (1969) procedure can be used with the SAS/IML program to produce these 180 combinations. (Tucker et al.'s procedure can be used only when the number of factors is larger than or equal to two; hence another procedure was used when the number of factors was equal to one.)
Tucker et al. (1969) considered three different types of factors: Type 1 is major factors, Type 2 is minor factors, and Type 3 is unique factors. Tucker et al. used the subscript s to represent the type of factor. The number of factors of each type was designated by M_s, and the factors of each type were designated by the subscript m_s:

m_s = 1, 2, 3, . . . , M_s

Variables were designated by the subscript j or j′, with the number of variables being J; thus:

j or j′ = 1, 2, 3, . . . , J
For each type of factor, there is a matrix A_s with entries of "actual input factor loadings"; A_s is a matrix of order J × M_s (i.e., A_s has a row for each variable and a column for each factor of type s). Tucker et al. then defined a matrix A_s′ for each matrix A_s by adjusting the rows of A_s to unit length (Equation 3-1), so that

R = B₁P₁B₁ + B₂P₂B₂ + B₃P₃B₃    3-2
where B₁, B₂, and B₃ (in general B_s) are diagonal matrices with entries b_1j, b_2j, and b_3j (in general b_sj). These entries b_sj were restricted to being real, positive numbers such that:

b_1j² + b_2j² + b_3j² = 1    3-3

The matrix A_s of actual input factor loadings may be defined in terms of the unit-length matrix A_s′ as A_s = B_s A_s′ (Equation 3-4), so that

R = A₁A₁′ + A₂A₂′ + A₃A₃′    3-5
When B₂ is zero, the simulation model is identical to the formal model, and B₁ contains the square roots of the communalities.

The central feature of the simulation model is the development of the matrices A_s of "actual input factor loadings." The input loadings for the minor factors and unique factors were set to zero conceptually, which represents the only sensible idea that the designer of the variables could have about them.

Comments about the matrix A₁ for factors in the major factor domain and the procedure used in Tucker et al. (1969) are given in the following paragraphs. One thing must be mentioned: using this procedure, the number of variables for each
factor is not fixed. Therefore, with the same ratio of variables to factors, the assignments of variables to factors could be very different. For example, 18 variables and 3 factors produce a p/f ratio equal to 6. Using this procedure, two different assignments, (1) 6 variables per factor, and (2) 9 variables for one factor, 8 variables for another factor, and 1 variable for the third, can both be randomly generated.
Conceptual Input Factor Loadings for Factors in the Major Factor Domain

The conceptual input factor loadings represent the case when only vague ideas exist about the major factor domain. First, relative conceptual input loadings were developed for each variable, constituting a row vector; then the vector was adjusted to unit length by a multiplying factor. The relative conceptual loadings were developed by the following procedure.

For an f₁-factor major domain, the sum of the loadings for each variable was controlled at (f₁ − 1). The first loading was an integer in the range 0 through (f₁ − 1) (with equal probability) on a randomly chosen factor. The second loading was in the range from 0 through (f₁ − 1) − a₁ (where a₁ is the value of the first loading) on one of the remaining factors, and so on. It is to be noted that this procedure tended to produce a fairly strong simple structure. Table 4 gives an example with 12 variables and 4 major factors, which is used to show what will result from the previous procedure.
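The procedure just described can be sketched as follows. This is one reading of the procedure (in particular, we let the last randomly chosen factor receive the remainder so that each row sums to f₁ − 1), and the helper names are our own. Normalizing a row reproduces the corresponding row of the conceptual input matrix in Table 5; for example, normalize([0, 2, 1, 0]) reproduces the first row.

```python
import math
import random

# Sketch of the relative-loadings procedure (our reading: integer loadings
# assigned to factors in random order, with the last factor taking the
# remainder so that each row sums to f1 - 1). Helper names are our own.
def relative_conceptual_row(f1, rng):
    factors = list(range(f1))
    rng.shuffle(factors)
    row = [0] * f1
    remaining = f1 - 1
    for m in factors[:-1]:
        a = rng.randint(0, remaining)   # integer loading, equal probability
        row[m] = a
        remaining -= a
    row[factors[-1]] = remaining        # remainder keeps the row sum at f1 - 1
    return row

def normalize(row):
    """Adjust a row vector to unit length (relative -> conceptual loadings)."""
    s = math.sqrt(sum(v * v for v in row))
    return [v / s for v in row]
```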
Table 4
The relative conceptual loadings matrix
0 2 1 0
0 1 1 1
1 0 2 0
0 1 1 1
1 1 1 0
0 0 2 1
3 0 0 0
0 0 0 3
0 0 3 0
2 0 1 0
1 1 1 0
0 0 0 3
And the following is A₁′, the conceptual input factor loadings of the major factor domain, obtained by adjusting each row vector of the relative conceptual loadings matrix to unit length.
Table 5
The conceptual input factor loadings matrix
0         0.8944272 0.4472136 0
0         0.5773503 0.5773503 0.5773503
0.4472136 0         0.8944272 0
0         0.5773503 0.5773503 0.5773503
0.5773503 0.5773503 0.5773503 0
0         0         0.8944272 0.4472136
1         0         0         0
0         0         0         1
0         0         1         0
0.8944272 0         0.4472136 0
0.5773503 0.5773503 0.5773503 0
0         0         0         1
After A₁′, the conceptual input factor loadings matrix of the major factor domain, is generated, a three-step procedure is utilized to develop the matrix A₁ of actual input factor loadings for the major factor domain from the matrix A₁′ of conceptual input factor loadings.
First step:

In this step, the conceptual input factor loadings are combined with random normal deviates x_jm₁ corresponding to each (a₁)_jm₁. The output of the first step, (y₁)_jm₁, is defined by:

(y₁)_jm₁ = c_m₁ (a₁)_jm₁ + d_1j x_jm₁ (1 − c_m₁²)^(1/2)    3-6

where c_m₁ is a constant for each factor m₁ and d_1j is a constant for each variable j. The constant d_1j is used to normalize each row of x to a unit length vector and is defined by:

d_1j = ( Σ_{m₁} x_jm₁² )^(−1/2)    3-7

The constant c reflects the strength of the conceptions the experimenter has about the loadings of the actual variables on the factors. Values of c used in Tucker et al.'s study were 0.7, 0.8, or 0.9, chosen at random with equal probability for each factor.
Table 6
The matrix of c

0.9 0   0   0
0   0.9 0   0
0   0   0.8 0
0   0   0   0.9
Using this matrix of c and formula 3-6, the matrix y₁ is generated as shown in Table 7.
Table 7
The matrix of y₁
0.26301   0.6588971 0.2268473 0.3007285
0.1291547 0.8099287 0.1681853 0.311018
0.3145179 0.0670379 1.2948148 0.025822
0.3724287 0.6808306 0.5192825 0.3660985
0.1923219 0.3392633 0.6429454 0.1817946
-0.019473 -0.154742 1.2176335 0.5831157
1.0651629 -0.329524 0.0783715 0.225596
0.1908908 0.2127205 0.4266397 0.7893482
0.1546132 0.2256259 0.9980241 -0.307395
0.9791882 0.1543488 -0.124923 0.1134055
0.3173704 0.5050336 0.2188414 0.3430885
0.2254766 -0.148122 0.238238 1.1954063
Second step:

The second step uses a "skewing function," which was introduced to reduce and limit the negativity of the factor loadings (y₁)_jm₁. This function produces coefficients (z₁)_jm₁ as given in Equation 3-8; it depends on a constant k between 0 and 1 inclusive. Tucker et al. (1969) used a value of k = 0.2, which was also used in this study. Using formula 3-8, the matrix z₁ is generated in Table 9. Then each row vector of z₁ is adjusted by a normalizing constant g_1j (Equation 3-9).
Table 8

The matrix of k

Table 9

The matrix of z₁
Table 10

The matrix of A₁′

The final step in developing the matrix A₁ of actual input factor loadings for the major domain is:

A₁ = B₁A₁′    3-10

In this example, the matrix B₁ is shown in Table 11 and the matrix A₁ of actual input factor loadings for the major domain is shown in Table 12.
Table 11
The matrix of B₁
0.447 0 0 0 0 0 0 0 0 0 0 0
0 0.632 0 0 0 0 0 0 0 0 0 0
0 0 0.447 0 0 0 0 0 0 0 0 0
0 0 0 0.447 0 0 0 0 0 0 0 0
0 0 0 0 0.632 0 0 0 0 0 0 0
0 0 0 0 0 0.447 0 0 0 0 0 0
0 0 0 0 0 0 0.447 0 0 0 0 0
0 0 0 0 0 0 0 0.547 0 0 0 0
0 0 0 0 0 0 0 0 0.547 0 0 0
0 0 0 0 0 0 0 0 0 0.632 0 0
0 0 0 0 0 0 0 0 0 0 0.547 0
0 0 0 0 0 0 0 0 0 0 0 0.632
Table 12
The matrix of A₁
There are two popular methods used to generate sample correlation matrices. One was developed by Kaiser and Dickman (1962) and the other was proposed by Wijsman (1959). Kaiser and Dickman's (1962) procedure produces sample correlation matrices from the basic equation of factor analysis,

Z = XF    3-11

where F is a factoring of the population correlation matrix and X is an N × p matrix whose elements are randomly generated from a normal distribution with mean 0 and variance 1. However, Hong (1998) stated that Wijsman's (1959) procedure could reduce computation time because it does not require generating a score matrix (X of order N × p is the score matrix). Therefore, Wijsman's method was used in the
present study to generate sample correlation matrices.
Wijsman's procedure begins with a factoring of the population correlation matrix P,

P = FF′    3-12

and forms the matrix

A = FGG′F′    3-13

where a random matrix G of order p × p is generated. The off-diagonal entries of G are random normal deviates, drawn from a normal distribution with mean 0 and variance 1. The diagonal element in column j is the positive square root of a random chi-square value with degrees of freedom n − j, where n is N − 1. Then, using matrix A, the sample covariance matrix C can be obtained such that:

C = A / n    3-14

Finally, the sample correlation matrix is

R = D^(−1/2) C D^(−1/2)    3-15

where D is a diagonal matrix whose elements are the corresponding diagonal entries of C.
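Equations 3-12 through 3-15 can be sketched as below. This is an interpretation of the construction, using a lower-triangular G (the Bartlett form) and the degrees of freedom n − j stated in the text; the function name and the demonstration matrix are illustrative, not from the dissertation.

```python
import numpy as np

# Sketch of Equations 3-12 to 3-15 with a lower-triangular G (Bartlett form);
# an interpretation of the text, not the dissertation's own program.
def sample_correlation(P, N, rng):
    p = P.shape[0]
    n = N - 1
    F = np.linalg.cholesky(P)                        # P = F F'  (3-12)
    G = np.tril(rng.standard_normal((p, p)), k=-1)   # normal deviates below diag
    df = n - np.arange(1, p + 1)                     # chi-square df n - j
    np.fill_diagonal(G, np.sqrt(rng.chisquare(df)))
    A = F @ G @ G.T @ F.T                            # (3-13)
    C = A / n                                        # (3-14)
    d = np.sqrt(np.diag(C))
    return C / np.outer(d, d)                        # (3-15)

P_demo = np.array([[1.0, 0.5, 0.3],
                   [0.5, 1.0, 0.4],
                   [0.3, 0.4, 1.0]])
R_demo = sample_correlation(P_demo, 200, np.random.default_rng(0))
```

By construction the result has unit diagonal and, for a large N, fluctuates around the population matrix P.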
Analysis Procedure

This study is a Monte Carlo study investigating the relationship among the number of measured variables, the number of factors, the level of communality, and the sample size. The analysis proceeded in four steps:

Step (1) Generate 100 population correlation matrices for a given number of factors, number of variables, and level of communality.

Step (2) Generate 100 sample correlation matrices for each of the 100 population correlation matrices at a given sample size.

Step (3) Factor-analyze the sample correlation matrices and the related population correlation matrix; the correspondence between the 100 sample solutions and their corresponding population solution will be calculated. Then it can be found how the sample size affects this correspondence.

Step (4) By using two criteria, determine the minimum necessary sample size for each of the 180 conditions.

In order to investigate the relationship among the number of common factors, the number of measured variables, the level of communality, and the sample sizes, we need a variety of population conditions. The conditions of population correlation matrices used in this study are listed below.
1) Three levels of communality. High: 0.6, 0.7, or 0.8; wide: 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, or 0.8; low: 0.2, 0.3, or 0.4.

2) Numbers of common factors from one to six.

3) Ratios of the number of variables to the number of factors (p/f) ranging from 3 to 12.

For each level of communality, each number of common factors was used to generate population correlation matrices; the number of variables in these matrices was related to a particular ratio of p/f. For example, if the level of communality is high and the number of common factors is 5, then we would generate matrices with 15, 20, . . . , 60 variables. In all, there are 3 (levels of communality) times 6 (numbers of common factors from one to six) times 10 (p/f ratio values 3 to 12), or 180 different conditions.
Tucker et al. (1969) conducted a Monte Carlo study that provided an appropriate basis for this design; Tucker et al. generated 18 matrices in their paper and used the 18 matrices to study the effectiveness of factor analytic methods. MacCallum et al. (1999) used nine population correlation matrices (three levels of communality: high, wide, low, and three ratios of p/f: 10/3, 20/3 and 20/7).
For each of these nine population correlation matrices, MacCallum et al. generated sample correlation matrices at four levels of sample size (60, 100, 200, 400); 100 sample correlation matrices were produced for each level of sample size in each of the nine conditions.
MacCallum et al.'s study provides a general conception about the effect of sample size. However, MacCallum et al.'s study was limited in that only one population correlation matrix was used for each situation. To investigate the more general effect of sample size, this study uses many population correlation matrices for each situation and a larger variety of sample sizes for each population correlation matrix. To accomplish this goal, 100 population correlation matrices will be generated for each of the 180 situations using Tucker et al.'s procedure (Tucker et al., 1969). Hence, 18,000 population correlation matrices were generated in all.
Then, sample correlation matrices are generated from each of these 18,000 population correlation matrices, using a small sample size as a starting point. The first sample size used in the procedure depends on the number of variables. The sample size then increases in steps:

(3) When the sample size is between 100 and 300, it increases by 10.

(4) When the sample size is between 300 and 500, it increases by 50.
This procedure is stopped when the results for the population and sample correlation matrices match both of two criteria; the criteria use the coefficient of congruence to represent the similarity of two solutions and will be described later. If these two criteria cannot be matched when the sample size is greater than 5000, then the procedure is also stopped. Sample data were generated from multivariate normal distributions.
One hundred sample correlation matrices were generated for each sample size until the sample size made the coefficients of congruence match both of the criteria. Each of these sample matrices was analyzed using maximum likelihood factor analysis. The retained number of factors was set equal to the known number of factors in the population (i.e., from one to six). Consistent with this assumption, maximum likelihood factor analysis, which is based on the assumption that the measured variables follow a multivariate normal distribution in the population, was used.
Rotation Method
In order to carry out a comparison between the solution obtained from each sample correlation matrix and the solution from the corresponding population correlation matrix, and because all sample and population solutions can be freely rotated, we need to consider the issue of rotation (MacCallum et al., 1999). MacCallum et al. (1999) used direct quartimin rotation, an oblique analytical rotation method, to rotate their population solutions. They thought that the relationships among the factors would be unknown in practice, so a less restrictive oblique rotation would be more appropriate. However, the
population factors were orthogonal in Tucker et al.'s (1969) design and, following Johnson and Wichern (1998), the Varimax method, an orthogonal rotation method, was used to rotate both the population and sample solutions.
Coefficient of Congruence

The coefficient of congruence was computed between each factor from the sample solution and the corresponding factor from the population solution:

φ_k = Σ_{j=1..p} f_jk(s) f_jk(t) / [ ( Σ_{j=1..p} f_jk(s)² ) ( Σ_{j=1..p} f_jk(t)² ) ]^(1/2)

where f_jk(t) is the population factor loading for variable j on factor k and f_jk(s) is the corresponding sample factor loading. Because the population loadings are known exactly (they were fixed when we generated this population correlation matrix), we compute the mean value of the f coefficients of congruence:

K = (1/f) Σ_{k=1..f} φ_k
where K is the average value of φ_k. However, f! different Ks can be obtained from an f-factor condition by rearranging the order of the f columns. In this study, the maximum K value of these f! Ks will be used to represent the closest correspondence between the sample correlation matrix's rotated MLFA solution and the corresponding population correlation matrix's rotated MLFA solution. Therefore, 100 Ks will be obtained from a population correlation matrix and its related sample correlation matrices. These 100 Ks will be sorted by their values as K₍₁₎ ≤ K₍₂₎ ≤ . . . ≤ K₍₁₀₀₎. Then the value of (K₍₅₎ + K₍₆₎)/2 will be used to represent the lower boundary of the 95% confidence interval for this population correlation matrix at this particular sample size level. This lower boundary will be denoted K₉₅. For each of the 180 conditions with a specific sample size, 100 K₉₅s will be obtained by this procedure.
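The K statistic and the K₉₅ lower boundary can be sketched as follows. This is an illustrative reading (congruence is averaged over matched columns and maximized over the f! column orders; sign reflections of factors are not handled), and the function names are our own.

```python
import numpy as np
from itertools import permutations

# Sketch of K (mean congruence, maximized over the f! column orders) and of
# the K95 lower boundary (K(5) + K(6)) / 2. Illustrative names.
def K_statistic(F_s, F_t):
    f = F_s.shape[1]
    def phi(a, b):
        return (a * b).sum() / np.sqrt((a**2).sum() * (b**2).sum())
    best = -np.inf
    for perm in permutations(range(f)):
        k = np.mean([phi(F_s[:, perm[j]], F_t[:, j]) for j in range(f)])
        best = max(best, k)
    return best

def K95(Ks):
    s = np.sort(np.asarray(Ks, dtype=float))   # K(1) <= ... <= K(100)
    return (s[4] + s[5]) / 2.0                 # (K(5) + K(6)) / 2
```

Maximizing over column orders means that a sample solution whose factors come out in a different order than the population solution is still matched correctly.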
Guidelines for interpreting the coefficient of congruence rate values of 0.98 to 1.00 as excellent, 0.92 to 0.98 as good, 0.82 to 0.92 as borderline, 0.68 to 0.82 as poor, and below 0.68 as terrible. In this study, R₉₂ is defined as the percentage of the 100 K₉₅s from one condition with a specific sample size that are larger than 0.92, and R₉₈ is defined as the percentage of the 100 K₉₅s from one condition with a specific sample size that are larger than 0.98.
For ease of interpretation, R₉₂ will be called the good-level criterion and R₉₈ will be called the excellent-level criterion.
The following two situations depict the “match the good-level criterion”.
(1) Three successive sample sizes’ R 92s are equal to or greater than 0.95.
(2) Two successive sample sizes’ RS2 s are equal to or greater than 0.95, the next
sample size’s R 92 is less than 0.95, and the next two successive sample sizes’
The following two situations depict the “match the excellent-level criterion”.
(1) Three successive sample sizes’ i?98 s are equal to or greater than 0.95.
(2) Two successive sample sizes’ Rss s are equal to or greater than 0.95, the next
sample size’s R gs is less than 0.95, and the next two successive sample sizes’
Using these two matching-situations with two criteria, this study tries to provide two
minimum necessary sample sizes for each o f 180 conditions and uses these m inim um
necessary sample sizes as an index to discuss the relationship between number o f factors,
number o f variable, ratio o f variable number to factor number, and the level o f
communality.
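The matching rule above can be sketched as a small helper (illustrative Python, not the study's SAS/IML code; `rs` is the sequence of R92 or R98 values in order of increasing sample size, and 0.95 is the threshold from the text):

```python
def matches_criterion(rs, threshold=0.95):
    """Return the first index at which the sequence of R values matches
    the criterion: either (1) three successive values >= threshold, or
    (2) two successive values >= threshold, one dip below, then two more
    successive values >= threshold.  Return -1 if never matched."""
    for i in range(len(rs) - 2):
        # Situation (1): three successive R values at or above threshold.
        if all(r >= threshold for r in rs[i:i + 3]):
            return i
        # Situation (2): two at/above, one below, then two at/above again.
        if (i + 5 <= len(rs)
                and rs[i] >= threshold and rs[i + 1] >= threshold
                and rs[i + 2] < threshold
                and rs[i + 3] >= threshold and rs[i + 4] >= threshold):
            return i
    return -1
```

The returned index corresponds to the smallest sample size at which the criterion is matched, which is how the minimum necessary sample size is read off.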
CHAPTER IV
RESULTS
In this study, two minimum necessary sample sizes for each of 180 different conditions
were obtained. To determine these two minimum necessary sample sizes for each condition, various sample
sizes were used in the calculation for each condition. There were 371,600 population correlation coefficient matrices
(100 sample correlation coefficient matrices for each population correlation coefficient
matrix).

The results of this study are organized in Table 13. Table 13 shows the minimum
necessary sample sizes for each set of conditions under two criteria; those obtained for 0.92
are considered to reflect good matching (the good-level criterion) and those obtained for 0.98
are considered to reflect excellent matching (the excellent-level criterion). When the number
of factors is 1, the minimum necessary sample sizes were calculated from unrotated maximum likelihood factor
loadings. When the number of factors is equal to or greater than 2, the coefficients of
congruence were calculated from rotated maximum likelihood factor loadings. There is no
minimum necessary sample size suggested in Table 13 for the excellent-level criterion
when the factor number is equal to 2, because the criterion can not be matched even when
the sample size is larger than ten thousand.

Factor-orientated and communality-orientated figures were generated from Table 13. The red numbers in Table
13 indicate those minimum necessary sample sizes that are not really minimum; they can
be smaller. However, due to the restriction of the sample correlation coefficient matrix
generating procedure, which requires that the sample size can not be less than the number of
variables plus 3, those sample sizes were used as the suggested minimum necessary sample
sizes in this study. The notation F1, F2, ..., F6 is used to represent factor numbers of 1, 2, ...,
6, respectively.
Table 13
The minimum necessary sample sizes of each condition under two criteria
Factor-Orientated Section
Figures 1a and 1b show the minimum necessary sample sizes for one factor with the ratios of variables to factors
ranging from 3 to 12 for the excellent-level criterion (0.98) and the good-level criterion (0.92).
The minimum necessary sample sizes decrease as the ratios of variables to factors increase
and the levels of communality become higher. In Figures 1a and 1b we see that both p/f

In Figure 1b, the curve for the high level of communality rises slightly after the p/f
ratio becomes larger than 7. In this study, there is a restriction that the sample size (N) and
the number of variables (p) must follow the rule N − p ≥ 3. So, in some conditions, the minimum
sample sizes will increase as the p/f ratio increases. All of the slightly rising
curves in this study occur for this reason. Only one of the rising
curves occurs when using the excellent-level criterion; all the other rising curves occur
    p/f ratio       3    4    5    6    7    8    9   10   11   12
    f1 low98      150   95   75   70   50   55   50   50   50   50
    f1 wide98     110   65   50   50   40   36   33   32   36   30
    f1 high98      32   27   21   19   18   18   17   15   16   15

Figure 1a. The minimum necessary sample sizes for one factor with the ratios of variables
to factors ranging from 3 to 12 for the excellent-level criterion (0.98).
    p/f ratio       3    4    5    6    7    8    9   10   11   12
    f1 low92       45   35   35   30   30   23   22   20   20   20
    f1 wide92      35   25   30   20   20   15   15   14   14   15
    f1 high92      13   13   11   12   11   11   12   13   14   15

Figure 1b. The minimum necessary sample sizes for one factor with the ratios of variables
to factors ranging from 3 to 12 for the good-level criterion (0.92).
Figure 2 shows the minimum necessary sample sizes for two factors with p/f ratios from
3 to 12 using the good-level criterion (0.92). When the number of factors is 2, the
excellent-level criterion could not be matched even with a sample size larger than ten
thousand, so there is no figure for the excellent-level criterion. As in Figures 1a and
1b, the minimum sample sizes for a factor number equal to 2 decrease as the p/f ratio
increases.

In Figure 2, it is clear that when the p/f ratio increases, the minimum necessary sample
sizes for the three different levels of communality become more alike. And, once the p/f ratios
are equal to or larger than 6, the minimum necessary sample sizes for each level of
communality decrease very slowly. For example, under high levels of communality, the
minimum necessary sample size decreases from 40 to 35 as the p/f ratio increases from 6 to
12.

In Figures 3 through 6, the minimum necessary sample sizes for four different factor
numbers (3, 4, 5, and 6) using the two criteria (0.98 and 0.92) are presented. The
relationships between the minimum necessary sample sizes, level of communality, and p/f
ratio for the 4 different factor numbers are similar to each other. And the minimum necessary
sample sizes decrease very slowly with increasing p/f ratio once the p/f ratios are equal
to or larger than 7.
    p/f ratio       3    4    5    6    7    8    9   10   11   12
    f2 low92      600  120   75   60   60   60   50   45   45   40
    f2 wide92     160   90   60   55   50   45   40   35   35   35
    f2 high92      90   75   45   40   40   40   35   35   35   35

Figure 2. The minimum necessary sample sizes for two factors with the ratios of variables
to factors ranging from 3 to 12 for the good-level criterion (0.92).
    p/f ratio       3    4    5    6    7    8    9   10   11   12
    f3 low98     1700  450  220  160  100  100   85   85   75   75
    f3 wide98    1700    ?  200  140  105   90   70   75   65   70
    f3 high98       ?  260  130   95   75   75   60   60   55   55
    (? = not legible in this reproduction.)

Figure 3a. The minimum necessary sample sizes for three factors with the ratios of
variables to factors ranging from 3 to 12 for the excellent-level criterion (0.98).
    p/f ratio       3    4    5    6    7    8    9   10   11   12
    f3 low92     1200  230   85   85   65   60   60   40   45   40
    f3 wide92     450  130   80   65   55   45   40   35   40   40
    f3 high92     170  120   65   50   40   30   30   35   40   40

Figure 3b. The minimum necessary sample sizes for three factors with the ratios of
variables to factors ranging from 3 to 12 for the good-level criterion (0.92).
[Chart for Figure 4a; the embedded data values are not legible in this reproduction.]

Figure 4a. The minimum necessary sample sizes for four factors with the ratios of
variables to factors ranging from 3 to 12 for the excellent-level criterion (0.98).
    p/f ratio       3    4    5    6    7    8    9   10   11   12
    f4 low92     1200  250  170  130   75   75   60   60   50   55
    f4 wide92     500  240  110   75   75   50   50   45   50   50
    f4 high92     260  170   90   55   55   40   40   45   55   55

Figure 4b. The minimum necessary sample sizes for four factors with the ratios of
variables to factors ranging from 3 to 12 for the good-level criterion (0.92).
    p/f ratio       3    4    5    6    7    8    9   10   11   12
    f5 low98     3000 1000  430  200  170  130  100  110   95  100
    f5 wide98    1400  900  300  200  150  130   90   85   85   85
    f5 high98    1000  450  260  200  130   75   80   65   60   65

Figure 5a. The minimum necessary sample sizes for five factors with the ratios of
variables to factors ranging from 3 to 12 for the excellent-level criterion (0.98).
[Chart for Figure 5b; the embedded data values are not legible in this reproduction.]

Figure 5b. The minimum necessary sample sizes for five factors with the ratios of
variables to factors ranging from 3 to 12 for the good-level criterion (0.92).
    p/f ratio       3    4    5    6    7    8    9   10   11   12
    f6 low98     3800 1400  400  260  140  130  120  110  105  110
    (The f6 wide98 and f6 high98 values are not legible in this reproduction.)

Figure 6a. The minimum necessary sample sizes for six factors with the ratios of variables
to factors ranging from 3 to 12 for the excellent-level criterion (0.98).
[Chart for Figure 6b; the embedded data values are not legible in this reproduction.]

Figure 6b. The minimum necessary sample sizes for six factors with the ratios of variables
to factors ranging from 3 to 12 for the good-level criterion (0.92).
Three conclusions emerge from these figures. First, higher levels of communality for
the same number of factors require smaller minimum sample sizes under each of these two
criteria. When p/f ratios are less than or equal to 5, the minimum necessary sample size for
low levels of communality can be triple that for high levels of communality, or even more.
In contrast, when the p/f ratio is equal to or greater than 7, the differences in minimum
necessary sample sizes between the three levels of communality become very small.
Table 14 shows the ranges of the minimum necessary sample sizes under 11 different
conditions.

Table 14
The ranges of minimum necessary sample sizes in 11 different conditions for p/f ratio = 7.
Second, the minimum necessary sample size will decrease as the p/f ratio increases.
However, if the p/f ratio is equal to or greater than 7, for any number of factors, the
minimum necessary sample size decreases very slowly.

Third, if the p/f ratio is equal to or greater than 6, the minimum necessary sample sizes
for the three levels of communality will be very much alike. In addition, as the p/f ratio
increases, the minimum necessary sample sizes for the three different levels of
communality become more alike.
Figures 7, 8, and 9 show the relationships between the p/f ratio and the minimum
necessary sample size. Figures 7a, 8a, and 9a present these relationships for three different
levels of communality, the excellent-level criterion, and 4 different factor numbers. Figures
7b, 8b, and 9b present these relationships for three different levels of communality, the
good-level criterion, and 5 different factor numbers.
[Chart for Figure 7a with series f3high98 through f6high98; the embedded data values are not legible in this reproduction.]

Figure 7a. The minimum necessary sample sizes for 4 different factor numbers and a high
level of communality with the ratios of variables to factors ranging from 3 to 12 for the
excellent-level criterion (0.98).
    p/f ratio       3    4    5    6    7    8    9   10   11   12
    f2 high92      90   75   45   40   40   40   35   35   35   35
    (The f3 high92 through f6 high92 values are not legible in this reproduction.)

Figure 7b. The minimum necessary sample sizes for 5 different factor numbers and a high
level of communality with the ratios of variables to factors ranging from 3 to 12 for the
good-level criterion (0.92).
[Chart for Figure 8a; the embedded data values are not legible in this reproduction.]

Figure 8a. The minimum necessary sample sizes for 4 different factor numbers and a wide
level of communality with the ratios of variables to factors ranging from 3 to 12 for the
excellent-level criterion (0.98).
    p/f ratio       3    4    5    6    7    8    9   10   11   12
    f3 wide92     450  130   80   65   55   45   40   35   40   40
    f2 wide92     160   90   60   55   50   45   40   35   35   35
    (The f4 wide92 through f6 wide92 values are not legible in this reproduction.)

Figure 8b. The minimum necessary sample sizes for 5 different factor numbers and a wide
level of communality with the ratios of variables to factors ranging from 3 to 12 for the
good-level criterion (0.92).
[Chart for Figure 9a with series f3low98 through f6low98; the embedded data values are not legible in this reproduction.]

Figure 9a. The minimum necessary sample sizes for 4 different factor numbers and a low
level of communality with the ratios of variables to factors ranging from 3 to 12 for the
excellent-level criterion (0.98).
    p/f ratio       3    4    5    6    7    8    9   10   11   12
    f6 low92     1200  400  160  120   80   75   70   70   70   75
    f5 low92     1300  400  180  120   85   80   65   60   60   65
    f4 low92     1200  250  170  130   75   75   60   60   50   55
    f3 low92     1200  230   85   85   65   60   60   40   45   40
    f2 low92      600  120   75   60   60   60   50   45   45   40

Figure 9b. The minimum necessary sample sizes for 5 different factor numbers and a low
level of communality with the ratios of variables to factors ranging from 3 to 12 for the
good-level criterion (0.92).
Basically, these figures provide information similar to that found in Figures 1 through
6. However, in Figures 7, 8, and 9, there are some new phenomena that need to be
discussed. First, if the p/f ratio is equal to or greater than 7, the minimum necessary sample
sizes for the three levels of communality and for the different factor numbers will be very
close. Table 15 shows the ranges of minimum necessary sample size under 6 different
conditions (3 levels of communality and two criteria) when the p/f ratio is equal to 7. In
Table 15, it is clear that the range for a higher level of communality is smaller than the range
for a lower level of communality.

Table 15
The ranges of the minimum necessary sample sizes for factor numbers ranging from 2 to 6
under three levels of communality and two criteria when the p/f ratio is equal to 7.

    High    Wide    Low

Secondly, when the p/f ratio is fixed, a larger number of factors generally requires a larger
sample size. But this relationship is not always true. When the p/f ratio is greater
than 5 and the good-level criterion (0.92) is used, sometimes a smaller number of factors
requires a larger sample size than a larger number of factors. This probably results from
Figures 10a, 10b, 10c, 10d, 10e, and 10f present the same information as presented in
Figures 7, 8, and 9 with the p/f ratio on the horizontal axis replaced with the number of
variables. It is clear in these figures that a larger number of factors requires a larger sample
size.
Figure 10. The minimum necessary sample sizes for six conditions with the related number of variables.
Each of these 6 panels shows the minimum necessary sample size of one of six conditions (three levels of
communality and two criteria). The horizontal axis shows the number of variables and the vertical axis
shows the minimum necessary sample size in each condition.
CHAPTER V
DISCUSSION
This study attempted to investigate the relationships between the sample size, the
number of variables, the number of factors, and the level of communality in factor analysis
to provide some recommendations about the minimum necessary sample size under
different conditions.

Conclusions

First, the ratio of the sample size (N) to the number of variables (p) may not be an
appropriate index for deciding the minimum necessary sample size. Many different N/p ratios
have been proposed. Cattell (1978) suggested that this ratio should be in the range of 3 to 6.
Nunnally (1967) offered a widely cited rule that "a good rule is to have at least ten times as
many subjects as variables". Everitt (1975) gave the same suggestion, that the N/p
ratio should be at least 10.

In this study, when the number of factors (f) is fixed, N and p bear an inverse
relationship to each other. When using the coefficient of congruence criterion with fixed
f, a larger number of variables requires a smaller minimum necessary sample
size and a smaller number of variables requires a larger minimum necessary sample size. The
relationship between the minimum necessary sample size and the number of variables for a
between the minimum necessary sample size and the ratio of the number of variables to the number of factors.

Marsh et al. (1998) presented a similar result. Their major focus is on the question
"Is more ever too much?" in relation to N (sample size) and particularly p/f in confirmatory
factor analysis. They used the frequency of fully proper, improper, and nonconverged
solutions (number of iterations higher than 500) and standard errors as indexes to
investigate the relationship between 5 levels of sample size (50, 100, 200, 400, 1000) and
[...]. They found that at the same level of sample size, using more indicators per factor can
[...] "(even when solutions are improper), more accurate and stable parameter estimates, and
more reliable factors." They concluded that there is a compensatory relationship between
sample size and the number of indicators per factor in confirmatory factor analysis.
Secondly, the difference in minimum necessary sample sizes between two different
levels of communality suggests choosing a sample size based on the expected level of communality. But in practice this is not so easy. The conservative choice is to use
the minimum necessary sample size for low levels of communality. Therefore, using a
higher p/f ratio (at least 5) will be a better choice when researchers have no prior estimate
of the level of communality. If it is possible, the p/f ratio should be equal to or greater than
7.
A widely cited recommendation is that the sample size should be at least 100 (Gorsuch,
1983; Kline, 1979). In this study, using the good-level criterion (0.92) and a low level of
communality, N = 100 is not sufficient when the p/f ratio is 3 or 4 and the number of factors
[...] ratio is 6.

Comrey and Lee (1992) offered a rough rating scale for adequate sample sizes in factor
analysis: 100 = poor, 200 = fair, 300 = good, 500 = very good, 1000 or more = excellent. In
fact, when the p/f ratio is equal to or greater than 7, using the excellent-level criterion (0.98)
and a low level of communality, N = 200 is sufficient for factor numbers of 3, 4, 5, and 6. If the
p/f ratio is 3 and the good-level criterion (0.92) is used, the minimum necessary sample sizes
for factor numbers of 3, 4, 5, and 6 with low-level communality are all larger than 1000. So,
recommendations regarding absolute sample sizes should be restricted, if not avoided
altogether.

The purpose of this study was to provide some guidelines about the minimum necessary
sample size for exploratory factor analysis. Based on the figures shown in Chapter IV and
the three conclusions in this chapter, some suggestions are made in Table 16.
Table 16
Recommendation of minimum necessary sample size with different p/f ratios for three
levels of communality.

[The body of Table 16 is only partially legible in this reproduction. Recoverable entries: for the excellent-level criterion (0.98), sample sizes of N > 500 (high communality), N > 900 (wide), and N > 1400 (low), with p/f ratios of 5, 6, and 8 listed among the recommendations; for the good-level criterion (0.92), N > 55 (high), N > 60 (wide), and N > 80 (low), with p/f ratios of 5 and 7 listed.]
In Table 16, minimum necessary sample sizes are presented for various p/f ratios and 6
conditions (three levels of communality and two criteria). For each
criterion, three minimum necessary sample sizes are given, each with a related p/f ratio. If it is
possible, larger p/f ratios are recommended because, for each of the three levels of
communality, the decreasing proportion of the change in minimum necessary sample size
is larger than the increasing proportion of the change in the p/f ratio. That is, a larger p/f
ratio will give a smaller N × p in the same condition. For example, if the factor number is
set to 5 with low levels of communality, there are three choices to match the
excellent-level criterion.

For the first choice, there would be at least 20 × 1400 = 28,000 elements in the data
matrix. For the second choice, there would be at least 30 × 260 = 7,800 elements in the data
matrix. And for the third choice, there would be at least 40 × 130 = 5,200 elements in the
data matrix. It is clear that a higher p/f ratio can dramatically reduce the volume or size of
the data set.
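The arithmetic behind these three choices can be checked directly (a small Python sketch; the (p, N) pairs come from the example in the text, and the p/f ratios in the comments are derived from f = 5):

```python
# Three (p, N) choices for f = 5 factors with low communality;
# fewer variables demand a far larger sample, and a larger data matrix.
choices = [(20, 1400),  # p/f = 4
           (30, 260),   # p/f = 6
           (40, 130)]   # p/f = 8
for p, n in choices:
    print(f"p={p}, N={n}: {p * n:,} data-matrix elements")
```

The products fall from 28,000 to 7,800 to 5,200 elements, which is the "smaller N × p" trade-off the paragraph describes.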
When the p/f ratio is larger than 8, the minimum sample size decreases very slowly under
both the excellent- and good-level criteria. Keeping the p/f ratio as high as 8 permits the
volume of the data set to be small in most conditions. Regardless, if possible, it is always
[...].

When using the good-level criterion and a high level of communality, there is no
recommendation for a p/f ratio equal to 8, because the limitation N ≥ p + 3 causes the
minimum necessary sample size to rise after the p/f ratio exceeds 7.

The p/f ratio should never be less than 3 unless extremely large samples are available.
Here "extremely large" means at least five thousand. In many cases, even twenty thousand
is not enough.
This study used simulation methods to provide two minimum necessary sample sizes,
for the excellent-level and the good-level criteria. Given the nature of simulation, however,
these suggested minimum necessary sample sizes are not expected to be the exact values
for each condition. These suggestions should be used as an estimate when a researcher
[...].

In this study, the correct factor number is assumed to be known. The criterion can be
matched using smaller sample sizes with this assumption. Without this assumption, both
underfactoring and overfactoring may occur, and the methods of "how to decide the factor
number" will need to be considered further for this situation. Therefore, the questions of
"what sample size is needed to get the same number of factors from sample and population
data under different conditions" and "which method can be used to decide the number of
factors" are left for further research.

Another limitation deals with communality. There were only three levels of
communality considered in this study. Other levels may have produced different results.
And, even if all possible combinations of communality could have been analyzed, a
researcher cannot know which combination he or she should use unless he or she already knows
the results. No further study is suggested for this question even though the level of
communality matters. As noted above, it is unusual for a researcher to know the exact level of communality in a
population, and even if all combinations of minimum sample sizes were known, the
researcher wouldn't know which one he or she should use. The best suggestion is to use a
high p/f ratio (> 7); then the differences between the minimum necessary sample sizes
for the three levels of communality will be very small.
One other limitation in this study is the relationship between the number of variables and the
number of factors. For example, 18 variables and 3 factors will produce a p/f ratio equal to 6.
The issue is how many variables define a factor. In this study, it can be (1) 6 variables per
factor, or (2) 9 variables for one factor, 8 variables for another factor, and 1 variable for the
other. The second situation will require a larger sample size than the first one. When the
p/f ratio is smaller than 5, it is frequently the case that one factor is related to only a single
[variable. A population correlation matrix in which the factors are related to unequal]
numbers of variables will require a larger sample than a population correlation matrix in
which every factor is related to the same number of variables, even though both these
population correlation matrices have the same p/f ratio. Therefore, even though the p/f
ratio is the same, different combinations of the number of variables and the number of
factors will require different sample sizes to match the same criterion. Using the
assumption that every factor is related to the same number of variables, Marsh et al. (1998)
obtained their conclusion that "There was a compensatory relation between N and p/f" in
confirmatory factor analysis. But so far, no research has been published about "what is the
difference between the different combinations of the variables and factors with a fixed
p/f ratio".

Finally, all the suggestions made in this study result from a study of maximum
likelihood factor analysis and the Varimax rotation method. Other factor analysis methods
are expected to show only slight differences in minimum necessary sample size. These
differences may become larger if other rotation methods are used. However, the [...]
Appendix A

The population correlation coefficient matrix generating procedure when the number of
factors is one, and the source code of the SAS/IML program which is used to perform this
procedure.
(1) Generate a p × 1 column vector whose elements are randomly generated by the following
rules:
(a) For a high level of communality: elements are randomly picked from √0.6, √0.7,
and √0.8.
(b) For a wide level of communality: elements are randomly picked from √0.2, √0.3,
√0.4, √0.5, √0.6, √0.7, and √0.8.
(c) For a low level of communality: elements are randomly picked from √0.2, √0.3,
and √0.4.
(2) Use this column vector as a factor pattern and multiply it by its transpose to get a
p × p matrix.
(3) Make the diagonal elements of this matrix equal to 1; then a population correlation
matrix is obtained.
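The three steps above can be sketched in Python (an illustrative reimplementation; the study's actual implementation is the SAS/IML program below, and the loading pools are the squared-loading values listed in step 1):

```python
import numpy as np

# Pools of squared loadings (communalities) for each level, per step (1).
POOLS = {"high": [0.6, 0.7, 0.8],
         "wide": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
         "low":  [0.2, 0.3, 0.4]}

def one_factor_popcorr(p, level, rng=None):
    """Generate a p x p one-factor population correlation matrix:
    (1) draw a p x 1 loading vector, (2) form loadings * loadings',
    (3) set the diagonal to 1."""
    rng = rng or np.random.default_rng()
    loadings = np.sqrt(rng.choice(POOLS[level], size=p))
    corr = np.outer(loadings, loadings)   # step (2): rank-one matrix
    np.fill_diagonal(corr, 1.0)           # step (3): unit diagonal
    return corr
```

Each off-diagonal entry is the product of two loadings, so for the high level no correlation can exceed 0.8, matching the largest communality in the pool.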
dm 'log;clear;output;clear;';
proc iml;

start buildpop(f1,p,bltype) global(b1,b11,b33,mlf1pop1,popcorr);
  b1=i(p);
  b3=i(p);
  mlf1pop1=j(p,f1,0);

  if (bltype=1) then do i = 1 to p;
    bh=uniform(-1)*3;
    if (0 <= bh & bh < 1) then b1[i,i] = sqrt(0.6);
    else if (1 <= bh & bh < 2) then b1[i,i] = sqrt(0.7);
    else if (2 <= bh & bh < 3) then b1[i,i] = sqrt(0.8);
    b3[i,i] = sqrt(1 - b1[i,i]*b1[i,i]);
  end;  **** end of if (bltype=1) ****;

  if (bltype=2) then do i = 1 to p;
    bw=uniform(-1)*7;
    if (0 <= bw & bw < 1) then b1[i,i] = sqrt(0.2);
    else if (1 <= bw & bw < 2) then b1[i,i] = sqrt(0.3);
    else if (2 <= bw & bw < 3) then b1[i,i] = sqrt(0.4);
    else if (3 <= bw & bw < 4) then b1[i,i] = sqrt(0.5);
    else if (4 <= bw & bw < 5) then b1[i,i] = sqrt(0.6);
    else if (5 <= bw & bw < 6) then b1[i,i] = sqrt(0.7);
    else if (6 <= bw & bw < 7) then b1[i,i] = sqrt(0.8);
    b3[i,i] = sqrt(1 - b1[i,i]*b1[i,i]);
  end;  **** end of if (bltype=2) ****;

  if (bltype=3) then do i = 1 to p;
    bl=uniform(-1)*3;
    if (0 <= bl & bl < 1) then b1[i,i] = sqrt(0.2);
    else if (1 <= bl & bl < 2) then b1[i,i] = sqrt(0.3);
    else if (2 <= bl & bl < 3) then b1[i,i] = sqrt(0.4);
    b3[i,i] = sqrt(1 - b1[i,i]*b1[i,i]);
  end;  **** end of if (bltype=3) ****;
  do i = 1 to p;
    mlf1pop1[i,f1]=b1[i,i];
  end;
  b11=b1*b1;
  b33=b3*b3;
  popcorr=mlf1pop1*mlf1pop1`+b33;
finish buildpop;

f1=1; bltype=1; p=4;
run buildpop(f1,p,bltype);
print b1,b11,b33,mlf1pop1,popcorr;
quit;
Appendix B

The source code and flowchart of the SAS/IML program which was used in this study for
[...]
Flow chart

[The flowchart is not legibly reproduced in this copy.]
dm 'log;clear;output;clear;';
option linesize=120;
proc iml;

b1=i(p);
b3=i(p);

if (bltype=1) then do i = 1 to p;
  bh=uniform(-1)*3;
  if (0 <= bh & bh < 1) then b1[i,i] = sqrt(0.6);
  else if (1 <= bh & bh < 2) then b1[i,i] = sqrt(0.7);
  else if (2 <= bh & bh < 3) then b1[i,i] = sqrt(0.8);
  b3[i,i] = sqrt(1 - b1[i,i]*b1[i,i]);
end;  **** end of if (bltype=1) ****;

if (bltype=2) then do i = 1 to p;
  bw=uniform(-1)*7;
  if (0 <= bw & bw < 1) then b1[i,i] = sqrt(0.2);
  else if (1 <= bw & bw < 2) then b1[i,i] = sqrt(0.3);
  else if (2 <= bw & bw < 3) then b1[i,i] = sqrt(0.4);
  else if (3 <= bw & bw < 4) then b1[i,i] = sqrt(0.5);
  else if (4 <= bw & bw < 5) then b1[i,i] = sqrt(0.6);
  else if (5 <= bw & bw < 6) then b1[i,i] = sqrt(0.7);
  else if (6 <= bw & bw < 7) then b1[i,i] = sqrt(0.8);
  b3[i,i] = sqrt(1 - b1[i,i]*b1[i,i]);
end;  **** end of if (bltype=2) ****;

if (bltype=3) then do i = 1 to p;
  bl=uniform(-1)*3;
  if (0 <= bl & bl < 1) then b1[i,i] = sqrt(0.2);
  else if (1 <= bl & bl < 2) then b1[i,i] = sqrt(0.3);
  else if (2 <= bl & bl < 3) then b1[i,i] = sqrt(0.4);
  b3[i,i] = sqrt(1 - b1[i,i]*b1[i,i]);
end;  **** end of if (bltype=3) ****;

b11=b1*b1;
b33=b3*b3;

*****************************************************;
IA1=j(p,f1,1);
do i = 1 to p;
  resele=f1;
  odd=f1;
  usevec=1:f1;
  do j = 1 to f1;
    order=int(uniform(-1)*resele+1);
    fload=usevec[1,order];
    if j<f1 then do;
      putin=int(uniform(-1)*odd);
      IA1[i,fload]=putin;
      odd=odd-putin;
    end;  *** end of if j<f1 ***;
    else if j=f1 then IA1[i,fload]=(odd-1);
    usevec=remove(usevec,order);
    resele=resele-1;
  end;  *** end of do j = 1 ***;
end;  *** end of do i = 1 to p ***;
iaia=IAl*IA r;
diaia=diag(iaia);
dia=diaia##0.5;
idia=inv(dia);
SAl=idia*IAl;
****************************************************;
** by Linn's 1968 paper, eq(22) can transfer      **;
** SA1 to the actual input factor loading (AIFL)  **;
** then we premultiply AIFL with B1               **;
****************************************************;
****************************************************;
** A1C is a fl*fl diagonal matrix                 **;
** ( is Cml in 1969 Tucker eq(8) )                **;
** which is used to represent the general         **;
** control an experimenter has on the             **;
** loadings of actual variables on the factor     **;
****************************************************;
A1C=I(fl);
do i = 1 to fl;
t=int(uniform(-1)*3);
if t = 0 then c=0.7;
else if t = 1 then c=0.8;
else if t = 2 then c=0.9;
A1C[i,i]=c;
end; *** end of do i = 1 ***;
****************************************************;
** xran is a p*fl matrix                          **;
** which represents the random effect             **;
** on each input loading                          **;
** and we premultiply (1-A1C##2) to represent     **;
** the effect                                     **;
****************************************************;
xran=j(p,fl,1);
do i = 1 to p;
do j = 1 to fl;
xran[i,j]=normal(-1);
end; *** end of j = 1 to fl ***;
end; *** end of i = 1 to p ***;
****************************************************;
xransq=xran*xran';
invdidi1=diag(xransq);
invdi1=root(invdidi1);
di1=inv(invdi1);
y1=SA1*A1C+di1*xran*root(I(fl)-A1C*A1C);
****************************************************;
z1sq=z1*z1';
invgigi1=diag(z1sq);
invgi1=root(invgigi1);
gi1=inv(invgi1);
****************************************************************;
** TA1 is actual input factor loading for the major domains   **;
TA1=gi1*z1;
** TA3 is actual input factor loading for the unique domains  **;
****************************************************************;
TA3=i(p);
******************************************;
** A2star(A2*)                          **;
** A3star(A3*)                          **;
** in 1969 Tucker's paper               **;
**                                      **;
** FA1 means final A1 (major)           **;
** which is A1 in Tucker's paper        **;
** FA2 means final A2 (minor)           **;
** which is A2 in Tucker's paper        **;
** FA3 means final A3 (unique)          **;
** which is A3 in Tucker's paper        **;
******************************************;
FA1=b1*TA1;
FA3=b3*TA3;
popcorr=FA1*FA1'+FA3*FA3';
finish buildpop;
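The module's final step assembles the population correlation matrix as popcorr = FA1*FA1' + FA3*FA3'. A minimal NumPy sketch of that assembly (a Python stand-in for the SAS/IML above; the loading values here are illustrative, not ones the module would draw):

```python
import numpy as np

def build_popcorr(FA1, FA3):
    """Population correlation matrix from major-domain loadings FA1 and
    unique-domain loadings FA3, i.e. popcorr = FA1*FA1' + FA3*FA3'."""
    return FA1 @ FA1.T + FA3 @ FA3.T

# Illustrative one-factor case: b1 holds the square roots of the
# communalities and b3 = sqrt(1 - b1##2), so the diagonal is exactly 1.
b1 = np.sqrt(np.array([0.6, 0.7, 0.8]))
b3 = np.sqrt(1.0 - b1 ** 2)
popcorr = build_popcorr(b1.reshape(-1, 1), np.diag(b3))
```

With b3 chosen this way the diagonal of popcorr is exactly 1, which is what makes the result a correlation rather than a covariance matrix.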
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
*******************************************************************
start buildsam(popcorr,n,p) global(samcorr);
****************************************************;
** the following process uses Wijsman's (1959)   ***;
** method to do the same thing as Kaiser but     ***;
** reduce the computing cost                     ***;
****************************************************;
ag=i(p);
do i = 1 to p;
do j = 1 to i;
if j < i then ag[i,j]=normal(-1);
if j = i then ag[i,j]=rangam(-1,(n-j)/2);
end;
end;
ifrnew=root(popcorr);
frnew=ifrnew';
samA=frnew*ag*ag'*frnew';
cfroot=samA/n;
dcfp=diag(cfroot);
dcfprt=root(dcfp);
idcfprt=inv(dcfprt);
samcorr=idcfprt*cfroot*idcfprt;
finish buildsam;
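buildsam draws a sample correlation matrix for a given n directly from popcorr, in the spirit of Wijsman (1959), instead of generating an n x p data matrix as in Kaiser and Dickman (1962). The NumPy sketch below shows the same idea via a Bartlett-style triangular draw; treat the exact chi-square degrees of freedom as my reading of the ag loop rather than a transcription of it:

```python
import numpy as np

def build_samcorr(popcorr, n, rng):
    """Sample correlation matrix from a Wishart draw built out of a
    lower-triangular matrix A: normals below the diagonal, square roots
    of chi-squares on it. Avoids simulating the full n x p data matrix."""
    p = popcorr.shape[0]
    F = np.linalg.cholesky(popcorr)              # F @ F.T == popcorr
    A = np.zeros((p, p))
    for i in range(p):
        A[i, :i] = rng.standard_normal(i)        # sub-diagonal entries
        A[i, i] = np.sqrt(rng.chisquare(n - i))  # diagonal entries
    S = F @ A @ A.T @ F.T / n                    # scaled Wishart draw
    d = 1.0 / np.sqrt(np.diag(S))
    return d[:, None] * S * d[None, :]           # rescale to unit diagonal

rng = np.random.default_rng(0)
samcorr = build_samcorr(np.array([[1.0, 0.5], [0.5, 1.0]]), n=200, rng=rng)
```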
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
start mlfapop(dat,fnum) global(mlflpop);
do until (cri<mlcri);
phirt=phipop##0.5;
invphi=inv(phipop);
invphirt=invphi##0.5;
rstar=invphirt*dat*invphirt;
call eigen(vals,vecs,rstar);
vals=diag(vals);
keepnum=1:fnum;
keepval=vals[keepnum,keepnum];
keepvec=vecs[,keepnum];
inum=i(fnum);
mlflpop=phirt*keepvec*((keepval-inum)##0.5);
resM=dat-mlflpop*mlflpop';
newphi=diag(resM);
do posphi = 1 to p;
if newphi[posphi,posphi]<0 then newphi[posphi,posphi]=0;
end;
diff=abs(newphi-phipop);
cri=max(diff);
phipop=newphi;
end;
finish mlfapop;
start mlfasam(dat,fnum) global(mlflsam);
phisam=(1-fnum/(2*p))*invsii;
do until (cri<mlcri);
phirt=phisam##0.5;
invphi=inv(phisam);
invphirt=invphi##0.5;
rstar=invphirt*dat*invphirt;
call eigen(vals,vecs,rstar);
vals=diag(vals);
keepnum=1:fnum;
keepval=vals[keepnum,keepnum];
keepvec=vecs[,keepnum];
inum=i(fnum);
mlflsam=phirt*keepvec*((keepval-inum)##0.5);
resM=dat-mlflsam*mlflsam';
newphi=diag(resM);
do posphi = 1 to p;
if newphi[posphi,posphi]<0 then newphi[posphi,posphi]=0;
end;
diff=abs(newphi-phisam);
cri=max(diff);
phisam=newphi;
end;
finish mlfasam;
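Both extraction modules run the same fixed-point loop: rescale the correlation matrix by the current diagonal phi, take the leading fnum eigenpairs, rebuild the loadings as phirt*keepvec*(keepval-I)##0.5, and refresh phi from the residual diagonal until the largest change falls below mlcri. A Python sketch of that loop (the starting value and the small positive floor on phi are my simplifications):

```python
import numpy as np

def mlfa(R, nfac, tol=1e-6, max_iter=500):
    """Eigen-based iterative factor extraction mirroring mlfapop/mlfasam."""
    phi = 0.5 * np.diag(R)                   # illustrative starting values
    for _ in range(max_iter):
        phirt = np.sqrt(phi)
        inv_phirt = 1.0 / phirt
        rstar = inv_phirt[:, None] * R * inv_phirt[None, :]
        vals, vecs = np.linalg.eigh(rstar)
        vals, vecs = vals[::-1], vecs[:, ::-1]        # descending order
        keepval = np.clip(vals[:nfac] - 1.0, 0.0, None)
        loadings = phirt[:, None] * vecs[:, :nfac] * np.sqrt(keepval)
        # residual diagonal becomes the new phi; floor it at a small
        # positive value so the rescaling stays defined
        new_phi = np.clip(np.diag(R - loadings @ loadings.T), 1e-6, None)
        if np.max(np.abs(new_phi - phi)) < tol:
            break
        phi = new_phi
    return loadings

R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
L = mlfa(R, 1)
```

For this R a single factor reproduces the off-diagonal correlations almost exactly, so L @ L.T should match them closely at convergence.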
*******************************************************;
start ropop(mlflpop,p) global(popfacpa);
popfacpa=mlflpop;
fnum=ncol(popfacpa);
crit=0.05; prev=-1; count=0;
allhhmax=j(p,1,0);
do i = 1 to p;
do j = 1 to fnum;
allhhmax[i,1]=allhhmax[i,1]+popfacpa[i,j]##2;
end;
end;
allhmax=allhhmax##0.5; **** because we need to use the all h**2 ****;
**** but hmax is used for only two columns ****;
do until (vcrit<crit);
do i = 1 to (fnum-1);
do j = (i+1) to fnum;
tempfp=popfacpa[heigh,i]||popfacpa[heigh,j];
fircol=tempfp[heigh,1]/allhmax; ****** fircol, seccol are x/h, y/h ******;
seccol=tempfp[heigh,2]/allhmax;
tempfpn=fircol||seccol;
uxxyy=fircol#fircol-seccol#seccol;
vxy=2*fircol#seccol;
uvc=uxxyy#uxxyy-vxy#vxy;
uvd=2*uxxyy#vxy;
asumu=uxxyy[+,];
bsumv=vxy[+,];
csumuvc=uvc[+,];
dsumuvd=uvd[+,];
tan4=(dsumuvd-2*asumu*bsumv/p)/(csumuvc-(asumu*asumu-bsumv*bsumv)/p);
foursida=atan(tan4);
if (dsumuvd-2*asumu*bsumv/p) > 0 then do;
if (csumuvc-(asumu*asumu-bsumv*bsumv)/p) > 0 then sida=foursida/4;
if (csumuvc-(asumu*asumu-bsumv*bsumv)/p) < 0 then do;
foursida=foursida+3.1415926;
sida=foursida/4;
end;
end;
if (dsumuvd-2*asumu*bsumv/p) < 0 then do;
if (csumuvc-(asumu*asumu-bsumv*bsumv)/p) < 0 then do;
foursida=foursida-3.1415926;
sida=foursida/4;
end;
if (csumuvc-(asumu*asumu-bsumv*bsumv)/p) > 0 then sida=foursida/4;
end;
csida=cos(sida); ssida=sin(sida); nssida=-ssida;
trans=j(2,2,0);
trans[1,1]=csida; trans[1,2]=nssida; trans[2,1]=ssida; trans[2,2]=csida;
tempg=tempfpn*trans; *** tempg is only two columns which is rotated this time ***;
tempgi=tempg[,1]#allhmax;
tempgj=tempg[,2]#allhmax;
popfacpa[,i]=tempgi;
popfacpa[,j]=tempgj;
end;
end;
hforg=j(p,fnum,0);
do i = 1 to fnum;
hforg[,i]=allhmax;
end;
ghforv=popfacpa/hforg;
g4=ghforv##4;
g4row=g4[+,];
g4sum=sum(g4row);
g2=ghforv##2;
g2sump=g2[+,];
g2sump2=g2sump##2;
g22sum=sum(g2sump2);
newv=p*g4sum-g22sum;
vcrit=newv-prev;
prev=newv;
count=count+1;
angel=sida/3.14*180;
ang4sida=foursida/3.14*180;
end;
finish ropop;
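ropop (and rosam below) implement Kaiser's (1958) normalized varimax by sweeping over factor pairs: with x and y the two loading columns divided by the communality roots, u = x##2 - y##2 and v = 2#x#y, and the rotation angle comes from the tan(4*sida) expression above. The NumPy sketch below is an equivalent pairwise sweep; atan2 replaces the explicit quadrant if-blocks, and the names are mine:

```python
import numpy as np

def pair_angle(x, y):
    """Kaiser's varimax angle for one pair of row-normalized columns."""
    p = len(x)
    u = x * x - y * y
    v = 2.0 * x * y
    num = 2.0 * np.sum(u * v) - 2.0 * np.sum(u) * np.sum(v) / p
    den = np.sum(u * u - v * v) - (np.sum(u) ** 2 - np.sum(v) ** 2) / p
    return np.arctan2(num, den) / 4.0    # atan2 picks the right quadrant

def varimax(A, n_sweeps=20):
    """Pairwise varimax with Kaiser (row) normalization."""
    h = np.sqrt(np.sum(A ** 2, axis=1))  # communality roots (allhmax)
    B = A / h[:, None]
    for _ in range(n_sweeps):
        for i in range(B.shape[1] - 1):
            for j in range(i + 1, B.shape[1]):
                t = pair_angle(B[:, i], B[:, j])
                c, s = np.cos(t), np.sin(t)
                B[:, [i, j]] = B[:, [i, j]] @ np.array([[c, -s], [s, c]])
    return B * h[:, None]                # un-normalize, as in tempgi/tempgj

A = np.array([[0.8, 0.1], [0.7, 0.2], [0.1, 0.9], [0.2, 0.8]])
c5, s5 = np.cos(0.5), np.sin(0.5)
mixed = A @ np.array([[c5, -s5], [s5, c5]])  # deliberately de-rotated target
rotated = varimax(mixed)
```

Because each pairwise rotation is orthogonal on the normalized columns, the row communalities are preserved exactly while the varimax criterion rises.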
*******************************************************;
*******************************************************;
start rosam(mlflsam,p) global(samfacpa);
samfacpa=mlflsam;
fnum=ncol(samfacpa);
crit=0.05; *********** crit is the criterion for the v-value ******;
*********** use 0.1 *******;
prev=-1;
count=0;
allhhmax=j(p,1,0);
do i = 1 to p;
do j = 1 to fnum;
allhhmax[i,1]=allhhmax[i,1]+samfacpa[i,j]##2;
end; *** end of do j = 1 ***;
end; *** end of do i = 1 ***;
allhmax=allhhmax##0.5; ** because we need to use the all h**2 ****;
** but hmax is used for only two columns ****;
do until (vcrit<crit);
do i = 1 to (fnum-1);
do j = (i+1) to fnum;
tempfp=samfacpa[heigh,i]||samfacpa[heigh,j];
fircol=tempfp[heigh,1]/allhmax;
seccol=tempfp[heigh,2]/allhmax;
tempfpn=fircol||seccol;
uxxyy=fircol#fircol-seccol#seccol;
vxy=2*fircol#seccol;
uvc=uxxyy#uxxyy-vxy#vxy;
uvd=2*uxxyy#vxy;
asumu=uxxyy[+,];
bsumv=vxy[+,];
csumuvc=uvc[+,];
dsumuvd=uvd[+,];
tan4=(dsumuvd-2*asumu*bsumv/p)/(csumuvc-(asumu*asumu-bsumv*bsumv)/p);
foursida=atan(tan4);
if (dsumuvd-2*asumu*bsumv/p) > 0 then do;
if (csumuvc-(asumu*asumu-bsumv*bsumv)/p) > 0 then sida=foursida/4;
if (csumuvc-(asumu*asumu-bsumv*bsumv)/p) < 0 then do;
foursida=foursida+3.14;
sida=foursida/4;
end; ** end of do in if **;
end; ** end of do in if **;
if (dsumuvd-2*asumu*bsumv/p) < 0 then do;
if (csumuvc-(asumu*asumu-bsumv*bsumv)/p) < 0 then do;
foursida=foursida-3.14;
sida=foursida/4;
end; ** end of do in if **;
if (csumuvc-(asumu*asumu-bsumv*bsumv)/p) > 0 then sida=foursida/4;
end; ** end of do in if **;
csida=cos(sida); ssida=sin(sida); nssida=-ssida;
trans=j(2,2,0);
trans[1,1]=csida; trans[1,2]=nssida; trans[2,1]=ssida; trans[2,2]=csida;
tempg=tempfpn*trans; * tempg is only two columns which is rotated this time *;
tempgi=tempg[,1]#allhmax;
tempgj=tempg[,2]#allhmax;
samfacpa[,i]=tempgi;
samfacpa[,j]=tempgj;
end; ** end of do j = **;
end; ** end of do i = **;
hforg=j(p,fnum,0);
do i = 1 to fnum;
hforg[,i]=allhmax;
end;
ghforv=samfacpa/hforg;
g4=ghforv##4;
g4row=g4[+,];
g4sum=sum(g4row);
g2=ghforv##2;
g2sump=g2[+,];
g2sump2=g2sump##2;
g22sum=sum(g2sump2);
newv=p*g4sum-g22sum;
vcrit=newv-prev;
prev=newv;
count=count+1;
angel=sida/3.14*180;
ang4sida=foursida/3.14*180;
end; ** end o f do until **;
finish rosam;
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
************************************************************
start procru(samfacpa,popfacpa,fl) global(factorK,coefF);
sama=samfacpa;
popb=popfacpa;
asama=sama'*sama;
iasama=inv(asama);
Tstar=iasama*sama'*popb;
itt=Tstar'*Tstar;
ditt=diag(itt);
sditt=ditt##0.5;
isditt=inv(sditt);
goodT=Tstar*isditt;
rotsam=sama*goodT;
sami=rotsam';
sqrotsam=rotsam##2;
sqpopb=popb##2;
sqsumsam=sqrotsam[+,];
sqsumpop=sqpopb[+,];
coefF=j(1,fl,0);
do i = 1 to fl;
upr=sami[i,]*popb[,i];
samXpop=sqsumsam[1,i]*sqsumpop[1,i];
downr=samXpop##0.5;
coefF[1,i]=upr/downr;
end;
factorK=sum(coefF)/fl;
finish procru;
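procru rotates the sample loadings to the population target with an oblique Procrustes transform (Tstar column-rescaled by diag(Tstar'Tstar)##-0.5) and then averages the column-wise congruence coefficients into factorK. A NumPy sketch of both steps (function names are mine):

```python
import numpy as np

def congruence(sam, pop):
    """Column-wise congruence coefficients and their mean (factorK)."""
    num = np.sum(sam * pop, axis=0)
    den = np.sqrt(np.sum(sam ** 2, axis=0) * np.sum(pop ** 2, axis=0))
    coefF = num / den
    return coefF, float(np.mean(coefF))

def procrustes_congruence(sam, pop):
    """Least-squares target rotation Tstar = inv(sam'sam)*sam'*pop with
    columns rescaled to unit length, then congruence of sam*Tstar to pop."""
    Tstar = np.linalg.solve(sam.T @ sam, sam.T @ pop)
    Tstar = Tstar / np.sqrt(np.sum(Tstar ** 2, axis=0))
    return congruence(sam @ Tstar, pop)

pop = np.array([[0.8, 0.1], [0.7, 0.2], [0.1, 0.9], [0.2, 0.8]])
c4, s4 = np.cos(0.4), np.sin(0.4)
sam = pop @ np.array([[c4, -s4], [s4, c4]])   # rotated copy of the target
coefF, factorK = procrustes_congruence(sam, pop)
```

Because sam here is an orthogonal rotation of pop, the Procrustes step recovers the target exactly and every congruence coefficient equals 1.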
********************************************************;
start coefinlr(popfacpa,samfacpa,fl,rotime,fbmatrix) global(coefR,allcoR);
ra=popfacpa;
rb=samfacpa;
ratra=ra';
nn=fbmatrix;
sqrb=rb##2;
sqra=ra##2;
sumsqra=sqra[+,];
sumsqrb=sqrb[+,];
coefsamR=j(1,fl,0);
allcoR=j(1,rotime,0);
do rr = 1 to rotime;
seq=nn[rr,];
do i = 1 to fl;
bbb=seq[1,i];
uprr=ratra[i,]*rb[,bbb];
aXbr=sumsqra[1,i]*sumsqrb[1,bbb];
downrr=aXbr##0.5;
coefsamR[1,i]=uprr/downrr;
end;
posr=abs(coefsamR);
coefRR=sum(posr)/fl;
allcoR[1,rr]=coefRR;
end;
end;
coefR=max(allcoR);
finish coefinlr;
start prU(samfacpa,popfacpa,fl) global(coefU);
samar=samfacpa;
popbr=popfacpa;
asamar=samar'*samar;
iasamar=inv(asamar);
Tstarr=iasamar*samar'*popbr;
ittr=Tstarr'*Tstarr;
dittr=diag(ittr);
sdittr=dittr##0.5;
isdittr=inv(sdittr);
goodTr=Tstarr*isdittr;
rotsamr=samar*goodTr;
samir=rotsamr';
sqsamr=rotsamr##2;
sqpopr=popbr##2;
sqsumsr=sqsamr[+,];
sqsumpr=sqpopr[+,];
coefUmr=j(1,fl,0);
do i = 1 to fl;
upru=samir[i,]*popbr[,i];
samXpopu=sqsumsr[1,i]*sqsumpr[1,i];
downru=samXpopu##0.5;
coefUmr[1,i]=upru/downru;
end;
coefU=sum(coefUmr)/fl;
finish prU;
**********************************************************;
fb2={l 2,
2 1};
fb21=fb2[,l];
fb22=fb2[,2];
ins3=j(2,1,3);
fb3_l=ins3||fb21||fb22;
fb3_2=fb21||ins3||fb22;
fb3_3=fb21||fb22||ins3;
fb3=fb3_l//fb3_2//fb3_3;
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
**********************************************************;
ins4=j(6,1,4);
fb31=fb3[,1];
fb32=fb3[,2];
fb33=fb3[,3];
fb4_1=ins4||fb31||fb32||fb33;
fb4_2=fb31||ins4||fb32||fb33;
fb4_3=fb31||fb32||ins4||fb33;
fb4_4=fb31||fb32||fb33||ins4;
fb4=fb4_1//fb4_2//fb4_3//fb4_4;
**********************************************************;
ins5=j(24,1,5);
fb41=fb4[,1];
fb42=fb4[,2];
fb43=fb4[,3];
fb44=fb4[,4];
fb5_1=ins5||fb41||fb42||fb43||fb44;
fb5_2=fb41||ins5||fb42||fb43||fb44;
fb5_3=fb41||fb42||ins5||fb43||fb44;
fb5_4=fb41||fb42||fb43||ins5||fb44;
fb5_5=fb41||fb42||fb43||fb44||ins5;
fb5=fb5_1//fb5_2//fb5_3//fb5_4//fb5_5;
row5=nrow(fb5);
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
ins6=j(120,1,6);
fb51=fb5[,1];
fb52=fb5[,2];
fb53=fb5[,3];
fb54=fb5[,4];
fb55=fb5[,5];
fb6_1=ins6||fb51||fb52||fb53||fb54||fb55;
fb6_2=fb51||ins6||fb52||fb53||fb54||fb55;
fb6_3=fb51||fb52||ins6||fb53||fb54||fb55;
fb6_4=fb51||fb52||fb53||ins6||fb54||fb55;
fb6_5=fb51||fb52||fb53||fb54||ins6||fb55;
fb6_6=fb51||fb52||fb53||fb54||fb55||ins6;
fb6=fb6_1//fb6_2//fb6_3//fb6_4//fb6_5//fb6_6;
row6=nrow(fb6);
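The fb2 through fb6 blocks build up, by inserting each new index into every position, the fl! row-orderings that coefinlr later scans for the best factor matching. A Python sketch of the same table using the standard library (row order may differ from the recursive construction):

```python
from itertools import permutations

def fbmatrix(fl):
    """All fl! orderings of the factor indices 1..fl, one ordering per row."""
    return [list(seq) for seq in permutations(range(1, fl + 1))]

fb3 = fbmatrix(3)   # 6 rows, analogous to the fb3 matrix above
fb6 = fbmatrix(6)   # 720 rows, analogous to fb6
```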
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * .
fl=6;
p=18;
bltype=2;
k=0.2;
samtime=100;
poptime=100;
do popnum = 1 to poptime;
run buildpop(fl,p,bltype,k);
run mlfapop(popcorr,fl);
run ropop(mlflpop,p);
fk=j(1,samtime,0);
mlflm=j(1,samtime,0);
mlflr=j(1,samtime,0);
mlflu=j(1,samtime,0);
do samnum = 1 to samtime;
run buildsam(popcorr,n,p);
run mlfasam(samcorr,fl);
run coefinl(mlflpop,mlflsam,fl);
mlflsam=mlflsam*adjmat;
run rosam(mlflsam,p);
run procru(samfacpa,popfacpa,fl);
run prU(mlflsam,popfacpa,fl);
fk[1,samnum]=factorK;
mlflm[1,samnum]=coefM;
mlflr[1,samnum]=coefR;
mlflu[1,samnum]=coefU;
end; *** end of samnum ***;
rankfk=fk;
fk[,rank(fk)]=rankfk; *** sort fk ***;
fk95[1,popnum]=(fk[1,6]+fk[1,5])/2; **** vector of 95% of K ****;
fk90[1,popnum]=(fk[1,11]+fk[1,10])/2;
fkmean=sum(fk)/samtime; ***** mean of K ****;
avgfk[1,popnum]=fkmean; *** vector of mean of K ***;
rankmfM=mlflm;
mlflm[,rank(mlflm)]=rankmfM;
mf95[1,popnum]=(mlflm[1,6]+mlflm[1,5])/2;
mf90[1,popnum]=(mlflm[1,11]+mlflm[1,10])/2;
mfmean=sum(mlflm)/samtime;
avgmf[1,popnum]=mfmean;
rankRR=mlflr;
mlflr[,rank(mlflr)]=rankRR;
rf95[1,popnum]=(mlflr[1,6]+mlflr[1,5])/2;
rf90[1,popnum]=(mlflr[1,11]+mlflr[1,10])/2;
rfmean=sum(mlflr)/samtime;
avgrf[1,popnum]=rfmean;
rankuu=mlflu;
mlflu[,rank(mlflu)]=rankuu;
uf95[1,popnum]=(mlflu[1,6]+mlflu[1,5])/2;
uf90[1,popnum]=(mlflu[1,11]+mlflu[1,10])/2;
ufmean=sum(mlflu)/samtime;
avguf[1,popnum]=ufmean;
end; *** end of popnum ***;
rankfk95=fk95;
fk95[,rank(fk95)]=rankfk95;
rankfk90=fk90;
fk90[,rank(fk90)]=rankfk90;
rankavgK=avgfk;
avgfk[,rank(avgfk)]=rankavgK;
rankmf95=mf95;
mf95[,rank(mf95)]=rankmf95;
rankmf90=mf90;
mf90[,rank(mf90)]=rankmf90;
rankavgm=avgmf;
avgmf[,rank(avgmf)]=rankavgm;
rankrf95=rf95;
rf95[,rank(rf95)]=rankrf95;
rankrf90=rf90;
rf90[,rank(rf90)]=rankrf90;
rankavgr=avgrf;
avgrf[,rank(avgrf)]=rankavgr;
rankuf95=uf95;
uf95[,rank(uf95)]=rankuf95;
rankuf90=uf90;
uf90[,rank(uf90)]=rankuf90;
rankavgu=avguf;
avguf[,rank(avguf)]=rankavgu;
print fk90,fk95,avgfk,mf90,mf95,avgmf,rf90,rf95,avgrf,uf90,uf95,avguf;
p82=0; ci95p82=0; ci90p82=0;
p85=0; ci95p85=0; ci90p85=0;
p87=0; ci95p87=0; ci90p87=0;
p90=0; ci95p90=0; ci90p90=0;
p92=0; ci95p92=0; ci90p92=0;
p95=0; ci95p95=0; ci90p95=0;
p98=0; ci95p98=0; ci90p98=0;
do i = 1 to poptime;
if avgfk[1,i]>0.98 then p98=p98+1;
if fk95[1,i]>0.98 then ci95p98=ci95p98+1;
if fk90[1,i]>0.98 then ci90p98=ci90p98+1;
if avgrf[1,i]>0.82 then r82=r82+1;
if rf95[1,i]>0.82 then ci95r82=ci95r82+1;
if rf90[1,i]>0.82 then ci90r82=ci90r82+1;
end;
pvaluer=j(1,7,0); pvrf95=j(1,7,0); pvrf90=j(1,7,0);
pvaluer[1,1]=r82; pvrf95[1,1]=ci95r82; pvrf90[1,1]=ci90r82;
pvaluer[1,2]=r85; pvrf95[1,2]=ci95r85; pvrf90[1,2]=ci90r85;
pvaluer[1,3]=r87; pvrf95[1,3]=ci95r87; pvrf90[1,3]=ci90r87;
pvaluer[1,4]=r90; pvrf95[1,4]=ci95r90; pvrf90[1,4]=ci90r90;
pvaluer[1,5]=r92; pvrf95[1,5]=ci95r92; pvrf90[1,5]=ci90r92;
pvaluer[1,6]=r95; pvrf95[1,6]=ci95r95; pvrf90[1,6]=ci90r95;
pvaluer[1,7]=r98; pvrf95[1,7]=ci95r98; pvrf90[1,7]=ci90r98;
pvalueu=j(1,7,0); pvuf95=j(1,7,0); pvuf90=j(1,7,0);
pvalueu[1,1]=u82; pvuf95[1,1]=ci95u82; pvuf90[1,1]=ci90u82;
pvalueu[1,2]=u85; pvuf95[1,2]=ci95u85; pvuf90[1,2]=ci90u85;
pvalueu[1,3]=u87; pvuf95[1,3]=ci95u87; pvuf90[1,3]=ci90u87;
pvalueu[1,4]=u90; pvuf95[1,4]=ci95u90; pvuf90[1,4]=ci90u90;
pvalueu[1,5]=u92; pvuf95[1,5]=ci95u92; pvuf90[1,5]=ci90u92;
pvalueu[1,6]=u95; pvuf95[1,6]=ci95u95; pvuf90[1,6]=ci90u95;
pvalueu[1,7]=u98; pvuf95[1,7]=ci95u98; pvuf90[1,7]=ci90u98;
pfratio=p/fl;
npratio=n/p;
k3matrix=pvalue//pvfk90//pvfk95;
m3matrix=pvaluem//pvmf90//pvmf95;
r3matrix=pvaluer//pvrf90//pvrf95;
u3matrix=pvalueu//pvuf90//pvuf95;
print r3matrix[rowname=rowtype colname=percent];
print u3matrix[rowname=rowtype colname=percent];
quit;
BIBLIOGRAPHY
Archer, C. O., & Jennrich, R. I. (1973). Standard errors for rotated factor loadings.
Psychometrika, 38, 581-605.
Arrindell, W. A., & van der Ende, J. (1985). An empirical test of the utility of the
observations-to-variables ratio in factor and components analysis. Applied Psychological
Measurement, 9, 165-178.
Cattell, R. B. (1978). The scientific use of factor analysis. New York: Plenum.
Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis. Hillsdale, NJ:
Erlbaum.
Everitt, B. S. (1975). Multivariate analysis: The need for data, and other problems.
British Journal of Psychiatry, 126, 237-240.
Geweke, J. F., & Singleton, K. J. (1980). Interpreting the likelihood ratio statistic in
factor models when sample size is small. Journal of the American Statistical Association,
75, 133-137.
Hong, S. (1999). Generating correlation matrices with model error for simulation
studies in factor analysis: A combination of the Tucker-Koopman-Linn model with
Wijsman's algorithm. Behavior Research Methods, Instruments, & Computers, 31,
727-730.
Kaiser, H. F. (1958). The Varimax criterion for analytic rotation in factor analysis.
Psychometrika, 23, 187-200.
Kaiser, H. F., & Dickman, K. (1962). Sample and population score matrices and
sample correlation matrices from an arbitrary population correlation matrix.
Psychometrika, 27, 179-182.
Kline, P. (1994). An easy guide to factor analysis. London; New York: Routledge.
MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999). Sample size in
factor analysis. Psychological Methods, 4, 84-99.
Marsh, H. W., Hau, K., Balla, J. R., & Grayson, D. (1998). Is more ever too much?
The number of indicators per factor in confirmatory factor analysis. Multivariate
Behavioral Research, 33(2), 181-220.
Mosier, C. I. (1939). Determining a simple structure when loadings for certain tests
are known. Psychometrika, 4, 149-192.
Tanaka, J. S. (1987). "How big is big enough?": Sample size and goodness of fit in
structural equation models with latent variables. Child Development, 58, 134-146.
Tucker, L. R., Koopman, R. F., & Linn, R. L. (1969). Evaluation of factor analytic
research procedures by means of simulated correlation matrices. Psychometrika, 34,
421-459.
Velicer, W. F., & Fava, J. L. (1987). An evaluation of the effects of variable sampling
on component, image, and factor analysis. Multivariate Behavioral Research, 22, 193-210.
VITA
NAME: Tian-Lu Ke