
INFORMATION TO USERS

This manuscript has been reproduced from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer printer.

The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margins, and improper alignment can adversely affect reproduction.

In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.

Oversize materials (e.g., maps, drawings, charts) are reproduced by sectioning the original, beginning at the upper left-hand corner and continuing from left to right in equal sections with small overlaps.

Photographs included in the original manuscript have been reproduced xerographically in this copy. Higher quality 6" x 9" black and white photographic prints are available for any photographs or illustrations appearing in this copy for an additional charge. Contact UMI directly to order.

Bell & Howell Information and Learning
300 North Zeeb Road, Ann Arbor, MI 48106-1346 USA
800-521-0600

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
UNIVERSITY OF NORTHERN COLORADO

Greeley, Colorado

The Graduate School

MINIMUM SAMPLE SIZES FOR CONDUCTING
EXPLORATORY FACTOR ANALYSES

A Dissertation Submitted in Partial Fulfillment
of the Requirements for the Degree of
Doctor of Philosophy

Tian-Lu Ke

College of Education
Department of Applied Statistics and Research Methods

May 2001

UMI Number: 3006596

UMI Microform 3006596
Copyright 2001 by Bell & Howell Information and Learning Company.
All rights reserved. This microform edition is protected against
unauthorized copying under Title 17, United States Code.

Bell & Howell Information and Learning Company
300 North Zeeb Road
P.O. Box 1346
Ann Arbor, MI 48106-1346

© 2001

Tian-Lu Ke

ALL RIGHTS RESERVED

THIS DISSERTATION WAS SPONSORED

BY

Daniel J. Mundfrom, Ph.D.
Research Co-Advisor

Dale G. Shaw, Ph.D.
Research Co-Advisor

Tian-Lu Ke

DISSERTATION COMMITTEE

Advisory Professor
Jay Schaffer, Ph.D.

Faculty Representative
Charmayne B. Cullom, Ph.D.

DEAN OF THE GRADUATE SCHOOL

Examination Date of Dissertation

ABSTRACT

Ke, Tian-Lu. Minimum Sample Sizes for Conducting Exploratory Factor Analyses. Published Doctor of Philosophy dissertation, University of Northern Colorado, 2001.

The purpose of this study was to investigate the relationship between the sample size, the number of variables, the number of factors, the level of communality, and the stability of the factor structure in an exploratory factor analysis. Two minimum necessary sample sizes for each of 180 different conditions (six numbers of factors, ten ratios of the number of variables to the number of factors, and three levels of communality) were obtained using two values of coefficients of congruence (0.92 and 0.98) as criteria. There were 371,600 population correlation coefficient matrices and 37,160,000 sample correlation coefficient matrices (100 sample correlation coefficient matrices for each population correlation coefficient matrix) generated in this study. Three conclusions were obtained. First, the ratio of the sample size (N) to the number of variables (p) may not be an appropriate index for deciding the minimum necessary sample size. In this study, when the number of factors (f) is fixed, N and p bear an inverse relationship to each other. Second, the difference in minimum necessary sample sizes between two different levels of communality will decrease as the p/f ratio increases. Finally, trying to give an absolute sample size is unrealistic. The minimum necessary sample sizes for these 180 conditions range from fifty to several thousand. It is impossible to give a recommendation based only on absolute sample size. In this study, some guidelines about minimum necessary sample sizes for exploratory factor analysis are presented for various p/f ratios and 6 different conditions (3 levels of communality for each of 2 different criteria).

ACKNOWLEDGMENT

It is my pleasure to gratefully acknowledge the assistance of my research advisors, Dr. Dale Shaw and Dr. Daniel Mundfrom, not only because the topic of this study was suggested by them, but also because of their patient guidance and valued assistance throughout my entire doctoral study. Grateful acknowledgement is also made to Dr. Jay Schaffer. Without his encouragement, I could not have persisted.

I am very grateful to Dr. Charmayne Cullom, my faculty representative, for her valuable suggestions for improving this dissertation.

Thanks also go to Dr. Ann Thomas, Kim McFann, and Dawn Strongin. They were always so nice to me during my whole doctoral program. Also, special thanks go to Brittany Lane for her editing. It must have been a nightmare to correct my writing.

TABLE OF CONTENTS

CHAPTER                                                           Page

I.    INTRODUCTION                                                   1
          Statement of the Problem                                   7
          Computer Procedure                                         7
          Limitations                                                9

II.   REVIEW OF LITERATURE                                          11
          The Basic Concept of Factor Analysis                      11
          Sources of Error in the Common Factor Model               15
          The Model for Population Covariances                      17
          The Model for Sample Covariances                          18
          Maximum Likelihood Factor Analysis (MLFA)                 21
              Why Use MLFA                                          21
              How MLFA Works                                        22
          Simple Structure                                          27
          Varimax Rotation                                          29
          Procrustes Rotation                                       32

III.  METHODOLOGY                                                   36
          Procedure for Generating Population Correlation
              Matrices                                              36
          Conceptual Input Factor Loadings for Factors in the
              Major Factor Domain                                   39
          Procedure for Generating Sample Correlation Matrices      45
          Analysis Procedure                                        47
              Decide the Conditions of Population Correlation
                  Matrices                                          47
              Decide Size of Sample                                 48
          Rotation Method                                           50
          Coefficient of Congruence                                 51

IV.   RESULTS                                                       54
          Factor-Orientated Section                                 57
          Relationship of Sample Size to Level of Communality       67

V.    DISCUSSION                                                    75
          Conclusions                                               75
          Limitations and Suggestions for Further Research          80

APPENDIX A                                                          83

APPENDIX B                                                          87

BIBLIOGRAPHY                                                       117

LIST OF TABLES

Table                                                             Page

1.  The correlation matrix of x, y1, y2, y3, y4, and y5             12

2.  The ...                                                         31

3.  The ...                                                         33

4.  The ...                                                         40

5.  The ...                                                         40

6.  The ...                                                         41

7.  The ...                                                         42

8.  The ...                                                         43

9.  The ...                                                         43

10. The matrix of A' ...                                            44

11. The ...                                                         45

12. The ...                                                         45

13. The ...                                                         56

14. The ranges of minimum necessary sample size in 11
    different conditions for p/f ratio = 7                          65

15. The range of the maximum necessary sample sizes of factor
    numbers ranging from 2 to 6 under three levels of
    communality and two criteria when p/f ratio is equal to 7       71

16. Recommendation of minimum necessary sample size with different
    p/f ratios for three levels of communality and two criteria     78

LIST OF FIGURES

Figure                                                            Page

1a. The minimum necessary sample sizes for one factor with the
    ratios of variables to factors ranging from 3 to 12 for the
    excellent-level criterion (0.98)                                58

1b. The minimum necessary sample sizes for one factor with the
    ratios of variables to factors ranging from 3 to 12 for the
    good-level criterion (0.92)                                     58

2.  The minimum necessary sample sizes for two factors with the
    ratios of variables to factors ranging from 3 to 12 for the
    good-level criterion (0.92)                                     60

3a. The minimum necessary sample sizes for three factors with the
    ratios of variables to factors ranging from 3 to 12 for the
    excellent-level criterion (0.98)                                61

3b. The minimum necessary sample sizes for three factors with the
    ratios of variables to factors ranging from 3 to 12 for the
    good-level criterion (0.92)                                     61

4a. The minimum necessary sample sizes for four factors with the
    ratios of variables to factors ranging from 3 to 12 for the
    excellent-level criterion (0.98)                                62

4b. The minimum necessary sample sizes for four factors with the
    ratios of variables to factors ranging from 3 to 12 for the
    good-level criterion (0.92)                                     62

5a. The minimum necessary sample sizes for five factors with the
    ratios of variables to factors ranging from 3 to 12 for the
    excellent-level criterion (0.98)                                63

5b. The minimum necessary sample sizes for five factors with the
    ratios of variables to factors ranging from 3 to 12 for the
    good-level criterion (0.92)                                     63
6a. The minimum necessary sample sizes for six factors with the
    ratios of variables to factors ranging from 3 to 12 for the
    excellent-level criterion (0.98)                                64

6b. The minimum necessary sample sizes for six factors with the
    ratios of variables to factors ranging from 3 to 12 for the
    good-level criterion (0.92)                                     64

7a. The minimum necessary sample sizes for 4 different factor numbers
    and high level of communality with the ratios of variables to factors
    ranging from 3 to 12 for the excellent-level criterion (0.98)   68

7b. The minimum necessary sample sizes for 5 different factor numbers
    and high level of communality with the ratios of variables to factors
    ranging from 3 to 12 for the good-level criterion (0.92)        68

8a. The minimum necessary sample sizes for 4 different factor numbers
    and wide level of communality with the ratios of variables to factors
    ranging from 3 to 12 for the excellent-level criterion (0.98)   69

8b. The minimum necessary sample sizes for 5 different factor numbers
    and wide level of communality with the ratios of variables to factors
    ranging from 3 to 12 for the good-level criterion (0.92)        69

9a. The minimum necessary sample sizes for 4 different factor numbers
    and low level of communality with the ratios of variables to factors
    ranging from 3 to 12 for the excellent-level criterion (0.98)   70

9b. The minimum necessary sample sizes for 5 different factor numbers
    and low level of communality with the ratios of variables to factors
    ranging from 3 to 12 for the good-level criterion (0.92)        70

10. The minimum necessary sample sizes for six conditions with the
    related number of variables                                     73
CHAPTER I

INTRODUCTION

Factor analysis is a statistical technique that may be used to simplify complex sets of data. With the advent of powerful computers and the creation of sophisticated software, the use of factor analysis has increased, especially in psychology and social science.

When we use factor analysis or any other multivariate procedure, we make inferences from the data we observe to a model we believe accounts for or captures the variability in the data. The assumption is made that the information from the sample of observed data can reflect the information in the whole population. To a great extent, the accuracy of our inferences relies on the size of the obtained sample. Thus, determining an appropriate sample size becomes a critical matter when we plan to conduct a factor analysis.

Regarding the appropriate sample size one should use when conducting a factor analysis, Tanaka (1987) points out:

    Unlike more familiar univariate statistical models, such as ANOVA or multiple
    regression, statistical theory is not available explicitly to take into account
    differences in sample size, as is done in omnibus F or t tests... These "large
    sample" results buy some degree of confidence (but not certainty) when N is
    large, but do not provide a guideline about when sample sizes are large enough.


In the same article, Tanaka also argued that even though statisticians can find solace in asymptotic statistical theory, the developmental researcher using these methods is often left wondering about the relevance of such theory for finite samples.

Ideally, the answer to the "how big is enough" question (i.e., the minimum necessary sample size) should be obtained theoretically. However, no theoretically derived formula for the minimum necessary sample size has been found. Some researchers (Girshick, 1939; Archer & Jennrich, 1973; Cudeck & O'Dell, 1994) have investigated a connection between standard errors in factor loadings and sample size by looking for a minimum sample size that can yield stable and adequately small standard errors of factor loadings.

Finding the standard errors (the sampling variability) of loadings in factor analysis can allow researchers to determine, on the basis of sample data, when a pattern of zero loadings is tenable in the population model. In simple structure, a factor loading of zero means this particular factor does not influence the corresponding variable.

Even though this research cannot directly be used to determine the minimum necessary sample size for factor analyses, it can provide information regarding which characteristics of a data set may affect the minimum sample size.

Lawley (1967) identified the asymptotic standard errors of the unrotated loadings produced in maximum likelihood factor analysis. Jennrich (1973) used Lawley's (1967) results and those of Girshick (1939) on the asymptotic distribution of principal component loadings to obtain the asymptotic distribution of the corresponding analytically rotated loadings.

MacCallum, Widaman, Zhang, and Hong (1999) did a thorough review of the sample size issue in factor analysis, concluding: "Although this effect is well-defined theoretically and has been demonstrated with simulations, there is no guidance available to indicate how large N must be to obtain adequately small standard errors of loadings...."

Cudeck and O'Dell (1994) concluded that it is too difficult to derive the theoretical answer directly when all the parts that contribute to a factor analysis are considered: the method of estimation, the method of analytic rotation, the size of the sample, the number of factors, the clarity of the solution (i.e., the extent to which simple structure exists in the variables), the degree of correlation among the factors or among the variables, the number of coefficients estimated, and the interaction of each part.

It is generally accepted that larger samples are better (MacCallum, Widaman, Zhang, & Hong, 1999; Kline, 1994; Cudeck & O'Dell, 1994; Comrey & Lee, 1992; Velicer, Peacock, & Jackson, 1982). However, if the question is changed to "how big is enough," the recommendations and findings are diverse and often contradictory (MacCallum et al., 1999).

Some researchers provide an absolute sample size. Gorsuch (1983) recommended that the minimum necessary sample size should not be smaller than 100. Comrey and Lee (1992) gave a rough scale for the adequacy of sample size: 50, very poor; 100, poor; 200, fair; 300, good; 500, very good; and 1,000 or more, excellent. Further, Comrey and Lee (1992) emphasized that if some kind of correlation coefficient other than the Pearson product-moment correlation coefficient is used, larger samples are needed to achieve the same level of stability in the factor solution. Kline (1994) agreed with Gorsuch's recommendation that 100 subjects is the minimum. However, Kline added another recommendation: that the ratio of subjects to variables be at least 2:1.

Some researchers, like Kline (1994), consider that the ratio of the number of subjects (N) to the number of variables (p) is a better way to decide on the minimum sample size. This recommendation seems reasonable because the more variables we measure, the larger the sample size we should use. However, Arrindell and van der Ende (1985), after reviewing guidelines regarding the observations-to-variables ratio in factor analysis and component analysis, concluded that these recommendations are vague. For example, Cattell (1978) suggested a ratio of 3 to 6 times as many observations as variables, with an absolute minimum of about 250 observations. However, Everitt (1975) argued, based on a Monte Carlo study, that perhaps 10 individuals for each variable may be a sufficient ratio of observations to variables to aim for, though even this may be rather optimistic. He also noted that a factor analysis in which the number of observations is less than 5 times the number of variables should be viewed with at least some skepticism.

Tucker, Koopman, and Linn (1969) developed a comparison of three levels of communality, high (0.6 ~ 0.8), wide (0.2 ~ 0.8), and low (0.2 ~ 0.4), and two different ratios of the number of variables, p, to the number of factors, f: 20/3 and 20/7. Tucker's purpose was to study the effectiveness of factor analytic methods. However, he found that major differences in the quality of results were associated with fewer factors, so he recommended that the ratio of the number of variables to the number of factors should be high.

In another Monte Carlo study, Geweke and Singleton (1980) used four different sample sizes (10, 30, 150, and 300) to examine the behavior of the likelihood ratio chi-square test statistic for assessing model fit in maximum likelihood factor analysis. They used the likelihood ratio statistic for testing the goodness of fit of the exploratory factor model. In their conclusions, they argued that the likelihood ratio statistic might be more reliable in small samples than previously believed. The fewer the factors being fit, the sooner the asymptotic distribution theory becomes appropriate as sample size is increased, with the threshold being approximately 10 observations for one factor and perhaps 25 for two. They also considered that the likelihood ratio test has considerable power even when the sample size is only 10.

Browne's (1968) comprehensive Monte Carlo investigation comparing different factor analysis methods also examined the effects of increasing the ratio of the number of observed variables to the number of factors and of increasing sample size. He found that increasing the ratio of the number of observed variables to the number of factors, which is equivalent to increasing the number of constraints on the population correlations imposed by the factor analysis model, and increasing sample size have the following effects:

(a) The accuracy of the estimates of the factor loadings is increased, with the increase being greater for maximum likelihood estimates than for other estimates.

(b) The probability of the occurrence of a maximum likelihood communality estimate of one is reduced.

(c) The number of iterations required for convergence of the maximum determinant computing procedure is reduced.

In sum, these researchers demonstrated that not only is the number of variables related to the minimum necessary sample size, but the number of factors is as well. In fact, all of these researchers indicated that the minimum sample size is related to the variables-to-factors ratio.

Another characteristic that has also been shown to be related to minimum sample size is the size of the communality. Velicer, Peacock, and Jackson (1982) investigated the effect that different methods (maximum likelihood factor analysis, principal component analysis, image component analysis) would have on the factor patterns. Two sizes of communality (0.3 and 0.8) and two different sample sizes (144 and 288) were compared. All three methods performed better with larger sample sizes and with higher communalities.

MacCallum et al. (1999) expanded Tucker et al.'s (1969) study to compare three variables-to-factors ratios (10:3, 20:3, and 20:7), each with 3 different communality ranges. Thus, MacCallum et al. used 9 different population correlation matrix conditions with four levels of sample size (60, 100, 200, 400) and generated 100 sample correlation matrices under each combination of condition and sample size.

MacCallum et al. (1999) used a coefficient of congruence to demonstrate that sample size, level of communality, and ratio of variables to factors all affect the recovery of population factors. Consequently, the minimum necessary sample size will be different with different combinations of variables-to-factors ratios and levels of communality. Although all of these studies showed that the effect of sample size is related to the level of communality and the ratio of the number of observed variables to the number of common factors, none of them provided a guideline for the minimum necessary sample size.

Tanaka (1987) admitted that Monte Carlo procedures could be of some utility in determining appropriate sample sizes. He argued, however, that even in the most comprehensive studies done to date, only small subsets of models have been investigated.

MacCallum et al. (1999) brought up another point. They suggested that previous recommendations regarding the issue of sample size in factor analysis were based on a misconception that appropriate sample size was influenced predominantly by the number of variables. In their research, they demonstrated that with different communality levels and varying ratios of variables to factors, the minimum sample size required to achieve adequate stability and recovery of population factors is not invariant.

MacCallum and Tucker (1991) distinguished theoretically between "model error," which arises from the lack of fit of the model in the population, and "sample error," which arises from the lack of exact correspondence between a sample and a population. Because MacCallum and Tucker's model can efficiently focus on the effect of sample size on both model error and sample error, this study will use their model as the theoretical framework to investigate the relationship of sample size to the number of variables, number of factors, variables-to-factors ratio, and levels of communality.

Statement of the Problem

The purpose of this study is to investigate the relationship between the sample size, the number of variables, the number of factors, the level of communality, and the stability of the factor structure in an exploratory factor analysis. Some recommendations in regard to minimum sample size have been given in different studies, but they are limited in the number of situations they consider. Now, with recent advances in computer technology and software, it is possible to get more specific guidelines for the minimum sample size in a larger variety of situations.

Computer Procedure

In order to accomplish this purpose, computer programs were written to generate a variety of data sets that varied in the following ways. The number of factors varied from 1 to 6. For each number of factors considered, 10 different ratios of variables to factors (p/f), ranging from 3 to 12, were used. For each combination of number of factors and ratio of variables to factors, three different levels of communality (high: 0.6, 0.7, 0.8; wide: 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8; low: 0.2, 0.3, 0.4) were used. Therefore, a total of 6 x 10 x 3 = 180 different population situations were investigated. For each of these 180 situations, 100 population correlation matrices were generated randomly by using Tucker's procedure (Tucker et al., 1969). Hence, a total of 18,000 population correlation matrices were considered in this study. For the case of the number of factors equal to one, a different population correlation matrix generating procedure was used. This procedure is shown in Appendix A.
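The design just described can be sketched as a small enumeration. This is a minimal sketch, not the dissertation's actual programs; the names `factors`, `pf_ratios`, and `communality_levels` are illustrative.

```python
from itertools import product

# Design sketch of the population conditions described above.
factors = range(1, 7)            # number of factors: 1 to 6
pf_ratios = range(3, 13)         # variables-to-factors ratio p/f: 3 to 12
communality_levels = {
    "high": [0.6, 0.7, 0.8],
    "wide": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "low":  [0.2, 0.3, 0.4],
}

# 6 factor counts x 10 ratios x 3 communality levels = 180 conditions
conditions = list(product(factors, pf_ratios, communality_levels))
print(len(conditions))        # 180

# With 100 population correlation matrices per condition:
print(100 * len(conditions))  # 18000
```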

Then, sample correlation matrices were generated from each of these 18,000 population correlation matrices, using a small sample size as a starting point. The first sample size used in the procedure was dependent on the number of variables. The sample size was then increased as follows:

(1) When the sample size is less than 30, it increases by 1.

(2) When the sample size is less than 100, it increases by 5.

(3) When the sample size is between 100 and 300, it increases by 10.

(4) When the sample size is between 300 and 500, it increases by 50.

(5) When the sample size is greater than 500, it increases by 100.
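The step schedule above can be written as a small helper. This is a sketch: the text states only the ranges, so the behavior at exactly 100, 300, and 500 is an assumption here.

```python
def step(n):
    """Increment applied to a candidate sample size n.

    Schedule from the text; boundary handling at n = 100, 300, 500 is assumed.
    """
    if n < 30:
        return 1
    if n < 100:
        return 5
    if n < 300:
        return 10
    if n < 500:
        return 50
    return 100

# Example: grow a candidate sample size from 25 past the 5000 stopping bound
n = 25
sizes = [n]
while n <= 5000:
    n += step(n)
    sizes.append(n)
```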

This procedure was stopped when the population and sample correlation matrix results matched both of two criteria, based on the coefficient of congruence as a measure of the similarity of two matrices, which will be described in detail later. If either of these two criteria could not be matched when the sample size exceeded 5000, the procedure was also stopped. In addition, we assumed that the population distributions were multivariate normal distributions.

As mentioned earlier, 100 population correlation matrices and 10,000 sample correlation matrices were generated for each sample size, until the sample size made the coefficients of congruence match both criteria. Each of these sample matrices was analyzed using maximum likelihood factor analysis. The number of factors retained was set equal to the known number of factors in the population (i.e., from one to six). More detail about how the minimum necessary sample size was decided is presented in Chapter 3.

Tucker (1987) suggested guidelines for interpreting the value of K: if K is 0.98 or larger, then the congruence between the population and the sample is excellent; between 0.92 and 0.98 is considered "good" agreement; between 0.82 and 0.92 is borderline; between 0.68 and 0.82 is poor; and below 0.68 is terrible.
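As a sketch, K can be computed with the standard formula for Tucker's coefficient of congruence between two loading vectors; the dissertation's exact definition appears in Chapter III, so this standard form is an assumption here, paired with Tucker's (1987) verbal guidelines as quoted above.

```python
import math

def congruence(x, y):
    # Standard Tucker congruence: sum(x*y) / sqrt(sum(x^2) * sum(y^2))
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den

def agreement(k):
    # Tucker's (1987) guidelines as quoted in the text
    if k >= 0.98:
        return "excellent"
    if k >= 0.92:
        return "good"
    if k >= 0.82:
        return "borderline"
    if k >= 0.68:
        return "poor"
    return "terrible"

# A loading vector compared with itself is perfectly congruent (K = 1)
loadings = [0.7, 0.6, 0.5, 0.4]
print(agreement(congruence(loadings, loadings)))  # excellent
```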


Limitations

The first limitation is due to the nature of the distributions used in this study. For simplicity, all the data generated for this work have normal distributions. Consequently, minimum sample size recommendations made here may be inappropriate for non-normal data.

Second, Cudeck and O'Dell (1994) emphasized that in addition to sample size, the method of rotation, the number of factors, and the degree of correlation among the factors will all affect the standard errors of the factor loadings. Therefore, it is possible that the results of this study will not generalize to situations using other estimation procedures and/or different rotation methods.

A third limitation stems from the decision not to investigate the effect of measurement error in this study. In the common factor model, the error of measurement contributes to the influence of the unique factor for a given variable. When different amounts of measurement error are present in the data, the minimum sample size recommendations made here may need to be adjusted to larger values. In addition, there is no apparent research dealing with the effect of different orthogonal rotation methods. Therefore, we do not know if the decision to use another orthogonal rotation method would affect the results of this study.

CHAPTER II

REVIEW OF LITERATURE

The Basic Concept of Factor Analysis

The general description of the essential purpose of factor analysis is expressed as "to describe, if possible, the covariance relationships among many variables in terms of a few underlying, but unobservable, random quantities called factors" (Johnson & Wichern, 1998, p. 514).

McDonald (1985) gave a more specific description of this purpose. He asserted that common factor analysis uses the partial correlation aspect of regression theory to explain the covariance among the variables.

When considering the effect of the composition of the sample, Andrew and Howard (1993) provided an example to illustrate the importance of the range of variable scores in the data on factor analytic results. They measured psychological tests of Verbal Ability, Numerical Ability, Arithmetic Reasoning, Memory, and Perceptual Speed on two samples. The first random sample was taken from the general population. The second sample, of equal size to the first, consisted entirely of individuals who have an IQ of exactly 100. Andrew and Howard showed that a factor analysis of the intercorrelations of these variables would be likely to produce a very prominent factor of General Intelligence from the first sample, but that the second sample would fail to produce such a factor.

In addition to Andrew and Howard's original purpose of illustrating the effect of sampling, this example can help us to understand the meaning of factors. To do this, consider a correlation matrix from McDonald (1985) in conjunction with Andrew and Howard's example. Let the independent variable x be the value of IQ, y_1 be the test score of Verbal Ability, y_2 be the test score of Numerical Ability, y_3 be the test score of Arithmetic Reasoning, y_4 be the test score of Memory, and y_5 be the test score of Perceptual Speed. All x and y_i are standard measures. The correlation matrix is given in Table 1.

Table 1
The correlation matrix of x, y_1, y_2, y_3, y_4, and y_5

        x      y_1    y_2    y_3    y_4    y_5
x       1      0.9    0.8    0.7    0.6    0.5
y_1     0.9    1      0.72   0.63   0.54   0.45
y_2     0.8    0.72   1      0.56   0.48   0.40
y_3     0.7    0.63   0.56   1      0.42   0.35
y_4     0.6    0.54   0.48   0.42   1      0.30
y_5     0.5    0.45   0.40   0.35   0.30   1

Using the formula for partial correlation

    r_{jk·x} = (r_{jk} − r_{jx} r_{kx}) / √[(1 − r_{jx}²)(1 − r_{kx}²)]

and calculating the matrix of partial correlations between the five dependent variables when the independent variable x is partialled out, we see that every partial correlation is zero. That is, a single independent variable explains all the correlations in the matrix of correlations among the dependent variables.
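This claim can be checked numerically. The following sketch (Python, assuming NumPy is available; not part of the original study's programs) builds the correlation matrix of Table 1, applies the partial correlation formula, and confirms both that partialling out x drives every correlation among y_1, ..., y_5 to zero and that each correlation is a product of two of the five numbers 0.9, ..., 0.5:

```python
import numpy as np

# Correlation matrix of x, y1, ..., y5 from Table 1 (x is row/column 0).
R = np.array([
    [1.0, 0.9,  0.8,  0.7,  0.6,  0.5 ],
    [0.9, 1.0,  0.72, 0.63, 0.54, 0.45],
    [0.8, 0.72, 1.0,  0.56, 0.48, 0.40],
    [0.7, 0.63, 0.56, 1.0,  0.42, 0.35],
    [0.6, 0.54, 0.48, 0.42, 1.0,  0.30],
    [0.5, 0.45, 0.40, 0.35, 0.30, 1.0 ],
])

def partial_corr(r_jk, r_jx, r_kx):
    """Partial correlation of y_j and y_k with x partialled out."""
    return (r_jk - r_jx * r_kx) / np.sqrt((1 - r_jx**2) * (1 - r_kx**2))

# Every partial correlation among the five dependent variables is zero.
for j in range(1, 6):
    for k in range(j + 1, 6):
        assert abs(partial_corr(R[j, k], R[j, 0], R[k, 0])) < 1e-12

# Each off-diagonal correlation is the product of two of 0.9, 0.8, ..., 0.5.
c = R[0, 1:]
assert np.allclose(R[1:, 1:] - np.diag(1 - c**2), np.outer(c, c))
```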

Now, suppose we had measured y_1, y_2, ..., y_5, but had not chosen the measure x. We then have the (5 × 5) submatrix of Table 1 obtained by deleting the first row and the first column. We find that each correlation in the matrix is a product of two of a sequence of numbers, 0.9, 0.8, 0.7, 0.6, 0.5 (e.g., r_12 = 0.9 × 0.8, r_35 = 0.7 × 0.5). Hence, the 10 distinct correlations in the submatrix can be expressed as products of pairs of five numbers. It should be noted that this situation will not be true in general of every (5 × 5) correlation matrix.

In summary, just from knowing the (5 × 5) correlation submatrix of dependent variables in Table 1, we can deduce from the regularity of its formation that there may exist an independent variable, which we have not observed, that would make all the partial correlations zero if that independent variable were partialled out.

Common factor analysis theory originated from the recognition of empirical correlation matrices that looked as if their correlations could be explained in this way. This concept of one common factor can be extended to develop the concept of multiple common factors. McDonald (1985) used the following expression for the partial correlation between y_j and y_k with m variables partialled out:

    r_{jk·x1x2...xm} = [r_{jk} − (r_{j1}r_{k1} + r_{j2}r_{k2} + ... + r_{jm}r_{km})] / denominator        2-2

where the independent variables x_1, x_2, ..., x_m are mutually uncorrelated. McDonald did not write the expression for the denominator because the only condition considered here is when the partial correlations are zero. If we assume r_{jk·x1x2...xm} = 0, then we also know that r_{jk} − (r_{j1}r_{k1} + r_{j2}r_{k2} + ... + r_{jm}r_{km}) = 0, which can be arranged in the form

    r_{jk} = r_{j1}r_{k1} + r_{j2}r_{k2} + ... + r_{jm}r_{km}        2-3

where j = 1...n and k = 1...n.

The implication is that if m (uncorrelated) variables explain the n(n−1)/2 correlations between n dependent variables, then each such correlation can be written as a sum of m products of two numbers: the correlations of each dependent variable with each independent variable. The statement r_{jk} − (r_{j1}r_{k1} + r_{j2}r_{k2} + ... + r_{jm}r_{km}) = 0 is described as the fundamental theorem of factor analysis.

This statement implies that there exists a number of unobserved variables (common factors) that explain the observed correlations, in the sense that when these are partialled out, the partial correlations of our observed variables all become zero. Alternatively, we can say that each of our observed variables can be expressed as the sum of a (common) part that is a regression on a number of unobserved variables (common factors) and a residual about that regression, and that the residuals are uncorrelated.

This alternative model can be expressed as

    y_j = f_{j1}x_1 + f_{j2}x_2 + ... + f_{jm}x_m + e_j        2-4

where j = 1, 2, ..., n; y_j is the j-th observed variable; x_p is the p-th common factor, p = 1, 2, ..., m; e_j is the residual of y_j about its regression on the factors (the unique factor); and f_{jp} is the regression weight of y_j on x_p (the common factor loading, or coefficient of variable j on factor p), together with the statement that the residuals are uncorrelated (McDonald, 1985). The model (2-4) is the general common factor model.
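As an illustration of model 2-4 and the fundamental theorem, the following sketch (Python with NumPy; the loading values are hypothetical, chosen only for this example) generates a large sample from the common factor model with two uncorrelated factors and checks that the off-diagonal sample correlations approximately reproduce the sums of products of loadings:

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, m, n = 200_000, 2, 4          # large sample: approximate, not exact

# Hypothetical loadings f_jp for n = 4 variables on m = 2 common factors.
F = np.array([[0.8, 0.1],
              [0.7, 0.2],
              [0.1, 0.9],
              [0.2, 0.6]])
psi = np.sqrt(1 - (F**2).sum(axis=1))        # unique weights so Var(y_j) = 1

X = rng.standard_normal((n_obs, m))          # uncorrelated common factors x_p
E = rng.standard_normal((n_obs, n)) * psi    # uncorrelated residuals e_j
Y = X @ F.T + E                              # model 2-4: y_j = sum_p f_jp x_p + e_j

R = np.corrcoef(Y, rowvar=False)
# Fundamental theorem: r_jk ~ f_j1*f_k1 + ... + f_jm*f_km for j != k.
off = ~np.eye(n, dtype=bool)
assert np.abs(R - F @ F.T)[off].max() < 0.02
```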

Sources of Error in the Common Factor Model

Equation 2-4 can be rewritten as

    y = xΩ′        2-5

where x is a row vector containing scores on common and unique factors and Ω is a matrix of population loadings for common and unique factors. These two matrices can be expressed as

    x = [x_c, x_u]        2-6

and

    Ω = [Λ, Ψ]        2-7

where x_c is a row vector of scores on f common factors, x_u is a row vector of scores on p unique factors, Λ is a p × f matrix of population loadings for the common factors, and Ψ is a p × p diagonal matrix of population unique factor loadings.

Then the following equation can be obtained directly by substituting 2-6 and 2-7 into 2-5:

    y = x_cΛ′ + x_uΨ        2-8

and the variance-covariance matrix of the y_j's is

    Σ_yy = ΛΦΛ′ + Ψ²        2-9

where Φ is an f × f matrix of population correlations for common factors.

MacCallum and Tucker (1991) began their investigation from the perspective that no mathematical model will fit real-world phenomena exactly. Therefore, they represented lack of fit in the conceptual and mathematical expression of the model that consists of nonlinearity and minor factors. This notion may be expressed mathematically as follows:

    y = z + z̃        2-10

where z is a vector representing that portion of y that is accounted for by the common factor model and z̃ is a vector representing that portion of y not accounted for by the model.

MacCallum and Tucker (1991) emphasized that z̃ is neither equivalent to the error of measurement nor equivalent to the unique factor in the model. The unique factor is part of the common factor model, and error of measurement is a phenomenon that contributes to the influence of the unique factor for a given variable. These influences are incorporated into the common factor model, and thus contribute to z and not to z̃.

Given this view, it would be appropriate to reformulate 2-8 as follows:

    z = xΩ′ = x_cΛ′ + x_uΨ        2-11

    y = x_cΛ′ + x_uΨ + z̃        2-12

Thus, the measured variables are defined as linear combinations of the factors plus a portion not accounted for by the model.


The Model for Population Covariances

Given that Σ_yy is the population covariance matrix for the measured variables, MacCallum and Tucker (1991) also defined a matrix Σ_zz as the population covariance matrix for the modeled variables. The following factorial structure for Σ_zz is easily derived from Equation 2-11:

    Σ_zz = E(z′z) = Ω Σ_xx Ω′        2-13

Matrix Σ_xx is the population covariance matrix for the common and unique factors. This matrix can be partitioned as follows:

    Σ_xx = [ Σ_cc  Σ_cu ]
           [ Σ_uc  Σ_uu ]        2-14

Matrix Σ_cc contains population variances and covariances of the common factors; Σ_uu is diagonal and contains population variances of unique factors; Σ_uc contains population covariances of unique factors with common factors; and Σ_cu is the transpose of Σ_uc. Without loss of generality, MacCallum and Tucker (1991) defined all factors as being standardized in the population, which means that all entries in Σ_xx are correlation coefficients. Furthermore, by definition, unique factors must be uncorrelated with each other and with common factors in the population. Thus, the structure of Σ_xx simplifies to the following form:

    Σ_xx = [ Φ  0 ]
           [ 0  I ]        2-15

where Φ is an f × f matrix of population correlations for common factors.


Substituting from Equations 2-15 and 2-7 into Equation 2-13 yields the following:

    Σ_zz = ΛΦΛ′ + Ψ²        2-16

From Equation 2-10, the following relation among covariance matrices can be derived easily:

    Σ_yy = Σ_zz + Σ_zz̃ + Σ_z̃z + Σ_z̃z̃        2-17

Δ is defined as

    Δ = Σ_zz̃ + Σ_z̃z + Σ_z̃z̃        2-18

a matrix of lack-of-fit terms representing model error in the population. Then, by substituting 2-18 into 2-17, the relation between the observed covariance matrix Σ_yy and the modeled covariance matrix Σ_zz can be obtained as

    Σ_yy = Σ_zz + Δ        2-19

Substituting 2-16 into 2-19 yields an expression for the factor structure of the observed covariance matrix:

    Σ_yy = ΛΦΛ′ + Ψ² + Δ        2-20

The Model for Sample Covariances

By using a similar procedure to obtain the expression for the population factorial structure for Σ_zz, MacCallum and Tucker (1991) found C_zz as

    C_zz = [Λ, Ψ] [ C_cc  C_cu ] [Λ, Ψ]′ = ΛC_ccΛ′ + ΛC_cuΨ + ΨC_ucΛ′ + ΨC_uuΨ        2-21
                  [ C_uc  C_uu ]


MacCallum and Tucker (1991) considered the deviation from zero of the covariances in C_cu, C_uc, and the off-diagonal entries in C_uu to give rise to one source of sampling error. It is useful to define a matrix Δ_r as a matrix of lack-of-fit terms arising from this phenomenon. Incorporating this lack-of-fit term into the model 2-21 yields the following more appropriate representation:

    C_zz = ΛC_ccΛ′ + Ψ² + Δ_r        2-22

Now consider the final step in this development: the expression of the model in terms of the factorial structure of C_yy, which contains the sample covariances of the measured variables.

    C_yy = C_zz + C_zz̃ + C_z̃z + C_z̃z̃        2-23

Because the last three terms in 2-23 are covariance matrices involving those portions of the measured variables that are not fit by the model, those terms represent sample covariances that will contribute to lack of fit of the model to C_yy.

Δ_c is defined by

    Δ_c = C_zz̃ + C_z̃z + C_z̃z̃        2-24

as a matrix representing this lack of fit. Substituting from 2-22 into 2-23 yields the following model for C_yy:

    C_yy = (ΛC_ccΛ′ + Ψ² + Δ_r) + Δ_c        2-25

This model expresses the factorial structure of C_yy and incorporates two distinct sources of error: (a) Δ_r, representing lack of fit as a result of sampling error arising from nonzero sample covariances of unique factors with each other and with common factors; and (b) Δ_c, representing lack of fit as a result of imperfect correspondence between modeled variables and measured variables.

MacCallum and Tucker (1991) explained the population and sample models for expressing sources of error as follows:

In the population, there is one source of error called model error. It arises from lack of correspondence between the model and the real world in the sense that the measured variables will not be exact linear combinations of the common and unique factors.... In the sample, five distinct sources of error have been identified. One such source is model error in the sample, arising in the same manner as in the population. In addition, there are four distinct sources of sampling error that influence solutions. One involves sampling variability in the common-factor covariances. The estimates of factor loadings are, however, affected by other sources of sampling error. One involves sampling error arising from nonzero covariances of unique factors with each other and with common factors. As noted in the sample model, the violation of the assumption that such covariances are zero in the sample gives rise to a primary source of lack of fit of the model. The last two are what the researcher discusses in this study. ... The final two sources of sampling error arise from standardization of common factors and measured variables in the sample. In general, overall fit would not be affected by standardization of measured variables and common factors in exploratory factor analysis (Cudeck, 1989). ... However, model error in the sample and error arising from nonzero sample covariances involving unique factors will result in a poorer fit between the model and the sample data.
MacCallum and Tucker's (1991) models clearly show the sources of error, which is very useful in selecting parameter values to use in generating the population correlation matrices. In this study, the researcher will focus on the sources of error involving sampling variability in the common factor covariance structure and arising from nonzero covariances of the unique factors with each other and with the common factors. To investigate the effect of sample size on these sources of error in various population conditions involving the number of factors, the ratio of variables to factors, and the level of communality, the assumption of no model error in the population and sample model is made. The procedure used to generate population correlation matrices is based on this assumption.
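The sampling error that this study targets can be made concrete with a small simulation (Python with NumPy; the sample sizes and number of unique factors here are illustrative choices, not the design values of the study). Although unique factors are uncorrelated with the common factors in the population, their sample covariances deviate from zero, and the deviation shrinks as N grows:

```python
import numpy as np

rng = np.random.default_rng(1)

def avg_max_factor_unique_cov(n_obs, n_unique=5, n_reps=200):
    """Average (over replications) of the largest sample covariance between
    one common factor and n_unique unique factors that are uncorrelated
    with it in the population."""
    worst = []
    for _ in range(n_reps):
        xc = rng.standard_normal(n_obs)                 # common factor scores
        xu = rng.standard_normal((n_obs, n_unique))     # unique factor scores
        c_cu = (xc - xc.mean()) @ (xu - xu.mean(axis=0)) / (n_obs - 1)
        worst.append(np.abs(c_cu).max())
    return float(np.mean(worst))

dev_n60 = avg_max_factor_unique_cov(60)
dev_n1000 = avg_max_factor_unique_cov(1000)
# Population covariances are zero; sample deviations shrink roughly like 1/sqrt(N).
assert dev_n60 > dev_n1000
```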

Maximum Likelihood Factor Analysis (MLFA)

Why Use MLFA

Many different methods exist to extract common factors. The method used in this study is maximum likelihood factor analysis. Mulaik (1972, p. 162) said:

From the point of view of statisticians, maximum-likelihood estimators are usually superior to other estimators in estimating population parameters. When estimating a population parameter, if a sufficient statistic exists to estimate the parameter, the maximum-likelihood estimator is usually based on it. Moreover, the maximum-likelihood estimator is a consistent estimator as well as frequently a minimum-variance estimator. And, finally, in many cases in connection with typical distributions the maximum-likelihood estimators are (approximately) normally distributed with large samples.
The question is, then, why are other methods still used? McDonald (1985) provides four reasons to explain why.

(1) To save computing costs.

(2) A study may involve so many variables that its dimensions will not fit the limitations of the available maximum likelihood factor analysis computer program.

(3) The ordinary user does not even know about maximum likelihood factor analysis.

(4) The frequency of occurrence of Heywood cases in maximum likelihood factor analysis estimates leads some researchers to recommend other analyses that do not yield Heywood results.

It is obvious that the first three reasons to use other methods stem from the limitations of computers. However, the advent of high-speed computers has made these three reasons untenable today. In the case of Heywood results, the fourth reason is questionable. According to McDonald (1985), a Heywood result may indicate that the study has not been well designed in the sense that not enough variables have been included to define each factor adequately. He argues that Heywood cases are not a reason to reject use of maximum likelihood factor analysis, but are a caution to researchers to re-check their models.

The reason we have chosen the ML method for this study is the same as MacCallum et al.'s (1999) reason:

Maximum likelihood estimation is based on the assumption that the common factor model holds exactly in the population and that the measured variables follow a multivariate normal distribution in the population, conditions that are inherent in the simulation design and that imply that all lack of fit and error of estimation are due to sampling error, which is our focus.
How MLFA Works

Lawley (1940) made a major breakthrough with the development of equations for the maximum likelihood estimation of factor loadings, and he also provided a framework for statistical testing in factor analysis. A more condensed derivation of the method appeared in a book by Lawley and Maxwell (1963).

Rao (1955) related the maximum likelihood method to canonical correlation analysis. Howe (1955) showed that the maximum likelihood estimators of the factor loadings derived by Lawley could be derived from a model making no distributional assumptions about the variates, and he also provided a Gauss-Seidel computing algorithm that, according to Mulaik (1972), was far superior to Lawley's for obtaining these estimates.

All of these individuals' methods require using an iterative procedure to obtain solutions. Lawley and Maxwell (1963) say:

It has not been found possible to establish exact conditions under which the above procedure converges, but in practice this is usually the case. Convergence is, however, often very slow and, as Howe (1955) has pointed out, it is possible for differences between successive iterates to be extremely small and yet to be far from the exact solution.
Joreskog (1967) developed a new computational method, arising from private correspondence with Lawley, which has the advantage that the iterative procedure always converges. The maximum likelihood solution can be determined very accurately, if desired. His follow-up work (Joreskog, 1975) demonstrated how iterations can converge much more quickly. This method will be used in this study, and its details are described as follows.

The basic factor analysis model used here is

    x = μ + Λf + e        2-26

where x is a column vector of observations on p variables, μ is the mean vector of x, f is a vector of f common factors, e is a vector of p residuals, which represent the combined effect of specific factors and random error, and Λ = (λ_jk) is a p × f matrix of factor loadings.

The residuals e are assumed to be uncorrelated with each other and with the common factors f. The dispersion or covariance matrices of f, e, and x are denoted respectively by Φ, Ψ², and Σ.

Joreskog (1975) assumed that the common factors have unit variance, so the diagonal elements of Φ are unities. If, in addition, for f > 1, the common factors are orthogonal or uncorrelated, then the nondiagonal elements of Φ are zeros and thus Φ becomes the identity matrix of order f.


In view of 2-26 and Joreskog's assumptions, Σ is given in terms of the other matrices by the equation

    Σ = ΛΦΛ′ + Ψ²        2-27

Equations 2-26 and 2-27 represent a model for a population of individuals. The parameters μ, Λ, Φ, and Ψ² characterizing the population are usually unknown and must be estimated from data on N individuals.

We can calculate the sample mean vector x̄ = (x̄_1, x̄_2, ..., x̄_p), where x̄_i = (1/N) Σ_{a=1}^{N} x_{ia}, and the sample covariance matrix S = (s_ij), where s_ij = [1/(N − 1)] Σ_{a=1}^{N} (x_{ia} − x̄_i)(x_{ja} − x̄_j). The information provided by S may also be represented by a correlation matrix R = (r_ij) and a set of standard deviations s_1, ..., s_p, where s_i = √(s_ii) and r_ij = s_ij / (s_i s_j). The remaining estimation problem is then to fit a matrix Σ of the form 2-27 to the observed covariance matrix S.

In the following parts, it is assumed that the number of factors f is known in advance. The maximum likelihood method of fitting Σ to S is to minimize

    M = tr(Σ⁻¹S) − log|Σ⁻¹S| − p        2-28

When f > 1 and there is more than one common factor, it is necessary to remove an element of indeterminacy in the basic model before the procedure for minimizing M can be applied.


This indeterminacy arises from the fact that there exist nonsingular linear transformations of the common factors that change Λ, and in general also Φ, but leave Σ, and therefore the function, unaltered. So we must impose some additional restrictions to obtain a unique set of estimates.

The usual way of eliminating this indeterminacy in exploratory factor analysis is to choose Φ to be an identity matrix and Λ′Ψ⁻²Λ to be diagonal, and to estimate the parameters in Λ and Ψ subject to these conditions.

The minimization of M is done in two steps. First, the conditional minimum of M for given Ψ is found. It can be expressed as

    m(Ψ) = Σ_{m=f+1}^{p} (log γ_m + 1/γ_m − 1)        2-29

where γ_1 ≥ γ_2 ≥ ... ≥ γ_p are the eigenvalues of ΨS⁻¹Ψ. Then m(Ψ) is minimized numerically using the Newton-Raphson procedure. With these two steps (finding the equation for M and solving it), we can obtain the estimate

    Λ̂ = ΨΩ_1(Γ_1⁻¹ − I_f)^{1/2}        2-30

where w_1, w_2, ..., w_p are an orthogonal set of eigenvectors corresponding to γ_1 ≥ γ_2 ≥ ... ≥ γ_p, Γ_1 = diag(γ_1, γ_2, ..., γ_f), and Ω_1 = [w_1, w_2, ..., w_f].

When using a computer program, however, it is not necessary to go through all of the mathematical procedures. The computational procedures used in this study combine Joreskog's (1967, 1975) procedure and Johnson and Wichern's (1998) scheme:


1. Compute initial estimates of the specific variances

    ψ̂_i² = (1 − f/(2p)) (1/r^{ii})        2-31

where r^{ii} is the i-th diagonal element of R⁻¹.

2. Using the given Ψ̂², compute the first f distinct eigenvalues, γ_1 > γ_2 > ... > γ_f, and the corresponding eigenvectors, w_1, w_2, ..., w_f, of

    R* = Ψ̂⁻¹RΨ̂⁻¹        2-32

Let Q_1 = [w_1 | w_2 | ... | w_f] be the p × f matrix of normalized eigenvectors and Γ_1 = diag(γ_1, γ_2, ..., γ_f) be the f × f diagonal matrix of eigenvalues. Then Λ̂ can be estimated as

    Λ̂ = Ψ̂ Q_1(Γ_1 − I)^{1/2}        2-33

3. Using Λ̂, obtain a new Ψ̂²: Ψ̂² = diag(R − Λ̂Λ̂′). The values ψ̂_1², ψ̂_2², ..., ψ̂_p² obtained from Ψ̂² are employed at step (2) to create a new Λ̂.

Steps (2) and (3) are repeated until convergence is achieved, i.e., until the differences between successive values of ψ̂_i² are negligible. For example, Joreskog (1967) suggested 0.0005 or smaller as an acceptable difference between successive values of ψ̂_i².
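The three-step recursion above can be sketched as follows (Python with NumPy; a simplified sketch of the scheme as described, not the exact program used in the study, and the tiny positive floor on negative uniquenesses is an implementation convenience so that Ψ̂⁻¹ remains computable):

```python
import numpy as np

def ml_factor(R, f, tol=0.0005, max_iter=2000):
    """Steps (1)-(3) above: iterative ML extraction of f common factors
    from a correlation matrix R."""
    p = R.shape[0]
    # Step 1: initial specific variances psi_i^2 = (1 - f/(2p)) * (1 / r^ii).
    psi2 = (1 - f / (2 * p)) / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        psi = np.sqrt(psi2)
        # Step 2: eigen-analysis of R* = Psi^{-1} R Psi^{-1}.
        Rstar = R / np.outer(psi, psi)
        vals, vecs = np.linalg.eigh(Rstar)
        idx = np.argsort(vals)[::-1][:f]             # f largest eigenvalues
        gamma, Q1 = vals[idx], vecs[:, idx]
        L = psi[:, None] * Q1 * np.sqrt(gamma - 1)   # Lambda = Psi Q1 (Gamma1 - I)^{1/2}
        # Step 3: new uniquenesses from the diagonal of R - Lambda Lambda'.
        # Negative values are floored near zero (the zero-for-negative rule,
        # with a tiny positive floor so that Psi^{-1} still exists).
        psi2_new = np.maximum(np.diag(R - L @ L.T), 1e-6)
        if np.max(np.abs(psi2_new - psi2)) < tol:    # Joreskog's 0.0005 criterion
            return L, psi2_new
        psi2 = psi2_new
    return L, psi2

# Check on the exact one-factor structure used earlier in the chapter.
lam = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
R = np.outer(lam, lam) + np.diag(1 - lam**2)
L, psi2 = ml_factor(R, f=1)
L = (L * np.sign(L.sum())).ravel()                   # resolve the sign indeterminacy
Sigma = np.outer(L, L) + np.diag(psi2)
A = np.linalg.solve(Sigma, R)                        # Sigma^{-1} R
M = np.trace(A) - np.linalg.slogdet(A)[1] - len(lam) # fit function 2-28, >= 0
assert np.allclose(L, lam, atol=0.01) and M < 0.01
```

Because the test matrix satisfies the one-factor model exactly, the recursion recovers the generating loadings and the fit function M is essentially zero.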

It often happens that some ψ̂_i² become negative. Such a solution is inadmissible and is said to be improper, or a Heywood case. Joreskog (1967) provided criteria to deal with Heywood cases. When some ψ̂_i² are smaller than ε (0.01), the estimation equation 2-33 is replaced by

    Λ̂ = (T′)⁻¹U_1D_1^{1/2}        2-34

where T is a lower triangular matrix obtained from

    R⁻¹ = TT′        2-35

d_1 ≥ d_2 ≥ ... ≥ d_p are the eigenvalues of T′(R − Ψ²)T = I_p − T′Ψ²T, u_1, u_2, ..., u_p are an orthonormal set of corresponding eigenvectors, D_1 = diag(d_1, d_2, ..., d_f), and U_1 = [u_1, u_2, ..., u_f].

In this study, when some ψ̂_i² become negative, these negative ψ̂_i² will be set to zero instead of applying the procedure mentioned above. This zero-instead-of-negative method is widely used in statistical software such as SAS.

Simple Structure

After the estimate of the factor loading matrix Λ̂ (p × f) has been found, one further situation must be mentioned. When the number of columns, f, is greater than 1, there is some ambiguity associated with the factor model. To demonstrate this, let T be any f × f orthogonal matrix, so that TT′ = T′T = I. Then

    Λ̂Λ̂′ = Λ̂TT′Λ̂′ = Λ̂*Λ̂*′        2-36

where Λ̂* = Λ̂T. This creates a problem at the stage at which we wish to understand our results. Therefore, we need more restrictions to obtain an unambiguous solution.

A widely accepted goal in transforming a given factor pattern into another is contained in the notion of simple structure, which was advocated by Thurstone (1935). According to Harman (1976, p. 98), Thurstone's original three conditions for simple structure were as follows (Thurstone, 1935, p. 156):

1. Each row of the factor structure should have at least one zero.

2. Each column should have at least m zeros (m being the total number of common factors).

3. For every pair of columns there should be at least m variables whose entries vanish in one column but not in the other.

Thurstone (1947) gave a general definition of "simple structure": "If a reference frame can be found such that each test vector is contained in one or more of the... coordinate hyperplanes, then the combined frame and configuration is called a simple structure" (p. 328). In the same book, two additional conditions for simple structure were proposed as insurance that the reference hyperplanes be distinct and overdetermined by the data. The full set of criteria (p. 335) is as follows:

1. Each row of the factor matrix should have at least one zero.

2. If there are m common factors, each column of the factor matrix should have at least m zeros.

3. For every pair of columns of the factor matrix, there should be several variables whose entries vanish in one column but not in the other.

4. For every pair of columns of the factor matrix, a large proportion of the variables should have vanishing entries in both columns when there are four or more factors.

5. For every pair of columns of the factor matrix there should be only a small number of variables with non-vanishing entries in both columns.

McDonald (1985) believed that these five rules, based partly on experience, are supposed to legislate an unambiguous choice among alternative solutions that might be equally acceptable in terms of the fundamental definition.

Mulaik (1972) mentioned that the simple-structure criteria do not necessarily require orthogonal reference axes. All that these criteria require of the m reference axes is that they be a set of linearly independent vectors in the common-factor space. Mulaik (1972) thought that the basic idea, that in a simple-structure factor solution each variable is accounted for by fewer than the total number of common factors obtained in the analysis, is not difficult to understand.

Therefore, once the estimated factor loading matrix Λ̂ is obtained, it will be rotated with the criteria of simple structure to make the loadings more interpretable. As Johnson and Wichern (1998, p. 546) said, rotation in factor analysis may be likened "...to sharpening the focus of a microscope in order to see the detail more clearly."

Varimax Rotation

Many different methods of rotation exist. Some preserve orthogonality, some do not. The choice of rotation method is often subjective. In this study, the Varimax method was chosen as the rotation method because it is the most commonly used method (Dallas E. Johnson, 1998, p. 173). Paul Kline (1994, p. 68) said:

Certainly I have found that Varimax is an excellent method of reaching orthogonal simple structure and that in many cases oblique solutions are virtually identical because the correlation between the factors is so small as to be negligible. In conclusion, where an orthogonal simple structure rotation is desired, Varimax should be applied.
Varimax was proposed by Kaiser (1958). Its criterion is to maximize

    V = (1/p²) Σ_{j=1}^{f} [ p Σ_{i=1}^{p} (g_ij/h_i)⁴ − ( Σ_{i=1}^{p} (g_ij/h_i)² )² ]        2-37

where g_ij is the loading of the i-th variable on the j-th factor, h_i² is the communality of the i-th variable, p is the number of variables, and f is the number of common factors.

Kaiser (1958) proved that two factors could be rotated to maximize V with the following transformation matrix,

    [ cos φ   −sin φ ]
    [ sin φ    cos φ ]        2-38

where the angle φ is given by Kaiser (1958) to be the angle such that

    tan 4φ = (D − 2AB/p) / (C − (A² − B²)/p)        2-39

where u_i = (g_{i1}/h_i)² − (g_{i2}/h_i)², v_i = 2(g_{i1}/h_i)(g_{i2}/h_i), A = Σ u_i, B = Σ v_i, C = Σ (u_i² − v_i²), D = 2 Σ u_i v_i, g_ij is a factor loading, and h_i² is the i-th communality. All sums are over i from 1 to p, the number of variables.


Then, using Table 2 (Harman, 1976, p. 287), we can find the angle φ that maximizes V.

Table 2

The criteria for φ

Numerator   Denominator   tan 4φ   Resulting Quadrant of 4φ   Limits for φ

+           +             +        I                          0° to 22.5°

+           −             −        II                         22.5° to 45°

−           −             +        III                        −45° to −22.5°

−           +             −        IV                         −22.5° to 0°

Factors are rotated two at a time as

    Λ̂_rotated = Λ̂T        2-40

where T rotates the pair of factors i and j, with i = 1, 2, ..., (f−1) and j = i+1, i+2, ..., f.

The complete set of f(f−1)/2 pairings of factors is called a cycle. After each cycle, the value of V is calculated. In this study, when the difference between V for two consecutive cycles is smaller than 0.0001, the rotation procedure will be stopped and the Λ̂_rotated from the final cycle will be the result.
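The pairwise scheme above can be sketched as follows (Python with NumPy; a simplified sketch of Kaiser's normal varimax, not the exact program used in the study, and it assumes every variable has a nonzero communality):

```python
import numpy as np

def varimax(L, tol=1e-4, max_cycles=100):
    """Pairwise varimax rotation using Kaiser's (1958) angle formula."""
    L = np.asarray(L, dtype=float)
    p, f = L.shape
    h = np.sqrt((L**2).sum(axis=1))           # square roots of the communalities
    G = L / h[:, None]                        # Kaiser normalization
    def V(G):                                 # the varimax criterion
        s = G**2
        return ((p * (s**2).sum(axis=0) - s.sum(axis=0)**2) / p**2).sum()
    v_old = V(G)
    for _ in range(max_cycles):
        for i in range(f - 1):                # one cycle: all f(f-1)/2 pairs
            for j in range(i + 1, f):
                x, y = G[:, i], G[:, j]
                u, v = x**2 - y**2, 2 * x * y
                A, B = u.sum(), v.sum()
                C, D = (u**2 - v**2).sum(), 2 * (u * v).sum()
                num, den = D - 2 * A * B / p, C - (A**2 - B**2) / p
                phi = 0.25 * np.arctan2(num, den)   # quadrant logic of Table 2
                rot = np.array([[np.cos(phi), -np.sin(phi)],
                                [np.sin(phi),  np.cos(phi)]])
                G[:, [i, j]] = G[:, [i, j]] @ rot
        v_new = V(G)
        if abs(v_new - v_old) < tol:          # stop when V changes by < 0.0001
            break
        v_old = v_new
    return G * h[:, None], v_new

# A perfect two-factor simple structure, deliberately rotated 30 degrees away.
L0 = np.array([[0.8, 0.0], [0.8, 0.0], [0.0, 0.8], [0.0, 0.8]])
t = np.deg2rad(30.0)
Ls = L0 @ np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
Lr, v = varimax(Ls)
assert np.abs(Lr).min(axis=1).max() < 0.02   # simple structure recovered
assert np.allclose(Lr @ Lr.T, Ls @ Ls.T)     # orthogonal rotation preserves LL'
```

Note that the rotated solution reproduces the same matrix of loading products as the input, since an orthogonal rotation changes only the reference axes, not the fitted correlations.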


Procrustes Rotation

Mosier (1939) was the first to attempt to find a transformation of an arbitrary factor solution that would lead to a least squares fit to a specified factor structure. The equations derived by Mosier are impossible to solve algebraically. Because of this restriction, Mosier suggested a method for obtaining an approximate least squares fit, which has since acquired the name "Procrustes."

Green (1952) solved this problem of orthogonal rotation to a least squares fit. Schönemann (1966) proposed a different approach which, in contrast to Green's, is applicable to matrices A and B that are of less than full column rank. Then, Browne (1967) developed an effective iterative procedure for "oblique" Procrustes rotations.

The methods mentioned previously all need two matrices A and B which are fully specified. Procedures to rotate a factor matrix, orthogonally or obliquely, to a least squares fit to a partially specified target matrix were presented by Browne (1972a, 1972b).

The most frequently used criterion in deriving solutions for Procrustes matching is the minimization of the raw squared discrepancies between AT and B, where T is a transformation which, when applied to the matrix A, will produce the greatest similarity between AT and B. Such least squares solutions are generally useful and have good statistical properties. However, Korth and Tucker (1976) argued that in dealing with the similarity of factor solutions, a proportional least squares criterion might be appropriate. This kind of criterion would involve normalization. To describe their idea, Korth and Tucker give us three factor patterns:

R ep ro d u ced with p erm ission o f the copyright ow ner. Further reproduction prohibited w ithout p erm ission .
33

Table 3

The pattern of three factor loadings matrices

        Pattern I          Pattern II         Pattern III
        Factors            Factors            Factors
Vars    1     2     3      1     2     3      1     2     3
1       0.8   0     0.2    0.6   0     0      0.4   0     0.1
2       0.8   0     0.2    0.7   0     0      0.4   0     0.1
3       0.6   0.6   0      0.4   0.3   0.2    0.3   0.3   0
4       0.6   0.6   0      0.3   0.3   0.3    0.3   0.3   0
5       0.4   0.6   0.4    0.7   0.8   0.2    0.2   0.3   0.2
6       0.4   0.6   0.4    0.6   0.8   0.1    0.2   0.3   0.2
7       0     0.2   0.8    0.2   0.5   0.5    0     0.1   0.4
8       0     0.2   0.8    0.1   0.4   0.4    0     0.1   0.4
Pattern II is the result of adding at most ±0.4 to each of the elements of Pattern I. According to Korth and Tucker (1976), this adjustment produced a strikingly different pattern. Pattern III contains coefficients that are half the values of the coefficients in Pattern I. The interpretation of this pattern would be very similar to that of Pattern I. But the sum of the squared differences for Patterns I and II is smaller than that for Patterns I and III (1.34 vs. 1.38).

As Korth and Tucker (1976) noted, a factor is likely to fluctuate in its importance

from situation to situation, and hence all o f the coefficients are expected to fluctuate

along with it. A criterion that captures similarity in this context is the congruence

coefficient (Tucker & Notel, 1951). A congruence coefficient is an unadjusted

correlation coefficient, i.e., a correlation based on raw scores instead o f deviations.

To calculate the congruence coefficients, all three factors in Pattern I and Pattern IH

have congruence coefficients o f 1.0, while the factors o f Patterns I and H have

congruence coefficients o f 0.992, 0.893 and 0.749 respectively. Korth and Tucker (1976)

considered the last three coefficients as substantial, but represent poor matching. They

R ep ro d u ced with p erm ission o f the copyright ow ner. Further reproduction prohibited w ithout p erm ission.
34

provided a Procrustes method based on maximizing the congruence coefficient between

factors.
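The contrast between the two criteria can be checked directly. The short NumPy sketch below is an illustration added here, not part of the original study; it reproduces the sums of squared differences and the perfect column-wise congruence of Patterns I and III:

```python
import numpy as np

# Patterns I and II transcribed from Table 3; Pattern III is Pattern I halved.
P1 = np.array([[0.8, 0.0, 0.2], [0.8, 0.0, 0.2], [0.6, 0.6, 0.0],
               [0.6, 0.6, 0.0], [0.4, 0.6, 0.4], [0.4, 0.6, 0.4],
               [0.0, 0.2, 0.8], [0.0, 0.2, 0.8]])
P2 = np.array([[0.6, 0.0, 0.0], [0.7, 0.0, 0.0], [0.4, 0.3, 0.2],
               [0.3, 0.3, 0.3], [0.7, 0.8, 0.2], [0.6, 0.8, 0.1],
               [0.2, 0.5, 0.5], [0.1, 0.4, 0.4]])
P3 = P1 / 2

def congruence(a, b):
    """Tucker's congruence coefficient: a correlation computed from raw
    values rather than deviations from the mean."""
    return (a @ b) / np.sqrt((a @ a) * (b @ b))

ssd_12 = np.sum((P1 - P2) ** 2)   # 1.34: least squares prefers Pattern II
ssd_13 = np.sum((P1 - P3) ** 2)   # 1.38
phi_13 = [congruence(P1[:, k], P3[:, k]) for k in range(3)]   # all 1.0
```

Because every column of Pattern III is exactly proportional to the corresponding column of Pattern I, all three congruence coefficients equal 1.0 even though the least squares criterion judges Pattern II the closer match.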

This Procrustes method was originated by Mosier (1939), who obtained an approximate solution based on the least squares criterion of minimizing

tr(E'E)    2-41

where E = B - AT and diag(T'T) = I.

Korth and Tucker (1976) proved that Mosier's (1939) approximate solution to this problem is in fact an exact solution to the problem of maximizing the congruence coefficient. The procedure can be separated into two parts, minimization and normalization.

First, T* is found as

T* = (A'A)^(-1) A'B    2-42

Then, T* is normalized to T so that diag(T'T) = I.
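The two-part solution can be sketched in a few lines of NumPy (the function name and the 6 x 2 example are illustrative assumptions, not from the original):

```python
import numpy as np

def congruence_procrustes(A, B):
    """Mosier's (1939) two-part solution: unconstrained least squares for
    T*, then column normalization so that diag(T'T) = I."""
    T_star = np.linalg.solve(A.T @ A, A.T @ B)    # T* = (A'A)^(-1) A'B, 2-42
    return T_star / np.linalg.norm(T_star, axis=0)

# Illustrative use: B is a hypothetical 6 x 2 target and A is B expressed
# in an orthogonally rotated basis, so a perfect match is recoverable.
rng = np.random.default_rng(0)
B = np.abs(rng.normal(size=(6, 2)))
A = B @ np.array([[0.8, 0.6], [-0.6, 0.8]])
T = congruence_procrustes(A, B)   # columns of A @ T line up with B
```

Dividing each column of T* by its length enforces exactly the constraint diag(T'T) = I; by Korth and Tucker's result, the columns of AT then attain maximal congruence with the columns of B.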

The congruence coefficient for column k (factor k), written in matrix notation with a_k and b_k denoting the kth columns of AT and B, is

φ_k = a_k'b_k / [(a_k'a_k)(b_k'b_k)]^(1/2)    2-43

The formula 2-43 for this coefficient φ_k can also be written as

φ_k = Σ_(j=1..J) f_jk(s) f_jk(t) / [(Σ_(j=1..J) f_jk(s)^2)(Σ_(j=1..J) f_jk(t)^2)]^(1/2)    2-44

where f_jk(s) indicates that the factor loading comes from the sample, and f_jk(t) indicates that the loading comes from the population and is used as the target.

CHAPTER III

METHODOLOGY

The primary purpose of factor analysis is to explain the matrix of covariances or correlations by a small number of common factors. Tucker et al. (1969) developed a procedure for generating simulated correlation matrices that tend to exhibit a fairly strong simple structure. By specifying a major factor domain, the number of factors in the major domain, and the number of variables, their procedure can easily generate a correlation matrix.

Although Tucker et al.'s (1969) original aim was to use this procedure to study the effectiveness of factor analytic methods, the correlation matrices it produces can also be used as the population correlation matrices in this study. It permits us to control the number of variables, the number of common factors, and the level of communality.

Procedure for Generating Population Correlation Matrices

In this study, the researcher generated 180 different combinations of measured variables, common factors, and levels of communality. Tucker et al.'s (1969) procedure was used with a SAS/IML program to produce these 180 combinations. (Tucker et al.'s procedure can be used only when the number of factors is two or more; hence another procedure was used when the number of factors was equal to one.)


In developing their procedure for simulating correlation matrices, Tucker et al. considered three different types of factors: Type 1 is major factors, Type 2 is minor factors, and Type 3 is unique factors. Tucker et al. used the subscript s to represent the type of factor.

The number of factors of each type is designated by M_s and the individual factors of each type by the subscript m_s, such that:

m_s = 1, 2, 3, . . . M_s

Variables are designated by the subscript j or j', with the number of variables being J; thus:

j or j' = 1, 2, 3, . . . J

For each type of factor, there is a matrix A_s with entries of "actual input factor loadings". A_s is a matrix of order J x M_s (i.e., A_s has a row for each variable and a column for each factor in section s of factors).

Then, Tucker et al. defined a matrix A_s* for each matrix A_s by adjusting the rows of A_s to unit-length vectors. And a matrix P_s is defined from each A_s* by:

P_s = A_s* A_s*'    3-1

The simulated correlation matrix R is defined by:

R = B1 P1 B1 + B2 P2 B2 + B3 P3 B3    3-2

where B1, B2, and B3 (in general B_s) are diagonal matrices with entries b_1j, b_2j, and b_3j (in general b_sj). These entries b_sj were restricted to being real, positive numbers such that:

b_1j^2 + b_2j^2 + b_3j^2 = 1    3-3

The matrix A_s of actual input factor loadings may be defined in terms of the matrices B_s and A_s* by:

A_s = B_s A_s*    3-4

From 3-1, 3-2, and 3-4:

R = A1 A1' + A2 A2' + A3 A3'    3-5

Coefficients in the B_s matrices are important parameters of the simulation model. When B2 is zero, the simulation model is identical to the formal factor analysis model, and B1^2 contains the communalities while B3^2 contains the uniquenesses of the variables.
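The structure of equations 3-2 through 3-5 can be sketched in NumPy. In the illustration below the loading matrices and b values are arbitrary assumptions; the point is that the constraint 3-3 alone guarantees a unit diagonal for R:

```python
import numpy as np

rng = np.random.default_rng(1)

def unit_rows(M):
    """Rescale each row of M to a unit-length vector (the A_s* matrices)."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)

# Hypothetical unit-row input loading matrices for J = 4 variables:
A1s = unit_rows(np.abs(rng.normal(size=(4, 2))))  # A1*: 2 major factors
A2s = unit_rows(np.abs(rng.normal(size=(4, 3))))  # A2*: 3 minor factors
A3s = np.eye(4)                                   # A3*: 4 unique factors

# Diagonal scaling matrices obeying b1j^2 + b2j^2 + b3j^2 = 1 (equation 3-3)
b1, b2 = np.full(4, 0.7), np.full(4, 0.3)
b3 = np.sqrt(1.0 - b1**2 - b2**2)
B1, B2, B3 = np.diag(b1), np.diag(b2), np.diag(b3)

# As = Bs As* (3-4), so R = A1 A1' + A2 A2' + A3 A3' (3-5)
R = sum(B @ As @ As.T @ B for B, As in ((B1, A1s), (B2, A2s), (B3, A3s)))
```

Since each P_s = A_s* A_s*' has a unit diagonal, the jth diagonal entry of R is b_1j^2 + b_2j^2 + b_3j^2 = 1, as a correlation matrix requires.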

The central feature of the simulation model is the development of the matrices A_s of "actual input factor loadings". The conceptual input loading matrices for the minor and unique factors were set to zero, which represents the only sensible idea that the designer of the measures could have for these factors.

Comments about the matrix A1 for factors in the major factor domain, and the procedure used in Tucker et al. (1969), are given in the following paragraphs. One thing must be mentioned: under this procedure, the number of variables per factor is not fixed. Therefore, with the same ratio of variables to factors, the combinations of the number of variables and the number of factors could be very different. For example, 18 variables and 3 factors produce a p/f ratio equal to 6; under this procedure, two different combinations could both arise randomly in this study: (1) 6 variables per factor, or (2) 9 variables for one factor, 8 variables for another, and 1 variable for the third.

Conceptual Input Factor Loadings for Factors in the Major Factor Domain

The procedure adopted for development of the matrices A1 was considered as representing the case in which only vague ideas exist about the major factor domain. Each variable was developed independently from every other variable.

First, relative conceptual input loadings were developed for the variable; these constituted a row vector, which was then adjusted to unit length by a multiplying factor. The relative conceptual loadings were developed by the following procedure.

For an f-factor major domain, the sum of the loadings for each variable was controlled at (f - 1). The first loading was an integer in the range 0 through (f - 1) (with equal probability) on a randomly chosen factor. The second loading was in the range 0 through (f - 1) - a1 (where a1 is the value of the first loading) on one of the remaining factors, and so on. It is to be noted that this procedure tended to produce a fairly strong simple structure for the conceptual input factors.
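One reading of this allocation procedure can be sketched in NumPy. How the final remainder is assigned is not fully pinned down by the text, so giving it to the last factor visited is an assumption of this sketch:

```python
import numpy as np

def relative_conceptual_row(f, rng):
    """One row of relative conceptual loadings for an f-factor major domain.
    Loadings are non-negative integers summing to f - 1: factors are visited
    in random order, each draws uniformly from 0 .. remaining, and the last
    factor visited takes whatever remains (an assumption of this sketch)."""
    row = np.zeros(f, dtype=int)
    remaining = f - 1
    order = rng.permutation(f)
    for m in order[:-1]:
        row[m] = rng.integers(0, remaining + 1)
        remaining -= row[m]
    row[order[-1]] = remaining
    return row

rng = np.random.default_rng(2)
A_rel = np.vstack([relative_conceptual_row(4, rng) for _ in range(12)])

# Each row is then rescaled to unit length, giving the conceptual input
# loadings (compare Table 5 below):
A_conceptual = A_rel / np.linalg.norm(A_rel, axis=1, keepdims=True)
```

For a 4-factor domain each generated row sums to 3, exactly as in every row of Table 4.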

The matrix in Table 4 is a relative conceptual loadings matrix for 12 variables and 4 major factors; it shows what results from the previous procedure.


Table 4
The relative conceptual loadings matrix

0 2 1 0
0 1 1 1
1 0 2 0
0 1 1 1
1 1 1 0
0 0 2 1
3 0 0 0
0 0 0 3
0 0 3 0
2 0 1 0
1 1 1 0
0 0 0 3
The following matrix, Ã1, gives the conceptual input factor loadings of the major factor domain generated from the relative conceptual loadings. Ã1 is obtained by adjusting each row vector of the relative conceptual loadings matrix to unit length.

Table 5
The conceptual input factor loadings matrix

0          0.8944272  0.4472136  0
0          0.5773503  0.5773503  0.5773503
0.4472136  0          0.8944272  0
0          0.5773503  0.5773503  0.5773503
0.5773503  0.5773503  0.5773503  0
0          0          0.8944272  0.4472136
1          0          0          0
0          0          0          1
0          0          1          0
0.8944272  0          0.4472136  0
0.5773503  0.5773503  0.5773503  0
0          0          0          1
After Ã1, the conceptual input factor loadings matrix of the major factor domain, is generated, a three-step procedure is utilized to develop the matrix A1 of actual input factor loadings for the major factor domain from Ã1.


First step:

In this step, the conceptual input factor loadings are combined with random normal deviates to represent discrepancies that might occur in the actual construction of measuring instruments. (ã1)_jm1 is defined as the entry in row j and column m1 of the conceptual input matrix Ã1, and x_jm1 is a random normal deviate (μ = 0, σ = 1) drawn independently for each (ã1)_jm1. (y1)_jm1 is the output of the first step and is defined by:

(y1)_jm1 = c_m1 (ã1)_jm1 + d_1j x_jm1 (1 - c_m1^2)^(1/2)    3-6

where c_m1 is a constant for each factor m1 and d_1j is a constant for each variable j. The constant d_1j is used to normalize each row of x to a unit-length vector and is defined by:

d_1j = (Σ_m1 x_jm1^2)^(-1/2)    3-7

The constants c_m1 are conceptualized as representing the general control an experimenter has over the loadings of actual variables on the factors. Values of c used in Tucker et al.'s study were 0.7, 0.8, or 0.9, chosen at random with equal probability for each factor. Table 6 is the matrix of c used in this example.

Table 6
The matrix of c

0.9  0    0    0
0    0.9  0    0
0    0    0.8  0
0    0    0    0.9


Using this matrix of c and formula 3-6, the matrix y1 is generated as Table 7.

Table 7
The matrix of y1

 0.26301    0.6588971  0.2268473  0.3007285
 0.1291547  0.8099287  0.1681853  0.311018
 0.3145179  0.0670379  1.2948148  0.025822
 0.3724287  0.6808306  0.5192825  0.3660985
 0.1923219  0.3392633  0.6429454  0.1817946
-0.019473  -0.154742   1.2176335  0.5831157
 1.0651629 -0.329524   0.0783715  0.225596
 0.1908908  0.2127205  0.4266397  0.7893482
 0.1546132  0.2256259  0.9980241 -0.307395
 0.9791882  0.1543488 -0.124923   0.1134055
 0.3173704  0.5050336  0.2188414  0.3430885
 0.2254766 -0.148122   0.238238   1.1954063
Second step:

The second step uses a "skewing function" introduced to reduce and limit the negativity of the factor loadings (y1)_jm1. This function produces coefficients (z1)_jm1 as follows:

(z1)_jm1 = (1 + k)(y1)_jm1 [(y1)_jm1 + |(y1)_jm1| + k] / {(2 + k)[|(y1)_jm1| + k]}    3-8

where k is a parameter with a value to be chosen within the range of 0 to infinity, inclusive. Tucker et al. (1969) used a value of k = 0.2, which was also used in this study. The matrix of k is presented in Table 8.

Using formula 3-8, the matrix z1 is generated in Table 9. Then each row vector of (z1)_jm1 is adjusted to a unit vector by:

(a1*)_jm1 = g_1j (z1)_jm1    3-9

where g_1j = (Σ_m1 (z1)_jm1^2)^(-1/2)

Table 8

The matrix of k

0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2

Using formula 3-9, the matrix A1* is generated in Table 10.

Table 9
The matrix of z1

0.2249515 0.6351085 0.1894936 0.2625494


0.0980906 0.7960714 0.1336427 0.272897
0.2764246 0.0457458 1.3180302 0.0156953
0.3353101 0.6584034 0.4877324 0.3288306
0.1563278 0.3014737 0.6181873 0.1463769
-0.009679 -0.047587 1.2346271 0.5548961
1.0701504 -0.067888 0.0547832 0.1882788
0.1549702 0.1758321 0.3911519 0.7740693
0.1211048 0.1883079 0.9978744 -0.066091
0.9776171 0.1208622 -0.041942 0.0842407
0.2793026 0.4728012 0.1817369 0.3053617
0.1881629 -0.046417 0.2005912 1.2106245


Table 10
The matrix of A1*

0.3009251 0.8496057 0.2534919 0.351221


0.114362 0.9281244 0.1558115 0.3181654
0.2051277 0.0339468 0.978077 0.0116471
0.3550412 0.6971467 0.5164327 0.3481804
0.217016 0.4185091 0.8581743 0.203202
-0.007146 -0.035133 0.9115247 0.4096796
0.9817147 -0.062278 0.050256 0.1727197
0.1724965 0.1957178 0.4353891 0.8616124
0.1181723 0.183748 0.9737109 -0.064491
0.9879465 0.1221392 -0.042385 0.0851308
0.4270138 0.7228455 0.2778498 0.4668544
0.1514585 -0.037363 0.1614624 0.9744713
Third step:

The final step in developing the matrix A1 of actual input factor loadings for the major domain is to premultiply the matrix A1* by the matrix B1:

A1 = B1 A1*    3-10

In this example, the matrix B1 is shown in Table 11 and the matrix A1 of actual input factor loadings for the major domain is shown in Table 12.

Table 11

The matrix of B1

0.447 0 0 0 0 0 0 0 0 0 0 0
0 0.632 0 0 0 0 0 0 0 0 0 0
0 0 0.447 0 0 0 0 0 0 0 0 0
0 0 0 0.447 0 0 0 0 0 0 0 0
0 0 0 0 0.632 0 0 0 0 0 0 0
0 0 0 0 0 0.447 0 0 0 0 0 0
0 0 0 0 0 0 0.447 0 0 0 0 0
0 0 0 0 0 0 0 0.547 0 0 0 0
0 0 0 0 0 0 0 0 0.547 0 0 0
0 0 0 0 0 0 0 0 0 0.632 0 0
0 0 0 0 0 0 0 0 0 0 0.547 0
0 0 0 0 0 0 0 0 0 0 0 0.632


Table 12

The matrix of A1

0.1345778 0.3799552 0.113365 0.1570708


0.0723288 0.5869974 0.0985438 0.2012255
0.0917359 0.0151815 0.4374093 0.0052087
0.1587792 0.3117735 0.2309557 0.155711
0.1372529 0.2646884 0.5427571 0.1285162
-0.003196 -0.015712 0.4076462 0.1832143
0.4390362 -0.027851 0.0224752 0.0772426
0.0944802 0.107199 0.2384724 0.4719245
0.0647256 0.1006429 0.5333234 -0.035323
0.6248322 0.0772476 -0.026807 0.0538414
0.2338851 0.3959188 0.1521846 0.2557067
0.0957908 -0.02363 0.1021178 0.6163097
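The three steps can be collected into a single sketch. The function names are assumptions added here; the spot-check values come from Tables 7 and 9 above:

```python
import numpy as np

def skew(y, k=0.2):
    """Skewing function 3-8: limits how negative a loading can become
    while leaving large positive loadings only mildly changed."""
    return (1 + k) * y * (y + np.abs(y) + k) / ((2 + k) * (np.abs(y) + k))

def actual_input_loadings(A_tilde, c, B1, k=0.2, rng=None):
    """Steps 1-3 above: add noise (3-6, 3-7), skew (3-8), renormalize the
    rows (3-9), and scale by B1 (3-10).  c holds one constant per factor."""
    rng = rng if rng is not None else np.random.default_rng()
    x = rng.normal(size=A_tilde.shape)
    d = 1.0 / np.linalg.norm(x, axis=1, keepdims=True)          # 3-7
    y = A_tilde * c + d * x * np.sqrt(1.0 - c**2)               # 3-6
    z = skew(y, k)                                              # 3-8
    A1_star = z / np.linalg.norm(z, axis=1, keepdims=True)      # 3-9
    return B1 @ A1_star                                         # 3-10

# Spot checks against Tables 7 and 9: skew(0.26301) = 0.2249515 and
# skew(-0.019473) = -0.009679, to the printed precision.
```

Applying formula 3-8 to the entries of Table 7 reproduces Table 9 exactly, which is a convenient check that the reconstructed skewing function is the one the original computations used.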

Procedure of Generating Sample Correlation Matrices

There are two popular methods for generating sample correlation matrices. One was developed by Kaiser and Dickman (1962) and the other was proposed by Wijsman (1959). Kaiser and Dickman's (1962) procedure for producing sample correlation matrices from a specified correlation matrix is based on the fundamental postulate of component analysis,

Z = XF    3-11

where F is a p x p principal components matrix of the specified correlation matrix, X is an N x p matrix whose elements are randomly generated from a normal distribution with mean 0 and variance 1, and Z is the score matrix.

However, Hong (1998) stated that Wijsman's (1959) procedure can reduce computing costs by generating sample correlation matrices without producing the N x p score matrix X. Therefore, Wijsman's method was used in the procedure of generating the sample correlation matrices from a given population correlation matrix.

Wijsman's procedure, as reorganized by Hong (1998), is described as follows:

First, given a population correlation matrix P, a factoring can be obtained by:

P = FF'    3-12

Then a matrix can be generated by:

A = FGG'F'    3-13

where F is a p x p factor matrix of P and G is a randomly generated lower triangular p x p matrix. The below-diagonal entries of G are random normal deviates, drawn from a normal distribution with mean 0 and variance 1. The diagonal element in column j is the positive square root of a random chi-square value with degrees of freedom n - j, where n is the sample size.

Then, using the matrix A, the sample covariance matrix C can be obtained as:

C = A / n    3-14

The covariance matrix can be rescaled to a sample correlation matrix R using:

R = D^(-1/2) C D^(-1/2)    3-15

where D is a diagonal matrix whose elements are the corresponding diagonal entries (i.e., variances) in the covariance matrix C.
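Wijsman's recipe, as laid out in equations 3-12 through 3-15, can be sketched in NumPy. The Cholesky factor serves as one valid factoring F, and the function name is an assumption of this sketch:

```python
import numpy as np

def wijsman_sample_corr(P, n, rng):
    """One sample correlation matrix of sample size n from the population
    correlation matrix P, following Wijsman (1959) as reorganized by
    Hong (1998); no N x p score matrix is ever formed."""
    p = P.shape[0]
    F = np.linalg.cholesky(P)                 # one factoring with P = FF'
    G = np.zeros((p, p))                      # random lower triangular matrix
    low = np.tril_indices(p, k=-1)
    G[low] = rng.normal(size=len(low[0]))     # below-diagonal: N(0, 1)
    df = n - np.arange(1, p + 1)              # column j uses df = n - j
    G[np.diag_indices(p)] = np.sqrt(rng.chisquare(df))
    A = F @ G @ G.T @ F.T                     # 3-13
    C = A / n                                 # 3-14: sample covariance
    d = np.sqrt(np.diag(C))
    return C / np.outer(d, d)                 # 3-15: rescale to correlation
```

For large n the resulting sample correlation matrices cluster around P, while each draw costs only a p x p triangular matrix of random numbers rather than an N x p score matrix.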


Analysis Procedure

This is a Monte Carlo study investigating the relationship among the number of measured variables, the number of factors, the level of communality, and sample size. The following steps are used to investigate the relationship:

Step (1) Generate 100 population correlation matrices with a given number of variables, number of common factors, and level of communality.

Step (2) Generate 100 sample correlation matrices for each of the 100 population correlation matrices, moving from small sample sizes to larger sample sizes.

Step (3) Factor analyze the sample correlation matrices and the related population correlation matrix, and calculate the coefficient of congruence assessing the correspondence between the 100 sample solutions and their corresponding population solution. From this it can be seen how sample size affects the coefficient of congruence.

Step (4) Using two criteria, give the minimum necessary sample size for each condition.

Each step is now discussed in more detail.

Decide the Conditions of Population Correlation Matrices

In order to investigate the relationship among the number of common factors, the number of measured variables, the level of communality, and sample size, we need population correlation matrices that vary with respect to these characteristics. The conditions of the population correlation matrices used in this study are listed below.

1) Three levels of communality: high (0.6, 0.7, or 0.8); wide (0.2, 0.3, 0.4, 0.5, 0.6, 0.7, or 0.8); low (0.2, 0.3, or 0.4).

2) Six numbers of common factors: ranging from 1 to 6.

3) Ten ratios of p/f (p is the number of measured variables, f is the number of common factors): ranging from 3 to 12.

For each level of communality, each number of common factors was used to generate population correlation matrices with 10 different numbers of variables, each corresponding to a particular p/f ratio. For example, if the level of communality is high and the number of common factors is 5, then we would generate matrices with each of 10 different numbers of variables ranging from 15 to 60 in increments of 5. Hence, we need to consider 3 (levels of communality) times 6 (numbers of common factors, one to six) times 10 (p/f ratio values, 3 to 12), or 180 different conditions.

Decide Size of Sample

Tucker et al. (1969) conducted a Monte Carlo study that provided an appropriate procedure for generating a population correlation matrix with a controlled level of communality, number of common factors, and number of measured variables. Tucker et al. generated 18 matrices in their paper and used them to study the effectiveness of factor analytic methods.

MacCallum et al. (1999), building on Tucker et al.'s (1969) study, generated 9 population correlation matrices for different situations (3 levels of communality: high, wide, and low; and three p/f ratios: 10/3, 20/3, and 20/7).


For each of these nine population correlation matrices, MacCallum et al. generated sample correlation matrices at four levels of sample size (60, 100, 200, 400). One hundred sample correlation matrices were produced at each level of sample size for each of the nine population correlation matrices, for a total of 3,600 sample correlation matrices.

MacCallum et al.'s study provides a general conception of the effect of sample size. However, it was limited in that only one population correlation matrix was used for each situation. To investigate the more general effect of sample size, it is necessary to consider several population correlation matrices in each situation and a larger variety of sample sizes for each population correlation matrix. To accomplish this goal, 100 population correlation matrices were generated for each of the 180 situations using Tucker et al.'s (1969) procedure. Hence, 18,000 population correlation matrices are considered in this study.

Then, sample correlation matrices are generated from each of these 18,000 population correlation matrices, using a small sample size as a starting point. The first sample size used in the procedure depends on the number of variables. The sample size is then increased as follows:

(1) When the sample size is less than 30, it increases by 1.

(2) When the sample size is less than 100, it increases by 5.

(3) When the sample size is between 100 and 300, it increases by 10.

(4) When the sample size is between 300 and 500, it increases by 50.

(5) When the sample size is greater than 500, it increases by 100.
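Rules (1) through (5) can be transcribed directly; the handling of the exact boundary values (e.g., n = 100 or n = 300) is one reading of "between":

```python
def next_sample_size(n):
    """Next sample size in the search, following rules (1)-(5) above."""
    if n < 30:
        return n + 1
    if n < 100:
        return n + 5
    if n < 300:
        return n + 10
    if n < 500:
        return n + 50
    return n + 100
```

Starting from a small n and applying this repeatedly gives the sequence of sample sizes at which each condition is evaluated.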


This procedure is stopped when the results for the population and sample correlation matrices match both of two criteria that use the coefficient of congruence to express the similarity of two matrices; these criteria will be described in detail later. If the two criteria cannot be matched when the sample size exceeds 5,000, the procedure is also stopped. In addition, we assume that the population distributions were multivariate normal.

As mentioned earlier, 100 population correlation matrices and 10,000 sample correlation matrices were generated at each sample size until the sample size made the coefficients of congruence match both criteria. Each of these sample matrices was analyzed using maximum likelihood factor analysis. The number of factors retained was set equal to the known number of factors in the population (i.e., from one to six). Because the sampling procedure is based on the multivariate normality assumption, maximum likelihood factor analysis, which assumes that the measured variables follow a multivariate normal distribution, was the method used in this study.

Rotation Method

In order to compare the solution obtained from each sample correlation matrix with the solution from the corresponding population correlation matrix, and because all sample and population solutions can be freely rotated, we need to consider the issue of rotation (MacCallum et al., 1999). MacCallum et al. (1999) used direct quartimin rotation, an oblique analytical rotation method, to rotate their population solutions. They reasoned that the relationships among the factors would be unknown in practice, so a less restrictive oblique rotation would be more appropriate. However, the population factors were orthogonal in Tucker et al.'s (1969) design, and, as Johnson (1998) said:

  In the initial development of a FA model, we assume that there exists an uncorrelated set of underlying factors that drive or control the variables being measured. Allowing oblique rotations seems to say that we did not really believe the assumptions of the initial model. What would happen if we eliminated the assumption of uncorrelated factors from the factor analysis assumptions initially?

So, in this study, the Varimax rotation method (Kaiser, 1958), an orthogonal rotation method, was used to rotate both the population and sample solutions.
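The study's rotations were presumably computed in SAS; purely as an illustration, Varimax can be implemented compactly in NumPy with the common SVD-based iteration (the tolerance and iteration cap below are arbitrary choices, not from the original):

```python
import numpy as np

def varimax(L, tol=1e-8, max_iter=500):
    """Orthogonal Varimax rotation (Kaiser, 1958) of a loading matrix L,
    using the standard SVD-based iteration."""
    p, f = L.shape
    T = np.eye(f)
    d_old = 0.0
    for _ in range(max_iter):
        Lr = L @ T                     # current rotated loadings
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - Lr @ np.diag(np.sum(Lr**2, axis=0)) / p))
        T = u @ vt                     # best orthogonal update
        d = s.sum()
        if d_old != 0.0 and d < d_old * (1 + tol):
            break
        d_old = d
    return L @ T
```

Because T is orthogonal, the rotation preserves each variable's communality while concentrating the variance of the squared loadings within columns, which is what the Varimax criterion rewards.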

Coefficient of Congruence

For each of these rotated sample solutions, we calculated a coefficient of congruence between each factor from the sample solution and the corresponding factor from the population solution. The formula for this coefficient is

φ_k = Σ_(j=1..J) f_jk(s) f_jk(t) / [(Σ_(j=1..J) f_jk(s)^2)(Σ_(j=1..J) f_jk(t)^2)]^(1/2)

where f_jk(t) is the population factor loading for variable j on factor k and f_jk(s) is the corresponding sample factor loading.

To assess the degree of congruence across f factors (f is the number of common factors used when generating the population correlation matrix), we compute the mean value of φ_k across the f factors.


This value was designated K:

K = (1/f) Σ_(k=1..f) φ_k

where K is the average value of φ_k. However, f! different Ks can be obtained for an f-factor condition by rearranging the order of the f columns. In this study, the maximum K value among these f! Ks is used to represent the closest match between a sample correlation matrix's rotated MLFA solution and the corresponding population correlation matrix's rotated MLFA solution. Therefore, 100 Ks are obtained from a population correlation matrix and its related sample correlation matrices. These 100 Ks are sorted by their values as K_(1) <= K_(2) <= . . . <= K_(100). Then the value of (K_(5) + K_(6))/2 is used as the lower boundary of the 95% CI for this population correlation matrix at this particular sample size level. This 95% CI lower boundary is denoted K_95. Since 100 population correlation matrices are generated in each of the 180 conditions at a specific sample size, 100 K_95s are obtained by this procedure.
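The K statistic and the 95% lower boundary just described can be sketched as follows (names are illustrative; the f! search is brute force, which is manageable for f up to 6, and sign reflections of factors are not considered here):

```python
import numpy as np
from itertools import permutations

def K_statistic(sample_loadings, pop_loadings):
    """Mean congruence across the f factors, maximized over the f! column
    pairings of the sample solution with the population solution."""
    f = pop_loadings.shape[1]
    def phi(a, b):
        return (a @ b) / np.sqrt((a @ a) * (b @ b))
    return max(
        np.mean([phi(sample_loadings[:, perm[k]], pop_loadings[:, k])
                 for k in range(f)])
        for perm in permutations(range(f)))

def ci95_lower(Ks):
    """95% lower boundary of 100 K values: (K_(5) + K_(6)) / 2."""
    Ks = np.sort(np.asarray(Ks, dtype=float))
    return (Ks[4] + Ks[5]) / 2    # zero-based positions of K_(5) and K_(6)
```

Maximizing over column orders guards against the arbitrary ordering of rotated factors: a sample solution whose factors merely come out in a different order than the population's still receives K = 1.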

MacCallum et al. (1999) provided an interpretation of Tucker's coefficient of congruence: 0.98 to 1.00 = excellent, 0.92 to 0.98 = good, 0.82 to 0.92 = borderline, 0.68 to 0.82 = poor, and below 0.68 = terrible.

In this study, R_92 is defined as the percentage of K_95s larger than 0.92 among the 100 K_95s from one condition at a specific sample size, and R_98 is defined as the percentage of K_95s larger than 0.98 among the 100 K_95s from one condition at a specific sample size. Following Tucker's interpretation, R_92 is called the good-level criterion and R_98 the excellent-level criterion.

The following two situations constitute a match of the good-level criterion:

(1) Three successive sample sizes' R_92s are equal to or greater than 0.95.

(2) Two successive sample sizes' R_92s are equal to or greater than 0.95, the next sample size's R_92 is less than 0.95, and the next two successive sample sizes' R_92s are equal to or greater than 0.95.

The following two situations constitute a match of the excellent-level criterion:

(1) Three successive sample sizes' R_98s are equal to or greater than 0.95.

(2) Two successive sample sizes' R_98s are equal to or greater than 0.95, the next sample size's R_98 is less than 0.95, and the next two successive sample sizes' R_98s are equal to or greater than 0.95.
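These stopping rules can be transcribed literally as a predicate over the sequence of R values observed at successive sample sizes (an illustrative sketch):

```python
def matches_criterion(rates, threshold=0.95):
    """True when the sequence of R values (R_92 or R_98, one per successive
    sample size) satisfies either stopping situation above: three successive
    values >= threshold, or two successive values >= threshold, one below,
    followed by two more successive values >= threshold."""
    ok = [r >= threshold for r in rates]
    for i in range(len(ok) - 2):          # situation (1)
        if ok[i] and ok[i + 1] and ok[i + 2]:
            return True
    for i in range(len(ok) - 4):          # situation (2)
        if ok[i] and ok[i + 1] and not ok[i + 2] and ok[i + 3] and ok[i + 4]:
            return True
    return False
```

The second situation tolerates a single dip below 0.95 between two stable stretches, so an isolated sampling fluctuation does not force the search to continue indefinitely.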

Using these two matching situations with the two criteria, this study provides two minimum necessary sample sizes for each of the 180 conditions and uses these minimum necessary sample sizes as an index for discussing the relationship among the number of factors, the number of variables, the ratio of the number of variables to the number of factors, and the level of communality.

CHAPTER IV

RESULTS

In this study, two minimum necessary sample sizes for each of 180 different conditions were obtained by using two values of the coefficient of congruence as criteria. In order to determine these two minimum necessary sample sizes for each condition, various sample sizes were used in the calculations for each condition. In all, 371,600 population correlation matrices and 37,160,000 sample correlation matrices (100 sample correlation matrices for each population correlation matrix) were generated in this study.

The results of this study are organized in Table 13, which shows the minimum necessary sample sizes for each set of conditions under the two criteria; those obtained for 0.92 are considered to reflect good matching (good-level criterion) and those obtained for 0.98 are considered to reflect excellent matching (excellent-level criterion). When the number of factors is equal to one, the coefficients of congruence used to decide these minimum necessary sample sizes were calculated from unrotated maximum likelihood factor loadings. When the number of factors is equal to or greater than 2, the coefficients of congruence were calculated from rotated maximum likelihood factor loadings. There is no minimum necessary sample size suggested in Table 13 for the excellent-level criterion when the factor number is equal to 2, because that criterion cannot be matched even when the sample sizes are greater than ten thousand.

In order to facilitate interpretation, two kinds of figures, factor-orientated and communality-orientated, were generated from Table 13. The red numbers in Table 13 indicate minimum necessary sample sizes that are not true minimums; they could be smaller. However, the sample correlation matrix generating procedure requires that the sample size be no less than the number of variables plus 3, so those sample sizes were used as the suggested minimum necessary sample sizes in this study. The notation F1, F2, . . . , F6 is used to represent factor numbers of 1, 2, . . . , 6, respectively.


Table 13

The minimum necessary sample sizes of each condition under two criteria

Level of communality is HIGH


Excellent-level criterion (0.98)            Good-level criterion (0.92)
p/f ratio  F1   F3   F4   F5   F6   p/f ratio  F1   F2   F3   F4   F5   F6
3 32 600 800 1000 1200 3 13 90 170 260 300 350
4 27 260 350 450 500 4 13 75 120 170 220 170
5 21 130 260 260 300 5 11 45 65 90 130 110
6 19 95 160 200 160 6 12 40 50 55 70 70
7 18 75 110 130 110 7 11 40 40 55 55 55
8 18 75 90 75 70 8 11 40 30 40 45 55
9 17 60 65 80 80 9 12 35 30 40 50 60
10 15 60 70 65 65 10 13 35 35 45 55 65
11 16 55 60 60 75 11 14 35 40 55 60 70
12 15 55 55 65 75 12 15 35 40 55 65 75
Level of communality is WIDE
Excellent-level criterion (0.98)            Good-level criterion (0.92)
p/f ratio  F1   F3   F4   F5   F6   p/f ratio  F1   F2   F3   F4   F5   F6
3 110 1300 1400 1400 1600 3 35 160 450 500 700 600
4 65 350 700 900 900 4 25 90 130 240 320 300
5 50 200 300 300 350 5 30 60 80 110 140 130
6 50 140 180 200 180 6 20 55 65 75 70 100
7 40 105 160 150 130 7 20 50 55 75 65 60
8 36 90 90 130 110 8 15 45 45 50 55 60
9 33 70 85 90 100 9 15 40 40 50 50 60
10 32 75 80 85 95 10 14 35 35 45 55 65
11 36 65 75 85 95 11 14 35 40 50 60 70
12 30 70 75 85 95 12 15 35 40 50 65 75
Level of communality is LOW
Excellent-level criterion (0.98)            Good-level criterion (0.92)
p/f ratio  F1   F3   F4   F5   F6   p/f ratio  F1   F2   F3   F4   F5   F6
3 150 1700 2600 3000 3800 3 45 600 1200 1200 1300 1200
4 95 450 800 1000 1400 4 35 120 230 250 400 400
5 75 220 370 430 400 5 35 75 85 170 180 160
6 70 160 190 200 260 6 30 60 85 130 120 120
7 60 100 180 170 140 7 30 60 65 75 85 80
8 55 100 100 130 130 8 23 60 60 75 80 75
9 50 85 110 100 120 9 22 50 60 60 65 70
10 50 85 90 110 110 10 20 45 40 60 60 70
11 50 75 95 95 105 11 20 45 45 50 60 70
12 50 75 85 100 110 12 20 40 40 55 65 75


Factor-Oriented Section

The information in Table 13 is reorganized in Figures 1 through 6. Figures 1a and 1b show the minimum necessary sample sizes for one factor with the ratios of variables to factors ranging from 3 to 12 for the excellent-level criterion (0.98) and the good-level criterion (0.92). The minimum necessary sample sizes decrease as the ratios of variables to factors increase and as the levels of communality become higher. Figures 1a and 1b show that both the p/f ratio and the level of communality affect the minimum sample sizes.

In Figure 1b, the curve for the high level of communality rises slightly after the p/f ratio becomes larger than 7. In this study, the sample size (N) and the number of variables (p) were restricted by the rule N - p > 3, so in some conditions the minimum sample size will increase as the p/f ratio increases. All of the slightly rising curves in this study occur for this reason. Only one of the rising curves occurs under the excellent-level criterion; all the others occur under the good-level criterion.

[Chart omitted; series f1low98, f1wide98, f1high98 plotted against the p/f ratio. The plotted values appear in Table 13.]

Figure 1a. The minimum necessary sample sizes for one factor with the ratios of variables to factors ranging from 3 to 12 for the excellent-level criterion (0.98).

[Chart omitted; series f1low92, f1wide92, f1high92 plotted against the p/f ratio.]

Figure 1b. The minimum necessary sample sizes for one factor with the ratios of variables to factors ranging from 3 to 12 for the good-level criterion (0.92).


Figure 2 shows the minimum necessary sample sizes for two factors with p/f ratios from 3 to 12 using the good-level criterion (0.92). When the number of factors is 2, the excellent-level criterion could not be matched even with sample sizes larger than ten thousand, so there is no figure for the excellent-level criterion. As in Figures 1a and 1b, the minimum sample sizes for two factors decrease as the p/f ratio increases or the level of communality becomes higher.

In Figure 2, it is clear that as the p/f ratio increases, the minimum necessary sample sizes for the three levels of communality become more alike. When the p/f ratio is equal to or larger than 6, the minimum necessary sample sizes for each level of communality decrease very slowly. For example, under the high level of communality, the minimum necessary sample size decreases only from 40 to 35 as the p/f ratio increases from 6 to 12.

Figures 3 through 6 present the minimum necessary sample sizes for different factor numbers (3, 4, 5, and 6) under the two criteria (0.98 and 0.92). The relationships between the minimum necessary sample size, the level of communality, and the p/f ratio are similar for these four factor numbers. The minimum necessary sample sizes decrease very slowly with increasing p/f ratio once the ratio is equal to or larger than 7.

[Chart omitted; series f2low92, f2wide92, f2high92 plotted against the p/f ratio.]

Figure 2. The minimum necessary sample sizes for two factors with the ratios of variables to factors ranging from 3 to 12 for the good-level criterion (0.92).


[Chart omitted; series f3low98, f3wide98, f3high98 plotted against the p/f ratio.]

Figure 3a. The minimum necessary sample sizes for three factors with the ratios of variables to factors ranging from 3 to 12 for the excellent-level criterion (0.98).

[Chart omitted; series f3low92, f3wide92, f3high92 plotted against the p/f ratio.]

Figure 3b. The minimum necessary sample sizes for three factors with the ratios of variables to factors ranging from 3 to 12 for the good-level criterion (0.92).


[Chart omitted; series f4low98, f4wide98, f4high98 plotted against the p/f ratio.]

Figure 4a. The minimum necessary sample sizes for four factors with the ratios of variables to factors ranging from 3 to 12 for the excellent-level criterion (0.98).

[Chart omitted; series f4low92, f4wide92, f4high92 plotted against the p/f ratio.]

Figure 4b. The minimum necessary sample sizes for four factors with the ratios of variables to factors ranging from 3 to 12 for the good-level criterion (0.92).


[Chart omitted; series f5low98, f5wide98, f5high98 plotted against the p/f ratio.]

Figure 5a. The minimum necessary sample sizes for five factors with the ratios of variables to factors ranging from 3 to 12 for the excellent-level criterion (0.98).

[Chart omitted; series f5low92, f5wide92, f5high92 plotted against the p/f ratio.]

Figure 5b. The minimum necessary sample sizes for five factors with the ratios of variables to factors ranging from 3 to 12 for the good-level criterion (0.92).


[Chart omitted; series f6low98, f6wide98, f6high98 plotted against the p/f ratio.]

Figure 6a. The minimum necessary sample sizes for six factors with the ratios of variables to factors ranging from 3 to 12 for the excellent-level criterion (0.98).

[Chart omitted; series f6low92, f6wide92, f6high92 plotted against the p/f ratio.]

Figure 6b. The minimum necessary sample sizes for six factors with the ratios of variables to factors ranging from 3 to 12 for the good-level criterion (0.92).


Three conclusions emerge from these figures. First, for the same number of factors, higher levels of communality require smaller minimum sample sizes under each of the two criteria. When the p/f ratio is less than or equal to 5, the minimum necessary sample size for the low level of communality can be triple that of the high level of communality, or even more. In contrast, when the p/f ratio is equal to or greater than 7, the differences in minimum necessary sample sizes between the three levels of communality become very small. Table 14 shows the ranges of the minimum necessary sample sizes under 11 different conditions.

Table 14

The ranges of minimum necessary sample sizes in 11 different conditions for p/f ratio = 7

                      Excellent-level criterion (0.98)   Good-level criterion (0.92)
Factor number = 1                 18-60                           11-30
Factor number = 2                   --                            40-60
Factor number = 3                 75-100                          40-65
Factor number = 4                110-180                          55-75
Factor number = 5                130-170                          55-85
Factor number = 6                110-140                          55-80

Note. For two factors the excellent-level criterion could not be reached (see text).

Second, the minimum necessary sample size decreases as the p/f ratio increases. However, once the p/f ratio is equal to or greater than 7, for any number of factors, the minimum necessary sample size decreases very slowly.

Third, if the p/f ratio is equal to or greater than 6, the minimum necessary sample sizes for the three levels of communality will be very much alike. In addition, as the p/f ratio increases, the minimum necessary sample sizes for the three different levels of communality will become even closer.


Relationship of Sample Size to Level of Communality

Figures 7, 8, and 9 show the relationships between the p/f ratio and the minimum necessary sample size. Figures 7a, 8a, and 9a present these relationships for the three levels of communality and four factor numbers under the excellent-level criterion. Figures 7b, 8b, and 9b present them for the three levels of communality and five factor numbers under the good-level criterion.


[Chart omitted; series f6high98, f5high98, f4high98, f3high98 plotted against the p/f ratio.]

Figure 7a. The minimum necessary sample sizes for 4 different factor numbers and the high level of communality with the ratios of variables to factors ranging from 3 to 12 for the excellent-level criterion (0.98).

[Chart omitted; series f6high92, f5high92, f4high92, f3high92, f2high92 plotted against the p/f ratio.]

Figure 7b. The minimum necessary sample sizes for 5 different factor numbers and the high level of communality with the ratios of variables to factors ranging from 3 to 12 for the good-level criterion (0.92).


[Chart omitted; series f6wide98, f5wide98, f4wide98, f3wide98 plotted against the p/f ratio.]

Figure 8a. The minimum necessary sample sizes for 4 different factor numbers and the wide level of communality with the ratios of variables to factors ranging from 3 to 12 for the excellent-level criterion (0.98).

[Chart omitted; series f6wide92, f5wide92, f4wide92, f3wide92, f2wide92 plotted against the p/f ratio.]

Figure 8b. The minimum necessary sample sizes for 5 different factor numbers and the wide level of communality with the ratios of variables to factors ranging from 3 to 12 for the good-level criterion (0.92).


[Chart omitted; series f6low98, f5low98, f4low98, f3low98 plotted against the p/f ratio.]

Figure 9a. The minimum necessary sample sizes for 4 different factor numbers and the low level of communality with the ratios of variables to factors ranging from 3 to 12 for the excellent-level criterion (0.98).

[Chart omitted; series f6low92, f5low92, f4low92, f3low92, f2low92 plotted against the p/f ratio.]

Figure 9b. The minimum necessary sample sizes for 5 different factor numbers and the low level of communality with the ratios of variables to factors ranging from 3 to 12 for the good-level criterion (0.92).

Basically, these figures provide information similar to that found in Figures 1 through 6. However, Figures 7, 8, and 9 reveal some new phenomena that need to be discussed. First, if the p/f ratio is equal to or greater than 7, the minimum necessary sample sizes for the three levels of communality and for the different factor numbers are very close. Table 15 shows the ranges of minimum necessary sample sizes under 6 different conditions (3 levels of communality and two criteria) when the p/f ratio is equal to 7. In Table 15, it is clear that the range for a higher level of communality is smaller than the range for a lower level of communality.

Table 15

The ranges of the minimum necessary sample sizes for factor numbers from 2 to 6 under three levels of communality and two criteria when the p/f ratio is equal to 7

          High       Wide       Low
0.98     75-130    105-160    100-180
0.92     40-55      50-75      60-85

Second, when the p/f ratio is fixed, a larger number of factors generally requires a larger sample size. But this relationship is not always true: when the p/f ratio is greater than 5 and the good-level criterion (0.92) is used, a smaller number of factors sometimes requires a larger sample size than a larger number of factors. This probably results from the randomization procedures used in this simulation study.

Figures 10a through 10f present the same information as Figures 7, 8, and 9, with the p/f ratio on the horizontal axis replaced by the number of


variables. It is clear in these figures that a larger number of factors requires a larger sample size when the number of variables is fixed.


[Six panels omitted; Figures 10a-10f plot, for each of six conditions, the minimum necessary sample size against the number of variables, with one series per factor number (f2 through f6).]

Figure 10. The minimum necessary sample sizes for six conditions with the related number of variables. Each of these 6 panels shows the minimum necessary sample size for one of six conditions (three levels of communality and two criteria). The horizontal axis shows the number of variables and the vertical axis shows the minimum necessary sample size in each condition.

CHAPTER V

DISCUSSION

This study investigated the relationships between the sample size, the number of variables, the number of factors, and the level of communality in factor analysis, in order to provide recommendations about the minimum necessary sample size under different conditions.

Conclusions

First, the ratio of the sample size (N) to the number of variables (p) may not be an appropriate index for deciding the minimum necessary sample size. Many different N/p ratios have been proposed. Cattell (1978) suggested that this ratio should be in the range of 3 to 6. Nunnally (1967) offered a widely cited rule that "a good rule is to have at least ten times as many subjects as variables." Everitt (1975) gave the same suggestion, that the N/p ratio should be at least 10.

In this study, when the number of factors (f) is fixed, N and p bear an inverse relationship to each other. When using the coefficient of congruence criterion with a fixed number of factors, a larger number of variables requires a smaller minimum necessary sample size, and a smaller number of variables requires a larger one. The relationship between the minimum necessary sample size and the number of variables for a


fixed number of factors is compensatory, not proportional. In addition, the relationship between the minimum necessary sample size and the ratio of the number of variables to the number of factors is also compensatory.
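The coefficient of congruence criterion referred to here is Tucker's column-wise congruence between a sample and a population factor loading matrix. The following is a minimal illustrative sketch, not the dissertation's SAS/IML code; the function name and the assumption that the columns are already matched and sign-aligned are mine:

```python
import numpy as np

def congruence(pop_loadings, sample_loadings):
    """Tucker's coefficient of congruence, computed column by column
    between a population and a sample factor loading matrix
    (columns assumed already matched and aligned in sign)."""
    x = np.asarray(pop_loadings, dtype=float)
    y = np.asarray(sample_loadings, dtype=float)
    num = (x * y).sum(axis=0)
    den = np.sqrt((x * x).sum(axis=0) * (y * y).sum(axis=0))
    return num / den
```

A sample solution that exactly reproduces the population loadings yields a congruence of 1.0 for every factor; the study's criteria of 0.98 and 0.92 are thresholds on this coefficient.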

Marsh et al. (1998) presented a similar result. Their major focus was the question "Is more ever too much?" in relation to N (sample size) and particularly p/f in confirmatory factor analysis. They used the frequencies of fully proper, improper, and nonconverged solutions (more than 500 iterations) and standard errors as indexes to investigate the relationship between 5 levels of sample size (50, 100, 200, 400, 1000) and 5 different numbers of indicators (variables) per factor (2, 3, 4, 6, 12).

They found that at the same level of sample size, using more indicators per factor yields "fewer nonconverged solutions, fewer improper solutions, greater interpretability (even when solutions are improper), more accurate and stable parameter estimates, and more reliable factors." They concluded that there is a compensatory relationship between sample size and the number of indicators per factor in confirmatory factor analysis.

Second, the difference in minimum necessary sample sizes between two different levels of communality decreases as the p/f ratio increases.

In simulation studies it is easy to generate correlation coefficient matrices with known levels of communality, but in practice this is not so easy. The conservative choice is to use the minimum necessary sample size for the low level of communality. Therefore, a higher p/f ratio (at least 5) is the better choice when researchers have no prior estimate of the level of communality. If possible, the p/f ratio should be equal to or greater than 7.


Finally, trying to give an absolute sample size is unrealistic.

A widely cited recommendation is that the sample size should be at least 100 (Gorsuch, 1983; Kline, 1979). In this study, using the good-level criterion (0.92) and a low level of communality, N = 100 is not sufficient when the p/f ratio is 3 or 4 and the number of factors is 2, 3, 4, 5, or 6. If the number of factors is 4, 5, or 6, N = 100 is not sufficient even if the p/f ratio is 6.

Comrey and Lee (1992) offered a rough rating scale for adequate sample sizes in factor analysis: 100 = poor, 200 = fair, 300 = good, 500 = very good, 1000 or more = excellent. In fact, when the p/f ratio is equal to or greater than 7, using the excellent-level criterion (0.98) and a low level of communality, N = 200 is sufficient for 3, 4, 5, or 6 factors. But if the p/f ratio is 3 and the good-level criterion (0.92) is used, the minimum necessary sample sizes for 3, 4, 5, or 6 factors with low communality are all larger than 1000. So recommendations regarding absolute sample sizes should be restricted, if not avoided altogether.

The purpose of this study was to provide guidelines about the minimum necessary sample size for exploratory factor analysis. Based on the figures shown in Chapter 4 and the three conclusions in this chapter, some suggestions are made in Table 16.


Table 16

Recommended minimum necessary sample sizes with different p/f ratios for three levels of communality and two criteria

                                   High level of   Wide level of   Low level of
                                   communality     communality     communality
Excellent-level criterion (0.98)
  p/f = 4                          N > 500         N > 900         N > 1400
  p/f = 6                          N > 250         N > 200         N > 260
  p/f = 8                          N > 100         N > 130         N > 130
Good-level criterion (0.92)
  p/f = 5                          N > 130         N > 140         N > 200
  p/f = 7                          N > 55          N > 60          N > 80

In Table 16, minimum necessary sample sizes are presented for various p/f ratios and 6 different conditions (3 levels of communality, 2 criteria). For the excellent-level criterion, three minimum necessary sample sizes are given, each with a related p/f ratio. If possible, larger p/f ratios are recommended because, for each of the three levels of communality, the proportional decrease in the minimum necessary sample size is larger than the proportional increase in the p/f ratio. That is, a larger p/f

ratio yields a smaller data matrix (N x p) in the same condition. For example, if the number of factors is set to 5 with a low level of communality, there are three choices that match the excellent-level criterion (0.98):

(1) 20 variables with a sample size of 1400 or more.

(2) 30 variables with a sample size of 260 or more.

(3) 40 variables with a sample size of 130 or more.

For the first choice, there would be at least 20 x 1400 = 28,000 elements in the data matrix. For the second, at least 30 x 260 = 7,800 elements. And for the third, at least 40 x 130 = 5,200 elements. It is clear that a higher p/f ratio can dramatically reduce the volume or size of the data set.
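The arithmetic behind the three choices can be checked directly; the (p, N) pairs below are the ones quoted in the text:

```python
# (number of variables p, minimum sample size N) for f = 5 factors,
# low communality, excellent-level criterion (0.98), from the text above
choices = [(20, 1400), (30, 260), (40, 130)]

# total elements in each candidate data matrix
elements = [p * n for p, n in choices]
print(elements)  # [28000, 7800, 5200]
```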

When the p/f ratio is larger than 8, the minimum sample size decreases very slowly under both the excellent- and good-level criteria. Keeping the p/f ratio as high as 8 keeps the volume of the data set small in most conditions. Regardless, if possible, it is always desirable to have a large p/f ratio.

When using the good-level criterion and the high level of communality, no recommendation is given for a p/f ratio of 8, because the limitation N > p + 3 causes the minimum necessary sample size to rise after the p/f ratio exceeds 7.

The p/f ratio should never be less than 3 unless extremely large samples are available. Here "extremely large" means at least five thousand. In many cases, even twenty thousand is not enough.


Limitations and Suggestions for Further Research

This study used simulation methods to provide minimum necessary sample sizes for the excellent-level and good-level criteria. Given the nature of simulation, however, these suggested minimum necessary sample sizes are not expected to be the exact values for each condition. They should be used as estimates when a researcher designs maximum likelihood factor analysis studies.

In this study, the correct number of factors is assumed to be known. With this assumption, the criterion can be matched using smaller sample sizes. Without it, both underfactoring and overfactoring may occur, and methods for deciding the number of factors would need to be considered further. Therefore, the questions of what sample size is needed to recover the same number of factors from sample and population data under different conditions, and which method can decide the number of factors with a smaller sample size, still need to be addressed.

Another limitation concerns communality. Only three levels of communality were considered in this study; other levels may have produced different results. And even if all possible combinations of communality could be analyzed, a researcher cannot know which combination to use without already knowing the results. No further study is suggested for this question, even though the level of communality could be categorized in a more refined manner. The reason, as mentioned above, is that it is unusual for a researcher to know the exact level of communality in a population, and even if the minimum sample sizes for all combinations were known, the researcher would not know which one to use. The best suggestion is to use a


high p/f ratio (> 7); the differences between the minimum necessary sample sizes under different levels of communality will then become very small.

One other limitation concerns the relationship between the number of variables and the number of factors. For example, 18 variables and 3 factors produce a p/f ratio of 6, but the issue is how many variables define each factor. It could be (1) 6 variables per factor, or (2) 9 variables for one factor, 8 variables for another, and 1 variable for the third. The second situation requires a larger sample size than the first. When the p/f ratio is smaller than 5, it is frequently the case that one factor is related to only a single variable. A population correlation matrix in which each factor is related to a different number of variables will require a larger sample than one in which every factor is related to the same number of variables, even though both matrices have the same p/f ratio. Therefore, even at the same p/f ratio, different combinations of the number of variables and the number of factors require different sample sizes to match the same criterion. Using the assumption that every factor is related to the same number of variables, Marsh et al. (1998) concluded that "There was a compensatory relation between N and p/f" in confirmatory factor analysis. But so far, no research has been published on the differences between the various combinations of variables and factors for a fixed number of variables and factors.

Finally, all the suggestions made in this study result from a study of maximum likelihood factor analysis and the varimax rotation method. Other factor analysis methods are expected to show only slight differences in minimum necessary sample size. These differences may become larger if other rotation methods are used. However, the relationships between the p/f ratio, the level of communality, and the minimum necessary sample size should be similar.

Appendix A

The procedure for generating population correlation coefficient matrices when the number of factors is one, and the source code of the SAS/IML program used to perform this procedure.


Tucker's (1969) population correlation coefficient matrix generating procedure is used to generate multi-factor matrices. When the number of factors is equal to one, the following procedure is used instead.

(1) Generate a p x 1 column vector whose elements are randomly chosen by the following rules:

(a) For the high level of communality: elements are randomly picked from √0.6, √0.7, and √0.8.

(b) For the wide level of communality: elements are randomly picked from √0.2, √0.3, √0.4, √0.5, √0.6, √0.7, and √0.8.

(c) For the low level of communality: elements are randomly picked from √0.2, √0.3, and √0.4.

(2) Use this column vector as a factor pattern and multiply it by its transpose to get a p x p matrix.

(3) Set the diagonal elements of this matrix equal to 1; the result is a population correlation coefficient matrix for the specified level of communality.

The SAS/IML program for this procedure is attached on the following pages.
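For readers without SAS, the three steps can also be sketched in Python. This is a re-expression of the procedure described above, not the program used in the study; the function and dictionary names are mine:

```python
import numpy as np

# Candidate communalities for each level, per steps (1a)-(1c)
COMMUNALITIES = {
    "high": [0.6, 0.7, 0.8],
    "wide": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "low":  [0.2, 0.3, 0.4],
}

def one_factor_popcorr(p, level, rng=None):
    """Population correlation matrix for a single factor."""
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: p x 1 loading vector = square roots of random communalities
    loadings = np.sqrt(rng.choice(COMMUNALITIES[level], size=p))
    # Step 2: multiply the loading vector by its transpose (outer product)
    corr = np.outer(loadings, loadings)
    # Step 3: set the diagonal elements to 1
    np.fill_diagonal(corr, 1.0)
    return corr
```

With loadings bounded by √0.8, every off-diagonal correlation is at most 0.8, and the diagonal is exactly 1, as the procedure requires.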

dm 'log;clear;output;clear;';

proc iml;

start buildpop(f1,p,bltype) global(b1,b11,b33,mlf1pop1,popcorr);

b1=i(p);
b3=i(p);
mlf1pop1=j(p,f1,0);

if (bltype=1) then do i = 1 to p;
  bh=uniform(-1)*3;
  if (0 <= bh & bh < 1) then b1[i,i] = sqrt(0.6);
  else if (1 <= bh & bh < 2) then b1[i,i] = sqrt(0.7);
  else if (2 <= bh & bh < 3) then b1[i,i] = sqrt(0.8);
  b3[i,i] = sqrt(1-b1[i,i]*b1[i,i]);
end; **** end of if (bltype=1) ****;

if (bltype=2) then do i = 1 to p;
  bw=uniform(-1)*7;
  if (0 <= bw & bw < 1) then b1[i,i] = sqrt(0.2);
  else if (1 <= bw & bw < 2) then b1[i,i] = sqrt(0.3);
  else if (2 <= bw & bw < 3) then b1[i,i] = sqrt(0.4);
  else if (3 <= bw & bw < 4) then b1[i,i] = sqrt(0.5);
  else if (4 <= bw & bw < 5) then b1[i,i] = sqrt(0.6);
  else if (5 <= bw & bw < 6) then b1[i,i] = sqrt(0.7);
  else if (6 <= bw & bw < 7) then b1[i,i] = sqrt(0.8);
  b3[i,i] = sqrt(1-b1[i,i]*b1[i,i]);
end; **** end of if (bltype=2) ****;

if (bltype=3) then do i = 1 to p;
  bl=uniform(-1)*3;
  if (0 <= bl & bl < 1) then b1[i,i] = sqrt(0.2);
  else if (1 <= bl & bl < 2) then b1[i,i] = sqrt(0.3);
  else if (2 <= bl & bl < 3) then b1[i,i] = sqrt(0.4);
  b3[i,i] = sqrt(1-b1[i,i]*b1[i,i]);
end; **** end of if (bltype=3) ****;

do i = 1 to p;
  mlf1pop1[i,f1]=b1[i,i];
end;

b11=b1*b1;
b33=b3*b3;

popcorr=mlf1pop1*mlf1pop1`+b33;

finish buildpop;

f1=1; bltype=1; p=4;
run buildpop(f1,p,bltype);
print b1,b11,b33,mlf1pop1,popcorr;
quit;


Appendix B

The source code and flowchart of the SAS/IML program used in this study for factor numbers equal to or greater than 2.

Flowchart

1. Decide p, f, N, and the level of communality.

2. Generate 100 population correlation coefficient matrices.

3. Use MLFA and the varimax rotation method to get the solution for each of the population correlation coefficient matrices.

4. Generate 100 sample correlation coefficient matrices for each of the 100 population correlation coefficient matrices.

5. Use MLFA and the varimax rotation method to get all the samples' solutions.

6. Use each population solution as a target to calculate the 95% CI's lower bound value, K-95, and get R-92 and R-98 from the 100 K-95's.

dm 'log;clear;output;clear;';

option linesize=120;

proc iml;

start buildpop(fl,p,bltype,k) global(popcorr);

b1=i(p);

b3=i(p);

if (bltype=1) then do i = 1 to p;
bh=uniform(-1)*3;
if (0 <= bh & bh < 1) then b1[i,i] = sqrt(0.6);
else if (1 <= bh & bh < 2) then b1[i,i] = sqrt(0.7);
else if (2 <= bh & bh < 3) then b1[i,i] = sqrt(0.8);
b3[i,i] = sqrt(1-b1[i,i]*b1[i,i]);
end; **** end of if (bltype=1) ****;

if (bltype=2) then do i = 1 to p;
bw=uniform(-1)*7;
if (0 <= bw & bw < 1) then b1[i,i] = sqrt(0.2);
else if (1 <= bw & bw < 2) then b1[i,i] = sqrt(0.3);
else if (2 <= bw & bw < 3) then b1[i,i] = sqrt(0.4);
else if (3 <= bw & bw < 4) then b1[i,i] = sqrt(0.5);
else if (4 <= bw & bw < 5) then b1[i,i] = sqrt(0.6);
else if (5 <= bw & bw < 6) then b1[i,i] = sqrt(0.7);
else if (6 <= bw & bw < 7) then b1[i,i] = sqrt(0.8);
b3[i,i]=sqrt(1-b1[i,i]*b1[i,i]);
end; **** end of if (bltype=2) ****;

if (bltype=3) then do i = 1 to p;
bl=uniform(-1)*3;
if (0 <= bl & bl < 1) then b1[i,i] = sqrt(0.2);
else if (1 <= bl & bl < 2) then b1[i,i] = sqrt(0.3);
else if (2 <= bl & bl < 3) then b1[i,i] = sqrt(0.4);
b3[i,i] = sqrt(1-b1[i,i]*b1[i,i]);
end; **** end of if (bltype=3) ****;

b11=b1*b1;

b33=b3*b3;

****************************************************;
** SA1 is simple structure factor loading matrix  **;
** which is a p*fl matrix                         **;
** by Tucker 1969                                 **;
****************************************************;
IA1=j(p,fl,1);
do i = 1 to p;
resele=fl;
odd=fl;
usevec=1:fl;
do j = 1 to fl;
order=int(uniform(-1)*resele+1);
fload=usevec[1,order];
if j<fl then do;
putin=int(uniform(-1)*odd);
IA1[i,fload]=putin;
odd=odd-putin;
end; *** end of if j<fl ***;
else if j=fl then IA1[i,fload]=(odd-1);
usevec=remove(usevec,order);
resele=resele-1;
end; *** end of do j = 1 ***;
end; *** end of i = 1 to p ***;

iaia=IA1*IA1';
diaia=diag(iaia);
dia=diaia##0.5;
idia=inv(dia);
SA1=idia*IA1;

****************************************************;
** by 1968 Linn's paper, eq(22) can transfer      **;
** SA1 to the actual input factor loading (AIFL)  **;
** then we premultiply AIFL with B1               **;
****************************************************;

****************************************************;
** A1C is a fl*fl diagonal matrix                 **;
** ( is Cm1 in 1969 Tucker eq(8) )                **;
** which is used to present the general           **;
** control an experimenter has on the             **;
** loading of actual variables on the factor      **;
****************************************************;
A1C=I(fl);
do i = 1 to fl;
t = int(uniform(-1)*3);
if t = 0 then c=0.7;
else if t = 1 then c=0.8;
else if t = 2 then c=0.9;
A1C[i,i]=c;
end; *** end of do i = 1 ***;

****************************************************;
** xran is a p*fl matrix                          **;
** which presents the random effect               **;
** on each input loading                          **;
** and premultiply (I-A1C*A1C) to present         **;
** the effect                                     **;
****************************************************;
xran=j(p,fl,1);
do i = 1 to p;
do j = 1 to fl;
xran[i,j]=normal(-1);
end; *** end of j = 1 to fl ****;
end; *** end of i = 1 to p ***;

****************************************************;
** di1 is used to standardize xran                **;
****************************************************;

xransq=xran*xran';
invdidi1=diag(xransq);
invdi1=root(invdidi1);
di1=inv(invdi1);
y1=SA1*A1C+di1*xran*root(I(fl)-A1C*A1C);

****************************************************;
** matrixK is used in eq(10) 1969 Tucker          **;
****************************************************;

matrixK=j(p,fl,k);

****************************;
** to find z for SA1      **;
****************************;

z1=((1+k)/(2+k))*(y1#(y1+abs(y1)+matrixK))/(abs(y1)+matrixK);

************************************;
** gi1 is used to standardize z1  **;
************************************;

z1sq=z1*z1';
invgigi1=diag(z1sq);
invgi1=root(invgigi1);
gi1=inv(invgi1);

****************************************************************;
** TA1 is actual input factor loading for the major domains   **;
****************************************************************;
TA1=gi1*z1;

****************************************************************;
** TA3 is actual input factor loading for the unique domains  **;
****************************************************************;

TA3=i(p);

****************************************;
** TA1 TA2 TA3 are A1star(A1*)        **;
**                 A2star(A2*)        **;
**                 A3star(A3*)        **;
** in 1969 Tucker's paper             **;
** FA1 means final A1 (major)         **;
**  which is A1 in Tucker's paper     **;
** FA2 means final A2 (minor)         **;
**  which is A2 in Tucker's paper     **;
** FA3 means final A3 (unique)        **;
**  which is A3 in Tucker's paper     **;
****************************************;

FA1=b1*TA1;
FA3=b3*TA3;
popcorr=FA1*FA1'+FA3*FA3';

finish buildpop;
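For readers cross-checking the module above outside SAS: the last step assembles popcorr as FA1*FA1' + FA3*FA3', where the row sums of squares of FA1 are the communalities and FA3 carries sqrt(1 - h^2) on its diagonal, so the result has a unit diagonal. A minimal NumPy sketch of that step (Python, not part of the SAS program; the random matrix A below is only a stand-in for the full Tucker-Koopman-Linn construction):

```python
import numpy as np

# Sketch of the final buildpop step: a p x fl major-domain loading matrix FA1
# with row communalities h^2, diagonal unique loadings FA3 = diag(sqrt(1-h^2)),
# and popcorr = FA1 FA1' + FA3 FA3' with ones on the diagonal.
rng = np.random.default_rng(1)
p, fl = 6, 2
A = rng.normal(size=(p, fl))
A /= np.linalg.norm(A, axis=1, keepdims=True)    # unit-length rows
h2 = rng.choice([0.6, 0.7, 0.8], size=p)         # high-communality case (bltype=1)
FA1 = np.sqrt(h2)[:, None] * A                   # rows now have length sqrt(h2)
FA3 = np.diag(np.sqrt(1.0 - h2))
popcorr = FA1 @ FA1.T + FA3 @ FA3.T
```

Because each row of FA1 has squared length h^2 and the matching diagonal of FA3*FA3' is 1 - h^2, the diagonal of popcorr is exactly one, as required of a population correlation matrix.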

*******************************************************************;
start buildsam(popcorr,n,p) global(samcorr);

****************************************************;
** the following processes try to use the         **;
** Wijsman method 1959 to do the same thing       **;
** as Kaiser but reduce the computing cost        **;
****************************************************;

ag=i(p);
do i = 1 to p;
do j = 1 to i;
if j < i then ag[i,j]=normal(-1);
if j = i then ag[i,j]=rangam(-1,(n-j)/2);
end;
end;
ifrnew=root(popcorr);
frnew=ifrnew';
samA=frnew*ag*ag'*frnew';
cfroot=samA/n;
dcfp=diag(cfroot);
dcfprt=root(dcfp);
idcfprt=inv(dcfprt);
samcorr=idcfprt*cfroot*idcfprt;

finish buildsam;
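The idea in buildsam is to avoid simulating n raw observations: a Wishart matrix for popcorr is drawn directly from a random triangular factor (normal deviates below the diagonal, gamma/chi variates on it) and then rescaled to a correlation matrix. A NumPy sketch of the same idea (Python, not part of the SAS program; the degrees-of-freedom convention per diagonal cell is one common choice for the Bartlett/Wijsman decomposition, and `sample_corr` is an illustrative name):

```python
import numpy as np

# Sketch of buildsam's shortcut: draw a Wishart-type matrix for popcorr from a
# random lower-triangular factor instead of simulating n observations, then
# standardize the result back to a sample correlation matrix.
rng = np.random.default_rng(7)

def sample_corr(popcorr, n):
    p = popcorr.shape[0]
    A = np.zeros((p, p))
    for i in range(p):
        A[i, :i] = rng.normal(size=i)              # subdiagonal: N(0,1)
        A[i, i] = np.sqrt(rng.chisquare(n - i))    # diagonal: chi variate
    F = np.linalg.cholesky(popcorr)                # F F' = popcorr
    W = F @ A @ A.T @ F.T / n                      # Wishart-based covariance
    d = 1.0 / np.sqrt(np.diag(W))
    return d[:, None] * W * d[None, :]             # rescale to unit diagonal

R = sample_corr(np.full((4, 4), 0.2) + np.eye(4) * 0.8, 200)
```

One draw therefore costs a p x p triangular fill and a few matrix products, rather than an n x p data matrix, which is the computing-cost saving the comment block refers to.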

**********************************************************;

start mlfapop(popcorr,fl) global(mlflpop);

dat=popcorr;
mlcri=0.00001;
p=ncol(dat);
fnum=fl;
Hcri=0.00005;
sinv=inv(dat);
dsinv=diag(sinv);
invsii=inv(dsinv);
phipop=(1-fnum/(2*p))*invsii;

do until (cri<mlcri);
phirt=phipop##0.5;
invphi=inv(phipop);
invphirt=invphi##0.5;
rstar=invphirt*dat*invphirt;
call eigen(vals,vecs,rstar);
vals=diag(vals);
keepnum=1:fnum;
keepval=vals[keepnum,keepnum];
keepvec=vecs[,keepnum];
inum=i(fnum);
mlflpop=phirt*keepvec*((keepval-inum)##0.5);
resM=dat-mlflpop*mlflpop';
newphi=diag(resM);
do posphi = 1 to p;
if newphi[posphi,posphi]<0 then newphi[posphi,posphi]=0;
end;

diff=abs(newphi-phipop);
cri=max(diff);
phipop=newphi;
end;
finish mlfapop;

start mlfasam(samcorr,fl) global(mlflsam);

dat=samcorr;
mlcri=0.00001;
p=ncol(dat);
fnum=fl;
Hcri=0.00005;
sinv=inv(dat);
dsinv=diag(sinv);
invsii=inv(dsinv);

phisam=(1-fnum/(2*p))*invsii;
do until (cri<mlcri);
phirt=phisam##0.5;
invphi=inv(phisam);
invphirt=invphi##0.5;
rstar=invphirt*dat*invphirt;
call eigen(vals,vecs,rstar);
vals=diag(vals);
keepnum=1:fnum;
keepval=vals[keepnum,keepnum];
keepvec=vecs[,keepnum];
inum=i(fnum);
mlflsam=phirt*keepvec*((keepval-inum)##0.5);
resM=dat-mlflsam*mlflsam';
newphi=diag(resM);
do posphi = 1 to p;
if newphi[posphi,posphi]<0 then newphi[posphi,posphi]=0;
end;

diff=abs(newphi-phisam);
cri=max(diff);
phisam=newphi;
end;
finish mlfasam;
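mlfapop and mlfasam share one iteration: rescale the correlation matrix by the current uniquenesses, take the leading fnum eigenpairs of phi^(-1/2) R phi^(-1/2), convert them to loadings, refresh the uniquenesses from the residual diagonal (floored at zero, as in the modules), and stop once the uniquenesses change by less than mlcri. A NumPy sketch of that loop (Python, not part of the SAS program; `ml_loadings` is an illustrative name):

```python
import numpy as np

# Sketch of the loop in mlfapop/mlfasam: alternate an eigendecomposition of
# psi^(-1/2) R psi^(-1/2) with a uniqueness update from the residual diagonal.
def ml_loadings(R, fnum, tol=1e-5, max_iter=1000):
    p = R.shape[0]
    psi = (1.0 - fnum / (2.0 * p)) / np.diag(np.linalg.inv(R))  # starting psi
    for _ in range(max_iter):
        d = 1.0 / np.sqrt(psi)
        rstar = d[:, None] * R * d[None, :]
        vals, vecs = np.linalg.eigh(rstar)           # ascending order
        vals = vals[::-1][:fnum]                     # keep the largest fnum
        vecs = vecs[:, ::-1][:, :fnum]
        L = np.sqrt(psi)[:, None] * vecs * np.sqrt(np.maximum(vals - 1.0, 0.0))
        new_psi = np.maximum(np.diag(R - L @ L.T), 0.0)
        if np.max(np.abs(new_psi - psi)) < tol:
            break
        psi = new_psi
    return L

# recover a known one-factor structure
lam = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)
L = ml_loadings(R, 1)
```

At a solution the rescaled matrix has the form u u' + I in the factor space, so subtracting one from the kept eigenvalues before the square root, exactly as `(keepval-inum)##0.5` does in the IML code, returns the loadings on the original scale.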

*******************************************************;

start ropop(mlflpop,p) global(popfacpa);

popfacpa=mlflpop;
fnum=ncol(popfacpa);
crit=0.05; *********** crit is the criteria for v-value ******;
           *********** use 0.1 ******;
prev=-1;
count=0;

allhhmax=j(p,1,0);

do i = 1 to p;
do j = 1 to fnum;
allhhmax[i,1]=allhhmax[i,1]+popfacpa[i,j]##2;
end;
end;
allhmax=allhhmax##0.5; **** because we need to use the all h**2 ****;
                       **** but hmax is used for only two columns ****;

do until(vcrit < crit);

heigh=1:p;

do i = 1 to (fnum-1);
do j = (i+1) to fnum;
tempfp=popfacpa[heigh,i]||popfacpa[heigh,j];
fircol=tempfp[heigh,1]/allhmax; ****** fir sec is x/h y/h ******;
seccol=tempfp[heigh,2]/allhmax;
tempfpn=fircol||seccol;
uxxyy=fircol#fircol-seccol#seccol;
vxy=2*fircol#seccol;
uvc=uxxyy#uxxyy-vxy#vxy;
uvd=2*uxxyy#vxy;
asumu=uxxyy[+,];
bsumv=vxy[+,];
csumuvc=uvc[+,];
dsumuvd=uvd[+,];
tan4=(dsumuvd-2*asumu*bsumv/p)/(csumuvc-(asumu*asumu-bsumv*bsumv)/p);
foursida=atan(tan4);
if (dsumuvd-2*asumu*bsumv/p) > 0 then do;
if (csumuvc-(asumu*asumu-bsumv*bsumv)/p) > 0 then sida=foursida/4;
if (csumuvc-(asumu*asumu-bsumv*bsumv)/p) < 0 then do;
foursida=foursida+3.1415926;
sida=(foursida)/4;
end;
end;
if (dsumuvd-2*asumu*bsumv/p) < 0 then do;
if (csumuvc-(asumu*asumu-bsumv*bsumv)/p) < 0 then do;
foursida=foursida-3.1415926;
sida=foursida/4;
end;
if (csumuvc-(asumu*asumu-bsumv*bsumv)/p) > 0 then sida=foursida/4;
end;
csida=cos(sida); ssida=sin(sida); nssida=-ssida;
trans=j(2,2,0);
trans[1,1]=csida; trans[1,2]=nssida; trans[2,1]=ssida; trans[2,2]=csida;
tempg=tempfpn*trans; *** tempg is only two columns which is rotated this time ***;
tempgi=tempg[,1]#allhmax;
tempgj=tempg[,2]#allhmax;
popfacpa[,i]=tempgi;
popfacpa[,j]=tempgj;
end;
end;
hforg=j(p,fnum,0);
do i = 1 to fnum;
hforg[,i]=allhmax;
end;
ghforv=popfacpa/hforg;
g4=ghforv##4;
g4row=g4[+,];
g4sum=sum(g4row);
g2=ghforv##2;
g2sump=g2[+,];
g2sump2=g2sump##2;
g22sum=sum(g2sump2);
newv=p*g4sum-g22sum;
vcrit=newv-prev;
prev=newv;
count=count+1;
angel=sida/3.14*180;
ang4sida=foursida/3.14*180;
end;

finish ropop;
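The rotation in ropop (and rosam below) is the classic pairwise varimax sweep: for each pair of factor columns, Kaiser's formula gives the planar rotation angle from tan(4*theta) = (D - 2AB/p) / (C - (A^2 - B^2)/p), where u = x^2 - y^2 and v = 2xy are built from the communality-normalized loadings and A, B, C, D are sums of u, v, u^2 - v^2, and 2uv. A NumPy sketch (Python, not part of the SAS program; `arctan2` reproduces the quadrant corrections the IML code performs explicitly, and `varimax` is an illustrative name):

```python
import numpy as np

# Sketch of the pairwise varimax sweep in ropop/rosam: rotate each pair of
# columns by Kaiser's angle on the communality-normalized loadings until the
# largest angle in a sweep falls below a tolerance.
def varimax(L, tol=1e-6, max_iter=100):
    L = L.copy()
    p, f = L.shape
    h = np.sqrt((L**2).sum(axis=1))                 # row communality lengths
    for _ in range(max_iter):
        change = 0.0
        for i in range(f - 1):
            for j in range(i + 1, f):
                x, y = L[:, i] / h, L[:, j] / h
                u, v = x**2 - y**2, 2.0 * x * y
                A, B = u.sum(), v.sum()
                C, D = (u**2 - v**2).sum(), (2.0 * u * v).sum()
                theta = 0.25 * np.arctan2(D - 2.0 * A * B / p,
                                          C - (A**2 - B**2) / p)
                c, s = np.cos(theta), np.sin(theta)
                L[:, i], L[:, j] = (c * L[:, i] + s * L[:, j],
                                    -s * L[:, i] + c * L[:, j])
                change = max(change, abs(theta))
        if change < tol:
            break
    return L

# a simple structure rotated away by 0.5 rad is recovered in one sweep
L0 = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.8], [0.0, 0.7]])
c0, s0 = np.cos(0.5), np.sin(0.5)
Lr = varimax(L0 @ np.array([[c0, -s0], [s0, c0]]))
```

Each planar rotation is orthogonal, so row communalities are preserved throughout, which is why the IML code can compute allhmax once before the do-until loop.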


*******************************************************;

start rosam(mlflsam,p) global(samfacpa);

***********************************;
** varimax rotation for samcorr ***;
***********************************;

samfacpa=mlflsam;
fnum=ncol(samfacpa);
crit=0.05; *********** crit is the criteria for v-value ******;
           *********** use 0.1 ******;
prev=-1;
count=0;

allhhmax=j(p,1,0);
do i = 1 to p;
do j = 1 to fnum;
allhhmax[i,1]=allhhmax[i,1]+samfacpa[i,j]##2;
end; *** end of do j = 1 *****;
end; *** end of do i = 1 ********;
allhmax=allhhmax##0.5; ** because we need to use the all h**2 ****;
                       ** but hmax is used for only two columns ****;

do until(vcrit < crit);

heigh=1:p;
do i = 1 to (fnum-1);
do j = (i+1) to fnum;
tempfp=samfacpa[heigh,i]||samfacpa[heigh,j];
fircol=tempfp[heigh,1]/allhmax; *** fir sec is x/h y/h ******;
seccol=tempfp[heigh,2]/allhmax;
tempfpn=fircol||seccol;
uxxyy=fircol#fircol-seccol#seccol;
vxy=2*fircol#seccol;
uvc=uxxyy#uxxyy-vxy#vxy;
uvd=2*uxxyy#vxy;

asumu=uxxyy[+,];
bsumv=vxy[+,];
csumuvc=uvc[+,];
dsumuvd=uvd[+,];
tan4=(dsumuvd-2*asumu*bsumv/p)/(csumuvc-(asumu*asumu-bsumv*bsumv)/p);
foursida=atan(tan4);
if (dsumuvd-2*asumu*bsumv/p) > 0 then do;
if (csumuvc-(asumu*asumu-bsumv*bsumv)/p) > 0 then sida=foursida/4;
if (csumuvc-(asumu*asumu-bsumv*bsumv)/p) < 0 then do;
foursida=foursida+3.14;
sida=(foursida)/4;
end; ** end of do in if **;
end; ** end of do in if **;
if (dsumuvd-2*asumu*bsumv/p) < 0 then do;
if (csumuvc-(asumu*asumu-bsumv*bsumv)/p) < 0 then do;
foursida=foursida-3.14;
sida=foursida/4;
end; ** end of do in if **;
if (csumuvc-(asumu*asumu-bsumv*bsumv)/p) > 0 then sida=foursida/4;
end; ** end of do in if **;
csida=cos(sida); ssida=sin(sida); nssida=-ssida;
trans=j(2,2,0);
trans[1,1]=csida; trans[1,2]=nssida; trans[2,1]=ssida; trans[2,2]=csida;
tempg=tempfpn*trans; * tempg is only two columns which is rotated this time *;
tempgi=tempg[,1]#allhmax;
tempgj=tempg[,2]#allhmax;
samfacpa[,i]=tempgi;
samfacpa[,j]=tempgj;
end; ** end of do j = **;
end; ** end of do i = **;
hforg=j(p,fnum,0);
do i = 1 to fnum;
hforg[,i]=allhmax;
end;
ghforv=samfacpa/hforg;
g4=ghforv##4;

g4row=g4[+,];
g4sum=sum(g4row);
g2=ghforv##2;
g2sump=g2[+,];
g2sump2=g2sump##2;
g22sum=sum(g2sump2);
newv=p*g4sum-g22sum;
vcrit=newv-prev;
prev=newv;
count=count+1;
angel=sida/3.14*180;
ang4sida=foursida/3.14*180;
end; ** end of do until **;

finish rosam;
************************************************************;
start procru(samfacpa,popfacpa,fl) global(factorK,coefF);
sama=samfacpa;
popb=popfacpa;
asama=sama'*sama;
iasama=inv(asama);
Tstar=iasama*sama'*popb;
itt=Tstar'*Tstar;
ditt=diag(itt);
sditt=ditt##0.5;
isditt=inv(sditt);
goodT=Tstar*isditt;
rotsam=sama*goodT;
sami=rotsam';
sqrotsam=rotsam##2;
sqpopb=popb##2;
sqsumsam=sqrotsam[+,];
sqsumpop=sqpopb[+,];
coefF=j(1,fl,0);
do i = 1 to fl;
upr=sami[i,]*popb[,i];
samXpop=sqsumsam[1,i]*sqsumpop[1,i];
downr=samXpop##0.5;
coefF[1,i]=upr/downr;
end;
factorK=sum(coefF)/fl;

finish procru;
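procru rotates the sample loading matrix toward the population target with a least-squares transformation, column-normalized as in the module (Tstar = inv(S'S) S'P, each column of Tstar divided by its length), and then scores the match with Tucker's congruence coefficient per factor, averaging them into factorK. A NumPy sketch of the same two steps (Python, not part of the SAS program; `procrustes_congruence` is an illustrative name):

```python
import numpy as np

# Sketch of procru: column-normalized least-squares target rotation followed
# by Tucker's congruence coefficient per factor; the mean plays factorK's role.
def procrustes_congruence(S, P):
    Tstar = np.linalg.solve(S.T @ S, S.T @ P)       # inv(S'S) S'P
    T = Tstar / np.sqrt(np.diag(Tstar.T @ Tstar))   # column-normalize
    Srot = S @ T
    num = (Srot * P).sum(axis=0)
    den = np.sqrt((Srot**2).sum(axis=0) * (P**2).sum(axis=0))
    phi = num / den                                 # per-factor congruence
    return phi, phi.mean()

# an orthogonally rotated copy of the target is matched perfectly
P = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.8], [0.0, 0.7]])
c, s = np.cos(0.3), np.sin(0.3)
S = P @ np.array([[c, -s], [s, c]])
phi, factorK = procrustes_congruence(S, P)
```

When S is an exact orthogonal rotation of P, Tstar is itself orthogonal, the normalization is a no-op, and every per-factor congruence equals one.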
************************************************************;

start coefml(mlflpop,mlflsam,fl) global(coefM,coefsamM,adjmat);

ma=mlflpop;
mb=mlflsam;
matra=ma';
sqmb=mb##2;
sqma=ma##2;
sumsqma=sqma[+,];
sumsqmb=sqmb[+,];
coefsamM=j(1,fl,0);
posim=j(1,fl,0);
adjm=j(1,fl,0);
do i = 1 to fl;
upm=matra[i,]*mb[,i];
aXb=sumsqma[1,i]*sumsqmb[1,i];
downm=aXb##0.5;
coefsamM[1,i]=upm/downm;
end;
posim=abs(coefsamM);
adjm=posim/coefsamM;
adjmat=diag(adjm);
coefM=sum(posim)/fl;
finish coefml;

********************************************************;
start coefmlr(popfacpa,samfacpa,fl,rotime,fbmatrix) global(coefR,allcoR);
ra=popfacpa;
rb=samfacpa;
ratra=ra';
nn=fbmatrix;
sqrb=rb##2;
sqra=ra##2;
sumsqra=sqra[+,];
sumsqrb=sqrb[+,];
coefsamR=j(1,fl,0);
allcoR=j(1,rotime,0);
do rr = 1 to rotime;
seq=nn[rr,];
do i = 1 to fl;
bbb=seq[1,i];
uprr=ratra[i,]*rb[,bbb];
aXbr=sumsqra[1,i]*sumsqrb[1,bbb];
downrr=aXbr##0.5;
coefsamR[1,i]=uprr/downrr;
end;
posr=abs(coefsamR);
coefRR=sum(posr)/fl;
allcoR[1,rr]=coefRR;
end;
coefR=max(allcoR);

finish coefmlr;
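coefmlr evaluates the mean absolute congruence for every column ordering listed in the fbmatrix table and keeps the best one, so the match is not penalized when the sample factors come out in a different order or with flipped signs. A NumPy sketch of the same search (Python, not part of the SAS program; `itertools.permutations` plays the role of the precomputed fb2-fb6 tables with 2, 6, 24, 120, and 720 rows, and `best_order_congruence` is an illustrative name):

```python
import numpy as np
from itertools import permutations

# Sketch of coefmlr's search: for each ordering of B's columns, compute the
# per-factor congruence coefficients against A, average their absolute
# values, and keep the best ordering.
def best_order_congruence(A, B):
    f = A.shape[1]
    na = np.sqrt((A**2).sum(axis=0))    # column lengths of A
    nb = np.sqrt((B**2).sum(axis=0))    # column lengths of B
    best = -np.inf
    for perm in permutations(range(f)):
        phi = np.array([A[:, i] @ B[:, q] / (na[i] * nb[q])
                        for i, q in enumerate(perm)])
        best = max(best, np.abs(phi).mean())
    return best

# columns of B are a reordered, sign-flipped copy of A's columns
A = np.array([[0.8, 0.1], [0.7, 0.2], [0.1, 0.9], [0.2, 0.6]])
B = A[:, [1, 0]] * np.array([1.0, -1.0])
score = best_order_congruence(A, B)
```

Taking absolute values before averaging mirrors the `posr=abs(coefsamR)` step, which makes the score invariant to reflections of individual factors.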

********************************************************;

start prU(mlflsam,popfacpa,fl) global(coefU,coefUmr);

samar=mlflsam;
popbr=popfacpa;
asamar=samar'*samar;
iasamar=inv(asamar);
Tstarr=iasamar*samar'*popbr;
ittr=Tstarr'*Tstarr;
dittr=diag(ittr);
sdittr=dittr##0.5;
isdittr=inv(sdittr);
goodTr=Tstarr*isdittr;
rotsamr=samar*goodTr;
samir=rotsamr';
sqsamr=rotsamr##2;
sqpopr=popbr##2;
sqsumsr=sqsamr[+,];
sqsumpr=sqpopr[+,];
coefUmr=j(1,fl,0);
do i = 1 to fl;
upru=samir[i,]*popbr[,i];
samXpopu=sqsumsr[1,i]*sqsumpr[1,i];
downru=samXpopu##0.5;
coefUmr[1,i]=upru/downru;
end;
coefU=sum(coefUmr)/fl;

finish prU;
********************************************************;

fb2={1 2,
     2 1};

**********************************************************;

ins3={3,
      3};

fb21=fb2[,1];
fb22=fb2[,2];

fb3_1=ins3||fb21||fb22;
fb3_2=fb21||ins3||fb22;
fb3_3=fb21||fb22||ins3;

fb3=fb3_1//fb3_2//fb3_3;

**********************************************************;

ins4=j(6,1,4);

fb31=fb3[,1];
fb32=fb3[,2];
fb33=fb3[,3];

fb4_1=ins4||fb31||fb32||fb33;
fb4_2=fb31||ins4||fb32||fb33;
fb4_3=fb31||fb32||ins4||fb33;
fb4_4=fb31||fb32||fb33||ins4;

fb4=fb4_1//fb4_2//fb4_3//fb4_4;

**********************************************************;

ins5=j(24,1,5);

fb41=fb4[,1];
fb42=fb4[,2];
fb43=fb4[,3];
fb44=fb4[,4];

fb5_1=ins5||fb41||fb42||fb43||fb44;
fb5_2=fb41||ins5||fb42||fb43||fb44;
fb5_3=fb41||fb42||ins5||fb43||fb44;
fb5_4=fb41||fb42||fb43||ins5||fb44;
fb5_5=fb41||fb42||fb43||fb44||ins5;

fb5=fb5_1//fb5_2//fb5_3//fb5_4//fb5_5;
row5=nrow(fb5);

**********************************************************;

ins6=j(120,1,6);

fb51=fb5[,1];
fb52=fb5[,2];
fb53=fb5[,3];
fb54=fb5[,4];
fb55=fb5[,5];

fb6_1=ins6||fb51||fb52||fb53||fb54||fb55;
fb6_2=fb51||ins6||fb52||fb53||fb54||fb55;
fb6_3=fb51||fb52||ins6||fb53||fb54||fb55;
fb6_4=fb51||fb52||fb53||ins6||fb54||fb55;
fb6_5=fb51||fb52||fb53||fb54||ins6||fb55;
fb6_6=fb51||fb52||fb53||fb54||fb55||ins6;

fb6=fb6_1//fb6_2//fb6_3//fb6_4//fb6_5//fb6_6;

row6=nrow(fb6);

**************************************************************;

fl=6;
p=18;

bltype=2;
k=0.2;
samtime=100;
poptime=100;

if fl=2 then do; fbmatrix=fb2; rotime=2; end;
if fl=3 then do; fbmatrix=fb3; rotime=6; end;
if fl=4 then do; fbmatrix=fb4; rotime=24; end;
if fl=5 then do; fbmatrix=fb5; rotime=120; end;
if fl=6 then do; fbmatrix=fb6; rotime=720; end;

do n = 1500 to 2500 by 100;

if bltype=1 then print, fl, " this is high level of communality", p, n;
else if bltype=2 then print, fl, " this is wide level of communality", p, n;
else if bltype=3 then print, fl, " this is low level of communality", p, n;
avgfk=j(1,poptime,0);
fk95=j(1,poptime,0);
fk90=j(1,poptime,0);
avgmf=j(1,poptime,0);
mf95=j(1,poptime,0);
mf90=j(1,poptime,0);
avgrf=j(1,poptime,0);
rf95=j(1,poptime,0);
rf90=j(1,poptime,0);
avguf=j(1,poptime,0);
uf95=j(1,poptime,0);
uf90=j(1,poptime,0);

do popnum = 1 to poptime;

run buildpop(fl,p,bltype,k);
run mlfapop(popcorr,fl);
run ropop(mlflpop,p);

fk=j(1,samtime,0);
mlflm=j(1,samtime,0);
mlflr=j(1,samtime,0);
mlflu=j(1,samtime,0);

do samnum = 1 to samtime;
run buildsam(popcorr,n,p);
run mlfasam(samcorr,fl);
run coefml(mlflpop,mlflsam,fl);

mlflsam=mlflsam*adjmat;

run rosam(mlflsam,p);

run procru(samfacpa,popfacpa,fl);

run coefmlr(popfacpa,samfacpa,fl,rotime,fbmatrix);

run prU(mlflsam,popfacpa,fl);

fk[1,samnum]=factorK;
mlflm[1,samnum]=coefM;
mlflr[1,samnum]=coefR;
mlflu[1,samnum]=coefU;

end; ** end of samnum **;

rankfk=fk;
fk[,rank(fk)]=rankfk; *** sort fk ********;
fk95[1,popnum]=(fk[1,6]+fk[1,5])/2; **** vector of 95% of K *****;
fk90[1,popnum]=(fk[1,11]+fk[1,10])/2;
fkmean=sum(fk)/samtime; ***** mean of K ****;
avgfk[1,popnum]=fkmean; *** vector of mean of K ***;

rankmfM=mlflm;
mlflm[,rank(mlflm)]=rankmfM;
mf95[1,popnum]=(mlflm[1,6]+mlflm[1,5])/2;
mf90[1,popnum]=(mlflm[1,11]+mlflm[1,10])/2;
mfmean=sum(mlflm)/samtime;
avgmf[1,popnum]=mfmean;

rankRR=mlflr;
mlflr[,rank(mlflr)]=rankRR;
rf95[1,popnum]=(mlflr[1,6]+mlflr[1,5])/2;
rf90[1,popnum]=(mlflr[1,11]+mlflr[1,10])/2;
rfmean=sum(mlflr)/samtime;
avgrf[1,popnum]=rfmean;

rankuu=mlflu;
mlflu[,rank(mlflu)]=rankuu;
uf95[1,popnum]=(mlflu[1,6]+mlflu[1,5])/2;
uf90[1,popnum]=(mlflu[1,11]+mlflu[1,10])/2;
ufmean=sum(mlflu)/samtime;
avguf[1,popnum]=ufmean;
end; *** end of popnum ***;
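The percentile step above works on order statistics: each population contributes 100 sample congruence values, and after sorting, the average of the 5th and 6th smallest estimates the 5th percentile (the 95% CI lower bound, K-95) while the 10th and 11th smallest give the 10th percentile. A small Python illustration of the indexing (not part of the SAS program; note IML indices are 1-based, NumPy's are 0-based):

```python
import numpy as np

# Sketch of the order-statistic step: with 100 sorted values, averaging the
# 5th and 6th smallest estimates the 5th percentile, and the 10th and 11th
# smallest estimate the 10th percentile.
vals = np.sort(np.arange(1.0, 101.0))  # stand-in for 100 sorted K values
k95 = (vals[4] + vals[5]) / 2          # IML fk[1,5]+fk[1,6] -> 5.5
k90 = (vals[9] + vals[10]) / 2         # IML fk[1,10]+fk[1,11] -> 10.5
```

With samtime fixed at 100, these two positions are hard-coded in the IML lines above; a different samtime would require recomputing the order-statistic positions.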

rankfk95=fk95;
fk95[,rank(fk95)]=rankfk95;
rankfk90=fk90;
fk90[,rank(fk90)]=rankfk90;
rankavgK=avgfk;
avgfk[,rank(avgfk)]=rankavgK;

rankmf95=mf95;
mf95[,rank(mf95)]=rankmf95;
rankmf90=mf90;
mf90[,rank(mf90)]=rankmf90;
rankavgm=avgmf;
avgmf[,rank(avgmf)]=rankavgm;

rankrf95=rf95;
rf95[,rank(rf95)]=rankrf95;
rankrf90=rf90;
rf90[,rank(rf90)]=rankrf90;
rankavgr=avgrf;
avgrf[,rank(avgrf)]=rankavgr;

rankuf95=uf95;
uf95[,rank(uf95)]=rankuf95;
rankuf90=uf90;
uf90[,rank(uf90)]=rankuf90;
rankavgu=avguf;
avguf[,rank(avguf)]=rankavgu;
print fk90,fk95,avgfk,mf90,mf95,avgmf,rf90,rf95,avgrf,uf90,uf95,avguf;

p82=0; ci95p82=0; ci90p82=0;
p85=0; ci95p85=0; ci90p85=0;
p87=0; ci95p87=0; ci90p87=0;
p90=0; ci95p90=0; ci90p90=0;
p92=0; ci95p92=0; ci90p92=0;
p95=0; ci95p95=0; ci90p95=0;
p98=0; ci95p98=0; ci90p98=0;

m82=0; ci95m82=0; ci90m82=0;
m85=0; ci95m85=0; ci90m85=0;
m87=0; ci95m87=0; ci90m87=0;
m90=0; ci95m90=0; ci90m90=0;
m92=0; ci95m92=0; ci90m92=0;
m95=0; ci95m95=0; ci90m95=0;
m98=0; ci95m98=0; ci90m98=0;

r82=0; ci95r82=0; ci90r82=0;
r85=0; ci95r85=0; ci90r85=0;
r87=0; ci95r87=0; ci90r87=0;
r90=0; ci95r90=0; ci90r90=0;
r92=0; ci95r92=0; ci90r92=0;
r95=0; ci95r95=0; ci90r95=0;
r98=0; ci95r98=0; ci90r98=0;

u82=0; ci95u82=0; ci90u82=0;
u85=0; ci95u85=0; ci90u85=0;
u87=0; ci95u87=0; ci90u87=0;
u90=0; ci95u90=0; ci90u90=0;
u92=0; ci95u92=0; ci90u92=0;
u95=0; ci95u95=0; ci90u95=0;
u98=0; ci95u98=0; ci90u98=0;

do i = 1 to poptime;

if avgfk[1,i]>0.82 then p82=p82+1;
if fk95[1,i]>0.82 then ci95p82=ci95p82+1;
if fk90[1,i]>0.82 then ci90p82=ci90p82+1;

if avgfk[1,i]>0.85 then p85=p85+1;
if fk95[1,i]>0.85 then ci95p85=ci95p85+1;
if fk90[1,i]>0.85 then ci90p85=ci90p85+1;

if avgfk[1,i]>0.87 then p87=p87+1;
if fk95[1,i]>0.87 then ci95p87=ci95p87+1;
if fk90[1,i]>0.87 then ci90p87=ci90p87+1;

if avgfk[1,i]>0.90 then p90=p90+1;
if fk95[1,i]>0.90 then ci95p90=ci95p90+1;
if fk90[1,i]>0.90 then ci90p90=ci90p90+1;

if avgfk[1,i]>0.92 then p92=p92+1;
if fk95[1,i]>0.92 then ci95p92=ci95p92+1;
if fk90[1,i]>0.92 then ci90p92=ci90p92+1;

if avgfk[1,i]>0.95 then p95=p95+1;
if fk95[1,i]>0.95 then ci95p95=ci95p95+1;
if fk90[1,i]>0.95 then ci90p95=ci90p95+1;

if avgfk[1,i]>0.98 then p98=p98+1;
if fk95[1,i]>0.98 then ci95p98=ci95p98+1;
if fk90[1,i]>0.98 then ci90p98=ci90p98+1;

if avgmf[1,i]>0.82 then m82=m82+1;
if mf95[1,i]>0.82 then ci95m82=ci95m82+1;
if mf90[1,i]>0.82 then ci90m82=ci90m82+1;

if avgmf[1,i]>0.85 then m85=m85+1;
if mf95[1,i]>0.85 then ci95m85=ci95m85+1;
if mf90[1,i]>0.85 then ci90m85=ci90m85+1;

if avgmf[1,i]>0.87 then m87=m87+1;
if mf95[1,i]>0.87 then ci95m87=ci95m87+1;
if mf90[1,i]>0.87 then ci90m87=ci90m87+1;

if avgmf[1,i]>0.90 then m90=m90+1;
if mf95[1,i]>0.90 then ci95m90=ci95m90+1;
if mf90[1,i]>0.90 then ci90m90=ci90m90+1;

if avgmf[1,i]>0.92 then m92=m92+1;
if mf95[1,i]>0.92 then ci95m92=ci95m92+1;
if mf90[1,i]>0.92 then ci90m92=ci90m92+1;

if avgmf[1,i]>0.95 then m95=m95+1;
if mf95[1,i]>0.95 then ci95m95=ci95m95+1;
if mf90[1,i]>0.95 then ci90m95=ci90m95+1;

if avgmf[1,i]>0.98 then m98=m98+1;
if mf95[1,i]>0.98 then ci95m98=ci95m98+1;
if mf90[1,i]>0.98 then ci90m98=ci90m98+1;

if avgrf[1,i]>0.82 then r82=r82+1;
if rf95[1,i]>0.82 then ci95r82=ci95r82+1;
if rf90[1,i]>0.82 then ci90r82=ci90r82+1;

if avgrf[1,i]>0.85 then r85=r85+1;
if rf95[1,i]>0.85 then ci95r85=ci95r85+1;
if rf90[1,i]>0.85 then ci90r85=ci90r85+1;

if avgrf[1,i]>0.87 then r87=r87+1;
if rf95[1,i]>0.87 then ci95r87=ci95r87+1;
if rf90[1,i]>0.87 then ci90r87=ci90r87+1;

if avgrf[1,i]>0.90 then r90=r90+1;
if rf95[1,i]>0.90 then ci95r90=ci95r90+1;
if rf90[1,i]>0.90 then ci90r90=ci90r90+1;

if avgrf[1,i]>0.92 then r92=r92+1;
if rf95[1,i]>0.92 then ci95r92=ci95r92+1;
if rf90[1,i]>0.92 then ci90r92=ci90r92+1;

if avgrf[1,i]>0.95 then r95=r95+1;
if rf95[1,i]>0.95 then ci95r95=ci95r95+1;
if rf90[1,i]>0.95 then ci90r95=ci90r95+1;

if avgrf[1,i]>0.98 then r98=r98+1;
if rf95[1,i]>0.98 then ci95r98=ci95r98+1;
if rf90[1,i]>0.98 then ci90r98=ci90r98+1;

if avguf[1,i]>0.82 then u82=u82+1;
if uf95[1,i]>0.82 then ci95u82=ci95u82+1;
if uf90[1,i]>0.82 then ci90u82=ci90u82+1;

if avguf[1,i]>0.85 then u85=u85+1;
if uf95[1,i]>0.85 then ci95u85=ci95u85+1;
if uf90[1,i]>0.85 then ci90u85=ci90u85+1;

if avguf[1,i]>0.87 then u87=u87+1;
if uf95[1,i]>0.87 then ci95u87=ci95u87+1;
if uf90[1,i]>0.87 then ci90u87=ci90u87+1;

if avguf[1,i]>0.90 then u90=u90+1;
if uf95[1,i]>0.90 then ci95u90=ci95u90+1;
if uf90[1,i]>0.90 then ci90u90=ci90u90+1;

if avguf[1,i]>0.92 then u92=u92+1;
if uf95[1,i]>0.92 then ci95u92=ci95u92+1;
if uf90[1,i]>0.92 then ci90u92=ci90u92+1;

if avguf[1,i]>0.95 then u95=u95+1;
if uf95[1,i]>0.95 then ci95u95=ci95u95+1;
if uf90[1,i]>0.95 then ci90u95=ci90u95+1;

if avguf[1,i]>0.98 then u98=u98+1;
if uf95[1,i]>0.98 then ci95u98=ci95u98+1;
if uf90[1,i]>0.98 then ci90u98=ci90u98+1;

end;

pvalue=j(1,7,0); pvfk95=j(1,7,0); pvfk90=j(1,7,0);

pvalue[1,1]=p82; pvfk95[1,1]=ci95p82; pvfk90[1,1]=ci90p82;
pvalue[1,2]=p85; pvfk95[1,2]=ci95p85; pvfk90[1,2]=ci90p85;
pvalue[1,3]=p87; pvfk95[1,3]=ci95p87; pvfk90[1,3]=ci90p87;
pvalue[1,4]=p90; pvfk95[1,4]=ci95p90; pvfk90[1,4]=ci90p90;
pvalue[1,5]=p92; pvfk95[1,5]=ci95p92; pvfk90[1,5]=ci90p92;
pvalue[1,6]=p95; pvfk95[1,6]=ci95p95; pvfk90[1,6]=ci90p95;
pvalue[1,7]=p98; pvfk95[1,7]=ci95p98; pvfk90[1,7]=ci90p98;

pvaluem=j(1,7,0); pvmf95=j(1,7,0); pvmf90=j(1,7,0);

pvaluem[1,1]=m82; pvmf95[1,1]=ci95m82; pvmf90[1,1]=ci90m82;
pvaluem[1,2]=m85; pvmf95[1,2]=ci95m85; pvmf90[1,2]=ci90m85;
pvaluem[1,3]=m87; pvmf95[1,3]=ci95m87; pvmf90[1,3]=ci90m87;
pvaluem[1,4]=m90; pvmf95[1,4]=ci95m90; pvmf90[1,4]=ci90m90;
pvaluem[1,5]=m92; pvmf95[1,5]=ci95m92; pvmf90[1,5]=ci90m92;
pvaluem[1,6]=m95; pvmf95[1,6]=ci95m95; pvmf90[1,6]=ci90m95;
pvaluem[1,7]=m98; pvmf95[1,7]=ci95m98; pvmf90[1,7]=ci90m98;

pvaluer=j(1,7,0); pvrf95=j(1,7,0); pvrf90=j(1,7,0);
pvaluer[1,1]=r82; pvrf95[1,1]=ci95r82; pvrf90[1,1]=ci90r82;
pvaluer[1,2]=r85; pvrf95[1,2]=ci95r85; pvrf90[1,2]=ci90r85;
pvaluer[1,3]=r87; pvrf95[1,3]=ci95r87; pvrf90[1,3]=ci90r87;
pvaluer[1,4]=r90; pvrf95[1,4]=ci95r90; pvrf90[1,4]=ci90r90;
pvaluer[1,5]=r92; pvrf95[1,5]=ci95r92; pvrf90[1,5]=ci90r92;
pvaluer[1,6]=r95; pvrf95[1,6]=ci95r95; pvrf90[1,6]=ci90r95;
pvaluer[1,7]=r98; pvrf95[1,7]=ci95r98; pvrf90[1,7]=ci90r98;
pvalueu=j(1,7,0); pvuf95=j(1,7,0); pvuf90=j(1,7,0);
pvalueu[1,1]=u82; pvuf95[1,1]=ci95u82; pvuf90[1,1]=ci90u82;
pvalueu[1,2]=u85; pvuf95[1,2]=ci95u85; pvuf90[1,2]=ci90u85;
pvalueu[1,3]=u87; pvuf95[1,3]=ci95u87; pvuf90[1,3]=ci90u87;
pvalueu[1,4]=u90; pvuf95[1,4]=ci95u90; pvuf90[1,4]=ci90u90;
pvalueu[1,5]=u92; pvuf95[1,5]=ci95u92; pvuf90[1,5]=ci90u92;
pvalueu[1,6]=u95; pvuf95[1,6]=ci95u95; pvuf90[1,6]=ci90u95;
pvalueu[1,7]=u98; pvuf95[1,7]=ci95u98; pvuf90[1,7]=ci90u98;

pfratio=p/fl;
npratio=n/p;

if bltype=1 then print, fl, " this is high level of communality", p, n;
else if bltype=2 then print, fl, " this is wide level of communality", p, n;
else if bltype=3 then print, fl, " this is low level of communality", p, n;
print pfratio,npratio;

percent={p82 p85 p87 p90 p92 p95 p98};

rowtype={mean ci90 ci95};

k3matrix=pvalue//pvfk90//pvfk95;
m3matrix=pvaluem//pvmf90//pvmf95;
r3matrix=pvaluer//pvrf90//pvrf95;
u3matrix=pvalueu//pvuf90//pvuf95;

print k3matrix[rowname=rowtype colname=percent];
print m3matrix[rowname=rowtype colname=percent];
print r3matrix[rowname=rowtype colname=percent];
print u3matrix[rowname=rowtype colname=percent];

end; **** end of do n ****;

quit;


BIBLIOGRAPHY

Archer, C. O., Jennrich, R. I. (1973). Standard errors for rotated factor loadings.
Psychometrika, 38, 581-605.

Ajrrindell, W. A., & van der Ende, J. (1985). An empirical test o f the utility o f the
observations-to-variables ratio in factor and components analysis. Applied Psychological
Measurement, 9, 165-178.

Browne, M. W. (1967). On oblique procrustes rotation. Psychometrika 33, 267-334.

Browne, M. W. (1968). A comparison o f factor analytic techniques. Psychometrika


33, 267-334.

Browne, M. W. (1972a). Orthogonal rotation to a partially specified target. British


Journal o f Mathematical and Statistical Psychology, 25, 115-120.

Browne, M. W. (1972a). Oblique rotation to a partially specified target. British


Journal o f Mathematical and Statistical Psychology. 25, 207-212.

Cattle, R. B. (1978). The scientific use o f factor analysis. New York: Plenum.

Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis. Hillsdale, NJ:
Erlbau.

Cudeck, R. (1989). Analysis o f correlation matrices using covariance structure


models. Psychological Bulletin, 105, 317-327.

Cudeck, R., & O'Dell, L. L. (1994). Applications of standard error estimation in unrestricted factor analysis: Significance tests for factor loadings and correlations. Psychological Bulletin, 115, 475-487.

Everitt, B. S. (1975). Multivariate analysis: The need for data, and other problems. British Journal of Psychiatry, 126, 237-240.


Geweke, J. F., & Singleton, K. J. (1980). Interpreting the likelihood ratio statistic in factor models when sample size is small. Journal of the American Statistical Association, 75, 133-137.

Girshick, M. A. (1939). On the sampling theory of roots of determinantal equations. Annals of Mathematical Statistics, 10, 203-224.

Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Erlbaum.

Green, B. F. (1952). The orthogonal approximation of an oblique simple structure in factor analysis. Psychometrika, 17, 429-440.

Harman, H. H. (1976). Modern factor analysis (3rd ed.). Chicago: University of Chicago Press.

Hong, S. (1999). Generating correlation matrices with model error for simulation studies in factor analysis: A combination of the Tucker-Koopman-Linn model with Wijsman's algorithm. Behavior Research Methods, Instruments, & Computers, 31, 727-730.

Howe, W. G. (1955). Some contributions to factor analysis (Report No. ORNL-1919). Oak Ridge, TN: Oak Ridge National Laboratory.

Jennrich, R. I. (1973). Standard errors for obliquely rotated factor loadings. Psychometrika, 38, 593-604.

Jennrich, R. I., & Robinson, S. M. (1969). A Newton-Raphson algorithm for maximum likelihood factor analysis. Psychometrika, 34, 111-123.

Johnson, D. E. (1998). Applied multivariate methods for data analysts. Pacific Grove, CA: Duxbury Press.

Johnson, R. A., & Wichern, D. W. (1998). Applied multivariate statistical analysis. Upper Saddle River, NJ: Prentice Hall.


Joreskog, K. G. (1967). Some contributions to maximum likelihood factor analysis. Psychometrika, 32, 443-482.

Joreskog, K. G. (1975). Factor analysis by least-squares and maximum-likelihood methods. In K. Enslein, A. Ralston, & H. S. Wilf (Eds.), Statistical methods for digital computers. New York: John Wiley.

Kaiser, H. F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187-200.

Kaiser, H. F., & Dickman, K. (1962). Sample and population score matrices and sample correlation matrices from an arbitrary population correlation matrix. Psychometrika, 27, 179-182.

Kline, P. (1994). An easy guide to factor analysis. London; New York: Routledge.

Korth, B., & Tucker, L. R. (1976). Procrustes matching by congruence coefficients. Psychometrika, 41, 531-535.

Lawley, D. N., & Maxwell, A. E. (1963). Factor analysis as a statistical method. London: Butterworths.

Lawley, D. N. (1940). The estimation of factor loadings by the method of maximum likelihood. Proceedings of the Royal Society of Edinburgh, 60, 64-82.

Lawley, D. N. (1967). Some new results in maximum likelihood factor analysis. Proceedings of the Royal Society of Edinburgh, 67A, 256-264.

Linn, R. L. (1968). A Monte Carlo approach to the number of factors problem. Psychometrika, 33, 37-71.

MacCallum, R. C., & Tucker, L. R. (1991). Representing sources of error in the common-factor model: Implications for theory and practice. Psychological Bulletin, 109, 502-511.


MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999). Sample size in
factor analysis. Psychological Methods, 4, 84-99.

Marsh, H. W., Hau, K., Balla, J. R., & Grayson, D. (1998). Is more ever too much? The number of indicators per factor in confirmatory factor analysis. Multivariate Behavioral Research, 33(2), 181-220.

McDonald, R. P. (1985). Factor analysis and related methods. Hillsdale, NJ: Erlbaum.

Mosier, C. I. (1939). Determining a simple structure when loadings for certain tests
are known. Psychometrika, 4, 149-192.

Mulaik, S. A. (1972). The foundations of factor analysis. New York: McGraw-Hill.

Nunnally, J. C. (1967). Psychometric theory. New York: McGraw-Hill.

Rao, C. R. (1955). Estimation and tests of significance in factor analysis. Psychometrika, 20, 93-111.

Schonemann, P. H. (1966). A generalized solution of the orthogonal Procrustes problem. Psychometrika, 31, 1-10.

Tanaka, J. S. (1987). "How big is big enough?": Sample size and goodness of fit in structural equation models with latent variables. Child Development, 58, 134-146.

Thurstone, L. L. (1935). The vectors of mind. Chicago: University of Chicago Press.

Thurstone, L. L. (1947). Multiple-factor analysis. Chicago: University of Chicago Press.

Tucker, L. R., Koopman, R. F., & Linn, R. L. (1969). Evaluation of factor analytic research procedures by means of simulated correlation matrices. Psychometrika, 34, 421-459.


Tucker, L. R. (1951). A method of synthesis of factor analysis studies (Personnel Research Section Report No. 984). Washington, DC: Department of the Army.

Velicer, W. F., & Fava, J. L. (1987). An evaluation of the effects of variable sampling on component, image, and factor analysis. Multivariate Behavioral Research, 22, 193-210.

Velicer, W. F., Peacock, A. C., & Jackson, D. N. (1982). A comparison of component and factor patterns: A Monte Carlo approach. Multivariate Behavioral Research, 17, 371-388.

Wijsman, R. A. (1959). Applications of a certain representation of the Wishart matrix. Annals of Mathematical Statistics, 30, 597-601.

VITA

NAME: Tian-Lu Ke

BIRTH: April 23, 1970, Taipei, Taiwan

EDUCATION: 1992 B.A. National Taiwan University. Major: Electrical Engineering.

1995 M.A. National Taiwan University. Major: Biomedical Engineering.

2001 Ph.D. University of Northern Colorado. Major: Applied Statistics and Research Methods.
