
Feature Article

Exploratory Factor Analysis as a Construct Validation Tool:
(Mis)applications in Applied Linguistics Research
HOSSEIN KARAMI
University of Tehran
Factor analysis has been frequently exploited in applied
research to provide evidence about the underlying factors in
various measurement instruments. A close inspection of a large
number of studies published in leading applied linguistics
journals shows that there is a misconception among applied
linguists as to the relative merits of exploratory factor analysis
and principal components analysis (PCA) and the kind of
interpretations that can be drawn from each method. In addition,
it is argued that the widespread application of orthogonal, rather
than oblique, rotations and also the criteria used for factor
selection are not in keeping with the findings in psychometrics. It
is further argued that the current situation is partly due to the fact
that PCA and orthogonal rotation are default options in
mainstream statistical packages such as SPSS and that the
guidebooks on such software do not provide an explanation of
the issues discussed in this article.
doi: 10.1002/tesj.176

The veracity of many of the findings in applied research
depends on the quality of the measures used. Hence, every
attempt must be made to ensure that the measures are
psychometrically sound and that the interpretation of the data
is justified. Specifically, researchers have to ascertain the validity
of their measures before any interpretations can be made.
TESOL Journal 0.0, xxxx 2014
© 2014 TESOL International Association

The emphasis on test validity is not new at all and has in fact
been at the center of psychometric discussions for close to a
century (Kane, 2013). Different techniques have been applied to
ensure test validity, ranging from simple correlation indices to
complex statistical techniques such as multidimensional item
response theory (e.g., Reckase, 2009), structural equation modeling
(e.g., Kline, 2010), and cognitive diagnostic assessment (e.g., Rupp,
Templin, & Henson, 2010). Validity inquiry has not stopped here,
and broad validation frameworks have also been offered.
Examples include the consequentialist framework of Messick
(1989); the argument-based approach by Kane (2013); and the
evidence centered design by Mislevy, Steinberg, and Almond
(2003).
Despite the emergence of a whole array of validation
techniques, not all validity studies are so comprehensive. Most
of the time, the entire validity evidence is based on a simple
statistical analysis from a cross-sectional research design. There is
of course no problem with such practice as long as two points
are considered. First, the analyses are properly done (i.e., a
suitable statistical technique is adopted and properly applied).
Second, the interpretation of the results is plausible. However, it
is usually observed that even these simple statistical analyses are
misapplied and/or the interpretation of the results is not
justified.
One of the most frequently used statistical techniques in scale
validation is exploratory factor analysis (EFA). An overview of the
studies that have used this technique, however, shows that there is
a misconception among applied researchers as to the theoretical
underpinnings of EFA and the kind of interpretations that can be
justifiably drawn. These misapplications are especially important
because, in the majority of the cases, the results of EFA represent
the only piece of available evidence in support of the validity of
the scales.
This study is an attempt to review the applications of EFA in
applied linguistics. The article provides a discussion of the
underlying principles of EFA and principal components analysis
(PCA), and the kind of decisions that researchers have to make in
any factor analysis. A critical overview of a large number of
studies that have used EFA and PCA is then provided in order to
depict the misunderstandings that prevail.

FACTOR ANALYSIS
Factor analysis (FA) refers to a body of multivariate analytical
techniques that aim to extract, from a set of observed variables, a
smaller number of underlying variables, or factors, that explain
most of the variance. In other words, FA reduces a large number
of variables to a manageable number of factors.
There are two types of factor analysis: exploratory factor
analysis and confirmatory factor analysis (CFA). Both are based on
the common factor model. EFA is applied in settings where there is
no rigorous prior research on the nature of constructs measured by
a particular instrument. CFA, on the other hand, is mainly
exploited to confirm, or disconfirm, the underlying structure of a
measure that has been suggested by a particular theory or a large
body of research. Specifically, its application is more justified in
situations where
the researcher has some knowledge of the underlying latent
variable structure. Based on knowledge of the theory, empirical
research, or both, he or she postulates relations between the
observed measures and the underlying factors a priori and then
tests this hypothesized structure statistically. (Byrne, 2010, p. 6)

Due to space limitations, only EFA will be discussed in this
article.
In any factor analysis, researchers have to make a number of
important decisions about the suitability of FA for their study or,
alternatively, the suitability of their data for factor analysis, the
method of factor extraction, the procedures for factor selection,
and finally factor rotation. An overview of the fundamental
considerations at each stage is provided next. This will lay the
ground for a deeper appreciation of the plausibility of factor
analyses conducted in applied linguistics.
Design Considerations
The first step in factor analysis, as in almost all other statistical
analyses, is to evaluate the appropriateness of the particular
statistical technique for the purposes of the study. Specifically,
researchers should see whether the specific statistical technique
they have in mind is the most effective method to answer their
research questions. Furthermore, the outcome of any statistical
analysis depends on the quality of the input data (e.g., sample
size, quality of the measurement instruments used in data
collection). It is the quality of the data, not statistical analysis per
se, that determines the plausibility of the interpretations made
from the study. So it is of extreme importance that researchers
check the suitability of their data and the selected methods of data
analysis before they embark on statistical analysis.
The issue of appropriate sample size for EFA has been widely
discussed. Many rules of thumb have been suggested for
determining the sample size. Comrey and Lee (1992), for example,
suggest that 50 cases is very poor, 100 is poor, 200 is fair, 300 is
good, and 500 or more is very good. Others have suggested rules
based on the ratio of cases per variable. Gorsuch (1983) suggests a
ratio of 5 cases per variable, whereas Nunnally (1978) suggests a
ratio of at least 10 to 1. Although some rules of thumb have been
suggested for determining the appropriate sample size, research
has shown that they are prone to variation when other aspects of
the analysis change (Fabrigar, Wegener, MacCallum, & Strahan,
1999; Gorsuch, 1983; Hogarty, Hines, Kromrey, Ferron, &
Mumford, 2005; Kahn, 2006; MacCallum, Widaman, Zhang, &
Hong, 1999). Despite these variations, however, Kahn (2006)
suggests a minimum sample size of 300 to be confident of the
results. The results of a Monte Carlo study by MacCallum et al.
(1999) indicate that a sample size between 100 and 200 is
needed for variables with communalities around 0.5. Lower
communalities required sample sizes in excess of 300 cases.
Note that the majority of the suggestions for acceptable sample
size are based on Monte Carlo simulation studies. In such studies,
data are generated from known parameters (i.e., communalities).
The simulated data are then submitted to FA and the results are
compared with the initial parameters. By changing various
features of the data, the effect of different conditions on the
outcome of FA is then examined.
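To make this concrete, the logic of such a simulation can be sketched in a few lines of Python. The model, sample size, and loading values below are invented for illustration and are not taken from any of the studies cited:

```python
# Illustrative Monte Carlo sketch: generate data from a known
# one-factor model, extract one factor, and compare the recovered
# loadings with the generating parameters.
import numpy as np

rng = np.random.default_rng(0)

true_loadings = np.array([0.7, 0.7, 0.6, 0.8, 0.6, 0.7])  # communalities near .5
n_cases = 200

# Observed scores: x = loading * common factor + unique error.
factor = rng.standard_normal(n_cases)
unique_sd = np.sqrt(1 - true_loadings**2)
data = (np.outer(factor, true_loadings)
        + rng.standard_normal((n_cases, len(true_loadings))) * unique_sd)

# Extract one factor from the sample correlation matrix.
corr = np.corrcoef(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
est_loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])  # first principal axis
est_loadings *= np.sign(est_loadings.sum())           # fix arbitrary sign

# Tucker's congruence coefficient between true and estimated loadings;
# with n = 200 and communalities near .5, it is typically very close to 1.
congruence = est_loadings @ true_loadings / (
    np.linalg.norm(est_loadings) * np.linalg.norm(true_loadings))
print(round(congruence, 3))
```

Repeating this across many simulated samples, while varying the sample size and communalities, is exactly how the sample-size recommendations above were derived.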
Although the suggestion for an appropriate sample size has not
been uniform, it appears that more dependable estimates may be
obtained with larger sample sizes. In addition, it has been
indicated that EFA performs well when the communalities among
the variables are high (Fabrigar et al., 1999; Russell, 2002) and the
factors are overdetermined (i.e., there is an adequate number of
items for each factor; de Winter, Dodou, & Wieringa, 2009). With
respect to the number of variables to be included, Mundfrom,
Shaw, and Ke (2005) suggest a minimum of 7 variables for each
factor. Their results also indicate that, with a minimum of 7
variables for each factor, the required minimum sample size was
always less than 180 even when communalities were low.
However, with a low variables-to-factors ratio of 3 and when the
communalities are low, the required sample size was at least
1,200.
It should also be noted that the sample should be
representative of the population to which the researchers wish to
generalize. That is, the sample should be very similar to the target
population. The results from a nonrepresentative sample would be
useless regardless of sample size. So researchers should aim for a
large and representative sample in doing FA, select variables with
high communalities, and include enough variables for each factor
such that the factors are overdetermined. Needless to say, there
are other design considerations when running a factor analysis.
Explaining all these is beyond the scope of this article. For a
detailed discussion of these, see Tabachnick and Fidell (2007).
Factor Extraction
After the researchers have ensured the suitability of their data for
EFA, they should decide on a method for extracting factors. There
are a number of factor extraction methods, among which PCA and
principal axis factoring (PAF) are the most frequently used
(Thompson, 2004). PAF is based on the common factor model;
PCA is not.
In the common factor model, it is hypothesized that a number
of latent variables, or factors, underlie a set of observed variables,
or indicators, and are responsible for the covariation, or common
variance, among these variables. Specifically, it is assumed that the
observed measures are intercorrelated because they share a
common cause (i.e., they are influenced by the same underlying
construct); if the latent construct was partialed out, the
intercorrelations among the observed measures would be zero
(T. A. Brown, 2006, p. 13).
The common factor model is based on the assumption that
each observed variable is a linear function of one or more common
factors and one unique factor. The common factors are responsible
for the intercorrelations among the indicators. The amount of
variance in the observed variables that is explained by the
underlying factors is captured in the communalities. The
unique variance, on the other hand, is the variance that is not
shared and that is uniquely owned by the particular indicators.
Part of unique variance is systematic variance that is not shared
with other indicators. This is called specific variance. The rest of
unique variance is random variance that is usually considered to
be measurement error or unreliability (Gorsuch, 1983). Because the
purpose of the common factor model-based extraction methods
such as PAF is to obtain a number of common factors underlying a
set of indicators, communalities are put in the diagonal of the
matrix of associations (i.e., the correlation or covariance matrices).
That is, not all variance among the indicators is to be explained.
This way, EFA takes account of the error in the indicators. In other
words, indicators are not assumed to be free of error.
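The role of the communalities can be illustrated with a minimal, purely illustrative sketch of principal axis factoring; the function name and its defaults are my own, not a standard API:

```python
import numpy as np

def principal_axis(corr, n_factors, n_iter=50):
    """Minimal iterated principal axis factoring (PAF).

    Unlike PCA, the diagonal of the correlation matrix is replaced by
    communality estimates, so only common variance is analyzed.
    """
    R = corr.copy()
    # Initial communality estimates: squared multiple correlations.
    h2 = 1 - 1 / np.diag(np.linalg.inv(corr))
    for _ in range(n_iter):
        np.fill_diagonal(R, h2)            # the "reduced" correlation matrix
        eigvals, eigvecs = np.linalg.eigh(R)
        top = np.argsort(eigvals)[::-1][:n_factors]
        loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0, None))
        h2 = (loadings**2).sum(axis=1)     # updated communalities
    return loadings, h2

# Demo: the correlation matrix implied by one factor with loadings of .7
# (off-diagonal correlations of .49, communalities of .49).
corr = np.full((6, 6), 0.49)
np.fill_diagonal(corr, 1.0)
loadings, h2 = principal_axis(corr, n_factors=1)
```

Replacing `np.fill_diagonal(R, h2)` with unities and skipping the iteration would turn this into PCA: all of the variance, error included, would then be decomposed.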
A closely related factor extraction technique is PCA. Despite
the computational similarities between PCA and EFA, the two
techniques are conceptually different. The diagonal of the
correlation matrix in PCA includes unities rather than
communalities, which is the same as assuming that the scores
from the indicators are perfectly reliable (Thompson, 2004). In
other words, PCA does not distinguish between common and
unique variance and does not take measurement error into account
(Fabrigar et al., 1999). The outcome of PCA is components rather
than factors. Each component is "a weighted sum of the test
scores" (Bartholomew, 2004, p. 71). These components cannot be
interpreted in the same way that factors are interpreted. As
indicated earlier, factors may be interpreted as common causes
that bring about the shared variance among the indicators.
Components, on the other hand, are sum scores that are brought
about by the indicators. The purpose of PCA appears to be
completely different from EFA:
The purpose of common factor models is to understand the
latent (unobserved) variables that account for relationships
among measured variables; the goal of PCA is simply to reduce
the number of variables by creating linear combinations that
retain as much of the original measures' variance as possible
(without interpretation in terms of constructs).
(Conway & Huffcutt, 2003, p. 150)

It appears that components are like total raw scores that are the
sum of item responses. As Borsboom (2006) puts it, principal
component scores "are caused by their indicators in much the
same way that sumscores are caused by item scores" (p. 426).
Researchers are no more justified in interpreting components as
factors or latent variables than they are in interpreting total raw
scores in this way. Although this point has been reiterated many
times in the psychometrics literature (e.g., Bartholomew, 2004;
Borsboom, 2006; Fabrigar et al., 1999; Kahn, 2006; Mulaik, 1990),
it is generally ignored in applied linguistics research.
It should be pointed out here that some researchers (e.g.,
Velicer & Jackson, 1990; Wilkinson, 1989) have argued in favor of
PCA because, in certain conditions, the results of PCA and EFA
are highly similar. Others (e.g., Borgatta, Kercher, & Stull, 1986;
Gorsuch, 1990; Hubbard & Allen, 1987; Snook & Gorsuch, 1989)
have argued that PCA's results do not resemble the results of EFA
in many conditions. This led Gorsuch (1990) to argue that "if
common factor analysis produces more sensible results than
component analysis in some cases and produces the same in other
cases, there seems little advantage to recommending the special
case component analysis over the general case common factor
analysis" (p. 34). Even in those situations where the results of PCA
and EFA are similar, researchers should not ignore the conceptual
differences between the two techniques and the difference
between factors and components as explained above. Gorsuch
concludes that
common factor analysis should be routinely applied as the
standard analysis because it recognizes we have error in our
variables, gives unbiased instead of inflated loadings, and is more
elegant as a part of the standard model used in univariate and
multivariate analysis. (p. 39)

There are other extraction methods such as maximum likelihood
(ML) that have their own advantages. Only PCA and PAF are
discussed here. For an overview of other extraction methods, see
Gorsuch (1983).
Factor Selection
Regardless of the type of factor extraction used, it is the researcher
who must finally pass judgment as to the most plausible
number of factors to be selected. Researchers usually draw on
prior theory in deciding on the number of factors to retain. They
may also consider the percentage of variance explained by each
factor. Although prior research and theoretical issues play an
important role in FA, these procedures are rather subjective (Kahn,
2006). There are, however, a number of guiding principles that
make the process of factor selection more objective. Before
discussing these criteria, note that when ML extraction is applied,
other criteria are usually considered to determine the plausibility
of the model. For a review, see T. A. Brown (2006).
Some suggested procedures are based on eigenvalues.
Eigenvalues are indices of the amount of variance explained by
each factor. The eigenvalues of a correlation matrix sum to the
number of indicators. Therefore, an eigenvalue with a value of 1
explains the variance of a single observed variable. There are three
factor selection rules based on the eigenvalues. Kaiser's (1974)
criterion requires that only components with eigenvalues above 1
be selected. The logic of the criterion is that a component should
not explain less variance than a single indicator. It should be noted here that the
criterion is appropriate only in PCA. This is because the
communalities rather than the variances are put in the diagonal of
the correlation matrix in PAF (Fabrigar et al., 1999). Due to the
simplicity of the rule, however, this procedure has been widely
applied by researchers.
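As an illustration (the function and simulated data below are invented, not drawn from the article), Kaiser's rule amounts to counting eigenvalues of the full correlation matrix:

```python
import numpy as np

def kaiser_count(data):
    """Kaiser's criterion: the number of eigenvalues of the full
    correlation matrix (unities on the diagonal) that exceed 1."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))
    return int((eigvals > 1).sum())

# Demo: simulated responses driven by a single common factor.
rng = np.random.default_rng(1)
factor = rng.standard_normal(500)
data = (np.outer(factor, [0.7] * 6)
        + rng.standard_normal((500, 6)) * np.sqrt(1 - 0.49))
n_kept = kaiser_count(data)
```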
Another eigenvalue-based method for deciding on the
number of factors is Cattell's (1966) scree plot. An example
scree plot is displayed in Figure 1. Note that the eigenvalues
are plotted against the factors. Factor selection is guided by the
last break or change of slope in the plot. In this plot, for
example, there is a large break between the first and the
second factors. So it may be assumed that only one factor
would be selected. The problem with the scree plot is that the
results may appear to be ambiguous, and this may render
factor selection completely subjective.
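The break a scree plot shows visually can also be inspected numerically as the successive drops between sorted eigenvalues; the following sketch uses invented data:

```python
import numpy as np

def eigenvalue_drops(data):
    """Eigenvalues in descending order (what a scree plot displays) and
    the successive drops between them; the last large drop corresponds
    to the visual break used for factor selection."""
    eigvals = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    return eigvals, -np.diff(eigvals)

# Demo: with one strong factor, the large drop is after the first eigenvalue.
rng = np.random.default_rng(2)
factor = rng.standard_normal(400)
data = (np.outer(factor, [0.7] * 6)
        + rng.standard_normal((400, 6)) * np.sqrt(1 - 0.49))
eigvals, drops = eigenvalue_drops(data)
```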
Another factor selection procedure based on eigenvalues is
parallel analysis (Horn, 1965). A parallel analysis proceeds by
generating eigenvalues from random data based on the same
number of indicators and the same number of cases. These
eigenvalues are then compared to the eigenvalues produced from
the actual data. Factors with eigenvalues larger than the
eigenvalues obtained from the random data will be retained.
The justification for parallel analysis is that each selected factor
should logically explain more variance than is expected by chance
(T. A. Brown, 2006).
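A minimal sketch of parallel analysis, assuming normally distributed random comparison data (the defaults and simulated data are illustrative, not a reproduction of any package's routine):

```python
import numpy as np

def parallel_analysis(data, n_sims=100, seed=0):
    """Horn's (1965) parallel analysis, in sketch form.

    Eigenvalues from the real correlation matrix are compared with the
    mean eigenvalues of random normal data of the same shape; factors
    whose eigenvalues exceed the random baseline are retained.
    """
    rng = np.random.default_rng(seed)
    n_cases, n_vars = data.shape
    real = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    sims = np.empty((n_sims, n_vars))
    for i in range(n_sims):
        noise = rng.standard_normal((n_cases, n_vars))
        sims[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    baseline = sims.mean(axis=0)           # chance-level eigenvalues
    return int((real > baseline).sum())

# Demo: data generated from a single factor should yield one factor.
rng = np.random.default_rng(3)
factor = rng.standard_normal(400)
data = (np.outer(factor, [0.7] * 6)
        + rng.standard_normal((400, 6)) * np.sqrt(1 - 0.49))
n_factors = parallel_analysis(data)
```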

Figure 1. A scree plot



From among the eigenvalue-based factor selection procedures,
many researchers (e.g., Dinno, 2009; Hayton, Allen, & Scarpello,
2004; Henson & Roberts, 2006; Kahn, 2006; Schmitt, 2011; Zwick &
Velicer, 1986) have pointed out that parallel analysis is the most
effective. Others (e.g., Conway & Huffcutt, 2003; Fabrigar et al.,
1999) have pointed out that a combination of these techniques
should be applied. The satisfactory performance of parallel
analysis has led some journals such as Educational and Psychological
Measurement to require that parallel analysis be included in studies
reporting FA (Thompson & Daniel, 1996).
Factor Rotation
The results of an initial FA are rotated to render the results more
interpretable. This is because for any given factor solution, there
are an infinite number of ways that the factor axes can be rotated
in the multidimensional space (Fabrigar et al., 1999). Rotation
refers to "changing the reference points for the variables" (Kahn,
2006, p. 693). The reference points are rotated such that simple
structure is obtained, which makes the interpretation of the factors
simpler.
There are two types of factor rotation: orthogonal and oblique.
In orthogonal rotation, factors are rotated such that the correlation
between any two factors equals zero. That is, in orthogonal
rotation, factors are forced to be uncorrelated (Conway &
Huffcutt, 2003). In oblique rotation, on the other hand, the factors
are allowed to be correlated. There are a number of rotation
techniques within each category. Varimax and quartimax are
examples of orthogonal rotation. Oblique rotation, on the other
hand, includes such techniques as promax, quartimin, direct
oblimin, and orthoblique.
Orthogonal rotation may lead to biased results when the
underlying factors are in fact correlated (T. A. Brown, 2006). On
the other hand, correlated factors do not pose any problems for
oblique rotation. Therefore, it is generally preferred over
orthogonal rotation (T. A. Brown, 2006; Conway & Huffcutt, 2003;
Fabrigar et al., 1999; Ford, MacCallum, & Tait, 1986; Gorsuch,
1997). T. A. Brown (2006) sums it up neatly:
Thus, in most cases, oblique rotation is preferred because it
provides a more realistic representation of how factors are
interrelated. If the factors are in fact uncorrelated, oblique
rotation will produce a solution that is virtually the same as one
produced by orthogonal rotation. If the factors are interrelated,
however, oblique rotation will yield a more accurate
representation of the magnitude of these relationships. In
addition, estimation of factor correlations provides important
information such as the existence of redundant factors or a
potential higher-order structure. (p. 32)

Conway and Huffcutt (2003) go even further to suggest that there
is no reason to use orthogonal rotations if oblique rotations can
provide better results in some cases and similar results in the rest
of the cases.
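For readers curious about what rotation does mechanically, the following is a minimal varimax (orthogonal) implementation; oblique methods such as promax then relax the orthogonality constraint. This is an illustrative sketch, not a reproduction of any package's routine:

```python
import numpy as np

def varimax(loadings, tol=1e-8, max_iter=500):
    """Minimal varimax rotation: finds an orthogonal rotation that
    maximizes the variance of squared loadings within each factor,
    pushing the solution toward simple structure."""
    L = np.asarray(loadings, dtype=float)
    n, k = L.shape
    R = np.eye(k)                      # accumulated rotation matrix
    prev = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - Lr @ np.diag((Lr**2).sum(axis=0)) / n))
        R = u @ vt                     # R stays orthogonal
        if s.sum() - prev < tol:
            break
        prev = s.sum()
    return L @ R

# Demo: an unrotated, invented two-factor loading matrix.
L = np.array([[0.8, 0.2], [0.7, 0.3], [0.6, 0.25],
              [0.2, 0.8], [0.25, 0.7], [0.3, 0.6]])
rotated = varimax(L)
# Because the rotation is orthogonal, each variable's communality
# (row sum of squared loadings) is unchanged.
```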

METHOD
Selection of the Articles
For the purposes of the present study, all articles appearing
between January 1990 and December 2011 were selected from 10
high-quality journals in applied linguistics: Applied Linguistics,
Foreign Language Annals, Language Awareness, Language Learning,
Language Teaching Research, Language Testing, Modern Language
Journal, Studies in Second Language Acquisition, System, and TESOL
Quarterly. Unfortunately, we had to discard a number of articles
from further analysis because of the ambiguities involved in
reporting the results of the factor analysis. These articles provided
no details on any aspect of FA and merely stated in passing that FA
was conducted. One could glean virtually nothing from such reports
as to the factor extraction, selection, and rotation procedures. The
total number of articles analyzed in this study was 111.
Procedure
All the articles that had used FA were from the 10 leading journals
in applied linguistics. These articles were closely examined with
respect to four aspects of their analyses: design considerations
such as the factorability of the matrix of associations and the
sample size, factor extraction, factor selection, and factor rotation.
The results of each analysis are reported in the next section.
RESULTS AND DISCUSSION


Design Considerations
The first aspect of the studies to be examined was the sample size.
The results are reported in Table 1. It appears from the table that
only 35% of the studies had sample sizes of more than 300 cases.
Also, 26% of the studies had sample sizes of less than 100. What is
more striking is the studies with sample sizes as low as 25.
Hadden (1991), for example, administered a 24-item questionnaire
to two groups: a group of second language teachers and a group
of nonteachers. The sample sizes for these two groups were 25 and
32, respectively. The sample size-to-variable ratio is around 1 for
these two groups! As a more recent example, Wang, Spencer, and
Xing (2009) gave a three-part questionnaire to 45 students. Each
part was then factor-analyzed. One part of the questionnaire had
26 items. With 45 students, this would make a ratio of fewer than
2 cases per variable. Such a sample size is not in keeping with any
guidelines suggested so far.
Mundfrom et al. (2005) have provided evidence from
simulation studies that when the variables-to-factors ratio is 6, and
with low levels of communality, the minimum required sample
size is 260. However, if the ratio is decreased to 4, then the sample
size must be at least 1,400. Therefore, it appears that one way of
increasing the dependability of the results is to increase the
number of variables. The average variables-to-factors ratio for the
articles examined in this study was 4.83. With such a ratio,
Mundfrom et al. argued that when the communalities are not
uniformly high, the minimum required sample size for a
four-factor solution would be at least 300 (when communalities are
high) and 800 (when communalities are low). I would not hesitate
to say that these indices and the average variables-to-factors ratio

TABLE 1. Distribution of Sample Sizes in the Articles

Sample size    0–100    100–200    200–300    Beyond 300
N              29       22         21         39
%              26       20         19         35

for the articles examined in the present study should be
interpreted cautiously. However, the fact that only 35% of the
articles I have reviewed had sample sizes larger than 300 may
be a cause for concern.
It should be noted here that when calculating the average
variables-to-factors ratio, I excluded two studies because they were
not typical of the rest of the articles, and the results would be
distorted. In the first study, Sinharay et al. (2009) had run EFA on
a test with 100 items to come up with only one factor. In the
second study, Beglar and Hunt (1999) used PCA to assess
the dimensionality of the 2000 Word Level Test. The test had 55
items and the authors came up with one component.
Another important point in any factor analysis, like other
statistical analyses, is to check for the assumptions. Of the 111
articles examined, only 22 (20%) explicitly stated that the
factorability of the matrix of associations was checked, employing
either the Kaiser-Meyer-Olkin measure of sampling adequacy
(Kaiser, 1974) or Bartlett's test of sphericity (Bartlett, 1954). It has
been frequently argued that violations of the assumptions may
lead to biased or undependable results. The fact that so many
studies do not provide any details on whether the assumptions
have been met may indicate a lack of concern for such
assumptions. The majority of the articles we examined did not
provide a detailed description of the steps taken toward ensuring
that the assumptions hold. All in all, almost half the articles had
sample sizes below 200 and only 20% of the articles reported that
they had checked for the assumptions.
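As an illustration of what such a factorability check computes, here is a sketch of Bartlett's (1954) test of sphericity (the KMO measure could be coded along similar lines); the function and data below are illustrative:

```python
import numpy as np

def bartlett_sphericity(data):
    """Bartlett's test of sphericity: tests whether the correlation matrix
    is an identity matrix, in which case the variables share no common
    variance and factoring is pointless. Returns the chi-square statistic
    and its degrees of freedom; the p-value can then be read from a
    chi-square distribution."""
    n, p = data.shape
    _, logdet = np.linalg.slogdet(np.corrcoef(data, rowvar=False))
    chi_square = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) / 2
    return chi_square, df

# Demo: strongly correlated (factorable) data give a large statistic.
rng = np.random.default_rng(4)
factor = rng.standard_normal(300)
data = (np.outer(factor, [0.7] * 6)
        + rng.standard_normal((300, 6)) * np.sqrt(1 - 0.49))
chi_square, df = bartlett_sphericity(data)
```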
Factor Extraction
The next aspect of the FA studies to be examined was the method
of factor extraction exploited. The results are reported in Table 2.
It is evident that a large number of studies applied PCA rather
than EFA. Specifically, PCA was exploited in 71 studies (64%).
Among the remaining studies, 17 applied PAF and 10 used
maximum likelihood. Alpha factoring was used in only one study.
The rest of the studies (12) did not specify their extraction method.
The frequent use of PCA in applied linguistics is a cause for
concern in the face of the kind of interpretations offered. This is

TABLE 2. Distribution of Extraction Methods in the Articles

Method    PCA    PAF    ML    Alpha    Not specified
N         71     17     10    1        12
%         64     15     9     1        11

not to argue that all applications of PCA are wrong. Components


may be used as scores in a subsequent analysis. As discussed
earlier, however, interpreting such components as underlying
factors is a mistake. Yet an inspection of the articles examined in
this study clearly indicates that applied linguistics has not caught
up with the developments in psychological and educational
assessment. A few examples are provided to depict the kind of
interpretations that are usually offered for PCA.
Csizér and Lukács (2010) used PCA for factor extraction in
their study and used the "latent" terminology in their interpretation
of the results:
In order to identify broader dimensions underlying the
attitudinal/motivational variables measured by the questionnaire,
we submitted the items belonging to the specific scales to
principal component analysis (conducting separate analyses for
the English and German languages). The statistical characteristics
of the latent dimensions for the two languages were similar and
sufficient to conclude that except for the scale of Ought-to self,
the motivational characteristics of Hungarian language learners
of both English and German could be described with the
hypothesized latent dimensions. (pp. 5–6)

Such statements prevail in applied linguistics research. They
appear to be the norm rather than the exception. Huang (2011)
also interpreted the results of a PCA in order to provide evidence
for the validity of trait variables. In discussing the results of a PCA,
Clark and Schleef's (2010) abundant use of the "trait" terminology is
interesting:
These personality trait adjectives have been grouped into three
underlying traits or "supervariables" labeled "status/power,"
"social attractiveness," and "solidarity." These underlying
evaluative dimensions were arrived at after the data were
subjected to the data reduction technique of principal components
analysis (PCA). PCA allows the researcher to examine the
underlying structure of the data and to identify relationships
between the adjectives in the semantic differential scale. (p. 308)

Still another example is Shimizu and Green (2002), who propose
that PCA "makes it possible to elucidate better the fundamental
trait underlying item intercorrelation" (p. 231).
It is evident that such interpretations of the results of PCA are
ubiquitous among applied linguists. The quotes provided here are
representative of a large number of studies that have used PCA. There
is little theoretical justification for such interpretations, as we have
seen. PCA results do not indicate anything about the underlying
factors in a set of indicators. EFA should be used for such
purposes.
Factor Selection
A large number of studies selected their factors based on
theoretical criteria (see Table 3). Kaiser's (1974) criterion was
applied in 58 studies (52%), and the scree plot in 35 studies (31%).
On the other hand, only 3 studies reported that they based their
analysis on parallel analysis (PA). Furthermore, 25 studies (22.5%)
reported to have applied both the scree plot and Kaiser's
criterion. Fabrigar et al. (1999), among others, urge researchers to
use multiple criteria in factor selection. Among the articles
examined in the present study, only 1 study used a combination
of Kaiser's criterion, the scree plot, and PA to decide on the
number of factors to select. In addition, 9 studies that had used
PAF for factor extraction reported to have used Kaiser's
criterion for factor selection. As explained earlier, this is not
appropriate because Kaiser's criterion is suitable for PCA, not
EFA.
TABLE 3. Distribution of Factor Selection Criteria

Criterion                      N     %
Kaiser's criterion             58    52
Scree plot                     35    32
Both Kaiser and scree plot     25    22.5
Parallel analysis              3     2.8
All three                      1     0.9
Not specified                  35    32


Factor Rotation
The relative frequency of each factor rotation technique is
displayed in Table 4. The vast majority of studies used an
orthogonal rotation (57%). Among the orthogonal rotation
techniques, varimax rotation was by far the most widely exploited
method. On the other hand, only 27% of the studies applied
oblique rotation. Note also that 9% of the studies did not specify
which rotation they applied and 7% stated that they applied both.
A striking outcome of the inspection of the studies was the
confusion in terminology among researchers. Green and Oxford
(1995), for example, state, "We used a 9-factor varimax (oblique)
factor analytic solution" (p. 272), although varimax is an
orthogonal rotation. Another example is the following statement
from Hedgcock and Lefkowitz (1996): "Because the feedback
factors in the study were presupposed to be correlated, a varimax
oblique rotation was applied, allowing for the examination of
common as well as unique variance" (p. 300). The justification
provided clearly fits an oblique rotation. Apparently, the authors
confused varimax rotation with oblique rotation.
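The terminological slip matters because varimax is orthogonal by construction: it searches only over orthogonal rotation matrices, so the rotated factors remain uncorrelated no matter what the data suggest. This is visible in a compact NumPy sketch of the standard varimax algorithm (illustrative only; this is not the SPSS implementation, and the names are my own):

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Kaiser's varimax criterion: find an orthogonal rotation R that
    maximizes the variance of the squared loadings (simple structure)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        LR = L @ R
        # SVD step of the standard iterative varimax algorithm
        grad = L.T @ (LR ** 3 - LR @ np.diag((LR ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt          # product of orthogonal matrices: orthogonal
        if s.sum() < crit * (1 + tol):
            break
        crit = s.sum()
    return L @ R, R
```

Because R comes from a singular value decomposition, R.T @ R is always the identity, so a "varimax oblique rotation" is a contradiction in terms; oblique methods such as promax or direct oblimin drop this orthogonality constraint and estimate the factor correlations instead.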
A possible reason for the prevalence of PCA and varimax
rotation among applied researchers may be the fact that these
methods are the default options in commonly used statistical
packages such as SPSS. Although these user-friendly software
packages are of much value in data analysis, the blind application
of statistical techniques without a concern for their theoretical
underpinnings is an unfavorable side effect.
Another reason for the widespread application of PCA and
orthogonal rotation is that they are the suggested methods in
commonly used books on SPSS, such as those by Pallant (2007)
and Field (2009). In fact, a number of studies refer to these books
to justify their use of PCA or orthogonal rotation. Whereas Field
hints at the arguments over the relative merits of PCA and EFA in
his book, Pallant makes no reference at all and simply suggests
PCA followed by varimax rotation.

TABLE 4. Distribution of Rotation Methods in the Articles

Method            N      %
Orthogonal       63     57
Oblique          30     27
Both              8      7
Not specified    10      9

CONCLUSION
An overview of the factor analytic studies in applied linguistics
research indicates that a misconception prevails among applied
linguists as to the relative merits of EFA and PCA and the kind of
interpretations drawn from each method. It was explained that
while EFA aids researchers by revealing the underlying structure
of measures commonly used in practical research, PCA is only a
data reduction tool. The components do not provide any
information about the underlying dimensions in a measure. They
are simply weighted sum-scores like the total raw score. An
overview of the research studies in leading applied linguistic
journals indicates, however, that such an important distinction is
usually overlooked and researchers routinely apply PCA for
construct validation purposes and draw interpretations that are
sometimes unwarranted.
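The claim that components are weighted sum-scores, not estimates of a latent variable, can be verified in a few lines. In the hypothetical NumPy sketch below (simulated data, no real instrument), the first principal component score for each respondent is literally a weighted sum of that respondent's centered observed scores:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))   # 200 "examinees", 5 observed variables
Xc = X - X.mean(axis=0)             # center each variable

# PCA via an eigendecomposition of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
w = eigvecs[:, -1]                  # weights defining the first component

scores = Xc @ w                     # first-component "scores"

# Each score is nothing more than a weighted sum of the person's
# observed (centered) values -- like a weighted total raw score
manual = sum(w[j] * Xc[0, j] for j in range(5))
print(np.allclose(scores[0], manual))   # True
```

Nothing in this computation posits an underlying dimension or separates common from unique variance; that is precisely what distinguishes PCA from the common factor model.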
It was also argued that the extensive application of orthogonal
rotation techniques such as varimax will lead to distorted results
in cases where the extracted factors or components are in fact
correlated, and yet orthogonal rotation remains the dominant
choice in applied linguistics.
Furthermore, the Kaiser criterion and the scree plot are the
most widely exploited criteria in factor selection decisions. Only
three studies applied parallel analysis, which has received stronger
support from previous research. Despite frequent calls for
exploiting multiple criteria in making factor selection decisions,
only one study appeared to use a combination of the three
eigenvalue-based criteria in factor selection.
The wide application of the Kaiser criterion and the scree plot,
despite their shortcomings, may also be partly due to the fact that
they are readily available in software such as SPSS. Unfortunately,
PA is not built into this software, and this may be an impediment
to its wider adoption by applied researchers.
The current trend in the use of PCA and orthogonal
rotation techniques may be partly due to the fact that they are the
default options in established statistical packages such as SPSS.
Despite their many advantages, a side effect of such software is
the blind application of the default options without an
understanding of their theoretical underpinnings. Borsboom (2006)
neatly sums it up:
The reason that, say, Cronbach's alpha and principal components
analysis are so popular in psychology is not that these techniques
are appropriate to answer psychological research questions, or
that they represent an optimal way to conduct analyses of
measurement instruments. The reason for their popularity is that
they are default options in certain mouse-click sequences of
certain popular statistics programs. Since psychologists are
monogamous in their use of such software . . . there is little
chance of convincing them to use a model – any model – that is
not clickable in the menus of major statistical programs. (p. 433)

Gorsuch (1990) takes a similar position:

The continued use of components is primarily the result of
decisions made when there were problems computing common factor
analysis which no longer exist and the continuation of its being
a ready default on computer programs designed during an earlier
era. (p. 39)

Part of the problem also pertains to the fact that commonly
used manuals for these software programs, such as Pallant (2007)
and Field (2009), recommend the use of PCA and orthogonal
rotation. In addition, texts of the "how to do statistics" sort in
applied linguistics have not always been discriminating with
respect to the issues discussed here. For example, in his
manuscript on statistics, J. D. Brown (1992) states: "When
searching out underlying variables (mostly among interval scales)
that are linearly related, principal components analysis and factor
analysis would be appropriate" (p. 651, italics in original).
Due to their background in the humanities, applied linguists do
not usually reach for texts on statistics that are mathematically
oriented. Unfortunately, the books on research and statistics that
are especially tailored to such readers hardly ever discuss the
intricacies involved in statistical reasoning. More discussion is
needed to make applied researchers aware of the consequences of
the unwise application of the default options in statistical
packages.

THE AUTHOR
Hossein Karami is a lecturer at the University of Tehran. His main
area of interest is language testing and assessment. He has
published in various international journals, including Educational
Research and Evaluation, RELC Journal, Psychological Test and
Assessment Modeling, and Asia-Pacific Education Review.

REFERENCES
Bartholomew, D. J. (2004). Measuring intelligence: Facts and fallacies.
Cambridge, England: Cambridge University Press.
Bartlett, M. S. (1954). A note on the multiplying factors for various
chi square approximations. Journal of the Royal Statistical Society:
Series B, 16, 296–298.
Beglar, D., & Hunt, A. (1999). Revising and validating the 2000
word level and the university word level vocabulary tests.
Language Testing, 16(2), 131–162. doi:10.1177/026553229901600202
Borgatta, E. F., Kercher, K., & Stull, D. E. (1986). A cautionary
note on the use of principal components analysis. Sociological
Methods and Research, 15, 160–168. doi:10.1177/0049124186015001011
Borsboom, D. (2006). The attack of the psychometricians.
Psychometrika, 71, 425–440. doi:10.1007/s11336-006-1447-6
Brown, J. D. (1992). Statistics as a foreign language, Part 2: More
things to look for in reading statistical language studies. TESOL
Quarterly, 26, 629–664. doi:10.2307/3586867
Brown, T. A. (2006). Confirmatory factor analysis for applied research.
New York, NY: Guilford Press.
Byrne, B. M. (2010). Structural equation modeling with AMOS: Basic
concepts, applications, and programming. New York, NY: Routledge.
Cattell, R. B. (1966). The scree test for the number of factors.
Multivariate Behavioral Research, 1(2), 245–276.
Clark, L., & Schleef, E. (2010). The acquisition of sociolinguistic
evaluations among Polish-born adolescents learning English:
Evidence from perception. Language Awareness, 19, 299–322.
doi:10.1080/09658416.2010.524301
Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis
(2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Conway, J. M., & Huffcutt, A. I. (2003). A review and evaluation of
exploratory factor analysis practices in organizational research.
Organizational Research Methods, 6(2), 147–168.
doi:10.1177/1094428103251541
Csizér, K., & Lukács, G. (2010). The comparative analysis of
motivation, attitudes and selves: The case of English and German
in Hungary. System, 38, 1–13. doi:10.1016/j.system.2009.12.001
de Winter, J. C. F., Dodou, D., & Wieringa, P. A. (2009). Exploratory
factor analysis with small sample sizes. Multivariate Behavioral
Research, 44(2), 147–181. doi:10.1080/00273170902794206
Dinno, A. (2009). Exploring the sensitivity of Horn's parallel
analysis to the distributional form of simulated data.
Multivariate Behavioral Research, 44, 362–388.
doi:10.1080/00273170902938969
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J.
(1999). Evaluating the use of exploratory factor analysis in
psychological research. Psychological Methods, 4, 272–299.
doi:10.1037/1082-989X.4.3.272
Field, A. (2009). Discovering statistics using SPSS. London, England:
Sage.
Ford, J. K., MacCallum, R. C., & Tait, M. (1986). The application of
exploratory factor analysis in applied psychology: A critical
review and analysis. Personnel Psychology, 39, 291–314.
doi:10.1111/j.1744-6570.1986.tb00583.x
Gorsuch, R. L. (1983). Factor analysis. Hillsdale, NJ: Lawrence Erlbaum.
Gorsuch, R. L. (1990). Common factor-analysis versus component
analysis: Some well and little known facts. Multivariate
Behavioral Research, 25(1), 33–39. doi:10.1207/s15327906mbr2501_3
Gorsuch, R. L. (1997). Exploratory factor analysis: Its role in item
analysis. Journal of Personality Assessment, 68, 532–560.
doi:10.1207/s15327752jpa6803_5
Green, J., & Oxford, R. (1995). A closer look at learning strategies,
L2 proficiency and gender. TESOL Quarterly, 29, 261–297.
doi:10.2307/3587625
Hadden, B. L. (1991). Teacher and nonteacher perceptions of
second-language communication. Language Learning, 41, 1–24.
doi:10.1111/j.1467-1770.1991.tb00674.x
Hayton, J. C., Allen, D. G., & Scarpello, S. (2004). Factor retention
decisions in exploratory factor analysis: A tutorial on parallel
analysis. Organizational Research Methods, 7, 191–205.
doi:10.1177/1094428104263675
Hedgcock, J., & Lefkowitz, N. (1996). Some input on input: Two
analyses of student response to expert feedback in L2 writing.
Modern Language Journal, 80, 287–308.
doi:10.1111/j.1540-4781.1996.tb01612.x
Henson, R. K., & Roberts, J. K. (2006). Use of exploratory factor
analysis in published research: Common errors and some
comment on improved practice. Educational and Psychological
Measurement, 66, 393–416. doi:10.1177/0013164405282485
Hogarty, K., Hines, C., Kromrey, J., Ferron, J., & Mumford, K.
(2005). The quality of factor solutions in exploratory factor
analysis: The influence of sample size, communality, and
overdetermination. Educational and Psychological Measurement,
65, 202–226. doi:10.1177/0013164404267287
Horn, J. L. (1965). A rationale and test for the number of factors
in factor analysis. Psychometrika, 30, 179–185.
doi:10.1007/BF02289447
Huang, S.-C. (2011). Convergent vs. divergent assessment: Impact
on college EFL students' motivation and self-regulated learning
strategies. Language Testing, 28, 251–271.
doi:10.1177/0265532210392199
Hubbard, R., & Allen, S. J. (1987). A cautionary note on the use of
principal components analysis: Supportive empirical evidence.
Sociological Methods and Research, 16, 301–308.
doi:10.1177/0049124187016002005
Kahn, J. H. (2006). Factor analysis in counseling psychology
research, training, and practice: Principles, advances, and
applications. The Counseling Psychologist, 34, 684–718.
doi:10.1177/0011000006286347
Kaiser, H. (1974). An index of factorial simplicity. Psychometrika,
39, 31–36. doi:10.1007/BF02291575
Kane, M. (2013). Validating the interpretations and uses of test
scores. Journal of Educational Measurement, 50, 1–73.
doi:10.1111/jedm.12000
Kline, R. B. (2010). Principles and practice of structural equation
modeling (3rd ed.). New York, NY: Guilford.
MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999).
Sample size in factor analysis. Psychological Methods, 4, 84–99.
doi:10.1037/1082-989X.4.1.84
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational
measurement (3rd ed., pp. 13–103). New York, NY: American
Council on Education and Macmillan.
Mislevy, R., Steinberg, L., & Almond, R. (2003). On the structure of
educational assessments. Measurement: Interdisciplinary Research
and Perspectives, 1, 3–62. doi:10.1207/S15366359MEA0101_02
Mulaik, S. A. (1990). Blurring the distinctions between component
analysis and common factor analysis. Multivariate Behavioral
Research, 25, 53–59. doi:10.1207/s15327906mbr2501_6
Mundfrom, D. J., Shaw, D. G., & Ke, T. L. (2005). Minimum
sample size recommendations for conducting factor analyses.
International Journal of Testing, 5, 159–168.
doi:10.1207/s15327574ijt0502_4
Nunnally, J. C. (1978). Psychometric theory. New York, NY:
McGraw-Hill.
Pallant, J. (2007). SPSS survival manual. Maidenhead, England:
Open University Press.
Reckase, M. D. (2009). Multidimensional item response theory. New
York, NY: Springer.
Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic
measurement: Theory, methods, and applications. New York, NY:
Guilford Press.
Russell, D. W. (2002). In search of underlying dimensions: The use
(and abuse) of factor analysis. Personality and Social Psychology
Bulletin, 28, 1629–1646. doi:10.1177/014616702237645
Schmitt, T. A. (2011). Current methodological considerations in
exploratory and confirmatory factor analysis. Journal of
Psychoeducational Assessment, 29, 304–321.
doi:10.1177/0734282911406653
Shimizu, H., & Green, K. E. (2002). Japanese language educators'
strategies for and attitudes toward teaching kanji. Modern
Language Journal, 86, 227–241. doi:10.1111/1540-4781.00146
Sinharay, S., Powers, D. E., Feng, Y., Saldivia, L., Giunta, A.,
Simpson, A., & Weng, V. (2009). Appropriateness of the
TOEIC Bridge test for students in three countries of South
America. Language Testing, 26, 589–619.
doi:10.1177/0265532209340195
Snook, S. C., & Gorsuch, R. L. (1989). Component analysis versus
common factor analysis: A Monte Carlo study. Psychological
Bulletin, 106, 148–154. doi:10.1037/0033-2909.106.1.148
Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics
(4th ed.). Boston, MA: Allyn & Bacon.
Thompson, B. (2004). Exploratory and confirmatory factor analysis:
Understanding concepts and applications. Washington, DC:
American Psychological Association.
Thompson, B., & Daniel, L. G. (1996). Factor analytic evidence for
the construct validity of scores: A historical overview and some
guidelines. Educational and Psychological Measurement, 56,
197–208. doi:10.1177/0013164496056002001
Velicer, W. F., & Jackson, D. N. (1990). Component analysis versus
common factor analysis: Some issues in selecting an
appropriate procedure. Multivariate Behavioral Research, 25,
1–28. doi:10.1207/s15327906mbr2501_1
Wang, J., Spencer, K., & Xing, M. (2009). Metacognitive beliefs and
strategies in learning Chinese as a foreign language. System, 37,
46–56. doi:10.1016/j.system.2008.05.001
Wilkinson, L. (1989). A cautionary note on the use of factor
analysis: A response to Borgatta, Kercher, and Stull, and
Hubbard and Allen. Sociological Methods and Research, 17,
449–459. doi:10.1177/0049124189017004008
Zwick, W. R., & Velicer, W. F. (1986). Comparison of five rules for
determining the number of components to retain. Psychological
Bulletin, 99, 432–442. doi:10.1037/0033-2909.99.3.432
