A Tutorial on Hierarchically Structured Constructs
Martin Brunner (University of Luxembourg), Gabriel Nagy (University of Tuebingen), and Oliver Wilhelm (University of Ulm)
Brunner, M., Nagy, G., & Wilhelm, O. (in press). A tutorial on hierarchically structured constructs. Journal of
Personality.
Contact: martin.brunner@uni.lu
Abstract

Many psychological theories assume that constructs are hierarchically structured and
operate at various levels of generality. Alternative confirmatory factor analytic (CFA) models
can be used to study various aspects of this proposition: (a) the one-factor model focuses on
the top of the hierarchy and contains only a general construct, (b) the first-order factor model
focuses on the intermediate level of the hierarchy and contains only specific constructs, and
both (c) the higher-order factor model and (d) the nested-factor model (e.g., the bifactor
model) consider the hierarchy in its entirety and contain both general and specific constructs. This
tutorial considers these CFA models in depth, addressing their psychometric properties,
interpretation of general and specific constructs, and implications for model-based score
reliabilities. The authors illustrate their arguments with normative data obtained for the
Wechsler Adult Intelligence Scale and conclude with recommendations on which CFA model
is most appropriate for which research questions.

Prominent examples of hierarchically structured
constructs include traditional personality traits (DeYoung, 2006), self-concept (Marsh &
Craven, 2006; Shavelson, Hubner, & Stanton, 1976), disorders like depression (Steer, Ball,
Ranieri, & Beck, 1999; Tanaka & Huba, 1984), subjective wellbeing (Chen, West, & Sousa,
2006; Gallagher, Lopez, & Preacher, 2009), and intelligence (Carroll, 1993; McGrew, 2009).
For example, the study by DeYoung (2006) supported a hierarchical structure of personality
with two general personality constructs—stability and plasticity—at the top of the hierarchy
and the more specific Big Five personality dimensions at the next level in the hierarchy.
Crucially, neither general nor specific constructs are directly observable entities, but
rather (unobserved) latent variables that are reflected in observable scores on corresponding
measures. One key task for all personality and individual difference researchers is therefore to
choose from a variety of measurement models that link general and/or narrower, specific
constructs with observable measures in different ways. It is also this decision that provides a
statistical rationale for the computation of scale scores that reflect respondents' levels on
general and/or specific constructs. Such a rationale is required, for example, by the Standards
for Educational and Psychological Testing, which state that "where composite scores are
developed, the basis and rationale for arriving at the composites should be given" and that
"the rationale and supporting evidence must pertain directly to the specific score […] to be
interpreted." The choice of a measurement model is also the crucial prerequisite for assessing score reliability (Cortina, 1993).
Taken together, measurement models are of critical importance for both research and
applied assessment. The major goal of this tutorial is therefore to guide personality and
individual difference researchers in the study of hierarchically
structured constructs. To this end, we consider the psychometric properties and the
substantive interpretation of general and specific constructs as specified in four alternative
confirmatory factor analytic (CFA) models. Further, we elaborate on the implications of these
CFA models for the computation and interpretation of both scores and model-based estimates
of score reliability. We draw on recent psychometric developments (e.g.,
Zinbarg, Yovel, Revelle, & McDonald, 2006) to demonstrate how to compute the reliability
of scores that are intended to assess constructs at various levels of the hierarchy. We
synthesize our key points in the General Discussion, offering recommendations on which
CFA model is most appropriate for which questions in personality research. In so doing, we
note the potential inherent restrictions of specific CFA models when they are used to examine
certain research questions. Throughout the tutorial, we illustrate our arguments with
normative data obtained for the Wechsler
Adult Intelligence Scale–Third Edition (WAIS–III; Wechsler, 1997)—a widely used measure
of intelligence. Software code that can be used to examine the WAIS–III by means of the four
CFA models discussed and to compute model-based reliabilities of scores (Cheung, 2009) can
be found in the online supplement.
Psychological theories can be divided into two components: one component that
specifies how theoretical constructs are related to corresponding measures and one component
that defines the mutual relationships of the theoretical constructs (Edwards & Bagozzi, 2000).
CFA models (and structural equation models in general) are useful statistical tools for
empirically examining both components. In this section, we discuss four popular CFA models
that may be applied in many areas of personality and individual differences research to study
hierarchically structured constructs. Here, we use these models to test alternative theories of
the structure of intelligence. More specifically, (a) the one-factor model, based on Spearman's
work (1904), contains only a general construct representing general cognitive ability (g). (b)
The first-order factor model reflects important ideas of the theory of fluid and crystallized
intelligence (Horn & Noll, 1997) and focuses on ability constructs that are narrower in scope
and specific to cognitive operations (e.g., processing speed) or content domains (e.g., verbal
comprehension). (c) The higher-order factor model and (d) the nested-factor model are
informed by current theories of the structure of intelligence (Carroll, 1993; McGrew, 2009).
These theories conceive general cognitive ability to be the most general construct at the apex
of the hierarchy; specific abilities that are narrower in scope are located at lower levels of the
ability hierarchy.
To study these alternative theories, we use the correlations among the subtests of the
WAIS–III (Table 1) as obtained for the Spanish standardization sample (Colom, Abad,
Garcia, & Juan-Espinosa, 2002, Table A3).2 This sample comprises data from 1,369 persons
(703 women and 666 men; age range: 15 to 94 years). It is representative of the Spanish
population in terms of educational level, geographical region, and residence in urban and rural
areas. In this tutorial, we focus on
general cognitive ability and on specific abilities that are relatively broad in scope. We
therefore use subtest scores, which serve as adequate manifest measures for representing such
broad constructs (Bagozzi & Edwards, 1998). It is important to note that the CFA models
presented in this article can also be applied on the basis of item scores (e.g., rating-scale items
with response categories ranging from "strongly disagree" to "strongly agree"), as is done in
many areas of personality and individual difference research. We elaborate on the use of item
scores when discussing the statistical requirements of the model-based estimation of score
reliabilities.
One-Factor Model
Charles Spearman (1904) found that measures of cognitive abilities are positively
intercorrelated and explained these
intercorrelations by the operation of a single factor gOF representing general cognitive ability.
These ideas are reflected in the one-factor model (OF, see Figure 1a), which focuses on a
general ability construct. (To distinguish the constructs specified in a particular model from
related constructs in other models, we use a subscript to index all corresponding factors; here,
OF). Specifically, the one-factor model predicts that individual differences in all subtests of
the WAIS–III are due to one
common latent factor (depicted as an ellipse) that represents gOF. The influence of gOF on the
subtests is depicted by the single-headed arrows in Figure 1a and
implies that higher scores on gOF are associated with higher scores on all 14 subtests of the
WAIS–III. Spearman further assumed that each subtest is also influenced by a
second factor orthogonal to gOF (which is why he called it the two-factor theory of
intelligence). This second factor represents some specific ability that is required to complete a
certain subtest. Further, each subtest score may also be affected to some degree by random
measurement error. Both of these latter influences (i.e., reliable but subtest-specific variance
and unreliable error variance) are represented by a single factor eOF for each subtest. These
subtest-specific factors (i.e., eOF,1 to eOF,14) are depicted by the single arrows pointing to the
individual subtests in Figure 1a. Note that the factor eOF is specific to each subtest for two
reasons: (a) Unless two measures share measure-specific variance (e.g., when the same
subtest is applied at two successive points of measurement or when two self-report items have
similar wordings), it is not possible to disentangle the variance of a particular subtest that is
attributable to random measurement error from that attributable to specific variance. (b)
Measurement error is uncorrelated across subtests because its influence on subtests is random
(i.e., unpredictable). Thus, the factors eOF,1 to eOF,14 as well as gOF are assumed to operate
mutually independently.
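To make the implied structure concrete, the following minimal sketch (in Python, with hypothetical loadings rather than the WAIS–III estimates) shows the correlation pattern a one-factor model predicts: each pair of subtests correlates as the product of their loadings on gOF.

```python
import numpy as np

# Hypothetical standardized loadings of four subtests on g_OF
# (illustrative values, not the WAIS-III estimates).
lam = np.array([0.80, 0.75, 0.70, 0.85])

# Model-implied correlations of a one-factor model: corr(Y_i, Y_j) = lam_i * lam_j
# for i != j; the diagonal is 1 because each subtest-specific factor e_OF,i
# contributes the remaining variance 1 - lam_i**2.
sigma = np.outer(lam, lam)
np.fill_diagonal(sigma, 1.0)
print(np.round(sigma, 2))
```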
How well does the one-factor model fit the data from the Spanish standardization
sample of the WAIS–III? The overall model fit of the one-factor model is modest (see Table
2), suggesting that a single general construct does not adequately explain the associations
among the subtests. We therefore do not interpret the model parameters any further.
First-Order Factor Model

The one-factor model focuses on a very general ability construct (i.e., gOF) and does
not address abilities that are narrower in scope (and that may be located at intermediate levels
of the hierarchy). In the first-order factor model (FO), in contrast, each measure is assumed to
be influenced by a single first-order factor that influences a subset of the WAIS–III subtests
(see Figure 1b). Note that this conception of cognitive abilities, and the absence of a factor
representing general cognitive ability, is informed by a more recent version of the theory of
fluid and crystallized abilities (Horn & Noll, 1997). A meaningful first-order factor model for
the WAIS–III (e.g., Tulsky & Ledbetter, 2000) therefore conceives the 14 subtests to be
influenced by four mutually correlated first-order factors (the correlations are depicted in
Figure 1b as double-headed arrows). These first-order factors represent constructs that are
narrower in scope and specific to a content domain (i.e., Verbal Comprehension VCFO) or a cognitive
operation (i.e., Perceptual Organization POFO, Working Memory WMFO, Processing Speed
PSFO). For example, the subtests Information, Vocabulary, Similarities, and Comprehension
are all assumed to be affected by a single factor that represents the operation of the (domain-
specific) construct VCFO. Higher scores on VCFO are thus associated with higher scores on
these four verbal subtests. Further, each subtest is influenced by subtest-specific factors (i.e.,
eFO,1 to eFO,14) that represent an ability specifically required for a certain subtest as well as
measurement error. The latter two sources of variance cannot be separated (see above) and are
therefore combined in a single subtest-specific factor for each subtest.
An important assumption of the first-order factor model is that the first-order factors
may be correlated. However, the first-order factor model typically does not specify a priori
the direction or strength of the mutual associations of VCFO, POFO, WMFO, and PSFO by placing
restrictions on the factor correlations; rather, these correlations are freely estimated.
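As a hedged illustration, the following sketch (hypothetical loadings and factor correlation, not the WAIS–III estimates) shows how a first-order factor model implies the correlations among measures via Sigma = Lam Phi Lam' + Theta.

```python
import numpy as np

# Hypothetical standardized loadings of four subtests on two correlated
# first-order factors (illustrative values, not the WAIS-III estimates).
Lam = np.array([[0.85, 0.00],
                [0.80, 0.00],
                [0.00, 0.90],
                [0.00, 0.78]])
Phi = np.array([[1.00, 0.74],
                [0.74, 1.00]])   # factor correlation matrix

# Subtest-specific variances make each standardized variance sum to 1.
Theta = np.diag(1.0 - np.sum((Lam @ Phi) * Lam, axis=1))

# Model-implied correlation matrix: Sigma = Lam * Phi * Lam' + Theta.
Sigma = Lam @ Phi @ Lam.T + Theta
print(np.round(Sigma, 2))
```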
How well does the first-order factor model fit the data? The overall model fit of the
first-order factor model is good (see Table 2), indicating that four correlated first-order
constructs provide a reasonable explanation of the associations among the subtests of the
WAIS–III. Further, it is possible to statistically test whether four correlated first-order ability
constructs are better able to explain the data than is a single common factor representing
general cognitive ability gOF (Rindskopf & Rose, 1988)—the one-factor model is statistically
equivalent to a first-order factor model in which all factor correlations are fixed to r = 1.
However, when boundary values (here: r = 1) are involved, the difference between the χ²
goodness-of-fit test values as obtained for the first-order factor model and the one-factor
model does not follow a χ² distribution. Hence, we applied a multi-step procedure developed
by Stoel and colleagues (Stoel, Garre, Dolan, & van den Wittenboer, 2006) to compute a test
that takes into account that
parameters are fixed to their boundary values (the online supplement describes this procedure
and contains the corresponding software code). The critical value at α = .05 of the
resulting test distribution was exceeded by far: the difference in the χ² values
was Δχ² = 1,408 and thus considerably larger than the critical value.
Thus, if the two models fitted the data equally well in the population, such a difference in
model fit would be very unlikely to emerge. These findings (along with the improvement seen
in the descriptive fit statistics CFI, RMSEA, and SRMR) indicate that the first-order factor
model explains the data better than the one-factor model.
Further, the values of the standardized factor loadings of the subtests on first-order factors
(see Figure 1b) range from λ = .78 to λ = .90 (Mdn λ = .85), showing that each factor is well
defined and that the subtests are substantively influenced by the corresponding first-order
construct. Finally, the first-order constructs are strongly positively correlated with one
another: correlations between the latent constructs range from r = .74 (VCFO and PSFO) to r =
.90 (POFO and PSFO), suggesting that the first-order constructs share a considerable amount of
common variance.
Higher-Order Factor Model

The one-factor model focuses on general abilities and the first-order factor model on
specific abilities. Neither model simultaneously addresses general and specific abilities (that
are located at different levels of the ability hierarchy). In contrast, the higher-order factor
model (HO) and the nested-factor model (see next section) both consider the ability hierarchy
in its entirety. We start with the higher-order factor model, in which higher-order factors
reflect the operation of one or more higher-order constructs that explain the intercorrelations
(i.e., the common variance) among the lower-order constructs. Hence, a higher-order factor
never influences the manifest measures directly; its influence is mediated by the lower-order constructs.
Figure 1c shows the higher-order factor model for the WAIS–III subtests. As in the
first-order factor model, the subtests are influenced by four first-order factors (representing
Verbal Comprehension VCHO, Perceptual Organization POHO, Working
Memory WMHO, and Processing Speed PSHO). This model implies, for example, that higher
scores on VCHO are associated with higher scores on the four verbal subtests. Again, subtests
are also influenced by subtest-specific factors (i.e., eHO,1 to eHO,14) that can be interpreted in
the same way as in the first-order factor model. Subtest-specific factors are therefore mutually
independent and uncorrelated with first-order and higher-order factors. Hence, the part of the
model that links first-order constructs with subtests is structurally equivalent to the first-order
factor model.
In contrast to the first-order factor model, however, the shared variance of the first-
order factors is accounted for by a second-order factor gHO that represents the higher-order
construct general cognitive ability. Hence (if the higher-order factor model fits the data well),
gHO accounts for the correlations among the first-order factors (observed in the first-order
factor model) and thus explains the common variance of the first-order factors. This implies
that gHO influences all first-order constructs; higher scores on gHO are therefore associated
with higher levels on all first-order constructs.
Consequently, there are two components to the variances of the first-order factors: one
component that is explained by gHO and one component that is independent of gHO (Edwards
& Bagozzi, 2000; Gorsuch, 1983). The latter component is represented in Figure 1c by
specific factors (e.g., specific Verbal Comprehension VCHO,specific) that point to the first-order
factors and that explain individual differences in the first-order factors over and above gHO. In
the present model (and in most applications of higher-order factor models), these specific
factors are uncorrelated with the higher-order factor gHO and among themselves. The total
variance of the first-order constructs therefore represents a blend of the variance attributable
to gHO and to specific factors (e.g., the variance of VCHO is a blend of the variance attributable
to gHO and to VCHO,specific).
Close inspection of the higher-order factor model reveals that the impact of gHO on
manifest subtest scores is mediated by the first-order constructs (Edwards & Bagozzi, 2000;
Schmid & Leiman, 1957): gHO (indirectly) influences all subtests of the WAIS–III and is
therefore clearly broader in scope than the first-order constructs, which influence only a
selection of subtests. The direct impact of the higher-order and specific factors on manifest
measures can be examined by means of the Schmid-Leiman transformation (Schmid &
Leiman, 1957), which facilitates the
interpretability of higher-order and lower-order factors (see also Brown, 2006; Gorsuch, 1983).
Applied to the higher-order factor model, this transformation
yields uncorrelated (first-order) factors that represent both the (higher-order) general
and the specific ability constructs. The factor loadings of the manifest measures on these
factors (see below for details of the computation) reflect the incremental impact of general
and specific abilities on the corresponding measures. Note that the Schmid-Leiman
transformation is a mathematical re-expression of the
original higher-order factor model; the empirical fit of both models is therefore identical
(Yung, Thissen, & McLeod, 1999).
How well does the higher-order factor model fit the data? The overall model fit is
adequate (see Table 2). Note that the higher-order factor model is nested in the first-order
factor model (Rindskopf & Rose, 1988). Thus, it is possible to statistically test whether the
higher-order factor representing gHO is capable of fully accounting for the correlations
observed among the first-order constructs in the first-order factor model. χ² difference testing
indicates that this is not completely the case, Δχ²(2, N = 1,369) = 55; p < .001. The next step
is therefore (see also McDonald, 2010) to carefully inspect the residual correlations Δr that
are computed as the difference between the model-implied correlations among the first-order
constructs and the corresponding correlations in the first-order factor model. These residual
correlations range between Δr = –.04 (for WMFO and VCFO) and Δr = .04 (for PSFO and VCFO).
These discrepancies are not "troublingly large" (i.e., –.10 ≤ Δr ≤ .10; McDonald, 2010, p.
679). Thus, although χ² difference testing indicates that the higher-order factor gHO does not
fully capture the correlations found among the first-order constructs in the first-order factor
model, the small residual correlations in combination with the adequate overall fit provide
empirical support for the theoretical proposition that cognitive abilities are hierarchically
structured. Moreover, each subtest loads substantially on the
corresponding first-order construct (range: λ = .78 to λ = .90; Mdn λ = .85), indicating that
these factors are well defined (see Figure 1c). Further, the first-order ability constructs are
strongly influenced by gHO: the standardized loadings on the higher-order factor are between
λ = .86 (VCHO on gHO) and λ = .97 (POHO on gHO; note that the upper bound of the
corresponding 95% confidence interval does not include 1.00), suggesting that the first-order
constructs are strongly, but not perfectly, determined by gHO.
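The following minimal sketch illustrates this variance decomposition numerically, using the loading of VCHO on gHO reported above (λ = .86); the resulting standard deviation of the specific factor (.51) reappears in the Schmid-Leiman computations that follow.

```python
import numpy as np

# Loading of the first-order factor VC_HO on g_HO, as reported above.
gamma = 0.86

# With standardized first-order factors, the variance of VC_HO splits into a
# part explained by g_HO (gamma**2) and a specific part (1 - gamma**2).
var_g = gamma ** 2
var_specific = 1.0 - var_g
sd_specific = np.sqrt(var_specific)

print(round(var_g, 2), round(var_specific, 2), round(sd_specific, 2))
# -> 0.74 0.26 0.51; the .51 is the standard deviation of VC_HO,specific
#    used in the Schmid-Leiman computations below.
```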
The Schmid-Leiman transformation can be used to estimate the direct impact of gHO
and specific ability constructs on the corresponding subtest scores (Table 3). Specifically, the
factor loadings of the manifest subtest scores on gHO (see Figure 1c) can be computed by
multiplying the factor loading of each subtest on the corresponding first-order factor by the
factor loading of this first-order factor on gHO. For example, the loading of the Information
subtest score on gHO is computed as .82 × .86 = .70. Further, the loadings of the subtests on a
specific factor can be computed by multiplying the factor loading of each subtest on the
corresponding first-order factor by the standard deviation of the corresponding specific factor.
For example, the loading of the Information subtest score on VCHO,specific is .82 × .51 = .42 (the
standard deviation of VCHO,specific is .51; cf. the sketch above). Three findings obtained from
the transformed factor loadings
(Table 3) are noteworthy. First, the factor loadings of the subtests on gHO are large,
demonstrating that gHO exerts strong effects on all subtests: the median of the loadings on gHO
is λ = .77 (range: λ = .70 to λ = .87). Second, each subtest has substantial loadings on specific
abilities (Mdn λ = .35; range: λ = .20 to λ = .45), indicating that specific ability constructs
have an incremental impact on the corresponding subtest scores, over and above gHO. Third,
the factor loadings of the subtests on gHO are considerably larger than the factor loadings of
the subtests on the factors representing specific abilities: hence, the subtest scores contain
substantially more variance attributable to gHO than to specific abilities. Note, however, that
the higher-order factor model imposes a proportionality constraint. This
constraint affects the proportion of variance in the subtest scores explained by general and
specific ability constructs (Schmiedek & Li, 2004). Specifically, for a given set of subtests,
the ratios of variance attributable to the respective first-order ability to variance attributable to
gHO are constrained to be the same. For example, the standardized factor loadings on VCHO,
specific are λ = .415 for the Information subtest and λ = .449 for the Vocabulary subtest. The
standardized factor loadings of these subtests on gHO are λ = .703 and λ = .760, respectively.
Obviously, if the variance ratios are computed from the squared factor loadings, the ratios of
the variance attributable to VCHO,specific to the variance attributable to gHO are the same for the
two subtests: variance ratio for the Information subtest = .415² / .703² = .349; variance ratio
for the Vocabulary subtest = .449² / .760² = .349. Crucially, the proportionality constraint
limits the value of the higher-order factor model in providing insights into the relationship
between general and specific abilities, on the one hand, and other psychological constructs,
on the other (see the General
Discussion).
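A small sketch of the Schmid-Leiman computations and the resulting proportionality constraint may help here; the Information loading (.82) and the loading of VCHO on gHO (.86) are taken from above, while the Vocabulary loading (.88) is inferred from the reported transformed loadings and is therefore approximate.

```python
import numpy as np

# First-order loadings on VC_HO: .82 (Information) is reported above; .88
# (Vocabulary) is inferred from the reported transformed loadings.
lam = np.array([0.82, 0.88])
gamma = 0.86                          # loading of VC_HO on g_HO
sd_specific = np.sqrt(1 - gamma**2)   # SD of VC_HO,specific (about .51)

g_loadings = lam * gamma              # Schmid-Leiman loadings on g_HO
s_loadings = lam * sd_specific        # Schmid-Leiman loadings on VC_HO,specific
print(np.round(g_loadings, 2), np.round(s_loadings, 2))

# Proportionality constraint: the subtest loading lam cancels from the ratio,
# so every subtest of the same first-order factor shows the same variance ratio.
ratios = s_loadings**2 / g_loadings**2
print(np.round(ratios, 3))   # both about .352 with these rounded inputs
```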
Nested-Factor Model
Another CFA model (that is not subject to the proportionality constraint) that
considers the ability hierarchy in its entirety is the nested-factor model (NF; Figure 1d). The
term ―nested-factor model‖ was chosen because the factors representing specific constructs
are nested within the general factor representing the general construct (see also Gustafsson &
Balke, 1993). Other terms used to label this kind of model include "general-specific model"
and "bifactor model" (Chen et al., 2006). As noted above, current theories of the structure of
intelligence inform the specification of the nested-factor model. In his
influential three-stratum theory, Carroll (1993) defined general cognitive ability as the
broadest ability construct (located at the apex of the hierarchy) and narrower ability constructs
as specific to domains or cognitive operations (located at lower levels of the hierarchy). This
conception of general and specific abilities is reflected in the specification of the nested-factor
model: general cognitive ability is represented as a first-order factor gNF that directly
influences all subtests of the WAIS–III. Hence, as for the one-factor model, higher scores on
gNF are associated with higher scores on all 14 subtests. Further, the nested-factor model
incorporates the multifaceted view of intelligence and the idea that abilities differ in their
breadth, with related sets of subtests being affected by a first-order factor that represents a
specific ability (e.g.,
VCNF,specific). For example, the first-order factor representing the (domain-specific) construct
VCNF,specific influences the four verbal subtests
over and above gNF. Higher scores on VCNF,specific are therefore associated with higher scores
on these four subtests. Further, subtests are additionally influenced by subtest-specific factors
(i.e., eNF,1 to eNF,14). Crucially, general cognitive ability gNF, specific abilities, and subtest-
specific factors (i.e., eNF,1 to eNF,14) are assumed to be mutually independent and are therefore
specified to be uncorrelated.
Finally, it is important to note that subtest loadings on factors that represent general or
specific abilities are freely estimated in the nested-factor model (vs. constrained in the higher-
order factor model). In contrast to the higher-order factor model, the nested-factor model does
not impose the proportionality constraint on variance ratios of general and specific abilities in
subtest scores. Hence, the nested-factor model can be seen as a generalization of the higher-
order factor model (Yung et al., 1999).
How well does the nested-factor model fit the data? The overall model fit of the
nested-factor model is good (see Table 2)—the best of the four models under
investigation. Note that the higher-order factor model can be tested against the nested-factor
model (Yung et al., 1999). Thus, it is possible to test whether the higher-order factor model
with its proportionality constraints fits as well as the nested-factor model with freely
estimated factor loadings. χ² difference testing indicates that this is not the case, Δχ²(9, N =
1,369) = 194; p < .001. The descriptive fit statistics also show some improvement in the fit of
the nested-factor model relative to the higher-order factor model.
Further, the one-factor model can be tested against the nested-factor model (Rindskopf
& Rose, 1988). This test helps to decide whether the factors representing domain-specific
abilities and general cognitive ability gNF in the nested-factor model are better able to explain
the correlations among subtest scores than a single common factor representing gOF. The
difference in the χ² values is large, at Δχ²(13, N = 1,369) = 1,547, with a probability value of
p < .001. This result (along with the improvement in the descriptive fit statistics CFI,
RMSEA, and SRMR) supports the assumption of the nested-factor model that specific
abilities account for a substantial amount of common variance among subtest scores, over and
above gNF.
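For readers who wish to reproduce such a comparison, a minimal sketch of the naive χ² difference test (appropriate here, where no parameter is fixed to a boundary value) using SciPy:

```python
from scipy.stats import chi2

# Chi-square difference test for nested models. Values as reported above for
# the higher-order vs. nested-factor comparison.
delta_chi2, delta_df = 194, 9
p = chi2.sf(delta_chi2, delta_df)   # upper-tail probability
print(f"p = {p:.2e}")               # far below .001
```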
Three further findings obtained for the nested-factor model are noteworthy (see Figure
1d). First, the factor loadings of the subtests on gNF are large, demonstrating that gNF has
strong effects on all subtests: the median of the loadings on gNF is λ = .77 (range: λ = .66 to
λ = .89). Second, each subtest has substantial (and statistically significant) loadings on specific
abilities of up to λ =
.52 (Digit Span on WMNF,specific), with a median loading of Mdn λ = .37. Hence, both specific
abilities and gNF affect subtests. However, the influence of specific abilities on subtests is
clearly not equally strong for all subtests. Third, the factor loadings of the subtests on gNF are
considerably larger than those on the factors representing specific abilities: Hence the subtest
scores contain substantially more variance attributable to gNF than to specific abilities.
Taken together, these results indicate (a) that the nested-factor model captures the
correlations among the subtest scores of the WAIS–III reasonably well, (b) that the
assumption of proportionality constraints as imposed in the higher-order factor model may not
hold, and (c) that specific abilities account for a substantial amount of the common variance
among subtest scores over and above gNF. In sum, these empirical results support the
theoretical proposition that cognitive abilities are hierarchically structured and differ in their
generality.
Interpretation of General and Specific Constructs

The four CFA models presented above either focus exclusively on general cognitive
ability (i.e., OF) or specific abilities (i.e., FO) or consider the ability hierarchy in its entirety,
containing both general and specific constructs (i.e., HO and NF). At first glance, it may
appear that general and specific ability constructs as specified in these four models can be
interpreted interchangeably. However, as we explain in the following, this is generally not the
case.
We start with specific abilities, which are included in the first-order factor model, the
higher-order factor model, and the nested-factor model. Crucially, the substantive
interpretations of these specific ability constructs vary to different extents. In the first-order
factor model, the first-order constructs (i.e., VCFO, POFO, WMFO, and PSFO) affect the
subtests, and no further construct is specified to explain the
intercorrelations of the first-order constructs. In the higher-order factor model, in contrast, the
first-order constructs (i.e., VCHO, POHO, WMHO, and PSHO) affect the subtests as in the first-
order factor model, but are in turn influenced by two independently operating factors, namely
gHO and a specific factor (e.g., VCHO,specific). In contrast
to the first-order model, the higher-order factor model thus includes a higher-order construct that
accounts for the interrelations among the first-order constructs: the higher-order factor gHO.
Consequently, the first-order constructs in the higher-order factor model contain variance
attributable to gHO (and variance attributable to a specific factor), whereas the variance of
ability constructs in the first-order factor model is not separated into components that are
attributable to general and specific sources of individual differences.
In the nested-factor model, each subtest is directly affected by general cognitive ability
gNF and a specific ability construct (i.e., VCNF,specific, PONF,specific, WMNF,specific, or PSNF,specific).
Hence, in contrast to specific abilities in the first-order factor model (i.e., VCFO, POFO, WMFO,
and PSFO), specific abilities as conceptualized in the nested-factor model are abilities that
explain variance in the subtest scores over and above gNF. Thus, specific ability constructs in
the nested-factor model account for variance in subtest scores after taking into account the impact of
gNF on subtest scores, whereas the ability constructs in the first-order factor model explain
total variance in subtest scores. To sum up, it is only if there is no general construct operating
that corresponding specific constructs as specified in the first-order factor model (e.g., VCFO),
the higher-order factor model (e.g., VCHO,specific), and the nested-factor model (e.g.,
VCNF,specific) are identical. Thus, the stronger the empirical impact of the general construct on
manifest measures is, the more distinct the specific constructs in the first-order factor model
become from the corresponding specific constructs in the higher-order model and the nested-factor model.
The specific ability constructs in both the higher-order factor model (e.g., VCHO,specific)
and the nested-factor model (e.g., VCNF,specific) are specified to operate mutually independently
as well as independently of general cognitive ability (in terms of gHO or gNF). A refined
interpretation of these specific constructs is therefore that they represent abilities operating
beyond (i.e., over and above) general cognitive ability (see
Brunner, 2008, p. 162). Ideally, this interpretation would rest on cognitive measures that tap
only the specific ability construct but not general cognitive ability. In the realm of ability
research, such measures have not yet been found (Brunner, 2008). However, problems with
the labelling of specific constructs do not mean that specific constructs (as represented by
specific factors) do not have substantive meaning (see also the General Discussion). In other
areas of psychological
research (e.g., clinical psychology), more readily identifiable labels for specific factors are
available. For example, in the tripartite model by Clark and Watson (1991), the variance in
anxiety and depression (which are both represented as first-order factors) can be decomposed
into three parts: variance attributable to (a) general negative affect (represented as higher-
order factor) that influences both anxiety and depression, (b) specific physiological
hyperarousal that affects only anxiety, and (c) a specific factor representing low positive
affect ("anhedonia") that affects only depression. In this example, the specific factors
represent symptoms that are independent of general negative affect and specific to anxiety and
depression, respectively.
Although specific constructs in the higher-order factor model and the nested-factor
model may have the same substantive interpretations, there are subtle differences between
them. Specifically, the impact of specific ability constructs (e.g., VCHO,specific) on subtest
scores as specified in the higher-order factor model is subject to the proportionality constraint,
whereas the impact of specific abilities (e.g., VCNF,specific) in the nested-factor model is not.
Hence, given the operation of a general construct, it is only when the proportionality
constraint holds that the corresponding specific constructs in the higher-order factor model
and the nested-factor model are mathematically identical. The more the empirical
relationships deviate from the proportionality constraint, the more distinct corresponding
specific constructs as specified in the higher-order and the nested-factor model become from
each other and the more distinct their substantive interpretations may become.
We now turn to the factor representing general cognitive ability, which is included in
the one-factor model, the higher-order factor model, and the nested-factor model. Except for
rare cases, these general ability constructs are not identical. In the one-factor model, gOF
(along with the subtest-specific factors eOF) is the only influence on subtest scores. This is the
major difference to the higher-order factor model and the nested-factor model: in the higher-
order factor model, gHO (indirectly) influences subtest scores independently of the specific
constructs (e.g., VCHO,specific); in the nested-factor model, gNF (directly) influences subtest
scores independently of the specific constructs (e.g., VCNF,specific). Hence, gOF explains total
variance in the subtest scores, whereas both gHO and gNF explain variance in the subtest scores
independently of the specific constructs. Thus, it is only if no specific constructs
influence manifest measures (i.e., if the corresponding factor variances are zero) that the three
general factors are identical. In other words, the stronger the empirical impact of specific
constructs on manifest measures is, the more distinct gOF, on the one hand, becomes from gHO
and gNF, on the other. Moreover, the impact of general cognitive ability gHO on subtest scores
as specified in the higher-order factor model is subject to the proportionality constraint, whereas
the impact of gNF in the nested-factor model does not. Given that specific constructs influence
manifest measures, it is thus only if the proportionality constraint holds that gHO and gNF are
identical. In other words, the more the proportions of variance in the subtest scores that are
attributable to general and specific constructs deviate from the proportionality constraint, the
more distinct gHO and gNF become from each other. Taken together, the general and specific constructs specified
in the different CFA models are generally not (completely) interchangeable. Thus, the choice
of a CFA model that links cognitive measures to general and/or specific ability constructs
implies certain constraints on the interpretation of these constructs. Because the four CFA
models discussed are partly nested within each other (i.e., one model is a restricted version of
another), cross-model comparison by means of model fit indices and χ² difference tests can be
used to select among these models.
Model-Based Estimation of Score Reliability

Cognitive abilities are not directly observable entities, but latent variables. To assess
an individual's cognitive abilities, we have to estimate his or her level on the respective latent
variable. In most applied psychological research, several manifest scale indicators are
summed using unit weights (i.e., each scale indicator has the same weight in the computation
of the sum score) to form a manifest scale score. This scale score gives an estimate of the
person's level on the latent general or specific ability construct (Grice, 2001). For example, a
scale score reflecting a person‘s level of general cognitive ability can be computed by using
unit weights to sum up his or her scores on all 14 subtests of the WAIS–III.3 But how reliable
are such scale scores?
To answer this question, we first show how reliability can be mathematically defined.
As most readers are familiar with the fundamental ideas of classical test theory (CTT), we
start by considering how CTT defines reliability. Within the framework of CTT, a person's
observed score is partitioned into one component that reflects his or her true score and one
component that is independent of the true score and reflects measurement error (Lord &
Novick, 1968, p. 29). The observed score variance is thus composed of variance attributable
to true scores (true score variance) and variance attributable to measurement error (error
variance). Score reliability in the context of CTT is thus mathematically defined in terms of
the proportion of true score variance to observed score variance (Lord & Novick, 1968, p. 61).
The mathematical definition of reliability in the context of CTT has two conceptually
overlapping meanings. Reliability (a) assesses the consistency of measurement (across time or
across instruments) and (b) is an index of measurement precision (Lord & Novick, 1968;
McDonald, 1999; Mellenbergh, 1996). In this article, we draw on the conceptual definition of
reliability as measurement precision when deriving
estimates of score reliability by means of CFA models.4 As we show below, for the one-factor
model and the first-order factor model, the total amount of reliable variance provides an
estimate of how precisely a certain scale score assesses a certain target construct. For the
higher-order factor model and the nested-factor model, however, different model-based
reliability indices can (a) estimate the total amount of reliable variance in a scale score or (b)
indicate how precisely a certain scale score measures a certain target construct. We therefore
discuss the computation and interpretation of model-based score reliabilities separately for
each of the four CFA models. Note that model-based estimates of score
reliability (as well as all reliability estimates based on CTT) are population dependent
(Mellenbergh, 1996). Thus, score reliability depends on how heterogeneous the sample is on
the construct under investigation. Note further that there is some debate on the
relationship between the true score concept of CTT and construct scores as defined in CFA
models (Bollen, 1989, p. 219; Borsboom & Mellenbergh, 2002). For example, Borsboom and
Mellenbergh (2002) point to fundamental conceptual differences between true scores and
construct scores. In contrast, proponents of stochastic measurement theory (e.g., Steyer, 1989)
integrate CTT into CFA models by incorporating statistical assumptions on the relationship
between true scores and construct scores (see also Bollen, 1989, pp. 218-222). Here, we take a
model-based approach to score reliability in the context of CFA models, as perhaps most
clearly elaborated in Bollen (1989) and McDonald (1999). For didactic reasons, we point to
some general similarities between CTT and the model-based approach. A thorough discussion
of the (sometimes subtle) conceptual and statistical differences between these psychometric
approaches is beyond the scope of this tutorial.
One-Factor Model
We first analyze how well the scale score representing general cognitive ability
assesses the latent construct general cognitive ability in terms of gOF (Figure 1a). In the one-
factor model, the variance of the latent factor representing gOF can be interpreted as the
reliable ("construct score") variance of the score representing general cognitive ability.
Further, gOF and subtest-specific factors are specified to be unrelated, reflecting the idea that
construct score and error score are mutually independent. As noted above, the subtest-specific
factors (i.e., eOF,1 to eOF,14) may comprise both reliable subtest-specific variance and
variance attributable to random measurement error. In principle, reliable subtest-
specific variance that is not shared with the target construct (e.g., gOF) should be seen as part
of a measure's reliable variance (Bollen, 1989, pp. 219-221). Given that variance attributable
to measurement error and reliable subtest-specific variance are typically not separated in
applications of CFA, we do not consider the latter to be part of the reliable variance, and we
do not take it into account when computing scale score reliability (Bollen, 1989, pp. 220-221).
Hence, the model-based reliability estimates that we discuss in the present article may be
interpreted as lower-bound estimates of score reliability. Against this background, the reliability of a
scale score may be defined as the proportion of variance accounted for by one latent target
construct (e.g., gOF) relative to observed score variance. In line with McDonald (1999) and
Zinbarg and colleagues (Zinbarg, Revelle, Yovel, & Li, 2005; Zinbarg et al., 2006), we refer
to this reliability coefficient as omega (ω). More formally, these ideas can be expressed as
follows. When unit weights are used, a scale score Y is computed by summing up p manifest
scale indicators Yi: Y = Y1 + Y2 + ... + Yp. When standardized model parameters are used,
omega for the one-factor model is defined as
$$\omega = \frac{\left(\sum_{i=1}^{p} \lambda_{ij}\right)^{2}}{\left(\sum_{i=1}^{p} \lambda_{ij}\right)^{2} + \sum_{i=1}^{p} \theta_{e_i}} . \qquad (1)$$
Here, λij is the standardized factor loading of manifest measure Yi on factor j, and θei is
the standardized variance of the subtest-specific factor affecting the manifest variable Yi. The
numerator in Equation 1 represents the amount of score variance in the scale score Y that can
be attributed to the variance of the factor representing the target construct. The denominator
represents the total variance of the scale score, which comprises (a) the score variance
accounted for by the target construct and (b) the variances attributable to the subtest-specific
factors of the scale indicators. Values of omega can range from 0 (no reliability) to 1 (perfect
reliability). In other words, when a one-factor model is applied, a value of ω = 1 indicates that
the sum score Y measures the target construct with perfect accuracy; the more omega departs
from 1, the lower the precision with which Y measures the latent target construct.
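A minimal sketch of Equation 1, assuming a standardized solution in which each subtest-specific variance equals 1 minus the squared loading (the loadings below are hypothetical, not the WAIS–III estimates):

```python
import numpy as np

def omega(loadings):
    """Omega (Equation 1) for a unit-weighted scale score whose indicators
    load on a single target factor; assumes a standardized solution in which
    each subtest-specific variance equals 1 - loading**2."""
    loadings = np.asarray(loadings, dtype=float)
    construct_var = loadings.sum() ** 2           # variance due to the construct
    error_var = np.sum(1.0 - loadings ** 2)       # subtest-specific variances
    return construct_var / (construct_var + error_var)

# Hypothetical loadings (not the WAIS-III estimates):
print(round(omega([0.80, 0.75, 0.70, 0.85]), 2))  # -> 0.86
```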
Table 4 presents the computation used to derive omega for the scale score General
Cognitive Ability.
When the model parameters obtained for the one-factor model are used (Figure 1a),
omega of the General Cognitive Ability score is computed as the ratio of the variance
attributable to gOF to the total variance of this score. The total variance of the General
Cognitive Ability score is the sum of the variances that can be attributed to (a) gOF and (b)
subtest-specific factors (i.e., the sum of the variances of eOF,1 to eOF,14). The value of ω = .96
represents the reliability of the General Cognitive Ability score to measure gOF. In other
words, 96% of the variance in this scale score is accounted for by gOF. Table 5 reports the
composition of scale score variance in terms of variance attributable to gOF and subtest-
specific factors (eOF,1 to eOF,14). Note that the omega value obtained for the one-factor model
should be interpreted with caution, as the fit of this model to the data was modest (see also the
discussion below).
First-Order Factor Model

How can the reliabilities of scale scores be computed in the context of a first-order
factor model? Because each measure is assumed to reflect one latent target construct only, the
scale score's reliability can be computed in the same way as for the one-factor model, using
omega (see Equation 1). For example, the scale score assessing the target construct Verbal
Comprehension in terms of VCFO is computed as the unit-weighted sum of the subtest scores
Information, Vocabulary, Similarities, and Comprehension. The omega value with which
this scale score assesses VCFO is then computed as the ratio of variance attributable to VCFO
(i.e., the squared sum of the corresponding standardized factor loadings; see Figure 1b) to the
total variance of this scale score (i.e., the squared sum of the corresponding standardized
factor loadings plus the sum of the corresponding subtest-specific variances). Table 4 shows
the necessary computation. The value of ω = .91 indicates that 91% of the variance in the
scale score reflecting Verbal Comprehension is attributable to VCFO. Table 5 reports the
omega values and the variance composition for the other scale scores measuring specific
abilities.
Higher-Order Factor Model

In the one-factor model and the first-order factor model, each subtest is affected by a
single ability construct. In the higher-order factor model, subtests are affected by both general
and specific ability constructs. The computation of a scale score's reliability is thus more
complex: In the higher-order factor model, the observed variance of a manifest subtest score
is composed of (a) the variance attributable to the general/higher-order construct (i.e., gHO),
(b) the variance attributable to the specific constructs (e.g., VCHO,specific), and (c) subtest-
specific factors (i.e., eHO,1 to eHO,14). When several subtests are summed to create a scale
score, the total variance of this scale score thus comprises variance attributable to general
cognitive ability and variance attributable to a certain specific ability (in addition to variance
attributable to subtest-specific factors). Recall that reliability can be conceptually defined as
measurement precision. In
the one-factor model and in the first-order factor model, this definition can be mathematically
expressed as the proportion of variance in the target construct to observed score variance
(McDonald, 1999; Mellenbergh, 1996). In the context of higher-order factor models (and
nested-factor models), the computation of score reliability is more complex: The model-based
estimate depends on how the reliable variance of a scale score is defined, and there are two
ways to define it.
Omega. The first way of defining reliable variance is as the amount of variance
accounted for by all (i.e., general/higher-order and specific) constructs that underlie a scale
score. In line with Zinbarg and colleagues (Zinbarg et al., 2005; Zinbarg et al., 2006), we
again refer to this reliability coefficient as omega (ω). In the case of a higher-order factor
model with k mutually orthogonal latent factors that represent k (general/higher-order and
specific) constructs, omega is defined as

$$\omega = \frac{\sum_{j=1}^{k}\left(\sum_{i=1}^{p} \lambda_{ij}\right)^{2}}{\sum_{j=1}^{k}\left(\sum_{i=1}^{p} \lambda_{ij}\right)^{2} + \sum_{i=1}^{p} \theta_{e_i}} . \qquad (2)$$
Note that the numerator in Equation 2 represents the total amount of variance that can
be attributed to the variances of the k constructs that underlie the scale score Y. The
denominator represents the total variance of the scale score, which comprises (a) the total
variance accounted for by all k underlying constructs and (b) the variances attributable to
subtest-specific factors. Omega thus informs on the reliability (i.e., measurement precision)
with which a scale score assesses the blend of the general/higher-order and specific
constructs underlying the scale score. To illustrate,
we present the necessary computations for the General Cognitive Ability score of the WAIS–
III in Table 4. The omega value of this score can be computed by calculating the total amount
of variance attributable to latent constructs in the scale sum (i.e., variance due to gHO,
VCHO,specific, POHO,specific, WMHO,specific, and PSHO,specific), using the standardized factor loadings
as obtained by applying the Schmid-Leiman transformation (Table 3). The denominator is the
total variance of the General Cognitive Ability score—the sum of the variances attributable to
(a) gHO, (b) VCHO,specific, (c) POHO,specific, (d) WMHO,specific, (e) PSHO,specific, and (f) subtest-
specific factors (i.e., eHO,1 to eHO,14). The value of ω = .97 represents the reliability of the
General Cognitive Ability score to measure the blend of gHO and specific abilities. Table 5
displays the omega values and the variance composition for scale scores measuring specific
abilities.
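Equation 2 can be sketched analogously for k orthogonal factors; the loading matrix below combines rounded values from the text (Information and Vocabulary, Table 3) with two hypothetical rows and is purely illustrative:

```python
import numpy as np

def omega_total(Lam):
    """Omega (Equation 2) for a scale score whose p indicators load on k
    mutually orthogonal factors (e.g., after a Schmid-Leiman transformation);
    subtest-specific variances are 1 minus each row's squared loadings."""
    Lam = np.asarray(Lam, dtype=float)
    construct_var = np.sum(Lam.sum(axis=0) ** 2)        # all k constructs
    error_var = np.sum(1.0 - np.sum(Lam ** 2, axis=1))  # subtest-specific part
    return construct_var / (construct_var + error_var)

# Column 0: loadings on g; column 1: loadings on one specific factor. The first
# two rows round the Table 3 values for Information and Vocabulary; the last
# two rows are hypothetical.
Lam = np.array([[0.70, 0.42],
                [0.76, 0.45],
                [0.72, 0.40],
                [0.74, 0.44]])
print(round(omega_total(Lam), 2))   # -> 0.91
```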
Omega hierarchical. Omega defines reliable variance as the amount of
variance accounted for by all (i.e., general/higher-order and specific) constructs that underlie a
scale score. Alternatively, reliable variance may be defined as the variance in a scale score
accounted for by just one target construct (represented by factor j). To this end, we adapt the
methodological approach developed by Zinbarg and
colleagues (Zinbarg et al., 2005; Zinbarg et al., 2006). Specifically, these researchers
developed the reliability coefficient omega hierarchical (ωh), which gauges how precisely a
scale score assesses the general construct underlying a set of measures.
Note that they did not use omega hierarchical to compute score reliability of specific
constructs, as specified in the present models. Nevertheless, we apply the term omega
hierarchical in this article, because the same methodological approach is used. In this article,
ωh is defined as follows:

$$\omega_h = \frac{\left(\sum_{i=1}^{p} \lambda_{ij}\right)^{2}}{\sum_{j=1}^{k}\left(\sum_{i=1}^{p} \lambda_{ij}\right)^{2} + \sum_{i=1}^{p} \theta_{e_i}} . \qquad (3)$$
Note that ωh indicates the proportion of variance in the scale score that is accounted
for by one target construct (represented by factor j in the Schmid-Leiman-
transformed higher-order factor model) to total observed variance (i.e., the sum of the
variances accounted for by all k underlying constructs and the sum of all variances
attributable to subtest-specific factors of the p subtests). Thus, omega hierarchical reflects the
measurement precision with which a scale score assesses a certain target construct; it can
range from 0 (no reliability) to 1 (perfect reliability).
We now illustrate the computation of ωh for the General Cognitive Ability score. To
compute the omega hierarchical of this score to assess gHO, we again enter the standardized
factor loadings obtained from the Schmid-Leiman transformation (Table 3) into
Equation 3. The denominator representing total scale score variance is thus identical to the
one used to calculate omega (see Table 4). However, the numerator now represents the
variance accounted for by gHO only. Therefore, it contains only the squared sum of the
loadings of the 14 subtests on gHO. The value of ωh = .92 represents the reliability (i.e.,
measurement precision) of the General Cognitive Ability score to measure the construct gHO.
By the same token, omega hierarchical for the specific abilities (e.g., VCHO,specific) is
computed by only taking into account the standardized factor loadings of the subtests (see
Table 3) on these specific factors when computing the numerator of Equation 3. The
corresponding denominator contains the total variance of the corresponding specific scale
scores, computed as the sum of the variances attributable to (a) gHO, (b) a certain specific
ability factor (e.g., VCHO,specific), and (c) corresponding subtest-specific factors (e.g., eHO,1 to
eHO,4). The value of ωh = .23 (see Table 5) represents the reliability (i.e., measurement
precision) of the Verbal Comprehension score to measure VCHO,specific.
Table 5 displays the omega hierarchical values and the variance composition for all
scale scores. As the scale scores measuring specific abilities contain a great proportion of
variance attributable to gHO, the omega hierarchical values of these scores are relatively low
(ranging from ωh = .06 for Perceptual Organization to ωh = .23 for Verbal Comprehension).
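Omega hierarchical differs from omega only in its numerator, as the following sketch shows (the same illustrative loading matrix as above, repeated so the snippet is self-contained):

```python
import numpy as np

def omega_h(Lam, target):
    """Omega hierarchical (Equation 3): reliability with which a scale score
    measures the single factor in column `target`, given p indicators loading
    on k mutually orthogonal factors (standardized solution assumed)."""
    Lam = np.asarray(Lam, dtype=float)
    target_var = Lam[:, target].sum() ** 2              # numerator: factor j only
    construct_var = np.sum(Lam.sum(axis=0) ** 2)        # all k constructs
    error_var = np.sum(1.0 - np.sum(Lam ** 2, axis=1))  # subtest-specific part
    return target_var / (construct_var + error_var)

# Same illustrative loading matrix as above (g in column 0, specific in column 1):
Lam = np.array([[0.70, 0.42],
                [0.76, 0.45],
                [0.72, 0.40],
                [0.74, 0.44]])
print(round(omega_h(Lam, target=0), 2))  # precision for g        -> 0.68
print(round(omega_h(Lam, target=1), 2))  # precision for specific -> 0.23
```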
Nested-Factor Model
How can the reliabilities of scale scores be computed in the context of the nested-
factor model? Because each subtest is assumed to be influenced by both general cognitive
ability gNF and a certain specific ability, the scale score's reliability can be computed in the
same way as for the higher-order factor model, using omega (see Equation 2) and omega
hierarchical (see Equation 3).
For example, the reliability of the Verbal Comprehension score is computed as the ratio of
variance attributable to gNF and VCNF,specific (i.e., adding up the squared sums of the
corresponding standardized factor loadings of subtests on gNF and VCNF,specific; see Figure 1d)
to the total variance of this scale score (i.e., adding up the squared sums of the corresponding
standardized factor loadings of subtests on gNF and VCNF,specific plus the sum of the
corresponding subtest-specific variances). The
value of ω = .91 indicates that 91% of the variance in the Verbal Comprehension score is
attributable to the blend of gNF and VCNF,specific. This omega value therefore reflects how well
the Verbal Comprehension score measures the blend of general cognitive ability and specific
Verbal Comprehension.
Omega hierarchical, in contrast, reflects how well the Verbal Comprehension score
assesses VCNF,specific. It is computed as the ratio of variance attributable to VCNF,specific (i.e., the
squared sum of the corresponding standardized factor loadings) to
the total variance of the Verbal Comprehension score. The total variance is computed in the
same way as for omega. The value of ωh = .23 thus indicates that 23% of the variance in the
Verbal Comprehension scale score is attributable to VCNF,specific. This is the reliability (i.e.,
measurement precision) with which the Verbal Comprehension score assesses VCNF,specific.
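For completeness, the same computations can be sketched for the nested-factor model directly; the loading matrix below is hypothetical (the individual WAIS–III estimates appear in Figure 1d) and merely yields values close to those reported:

```python
import numpy as np

# Hypothetical nested-factor loadings of the four verbal subtests on g_NF
# (column 0) and VC_NF,specific (column 1); illustrative values only.
Lam = np.array([[0.72, 0.40],
                [0.75, 0.43],
                [0.70, 0.38],
                [0.73, 0.45]])

construct_var = np.sum(Lam.sum(axis=0) ** 2)         # g_NF plus VC_NF,specific
error_var = np.sum(1.0 - np.sum(Lam ** 2, axis=1))   # subtest-specific variances
omega = construct_var / (construct_var + error_var)
omega_h = Lam[:, 1].sum() ** 2 / (construct_var + error_var)
print(round(omega, 2), round(omega_h, 2))            # -> 0.90 0.22
```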
Table 5 shows how well scale scores assessed the blend of general and specific abilities (in
terms of ω) and a certain ability construct (in terms of ωh). As for the higher-order factor
model, scale scores showed relatively low reliability (in terms of ωh) in assessing specific
abilities.
Statistical Requirements
Omega and omega hierarchical are based on parameter estimates (i.e., estimates of
factor loadings and factor variances) that are derived for a certain CFA model. Hence, two
vital statistical requirements need to be fulfilled: (1) Proper interpretation of omega and
omega hierarchical requires that the target model fits the empirical data well (Bentler, 2009;
McDonald, 1999; Yang & Green, 2010). (2) Parameter estimates need to be precise.
We first address the evaluation of model fit, which should achieve an optimal balance
between the fit of the model to the empirical data, on the one hand, and theoretical
considerations, on the other. There has been considerable debate on which fit indices should
be used and on the strategies applied to evaluate model fit (Hu & Bentler, 1999; Jöreskog,
1993; Marsh, Hau, & Wen, 2004; McDonald, 2010). Although no consensus has yet been
reached, several methodologists strongly recommend comparing the preferred target model
with several a priori specified and theoretically supported alternatives. This approach takes
into account that cutoff values of model fit indices are model dependent, considers alternative
explanations of the data, and allows some models to be ruled out while giving stronger
support for others (MacCallum & Austin, 2000; Marsh, Hau, & Grayson, 2005; West, Taylor,
& Wu, in press). In this tutorial, for example, we computed omega for the General Cognitive
Ability score using the results obtained for the one-factor model for illustrative purposes.
However, we would be very cautious in interpreting this value as the reliability of the General
Cognitive Ability score to assess the latent construct general cognitive ability, because the
one-factor model provided only a modest fit to the data and—even more importantly—a
poorer fit than alternative CFA models. As the nested-factor model was theoretically derived
and provided the best fit of the four models under investigation, we would use the model
parameters obtained for this model to compute score reliability (in terms of ω and ωh).
We now turn to the precision of model parameters, which is affected by two key
factors. First, sample size needs to be sufficiently large to obtain trustworthy estimates of
model parameters (Yang & Green, 2010).5 In general, a larger sample size is always better,
and a sample size of N ≥ 200 allows proper estimation of model parameters (e.g., nonnegative
variance estimates) in many typical applications (Boomsma &
Hoogland, 2001). There is also growing consensus that the required sample size depends on
the properties of the model investigated and the data to be analyzed: A higher ratio of
measures per factor and higher factor loadings may compensate for smaller sample size
(Marsh, Hau, Balla, & Grayson, 1998; Yang & Green, 2010). Thus, methodologists strongly
encourage applied researchers to conduct Monte Carlo studies of the target CFA models to
determine the required sample size (L. K. Muthén & Muthén, 2002). For example, previous
simulation studies have demonstrated that trustworthy model-based reliability estimates may
be obtained even with relatively small sample sizes (e.g., N = 100; see Zinbarg et al., 2006).
Second, parameters for CFA models are typically derived by maximum likelihood
estimation, which requires continuous raw data that follow a multivariate normal distribution.
However, many studies in personality and interindividual differences research administer self-
report items with a limited number of response options (e.g., "disagree," "disagree
somewhat," "agree somewhat," "agree"); therefore, the assumption that raw data are
continuous may not be tenable. Moreover, empirical data frequently fail to follow a normal
distribution. What, then, can be done? Model parameters—including factor loadings and variances of subtest-
specific factors—can nevertheless be trusted when
three conditions are fulfilled: the raw data are continuous, the sample size is reasonably large,
and the assumption of multivariate normality is not severely violated. Parameter estimates are
quite robust to violations of the multivariate normality assumption as long as the indicators
are "reasonably" continuous. For example, a recent simulation study by Rhemtulla, Brosseau-
Liard, and Savalei (2010) demonstrated that maximum likelihood-based estimation methods
yield acceptable parameter estimates for CFA models under a wide range of conditions, even
when distributional assumptions are violated to some degree. Researchers may also
tackle the problem—for example, by employing alternative (robust) estimation methods with less
stringent distributional assumptions or transforming the input data to better match the
distributional assumptions. Modern software packages used to study CFA models include
robust estimation methods, such as robust maximum likelihood estimation (Satorra, 1990) and
robust weighted least squares estimation (B. O. Muthén, 1984; B. O. Muthén & Kaplan,
1985). These estimation methods may yield higher precision (a) to assess model fit, (b) to
compute standard errors of model parameters, and (c) in the case of robust weighted least
squares, to estimate the model parameters themselves. Thus, robust weighted least squares
may also be an appropriate method for analyzing item-level data from items with fewer than
four response categories (Rhemtulla et al., 2010). Further information on this method can be
found in Wirth and Edwards (2007), who provide an excellent review of factor models and
various estimation methods for item-level data. Moreover, robust maximum likelihood
estimation allows the use of omega and omega hierarchical as explained in this tutorial; in the
case of weighted least squares estimation, score reliability may be estimated using approaches developed for categorical data (e.g., Green & Yang, 2009; Raykov, Dimitrov, & Asparouhov, 2010).
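To make this setup concrete, a minimal Mplus sketch for ordinal self-report items is given below (the file name, item names, and two-factor structure are hypothetical, chosen purely for illustration):

TITLE: CFA for ordinal items with robust weighted least squares (sketch);
DATA: FILE = items.dat;       ! hypothetical data file
VARIABLE:
  NAMES = u1-u12;
  CATEGORICAL = u1-u12;       ! declare items as ordered-categorical
ANALYSIS:
  ESTIMATOR = WLSMV;          ! robust weighted least squares
MODEL:
  f1 BY u1-u6;                ! first hypothetical specific factor
  f2 BY u7-u12;               ! second hypothetical specific factor

For continuous but non-normal measures, omitting the CATEGORICAL statement and requesting ESTIMATOR = MLR (robust maximum likelihood) would be the analogous choice.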
Alternatively, item scores that are intended to measure the same construct(s) may be
integrated into parcel scores. Subtest scores (as applied in this article) are a special case of
parcel scores (i.e., all items making up a subtest are integrated into one subtest score). Parcel
scores may then be used as manifest measures of the latent variables in CFA models, and
model parameters can be estimated by (robust) maximum likelihood procedures. Parcel scores
may have several advantages over item scores: they show better distributional properties (i.e.,
normality), keep the ratio of observable measures to latent constructs manageable, and
increase the chances of adequate model fit (Bagozzi & Edwards, 1998; Hall, Snell, & Singer
Foust, 1999; Little, Cunningham, Shahar, & Widaman, 2002; West, Finch, & Curran, 1995).
Crucially, when parcel scores are applied, two key requirements need to be fulfilled: (a) The
parcel scores must adequately represent the target construct(s) (Bagozzi & Edwards, 1998;
Little et al., 2002). (b) The dimensional structure underlying the items needs to be taken into
account. Otherwise, inaccurate parameter estimates and model fit statistics will result (Hall et
al., 1999; Little et al., 2002). Ideally, the inter-item structure is unidimensional (Little et al.,
2002). For example, when a one-factor model fits reasonably well to a set of items, these
items may be randomly distributed to parcels (for other parceling strategies, see Hall et al., 1999). A minimal sketch of this procedure is given below.
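In Mplus, parcels can be built with the DEFINE command and then used as indicators (the item names are hypothetical, and the random assignment of the nine items to three triplets is assumed to have been fixed beforehand):

TITLE: Building parcels from unidimensional items (sketch);
DATA: FILE = items.dat;       ! hypothetical data file
VARIABLE:
  NAMES = i1-i9;
  USEVARIABLES = p1-p3;       ! analyze the parcels, not the items
DEFINE:
  p1 = MEAN(i1-i3);           ! hypothetical random triplet 1
  p2 = MEAN(i4-i6);           ! hypothetical random triplet 2
  p3 = MEAN(i7-i9);           ! hypothetical random triplet 3
MODEL:
  f BY p1-p3;                 ! parcels serve as manifest indicators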
Given that the statistical requirements for CFA are met, several aspects of the methods presented in this tutorial warrant further consideration. First, in this article, we applied WAIS–III subtest data to estimate the reliability of scale scores assessing constructs at the two upper levels of the hierarchy (i.e., specific and general ability constructs located at the intermediate and the
top levels of the ability hierarchy). Importantly, the methods outlined in this tutorial may also
be applied to obtain model-based reliability estimates when item scores are used (in place of
subtest scores) or when more than two levels of the construct hierarchy are investigated.
Second, a model-based approach to estimating score reliability may render not only
the computation but also the concept of score reliability more complex. CTT defines score
reliability "as the proportion of true-score variance, without considering the composition of
the true score," whereas model-based approaches to reliability "decompose the true-score
variance into different variance components," and the researcher has to decide which variance
components should count as reliable variance. When one-factor
models and first-order factor models are applied, the omega and omega hierarchical values of
a scale score are identical, as the reliable ("construct score") variance of the corresponding
scale scores is not divided into variances attributable to general/higher-order and specific
constructs. In these models, the classic definition of score reliability as the proportion of "construct score" variance to
total score variance applies. In the case of the one-factor model and the first-order factor
model, omega therefore indicates the precision with which a scale score assesses a certain
target construct. Interestingly, this interpretation of omega converges with the concept of
construct validity—the extent to which a measure assesses the construct it was designed to
measure (Bollen, 1989, p. 195; McDonald, 1999, p. 63 and p. 208). Note, however, that this
interpretation of omega applies only for researchers who conceive of validity as a quantitative
concept, and not for those who conceive of validity as a qualitative concept (i.e., a measure is
or is not valid to assess a certain target construct). The latter researchers may consider two
measures to be valid, but one to be more reliable (Borsboom, Mellenbergh, & van Heerden,
2004, p. 1070). Omega is thus an index of reliability in terms of measurement precision only.
In contrast to the one-factor and the first-order factor model, the higher-order factor
and the nested-factor model imply that scale scores can be conceived of as assessing more
than one ability construct simultaneously. Thus, these models involve two forms of reliability:
(a) omega indicates how precisely a score measures the blend of general and specific
constructs, whereas (b) omega hierarchical indicates how precisely a score measures a certain
target construct at a certain level of the hierarchy. Omega is therefore closely tied to the
classic definition of score reliability (Bollen, 1989, p. 221): it reflects the ratio of the total amount of
reliable score variance to the total scale score variance. When constructs are hierarchically
structured, however, the composition of reliable ("true score") variance entered in the
computation of omega is complex. Omega values are therefore ambiguous with respect to the
key question of how precisely a score assesses a certain target construct. It is omega
hierarchical that provides this information. For researchers who conceive of validity as a
quantitative concept (see above), it is also omega hierarchical that may be interpreted in terms of construct validity.
Third, in applied assessment, the focus of interest may also be on the total amount of
reliable variance of a scale score at the bottom level of the hierarchy (e.g., the Information
subtest score). Several approaches can be applied to obtain such a reliability estimate: (a) The total reliable variance of such a scale score can be defined as the
degree to which this score is free of "error" in terms of the subtest-specific factors ei (Bollen,
1989, pp. 220-221). For example, when using the standardized results obtained for the nested-
factor model (Figure 1d), the total reliable variance of the Information subtest score is
estimated by 1 – eNF,1 = 1 – .34 = .66. As noted above, however, the subtest-specific factors
may comprise both reliable subtest-specific variance and unreliable variance attributable to
random measurement error (Bollen, 1989, pp. 219-221). Hence, the result may be interpreted
as a lower bound on the score's reliability, because reliable subtest-specific variance is not taken into account. Several alternatives based on the interrelationships of the items
entering the scale score can be used to overcome this problem: (b) alpha (Cronbach, 1951), (c)
reliability estimates based on a unidimensional nonlinear factor analytic model (Green &
Yang, 2009; Raykov, Dimitrov, & Asparouhov, 2010), or (d) a unidimensional item-response
model (Mellenbergh, 1996). In many cases, these reliability estimates can be expected to be
larger than those obtained by approach (a), because they can take into account reliable
subtest-specific variance. Crucially, when interpreting the values obtained from approaches
(a) to (d), researchers should be aware that scale scores at the bottom level of the hierarchy
may measure several constructs simultaneously. Thus, these reliability estimates may reflect
the precision with which a scale score assesses the blend of higher-order constructs (approach
[a]) or the blend of subtest specific and higher-order constructs (approaches [b] to [d]).
Fourth, latent variable models are blind to several threats to the validity of statistical
relationships between latent variables (Cohen, Cohen, West, & Aiken, 2003). Obtaining
manifest composite scores and visually inspecting plots can therefore help to carefully
diagnose regression relationships (Cohen et al., 2003; West, 2006; Wilkinson & Task Force
on Statistical Inference, 1999). Estimates of omega and omega
hierarchical (if the higher-order or nested-factor model is empirically supported) may thus serve a useful purpose, as these
reliability coefficients inform on the measurement precision with which a manifest score
assesses a certain latent target construct. Specifically, when hierarchical construct definitions
are endorsed, omega hierarchical may help (see also Zinbarg et al., 2006, p. 124) to evaluate
regression relationships and to judge whether unexpected results (based on manifest scores)
are due to random error or to the fact that the score does not precisely measure the target
construct.
Fifth, as noted in the review by Hogan, Benjamin, and Brezinski (2000), researchers
often use alpha (e.g., Cronbach, 1951) to estimate score reliability (see also Streiner, 2003).
However, alpha is not suitable for estimating the reliability of measures of hierarchically
structured constructs that operate at various levels of generality. Alpha does not indicate how
reliably either (a) a specific construct or (b) the general construct can be measured. On the
contrary, the alpha value reflects a complex blend of variance attributable to the general
construct, variance attributable to the specific constructs, and variability in factor loadings of
the scale indicators assessing general and specific constructs (Zinbarg et al., 2005). Hence,
unlike omega or omega hierarchical, alpha is not suitable for estimating score reliability when
the measures conform to a higher-order factor model or a nested-factor model.
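For comparison, coefficient alpha for a scale of k items with item variances σ²Yi and total score variance σ²X is (Cronbach, 1951):

\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_{Y_i}}{\sigma^2_X}\right).

Nothing in this expression distinguishes general-construct variance from specific-construct variance, which is why alpha cannot separate (a) from (b) above.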
Sixth, in this tutorial we computed the values of omega and omega hierarchical by
hand (see Table 4). However, these values can also be estimated by structural equation modeling
software (e.g., Mplus; L. K. Muthén & Muthén, 1998–2010) within the methodological
approach presented by Cheung (2009). The necessary Mplus syntax is provided in the online
supplement. The "omega" function of the R package "psych" (Revelle, 2010) is an excellent
tool for computing omega hierarchical for a general factor derived by exploratory factor
analysis.
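A minimal sketch of this model-based computation in Mplus uses labeled parameters and a new parameter defined in MODEL CONSTRAINT (the variable names and single-factor structure are hypothetical; for the models in this article, the corresponding labels from the Appendix syntax would be used):

TITLE: Omega as a new parameter (illustrative sketch);
DATA: FILE = subtests.dat;    ! hypothetical data file
VARIABLE: NAMES = y1-y4;
MODEL:
  f BY y1* (l1)
       y2-y4 (l2-l4);         ! labeled factor loadings
  f@1;                        ! identify the model via the factor variance
  y1-y4 (e1-e4);              ! labeled residual variances
MODEL CONSTRAINT:
  NEW(omega);
  omega = (l1+l2+l3+l4)**2 /
          ((l1+l2+l3+l4)**2 + e1+e2+e3+e4);
OUTPUT: CINTERVAL;            ! interval estimate for the new parameter

The standard error that the software reports for the new parameter is what makes the confidence interval construction discussed by Cheung (2009) possible.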
Seventh, the results of the higher-order factor and the nested-factor model converged
to show that score reliabilities in terms of omega were satisfactory, whereas reliabilities of
scores to assess specific constructs in terms of omega hierarchical were relatively low (Table
5). For example, for the nested-factor model, omega values ranged from .88 (Working
Memory) to .93 (Perceptual Organization), whereas omega hierarchical values ranged from
.05 (Perceptual Organization) to .23 (Verbal Comprehension). It was only the General
Cognitive Ability score that captured gNF with sufficient measurement precision (ωh = .93).6
This pattern of results reflects the operation of a strong general construct that explains much of the common variance in the manifest measures. Thus, the
observation that values of omega hierarchical are low for scale scores capturing specific
constructs may generalize to other hierarchically structured constructs, such as traditional self-report dimensions (DeYoung, 2006), disorders like depression (Steer et al.,
1999; Tanaka & Huba, 1984), subjective wellbeing (Chen et al., 2006; Gallagher et al., 2009),
and intelligence. For example, in the assessment of intelligence, it can be assumed—as a rule
of thumb—that "general intelligence accounts for roughly 50% of the common variance" (Lubinski, 2004, p.
98). Hence, the observation that scores show relatively low measurement precision (in terms of omega hierarchical) for assessing specific constructs is to be expected in this domain.
In diagnostic settings, the low values of omega hierarchical obtained for the WAIS–III
scores assessing specific constructs are problematic, because they imply large confidence
intervals around a respondent‘s scale score. Any interpretations of a person‘s level of specific
ability therefore involve great uncertainty. When scale scores are interpreted to represent a
blend of general and specific ability constructs, however, the values of omega of the scores
are found to be satisfactory. Hence, the confidence intervals around the scale scores are
relatively small. In this case, however, interpretations of scale scores capturing specific
constructs should take into account that these scores represent the joint functioning of general
and specific abilities (when higher-order factor models or nested-factor models are applied).
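The arithmetic behind these confidence intervals is standard classical test theory; which reliability estimate is plugged in (ω or ωh) depends on the intended interpretation of the score:

SE_M = \sigma_X\sqrt{1-\rho_{XX'}}, \qquad X \pm 1.96\,SE_M \ \text{(95\% confidence interval)}.

With the nested-factor results above, for example, treating the Perceptual Organization score as a measure of the specific construct (ρ = ωh = .05) yields a vastly larger SE_M than treating it as a blend of general and specific abilities (ρ = ω = .93).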
Taken together, model-based estimates of score reliability presuppose that the statistical
assumptions of CFA are met and that the target model provides a good fit to the data. The
choice of a reliability index (ω or ωh) should then be guided by (a) the CFA model applied and (b) the intended interpretation of the scale scores.
General Discussion
In this tutorial, we elaborated on four different kinds of CFA models that are widely
used to study various aspects of the theoretical proposition that psychological constructs are
hierarchically structured. In this section, we synthesize the key points of our theoretical and
empirical analyses: For which questions in personality and individual differences research is
each model most appropriate? And which model supports the computation of which
composite scores?
One-Factor Model
The one-factor model focuses on a single, very general construct. Thus, it is most
applicable when (a) a general construct is hypothesized to account for the common variance
among measures, (b) no specific constructs beyond that general construct are predicted to
account for the common variance, (c) the research objective is to study a general construct
(e.g., its relationships to other constructs, sociodemographic characteristics, or life outcomes), (d) the fit of a one-factor model is satisfactory, and (e) more
complex models do not provide a better fit than the one-factor model. Model comparison is
crucial here to evaluate whether specific constructs, as specified in the first-order factor model, are empirically distinguishable or whether they collapse into a single general construct. If the one-factor model is
theoretically supported and provides a good fit to the data, it is inconsistent to use anything
but a total scale score for research and diagnostic purposes. An appropriate reliability index for the total scale score is omega.
First-Order Factor Model
Relative to the one-factor model, the first-order factor model focuses on constructs that
are narrower in scope. It is therefore most applicable when (a) (mutually correlated) first-
order constructs are hypothesized to account for the common variance among corresponding
measures, or (b) no general construct or higher-order construct is predicted to operate, and (c)
the research objective is to study the relationship between specific constructs (but not higher-order constructs), on the one hand, and other constructs, sociodemographic
characteristics, or life outcomes, on the other. Moreover, the first-order factor model is (d) a useful tool to
analyze the multifaceted nature of psychological constructs in terms of their convergent and
discriminant relationships (see Stoel et al., 2006, for appropriate statistical methods for testing
the discriminability of latent constructs). Further, (e) as the relationships among first-order
constructs may point to the operation of more general or higher-order constructs, careful
examination of first-order factor models is the traditional starting point for analyses of higher-
order factor models that include such constructs (Marsh, 1987). Hence, the first-order factor
model is particularly useful as a comparison model for more restrictive higher-order factor
models. If the first-order factor model provides a good fit to the data, the use of specific scale
scores for research and diagnostic purposes is strongly supported, but the use of a total scale
score is not supported (McDonald, 1999, p. 208). An appropriate reliability index for the
specific scale scores is omega. Note that, in empirical applications, first-order factors
are typically found to be correlated. Thus, researchers should be aware of two issues: First, as
the first-order factor model itself contains no general construct, researchers need a theoretical explanation of the processes by
which their specific constructs (as represented by first-order factors) are correlated (e.g., see
van der Maas et al., 2006, for intelligence). Second, if specific constructs as specified in the
first-order factor model are used as predictors in a regression context, the regression
coefficients may be affected by multicollinearity. This may render both the substantive
interpretation of regression coefficients and their sampling stability problematic (Cohen et al.,
2003): standard errors of the coefficients will be inflated, and confidence intervals will be large, implying high uncertainty about the magnitude of an effect.
Note that multicollinearity cannot occur in the other three CFA models: In the one-factor
model, a single construct is specified; in the higher-order factor model and the nested-factor
model, general and specific constructs are specified to be mutually uncorrelated.
Higher-Order Factor Model and Nested-Factor Model
Commonalities
Both the higher-order and the nested-factor models focus on the construct hierarchy in
its entirety and contain constructs operating at different levels of generality. Thus, both models may be applied when (a) a general construct is hypothesized to
account for the common variance among measures, (b) multiple specific constructs are
hypothesized to account for common variance over and above the general construct, and (c) the research objective is to study specific and general
constructs (Chen et al., 2006). Further, (d) if there is no theoretical explanation of why first-
order constructs are correlated (see above), many methodologists recommend including
constructs with the widest possible generalizability in their models to provide "the fullest
possible understanding of the data" (Gorsuch, 1983, p. 255). This rationale applies equally to
higher-order and nested-factor models. As both models include constructs that operate at two
levels of generality, both support the computation of specific scale scores and a total scale
score. Crucially, these scale scores may be interpreted with respect to either (a) the blend of
general and specific constructs or (b) a certain target construct only. Depending on the
interpretation chosen, different reliability estimates apply: (a) omega or (b) omega
hierarchical.
Given that the higher-order factor model and the nested-factor model have much in
common, which of these two models should be used in a particular context? If the general
construct is the focus of research, and if the higher-order factor model fits the data as well as
the nested-factor model, the higher-order factor model is preferable to the nested-factor model
because it is the more parsimonious of the two.
In all other cases, but particularly in applications where the research focus is on the
relation of latent general and specific constructs to external criteria, the nested-factor model is
preferable to the higher-order factor model. More specifically, the higher-order factor model
is subject to the proportionality constraint, which renders the estimated relations of external
criteria to general and specific constructs linearly dependent (Schmiedek & Li, 2004). Thus, it
is not possible to examine the associations between all general and specific constructs as
specified in the higher-order factor model and external criteria (for further discussion, see
Schmiedek & Li, 2004). The nested-factor model, in contrast, allows the relationships
between general and all specific constructs, on the one hand, and external variables, on the
other, to be studied simultaneously. When the higher-order factor model is applied,
researchers need to constrain the relationship of one construct (either the general construct or
any of the specific constructs) with an external criterion variable to zero. However, this
requires the correct identification of these zero relationships; otherwise, model parameters
may be biased (see the discussion by Schmiedek & Li, 2004, on potential problems in
identifying these relationships). Taken together, all constructs specified in the nested-factor
model may be linked to external variables. This key advantage makes structural models
building on the nested-factor model a fruitful approach to implementing the specificity matching principle (e.g., Swann, Chang-
Schneider, & McClarty, 2007; see also Wittmann, 1988). According to this principle, it is best
to use specific predictor variables (e.g., mathematical ability test scores) to predict specific
outcomes (e.g., mathematics grades); likewise, it is best to use general predictor variables
(e.g., intelligence g) to predict general outcomes (e.g., grade point average). Application of
this principle has helped to reconcile opposing perspectives on the power of personality traits
(Fleeson, 2004), attitudes (Ajzen & Fishbein, 2005), and (perhaps) self-concepts (Marsh &
Craven, 2006; Swann et al., 2007) to explain key outcome variables at different levels of
generality. In intelligence research, for example, the nested-factor model has helped researchers to understand the interplay between general and
specific abilities, on the one hand, and school grades, academic interests, and students‘
socioeconomic status, on the other (Brunner, 2008; Gustafsson & Balke, 1993). From a wider
perspective, these studies clearly demonstrate the value of measures assessing general and
specific constructs, because both general and specific constructs play a key role in
explaining important outcome variables. In empirical applications of higher-order and nested-factor models, however, it is not uncommon for the specific factors to collapse—as was the case, for example, in the Gustafsson
and Balke (1993) study on the structure of intelligence or in the Chen et al. (2006) study on
the structure of quality of life. Moreover, in the present tutorial we had to constrain the factor
loadings of the two subtests on PSNF,specific to be equal because the nested-factor model was
otherwise not identified (see the note to Figure 1). Thus, when applying these
models, researchers need to carefully examine parameter estimates, standard errors, and
variances of general and specific factors: Parameters out of the range of admissible values
(e.g., standardized factor loadings greater than 1 or negative variances of latent variables),
large standard errors, or variances of general or specific factors very close to zero may
signal such estimation problems.
A further limitation of the higher-order factor and nested-factor models concerns the
assumption that general and specific constructs are mutually uncorrelated. If this constraint is
removed and the correlations among general and specific constructs are freely estimated,
identification problems are likely to occur (Chen et al., 2006; Rindskopf & Rose, 1988). What
can be done to overcome this problem? If correlations among specific factors and/or the
general factor are of interest, there are several ways to identify the higher-order factor model
and the nested-factor model. First, equality constraints may be imposed on factor loadings.
For example, Wilhelm and Oberauer (2006) analyzed the relationship between reasoning
ability, working memory, and mental speed. They used experimental manipulations to design
measures of mental speed and examined precisely formulated hypotheses on how cognitive
processes involved in these measures should be reflected in the pattern of factor loadings. These constraints then allowed them to investigate the correlation among general
and specific cognitive constructs. Note that constraining model parameters to be equal (or to
any other predefined values; see below) assumes that these constraints hold in the target
population; such constraints therefore require careful evaluation of model fit (particularly of local misfit; see Tomarken & Waller, 2003).
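A minimal sketch of this first strategy in Mplus (hypothetical measures and factor structure; within each specific factor, a shared label constrains the loadings to be equal, which frees the factor correlations for estimation):

TITLE: Nested-factor model identified via equality constraints (sketch);
DATA: FILE = measures.dat;    ! hypothetical data file
VARIABLE: NAMES = y1-y9;
MODEL:
  g BY y1-y9*;                ! general factor on all measures
  s1 BY y1-y3* (ls1);         ! loadings within s1 constrained equal
  s2 BY y4-y6* (ls2);         ! loadings within s2 constrained equal
  s3 BY y7-y9* (ls3);         ! loadings within s3 constrained equal
  g@1; s1-s3@1;               ! fix factor variances to 1
  g WITH s1-s3@0;             ! keep general and specific factors orthogonal
  s1 WITH s2 s3;              ! correlations among specific factors free
  s2 WITH s3;

Whether such constraints are defensible is a substantive question; as noted above, they should be checked against (local) model fit.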
Second, following Graham and Collins (1991), researchers may "borrow strength" to achieve identification by
including additional variables in the model. Specifically, external variables that are known to
predict only one of the factors and/or that are uniquely predicted by one of the factors may be
included. However, this again requires correct specification of the relationships, as model
parameters may otherwise be biased (Schmiedek & Li, 2004). Third, factor correlations
among specific factors may be fixed to certain values that are informed by substantive
theoretical considerations. Note that in this case the orthogonality between the general factor
and the specific factors needs to be maintained. Fourth, when a nested-factors model is used,
removing one specific factor may allow correlations among the remaining specific factors
while retaining the orthogonality of the general and specific factors. Importantly, the specific
factor to be removed should represent the standard method of operationalizing the general
construct (Eid, Lischetzke, Nussbeck, & Trierweiler, 2003). In intelligence research, for
example, reasoning tasks can be considered standard indicators of general cognitive ability,
as reasoning lies at the very center of the cognitive ability domain (Gottfredson, 1997; Snow, Kyllonen, & Marshalek, 1984). Thus, when this approach is
implemented, reasoning tasks are affected only by the factor representing general cognitive
ability.
Conclusion
Our goals in writing this tutorial are twofold. First, we offer an in-depth discussion of
the psychometric properties and the interpretation of four popular, but different, CFA models
that can be used to study hierarchically structured personality constructs. Ideally, this
discussion will encourage researchers to systematically compare their favorite CFA model
with theoretically supported alternative models (Jöreskog, 1993). This comparison will foster
a better understanding of psychological constructs operating at various levels of generality with respect to both components of psychological theories (Edwards &
Bagozzi, 2000): (a) how personality constructs are related to corresponding measures and (b)
how personality constructs are related to other constructs, sociodemographic characteristics, or life outcomes. Second, we hope that our tutorial will generate greater
awareness of model-based approaches to the computation of score reliability (Sijtsma, 2009)
and of the limitations of widely used reliability indices, such as alpha. Taken together, the guidance
provided in this tutorial may thus help researchers to implement the Standards for Educational and Psychological Testing (AERA, APA, & NCME,
1999) by providing a statistical rationale for the derivation and interpretation of scale scores.
References
Ajzen, I., & Fishbein, M. (2005). The influence of attitudes on behavior. In D. Albarracín, B.
T. Johnson, & M. P. Zanna (Eds.), The handbook of attitudes (pp. 173-221). Mahwah,
NJ: Erlbaum.
Bagozzi, R. P., & Edwards, J. R. (1998). A general approach for representing constructs in organizational research. Organizational Research Methods, 1, 45-87.
Bollen, K. A. (1989). Structural equations with latent variables. New York: John Wiley &
Sons.
Bonett, D. G. (2003). Sample size requirements for testing and estimating coefficient alpha.
Boomsma, A., & Hoogland, J. J. (2001). The robustness of LISREL modeling revisited. In R. Cudeck, S. du Toit, & D. Sörbom (Eds.), Structural equation models: Present and future (pp. 139-168). Lincolnwood, IL: Scientific Software International.
Borsboom, D., & Mellenbergh, G. J. (2002). True scores, latent variables, and constructs: A comment on Schmidt and Hunter. Intelligence, 30, 505-514.
Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111, 1061-1071.
Brown, T. A. (2006). Confirmatory factor analysis for applied research. New York: Guilford
Press.
Caruso, J. C., & Cliff, N. (1999). The properties of equally and differentially weighted WAIS-III factor scores. Psychological Assessment, 11, 198-206.
Chen, F. F., West, S. G., & Sousa, K. H. (2006). A comparison of bifactor and second-order models of quality of life. Multivariate Behavioral Research, 41, 189-225.
Cheung, M. W.-L. (2009). Constructing approximate confidence intervals for parameters with structural equation models. Structural Equation Modeling, 16, 267-294.
Clark, L. A., & Watson, D. (1991). Tripartite model of anxiety and depression: Psychometric evidence and taxonomic implications. Journal of Abnormal Psychology, 100, 316-336.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Erlbaum.
Colom, R., Abad, F. J., Garcia, L. F., & Juan-Espinosa, M. (2002). Education, Wechsler's Full Scale IQ, and g. Intelligence, 30, 449-462.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika,
16, 297-334.
Edwards, J. R., & Bagozzi, R. P. (2000). On the nature and direction of relationships between constructs and measures. Psychological Methods, 5, 155-174.
Eid, M., Lischetzke, T., Nussbeck, F. W., & Trierweiler, L. I. (2003). Separating trait effects from trait-specific method effects in multitrait-multimethod models: A multiple-indicator CT-C(M-1) model. Psychological Methods, 8, 38-60.
Fleeson, W. (2004). Moving personality beyond the person–situation debate: The challenge and the opportunity of within-person variability. Current Directions in Psychological Science, 13, 83-87.
Gallagher, M. W., Lopez, S. J., & Preacher, K. J. (2009). The hierarchical structure of well-being. Journal of Personality, 77, 1025-1050.
Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum
Associates.
Graham, J. W., & Collins, N. L. (1991). Controlling correlational bias via confirmatory factor analysis of MTMM data. Multivariate Behavioral Research, 26, 607-629.
Green, S. B., & Yang, Y. (2009). Reliability of summed item scores using structural equation modeling: An alternative to coefficient alpha. Psychometrika, 74, 155-167.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6, 430-
450.
Gustafsson, J. E., & Balke, G. (1993). General and specific abilities as predictors of school achievement. Multivariate Behavioral Research, 28, 407-434.
Hall, R. J., Snell, A. F., & Singer Foust, M. (1999). Item parceling strategies in SEM: Investigating the subtle effects of unmodeled secondary constructs. Organizational Research Methods, 2, 233-256.
Hogan, T. P., Benjamin, A., & Brezinski, K. L. (2000). Reliability methods: A note on the frequency of use of various types. Educational and Psychological Measurement, 60, 523-531.
Horn, J. L., & Noll, J. (1997). Human cognitive capabilities: Gf-Gc theory. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 53-91). New York: The Guilford Press.
Hu, L.-T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55.
Jöreskog, K. G. (1993). Testing structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 294-316). Newbury Park, CA: Sage Publications.
Little, T. D., Cunningham, W. A., Shahar, G., & Widaman, K. F. (2002). To parcel or not to
parcel: Exploring the question, weighting the merits. Structural Equation Modeling, 9,
151-173.
Loehlin, J. C. (2004). Latent variable models: An introduction to factor, path, and structural equation analysis (4th ed.). Mahwah, NJ: Erlbaum.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA:
Addison-Wesley.
Lubinski, D. (2004). Introduction to the special section on cognitive abilities: 100 years after Spearman's (1904) "'General intelligence,' objectively determined and measured." Journal of Personality and Social Psychology, 86, 96-111.
Marsh, H. W. (1987). The hierarchical structure of self-concept and the application of hierarchical confirmatory factor analysis. Journal of Educational Measurement, 24, 17-39.
Marsh, H. W., & Craven, R. (2006). Reciprocal effects of self-concept and performance from a multidimensional perspective: Beyond seductive pleasure and unidimensional perspectives. Perspectives on Psychological Science, 1, 133-163.
Marsh, H. W., Hau, K. T., Balla, J. R., & Grayson, D. (1998). Is more ever too much? The number of indicators per factor in confirmatory factor analysis. Multivariate Behavioral Research, 33, 181-220.
Marsh, H. W., Hau, K.-T., & Grayson, D. (2005). Goodness of fit in structural equation models. In A. Maydeu-Olivares & J. J. McArdle (Eds.), Contemporary psychometrics (pp. 275-340). Mahwah, NJ: Erlbaum.
Marsh, H. W., Hau, K. T., & Wen, Z. (2004). In search of golden rules: Comment on hypothesis-testing approaches to setting cutoff values for fit indexes and dangers in overgeneralizing Hu and Bentler's (1999) findings. Structural Equation Modeling, 11, 320-341.
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum
Associates.
McGrew, K. S. (2009). CHC theory and the human cognitive abilities project: Standing on the shoulders of the giants of psychometric intelligence research. Intelligence, 37, 1-10.
Mellenbergh, G. J. (1996). Measurement precision in test score and item response models. Psychological Methods, 1, 293-299.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156-166.
Muthén, B. O., & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171-189.
Muthén, L. K., & Muthén, B. O. (1998–2010). Mplus user's guide (6th ed.). Los Angeles, CA: Muthén & Muthén.
Muthén, L. K., & Muthén, B. O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9, 599-620.
Raykov, T., Dimitrov, D. M., & Asparouhov, T. (2010). Evaluation of scale reliability with
binary measures using latent variable modeling. Structural Equation Modeling, 17,
265-279.
Revelle, W. (2010). Package 'psych'. Retrieved August 24, 2010, from http://personality-
project.org/r/psych_manual.pdf
Rhemtulla, M., Brosseau-Liard, P., & Savalei, V. (2010). How many categories is enough to treat data as continuous? Manuscript submitted for publication.
Rindskopf, D., & Rose, T. (1988). Some theory and applications of confirmatory second-order factor analysis. Multivariate Behavioral Research, 23, 51-67.
Schmid, J., & Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22, 53-61.
Schmiedek, F., & Li, S.-C. (2004). Toward an alternative representation for disentangling age-associated differences in general and specific cognitive abilities. Psychology and Aging, 19, 40-56.
Shavelson, R. J., Hubner, J. J., & Stanton, G. C. (1976). Self-concept: Validation of construct interpretations. Review of Educational Research, 46, 407-441.
Sijtsma, K. (2009). Reliability beyond theory and into practice. Psychometrika, 74, 169-173.
Slaney, K. L., & Maraun, M. D. (2008). A proposed framework for conducting data-based test analysis. Psychological Methods, 13, 376-390.
Snow, R. E., Kyllonen, P. C., & Marshalek, B. (1984). The topography of ability and learning correlations. In R. J. Sternberg (Ed.), Advances in the psychology of human intelligence (Vol. 2, pp. 47-103). Hillsdale, NJ: Erlbaum.
Steer, R. A., Ball, R., Ranieri, W. F., & Beck, A. T. (1999). Dimensions of the Beck Depression Inventory-II in clinically depressed outpatients. Journal of Clinical Psychology, 55, 117-128.
Steyer, R. (1989). Models of classical psychometric test theory as stochastic measurement models: Representation, uniqueness, meaningfulness, identifiability, and estimability. Methodika, 3, 25-60.
Stoel, R. D., Garre, F. G., Dolan, C., & van den Wittenboer, G. (2006). On the likelihood ratio test in structural equation modeling when parameters are subject to boundary constraints. Psychological Methods, 11, 439-455.
Swann, W. B., Jr., Chang-Schneider, C., & McClarty, K. L. (2007). Do people's self-views matter? Self-concept and self-esteem in everyday life. American Psychologist, 62, 84-94.
Tanaka, J. S., & Huba, G. J. (1984). Confirmatory hierarchical factor analyses of psychological distress measures. Journal of Personality and Social Psychology, 46, 621-635.
Tomarken, A. J., & Waller, N. G. (2003). Potential problems with "well fitting" models. Journal of Abnormal Psychology, 112, 578-598.
Tulsky, D. S., & Ledbetter, M. F. (2000). Updating to the WAIS–III and WMS–III: Considerations for research and clinical practice. Psychological Assessment, 12, 253-262.
van der Maas, H. L. J., Dolan, C. V., Grasman, R. P. P. P., Wicherts, J. M., Huizenga, H. M., & Raijmakers, M. E. J. (2006). A dynamical model of general intelligence: The positive manifold of intelligence by mutualism. Psychological Review, 113, 842-861.
Wechsler, D. (1997). Wechsler Adult Intelligence Scale–Third Edition. San Antonio, TX: The
Psychological Corporation.
West, S. G. (2006). Seeing your data: Using modern statistical graphics to display and detect
West, S. G., Finch, J. F., & Curran, P. J. (1995). Structural equation models with nonnormal variables: Problems and remedies. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 56-75). Thousand Oaks, CA: Sage.
West, S. G., Taylor, A. B., & Wu, W. (in press). Model fit and model selection in structural equation modeling. In R. H. Hoyle (Ed.), Handbook of structural equation modeling. New York: Guilford Press.
Wilhelm, O., & Oberauer, K. (2006). Why are reasoning ability and working memory capacity related to mental speed? An investigation of stimulus–response compatibility in choice reaction time tasks. European Journal of Cognitive Psychology, 18, 18-50.
Wilkinson, L., & Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.
Wirth, R. J., & Edwards, M. C. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12, 58-79.
Wittmann, W. W. (1988). Multivariate reliability theory: Principles of symmetry and successful validation strategies. In J. R. Nesselroade & R. B. Cattell (Eds.), Handbook of multivariate experimental psychology (2nd ed., pp. 505-560). New York: Plenum.
Yang, Y., & Green, S. B. (2010). A note on structural equation modeling estimates of reliability. Structural Equation Modeling, 17, 66-81.
Yung, Y.-F., Thissen, D., & McLeod, L. D. (1999). On the relationship between the higher-
order factor model and the hierarchical factor model. Psychometrika, 64, 113-128.
Zinbarg, R. E., Revelle, W., Yovel, I., & Li, W. (2005). Cronbach's alpha, Revelle's beta, and McDonald's omega h: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70, 123-133.
Zinbarg, R. E., Yovel, I., Revelle, W., & McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30, 121-144.
Footnotes
1
The online supplemental materials can be retrieved from www.emacs.uni.lu.
2
The maximum likelihood estimator used in this article is based on statistical theory for the analysis of covariance matrices. Applying this estimator to correlation matrices leads to improper standard error estimates of model parameters and to
misleading confidence intervals and test statistics unless certain constraints are imposed on
the model parameters (Cudeck, 1989; McDonald, 1999). Only the correlation matrix of the
subtest scores was provided for the Spanish standardization sample of the WAIS–III. To
obtain correct standard errors for model parameters, we therefore followed McDonald (1999,
pp. 193-195) and specified in all models under investigation (a) appropriate constraints and
(b) scaling parameters for the estimation of subtest-specific factors (i.e., e1 to e14). Mplus (L.
K. Muthén & Muthén, 1998–2010) syntax files are presented in the Appendix. Note that the
χ² value of the goodness-of-fit statistic of overall fit and the descriptive fit statistics (e.g.,
RMSEA, CFI, and SRMR) are typically not affected when correlation matrices are used instead of covariance matrices.
3
Note that the scale scores considered in this article were computed by applying unit weights to all subtests loading on the corresponding factors; they are therefore not identical to the index scores of the WAIS–III, which were computed with a selection of subtests
(Caruso & Cliff, 1999). Accordingly, our reliability estimates do not apply to the Full Scale
IQ or to the index scores of the WAIS–III for the Spanish standardization sample.
4
We estimated score reliability for CFA models in which factor loadings and variances
of subtest-specific factors could vary across manifest measures (reflecting the assumption of
congeneric measures). Note that CFA models may also be applied to estimate score reliability
in more restrictive measurement models (Bollen, 1989, p. 208) in which factor loadings are
constrained to be equal for all measures (tau-equivalent measures) or in which factor loadings
and the variances of the subtest-specific factors are constrained to be equal for all measures
(parallel measures).
5
Sample size also affects the precision of the estimation of alpha (Bonett, 2003). Thus,
alpha may not be preferable to omega or omega hierarchical, even in cases of small samples.
6
This pattern of results is akin to problems encountered in the interpretation of scale
score profiles, when differences between scale scores are computed to identify a person‘s
strengths and weaknesses. For example, the reliability of differences between WAIS–III index
scores assessing specific ability constructs was found to be low (when using unit weights to
compute index scores) as these scores proved to be strongly mutually correlated (e.g., Caruso & Cliff, 1999).
Acknowledgements
We thank the editor Stephen West and all four reviewers for their valuable comments on an
earlier version of this manuscript, and Susannah Goss for editorial support.
Table 1
Intercorrelations Among the Subtest Scores of the WAIS–III (as Obtained for the Spanish Standardization Sample)
Task score 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14.
1. Vocabulary (voc)a --
2. Similarities (sim) .755 --
3. Arithmetic (ari) .608 .596 --
4. Digit span (dig) .555 .566 .614 --
5. Information (inf) .715 .678 .661 .543 --
6. Comprehension (com) .729 .697 .554 .503 .671 --
7. Letter–number (let) .627 .612 .669 .759 .603 .567 --
8. Picture completion (pic_c) .616 .621 .567 .538 .599 .552 .612 --
9. Digit–symbol coding (cod) .606 .582 .576 .590 .532 .502 .689 .643 --
10. Block design (blo) .598 .605 .625 .567 .616 .496 .655 .679 .668 --
11. Matrices (mat) .657 .668 .699 .609 .634 .564 .692 .711 .711 .769 --
12. Picture arrangement (pic_a) .613 .623 .585 .568 .616 .574 .665 .677 .672 .692 .753 --
13. Symbol search (sym) .588 .57 .584 .563 .533 .494 .675 .623 .787 .673 .717 .670 --
14. Object assembly (obj) .560 .554 .537 .54 .538 .490 .597 .619 .627 .742 .689 .673 .649 --
Note. This table is reprinted from Intelligence, 30, R. Colom, F. J. Abad, L. F. Garcia, M. Juan-Espinosa, Education, Wechsler‘s Full Scale IQ, and
g, 449-462, Copyright (2002), with permission from Elsevier.
a
Labels in parentheses are those used in the Mplus syntax presented in the Appendix.
Table 2
Fit of the Four Factor Models to the WAIS–III Data
Model χ² df CFI RMSEA SRMR
Note. All χ² goodness-of-fit tests were statistically significant at p < .001. CFI = Comparative
Fit Index; RMSEA = Root Mean Square Error of Approximation; SRMR = Standardized Root
Mean Squared Residual.
Table 3
Standardized Factor Loadings of the Subtests of the WAIS–III as Obtained by Applying the
Schmid-Leiman Transformation to the Higher-Order Factor Model (Figure 1c)
Subtest gHO VCHO, specific POHO, specific WMHO, specific PSHO, specific eHO
Table 4
Example Equations for the Computation of Score Reliability (in Terms of ω and ωh)

General Cognitive Ability score, one-factor model:
ω = (.79 + .78 + .77 + .73 + .76 + .71 + .82 + .79 + .81 + .83 + .88 + .82 + .80 + .77)² / [(.79 + .78 + .77 + .73 + .76 + .71 + .82 + .79 + .81 + .83 + .88 + .82 + .80 + .77)² + (.38 + .39 + .41 + .47 + .42 + .50 + .33 + .38 + .35 + .32 + .23 + .32 + .36 + .41)] = .96

Verbal Comprehension score, first-order factor model:
ω = (.82 + .88 + .85 + .81)² / [(.82 + .88 + .85 + .81)² + (.33 + .23 + .27 + .34)] = .91

General Cognitive Ability score, nested-factor model:
ω = [(.70 + .76 + .74 + .70 + .77 + .83 + .77 + .87 + .81 + .74 + .82 + .71 + .81 + .81)² + (.42 + .45 + .43 + .41)² + (.20 + .21 + .20 + .22 + .21)² + (.33 + .37 + .32)² + (.36 + .36)²] / [(.70 + .76 + .74 + .70 + .77 + .83 + .77 + .87 + .81 + .74 + .82 + .71 + .81 + .81)² + (.42 + .45 + .43 + .41)² + (.20 + .21 + .20 + .22 + .21)² + (.33 + .37 + .32)² + (.36 + .36)² + (.33 + .22 + .27 + .34 + .37 + .27 + .36 + .20 + .30 + .34 + .20 + .39 + .21 + .22)] = .97
ωh = (.70 + .76 + .74 + .70 + .77 + .83 + .77 + .87 + .81 + .74 + .82 + .71 + .81 + .81)² / [(.70 + .76 + .74 + .70 + .77 + .83 + .77 + .87 + .81 + .74 + .82 + .71 + .81 + .81)² + (.42 + .45 + .43 + .41)² + (.20 + .21 + .20 + .22 + .21)² + (.33 + .37 + .32)² + (.36 + .36)² + (.33 + .22 + .27 + .34 + .37 + .27 + .36 + .20 + .30 + .34 + .20 + .39 + .21 + .22)] = .92

Verbal Comprehension score, nested-factor model:
ω = [(.73 + .76 + .75 + .66)² + (.35 + .46 + .39 + .50)²] / [(.73 + .76 + .75 + .66)² + (.35 + .46 + .39 + .50)² + (.34 + .22 + .28 + .31)] = .91
ωh = (.35 + .46 + .39 + .50)² / [(.73 + .76 + .75 + .66)² + (.35 + .46 + .39 + .50)² + (.34 + .22 + .28 + .31)] = .23
Table 5
Model-Based Variance Composition and Reliabilities (ω and ωh) of the WAIS–III Scale Scores
Scale score Construct Score variance % target construct % other constructs % error ω ωh
Nested-factor model
Verbal comprehension VCNF,specific 12.5 23.4 67.5 9.1 .91 .23
Perceptual organization PONF,specific 18.9 5.0 87.6 7.3 .93 .05
Working memory WMNF,specific 7.1 15.4 72.6 12.0 .88 .15
Processing speed PSNF,specific 3.6 15.9 72.1 11.9 .88 .16
General cognitive ability gNF 127.0 92.7 4.3 3.0 .97 .93
Note. g = general cognitive ability; VC = Verbal Comprehension; PO = Perceptual Organization; WM = Working Memory; PS = Processing Speed; OF = one-factor model; FO =
first-order factor model; HO = higher-order factor model; NF = nested-factor model.
Figure Captions
Figure 1. Alternative factor models for the WAIS–III: (a) One-factor model (OF), (b) first-
order factor model (FO), (c) higher-order factor model (HO), and (d) nested-factor model
(NF). g = general cognitive ability; VC = Verbal Comprehension; PO = Perceptual Organization; WM = Working Memory; PS = Processing
Speed. Note: All models were identified by fixing the variance of the latent factors to 1.00; all
other model parameters were freely estimated. For the higher-order factor model, the
variances of the specific first-order factors (i.e., VCHO,specific, POHO,specific, WMHO,specific, and
PSHO,specific) were thus constrained to 1 – squared factor loading of the corresponding first-
order factor on gHO (e.g., the variance of VCHO,specific was constrained to equal 1 – squared
factor loading of VC on gHO). Further, for the nested-factor model, the factor loadings of the
two subtests on PSNF,specific were constrained to be equal (the nested-factor model was otherwise not identified).
Figure 1
a. One-Factor Model. b. First-Order Factor Model. c. Higher-Order Factor Model. d. Nested-Factor Model.
[Path diagrams showing, for each model, the standardized factor loadings of the 14 WAIS–III subtests and the variances of the subtest-specific factors (e.g., eOF,1 = .42 for Information in the one-factor model); see the figure caption above.]