72 Inpress ATutorialon Hierarchically Structured Constructs

See
discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/232706750
A Tutorial on Hierarchically Structured

Constructs
Article in Journal of Personality · August 2012

DOI: 10.1111/j.1467-6494.2011.00749.x · Source: PubMed
CITATIONS READS
147 907
3 authors:
Martin Brunner Gabriel Nagy

Universität Potsdam Leibniz Institute for Science and Mathematic…
206 PUBLICATIONS 2,698 CITATIONS 102 PUBLICATIONS 1,621 CITATIONS
SEE PROFILE SEE PROFILE
Oliver Wilhelm
Ulm University
163 PUBLICATIONS 6,011 CITATIONS
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
Meta-heuristics in psychological assessment View project
Prosocial behavior View project
All content following this page was uploaded by Oliver Wilhelm on 18 October 2017.
The user has requested enhancement of the downloaded file.

Running Head: HIERARCHICALLY STRUCTURED CONSTRUCTS 1
A Tutorial on Hierarchically Structured Constructs
Martin Brunner
University of Luxembourg
Luxembourg
Gabriel Nagy
University of Tuebingen
Germany
Oliver Wilhelm
University of Ulm
Germany
Brunner, M., Nagy, G., & Wilhelm, O. (in press). A tutorial on hierarchically structured constructs. Journal of
Personality.
Contact: martin.brunner@uni.lu
HIERARCHICALLY STRUCTURED CONSTRUCTS 2
Abstract
Many psychological constructs are conceived to be hierarchically structured and thus to
operate at various levels of generality. Alternative confirmatory factor analytic (CFA) models
can be used to study various aspects of this proposition: (a) the one-factor model focuses on
the top of the hierarchy and contains only a general construct, (b) the first-order factor model
focuses on the intermediate level of the hierarchy and contains only specific constructs, and
both (c) the higher-order factor model and (d) the nested-factor model consider the hierarchy
in its entirety and contain both general and specific constructs (e.g., bifactor model). This
tutorial considers these CFA models in depth, addressing their psychometric properties,
interpretation of general and specific constructs, and implications for model-based score
reliabilities. The authors illustrate their arguments with normative data obtained for the
Wechsler Adult Intelligence Scale and conclude with recommendations on which CFA model
is most appropriate for which research and diagnostic purposes.
Keywords: latent constructs, confirmatory factor analysis, reliability, hierarchical factor
models
A Tutorial on Hierarchically Structured Constructs
Many psychological constructs are conceptualized to be hierarchically structured and
thus to operate at various levels of generality (Emmons, 1995). Hierarchically structured
constructs include traditional personality traits (DeYoung, 2006), self-concept (Marsh &
Craven, 2006; Shavelson, Hubner, & Stanton, 1976), disorders like depression (Steer, Ball,
Ranieri, & Beck, 1999; Tanaka & Huba, 1984), subjective wellbeing (Chen, West, & Sousa,
2006; Gallagher, Lopez, & Preacher, 2009), and intelligence (Carroll, 1993; McGrew, 2009).
For example, the study by DeYoung (2006) supported a hierarchical structure of personality
with two general personality constructs—stability and plasticity—at the top of the hierarchy
and the more specific Big Five personality dimensions at the next level in the hierarchy:
neuroticism (reversed), agreeableness, and conscientiousness were subordinate to stability;
extraversion and openness were subordinate to plasticity.
Crucially, neither general nor specific constructs are directly observable entities, but
rather (unobserved) latent variables that are reflected in observable scores on corresponding
measures. One key task for all personality and individual difference researchers is therefore to
choose from a variety of measurement models that link general and/or narrower, specific
constructs with observable measures in different ways. It is also this decision that provides a
statistical rationale for the computation of scale scores that reflect respondents‘ levels on
general and/or specific constructs. Such a rationale is required, for example, by the Standards
for Educational and Psychological Testing, which state that ―where composite scores are
developed, the basis and rationale for arriving at the composites should be given‖ and that
―the rationale and supporting evidence must pertain directly to the specific score […] to be
interpreted or used‖ (American Educational Research Association, American Psychological
Association, & National Council on Measurement in Education, 1999, p. 20). The
measurement model is also the crucial prerequisite for assessing score reliability (Cortina,
1993; Slaney & Maraun, 2008).

Taken together, measurement models are of critical importance for both research and
applied assessment. The major goal of this tutorial is therefore to guide personality and
individual differences researchers in their choice of a measurement model for hierarchically
structured constructs. To this end, we consider the psychometric properties and the
interpretation of general and specific constructs as conceptualized in four different kinds of
confirmatory factor analytic (CFA) models. Further, we elaborate on the implications of these
CFA models for the computation and interpretation of both scores and model-based estimates
of score reliability. We capitalize on recent psychometric advances (McDonald, 1999;
Zinbarg, Yovel, Revelle, & McDonald, 2006) to demonstrate how to compute the reliability
of scores that are intended to assess constructs at various levels of the hierarchy. We
synthesize our key points in the General Discussion, offering recommendations on which
CFA model is most appropriate for which questions in personality research. In so doing, we
note the potential inherent restrictions of specific CFA models when they are used to examine
how constructs relate to other variables (e.g., psychological constructs, sociodemographic
characteristics, or life outcomes). We illustrate our arguments by reference to the Wechsler
Adult Intelligence Scale–Third Edition (WAIS–III, Wechsler, 1997)—a widely used measure
of intelligence. Software code that can be used to examine the WAIS–III by means of the four
CFA models discussed and to compute model-based reliabilities of scores (Cheung, 2009) can
be downloaded from our website.1
Confirmatory Factor Analytic Models for Hierarchically Structured Constructs
Psychological theories can be divided into two components: one component that
specifies how theoretical constructs are related to corresponding measures and one component
that defines the mutual relationships of the theoretical constructs (Edwards & Bagozzi, 2000).
CFA models (and structural equation models in general) are useful statistical tools for
empirically examining both components. In this section, we discuss four popular CFA models
that may be applied in many areas of personality and individual differences research to study
hierarchically structured constructs. Here, we use these models to test alternative theories of
the structure of intelligence. More specifically, (a) the one-factor model, based on Spearman‘s
work (1904), contains only a general construct representing general cognitive ability (g). (b)
The first-order factor model reflects important ideas of the theory of fluid and crystallized
intelligence (Horn & Noll, 1997) and focuses on ability constructs that are narrower in scope
and specific to cognitive operations (e.g., processing speed) or content domains (e.g., verbal
comprehension). (c) The higher-order factor model and (d) the nested-factor model are
informed by current theories of the structure of intelligence (Carroll, 1993; McGrew, 2009).
These theories conceive general cognitive ability to be the most general construct at the apex
of the hierarchy; specific abilities that are narrower in scope are located at lower levels of the
ability hierarchy.
To study these alternative theories, we use the correlations among the subtests of the
WAIS–III (Table 1) as obtained for the Spanish standardization sample (Colom, Abad,
Garcia, & Juan-Espinosa, 2002, Table A3).2 This sample comprises data from 1,369 persons
(703 women and 666 men; age range: 15 to 94 years). It is representative of the Spanish
population in terms of educational level, geographical region, and residence in urban and rural
areas (Colom et al., 2002).
***** Please insert Table 1 about here *****
In this article, we investigate theories of the structure of intelligence that focus on
general cognitive ability and on specific abilities that are relatively broad in scope. We
therefore use subtest scores, which serve as adequate manifest measures for representing such
broad constructs (Bagozzi & Edwards, 1998). It is important to note that the CFA models
presented in this article can also be applied on the basis of item scores (e.g., rating-scale items
with response categories ranging from ―strongly disagree‖ to ―strongly agree‖), as is done in
many areas of personality and individual difference research. We elaborate on the use of item
scores when discussing the statistical requirements of the model-based estimation of score
reliabilities.
One-Factor Model
Charles Spearman (1904) found that measures of cognitive abilities are positively
correlated across domains. In his theory of intelligence, Spearman explained these
intercorrelations by the operation of a single factor gOF representing general cognitive ability.
These ideas are reflected in the one-factor model (OF, see Figure 1a), which focuses on a
general ability construct. (To distinguish the constructs specified in a particular model from
related constructs in other models, we use a subscript to index all corresponding factors; here,
OF). Specifically, the one-factor model predicts that individual differences in all subtests of
the WAIS–III (depicted as rectangles) are caused by individual differences in a single
common latent factor (depicted as an ellipse) that represents gOF. The influence of gOF on the
subtests of the WAIS–III is represented by a single-headed arrow. The one-factor model
implies that higher scores on gOF are associated with higher scores on all 14 subtests of the
WAIS–III. Hence, gOF is a very general construct as it influences all subtests.
************** Please insert Figure 1 about here **************
In Spearman‘s theory of intelligence, each ability measure is also influenced by a
second factor orthogonal to gOF (which is why he called it the two-factor theory of
intelligence). This second factor represents some specific ability that is required to complete a
certain subtest. Further, each subtest score may also be affected to some degree by random
measurement error. Both of these latter influences (i.e., reliable but subtest-specific variance
and unreliable error variance) are represented by a single factor eOF for each subtest. These
subtest-specific factors (i.e., eOF,1 to eOF,14) are depicted by the single arrows pointing to the
individual subtests in Figure 1a. Note that the factor eOF is specific to each subtest for two
reasons: (a) Unless two measures share measure-specific variance (e.g., when the same
subtest is applied at two successive points of measurement or when two self-report items have
similar wordings), it is not possible to disentangle the variance of a particular subtest that is
attributable to random measurement error from that attributable to specific variance. (b)
Measurement error is uncorrelated across subtests because its influence on subtests is random
(i.e., unpredictable). Thus, the factors eOF,1 to eOF,14 as well as gOF are assumed to operate
mutually independently; these factors are therefore specified to be mutually uncorrelated.
How well does the one-factor model fit the data from the Spanish standardization
sample of the WAIS–III? The overall model fit of the one-factor model is modest (see Table
2), suggesting that a single general construct does not adequately explain the associations
among the subtests. We therefore do not interpret the model parameters any further.
First-Order Factor Model
The one-factor model focuses on a very general ability construct (i.e., gOF) and does
not address abilities that are narrower in scope (and that may be located at intermediate levels
of the hierarchy). In the first-order factor model (FO), in contrast, each measure is assumed to
be influenced by a single first-order factor that influences a subset of the WAIS–III subtests
(see Figure 1b). Note that this conception of cognitive abilities, and the absence of a factor
representing general cognitive ability, is informed by a more recent version of the theory of
fluid and crystallized abilities (Horn & Noll, 1997). A meaningful first-order factor model for
the WAIS–III (e.g., Tulsky & Ledbetter, 2000) therefore conceives the 14 subtests to be
influenced by four mutually correlated first-order factors (the correlations are depicted in
Figure 1b as double-headed arrows). These first-order factors represent constructs that are
specific to either a content domain (i.e., Verbal Comprehension VCFO) or a cognitive
operation (i.e., Perceptual Organization POFO, Working Memory WMFO, Processing Speed
PSFO). For example, the subtests Information, Vocabulary, Similarities, and Comprehension
are all assumed to be affected by a single factor that represents the operation of the (domain-
specific) construct VCFO. Higher scores on VCFO are thus associated with higher scores on
these four verbal subtests. Further, each subtest is influenced by subtest-specific factors (i.e.,
eFO,1 to eFO,14) that represent an ability specifically required for a certain subtest as well as
measurement error. The latter two sources of variance cannot be separated (see above) and are
therefore represented as a single factor eFO that affects each subtest.
An important assumption of the first-order factor model is that the first-order factors
may be correlated. However, the first-order factor model typically does not specify a priori
the direction of the mutual associations of VCFO, POFO, WMFO, and PSFO by placing
restrictions on these correlations.
How well does the first-order factor model fit the data? The overall model fit of the
first-order factor model is good (see Table 2), indicating that four correlated first-order
constructs provide a reasonable explanation of the associations among the subtests of the
WAIS–III. Further, it is possible to statistically test whether four correlated first-order ability
constructs are better able to explain the data than is a single common factor representing
general cognitive ability gOF (Rindskopf & Rose, 1988)—the one-factor model is statistically
equivalent to a first-order factor model in which all factor correlations are fixed to r = 1.
However, when boundary values (here: r = 1) are involved, the difference between the 2
goodness-of-fit test values as obtained for the first-order factor model and the one-factor
model does not follow a 2 distribution. Hence, we applied a multi-step procedure developed
by Stoel and colleagues (Stoel, Garre, Dolan, & van den Wittenboer, 2006) to compute a test
statistic—a so-called chi-bar-square (  2 ) statistic—that is appropriate when model
parameters are fixed to their boundary values (the online supplement describes this procedure
and contains the corresponding software code). The critical value at  = .05 of the
corresponding  2 distribution was  2 = 8.82. The difference in 2 goodness-of-fit test values
was 2 = 1,408 and thus considerably larger than the critical value of the  2 distribution.
Thus, if the two models fitted the data equally well in the population, such a difference in
model fit would be very unlikely to emerge. These findings (along with the improvement seen
in the descriptive fit statistics CFI, RMSEA, and SRMR) indicate that the first-order factor
model is clearly preferable to the one-factor model.
Further, values of the standardized factor loadings of the subtests on first-order factors
(see Figure 1b) range from  = .78 to  = .90 (Mdn  = .85), showing that each factor is well
defined and that the subtests are substantively influenced by the corresponding first-order
construct. Finally, the first-order constructs are strongly positively correlated with one
another: correlations between the latent constructs range from r = .74 (VCFO and PSFO) to r =
.90 (POFO and PSFO), suggesting that the first-order constructs share a considerable amount of
common variance.
Higher-Order Factor Model
The one-factor model focuses on general abilities and the first-order factor model on
specific abilities. Neither model simultaneously addresses general and specific abilities (that
are located at different levels of the ability hierarchy). In contrast, the higher-order factor
model (HO) and the nested-factor model (see next section) both consider the ability hierarchy
in its entirety. We start with the higher-order factor model, in which higher-order factors
reflect the operation of one or more higher-order constructs that explain the intercorrelations
(i.e., the common variance) among the lower-order constructs. Hence, a higher-order factor
model (or a nested-factor model) seems to be a natural statistical representation of theories
such as Carroll‘s (1993) three-stratum theory or McGrew‘s (2009) Cattell-Horn-Carroll
theory, which conceive intelligence to be a multifaceted, hierarchically structured construct.
Figure 1c shows the higher-order factor model for the WAIS–III subtests. As in the
first-order factor model, the subtests are influenced by four first-order factors (representing
the constructs Verbal Comprehension VCHO, Perceptual Organization POHO, Working
Memory WMHO, and Processing Speed PSHO). This model implies, for example, that higher
scores on VCHO are associated with higher scores on the four verbal subtests. Again, subtests
are also influenced by subtest-specific factors (i.e., eHO,1 to eHO,14) that can be interpreted in
the same way as in the first-order factor model. Subtest-specific factors are therefore mutually
independent and uncorrelated with first-order and higher-order factors. Hence, the part of the
model that links first-order constructs with subtests is structurally equivalent to the first-order
factor model.
In contrast to the first-order factor model, however, the shared variance of the first-
order factors is accounted for by a second-order factor gHO that represents the higher-order
construct general cognitive ability. Hence (if the higher-order factor model fits the data well),
gHO accounts for the correlations among the first-order factors (observed in the first-order
factor model) and thus explains the common variance of the first-order factors. This implies
that gHO influences all first-order constructs; higher scores on gHO are therefore associated
with higher scores on all first-order factors.
Consequently, there are two components to the variances of the first-order factors: one
component that is explained by gHO and one component that is independent of gHO (Edwards
& Bagozzi, 2000; Gorsuch, 1983). The latter component is represented in Figure 1c by
specific factors (e.g., specific Verbal Comprehension VCHO,specific) that point to the first-order
factors and that explain individual differences in the first-order factors over and above gHO. In
the present model (and in most applications of higher-order factor models), these specific
factors are uncorrelated with the higher-order factor gHO and among themselves. The total
variance of the first-order constructs therefore represents a blend of the variance attributable
to gHO and to specific factors (e.g., the variance of VCHO is a blend of the variance attributable
to gHO and to VCHO,specific).
Close inspection of the higher-order factor model reveals that the impact of gHO on
manifest subtest scores is mediated by the first-order constructs (Edwards & Bagozzi, 2000;
Schmid & Leiman, 1957): gHO (indirectly) influences all subtests of the WAIS–III and is
therefore clearly broader in scope than the first-order constructs, which influence only a
selection of subtests. The direct impact of the higher-order and specific factors on manifest
subtest scores can be estimated by applying a mathematical transformation to the higher-order
factor model—namely the Schmid-Leiman transformation. This transformation has been
recommended as an ―elegant method‖ (Thompson, 2004, p. 74) for enhancing the
interpretability of higher-order and lower-order factors (see also Brown, 2006; Gorsuch, 1983;
Loehlin, 2004). Specifically, the Schmid-Leiman transformation of a higher-order factor
model yields uncorrelated (first-order) factors that represent both the (higher-order) general
and the specific ability constructs. The factor loadings of the manifest measures on these
factors (see below for details of the computation) reflect the incremental impact of general
and specific abilities on the corresponding measures. Note that the Schmid-Leiman
transformation is a mathematical transformation of the model parameters obtained for the
original higher-order factor model; the empirical fit of both models is therefore identical
(Yung, Thissen, & McLeod, 1999).
How well does the higher-order factor model fit the data? The overall model fit is
adequate (see Table 2). Note that the higher-order factor model is nested in the first-order
factor model (Rindskopf & Rose, 1988). Thus, it is possible to statistically test whether the
higher-order factor representing gHO is capable of fully accounting for the correlations
observed among the first-order constructs in the first-order factor model. 2 difference testing
indicates that this is not completely the case, 2(2, N = 1,369) = 55; p < .001. The next step
is therefore (see also McDonald, 2010) to carefully inspect the residual correlations r that
are computed as the difference between the model-implied correlations among the first-order
constructs and the corresponding correlations in the first-order factor model. These residual
correlations range between r = –.04 (for WMFO and VCFO) and r = .04 (for PSFO and VCFO).
These discrepancies are not ―troublingly large‖ (i.e., –.10  r  .10; McDonald, 2010, p.
679). Thus, although 2 difference testing indicates that the higher-order factor gHO does not
fully capture the correlations found among the first-order constructs in the first-order factor
model, the small residual correlations in combination with the adequate overall fit provide
empirical support for the theoretical proposition that cognitive abilities are hierarchically
structured, with abilities operating at various levels of generality.
Further, as expected, each subtest of the WAIS–III is substantively influenced by the
corresponding first-order construct (range:  = .78 to  = .90; Mdn  = .85), indicating that
these factors are well defined (see Figure 1c). Further, the first-order ability constructs are
strongly influenced by gHO: the standardized loadings on the higher-order factor are between
 = .86 (VCHO on gHO) and  = .97 (POHO on gHO; note that the upper bound of the
corresponding 95% confidence interval does not include 1.00), suggesting that the first-order
abilities share a considerable amount of common variance.
The Schmid-Leiman transformation can be used to estimate the direct impact of gHO
and specific ability constructs on the corresponding subtest scores (Table 3). Specifically, the
factor loadings of the manifest subtest scores on gHO (see Figure 1c) can be computed by
multiplying the factor loading of each subtest on the corresponding first-order factor by the
factor loading of this first-order factor on gHO. For example, the loading of the Information
subtest score on gHO is computed as .82  .86 = .70. Further, the loadings of the subtests on a
specific factor can be computed by multiplying the factor loading of each subtest on the
corresponding first-order factor by the standard deviation of the corresponding specific factor.
For example, the loading of the Information subtest score on VCHO,specific is .82  .51 = .42 (the
variance of VCHO,specific is .26; hence, the corresponding standard deviation is .51).
Three results of the Schmid-Leiman transformation of the higher-order factor model
(Table 3) are noteworthy. First, the factor loadings of the subtests on gHO are large,
demonstrating that gHO exerts strong effects on all subtests: the median of the loadings on gHO
is  = .77 (range:  = .70 to  = .87). Second, each subtest has substantial loadings on specific
abilities (Mdn  = .35; range:  = .20 to  = .45), indicating that specific ability constructs
have an incremental impact on the corresponding subtest scores, over and above gHO. Third,
the factor loadings of the subtests on gHO are considerably larger than the factor loadings of
the subtests on the factors representing specific abilities: Hence, the subtest scores contain
substantially more variance attributable to gHO than to specific abilities.
Finally, close inspection of the Schmid-Leiman transformation reveals an intrinsic
psychometric property of the higher-order factor—the proportionality constraint. This
constraint affects the proportion of variance in the subtest scores explained by general and
specific ability constructs (Schmiedek & Li, 2004). Specifically, for a given set of subtests,
the ratios of variance attributable to the respective first-order ability to variance attributable to
gHO are constrained to be the same. For example, the standardized factor loadings on VCHO,
specific are  = .415 for the Information subtest and  = .449 for the Vocabulary subtest. The
standardized factor loadings of these subtests on gHO are  = .703 and  = .760, respectively.
Obviously, if the variance ratios are computed from the squared factor loadings, the ratios of
the variance attributable to gHO to the variance attributable to VCHO, specific are the same for the
two subtests: variance ratio for the Information subtest = .4152 / .7032 = .349; variance ratio
for the Vocabulary subtest = .4492 / .7602 = .349. Crucially, the proportionality constraint
limits the value of the higher-order factor model in providing insights into the relationship
between general and specific abilities, on the one hand, and other psychological constructs,
sociodemographic characteristics, or life outcomes, on the other (see also General
Discussion).
Nested-Factor Model
Another CFA model (that is not subject to the proportionality constraint) that
considers the ability hierarchy in its entirety is the nested-factor model (NF; Figure 1d). The
term ―nested-factor model‖ was chosen because the factors representing specific constructs
are nested within the general factor representing the general construct (see also Gustafsson &
Balke, 1993). Other terms used to label this kind of model include ―general-specific model‖
and ―bifactor model‖ (Chen et al., 2006). As noted above, current theories of the structure of
intelligence conceive it to be a multifaceted, hierarchically structured construct. In his
influential three-stratum theory, Carroll (1993) defined general cognitive ability as the
broadest ability construct (located at the apex of the hierarchy) and narrower ability constructs
as specific to domains or cognitive operations (located at lower levels of the hierarchy). This
conception of general and specific abilities is reflected in the specification of the nested-factor
model: general cognitive ability is represented as a first-order factor gNF that directly
influences all subtests of the WAIS–III. Hence, as for the one-factor model, higher scores on
gNF are associated with higher scores on all 14 subtests. Further, the nested-factor model
incorporates the multifaceted view of intelligence and the idea that abilities differ in their
breadth, with related sets of subtests being affected by a first-order factor that represents a
corresponding narrower, specific ability construct (e.g., specific Verbal Comprehension
VCNF,specific). For example, the first-order factor representing the (domain-specific) construct
VCNF,specific affects the subtests Information, Vocabulary, Similarities, and Comprehension
over and above gNF. Higher scores on VCNF,specific are therefore associated with higher scores
on these four subtests. Further, subtests are additionally influenced by subtest-specific factors
(i.e., eNF,1 to eNF,14). Crucially, general cognitive ability gNF, specific abilities, and subtest-
specific factors (i.e., eNF,1 to eNF,14) are assumed to be mutually independent and are therefore
specified to be mutually uncorrelated.
Finally, it is important to note that subtest loadings on factors that represent general or
specific abilities are freely estimated in the nested-factor model (vs. constrained in the higher-
order factor model). In contrast to the higher-order factor model, the nested-factor model does
not impose the proportionality constraint on variance ratios of general and specific abilities in
subtest scores. Hence, the nested-factor model can be seen as a generalization of the higher-
order model (Chen et al., 2006; Yung et al., 1999).

How well does the nested-factor model fit the data? The overall model fit of the
nested-factor factor model is good (see Table 2)—the best of the four models under
investigation. Note that the higher-order factor model can be tested against the nested-factor
model (Yung et al., 1999). Thus, it is possible to test whether the higher-order factor model
with its proportionality constraints fits as well as the nested-factor model with freely
estimated factor loadings. 2 difference testing indicates that this is not the case, 2(9, N =
1,369) = 194; p < .001. The descriptive fit statistics also show some improvement in the fit of
the nested-factor model relative to that of the higher-order factor model.
Further, the one-factor model can be tested against the nested-factor model (Rindskopf
& Rose, 1988). This test helps to decide whether the factors representing domain-specific
abilities and general cognitive ability gNF in the nested-factor model are better able to explain
the correlations among subtest scores than a single common factor representing gOF. The
difference in the 2 values is large, at 2(13, N = 1,369) = 1,547, with a probability value of
p < .001. This result (along with the improvement in the descriptive fit statistics CFI,
RMSEA, and SRMR) supports the assumption of the nested-factor model that specific
abilities account for a substantial amount of common variance among subtest scores, over and
above gNF.
Three further findings obtained for the nested-factor model are noteworthy (see Figure
1d). First, the factor loadings of the subtests on gNF are large, demonstrating that gNF has
strong effects on all subtests: the median of the loadings on gNF is  = .77 (range:  = .66 to 
= .89). Second, each subtest has substantial (and statistically significant) loadings on specific
abilities. The range of loadings is from  = .08 (Picture Arrangement on PONF,specific) to  =
.52 (Digit Span on WMNF,specific), with a median loading of Mdn  = .37. Hence, both specific
abilities and gNF affect subtests. However, the influence of specific abilities on subtests is
clearly not equally strong for all subtests. Third, the factor loadings of the subtests on gNF are
considerably larger than those on the factors representing specific abilities: Hence the subtest
scores contain substantially more variance attributable to gNF than to specific abilities.
Taken together, these results indicate (a) that the nested-factor model captures the
correlations among the subtest scores of the WAIS–III reasonably well, (b) that the
assumption of proportionality constraints as imposed in the higher-order factor model may not
hold, and (c) that specific abilities account for a substantial amount of the common variance
among subtest scores over and above gNF. In sum, these empirical results support the
theoretical proposition that cognitive abilities are hierarchically structured and differ in their
generality.
Cross-Model Comparison of General and Specific Constructs
The four CFA models presented above either focus exclusively on general cognitive
ability (i.e., OF) or specific abilities (i.e., FO) or consider the ability hierarchy in its entirety,
containing both general and specific constructs (i.e., HO and NF). At first glance, it may
appear that general and specific ability constructs as specified in these four models can be
interpreted interchangeably. However, as we explain in the following, this is generally not the
case.
We start with specific abilities, which are included in the first-order factor model, the
higher-order factor model, and the nested-factor model. Crucially, the substantive
interpretations of these specific ability constructs vary to different extents. In the first-order
factor model, the first-order constructs (i.e., VCFO, POFO, WMFO, and PSFO) affect the
corresponding subtests. However, no higher-order construct is specified to explain the
intercorrelations of the first-order constructs. In the higher-order factor model, in contrast, the
first-order constructs (i.e., VCHO, POHO, WMHO, and PSHO) affect the subtests as in the first-
order factor model, but are in turn influenced by two independently operating factors, namely
gHO and VCHO,specific, POHO,specific, WMHO,specific, or PSHO,specific, respectively. Hence, in contrast
to the-first order model, the higher-order factor model includes a higher-order construct that
accounts for the interrelations among the first-order constructs: the higher-order factor gHO.
Consequently, the first-order constructs in the higher-order factor model contain variance
attributable to gHO (and variance attributable to a specific factor), whereas the variance of
ability constructs in the first-order factor model is not separated into components that are
attributable to lower-order and higher-order constructs.
In the nested-factor model, each subtest is directly affected by general cognitive ability
gNF and a specific ability construct (i.e., VCNF,specific, PONF,specific, WMNF,specific, or PSNF,specific).
Hence, in contrast to specific abilities in the first-order factor model (i.e., VCFO, POFO, WMFO,
and PSFO), specific abilities as conceptualized in the nested-factor model are abilities that
explain variance in the subtest scores over and above gNF. Thus, specific ability constructs in
the nested-factor model account for variance in subtest scores, account taken of the impact of
gNF on subtest scores, whereas the ability constructs in the first-order factor model explain
total variance in subtest scores. To sum up, it is only if there is no general construct operating
that corresponding specific constructs as specified in the first-order factor model (e.g., VCFO),
the higher-order factor model (e.g., VCHO,specific), and the nested-factor model (e.g.,
VCNF,specific) are identical. Thus, the stronger the empirical impact of the general construct on
manifest measures is, the more distinct the specific constructs in the first-order factor model
become from the corresponding specific constructs in the higher-order model and the nested-
factor model, respectively.
The specific ability constructs in both the higher-order factor model (e.g., VCHO,specific)
and the nested-factor model (e.g., VCNF,specific) are specified to operate mutually independently
as well as independently of general cognitive ability (in terms of gHO or gNF). A refined
substantive interpretation of these specific ability constructs is challenging, however (see
Brunner, 2008, p. 162). Ideally, this interpretation would rest on cognitive measures that tap
only the specific ability construct but not general cognitive ability. In the realm of ability
research, such measures have not yet been found (Brunner, 2008). However, problems with
the labelling of specific constructs do not mean that specific constructs (as represented by
specific factors) do not have substantive meaning (see also the General Discussion).
Importantly, as pointed out by two anonymous reviewers, in other areas of psychological
research (e.g., clinical psychology), more readily identifiable labels for specific factors are
available. For example, in the tripartite model by Clark and Watson (1991), the variance in
anxiety and depression (which are both represented as first-order factors) can be decomposed
into three parts: variance attributable to (a) general negative affect (represented as higher-
order factor) that influences both anxiety and depression, (b) specific physiological
hyperarousal that affects only anxiety, and (c) a specific factor representing low positive
affect (―anhedonia‖) that affects only depression. In this example, the specific factors
represent symptoms that are independent of general negative affect and specific to anxiety and
depression, respectively.
Although specific constructs in the higher-order factor model and the nested-factor
model may have the same substantive interpretations, there are subtle differences between
them. Specifically, the impact of specific ability constructs (e.g., VCHO,specific) on subtest
scores as specified in the higher-order factor model is subject to the proportionality constraint,
whereas the impact of specific abilities (e.g., VCNF,specific) in the nested-factor model is not.
Hence, given the operation of a general construct, it is only when the proportionality
constraint holds that the corresponding specific constructs in the higher-order factor model
and the nested-factor model are mathematically identical. The more the empirical
relationships deviate from the proportionality constraint, the more distinct corresponding
specific constructs as specified in the higher-order and the nested-factor model become from
each other and the more distinct their substantive interpretations may become.
We now turn to the factor representing general cognitive ability, which is included in
the one-factor model, the higher-order factor model, and the nested-factor model. Except for
rare cases, these general ability constructs are not identical. In the one-factor model, gOF
(along with the subtest-specific factors eOF) is the only influence on subtest scores. This is the
major difference to the higher-order factor model and the nested-factor model: in the higher-
order factor model, gHO (indirectly) influences subtest scores independently of the specific
constructs (e.g., VCHO,specific); in the nested-factor model, gNF (directly) influences subtest
scores independently of the specific constructs (e.g., VCNF,specific). Hence, gOF explains total
variance in the subtest scores, whereas both gHO and gNF explain variance in the subtest scores
while controlling for specific constructs. Consequently, it is only if no specific constructs
influence manifest measures (i.e., if the corresponding factor variances are zero) that the three
general factors are identical. In other words, the stronger the empirical impact of specific
constructs on manifest measures is, the more distinct gOF, on the one hand, becomes from gHO
and gNF, on the other. Moreover, the impact of general cognitive ability gHO on subtest scores
as specified in the higher-order factor model underlies the proportionality constraint, whereas
the impact of gNF in the nested-factor model does not. Given that specific constructs influence
manifest measures, it is thus only if the proportionality constraint holds that gHO and gNF are
identical. In other words, the more the proportions of variance in the subtest scores that are
attributable to general and specific constructs deviate from the proportionality constraint, the
more distinct gHO becomes from gNF.
In sum, the interpretations of general and specific ability constructs as conceptualized
in the different CFA models are generally not (completely) interchangeable. Thus, the choice
of a CFA model that links cognitive measures to general and/or specific ability constructs
implies certain constraints on the interpretation of these constructs. Because the four CFA
models discussed are partly nested within each other (i.e., one model is a restricted version of
another), cross-model comparison by means of model fit indices and 2 difference tests can be
used to guide the choice of model.
Model-Based Reliabilities of Measures Assessing Psychological Constructs

Cognitive abilities are not directly observable entities, but latent variables. To assess
an individual‘s cognitive abilities, we have to estimate his or her level on the respective latent
variable. In most applied psychological research, several manifest scale indicators are
summed using unit weights (i.e., each scale indicator has the same weight in the computation
of the sum score) to form a manifest scale score. This scale score gives an estimate of the
person‘s level on the latent general or specific ability construct (Grice, 2001). For example, a
scale score reflecting a person‘s level of general cognitive ability can be computed by using
unit weights to sum up his or her scores on all 14 subtests of the WAIS–III.3 But how reliable
is this scale score?
Classical Test Theory and Reliability
To answer this question, we first show how reliability can be mathematically defined.
As most readers are familiar with the fundamental ideas of classical test theory (CTT), we
start by considering how CTT defines reliability. Within the framework of CTT, a person‘s
observed score is partitioned into one component that reflects his or her true score and one
component that is independent of the true score and reflects measurement error (Lord &
Novick, 1968, p. 29). The observed score variance is thus composed of variance attributable
to true scores (true score variance) and variance attributable to measurement error (error
variance). Score reliability in the context of CTT is thus mathematically defined in terms of
the proportion of true score variance to observed score variance (Lord & Novick, 1968, p. 61).
Reliability may range between 0 (no reliability) and 1 (perfect reliability).
Model-Based Score Reliability in the Context of CFA Models
The mathematical definition of reliability in the context of CTT has two conceptually
overlapping meanings. Reliability (a) assesses the consistency of measurement (across time or
across instruments) and (b) is an index of measurement precision (Lord & Novick, 1968;
McDonald, 1999; Mellenbergh, 1996). In this article, we draw on the conceptual definition of
reliability as an index of measurement precision. Specifically, we focus on model-based

estimates of score reliability by means of CFA models.4 As we show below, for the one-factor
model and the first-order factor model, the total amount of reliable variance provides an
estimate of how precisely a certain scale score assesses a certain target construct. For the
higher-order factor model and the nested-factor model, however, different model-based
reliability indices can (a) estimate the total amount of reliable variance in a scale score or (b)
indicate how precisely a certain scale score measures a certain target construct. We therefore
discuss the computation and interpretation of model-based score reliabilities separately for
each of the four CFA models presented above.
In this respect, it is important to highlight that all model-based estimates of score
reliability (as well as all reliability estimates based on CTT) are population dependent
(Mellenbergh, 1996). Thus, score reliability depends on how heterogeneous the sample is on
the target construct(s). Moreover, psychometricians differ in their interpretations of the
relationship between the true score concept of CTT and construct scores as defined in CFA
models (Bollen, 1989, p. 219; Borsboom & Mellenbergh, 2002). For example, Borsboom and
Mellenbergh (2002) point to fundamental conceptual differences between true scores and
construct scores. In contrast, proponents of stochastic measurement theory (e.g., Steyer, 1989)
integrate CTT into CFA models by incorporating statistical assumptions on the relationship
between true scores and construct scores (see also Bollen, 1989, pp. 218-222). Here, we take a
model-based approach to score reliability in the context of CFA models, as perhaps most
clearly elaborated in Bollen (1989) and McDonald (1999). For didactic reasons, we point to
some general similarities between CTT and the model-based approach. A thorough discussion
of the (sometimes subtle) conceptual and statistical differences between these psychometric
models is beyond the scope of this article.
One-Factor Model
We first analyze how well the scale score representing general cognitive ability
assesses the latent construct general cognitive ability in terms of gOF (Figure 1a). In the one-
factor model, the variance of the latent factor representing gOF can be interpreted as the
reliable (―construct score‖) variance of the score representing general cognitive ability.
Further, gOF and subtest-specific factors are specified to be unrelated, reflecting the idea that
construct score and error score are mutually independent. As noted above, the subtest-specific
factors (i.e., eOF,1 to eOF,14) may comprise both reliable subtest-specific variance and
unreliable variance attributable to random measurement error. It is debatable whether subtest-
specific variance that is not shared with the target construct (e.g., gOF) should be seen as part
of a measure‘s reliable variance (Bollen, 1989, pp. 219-221). Given that variance attributable
to measurement error and reliable subtest-specific variance are typically not separated in
applications of CFA, we do not consider the latter to be part of the reliable variance, and we
do not take it into account when computing scale score reliability (Bollen, 1989, pp. 220-221).
Hence, the model-based reliability estimates that we discuss in the present article may be
interpreted as lower-bound estimates of a scale score‘s total amount of reliable variance.
Taken together, in the case of a one-factor model, the model-based reliability of a
scale score may be defined as the proportion of variance accounted for by one latent target
construct (e.g., gOF) relative to observed score variance. In line with McDonald (1999) and
Zinbarg and colleagues (Zinbarg, Revelle, Yovel, & Li, 2005; Zinbarg et al., 2006), we refer
to this reliability coefficient as omega (). More formally, these ideas can be expressed as
follows. When unit weights are used, a scale score Y is computed by summing up p manifest
scale indicators Yi: Y = Y1 + Y2 + ... + Yp. When standardized model parameters are used,  for
the scale score Y is computed as follows:
2
 p 
   ij 
  i 1  . (1)
2
 p
 p
   ij    ei
 i 1  i 1
Here, ij is the standardized factor loading of manifest measure Yi on factor j, and ei is
the standardized variance of the subtest-specific factor affecting the manifest variable Yi. The
numerator in Equation 1 represents the amount of score variance in the scale score Y that can
be attributed to the variance of the factor representing the target construct. The denominator
represents the total variance of the scale score, which comprises (a) the score variance
accounted for by the target construct and (b) the variances attributable to the subtest-specific
factors of the scale indicators. Values of omega can range from 0 (no reliability) to 1 (perfect
reliability). In other words, when a one-factor model is applied, a value of  = 1 indicates that
the sum score Y measures the target construct with perfect accuracy; the more omega departs
from 1, the lower the precision with which Y measures the latent target construct.
Table 4 presents the computation used to derive omega for the scale score General
Cognitive Ability in terms of gOF (Figure 1a).
****** Please insert Table 4 about here ******
When the model parameters obtained for the one-factor model are used (Figure 1a),
omega of the General Cognitive Ability score is computed as the ratio of the variance
attributable to gOF to the total variance of this score. The total variance of the General
Cognitive Ability score is the sum of the variances that can be attributed to (a) gOF and (b)
subtest-specific factors (i.e., the sum of the variances of eOF,1 to eOF,14). The value of  = .96
represents the reliability of the General Cognitive Ability score to measure gOF. In other
words, 96% of the variance in this scale score is accounted for by gOF. Table 5 reports the
composition of scale score variance in terms of variance attributable to gOF and subtest-
specific factors (eOF,1 to eOF,14). Note that the omega value obtained for the one-factor model
should be interpreted with caution, as the fit of this model to the data was modest (see also the
discussion below).
****** Please insert Table 5 about here ******

How can the reliabilities of scale scores be computed in the context of a first-order
factor model? Because each measure is assumed to reflect one latent target construct only, the
scale score‘s reliability can be computed in the same way as for the one-factor model, using
omega (see Equation 1). For example, the scale score assessing the target construct Verbal
Comprehension in terms of VCFO is computed as the unit-weighted sum of the subtest scores
Information, Vocabulary, Similarities, and Comprehension. Reliability in terms of how well
this scale score assesses VCFO is then computed as the ratio of variance attributable to VCFO
(i.e., the squared sum of the corresponding standardized factor loadings; see Figure 1b) to the
total variance of this scale score (i.e., the squared sum of the corresponding standardized
factor loadings plus the sum of the corresponding subtest-specific variances). Table 4 shows
the necessary computation. The value of  = .91 indicates that 91% of the variance in the
scale score reflecting Verbal Comprehension is attributable to VCFO. Table 5 reports the
omega values and the variance composition for the other scale scores measuring specific
abilities.
Higher-Order Factor Model
In the one-factor model and the first-order factor model, each subtest is affected by a
single ability construct. In the higher-order factor model, subtests are affected by both general
and specific ability constructs. The computation of a scale score‘s reliability is thus more
complex: In the higher-order factor model, the observed variance of a manifest subtest score
is composed of (a) the variance attributable to the general/higher-order construct (i.e., gHO),
(b) the variance attributable to the specific constructs (e.g., VCHO,specific), and (c) subtest-
specific factors (i.e., eHO,1 to eHO,14). When several subtests are summed to create a scale
score, the total variance of this scale score thus comprises variance attributable to general
cognitive ability and variance attributable to a certain specific ability (in addition to variance
attributable to subtest-specific factors eHO,1 to eHO,14).

As stated above, score reliability can be defined in terms of measurement precision. In
the one-factor model and in the first-order factor model, this definition can be mathematically
expressed as the proportion of variance in the target construct to observed score variance
(McDonald, 1999; Mellenbergh, 1996). In the context of higher-order factor models (and
nested-factor models), the computation of score reliability is more complex: The model-based
computation of score reliability depends on the researcher‘s decision as to which variance
components are defined as reliable (―construct score‖) variance (Sijtsma, 2009).
Omega. The first way of defining reliable variance is as the amount of variance
accounted for by all (i.e., general/higher-order and specific) constructs that underlie a scale
score. In line with Zinbarg and colleagues (Zinbarg et al., 2005; Zinbarg et al., 2006), we
again refer to this reliability coefficient as omega (). In the case of a higher-order factor
model with k mutually orthogonal latent factors that represent k (general/higher-order and
specific) constructs,  of the scale score Y can be expressed as follows:
k  p  
2
    ij  
j 1  i 1
  
= . (2)
k 
  p
2
 p
    ij     ei
j 1  i 1
   i 1
Note that the numerator in Equation 2 represents the total amount of variance that can
be attributed to the variances of the k constructs that underlie the scale score Y. The
denominator represents the total variance of the scale score, which comprises (a) the total
variance accounted for by all k underlying constructs and (b) the variances attributable to
subtest-specific factors. Omega thus informs on the reliability (i.e., measurement precision)
with which a scale score assesses the blend of the general/higher-order and specific
constructs; it can range from 0 (no reliability) to 1 (perfect reliability).
To illustrate the computation of omega in the context of a higher-order factor model,
we present the necessary computations for the General Cognitive Ability score of the WAIS–
III in Table 4. The omega value of this score can be computed by calculating the total amount
of variance attributable to latent constructs in the scale sum (i.e., variance due to gHO,
VCHO,specific, POHO,specific, WMHO,specific, and PSHO,specific), using the standardized factor loadings
as obtained by applying the Schmid-Leiman transformation (Table 3). The denominator is the
total variance of the General Cognitive Ability score—the sum of the variances attributable to
(a) gHO, (b) VCHO,specific, (c) POHO,specific, (d) WMHO,specific, (e) PSHO,specific, and (f) subtest-
specific factors (i.e., eHO,1 to eHO,14). The value of  = .97 represents the reliability of the
General Cognitive Ability score to measure the blend of gHO and specific abilities. Table 5
displays the omega values and the variance composition for scale scores measuring specific
abilities.
Omega hierarchical. In the computation of omega, reliable variance is defined as the
variance accounted for by all (i.e., general/higher-order and specific) constructs that underlie a
scale score. Alternatively, reliable variance may be defined as the variance in a scale score
accounted for by just one target construct (represented by factor j). To this end, we adapt the
methodological approach developed by McDonald (1999) as well as by Zinbarg and
colleagues (Zinbarg et al., 2005; Zinbarg et al., 2006). Specifically, these researchers
developed the reliability coefficient omega hierarchical (h), which gauges how precisely a
total score assesses a general construct as specified in a higher-order or a nested-factor model.
Note that they did not use omega hierarchical to compute score reliability of specific
constructs, as specified in the present models. Nevertheless, we apply the term omega
hierarchical in this article, because the same methodological approach is used. In this article,
h is defined as follows:
2
 p 
   ij 
h   i 1  . (3)
k  p
  p
2

    ij     ei
j 1  i 1
   i 1
Note that h indicates the proportion of variance in the scale score that is accounted
for by a particular target construct (as represented by factor j in the Schmid-Leiman
transformed higher-order factor model) to total observed variance (i.e., the sum of the
variances accounted for by all k underlying constructs and the sum of all variances
attributable to subtest-specific factors of the p subtests). Thus, omega hierarchical reflects the
measurement precision with which a scale score assesses a certain target construct; it can
range from 0 (no reliability) to 1 (perfect reliability).
We now illustrate the computation of h for the General Cognitive Ability Score. To
compute the omega hierarchical of this score to assess gHO, we again enter the standardized
factor loadings as obtained by applying the Schmid-Leiman transformation (Table 3) in
Equation 3. The denominator representing total scale score variance is thus identical to the
one used to calculate omega (see Table 4). However, the numerator now represents the
variance accounted for by gHO only. Therefore, it contains only the squared sum of the
loadings of the 14 subtests on gHO. The value of h = .92 represents the reliability (i.e.,
measurement precision) of the General Cognitive Ability score to measure the construct gHO.
By the same token, omega hierarchical for the specific abilities (e.g., VCHO,specific) is
computed by only taking into account the standardized factor loadings of the subtests (see
Table 3) on these specific factors when computing the numerator of Equation 3. The
corresponding denominator contains the total variance of the corresponding specific scale
scores, computed as the sum of the variances attributable to (a) gHO, (b) a certain specific
ability factor (e.g., VCHO,specific), and (c) corresponding subtest-specific factors (e.g., eHO,1 to
eHO,4). The value of h = .23 (see Table 5) represents the reliability (i.e., measurement
precision) of the Verbal Comprehension score to measure the construct VCHO,specific.
Table 5 displays the omega hierarchical values and the variance composition for all
scale scores. As the scale scores measuring specific abilities contain a great proportion of
variance attributable to gHO, the omega hierarchical values of these scores are relatively low
(ranging from h = .06 for Perceptual Organization to h = .23 for Verbal Comprehension).
Nested-Factor Model
How can the reliabilities of scale scores be computed in the context of the nested-
factor model? Because each subtest is assumed to be influenced by both general cognitive
ability gNF and a certain specific ability, the scale score‘s reliability can be computed in the
same way as for the higher-order factor model, using omega (see Equation 2) and omega
hierarchical (see Equation 3).
For example, reliability of the Verbal Comprehension score is computed as the ratio of
variance attributable to gNF and VCNF,specific (i.e., adding up the squared sums of the
corresponding standardized factor loadings of subtests on gNF and VCNF,specific; see Figure 1d)
to the total variance of this scale score (i.e., adding up the squared sums of the corresponding
standardized factor loadings of subtests on gNF and VCNF,specific plus the sum of the
corresponding subtest-specific variances). Table 4 shows the necessary computations. The
value of  = .91 indicates that 91% of the variance in the Verbal Comprehension score is
attributable to the blend of gNF and VCNF,specific. This omega value therefore reflects how well
the Verbal Comprehension score measures the blend of general cognitive ability and specific
Verbal Comprehension.
Omega hierarchical, in contrast, reflects how well the Verbal Comprehension score
assesses VCNF,specific. It is computed as the ratio of variance attributable to VCNF,specific (i.e., the
squared sum of the corresponding standardized factor loadings of subtests on VCNF,specific) to
the total variance of the Verbal Comprehension score. The total variance is computed in the
same way as for omega. The value of h = .23 thus indicates that 23% of the variance in the
Verbal Comprehension scale score is attributable to VCNF,specific. This is the reliability (i.e.,
measurement precision) with which the Verbal Comprehension score assesses VCNF,specific.
Table 5 shows how well scale scores assessed the blend of general and specific abilities (in
terms of ) and a certain ability construct (in terms of h). As for the higher-order factor
model, scale scores showed relatively low reliability (in terms of h) in assessing specific
abilities because they contained a large amount of variance attributable to gNF.
Statistical Requirements
Omega and omega hierarchical are based on parameter estimates (i.e., estimates of
factor loadings and factor variances) that are derived for a certain CFA model. Hence, two
vital statistical requirements need to be fulfilled: (1) Proper interpretation of omega and
omega hierarchical requires that the target model fits the empirical data well (Bentler, 2009;
McDonald, 1999; Yang & Green, 2010). (2) Parameter estimates need to be precise.
We first address the evaluation of model fit, which should achieve an optimal balance
between the fit of the model to the empirical data, on the one hand, and theoretical
considerations, on the other. There has been considerable debate on which fit indices should
be used and on the strategies applied to evaluate model fit (Hu & Bentler, 1999; Jöreskog,
1993; Marsh, Hau, & Wen, 2004; McDonald, 2010). Although no consensus has yet been
reached, several methodologists strongly recommend comparing the preferred target model
with several a priori specified and theoretically supported alternatives. This approach takes
into account that cutoff values of model fit indices are model dependent, considers alternative
explanations of the data, and allows some models to be ruled out while giving stronger
support for others (MacCallum & Austin, 2000; Marsh, Hau, & Grayson, 2005; West, Taylor,
& Wu, in press). In this tutorial, for example, we computed omega for the General Cognitive
Ability score using the results obtained for the one-factor model for illustrative purposes.
However, we would be very cautious in interpreting this value as the reliability of the General
Cognitive Ability score to assess the latent construct general cognitive ability, because the
one-factor model provided only a modest fit to the data and—even more importantly—a
poorer fit than alternative CFA models. As the nested-factor model was theoretically derived
and provided the best fit of the four models under investigation, we would use the model
parameters obtained for this model to compute score reliability (in terms of  and h).
We now turn to the precision of model parameters, which is affected by two key
factors. First, sample size needs to be sufficiently large to obtain trustworthy estimates of
model parameters (Yang & Green, 2010).5 In general, a larger sample size is always better,
and a sample size of N  200 allows proper estimation of model parameters (e.g., nonnegative
variances of subtest-specific factors) under a large variety of conditions (Boomsma &
Hoogland, 2001). There is also growing consensus that the required sample size depends on
the properties of the model investigated and the data to be analyzed: A higher ratio of
measures per factor and higher factor loadings may compensate for smaller sample size
(Marsh, Hau, Balla, & Grayson, 1998; Yang & Green, 2010). Thus, methodologists strongly
encourage applied researchers to conduct Monte Carlo studies of the target CFA models to
determine the required sample size (L. K. Muthén & Muthén, 2002). For example, previous
simulation studies have demonstrated that trustworthy model-based reliability estimates may
be obtained even with relatively small sample sizes (e.g., N = 100; see Zinbarg et al., 2006).
Second, parameters for CFA models are typically derived by maximum likelihood
estimation, which requires continuous raw data that follow a multivariate normal distribution.
However, many studies in personality and interindividual differences research administer self-
report items with a limited number of response options (e.g., ―disagree,‖ ―disagree
somewhat,‖ ―agree somewhat,‖ ―agree‖); therefore, the assumption that raw data are
continuous may not be tenable. Moreover, empirical data frequently fail to follow a normal
distribution (Micceri, 1989) and, consequently, to have a multivariate normal distribution. So
what can be done? Model parameters—including factor loadings and variances of subtest-
specific factors used to compute omega or omega hierarchical—are generally trustworthy if
three conditions are fulfilled: the raw data are continuous, the sample size is reasonably large,
and the assumption of multivariate normality is not severely violated. Parameter estimates are
quite robust to violations of the multivariate normality assumption as long as the indicators
are ―reasonably‖ continuous. For example, a recent simulation study by Rhemtulla, Brosseau-
Liard, and Savalei (2010) demonstrated that maximum likelihood-based estimation methods
yield acceptable parameter estimates for CFA models under a wide range of conditions, even
when the manifest variables contain only four response categories.
If distributional assumptions are severely violated, several routes can be taken to
tackle the problem—for example, employing alternative (robust) estimation methods with less
stringent distributional assumptions or transforming the input data to better match the
distributional assumptions. Modern software packages used to study CFA models include
robust estimation methods, such as robust maximum likelihood estimation (Satorra, 1990) and
robust weighted least squares estimation (B. O. Muthén, 1984; B. O. Muthén & Kaplan,
1985). These estimation methods may yield higher precision (a) to assess model fit, (b) to
compute standard errors of model parameters, and (c) in the case of robust weighted least
squares, to estimate the model parameters themselves. Thus, robust weighted least squares
may also be an appropriate method for analyzing item-level data from items with fewer than
four response categories (Rhemtulla et al., 2010). Further information on this method can be
found in Wirth and Edwards (2007), who provide an excellent review of factor models and
various estimation methods for item-level data. Moreover, robust maximum likelihood
estimation allows the use of omega and omega hierarchical as explained in this tutorial; in the
case of weighted least squares estimation, score reliability may be estimated using the
approaches proposed by Green and Yang (2009) or Bentler (2009, p. 142).
Alternatively, item scores that are intended to measure the same construct(s) may be
integrated into parcel scores. Subtest scores (as applied in this article) are a special case of
parcel scores (i.e., all items making up a subtest are integrated into one subtest score). Parcel
scores may then be used as manifest measures of the latent variables in CFA models, and
model parameters can be estimated by (robust) maximum likelihood procedures. Parcel scores
may have several advantages over item scores: they show better distributional properties (i.e.,
normality), keep the ratio of observable measures to latent constructs manageable, and
increase the chances of adequate model fit (Bagozzi & Edwards, 1998; Hall, Snell, & Singer
Foust, 1999; Little, Cunningham, Shahar, & Widaman, 2002; West, Finch, & Curran, 1995).
Crucially, when parcel scores are applied, two key requirements need to be fulfilled: (a) The
parcel scores must adequately represent the target construct(s) (Bagozzi & Edwards, 1998;
Little et al., 2002). (b) The dimensional structure underlying the items needs to be taken into
account. Otherwise, inaccurate parameter estimates and model fit statistics will result (Hall et
al., 1999; Little et al., 2002). Ideally, the inter-item structure is unidimensional (Little et al.,
2002). For example, when a one-factor model fits reasonably well to a set of items, these
items may be randomly distributed to parcels (for other parceling strategies, see Hall et al.,
1999, and Little et al., 2002).
Application and Interpretation
Given that the statistical requirements for CFA are met, several aspects of the
application and interpretation of model-based estimates of score reliability warrant
consideration. First, in this article, we applied WAIS–III subtest data to estimate the
reliabilities of scores to assess hierarchically structured constructs at two different levels of
the hierarchy (i.e., specific and general ability constructs located at the intermediate and the
top levels of the ability hierarchy). Importantly, the methods outlined in this tutorial may also
be applied to obtain model-based reliability estimates when item scores are used (in place of
subtest scores) or when more than two levels of the construct hierarchy are investigated.
Second, a model-based approach to estimating score reliability may render not only
the computation but also the concept of score reliability more complex. CTT defines score
reliability ―as the proportion of true-score variance, without considering the composition of
the true score,‖ whereas model-based approaches to reliability ―decompose the true-score
variance into different variance components, and the researcher has to decide which variance
components contribute to test-score reliability‖ (Sijtsma, 2009, p. 170). When one-factor
models and first-order factor models are applied, the omega and omega hierarchical values of
a scale score are identical, as the reliable (―construct score‖) variance of the corresponding
scale scores is not divided into variances attributable to general/higher-order and specific
constructs, respectively. Consequently, omega is sufficient to estimate score reliability, and
the classic definition of score reliability as the proportion of ―construct score‖ variance to
total score variance applies. In the case of the one-factor model and the first-order factor
model, omega therefore indicates the precision with which a scale score assesses a certain
target construct. Interestingly, this interpretation of omega converges with the concept of
construct validity—the extent to which a measure assesses the construct it was designed to
measure (Bollen, 1989, p. 195; McDonald, 1999, p. 63 and p. 208). Note, however, that this
interpretation of omega applies only for researchers who conceive of validity as a quantitative
concept, and not for those who conceive of validity as a qualitative concept (i.e., a measure is
or is not valid to assess a certain target construct). The latter researchers may consider two
measures to be valid, but one to be more reliable (Borsboom, Mellenbergh, & van Heerden,
2004, p. 1070). Omega is thus an index of reliability in terms of measurement precision only.
In contrast to the one-factor and the first-order factor model, the higher-order factor
and the nested-factor model imply that scale scores can be conceived of as assessing more
than one ability construct simultaneously. Thus, these models involve two forms of reliability:
(a) omega indicates how precisely a score measures the blend of general and specific
constructs, whereas (b) omega hierarchical indicates how precisely a score measures a certain
target construct at a certain level of the hierarchy. Omega is therefore closely tied to the
classic definition of score reliability (Bollen, 1989, p. 221): it reflects the total amount of
reliable score variance to the total scale score variance. When constructs are hierarchically
structured, however, the composition of reliable (―true score‖) variance entered in the
computation of omega is complex. Omega values are therefore ambiguous with respect to the
key question of how precisely a score assesses a certain target construct. It is omega
hierarchical that addresses this question. Further, when validity is conceived of as a
quantitative concept (see above), it is also omega hierarchical that may be interpreted in terms
of a measure‘s construct validity (Bollen, 1989, p. 195).
Third, in applied assessment, the focus of interest may also be on the total amount of
reliable variance of a scale score at the bottom level of the hierarchy (e.g., the Information
subtest score). Several methodological approaches can be used to obtain a corresponding
reliability estimate: (a) The total reliable variance of such a scale score can be defined as the
degree to which this score is free of ―error‖ in terms of the subtest-specific factors ei (Bollen,
1989, pp. 220-221). For example, when using the standardized results obtained for the nested-
factor model (Figure 1d), the total reliable variance of the Information subtest score is
estimated by 1 – eNF,1 = 1 – .34 = .66. As noted above, however, the subtest-specific factors
may comprise both reliable subtest-specific variance and unreliable variance attributable to
random measurement error (Bollen, 1989, pp. 219-221). Hence, the result may be interpreted
as a lower-bound estimate of total score reliability, because reliable subtest-specific variance
is not taken into account. Several alternatives based on the interrelationships of the items
entering the scale score can be used to overcome this problem: (b) alpha (Cronbach, 1951), (c)
reliability estimates based on a unidimensional nonlinear factor analytic model (Green &
Yang, 2009; Raykov, Dimitrov, & Asparouhov, 2010), or (d) a unidimensional item-response
model (Mellenbergh, 1996). In many cases, these reliability estimates can be expected to be
larger than those obtained by approach (a), because they can take into account reliable
subtest-specific variance. Crucially, when interpreting the values obtained from approaches
(a) to (d), researchers should be aware that scale scores at the bottom level of the hierarchy
may measure several constructs simultaneously. Thus, these reliability estimates may reflect
the precision with which a scale score assesses the blend of higher-order constructs (approach
[a]) or the blend of subtest specific and higher-order constructs (approaches [b] to [d]).
Fourth, latent variable models are blind to several threats to the validity of statistical
conclusions, such as individual outliers, heteroscedasticity of residuals, and nonlinear
relationships between latent variables (Cohen, Cohen, West, & Aiken, 2003). Obtaining
manifest composite scores and visually inspecting plots can therefore help to carefully
diagnose regression relationships (Cohen et al., 2003; West, 2006; Wilkinson & Task Force
on Statistical Inference, 1999). Omega (if OF or FO are empirically supported) or omega
hierarchical (if HO or NF are empirically supported) may thus serve a useful purpose, as these
reliability coefficients inform on the measurement precision with which a manifest score
assesses a certain latent target construct. Specifically, when hierarchical construct definitions
are endorsed, omega hierarchical may help (see also Zinbarg et al., 2006, p. 124) to evaluate
regression relationships and to judge whether unexpected results (based on manifest scores)
are due to random error or to the fact that the score does not precisely measure the target
construct (e.g., VCNF,specific).
Fifth, as noted in the review by Hogan, Benjamin, and Brezinski (2000), researchers
often use alpha (e.g., Cronbach, 1951) to estimate score reliability (see also Streiner, 2003).
However, alpha is not suitable for estimating the reliability of measures of hierarchically
structured constructs that operate at various levels of generality. Alpha does not indicate how
reliably either (a) a specific construct or (b) the general construct can be measured. On the
contrary, the alpha value reflects a complex blend of variance attributable to the general
construct, variance attributable to the specific constructs, and variability in factor loadings of
the scale indicators assessing general and specific constructs (Zinbarg et al., 2005). Hence,
unlike omega or omega hierarchical, alpha is not suitable for estimating score reliability when
constructs are structurally conceptualized in terms of a higher-order factor or a nested-factor
model.
Sixth, in this tutorial we computed the values of omega and omega hierarchical by
hand (see Table 4). However, these values can also be estimated by structural equation
software (e.g., Mplus, L. K. Muthén & Muthén, 1998–2010) within the methodological
approach presented by Cheung (2009). The necessary Mplus syntax is provided in the online
supplement. The ―omega‖ function of the R-package ―psych‖ (Revelle, 2010) is an excellent
tool for computing omega hierarchical for a general factor derived by exploratory factor
analysis.
Seventh, the results of the higher-order factor and the nested-factor model converged
to show that score reliabilities in terms of omega were satisfactory, whereas reliabilities of
scores to assess specific constructs in terms of omega hierarchical were relatively low (Table
5). For example, for the nested-factor model, omega values ranged from .88 (Working
Memory) to .93 (Perceptual Organization), whereas omega hierarchical values ranged from
.05 (Perceptual Organization) to .23 (Verbal Comprehension). It was only the General
Cognitive Ability score that captured gNF with sufficient measurement precision (h = .93).6
In most practical settings, omega hierarchical values of scores assessing specific
constructs can be expected to be relatively low when there is a strong higher-order/general
construct that explains much of the common variance in the manifest measures. Thus, the
observation that values of omega hierarchical are low for scale scores capturing specific
constructs may generalize to many different assessment domains, including measures of
traditional self-report dimensions (DeYoung, 2006), disorders like depression (Steer et al.,
1999; Tanaka & Huba, 1984), subjective wellbeing (Chen et al., 2006; Gallagher et al., 2009),
and intelligence. For example, in the assessment of intelligence, it can be assumed—as a rule
of thumb—that ―in heterogeneous collections of cognitive tests in a wide range of talent,
general intelligence accounts for roughly 50% of the common variance‖ (Lubinski, 2004, p.
98). Hence, the observation that scores show relatively low measurement precision (in terms
of h) in assessing specific abilities can be assumed to generalize to cognitive measures
beyond the WAIS–III.

In diagnostic settings, the low values of omega hierarchical obtained for the WAIS–III
scores assessing specific constructs are problematic, because they imply large confidence
intervals around a respondent‘s scale score. Any interpretations of a person‘s level of specific
ability therefore involve great uncertainty. When scale scores are interpreted to represent a
blend of general and specific ability constructs, however, the values of omega of the scores
are found to be satisfactory. Hence, the confidence intervals around the scale scores are
relatively small. In this case, however, interpretations of scale scores capturing specific
constructs should take into account that these scores represent the joint functioning of general
and specific abilities (when higher-order factor models or nested-factor models are applied).
In conclusion, a model-based approach to scale score reliability requires that statistical
assumptions of CFA are met and that the target model provides a good fit to the data. The
choice of a reliability index ( or h) should then be guided by (a) the CFA model applied
and (b) the envisaged interpretation of a scale score.
General Discussion
In this tutorial, we elaborated on four different kinds of CFA models that are widely
used to study various aspects of the theoretical proposition that psychological constructs are
hierarchically structured. In this section, we synthesize the key points of our theoretical and
empirical analyses: For which questions in personality and individual differences research is
each model most appropriate? And which model supports the computation of which
composite scores?
One-Factor Model
The one-factor model focuses on a single, very general construct. Thus, it is most
applicable when (a) a general construct is hypothesized to account for the common variance
among measures, (b) no specific constructs beyond that general construct are predicted to
account for the common variance, (c) the research objective is to study a general construct
and its relationships to other variables (e.g., psychological constructs, sociodemographic

characteristics, or life outcomes), (d) the fit of a one-factor model is satisfactory, and (e) more
complex models do not provide a better fit than the one-factor model. Model comparison is
particularly important as it allows researchers to test whether first-order constructs, as
specified in the first-order factor model, are empirically distinguishable or whether specific
constructs, as specified in the nested-factor model, explain independent variance in the
measures beyond the contribution of a general construct. If the one-factor model is
theoretically supported and provides a good fit to the data, it is inconsistent to use anything
but a total scale score for research and diagnostic purposes. An appropriate reliability index of
this total scale score is omega.
Relative to the one-factor model, the first-order factor model focuses on constructs that
are narrower in scope. It is therefore most applicable when (a) (mutually correlated) first-
order constructs are hypothesized to account for the common variance among corresponding
measures, or (b) no general construct or higher-order construct is predicted to operate, and (c)
the research objective is to study the relationship between specific constructs (but not higher-
order/general constructs) and other psychological constructs, sociodemographic
characteristics, or life outcomes. Moreover, the first-order factor model is (d) a useful tool to
analyze the multifaceted nature of psychological constructs in terms of their convergent and
discriminant relationships (see Stoel et al., 2006, for appropriate statistical methods for testing
the discriminability of latent constructs). Further, (e) as the relationships among first-order
constructs may point to the operation of more general or higher-order constructs, careful
examination of first-order factor models is the traditional starting point for analyses of higher-
order factor models that include such constructs (Marsh, 1987). Hence, the first-order factor
model is particularly useful as a comparison model for more restrictive higher-order factor
models. If the first-order factor model provides a good fit to the data, the use of specific scale
scores for research and diagnostic purposes is strongly supported, but the use of a total scale
score is not supported (McDonald, 1999, p. 208). An appropriate reliability index for the
specific scale scores is omega.
It is important to note that specific constructs as represented in the first-order model
are typically found to be correlated. Thus, researchers should be aware of two issues: First, as
no general construct is included, researchers should delineate the theoretical mechanisms by
which their specific constructs (as represented by first-order factors) are correlated (e.g., see
van der Maas et al., 2006, for intelligence). Second, if specific constructs as specified in the
first-order factor model are used as predictors in a regression context, the regression
coefficients may be affected by multicollinearity. This may render both the substantive
interpretation of regression coefficients and their sampling stability problematic (Cohen et al.,
2003). Specifically, multicollinearity may lead to large standard errors of regression
coefficients, which in turn undermine the trustworthiness of regression parameters:
confidence intervals will be large, implying high uncertainty about the magnitude of an effect.
Note that multicollinearity cannot occur in the other three CFA models: In the one-factor
model, a single construct is specified; in the higher-order factor model and the nested-factor
model, general and specific constructs are specified to be mutually uncorrelated.
Higher-Order Factor and Nested-Factor Model
Commonalities
Both the higher-order and the nested-factor models focus on the construct hierarchy in
its entirety by simultaneously examining the operation of constructs at various levels of
generality. Thus, both models may be applied when (a) a general construct is hypothesized to
account for the common variance among measures, (b) multiple specific constructs are
predicted to contribute to observed individual differences in assigned measures over and
above the general construct, and (c) the research objective is to study specific and general
constructs (Chen et al., 2006). Further, (d) if there is no theoretical explanation of why first-
order constructs are correlated (see above), many methodologists recommend including
constructs with the widest possible generalizability in their models to provide ―the fullest
possible understanding of the data‖ (Gorsuch, 1983, p. 255). This rationale applies equally to
higher-order and nested-factor models. As both models include constructs that operate at two
levels of generality, both support the computation of specific scale scores and a total scale
score. Crucially, these scale scores may be interpreted with respect to either (a) the blend of
general and specific constructs or (b) a certain target construct only. Depending on the
interpretation chosen, different reliability estimates apply: (a) omega or (b) omega
hierarchical.
Choosing Between the Higher-Order Factor and the Nested-Factor Model
Given that the higher-order factor model and the nested-factor model have much in
common, which of these two models should be used in a particular context? If the general
construct is the focus of research, and if the higher-order factor model fits the data as well as
the nested-factor model, the higher-order factor model is preferable to the nested-factor model
(Chen et al., 2006).
In all other cases, but particularly in applications where the research focus is on the
relation of latent general and specific constructs to external criteria, the nested-factor model is
preferable to the higher-order factor model. More specifically, the higher-order factor model
is subject to the proportionality constraint, which renders the estimated relations of external
criteria to general and specific constructs linearly dependent (Schmiedek & Li, 2004). Thus, it
is not possible to examine the associations between all general and specific constructs as
specified in the higher-order factor model and external criteria (for further discussion, see
Schmiedek & Li, 2004). The nested-factor model, in contrast, allows the relationships
between general and all specific constructs, on the one hand, and external variables, on the
other, to be examined. To avoid identification problems in the higher-order factor model,
researchers need to constrain the relationship of one construct (either the general construct or
any of the specific constructs) with an external criterion variable to zero. However, this
requires the correct identification of these zero relationships; otherwise, model parameters
may be biased (see the discussion by Schmiedek & Li, 2004, on potential problems in
identifying these relationships). Taken together, all constructs specified in the nested-factor
model may be linked to external variables. This key advantage makes the structural
conceptualization of constructs in the nested-factor model a theoretically and empirically
fruitful approach to implementing the specificity matching principle (e.g., Swann, Chang-
Schneider, & McClarty, 2007; see also Wittmann, 1988). According to this principle, it is best
to use specific predictor variables (e.g., mathematical ability test scores) to predict specific
outcomes (e.g., mathematics grades); likewise, it is best to use general predictor variables
(e.g., intelligence g) to predict general outcomes (e.g., grade point average). Application of
this principle has helped to reconcile opposing perspectives on the power of personality traits
(Fleeson, 2004), attitudes (Ajzen & Fishbein, 2005), and (perhaps) self-concepts (Marsh &
Craven, 2006; Swann et al., 2007) to explain key outcome variables at different levels of
generality. Moreover, the structural conceptualization of cognitive abilities in terms of a
nested-factor model has helped researchers to understand the interplay between general and
specific abilities, on the one hand, and school grades, academic interests, and students‘
socioeconomic status, on the other (Brunner, 2008; Gustafsson & Balke, 1993). From a wider
perspective, these studies clearly demonstrate the value of measures assessing general and
specific constructs, because both general and specific constructs play a key role in
understanding and predicting individual behavior.
Limitations of the Higher-Order Factor and Nested-Factor Models
Researchers applying the higher-order or nested-factor models should carefully inspect
model parameters for any lack of (empirical) identification. Specifically, it is relatively
common for the specific factors to collapse—as was the case, for example, in the Gustafsson
and Balke (1993) study on the structure of intelligence or in the Chen et al. (2006) study on
the structure of quality of life. Moreover, in the present tutorial we had to constrain the factor
loadings of the two subtests on PSNF,specific to be equal because the nested-factor model was
otherwise empirically underidentified. In general, to check the identification status of their
models, researchers need to carefully examine parameter estimates, standard errors, and
variances of general and specific factors: Parameters out of the range of admissible values
(e.g., standardized factor loadings greater than 1 or negative variances of latent variables),
large standard errors, or variances of general or specific factors very close to zero may
indicate (empirical) identification problems.
A further limitation of the higher-order factor and nested-factor models concerns the
assumption that general and specific constructs are mutually uncorrelated. If this constraint is
removed and the correlations among general and specific constructs are freely estimated,
identification problems are likely to occur (Chen et al., 2006; Rindskopf & Rose, 1988). What
can be done to overcome this problem? If correlations among specific factors and/or the
general factor are of interest, there are several ways to identify the higher-order factor model
and the nested-factor model. First, equality constraints may be imposed on factor loadings.
For example, Wilhelm and Oberauer (2006) analyzed the relationship between reasoning
ability, working memory, and mental speed. They used experimental manipulations to design
measures of mental speed and examined precisely formulated hypotheses on how cognitive
constructs affect these measures by imposing equality constraints on corresponding factor
loadings. These constraints then allowed them to investigate the correlation among general
and specific cognitive constructs. Note that constraining model parameters to be equal (or to
any other predefined values; see below) assumes that these assumptions hold in the target
population. If these assumptions do not hold, the substantive interpretation of model
parameters may be compromised. Thus, the implementation of such an approach requires
careful evaluation of model fit (particularly of local misfit; see Tomarken & Waller, 2003,
and McDonald, 2010) to detect potential model misspecification. Second, according to
Graham and Collins (1991), researchers may ―borrow strength‖ to achieve identification by
including additional variables in the model. Specifically, external variables that are known to
predict only one of the factors and/or that are uniquely predicted by one of the factors may be
included. However, this again requires correct specification of the relationships, as model
parameters may otherwise be biased (Schmiedek & Li, 2004). Third, factor correlations
among specific factors may be fixed to certain values that are informed by substantive
theoretical considerations. Note that in this case the orthogonality between the general factor
and the specific factors needs to be maintained. Fourth, when a nested-factors model is used,
removing one specific factor may allow correlations among the remaining specific factors
while retaining the orthogonality of the general and specific factors. Importantly, the specific
factor to be removed should represent the standard method of operationalizing the general
construct (Eid, Lischetzke, Nussbeck, & Trierweiler, 2003). In intelligence research, for
example, reasoning tasks can be considered standard indicators of general cognitive ability,
because reasoning ability is at the heart of the conceptual definition of intelligence
(Gottfredson, 1997; Snow, Kyllonen, & Marshalek, 1984). Thus, when this approach is
implemented, reasoning tasks are affected only by the factor representing general cognitive
ability.
Conclusion and Outlook
Our goals in writing this tutorial are twofold. First, we offer an in-depth discussion of
the psychometric properties and the interpretation of four popular, but different, CFA models
that can be used to study hierarchically structured personality constructs. Ideally, this
discussion will encourage researchers to systematically compare their favorite CFA model
with theoretically supported alternative models (Jöreskog, 1993). This comparison will foster
the development of cumulative knowledge on personality constructs operating at various
levels of generality with respect to both components of psychological theories (Edwards &
Bagozzi, 2000): (a) how personality constructs are related to corresponding measures and (b)
how personality constructs are related to other theoretical constructs, sociodemographic

characteristics, or life out comes. Second, we hope that our tutorial will generate greater
awareness for model-based approaches to the computation of score reliability (Sijtsma, 2009).
When it comes to hierarchically structured constructs, model-based approaches address severe
limitations of widely used reliability indices, such as alpha. Taken together, the guidance
provided in this tutorial may thus help researchers to implement the Standards for
Educational and Psychological Testing (American Educational Research Association et al.,
1999) by providing a statistical rationale for the derivation and interpretation of scale scores
assessing constructs at various levels of generality.

References
Ajzen, I., & Fishbein, M. (2005). The influence of attitudes on behavior. In D. Albarracín, B.
T. Johnson, & M. P. Zanna (Eds.), The handbook of attitudes (pp. 173-221). Mahwah,
NJ: Erlbaum.
American Educational Research Association, American Psychological Association, &
National Council on Measurement in Education. (1999). Standards for educational
and psychological testing. Washington, DC: Author.
Bagozzi, R. P., & Edwards, J. R. (1998). A general approach for representing constructs in
organizational research. Organizational Research Methods, 1, 45-87.
Bentler, P. M. (2009). Alpha, dimension-free, and model-based internal consistency
reliability. Psychometrika, 74, 137-143.
Bollen, K. A. (1989). Structural equations with latent variables. New York: John Wiley &
Sons.
Bonett, D. G. (2003). Sample size requirements for testing and estimating coefficient alpha.
Journal of Educational and Behavioral Statistics, 27, 335-340.
Boomsma, A., & Hoogland, J. J. (2001). The robustness of LISREL modeling revisited. In R.
Cudeck, S. du Toit, & D. Sörbom (Eds.), Structural equation models: Present and
future. A Festschrift in honor of Karl Jöreskog (pp. 139-168). Lincolnwood, IL:
Scientific Software International.
Borsboom, D., & Mellenbergh, G. J. (2002). True scores, latent variables, and constructs: A
comment on Schmidt and Hunter. Intelligence, 30, 505-514.
Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity.
Psychological Review, 111, 1061-1071.
Brown, T. A. (2006). Confirmatory factor analysis for applied research. New York: Guilford
Press.
Brunner, M. (2008). No g in education? Learning and Individual Differences, 18, 152-165.

Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New
York: Cambridge University Press.
Caruso, J. C., & Cliff, N. (1999). The properties of equally and differentially weighted
WAIS–III factor scores. Psychological Assessment, 11, 198-206.
Chen, F. F., West, S. G., & Sousa, K. H. (2006). A comparison of bifactor and second-order
models of quality of life. Multivariate Behavioral Research, 41, 189-225.
Cheung, M. W.-L. (2009). Constructing approximate confidence intervals for parameters with
structural equation models. Structural Equation Modeling, 16, 267-294.
Clark, L. A., & Watson, D. (1991). Tripartite model of anxiety and depression: Psychometric
evidence and taxonomic implications. Journal of Abnormal Psychology, 100, 316-336.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple
regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ:
Lawrence Erlbaum Associates.
Colom, R., Abad, F. J., Garcia, L. F., & Juan-Espinosa, M. (2002). Education, Wechsler‘s
Full Scale IQ, and g. Intelligence, 30, 449-462.
Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications.
Journal of Applied Psychology, 78, 98-104.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika,
16, 297-334.
Cudeck, R. (1989). Analysis of correlation matrices using covariance structure models.
Psychological Bulletin, 105, 317-327.
DeYoung, C. G. (2006). Higher-order factors of the big five in a multi-informant sample.
Journal of Personality and Social Psychology, 91, 1138-1151.
Edwards, J. R., & Bagozzi, R. P. (2000). On the nature and direction of relationships between
constructs and measures. Psychological Methods, 5, 155-174.

Eid, M., Lischetzke, T., Nussbeck, F. W., & Trierweiler, L. I. (2003). Separating trait effects
from trait-specific method effects in multitrait–multimethod models: A multiple-
indicator CT-C(M-1) model. Psychological Methods, 8, 38-60.
Emmons, R. A. (1995). Levels and domains in personality: An introduction. Journal of
Personality, 63, 341-364.
Fleeson, W. (2004). Moving personality beyond the person–situation debate. The challenge
and the opportunity of within-person variability. Current Directions in Psychological
Science, 13, 83-87.
Gallagher, M., Lopez, S. J., & Preacher, K. J. (2009). The hierarchical structure of well-being.
Journal of Personality, 77, 1025-1050.
Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum
Associates.
Gottfredson, L. S. (1997). Mainstream science on intelligence: An editorial with 52
signatories, history and bibliography. Intelligence, 24, 13-23.
Graham, J. W., & Collins, N. L. (1991). Controlling correlational bias via confirmatory factor
analysis of MTMM data. Multivariate Behavioral Research, 26, 607-629.
Green, S. B., & Yang, Y. (2009). Reliability of summed item scores using structural equation
modeling: An alternative to coefficient alpha. Psychometrika, 74, 155-167.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6, 430-
450.
Gustafsson, J. E., & Balke, G. (1993). General and specific abilities as predictors of school
achievement. Multivariate Behavioral Research, 28, 407-434.
Hall, R. J., Snell, A. F., & Singer Foust, M. (1999). Item parceling strategies in SEM:
Investigating the subtle effects of unmodeled secondary constructs. Organizational
Research Methods, 2, 233-256.

Hogan, T. P., Benjamin, A., & Brezinski, K. L. (2000). Reliability methods: A note on the
frequency of use of various types. Educational and Psychological Measurement, 60,
523-531.
Horn, J. L., & Noll, J. (1997). Human cognitive capabilities: Gf-Gc theory. In D. P. Flanagan,
J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment:
Theories, tests, and issues (pp. 53-91). New York: The Guilford Press.
Hu, L.-T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure
analysis: Conventional criteria versus new alternatives. Structural Equation Modeling,
6, 1-55.
Jöreskog, K. G. (1993). Testing structural equation models. In K. A. Bollen & J. S. Long
(Eds.), Testing structural equation models (pp. 294-316). Newbury Park, CA: Sage
Publications.
Little, T. D., Cunningham, W. A., Shahar, G., & Widaman, K. F. (2002). To parcel or not to
parcel: Exploring the question, weighting the merits. Structural Equation Modeling, 9,
151-173.
Loehlin, J. C. (2004). Latent variable models: An introduction to factor, path, and structural
analysis (4th ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA:
Addison-Wesley.
Lubinski, D. (2004). Introduction to the special section on cognitive abilities: 100 years after
Spearman‘s (1904) ―‗General intelligence‘, objectively determined and measured‖.
Journal of Personality and Social Psychology, 86, 96-111.
MacCallum, R. C., & Austin, J. T. (2000). Applications of structural equation modeling in
psychological research. Annual Review of Psychology, 51, 201-226.

Marsh, H. W. (1987). The hierarchical structure of self-concept and the application of
hierarchical confirmatory factor analysis. Journal of Educational Measurement, 24,
17-39.
Marsh, H. W., & Craven, R. (2006). Reciprocal effects of self-concept and performance from
a multidimensional perspective. Perspectives on Psychological Science, 1, 133-163.
Marsh, H. W., Hau, K. T., Balla, J. R., & Grayson, D. (1998). Is more ever too much? The
number of indicators per factor in confirmatory factor analysis. Multivariate
Behavioral Research, 33, 181-220.
Marsh, H. W., Hau, K.-T., & Grayson, D. (2005). Goodness of fit in structural equation
models. In A. Maydeu-Olivares & J. J. McArdle (Eds.), Contemporary psychometrics
(pp. 275-340). Mahwah, NJ: Erlbaum.
Marsh, H. W., Hau, K. T., & Wen, Z. (2004). In search of golden rules: Comment on
hypothesis-testing approaches to setting cutoff values for fit indexes and dangers in
overgeneralizing Hu and Bentler‘s (1999) findings. Structural Equation Modeling, 11,
320-341.
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum
Associates.
McDonald, R. P. (2010). Structural models and the art of approximation. Perspectives on
Psychological Science, 5, 675-686.
McGrew, K. S. (2009). CHC theory and the human cognitive abilities project: Standing on the
shoulders of the giants of psychometric intelligence research. Intelligence, 37, 1-10.
Mellenbergh, G. J. (1996). Measurement precision in test score and item response models.
Psychological Methods, 1, 293-299.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures.
Psychological Bulletin, 105(1), 156-166.

Muthén, B. O. (1984). A general structural equation model with dichotomous, ordered
categorical, and continuous latent variable indicators. Psychometrika, 49, 115-132.
Muthén, B. O., & Kaplan, D. (1985). A comparison of some methodologies for the factor
analysis of non-normal Likert variables. British Journal of Mathematical and
Statistical Psychology, 38, 171-189.
Muthén, L. K., & Muthén, B. O. (1998–2010). Mplus User’s Guide (6th ed.). Los Angeles,
CA: Muthén & Muthén.
Muthén, L. K., & Muthén, B. O. (2002). How to use a Monte Carlo study to decide on sample
size and determine power. Structural Equation Modeling, 9, 599-620.
Raykov, T., Dimitrov, D. M., & Asparouhov, T. (2010). Evaluation of scale reliability with
binary measures using latent variable modeling. Structural Equation Modeling, 17,
265-279.
Revelle, W. (2010). Package ‗psych‘. Retrieved August 24, 2010, from http://personality-
project.org/r/psych_manual.pdf
Rhemtulla, M., Brosseau-Liard, P., & Savalei, V. (2010). How many categories is enough to
treat data as continuous? A comparison of robust continuous and categorical SEM
estimation methods under a range of non-ideal situations. Manuscript submitted for
publication.
Rindskopf, D., & Rose, T. (1988). Some theory and applications of confirmatory second-
order factor analysis. Multivariate Behavioral Research, 23, 51-67.
Satorra, A. (1990). Robustness issues in structural equation modeling: A review of recent
developments. Quality & Quantity, 24, 367-386.
Schmid, J., & Leiman, J. M. (1957). The development of hierarchical factor solutions.
Psychometrika, 22, 53-61.

Schmiedek, F., & Li, S.-C. (2004). Toward an alternative representation for disentangling
age-associated differences in general and specific cognitive abilities. Psychology and
Aging, 19, 40-56.
Shavelson, R. J., Hubner, J. J., & Stanton, G. C. (1976). Self-concept: Validation of construct
interpretations. Review of Educational Research, 46, 407-441.
Sijtsma, K. (2009). Reliability beyond theory and into practice. Psychometrika, 74, 169-173.
Slaney, K. L., & Maraun, M. D. (2008). A proposed framework for conducting data-based test
analysis. Psychological Methods, 13, 376-390.
Snow, R. E., Kyllonen, P. C., & Marshalek, B. (1984). The topography of ability and learning
correlations. In R. J. Sternberg (Ed.), Advances in the psychology of human
intelligence (pp. 47-103). Hillsdale, NJ: Lawrence Erlbaum Associates.
Spearman, C. (1904). ―General intelligence,‖ objectively determined and measured. American
Journal of Psychology, 15, 201-293.
Steer, R. A., Ball, R., Ranieri, W. F., & Beck, A. T. (1999). Dimensions of the Beck
Depression Inventory–II in clinically depressed outpatients. Journal of Clinical
Psychology, 55, 117-128.
Steyer, R. (1989). Models of classical psychometric test theory as stochastic measurement
models: Representation, uniqueness, meaningfulness, identifiability, and testability.
Methodika, 3, 25-60.
Stoel, R. D., Garre, F. G., Dolan, C., & van den Wittenboer, G. (2006). On the likelihood ratio
test in structural equation modeling when parameters are subject to boundary
constraints. Psychological Methods, 11, 439-455.
Streiner, D. L. (2003). Starting at the beginning: An introduction to coefficient alpha and
internal consistency. Journal of Personality Assessment, 80, 99-103.

Swann, W. B. J., Chang-Schneider, C., & McClarty, K. L. (2007). Do people‘s self-views
matter? Self-concept and self-esteem in everyday life. American Psychologist, 62, 84-
94.
Tanaka, J. S., & Huba, G. J. (1984). Confirmatory hierarchical factor analyses of
psychological distress measures. Journal of Personality and Social Psychology, 46,
621-635.
Thompson, B. (2004). Exploratory and confirmatory factor analysis. Understanding concepts
and applications. Washington, DC: American Psychological Association.
Tomarken, A. J., & Waller, N. G. (2003). Potential problems with ―well fitting‖ models.
Journal of Abnormal Psychology, 112, 578-598.
Tulsky, D. S., & Ledbetter, M. F. (2000). Updating to the WAIS–III and WMS–III:
Considerations for research and clinical practice. Psychological Assessment, 12, 263-
262.
van der Maas, H. L. J., Dolan, C. V., Grasman, R. P. P. P., Wicherts, J. M., Huizenga, H. M.,
& Raijmakers, M. E. J. (2006). A dynamical model of general intelligence: The
positive manifold of intelligence by mutualism. Psychological Review, 113, 842-861.
Wechsler, D. (1997). Wechsler Adult Intelligence Scale–Third Edition. San Antonio, TX: The
Psychological Corporation.
West, S. G. (2006). Seeing your data: Using modern statistical graphics to display and detect
relationships. In R. R. Bootzin & P. E. McKnight (Eds.), Strengthening research
methodology: Psychological measurement and evaluation (pp. 159-182). Washington,
DC: American Psychological Association.
West, S. G., Finch, J. F., & Curran, P. J. (1995). Structural equation models with nonnormal
variables: Problems and remedies. In R. H. Hoyle (Ed.), Structural equation modeling:
Concepts, issues, and applications (pp. 56-75). Thousand Oaks, CA: Sage.
West, S. G., Taylor, A. B., & Wu, W. (in press). Model fit and model selection in structural
equation modeling. In R. H. Hoyle (Ed.), Handbook of Structural Equation Modeling.
New York: Guilford Press.
Wilhelm, O., & Oberauer, K. (2006). Why are reasoning ability and working memory
capacity related to mental speed? An investigation of stimulus–response compatibility
in choice reaction time tasks. European Journal of Cognitive Psychology, 18, 18-50.
Wilkinson, L., & Task Force on Statistical Inference. (1999). Statistical methods in
psychology journals: Guidelines and explanations. American Psychologist, 54, 594-
604.
Wirth, R. J., & Edwards, M. C. (2007). Item factor analysis: Current approaches and future
directions. Psychological Methods, 12, 58-79.
Wittmann, W. W. (1988). Multivariate reliability theory: Principles of symmetry and
successful validation strategies. In J. R. Nesselroade & R. B. Cattell (Eds.), Handbook
of multivariate experimental psychology (2nd ed., pp. 505-560). New York: Plenum.
Yang, Y., & Green, S. B. (2010). A note on structural equation modeling estimates of
reliability. Structural Equation Modeling, 17, 66-81.
Yung, Y.-F., Thissen, D., & McLeod, L. D. (1999). On the relationship between the higher-
order factor model and the hierarchical factor model. Psychometrika, 64, 113-128.
Zinbarg, R. E., Revelle, W., Yovel, I., & Li, W. (2005). Cronbach‘s alpha, Revelle‘s beta, and
McDonald‘s omega h: Their relations with each other and two alternative
conceptualizations of reliability. Psychometrika, 70, 123-133.
Zinbarg, R. E., Yovel, I., Revelle, W., & McDonald, R. P. (2006). Estimating generalizability
to a latent variable common to all of a scale‘s indicators: A comparison of estimators
for ωh. Applied Psychological Measurement, 30, 121-144.

Footnotes
1
The online supplemental materials can be retrieved from www.emacs.uni.lu.
2
The maximum likelihood estimator used in this article is based on statistical
distribution theory of covariance matrices. Using correlation matrices with maximum
likelihood estimation leads to improper standard error estimates of model parameters and to
misleading confidence intervals and test statistics unless certain constraints are imposed on
the model parameters (Cudeck, 1989; McDonald, 1999). Only the correlation matrix of the
subtest scores was provided for the Spanish standardization sample of the WAIS–III. To
obtain correct standard errors for model parameters, we therefore followed McDonald (1999,
pp. 193-195) and specified in all models under investigation (a) appropriate constraints and
(b) scaling parameters for the estimation of subtest-specific factors (i.e., e1 to e14). Mplus (L.
K. Muthén & Muthén, 1998–2010) syntax files are presented in the Appendix. Note that the
2 value of the goodness-of-fit statistic of overall fit and the descriptive fit statistics (e.g.,
RMSEA, CFI, and SRMR) are typically not affected when correlation matrices are used
instead of covariance matrices (McDonald, 1999, p. 194).

3
The scale scores derived in the present article are not identical to the Full Scale IQ or
to the index scores of the WAIS–III, which were computed with a selection of subtests
(Caruso & Cliff, 1999). Accordingly, our reliability estimates do not apply to the Full Scale
IQ or to the index scores of the WAIS–III for the Spanish standardization sample.
4
We estimated score reliability for CFA models in which factor loadings and variances
of subtest-specific factors could vary across manifest measures (reflecting the assumption of
congeneric measures). Note that CFA models may also be applied to estimate score reliability
in more restrictive measurement models (Bollen, 1989, p. 208) in which factor loadings are
constrained to be equal for all measures (tau-equivalent measures) or in which factor loadings
and the variances of the subtest-specific factors are constrained to be equal for all measures
(parallel measures).
5
Sample size also affects the precision of the estimation of alpha (Bonett, 2003). Thus,
alpha may not be preferable to omega or omega hierarchical, even in cases of small samples.
6
This pattern of results is akin to problems encountered in the interpretation of scale
score profiles, when differences between scale scores are computed to identify a person‘s
strengths and weaknesses. For example, the reliability of differences between WAIS–III index
scores assessing specific ability constructs was found to be low (when using unit weights to
compute index scores) as these scores proved to be strongly mutually correlated (e.g., Caruso
& Cliff, 1999).

Acknowledgements
We thank the editor Stephen West and all four reviewers for their valuable comments on an
earlier version of this manuscript, and Susannah Goss for editorial support.
Table 1
Intercorrelations Among the Subtest Scores of the WAIS–III (as Obtained for the Spanish Standardization Sample)
Task score 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14.
1. Vocabulary (voc)a --
2. Similarities (sim) .755 --
3. Arithmetic (ari) .608 .596 --
4. Digit span (dig) .555 .566 .614 --
5. Information (inf) .715 .678 .661 .543 --
6. Comprehension (com) .729 .697 .554 .503 .671 --
7. Letter–number (let) .627 .612 .669 .759 .603 .567 --
8. Picture completion (pic_c) .616 .621 .567 .538 .599 .552 .612 --
9. Digit–symbol coding (cod) .606 .582 .576 .590 .532 .502 .689 .643 --
10. Block design (blo) .598 .605 .625 .567 .616 .496 .655 .679 .668 --
11. Matrices (mat) .657 .668 .699 .609 .634 .564 .692 .711 .711 .769 --
12. Picture arrangement (pic_a) .613 .623 .585 .568 .616 .574 .665 .677 .672 .692 .753 --
13. Symbol search (sym) .588 .57 .584 .563 .533 .494 .675 .623 .787 .673 .717 .670 --
14. Object assembly (obj) .560 .554 .537 .54 .538 .490 .597 .619 .627 .742 .689 .673 .649 --
Note. This table is reprinted from Intelligence, 30, R. Colom, F. J. Abad, L. F. Garcia, M. Juan-Espinosa, Education, Wechsler‘s Full Scale IQ, and
g, 449-462, Copyright (2002), with permission from Elsevier.
a
Labels in parentheses are those used in the Mplus syntax presented in the Appendix.
Table 2
Fit of the Four Factor Models to the WAIS–III Data
Model 2 df CFI RMSEA SRMR
One-factor model 1923 77 .888 .132 .050
First-order factor model 515 71 .973 .068 .028
Higher-order factor model 570 73 .970 .071 .032
Nested-factor model 376 64 .981 .060 .023
Note. All 2 goodness-of-fit tests were statistically significant at p < .001. CFI = Comparative
Fit Index; RMSEA = Root Mean Square Error of Approximation; SRMR = Standardized Root
Mean Squared Residual.
Table 3
Standardized Factor Loadings of the Subtests of the WAIS–III as Obtained by Applying the
Schmid-Leiman Transformation to the Higher-Order Factor Model (Figure 1c)
Subtest gHO VCHO, specific POHO, specific WMHO, specific PSHO, specific eHO
Information .70 .42 .33
Vocabulary .76 .45 .22
Similarities .74 .43 .27
Comprehension .70 .41 .34
Object assembly .77 .20 .37
Block design .83 .21 .27
Picture completion .77 .20 .36
Matrix reasoning .87 .22 .20
Picture arrangement .81 .21 .30
Digit span .74 .33 .34
Letter–number sequencing .82 .37 .20
Arithmetic .71 .32 .39
Digit–symbol coding .81 .36 .21
Symbol search .81 .36 .22

Table 4
Example Equations for the Computation of Score Reliability (in Terms of  and h)
Equations
One-Factor Model: General Cognitive Ability Score
 = (.79 + .78 + .77 + .73 + .76 + .71 + .82 + .79 + .81 + .83 + .88 + .82 + .80
+ .77)2 / [(.79 + .78 + .77 + .73 + .76 + .71 + .82 + .79 + .81 + .83 + .88 + .82 + .80
+ .77)2 + (.38 + .39 + .41 + .47 + .42 + .50 + .33 + .38 + .35 + .32 + .23 + .32 + .36 + .41)] =
.96
First-Order Factor Model: Verbal Comprehension Score
 = (.82 + .88 + .85 + .81)2 / [(.82 + .88 + .85 + .81)2 + (.33 + .23 + .27 + .34)] = .91
Higher-Order Factor Model: General Cognitive Ability Score
 = [(.70 + .76 + .74 + .70 + .77 + .83 + .77 + .87 + .81 + .74 + .82 + .71 + .81 + .81)2 + (.42
+ .45 + .43 + .41)2 + (.20 + .21 + .20 + .22 + .21)2 + (.33 + .37 + .32)2 + (.36 + .36)2] / [(.70 +
.76 + .74 + .70 + .77 + .83 + .77 + .87 + .81 + .74 + .82 + .71 + .81 + .81)2 + (.42 + .45 + .43
+ .41)2 + (.20 + .21 + .20 + .22 + .21)2 + (.33 + .37 + .32)2 + (.36 + .36)2 + (.33 + .22 + .27 +
.34 + .37 + .27 + .36 + .20 + .30 + .34 + .20 + .39 + .21 + .22)] = .97
h = (.70 + .76 + .74 + .70 + .77 + .83 + .77 + .87 + .81 + .74 + .82 + .71 + .81 + .81)2 / [(.70
+ .76 + .74 + .70 + .77 + .83 + .77 + .87 + .81 + .74 + .82 + .71 + .81 + .81)2 + (.42 + .45 +
.43 + .41)2 + (.20 + .21 + .20 + .22 + .21)2 + (.33 + .37 + .32)2 + (.36 + .36)2 + (.33 + .22 + .27
+ .34 + .37 + .27 + .36 + .20 + .30 + .34 + .20 + .39 + .21 + .22)] = .92
Nested-factor Model: Verbal Comprehension Score
 = [(.73 + .76 + .75 + .66)2 + (.35 + .46 + .39 + .50)2] / [(.73 + .76 + .75 + .66)2 + (.35 + .46
+ .39 + .50)2 + (.34 + .22 + .28 + .31)] = .91
h = (.35 + .46 + .39 + .50)2 / [(.73 + .76 + .75 + .66)2 + (.35 + .46 + .39 + .50)2 + (.34 + .22
+ .28 + .31)] = .23
Table 5
Model-Based Variance Composition and Reliabilities ( and h) of the WAIS–III Scale Scores
Composition of scale score variance

Scale score Target Observed % target % nontarget % error  h
construct variance construct construct(s)
One-factor model
General cognitive ability gOF 127.1 95.9 -- 4.1 .96 --
First-order factor model

Verbal comprehension VCFO 12.5 90.7 -- 9.3 .91 --
Perceptual organization POFO 19.0 92.1 -- 7.9 .92 --
Working memory WMFO 7.1 86.9 -- 13.1 .87 --
Processing speed PSFO 3.6 88.1 -- 11.9 .88 --
Higher-order factor model

Verbal comprehension VCHO,specific 12.5 23.5 67.2 9.3 .91 .23
Perceptual organization POHO,specific 19.0 5.8 86.3 7.9 .92 .06
Working memory WMHO,specific 7.1 14.5 72.4 13.1 .87 .15
Processing speed PSHO,specific 3.6 14.6 73.5 11.9 .88 .15
General cognitive ability gHO 126.9 92.4 4.4 3.2 .97 .92
Nested-factor model
Verbal comprehension VCNF,specific 12.5 23.4 67.5 9.1 .91 .23
Perceptual organization PONF,specific 18.9 5.0 87.6 7.3 .93 .05
Working memory WMNF,specific 7.1 15.4 72.6 12.0 .88 .15
Processing speed PSNF,specific 3.6 15.9 72.1 11.9 .88 .16
General cognitive ability gNF 127.0 92.7 4.3 3.0 .97 .93
Note. g = general cognitive ability; VC = Verbal Comprehension; PO = Perceptual Organization; WM = Working Memory; PS = Processing Speed; OF = one-factor model; FO =
first-order factor model; HO = higher-order factor model; NF = nested-factor model.
Figure Captions
Figure 1. Alternative factor models for the WAIS–III: (a) One-factor model (OF), (b) first-
order factor model (FO), (c) higher-order factor model (HO), and (d) nested-factor model
(NF). Standardized solution is shown. g = general cognitive ability; VC = Verbal
Comprehension; PO = Perceptual Organization; WM = Working Memory; PS = Processing
Speed. Note: All models were identified by fixing the variance of the latent factors to 1.00; all
other model parameters were freely estimated. For the higher-order factor model, the
variances of the specific first-order factors (i.e., VCHO,specific, POHO,specific, WMHO,specific, and
PSHO,specific) were thus constrained to 1 – squared factor loading of the corresponding first-
order factor on gHO (e.g., the variance of VCHO,specific was constrained to equal 1 – squared
factor loading of VC on gHO). Further, for the nested-factor model, the factor loadings of the
two subtests on PSNF,specific were constrained to be equal (the nested-factor model otherwise
proved to be empirically underidentified).

63
Figure 1
a. One-Factor Model b. First-Order Factor Model
eOF,1 = .42 eFO,1 = .33
Information Information
.82
eOF,2 = .38 .76 eFO,2 = .23 1
Vocabularly Vocabularly .88
.79
eFO,3 = .27
VCFO
eOF,3 = .39 Similarities Similarities .85
.78
eOF,4 = .50 eFO,4 = .34 .81
Comprehension Comprehension
.71
.84
eOF,5 = .41 eFO,5 = .37
Object Assembly Object Assembly
.77 .80
eOF,6 = .32 eFO,6 = .27
Block Design .83 Block Design .85 1
eOF,7 = .38 eFO,7 = .36 .80
Picture Completion .79 Picture Completion
POFO .82
1 .89
eOF,8 = .23 .88 eFO,8 = .20 Matrix Reasoning
Matrix Reasoning
.82 gOF Picture
.84 .74
eOF,9 = .32 Picture eFO,9 = .30
Arrangement Arrangement
.87
.73
eOF,10 = .47 eFO,10 = .34 1
Digit Span Digit Span
.82 .81
eOF,11 = .33 Letter-Number eFO,11 = .20 Letter-Number .90
Sequencing Sequencing WMFO .90
eOF,12 = .41 .77 eFO,12 = .39
Arithmetic Arithmetic .78
.73
.84
1
.80
eOF,13 = .47 eFO,13 = .34 Digit-Symbol .89
Digit-Symbol
Coding Coding
PSFO
eOF,14 = .36 eFO,14 = .22
Symbol Search Symbol Search .89
c. Higher-Order Factor Model d. Nested-Factor Model

.26
eHO,1 = .33 VCHO,specific
Information
.82 Information eNF,1 = .34
.35
eHO,2 = .22 1
Vocabularly .88 Vocabularly .46 eNF,2 = .22
eHO,3 = .27
.85 VCHO .73 .39 VCNF,specific
Similarities Similarities eNF,3 = .28
.50
eHO,4 = .34 .81
Comprehension Comprehension eNF,4 = .31
.76
.86 .75
.06
eHO,5 = .37 .66
Object Assembly
.80 POHO,specific Object Assembly .33 eNF,5 = .32
.75
eHO,6 = .27
Block Design .85 Block Design .37 eNF,6 = .19
.82
1
eHO,7 = .36 .80 1
Picture Completion .09
POHO Picture Completion eNF,7 = .37
.89 .97 1 PONF,specific
.79 .10
eHO,8 = .20 Matrix Reasoning gNF .89
Matrix Reasoning
.08
eNF,8 = .20
eHO,9 = .30 Picture .84 gHO Picture

Arrangement .17 .83 eNF,9 = .31
Arrangement
WMHO,specific .91
eHO,10 = .34 .70
Digit Span .81 .52 eNF,10 = .24
Digit Span
.81 1
eHO,11 = .20 Letter-Number .90 Letter-Number .37 eNF,11 = .21
Sequencing WMHO .76 Sequencing WMNF,specific
.91 .16
eHO,12 = .39 eNF,12 = .40
Arithmetic .78 Arithmetic
.17 .80
PSHO,specific
.80
eHO,13 = .34 Digit-Symbol .89 Digit-Symbol .38 1 eNF,13 = .21
Coding Coding
.88 PSHO .38 PSNF,specific
eHO,14 = .22 eNF,14 = .22
Symbol Search Symbol Search
View publication stats

72 Inpress ATutorialon Hierarchically Structured Constructs

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

72 Inpress ATutorialon Hierarchically Structured Constructs

Uploaded by

Copyright:

Available Formats

See

A Tutorial on Hierarchically Structured

Article in Journal of Personality · August 2012

Martin Brunner Gabriel Nagy

SEE PROFILE SEE PROFILE

Meta-heuristics in psychological assessment View project

Prosocial behavior View project

The user has requested enhancement of the downloaded file.

A Tutorial on Hierarchically Structured Constructs

Many psychological constructs are conceived to be hierarchically structured and thus to

is most appropriate for which research and diagnostic purposes.

Keywords: latent constructs, confirmatory factor analysis, reliability, hierarchical factor

A Tutorial on Hierarchically Structured Constructs

Many psychological constructs are conceptualized to be hierarchically structured and

thus to operate at various levels of generality (Emmons, 1995). Hierarchically structured

neuroticism (reversed), agreeableness, and conscientiousness were subordinate to stability;

extraversion and openness were subordinate to plasticity.

interpreted or used‖ (American Educational Research Association, American Psychological

Association, & National Council on Measurement in Education, 1999, p. 20). The

1993; Slaney & Maraun, 2008).

individual differences researchers in their choice of a measurement model for hierarchically

interpretation of general and specific constructs as conceptualized in four different kinds of

of score reliability. We capitalize on recent psychometric advances (McDonald, 1999;

how constructs relate to other variables (e.g., psychological constructs, sociodemographic

characteristics, or life outcomes). We illustrate our arguments by reference to the Wechsler

be downloaded from our website.1

Confirmatory Factor Analytic Models for Hierarchically Structured Constructs

areas (Colom et al., 2002).

***** Please insert Table 1 about here *****

In this article, we investigate theories of the structure of intelligence that focus on

correlated across domains. In his theory of intelligence, Spearman explained these

the WAIS–III (depicted as rectangles) are caused by individual differences in a single

subtests of the WAIS–III is represented by a single-headed arrow. The one-factor model

WAIS–III. Hence, gOF is a very general construct as it influences all subtests.

************** Please insert Figure 1 about here **************

In Spearman‘s theory of intelligence, each ability measure is also influenced by a

mutually independently; these factors are therefore specified to be mutually uncorrelated.

***** Please insert Table 2 about here *****

First-Order Factor Model

specific to either a content domain (i.e., Verbal Comprehension VCFO) or a cognitive

therefore represented as a single factor eFO that affects each subtest.

restrictions on these correlations.

statistic—a so-called chi-bar-square (  2 ) statistic—that is appropriate when model

corresponding  2 distribution was  2 = 8.82. The difference in 2 goodness-of-fit test values

model is clearly preferable to the one-factor model.

Higher-Order Factor Model

model (or a nested-factor model) seems to be a natural statistical representation of theories

such as Carroll‘s (1993) three-stratum theory or McGrew‘s (2009) Cattell-Horn-Carroll

theory, which conceive intelligence to be a multifaceted, hierarchically structured construct.

the constructs Verbal Comprehension VCHO, Perceptual Organization POHO, Working

with higher scores on all first-order factors.

to gHO and to VCHO,specific).

subtest scores can be estimated by applying a mathematical transformation to the higher-order

factor model—namely the Schmid-Leiman transformation. This transformation has been

recommended as an ―elegant method‖ (Thompson, 2004, p. 74) for enhancing the

Loehlin, 2004). Specifically, the Schmid-Leiman transformation of a higher-order factor

transformation is a mathematical transformation of the model parameters obtained for the

(Yung, Thissen, & McLeod, 1999).

structured, with abilities operating at various levels of generality.

Further, as expected, each subtest of the WAIS–III is substantively influenced by the

abilities share a considerable amount of common variance.

variance of VCHO,specific is .26; hence, the corresponding standard deviation is .51).

***** Please insert Table 3 about here *****

* Please insert Table 1 about here *

Please insert Figure 1 about here

* Please insert Table 2 about here *

* Please insert Table 3 about here *

Please insert Table 4 about here

Please insert Table 5 about here