
Running Head: HIERARCHICALLY STRUCTURED CONSTRUCTS 1

A Tutorial on Hierarchically Structured Constructs

Martin Brunner
University of Luxembourg
Luxembourg

Gabriel Nagy
University of Tuebingen
Germany

Oliver Wilhelm
University of Ulm
Germany

Brunner, M., Nagy, G., & Wilhelm, O. (in press). A tutorial on hierarchically structured constructs. Journal of

Personality.

Contact: martin.brunner@uni.lu

Abstract

Many psychological constructs are conceived to be hierarchically structured and thus to

operate at various levels of generality. Alternative confirmatory factor analytic (CFA) models

can be used to study various aspects of this proposition: (a) the one-factor model focuses on

the top of the hierarchy and contains only a general construct, (b) the first-order factor model

focuses on the intermediate level of the hierarchy and contains only specific constructs, and

both (c) the higher-order factor model and (d) the nested-factor model consider the hierarchy

in its entirety and contain both general and specific constructs (e.g., bifactor model). This

tutorial considers these CFA models in depth, addressing their psychometric properties,

interpretation of general and specific constructs, and implications for model-based score

reliabilities. The authors illustrate their arguments with normative data obtained for the

Wechsler Adult Intelligence Scale and conclude with recommendations on which CFA model

is most appropriate for which research and diagnostic purposes.

Keywords: latent constructs, confirmatory factor analysis, reliability, hierarchical factor

models

A Tutorial on Hierarchically Structured Constructs

Many psychological constructs are conceptualized to be hierarchically structured and

thus to operate at various levels of generality (Emmons, 1995). Hierarchically structured

constructs include traditional personality traits (DeYoung, 2006), self-concept (Marsh &

Craven, 2006; Shavelson, Hubner, & Stanton, 1976), disorders like depression (Steer, Ball,

Ranieri, & Beck, 1999; Tanaka & Huba, 1984), subjective wellbeing (Chen, West, & Sousa,

2006; Gallagher, Lopez, & Preacher, 2009), and intelligence (Carroll, 1993; McGrew, 2009).

For example, the study by DeYoung (2006) supported a hierarchical structure of personality

with two general personality constructs—stability and plasticity—at the top of the hierarchy

and the more specific Big Five personality dimensions at the next level in the hierarchy:

neuroticism (reversed), agreeableness, and conscientiousness were subordinate to stability;

extraversion and openness were subordinate to plasticity.

Crucially, neither general nor specific constructs are directly observable entities, but

rather (unobserved) latent variables that are reflected in observable scores on corresponding

measures. One key task for all personality and individual difference researchers is therefore to

choose from a variety of measurement models that link general and/or narrower, specific

constructs with observable measures in different ways. It is also this decision that provides a

statistical rationale for the computation of scale scores that reflect respondents' levels on

general and/or specific constructs. Such a rationale is required, for example, by the Standards

for Educational and Psychological Testing, which state that "where composite scores are developed, the basis and rationale for arriving at the composites should be given" and that "the rationale and supporting evidence must pertain directly to the specific score […] to be interpreted or used" (American Educational Research Association, American Psychological

Association, & National Council on Measurement in Education, 1999, p. 20). The

measurement model is also the crucial prerequisite for assessing score reliability (Cortina,

1993; Slaney & Maraun, 2008).



Taken together, measurement models are of critical importance for both research and

applied assessment. The major goal of this tutorial is therefore to guide personality and

individual differences researchers in their choice of a measurement model for hierarchically

structured constructs. To this end, we consider the psychometric properties and the

interpretation of general and specific constructs as conceptualized in four different kinds of

confirmatory factor analytic (CFA) models. Further, we elaborate on the implications of these

CFA models for the computation and interpretation of both scores and model-based estimates

of score reliability. We capitalize on recent psychometric advances (McDonald, 1999;

Zinbarg, Yovel, Revelle, & McDonald, 2006) to demonstrate how to compute the reliability

of scores that are intended to assess constructs at various levels of the hierarchy. We

synthesize our key points in the General Discussion, offering recommendations on which

CFA model is most appropriate for which questions in personality research. In so doing, we

note the potential inherent restrictions of specific CFA models when they are used to examine

how constructs relate to other variables (e.g., psychological constructs, sociodemographic

characteristics, or life outcomes). We illustrate our arguments by reference to the Wechsler

Adult Intelligence Scale–Third Edition (WAIS–III, Wechsler, 1997)—a widely used measure

of intelligence. Software code that can be used to examine the WAIS–III by means of the four

CFA models discussed and to compute model-based reliabilities of scores (Cheung, 2009) can

be downloaded from our website.1

Confirmatory Factor Analytic Models for Hierarchically Structured Constructs

Psychological theories can be divided into two components: one component that

specifies how theoretical constructs are related to corresponding measures and one component

that defines the mutual relationships of the theoretical constructs (Edwards & Bagozzi, 2000).

CFA models (and structural equation models in general) are useful statistical tools for

empirically examining both components. In this section, we discuss four popular CFA models

that may be applied in many areas of personality and individual differences research to study

hierarchically structured constructs. Here, we use these models to test alternative theories of

the structure of intelligence. More specifically, (a) the one-factor model, based on Spearman's

work (1904), contains only a general construct representing general cognitive ability (g). (b)

The first-order factor model reflects important ideas of the theory of fluid and crystallized

intelligence (Horn & Noll, 1997) and focuses on ability constructs that are narrower in scope

and specific to cognitive operations (e.g., processing speed) or content domains (e.g., verbal

comprehension). (c) The higher-order factor model and (d) the nested-factor model are

informed by current theories of the structure of intelligence (Carroll, 1993; McGrew, 2009).

These theories conceive general cognitive ability to be the most general construct at the apex

of the hierarchy; specific abilities that are narrower in scope are located at lower levels of the

ability hierarchy.

To study these alternative theories, we use the correlations among the subtests of the

WAIS–III (Table 1) as obtained for the Spanish standardization sample (Colom, Abad,

Garcia, & Juan-Espinosa, 2002, Table A3).2 This sample comprises data from 1,369 persons

(703 women and 666 men; age range: 15 to 94 years). It is representative of the Spanish

population in terms of educational level, geographical region, and residence in urban and rural

areas (Colom et al., 2002).

***** Please insert Table 1 about here *****

In this article, we investigate theories of the structure of intelligence that focus on

general cognitive ability and on specific abilities that are relatively broad in scope. We

therefore use subtest scores, which serve as adequate manifest measures for representing such

broad constructs (Bagozzi & Edwards, 1998). It is important to note that the CFA models

presented in this article can also be applied on the basis of item scores (e.g., rating-scale items

with response categories ranging from "strongly disagree" to "strongly agree"), as is done in

many areas of personality and individual difference research. We elaborate on the use of item

scores when discussing the statistical requirements of the model-based estimation of score

reliabilities.

One-Factor Model

Charles Spearman (1904) found that measures of cognitive abilities are positively

correlated across domains. In his theory of intelligence, Spearman explained these

intercorrelations by the operation of a single factor gOF representing general cognitive ability.

These ideas are reflected in the one-factor model (OF, see Figure 1a), which focuses on a

general ability construct. (To distinguish the constructs specified in a particular model from

related constructs in other models, we use a subscript to index all corresponding factors; here,

OF). Specifically, the one-factor model predicts that individual differences in all subtests of

the WAIS–III (depicted as rectangles) are caused by individual differences in a single

common latent factor (depicted as an ellipse) that represents gOF. The influence of gOF on the

subtests of the WAIS–III is represented by a single-headed arrow. The one-factor model

implies that higher scores on gOF are associated with higher scores on all 14 subtests of the

WAIS–III. Hence, gOF is a very general construct as it influences all subtests.

************** Please insert Figure 1 about here **************

In Spearman's theory of intelligence, each ability measure is also influenced by a

second factor orthogonal to gOF (which is why he called it the two-factor theory of

intelligence). This second factor represents some specific ability that is required to complete a

certain subtest. Further, each subtest score may also be affected to some degree by random

measurement error. Both of these latter influences (i.e., reliable but subtest-specific variance

and unreliable error variance) are represented by a single factor eOF for each subtest. These

subtest-specific factors (i.e., eOF,1 to eOF,14) are depicted by the single arrows pointing to the

individual subtests in Figure 1a. Note that the factor eOF is specific to each subtest for two

reasons: (a) Unless two measures share measure-specific variance (e.g., when the same

subtest is applied at two successive points of measurement or when two self-report items have

similar wordings), it is not possible to disentangle the variance of a particular subtest that is

attributable to random measurement error from that attributable to specific variance. (b)

Measurement error is uncorrelated across subtests because its influence on subtests is random

(i.e., unpredictable). Thus, the factors eOF,1 to eOF,14 as well as gOF are assumed to operate

mutually independently; these factors are therefore specified to be mutually uncorrelated.

How well does the one-factor model fit the data from the Spanish standardization

sample of the WAIS–III? The overall model fit of the one-factor model is modest (see Table

2), suggesting that a single general construct does not adequately explain the associations

among the subtests. We therefore do not interpret the model parameters any further.

***** Please insert Table 2 about here *****
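All four models imply a particular covariance structure for the subtests, and assessing model fit amounts to comparing that implied matrix with the observed correlations in Table 1. The following Python sketch illustrates this logic for the one-factor case; the loadings are hypothetical round values, not the WAIS–III estimates:

```python
import numpy as np

# Any of the CFA models discussed here implies the covariance structure
# Sigma = Lambda * Phi * Lambda' + Theta, where Lambda holds the factor
# loadings, Phi the factor (co)variances, and Theta the diagonal matrix of
# subtest-specific variances. Values below are hypothetical, for illustration.
Lambda = np.array([[0.80], [0.70], [0.75]])   # 3 subtests, one factor (g_OF)
Phi = np.array([[1.00]])                      # factor variance fixed to 1
Theta = np.diag([0.36, 0.51, 0.4375])         # subtest-specific variances

Sigma = Lambda @ Phi @ Lambda.T + Theta       # model-implied correlation matrix
print(np.round(Sigma, 3))

# For the first-order factor model, Lambda would have one column per specific
# factor and Phi would contain the (freely estimated) factor correlations.
```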

First-Order Factor Model

The one-factor model focuses on a very general ability construct (i.e., gOF) and does

not address abilities that are narrower in scope (and that may be located at intermediate levels

of the hierarchy). In the first-order factor model (FO), in contrast, each measure is assumed to

be influenced by a single first-order factor that influences a subset of the WAIS–III subtests

(see Figure 1b). Note that this conception of cognitive abilities, and the absence of a factor

representing general cognitive ability, is informed by a more recent version of the theory of

fluid and crystallized abilities (Horn & Noll, 1997). A meaningful first-order factor model for

the WAIS–III (e.g., Tulsky & Ledbetter, 2000) therefore conceives the 14 subtests to be

influenced by four mutually correlated first-order factors (the correlations are depicted in

Figure 1b as double-headed arrows). These first-order factors represent constructs that are

specific to either a content domain (i.e., Verbal Comprehension VCFO) or a cognitive

operation (i.e., Perceptual Organization POFO, Working Memory WMFO, Processing Speed

PSFO). For example, the subtests Information, Vocabulary, Similarities, and Comprehension

are all assumed to be affected by a single factor that represents the operation of the (domain-

specific) construct VCFO. Higher scores on VCFO are thus associated with higher scores on

these four verbal subtests. Further, each subtest is influenced by subtest-specific factors (i.e.,

eFO,1 to eFO,14) that represent an ability specifically required for a certain subtest as well as

measurement error. The latter two sources of variance cannot be separated (see above) and are

therefore represented as a single factor eFO that affects each subtest.

An important assumption of the first-order factor model is that the first-order factors

may be correlated. However, the first-order factor model typically does not specify a priori

the direction of the mutual associations of VCFO, POFO, WMFO, and PSFO by placing

restrictions on these correlations.

How well does the first-order factor model fit the data? The overall model fit of the

first-order factor model is good (see Table 2), indicating that four correlated first-order

constructs provide a reasonable explanation of the associations among the subtests of the

WAIS–III. Further, it is possible to statistically test whether four correlated first-order ability

constructs are better able to explain the data than is a single common factor representing

general cognitive ability gOF (Rindskopf & Rose, 1988)—the one-factor model is statistically

equivalent to a first-order factor model in which all factor correlations are fixed to r = 1.

However, when boundary values (here: r = 1) are involved, the difference between the χ² goodness-of-fit test values as obtained for the first-order factor model and the one-factor model does not follow a χ² distribution. Hence, we applied a multi-step procedure developed by Stoel and colleagues (Stoel, Garre, Dolan, & van den Wittenboer, 2006) to compute a test statistic—a so-called chi-bar-square (χ̄²) statistic—that is appropriate when model parameters are fixed to their boundary values (the online supplement describes this procedure and contains the corresponding software code). The critical value at α = .05 of the corresponding χ̄² distribution was 8.82. The difference in χ² goodness-of-fit test values was Δχ² = 1,408 and thus considerably larger than the critical value of the χ̄² distribution.

Thus, if the two models fitted the data equally well in the population, such a difference in

model fit would be very unlikely to emerge. These findings (along with the improvement seen

in the descriptive fit statistics CFI, RMSEA, and SRMR) indicate that the first-order factor

model is clearly preferable to the one-factor model.

Further, values of the standardized factor loadings of the subtests on first-order factors

(see Figure 1b) range from λ = .78 to λ = .90 (Mdn λ = .85), showing that each factor is well

defined and that the subtests are substantively influenced by the corresponding first-order

construct. Finally, the first-order constructs are strongly positively correlated with one

another: correlations between the latent constructs range from r = .74 (VCFO and PSFO) to r =

.90 (POFO and PSFO), suggesting that the first-order constructs share a considerable amount of

common variance.

Higher-Order Factor Model

The one-factor model focuses on general abilities and the first-order factor model on

specific abilities. Neither model simultaneously addresses general and specific abilities (that

are located at different levels of the ability hierarchy). In contrast, the higher-order factor

model (HO) and the nested-factor model (see next section) both consider the ability hierarchy

in its entirety. We start with the higher-order factor model, in which higher-order factors

reflect the operation of one or more higher-order constructs that explain the intercorrelations

(i.e., the common variance) among the lower-order constructs. Hence, a higher-order factor

model (or a nested-factor model) seems to be a natural statistical representation of theories

such as Carroll's (1993) three-stratum theory or McGrew's (2009) Cattell-Horn-Carroll

theory, which conceive intelligence to be a multifaceted, hierarchically structured construct.

Figure 1c shows the higher-order factor model for the WAIS–III subtests. As in the

first-order factor model, the subtests are influenced by four first-order factors (representing

the constructs Verbal Comprehension VCHO, Perceptual Organization POHO, Working

Memory WMHO, and Processing Speed PSHO). This model implies, for example, that higher

scores on VCHO are associated with higher scores on the four verbal subtests. Again, subtests

are also influenced by subtest-specific factors (i.e., eHO,1 to eHO,14) that can be interpreted in

the same way as in the first-order factor model. Subtest-specific factors are therefore mutually

independent and uncorrelated with first-order and higher-order factors. Hence, the part of the

model that links first-order constructs with subtests is structurally equivalent to the first-order

factor model.

In contrast to the first-order factor model, however, the shared variance of the first-

order factors is accounted for by a second-order factor gHO that represents the higher-order

construct general cognitive ability. Hence (if the higher-order factor model fits the data well),

gHO accounts for the correlations among the first-order factors (observed in the first-order

factor model) and thus explains the common variance of the first-order factors. This implies

that gHO influences all first-order constructs; higher scores on gHO are therefore associated

with higher scores on all first-order factors.

Consequently, there are two components to the variances of the first-order factors: one

component that is explained by gHO and one component that is independent of gHO (Edwards

& Bagozzi, 2000; Gorsuch, 1983). The latter component is represented in Figure 1c by

specific factors (e.g., specific Verbal Comprehension VCHO,specific) that point to the first-order

factors and that explain individual differences in the first-order factors over and above gHO. In

the present model (and in most applications of higher-order factor models), these specific

factors are uncorrelated with the higher-order factor gHO and among themselves. The total

variance of the first-order constructs therefore represents a blend of the variance attributable

to gHO and to specific factors (e.g., the variance of VCHO is a blend of the variance attributable

to gHO and to VCHO,specific).

Close inspection of the higher-order factor model reveals that the impact of gHO on

manifest subtest scores is mediated by the first-order constructs (Edwards & Bagozzi, 2000;

Schmid & Leiman, 1957): gHO (indirectly) influences all subtests of the WAIS–III and is

therefore clearly broader in scope than the first-order constructs, which influence only a

selection of subtests. The direct impact of the higher-order and specific factors on manifest

subtest scores can be estimated by applying a mathematical transformation to the higher-order

factor model—namely the Schmid-Leiman transformation. This transformation has been

recommended as an "elegant method" (Thompson, 2004, p. 74) for enhancing the

interpretability of higher-order and lower-order factors (see also Brown, 2006; Gorsuch, 1983;

Loehlin, 2004). Specifically, the Schmid-Leiman transformation of a higher-order factor

model yields uncorrelated (first-order) factors that represent both the (higher-order) general

and the specific ability constructs. The factor loadings of the manifest measures on these

factors (see below for details of the computation) reflect the incremental impact of general

and specific abilities on the corresponding measures. Note that the Schmid-Leiman

transformation is a mathematical transformation of the model parameters obtained for the

original higher-order factor model; the empirical fit of both models is therefore identical

(Yung, Thissen, & McLeod, 1999).

How well does the higher-order factor model fit the data? The overall model fit is

adequate (see Table 2). Note that the higher-order factor model is nested in the first-order

factor model (Rindskopf & Rose, 1988). Thus, it is possible to statistically test whether the

higher-order factor representing gHO is capable of fully accounting for the correlations

observed among the first-order constructs in the first-order factor model. χ² difference testing indicates that this is not completely the case, Δχ²(2, N = 1,369) = 55; p < .001. The next step is therefore (see also McDonald, 2010) to carefully inspect the residual correlations Δr that are computed as the difference between the model-implied correlations among the first-order constructs and the corresponding correlations in the first-order factor model. These residual correlations range between Δr = –.04 (for WMFO and VCFO) and Δr = .04 (for PSFO and VCFO). These discrepancies are not "troublingly large" (i.e., –.10 ≤ Δr ≤ .10; McDonald, 2010, p. 679). Thus, although χ² difference testing indicates that the higher-order factor gHO does not

fully capture the correlations found among the first-order constructs in the first-order factor

model, the small residual correlations in combination with the adequate overall fit provide

empirical support for the theoretical proposition that cognitive abilities are hierarchically

structured, with abilities operating at various levels of generality.

Further, as expected, each subtest of the WAIS–III is substantively influenced by the

corresponding first-order construct (range: λ = .78 to λ = .90; Mdn λ = .85), indicating that

these factors are well defined (see Figure 1c). Further, the first-order ability constructs are

strongly influenced by gHO: the standardized loadings on the higher-order factor are between

λ = .86 (VCHO on gHO) and λ = .97 (POHO on gHO; note that the upper bound of the

corresponding 95% confidence interval does not include 1.00), suggesting that the first-order

abilities share a considerable amount of common variance.

The Schmid-Leiman transformation can be used to estimate the direct impact of gHO

and specific ability constructs on the corresponding subtest scores (Table 3). Specifically, the

factor loadings of the manifest subtest scores on gHO (see Figure 1c) can be computed by

multiplying the factor loading of each subtest on the corresponding first-order factor by the

factor loading of this first-order factor on gHO. For example, the loading of the Information

subtest score on gHO is computed as .82 × .86 = .70. Further, the loadings of the subtests on a

specific factor can be computed by multiplying the factor loading of each subtest on the

corresponding first-order factor by the standard deviation of the corresponding specific factor.

For example, the loading of the Information subtest score on VCHO,specific is .82 × .51 = .42 (the

variance of VCHO,specific is .26; hence, the corresponding standard deviation is .51).

***** Please insert Table 3 about here *****
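To make this arithmetic concrete, the following Python sketch reproduces the two worked examples above from the parameter values reported in the text (it illustrates the transformation only and is not the authors' downloadable code; the small deviation from the reported .70 reflects rounding of the published parameters):

```python
import math

# Schmid-Leiman transformation for the Information subtest, using the values
# reported in the text: Information loads .82 on VC_HO, VC_HO loads .86 on
# g_HO, and the residual variance of VC_HO,specific is .26.
loading_info_on_vc = 0.82
loading_vc_on_g = 0.86
var_vc_specific = 0.26          # note: .86**2 + .26 is approximately 1

# Loading on g_HO: product of the loadings along the indirect path.
loading_on_g = loading_info_on_vc * loading_vc_on_g            # ~= .71 (text: .70)
# Loading on VC_HO,specific: first-order loading times the specific factor's SD.
loading_on_specific = loading_info_on_vc * math.sqrt(var_vc_specific)  # ~= .42

print(round(loading_on_g, 2), round(loading_on_specific, 2))
```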

Three results of the Schmid-Leiman transformation of the higher-order factor model

(Table 3) are noteworthy. First, the factor loadings of the subtests on gHO are large,

demonstrating that gHO exerts strong effects on all subtests: the median of the loadings on gHO

is  = .77 (range:  = .70 to  = .87). Second, each subtest has substantial loadings on specific

abilities (Mdn  = .35; range:  = .20 to  = .45), indicating that specific ability constructs

have an incremental impact on the corresponding subtest scores, over and above gHO. Third,

the factor loadings of the subtests on gHO are considerably larger than the factor loadings of

the subtests on the factors representing specific abilities: Hence, the subtest scores contain

substantially more variance attributable to gHO than to specific abilities.

Finally, close inspection of the Schmid-Leiman transformation reveals an intrinsic

psychometric property of the higher-order factor—the proportionality constraint. This

constraint affects the proportion of variance in the subtest scores explained by general and

specific ability constructs (Schmiedek & Li, 2004). Specifically, for a given set of subtests,

the ratios of variance attributable to the respective first-order ability to variance attributable to

gHO are constrained to be the same. For example, the standardized factor loadings on VCHO,specific are λ = .415 for the Information subtest and λ = .449 for the Vocabulary subtest. The standardized factor loadings of these subtests on gHO are λ = .703 and λ = .760, respectively. If the variance ratios are computed from the squared factor loadings, the ratios of the variance attributable to VCHO,specific to the variance attributable to gHO are the same for the two subtests: variance ratio for the Information subtest = .415² / .703² = .349; variance ratio for the Vocabulary subtest = .449² / .760² = .349. Crucially, the proportionality constraint

limits the value of the higher-order factor model in providing insights into the relationship

between general and specific abilities, on the one hand, and other psychological constructs,

sociodemographic characteristics, or life outcomes, on the other (see also General

Discussion).
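The constraint follows directly from the Schmid-Leiman algebra. If λi denotes the loading of subtest i on first-order factor j, γj the loading of that factor on gHO, and sj the standard deviation of the corresponding specific factor, the transformed loadings are λiγj (on gHO) and λisj (on the specific factor), so that

$$\frac{(\lambda_i s_j)^{2}}{(\lambda_i \gamma_j)^{2}} = \frac{s_j^{2}}{\gamma_j^{2}},$$

which does not depend on i. For Verbal Comprehension, s²/γ² = .26/.86² ≈ .35, matching the ratio of .349 computed above up to rounding.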

Nested-Factor Model

Another CFA model (that is not subject to the proportionality constraint) that

considers the ability hierarchy in its entirety is the nested-factor model (NF; Figure 1d). The

term "nested-factor model" was chosen because the factors representing specific constructs

are nested within the general factor representing the general construct (see also Gustafsson &

Balke, 1993). Other terms used to label this kind of model include "general-specific model" and "bifactor model" (Chen et al., 2006). As noted above, current theories of the structure of

intelligence conceive it to be a multifaceted, hierarchically structured construct. In his

influential three-stratum theory, Carroll (1993) defined general cognitive ability as the

broadest ability construct (located at the apex of the hierarchy) and narrower ability constructs

as specific to domains or cognitive operations (located at lower levels of the hierarchy). This

conception of general and specific abilities is reflected in the specification of the nested-factor

model: general cognitive ability is represented as a first-order factor gNF that directly

influences all subtests of the WAIS–III. Hence, as for the one-factor model, higher scores on

gNF are associated with higher scores on all 14 subtests. Further, the nested-factor model

incorporates the multifaceted view of intelligence and the idea that abilities differ in their

breadth, with related sets of subtests being affected by a first-order factor that represents a

corresponding narrower, specific ability construct (e.g., specific Verbal Comprehension

VCNF,specific). For example, the first-order factor representing the (domain-specific) construct

VCNF,specific affects the subtests Information, Vocabulary, Similarities, and Comprehension

over and above gNF. Higher scores on VCNF,specific are therefore associated with higher scores

on these four subtests. Further, subtests are additionally influenced by subtest-specific factors

(i.e., eNF,1 to eNF,14). Crucially, general cognitive ability gNF, specific abilities, and subtest-

specific factors (i.e., eNF,1 to eNF,14) are assumed to be mutually independent and are therefore

specified to be mutually uncorrelated.

Finally, it is important to note that subtest loadings on factors that represent general or

specific abilities are freely estimated in the nested-factor model (vs. constrained in the higher-

order factor model). In contrast to the higher-order factor model, the nested-factor model does

not impose the proportionality constraint on variance ratios of general and specific abilities in

subtest scores. Hence, the nested-factor model can be seen as a generalization of the higher-

order model (Chen et al., 2006; Yung et al., 1999).



How well does the nested-factor model fit the data? The overall model fit of the

nested-factor model is good (see Table 2)—the best of the four models under

investigation. Note that the higher-order factor model can be tested against the nested-factor

model (Yung et al., 1999). Thus, it is possible to test whether the higher-order factor model

with its proportionality constraints fits as well as the nested-factor model with freely

estimated factor loadings. 2 difference testing indicates that this is not the case, 2(9, N =

1,369) = 194; p < .001. The descriptive fit statistics also show some improvement in the fit of

the nested-factor model relative to that of the higher-order factor model.

Further, the one-factor model can be tested against the nested-factor model (Rindskopf

& Rose, 1988). This test helps to decide whether the factors representing domain-specific

abilities and general cognitive ability gNF in the nested-factor model are better able to explain

the correlations among subtest scores than a single common factor representing gOF. The

difference in the 2 values is large, at 2(13, N = 1,369) = 1,547, with a probability value of

p < .001. This result (along with the improvement in the descriptive fit statistics CFI,

RMSEA, and SRMR) supports the assumption of the nested-factor model that specific

abilities account for a substantial amount of common variance among subtest scores, over and

above gNF.
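For nested models such as these, the χ² difference test itself takes only a few lines of code. The Python sketch below (an illustration, not the authors' supplementary code) computes the p-value for the comparison of the higher-order factor model against the nested-factor model from the Δχ² and Δdf reported above; it is not valid at boundary values, where the chi-bar-square procedure discussed earlier is required:

```python
from scipy.stats import chi2

# Chi-square difference test for nested CFA models: refer the difference in
# chi-square values to a chi-square distribution with df equal to the
# difference in degrees of freedom.
delta_chi2 = 194    # higher-order vs. nested-factor model (reported above)
delta_df = 9

p_value = chi2.sf(delta_chi2, delta_df)   # survival function: P(X > delta_chi2)
print(f"p = {p_value:.2e}")               # p < .001, so proportionality is rejected
```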

Three further findings obtained for the nested-factor model are noteworthy (see Figure

1d). First, the factor loadings of the subtests on gNF are large, demonstrating that gNF has

strong effects on all subtests: the median of the loadings on gNF is λ = .77 (range: λ = .66 to λ = .89). Second, each subtest has substantial (and statistically significant) loadings on specific abilities. The range of loadings is from λ = .08 (Picture Arrangement on PONF,specific) to λ = .52 (Digit Span on WMNF,specific), with a median loading of Mdn λ = .37. Hence, both specific

abilities and gNF affect subtests. However, the influence of specific abilities on subtests is

clearly not equally strong for all subtests. Third, the factor loadings of the subtests on gNF are

considerably larger than those on the factors representing specific abilities: Hence the subtest

scores contain substantially more variance attributable to gNF than to specific abilities.

Taken together, these results indicate (a) that the nested-factor model captures the

correlations among the subtest scores of the WAIS–III reasonably well, (b) that the

assumption of proportionality constraints as imposed in the higher-order factor model may not

hold, and (c) that specific abilities account for a substantial amount of the common variance

among subtest scores over and above gNF. In sum, these empirical results support the

theoretical proposition that cognitive abilities are hierarchically structured and differ in their

generality.

Cross-Model Comparison of General and Specific Constructs

The four CFA models presented above either focus exclusively on general cognitive

ability (i.e., OF) or specific abilities (i.e., FO) or consider the ability hierarchy in its entirety,

containing both general and specific constructs (i.e., HO and NF). At first glance, it may

appear that general and specific ability constructs as specified in these four models can be

interpreted interchangeably. However, as we explain in the following, this is generally not the

case.

We start with specific abilities, which are included in the first-order factor model, the

higher-order factor model, and the nested-factor model. Crucially, the substantive

interpretations of these specific ability constructs vary to different extents. In the first-order

factor model, the first-order constructs (i.e., VCFO, POFO, WMFO, and PSFO) affect the

corresponding subtests. However, no higher-order construct is specified to explain the

intercorrelations of the first-order constructs. In the higher-order factor model, in contrast, the

first-order constructs (i.e., VCHO, POHO, WMHO, and PSHO) affect the subtests as in the first-

order factor model, but are in turn influenced by two independently operating factors, namely

gHO and VCHO,specific, POHO,specific, WMHO,specific, or PSHO,specific, respectively. Hence, in contrast

to the first-order model, the higher-order factor model includes a higher-order construct that

accounts for the interrelations among the first-order constructs: the higher-order factor gHO.

Consequently, the first-order constructs in the higher-order factor model contain variance

attributable to gHO (and variance attributable to a specific factor), whereas the variance of

ability constructs in the first-order factor model is not separated into components that are

attributable to lower-order and higher-order constructs.

In the nested-factor model, each subtest is directly affected by general cognitive ability

gNF and a specific ability construct (i.e., VCNF,specific, PONF,specific, WMNF,specific, or PSNF,specific).

Hence, in contrast to specific abilities in the first-order factor model (i.e., VCFO, POFO, WMFO,

and PSFO), specific abilities as conceptualized in the nested-factor model are abilities that

explain variance in the subtest scores over and above gNF. Thus, specific ability constructs in

the nested-factor model account for variance in subtest scores, account taken of the impact of

gNF on subtest scores, whereas the ability constructs in the first-order factor model explain

total variance in subtest scores. To sum up, it is only if there is no general construct operating

that corresponding specific constructs as specified in the first-order factor model (e.g., VCFO),

the higher-order factor model (e.g., VCHO,specific), and the nested-factor model (e.g.,

VCNF,specific) are identical. Thus, the stronger the empirical impact of the general construct on

manifest measures is, the more distinct the specific constructs in the first-order factor model

become from the corresponding specific constructs in the higher-order model and the nested-

factor model, respectively.

The specific ability constructs in both the higher-order factor model (e.g., VCHO,specific)

and the nested-factor model (e.g., VCNF,specific) are specified to operate mutually independently

as well as independently of general cognitive ability (in terms of gHO or gNF). A refined

substantive interpretation of these specific ability constructs is challenging, however (see

Brunner, 2008, p. 162). Ideally, this interpretation would rest on cognitive measures that tap

only the specific ability construct but not general cognitive ability. In the realm of ability

research, such measures have not yet been found (Brunner, 2008). However, problems with

the labelling of specific constructs do not mean that specific constructs (as represented by

specific factors) do not have substantive meaning (see also the General Discussion).

Importantly, as pointed out by two anonymous reviewers, in other areas of psychological

research (e.g., clinical psychology), more readily identifiable labels for specific factors are

available. For example, in the tripartite model by Clark and Watson (1991), the variance in

anxiety and depression (which are both represented as first-order factors) can be decomposed

into three parts: variance attributable to (a) general negative affect (represented as higher-

order factor) that influences both anxiety and depression, (b) specific physiological

hyperarousal that affects only anxiety, and (c) a specific factor representing low positive

affect ("anhedonia") that affects only depression. In this example, the specific factors

represent symptoms that are independent of general negative affect and specific to anxiety and

depression, respectively.

Although specific constructs in the higher-order factor model and the nested-factor

model may have the same substantive interpretations, there are subtle differences between

them. Specifically, the impact of specific ability constructs (e.g., VCHO,specific) on subtest

scores as specified in the higher-order factor model is subject to the proportionality constraint,

whereas the impact of specific abilities (e.g., VCNF,specific) in the nested-factor model is not.

Hence, given the operation of a general construct, it is only when the proportionality

constraint holds that the corresponding specific constructs in the higher-order factor model

and the nested-factor model are mathematically identical. The more the empirical

relationships deviate from the proportionality constraint, the more distinct corresponding

specific constructs as specified in the higher-order and the nested-factor model become from

each other and the more distinct their substantive interpretations may become.

We now turn to the factor representing general cognitive ability, which is included in

the one-factor model, the higher-order factor model, and the nested-factor model. Except for

rare cases, these general ability constructs are not identical. In the one-factor model, gOF

(along with the subtest-specific factors eOF) is the only influence on subtest scores. This is the

major difference to the higher-order factor model and the nested-factor model: in the higher-

order factor model, gHO (indirectly) influences subtest scores independently of the specific

constructs (e.g., VCHO,specific); in the nested-factor model, gNF (directly) influences subtest

scores independently of the specific constructs (e.g., VCNF,specific). Hence, gOF explains total

variance in the subtest scores, whereas both gHO and gNF explain variance in the subtest scores

while controlling for specific constructs. Consequently, it is only if no specific constructs

influence manifest measures (i.e., if the corresponding factor variances are zero) that the three

general factors are identical. In other words, the stronger the empirical impact of specific

constructs on manifest measures is, the more distinct gOF, on the one hand, becomes from gHO

and gNF, on the other. Moreover, the impact of general cognitive ability gHO on subtest scores

as specified in the higher-order factor model underlies the proportionality constraint, whereas

the impact of gNF in the nested-factor model does not. Given that specific constructs influence

manifest measures, it is thus only if the proportionality constraint holds that gHO and gNF are

identical. In other words, the more the proportions of variance in the subtest scores that are

attributable to general and specific constructs deviate from the proportionality constraint, the

more distinct gHO becomes from gNF.

In sum, the interpretations of general and specific ability constructs as conceptualized

in the different CFA models are generally not (completely) interchangeable. Thus, the choice

of a CFA model that links cognitive measures to general and/or specific ability constructs

implies certain constraints on the interpretation of these constructs. Because the four CFA

models discussed are partly nested within each other (i.e., one model is a restricted version of

another), cross-model comparison by means of model fit indices and χ² difference tests can be

used to guide the choice of model.

Model-Based Reliabilities of Measures Assessing Psychological Constructs



Cognitive abilities are not directly observable entities, but latent variables. To assess

an individual's cognitive abilities, we have to estimate his or her level on the respective latent

variable. In most applied psychological research, several manifest scale indicators are

summed using unit weights (i.e., each scale indicator has the same weight in the computation

of the sum score) to form a manifest scale score. This scale score gives an estimate of the

person's level on the latent general or specific ability construct (Grice, 2001). For example, a scale score reflecting a person's level of general cognitive ability can be computed by using

unit weights to sum up his or her scores on all 14 subtests of the WAIS–III.3 But how reliable

is this scale score?

Classical Test Theory and Reliability

To answer this question, we first show how reliability can be mathematically defined.

As most readers are familiar with the fundamental ideas of classical test theory (CTT), we

start by considering how CTT defines reliability. Within the framework of CTT, a person's

observed score is partitioned into one component that reflects his or her true score and one

component that is independent of the true score and reflects measurement error (Lord &

Novick, 1968, p. 29). The observed score variance is thus composed of variance attributable

to true scores (true score variance) and variance attributable to measurement error (error

variance). Score reliability in the context of CTT is thus mathematically defined in terms of

the proportion of true score variance to observed score variance (Lord & Novick, 1968, p. 61).

Reliability may range between 0 (no reliability) and 1 (perfect reliability).

Model-Based Score Reliability in the Context of CFA Models

The mathematical definition of reliability in the context of CTT has two conceptually

overlapping meanings. Reliability (a) assesses the consistency of measurement (across time or

across instruments) and (b) is an index of measurement precision (Lord & Novick, 1968;

McDonald, 1999; Mellenbergh, 1996). In this article, we draw on the conceptual definition of

reliability as an index of measurement precision. Specifically, we focus on model-based



estimates of score reliability by means of CFA models.4 As we show below, for the one-factor

model and the first-order factor model, the total amount of reliable variance provides an

estimate of how precisely a certain scale score assesses a certain target construct. For the

higher-order factor model and the nested-factor model, however, different model-based

reliability indices can (a) estimate the total amount of reliable variance in a scale score or (b)

indicate how precisely a certain scale score measures a certain target construct. We therefore

discuss the computation and interpretation of model-based score reliabilities separately for

each of the four CFA models presented above.

In this respect, it is important to highlight that all model-based estimates of score

reliability (as well as all reliability estimates based on CTT) are population dependent

(Mellenbergh, 1996). Thus, score reliability depends on how heterogeneous the sample is on

the target construct(s). Moreover, psychometricians differ in their interpretations of the

relationship between the true score concept of CTT and construct scores as defined in CFA

models (Bollen, 1989, p. 219; Borsboom & Mellenbergh, 2002). For example, Borsboom and

Mellenbergh (2002) point to fundamental conceptual differences between true scores and

construct scores. In contrast, proponents of stochastic measurement theory (e.g., Steyer, 1989)

integrate CTT into CFA models by incorporating statistical assumptions on the relationship

between true scores and construct scores (see also Bollen, 1989, pp. 218-222). Here, we take a

model-based approach to score reliability in the context of CFA models, as perhaps most

clearly elaborated in Bollen (1989) and McDonald (1999). For didactic reasons, we point to

some general similarities between CTT and the model-based approach. A thorough discussion

of the (sometimes subtle) conceptual and statistical differences between these psychometric

models is beyond the scope of this article.

One-Factor Model

We first analyze how well the scale score representing general cognitive ability

assesses the latent construct general cognitive ability in terms of gOF (Figure 1a). In the one-

factor model, the variance of the latent factor representing gOF can be interpreted as the

reliable ("construct score") variance of the score representing general cognitive ability.

Further, gOF and subtest-specific factors are specified to be unrelated, reflecting the idea that

construct score and error score are mutually independent. As noted above, the subtest-specific

factors (i.e., eOF,1 to eOF,14) may comprise both reliable subtest-specific variance and

unreliable variance attributable to random measurement error. It is debatable whether subtest-

specific variance that is not shared with the target construct (e.g., gOF) should be seen as part

of a measure‘s reliable variance (Bollen, 1989, pp. 219-221). Given that variance attributable

to measurement error and reliable subtest-specific variance are typically not separated in

applications of CFA, we do not consider the latter to be part of the reliable variance, and we

do not take it into account when computing scale score reliability (Bollen, 1989, pp. 220-221).

Hence, the model-based reliability estimates that we discuss in the present article may be

interpreted as lower-bound estimates of a scale score's total amount of reliable variance.

Taken together, in the case of a one-factor model, the model-based reliability of a

scale score may be defined as the proportion of variance accounted for by one latent target

construct (e.g., gOF) relative to observed score variance. In line with McDonald (1999) and

Zinbarg and colleagues (Zinbarg, Revelle, Yovel, & Li, 2005; Zinbarg et al., 2006), we refer

to this reliability coefficient as omega (ω). More formally, these ideas can be expressed as

follows. When unit weights are used, a scale score Y is computed by summing up p manifest

scale indicators Yi: Y = Y1 + Y2 + ... + Yp. When standardized model parameters are used, ω for the scale score Y is computed as follows:

$$\omega = \frac{\left(\sum_{i=1}^{p} \lambda_{ij}\right)^{2}}{\left(\sum_{i=1}^{p} \lambda_{ij}\right)^{2} + \sum_{i=1}^{p} \theta_{e_{i}}}. \qquad (1)$$

Here, λij is the standardized factor loading of manifest measure Yi on factor j, and θei is the standardized variance of the subtest-specific factor affecting the manifest variable Yi. The

numerator in Equation 1 represents the amount of score variance in the scale score Y that can

be attributed to the variance of the factor representing the target construct. The denominator

represents the total variance of the scale score, which comprises (a) the score variance

accounted for by the target construct and (b) the variances attributable to the subtest-specific

factors of the scale indicators. Values of omega can range from 0 (no reliability) to 1 (perfect

reliability). In other words, when a one-factor model is applied, a value of ω = 1 indicates that

the sum score Y measures the target construct with perfect accuracy; the more omega departs

from 1, the lower the precision with which Y measures the latent target construct.

Table 4 presents the computation used to derive omega for the scale score General

Cognitive Ability in terms of gOF (Figure 1a).

****** Please insert Table 4 about here ******

When the model parameters obtained for the one-factor model are used (Figure 1a),

omega of the General Cognitive Ability score is computed as the ratio of the variance

attributable to gOF to the total variance of this score. The total variance of the General

Cognitive Ability score is the sum of the variances that can be attributed to (a) gOF and (b)

subtest-specific factors (i.e., the sum of the variances of eOF,1 to eOF,14). The value of ω = .96

represents the reliability of the General Cognitive Ability score to measure gOF. In other

words, 96% of the variance in this scale score is accounted for by gOF. Table 5 reports the

composition of scale score variance in terms of variance attributable to gOF and subtest-

specific factors (eOF,1 to eOF,14). Note that the omega value obtained for the one-factor model

should be interpreted with caution, as the fit of this model to the data was modest (see also the

discussion below).

****** Please insert Table 5 about here ******
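A minimal Python implementation of Equation 1 follows (illustrative only; in practice the loadings argument would be the 14 standardized loadings on gOF from Figure 1a, which are not reproduced here):

```python
import numpy as np

def omega(loadings):
    """Omega (Equation 1) for a unit-weighted sum of p indicators of one factor.

    With standardized indicators, each subtest-specific variance is
    theta_i = 1 - lambda_i**2.
    """
    loadings = np.asarray(loadings, dtype=float)
    common = loadings.sum() ** 2              # variance due to the target construct
    specific = (1.0 - loadings ** 2).sum()    # sum of subtest-specific variances
    return common / (common + specific)

# Hypothetical loadings (not the WAIS-III estimates), just to show usage:
print(round(omega([0.80, 0.75, 0.78, 0.82]), 3))
```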

First-Order Factor Model



How can the reliabilities of scale scores be computed in the context of a first-order

factor model? Because each measure is assumed to reflect one latent target construct only, the

scale score's reliability can be computed in the same way as for the one-factor model, using

omega (see Equation 1). For example, the scale score assessing the target construct Verbal

Comprehension in terms of VCFO is computed as the unit-weighted sum of the subtest scores

Information, Vocabulary, Similarities, and Comprehension. Reliability in terms of how well

this scale score assesses VCFO is then computed as the ratio of variance attributable to VCFO

(i.e., the squared sum of the corresponding standardized factor loadings; see Figure 1b) to the

total variance of this scale score (i.e., the squared sum of the corresponding standardized

factor loadings plus the sum of the corresponding subtest-specific variances). Table 4 shows

the necessary computation. The value of ω = .91 indicates that 91% of the variance in the

scale score reflecting Verbal Comprehension is attributable to VCFO. Table 5 reports the

omega values and the variance composition for the other scale scores measuring specific

abilities.

Higher-Order Factor Model

In the one-factor model and the first-order factor model, each subtest is affected by a

single ability construct. In the higher-order factor model, subtests are affected by both general

and specific ability constructs. The computation of a scale score's reliability is thus more

complex: In the higher-order factor model, the observed variance of a manifest subtest score

is composed of (a) the variance attributable to the general/higher-order construct (i.e., gHO),

(b) the variance attributable to the specific constructs (e.g., VCHO,specific), and (c) subtest-

specific factors (i.e., eHO,1 to eHO,14). When several subtests are summed to create a scale

score, the total variance of this scale score thus comprises variance attributable to general

cognitive ability and variance attributable to a certain specific ability (in addition to variance

attributable to subtest-specific factors eHO,1 to eHO,14).



As stated above, score reliability can be defined in terms of measurement precision. In

the one-factor model and in the first-order factor model, this definition can be mathematically

expressed as the proportion of variance in the target construct to observed score variance

(McDonald, 1999; Mellenbergh, 1996). In the context of higher-order factor models (and

nested-factor models), the computation of score reliability is more complex: The model-based

computation of score reliability depends on the researcher's decision as to which variance components are defined as reliable ("construct score") variance (Sijtsma, 2009).

Omega. The first way of defining reliable variance is as the amount of variance

accounted for by all (i.e., general/higher-order and specific) constructs that underlie a scale

score. In line with Zinbarg and colleagues (Zinbarg et al., 2005; Zinbarg et al., 2006), we

again refer to this reliability coefficient as omega (ω). In the case of a higher-order factor

model with k mutually orthogonal latent factors that represent k (general/higher-order and

specific) constructs, ω of the scale score Y can be expressed as follows:

$$\omega = \frac{\sum_{j=1}^{k}\left(\sum_{i=1}^{p} \lambda_{ij}\right)^{2}}{\sum_{j=1}^{k}\left(\sum_{i=1}^{p} \lambda_{ij}\right)^{2} + \sum_{i=1}^{p} \theta_{e_{i}}}. \qquad (2)$$

Note that the numerator in Equation 2 represents the total amount of variance that can

be attributed to the variances of the k constructs that underlie the scale score Y. The

denominator represents the total variance of the scale score, which comprises (a) the total

variance accounted for by all k underlying constructs and (b) the variances attributable to

subtest-specific factors. Omega thus informs on the reliability (i.e., measurement precision)

with which a scale score assesses the blend of the general/higher-order and specific

constructs; it can range from 0 (no reliability) to 1 (perfect reliability).

To illustrate the computation of omega in the context of a higher-order factor model,

we present the necessary computations for the General Cognitive Ability score of the WAIS–

III in Table 4. The omega value of this score can be computed by calculating the total amount

of variance attributable to latent constructs in the scale sum (i.e., variance due to gHO,

VCHO,specific, POHO,specific, WMHO,specific, and PSHO,specific), using the standardized factor loadings

as obtained by applying the Schmid-Leiman transformation (Table 3). The denominator is the

total variance of the General Cognitive Ability score—the sum of the variances attributable to

(a) gHO, (b) VCHO,specific, (c) POHO,specific, (d) WMHO,specific, (e) PSHO,specific, and (f) subtest-

specific factors (i.e., eHO,1 to eHO,14). The value of ω = .97 represents the reliability of the

General Cognitive Ability score to measure the blend of gHO and specific abilities. Table 5

displays the omega values and the variance composition for scale scores measuring specific

abilities.
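Equation 2 generalizes the earlier sketch to k orthogonal factors; a Python version under the same assumptions (standardized indicators, hypothetical loadings):

```python
import numpy as np

def omega_total(loading_matrix):
    """Omega (Equation 2) for a unit-weighted scale score.

    loading_matrix: p x k array of standardized loadings on k mutually
    orthogonal factors (e.g., Schmid-Leiman loadings on g and the specific
    factors). Subtest-specific variances are theta_i = 1 - sum_j lambda_ij**2.
    """
    L = np.asarray(loading_matrix, dtype=float)
    common = (L.sum(axis=0) ** 2).sum()            # (column sum)^2, summed over factors
    specific = (1.0 - (L ** 2).sum(axis=1)).sum()  # subtest-specific variances
    return common / (common + specific)

# Two subtests loading on g (column 0) and one specific factor (column 1):
print(round(omega_total([[0.75, 0.40],
                         [0.70, 0.45]]), 3))
```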

Omega hierarchical. In the computation of omega, reliable variance is defined as the

variance accounted for by all (i.e., general/higher-order and specific) constructs that underlie a

scale score. Alternatively, reliable variance may be defined as the variance in a scale score

accounted for by just one target construct (represented by factor j). To this end, we adapt the

methodological approach developed by McDonald (1999) as well as by Zinbarg and

colleagues (Zinbarg et al., 2005; Zinbarg et al., 2006). Specifically, these researchers

developed the reliability coefficient omega hierarchical (ωh), which gauges how precisely a

total score assesses a general construct as specified in a higher-order or a nested-factor model.

Note that they did not use omega hierarchical to compute score reliability of specific

constructs, as specified in the present models. Nevertheless, we apply the term omega

hierarchical in this article, because the same methodological approach is used. In this article,

h is defined as follows:

\omega_h = \frac{\left( \sum_{i=1}^{p} \lambda_{ij} \right)^{2}}{\sum_{j=1}^{k} \left( \sum_{i=1}^{p} \lambda_{ij} \right)^{2} + \sum_{i=1}^{p} \theta_{e_i}} .    (3)

Note that h indicates the proportion of variance in the scale score that is accounted

for by a particular target construct (as represented by factor j in the Schmid-Leiman

transformed higher-order factor model) to total observed variance (i.e., the sum of the

variances accounted for by all k underlying constructs and the sum of all variances

attributable to subtest-specific factors of the p subtests). Thus, omega hierarchical reflects the

measurement precision with which a scale score assesses a certain target construct; it can

range from 0 (no reliability) to 1 (perfect reliability).
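Continuing the hypothetical base R sketch given for Equation 2, ωh differs from ω only in its
numerator, which retains a single target factor j:

# Sketch of Equation 3, reusing lambda and theta_e from the previous sketch.
omega_h <- function(lambda, theta_e, j) {
  target_var <- sum(lambda[, j])^2                    # variance due to target construct j
  total_var  <- sum(colSums(lambda)^2) + sum(theta_e) # total scale score variance
  target_var / total_var
}
omega_h(lambda, theta_e, j = 1)   # measurement precision for the general construct
omega_h(lambda, theta_e, j = 2)   # measurement precision for one specific construct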

We now illustrate the computation of ωh for the General Cognitive Ability score. To

compute the omega hierarchical of this score to assess gHO, we again enter the standardized

factor loadings as obtained by applying the Schmid-Leiman transformation (Table 3) in

Equation 3. The denominator representing total scale score variance is thus identical to the

one used to calculate omega (see Table 4). However, the numerator now represents the

variance accounted for by gHO only. Therefore, it contains only the squared sum of the

loadings of the 14 subtests on gHO. The value of ωh = .92 represents the reliability (i.e.,

measurement precision) of the General Cognitive Ability score to measure the construct gHO.

By the same token, omega hierarchical for the specific abilities (e.g., VCHO,specific) is

computed by only taking into account the standardized factor loadings of the subtests (see

Table 3) on these specific factors when computing the numerator of Equation 3. The

corresponding denominator contains the total variance of the corresponding specific scale

scores, computed as the sum of the variances attributable to (a) gHO, (b) a certain specific

ability factor (e.g., VCHO,specific), and (c) corresponding subtest-specific factors (e.g., eHO,1 to

eHO,4). The value of ωh = .23 (see Table 5) represents the reliability (i.e., measurement

precision) of the Verbal Comprehension score to measure the construct VCHO,specific.

Table 5 displays the omega hierarchical values and the variance composition for all

scale scores. As the scale scores measuring specific abilities contain a large proportion of

variance attributable to gHO, the omega hierarchical values of these scores are relatively low

(ranging from ωh = .06 for Perceptual Organization to ωh = .23 for Verbal Comprehension).

Nested-Factor Model

How can the reliabilities of scale scores be computed in the context of the nested-

factor model? Because each subtest is assumed to be influenced by both general cognitive

ability gNF and a certain specific ability, the scale score's reliability can be computed in the

same way as for the higher-order factor model, using omega (see Equation 2) and omega

hierarchical (see Equation 3).

For example, reliability of the Verbal Comprehension score is computed as the ratio of

variance attributable to gNF and VCNF,specific (i.e., adding up the squared sums of the

corresponding standardized factor loadings of subtests on gNF and VCNF,specific; see Figure 1d)

to the total variance of this scale score (i.e., adding up the squared sums of the corresponding

standardized factor loadings of subtests on gNF and VCNF,specific plus the sum of the

corresponding subtest-specific variances). Table 4 shows the necessary computations. The

value of ω = .91 indicates that 91% of the variance in the Verbal Comprehension score is

attributable to the blend of gNF and VCNF,specific. This omega value therefore reflects how well

the Verbal Comprehension score measures the blend of general cognitive ability and specific

Verbal Comprehension.

Omega hierarchical, in contrast, reflects how well the Verbal Comprehension score

assesses VCNF,specific. It is computed as the ratio of variance attributable to VCNF,specific (i.e., the

squared sum of the corresponding standardized factor loadings of subtests on VCNF,specific) to

the total variance of the Verbal Comprehension score. The total variance is computed in the

same way as for omega. The value of ωh = .23 thus indicates that 23% of the variance in the

Verbal Comprehension scale score is attributable to VCNF,specific. This is the reliability (i.e.,

measurement precision) with which the Verbal Comprehension score assesses VCNF,specific.
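As a numeric sketch of these two ratios (with invented loadings, not the Table 3 estimates),
consider four subtests loading on gNF and on VCNF,specific in a nested-factor model:

# Hypothetical standardized loadings of four Verbal Comprehension subtests.
g_load  <- c(.75, .72, .70, .68)        # loadings on the general factor gNF
vc_load <- c(.40, .38, .35, .33)        # loadings on VC_NF,specific
theta_e <- 1 - g_load^2 - vc_load^2     # subtest-specific variances
total   <- sum(g_load)^2 + sum(vc_load)^2 + sum(theta_e)
omega   <- (sum(g_load)^2 + sum(vc_load)^2) / total   # blend of gNF and VC_NF,specific
omega_h <- sum(vc_load)^2 / total                     # VC_NF,specific only
c(omega = omega, omega_h = omega_h)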

Table 5 shows how well scale scores assessed the blend of general and specific abilities (in

terms of ) and a certain ability construct (in terms of h). As for the higher-order factor

model, scale scores showed relatively low reliability (in terms of ωh) in assessing specific

abilities because they contained a large amount of variance attributable to gNF.

Statistical Requirements

Omega and omega hierarchical are based on parameter estimates (i.e., estimates of

factor loadings and factor variances) that are derived for a certain CFA model. Hence, two

vital statistical requirements need to be fulfilled: (1) Proper interpretation of omega and

omega hierarchical requires that the target model fits the empirical data well (Bentler, 2009;

McDonald, 1999; Yang & Green, 2010). (2) Parameter estimates need to be precise.

We first address the evaluation of model fit, which should achieve an optimal balance

between the fit of the model to the empirical data, on the one hand, and theoretical

considerations, on the other. There has been considerable debate on which fit indices should

be used and on the strategies applied to evaluate model fit (Hu & Bentler, 1999; Jöreskog,

1993; Marsh, Hau, & Wen, 2004; McDonald, 2010). Although no consensus has yet been

reached, several methodologists strongly recommend comparing the preferred target model

with several a priori specified and theoretically supported alternatives. This approach takes

into account that cutoff values of model fit indices are model dependent, considers alternative

explanations of the data, and allows some models to be ruled out while giving stronger

support for others (MacCallum & Austin, 2000; Marsh, Hau, & Grayson, 2005; West, Taylor,

& Wu, in press). In this tutorial, for example, we computed omega for the General Cognitive

Ability score using the results obtained for the one-factor model for illustrative purposes.

However, we would be very cautious in interpreting this value as the reliability of the General

Cognitive Ability score to assess the latent construct general cognitive ability, because the

one-factor model provided only a modest fit to the data and—even more importantly—a

poorer fit than alternative CFA models. As the nested-factor model was theoretically derived

and provided the best fit of the four models under investigation, we would use the model

parameters obtained for this model to compute score reliability (in terms of ω and ωh).

We now turn to the precision of model parameters, which is affected by two key

factors. First, sample size needs to be sufficiently large to obtain trustworthy estimates of

model parameters (Yang & Green, 2010).5 In general, a larger sample size is always better,

and a sample size of N ≥ 200 allows proper estimation of model parameters (e.g., nonnegative

variances of subtest-specific factors) under a large variety of conditions (Boomsma &

Hoogland, 2001). There is also growing consensus that the required sample size depends on

the properties of the model investigated and the data to be analyzed: A higher ratio of

measures per factor and higher factor loadings may compensate for smaller sample size

(Marsh, Hau, Balla, & Grayson, 1998; Yang & Green, 2010). Thus, methodologists strongly

encourage applied researchers to conduct Monte Carlo studies of the target CFA models to

determine the required sample size (L. K. Muthén & Muthén, 2002). For example, previous

simulation studies have demonstrated that trustworthy model-based reliability estimates may

be obtained even with relatively small sample sizes (e.g., N = 100; see Zinbarg et al., 2006).
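One possible route for such a Monte Carlo study is sketched below using the R package
lavaan (an assumption on our part; lavaan is not used elsewhere in this article). The sketch
checks how often a candidate sample size yields a converged solution for an assumed
population model:

# Rough Monte Carlo sketch; population loadings and sample size are hypothetical.
library(lavaan)
pop   <- "f =~ 0.8*x1 + 0.7*x2 + 0.6*x3 + 0.7*x4"   # assumed population model
model <- "f =~ x1 + x2 + x3 + x4"
set.seed(1)
converged <- replicate(200, {
  d   <- simulateData(pop, sample.nobs = 100)       # candidate sample size
  fit <- cfa(model, data = d, std.lv = TRUE)
  lavInspect(fit, "converged")
})
mean(converged)   # proportion of replications yielding a converged solution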

Second, parameters for CFA models are typically derived by maximum likelihood

estimation, which requires continuous raw data that follow a multivariate normal distribution.

However, many studies in personality and interindividual differences research administer self-

report items with a limited number of response options (e.g., "disagree," "disagree
somewhat," "agree somewhat," "agree"); therefore, the assumption that raw data are

continuous may not be tenable. Moreover, empirical data frequently fail to follow a normal

distribution (Micceri, 1989) and, consequently, to have a multivariate normal distribution. So

what can be done? Model parameters—including factor loadings and variances of subtest-

specific factors used to compute omega or omega hierarchical—are generally trustworthy if

three conditions are fulfilled: the raw data are continuous, the sample size is reasonably large,

and the assumption of multivariate normality is not severely violated. Parameter estimates are

quite robust to violations of the multivariate normality assumption as long as the indicators

are "reasonably" continuous. For example, a recent simulation study by Rhemtulla, Brosseau-

Liard, and Savalei (2010) demonstrated that maximum likelihood-based estimation methods

yield acceptable parameter estimates for CFA models under a wide range of conditions, even

when the manifest variables contain only four response categories.

If distributional assumptions are severely violated, several routes can be taken to

tackle the problem—for example, employing alternative (robust) estimation methods with less

stringent distributional assumptions or transforming the input data to better match the

distributional assumptions. Modern software packages used to study CFA models include

robust estimation methods, such as robust maximum likelihood estimation (Satorra, 1990) and

robust weighted least squares estimation (B. O. Muthén, 1984; B. O. Muthén & Kaplan,

1985). These estimation methods may yield greater precision in (a) assessing model fit, (b)
computing standard errors of model parameters, and (c) in the case of robust weighted least
squares, estimating the model parameters themselves. Thus, robust weighted least squares

may also be an appropriate method for analyzing item-level data from items with fewer than

four response categories (Rhemtulla et al., 2010). Further information on this method can be

found in Wirth and Edwards (2007), who provide an excellent review of factor models and

various estimation methods for item-level data. Moreover, robust maximum likelihood

estimation allows the use of omega and omega hierarchical as explained in this tutorial; in the

case of weighted least squares estimation, score reliability may be estimated using the

approaches proposed by Green and Yang (2009) or Bentler (2009, p. 142).
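In lavaan syntax (again an assumption on our part; the analyses reported in this article were
run in Mplus), the two robust options just mentioned might look as follows:

# Hedged sketch: robust ML and robust weighted least squares in lavaan.
library(lavaan)
set.seed(2)
dat <- simulateData("f =~ 0.7*y1 + 0.7*y2 + 0.6*y3 + 0.6*y4", sample.nobs = 300)
dat[] <- lapply(dat, function(y) cut(y, breaks = 4, labels = FALSE))  # 4 categories
model <- "f =~ y1 + y2 + y3 + y4"
fit_mlr   <- cfa(model, data = dat, estimator = "MLR")     # robust maximum likelihood
fit_wlsmv <- cfa(model, data = dat, estimator = "WLSMV",   # robust weighted least squares
                 ordered = c("y1", "y2", "y3", "y4"))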

Alternatively, item scores that are intended to measure the same construct(s) may be

integrated into parcel scores. Subtest scores (as applied in this article) are a special case of

parcel scores (i.e., all items making up a subtest are integrated into one subtest score). Parcel

scores may then be used as manifest measures of the latent variables in CFA models, and

model parameters can be estimated by (robust) maximum likelihood procedures. Parcel scores

may have several advantages over item scores: they show better distributional properties (i.e.,

normality), keep the ratio of observable measures to latent constructs manageable, and

increase the chances of adequate model fit (Bagozzi & Edwards, 1998; Hall, Snell, & Singer

Foust, 1999; Little, Cunningham, Shahar, & Widaman, 2002; West, Finch, & Curran, 1995).

Crucially, when parcel scores are applied, two key requirements need to be fulfilled: (a) The

parcel scores must adequately represent the target construct(s) (Bagozzi & Edwards, 1998;

Little et al., 2002). (b) The dimensional structure underlying the items needs to be taken into

account. Otherwise, inaccurate parameter estimates and model fit statistics will result (Hall et

al., 1999; Little et al., 2002). Ideally, the inter-item structure is unidimensional (Little et al.,

2002). For example, when a one-factor model fits reasonably well to a set of items, these

items may be randomly distributed to parcels (for other parceling strategies, see Hall et al.,

1999, and Little et al., 2002).
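A minimal base R sketch of the random-assignment strategy just mentioned (the item data
are simulated solely for illustration):

# Randomly assign 12 unidimensional items to 3 parcels of 4 items each.
set.seed(3)
items <- matrix(rnorm(200 * 12), nrow = 200,
                dimnames = list(NULL, paste0("item", 1:12)))
assignment <- sample(rep(1:3, each = 4))     # random item-to-parcel assignment
parcels <- sapply(1:3, function(p) rowMeans(items[, assignment == p]))
colnames(parcels) <- paste0("parcel", 1:3)   # manifest indicators for the CFA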

Application and Interpretation

Given that the statistical requirements for CFA are met, several aspects of the

application and interpretation of model-based estimates of score reliability warrant

consideration. First, in this article, we applied WAIS–III subtest data to estimate the

reliabilities of scores to assess hierarchically structured constructs at two different levels of

the hierarchy (i.e., specific and general ability constructs located at the intermediate and the

top levels of the ability hierarchy). Importantly, the methods outlined in this tutorial may also

be applied to obtain model-based reliability estimates when item scores are used (in place of

subtest scores) or when more than two levels of the construct hierarchy are investigated.

Second, a model-based approach to estimating score reliability may render not only

the computation but also the concept of score reliability more complex. CTT defines score

reliability "as the proportion of true-score variance, without considering the composition of
the true score," whereas model-based approaches to reliability "decompose the true-score

variance into different variance components, and the researcher has to decide which variance

components contribute to test-score reliability" (Sijtsma, 2009, p. 170). When one-factor

models and first-order factor models are applied, the omega and omega hierarchical values of

a scale score are identical, as the reliable ("construct score") variance of the corresponding

scale scores is not divided into variances attributable to general/higher-order and specific

constructs, respectively. Consequently, omega is sufficient to estimate score reliability, and

the classic definition of score reliability as the proportion of "construct score" variance to

total score variance applies. In the case of the one-factor model and the first-order factor

model, omega therefore indicates the precision with which a scale score assesses a certain

target construct. Interestingly, this interpretation of omega converges with the concept of

construct validity—the extent to which a measure assesses the construct it was designed to

measure (Bollen, 1989, p. 195; McDonald, 1999, p. 63 and p. 208). Note, however, that this

interpretation of omega applies only for researchers who conceive of validity as a quantitative

concept, and not for those who conceive of validity as a qualitative concept (i.e., a measure is

or is not valid to assess a certain target construct). The latter researchers may consider two

measures to be valid, but one to be more reliable (Borsboom, Mellenbergh, & van Heerden,

2004, p. 1070). Omega is thus an index of reliability in terms of measurement precision only.

In contrast to the one-factor and the first-order factor model, the higher-order factor

and the nested-factor model imply that scale scores can be conceived of as assessing more

than one ability construct simultaneously. Thus, these models involve two forms of reliability:

(a) omega indicates how precisely a score measures the blend of general and specific

constructs, whereas (b) omega hierarchical indicates how precisely a score measures a certain

target construct at a certain level of the hierarchy. Omega is therefore closely tied to the

classic definition of score reliability (Bollen, 1989, p. 221): it reflects the total amount of

reliable score variance to the total scale score variance. When constructs are hierarchically

structured, however, the composition of reliable ("true score") variance entered in the

computation of omega is complex. Omega values are therefore ambiguous with respect to the

key question of how precisely a score assesses a certain target construct. It is omega

hierarchical that addresses this question. Further, when validity is conceived of as a

quantitative concept (see above), it is also omega hierarchical that may be interpreted in terms

of a measure's construct validity (Bollen, 1989, p. 195).

Third, in applied assessment, the focus of interest may also be on the total amount of

reliable variance of a scale score at the bottom level of the hierarchy (e.g., the Information

subtest score). Several methodological approaches can be used to obtain a corresponding

reliability estimate: (a) The total reliable variance of such a scale score can be defined as the

degree to which this score is free of "error" in terms of the subtest-specific factors ei (Bollen,

1989, pp. 220-221). For example, when using the standardized results obtained for the nested-

factor model (Figure 1d), the total reliable variance of the Information subtest score is

estimated by 1 – eNF,1 = 1 – .34 = .66. As noted above, however, the subtest-specific factors

may comprise both reliable subtest-specific variance and unreliable variance attributable to

random measurement error (Bollen, 1989, pp. 219-221). Hence, the result may be interpreted

as a lower-bound estimate of total score reliability, because reliable subtest-specific variance

is not taken into account. Several alternatives based on the interrelationships of the items

entering the scale score can be used to overcome this problem: (b) alpha (Cronbach, 1951), (c)

reliability estimates based on a unidimensional nonlinear factor analytic model (Green &

Yang, 2009; Raykov, Dimitrov, & Asparouhov, 2010), or (d) a unidimensional item-response

model (Mellenbergh, 1996). In many cases, these reliability estimates can be expected to be

larger than those obtained by approach (a), because they can take into account reliable

subtest-specific variance. Crucially, when interpreting the values obtained from approaches

(a) to (d), researchers should be aware that scale scores at the bottom level of the hierarchy

may measure several constructs simultaneously. Thus, these reliability estimates may reflect

the precision with which a scale score assesses the blend of higher-order constructs (approach

[a]) or the blend of subtest specific and higher-order constructs (approaches [b] to [d]).

Fourth, latent variable models are blind to several threats to the validity of statistical

conclusions, such as individual outliers, heteroscedasticity of residuals, and nonlinear

relationships between latent variables (Cohen, Cohen, West, & Aiken, 2003). Obtaining

manifest composite scores and visually inspecting plots can therefore help to carefully

diagnose regression relationships (Cohen et al., 2003; West, 2006; Wilkinson & Task Force

on Statistical Inference, 1999). Omega (if the one-factor or first-order factor model is
empirically supported) or omega hierarchical (if the higher-order or nested-factor model is
empirically supported) may thus serve a useful purpose, as these

reliability coefficients inform on the measurement precision with which a manifest score

assesses a certain latent target construct. Specifically, when hierarchical construct definitions

are endorsed, omega hierarchical may help (see also Zinbarg et al., 2006, p. 124) to evaluate

regression relationships and to judge whether unexpected results (based on manifest scores)

are due to random error or to the fact that the score does not precisely measure the target

construct (e.g., VCNF,specific).

Fifth, as noted in the review by Hogan, Benjamin, and Brezinski (2000), researchers

often use alpha (e.g., Cronbach, 1951) to estimate score reliability (see also Streiner, 2003).

However, alpha is not suitable for estimating the reliability of measures of hierarchically

structured constructs that operate at various levels of generality. Alpha does not indicate how

reliably either (a) a specific construct or (b) the general construct can be measured. On the

contrary, the alpha value reflects a complex blend of variance attributable to the general

construct, variance attributable to the specific constructs, and variability in factor loadings of

the scale indicators assessing general and specific constructs (Zinbarg et al., 2005). Hence,

unlike omega or omega hierarchical, alpha is not suitable for estimating score reliability when

constructs are structurally conceptualized in terms of a higher-order factor or a nested-factor

model.
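The standard computational formula for alpha makes this limitation visible: only the
observed covariance matrix enters the computation, so variance due to general and specific
constructs cannot be separated. A base R sketch with simulated data:

# Coefficient alpha from an observed covariance matrix; no factor model involved.
alpha <- function(C) {
  p <- ncol(C)
  (p / (p - 1)) * (1 - sum(diag(C)) / sum(C))
}
set.seed(4)
X <- matrix(rnorm(100 * 6), ncol = 6) + rnorm(100)  # 6 items sharing common variance
alpha(cov(X))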

Sixth, in this tutorial we computed the values of omega and omega hierarchical by

hand (see Table 4). However, these values can also be estimated by structural equation

software (e.g., Mplus, L. K. Muthén & Muthén, 1998–2010) within the methodological

approach presented by Cheung (2009). The necessary Mplus syntax is provided in the online

supplement. The "omega" function of the R package "psych" (Revelle, 2010) is an excellent

tool for computing omega hierarchical for a general factor derived by exploratory factor

analysis.
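A minimal usage sketch of that function is given below; the correlation matrix is the classic
Harman example shipped with R, used here only as a stand-in for one's own data, and the
number of group factors is an assumption:

# Omega hierarchical for a general factor extracted by exploratory factor analysis.
library(psych)
r <- Harman74.cor$cov    # 24 x 24 correlation matrix from the datasets package
omega(r, nfactors = 4)   # reports omega hierarchical for the general factor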

Seventh, the results of the higher-order factor and the nested-factor model converged

to show that score reliabilities in terms of omega were satisfactory, whereas reliabilities of

scores to assess specific constructs in terms of omega hierarchical were relatively low (Table

5). For example, for the nested-factor model, omega values ranged from .88 (Working

Memory) to .93 (Perceptual Organization), whereas omega hierarchical values ranged from

.05 (Perceptual Organization) to .23 (Verbal Comprehension). It was only the General

Cognitive Ability score that captured gNF with sufficient measurement precision (ωh = .93).6

In most practical settings, omega hierarchical values of scores assessing specific

constructs can be expected to be relatively low when there is a strong higher-order/general

construct that explains much of the common variance in the manifest measures. Thus, the

observation that values of omega hierarchical are low for scale scores capturing specific

constructs may generalize to many different assessment domains, including measures of

traditional self-report dimensions (DeYoung, 2006), disorders like depression (Steer et al.,

1999; Tanaka & Huba, 1984), subjective wellbeing (Chen et al., 2006; Gallagher et al., 2009),

and intelligence. For example, in the assessment of intelligence, it can be assumed—as a rule

of thumb—that "in heterogeneous collections of cognitive tests in a wide range of talent,
general intelligence accounts for roughly 50% of the common variance" (Lubinski, 2004, p.

98). Hence, the observation that scores show relatively low measurement precision (in terms

of h) in assessing specific abilities can be assumed to generalize to cognitive measures

beyond the WAIS–III.



In diagnostic settings, the low values of omega hierarchical obtained for the WAIS–III

scores assessing specific constructs are problematic, because they imply large confidence

intervals around a respondent's scale score. Any interpretations of a person's level of specific

ability therefore involve great uncertainty. When scale scores are interpreted to represent a

blend of general and specific ability constructs, however, the values of omega of the scores

are found to be satisfactory. Hence, the confidence intervals around the scale scores are

relatively small. In this case, however, interpretations of scale scores capturing specific

constructs should take into account that these scores represent the joint functioning of general

and specific abilities (when higher-order factor models or nested-factor models are applied).

In conclusion, a model-based approach to scale score reliability requires that statistical

assumptions of CFA are met and that the target model provides a good fit to the data. The

choice of a reliability index (ω or ωh) should then be guided by (a) the CFA model applied

and (b) the envisaged interpretation of a scale score.

General Discussion

In this tutorial, we elaborated on four different kinds of CFA models that are widely

used to study various aspects of the theoretical proposition that psychological constructs are

hierarchically structured. In this section, we synthesize the key points of our theoretical and

empirical analyses: For which questions in personality and individual differences research is

each model most appropriate? And which model supports the computation of which

composite scores?

One-Factor Model

The one-factor model focuses on a single, very general construct. Thus, it is most

applicable when (a) a general construct is hypothesized to account for the common variance

among measures, (b) no specific constructs beyond that general construct are predicted to

account for the common variance, (c) the research objective is to study a general construct

and its relationships to other variables (e.g., psychological constructs, sociodemographic



characteristics, or life outcomes), (d) the fit of a one-factor model is satisfactory, and (e) more

complex models do not provide a better fit than the one-factor model. Model comparison is

particularly important as it allows researchers to test whether first-order constructs, as

specified in the first-order factor model, are empirically distinguishable or whether specific

constructs, as specified in the nested-factor model, explain independent variance in the

measures beyond the contribution of a general construct. If the one-factor model is

theoretically supported and provides a good fit to the data, it is inconsistent to use anything

but a total scale score for research and diagnostic purposes. An appropriate reliability index of

this total scale score is omega.
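Such a model comparison might, for instance, be carried out as follows in lavaan (a sketch
under assumed variable names and simulated data; this article's own models were estimated
in Mplus):

# Compare a one-factor model with a nested-factor alternative on the same data.
library(lavaan)
set.seed(5)
dat <- simulateData("g  =~ .7*y1 + .7*y2 + .6*y3 + .6*y4 + .6*y5 + .6*y6
                     s1 =~ .4*y1 + .4*y2 + .4*y3", sample.nobs = 500)
one_factor <- "g =~ y1 + y2 + y3 + y4 + y5 + y6"
nested     <- "g  =~ y1 + y2 + y3 + y4 + y5 + y6
               s1 =~ y1 + y2 + y3"
f1 <- cfa(one_factor, data = dat, std.lv = TRUE)
f2 <- cfa(nested, data = dat, std.lv = TRUE, orthogonal = TRUE)  # g, s1 uncorrelated
fitMeasures(f1, c("chisq", "df", "cfi", "rmsea", "srmr"))
fitMeasures(f2, c("chisq", "df", "cfi", "rmsea", "srmr"))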

First-Order Factor Model

Relative to the one-factor model, the first-order factor model focuses on constructs that

are narrower in scope. It is therefore most applicable when (a) (mutually correlated) first-

order constructs are hypothesized to account for the common variance among corresponding

measures, or (b) no general construct or higher-order construct is predicted to operate, and (c)

the research objective is to study the relationship between specific constructs (but not higher-

order/general constructs) and other psychological constructs, sociodemographic

characteristics, or life outcomes. Moreover, the first-order factor model is (d) a useful tool to

analyze the multifaceted nature of psychological constructs in terms of their convergent and

discriminant relationships (see Stoel et al., 2006, for appropriate statistical methods for testing

the discriminability of latent constructs). Further, (e) as the relationships among first-order

constructs may point to the operation of more general or higher-order constructs, careful

examination of first-order factor models is the traditional starting point for analyses of higher-

order factor models that include such constructs (Marsh, 1987). Hence, the first-order factor

model is particularly useful as a comparison model for more restrictive higher-order factor

models. If the first-order factor model provides a good fit to the data, the use of specific scale

scores for research and diagnostic purposes is strongly supported, but the use of a total scale

score is not supported (McDonald, 1999, p. 208). An appropriate reliability index for the

specific scale scores is omega.

It is important to note that specific constructs as represented in the first-order model

are typically found to be correlated. Thus, researchers should be aware of two issues: First, as

no general construct is included, researchers should delineate the theoretical mechanisms by

which their specific constructs (as represented by first-order factors) are correlated (e.g., see

van der Maas et al., 2006, for intelligence). Second, if specific constructs as specified in the

first-order factor model are used as predictors in a regression context, the regression

coefficients may be affected by multicollinearity. This may render both the substantive

interpretation of regression coefficients and their sampling stability problematic (Cohen et al.,

2003). Specifically, multicollinearity may lead to large standard errors of regression

coefficients, which in turn undermine the trustworthiness of regression parameters:

confidence intervals will be large, implying high uncertainty about the magnitude of an effect.

Note that multicollinearity cannot occur in the other three CFA models: In the one-factor

model, a single construct is specified; in the higher-order factor model and the nested-factor

model, general and specific constructs are specified to be mutually uncorrelated.

Higher-Order Factor and Nested-Factor Model

Commonalities

Both the higher-order and the nested-factor models focus on the construct hierarchy in

its entirety by simultaneously examining the operation of constructs at various levels of

generality. Thus, both models may be applied when (a) a general construct is hypothesized to

account for the common variance among measures, (b) multiple specific constructs are

predicted to contribute to observed individual differences in assigned measures over and

above the general construct, and (c) the research objective is to study specific and general

constructs (Chen et al., 2006). Further, (d) if there is no theoretical explanation of why first-

order constructs are correlated (see above), many methodologists recommend including

constructs with the widest possible generalizability in their models to provide "the fullest
possible understanding of the data" (Gorsuch, 1983, p. 255). This rationale applies equally to

higher-order and nested-factor models. As both models include constructs that operate at two

levels of generality, both support the computation of specific scale scores and a total scale

score. Crucially, these scale scores may be interpreted with respect to either (a) the blend of

general and specific constructs or (b) a certain target construct only. Depending on the

interpretation chosen, different reliability estimates apply: (a) omega or (b) omega

hierarchical.

Choosing Between the Higher-Order Factor and the Nested-Factor Model

Given that the higher-order factor model and the nested-factor model have much in

common, which of these two models should be used in a particular context? If the general

construct is the focus of research, and if the higher-order factor model fits the data as well as

the nested-factor model, the higher-order factor model is preferable to the nested-factor model

(Chen et al., 2006).

In all other cases, but particularly in applications where the research focus is on the

relation of latent general and specific constructs to external criteria, the nested-factor model is

preferable to the higher-order factor model. More specifically, the higher-order factor model

is subject to the proportionality constraint, which renders the estimated relations of external

criteria to general and specific constructs linearly dependent (Schmiedek & Li, 2004). Thus, it

is not possible to examine the associations between all general and specific constructs as

specified in the higher-order factor model and external criteria (for further discussion, see

Schmiedek & Li, 2004). The nested-factor model, in contrast, allows the relationships

between general and all specific constructs, on the one hand, and external variables, on the

other, to be examined. To avoid identification problems in the higher-order factor model,

researchers need to constrain the relationship of one construct (either the general construct or

any of the specific constructs) with an external criterion variable to zero. However, this

requires the correct identification of these zero relationships; otherwise, model parameters

may be biased (see the discussion by Schmiedek & Li, 2004, on potential problems in

identifying these relationships). Taken together, all constructs specified in the nested-factor

model may be linked to external variables. This key advantage makes the structural

conceptualization of constructs in the nested-factor model a theoretically and empirically

fruitful approach to implementing the specificity matching principle (e.g., Swann, Chang-

Schneider, & McClarty, 2007; see also Wittmann, 1988). According to this principle, it is best

to use specific predictor variables (e.g., mathematical ability test scores) to predict specific

outcomes (e.g., mathematics grades); likewise, it is best to use general predictor variables

(e.g., intelligence g) to predict general outcomes (e.g., grade point average). Application of

this principle has helped to reconcile opposing perspectives on the power of personality traits

(Fleeson, 2004), attitudes (Ajzen & Fishbein, 2005), and (perhaps) self-concepts (Marsh &

Craven, 2006; Swann et al., 2007) to explain key outcome variables at different levels of

generality. Moreover, the structural conceptualization of cognitive abilities in terms of a

nested-factor model has helped researchers to understand the interplay between general and

specific abilities, on the one hand, and school grades, academic interests, and students'

socioeconomic status, on the other (Brunner, 2008; Gustafsson & Balke, 1993). From a wider

perspective, these studies clearly demonstrate the value of measures assessing general and

specific constructs, because both general and specific constructs play a key role in

understanding and predicting individual behavior.

Limitations of the Higher-Order Factor and Nested-Factor Models

Researchers applying the higher-order or nested-factor models should carefully inspect

model parameters for any lack of (empirical) identification. Specifically, it is relatively

common for the specific factors to collapse—as was the case, for example, in the Gustafsson

and Balke (1993) study on the structure of intelligence or in the Chen et al. (2006) study on

the structure of quality of life. Moreover, in the present tutorial we had to constrain the factor

loadings of the two subtests on PSNF,specific to be equal because the nested-factor model was

otherwise empirically underidentified. In general, to check the identification status of their

models, researchers need to carefully examine parameter estimates, standard errors, and

variances of general and specific factors: Parameters out of the range of admissible values

(e.g., standardized factor loadings greater than 1 or negative variances of latent variables),

large standard errors, or variances of general or specific factors very close to zero may

indicate (empirical) identification problems.
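For illustration, the kind of equality constraint just described can be expressed in lavaan's
label syntax (a sketch; the article's own models were specified in Mplus, and the remaining
specific factors are omitted here for brevity):

# Equal loadings of the two Processing Speed subtests via a shared label 'a'.
library(lavaan)
nested_factor <- "
  g  =~ voc + sim + ari + dig + inf + com + let +
        pic_c + cod + blo + mat + pic_a + sym + obj
  PS =~ a*cod + a*sym          # label 'a' forces the two loadings to be equal
"
# fit <- cfa(nested_factor, data = dat, std.lv = TRUE, orthogonal = TRUE)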

A further limitation of the higher-order factor and nested-factor models concerns the

assumption that general and specific constructs are mutually uncorrelated. If this constraint is

removed and the correlations among general and specific constructs are freely estimated,

identification problems are likely to occur (Chen et al., 2006; Rindskopf & Rose, 1988). What

can be done to overcome this problem? If correlations among specific factors and/or the

general factor are of interest, there are several ways to identify the higher-order factor model

and the nested-factor model. First, equality constraints may be imposed on factor loadings.

For example, Wilhelm and Oberauer (2006) analyzed the relationship between reasoning

ability, working memory, and mental speed. They used experimental manipulations to design

measures of mental speed and examined precisely formulated hypotheses on how cognitive

constructs affect these measures by imposing equality constraints on corresponding factor

loadings. These constraints then allowed them to investigate the correlation among general

and specific cognitive constructs. Note that constraining model parameters to be equal (or to

any other predefined values; see below) assumes that these assumptions hold in the target

population. If these assumptions do not hold, the substantive interpretation of model

parameters may be compromised. Thus, the implementation of such an approach requires

careful evaluation of model fit (particularly of local misfit; see Tomarken & Waller, 2003,

and McDonald, 2010) to detect potential model misspecification. Second, according to

Graham and Collins (1991), researchers may "borrow strength" to achieve identification by

including additional variables in the model. Specifically, external variables that are known to

predict only one of the factors and/or that are uniquely predicted by one of the factors may be

included. However, this again requires correct specification of the relationships, as model

parameters may otherwise be biased (Schmiedek & Li, 2004). Third, factor correlations

among specific factors may be fixed to certain values that are informed by substantive

theoretical considerations. Note that in this case the orthogonality between the general factor

and the specific factors needs to be maintained. Fourth, when a nested-factor model is used,

removing one specific factor may allow correlations among the remaining specific factors

while retaining the orthogonality of the general and specific factors. Importantly, the specific

factor to be removed should represent the standard method of operationalizing the general

construct (Eid, Lischetzke, Nussbeck, & Trierweiler, 2003). In intelligence research, for

example, reasoning tasks can be considered standard indicators of general cognitive ability,

because reasoning ability is at the heart of the conceptual definition of intelligence

(Gottfredson, 1997; Snow, Kyllonen, & Marshalek, 1984). Thus, when this approach is

implemented, reasoning tasks are affected only by the factor representing general cognitive

ability.

Conclusion and Outlook

Our goals in writing this tutorial are twofold. First, we offer an in-depth discussion of

the psychometric properties and the interpretation of four popular, but different, CFA models

that can be used to study hierarchically structured personality constructs. Ideally, this

discussion will encourage researchers to systematically compare their favorite CFA model

with theoretically supported alternative models (Jöreskog, 1993). This comparison will foster

the development of cumulative knowledge on personality constructs operating at various

levels of generality with respect to both components of psychological theories (Edwards &

Bagozzi, 2000): (a) how personality constructs are related to corresponding measures and (b)

how personality constructs are related to other theoretical constructs, sociodemographic



characteristics, or life outcomes. Second, we hope that our tutorial will generate greater
awareness of model-based approaches to the computation of score reliability (Sijtsma, 2009).

When it comes to hierarchically structured constructs, model-based approaches address severe

limitations of widely used reliability indices, such as alpha. Taken together, the guidance

provided in this tutorial may thus help researchers to implement the Standards for

Educational and Psychological Testing (American Educational Research Association et al.,

1999) by providing a statistical rationale for the derivation and interpretation of scale scores

assessing constructs at various levels of generality.



References

Ajzen, I., & Fishbein, M. (2005). The influence of attitudes on behavior. In D. Albarracín, B.

T. Johnson, & M. P. Zanna (Eds.), The handbook of attitudes (pp. 173-221). Mahwah,

NJ: Erlbaum.

American Educational Research Association, American Psychological Association, &

National Council on Measurement in Education. (1999). Standards for educational

and psychological testing. Washington, DC: Author.

Bagozzi, R. P., & Edwards, J. R. (1998). A general approach for representing constructs in

organizational research. Organizational Research Methods, 1, 45-87.

Bentler, P. M. (2009). Alpha, dimension-free, and model-based internal consistency

reliability. Psychometrika, 74, 137-143.

Bollen, K. A. (1989). Structural equations with latent variables. New York: John Wiley &

Sons.

Bonett, D. G. (2003). Sample size requirements for testing and estimating coefficient alpha.

Journal of Educational and Behavioral Statistics, 27, 335-340.

Boomsma, A., & Hoogland, J. J. (2001). The robustness of LISREL modeling revisited. In R.

Cudeck, S. du Toit, & D. Sörbom (Eds.), Structural equation models: Present and

future. A Festschrift in honor of Karl Jöreskog (pp. 139-168). Lincolnwood, IL:

Scientific Software International.

Borsboom, D., & Mellenbergh, G. J. (2002). True scores, latent variables, and constructs: A

comment on Schmidt and Hunter. Intelligence, 30, 505-514.

Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity.

Psychological Review, 111, 1061-1071.

Brown, T. A. (2006). Confirmatory factor analysis for applied research. New York: Guilford

Press.

Brunner, M. (2008). No g in education? Learning and Individual Differences, 18, 152-165.



Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New

York: Cambridge University Press.

Caruso, J. C., & Cliff, N. (1999). The properties of equally and differentially weighted

WAIS–III factor scores. Psychological Assessment, 11, 198-206.

Chen, F. F., West, S. G., & Sousa, K. H. (2006). A comparison of bifactor and second-order

models of quality of life. Multivariate Behavioral Research, 41, 189-225.

Cheung, M. W.-L. (2009). Constructing approximate confidence intervals for parameters with

structural equation models. Structural Equation Modeling, 16, 267-294.

Clark, L. A., & Watson, D. (1991). Tripartite model of anxiety and depression: Psychometric

evidence and taxonomic implications. Journal of Abnormal Psychology, 100, 316-336.

Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple

regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ:

Lawrence Erlbaum Associates.

Colom, R., Abad, F. J., Garcia, L. F., & Juan-Espinosa, M. (2002). Education, Wechsler's

Full Scale IQ, and g. Intelligence, 30, 449-462.

Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications.

Journal of Applied Psychology, 78, 98-104.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika,

16, 297-334.

Cudeck, R. (1989). Analysis of correlation matrices using covariance structure models.

Psychological Bulletin, 105, 317-327.

DeYoung, C. G. (2006). Higher-order factors of the big five in a multi-informant sample.

Journal of Personality and Social Psychology, 91, 1138-1151.

Edwards, J. R., & Bagozzi, R. P. (2000). On the nature and direction of relationships between

constructs and measures. Psychological Methods, 5, 155-174.



Eid, M., Lischetzke, T., Nussbeck, F. W., & Trierweiler, L. I. (2003). Separating trait effects

from trait-specific method effects in multitrait–multimethod models: A multiple-

indicator CT-C(M-1) model. Psychological Methods, 8, 38-60.

Emmons, R. A. (1995). Levels and domains in personality: An introduction. Journal of

Personality, 63, 341-364.

Fleeson, W. (2004). Moving personality beyond the person–situation debate. The challenge

and the opportunity of within-person variability. Current Directions in Psychological

Science, 13, 83-87.

Gallagher, M., Lopez, S. J., & Preacher, K. J. (2009). The hierarchical structure of well-being.

Journal of Personality, 77, 1025-1050.

Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum

Associates.

Gottfredson, L. S. (1997). Mainstream science on intelligence: An editorial with 52

signatories, history and bibliography. Intelligence, 24, 13-23.

Graham, J. W., & Collins, N. L. (1991). Controlling correlational bias via confirmatory factor

analysis of MTMM data. Multivariate Behavioral Research, 26, 607-629.

Green, S. B., & Yang, Y. (2009). Reliability of summed item scores using structural equation

modeling: An alternative to coefficient alpha. Psychometrika, 74, 155-167.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6, 430-

450.

Gustafsson, J. E., & Balke, G. (1993). General and specific abilities as predictors of school

achievement. Multivariate Behavioral Research, 28, 407-434.

Hall, R. J., Snell, A. F., & Singer Foust, M. (1999). Item parceling strategies in SEM:

Investigating the subtle effects of unmodeled secondary constructs. Organizational

Research Methods, 2, 233-256.



Hogan, T. P., Benjamin, A., & Brezinski, K. L. (2000). Reliability methods: A note on the

frequency of use of various types. Educational and Psychological Measurement, 60,

523-531.

Horn, J. L., & Noll, J. (1997). Human cognitive capabilities: Gf-Gc theory. In D. P. Flanagan,

J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment:

Theories, tests, and issues (pp. 53-91). New York: The Guilford Press.

Hu, L.-T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure

analysis: Conventional criteria versus new alternatives. Structural Equation Modeling,

6, 1-55.

Jöreskog, K. G. (1993). Testing structural equation models. In K. A. Bollen & J. S. Long

(Eds.), Testing structural equation models (pp. 294-316). Newbury Park, CA: Sage

Publications.

Little, T. D., Cunningham, W. A., Shahar, G., & Widaman, K. F. (2002). To parcel or not to

parcel: Exploring the question, weighting the merits. Structural Equation Modeling, 9,

151-173.

Loehlin, J. C. (2004). Latent variable models: An introduction to factor, path, and structural

analysis (4th ed.). Mahwah, NJ: Lawrence Erlbaum Associates.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA:

Addison-Wesley.

Lubinski, D. (2004). Introduction to the special section on cognitive abilities: 100 years after

Spearman's (1904) "'General intelligence', objectively determined and measured".

Journal of Personality and Social Psychology, 86, 96-111.

MacCallum, R. C., & Austin, J. T. (2000). Applications of structural equation modeling in

psychological research. Annual Review of Psychology, 51, 201-226.



Marsh, H. W. (1987). The hierarchical structure of self-concept and the application of

hierarchical confirmatory factor analysis. Journal of Educational Measurement, 24,

17-39.

Marsh, H. W., & Craven, R. (2006). Reciprocal effects of self-concept and performance from

a multidimensional perspective. Perspectives on Psychological Science, 1, 133-163.

Marsh, H. W., Hau, K. T., Balla, J. R., & Grayson, D. (1998). Is more ever too much? The

number of indicators per factor in confirmatory factor analysis. Multivariate

Behavioral Research, 33, 181-220.

Marsh, H. W., Hau, K.-T., & Grayson, D. (2005). Goodness of fit in structural equation

models. In A. Maydeu-Olivares & J. J. McArdle (Eds.), Contemporary psychometrics

(pp. 275-340). Mahwah, NJ: Erlbaum.

Marsh, H. W., Hau, K. T., & Wen, Z. (2004). In search of golden rules: Comment on

hypothesis-testing approaches to setting cutoff values for fit indexes and dangers in

overgeneralizing Hu and Bentler‘s (1999) findings. Structural Equation Modeling, 11,

320-341.

McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum

Associates.

McDonald, R. P. (2010). Structural models and the art of approximation. Perspectives on

Psychological Science, 5, 675-686.

McGrew, K. S. (2009). CHC theory and the human cognitive abilities project: Standing on the

shoulders of the giants of psychometric intelligence research. Intelligence, 37, 1-10.

Mellenbergh, G. J. (1996). Measurement precision in test score and item response models.

Psychological Methods, 1, 293-299.

Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures.

Psychological Bulletin, 105(1), 156-166.



Muthén, B. O. (1984). A general structural equation model with dichotomous, ordered

categorical, and continuous latent variable indicators. Psychometrika, 49, 115-132.

Muthén, B. O., & Kaplan, D. (1985). A comparison of some methodologies for the factor

analysis of non-normal Likert variables. British Journal of Mathematical and

Statistical Psychology, 38, 171-189.

Muthén, L. K., & Muthén, B. O. (1998–2010). Mplus User’s Guide (6th ed.). Los Angeles,

CA: Muthén & Muthén.

Muthén, L. K., & Muthén, B. O. (2002). How to use a Monte Carlo study to decide on sample

size and determine power. Structural Equation Modeling, 9, 599-620.

Raykov, T., Dimitrov, D. M., & Asparouhov, T. (2010). Evaluation of scale reliability with

binary measures using latent variable modeling. Structural Equation Modeling, 17,

265-279.

Revelle, W. (2010). Package 'psych'. Retrieved August 24, 2010, from http://personality-

project.org/r/psych_manual.pdf

Rhemtulla, M., Brosseau-Liard, P., & Savalei, V. (2010). How many categories is enough to

treat data as continuous? A comparison of robust continuous and categorical SEM

estimation methods under a range of non-ideal situations. Manuscript submitted for

publication.

Rindskopf, D., & Rose, T. (1988). Some theory and applications of confirmatory second-

order factor analysis. Multivariate Behavioral Research, 23, 51-67.

Satorra, A. (1990). Robustness issues in structural equation modeling: A review of recent

developments. Quality & Quantity, 24, 367-386.

Schmid, J., & Leiman, J. M. (1957). The development of hierarchical factor solutions.

Psychometrika, 22, 53-61.



Schmiedek, F., & Li, S.-C. (2004). Toward an alternative representation for disentangling

age-associated differences in general and specific cognitive abilities. Psychology and

Aging, 19, 40-56.

Shavelson, R. J., Hubner, J. J., & Stanton, G. C. (1976). Self-concept: Validation of construct

interpretations. Review of Educational Research, 46, 407-441.

Sijtsma, K. (2009). Reliability beyond theory and into practice. Psychometrika, 74, 169-173.

Slaney, K. L., & Maraun, M. D. (2008). A proposed framework for conducting data-based test

analysis. Psychological Methods, 13, 376-390.

Snow, R. E., Kyllonen, P. C., & Marshalek, B. (1984). The topography of ability and learning

correlations. In R. J. Sternberg (Ed.), Advances in the psychology of human

intelligence (pp. 47-103). Hillsdale, NJ: Lawrence Erlbaum Associates.

Spearman, C. (1904). "General intelligence," objectively determined and measured. American

Journal of Psychology, 15, 201-293.

Steer, R. A., Ball, R., Ranieri, W. F., & Beck, A. T. (1999). Dimensions of the Beck

Depression Inventory–II in clinically depressed outpatients. Journal of Clinical

Psychology, 55, 117-128.

Steyer, R. (1989). Models of classical psychometric test theory as stochastic measurement

models: Representation, uniqueness, meaningfulness, identifiability, and testability.

Methodika, 3, 25-60.

Stoel, R. D., Garre, F. G., Dolan, C., & van den Wittenboer, G. (2006). On the likelihood ratio

test in structural equation modeling when parameters are subject to boundary

constraints. Psychological Methods, 11, 439-455.

Streiner, D. L. (2003). Starting at the beginning: An introduction to coefficient alpha and

internal consistency. Journal of Personality Assessment, 80, 99-103.



Swann, W. B. J., Chang-Schneider, C., & McClarty, K. L. (2007). Do people's self-views

matter? Self-concept and self-esteem in everyday life. American Psychologist, 62, 84-

94.

Tanaka, J. S., & Huba, G. J. (1984). Confirmatory hierarchical factor analyses of

psychological distress measures. Journal of Personality and Social Psychology, 46,

621-635.

Thompson, B. (2004). Exploratory and confirmatory factor analysis. Understanding concepts

and applications. Washington, DC: American Psychological Association.

Tomarken, A. J., & Waller, N. G. (2003). Potential problems with ―well fitting‖ models.

Journal of Abnormal Psychology, 112, 578-598.

Tulsky, D. S., & Ledbetter, M. F. (2000). Updating to the WAIS–III and WMS–III:

Considerations for research and clinical practice. Psychological Assessment, 12, 253-
262.

van der Maas, H. L. J., Dolan, C. V., Grasman, R. P. P. P., Wicherts, J. M., Huizenga, H. M.,

& Raijmakers, M. E. J. (2006). A dynamical model of general intelligence: The

positive manifold of intelligence by mutualism. Psychological Review, 113, 842-861.

Wechsler, D. (1997). Wechsler Adult Intelligence Scale–Third Edition. San Antonio, TX: The

Psychological Corporation.

West, S. G. (2006). Seeing your data: Using modern statistical graphics to display and detect

relationships. In R. R. Bootzin & P. E. McKnight (Eds.), Strengthening research

methodology: Psychological measurement and evaluation (pp. 159-182). Washington,

DC: American Psychological Association.

West, S. G., Finch, J. F., & Curran, P. J. (1995). Structural equation models with nonnormal

variables: Problems and remedies. In R. H. Hoyle (Ed.), Structural equation modeling:

Concepts, issues, and applications (pp. 56-75). Thousand Oaks, CA: Sage.

West, S. G., Taylor, A. B., & Wu, W. (in press). Model fit and model selection in structural

equation modeling. In R. H. Hoyle (Ed.), Handbook of Structural Equation Modeling.

New York: Guilford Press.

Wilhelm, O., & Oberauer, K. (2006). Why are reasoning ability and working memory

capacity related to mental speed? An investigation of stimulus–response compatibility

in choice reaction time tasks. European Journal of Cognitive Psychology, 18, 18-50.

Wilkinson, L., & Task Force on Statistical Inference. (1999). Statistical methods in

psychology journals: Guidelines and explanations. American Psychologist, 54, 594-

604.

Wirth, R. J., & Edwards, M. C. (2007). Item factor analysis: Current approaches and future

directions. Psychological Methods, 12, 58-79.

Wittmann, W. W. (1988). Multivariate reliability theory: Principles of symmetry and

successful validation strategies. In J. R. Nesselroade & R. B. Cattell (Eds.), Handbook

of multivariate experimental psychology (2nd ed., pp. 505-560). New York: Plenum.

Yang, Y., & Green, S. B. (2010). A note on structural equation modeling estimates of

reliability. Structural Equation Modeling, 17, 66-81.

Yung, Y.-F., Thissen, D., & McLeod, L. D. (1999). On the relationship between the higher-

order factor model and the hierarchical factor model. Psychometrika, 64, 113-128.

Zinbarg, R. E., Revelle, W., Yovel, I., & Li, W. (2005). Cronbach's alpha, Revelle's beta, and
McDonald's omega h: Their relations with each other and two alternative

conceptualizations of reliability. Psychometrika, 70, 123-133.

Zinbarg, R. E., Yovel, I., Revelle, W., & McDonald, R. P. (2006). Estimating generalizability

to a latent variable common to all of a scale's indicators: A comparison of estimators

for ωh. Applied Psychological Measurement, 30, 121-144.



Footnotes
1. The online supplemental materials can be retrieved from www.emacs.uni.lu.
2. The maximum likelihood estimator used in this article is based on statistical

distribution theory of covariance matrices. Using correlation matrices with maximum

likelihood estimation leads to improper standard error estimates of model parameters and to

misleading confidence intervals and test statistics unless certain constraints are imposed on

the model parameters (Cudeck, 1989; McDonald, 1999). Only the correlation matrix of the

subtest scores was provided for the Spanish standardization sample of the WAIS–III. To

obtain correct standard errors for model parameters, we therefore followed McDonald (1999,

pp. 193-195) and specified in all models under investigation (a) appropriate constraints and

(b) scaling parameters for the estimation of subtest-specific factors (i.e., e1 to e14). Mplus (L.

K. Muthén & Muthén, 1998–2010) syntax files are presented in the Appendix. Note that the

2 value of the goodness-of-fit statistic of overall fit and the descriptive fit statistics (e.g.,

RMSEA, CFI, and SRMR) are typically not affected when correlation matrices are used

instead of covariance matrices (McDonald, 1999, p. 194).


3. The scale scores derived in the present article are not identical to the Full Scale IQ or to the index scores of the WAIS–III, which were computed with a selection of subtests (Caruso & Cliff, 1999). Accordingly, our reliability estimates do not apply to the Full Scale IQ or to the index scores of the WAIS–III for the Spanish standardization sample.

4. We estimated score reliability for CFA models in which factor loadings and variances of subtest-specific factors could vary across manifest measures (reflecting the assumption of congeneric measures). Note that CFA models may also be applied to estimate score reliability in more restrictive measurement models (Bollen, 1989, p. 208) in which factor loadings are constrained to be equal for all measures (tau-equivalent measures) or in which both the factor loadings and the variances of the subtest-specific factors are constrained to be equal for all measures (parallel measures); these constraints are summarized in equation form below.
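In equation form (a compact restatement in standard CFA notation, not taken from the article), writing each measure as y_i = λ_i η + e_i, with θ_i denoting the variance of the subtest-specific factor e_i, the three measurement models impose:

\begin{align*}
\text{congeneric:} \quad & \lambda_i \text{ and } \theta_i \text{ free for each measure } i \\
\text{tau-equivalent:} \quad & \lambda_1 = \lambda_2 = \cdots = \lambda_k, \quad \theta_i \text{ free} \\
\text{parallel:} \quad & \lambda_1 = \cdots = \lambda_k \quad \text{and} \quad \theta_1 = \cdots = \theta_k
\end{align*}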

5. Sample size also affects the precision of the estimation of alpha (Bonett, 2003). Thus, alpha may not be preferable to omega or omega hierarchical, even in cases of small samples.

6. This pattern of results is akin to problems encountered in the interpretation of scale score profiles, when differences between scale scores are computed to identify a person's strengths and weaknesses. For example, the reliability of differences between WAIS–III index scores assessing specific ability constructs was found to be low (when using unit weights to compute index scores) as these scores proved to be strongly mutually correlated (e.g., Caruso & Cliff, 1999).



Acknowledgements

We thank the editor Stephen West and all four reviewers for their valuable comments on an earlier version of this manuscript, and Susannah Goss for editorial support.

Table 1
Intercorrelations Among the Subtest Scores of the WAIS–III (as Obtained for the Spanish Standardization Sample)
Task score 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14.
1. Vocabulary (voc)a --
2. Similarities (sim) .755 --
3. Arithmetic (ari) .608 .596 --
4. Digit span (dig) .555 .566 .614 --
5. Information (inf) .715 .678 .661 .543 --
6. Comprehension (com) .729 .697 .554 .503 .671 --
7. Letter–number (let) .627 .612 .669 .759 .603 .567 --
8. Picture completion (pic_c) .616 .621 .567 .538 .599 .552 .612 --
9. Digit–symbol coding (cod) .606 .582 .576 .590 .532 .502 .689 .643 --
10. Block design (blo) .598 .605 .625 .567 .616 .496 .655 .679 .668 --
11. Matrices (mat) .657 .668 .699 .609 .634 .564 .692 .711 .711 .769 --
12. Picture arrangement (pic_a) .613 .623 .585 .568 .616 .574 .665 .677 .672 .692 .753 --
13. Symbol search (sym) .588 .57 .584 .563 .533 .494 .675 .623 .787 .673 .717 .670 --
14. Object assembly (obj) .560 .554 .537 .54 .538 .490 .597 .619 .627 .742 .689 .673 .649 --
Note. This table is reprinted from Intelligence, 30, R. Colom, F. J. Abad, L. F. Garcia, & M. Juan-Espinosa, Education, Wechsler's Full Scale IQ, and g, 449-462, Copyright (2002), with permission from Elsevier.
a Labels in parentheses are those used in the Mplus syntax presented in the Appendix.

Table 2
Fit of the Four Factor Models to the WAIS–III Data

Model                       χ²     df    CFI    RMSEA   SRMR
One-factor model           1923    77   .888    .132    .050
First-order factor model    515    71   .973    .068    .028
Higher-order factor model   570    73   .970    .071    .032
Nested-factor model         376    64   .981    .060    .023

Note. All χ² goodness-of-fit tests were statistically significant at p < .001. CFI = Comparative Fit Index; RMSEA = Root Mean Square Error of Approximation; SRMR = Standardized Root Mean Squared Residual.

Table 3
Standardized Factor Loadings of the Subtests of the WAIS–III as Obtained by Applying the Schmid-Leiman Transformation to the Higher-Order Factor Model (Figure 1c)

Subtest                     gHO   VCHO,specific   POHO,specific   WMHO,specific   PSHO,specific   eHO
Information                 .70        .42                                                       .33
Vocabulary                  .76        .45                                                       .22
Similarities                .74        .43                                                       .27
Comprehension               .70        .41                                                       .34
Object assembly             .77                        .20                                       .37
Block design                .83                        .21                                       .27
Picture completion          .77                        .20                                       .36
Matrix reasoning            .87                        .22                                       .20
Picture arrangement         .81                        .21                                       .30
Digit span                  .74                                        .33                       .34
Letter–number sequencing    .82                                        .37                       .20
Arithmetic                  .71                                        .32                       .39
Digit–symbol coding         .81                                                        .36       .21
Symbol search               .81                                                        .36       .22

Table 4
Example Equations for the Computation of Score Reliability (in Terms of ω and ωh)

One-Factor Model: General Cognitive Ability Score

ω = (.79 + .78 + .77 + .73 + .76 + .71 + .82 + .79 + .81 + .83 + .88 + .82 + .80 + .77)² / [(.79 + .78 + .77 + .73 + .76 + .71 + .82 + .79 + .81 + .83 + .88 + .82 + .80 + .77)² + (.38 + .39 + .41 + .47 + .42 + .50 + .33 + .38 + .35 + .32 + .23 + .32 + .36 + .41)] = .96

First-Order Factor Model: Verbal Comprehension Score

ω = (.82 + .88 + .85 + .81)² / [(.82 + .88 + .85 + .81)² + (.33 + .23 + .27 + .34)] = .91

Higher-Order Factor Model: General Cognitive Ability Score

ω = [(.70 + .76 + .74 + .70 + .77 + .83 + .77 + .87 + .81 + .74 + .82 + .71 + .81 + .81)² + (.42 + .45 + .43 + .41)² + (.20 + .21 + .20 + .22 + .21)² + (.33 + .37 + .32)² + (.36 + .36)²] / [(.70 + .76 + .74 + .70 + .77 + .83 + .77 + .87 + .81 + .74 + .82 + .71 + .81 + .81)² + (.42 + .45 + .43 + .41)² + (.20 + .21 + .20 + .22 + .21)² + (.33 + .37 + .32)² + (.36 + .36)² + (.33 + .22 + .27 + .34 + .37 + .27 + .36 + .20 + .30 + .34 + .20 + .39 + .21 + .22)] = .97

ωh = (.70 + .76 + .74 + .70 + .77 + .83 + .77 + .87 + .81 + .74 + .82 + .71 + .81 + .81)² / [(.70 + .76 + .74 + .70 + .77 + .83 + .77 + .87 + .81 + .74 + .82 + .71 + .81 + .81)² + (.42 + .45 + .43 + .41)² + (.20 + .21 + .20 + .22 + .21)² + (.33 + .37 + .32)² + (.36 + .36)² + (.33 + .22 + .27 + .34 + .37 + .27 + .36 + .20 + .30 + .34 + .20 + .39 + .21 + .22)] = .92

Nested-Factor Model: Verbal Comprehension Score

ω = [(.73 + .76 + .75 + .66)² + (.35 + .46 + .39 + .50)²] / [(.73 + .76 + .75 + .66)² + (.35 + .46 + .39 + .50)² + (.34 + .22 + .28 + .31)] = .91

ωh = (.35 + .46 + .39 + .50)² / [(.73 + .76 + .75 + .66)² + (.35 + .46 + .39 + .50)² + (.34 + .22 + .28 + .31)] = .23
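Readers who wish to verify these computations may use the following short script. It is our own illustration in Python (the article's actual Mplus syntax is in the Appendix, and this sketch is not part of the supplemental materials); it reproduces the higher-order model values from the Schmid-Leiman loadings in Table 3, and the same two functions cover the other models when given the corresponding loadings.

def omega(loading_sets, errors):
    # Omega: common variance of all factors relative to total scale score variance
    common = sum(sum(lams) ** 2 for lams in loading_sets)
    return common / (common + sum(errors))

def omega_h(general_loadings, loading_sets, errors):
    # Omega hierarchical: variance attributable to the general factor only
    common = sum(sum(lams) ** 2 for lams in loading_sets)
    return sum(general_loadings) ** 2 / (common + sum(errors))

# Schmid-Leiman loadings for the higher-order model (Table 3)
g   = [.70, .76, .74, .70, .77, .83, .77, .87, .81, .74, .82, .71, .81, .81]
vc  = [.42, .45, .43, .41]
po  = [.20, .21, .20, .22, .21]
wm  = [.33, .37, .32]
ps  = [.36, .36]
err = [.33, .22, .27, .34, .37, .27, .36, .20, .30, .34, .20, .39, .21, .22]

print(round(omega([g, vc, po, wm, ps], err), 2))       # 0.97
print(round(omega_h(g, [g, vc, po, wm, ps], err), 2))  # 0.92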

Table 5
Model-Based Variance Composition and Reliabilities (ω and ωh) of the WAIS–III Scale Scores

                                             Composition of scale score variance
Scale score                 Target          Observed   % target    % nontarget     % error    ω     ωh
                            construct       variance   construct   construct(s)

One-factor model
General cognitive ability   gOF               127.1      95.9         --             4.1     .96    --

First-order factor model
Verbal comprehension        VCFO               12.5      90.7         --             9.3     .91    --
Perceptual organization     POFO               19.0      92.1         --             7.9     .92    --
Working memory              WMFO                7.1      86.9         --            13.1     .87    --
Processing speed            PSFO                3.6      88.1         --            11.9     .88    --

Higher-order factor model
Verbal comprehension        VCHO,specific      12.5      23.5        67.2            9.3     .91    .23
Perceptual organization     POHO,specific      19.0       5.8        86.3            7.9     .92    .06
Working memory              WMHO,specific       7.1      14.5        72.4           13.1     .87    .15
Processing speed            PSHO,specific       3.6      14.6        73.5           11.9     .88    .15
General cognitive ability   gHO               126.9      92.4         4.4            3.2     .97    .92

Nested-factor model
Verbal comprehension        VCNF,specific      12.5      23.4        67.5            9.1     .91    .23
Perceptual organization     PONF,specific      18.9       5.0        87.6            7.3     .93    .05
Working memory              WMNF,specific       7.1      15.4        72.6           12.0     .88    .15
Processing speed            PSNF,specific       3.6      15.9        72.1           11.9     .88    .16
General cognitive ability   gNF               127.0      92.7         4.3            3.0     .97    .93

Note. g = general cognitive ability; VC = Verbal Comprehension; PO = Perceptual Organization; WM = Working Memory; PS = Processing Speed; OF = one-factor model; FO = first-order factor model; HO = higher-order factor model; NF = nested-factor model.
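As a worked check (again our own sketch in Python, not the authors' code), the variance composition of the Verbal Comprehension score in the higher-order model can be recovered from the Schmid-Leiman loadings in Table 3; small discrepancies from the tabled percentages reflect the two-decimal rounding of the loadings.

# Verbal Comprehension score, higher-order model (loadings from Table 3)
g_on_vc = [.70, .76, .74, .70]  # general-factor loadings of the four VC subtests
vc_spec = [.42, .45, .43, .41]  # VC-specific loadings
vc_err  = [.33, .22, .27, .34]  # subtest-specific (residual) variances

target    = sum(vc_spec) ** 2           # variance due to the target construct
nontarget = sum(g_on_vc) ** 2           # variance due to the nontarget (general) construct
error     = sum(vc_err)
observed  = target + nontarget + error  # about 12.5, as reported in Table 5

for label, part in [("% target", target), ("% nontarget", nontarget), ("% error", error)]:
    print(label, round(100 * part / observed, 1))
    # prints about 23.4, 67.3, and 9.3 (Table 5 reports 23.5, 67.2, and 9.3)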

Figure Captions

Figure 1. Alternative factor models for the WAIS–III: (a) one-factor model (OF), (b) first-order factor model (FO), (c) higher-order factor model (HO), and (d) nested-factor model (NF). The standardized solution is shown. g = general cognitive ability; VC = Verbal Comprehension; PO = Perceptual Organization; WM = Working Memory; PS = Processing Speed. Note: All models were identified by fixing the variance of the latent factors to 1.00; all other model parameters were freely estimated. For the higher-order factor model, the variances of the specific first-order factors (i.e., VCHO,specific, POHO,specific, WMHO,specific, and PSHO,specific) were thus constrained to equal 1 minus the squared factor loading of the corresponding first-order factor on gHO (e.g., the variance of VCHO,specific was constrained to equal 1 minus the squared factor loading of VC on gHO); this constraint is restated in equation form below. Further, for the nested-factor model, the factor loadings of the two subtests on PSNF,specific were constrained to be equal (the nested-factor model otherwise proved to be empirically underidentified).
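In equation form (our restatement of the identification constraint just described, where γF is our label for the standardized loading of first-order factor F on gHO):

\mathrm{Var}(F_{\mathrm{HO,specific}}) = 1 - \gamma_F^2, \qquad F \in \{\mathrm{VC}, \mathrm{PO}, \mathrm{WM}, \mathrm{PS}\}

This ensures that each standardized first-order factor's unit variance decomposes exactly into the portion explained by gHO and the portion specific to F.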


Figure 1
(a) One-factor model; (b) first-order factor model; (c) higher-order factor model; (d) nested-factor model.
[The path diagrams, showing standardized factor loadings and residual variances for each model, could not be recovered from this text version; see the figure caption above for the model specifications.]