
Psychological Assessment, 2010, Vol. 22, No. 1, 121–130
© 2010 American Psychological Association
1040-3590/10/$12.00  DOI: 10.1037/a0017767

Independent Examination of the Wechsler Adult Intelligence Scale—Fourth Edition (WAIS–IV): What Does the WAIS–IV Measure?

Nicholas Benson and David M. Hulac, The University of South Dakota
John H. Kranzler, University of Florida

Published empirical evidence for the Wechsler Adult Intelligence Scale—Fourth Edition (WAIS–IV) does not address some essential questions pertaining to the applied practice of intellectual assessment. In this study, the structure and cross-age invariance of the latest WAIS–IV revision were examined to (a) elucidate the nature of the constructs measured and (b) determine whether the same constructs are measured across ages. Results suggest that a Cattell–Horn–Carroll (CHC)–inspired structure provides a better description of test performance than the published scoring structure does. Broad CHC abilities measured by the WAIS–IV include crystallized ability (Gc), fluid reasoning (Gf), visual processing (Gv), short-term memory (Gsm), and processing speed (Gs), although some of these abilities are measured more comprehensively than are others. Additionally, the WAIS–IV provides a measure of quantitative reasoning (QR). Results also suggest a lack of cross-age invariance resulting from age-related differences in factor loadings. Formulas for calculating CHC indexes and suggestions for interpretation are provided.

Keywords: Wechsler Adult Intelligence Scale—Fourth Edition, confirmatory factor analysis, Cattell–Horn–Carroll theory, cross-age invariance

The Wechsler Adult Intelligence Scale—Fourth Edition (WAIS–IV; Wechsler, 2008a) is the latest version in a long line of Wechsler products dating back to the Wechsler–Bellevue Intelligence Scale (Wechsler, 1939). This long and rich history of research and clinical application undoubtedly contributes to the Wechsler scales' status as the most frequently used measures of intelligence for children, adolescents, and adults (Kaufman & Lichtenberger, 2006).

David Wechsler (Wechsler, 1940) viewed intelligence as a multifaceted, global capacity reflecting a variety of qualitatively different intellectual abilities as well as nonintellectual qualities such as personality. Although Wechsler's view of intelligence has not attracted "serious theoretical or scientific interest" (Jensen, 1987, p. 72), revisions of the Wechsler have favored continuity with previous editions over alignment with a contemporary theory of cognitive abilities (Zhu & Weiss, 2005). Although recent revisions of the Wechsler scales retain considerable continuity with previous editions, there has been an increased effort to reflect research and theoretical advances. Notable content changes include (a) replacing the Verbal IQ and Performance IQ interpretation with a four-factor structure (i.e., Verbal Comprehension Index, Perceptual Reasoning Index, Working Memory Index, and Processing Speed Index), (b) adding an additional measure of fluid reasoning (Gf; i.e., Figure Weights), (c) replacing the Object Assembly subtest with the new Visual Puzzles subtest, which is a visual variation added to reduce motor demands, (d) adding an additional measure of processing speed (Gs; i.e., Cancellation), and (e) revising the Arithmetic and Digit Span subtests to place greater demands on working memory (Wechsler, 2008a). The new Visual Puzzles subtest requires examinees to view a completed puzzle and select the response options that can be combined to form the puzzle; the new Figure Weights subtest is a quantitative and analogical reasoning task requiring selection of the appropriate missing weight(s) needed to balance a scale; and the new Cancellation subtest requires examinees to discriminate the color and shape of stimuli and identify target shapes within a structured arrangement of shapes.

Although developers of the WAIS–IV argued that the preceding changes improve the alignment of the newest Wechsler scales with current theoretical advances, the overall structure of the scales is not aligned with a contemporary theory because the test's developers believe Wechsler's original structure should be adhered to as closely as possible due to an abundance of evidence supporting clinical and practical applications (Wechsler, 2008b). However, in addition to such external validity evidence, a theoretical context is needed in order to derive quality interpretations from IQ profiles (Flanagan & McGrew, 1997; Kaufman, 1994). An appropriate theoretical context is one that is scientifically supported (i.e., has been confirmed by positive instances of its assertions and has survived numerous attempts at refutation). Linking test performance to a scientifically supported theory establishes support for the assertion that performance reliably reflects the construct of intelligence. Although a scientific theory can never be proven, the link between test scores and a theory can be established by gathering persuasive evidence supporting structural fidelity. Structural fidelity, a necessary condition for internal validity, exists when the interitem structure corresponds adequately to the substantive domain of interest (Loevinger, 1957).

Author note: Nicholas Benson and David M. Hulac, Division of Counseling and Psychology in Education, The University of South Dakota; John H. Kranzler, Department of Educational Psychology, University of Florida. Correspondence concerning this article should be addressed to Nicholas Benson, Division of Counseling and Psychology in Education, The University of South Dakota, 414 East Clark Street, Delzell Education Center Room 205D, Vermillion, SD 57069. E-mail: nicholas.benson@usd.edu


Current Internal Validity Evidence

The accuracy and precision of test interpretations are dependent on theoretical support and strong psychometric properties (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999). Although WAIS–IV development was influenced by contemporary research and theory pertaining to intelligence, the melding of Wechsler's view of intelligence with contemporary research yielded a structure that does not correspond to any modern theory of intelligence.

Although the theoretical underpinnings of the WAIS–IV are dubious, psychometric data presented in the technical and interpretive manual (Wechsler, 2008b) provide empirical support for use of test scores derived from WAIS–IV performance when making clinical inferences about an examinee's intellectual functioning. Average internal consistency reliability coefficients for subtests range from .78 for Cancellation to .94 for Vocabulary. Average internal consistency reliability coefficients for the WAIS–IV composites range from .90 for the Processing Speed Index to .98 for the Full Scale IQ. The WAIS–IV authors provide extensive exploratory and confirmatory factor analytic evidence for each of 13 age-differentiated subgroups (ranging from age 16 years to age 90 years, 11 months) in the technical and interpretive manual (Wechsler, 2008b). However, empirical evidence regarding the invariance of the WAIS–IV factor structure across ages is not provided. Without such evidence, clinicians will not have the research support necessary to determine whether subtest and composite scores reflect the same constructs across age groups. Moreover, the WAIS–IV structure is not tested against a plausible, alternative, theoretically based structural model. Strong programs of test validation should include examination of plausible alternative theoretical models to determine whether other methods of interpretation can better describe examinees' test performance (Benson, 1998).

Plausible Alternative Structure

Retention of a unique structure that is theoretically ambiguous warrants examination of an alternative, theoretically based structural model because an empirically supported theoretical structure is widely accepted as a prerequisite for valid interpretation of intelligence test performance (Flanagan & McGrew, 1997; Kaufman, 1994). Interpreting a test through the lens of a theoretically based structural model allows practitioners to refer to the vast research supporting the theory when interpreting an examinee's test performance (Kamphaus, Winsor, Rowe, & Kim, 2005). Similarly, linking the WAIS–IV to a commonly accepted theoretical model allows practitioners to compare WAIS–IV scores with test scores obtained with other assessment tools (Kamphaus et al., 2005). As the WAIS–IV did not adopt an empirically supported theory, interpretation of performance may be enhanced by applying a theoretically based structure derived from Cattell–Horn–Carroll (CHC) theory (McGrew, 1997). Research pertaining to the Wechsler Intelligence Scale for Children—Fourth Edition (WISC–IV; Wechsler, 2003) suggested that a CHC-derived theoretical structure provides a better description of WISC–IV performance than does the four-factor scoring model (Keith, Fine, Taub, Reynolds, & Kranzler, 2006).

CHC theory is the most comprehensive and empirically supported psychometric theory pertaining to the structure of human cognitive abilities (Alfonso, Flanagan, & Radwan, 2005). CHC theory is an integrated theory that represents both Cattell–Horn (Horn & Noll, 1997) and three-stratum (Carroll, 1993) theories of cognitive abilities; CHC theory was developed from efforts to resolve discrepancies between these two theories (McGrew, 1997). CHC theory is hierarchical and includes three strata. Psychometric g (i.e., the general factor presumed to cause the positive correlation of mental tasks) is located at the apex (Stratum III). Stratum II consists of 10 broad abilities: fluid reasoning (Gf), crystallized intelligence (Gc), short-term memory (Gsm), long-term retrieval (Glr), visual processing (Gv), auditory processing (Ga), processing speed (Gs), decision speed/reaction time (Gt), quantitative knowledge (Gq), and reading and writing ability (Grw). Stratum I consists of narrow CHC abilities. Space does not allow for presentation of all Stratum I abilities, as more than 100 narrow CHC abilities have been identified and defined (McGrew, 2005).

Purpose of Study

Published empirical evidence for the WAIS–IV does not address essential questions pertaining to the applied practice of intellectual assessment. In this study, the structure and cross-age invariance of the latest WAIS–IV revision were examined to answer the following research questions. First, does a CHC-based theoretical structure provide a better explanation of test performance than does the WAIS–IV scoring structure? Second, what exactly is the nature of the constructs measured by the WAIS–IV? The fourth revision of the WAIS contains some substantial changes from previous versions, as described above. Do these changes influence the substantive theoretical meaning of the constructs as measured by the WAIS–IV? We specified cross-loadings for subtests a priori based on the content of the subtests as delineated in the technical and interpretive manual (Wechsler, 2008b), Carroll's (1993) survey of factor analytic research, and the substantive hypotheses formulated by Keith et al. (2006) in their work pertaining to the WISC–IV. Although not every possible permutation of subtest cross-loadings was examined, modification indices were examined for parameters not explicitly tested. Thus, if a substantial cross-loading existed but was not formally tested, the modification index for that parameter would have indicated that allowing the cross-loading would yield a substantial improvement in model fit.

The third and final research question pertains to the cross-age invariance (i.e., equivalence of test scores across age-differentiated groups as established by empirical comparisons) of scores derived from the WAIS–IV. In other words, does the WAIS–IV measure the same constructs across age levels? The process for assessing invariance used in this study was described by Keith (2005). This process was used because it separates testing of cross-age invariance from testing of the factor structures and thus is well aligned with the substantive research questions addressed in this study.

Method

Participants

In this study, we used the WAIS–IV standardization sample (ages 16 years, 0 months to 90 years, 11 months), which included

2,200 participants. The WAIS–IV standardization sample was selected to approximate current U.S. census projections and was stratified according to four specific variables: census region, sex, education level, and ethnicity. Data from these 2,200 participants were presented in the technical and interpretive manual (Wechsler, 2008b) as 13 age-differentiated subsamples: 16 years to 17 years, 11 months (n = 200); 18 years to 19 years, 11 months (n = 200); 20 years to 24 years, 11 months (n = 200); 25 years to 29 years, 11 months (n = 200); 30 years to 34 years, 11 months (n = 200); 35 years to 44 years, 11 months (n = 200); 45 years to 54 years, 11 months (n = 200); 55 years to 64 years, 11 months (n = 200); 65 years to 69 years, 11 months (n = 200); 70 years to 74 years, 11 months (n = 100); 75 years to 79 years, 11 months (n = 100); 80 years to 84 years, 11 months (n = 100); and 85 years to 90 years, 11 months (n = 100). Notably, four age-differentiated subgroups ranging from age 70 years to age 90 years, 11 months, were not included in analyses relevant to the research questions addressed in this study because three measures (i.e., Letter–Number Sequencing, Figure Weights, and Cancellation) were not used with participants aged 70 years or older. Thus, the current analysis only includes age groups from 16 years to 69 years, 11 months. This decision was made because the aim of this study was to determine what constructs the WAIS–IV measures, and including more rather than fewer subtests is viewed as the best approach for accomplishing this goal (Keith et al., 2006).

Analyses

In this study, we used confirmatory factor analysis (CFA) to answer the aforementioned research questions. The average standard deviations and intercorrelations of the WAIS–IV subtest scale scores, which are provided in the technical and interpretive manual (Wechsler, 2008b), were used as data input and were converted to variance–covariance matrices to facilitate data analyses. Both core and supplemental subtests were included in analyses, as inclusion of supplemental subtests is likely to clarify the substantive meaning of the WAIS–IV by ensuring that constructs are adequately represented (Keith et al., 2006). Analyses were performed with Amos 7.0 (Arbuckle, 2007) computer software following the method of maximum-likelihood estimation. Maximum-likelihood estimation, in which one assumes a multivariate normal distribution, is a robust (Hoyle & Panter, 1995) and widely used (Kline, 2005) approach for estimating free parameters (i.e., parameters that are not fixed at any particular value by the model hypothesis).

The first research question was addressed by testing the WAIS–IV scoring structure against a structural model derived from an empirically supported theory of intelligence (i.e., CHC theory). The second research question was addressed with a series of analyses examining how the changes introduced in this revision may have influenced the substantive meaning of the constructs measured by the WAIS–IV. Three averaged matrices were computed to facilitate analyses. The first two of these matrices each consist of every other age-differentiated subgroup for ages 16 years to 64 years, 11 months. Thus, the first matrix was a product of the 16 years to 17 years, 11 months; 20 years to 24 years, 11 months; 30 years to 34 years, 11 months; and 45 years to 54 years, 11 months, subgroups, and the second matrix was a product of the 18 years to 19 years, 11 months; 25 years to 29 years, 11 months; 35 years to 44 years, 11 months; and 55 years to 64 years, 11 months, subgroups. The third matrix was the product of all subgroups ranging from ages 16 years to 69 years, 11 months. All three of these matrices are available from Nicholas Benson by request.

Formation of the first two averaged matrices allowed for use of a calibration–validation approach. Use of calibration and validation samples allowed hypotheses supported by initial analyses (calibration) to be validated with another portion of the data and separated testing of alternative models from testing of cross-age invariance (Keith et al., 2006). The 65 years to 69 years, 11 months, subgroup was excluded during formation of the calibration and validation samples in order to obtain samples of equal size. However, the final model (i.e., the result of the validation stage) was further tested against an averaged matrix for ages 16 years to 69 years, 11 months, to ensure that it provides adequate fit across this age range. The specific hypotheses tested during this series of model comparisons are provided in the Results section.

The third research question was addressed with a series of analyses designed to test both magnitude and location of bias (Keith, 2005). The first step involved testing invariance of covariance matrices across age-differentiated subgroups, which was accomplished by constraining variances and covariances to be equal for ages 16 years to 69 years, 11 months. Invariance of covariance matrices was examined first because all potential factor structures are solved with these same covariance matrices. The second step involved constraining all parameters to be equal across age-differentiated subgroups. The first step provided a general test of invariance that is independent of a specific factor structure, whereas the second step imposed a specific factor structure; any observed differences in fit between the models examined in these steps therefore reflect lack of invariance arising from the factor structure. If bias is evident, then its location can be identified by systematically relaxing model parameters (i.e., factor loadings, higher order factor loadings, factor variances, and unique variances).

Use of Fit Indices

Simulation studies suggested that multiple fit indices should be used when evaluating model fit (Fan & Sivo, 2005). Although some fit indices have been shown to provide redundant information (Meade, Johnson, & Braddy, 2008), several types of fit indices have been developed, each reflecting different facets of model fit. Researchers must select an adequate set of indices for examining model fit in light of substantive research questions and probable sources of bias (Miles & Shevlin, 2007). The indices we used included the chi-square test (χ2), the standardized root-mean-square residual (SRMR), the root-mean-square error of approximation (RMSEA), the comparative fit index (CFI), the Akaike information criterion (AIC), and the change in chi-square value (Δχ2).

The null hypothesis of perfect fit was tested with χ2 in response to Barrett's (2007) recent call for researchers to report results of a null hypothesis significance test. The SRMR was used because it provides standardized residuals to convey important information about model fit, is sensitive to model misspecification, and is insensitive to sample size and violations of distributional assumptions (Bentler, 2007). The SRMR reflects similarity of the observed covariance matrix to the model-implied covariance matrix, with good models displaying small residuals on average. The

RMSEA was used because simulation studies suggest it performs reasonably well as a test of invariance and is fairly insensitive to sample size, interactions among variables, and the reliability of indicators (Meade et al., 2008). The RMSEA reflects how well the model would fit the population covariance matrix if optimal parameter values were chosen. The CFI was used because it is insensitive to sample size and has performed well as a test of invariance during simulation studies (Cheung & Rensvold, 2002; Meade et al., 2008). The CFI reflects the fit of the hypothesized model relative to an independence model (i.e., a model in which all correlations among variables in the model are zero). The AIC was used to identify the most parsimonious model between the WAIS–IV scoring structure and the alternative CHC model. Simulation studies suggest that AIC is useful when conducting model comparisons, including comparisons involving nonnested models (Yuan, 2005). For nested models, Δχ2 was used to determine whether adding theoretically substantive constraints resulted in a large and statistically significant drop in χ2 values (Keith, 2005).

Results

Research Question 1: Examination of a Plausible Alternative (CHC) Model

The WAIS–IV scoring structure shown in Figure 1 was tested against the CHC-inspired initial structural model shown in Figure 2. Notably, both models failed the χ2 null hypothesis test of perfect fit. This does not mean that all aspects of either model are false. Rather, this finding merely refutes the claim that all aspects of either model are consistent with the data. This finding is not surprising, as the WAIS–IV is intentionally designed to consist of subtests that are reliable and interrelated (i.e., influenced by psychometric g), and the χ2 statistic is known to be biased against reliable indicators that do not have substantial unique variance (Miles & Shevlin, 2007). Moreover, each subtest measures narrow abilities that are not measured by enough subtests (i.e., at least two) for inclusion as viable constructs in a model of WAIS–IV structure. Given extensive empirical evidence supporting the existence of narrow cognitive abilities (e.g., Carroll, 1993), it is likely that excluding narrow cognitive abilities from the structural model yielded a discrepancy between the model and the data. Thus, specification of a structural model for the WAIS–IV that fits the data perfectly, yet remains theoretically sound, is seemingly an unrealistic goal. A more reasonable goal is to determine whether the WAIS–IV is consistent with a prevailing theory.

Results suggest that a CHC model provides a better explanation of test performance than does the WAIS–IV scoring structure. As shown in Table 1, the CHC model yielded a substantial drop in the AIC relative to the WAIS–IV scoring structure. Because these models are not nested, AIC is a better index for comparison than is Δχ2 (Keith, 2005). The first-order and second-order factor loadings shown in Figure 2 suggest that the subtests are robust measures of the abilities specified in the initial CHC model. Consistent with previous research (Bickley, Keith, & Wolfle, 1995; Gustafsson, 1984; Keith et al., 2006), the second-order loading of psychometric g on Gf reached unity. The second-order loadings of psychometric g on Gc, Gv, and Gsm were strong (.79, .84, and .82, respectively), whereas the second-order loading of

[Figure 1 in the original is a path diagram: psychometric g loads on four first-order factors (Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed), which in turn load on the 15 subtests; the standardized loadings are omitted here.]
Figure 1. The Wechsler Adult Intelligence Scale—Fourth Edition model, estimated with calibration data consisting of every other age group in the standardization sample from ages 16 years to 64 years, 11 months (ages 16 years to 17 years, 11 months; 20 years to 24 years, 11 months; 30 years to 34 years, 11 months; and 45 years to 54 years, 11 months). The numbers next to u denote the subtest to which the unique subtest variance (u) corresponds. The numbers next to uf denote the first-order factor to which the unique factor variance (uf) corresponds. g = psychometric g.
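As described in the Analyses section, the models in Figures 1 and 2 were fit to variance–covariance matrices assembled from the published subtest standard deviations and intercorrelations. That conversion is simply cov[i][j] = r[i][j] × sd[i] × sd[j]; a minimal sketch in Python, using hypothetical values rather than figures from the WAIS–IV manual:

```python
def correlations_to_covariances(sds, corr):
    """Convert a correlation matrix to a variance-covariance matrix.

    cov[i][j] = corr[i][j] * sds[i] * sds[j]; diagonal entries are the
    variances (standard deviations squared).
    """
    n = len(sds)
    return [[corr[i][j] * sds[i] * sds[j] for j in range(n)] for i in range(n)]

# Hypothetical 3-subtest illustration (not values from the WAIS-IV manual).
sds = [3.0, 2.9, 3.1]
corr = [
    [1.00, 0.62, 0.55],
    [0.62, 1.00, 0.48],
    [0.55, 0.48, 1.00],
]
cov = correlations_to_covariances(sds, corr)
```

The resulting matrix, together with the sample size, is the summary-data input that a CFA program such as Amos accepts in place of raw scores.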

[Figure 2 in the original is a path diagram: psychometric g loads on five first-order CHC factors (Gc, Gv, Gf, Gsm, and Gs), which in turn load on the 15 subtests; the loading of g on Gf is 1.00.]
Figure 2. Initial Cattell–Horn–Carroll model, estimated with calibration data. The numbers next to u denote the subtest to which the unique subtest variance (u) corresponds. The numbers next to uf denote the first-order factor to which the unique factor variance (uf) corresponds. g = psychometric g; Gc = crystallized ability; Gv = visual processing; Gf = fluid reasoning; Gsm = short-term memory; and Gs = processing speed.
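In a higher-order model like the one in Figure 2, the implied g loading of a subtest is the indirect effect of psychometric g: the second-order loading (g on the broad ability) multiplied by the first-order loading (broad ability on the subtest). A minimal sketch using two pairs of loadings reported for the initial CHC model (g on Gf = 1.00 and g on Gs = .63, with first-order loadings of .78 for Figure Weights and .58 for Cancellation); published loadings are rounded to two decimals, so small discrepancies from the tabled values are possible:

```python
# Indirect effect of psychometric g on a subtest in a higher-order model:
# (g -> broad ability loading) * (broad ability -> subtest loading).
# Loadings below are those reported for the initial CHC model (Figure 2).
second_order = {"Gf": 1.00, "Gs": 0.63}   # g -> broad ability
first_order = {                           # subtest: (broad ability, loading)
    "Figure Weights": ("Gf", 0.78),
    "Cancellation": ("Gs", 0.58),
}

def g_loading(subtest):
    factor, loading = first_order[subtest]
    return second_order[factor] * loading

for name in first_order:
    print(f"{name}: estimated g loading = {g_loading(name):.2f}")
```

This reproduces the pattern discussed in the Results: Figure Weights retains a high g loading (1.00 × .78 = .78), whereas Cancellation's is weak (.63 × .58 ≈ .37) because both its first-order loading and the g-on-Gs loading are modest.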

psychometric g on Gs was moderate (.63). Because the CHC model appears to provide a better explanation of performance than the WAIS–IV scoring structure does, the CHC model was used as a baseline to test substantive questions regarding the WAIS–IV structure.

Research Question 2: Calibration Analyses Examining Structural Alternatives for the WAIS–IV

We examined substantive hypotheses regarding the WAIS–IV scales on the basis of the content of the subtests as delineated in the WAIS–IV technical and interpretive manual (Wechsler, 2008b), Carroll's (1993) survey of factor analytic research, and the substantive hypotheses formulated by Keith et al. (2006) in their work pertaining to the WISC–IV. Most of these hypotheses pertained to plausible subtest cross-loadings. Hypotheses regarding cross-loadings were examined to determine whether subtests are pure or mixed measures of CHC abilities.

Results from calibration analyses are shown in Table 1. Calibration analyses suggest that separate Gf and Gv factors are necessary. Combining the Gf and Gv factors resulted in a poor fit to

Table 1
Summary of Alternative Model Evaluation for the WAIS–IV With a Calibration Sample

Model                                       χ2 (df)        p     Δχ2       Δdf    p     RMSEA  SRMR  CFI   AIC
WAIS–IV scoring model                       431.520 (86)  .000     —        —     —     .071   .044  .947  499.520
Initial CHC model                           312.698 (85)  .000     —        —     —     .058   .039  .965  382.698a
Combine Gf and Gv                           433.331 (86)  .000     —        —     —     .071   .043  .935  501.331b
AR with Gsm cross-load                      287.012 (84)  .000  25.686b     1   .000    .055   .038  .969  359.012
AR with Gsm and Gc cross-loads              282.956 (83)  .000   4.056a     1   .044    .055   .037  .969  356.956
Nonzero residual covariance for AR and FW   311.975 (84)  .000   0.723b     1   .395    .058   .039  .965  383.975
MR with Gv cross-load                       309.173 (84)  .000   3.525b     1   .060    .058   .039  .966  381.173
PCm with Gc cross-load                      308.548 (84)  .000   4.150b     1   .042    .058   .038  .966  380.548
CA with Gsm cross-load                      299.446 (84)  .000  13.252b     1   .000    .057   .037  .967  371.446
SS with Gv cross-load                       312.296 (84)  .000   0.402b     1   .526    .058   .038  .965  384.296
CD with Gsm cross-load                      309.685 (84)  .000   3.013b     1   .083    .058   .039  .965  381.685

Note. N = 800. WAIS–IV = Wechsler Adult Intelligence Scale—Fourth Edition; CHC = Cattell–Horn–Carroll; Gf = fluid reasoning; Gv = visual processing; Gsm = short-term memory; Gc = crystallized intelligence; AR = Arithmetic; FW = Figure Weights; MR = Matrix Reasoning; PCm = Picture Completion; CA = Cancellation; SS = Symbol Search; CD = Coding; RMSEA = root-mean-square error of approximation; SRMR = standardized root-mean-square residual; CFI = comparative fit index; AIC = Akaike information criterion.
a Compared with previous model. b Compared with initial CHC model.
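Several entries in Table 1 can be reproduced from the reported χ2, df, and N. With 15 observed subtests there are 15 × 16/2 = 120 observed variances and covariances, so the number of free parameters is q = 120 − df; the tabled AIC equals χ2 + 2q; RMSEA = √(max(χ2 − df, 0)/(df(N − 1))); and the upper-tail p of a 1-df Δχ2 is erfc(√(Δχ2/2)). A sketch of that arithmetic (conventions for the AIC constant and the RMSEA denominator vary across programs; the forms below are the ones consistent with the tabled values):

```python
import math

def aic(chi2, df, n_observed=15):
    """AIC as chi-square plus twice the number of free parameters.

    With p observed variables there are p(p + 1)/2 sample moments,
    so the number of free parameters is q = p(p + 1)/2 - df.
    """
    q = n_observed * (n_observed + 1) // 2 - df
    return chi2 + 2 * q

def rmsea(chi2, df, n):
    """Root-mean-square error of approximation for a single-group model."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def delta_chi2_p(delta):
    """Upper-tail p value of a 1-df chi-square difference test."""
    return math.erfc(math.sqrt(delta / 2.0))

# Checking Table 1 rows (calibration sample, N = 800):
print(round(aic(431.520, 86), 3))         # WAIS-IV scoring model -> 499.52
print(round(rmsea(312.698, 85, 800), 3))  # initial CHC model -> 0.058
print(round(delta_chi2_p(4.056), 3))      # AR with Gsm and Gc cross-loads -> 0.044
```

The same functions recover, for example, the initial CHC model's AIC of 382.698 (312.698 + 2 × 35) and the near-zero p value for the 25.686 Δχ2 of the Arithmetic-with-Gsm cross-loading.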

the data relative to the model with separate Gf and Gv factors, as indicated by a substantial increase in the AIC value relative to the initial CHC model. Calibration analyses also suggest that Arithmetic is a mixed measure of Gf and Gsm. Although Arithmetic and Figure Weights both require reasoning with quantitative concepts, allowing these subtests to have a nonzero residual covariance did not yield a significant improvement in model fit, and the magnitude of the observed covariance was small and not statistically significant. Thus, the relationship between Arithmetic and Figure Weights appears to be influenced primarily by Gf. More specifically, these tests appear to measure quantitative reasoning (QR), which has been defined as a narrow cognitive ability caused by Gf that pertains to inductive and deductive reasoning with concepts involving mathematical relations (Carroll, 1993). Allowing Arithmetic to have an additional cross-load with Gc yielded a modest yet statistically significant (at the .05 level) drop in χ2 value as well as a modest drop in the AIC value, which provided some preliminary support for this cross-loading. Calibration analyses also supported a cross-loading of Picture Completion with Gc and a cross-loading of Cancellation with Gsm.

Cross-Validation of Plausible Structural Alternatives for the WAIS–IV

Factor loadings that yielded statistically significant improvements in model fit during calibration analyses were subjected to cross-validation. Thus, they were retested, both in isolation and in combination, with the averaged matrix for the validation sample. As can be seen in Table 2, some of the salient findings from calibration analyses were not replicated during cross-validation. Additionally, even though some factor loadings yielded statistically significant drops in χ2 when tested in isolation, the magnitudes of these loadings were small and statistically nonsignificant when tested in combination with other salient loadings identified during calibration. The change in model fit that resulted from adding the Cancellation with Gsm cross-loading was small and nonsignificant. Additionally, the Picture Completion with Gc cross-loading was not supported, as evidenced by modest drops in χ2 and AIC as well as a statistically nonsignificant and weak factor loading (i.e., a standardized estimate of .09). Therefore, the Cancellation with Gsm and Picture Completion with Gc cross-loadings were not included in the final model shown in Figure 3. Finally, although the Arithmetic with Gc cross-loading yielded reductions in χ2 and AIC values, this cross-loading was not included in the final model because the magnitude of this loading (.10) was small and statistically nonsignificant with the calibration sample and of modest magnitude with the validation sample (.25).

The final model was subjected to additional validation analysis with the averaged matrix for all age-based groups from 16 years to 69 years, 11 months of age. Although the final model failed the χ2 null hypothesis test of perfect fit, χ2(84, N = 1800) = 615.929, p = .000, alternative fit indices confirm that CHC theory provides a good description of WAIS–IV performance (SRMR = .036, RMSEA = .059, CFI = .965). The final CHC model provides a far superior fit when compared with the WAIS–IV scoring structure, as indicated by a substantial drop in χ2 relative to the scoring structure, Δχ2(2) = 280.784, p = .000. Moreover, alternative fit indices suggest that the WAIS–IV scoring structure provides inferior fit to the averaged matrix for ages 16 years to 69 years, 11 months, relative to the CHC structure (SRMR = .042, RMSEA = .072, CFI = .946).

As shown in Figure 3, with the exception of the Arithmetic with Gsm loading of .28, the factor loadings for the final model are robust and consistent with CHC theory. Second-order factor loadings (i.e., loadings of broad CHC abilities on psychometric g) are all robust. Gs was found to have the lowest second-order loading (.66), whereas Gf was found to have the strongest (.99). In order to better understand the nature of these second-order loadings, we examined the indirect effects of psychometric g on subtests. These indirect effects, which are presented in Table 3, can be interpreted as estimates of subtests' g loadings. Cancellation was found to have a weak g loading of .37. All other subtests have g loadings above .50. Figure Weights had the highest g loading (.78), followed closely by Arithmetic (.75) and Vocabulary (.74).

The high g loadings of two Gf tasks that seemingly measure QR led us to test an additional model with a latent QR factor consisting of Figure Weights and Arithmetic. This model yielded a statistically significant drop in χ2, Δχ2(1) = 6.630, p = .010, and a modest drop in the AIC relative to the final CHC model. The first-order factor loadings on QR were .80 for Figure Weights and .54 for Arithmetic. In turn, loadings on Gf in this model consisted of .96 for QR and .69 for Matrix Reasoning.

Research Question 3: Cross-Age Invariance

The third research question was addressed with a series of
tion in AIC value and a statistically significant drop in ␹2, this analyses designed to test both the magnitude and location of bias

Table 2
Summary of Cross-Validation Results for Plausible Alternative Models

Model                            χ²(df)         p      Δχ²       Δdf   p      RMSEA   SRMR   TLI    AIC
WAIS–IV scoring model            375.312 (86)   .000   —         —     —      .065    .043   .947   443.312
Initial CHC model                313.957 (85)   .000   —         —     —      .058    .039   .957   383.957a
AR with Gsm cross-load           282.329 (84)   .000   31.628b   1     .000   .054    .037   .962   354.329
AR with Gsm and Gc cross-load    259.419 (83)   .000   22.910a   1     .000   .052    .035   .966   333.419
PCm with Gc cross-load           309.049 (84)   .000   4.908b    1     .027   .058    .037   .957   381.049
CA with Gsm cross-load           312.460 (84)   .000   1.497b    1     .221   .058    .038   .957   384.460

Note. N = 200. WAIS–IV = Wechsler Adult Intelligence Scale—Fourth Edition; CHC = Cattell–Horn–Carroll; Gf = fluid reasoning; Gv = visual processing; Gc = crystallized intelligence; Gsm = short-term memory; AR = Arithmetic; FW = Figure Weights; PCm = Picture Completion; CA = Cancellation; RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual; TLI = Tucker–Lewis index; AIC = Akaike information criterion.
a Compared with previous model. b Compared with initial CHC model.
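The Δχ² and AIC columns in Table 2 can be reproduced from the reported χ² and df values. The following stdlib-Python sketch is illustrative, not part of the original analysis; the AIC = χ² + 2q convention (with q free parameters, inferred here from the 15 observed variables) matches the Amos-style values reported in the table, and the closed-form chi-square tails cover the 1- and 2-df difference tests used in the text.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """P(chi-square(df) > x); closed forms cover the df = 1 and df = 2
    difference tests reported in Table 2 and the text."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise NotImplementedError("use an incomplete-gamma routine for other df")

def chi2_difference(chisq_free, df_free, chisq_constrained, df_constrained):
    """Chi-square difference (likelihood-ratio) test for nested models."""
    delta = chisq_constrained - chisq_free
    ddf = df_constrained - df_free
    return delta, ddf, chi2_sf(delta, ddf)

def aic(chisq, df, n_vars=15):
    """Amos-style AIC = chi-square + 2q, where q is the number of free
    parameters (observed moments minus df for n_vars variables)."""
    q = n_vars * (n_vars + 1) // 2 - df
    return chisq + 2 * q

# Initial CHC model vs. the AR-with-Gsm cross-load model (Table 2):
delta, ddf, p = chi2_difference(282.329, 84, 313.957, 85)
print(round(delta, 3), ddf, p < 0.001)  # 31.628 1 True
print(round(aic(375.312, 86), 3))       # 443.312
```

Note, for example, that the nonsignificant CA-with-Gsm row reproduces as chi2_sf(1.497, 1) ≈ .221, matching the tabled p value.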

[Path diagram omitted: psychometric g loads on the five broad abilities (Gc, Gv, Gf, Gsm, Gs; e.g., .99 for Gf and .66 for Gs), and each of the 15 subtests loads on its corresponding broad ability, with Arithmetic cross-loading .28 on Gsm.]

Figure 3. Final cross-validated Cattell–Horn–Carroll interpretative model, estimated with an averaged matrix for ages 16 years to 69 years, 11 months, of the standardization sample. The numbers next to u denote the subtest to which the unique subtest variance (u) corresponds. The numbers next to uf denote the first-order factor to which the unique factor variance (uf) corresponds. g = psychometric g; Gc = crystallized ability; Gv = visual processing; Gf = fluid reasoning; Gsm = short-term memory; and Gs = processing speed.
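The estimated g loadings in Table 3 are indirect effects: the product of a broad ability's second-order loading on g and the subtest's first-order loading on that broad ability. A minimal illustration (the .56 Cancellation-on-Gs loading is read from the Figure 3 path diagram; the .66 second-order Gs loading is reported in the text):

```python
# A subtest's estimated g loading is the indirect effect of g:
# (g -> broad ability) x (broad ability -> subtest).
def indirect_g_loading(g_to_factor: float, factor_to_subtest: float) -> float:
    return g_to_factor * factor_to_subtest

# Cancellation: Gs carries the lowest second-order loading (.66), and
# Cancellation loads .56 on Gs, reproducing its .37 entry in Table 3.
print(round(indirect_g_loading(0.66, 0.56), 2))  # 0.37
```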

(Keith, 2005). The first step provided a general test of invariance that is independent of a specific factor structure, and a specific factor structure was imposed as a second step. Thus, observed differences in fit between the models examined in these steps reflected lack of invariance arising from the factor structure. Next, the location of bias was identified by systematically relaxing model parameters (i.e., factor loadings, higher order factor loadings, factor variances, and unique variances).

Table 3
Estimated g Loadings of WAIS–IV Subtests

Subtest                      g loading
Arithmetic                   .75
Block Design                 .70
Cancellation                 .37
Coding                       .53
Comprehension                .69
Digit Span                   .67
Figure Weights               .78
Information                  .65
Letter–Number Sequencing     .63
Matrix Reasoning             .69
Picture Completion           .53
Similarities                 .70
Symbol Search                .52
Visual Puzzles               .68
Vocabulary                   .74

Note. WAIS–IV = Wechsler Adult Intelligence Scale—Fourth Edition; g = psychometric g.

Results suggest that the covariance matrix is reasonably invariant across age groups (SRMR = .078, RMSEA = .024, CFI = .939). Although two of the fit indices supported invariance of the WAIS–IV factor structure (RMSEA = .023, CFI = .934), the SRMR value of .098 suggested that imposition of the WAIS–IV factor structure resulted in a large increase in the average difference between the observed subtest correlation matrix and the matrix implied by the structure. Moreover, this imposition resulted in a large and statistically significant increase in χ², Δχ²(112) = 180.098, p = .000. This means that although the WAIS–IV subtests appear to measure similar constructs with similar levels of measurement error across age levels, as indicated by consistency of subtest variance and covariance across different age groups, the WAIS–IV scoring structure does not provide a consistent description of test performance across age groups. Relaxing constraints on factor loadings resulted in a large and statistically significant drop in χ² relative to the model with all parameters constrained, Δχ²(88) = 140.822, p = .000. Thus, much of the lack of invariance across age groups appears to result from age-related differences in the strengths of relations between subtests and latent constructs. Given that a CHC model fits the standardization data better than does the WAIS–IV scoring structure, the nature of cross-age differences in factor loadings for the WAIS–IV scoring structure was not explored. Instead, we tested the hypothesis that the CHC model had invariant factor loadings across age-based groups.

Constraining factor loadings to be equal across age groups resulted in a large and statistically significant increase in χ²,

Δχ²(88) = 128.396, p = .003, relative to an unconstrained model. Thus, the lack of cross-age invariance for factor loadings was not resolved merely by adopting a CHC structure. Constraints for individual factor loadings and combinations of factor loadings were relaxed to determine which relations between subtests and latent constructs were accounting for the lack of invariance. Relaxing constraints on factor loadings for two tests hypothesized as being sensitive to age effects (i.e., Cancellation and Letter–Number Sequencing) while maintaining cross-age constraints for all other factor loadings resulted in a nonsignificant increase in χ² relative to an unconstrained model, Δχ²(72) = 89.460, p = .080. Thus, lack of cross-age invariance for factor loadings can be attributed to the Cancellation and Letter–Number Sequencing subtests. Inspection of factor loadings within each age group revealed variation in the magnitude of factor loadings for Cancellation and Letter–Number Sequencing. Standardized loadings for Cancellation ranged from highs of .64 at ages 20 years to 24 years, 11 months, and .62 at ages 16 years to 17 years, 11 months, to lows of .43 at ages 35 years to 44 years, 11 months, and .52 at ages 45 years to 54 years, 11 months. Standardized loadings for Letter–Number Sequencing ranged from a high of .87 at ages 25 years to 29 years, 11 months, to a low of .68 at ages 18 years to 19 years, 11 months, with the majority of loadings having a magnitude above .80. In addition to the 18 years to 19 years, 11 months age group, standardized loadings for Letter–Number Sequencing also fell below the .80 level for ages 35 years to 44 years, 11 months (.78), 45 years to 54 years, 11 months (.79), and 65 years to 69 years, 11 months (.77).

Invariance was indicated by a nonsignificant increase in χ² relative to an unconstrained model, Δχ²(80) = 96.092, p = .106, as well as by alternative fit indices (SRMR = .056, RMSEA = .019, CFI = .964), when cross-age constraints for Cancellation were removed only at ages 16 years to 17 years, 11 months, 20 years to 24 years, 11 months, 35 years to 44 years, 11 months, and 45 years to 54 years, 11 months, and cross-age constraints for Letter–Number Sequencing were removed only at ages 35 years to 44 years, 11 months, 45 years to 54 years, 11 months, and 65 years to 69 years, 11 months. Additionally, second-order loadings are invariant when the preceding age constraints are removed for relations between psychometric g and the broad abilities on which Letter–Number Sequencing and Cancellation load (i.e., Gsm and Gs, respectively), as indicated by a nonsignificant increase in χ² relative to an unconstrained model, Δχ²(96) = 112.081, p = .125, and alternative fit indices (SRMR = .061, RMSEA = .019, CFI = .964).

Discussion

The purpose of this study was to address some questions essential to the application of the WAIS–IV in intellectual assessment. First, does a CHC-based theoretical structure provide a better explanation of test performance than does the WAIS–IV scoring structure? Second, what exactly is the nature of the constructs measured by the WAIS–IV? Finally, does the WAIS–IV measure the same constructs across age levels?

The results of this study reveal that (a) a CHC-inspired structure provides a better description of test performance relative to the published scoring structure, (b) the WAIS–IV measures several abilities, including Gc, Gf, Gv, Gsm, Gs, and QR, (c) the scoring structure is not invariant across age groups, (d) adopting a CHC-inspired structure does not resolve the lack of invariance across age groups, and (e) lack of cross-age invariance can be attributed to inconsistent factor loadings for the Cancellation and Letter–Number Sequencing subtests on their corresponding broad abilities (i.e., Gs and Gsm, respectively). Implications of these findings and interpretative considerations are discussed in the following sections.

Does CHC Theory Provide a Superior Alternative for Test Interpretation?

The results of this study support use of an interpretative framework for WAIS–IV performance that is consistent with CHC theory. Given the popularity of CHC theory and cross-battery assessment (Alfonso et al., 2005), many test users are likely to use a CHC interpretive framework. Although the WAIS–IV scoring structure is largely congruent with CHC theory, there are some important differences to consider when interpreting test scores. Most notably, interpretation of the Perceptual Reasoning Index is inconsistent with CHC theory. The results of this study support interpreting separate Gf and Gv indexes rather than the Perceptual Reasoning Index. The remaining differences between the WAIS–IV scoring structure and CHC theory are discussed in the following section because these differences pertain to subtest factor loadings.

What Does the WAIS–IV Measure?

Consistent with the results of previous research regarding recent Wechsler revisions (Keith et al., 2006), results suggest that the Verbal Comprehension Index can be interpreted "as measuring verbal ability, comprehension, knowledge, and crystallized intelligence" (p. 123). As noted in Table 4, the WAIS–IV Perceptual Reasoning subtests appear to measure Gf and Gv. The analyses indicate that the Block Design, Visual Puzzles, and Picture Completion subtests can be grouped together and used as a measure of Gv. Likewise, the Matrix Reasoning, Figure Weights, and Arithmetic subtests can be interpreted as measures of Gf. Although Arithmetic was found to measure a complex mix of Gf and Gsm (and to a lesser extent Gc), it appears that it is primarily a measure of Gf. The results of this study also suggest that Figure Weights and Arithmetic can be combined to form a viable index of QR. As the WAIS–IV only contains three measures of Gf, two of which are measures of QR, we did not attempt to include QR in the final model. However, this does not preclude interpretation of the QR factor. The results of this study also suggest that Cancellation should not be used when calculating a Gs composite score. Finally, Letter–Number Sequencing appears to have an unstable factor loading across age groups, and Arithmetic appears to be a secondary measure of Gsm. Thus, Gsm composite scores are sensitive to age effects when using the Letter–Number Sequencing subtest, and Arithmetic does not appear to be an acceptable substitute when calculating this composite.

Using methods developed by Tellegen and Briggs (1967) and adapted by Sattler (2008), we have provided formulas in Table 5 that WAIS–IV users can use to calculate Gv, Gf, and QR composites for individuals ages 16 years to 69 years, 11 months. For Gf, if performance on the Arithmetic subtest is inconsistent with the

Table 4
CHC Abilities Measured by WAIS–IV Subscales

WAIS–IV subtest                CHC ability
Verbal Comprehension Index
  Similarities                 Gc
  Vocabulary                   Gc
  Information                  Gc
  Comprehension                Gc
Perceptual Reasoning Index
  Block Design                 Gv
  Matrix Reasoning             Gf
  Visual Puzzles               Gv
  Figure Weights               Gf/QR
  Picture Completion           Gv
Working Memory Index
  Digit Span                   Gsm
  Arithmetic                   Gf/QR (primary); Gsm (secondary)
  Letter–Number Sequencing     Gsm
Processing Speed Index
  Symbol Search                Gs
  Coding                       Gs
  Cancellation                 Gs

Note. CHC = Cattell–Horn–Carroll; Gc = crystallized intelligence; Gv = visual processing; Gf = fluid reasoning; Gsm = short-term memory; Gs = processing speed; QR = quantitative reasoning.

other two subtests, the sum of the other two scale scores (Matrix Reasoning and Figure Weights) can be used. Similarly, a clinician can sum Figure Weights and Arithmetic to calculate a QR composite. We calculated reliability coefficients for the preceding composites by inputting stability coefficients obtained from the WAIS–IV technical and interpretive manual (Wechsler, 2008b), as well as the correlation coefficients of component subtests, into a reliability formula developed by Tellegen and Briggs (1967). The resulting reliability coefficients were .89 for the Gv composite, .89 for the Gf composite obtained from three subtests, .84 for the Gf composite derived from two subtests, and .88 for the QR composite. The levels of reliability reflected by these coefficients suggest it is appropriate to use these composite scores for making individual clinical decisions. Thus, these composites have usefulness for practitioners wishing to use a CHC interpretative approach. These composites also will be useful to practitioners wishing to use a cross-battery approach to assessment (Flanagan, Ortiz, & Alfonso, 2007).

Table 5
Formulas for Calculating CHC Composites From Corresponding Subscales

Composite                      Subtests (n)   Formula
Visual processing (Gv)         3              1.995 × (BD + VP + PCm) + 40.143
Fluid reasoning (Gf a)         3              1.980 × (MR + FW + AR) + 40.614
Fluid reasoning (Gf b)         2              2.822 × (MR + FW) + 43.567
Quantitative reasoning (QR)    2              2.786 × (FW + AR) + 44.272

Note. Gf a includes all three subtests of Gf. Gf b does not include the Arithmetic subtest. As an example, the visual processing composite for an examinee whose BD scaled score = 8, VP scaled score = 6, and PCm scaled score = 11 would be 1.995 × 25 + 40.143 = 90.018, which rounds to a standard score of 90. Gv = visual processing; Gf = fluid reasoning; CHC = Cattell–Horn–Carroll; BD = Block Design scaled score; VP = Visual Puzzles scaled score; PCm = Picture Completion scaled score; MR = Matrix Reasoning scaled score; FW = Figure Weights scaled score; AR = Arithmetic scaled score.

Does the WAIS–IV Measure the Same Constructs Across Age Levels?

The results of this study provide modest support for the conclusion that the WAIS–IV measures the same constructs across age groups. The SRMR value of .078 obtained when constraining the covariance matrix suggests that some age differences exist, although these differences may not be large enough to be practically meaningful. The fact that the Figure Weights and Cancellation subtests are not given to individuals age 70 years or older suggests that test developers were sensitive to age differences in performance. However, the results of this study do not support invariance of the WAIS–IV scoring model. Thus, test scores derived from this scoring system can be considered to have somewhat different meanings across age groups.

Subsequent analyses revealed that the observed lack of cross-age invariance resulted from cross-age differences in the magnitude of factor loadings. This lack of invariance was not resolved merely by adopting a CHC structure. Further analyses revealed that the lack of cross-age invariance for factor loadings could be attributed to age-related variation in the magnitudes of the Cancellation and Letter–Number Sequencing subtests' loadings on their corresponding broad abilities (i.e., Gs and Gsm, respectively). The seemingly erratic pattern of age-related variation observed for these factor loadings, as well as the modest sample size for each age-differentiated subgroup, suggests that this variation may arise from sampling bias rather than age-related changes in cognitive functioning.

Limitations and Directions for Future Research

Understanding the factor structure of test instruments provides valuable interpretive information for clinicians who are seeking to understand the constructs measured by a particular instrument. To that end, it is important to note that not every subtest of the WAIS–IV is administered to every client. In particular, examinees over the age of 70 years are not assessed with three of the subscales. Because the present analysis excluded examinees age 70 years or older, practitioners will not be able to apply these findings to clients in this age range.

It is also important to keep in mind that a comparison of the WAIS–IV with other measures was not included in this study. A substantive study with multiple measures of intelligence is needed to clarify the narrow and broad abilities measured, which will improve understanding of what constructs the WAIS–IV measures (Keith, Kranzler, & Flanagan, 2001). For example, although the Arithmetic subtest was intended to be a Gsm measure, when the Figure Weights subtest is included in analyses, Arithmetic appears to be a better measure of Gf, and more narrowly QR, than it is of Gsm. Simultaneous examination of multiple measures of intelligence would help clarify other relationships such as this example that are obscured when the structure of a small (relative to a
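The Table 5 formulas and the composite-reliability computation described above can be applied mechanically. The sketch below is ours, not the authors': the dictionary keys and helper names are invented for illustration; the weights and constants come from Table 5, and the reliability function implements the standard true-variance-over-total-variance form for an equally weighted composite of standardized subtests associated with Tellegen and Briggs (1967).

```python
# Scoring helpers for the Table 5 composites and for a Tellegen-and-Briggs-style
# composite reliability. TABLE_5 maps a composite label (our naming) to
# (weight, component subtests, additive constant) exactly as printed in Table 5.
TABLE_5 = {
    "Gv":  (1.995, ("BD", "VP", "PCm"), 40.143),
    "Gf3": (1.980, ("MR", "FW", "AR"), 40.614),
    "Gf2": (2.822, ("MR", "FW"), 43.567),
    "QR":  (2.786, ("FW", "AR"), 44.272),
}

def composite(name, scaled_scores):
    """Standard-score composite from subtest scaled scores, per Table 5."""
    weight, subtests, constant = TABLE_5[name]
    # Table 5's worked example rounds 90.018 to a standard score of 90.
    return round(weight * sum(scaled_scores[s] for s in subtests) + constant)

def composite_reliability(reliabilities, intercorrelations):
    """Reliability of an equally weighted composite of standardized subtests:
    true-score variance over total variance of the sum."""
    twice_cov = 2 * sum(intercorrelations)
    return (sum(reliabilities) + twice_cov) / (len(reliabilities) + twice_cov)

# Table 5's worked example: BD = 8, VP = 6, PCm = 11 -> Gv composite of 90.
print(composite("Gv", {"BD": 8, "VP": 6, "PCm": 11}))  # 90
```

The reliability inputs would be the subtests' stability coefficients and intercorrelations from the technical manual; the values passed in any trial run of `composite_reliability` are hypothetical, not the manual's figures.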

study with multiple measures of intelligence) set of subtests is examined.

Despite these limitations, this study is the first known independent confirmatory factor analysis in which the construct validity of the WAIS–IV is investigated. This empirical study provides support for those who wish to use the WAIS–IV as a CHC measure. Similarly, evaluators who are familiar with CHC theory now know which subscales relate to which CHC ability and know the formulas necessary to calculate these composites.

References

Alfonso, V. C., Flanagan, D. P., & Radwan, S. (2005). The impact of Cattell–Horn–Carroll theory on test development and interpretation of cognitive and academic abilities. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (2nd ed., pp. 185–202). New York, NY: Guilford Press.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing (3rd ed.). Washington, DC: AERA.
Arbuckle, J. L. (2007). Amos 7.0 [Computer software]. Chicago, IL: Smallwaters.
Barrett, P. (2007). Structural equation modeling: Adjudging model fit. Personality and Individual Differences, 42, 815–824.
Benson, J. (1998). Developing a strong program of construct validation: A test anxiety example. Educational Measurement: Issues and Practice, 17, 10–17.
Bentler, P. M. (2007). On tests and indices for evaluating structural models. Personality and Individual Differences, 42, 825–829.
Bickley, P. G., Keith, T. Z., & Wolfle, L. M. (1995). The three-stratum theory of cognitive abilities: Test of the structure of intelligence across the life span. Intelligence, 20, 309–328.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York, NY: Cambridge University Press.
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9, 233–255.
Fan, X., & Sivo, S. A. (2005). Sensitivity of fit indexes to misspecified structural or measurement model components: Rationale of two-index strategy revisited. Structural Equation Modeling, 12, 343–367.
Flanagan, D. P., & McGrew, K. S. (1997). A cross-battery approach to assessing and interpreting cognitive abilities: Narrowing the gap between practice and cognitive science. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 314–325). New York, NY: Guilford Press.
Flanagan, D. P., Ortiz, S. O., & Alfonso, V. C. (2007). Essentials of cross-battery assessment (2nd ed.). Hoboken, NJ: Wiley.
Gustafsson, J. E. (1984). A unifying model for the structure of intellectual abilities. Intelligence, 8, 179–203.
Horn, J. L., & Noll, J. (1997). Human cognitive capabilities: Gf-Gc theory. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 53–91). New York, NY: Guilford Press.
Hoyle, R. H., & Panter, A. T. (1995). Writing about structural equation modeling. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 158–176). Thousand Oaks, CA: Sage.
Jensen, A. R. (1987). Individual differences in mental ability. In J. A. Glover & R. R. Ronning (Eds.), Historical foundations of educational psychology (pp. 61–88). New York, NY: Plenum.
Kamphaus, R. W., Winsor, A. P., Rowe, E. W., & Kim, S. (2005). A history of intelligence test interpretation. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (2nd ed., pp. 23–38). New York, NY: Guilford Press.
Kaufman, A. S. (1994). Intelligence testing with the WISC–III. New York, NY: Wiley.
Kaufman, A. S., & Lichtenberger, E. O. (2006). Assessing adolescent and adult intelligence. Hoboken, NJ: Wiley.
Keith, T. Z. (2005). Using confirmatory factor analysis to aid in understanding the constructs measured by intelligence tests. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (2nd ed., pp. 581–614). New York, NY: Guilford Press.
Keith, T. Z., Fine, J. G., Taub, G. E., Reynolds, M. R., & Kranzler, J. H. (2006). Higher order, multisample, confirmatory factor analysis of the Wechsler Intelligence Scale for Children—Fourth Edition: What does it measure? School Psychology Review, 35, 108–127.
Keith, T. Z., Kranzler, J. H., & Flanagan, D. P. (2001). What does the Cognitive Assessment System (CAS) measure? Joint confirmatory factor analysis of the CAS and the Woodcock–Johnson Tests of Cognitive Abilities (3rd ed.). School Psychology Review, 30, 89–119.
Kline, R. B. (2005). Principles and practice of structural equation modeling (2nd ed.). New York, NY: Guilford Press.
Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3, 635–694.
McGrew, K. S. (1997). Analysis of the major intelligence batteries according to a proposed comprehensive Gf-Gc framework. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 151–179). New York, NY: Guilford Press.
McGrew, K. S. (2005). The Cattell–Horn–Carroll theory of cognitive abilities: Past, present, and future. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (2nd ed., pp. 136–181). New York, NY: Guilford Press.
Meade, A. W., Johnson, E. C., & Braddy, P. W. (2008). Power and sensitivity of alternative indices in tests of measurement invariance. Journal of Applied Psychology, 93, 568–592.
Miles, J., & Shevlin, M. (2007). A time and place for incremental fit indices. Personality and Individual Differences, 42, 869–874.
Sattler, J. M. (2008). Resource guide to accompany assessment of children: Cognitive foundations (5th ed.). La Mesa, CA: Jerome M. Sattler.
Tellegen, A., & Briggs, P. F. (1967). Old wine in new skins: Grouping Wechsler subtests into new scales. Journal of Consulting Psychology, 31, 499–506.
Wechsler, D. (1939). Wechsler–Bellevue Intelligence Scale. New York, NY: Psychological Corporation.
Wechsler, D. (1940). Nonintellective factors in general intelligence. Psychological Bulletin, 37, 444–445.
Wechsler, D. (2003). Wechsler Intelligence Scale for Children—Fourth Edition. San Antonio, TX: Harcourt.
Wechsler, D. (2008a). Wechsler Adult Intelligence Scale—Fourth Edition. San Antonio, TX: Pearson Assessment.
Wechsler, D. (2008b). Wechsler Adult Intelligence Scale—Fourth Edition: Technical and interpretive manual. San Antonio, TX: Pearson Assessment.
Yuan, K. H. (2005). Fit indices versus test statistics. Multivariate Behavioral Research, 40, 115–148.
Zhu, J., & Weiss, L. (2005). The Wechsler Scales. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (2nd ed., pp. 297–324). New York, NY: Guilford Press.

Received June 25, 2009
Revision received August 25, 2009
Accepted August 30, 2009