Aging and Vocabulary Scores: A Meta-Analysis
Paul Verhaeghen
Syracuse University
Vocabulary scores were examined in a total of 210 articles, containing 324 independent pairings of younger and older adults, from the 1986–2001 issues of Psychology and Aging. The average effect size, favoring the old, was 0.80 SD. Production tests yielded smaller effects (0.68 SD) than multiple-choice tests (0.93 SD). Both age and education were found to be partially independent determinants of performance in production tests; age effects disappeared in multiple-choice tests as soon as education was taken into account. In addition, the Wechsler Adult Intelligence Scale–Revised Vocabulary subtest (D. Wechsler, 1981) was also found to be sensitive to the Flynn effect (J. R. Flynn, 1987; i.e., increasing test scores with advancing birth year). The results question the approach of using age-group equality in vocabulary scores as a check on sample equivalence.
One of the oldest findings in the cognitive aging literature is that there is a sharp distinction in age trajectories between tests tapping experience and knowledge, which show little or no decline over the life span, and tests requiring online processing and mental manipulation, which do show decline. This distinction has been captured under names such as acquired abilities versus basic intelligence (Jones & Conrad, 1933), crystallized versus fluid intelligence (Horn & Cattell, 1967), or the pragmatics and mechanics of intelligence (Baltes, 1987). In aging research, this knowledge or experience component is most often measured by vocabulary tests, in which participants have to define words (i.e., a production task) or choose from among alternatives a close synonym (sometimes an antonym) for a word presented (i.e., a multiple-choice task). Vocabulary test scores are often used (e.g., Lezak, 1995, p. 103) to estimate premorbid ability level. In fact, one of the most commonly used tests in aging research, the Shipley Institute of Living Scale (Shipley Scale; Shipley, 1946), was explicitly designed to test for mental deterioration in the context of organic or mental disorders. Likewise, Wechsler (1958) devised a deterioration quotient, in which performance on tests that are presumably insensitive to changes associated with aging is compared with performance on tests that show age-related decline, and vocabulary is one of the tests considered aging insensitive. If the Lezak–Shipley–Wechsler conjecture is correct, then results concerning the size of age differences in fluid cognition are only valid if the vocabulary scores of the younger and older adults tested in cognitive aging studies are identical.

It is interesting to note that although many narrative and meta-analytic reviews exist on age relations in fluid intelligence, no such review exists, to my knowledge, concerning vocabulary scores. This article aims at overcoming this oversight. I present results from a meta-analysis conducted on articles containing vocabulary scores, as published in the 1986–2001 volumes of one journal, Psychology and Aging. The aim of the analysis is, first, to describe age differences in vocabulary scores and, second, to investigate the influence and confluence of selected independent variables on vocabulary scores. The first variable considered is year of publication, which can be considered an index for historical change, birth cohort membership, or both (in a cross-sectional design, these cannot be distinguished; Schaie, 1996). The second variable considered is level of education, which is expected to correlate positively with the size of an individual's vocabulary (Lezak, 1995, p. 539 and following). The third variable is the type of vocabulary task, that is, whether the test format is multiple choice or whether the participant is required to provide word definitions (i.e., a production task). Production tasks require word finding, and multiple-choice tasks require correct recognition, and older adults may have trouble with the former and not the latter (e.g., Burke, MacKay, & James, 1999). The fourth variable is chronological age. The descriptive analysis mainly consists of calculating mean standardized differences between young and older adults; for the exploration of the influence of year of publication, task type, education, and age, both mean standardized differences and raw scores on the two most frequently used tests (the Wechsler Adult Intelligence Scale–Revised [WAIS–R] Vocabulary subtest [Wechsler, 1981] and the Shipley scale) were used.
Method
Sample of Studies
Because many studies on cognition and aging report vocabulary measures as part of the descriptive information about their samples, I decided to deviate from the usual sampling approach in meta-analysis (i.e., using search terms in search engines, which would not be a successful strategy) and to examine the full population of articles of the main journal in the field, Psychology and Aging. All studies were included that (a) reported a measure of vocabulary and (b) examined a sample of younger adults (average age older than 18 and younger than 30; we allowed one sample of prospective and first-year Harvard students with a mean age of 17.8 that was part of a multiexperiment study; Schacter, Koutstaal, Johnson, Gross, & Angell, 1997) as well as a sample of older adults (average age of 60 or older). All issues of Volumes 1–16 (1986–2001) were hand searched. A total of 210 articles were included in the database, containing a total of 324 independent pairings of young and older adults.¹ Descriptive information for the data set is reported in Table 1.

Paul Verhaeghen, Department of Psychology and Center for Health and Behavior, Syracuse University.
This research was supported in part by a grant from the National Institute on Aging (AG-16201).
Correspondence concerning this article should be addressed to Paul Verhaeghen, Department of Psychology, 430 Huntington Hall, Syracuse University, Syracuse, New York 13244-2340. E-mail: pverhaeg@psych.syr.edu

Psychology and Aging, 2003, Vol. 18, No. 2, 332–339. Copyright 2003 by the American Psychological Association, Inc. 0882-7974/03/$12.00 DOI: 10.1037/0882-7974.18.2.332
Tests Included 
Five tests were used in more than 10 studies: the Vocabulary subtest of the WAIS (Wechsler, 1955) or the WAIS–R (Wechsler, 1981), the Mill-Hill Vocabulary Scale (Raven, 1982), the Nelson–Denny Reading Test (Nelson & Denny, 1960), the Shipley scale (Shipley, 1946), and one of the vocabulary tests included in the Educational Testing Services (ETS) Kit of Factor Referenced Tests (Ekstrom, French, Harman, & Derman, 1976). Of these, the WAIS and WAIS–R are production tests (i.e., the participant supplies a dictionary-like description for each of a series of words presented); the other tests are multiple-choice tests (i.e., the participant chooses a synonym or description among a set of alternatives for each of a series of words presented).
Statistical Analyses
Two types of analyses were conducted. First, traditional effect-size analysis (Hedges & Olkin, 1985) was used to determine the size of the age effect on vocabulary scores. Size of the effect was expressed as the mean standardized difference, that is, the mean of older adults' performance minus younger adults' performance, divided by the pooled standard deviation. When mean or standard deviation was not reported, inferential statistics, if available, were used to determine effect sizes. An overall effect size and separate average effect sizes for each test and for each type of task (i.e., production vs. multiple choice) were calculated. Multiple regression analysis was used to investigate the possible influence of historic differences, age differences, differences in educational level, and task type in level of education on the effect.

The second type of analysis concerned a within-task analysis of the two types of vocabulary measures most frequently reported, namely the WAIS–R Vocabulary subtest (a production task) and the Shipley scale (a multiple-choice task). For both of these tasks, I investigated the influence of historical differences, age, and educational level on the raw scores of the test in a series of weighted least squares regression analyses, weighting for sample size.
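
The mean standardized difference described above can be illustrated with a minimal Python sketch. It assumes the usual pooled standard deviation for two independent groups; the function names and the numbers in the example are hypothetical and are not taken from the data set, and no small-sample correction is applied because the text does not mention one.

```python
import math

def pooled_sd(sd_young, n_young, sd_old, n_old):
    """Pooled standard deviation of two independent groups (assumed formula)."""
    numerator = (n_young - 1) * sd_young ** 2 + (n_old - 1) * sd_old ** 2
    return math.sqrt(numerator / (n_young + n_old - 2))

def standardized_difference(mean_old, sd_old, n_old, mean_young, sd_young, n_young):
    """Older-minus-younger mean difference, expressed in pooled-SD units."""
    return (mean_old - mean_young) / pooled_sd(sd_young, n_young, sd_old, n_old)

# Hypothetical study: illustrative numbers only, not from the article.
d = standardized_difference(mean_old=60.0, sd_old=10.0, n_old=40,
                            mean_young=52.0, sd_young=10.0, n_young=40)
print(round(d, 2))
```

A positive value means the older group scored higher, which matches the sign convention used for the effect sizes in this article.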
Results
 Effect-Size Analysis
Averaged effect sizes are reported in Table 2. All effect sizes are significant and favor the older adults, as indicated by the finding that all lower limits of the 95% confidence intervals are larger than zero. The overall effect size, highly heterogeneous (Q_w = 1563.12), is 0.80. When the sample was split into a group of effect sizes derived from production tasks and a group derived from multiple-choice tasks, it was found that the former grouping yielded a significantly smaller effect size than the latter (Q_B = 24.24), but each of these remained heterogeneous. Splitting the sample by test, likewise, did not result in homogeneity of effect sizes, with the exception of the mean weighted effect size for the Nelson–Denny test. Figure 1 offers a funnel plot of the data (Light & Pillemer, 1982). In a funnel plot, sample size is plotted against effect size. If the law of large numbers holds, then these plots should be inverse-funnel shaped; that is, with increasing numbers of participants, the effect sizes should become increasingly less variable and closer to the average value. Note, first, that this seems to be the case here, and, second, that the funnel plot is very regular, that is, there are no gaps or asymmetries, with the exception of a clear outlier with an effect size larger than 7. Removing this outlier did not change the results of any of our analyses (this study had a very small sample size of 18 younger and 18 older adults, and in all analyses, the appropriate weighting coefficients were used as outlined in Hedges and Olkin [1985], and these coefficients are a function of sample size). In a first exploration of heterogeneity, we identified all studies with a standardized residual larger than 3 or smaller than −3. Removing those from the data set did not result in homogeneity.

The source of the heterogeneity was further explored in a series of multiple regression analyses, using the method advocated by Hedges and Olkin (1985). The results are reported in Table 3. In a first regression model (Model 1), the predictors were (a) year of publication, (b) a dummy variable coding for production versus multiple-choice format, (c) the older–younger difference in age in the study, and (d) the older–younger difference in years of education in the study (k = 222). The results indicate that larger age differences in the sample led to larger effect sizes (i.e., to larger age differences in vocabulary scores favoring older adults), larger education differences in the sample led to larger effect sizes, and multiple-choice tests yielded larger effect sizes than production tests. The coefficient associated with year of publication was not significant. To test whether age differences and education differences have differential influences in production and multiple-choice tests, a second model (Model 2) was run, in which the interaction between the age difference and response format was introduced, as well as the interaction between educational level difference and response format. Of the interaction terms, only the term involving the age difference became significant, indicating that the effects of differences in chronological age varied significantly between response formats.

Given that response format interacted with age, a follow-up multiple regression analysis was conducted, splitting the sample according to response format (Table 4). In studies using a production format (k = 123), both the age difference and the difference in years of education influenced the effect size, with larger age differences and larger education differences leading to larger differences in vocabulary scores favoring the older adults. In multiple-choice tests (k = 99), the difference in educational level was significantly associated with effect size (larger differences in education leading to larger age differences in vocabulary favoring the old), but the age difference was not.

¹ Given the size of this data set, I did not list these references in the Reference section; however, interested readers can obtain the full data set in the form of an Excel spreadsheet from Paul Verhaeghen or from the Internet on the PsycARTICLES database at http://spider.apa.org/ftacomp/index.cfm?CFID=2081538&CFTOKEN=5006231.

Table 1
Descriptive Statistics for the Meta-Analytic Data Set (k = 324)

Variable                                            k       M      SD  Minimum  Maximum
No. of younger participants                       320   35.40   27.39        6      244
No. of older participants                         321   40.01   41.99        6      429
Age (younger)                                     319   21.39    2.54     17.8     34.6
Age (older)                                       320   70.42    2.60       62     79.9
Age difference (older − younger)                  319   49.04    3.69     36.1     61.4
Years of education (younger)                      249   14.13    1.12     11.9     17.5
Years of education (older)                        257   15.04    1.40      9.5     17.8
Years of education difference (older − younger)   249    0.91    1.66    −3.95      4.1
Proportion of women (younger)                     166   58.25   21.69        0      100
Proportion of women (older)                       167   54.92   23.21        0      100

Note. k = number of studies.

Table 2
Effect Sizes (Mean Standardized Differences) for Age in Vocabulary Scores

Vocabulary measure          k   Effect size   LL of 95% CI   UL of 95% CI        Q_w
All measures              279          0.80           0.77           0.83   1563.12a
Production                142          0.68           0.63           0.72    873.18a
Multiple choice           136          0.93           0.89           0.97    608.71a
WAIS                       17          0.63           0.52           0.73     65.15a
WAIS–R                     88          0.69           0.63           0.75    586.64a
WAIS–R (second half)       11          0.88           0.74           1.02     24.23a
Mill-Hill                  28          0.85           0.73           0.97     97.73a
Nelson–Denny               10          1.62           1.42           1.81     10.42
Shipley                    44          0.86           0.79           0.93    171.90a
ETS (diverse measures)     31          1.12           1.04           1.21     65.15a

Note. k = number of studies; effect size = average weighted effect size for age (positive values denote that older adults score higher than younger adults); LL = lower limit; CI = confidence interval; UL = upper limit; Q_w = within-group homogeneity (chi-square distributed with df = k − 1); WAIS–R = Wechsler Adult Intelligence Scale–Revised; Mill-Hill = Mill-Hill Vocabulary Scale; Nelson–Denny = Nelson–Denny Reading Test; Shipley = Shipley's Institute of Living Scale; ETS = Educational Testing Services.
a Significant heterogeneity at p < .05.
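
To make the columns of such a table concrete, the following Python sketch computes a weighted mean effect size, its 95% confidence limits, and the within-group homogeneity statistic for a handful of studies. It is an illustration under stated assumptions, not the author's actual computation: it assumes inverse-variance weights and the common large-sample variance approximation for a standardized mean difference, and the function names and study values are hypothetical.

```python
import math

def d_variance(d, n_young, n_old):
    """Large-sample variance of a standardized mean difference (assumed formula)."""
    total = n_young + n_old
    return total / (n_young * n_old) + d ** 2 / (2 * total)

def summarize_effects(studies):
    """Summarize a list of (d, n_young, n_old) tuples, one per young-old pairing."""
    weights = [1.0 / d_variance(d, n_y, n_o) for d, n_y, n_o in studies]
    effects = [d for d, _, _ in studies]
    weight_sum = sum(weights)
    # Inverse-variance weighted mean: larger samples count more.
    mean_d = sum(w * d for w, d in zip(weights, effects)) / weight_sum
    half_width = 1.96 * math.sqrt(1.0 / weight_sum)
    # Weighted squared deviations from the mean: the homogeneity statistic.
    q_w = sum(w * (d - mean_d) ** 2 for w, d in zip(weights, effects))
    return {
        "k": len(studies),
        "mean_d": mean_d,
        "ll": mean_d - half_width,
        "ul": mean_d + half_width,
        "q_w": q_w,
        "df": len(studies) - 1,
    }

# Hypothetical studies: illustrative values only, not from the data set.
hypothetical_studies = [(0.60, 20, 24), (0.95, 40, 36), (0.75, 18, 18)]
summary = summarize_effects(hypothetical_studies)
print(summary)
```

In this sketch, heterogeneity would be judged by comparing `q_w` with a chi-square distribution on `df` degrees of freedom; that comparison is left out to keep the example free of third-party dependencies.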
Figure 1. Funnel plot of effect sizes (effect size as a function of number of participants in each study; k = 279).
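
The inverse-funnel shape expected in such a plot can be illustrated with a small simulation. The sketch below is a hypothetical example, not a reanalysis of the data: it assumes normally distributed scores, equal group sizes, and a true effect of 0.8 SD, and it shows that observed effect sizes scatter less around the true value as the number of participants grows.

```python
import random
import statistics

def simulate_d(n_per_group, true_d, rng):
    """Draw one simulated study and return its observed standardized difference."""
    young = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    old = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
    # With equal group sizes, the pooled variance is the mean of the two variances.
    pooled_var = (statistics.variance(young) + statistics.variance(old)) / 2
    return (statistics.mean(old) - statistics.mean(young)) / pooled_var ** 0.5

def spread_of_d(n_per_group, true_d=0.8, replications=300, seed=1):
    """Standard deviation of observed effect sizes across simulated studies."""
    rng = random.Random(seed)
    observed = [simulate_d(n_per_group, true_d, rng) for _ in range(replications)]
    return statistics.stdev(observed)

# Spread of effect sizes narrows as sample size increases (the funnel).
for n in (10, 40, 160):
    print(n, round(spread_of_d(n), 3))
```

Plotting sample size against the simulated effect sizes would give the funnel; the printed spreads summarize the same pattern numerically.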