& Angell, 1997) as well as a sample of older adults (average age of 60 or older). All issues of Volumes 1–16 (1986–2001) were hand searched. A total of 210 articles were included in the database, containing a total of 324 independent pairings of young and older adults.¹ Descriptive information for the data set is reported in Table 1.
Tests Included
Five tests were used in more than 10 studies: the Vocabulary subtest of the WAIS (Wechsler, 1955) or the WAIS–R (Wechsler, 1981), the Mill-Hill Vocabulary Scale (Raven, 1982), the Nelson–Denny Reading Test (Nelson & Denny, 1960), the Shipley scale (Shipley, 1946), and one of the vocabulary tests included in the Educational Testing Service (ETS) Kit of Factor Referenced Tests (Ekstrom, French, Harman, & Derman, 1976). Of these, the WAIS and WAIS–R are production tests (i.e., the participant supplies a dictionary-like description for each of a series of words presented); the other tests are multiple-choice tests (i.e., the participant chooses a synonym or description from a set of alternatives for each of a series of words presented).
Statistical Analyses
Two types of analyses were conducted. First, traditional effect-size analysis (Hedges & Olkin, 1985) was used to determine the size of the age effect on vocabulary scores. Effect size was expressed as the standardized mean difference, that is, the mean performance of older adults minus the mean performance of younger adults, divided by the pooled standard deviation. When means or standard deviations were not reported, inferential statistics, if available, were used to determine effect sizes. An overall effect size and separate average effect sizes for each test and for each type of task (i.e., production vs. multiple choice) were calculated. Multiple regression analysis was used to investigate the possible influence of historical differences, age differences, differences in educational level, and task type on the effect size.
The second type of analysis was a within-task analysis of the two types of vocabulary measures most frequently reported, namely the WAIS–R Vocabulary subtest (a production task) and the Shipley scale (a multiple-choice task). For both of these tasks, I investigated the influence of historical differences, age, and educational level on the raw test scores in a series of weighted least squares regression analyses, weighting for sample size.
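The effect-size computations described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's code: it computes the standardized difference from reported means and standard deviations, uses the large-sample variance of d and fixed-effect inverse-variance weights described by Hedges and Olkin (1985), omits their small-sample bias correction, and splits the homogeneity statistic Q into a between-group part and a pooled within-group part by task type. The `Study` record, its field names, the helper functions, and the example values are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical summary record for one young/old pairing."""
    m_old: float
    sd_old: float
    n_old: int
    m_young: float
    sd_young: float
    n_young: int
    task: str  # "production" or "multiple_choice"


def effect_size(s: Study) -> float:
    """Standardized difference: (older mean - younger mean) / pooled SD."""
    pooled_var = ((s.n_old - 1) * s.sd_old ** 2
                  + (s.n_young - 1) * s.sd_young ** 2) / (s.n_old + s.n_young - 2)
    return (s.m_old - s.m_young) / math.sqrt(pooled_var)


def variance(d: float, n_old: int, n_young: int) -> float:
    """Large-sample variance of d (Hedges & Olkin, 1985); no bias correction."""
    n = n_old + n_young
    return n / (n_old * n_young) + d ** 2 / (2 * n)


def fixed_effect(ds, vs):
    """Inverse-variance weighted mean d, its 95% CI, and the homogeneity Q."""
    ws = [1.0 / v for v in vs]
    total_w = sum(ws)
    mean = sum(w * d for w, d in zip(ws, ds)) / total_w
    se = math.sqrt(1.0 / total_w)
    q = sum(w * (d - mean) ** 2 for w, d in zip(ws, ds))
    return mean, (mean - 1.96 * se, mean + 1.96 * se), q


def q_partition(studies):
    """Split total Q into a between-task part (Q_B) and pooled within-task Q."""
    ds = [effect_size(s) for s in studies]
    vs = [variance(d, s.n_old, s.n_young) for d, s in zip(ds, studies)]
    _, _, q_total = fixed_effect(ds, vs)
    q_within = 0.0
    for task in sorted({s.task for s in studies}):
        idx = [i for i, s in enumerate(studies) if s.task == task]
        _, _, q_group = fixed_effect([ds[i] for i in idx], [vs[i] for i in idx])
        q_within += q_group
    return q_total - q_within, q_within


# Hypothetical example data (not taken from the meta-analytic data set).
studies = [
    Study(30, 5, 20, 26, 5, 20, "production"),
    Study(32, 6, 40, 27, 6, 40, "production"),
    Study(35, 4, 25, 28, 4, 25, "multiple_choice"),
    Study(34, 5, 50, 29, 5, 50, "multiple_choice"),
]

ds = [effect_size(s) for s in studies]
vs = [variance(d, s.n_old, s.n_young) for d, s in zip(ds, studies)]
mean_d, ci, q = fixed_effect(ds, vs)
q_b, q_w = q_partition(studies)
print(f"mean d = {mean_d:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), Q = {q:.2f}")
print(f"Q_B = {q_b:.2f}, Q_W = {q_w:.2f}")
```

In this fixed-effect framework the total Q equals the between-group part plus the pooled within-group part. A between-group value that is large relative to a chi-square distribution with (number of groups − 1) degrees of freedom suggests that the groups differ in mean effect size, whereas a large within-group value indicates heterogeneity that the grouping does not explain.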
Results
Effect-Size Analysis
Averaged effect sizes are reported in Table 2. All effect sizes are significant and favor the older adults, as indicated by the finding that all lower limits of the 95% confidence intervals are larger than zero. The overall effect size, which is highly heterogeneous ($Q_w = 1563.12$), is 0.80. When the sample was split into a group of effect sizes derived from production tasks and a group derived from multiple-choice tasks, the former group yielded a significantly smaller effect size than the latter ($Q_B = 24.24$), but each group remained heterogeneous. Likewise, splitting the sample by test did not result in homogeneity of effect sizes, with the exception of the mean weighted effect size for the Nelson–Denny test.
Figure 1 offers a funnel plot of the data (Light & Pillemer, 1982). In a funnel plot, sample size is plotted against effect size. If the law of large numbers holds, these plots should be shaped like an inverted funnel; that is, with increasing numbers of participants, the effect sizes should become less variable and closer to the average value. Note, first, that this seems to be the case here, and, second, that the funnel plot is very regular, that is, there are no gaps or asymmetries, with the exception of a clear outlier with an effect size larger than 7. Removing this outlier did not change the results of any of our analyses (this study had a very small sample size of 18 younger and 18 older adults, and in all analyses the appropriate weighting coefficients, which are a function of sample size, were used as outlined in Hedges and Olkin, 1985). In a first exploration of heterogeneity, we identified all studies with a standardized residual larger than 3 or smaller than −3. Removing those studies from the data set did not result in homogeneity.
The source of the heterogeneity was further explored in a series of multiple regression analyses, using the method advocated by Hedges and Olkin (1985). The results are reported in Table 3. In a first regression model (Model 1), the predictors were (a) year of publication, (b) a dummy variable coding for production versus multiple-choice format, (c) the older–younger difference in age in the study, and (d) the older–younger difference in years of education in the study (k = 222). The results indicate that larger age differences in the sample led to larger effect sizes (i.e., to larger age differences in vocabulary scores favoring older adults), larger education differences in the sample led to larger effect sizes, and
¹ Given the size of this data set, I did not list these references in the Reference section; however, interested readers can obtain the full data set in the form of an Excel spreadsheet from Paul Verhaeghen or from the Internet on the PsycARTICLES database at http://spider.apa.org/ftacomp/index.cfm?CFID=2081538&CFTOKEN=5006231.
Table 1
Descriptive Statistics for the Meta-Analytic Data Set (k = 324)

Variable                                            k       M      SD  Minimum  Maximum
No. of younger participants                       320   35.40   27.39        6      244
No. of older participants                         321   40.01   41.99        6      429
Age (younger)                                     319   21.39    2.54     17.8     34.6
Age (older)                                       320   70.42    2.60       62     79.9
Age difference (older − younger)                  319   49.04    3.69     36.1     61.4
Years of education (younger)                      249   14.13    1.12     11.9     17.5
Years of education (older)                        257   15.04    1.40      9.5     17.8
Years of education difference (older − younger)   249    0.91    1.66    −3.95      4.1
Proportion of women (younger)                     166   58.25   21.69        0      100
Proportion of women (older)                       167   54.92   23.21        0      100

Note. k = number of studies.
AGING AND VOCABULARY META-ANALYSIS