JOURNAL OF VERBAL LEARNING AND VERBAL BEHAVIOR 12, 335-359 (1973)

The Language-as-Fixed-Effect Fallacy: A Critique of Language Statistics in Psychological Research

Herbert H. Clark¹
Stanford University

Current investigators of words, sentences, and other language materials almost never provide statistical evidence that their findings generalize beyond the specific sample of language materials they have chosen. Nevertheless, these same investigators do not hesitate to conclude that their findings are true for language in general. In so doing, it is argued, they are committing the language-as-fixed-effect fallacy, which can lead to serious error. The problem is illustrated for one well-known series of studies in semantic memory. With the appropriate statistics these studies are shown to provide no reliable evidence for most of the main conclusions drawn from them. A review of other experiments in semantic memory shows that many of them are likewise suspect. It is demonstrated how this fallacy can be avoided by doing the right statistics, selecting the appropriate design, and sampling by systematic procedures, or, alternatively, by proceeding according to the so-called method of single cases.

In 1964, Edmund B. Coleman published an important methodological paper called "Generalizing to a Language Population" in which he criticized some of the procedures psychologists were then using to deal with language samples in their study of verbal behavior.
As he put it, "Many studies of verbal behavior have little scientific point if their conclusions have to be restricted to the specific language materials that were used in the experiment. It has not been customary, however, to perform significance tests that permit generalization beyond these specific materials, and thus there is little statistical evidence that such studies could be successfully replicated if a different sample of language materials were used (p. 219)." Coleman then described available statistical procedures that would assure generality across language materials. Despite their importance, Coleman's criticisms got buried in the literature and have been all but totally ignored ever since. But if his criticisms were serious then, they are even more serious now, for there has been an increase in research in such areas as psycholinguistics, word perception, and semantic memory, areas particularly vulnerable to Coleman's criticisms. In the present paper, therefore, I would like to disinter Coleman's arguments from their premature grave, add a few arguments of my own, and then demonstrate with several specific examples how these arguments lead to serious doubts about the conclusions drawn in many well-known papers in verbal learning, human memory, and psycholinguistics.

¹ The preparation of this paper was supported in part by Public Health Service Grant MH-20021 from the National Institute of Mental Health. I am very grateful to William P. Banks, J. Merrill Carlsmith, Eve V. Clark, Douglas J. Herrmann, Peter Lucy, Lance J. Rips, and Edward J. Shoben for their helpful comments on the manuscript and to Thomas K. Landauer, David E. Meyer, and three anonymous reviewers for their detailed reviews of the paper. I am especially indebted to Edward E. Smith and Ewart A. C. Thomas for their generous and thoughtful counsel on many points in the paper.

Coleman's main point is best illustrated with a simple example.
Imagine that Baker and Reader are two psychologists interested in reading. Independently, they come up with the hypothesis that people can read, that is, perceive and vocalize, nouns faster than verbs. To test their hypotheses, each consults a dictionary, selects 10 nouns and 10 verbs at random, and collects reading latencies for the 20 words from each of 50 subjects. Let us assume, however, that contrary to their hypothesis nouns are in actuality exactly equal to verbs in reading latencies. Nevertheless, since the actual latencies for individual nouns and verbs vary from 500 to 1000 msec, the nouns in any particular sample will not be exactly equal to the verbs. So let us assume, quite plausibly, that in Baker's sample the nouns are actually 25 msec faster than the verbs, while in Reader's there is a 25 msec difference in the opposite direction. Independently, then, the two investigators tally their results, Baker finding a 30 msec difference in favor of the hypothesis, and Reader, a 35 msec difference against it. And since 42 out of 50 subjects showed the difference for Baker, and 45 out of 50 for Reader (both differences significant at p < .001 by a sign test), Baker reports to the public that he has reliable support for the hypothesis, while Reader reports that he has reliable evidence against it.

But how could either investigator have come to his conclusion (barring a one in a thousand statistical freak) when in fact there is a zero difference between nouns and verbs? The answer lies in their statistics. With their sign tests they have demonstrated, consistent with the true means in their respective samples, that the differences they found would replicate if they gave these same two samples of 20 words to new samples of subjects. They have not demonstrated, however, that their differences would replicate if they gave new samples of 10 nouns and 10 verbs to new samples of subjects. Nor would they be able to demonstrate this. If Baker and Reader had examined the individual mean latencies to their 20 words, they would have found that the 10 nouns ranged approximately from 500 to 1000 msec, and so did the 10 verbs. Thus, if either investigator had compared the 10 nouns against the 10 verbs by any conventional statistical test, he would have found no significant difference between nouns and verbs. So even though Baker's and Reader's findings should replicate with new samples of subjects, they should not, necessarily, with new samples of words. And this is why it was possible for Baker and Reader to come to exactly contrary conclusions, complete with "statistical" evidence.

In drawing their conclusions, therefore, Baker and Reader have committed a statistical error, one I will call the language-as-fixed-effect fallacy. In statistical jargon, they have treated Words as a fixed instead of a random effect, implicitly accepting the assumption that the 20 words they chose constitute the complete population of words they wish to generalize to. They have not presented any statistical evidence to show that their findings generalize beyond the 20 words they chose, yet they have drawn conclusions which presume that they have.

Although the errors in Baker's and Reader's studies are obvious, nearly every study in the current literature vulnerable to this fallacy exhibits the very same error.
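The two kinds of evidence in the Baker and Reader example can be checked numerically. The sketch below computes an exact two-sided sign test for the subject counts given above (42 of 50 and 45 of 50) and a pooled-variance t statistic over word means. The word means are hypothetical values invented for illustration, spaced evenly over roughly 500 to 1000 msec with the verbs 25 msec slower than the nouns; the function names are likewise assumptions and come from none of the studies discussed.

```python
from math import comb, sqrt
from statistics import mean, variance


def sign_test_p(successes: int, n: int) -> float:
    """Exact two-sided sign test p-value under H0: P(success) = 0.5."""
    k = max(successes, n - successes)
    upper_tail = sum(comb(n, x) for x in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


def pooled_t(a, b):
    """Student's t for two independent samples, pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


# By-subject evidence, using the counts stated in the example.
p_baker = sign_test_p(42, 50)
p_reader = sign_test_p(45, 50)

# By-word evidence on HYPOTHETICAL word means (msec): 10 nouns from 500 to
# 950 in steps of 50, and 10 verbs each 25 msec slower.
noun_means = [500 + 50 * k for k in range(10)]
verb_means = [m + 25 for m in noun_means]
t_items = pooled_t(verb_means, noun_means)

print(f"sign test p (Baker)  = {p_baker:.2e}")
print(f"sign test p (Reader) = {p_reader:.2e}")
print(f"by-word t(18)        = {t_items:.3f}")
```

With these assumed word means the by-word t falls far below the two-tailed .05 critical value of about 2.10 for 18 degrees of freedom, while both sign tests fall well under .001, which is the pattern the example describes.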
Modern investigators of language, of course, have been aware of the problem of language generality and have explicitly discussed such problems as the random sampling of words, item selection biases, and the sizes of language samples. Despite this concern, however, most of these investigators have been unaware of the statistical error they themselves have been committing. With few exceptions, they have failed to provide even the most elementary statistical evidence that their results generalize beyond their particular sample of words or sentences. Although in some instances this failure has probably done little harm, in far too many other instances it leaves the conclusions drawn by the investigator completely in doubt.

As evidence for such doubts one could cite studies in verbal learning, memory, psycholinguistics, visual perception, or reading. To bring this task down to manageable size, I have therefore chosen to examine most of the papers in the new and, as yet, relatively small field of semantic memory. My major example will be drawn from a series of studies by Rubenstein and his colleagues. For these I will demonstrate that when the appropriate statistics are computed, there is no longer any reliable support for most of the main conclusions drawn from them. The Rubenstein et al. studies have been singled out only because, commendably, they are accompanied by appendices containing data from which the necessary statistics can be calculated. Even though the remaining studies do not allow such analyses to be done, I shall examine the possible consequences of statistical procedures on their results as well. In the final section I will offer some remedies and one alternative approach to this unfortunate state of affairs.

CASE STUDIES

The Rubenstein et al. Studies

Rubenstein and his colleagues carried out a series of five experiments on the time it takes people to decide whether a letter string is a word or a nonword. For brevity's sake I will refer to the experiment by Rubenstein, Garfield, and Millikan (1970) as Study 1, that by Rubenstein, Lewis, and Rubenstein (1971a) as Study 2, and Experiments 1, 2, and 3 by Rubenstein, Lewis, and Rubenstein (1971b) as Studies 3, 4, and 5, respectively.
These studies were designed to test a series of hypotheses about the search for words in semantic memory. Studies 1 and 2 examined the general thesis that to recognize a letter string as a word, the subject must locate this word in semantic memory. Under this hypothesis, recognition time should depend on various semantic properties of the word recognized, such as whether the words have one meaning or two. Studies 3, 4, and 5 examined the thesis that people put each letter string through a "phonemic recoding" before attempting to search the internal lexicon to see whether it constitutes a word or not. Under this hypothesis recognition time should depend on various phonological properties of the letter strings, such as whether the nonword letter strings have the same pronunciation as English words. Since all five studies are essentially alike in procedure, I will first examine a simplified version of Study 5 in some detail and then, later, present the more complete evidence on all of them together.

A statistical analysis of Study 5. In this study the authors wished to compare homophones (words like bear, which is pronounced just like bare, but is spelled differently) against nonhomophones. They selected 25 homophones and 24 nonhomophones, mixed them in with nonword filler items, presented them one at a time to each of 44 subjects, and measured their word/nonword recognition times in milliseconds. The study, therefore, had three effects: (1) Homophony, consisting of two fixed categories; (2) Words nested within Homophony, consisting of a random sample of all possible homophones and nonhomophones; and (3) Subjects, consisting of a random sample of all possible people. Although this is a rather complicated mixed hierarchical design, the appropriate analysis of variance can be constructed on advice from, say, Winer (1971).
Table 1 shows the appropriate sources of variance, degrees of freedom, and expected mean squares for the more general analysis in which there are p Treatments, q Words nested within each Treatment, and r Subjects.

The critical issue, as always, is how to construct the F-ratio that tests whether or not the Treatments effect is significant, in this case whether homophones differ significantly from nonhomophones. The information required for this decision is found in the expected value of the mean squares, abbreviated E(MS), in the right-hand side of Table 1. The main goal is to show that the variance due to Treatments, $\sigma_T^2$, is greater than zero. This requires us to compare $MS_T$, the Treatments mean square, against some "error" term, $MS_{error}$, such that $E(MS_T)$ exceeds $E(MS_{error})$ by exactly the variance due to Treatments, $\sigma_T^2$.

TABLE 1
Sources of Variance and Expected Mean Squares for Mixed Hierarchical Three Factor Design with One Fixed Effect and Two Random Effects

Label | Source of variance | Degrees of freedom | Expected value of mean square
T | Treatments (p) | $p - 1$ | $\sigma_e^2 + \sigma_{S \times W}^2 + q\sigma_{T \times S}^2 + r\sigma_W^2 + qr\sigma_T^2$
WwT | Words (q) within Treatments | $p(q - 1)$ | $\sigma_e^2 + \sigma_{S \times W}^2 + r\sigma_W^2$
S | Subjects (r) | $r - 1$ | $\sigma_e^2 + \sigma_{S \times W}^2 + pq\sigma_S^2$
T × S | Treatments × Subjects | $(p - 1)(r - 1)$ | $\sigma_e^2 + \sigma_{S \times W}^2 + q\sigma_{T \times S}^2$
S × WwT | Subjects × Words within Treatments | $p(q - 1)(r - 1)$ | $\sigma_e^2 + \sigma_{S \times W}^2$

The logic, then, is if the $MS_T$ calculated from the data exceeds the $MS_{error}$ calculated from the data by a sufficient amount, we can be confident that this has happened because the variance due to Treatments, $\sigma_T^2$, is greater than zero. More precisely, if $\sigma_T^2 = 0$, then the ratio $MS_T/MS_{error}$ is distributed as F around a mean of $n/(n - 2)$, where n is the degrees of freedom of $MS_{error}$; therefore, if this ratio is enough greater than $n/(n - 2)$, we can reject the hypothesis that $\sigma_T^2 = 0$.

Given these requirements let us consider $F_1$, the F-ratio in (1), in which the Treatments effect is tested against the Treatments by Subjects interaction:

(1) $F_1(p - 1, (p - 1)(r - 1)) = MS_T / MS_{T \times S}$.

Although appropriate in many designs, $F_1$ clearly does not fulfill our requirements. Algebraically, $E(MS_T)$ exceeds $E(MS_{T \times S})$ by the sum of two variances, that is, $r\sigma_W^2 + qr\sigma_T^2$, not just one.
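Written out term by term, the subtraction behind this statement is short. The display below is a sketch that assumes the Table 1 entries for Treatments and for Treatments × Subjects read as $\sigma_e^2 + \sigma_{S \times W}^2 + q\sigma_{T \times S}^2 + r\sigma_W^2 + qr\sigma_T^2$ and $\sigma_e^2 + \sigma_{S \times W}^2 + q\sigma_{T \times S}^2$, respectively; the subscript spelling of the variance components is a notational assumption.

```latex
\begin{aligned}
E(MS_T) - E(MS_{T \times S})
  &= \bigl(\sigma_e^2 + \sigma_{S \times W}^2 + q\sigma_{T \times S}^2
           + r\sigma_W^2 + qr\sigma_T^2\bigr)
   - \bigl(\sigma_e^2 + \sigma_{S \times W}^2 + q\sigma_{T \times S}^2\bigr) \\
  &= r\sigma_W^2 + qr\sigma_T^2 .
\end{aligned}
```

The error, Subjects × Words, and Treatments × Subjects components cancel, leaving the Words component alongside the Treatments component, so the ratio can be inflated by word variability alone.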
So a significant $F_1$ could lead to any one of three conclusions:

(2) a. $\sigma_T^2 > 0$ and $\sigma_W^2 = 0$,
    b. $\sigma_T^2 = 0$ and $\sigma_W^2 > 0$,
    c. $\sigma_T^2 > 0$ and $\sigma_W^2 > 0$.

In particular, (2b), in which $\sigma_T^2 = 0$, is a definite possibility, and so $F_1$ could be significant even though there were no differences among the Treatments.

The same considerations hold for $F_2$, the F-ratio in (3), in which the Treatments effect is tested against the Words-within-Treatments effect:

(3) $F_2(p - 1, p(q - 1)) = MS_T / MS_{WwT}$.

In this case, $E(MS_T)$ exceeds $E(MS_{WwT})$ by $q\sigma_{T \times S}^2 + qr\sigma_T^2$, and so $F_2$, if significant, could indicate any one of three possibilities, as shown in (4):

(4) a. $\sigma_T^2 > 0$ and $\sigma_{T \times S}^2 = 0$,
    b. $\sigma_T^2 = 0$ and $\sigma_{T \times S}^2 > 0$,
    c. $\sigma_T^2 > 0$ and $\sigma_{T \times S}^2 > 0$.

Since (4b) is a definite possibility, a significant $F_2$ does not guarantee that the variance due to Treatments is greater than zero either. An F-ratio with $MS_{S \times WwT}$ as the error term can be shown to fail in the same way.

Because there is no single error term appropriate for this analysis, Winer (1971) and others recommend the use of $F'$, the so-called quasi F-ratio in (5):

(5) $F'(i, j) = (MS_T + MS_{S \times WwT}) / (MS_{T \times S} + MS_{WwT})$.

The degrees of freedom i and j for this F-ratio are computed as follows. Let $MS_1$ and $MS_2$ be the two mean squares in the numerator of $F'$ and let $n_1$ and $n_2$ be their respective degrees of freedom. Then i is the nearest integer value of the following formula:

(6) $i = (MS_1 + MS_2)^2 / (MS_1^2/n_1 + MS_2^2/n_2)$.

The value of j is computed by the same formula but where $MS_1$ and $MS_2$ are the two mean squares from the denominator of $F'$.
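The quasi F-ratio in (5) and the degrees-of-freedom formula in (6) can be sketched in a few lines of Python. The function names and the four mean squares in the usage example are hypothetical, chosen only to make the arithmetic easy to follow; the degrees of freedom attached to each mean square follow the design described above with p Treatments, q Words per Treatment, and r Subjects.

```python
def approx_df(ms1: float, n1: int, ms2: float, n2: int) -> int:
    """Formula (6): nearest-integer df for a sum of two mean squares."""
    return round((ms1 + ms2) ** 2 / (ms1 ** 2 / n1 + ms2 ** 2 / n2))


def quasi_f(ms_t, ms_sxw, ms_txs, ms_w, p, q, r):
    """Formula (5): quasi F-ratio F' with its approximate df (i, j).

    ms_t   -- Treatments mean square
    ms_sxw -- Subjects x Words-within-Treatments mean square
    ms_txs -- Treatments x Subjects mean square
    ms_w   -- Words-within-Treatments mean square
    """
    n_t = p - 1
    n_w = p * (q - 1)
    n_txs = (p - 1) * (r - 1)
    n_sxw = p * (q - 1) * (r - 1)
    f_prime = (ms_t + ms_sxw) / (ms_txs + ms_w)
    i = approx_df(ms_t, n_t, ms_sxw, n_sxw)   # numerator df
    j = approx_df(ms_txs, n_txs, ms_w, n_w)   # denominator df
    return f_prime, i, j


# HYPOTHETICAL mean squares for a design with 2 treatments,
# 10 words per treatment, and 20 subjects.
f_prime, i, j = quasi_f(ms_t=10.0, ms_sxw=2.0, ms_txs=3.0, ms_w=4.0,
                        p=2, q=10, r=20)
print(f"F'({i}, {j}) = {f_prime:.3f}")
```

For these made-up inputs the ratio is 12/7, and formula (6) rounds to 1 and 36 degrees of freedom. The resulting value would then be referred to an ordinary F table with those degrees of freedom.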
A little algebra will show that the expected value of the numerator $E(MS_T + MS_{S \times WwT})$ exceeds the expected value of the denominator $E(MS_{T \times S} + MS_{WwT})$ by exactly the wanted term, $qr\sigma_T^2$, the variance due to Treatments. So when $F'$ is significantly large, the variance due to Treatments can be assumed to be greater than zero.

As the name "quasi F-ratio" suggests, $F'$ is an approximation to a true F-ratio and is not an exact test in the statistical sense. For almost all practical cases, however, the approximation is very close, so close that its use appears to be preferable to other possible procedures. For example, Winer (1971, p. 378) spells out two tests that might be used instead of $F'$. In the first procedure, one must first show that $\sigma_{T \times S}^2 = 0$ by demonstrating that the F-ratio $MS_{T \times S}/MS_{S \times WwT}$ is not significant at some liberal alpha level (say, $\alpha = .25$). Once this is accomplished, one can compute $F_2$ as a test of the Treatments effect, since, with $\sigma_{T \times S}^2 = 0$, (4a) is the only possible interpretation of a significant $F_2$. In the second procedure, one must first show that $\sigma_W^2 = 0$ with a nonsignificant F-ratio, $MS_{WwT}/MS_{S \times WwT}$, and then compute $F_1$ as the test for the Treatments effect. In practice, however, neither of these procedures is very satisfactory. First, they will often not work, for the prerequisite F-ratios will come out to be significant. Note that the first procedure requires that $\sigma_{T \times S}^2 = 0$, and this is unlikely except with a rather homogeneous group of subjects. The second procedure requires the even more unlikely assumption that $\sigma_W^2 = 0$. Because individual words are sampled, they should vary considerably, and so investigators should rarely expect the variance due to words to be nil. Second, these procedures are rather risky. In some instances they can lead to a judgment of "significant," while $F'$, which takes into account all of these variances at once, leads to the judgment of "not significant."² In short, $F'$ is probably the safest test to use in most instances.

² Consider, for example, an instance where the F-ratio $MS_{T \times S}/MS_{S \times WwT}$ is not significant, and $F_2$ is significant. According to the first procedure, this would allow us to conclude that the Treatments effect is reliable over both Subjects and Words. Yet this pattern could very easily arise while $F_1$ is not significant, and a nonsignificant $F_1$ would lead us to the contradictory conclusion that the Treatments effect is not reliable over Subjects even treating Words as a fixed effect. The quasi F-ratio, in contrast, is dependent on both $F_1$ and $F_2$, and indeed, $F'$ must be smaller than both $F_1$ and $F_2$ (see below). So $F'$ would rarely if ever lead to such contradictory conclusions as these.

Returning to Study 5, we find that Rubenstein et al. did not use $F'$ or either of the alternative procedures suggested by Winer. Instead, they computed $F_1$, finding $F_1(1, 43) = 10.40$, p < .005, and concluded that there was a significant difference between homophones and nonhomophones. This test, of course, is the normal one for simple Treatments by Subjects factorial designs in which Subjects is assumed to be the only random effect. Thus, it would have been an appropriate test if Words could have been considered a fixed effect, that is, if the 25 homophones and 24 nonhomophones had depleted their respective language populations. But there are obviously many such words in English and other languages that Rubenstein et al. did not include in their experiment, and so Words within Treatments should have been treated as a random effect along with Subjects. Since the design in Table 1 is the appropriate one, their significant $F_1$ allows the three interpretations shown in (2). In particular, (2b) might be correct, and the Treatments effect might actually be null. Interpretation (2b) is especially plausible since one would expect the variance due to Words, $\sigma_W^2$, to be considerable by itself. Thus, the significant $F_1$ Rubenstein et al. cited is inconclusive as evidence for the homophone/nonhomophone effect.

Note that if there really were a Treatments effect and $\sigma_T^2 > 0$, then $F_2$, the F-ratio in (3), should also be significant, since the expected value of the numerator $E(MS_T)$ exceeds the expected value of the denominator $E(MS_{WwT})$ by the quantity $qr\sigma_T^2$ plus an additional quantity. Although Rubenstein et al. did not compute $F_2$, it can be readily calculated from the mean latencies for each word given in the appendix to Study 5, as I will show later.
This F-ratio, $F_2(1, 45) = 2.00$, is not significant, leading one to suspect that the Homophony effect is probably not reliable.

For a better test of the Homophony effect, one should calculate $F'$, the quasi F-ratio in (5). Note that its computation requires four mean squares, $MS_T$, $MS_{T \times S}$, $MS_{WwT}$, and $MS_{S \times WwT}$. The first three can readily be calculated from Study 5 and its appendix, but the fourth, $MS_{S \times WwT}$, cannot be calculated without all of the data. It is therefore impossible to calculate the actual $F'$ for Rubenstein et al.'s data. We can, however, calculate the maximum and minimum values of $F'$ given the values of $MS_T$, $MS_{T \times S}$, and $MS_{WwT}$ for Rubenstein et al.'s data and given certain assumptions. And with max $F'$ and min $F'$, we would be able to draw the following conclusions. If max $F'$ is not significant, then the actual $F'$ for Rubenstein et al.'s data cannot be significant either, for $F'$ must be smaller than max $F'$. Or, if min $F'$ is significant, then the actual $F'$ for their data must be significant, since $F'$ must be larger than min $F'$. Only in the case where max $F'$ is significant and min $F'$ is not would we not be able to draw any conclusions, for then the actual $F'$ might or might not be significant.

The calculation of max $F'$ and min $F'$ follows a straightforward line of reasoning. For given values of $MS_T$, $MS_{T \times S}$, and $MS_{WwT}$, $F'$ will be at a maximum when $MS_{S \times WwT}$ is at a maximum and at a minimum when $MS_{S \times WwT}$ is at a minimum. The maximum value of $MS_{S \times WwT}$, in turn, can be reckoned as follows. In Table 1, it can be seen that $E(MS_{S \times WwT})$ cannot be larger than the smallest expected value of the other mean squares in the table since all of the other mean squares in the table contain extra variances. In particular, $MS_{S \times WwT}$ cannot, on the average, exceed $MS_{T \times S}$, the smallest of the remaining mean squares in the Rubenstein et al. data. Because of sampling variation, however, both $MS_{S \times WwT}$ and $MS_{T \times S}$ are imperfect estimates of their expected values, and so $MS_{S \times WwT}$ could, by chance, be somewhat larger than $MS_{T \times S}$. So under the rather unlikely condition that $\sigma_{T \times S}^2 = 0$, $MS_{S \times WwT}$ will be significantly larger (with $\alpha = .05$) than $MS_{T \times S}$ only 2.5% of the time (in a two-tailed test). Thus, the practical limit to be placed on the size of $MS_{S \times WwT}$ is that it not be significantly larger than $MS_{T \times S}$.³ That is, we are interested in the critical value of the following F-ratio:

(7) $F_3(p(q - 1)(r - 1), (p - 1)(r - 1)) = MS_{S \times WwT} / MS_{T \times S}$.

Let the critical value of $F_3$ at the .05 level be denoted by $F_3^c$. Then, by simple algebra the maximum allowable value of $MS_{S \times WwT}$ is $F_3^c MS_{T \times S}$, and therefore max $F'$ is given by the following formula for ($MS_{T \times S} < MS_{WwT}$):

(8) max $F'(i, j) = (MS_T + F_3^c MS_{T \times S}) / (MS_{T \times S} + MS_{WwT})$,

where i and j are defined as in (6), but with the value of $F_3^c MS_{T \times S}$ replacing $MS_{S \times WwT}$ in (6). It can be shown that the actual $F'$ will always be less significant than max $F'$ so long as $MS_T/(p - 1) > F_3^c MS_{T \times S}/p(q - 1)(r - 1)$; this condition, of course, will invariably hold for any interesting Treatments effects since for them $MS_T$ will have to be larger than $F_3^c MS_{T \times S}$ and in any case $(p - 1)$ will be smaller, typically much smaller, than $p(q - 1)(r - 1)$. The minimum value of $MS_{S \times WwT}$, obviously, is zero, and so min $F'$ is given by the formula:

(9) min $F'(i, j) = MS_T / (MS_{T \times S} + MS_{WwT})$,

where $i = (p - 1)$ and j is as defined in (6). It can easily be shown that the actual $F'$ will always be more significant than min $F'$.

³ Note that if $MS_{S \times WwT}$ were significantly larger than $MS_{T \times S}$, we would have reason to conclude (assuming that $\sigma_{T \times S}^2 = 0$) that $MS_{S \times WwT}$ was an overestimate of $\sigma_e^2 + \sigma_{S \times W}^2$, or that $MS_{T \times S}$ was an underestimate of this quantity, or both. In this case $F'$ probably has positive bias, and consequently, the probability of a Type I error in testing the Treatments effect is higher than the stated level. When this happens, there may be something amiss in the methodology of the experiment. So when $MS_{S \times WwT}/MS_{T \times S}$ is significant, it is best to use the more conservative max $F'$ instead of the actual $F'$, thereby treating the quasi F-ratio as having a distribution truncated at the value of max $F'$.

Now we are in a position to compute max $F'$ for Study 5.
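When only the two F-ratios are at hand, the bounds in (8) and (9) can be rewritten by dividing numerator and denominator by the Treatments mean square: min $F'$ becomes $F_1 F_2/(F_1 + F_2)$, and, under the condition attached to (8) (equivalently $F_1 > F_2$), max $F'$ becomes $(1 + F_3^c/F_1)$ times min $F'$. The sketch below applies these identities to the Study 5 Homophony values quoted in the text ($F_1(1, 43) = 10.40$, $F_2(1, 45) = 2.00$, critical value 1.68). The function names are illustrative assumptions, and because the inputs are rounded the results can differ slightly from the reported figures.

```python
def min_f_prime(f1: float, f2: float) -> float:
    """Formula (9) divided through by MS_T: F1*F2 / (F1 + F2)."""
    return f1 * f2 / (f1 + f2)


def max_f_prime(f1: float, f2: float, f3_crit: float) -> float:
    """Formula (8) divided through by MS_T; intended for F1 > F2."""
    return (1 + f3_crit / f1) * min_f_prime(f1, f2)


def denominator_df(f1: float, n1: int, f2: float, n2: int) -> int:
    """Formula (6) for the denominator, written in terms of F1 and F2.

    n1 is the error df of F1 (Treatments x Subjects);
    n2 is the error df of F2 (Words within Treatments).
    """
    return round((f1 + f2) ** 2 / (f1 ** 2 / n2 + f2 ** 2 / n1))


# Study 5, Homophony effect, using the values quoted in the text.
lo = min_f_prime(10.40, 2.00)
hi = max_f_prime(10.40, 2.00, 1.68)
j = denominator_df(10.40, 43, 2.00, 45)
print(f"min F'(1, {j}) = {lo:.2f}, max F'(1, {j}) = {hi:.2f}")
```

Because the actual $F'$ must lie between the two bounds, a nonsignificant upper bound settles the question without access to the full data.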
$F_3^c$, the critical value for $F_3(1935, 43)$ at the .05 level (two-tailed), is 1.68. Then, by the formula in (8), max $F'$ turns out to be 1.94 with 1 and 62 degrees of freedom. Since this value is not significant, the actual $F'$ for Study 5 is not significant either, and so there is no reliable support, that is, no statistical justification, in the data for the conclusion that homophones are recognized more slowly than nonhomophones.

In the course of this argument, I have presented three F-ratios: $F_1$, $F_2$, and $F'$. What exactly does each of them tell us? Roughly speaking, $F_1$ indicates what should happen if the same 25 homophones and 24 nonhomophones were given to a new sample of 44 subjects. Because $F_1$ was significant, we can be fairly certain that the Homophony effect will replicate on this new sample of subjects. On the other hand, $F_2$ indicates what should happen if the same 44 subjects were given a new random sample of 25 homophones and 24 nonhomophones. The fact that $F_2$ was not significant implies that the Homophony effect should not necessarily replicate on the new sample of words. Finally, $F'$ tells us what should happen both with a new sample of 44 subjects and with a new sample of 25 homophones and 24 nonhomophones. Because it was not significant, there is no assurance that the Homophony effect would replicate in this case. From these rough descriptions, we should also expect, in general, that if either $F_1$ or $F_2$ is not significant, then $F'$ will not be significant either, and this will almost always be the case (see below).

Statistically, Study 5 was not quite as simple as I have presented it so far. While $MS_T$ and $MS_{T \times S}$ were calculated directly from the text of Study 5, $MS_{WwT}$ required the use of the geometric mean latencies for each of the 25 homophones and 24 nonhomophones, as reported in the appendix to Study 5. As it happened, Rubenstein et al.
had also divided the homophones and nonhomophones into high and low frequency ranges such that the 49 words actually constituted a Homophony by Frequency design. So to calculate $MS_{WwT}$ I took logarithmic transforms of the 49 latencies, just as Rubenstein et al. had done, and submitted them to the appropriate 2 × 2 factorial design, with Words nested within Homophony and Frequency. This design is completely analogous to the typical between-subjects design except that here the sampling factor is Words, not Subjects. Since there were unequal numbers of words within the various conditions, I also had to make use of Winer's (1971) method of unweighted means for the analysis of variance. The mean square for Words within Homophony and Frequency for this design can be shown to be identical to the mean square (times 44, the number of subjects) that is required for the more complete analysis indicated above. It was this mean square that I used as the value of $MS_{WwT}$ in the above calculations. In general, $MS_T$ and $MS_{WwT}$, hence $F_2$, can be computed simply by collapsing across subjects and by applying the appropriate analysis of variance as if Words was the only random effect.

Statistical evidence for Studies 1 through 5. Following exactly the same procedures as I used for Study 5, I have computed $F_2$, max $F'$, and min $F'$ for each effect originally reported as significant in all five studies by Rubenstein et al. and have listed these values in Table 2 opposite the values of $F_1$ reported in the original studies.⁴ As Table 2 makes plain, there are large discrepancies between the values of $F_1$ reported by Rubenstein et al. and the values of max $F'$ calculated for the same data.

TABLE 2
F-Ratios by Subject ($F_1$), F-Ratios by Word ($F_2$), and Maximum and Minimum Quasi F-Ratios (Max $F'$ and Min $F'$) for Studies 1 through 5 by Rubenstein et al.

Study | Source of variance | $F_1$ | Significance | $F_2$ | Significance | Min $F'$ and max $F'$ (b) | Significance (c)
1 | Frequency | F(2, 76) = 45.53 (a) | .001 | F(2, 168) = 53.02 | .001 | F'(2, 197) = 24.43, 25.10 | .001
1 | Homography (H) | F(1, 38) = 10.72 | .005 | F(1, 168) = 4.26 | .05 | F'(1, 193) = 3.05, 3.52 | n.s.
1 | Concreteness × H | F(1, 38) = 17.77 | .001 | F(1, 168) = 1.17 | n.s. | F'(1, 187) = 1.10, 1.20 | n.s.
2 | Systematicity (S) | F(1, 44) = 34.80 | .001 | F(1, 104) = 4.14 | .05 | F'(1, 126) = 3.70, 3.87 | n.s.
2 | Equiprobability (E) | F(1, 44) = 19.93 | .001 | F(1, 104) = 2.79 | n.s. | F'(1, 129) = 2.45, 2.65 | n.s.
2 | S × E | F(1, 44) = 11.59 | .005 | F(1, 104) = 0.98 | n.s. | F'(1, 120) = 0.90, 1.03 | n.s.
2 | Frequency (F) | F(2, 88) = 27.28 (a) | .001 | F(2, 104) = 4.62 | .025 | F'(2, 138) = 3.95, 4.18 | .025
2 | S × E × F | F(2, 88) = 7.17 | .001 | F(2, 104) = 0.18 | n.s. | F'(2, 109) = 0.18, 0.22 | n.s.
3 | Legality | F(1, 44) = 108.75 | .001 | F(1, 185) = 86.18 | .001 | F'(1, 163) = 48.08, 48.81 | .001
3 | Pronounceability within legality | F(1, 44) = 9.83 | .005 | F(1, 185) = 0.81 | n.s. | F'(1, 212) = 0.75, 0.88 | n.s.
4 | Homophony | F(1, 43) = 34.97 | .001 | F(1, 53) = 7.46 | .01 | F'(1, 74) = 6.30, 6.60 | .025
5 | Homophony | F(1, 43) = 10.40 | .005 | F(1, 45) = 2.00 | n.s. | F'(1, 62) = 1.68, 1.94 | n.s.
5 | Frequency | F(1, 43) = 83.41 | .001 | F(1, 45) = 45.01 | .001 | F'(1, 82) = 29.23, 29.80 | .001

(a) The degrees of freedom for F(2, 76) and F(2, 88) in Studies 1 and 2 were incorrectly reported in the original studies as F(2, 38) and F(2, 44), respectively.
(b) The degrees of freedom for min $F'$ and max $F'$ were the same in all cases but one, the S × E × F interaction in Study 2, where min $F'$ had 2 and 109 degrees of freedom and max $F'$ had 3 and 109 degrees of freedom.
(c) The level of significance was the same for min $F'$ and max $F'$ for all 13 pairs of values.

⁴ According to H. Rubenstein, the authors of Study 1 inadvertently listed the arithmetic, rather than the geometric, means for each letter string in the appendix to the study; H. Rubenstein has kindly sent me the geometric means, and the calculations I have done are based on them. In addition, the geometric means calculated from the appendix of Study 3 (namely, 880, 536, and 995) do not jibe exactly with the reported means, apparently because of Rubenstein et al.'s procedure for replacing missing data.

While all 13 values of $F_1$ were significant at the .005 level, only five values of max $F'$ are significant, two at only the .025
Thus, when Words is treated as a random effect along with Subjects, a number of the effects originally reported as significant turn out to be statistically unreliable.

In addition, Table 2 illustrates two general points about quasi F-ratios. First, consider those instances where both $F_1$ and $F_2$ are larger than $F_2^*$ (1.64 for most of Table 2). These instances turn out to be the only interesting ones, since all other instances can be shown to result automatically in a nonsignificant $F'$. In these cases, it can be shown (see the Appendix) that max $F'$, hence the actual $F'$, will never be larger than $F_1$ or $F_2$, whichever is smaller. This agrees with our intuitions about $F'$. If $F_1$ and $F_2$ indicate what would happen with new samples of subjects and words, respectively, then $F'$ should be smaller than either, since it indicates what should happen both with new subjects and with new words.⁵ Second, in most instances max $F'$ is not much larger than min $F'$. It can be shown (see the Appendix) that when $F_1 \geq F_2$ (as is true for most of Table 2), max $F'$ is algebraically equivalent to $(1 + F_2^*/F_1)$ times min $F'$. Consider the Systematicity effect in Study 2 (where $F_1 = 34.80$ and $F_2^* = 1.64$). There max $F'$ is only 5% larger than min $F'$. For the Homophony effect in Study 5, where $F_1$ is only 10.40, max $F'$ is still only 16% larger than min $F'$. All this indicates that in many instances min $F'$ will not be much smaller than the actual $F'$ and could therefore be used as a convenient substitute for the actual $F'$ when the latter is too cumbersome to calculate

⁵ Strictly speaking, although $F'$ must be smaller than both $F_1$ and $F_2$, it is possible for $F'$ to be significant (because of the possibility of increased degrees of freedom) even though the smaller of $F_1$ and $F_2$ is not.
This possibility, however, is very remote and has never occurred in my experience.

easily (see below). In addition, it should be noted that max $F'$ does not change much when the α level is raised from .05 ($F_2^* = 1.64$) to .01 ($F_2^* = 1.93$). In no case in Table 2 does a nonsignificant max $F'$ now become significant.

How do all these statistics affect the conclusions drawn in Studies 1 through 5? The main hypothesis tested in Studies 1 and 2 required that at least some of the effects labeled Homography, Concreteness, Systematicity, and Equiprobability, or their interactions, be significant. The quasi F-ratios show no reliable evidence for any of these effects in either study. The main hypothesis tested in Studies 3, 4, and 5 required that at least some of the effects labeled Pronounceability within Illegality and Homophony be significant. There was support for this hypothesis in the Homophony effect of Study 4, where the letter strings used were nonwords, but not in the Homophony effect of Study 5, where the letter strings were actual words. One cannot conclude, of course, that those effects lacking significance in Studies 1 through 5 are not real. Other more sensitive experiments, or even more powerful analyses of these same studies with frequency handled in a more detailed way, might well show any one of these nonsignificant effects to be real. The argument is, simply, that the data in Studies 1 through 5 as analyzed provide no statistical justification for the conclusion that these effects are real.

The Meyer Study

A rather different example of a study committing the language-as-fixed-effect fallacy is Meyer's (1970) detailed investigation of the representation and retrieval of stored semantic information. He reported two experiments. In one, 56 subjects were timed as they made true-false judgments of 192 test items like All chairs are furniture (that is, All S are P). In the second, 32 subjects went through the identical procedure with a similar set of 388 sentences in which all had been replaced by some, as in Some chairs are furniture. Meyer classified the sentences into 16 categories according to the subject-predicate (S-P) relation they exhibited and then carried out a variety of comparisons among the categories in order to distinguish among a number of competing theories of semantic retrieval. For purposes of illustration I will examine only one of these comparisons in detail.

In the case under examination sentences such as All stones are rubies were compared with ones such as All solids are rubies. Stones is said to be a "small" superset of rubies and solids, a "large" superset of rubies, because stones is itself a subset of solids. This particular pair of sentences, then, can be thought of as having been constructed from the Word-triple rubies-stones-solids, in which rubies is a subset of stones which in turn is a subset of solids. Meyer composed eight such Word-triples (implicitly taking them from the population of all Word-triples with this nesting property), constructed one pair of sentences from each Word-triple, and then examined the latencies of 56 subjects to all 16 of the resulting sentences. Since each sentence in the "small" superset category was paired with one in the "large" superset category, these sentences fit into a simple factorial design with three crossed factors: Treatments (that is, Size of superset relation of S to P), Word-triples, and Subjects.

The analysis of variance for such a factorial design is indicated for the general case in Table 3 (see Coleman, 1964; Winer, 1971; and others). This design contains p fixed Treatments, q random Word-triples, and r random Subjects. The problem here again is how to choose the correct F-ratio for testing the reliability of the Treatments effect. As in the previous design, it is not correct to use $F_1$, which tests the Treatments effect against the Treatments by Subjects interaction:

(10) $F_1(p-1, (p-1)(r-1)) = \dfrac{MS_T}{MS_{T \times S}}$

A little algebra shows that $F_1$, if significant, guarantees only that $r\sigma_{T \times W}^2 + qr\sigma_T^2 > 0$, and this gives no assurance that $\sigma_T^2 > 0$, that is, that the variance due to Treatments is greater than zero.

TABLE 3
Sources of Variance and Expected Mean Squares for Mixed Factorial Design with One Fixed Effect and Two Random Effects

Label      Source of variance              Degrees of freedom   Expected value of mean square
T          Treatments (p)                  p - 1                $\sigma_e^2 + \sigma_{T \times W \times S}^2 + q\sigma_{T \times S}^2 + r\sigma_{T \times W}^2 + qr\sigma_T^2$
W          Words (q)                       q - 1                $\sigma_e^2 + p\sigma_{W \times S}^2 + pr\sigma_W^2$
S          Subjects (r)                    r - 1                $\sigma_e^2 + p\sigma_{W \times S}^2 + pq\sigma_S^2$
T x W      Treatments x Words              (p - 1)(q - 1)       $\sigma_e^2 + \sigma_{T \times W \times S}^2 + r\sigma_{T \times W}^2$
T x S      Treatments x Subjects           (p - 1)(r - 1)       $\sigma_e^2 + \sigma_{T \times W \times S}^2 + q\sigma_{T \times S}^2$
W x S      Words x Subjects                (q - 1)(r - 1)       $\sigma_e^2 + p\sigma_{W \times S}^2$
T x W x S  Treatments x Words x Subjects   (p - 1)(q - 1)(r - 1) $\sigma_e^2 + \sigma_{T \times W \times S}^2$

The F-ratio in (11), which I will denote as $F_2$ since it is analogous to $F_2$ in the previous design, is not the right one either:

(11) $F_2(p-1, (p-1)(q-1)) = \dfrac{MS_T}{MS_{T \times W}}$

It suffers from the same fault, for its significance guarantees that $q\sigma_{T \times S}^2 + qr\sigma_T^2 > 0$, but not that $\sigma_T^2 > 0$. Winer (1971) therefore recommends the following quasi F-ratio, which I will again denote as $F'$:

(12) $F'(i, j) = \dfrac{MS_T + MS_{T \times W \times S}}{MS_{T \times S} + MS_{T \times W}}$

As before, the degrees of freedom i is the nearest integer calculated by the formula in (13).

(13) $i = \dfrac{(MS_T + MS_{T \times W \times S})^2}{MS_T^2/(p-1) + MS_{T \times W \times S}^2/[(p-1)(q-1)(r-1)]}$

[…] 48) = 5.5, p < .05 for this comparison, he concluded that "large" supersets take reliably longer than "small" supersets. But to see if this 44-msec difference really is reliable, we must compute $F'$. Unfortunately, this is impossible to do from the information presented in Meyer's paper, for there is no way to calculate $MS_{T \times W}$ or $MS_{T \times W \times S}$.
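The quasi F-ratio in (12) can be sketched in a few lines once all four mean squares are available. The example below is only an illustration: the function name `quasi_f` is hypothetical, the mean squares are made-up numbers (not Meyer's or Rips' data), and both degrees of freedom are assumed to follow the usual Satterthwaite approximation, with each mean square weighted by its own degrees of freedom from Table 3.

```python
def quasi_f(ms_t, ms_tw, ms_ts, ms_tws, p, q, r):
    """Quasi F-ratio for p fixed Treatments, q random Words, r random Subjects.

    Returns (F', i, j), where i and j are the nearest-integer degrees of
    freedom under the assumed Satterthwaite approximation.
    """
    df_t = p - 1
    df_tw = (p - 1) * (q - 1)
    df_ts = (p - 1) * (r - 1)
    df_tws = (p - 1) * (q - 1) * (r - 1)

    numerator = ms_t + ms_tws        # MS_T + MS_TxWxS
    denominator = ms_ts + ms_tw      # MS_TxS + MS_TxW
    f_prime = numerator / denominator

    i = round(numerator ** 2 / (ms_t ** 2 / df_t + ms_tws ** 2 / df_tws))
    j = round(denominator ** 2 / (ms_ts ** 2 / df_ts + ms_tw ** 2 / df_tw))
    return f_prime, i, j


# Illustrative mean squares only: 2 treatments, 8 word-triples, 56 subjects.
f_prime, i, j = quasi_f(ms_t=100.0, ms_tw=20.0, ms_ts=10.0, ms_tws=5.0, p=2, q=8, r=56)
print(f_prime, i, j)
```

With so few word-triples, the denominator degrees of freedom are dominated by the Treatments by Words term, which is why adding subjects alone does little to make $F'$ more sensitive.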
To illustrate what might happen, therefore, I will make reference to some data recently collected by Lance Rips in a partial replication of Meyer's two experiments with 24 subjects and nine newly sampled word-triples.⁶ First, one can attempt a crude estimate of max $F'$ for Meyer's data. In Rips' data $MS_{T \times W}$ was more than twice as large as $MS_{T \times S}$. Since Meyer used many more subjects than Rips did (56 to 24), in Meyer's data $MS_{T \times W}$ is likely to be even more than twice the size of $MS_{T \times S}$. If we assume that $MS_{T \times W} > 2MS_{T \times S}$, or equivalently, $F_1 > 2F_2$, in Meyer's data, then max $F'$ turns out to be 2.34, which with 2 and 15 degrees of freedom is not significant. Second, one can turn the argument around and compute the standard deviation for the Treatments by Word-triples interaction effects that would be required for max $F'$ to be significant (at the .05 level) given Meyer's own $F_1$. This standard deviation turns out to be 39 msec, which is less than half the size of Rips' standard deviation of 91 msec. Thus, for Meyer's

44-msec diffe… experiment w… much more p… used about the … as Rips did. T… quite a differe… 44-msec differ… subjects and … then it ought … subjects and … But it did not … data went 20 m… Meyer's. It is instruct… Rips' data fo… (similar to and … ted) compari… The $F_1$s for th… data (see his T… 5.5, 17.4, 17.6 … significant at a … tive $F_1$s in Rips… 30.09, and 18… were significan… freedom). So … duce more sig… could have hap… including the … Meyer's exper… show dramatic… nificance levels, … $F'$ is calculated … first three $F'$s … respectively, … nificance. The … sons are F(1, 1… F(1, 16) = 6.0… latter two are … levels, respecti… $F'$s are each le…

⁶ Rips, Lance J. Quantification and semantic memory. In preparation. I am deeply indebted to Rips for the use of his data.

⁷ These six $F_1$… following six com… (1970): (1) Variat… ces; (2) P-size, Su… set, UA sentence… (3) P-size, Subset… set, PA sentences…
b. max $F' = (1 + F_1^*/F_2)$ min $F'$, for $F_2 \geq F_1$.

The $F_2^*$ in (17a) is defined as the critical value of the F-ratio $MS_{T \times W \times S}/MS_{T \times S}$, $MS_{S \times W(T)}/MS_{T \times S}$, or $MS_{S \times W(T)}/MS_{S(T)}$, whichever is appropriate for the design; the $F_1^*$ in (17b) is defined analogously, but with $MS_{T \times W}$ or $MS_{W(T)}$ in the denominator, whichever is appropriate. The range of $F'$ given by max $F'$ minus min $F'$ will decrease as the number of subjects and words increase and as the larger of $F_1$ and $F_2$ increases.⁹ Thus, min $F'$ is to be recommended in cases where $F_1$ and $F_2$ are easy to calculate but $F'$ is not. The use of min $F'$ in these cases seems far preferable to the next best procedure, which is to report $F_1$ and $F_2$ separately with the requirement that both be significant.¹⁰ The latter procedure will sometimes lead to judgments of "significant" where such judgments are not justifiable, because $F_1$ and $F_2$ can both be significant even though $F'$ is not.

Investigators can sometimes avoid the use of $F'$ and its restrictive assumptions altogether by choosing designs in which simpler statistics are appropriate, even when Words is treated as a random effect. I will illustrate with two designs discussed by Winer (1971, pp. 364-365). Imagine that Cushman, a psychologist, wishes to compare the recall of concrete and abstract words. She therefore gives each of 20 subjects eight concrete and eight abstract words to recall. But with scores of 1s and 0s, Cushman does not feel that the computation of $F'$ would be legitimate.
To sidestep the problem, she therefore presents each subject with a different set of eight concrete and eight abstract words. With Words (a random effect) nested within Subjects (a random effect), Winer (p. 365) shows that the Treatments effect is legitimately tested against the Treatments by Subjects interaction, that is, with the F-ratio $MS_T/MS_{T \times S}$. That is, Cushman can collapse over the Words factor altogether, compute an F-ratio by subjects (or an equivalent nonparametric test such as the Wilcoxon test or sign test), and, if the test shows significance, justifiably claim that her finding is general for both words and subjects. (See Carroll, 1966, for an application of this design.) The second design is similar. Imagine that Cushman has given half her subjects only concrete words and the other half only abstract words. With Words nested within Subjects, and Subjects in turn nested within Treatments, Winer (p. 364) demonstrates that the Treatments effect is appropriately tested against the Subjects within Treatments error term, that is, by the F-ratio $MS_T/MS_{S(T)}$.

⁹ It should be noted that the degrees of freedom of max $F'$ will not necessarily be the same as those of min $F'$. Whereas the degrees of freedom j for the denominator will not change, the degrees of freedom i for the numerator can be larger for max $F'$ than for min $F'$. Nevertheless, when $F_1$ and $F_2$ are both significant, the nearest integer value of i will usually be the same for both max $F'$ and min $F'$ (for example, see footnote b of Table 2).

¹⁰ In my own previous research, because of difficulties in calculating $F'$, I have relied, with lapses, on this weaker method, reporting $F_1$ and $F_2$ separately (see Clark & Begun, 1968; Clark & Card, 1969; Clark & Clark, 1968). If min $F'$ had been available, it would obviously have been a more appropriate statistic to use. Fortunately, the statistical conclusions in these studies are affected very little by the change from $F_1$ and $F_2$ to min $F'$.
So again, Cushman can collapse over the Words factor entirely, compute the appropriate parametric or nonparametric tests by subjects, and legitimately generalize her findings to both words and subjects. Although both of these designs require a large number of words, they have the advantage that they simplify the statistics required, especially if parametric statistics are deemed inappropriate for the design. For other possible simplifying designs, one should consult Winer (1971) or other similar reference texts.

When should the investigator treat language as a random effect? The answer is, whenever the language stimuli used do not deplete the population from which they were drawn. Note that the answer is not, whenever the language stimuli used were chosen at random from this population. The latter requirement is, in a sense, secondary to whether or not language should be treated as a random effect. Consider, for example, the Meyer (1970) study examined in detail above. In it, Meyer explicitly considered treating Word-triples as a random effect, but rejected the idea "because of the procedure used to select the test stimuli" (p. 263). Meyer based this decision, presumably, on the fact that he had not sampled the word-triples at random, but had composed them with the aid of dictionaries

a… minimize wor… and the like. … composed oth… parison than t… the same type … parisons in the … triples as a fix… mately genera… population o… sampling proc… no possibility … the more incl… collection of a… Meyer clearly … the investigato… not chosen at … as a random e… other words … The nonrando… difficulty only … wants to deter… he can legitima… Choose the Ap… On the face … design seems … lem,
for everyo… courses on exp… duction of Wor… however, bring… In traditional … whether to use … subject design … investigator k… generally requ… comparable b… each subject … Furthermore, … those experime… pects smaller di… Increasing the … power of the … wisdom applies … Words as a … designs are mo… designs, since i…

[…] $F_1 \geq F_2$, or equivalently, $MS_{T \times S} \leq MS_{W(T)}$, then max $F'$ is given by (ii), as shown in the text:

(ii) $\max F'(i, j) = \dfrac{MS_T + F_2^* MS_{T \times S}}{MS_{T \times S} + MS_{W(T)}}$

where $F_2^*$ is the critical value of F. To be able to examine the range of max $F'$, we will multiply both the numerator and denominator of (ii) by $MS_T/(MS_{T \times S} MS_{W(T)})$, and this yields:

(iii) $\max F'(i, j) = \dfrac{\dfrac{MS_T}{MS_{T \times S}} \cdot \dfrac{MS_T}{MS_{W(T)}} + F_2^* \dfrac{MS_T}{MS_{W(T)}}}{\dfrac{MS_T}{MS_{W(T)}} + \dfrac{MS_T}{MS_{T \times S}}}$

and since $F_1 = MS_T/MS_{T \times S}$ and $F_2 = MS_T/MS_{W(T)}$, we can simplify this formula as follows:

(iv) $\max F'(i, j) = \dfrac{F_2 (F_1 + F_2^*)}{F_1 + F_2}$

There are several interesting conclusions that follow directly from (iv).
First, there is only one really interesting case and that is when both $F_1$ and $F_2$ are larger than $F_2^*$; if they are not, there is no chance of max $F'$ being significant. But when this is the case, the following inequality holds:

(v) $\max F' \leq F_2 \leq F_1$

This inequality follows from (iv) because when $F_1 \geq F_2 > F_2^*$, the fraction $(F_1 + F_2^*)/(F_1 + F_2)$ will always be less than or equal to 1, hence $F_2$ times this fraction will always be less than or equal to $F_2$. Indeed, the first equality sign will obtain only if max $F' = F_2 = F_2^*$, and max $F'$ could not be significant, or if $F_1$ were infinitely large. Since this whole argument is symmetrical for $F_1$ and $F_2$, it follows that in all practical cases, max $F'$ will be less than $F_1$ or $F_2$, whichever is smaller. Second, if we again assume $F_1 \geq F_2$, then

(vi) $\max F' \leq \tfrac{1}{2}(F_1 + F_2^*)$

This follows because when $F_1 = F_2$, equation (iv) reduces to max $F' = \tfrac{1}{2}(F_1 + F_2^*)$, and when $F_1$ becomes larger than $F_2$, max $F'$ will always be less than this value.

As shown in the text, min $F'$ is given by

(vii) $\min F'(i, j) = \dfrac{MS_T}{MS_{T \times S} + MS_{W(T)}}$

Multiplying the numerator and denominator by $MS_T/(MS_{T \times S} MS_{W(T)})$ and simplifying, we obtain

(viii) $\min F'(i, j) = \dfrac{F_1 F_2}{F_1 + F_2}$

It follows directly that min $F'$ will always be less than $F_1$ and $F_2$, and indeed

(ix) $\min F' \leq \tfrac{1}{2} F_1$, for $F_1 \geq F_2$
     $\min F' \leq \tfrac{1}{2} F_2$, for $F_2 \geq F_1$

The degrees of freedom can also be calculated in terms of $F_1$ and $F_2$. If $F_1$ has m and $n_1$ degrees of freedom, and $F_2$ has m and $n_2$ degrees of freedom, then i = m, and j is the nearest integer calculated by the following formula:

(x) $j = \dfrac{(MS_{T \times S} + MS_{W(T)})^2}{MS_{T \times S}^2/n_1 + MS_{W(T)}^2/n_2}$
By multiplying both the numerator and denominator in (x) by $(MS_T/(MS_{T \times S} MS_{W(T)}))^2$ and simplifying, we get

(xi) $j = \dfrac{(F_1 + F_2)^2}{F_1^2/n_2 + F_2^2/n_1}$

It should be noted that when $F_1 = F_2$, j will simply be twice the harmonic mean of $n_1$ and $n_2$; in any case, j will always be less than the sum of $n_1$ and $n_2$ and will equal it only when $F_1/n_2 = F_2/n_1$.

Finally, while assuming $F_1 \geq F_2$, we can derive the relationship between max $F'$ and min $F'$ simply by taking the ratio of (iv) to (viii). This gives

(xii) $\max F' / \min F' = (F_1 + F_2^*)/F_1$

(xiii) $\max F' = (1 + F_2^*/F_1) \min F'$

REFERENCES

Anderson, S. R. How to get even. Language, 1972, 48, 893-906.
Battig, W. F., & Montague, W. E. Category norms for verbal items in 56 categories: A replication and extension of the Connecticut category norms. Journal of Experimental Psychology Monograph, 1969, 80, 1-46.
Bendix, E. H. Componential analysis of general vocabulary: The semantic structure of a set of verbs in English, Hindi, and Japanese. The Hague: Mouton, 1966.
Bierwisch, M. Some semantic universals of German adjectivals. Foundations of Language, 1967, 3, 1-36.
Bransford, J. D., Barclay, J. R., & Franks, J. J. Sentence memory: A constructive versus interpretive approach. Cognitive Psychology, 1972, 3, 193-209.
Carroll, J. B. An experiment in evaluating the quality of translations. Mechanical Translation and Computational Linguistics, 1966, 9, 55-66.
Chomsky, C. The acquisition of syntax in children from 5 to 10. Cambridge, Mass.: MIT Press, 1969.
Clark, E. V. On the acquisition of the meaning of before and after. Journal of Verbal Learning and Verbal Behavior, 1971, 10, 266-275.
Clark, E. V. Normal states and evaluative viewpoints: More on come and go. Language, in press.
Clark, H. H. Linguistic processes in deductive reasoning. Psychological Review, 1969, 76, 387-404.
Clark, H. H., & Begun, J. S. The use of syntax in understanding sentences. British Journal of Psychology, 1968, 59, 219-229.
Clark, H. H., & Card, S. K. The role of semantics in remembering comparative sentences. Journal of Experimental Psychology, 1969, 82, 545-553.
Clark, H. H., & Chase, W. G. On the process of comparing sentences against pictures. Cognitive Psychology, 1972, 3, 472-517.
Clark, H. H., & Clark, E. V. Semantic distinctions and memory for complex sentences. Quarterly Journal of Experimental Psychology, 1968, 20, 129-138.
Coleman, E. B. Generalizing to a language population. Psychological Reports, 1964, 14, 219-226.
Coleman, E. B. Learning of prose written in four grammatical transformations. Journal of Applied Psychology, 1965, 49, 332-341.
Collins, A. M., & Quillian, M. R. Retrieval time from semantic memory. Journal of Verbal Learning and Verbal Behavior, 1969, 8, 240-247.
Collins, A. M., & Quillian, M. R. Does category size affect categorization time? Journal of Verbal Learning and Verbal Behavior, 1970, 9, 432-438. (a)
Collins, A. M., & Quillian, M. R. Facilitating retrieval from semantic memory: The effect of repeating part of an inference. Acta Psychologica, 1970, 33, 304-314. (b)
Collins, A. M., & Quillian, M. R. Categories and subcategories in semantic memory. Paper presented at the annual meeting of the Psychonomic Society, St. Louis, MO, 1971.
Conrad, C. Cognitive economy in semantic memory. Journal of Experimental Psychology, 1972, 92, 149-154.
DeSoto, C., London, M., & Handel, S. Social reasoning and spatial paralogic. Journal of Personality and Social Psychology, 1965, 2, 513-521.
Donaldson, M., & Balfour, G. Less is more: A study of language comprehension in children. British Journal of Psychology, 1968, 59, 461-472.
Fillmore, C. J. Deictic categories in the semantics of come. Foundations of Language, 1966, 2, 219-227.
Fillmore, C. J. Verbs of judging: An exercise in semantic description. In C. J. Fillmore & D. T. Langendoen (Eds.), Studies in linguistic semantics. New York: Holt, Rinehart, & Winston, 1971.
Fraser, J. B. An analysis of "even" in English. In C. J. Fillmore & D. T. Langendoen (Eds.), Studies in linguistic semantics. New York: Holt, Rinehart, & Winston, 1971.
Freedman, J. L., & Loftus, E. F. Retrieval of words from long-term memory. Journal of Verbal Learning and Verbal Behavior, 1971, 10, 107-115.
Horn, L. J. A presuppositional approach to only and even. Papers from the 5th Regional Meeting, Chicago Linguistic Society, 1969, 98-107.
Johnson-Laird, P. N. On understanding logically complex sentences. Quarterly Journal of Experimental Psychology, 1969, 21, 1-13. (a)
Johnson-Laird, P. N. Reasoning with ambiguous sentences. British Journal of Psychology, 1969, 60, 17-23. (b)
Katz, J. J. Semantic theory and the meaning of "good". Journal of Philosophy, 1964, 61, 739-766.
Kučera, H., & Francis, W. N. Computational analysis of present-day American English. Providence, RI: Brown University, 1967.
Landauer, T. K., & Freedman, J. L. Information retrieval from long-term memory: Category size and recognition time. Journal of Verbal Learning and Verbal Behavior, 1968, 7, 291-295.
Landauer, T. K., & Meyer, D. E. Category size and semantic-memory retrieval. Journal of Verbal Learning and Verbal Behavior, 1972, 11, 539-549.
Lehrer, A. Semantic cuisine. Journal of Linguistics, 1969, 5, 39-55.
Loftus, E. F. Nouns, adjectives, and semantic memory. Journal of Experimental Psychology, 1972, 96, 213-215.
Loftus, E. F. Category dominance, instance dominance, and categorization time. Journal of Experimental Psychology, 1973, 97, 70-74.
Loftus, E. F. Activation of semantic memory. American Journal of Psychology, in press.
Loftus, E. F., & Freedman, J. L. Effect of category-name frequency on the speed of naming an instance of the category. Journal of Verbal Learning and Verbal Behavior, 1972, 11, 343-347.
Loftus, E. F., Freedman, J. L., & Loftus, G. R. Retrieval of words from subordinate and superordinate categories in semantic hierarchies. Psychonomic Science, 1970, 21, 235-236.
Loftus, E. F., & Scheff, R. W. Categorization norms for 50 representative instances. Journal of Experimental Psychology Monograph, 1971, 91, 355-364.
Meyer, D. E. On the representation and retrieval of stored semantic information. Cognitive Psychology, 1970, 1, 242-300.
Meyer, D. E. Dual memory-search of related and unrelated semantic categories. Paper presented at the meeting of the Eastern Psychological Association, New York, April, 1971.
Meyer, D. E. Verifying affirmative and negative propositions: Effects of negation on memory retrieval. In S. Kornblum (Ed.), Attention and Performance IV. New York: Academic Press, 1973.
Meyer, D. E. Correlated operations in searching stored semantic categories. Journal of Experimental Psychology, in press.
Meyer, D. E., & Ellis, G. B. Parallel processes in word recognition. Paper presented at the meeting of the Psychonomic Society, San Antonio, November, 1970.
Meyer, D. E., & Schvaneveldt, R. W. Facilitation in recognizing pairs of words: Evidence of a dependence between retrieval operations. Journal of Experimental Psychology, 1971, 90, 227-234.
Paivio, A., Yuille, J. C., & Madigan, S. Concreteness, imagery, and meaningfulness values for 925 nouns. Journal of Experimental Psychology, 1968, 76, Monograph Supplement, No. 1, Part 2.
Postal, P. M. On the surface verb "remind". Linguistic Inquiry, 1970, 1, 37-120.
Postman, L., & Keppel, G. Norms of word association. New York: Academic Press, 1970.
Rips, L. J., Shoben, E. J., & Smith, E. E. Semantic distance and the verification of semantic relations. Journal of Verbal Learning and Verbal Behavior, 1973, 12, 1-20.
Rosch, E. On the internal structure of perceptual and semantic categories. In T. E. Moore (Ed.), Cognitive development and the acquisition of language. New York: Academic Press, 1973.
Rubenstein, H., Garfield, L., & Millikan, J. A. Homographic entries in the internal lexicon. Journal of Verbal Learning and Verbal Behavior, 1970, 9, 487-494.
Rubenstein, H., Lewis, S. S., & Rubenstein, M. Homographic entries in the internal lexicon: Effects of systematicity and relative frequency of meanings. Journal of Verbal Learning and Verbal Behavior, 1971, 10, 57-62. (a)
Rubenstein, H., Lewis, S. S., & Rubenstein, M. Evidence for phonemic recoding in visual word recognition. Journal of Verbal Learning and Verbal Behavior, 1971, 10, 645-657. (b)
Schaeffer, B., & Wallace, R. Semantic similarity and the comparison of word meanings. Journal of Experimental Psychology, 1969, 82, 343-346.
Schaeffer, B., & Wallace, R. The comparison of word meanings. Journal of Experimental Psychology, 1970, 86, 144-152. (a)
Schaeffer, B., & Wallace, R. Semantic interference: Obligatory or optional. Journal of Experimental Psychology, 1970, 86, 335-337. (b)
Schvaneveldt, R. W., & Meyer, D. E. Retrieval and comparison processes in semantic memory. In S. Kornblum (Ed.), Attention and Performance IV. New York: Academic Press, 1973.
Smith, E. E. Effects of familiarity on stimulus recognition and categorization. Journal of Experimental Psychology, 1967, 74, 324-332.
Smith, E. E., Haviland, S. E., Buckley, P. B., & Sack, M. Retrieval of artificial facts from long-term memory. Journal of Verbal Learning and Verbal Behavior, 1972, 11, 583-593.
Snedecor, G. W., & Cochran, W. G. Statistical methods. Ames: Iowa State University Press, 1967.
Wilkins, A. Conjoint frequency, category size, and categorization time. Journal of Verbal Learning and Verbal Behavior, 1971, 10, 382-385.
Winer, B. J. Statistical principles in experimental design. New York: McGraw-Hill, 1971.

(Received December 18, 1972)
