Article
Language Testing
30(4) 535–556
© The Author(s) 2013
Reprints and permissions: sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/0265532213489568
ltj.sagepub.com

Re-examining the content validation of a grammar test: The (im)possibility of distinguishing vocabulary and structural knowledge
J. Charles Alderson
Lancaster University, UK
Benjamin Kremmel
University of Innsbruck, Austria
Abstract
“Vocabulary and structural knowledge” (Grabe, 1991, p. 379) appears to be a key component
of reading ability. However, is this component to be taken as a unitary one or is structural
knowledge a separate factor that can therefore also be tested in isolation in, say, a test of syntax?
If syntax can be singled out (e.g. in order to investigate its contribution to reading ability), this
test of syntactic knowledge would require validation. The usefulness and reliability of expert
judgments as a means of analysing the content or difficulty of test items in language assessment
have been questioned for more than two decades. Still, groups of expert judges are often called
upon, as they are perceived to be the only, or at least a very convenient, way of establishing key
features of items. Such judgments, however, are particularly opaque and thus problematic when
judges are required to make categorizations where categories are only vaguely defined or are
ontologically questionable in themselves. This is, for example, the case when judges are asked to
classify the content of test items based on a distinction between lexis and syntax, a dichotomy
corpus linguistics has suggested cannot be maintained. The present paper scrutinizes a study by
Shiotsu (2010) that employed expert judgments, on the basis of which claims were made about
the relative significance of the components ‘syntactic knowledge’ and ‘vocabulary knowledge’ in
reading in a second language. By both replicating and partially replicating Shiotsu’s (2010) content
analysis study, the paper problematizes not only the issue of the use of expert judgments, but,
more importantly, their usefulness in distinguishing between construct components that might,
in fact, be difficult to distinguish anyway. This is particularly important for an understanding and
diagnosis of learners’ strengths and weaknesses in reading in a second language.
Keywords
Content analysis, grammar, judgments, reading in a second language, vocabulary
Corresponding author:
J. Charles Alderson, Lancaster University – Linguistics and English Language, County College South,
Lancaster University, Lancaster, LA1 4YL, UK.
Email: c.alderson@lancaster.ac.uk
Alderson (1993a) argues that the use of so-called experts to judge the content or difficulty
of test items is highly questionable as these judgments are often of limited accuracy, reli-
ability and validity. Bachman et al. (1996) and Alderson et al. (2012) have also demon-
strated that judgments about salient item characteristics appear rather obscure and arbitrary
and that agreement between judges is moderate at best. Nevertheless, language testers still
frequently rely on expert judgments when attempting to validate the content or construct
of a particular test. In a study investigating the relative significance of different compo-
nents of second language (L2) reading, Shiotsu (2010), as part of a preliminary study,
employed expert judges to justify a test of 35 items as a valid and suitable measure of
syntactic knowledge. Based on these judgments, Shiotsu (2010) then decided which items
should be included in the test of syntactic knowledge and thus form the basis of further
statistical analyses, their results and inferred claims about the nature of L2 reading.
Given Alderson’s (1993a) caution against the use of expert judgments and other
criticisms that have been voiced concerning the study in question (Brunfaut, 2009),
Shiotsu’s preliminary study should be scrutinized in more detail, analysing closely the
logic and rationale behind the inclusion or exclusion of certain items and replicating
the study to see whether findings can be corroborated with different groups of experts.
The aim of the present paper is thus twofold: it will problematize the use of expert
judgments for content validation in general, but will also discuss the particular difficulty
that arises when the judges’ task is to make a clear construct distinction between syntactic
and lexico-semantic knowledge, two categories that by nature overlap and have
blurred boundaries.
This paper will present the findings of an examination of Shiotsu’s (2010) content
analysis study. The paper briefly outlines the context of the study and Shiotsu’s (2010)
findings in the original study in order to contextualize and facilitate interpretation of the
insights gained from the present study. It then presents the results of replications of this
study and discusses findings from using an alternative approach to judgment gathering to
investigate whether Shiotsu’s (2010) test can be confirmed as an instrument that mainly
measures syntactic knowledge and therefore has yielded trustworthy results that form the
basis of claims about the relative significance of syntactic knowledge in L2 reading abil-
ity. Finally, the implications of the study for our understanding of L2 reading and for
future research are discussed.
Context
Adopting a component model of reading rather than focusing on cognitive processing,
numerous researchers have attempted to model L2 reading ability and to
explain the relative contribution of different components to reading, or rather to reading
test performance. Amongst these, “vocabulary and structural knowledge” (Grabe,
1991, p. 379) appears, according to research, to be one of the most prominent components.
Moreover, it is generally agreed that vocabulary knowledge best predicts
reading test performance.
Alderson (2000) states that “factor analytic studies of reading have consistently found
a word knowledge factor on which vocabulary tests load highly” (p. 99) and that there-
fore vocabulary knowledge is an important predictor of variance in reading test
performances (Qian, 2002). Baddeley et al. (1985), Dixon et al. (1988), Cunningham et
al. (1990), Beck and McKeown (1991) and Daneman (1991) identified vocabulary as an
important component of fluent L1 reading. Hacquebord (1989), Bossers (1989), Laufer
(1992) and Schoonen et al. (1998) report similar findings for the L2 context. Yamashita
(1999) also claims that L2 vocabulary knowledge surpasses L2 grammar knowledge in
explaining L2 reading variance. Brisbois (1995), using grammar and vocabulary as inde-
pendent predictor variables in her analysis, found that vocabulary measures showed a
higher correlation with reading scores than did the grammar measure.
However, a recent paper by Shiotsu and Weir (2007) criticizes the methodological
bias and shortcomings in previous studies and concludes that “the literature on the rela-
tive contribution of the grammar and vocabulary knowledge to reading performance is
too limited to offer convincing evidence for supporting one or the other of the two pre-
dictors” (Shiotsu & Weir, 2007, p. 105).
Instead, Shiotsu and Weir claim that “the role of vocabulary appears somewhat
overstated while that of grammar understated” (p. 104), which would be in accord-
ance with studies by Alderson (1993b) and Bachman et al. (1989) which found that
grammar tests do explain a substantial percentage of variance in reading test perfor-
mances. Kaivanpanah and Zandi (2009) concluded from their findings that “syntac-
tic behavior is more related to reading comprehension than vocabulary knowledge”
with students’ scores on the TOEFL grammar test outperforming their scores on Qian
and Schedl’s (2004) Depth of Vocabulary Knowledge Test as a predictor of reading
test performance.
While it is beyond the remit of this paper to examine the methodology of all of these
studies which appear to support Shiotsu’s (2010) and Shiotsu and Weir’s (2007) claim,
the present paper will scrutinize the methodology and results of Shiotsu’s (2010) prelimi-
nary study, on which the claim is based that syntactic knowledge is a better predictor of
L2 reading test performance than vocabulary knowledge. Only if Shiotsu’s (2010) test
can be confirmed as measuring mainly, or exclusively, syntactic knowledge can the
results it has produced be regarded as a reliable basis for any claims regarding the
construct of L2 reading ability.
Table 1. Results of Shiotsu’s original content analysis study (Shiotsu, 2010, p. 63).
Item A B C
1 12 2 0
2 12 1 1
3 11 0 3
4 7 5 2
5 13 1 0
6 7 3 4
7 11 2 1
8 9 1 4
9 9 4 1
10 6 4 4
11 12 2 0
12 5 8 1
13 13 1 0
14 7 5 2
15 11 1 2
16 9 3 2
17 12 1 1
18 5 9 0
19 10 0 4
20 10 0 4
21 4 0 8
22 11 0 2
23 9 4 1
24 9 1 3
25 8 3 3
26 10 1 3
27 12 1 1
28 8 0 5
29 10 3 1
30 10 1 2
31 7 6 1
32 13 1 0
33 9 4 1
34 11 2 1
35 9 3 1
discussion in such a study would only show “the success of the cloning process” (p. 96)
but not provide an unbiased picture of what the experts actually thought the items were
testing; thus, just as in the reference study, no training, discussion or category modification
was offered to the judges. The results of this study can be found in Table 2.
The replication study confirmed the legitimacy of Shiotsu’s exclusion of items 12, 18
and 21 from the test. While the group of judges also found that this was in principle a test
of syntax (437 out of 735 ratings included the syntax category), nine items clearly emerged
as problematic (items 4, 10, 12, 14, 18, 21, 25, 28 and 31), even if Shiotsu’s questionable
rationale for item exclusion was applied. In addition, item 33 does not show a clear ten-
dency or majority vote in terms of its categorization. This replication study therefore
indicates that items 10, 14, 25, 28, 31 and 33, which were all included in Shiotsu’s original
test, cannot be unambiguously justified in this test of syntactic knowledge.
study. The results of this study, in which 14 university professors of applied linguistics
participated, are shown in Table 4.
Table 4 probably best illustrates the judges’ uncertainty in this categorization task.
Again, the findings seem to suggest that this 35-item test is in general one of syntactic
Item A B C A/B B/A B/C A/C C/A C/B C/A/B A/B/C C/B/A A? ?
1 10 1 – 2 1 – – – – – – – –
2 7 3 1 1 1 – 1 – – – – – –
3 12 – 2 – – – – – – – – – –
4 5 4 1 2 1 – – – 1 – – – – –
5 9 2 – 1 – – 1 – – – – 1 –
6 12 1 – – – – – – – – – 1 –
7 9 1 – – – – 3 – – – – 1 –
8 12 1 – – – – – 1 – – – – –
9 7 4 – 3 – – – – – – – – –
10 6 5 – 2 1 – – – – – – – –
11 6 3 – 2 2 – – – – – – 1 –
12 2 10 1 1 – – –
13 13 – – – – – – – – – – 1 –
14 4 8 – – 1 1 – – – – – – –
15 11 1 – – – – – 1 – – – 1 –
16 9 1 1 2 – – – – – – – 1 –
17 9 2 – 3 – – – – – – – – –
18 5 6 – – 1 2 – – – – – – –
19 13 – 1 – – – – – – – – – –
20 12 – 1 – – – 1 – – – – – –
21 3 – 5 – – – – 1 – – – – 5
22 9 – 2 – – – 2 1 – – – – –
23 10 – 2 – – – 2 – – – – – –
24 12 – 1 – – – – 1 – – – – – –
25 4 2 3 3 1 – – – 1 – – – – –
26 7 – 1 1 – – 3 – 1 – 1 – – –
27 12 – – – – – 1 – – – – 1 – –
28 6 – 2 2 – – 1 2 – 1 – – – –
29 10 – 2 – – – 2 – – – – – – –
30 10 – 2 1 – – 1 – – – – – – –
31 4 3 2 2 – – – – 2 – 1 – – –
32 11 – 1 – – – 1 1 – – – – – –
33 5 2 1 3 – – 2 – 1 – – – – –
34 10 – 2 – – – – – – – 2 – – –
35 8 – – 1 – – 2 1 1 – 1 – – –
knowledge, albeit not as convincingly as the results of the studies already discussed. Of the
490 ratings, 294 included the syntax category (60%). As in both the original study and the
second replication study, the judgments for items 12, 18 and 21 clearly suggest that they should
be excluded from this test of syntax as they are tapping into a different reading component.
However, as in the second replication study, ratings for item 14 would also suggest
that, adhering to Shiotsu’s logic, this item should be eliminated as it is not clearly and
convincingly an item testing syntactic knowledge. Items 9 and 25, identified as possible
drop-outs in the second replication study, would not have to be eliminated on the basis of
the judgments of the third replication study if Shiotsu’s principle for inclusion was applied.
However, neither of these items seems to be a clear-cut syntax item, as Table 4 suggests.
Applying the alternative principle of majority decisions to the judgments, these
two (items 9 and 25) and another eight items (items 2, 4, 10, 11, 26, 28, 31 and 33) would
emerge as problematic, four of which (items 10, 28, 31 and 33) were also identified as
problematic in the second replication study.
of the continuum and thus suggests that it should not be included in a syntactic knowl-
edge measure. Again, the results suggest that items 12 and 18 should be removed from
the test as they are testing lexico-semantic knowledge rather than syntactic knowledge.
Item 21, identified as a drop-out in the other investigations, does not emerge as problem-
atic from these results. This, however, may be due to the fact that it was classified as a
‘sentence comprehension’ item in the other studies, for which there was no category
available in this rating procedure.
However, item 14, identified as a potential drop-out in both the first and the second
replication study and not convincingly justified as a syntax item in the original study,
clearly emerges as a problematic item with a mean rating of 4.75. This strongly suggests
that this item should be removed from the syntactic knowledge measure. Items 9, 25, 31
and 33, highlighted as potentially problematic in both the second and the third replication
study, also have values well above the cut-score and might thus not be justifiable as items
testing syntactic knowledge. The previous findings for item 28, which suggested that this
item might also be removed from the syntactic knowledge measure, could not be
confirmed in the fourth replication study. For item 2, the results of the third replication
study, in which this item was identified as a potential drop-out, were confirmed.
rated by this group as tending to test syntactic knowledge. This echoes the judgments of
this group on these items using the original classification grid. Also, item 10, identified
clearly by the group as a lexico-semantic item in the first replication study, is just below
the stipulated cut with a mean rating of 3.48.
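The flagging rule applied in these Likert-scale replications can be sketched in a few lines. All values below are hypothetical stand-ins rather than the study’s raw data (only item 14’s mean of 4.75 and the rough location of the cut are echoed from the text); the sketch simply averages judges’ ratings per item and flags items whose mean falls on the lexico-semantic side of a stipulated cut-score.

```python
# Hypothetical sketch of the flagging rule: judges rate each item on a
# scale whose high end is lexico-semantic; items whose mean rating
# reaches a stipulated cut-score are flagged as doubtful syntax items.

CUT = 3.5  # hypothetical cut-score, not the study's actual value

ratings = {  # hypothetical per-item judge ratings
    14: [5, 4, 5, 5],   # mean 4.75, well above the cut
    20: [1, 2, 1, 2],   # mean 1.50, a clear syntax item
}

def mean(values):
    return sum(values) / len(values)

flagged = [item for item, scores in ratings.items() if mean(scores) >= CUT]
print(flagged)  # -> [14]
```

Under such a rule an item like item 14, with a mean rating of 4.75, would be removed from the syntactic knowledge measure, while an item averaging near the syntax end would be retained.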
Overview of results
Table 7 shows which items were identified as potential candidates for exclusion from the
test of syntactic knowledge in question by the five replication studies outlined above.
‘XX’ marks items that should be excluded according to the original rationale of Shiotsu,
that is, if the syntax category did not receive the highest number of votes. ‘X’ marks
items that should be excluded according to an alternative principle, that is, if the syntax
category did not receive the majority of votes. ‘?’ marks problematic items identified in
studies four and five, which employed a slightly different methodology and thus also a
different rationale for item exclusion.
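The two exclusion rationales contrasted here can be made explicit in code. The sketch below applies both rules to a few of the published vote counts from Shiotsu’s original study (Table 1), on the assumption that category A is the syntax category; the rule names are ours.

```python
# Vote counts (A, B, C) for selected items from Table 1 of the original
# study; A is assumed to be the syntax category.
votes = {
    4:  (7, 5, 2),
    12: (5, 8, 1),
    18: (5, 9, 0),
    21: (4, 0, 8),
}

def keep_by_plurality(a, b, c):
    """Shiotsu's original rationale: keep the item if the syntax
    category received the single highest number of votes."""
    return a > b and a > c

def keep_by_majority(a, b, c):
    """Alternative rationale: keep the item only if the syntax
    category received more than half of all votes cast."""
    return a > (a + b + c) / 2

for item, counts in votes.items():
    print(item, keep_by_plurality(*counts), keep_by_majority(*counts))
# Item 4 survives the plurality rule (7 > 5 and 7 > 2) but fails the
# stricter majority rule (7 is not more than 14/2); items 12, 18 and 21
# fail both, matching their exclusion in the original study.
```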
In contrast to Shiotsu’s study, which yielded only three problematic items, the five
replication studies, conducted using methods of gathering expert judgments identical
or similar to those of the original study, found that only 20 of the 35 items
emerged from all studies as unproblematic, clear syntax items. Fifteen items in total
were shown to require further scrutiny as they could clearly be questioned: items 2, 4, 6,
9, 10, 11, 12, 14, 18, 21, 25, 26, 28, 31 and 33 (see Appendix).
Three items (14, 25 and 31), which were not excluded in the original study, were identi-
fied as problematic by all five replication studies. As all five replication studies suggest that
these three items cannot justifiably be maintained as syntax items, the legitimacy of the test
in question and all results and claims based on it are questioned. An exclusion of at least
these three items (in addition to the three originally excluded items) and a subsequent re-
run of the original analysis, as well as a comparison of results against a re-analysis using
the 20 remaining syntactic items, appears necessary as different findings might result.
classified as having 50% minimum agreement with a level of confidence of 0.05, the value
in the cell for the lower bound of the confidence interval must be 50% or greater.
For example, if six out of eight judges (75%) agreed that an item tested a specific
construct, a value of ~41% would be read from Table 8 (down the 70%+ column and
across the eight judges column). Thus, we could not say, with a level of confidence of
0.05, that if we collected more data the value for agreement would not drop below 50%.
So, either, more data must be collected or the judges should not be considered to be pro-
viding sufficient evidence that the item is testing the specific construct, given the cut-off
of 50%. However, if seven out of eight judges (88%) agreed then a value of ~51% would
be read and we could accept the evidence, with a 95% level of confidence, that the item
tested the specific construct.
This method of assessing agreement, if it were adopted as a standard, would have two
main benefits. First of all, it would provide guidance to researchers as to the number of
judges that they need for acceptable levels of confidence in their classifications. And,
secondly, it would provide a standardized tool for researchers using judgments which
would allow better comparability between the judgments gathered by different studies.
However, it should be noted that this method tends towards the conservative; an observed
agreement of 50% will never yield an acceptable result and minimum proportions of 5/5,
8/10 and 14/20 are required to accept agreements with a 0.05 level of confidence.4
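The arithmetic behind the worked example above can be sketched with the Agresti–Coull approximation for a binomial proportion (Agresti & Coull, 1998). That this is precisely the variant used for Table 8 is our assumption, but with a two-sided z of 1.96 it yields lower bounds close to the ~41% and ~51% figures quoted for six and seven agreeing judges out of eight.

```python
import math

def agresti_coull_lower(successes, n, z=1.96):
    """Lower bound of the Agresti-Coull approximate confidence
    interval for a binomial proportion (Agresti & Coull, 1998)."""
    n_adj = n + z ** 2                        # adjusted sample size
    p_adj = (successes + z ** 2 / 2) / n_adj  # adjusted proportion
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half_width

# 6 of 8 judges agreeing: the lower bound falls below the 50% cut-off,
# so the agreement is insufficient evidence at this level of confidence.
print(f"{agresti_coull_lower(6, 8):.3f}")
# 7 of 8 judges agreeing: the lower bound clears the 50% cut-off.
print(f"{agresti_coull_lower(7, 8):.3f}")
```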
are still used frequently, if not exclusively, to validate the content or construct of tests. If
a sufficient number of studies have already made clear the problematic nature of ‘experts’
and their judgments in language testing, the field needs to ask itself why
‘expert’ judgments are still often relied upon solely in these matters. The question
becomes even more pertinent when considering that the literature has, over the past dec-
ades, suggested several alternative (statistical) methods of content validation (Buck &
Tatsuoka, 1998; Lee & Sawaki, 2009, who propose using Q-matrices of ‘expert’ judg-
ments against item attributes). But a Q-matrix is, after all, nothing more than a collection
of human judgments, and if the human judgments comprising the Q-matrix are incorrect,
the resulting diagnostic classifications will also be incorrect. However, it is rare for other
empirical but non-statistical approaches to be employed instead of ‘expert’ judgments.
The fact that the use of ‘expert’ judgments continues to be a widespread and recom-
mended procedure (e.g. in large-scale assessments such as OECD PISA in 2000, 2003,
2006, 2009, as well as 2012), indicates that the findings of the present studies are still
worth discussing in order to further raise awareness.
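The point that a Q-matrix inherits the fallibility of the judgments it encodes can be made concrete with a toy example. Everything in the sketch below is illustrative and ours (the skill labels, the simple conjunctive response rule, the invented items); it is not taken from Buck and Tatsuoka (1998) or Lee and Sawaki (2009), and it shows only how a single wrong entry changes a diagnostic prediction.

```python
# A Q-matrix is a table of human judgments: rows are items, columns are
# the skills each item supposedly requires. Under a simple conjunctive
# rule, a learner is predicted to answer an item correctly only if they
# master every skill the Q-matrix assigns to it.

skills = ["syntax", "vocabulary"]

def predict(q_matrix, mastery):
    """Predicted responses for a learner given a Q-matrix."""
    return [all(mastery[skill] for skill, needed in zip(skills, row) if needed)
            for row in q_matrix]

learner = {"syntax": True, "vocabulary": False}

q_correct = [(1, 0), (0, 1)]  # item 1 tests syntax, item 2 vocabulary
q_wrong   = [(0, 1), (0, 1)]  # judges mislabel item 1 as vocabulary

print(predict(q_correct, learner))  # -> [True, False]
print(predict(q_wrong, learner))    # -> [False, False]
```

With the correct matrix, the learner’s predicted responses signal mastered syntax; one mislabelled cell makes the predictions indistinguishable from those of a learner who has mastered neither skill.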
In addition, the qualifications, degree of expertise and reliability of judges in content
validation studies need to be problematized as well. It is not at all clear exactly what
criteria should be used to qualify judges as ‘experts’, and the authors are not aware of any
studies, including the reference study, that would provide supporting evidence for such
criteria or qualificatory credentials.
The fact that all ‘experts’ employed in the replication studies held a degree in linguis-
tics or applied linguistics should ensure that the judgment categories were familiar and
that “their qualification for the task should not be in doubt” (Weir, Hughes & Porter,
1990, p. 507), but the different amounts of experience within the expert groups should be
acknowledged as a limitation of all such studies to date.
The second, arguably more important and more interesting insight concerns the con-
struct of ‘grammar’ and, potentially, also ‘L2 reading ability’, since one would ideally
require a clear distinction to be made between vocabulary and structural knowledge, par-
ticularly for diagnostic assessment. Such a clear distinction between syntactic and lexico-
semantic knowledge, however, might be neither achievable nor desirable since several
linguists have argued for abandoning the vocabulary–grammar dichotomy. Although lexis
and grammar have traditionally been kept apart, evidence from corpus linguistics suggests
that vocabulary and grammar, because of the highly patterned structure of language, “are
in fact inseparable” (Römer, 2009, p. 141). Römer argues that the traditional grammar–
lexicon dichotomy “may hold true for sentences which have been invented in order to
illustrate it, but it collapses when we consult real language data” (2009, p. 142). Lewis
(1993) claims that “the grammar/vocabulary dichotomy is invalid” (p. vi) and argues that
“language consists of grammaticalised lexis” (p. vi). Lewis (1993) further maintains that
“dichotomies simplify, but at the expense of suppression” (p. 37) and suggests placing
“lexical items” (p. 89), that is, words, multi-word units, polywords (e.g. phrasal verbs) or
collocations, on a cline or scale instead. Nattinger and DeCarrico (1992) claim that “lexi-
cal phrases [are] form/function composites, lexico-grammatical units that occupy a posi-
tion somewhere between the traditional poles of lexicon and syntax” (p. 36). Sinclair
(2004) asserts that “so strong are the co-occurrence tendencies of words, word classes,
meanings and attitudes that we must widen our horizons and expect the units of meaning
to be much more extensive and varied than is seen in a single word” (p. 39), suggesting
that traditional tests of vocabulary employed to investigate the contribution of vocabulary
knowledge to reading ability only paint half the picture and that “lexicogrammar”
(Sinclair, 2004, p. 39) should perhaps instead be treated as a unitary component of reading
ability rather than attempting to distinguish between vocabulary and grammar.
This concern about the real divisibility of the two components has already been raised
by Shiotsu and Weir (2007) and Brunfaut (2008) in comments on the original study in
question. Findings from other studies investigating the relative contribution of vocabu-
lary knowledge and grammar knowledge appear to confirm that the relation between
syntax and lexis is a continuum, as researchers have consistently found high correlations
between the two components (Brisbois, 1995; Shiotsu & Weir, 2007; Brunfaut, 2008). It
might therefore be of interest for future research to construct tests of lexis in a more
phraseological approach and to examine tests of formulaic sequences or multi-word units
to see whether they would account for the same amount of variance in reading test per-
formances as traditional vocabulary and grammar measures taken together. In any case,
testers and applied linguists need to recognize the slipperiness of the slope between the
constructs and need to qualify or describe their dichotomies. Using Likert scales symbol-
izing this continuum as operationalized in replication studies 4 and 5 instead of categori-
cal classifications might be a first step towards this but further replication studies and
increased problematization of judgments employed in research are needed.
Most importantly, future research needs to define its constructs better, needs to avoid
simplistic statements to the effect that Grammar is more important than Vocabulary, but
rather should make more nuanced and properly researched statements about which
aspects of which constructs seem more or less relevant to predicting reading ability in a
second language.
Acknowledgements
We wish to thank Ari Huhta and Tineke Brunfaut as well as the judges who took part in the various
replication studies, and the anonymous reviewers for their valuable feedback on earlier versions of
this paper. Part of this paper is based on a Master’s dissertation submitted to Lancaster University,
UK in December 2012.
Funding
This research received no specific grant from any funding agency in the public, commercial, or
not-for-profit sectors.
Notes
1. For some items, no response was given by some judges, which is why the total number of
ratings does not amount to the expected 14 × 35 = 490.
2. The findings from Shiotsu’s (2010) main study showed that “Syntactic Knowledge (β =
.73, p < .001) is the strongest predictor of the overall Passage Reading Comprehension
performance while Vocabulary Breadth (β = .13, p < .05) […] made additional but much
smaller contributions to the prediction” (p. 124f.). In Shiotsu and Weir’s (2007) study,
which fed partly into Shiotsu’s (2010) main study, syntax was shown “to exceed vocabulary
in standardized regression weight (.61* vs. .34*), percentage of reading variance explained
(79% vs. 72%) and percentage of reading variance uniquely explained (11% vs. 4%)”
(Shiotsu & Weir, 2007, p. 114).
3. In contrast to the original study, there are seven ratings beyond the expected total of
14 × 35 = 490 in this study, because some raters allocated some items to two or more categories.
4. We are indebted to our colleague Gareth McCray for this solution, the rationale and the
examples.
References
Agresti, A., & Coull, B. A. (1998). Approximate is better than ‘exact’ for interval estimation of
binomial proportions. The American Statistician, 52(2), 119–126.
Alderson, J.C. (1993a). Judgments in language testing. In D. Douglas & C. Chapelle (Eds.), A new
decade of language testing research (pp. 46–57). Alexandria, VA: TESOL.
Alderson, J.C. (1993b). The relationship between grammar and reading in an English for academic
purposes test battery. In D. Douglas & C. Chapelle (Eds.), A new decade of language testing
research (pp. 203–219). Alexandria, VA: TESOL.
Alderson, J.C. (2000). Assessing reading. Cambridge: Cambridge University Press.
Alderson, J.C. (2011). Investigating content analysis judgments. Unpublished manuscript.
Alderson, J. C., Brunfaut, T., McCray, G., & Nieminen, L. (2012). Components of reading in first
and second language, test item difficulty and overall reading ability. Paper presented at AAAL.
Boston, MA, March 24–27.
Bachman, L.F., Davidson, F., Lynch, B., & Ryan, K. (1989). Content analysis and statistical mod-
eling of EFL proficiency tests. Paper presented at The 11th Annual Language Testing Research
Colloquium, San Antonio, Texas.
Bachman, L.F., Davidson, F., Ryan, K., & Choi, I.C. (1995). An investigation into the compara-
bility of two tests of English as a foreign language. Cambridge: Cambridge University Press.
Bachman, L.F., Davidson, F., & Milanovic, M. (1996). The use of test method characteristics in
the content analysis and design of EFL proficiency tests. Language Testing, 13, 125–150.
Baddeley, A., Logie, R., Nimmo-Smith, I., & Brereton, N. (1985). Components of fluent reading.
Journal of Memory and Language, 24, 119–131.
Beck, I.L., & McKeown, M. (1991). Conditions of vocabulary acquisition. In R. Barr, M.L. Kamil,
P. Mosenthal & P.D. Pearson (Eds.), Handbook of Reading Research, Vol. II (pp. 789–824).
Mahwah, NJ: Lawrence Erlbaum.
Bossers, B. (1989). Lezen in de tweede taal: een taal- of leesprobleem? [Reading in the second
language: a language or reading problem?] Toegepaste Taalwetenschap in Artikelen, 38,
176–188.
Brisbois, J.E. (1995). Connections between first- and second-language reading. Journal of Read-
ing Behavior, 27, 565–584.
Brunfaut, T. (2008). Foreign language reading for academic purposes. Students of English (native speak-
ers of Dutch) reading English academic texts. Unpublished PhD thesis, University of Antwerp.
Brunfaut, T. (2009). The relative contribution of grammar and vocabulary to explaining reading
test performance. Paper presented at the 6th Annual Conference of EALTA, Turku, Finland.
Buck, G., & Tatsuoka, K. (1998). Application of the rule-space procedure to language testing:
examining attributes of a free response listening test. Language Testing, 15(2), 119–157.
Cunningham, A.E., Stanovich, K. E., & Wilson, M.R. (1990). Cognitive variation in adult college
students differing in reading ability. In T.H. Carr & B.A. Levy (Eds.), Reading and its develop-
ment: Component skills approaches. San Diego, CA: Academic Press.
Daneman, M. (1991). Individual differences in reading skills. In R. Barr, M.L. Kamil, P. Mosen-
thal & P.D. Pearson (Eds.), Handbook of reading research, Vol. II (pp. 512–538). Mahwah,
NJ: Lawrence Erlbaum.
Dixon, P., LeFevre, J.A., & Twiley, L.C. (1988). Word knowledge and working memory as predic-
tors of reading skill. Journal of Educational Psychology, 80(4), 465–472.
Grabe, W. (1991). Current developments in second-language reading research. TESOL Quarterly,
25(3), 375–406.
Hacquebord, H. (1989). Tekstbegrip van Turkse en Nederlandse leerlingen in het voortgezet
onderwijs. [Reading Comprehension of Turkish and Dutch Students Attending Secondary
Schools] Groningen: RUG.
Kaivanpanah, S., & Zandi, H. (2009). The role of depth of vocabulary knowledge in reading com-
prehension in EFL contexts. Journal of Applied Sciences, 9, 698–706.
Laufer, B. (1992). Reading in a foreign language: how does L2 lexical knowledge interact with the
reader’s general academic ability? Journal of Research in Reading, 15, 95–103.
Lee, Y., & Sawaki, Y. (2009). Application of three cognitive diagnosis models to ESL reading and
listening assessments. Language Assessment Quarterly, 6, 239–263.
Lewis, M. (1993). The lexical approach: the state of ELT and a way forward. London: Language
Teaching Publications.
Lumley, T. (1993). Reading comprehension sub-skills: Teachers’ perceptions of content in an EAP
test. Melbourne Papers in Language Testing, 2(1), 25–60.
Nattinger, J., & DeCarrico, J. (1992). Lexical phrases and language teaching. Oxford: Oxford
University Press.
Qian, D.D. (2002). Investigating the relationship between vocabulary knowledge and academic
reading performance: an assessment perspective. Language Learning, 52, 513–536.
Qian, D.D., & Schedl, M. (2004). Evaluation of an in-depth vocabulary knowledge measure for
assessing reading performance. Language Testing, 21, 28–52.
Römer, U. (2009). The inseparability of lexis and grammar: corpus linguistic perspectives. Annual
Review of Cognitive Linguistics, 7, 141–163.
Schoonen, R., Hulstijn, J., & Bossers, B. (1998). Metacognitive and language-specific knowledge
in native and foreign language reading comprehension: an empirical study among Dutch stu-
dents in grades 6, 8 and 10. Language Learning, 48(1), 71–106.
Shiotsu, T. (2010). Components of L2 reading: Linguistic and processing factors in the reading
test performances of Japanese EFL learners. Cambridge: Cambridge University Press.
Shiotsu, T., & Weir, C.J. (2007). The relative significance of syntactic knowledge and vocabulary
breadth in the prediction of reading comprehension test performance. Language Testing, 24,
99–128.
Sinclair, J.McH. (2004). Trust the text: Language, corpus and discourse. London: Routledge.
Weir, C.J., Hughes, A., & Porter, D. (1990). Reading skills: Hierarchies, implicational relation-
ships and identifiability. Reading in a Foreign Language, 7(1), 505–510.
Wilson, E. B. (1927). Probable inference, the Law of Succession, and statistical inference. Journal
of the American Statistical Association, 22, 209–212.
Yamashita, J. (1999). Reading in a first and a foreign language: a study of reading comprehension
in Japanese (the L1) and English (the L2). Unpublished PhD thesis, Lancaster University.
Appendix

9. ______ a pity you did not check the figures with your partner.
A. What’s  B. That’s  C. There’s  D. It’s

12. ______ how hard he worked, his tutor never commented on it.
A. Of no account  B. No matter  C. Without regard  D. Mindless

25. ______ some mammals came to live in the sea is not known.
A. Which  B. Since  C. Although  D. How

26. ______ their nests well, but also build them well.
A. Not only brown thrashers protect  B. Protect not only brown thrashers
C. Brown thrashers not only protect  D. Not only protect brown thrashers

C. which certain plants and animals give off the heatless light
D. is the heatless light given off by certain plants and animals

31. Conifers first appeared on the Earth ______ the early Permian period, some 270 million years ago.
A. when  B. or  C. and  D. during

33. ______ a baby turtle is hatched, it must be able to fend for itself.
A. Not sooner than  B. No sooner  C. So soon that  D. As soon as

3. By the time this course finishes ______ a lot about engineering.
A. I will learn  B. I learn  C. I will have learnt  D. I have learnt

7. As a result of his lectures she ______ by this new approach to teaching.
A. was influenced  B. has influenced  C. influenced  D. had influenced

15. You’d better ______ to the doctor next time you feel ill.
A. to go  B. going  C. go  D. gone

17. He is ______ proud man that he would rather fail than ask for help.
A. so a  B. such  C. a so  D. such a

20. If only he ______ down the results when he did the experiments!
A. writes  B. had written  C. has written  D. was writing

22. Vitamin C, discovered in 1932, ______ first vitamin for which the molecular structure was established.
A. the  B. was the  C. as the  D. being the

23. The behavior of gases is explained by ______ the kinetic theory.
A. what scientists call  B. what do scientists call  C. scientists they call  D. scientists call it

24. Ironically, sails were the salvation of many steamships ______ mechanical failures.
A. they suffered  B. suffered  C. were suffered  D. that had suffered

27. The name Nebraska comes from the Oto Indian word ‘nebrathka,’ ______ flat water.
A. to mean  B. meaning  C. it means  D. by meaning

29. Rich tobacco and champion race horses have ______ of Kentucky.
A. long been symbols  B. been long symbols  C. symbols been long  D. long symbols been

32. There are very few areas in the world ______ be grown successfully.
A. where apricots can  B. apricots can  C. apricots that can  D. where can apricots

34. Tungsten, a gray metal with the ______, is used to form the wires in electric light bulbs.
A. point at which it melts is the highest of any metal
B. melting point is the highest of any metal
C. highest melting point of any metal
D. metal’s highest melting point of any