REFERENCES
Linked references are available on JSTOR for this article:
https://www.jstor.org/stable/328006?seq=1&cid=pdf-reference#references_tab_contents
This content downloaded from 128.250.144.144 on Thu, 23 May 2019 04:40:34 UTC
All use subject to https://about.jstor.org/terms
Testing EFL Reading Comprehension
Using a Multiple-Choice Rational Cloze
MARSHA BENSOUSSAN and RACHEL RAMRAZ
[...] omitted) or rational cloze (the test designer decides which words to omit) is a better test of [...]

[...] approximately 13,000 students yearly at both Haifa and Tel Aviv universities.
[...]able questions are discarded, if an insufficient number of questions remains, the entire text must be discarded or re-pretested until results are satisfactory. Moreover, a theoretical issue arises because it is not clear whether multiple-choice scores reflect test comprehension, ability to choose the correct distractor, or both.6

Despite these disadvantages, multiple-choice questions are useful because of the efficient scoring. An even more important advantage is that a wide range of skills may be tested by this method.

ALTERNATE SOLUTION: CLOZE PROCEDURE

A number of researchers have proposed the random cloze procedure for testing foreign language reading comprehension.7 It has a good text-item ratio and requires relatively little re-adjustment time between items. The disadvantages, however, are great. The random cloze cannot be pre-tested, since by re-inserting an unsuccessful word into a gap, the test designer would double the permitted span between gaps (i.e., fifteen instead of the required seven), and it would no longer be a random cloze test.

Another disadvantage is theoretical: even though omitting every nth word may give the test designer a random sample of the text, it still will not give direct information on whether examinees know specific words or phrases in the text.

In a non-random or rational cloze, the test designer decides how many words to delete.8 Thus the two central problems of test construction would be solved: the number of words between gaps no longer matters and test designers can decide on the words they wish to test.9 However, marking is awkward and time-consuming, and therefore we could not use the rational cloze in our English entrance examination.

For our purposes, we needed a multiple-choice version of the cloze. Those discussed in the literature are based on random cloze procedure, with a gap after every nth word.10 Although this test format suited our needs, we wished to have more control over items. In any case, as a result of the multiple-choice format, test designers present the examinee with specific choices, thus counteracting whatever random effect the omission of every nth word is supposed to have. Thus, it is no longer a purely random cloze. We opted instead for the rational cloze and included, as nearly as possible, as wide a range of items as could be tested by means of the conventional multiple-choice reading comprehension test.11

SOLUTION: FILL-IN TEST

Linguistic Basis. According to discours[e analy]sis theory, a text has three levels of m[eaning]: 1) the micro-level, focusing on the lexi[cal meaning] of words and their interaction with other words in the context; 2) the pragmatic level, which is extra-textual and draws on the reader's general knowledge of the world; and 3) the macro-level, dealing with the functions of the sentences and the structure of the text as a whole.12 Excerpts from parallel sections of one of the texts used in Experiment Two, once as a fill-in test and once as a multiple-choice comprehension test, appear below in the Appendix. Items are coded according to their level of meaning. Although it may not be totally evident from the short sample we selected, we aimed to include as many macro-level items as possible in both fill-in and multiple-choice tests (see Experiment Two and Table III).

Micro-level items would test specific understanding of a word or collocation where the clue(s) appeared in close proximity to the blank (one or two words before and/or after). Macro-level items would test a more general understanding of larger units of text (e.g., writer's opinion, words showing comprehension of key concepts, function words signalling contrast/opposition, main idea of paragraph).

Fill-in Test Construction. Twenty to thirty blank spaces selected by the test designers were inserted into a 300-word text. Each blank space takes the place of a word or phrase (of not more than three words), and for each blank space, there is a choice of four possible answers. The fill-in test modifies the cloze procedure in three ways: 1) possible responses are already provided in a multiple-choice format - unlike the cloze, which contains gaps in the text that need to be filled in; 2) unlike the cloze test, a blank space in a fill-in test can take the place of more than one word; 3) blank spaces are placed not after every nth word, but within a range of seven to fifteen words or more. Each blank space reflects the comprehension of either micro-level (i.e., word, phrase, or clause) or macro-level (i.e., sentence, paragraph, or whole-text) meaning in the text. As much as possible, we focused the fill-in test items to [...]
can follow the thought sequence.

Since it is relatively difficult to find words that would indicate a student's knowledge of the whole text, preference could be given to first placing blanks which test general reading comprehension. Function words, such as "however" and "therefore," would be good places for blanks. Other items tested could be cohesive markers such as "not only . . . but also," "either . . . or," and "on one hand . . . on the other hand." It is assumed that a student's recognition of these syntactic devices would enable him to follow the flow of an argument, and that lack of recognition would impede his comprehension.

Content words such as nouns, adjectives, and verbs which carry the weight of an argument would also be useful, and their opposites would be included among the distractors. In this way, the test constructor may suggest alternate misleading logical thought sequences, but only one set of choices would be consistent with the writer's intentions within the text as a whole.

When pre-testing it is advisable to place approximately fifty percent more blank spaces than needed, even though shrinkage is usually minimal. For example, if fifteen items are required for a test, twenty to twenty-five may be pre-tested. Afterwards, when unsuitable test items are eliminated, these gaps can either be filled with the original word, or else the whole phrase in which it appears (provided it is not a key phrase) may be eliminated. The fill-in test still remains intact because each item is independent of the others, even after many of the items have been eliminated.13

In thinking up alternate responses (distractors) the test constructor would be expected to use words focusing on a particular point, either in terms of content or structure. It was found, in fact, that items focusing on a single point (in which students were to choose from among four [...]

We aimed for a wide variety of items. For this reason half the blanks represent content words (nouns, verbs, adjectives, adverbs) and the other half function words (conjunctions, prepositions, word forms). Test constructors are free, however, to focus the test on whatever level of discourse they wish to test.

Rationale for Determining Distractors. In choosing distractors, the test constructor may make use of collocations, presenting words that could appear together and make sense in some other context. For this reason, opposites are particularly useful. They test the student's understanding of the whole text. Conjunctions are also helpful here. A student choosing "therefore" when only "however" would fit the context may have understood a particular sentence, but certainly did not grasp how the sentence fit into the context as a whole.

For testing English as a foreign language, synonymous distractors should not be used. It is advisable to avoid distractors where the correct choice is ambiguous even for native speakers. Thus, one should also avoid asking about detailed grammatical points (e.g., the distinction between it's and its) or prepositions which may also be confused by native speakers. The fill-in test is essentially a test of reading comprehension, not of grammar.

If placement of the blank spaces is based on a linguistic examination of the text rather than at random, it might be argued that one way of constructing the fill-in test would be to pre-test it as a cloze. Likely blank spaces could be chosen by the test constructor, and the possible distractors, it might be supposed, might be found from among the students' wrong answers.14 Experience shows, however, that only about one-third of the test items can be obtained in this way. This method is very time-consuming and yields relatively little in return.15 The best way to construct the fill-in test, given the level of English proficiency of our students, is to decide beforehand the structures and ideas that are to be tested and to place blanks where these points are likely to be tapped.

FILL-IN VS. MULTIPLE-CHOICE TEST

Having developed the basic outlines of the fill-in test, we needed statistical proof that it would do its job as well as the more well-known, conventional type of multiple-choice test. Four experiments were conducted using four separate test batteries which compared test items and scores of the fill-in test with those of its multiple-choice counterpart using a computerized item analysis procedure.16 Number of items per line of text, difficulty levels of items, extent and function of test questions, correlations between scores, and test reliability were examined.

Experiment One: Test Difficulty. The sample consisted of 435 first-year students taking the advanced reading course in English as a Foreign Language at Haifa University in 1973. Each student took one of three English tests consisting of four subtests: one fill-in test and three multiple-choice (M-C) tests (texts accompanied by multiple-choice content questions as well as by vocabulary and reference questions). A description of Experiment One is given in Table I. In this first test battery, multiple-choice and fill-in tests appear to be of approximately equivalent length and difficulty.

In terms of item easiness, Table I indicates that fill-in items are on a par with those of the conventional multiple-choice tests. Although the fill-in test constituted little more than one-third of the total number of lines of text, it contributed at least one-half of the total number of test items. Reliability was high.

Experiment Two: Same Texts, Different Formats. A question arises concerning the difference between the two types of tests. Although they may yield similar statistical results, one could not go as far as to say that the fill-in and multiple-choice (M-C) tests examine the same skills. Nevertheless, both test the reading comprehension of a particular text. In order to have a better basis for comparison, it was decided to take four texts and test each twice, using a different format and different students each time. Each test was constructed first in the conventional multiple-choice format, and the second time the same text was used to construct the new fill-in format.

A total of 1487 applicants to the first year of studies at Haifa University were tested. Most were high school graduates who had had seven to eight years of English. At random, each student received one text with questions. A comparison of the statistical results appears in Table II. An examination of the table shows that the fill-in format yields more items. Since there are greater differences among the average raw scores for Texts A, B, C, and D than there are between the M-C and fill-in versions of each text, the results would indicate that the choice of text may be more important than the format by which it is tested.
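The two statistics that the experiments lean on most, item easiness and Kuder-Richardson reliability, can be illustrated with a minimal sketch. It is not the ITANA V program cited in the notes. The function names and the response matrix are hypothetical, and the use of the population variance of total scores is an assumption, since conventions differ between textbooks.

```python
from statistics import pvariance


def item_easiness(responses):
    """Proportion of examinees answering each item correctly (0/1 scores)."""
    n_people = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_people for j in range(n_items)]


def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomous (0/1) item scores.

    responses: one row per examinee, one 0/1 entry per item.
    """
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(row) for row in responses]
    # Assumption: population variance of the total scores.
    total_variance = pvariance(totals)
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    # Sum over items of p * q, where p is easiness and q = 1 - p.
    pq_sum = sum(p * (1 - p) for p in item_easiness(responses))
    return (n_items / (n_items - 1)) * (1 - pq_sum / total_variance)


# Hypothetical data: five examinees (rows) by four items (columns).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(item_easiness(scores))
print(round(kr20(scores), 3))
```

Because the coefficient rises with the number of items when item quality is held roughly constant, a format that fits more items into the same text would be expected to show somewhat higher reliability.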
TABLE I
Total 77 53 73 52 79 47
Reliability:
Kuder-Richardson .828 .709 .761
Split-Half .8669 .6613 .7963
*EI = average of the Easiness Indic
**n = number of subjects.
** *The relatively large number of it
TABLE II

                          [Text A]            [Text B]            [Text C]            [Text D]
                          [Fill-in] [M-C]     [Fill-in] [M-C]     [Fill-in] [M-C]     [Fill-in] [M-C]
Number of lines           30        30        28        28        30        30        36        36
Number of students        73        192       204       180       201       213       213       211
Number of items           21        15        28        11        24        9         20        13
Number of good items      18        15        23        10        22        9         14        12
Disc. Index >.30 (%)      (86)      (100)     (82)      (91)      (92)      (100)     (70)      (92)
Median Disc. Index*       .46       .50       .41       .49       .44       .45       .42       .42
Reliability:
  Kuder-Richardson (20)   .80       .76       .79       .63       .80       .58       .69       .62
  Split-half              .80       .78       .78       .62       .80       .48       .67       .58
Score (%)                 63        62        6[?]      64        75        72        46        44
Standard deviation        18.35     16.56     14.45     18.41     17.26     11.43     19.50     14.86
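As a quick arithmetic check on Table II, the sketch below recomputes the percentage of good items (discrimination index above .30) and the number of items per line of text. The pairing of columns as (fill-in, M-C) for Texts A to D is an assumption inferred from the surrounding discussion, since the column headings did not survive; the dictionary and function names are hypothetical.

```python
# Values transcribed from Table II. Each pair is (fill-in, M-C);
# that column assignment is an inference, not a heading from the table.
TABLE_II = {
    "A": {"lines": 30, "items": (21, 15), "good": (18, 15)},
    "B": {"lines": 28, "items": (28, 11), "good": (23, 10)},
    "C": {"lines": 30, "items": (24, 9), "good": (22, 9)},
    "D": {"lines": 36, "items": (20, 13), "good": (14, 12)},
}


def percent_good(good, items):
    """Share of items with discrimination index above .30, as a whole percent."""
    return round(100 * good / items)


def items_per_line(items, lines):
    """Number of test items obtained per line of text."""
    return items / lines


for text, row in TABLE_II.items():
    for fmt, items, good in zip(("fill-in", "M-C"), row["items"], row["good"]):
        print(text, fmt, percent_good(good, items),
              round(items_per_line(items, row["lines"]), 2))
```

Under that reading of the columns, the recomputed percentages agree with the parenthesized row of the table, and the fill-in version yields more items per line than the M-C version for every text.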
TABLE IV
Textual
[Fill-in:]
  Number of questions          12      2       7
  Mean (% correct answers)     68      63      53
  Standard deviation           15.8    9.7     21.9
  Number of students: 73
  X = 63%
Multiple-choice:
  Number of questions          7       1       7
  Mean (% correct answers)     73      41      55
  Standard deviation           7.40    -       17.88
  Number of students: 192
  X = 62%
[...] texts as a combined single long text when comparing results with those of the fill-in. The first test battery contained fifty-six items and the second fifty-five.

The test batteries were administered as the English section of the entrance examination to the universities of Haifa and Tel Aviv during two consecutive years. The first year, the examination was administered to 7499 applicants; during the second year, 7114. The results of the two test batteries appear in Table V.

The Pearson correlation between total fill-in and multiple-choice scores was .75 for the first test battery and .79 for the second. These figures are considerably higher than those obtained in Part Four of this study, where the Pearson correlation ranged from .36 to .47. It must be remembered, however, that the correlations obtained in Part Four are based on only ten multiple-choice questions, whereas the present correlations are based on thirty-eight and forty-one multiple-choice questions, respectively.

When correlations are not excessively high, there is usually an external factor common to both tests. Each test, then, gives information of a different kind about students' reading comprehension. In this test battery, the fill-in subtests did not correlate so highly as to permit their substitution for the multiple-choice subtest. Since each test contributes another measure of information, both fill-in and M-C tests could be used in the complete test battery.

Part Four: Fill-In vs. M-C Test Formats. In assessing the fill-in, we wished to compare it with other types of reading comprehension tests.
TABLE V
TABLE VI
Comparison of Difficulty Levels of a M-C Test Battery and Three Fill-In Subtests
119   1   14.5  4.4    5.8  2.8    5.5  2.5    4.5  2.0    13.2  6.7
142   2   14.3  3.1    5.9  3.0    5.7  2.4    4.6  2.4    13.9  6.1
 93   3   13.1  3.1    5.6  2.9    5.5  2.4    4.4  2.3    13.4  6.6
Subtest: (1)  28     12     9     10     26
         (2)  28
         (3)  24
Accordingly, the three fill-in subtests described in Experiment One above were compared with another test battery which had been previously administered to 354 applicants to Haifa University.17 The same students took both the multiple-choice/cloze test battery and one fill-in subtest.

The English section of the entrance examination consisted of fifty-seven test items and was seventy-five minutes in duration. Each of the items was selected for level of difficulty and discrimination by pre-testing a similar population at Haifa University.

Within the multiple-choice framework, many types of testing exercises are possible. We used the following three multiple-choice subtests: 1) sentence completion subtest, which was a test of word form and syntax (i.e., content words and function words: conjunctions, prepositions), where the student chose the word(s) that best completed the sentence; 2) vocabulary substitution subtest, a test in which the examinee was asked to find the best synonym for the underlined word in each sentence; 3) multiple-choice comprehension subtest, a text ac- [...]

[...] incorrect according to comprehension (whether it was clear that the student understood the meaning of the context). Spelling errors were not counted. A panel of twelve teachers graded the examinations, and acceptable responses had to be agreed upon unanimously during the marking of papers. It was assumed that someone who was able to fill in the gaps in the text demonstrated the ability to read and understand the passage. Sentence completion and vocabulary substitution, the first two subtests, were short, consisting of only one or two sentences, whereas the multiple-choice comprehension and cloze subtests presented much longer and more complex reading passages.

If we compare the fill-in with each of the subtests in the test battery in terms of their respective difficulty levels (see Table VI), we obtain the following hierarchy: vocabulary is the easiest subtest. It is followed by the more difficult fill-in, cloze, multiple-choice, and sentence completion subtests, all of which are of approximately equal difficulty. This general pattern appears for each of the three fill-in subtests.
TABLE VII
Pearson Correlations Between Subtest Scores

                        [Sent. Compl.]  [Vocabulary]  [Mult.-Choice]  [Cloze]
Sentence Completion:
  Total                     1.00            .43           .64           .54
  Subtest 1                                 .39           .62           .54
  Subtest 2                                 .50           .66           .60
  Subtest 3                                 .35*          .64           .48
Vocabulary:
  Total                                     1.00          .34           .40
  Subtest 1                                               .27**         .41
  Subtest 2                                               .38           .42
  Subtest 3                                               .35*          .36
Multiple-Choice:
  Total                                                   1.00          .53
  Subtest 1                                                             .52
  Subtest 2                                                             .64
  Subtest 3                                                             .39
Cloze:
  Total                                                                 1.00
  Subtest 1
  Subtest 2
  Subtest 3
All correlations are significant, and all are p < .0001 except: *p < .001 and **p < .01.
***Total: Across all three fill-in subtests.
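For readers who want to see how a coefficient of the kind reported in Table VII is obtained from two sets of subtest scores, here is a minimal sketch of the Pearson product-moment correlation. The function name and the score lists are hypothetical, and the sketch makes no claim about how the original analysis was programmed.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical raw scores of six examinees on two subtests.
vocabulary = [5, 7, 6, 8, 4, 9]
fill_in = [12, 15, 16, 14, 10, 18]

print(round(pearson_r(vocabulary, fill_in), 2))
```

A moderate coefficient indicates that two subtests share a common factor without being interchangeable, which is the sense in which the correlations are read in the discussion.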
[...] some larger common factor, such as the testing of EFL reading comprehension. The lowest set of correlations was between vocabulary and M-C. Correlations between the subtests were not so high as to permit the possibility of substituting one for another. In this respect, each subtest appears to be tapping a different area of reading comprehension.

It is especially interesting to note that the fill-in subtests, multiple-choice versions of the modified cloze procedure, do not correlate highly with either the multiple-choice test or the random cloze passage. Thus, we may conclude that the fill-in test may be testing something different.

CONCLUSION

The purpose of the fill-in test is not to replace the conventional M-C test but to offer an additional multiple-choice test format. The fill-in tests reading comprehension - n[ot only] words and word forms at the micro-le[vel but,] more importantly, the ability to follow [a logi]cal thought sequence at the macro-level [of mean]ing.

Statistically, the fill-in test measures up to the traditional M-C test. A test constructed in the fill-in format will probably have items of the same average difficulty and effectiveness as if it had been constructed in the conventional M-C format. The only differences are that the fill-in will probably have more test items (administered in the same amount of time), and therefore the reliability will be slightly higher, and that it will be more difficult to write macro-level test items for the fill-in test. On the whole, however, it is easier to write fill-in test items.

For these reasons, we have used a fill-in subtest in our English examination. We believe that its inclusion improves the quality and efficiency of the test battery as a whole.
[...] guage," TESOL Quarterly, 13 (1979), pp. 219-27; Laura K. Heilenman, "The Use of a Cloze Procedure in Foreign Language Placement," Modern Language Journal, 67 (1983), pp. 121-26; Charles Stansfield & Jacqueline Hansen, "Field Dependence as a Variable in Second Language Cloze Test Performance," TESOL Quarterly, 17 (1983), pp. 29-38.

2 Taylor, Oller and his colleagues, Heilenman, Stansfield, and Hansen are in favor of random cloze as are John Guthrie, "Reading Comprehension and Syntactic Responses in Good and Poor Readers," Journal of Educational Psychology, 65 (1973), pp. 294-99; Don Porter, "Modified Cloze Procedure: A More Valid Reading Comprehension Test," English Language Teaching Journal, 30 (1975), pp. 151-55; Jon Jonz, "Improving on the Basic Egg: The M-C Cloze," Language Learning, 26 (1976), pp. 255-65. In favor of non-random, rational cloze are A. Cranney, "The Construction of Two Types of Cloze Reading Tests for College Students," Journal of Reading Behavior, 5 (1972-73), pp. 60-64; Alderson; Frank Greene, "Modification of the Cloze Procedure and Changes in Reading Test Performances," Journal of Educational Measurement, 2 (1965), pp. 213-17; Christine Klein-Braley, "Empirical Investigations of Cloze Tests: An Examination of the Validity of Cloze Tests as Tests of General Language Proficiency in English for German University Students," Diss., Univ. of Duisburg, 1981; Lyle Bachman, "The Trait Structure of Cloze Test Scores," TESOL Quarterly, 16 (1982), pp. 61-70; E. A. Levenston, R. Nir & S. Blum-Kulka, "Discourse Analysis and the Testing of Reading Comprehension by Cloze Techniques," pre- [...]

9 Greene produced a rational cloze that was equal in difficulty but more reliable than the random cloze by selecting content words (nouns, verbs, adverbs, and adjectives). Another non-random cloze procedure, the "Discourse Cloze," was developed by Levenston, Nir, and Blum-Kulka.

10 Ozete, Guthrie, Cranney, Porter, and Jonz (notes 1 & 2 above).

11 Other approaches are possible; for example, the rational "Discourse Cloze," where the deletions consist solely of cohesion markers on the macro-level, presented by Levenston, Nir, and Blum-Kulka. Thus, a cloze or fill-in test on the macro-level only is also possible.

12 Van Dijk; Levenston, Nir, and Blum-Kulka; M. A. K. Halliday & Ruqaiya Hasan, Cohesion in English (London: Longman, 1976).

13 See Jonz, Alderson, Klein-Braley (note 2 above).

14 See Jonz (note 2 above).

15 It was suggested by Valerie Whiteson, Department of English, Bar-Ilan University, that a pre-test using the cloze would be successful provided that the English proficiency of the students was high enough, that is, near native level (personal communication).

16 Rachel Ramraz, "ITANA V: Computer Program for Item Analysis," Report No. 40 (Haifa: Univ. Selection & Assessment Unit, 1977).

17 See Marsha Bensoussan, "A Comparison of Cloze and Multiple-Choice Reading Comprehension Tests," Report No. 57 (Haifa: Univ. Selection & Assessment Unit, 1981). Kuder-Richardson no. 20 Reliability = .93.
APPENDIX
A conflict between two worlds ........... is developing ........... On one side ........... (introducto
part of the text, containing 3 items: A, B, C) On the other side, demanding an ever larger share of th
are about 100 underdeveloped poor (E) with 2 billion people--millions of whom (F) in the sh
of death by starvation or disease. (G) , the conflict has been limited to economic pressures and proposa
(H) in international forums............
In the U.N. General Assembly, (L) they are now a solid vo
    1                2                3                4
D.  poverty          idealism         wealth*          economy          (1, 3)
E.  people           states*          living           industry         (1, 3)
F.  attempt          work             exist*           worry            (1)
G.  Finally,         However,         After which,     So far,*         (3)
H.  speeches*        industry         states           producing
L.  if               where*           although         how              (1)
M.  economic*        cultural         produce          politics         (1, 3)
N.  in addition to   as a result of   on behalf of     in exchange of   (1)
O.  said             denounced*       praised          told             (1)
*Correct answer.
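The item format described under Fill-in Test Construction (four choices per blank, with the blank replacing a word or phrase of at most three words) can be sketched as a small data structure with a scoring routine. The class and function names are hypothetical and are not taken from the article. The options and keys are transcribed from the sample items above. Item N is left out because its keyed answer is not marked in this copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FillInItem:
    label: str        # item letter as printed in the sample
    options: tuple    # the four printed choices, in order
    correct: int      # 1-based position of the starred choice


ITEMS = [
    FillInItem("D", ("poverty", "idealism", "wealth", "economy"), 3),
    FillInItem("E", ("people", "states", "living", "industry"), 2),
    FillInItem("F", ("attempt", "work", "exist", "worry"), 3),
    FillInItem("G", ("Finally,", "However,", "After which,", "So far,"), 4),
    FillInItem("H", ("speeches", "industry", "states", "producing"), 1),
    FillInItem("L", ("if", "where", "although", "how"), 2),
    FillInItem("M", ("economic", "cultural", "produce", "politics"), 1),
    FillInItem("O", ("said", "denounced", "praised", "told"), 2),
]


def is_well_formed(item):
    """Check the stated constraints: four choices, keyed answer of 1-3 words."""
    if len(item.options) != 4 or not 1 <= item.correct <= 4:
        return False
    keyed_answer = item.options[item.correct - 1]
    return 1 <= len(keyed_answer.split()) <= 3


def score(responses, items=ITEMS):
    """Count correct choices; responses maps item label to the chosen position (1-4)."""
    return sum(1 for item in items if responses.get(item.label) == item.correct)


# Hypothetical examinee who answers three items and gets two right.
print(score({"D": 3, "E": 2, "F": 1}))
```

Because each item is scored independently, dropping an unsuitable item after pre-testing only removes one entry from the list and leaves the rest of the test intact.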
1 ......... (5 lines of text) On the other side, demanding an ever larger share of that wealth, are about 100
undeveloped poor states with 2 billion people - millions of whom exist in the shadow of death by starvation or disease.
3 So far, the conflict has been limited to economic pressures and proposals, and speeches in international forums.
In the U.N. General Assembly, where they now constitute a solid voting bloc, the developing states
have approved resolutions that demand a "new international economic order." The meaning: massive and painful
6 sacrifices by the rich on behalf of the poor. So one-sided have the Assembly's actions become that the U.S. has denounced
them as "a tyranny of the majority."
Questions on Text A
Textual Clues
(G,H) *4. have already put some pressure on the richer nations (1)
B. In the U.N. General Assembly, new resolutions demand that
*1. the rich nations give much more to the poor nations
2. the rich nations approve of more poor nations
3. the poor nations sacrifice more for the rich nations
(L, M, N) 4. the poor nations make more massive efforts for the rich nations
C. According to lines 6-7, the U.S. thinks that the Assembly's resolutions
1. are fair to all nations