
Testing EFL Reading Comprehension Using a Multiple-Choice Rational Cloze

Author(s): Marsha Bensoussan and Rachel Ramraz


Source: The Modern Language Journal, Vol. 68, No. 3 (Autumn, 1984), pp. 230-239
Published by: Wiley on behalf of the National Federation of Modern Language Teachers
Associations
Stable URL: https://www.jstor.org/stable/328006
Accessed: 23-05-2019 04:40 UTC

This content downloaded from 128.250.144.144 on Thu, 23 May 2019 04:40:34 UTC
All use subject to https://about.jstor.org/terms
Testing EFL Reading Comprehension
Using a Multiple-Choice Rational Cloze
MARSHA BENSOUSSAN and RACHEL RAMRAZ

The Modern Language Journal, 68, iii (1984)
0026-7902/84/0003/230 $1.50/0
©1984 The Modern Language Journal

There is nothing new about testing EFL reading comprehension by means of the cloze procedure. In a cloze test, the examinee is required to fill in gaps in a given text. After Taylor developed the cloze to measure text difficulty for native readers, other researchers showed that it can measure foreign and second language proficiency as well.1 Opinions differ as to whether the random cloze (every nth word omitted) or rational cloze (the test designer decides which words to omit) is a better test of reading comprehension.2 We chose to use the rational cloze in order to focus our test of EFL reading comprehension on specific parts of speech or content words, according to our needs.

A major recent development has been the addition of alternate responses from which the examinee must choose the correct words to fill in the gaps in the text.3 Various researchers have advocated presenting two, three, or four alternate responses for each blank space in the text.4 We have used four.

The final question is whether an item tests only a limited section of text (i.e., phrase or clause: micro-level) or whether it reveals comprehension of a more general kind (i.e., inter-sentence, inter-paragraph: discourse or macro-level).5 What we believe to be a new development in this field is the multiple-choice version of a modified rational cloze procedure which permits the test designers to focus on a desired amount of text, whether on the micro- or macro-level. By choosing both the gaps in the text as well as the alternative responses presented to the examinee, test designers can direct the examinee's responses and are thus able to construct tests to suit their specific needs. To avoid confusing this test format with other cloze procedures, we will call this one the fill-in test.

The purpose of this paper is to discuss the fill-in test, to explain how to construct it, to give statistical evidence for its effectiveness, and to compare it with the more conventional multiple-choice and cloze tests. The fill-in test was developed by the writers over a period of eight years in an attempt to improve the English section of the psychometric entrance examination which was administered between 1973-83 to approximately 13,000 students yearly at both Haifa and Tel Aviv universities.

Because of the large number of examinees, there were certain restrictions on test format. We were limited to multiple-choice scoring. Another limitation was time: sixty minutes to diagnose English proficiency and to place students in appropriate classes. We needed to receive the maximum amount of information about students' reading ability in this limited time.

CONVENTIONAL TESTS: DISADVANTAGES

Traditionally, reading tests include [...] with multiple-choice questions as well [as sen]tence completion items which were [...] specific points of grammar and vocab[ulary. In] many ways, however, this framework [is ineffi]cient. In the multiple-choice tests the t[ext-item] ratio is unfavorable; a student is req[uired to] read a great many lines of text in o[rder to] answer relatively few questions (400 w[ords of] text yielding ten questions in thirty m[inutes, as] opposed to other psychometric exam[ination] subtests consisting of 25-40 questions i[n ...] minutes). For the other tests, in additio[n to the] usual warm-up time required at the b[eginning] of an examination, readjustment time [...] whenever an examinee begins a new [...]. The greater the number of subtests, t[he ...] time spent readjusting - using preci[ous ...] that should be used in reading text [...] answering questions.

Multiple-choice tests are problem[atic in] other ways, too. After pre-testing, when


[unsuit]able questions are discarded, if an insufficient number of questions remains, the entire text must be discarded or re-pretested until results are satisfactory. Moreover, a theoretical issue arises because it is not clear whether multiple-choice scores reflect test comprehension, ability to choose the correct distractor, or both.6 Despite these disadvantages, multiple-choice questions are useful because of the efficient scoring. An even more important advantage is that a wide range of skills may be tested by this method.

ALTERNATE SOLUTION: CLOZE PROCEDURE

A number of researchers have proposed the random cloze procedure for testing foreign language reading comprehension.7 It has a good text-item ratio and requires relatively little readjustment time between items. The disadvantages, however, are great. The random cloze cannot be pre-tested, since by re-inserting an unsuccessful word into a gap, the test designer would double the permitted span between gaps (i.e., fifteen instead of the required seven), and it would no longer be a random cloze test. Another disadvantage is theoretical: even though omitting every nth word may give the test designer a random sample of the text, it still will not give direct information on whether examinees know specific words or phrases in the text.

In a non-random or rational cloze, the test designer decides how many words to delete.8 Thus the two central problems of test construction would be solved: the number of words between gaps no longer matters and test designers can decide on the words they wish to test.9 However, marking is awkward and time-consuming, and therefore we could not use the rational cloze in our English entrance examination.

For our purposes, we needed a multiple-choice version of the cloze. Those discussed in the literature are based on random cloze procedure, with a gap after every nth word.10 Although this test format suited our needs, we wished to have more control over items. In any case, as a result of the multiple-choice format, test designers present the examinee with specific choices, thus counteracting whatever random effect the omission of every nth word is supposed to have. Thus, it is no longer a purely random cloze. We opted instead for the rational cloze and included, as nearly as possible, as wide a range of items as could be tested by means of the conventional multiple-choice reading comprehension test.11

SOLUTION: FILL-IN TEST

Linguistic Basis. According to discours[e analy]sis theory, a text has three levels of m[eaning]: 1) the micro-level, focusing on the lexi[cal ...] of words and their interaction with other words in the context; 2) the pragmatic level, which is extra-textual and draws on the reader's general knowledge of the world; and 3) the macro-level, dealing with the functions of the sentences and the structure of the text as a whole.12 Excerpts from parallel sections of one of the texts used in Experiment Two, once as a fill-in test and once as a multiple-choice comprehension test, appear below in the Appendix. Items are coded according to their level of meaning. Although it may not be totally evident from the short sample we selected, we aimed to include as many macro-level items as possible in both fill-in and multiple-choice tests (see Experiment Two and Table III).

Micro-level items would test specific understanding of a word or collocation where the clue(s) appeared in close proximity to the blank (one or two words before and/or after). Macro-level items would test a more general understanding of larger units of text (e.g., writer's opinion, words showing comprehension of key concepts, function words signalling contrast/opposition, main idea of paragraph).

Fill-in Test Construction. Twenty to thirty blank spaces selected by the test designers were inserted into a 300-word text. Each blank space takes the place of a word or phrase (of not more than three words), and for each blank space, there is a choice of four possible answers. The fill-in test modifies the cloze procedure in three ways: 1) possible responses are already provided in a multiple-choice format - unlike the cloze, which contains gaps in the text that need to be filled in; 2) unlike the cloze test, a blank space in a fill-in test can take the place of more than one word; 3) blank spaces are placed not after every nth word, but within a range of seven to fifteen words or more. Each blank space reflects the comprehension of either micro-level (i.e., word, phrase, or clause) or macro-level (i.e., sentence, paragraph, or whole-text) meaning in the text. As much as possible, we focused the fill-in test items to

include sentence, paragraph, and even text meaning.

The basic criterion for choosing blank sp[aces] is enough redundancy in the text so that [a pro]ficient reader could use the clues to fill in the gap with an appropriate word or expression. Moreover, in choosing the blanks, the test constructor should focus on pivotal or key words in a logical argument to see whether a student can follow the thought sequence.

Since it is relatively difficult to find words that would indicate a student's knowledge of the whole text, preference could be given to first placing blanks which test general reading comprehension. Function words, such as "however" and "therefore," would be good places for blanks. Other items tested could be cohesive markers such as "not only . . . but also," "either . . . or," and "on one hand . . . on the other hand." It is assumed that a student's recognition of these syntactic devices would enable him to follow the flow of an argument, and that lack of recognition would impede his comprehension.

Content words such as nouns, adjectives, and verbs which carry the weight of an argument would also be useful, and their opposites would be included among the distractors. In this way, the test constructor may suggest alternate misleading logical thought sequences, but only one set of choices would be consistent with the writer's intentions within the text as a whole.

When pre-testing it is advisable to place approximately fifty percent more blank spaces than needed, even though shrinkage is usually minimal. For example, if fifteen items are required for a test, twenty to twenty-five may be pre-tested. Afterwards, when unsuitable test items are eliminated, these gaps can either be filled with the original word, or else the whole phrase in which it appears (provided it is not a key phrase) may be eliminated. The fill-in test still remains intact because each item is independent of the others, even after many of the items have been eliminated.13

In thinking up alternate responses (distractors) the test constructor would be expected to use words focusing on a particular point, either in terms of content or structure. It was found, in fact, that items focusing on a single point (in which students were to choose from among four adjectives) were more successful than items w[...] unclearly focused (alternate responses include two adjectives and two conjunctions, so the student does not understand if he is to look for the content or form or both). The student probably does not understand the point that is being tested, and therefore the results would not really show whether the student has understood the text.

We aimed for a wide variety of items. For this reason half the blanks represent content words (nouns, verbs, adjectives, adverbs) and the other half function words (conjunctions, prepositions, word forms). Test constructors are free, however, to focus the test on whatever level of discourse they wish to test.

Rationale for Determining Distractors. In choosing distractors, the test constructor may make use of collocations, presenting words that could appear together and make sense in some other context. For this reason, opposites are particularly useful. They test the student's understanding of the whole text. Conjunctions are also helpful here. A student choosing "therefore" when only "however" would fit the context may have understood a particular sentence, but certainly did not grasp how the sentence fit into the context as a whole.

For testing English as a foreign language, synonymous distractors should not be used. It is advisable to avoid distractors where the correct choice is ambiguous even for native speakers. Thus, one should also avoid asking about detailed grammatical points (e.g., the distinction between it's and its) or prepositions which may also be confused by native speakers. The fill-in test is essentially a test of reading comprehension, not of grammar.

If placement of the blank spaces is based on a linguistic examination of the text rather than at random, it might be argued that one way of constructing the fill-in test would be to pre-test it as a cloze. Likely blank spaces could be chosen by the test constructor, and the possible distractors, it might be supposed, might be found from among the students' wrong answers.14 Experience shows, however, that only about one-third of the test items can be obtained in this way. This method is very time-consuming and yields relatively little in return.15 The best way to construct the fill-in test, given the level of English proficiency of our students, is to decide beforehand the structures and ideas that are to be tested and to place blanks where these points are likely to be tapped.

FILL-IN VS. MULTIPLE-CHOICE TEST

Having developed the basic outlines of the fill-in test, we needed statistical proof that it would do its job as well as the more well-known, conventional type of multiple-choice test. Four experiments were conducted using four separate test batteries which compared test items and scores of the fill-in test with those of its multiple-choice counterpart using a computerized item analysis procedure.16 Number of items per line of text, difficulty levels of items, extent and function of test questions, correlations between scores, and test reliability were examined.

Experiment One: Test Difficulty. The sample consisted of 435 first-year students taking the advanced reading course in English as a Foreign Language at Haifa University in 1973. Each student took one of three English tests consisting of four subtests: one fill-in test and three multiple-choice (M-C) tests (texts accompanied by multiple-choice content questions as well as by vocabulary and reference questions). A description of Experiment One is given in Table I. In this first test battery, multiple-choice and fill-in tests appear to be of approximately equivalent length and difficulty.

In terms of item easiness, Table I indicates that fill-in items are on a par with those of the conventional multiple-choice tests. Although the fill-in test constituted little more than one-third of the total number of lines of text, it contributed at least one-half of the total number of test items. Reliability was high.

Experiment Two: Same Texts, Different Formats. A question arises concerning the difference between the two types of tests. Although they may yield similar statistical results, one could not go as far as to say that the fill-in and multiple-choice (M-C) tests examine the same skills. Nevertheless, both test the reading comprehension of a particular text. In order to have a better basis for comparison, it was decided to take four texts and test each twice, using a different format and different students each time. Each test was constructed first in the conventional multiple-choice format, and the second time the same text was used to construct the new fill-in format.

A total of 1487 applicants to the first year of studies at Haifa University were tested. Most were high school graduates who had had seven to eight years of English. At random, each student received one text with questions. A comparison of the statistical results appears in Table II. An examination of the table shows that the fill-in format yields more items. Since there are greater differences among the average raw scores for Texts A, B, C, and D than there are between the M-C and fill-in versions of each text, the results would indicate that the choice of text may be more important than the format by which it is tested.
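The statement in Experiment One that the fill-in test supplies little more than one-third of the lines of text but at least one-half of the items can be checked against the figures in Table I. The following is a minimal sketch of that arithmetic; the line and item counts are copied from Table I, while the names `TABLE_I` and `fill_in_shares` are illustrative and not part of the original study.

```python
# Lines of text and number of items per subtest, copied from Table I.
# Each tuple is (lines, items); the first tuple of each test is the fill-in subtest,
# followed by the three multiple-choice subtests (M-C A, B, C).
TABLE_I = {
    "English Test 1": [(30, 28), (5, 6), (18, 11), (24, 8)],
    "English Test 2": [(27, 28), (7, 7), (13, 6), (26, 11)],
    "English Test 3": [(32, 24), (12, 6), (14, 7), (21, 10)],
}


def fill_in_shares(subtests):
    """Return the fill-in subtest's share of all lines and of all items."""
    total_lines = sum(lines for lines, _ in subtests)
    total_items = sum(items for _, items in subtests)
    fill_lines, fill_items = subtests[0]
    return fill_lines / total_lines, fill_items / total_items


for name, subtests in TABLE_I.items():
    line_share, item_share = fill_in_shares(subtests)
    print(f"{name}: {line_share:.0%} of lines, {item_share:.0%} of items")
```

On these figures the fill-in subtest accounts for roughly 37 to 41 percent of the lines and 51 to 54 percent of the items in each of the three tests.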

TABLE I

Description of Three English Tests

Test** (n = 435)   English Test 1 (n = 154)   English Test 2 (n = 170)   English Test 3 (n = [...])

                   Number  Number  EI*        Number  Number  EI*        Number  Number  EI*
Subtest            lines   items   (%)        lines   items   (%)        lines   items   (%)
(1) Fill-in          30      28     52          27      28     56          32      24     55
(2) M-C A***          5       6     48           7       7     50          12       6     63
(3) M-C B            18      11     54          13       6     42          14       7     43
(4) M-C C            24       8     51          26      11     43          21      10     50

Total                77      53                 73      52                 79      47

Reliability:
Kuder-Richardson         .828                       .709                       .761
Split-Half               .8669                      .6613                      .7963

*EI = average of the Easiness Indic[es ...]
**n = number of subjects.
***The relatively large number of it[ems ...]
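Table I reports an easiness index together with Kuder-Richardson and split-half reliabilities, but the paper does not print the formulas or the item-level data. The sketch below therefore rests on stated assumptions: the easiness index is taken to be the percentage of examinees answering an item correctly, the Kuder-Richardson figure is taken to be the standard KR-20 formula, and the split-half figure is taken to be an odd-even split with the Spearman-Brown correction. The response matrix is invented purely for illustration.

```python
from math import sqrt
from statistics import pvariance


def easiness_index(responses):
    """Percent of examinees answering each item correctly (assumed definition)."""
    n = len(responses)
    k = len(responses[0])
    return [100.0 * sum(row[j] for row in responses) / n for j in range(k)]


def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomous (0/1) item scores."""
    n = len(responses)
    k = len(responses[0])
    total_variance = pvariance([sum(row) for row in responses])
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / total_variance)


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half(responses):
    """Odd-even split-half reliability with the Spearman-Brown correction."""
    first_half = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    second_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r = pearson(first_half, second_half)
    return 2 * r / (1 + r)


# Hypothetical data: 5 examinees (rows) by 4 items (columns), 1 = correct.
SAMPLE = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(easiness_index(SAMPLE))
print(round(kr20(SAMPLE), 3), round(split_half(SAMPLE), 3))
```

Population variance is used throughout so that the item variances and the total-score variance are on the same footing; a sample-variance version would give slightly different values.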

TABLE II

A Comparison between Fill-In and Multiple-Choice Tests on Same Text

                              Text A           Text B           Text C           Text D
                           Fill-in   M-C    Fill-in   M-C    Fill-in   M-C    Fill-in   M-C

Number of lines               30      30       28      28       30      30       36      36
Number of students            73     192      204     180      201     213      213     211
Number of items               21      15       28      11       24       9       20      13
Number of good items          18      15       23      10       22       9       14      12
  Disc. Index >.30 (%)       (86)   (100)     (82)    (91)     (92)   (100)     (70)    (92)
Median Disc. Index*          .46     .50      .41     .49      .44     .45      .42     .42

Reliability:
Kuder-Richardson (20)        .80     .76      .79     .63      .80     .58      .69     .62
Split-half                   .80     .78      .78     .62      .80     .48      .67     .58
Score (%)                     63      62     6[?]      64       75      72       46      44
Standard deviation         18.35   16.56    14.45   18.41    17.26   11.43    19.50   14.86

*Discrimination Index = point biserial correlation between the resp[...] considered effectively able to discriminate between good and weak [...] .30 but less than .60.
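The discrimination index in Table II is described as a point biserial correlation, and "good items" are those with an index above .30. Because the table note is cut off in this copy, the sketch below assumes the usual definition, namely the correlation between the 0/1 response to an item and the examinee's total score. The function names and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(item, totals):
    """Point biserial correlation between a 0/1 item and total scores.

    Uses r = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean totals
    of examinees who got the item right and wrong, s is the population
    standard deviation of the totals, and p is the proportion correct.
    """
    right = [t for x, t in zip(item, totals) if x == 1]
    wrong = [t for x, t in zip(item, totals) if x == 0]
    s = pstdev(totals)
    if not right or not wrong or s == 0:
        return 0.0  # item or totals have no variance, so nothing to discriminate
    p = len(right) / len(item)
    return (mean(right) - mean(wrong)) / s * sqrt(p * (1 - p))


def discrimination_indices(responses):
    """Discrimination index of every item in an examinee-by-item 0/1 matrix."""
    totals = [sum(row) for row in responses]
    k = len(responses[0])
    return [point_biserial([row[j] for row in responses], totals) for j in range(k)]


def good_items(responses, threshold=0.30):
    """Zero-based positions of items whose index exceeds the threshold."""
    return [j for j, d in enumerate(discrimination_indices(responses)) if d > threshold]


# Hypothetical data: 5 examinees (rows) by 4 items (columns), 1 = correct.
SAMPLE = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print([round(d, 2) for d in discrimination_indices(SAMPLE)])
print(good_items(SAMPLE))
```

The total score here includes the item being analysed; whether the original item analysis corrected for that overlap is not stated in the source.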

Having established that the fill-in is as good a test as the M-C test, we now ask ourselves about the nature of its function and whether this is different from that of the M-C test. A section of Test A in each of the test formats appears in the Appendix. Looking at the data from the point of view of discourse analysis, we broke down the results according to our criteria for choosing blank spaces (micro-level, pragmatic level, and macro-level meanings). The distribution of macro-level items is similar for both M-C and fill-in tests. As would be expected, however, the fill-in test has more micro-level linguistic items (see Table III).

Although the M-C test and the fill-in test may not test the same reading comprehension skills, they both require the reader to focus on a specific amount of text in order to answer a question. This area of focus can vary from one word to the extent of the entire text. The fill-in test need not be restrict[ed ...] of single words; it can be u[sed ... under]standing of the wider conte[xt ... pro]portions as the M-C test. In [... fre]quent kind of item include[s ...] and macro-textual clues alth[ough ...] has more linguistic clues an[d ...] proportionately more textu[al ...]. Like the M-C test, the fil[l-in ...] both micro-level and mac[ro-level ...].

From the point of view of [...] questions were separated ac[cording to ...] of context included, and m[eans ...] for each category (see Tabl[e IV]). [...] C and fill-in tests, linguist[ic ...] were easiest (means: fill-[in = 68%, M-C =] 73%). Excluding the one ([...]tional) question in the M-C [...] the macro-level items were [...] (fill-in = 53%, M-C = 55%).

TABLE III

Breakdown of Items According to Amount of Text Included: M-C and Fill-In Tests A

                                   Number of Items
Category                        Fill-in Test A   M-C Test A
1. Micro-level meaning                12              7
2. Pragmatic level meaning             2              1
3. Macro-level meaning                 7              7
Total number of items                 21             15

Part Three: Two Test Batteri[es ...] and Multiple-Choice Items. [...] compared with multiple-ch[oice ...] in two additional test batte[ries ...] of fill-in and multiple-choic[e ... accord]ing to the following forma[t]:

Multiple-Choice Subtest A
Multiple-Choice Subtest B
Multiple-Choice Subtest C
Fill-in Subtest D
Multiple-Choice Subtest E

The texts were graded from A, the easiest, to E, the most difficult. For convenience, it was decided to consider all four multiple-choice

TABLE IV

Experiment Two Text-Question Difficulty

                                Linguistic    Pragmatic    Textual (cohesion)
Textual clues                   (micro-)      (micro-)     (macro-)

Fill-in:
Number of questions                12             2             7
Mean (% correct answers)           68            63            53
Standard deviation                 15.8           9.7          21.9
Number of students: 73
X = 63%

Multiple-choice:
Number of questions                 7             1             7
Mean (% correct answers)           73            41            55
Standard deviation                  7.40          -            17.88
Number of students: 192
X = 62%

texts as a combined single long text when comparing results with those of the fill-in. The first test battery contained fifty-six items and the second fifty-five.

The test batteries were administered as the English section of the entrance examination to the universities of Haifa and Tel Aviv during two consecutive years. The first year, the examination was administered to 7499 applicants; during the second year, 7114. The results of the two test batteries appear in Table V.

The Pearson correlation between total fill-in and multiple-choice scores was .75 for the first test battery and .79 for the second. These figures are considerably higher than those obtained in Part Four of this study, where the Pearson Correlation ranged from .36 to .47. It must be remembered, however, that the correlations obtained in Part Four are based on only ten multiple-choice questions, whereas the present correlations are based on thirty-eight and forty-one multiple-choice questions, respectively.

When correlations are not excessively high, there is usually an external factor common to both tests. Each test, then, gives information of a different kind about students' reading comprehension. In this test battery, the fill-in subtests did not correlate so highly as to permit their substitution for the multiple-choice subtest. Since each test contributes another measure of information, both fill-in and M-C tests could be used in the complete test battery.

Part Four: Fill-In vs. M-C Test Formats. In assessing the fill-in, we wished to compare it with other types of reading comprehension tests.

TABLE V

A Comparison Between Fill-In and Multiple-Choice Tests

                                                                                     Pearson correlations between
       Number of  Total number             Number    Mean        Standard   K-R      total scores of fill-in and
Year   subjects   of test items  Subtest   of items  score (%)   deviation  rel.*    multiple-choice tests
1980     7499          56        Fill-In      15      9.0 (60)     3.98     .841              .748
                                 M-C          41     21.5 (52)     8.75     .903
1981     7114          55        Fill-In      17      8.5 (50)     4.32     .815              .798
                                 M-C          38     22.0 (58)     8.39     .880

*Average Kuder-Richardson #20 reliability for two parallel test forms.


TABLE VI

Comparison of Difficulty Levels of a M-C Test Battery and Three Fill-In Subtests

                                          Multiple-Choice Battery + Cloze
               Fill-In Subtest   Sent. Comp.    Vocabulary     M-C Text       Cloze
  N    Form     Mean     SD      Mean    SD     Mean    SD     Mean    SD     Mean    SD

 119    1       14.5     4.4      5.8    2.8     5.5    2.5     4.5    2.0    13.2    6.7
 142    2       14.3     3.1      5.9    3.0     5.7    2.4     4.6    2.4    13.9    6.1
  93    3       13.1     3.1      5.6    2.9     5.5    2.4     4.4    2.3    13.4    6.6

Total across all three subtests:
 354            14.1     3.6      5.8    2.9     5.6    2.4     4.5    2.2    13.5    6.4
Mean (%)        54                48             62             50            52

Number of items in each subtest:
Fill-in form (1)   28            12              9             10            26
Fill-in form (2)   28
Fill-in form (3)   24

Accordingly, the three fill-in subtests described in Experiment One above were compared with another test battery which had been previously administered to 354 applicants to Haifa University.17 The same students took both the multiple-choice/cloze test battery and one fill-in subtest.

The English section of the entrance examination consisted of fifty-seven test items and was seventy-five minutes in duration. Each of the items was selected for level of difficulty and discrimination by pre-testing a similar population at Haifa University.

Within the multiple-choice framework, many types of testing exercises are possible. We used the following three multiple-choice subtests: 1) sentence completion subtest, which was a test of word form and syntax (i.e., content words and function words-compositions, prepositions), where the student chose the word(s) that best completed the sentence; 2) vocabulary substitution subtest, a test in which the examinee was asked to find the best synonym for the underlined word in each sentence; 3) multiple-choice comprehension subtest, a text accompanied by multiple-choice questions about content, syntax, vocabulary, and reference; 4) rational cloze (same type as above, Experiments One, Two, and Three).

The cloze subtest consisted of a 313-word text containing twenty-four blank spaces. Each student response was marked either correct or incorrect according to comprehension (whether it was clear that the student understood the meaning of the context). Spelling errors were not counted. A panel of twelve teachers graded the examinations, and acceptable responses had to be agreed upon unanimously during the marking of papers. It was assumed that someone who was able to fill in the gaps in the text demonstrated the ability to read and understand the passage. Sentence completion and vocabulary substitution, the first two subtests, were short, consisting of only one or two sentences, whereas the multiple-choice comprehension and cloze subtests presented much longer and more complex reading passages.

If we compare the fill-in with each of the subtests in the test battery in terms of their respective difficulty levels (see Table VI), we obtain the following hierarchy: vocabulary is the easiest subtest. It is followed by the more difficult fill-in, cloze, multiple-choice, and sentence completion subtests, all three of which are approximately of equal difficulty. This general pattern appears for each of the three fill-in subtests.

In general, the subtests do not correlate highly with each other (see Table VII), although all correlations obtained were significant. The highest correlation (.64) was found between scores of the sentence completion subtest and the multiple-choice test. The relatively low correlations would indicate a relation with

TABLE VII

Pearson Correlations Between Subtest Scores

                                    Sentence                   Multiple-
Subtest                 Fill-In    Completion    Vocabulary     Choice      Cloze

Fill-In:
  ***Total               1.00         .50           .40          .41         .43
  Subtest 1                           .50           .38          .36         .38
  Subtest 2                           .53           .38          .47         .50
  Subtest 3                           .50           .46          .45         .48

Sentence Completion:
  Total                              1.00           .43          .64         .54
  Subtest 1                                         .39          .62         .54
  Subtest 2                                         .50          .66         .60
  Subtest 3                                         .35*         .64         .48

Vocabulary:
  Total                                            1.00          .34         .40
  Subtest 1                                                      .27**       .41
  Subtest 2                                                      .38         .42
  Subtest 3                                                      .35*        .36

Multiple-Choice:
  Total                                                         1.00         .53
  Subtest 1                                                                  .52
  Subtest 2                                                                  .64
  Subtest 3                                                                  .39

Cloze:
  Total                                                                     1.00

All correlations are significant, and all are p < .0001 except: *p < .001 and **p < .01.
***Total: Across all three fill-in subtests.
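A table of pairwise correlations such as Table VII can be produced from per-examinee subtest scores along the following lines. This is only an illustrative sketch: the scores are invented, the function names are hypothetical, and the original analysis was run with an item analysis program whose details are not given in this copy.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length score lists.

    Both lists must show some variation; a constant list gives a zero
    denominator and raises ZeroDivisionError.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_table(scores):
    """Correlation for every unordered pair of subtests (upper triangle only)."""
    return {(a, b): pearson(scores[a], scores[b]) for a, b in combinations(scores, 2)}


# Hypothetical raw scores for six examinees on three subtests.
SCORES = {
    "fill_in": [14, 18, 9, 20, 12, 16],
    "vocabulary": [5, 7, 6, 8, 4, 6],
    "cloze": [12, 15, 10, 19, 9, 14],
}

for (a, b), r in correlation_table(SCORES).items():
    print(f"{a} vs {b}: r = {r:.2f}")
```

Only the upper triangle is computed because the correlation of A with B equals that of B with A, which is also how Table VII is laid out.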

some larger common factor, such as the testing of EFL reading comprehension. The lowest set of correlations was between vocabulary and M-C. Correlations between the subtests were not so high as to permit the possibility of substituting one for another. In this respect, each subtest appears to be tapping a different area of reading comprehension.

It is especially interesting to note that the fill-in subtests, multiple-choice versions of the modified cloze procedure, do not correlate highly with either the multiple-choice test or the random cloze passage. Thus, we may conclude that the fill-in test may be testing something different.

CONCLUSION

The purpose of the fill-in test is not to replace the conventional M-C test but to offer an additional multiple-choice test format. The fill-in tests reading comprehension - n[ot ...] words and word forms at the micro-le[vel ...] more importantly, the ability to follow [a logi]cal thought sequence at the macro-level [... mean]ing.

Statistically, the fill-in test measures up to the traditional M-C test. A test constructed in the fill-in format will probably have items of the same average difficulty and effectiveness as if it had been constructed in the conventional M-C format. The only differences are that the fill-in will probably have more test items (administered in the same amount of time), and therefore the reliability will be slightly higher, and that it will be more difficult to write macro-level test items for the fill-in test. On the whole, however, it is easier to write fill-in test items.

For these reasons, we have used a fill-in subtest in our English examination. We believe that its inclusion improves the quality and efficiency of the test battery as a whole.


NOTES

1. Wilson L. Taylor, "Cloze Procedure: A New Tool for Measuring Readability," Journalism Quarterly, 30 (1953), pp. 415-33; John Oller, "Scoring Methods and Difficulty Levels for Cloze Tests in Proficiency in English as a Second Language," Modern Language Journal, 56 (1972), pp. 151-58; Patricia Irvine, Parvin Atai & John Oller, "Cloze, Dictation, and the Test of English as a Foreign Language," Language Learning, 24 (1974), pp. 245-53; Oscar Ozete, "The Cloze Procedure: A Modification," Foreign Language Annals, 10 (1977), pp. 565-68; J. Charles Alderson, "The Cloze Procedure and Proficiency in English as a Foreign Language," TESOL Quarterly, 13 (1979), pp. 219-27; Laura K. Heilenman, "The Use of a Cloze Procedure in Foreign Language Placement," Modern Language Journal, 67 (1983), pp. 121-26; Charles Stansfield & Jacqueline Hansen, "Field Dependence as a Variable in Second Language Cloze Test Performance," TESOL Quarterly, 17 (1983), pp. 29-38.

2. Taylor, Oller and his colleagues, Heilenman, Stansfield, and Hansen are in favor of random cloze as are John Guthrie, "Reading Comprehension and Syntactic Responses in Good and Poor Readers," Journal of Educational Psychology, 65 (1973), pp. 294-99; Don Porter, "Modified Cloze Procedure: A More Valid Reading Comprehension Test," English Language Teaching Journal, 30 (1975), pp. 151-55; Jon Jonz, "Improving on the Basic Egg: The M-C Cloze," Language Learning, 26 (1976), pp. 255-65. In favor of non-random, rational cloze are A. Cranney, "The Construction of Two Types of Cloze Reading Tests for College Students," Journal of Reading Behavior, 5 (1972-73), pp. 60-64; Alderson; Frank Greene, "Modification of the Cloze Procedure and Changes in Reading Test Performances," Journal of Educational Measurement, 2 (1965), pp. 213-17; Christine Klein-Braley, "Empirical Investigations of Cloze Tests: An Examination of the Validity of Cloze Tests as Tests of General Language Proficiency in English for German University Students," Diss., Univ. of Duisburg, 1981; Lyle Bachman, "The Trait Structure of Cloze Test Scores," TESOL Quarterly, 16 (1982), pp. 61-70; E. A. Levenston, R. Nir & S. Blum-Kulka, "Discourse Analysis and the Testing of Reading Comprehension by Cloze Techniques," presented at the International Symposium of Language for Special Purposes, Eindhoven, Netherlands, 1982.

3. Andrew Cohen, Testing Language Ability in the Classroom (Rowley, MA: Newbury House, 1980), pp. 94-95.

4. Ozete advocates two alternate responses, Guthrie favors three, and Porter, Jonz, and Cranney favor four.

5. Teun A. van Dijk, Macrostructures (Hillsdale, NJ: Erlbaum, 1980).

6. Marsha Bensoussan, "Testing the Test of EFL Reading Comprehension: To What Extent Does the Difficulty of a Multiple-Choice Comprehension Test Reflect the Difficulty of the Text?" System, 10 (1982), pp. 285-90; John Pikulski & Edna Pikulski, "Cloze, Maze, and Teacher Judgement," Reading Teacher, 30 (1977), pp. 766-70.

7. See note 2 above.

8. See note 2 above.

9. Greene produced a rational cloze that was equal in difficulty but more reliable than the random cloze by selecting content words (nouns, verbs, adverbs, and adjectives). Another non-random cloze procedure, the "Discourse Cloze," was developed by Levenston, Nir, and Blum-Kulka.

10. Ozete, Guthrie, Cranney, Porter, and Jonz (notes 1 & 2 above).

11. Other approaches are possible; for example, the rational "Discourse Cloze," where the deletions consist solely of cohesion markers on the macro-level, presented by Levenston, Nir, and Blum-Kulka. Thus, a cloze or fill-in test on the macro-level only is also possible.

12. Van Dijk; Levenston, Nir, and Blum-Kulka; M. A. K. Halliday & Ruqaiya Hasan, Cohesion in English (London: Longman, 1976).

13. See Jonz, Alderson, Klein-Braley (note 2 above).

14. See Jonz (note 2 above).

15. It was suggested by Valerie Whiteson, Department of English, Bar-Ilan University, that a pre-test using the cloze would be successful provided that the English proficiency of the students was high enough, that is, near native level (personal communication).

16. Rachel Ramraz, "ITANA V: Computer Program for Item Analysis," Report No. 40 (Haifa: Univ. Selection & Assessment Unit, 1977).

17. See Marsha Bensoussan, "A Comparison of Cloze and Multiple-Choice Reading Comprehension Tests," Report No. 57 (Haifa: Univ. Selection & Assessment Unit, 1981). Kuder-Richardson no. 20 Reliability = .93.
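The Kuder-Richardson no. 20 reliability cited in the notes can be illustrated with a minimal sketch. This is not the ITANA program and does not reproduce the authors' computation. The function name `kr20`, the toy response matrix, and the choice of population variance (dividing by the number of examinees) are assumptions made for illustration only; some textbooks use the sample variance instead.

```python
from typing import Sequence


def kr20(responses: Sequence[Sequence[int]]) -> float:
    """Kuder-Richardson formula 20 for dichotomously scored items.

    responses[i][j] is 1 if examinee i answered item j correctly, else 0.
    Assumption: the variance of total scores is the population variance.
    """
    n = len(responses)       # number of examinees
    k = len(responses[0])    # number of items
    if n < 2 or k < 2:
        raise ValueError("need at least two examinees and two items")

    # Proportion of examinees answering each item correctly (p); q = 1 - p.
    p = [sum(row[j] for row in responses) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)

    # Population variance of the examinees' total scores.
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n
    if variance == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - sum_pq / variance)


# Hypothetical data: four examinees, three items (not from the study).
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(kr20(toy), 2))  # 0.75
```

In the toy matrix the item variances sum to 0.625 and the total-score variance is 1.25, so the coefficient is (3/2)(1 - 0.5) = 0.75.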

APPENDIX

Experiment Two--Fill-in Test A

POOR vs. RICH: A NEW GLOBAL CONFLICT

A conflict between two worlds ........... is developing ........... On one side ........... (introductory part of the text, containing 3 items: A, B, C) On the other side, demanding an ever larger share of that (D) , are about 100 underdeveloped poor (E) with 2 billion people--millions of whom (F) in the shadow of death by starvation or disease. (G) , the conflict has been limited to economic pressures and proposals, and (H) in international forums............

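In the U.N. General Assembly, (L) they are now a solid voting bloc, the developing states have approved resolutions that demand a "new international (M) order." The meaning: massive and painful sacrifices by the rich (N) the poor. So one-sided have the Assembly's actions become that the U.S. has (O) them as "a tyranny of the majority."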

Questions on Fill-in Text A

Textual Clues: 1 = linguistic (micro-)
               2 = pragmatic (micro-)
               3 = textual (macro-)

Item  1                2                3                4                Textual Clues
D.    poverty          idealism         wealth*          economy          (1, 3)
E.    people           states*          living           industry         (1, 3)
F.    attempt          work             exist*           worry            (1)
G.    Finally,         However,         After which,     So far,*         (3)
H.    speeches*        industry         states           producing
L.    if               where*           although         how              (1)
M.    economic*        cultural         produce          politics         (1, 3)
N.    in addition to   as a result of   on behalf of*    in exchange of   (1)
O.    said             denounced*       praised          told             (1)
*Correct answer.

Experiment Two--Multiple-Choice Test A

1 ......... (5 lines of text) On the other side, demanding an ever larger share of that wealth, are about 100
2 undeveloped poor states with 2 billion people - millions of whom exist in the shadow of death by starvation or disease.
3 So far, the conflict has been limited to economic pressures and proposals, and speeches in international forums.
4 In the U.N. General Assembly, where they now constitute a solid voting bloc, the developing states
5 have approved resolutions that demand a "new international economic order." The meaning: massive and painful
6 sacrifices by the rich on behalf of the poor. So one-sided have the Assembly's actions become that the U.S. has denounced
7 them as "a tyranny of the majority."

Questions on Text A

A. According to lines 2-3, so far the underprivileged nations
    1. have done very little to help themselves
    2. have been in constant armed conflict
    3. have no demands at all on the richer nations
   *4. have already put some pressure on the richer nations
   Corresponding Fill-in Items: (G, H)    Textual Clues: (1)

B. In the U.N. General Assembly, new resolutions demand that
   *1. the rich nations give much more to the poor nations
    2. the rich nations approve of more poor nations
    3. the poor nations sacrifice more for the rich nations
    4. the poor nations make more massive efforts for the rich nations
   Corresponding Fill-in Items: (L, M, N)

C. According to lines 6-7, the U.S. thinks that the Assembly's resolutions
    1. are fair to all nations
    2. are in the interest only of the rich nations
   *3. are in the interest of only the poor nations
    4. are insufficient in the face of mass starvation
   Corresponding Fill-in Items: (O)    Textual Clues: (2)
*Correct answer.
