1 author:
Gunadi Sulistyo
State University of Malang
All content following this page was uploaded by Gunadi Sulistyo on 03 September 2015.
Gunadi H. Sulistyo
Abstract: This article briefly reviews the development of TOEFL as a widely acknowledged English proficiency test for non-native users. The specific aspects reviewed are the nature of TOEFL as a testing instrument, its historical development from the perspective of language and test development theories, and the testing formats of the language aspects in both earlier and later versions of TOEFL. The scoring applied in both versions is also compared.
The Test of English as a Foreign Language (henceforth TOEFL) has enjoyed prestigious status as a standardized test widely used in more than one hundred countries since its establishment in the early 1960s. It has been utilized as a means of measuring the proficiency of non-native speakers of English in English as a foreign language, as its name indicates, in particular for academic purposes. Not only international educational institutions but also several domestic higher-learning institutions and non-educational agencies have made use of TOEFL scores as a requirement for admission, recruitment, and exit purposes. This implies that many have relied on TOEFL as a dependable tool that can provide good evidence of one's proficiency in English as a foreign language.

It is believed that interest in taking TOEFL has been increasing, as shown by the number of prospective TOEFL takers from year to year. This indicates that the need for TOEFL scores has also grown from year to year. As a consequence, the need for TOEFL training is also inevitable, although it is not clear whether those prospective TOEFL takers take the test for further studies abroad or for other purposes. What is obvious is that such a demanding context triggers the establishment of preparatory courses that burgeon ubiquitously. It is an undeniable fact that such preparatory courses play a role in catering for the needs of prospective TOEFL candidates for TOEFL scores, apart from any interest in their establishment.

While the testing technologies adopted by TOEFL have advanced rapidly, and while the need for TOEFL scores tends to increase, it seems that those preparatory courses have not been able to catch up completely with, in particular, the advances in testing technologies employed by TOEFL. For example, on one side
117 BAHASA DAN SENI, Tahun 37, Nomor 2,Agustus 2009
layers bit by bit. In TOEFL this view is reflected clearly in the importance attached to accuracy of grammatical form, that is, to testing a TOEFL taker's grammatical knowledge. Language is also conceived as comprising two interacting components: language skills and language components. Language skills refer to the modes through which language components may be realized, which include listening, speaking, reading, and writing. Language components include grammar/structure, vocabulary, phonology/orthography, and fluency. Harris (1969:11) neatly illustrates the relation between language skills and language components, as the following matrix suggests.
Language     Language Components
Skills       Grammar/     Vocabulary   Phonology/     Rate and General
             Structure                 Orthography    Fluency
Listening    v            v            v              v
Speaking     v            v            v              v
Reading      v            v            v              v
Writing      v            v            v              v
with a test on vocabulary items which are presented in sentential contexts. This part also frequently begins with a short text, presumably a paragraph, with questions following the text. Items testing understanding of the meaning of a word in context are also included.

Seen from the testing technology adopted in the pBT version, the test format used in TOEFL fits the divisible nature of language. Apart from its Test of Written English (TWE) and Test of Spoken English (TSE), TOEFL employs the multiple-choice type with four options. In this selective type of response, there is a stem that functions as a stimulus to which the TOEFL takers respond. Following the stimulus are the alternatives, with one correct answer and three distracters, for the TOEFL takers to select from. The use of multiple-choice items enables the language elements to be measured bit by bit. In addition, the items in the test are presented in order of increasing difficulty. However, the item presentation is fixed in that the TOEFL takers have no choice but to complete the items presented to them, whether or not the difficulty level (p) of the items fits their language ability. The earlier items are those with a medium level of difficulty, or .60 ≤ p ≤ .80 (Crocker and Algina, 1986:312). This feature is understandable given the nature of the pBT format, which does not permit the difficulty level of the items to vary in line with the ability level of the test takers.

The cBT version. Seen from its underlying linguistic theory, the cBT version is basically still characterized by structural linguistic views. Thus, there have not been significant changes in the concepts reflecting general language ability adopted in the cBT version. Just like the pBT version, the earlier cBT version comprises three sub tests: the Listening Section, the Grammar and Written Expression Section, and the Reading Comprehension Section. The other two, which are normally tested separately, still include the Test of Written English (TWE) and the Test of Spoken English (TSE). The test format also remains the multiple-choice type with four alternatives. In later years, however, the cBT version adopted more and more communicative views of language competence.

The cBT version, however, may be classified into two types in terms of the testing technology adopted. The first cBT type is essentially like the pBT version in that it still assesses listening, grammar and written expression, and reading with the same sub tests distributed in 3 main sections. The test format is the same: the multiple-choice type with four alternatives, and the item presentation is fixed with an increasing level of difficulty. As such, the first cBT type may be known as the non-adaptive type of the cBT version. This first type is basically a computerized pBT version.

In the initial stage, the second cBT type is essentially like the non-adaptive cBT type in terms of contents and format: it measures listening comprehension, sensitivity to grammar and written expression, and reading comprehension with the same sub tests distributed in 3 main sections, and the test format is the same multiple-choice type with four alternatives. What differs essentially is the presentation of test items. This type still adopts a strategy of increasing difficulty, with earlier items being those with a medium level of difficulty. However, the items following the first items may vary depending on the responses of the TOEFL takers. Figure 4 describes the scheme of adaptive testing.
[Figure 4. Scheme of adaptive testing: a diagram of branching item sequences, with nodes labeled Item 2 to Item 5.]
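The branching idea behind adaptive testing can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the item-selection procedure actually used in TOEFL: the function `run_adaptive`, the three difficulty levels, and the item names are all hypothetical. The sketch assumes that a correct answer moves the test taker one difficulty level up, an incorrect answer moves one level down, and no item is presented twice.

```python
from collections import deque
from typing import Callable, List, Sequence


def run_adaptive(
    levels: Sequence[Sequence[str]],
    answer: Callable[[str], bool],
    start_level: int,
    n_items: int,
) -> List[str]:
    """Return the items presented, in order (hypothetical up/down rule).

    levels      -- item names grouped by difficulty, easiest group first
    answer      -- callable returning True when the item is answered correctly
    start_level -- index of the level used for the first item
    n_items     -- maximum number of items to present
    """
    queues = [deque(items) for items in levels]  # unused items per level
    level = start_level
    administered: List[str] = []
    for _ in range(n_items):
        if not queues[level]:
            break  # no unused item left at this difficulty level
        item = queues[level].popleft()
        administered.append(item)
        # Harder item after a correct answer, easier item after an incorrect one.
        step = 1 if answer(item) else -1
        level = max(0, min(len(queues) - 1, level + step))
    return administered


# A test taker who can answer easy and medium items but not hard ones,
# starting from a medium-difficulty item.
pool = [
    ["easy-1", "easy-2"],
    ["medium-1", "medium-2", "medium-3"],
    ["hard-1", "hard-2"],
]
print(run_adaptive(pool, lambda item: not item.startswith("hard"), 1, 5))
# ['medium-1', 'hard-1', 'medium-2', 'hard-2', 'medium-3']
```

In this sketch the sequence of items settles around the boundary between the medium and hard levels, which is the sense in which the presentation adapts to the test taker. An operational adaptive test would typically select items from an ability estimate rather than from fixed levels.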
sion of macro language skills: listening, speaking, reading, and writing. Language components are tested in integration within language skills rather than in isolation.

In addition to this shift, the new cBT also adopts a new testing mode that provides TOEFL takers with more opportunities to demonstrate their command of macro skills by constructing direct responses, thus beginning to leave the selective-response format behind. Another shift taking place in the later development of the new cBT relates to language processing. Unlike the old cBT version, which is essentially a computerized pBT version and elicits language abilities through a discrete mode of testing, the newer cBT begins to include test tasks that process language in a more integrative fashion in terms of language skills. Thus, writing or speaking tasks may be related to a reading or listening task. Prior to producing English through writing or speaking tasks, a candidate may be required to incorporate pieces of information that appear in a reading or listening task.

The iBT version. This version was launched in 2005 and is expected gradually to replace both the cBT and the pBT versions. The introduction of the new cBT version played a critical role in the establishment of the iBT version in that the new cBT version laid a strong transitional bridge to the era of the iBT version.

As has been discussed previously, from a linguistic standpoint, the later cBT version endeavored to feature the principles of the communicative movement in language testing in the corresponding TOEFL sub tests. As a further development of the later cBT version, the iBT version shares similar features with it. From a linguistic standpoint, the iBT version is also characterized by the need to test the candidates' functional language skills. It is designed to measure the integrated use of macro skills: listening, speaking, reading, and writing. Also, in this iBT version, academic settings and themes are more emphasized.

Introduced worldwide in the period of 2005-2006, the iBT version, as its name indicates, makes functional use of information and communication technology (ICT). The TOEFL tasks in the iBT version are delivered through the internet from Educational Testing Service (ETS) to the authorized testing centers where the candidates gather to complete the tasks.

BUILDING BLOCKS INSIDE ALL TOEFL VERSIONS

The three TOEFL versions, pBT, cBT, and iBT, each have their own characteristics viewed from two main points: what is tested and how it is tested. What follows is a brief account of each of the TOEFL versions seen from these two points.

In terms of what to test, the pBT version, influenced as it was by structural linguistic views, is characterized by discrete-testing practices. One language component (or maybe two) is obviously tested, namely grammar, under a separate section. Vocabulary is also tested, but is commonly put under reading. Two macro skills are tested, namely listening and reading. These three aspects, grammar, listening, and reading, together comprise one battery commonly known as the paper-based TOEFL. Two other macro skills, speaking and writing, are also tested as independent sets known as the Test of Spoken English (TSE) and the Test of Written English (TWE) respectively.

The listening section of the pBT version normally consists of a variety of tasks assessing three or four aspects: sentence-level comprehension, comprehension of dialogs, comprehension of extended conversations, and comprehension of mini talks. These aspects clearly indicate levels of how language is believed to be constructed. The social themes presented to assess these aspects are commonly of general interest. In mini talks, however, mini lectures are also presented. This is intended to represent academic settings.

The grammar section mainly aims at testing grammatical accuracy and, in one sense, grammar sensitivity. The grammatical points tested include a variety of English grammar aspects such as verbs, auxiliary verbs, nouns, pronouns, modifiers, comparatives, connectors, sentences and clauses, relationship of ideas, agreement, introductory verbal modifiers, parallel structures, redundancy, and word choice (Sharpe, 2005:86-113).

The reading section may consist of two main aspects to be tested: vocabulary and reading. Vocabulary includes the testing of word meanings and/or meanings of words in sentential contexts (Jenskins-Murphy, 1981). This includes, among other things, testing of shades of meaning of words, synonyms, antonyms, word-part clues, denotation, and connotation. Reading aims at assessing various micro reading skills, such as understanding the main idea, understanding supporting ideas/details, understanding the organization of the text, understanding implied details, understanding word meaning, understanding pronoun reference, and understanding the writer's tone (Phillips, 1989). The rhetorical modes of the text include, among other things, narration, definition/illustration, classification, comparison, contrast, cause, effect, persuasion/justification, and problem/solution (Sharpe, 2005:122-251).

The writing test assesses a candidate's ability to write an essay in expository or persuasive mode. The focus of testing is placed on the candidate's ability to organize ideas using accurate grammar, vocabulary, spelling, and mechanics. Just like the writing test, the speaking test aims at assessing a candidate's clarity in expressing his/her ideas in spoken expository English. The candidate's use of grammar, vocabulary, and pronunciation as well as idea organization is also evaluated.

In terms of how to test, the pBT version is basically non-adaptive testing. The items are arranged in a fixed order of increasing difficulty within the corresponding battery. Aspects to be tested are organized into sections indicating particular abilities to be assessed. For instance, the listening comprehension section is organized into three main parts: a sentence comprehension or dialog part of about 30 questions, an extended conversation part of about 5 questions, and a mini-talk part of about 5 questions. The grammar and written expression section is differentiated into two parts, error recognition and completion, with about 20 questions each. The reading comprehension section may take the form of 4-5 reading passages with about 7-9 comprehension questions following them. Vocabulary items are included in this part. Writing is an independent task in the pBT version which requires the test takers to respond to a writing task in handwriting. Normally, it takes 3 hours to complete the pBT version.

Basically there are not many differences between the pBT and cBT versions in terms of what to test. Thus, the cBT version tests both language components (grammar and possibly vocabulary) and skills (listening, reading, and writing). Slight differences are observed, however. In the listening section of the cBT version, fewer dialogs (about 20) are presented, with one question each. Extended conversations in the pBT version are modified a bit into short conversations in the cBT version, with about the same number, i.e. 3 short conversations with about 3 questions each. The pBT's mini-talks are also modified and specified as mini-lectures and discussions in the cBT version, with about 5 questions each. Also, the reading
Sulistyo, Toefl in A Brief Historical 124
section of the pBT and cBT versions is almost the same in terms of the number of passages and of comprehension questions following the passages, at about 5 passages and 10 questions each respectively. In terms of the writing tasks, the pBT and cBT versions are similar, i.e. one independent task focusing on exposition/persuasion. The scoring of writing, however, is combined with the structure section. Finally, the number of items in the structure section is reduced in the cBT version to about 25 questions from about 40 questions in the pBT version. All the sub tests in the cBT version take 3.5 hours to complete.

In terms of how to test, as has been dealt with previously, the cBT version is, psychometrically speaking, adaptive, in particular in the listening and the structure sections, but not in the reading and writing sections. The initial items presented are of moderate difficulty. The presentation of the items following these initial items depends on the answers to them. A more difficult item follows a correct answer; a less difficult item follows an incorrect answer, and so on. Simply put, the presentation of an item adapts to the TOEFL taker's level of ability, technically known as theta and symbolized as θ (Hulin, Drasgow, and Parsons, 1983:26).

The iBT version, as the next generation TOEFL, may be considered a significant innovation in the construction of TOEFL. It comes with not only a new format covering all macro skills but also a new mode of presentation. As aforementioned, the iBT version captures the views of the communicative approach. These are clearly reflected in the inclusion of all macro language skills (listening, speaking, reading, and writing) as integrated sub tests, while language components (grammar and vocabulary) are excluded as discrete sub tests. In addition, academic themes are also considered in the new version. This is to say that integration of language components into language skills and integrated processing of information within language skills, as well as an emphasis on academic settings, strongly characterize the iBT version.

A further examination of each sub section of the iBT version reveals that substantial changes have been made. In the listening section, more genuine academic conversations and lectures are presented. Questions that probe further into the speaker's mood, feeling, purpose, and drive are also posed to the test takers. More naturally, note taking, which was not permitted in the previous versions, is permitted in the iBT version. In terms of types of aural stimulus, listening tasks include two main formats: lectures and classroom discussions (6 texts) with about 5 corresponding questions each, and conversations in academic settings (about 3) accompanied by 5 questions each. With this format, the iBT version clearly presents more contextual academic materials and eliminates dialogs on general themes that are normally presented as fragments. The micro listening skills assessed include identification of main ideas, supporting details, inferences, functions, and the organizational structure of the text.

Unlike in the previous versions, where speaking was an optional sub test administered only on particular testing dates, in the iBT version speaking is an integral part of the battery. In all there are 6 tasks, comprising two independent tasks dealing with the expression of an opinion on a familiar academic topic and four integrated tasks requiring speaking on the basis of information picked up in the listening and reading sections. More specifically, in the integrated mode involving reading, listening, and speaking, test takers first read a text, listen to a text, and then speak about the relationship of ideas in the two texts. In the integration of listening and speaking, the test takers are first to listen to long texts and
The table clearly shows that the total score for each version differs markedly, with an iBT total score of 120, a cBT total of 300, and a pBT total of 677. These total scores also imply a score transformation, in particular with cBT and pBT, to reach a total score. It is true that in cBT, and more obviously in pBT, score transformation is performed. In order to estimate a score on a sub test, a score conversion table is required (Sulistyo, 2001). This explains why totaling the maximum scores of the sub tests in cBT and pBT does not automatically yield the total score of each of these versions. This is not the case with the iBT, where totaling the maximum scores of the sub tests automatically yields the total score of the version. Thus the scoring scale seems to be simpler in the more recent version.

A closer look at each sub test in each version also reveals how scoring in each version is performed in a different way. With cBT and iBT, the minimum score in each aspect tested is 0 (zero) and the maximum score is 30. Thus in terms of assigning the lowest and the highest scores they share the same ground. The way the total score is obtained, however, is different. pBT obviously utilizes a different score assignment, the lowest score being 31 and the highest 67 or 68 points.

It has been touched upon previously that pBT and iBT do not utilize an adaptive mode of testing, unlike the cBT version. However, what is unclear in relation to the scoring mechanism, particularly with the cBT version, is whether a transformation from raw scores to ability scores is carried out, such as that normally adopted in the application of item response theory in real testing contexts (Baker, 1985). In the traditional mode of score interpretation, raw scores are assumed to reflect abilities. This is unlike the practice in the modern mode of score interpretation, where abilities are not just the sum of the correct answers (Lord, 1980). One's ability, known as theta (θ), is reflected in scoring that adopts item response theory.

A WORD TO CONCLUDE

The paper has addressed all the topics of interest. Substantially, TOEFL is a proficiency test which aims at assessing one's general language ability. The test spreads individuals along the continuum of language ability so that their language abilities are known on an ability scale. The content of TOEFL does not reflect a particular syllabus or curriculum. Historically, TOEFL has witnessed three shifts in its version, namely pBT, cBT, and iBT along the