
TOEFL IN A BRIEF HISTORICAL OVERVIEW FROM PBT TO IBT

Article · August 2009


1 author:

Gunadi Sulistyo
State University of Malang

All content following this page was uploaded by Gunadi Sulistyo on 03 September 2015.



TOEFL IN A BRIEF HISTORICAL OVERVIEW FROM PBT TO IBT

Gunadi H. Sulistyo

Department of English Literature, Faculty of Letters, Universitas Negeri Malang (State University of Malang)

Abstract: This article briefly reviews the development of TOEFL as a widely acknowledged English proficiency test for non-native users. The specific aspects under review are the nature of TOEFL as a testing instrument, its historical development from the perspective of language and test development theories, and the testing formats of the language aspects in both earlier and later versions of TOEFL. The scoring comparison applied to the versions is also elaborated.

Keywords: TOEFL, development, version

The Test of English as a Foreign Language (henceforth TOEFL) has enjoyed prestigious status as a standardized test widely used in more than one hundred countries since its establishment in the early 1960s. It has been utilized as a means of measuring the proficiency of non-native speakers of English, as its name demonstrates, in English as a foreign language, in particular for academic purposes. Not only international educational institutions but also several domestic higher-learning institutions as well as non-educational agencies have made use of the scores of individuals taking TOEFL as a requirement not only for admission and recruitment but also for exit purposes. This implies that many have relied on TOEFL as a dependable tool that can provide good evidence of one's proficiency in English as a foreign language.

It is believed that interest in taking TOEFL has been increasing, as shown in the number of prospective TOEFL takers from year to year. This indicates that the need for TOEFL scores has also grown from year to year. As a consequence, the need for TOEFL training is also inevitable, although it is not clear whether those prospective TOEFL takers take the test for further studies abroad or for other purposes. What is obvious then is that such a demanding context triggers the establishment of preparatory courses that burgeon ubiquitously. It is an undeniable fact that such preparatory courses in a way play a role in catering for the needs of prospective TOEFL candidates for TOEFL scores, apart from any interest in their establishment.

While the testing technologies adopted by TOEFL have advanced rapidly, and while the need for TOEFL scores tends to be increasing, it seems that those preparatory courses have not been able to catch up completely with, in particular, the advances in testing technologies employed by TOEFL. For example, on one side

BAHASA DAN SENI, Year 37, Number 2, August 2009

TOEFL has currently adopted the so-called next generation version, the internet-based test (henceforth iBT), in use since 2005, which has been a significant shift from the older computer-based TOEFL (cBT for short) as well as the paper-and-pencil-based TOEFL (henceforth pBT). On the other side, those training courses mostly still deal with the pBT version. Therefore, there seems to be a need for those running preparatory TOEFL programs to seek a more wide-ranging picture of all the practices that TOEFL has undergone so far. There are several reasons for this necessity. In the first place, those courses will have an accurate vision in providing prospective TOEFL candidates with accurate information concerning the type of language skills these candidates pursue. Next, those courses will in effect try their best to provide prospective TOEFL candidates with appropriate language skills that reflect real academic life most closely. Also, those courses will always try to update themselves with TOEFL's most current technologies and practices, which ultimately will benefit the prospective TOEFL candidates who join the courses they offer.

This paper aims to review TOEFL briefly from its first version, pBT, to its current version, iBT. For this purpose, several topics will be dealt with, beginning with a discussion of the conceptual ground of the nature of TOEFL. The next part deals with the development of TOEFL. Following this part is the presentation of the components that make up each existing TOEFL version. Scoring matters constitute the next part. Finally, the last part concludes the paper.

NATURE

By purpose, TOEFL can be categorized as a proficiency test. Brown (2005:8) defines a proficiency test as a test whose function is to assess the general knowledge or skills commonly required or prerequisite to entry into (or exemption from) a group of similar institutions. The generality of the test implies, first, that TOEFL is not linked to a particular school syllabus or curriculum, because TOEFL is established on the basis of the concept of general language ability. Put in more operational terms, the materials contained in TOEFL do not reflect the instructional materials of a particular syllabus or curriculum.

Secondly, as a test established on general language ability, TOEFL is necessarily a norm-referenced test. This kind of test is meant to produce scores that spread the individuals taking the test along an ability line ranging from the least able to the most able. Psychometrically, such a test needs to be able to place an individual at a point along the ability line ranging from -∞ to +∞. This view also posits that the number of people with ability close to these two extreme points (-∞ and +∞) is smaller than that of people with ability around the average. This is what is commonly known as the assumption of normality. When plotted, the levels of ability in the universe of a particular group necessarily form a bell, and the distribution is commonly known as a bell-shaped distribution.
Sulistyo, Toefl in A Brief Historical 118

(Rosa, et al., 2001:261)


Figure 1: Assumed Normal Distribution of Ability
In this view one's performance in TOEFL is to be compared to another's performance in the same test. Consider the following figure.

          A                                        B
-∞ _______.________________________________________.__________ +∞

Figure 2: A Hypothetical Standing Ability of Two Individuals in Ability Line
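
The norm-referenced comparison pictured in Figure 2 can be illustrated with a short sketch. It assumes a hypothetical norm group whose scores are normally distributed; the mean, the standard deviation, and the two scores for A and B are invented for illustration and are not values reported in this article or by ETS.

```python
from statistics import NormalDist

# Hypothetical norm group: the mean and standard deviation are illustrative
# assumptions, not figures taken from the article or from ETS.
norm_group = NormalDist(mu=500, sigma=65)


def standing(score):
    """Return (z-score, percentile rank) of a score within the norm group."""
    z = (score - norm_group.mean) / norm_group.stdev
    percentile = norm_group.cdf(score) * 100
    return z, percentile


# Two hypothetical test takers, A and B, as in Figure 2.
z_a, pct_a = standing(435)
z_b, pct_b = standing(565)

print(f"A: z = {z_a:+.2f}, percentile = {pct_a:.1f}")
print(f"B: z = {z_b:+.2f}, percentile = {pct_b:.1f}")
print("B is more able than A" if z_b > z_a else "A is at least as able as B")
```

Under these assumptions A stands one standard deviation below the group average and B one standard deviation above it, so each score is interpreted only relative to the other test takers, which is the sense of "norm-referenced" used above.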


In the figure displayed above, the standing of A is relative to the standing of B on an ability line that ranges from -∞ to +∞, with B being considered more able than A. It follows that the general nature of a proficiency test makes it possible to compare not just individuals but also groups.

Also, as Brown (2005:9) puts it, as a test of language proficiency TOEFL can play a role as an external measure which is neutral to individuals as well as groups. In academic contexts within English-speaking countries, say the USA or Canada, an individual's TOEFL score makes it possible to determine whether the individual fits in a particular program or not. Similarly, an individual's score upon completion of a TOEFL program can be used as an indicator of his/her proficiency level that can predict his/her success in other contexts.

DEVELOPMENT OF TOEFL: HISTORICAL PERSPECTIVES

Thus far, TOEFL has witnessed three successive major formats: pBT, cBT, and iBT. Initiated by an American council on the testing of English as a foreign language in early 1962, TOEFL and its historical development can be viewed from two angles. The first is concerned with the development of TOEFL as seen from the underlying concept of language ability; the second deals with advances in the testing technology that characterize TOEFL.

The pBT version. Viewed from the linguistic perspective, TOEFL originally adopted the structural linguistic view, and this is obvious in the pBT format. This view holds that language is divisible in nature. Just recall the concept of duality in language. Language has two main layers: the layer of form and that of meaning. The former is concrete; the latter abstract. As such, the former is believed to be more learnable than the latter. The layer of form consists of other divisible layers, from phonemes as the smallest unit to syntactic constructions as the largest. Language ability is conceptualized as the mastery of the layers one by one. It follows that there is a need to test one's mastery of these layers bit by bit. In TOEFL this view is reflected clearly in the importance placed on accuracy of grammatical form, that is, on testing a TOEFL taker's grammatical knowledge.

Language is also conceived to comprise two interacting components: language skills and language components. Language skills refer to the modes through which language components may be realized, which include listening, speaking, reading, and writing. Language components include grammar/structure, vocabulary, phonology/orthography, and fluency. Harris (1969:11) neatly illustrates the relation between language skills and language components, as the following matrix suggests.
Language     Language Components
Skills       Grammar/     Vocabulary   Phonology/    Rate and General
             Structure                 Orthography   Fluency

Listening    v            v            v             v
Speaking     v            v            v             v
Reading      v            v            v             v
Writing      v            v            v             v

(adapted from Harris, 1969:11)


Figure 3: Matrix on Language Skills and Components
In the earlier format of TOEFL, the adoption of the structural linguistic view is obvious. For instance, the pBT is comprised of three sub tests: the Listening Section, the Grammar and Written Expression Section, and the Reading Comprehension Section. The other two, which are normally tested separately, are the Test of Written English (TWE) and the Test of Spoken English (TSE).

The influence of the structural linguistic view, when examined further, is also obvious in the formulation of test items. For instance, the listening section is composed of three independent parts reflecting the existence of linguistic layers: comprehension of fragmented sentences, comprehension of dialogs, and comprehension of texts larger than dialogs/monologs. In the comprehension of fragmented sentences, accuracy on grammatical points is frequently tested.

In addition, the second section, the Grammar and Written Expression Section, clearly reflects the structural linguistic view. In this part, a sentence of particular grammatical complexity is presented with one part containing a grammatical mistake. The lexical meaning of the sentence is kept as plausible as possible. The TOEFL takers are to identify the mistake. Very frequently the mistake is a "local error", which does not potentially interfere with communication. A typical item in this section is as follows.
The Peace Corps was establish on March 1, 1961 by then President John F. Kennedy.
A B C D
In such an item, grammatical sensitivity rather than the communicativeness of an expression is being assessed. This seems to be typical of the structural linguistic view, for which accuracy of form constitutes a most important prerequisite for language mastery.

The reading section also clearly suggests the influence of the structural linguistic view. This section may be introduced with a test on vocabulary items presented in sentential contexts. This part also frequently begins with a short text, presumably a paragraph, with questions following the text. Items testing understanding of the meaning of a word in context are also included among the questions.

Seen from the testing technology adopted in the pBT version, the test format used in TOEFL fits the divisible nature of language. Apart from its Test of Written English (TWE) and Test of Spoken English (TSE), TOEFL employs the multiple-choice type with four options. In this selective type of response, there is a stem that functions as a stimulus to which the TOEFL takers respond. Following the stimulus are the alternatives, with one correct answer and three distracters, for the TOEFL takers to select from. The use of multiple-choice items enables the language elements to be measured bit by bit. In addition, the presentation of items in the test follows an order of increasing difficulty. However, the presentation of items is fixed in that the TOEFL takers have no choice but to complete the items presented to them, whether or not the difficulty level (p) of the items fits their language ability. The earlier items are those with a medium level of difficulty, or .60 ≤ p ≤ .80 (Crocker and Algina, 1986:312). This feature is understandable given the nature of the pBT format, which does not permit the difficulty level of the items to vary in line with the ability level of the test takers.

The cBT version. Seen from its underlying linguistic theory, the cBT version is basically still characterized by the structural linguistic view. Thus, there have not been significant changes in the concepts reflecting general language ability adopted in the cBT version. Just like the pBT version, the earlier cBT version is comprised of three sub tests: the Listening Section, the Grammar and Written Expression Section, and the Reading Comprehension Section. The other two, which are normally tested separately, are still the Test of Written English (TWE) and the Test of Spoken English (TSE). The test format also remains the multiple-choice type with four alternatives. In later years, however, the cBT version adopted more and more communicative views of language competence.

The cBT version, however, may be classified into two types in terms of the testing technology adopted. The first cBT type is essentially like the pBT version in that it still assesses listening, grammar and written expression, and reading with the same sub tests distributed in 3 main sections. The test format is the same: the multiple-choice type with four alternatives, and the presentation of items is fixed with an increasing level of difficulty. As such, the first cBT type may be known as the non-adaptive type of the cBT version. This first type is basically a computerized pBT version.

In the initial stage, the second cBT type is essentially like the non-adaptive cBT type in terms of contents and format: measurement of listening comprehension, sensitivity to grammar and written expression, and reading comprehension with the same sub tests distributed in 3 main sections, the test format being the same multiple-choice type with four alternatives. What differs essentially lies in the presentation of test items. This type still adopts a strategy of increasing difficulty, with earlier items being those of a medium level of difficulty. However, the items following the first items may vary depending on the responses of the TOEFL takers. Figure 4 describes the scheme of adaptive testing.

                                            Item 5
                                 Item 4
                      Item 3                Item 5
           Item 2                Item 4
Item 1                Item 3                Item 5
           Item 2                Item 4
                      Item 3                Item 5
                                 Item 4
                                            Item 5

Figure 4: One Simple Scheme of Adaptive Testing
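
The branching in Figure 4 can be sketched in a few lines of code. This is only an illustration of the up-or-down scheme, assuming that difficulty is tracked as an integer level that starts at a medium value and moves one step per response. The function name `administer` and the integer levels are hypothetical, and the sketch is not the item-selection procedure actually used by ETS, which rests on item response theory.

```python
# A minimal sketch of the branching scheme in Figure 4. Difficulty is an
# integer level: 0 is the medium starting point, positive values are harder,
# negative values are easier. These levels are an illustrative assumption.

def administer(responses):
    """Trace the difficulty level of each item presented.

    `responses` is a list of booleans (True = correct answer), one per item.
    Returns the list of difficulty levels of the items that were presented.
    """
    level = 0  # Item 1 starts at medium difficulty
    path = []
    for correct in responses:
        path.append(level)
        # A correct answer leads to a harder next item,
        # a wrong answer to an easier one.
        level += 1 if correct else -1
    return path


print(administer([True, True, False, True, False]))      # [0, 1, 2, 1, 2]
print(administer([False, False, False, False, False]))   # [0, -1, -2, -3, -4]
```

With five items, the fifth item can fall on one of five levels (-4, -2, 0, 2, 4), which corresponds to the five "Item 5" boxes in the figure.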


A correct response made by the test taker is a stimulus for the computer to present a more difficult item than the previous one, to be completed next by the test taker. Conversely, a wrong answer leads to the presentation of an easier item. Thus, the difficulty level of the test items adapts to the test taker's level of ability. Because of this mechanism, the second cBT type is commonly known as the adaptive version of TOEFL.

Unlike the non-adaptive one, the adaptive version of TOEFL is based on modern test theory, or item response theory, in which items are made invariant across test takers (Hulin, Drasgow, and Parsons, 1983:43), which means that no matter who responds to the items, the characteristics of the items remain the same. Thus, under this scheme a test taker will only respond to the items that fit his/her level of ability. The adaptive version is further facilitated by advances in computing technology, since manual computation would be extremely time-consuming and tiring with no assurance of accuracy.

In addition, in its later development, a new cBT began to include important views from more recent advances in the concept of communicative language use (TOEFL Internet-Based Test, 2007:3). In this concept, language is considered more functionally, with a focus on language as a means of communication. As verbal communication means the realization of competence in performance, the new cBT follows this track and is concerned with the inclusion of macro language skills: listening, speaking, reading, and writing. Language components are tested in integration within language skills rather than in isolation.

In addition to this shift, the new cBT also adopts a new testing mode that provides TOEFL takers with more opportunities to demonstrate their command of macro skills by constructing direct responses, thus beginning to leave the selective-response format behind. Another shift taking place in the later development of the new cBT relates to language processing. Unlike the old cBT version, which is essentially a computerized pBT version and elicits language abilities through a discrete mode of testing, the newer cBT begins to include test tasks that process language in a more integrative fashion in terms of language skills. Thus, writing or speaking tasks may be related to a reading or listening task. Prior to producing English through writing or speaking tasks, a candidate may be required to incorporate pieces of information that appear in a reading or listening task.

The iBT version. This version was launched in 2005 and is expected gradually to replace both the cBT and the pBT versions. The introduction of the new cBT version plays a critical role in the establishment of the iBT version in that the new cBT version lays a strong transitional bridge to the era of the iBT version.

As has been discussed previously, from the linguistic standpoint the later cBT version has endeavored to feature the principles of the communicative movement in language testing in the corresponding TOEFL sub tests. As a further development of the later cBT version, the iBT version shares similar features with it. From the linguistic standpoint, the iBT version is also characterized by the need to test the candidates' functional language skills. It is designed to measure the integrated use of macro skills: listening, speaking, reading, and writing. Also, in this iBT version, academic settings and themes are more emphasized.

Introduced worldwide in the period 2005-2006, the iBT version, as its name indicates, makes functional use of information and communication technology (ICT). The TOEFL tasks in the iBT version are delivered through the internet from Educational Testing Service (ETS) to the authorized testing centers where the candidates are gathered to complete the tasks.

BUILDING BLOCKS INSIDE ALL TOEFL VERSIONS

All three TOEFL versions, pBT, cBT, and iBT, essentially have their own characteristics viewed from two main points: what is to be tested and how to test it. What follows is a brief account of each of the TOEFL versions seen from these two points.

In terms of what to test, the pBT version, as it was influenced by the structural linguistic view, is characterized by discrete-testing practices. One (or maybe two) language component is obviously tested, namely grammar, under a separate section. Vocabulary is also tested, but is commonly put under reading. Two macro skills are tested, namely listening and reading. These three aspects, grammar, listening, and reading, together comprise one battery commonly known as the paper-based TOEFL. Two other macro skills, speaking and writing, are also tested as independent sets known as the Test of Spoken English (TSE) and the Test of Written English (TWE) respectively.

The listening section of the pBT version normally consists of a variety of tasks assessing three or four aspects: sentence-level comprehension, comprehension of dialogs, comprehension of extended conversations, and comprehension of mini talks. These aspects clearly indicate levels of how language is believed to be constructed. The social themes presented to assess these aspects are commonly of general interest. In mini talks, however, mini lectures are also presented. This is intended to represent academic settings.

The grammar section mainly aims at testing grammatical accuracy and, in one sense, grammar sensitivity. The grammatical points to be tested include a variety of English grammar aspects such as verbs, auxiliary verbs, nouns, pronouns, modifiers, comparatives, connectors, sentences and clauses, relationship of ideas, agreement, introductory verbal modifiers, parallel structures, redundancy, and word choice (Sharpe, 2005:86-113).

The reading section may consist of two main aspects to be tested: vocabulary and reading. Vocabulary includes the testing of word meanings and/or meanings of words in sentential contexts (Jenskins-Murphy, 1981). This includes among other things testing of shades of meaning of words, synonyms, antonyms, word-part clues, denotation, and connotation. Reading aims at assessing various micro reading skills, like understanding the main idea, understanding supporting ideas/details, understanding the organization of the text, understanding implied details, understanding word meaning, understanding pronoun reference, and understanding the writer's tone of writing (Phillips, 1989). The rhetorical modes of the text include among other things narration, definition/illustration, classification, comparison, contrast, cause, effect, persuasion/justification, and problem/solution (Sharpe, 2005:122-251).

The writing test assesses a candidate's ability to write an essay in the expository or persuasive mode. The focus of testing is placed on the candidate's ability to organize ideas using accurate grammar, vocabulary, spelling, and mechanics. Just like the writing test, the speaking test aims at assessing a candidate's clarity in expressing his/her ideas in expository spoken English. The candidate's use of grammar, vocabulary, and pronunciation as well as the organization of ideas are also evaluated.

In terms of how to test, the pBT version is basically non-adaptive testing. The items are arranged in a fixed order of increasing difficulty within the corresponding battery. Aspects to be tested are organized into sections indicating particular abilities to be assessed. For instance, the listening comprehension section is organized into three main parts: a sentence comprehension or dialog part of about 30 questions, an extended conversation part of about 5 questions, and a mini-talk part of about 5 questions. The grammar and written expression section is differentiated into two parts, error recognition and completion, with about 20 questions each. The reading comprehension section may take the form of 4-5 reading passages with about 7-9 comprehension questions following them. Vocabulary items are included in this part. Writing is an independent task in the pBT version which requires the test takers to respond to a writing task in their own handwriting. Normally, it takes 3 hours to complete the pBT version.

Basically there are not many differences between the pBT and cBT versions in terms of what to test. Thus, the cBT version tests both language components (grammar and possibly vocabulary) and skills (listening, reading, and writing). Slight differences are observed, however. In the listening section of the cBT version, fewer dialogs (about 20) are presented, with one question each. Extended conversations in the pBT version are modified a bit into short conversations in the cBT version with about the same number, i.e. 3 short conversations with about 3 questions each. The pBT's mini-talks are also modified and specified as mini-lectures and discussions in the cBT version, with about 5 questions each. Also, the reading
section of the pBT and cBT versions is almost similar in terms of the number of passages and of comprehension questions following the passages, at about 5 passages and 10 questions each respectively. In terms of the writing tasks, the pBT and cBT versions share similarities, i.e. one independent task focusing on exposition/persuasion. The scoring of writing in the cBT version, however, is combined with the structure section. Finally, the number of items in the structure section is reduced in the cBT version to about 25 questions from about 40 questions in the pBT version. All the sub tests in the cBT version need 3.5 hours to complete.

In terms of how to test, as has been dealt with previously, the cBT version is, psychometrically speaking, adaptive, in particular in the listening and structure sections, but not in the reading and writing sections. The initial items presented are of moderate difficulty. The presentation of the items that follow depends on the answers to these initial items. A more difficult item follows a correct answer; a less difficult item follows an incorrect answer, and so on. Simply put, the presentation of an item adapts to the TOEFL taker's level of ability, technically known as theta and symbolized as θ (Hulin, Drasgow, and Parsons, 1983:26).

The iBT version as the next generation TOEFL may be considered a significant innovation in the construction of TOEFL. It comes with not only a new format covering all macro skills but also a new presentation. As aforementioned, the iBT version captures the views of the communicative approach. These are clearly reflected in the inclusion of all macro language skills (listening, speaking, reading, and writing) as integrated sub tests and the exclusion of language components (grammar and vocabulary) as discrete sub tests. In addition, academic themes are also considered in the new version. That is to say, the integration of language components into language skills and the integrated processing of information within language skills, as well as an emphasis on academic settings, strongly characterize the iBT version.

A further examination of each sub section of the iBT version reveals substantial changes that have been made. In the listening section, more genuine academic conversations and lectures are presented. Questions that probe further into the speaker's mood, feeling, purpose, and drive are also posed to the test takers. More naturally, note taking, which did not appear in the previous versions, is permitted in the iBT version. In terms of types of aural stimulus, listening tasks include two main formats: lectures (6 texts) as well as classroom discussions with corresponding questions (about 5) each, and academic-setting conversations (about 3) accompanied by 5 questions each. With this format, the iBT version clearly presents more contextual academic materials and eliminates dialogs on general themes that are normally presented as fragments. The micro listening skills assessed include identification of main ideas, supporting details, inferences, functions, and organizational structure of the text.

Unlike in the previous versions, where speaking was an optional sub test administered only on particular testing dates, in the iBT version speaking is an integral part of the battery. In all there are 6 tasks, comprising two independent tasks dealing with expression of an opinion on a known topic of academic matters and four integrated tasks requiring speaking on the basis of information picked up in the listening and reading sections. More specifically, in the integrated mode involving reading, listening, and speaking, test takers are first to read a text, listen to a text, and then speak about the relationship of ideas in the two texts. In the integration of listening and speaking, the test takers are first to listen to long texts and then to make a summary and express or defend their opinions on the information contained in the texts with clarity, coherence, and accuracy.

Reading tasks in the iBT version also take a different format. In the new format 3-5 long texts on academic themes are presented, each followed by 12-14 questions covering micro reading skills: main ideas, supporting details, inferences, restatements, sentence insertion, language functions, and organization of ideas. Rather than recognizing and making a choice as in the old versions, in the iBT version the test takers are to make responses in the form of categorizing information, filling out tables or charts, making or completing summaries, or paraphrasing.

Just like speaking, writing is also integral to the whole TOEFL battery in the iBT version. There are two tasks: one integrated task, which requires the test takers to write about the relationship of ideas in the academic texts they have read and heard, and one independent task, which requires the test takers to support a personal opinion in the form of an essay.

Advances in the test construction of the iBT version have not, however, happened in the area of presenting adaptive items. Thus, in completing the items of the iBT version, test takers are presented with the same array of test problems.

The presentation described above clearly indicates that the iBT version of TOEFL differs markedly from its two predecessors. The differences are obvious in two aspects, what to test and how to test, but not in terms of adaptability of the test tasks to the test takers' level of ability.

SCORES IN COMPARISON

The changes in the format of the TOEFL versions are also accompanied by changes in the scale used to score the TOEFL takers. These changes are necessary for several reasons. In the first place, the components to be tested change. As a result, there is a need to adjust the scoring system for evaluating the TOEFL takers' scores in each of the components tested. Secondly, the underlying philosophies of the versions have also shifted from structural to communicative, thus responding to more recent advances in the theory of language. Most importantly, a more meaningful interpretation is needed of what proficiency is indicated by a score on different versions of TOEFL. It is reported that extensive studies on the scoring comparison were conducted involving a sample of 3,000 from 30 countries in the period 2003-2004 (ETS, undated:4).

Score comparison across the TOEFL versions may be viewed from the total score or from each corresponding separate sub test: listening, reading, and writing. Meanwhile, scores obtainable from the grammar section can be compared only between the pBT and cBT versions because they are available only from these two versions. However, for a more meaningful interpretation of the score obtained, separate scores are more desirable because they are more informative than the total score. The following table presents the TOEFL score scales of the three versions.

Table 1: TOEFL Score Scale Comparison

TOEFL      Aspects to Be Tested                                                     Total
Version    Listening   Structure   Speaking   Reading   Writing                    Score

iBT        0 - 30      n/a         0 - 30     0 - 30    0 - 30                     120
cBT        0 - 30      0 - 30      n/a        0 - 30    combined with Structure    300
pBT        31 - 68     31 - 68     n/a        31 - 67   n/a                        677

(adapted from ETS, undated:5)
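
The arithmetic behind Table 1 can be checked with a short sketch. It transcribes the section ranges from the table and compares the sum of the section maxima with the reported total for each version. The dictionary layout and the helper name `sum_of_section_maxima` are illustrative assumptions, as is the decision to treat the cBT Structure and Writing columns as one combined section; none of this reproduces ETS's actual conversion tables.

```python
# Score scales transcribed from Table 1: section name -> (minimum, maximum).
# Assumption: the cBT "Writing combined with Structure" entry is modelled
# as a single Structure/Writing section on the 0-30 scale.
SCALES = {
    "iBT": {
        "sections": {
            "Listening": (0, 30),
            "Speaking": (0, 30),
            "Reading": (0, 30),
            "Writing": (0, 30),
        },
        "total_max": 120,
    },
    "cBT": {
        "sections": {
            "Listening": (0, 30),
            "Structure/Writing": (0, 30),
            "Reading": (0, 30),
        },
        "total_max": 300,
    },
    "pBT": {
        "sections": {
            "Listening": (31, 68),
            "Structure": (31, 68),
            "Reading": (31, 67),
        },
        "total_max": 677,
    },
}


def sum_of_section_maxima(version):
    """Add up the highest obtainable score of every section in a version."""
    return sum(high for _low, high in SCALES[version]["sections"].values())


for version, scale in SCALES.items():
    section_sum = sum_of_section_maxima(version)
    # A version is "additive" if the section maxima add up to the total.
    additive = section_sum == scale["total_max"]
    print(f"{version}: section maxima sum to {section_sum}, "
          f"reported total is {scale['total_max']}, additive={additive}")
```

Under these assumptions only the iBT is additive (4 x 30 = 120). The cBT and pBT section maxima sum to 90 and 203 respectively, so their reported totals of 300 and 677 must come from a further conversion step, which this sketch does not attempt to reproduce.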

The table clearly shows that the total score for each version differs markedly: 120 for the iBT, 300 for the cBT, and 677 for the pBT. These total scores also imply a score transformation, in particular with the cBT and pBT, to reach a total score. Indeed, in the cBT and more obviously in the pBT, score transformation is performed. In order to estimate a score on a sub test, a table of score conversion is required (Sulistyo, 2001). This explains why totaling the maximum scores of the sub tests in the cBT and pBT does not automatically yield the total score of each of these versions. This is not the case with the iBT, where totaling the maximum score of each sub test automatically yields the total score of the version. Thus the scoring scale seems to have been simplified in the more recent version.

A closer look at each sub test in each version also reveals how scoring in each version is performed in a different way. With the cBT and iBT, the minimum score in each aspect tested is 0 (zero) and the maximum score is 30. Thus, in terms of assigning the lowest and the highest scores, they share the same ground. The way the total score is derived, however, is different. The pBT obviously utilizes a different score assignment, the lowest score being 31 and the highest 67 or 68 points.

It has been touched upon previously that the pBT and iBT do not utilize an adaptive mode of testing, unlike the cBT version. However, what is unclear in relation to the scoring mechanism, particularly with the cBT version, is whether a transformation from raw scores to ability scores is carried out, such as that normally adopted in the application of item response theory in real testing contexts (Baker, 1985). In the traditional mode of score interpretation, raw scores are assumed to reflect abilities. This is unlike the practice in the modern mode of score interpretation, where abilities are not just the sum of the correct answers (Lord, 1980). One's ability, known as theta (θ), is reflected in scoring that adopts item response theory.

A WORD TO CONCLUDE

The paper has addressed all the topics of interest. Substantially, TOEFL is a proficiency test which aims at assessing one's general language ability. The test spreads individuals along the continuum of language ability so that their language abilities are located on an ability scale. The content of TOEFL does not reflect a particular syllabus or curriculum. Historically, TOEFL has witnessed three shifts in its version, namely pBT, cBT, and iBT, along the
127 BAHASA DAN SENI, Tahun 37, Nomor 2,Agustus 2009

line with a shift in the theory underlying their construction. The pBT and cBT are gradually being replaced by the iBT version, which was introduced to the public in 2005. Seen from the components making up TOEFL as a battery, each version has different sub tests with different testing formats. The earlier versions are characterized by structural views of grammar. The iBT version begins to move toward the communicative movement. The earlier ones make use of general social themes; the iBT version focuses more on academic matters. While the multiple-choice type still predominates in all versions, speaking and writing are tested directly: the test takers respond to the tasks by speaking and writing respectively, not just by recognizing the alternatives provided for selection. The scoring across the versions has moved from a complex score transformation to a simpler one.

As a proficiency test aimed at testing the language abilities required in academic settings, TOEFL has significant backwash effects. For example, it has led individuals to make a variety of attempts to achieve a higher TOEFL score. TOEFL training courses have mushroomed as a consequence. While all this is a good indicator of the presence of motivation in learning English, caution should be exercised. Tests like TOEFL frequently play a high-stakes role. However, real academic life after taking TOEFL is more realistic. Therefore, a high score in TOEFL needs to reflect a solid mastery of functional communication in academic settings. The challenge for TOEFL course providers is that they need to provide their customers with relevant instructional materials and suitable class tasks for practice and, more importantly, to keep up with all recent advances made by ETS as the TOEFL developer.

REFERENCES

Baker, Frank B. 1985. The Basics of Item Response Theory. Portsmouth, New Hampshire: Heinemann.
Brown, James D. 2005. Testing in Language Programs: A Comprehensive Guide to English Language Assessment. New York: McGraw-Hill.
ETS. Undated. TOEFL Internet-Based Test: Score Comparison Tables. Princeton, New Jersey: Educational Testing Service.
Harris, David P. 1969. Testing English as a Second Language. New York: McGraw-Hill Book Company.
Hulin, Charles L., Drasgow, Fritz, and Parsons, Charles K. 1983. Item Response Theory: Applications to Psychological Measurement. Homewood, Illinois: Dow Jones-Irwin.
Jenkins-Murphy, Andrew. 1981. How to Prepare TOEFL. New York: Harcourt Brace Jovanovich.
Lord, Frederic M. 1980. Applications of Item Response Theory. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.
Sharpe, Pamela J. 2005. Barron's Practice Exercises for the TOEFL, 5th Edition. Jakarta: Bina Rupa Aksara.
Sulistyo, Gunadi H. 2001. Technical Considerations for Taking the (Paper-and-Pencil-Based) TOEFL. A paper presented in the seminar Computer-Based TOEFL: Concepts and Strategies, organized by CSU, English Department, State University of Malang, May 26, 2001.
