COMMENTARY: An Analysis of a
Language Test for Employment:
The Authenticity of the
PhonePass Test
Christian W. Chun
Published online: 16 Nov 2009.
LANGUAGE ASSESSMENT QUARTERLY, 3(3), 295–306
Copyright © 2006, Lawrence Erlbaum Associates, Inc.
In a further demonstration of the growing divide between those who have unlimited
access to technology and those who do not, the administration of tests over
telecommunications networks has recently proceeded apace—the Graduate Record
Examination, the Test of English as a Foreign Language Internet-Based Testing, and
the Information and Communication Technology Literacy Assessment, to name just a
few. This development highlights the highly commercialized aspects of the test as
one more tool to be commodified, integrated, and sold. The question that test
takers and those who assess the test results must inevitably ask is, "Are we
getting our money's worth?"
One such test that is being administered efficiently over telecommunications
networks is Ordinate Corporation’s PhonePass Spoken English Test–10 (SET–10).
This test is being marketed as a useful assessment tool for screening job candi-
dates’ ability in spoken English. In addition, according to the website of Ordinate
(now a subsidiary of Harcourt Assessment), the applications of this test include
evaluating employees for promotions and measuring service employees with “an
accurate and objective test of their spoken English capabilities” (Ordinate Corpo-
ration, 2005, ¶ 8). The test is also promoted as having academic applications—as
an entrance exam, a placement and exit exam, and one to qualify international
teaching assistants.
The pertinent question here, then, is this: As a language test, how authentic is
PhonePass? Authenticity has been defined as "the degree of correspondence of the
characteristics of a given language test task to the features of a Target Language
Use task. It is this correspondence that is at the heart of authenticity” (Bachman &
Palmer, 1996, p. 23). Do the characteristics of the domain of the test tasks found in
PhonePass correspond to the real-life domain of nontest tasks? As Luoma (2004)
reminded us, “with task-based tests, the developers need to show that the content
of the test tasks is representative of the demands of the corresponding task outside
the test situation, and that the scoring reflects this” (p. 43).
The notion and application of authenticity in language testing are complex.
Lewkowicz (2000) suggested that authenticity comprises three elements: “authen-
ticity of input, purpose, and outcome” (p. 51). She posed the question “To what ex-
tent can/do test tasks give rise to authentic-sounding output which allow for gener-
alizations to be made about test takers’ performance in the real world?” (p. 51). Do
the test tasks in the PhonePass test generate such an output? The assessment of
what generally constitutes “authentic-sounding output” may be problematic; how-
ever, when situated in specific sociolinguistic contexts, certain outputs should be
expected and thus accepted as authentic sounding.
The PhonePass test has been promoted as an expedient assessment for screen-
ing job candidates’ ability in spoken English and as an academic entrance exam.
Examiners (as well as test takers) should therefore expect to find a high degree of
correspondence between the features of the test tasks found in the PhonePass test
and the features of target language use tasks found in real-life domains for the test
to be considered relatively authentic. In the real-life domain of the work environ-
ment, one of the primary target language use tasks is the use of extended produc-
tion responses. These might include interacting with clients over the phone and in
face-to-face encounters and responding to customers’ queries, each of which re-
quires knowing which register to use in an appropriate situation. These tasks in-
volve extensive organizational and pragmatic knowledge. A defined range of production
responses should serve as a model by which test takers' output can be assessed.
GENERAL DESCRIPTION
A validation summary for the test is available (Ordinate Corporation, n.d.) from
the Technical Reports section of the website. Given the extra step involved, it
seems unlikely that many test takers will do this.
The setting of the SET–10 is one of its main appeals, and it is used in marketing the
test. This physical characteristic of the test enables one to take the test in any loca-
tion where there is a land-line phone system available. Therefore, the test taker can
conceivably take the test in an office, at home, at an airport, or even at an
outdoor telephone booth (if one still exists). However, because
the SET–10 administration is supported by a test paper, the test taker does need to
have access to an Internet-connected computer first, to download it from the com-
pany’s website. The test taker also needs a test identification number to take the
test over the phone. This identification number is available only on the downloaded
test paper. Although the company markets the test as being just a phone call away,
this aforementioned procedure of obtaining the required test paper necessitates a
fairly high degree of familiarity with computer technology—that is, knowing how
to use an operating system on a computer; how to open and navigate an Internet
browser, to gain access to the desired website of the test; and how to download ma-
terials from the test link. In addition, a test taker must also have access to and
knowledge of a computer printer, to print out the required test paper.
In this testing situation, other materials and equipment are also specialized.
The test taker must be familiar with the paper's format, as well as with the instructions and the
item-response format given on the test paper. These item responses include read-
ing printed sentences aloud, repeating sentences after hearing them once, giving a
short phrasal answer to spoken questions, rearranging audible word groups into
syntactically correct sentences, and responding to open questions. Finally, the test
taker needs to have access to a telephone, and she or he needs to be familiar with
telephone technology.
The participants involved in the test task are the test taker and the faceless, unfa-
miliar, and quite possibly unnerving test administrator, which comprises speech-
recognition technology employed over a telephone and computer software that au-
tomatically scores the test. However, an independent proctor may be needed to ver-
ify the identity of the test taker for interested parties. This proctor needs to be pres-
ent when one takes the test.
The time of task is the other appeal that the company trumpets. The test is con-
tinuously available so that test takers are free to take the test at any time, day or
night, and on any day of their choosing. This is an advantage to test takers in that
they are able to schedule the test according to their own time preferences, in stark
contrast to so many high-stakes tests. The test itself takes approximately 10 min
to finish.
The instructions for the test are written and spoken in the target language, which
in the case of the SET–10 is English. The channel used is visual and aural. The vi-
sual channel comprises written text and three pictures modeling the correct way to
hold the telephone while giving responses. The test taker hears further instructions
on the telephone when first calling the PhonePass system. These include entering
the test identification number on the keypad. The written instructions are presented
in a fairly clear manner, with specific examples for three of the five tasks. The ex-
planations for each of the five tasks are somewhat brief. The instructions for the
procedure, the tasks, the criteria for the scores, and suggestions for taking the test
in an optimal setting (the sole criterion is the absence of noise in the location of the
test taker’s choice), and the diagrams of a person holding the phone in the correct
way are all printed on one side of the test paper.
As mentioned, the test taker must complete five clearly separated sections on
the SET–10. In each of these five sections, the test taker is presented with different
task types: reading aloud, repeating sentences, giving short answers to questions,
building sentences, and responding to open questions. In Part A of the test, the test
taker must read aloud 8 sentences from a list of 12 items that are printed on the test
paper. In this section, the test taker must read the sentences in the order given over
the telephone. This order is different from the sequential order that is shown on the
test paper. Part B includes a fixed sequence of responses; that is, test takers repeat
each sentence verbatim, which they hear once. There are a total of 16 repeat sen-
tence items. In Part C, the test taker must respond with a simple answer to spoken
questions. This section has 24 short-answer items. Part D presents the test taker
with spoken phrasal word groups in a random sequence. The test taker must rear-
range these word groups into syntactically correct sentences. There are 10 items in
this section. Last, in Part E, test takers hear a spoken question to which they re-
spond with their opinions within 20 s. There are three items in this section. The five
sections do not differ in importance. As stated, time allotted for the entire test is 10
min. The test may appear speeded to some test takers, because not all can complete
each task, especially in Part E. The computer records and scores the responses to
the items in the first four sections. The system records responses to the items in the
last section but does not score them; instead, they are open for review by autho-
rized listeners.
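A quick back-of-the-envelope sketch, using only the item counts and the 10-min limit stated above, helps show why the test can feel speeded; the per-item figure is an assumption-laden average that ignores instruction and prompt time, so the real pace is tighter still.

```python
# Item counts per section, as given in the test description.
items = {
    "A: read aloud": 8,        # 8 of 12 printed sentences
    "B: repeats": 16,
    "C: short answers": 24,
    "D: sentence builds": 10,
    "E: open questions": 3,
}

total_items = sum(items.values())
total_seconds = 10 * 60  # entire test takes approximately 10 min

print(total_items)                    # 61 items in all
print(total_seconds / total_items)    # under 10 s per item, on average
# Part E alone reserves 20 s of response time per item:
print(items["E: open questions"] * 20)  # 60 s of speaking for Part E
```

Even on this generous averaging, a test taker has less than 10 s per item, which is consistent with the observation that Part E in particular goes unfinished.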
The input for the SET–10 has several format characteristics. Both an aural
channel and a visual channel present the input. The visual channel is from the re-
quired test paper, and the aural channel includes the spoken prompts. The input is
presented in phrases and sentences. The input in Parts C, D, and E is a
bit lengthier and thus requires more interpretation. In the first four sections, the
types of input are a series of items, whereas in the last section, Part E, the type of
input is a prompt in that the test taker is expected to produce an extended response
to the questions (although 20 s in length, it involves more production than that per-
formed in the items in the other sections of the test). Because the test must be com-
pleted within 10 min, the test taker needs a high degree of speed. Therefore, the test
taker has to process the information in the input at a fairly high rate. Last, the vehi-
cle by which the input is delivered is reproduced via the speech-recognition soft-
ware employed over the telephone.
The organizational characteristics of the test for both the input and the expected
responses include grammatical knowledge and textual cohesion. The textual cohe-
sion on the test comprises comprehension and production of the utterances spoken
in Parts B, C, D, and E. The test also requires knowledge of ideational functions in
Part E, the open questions. However, this section is not scored. In the SET–10 Test
Description and Validation Summary (Ordinate Corporation, 2004), the company
claimed that the test investigates the “psycholinguistic elements of spoken lan-
guage performance rather than the social … elements of communication” (p. 3).
The language characteristics found in the test items of the SET–10 Demo Test do
not support this claim, because there are numerous cultural references throughout
the test sections. One such reference was contained in the test item question
“Would a blanket go on the bed or the wall?” The topical characteristics in Parts A,
C, and D, which are cultural, bear this out. In addition, because the test is marketed
as an assessment tool for screening job candidates’ ability in spoken English, the
social elements of communication should be at the core of the test’s construct.
In terms of the relationship between input and response, this test has no recipro-
cal tasks; that is, the test taker receives no feedback on the responses. Neither is this
test adaptive, because the input is not affected by the test taker’s responses. All of
the test tasks are of narrow scope. Finally, regarding the directness of relationship,
the test tasks are direct in Parts A, B, C, and D, whereas the test task in the unscored
Part E is indirect.
The test taker must complete five sections on the SET–10, comprising several struc-
tured speaking tasks. In the first section, Part A, the test taker must read aloud a total
of 8 sentences from a list of 12 items printed on the test paper. In this section, the test
taker must read the sentences in the order that is instructed over the telephone. This
order is different from the sequential order that is shown on the test paper. Because
Part A’s test task consists of reading aloud printed sentences, it is a structured speak-
ing task, or what Luoma (2004) said is “the speaking equivalent of multiple choice
tasks” (p. 50). I consider this test task of reading aloud to be a relatively less authentic
task because it does not ask the test taker to create discourse that would be expected in
a target language use task in the real-life domain of employment or school. In terms
of organizational knowledge, the test takers are simply reading sentences aloud;
grammatical knowledge is not involved here, because the test taker does not need to
produce or comprehend formally accurate sentences. In addition, textual knowledge
is not being tested here either, because the sentences are read in a random order rather
than in the sequentially coherent sentence groups presented on the test paper. This
seems to prevent the test examiner from knowing if the test taker has knowledge of
cohesion, because the explicitly marked relationships among the four sentences in
each group are negated by reading them aloud in random order; thus neither the
comprehension nor the production of cohesive text is required.
The diagnostic subscore mapped to this section is vocabulary, which the test de-
fines as “reflecting the ability to understand common everyday words spoken in
sentence context and to produce such words as needed” (Ordinate Corporation,
2004, p. 8). Upon examination of the vocabulary in the questions in Part C, the lexical
knowledge required seems to be at a fairly low level (high beginning to low inter-
mediate level at best); thus, even if a test taker answered all these questions cor-
rectly, the score result on this section allows the test examiner to interpret this score
as an indication of a test taker’s ability to perform up to only that level of lexicality.
Since I did not take the actual test (only the demo test), I do not know if the vocabu-
lary is more extensive in this section. Judging from the demo test, however,
there needs to be a wider range of vocabulary presented in Part C if there is to be
any degree of correspondence of characteristics found in this language test task to
the features of target language use tasks found in the work and school domains.
The vocabulary items being measured do not appear to be representative of a level
at which university students or professional employees are expected to perform. In
addition, this task demands of the test taker only a limited production response of a
single word or phrase; therefore, it is difficult to see how one can accurately assess
an examinee’s speaking capability because it should also include the use of ex-
tended production responses. Although the test task in Part C requires grammatical
and textual knowledge, again, sociolinguistic knowledge is not being tested.
In Part D’s task, Sentence Builds, the test taker hears a sequence of three short
phrasal word groups. These phrases are presented in a random sequence, and the
test taker is asked to rearrange them into a syntactically correct sentence. Here are
some examples from the demo test:
“in/bed/stay”
“your books/leave/at home”
“we wondered/would fit in here/whether the new piano”
One of the diagnostic subscores mapped to this section is fluency. In the valida-
tion report by the company, fluency is defined as reflecting “the rhythm, phrasing
and timing evident in constructing [italics added], reading, and repeating sen-
tences” (Ordinate Corporation, 2004, p. 9). However, it appears that this test task
does not adequately represent the construct being measured. Part D requires con-
structing a sentence, but that task is in response to an input of word groups that the
test taker has to rearrange into a syntactically correct sentence. “Constructing” a
sentence implies more than simply rearranging given words—it suggests the per-
formance of extended production responses. However, Part D requires the test tak-
ers to perform only selected responses in which they must choose one response by
simply rearranging the given phrasal word groups and are thus not required to actu-
ally produce any utterance of their own. Although the test task in this section tests
grammatical knowledge, again, as in the other sections, pragmatic knowledge is
not tested. Sentence mastery is another diagnostic subscore mapped to this part,
and according to the validation summary, it “reflects the ability to understand, re-
call, and produce English phrases and clauses in complete sentences. Performance
depends on accurate syntactic processing and appropriate usage of words, phrases
and clauses in meaningful sentence structures” (p. 8). Because this test task in Part
D seems to be more of a multiple-choice task in that there are only six possible
combinations of the three phrasal word groups, any interpretation of this score in
sentence mastery is difficult to generalize beyond this test task to any particular tar-
get language use domain. Producing phrases and clauses in complete meaningful
sentences requires extended production responses, not selecting one combination
out of a possible six. Although one can argue that this test task requires syntactic
knowledge to give the correct response of rearranged word groups, I contend that it
is relatively less authentic because the test taker has a one-in-six chance of guess-
ing the correct combination of word groups.
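The one-in-six figure follows from simple combinatorics: three phrasal word groups admit 3! = 6 orderings, only one of which is the target sentence. A minimal sketch, using one of the demo items quoted above, makes this concrete:

```python
from itertools import permutations

# Three phrasal word groups from a Part D demo item.
groups = ("we wondered", "whether the new piano", "would fit in here")

# Every ordering a test taker could produce.
orderings = [" ".join(p) for p in permutations(groups)]

print(len(orderings))      # 3! = 6 possible orderings
# Only one ordering is syntactically correct, so a blind
# guess succeeds with probability 1/6.
print(1 / len(orderings))
```

This is what makes the task behave like a six-option multiple-choice item rather than a genuine sentence-construction task.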
In the last section, Part E, the test taker listens to a spoken question and then has
20 s in which to give an opinion. The questions deal with either family life or pref-
erences/choices, such as “Which would you prefer to live, in a big city or a small
town and why?” and “What qualities do you look for in a friend?” Although this
section’s test task is relatively authentic (certainly more so than the other tasks in
this test) in that it requires extended production responses (albeit for 20 s), the test
taker’s responses are not scored at all; they are available for human review by au-
thorized listeners. As Bachman (1990) put it, “considerations of efficiency often
take precedence over those of validity and authenticity” (p. 298).
Ordinate Corporation’s Validation Summary for PhonePass SET–10 (n.d.)
claimed that its data provide evidence in support of several conclusions. One con-
clusion is that test items elicit responses from which a test taker's conversational
skills can be reliably estimated. This is doubtful upon a careful review of the test
tasks. In Part A, the test taker is simply reading sentences aloud; in Part B, the test
taker is repeating sentences; in Part C, only a limited production response of one
word or phrase is required of the test taker; and in Part D the test taker gives only a
selected response. Only in Part E is an extended response required, which
provides an opportunity for an accurate assessment of a test taker’s conversational
skills. However, as noted, this section is not scored.
In terms of bias, the SET–10 seems to have minimal bias. An analysis of the
sentences and tasks did not reveal any offensive or inflammatory content or lan-
guage. In general, the test materials do not seem to cause any unfair penalization
owing to a test taker’s group membership. In terms of educational access, the test’s
materials involve skills in speaking and listening. Whether all test takers have had
the opportunity to learn these types of tasks is an interesting question. Test takers
who reside in an English-speaking country have an advantage over test takers who
reside elsewhere inasmuch as the former would have had more opportunities to
learn and become familiar with the tasks presented on the SET–10. A comparison
study of test takers in English-speaking environments and test takers in non-Eng-
lish-speaking environments measuring their respective test results is needed to see
if bias is present.
In terms of cost, the price is reasonable ($35 to $40), especially in comparison
with larger high-stakes tests.
CONCLUSION
Upon an examination of the test tasks in each section, it seems that the perfor-
mance on these tasks does not necessarily predict or reflect the speaking ability of
the test taker to function in another environment—that of the real-life domain of
school and work. Listening and speaking over the telephone and in face-to-face
conversations involve different cognitive demands. Performance on these tasks
presented in an aural channel might differ from performance on similar tasks
presented in a visual channel, for example, conversations involving body language
and facial expressions.
Thus, scores from a performance on the SET–10 may not correlate with external criteria,
such as employment ratings. The content found in the test tasks of the SET–10 is
not representative of demands of corresponding tasks outside this test situation.
According to Bachman (1990), the definition of authenticity is “a function of the
interaction between the test taker and the test task. … Test authenticity thus be-
comes essentially synonymous with what we consider communicative language
use, or the negotiation of meaning" (p. 317). By this definition, because there is
no negotiation of meaning in Parts A and B of the test, and perhaps Part D, this
test must be considered relatively less authentic.
The PhonePass test needs a major revision of its test tasks for its test scores to be
interpreted as a valid, reliable, and fair predictor of the test taker’s language abili-
ties. But as Bachman (2002) pointed out, because most real-life domain tasks are
complex and diverse, “the evidence of content relevance and representativeness
that is required to support the use of test scores for prediction is extremely difficult
to provide” (p. 453). One possible way out of this dilemma is the development of
specialized test tasks specific to a particular domain so that there might be one ver-
sion to assess business language skills and another version to assess academic abil-
ity. Making the test tasks adaptive might also give the examiner a more reliable as-
sessment of the test taker’s communicative competence. As Bachman (1990)
stated, “The key is the selection or creation of language whose content is interest-
ing and relevant enough to the test taker to be engaging” (pp. 321–322). The con-
tent found in PhonePass is neither interesting nor relevant enough to be engaging.
It would be interesting to see if the test takers’ perception of the authenticity of
the test characteristics in the PhonePass test affected their performance, following
the study by Lewkowicz (2000). Certainly, further research in this area is needed.
Indeed, as Bachman (2001) stated,
REFERENCES
Anderson, R. C. (1972). How to construct achievement tests to assess comprehension. Review of Edu-
cational Research, 42(2), 145–170.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford, England: Oxford
University Press.
Bachman, L. F. (2001). Some construct validity issues in interpreting scores from performance assess-
ments of language ability. In R. L. Cooper, E. Shohamy, & J. Walters (Eds.), New perspectives and is-
sues in educational language policy: A festschrift for Bernard Dov Spolsky (pp. 63–90). Amsterdam:
Benjamins.
Bachman, L. F. (2002). Some reflections on task-based language performance assessment. Language
Testing, 19, 453–476.
Bachman, L., & Palmer, A. (1996). Language testing in practice. Oxford, England: Oxford University
Press.
Buck, G. (2001). Assessing listening. Cambridge, England: Cambridge University Press.
Foucault, M. (1979). Discipline and punish: The birth of the prison (A. Sheridan, Trans.). New York:
Vintage. (Original work published 1975)
Kunnan, A. J. (2004). Test fairness. In M. Milanovic & C. Weir (Eds.), Selected papers from the Euro-
pean Year of Languages Conference, Barcelona (pp. 27–48). Cambridge, England: Cambridge Uni-
versity Press.
Lewkowicz, J. A. (2000). Authenticity in language testing: Some outstanding questions. Language
Testing, 17, 43–64.
Luoma, S. (2004). Assessing speaking. Cambridge, England: Cambridge University Press.
Ordinate Corporation. (2004). SET–10 test description and validation summary. Retrieved January 29,
2005, from http://www.ordinate.com/pdf/SET-10_Test_Desc_Validation.pdf
Ordinate Corporation. (2005). Test applications. Retrieved February 28, 2005, from
http://www.ordinate.com/content/prod/applications.shtml
Ordinate Corporation. (n.d.). Validation summary for PhonePass SET–10. Retrieved January 29, 2005,
from http://www.ordinate.com/pdf/ValidationSummary000302.pdf
Shohamy, E. (2001). Democratic assessment as an alternative. Language Testing, 18, 373–391.