LANGUAGE ASSESSMENT QUARTERLY, 3(3), 295–306
Copyright © 2006, Lawrence Erlbaum Associates, Inc.

COMMENTARY

An Analysis of a Language Test for Employment:
The Authenticity of the PhonePass Test

Christian W. Chun
Ontario Institute for Studies in Education of the University of Toronto

This article presents an analysis of Ordinate Corporation's PhonePass Spoken
English Test–10. The company promotes this product as a useful assessment tool
for screening job candidates' ability in spoken English. In the real-life domain of the
work environment, one of the primary target language use tasks involves extended
production responses. These include interacting with clients, responding to
customers, and so forth. What is germane here is the question, does the domain of the
test tasks found in the PhonePass Spoken English Test–10 correspond to the real-life
domain of nontest tasks? This is the rationale for my using the framework of
language task characteristics outlined by Bachman and Palmer (1996) and the Test
Fairness framework proposed by Kunnan (2004) as guides for an analysis of whether
the target language domain of this test is something that the test taker is likely to
face in the real-life domain of a work environment.

In a further demonstration of the growing divide between those who have unlimited
access to technology and those who do not, the administration of tests over
telecommunications networks has proceeded apace—the Graduate Record Examination,
the Test of English as a Foreign Language Internet-Based Testing, and the
Information and Communication Technology Literacy Assessment, to name just a
few. This development highlights the highly commercialized aspects of the test as
one more tool to be commodified, integrated, and distributed in the global economy.
A marketplace that features products competing for consumers’ attention rewards
the producer who lowers the product’s distribution costs significantly enough that
profit margins increase commensurately. We are witnessing this in the phenomenon
of tests being administered ever more cheaply and efficiently, at least from the
viewpoint of the producer. For the test taker—or, perhaps more aptly, the test
consumer—the consumption of this commodity does not come cheaply. Because the
present relationship between the tester and the test taker is now unmistakably a
market transaction, the question that test takers and those who assess the test
results must inevitably ask is, “Are we getting our money’s worth?”

Correspondence should be addressed to Christian W. Chun, Modern Language Center, CTL,
OISE/UT, 252 Bloor Street West, 10th Floor, Toronto, ON M5S 1V6, Canada. E-mail:
cchun@oise.utoronto.ca
One such test that is being administered efficiently over telecommunications
networks is Ordinate Corporation’s PhonePass Spoken English Test–10 (SET–10).
This test is being marketed as a useful assessment tool for screening job candi-
dates’ ability in spoken English. In addition, according to the website of Ordinate
(now a subsidiary of Harcourt Assessment), the applications of this test include
evaluating employees for promotions and measuring service employees with “an
accurate and objective test of their spoken English capabilities” (Ordinate Corpo-
ration, 2005, ¶ 8). The test is also promoted as having academic applications—as
an entrance exam, a placement and exit exam, and one to qualify international
teaching assistants.
What is pertinent here is the question, as a language test, how authentic is
PhonePass? Authenticity has been defined as “the degree of correspondence of the
characteristics of a given language test task to the features of a Target Language
Use task. It is this correspondence that is at the heart of authenticity” (Bachman &
Palmer, 1996, p. 23). Do the characteristics of the domain of the test tasks found in
PhonePass correspond to the real-life domain of nontest tasks? As Luoma (2004)
reminded us, “with task-based tests, the developers need to show that the content
of the test tasks is representative of the demands of the corresponding task outside
the test situation, and that the scoring reflects this” (p. 43).
The notion and application of authenticity in language testing are complex.
Lewkowicz (2000) suggested that authenticity comprises three elements: “authen-
ticity of input, purpose, and outcome” (p. 51). She posed the question “To what ex-
tent can/do test tasks give rise to authentic-sounding output which allow for gener-
alizations to be made about test takers’ performance in the real world?” (p. 51). Do
the test tasks in the PhonePass test generate such an output? The assessment of
what generally constitutes “authentic-sounding output” may be problematic; how-
ever, when situated in specific sociolinguistic contexts, certain outputs should be
expected and thus accepted as authentic sounding.
The PhonePass test has been promoted as an expedient assessment for screen-
ing job candidates’ ability in spoken English and as an academic entrance exam.
Examiners (as well as test takers) should therefore expect to find a high degree of
correspondence between the features of the test tasks found in the PhonePass test
and the features of target language use tasks found in real-life domains for the test
to be considered relatively authentic. In the real-life domain of the work environ-
ment, one of the primary target language use tasks is the use of extended produc-
tion responses. These might include interacting with clients over the phone and in
face-to-face encounters and responding to customers’ queries, each of which re-
quires knowing which register to use in an appropriate situation. These tasks in-
volve extensive organizational and pragmatic knowledge. A defined range of pro-
duction responses should serve as a model by which test takers’ output can be
assessed as authentic sounding.


The framework of language task characteristics outlined by Bachman and
Palmer (1996) and the Test Fairness framework proposed by Kunnan (2004) are
used here as guides for this analysis to see if the characteristics of the domain of the
test tasks found in PhonePass correspond to real-life domains of nontest tasks. The
analysis is based on the SET–10 Demo Test, which is available through the com-
pany’s website (http://www.ordinate.com/content/demo/demo-level1.shtml). The
format and length of time to complete the Demo Test are the same as those of the
actual SET–10. To be thorough, I took the Demo Test twice (in February 2005 and
May 2005).

GENERAL DESCRIPTION

The SET–10 is a 10-min spoken-English test for non-native speakers of English.
According to the company's Validation Summary for PhonePass SET–10, the test
measures “facility in spoken English” (Ordinate Corporation, n.d., p. 1); it is
administered over a land-line telephone and is immediately scored by computer.
The test taker can then retrieve the score, which is posted on the company's
website. The test results are reviewed by academic institutions, businesses, and
government agencies to assess the test takers' listening and speaking ability in
English. In North America, the cost of the test ranges from $25 to $40, depending
on the quantity of tests required by the employer or institution.
The test consists of five sections (Parts A–E). The computer records and scores
the responses in the first four sections. The last section contains open-ended ques-
tions, and the responses are recorded, not scored by the system. The scores are re-
ported in the range of 20 to 80, with 80 being the top score. This overall score “rep-
resents the ability to understand spoken English and speak it intelligibly at a native
conversational pace on everyday topics” (Ordinate Corporation, 2004, p. 8).
Scores are based on a weighted combination of four diagnostic subscores: sentence
mastery, vocabulary, fluency, and pronunciation. Speech-processing software
scores the test taker's spoken responses. For test takers to be
apprised of these criteria and procedures, they must download a separate docu-
ment, the Validation Summary for PhonePass SET–10 (Ordinate Corporation,
n.d.), from the Technical Reports section of the website. Given the extra step in-
volved, it seems unlikely that many test takers will do this.
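To make the scoring mechanics concrete, here is a minimal sketch of a weighted
combination of the four diagnostic subscores. The weights are hypothetical
placeholders: the validation summary states only that the overall 20-to-80 score is
a weighted combination of the subscores, without publishing the actual weights or
scaling.

    # A minimal sketch of a weighted-subscore combination. The 30/20/30/20
    # weights are hypothetical; Ordinate does not publish the weights or the
    # scaling it uses for the SET-10.
    HYPOTHETICAL_WEIGHTS = {
        "sentence_mastery": 0.30,
        "vocabulary": 0.20,
        "fluency": 0.30,
        "pronunciation": 0.20,
    }

    def overall_score(subscores):
        """Combine subscores (each on the 20-80 reporting scale) into one score."""
        total = sum(HYPOTHETICAL_WEIGHTS[name] * value
                    for name, value in subscores.items())
        return max(20.0, min(80.0, total))  # clamp to the reported 20-80 range

    print(round(overall_score({"sentence_mastery": 62, "vocabulary": 55,
                               "fluency": 68, "pronunciation": 49}), 1))  # 59.8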

FRAMEWORK OF LANGUAGE TASK CHARACTERISTICS

The setting of the SET–10 is one of its main appeals, and it is used in marketing the
test. This physical characteristic of the test enables one to take the test in any loca-
tion where there is a land-line phone system available. Therefore, the test taker can
conceivably take the test in an office, in the comfort of his or her own home, at an air-
port, or even at an outdoor telephone booth (if one still exists). However, because
the SET–10 administration is supported by a test paper, the test taker does need to
have access to an Internet-connected computer first, to download it from the com-
pany’s website. The test taker also needs a test identification number to take the
test over the phone. This identification number is available only on the downloaded
test paper. Although the company markets the test as being just a phone call away,
this aforementioned procedure of obtaining the required test paper necessitates a
fairly high degree of familiarity with computer technology—that is, knowing how
to use an operating system on a computer; how to open and navigate an Internet
browser, to gain access to the desired website of the test; and how to download ma-
terials from the test link. In addition, a test taker must also have access to and
knowledge of a computer printer, to print out the required test paper.
In this testing situation, other materials and equipment are also specialized.
The test taker must be familiar with the paper format, as well as the instructions and the
item-response format given on the test paper. These item responses include read-
ing printed sentences aloud, repeating sentences after hearing them once, giving a
short phrasal answer to spoken questions, rearranging audible word groups into
syntactically correct sentences, and responding to open questions. Finally, the test
taker needs to have access to a telephone, and she or he needs to be familiar with
telephone technology.
The participants involved in the test task are the test taker and the faceless, unfa-
miliar, and quite possibly unnerving test administrator, which comprises speech-
recognition technology employed over a telephone and computer software that au-
tomatically scores the test. However, an independent proctor may be needed to ver-
ify the identity of the test taker for interested parties. This proctor needs to be pres-
ent when one takes the test.
The time of task is the other appeal that the company trumpets. The test is con-
tinuously available so that test takers are free to take the test at any time, day or
night, and on any day of their choosing. This is an advantage to test takers in that
they are able to schedule the test according to their own time preferences, in stark
contrast to so many high-stakes tests. The test itself takes approximately 10 min
to finish.
The instructions for the test are written and spoken in the target language, which
in the case of the SET–10 is English. Both visual and aural channels are used. The vi-
sual channel comprises written text and three pictures modeling the correct way to
hold the telephone while giving responses. The test taker hears further instructions
on the telephone when first calling the PhonePass system. These include entering
the test identification number on the keypad. The written instructions are presented
in a fairly clear manner, with specific examples for three of the five tasks. The ex-
planations for each of the five tasks are somewhat brief. The instructions for the
procedure and the tasks, the criteria for the scores, suggestions for taking the test
in an optimal setting (the sole criterion is the absence of noise in the location of the
test taker’s choice), and the diagrams of a person holding the phone in the correct
way are all printed on one side of the test paper.
As mentioned, the test taker must complete five clearly separated sections on
the SET–10. In each of these five sections, the test taker is presented with different
task types: reading aloud, repeating sentences, giving short answers to questions,
building sentences, and responding to open questions. In Part A of the test, the test
taker must read aloud 8 sentences from a list of 12 items that are printed on the test
paper. In this section, the test taker must read the sentences in the order given over
the telephone. This order is different from the sequential order that is shown on the
test paper. Part B includes a fixed sequence of responses; that is, test takers hear
each sentence once and repeat it verbatim. There are a total of 16 repeat-sentence
items. In Part C, the test taker must respond with a simple answer to spoken
questions. This section has 24 short-answer items. Part D presents the test taker
with spoken phrasal word groups in a random sequence. The test taker must rear-
range these word groups into syntactically correct sentences. There are 10 items in
this section. Last, in Part E, test takers hear a spoken question to which they re-
spond with their opinions within 20 s. There are three items in this section. The five
sections do not differ in importance. As stated, the time allotted for the entire test
is 10 min. The test may appear speeded to some test takers, because not all of them
can complete each task, especially in Part E. The computer records and scores the responses to
the items in the first four sections. The system records responses to the items in the
last section but does not score them; instead, they are open for review by autho-
rized listeners.
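The layout just described can be condensed into a small data structure; the
following sketch (in Python, with task labels paraphrased from the descriptions
above) summarizes the item counts and scoring status per part.

    # Compact summary of the SET-10 task layout described above. Item counts
    # and scoring status come from the text; task labels are paraphrases, not
    # the test's official task names. Part A presents 12 printed sentences, of
    # which 8 are read aloud.
    SET10_PARTS = [
        {"part": "A", "task": "read printed sentences aloud",         "items": 8,  "scored": True},
        {"part": "B", "task": "repeat heard sentences verbatim",      "items": 16, "scored": True},
        {"part": "C", "task": "short answers to spoken questions",    "items": 24, "scored": True},
        {"part": "D", "task": "rearrange word groups into sentences", "items": 10, "scored": True},
        {"part": "E", "task": "open questions, 20 s per response",    "items": 3,  "scored": False},
    ]

    assert sum(p["items"] for p in SET10_PARTS) == 61  # 61 items in roughly 10 min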
The input for the SET–10 has several format characteristics. Both an aural
channel and a visual channel present the input. The visual channel is from the re-
quired test paper, and the aural channel includes the spoken prompts. The length of
the input is presented in phrases and sentences. The input in Parts C, D, and E is a
bit lengthier and thus requires more interpretation. In the first four sections, the
types of input are a series of items, whereas in the last section, Part E, the type of
input is a prompt in that the test taker is expected to produce an extended response
to the questions (although only 20 s in length, it involves more production than that per-
formed in the items in the other sections of the test). Because the test must be com-
pleted within 10 min, the test taker needs a high degree of speed. Therefore, the test
taker has to process the information in the input at a fairly high rate. Last, the
vehicle by which the input is delivered is reproduced (rather than live), via the
speech-recognition software employed over the telephone.
The organizational characteristics of the test for both the input and the expected
responses include grammatical knowledge and textual cohesion. The textual cohe-
sion on the test comprises comprehension and production of the utterances spoken
in Parts B, C, D, and E. The test also requires knowledge of ideational functions in
Part E, the open questions. However, this section is not scored. In the SET–10 Test
Description and Validation Summary (Ordinate Corporation, 2004), the company
claimed that the test investigates the “psycholinguistic elements of spoken lan-
guage performance rather than the social … elements of communication” (p. 3).
The language characteristics found in the test items of the SET–10 Demo Test do
not support this claim, because there are numerous cultural references throughout
the test sections. One such reference was contained in the test item question
“Would a blanket go on the bed or the wall?” The topical characteristics in Parts A,
C, and D, which are cultural, bear this out. In addition, because the test is marketed
as an assessment tool for screening job candidates’ ability in spoken English, the
social elements of communication should be at the core of the test’s construct.
In terms of the relationship between input and response, this test has no recipro-
cal tasks; that is, the test taker receives no feedback on the responses. Neither is this
test adaptive, because the input is not affected by the test taker’s responses. All of
the test tasks are of narrow scope. Finally, regarding the directness of relationship,
the test tasks are direct in Parts A, B, C, and D, whereas the test task in the unscored
Part E is indirect.

TEST FAIRNESS FRAMEWORK: VALIDITY AND AUTHENTICITY

The test taker must complete five sections on the SET–10, comprising several struc-
tured speaking tasks. In the first section, Part A, the test taker must read aloud a total
of 8 sentences from a list of 12 items printed on the test paper. In this section, the test
taker must read the sentences in the order that is instructed over the telephone. This
order is different from the sequential order that is shown on the test paper. Because
Part A’s test task consists of reading aloud printed sentences, it is a structured speak-
ing task, or what Luoma (2004) said is “the speaking equivalent of multiple choice
tasks” (p. 50). I consider this test task of reading aloud to be a relatively less authentic
task because it does not ask the test taker to create discourse that would be expected in
a target language use task in the real-life domain of employment or school. In terms
of organizational knowledge, the test takers are simply reading sentences aloud;
grammatical knowledge is not involved here, because the test taker does not need to
produce or comprehend formally accurate sentences. In addition, textual knowledge
is not being tested here either, because the sentences are read in a random order rather
than in the sequentially coherent sentence groups presented on the test paper. This
seems to prevent the test examiner from knowing whether the test taker has
knowledge of cohesion: the explicitly marked relationships among the four sentences
in each group are negated by reading the sentences aloud in random order, so the
task requires neither producing nor comprehending these relationships.


There is no evidence that this test task in Part A requires any functional and
sociolinguistic knowledge that is needed for the target language use domain of
work and school. Because pronunciation is one of the dimensions being evaluated
on this section, the question becomes what Luoma (2004) has asked: What is the
native speaker standard for foreign language pronunciation? Which regional vari-
eties or standards should be used? Although many language learners' pronunciation
is fully comprehensible, few, according to Luoma, are able to achieve a native-like
standard in all respects.
In Part B of the test, test takers hear each sentence once and repeat it verbatim.
The sentences are presented to the test taker in order of increasing diffi-
culty. Some of the sentences that I had to repeat on the demo test were as follows:
“War broke out,” “Let’s meet again in two weeks,” and “There are three basic ways
in which a story might be told to someone.” Because this test task is a sentence-rep-
etition task, as Buck (2001) described it, the test taker, in this section, is only given
a series of unconnected sentences rather than a unified passage. For the shorter
sentences, such as “War broke out,” which can be repeated immediately, “it might
test no more than the ability to recognize and repeat sounds, and this may not re-
quire processing of the meaning at all. … [This] clearly fails Anderson’s (1972)
criteria for proof of comprehension” (p. 79). For the longer sentences, such as
“There are three basic ways in which a story might be told to someone,” this task starts
to test the examinee’s working memory. Although Buck (2001) argued that
sentence-repetition tasks, such as the ones found in Part B, require speech produc-
tion, in my interpretation the speech production needed in the real-life domain of
school and work necessitates the ability to create and interpret discourse by relat-
ing utterances to their meanings and intentions as well as the setting. Even a parrot
can be taught to repeat short sentences devoid of any meaning or context.
In Part C, the test taker must respond with a single word or short phrase to spo-
ken questions. To respond to the question prompt, the test taker needs to identify
the words in phonological and syntactic context and infer the proposition demanded.
Sample questions included on the demo test were

“What season comes before spring?”
“What is frozen water called?”
“How many wheels does a bicycle have?”
“Oranges and bananas are fruits or vegetables?”
“Does a blanket go on the bed or the wall?”

The diagnostic subscore mapped to this section is vocabulary, which the test de-
fines as “reflecting the ability to understand common everyday words spoken in
sentence context and to produce such words as needed” (Ordinate Corporation,
2004, p. 8). Upon examination of the vocabulary in the questions in Part C, the
lexical knowledge required seems to be at a fairly low level (high beginning to low
intermediate at best); thus, even if a test taker answered all these questions
correctly, the test examiner could interpret the score on this section only as an
indication of the test taker’s ability to perform up to that lexical level. Because I
did not take the actual test (only the demo test), I do not know whether the
vocabulary in this section is more extensive. Judging from the demo test, however,
there needs to be a wider range of vocabulary presented in Part C if there is to be
any degree of correspondence of characteristics found in this language test task to
the features of target language use tasks found in the work and school domains.
The vocabulary items being measured do not appear to be representative of a level
at which university students or professional employees are expected to perform. In
addition, this task demands of the test taker only a limited production response of a
single word or phrase; therefore, it is difficult to see how one can accurately assess
an examinee's speaking capability, because such an assessment should also include ex-
tended production responses. Although the test task in Part C requires grammatical
and textual knowledge, again, sociolinguistic knowledge is not being tested.
In Part D’s task, Sentence Builds, the test taker hears a sequence of three short
phrasal word groups. These phrases are presented in a random sequence, and the
test taker is asked to rearrange them into a syntactically correct sentence. Here are
some examples from the demo test:

“in/bed/stay”
“your books/leave/at home”
“we wondered/would fit in here/whether the new piano”

One of the diagnostic subscores mapped to this section is fluency. In the valida-
tion report by the company, fluency is defined as reflecting “the rhythm, phrasing
and timing evident in constructing [italics added], reading, and repeating sen-
tences” (Ordinate Corporation, 2004, p. 9). However, it appears that this test task
does not adequately represent the construct being measured. Part D requires con-
structing a sentence, but that task is in response to an input of word groups that the
test taker has to rearrange into a syntactically correct sentence. “Constructing” a
sentence implies more than simply rearranging given words—it suggests the per-
formance of extended production responses. However, Part D requires the test tak-
ers to give only a selected response, chosen by simply rearranging the given phrasal
word groups, and they are thus not required to actu-
ally produce any utterance of their own. Although the test task in this section tests
grammatical knowledge, again, as in the other sections, pragmatic knowledge is
not tested. Sentence mastery is another diagnostic subscore mapped to this part,
and according to the validation summary, it “reflects the ability to understand, re-
call, and produce English phrases and clauses in complete sentences. Performance
depends on accurate syntactic processing and appropriate usage of words, phrases
and clauses in meaningful sentence structures” (p. 8). Because this test task in Part
D seems to be more of a multiple-choice task in that there are only six possible
combinations of the three phrasal word groups, any interpretation of this score in
sentence mastery is difficult to generalize beyond this test task to any particular tar-
get language use domain. Producing phrases and clauses in complete meaningful
sentences requires extended production responses, not selecting one combination
out of a possible six. Although one can argue that this test task requires syntactic
knowledge to give the correct response of rearranged word groups, I contend that it
is relatively less authentic because the test taker has a one-in-six chance of guess-
ing the correct combination of word groups.
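The one-in-six figure follows from simple counting: three phrasal word groups
admit 3! = 6 orderings, only one of which is well formed. A quick illustration,
using the demo-test item quoted above:

    # Enumerate every ordering of the three phrasal word groups from the demo
    # test item. Only one of the six orderings is syntactically well formed,
    # so a blind guess succeeds with probability 1/6.
    from itertools import permutations

    word_groups = ("we wondered", "would fit in here", "whether the new piano")

    orderings = [" ".join(p) for p in permutations(word_groups)]
    print(len(orderings))  # 6
    # The sole well-formed ordering:
    # "we wondered whether the new piano would fit in here"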
In the last section, Part E, the test taker listens to a spoken question and then has
20 s in which to give an opinion. The questions deal with either family life or pref-
erences/choices, such as “Which would you prefer to live, in a big city or a small
town and why?” and “What qualities do you look for in a friend?” Although this
section’s test task is relatively authentic (certainly more so than the other tasks in
this test) in that it requires extended production responses (albeit for 20 s), the test
taker’s responses are not scored at all; they are available for human review by au-
thorized listeners. As Bachman (1990) put it, “considerations of efficiency often
take precedence over those of validity and authenticity” (p. 298).
Ordinate Corporation’s Validation Summary for PhonePass SET–10 (n.d.)
claimed that its data provide evidence in support of several conclusions. One con-
clusion is that the test items elicit responses from which a test taker's conversational
skills can be reliably estimated. This is doubtful upon a careful review of the test
tasks. In Part A, the test taker is simply reading sentences aloud; in Part B, the test
taker is repeating sentences; in Part C, only a limited production response of one
word or phrase is required of the test taker; and in Part D the test taker gives only a
selected response. Only in Part E is an extended response required, which
provides an opportunity for an accurate assessment of a test taker’s conversational
skills. However, as noted, this section is not scored.
In terms of bias, the SET–10 seems to have minimal bias. An analysis of the
sentences and tasks did not reveal any offensive or inflammatory content or lan-
guage. In general, the test materials do not seem to cause any unfair penalization
owing to a test taker’s group membership. In terms of educational access, the test’s
materials involve skills in speaking and listening. Whether all test takers have had
the opportunity to learn these types of tasks is an interesting question. Test takers
who reside in an English-speaking country have an advantage over test takers who
reside elsewhere inasmuch as the former would have had more opportunities to
learn and become familiar with the tasks presented on the SET–10. A comparison
study measuring the respective test results of test takers in English-speaking and
non-English-speaking environments is needed to see whether bias is present.
In terms of cost, the price is reasonable ($35 to $40), especially in comparison
to tests such as the Test of English as a Foreign Language Internet-Based Testing,
which costs almost four times as much. Presumably, a fair number of test takers are able
to take the SET–10 at least once. The unique geographical feature of the test is that
the test site can be any location where a test taker has access to a
land-line telephone. Thus, test locations should not pose too much of a hardship to
test takers in developed countries. However, in some underdeveloped countries,
the absence of an infrastructure supporting telecommunication poses a significant
obstacle to taking this test. It is not known if this test provides appropriate accom-
modations for test takers with physical impairments, such as the sight or hearing
impaired. It does not appear that the test paper with instructions and test items is
available in a Braille format for the sight impaired. Because one of the constructs is
the understanding of spoken English, this might be compromised if accommoda-
tions were made for the hearing impaired.
As mentioned, the test taker needs to be familiar with the test-taking equipment,
such as computers, printers, and telephones. This might be an impediment in coun-
tries where such equipment is rare or unavailable. The procedures are extensive in
that the test taker must be familiar with Internet technology and computer operat-
ing system commands in order to obtain the test paper. Last, the test taker needs to
know how to use a telephone.
It is not known if any washback effects exist on instructional practices leading
up to the test. Information about any opportunities for the test taker to have the
test rescored or reevaluated, and about any legal provisions for challenging scores,
was not posted on the company's website or in its reports.

CONCLUSION

Upon an examination of the test tasks in each section, it seems that the perfor-
mance on these tasks does not necessarily predict or reflect the speaking ability of
the test taker to function in another environment—that of the real-life domain of
school and work. Listening and speaking over the telephone and in face-to-face
conversations involve different cognitive demands. Performance on these tasks in
an aural channel might be different on similar tasks when presented in a visual
channel, for example, conversations involving body language and facial expressions.
Thus, scores from a performance on the SET–10 may not correspond to external criteria,
such as employment ratings. The content found in the test tasks of the SET–10 is
not representative of demands of corresponding tasks outside this test situation.
According to Bachman (1990), the definition of authenticity is “a function of the
interaction between the test taker and the test task. … Test authenticity thus be-
comes essentially synonymous with what we consider communicative language
use, or the negotiation of meaning” (p. 317). By this definition, because there is
no negotiation of meaning found in Parts A and B of the test, and perhaps in Part
D, this test has to be considered relatively less authentic.
The PhonePass test needs a major revision of its test tasks for its test scores to be
interpreted as a valid, reliable, and fair predictor of the test taker’s language abili-
ties. But as Bachman (2002) pointed out, because most real-life domain tasks are
complex and diverse, “the evidence of content relevance and representativeness
that is required to support the use of test scores for prediction is extremely difficult
to provide” (p. 453). One possible way out of this dilemma is the development of
specialized test tasks specific to a particular domain so that there might be one ver-
sion to assess business language skills and another version to assess academic abil-
ity. Making the test tasks adaptive might also give the examiner a more reliable as-
sessment of the test taker’s communicative competence. As Bachman (1990)
stated, “The key is the selection or creation of language whose content is interest-
ing and relevant enough to the test taker to be engaging” (pp. 321–322). The content
found in PhonePass is neither interesting nor relevant enough to be engaging.
It would be interesting to see if the test takers’ perception of the authenticity of
the test characteristics in the PhonePass test affected their performance, following
the study by Lewkowicz (2000). Certainly, further research in this area is needed.
Indeed, as Bachman (2001) stated,

What may … be critical, in terms of test takers’ performance on an assessment task, is
their perception of authenticity—of the relevance of the characteristics of the assess-
ment task to the characteristics of tasks in a particular target language use (TLU) do-
main. That is, one could hypothesize that test takers’ perceptions of authenticity will
affect the way in which they approach a particular assessment task, the strategies they
use in completing it, and ultimately their performance on that task. (p. 73)

Should we expect test takers to generate “authentic-sounding output” (per Lew-
kowicz, 2000) from input that is clearly inauthentic, as in the case of PhonePass?
It seems that PhonePass may be reaching the status of a high-stakes test because
it, like all high-stakes tests, “can create winners and losers, successes and failures,
the rejected and the accepted” (Shohamy, 2001, p. 374). Because this analysis of
PhonePass reveals that it fails by any reasonable measure of authenticity, one
has to come to the grim conclusion that this test is nothing more than a crass com-
modity, cynically designed to appeal to the bottom-line concerns of businesses
and institutions alike, manifesting what Foucault (1975/1979) called the
“subjection of those who are perceived as objects and the objectification of those
who are subjected” (pp. 184–185). Like all commodities, PhonePass objectifies
the consumer—in this case, the unfortunate test taker who is subjected to this test.

REFERENCES
Anderson, R. C. (1972). How to construct achievement tests to assess comprehension. Review of Educational Research, 42(2), 145–170.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford, England: Oxford University Press.
Bachman, L. F. (2001). Some construct validity issues in interpreting scores from performance assessments of language ability. In R. L. Cooper, E. Shohamy, & J. Walters (Eds.), New perspectives and issues in educational language policy: A festschrift for Bernard Dov Spolsky (pp. 63–90). Amsterdam: Benjamins.
Bachman, L. F. (2002). Some reflections on task-based language performance assessment. Language Testing, 19, 453–476.
Bachman, L., & Palmer, A. (1996). Language testing in practice. Oxford, England: Oxford University Press.
Buck, G. (2001). Assessing listening. Cambridge, England: Cambridge University Press.
Foucault, M. (1979). Discipline and punish: The birth of the prison (A. Sheridan, Trans.). New York: Vintage. (Original work published 1975)
Kunnan, A. J. (2004). Test fairness. In M. Milanovic & C. Weir (Eds.), Selected papers from the European Year of Languages Conference, Barcelona (pp. 27–48). Cambridge, England: Cambridge University Press.
Lewkowicz, J. A. (2000). Authenticity in language testing: Some outstanding questions. Language Testing, 17, 43–64.
Luoma, S. (2004). Assessing speaking. Cambridge, England: Cambridge University Press.
Ordinate Corporation. (2004). SET–10 test description and validation summary. Retrieved January 29, 2005, from http://www.ordinate.com/pdf/SET–10_Test_Desc_Validation.pdf
Ordinate Corporation. (2005). Test applications. Retrieved February 28, 2005, from http://www.ordinate.com/content/prod/applications.shtml
Ordinate Corporation. (n.d.). Validation summary for PhonePass SET–10. Retrieved January 29, 2005, from http://www.ordinate.com/pdf/ValidationSummary000302.pdf
Shohamy, E. (2001). Democratic assessment as an alternative. Language Testing, 18, 373–391.
