
Assessing speaking skills: a workshop for teacher development

Downloaded from https://academic.oup.com/eltj/article/46/3/294/388540 by National Science & Technology Library user on 30 September 2023
Ben Knight

Speaking skills are often considered the most important part of an EFL
course, and yet the difficulties in testing oral skills frequently lead teachers
into using inadequate oral tests or even not testing speaking skills at all.
This article describes a workshop used in teacher development
programmes to help teachers with one aspect of the problem of oral
testing: what should we look for when we assess a student's ability to speak
English? The workshop looks first at the range of criteria that teachers
might use in such assessment. Then it examines how the selection and
weighting of those criteria should depend on the circumstances in which
the test takes place. The article also discusses issues raised by the
workshop, and considers its applicability to people working in different
circumstances.

Reasons for the workshop

Assessment of speaking skills often lags far behind the importance given
to teaching those skills in the curriculum. We recognize the importance of
relevant and reliable assessment for providing vital information to the
students and teachers about the progress made and the work to be done.
We also recognize the importance of backwash (the effect of the test on
the teaching and learning during the course). Most teachers would accept
that 'if you want to encourage oral ability, then test oral ability' (Hughes,
1989:44). But the problems of testing oral ability make teachers either
reluctant to take it on or lacking in any confidence in the validity of their
assessments. Such problems include: the practical problem of finding the
time, the facilities and the personnel for testing oral ability; the problem of
designing productive and relevant speaking tasks; and the problem of
being consistent (on different occasions, with different testees and
between different assessors). Another problem, which is the focus of the
workshop framework described here, is deciding which criteria to use in
making an assessment. The workshop has two principal aims:

1 to make teachers more consciously aware of the different possible
criteria they could be using to assess their students' speaking skills;
2 to make teachers more aware of the way their selection and weighting
of those criteria depend on the context in which they are to be used.
Achieving these aims is crucial for making valid and reliable tests. Except
where tests are being marked holistically (simply in terms of degrees of
communicative success), marking involves the use of assessment criteria.
Even when the assessment is holistic on the surface, the assessor may be
thinking in terms of criteria in judging that overall communicative success
(Bachman, 1990: 329). It is doubtful whether the criteria can be
294 ELT Journal Volume 46/3 July 1992 © Oxford University Press 1992
considered justified and validated if the assessor is not even explicitly
aware of them. The reliability of an assessor on different occasions with
different testees can be improved by more explicit criteria, as can the
reliability between assessors.

The workshop

The workshop takes between about 1½ and 2½ hours and requires two or
three short video clips of students talking.1 Making your own video clips
is preferable, as you can make the task and situation reflect the type of test
which the teachers you are addressing are most likely to use.

Stage 1: assessment criteria

a. Viewing and reflection (10 mins.)

Teachers are shown a video clip of a student (or students) talking and are
asked to reflect on the question 'Which aspects of the students' speaking
would affect the grade you would give the students for their speaking
skills?'. The presenter needs to say in advance how long the clip will be,
and what instructions were given to the students.

b. Discussion (15 mins.)


Teachers can compare their notes in pairs or small groups, and then this
discussion can open up into a plenary. The objective at this stage is to get
the teachers to be more conscious of what affects their own judgements,
and to see how others may view it differently. The presenter's role will
include pinning people down on vague terms, such as 'communicative' or
'passive' (I have heard 'communicative' being used for: i) easy to
understand, ii) says a lot, iii) makes up for linguistic weakness with
gestures, etc., iv) interacts well with other person in role-play). Another
role is to elicit or suggest concrete examples (from the clip) of features
being discussed (e.g. an example of 'inappropriate language'). There is no
need at this stage to try to resolve all the differences of opinion, as often
those differences stem from different assumptions about the testing
context, which will be looked at later. After this discussion, it is useful to
show the clip again, so that people can reconsider the points made by
themselves and others.

c. List of assessment criteria (10-15 mins.)


The presenter then hands out copies of the list of assessment criteria (see
Figure 1). This list is fairly comprehensive in its broad categories, though
within those there could be many more detailed criteria (for example, in
an investigation of 'fluency' alone (Lennon, 1990: 404-405), 12 different
variables were looked at, ranging from 'words per minute' to 'mean pause
time at T-unit boundaries'). The amount of explanation needed for the
terms on the list will of course depend on the teachers.

Figure 1 Assessment criteria

1 GRAMMAR
   a. range
   b. accuracy
2 VOCABULARY
   a. range
   b. accuracy
3 PRONUNCIATION
   a. individual sounds (esp. phonemic distinctions)
   b. stress and rhythm
   c. intonation
   d. linking/elision/assimilation
4 FLUENCY
   a. speed of talking
   b. hesitation while speaking
   c. hesitation before speaking
5 CONVERSATIONAL SKILL
   a. topic development
   b. initiative (in turn taking, and topic control)
   c. cohesion: i) with own utterances
                ii) with interlocutor
   d. conversation maintenance
      (inc. clarification, repair, checking, pause fillers, etc.)
6 SOCIOLINGUISTIC SKILL
   a. distinguishing register and style
      (e.g. formal or informal, persuasive or conciliatory)
   b. use of cultural references
7 NON-VERBAL
   a. eye-contact and body posture
   b. gestures, facial expressions
8 CONTENT
   a. coherence of arguments
   b. relevance

d. Viewing and comment (10-15 mins.)


The teachers are then shown another clip of students talking, and are
asked to think about the usefulness and relevance of the criteria on the list
for assessing the students' speaking skills, adding or deleting as they think
necessary. The objective of this stage is to consider further the criteria
they think are relevant for assessing speaking skills, and also, by getting
them to relate their views to the terms on the list, to give them a common
vocabulary to aid discussion of differences of opinion. I have found
teachers tend to be less talkative here, since it is a point of mental
reorganization for them as they try to relate their own feelings and
experience to the list.

Stage 2: assessment and the context

a. Introduction (5 mins.)

By this stage, the question of context should have arisen several times
(e.g. in the form of comments beginning 'Well, it depends on why . . .
/what . . . /where . . . /how . . .'). The presenter now recalls various
examples of these and notes how they show the importance of the context
in deciding the choice of assessment criteria.
b. Examples (10-15 mins.)
Teachers are then given the hand-out with the examples of different
selections and weightings of criteria together with descriptions of the

relevant contexts (see below). Note that the number under the
'Weighting' column does not represent maximum marks for that
criterion, but its value relative to the other criteria. For example, each
criterion might be given a mark out of ten, and each score would be
multiplied by its weighting number before being totalled up. Teachers can
then be encouraged to ask any questions about the criteria, the context and
the relationship between the two. For example: 'Why did you include
listening comprehension in the placement test, but not in the end-of-term
test?' It would, of course, be wise for you as presenter to use your own
examples (criteria you have used yourself) so that you are more likely to
be able to answer such questions.
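The weighting arithmetic described above can be sketched in a few lines of code. This is an illustrative sketch only, not from the article: the criterion names and weighting numbers follow the placement-test example below, but the raw marks out of ten are invented for demonstration.

```python
# Illustrative sketch: the article describes giving each criterion a raw
# mark (e.g. out of ten) and multiplying each by its weighting number
# before totalling. Weights follow the placement-test example; the raw
# marks are hypothetical.

# weighting numbers from the placement-test criteria set
weights = {
    "range of grammar and vocabulary": 3,
    "accuracy of grammar and vocabulary": 2,
    "phonemic distinctions": 2,
    "hesitation": 4,
    "initiative, topic development, and conversational control": 4,
    "listening comprehension": 5,
}

# hypothetical raw marks, each out of ten
marks = {
    "range of grammar and vocabulary": 6,
    "accuracy of grammar and vocabulary": 7,
    "phonemic distinctions": 8,
    "hesitation": 5,
    "initiative, topic development, and conversational control": 6,
    "listening comprehension": 7,
}

# each raw mark is multiplied by its weighting number before totalling
total = sum(marks[c] * weights[c] for c in weights)
maximum = sum(10 * w for w in weights.values())

print(f"{total} out of {maximum}")  # prints "127 out of 200"
```

Note that the weighting number is not a maximum mark: a criterion weighted 5 simply counts five times as heavily as one weighted 1 in the final total.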

Criteria and context


Look at these two examples of differences in the selection and weighting of
assessment criteria. Note how these two examples of assessment criteria sets vary
according to the situation, and try to list the factors in the testing situation which
can affect such selection and weighting.
1 Placement test
A placement test for university students (to decide which level they go into for
their General English Communication course). This is an interview—basically
answering questions—with a teacher who is both interviewer and scorer at the
same time. It is taken with a single written gap-fill test, which assesses knowledge
of grammatical structures, vocabulary, and functionally appropriate structures.
In informal terms, the qualities we felt were important for success in a class were
the ability to understand the teacher, knowledge of vocabulary, structures and
functions (largely tested in the written gap-fill test), confidence and a willingness
to take chances and try things out, as well as the ability to distinguish
(productively) basic phonemes in English. The category of 'range of grammar and
vocabulary' aims to capture people with a wide experience of English (who we
thought would progress more quickly).
                                             (Numbers on
                                              main list)    Weighting
1 Range of grammar and vocabulary            (1a and 2a)        3
2 Accuracy of grammar and vocabulary         (1b and 2b)        2
3 Phonemic distinctions                      (3a)               2
4 Hesitation                                 (4b, c)            4
5 Initiative, topic development, and
  conversational control                     (5a, b, d)         4
6 Listening comprehension                    (not listed)       5

2 End-of-term ESP test


This test was given at the end of one term of a course for receptionists in
international hotels, to see how much they had progressed and what they needed to
work on the following term. The speaking test was a role-play with two students,
and the teacher only observed and marked the score sheet. There were several other
tests—gap-fill and multiple-choice tests of grammar, vocabulary, and functions,
and a listening comprehension test.



                                             (Numbers on
                                              main list)    Weighting
Grammar and vocabulary                       (1 and 2)          3
Pronunciation                                (3)                1
  a. individual sounds                                          1
  b. stress and rhythm                                          1
  c. intonation and linking                                     1
Fluency                                      (4)
  a. hesitation before speaking                                 1
  b. hesitation while speaking                                  1
Conversational skill                         (5)
  a. cohesion                                                   1
  b. conversation maintenance                                   1
Sociolinguistic skill                        (6)
  a. distinguishing register and style                          2
  b. use of cultural references                                 1
Non-verbal                                   (7)
  a. eye contact and body posture                               1
  b. gestures and facial expressions                            1
Content (relevance)                          (8)                2

c. Context variables (15 mins.)


By looking at the examples and thinking of their own experience, the
teachers are asked to abstract the variables in the context which may affect
the choice and weighting of the criteria. The variables could include the
following:
i. The purpose of the test:
—achievement, proficiency, predictive or diagnostic?
—and depending on that: the course objectives; the underlying theory
(formal or informal) of what constitutes language proficiency; the
future situations the students are being tested for; the types of
feedback the students in question would understand and could
benefit from.
ii. The circumstances of eliciting the sample of language being assessed.
—the degree of freedom or control over what the student could say
and do.
—the number of participants and their roles in the situation.
iii. Observation restrictions
—extent to which assessor participates in speaking situation (e.g.
interviewer or observer).
—whether recorded or not (on audio or video cassette).
iv. The other tests in the battery (e.g. the selection or weighting of a
criterion for grammatical accuracy may depend on how much it has
been assessed in accompanying written tests).

d. Using the different criteria sets (15 mins.)


(Optional if short of time.)
Teachers watch another video clip and assess the student's oral skills
using first one of the example criteria sets and then the other. This is to
demonstrate how different criteria sets (appropriate in different contexts)
can produce different assessments of the same performance. Different
criteria sets will not always produce differing results, and so care needs to

be taken to use a clip which will make the point clearly.

e. Task (20-30 mins.)


Teachers are given details of a testing situation (preferably compatible
with their own) and are asked to decide on the criteria they would use and
the weighting for those criteria. There follows an example of such a task:

TASK (Selecting Assessment Criteria)


You should imagine you are responsible for the oral testing of 100 students who have
applied to study in U.S. colleges for one year (various subjects). You have to judge
whether they have sufficient speaking skills to survive and do well there. You can
conduct a 10-minute interview for each (with only one interviewer/assessor for each
interview). The interviewers are all experienced and trained EFL teachers. The other
tests in the battery will be multiple-choice tests of grammar and vocabulary, of
academic reading and lecture listening comprehension, and essay-type questions.
Decide on the criteria you would use in assessing their spoken English, and the
relative weighting of each.

The purpose of this task is:


—for teachers to think more concretely about the points raised so far, and to
let them see how referring to a particular context (described in the task)
can reduce the differences of opinion that arise when talking more generally.
—to provide an intermediary stage between the thinking in the earlier part
and the need for application of those thoughts to their own situation
after the workshop.

f. Conclusion (5 mins.)

The presenter can ask the teachers for their comments on the workshop—how
useful it was, how it could have been more useful, whether they think they
would change the way they assess their students' speaking skills, and so
on.

Discussion

1. Objective criteria?

There is still a great deal of subjectivity in a) the selection of criteria, and
b) the way each criterion is measured (e.g. how exactly do you decide the
grammatical accuracy of a speaker's performance?). The workshop aims
only to improve the quality of those subjective decisions about selecting
criteria by making them more conscious and explicit, and by giving the
teachers a chance to discuss other points of view. It assumes that teachers
do not have the resources to carry out their own research. A kind of
collective subjectivity can be reached for how each criterion is measured
by 'training' or 'moderating' sessions for assessors. But for those who
have the time and resources to look closely and objectively at these
questions, the following will be of interest: Hinofotis (1983), Hieke
(1985), Fulcher (1987), and Lennon (1990).
2. Analytic or holistic assessment?

Several tests, such as the RSA Profile Certificate or the ILR (Interagency
Language Roundtable) oral interview, use a different type of criterion to
this workshop. The speakers are observed in different speaking tasks and
they are simply judged for their degree of success in that task. This
holistic approach argues that, as we cannot observe directly mental
characteristics like grammatical knowledge or ability to maintain
conversations, it will be inaccurate to give (inferred) scores for them.
Rather we should simply assess the learner's (observable) success in
performing authentic language tasks.
The approach behind this workshop, however, is one which argues that it
is those mental abilities (which we must infer from the learner's
performance) that we are most interested in, for at least the following
reasons. Firstly, we cannot predict, let alone test for, every function and
situation which a learner might need English for. Therefore any claim
about general proficiency must involve a lot of inferences from the few
sample functions/situations we do test. Secondly, a lot of teachers' own
tests are partly diagnostic, and teachers need to know more about why a
particular learner performed badly in some situations and better in others.
This will usually come down to inferred component abilities, such as
range of vocabulary or fluency. For a detailed discussion of the two
approaches, see Bachman (1990: 41-42, 301-333). Hughes (1989: 110)
recommends using both approaches, with one as a check on the other.

3. When to do this workshop?

This workshop on its own may seem peripheral, as teachers often worry
more about such problems as making a practical test, setting fair tasks and
getting a fair sample of the students' language, and being consistent.
However, it is probably helpful to tackle the problem of assessment
criteria before these other questions, since we need to start by deciding
what we want to measure, before deciding what is the most reliable and
practical means to measure it. The danger, otherwise, is that we choose
reliable and practical tests (e.g. multiple-choice tests) which do not give
us the information we really want about our students' oral skills, and
which can have a negative effect on students' attitudes to developing
those skills during the course.

4. Too complicated?

Considering the context in selecting assessment criteria does make the
discussion more complicated. So with teachers for whom this topic is
completely new, it would probably be better to leave such considerations
aside or condense them severely. I have found some untrained teachers
saying that they wished they could have come away from the workshop
with one fixed set of 'best criteria'. Gebhard (1990:158,160) reports that
handed-down direction is preferred by beginning teachers (quoting
research in Copeland, 1982) and by teachers in certain countries who feel
that 'if the teacher is not given direction by the supervisor, then the
supervisor is not considered qualified'.
However, taking the testing context into account is valuable, despite the
added complexity, in dealing with two common problems with teacher-
development workshops.
Firstly, it makes it easier for teachers to apply what they learn in the
workshop to their own situations, especially when they are working in
contexts very different from that of the presenter. This is also helped by
the final task (which is an exercise in applying to a particular situation
principles learnt during the workshop).
Secondly, it helps resolve conflicts of opinion. Many of the
disagreements at the beginning of a workshop can be related to different
assumptions about the testing context and its effect on the selection of
criteria. Thus, it not only improves our understanding, but also improves
the conduct of the workshop: it avoids the 'anything goes' approach
which creates cynicism, and it reduces the ex cathedra judgements by the
presenter which can lead to resentment or passivity.

5. Usable in other circumstances?

The main limitations to this workshop are:
a. There must be a video player, and some video clips of students
speaking (this is best done by recording students speaking in situations
most likely to be used in tests by those teachers—i.e. a video camera is
helpful).
b. The teachers must have a sufficient level of English (or other target
language) to assess the students and to contribute to the discussion.
c. The teachers need to have some experience of assessing students'
speaking skills. Pre-service teachers would probably be overwhelmed
by the workshop in its present form.
The workshop, however, need not be limited to native speakers, trained
teachers, or to EFL teachers, as it has proved as useful to non-native
speakers, untrained (but practising) teachers, and teachers of other
languages. Each of these three latter groups may have certain
characteristics which should be taken into account.
Some non-native speakers tend to place far greater emphasis on
grammatical accuracy to begin with, though this is usually due to the way
they learnt English rather than to any difficulty in perceiving discourse
and sociolinguistic skills.
Experienced, untrained teachers often lack the relevant vocabulary to talk
clearly about learning and teaching, and sometimes appear to be more
dogmatic or to lose perspective (e.g. claiming the single most important
criterion in assessing speaking skills is whether the speaker keeps eye-
contact with the listener).
While it is obvious that the details of grammatical accuracy and
pronunciation will be different for other languages, that is probably also
true of the other criteria such as conversation maintenance and non-verbal
behaviour. However, provided their level of English is sufficient for the
tasks, the workshop's aims (awareness of the range of different criteria
possible, and awareness of how their selection depends on the testing
context) can be met for teachers of different languages in the same group.
It should also be noted that these three groups bring different perspectives
which can only enrich the discussions: for example, suggesting as a
criterion for communication between non-native speakers 'the ability to
adapt your level of English to that of your interlocutor', or (from a
Japanese teacher) 'the skill of stating an opinion in such a way that it is

easy for the listener to disagree without seeming argumentative'.

Conclusions

The workshop works as a way of stimulating teachers to think about and
discuss the way they assess their students' speaking skills. It is rare for a
participant not to be absorbed by the tasks and the exchange of ideas. A
few participants have found it rather frustrating not to have a fixed set of
testing criteria at the end of the workshop, but most seemed to find the
process of relating criteria to context helpful in clarifying their own
positions. A large survey of teachers' testing experience found 'there is
evidence that most [teachers] prefer to use informal and flexible
approaches which can be adapted to different student populations'
(Brindley, 1989: 31). This workshop suits such preferences.
Received September 1991

Note

1 The video clips I have used (about 3-5 mins. long) mainly showed
students speaking in pairs, usually in a simple role-play. For example:

Student A: You want to go on a holiday to Hawaii. Try to persuade your
partner to come with you, though he or she wants to go somewhere else.

Student B: You want to go on a holiday to Europe. Try to persuade your
partner to come with you, though he or she wants to go somewhere else.

References

Bachman, L. 1990. Fundamental Considerations in Language Testing.
Oxford: Oxford University Press.

Brindley, G. 1989. Assessing Achievement in the Learner-centred
Curriculum. Sydney: National Centre for English Language Teaching and
Research.

Copeland, W. 1982. 'Student teachers' preference for supervisory
approach'. Journal of Teacher Education XXXIII/2: 32-36.

Fulcher, G. 1987. 'Tests of Oral Performance: the need for data-based
criteria'. ELT Journal XLI/4: 287-291.

Gebhard, J. 1990. 'Models of Supervision: choices'. in Richards, J. and
D. Nunan (eds.) 1990.

Hieke, A. 1985. 'A Componential Approach to Oral Fluency Evaluation'.
The Modern Language Journal LXIX/2: 135-42.

Hinofotis, F. 1983. 'The Structure of Oral Communication in an
Educational Environment: a comparison of factor analytic rotational
procedures'. in Oller, J. (ed.) 1983.

Hughes, A. 1989. Testing for Language Teachers. Cambridge: Cambridge
University Press.

Lennon, P. 1990. 'Investigating Fluency in EFL: A Quantitative
Approach'. Language Learning XL/3: 387-417.

Oller, J. (ed.) 1983. Issues in Language Testing Research. Rowley,
Mass.: Newbury House.

Richards, J. and D. Nunan (eds.) 1990. Second Language Teacher
Education. Cambridge: Cambridge University Press.

The author

Ben Knight teaches EFL and linguistics at Shinshu University, Japan. He
obtained an MSc in Applied Linguistics from the University of Edinburgh
in 1987, and has taught EFL/ESL in Britain, Kenya, Italy, India, and Sri
Lanka. His current interests include the testing of spoken English, teacher
development, and language learning beyond the classroom.

