Davison (2004) - Teacher-Based Assessment in OZ and HK
I Introduction
In some countries (e.g., England, and increasingly over recent years,
the USA) the distrust of teachers by politicians is so great that in-
volving teachers in the formal assessment of their own students is
unthinkable. And yet, in many other countries (e.g., Norway and
Sweden) teachers are responsible not just for determination of their
students’ results in school leaving examinations, but also for univer-
1
There is not necessarily more variability in writing assessment than other modes. It is simply
that writing assessment is the most common mode of assessment in English language
education, and the easiest to research.
308 Teacher assessment practices in Australia and Hong Kong
2
The difference between criterion- and construct-referenced assessment systems is in the
relationship between written descriptions (if they exist at all) and the domains (Wiliam,
2001a). In criterion-referenced systems written statements collectively define the level of
performance required (or, more precisely, the justifiable inferences), whereas in construct-
referenced systems written statements merely exemplify the kinds of inferences warranted.
Chris Davison 311
the raters (or anybody else) to know what they are doing, only that
they do it right’, English language education is something in which
the wider community has a vested interest. Thus, debates over the
rights and wrongs of English language teaching (and teachers)
become very heated when teacher-based assessment is discussed.
Finally, construct-referenced assessment, like criterion-based sys-
tems, assumes that agreement within the community, rather than
disagreement, is primary. This can lead to moderation meetings, an
essential requirement of construct-referenced systems, valuing com-
monality over diversity of opinion, with a consequent undermining
of validity as teachers bury their differences in the drive for consen-
sus. As Clarke and Gipps (2000: 40) argue, group moderation must
be used to ‘ensure that teachers have common understandings of the
criterion performance’. This implies development, not just vali-
dation of understandings, and inevitably requires more time than
most systems are prepared to fund.3
In summary, it can be argued that both criterion- and construct-
referenced systems have their limitations as approaches for teacher-
based assessment. This study not only reveals the extent to which
these limitations are realized in teachers’ actual assessment practices
in different contexts, but also suggests some ways to overcome these
limitations by exploring what teachers actually do as they assess,
not just what systems expect them to do.
3
Paradoxically, moderation meetings are almost always the first casualty of the implemen-
tation of school-based assessment. For example, when the new Certificate program was intro-
duced in Melbourne schools in 1992, all teachers’ grades were subject to systematic
verification and assessor moderation but this checking process was abandoned as too costly,
and replaced by a system of top-down state-wide reviews and introduction of an external
comparison/control, the ‘context-free’ General Achievement Test (GAT). If there was a dis-
crepancy between the school-based results and the GAT, reviews of all school-based assess-
ment in specific schools were automatically triggered. In 2000 this was replaced by
standardization of results with those of the external exam, hence undermining the integrity of
the school-based assessment. In Hong Kong there is already talk of the need to supplement
teacher moderation in school-based assessment with standardization against the external
examinations (South China Morning Post, 31 May 2003, p. E3).
4
This research was carried out with the aid of grants from the Australian Research Council
and the University Grants Commission, Hong Kong.
5
Space precludes a more detailed analysis of the teachers’ assessment conversations and
commentaries as situated and distinct forms of discourse.
Fig. 1 Assessment criteria for CAT 1 Presentation of an issue (ESL). Source: Board of
Studies, 1999: 16–20.
(Looks at criteria as rereads bits of text, at the same time jotting down
15 marks and comments on the assessment sheet in blue pen) The whole
thing does show a limited understanding of the task (explicit reference to
Criterion 1, Figure 1). I think I’d rate it as a medium. And the reasons
why I’d rate it as medium rather than high or even low, is that he is
aware that he needs to express his opinion and that it should be an
20 informed opinion and that he does make, I suppose, he does sort of list a
6
In all of the extracts, initial letters identify the different teachers participating in the assess-
ment discussion. Transcription conventions include bold for my emphasis; = indicates over-
lapping turns.
7
Pseudonyms are used for all references to students.
whole host of ideas, which one assumes he has collected from his read-
ings. And, I think then that too, shows at least a medium knowledge of
the chosen content. Perhaps medium to high. High because of the actual
number of opinions. But much of the material which he calls to support
25 his ideas is rather superficial and tends to be opinionative rather than
substantive. So yes, there is some knowledge, I’ll give him a high for that
(explicit reference to Criterion 2, Figure 1) -perhaps a little generously.
You’re thinking globally first off on first reading, and then you start to
apply the criterion and your frame of mind changes according to ‘Do I
downgrade?’ or ‘Do I upgrade?’ . . . lots of us have discrepancy, we’d like
to reward a little bit more for that particular expression or style of writ-
ing, but we’re not going to reward it for its ideas, for example, depend-
ing on the first reading of the piece. (Follow-up interview, p. 3)
R: I was appallingly sort of honest with what I did here. I ticked the boxes
and was horrified when I finished up with a B plus, because my instinct
told me it was a C plus, but I stuck with it.
J: Why did your instinct tell you that?
5 R: Well, I when I mean instinct, when I read it through the first time I
thought well no paragraphs, no this, no that, there were some glaring,
well, let me stick to what I did rather than that. I noticed with several
of these, I was under the impression that they were meant to state con-
tention at the head of the piece.
10 M: That’s what I meant by formulaic the formula is, it’s in the title.
R: They haven’t done that, and I thought that was a fundamental cri-
terion. I mean not a criterion in the sense that you penalised or
rewarded them. And I thought that, well, his appreciation was at
best, moderate. Now, I suppose by that I mean medium, but I think I
15 should have gone for low.
M: The reason why mine was not above a C plus in the end, even though
as you can see I . . . this is a B, because I did change my marks after a
while, was because I felt the main task here is to understand. If the
first couple, I consider the first three more important than say the last
20 three. In fact, I’ve often argued that they should not be weighted
equally. If the language is such, this is definitely such, that you could
clearly understand what he is saying, then the language does not de-
tract from meaning. Then it boils down to the first three criteria.
There is where I felt he really had quite a few problems.
25 R: So, what did you give him for those first three?
M: He did not maintain a personal viewpoint and argue coherently, sup-
porting and substantiating his arguments, so I couldn’t go the to B/B
plus, because for me that has to happen. Even at the lowest level, it
has to happen. And yet, you can’t go down any lower than that
30 because his =
B: = so what are you saying that there, you’re talking about criterion 4
(see Figure 1) for example.
R: No, you’re initially restricting yourself to you’re commenting on 1,
2 and 3. I would argue that what you are saying is more pertinent to
35 4 and 5.
B: So when you’re saying criterion 3 is no good, it’s not effective and
appropriate exploration of ideas.
M: I’m not saying it’s not good. What I’m saying is instead of the struc-
ture, instead of arguing the case and substantiating the argument,
40 and then saying the opposition may have claimed this and that and
then rebut it, he seems to have stacked arguments one after the other.
B: OK, so you’re talking about criterion 5, 4 and 5 (see Figure 1). So, if
we look at the criteria, would you say that criterion 5 medium, the
work demonstrates some ability. I mean it’s a modifier, isn’t it? It’s
45 not total, or universal ability, some ability to . . . He’s demonstrated
some ability, it’s certainly not high. It’s certainly not some structure.
R: I think you and I are actually, what we’re doing is being quite legalistic
about this, aren’t we?
B: I always try to do that myself.
50 R: I have a dreadful conflict within myself as to what I call my intuitive
judgment, which is what you are going on, I think.
J: It is, it is.
R: And thinking about the future and where is this child going, can this
child cope? But that’s not what we’re being asked to do. We’re being
55 asked to tick the box. You say something is low. I don’t think you
can say that’s low, sadly.
B: But here, I think all we have to do is exactly I just read the
words. And it’s always worried me, this good or sound, because to me
there’s quite a range between good and sound. Good, this is good.
60 Sound is much better, much more concrete ability.
J: When I was speaking on the tape, I was saying I don’t know whether
I’m going to go medium or low here. I just felt, when I read it through
and look at it holistically, as you say, as a whole thing, I just think, it
65 hasn’t had a lot of depth, it doesn’t have a lot there and therefore
shouldn’t score well.
R: You’re saying this child is not tertiary material, that’s what you’re
saying.
was very interesting, because when we marked and, say I had G’s stu-
dent marked a couple of grades lower than G. had given it. G. would
20 come out with the background of this ‘but she’s tried so hard, you
know, and that was a justification’. And I remember those very
clearly.
G: Oh really, OK.
B: Yes, we did it that way. It’s pretty hard to avoid it when you get into
25 that sort of classroom relationship with some kids.
G: I suppose, I mean, if it operates over too wide a spread then it’s a real
problem if it isn’t there. If it’s a question of like, whether it’s a C or a
C plus, then, OK, I guess you can live with that. But if it’s a question
of whether it’s a C or an A, then that really calls it into question,
30 doesn’t it? If you can give it that much of the benefit.
M: It’s really those border areas and I think, if truth be told, subcon-
sciously, we are all affected by the students in front of us when we
know them.
R: By definition, I mean, we’re human beings, aren’t we?
35 M: That’s right.
G: Oh yes.
R: On the other hand, I think that’s the idea of the criteria, to try and
minimise that.
S: This is a very good grade. That should be 71, a safe C. Now, I don’t
know. The language because though she used a lot of idioms, she does
have a mass group of the language there. If it is overused, I think it is
a very general thing with Hong Kong students. They swallow diction-
5 aries and then they try and pump out as many of these sort of like
‘Every cloud has a silver lining’ and all of these sorts of stuff. I think
it’s better, much, much better than any of the others that we have
looked at so far. And, I know there are two ways you can look at this
and there are two ways that I can look at this. Either I can give this a
10 respectful mark or I can fail it. There is nowhere in between. Either it’s
going to get it a C or it’s going to get F. I went for the C in the end
because I guess I understood what she was trying to say. If you can
get your mind around her sort of very flowery language, then there is
an argument there.
15 V: = I think she’s trying to show [unintelligible] poetic language.
Extract 6: Respect
Even more striking is the way in which the Hong Kong teachers in
this study, unlike the Melbourne teachers, expressed their concern
about their lack of authority and influence as teachers, as can
be seen in Extract 7. Here, R makes the extraordinary statement
that his marking, a dominant and time-consuming feature of his
daily routine, is ‘pretty negligible’ (line 8), because the grades and
comments have no effect on learning or teaching in his school.
View of the assessment process
  Column 1: Mechanistic, procedural, automatic, technical, seemingly universalized. e.g., I just follow the criteria (A9); I ticked the boxes (A5); The criteria are just there, so it’s really easy (A6)
  Column 2: De-personalized, explicit, codified, legalistic, culturally detached. e.g., I have to be legalistic . . . (A5); I would like to give a higher grade but I can’t because of the criteria (A10)
  Column 3: Principled, explicit but interpretative, attuned to local cultures/norms/expectations. e.g., It’s very complex and ultimately you have to give more weight to one thing than another, it comes down to professional judgement (A4)
  Column 4: Personalized, implicit, highly impressionistic, culturally bound. e.g., Can this child cope? (A5); You’re saying this child is not tertiary material (A5)
  Column 5: Personalized, intuitive, beyond analysis. e.g., You just know . . . (HK5); She’s just got it . . . (A3)

View of the assessment product
  Column 1: Text-focused
  Column 2: Text-focused, but awareness of student
  Column 3: Text and student focused
  Column 4: Student-focused
  Column 5: Student-focused

View of inconsistencies
  Column 1: Seemingly unaffected by inconsistencies
  Column 2: Inconsistencies a problem, threat to reliability. e.g., I worry when I make judgment . . . am I interpreting the criteria correctly? (A2)
  Column 3: Inconsistencies inevitable, cannot necessarily be resolved satisfactorily, teachers need to rely on professional judgement. e.g., I have to juggle things, weight them up in my own mind and think what the alternatives are (A1)
  Column 4: Inconsistencies a problem, threat to validity, assessor training needs to be improved. e.g., I think would my colleagues accept this as an A? (HK2)
  Column 5: Seemingly unaffected by inconsistencies

View of assessor needs, e.g., for support/training
  Column 1: Need better assessment criteria
  Column 2: Need better assessor training (to interpret criteria)
  Column 3: Need more time for moderation and professional dialogue (to make basis of judgments more explicit)
  Column 4: Need ‘better’ assessors (to uphold standards)
  Column 5: System not open to scrutiny, not accountable, operated by the ‘chosen’ few.
V Conclusions
Wiliam (2001a) argues that high-quality educational provision
demands that teachers are involved in the summative assessment of
their students. This study reveals that teachers in Hong Kong and
Australia have very different approaches to assessment and very dif-
ferent ‘assessment’ problems. However, there is an urgent need in
both ‘old’ and new teacher assessment contexts to provide more
opportunities for teacher interaction around assessment issues.
Teachers need explicit high quality assessment criteria as a frame-
work for dialogue. They also need time and space to develop a
sense of ownership and common understanding of the assessment
process and to articulate and critique their often implicit constructs
and interpretations. Such teacher interactions are also necessary to
help all stake-holders develop a more informed perspective of
teacher assessment practices and to establish the key ingredients for
validity and reliability in teacher-based assessment: dialogue and
trust.
VI References
Ajzen, I. 1988: Attitudes, personality and behavior. Milton Keynes: Open
University Press.
Alderson, J.C. and Wall, D. 1993: Does washback exist? Applied Linguis-
tics 14, 115–29.
Andrews, S. 1994: The washback effect of examinations: its impact upon
curriculum innovation in English language teaching. Curriculum
Forum 4, 44–58.
Birenbaum, M. and Dochy, F.J., editors, 1996: Alternatives in assessment
of achievement, learning processes and prior knowledge. Boston, MA:
Kluwer.
Black, P. and Wiliam, D. 1998: Assessment and classroom learning.
Assessment in Education 5, 7–74.