
Abstract Writing assessment involves human raters' judgment, a fact sufficient in itself to make it susceptible to subjectivity, variability, and unpredictability. Raters differ in first language background, culture, and personality; it is therefore no wonder that they disagree in evaluating a given composition written by yet another person, who differs in the same respects. What is actually unnatural is their agreement despite being so different. If one rater is more stringent than others, students who are unfortunate enough to have their performances assigned to him or her are at a great disadvantage. The situation is aggravated further when "life-changing" (McNamara and Roever, 2006) decisions are made on the basis of such discrepant scores. Even with the clearest, most detailed scoring instructions and the most efficient rater training, an element of subjectivity always remains in raters' judgments (Weigle, 1994, 1998, 2002; McNamara and Lumley, 1995), seriously affecting students' scores and their validity. Possible sources of rater inconsistency have been investigated in both first and second language contexts. Shaw and Weir (2007: 168) note that a crucial factor with a great impact on the way 'raters evaluate written performance is the characteristics of the raters themselves.' O'Sullivan (2000, cited in Shaw and Weir, 2007: 168) attributes rater variation to three possible sources of influence: 'physiological, psychological, and experiential.' In the same vein, Weigle (1994) grouped sources of rater disagreement into three categories: 'within the text, within the rater, and within the rating context.' To these, Bachman (1990) adds other factors that are not related to the test takers'

linguistic ability yet still affect test scores, and therefore their reliability. Bachman refers to these factors as potential sources of measurement error and 'categorizes them into three mutually exclusive groups: 1) test method factors (e.g. raters, prompt type, etc.), 2) personal attributes (e.g. test taker's cognitive style, knowledge of particular content, etc.), and 3) random factors (e.g. fatigue, time of day, etc.)' (Quintieri, 2005; McNamara, 1996, 2000, 2006). This study focuses on the rater, a salient source of rating variability in writing assessment. It investigates the effect(s) that a rater's first language (L1) might have on their behaviour while assessing writing. The literature is replete with studies in this area, yet their results are contrasting and inconclusive owing to 'a preoccupation with a limited set of factors, at the expense of others that may have also influenced behavior' (Erdosy, 2004: 6). At one end, a number of studies have found little or no relationship between the rating of students' compositions and raters' L1 background, while at the other end many studies have found a strong relationship between raters' L1 background and the way they score students' essays. The purpose of this study is essentially to find out quantitatively whether raters' linguistic backgrounds have an effect on the way they assess ESL/EFL compositions in an Omani context. Twenty (n=20) raters with four different L1 backgrounds (5 Arabs, sharing the students' first language and culture; 5 English native speakers; 5 Indians; and 5 Russians) assessed three (n=3) essays written by Omani students. Findings from bias analysis show different assessment patterns for the four groups of raters, which could be a strong indication that the raters' different L1s underlie the patterns displayed in their ratings.
Both FACETS and ANOVA analyses reveal four different rating patterns for these sample raters. The Indian raters tend to be severe on all categories; in fact, they are the most severe of the four groups. The Russians, on the other hand, are the most lenient on all categories. The third pattern is that of the native speakers, who show a tendency to be severe on Content and on Coherence and Cohesion. Quite interestingly, qualitative data from interviews, processed using NVivo, reveal that this group of raters say they do not focus on grammar while scoring, yet text analysis of the scored scripts indicates that they actually pay much heed to grammatical structures in students' writing, which confirms McNamara's (1996) findings. The last pattern is that of the Arab raters, who tend to be severe on Grammar and Vocabulary but lenient on Content and Coherence and Cohesion.
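To illustrate the kind of group comparison the ANOVA analysis performs, the sketch below computes a one-way ANOVA F statistic across four rater groups. The scores are invented for demonstration only (they merely mimic the reported severity ordering: Indians most severe, Russians most lenient) and are not the study's data; the helper function is likewise a hypothetical minimal implementation, not the FACETS procedure.

```python
# Minimal one-way ANOVA sketch with invented rater-severity scores.
# NOTE: all numbers below are hypothetical illustrations, not study data.

def one_way_anova(groups):
    """Return the F statistic for a one-way ANOVA across score groups."""
    k = len(groups)                      # number of rater groups
    n = sum(len(g) for g in groups)      # total number of scores
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares: spread of scores around their own group mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    ms_between = ss_between / (k - 1)    # between-group mean square
    ms_within = ss_within / (n - k)      # within-group mean square
    return ms_between / ms_within

# Hypothetical mean scores (out of 20) awarded by each rater group
indian  = [11, 12, 10, 11, 12]   # most severe
russian = [16, 17, 16, 15, 17]   # most lenient
native  = [13, 14, 13, 14, 13]
arab    = [13, 13, 14, 12, 14]

f_stat = one_way_anova([indian, russian, native, arab])
print(round(f_stat, 2))  # prints 35.22
```

A large F statistic such as this one indicates that variation between the group means is large relative to variation within groups, which is the pattern a real analysis would test for significance against the F distribution with (k − 1, n − k) degrees of freedom.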