System 29 (2001) 371–383


Writing evaluation: what can analytic versus holistic essay scoring tell us?
Nahla Bacha*
Lebanese American University, PO Box 36, Byblos, Lebanon

Received 14 November 2000; received in revised form 26 February 2001; accepted 29 March 2001

Abstract

Two important issues in essay evaluation are the choice of an appropriate rating scale and the setting up of criteria based on the purpose of the evaluation. Research has shown that reliable and valid information gained from both analytic and holistic scoring instruments can tell teachers much about their students' proficiency levels. However, it is claimed that the purpose of the essay task, whether for diagnosis, development or promotion, is significant in deciding which scale is chosen. Revisiting the value of these scales is necessary for teachers to continue to be aware of their relevance. This article reports a study carried out on a sample of final exam essays written by L1 Arabic non-native students of English attending the Freshman English I course in the English as a Foreign Language (EFL) program at the Lebanese American University. Specifically, it aims to find out what analytic and holistic scoring using one evaluation instrument, the English as a Second Language (ESL) Composition Profile (Jacobs et al., 1981. Testing ESL Composition: A Practical Approach. Newbury House, Rowley, MA), can tell teachers about their students' essay proficiency on which to base promotional decisions. Findings indicate that the EFL program would benefit from more analytic measures. © 2001 Published by Elsevier Science Ltd.
Keywords: English as a Foreign/Second Language; Writing evaluation; Essay evaluation; Analytic evaluation; Holistic evaluation

* Tel./fax: +961-9-547256. E-mail address: (N. Bacha).

1. Introduction

In adopting scoring instruments with clearly identifiable criteria for evaluating¹ English as a Foreign Language/English as a Second Language (EFL/ESL) essays, the paramount guiding principle is obviously the purpose of the essay. Evaluating essays in EFL/ESL programs has been mainly for diagnostic, developmental or promotional purposes (Weir, 1990; Hamp-Lyons, 1990, 1991; Douglas, 1995; Shohamy, 1995; Upshur and Turner, 1995; Kunnan, 1998). The choice of evaluation instrument to be adopted therefore becomes significant, and it is important that teachers are aware of the potential of the evaluation criteria being adopted, in order for these programs to obtain valid results upon which to base decisions. This is most crucial when decisions concerning student promotion at the end of the semester to the next English course have to be made mainly on the basis of essay writing scores. This article focuses on examining what two types of scoring instruments, analytic and holistic, can tell us in essay writing evaluation for promotional purposes.

¹ Evaluation sometimes refers to assigning a score to a direct writing product based on predefined criteria. It is distinguished from assessment in that the scoring for the latter focuses more on feedback and alternative evaluative techniques in the process of learning. In this study, the term evaluation will refer to assigning a score to a direct writing product.

2. Literature review

In evaluating essay writing either analytically or holistically, teachers have had to address a number of concerns that have been found in the research to affect the assigning of a final score to a writing product. Some of these concerns have included the need to attain valid and reliable scores, set clear essay prompts, set relevant tasks, give sufficient writing time, and choose appropriate rhetorical modes (Braddock et al., 1963; Cooper and Odell, 1977; Crowhurst and Piche, 1979; Hoetker, 1982; Quellmalz et al., 1982; Raymond, 1982; Brossell, 1983; Smith et al., 1985; Carlman, 1986; Hillocks, 1986; Kegley, 1986; Caudery, 1990; Hayward, 1990; Pere-Woodley, 1991; Elbow, 1999). These issues are also of concern, and often require different consideration, when evaluating non-native student essays in English (Oller and Perkins, 1980; Jacobs et al., 1981; Horowitz, 1986a, b; Hamp-Lyons, 1990, 1991; Kroll, 1990a, b; Read, 1990; Tedick, 1990; Reid, 1993; Cushing-Weigle, 1994; Tedick and Mathison, 1995). Although Gamaroff (2000) states that language testing is not in an ''abyss of ignorance'' (Alderson, 1983, cited in Gamaroff, 2000), the choice of the 'right' essay writing evaluation criteria in many EFL/ESL programs remains problematic, as often those chosen are inappropriate for the purpose. Thus, two main related concerns in both the L1 and the L2 essay evaluation literature are the appropriateness of the scoring criteria and the standard required (Gamaroff, 2000). Kroll (1990a) emphasizes the complexity in setting criteria and standards by which to evaluate student writing:

There is no single written standard that can be said to represent the 'ideal' written product in English . . . Even narrowing the discussion to a focus on academic writing is fraught with complexity . . . we cannot easily establish procedures for evaluating ESL writing in terms of adherence to some model of native-speaker writing. (p. 141)

This difficulty, however, is best addressed by identifying the purpose of the essay writing in EFL programs prior to adopting any criteria. Once the purpose is determined, two important related issues to be considered are the content and construct validity of the essay task. A writing evaluation instrument is said to have content validity when it ''. . . evaluates writers' performance on the kind of writing tasks they are normally required to do in the classroom'' (Jacobs et al., 1981, p. 74). Jacobs et al.'s (1981) writing criteria have content validity, since the criteria outlined evaluate the performance of EFL/ESL students on the different types of expository writing tasks they perform in the foreign language classroom. Construct validity, the second but equally important issue, is the degree to which the scoring instrument is able to distinguish among abilities in what it sets out to measure; it is usually referred to in theoretical terms, the theoretical construct in this case being 'essay writing ability', which the instrument aims to measure. Jacobs et al.'s (1981) criteria have been researched and found to have construct validity in that significant differences were found when student essay scores were compared. An additional concern, but of lesser importance compared with the other two issues in this article, is concurrent validity. An evaluation instrument is said to have concurrent validity when the scores obtained on a test using the instrument significantly and positively correlate with scores obtained on another test that also aims to test similar skills. Examples of tests that could correlate with essays might be ''. . . measures of overall English proficiency'', such as the final exam grade in the English courses, even though an essay requires a [similar] writing performance specifically (Jacobs et al., 1981, pp. 74–75), or another sample of writing performance.

The Jacobs et al. (1981) ESL Composition Profile was selected among other similar instruments for the present study, as it has been used successfully in evaluating the essay writing proficiency levels of students in ESL/EFL programs comparable with those at the Lebanese American University. Hamp-Lyons (1990) comments that it is ''the best-known scoring procedure for ESL writing at the present time'' (p. 78). The Profile is divided into five major writing components: content, organization, vocabulary, language, and mechanics, each having four rating levels: very poor, poor to fair, average to good, and very good to excellent. Each component and level has clear descriptors of the writing proficiency for that particular level as well as a numerical scale. The ranges for the writing skills are: content 13–30, organization 7–20, vocabulary 7–20, language 5–25 and mechanics 2–5. For example, very good to excellent content has a minimum rating of 27 and a maximum of 30, indicating essay writing which is ''knowledgeable — substantive — thorough development of thesis — relevant to assigned topic'', while very poor content has a minimum of 13 and a maximum of 16, indicating essay writing that ''does not show knowledge of subject — non-substantive — not pertinent — or not enough to evaluate'' (Jacobs et al., 1981). Benchmark essays are included in the Jacobs et al. (1981) text as guides for teachers.
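As an aside for readers who find a concrete representation helpful, the Profile's numerical structure described above can be sketched in a few lines of code. This is purely illustrative and not part of the original study; the component ranges come from Jacobs et al. (1981), while the dictionary and function names are invented for the example.

```python
# Illustrative sketch (not from Jacobs et al., 1981): the ESL Composition
# Profile's five components, each with its numerical range. The five maxima
# sum to 100, so a valid set of component scores totals at most 100.
PROFILE_RANGES = {
    "content":      (13, 30),
    "organization": (7, 20),
    "vocabulary":   (7, 20),
    "language":     (5, 25),
    "mechanics":    (2, 5),
}

def analytic_total(component_scores):
    """Check each score against its Profile range and return the total."""
    total = 0
    for component, score in component_scores.items():
        low, high = PROFILE_RANGES[component]
        if not low <= score <= high:
            raise ValueError(f"{component} score {score} outside {low}-{high}")
        total += score
    return total

# A mid-range essay: 22 + 14 + 13 + 16 + 4 = 69 out of a possible 100
print(analytic_total({"content": 22, "organization": 14,
                      "vocabulary": 13, "language": 16, "mechanics": 4}))
```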

Holistic and analytic scoring instruments, or rating scales, have been used in EFL/ESL programs to identify students' writing proficiency levels for different purposes. Holistic scales are mainly used for impressionistic evaluation that could be in the form of a letter grade, a percentage, or a number on a preconceived ordinal scale which corresponds to a set of descriptive criteria, to be determined by those concerned (the Test of Written English, part of the Test of English as a Foreign Language — TOEFL — has a 1–6 scale; Pierce, 1991). In any classroom situation, essays are usually scored by two raters, each of whom scores the essays without knowing the score(s) assigned by the other raters (blind evaluation). The final grade is usually the average of the two raters' scores; if there is a score discrepancy, a third and sometimes a fourth reader is required, and the two or three closest scores or all four scores are averaged. Benchmark papers (sample papers drawn from students' work which represent the different levels) are chosen as guides for the raters. Although in any holistic scoring often specific features of compositions are involved (Cohen and Manion, 1994), in more explicit holistic scoring, grading criteria are detailed for each of the levels which ''. . . establish the standards for criterion-referenced evaluation'' (Reid, 1993, p. 239) rather than norm-referenced evaluation (evaluating students in comparison with each other). If the same criteria and benchmark papers are used, then papers can be evaluated across groups, and through rater training and experience, high inter- and intra-rater reliability correlations can be attained (Myers, 1980; Jacobs et al., 1981; Carlson et al., 1985; Cumming, 1990; Hamp-Lyons, 1990; White, 1994). Holistic scoring is also attractive in that it saves time and money and at the same time is efficient.

Some researchers argue, however, that holistic scoring focuses on what the writer does well rather than on the writer's specific areas of weakness, which are of more importance for decisions concerning promotion (Najimy, 1981; Charney, 1984; Homburg, 1984; Hamp-Lyons, 1995; Upshur and Turner, 1995; Elbow, 1999). They do see its value for evaluating classroom essays and for large-scale ratings such as those done for the Test of Written English international exam. For student diagnostic purposes, holistic scoring can, in any classroom situation, serve to initially identify the students' writing proficiency level; but for the more specific feedback needed in following up on students' progress, and for evaluating students' proficiency levels for promotional purposes, criterion-referenced evaluation criteria (rather than norm-referenced ones) are needed, which analytic scales provide. Some instruments may be a combination of both, as the Jacobs et al. (1981) ESL Composition Profile is. Jacobs et al.'s (1981) criteria have been established to have concurrent validity, scores being highly correlated with those of the TOEFL and the Michigan Test Battery. ''The correlation coefficients which result from the relationships between the tests can be considered to be 'validity coefficients' '' for the instrument in question (Jacobs et al., 1981, p. 74), and a coefficient of 0.60 or above provides strong empirical support for concurrent validity. Obviously, teachers do not usually compute all these validity tests; it is important, however, that they are aware that these issues need to be considered.
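The double-blind rating procedure just described is, in effect, a small algorithm, and a minimal sketch of one way it could be implemented is given below. The tolerance value and function names are assumptions made for illustration, not a published specification.

```python
# Sketch of the blind double-rating procedure described above: two raters'
# scores are averaged, and when they diverge beyond some tolerance a third
# reading is obtained and the two closest scores are averaged. The tolerance
# here is an illustrative assumption; programs set their own thresholds.
from itertools import combinations

def resolve_holistic(score1, score2, third_reader=None, tolerance=10.0):
    if abs(score1 - score2) <= tolerance:
        return (score1 + score2) / 2
    if third_reader is None:
        raise ValueError("discrepancy too large; a third reading is needed")
    scores = [score1, score2, third_reader()]
    closest = min(combinations(scores, 2), key=lambda pair: abs(pair[0] - pair[1]))
    return sum(closest) / 2

print(resolve_holistic(72, 78))                           # 75.0
print(resolve_holistic(60, 85, third_reader=lambda: 82))  # 83.5
```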
A further argument for adopting analytic evaluation scales is that some research indicates learners may not perform the same in each of the components of the writing skill (Kroll, 1990a).

Analytic scoring scales, in being more criterion-referenced, have also been found to be better suited to evaluating the different aspects of the writing skill, and analytic evaluation instruments are considered more appropriate for basing decisions concerning the extent to which students are ready to begin more advanced writing courses. One type of analytic scale, primary trait scoring, was first developed by the National Assessment of Educational Progress (NAEP) in the USA by Lloyd-Jones (1977, in Cohen and Manion, 1994) to evaluate clearly specified tasks. Basically, this scale deals with the setting up of criteria prior to the production of a particular task; often teachers need to consider how students have read and summarized an article or argued one side of an issue (Cohen and Manion, 1994). Multiple-trait scoring, another analytic scale, focuses on more than one trait in the essay and was used in efforts to obtain more information than holistic scores could provide. Such scales may also include specific features such as cohesion subsumed under organization [i.e. the structural and lexical signals that connect a text, as in Weir's (1983, 1990) Writing Scale]. If applied well, these analytic scales can be very informative about the students' proficiency levels in specific writing areas. Indeed, studies have indicated that inter-rater reliabilities of 0.8 and higher have been obtained (Kaczmarek, 1980; Mullen, 1980; Jacobs et al., 1981; Perkins, 1983; Hamp-Lyons, 1986, 1991; Bachman, 1990; Alderson and Beretta, 1992; Cohen and Manion, 1994). Other studies have shown significant correlations between some linguistic features (e.g. vocabulary and syntactic features) as well as between linguistic features and holistic scores (Mullen, 1980; Bamberg, 1982; Perkins, 1983), making more qualitative evaluation procedures that attend to lexical, syntactic, discourse and rhetorical features necessary (Tedick, 1990; Connor-Linton, 1995; Tedick and Mathison, 1995).

Cohen and Manion (1994) give score results of a few sample essays that were evaluated by the various scales mentioned earlier. They note that there is a variation of results depending upon how raters perform and which scales are used. They conclude that in any writing evaluation training program, raters should focus on the task objectives set, use the same criteria with a common understanding, attempt to have novice raters approximate expert raters in rating, and have all raters sensitive to the writing strategies of students from other languages and cultures (p. 336). Teachers need to be aware, however, that in any type of analytic evaluation, ratings within the specific writing areas may vary from one instructor to the next depending upon the relative weight placed on each; this is quite normal even among the most experienced raters. Also, in using the different types of analytic scales, raters may unknowingly fall back on holistic methods in actual ratings if attention to the specified writing areas is not given. Nevertheless, there could be a backwash effect in that instruction is influenced by the evaluation criteria (Hamp-Lyons, 1990; Cohen and Manion, 1994).

Gamaroff (2000) argues that, among many factors, perhaps why ''. . . many fail and pass [depends] on who marks one's tests and exams (who is usually one's teacher/lecturer), depending on the luck of the draw'' (p. 46). Although this view may be true in some situations, it does seem to undermine what teachers can and should do as part of a team in EFL programs. It is claimed in this article that the overriding factor, in other words whether the student fails or passes the course, is primarily the choice of the scoring scale and criteria suitable for the purpose of the task, and the interpretation (or misinterpretation) of the results.

In the final analysis, however, it is the final essay that often determines whether the student is ready to begin the more advanced Freshman English II course. It is also the view of the author that essay evaluation should be team work in any EFL program and not left to individuals alone. Often we as teachers are so busy evaluating piles of essays that a reminder of the significance of knowing what these scales offer us is helpful; in a sense it is revisiting the essay evaluation scenario and re-raising teachers' awareness of the significance of their decisions. This article is an attempt to investigate what two types of writing rating scales, analytic and holistic, can tell teachers in evaluating their students' final essays.

3. Method

3.1. The sample

A stratified random sample of 30 essays was selected from a corpus of N=156 final exam essays written by L1 Arabic non-native students at the end of a 4-month semester in the Freshman English I course, the first of four courses in the EFL program at the Lebanese American University. The main objective of these EFL courses is to give students an opportunity to become more proficient in basic reading comprehension and academic essay writing skills. The final essay (given as part of the final exam, which also includes a reading comprehension component) is used to test for promotional purposes. The final exam counts for 30% of the final course grade, with the final essay constituting 20% and the reading comprehension 10%. Final grades are reported on the basis of pass (P) or no pass (NP); students must gain a minimum percentage score of 60% in the course to receive a P. The sample of essays was written by a representative selection of the student population (N=156) in gender (2/3 male and 1/3 female), first language (Arabic 94.98%, French 1.98%, English 1.59%), main study language in high school (English 35%, French 60%, bilingual English and French 5%), and major in three schools (Arts and Sciences 40.73%, Engineering and Architecture 32.55%, Business 26.67%).
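For illustration only (the original study does not report its sampling procedure in code), a stratified draw like the one described above can be sketched with pandas; the column names here are hypothetical.

```python
# Sketch of a stratified random sample: roughly 30 of 156 essays, drawn so
# that the strata proportions (here hypothetical columns "gender", "school",
# "first_language") are preserved. Rounding within small strata means the
# final sample size is approximate.
import pandas as pd

def stratified_sample(essays, n, strata, seed=0):
    frac = n / len(essays)
    return (essays.groupby(strata, group_keys=False)
                  .apply(lambda group: group.sample(frac=frac, random_state=seed)))

# e.g. sample = stratified_sample(corpus, n=30,
#                                 strata=["gender", "school", "first_language"])
```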

3.2. The procedure

Students were instructed to choose one of two topics², each in a different rhetorical mode, and to write a well organized and developed essay within a 90 min time period, considered ample for the task of writing a final essay product (Caudery, 1990). Although research findings suggest that the rhetorical or organizational mode and the topic affect performance (Hoetker, 1982; Quellmalz et al., 1982; Carlman, 1986; Kegley, 1986; Ruth and Murphy, 1988; Hamp-Lyons, 1990; Kroll, 1990b; White, 1994), the choice gave students the opportunity to perform their best. Moreover, the topics were considered valid for promotional purposes, as the essays were comparable with those given during the regular course instruction. Some research has shown that L1 Arabic non-native writers in English show weaknesses when writing in the English script, which has been noted to negatively influence raters' scores (Sweedler-Brown, 1993); however, the essays were not typed before evaluation so as to keep the scoring procedure in a realistic context.

² Compare and contrast the overall performance of computers and human beings at work, or give the causes and effects of loneliness among young people.

Each essay was first given two holistically rated percentage scores using the Jacobs et al. (1981) ESL Composition Profile according to final course objectives by two readers (the class teacher and another who was teaching a different section of the same course). The profile had been used in the LAU EFL program by some of the instructors, but this was the first time it was rigorously applied. A third reader was required when discrepancies exceeded one letter-score range (A=90–100%, B=80–89%, C=70–79%, D=60–69%, Failing=below 60%), in which case the average of the two closest scores was computed. The final holistic percentage score for each essay was then calculated by recording the mean of the two readers' scores. Using the same profile, the same readers then analytically scored each essay according to the five parts: content, organization, vocabulary, language, and mechanics (Jacobs et al., 1981). The final analytic scores for each of the writing skill components were averaged and a mean for each of the five parts computed.

The Spearman Correlation Coefficient was used to test the strength of the relationship between the two raters' scores (inter-rater reliability) and between the same raters' scores on two occasions (intra-rater reliability). This statistical test was chosen over the Pearson one since the holistic and analytic scores were not normally distributed (Siegel, 1988). The Friedman Statistical Test was used to check for any significant differences on more than two variables, in this case the five specific writing areas in the ESL Profile (SPSS, 1997).

4. Results

Significant positive relations of 0.8 were obtained between the two readers' holistic essay scores (P=0.001) using the Spearman Correlation Coefficient Statistical Test (inter-rater reliability). When a random sample of essays (N=10) was re-evaluated by the same readers, there was also a similar significant positive relation (P=0.001) between the scores of the first and second readings using the Spearman Correlation Coefficient Statistical Test; that is, raters were in agreement with their own readings of the same essays on two occasions (intra-rater reliability). The average mean score obtained for all 156 essays was 65%, which does reflect the low writing proficiency level expected at this level.
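The reliability statistics reported above are standard nonparametric procedures; a minimal sketch of how they can be computed (with invented score vectors, not the study's data) follows.

```python
# Sketch of the statistics reported above: Spearman rank correlation for
# inter-rater (or intra-rater) reliability, and Friedman's test for
# differences across the five related component scores. All data here are
# invented for illustration.
import numpy as np
from scipy.stats import spearmanr, friedmanchisquare

rng = np.random.default_rng(1)
rater1 = rng.uniform(50, 90, size=30)        # holistic % scores, rater 1
rater2 = rater1 + rng.normal(0, 4, size=30)  # rater 2, broadly agreeing

rho, p = spearmanr(rater1, rater2)           # inter-rater reliability
print(f"Spearman rho = {rho:.2f}, p = {p:.4f}")

# Five related samples: each essay's percent score on each component
content, organization, vocabulary, language, mechanics = rng.uniform(
    55, 85, size=(5, 30))
stat, p = friedmanchisquare(content, organization, vocabulary,
                            language, mechanics)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")
```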

Table 1 indicates the mean analytic raw scores and Table 2 the scores when converted to percentages. It is apparent that it is the language component that is the weakest when one compares the percentages.

Table 1
Mean analytic raw scores of final essay

Writing area    Mean
Content         21.10
Organization    13.64
Vocabulary      13.45
Language        15.83
Mechanics        3.42

Table 2
Mean percent analytic scores of final essay (weights are the % allocated to each writing area; Jacobs et al., 1981)

Writing area    Weight    Mean
Content         30%       70.33
Organization    20%       68.20
Vocabulary      20%       67.25
Language        25%       63.33
Mechanics        5%       68.33

There were very high positive significant relations of 0.8 and above in the raters' analytic evaluation (P<0.001). When the mean percent analytic component scores were compared, there were high significant differences (P=0.001) using the Friedman two-way ANOVA; that is, students performed significantly differently from best to least as follows: content, mechanics, organization, vocabulary, language. When the scores of the different components of the writing skill were correlated with each other, high significant coefficients were obtained (Table 3), the highest being between the vocabulary and organization components. This might be an indicator that the raters evaluate essays seeing a link between these two components. Interestingly, very high significant correlation coefficients (P=0.001) were also obtained between the holistic and the analytic scores when the holistic scores were correlated separately with each of the five writing components. All in all, the results indicate that the components are inter-related, confirming the internal reliability, or consistency, of the scores (Jacobs et al., 1981).

Table 3
Correlation among analytic mean percent scores (P=0.001 in all cases; N=30)

                Organization    Vocabulary    Language    Mechanics
Content         0.8149          0.8962        0.8679      0.7712
Organization                    0.9587        0.8774      0.7751
Vocabulary                                    0.7945      0.7607
Language                                                  0.6871
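The percentage figures in Table 2 follow from Table 1 by expressing each raw component mean as a proportion of that component's Profile maximum; for example, content: 21.10/30 is approximately 70.3%. A short sketch of the conversion, using the values transcribed from the tables above:

```python
# Sketch of the Table 1 -> Table 2 conversion: each raw component mean is
# expressed as a percentage of its Profile maximum.
profile_max = {"content": 30, "organization": 20, "vocabulary": 20,
               "language": 25, "mechanics": 5}
raw_means = {"content": 21.10, "organization": 13.64, "vocabulary": 13.45,
             "language": 15.83, "mechanics": 3.42}

percent_means = {part: round(100 * raw / profile_max[part], 2)
                 for part, raw in raw_means.items()}
print(percent_means)  # content ~70.3 ... language ~63.3, the weakest component
```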

5. Discussion

Although there were high inter- and intra-reliability coefficients, the holistic scoring revealed little about the performance of the students in the different components of the writing skill. When the analytic scores were compared with each other, there were high significant differences among the different writing components; that is, students performed significantly differently in the various aspects of the writing skill. This finding confirms research in the field (Kroll, 1990a) indicating that students may have different proficiency levels in the various writing components. It seems that student performance is lowest in the vocabulary and language components when compared with that in content, organization and mechanics. The fact that the language component indicated the lowest scores as identified by the criteria confirms both teacher and student perceptions during informal interviews throughout the course. This is quite a significant finding (not revealed in the holistic rating) as it implies that incoming students need to work on widening their vocabulary repertoire in an academic setting.

In the study, there was high significant correlation between the two raters' analytic scores on each of the five components as well as among the scores of the different components of the writing skill, the highest being between vocabulary and organization. If this be the case, it reinforces some recent research (Hoey, 1991) that claims that lexis and organization are related in various ways over stretches of discourse. Jacobs et al. (1981) point out that an important aspect in testing is the internal consistency, or reliability, of scores; that is, there should be no significant correlation differences among the parts or between the parts and the whole score (p. 71) if the test is to be considered a reliable one. Although based on one writing sample, this finding suggests that the performance of the students is a reliable indicator of the students' writing proficiency level. That the final essay significantly correlated with the final exam results seems to suggest that writing proficiency is in some way related to academic course work at a given point in time, in this case at the end of the semester (concurrent validity).

However, White (1994) warns against holistic scoring alone and suggests that ''. . . we need to define writing more inclusively than our holistic scoring guides . . . which normally yield no gain scores across an academic year'' (p. 266). Decisions concerning students passing or failing the course would be better informed ones if not totally based on the holistically rated essay scores. There have been many experiences where students have been passed due to their having 'relevant' ideas and 'appropriate' organization, but their relatively poor language and vocabulary repertoire has placed them at a disadvantage in the upper level English courses; of course, the reverse has also been the case. EFL programs and teachers should re-visit the value of the scoring instruments adopted and consider not relying entirely on holistic scores to determine the improvement of their students' writing proficiency. The fact that the content component mean percentage score is significantly higher in comparison with the other components indicates that the students had relevant ideas on the topic given, but that they found it significantly more difficult to organize and express their ideas, a finding which confirms much of the literature in the field, specifically that related to L2 Arabic speakers of English (Doushaq, 1986; Silva, 1993).

Shakir (1991) reported findings where there was a much higher correlation of the holistic coherence scores with the grammar score than with the other components. Sweedler-Brown (1992) also reported high significant correlation between sentence features (rather than more rhetorical ones) and holistic scores, concluding that teachers may have been rating the essays according to surface features at the sentence level rather than discourse features over larger stretches of text, and recommends that experienced writing instructors work with teachers who are not trained in ESL/EFL evaluation techniques. In the present study, however, it seems that the readers were focusing on each of the components equally when rating the essays and did not emphasize any component over the others, as some of the research suggests might happen. The consistent high significance of the relation between the vocabulary and the holistic essay ratings may suggest that teachers consider vocabulary a very important sub-component of the writing skill, whereas mechanics is not as important in holistic ratings, which might have affected the raters' scores. The fact that there were high significant positive relations (P<0.001) between the holistic and the analytic scores using the Spearman Correlation Coefficient Test indicates internal consistency, or reliability, of the scores. This finding also confirms the teachers' experiences and students' comments throughout the course during informal interviews. There is a need to look deeper into the students' language component to describe what actually is going on and to devise ways to help students widen their vocabulary repertoire.

It might be argued that each component in the ESL Composition Profile (Jacobs et al., 1981) is weighted differently; that is, raters may have been influenced by the score values for each component of the compositions and implicitly given higher scores to aspects of the writing that the rating scale gives higher weight to (e.g. content) and correspondingly lower scores to the aspects of writing that the rating scale gives lower weight to (e.g. mechanics). Since these components are weighted differently in the teaching and learning process as expressed in the profile, the scale was found to be appropriate for the present study. However, to compare these scores with one another more validly, they may have needed to be converted to (or, better, scored on) a common scale (e.g. using z scores or some other type of conversion).
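The common-scale conversion suggested above is straightforward to illustrate. The sketch below standardizes two components' raw scores to z scores so that components with different ranges and weights can be compared directly; the score vectors are invented for the example.

```python
# Sketch of the z-score conversion mentioned above: standardizing each
# component puts differently weighted components on one common scale.
# The raw scores below are invented for illustration.
import numpy as np

def to_z(scores):
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std(ddof=1)

content = [22, 25, 19, 27, 21]   # raw scores out of a maximum of 30
mechanics = [4, 5, 3, 4, 2]      # raw scores out of a maximum of 5

print(to_z(content))    # now directly comparable across components,
print(to_z(mechanics))  # despite the very different raw ranges
```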

6. Conclusion

The aim of the present study was to find out what holistic and analytic evaluation can tell us and what general lessons can be drawn for the evaluation of writing. Although holistic scoring may blend together many of the traits assessed separately in analytic scoring, making it relatively easier and reliable, it is not as informative for the learning situation as analytic scoring. Although the study was done on a limited sample, these initial results confirm the complexity involved in choosing rating scales and delineating criteria for valid and reliable essay evaluation on which to base promotion decisions. The results indicate that more attention should be given to the language and vocabulary aspects of students' essay writing, and that a combination of holistic and analytic evaluation is needed to better evaluate students' essay writing proficiency at the end of a course of study. What holistic and analytic evaluation can tell teachers about our students' essay writing proficiency is quite a lot, for any EFL/ESL program in any context.
Most of all perhaps, relevant evaluation criteria go hand in hand with the purpose upon which the criteria, benchmark essays and training sessions are based (Pierce, 1991). In the final analysis, the choice of instrument and how we use the information obtained depend very much upon what we, in evaluating our students' work, are looking for.

References

Alderson, J.C., Beretta, A. (Eds.), 1992. Evaluating Second Language Education. Cambridge University Press, Cambridge.
Bachman, L.F., 1990. Fundamental Considerations in Language Testing. Oxford University Press, Oxford.
Bachman, L.F., 1991. What does language testing have to offer? TESOL Quarterly 25 (4), 671–704.
Bamberg, B., 1982. Multiple-choice and holistic essay scores: what are they measuring? College Composition and Communication 33 (4), 404–406.
Braddock, R., et al., 1963. Research in Written Composition. National Council of Teachers of English, Urbana, Illinois.
Brossell, G., 1983. Rhetorical specification in essay examination topics. College English 45 (2), 165–173.
Carlman, N., 1986. Topic differences on writing tests: how much do they matter? English Quarterly 19 (1), 39–49.
Carlson, S., et al., 1985. Relationship of Admission Test Scores to Writing Performance of Native and Non-native Speakers of English (TOEFL Research Reports, 19). Educational Testing Service, Princeton, New Jersey.
Caudery, T., 1990. The validity of timed essay tests in the assessment of writing skills. ELT Journal 44 (2), 122–131.
Charney, D., 1984. The validity of using holistic scoring to evaluate writing. Research in the Teaching of English 18, 65–81.
Cohen, L., Manion, L., 1994. Research Methods in Education. Routledge, London.
Connor-Linton, J., 1995. Looking behind the curtain: what do L2 composition ratings really mean? TESOL Quarterly 29 (4), 762–765.
Cooper, C.R., Odell, L. (Eds.), 1977. Evaluating Writing: Describing, Measuring, Judging. National Council of Teachers of English, Urbana, Illinois.
Crowhurst, M., Piche, G.L., 1979. Audience and mode of discourse effects on syntactic complexity in writing at two grade levels. Research in the Teaching of English 13 (2), 101–109.
Cumming, A., 1990. Expertise in evaluating second language compositions. Language Testing 7 (1), 31–51.
Cushing-Weigle, S., 1994. Effects of training on raters of ESL compositions. Language Testing 11 (2), 197–223.
Douglas, D., 1995. Developments in language testing. Annual Review of Applied Linguistics 15, 167–187.
Doushaq, M.H., 1986. An investigation into stylistic errors of Arab students learning English for academic purposes. English for Specific Purposes 5 (1), 27–39.
Elbow, P., 1999. Ranking, evaluating, and liking: sorting out three forms of judgment. In: Straub, R. (Ed.), A Sourcebook for Responding to Student Writing. Hampton Press, Inc., New Jersey, pp. 175–196.

. C. The effect of mode-discourse on student writing performance: implications for policy. Perkins. English for Specific Purposes 5 (2). Rating nonnative writing: the trouble with holistic scoring.. Christian University Press. . Jacobs.). 1991). 1984. Hamp-Lyons. Research in Language Testing. pp. MA.). pp. Massachusetts..M. Second Language Writing: Research Insights for the Classroom. In: Kroll.. B.. Cambridge University Press. Scoring and rating essay tasks. Cambridge. System 28 (1).M. Pere-Woodley. K. (Ed.). (Eds. Newbury House Publishers Incorporated. Urbana.J. Oxford. Second Language Writing Research Insights for the Composition. Hamp-Lyons. Horowitz. A. K. M.. Research on Written Composition: New Directions for Teaching. College Composition and Communication 33 (4). B. J. 72–84. Urbana. pp. P. 1986. 1998. Center for Information on Language Teaching. Spoken Language. Kegley. Second Language Writing: Research Insights for the Classroom. Hamp-Lyons. University of Edinburgh. Perkins. Kroll.). D. Essay examination prompts and the teaching of academic writing.. Writing in a foreign language and rhetorical transfer: influences on raters’ evaluation. 753–758. 1980. M... Newbury House Publishers Incorporated. In: Tate. Mullen. objective measures and objective tests to evaluate ESL writing ability.). 69–83. J. 1983. Perkins. 1991. B... (Ed. TESOL Quarterly 20 (3). Ablex.H. Texas. What professors actually require: academic tasks for the ESL classroom.. Language Teaching 2. Hoey. On the use of composition scoring techniques. 31–53. Rater reliability in language assessment: the bug of all bears. Kroll. What does time buy? ESL student performance on home versus class compositions.M. (Ed. A Procedure for Writing Assessment and Holistic Scoring.).C. 1991. L. Cambridge. H... J. Hamp-Lyons. Kunnan. M. 1986a. TESOL Quarterly 24 (4). Cambridge University Press. TESOL Quarterly 29 (4). 1990. L. London.W. Lawrence Erlbaum Associates. Bacha / System 29 (2001) 371–383 Gamaroff. Urbana. Oller Jr.). Homburg. 147–154. K. 1982. Holistic evaluation of ESL compositions: can it be validated objectively? TESOL Quarterly 18 (1).). (Ed. Kaczmarek..382 N. MA. MA. 1986. G. In: Oller Jr. N. 1986a. et al.J. National Council of Teachers of English. In: Kroll. Najimy. Perkins. Unpublished doctoral dissertation.J. 1981. Research in Language Testing.. 651–671.W. Ft. 759–762. D.A. L. Newbury House. Rowley. T. Evaluating writing proficiency in ESL. 1986b. 1987. G. M. Measure for Measure: A Guidebook for Evaluating Students’ Expository Writing. Educational Evaluation and Policy Analysis 8 (2). R. (Ed. 1990. Evaluations of essay prompts by nonnative speakers of English. Assessing Second Language Writing in Academic Contexts. Newbury House Publishers Incorporated.P. 1981. Illinois. National Conference on Research in English and ERIC Clearinghouse on Reading and Communication Skills. pp.. National Council of Teachers of English. Cambridge. J. Validation in Language Assessment: Selected Papers from the 17th Language Testing Research Colloquium. L. New Jersey. L. 1980. (Ed. England. Horowitz. Myers. 1990b. pp. NJ. (Winner of the Duke of Edinburgh English Language Prize. Illinois. (Ed.. 151–159. Tests of writing ability. B.. 2000. R.. Writing in L1 and L2: analyzing and evaluating learners’ texts. TESOL Quarterly 17 (4). Testing second language writing in academic settings.). Oxford University Press. Cambridge University Press. Hamp-Lyons. Second language writing: assessment issues. Hillocks Jr. 1990a. P. Research in Language Testing. 
In: Oller Jr.). Worth. 1991. 1995. 160–170. 1980. K.. Long Beach. Teaching Composition: 12 Bibliographical Essays. 377–392. Publishers. Hoetker. 1980. pp....W.. Lloyd-Jones. 87–107. 1986b.. Patterns of Lexis in Text. Hayward. K. 155–176.. 140–154. In: Meara. 107–120. Essay examination topics and students’ writing. (Ed. Norwood. 69–87.. (Eds. Testing ESL Composition: A Practical Approach. 445–462.. Illinois.

Pierce, B.N., 1991. TOEFL test of written English (TWE) scoring guide. TESOL Quarterly 25 (1), 159–163.
Quellmalz, E.S., et al., 1982. Effects of discourse and response mode on the measurement of writing competence. Journal of Educational Measurement 19 (4), 241–258.
Raymond, J.C., 1982. What we don't know about the evaluation of writing. College Composition and Communication 33 (4), 399–403.
Read, J., 1990. Providing relevant content in an EAP writing test. English for Specific Purposes 9, 109–121.
Reid, J.M., 1993. Teaching ESL Writing. Prentice-Hall, New Jersey.
Ruth, L., Murphy, S., 1988. Designing Writing Tasks for the Assessment of Writing. Ablex Publishing Corporation, Norwood, New Jersey.
Shakir, A., 1991. Coherence in EFL student-written texts: two perspectives. Foreign Language Annals 24 (5), 399–411.
Shohamy, E., 1995. Performance assessment in language testing. Annual Review of Applied Linguistics 15, 188–211.
Siegel, S., Castellan, N.J., 1988. Nonparametric Statistics for the Behavioral Sciences, 2nd ed. McGraw-Hill, New York.
Silva, T., 1993. Toward an understanding of the distinct nature of L2 writing: the ESL research and its implications. TESOL Quarterly 27 (4), 657–677.
Smith, W.L., et al., 1985. Some effects of varying the structure of a topic on college students' writing. Written Communication 2 (1), 73–88.
SPSS Inc., 1997. SPSS Base 7.5 for Windows User's Guide. SPSS Inc., Chicago.
Sweedler-Brown, C.O., 1992. The effect of training on the appearance bias of holistic essay graders. Journal of Research and Development in Education 26 (1), 24–29.
Sweedler-Brown, C.O., 1993. ESL essay evaluation: the influence of sentence-level and rhetorical features. Journal of Second Language Writing 2 (1), 3–17.
Tedick, D.J., 1990. ESL writing assessment: subject-matter knowledge and its impact on performance. English for Specific Purposes 9, 123–143.
Tedick, D.J., Mathison, M.A., 1995. Holistic scoring in ESL writing assessment: what does an analysis of rhetorical features reveal? In: Academic Writing in a Second Language: Essays on Research and Pedagogy. Ablex, Norwood, New Jersey, pp. 205–230.
Upshur, J.A., Turner, C.E., 1995. Constructing rating scales for second language tests. ELT Journal 49 (1), 3–12.
Weir, C.J., 1983. Identifying the language problems of overseas students in tertiary education in the UK. Unpublished doctoral dissertation, University of London, London.
Weir, C.J., 1990. Communicative Language Testing. Prentice-Hall International, England.
White, E.M., 1994. Teaching and Assessing Writing: Recent Advances in Understanding, Evaluating, and Improving Student Performance. Jossey-Bass Publishers, San Francisco.
