Final Language Assessment Writing
Rizky Gushendra, M.Ed.
Language Assessment
Assessing Writing
Group members:
Suhriadi
Suciati Anandes
Endra Heriyanto
Jastri Permata Sari
Mira Ayu Defitri
Rifka Zahera
Sispa Deni
CHAPTER I
A. Introduction
The new assessment culture aims at assessing higher-order thinking processes and competences instead of factual knowledge and lower-level cognitive skills, which has led to a strong interest in various types of performance assessment. This is due to the belief that open-ended tasks are needed in order to elicit students' higher-order thinking.
According to Black (1998: 87), performance assessment deals with activities that can be direct models of reality, and some authors write about authentic assessment and tasks relating to the real world. The notion of reality is not a way of escaping the fact that all learning is a product of the context in which it occurs, but rather an attempt to better reflect the complexity of the real world and provide more valid data about student competence. As a consequence, performance assessments are designed to capture more elusive aspects of learning by letting students solve realistic or authentic problems.
Performance assessment consists of two parts: a task and a set of scoring criteria or a scoring rubric (Perlman, 2003). In this paper, the assessor uses a rubric to assess writing. Rubrics are tools for evaluating and providing guidance for students' writing. Andrade (2005) claimed that rubrics significantly enhance the learning process by providing both students and instructors with a clear understanding of the goals of the writing assignment and the scoring criteria.
Rubrics used in many subject areas in higher education generally include two elements: (a) a statement of criteria to be evaluated, and (b) an appropriate and relevant scoring system (Peat, 2006). Rubrics can be classified as either holistic or analytic (Moskal, 2000). Holistic rubrics award a single score based on the student's overall performance, whereas analytic rubrics give multiple scores along several dimensions. In analytic rubrics, the scores for each dimension can be summed for the final grade. Although an advantage of the holistic rubric is that papers can be scored quickly, the analytic rubric provides more detailed feedback for the student and increases consistency between graders (Zimmaro, 2004). Based on the explanation above, a holistic rubric will be used in this writing assessment.
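To make the distinction concrete, here is a minimal sketch in Python (the dimension names, point scale, and scores are invented for illustration) of how an analytic rubric produces one score per dimension that is summed into a final grade, while a holistic rubric produces a single overall score matched to a band description.

    # Hypothetical example: scoring one student essay two ways.

    # Analytic rubric: one score (0-5) per dimension; the final grade
    # is the sum of the dimension scores.
    analytic_scores = {"content": 4, "organisation": 3, "grammar": 4, "spelling": 5}
    analytic_total = sum(analytic_scores.values())
    print(f"Analytic total: {analytic_total} / {5 * len(analytic_scores)}")  # 16 / 20

    # Holistic rubric: the rater matches the whole performance to one
    # band description and awards a single overall score.
    holistic_bands = {5: "Outstanding", 4: "High", 3: "Sound", 2: "Basic", 1: "Limited"}
    holistic_score = 4  # the rater's overall judgment of the same essay
    print(f"Holistic score: {holistic_score} ({holistic_bands[holistic_score]})")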
Regardless of rubric format, when used as the basis for evaluating student performance, a rubric is a type of measurement instrument and, as such, it is important that the rubric exhibits reliability (i.e., consistency of scores across repeated measurements) and validity (i.e., the extent to which scores truly reflect the underlying variable of interest). Although reliability and validity have been noted as issues of concern in rubric development, the reliability and validity of grading rubrics have seldom been assessed, most likely due to the effort and time commitment that is required to do so.
Reliability
In order for a holistic rubric scoring system to be of any value, it must be shown to be reliable. Reliability refers to the consistency of assessment scores. For example, on a reliable test, a student would be expected to attain the same score regardless of when the student completed the assessment, when the response was scored, and who scored the response. On an unreliable examination, a student's score may vary based on factors that are not related to the purpose of the assessment.
Many teachers are probably familiar with the terms "test/retest reliability," "equivalent-forms reliability," "split-half reliability," and "rational equivalence reliability" (Gay, 1987). Each of these terms refers to a statistical method used to establish the consistency of student performances within a given test or across more than one test. These types of reliability are of more concern in standardized or high-stakes testing than in classroom assessment. In a classroom, students' knowledge is repeatedly assessed, and this allows the teacher to adjust as new insights are acquired.
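As a brief illustration of one of these methods, the sketch below estimates split-half reliability for a short test: each student's items are split into odd and even halves, the two half-test scores are correlated, and the Spearman-Brown formula projects the correlation up to the full test length. All item data are invented, and the example assumes Python 3.10+ for statistics.correlation.

    from statistics import correlation  # available in Python 3.10+

    # Invented data: each row is one student's item scores (1 = correct).
    items = [
        [1, 1, 0, 1, 1, 0, 1, 1],
        [0, 1, 0, 0, 1, 0, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [1, 0, 0, 1, 0, 0, 1, 0],
        [1, 1, 0, 1, 1, 1, 0, 1],
    ]

    # Total score on the odd-numbered and even-numbered items.
    odd_half = [sum(row[0::2]) for row in items]
    even_half = [sum(row[1::2]) for row in items]

    # Correlate the two halves, then apply the Spearman-Brown
    # correction to estimate the reliability of the full-length test.
    r_halves = correlation(odd_half, even_half)
    r_full = (2 * r_halves) / (1 + r_halves)
    print(f"Half-test correlation: {r_halves:.2f}")
    print(f"Split-half reliability (Spearman-Brown): {r_full:.2f}")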
The two forms of reliability that typically are considered in classroom assessment and
in rubric development involve rater (or scorer) reliability. Rater reliability generally refers to
the consistency of scores that are assigned by two independent raters and that are assigned by
the same rater at different points in time. The former is referred to as "inter-rater reliability"
while the latter is referred to as "intra-rater reliability."
Inter-rater Reliability
Inter-rater reliability refers to the concern that a student's score may vary from rater to
rater. Students often criticize exams in which their score appears to be based on the subjective
judgment of their instructor. For example, one manner in which to analyze an essay exam is
to read through the students' responses and make judgments as to the quality of the students'
written products. Without set criteria to guide the rating process, two independent raters may
not assign the same score to a given response. Each rater has his or her own evaluation
criteria. Scoring rubrics respond to this concern by formalizing the criteria at each score level.
The descriptions of the score levels are used to guide the evaluation process. Although
scoring rubrics do not completely eliminate variations between raters, a well-designed
scoring rubric can reduce the occurrence of these discrepancies.
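One common way to quantify this concern is to compute the agreement between two raters. The sketch below, with invented scores on a 1-4 scale, reports simple percent agreement and Cohen's kappa, which discounts the agreement that two raters would reach by chance alone.

    from collections import Counter

    # Invented scores: two raters score the same ten essays on a 1-4 scale.
    rater_a = [3, 4, 2, 3, 1, 4, 3, 2, 3, 4]
    rater_b = [3, 4, 2, 2, 1, 4, 3, 3, 3, 4]
    n = len(rater_a)

    # Observed agreement: proportion of essays given identical scores.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected chance agreement, from each rater's score distribution.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    scores = set(rater_a) | set(rater_b)
    p_expected = sum(count_a[s] * count_b[s] for s in scores) / n**2

    # Cohen's kappa: agreement beyond what chance alone would produce.
    kappa = (p_observed - p_expected) / (1 - p_expected)
    print(f"Percent agreement: {p_observed:.0%}")  # 80%
    print(f"Cohen's kappa: {kappa:.2f}")           # 0.71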
Intra-rater Reliability
Factors that are external to the purpose of the assessment can impact the manner in
which a given rater scores student responses. For example, a rater may become fatigued with
the scoring process and devote less attention to the analysis over time. Certain responses may
receive different scores than they would have had they been scored earlier in the evaluation.
A rater's mood on the given day or knowing who a respondent is may also impact the scoring
process. A correct response from a failing student may be more critically analyzed than an
identical response from a student who is known to perform well. Intra-rater reliability refers
to each of these situations in which the scoring process of a given rater changes over time.
The inconsistencies in the scoring process result from influences that are internal to the rater
rather than true differences in student performances. Well-designed scoring rubrics respond to
the concern of intra-rater reliability by establishing a description of the scoring criteria in
advance. Throughout the scoring process, the rater should revisit the established criteria in
order to ensure that consistency is maintained.
Validity
Validation is the process of accumulating evidence that supports the appropriateness of the inferences that are made from student responses for specified assessment uses. Validity
refers to the degree to which the evidence supports that these interpretations are correct and
that the manner in which the interpretations are used is appropriate (American Educational
Research Association, American Psychological Association & National Council on
Measurement in Education, 1999). Three types of evidence are commonly examined to
support the validity of an assessment instrument: content, construct, and criterion. This
section begins by defining these types of evidence and is followed by a discussion of how
evidence of validity should be considered in the development of scoring rubrics.
Content-Related Evidence
Content-related evidence refers to the extent to which students' responses to an assessment reflect their knowledge of the content area that is of interest. When the purpose of an assessment is to evaluate content knowledge, both the task and the scoring rubric should address the intended content and avoid evaluating extraneous material.
Construct-Related Evidence
Constructs are processes that are internal to an individual. An example of a construct
is an individual's reasoning process. Although reasoning occurs inside a person, it may be
partially displayed through results and explanations. An isolated correct answer, however,
does not provide clear and convincing evidence of the nature of the individual's underlying
reasoning process. Although an answer results from a student's reasoning process, a correct
answer may be the outcome of incorrect reasoning. When the purpose of an assessment is to
evaluate reasoning, both the product (i.e., the answer) and the process (i.e., the explanation)
should be requested and examined.
Criterion-Related Evidence
The final type of evidence that will be discussed here is criterion-related evidence.
This type of evidence supports the extent to which the results of an assessment correlate with
a current or future event. Another way to think of criterion-related evidence is to consider the
extent to which the students' performance on the given task may be generalized to other, more
relevant activities (Rafilson, 1991).
CHAPTER II
A. Report
Clarifying the scoring rubric is likely to improve both inter-rater and intra-rater
reliability. A scoring rubric with well-defined score categories should assist in maintaining
consistent scoring regardless of who the rater is or when the rating is completed. The
following questions may be used to evaluate the clarity of a given rubric: 1) Are the scoring
categories well defined? 2) Are the differences between the score categories clear? And 3)
Would two independent raters arrive at the same score for a given response based on the
scoring rubric? If the answer to any of these questions is "no", then the unclear score
categories should be revised.
One method of further clarifying a scoring rubric is through the use of anchor papers.
Anchor papers are a set of scored responses that illustrate the nuances of the scoring rubric. A
given rater may refer to the anchor papers throughout the scoring process to illuminate the
differences between the score levels.
After every effort has been made to clarify the scoring categories, other teachers may
be asked to use the rubric and the anchor papers to evaluate a sample set of responses. Any
discrepancies between the scores that are assigned by the teachers will suggest which
components of the scoring rubric require further explanation. Any differences in
interpretation should be discussed and appropriate adjustments to the scoring rubric should be
negotiated. Although this negotiation process can be time consuming, it can also greatly
enhance reliability (Yancey, 1999).
Another reliability concern is the appropriateness of the given scoring rubric to the
population of responding students. A scoring rubric that consistently measures the
performances of one set of students may not consistently measure the performances of a
different set of students. For example, if a task is embedded within a context, one population
of students may be familiar with that context and the other population may be unfamiliar with
that context. The students who are unfamiliar with the given context may achieve a lower
score based on their lack of knowledge of the context. If these same students had completed a
different task that covered the same material that was embedded in a familiar context, their
scores may have been higher. When the cause of variation in performance and the resulting
scores is unrelated to the purpose of the assessment, the scores are unreliable.
Sometimes during the scoring process, teachers realize that they hold implicit criteria
that are not stated in the scoring rubric. Whenever possible, the scoring rubric should be
shared with the students in advance in order to allow students the opportunity to construct the
response with the intention of providing convincing evidence that they have met the criteria.
If the scoring rubric is shared with the students prior to the evaluation, students should not be
held accountable for the unstated criteria. Identifying implicit criteria can help the teacher
refine the scoring rubric for future assessments.
Concerns about the valid interpretation of assessment results should begin before the
selection or development of a task or an assessment instrument. A well-designed scoring
rubric cannot correct for a poorly designed assessment instrument. Since establishing validity
is dependent on the purpose of the assessment, teachers should clearly state what they hope to
learn about the responding students (i.e., the purpose) and how the students will display these
proficiencies (i.e., the objectives). The teacher should use the stated purpose and objectives to
guide the development of the scoring rubric.
In order to ensure that an assessment instrument elicits evidence that is appropriate to
the desired purpose, Hanny (2000) recommended numbering the intended objectives of a
given assessment and then writing the number of the appropriate objective next to the
question that addresses that objective. In this manner, any objectives that have not been
addressed through the assessment will become apparent. This method for examining an
assessment instrument may be modified to evaluate the appropriateness of a scoring rubric.
First, clearly state the purpose and objectives of the assessment. Next, develop scoring
criteria that address each objective. If one of the objectives is not represented in the score
categories, then the rubric is unlikely to provide the evidence necessary to examine the given
objective. If some of the scoring criteria are not related to the objectives, then, once again, the
appropriateness of the assessment and the rubric is in question. This process for developing a
scoring rubric is illustrated in Figure 3.
Figure 3. Evaluating the Appropriateness of Scoring Categories to a Stated Purpose
Step 1: State the assessment purpose and objectives.
Step 2: Develop score criteria for each objective.
Step 3: Reflect on the following: Are all of the objectives measured through the scoring criteria? Are any scoring criteria unrelated to the objectives?
Reflecting on the purpose and the objectives of the assessment will also suggest which forms of evidence - content, construct, and/or criterion - should be given consideration.
If the intention of an assessment instrument is to elicit evidence of an individual's knowledge
within a given content area, such as historical facts, then the appropriateness of the content-related evidence should be considered. If the assessment instrument is designed to measure
reasoning, problem solving or other processes that are internal to the individual and,
therefore, require more indirect examination, then the appropriateness of the construct-related
evidence should be examined. If the purpose of the assessment instrument is to elicit
evidence of how a student will perform outside of school or in a different situation, criterion-related evidence should be considered.
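The coverage check described in Steps 1-3 of Figure 3 can also be carried out mechanically. The following sketch, using hypothetical objectives and criteria for a recount-writing task, flags objectives that no scoring criterion measures and criteria that relate to no stated objective.

    # Hypothetical mapping: each scoring criterion is tagged with the
    # number of the objective it addresses (None = no stated objective).
    objectives = {
        1: "Use the generic structure of a recount text",
        2: "Use past-tense verbs and time connectives",
        3: "Order events in a clear time sequence",
    }
    criterion_objective = {
        "orientation, events and reorientation are present": 1,
        "past tense is used consistently": 2,
        "handwriting is neat": None,  # criterion with no stated objective
    }

    covered = {obj for obj in criterion_objective.values() if obj is not None}
    unmeasured = [text for num, text in objectives.items() if num not in covered]
    extraneous = [c for c, obj in criterion_objective.items() if obj is None]

    print("Objectives with no scoring criterion:", unmeasured)
    print("Criteria unrelated to any objective:", extraneous)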
Being aware of the different types of evidence that support validity throughout the
rubric development process is likely to improve the appropriateness of the interpretations
when the scoring rubric is used. Validity evidence may also be examined after a preliminary
rubric has been established. Table 1 displays a list of questions that may be useful in
evaluating the appropriateness of a given scoring rubric with respect to the stated purpose.
This table is divided according to the type of evidence being considered.
Table 1: Questions to Examine Each Type of Validity Evidence

Content
1. Do the evaluation criteria address any extraneous content?
2. Do the evaluation criteria of the scoring rubric address all aspects of the intended content?
3. Is there any content addressed in the task that should be evaluated through the rubric, but is not?

Construct

Criterion
Lesson Plan

School          :
Subject         : English
Grade/Semester  : X/1
Skill           : Writing
Genre           : Recount Text
Topic           : Holiday
Time allocation : 40 minutes (1 meeting)

I. Competence Standard
Express meaning in written texts and short essays in the form of recount in the context of daily life.
II. Basic Competence
Comprehend the text and understand the meaning of the text.

III. Indicators
Individually, students can use the generic structure and language features of recount text to tell about their holiday.

IV. Teaching-Learning Methods
Individual work
Pair/group work
Online discussion
V. Teaching-Learning Activities

1. Opening (15 min)
   Teacher's activity: Teacher greets the students and asks about their holiday.
   Students' activity: Students greet the teacher, respond to the teacher, and tell their holiday to each other.

2. Explanation about the generic structure and language feature of recount text (30 min)
   Teacher's activity: Teacher introduces the materials of recount text; teacher and students analyze the example together.
   Material: LCD projector, laptop, and others.

3. Explanation about Edmodo; individual work and online discussion (35 min)
   Teacher's activity: Teacher explains what Edmodo is and asks students to post their work individually and comment on each other's posts.
   Material: Edmodo; a set of computers with an internet connection for each student.

4. Closing (10 min)
   Teacher's activity: Teacher re-explains the topic.
VI.
VII. Learning Media
Content/Ideas:
Structure/Organisation:
They follow a time sequence in that they are organised through time (i.e.,
conjunctions and adverbials show linkages in setting events in time and ordering
the events and the passage of time).
Language Resources:
Specific people, places, and events are named ("On Saturday, our class had a sleepover at Kelly Tarlton's Underwater World in Auckland" or "Today, we raided Lindisfarne Abbey to gather more gold for our longboat").
Dialogue or direct speech is often used to give the recount a realistic feel, to
assist in the reconstruction of the events, or to provide opportunities to comment on
the happenings.
Many action verbs tell of happenings and of the behaviours of those involved.
Some relational verbs are used to tell how things are as the writer reflects, observes
or comments.
The choice and use of vocabulary often reflects the desire to create particular
images or feelings for the reader.
Verbs are commonly in the past tense, though tense can vary in the comments ("On Tuesday, Mary and I went to the shop. We are best friends.").
VIII. Assessment
Performance assessment: students write a recount text, which is scored using the holistic rubric below.
Grade key: E = Limited Achievement, D = Basic Achievement, C = Sound Achievement, B = High Achievement, A = Outstanding Achievement. Text type: Recount.

Producing Texts
Criterion: Produces a wide range of well-structured and well-presented literary and factual texts for a wide variety of purposes and audiences using increasingly challenging topics, ideas, issues and written language features.
Lowest-level descriptor: Poor or no structure. Poorly presented. Limited purposes. Unable to redraft for different audiences. Basic topics, ideas, issues and language used.

Grammar and Punctuation
Criterion: Uses knowledge of sentence structure, grammar and punctuation to edit own writing.

Spelling
Criterion: Spells most common words accurately and uses a range of strategies to spell unfamiliar words.
Level descriptors, lowest to highest:
- Some errors (65%-80% of text correct). Meaning still conveyed with some guessing. Some strategies used. Text readable.
- Few or no errors (90%+ of text correct). Errors minor and do not affect meaning. Uses a variety of strategies. Text very easy to read.

Handwriting and Computer
Criterion: Produces texts in a fluent and legible style and uses computer technology to present these effectively in a variety of ways.
Lowest-level descriptor: Handwriting harder to read, lacking elements (size, slope, formation, spacing). Poor use of the keyboard. Text harder to read.

Context and Text
Criterion: Critically analyses own texts in terms of how well they have been written, how effectively they present the subject matter and how they influence the reader.
Mid-level descriptor: Is able to do basic analysis of texts regarding how well they are written and presented and how the texts influence the reader.

Structure
Criterion: Critically evaluates how own texts have been structured to achieve their purpose and discusses ways of using related grammatical features and conventions of written language to shape readers' and viewers' understanding of texts.
Level descriptors, lowest to highest:
- Lacks understanding of structure and purpose. Unable to discuss use of grammatical features and conventions of written language and how they are used to shape understanding.
- Grasps understanding of structure and purpose. Some ability to discuss use of grammatical features and conventions of written language and how they are used to shape understanding.
- Good understanding of structure and purpose. Well able to discuss use of grammatical features and conventions of written language and how they are used to shape understanding.
- Excellent understanding of structure and purpose. Adept at discussing use of grammatical features and conventions of written language and how they are used to shape understanding.
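As an illustration of how the spelling thresholds in the rubric above might be applied, the short sketch below computes the proportion of correctly spelled words in a draft and reports the matching band; the word counts are invented, and how accuracies outside the two stated ranges map to grades is an assumption left open here.

    def spelling_band(correct_words: int, total_words: int) -> str:
        # The rubric above describes two bands: 65%-80% of text correct
        # and 90%+ correct; how other ranges map to grades is left open.
        accuracy = correct_words / total_words
        if accuracy >= 0.90:
            return f"{accuracy:.0%} correct: few or no errors, text very easy to read"
        if 0.65 <= accuracy <= 0.80:
            return f"{accuracy:.0%} correct: some errors, meaning still conveyed"
        return f"{accuracy:.0%} correct: outside the ranges described in the rubric"

    # Invented example: a 150-word recount with 141 correctly spelled words.
    print(spelling_band(141, 150))  # 94% correct: few or no errors, ...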
Name : Fitriah
A Trip at the Beach
Last week my friend and I were bored after three weeks of holidays, so we rode our bikes to Cerocok Beach, which is only five kilometres from where I live.
When we arrived at the beach, we were surprised to see there was hardly anyone there. After
having a quick dip in the ocean, which was really cold, we realized one reason there were not
many people there. It was also quite windy.
After we bought some hot chips at the takeaway store nearby, we rode our bikes down the beach
for a while, on the hard, damp part of the sand. We had the wind behind us and, before we knew
it, we were many miles down the beach. Before we made the long trip back, we decided to
paddle our feet in the water for a while, and then sit down for a rest. While we were sitting on
the beach, just chatting, it suddenly dawned on us that all the way back, we would be riding into
the strong wind.
When we finally made it back home, we were both totally exhausted! But we learned some good
lessons that day.
CHAPTER III
A. Conclusion
This paper aimed to review empirical research and illuminate the questions of how the use of rubrics can (1) enhance the reliability of scoring, (2) facilitate valid judgment of performance assessments, and (3) give positive educational consequences, such as promoting learning and/or improving instruction. A first conclusion is that the reliable scoring of performance assessments can be enhanced by the use of rubrics. In relation to reliability issues, rubrics should be analytic, topic-specific, and complemented with exemplars and/or rater training. Since performance assessments are by definition more or less open-ended, it is not always possible to restrict the assessment format to achieve high levels of reliability without sacrificing validity. Another conclusion is that rubrics do not facilitate valid judgment of performance assessments per se. Valid assessment could be facilitated by using a more comprehensive framework of validity when validating the rubric, instead of focusing on only one or two aspects of validity. In relation to learning and instruction, consequential validity is an aspect of validity that might need further attention. Furthermore, it has been concluded that rubrics seem to have the potential to promote learning and/or improve instruction. The main reason for this potential lies in the fact that rubrics make expectations and criteria explicit, which also facilitates feedback and self-assessment. It is thus argued that assessment quality criteria should emphasize dimensions like transparency and fitness for self-assessment to a greater extent than is done through the traditional reliability and validity criteria. This could be achieved through a framework of quality criteria that acknowledges the importance of trustworthiness in assessment as well as supports a more comprehensive view on validity issues (including educational consequences).